Accessorial Charge Taxonomy Mapping: Implementation Guide for Freight Audit Pipelines
Accessorial charge taxonomy mapping is the deterministic normalization layer between raw carrier invoice ingestion and the downstream freight audit validation engine. Carrier billing formats diverge significantly across EDI 210 segments, OCR-extracted PDF line items, and carrier portal XML payloads. Without a strict, configuration-driven mapping strategy, audit pipelines produce false-positive overcharges, silently drop legitimate fees, and misroute disputes to the wrong resolution queues. This guide details the implementation of a production-grade taxonomy mapper, emphasizing schema alignment, batch ETL execution, threshold-based validation, and explicit pipeline stage boundaries. The mapper consumes normalized invoice payloads from the broader Freight Contract Architecture & Rate Mapping ecosystem and outputs standardized audit records ready for compliance routing.
Pipeline Stage Boundaries & Scope Definition
To maintain pipeline integrity, this mapping stage must operate within strict functional boundaries:
- In-Scope: Normalizing raw carrier codes/descriptions to internal taxonomy categories, applying contract-specific billability rules, enforcing monetary/weight thresholds, and emitting audit status flags.
- Out-of-Scope: Raw EDI/PDF parsing (handled by upstream ingestion), base freight rate calculation (covered in FTL Base Rate Extraction), LTL class/weight break digitization (handled in LTL Rate Sheet Digitization), and automated dispute generation or carrier communication.
Violating these boundaries introduces coupling that degrades pipeline throughput and complicates version control. The mapper acts purely as a stateless transformation layer.
Input & Output Schema Architecture
The transformation relies on a strictly typed input schema that preserves shipment context alongside raw billing data. Staging tables or DataFrames must enforce decimal precision and null-safe constraints:
Input Schema (InvoiceAccessorialRaw)
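| Field | Type | Notes |
|---|---|---|
| `carrier_scac` | `str` | Standard Carrier Alpha Code (2 to 4 letters) |
| `accessorial_code` | `str` | Carrier-native code |
| `accessorial_desc` | `str \| None` | Carrier-provided description |
| `billed_amt` | `Decimal` | Must be ≥ 0 |
| `shipment_id` | `str` | |
| `pro_number` | `str` | |
| `weight_lbs` | `Decimal \| None` | |
| `zone` | `str \| None` | |
| `contract_version_id` | `str` | FK to the active rate agreement |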
Output Schema (AuditAccessorialNormalized)
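| Field | Type | Notes |
|---|---|---|
| `internal_accessorial_id` | `UUID` | |
| `taxonomy_category` | `Enum` | `LIFTGATE`, `DETENTION`, `FUEL_SURCHARGE`, `REDELIVERY`, `INSIDE_DELIVERY`, `UNKNOWN` |
| `is_billable` | `bool` | |
| `max_allowable_amt` | `Decimal \| None` | |
| `audit_status` | `Enum` | `MATCHED`, `FLAGGED`, `UNMAPPED` |
| `mapping_rule_id` | `str \| None` | Traceability for audit logs |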
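As a minimal sketch of the input constraints, the following dependency-free code coerces one raw staging row into a typed `InvoiceAccessorialRaw` record. The helpers `coerce_row` and `_to_decimal` are hypothetical, and rounding `billed_amt` to cents is an assumption about currency precision rather than part of the schema above.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class InvoiceAccessorialRaw:
    """Typed mirror of the input schema table."""
    carrier_scac: str
    accessorial_code: str
    accessorial_desc: Optional[str]
    billed_amt: Decimal
    shipment_id: str
    pro_number: str
    weight_lbs: Optional[Decimal]
    zone: Optional[str]
    contract_version_id: str

    def __post_init__(self) -> None:
        # Enforce the constraints stated in the schema table.
        if not (2 <= len(self.carrier_scac) <= 4 and self.carrier_scac.isalpha()):
            raise ValueError(f"invalid carrier_scac: {self.carrier_scac!r}")
        if self.billed_amt < 0:
            raise ValueError(f"billed_amt must be >= 0: {self.billed_amt}")


def _to_decimal(value: Any) -> Optional[Decimal]:
    """Hypothetical helper: None/'' -> None, otherwise a finite Decimal."""
    if value is None or value == "":
        return None
    try:
        result = Decimal(str(value))  # str() avoids binary float noise
    except InvalidOperation as exc:
        raise ValueError(f"not a decimal: {value!r}") from exc
    if not result.is_finite():
        raise ValueError(f"non-finite decimal: {value!r}")
    return result


def coerce_row(raw: Dict[str, Any]) -> InvoiceAccessorialRaw:
    """Hypothetical coercion step; raises ValueError on constraint violations."""
    billed = _to_decimal(raw.get("billed_amt"))
    if billed is None:
        raise ValueError("billed_amt is required")
    desc = raw.get("accessorial_desc")
    zone = raw.get("zone")
    return InvoiceAccessorialRaw(
        carrier_scac=str(raw["carrier_scac"]).strip().upper(),
        accessorial_code=str(raw["accessorial_code"]).strip(),
        accessorial_desc=str(desc).strip() if desc is not None else None,
        # Assumption: cent precision, default ROUND_HALF_EVEN rounding.
        billed_amt=billed.quantize(Decimal("0.01")),
        shipment_id=str(raw["shipment_id"]),
        pro_number=str(raw["pro_number"]),
        weight_lbs=_to_decimal(raw.get("weight_lbs")),
        zone=str(zone) if zone is not None else None,
        contract_version_id=str(raw["contract_version_id"]),
    )
```

Converting through `str()` before `Decimal` keeps float inputs such as `75.1` from carrying binary rounding noise into later cap comparisons.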
The relational foundation for this architecture relies on indexed lookup tables optimized for high-concurrency reads. For indexing strategies, partition schemes, and query optimization patterns, refer to Building an accessorial charge lookup table in Postgres.
Configuration-Driven Rule Engine
Hardcoding mapping logic creates deployment bottlenecks and audit compliance risks. A YAML-driven configuration allows transportation ops teams to adjust carrier-specific rules, effective dates, and thresholds without code deployments. The configuration must be version-controlled and validated at load time.
```yaml
carrier_mappings:
  SCAC_ABC:
    contract_id: "CTR-2024-089"
    effective_date: "2024-01-01"
    rules:
      - carrier_code: "LG"
        carrier_desc_pattern: "(?i).*liftgate.*"
        internal_category: "LIFTGATE"
        billable: true
        max_amt: 75.00
        requires_weight_threshold: false
      - carrier_code: "DET"
        carrier_desc_pattern: "(?i).*detention.*"
        internal_category: "DETENTION"
        billable: true
        max_amt: 120.00
        requires_weight_threshold: true
        min_weight_lbs: 500
      - carrier_code: "FSC"
        carrier_desc_pattern: "(?i).*fuel.*surcharge.*"
        internal_category: "FUEL_SURCHARGE"
        billable: true
        max_amt: null
        requires_weight_threshold: false
```
Configuration validation should leverage schema enforcement libraries like Pydantic to guarantee type safety, regex compilation readiness, and contract version alignment before ETL execution begins.
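A Pydantic model is one way to do this; to keep the sketch dependency-free, the version below is a plain function over the dict that `yaml.safe_load` would return for the file above. `validate_config`, `ValidatedRule`, and `ConfigError` are hypothetical names, and the specific checks (ISO effective dates, known categories, a `min_weight_lbs` whenever `requires_weight_threshold` is set) are assumptions drawn from the YAML rather than a documented contract.

```python
import re
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import Any, Dict, List, Optional

# Mirrors the TaxonomyCategory enum used by the mapper.
ALLOWED_CATEGORIES = {"LIFTGATE", "DETENTION", "FUEL_SURCHARGE",
                      "REDELIVERY", "INSIDE_DELIVERY", "UNKNOWN"}


class ConfigError(ValueError):
    """Raised once with every problem found, so ops can fix the file in one pass."""

    def __init__(self, errors: List[str]) -> None:
        super().__init__("; ".join(errors))
        self.errors = errors


@dataclass(frozen=True)
class ValidatedRule:
    carrier_code: str
    pattern: "re.Pattern[str]"
    internal_category: str
    billable: bool
    max_amt: Optional[Decimal]
    min_weight_lbs: Optional[Decimal]


def _opt_decimal(value: Any) -> Optional[Decimal]:
    return None if value is None else Decimal(str(value))


def validate_config(config: Dict[str, Any]) -> Dict[str, List[ValidatedRule]]:
    errors: List[str] = []
    validated: Dict[str, List[ValidatedRule]] = {}
    for scac, contract in config.get("carrier_mappings", {}).items():
        if not contract.get("contract_id"):
            errors.append(f"{scac}: missing contract_id")
        try:
            date.fromisoformat(str(contract.get("effective_date")))
        except ValueError:
            errors.append(f"{scac}: effective_date is not an ISO date")
        rules: List[ValidatedRule] = []
        for i, rule in enumerate(contract.get("rules", [])):
            where = f"{scac}.rules[{i}]"
            try:
                # Compile now so a bad pattern fails before any ETL run.
                pattern = re.compile(rule["carrier_desc_pattern"])
            except (KeyError, TypeError, re.error) as exc:
                errors.append(f"{where}: invalid carrier_desc_pattern ({exc!r})")
                continue
            if not rule.get("carrier_code"):
                errors.append(f"{where}: missing carrier_code")
                continue
            if rule.get("internal_category") not in ALLOWED_CATEGORIES:
                errors.append(f"{where}: unknown internal_category {rule.get('internal_category')!r}")
                continue
            if not isinstance(rule.get("billable"), bool):
                errors.append(f"{where}: billable must be true or false")
                continue
            if rule.get("requires_weight_threshold") and rule.get("min_weight_lbs") is None:
                errors.append(f"{where}: requires_weight_threshold needs min_weight_lbs")
                continue
            try:
                max_amt = _opt_decimal(rule.get("max_amt"))
                min_weight = _opt_decimal(rule.get("min_weight_lbs"))
            except InvalidOperation:
                errors.append(f"{where}: max_amt/min_weight_lbs must be numeric")
                continue
            rules.append(ValidatedRule(str(rule["carrier_code"]), pattern,
                                       rule["internal_category"], rule["billable"],
                                       max_amt, min_weight))
        validated[scac] = rules
    if errors:
        raise ConfigError(errors)
    return validated
```

Collecting every error before raising, rather than stopping at the first one, lets ops correct a rule file in a single edit cycle.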
Production ETL Implementation
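The following Python implementation demonstrates a production-oriented batch mapping engine. It compiles regex patterns once at configuration load, places explicit error boundaries around rule loading and amount coercion, and logs rejected rules and flagged overcharges. Rows are evaluated one at a time with `iterrows`, which keeps rule precedence easy to follow at the cost of per-row overhead on very large batches.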
```python
import logging
import re
import uuid
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from enum import Enum
from typing import Any, Dict, List, Optional

import pandas as pd

logger = logging.getLogger("freight_audit.taxonomy_mapper")


class TaxonomyCategory(str, Enum):
    LIFTGATE = "LIFTGATE"
    DETENTION = "DETENTION"
    FUEL_SURCHARGE = "FUEL_SURCHARGE"
    REDELIVERY = "REDELIVERY"
    INSIDE_DELIVERY = "INSIDE_DELIVERY"
    UNKNOWN = "UNKNOWN"


class AuditStatus(str, Enum):
    MATCHED = "MATCHED"
    FLAGGED = "FLAGGED"
    UNMAPPED = "UNMAPPED"


@dataclass
class MappingRule:
    carrier_code: str
    compiled_pattern: re.Pattern
    internal_category: TaxonomyCategory
    billable: bool
    max_amt: Optional[Decimal]
    requires_weight_threshold: bool
    min_weight_lbs: Optional[Decimal] = None


class AccessorialTaxonomyMapper:
    # Columns appended to every batch, matching AuditAccessorialNormalized.
    OUTPUT_COLUMNS = [
        "internal_accessorial_id", "taxonomy_category", "is_billable",
        "max_allowable_amt", "audit_status", "mapping_rule_id",
    ]

    def __init__(self, config: Dict[str, Any]):
        self._rules: Dict[str, List[MappingRule]] = {}
        self._load_and_compile(config)

    def _load_and_compile(self, config: Dict[str, Any]) -> None:
        for scac, contract_data in config.get("carrier_mappings", {}).items():
            compiled_rules: List[MappingRule] = []
            for rule in contract_data.get("rules", []):
                try:
                    # Compile once at load time; every batch reuses the Pattern object.
                    pattern = re.compile(rule["carrier_desc_pattern"])
                    requires_weight = bool(rule.get("requires_weight_threshold", False))
                    min_weight = rule.get("min_weight_lbs")
                    if requires_weight and min_weight is None:
                        raise ValueError("requires_weight_threshold is set but min_weight_lbs is missing")
                    max_amt = rule.get("max_amt")
                    compiled_rules.append(MappingRule(
                        carrier_code=str(rule["carrier_code"]),
                        compiled_pattern=pattern,
                        internal_category=TaxonomyCategory(rule["internal_category"]),
                        billable=bool(rule["billable"]),
                        max_amt=Decimal(str(max_amt)) if max_amt is not None else None,
                        requires_weight_threshold=requires_weight,
                        min_weight_lbs=Decimal(str(min_weight)) if min_weight is not None else None,
                    ))
                except (re.error, KeyError, ValueError, InvalidOperation) as e:
                    # A bad rule is skipped and logged; the rest of the contract still loads.
                    logger.error("Invalid rule for %s: %s", scac, e)
            self._rules[scac] = compiled_rules

    @staticmethod
    def _text(value: Any) -> str:
        """Stripped string for a cell; None/NaN become '' instead of 'None'/'nan'."""
        if value is None or pd.isna(value):
            return ""
        return str(value).strip()

    def _evaluate_threshold(self, row: pd.Series, rule: MappingRule) -> bool:
        if not rule.requires_weight_threshold:
            return True
        weight = row.get("weight_lbs")
        if weight is None or pd.isna(weight):
            # Weight unknown: the rule cannot be confirmed, so it is skipped.
            return False
        return Decimal(str(weight)) >= rule.min_weight_lbs

    @staticmethod
    def _unmapped_record() -> Dict[str, Any]:
        return {
            "internal_accessorial_id": uuid.uuid4(),
            "taxonomy_category": TaxonomyCategory.UNKNOWN,
            "is_billable": False,
            "max_allowable_amt": None,
            "audit_status": AuditStatus.UNMAPPED,
            "mapping_rule_id": None,
        }

    def map_batch(self, df: pd.DataFrame) -> pd.DataFrame:
        if df.empty:
            # Same output columns, zero rows, so downstream schemas stay stable.
            return df.assign(**{col: pd.Series(index=df.index, dtype="object")
                                for col in self.OUTPUT_COLUMNS})

        results: List[Dict[str, Any]] = []
        for _, row in df.iterrows():
            scac = self._text(row.get("carrier_scac"))
            raw_code = self._text(row.get("accessorial_code"))
            raw_desc = self._text(row.get("accessorial_desc"))

            try:
                billed_amt = Decimal(str(row.get("billed_amt")))
                if not billed_amt.is_finite():
                    raise InvalidOperation("non-finite billed_amt")
            except InvalidOperation:
                # A non-numeric amount cannot be cap-checked; upstream should divert
                # such rows to the dead-letter table before they reach the mapper.
                logger.error("Non-numeric billed_amt for %s PRO %s", scac, row.get("pro_number"))
                results.append(self._unmapped_record())
                continue

            # First rule (in config order) whose code or description matches
            # and whose weight threshold holds wins.
            matched_rule: Optional[MappingRule] = None
            for rule in self._rules.get(scac, []):
                if rule.carrier_code == raw_code or rule.compiled_pattern.search(raw_desc):
                    if self._evaluate_threshold(row, rule):
                        matched_rule = rule
                        break

            if matched_rule is None:
                # Unknown SCAC or no matching rule: degrade to UNMAPPED, never raise.
                results.append(self._unmapped_record())
                continue

            status = AuditStatus.MATCHED
            if matched_rule.max_amt is not None and billed_amt > matched_rule.max_amt:
                status = AuditStatus.FLAGGED
                logger.warning("Overcharge flagged: %s/%s billed %s > max %s",
                               scac, raw_code, billed_amt, matched_rule.max_amt)
            results.append({
                "internal_accessorial_id": uuid.uuid4(),
                "taxonomy_category": matched_rule.internal_category,
                "is_billable": matched_rule.billable,
                "max_allowable_amt": matched_rule.max_amt,
                "audit_status": status,
                "mapping_rule_id": f"{scac}_{matched_rule.carrier_code}",
            })

        # Build results on the input index so rows align even when df.index
        # is not a 0..n-1 RangeIndex (e.g. after filtering).
        out = pd.DataFrame(results, index=df.index, columns=self.OUTPUT_COLUMNS)
        return pd.concat([df, out], axis=1)
```
Threshold Validation & Audit Flagging
Threshold validation must occur synchronously during mapping to prevent downstream audit engines from processing invalid monetary values. The implementation above enforces three critical checks:
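- Pattern & Code Matching: A rule matches when the carrier code equals `carrier_code` or the description matches `carrier_desc_pattern`, and the first matching rule in configuration order wins. Patterns are compiled once at configuration load time, which avoids recompilation during batch processing and surfaces invalid patterns before any invoice is touched. Compilation does not prevent catastrophic backtracking, which depends on the pattern itself, so keep patterns simple and avoid nested quantifiers. See the Python `re` module documentation for syntax and performance notes.
- Weight Dependency: Certain accessorials (e.g., detention) only apply at or above contract-defined weight floors. Missing weight data should resolve to `FLAGGED` or `UNMAPPED` depending on contract SLAs; the implementation above skips the rule when weight is absent, so such records fall through to `UNMAPPED` unless another rule matches.
- Monetary Caps: When `billed_amt` exceeds `max_allowable_amt`, the status transitions from `MATCHED` to `FLAGGED`. The pipeline must preserve the original billed amount while attaching the cap for downstream dispute calculators.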
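The weight and cap checks can be isolated into a small pure function, which makes the SLA choice for missing weight explicit. This is a hypothetical helper, not part of the mapper above: the `missing_weight_status` parameter is an assumed way to encode the per-contract SLA, and `overcharge` is an assumed helper for the dispute calculators.

```python
from decimal import Decimal
from enum import Enum
from typing import Optional


class AuditStatus(str, Enum):
    MATCHED = "MATCHED"
    FLAGGED = "FLAGGED"
    UNMAPPED = "UNMAPPED"


def evaluate_thresholds(
    billed_amt: Decimal,
    max_amt: Optional[Decimal],
    weight_lbs: Optional[Decimal] = None,
    min_weight_lbs: Optional[Decimal] = None,
    missing_weight_status: AuditStatus = AuditStatus.UNMAPPED,
) -> AuditStatus:
    """Apply the weight floor, then the monetary cap, for one already-matched rule."""
    if min_weight_lbs is not None:
        if weight_lbs is None:
            # The contract SLA decides whether missing weight is a gap or a dispute.
            return missing_weight_status
        if weight_lbs < min_weight_lbs:
            # Below the floor the accessorial does not apply under this rule.
            return AuditStatus.UNMAPPED
    if max_amt is not None and billed_amt > max_amt:
        return AuditStatus.FLAGGED
    return AuditStatus.MATCHED


def overcharge(billed_amt: Decimal, max_amt: Optional[Decimal]) -> Decimal:
    """Amount above the cap; the billed amount itself is left untouched."""
    if max_amt is None:
        return Decimal("0.00")
    return max(billed_amt - max_amt, Decimal("0.00"))
```

With the YAML values above, a liftgate line billed at 85.00 against the 75.00 cap evaluates to `FLAGGED` with an overcharge of 10.00, while a 640 lb detention line billed at 110.00 clears both the 500 lb floor and the 120.00 cap and evaluates to `MATCHED`.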
Error Handling & Observability Strategy
Production pipelines require deterministic failure modes. The mapper and the stages around it should implement the following resilience patterns:
- Dead-Letter Routing: Records failing schema coercion (e.g., non-numeric `billed_amt`) are isolated to a `dlq_accessorial_mapping` table with the original payload and exception traceback.
- Structured Logging: Every `FLAGGED` and `UNMAPPED` record emits a JSON-formatted log event containing `carrier_scac`, `pro_number`, `rule_id`, and `failure_reason`. This enables rapid ops triage without querying raw data stores.
- Graceful Degradation: If a SCAC lacks configuration, the mapper defaults to `UNMAPPED` rather than raising exceptions, ensuring pipeline continuity while triggering alerting for missing contract data.
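A minimal standard-library sketch of the first two patterns follows. It assumes a `logging.Formatter` subclass that serializes the audit fields passed through `extra`, and an in-memory list standing in for the `dlq_accessorial_mapping` table; `JsonAuditFormatter`, `log_audit_event`, and `coerce_billed_amt` are hypothetical names.

```python
import json
import logging
import traceback
from decimal import Decimal, InvalidOperation
from typing import Any, Dict, List, Optional

AUDIT_FIELDS = ("carrier_scac", "pro_number", "rule_id", "failure_reason")


class JsonAuditFormatter(logging.Formatter):
    """Render a log record as one JSON line, pulling audit fields passed via `extra`."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for field in AUDIT_FIELDS:
            payload[field] = getattr(record, field, None)
        return json.dumps(payload, default=str)


logger = logging.getLogger("freight_audit.taxonomy_mapper.events")


def log_audit_event(row: Dict[str, Any], status: str,
                    rule_id: Optional[str], reason: str) -> None:
    """Emit one structured event for a FLAGGED or UNMAPPED record."""
    logger.warning("%s accessorial", status, extra={
        "carrier_scac": row.get("carrier_scac"),
        "pro_number": row.get("pro_number"),
        "rule_id": rule_id,
        "failure_reason": reason,
    })


def coerce_billed_amt(row: Dict[str, Any],
                      dead_letters: List[Dict[str, Any]]) -> Optional[Decimal]:
    """Return billed_amt as a Decimal, or divert the row to the dead-letter list."""
    try:
        amount = Decimal(str(row["billed_amt"]))
        if not amount.is_finite():
            raise InvalidOperation(f"non-finite amount {amount}")
        return amount
    except (KeyError, InvalidOperation):
        # Keep the original payload plus traceback, as the DLQ pattern requires.
        dead_letters.append({"payload": dict(row), "traceback": traceback.format_exc()})
        log_audit_event(row, "UNMAPPED", None, "schema_coercion_failed")
        return None
```

In a real deployment the formatter would be attached to whatever handler ships logs to the ops tooling, and the dead-letter list would be flushed to the DLQ table at the end of each batch.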
Integration Handoffs
Upon successful mapping, the normalized DataFrame is partitioned by audit_status and routed to downstream consumers:
- `MATCHED` records proceed to the automated validation engine for line-item reconciliation.
- `FLAGGED` records enter the dispute routing queue, where they are enriched with carrier contact SLAs and historical approval rates.
- `UNMAPPED` records trigger a configuration gap alert for transportation ops to update the YAML rule set before the next billing cycle.
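The routing step reduces to a group-by on `audit_status`. The sketch below works on plain record dicts (for example, the output of `DataFrame.to_dict(orient="records")`); with pandas, `df.groupby("audit_status")` performs the same split. The destination names in `ROUTES` are hypothetical placeholders for the three consumers listed above.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

# Hypothetical destination names for the three downstream consumers.
ROUTES: Dict[str, str] = {
    "MATCHED": "validation_engine",
    "FLAGGED": "dispute_routing",
    "UNMAPPED": "config_gap_alerts",
}


def partition_by_status(records: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group normalized records into destination queues by audit_status."""
    queues: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        status = record.get("audit_status")
        # Accept enum members (which expose .value) as well as plain strings.
        key = getattr(status, "value", status)
        if key not in ROUTES:
            raise ValueError(f"unroutable audit_status {status!r} for {record.get('pro_number')}")
        queues[ROUTES[key]].append(record)
    return dict(queues)
```

An unknown status raises instead of being dropped, so an upstream schema change cannot silently lose records on the way to the dispute or alerting queues.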
By maintaining strict stage isolation, enforcing configuration-driven logic, and implementing explicit error boundaries, this taxonomy mapper ensures high-throughput, auditable freight bill processing without introducing coupling to upstream ingestion or downstream dispute workflows.