Accessorial Charge Taxonomy Mapping: Implementation Guide for Freight Audit Pipelines

Accessorial charge taxonomy mapping is the deterministic normalization layer between raw carrier invoice ingestion and the downstream freight audit validation engine. Carrier billing formats diverge significantly across EDI 210 segments, OCR-extracted PDF line items, and carrier portal XML payloads. Without a strict, configuration-driven mapping strategy, audit pipelines produce false-positive overcharges, silently drop legitimate fees, and misroute disputes to the wrong resolution queues. This guide details the implementation of a production-grade taxonomy mapper, emphasizing schema alignment, batch ETL execution, threshold-based validation, and explicit pipeline stage boundaries. The mapper consumes normalized invoice payloads from the broader Freight Contract Architecture & Rate Mapping ecosystem and outputs standardized audit records ready for compliance routing.

Pipeline Stage Boundaries & Scope Definition

To maintain pipeline integrity, this mapping stage must operate within strict functional boundaries:

In-Scope: Normalizing raw carrier codes/descriptions to internal taxonomy categories, applying contract-specific billability rules, enforcing monetary/weight thresholds, and emitting audit status flags.
Out-of-Scope: Raw EDI/PDF parsing (handled by upstream ingestion), base freight rate calculation (covered in FTL Base Rate Extraction), LTL class/weight break digitization (handled in LTL Rate Sheet Digitization), and automated dispute generation or carrier communication.

Violating these boundaries introduces coupling that degrades pipeline throughput and complicates version control. The mapper acts purely as a stateless transformation layer.

Input & Output Schema Architecture

The transformation relies on a strictly typed input schema that preserves shipment context alongside raw billing data. Staging tables or DataFrames must enforce decimal precision and null-safe constraints:

Input Schema (InvoiceAccessorialRaw)

carrier_scac: str (2- to 4-letter SCAC)
accessorial_code: str (Carrier-native code)
accessorial_desc: str | None (Carrier-provided description)
billed_amt: Decimal (Must be ≥ 0)
shipment_id: str
pro_number: str
weight_lbs: Decimal | None
zone: str | None
contract_version_id: str (FK to active rate agreement)

Output Schema (AuditAccessorialNormalized)

internal_accessorial_id: UUID
taxonomy_category: Enum (LIFTGATE, DETENTION, FUEL_SURCHARGE, REDELIVERY, INSIDE_DELIVERY, UNKNOWN)
is_billable: bool
max_allowable_amt: Decimal | None
audit_status: Enum (MATCHED, FLAGGED, UNMAPPED)
mapping_rule_id: str | None (Traceability for audit logs)
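To make the two schemas concrete, the sketch below expresses them as frozen standard-library dataclasses that enforce the stated constraints when a record is constructed. Class and field names follow the lists above. The specific validation rules (a SCAC of two to four letters, rejecting float amounts so that precision is never lost silently) are assumptions about how strict a staging layer should be; a pandas or Postgres staging table would enforce the same constraints through dtypes and CHECK constraints instead.

```python
import uuid
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional


class TaxonomyCategory(str, Enum):
    LIFTGATE = "LIFTGATE"
    DETENTION = "DETENTION"
    FUEL_SURCHARGE = "FUEL_SURCHARGE"
    REDELIVERY = "REDELIVERY"
    INSIDE_DELIVERY = "INSIDE_DELIVERY"
    UNKNOWN = "UNKNOWN"


class AuditStatus(str, Enum):
    MATCHED = "MATCHED"
    FLAGGED = "FLAGGED"
    UNMAPPED = "UNMAPPED"


@dataclass(frozen=True)
class InvoiceAccessorialRaw:
    carrier_scac: str
    accessorial_code: str
    accessorial_desc: Optional[str]
    billed_amt: Decimal
    shipment_id: str
    pro_number: str
    weight_lbs: Optional[Decimal]
    zone: Optional[str]
    contract_version_id: str

    def __post_init__(self) -> None:
        # Assumed rule: SCACs are 2 to 4 alphabetic characters.
        if not (2 <= len(self.carrier_scac) <= 4 and self.carrier_scac.isalpha()):
            raise ValueError(f"invalid carrier_scac: {self.carrier_scac!r}")
        # Reject floats so binary rounding never leaks into monetary values.
        if not isinstance(self.billed_amt, Decimal):
            raise TypeError("billed_amt must be a Decimal")
        # Schema constraint: billed_amt must be >= 0 (NaN and infinity are rejected too).
        if not self.billed_amt.is_finite() or self.billed_amt < 0:
            raise ValueError(f"billed_amt must be finite and >= 0: {self.billed_amt}")


@dataclass(frozen=True)
class AuditAccessorialNormalized:
    internal_accessorial_id: uuid.UUID
    taxonomy_category: TaxonomyCategory
    is_billable: bool
    max_allowable_amt: Optional[Decimal]
    audit_status: AuditStatus
    mapping_rule_id: Optional[str]  # traceability for audit logs
```

Freezing the records keeps each stage from mutating rows it received, which matches the stateless role described earlier.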

The relational foundation for this architecture relies on indexed lookup tables optimized for high-concurrency reads. For indexing strategies, partition schemes, and query optimization patterns, refer to Building an accessorial charge lookup table in Postgres.

Configuration-Driven Rule Engine

Hardcoding mapping logic creates deployment bottlenecks and audit compliance risks. A YAML-driven configuration allows transportation ops teams to adjust carrier-specific rules, effective dates, and thresholds without code deployments. The configuration must be version-controlled and validated at load time.

carrier_mappings:
  SCAC_ABC:
    contract_id: "CTR-2024-089"
    effective_date: "2024-01-01"
    rules:
      - carrier_code: "LG"
        carrier_desc_pattern: "(?i).*liftgate.*"
        internal_category: "LIFTGATE"
        billable: true
        max_amt: 75.00
        requires_weight_threshold: false
      - carrier_code: "DET"
        carrier_desc_pattern: "(?i).*detention.*"
        internal_category: "DETENTION"
        billable: true
        max_amt: 120.00
        requires_weight_threshold: true
        min_weight_lbs: 500
      - carrier_code: "FSC"
        carrier_desc_pattern: "(?i).*fuel.*surcharge.*"
        internal_category: "FUEL_SURCHARGE"
        billable: true
        max_amt: null
        requires_weight_threshold: false

Configuration validation should use a schema-enforcement library such as Pydantic to guarantee type safety, confirm that every regex compiles, and check contract version alignment before ETL execution begins.
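Pydantic is the suggested tool. To keep the example dependency-free, the sketch below swaps in plain standard-library checks over the already-parsed YAML dictionary (for example, the output of a YAML loader). It covers the same concerns: required keys, regex compilation, known categories, numeric amounts, ISO effective dates, and weight-gated rules that lack a floor. The function name validate_config and its list-of-errors return value are hypothetical.

```python
import re
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import Any, Dict, List

ALLOWED_CATEGORIES = {
    "LIFTGATE", "DETENTION", "FUEL_SURCHARGE",
    "REDELIVERY", "INSIDE_DELIVERY", "UNKNOWN",
}
REQUIRED_RULE_KEYS = ("carrier_code", "carrier_desc_pattern", "internal_category", "billable")


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable errors; an empty list means the config is loadable."""
    errors: List[str] = []
    for scac, contract in config.get("carrier_mappings", {}).items():
        if not contract.get("contract_id"):
            errors.append(f"{scac}: missing contract_id")
        try:
            # str() also handles YAML loaders that already produced a date object.
            date.fromisoformat(str(contract.get("effective_date")))
        except ValueError:
            errors.append(f"{scac}: effective_date is not an ISO date")
        for i, rule in enumerate(contract.get("rules", [])):
            where = f"{scac}.rules[{i}]"
            missing = [k for k in REQUIRED_RULE_KEYS if k not in rule]
            if missing:
                errors.append(f"{where}: missing {missing}")
                continue
            try:
                re.compile(rule["carrier_desc_pattern"])
            except re.error as exc:
                errors.append(f"{where}: bad regex ({exc})")
            if rule["internal_category"] not in ALLOWED_CATEGORIES:
                errors.append(f"{where}: unknown category {rule['internal_category']!r}")
            if not isinstance(rule["billable"], bool):
                errors.append(f"{where}: billable must be a boolean")
            for key in ("max_amt", "min_weight_lbs"):
                if rule.get(key) is not None:
                    try:
                        Decimal(str(rule[key]))
                    except InvalidOperation:
                        errors.append(f"{where}: {key} is not numeric")
            # A weight-gated rule needs a floor, otherwise it can never be evaluated.
            if rule.get("requires_weight_threshold") and rule.get("min_weight_lbs") is None:
                errors.append(f"{where}: requires_weight_threshold without min_weight_lbs")
    return errors
```

Collecting every error instead of stopping at the first lets ops fix a rule set in one pass. A Pydantic model would express the same checks declaratively through field types and validators.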

Production ETL Implementation

The following Python implementation demonstrates a batch mapping engine. It compiles regex patterns once at load time, iterates rows with DataFrame.iterrows (so it is row-wise, not vectorized), and relies on explicit error boundaries and structured logging.

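import logging
import re
import uuid
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from enum import Enum
from typing import Any, Dict, List, Optional

import pandas as pd

logger = logging.getLogger("freight_audit.taxonomy_mapper")

# Column order of the AuditAccessorialNormalized fields appended to each batch.
OUTPUT_COLUMNS = [
    "internal_accessorial_id",
    "taxonomy_category",
    "is_billable",
    "max_allowable_amt",
    "audit_status",
    "mapping_rule_id",
]

class TaxonomyCategory(str, Enum):
    LIFTGATE = "LIFTGATE"
    DETENTION = "DETENTION"
    FUEL_SURCHARGE = "FUEL_SURCHARGE"
    REDELIVERY = "REDELIVERY"
    INSIDE_DELIVERY = "INSIDE_DELIVERY"
    UNKNOWN = "UNKNOWN"

class AuditStatus(str, Enum):
    MATCHED = "MATCHED"
    FLAGGED = "FLAGGED"
    UNMAPPED = "UNMAPPED"

@dataclass
class MappingRule:
    carrier_code: str
    compiled_pattern: re.Pattern
    internal_category: TaxonomyCategory
    billable: bool
    max_amt: Optional[Decimal]
    requires_weight_threshold: bool
    min_weight_lbs: Optional[Decimal] = None

def _clean_str(value: Any) -> str:
    """Strip a text field, treating None/NaN as empty so they never match as "None"/"nan"."""
    if value is None or pd.isna(value):
        return ""
    return str(value).strip()

def _to_decimal(value: Any) -> Optional[Decimal]:
    """Convert to a finite Decimal; return None for null, NaN, infinite, or non-numeric input."""
    if value is None or pd.isna(value):
        return None
    try:
        result = Decimal(str(value))
    except InvalidOperation:
        return None
    return result if result.is_finite() else None

class AccessorialTaxonomyMapper:
    def __init__(self, config: Dict[str, Any]):
        self._rules: Dict[str, List[MappingRule]] = {}
        self._load_and_compile(config)

    def _load_and_compile(self, config: Dict[str, Any]) -> None:
        for scac, contract_data in config.get("carrier_mappings", {}).items():
            compiled_rules: List[MappingRule] = []
            for rule in contract_data.get("rules", []):
                try:
                    requires_weight = bool(rule.get("requires_weight_threshold", False))
                    min_weight = rule.get("min_weight_lbs")
                    # A weight-gated rule without a floor cannot be evaluated, so reject it at load time.
                    if requires_weight and min_weight is None:
                        raise ValueError("requires_weight_threshold is set but min_weight_lbs is missing")
                    compiled_rules.append(MappingRule(
                        carrier_code=str(rule["carrier_code"]),
                        compiled_pattern=re.compile(rule["carrier_desc_pattern"]),
                        internal_category=TaxonomyCategory(rule["internal_category"]),
                        billable=bool(rule["billable"]),
                        max_amt=Decimal(str(rule["max_amt"])) if rule.get("max_amt") is not None else None,
                        requires_weight_threshold=requires_weight,
                        # "is not None" keeps a legitimate floor of 0 instead of discarding it.
                        min_weight_lbs=Decimal(str(min_weight)) if min_weight is not None else None,
                    ))
                except (re.error, KeyError, ValueError, InvalidOperation) as e:
                    logger.error("Invalid rule for %s: %s", scac, e)
            self._rules[scac] = compiled_rules

    def _evaluate_threshold(self, row: pd.Series, rule: MappingRule) -> bool:
        if not rule.requires_weight_threshold:
            return True
        weight = _to_decimal(row.get("weight_lbs"))
        # Missing or non-numeric weight cannot satisfy a contract weight floor.
        if weight is None:
            return False
        return weight >= rule.min_weight_lbs

    def map_batch(self, df: pd.DataFrame) -> pd.DataFrame:
        """Append AuditAccessorialNormalized columns to df (row-wise, not vectorized)."""
        if df.empty:
            # Return the full output schema even when there is nothing to map.
            return df.assign(**{col: pd.Series(dtype="object") for col in OUTPUT_COLUMNS})

        results = []
        for _, row in df.iterrows():
            scac = row.get("carrier_scac")
            raw_code = _clean_str(row.get("accessorial_code"))
            raw_desc = _clean_str(row.get("accessorial_desc"))
            billed_amt = _to_decimal(row.get("billed_amt"))
            if billed_amt is None or billed_amt < 0:
                # Rows that fail coercion belong in the dead-letter table, not in audit output.
                # Fail loudly here instead of letting a NaN reach the cap comparison.
                raise ValueError(
                    f"Invalid billed_amt {row.get('billed_amt')!r} for PRO {row.get('pro_number')}"
                )

            # The first rule whose code or description matches AND whose weight floor is met wins.
            matched_rule: Optional[MappingRule] = None
            for rule in self._rules.get(scac, []):
                if rule.carrier_code == raw_code or rule.compiled_pattern.search(raw_desc):
                    if self._evaluate_threshold(row, rule):
                        matched_rule = rule
                        break

            if matched_rule is None:
                # Covers unknown codes and SCACs with no configuration (graceful degradation).
                logger.warning(
                    "Unmapped accessorial %s/%s", scac, raw_code,
                    extra={"carrier_scac": scac, "pro_number": row.get("pro_number"),
                           "rule_id": None, "failure_reason": "no_matching_rule"},
                )
                results.append({
                    "internal_accessorial_id": uuid.uuid4(),  # fresh surrogate key per output row
                    "taxonomy_category": TaxonomyCategory.UNKNOWN,
                    "is_billable": False,
                    "max_allowable_amt": None,
                    "audit_status": AuditStatus.UNMAPPED,
                    "mapping_rule_id": None,
                })
                continue

            rule_id = f"{scac}_{matched_rule.carrier_code}"
            status = AuditStatus.MATCHED
            if matched_rule.max_amt is not None and billed_amt > matched_rule.max_amt:
                status = AuditStatus.FLAGGED
                logger.warning(
                    "Overcharge flagged: %s/%s billed %s > max %s",
                    scac, raw_code, billed_amt, matched_rule.max_amt,
                    extra={"carrier_scac": scac, "pro_number": row.get("pro_number"),
                           "rule_id": rule_id, "failure_reason": "exceeds_max_amt"},
                )
            results.append({
                "internal_accessorial_id": uuid.uuid4(),
                "taxonomy_category": matched_rule.internal_category,
                "is_billable": matched_rule.billable,
                "max_allowable_amt": matched_rule.max_amt,
                "audit_status": status,
                "mapping_rule_id": rule_id,
            })

        # Reuse df.index so the new columns line up even if df was filtered or re-indexed.
        mapped = pd.DataFrame(results, index=df.index, columns=OUTPUT_COLUMNS)
        return pd.concat([df, mapped], axis=1)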
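Python's logging module does not emit JSON on its own. Assuming the mapper's logging calls attach carrier_scac, pro_number, rule_id, and failure_reason through the standard extra= argument (which sets them as attributes on the LogRecord), a small custom Formatter can render each event as one JSON line. The class name JsonTriageFormatter and the field list are hypothetical.

```python
import json
import logging

# Triage fields assumed to be attached via logger.warning(..., extra={...}).
TRIAGE_FIELDS = ("carrier_scac", "pro_number", "rule_id", "failure_reason")


class JsonTriageFormatter(logging.Formatter):
    """Render each log record as a single JSON object, including any triage fields present."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in TRIAGE_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        # default=str serializes Decimal and UUID values that json cannot handle natively.
        return json.dumps(payload, default=str)
```

To attach it, create a logging.StreamHandler, call handler.setFormatter(JsonTriageFormatter()), and add the handler to the freight_audit.taxonomy_mapper logger. This sketch ignores exception info; a production formatter would also serialize record.exc_info.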

Threshold Validation & Audit Flagging

Threshold validation must occur synchronously during mapping to prevent downstream audit engines from processing invalid monetary values. The implementation above enforces three critical checks:

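Pattern & Code Matching: A line matches a rule when its carrier code equals the rule's carrier_code or its description matches the rule's regex. Patterns are compiled once at configuration load time, which avoids per-row recompilation and surfaces invalid patterns before the batch runs. Compilation does not protect against catastrophic backtracking, which depends on the pattern itself, so keep patterns simple (with re.search, a leading or trailing .* adds nothing). See the Python re module documentation for regex performance guidance.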
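Weight Dependency: Certain accessorials (e.g., detention) apply only above contract-defined weight floors. In the implementation above, a rule whose weight floor is not met, including when weight_lbs is missing, is skipped, so the line becomes UNMAPPED unless a later rule matches. Contracts whose SLAs treat missing weight as a dispute (FLAGGED) rather than a data gap need that policy made explicit.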
Monetary Caps: When billed_amt exceeds max_allowable_amt, the status transitions from MATCHED to FLAGGED. The pipeline must preserve the original billed amount while attaching the cap for downstream dispute calculators.
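The three checks can be isolated as a pure decision function for a single rule whose code or description has already matched. This is a simplified sketch: it ignores fall-through to later rules and uses plain strings for statuses. The function name decide_status and the missing_weight_status parameter are hypothetical; the parameter is a hook for the contract-SLA choice on missing weight, whereas the mapper code above always skips the rule.

```python
from decimal import Decimal
from typing import Optional


def decide_status(
    billed_amt: Decimal,
    max_amt: Optional[Decimal],
    weight_lbs: Optional[Decimal],
    min_weight_lbs: Optional[Decimal],
    missing_weight_status: str = "UNMAPPED",  # hypothetical SLA hook
) -> str:
    """Return MATCHED, FLAGGED, or UNMAPPED for one already-matched rule."""
    if min_weight_lbs is not None:
        if weight_lbs is None:
            # The contract SLA decides whether missing weight is a data gap or a dispute.
            return missing_weight_status
        if weight_lbs < min_weight_lbs:
            return "UNMAPPED"  # below the floor, so the rule does not apply
    if max_amt is not None and billed_amt > max_amt:
        # billed_amt itself is never altered; the cap travels alongside it.
        return "FLAGGED"
    return "MATCHED"
```

With the SCAC_ABC detention rule (cap 120.00, floor 500 lbs), a 150.00 charge on an 800 lb shipment is FLAGGED, a 95.00 charge on the same shipment is MATCHED, and any charge on a 300 lb shipment is UNMAPPED. The uncapped fuel surcharge rule (max_amt: null) is always MATCHED.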

Error Handling & Observability Strategy

Production pipelines require deterministic failure modes. The mapper and the stages around it rely on the following resilience patterns:

Dead-Letter Routing: Records failing schema coercion (e.g., non-numeric billed_amt) are isolated to a dlq_accessorial_mapping table with the original payload and exception traceback.
Structured Logging: Every FLAGGED and UNMAPPED record emits a JSON-formatted log event containing carrier_scac, pro_number, rule_id, and failure_reason. This enables rapid ops triage without querying raw data stores.
Graceful Degradation: If a SCAC lacks configuration, the mapper defaults to UNMAPPED rather than raising exceptions, ensuring pipeline continuity while triggering alerting for missing contract data.
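Dead-letter routing happens before mapping. The sketch below, assuming a pandas batch with a unique index, moves rows whose billed_amt cannot be coerced to a finite, non-negative Decimal into a separate frame that keeps the original columns and adds the exception traceback. The function name split_dead_letters and the error column are hypothetical; writing the dead-letter frame to dlq_accessorial_mapping (for example with DataFrame.to_sql) is left to the loading stage.

```python
import traceback
from decimal import Decimal, InvalidOperation
from typing import List, Tuple

import pandas as pd


def split_dead_letters(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split a raw batch into (valid, dead_letter) frames based on billed_amt coercion."""
    bad_index: List = []
    tracebacks: List[str] = []
    for idx, value in df["billed_amt"].items():
        try:
            if value is None or pd.isna(value):
                raise ValueError("billed_amt is null")
            amount = Decimal(str(value))  # raises InvalidOperation on non-numeric text
            if not amount.is_finite() or amount < 0:
                raise ValueError(f"billed_amt out of range: {amount}")
        except (InvalidOperation, ValueError):
            bad_index.append(idx)
            tracebacks.append(traceback.format_exc())
    # Assumes unique index labels; duplicate labels would select extra rows here.
    dead = df.loc[bad_index].copy()
    dead["error"] = tracebacks  # original payload columns are left untouched
    return df.drop(index=bad_index), dead
```

Running this split first means the mapper only ever sees rows that already satisfy the billed_amt ≥ 0 constraint from the input schema.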

Integration Handoffs

Upon successful mapping, the normalized DataFrame is partitioned by audit_status and routed to downstream consumers:

MATCHED records proceed to the automated validation engine for line-item reconciliation.
FLAGGED records enter the dispute routing queue, where they are enriched with carrier contact SLAs and historical approval rates.
UNMAPPED records trigger a configuration gap alert for transportation ops to update the YAML rule set before the next billing cycle.
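The handoff amounts to splitting the mapper output on audit_status. A minimal sketch, assuming the output carries an audit_status column holding the three status values as plain strings or str-based enum members (both compare equal to the strings). The function name partition_by_status is hypothetical, and the downstream consumers themselves are out of scope.

```python
from typing import Dict

import pandas as pd

AUDIT_STATUSES = ("MATCHED", "FLAGGED", "UNMAPPED")


def partition_by_status(mapped: pd.DataFrame) -> Dict[str, pd.DataFrame]:
    """Split mapper output into one frame per audit_status, including empty ones."""
    unexpected = set(mapped["audit_status"]) - set(AUDIT_STATUSES)
    if unexpected:
        # An unknown status means an upstream contract change; fail rather than drop rows.
        raise ValueError(f"Unknown audit_status values: {sorted(unexpected)}")
    return {status: mapped[mapped["audit_status"] == status] for status in AUDIT_STATUSES}
```

Boolean masks are used instead of groupby so that every status key exists even when a batch has no rows for it, which keeps the routing code free of missing-key checks.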

By maintaining strict stage isolation, enforcing configuration-driven logic, and implementing explicit error boundaries, this taxonomy mapper keeps freight bill processing auditable without coupling it to upstream ingestion or downstream dispute workflows.