# Merge algorithm
The `_merge_specialist_results()` method in `src/pipeline/enricher.py` combines the outputs of multiple specialist strategies into a single `list[EnrichedRow]`.
## Priority order
Specialists are merged in a fixed priority order. When multiple specialists provide a value for the same column on the same row, the first non-empty value from the highest-priority specialist wins.
| Priority | Strategy | Rationale |
|---|---|---|
| 1 (highest) | `auxiliary_table` | Direct data lookups are the most reliable source |
| 2 | `image_legend` | Legend diagrams provide explicit visual mappings |
| 3 | `text_rule` | Text rules are authoritative but require interpretation |
| 4 | `dimension_card` | Dimensional data supplements other sources |
| 5 (lowest) | `multi_label` | Compound resolution refines rather than defines |
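This priority is defined in the `STRATEGY_PRIORITY` list:

```python
# Index 0 is the highest priority; later entries cannot override earlier ones
# for regular columns.
STRATEGY_PRIORITY = [
    StrategyType.AUXILIARY_TABLE,
    StrategyType.IMAGE_LEGEND,
    StrategyType.TEXT_RULE,
    StrategyType.DIMENSION_CARD,
    StrategyType.MULTI_LABEL,
]
```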
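One common way to turn an ordered list like this into a sort key is to map each strategy to its list index. The sketch below assumes a stand-in `StrategyType` enum whose string values mirror the table above (the real enum's values are not shown here), and the `PRIORITY_RANK` and `sort_by_priority` names are hypothetical:

```python
from enum import Enum


class StrategyType(Enum):
    # Hypothetical stand-in for the pipeline's StrategyType; values mirror the table.
    AUXILIARY_TABLE = "auxiliary_table"
    IMAGE_LEGEND = "image_legend"
    TEXT_RULE = "text_rule"
    DIMENSION_CARD = "dimension_card"
    MULTI_LABEL = "multi_label"


STRATEGY_PRIORITY = [
    StrategyType.AUXILIARY_TABLE,
    StrategyType.IMAGE_LEGEND,
    StrategyType.TEXT_RULE,
    StrategyType.DIMENSION_CARD,
    StrategyType.MULTI_LABEL,
]

# Lower rank means higher priority (index 0 wins).
PRIORITY_RANK = {strategy: rank for rank, strategy in enumerate(STRATEGY_PRIORITY)}


def sort_by_priority(entries):
    """Order (strategy, payload) pairs from highest to lowest priority."""
    return sorted(entries, key=lambda entry: PRIORITY_RANK[entry[0]])


entries = [
    (StrategyType.TEXT_RULE, "t"),
    (StrategyType.AUXILIARY_TABLE, "a"),
    (StrategyType.IMAGE_LEGEND, "i"),
]
print([s.value for s, _ in sort_by_priority(entries)])
# ['auxiliary_table', 'image_legend', 'text_rule']
```

A precomputed dict keeps each lookup O(1), whereas calling `STRATEGY_PRIORITY.index()` inside the key function scans the list every time.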
## Merge process
For each unique `row_id` across all specialist outputs, the merge builds a single `EnrichedRow` by iterating over that row's specialist entries in priority order.
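The grouping step can be sketched by walking the strategies in priority order and bucketing each row by its `row_id`, so every bucket is already sorted by priority. This assumes each specialist's output is a list of dicts carrying a `"row_id"` key; that shape and the `group_entries_by_row` name are hypothetical, not the pipeline's actual types:

```python
from collections import defaultdict

# Strategy names in STRATEGY_PRIORITY order (highest first).
PRIORITY = ["auxiliary_table", "image_legend", "text_rule", "dimension_card", "multi_label"]


def group_entries_by_row(outputs: dict[str, list[dict]]) -> dict[str, list[tuple[str, dict]]]:
    """Collect every specialist entry per row_id, already in priority order.

    `outputs` maps a strategy name to the rows it produced (hypothetical shape).
    """
    grouped: dict[str, list[tuple[str, dict]]] = defaultdict(list)
    for strategy in PRIORITY:  # highest-priority strategy first
        for row in outputs.get(strategy, []):
            grouped[row["row_id"]].append((strategy, row))
    return dict(grouped)


outputs = {
    "text_rule": [{"row_id": "W1"}, {"row_id": "W2"}],
    "auxiliary_table": [{"row_id": "W1"}],
}
groups = group_entries_by_row(outputs)
print({rid: [s for s, _ in entries] for rid, entries in groups.items()})
# {'W1': ['auxiliary_table', 'text_rule'], 'W2': ['text_rule']}
```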
### Regular columns
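For any column other than Special Notes, the first non-empty value encountered (that is, from the highest-priority specialist that supplies one) becomes the final value. Later specialists cannot overwrite a non-empty value, but they can fill a column that a higher-priority specialist left empty.

```python
# First non-empty value wins; an empty value can still be replaced later.
if col not in merged_data or not merged_data[col]:
    merged_data[col] = value
    if col in row.field_sources:
        merged_sources[col] = row.field_sources[col]
```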
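A self-contained version of the rule above, applied to a list of entries already in priority order, shows both behaviours: a non-empty value is never overwritten, while an empty one can be filled by a lower-priority specialist. The `(data, field_sources)` pair shape, the `merge_regular_columns` name, and the use of strategy names as field-source values are assumptions for illustration:

```python
def merge_regular_columns(entries):
    """Apply the first-wins rule to every column except Special Notes.

    `entries` is a list of (data, field_sources) dict pairs, already in
    priority order (hypothetical shape for illustration).
    """
    merged_data: dict[str, str] = {}
    merged_sources: dict[str, str] = {}
    for data, field_sources in entries:
        for col, value in data.items():
            if col == "Special Notes":
                continue  # accumulated separately, not first-wins
            # Fill the column if it is absent or still empty.
            if col not in merged_data or not merged_data[col]:
                merged_data[col] = value
                if col in field_sources:
                    merged_sources[col] = field_sources[col]
    return merged_data, merged_sources


entries = [
    # auxiliary_table (highest priority): knows the glass type, not the frame
    ({"Glass Type": "SNX 62/27", "Frame": ""}, {"Glass Type": "auxiliary_table"}),
    # text_rule (lower priority): cannot override Glass Type, but fills Frame
    ({"Glass Type": "Clear", "Frame": "Aluminum"},
     {"Glass Type": "text_rule", "Frame": "text_rule"}),
]
data, sources = merge_regular_columns(entries)
print(data)     # {'Glass Type': 'SNX 62/27', 'Frame': 'Aluminum'}
print(sources)  # {'Glass Type': 'auxiliary_table', 'Frame': 'text_rule'}
```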
### Special Notes
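Special Notes is the exception to the first-wins rule. The merge accumulates unique fragments from all specialists. Each specialist's Special Notes value is split on the `;` and `|` delimiters into individual fragments. Each fragment is then checked against the fragments already accepted using case-insensitive substring matching: if the new fragment is a substring of an existing one, or an existing one is a substring of the new fragment, it is treated as a duplicate and skipped. Because the check runs in both directions, a longer fragment that contains an already accepted one is also dropped, so the earlier, shorter wording is kept.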
```python
for frag in fragments:
    frag_lower = frag.lower()
    # Duplicate if either fragment contains the other (case-insensitive).
    is_dup = any(
        frag_lower in existing.lower() or existing.lower() in frag_lower
        for existing in special_notes_parts
    )
    if not is_dup:
        special_notes_parts.append(frag)
```
The final Special Notes value joins all unique fragments with `|`.
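The full Special Notes step can be sketched end to end: split each value in priority order, deduplicate with the two-way substring check, and join. Stripping whitespace around fragments, dropping blank fragments (an empty string would match every existing fragment), and joining with `" | "` (spaces taken from the diagram further down) are assumptions here; `merge_special_notes` is a hypothetical name:

```python
import re


def merge_special_notes(values: list[str]) -> str:
    """Accumulate unique Special Notes fragments from values in priority order."""
    special_notes_parts: list[str] = []
    for value in values:
        # Split on ';' or '|'; stripping and dropping blanks is an assumption.
        fragments = [f.strip() for f in re.split(r"[;|]", value) if f.strip()]
        for frag in fragments:
            frag_lower = frag.lower()
            is_dup = any(
                frag_lower in existing.lower() or existing.lower() in frag_lower
                for existing in special_notes_parts
            )
            if not is_dup:
                special_notes_parts.append(frag)
    return " | ".join(special_notes_parts)


notes = merge_special_notes([
    "GMT-01 infill; Tempered",   # highest priority
    "Style A | tempered",        # "tempered" is a case-insensitive duplicate
    "IBC 2406.4 tempered",       # contains "Tempered", so it is dropped too
])
print(notes)  # GMT-01 infill | Tempered | Style A
```

The last input illustrates the two-way check: the more specific `IBC 2406.4 tempered` is discarded because the shorter `Tempered` was accepted first.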
### Confidence
The merged row's confidence is the minimum confidence across all specialist entries for that row. This ensures the final confidence reflects the least certain contributor.
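As a formula, writing $S_r$ for the set of specialists with an entry for row $r$ and $c_s$ for each one's confidence (notation introduced here for illustration), with the values from the W1 example below:

```latex
c_{\text{merged}}(r) = \min_{s \in S_r} c_s
\qquad\text{e.g.}\qquad
\min(0.95,\ 0.90,\ 0.85) = 0.85
```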
### Reasoning
Reasoning strings from all specialists are concatenated with `|` as the separator. Each entry is prefixed with the strategy name in square brackets (for example, `[auxiliary_table] Matched GL-03 to row 2`).
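A minimal sketch of the reasoning format, assuming `(strategy_name, reasoning)` pairs in priority order and a `" | "` separator with surrounding spaces. Whether the real method skips empty reasoning strings is not stated, so this sketch keeps them; `merge_reasoning` is a hypothetical name:

```python
def merge_reasoning(entries: list[tuple[str, str]]) -> str:
    """Prefix each reasoning string with its strategy and join in priority order."""
    return " | ".join(f"[{strategy}] {reasoning}" for strategy, reasoning in entries)


combined = merge_reasoning([
    ("auxiliary_table", "Matched GL-03 to row 2"),
    ("text_rule", "Applied IBC 2406.4"),
])
print(combined)
# [auxiliary_table] Matched GL-03 to row 2 | [text_rule] Applied IBC 2406.4
```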
### Field sources
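The `field_sources` dict tracks which specialist filled each column. Because regular columns follow the first-wins rule, the recorded source corresponds to the highest-priority specialist that provided a non-empty value. A source is only copied when that specialist reported one in its own `field_sources`.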
## Row recovery
After merging, the algorithm checks for main schedule rows for which no specialist produced output. These are identified by comparing the authoritative `__row_id__` set (derived from the main tables during `_assign_row_ids()`) against the merged row IDs.

Missing rows are recovered as empty `EnrichedRow` objects with every schema column set to an empty string and confidence set to `0.0`. This guarantees that the pipeline never silently drops rows: every row in the main schedule appears in the output.

Tables with index-based `__row_id__` values (where primary-key column detection failed) are excluded from recovery to avoid false positives.
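Row recovery reduces to a set difference. How the pipeline flags index-based IDs is not described here, so this sketch takes them as a separate `index_based_ids` input. `RecoveredRow` is a hypothetical stand-in for `EnrichedRow` with only the fields used, and sorting the missing IDs is a choice made only to keep the output deterministic:

```python
from dataclasses import dataclass, field


@dataclass
class RecoveredRow:
    # Hypothetical stand-in for EnrichedRow with only the fields used here.
    row_id: str
    data: dict[str, str]
    confidence: float = 0.0
    field_sources: dict[str, str] = field(default_factory=dict)
    reasoning: str = ""


def recover_missing_rows(authoritative_ids, index_based_ids, merged_ids, schema_columns):
    """Return empty placeholder rows for main-schedule rows no specialist produced."""
    missing = set(authoritative_ids) - set(merged_ids) - set(index_based_ids)
    return [
        RecoveredRow(row_id=row_id, data={col: "" for col in schema_columns})
        for row_id in sorted(missing)
    ]


rows = recover_missing_rows(
    authoritative_ids={"W1", "W2", "W3", "row_0"},
    index_based_ids={"row_0"},   # from a table without a detected primary key
    merged_ids={"W1"},
    schema_columns=["Glass Type", "Special Notes"],
)
print([r.row_id for r in rows])  # ['W2', 'W3']
```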
## Merge flow for a single row
The following diagram shows how data for a single row flows through the merge when three specialists provide output.
```mermaid
flowchart TB
    subgraph inputs["Specialist outputs for row W1"]
        T1["T1: auxiliary_table<br/>Glass Type = SNX 62/27<br/>Special Notes = GMT-01 infill<br/>confidence = 0.95"]
        T3["T3: image_legend<br/>Operability = Casement Single<br/>Special Notes = Style A<br/>confidence = 0.90"]
        T2["T2: text_rule<br/>Glass Type = <i>empty</i><br/>Special Notes = IBC 2406.4 tempered<br/>confidence = 0.85"]
    end
    subgraph merge["Merge (priority order)"]
        direction TB
        P1["1. auxiliary_table<br/>Glass Type = SNX 62/27 (set)<br/>Special Notes += GMT-01 infill"]
        P2["2. image_legend<br/>Glass Type already set, skip<br/>Operability = Casement Single (set)<br/>Special Notes += Style A"]
        P3["3. text_rule<br/>Glass Type already set, skip<br/>Special Notes += IBC 2406.4 tempered"]
        P1 --> P2 --> P3
    end
    subgraph output["Merged EnrichedRow"]
        OUT["row_id = W1<br/>Glass Type = SNX 62/27<br/>Operability = Casement Single<br/>Special Notes = GMT-01 infill | Style A | IBC 2406.4 tempered<br/>confidence = 0.85<br/>reasoning = [auxiliary_table] ... | [image_legend] ... | [text_rule] ..."]
    end
    inputs --> merge --> output
```
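The W1 flow can be reproduced with a compact, self-contained sketch that combines priority sorting, first-wins regular columns, Special Notes fragment accumulation, minimum confidence, and prefixed reasoning. It is an illustration under assumptions, not the method itself: entries are plain dicts with hypothetical `strategy`, `data`, `confidence`, and `reasoning` keys, the reasoning texts are invented placeholders, separators are `" | "`, and `field_sources` tracking and row recovery are omitted:

```python
import re

# Strategy names in STRATEGY_PRIORITY order (highest first).
PRIORITY = ["auxiliary_table", "image_legend", "text_rule", "dimension_card", "multi_label"]
NOTES = "Special Notes"


def merge_row(row_id: str, entries: list[dict]) -> dict:
    """Merge one row's specialist entries (hypothetical dict shape, see lead-in)."""
    ordered = sorted(entries, key=lambda e: PRIORITY.index(e["strategy"]))
    merged: dict[str, str] = {}
    notes: list[str] = []
    reasons: list[str] = []
    for entry in ordered:
        for col, value in entry["data"].items():
            if col == NOTES:
                # Accumulate fragments, skipping two-way substring duplicates.
                for frag in (f.strip() for f in re.split(r"[;|]", value)):
                    low = frag.lower()
                    if frag and not any(low in n.lower() or n.lower() in low for n in notes):
                        notes.append(frag)
            elif col not in merged or not merged[col]:
                merged[col] = value  # first non-empty value wins
        reasons.append(f"[{entry['strategy']}] {entry['reasoning']}")
    merged[NOTES] = " | ".join(notes)
    return {
        "row_id": row_id,
        "data": merged,
        "confidence": min(e["confidence"] for e in ordered),  # least certain contributor
        "reasoning": " | ".join(reasons),
    }


# The W1 example from the diagram, deliberately passed out of priority order.
w1 = merge_row("W1", [
    {"strategy": "text_rule", "confidence": 0.85, "reasoning": "tempering rule",
     "data": {"Glass Type": "", NOTES: "IBC 2406.4 tempered"}},
    {"strategy": "auxiliary_table", "confidence": 0.95, "reasoning": "glass schedule lookup",
     "data": {"Glass Type": "SNX 62/27", NOTES: "GMT-01 infill"}},
    {"strategy": "image_legend", "confidence": 0.90, "reasoning": "legend style",
     "data": {"Operability": "Casement Single", NOTES: "Style A"}},
])
print(w1["data"][NOTES])  # GMT-01 infill | Style A | IBC 2406.4 tempered
print(w1["confidence"])   # 0.85
```

Sorting inside `merge_row` makes the result independent of the order in which specialists happened to report, which matches the fixed-priority behaviour described above.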