Porting plan¶
This page describes how to integrate Cartex's enrichment pipeline into the Cato-v2 production system. It covers the data model mapping, integration point, known gaps, and risk items identified during a Cato-v2 codebase audit.
Cato-v2 extraction output¶
Cato-v2 does not produce a single structured ExtractionResult. Extraction data is distributed across database tables and in-memory structures.
Current data model¶
| Cato-v2 concept | Storage | Key fields |
|---|---|---|
| Evidence | DB row per detected region | id, type, polygon (bbox), file_key (S3 crop), ocr_text, sub_text |
| TakeOffResultItem | DB row per extracted row | result (JSON blob), evidence_id, evidence_ids, project_file_id |
| TakeOffResult | Parent container | original_result (full Gemini response), template_id, project_file_ids |
Evidence type mapping¶
Each Evidence.type string maps to a Cartex concept.
| Evidence.type | Cartex analog |
|---|---|
| "Window Door Unit" | TableModel with role MAIN |
| "Table" | TableModel — role MAIN or AUXILIARY (no distinction today) |
| "Elevation" | ImageContextModel |
| "Floor Plan" | ImageContextModel |
| "Key Notes" | TextContextModel (category general_note) |
| "Drawing Index" | No Cartex analog |
| "Title Info" | No Cartex analog |
Result item JSON shape¶
Each TakeOffResultItem.result is a flat JSON dict with top-level keys matching the active PromptTemplate fields. Nested fields use dot-separated paths (e.g., Glass.Type, Glass.Arrangement.Configuration).
```json
{
  "Label": "W-1",
  "Product": "Window",
  "Product Type": "Direct Set / Picture / Fixed",
  "Operability": "Fixed",
  "Width": 36,
  "Height": 48,
  "Quantity": 1,
  "Frame": { "Profile": "", "Material": "" },
  "Special Notes": "",
  "Source Type": "Image"
}
```
Mapping to ExtractionResult¶
| Cartex field | Cato-v2 source | Gap |
|---|---|---|
| TableModel.table_id | Evidence.id | None |
| TableModel.role | Evidence.type | No MAIN vs AUXILIARY distinction |
| TableModel.headers | PromptTemplateField names | Indirect — needs extraction from template |
| TableModel.rows | TakeOffResultItem.result entries | Available, needs reshaping |
| TableModel.bbox | Evidence.polygon | Available |
| TextContextModel.content | Evidence.ocr_text where type is Key Notes | Available |
| TextContextModel.category | Hardcoded to general_note | Needs mapping logic |
| ImageContextModel.interpretation | Not available | Critical gap — see below |
Mapping layer design¶
A converter module translates Cato-v2's post-extraction state into Cartex's ExtractionResult. Two functions are needed: to_extraction_result() for the extraction data and build_user_table_schema() for the output schema.
to_extraction_result()¶
This function accepts a TakeOffResult, its associated evidences, and the active template. It iterates over evidences, classifies each by type, and builds the corresponding Cartex model.
```python
def to_extraction_result(
    take_off_result: TakeOffResult,
    evidence_map: dict[int, Evidence],
    template: PromptTemplate,
) -> ExtractionResult:
    tables = []
    text_contexts = []
    image_contexts = []
    headers = [f.name for f in template.fields]
    for evidence in evidence_map.values():
        if evidence.type in ("Window Door Unit", "Table"):
            role = classify_table_role(evidence, take_off_result)
            rows = collect_rows_for_evidence(evidence.id, take_off_result.items)
            tables.append(TableModel(
                table_id=str(evidence.id),
                role=role,
                headers=headers,
                rows=rows,
                bbox=parse_polygon(evidence.polygon),
            ))
        elif evidence.type == "Key Notes":
            text_contexts.append(TextContextModel(
                category="general_note",
                content=evidence.ocr_text or "",
            ))
        elif evidence.type in ("Elevation", "Floor Plan"):
            image_contexts.append(ImageContextModel(
                # Depends on the new interpretation column (see Image context gap)
                interpretation=evidence.interpretation,
            ))
    return ExtractionResult(
        tables=tables,
        context=text_contexts + image_contexts,
    )
```
classify_table_role() is new logic — see Risk R1 for the proposed heuristic.
build_user_table_schema()¶
This function translates a Cato-v2 PromptTemplate into Cartex's UserTableSchema. It extracts column names from template fields and builds per-column instruction strings from config_json.
```python
def build_user_table_schema(template: PromptTemplate) -> UserTableSchema:
    column_instructions = {}
    columns = []
    for field in template.fields:
        config = field.config_json
        parts = []
        if config.get("available_values"):
            parts.append(f"Valid values: {', '.join(config['available_values'])}")
        if config.get("extraction_rules"):
            parts.append("; ".join(config["extraction_rules"]))
        if config.get("unit"):
            parts.append(f"Unit: {config['unit']}")
        column_instructions[field.name] = " | ".join(parts) if parts else ""
        columns.append(field.name)
    return UserTableSchema(
        columns=columns,
        column_instructions=column_instructions,
    )
```
The analysis_ai_prompt free-text field on PromptTemplate carries global instructions (e.g., "Focus on aluminum-clad windows"). This should be prepended to each column instruction or mapped to a top-level instruction field if UserTableSchema supports one.
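Assuming UserTableSchema has no top-level instruction field, the prepending variant could look like this sketch (the helper name is illustrative):

```python
def prepend_global_instructions(
    column_instructions: dict[str, str], global_text: str
) -> dict[str, str]:
    """Prefix each column instruction with the template-level analysis_ai_prompt.

    Sketch only; drop this in favor of a top-level field if UserTableSchema
    supports one.
    """
    global_text = (global_text or "").strip()
    if not global_text:
        return column_instructions
    return {
        name: f"{global_text} | {instr}" if instr else global_text
        for name, instr in column_instructions.items()
    }
```

For a template with analysis_ai_prompt "Focus on aluminum-clad windows", every column instruction would start with that sentence, keeping the global guidance visible to each specialist strategy.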
Key Notes OCR text currently baked into Cato-v2 prompts (generate_prompt_from_template line 1038–1041) should instead map to TextContextModel entries in the ExtractionResult.
Integration point¶
The Cartex enricher inserts into the existing DrawingAIService.analyze_item_by_source_type() pipeline in app/services/drawing_ai.py.
Current Cato-v2 flow¶
The extraction pipeline runs in eight steps:
1. Load evidences for all project files
2. Generate S3 image crops
3. OCR Key Notes text
4. Generate prompt from PromptTemplate
5. Filter to schedule-type evidences
6. Call Gemini via BatchingFiles.process_files() — raw extraction
7. Parse results into TakeOffResultItem rows
8. Save to DB, mark TakeOffResult as status Analyzed
Enricher insertion¶
The enricher call belongs between step 7 (parsing) and step 8 (persistence) as a new step 7.5.
```python
# After step 7, before step 8:

# --- Cartex enrichment ---
extraction_result = to_extraction_result(
    take_off_result, evidence_map, template
)
user_schema = build_user_table_schema(template)
enriched_rows = await cartex_enricher.enrich(extraction_result, user_schema)
result_items = merge_enriched_rows(result_items, enriched_rows)
# --- End Cartex enrichment ---

# Step 8: save to DB (existing code)
```
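merge_enriched_rows() is new logic. A minimal sketch, under two assumptions that would need verification against the Cartex enricher: rows come back in input order, and each enriched row exposes "values" plus the audit fields named in Risk R3. The dict shapes here are illustrative, not Cato-v2's ORM objects:

```python
def merge_enriched_rows(
    result_items: list[dict], enriched_rows: list[dict]
) -> list[dict]:
    """Fold enriched values back into parsed result items, pairing by position.

    Audit fields land under "cartex_metadata", matching the storage column
    proposed in Risk R3. Positional pairing is an assumption; pairing by a
    row identifier would be safer if the enricher provides one.
    """
    merged = []
    for item, enriched in zip(result_items, enriched_rows):
        merged.append({
            **item,
            # Enriched values win over the raw extraction values.
            "result": {**item.get("result", {}), **enriched.get("values", {})},
            "cartex_metadata": {
                key: enriched.get(key)
                for key in ("field_sources", "confidence", "reasoning")
            },
        })
    return merged
```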
The following diagram shows where enrichment fits in the pipeline.
```mermaid
flowchart TB
    subgraph cato["Cato-v2 pipeline"]
        direction TB
        S1["1. Load evidences"]
        S2["2. Generate S3 crops"]
        S3["3. OCR Key Notes"]
        S4["4. Generate prompt"]
        S5["5. Filter schedule evidences"]
        S6["6. Gemini batch extraction"]
        S7["7. Parse into TakeOffResultItems"]
        S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7
    end
    subgraph cartex["Cartex enrichment (new step 7.5)"]
        direction TB
        MAP["to_extraction_result()"]
        SCHEMA["build_user_table_schema()"]
        ENRICH["Enricher.enrich()"]
        MERGE["merge_enriched_rows()"]
        MAP --> ENRICH
        SCHEMA --> ENRICH
        ENRICH --> MERGE
    end
    subgraph save["Cato-v2 persistence"]
        S8["8. Save to DB"]
    end
    cato --> cartex --> save
```
Alternative: post-analysis endpoint¶
A decoupled approach adds a new route for on-demand enrichment.
This loads existing TakeOffResultItem rows, builds the ExtractionResult from stored data, runs enrichment, and updates the items. It allows re-enrichment after manual edits but means enrichment does not happen automatically on first analysis.
Image context gap¶
Cato-v2 does not generate text interpretations of images. This is the primary blocker for T3 (legend enrichment) and T4 (dimension enrichment).
P0 blocker
Without ImageContextModel.interpretation, two of the five specialist
strategies cannot function. This must be resolved before Cartex enrichment
launch.
Current state¶
The Cato-v2 pipeline detects bounding boxes via its ML detection model, crops regions, stores PNGs in S3, and runs OCR on label sub-regions only. No step generates a rich text interpretation describing what a legend diagram, item card, or elevation drawing depicts.
Options¶
| Option | Effort | Fidelity |
|---|---|---|
| A: Port Cartex's interpretation prompt — add a Gemini vision call per non-schedule evidence after S3 crop generation; store the result in a new interpretation TEXT column on evidences | Medium | High |
| B: Generate on-demand in mapping layer — the converter calls Gemini at conversion time instead of storing interpretations | Medium | Medium |
| C: Skip T3/T4 initially — launch without legend and dimension strategies | Low | Reduced coverage |
Recommendation¶
Option A is recommended. Add an interpretation column to the evidences table and populate it during DrawingAIService.analyze_item_by_source_type() step 3, after S3 image crop generation. Only the interpretation prompt from Cartex's extractor needs to be retained — CONTEXT_EXTRACTION or an equivalent. The full TABLE_EXTRACTION prompt is not needed because Cato-v2's own extraction handles table detection.
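Under Option A, the backfill step could look like the following sketch. describe_image stands in for the ported Cartex interpretation prompt (a Gemini vision call), and the dict field names mirror the evidences columns; both are assumptions, not existing Cato-v2 code:

```python
SCHEDULE_TYPES = {"Window Door Unit", "Table"}

def backfill_interpretations(evidences: list[dict], describe_image) -> list[dict]:
    """Populate the proposed interpretation column for non-schedule evidences.

    describe_image(file_key) -> str is a stand-in for the Gemini vision call;
    injecting it keeps the backfill testable without network access.
    """
    for ev in evidences:
        # Schedules are handled by extraction; skip rows already interpreted.
        if ev["type"] in SCHEDULE_TYPES or ev.get("interpretation"):
            continue
        ev["interpretation"] = describe_image(ev["file_key"])
    return evidences
```

Making the step idempotent (the skip on an existing interpretation) allows re-running analysis without paying for duplicate vision calls.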
Per-field prompt gap¶
Cato-v2 stores field-level instructions in prompt_template_field rows. Cartex needs these translated into UserTableSchema column instructions.
Where the vocabulary lives¶
| Source | Contains |
|---|---|
| prompt_template_field.config_json.available_values | List of valid values (e.g., ["Steel", "Brass", "Galvanized Steel"]) |
| prompt_template_field.config_json.extraction_rules | Instruction strings (e.g., ["Prioritize exact match"]) |
| prompt_template_field.config_json.unit | Measurement unit for number fields |
| prompt_template_field.config_json.default_value | Fallback value |
| prompt_template.analysis_ai_prompt | Free-text global instructions |
| Reference libraries (all_product_types, all_operability, product_attributes) | Company-specific taxonomies |
Wiring¶
The build_user_table_schema() function (see Mapping layer design) concatenates available_values, extraction_rules, and unit into a single instruction string per column. The analysis_ai_prompt global text should be prepended to each instruction or mapped to a top-level field.
Key Notes OCR text, currently appended directly to Cato-v2 extraction prompts, should flow through TextContextModel instead of being duplicated in column instructions.
Risk items¶
Seven risks were identified during the Cato-v2 audit. Two are P0 blockers; five are P1 items that should be addressed before production launch.
R1: No MAIN vs AUXILIARY table role classification¶
Cato-v2 treats all Window Door Unit and Table evidences identically. Cartex's T1 strategy depends on TableModel.role to distinguish main schedules from reference tables.
Impact. T1 will not fire or will misidentify tables, degrading auxiliary table enrichment.
Mitigation. Short-term: implement a heuristic in classify_table_role() — tables with fewer rows or headers matching known auxiliary patterns (Code, Description, Abbreviation) get role AUXILIARY; the largest table per page gets role MAIN. Long-term: train a classifier or add a user-facing toggle during evidence review.
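The short-term heuristic can be sketched as follows. The signature, row-count threshold, and pattern set are assumptions; the real classify_table_role() would read these signals off Evidence and TakeOffResult rather than take them as parameters:

```python
AUXILIARY_HEADER_PATTERNS = {"code", "description", "abbreviation"}

def classify_table_role(
    headers: list[str], row_count: int, is_largest_on_page: bool
) -> str:
    """Heuristic from R1: header patterns and table size decide the role.

    Sketch only — the threshold of 5 rows is an arbitrary starting point
    to be tuned against real drawings.
    """
    normalized = {h.strip().lower() for h in headers}
    if normalized & AUXILIARY_HEADER_PATTERNS and not is_largest_on_page:
        return "AUXILIARY"
    if is_largest_on_page:
        return "MAIN"
    return "AUXILIARY" if row_count < 5 else "MAIN"
```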
R2: Image interpretation is completely missing¶
T3 (legend) and T4 (dimension) require ImageContextModel.interpretation. Cato-v2 stores crop images but never generates text descriptions.
Impact. Two specialist strategies are non-functional without this field.
Mitigation. Implement Option A from Image context gap. This is a blocking dependency.
R3: Result format mismatch¶
Cato-v2's TakeOffResultItem.result is a flat JSON dict. Cartex's EnrichedRow adds field_sources, confidence, and reasoning with no current storage location.
Impact. Enrichment metadata (provenance, confidence, reasoning) is lost on save.
Mitigation. Add a cartex_metadata JSON column to take_off_result_item, or store enrichment audit data in a separate linked table.
R4: Prompt template as source of truth for column list¶
UserTableSchema.columns must exactly match the keys in TableModel.rows. Mismatches between the prompt template and actual extraction output cause silent data loss.
Impact. Columns present in extraction but absent from the schema are ignored. Columns in the schema but absent from extraction stay empty without warning.
Mitigation. Add a validation step in the mapping layer that reconciles template field names against actual keys in TakeOffResultItem.result. Log warnings for unmatched fields.
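A sketch of that reconciliation check, operating on plain field-name lists and row dicts (the function name and return shape are illustrative):

```python
import logging

logger = logging.getLogger(__name__)

def validate_field_alignment(
    template_fields: list[str], result_rows: list[dict]
) -> tuple[set, set]:
    """Reconcile template field names against keys present in extraction output.

    Returns (missing_from_results, missing_from_template); both sets are
    logged as warnings rather than raised, so one drifted column does not
    abort the whole run.
    """
    template_set = set(template_fields)
    result_keys = {key for row in result_rows for key in row}
    missing_from_results = template_set - result_keys
    missing_from_template = result_keys - template_set
    for name in sorted(missing_from_results):
        logger.warning("Template field %r never appears in extraction output", name)
    for name in sorted(missing_from_template):
        logger.warning("Extracted key %r is absent from the template schema", name)
    return missing_from_results, missing_from_template
```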
R5: Batch processing architecture¶
Cato-v2 sends all schedule evidence images to Gemini in a single batch call via BatchingFiles.process_files(). The result is a flat list of rows without clear per-evidence attribution. Cartex expects rows grouped by table.
Impact. The mapping layer may struggle to associate extracted rows back to their source evidence, which is needed to build TableModel.rows correctly.
Mitigation. Use evidence_ids on TakeOffResultItem to attribute rows back to source evidences. Consider switching to per-evidence Gemini calls for cleaner table separation at the cost of more API calls.
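Attribution via evidence_ids can be sketched like this; the column names come from the data model table above, while the dict-based shape and helper name are illustrative (collect_rows_for_evidence in the converter would do the equivalent against ORM objects):

```python
def rows_for_evidence(evidence_id: int, items: list[dict]) -> list[dict]:
    """Select result rows attributable to one evidence.

    Checks the single evidence_id column first, then falls back to
    membership in the evidence_ids list; rows matching neither are
    left unattributed.
    """
    selected = []
    for item in items:
        linked_ids = item.get("evidence_ids") or []
        if item.get("evidence_id") == evidence_id or evidence_id in linked_ids:
            selected.append(item["result"])
    return selected
```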
R6: Gemini model dependency¶
Cartex targets Gemini exclusively. Cato-v2 uses both OpenAI and Gemini with model selection configured per deployment.
Impact. Enrichment calls a different model than extraction in OpenAI-based deployments.
Mitigation. Cartex enrichment should always use Gemini regardless of the extraction model. The enricher maintains its own model configuration.
R7: Product Type preservation for doors¶
For elevation-sourced items, Product Type comes from the detection model's label name. For schedule-sourced items, it depends on whether the active PromptTemplate includes a Product Type field.
Impact. Door rows may arrive at enrichment without Product Type, degrading operability and configuration mapping.
Mitigation. Ensure the default prompt template includes Product Type with comprehensive extraction rules. Add a validation warning if the active template lacks this field.
Risk register summary¶
| # | Risk | Priority |
|---|---|---|
| R1 | No MAIN vs AUXILIARY table role classification | P0 |
| R2 | Image interpretation is completely missing | P0 |
| R3 | Result format mismatch — no storage for enrichment metadata | P1 |
| R4 | Prompt template / extraction output field name alignment | P1 |
| R5 | Batch processing prevents per-table row attribution | P1 |
| R6 | Gemini model dependency | P1 |
| R7 | Product Type field may be absent from template | P1 |
Blocking dependencies¶
The following items must be completed before Cartex enrichment can operate in Cato-v2.
| # | Item | Section | Priority |
|---|---|---|---|
| 1 | Add interpretation column + Gemini vision call for non-schedule evidences | Image context gap | P0 |
| 2 | Implement classify_table_role() (MAIN/AUXILIARY) | Mapping layer design, R1 | P0 |
| 3 | Build mapping layer (to_extraction_result, build_user_table_schema) | Mapping layer design, Per-field prompt gap | P0 |
| 4 | Add cartex_metadata storage for enrichment audit fields | R3 | P1 |
| 5 | Validate field name alignment between template and extraction output | R4 | P1 |
| 6 | Ensure default template includes Product Type field | R7 | P1 |