Porting plan¶
This page describes how to integrate Cartex's enrichment pipeline into the Cato-v2 production system. It covers the data model mapping, integration point, known gaps, and risk items identified during a Cato-v2 codebase audit.
Cato-v2 extraction output¶
Cato-v2 does not produce a single structured ExtractionResult. Extraction data is distributed across database tables and in-memory structures.
Current data model¶
| Cato-v2 concept | Storage | Key fields |
|---|---|---|
| Evidence | DB row per detected region | id, type, polygon (bbox), file_key (S3 crop), ocr_text, sub_text |
| TakeOffResultItem | DB row per extracted row | result (JSON blob), evidence_id, evidence_ids, project_file_id |
| TakeOffResult | Parent container | original_result (full Gemini response), template_id, project_file_ids |
Evidence type mapping¶
Each Evidence.type string maps to a Cartex concept.
| Evidence.type | Cartex analog |
|---|---|
| "Window Door Unit" | TableModel with role MAIN |
| "Table" | TableModel — role MAIN or AUXILIARY (no distinction today) |
| "Elevation" | ImageContextModel |
| "Floor Plan" | ImageContextModel |
| "Key Notes" | TextContextModel (category general_note) |
| "Drawing Index" | No Cartex analog |
| "Title Info" | No Cartex analog |
Result item JSON shape¶
Each TakeOffResultItem.result is a flat JSON dict with top-level keys matching the active PromptTemplate fields. Nested fields use dot-separated paths (e.g., Glass.Type, Glass.Arrangement.Configuration).
```json
{
  "Label": "W-1",
  "Product": "Window",
  "Product Type": "Direct Set / Picture / Fixed",
  "Operability": "Fixed",
  "Width": 36,
  "Height": 48,
  "Quantity": 1,
  "Frame": { "Profile": "", "Material": "" },
  "Special Notes": "",
  "Source Type": "Image"
}
```
Mapping to ExtractionResult¶
| Cartex field | Cato-v2 source | Gap |
|---|---|---|
| TableModel.table_id | Evidence.id | None |
| TableModel.role | Evidence.type | No MAIN vs AUXILIARY distinction |
| TableModel.headers | PromptTemplateField names | Indirect — needs extraction from template |
| TableModel.rows | TakeOffResultItem.result entries | Available, needs reshaping |
| TableModel.bbox | Evidence.polygon | Available |
| TextContextModel.content | Evidence.ocr_text where type is Key Notes | Available |
| TextContextModel.category | Hardcoded to general_note | Needs mapping logic |
| ImageContextModel.interpretation | Not available | Critical gap — see below |
Mapping layer design¶
A converter module translates Cato-v2's post-extraction state into Cartex's ExtractionResult. Two functions are needed: to_extraction_result() for the extraction data and build_user_table_schema() for the output schema.
to_extraction_result()¶
This function accepts a TakeOffResult, its associated evidences, and the active template. It iterates over evidences, classifies each by type, and builds the corresponding Cartex model.
```python
def to_extraction_result(
    take_off_result: TakeOffResult,
    evidence_map: dict[int, Evidence],
    template: PromptTemplate,
) -> ExtractionResult:
    tables = []
    text_contexts = []
    image_contexts = []
    headers = [f.name for f in template.fields]
    for evidence in evidence_map.values():
        if evidence.type in ("Window Door Unit", "Table"):
            role = classify_table_role(evidence, take_off_result)
            rows = collect_rows_for_evidence(evidence.id, take_off_result.items)
            tables.append(TableModel(
                table_id=str(evidence.id),
                role=role,
                headers=headers,
                rows=rows,
                bbox=parse_polygon(evidence.polygon),
            ))
        elif evidence.type == "Key Notes":
            text_contexts.append(TextContextModel(
                category="general_note",
                content=evidence.ocr_text or "",
            ))
        elif evidence.type in ("Elevation", "Floor Plan"):
            image_contexts.append(ImageContextModel(
                # Depends on the new interpretation column (see Image context gap)
                interpretation=evidence.interpretation,
            ))
    return ExtractionResult(
        tables=tables,
        context=text_contexts + image_contexts,
    )
```
classify_table_role() is new logic — see Risk R1 for the proposed heuristic.
build_user_table_schema()¶
This function translates a Cato-v2 PromptTemplate into Cartex's UserTableSchema. It extracts column names from template fields and builds per-column instruction strings from config_json.
```python
def build_user_table_schema(template: PromptTemplate) -> UserTableSchema:
    column_instructions = {}
    columns = []
    for field in template.fields:
        config = field.config_json
        parts = []
        if config.get("available_values"):
            parts.append(f"Valid values: {', '.join(config['available_values'])}")
        if config.get("extraction_rules"):
            parts.append("; ".join(config["extraction_rules"]))
        if config.get("unit"):
            parts.append(f"Unit: {config['unit']}")
        column_instructions[field.name] = " | ".join(parts) if parts else ""
        columns.append(field.name)
    return UserTableSchema(
        columns=columns,
        column_instructions=column_instructions,
    )
```
The analysis_ai_prompt free-text field on PromptTemplate carries global instructions (e.g., "Focus on aluminum-clad windows"). This should be prepended to each column instruction or mapped to a top-level instruction field if UserTableSchema supports one.
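Assuming UserTableSchema has no top-level instruction field, the prepending variant could look like this sketch (the helper name is illustrative):

```python
def prepend_global_instructions(
    column_instructions: dict[str, str], global_text: str
) -> dict[str, str]:
    """Prefix each column instruction with the template-level analysis_ai_prompt.

    Sketch only; drop this in favor of a top-level field if UserTableSchema
    supports one.
    """
    global_text = (global_text or "").strip()
    if not global_text:
        return column_instructions
    return {
        name: f"{global_text} | {instr}" if instr else global_text
        for name, instr in column_instructions.items()
    }
```

For a template with analysis_ai_prompt "Focus on aluminum-clad windows", every column instruction would start with that sentence, keeping the global guidance visible to each specialist strategy.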
Key Notes OCR text currently baked into Cato-v2 prompts (generate_prompt_from_template line 1038–1041) should instead map to TextContextModel entries in the ExtractionResult.
Integration point¶
The Cartex enricher inserts into the existing DrawingAIService.analyze_item_by_source_type() pipeline in app/services/drawing_ai.py.
Current Cato-v2 flow¶
The extraction pipeline runs in eight steps:
1. Load evidences for all project files
2. Generate S3 image crops
3. OCR Key Notes text
4. Generate prompt from PromptTemplate
5. Filter to schedule-type evidences
6. Call Gemini via BatchingFiles.process_files() — raw extraction
7. Parse results into TakeOffResultItem rows
8. Save to DB, mark TakeOffResult as status Analyzed
Enricher insertion¶
The enricher call belongs between step 7 (parsing) and step 8 (persistence) as a new step 7.5.
```python
# After step 7, before step 8:

# --- Cartex enrichment ---
extraction_result = to_extraction_result(
    take_off_result, evidence_map, template
)
user_schema = build_user_table_schema(template)
enriched_rows = await cartex_enricher.enrich(extraction_result, user_schema)
result_items = merge_enriched_rows(result_items, enriched_rows)
# --- End Cartex enrichment ---

# Step 8: save to DB (existing code)
```
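merge_enriched_rows() is new logic. A minimal sketch, under two assumptions that would need verification against the Cartex enricher: rows come back in input order, and each enriched row exposes "values" plus the audit fields named in Risk R3. The dict shapes here are illustrative, not Cato-v2's ORM objects:

```python
def merge_enriched_rows(
    result_items: list[dict], enriched_rows: list[dict]
) -> list[dict]:
    """Fold enriched values back into parsed result items, pairing by position.

    Audit fields land under "cartex_metadata", matching the storage column
    proposed in Risk R3. Positional pairing is an assumption; pairing by a
    row identifier would be safer if the enricher provides one.
    """
    merged = []
    for item, enriched in zip(result_items, enriched_rows):
        merged.append({
            **item,
            # Enriched values win over the raw extraction values.
            "result": {**item.get("result", {}), **enriched.get("values", {})},
            "cartex_metadata": {
                key: enriched.get(key)
                for key in ("field_sources", "confidence", "reasoning")
            },
        })
    return merged
```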
The following diagram shows where enrichment fits in the pipeline.
```mermaid
flowchart TB
    subgraph cato["Cato-v2 pipeline"]
        direction TB
        S1["1. Load evidences"]
        S2["2. Generate S3 crops"]
        S3["3. OCR Key Notes"]
        S4["4. Generate prompt"]
        S5["5. Filter schedule evidences"]
        S6["6. Gemini batch extraction"]
        S7["7. Parse into TakeOffResultItems"]
        S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7
    end
    subgraph cartex["Cartex enrichment (new step 7.5)"]
        direction TB
        MAP["to_extraction_result()"]
        SCHEMA["build_user_table_schema()"]
        ENRICH["Enricher.enrich()"]
        MERGE["merge_enriched_rows()"]
        MAP --> ENRICH
        SCHEMA --> ENRICH
        ENRICH --> MERGE
    end
    subgraph save["Cato-v2 persistence"]
        S8["8. Save to DB"]
    end
    cato --> cartex --> save
```
Alternative: post-analysis endpoint¶
A decoupled approach adds a new route for on-demand enrichment.
This loads existing TakeOffResultItem rows, builds the ExtractionResult from stored data, runs enrichment, and updates the items. It allows re-enrichment after manual edits but means enrichment does not happen automatically on first analysis.
Image context gap¶
Cato-v2 does not generate text interpretations of images. This is the primary blocker for T3 (legend enrichment) and T4 (dimension enrichment).
P0 blocker
Without ImageContextModel.interpretation, two of the five specialist
strategies cannot function. This must be resolved before Cartex enrichment
launch.
Current state¶
The Cato-v2 pipeline detects bounding boxes via its ML detection model, crops regions, stores PNGs in S3, and runs OCR on label sub-regions only. No step generates a rich text interpretation describing what a legend diagram, item card, or elevation drawing depicts.
Options¶
| Option | Effort | Fidelity |
|---|---|---|
| A: Port Cartex's interpretation prompt — add a Gemini vision call per non-schedule evidence after S3 crop generation; store the result in a new interpretation TEXT column on evidences | Medium | High |
| B: Generate on-demand in mapping layer — the converter calls Gemini at conversion time instead of storing interpretations | Medium | Medium |
| C: Skip T3/T4 initially — launch without legend and dimension strategies | Low | Reduced coverage |
Recommendation¶
Option A is recommended. Add an interpretation column to the evidences table and populate it during DrawingAIService.analyze_item_by_source_type() step 3, after S3 image crop generation. Only the interpretation prompt from Cartex's extractor needs to be retained — CONTEXT_EXTRACTION or an equivalent. The full TABLE_EXTRACTION prompt is not needed because Cato-v2's own extraction handles table detection.
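Under Option A, the backfill step could look like the following sketch. describe_image stands in for the ported Cartex interpretation prompt (a Gemini vision call), and the dict field names mirror the evidences columns; both are assumptions, not existing Cato-v2 code:

```python
SCHEDULE_TYPES = {"Window Door Unit", "Table"}

def backfill_interpretations(evidences: list[dict], describe_image) -> list[dict]:
    """Populate the proposed interpretation column for non-schedule evidences.

    describe_image(file_key) -> str is a stand-in for the Gemini vision call;
    injecting it keeps the backfill testable without network access.
    """
    for ev in evidences:
        # Schedules are handled by extraction; skip rows already interpreted.
        if ev["type"] in SCHEDULE_TYPES or ev.get("interpretation"):
            continue
        ev["interpretation"] = describe_image(ev["file_key"])
    return evidences
```

Making the step idempotent (the skip on an existing interpretation) allows re-running analysis without paying for duplicate vision calls.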
Per-field prompt gap¶
Cato-v2 stores field-level instructions in prompt_template_field rows. Cartex needs these translated into UserTableSchema column instructions.
Where the vocabulary lives¶
| Source | Contains |
|---|---|
| prompt_template_field.config_json.available_values | List of valid values (e.g., ["Steel", "Brass", "Galvanized Steel"]) |
| prompt_template_field.config_json.extraction_rules | Instruction strings (e.g., ["Prioritize exact match"]) |
| prompt_template_field.config_json.unit | Measurement unit for number fields |
| prompt_template_field.config_json.default_value | Fallback value |
| prompt_template.analysis_ai_prompt | Free-text global instructions |
| Reference libraries (all_product_types, all_operability, product_attributes) | Company-specific taxonomies |
Wiring¶
The build_user_table_schema() function (see Mapping layer design) concatenates available_values, extraction_rules, and unit into a single instruction string per column. The analysis_ai_prompt global text should be prepended to each instruction or mapped to a top-level field.
Key Notes OCR text, currently appended directly to Cato-v2 extraction prompts, should flow through TextContextModel instead of being duplicated in column instructions.
Risk items¶
Seven risks were identified during the Cato-v2 audit. Two are P0 blockers; five are P1 items that should be addressed before production launch.
R1: No MAIN vs AUXILIARY table role classification¶
Cato-v2 treats all Window Door Unit and Table evidences identically. Cartex's T1 strategy depends on TableModel.role to distinguish main schedules from reference tables.
Impact. T1 will not fire or will misidentify tables, degrading auxiliary table enrichment.
Mitigation. Short-term: implement a heuristic in classify_table_role() — tables with fewer rows or headers matching known auxiliary patterns (Code, Description, Abbreviation) get role AUXILIARY; the largest table per page gets role MAIN. Long-term: train a classifier or add a user-facing toggle during evidence review.
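The short-term heuristic can be sketched as follows. The signature, row-count threshold, and pattern set are assumptions; the real classify_table_role() would read these signals off Evidence and TakeOffResult rather than take them as parameters:

```python
AUXILIARY_HEADER_PATTERNS = {"code", "description", "abbreviation"}

def classify_table_role(
    headers: list[str], row_count: int, is_largest_on_page: bool
) -> str:
    """Heuristic from R1: header patterns and table size decide the role.

    Sketch only — the threshold of 5 rows is an arbitrary starting point
    to be tuned against real drawings.
    """
    normalized = {h.strip().lower() for h in headers}
    if normalized & AUXILIARY_HEADER_PATTERNS and not is_largest_on_page:
        return "AUXILIARY"
    if is_largest_on_page:
        return "MAIN"
    return "AUXILIARY" if row_count < 5 else "MAIN"
```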
R2: Image interpretation is completely missing¶
T3 (legend) and T4 (dimension) require ImageContextModel.interpretation. Cato-v2 stores crop images but never generates text descriptions.
Impact. Two specialist strategies are non-functional without this field.
Mitigation. Implement Option A from Image context gap. This is a blocking dependency.
R3: Result format mismatch¶
Cato-v2's TakeOffResultItem.result is a flat JSON dict. Cartex's EnrichedRow adds field_sources, confidence, and reasoning with no current storage location.
Impact. Enrichment metadata (provenance, confidence, reasoning) is lost on save.
Mitigation. Add a cartex_metadata JSON column to take_off_result_item, or store enrichment audit data in a separate linked table.
R4: Prompt template as source of truth for column list¶
UserTableSchema.columns must exactly match the keys in TableModel.rows. Mismatches between the prompt template and actual extraction output cause silent data loss.
Impact. Columns present in extraction but absent from the schema are ignored. Columns in the schema but absent from extraction stay empty without warning.
Mitigation. Add a validation step in the mapping layer that reconciles template field names against actual keys in TakeOffResultItem.result. Log warnings for unmatched fields.
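A sketch of that reconciliation check, operating on plain field-name lists and row dicts (the function name and return shape are illustrative):

```python
import logging

logger = logging.getLogger(__name__)

def validate_field_alignment(
    template_fields: list[str], result_rows: list[dict]
) -> tuple[set, set]:
    """Reconcile template field names against keys present in extraction output.

    Returns (missing_from_results, missing_from_template); both sets are
    logged as warnings rather than raised, so one drifted column does not
    abort the whole run.
    """
    template_set = set(template_fields)
    result_keys = {key for row in result_rows for key in row}
    missing_from_results = template_set - result_keys
    missing_from_template = result_keys - template_set
    for name in sorted(missing_from_results):
        logger.warning("Template field %r never appears in extraction output", name)
    for name in sorted(missing_from_template):
        logger.warning("Extracted key %r is absent from the template schema", name)
    return missing_from_results, missing_from_template
```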
R5: Batch processing architecture¶
Cato-v2 sends all schedule evidence images to Gemini in a single batch call via BatchingFiles.process_files(). The result is a flat list of rows without clear per-evidence attribution. Cartex expects rows grouped by table.
Impact. The mapping layer may struggle to associate extracted rows back to their source evidence, which is needed to build TableModel.rows correctly.
Mitigation. Use evidence_ids on TakeOffResultItem to attribute rows back to source evidences. Consider switching to per-evidence Gemini calls for cleaner table separation at the cost of more API calls.
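Attribution via evidence_ids can be sketched like this; the column names come from the data model table above, while the dict-based shape and helper name are illustrative (collect_rows_for_evidence in the converter would do the equivalent against ORM objects):

```python
def rows_for_evidence(evidence_id: int, items: list[dict]) -> list[dict]:
    """Select result rows attributable to one evidence.

    Checks the single evidence_id column first, then falls back to
    membership in the evidence_ids list; rows matching neither are
    left unattributed.
    """
    selected = []
    for item in items:
        linked_ids = item.get("evidence_ids") or []
        if item.get("evidence_id") == evidence_id or evidence_id in linked_ids:
            selected.append(item["result"])
    return selected
```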
R6: Gemini model dependency¶
Cartex targets Gemini exclusively. Cato-v2 uses both OpenAI and Gemini with model selection configured per deployment.
Impact. Enrichment calls a different model than extraction in OpenAI-based deployments.
Mitigation. Cartex enrichment should always use Gemini regardless of the extraction model. The enricher maintains its own model configuration.
R7: Product Type preservation for doors¶
For elevation-sourced items, Product Type comes from the detection model's label name. For schedule-sourced items, it depends on whether the active PromptTemplate includes a Product Type field.
Impact. Door rows may arrive at enrichment without Product Type, degrading operability and configuration mapping.
Mitigation. Ensure the default prompt template includes Product Type with comprehensive extraction rules. Add a validation warning if the active template lacks this field.
Risk register summary¶
| # | Risk | Priority |
|---|---|---|
| R1 | No MAIN vs AUXILIARY table role classification | P0 |
| R2 | Image interpretation is completely missing | P0 |
| R3 | Result format mismatch — no storage for enrichment metadata | P1 |
| R4 | Prompt template / extraction output field name alignment | P1 |
| R5 | Batch processing prevents per-table row attribution | P1 |
| R6 | Gemini model dependency | P1 |
| R7 | Product Type field may be absent from template | P1 |
Blocking dependencies¶
The following items must be completed before Cartex enrichment can operate in Cato-v2.
| # | Item | Section | Priority |
|---|---|---|---|
| 1 | Add interpretation column + Gemini vision call for non-schedule evidences | Image context gap | P0 |
| 2 | Implement classify_table_role() (MAIN/AUXILIARY) | Mapping layer design, R1 | P0 |
| 3 | Build mapping layer (to_extraction_result, build_user_table_schema) | Mapping layer design, Per-field prompt gap | P0 |
| 4 | Add cartex_metadata storage for enrichment audit fields | R3 | P1 |
| 5 | Validate field name alignment between template and extraction output | R4 | P1 |
| 6 | Ensure default template includes Product Type field | R7 | P1 |