Porting plan

This page describes how to integrate Cartex's enrichment pipeline into the Cato-v2 production system. It covers the data model mapping, integration point, known gaps, and risk items identified during a Cato-v2 codebase audit.

Cato-v2 extraction output

Cato-v2 does not produce a single structured ExtractionResult. Extraction data is distributed across database tables and in-memory structures.

Current data model

| Cato-v2 concept | Storage | Key fields |
| --- | --- | --- |
| Evidence | DB row per detected region | id, type, polygon (bbox), file_key (S3 crop), ocr_text, sub_text |
| TakeOffResultItem | DB row per extracted row | result (JSON blob), evidence_id, evidence_ids, project_file_id |
| TakeOffResult | Parent container | original_result (full Gemini response), template_id, project_file_ids |

Evidence type mapping

Each Evidence.type string maps to a Cartex concept, where one exists.

| Evidence.type | Cartex analog |
| --- | --- |
| "Window Door Unit" | TableModel with role MAIN |
| "Table" | TableModel — role MAIN or AUXILIARY (no distinction today) |
| "Elevation" | ImageContextModel |
| "Floor Plan" | ImageContextModel |
| "Key Notes" | TextContextModel (category general_note) |
| "Drawing Index" | No Cartex analog |
| "Title Info" | No Cartex analog |

Result item JSON shape

Each TakeOffResultItem.result is a JSON dict with top-level keys matching the active PromptTemplate fields. Nested fields use dot-separated paths (e.g., Glass.Type, Glass.Arrangement.Configuration), although some fields, such as Frame in the example below, arrive as nested objects instead.

{
  "Label": "W-1",
  "Product": "Window",
  "Product Type": "Direct Set / Picture / Fixed",
  "Operability": "Fixed",
  "Width": 36,
  "Height": 48,
  "Quantity": 1,
  "Frame": { "Profile": "", "Material": "" },
  "Special Notes": "",
  "Source Type": "Image"
}
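
Because some templates emit nested objects while the dot-path convention expects flat keys, the converter benefits from a small normalization pass. The sketch below is illustrative; flatten_result is a hypothetical name, not an existing Cato-v2 or Cartex function.

```python
def flatten_result(result: dict, prefix: str = "") -> dict:
    """Flatten nested result dicts into dot-separated key paths,
    e.g. {"Frame": {"Profile": ""}} -> {"Frame.Profile": ""}."""
    flat: dict = {}
    for key, value in result.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so arbitrarily deep nesting collapses to one level.
            flat.update(flatten_result(value, path))
        else:
            flat[path] = value
    return flat
```

Running this over the example above turns the Frame object into Frame.Profile and Frame.Material keys while leaving already-flat keys untouched.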

Mapping to ExtractionResult

| Cartex field | Cato-v2 source | Gap |
| --- | --- | --- |
| TableModel.table_id | Evidence.id | None |
| TableModel.role | Evidence.type | No MAIN vs AUXILIARY distinction |
| TableModel.headers | PromptTemplateField names | Indirect — needs extraction from template |
| TableModel.rows | TakeOffResultItem.result entries | Available, needs reshaping |
| TableModel.bbox | Evidence.polygon | Available |
| TextContextModel.content | Evidence.ocr_text where type is Key Notes | Available |
| TextContextModel.category | Hardcoded to "general_note" | Needs mapping logic |
| ImageContextModel.interpretation | Not available | Critical gap — see below |

Mapping layer design

A converter module translates Cato-v2's post-extraction state into Cartex's ExtractionResult. Two functions are needed: to_extraction_result() for the extraction data and build_user_table_schema() for the output schema.

to_extraction_result()

This function accepts a TakeOffResult, its associated evidences, and the active template. It iterates over evidences, classifies each by type, and builds the corresponding Cartex model.

def to_extraction_result(
    take_off_result: TakeOffResult,
    evidence_map: dict[int, Evidence],
    template: PromptTemplate,
) -> ExtractionResult:
    tables = []
    text_contexts = []
    image_contexts = []

    for evidence in evidence_map.values():
        if evidence.type in ("Window Door Unit", "Table"):
            role = classify_table_role(evidence, take_off_result)
            rows = collect_rows_for_evidence(evidence.id, take_off_result.items)
            headers = [f.name for f in template.fields]
            tables.append(TableModel(
                table_id=str(evidence.id),
                role=role,
                headers=headers,
                rows=rows,
                bbox=parse_polygon(evidence.polygon),
            ))
        elif evidence.type == "Key Notes":
            text_contexts.append(TextContextModel(
                category="general_note",
                content=evidence.ocr_text or "",
            ))
        elif evidence.type in ("Elevation", "Floor Plan"):
            # Requires the new interpretation column (see "Image context gap").
            image_contexts.append(ImageContextModel(
                interpretation=evidence.interpretation or "",
            ))

    return ExtractionResult(
        tables=tables,
        context=text_contexts + image_contexts,
    )

classify_table_role() is new logic — see Risk R1 for the proposed heuristic.
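
The function above also leans on two helpers that do not exist in Cato-v2 yet. The sketches below assume Evidence.polygon is stored as a JSON-encoded (or already-decoded) list of [x, y] points, and that result items carry both an evidence_id and an evidence_ids list; the names and shapes are assumptions to verify against the actual schema.

```python
import json


def parse_polygon(polygon) -> tuple[float, float, float, float]:
    """Convert a stored polygon (JSON string or list of [x, y] points)
    into an axis-aligned (x_min, y_min, x_max, y_max) bounding box."""
    points = json.loads(polygon) if isinstance(polygon, str) else polygon
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def collect_rows_for_evidence(evidence_id: int, items) -> list[dict]:
    """Gather result rows attributed to one evidence, checking both the
    single evidence_id field and the evidence_ids list."""
    rows = []
    for item in items:
        linked = set(item.evidence_ids or [])
        if item.evidence_id is not None:
            linked.add(item.evidence_id)
        if evidence_id in linked:
            rows.append(item.result)
    return rows
```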

build_user_table_schema()

This function translates a Cato-v2 PromptTemplate into Cartex's UserTableSchema. It extracts column names from template fields and builds per-column instruction strings from config_json.

def build_user_table_schema(template: PromptTemplate) -> UserTableSchema:
    column_instructions = {}
    columns = []

    for field in template.fields:
        config = field.config_json or {}  # tolerate fields with no config
        parts = []
        if config.get("available_values"):
            parts.append(f"Valid values: {', '.join(config['available_values'])}")
        if config.get("extraction_rules"):
            parts.append("; ".join(config["extraction_rules"]))
        if config.get("unit"):
            parts.append(f"Unit: {config['unit']}")

        column_instructions[field.name] = " | ".join(parts) if parts else ""
        columns.append(field.name)

    return UserTableSchema(
        columns=columns,
        column_instructions=column_instructions,
    )

The analysis_ai_prompt free-text field on PromptTemplate carries global instructions (e.g., "Focus on aluminum-clad windows"). This should be prepended to each column instruction or mapped to a top-level instruction field if UserTableSchema supports one.

Key Notes OCR text currently baked into Cato-v2 prompts (generate_prompt_from_template, lines 1038–1041) should instead map to TextContextModel entries in the ExtractionResult.

Integration point

The Cartex enricher hooks into the existing DrawingAIService.analyze_item_by_source_type() pipeline in app/services/drawing_ai.py.

Current Cato-v2 flow

The extraction pipeline runs in eight steps:

  1. Load evidences for all project files
  2. Generate S3 image crops
  3. OCR Key Notes text
  4. Generate prompt from PromptTemplate
  5. Filter to schedule-type evidences
  6. Call Gemini via BatchingFiles.process_files() — raw extraction
  7. Parse results into TakeOffResultItem rows
  8. Save to DB, mark TakeOffResult as status Analyzed

Enricher insertion

The enricher call belongs between step 7 (parsing) and step 8 (persistence) as a new step 7.5.

# After step 7, before step 8:

# --- Cartex enrichment ---
extraction_result = to_extraction_result(
    take_off_result, evidence_map, template
)
user_schema = build_user_table_schema(template)
enriched_rows = await cartex_enricher.enrich(extraction_result, user_schema)
result_items = merge_enriched_rows(result_items, enriched_rows)
# --- End Cartex enrichment ---

# Step 8: save to DB (existing code)
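
merge_enriched_rows() is new logic. The sketch below treats result items and enriched rows as plain dicts matched by position, fills only fields the raw extraction left empty, and parks enrichment audit data under a cartex_metadata key (see Risk R3); the real implementation would operate on TakeOffResultItem rows and Cartex EnrichedRow objects.

```python
def merge_enriched_rows(result_items: list[dict], enriched_rows: list[dict]) -> list[dict]:
    """Merge enriched values back into parsed result items (positional match).

    Raw extraction wins: a field is only overwritten when the original
    value is empty. Provenance, confidence, and reasoning are preserved
    under cartex_metadata rather than discarded.
    """
    for item, enriched in zip(result_items, enriched_rows):
        for field, value in enriched["values"].items():
            if not item["result"].get(field):
                item["result"][field] = value
        item["cartex_metadata"] = {
            "field_sources": enriched.get("field_sources", {}),
            "confidence": enriched.get("confidence"),
            "reasoning": enriched.get("reasoning"),
        }
    return result_items
```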

The following diagram shows where enrichment fits in the pipeline.

flowchart TB
    subgraph cato["Cato-v2 pipeline"]
        direction TB
        S1["1. Load evidences"]
        S2["2. Generate S3 crops"]
        S3["3. OCR Key Notes"]
        S4["4. Generate prompt"]
        S5["5. Filter schedule evidences"]
        S6["6. Gemini batch extraction"]
        S7["7. Parse into TakeOffResultItems"]
        S1 --> S2 --> S3 --> S4 --> S5 --> S6 --> S7
    end

    subgraph cartex["Cartex enrichment (new step 7.5)"]
        direction TB
        MAP["to_extraction_result()"]
        SCHEMA["build_user_table_schema()"]
        ENRICH["Enricher.enrich()"]
        MERGE["merge_enriched_rows()"]
        MAP --> ENRICH
        SCHEMA --> ENRICH
        ENRICH --> MERGE
    end

    subgraph save["Cato-v2 persistence"]
        S8["8. Save to DB"]
    end

    cato --> cartex --> save

Alternative: post-analysis endpoint

A decoupled approach adds a new route for on-demand enrichment:

POST /api/v1/take-off/{take_off_id}/enrich

This loads existing TakeOffResultItem rows, builds the ExtractionResult from stored data, runs enrichment, and updates the items. It allows re-enrichment after manual edits but means enrichment does not happen automatically on first analysis.

Image context gap

Cato-v2 does not generate text interpretations of images. This is the primary blocker for T3 (legend enrichment) and T4 (dimension enrichment).

P0 blocker

Without ImageContextModel.interpretation, two of the five specialist strategies cannot function. This must be resolved before Cartex enrichment launch.

Current state

The Cato-v2 pipeline detects bounding boxes via its ML detection model, crops regions, stores PNGs in S3, and runs OCR on label sub-regions only. No step generates a rich text interpretation describing what a legend diagram, item card, or elevation drawing depicts.

Options

| Option | Effort | Fidelity |
| --- | --- | --- |
| A: Port Cartex's interpretation prompt — add a Gemini vision call per non-schedule evidence after S3 crop generation; store the result in a new interpretation TEXT column on evidences | Medium | High |
| B: Generate on-demand in mapping layer — the converter calls Gemini at conversion time instead of storing interpretations | Medium | Medium |
| C: Skip T3/T4 initially — launch without legend and dimension strategies | Low | Reduced coverage |

Recommendation

Option A is recommended. Add an interpretation column to the evidences table and populate it during DrawingAIService.analyze_item_by_source_type() step 3, after S3 image crop generation. Only the interpretation prompt from Cartex's extractor needs to be retained — CONTEXT_EXTRACTION or an equivalent. The full TABLE_EXTRACTION prompt is not needed because Cato-v2's own extraction handles table detection.
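
A sketch of the population step, with the vision call injected as a callable so the orchestration stays testable without network access. INTERPRETATION_PROMPT is placeholder text (the real prompt should be ported from Cartex), and populate_interpretations is a hypothetical name.

```python
from typing import Callable

# Placeholder only; port Cartex's CONTEXT_EXTRACTION (or equivalent)
# interpretation prompt here.
INTERPRETATION_PROMPT = (
    "Describe what this drawing region depicts: legend entries, item cards, "
    "dimension callouts, and any visible labels."
)


def populate_interpretations(
    evidences,
    describe_image: Callable[[str, str], str],
) -> int:
    """Fill the new evidences.interpretation column for image-type regions.

    describe_image abstracts the Gemini vision call (S3 file_key + prompt
    -> text). Returns the number of evidences updated; regions that
    already have an interpretation are skipped, making the step re-runnable.
    """
    updated = 0
    for evidence in evidences:
        if evidence.type in ("Elevation", "Floor Plan") and not evidence.interpretation:
            evidence.interpretation = describe_image(evidence.file_key, INTERPRETATION_PROMPT)
            updated += 1
    return updated
```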

Per-field prompt gap

Cato-v2 stores field-level instructions in prompt_template_field rows. Cartex needs these translated into UserTableSchema column instructions.

Where the vocabulary lives

| Source | Contains |
| --- | --- |
| prompt_template_field.config_json.available_values | List of valid values (e.g., ["Steel", "Brass", "Galvanized Steel"]) |
| prompt_template_field.config_json.extraction_rules | Instruction strings (e.g., ["Prioritize exact match"]) |
| prompt_template_field.config_json.unit | Measurement unit for number fields |
| prompt_template_field.config_json.default_value | Fallback value |
| prompt_template.analysis_ai_prompt | Free-text global instructions |
| Reference libraries (all_product_types, all_operability, product_attributes) | Company-specific taxonomies |

Wiring

The build_user_table_schema() function (see Mapping layer design) concatenates available_values, extraction_rules, and unit into a single instruction string per column. The analysis_ai_prompt global text should be prepended to each instruction or mapped to a top-level field.

Key Notes OCR text, currently appended directly to Cato-v2 extraction prompts, should flow through TextContextModel instead of being duplicated in column instructions.

Risk items

Seven risks were identified during the Cato-v2 audit. Two are P0 blockers; the remaining five are P1 items that should be addressed before production launch.

R1: No MAIN vs AUXILIARY table role classification

Cato-v2 treats all Window Door Unit and Table evidences identically. Cartex's T1 strategy depends on TableModel.role to distinguish main schedules from reference tables.

Impact. T1 will not fire or will misidentify tables, degrading auxiliary table enrichment.

Mitigation. Short-term: implement a heuristic in classify_table_role() — tables with fewer rows or headers matching known auxiliary patterns (Code, Description, Abbreviation) get role AUXILIARY; the largest table per page gets role MAIN. Long-term: train a classifier or add a user-facing toggle during evidence review.
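
A minimal sketch of that short-term heuristic. The signature is simplified relative to the classify_table_role(evidence, take_off_result) call used earlier: callers would pass the table's headers, its row count, and the largest row count among tables on the same page.

```python
# Header names that suggest a reference table rather than a main schedule.
AUXILIARY_HEADER_HINTS = {"code", "description", "abbreviation"}


def classify_table_role(headers: list[str], row_count: int, max_rows_on_page: int) -> str:
    """Short-term heuristic for R1: auxiliary-looking headers, or any table
    smaller than the largest table on its page, get AUXILIARY; the largest
    table on the page gets MAIN."""
    normalized = {h.strip().lower() for h in headers}
    if normalized & AUXILIARY_HEADER_HINTS:
        return "AUXILIARY"
    return "MAIN" if row_count >= max_rows_on_page else "AUXILIARY"
```

The header check runs first so a small abbreviation table is never promoted to MAIN just because it happens to be the only table on its page.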

R2: Image interpretation is completely missing

T3 (legend) and T4 (dimension) require ImageContextModel.interpretation. Cato-v2 stores crop images but never generates text descriptions.

Impact. Two specialist strategies are non-functional without this field.

Mitigation. Implement Option A from Image context gap. This is a blocking dependency.

R3: Result format mismatch

Cato-v2's TakeOffResultItem.result is a flat JSON dict. Cartex's EnrichedRow adds field_sources, confidence, and reasoning with no current storage location.

Impact. Enrichment metadata (provenance, confidence, reasoning) is lost on save.

Mitigation. Add a cartex_metadata JSON column to take_off_result_item, or store enrichment audit data in a separate linked table.

R4: Prompt template as source of truth for column list

UserTableSchema.columns must exactly match the keys in TableModel.rows. Mismatches between the prompt template and actual extraction output cause silent data loss.

Impact. Columns present in extraction but absent from the schema are ignored. Columns in the schema but absent from extraction stay empty without warning.

Mitigation. Add a validation step in the mapping layer that reconciles template field names against actual keys in TakeOffResultItem.result. Log warnings for unmatched fields.
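
A sketch of that validation step, assuming plain string field names on both sides (validate_field_alignment is a hypothetical name):

```python
import logging

logger = logging.getLogger(__name__)


def validate_field_alignment(
    template_fields: list[str],
    result_keys: set[str],
) -> tuple[set[str], set[str]]:
    """Reconcile template field names against keys actually present in
    TakeOffResultItem.result; warn on each direction of mismatch."""
    expected = set(template_fields)
    missing = expected - result_keys      # in schema, never extracted
    unexpected = result_keys - expected   # extracted, silently dropped today
    if missing:
        logger.warning("Template fields never extracted: %s", sorted(missing))
    if unexpected:
        logger.warning("Extracted keys not in template: %s", sorted(unexpected))
    return missing, unexpected
```

Returning both sets lets the mapping layer decide whether a mismatch is a hard failure or just a logged warning.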

R5: Batch processing architecture

Cato-v2 sends all schedule evidence images to Gemini in a single batch call via BatchingFiles.process_files(). The result is a flat list of rows without clear per-evidence attribution. Cartex expects rows grouped by table.

Impact. The mapping layer may struggle to associate extracted rows back to their source evidence, which is needed to build TableModel.rows correctly.

Mitigation. Use evidence_ids on TakeOffResultItem to attribute rows back to source evidences. Consider switching to per-evidence Gemini calls for cleaner table separation at the cost of more API calls.
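
A sketch of the attribution pass, assuming each item exposes an evidence_ids list with a single evidence_id fallback, as in the data model table above (group_rows_by_evidence is a hypothetical name):

```python
from collections import defaultdict


def group_rows_by_evidence(items) -> dict[int, list[dict]]:
    """Attribute the flat batch output back to source evidences via
    evidence_ids, falling back to the single evidence_id. Rows linked
    to several evidences land in every matching bucket."""
    grouped: dict[int, list[dict]] = defaultdict(list)
    for item in items:
        linked = item.evidence_ids or (
            [item.evidence_id] if item.evidence_id is not None else []
        )
        for eid in linked:
            grouped[eid].append(item.result)
    return dict(grouped)
```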

R6: Gemini model dependency

Cartex targets Gemini exclusively. Cato-v2 uses both OpenAI and Gemini with model selection configured per deployment.

Impact. Enrichment calls a different model than extraction in OpenAI-based deployments.

Mitigation. Cartex enrichment should always use Gemini regardless of the extraction model. The enricher maintains its own model configuration.

R7: Product Type preservation for doors

For elevation-sourced items, Product Type comes from the detection model's label name. For schedule-sourced items, it depends on whether the active PromptTemplate includes a Product Type field.

Impact. Door rows may arrive at enrichment without Product Type, degrading operability and configuration mapping.

Mitigation. Ensure the default prompt template includes Product Type with comprehensive extraction rules. Add a validation warning if the active template lacks this field.

Risk register summary

| # | Risk | Priority |
| --- | --- | --- |
| R1 | No MAIN vs AUXILIARY table role classification | P0 |
| R2 | Image interpretation is completely missing | P0 |
| R3 | Result format mismatch — no storage for enrichment metadata | P1 |
| R4 | Prompt template / extraction output field name alignment | P1 |
| R5 | Batch processing prevents per-table row attribution | P1 |
| R6 | Gemini model dependency | P1 |
| R7 | Product Type field may be absent from template | P1 |

Blocking dependencies

The following items must be completed before Cartex enrichment can operate in Cato-v2.

| # | Item | Section | Priority |
| --- | --- | --- | --- |
| 1 | Add interpretation column + Gemini vision call for non-schedule evidences | Image context gap | P0 |
| 2 | Implement classify_table_role() (MAIN/AUXILIARY) | Mapping layer design, R1 | P0 |
| 3 | Build mapping layer (to_extraction_result, build_user_table_schema) | Mapping layer design, Per-field prompt gap | P0 |
| 4 | Add cartex_metadata storage for enrichment audit fields | R3 | P1 |
| 5 | Validate field name alignment between template and extraction output | R4 | P1 |
| 6 | Ensure default template includes Product Type field | R7 | P1 |