
Using the Gradio interface

The Gradio interface provides a browser-based way to run the Cartex pipeline against a PDF document and inspect enriched results without writing any code. It is intended for local developer testing and QA runs.

Starting the interface

Launch the interface from the repository root:

chmod +x ./cartex
export PATH="$PWD:$PATH"
cartex ui --open

Gradio starts a local server on port 7860 by default. With --open, a browser tab also opens automatically; if it does not, navigate to http://localhost:7860.

Note

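Run this command from the repository root. Do not set PYTHONPATH=src. Cartex uses src.-prefixed imports, so adding src to the path lets the same module load under two names (for example, src.templates and templates). Each copy defines its own enum classes, and comparisons between members from different copies silently evaluate to False.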
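To see the failure mode in isolation, the Python sketch below loads one module file under two names. This is what happens when both src.templates and templates are importable. The module body is a hypothetical stand-in, not the real src/templates.py.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical module body standing in for a file such as src/templates.py.
MODULE_SOURCE = """
from enum import Enum

class TemplateType(Enum):
    GLASS_SCHEDULE = "glass_schedule"
"""


def load_as(name: str, path: Path):
    """Execute the file at path as a new module object named name."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "templates.py"
    path.write_text(MODULE_SOURCE)
    # Same file, two module names: what src-prefixed imports plus PYTHONPATH=src allow.
    via_src = load_as("src.templates", path)
    via_bare = load_as("templates", path)

a = via_src.TemplateType.GLASS_SCHEDULE
b = via_bare.TemplateType.GLASS_SCHEDULE
print(a.value == b.value)  # True: identical underlying values
print(a == b)              # False: members of two distinct enum classes
```

Nothing raises an error here. The comparison just returns False, which is why the bug is easy to miss.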

Running independent UI instances

Cartex supports multiple independent UI processes. Start each instance with a unique port and instance ID.

cartex ui --port 7860 --instance-id ui_a --debug-dir debug/ui_a
cartex ui --port 7861 --instance-id ui_b --debug-dir debug/ui_b

To launch multiple instances automatically:

cartex ui-multi --count 3 --start-port 7860 --open

This prints a URL for each instance (for example http://127.0.0.1:7860, :7861, :7862) and creates per-instance debug directories automatically.

By default, each ui-multi launch gets a unique debug session root:

  • debug/ui_multi_<timestamp>/

Each instance writes terminal logs to:

  • debug/ui_multi_<timestamp>/<instance_id>/ui.log

Example:

tail -f debug/ui_multi_20260413_011500_123456/ui_01/ui.log
tail -f debug/ui_multi_20260413_011500_123456/ui_02/ui.log

Use --inherit-logs if you want all instance logs streamed into the current terminal instead.

In --inherit-logs mode, each line is prefixed with its stream and instance ID, so the source is explicit:

  • [stdout][ui_01] ...
  • [stderr][ui_02] ...
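Suppose you capture the combined --inherit-logs output to a file. The prefixes then make it easy to pull out one instance's lines again. The following helper is a hypothetical sketch that assumes exactly the [stream][instance_id] prefix shown above. parse_line and lines_for are not part of Cartex, and the sample messages are placeholders.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Matches "[stdout][ui_01] message" or "[stderr][ui_02] message".
PREFIX = re.compile(r"^\[(stdout|stderr)\]\[([^\]]+)\] ?(.*)$")


class LogLine(NamedTuple):
    stream: str
    instance_id: str
    message: str


def parse_line(line: str) -> Optional[LogLine]:
    """Split one prefixed line into stream, instance ID, and message."""
    match = PREFIX.match(line.rstrip("\n"))
    return LogLine(*match.groups()) if match else None


def lines_for(lines: Iterable[str], instance_id: str) -> Iterator[str]:
    """Yield the messages emitted by a single instance, skipping other lines."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed.instance_id == instance_id:
            yield parsed.message


captured = [
    "[stdout][ui_01] first message",
    "[stderr][ui_02] second message",
    "[stdout][ui_01] third message",
]
print(list(lines_for(captured, "ui_01")))  # ['first message', 'third message']
```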

Flags:

  • --port: port the UI instance listens on; give each instance its own port
  • --host: network interface to bind (default 127.0.0.1)
  • --instance-id: ID stamped into that process's run metadata and run IDs
  • --debug-dir: directory that receives that instance's artifacts, isolating them from other instances
  • --debug-root: custom root for ui-multi instance folders (default: a unique session root per launch)

This prevents artifact collisions during concurrent QA sessions.
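The Python sketch below shows the per-instance arguments that keep concurrent instances apart. build_ui_commands is a hypothetical helper. It only reproduces the documented flags and the debug/ui_multi_<timestamp>/<instance_id>/ layout; it does not describe how ui-multi is implemented.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def build_ui_commands(count: int, start_port: int = 7860,
                      debug_root: Optional[Path] = None) -> List[List[str]]:
    """Return one `cartex ui` argument list per instance (hypothetical helper)."""
    if debug_root is None:
        # Same shape as the documented default: debug/ui_multi_<timestamp>/
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        debug_root = Path("debug") / f"ui_multi_{stamp}"
    commands = []
    for i in range(count):
        instance_id = f"ui_{i + 1:02d}"  # ui_01, ui_02, ...
        commands.append([
            "cartex", "ui",
            "--port", str(start_port + i),                 # unique port per instance
            "--instance-id", instance_id,                  # stamps run metadata and run IDs
            "--debug-dir", str(debug_root / instance_id),  # isolated artifacts
        ])
    return commands


for cmd in build_ui_commands(2, debug_root=Path("debug/qa")):
    print(" ".join(cmd))
```

Each argument list can be started with subprocess.Popen, or printed and pasted into separate terminals.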

Uploading a document

Use the Upload PDF file picker to select a PDF document. After upload, the Page Preview gallery renders all pages in the selected page range. The preview updates automatically when the Page Numbers field changes.

Page numbers

The Page Numbers field accepts comma-separated page numbers, ranges, or a combination:

Input     Pages processed
1         Page 1 only
1,3,5     Pages 1, 3, and 5
1-5       Pages 1 through 5 inclusive
1,3-5     Pages 1, 3, 4, and 5
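The accepted syntax can be expressed as a short parser. This is a sketch of the documented format only. parse_page_numbers is a hypothetical name. Rejecting page 0 and dropping duplicates while keeping their order are assumptions, not documented Cartex behavior.

```python
from typing import List


def parse_page_numbers(spec: str) -> List[int]:
    """Expand a Page Numbers string such as "1,3-5" into [1, 3, 4, 5]."""
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.extend(range(start, end + 1))  # ranges are inclusive
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"pages are 1-indexed: {part!r}")
            pages.append(page)
    return list(dict.fromkeys(pages))  # drop repeats, keep first-seen order


for spec in ["1", "1,3,5", "1-5", "1,3-5"]:
    print(spec, "->", parse_page_numbers(spec))
```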

Page numbers are 1-indexed. When multiple pages are specified, the pipeline calls extract_pages(), which runs a multi-page extraction pass that deduplicates context items across pages and merges all detected tables into a single ExtractionResult.

Template selection

The Template dropdown controls which column schema is applied during enrichment. Each option maps to a TemplateType enum value and a fixed base column list defined in src/templates.py.

Display name                   TemplateType            Use when
Standard Takeoff               STANDARD_TAKEOFF        Standard window/door schedule with operability, material, and rough opening
Standard Takeoff + TDL/SDL     STANDARD_TAKEOFF_TDL    Standard schedule that also tracks divided light types (Dividers TDL Type, Dividers SDL Type)
Glass Schedule                 GLASS_SCHEDULE          Dedicated glass schedules with layer, brand, arrangement, and spacer columns
Shop Details                   SHOP_DETAILS            Shop drawing detail sheets with frame profile, hardware, finish, and installation columns

Additional columns

The Additional Columns checkbox group lets you append fields from FIELD_LIBRARY to the base template columns. FIELD_LIBRARY is the full set of known fields defined in src/templates.py.

Selected columns are appended after the template's default columns in the output. Columns already present in the selected template are silently deduplicated. For example, selecting Special Notes when using Glass Schedule has no effect.
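The append-and-deduplicate rule amounts to an order-preserving union. In the sketch below, resolve_columns is a hypothetical helper. The column names are illustrative placeholders, not Glass Schedule's real column list from src/templates.py.

```python
from typing import List


def resolve_columns(template_columns: List[str], additional: List[str]) -> List[str]:
    """Template defaults first, then additional columns not already present."""
    # dict.fromkeys keeps the first occurrence of each name, in insertion order.
    return list(dict.fromkeys([*template_columns, *additional]))


# Placeholder names for illustration; the real lists live in src/templates.py.
template = ["Mark", "Glass Layer", "Brand", "Special Notes"]
print(resolve_columns(template, ["Special Notes", "Source Type"]))
# ['Mark', 'Glass Layer', 'Brand', 'Special Notes', 'Source Type']
```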

Runtime options

The UI provides two execution toggles:

  • High Accuracy Tables (BBox Crop)
    Enables table detection and per-bbox crop extraction during Stage 1. This affects table extraction only; context extraction is unchanged.
  • Single Specialist Mode (Monolithic)
    Bypasses router/specialist staging and runs the monolithic ENRICHMENT prompt directly during Stage 3.

When both are enabled, extraction uses high-accuracy tables and enrichment runs in monolithic mode.

Running the pipeline

Click Run Pipeline to start processing. The Pipeline Log shows live progress across three stages:

  1. [1/3] Extraction: tables and context items are detected on the specified pages. The Page Preview gallery updates with colored bounding-box overlays for each page: blue for main tables, green for auxiliary tables, and orange for context regions. Each page image is labeled with its page number, and multi-page runs show every page with its own annotations.
  2. [2/3] Routing: in default mode, the router selects specialist strategies, plans their execution order, and assigns context to each specialist. In monolithic mode, this stage is skipped.
  3. [3/3] Enrichment: runs the staged specialists (default mode) or a single monolithic enrichment call (monolithic mode), then reports the enriched row count.
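The two runtime toggles and the three stages combine as shown in this small decision model. It is descriptive Python, not Cartex code. RunOptions and stage_plan are hypothetical names, and the field names reuse the flags keys that appear in the run metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunOptions:
    # Names mirror the "flags" keys in the run metadata.
    high_accuracy_tables: bool = False    # High Accuracy Tables (BBox Crop)
    single_specialist_mode: bool = False  # Single Specialist Mode (Monolithic)


def stage_plan(opts: RunOptions) -> List[str]:
    """Describe what each pipeline stage does for one combination of toggles."""
    if opts.high_accuracy_tables:
        extraction = "[1/3] Extraction: table detection and per-bbox crops; context unchanged"
    else:
        extraction = "[1/3] Extraction: default table and context extraction"
    if opts.single_specialist_mode:
        routing = "[2/3] Routing: skipped"
        enrichment = "[3/3] Enrichment: one monolithic ENRICHMENT call"
    else:
        routing = "[2/3] Routing: pick strategies, order them, assign context"
        enrichment = "[3/3] Enrichment: staged specialists"
    return [extraction, routing, enrichment]


for line in stage_plan(RunOptions(high_accuracy_tables=True, single_specialist_mode=True)):
    print(line)
```

The toggles are independent. High Accuracy Tables only changes Stage 1, and Single Specialist Mode only changes Stages 2 and 3.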

On success, the Enriched Table displays one row per extracted schedule row. In addition to the template columns, three diagnostic columns appear:

Column          Contents
_confidence     Numeric confidence score from the enricher
_reasoning      Free-text explanation of how the row was enriched
_field_sources  JSON object mapping each column name to the FieldSource that produced its value (e.g. auxiliary_table, text_rule, image_legend, dimension_card)
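Because _field_sources is a JSON object per row, tallying its values shows which sources the enricher relied on across a run. This sketch assumes rows are available as Python dicts whose _field_sources entry is either a JSON string or an already-parsed dict. count_field_sources is a hypothetical helper, and the row values are made up.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def count_field_sources(rows: List[Dict[str, Any]]) -> Counter:
    """Tally FieldSource values across every row's _field_sources entry."""
    counts: Counter = Counter()
    for row in rows:
        raw = row.get("_field_sources") or {}
        # Accept either a JSON string or an already-decoded dict.
        sources = json.loads(raw) if isinstance(raw, str) else raw
        counts.update(sources.values())
    return counts


rows = [  # made-up example rows
    {"Mark": "W1", "_field_sources": '{"Mark": "auxiliary_table", "Brand": "text_rule"}'},
    {"Mark": "W2", "_field_sources": {"Mark": "auxiliary_table", "Brand": "image_legend"}},
]
print(count_field_sources(rows).most_common(1))  # [('auxiliary_table', 2)]
```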

If the pipeline fails, an Error Traceback panel appears below the table with the full Python traceback.

Running multiple documents in parallel (CLI)

For benchmark sweeps or regression batches, use the process-based batch runner instead of opening multiple UI tabs.

cartex batch --jobs misc/jobs.sample.json --workers 4 --output-dir debug/batch_parklane_kingsbrook

Jobs manifest format

Create a JSON array where each object defines one document job.

[
  {
    "job_id": "parklane_p1",
    "file_path": "misc/Parklane.pdf",
    "page_numbers": [1],
    "template": "glass_schedule",
    "use_table_bbox_crop": false,
    "force_monolithic": false
  },
  {
    "job_id": "kingsbrook_p1_2",
    "file_path": "misc/Kingsbrook.pdf",
    "page_numbers": [1, 2],
    "template": "standard_takeoff",
    "extra_columns": ["Source Type"]
  }
]

Supported keys per job:

  • job_id (optional): stable identifier for artifacts and summaries
  • file_path (required): PDF path
  • page_numbers (required): 1-indexed page list
  • template (required): TemplateType value (standard_takeoff, glass_schedule, etc.)
  • columns (optional): explicit full column list (overrides template defaults)
  • extra_columns (optional): append-only list added after template defaults
  • use_table_bbox_crop (optional): enables High Accuracy Tables for this job
  • force_monolithic (optional): enables monolithic enrichment for this job
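Before starting a long batch, a quick pre-flight check against the key list above can catch mistakes such as a misspelled key or a 0-based page. check_jobs and check_manifest_file are hypothetical helpers that only encode the documented keys. cartex batch applies its own validation, which may be stricter or looser.

```python
import json
from pathlib import Path
from typing import Any, List

REQUIRED = {"file_path", "page_numbers", "template"}
OPTIONAL = {"job_id", "columns", "extra_columns", "use_table_bbox_crop", "force_monolithic"}


def check_jobs(jobs: Any) -> List[str]:
    """Return a list of problems found in a parsed jobs manifest (empty if none)."""
    if not isinstance(jobs, list):
        return ["manifest must be a JSON array of job objects"]
    problems: List[str] = []
    for index, job in enumerate(jobs):
        if not isinstance(job, dict):
            problems.append(f"job[{index}]: expected an object")
            continue
        label = job.get("job_id", f"job[{index}]")
        keys = set(job)
        if REQUIRED - keys:
            problems.append(f"{label}: missing {sorted(REQUIRED - keys)}")
        if keys - REQUIRED - OPTIONAL:
            problems.append(f"{label}: unknown keys {sorted(keys - REQUIRED - OPTIONAL)}")
        pages = job.get("page_numbers", [])
        # Page numbers are 1-indexed integers (bool is excluded on purpose).
        if not isinstance(pages, list) or not all(type(p) is int and p >= 1 for p in pages):
            problems.append(f"{label}: page_numbers must be a list of integers >= 1")
    return problems


def check_manifest_file(path: str) -> List[str]:
    """Load a manifest from disk and check it."""
    return check_jobs(json.loads(Path(path).read_text()))
```

For example, check_manifest_file("misc/jobs.sample.json") returns an empty list when every job passes.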

Parallelism model

The batch runner (run_batch) uses one Python process per in-flight job via ProcessPoolExecutor, so --workers limits how many jobs run at once. Start with a small worker count (for example, 2 to 4) and scale based on CPU headroom and Gemini API quota.
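The parallelism model maps directly onto concurrent.futures. The sketch below illustrates that model but is not run_batch itself. run_job is a hypothetical stand-in for the per-job pipeline call, and the result records are invented for the example.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
from typing import Any, Dict, List


def run_job(job: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for processing one document job.

    Runs inside a worker process, so it must be a picklable top-level function.
    """
    return {"job_id": job["job_id"], "status": "ok", "pages": len(job["page_numbers"])}


def run_all(jobs: List[Dict[str, Any]], workers: int = 2) -> List[Dict[str, Any]]:
    """Run jobs with at most `workers` processes and record each outcome."""
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_job, job): job for job in jobs}
        for future in as_completed(futures):
            job = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:  # a failing job must not abort the batch
                results.append({"job_id": job.get("job_id"), "status": "failed",
                                "error": repr(exc)})
    return sorted(results, key=lambda r: str(r["job_id"]))


if __name__ == "__main__":  # required where workers are started with spawn
    demo_jobs = [{"job_id": "a", "page_numbers": [1]},
                 {"job_id": "b", "page_numbers": [1, 2]},
                 {"job_id": "c"}]  # missing page_numbers, so this job fails
    for result in run_all(demo_jobs):
        print(result)
```

The sketch catches exceptions per future, so one failing job is recorded as failed instead of aborting the whole batch. This parallels the per-job ok/failed status that the batch summary reports.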

Batch artifacts

Each batch run writes:

  • <output-dir>/summary.json with per-job status (ok or failed) and artifact paths
  • one JSON artifact per job containing input metadata, error state, and enriched rows

This makes multi-document runs auditable without mixing outputs from different fixtures.
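To find the jobs that need attention after a run, read summary.json and filter on status. The exact JSON layout is not documented here. This sketch therefore assumes either a top-level list of job records or an object with a "jobs" list, where each record has job_id and status keys. Adjust the key names to match your file. load_summary and failed_jobs are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, List


def load_summary(path: str) -> Any:
    """Parse a batch summary.json file."""
    return json.loads(Path(path).read_text())


def failed_jobs(summary: Any) -> List[str]:
    """Job IDs whose status is "failed".

    Assumed layout (not documented): a list of job records, or an object with a
    "jobs" list; each record has "job_id" and "status" keys.
    """
    records = summary.get("jobs", []) if isinstance(summary, dict) else summary
    return [str(r.get("job_id", "?")) for r in records if r.get("status") == "failed"]
```

For example: failed_jobs(load_summary("debug/batch_parklane_kingsbrook/summary.json")).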

Reading debug output

Every successful run writes a run directory in the configured debug root:

  • default: debug/run_.../
  • with --instance-id <id> and no explicit --debug-dir: debug/<id>/run_.../
  • with --debug-dir <path>: <path>/run_.../
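The three resolution rules, plus a way to find the newest run, can be written as follows. debug_root and latest_run are hypothetical helpers. The sketch assumes an explicit --debug-dir takes precedence when both flags are given. That matches the per-instance examples earlier on this page but is not stated outright.

```python
from pathlib import Path
from typing import Optional


def debug_root(instance_id: Optional[str] = None,
               debug_dir: Optional[str] = None) -> Path:
    """Directory that will contain run_* folders, per the rules above."""
    if debug_dir is not None:
        return Path(debug_dir)               # --debug-dir <path>
    if instance_id is not None:
        return Path("debug") / instance_id   # --instance-id <id> only
    return Path("debug")                     # neither flag


def latest_run(root: Path) -> Optional[Path]:
    """Most recently modified run_* directory under root, if any."""
    if not root.is_dir():
        return None
    runs = [p for p in root.glob("run_*") if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)
```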

Each run directory includes:

  • <run_dir>/<run_id>_rows.json
  • <run_dir>/<run_id>_contexts.json
  • <run_dir>/<run_id>_tables.json
  • <run_dir>/annotated-pages/ (bbox-overlaid page images + manifest)

The three JSON files each wrap a shared meta object plus one payload key, and the annotated-pages folder carries its own manifest:

  • rows.json => { "meta": ..., "rows": [...] }
  • contexts.json => { "meta": ..., "contexts": [...] }
  • tables.json => { "meta": ..., "tables": [...] }
  • annotated-pages/manifest.json => per-page overlay image paths and dimensions

The shared meta object includes:

Field        Type       Description
timestamp    string     ISO 8601 timestamp of the run
file         string     Absolute path to the processed PDF
pages        integer[]  1-indexed page numbers processed
template     string     TemplateType value used (e.g. glass_schedule)
columns      string[]   Full column list, including any additional columns
strategies   string[]   Specialist strategy names that fired, or ["monolithic"]
flags        object     Runtime switches such as high_accuracy_tables and single_specialist_mode
total_rows   integer    Number of enriched rows returned

When High Accuracy Tables is enabled, the run directory also includes table-crops/ with a manifest.json and per-table crop images.

Each entry in rows contains data (enriched values), field_sources (per-column provenance), confidence, and reasoning. field_sources and reasoning are the primary fields for diagnosing why a column received a particular value.
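A short script over a *_rows.json file puts each value next to its provenance. This is usually the fastest way to see why a column got a particular value. describe_rows and print_rows_file are hypothetical helpers. The sketch assumes field_sources is stored as a JSON object keyed by column name, like the _field_sources column in the UI.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def describe_rows(payload: Dict[str, Any]) -> List[str]:
    """One line per row plus one line per column showing value and provenance."""
    meta, rows = payload["meta"], payload["rows"]
    lines = [f"{meta['template']} on pages {meta['pages']}: {meta['total_rows']} rows"]
    for index, row in enumerate(rows):
        lines.append(f"row {index}: confidence={row.get('confidence')}")
        sources = row.get("field_sources") or {}
        for column, value in row.get("data", {}).items():
            lines.append(f"  {column} = {value!r} <- {sources.get(column, 'unknown')}")
    return lines


def print_rows_file(path: str) -> None:
    """Load a <run_id>_rows.json file and print its provenance report."""
    for line in describe_rows(json.loads(Path(path).read_text())):
        print(line)
```

If a value looks wrong, check the row's reasoning entry next, since it explains how the enricher arrived at the value.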

Tip

The run directory is the authoritative test artifact for each execution. When reporting a result or filing a bug, attach the full run_* folder (or at minimum *_rows.json, *_contexts.json, and *_tables.json).
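One way to attach the full run folder is to zip it with the standard library. bundle_run is a hypothetical helper built on shutil.make_archive. It writes <run_dir>.zip next to the folder and keeps the run_* directory name inside the archive.

```python
import shutil
from pathlib import Path


def bundle_run(run_dir) -> Path:
    """Zip a whole run_* folder next to it and return the archive path."""
    run_dir = Path(run_dir).resolve()
    if not run_dir.is_dir():
        raise FileNotFoundError(run_dir)
    archive = shutil.make_archive(
        base_name=str(run_dir),   # produces <run_dir>.zip
        format="zip",
        root_dir=run_dir.parent,
        base_dir=run_dir.name,    # keep the run_* folder name inside the zip
    )
    return Path(archive)
```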