Using the Gradio interface
The Gradio interface provides a browser-based way to run the Cartex pipeline against a PDF document and inspect enriched results without writing any code. It is intended for local developer testing and QA runs.
Starting the interface
Launch the interface from the repository root:
Gradio starts a local server on port 7860 by default and opens a browser tab automatically. If the browser does not open, navigate to http://localhost:7860.
Note
Run this command from the repository root. Do not set PYTHONPATH=src. Cartex uses src.-prefixed imports, and setting that variable lets the same module be imported under two different names. The result is a double-module identity bug in which enum comparisons silently fail.
Running independent UI instances
Cartex supports multiple independent UI processes. Start each instance with a unique port and instance ID.
cartex ui --port 7860 --instance-id ui_a --debug-dir debug/ui_a
cartex ui --port 7861 --instance-id ui_b --debug-dir debug/ui_b
To launch multiple instances automatically:
This prints a URL for each instance (for example http://127.0.0.1:7860, :7861, :7862) and creates per-instance debug directories automatically.
By default, each ui-multi launch gets a unique debug session root:
debug/ui_multi_<timestamp>/
Each instance writes terminal logs to:
debug/ui_multi_<timestamp>/<instance_id>/ui.log
Example:
tail -f debug/ui_multi_20260413_011500_123456/ui_01/ui.log
tail -f debug/ui_multi_20260413_011500_123456/ui_02/ui.log
Use --inherit-logs if you want all instance logs streamed into the current terminal instead.
In --inherit-logs mode, each line is prefixed with its stream and instance ID so the source is explicit:
[stdout][ui_01] ...
[stderr][ui_02] ...
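When all instances stream into one terminal, it can help to split the merged output back out per instance. The following is a minimal sketch, not part of Cartex: it assumes the prefix has exactly the [stream][instance_id] shape shown in the example lines, and the function name split_by_instance is hypothetical.

```python
import re
from collections import defaultdict

# Assumed prefix shape, taken from the example lines: [stream][instance_id] message
PREFIX = re.compile(r"^\[(stdout|stderr)\]\[([^\]]+)\]\s?(.*)$")


def split_by_instance(lines):
    """Group prefixed log lines by instance ID; lines without a prefix are skipped."""
    grouped = defaultdict(list)
    for line in lines:
        match = PREFIX.match(line.rstrip("\n"))
        if match:
            stream, instance_id, message = match.groups()
            grouped[instance_id].append((stream, message))
    return dict(grouped)
```

Calling split_by_instance on the captured terminal lines returns a dictionary keyed by instance ID, with each value holding (stream, message) pairs in their original order.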
Flags:
- --port: binds each UI instance to a different port
- --host: network interface (default 127.0.0.1)
- --instance-id: stamps run metadata and run IDs for that process
- --debug-dir: sends artifacts to an isolated directory
- --debug-root: sets a custom root for ui-multi instance folders; default is a unique session root per launch
Giving each instance its own port, instance ID, and debug directory prevents artifact collisions during concurrent QA sessions.
Uploading a document
Use the Upload PDF file picker to select a PDF document. After upload, the Page Preview gallery renders all pages in the selected page range. The preview updates automatically when the Page Numbers field changes.
Page numbers
The Page Numbers field accepts comma-separated page numbers, ranges, or a combination:
| Input | Pages processed |
|---|---|
| 1 | Page 1 only |
| 1,3,5 | Pages 1, 3, and 5 |
| 1-5 | Pages 1 through 5 inclusive |
| 1,3-5 | Pages 1, 3, 4, and 5 |
Page numbers are 1-indexed. When multiple pages are specified, the pipeline calls extract_pages(), which runs a multi-page extraction pass that deduplicates context items across pages and merges all detected tables into a single ExtractionResult.
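The input grammar in the table can be illustrated with a short parser. This is a sketch under assumptions, not the Cartex implementation: the function name parse_page_numbers is hypothetical, and sorting, deduplicating, and rejecting reversed ranges are choices made here for illustration.

```python
def parse_page_numbers(spec: str) -> list[int]:
    """Expand a spec such as "1,3-5" into a sorted list of unique 1-indexed pages."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))  # ranges are inclusive
    return sorted(pages)
```

With this sketch, the input 1,3-5 expands to pages 1, 3, 4, and 5, matching the last row of the table.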
Template selection
The Template dropdown controls which column schema is applied during enrichment. Each option maps to a TemplateType enum value and a fixed base column list defined in src/templates.py.
| Display name | TemplateType | Use when |
|---|---|---|
| Standard Takeoff | STANDARD_TAKEOFF | Standard window/door schedule with operability, material, and rough opening |
| Standard Takeoff + TDL/SDL | STANDARD_TAKEOFF_TDL | Standard schedule that also tracks divided light types (Dividers TDL Type, Dividers SDL Type) |
| Glass Schedule | GLASS_SCHEDULE | Dedicated glass schedules with layer, brand, arrangement, and spacer columns |
| Shop Details | SHOP_DETAILS | Shop drawing detail sheets with frame profile, hardware, finish, and installation columns |
Additional columns
The Additional Columns checkbox group lets you append fields from FIELD_LIBRARY to the base template columns. FIELD_LIBRARY is the full set of known fields defined in src/templates.py.
Selected columns are appended after the template's default columns in the output. Columns already present in the selected template are silently deduplicated. For example, selecting Special Notes when using Glass Schedule has no effect.
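The append-and-deduplicate behaviour can be sketched as an order-preserving merge. The helper name merge_columns and the column lists in the usage note are hypothetical and are not taken from src/templates.py.

```python
def merge_columns(base_columns, additional_columns):
    """Append additional columns after the base list, skipping any already present."""
    merged = list(base_columns)
    seen = set(merged)
    for column in additional_columns:
        if column not in seen:
            merged.append(column)
            seen.add(column)
    return merged
```

For an illustrative base list of Mark and Special Notes, merging in Special Notes and Source Type yields Mark, Special Notes, Source Type: the duplicate is dropped and the new column lands at the end.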
Runtime options
The UI provides two execution toggles:
- High Accuracy Tables (BBox Crop)
  Enables table detection + per-bbox crop extraction during Stage 1. This affects table extraction only; context extraction is unchanged.
- Single Specialist Mode (Monolithic)
  Bypasses router/specialist staging and runs the monolithic ENRICHMENT prompt directly during Stage 3.
When both are enabled, extraction uses high-accuracy tables and enrichment still runs in monolithic mode.
Running the pipeline
Click Run Pipeline to start processing. The Pipeline Log shows live progress across three stages:
- [1/3] Extraction: tables and context items are detected on the specified pages. The Page Preview gallery updates with coloured bounding-box overlays for each page: blue for main tables, green for auxiliary tables, and orange for context regions. Each page image is labeled with its page number. When running on multiple pages, all pages are displayed in the gallery with their respective annotations.
- [2/3] Routing: in default mode, the router selects specialist strategies, plans execution order, and assigns context to each specialist. In monolithic mode, this stage is skipped.
- [3/3] Enrichment: runs staged specialists (default mode) or one monolithic enrichment call (monolithic mode), then reports the enriched row count.
On success, the Enriched Table displays one row per extracted schedule row. In addition to the template columns, three diagnostic columns appear:
| Column | Contents |
|---|---|
| _confidence | Numeric confidence score from the enricher |
| _reasoning | Free-text explanation of how the row was enriched |
| _field_sources | JSON object mapping each column name to the FieldSource that produced its value (e.g. auxiliary_table, text_rule, image_legend, dimension_card) |
If the pipeline fails, an Error Traceback panel appears below the table with the full Python traceback.
Running multiple documents in parallel (CLI)
For benchmark sweeps or regression batches, use the process-based batch runner instead of opening multiple UI tabs.
Jobs manifest format
Create a JSON array where each object defines one document job.
[
{
"job_id": "parklane_p1",
"file_path": "misc/Parklane.pdf",
"page_numbers": [1],
"template": "glass_schedule",
"use_table_bbox_crop": false,
"force_monolithic": false
},
{
"job_id": "kingsbrook_p1_2",
"file_path": "misc/Kingsbrook.pdf",
"page_numbers": [1, 2],
"template": "standard_takeoff",
"extra_columns": ["Source Type"]
}
]
Supported keys per job:
- job_id (optional): stable identifier for artifacts and summaries
- file_path (required): PDF path
- page_numbers (required): 1-indexed page list
- template (required): TemplateType value (standard_takeoff, glass_schedule, etc.)
- columns (optional): explicit full column list (overrides template defaults)
- extra_columns (optional): append-only list added after template defaults
- use_table_bbox_crop (optional): enables High Accuracy Tables for this job
- force_monolithic (optional): enables monolithic enrichment for this job
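Before starting a long batch, it can be worth checking a manifest against the key list above. The following is a minimal pre-flight sketch, not part of the batch runner: the function name validate_manifest is hypothetical, and flagging keys outside the documented set is a choice of this sketch rather than documented runner behaviour.

```python
import json

REQUIRED_KEYS = {"file_path", "page_numbers", "template"}
OPTIONAL_KEYS = {"job_id", "columns", "extra_columns", "use_table_bbox_crop", "force_monolithic"}


def validate_manifest(text):
    """Return a list of problems; an empty list means the manifest looks well-formed."""
    jobs = json.loads(text)
    if not isinstance(jobs, list):
        return ["manifest must be a JSON array of job objects"]
    problems = []
    for index, job in enumerate(jobs):
        if not isinstance(job, dict):
            problems.append(f"job #{index}: not an object")
            continue
        label = job.get("job_id", f"#{index}")
        keys = set(job)
        for key in sorted(REQUIRED_KEYS - keys):
            problems.append(f"job {label}: missing required key {key!r}")
        for key in sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS):
            problems.append(f"job {label}: unknown key {key!r}")
        pages = job.get("page_numbers", [])
        if not isinstance(pages, list) or any(not isinstance(p, int) or p < 1 for p in pages):
            problems.append(f"job {label}: page_numbers must be a list of 1-indexed integers")
    return problems
```

Pass the manifest file's contents as a string. The sketch checks structure only; it does not confirm that file_path exists or that template names a real TemplateType value.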
Parallelism model
run_batch uses one Python process per in-flight job (ProcessPoolExecutor). Start with a small worker count (for example 2-4) and scale based on CPU headroom and Gemini API quota.
Batch artifacts
Each batch run writes:
- <output-dir>/summary.json with per-job status (ok or failed) and artifact paths
- one JSON artifact per job containing input metadata, error state, and enriched rows
This makes multi-document runs auditable without mixing outputs from different fixtures.
Reading debug output
Every successful run writes a run directory in the configured debug root:
- default: debug/run_.../
- with --instance-id <id> and no explicit --debug-dir: debug/<id>/run_.../
- with --debug-dir <path>: <path>/run_.../
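The precedence among these three cases can be written down as a small helper, which is handy in QA scripts that need to find the newest run folder. Both function names are hypothetical, and picking the latest run by modification time is an assumption of this sketch, since the full run_... naming pattern is not spelled out here.

```python
from pathlib import Path
from typing import Optional


def resolve_debug_root(instance_id: Optional[str] = None, debug_dir: Optional[str] = None) -> Path:
    """Mirror the documented precedence: --debug-dir wins, then --instance-id, then debug/."""
    if debug_dir:
        return Path(debug_dir)
    if instance_id:
        return Path("debug") / instance_id
    return Path("debug")


def latest_run_dir(root: Path) -> Optional[Path]:
    """Return the most recently modified run_* directory under root, or None if there is none."""
    runs = [p for p in root.glob("run_*") if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)
```

For example, latest_run_dir(resolve_debug_root(instance_id="ui_a")) looks under debug/ui_a/ for the newest run directory.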
Each run directory includes:
- <run_dir>/<run_id>_rows.json
- <run_dir>/<run_id>_contexts.json
- <run_dir>/<run_id>_tables.json
- <run_dir>/annotated-pages/ (bbox-overlaid page images + manifest)
Each file wraps a shared meta object plus one payload key:
- rows.json => { "meta": ..., "rows": [...] }
- contexts.json => { "meta": ..., "contexts": [...] }
- tables.json => { "meta": ..., "tables": [...] }
- annotated-pages/manifest.json => per-page overlay image paths and dimensions
The shared meta object includes:
| Field | Type | Description |
|---|---|---|
| timestamp | string | ISO 8601 timestamp of the run |
| file | string | Absolute path to the PDF processed |
| pages | integer[] | 1-indexed page numbers processed |
| template | string | TemplateType value used (e.g. glass_schedule) |
| columns | string[] | Full column list, including any additional columns |
| strategies | string[] | Specialist strategy names that fired, or ["monolithic"] |
| flags | object | Runtime switches such as high_accuracy_tables and single_specialist_mode |
| total_rows | integer | Number of enriched rows returned |
When High Accuracy Tables is enabled, the run directory also includes table-crops/ with a manifest.json and per-table crop images.
Each entry in rows contains data (enriched values), field_sources (per-column provenance), confidence, and reasoning. field_sources and reasoning are the primary fields for diagnosing why a column received a particular value.
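A short script can make this kind of diagnosis quicker than reading the JSON by hand. The sketch below relies only on the keys described above (rows, data, field_sources, confidence, reasoning). It assumes field_sources values are serialized as plain strings, and the function names are hypothetical.

```python
import json
from collections import Counter


def load_rows(path):
    """Load a *_rows.json file into a dictionary with "meta" and "rows" keys."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_field_sources(rows_doc):
    """Count how often each source produced a value, across all rows and columns."""
    counts = Counter()
    for row in rows_doc["rows"]:
        counts.update(row.get("field_sources", {}).values())
    return counts


def explain_column(rows_doc, column):
    """Yield (value, source, confidence, reasoning) for one column in every row."""
    for row in rows_doc["rows"]:
        yield (
            row.get("data", {}).get(column),
            row.get("field_sources", {}).get(column),
            row.get("confidence"),
            row.get("reasoning"),
        )
```

summarize_field_sources gives a quick view of which sources dominate a run, and explain_column narrows the investigation to a single suspicious column.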
Tip
The run directory is the authoritative test artifact for each execution. When reporting a result or filing a bug, attach the full run_* folder (or at minimum *_rows.json, *_contexts.json, and *_tables.json).