Benchmark Workflow
This page covers the batch evaluation tools provided by epica, including case discovery, repeated execution, and summary analysis. The main commands on this page are epa_bench and epa_benchall.
Benchmark Scope
epica includes a benchmark harness for large-scale case evaluation. The main goal is to run the same evaluation workflow over many prepared cases and collect consistent summaries.
In the current repository, the primary benchmark workflow uses a generic cases root that contains benchmark/ and GT/.
Multi-Case Harness
epa_bench runs an independent benchmark over cases discovered from a cases-root directory layout.
It:
- discovers benchmark cases from prepared dataset and ground-truth directories
- runs epa on each case
- writes per-case JSON records, logs, and summary tables
Required Inputs
It is recommended to keep datasets outside the repository (for example, /home/username/epa_data) and manage paths through environment variables:
export EPA_DATA_ROOT=/home/username/epa_data
When no cases root is given on the command line, epa_bench uses $EPA_CASES_ROOT; if that is unset, it falls back to $EPA_ALIGNANYTHING_ROOT, and then to $EPA_DATA_ROOT/benchmark_cases.
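The fallback order above can be sketched in a few lines of Python. This is only an illustration of the documented order, not epica's actual code: the function name resolve_cases_root is hypothetical, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cases_root(cli_value: Optional[str] = None,
                       env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Hypothetical helper mirroring the documented fallback order."""
    # An explicit command-line value wins over every environment variable.
    if cli_value:
        return Path(cli_value)
    # Then the two dedicated variables, in the documented order.
    for name in ("EPA_CASES_ROOT", "EPA_ALIGNANYTHING_ROOT"):
        if env.get(name):  # assumption: an empty value counts as unset
            return Path(env[name])
    # Finally the generic data root with the benchmark_cases subdirectory.
    if env.get("EPA_DATA_ROOT"):
        return Path(env["EPA_DATA_ROOT"]) / "benchmark_cases"
    return None
```

A script that wraps epa_bench could use a helper like this to print the directory it expects the harness to pick up before starting a long run.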
Before running the harness, make sure you have:
- a cases root with benchmark/ and GT/
- a Python 3.10+ environment that can run epica
The common case only needs the cases root. These path options remain available for scripts and unusual environments:
- --cases-root
- --repo-root
- --python-bin
- --epa-src
- --jobs
Supported Case Layouts
epa_bench discovers trajectory files under benchmark/ and matches each sequence to a GT file under GT/.
The original layout is still supported:
cases_root/
├── benchmark/<dataset>/pose/<method>/<sequence>/*_poses.txt
└── GT/**/<sequence>.txt
The pose/ directory is optional. These layouts are also accepted:
cases_root/
├── benchmark/<dataset>/<method>/<sequence>/trajectory.txt
├── benchmark/<dataset>/<method>/<sequence>_poses.txt
└── GT/**/<sequence>.txt
One <method> directory can contain many sequences. For example:
cases_root/
├── benchmark/<dataset>/<method>/seq_01_poses.txt
├── benchmark/<dataset>/<method>/seq_02_poses.txt
├── benchmark/<dataset>/<method>/seq_03/trajectory.txt
└── GT/**/seq_01.txt
GT files can use .txt, .tum, or .csv. Estimation files can use common trajectory names such as *_poses.txt, trajectory.txt, .tum, or .csv.
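To make the accepted layouts concrete, the following sketch derives (dataset, method, sequence) from a path relative to benchmark/. It is a simplified illustration under stated assumptions, not the discovery logic epa_bench actually uses: parse_case is a hypothetical name, the optional level is assumed to be literally named pose, and a method directory that is itself called pose is not handled.

```python
from pathlib import PurePosixPath


def parse_case(rel_path: str) -> tuple[str, str, str]:
    """Return (dataset, method, sequence) for a path relative to benchmark/."""
    parts = list(PurePosixPath(rel_path).parts)
    if len(parts) < 3:
        raise ValueError(f"path too short: {rel_path}")
    dataset, rest = parts[0], parts[1:]
    if rest[0] == "pose":          # the pose/ level is optional
        rest = rest[1:]
    if not rest:
        raise ValueError(f"missing method directory: {rel_path}")
    method, tail = rest[0], rest[1:]
    if len(tail) == 2:             # <sequence>/<trajectory file>
        sequence = tail[0]
    elif len(tail) == 1:           # <sequence>_poses.txt directly under <method>
        sequence = PurePosixPath(tail[0]).stem.removesuffix("_poses")
    else:
        raise ValueError(f"unsupported layout: {rel_path}")
    return dataset, method, sequence
```

The sequence name recovered this way is what would be looked up as <sequence>.txt (or .tum, .csv) somewhere under GT/.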
Basic Run
If you want the main batch benchmark workflow, start here:
epa_bench /home/username/epa_data/benchmark_cases
The default is --jobs auto, which uses the available CPU cores without exceeding the number of cases. Add --jobs N when you want to override it.
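The --jobs auto rule described above amounts to taking the smaller of the core count and the case count. A minimal sketch, assuming a floor of one worker (the floor and the function name auto_jobs are assumptions, not taken from the epica source):

```python
import os
from typing import Optional


def auto_jobs(n_cases: int, cpu_count: Optional[int] = None) -> int:
    """Pick a worker count: available cores, capped by the number of cases."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Never start more workers than cases, and never fewer than one.
    return max(1, min(cores, n_cases))
```

With 8 cores and 3 cases this yields 3 workers; with 8 cores and 100 cases it yields 8.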
This command:
- discovers benchmark cases
- prepares temporary TUM trajectories for each case
- runs epa on each case
- aggregates the per-case outputs into a summary table
Full Multi-Case Workflow
Use epa_benchall if you want the full batch workflow in one command:
- run the benchmark harness
- generate summary plots
- generate LaTeX tables
Typical command:
epa_benchall /home/username/epa_data/benchmark_cases
Per-Case Visualization (Raw / GT / Aligned)
For each dataset case, use the main pipeline with --rerun to visualize:
- GT trajectory
- raw trajectory after step-1 time sync
- aligned trajectory after step-3 world alignment
Quick command (recommended):
epa_rerun \
--run-dir outputs/<cases_root_name>_bench \
--case euroc_mav_MH_01_easy_rovio
The command resolves the latest run_* automatically when --run-dir points to the parent harness directory.
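Because run directories are named run_YYYYmmdd_HHMMSS, the newest one is also the last one in lexicographic order. The sketch below shows one way to resolve it from the parent harness directory; latest_run is a hypothetical helper, and epa_rerun may resolve the directory differently.

```python
import re
from pathlib import Path
from typing import Optional

# Matches names such as run_20240131_235959.
RUN_NAME = re.compile(r"run_\d{8}_\d{6}")


def latest_run(parent: Path) -> Optional[Path]:
    """Return the newest run_* directory under parent, or None if there is none."""
    runs = [p for p in parent.iterdir()
            if p.is_dir() and RUN_NAME.fullmatch(p.name)]
    # Zero-padded timestamps sort chronologically as plain strings.
    return max(runs, key=lambda p: p.name, default=None)
```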
Direct epa command (manual paths):
epa /path/to/gt.tum /path/to/est.tum --rerun
Useful Filters
Use these options for smaller or more targeted runs:
- --case-pattern: regex filter for case IDs
- --methods: comma-separated method filter such as rovio,svo_stereo
- --limit: cap the number of cases
- --dry-run: discover cases without executing them
Examples:
List matching cases only:
epa_bench \
/home/username/epa_data/benchmark_cases \
--case-pattern euroc \
--dry-run
Run only a subset of methods:
epa_bench \
/home/username/epa_data/benchmark_cases \
--methods rovio,svo_stereo \
--limit 20
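The three selection filters compose naturally: pattern first, then methods, then the cap. The following sketch shows that composition on a list of case records. It is illustrative only: filter_cases is a hypothetical function, the record keys case and method are borrowed from the summary.csv columns, and using re.search (match anywhere in the ID) rather than a full match is an assumption.

```python
import re
from typing import Optional


def filter_cases(cases: list[dict],
                 case_pattern: Optional[str] = None,
                 methods: Optional[str] = None,
                 limit: Optional[int] = None) -> list[dict]:
    """Apply a regex filter, a comma-separated method filter, and a cap."""
    regex = re.compile(case_pattern) if case_pattern else None
    wanted = None
    if methods:
        wanted = {m.strip() for m in methods.split(",") if m.strip()}
    selected = []
    for case in cases:
        if regex and not regex.search(case["case"]):
            continue
        if wanted is not None and case["method"] not in wanted:
            continue
        selected.append(case)
    # The cap is applied last, after both filters.
    return selected if limit is None else selected[:limit]
```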
Output Structure
Each harness run creates a fresh run directory under the benchmark output root.
Typical layout:
outputs/<cases_root_name>_bench/run_YYYYmmdd_HHMMSS/
Typical directory tree:
outputs/<cases_root_name>_bench/run_YYYYmmdd_HHMMSS/
├── harness_config.json
├── summary.csv
├── summary.md
├── unresolved_cases.csv
├── cases/
│ └── *.json
├── logs/
│ └── ...
├── paper_tables/
│ ├── main_table.tex
│ ├── dataset_table.tex
│ └── appendix_full_table.tex
Common contents:
- summary.csv: machine-readable per-case summary
- summary.md: human-readable summary table
- paper_tables/main_table.tex: paper-ready short LaTeX table
- paper_tables/dataset_table.tex: dataset-level aggregate LaTeX table
- paper_tables/appendix_full_table.tex: full per-case LaTeX longtable for appendix
- cases/*.json: one JSON file per case
- logs/: stdout and stderr logs for executed tools
- harness_config.json: the run configuration snapshot
- unresolved_cases.csv: discovered but unresolved cases, when applicable
prepared_tum/ is removed by default to keep benchmark outputs small. Add --keep-prepared if you want to keep those intermediate files for later epa_rerun debugging.
Analysis Notebook
For exploratory benchmark analysis, install the analysis extra and open the notebook:
python -m pip install "epica[analysis]"
jupyter notebook notebooks/benchmark_analysis.ipynb
For local repository development, use:
python -m pip install -e '.[analysis]'
The notebook reads an existing summary.csv, summarizes datasets and methods, ranks suspicious cases, and displays the plots already generated by the benchmark workflow.
Key Columns in summary.csv
Important fields include:
- case, dataset, method
- status, epa_status
- epa_offset_est_s
- epa_ate_rmse_raw_m, epa_ate_rmse_step3_m
- epa_improve_pct
- epa_matches_equivalent
- epa_run_dir
- error
These fields cover per-case metrics, output locations, and failure diagnosis.
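As a rough illustration of how these columns can be consumed outside the notebook, the sketch below counts cases per status and averages the aligned RMSE per method using only the standard library. The sample rows and the status strings ok and failed are made up for the example; real status values and numbers come from your own summary.csv.

```python
import csv
import io
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical sample data; only the column names come from the documentation.
SAMPLE = """case,dataset,method,status,epa_ate_rmse_raw_m,epa_ate_rmse_step3_m
d1_s1_m1,d1,m1,ok,0.50,0.10
d1_s2_m1,d1,m1,ok,0.40,0.20
d1_s1_m2,d1,m2,failed,,
"""


def summarize(csv_text: str) -> tuple[Counter, dict[str, float]]:
    """Return (cases per status, mean step-3 ATE RMSE per method)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    status_counts = Counter(row["status"] for row in rows)
    per_method = defaultdict(list)
    for row in rows:
        value = row["epa_ate_rmse_step3_m"]
        if value:  # failed cases may leave the metric empty
            per_method[row["method"]].append(float(value))
    mean_rmse = {method: mean(values) for method, values in per_method.items()}
    return status_counts, mean_rmse
```

To run it on a real file, pass the file contents, for example summarize(open(path, newline="").read()).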
Summary Plot Generation
Use epa_plot_summary to generate charts from a harness summary.csv.
Example:
epa_plot_summary \
--summary-csv outputs/<cases_root_name>_bench/run_xxx/summary.csv
By default, plots are written to:
<summary_dir>/plots/
Typical figures:
- status count chart
- top-case aligned RMSE chart
- other benchmark summary figures under <summary_dir>/plots/
Useful option:
--top-k: number of cases included in the top-case bar chart
LaTeX Table Generation
Use epa_latex_summary to generate paper-ready LaTeX tables from a harness summary.csv.
epa_latex_summary \
--summary-csv outputs/<cases_root_name>_bench/run_xxx/summary.csv
By default, tables are written to <summary_dir>/paper_tables/ with:
- main_table.tex: compact main-paper table (automatically compressed when there are too many cases)
- dataset_table.tex: dataset-level aggregate table
- appendix_full_table.tex: full longtable for appendix
epa_bench also generates this paper_tables/ directory automatically at the end of each run.
Use in Overleaf (2 ways)
- Directly drag files into your Overleaf project
  - Drag paper_tables/*.tex into the project root (or a subfolder).
  - In your manuscript preamble, add \usepackage{booktabs} and \usepackage{longtable}.
  - Insert tables where needed (if files are in a subfolder such as tables/, use relative paths like \input{tables/main_table.tex}):
\input{main_table.tex}
\input{dataset_table.tex}
\input{appendix_full_table.tex}
- Copy and paste table code
  - Open the target .tex table file and copy the table block into your manuscript body/appendix.
  - Keep the same preamble requirements: \usepackage{booktabs} and \usepackage{longtable}.
  - main_table.tex and dataset_table.tex are table environments, while appendix_full_table.tex is a longtable environment.
metrics.json Aggregation
Use epa_metric_res to aggregate or compare one or more metrics.json files directly, without using the full harness summary workflow.
Typical example:
epa_metric_res \
--metrics-json /path/to/metrics_a.json /path/to/metrics_b.json \
--mode aggregate \
--metric all \
--stage step3
Main modes:
- single: stage comparison within one run
- aggregate: cross-run comparison
- auto: infer mode from the number of inputs
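The auto mode can be pictured as a simple rule on the number of input files. The sketch below assumes that one file maps to single and two or more map to aggregate; that mapping is a plausible reading of the mode descriptions, not something confirmed by the epa_metric_res source, and infer_mode is a hypothetical name.

```python
def infer_mode(metrics_paths: list[str], mode: str = "auto") -> str:
    """Resolve the effective mode for a list of metrics.json paths."""
    if mode != "auto":
        return mode  # an explicit mode is passed through unchanged
    if not metrics_paths:
        raise ValueError("at least one metrics.json path is required")
    # Assumption: one input means a within-run comparison, more means cross-run.
    return "single" if len(metrics_paths) == 1 else "aggregate"
```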
Useful options:
- --metric
- --ape-relation
- --rpe-relation
- --stage
- --x-dimension
- --out-dir
Recommended Workflow
Recommended workflow:
- Run epa_bench
- Inspect summary.csv and summary.md
- Generate charts with epa_plot_summary
- Drill into failed cases using cases/*.json and logs/
- Use epa_metric_res when you want additional aggregation across selected runs
Troubleshooting Hints
If a harness run is incomplete or noisy, check these first:
- the cases root path
- the active Python environment
- unresolved cases listed in unresolved_cases.csv
- per-case stderr logs under logs/
If many cases have poor match counts, revisit:
- t-max-diff
- minimum match ratio
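To see how these two settings interact, the sketch below pairs each estimated timestamp with its nearest ground-truth timestamp and keeps the pair only when the gap is within a tolerance. It is a generic nearest-timestamp association, not epica's matcher: the reading of t-max-diff as a maximum time difference in seconds, the choice of the estimate count as the ratio's denominator, and both function names are assumptions. For brevity the sketch also lets one ground-truth stamp be matched more than once.

```python
from bisect import bisect_left


def match_timestamps(gt: list[float], est: list[float],
                     t_max_diff: float) -> list[tuple[float, float]]:
    """Pair each est stamp with the nearest gt stamp (gt must be sorted)."""
    pairs = []
    for t in est:
        i = bisect_left(gt, t)
        # The nearest neighbour is either just before or just after position i.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(gt)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(gt[k] - t))
        if abs(gt[j] - t) <= t_max_diff:
            pairs.append((gt[j], t))
    return pairs


def match_ratio(gt: list[float], est: list[float], t_max_diff: float) -> float:
    """Fraction of est stamps that found a gt partner within t_max_diff."""
    if not est:
        return 0.0
    return len(match_timestamps(gt, est, t_max_diff)) / len(est)
```

Under these assumptions, a larger tolerance raises the match ratio at the cost of pairing poses that are further apart in time, which is the trade-off to keep in mind when many cases show poor match counts.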