Benchmark Workflow

This page covers the batch evaluation tools provided by epica, including case discovery, repeated execution, and summary analysis. The main commands on this page are epa_bench and epa_benchall.

Benchmark Scope

epica includes a benchmark harness for large-scale case evaluation. The main goal is to run the same evaluation workflow over many prepared cases and collect consistent summaries.

In the current repository, the primary benchmark workflow uses a generic cases root that contains benchmark/ and GT/.

Multi-Case Harness

epa_bench runs an independent benchmark over cases discovered from a cases-root directory layout.

It:

discovers benchmark cases from prepared dataset and ground-truth directories
runs epa on each case
writes per-case JSON records, logs, and summary tables

Required Inputs

It is recommended to keep datasets outside the repository (for example, /home/username/epa_data) and manage paths through environment variables:

export EPA_DATA_ROOT=/home/username/epa_data

When no cases root is passed on the command line, epa_bench uses $EPA_CASES_ROOT; if that is unset, it falls back to $EPA_ALIGNANYTHING_ROOT, and then to $EPA_DATA_ROOT/benchmark_cases.
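The fallback order can be sketched in a few lines of Python. This is an illustrative sketch, not epa_bench's actual code: the function name resolve_cases_root is hypothetical, and the rule that an explicit argument takes precedence over the environment is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_cases_root(cli_value: Optional[str] = None,
                       env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Hypothetical helper: pick a cases root using the documented fallback order."""
    env = os.environ if env is None else env
    if cli_value:
        # Assumption: a path given on the command line overrides the environment.
        return Path(cli_value)
    for name in ("EPA_CASES_ROOT", "EPA_ALIGNANYTHING_ROOT"):
        if env.get(name):
            return Path(env[name])
    if env.get("EPA_DATA_ROOT"):
        return Path(env["EPA_DATA_ROOT"]) / "benchmark_cases"
    return None  # nothing configured


if __name__ == "__main__":
    print(resolve_cases_root())
```

With only EPA_DATA_ROOT exported as shown above, the sketch resolves to /home/username/epa_data/benchmark_cases.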

Before running the harness, make sure you have:

a cases root with benchmark/ and GT/
a Python 3.10+ environment that can run epica

The common case only needs the cases root. These options remain available for scripts and unusual environments:

--cases-root
--repo-root
--python-bin
--epa-src
--jobs

Supported Case Layouts

epa_bench discovers trajectory files under benchmark/ and matches each sequence to a GT file under GT/.

The original layout is still supported:

cases_root/
├── benchmark/<dataset>/pose/<method>/<sequence>/*_poses.txt
└── GT/**/<sequence>.txt

The pose/ directory is optional. These layouts are also accepted:

cases_root/
├── benchmark/<dataset>/<method>/<sequence>/trajectory.txt
├── benchmark/<dataset>/<method>/<sequence>_poses.txt
└── GT/**/<sequence>.txt

One <method> directory can contain many sequences. For example:

cases_root/
├── benchmark/<dataset>/<method>/seq_01_poses.txt
├── benchmark/<dataset>/<method>/seq_02_poses.txt
├── benchmark/<dataset>/<method>/seq_03/trajectory.txt
└── GT/**/seq_01.txt

GT files can use .txt, .tum, or .csv. Estimation files can use common trajectory names such as *_poses.txt, trajectory.txt, .tum, or .csv.
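To make the matching rule concrete, here is a minimal sketch of how an estimate file could be paired with a GT file by sequence name. It is not the harness's discovery code: index_gt and guess_sequence are hypothetical names, and the strategy (try the file stem without _poses, then the parent directory name) is an assumption chosen to cover the layouts listed above.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

GT_SUFFIXES = {".txt", ".tum", ".csv"}


def index_gt(gt_root: Path) -> Dict[str, Path]:
    """Map sequence name -> GT file for every GT/**/<sequence>.<ext>."""
    return {p.stem: p for p in sorted(gt_root.rglob("*"))
            if p.is_file() and p.suffix in GT_SUFFIXES}


def guess_sequence(est_file: Path, known: Iterable[str]) -> Optional[str]:
    """Guess which known sequence an estimate file belongs to, or None."""
    known = set(known)
    stem = est_file.stem
    candidates = [
        # <method>/<sequence>_poses.txt
        stem[: -len("_poses")] if stem.endswith("_poses") else stem,
        # <sequence>/trajectory.txt or <sequence>/*_poses.txt
        est_file.parent.name,
    ]
    return next((c for c in candidates if c in known), None)
```

Checking candidates against the GT index is what lets the sketch distinguish <method>/seq_01_poses.txt from <sequence>/something_poses.txt; a file with no matching GT returns None, which corresponds to an unresolved case.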

Basic Run

If you want the main batch benchmark workflow, start here:

epa_bench /home/username/epa_data/benchmark_cases

The default is --jobs auto, which uses the available CPU cores without exceeding the number of cases. Add --jobs N when you want to override it.
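The behaviour of --jobs auto amounts to taking the smaller of the core count and the case count. The sketch below illustrates that rule; resolve_jobs is a hypothetical name, and passing an explicit N through unchanged is an assumption about how the override behaves.

```python
import os


def resolve_jobs(jobs: str, n_cases: int) -> int:
    """Hypothetical helper: turn a --jobs value ("auto" or an integer string) into a worker count."""
    if jobs == "auto":
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
        return max(1, min(cores, n_cases))
    # Assumption: an explicit value is used as given (at least one worker).
    return max(1, int(jobs))
```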

This command:

discovers benchmark cases
prepares temporary TUM trajectories for each case
runs epa on each case
aggregates the per-case outputs into a summary table

Full Multi-Case Workflow

Use epa_benchall if you want the full batch workflow in one command:

run the benchmark harness
generate summary plots
generate LaTeX tables

Typical command:

epa_benchall /home/username/epa_data/benchmark_cases

Per-Case Visualization (Raw / GT / Aligned)

For each dataset case, use the main pipeline with --rerun to visualize:

GT trajectory
raw trajectory after step-1 time sync
aligned trajectory after step-3 world alignment

Quick command (recommended):

epa_rerun \
  --run-dir outputs/<cases_root_name>_bench \
  --case euroc_mav_MH_01_easy_rovio

The command resolves the latest run_* automatically when --run-dir points to the parent harness directory.
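Because run directories are named run_YYYYmmdd_HHMMSS, the latest run is simply the last name in sorted order. The following sketch shows one way to resolve it; latest_run_dir is a hypothetical helper and may differ from what epa_rerun does internally.

```python
import re
from pathlib import Path

RUN_NAME = re.compile(r"run_\d{8}_\d{6}$")


def latest_run_dir(run_dir: Path) -> Path:
    """Return run_dir itself if it is a run_* directory, else its newest run_* child."""
    if RUN_NAME.match(run_dir.name):
        return run_dir
    runs = sorted(
        (p for p in run_dir.iterdir() if p.is_dir() and RUN_NAME.match(p.name)),
        key=lambda p: p.name,  # timestamped names sort chronologically
    )
    if not runs:
        raise FileNotFoundError(f"no run_* directory under {run_dir}")
    return runs[-1]
```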

Direct epa command (manual paths):

epa /path/to/gt.tum /path/to/est.tum --rerun

Useful Filters

Use these options for smaller or more targeted runs:

--case-pattern: regex filter for case IDs
--methods: comma-separated method filter such as rovio,svo_stereo
--limit: cap the number of cases
--dry-run: discover cases without executing them

Examples:

List matching cases only:

epa_bench \
  /home/username/epa_data/benchmark_cases \
  --case-pattern euroc \
  --dry-run

Run only a subset of methods:

epa_bench \
  /home/username/epa_data/benchmark_cases \
  --methods rovio,svo_stereo \
  --limit 20
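The filters compose naturally: narrow by case ID and method first, then cap the count. The sketch below illustrates that composition under stated assumptions: select_cases is a hypothetical function, the regex is applied with search semantics (a match anywhere in the case ID), and --limit is applied last. The harness itself may order or interpret these differently.

```python
import re
from typing import Iterable, List, Optional, Tuple

Case = Tuple[str, str]  # (case_id, method)


def select_cases(cases: Iterable[Case],
                 case_pattern: Optional[str] = None,
                 methods: Optional[str] = None,
                 limit: Optional[int] = None) -> List[Case]:
    """Hypothetical filter mirroring --case-pattern, --methods, and --limit."""
    regex = re.compile(case_pattern) if case_pattern else None
    wanted = {m.strip() for m in methods.split(",") if m.strip()} if methods else None
    selected = [
        (case_id, method)
        for case_id, method in cases
        if (regex is None or regex.search(case_id))
        and (wanted is None or method in wanted)
    ]
    return selected if limit is None else selected[:limit]
```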

Output Structure

Each harness run creates a fresh run directory under the benchmark output root.

Typical directory tree:

outputs/<cases_root_name>_bench/run_YYYYmmdd_HHMMSS/
├── harness_config.json
├── summary.csv
├── summary.md
├── unresolved_cases.csv
├── cases/
│   └── *.json
├── logs/
│   └── ...
└── paper_tables/
    ├── main_table.tex
    ├── dataset_table.tex
    └── appendix_full_table.tex

Common contents:

summary.csv: machine-readable per-case summary
summary.md: human-readable summary table
paper_tables/main_table.tex: paper-ready short LaTeX table
paper_tables/dataset_table.tex: dataset-level aggregate LaTeX table
paper_tables/appendix_full_table.tex: full per-case LaTeX longtable for appendix
cases/*.json: one JSON file per case
logs/: stdout and stderr logs for executed tools
harness_config.json: the run configuration snapshot
unresolved_cases.csv: discovered but unresolved cases, when applicable

prepared_tum/ is removed by default to keep benchmark outputs small. Add --keep-prepared if you want to keep those intermediate files for later epa_rerun debugging.

Analysis Notebook

For exploratory benchmark analysis, install the analysis extra and open the notebook:

python -m pip install "epica[analysis]"
jupyter notebook notebooks/benchmark_analysis.ipynb

For local repository development, use:

python -m pip install -e '.[analysis]'

The notebook reads an existing summary.csv, summarizes datasets and methods, ranks suspicious cases, and displays the plots already generated by the benchmark workflow.

Key Columns in summary.csv

Important fields include:

case, dataset, method
status, epa_status
epa_offset_est_s
epa_ate_rmse_raw_m, epa_ate_rmse_step3_m
epa_improve_pct
epa_matches_equivalent
epa_run_dir
error

These fields cover per-case metrics, output locations, and failure diagnosis.
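As a quick illustration of working with these columns, the sketch below averages epa_improve_pct per method using only the standard library. The function name is hypothetical, and it assumes that cases without a usable result leave epa_improve_pct empty or non-numeric; it does not inspect status, whose possible values are not documented here.

```python
import csv
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def mean_improvement_by_method(summary_csv: str) -> Dict[str, float]:
    """Average epa_improve_pct per method, skipping rows without a numeric value."""
    per_method: Dict[str, List[float]] = defaultdict(list)
    with open(summary_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            value = (row.get("epa_improve_pct") or "").strip()
            if not value:
                continue  # assumption: failed cases leave this field empty
            try:
                per_method[row["method"]].append(float(value))
            except ValueError:
                continue  # ignore non-numeric entries
    return {method: mean(values) for method, values in per_method.items()}
```

For anything beyond a quick check, the analysis notebook described above is the more complete route.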

Summary Plot Generation

Use epa_plot_summary to generate charts from a harness summary.csv.

Example:

epa_plot_summary \
  --summary-csv outputs/<cases_root_name>_bench/run_xxx/summary.csv

By default, plots are written to:

<summary_dir>/plots/

Typical figures:

status count chart
top-case aligned RMSE chart
other benchmark summary figures under <summary_dir>/plots/

Useful option:

--top-k: number of cases included in the top-case bar chart

LaTeX Table Generation

Use epa_latex_summary to generate paper-ready LaTeX tables from a harness summary.csv.

epa_latex_summary \
  --summary-csv outputs/<cases_root_name>_bench/run_xxx/summary.csv

By default, tables are written to <summary_dir>/paper_tables/ with:

main_table.tex: compact main-paper table (automatically condensed when there are too many cases)
dataset_table.tex: dataset-level aggregate table
appendix_full_table.tex: full longtable for appendix

epa_bench also generates this paper_tables/ directory automatically at the end of each run.

Use in Overleaf (2 ways)

Option 1: Drag the files into your Overleaf project

Drag paper_tables/*.tex into the project root (or a subfolder).
In your manuscript preamble, add \usepackage{booktabs} and \usepackage{longtable}.
Insert the tables where needed. If the files are in a subfolder such as tables/, use relative paths like \input{tables/main_table.tex}:

\input{main_table.tex}
\input{dataset_table.tex}
\input{appendix_full_table.tex}

Option 2: Copy and paste the table code

Open the target .tex table file and copy the table block into your manuscript body or appendix.
Keep the same preamble requirements: \usepackage{booktabs} and \usepackage{longtable}.
main_table.tex and dataset_table.tex are table environments, while appendix_full_table.tex is a longtable environment.

metrics.json Aggregation

Use epa_metric_res to aggregate or compare one or more metrics.json files directly, without using the full harness summary workflow.

Typical example:

epa_metric_res \
  --metrics-json /path/to/metrics_a.json /path/to/metrics_b.json \
  --mode aggregate \
  --metric all \
  --stage step3

Main modes:

single: stage comparison within one run
aggregate: cross-run comparison
auto: infer mode from the number of inputs
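One plausible reading of auto is that a single input selects single and several inputs select aggregate. The sketch below encodes that reading; infer_mode is a hypothetical name and the exact rule used by epa_metric_res is an assumption.

```python
from typing import Sequence


def infer_mode(metrics_json: Sequence[str], mode: str = "auto") -> str:
    """Hypothetical helper: resolve "auto" to "single" or "aggregate" from the input count."""
    if mode != "auto":
        return mode  # an explicit mode is kept as given
    if not metrics_json:
        raise ValueError("at least one metrics.json path is required")
    return "single" if len(metrics_json) == 1 else "aggregate"
```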

Useful options:

--metric
--ape-relation
--rpe-relation
--stage
--x-dimension
--out-dir

Recommended Workflow

Work through a harness run in this order:

Run epa_bench
Inspect summary.csv and summary.md
Generate charts with epa_plot_summary
Drill into failed cases using cases/*.json and logs/
Use epa_metric_res when you want additional aggregation across selected runs

Troubleshooting Hints

If a harness run is incomplete or noisy, check these first:

the cases root path
the active Python environment
unresolved cases listed in unresolved_cases.csv
per-case stderr logs under logs/

If many cases have low match counts, revisit the trajectory matching settings:

t-max-diff
minimum match ratio