Benchmark Workflow

This page covers the batch evaluation tools provided by epica, including case discovery, repeated execution, and summary analysis. The main commands on this page are epa_bench and epa_benchall.

Benchmark Scope

epica includes a benchmark harness for large-scale case evaluation. The main goal is to run the same evaluation workflow over many prepared cases and collect consistent summaries.

In the current repository, the primary benchmark workflow uses a generic cases root that contains benchmark/ and GT/.

Multi-Case Harness

epa_bench runs an independent benchmark over cases discovered from a cases-root directory layout.

It:

  • discovers benchmark cases from prepared dataset and ground-truth directories
  • runs epa on each case
  • writes per-case JSON records, logs, and summary tables

Required Inputs

Keep datasets outside the repository (for example, in /home/username/epa_data) and manage paths through environment variables:

export EPA_DATA_ROOT=/home/username/epa_data

epa_bench defaults to $EPA_CASES_ROOT; if it is unset, it falls back to $EPA_ALIGNANYTHING_ROOT, and then to $EPA_DATA_ROOT/benchmark_cases.
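
The fallback order above can be sketched in Python. This is only an illustration of the documented lookup order, not the code epa_bench runs: the function name resolve_cases_root is hypothetical, and it assumes that an explicit path wins over the environment and that an empty variable counts as unset.

```python
import os
from pathlib import Path


def resolve_cases_root(explicit=None, env=None):
    """Return the cases root using the documented fallback order (sketch)."""
    env = os.environ if env is None else env
    if explicit:  # assumption: a path given on the command line wins
        return Path(explicit)
    # First choice, then first fallback, as described in the text.
    for name in ("EPA_CASES_ROOT", "EPA_ALIGNANYTHING_ROOT"):
        value = env.get(name)
        if value:  # assumption: an empty value is treated as unset
            return Path(value)
    # Last fallback: $EPA_DATA_ROOT/benchmark_cases.
    data_root = env.get("EPA_DATA_ROOT")
    if data_root:
        return Path(data_root) / "benchmark_cases"
    return None  # nothing configured
```

With only EPA_DATA_ROOT=/home/username/epa_data exported, this sketch resolves to /home/username/epa_data/benchmark_cases, which matches the path used in the commands on this page.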

Before running the harness, make sure you have:

  • a cases root with benchmark/ and GT/
  • a Python 3.10+ environment that can run epica

The common case only needs the cases root. These options remain available for scripts and unusual environments:

  • --cases-root
  • --repo-root
  • --python-bin
  • --epa-src
  • --jobs

Supported Case Layouts

epa_bench discovers trajectory files under benchmark/ and matches each sequence to a GT file under GT/.

The original layout is still supported:

cases_root/
├── benchmark/<dataset>/pose/<method>/<sequence>/*_poses.txt
└── GT/**/<sequence>.txt

The pose/ directory is optional. These layouts are also accepted:

cases_root/
├── benchmark/<dataset>/<method>/<sequence>/trajectory.txt
├── benchmark/<dataset>/<method>/<sequence>_poses.txt
└── GT/**/<sequence>.txt

One <method> directory can contain many sequences. For example:

cases_root/
├── benchmark/<dataset>/<method>/seq_01_poses.txt
├── benchmark/<dataset>/<method>/seq_02_poses.txt
├── benchmark/<dataset>/<method>/seq_03/trajectory.txt
└── GT/**/seq_01.txt

GT files can use the .txt, .tum, or .csv extension. Estimation files can use common trajectory names such as *_poses.txt and trajectory.txt, or any file with the .tum or .csv extension.
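
To make the layouts concrete, here is a minimal sketch of how an estimate path could be mapped to a (dataset, method, sequence) triple and matched to a GT file. The helper names parse_case and find_gt are hypothetical, and the matching rules are inferred from the layouts listed above; the real discovery logic in epa_bench may differ.

```python
from pathlib import Path

GT_SUFFIXES = {".txt", ".tum", ".csv"}


def parse_case(benchmark_dir, est_path):
    """Return (dataset, method, sequence) for an estimate file, or None."""
    parts = list(Path(est_path).relative_to(benchmark_dir).parts)
    if len(parts) < 3:
        return None
    dataset, rest = parts[0], parts[1:]
    if rest[0] == "pose":  # the pose/ level is optional
        rest = rest[1:]
    if len(rest) == 3:  # <method>/<sequence>/<file>
        return dataset, rest[0], rest[1]
    if len(rest) == 2:  # <method>/<sequence>_poses.txt
        return dataset, rest[0], Path(rest[1]).stem.removesuffix("_poses")
    return None


def find_gt(gt_dir, sequence):
    """Search GT/ recursively for <sequence> with a supported extension."""
    for path in sorted(Path(gt_dir).rglob(f"{sequence}.*")):
        if path.stem == sequence and path.suffix in GT_SUFFIXES:
            return path
    return None  # such a case would be reported as unresolved
```

In this sketch a sequence without a matching GT file returns None, which corresponds to the cases that end up in unresolved_cases.csv.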

Basic Run

If you want the main batch benchmark workflow, start here:

epa_bench /home/username/epa_data/benchmark_cases

The default is --jobs auto, which uses the available CPU cores without exceeding the number of cases. Add --jobs N when you want to override it.
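
The --jobs auto rule can be written as a short Python sketch. The function name resolve_jobs is hypothetical, and the handling of an explicit N (used as given, with a floor of 1) is an assumption rather than documented behavior.

```python
import os


def resolve_jobs(jobs, n_cases):
    """Interpret a --jobs value: 'auto' or a positive integer (sketch)."""
    if n_cases <= 0:
        return 1
    if jobs == "auto":
        cores = os.cpu_count() or 1  # cpu_count() may return None
        # Use the available cores, but never more workers than cases.
        return max(1, min(cores, n_cases))
    # Assumption: an explicit N is taken as given, with a floor of 1.
    return max(1, int(jobs))
```

For example, on an 8-core machine with 3 discovered cases, this sketch would use 3 workers.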

This command:

  1. discovers benchmark cases
  2. prepares temporary TUM trajectories for each case
  3. runs epa on each case
  4. aggregates the per-case outputs into a summary table

Full Multi-Case Workflow

Use epa_benchall if you want the full batch workflow in one command:

  • run the benchmark harness
  • generate summary plots
  • generate LaTeX tables

Typical command:

epa_benchall /home/username/epa_data/benchmark_cases

Per-Case Visualization (Raw / GT / Aligned)

For each dataset case, use the main pipeline with --rerun to visualize:

  • GT trajectory
  • raw trajectory after step-1 time sync
  • aligned trajectory after step-3 world alignment

Quick command (recommended):

epa_rerun \
  --run-dir outputs/<cases_root_name>_bench \
  --case euroc_mav_MH_01_easy_rovio

The command resolves the latest run_* automatically when --run-dir points to the parent harness directory.
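
Because run directories are named run_YYYYmmdd_HHMMSS, the newest run is also the last one in lexicographic order. A minimal sketch of that resolution follows; latest_run_dir is a hypothetical helper for use in your own scripts, not part of epica.

```python
from pathlib import Path


def latest_run_dir(run_dir):
    """Return run_dir itself if it is a run_* directory, else its newest run_* child."""
    run_dir = Path(run_dir)
    if run_dir.name.startswith("run_"):
        return run_dir
    # Timestamped names sort chronologically when sorted as strings.
    runs = sorted(p for p in run_dir.glob("run_*") if p.is_dir())
    return runs[-1] if runs else None
```

The same helper is handy when a script needs the most recent summary.csv, for example latest_run_dir("outputs/<cases_root_name>_bench") / "summary.csv".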

Direct epa command (manual paths):

epa /path/to/gt.tum /path/to/est.tum --rerun

Useful Filters

Use these options for smaller or more targeted runs:

  • --case-pattern: regex filter for case IDs
  • --methods: comma-separated method filter such as rovio,svo_stereo
  • --limit: cap the number of cases
  • --dry-run: discover cases without executing them

Examples:

List matching cases only:

epa_bench \
  /home/username/epa_data/benchmark_cases \
  --case-pattern euroc \
  --dry-run

Run only a subset of methods:

epa_bench \
  /home/username/epa_data/benchmark_cases \
  --methods rovio,svo_stereo \
  --limit 20
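
How these filters combine can be illustrated with a small Python sketch. It assumes each discovered case is a dict with "case" and "method" keys, that --case-pattern matches anywhere in the case ID, and that --limit is applied after the other filters. None of these details are confirmed by this page, and filter_cases is a hypothetical name.

```python
import re


def filter_cases(cases, case_pattern=None, methods=None, limit=None):
    """Apply regex, method, and limit filters to a list of case dicts (sketch)."""
    regex = re.compile(case_pattern) if case_pattern else None
    wanted = None
    if methods:
        # "rovio,svo_stereo" -> {"rovio", "svo_stereo"}
        wanted = {m.strip() for m in methods.split(",") if m.strip()}
    selected = []
    for case in cases:
        if regex and not regex.search(case["case"]):
            continue
        if wanted and case["method"] not in wanted:
            continue
        selected.append(case)
    # Assumption: the cap is applied last, after filtering.
    return selected if limit is None else selected[:limit]
```

Running --dry-run first is the quickest way to check what a given combination of filters actually selects.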

Output Structure

Each harness run creates a fresh run directory under the benchmark output root.

Typical directory tree:

outputs/<cases_root_name>_bench/run_YYYYmmdd_HHMMSS/
├── harness_config.json
├── summary.csv
├── summary.md
├── unresolved_cases.csv
├── cases/
│   └── *.json
├── logs/
│   └── ...
└── paper_tables/
    ├── main_table.tex
    ├── dataset_table.tex
    └── appendix_full_table.tex

Common contents:

  • summary.csv: machine-readable per-case summary
  • summary.md: human-readable summary table
  • paper_tables/main_table.tex: paper-ready short LaTeX table
  • paper_tables/dataset_table.tex: dataset-level aggregate LaTeX table
  • paper_tables/appendix_full_table.tex: full per-case LaTeX longtable for appendix
  • cases/*.json: one JSON file per case
  • logs/: stdout and stderr logs for executed tools
  • harness_config.json: the run configuration snapshot
  • unresolved_cases.csv: discovered but unresolved cases, when applicable

prepared_tum/ is removed by default to keep benchmark outputs small. Add --keep-prepared if you want to keep those intermediate files for later epa_rerun debugging.

Analysis Notebook

For exploratory benchmark analysis, install the analysis extra and open the notebook:

python -m pip install "epica[analysis]"
jupyter notebook notebooks/benchmark_analysis.ipynb

For local repository development, use:

python -m pip install -e '.[analysis]'

The notebook reads an existing summary.csv, summarizes datasets and methods, ranks suspicious cases, and displays the plots already generated by the benchmark workflow.

Key Columns in summary.csv

Important fields include:

  • case, dataset, method
  • status, epa_status
  • epa_offset_est_s
  • epa_ate_rmse_raw_m, epa_ate_rmse_step3_m
  • epa_improve_pct
  • epa_matches_equivalent
  • epa_run_dir
  • error

These fields cover per-case metrics, output locations, and failure diagnosis.
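
As a small example of working with these columns outside the notebook, the sketch below averages epa_ate_rmse_step3_m per method using only the standard library. The column names come from the list above; the function name is hypothetical, and skipping rows with an empty or non-numeric value is an assumption about how failed cases appear in the file.

```python
import csv
from collections import defaultdict


def mean_step3_rmse_by_method(rows):
    """Average epa_ate_rmse_step3_m per method, skipping rows without a number."""
    totals = defaultdict(lambda: [0.0, 0])  # method -> [sum, count]
    for row in rows:
        try:
            value = float(row.get("epa_ate_rmse_step3_m", ""))
        except (TypeError, ValueError):
            continue  # empty or non-numeric, e.g. a failed case
        if value != value:  # NaN is the only value not equal to itself
            continue
        totals[row["method"]][0] += value
        totals[row["method"]][1] += 1
    return {method: s / n for method, (s, n) in totals.items()}


def load_summary(path):
    """Read summary.csv into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

A typical call would be mean_step3_rmse_by_method(load_summary("summary.csv")), run from inside a run directory.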

Summary Plot Generation

Use epa_plot_summary to generate charts from a harness summary.csv.

Example:

epa_plot_summary \
  --summary-csv outputs/<cases_root_name>_bench/run_xxx/summary.csv

By default, plots are written to:

<summary_dir>/plots/

Typical figures:

  • status count chart
  • top-case aligned RMSE chart
  • other benchmark summary figures under <summary_dir>/plots/

Useful option:

  • --top-k: number of cases included in the top-case bar chart

LaTeX Table Generation

Use epa_latex_summary to generate paper-ready LaTeX tables from a harness summary.csv.

epa_latex_summary \
  --summary-csv outputs/<cases_root_name>_bench/run_xxx/summary.csv

By default, tables are written to <summary_dir>/paper_tables/ with:

  • main_table.tex: compact main-paper table (automatically compressed when there are too many cases)
  • dataset_table.tex: dataset-level aggregate table
  • appendix_full_table.tex: full longtable for appendix

epa_bench also generates this paper_tables/ directory automatically at the end of each run.

Use in Overleaf (2 ways)

  1. Drag the files directly into your Overleaf project

     • Drag paper_tables/*.tex into the project root (or a subfolder).
     • In your manuscript preamble, add \usepackage{booktabs} and \usepackage{longtable}.
     • Insert the tables where needed (if the files are in a subfolder such as tables/, use relative paths like \input{tables/main_table.tex}):

\input{main_table.tex}
\input{dataset_table.tex}
\input{appendix_full_table.tex}

  2. Copy and paste the table code

     • Open the target .tex table file and copy the table block into your manuscript body or appendix.
     • Keep the same preamble requirements: \usepackage{booktabs} and \usepackage{longtable}.
     • main_table.tex and dataset_table.tex are table environments, while appendix_full_table.tex is a longtable environment.

metrics.json Aggregation

Use epa_metric_res to aggregate or compare one or more metrics.json files directly, without using the full harness summary workflow.

Typical example:

epa_metric_res \
  --metrics-json /path/to/metrics_a.json /path/to/metrics_b.json \
  --mode aggregate \
  --metric all \
  --stage step3

Main modes:

  • single: stage comparison within one run
  • aggregate: cross-run comparison
  • auto: infer mode from the number of inputs

Useful options:

  • --metric
  • --ape-relation
  • --rpe-relation
  • --stage
  • --x-dimension
  • --out-dir

Recommended workflow:

  1. Run epa_bench
  2. Inspect summary.csv and summary.md
  3. Generate charts with epa_plot_summary
  4. Drill into failed cases using cases/*.json and logs/
  5. Use epa_metric_res when you want additional aggregation across selected runs

Troubleshooting Hints

If a harness run is incomplete or noisy, check these first:

  • the cases root path
  • the active Python environment
  • unresolved cases listed in unresolved_cases.csv
  • per-case stderr logs under logs/

If many cases have poor match counts, revisit:

  • t-max-diff
  • minimum match ratio