Performance considerations
This page discusses performance aspects of JURASSIC, including expected scaling behavior, computational cost drivers, and practical strategies for achieving efficient runtimes on both local systems and HPC platforms.
The focus is on understanding where time is spent and how users can influence performance through configuration and workflow design, given the actual parallelization mechanisms implemented in JURASSIC.
Main performance drivers
The computational cost of a JURASSIC run is primarily determined by:
- number of observations / rays,
- number of detector channels (ND),
- number of spectral windows (NW),
- complexity of the ray geometry (limb vs. nadir),
- use of Jacobians or retrievals,
- choice of lookup tables and interpolation density.
Forward simulations are generally much cheaper than kernel or retrieval runs.
Forward model performance
Radiative transfer cost
For forward modelling, runtime scales approximately with:
O(N_obs × N_ray_segments × ND × NG)
where:
- N_obs is the number of observations,
- N_ray_segments depends on ray length and discretization (RAYDS, RAYDZ),
- ND is the number of detector channels,
- NG is the number of emitting gases.
Limb geometries typically require more ray segments than nadir geometries and are therefore more expensive.
Forward-model executables use serial execution with optional OpenMP threading.
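The loop-level threading follows the usual OpenMP pattern of parallelizing over independent observations or rays. The following minimal C sketch is illustrative only; it does not reproduce JURASSIC's internal data structures, and the ray_radiance function and array sizes are hypothetical stand-ins.

```c
#include <math.h>
#include <stdio.h>

#define NOBS 1000   /* hypothetical number of observations */

/* Hypothetical stand-in for the per-ray radiative transfer work. */
static double ray_radiance(int iobs) {
  double sum = 0.0;
  for (int i = 0; i < 10000; i++)
    sum += exp(-1e-4 * i) * (iobs + 1);
  return sum;
}

int main(void) {
  static double rad[NOBS];

  /* Independent observations/rays can be distributed across threads;
     the loop body carries no dependencies between iterations. */
#pragma omp parallel for
  for (int iobs = 0; iobs < NOBS; iobs++)
    rad[iobs] = ray_radiance(iobs);

  printf("rad[0] = %g\n", rad[0]);
  return 0;
}
```

As with any OpenMP code, the thread count is controlled via the OMP_NUM_THREADS environment variable.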
Kernel and retrieval performance
Jacobians
Kernel calculations require evaluating sensitivities with respect to each retrieved parameter. This typically increases runtime by a factor of O(N_state) relative to a pure forward run, where N_state is the number of retrieved parameters.
Analytic Jacobians significantly reduce this overhead compared to finite-difference approaches, but kernel runs remain substantially more expensive than forward simulations.
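The O(N_state) factor is easiest to see for the finite-difference case: every state-vector element requires one additional perturbed forward run. The sketch below illustrates this; the forward_model function and problem sizes are hypothetical placeholders, not JURASSIC code.

```c
#include <stdio.h>

#define N_STATE 5   /* hypothetical number of retrieved parameters */
#define N_OBS   3   /* hypothetical number of measurements */

/* Hypothetical placeholder for a forward model y = F(x). */
static void forward_model(const double *x, double *y) {
  for (int i = 0; i < N_OBS; i++) {
    y[i] = 0.0;
    for (int j = 0; j < N_STATE; j++)
      y[i] += (i + 1) * x[j] * x[j];
  }
}

/* Finite-difference Jacobian: one extra forward run per state element,
   hence roughly O(N_state) times the cost of a single forward run. */
static void jacobian_fd(const double *x, double K[N_OBS][N_STATE]) {
  double y0[N_OBS], y1[N_OBS], xp[N_STATE];
  forward_model(x, y0);
  for (int j = 0; j < N_STATE; j++) {
    for (int k = 0; k < N_STATE; k++)
      xp[k] = x[k];
    double h = 1e-6 * (x[j] != 0.0 ? x[j] : 1.0);
    xp[j] += h;
    forward_model(xp, y1);              /* perturbed forward run */
    for (int i = 0; i < N_OBS; i++)
      K[i][j] = (y1[i] - y0[i]) / h;
  }
}

int main(void) {
  double x[N_STATE] = {1, 2, 3, 4, 5}, K[N_OBS][N_STATE];
  jacobian_fd(x, K);
  printf("K[0][0] = %g\n", K[0][0]);
  return 0;
}
```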
Retrieval iterations
Retrieval runs require multiple forward-model and Jacobian evaluations. Total runtime scales with:
- number of iterations until convergence,
- size of the state vector,
- cost of matrix operations.
Poorly chosen a priori constraints or noisy input data can significantly increase iteration counts.
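The cost structure of an iterative retrieval can be illustrated with a minimal one-parameter Gauss-Newton loop: each iteration requires at least one forward-model evaluation and one Jacobian evaluation, so a higher iteration count multiplies the total cost. The sketch below is a generic toy example (it omits the a priori term used in optimal estimation), and all functions and numbers are hypothetical.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical scalar forward model and synthetic measurement. */
static double forward(double x) { return x * x; }

int main(void) {
  double y_meas = 4.0;   /* synthetic "measurement" */
  double x = 0.5;        /* first guess */

  /* Each iteration costs one forward run plus one finite-difference
     Jacobian, i.e. one additional forward run. */
  for (int it = 0; it < 20; it++) {
    double y = forward(x);                    /* forward run */
    double h = 1e-6;
    double k = (forward(x + h) - y) / h;      /* 1x1 Jacobian */
    double dx = (y_meas - y) / k;             /* Gauss-Newton update */
    x += dx;
    printf("iter %d: x = %g, residual = %g\n", it, x, y_meas - forward(x));
    if (fabs(dx) < 1e-8)
      break;                                  /* convergence */
  }
  return 0;
}
```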
Lookup table access
Lookup table interpolation is a frequent operation in JURASSIC and can become a bottleneck if not handled efficiently.
Recommendations:
- prefer binary table formats over ASCII,
- place LUTs on fast local or parallel file systems,
- avoid unnecessary table reloads between runs.
For large campaigns, LUT I/O costs are typically amortized over many simulations.
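The recommendation to prefer binary tables stems from parsing overhead: formatted ASCII input must convert every value individually, whereas a binary read is essentially a bulk memory transfer. The sketch below compares the two for a hypothetical flat array of doubles; it does not reflect JURASSIC's actual table formats or file names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1000000   /* hypothetical table size */

int main(void) {
  double *tab = malloc(N * sizeof(double));
  for (int i = 0; i < N; i++)
    tab[i] = i * 1e-3;

  /* Write the same data once as ASCII text and once as raw binary. */
  FILE *f = fopen("lut.txt", "w");
  for (int i = 0; i < N; i++)
    fprintf(f, "%.10g\n", tab[i]);
  fclose(f);
  f = fopen("lut.bin", "wb");
  fwrite(tab, sizeof(double), N, f);
  fclose(f);

  /* ASCII read: every value is parsed individually. */
  clock_t t0 = clock();
  f = fopen("lut.txt", "r");
  for (int i = 0; i < N; i++)
    if (fscanf(f, "%lf", &tab[i]) != 1)
      return 1;
  fclose(f);
  double t_ascii = (double) (clock() - t0) / CLOCKS_PER_SEC;

  /* Binary read: a single bulk transfer, typically much faster. */
  t0 = clock();
  f = fopen("lut.bin", "rb");
  if (fread(tab, sizeof(double), N, f) != N)
    return 1;
  fclose(f);
  double t_bin = (double) (clock() - t0) / CLOCKS_PER_SEC;

  printf("ASCII read: %.3f s, binary read: %.3f s\n", t_ascii, t_bin);
  free(tab);
  return 0;
}
```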
Parallel scaling behavior
Workflow-level scaling
The most common and scalable form of parallelism in JURASSIC is workflow-level parallelization, where independent simulations or retrievals are executed as separate jobs (e.g. job arrays, campaign splitting).
This approach applies to all executables and scales trivially as long as sufficient resources are available.
MPI scaling
MPI parallelization is implemented only in the retrieval executables and is used exclusively to distribute independent retrieval tasks across MPI ranks.
- Each MPI rank processes a subset of retrieval cases.
- There is no communication between ranks during execution.
- Scaling is close to linear as long as enough retrieval cases are available.
Scaling efficiency decreases if the number of retrieval cases per MPI rank becomes too small or if I/O dominates runtime.
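The pattern described above is embarrassingly parallel: each rank works through a disjoint subset of cases and never communicates with the others. A minimal sketch of a round-robin task distribution is shown below; process_case stands in for a complete single-case retrieval and is purely hypothetical.

```c
#include <mpi.h>
#include <stdio.h>

#define NCASES 100   /* hypothetical number of independent retrieval cases */

/* Hypothetical stand-in for a complete single-case retrieval. */
static void process_case(int icase) {
  /* ... read input, run retrieval, write output ... */
  printf("case %d done\n", icase);
}

int main(int argc, char **argv) {
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* Round-robin assignment: rank r handles cases r, r+size, r+2*size, ...
     No communication between ranks is required. */
  for (int icase = rank; icase < NCASES; icase += size)
    process_case(icase);

  MPI_Finalize();
  return 0;
}
```

With this pattern, speedup flattens once the number of cases per rank becomes small, which matches the remark above.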
OpenMP scaling
OpenMP is used within a single process to accelerate computationally intensive loops, such as radiative transfer and spectral calculations.
OpenMP scaling is typically limited by:
- memory bandwidth,
- cache behavior,
- load imbalance in inner loops.
Best performance is usually achieved with a moderate number of threads per process.
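Load imbalance arises when loop iterations differ strongly in cost, e.g. rays with very different numbers of segments. A dynamic schedule, as in the sketch below, lets idle threads pick up remaining work at the price of some scheduling overhead; the ray_work function is purely illustrative and not part of JURASSIC.

```c
#include <math.h>
#include <stdio.h>

#define NRAYS 2000   /* hypothetical number of rays */

/* Illustrative per-ray work whose cost varies strongly with the index. */
static double ray_work(int iray) {
  int nseg = 100 + 50 * (iray % 40);   /* uneven segment counts */
  double sum = 0.0;
  for (int i = 0; i < nseg; i++)
    sum += exp(-1e-3 * i);
  return sum;
}

int main(void) {
  double total = 0.0;

  /* schedule(dynamic) hands out iterations in small chunks at runtime,
     which mitigates load imbalance across threads. */
#pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
  for (int iray = 0; iray < NRAYS; iray++)
    total += ray_work(iray);

  printf("total = %g\n", total);
  return 0;
}
```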
MPI vs. OpenMP considerations
There is no global hybrid MPI–OpenMP model across the entire JURASSIC code base.
General guidance:
- Use MPI only for retrieval workloads with many independent cases.
- Use OpenMP to accelerate single-case computations.
- Avoid oversubscription (MPI ranks × OpenMP threads > physical cores).
- Running non-retrieval executables under mpirun provides no benefit.
Optimal configurations depend on hardware and problem size and should be determined empirically.
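To guard against the oversubscription issue listed above, the product of ranks per node and threads per rank can be checked against the available CPUs at startup. The sketch below uses standard MPI-3 and POSIX calls and is independent of JURASSIC; note that sysconf reports logical CPUs, which may exceed the number of physical cores.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  /* Group the ranks that share a node to count ranks per node. */
  MPI_Comm node;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &node);
  int ranks_per_node, rank;
  MPI_Comm_size(node, &ranks_per_node);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  int threads = omp_get_max_threads();
  long cpus = sysconf(_SC_NPROCESSORS_ONLN);   /* logical CPUs on this node */

  if (rank == 0 && (long) ranks_per_node * threads > cpus)
    fprintf(stderr, "Warning: %d ranks x %d threads exceeds %ld logical CPUs\n",
            ranks_per_node, threads, cpus);

  MPI_Comm_free(&node);
  MPI_Finalize();
  return 0;
}
```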
Configuration parameters affecting performance
Several control parameters have a direct impact on performance:
- RAYDS, RAYDZ: smaller step sizes increase accuracy but also increase runtime.
- ND, NW, NG: increasing spectral or chemical complexity increases cost.
- WRITE_MATRIX, WRITE_BBT: diagnostic output can significantly increase runtime and I/O.
Users should balance accuracy requirements against computational cost.
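As a rough back-of-the-envelope illustration of the parameters above: halving the ray step size roughly doubles the number of ray segments and hence the radiative-transfer cost. The numbers in this sketch are hypothetical and only follow the scaling relation given earlier.

```c
#include <stdio.h>

int main(void) {
  /* Hypothetical numbers that only follow O(N_obs * N_seg * ND * NG). */
  double path_km = 600.0;               /* limb path length */
  double rayds_km[] = {10.0, 5.0, 2.5}; /* candidate ray step sizes */
  double nd = 30, ng = 5, nobs = 1000;  /* channels, gases, observations */

  double cost0 = nobs * (path_km / rayds_km[0]) * nd * ng;
  for (int i = 0; i < 3; i++) {
    double nseg = path_km / rayds_km[i];
    double cost = nobs * nseg * nd * ng;
    printf("step = %4.1f km -> %4.0f segments, relative cost %.1fx\n",
           rayds_km[i], nseg, cost / cost0);
  }
  return 0;
}
```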
Memory usage
Memory consumption in JURASSIC is generally modest compared to many large-scale models but increases with:
- number of detector channels,
- size of lookup tables,
- enabled diagnostic matrices.
Memory usage is usually dominated by lookup tables rather than by per-observation data.
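A quick estimate illustrates why lookup tables usually dominate. The layout below (emissivity values over pressure, temperature, and column-density grid points for each channel and gas) and all grid sizes are hypothetical; actual dimensions depend on the tables used.

```c
#include <stdio.h>

int main(void) {
  /* Hypothetical grid sizes for a set of emissivity lookup tables. */
  long np = 50, nt = 40, nu = 300;   /* pressure, temperature, column density */
  long nd = 30, ng = 5;              /* channels, gases */

  double bytes = (double) np * nt * nu * nd * ng * sizeof(double);
  printf("Estimated lookup-table memory: %.2f GB\n", bytes / 1e9);
  return 0;
}
```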
Benchmarking and validation
Performance tuning should always be accompanied by validation:
- compare results before and after performance-related changes,
- ensure numerical accuracy remains acceptable,
- benchmark representative workloads rather than minimal test cases.
Small configuration changes can have a large impact on both performance and accuracy.
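One simple way to compare results before and after a performance-related change is to diff the radiances of a reference run and a tuned run numerically. The sketch below reports the maximum relative difference between two plain-text columns of numbers; file names and format are placeholders, not JURASSIC output conventions.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Compare two plain-text files of numbers (hypothetical output format)
   and report the maximum relative difference. */
int main(int argc, char **argv) {
  if (argc != 3) {
    fprintf(stderr, "usage: %s reference.txt test.txt\n", argv[0]);
    return 1;
  }
  FILE *fa = fopen(argv[1], "r"), *fb = fopen(argv[2], "r");
  if (!fa || !fb) {
    perror("fopen");
    return 1;
  }

  double a, b, maxrel = 0.0;
  long n = 0;
  while (fscanf(fa, "%lf", &a) == 1 && fscanf(fb, "%lf", &b) == 1) {
    double rel = fabs(a - b) / fmax(fabs(a), 1e-30);
    if (rel > maxrel)
      maxrel = rel;
    n++;
  }
  fclose(fa);
  fclose(fb);

  printf("compared %ld values, max relative difference = %g\n", n, maxrel);
  return 0;
}
```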
Practical performance tips
- Start with example configurations and modify incrementally.
- Disable diagnostics for production runs.
- Use OpenMP to accelerate compute-heavy kernels.
- Use MPI only for retrieval campaigns with many independent cases.
- Split very large workloads into multiple jobs when appropriate.
- Monitor runtime and scaling behavior during pilot runs.
Summary
JURASSIC performance is driven primarily by problem size, numerical configuration, and workflow design.
MPI parallelization is limited to retrieval executables and is used solely for distributing independent retrieval tasks. OpenMP provides shared-memory acceleration within a single process, while large-scale throughput is typically achieved via workflow-level parallelization.
Understanding these distinctions allows users to choose efficient and robust execution strategies for both small experiments and large HPC campaigns.
Related pages
- Parallelization
- HPC workflows
- Configuration