API Reference

Covariance Mocks Pipeline

Modular pipeline for generating mock galaxy catalogs for covariance analysis. Supports large-scale productions (40,000+ mock generations).

Data Loader

Data Loader Module

Handles loading and filtering of halo catalogs from AbacusSummit. Implements slab decomposition and mass filtering for distributed processing.


covariance_mocks.data_loader.build_abacus_path(base_path, suite, box, phase, redshift)

Build full path to AbacusSummit halo catalog directory.

Parameters:
  • base_path (str) – Root directory containing AbacusSummit simulations

  • suite (str) – Simulation suite name (e.g., “AbacusSummit”)

  • box (str) – Box identifier (e.g., “small_c000”)

  • phase (str) – Phase identifier (e.g., “ph3000”)

  • redshift (str) – Redshift string (e.g., “z1.100”)

Returns:

Full path to halo catalog directory

Return type:

str

Example

>>> path = build_abacus_path("/data", "AbacusSummit", "small_c000", "ph3000", "z1.100")
>>> print(path)
/data/AbacusSummit/small_c000_ph3000/halos/z1.100
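The doctest above implies the layout base_path/suite/{box}_{phase}/halos/{redshift}. The following is a minimal sketch that reproduces that example. It assumes the path is assembled with os.path.join, and the name build_abacus_path_sketch is hypothetical, so the module's actual implementation may differ:

```python
import os


def build_abacus_path_sketch(base_path, suite, box, phase, redshift):
    """Hypothetical re-creation of build_abacus_path, inferred from its doctest.

    Assumes the layout <base_path>/<suite>/<box>_<phase>/halos/<redshift>.
    """
    return os.path.join(base_path, suite, f"{box}_{phase}", "halos", redshift)


# On POSIX systems this prints /data/AbacusSummit/small_c000_ph3000/halos/z1.100
print(build_abacus_path_sketch("/data", "AbacusSummit", "small_c000", "ph3000", "z1.100"))
```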

covariance_mocks.data_loader.load_and_filter_halos(catalog_path, rank=0, size=1, n_gen=None)

Load halo catalog and apply filtering for this MPI rank.

Loads the AbacusSummit halo catalog, applies the minimum-mass filter and the optional test-mode selection, and performs slab decomposition for distributed processing.

Parameters:
  • catalog_path (str) – Path to the AbacusSummit halo catalog directory

  • rank (int, optional) – MPI rank for slab decomposition (default: 0)

  • size (int, optional) – Total number of MPI processes (default: 1)

  • n_gen (int, optional) – Test mode: if given, keep only the n_gen halos with the smallest x-coordinates (default: None)

Returns:

(logmhost, halo_radius, halo_pos, halo_vel, Lbox), where:

  • logmhost : jax.numpy.ndarray, shape (N_halos,)

    Log10 halo masses for this rank’s slab

  • halo_radius : jax.numpy.ndarray, shape (N_halos,)

    Halo virial radii in Mpc/h

  • halo_pos : jax.numpy.ndarray, shape (N_halos, 3)

    Halo positions in [0, Lbox] coordinates (Mpc/h)

  • halo_vel : jax.numpy.ndarray, shape (N_halos, 3)

    Halo velocities in km/s

  • Lbox : float

    Simulation box size in Mpc/h

Return type:

tuple

Raises:

ValueError – If no halos are found above the minimum mass threshold

Notes

  • Applies minimum mass filter: log10(M) >= LGMP_MIN (default 10.0)

  • Converts halo positions from [-Lbox/2, Lbox/2] to [0, Lbox]

  • In test mode, selects the n_gen halos with the smallest x-coordinates before slab decomposition

  • Slab decomposition splits halos by y-coordinate: rank r keeps halos with y in [r*Lbox/size, (r+1)*Lbox/size)

  • Returns JAX arrays with float32 dtype for GPU compatibility
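The filtering order described in the notes can be sketched with NumPy on plain arrays. This is an illustrative re-implementation, not the module's code: the name filter_and_slab is hypothetical, it skips catalog I/O and the conversion to JAX arrays, and the modulo wrap of positions is an assumption about how a halo sitting exactly at +Lbox/2 is treated. Only the LGMP_MIN default of 10.0 and the step order come from the notes above.

```python
import numpy as np

LGMP_MIN = 10.0  # documented default minimum log10 halo mass


def filter_and_slab(logm, pos, Lbox, rank=0, size=1, n_gen=None, lgmp_min=LGMP_MIN):
    """Hypothetical sketch of the documented filtering order on plain arrays.

    pos is expected in [-Lbox/2, Lbox/2]; returns (logm, pos) for this rank.
    """
    logm = np.asarray(logm, dtype=np.float32)
    pos = np.asarray(pos, dtype=np.float32)

    # 1. Minimum mass cut: keep log10(M) >= lgmp_min.
    keep = logm >= lgmp_min
    if not np.any(keep):
        raise ValueError(f"No halos with log10(M) >= {lgmp_min}")
    logm, pos = logm[keep], pos[keep]

    # 2. Shift to [0, Lbox). The modulo wrap is an assumption: it sends a halo
    #    sitting exactly at +Lbox/2 to 0 instead of leaving it at Lbox.
    pos = np.mod(pos + Lbox / 2.0, Lbox)

    # 3. Optional test mode: keep the n_gen halos with the smallest x.
    if n_gen is not None:
        order = np.argsort(pos[:, 0])[:n_gen]
        logm, pos = logm[order], pos[order]

    # 4. Slab decomposition along y with half-open intervals, so slabs never overlap.
    y_lo = rank * Lbox / size
    y_hi = (rank + 1) * Lbox / size
    in_slab = (pos[:, 1] >= y_lo) & (pos[:, 1] < y_hi)
    return logm[in_slab], pos[in_slab]


logm = [9.5, 11.0, 12.0, 13.0]
pos = [[-40, -40, 0], [-10, -30, 0], [20, 10, 0], [30, 45, 0]]
for r in range(2):
    m, _ = filter_and_slab(logm, pos, Lbox=100.0, rank=r, size=2)
    print(r, m.tolist())  # prints "0 [11.0]", then "1 [12.0, 13.0]"
```

Because each slab is a half-open interval in y, no halo is assigned to two ranks. Applying the test-mode cut before the slab split means the n_gen selection is global rather than per rank.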

Galaxy Generator

Galaxy Generator Module

Coordinates galaxy generation with the rgrspit_diffsky package, including batch processing and random key management for galaxy population.


covariance_mocks.galaxy_generator.generate_galaxies(logmhost, halo_radius, halo_pos, halo_vel, Lbox, rank=0)

Generate galaxies for given halos using rgrspit_diffsky.

Populates halos with galaxies using the rgrspit_diffsky package with consistent random key generation for reproducible results across MPI ranks.

Parameters:
  • logmhost (jax.numpy.ndarray, shape (N_halos,)) – Log10 host halo masses

  • halo_radius (jax.numpy.ndarray, shape (N_halos,)) – Halo virial radii in Mpc/h

  • halo_pos (jax.numpy.ndarray, shape (N_halos, 3)) – Halo positions in Mpc/h

  • halo_vel (jax.numpy.ndarray, shape (N_halos, 3)) – Halo velocities in km/s

  • Lbox (float) – Simulation box size in Mpc/h

  • rank (int, optional) – MPI rank for logging (default: 0)

Returns:

Galaxy catalog from mc_galpop_synthetic_subs containing:

  • ‘pos’ : galaxy positions, shape (N_galaxies, 3)

  • ‘vel’ : galaxy velocities, shape (N_galaxies, 3)

  • ‘stellar_mass’ : galaxy stellar masses, shape (N_galaxies,)

  • Other galaxy properties from rgrspit_diffsky

Return type:

dict

Notes

  • Uses fixed random key (0) for reproducible galaxy generation across all MPI ranks

  • Applies minimum halo mass threshold LGMP_MIN for galaxy population

  • Uses the currently configured observation redshift, CURRENT_Z_OBS

  • Galaxy catalog includes satellite galaxies via synthetic subhalo population
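The module description mentions batch processing but not how batches are formed. As an illustration only, the generic pattern below splits the halo arrays into fixed-size batches, calls a per-batch callback, and concatenates the per-batch dictionaries key by key. The names run_in_batches, toy_populate and batch_size are hypothetical. A real callback would wrap the rgrspit_diffsky call; with JAX it could, for example, derive a per-batch key from the fixed key 0 via jax.random.fold_in(jax.random.PRNGKey(0), batch_index), though whether the module does this is not documented here.

```python
import numpy as np


def run_in_batches(populate, arrays, batch_size):
    """Call populate(batch_index, *batch) on consecutive halo batches.

    populate is a hypothetical callback returning a dict of per-galaxy arrays;
    the per-batch dicts are concatenated key by key.
    """
    n = len(arrays[0])
    results = []
    for batch_index, start in enumerate(range(0, n, batch_size)):
        batch = [a[start:start + batch_size] for a in arrays]
        results.append(populate(batch_index, *batch))
    if not results:
        return {}
    return {k: np.concatenate([r[k] for r in results]) for k in results[0]}


# Toy callback standing in for the rgrspit_diffsky call.
def toy_populate(batch_index, logm):
    return {"batch": np.full(len(logm), batch_index), "logm": logm}


out = run_in_batches(toy_populate, [np.arange(10.0)], batch_size=4)
print(out["batch"].tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
```

Passing the batch index to the callback keeps any per-batch randomness a pure function of the batch position, which is what makes reruns reproducible.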

HDF5 Writer

HDF5 Writer Module

Handles parallel and single-process HDF5 writing of galaxy catalogs. This module contains the collective I/O operations that will be reused across the 40,000+ mock generation runs.


covariance_mocks.hdf5_writer.combine_mpi_files(output_path, size)

Combine per-rank MPI files into the final HDF5 catalog. Legacy function; not used by the parallel HDF5 writer.

covariance_mocks.hdf5_writer.write_parallel_hdf5(galcat, plot_logmhost, plot_halo_radius, plot_halo_pos, plot_halo_vel, output_path, rank, size, comm, Lbox)

Write galaxy catalog using parallel HDF5 for multiple MPI ranks.

Coordinates collective I/O operations across MPI ranks to write a single HDF5 file containing galaxies and halos from all processes.

Parameters:
  • galcat (dict) – Galaxy catalog from rgrspit_diffsky for this rank

  • plot_logmhost (array_like) – Log10 halo masses for this rank

  • plot_halo_radius (array_like) – Halo virial radii for this rank in Mpc/h

  • plot_halo_pos (array_like, shape (N_halos, 3)) – Halo positions for this rank in Mpc/h

  • plot_halo_vel (array_like, shape (N_halos, 3)) – Halo velocities for this rank in km/s

  • output_path (str) – Full path for output HDF5 file

  • rank (int) – MPI rank of this process

  • size (int) – Total number of MPI processes

  • comm (MPI.Comm) – MPI communicator for collective operations

  • Lbox (float) – Simulation box size in Mpc/h

Notes

  • Uses MPI collective operations to coordinate writes

  • Gathers per-rank counts and computes offsets so that each rank writes a contiguous block

  • All ranks write to same file using parallel HDF5

  • Rank 0 writes metadata and creates file structure

  • Removes temporary per-rank files after a successful write

  • Computes write offsets for both galaxy and halo data
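The offset bookkeeping behind a contiguous parallel layout is an exclusive prefix sum over the per-rank counts. The following minimal sketch assumes an mpi4py-style communicator (only allgather() and Get_rank() are used). The helper names rank_offsets and local_slice are hypothetical, and the HDF5 write itself is not shown.

```python
from itertools import accumulate


def rank_offsets(counts):
    """Exclusive prefix sum: index of the first row owned by each rank."""
    return list(accumulate(counts, initial=0))[:-1]


def local_slice(comm, n_local):
    """Return (start, stop, total) for this rank's rows in a shared dataset.

    Only comm.allgather() and comm.Get_rank() are used, both of which
    mpi4py's Comm provides.
    """
    counts = comm.allgather(n_local)  # same list of per-rank counts on every rank
    start = rank_offsets(counts)[comm.Get_rank()]
    return start, start + n_local, sum(counts)


print(rank_offsets([3, 0, 5]))  # [0, 3, 3]
```

Each rank would then fill rows start:stop of a dataset of length total that all ranks create collectively; with h5py built against parallel HDF5 this typically means opening the file with driver="mpio" and comm=comm. Galaxies and halos have different per-rank counts, so the calculation is done once for each.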

covariance_mocks.hdf5_writer.write_single_hdf5(galcat, plot_logmhost, plot_halo_radius, plot_halo_pos, plot_halo_vel, output_path, Lbox)

Write the galaxy catalog to an HDF5 file from a single process.

Parameters:
  • galcat (dict) – Galaxy catalog from rgrspit_diffsky containing galaxy properties

  • plot_logmhost (array_like) – Log10 halo masses used for galaxy generation

  • plot_halo_radius (array_like) – Halo virial radii in Mpc/h

  • plot_halo_pos (array_like, shape (N_halos, 3)) – Halo positions in Mpc/h

  • plot_halo_vel (array_like, shape (N_halos, 3)) – Halo velocities in km/s

  • output_path (str) – Full path for output HDF5 file

  • Lbox (float) – Simulation box size in Mpc/h

Notes

  • Creates directory structure if it doesn’t exist

  • Saves galaxy properties under ‘galaxies/’ group

  • Saves halo properties under ‘halos/’ group

  • Includes metadata attributes: Lbox, z_obs, lgmp_min, n_halos, n_galaxies

  • Handles structured data by saving components separately
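The following sketch shows one way to produce the layout described in the notes. It maps catalogs onto 'galaxies/...' and 'halos/...' dataset paths, splits structured arrays and nested dicts into one dataset per component, and writes the documented metadata keys as file attributes with h5py. The helper names flatten_catalog and write_catalog are hypothetical, and so are the halo dataset names and the attribute values in the example, which are placeholders only. h5py is imported lazily because the flattening step does not need it.

```python
import os

import numpy as np


def flatten_catalog(prefix, catalog):
    """Map a (possibly nested) catalog dict onto flat HDF5 dataset paths.

    Structured arrays and nested dicts are split into one dataset per field.
    """
    out = {}
    for key, value in catalog.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            out.update(flatten_catalog(path, value))
            continue
        arr = np.asarray(value)
        if arr.dtype.names:  # structured array: save each component separately
            for field in arr.dtype.names:
                out[f"{path}/{field}"] = arr[field]
        else:
            out[path] = arr
    return out


def write_catalog(output_path, datasets, attrs):
    """Write flattened datasets plus file-level metadata attributes."""
    import h5py  # assumed available, as in the real pipeline

    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    with h5py.File(output_path, "w") as f:
        for name, data in datasets.items():
            f.create_dataset(name, data=data)  # h5py creates intermediate groups
        f.attrs.update(attrs)


# Illustrative inputs only; real galcat keys come from rgrspit_diffsky.
galcat = {"pos": np.zeros((4, 3)), "stellar_mass": np.ones(4)}
halos = {"logmhost": np.full(2, 12.0), "pos": np.zeros((2, 3))}
datasets = {**flatten_catalog("galaxies", galcat), **flatten_catalog("halos", halos)}
attrs = {"Lbox": 1000.0, "z_obs": 1.1, "lgmp_min": 10.0,
         "n_halos": len(halos["logmhost"]), "n_galaxies": len(galcat["pos"])}
```

Calling write_catalog("out/mock.hdf5", datasets, attrs) would then create the file. Keeping the flattening separate from the I/O makes the dataset layout testable without touching the filesystem.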

MPI Setup

MPI Setup Module

Initializes MPI and JAX for distributed computing. This module standardizes the environment-dependent device configuration and supports both single-process and multi-process modes.


covariance_mocks.mpi_setup.finalize_mpi(comm, rank, size, MPI_AVAILABLE)

Finalize MPI communication.

Synchronizes all ranks before shutdown so that MPI processes exit cleanly.

Parameters:
  • comm (MPI.Comm or None) – MPI communicator from initialize_mpi_jax()

  • rank (int) – Process rank

  • size (int) – Total number of MPI processes

  • MPI_AVAILABLE (bool) – Whether MPI was successfully initialized

Notes

  • Only performs finalization if MPI is available and more than one process is running

  • Uses MPI barrier to synchronize all ranks before finalization

  • Provides per-rank logging for debugging distributed shutdown

  • Safe to call when MPI is not available (no-op)
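The guard logic in the notes can be sketched as follows. The name finalize_sketch is hypothetical, and the sketch only performs the barrier: mpi4py normally calls MPI_Finalize automatically at interpreter exit, and whether finalize_mpi also finalizes explicitly is not stated here.

```python
def finalize_sketch(comm, rank, size, mpi_available):
    """Hypothetical mirror of finalize_mpi's documented guard logic.

    Returns True if a barrier was performed, False for the no-op path.
    """
    if not mpi_available or comm is None or size <= 1:
        return False  # single process or no MPI: nothing to synchronize
    print(f"[rank {rank}] waiting at final barrier")
    comm.Barrier()  # every rank must reach this point before any proceeds
    print(f"[rank {rank}] passed final barrier")
    return True


print(finalize_sketch(None, 0, 1, False))  # False
```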

covariance_mocks.mpi_setup.initialize_mpi_jax()

Initialize MPI and configure JAX devices.

Sets up MPI communication and configures JAX for distributed computing with environment-dependent device configuration.

Returns:

(comm, rank, size, MPI_AVAILABLE), where:

  • comm : MPI.Comm or None

    MPI communicator for inter-process communication (None if MPI unavailable)

  • rank : int

    Process rank (0-based, 0 for single process)

  • size : int

    Total number of MPI processes (1 for single process)

  • MPI_AVAILABLE : bool

    Whether MPI is available and initialized

Return type:

tuple

Notes

  • Attempts to import and initialize mpi4py for parallel execution

  • Falls back to single-process mode if MPI unavailable

  • Configures JAX environment variables for distributed use

  • Initializes JAX distributed backend for multi-process execution

  • Reports JAX backend and available devices for each rank

  • Handles GPU device configuration automatically
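The following is a minimal sketch of the MPI half of this pattern: try mpi4py and fall back to single-process values when it cannot be imported. The name initialize_mpi_sketch is hypothetical, and the sketch omits the JAX environment variables, the distributed JAX initialization (for example jax.distributed.initialize()) and the per-rank device report described above.

```python
def initialize_mpi_sketch():
    """Return (comm, rank, size, mpi_available), falling back to one process."""
    try:
        from mpi4py import MPI  # importing MPI initializes it by default
    except ImportError:
        return None, 0, 1, False  # no mpi4py: single-process mode
    comm = MPI.COMM_WORLD
    return comm, comm.Get_rank(), comm.Get_size(), True


comm, rank, size, MPI_AVAILABLE = initialize_mpi_sketch()
print(f"rank {rank} of {size}, MPI available: {MPI_AVAILABLE}")
```

Returning the same four-tuple in both branches lets downstream code, such as finalize_mpi, branch on MPI_AVAILABLE and size instead of checking imports again.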

Utils

Utilities Module

Common utility functions for the covariance mocks pipeline.

covariance_mocks.utils.generate_output_filename(simulation_box, phase, redshift, n_gen=None)

Generate standardized output filename for mock catalogs.

Parameters:
  • simulation_box (str) – Simulation box identifier (e.g., “AbacusSummit_small_c000”)

  • phase (str) – Phase identifier (e.g., “ph3000”)

  • redshift (str) – Redshift string (e.g., “z1.100”)

  • n_gen (int, optional) – Number of halos in test mode; if given, a _test{n_gen} suffix is appended

Returns:

Standardized HDF5 filename of the form mock_{simulation_box}_{phase}_{redshift}.hdf5, with a _test{n_gen} suffix before the extension in test mode

Return type:

str

Examples

>>> generate_output_filename("AbacusSummit_small_c000", "ph3000", "z1.100")
'mock_AbacusSummit_small_c000_ph3000_z1.100.hdf5'
>>> generate_output_filename("AbacusSummit_small_c000", "ph3000", "z1.100", n_gen=5000)
'mock_AbacusSummit_small_c000_ph3000_z1.100_test5000.hdf5'
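The naming convention shown in the examples can be reproduced with a short sketch. The name generate_output_filename_sketch is hypothetical, and it assumes the suffix is added whenever n_gen is not None:

```python
def generate_output_filename_sketch(simulation_box, phase, redshift, n_gen=None):
    """Hypothetical re-creation of the naming convention shown in the examples."""
    stem = f"mock_{simulation_box}_{phase}_{redshift}"
    if n_gen is not None:
        stem += f"_test{n_gen}"  # test-mode runs get a _test<N> suffix
    return stem + ".hdf5"


print(generate_output_filename_sketch("AbacusSummit_small_c000", "ph3000", "z1.100", n_gen=5000))
# mock_AbacusSummit_small_c000_ph3000_z1.100_test5000.hdf5
```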

covariance_mocks.utils.validate_catalog_path(catalog_path)

Verify that the catalog path exists and is accessible.

Parameters:

catalog_path (str) – Path to AbacusSummit halo catalog directory

Returns:

True if path exists and is accessible

Return type:

bool

Raises:

FileNotFoundError – If catalog path does not exist or is not a directory
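The documented contract can be sketched with os.path.isdir, which returns False both for missing paths and for paths that are not directories. The name validate_catalog_path_sketch is hypothetical, and the sketch does not check read permissions, since only FileNotFoundError is documented:

```python
import os
import tempfile


def validate_catalog_path_sketch(catalog_path):
    """Hypothetical check mirroring the documented contract of validate_catalog_path."""
    if not os.path.isdir(catalog_path):  # False for missing paths and plain files
        raise FileNotFoundError(f"Catalog directory not found: {catalog_path}")
    return True


with tempfile.TemporaryDirectory() as tmp:
    print(validate_catalog_path_sketch(tmp))  # True
```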