API Reference
Covariance Mocks Pipeline
Modular pipeline for generating mock galaxy catalogs for covariance analysis. Supports large-scale productions (40,000+ mock generations).
Data Loader
Data Loader Module
Handles loading and filtering of halo catalogs from AbacusSummit. Implements slab decomposition and mass filtering for distributed processing.
- covariance_mocks.data_loader.build_abacus_path(base_path, suite, box, phase, redshift)
Build full path to AbacusSummit halo catalog directory.
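- Parameters:
base_path (str) – Root data directory (e.g. "/data")
suite (str) – Simulation suite name (e.g. "AbacusSummit")
box (str) – Simulation box name (e.g. "small_c000")
phase (str) – Simulation phase (e.g. "ph3000")
redshift (str) – Redshift directory name (e.g. "z1.100")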
- Returns:
Full path to halo catalog directory
- Return type:
str
Example
>>> path = build_abacus_path("/data", "AbacusSummit", "small_c000", "ph3000", "z1.100")
>>> print(path)
/data/AbacusSummit/small_c000_ph3000/halos/z1.100
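The example implies the layout {base_path}/{suite}/{box}_{phase}/halos/{redshift}. The following minimal sketch reproduces that layout. It assumes plain joining with os.path.join. The name build_abacus_path_sketch is hypothetical and is not the module's code.

```python
import os


def build_abacus_path_sketch(base_path, suite, box, phase, redshift):
    """Hypothetical re-creation of the path layout shown in the example.

    Assumes box and phase are joined with an underscore and that halo
    catalogs live under a 'halos' subdirectory, as in the example output.
    """
    sim_dir = f"{box}_{phase}"  # e.g. "small_c000_ph3000"
    return os.path.join(base_path, suite, sim_dir, "halos", redshift)


# On POSIX systems this prints /data/AbacusSummit/small_c000_ph3000/halos/z1.100
print(build_abacus_path_sketch("/data", "AbacusSummit", "small_c000", "ph3000", "z1.100"))
```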
- covariance_mocks.data_loader.load_and_filter_halos(catalog_path, rank=0, size=1, n_gen=None)
Load halo catalog and apply filtering for this MPI rank.
Loads the AbacusSummit halo catalog, applies the minimum-mass filter and the optional test-mode halo limit (n_gen), and performs slab decomposition for distributed processing.
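- Parameters:
catalog_path (str) – Path to the AbacusSummit halo catalog directory
rank (int, optional) – MPI rank of this process (default: 0)
size (int, optional) – Total number of MPI processes (default: 1)
n_gen (int, optional) – Test mode: if given, keep only n_gen halos (default: None, use all halos)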
- Returns:
(logmhost, halo_radius, halo_pos, halo_vel, Lbox) where:
- logmhost : jax.numpy.ndarray, shape (N_halos,)
Log10 halo masses for this rank’s slab
- halo_radius : jax.numpy.ndarray, shape (N_halos,)
Halo virial radii in Mpc/h
- halo_pos : jax.numpy.ndarray, shape (N_halos, 3)
Halo positions in [0, Lbox] coordinates (Mpc/h)
- halo_vel : jax.numpy.ndarray, shape (N_halos, 3)
Halo velocities in km/s
- Lbox : float
Simulation box size in Mpc/h
- Return type:
tuple
- Raises:
ValueError – If no halos are found above the minimum mass threshold
Notes
Applies minimum mass filter: log10(M) >= LGMP_MIN (default 10.0)
Converts halo positions from [-Lbox/2, Lbox/2] to [0, Lbox]
In test mode (n_gen is not None), selects the n_gen halos with the smallest x-coordinates before slab decomposition
Slab decomposition splits halos by y-coordinate: each rank keeps halos with y in [rank*Lbox/size, (rank+1)*Lbox/size)
Returns JAX arrays with float32 dtype for GPU compatibility
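The filtering order in these notes can be illustrated with NumPy. This is a minimal sketch that assumes the input is raw log-masses and positions in [-Lbox/2, Lbox/2]. The helper filter_and_decompose_sketch is hypothetical. Catalog reading and JAX conversion are omitted.

```python
import numpy as np

LGMP_MIN = 10.0  # default minimum log10 halo mass quoted in the Notes


def filter_and_decompose_sketch(logm, pos, Lbox, rank=0, size=1, n_gen=None):
    """Hypothetical sketch of the filtering order described in the Notes.

    logm: (N,) log10 halo masses; pos: (N, 3) positions in [-Lbox/2, Lbox/2].
    Returns float32 (logm, pos) for the halos in this rank's y-slab.
    """
    logm = np.asarray(logm, dtype=np.float64)
    pos = np.asarray(pos, dtype=np.float64)

    # 1. Minimum mass filter: keep log10(M) >= LGMP_MIN
    keep = logm >= LGMP_MIN
    if not np.any(keep):
        raise ValueError("No halos found above the minimum mass threshold")
    logm, pos = logm[keep], pos[keep]

    # 2. Shift positions from [-Lbox/2, Lbox/2] to [0, Lbox]
    pos = pos + Lbox / 2.0

    # 3. Test mode: keep the n_gen halos with the smallest x-coordinates
    if n_gen is not None:
        idx = np.argsort(pos[:, 0])[:n_gen]
        logm, pos = logm[idx], pos[idx]

    # 4. Slab decomposition along y: this rank owns [rank*Lbox/size, (rank+1)*Lbox/size)
    y_lo = rank * Lbox / size
    y_hi = (rank + 1) * Lbox / size
    in_slab = (pos[:, 1] >= y_lo) & (pos[:, 1] < y_hi)

    # 5. Cast to float32, as the Notes describe for GPU use
    return logm[in_slab].astype(np.float32), pos[in_slab].astype(np.float32)
```

Because the slabs are half-open intervals, a halo sitting exactly at y = Lbox falls in no slab in this sketch. The real module may wrap or clip such halos.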
Galaxy Generator
Galaxy Generator Module
Handles galaxy generation coordination with the rgrspit_diffsky package. Manages batch processing and random key management for galaxy population.
- covariance_mocks.galaxy_generator.generate_galaxies(logmhost, halo_radius, halo_pos, halo_vel, Lbox, rank=0)
Generate galaxies for given halos using rgrspit_diffsky.
Populates halos with galaxies using the rgrspit_diffsky package with consistent random key generation for reproducible results across MPI ranks.
- Parameters:
logmhost (jax.numpy.ndarray, shape (N_halos,)) – Log10 host halo masses
halo_radius (jax.numpy.ndarray, shape (N_halos,)) – Halo virial radii in Mpc/h
halo_pos (jax.numpy.ndarray, shape (N_halos, 3)) – Halo positions in Mpc/h
halo_vel (jax.numpy.ndarray, shape (N_halos, 3)) – Halo velocities in km/s
Lbox (float) – Simulation box size in Mpc/h
rank (int, optional) – MPI rank for logging (default: 0)
- Returns:
Galaxy catalog from mc_galpop_synthetic_subs containing:
- ‘pos’ : galaxy positions, shape (N_galaxies, 3)
- ‘vel’ : galaxy velocities, shape (N_galaxies, 3)
- ‘stellar_mass’ : galaxy stellar masses, shape (N_galaxies,)
- other galaxy properties from rgrspit_diffsky
- Return type:
dict
Notes
Uses a fixed random key (seed 0) for reproducible galaxy generation across all MPI ranks
Applies the minimum halo mass threshold LGMP_MIN for galaxy population
Uses the observed redshift CURRENT_Z_OBS
Includes satellite galaxies via a synthetic subhalo population
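The combination of batch processing and a fixed random key can be sketched as follows. A fixed seed makes repeated runs draw identical random numbers. This is an illustration under stated assumptions only. NumPy's Generator stands in for the JAX random key used by the module, and toy_populate is a hypothetical placeholder for the real rgrspit_diffsky call. The batch size is arbitrary.

```python
import numpy as np


def populate_in_batches_sketch(halo_pos, populate_fn, batch_size, seed=0):
    """Hypothetical sketch: populate halos batch by batch with a fixed seed.

    NumPy's Generator stands in for the JAX random key used by the module;
    populate_fn is a placeholder for the real rgrspit_diffsky call.
    """
    rng = np.random.default_rng(seed)  # fixed seed -> identical draws on every run
    batches = []
    for start in range(0, len(halo_pos), batch_size):
        batch = halo_pos[start:start + batch_size]
        batches.append(populate_fn(batch, rng))
    return np.concatenate(batches, axis=0)


def toy_populate(batch, rng):
    """Toy stand-in: one 'galaxy' per halo, scattered by a small Gaussian offset."""
    return batch + rng.normal(scale=0.1, size=batch.shape)


halo_pos = np.zeros((5, 3))
run_a = populate_in_batches_sketch(halo_pos, toy_populate, batch_size=2)
run_b = populate_in_batches_sketch(halo_pos, toy_populate, batch_size=2)
```

Running the function twice with the same seed gives bit-identical catalogs. Changing the seed changes the draws.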
HDF5 Writer
HDF5 Writer Module
Handles parallel and single-process HDF5 file writing for galaxy catalogs. This module contains the collective I/O operations reused across all 40,000+ mock generation runs.
- covariance_mocks.hdf5_writer.combine_mpi_files(output_path, size)
Combine per-rank MPI files into the final HDF5 catalog. Legacy function; not used by the parallel HDF5 path.
- covariance_mocks.hdf5_writer.write_parallel_hdf5(galcat, plot_logmhost, plot_halo_radius, plot_halo_pos, plot_halo_vel, output_path, rank, size, comm, Lbox)
Write galaxy catalog using parallel HDF5 for multiple MPI ranks.
Coordinates collective I/O operations across MPI ranks to write a single HDF5 file containing galaxies and halos from all processes.
- Parameters:
galcat (dict) – Galaxy catalog from rgrspit_diffsky for this rank
plot_logmhost (array_like) – Log10 halo masses for this rank
plot_halo_radius (array_like) – Halo virial radii for this rank in Mpc/h
plot_halo_pos (array_like, shape (N_halos, 3)) – Halo positions for this rank in Mpc/h
plot_halo_vel (array_like, shape (N_halos, 3)) – Halo velocities for this rank in km/s
output_path (str) – Full path for output HDF5 file
rank (int) – MPI rank of this process
size (int) – Total number of MPI processes
comm (MPI.Comm) – MPI communicator for collective operations
Lbox (float) – Simulation box size in Mpc/h
Notes
Uses MPI collective operations to coordinate writes
Gathers per-rank galaxy and halo counts and computes offsets so that each rank writes a contiguous slice of every dataset
All ranks write to the same file using parallel HDF5
Rank 0 writes metadata and creates the file structure
Removes temporary per-rank files after a successful write
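The offset step can be sketched as an exclusive prefix sum over the per-rank counts. This minimal illustration assumes that rows are laid out in rank order. The helper compute_write_offsets is hypothetical. In an MPI run the counts would typically come from mpi4py's comm.allgather(n_local); here a plain list stands in for the gathered counts.

```python
import numpy as np


def compute_write_offsets(counts):
    """Exclusive prefix sum: the row where each rank starts writing.

    counts[i] is the number of rows (e.g. galaxies) held by rank i. In an MPI
    run this list would typically come from comm.allgather(n_local).
    """
    counts = np.asarray(counts, dtype=np.int64)
    offsets = np.cumsum(counts) - counts  # rows owned by all lower ranks
    total = int(counts.sum())             # size of the shared dataset
    return offsets, total


counts = [3, 0, 5, 2]  # galaxies per rank (illustrative numbers)
offsets, total = compute_write_offsets(counts)
for rank, (start, n) in enumerate(zip(offsets, counts)):
    print(f"rank {rank}: rows [{start}, {start + n})")
```

Each rank would then write its rows to dataset[offset:offset + count] in a dataset of length total. A rank with zero rows still has to take part in any collective calls.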
- covariance_mocks.hdf5_writer.write_single_hdf5(galcat, plot_logmhost, plot_halo_radius, plot_halo_pos, plot_halo_vel, output_path, Lbox)
Write the galaxy catalog to an HDF5 file from a single process.
- Parameters:
galcat (dict) – Galaxy catalog from rgrspit_diffsky containing galaxy properties
plot_logmhost (array_like) – Log10 halo masses used for galaxy generation
plot_halo_radius (array_like) – Halo virial radii in Mpc/h
plot_halo_pos (array_like, shape (N_halos, 3)) – Halo positions in Mpc/h
plot_halo_vel (array_like, shape (N_halos, 3)) – Halo velocities in km/s
output_path (str) – Full path for output HDF5 file
Lbox (float) – Simulation box size in Mpc/h
Notes
Creates the output directory if it does not exist
Saves galaxy properties under the ‘galaxies/’ group
Saves halo properties under the ‘halos/’ group
Includes metadata attributes: Lbox, z_obs, lgmp_min, n_halos, n_galaxies
Saves each component of structured data as a separate dataset
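The file layout in these notes can be sketched with h5py. This is a minimal illustration and may differ from the module's actual implementation. The following are assumptions: the dataset names inside ‘halos/’, the key_field naming for structured arrays, storing the attributes on the root group, and passing z_obs and lgmp_min explicitly. The function name write_single_hdf5_sketch is hypothetical.

```python
import os

import h5py
import numpy as np


def write_single_hdf5_sketch(galcat, logmhost, halo_radius, halo_pos, halo_vel,
                             output_path, Lbox, z_obs, lgmp_min):
    """Hypothetical sketch of the single-process layout described in the Notes."""
    out_dir = os.path.dirname(output_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)  # create directory structure if missing
    with h5py.File(output_path, "w") as f:
        gal = f.create_group("galaxies")
        for key, value in galcat.items():
            arr = np.asarray(value)
            if arr.dtype.names:  # structured array: one dataset per field (assumed naming)
                for field in arr.dtype.names:
                    gal.create_dataset(f"{key}_{field}", data=arr[field])
            else:
                gal.create_dataset(key, data=arr)
        halos = f.create_group("halos")
        halos.create_dataset("logmhost", data=np.asarray(logmhost))
        halos.create_dataset("radius", data=np.asarray(halo_radius))
        halos.create_dataset("pos", data=np.asarray(halo_pos))
        halos.create_dataset("vel", data=np.asarray(halo_vel))
        # Metadata attributes listed in the Notes (stored on the root group here)
        f.attrs["Lbox"] = Lbox
        f.attrs["z_obs"] = z_obs
        f.attrs["lgmp_min"] = lgmp_min
        f.attrs["n_halos"] = len(logmhost)
        f.attrs["n_galaxies"] = len(np.asarray(galcat["pos"]))
```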
MPI Setup
MPI Setup Module
Handles MPI and JAX initialization for distributed computing. This module standardizes environment-dependent device configuration and manages single- and multi-process modes.
- covariance_mocks.mpi_setup.finalize_mpi(comm, rank, size, MPI_AVAILABLE)
Properly finalize MPI communication.
Ensures clean shutdown of MPI processes with proper synchronization.
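- Parameters:
comm (MPI.Comm or None) – MPI communicator returned by initialize_mpi_jax()
rank (int) – MPI rank of this process
size (int) – Total number of MPI processes
MPI_AVAILABLE (bool) – Whether MPI is available and initialized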
Notes
Only performs finalization if MPI is available and more than one process is running
Uses an MPI barrier to synchronize all ranks before finalization
Provides per-rank logging for debugging distributed shutdown
Safe to call when MPI is not available (no-op)
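The shutdown logic in these notes can be sketched as follows. This minimal illustration returns a flag so the two paths are easy to see. The function name finalize_mpi_sketch is hypothetical. The sketch relies on mpi4py's default behavior of calling MPI_Finalize at interpreter exit, so it does not call MPI.Finalize() itself. The module's actual finalize_mpi may differ.

```python
def finalize_mpi_sketch(comm, rank, size, mpi_available):
    """Hypothetical sketch of the finalization behaviour described in the Notes.

    Returns True if a barrier was performed, False for the no-op path.
    """
    if not mpi_available or size <= 1:
        return False  # single process or no MPI: nothing to do
    comm.Barrier()  # wait until every rank reaches this point
    print(f"[rank {rank}] MPI barrier passed, shutting down")
    # mpi4py calls MPI_Finalize automatically at interpreter exit by default,
    # so this sketch does not call MPI.Finalize() explicitly.
    return True
```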
- covariance_mocks.mpi_setup.initialize_mpi_jax()
Initialize MPI and JAX with proper device configuration.
Sets up MPI communication and configures JAX for distributed computing with environment-dependent device configuration.
- Returns:
(comm, rank, size, MPI_AVAILABLE) where:
- comm : MPI.Comm or None
MPI communicator for inter-process communication (None if MPI unavailable)
- rank : int
Process rank (0-based; 0 for a single process)
- size : int
Total number of MPI processes (1 for a single process)
- MPI_AVAILABLE : bool
Whether MPI is available and initialized
- Return type:
tuple
Notes
Attempts to import and initialize mpi4py for parallel execution
Falls back to single-process mode if MPI unavailable
Configures JAX environment variables for distributed use
Initializes JAX distributed backend for multi-process execution
Reports JAX backend and available devices for each rank
Handles GPU device configuration automatically
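The import-and-fall-back pattern in the first two notes can be sketched as follows. This is a minimal illustration and covers only the MPI half. The JAX environment setup, distributed initialization, and device reporting are omitted. The function name initialize_mpi_sketch is hypothetical. When run without mpirun, either path should report rank 0 of 1.

```python
def initialize_mpi_sketch():
    """Hypothetical sketch of the MPI import-and-fallback pattern (JAX setup omitted).

    Returns (comm, rank, size, mpi_available) with the same meaning as above.
    """
    try:
        from mpi4py import MPI  # importing mpi4py.MPI initializes MPI by default
    except ImportError:
        return None, 0, 1, False  # fall back to single-process mode
    comm = MPI.COMM_WORLD
    return comm, comm.Get_rank(), comm.Get_size(), True


comm, rank, size, mpi_available = initialize_mpi_sketch()
print(f"rank {rank} of {size}, MPI available: {mpi_available}")
```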
Utils
Utilities Module
Common utility functions for the covariance mocks pipeline.
- covariance_mocks.utils.generate_output_filename(simulation_box, phase, redshift, n_gen=None)
Generate standardized output filename for mock catalogs.
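- Parameters:
simulation_box (str) – Simulation box identifier (e.g. "AbacusSummit_small_c000")
phase (str) – Simulation phase (e.g. "ph3000")
redshift (str) – Redshift label (e.g. "z1.100")
n_gen (int, optional) – Test-mode halo count; if given, "_test{n_gen}" is appended to the name (default: None)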
- Returns:
Standardized HDF5 filename of the form mock_{simulation_box}_{phase}_{redshift}.hdf5, with _test{n_gen} inserted before the extension in test mode
- Return type:
str
Examples
>>> generate_output_filename("AbacusSummit_small_c000", "ph3000", "z1.100")
'mock_AbacusSummit_small_c000_ph3000_z1.100.hdf5'
>>> generate_output_filename("AbacusSummit_small_c000", "ph3000", "z1.100", n_gen=5000)
'mock_AbacusSummit_small_c000_ph3000_z1.100_test5000.hdf5'
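The two examples pin down the naming convention well enough to sketch it. This is a minimal re-creation that matches both examples. The name generate_output_filename_sketch is hypothetical, and the module's own implementation may build the string differently.

```python
def generate_output_filename_sketch(simulation_box, phase, redshift, n_gen=None):
    """Hypothetical re-creation of the naming convention shown in the examples."""
    name = f"mock_{simulation_box}_{phase}_{redshift}"
    if n_gen is not None:
        name += f"_test{n_gen}"  # test-mode runs record the halo count
    return name + ".hdf5"
```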
- covariance_mocks.utils.validate_catalog_path(catalog_path)
Verify that the catalog path exists and is accessible.
- Parameters:
catalog_path (str) – Path to AbacusSummit halo catalog directory
- Returns:
True if path exists and is accessible
- Return type:
bool
- Raises:
FileNotFoundError – If catalog path does not exist or is not a directory
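The documented check can be sketched with os.path.isdir, which is False both for missing paths and for paths that are not directories. This minimal illustration does not test read permissions, so it is an assumption that "accessible" reduces to this check. The name validate_catalog_path_sketch is hypothetical.

```python
import os


def validate_catalog_path_sketch(catalog_path):
    """Hypothetical sketch of the documented check: a directory must exist."""
    # isdir() is False for missing paths and for regular files alike
    if not os.path.isdir(catalog_path):
        raise FileNotFoundError(f"Catalog directory not found: {catalog_path}")
    return True
```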