Data access & catalog format

Layout

The catalogs are on the m4943 allocation, one HDF5 file per realization at each redshift:

/global/cfs/cdirs/m4943/covariance_mocks/
  v1/
    README.md
    catalogs/
      z1.400/  r3000.hdf5 … r4999.hdf5
      z1.700/  …
    metadata/
      manifest_z1.400.csv
      provenance_z1.400.json

catalogs/z<redshift>/r<NNNN>.hdf5 is realization NNNN at that redshift. z=1.4 is available now (1878 realizations); other redshifts are added as they complete.
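
The naming scheme above can be turned into a small path helper. This is a minimal sketch, not part of the covariance_mocks package: the function name catalog_path is hypothetical, and it assumes the redshift tag always has three decimals (as in z1.400) and the realization number is zero-padded to four digits (as in r<NNNN>).

```python
from pathlib import PurePosixPath

# Root of the v1 release, as given in the layout above.
ROOT = PurePosixPath("/global/cfs/cdirs/m4943/covariance_mocks/v1")


def catalog_path(redshift, realization, root=ROOT):
    """Build the path of one realization (hypothetical helper).

    Assumes a three-decimal redshift tag (z1.400) and a four-digit,
    zero-padded realization number (r3000), following the names shown above.
    """
    return root / "catalogs" / f"z{redshift:.3f}" / f"r{realization:04d}.hdf5"


example = catalog_path(1.4, 3000)
print(example)
```

Not every number in a range is guaranteed to be staged, so it is worth checking that a path exists (or consulting the manifest) before opening it.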

Catalog format

Each file has a galaxies/ group of per-object columns (all length n_galaxies):

| Column | Units | Description |
|---|---|---|
| sfr_corr | Msun/yr | Calibrated star-formation rate. |
| sfr_raw | Msun/yr | Raw star-formation rate. |
| mstar_corr | Msun | Calibrated stellar mass. |
| mstar_raw | Msun | Raw stellar mass. |
| mpeak | Msun | Halo peak mass. |
| pos | Mpc/h | Comoving position, shape (N, 3); served as x, y, z. |
| vel | as stored | Peculiar velocity, shape (N, 3); served as vx, vy, vz. |

Box-level attributes include Lbox (= 500 Mpc/h), z_obs (the redshift), n_galaxies, phase (e.g. ph3000), and simulation_box (AbacusSummit_small_c000).
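
Since every column in galaxies/ should have length n_galaxies, a quick consistency check is easy to write. The sketch below is an illustration only: check_column_lengths is a hypothetical name, and it works on any mapping of column name to array-like, so it is demonstrated here on a plain dict with made-up values rather than on a real file.

```python
def check_column_lengths(galaxies, n_galaxies):
    """Return the names of columns whose first-axis length is not n_galaxies.

    `galaxies` is any mapping of column name -> array-like; a plain dict is
    used here, but an open HDF5 group exposes the same .items() / len() shape.
    """
    return [name for name, col in galaxies.items() if len(col) != n_galaxies]


# Made-up toy data: two objects, with one deliberately short column.
toy = {
    "sfr_corr": [1.0, 2.0],
    "pos": [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],  # shape (N, 3): len() is N
    "mpeak": [1.0e12],                           # wrong length on purpose
}
bad = check_column_lengths(toy, n_galaxies=2)
print(bad)
```

With h5py, the same function could be called as check_column_lengths(f["galaxies"], f.attrs["n_galaxies"]) on an open file f, assuming the box-level attributes are stored on the file root; that location is not stated above and should be verified against a real catalog.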

Reading a catalog

from covariance_mocks.selection import Catalog

path = "/global/cfs/cdirs/m4943/covariance_mocks/v1/catalogs/z1.400/r3000.hdf5"
with Catalog.open(path) as cat:
    cat.redshift            # 1.4
    cat.Lbox                # 500.0
    cat.volume              # Lbox**3 in (Mpc/h)^3
    len(cat)                # n_galaxies
    cat.available()         # column names
    cat.column("sfr_corr")  # per-object array
    cat.column("x")         # pos[:, 0]
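
As a worked example of how these quantities combine, the mean number density of a box is the object count divided by the volume. The sketch below uses plain numbers in place of len(cat) and cat.Lbox; the function name and the galaxy count are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def mean_number_density(n_galaxies, lbox):
    """Mean comoving number density in (h/Mpc)^3 for a cubic box of side lbox (Mpc/h)."""
    volume = lbox ** 3  # same quantity as cat.volume
    return n_galaxies / volume


# Hypothetical count, not taken from a real catalog.
density = mean_number_density(n_galaxies=125_000, lbox=500.0)
print(density)
```

With an open catalog, the equivalent expression would be len(cat) / cat.volume.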

The ensemble n(>SFR) table

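A NumberDensity selection with an ensemble threshold relies on the ensemble-averaged cumulative number density n(>sfr_corr), that is, the number density of galaxies with sfr_corr above a given value, averaged over realizations. Build this table from a set of catalogs with build_ensemble_nsfr() (see Quickstart).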
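
To make the quantity concrete, here is a minimal sketch of an ensemble-averaged cumulative density computed on a fixed grid of thresholds. It is not the implementation of build_ensemble_nsfr(); the function names, the threshold grid, and the toy SFR values are all assumptions for illustration, and every realization is assumed to share the same box volume.

```python
from bisect import bisect_right


def cumulative_density(sfr_values, thresholds, volume):
    """n(>t) for each threshold t: objects with SFR strictly above t, per unit volume."""
    ordered = sorted(sfr_values)
    n = len(ordered)
    # bisect_right counts the values <= t, so n minus that is the count above t.
    return [(n - bisect_right(ordered, t)) / volume for t in thresholds]


def ensemble_mean_density(catalog_sfrs, thresholds, volume):
    """Average n(>t) over realizations, threshold by threshold."""
    if not catalog_sfrs:
        raise ValueError("need at least one realization")
    per_catalog = [cumulative_density(s, thresholds, volume) for s in catalog_sfrs]
    return [sum(col) / len(per_catalog) for col in zip(*per_catalog)]


# Toy inputs: two "realizations" with made-up SFR values and a made-up volume.
toy_sfrs = [[1.0, 2.0, 3.0, 4.0], [2.0, 5.0]]
grid = [0.5, 2.0, 4.5]
table = ensemble_mean_density(toy_sfrs, grid, volume=2.0)
print(table)
```

Averaging over realizations smooths out the box-to-box scatter in the threshold that a single catalog would give for a fixed target density.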

Metadata

metadata/manifest_z<redshift>.csv lists each staged realization with its byte size and source. metadata/provenance_z<redshift>.json records the source path, simulation, data model, and realization count.
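
A manifest of this kind can be summarized with the standard csv module. The column names are not documented above, so the header used below (realization, size_bytes, source) is an assumption, as are the sample rows and the function name; adjust size_column to match the real file.

```python
import csv
import io


def summarize_manifest(manifest_file, size_column="size_bytes"):
    """Return (number of realizations, total bytes) for an open manifest CSV.

    `size_column` is a guessed header name; check the real manifest header.
    """
    rows = list(csv.DictReader(manifest_file))
    total = sum(int(row[size_column]) for row in rows)
    return len(rows), total


# Made-up sample standing in for manifest_z1.400.csv.
sample = io.StringIO(
    "realization,size_bytes,source\n"
    "r3000,1024,/hypothetical/source/a\n"
    "r3001,2048,/hypothetical/source/b\n"
)
n_rows, n_bytes = summarize_manifest(sample)
print(n_rows, n_bytes)
```

For a real file, pass an open handle instead, for example open(manifest_path, newline=""). The row count can then be compared with the realization count recorded in the provenance JSON.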