
This function fits a marginal model to each gene and transforms each gene's expression to follow a standard Gaussian distribution.

Usage

fit_marginals(
  gene_list,
  count_mat,
  cov_mat,
  formula1,
  family1,
  DT = TRUE,
  epsilon = 1e-06,
  ncores = ncores,
  seed = 123
)

Arguments

gene_list

A vector of gene names or indices (row names or row numbers of count_mat) to process.

count_mat

A matrix of raw gene expression counts. Rows correspond to genes, columns to observations (cells/spots).

cov_mat

A matrix or data frame of covariates used for marginal modeling (e.g., spatial coordinates or experimental annotations).

formula1

A formula object or string (e.g., "~ covariate1 + covariate2") specifying the model structure for the mean.

family1

A string specifying the distribution family used for modeling. Supported values include "gaussian", "poisson", "nb" (negative binomial), and "zinb" (zero-inflated negative binomial).

DT

Logical; if TRUE, applies a discrete transformation suitable for count data. Default is TRUE.

epsilon

A small numeric constant that keeps the uniform-scale values away from exactly 0 and 1, where the Gaussian quantile function is infinite. Default is 1e-6.

ncores

Integer specifying the number of cores to use for parallel processing via parallel::mclapply.

seed

Random seed for reproducibility. Default is 123.
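
The roles of DT, epsilon, and seed can be illustrated with a minimal sketch. It assumes that the discrete transformation is the usual randomized probability integral transform for counts, and that epsilon is applied by clamping the uniform values before the Gaussian quantile function. The helper pit_pois is hypothetical and written only for this illustration, and a plain Poisson with a known mean stands in for a fitted marginal model; none of this is the package's internal code.

```r
# Hypothetical helper for illustration only; not part of the package.
# Maps Poisson counts to the uniform scale given a mean `mu`.
pit_pois <- function(y, mu, DT = TRUE, epsilon = 1e-6) {
  upper <- ppois(y, lambda = mu)            # F(y)
  if (DT) {
    lower <- ppois(y - 1, lambda = mu)      # F(y - 1); this is 0 when y == 0
    # Draw a point uniformly between F(y - 1) and F(y) to smooth out the jumps
    u <- lower + runif(length(y)) * (upper - lower)
  } else {
    # Assumption: without the transformation, use the plain CDF value
    u <- upper
  }
  # Keep values strictly inside (0, 1) so that qnorm() stays finite
  pmin(pmax(u, epsilon), 1 - epsilon)
}

set.seed(123)                               # the random draw is why a seed matters
mu <- 3
y <- rpois(2000, lambda = mu)

u_plain <- pit_pois(y, mu, DT = FALSE)
u_dt <- pit_pois(y, mu, DT = TRUE)
z <- qnorm(u_dt)                            # Gaussian-scale values, all finite

length(unique(u_plain))                     # at most one value per distinct count
length(unique(u_dt))                        # many more distinct values
qnorm(c(0, 1))                              # the boundary values map to -Inf and Inf
```

Without the random draw, the uniform-scale values take only a handful of distinct levels, so the Gaussian-scale values would be visibly discrete. The clamp matters mostly for extreme counts whose CDF value rounds to 0 or 1.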

Value

A list containing two matrices:

marginal

A matrix of each gene's expression values, transformed to follow a standard normal distribution.

residual

A matrix of values transformed to a standard uniform distribution (before applying the Gaussian quantile function).
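
How the two returned matrices relate can be sketched on toy data with base R alone. The sketch assumes a Poisson family fitted per gene with glm, a single hypothetical covariate named layer, and output matrices with genes in rows and observations in columns like count_mat; the output orientation and all object names here are assumptions, and the code is not the package's implementation.

```r
set.seed(123)
n_obs <- 200

# Hypothetical covariates and counts (genes in rows, observations in columns)
cov_mat <- data.frame(layer = factor(sample(c("L1", "L2"), n_obs, replace = TRUE)))
count_mat <- rbind(
  geneA = rpois(n_obs, lambda = ifelse(cov_mat$layer == "L1", 2, 6)),
  geneB = rpois(n_obs, lambda = 4)
)
epsilon <- 1e-6

# Fit one gene and return its values on the uniform scale
fit_one_gene <- function(y) {
  fit <- glm(y ~ layer, family = poisson(), data = cbind(cov_mat, y = y))
  mu <- fitted(fit)                          # fitted mean for each observation
  lower <- ppois(y - 1, lambda = mu)
  upper <- ppois(y, lambda = mu)
  u <- lower + runif(length(y)) * (upper - lower)
  pmin(pmax(u, epsilon), 1 - epsilon)        # stay strictly inside (0, 1)
}

# apply() returns observations x genes, so transpose back to genes x observations
residual <- t(apply(count_mat, 1, fit_one_gene))
marginal <- qnorm(residual)                  # Gaussian quantile function

toy_res <- list(marginal = marginal, residual = residual)
str(toy_res)
```

In this sketch, marginal is obtained from residual by a single call to qnorm, so pnorm(toy_res$marginal) recovers toy_res$residual up to floating-point error.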

Examples

data(test_data)
# Fit standardized marginals for gene expressions
marginal_res <- fit_marginals(
  gene_list = test_data$gene_list,
  count_mat = test_data$count_mat,
  cov_mat = test_data$cov_mat,
  formula1 = "layer_annotations",
  family1 = "nb",
  DT = TRUE,
  epsilon = 1e-6,
  ncores = 2
)