medmodels.matching.covariates.covariates_preprocessing

covariate_coarsen

def covariate_coarsen(
        covariate: NDArray[Union[np.float64, np.int64]],
        n_bins: int = 10) -> NDArray[Union[np.float64, np.int64]]

Bins a continuous variable into discrete intervals. This function divides the range of covariate into n_bins equal-width bins and assigns each value to a bin represented by a discrete integer. To keep working when all covariate values are equal, it adds a small amount of noise.

Arguments:

  • covariate NDArray[Union[np.float64, np.int64]] - The continuous variable to be binned.
  • n_bins int, optional - The number of bins to divide the covariate into. Defaults to 10.

Returns:

  • NDArray[Union[np.float64, np.int64]] - An array of discrete integers representing the bin assignments for each entry in covariate.

Example:

For covariate = [1, 5, 10, 14, 15] and n_bins = 3, the function might return [1, 1, 2, 3, 3], indicating the bin assignment for each value in covariate.
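
The equal-width binning described above can be illustrated with a minimal NumPy sketch. This is not the medmodels implementation: the name coarsen_sketch is hypothetical, bin labels are assumed to start at 1 (as in the example above), and the all-equal case is handled by assigning every value to the first bin rather than by adding noise.

```python
import numpy as np


def coarsen_sketch(covariate, n_bins=10):
    """Hypothetical stand-in for equal-width binning (not the medmodels code)."""
    values = np.asarray(covariate, dtype=np.float64)
    lo, hi = values.min(), values.max()
    if lo == hi:
        # Assumption: with a zero-width range, put every entry in the first bin.
        return np.ones(values.shape, dtype=np.int64)
    # n_bins + 1 equally spaced edges spanning [lo, hi].
    edges = np.linspace(lo, hi, n_bins + 1)
    # Compare against interior edges only, so the maximum lands in the last bin.
    return np.digitize(values, edges[1:-1]) + 1


bins = coarsen_sketch([1, 5, 10, 14, 15], n_bins=3)
print(bins)  # [1 1 2 3 3]
```

Here the range 1 to 15 is split at roughly 5.67 and 10.33, which is why 1 and 5 share the first bin, 10 falls in the second, and 14 and 15 fall in the third.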

covariate_add_noise

def covariate_add_noise(covariate: pd.Series[Union[int, float]],
                        n_digits: int = 2) -> pd.Series[float]

Adds noise after a specified number of decimal places to a discrete variable, transforming it into a continuous variable. This is particularly useful for simulations, examples, and tests, allowing discrete variables to be used in contexts requiring continuous variables.

Arguments:

  • covariate pd.Series[Union[int, float]] - The discrete variable to be transformed.
  • n_digits int, optional - Specifies the decimal place after which to add noise. A positive value adds noise with a magnitude less than 1, while a negative value can increase the noise magnitude. Defaults to 2, resulting in noise between 0 and 0.01.

Returns:

  • pd.Series[float] - A pandas Series containing the modified covariate with added noise.

Example:

If covariate is a Series of integers and n_digits is 2, the function adds random noise between 0 and 0.01 to each entry, effectively making the variable continuous.
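
The effect of n_digits on the noise magnitude can be sketched as follows. This is an illustration under stated assumptions, not the medmodels implementation: the name add_noise_sketch and the seed parameter are hypothetical, the noise is assumed to be uniform on [0, 10^(-n_digits)), and a plain NumPy array stands in for the pandas Series that the documented function accepts.

```python
import numpy as np


def add_noise_sketch(covariate, n_digits=2, seed=None):
    """Hypothetical stand-in: add uniform noise below 10 ** -n_digits."""
    rng = np.random.default_rng(seed)  # seed is an assumption, for reproducibility
    values = np.asarray(covariate, dtype=np.float64)
    # n_digits = 2 gives an upper bound of 0.01; n_digits = -1 gives 10.
    upper = 10.0 ** -n_digits
    noise = rng.uniform(0.0, upper, size=values.shape)
    return values + noise


ages = np.array([30, 30, 45, 60])
noisy_ages = add_noise_sketch(ages, n_digits=2, seed=0)
print(noisy_ages)
```

With n_digits = 2 each value is shifted by less than 0.01, so the two entries equal to 30 will almost certainly become distinct while both still round to 30.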