medmodels.matching.covariates.covariates_preprocessing
covariate_coarsen
def covariate_coarsen(covariate: np.ndarray, n_bins: int = 10) -> np.ndarray
Bins a continuous variable into discrete intervals. This method divides the range of covariate into n_bins equal-width bins and assigns each value to a bin represented by a discrete integer. To keep working when all covariate values are equal, it adds a small amount of noise.
Arguments:
covariate (np.ndarray) - The continuous variable to be binned.
n_bins (int, optional) - The number of bins to divide the covariate into. Defaults to 10.
Returns:
np.ndarray - An array of discrete integers representing the bin assignments for each entry in covariate.
Example:
For covariate = [1, 5, 10, 14, 15] and n_bins = 3, the function might return [1, 1, 2, 3, 3], indicating the bin assignment for each value in covariate.
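The equal-width binning described above can be sketched in plain NumPy. This is an illustrative sketch, not the medmodels implementation: the name coarsen_sketch is hypothetical, and where the documented function adds a small noise to handle identical values, the sketch instead widens the range slightly so that the result stays deterministic.

```python
import numpy as np


def coarsen_sketch(covariate, n_bins=10):
    """Hypothetical stand-in for covariate_coarsen (not the library code)."""
    values = np.asarray(covariate, dtype=float)
    lo, hi = values.min(), values.max()
    if lo == hi:
        # Assumption: widen the range a little instead of adding noise,
        # so that identical values still yield valid bin edges.
        hi = lo + 1e-5
    # n_bins equal-width bins need n_bins + 1 edges over [lo, hi].
    edges = np.linspace(lo, hi, n_bins + 1)
    # Digitize against the interior edges only, so the maximum value
    # falls into the last bin; add 1 to get labels 1..n_bins.
    return np.digitize(values, edges[1:-1]) + 1


bins = coarsen_sketch([1, 5, 10, 14, 15], n_bins=3)
constant_bins = coarsen_sketch([5, 5, 5], n_bins=3)
print(bins)
```

With three bins over the range 1 to 15, the interior edges sit at roughly 5.67 and 10.33, which is why 1 and 5 share the first bin and 14 and 15 share the last.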
covariate_add_noise
def covariate_add_noise(covariate: Series, n_digits: int = 2) -> Series
Adds noise after a specified number of decimal places to a discrete variable, transforming it into a continuous variable. This is particularly useful for simulations, examples, and tests, allowing discrete variables to be used in contexts requiring continuous variables.
Arguments:
covariate (Series) - The discrete variable to be transformed.
n_digits (int, optional) - Specifies the decimal place after which to add noise. A positive value adds noise with a magnitude less than 1, while a negative value can increase the noise magnitude. Defaults to 2, resulting in noise between 0 and 0.01.
Returns:
Series - A pandas Series containing the modified covariate with added noise.
Example:
If covariate is a Series of integers and n_digits is 2, the function will add a random noise between 0 and 0.01 to each entry, effectively making the variable continuous.
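One way to picture the noise described above is as uniform draws from the interval [0, 10^(-n_digits)). The following is a minimal sketch under that assumption, not the medmodels implementation: the name add_noise_sketch and the seed parameter are hypothetical, and the library may use a different distribution or random source.

```python
import numpy as np
import pandas as pd


def add_noise_sketch(covariate, n_digits=2, seed=None):
    """Hypothetical stand-in for covariate_add_noise (not the library code)."""
    # Assumption: a seedable generator, added here only for reproducibility.
    rng = np.random.default_rng(seed)
    # n_digits=2 gives an upper bound of 0.01; a negative n_digits
    # raises the bound (for example, n_digits=-1 gives 10).
    upper = 10.0 ** (-n_digits)
    noise = rng.uniform(0.0, upper, size=len(covariate))
    # Adding a NumPy array to a Series keeps the original index.
    return covariate + noise


discrete = pd.Series([1, 2, 3])
noisy = add_noise_sketch(discrete, n_digits=2, seed=0)
print(noisy)
```

Because the noise stays below 0.01, each entry keeps its integer part and only gains a fractional part, so the result has a float dtype.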