API Reference
v0.1.0
matching
Algorithms

medmodels.matching.algorithms.propensity_score

calculate_propensity

def calculate_propensity(
    x_train: NDArray[Union[np.int64, np.float64]],
    y_train: NDArray[Union[np.int64, np.float64]],
    treated_test: NDArray[Union[np.int64, np.float64]],
    control_test: NDArray[Union[np.int64, np.float64]],
    model: Model = "logit",
    hyperparam: Optional[Dict[str, Any]] = None
) -> Tuple[NDArray[np.float64], NDArray[np.float64]]

Trains a classifier on the training data and returns, for the treated and control test sets, the predicted probability of belonging to the last class.

This function supports multiple classification algorithms and allows specifying hyperparameters. It is designed for binary classification tasks, focusing on the probability of the positive class.

Arguments:

  • x_train NDArray[Union[np.int64, np.float64]] - Feature matrix for training.
  • y_train NDArray[Union[np.int64, np.float64]] - Target variable for training.
  • treated_test NDArray[Union[np.int64, np.float64]] - Feature matrix for the treated group to predict probabilities.
  • control_test NDArray[Union[np.int64, np.float64]] - Feature matrix for the control group to predict probabilities.
  • model Model, optional - Classification algorithm to use. Options: "logit", "dec_tree", "forest". Defaults to "logit".
  • hyperparam Optional[Dict[str, Any]], optional - Manual hyperparameter settings. Uses the defaults if None.

Returns:

  • Tuple[NDArray[np.float64], NDArray[np.float64]] - Probabilities of the positive class for the treated and control groups.

Example:

With the "dec_tree" model and inputs drawn from the iris dataset, the function returns the last-class probabilities for the treated and control sets, e.g., ([0.], [0.]).
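The behaviour described above can be illustrated with a minimal sketch. It assumes scikit-learn classifiers stand behind the three model names; that mapping, the helper name sketch_propensity, and the toy data are hypothetical and are not taken from the library itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Assumed mapping from the documented model names to scikit-learn classifiers.
CLASSIFIERS = {
    "logit": LogisticRegression,
    "dec_tree": DecisionTreeClassifier,
    "forest": RandomForestClassifier,
}


def sketch_propensity(x_train, y_train, treated_test, control_test,
                      model="logit", hyperparam=None):
    """Hypothetical stand-in: fit a classifier, return last-class probabilities."""
    clf = CLASSIFIERS[model](**(hyperparam or {}))
    clf.fit(x_train, y_train)
    # predict_proba orders its columns by clf.classes_, so -1 selects the last class.
    treated_prob = clf.predict_proba(treated_test)[:, -1]
    control_prob = clf.predict_proba(control_test)[:, -1]
    return treated_prob, control_prob


# Toy data: one feature, perfectly separable at 2.5.
x_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
treated_test = np.array([[4.5], [5.5]])
control_test = np.array([[0.5], [1.5]])

treated_prob, control_prob = sketch_propensity(
    x_train, y_train, treated_test, control_test,
    model="dec_tree", hyperparam={"random_state": 0},
)
# The tree separates this toy data exactly: treated -> 1.0, control -> 0.0.
print(treated_prob, control_prob)
```

A decision tree yields hard 0/1 probabilities on separable data, which is why an output such as ([0.], [0.]) is plausible for "dec_tree"; "logit" would return smoother values between 0 and 1.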

run_propensity_score

def run_propensity_score(
    treated_set: pl.DataFrame,
    control_set: pl.DataFrame,
    model: Model = "logit",
    metric: Metric = "absolute",
    number_of_neighbors: int = 1,
    hyperparam: Optional[Dict[str, Any]] = None,
    covariates: Optional[MedRecordAttributeInputList] = None
) -> pl.DataFrame

Executes propensity score matching using a specified classification algorithm. Constructs the training target by assigning 1 to the treated set and 0 to the control set, then predicts the propensity score of every unit. The scores are then used for nearest neighbor matching.

This function simplifies the process of propensity score matching, focusing on the use of the propensity score as the sole covariate for matching.

Arguments:

  • treated_set pl.DataFrame - Data for the treated group.
  • control_set pl.DataFrame - Data for the control group.
  • model Model, optional - Classification algorithm for predicting probabilities. Options include "logit", "dec_tree", "forest". Defaults to "logit".
  • metric Metric, optional - Metric for matching. Options include "absolute", "mahalanobis", "exact". Defaults to "absolute".
  • number_of_neighbors int, optional - Number of nearest neighbors to find for each treated unit. Defaults to 1.
  • hyperparam Optional[Dict[str, Any]], optional - Hyperparameters for model tuning. Increases computation time if set. Uses default if None.
  • covariates Optional[MedRecordAttributeInputList], optional - Features for matching. Uses all features if None.

Returns:

  • pl.DataFrame - Matched subset from the control set corresponding to the treated set.
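The steps described for run_propensity_score (target construction, scoring, matching on the score) can be sketched as follows. This is an illustration under assumptions, not the library's code: it uses NumPy arrays instead of pl.DataFrame, a scikit-learn logistic regression as the "logit" model, made-up covariate values, and the simplest 1-nearest-neighbor lookup, in which a control unit may be reused.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical covariates for each group (two features per unit).
treated = np.array([[2.0, 1.0], [2.5, 0.5], [3.0, 1.5]])
control = np.array([[0.5, 1.0], [1.0, 0.0], [2.2, 1.2], [0.8, 0.7]])

# Training target: 1 for every treated row, 0 for every control row.
x_train = np.vstack([treated, control])
y_train = np.concatenate([
    np.ones(len(treated), dtype=int),
    np.zeros(len(control), dtype=int),
])

# Propensity score = predicted probability of belonging to the treated class.
clf = LogisticRegression().fit(x_train, y_train)
treated_score = clf.predict_proba(treated)[:, -1]
control_score = clf.predict_proba(control)[:, -1]

# "absolute" metric on the score alone: for each treated unit, take the
# control unit whose score is closest (controls may repeat in this sketch).
score_gap = np.abs(treated_score[:, None] - control_score[None, :])
nearest = score_gap.argmin(axis=1)
matched_controls = control[nearest]
print(matched_controls)
```

Because the score is the only quantity compared, units with very different covariates can still be matched when their predicted probabilities are close.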

medmodels.matching.algorithms.classic_distance_models

nearest_neighbor

def nearest_neighbor(
    treated_set: pl.DataFrame,
    control_set: pl.DataFrame,
    metric: metrics.Metric,
    number_of_neighbors: int = 1,
    covariates: Optional[MedRecordAttributeInputList] = None
) -> pl.DataFrame

Performs nearest neighbor matching between two dataframes using a specified metric. A greedy algorithm pairs each element of the treated set with its closest match in the control set under that metric. Greedy matching does not guarantee the best overall matching, but it is simple and commonly used.

The method works with any supported metric. The sizes of the treated and control sets must be compared beforehand to determine the direction of matching. Covariates can optionally be specified to restrict which variables are used for matching.

Arguments:

  • treated_set pl.DataFrame - DataFrame for which matches are sought.
  • control_set pl.DataFrame - DataFrame from which matches are selected.
  • metric metrics.Metric - Metric to measure closeness between units, e.g., "absolute", "mahalanobis". The metric must be available in the metrics module.
  • number_of_neighbors int, optional - Number of nearest neighbors to find for each treated unit. Defaults to 1.
  • covariates Optional[MedRecordAttributeInputList], optional - Covariates considered for matching. Defaults to all variables.

Returns:

  • pl.DataFrame - Matched subset from the control set.
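A small sketch may help show what greedy matching means and why it is not globally optimal. The helper greedy_nearest_neighbor below is hypothetical: it works on plain arrays, uses the summed absolute difference as its distance, and assumes each control unit is used at most once. It also skips the size comparison mentioned above. None of this is confirmed to mirror the library's implementation.

```python
import numpy as np


def greedy_nearest_neighbor(treated, control, number_of_neighbors=1):
    """Greedily pick the closest unused control rows for each treated row.

    Hypothetical helper: returns the indices of the chosen control rows,
    in the order the treated rows are processed.
    """
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    if len(control) < len(treated) * number_of_neighbors:
        raise ValueError("not enough control units to match without reuse")

    available = np.ones(len(control), dtype=bool)
    matched = []
    for row in treated:
        # Summed absolute difference across covariates.
        distance = np.abs(control - row).sum(axis=1)
        # Controls that are already taken can no longer be selected.
        distance[~available] = np.inf
        picks = np.argsort(distance, kind="stable")[:number_of_neighbors]
        available[picks] = False
        matched.extend(picks.tolist())
    return matched


# The first treated unit takes control 0 (distance 0.08), which leaves the
# second treated unit with control 1 (distance 0.5). Swapping the pairs
# would give a smaller total distance, but a greedy pass never revisits
# an earlier choice.
treated = [[0.5], [0.4]]
control = [[0.42], [0.9]]
print(greedy_nearest_neighbor(treated, control))  # [0, 1]
```

The result depends on the order in which treated units are processed, which is the usual trade-off of greedy matching against optimal matching.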