API Reference
v0.0.1
matching
Algorithms

medmodels.matching.algorithms.classic_distance_models

nearest_neighbor

def nearest_neighbor(treated_set: pd.DataFrame,
                     control_set: pd.DataFrame,
                     metric: str,
                     covariates: Optional[List[str]] = None) -> pd.DataFrame

Performs nearest neighbor matching between two DataFrames using a specified metric. A greedy algorithm pairs each unit in the treated set with its closest unit in the control set under that metric. The result is not guaranteed to be the best overall matching, but the approach is simple and widely used. Any supported metric can be used. The sizes of the treated and control sets are compared beforehand to determine the direction of matching. Covariates can optionally be specified to restrict which variables are used for matching.

Arguments:

  • treated_set pd.DataFrame - DataFrame for which matches are sought.
  • control_set pd.DataFrame - DataFrame from which matches are selected.
  • metric str - Metric to measure closeness between units, e.g., "absolute", "mahalanobis".
  • covariates Optional[List[str]], optional - Covariates considered for matching. Defaults to all variables.

Returns:

  • pd.DataFrame - Matched subset from the control set.
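The greedy pairing described above can be illustrated with a minimal sketch. This is not the medmodels implementation: the helper name `greedy_nearest_neighbor` is hypothetical, only an absolute (L1) distance is shown, and the sketch assumes matching without replacement and a control set at least as large as the treated set.

```python
import numpy as np
import pandas as pd


def greedy_nearest_neighbor(treated, control, covariates=None):
    """Illustrative greedy 1:1 matching with an absolute (L1) distance.

    Hypothetical helper for explanation only, not the library function.
    """
    # Use every column when no covariates are given.
    columns = list(treated.columns) if covariates is None else covariates
    treated_values = treated[columns].to_numpy(dtype=float)
    control_values = control[columns].to_numpy(dtype=float)

    available = np.ones(len(control), dtype=bool)  # controls not yet used
    matched_positions = []
    for row in treated_values:
        # Sum of absolute differences to every control unit.
        distances = np.abs(control_values - row).sum(axis=1)
        # Assumption: each control unit can be matched at most once.
        distances[~available] = np.inf
        best = int(np.argmin(distances))
        matched_positions.append(best)
        available[best] = False
    return control.iloc[matched_positions]


treated = pd.DataFrame({"age": [30.0, 50.0], "bmi": [22.0, 27.0]})
control = pd.DataFrame({"age": [52.0, 31.0, 70.0], "bmi": [28.0, 21.0, 30.0]})
matched = greedy_nearest_neighbor(treated, control)
```

Here the first treated unit takes control row 1 (distance 2) and the second takes control row 0 (distance 3). Because each choice is made in order and never revisited, the total distance is not necessarily minimal, which is the trade-off the description mentions.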

medmodels.matching.algorithms.propensity_score

calculate_propensity

def calculate_propensity(
        x_train: np.ndarray,
        y_train: np.ndarray,
        treated_test: np.ndarray,
        control_test: np.ndarray,
        hyperparam: Optional[dict] = None,
        metric: str = "logit") -> Tuple[np.ndarray, np.ndarray]

Trains a classification algorithm on training data, predicts the probability of being in the last class for treated and control test datasets, and returns these probabilities.

This function supports multiple classification algorithms and allows specifying hyperparameters. It is designed for binary classification tasks, focusing on the probability of the positive class.

Arguments:

  • x_train np.ndarray - Feature matrix for training.
  • y_train np.ndarray - Target variable for training.
  • treated_test np.ndarray - Feature matrix for the treated group to predict probabilities.
  • control_test np.ndarray - Feature matrix for the control group to predict probabilities.
  • hyperparam Optional[dict], optional - Manual hyperparameter settings. Uses default if None.
  • metric str, optional - Classification algorithm to use. Options: "logit", "dec_tree", "forest". Defaults to "logit".

Returns:

  • Tuple[np.ndarray, np.ndarray] - Probabilities of the positive class for the treated and control groups.

Example:

With the "dec_tree" metric and inputs drawn from the iris dataset, the function returns the probabilities of the last class for the treated and control sets, e.g., ([0.], [0.]).
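The train-then-predict flow described for this function can be sketched with scikit-learn. This is a hedged illustration, not the library code: the helper name `propensity_sketch` and the toy data are hypothetical, and it assumes the "logit" option corresponds to a logistic regression classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def propensity_sketch(x_train, y_train, treated_test, control_test, hyperparam=None):
    """Illustrative only: fit a classifier, return last-class probabilities."""
    # Assumption: hyperparam is a dict of keyword arguments for the classifier.
    model = LogisticRegression(**(hyperparam or {}))
    model.fit(x_train, y_train)
    # predict_proba orders its columns like model.classes_, so the last
    # column holds the probability of the last (positive) class.
    treated_prob = model.predict_proba(treated_test)[:, -1]
    control_prob = model.predict_proba(control_test)[:, -1]
    return treated_prob, control_prob


# Toy data: larger feature values belong to class 1.
x_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
treated_prob, control_prob = propensity_sketch(
    x_train, y_train, np.array([[4.5]]), np.array([[0.5]])
)
```

Each returned array has one probability per test row, so the two results can be compared directly when matching.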

run_propensity_score

def run_propensity_score(treated_set: pd.DataFrame,
                         control_set: pd.DataFrame,
                         model: str = "logit",
                         hyperparam: Optional[Any] = None,
                         covariates: Optional[list] = None) -> pd.DataFrame

Executes propensity score matching using a specified classification algorithm. The training target is constructed by assigning 1 to the treated set and 0 to the control set, and the trained model then predicts a propensity score for each unit. These scores are used to match units with the nearest neighbor method.

Arguments:

  • treated_set pd.DataFrame - Data for the treated group.
  • control_set pd.DataFrame - Data for the control group.
  • model str, optional - Classification algorithm for predicting probabilities. Options include "logit", "dec_tree", "forest". Defaults to "logit".
  • hyperparam Optional[Any], optional - Hyperparameters for model tuning. Increases computation time if set.
  • covariates Optional[list], optional - Features for matching. Uses all if None.

Returns:

  • pd.DataFrame - Matched subset from the control set corresponding to the treated set.

This function simplifies propensity score matching by using the propensity score as the sole covariate for matching.
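The full pipeline (build the 1/0 target, estimate scores, then match on the score alone) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the medmodels implementation: `propensity_match_sketch` and the `age_z` column are hypothetical, logistic regression stands in for the "logit" model, and matching is greedy, without replacement, on the absolute score difference.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def propensity_match_sketch(treated, control, covariates=None):
    """Illustrative propensity score matching; hypothetical helper."""
    columns = list(treated.columns) if covariates is None else covariates
    x_treated = treated[columns].to_numpy(dtype=float)
    x_control = control[columns].to_numpy(dtype=float)

    # Training target: 1 for treated units, 0 for control units.
    x_train = np.vstack([x_treated, x_control])
    y_train = np.concatenate([np.ones(len(x_treated)), np.zeros(len(x_control))])

    model = LogisticRegression().fit(x_train, y_train)
    treated_score = model.predict_proba(x_treated)[:, -1]
    control_score = model.predict_proba(x_control)[:, -1]

    # Greedy nearest neighbor matching on the score as the sole covariate.
    available = np.ones(len(control), dtype=bool)
    positions = []
    for score in treated_score:
        distances = np.abs(control_score - score)
        distances[~available] = np.inf  # assumption: no control reused
        best = int(np.argmin(distances))
        positions.append(best)
        available[best] = False
    return control.iloc[positions]


# age_z is a hypothetical standardized covariate.
treated = pd.DataFrame({"age_z": [1.0, 2.0]})
control = pd.DataFrame({"age_z": [-2.0, -1.0, 0.9, 2.1]})
matched = propensity_match_sketch(treated, control)
```

With a single covariate the score is a monotone function of that covariate, so the matched controls are the ones closest in `age_z`. With several covariates, units that differ on individual features can still be matched if their scores are similar.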