medmodels.matching.algorithms.classic_distance_models
nearest_neighbor
def nearest_neighbor(treated_set: pd.DataFrame,
                     control_set: pd.DataFrame,
                     metric: str,
                     covariates: Optional[List[str]] = None) -> pd.DataFrame
Performs nearest neighbor matching between two dataframes using a specified metric. A greedy algorithm pairs each element of the treated set with its closest match in the control set under that metric. The algorithm does not optimize the overall matching; it is a simple, commonly used approach. It works with different metrics, and the sizes of the treated and control sets must be compared beforehand to determine the direction of matching. Covariates can optionally be specified to restrict the matching to selected variables.
Arguments:
treated_set
pd.DataFrame - DataFrame for which matches are sought.
control_set
pd.DataFrame - DataFrame from which matches are selected.
metric
str - Metric to measure closeness between units, e.g., "absolute", "mahalanobis".
covariates
Optional[List[str]], optional - Covariates considered for matching. Defaults to all variables.
Returns:
pd.DataFrame - Matched subset from the control set.
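The greedy pairing described above can be illustrated with a minimal sketch. This is not the medmodels implementation: the function name `greedy_nearest_neighbor` is hypothetical, the "absolute" metric is assumed to mean the sum of absolute covariate differences, and matching is assumed to be one-to-one without replacement.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def greedy_nearest_neighbor(
    treated: pd.DataFrame,
    control: pd.DataFrame,
    covariates: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical sketch: greedy 1:1 matching without replacement."""
    # Size check up front: every treated unit needs its own control unit.
    if len(control) < len(treated):
        raise ValueError("control set must be at least as large as treated set")

    # Default to all columns of the treated set when no covariates are given.
    columns = covariates if covariates is not None else list(treated.columns)
    treated_values = treated[columns].to_numpy(dtype=float)
    control_values = control[columns].to_numpy(dtype=float)

    available = np.ones(len(control), dtype=bool)  # controls not yet matched
    matched_positions = []
    for row in treated_values:
        # Assumed "absolute" metric: sum of absolute covariate differences.
        distances = np.abs(control_values - row).sum(axis=1)
        distances[~available] = np.inf  # never reuse a control unit
        best = int(np.argmin(distances))
        matched_positions.append(best)
        available[best] = False

    # One control row per treated row, in the order the treated rows were visited.
    return control.iloc[matched_positions]


treated = pd.DataFrame({"age": [30, 60], "bmi": [22.0, 30.0]})
control = pd.DataFrame({"age": [58, 31, 45], "bmi": [29.0, 23.0, 25.0]})
matched = greedy_nearest_neighbor(treated, control)
```

Because each treated unit takes the closest control still available, the result depends on the order of the treated rows, which is why a greedy match is not guaranteed to minimize the total distance.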
medmodels.matching.algorithms.propensity_score
calculate_propensity
def calculate_propensity(
    x_train: np.ndarray,
    y_train: np.ndarray,
    treated_test: np.ndarray,
    control_test: np.ndarray,
    hyperparam: Optional[dict] = None,
    metric: str = "logit") -> Tuple[np.ndarray, np.ndarray]
Trains a classification algorithm on training data, predicts the probability of being in the last class for treated and control test datasets, and returns these probabilities.
This function supports multiple classification algorithms and allows specifying hyperparameters. It is designed for binary classification tasks, focusing on the probability of the positive class.
Arguments:
x_train
np.ndarray - Feature matrix for training.
y_train
np.ndarray - Target variable for training.
treated_test
np.ndarray - Feature matrix for the treated group to predict probabilities.
control_test
np.ndarray - Feature matrix for the control group to predict probabilities.
hyperparam
Optional[dict], optional - Manual hyperparameter settings. Uses default if None.
metric
str, optional - Classification algorithm to use. Options: "logit", "dec_tree", "forest".
Returns:
Tuple[np.ndarray, np.ndarray]: Probabilities of the positive class for treated and control groups.
Example:
For "dec_tree" metric with iris dataset inputs, returns probabilities of the last class for treated and control sets, e.g., ([0.], [0.]).
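The train-then-predict flow can be sketched with scikit-learn. This is an illustration under stated assumptions, not the library's code: the name `propensity_sketch` is hypothetical, and the mapping of "logit", "dec_tree", and "forest" to `LogisticRegression`, `DecisionTreeClassifier`, and `RandomForestClassifier` is assumed from the option names.

```python
from typing import Optional, Tuple

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Assumed mapping from the documented option names to scikit-learn estimators.
MODELS = {
    "logit": LogisticRegression,
    "dec_tree": DecisionTreeClassifier,
    "forest": RandomForestClassifier,
}


def propensity_sketch(
    x_train: np.ndarray,
    y_train: np.ndarray,
    treated_test: np.ndarray,
    control_test: np.ndarray,
    hyperparam: Optional[dict] = None,
    metric: str = "logit",
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical sketch: fit a classifier, return last-class probabilities."""
    # Fall back to the estimator's defaults when no hyperparameters are given.
    model = MODELS[metric](**(hyperparam or {}))
    model.fit(x_train, y_train)
    # predict_proba orders its columns by model.classes_, so column -1 is the
    # last class (the positive class for 0/1 labels).
    treated_prob = model.predict_proba(treated_test)[:, -1]
    control_prob = model.predict_proba(control_test)[:, -1]
    return treated_prob, control_prob


# Synthetic two-class data: class 0 centered at 0, class 1 centered at 2.
rng = np.random.default_rng(0)
x_train = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

treated_scores, control_scores = propensity_sketch(
    x_train,
    y_train,
    treated_test=np.array([[2.0, 2.0]]),
    control_test=np.array([[0.0, 0.0]]),
    hyperparam={"max_iter": 200},
)
```

Here the test point near the class 1 center should receive a higher probability than the point near the class 0 center; the exact values depend on the fitted model.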
run_propensity_score
def run_propensity_score(treated_set: pd.DataFrame,
                         control_set: pd.DataFrame,
                         model: str = "logit",
                         hyperparam: Optional[Any] = None,
                         covariates: Optional[list] = None) -> pd.DataFrame
Executes propensity score matching using a specified classification algorithm. Constructs the training target by assigning 1 to the treated set and 0 to the control set, then predicts a propensity score for each unit. The scores are then used for matching with the nearest neighbor method.
Arguments:
treated_set
pd.DataFrame - Data for the treated group.
control_set
pd.DataFrame - Data for the control group.
model
str, optional - Classification algorithm for predicting probabilities. Options include "logit", "dec_tree", "forest".
hyperparam
Optional[Any], optional - Hyperparameters for model tuning. Increases computation time if set.
covariates
Optional[list], optional - Features for matching. Uses all if None.
Returns:
pd.DataFrame - Matched subset from the control set corresponding to the treated set.
This function simplifies the process of propensity score matching, focusing on the use of the propensity score as the sole covariate for matching.
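The whole procedure (label the two groups, fit a classifier, then match on the predicted score alone) might look roughly like the sketch below. It is an assumption-laden illustration rather than the medmodels implementation: `propensity_match_sketch` is a hypothetical name, logistic regression from scikit-learn stands in for the "logit" option, and matching is greedy, one-to-one, and without replacement.

```python
from typing import List, Optional

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def propensity_match_sketch(
    treated: pd.DataFrame,
    control: pd.DataFrame,
    covariates: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical sketch: propensity scores + greedy nearest neighbor."""
    if len(control) < len(treated):
        raise ValueError("control set must be at least as large as treated set")

    # Use all columns of the treated set when no covariates are given.
    columns = covariates if covariates is not None else list(treated.columns)
    x_treated = treated[columns].to_numpy(dtype=float)
    x_control = control[columns].to_numpy(dtype=float)

    # Training target: 1 for the treated set, 0 for the control set.
    x_train = np.vstack([x_treated, x_control])
    y_train = np.concatenate([np.ones(len(treated)), np.zeros(len(control))])

    model = LogisticRegression(max_iter=1000).fit(x_train, y_train)
    # Last column of predict_proba is the probability of class 1 (treated).
    treated_scores = model.predict_proba(x_treated)[:, -1]
    control_scores = model.predict_proba(x_control)[:, -1]

    # Greedy matching with the propensity score as the sole covariate.
    available = np.ones(len(control), dtype=bool)
    positions = []
    for score in treated_scores:
        distances = np.abs(control_scores - score)
        distances[~available] = np.inf  # each control is used at most once
        best = int(np.argmin(distances))
        positions.append(best)
        available[best] = False
    return control.iloc[positions]


treated = pd.DataFrame({"age": [62, 70, 65]})
control = pd.DataFrame({"age": [30, 64, 45, 71, 68]})
matched = propensity_match_sketch(treated, control)
```

Matching on a single score keeps the distance computation one-dimensional regardless of how many covariates were used to fit the classifier.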