medmodels.matching.algorithms.propensity_score
calculate_propensity
def calculate_propensity(
x_train: NDArray[Union[np.int64, np.float64]],
y_train: NDArray[Union[np.int64, np.float64]],
treated_test: NDArray[Union[np.int64, np.float64]],
control_test: NDArray[Union[np.int64, np.float64]],
model: Model = "logit",
hyperparam: Optional[Dict[str, Any]] = None
) -> Tuple[NDArray[np.float64], NDArray[np.float64]]
Trains a classification algorithm on training data, predicts the probability of being in the last class for treated and control test datasets, and returns these probabilities.
This function supports multiple classification algorithms and allows specifying hyperparameters. It is designed for binary classification tasks, focusing on the probability of the positive class.
Arguments:
x_train
NDArray[Union[np.int64, np.float64]] - Feature matrix for training.
y_train
NDArray[Union[np.int64, np.float64]] - Target variable for training.
treated_test
NDArray[Union[np.int64, np.float64]] - Feature matrix for the treated group to predict probabilities.
control_test
NDArray[Union[np.int64, np.float64]] - Feature matrix for the control group to predict probabilities.
model
Model, optional - Classification algorithm to use. Options: "logit", "dec_tree", "forest".
hyperparam
Optional[Dict[str, Any]], optional - Manual hyperparameter settings. Uses default if None.
Returns:
Tuple[NDArray[np.float64], NDArray[np.float64]]: Probabilities of the positive class for the treated and control groups, in that order.
Example:
With the "dec_tree" model and inputs drawn from the iris dataset, the function returns the probabilities of the last class for the treated and control sets, e.g., ([0.], [0.]).
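The train-then-predict flow described above can be sketched without any dependencies. This is a minimal illustration, not the library's implementation: `fit_logit`, `predict_positive`, and `propensity_sketch` are hypothetical names, plain lists stand in for the NumPy arrays, and a hand-written gradient-descent logistic regression stands in for the "logit" model.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(
    x: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 2000,
) -> List[float]:
    """Fit logistic-regression weights (bias first) by batch gradient descent."""
    w = [0.0] * (len(x[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(x, y):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))
            err = p - target
            grad[0] += err
            for j, xi in enumerate(row):
                grad[j + 1] += err * xi
        w = [wi - lr * g / len(x) for wi, g in zip(w, grad)]
    return w


def predict_positive(w: Sequence[float], x: Sequence[Sequence[float]]) -> List[float]:
    """Probability of the positive (last) class for each row."""
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))) for row in x]


def propensity_sketch(
    x_train: Sequence[Sequence[float]],
    y_train: Sequence[int],
    treated_test: Sequence[Sequence[float]],
    control_test: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Train once, then score the treated and control test sets separately."""
    w = fit_logit(x_train, y_train)
    return predict_positive(w, treated_test), predict_positive(w, control_test)


# Toy data (made up for illustration): larger feature values belong to class 1.
x_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]
treated_probs, control_probs = propensity_sketch(x_train, y_train, [[3.0]], [[0.0]])
```

On this toy data the treated row scores above 0.5 and the control row below it. The returned pair mirrors the documented return shape: one probability per test row, treated first and control second.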
run_propensity_score
def run_propensity_score(
treated_set: pl.DataFrame,
control_set: pl.DataFrame,
model: Model = "logit",
metric: Metric = "absolute",
number_of_neighbors: int = 1,
hyperparam: Optional[Dict[str, Any]] = None,
covariates: Optional[MedRecordAttributeInputList] = None
) -> pl.DataFrame
Executes propensity score matching using a specified classification algorithm. The function constructs the training target by assigning 1 to the treated set and 0 to the control set, then predicts the propensity score for each unit. Units are then matched on this score with the nearest neighbor method.
This function simplifies propensity score matching by using the propensity score as the sole covariate for matching.
Arguments:
treated_set
pl.DataFrame - Data for the treated group.
control_set
pl.DataFrame - Data for the control group.
model
Model, optional - Classification algorithm for predicting probabilities. Options include "logit", "dec_tree", "forest".
metric
Metric, optional - Metric for matching. Options include "absolute", "mahalanobis", "exact". Defaults to "absolute".
number_of_neighbors
int, optional - Number of nearest neighbors to find for each treated unit. Defaults to 1.
hyperparam
Optional[Dict[str, Any]], optional - Hyperparameters for model tuning. Increases computation time if set. Uses default if None.
covariates
Optional[MedRecordAttributeInputList], optional - Features for matching. Uses all if None.
Returns:
pl.DataFrame - Matched subset from the control set corresponding to the treated set.
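The two steps specific to this function, labeling the stacked data and then matching on a single score column, can be illustrated as follows. This is a sketch under stated assumptions: `build_training_set` and `rank_controls_by_score` are hypothetical helpers, plain lists stand in for the polars DataFrames, and the scores are made-up values standing in for the output of a fitted classifier.

```python
from typing import List, Sequence, Tuple


def build_training_set(
    treated: Sequence[Sequence[float]],
    control: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Stack both groups and label treated rows 1, control rows 0."""
    x_train = [list(row) for row in treated] + [list(row) for row in control]
    y_train = [1] * len(treated) + [0] * len(control)
    return x_train, y_train


def rank_controls_by_score(
    treated_score: float, control_scores: Sequence[float]
) -> List[int]:
    """Control indices ordered by absolute propensity-score difference."""
    return sorted(
        range(len(control_scores)),
        key=lambda i: abs(control_scores[i] - treated_score),
    )


# Illustrative covariates (made up): two treated rows, three control rows.
treated = [[63.0, 1.0], [58.0, 0.0]]
control = [[41.0, 0.0], [60.0, 1.0], [55.0, 0.0]]
x_train, y_train = build_training_set(treated, control)

# Hypothetical propensity scores, as a fitted classifier might produce them.
treated_scores = [0.81, 0.64]
control_scores = [0.12, 0.77, 0.58]

# With the score as the sole covariate, matching reduces to a 1-D comparison.
nearest = [rank_controls_by_score(s, control_scores)[0] for s in treated_scores]
```

Here `nearest` holds, for each treated unit, the index of the control unit with the closest score. The sketch ignores `number_of_neighbors` and does not remove matched controls from the pool.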
medmodels.matching.algorithms.classic_distance_models
nearest_neighbor
def nearest_neighbor(
treated_set: pl.DataFrame,
control_set: pl.DataFrame,
metric: metrics.Metric,
number_of_neighbors: int = 1,
covariates: Optional[MedRecordAttributeInputList] = None
) -> pl.DataFrame
Performs nearest neighbor matching between two dataframes using a specified metric. This method employs a greedy algorithm to pair elements from the treated set with their closest matches in the control set based on the given metric. The algorithm does not optimize for the best overall matching; it is a straightforward, commonly used approach.
The method works with different metrics and requires a preliminary size comparison of the treated and control sets to determine the direction of matching. Covariates can optionally be specified to restrict matching to selected variables.
Arguments:
treated_set
pl.DataFrame - DataFrame for which matches are sought.
control_set
pl.DataFrame - DataFrame from which matches are selected.
metric
metrics.Metric - Metric to measure closeness between units, e.g., "absolute", "mahalanobis". The metric must be available in the metrics module.
number_of_neighbors
int, optional - Number of nearest neighbors to find for each treated unit. Defaults to 1.
covariates
Optional[MedRecordAttributeInputList], optional - Covariates considered for matching. Defaults to all variables.
Returns:
pl.DataFrame - Matched subset from the control set.
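A small sketch shows what "greedy" means here and why the result need not be the best overall matching. It rests on assumptions the documentation does not confirm: matched controls are removed from the pool (matching without replacement), treated units are processed in input order, and `absolute_distance` (a sum of absolute coordinate differences) is a stand-in for the library's "absolute" metric. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def absolute_distance(a: Row, b: Row) -> float:
    """Sum of absolute coordinate differences (assumed stand-in metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def greedy_nearest_neighbor(
    treated: Sequence[Row],
    control: Sequence[Row],
    metric: Callable[[Row, Row], float] = absolute_distance,
    number_of_neighbors: int = 1,
) -> List[List[int]]:
    """For each treated row, in order, take its closest unused control indices."""
    available = list(range(len(control)))
    matches: List[List[int]] = []
    for t in treated:
        if len(available) < number_of_neighbors:
            raise ValueError("not enough unmatched control units left")
        # Rank the remaining controls by distance to this treated row.
        ranked = sorted(available, key=lambda i: metric(t, control[i]))
        chosen = ranked[:number_of_neighbors]
        matches.append(chosen)
        # Assumption: a matched control cannot be reused by later treated rows.
        for i in chosen:
            available.remove(i)
    return matches


# The first treated unit takes the control at 0.4, leaving the second treated
# unit with the control at 1.0. Total distance is 0.1 + 0.7 = 0.8, whereas
# swapping the pairs would give 0.5 + 0.1 = 0.6.
treated = [[0.5], [0.3]]
control = [[0.4], [1.0]]
matches = greedy_nearest_neighbor(treated, control)
```

The example data is chosen so that the order-dependent greedy choice is visibly suboptimal. An optimal assignment would need a global method such as the Hungarian algorithm, which this sketch deliberately does not use.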