medmodels.matching.evaluation
relative_diff_in_means
def relative_diff_in_means(control_set: pd.DataFrame,
treated_set: pd.DataFrame) -> pd.DataFrame
Calculates the absolute relative difference in means for each feature between the control and treated sets, expressed as a percentage of the control set's mean. This measure shows how much each feature's average value changes from the control group to the treated group, relative to the control.
Arguments:
control_set
pd.DataFrame - DataFrame representing the control group.
treated_set
pd.DataFrame - DataFrame representing the treated group.
Returns:
pd.DataFrame
- A DataFrame containing the mean values of the control and treated sets for all features and the absolute relative difference in means, expressed as a percentage.
The function computes the relative difference for each feature. When the control mean is zero, the relative difference is undefined, so the function falls back to the absolute difference multiplied by 100. The result shows the percentage change in each feature's mean between the control and treated groups.
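A minimal pandas sketch of the computation described above. It assumes the result holds one row of control means, one row of treated means and, as its last row, the relative differences in percent (the row that average_value_over_features reads). The function name relative_diff_in_means_sketch and the row labels are hypothetical illustrations, not the library's own implementation.

```python
import numpy as np
import pandas as pd


def relative_diff_in_means_sketch(
    control_set: pd.DataFrame, treated_set: pd.DataFrame
) -> pd.DataFrame:
    """Hypothetical re-implementation; the row labels are illustrative only."""
    control_mean = control_set.mean()
    treated_mean = treated_set.mean()
    abs_diff = (treated_mean - control_mean).abs()

    # Divide by |control mean|; zero means become NaN so they can be filled below.
    denominator = control_mean.abs().replace(0, np.nan)
    relative = abs_diff / denominator * 100

    # Zero control mean: fall back to the absolute difference times 100.
    relative = relative.fillna(abs_diff * 100)

    # Last row holds the relative differences (assumed layout).
    return pd.DataFrame(
        [control_mean, treated_mean, relative],
        index=["control_mean", "treated_mean", "diff_in_means"],
    )


control = pd.DataFrame({"a": [3.0, 5.0], "b": [1.0, 3.0], "c": [0.0, 0.0]})
treated = pd.DataFrame({"a": [6.0, 8.0], "b": [6.0, 8.0], "c": [0.5, 0.5]})
result = relative_diff_in_means_sketch(control, treated)
# result.loc["diff_in_means"] holds 75.0 for 'a', 250.0 for 'b' and 50.0 for 'c'
```

Column 'c' exercises the zero-mean fallback: its control mean is 0, so the sketch reports 0.5 × 100 = 50.0 instead of dividing by zero.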
average_value_over_features
def average_value_over_features(df: pd.DataFrame) -> float
Calculates the average of the values in the last row of a DataFrame. This function is particularly useful for aggregating measures like differences or percentages across multiple features, providing a single summary statistic.
Arguments:
df
pd.DataFrame - The DataFrame on which the calculation is to be performed.
Returns:
float
- The average value of the last row across all columns.
Example:
Given a DataFrame with the last row containing differences in percentages between treated and control means across features 'a' and 'b', e.g., 75.0% for 'a' and 250.0% for 'b', this function will return the average difference, which is (75.0 + 250.0) / 2 = 162.5.
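The example above can be reproduced with a one-line sketch, assuming the function simply takes the mean of the last row with pandas. The name average_value_over_features_sketch and the row labels are hypothetical.

```python
import pandas as pd


def average_value_over_features_sketch(df: pd.DataFrame) -> float:
    """Hypothetical re-implementation: mean of the last row across all columns."""
    return float(df.iloc[-1].mean())


# Last row holds the per-feature differences from the example (75.0 and 250.0).
diffs = pd.DataFrame(
    {"a": [4.0, 7.0, 75.0], "b": [2.0, 7.0, 250.0]},
    index=["control_mean", "treated_mean", "diff_in_means"],
)
average = average_value_over_features_sketch(diffs)  # 162.5
```

Note that pandas' Series.mean skips NaN values by default, so a feature with a missing difference would be left out of the average in this sketch.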
average_abs_relative_diff
def average_abs_relative_diff(
control_set: pd.DataFrame,
treated_set: pd.DataFrame,
covariates: Optional[List[str]] = None) -> Tuple[float, pd.DataFrame]
Calculates the average absolute relative difference in means over specified covariates between control and treated sets. If covariates are not specified, the calculation includes all features.
This function is designed to assess the impact of a treatment across multiple features by computing the mean of absolute relative differences. It returns both a summary metric and a detailed DataFrame for further analysis.
Arguments:
control_set
pd.DataFrame - DataFrame for the control group.
treated_set
pd.DataFrame - DataFrame for the treated group.
covariates
Optional[List[str]], optional - List of covariate names to include. If None, considers all features.
Returns:
Tuple[float, pd.DataFrame]: A tuple containing the average absolute relative difference as a float and a DataFrame with detailed mean values and absolute relative differences for all features.
The detailed DataFrame includes means for both control and treated sets and the absolute relative difference for each feature.
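A minimal sketch of how the summary metric and the detailed table could fit together. It assumes the same table layout as described for relative_diff_in_means (control means, treated means, then relative differences as the last row) and repeats that computation so the block runs on its own. All names here are hypothetical, and restricting only the average (not the detail table) to the chosen covariates is an assumption based on the return description.

```python
from typing import List, Optional, Tuple

import numpy as np
import pandas as pd


def _relative_diff_in_means(
    control_set: pd.DataFrame, treated_set: pd.DataFrame
) -> pd.DataFrame:
    # Assumed layout: control means, treated means, relative differences (%).
    control_mean = control_set.mean()
    treated_mean = treated_set.mean()
    abs_diff = (treated_mean - control_mean).abs()
    relative = (abs_diff / control_mean.abs().replace(0, np.nan) * 100).fillna(
        abs_diff * 100
    )
    return pd.DataFrame(
        [control_mean, treated_mean, relative],
        index=["control_mean", "treated_mean", "diff_in_means"],
    )


def average_abs_relative_diff_sketch(
    control_set: pd.DataFrame,
    treated_set: pd.DataFrame,
    covariates: Optional[List[str]] = None,
) -> Tuple[float, pd.DataFrame]:
    """Hypothetical re-implementation returning (summary metric, detail table)."""
    detail = _relative_diff_in_means(control_set, treated_set)
    # covariates=None means: average over every feature.
    columns = covariates if covariates is not None else list(detail.columns)
    average = float(detail[columns].iloc[-1].mean())
    return average, detail


control = pd.DataFrame({"a": [3.0, 5.0], "b": [1.0, 3.0]})
treated = pd.DataFrame({"a": [6.0, 8.0], "b": [6.0, 8.0]})
avg_all, detail = average_abs_relative_diff_sketch(control, treated)  # 162.5
avg_a, _ = average_abs_relative_diff_sketch(control, treated, ["a"])  # 75.0
```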
medmodels.matching.metrics
absolute_metric
def absolute_metric(*vectors: Tuple[np.ndarray, np.ndarray]) -> float
Calculates the Manhattan distance (the L1 norm of the difference) between two vectors, providing a measure of the absolute difference between them. This distance is the sum of the absolute differences between corresponding elements of the two vectors.
Arguments:
vectors
Tuple[np.ndarray, np.ndarray] - Two numpy arrays to be compared.
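Returns:
float
- The Manhattan (L1) distance between the two vectors.
The calculation is based on the formula:
$$d(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{n} \lvert a_i - b_i \rvert$$
where $\mathbf{a}$ and $\mathbf{b}$ are the two input vectors, each of length $n$.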
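A minimal NumPy sketch of this metric as a sum of element-wise absolute differences. The name absolute_metric_sketch is hypothetical; it annotates each positional argument as an array (the idiomatic way to type `*args`) and, as an added assumption, raises an error unless exactly two vectors are passed.

```python
import numpy as np


def absolute_metric_sketch(*vectors: np.ndarray) -> float:
    """Hypothetical re-implementation: Manhattan (L1) distance of two vectors."""
    if len(vectors) != 2:
        raise ValueError("absolute_metric_sketch expects exactly two vectors")
    a, b = (np.asarray(v, dtype=float) for v in vectors)
    # Sum of element-wise absolute differences.
    return float(np.sum(np.abs(a - b)))


distance = absolute_metric_sketch(np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0]))
# |1 - 4| + |2 - 0| + |3 - 3| = 5.0
```

The same value could also be obtained with `np.linalg.norm(a - b, ord=1)`; the explicit sum is used here because it mirrors the formula directly.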
exact_metric
def exact_metric(*vectors: Tuple[np.ndarray, np.ndarray]) -> float
Computes the exact metric for matching, which is particularly applicable for discrete or categorical covariates rather than continuous ones. This metric returns 0 if the two vectors are exactly identical, and infinity otherwise, making it suitable for scenarios where exact matches are necessary.
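The exact metric is defined as:
$$d(\mathbf{a}, \mathbf{b}) = \begin{cases} 0 & \text{if } \mathbf{a} = \mathbf{b} \\ \infty & \text{otherwise} \end{cases}$$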
Arguments:
vectors
Tuple[np.ndarray, np.ndarray] - Two numpy arrays to be compared.
Returns:
float
- 0 if the vectors are equal, infinity if they are not.
Notes:
This function is designed for exactly two input vectors.
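A minimal sketch of the exact metric, assuming "identical" means equal shape and equal elements, as checked by `np.array_equal`. The name exact_metric_sketch and the error on a wrong argument count are hypothetical additions.

```python
import numpy as np


def exact_metric_sketch(*vectors: np.ndarray) -> float:
    """Hypothetical re-implementation: 0 for identical vectors, infinity otherwise."""
    if len(vectors) != 2:
        raise ValueError("exact_metric_sketch expects exactly two vectors")
    a, b = vectors
    # np.array_equal compares shapes and every element.
    return 0.0 if np.array_equal(a, b) else float("inf")


same = exact_metric_sketch(np.array([1, 0, 2]), np.array([1, 0, 2]))  # 0.0
different = exact_metric_sketch(np.array([1, 0, 2]), np.array([1, 1, 2]))  # inf
```

In a matcher that picks the smallest distance, an infinite distance effectively rules a pair out, which is why this metric suits categorical covariates that must match exactly.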