Merge pull request #2 from shafayetShafee/cat-covs
added support for multicategorical variables in compute_smd
shafayetShafee authored Aug 16, 2024
2 parents 4c3a610 + 0f19aab commit e61a9aa
Showing 4 changed files with 616 additions and 293 deletions.
49 changes: 36 additions & 13 deletions README.md
@@ -1,5 +1,7 @@
# skmiscpy

+[![PyPI](https://img.shields.io/pypi/v/skmiscpy.svg)](https://pypi.org/project/skmiscpy/) ![Python Versions](https://img.shields.io/pypi/pyversions/skmiscpy) ![License](https://img.shields.io/pypi/l/skmiscpy) [![Build](https://github.com/shafayetShafee/skmiscpy/actions/workflows/ci-cd.yml/badge.svg)](https://github.com/shafayetShafee/skmiscpy/actions/workflows/ci-cd.yml) [![codecov](https://codecov.io/github/shafayetShafee/skmiscpy/graph/badge.svg?token=OAZ6C1KHC9)](https://codecov.io/github/shafayetShafee/skmiscpy)
+
Contains a few functions useful for data-analysis, causal inference etc.

## Installation
@@ -44,23 +46,44 @@ plot_mirror_histogram(
```
### Compute Standardized Mean Difference (SMD)

-``` python
-data = pd.DataFrame({
-    'group': [1, 0, 1, 0, 1, 0],
-    'age': [23, 35, 45, 50, 22, 30],
-    'bmi': [22.5, 27.8, 26.1, 28.5, 24.3, 29.0],
-    'blood_pressure': [120, 130, 140, 135, 125, 133],
-    'weights': [1.2, 0.8, 1.5, 0.7, 1.0, 0.9]
+```python
+sample_df = pd.DataFrame({
+    'age': np.random.randint(18, 66, size=100),
+    'weight': np.round(np.random.uniform(120, 200, size=100), 1),
+    'gender': np.random.choice(['male', 'female'], size=100),
+    'race': np.random.choice(
+        ['white', 'black', 'hispanic'],
+        size=100, p=[0.4, 0.3, 0.3]
+    ),
+    'educ_level': np.random.choice(
+        ['bachelor', 'master', 'doctorate'],
+        size=100, p=[0.3, 0.4, 0.3]
+    ),
+    'ps_wts': np.round(np.random.uniform(0.1, 1.0, size=100), 2),
+    'group': np.random.choice(['treated', 'control'], size=100),
+    'date': pd.date_range(start='2024-01-01', periods=100, freq='D')
})

-# Compute SMD for 'age', 'bmi', and 'blood_pressure' under ATE estimand
-smd_results = compute_smd(data, vars=['age', 'bmi', 'blood_pressure'], group='group', estimand='ATE')
+# 1. Basic usage with unadjusted SMD only:
+compute_smd(sample_df, vars=['age', 'weight', 'gender'], group='group', estimand='ATE')

-# Compute SMD adjusted by weights
-smd_results_with_weights = compute_smd(data, vars=['age', 'bmi', 'blood_pressure'], group='group', wt_var='weights')
+# 2. Including weights for adjusted SMD:
+compute_smd(
+    sample_df,
+    vars=['age', 'weight', 'gender'],
+    group='group', wt_var='ps_wts',
+    estimand='ATE'
+)

-print(smd_results)
-print(smd_results_with_weights)
+# 3. Including categorical variables for adjusted SMD:
+compute_smd(
+    sample_df,
+    vars=['age', 'weight', 'gender'],
+    group='group',
+    wt_var='ps_wts',
+    cat_vars=['race', 'educ_level'],
+    estimand='ATE'
+)
```
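
For context, the standardized mean difference of a continuous covariate is commonly defined as the difference in group means divided by a pooled standard deviation. The sketch below is a minimal illustration of that textbook definition using only the standard library. `unweighted_smd` is a hypothetical helper and is not taken from skmiscpy; `compute_smd` may use a different denominator depending on `estimand`, and it also handles weights and categorical variables, which this sketch does not.

```python
from math import sqrt
from statistics import mean, variance


def unweighted_smd(treated, control):
    """Difference in means divided by the pooled sample standard deviation.

    Hypothetical helper for illustration only (not skmiscpy's implementation).
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# 'age' values from the six-row `data` example above, split by group (1 vs 0).
age_treated = [23, 45, 22]
age_control = [35, 50, 30]

smd_age = unweighted_smd(age_treated, age_control)
print(f"Unadjusted SMD for age: {smd_age:.3f}")
```

A negative value here means the treated group is younger on average; swapping the two arguments flips the sign but not the magnitude.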

### Create a love plot (point plot of SMD)