Forecast total cases of Dengue Fever in the cities of San Juan and Iquitos.
- EDA/Visualizations
- Analysis of Covariate Relation to Target
- Documentation, Observations, Statistical Tests
- Forecast Model
Results saved in src/eda
python eda.py
python -m ipykernel install --user --name=venv-azureml-37
jupyter notebook
Results saved in src/ar_models
python ar_model.py
- Baseline (something simple to improve on - lagged inputs only)
- Complex (exogenous variables clearly matter here, but this is tricky; get to it if time permits)
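A minimal sketch of what a lagged-inputs-only baseline could look like, assuming an AR(1) fit by ordinary least squares and scored against a persistence forecast. The case counts, the split point, and the function names (`fit_ar1`, `one_step_forecasts`, `mae`) are made up for illustration and are not taken from ar_model.py.

```python
from statistics import mean


def fit_ar1(series):
    """Fit y[t] = a + b * y[t-1] by ordinary least squares."""
    x = series[:-1]  # lagged values
    y = series[1:]   # targets
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    a = y_bar - b * x_bar
    return a, b


def one_step_forecasts(series, a, b):
    """Predict each value from the one immediately before it."""
    return [a + b * prev for prev in series[:-1]]


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(x - p) for x, p in zip(actual, predicted))


# Made-up weekly case counts, for illustration only.
cases = [4, 5, 4, 3, 6, 2, 4, 5, 10, 6, 8, 2, 6, 17, 23, 13, 21, 28, 24, 20]
split = 15
train = cases[:split]
test = cases[split - 1:]  # keep one overlapping point to serve as the first lag

a, b = fit_ar1(train)
actual = test[1:]
ar_preds = one_step_forecasts(test, a, b)
naive_preds = test[:-1]  # persistence: next week equals this week

print(f"AR(1): a={a:.2f}, b={b:.2f}, MAE={mae(actual, ar_preds):.2f}")
print(f"Persistence MAE={mae(actual, naive_preds):.2f}")
```

Scoring against persistence gives a floor: a more complex model with exogenous inputs should beat both numbers on held-out weeks to justify the extra work.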
1. Normalize (cases per 100k rather than total cases)
2. Missing Data
- mean
- same as last
- mode for categoric
- MICE
3. Standardize Timestamp Frequency
- Not a big deal, but make every period start on the same day
- Aggregate covariates appropriately
4. Features
- AutoRegressive (Lags)
- Quickly test a few simple AR Models (lagged observations clearly matter)
- Exogenous
- Lagged variables clearly don't contain all the information (the plot is non-cyclical)
- This is a mosquito-borne virus and we're provided lots of NOAA data...
- Rainfall/Humidity and Temperature likely important
- Linear Correlation is low (include plots)
- Perhaps order matters?
- (i.e. mild winter followed by humid wet/hot season)
- El Niño? (perhaps more foreseeable compared to humidity/temp)
- Including exogenous predictors is not easy in a time-series context
- Forecasts will be required for each covariate, so it's important to establish a baseline and demonstrate meaningful improvement
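A rough sketch of the preprocessing steps above, plus a lagged correlation check for one exogenous covariate. Assumptions: the population figure, the rainfall and case values, and all function names are hypothetical, and weeks are aligned to Monday only as an example of "same day". MICE and mode imputation are not shown.

```python
from datetime import date, timedelta
from statistics import mean


def per_100k(cases, population):
    """Normalize: convert a raw case count to cases per 100,000 residents."""
    return cases / population * 100_000


def forward_fill(values):
    """Missing data ("same as last"): replace None with the previous observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # leading gaps stay None
    return filled


def mean_fill(values):
    """Missing data ("mean"): replace None with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]


def week_start(d):
    """Standardize frequency: snap a date to the Monday of its week."""
    return d - timedelta(days=d.weekday())


def lagged_correlation(covariate, target, lag):
    """Pearson correlation between covariate[t - lag] and target[t]."""
    x = covariate[:len(covariate) - lag]
    y = target[lag:]
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else float("nan")


# Made-up weekly data, for illustration only.
rainfall = [12.0, None, 30.5, 41.2, None, 8.3, 15.0, 22.1]
cases = [4, 5, 4, 9, 14, 12, 6, 5]
population = 400_000  # hypothetical city population

rain_filled = forward_fill(rainfall)
rates = [per_100k(c, population) for c in cases]

print(week_start(date(1990, 5, 3)))
for lag in range(4):
    print(lag, round(lagged_correlation(rain_filled, rates, lag), 2))
```

Scanning a few lags is one way to probe whether weather leads cases by some weeks even when the same-week linear correlation is low. With series this short the numbers are noise; the point is only the shape of the check.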
Requires some version of Python 3.9, installed in $PythonPath.
Then use pipenv to install dependencies:
$PythonPath\python.exe -m pip install pipenv
set PIPENV_VENV_IN_PROJECT="enabled"
pipenv install
pipenv shell
cd src