Hannes Datta, [email protected] Hulai Zhang, [email protected]
To illustrate the versatility of automating one's research, this workflow uses a variety of tools: R, Python, Stata, Perl, and TeX/LyX.
- Install Python.
- Anaconda is recommended. Download Anaconda.
- Check availability: type `anaconda --version` in the command line.
- Install the Python package TableFill:
pip install git+https://github.com/mcaceresb/tablefill
- Install the Kaggle package.
- See the Kaggle API instructions for installation and setup.
- Install Automation tools.
- GNU Make: already installed on macOS and Linux. On Windows, download Make for Windows and install it.
- Windows users only: make `Make` available via the command line.
  - Right-click on `Computer`.
  - Go to `Properties`, and click `Advanced System Settings`.
  - Choose `Environment Variables`, select `Path` under the system variables, and click `Edit`.
  - Add the `bin` directory of `Make`.
- Check availability: type `make --version` in the command line.
- Install Stata.
- Make Stata available via the command line. See the instructions for adding Stata to the path.
- Check availability: type `$STATA_BIN --version` in the command line.
- Install LyX.
- LyX is an open-source document processor based on LaTeX. Download LyX.
- Make sure LyX is available via the command line. See the instructions for adding LyX to the path.
- Check availability: type `$LYX_BIN` in the command line.
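The availability checks above can also be scripted. The following is a minimal sketch, assuming `anaconda` and `make` are exposed under those names on the PATH and that the environment variables `STATA_BIN` and `LYX_BIN` hold the Stata and LyX executables, as the checks above suggest. The helper name `find_tools` is hypothetical and not part of the workflow.

```python
import os
import shutil


def find_tools(commands=("anaconda", "make"), env_vars=("STATA_BIN", "LYX_BIN")):
    """Return a dict mapping each tool to its resolved path, or None if missing."""
    found = {}
    # Tools expected directly on the PATH.
    for cmd in commands:
        found[cmd] = shutil.which(cmd)
    # Tools whose executable is stored in an environment variable (assumption).
    for var in env_vars:
        exe = os.environ.get(var)
        found[var] = shutil.which(exe) if exe else None
    return found


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path if path else 'NOT FOUND'}")
```

Any entry reported as `NOT FOUND` points to an installation step that still needs attention before running the workflow.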
Open your command line tool:
- Check whether your present working directory is `airbnb-workflow` by typing `pwd` in the terminal.
  - If not, type `cd yourpath/airbnb-workflow` to change your directory to `airbnb-workflow`.
- Type `make` in the command line.
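The two steps above can be wrapped in a small pre-flight script. This is only a sketch under stated assumptions: the project folder is named `airbnb-workflow`, and the helper `ready_to_make` is a hypothetical name introduced here for illustration. It checks the folder name and the presence of a makefile before calling `make`.

```python
import subprocess
from pathlib import Path


def ready_to_make(workdir, project_name="airbnb-workflow"):
    """Return True if workdir is the project folder and contains a makefile."""
    workdir = Path(workdir).resolve()
    # GNU Make looks for GNUmakefile, makefile, and Makefile, in that order.
    candidates = ("GNUmakefile", "makefile", "Makefile")
    has_makefile = any((workdir / name).is_file() for name in candidates)
    return workdir.name == project_name and has_makefile


if __name__ == "__main__":
    cwd = Path.cwd()
    if ready_to_make(cwd):
        # Equivalent to typing `make` in the command line.
        subprocess.run(["make"], cwd=cwd, check=True)
    else:
        print(f"Run this from the airbnb-workflow folder that holds the makefile (currently in {cwd}).")
```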
Make sure `makefile` is in the present working directory. The directory structure for the workflow is shown below.
├── data
├── gen
│   ├── analysis
│   │   ├── input
│   │   ├── output
│   │   │   ├── figure
│   │   │   ├── log
│   │   │   └── table
│   │   └── temp
│   ├── data_preparation
│   │   ├── audit
│   │   │   ├── figure
│   │   │   ├── log
│   │   │   └── table
│   │   ├── input
│   │   ├── output
│   │   │   ├── figure
│   │   │   ├── log
│   │   │   └── table
│   │   └── temp
│   └── paper
│       ├── input
│       ├── output
│       └── temp
└── src
    ├── analysis
    ├── data_preparation
    └── paper
- gen: all generated files, such as tables, figures, and logs.
  - Three parts: data_preparation, analysis, and paper.
  - audit: holds the logs, tables, and figures produced by the audit program, in three sub-folders: figure, log, and table.
  - temp: holds temporary files, such as intermediate datasets. These files may be deleted at the end.
  - output: holds results: generated figures in the sub-folder figure, log files in the sub-folder log, and tables in the sub-folder table.
  - input: holds all temporary input files.
- data: all raw data.
- src: all source code.
  - Three parts: data_preparation, analysis, and paper (including tex files).
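The layout described above can be recreated programmatically, for example when starting a new project with the same structure. The sketch below is illustrative only; the function names `workflow_dirs` and `create_skeleton` are hypothetical and do not come from the workflow's own source code.

```python
from pathlib import Path

RESULT_SUBFOLDERS = ("figure", "log", "table")


def workflow_dirs():
    """List the relative folder paths of the layout described above."""
    dirs = [Path("data")]
    for stage in ("analysis", "data_preparation", "paper"):
        dirs.append(Path("src") / stage)
        for sub in ("input", "output", "temp"):
            dirs.append(Path("gen") / stage / sub)
    # analysis and data_preparation split their output into figure/log/table.
    for stage in ("analysis", "data_preparation"):
        for sub in RESULT_SUBFOLDERS:
            dirs.append(Path("gen") / stage / "output" / sub)
    # Only data_preparation has an audit folder, with the same three sub-folders.
    for sub in RESULT_SUBFOLDERS:
        dirs.append(Path("gen") / "data_preparation" / "audit" / sub)
    return dirs


def create_skeleton(root):
    """Create every folder under root; existing folders are left untouched."""
    for rel in workflow_dirs():
        (Path(root) / rel).mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    # Print the folders instead of creating them, to avoid side effects.
    for rel in workflow_dirs():
        print(rel)
```

Calling `create_skeleton("yourpath/airbnb-workflow")` would then create any folders that are missing, without touching existing ones.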