- Figure out which open-source Bayes code is available that we might want to use. You can learn about Naive Bayes here. There is a lot more on the web. You can get some Python-specific examples here.
- Start writing the glue logic.
- Figure out the format of the training and test data and how it may need to be reformatted to suit our needs. The training and test data are here. Incidentally, they are in CSV (comma-separated values) format. There are Python libraries built specifically for handling CSV files, which you can read about here.
- Start writing code for each of the six Bayes classifiers (toxic, severe_toxic, obscene, threat, insult, identity_hate). Ideally, there will be six sub-teams, one allocated to each topic. We expect the six classifiers to use different feature inputs, so each sub-team can concentrate on its specific topic.
- Integrate the six classifiers into the glue logic to generate the output CSV file that is to be uploaded to Kaggle.
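
To make the plan above concrete, here is a minimal sketch of how the glue logic and the six classifiers might fit together, using only the Python standard library. It is not a chosen design: the column names `id` and `comment_text`, the submission layout (one probability column per label), and the helper names `tokenize`, `BinaryNaiveBayes`, `train_all`, and `write_submission` are all assumptions made for illustration. The whitespace tokenizer is a placeholder for whatever features each sub-team settles on, and an existing open-source Naive Bayes implementation could replace the hand-written class.

```python
import csv
import math
from collections import Counter

# The six topics named in the plan.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def tokenize(text):
    # Placeholder feature extractor (assumption): lowercase, split on whitespace.
    return text.lower().split()


class BinaryNaiveBayes:
    """Multinomial Naive Bayes for one yes/no label, with add-one smoothing."""

    def fit(self, texts, targets):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        for text, target in zip(texts, targets):
            self.doc_counts[target] += 1
            self.word_counts[target].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, cls):
        # Smoothed log prior for the class.
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[cls] + 1) / (total_docs + 2))
        # Smoothed log likelihood of each known word; unseen words are skipped.
        denom = sum(self.word_counts[cls].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((self.word_counts[cls][tok] + 1) / denom)
        return score

    def predict_proba(self, text):
        # Probability that the label applies, normalised over the two classes.
        tokens = tokenize(text)
        s0 = self._log_score(tokens, 0)
        s1 = self._log_score(tokens, 1)
        m = max(s0, s1)
        e0, e1 = math.exp(s0 - m), math.exp(s1 - m)
        return e1 / (e0 + e1)


def train_all(train_file):
    # Train one independent classifier per label from an open CSV file.
    rows = list(csv.DictReader(train_file))
    texts = [row["comment_text"] for row in rows]  # assumed column name
    return {
        label: BinaryNaiveBayes().fit(texts, [int(row[label]) for row in rows])
        for label in LABELS
    }


def write_submission(models, test_file, out_file):
    # Glue logic: score every test row with all six models and write one CSV.
    writer = csv.writer(out_file)
    writer.writerow(["id"] + LABELS)  # assumed submission layout
    for row in csv.DictReader(test_file):
        probs = [models[label].predict_proba(row["comment_text"]) for label in LABELS]
        writer.writerow([row["id"]] + [f"{p:.4f}" for p in probs])
```

With real data, the files would be opened with `newline=""` as the `csv` module recommends, passed to `train_all` and `write_submission`, and the result uploaded. Because each label has its own model object, a sub-team can swap in its own feature extraction without touching the glue code.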