
Submission: Group 24: Crime Prediction in Vancouver #8

Open
1 task done
jasmineortega opened this issue Nov 29, 2021 · 7 comments
jasmineortega commented Nov 29, 2021

Submitting authors: @thomassiu, @sy25wang, @RamiroMejia, @jasmineortega

Repository: https://github.com/UBC-MDS/DSCI_522_Crime_Prediction_Vancouver
Report link: https://github.com/UBC-MDS/DSCI_522_Crime_Prediction_Vancouver/blob/main/doc/vancouver_crime_predict_report.md

Abstract/executive summary:
In this project, we attempted to build a classification model to predict the type of crime that occurs in Vancouver, BC, based on the neighbourhood and the time of the crime. Based on our EDA results and model tuning, including the necessary data cleaning, we identified that the logistic regression model performed best among all the models tested, as measured by F1 score. Performance on unseen data was not satisfactory, which we believe is due to the weak association between the features (time and location) and crime type. We propose further improvements in future iterations of model optimization, such as adding relevant data from outside sources (e.g., Vancouver weather, Vancouver housing).

Editor: @thomassiu, @sy25wang, @RamiroMejia, @jasmineortega
Reviewer: @lipcai, @arijc76, @zzhzoe, @junrongz

  • I agree to abide by MDS's Code of Conduct during the review process and in maintaining my package should it be accepted.
@arijeetchatterjee

Data analysis review checklist

Reviewer: @arijc76

Conflict of interest

  • As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies?
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well-known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance for this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 2

Review Comments:

Nice work. I liked this analysis and it's unfortunate that the quality of the available data is not conducive to getting the desired results. I can think of the following as improvements on the work done so far:

  • Add tests for the script files.
  • Specify the version numbers for tidyverse and knitr (based on personal experience, the version number will be useful for any user with an older version to diagnose potential issues in running the analysis).
  • I was unable to run the automation script to reproduce the analysis. I got the error below after following the instructions to install the conda environment and running the script to execute the data analysis pipeline.
Error: pandoc version 1.12.3 or higher is required and was not found (see the help page ?rmarkdown::pandoc_available).
Execution halted

According to Stack Overflow, this error could be solved by inserting the code below into the R script prior to the render command. You can investigate this further and incorporate any changes into the installation or pipeline-running instructions.

Sys.setenv(RSTUDIO_PANDOC="--- insert directory here ---")
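One hedged way to fill in that directory placeholder is to locate a pandoc binary from the shell and export the environment variable that rmarkdown consults before running the pipeline. This is only a sketch: the fallback path below is an assumption (RStudio's commonly bundled pandoc location on Linux) and may differ on your machine.

```shell
# Sketch: find a pandoc binary and export the directory rmarkdown checks.
# The fallback path is an assumption (RStudio's usual bundled location on
# Linux) and may well differ on your system.
PANDOC_BIN="$(command -v pandoc || echo /usr/lib/rstudio/bin/pandoc/pandoc)"
export RSTUDIO_PANDOC="$(dirname "$PANDOC_BIN")"
echo "RSTUDIO_PANDOC=$RSTUDIO_PANDOC"
```

Exporting the variable before invoking the pipeline (rather than hard-coding it in the R script) keeps the script portable across machines.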

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.


lipcai commented Dec 4, 2021

Reviewer: @lipcai

Conflict of interest

  • As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies?
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well-known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance for this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 1.5

Review Comments:

Good job! The structure of the repo is clean and organized, the research topic is interesting, and the scripts and the report are very well structured. Please find my comments below.

Points done well:

  1. The model could be very useful for predicting the types of crimes committed and could potentially guide police resources in a realistic context as needed.
  2. The data visualizations from the EDA are formatted beautifully. The scripts, EDA, and code are well designed and fully described.
  3. The source code was well separated into meaningful functions / modules.

Points that could be improved:

  1. I think the README file is quite clear, so I could reproduce the entire data analysis. I would suggest adding a flow chart / workflow diagram to the README so that readers can better understand the overall data flow through an adequate graphical representation.
  2. You have a very detailed EDA, but for those interested in the raw data, I would hope a Data section can be added to the final report summarizing the dataset used in this project (e.g., the number of columns, the type or description of each column, and the total number of observations).
  3. I think you still need to add tests for the script files; it is one of the standard review criteria.
  4. In the src folder, there are too many files at the root level; it would be better to separate and arrange those files into appropriate subfolders.
  5. As we were taught in DSCI 531 (Data Visualization), you could add a narrative so that it is easy for others to follow along with what you have done. I am sorry, this one is picky (I really can't find another point, but the rubric demands at least five points of constructive feedback).

Again, great work! It's really hard for me to pick out other points that need to be improved and I got some great ideas for our project after reading yours! Thank you!
Linhan

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

@lipcai lipcai closed this as completed Dec 4, 2021
@thomassiu thomassiu reopened this Dec 4, 2021

zjr-mds commented Dec 4, 2021

Data analysis review checklist

Reviewer: <GITHUB_USERNAME>

Conflict of interest

  • As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies?
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well-known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance of this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 2 hours

Review Comments:

Well done, guys! Thanks for the impressive project; I really enjoyed reviewing it :) Here are some detailed suggestions, and I hope they are helpful for any improvements!

  1. EDA Figure 2 summary: considering the timeline in this figure (crime evolution from 2016 to 2020), it might be inaccurate to say the steep increase in theft from vehicles in 2018 is related to the start of COVID. Relevant text from the EDA for potential editing: "This may be due to the start of Covid that causes a series of social problems."
  2. Prediction report Figure 2: I would suggest switching the x- and y-axis variables so that it's easier for the audience to read the names of the neighbourhoods.
  3. I noticed that you have different versions of the prediction report, the EDA (.ipynb, .Rmd, etc.), and the README; it should be okay to remove some of these versions to avoid having too many repetitive files. For example, keep only the .html and .md versions of the prediction report, and keep the most informative EDA (I noticed that the .ipynb EDA actually has additional, fancier visualizations than the .Rmd and .md files).
  4. The name of 'crime_vancouver_eda.py' might need a change; it looks like it is the plot-generating script rather than the EDA report, and this could be confusing since the name is the same as the EDA report's (except for the suffix).
  5. Inconsistent testing of functions: exception handling was implemented for the data pre-processing script but is missing for some of the functions in the modelling script.
  6. Also, I'm not sure if you're still working on the Makefile, since it's not the due date yet, but I ran into the issue below when I ran make all from the terminal:

$ make all
make: Nothing to be done for `all'.

I had the same error before, and it was fixed after I corrected the indentation. Just a reminder to double-check this.
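For reference, make requires each recipe line to begin with a literal tab character (indenting with spaces typically produces a "missing separator" error), while "Nothing to be done" usually means the all target's outputs are already up to date or it has no recipe. A minimal sketch of the expected shape, with hypothetical file names:

```make
# Minimal Makefile sketch; the recipe line under each target MUST start
# with a literal tab, not spaces.
# 'report.md' and 'analysis.R' are hypothetical names for illustration.
all: report.md

report.md: analysis.R
	Rscript analysis.R
```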

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.


zzhzoe commented Dec 4, 2021

Data analysis review checklist

Reviewer: @zzhzoe

Conflict of interest

  • As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies?
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well-known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance for this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing: 2 hours

Review Comments:

Very well done, guys! The research topic is well introduced, and your report has a clear structure to follow. I was very engaged reading your project overall. Please find my comments below.

Strengths:

  1. Great presentation of your data and explanation of your model, demonstrating that they are tailored to your goal: predicting types of crimes based on neighbourhood location and time.
  2. A good variety of data visualization was carefully chosen to clearly deliver the takeaway from the data analysis.

Suggestions:

  1. Your conda environment is good, but when I ran the Makefile, the following error message appeared:

Error: pandoc version 1.12.3 or higher is required and was not found (see the help page ?rmarkdown::pandoc_available).
Execution halted.
make: *** [src/crime_vancouver_eda.md] Error 1
  2. I would suggest you add tests for the script files. It is a small but important step before you run the code.
  3. It would be clearer if you added to the EDA section some information on which dataset and what characteristics you used to produce the EDA visualizations. A really great EDA section, just not enough information on the dataset used.
  4. I would suggest organizing your repo folders further. The current layout is a bit confusing, with different versions of files unlabeled. I would suggest keeping only the necessary files that can be used to trace the EDA presented in the report.
  5. It would be helpful if you disclosed your Makefile commands in the usage section. It's not a red flag by any means, just a nice-to-have quick improvement. Similarly, the dataset link could be added to the report's data section.
  6. Add npm install -g vega vega-cli vega-lite canvas to the environment instructions, because different users may encounter a JSON decoder error; one of the reviewers of our group's project ran into this problem.

Overall, very well done! Such an interesting project and you clearly delivered well. Minor suggestions and lots of good things to learn from.

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.


RamiroMejia commented Dec 10, 2021

Thank you for your comments!

We really appreciate your feedback. We made the following changes regarding your comments:

  1. Regarding Arijeet's feedback on the issue, we added test.py, which contains tests for the functions used in the script files. Here is the commit: c4d25b3

  2. We added instructions to the README.md file on how to solve the pandoc version issue. The changes are here: cf275a3

  3. As per Li Cai's suggestion, we reorganized the /src folder: f2c116e

  4. Regarding the issue, we moved all the notebooks to the /raw folder: e8fd231

  5. All the files are in the repository now; regarding the issue, the Makefile was added: 71dcef7

@sy25wang

Hi Arijeet,

Thank you for your review! We improved our project as follows:

  • Tests have been added for the scripts. Please refer to /src/scripts/tests.py
  • The environment file has been updated to reflect the latest required packages (including version requirements)
  • Please refer to the README instructions on how to handle the pandoc error

If you have more questions or concerns, please kindly let us know.

@sy25wang

Hi @lipcai,

Thank you for your review! To your comments:

  • You can find the flowchart in /src/flow-chart
  • We go through the data in detail in the data cleaning section. To avoid redundant information, we did not describe the dataset in much detail there; you can find detailed information about our dataset in our EDA report.
  • Tests have been added for the scripts; you can now find them in /src/scripts
  • We have reorganized our /src folder
  • We made minor changes to our visualizations.

Thanks again for your comments. Please let us know if you have any more questions or concerns.
