Submission: GROUP 30: Stock Search Trend & Return Volatility Association Analysis #27

Open
1 task done
BooleanJulien opened this issue Dec 1, 2021 · 5 comments

Comments

@BooleanJulien

BooleanJulien commented Dec 1, 2021

Submitting authors: Amir Shojakhani, Helin Wang, Julien Gordon

Repository: https://github.com/UBC-MDS/Stock-Price-Trend-Volatility-Analysis
Report link: https://github.com/UBC-MDS/Stock-Price-Trend-Volatility-Analysis/blob/main/doc/Stock_Price_Trend_Volatility_Analysis_report.md
Abstract/executive summary:

Investment firms are increasingly looking to data science and unusual data sources to provide informational advantages that bolster their portfolio strategies. In this project, we investigate whether Google Trends data on stock ticker names can provide insight into return volatility**. Investors are often interested in understanding the volatility of stock returns. Some financial derivative trading strategies try to take advantage of changes in a stock's volatility, as certain options are sensitive to changes in implied volatility. See a primer on option vega if you are interested! https://www.investopedia.com/terms/v/vega.asp

Consider this project a screening exercise for whether Google Trends could be useful in volatility-based trading strategies.

In order to assess the association between stock return volatility and search trend volatility, we analyse the standard deviation of weekly search trends and weekly returns for over 300 stocks in the S&P 500 over a one-year period from July 2020 to July 2021. We fit a simple linear regression with return volatility as the dependent variable and search trend volatility as the independent variable, at a confidence level of 0.95. Our null hypothesis is that there is no association between the two volatilities; the alternative is that there is an association.
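
A minimal sketch of this method in Python, assuming a tidy weekly panel with hypothetical columns `ticker`, `weekly_return`, and `search_trend` (the repository's actual file layout and column names may differ). It computes each stock's two volatilities as sample standard deviations and fits the simple linear regression with `scipy.stats.linregress`, testing the slope at the 0.05 significance level that corresponds to a 0.95 confidence level.

```python
import pandas as pd
from scipy import stats


def volatility_regression(weekly: pd.DataFrame, alpha: float = 0.05):
    """Regress per-stock return volatility on per-stock search trend volatility.

    `weekly` is assumed to hold one row per ticker per week, with hypothetical
    columns `ticker`, `weekly_return`, and `search_trend`.
    """
    # Volatility = sample standard deviation (pandas' default ddof=1) within each stock.
    vols = (
        weekly.groupby("ticker")
        .agg(trend_vol=("search_trend", "std"), return_vol=("weekly_return", "std"))
        .dropna()
    )
    # Simple linear regression: return volatility (y) on trend volatility (x).
    fit = stats.linregress(vols["trend_vol"], vols["return_vol"])
    # Two-sided test of H0: slope == 0 at significance level alpha.
    reject_null = fit.pvalue < alpha
    return vols, fit, reject_null
```

The returned `fit` exposes `slope`, `intercept`, `pvalue`, and `rvalue`, so `fit.rvalue ** 2` gives the R^2 discussed below.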

Ultimately, we find a statistically significant coefficient on trend volatility and reject the null hypothesis in favour of the alternative. However, the R^2 value indicates that our simple model explains very little of the variation in return volatility. Moreover, the effect size appears fairly small relative to the range of return volatility that we observe in the data. These caveats are to be expected, given that we are using a very simple model to understand highly complex markets. Nonetheless, this positive result is exciting and warrants further investigation into the use of Google Trends for financial analysis.

**Note that in statistical terms, the volatility is simply the standard deviation of returns. https://www.investopedia.com/terms/v/volatility.asp

Editor: @flor14
Reviewers: Steven Lio, Chaoron Wang, Wenjia Zhu, & Nico Van den Hooff

  • I agree to abide by MDS's Code of Conduct during the review process and in maintaining my package should it be accepted.
@nicovandenhooff

nicovandenhooff commented Dec 1, 2021

Data analysis review checklist

Reviewer: @nicovandenhooff

Conflict of interest

  • As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies?
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well-known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance of this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing:

45 minutes

Review Comments:

Overall, I really liked your project. Here are some things I specifically liked:

  • The repo is very organized and easy to follow
  • The EDA is well done and has a nice balance of charts and discussion
  • The flowchart is nice, clear, and easy to follow

A couple of minor suggestions below:

  • I think some of the scripts in the /src directory could be merged into one .py file. For example, stocks-trends-merge.py, stocks-price-merge.py, and price_trend_merger.py could be merged into one script where the steps run sequentially.
  • For the above, you could also consider moving the code into separate functions in a merged script, and then call these 3 functions from a main function (this improves code modularity).
  • You could consider moving the EDA files and the pandas profiling report to a separate directory called /eda, which would allow /src to contain only Python and R files.
  • I would add a list of your names to your report (as authors)
  • I would suggest adding a section for License in your README, it could say something like "The source code for the site is licensed under the MIT license"
  • In the data folder, you could further separate these files by /processed and /raw to help with the overall project structure
  • Minor, but there are a few files in the repo called to_be_deleted.txt; I assume you can delete these now
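
As a rough illustration of the first two suggestions, the separate merge scripts could become functions in one file, called in order from `main()`. This is a sketch only: the file name, column names (`ticker`, `date`), and default paths under data/raw and data/processed are placeholders, not the repository's actual layout.

```python
import argparse
from pathlib import Path

import pandas as pd


def load_csv(path) -> pd.DataFrame:
    """Read a CSV file, parsing its `date` column as datetimes."""
    return pd.read_csv(path, parse_dates=["date"])


def merge_prices_trends(prices: pd.DataFrame, trends: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows where a ticker has both a price and a search-trend value on a date."""
    return prices.merge(trends, on=["ticker", "date"], how="inner")


def main(argv=None) -> pd.DataFrame:
    """Parse paths, run each step in order, and write the merged file."""
    parser = argparse.ArgumentParser(description="Merge stock prices with Google Trends data.")
    # Default paths follow the suggested data/raw vs data/processed split (placeholder names).
    parser.add_argument("--prices", default="data/raw/prices.csv")
    parser.add_argument("--trends", default="data/raw/trends.csv")
    parser.add_argument("--out", default="data/processed/prices_trends.csv")
    args = parser.parse_args(argv)

    merged = merge_prices_trends(load_csv(args.prices), load_csv(args.trends))
    out = Path(args.out)
    out.parent.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    merged.to_csv(out, index=False)
    return merged
```

To run it as a script, call `main()` under an `if __name__ == "__main__":` guard; passing an explicit list such as `main(["--out", "data/processed/merged.csv"])` also makes the steps easy to test.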

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

@showcy

showcy commented Dec 1, 2021

Data analysis review checklist

Reviewer: @showcy

Conflict of interest

  • As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies?
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well-known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance of this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing:

30 minutes

Review Comments:

  • I like your research interest, especially during this sad week when my whole portfolio is in the red. :(
  • Your explanations of the background and methods, and the EDA plots, are clear and easy to understand.
  • It would be better to include your license and references in the README file.
  • It would be better to include the file names directly in the execution script, so that we do not need to modify them manually and can run the scripts one by one. Right now, I need to run the code, check the respective files to see what each output file name is, and replace the placeholders in the execution script with the corresponding file names to run them successfully.
  • It would be better to minimize the number of execution scripts readers need to run. For example, the scripts for transformation and calculation of weekly volatility might be combined into a single data processing script.
  • The report does not include a list of authors with their affiliations.
  • The EDA plots might be moved to the results folder.
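
One way to act on the file-name comment is a small driver that hard-codes each step's input and output paths, so every step's output feeds the next without manual edits. The sketch below is hypothetical: the script names, flags, and paths are placeholders rather than the repository's real ones, and a Makefile would serve the same purpose.

```python
import subprocess
import sys

# Placeholder paths: each step's output is the next step's input,
# so nobody has to look up or retype file names by hand.
RAW = "data/raw/prices_trends.csv"
PROCESSED = "data/processed/weekly_volatility.csv"
RESULTS = "results/regression.csv"

# Hypothetical scripts and flags, listed in the order they must run.
STEPS = [
    ["src/download_data.py", "--out", RAW],
    ["src/process_data.py", "--in", RAW, "--out", PROCESSED],
    ["src/fit_model.py", "--in", PROCESSED, "--out", RESULTS],
]


def run_pipeline(dry_run: bool = False) -> list:
    """Run every step in order with the current Python interpreter.

    With dry_run=True the commands are only built and returned, not executed.
    """
    commands = [[sys.executable, *step] for step in STEPS]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

Calling `run_pipeline(dry_run=True)` is a cheap way to check the wiring before running the full analysis.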

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

@stevenlio88

Reviewer: @stevenlio88

Conflict of interest

  • As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies?
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well-known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance of this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging?

Estimated hours spent reviewing:

30 minutes

Review Comments:

Overall, the report is very well written, clear, and follows a good logical flow. The explanations are very thorough. Regarding the repository, the data folder should contain the raw and processed data for the analysis, but result tables were also included in that same folder when they should have been in the results folder. The instructions for running the necessary scripts use generic variables instead of relative paths and actual data file names. The visualizations in the final report could be improved by increasing the font size, margins, and axis ranges.

Recommendations on model:

It would be interesting to explore more in-depth models, such as time series analysis or correlated time series analysis. Plotting the time series data together with the predicted values could help assess model performance. Also, the series may show a delay effect: people may search first and then the price becomes volatile, or the price may become volatile because of some news and then people search. Such a delay, or any seasonal effect, may violate the linearity assumption of the model used.
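
A simple first check for the delay effect described above is a lagged correlation for a single stock: correlate search interest in week t with volatility in week t+k over a range of k. The sketch assumes two weekly pandas Series on the same index and uses the absolute weekly return as a rough per-week stand-in for volatility; that proxy is our assumption, not part of the original analysis.

```python
import pandas as pd


def lagged_correlations(trend: pd.Series, abs_return: pd.Series, max_lag: int = 3) -> dict:
    """Correlation between search interest in week t and |return| in week t+k.

    A peak at positive k suggests searches lead volatility; a peak at negative
    k suggests they lag it. Both series must share the same weekly index.
    """
    result = {}
    for k in range(-max_lag, max_lag + 1):
        # shift(-k) moves week t+k's value onto row t, pairing trend_t with |r|_{t+k};
        # Series.corr drops the rows left without a partner (NaN) by the shift.
        result[k] = trend.corr(abs_return.shift(-k))
    return result
```

Plotting these correlations against k for each stock would show whether any lead or lag is consistent across tickers; formal cross-correlation or Granger-style tests would be natural next steps.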

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

@PANDASANG1231

PANDASANG1231 commented Dec 4, 2021

Data analysis review checklist

Reviewer: PANDASANG1231 

Conflict of interest

  • As the reviewer I confirm that I have no conflicts of interest for me to review this work.

Code of Conduct

General checks

  • Repository: Is the source code for this data analysis available? Is the repository well organized and easy to navigate?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

Documentation

  • Installation instructions: Is there a clearly stated list of dependencies? 
  • Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
  • Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Code quality

  • Readability: Are scripts, functions, objects, etc., well named? Is it relatively easy to understand the code?
  • Style guidelines: Does the code adhere to well-known language style guides?
  • Modularity: Is the code suitably abstracted into scripts and functions?
  • Tests: Are there automated tests or manual steps described so that the function of the software can be verified? Are they of sufficient quality to ensure software robustness?

Reproducibility

  • Data: Is the raw data archived somewhere? Is it accessible?
  • Computational methods: Is all the source code required for the data analysis available?
  • Conditions: Is there a record of the necessary conditions (software dependencies) needed to reproduce the analysis? Does there exist an easy way to obtain the computational environment needed to reproduce the analysis?
  • Automation: Can someone other than the authors easily reproduce the entire data analysis?

Analysis report

  • Authors: Does the report include a list of authors with their affiliations?
  • What is the question: Do the authors clearly state the research question being asked?
  • Importance: Do the authors clearly state the importance of this research question?
  • Background: Do the authors provide sufficient background information so that readers can understand the report?
  • Methods: Do the authors clearly describe and justify the methodology used in the data analysis? Do the authors communicate any assumptions or limitations of their methodologies?
  • Results: Do the authors clearly communicate their findings through writing, tables and figures?
  • Conclusions: Are the conclusions presented by the authors correct? 
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
  • Writing quality: Is the writing of good quality, concise, engaging? 

Estimated hours spent reviewing:  45 minutes

Review Comments: 

   - It is really interesting that you chose this topic. I think volatility analysis is extremely useful because option prices and some stock alpha strategies depend on volatility.
   - The EDA report and plots are clear and easy to follow, and the whole report is well structured.
   - The scripts could be combined so that people can reproduce the analysis more easily.
   - I think it would be better to do more time series analysis. Weekly data is a reasonable time period, but it would help to see how the relationship changes with the time window: for example, an hour, 1 day, 5 days, or a month. It would be even better to put time on the x-axis and show the trends in a single plot.
   - You could also explore how Google Trends at time $t$ relates to the return at times $t+1$ and $t+2$, to answer whether Google Trends is a leading or a lagging signal.
   - Although Google Trends is a good angle on this topic, logically it might not be a leading signal for investment decisions.
   - Return volatility is probably not the final quantity people want to know. Bridging return volatility to option prices would make the conclusion and insights more compelling. Option pricing theory already guarantees that link, but showing it explicitly would help non-technical readers. It is a simple, low-risk step that would strengthen the conclusion.
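
To make the last suggestion concrete, the standard Black-Scholes formula links the volatility of returns to an option's price, and its vega is the sensitivity mentioned in the abstract. This is a textbook sketch for a European call on a non-dividend-paying stock; real option markets quote implied rather than historical volatility, so plugging in the project's measured return volatility would only be an approximation.

```python
import math


def _norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _norm_pdf(x: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bs_call_price_and_vega(S: float, K: float, T: float, r: float, sigma: float):
    """Black-Scholes price and vega of a European call (no dividends).

    S: spot price, K: strike, T: years to expiry, r: continuously compounded
    risk-free rate, sigma: annualized volatility of returns.
    """
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    price = S * _norm_cdf(d1) - K * math.exp(-r * T) * _norm_cdf(d2)
    vega = S * _norm_pdf(d1) * math.sqrt(T)  # dPrice/dsigma, per 1.00 change in sigma
    return price, vega
```

For S = K = 100, one year to expiry, r = 0.05 and sigma = 0.2, this gives the familiar call price of about 10.45, and raising sigma raises the price, because vega is always positive.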

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

@BooleanJulien
Author

BooleanJulien commented Dec 10, 2021

Thank you for all of your comments, review team! We appreciated, agreed with, and implemented many of your comments, but we will highlight a few examples of implementation for the purposes of the assignment deliverables.

From @nicovandenhooff

- I would suggest adding a section for License in your README, it could say something like "The source code for the site is licensed under the MIT license"
- In the data folder, you could further separate these files by /processed and /raw to help with the overall project structure
- Minor but there is are a few files in the repo called to_be_deleted.txt, assuming you can delete these now

Our implementation:

From @stevenlio88

But result tables were also included in the same said folder which was supposed to be included in the results folder. 

Our implementation

Moving regression results to the results folder
UBC-MDS/Stock-Price-Trend-Volatility-Analysis@0daad56

From Eric, our TA

Suggested adding/fixing figure captions

Our implementation

We addressed this in a few commits:

UBC-MDS/Stock-Price-Trend-Volatility-Analysis@13df4e7
UBC-MDS/Stock-Price-Trend-Volatility-Analysis@829c7ca
