Forecasting SP500 stocks with XGBoost and Python Part 1: Sourcing the data


Sourcing the data

Sourcing the data is as important as building the model itself, if not more so. The old adage “garbage in, garbage out” rings true in any data-related work, and even more so in ML.

from pandas_datareader import data as web
msft_stocks = web.DataReader('MSFT', 'yahoo', '2017-01-01', '2022-03-31')  # MSFT history from the 'yahoo' source, 2017-01-01 to 2022-03-31

The code

The script I wrote to download the history for all tickers is slightly more involved. The keyword here is slightly, because the script does little more than run that second line of code, the DataReader call, inside a loop over the tickers.

Functions to download and combine stock data into a single export
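
The idea of looping over tickers and combining the results can be sketched as follows. This is a minimal illustration, not the original script: the function names download_ticker and combine_tickers, the added "Ticker" column, and the stocks.csv file name are all assumptions made for the example. The fetch parameter is also an assumption, added so a different data source can be swapped in if the 'yahoo' source is unavailable in your environment.

```python
import pandas as pd


def download_ticker(ticker, start, end):
    """Download the history for one ticker, using the same call as the MSFT example."""
    # Imported here so combine_tickers can be used with another fetch function
    # even when pandas_datareader is not installed.
    from pandas_datareader import data as web

    return web.DataReader(ticker, "yahoo", start, end)


def combine_tickers(tickers, start, end, fetch=download_ticker):
    """Fetch each ticker in a loop and stack the results into one DataFrame.

    `fetch` is a hypothetical hook: any function taking (ticker, start, end)
    and returning a DataFrame can be passed in.
    """
    frames = []
    for ticker in tickers:
        try:
            history = fetch(ticker, start, end)
        except Exception as exc:
            # One failing ticker should not abort the whole download.
            print(f"Skipping {ticker}: {exc}")
            continue
        history = history.copy()
        history["Ticker"] = ticker  # label rows so tickers stay distinguishable
        frames.append(history)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames)


# Example usage (requires network access and a working data source):
# combined = combine_tickers(["MSFT", "AAPL"], "2017-01-01", "2022-03-31")
# combined.to_csv("stocks.csv")
```

Keeping each ticker's rows labeled in one long table, rather than writing one file per ticker, means the later modeling steps only have to read a single export.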



José Fernando Costa


I write about data science to help other people who might come across the same problems