New Forum

Visit the new forum at http://godelsmarket.com/bb

Saturday, June 20, 2015

Automated Daily Stock Database Updates Using The R Statistics Project

I received a request from pcavatore several posts ago. pcavatore was interested in "database update automation via R script." He wanted to know "how to run a daily task to update prices in the database."

In this article we'll be using R and the RMySQL package to access and update a MySQL database with stock information from Yahoo! Finance. If you do not have R installed, you can grab a copy at http://www.r-project.org/.

(If you're interested in more articles like this, sign up for the Gödel's Market Newsletter. Each sign up is a sign of support. Topic suggestions are always welcome; and, there are currently two other suggestions in the works).

1. Install RMySQL

In the R console run the following:
install.packages("RMySQL")
 Alternatively, you can go to "Packages" at the top of the R UI. Click "Install packages(s)..." and choose "RMySQL".

2. Test the connection to your MySQL database by running the following. Change the user, password, and database name to match your own.
library("RMySQL")
mydb = dbConnect(MySQL(), user='root', password='password', dbname='stockdb', host='localhost')
Make sure to change the parameters to match your database details. Next run:
dbListTables(mydb)
You should see a print out of the tables in your database.

3. Download data from Yahoo using the following:
stockcsv <- read.csv("http://real-chart.finance.yahoo.com/table.csv?s=SPY&a=05&b=17&c=2015&d=5&e=18&f=2015&g=d&ignore=.csv")
Run the following to see the data:
stockcsv 
4. Insert downloaded data into your database. You'll use the dbGetQuery(mydb, query) function along with a query generated from the stockcsv data.

The following assumes you are storing your data in a table called "stock_data" and that the table has the necessary fields.
query = paste("INSERT INTO stock_data (symbol, date, open, high, low, close, volume, adj_close) VALUES('", stock_symbol, "', '", stockcsv$Date[1], "', ", stockcsv$Open[1], ", ", stockcsv$High[1], ", ", stockcsv$Low[1], ", ", stockcsv$Close[1], ", ", stockcsv$Volume[1], ", ", stockcsv$Adj.Close[1], ")", sep="")

dbGetQuery(mydb, query)
5. Generate url. Since we don't want to have to update the url every day, you can use the following to generate the Yahoo csv url for whatever happens to be the current date:
todaydate <- Sys.Date()
today_day = format(todaydate, "%d")
#yahoo months start with Jan = 0, i.e. typical_month_num - 1
today_month = toString(as.numeric(format(todaydate, "%m")) -1)
today_year = format(todaydate, "%Y")
url_create = paste("http://real-chart.finance.yahoo.com/table.csv?s=", stock_symbol, "&a=", today_month, "&b=", today_day, "&c=", today_year, "&d=", today_month,"&e=", today_day, "&f=", today_year, "&g=d&ignore=.csv", sep="")
6. The Script

Putting it all together we get something like the following script. Can also be seen at its github repo.
library("RMySQL")
mydb = dbConnect(MySQL(), user='root', password='password', dbname='stockdb', host='localhost')
#get today's date
todaydate <- Sys.Date()
today_day = format(todaydate, "%d")
#yahoo months start with Jan = 0, i.e. typical_month_num - 1
today_month = toString(as.numeric(format(todaydate, "%m")) -1)
today_year = format(todaydate, "%Y")
#stock symbol
stock_symbol = "SPY"
#creating the url
url_create = paste("http://real-chart.finance.yahoo.com/table.csv?s=", stock_symbol, "&a=", today_month, "&b=", today_day, "&c=", today_year, "&d=", today_month,"&e=", today_day, "&f=", today_year, "&g=d&ignore=.csv", sep="")
#print out url_create for sanity check
url_create
#download and read in the csv
stockcsv <- read.csv(url_create)
#create INSERT query
query = paste("INSERT INTO stock_data (symbol, date, open, high, low, close, volume, adj_close) VALUES('", stock_symbol, "', '", stockcsv$Date[1], "', ", stockcsv$Open[1], ", ", stockcsv$High[1], ", ", stockcsv$Low[1], ", ", stockcsv$Close[1], ", ", stockcsv$Volume[1], ", ", stockcsv$Adj.Close[1], ")", sep="") 
#run query to insert into database
dbGetQuery(mydb, query)
#disconnect from db
dbDisconnect(mydb)
7. Automation. Run the script from a cron job or Windows task once a day after the market closes and your database will be kept up to date!

If you enjoyed this article and would like to see more like it, consider signing up for the Gödel's Market Newsletter. Each sign up is a sign of support. Topic suggestions are always welcomed and always encouraged.

1 comment:

  1. Very interesting, thanks...as per my previous comment it would be great to be able to run some database maintenance for stock corporate actions like dividends and stock splits with an R script as well.

    Ie. you may want to check when a corporate action is due and re-download the entire price series for that stock.

    ReplyDelete