(Note: Sign up for the new Gödel’s Market Newsletter if you find the following useful and want more articles on useful information, including interviews and research.)
Some may say that learning to use an RDBMS takes too much effort. But, depending on how you’re doing your data storage right now, an RDBMS could easily help you organize better (aren’t those CSV/TXT/Excel files piling up?), give you results faster and even give you greater insight into your data.
For this article we’re going to use MySQL. It is among the most popular RDBMS’ out there, and it is open source and free. Many sites use it, including Facebook, which serves over 1 billion people tons and tons of data.
Steps:
- You’ll want to download a copy at the following link:
http://dev.mysql.com/downloads/mysql/ (make sure to select the download appropriate for the operating system you are using).
- Installation should be fairly automated. Just click next and make sure you don’t install any “Ask Jeeves” adware…Make sure you do a full install which should include the MySQL Workbench. This will be our method for working with the MySQL Server.
- Next, grab some sample data from Yahoo. Go to your favorite stock ticker, and select “Historical Data” on the left hand side. Then go down to the bottom of the page and click on “Download to Spreadsheet”. Here’s a link for MSFT: http://finance.yahoo.com/q/hp?s=MSFT+Historical+Prices This will download a file called “table.csv” which is exactly what you want. If you open it up you’ll see your stock’s data (date, open, high, low, etc).
- Now open up MySQL Workbench CE 6.* (I’m using 6.2, but they should all be very similar). You should have a database called “local”. Double click on that. If you need to setup a database you’ll want to create one locally. There will be a “root” user which should be fine for our purposes at the moment. Assign a password and connect.
- Now that you’re connected to the database, take a look at the left side. You’ll see something called “Schemas”. We need to create a new one for stocks. You can either right click below the existing Schemas and select “Create Schema” or you can go up to the top bar and click the icon “Create a new schema in the connected server.”
- For your new schema, name it something like “stocks”. Use the default “Collation” and click “Apply”. Click “Apply” again on the creation script and again if needed. Then click “Finish” and you should have a new schema to the left.
- Click on the drop down arrow to the left of your new schema. Right click on “Tables” and choose “Create Table…” For table name choose something like “historicaldata”. Go down to Column Name and double click below it to add a new column. You’ll want a column for every row in your “table.csv” file. So one column for “Date”, “Open”, “High”, “Low”, “Close”, “Volume”, “Adj Close”.
For Datatypes, you’ll want “DATETIME” for “Date”. For “Volume” you’ll want to use “INT”. For the others you’ll want to use “DECIMAL(10,3)”. You can use other data types, but these work well enough for what we’re doing.
- Now right click on your new table “historicaldata” under “Tables” within your newly created schema on the left hand side. You’ll want to choose “Select Rows”. This will bring up all the data you currently have in the table, which should be none. Go to the middle portion where you’ll see some icons following “Result Grid”. Go to the “Export/Import” tag, and click on the “Import records from an external file” icon.
This will let you choose a “CSV” file for import. Before you choose “table.csv” open it up and remove the first line with the headers (i.e. Date, Open, High, etc etc). If you don’t, this will also get imported (there are ways around this, but I’m keeping this intro simple). Now you can choose your “table.csv” file for import. It’ll take a little bit to import and then should load up the data on screen.
- Now you can run your first query. Go up to the icon with “SQL” and a little plus on it. It should be right below “File” at the top. This should open a new tab for SQL queries. Type the following and press “CTRL” + “ENTER” or click the “Execute” icon (a little lightning bolt).
SELECT * FROM stocks.historicaldata
This should return everything from your table. Not very useful, but it’s a start. Next try something like:
SELECT * FROM stocks.historicaldata WHERE date > ‘2015-03-01’
Try:
SELECT * FROM stocks.historicaldata WHERE open > close
I’m sure you can come up with some more on your own! There will definitely be more on this in the future.
I’ll be posting more on how to continue on from here. In the meantime, sign up for the Gödel’s Market Newsletter. Information not available on the blog is sent out through the newsletter. Click here to sign up!
Best of luck.