First Blog Post - Predicting S&P 500 Using Price Images

ptear · May 28, 2021, 3:49pm

Here is my first blog post: Peter’s Blog (ptear-blog.herokuapp.com).
Here are the notebooks too: Image Generating and CNN copying lesson 1.

I watched Jeremy’s recent video on YouTube and decided I would start blogging. I am not using fastpages because I already made this blog in a previous course when I was learning Python.

Apologies if it is hard to read etc.

A couple of questions which I cover in the post:

What does untar_data do? In Jeremy’s cat example, it seems it just gives a pointer to where the cat images are stored. Is that a correct interpretation?
How would I easily make the validation set only data from a certain date? Since my data is time series data, I am introducing look-ahead bias by using a random sample.

I look forward to discussing things with people.

dhoa · May 28, 2021, 8:32pm

You can find the fastai documentation here: https://docs.fast.ai/

untar_data: External data | fastai. It downloads an archive from a URL, extracts it to the destination folder, and returns the path to the extracted data.
This is a great idea. For this task, I think you can take a look at the fastai DataBlock tutorial: Data block tutorial | fastai. Then write your own splitter (see Helper functions for processing data and basic transforms | fastai), which returns two index lists (one for the training set and one for the validation set).
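
A minimal sketch of such a splitter, in plain Python so it runs without fastai. It assumes each image file name ends with the date of its last price bar (for example spx_2021-02-10.png); that naming scheme, the helper names make_date_splitter and date_from_name, and the cutoff date are all made up for illustration and are not taken from the notebooks.

```python
from datetime import date


def make_date_splitter(get_date, cutoff):
    """Build a splitter that sends items dated before `cutoff` to training
    and everything on or after `cutoff` to validation."""
    def _splitter(items):
        train_idx = [i for i, item in enumerate(items) if get_date(item) < cutoff]
        valid_idx = [i for i, item in enumerate(items) if get_date(item) >= cutoff]
        return train_idx, valid_idx
    return _splitter


def date_from_name(name):
    """Parse the date from a hypothetical name like 'spx_2021-02-10.png'."""
    stem = name.rsplit(".", 1)[0]
    return date.fromisoformat(stem.split("_")[-1])


# Hypothetical file names, deliberately out of date order.
files = [
    "spx_2019-03-01.png",
    "spx_2021-02-10.png",
    "spx_2020-07-15.png",
    "spx_2021-05-03.png",
]

splitter = make_date_splitter(date_from_name, cutoff=date(2021, 1, 1))
train_idx, valid_idx = splitter(files)
print(train_idx, valid_idx)
```

The returned function takes the list of items and gives back the two index lists, which is the shape described above, so it should be usable as the splitter argument of a DataBlock. Because the split depends only on the date, no validation image is older than any training image.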

Hope it helps

ptear · May 29, 2021, 5:53pm

That makes sense. So is the /‘images’ at the end telling it where to put the data?
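
For reference, a small sketch of what the trailing /'images' does, assuming the lesson code has the form untar_data(...)/'images'. Since untar_data returns a path object, the / is ordinary pathlib joining: it points at a subfolder inside the extracted data rather than telling untar_data where to put anything. The folder location below is hypothetical; the real one depends on the fastai configuration.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for the path that untar_data would return.
data_root = PurePosixPath("/home/user/.fastai/data/oxford-iiit-pet")

# The "/" operator joins path components; nothing is downloaded or moved here.
images_dir = data_root / "images"
print(images_dir)
```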

Thanks for the pointers. I’ll try this out and let you know how it goes.

By the way, do you have any suggestions for my general blog writing?

dhoa · May 29, 2021, 7:05pm

I don’t consider myself anything near a good writer. I’ve just skimmed through your blog and it’s quite good. I have some small comments here:

Maybe you can add more images or explanation (show your batch images, …) about the overall point, and move the detailed explanations of where you struggled into notes at the end of the blog. That way the reader will not be distracted from the big picture and can always find the details somewhere.

Hope it helps,

ptear · May 31, 2021, 2:00pm

UPDATE:

I have updated the model, referring to lesson 2 and the links from @dhoa.

It was actually fairly easy to set up my own validation set which was not randomly selected. I noticed one of the splitter functions (GrandparentSplitter()) just needed you to put the image files in separate ‘train’ and ‘valid’ folders. I also noticed a labelling function (parent_label()) labelled based on the parent folder. So I set up the various folders: train/buy, train/sell, valid/buy, valid/sell.
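
A rough sketch of that folder layout, written with only the standard library so it runs without fastai. The sample file names, the is-validation flags, and the helper functions are invented for illustration; the actual notebooks may set things up differently. The two small helpers mimic what the post describes: the label comes from the parent folder name and the split from the grandparent folder name.

```python
import tempfile
from pathlib import Path


def target_folder(root, is_validation, label):
    """Return <root>/<train|valid>/<label> for one image."""
    split = "valid" if is_validation else "train"
    return Path(root) / split / label


def parent_name(path):
    """Folder directly above the file: used as the label (buy/sell)."""
    return Path(path).parent.name


def grandparent_name(path):
    """Folder two levels above the file: used as the split (train/valid)."""
    return Path(path).parent.parent.name


root = Path(tempfile.mkdtemp())

# Hypothetical samples: (file name, is in validation period, label).
samples = [
    ("a.png", False, "buy"),
    ("b.png", False, "sell"),
    ("c.png", True, "buy"),
    ("d.png", True, "sell"),
]

placed = []
for name, is_valid, label in samples:
    folder = target_folder(root, is_valid, label)
    folder.mkdir(parents=True, exist_ok=True)
    dest = folder / name
    dest.touch()  # stand-in for saving the generated price image
    placed.append(dest)

layout = sorted(p.relative_to(root).as_posix() for p in placed)
print(layout)
```

With the files laid out like this, the split is decided by when each image was generated rather than by a random sample, which is what removes the look-ahead bias.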

It was a simple task from there on. The notebooks can be viewed here:
Image Generating and Setting Up Folders and Fitting Model.

The results were terrible: in the end, worse than flipping a coin. That is perhaps reassuring, given that we have now removed the look-ahead bias that was there previously.