Stanford cars fastai v3

I am trying to get through lesson 1 using the Stanford Cars dataset. I downloaded the data with untar_data(URLs.CARS). Has anyone tried this dataset? The images and the annotations don’t seem to match. Stanford Cars notebook

@boredmgr Did you ever resolve this issue? I decided to try training on this dataset as homework for lesson 1. It appears that the filenames in cars_annos.mat don’t match the images. For example, the label corresponding to “car_ims/008143.jpg” is “FIAT 500 Convertible 2012” in cars_annos.mat, but I looked at the file “cars_train/08143.jpg” and, while I’m no expert, I’m pretty sure it is a Hummer. https://pasteboard.co/IXRQZTA.png
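A quick way to confirm this kind of mismatch is to check whether the paths listed in an annotation file exist under the extracted folder at all. The following is a minimal sketch using only the standard library; `missing_on_disk` is a hypothetical helper name, and the list of relative paths is assumed to have been read from the annotation file already.

```python
from pathlib import Path


def missing_on_disk(root, relative_paths):
    """Return the annotation paths that have no matching file under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]
```

If most of the paths come back as missing, the annotation file was probably written for a different folder layout than the one on disk.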

I haven’t checked this dataset, so it’s possible I messed something up! If you do any more investigation, let us know what you find.

I’m fairly certain the labels don’t correspond correctly. It’s certainly possible I was reading the dataset wrong, but instead of fiddling around with it too much, I ended up getting around this by just downloading the dataset from the original webpage here

I ended up finding someone else’s notebook using the cars dataset.
They have some code to correctly extract it here

For reference, I have a notebook with the exact commands I used to download and extract the dataset here

And the original (much messier) notebook where I checked that the URLs.CARS labels didn’t correspond is here

This is excellent, Thomas. Thank you so much for posting your solution, especially the data extraction segment.

I tried the same thing and realized that the Stanford Cars dataset home page offers two different options:

  1. Two image datasets (train and test), each with its own labels in a separate .mat file
  2. A consolidated set of images (train and test together) with a single consolidated .mat file

It appears that URLs.CARS downloads the train and test folders from the first option but ships the consolidated cars_annos.mat file instead of the two per-split .mat files.

To make it work, you must also download the proper .mat files by using untar_data with the url https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz.
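For anyone who would rather not depend on a particular fastai version, here is a minimal sketch of the same download-and-extract step using only the Python standard library in place of untar_data. `fetch_and_extract` is a hypothetical helper name, and the URL is the one quoted above, which may no longer resolve.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL as quoted in this thread; it may have moved or been removed.
DEVKIT_URL = "https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz"


def fetch_and_extract(url, dest):
    """Download a .tgz archive into dest (if not already there) and unpack it."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest
```

Calling `fetch_and_extract(DEVKIT_URL, "data/stanford-cars")` would then place the extracted devkit files under that folder, assuming the download succeeds.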

I left a notebook with the full code here

What seems to happen is that the annotation file (cars_annos.mat) that comes with the two folders is not the correct one. It belongs to the consolidated dataset, where all images (train and test) are in the same folder. To get the correct annotation files, you must download the two per-split files from https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz
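Once the devkit is extracted, the per-split labels can be turned into a filename-to-class-name table. This is only a sketch under stated assumptions: it assumes SciPy is installed, that cars_train_annos.mat holds an `annotations` struct with `fname` and `class` fields, and that cars_meta.mat holds `class_names`. Print the keys of the loaded files to verify this before relying on it. Both function names are hypothetical.

```python
from pathlib import Path


def label_table(fnames, class_ids, class_names):
    """Map each image filename to its class name.

    Class ids are assumed to be 1-based, following the MATLAB convention.
    """
    return {str(f): str(class_names[int(c) - 1]) for f, c in zip(fnames, class_ids)}


def load_train_labels(devkit_dir):
    """Build the table for the training split from the devkit .mat files."""
    # Imported lazily so label_table can be used without SciPy installed.
    from scipy.io import loadmat

    devkit_dir = Path(devkit_dir)
    # Field and variable names below are assumptions about the devkit layout.
    annos = loadmat(str(devkit_dir / "cars_train_annos.mat"), squeeze_me=True)["annotations"]
    names = loadmat(str(devkit_dir / "cars_meta.mat"), squeeze_me=True)["class_names"]
    return label_table(annos["fname"], annos["class"], names)
```

Spot-checking a few entries of the resulting table against the actual images is a cheap way to confirm that the labels now line up.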

I posted a notebook with the code higher in this thread.

Hope it helps.

Thanks very much for your detailed answer. However, I failed to download this dataset with the wget command on my Linux server. The response I got was “HTTP request sent, awaiting response... 404 Not Found”, so I reckon that the author has removed the files. I’ve also tried downloading from Kaggle, but I could not find the “devkit” there. I would appreciate it if you could suggest other ways to download it.

Hi Xuan Chen, you may be right. It is worth looking at Jonathan Krause’s main page at https://ai.stanford.edu/~jkrause, as there is a long list of datasets there. The file name may just have changed.