I am trying to get through lesson 1 using the Stanford Cars dataset. I downloaded the data with untar_data(URLs.CARS). Has anyone tried this dataset? The images and the annotations don’t match. Stanford Cars notebook
@boredmgr Did you ever resolve this issue? I decided to try training on this dataset as homework for lesson 1. It appears that the filenames in cars_annos.mat don’t match the downloaded images. For example, the label corresponding to “car_ims/008143.jpg” is “FIAT 500 Convertible 2012” in cars_annos.mat … but I looked at the file “cars_train/08143.jpg” and, I’m no expert, but I’m pretty sure this is a Hummer? https://pasteboard.co/IXRQZTA.png
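A quick way to check whether an annotation file even refers to the folders you have on disk is to compare paths. This is only a minimal sketch: it assumes you have already pulled the relative image paths out of the annotation file into a plain Python list (that step is not shown here), and the function name `missing_annotated_files` is made up for illustration.

```python
from pathlib import Path


def missing_annotated_files(annotated_paths, image_root):
    """Return the annotated relative paths that do not exist under image_root.

    annotated_paths: iterable of relative paths as they appear in the
    annotation file, e.g. "car_ims/008143.jpg" (assumed already extracted).
    image_root: directory that was produced by extracting the image archive.
    """
    root = Path(image_root)
    # Keep every annotated path that has no matching file on disk.
    return [p for p in annotated_paths if not (root / p).is_file()]
```

If most of the annotated paths come back as missing, the annotation file was probably written for a different folder layout than the one you downloaded. Note that this only catches path mismatches; it cannot tell you whether a label is right for an image that does exist.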
I haven’t checked this dataset, so it’s possible I messed something up! If you do any more investigation, let us know what you find.
I’m fairly certain the labels don’t correspond correctly. It’s certainly possible I was reading the dataset wrong, but instead of fiddling around with it too much, I ended up getting around this by just downloading the dataset from the original webpage here
I ended up finding someone else’s notebook using the cars dataset.
They have some code to correctly extract it here
For reference, I have a notebook with the exact commands I used to download and extract the dataset here
And the original notebook where I was trying to verify that the URLs.CARS dataset didn’t correspond (much messier) is here
This is excellent Thomas. Thank you so much for posting your solution, especially the data extraction segment.
I tried the same thing and realized that the Stanford Cars dataset home page offers two different options:
- Two image datasets (train and test) with their own labels:
  - train images at http://imagenet.stanford.edu/internal/car196/cars_train.tgz
  - test images at http://imagenet.stanford.edu/internal/car196/cars_test.tgz
  - metadata (labels and bounding box info) at https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz
- A consolidated set of images (train and test) with its own consolidated metadata:
  - all images at http://imagenet.stanford.edu/internal/car196/car_ims.tgz
  - metadata (labels, …) at http://imagenet.stanford.edu/internal/car196/cars_annos.mat
It appears that the data downloaded with URLs.CARS contains the train and test folders from the first set but then uses the consolidated cars_annos.mat file instead of the two separate annotation files. To make it work, you must also download the proper .mat files by using untar_data with the devkit URL listed above.
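For anyone who wants to do the download step outside of fastai, here is a minimal sketch using only the Python standard library instead of untar_data. The helper name `fetch_and_extract` is hypothetical, and the URL is the devkit link mentioned in this thread; I am not assuming anything about which files are inside the archive, so the function just reports whatever .mat files it finds after extraction.

```python
import tarfile
import urllib.request
from pathlib import Path

# Devkit archive URL as posted in this thread.
DEVKIT_URL = "https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz"


def fetch_and_extract(url, dest_dir):
    """Download a .tgz archive into dest_dir, extract it there, and
    return the relative paths of all .mat files found afterwards."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Name the local archive after the last URL component.
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)

    # Only extract archives from sources you trust.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)

    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*.mat"))
```

Calling `fetch_and_extract(DEVKIT_URL, "data/cars")` (the destination folder is just an example) should leave the annotation files next to wherever you keep the train and test image folders, and the returned list tells you what was actually in the archive.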
I left a notebook with the full code here
What seems to happen is that the annotation file (cars_annos.mat) that comes with the two folders is not the correct annotation file. It is the one for the consolidated dataset, where all images (train and test) are in the same folder. To get the correct annotations, you must download the two files from https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz
I posted a notebook with the code higher in this thread.
Hope it helps.