Stanford cars fastai v3

boredmgr · June 29, 2019, 2:48pm

I am trying to get through lesson 1 using the Stanford Cars dataset. I downloaded the data with untar_data(URLs.CARS). Has anyone tried this dataset? The images and the annotations that come with them don’t match. Stanford Cars notebook

thomassw66 · March 6, 2020, 5:35pm

@boredmgr Did you ever resolve this issue? I decided to try training on this dataset as homework for lesson 1. It appears that the filenames in cars_annos.mat don’t match the image files. For example, the label corresponding to “car_ims/008143.jpg” is “FIAT 500 Convertible 2012” in cars_annos.mat, but I looked at the file “cars_train/08143.jpg” and, while I’m no expert, I’m pretty sure this is a Hummer. https://pasteboard.co/IXRQZTA.png
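A quick way to test for this kind of mismatch is to check whether the file names listed in the annotations exist in the image folder at all. The following is only a rough sketch: the helper name missing_annotation_files is hypothetical, and it assumes you have already read the annotated file names into a plain Python list.

```python
from pathlib import Path


def missing_annotation_files(annotated_names, image_dir):
    """Return the annotated names that have no matching image in image_dir.

    Hypothetical helper: only the bare file name is compared, so the
    folder prefix in "car_ims/008143.jpg" is ignored.
    """
    on_disk = {p.name for p in Path(image_dir).iterdir() if p.is_file()}
    return sorted(
        name for name in annotated_names if Path(name).name not in on_disk
    )
```

If most of the annotated names come back as missing, for instance because the annotations use six-digit names while the folder holds five-digit ones, the annotation file probably belongs to a different copy of the images.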

jeremy · March 6, 2020, 6:04pm

I haven’t checked this dataset, so it’s possible I messed something up! If you do any more investigation, let us know what you find.

thomassw66 · March 9, 2020, 2:24pm

I’m fairly certain the labels don’t correspond correctly. It’s certainly possible I was reading the dataset wrong, but instead of fiddling around with it too much, I ended up getting around this by just downloading the dataset from the original webpage here

I ended up finding someone else’s notebook using the cars dataset.
They have some code to correctly extract it here

For reference, I have a notebook with the exact commands I used to download and extract the dataset here

And the original notebook where I was trying to verify the URLs.CARS dataset didn’t correspond (much messier) here

sb01 · April 2, 2020, 1:24pm

This is excellent Thomas. Thank you so much for posting your solution, especially the data extraction segment.

vtecftwy · April 10, 2021, 11:38am

I tried the same thing and realized that the Stanford Cars dataset home page offers two different options:

Two image datasets (train and test) with their own labels as .mat file each

train images at http://imagenet.stanford.edu/internal/car196/cars_train.tgz
test images at http://imagenet.stanford.edu/internal/car196/cars_test.tgz
metadata (labels and bounding box info) at https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz

A consolidated set of images (train and test) with its consolidated .mat file

all images at http://imagenet.stanford.edu/internal/car196/car_ims.tgz
metadata (labels, …) at http://imagenet.stanford.edu/internal/car196/cars_annos.mat

It appears that URLs.CARS downloads the train and test folders from the first set but then uses the consolidated cars_annos.mat file instead of the two separate .mat files.

To make it work, you must also download the proper .mat files by using untar_data with the url https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz.
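Once the devkit is extracted, the training labels can be read roughly as sketched below. This is a minimal sketch under several assumptions, not a definitive implementation: the file names cars_meta.mat and cars_train_annos.mat, the field names fname and class, the class_names key, and the 1-based class ids are what I would expect from the devkit layout and should be checked against your copy. The function names build_label_map and load_train_labels are hypothetical, and scipy must be installed.

```python
from pathlib import Path


def build_label_map(fnames, class_ids, class_names):
    """Map each image file name to its class name.

    Assumes class ids are 1-based (MATLAB convention), so id 1 maps to
    class_names[0].
    """
    return {
        str(fname): str(class_names[int(class_id) - 1])
        for fname, class_id in zip(fnames, class_ids)
    }


def load_train_labels(devkit_dir):
    """Read the training labels from an extracted devkit folder.

    The file, key and field names below are assumptions about the devkit
    layout; adjust them if your copy differs.
    """
    from scipy.io import loadmat  # third-party dependency, imported lazily

    devkit = Path(devkit_dir)
    meta = loadmat(str(devkit / "cars_meta.mat"), squeeze_me=True)
    annos = loadmat(str(devkit / "cars_train_annos.mat"), squeeze_me=True)
    records = annos["annotations"]  # structured array, one record per image
    return build_label_map(
        records["fname"], records["class"], meta["class_names"]
    )
```

Calling load_train_labels on the extracted devkit folder should give a dictionary keyed by file names such as the ones in cars_train, which can then be passed to whatever labelling function you use when building the data loaders.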

I left a notebook with the full code here

vtecftwy · April 10, 2021, 11:43am

What seems to happen is that the annotation file (cars_annos.mat) that comes with the two folders is not the correct annotation file. It is the one for the consolidated dataset, where all images (train and test) are in the same folder. To get the correct annotations, you must download the two files from https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz

I posted a notebook with the code earlier in this thread.

Hope it helps.

novacx0222 · July 4, 2023, 3:43pm

Thanks very much for your detailed answer. However, I failed to download this dataset with the wget command on my Linux server. Since the response I got was “HTTP request sent, awaiting response... 404 Not Found”, I reckon that the author has removed the files. I’ve also tried to download from Kaggle, but I could not find the “devkit” there. I would appreciate it if you could suggest other ways of downloading it.

vtecftwy · July 5, 2023, 2:02pm

Hi Xuan Chen, you may be right. It is worth looking at Jonathan Krause’s main page at https://ai.stanford.edu/~jkrause, as there is a long list of datasets there. The files may just have been renamed.