About iMaterialist Challenge (Fashion) at FGVC5, Kaggle Competition

Hi All,

This will be my first participation in Kaggle so any help is appreciated.
The dataset provided for this competition is in JSON format and contains the image URLs and image IDs. I was able to convert it to a pandas DataFrame, as shown in the image below.

But for using,

data = ImageClassifierData

do I actually need to download all the images from the URLs provided in the JSON file into train/valid folders, or is there another way to pass the URLs directly to ImageClassifierData?
@jeremy

1 Like

IMO, you will need to download the images yourself before using the ImageClassifierData.from_csv method. The Kaggle competition has a number of kernels where folks have posted their download scripts. I used them to download the images for this competition a couple of weeks back.
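In rough form, those scripts do something like this (just a sketch, not the exact kernel code; the 'images' / 'imageId' / 'url' keys match the competition JSON, the output folder is a placeholder):

import json, os, requests

def download_images(json_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    with open(json_path) as f:
        images = json.load(f)['images']            # list of {'imageId': ..., 'url': ...}
    for img in images:
        dest = os.path.join(out_dir, f"{img['imageId']}.jpg")
        if os.path.exists(dest):
            continue                               # skip files already downloaded
        try:
            r = requests.get(img['url'], timeout=10)
            r.raise_for_status()
            with open(dest, 'wb') as out:
                out.write(r.content)
        except Exception as e:
            print(f"failed {img['imageId']}: {e}")  # log the failure and keep going

download_images('train.json', 'train')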

Thanks @ramesh, but after downloading the images the file with the labels will still be in .json format. Can we use ImageClassifierData.from_csv to read a .json label file, or do we need to convert it to .csv format?

Load the JSON file using pandas pd.read_json. Then save the file to csv (make sure the multi-labels are space separated). Now you can load it using ImageClassifierData.from_csv.
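Roughly like this (an untested sketch; I'm flattening with json.load instead of pd.read_json here, and the 'images' / 'annotations' keys and column names are assumptions based on the competition files):

import json
import pandas as pd

with open('train.json') as f:
    train = json.load(f)

imgs = pd.DataFrame(train['images'])           # columns: imageId, url
annos = pd.DataFrame(train['annotations'])     # columns: imageId, labelId (list of label ids)
df = imgs.merge(annos, on='imageId')

# from_csv wants one file-name column and one column of space-separated labels
df['labels'] = df['labelId'].apply(lambda ids: ' '.join(str(i) for i in ids))
df[['imageId', 'labels']].to_csv('train_labels.csv', index=False)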

1 Like

Hello @Ankit89 ,

I used the approach from https://www.kaggle.com/nlecoy/imaterialist-downloader-util/ to download the files to the train/test/valid directories and generated a csv file from the data supplied in the json file based on https://www.kaggle.com/anqitu/for-starter-json-to-multilabel-in-24-seconds (be sure to separate the labels as @ramesh said above).

With that I can start the learning-rate finder and the training with F1 score (see F1 measure not improving with epochs).
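For reference, my setup looks roughly like this (a sketch with placeholder path, image size, batch size and architecture; the F1 metric is the one discussed in the linked thread):

from fastai.conv_learner import *

PATH = 'data/imaterialist/'                     # placeholder path
sz, bs = 224, 64                                # placeholder image size and batch size
label_csv = f'{PATH}train_labels.csv'
n = len(list(open(label_csv))) - 1              # number of labelled training images
val_idxs = get_cv_idxs(n)                       # random validation split

tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_csv(PATH, 'train', label_csv, tfms=tfms, bs=bs,
                                    suffix='.jpg', val_idxs=val_idxs)
learn = ConvLearner.pretrained(resnet34, data, precompute=True)
# learn.metrics = [f1]   # multi-label F1, defined as in the linked thread
learn.lr_find()
learn.sched.plot()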

Currently I'm figuring out how to generate the multi-label predictions for the test data.
Maybe somebody has a tip on resources for the fast.ai library and multi-label image classification (I'm still searching the forum and the examples).

Best regards
Michael

2 Likes

For this challenge I am unable to import resnet34; the URL cannot be loaded: URLError: <urlopen error [Errno -2] Name or service not known>. How can I overcome this?

Do you work with the latest fast.ai GitHub repo?

I just installed fast.ai using "!pip install fastai" according to the GitHub page of fast.ai, as I don't know how to access the terminal on Kaggle (if that's even possible) for the other installation route.

I then used “!pip install --upgrade fast.ai” however it threw “Could not find a version that satisfies the requirement fast.ai (from versions: ). No matching distribution found for fast.ai”.

Just as an addition, even when I do:
from keras.applications.resnet50 import ResNet50
resnet1 = ResNet50(weights='imagenet')
learn = ConvLearner.pretrained(resnet1, data, precompute=True)

I get: Exception: URL fetch failure on https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels.h5: None – [Errno -2] Name or service not known

Can someone please help me import these models into the kernel?

You are mixing up libraries here. You can't use Keras' ResNet50 with fastai's ConvLearner. I would recommend that you first follow and replicate the course notebooks to get familiar with the library and its usage before applying it to new problems and datasets.

1 Like

I just trained a basic ResNet, but my score after submission is really bad and my F1 score is also not really improving during training.

It was also tricky with my basic pandas knowledge to get the submission file but I somehow managed it (so this was good training).

In my code I apply some kind of threshold after making the predictions. Is this the way to go?
Interestingly, the lower the threshold, the higher my score.

I guess I have to dig deeper into the multi-label classification approach.
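Concretely, the thresholding step looks roughly like this (a sketch, assuming the data object was created with test_name='test'; the threshold value and the submission column names are placeholders):

import os
import numpy as np
import pandas as pd

preds = learn.predict(is_test=True)             # per-class probabilities (multi-label head ends in a sigmoid)
thresh = 0.2                                    # placeholder; lower threshold -> more labels per image
classes = np.array(data.classes)                # label ids in the order the data object stored them

rows = []
for fname, p in zip(data.test_ds.fnames, preds):
    image_id = os.path.splitext(os.path.basename(fname))[0]
    labels = ' '.join(classes[p > thresh])      # keep every class whose probability exceeds the threshold
    rows.append((image_id, labels))

pd.DataFrame(rows, columns=['image_id', 'label_id']).to_csv('submission.csv', index=False)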

@MicPie I'm new to deep learning as well and tried the Fashion competition after studying fast.ai's lessons 2 and 3, but I had similar trouble, getting an F1 score of about 0.10. My initial thought is that it might be related to the fact that the labels are numerical rather than text. Maybe when the fast.ai library feeds the numerical labels into the neural net it affects the output. I'm going to look at the source code in more detail to try to figure it out. I expect most of the information about the competition will be released after it is over, so for the time being we will likely have to figure it out ourselves.

I thought about the labels too, but the final layer of the network looks fine, and it is built from the labels that are fed in (you can print the network architecture by evaluating learn).
I will also have a closer look at the examples from the validation set.
There must be a fix to that problem. :wink:

You're right about the labels. Also, the labels are strings anyway, so the model should interpret them similarly to the text labels in the fast.ai example notebook. @MicPie the other thing I was thinking about is that each class might not have enough samples for the classifier to learn the individual labels effectively, so I am looking at more data augmentation. In the Kaggle kernels/discussion people mentioned that heavy data augmentation was required to get even reasonable results. One person got an F1 score of around 0.50 with ResNet50, so it is definitely possible.
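For the augmentation side, something like this is what I have in mind (a sketch using the fast.ai transforms; the particular transforms and values are just guesses to experiment with, not what produced that 0.50 score):

from fastai.conv_learner import *

sz = 224                                                  # image size, as before
aug_tfms = [RandomRotate(20), RandomLighting(0.1, 0.1), RandomFlip()]   # heavier than the defaults
tfms = tfms_from_model(resnet50, sz, aug_tfms=aug_tfms, max_zoom=1.2)
# then rebuild the data object with these tfms and retrain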

@radek Congrats to Radek for winning the Kaggle competition! Gratulacje!

Thank you :slight_smile: :bowing_man:

5 Likes

Fabulous result @radek, congrats! And nice to see 1cycle in the mix.

It's not quite reaching the heights of @radek with iMaterialist (Fashion), but I was able to do well in the related competitions. These are part of the Fine-Grained Visual Categorization (FGVC) workshop at the Computer Vision and Pattern Recognition (CVPR) conference held in the US this week.

2nd in the iFood 2018 challenge, classifying 200 food dish classes.
2nd in the iFungi 2018 challenge, classifying 1500 fungi species.
Top 5% (18/436) in the iMaterialist (Furniture) 2018 challenge, with 128 furniture classes.

I relay this here not to blow my own trumpet (OK maybe a little bit), but rather to incentivise others to use the competitive environment of Kaggle to advance their learning.

At the beginning of 2018 I'd touched neither Python nor machine learning, and just months later it is possible to gain prize-winning positions against professionals. I can't sing the praises of fast.ai and Kaggle enough. My advice is to find competitions that interest you and use them as scaffolding for further training (sic).

4 Likes

@digitalspecialists Could you post in a separate topic what you did this year? What courses/competitions, a little about your background, etc.?

I am new to fastai. After lessons 2 and 3, I am working on the challenge as well.
With resnet34, I got a really bad F1 score like @buzz_aldi, around 0.10 after 30 iterations.
I tried 1cycle as well, but it didn't show much improvement on the F1 score.
Can anyone who has worked on the competition share some experience, please?
Thanks

Does your F1 score function work?
Check the code here: F1 measure not improving with epochs
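The gist of the fix is along these lines (a sketch modelled on the planet notebook's f2 metric, with beta=1; it assumes the predictions are already sigmoid probabilities, which should be the case for a multi-label data object):

import warnings
import numpy as np
from sklearn.metrics import fbeta_score

def f1(preds, targs, start=0.15, end=0.30, step=0.05):
    # multi-label F1: threshold the per-class probabilities instead of taking an argmax,
    # then keep the best score over a small range of thresholds
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')
        return max(fbeta_score(targs, (preds > th), beta=1, average='samples')
                   for th in np.arange(start, end, step))

# pass it when building the learner, e.g. ConvLearner.pretrained(arch, data, metrics=[f1])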

Otherwise I can share my (messy) notebook with you, if you want.

Best regards
Michael

1 Like

Michael,
I used exactly the same wrong F1 measure as the one in the post you shared. I will start training with the new F1 measure. Thank you, I will let you know how it goes!