Lesson 4 In-Class Discussion

jamesrequa · November 22, 2017, 7:06pm

So basically the idea is that whatever percentage increase you apply to minority classes during training, you would want to decrease their predicted probabilities by the same percentage at the end? What if it’s a binary classification? Does the same rule apply in that case?

jeremy · November 22, 2017, 7:17pm

I’m not sure if it will be exactly decreasing by the same % at the end - you may have to experiment with a few amounts to see which works best. It’s not something I’ve studied closely, and don’t know of anyone else who has either, but it’s clearly an important issue!

jamesrequa · November 22, 2017, 7:25pm

Yeah, I’m currently working on this Kaggle comp, and the difficulty is that it has an imbalanced dataset where only approx. 10% of the images in the training set have a “threat”. To make it more difficult, the images have a lot of noise, the threats are only visible from certain angles, and classifications need to be segmented by body zones.

I tried increasing threat images in the training set, but like you mentioned, that just ended up increasing the probabilities, so there were a lot of false positives. It’s almost like the inherent rarity of the threat is a good thing, because it naturally makes the model more selective in what it classifies as a threat.

jeremy · November 22, 2017, 9:19pm

Try using the over-sampling I suggested and then try rescaling the probabilities by a few different values to find the best amount. Hopefully it’ll be a little improvement.
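
A minimal sketch of that experiment, under assumptions not stated in the thread: the rescaling is applied to the odds of the positive class (so results stay between 0 and 1), candidates are compared by log loss on a held-out validation set, and the labels, predictions, and helper names (`rescale`, `best_factor`) are hypothetical.

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy of predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def rescale(p, factor):
    """Multiply the odds of the positive class by `factor` (assumed scheme)."""
    odds = p / (1 - p) * factor
    return odds / (1 + odds)

def best_factor(y_true, probs, factors):
    """Return the candidate factor with the lowest validation log loss."""
    return min(
        factors,
        key=lambda f: log_loss(y_true, [rescale(p, f) for p in probs]),
    )

# Hypothetical validation set: 10% positives, predictions inflated by over-sampling.
y_val = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
p_val = [0.5, 0.4, 0.6, 0.3, 0.5, 0.4, 0.5, 0.6, 0.4, 0.9]

factor = best_factor(y_val, p_val, [1.0, 0.5, 0.25, 0.1])
print("best factor:", factor)
```

Whichever metric the competition uses could be swapped in for `log_loss`; the point is only that the factor is chosen on data the model was not trained on.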

jeremy · November 22, 2017, 9:28pm

Elfayoumi · November 23, 2017, 12:32am

Hello,
After updating fastai, I am getting an error. Nothing changed except pulling the new code…

ImportError                               Traceback (most recent call last)
in <module>()
----> 1 from fastai.structured import *
      2 from fastai.column_data import *
      3 np.set_printoptions(threshold=50, edgeitems=20)
      4
      5 PATH='data/rossman/'

~/workspace/fastai/courses/dl1/fastai/structured.py in <module>()
----> 1 from .imports import *
      2
      3 from sklearn_pandas import DataFrameMapper
      4 from sklearn.preprocessing import LabelEncoder, Imputer, StandardScaler
      5 from pandas.api.types import is_string_dtype, is_numeric_dtype

~/workspace/fastai/courses/dl1/fastai/imports.py in <module>()
      2 import PIL, os, numpy as np, math, collections, threading, json, bcolz, random, scipy, cv2
      3 import random, pandas as pd, pickle, sys, itertools, string, sys, re, datetime, time
----> 4 import seaborn as sns, matplotlib
      5 import IPython, graphviz, sklearn_pandas, sklearn, warnings
      6 from abc import abstractmethod

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/seaborn/__init__.py in <module>()
      8 from .palettes import *
      9 from .regression import *
---> 10 from .categorical import *
     11 from .distributions import *
     12 from .timeseries import *

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/seaborn/categorical.py in <module>()
     15
     16 from . import utils
---> 17 from .utils import iqr, categorical_order, remove_na
     18 from .algorithms import bootstrap
     19 from .palettes import color_palette, husl_palette, light_palette, dark_palette

ImportError: cannot import name 'remove_na'

jeremy · November 23, 2017, 1:35am

It’s saying you’ve got a problem in the seaborn package. Try updating it.

Elfayoumi · November 23, 2017, 2:29am

Thanks Jeremy, it works now…

arjunrajkumar · November 23, 2017, 3:26am

Hey… Had a question.

How do you decide how many fully connected layers to add? For example, in the dog breeds case we could have just ONE fully connected layer that converts the input to the number of classes (i.e. 120) instead of two… Another related question: when we add more than one additional layer, we can specify how big we want each layer to be. Is there a method for choosing how big a layer should be? How do we arrive at this number?

jeremy · November 23, 2017, 4:16am

Adding two layers to CNNs for images pretty much always works for me. I haven’t had much if any need to change the number or size of layers when fine tuning a network from the fastai defaults. If anyone finds situations that need to be much different, let me know!

For structured data, I have much less confidence in knowing the right answer to this question. I still have to experiment quite a bit! But the amounts shown in Rossmann are a good start generally.

groverpr · November 24, 2017, 1:20am

Can you explain why, for binary variables, we don’t want an embedding and instead use the variable as continuous?

–edit–
Got the answer: embeddings don’t seem to improve anything for binary variables (as Jeremy answered).

sermakarevich · November 24, 2017, 9:59am

Maybe this article can help. On page 144, in section 4.1 “Prior Correction”, you can find how to change the intercept in logistic regression so that the original probabilities can be restored after sampling. The formula can be transformed mathematically for use with any model (not only logit).
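
As a rough illustration of how such a correction can be applied to the output of any model: the sketch below shifts the log-odds of a predicted probability by the log of the ratio between the class odds in the resampled training set and the class odds in the original data. The function name `prior_correct` and the example rates are hypothetical, and the code assumes the model’s probabilities are otherwise well calibrated on the resampled data.

```python
import math

def prior_correct(p, sample_rate, true_rate):
    """Map a probability learned at `sample_rate` positives back to `true_rate`.

    sample_rate: fraction of positives in the resampled training set
    true_rate:   fraction of positives in the original population
    """
    logit = math.log(p / (1 - p))
    # Offset introduced by resampling: log of (sampled odds / true odds).
    offset = math.log(
        ((1 - true_rate) / true_rate) * (sample_rate / (1 - sample_rate))
    )
    return 1 / (1 + math.exp(-(logit - offset)))

# Hypothetical rates: positives over-sampled from 10% to 50% of the training set.
corrected = prior_correct(0.5, sample_rate=0.5, true_rate=0.1)
print(round(corrected, 4))
```

When `sample_rate` equals `true_rate` the offset is zero and the probability is returned unchanged, which is a quick sanity check on the direction of the shift.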

ange · November 24, 2017, 3:44pm

Hi,

Does anyone know how the data is supposed to be set up for the courses/dl1/lesson3-rossman notebook?

nbuser@jupyter:~/fastai/courses/dl1/data/rossman$
nbuser@jupyter:~/fastai/courses/dl1/data/rossman$ ls
googletrend      rossmann.tgz           state_names.csv  store_states.csv  train.csv  weather.csv
googletrend.csv  sample_submission.csv  store.csv        test.csv          weather
nbuser@jupyter:~/fastai/courses/dl1/data/rossman$ ls -al
total 48076
drwxr-xr-x 4 nbuser nbuser     6144 Nov 24 15:30 .
drwxr-xr-x 3 nbuser nbuser     6144 Nov 24 15:13 ..
drwxr-xr-x 2 nbuser nbuser     6144 Nov 24 15:29 googletrend
-rw-r--r-- 1 nbuser nbuser    86605 Jan 11  2017 googletrend.csv
-rw-r--r-- 1 nbuser nbuser  7730448 Nov 24 15:14 rossmann.tgz
-rw-r--r-- 1 nbuser nbuser   317611 Sep 29  2015 sample_submission.csv
-rw-r--r-- 1 nbuser nbuser      265 Jan 11  2017 state_names.csv
-rw-r--r-- 1 nbuser nbuser    45010 Sep 29  2015 store.csv
-rw-r--r-- 1 nbuser nbuser     9051 Jan  6  2017 store_states.csv
-rw-r--r-- 1 nbuser nbuser  1427425 Sep 29  2015 test.csv
-rw-r--r-- 1 nbuser nbuser 38057952 Sep 29  2015 train.csv
drwxr-xr-x 2 nbuser nbuser     6144 Nov 24 15:30 weather
-rw-r--r-- 1 nbuser nbuser  1518814 Jan 11  2017 weather.csv
nbuser@jupyter:~/fastai/courses/dl1/data/rossman$

Reading the code for the concat_csvs function, am I supposed to create a directory called googletrend and then copy the googletrend.csv file to it, or do I copy all of the csv files to the googletrend directory?

jeremy · November 24, 2017, 4:21pm

You don’t need to run the commented-out concat_csvs lines. They’ve already been run for you, and googletrend.csv was created from them and is part of what you downloaded.

ange · November 24, 2017, 8:33pm

thanks @jeremy

yinterian · November 25, 2017, 1:32am

It would also depend on how much data you have to learn from.

miguel_perez · November 25, 2017, 4:20pm

Experiencing very frequent freezes and slow response when trying to work with Crestle.

I always git pull to the latest version, but if I understand correctly, in Crestle there is nothing more to do, is that right? (We cannot run conda env update or source activate fastai; it is all taken care of by default, isn’t it?) I connect from Spain (in case location matters) and have quite a good symmetric internet connection.

No clue how to make it run smoother / with fewer freezes. @anurag, if I am skipping some important step, please let me know. (The problem happens with different notebooks, with or without GPU.)

cvgoudar · November 26, 2017, 11:04am

I still see a possibility that an embedding may be relevant for a binary flag, especially if one wants to use the flag in some sort of similarity measure for additional analysis. This may not impact classification results. If a variable doesn’t have predictive power, the binary flag still indicates that its two values are separate classes, but the embedding may indicate they are very similar.

pete.condon · November 26, 2017, 11:45am

I’m pretty sure that mathematically it won’t make any difference for binary flags. The values of the embeddings will be multiplied by a weight in the fully connected layer, so I can’t see how there would be any difference between training the embedding value and training the weight. Although I’d be very happy to be proven wrong.
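
A small numeric check of this argument, assuming a 1-dimensional embedding feeding a single linear unit (all values and function names below are hypothetical): for any embedding values and weight, a weight and bias on the raw 0/1 flag can be chosen that produce exactly the same outputs.

```python
def via_embedding(flag, emb, w, b):
    """Linear unit fed by a 1-dimensional embedding lookup for a 0/1 flag."""
    return w * emb[flag] + b

def via_raw_flag(flag, w, b):
    """Linear unit fed directly by the 0/1 value."""
    return w * flag + b

# Hypothetical trained values for the embedding path.
emb, w, b = [0.3, -1.2], 2.0, 0.5

# Equivalent parameters for the raw-flag path.
w_raw = w * (emb[1] - emb[0])
b_raw = b + w * emb[0]

for flag in (0, 1):
    print(flag, via_embedding(flag, emb, w, b), via_raw_flag(flag, w_raw, b_raw))
```

The same reasoning carries over to wider embeddings, since the layer’s output is still an affine function of the flag: the output for flag 0 acts as the bias and the difference between the two outputs acts as the weight.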

cvgoudar · November 26, 2017, 12:12pm

I wasn’t referring to the impact on classification results, where I agree with you. I am working on something that uses features created from combinations of embeddings for clustering after the prediction is done. If I use the binary flags directly, they indicate that the values are different. However, if they are represented as embeddings, they may not show as much differentiation when the feature doesn’t have predictive power.