Lesson 3 In-Class Discussion ✅

Yes, I forgot to pull.

I am getting this as well. I ran git pull followed by restarting the kernel, but still get the error.

Could someone talk a bit more about the data block philosophy? I’m not quite sure how the blocks are meant to be used. Do they have to be in a certain order? Is there any other library that uses this style of programming I could look at?
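To give a feel for the idea (this is a toy sketch in plain Python, not fastai code, and the class and method names here are made up for illustration): the data block API is a "builder pipeline" where each step returns an object that only exposes the *next* valid step, which is why the order matters.

```python
# Toy sketch of the builder-pipeline idea behind a data block API:
# list the items -> split into train/valid -> attach labels.
# Each step returns a new object exposing only the next step.

class ItemList:
    def __init__(self, items):
        self.items = items

    def split_by_pct(self, valid_pct=0.2):
        # Step 1: decide the train/valid split.
        n_valid = int(len(self.items) * valid_pct)
        return SplitData(self.items[:-n_valid], self.items[-n_valid:])

class SplitData:
    def __init__(self, train, valid):
        self.train, self.valid = train, valid

    def label_with(self, labeler):
        # Step 2: only after splitting do we attach labels.
        return LabeledData([(x, labeler(x)) for x in self.train],
                           [(x, labeler(x)) for x in self.valid])

class LabeledData:
    def __init__(self, train, valid):
        self.train, self.valid = train, valid

# The chain cannot be written out of order: ItemList has no label_with,
# and SplitData has no split_by_pct.
data = (ItemList([f"img_{i}.png" for i in range(10)])
        .split_by_pct(0.2)
        .label_with(lambda name: name.endswith("0.png")))
print(len(data.train), len(data.valid))  # 8 2
```

The real API follows the same shape (create an item list, split it, label it, transform it, turn it into a databunch), so the order is part of the design.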

Looking at the MNIST example at the end of the lesson 1 notebook, I saw that the unzipped version of the file had 3 items in it: labels.csv, a valid folder, and a train folder.

That being said, when creating the ImageDataBunch like this:
data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)

How does the default value of valid_pct = 0.2 come into the picture? Will the ImageDataBunch just have a validation set of 20% of what was in the validation folder or does the function look at all the images in both the valid and train folder in order to make the validation set?
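My understanding (hedged, this is a plain-Python sketch of the behavior, not fastai source) is that a `valid_pct`-style split takes *all* filenames listed in the CSV and holds out a random fraction of them, regardless of which folder each file lives in:

```python
# Sketch of what a valid_pct split does: a random hold-out over all
# filenames from the CSV, ignoring the train/valid folder structure.
import random

def random_split(filenames, valid_pct=0.2, seed=42):
    rng = random.Random(seed)        # fixed seed for reproducibility
    shuffled = filenames[:]
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]   # train, valid

files = ([f"train/{i}.png" for i in range(80)]
         + [f"valid/{i}.png" for i in range(20)])
train, valid = random_split(files, valid_pct=0.2)
print(len(train), len(valid))  # 80 20
```

Note that with this kind of split the resulting validation set can contain files from both the train and valid folders, since the folder names play no role.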

How frequently does the fit / training function use the validation set? Is it once (or multiple times) per epoch?

It’s unclear to me when the fitting process looks at the training set vs. the validation set.

This is a toy example where we put everything together to be able to test efficiently. I don’t recommend using it for a first understanding.
Here, for instance, the file labels.csv has filenames from both train and valid.

It runs a full pass through the validation set after each epoch, so once per epoch. The validation set is used to see whether your model generalizes and to tune hyperparameters; it is not used for updating the weights. The model uses the validation set’s information only indirectly, when the user changes hyperparameters according to the training and validation losses.
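The loop structure described above can be sketched in framework-agnostic plain Python (the `MeanModel` and `sgd_step` names are stand-ins invented for this example): training batches update the weights, then one full validation pass per epoch only *measures* loss.

```python
# Sketch of a fit loop: weights change only during the training phase;
# the validation phase runs exactly once per epoch, metrics only.

def fit(epochs, model, train_batches, valid_batches, update):
    history = []
    for epoch in range(epochs):
        # Training phase: every batch updates the model.
        for xb, yb in train_batches:
            update(model, xb, yb)
        # Validation phase: one full pass, no updates to the model.
        valid_loss = sum(model.loss(xb, yb) for xb, yb in valid_batches)
        history.append(valid_loss / len(valid_batches))
    return history

class MeanModel:
    """Toy stand-in 'model': predicts the mean of the targets it has seen."""
    def __init__(self):
        self.total, self.count = 0.0, 0
    def predict(self, x):
        return self.total / self.count if self.count else 0.0
    def loss(self, xb, yb):
        return abs(self.predict(xb) - yb)

def sgd_step(model, xb, yb):
    # Toy stand-in for a gradient step: absorb the new target.
    model.total += yb
    model.count += 1

train = [(0, 1.0)] * 4
valid = [(0, 1.0)] * 2
print(fit(3, MeanModel(), train, valid, sgd_step))  # [0.0, 0.0, 0.0]
```

The point of the structure is that `model.loss` in the validation phase never calls `update`, so nothing the model learns ever comes from the validation data directly.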

I tried to load some X-ray images (grayscale) using ImageDataBunch. When I try to visualize them with show_batch, they come out in black and white (not grayscale). Is a custom dataset required?

You have loaded the labels for the training set via train_v2.csv. How do you load the labels for the validation and test sets?

Why do we use size 128 for the planet dataset when loading the images?

Does anyone know where I can find the notebook with the example-dataset material? It’s not the same as https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-planet.ipynb

Does the order of things matter for Data Blocks?

More generally, I’d love a run through of how to include a test set and extract not just labels but predictions (for prep to submit for a competition, for example).

If your data are 16-bit grayscale (likely), then you will have to write your own open_image.
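For context, a hedged sketch of the key step such a custom open function would need (shown with plain lists for illustration; real code would build a tensor, and `normalize_16bit` / `to_rgb` are names invented here): divide by 65535 instead of 255 so pixel values land in [0, 1], and replicate the single channel to the three channels a pretrained RGB model expects.

```python
# Sketch of the two adjustments needed for 16-bit grayscale input:
# a 16-bit scale factor, and grayscale -> 3-channel replication.

def normalize_16bit(pixels):
    """pixels: flat list of ints in 0..65535 -> floats in [0, 1]."""
    return [p / 65535 for p in pixels]

def to_rgb(gray):
    """Replicate one grayscale channel into three identical channels."""
    return [gray, gray, gray]

img = normalize_16bit([0, 32768, 65535])
rgb = to_rgb(img)
print(len(rgb), rgb[0][0], rgb[0][-1])  # 3 0.0 1.0
```

Dividing a 16-bit image by 255, as an 8-bit loader would, is what produces the washed-out "black and white" look mentioned earlier in the thread: almost every value saturates past 1.0.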

Can you add a test folder if you created labels from a CSV file?

Can we use fastai for feature extraction?

Looks like it’s here: https://github.com/fastai/fastai/blob/master/docs_src/data_block.ipynb

Worked for me; I’m using the dev version.

Is it possible to do multi-label classification with a model trained on single labels?

Can we use the data block API with a text dataset?