MNIST sample - data block creation issue

andrew77 · June 24, 2019, 2:27pm

I managed to download the data. There is a ‘train’ and a ‘valid’ directory, and of course there is a labels.csv to label the images.

I copied this from somewhere and I’m not sure if I did it correctly. I think the data block API is pretty confusing. How do I learn it intuitively?

Also, how do I input the labels? Thanks

data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=64);data

ImageDataBunch;

Train: LabelList (12396 items)
x: ImageList
Image (3, 64, 64),Image (3, 64, 64),Image (3, 64, 64),Image (3, 64, 64),Image (3, 64, 64)
y: CategoryList
7,7,7,7,7
Path: /root/.fastai/data/mnist_sample;

Valid: LabelList (2038 items)
x: ImageList
Image (3, 64, 64),Image (3, 64, 64),Image (3, 64, 64),Image (3, 64, 64),Image (3, 64, 64)
y: CategoryList
7,7,7,7,7
Path: /root/.fastai/data/mnist_sample;

heye0507 · June 24, 2019, 5:50pm

Hi, it seems correct. You can also read the fastai docs; the data block API section explains this with examples.

Another way to learn the data block API is to go through the later lessons. In lesson 5, Jeremy explains how to interpret fastai and PyTorch.

For your question about labels: you have already labeled the data. When you use the default from_folder function, fastai assumes your data is laid out ImageNet-style, where each folder name is a label and all the images in that folder belong to that label.
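The folder-as-label idea can be sketched with the standard library alone. This is only an illustration of the convention, not fastai's implementation; the function name `labels_from_folders` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

def labels_from_folders(root):
    """Map each image path (relative to root) to the name of its parent folder."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): p.parent.name
            for p in sorted(root.rglob("*.png"))}

# Build a tiny throwaway tree shaped like <split>/<label>/<image>.png
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["train/3/a.png", "train/7/b.png", "valid/7/c.png"]:
        f = Path(tmp) / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()
    folder_labels = labels_from_folders(tmp)

print(folder_labels)
```

Every file under a folder named 7 gets the label 7, which is why the output in the first post already shows a CategoryList without any explicit labelling step.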

If you want to control how the inputs are labeled, you probably want to check the from_csv method, where you can pass a CSV file of input/label pairs so fastai can read them accordingly.

I rarely use the from_df method. When I have a DataFrame, I prefer to turn it into a CSV file so that later on I can reuse and check the data-processing part.
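To make the input/label pairing concrete, here is a minimal standard-library sketch of reading such a file. The column names, file names, and label values are assumptions chosen for illustration, and `labels_from_csv` is a hypothetical helper, not a fastai function.

```python
import csv
import io

# Hypothetical contents of a labels file: one "name,label" row per image.
csv_text = """name,label
train/3/a.png,0
train/7/b.png,1
"""

def labels_from_csv(lines):
    """Return {image name: label} from rows with 'name' and 'label' columns."""
    return {row["name"]: row["label"] for row in csv.DictReader(lines)}

csv_labels = labels_from_csv(io.StringIO(csv_text))
print(csv_labels)
```

The point of the CSV route is that the label no longer has to match the folder name: here an image stored under a folder called 3 carries the label 0.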

Hope this makes sense

andrew77 · June 25, 2019, 9:49am

Yeah, the name of the subdirectory is the label. So now I’m good to do modelling?

I’m kinda experimenting with fast.ai/PyTorch. Oh, Jeremy will discuss this in more detail in lesson 5? I’m only at lesson 3.

BTW, I thought the method we use depends on how images are prepared.

Is the ImageNet style of data preparation a ‘gold standard’?

Ok thanks.

LaurentH · June 25, 2019, 10:24am

> Yeah the name of the sub directory has been labelled. so now I’m good to do modelling?

Likely, yeah. We’re not seeing all of your code so it’s a bit hard for us to say.

> Is the ImageNet style of data preparation a ‘gold standard’?

Not necessarily, but it’s definitely a common one. As you toy with more data sets you’ll get a feeling for what’s common and what isn’t.

> I think the datablock is pretty confusing how do I learn it intuitively?

I would recommend reading the Data Block docs, but don’t read all of it! Just read the “Step” parts (i.e. focus on what Step 1 is, what Step 2 is, etc.). The details don’t matter too much.
If you only look at the steps, you’ll notice that the API is always the same:

1. Where is the data? How can fastai grab it?
2. What kind of data is it? How should fastai handle it?
3. How should fastai split the data (if at all)?
4. Is there anything else you want fastai to do with the data (maybe transform it)?

The Data Block docs are pretty large; I struggled with them too. Just try to focus on only the top-level steps described and you should get a better intuitive feeling. Once you have that down, you’re ready to dig more into the details.
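The four questions above can be walked through with a toy pipeline in plain Python. This is a sketch of the idea only: the file names are hypothetical, `resize` is a stand-in that records a target size rather than touching pixels, and none of this is fastai's own code.

```python
from pathlib import PurePosixPath

# Hypothetical file names laid out as <split>/<label>/<file>.
files = ["train/3/a.png", "train/7/b.png", "valid/3/c.png", "valid/7/d.png"]

# 1. Where is the data? Here it is just a list of relative paths.
items = [PurePosixPath(f) for f in files]

# 2. What kind of data is it? Images whose label is the parent folder name.
labelled = [(p, p.parent.name) for p in items]

# 3. How should it be split? By the top-level folder (train vs. valid).
train = [(p, y) for p, y in labelled if p.parts[0] == "train"]
valid = [(p, y) for p, y in labelled if p.parts[0] == "valid"]

# 4. Anything else? Apply a transform; this stand-in only records the size.
def resize(path, size=64):
    return {"path": str(path), "size": (size, size)}

train = [(resize(p), y) for p, y in train]
valid = [(resize(p), y) for p, y in valid]

print(len(train), len(valid))
print(train[0])
```

Whatever the data set, the same four decisions come up in the same order, which is what makes the API feel repetitive once the steps are familiar.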

AjayStark · July 3, 2019, 7:14am

Hi, I’m on lesson 7 (MNIST) and I have a few doubts.

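from torch import nn
from fastai.layers import conv_layer  # fastai v1

class resblock(nn.Module):
    def __init__(self, nf):
        super().__init__()
        self.conv1 = conv_layer(nf, nf)
        self.conv2 = conv_layer(nf, nf)

    # skip connection: add the input to the output of the two convolutions
    def forward(self, x): return x + self.conv2(self.conv1(x))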

In the above resblock, the arguments for conv_layer are both nf. But shouldn’t they be ni and nf? When we used conv_layer without the resblock, both ni and nf were passed to conv_layer.

And in the return statement, why is only a single item ‘x’ passed to conv1 and not a batch ‘xb’?
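One way to reason about both questions is to track tensor shapes. The sum x + conv2(conv1(x)) is elementwise, so the convolutions have to return the same shape they receive, which in general means the same number of channels in and out (nf to nf). The name x is just a parameter name: the module is called with a whole mini-batch of shape (batch, channels, height, width). The sketch below does the shape arithmetic in plain Python, assuming a 3x3 kernel, stride 1, and padding 1; the helper names and the example sizes are hypothetical.

```python
def conv_out_shape(in_shape, nf, ks=3, stride=1, padding=1):
    """Output shape of a 2D convolution for input (batch, channels, height, width)."""
    bs, _, h, w = in_shape

    def out(n):
        # standard convolution output-size formula
        return (n + 2 * padding - ks) // stride + 1

    return (bs, nf, out(h), out(w))

def resblock_shapes(in_shape, nf):
    """Shapes of x and conv2(conv1(x)) for two stacked convolutions with nf filters."""
    return in_shape, conv_out_shape(conv_out_shape(in_shape, nf), nf)

# x is a whole mini-batch: 64 images, 16 channels, 14x14 pixels.
x_shape, fx_shape = resblock_shapes((64, 16, 14, 14), nf=16)
same = x_shape == fx_shape  # shapes match, so x + f(x) is well defined

# With a different number of filters the two shapes no longer match.
x_shape2, fx_shape2 = resblock_shapes((64, 16, 14, 14), nf=32)
mismatch = x_shape2 != fx_shape2

print(x_shape, fx_shape, same)
print(x_shape2, fx_shape2, mismatch)
```

Changing the number of channels is then left to a separate conv_layer(ni, nf) placed before the block, so the residual block itself can stay shape-preserving.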