SOURCE CODE: Mid-Level API

In today’s session, the last open question was exactly how splits and split_idx work inside Datasets, DataLoader, etc. Specifically, we wanted to figure out how to restrict a Transform to a particular subset of our data; for example, there may be times when you do not want to apply certain augmentations to images in your validation set.
For this, we were looking at the source code for TfmdDL and Datasets and trying to understand how it’s all working. That’s exactly where we stopped and decided to continue the next day.

But I did some digging afterwards and wrote a blog post about dealing with that exact issue, along with a few more subtleties of spreading your transforms across your Datasets splits.

Here it is: Using separate Transforms for your training and validation sets in fastai2

This blog is a bit technical and goes slightly towards the advanced side of things. Also, it’s kind of rough at this point simply because I wanted to finish it off quickly :sweat_smile:. But I guess it gets the point across.

Any feedback is very much appreciated!
(I’ll also add it to the wiki)


By the way, since this came up, here is the order in which everything is applied in the API (from the DataBlock); a quick sketch follows the list:

  1. get_items
  2. Splits
  3. type_tfms (PILImage.create / Categorize)
  4. item_tfms
  5. batch_tfms
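To make that order concrete, here’s a rough sketch of a typical image DataBlock with each argument mapped to a step above (the path, transforms and folder layout are just illustrative assumptions, not anything specific we discussed):

'''
from fastai2.vision.all import *

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),      # 3. type_tfms: PILImage.create / Categorize
    get_items=get_image_files,               # 1. get_items: collect the raw items
    splitter=RandomSplitter(valid_pct=0.2),  # 2. splits: assign items to train/valid
    get_y=parent_label,                      # feeds the label type_tfms (Categorize)
    item_tfms=Resize(224),                   # 4. item_tfms: run per item, on the CPU
    batch_tfms=aug_transforms(),             # 5. batch_tfms: run per batch, usually on the GPU
)
dls = dblock.dataloaders(path)  # `path` assumed to be an ImageNet-style folder of images
'''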

We’ll meet again at 10 AM IST.


Could you guys upload yesterday’s recordings?

Meeting URL? Should I set up a Zoom call? (with a 40-minute limit)

New Link: https://zoom.us/j/290640150?pwd=N25BKzJCc2VGMzVMQTk0bFNYV24wUT09

How long before he uploads the videos and the notebooks?

Hi all,

I just finished uploading the videos, here they are:

  1. DataBlock Overview, Categorize block and CategoryMap
  2. Datasets, DataLoaders and TfmdDL source code walkthrough
  3. Complete Pipelines and detailed Datasets
  4. TfmdLists complete walkthrough

In these videos we start looking into the source code in Vim and also run our own experiments in Jupyter notebooks. We try to ask questions in the videos and also answer them.

The past two days have been some of the most fruitful in terms of learning about fastai. In particular, I want to thank @akashpalrecha, @init_27 and @barnacl for being on a call with me for pretty much all of yesterday and getting unstuck on things together.

As of today, the study group has grown, and we have decided to hold weekly sessions every Saturday at 7:30 am IST, plus half of Sunday, to suit everyone.

We will spend the other days (Mon-Fri) working on our personal blogs/experiments/projects that use the library, and on the weekends dig deep into the library itself.

As a rough plan, starting next week we will look into the Learner source code and also much of the optimizers, callbacks, etc. It only gets more interesting from here: the above four videos set the base, and we can now move towards implementing new deep learning research papers and loss functions, and customizing the API to meet our needs. The more interesting bits lie ahead of us :slight_smile:


After analyzing split_idx in depth, don’t you feel it’s an awkward bit of the design? We are requiring a Transform, which is the most basic building block, to be aware of an implementation detail of Datasets, which is what holds the splits. So it’s a backward reference in our layered API.

I’d rather have the Datasets tell the loaders which transforms to use. Maybe just have the transforms expose whether they are meant to be used at training time, through a field or a decorator, and then the pipelines could filter them accordingly. Likewise, RandomTransforms could have both randomized and deterministic behaviour, with the decision of which to use made higher up the chain.

I admit I don’t understand all the details here, but it would be cool if we can come up with a refactoring that makes it cleaner and more elegant.


Just spent my Sunday watching your 4 videos! Amazing stuff, I really like you guys’ approach =) Thank you for doing this =)


After looking into it carefully, what I’ve found is that I basically only need to set a Transform’s split_idx attribute to decide what split of my data it will be applied to. I think that as far as the API is concerned, this is actually simple and easy to understand for most people at a high level.
For the internal implementation, I’m not sure I entirely understand what you mean.

What you’re saying here is kind of what’s actually going on under the hood.
I could say my_transform.split_idx = 0 and that would mean that any Pipeline that uses this instance of the transform will know that it only needs to be applied to the training set.
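To make that concrete, here’s a minimal sketch (AddNoise is just a made-up transform, not something from the library):

'''
from fastai2.vision.all import *

class AddNoise(Transform):
    "Made-up example: add a little Gaussian noise to an image tensor"
    def encodes(self, x: TensorImage):
        x = x.float()
        return x + 0.1 * torch.randn_like(x)

noise = AddNoise()
noise.split_idx = 0      # 0 -> apply only to the training split
# noise.split_idx = 1    # 1 -> apply only to the validation split
# noise.split_idx = None # None (the default) -> apply to every split
'''

Any Pipeline containing this instance will then skip it whenever the pipeline is run with a different split_idx.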

What irks me is that it leaks an implementation detail from Datasets into Transform; the transform is otherwise not required to be aware of all the machinery that surrounds it, it just needs to know how to transform the data it’s given. My suggestion was to have something like a train_only tag you could attach to a transform instead. It would work in pretty much the same way but look cleaner.
Anyway, I understand it’s a minor detail, but we can always strive to make the code more elegant, right? :slight_smile:


Yes, I’d agree that the actual implementation is a bit tricky and isn’t immediately apparent.
I guess what you describe could simply be implemented by adding train_only and valid_only properties to the Transform class which, when set to True, give the desired functionality. But then you run into problems with this once you have more than two splits in your datasets. I guess what’s needed is a very thin wrapper over the whole split_idx and splits interface that makes the functionality apparent in plain English.
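Roughly something like this, just as a sketch of the idea (none of it exists in the library, and as said it only covers the usual two-split case):

'''
from fastai2.vision.all import *

class TaggedTransform(Transform):
    "Sketch: hide split_idx behind plainer train_only / valid_only flags"
    @property
    def train_only(self): return self.split_idx == 0
    @train_only.setter
    def train_only(self, v): self.split_idx = 0 if v else None

    @property
    def valid_only(self): return self.split_idx == 1
    @valid_only.setter
    def valid_only(self, v): self.split_idx = 1 if v else None
'''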

I was thinking of tagging the pipelines as well, so by default the first two would be tagged train and valid, but you could change that if you need to and extend it to as many splits as you want.

Thanks for the videos. I’ve learned a lot watching them!

I am trying to build an AudioBlock and audio Transforms to read and augment audio files and convert them to spectrograms, so that I can then use a cnn_learner with resnet models for classification. (I know that there is a fastai2_audio library, but I want to understand data blocks :))

What works:

'''
import numpy as np
import librosa
from fastai2.vision.all import *

class AudioTransform(Transform):
    "Load a wav file and turn it into a mel-spectrogram image"
    def encodes(self, o):
        y, sr = librosa.load(o)
        y = librosa.util.fix_length(y, int(sr*0.75))   # trim/pad to 0.75 seconds
        y = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, n_fft=1024, hop_length=140)
        y = librosa.power_to_db(y, ref=np.max)         # convert power to dB
        y = y - y.min()                                # shift values into a positive range
        y = np.flip(y, axis=0)                         # put low frequencies at the bottom
        return PILImage.create(y)

def AudioBlock():
    return TransformBlock(type_tfms=AudioTransform(), batch_tfms=IntToFloatTensor)

dblocks = DataBlock(blocks=(AudioBlock, CategoryBlock),
                    get_items=get_files,
                    splitter=RandomSplitter(seed=42),
                    get_y=parent_label,
                    item_tfms=None)

dls = dblocks.dataloaders(path)
'''

Now I’m trying to use different transforms to:

  1. read the wav file
  2. augment e.g. add noise
  3. convert to PILImage

What’s the right approach to achieve that? I tried to put #1 as type_tfms and then write my own Transforms for #2 and #3 and add them to item_tfms. To test the approach I’m just using #1 (type_tfms) and #3 (item_tfms), but I get an error message.

'''
class AudioToImage(Transform):
    def encodes(self, o):
        print('audio to image', o.shape)
        return PILImage.create(o)

class AudioTransform(Transform):
    def encodes(self, o):
        y, sr = librosa.load(o)
        y = librosa.util.fix_length(y, int(sr*0.75))
        y = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, n_fft=1024, hop_length=140)
        y = librosa.power_to_db(y, ref=np.max)
        y = y - y.min()
        y = np.flip(y, axis=0)
        return np.uint8(y)

def AudioBlock():
    return TransformBlock(type_tfms=AudioTransform(), batch_tfms=IntToFloatTensor)

dblocks = DataBlock(blocks=(AudioBlock, CategoryBlock),
                    get_items=get_files,
                    splitter=RandomSplitter(seed=42),
                    get_y=parent_label,
                    item_tfms=[AudioToImage])

dls = dblocks.dataloaders(path)
'''

Error:
audio to image (128, 119)
audio to image torch.Size([])
Could not do one pass in your dataloader, there is something wrong in it

What am I doing wrong?

Today at 7:30 pm IST, @barnacl, @init_27, @akashpalrecha and I will get together again to look at Callbacks - particularly the implementation of MixUp - and try to implement a new data augmentation callback of our own and contribute it to the library: FMix or CutMix.

You are invited :slight_smile:

@init_27 can you please setup the Zoom call? :smiley:


Sure thing!

[Link Removed] for 1st April, 7:30 PM IST.

Note: This meeting will be recorded.


So, based on how pipelines are implemented inside the library right now, this might be a bit problematic. As it stands, the same Pipeline is applied to all of your sets, but every time you access the data loader for a different split of your data, that same pipeline gets a different split_idx value, which is what makes the transforms aware of which split of your data they are working on.
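To see that in action, here’s a tiny sketch (the transform and the numeric items are made up purely for illustration):

'''
from fastai2.vision.all import *

class TrainOnlyAdd(Transform):
    split_idx = 0                        # only run when the pipeline's split_idx is 0
    def encodes(self, x): return x + 100

items  = list(range(10))
splits = RandomSplitter(valid_pct=0.2, seed=42)(items)
dsets  = Datasets(items, tfms=[[TrainOnlyAdd()]], splits=splits)

dsets.train[0]   # TrainOnlyAdd is applied: this subset runs its pipeline with split_idx=0
dsets.valid[0]   # TrainOnlyAdd is skipped: here the pipeline runs with split_idx=1
'''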


I guess you should have your AudioToImage transform in the type transforms. Or at least add a type annotation so that it only gets applied to your audio items, because right now it is also getting applied to your CategoryBlock items, and that’s the problem.
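For the annotation route, something like this should do it (a sketch, not tested on your data): the spectrogram coming out of AudioTransform is a NumPy array while the label is a tensor, so annotating encodes with np.ndarray makes the dispatch machinery skip the labels:

'''
class AudioToImage(Transform):
    # same imports as your code above; the np.ndarray annotation means this encodes
    # is only dispatched on the spectrogram arrays, so the Category labels pass through
    def encodes(self, o: np.ndarray):
        return PILImage.create(o)
'''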


Hi all,
Did you guys meet today? Are you planning another call in the near future? Thank you so much for doing this, it is really useful, and I have learnt a lot; I’m also looking forward to your great blog posts. And @arora_aman, your project to integrate PyTorch in a smoother way sounds amazing. Thanks!
