Fastai v2 chat

Pytorch 1.3.1 has been released, I can confirm the vision issue has been fixed on Colab

2 Likes

Question:

I was looking into TfmdList and above is something that I did not expect to happen.

From my understanding, calling setup in TfmdList calls setup on self.tfms passing in self.train, which is what I have tried to replicate in the image.

I am not sure why I get different vocabs, what am I doing wrong please?

1 Like

Never mind for now :slight_smile:

Different dsrc objects passed to Categorize lead to this inconsistency.

I will keep looking and see why that happens.

# tl dsrc
TfmdList: [PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/basset_hound_111.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/Siamese_178.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/keeshond_34.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/german_shorthaired_94.jpg')]
tfms - (#1) [Transform: True (object,object) -> RegexLabeller ]


# pipe dsrc
TfmdList: [PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/basset_hound_111.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/Siamese_178.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/keeshond_34.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/german_shorthaired_94.jpg')]
tfms - (#2) [Transform: True (object,object) -> RegexLabeller ,Categorize: True (object,object) -> encodes (object,object) -> decodes]
1 Like

bcolz is a great choice. You can also create iterable datasets in fastai v2. Have a look at the notebook where DataLoader is defined to see various approaches to that. Since I haven’t actually had a need to use them myself yet, they’re not tested in a practical setting - so do let us know if you try them out and have any issues.
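For what it's worth, an iterable-style dataset is just an object that implements `__iter__` rather than random access via `__getitem__`/`__len__` (this mirrors the idea behind `torch.utils.data.IterableDataset`). A pure-Python sketch, with `StreamDataset` being a made-up name:

```python
class StreamDataset:
    """Iterable-style dataset: yields samples on demand instead of
    supporting indexed access. Useful when data is streamed or too
    large to index up front."""
    def __init__(self, source):
        self.source = source

    def __iter__(self):
        for x in self.source:
            yield x * 2  # any on-the-fly transform goes here

ds = StreamDataset(range(4))
print(list(ds))  # -> [0, 2, 4, 6]
```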

It’s a pleasure! :slight_smile:

Somehow, I haven’t been able to replicate the setup method of TfmdList; could I please double-check my understanding of the code?

Referring to the image:
The first step is getattr(self, 'train', self), which means we get the train subset where i=0; therefore, in step-2, we get splits[0] and return the items in that split.

Then, in step-2 itself, we call _new, which takes us to step-4 (skipping step-3, as that is only _get to get items, defined in L), where we call super()._new(items, tfms=self.tfms, do_setup=False, **kwargs).

This takes us to step-5, because _new is defined inside L, and all it does is return a new object of type TfmdList with the items whose idx is in splits[0] and do_setup=False. So we get a new TfmdList with the same tfms.

This takes us back to step-6, because now we have self.train, which is a TfmdList with the items in splits[0].
In step-6, we call self.tfms.setup(), passing in this very self.train object.

Since self.tfms is a Pipeline, we now get to step-7, which finally performs setup on the individual Transforms, which for the pets tutorial’s y variable are [RegexLabeller(pat), Categorize].

This Pipeline object self is nothing but TfmdList.tfms or TfmdList.train.tfms, right? The items arg that we have passed to step-7 is the train TfmdList with the same tfms as the original TfmdList.

Therefore, this items obj (which is nothing but the original TfmdList’s train subset) passed to Pipeline.setup looks something like:

TfmdList: [PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/basset_hound_111.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/Siamese_178.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/keeshond_34.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/german_shorthaired_94.jpg')]
tfms - (#2) [Transform: True (object,object) -> RegexLabeller ,Categorize: True (object,object) -> encodes (object,object) -> decodes]

If originally,

items
>> (#5) [/home/ubuntu/.fastai/data/oxford-iiit-pet/images/keeshond_34.jpg,/home/ubuntu/.fastai/data/oxford-iiit-pet/images/Siamese_178.jpg,/home/ubuntu/.fastai/data/oxford-iiit-pet/images/german_shorthaired_94.jpg,/home/ubuntu/.fastai/data/oxford-iiit-pet/images/Abyssinian_92.jpg,/home/ubuntu/.fastai/data/oxford-iiit-pet/images/basset_hound_111.jpg]

splits
>> ((#4) [4,1,0,2], (#1) [3])

tfms
>> [<local.data.transforms.RegexLabeller at 0x7f49c026fd30>,
 local.data.transforms.Categorize]

This is where it gets a little confusing.

When we do self.fs.clear(), the items are now:

ipdb> items
TfmdList: [PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/basset_hound_111.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/Siamese_178.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/keeshond_34.jpg'), PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/german_shorthaired_94.jpg')]
tfms - (#0) []

The tfms are cleared on items, which was the train subset of the original TfmdList!!

Why would this be?

My understanding:

  1. A TfmdList and a TfmdList.subset(0) share the same tfms, stored in both objects’ self.tfms
  2. This self.tfms is a Pipeline that we call setup on, thus when we do self.fs.clear(), it clears the Transforms on both TfmdList and TfmdList.subset(0).
  3. Since this TfmdList.subset(0) is what gets passed as items, the tfms get cleared when we do self.fs.clear().
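Point 1 above is plain Python aliasing; a minimal sketch of the effect (not fastai code, Holder is a made-up stand-in):

```python
# Minimal sketch: two objects whose .tfms attribute is the
# very same list object (stored by reference, not copied).
class Holder:
    def __init__(self, tfms):
        self.tfms = tfms

shared = ["RegexLabeller", "Categorize"]
full, subset = Holder(shared), Holder(shared)

# Clearing through one object empties it for both, because both
# attributes point at the same underlying list.
full.tfms.clear()
print(subset.tfms)  # -> []
```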

Following on the above understanding, therefore, when I do:

_tl = TfmdList(items[splits[0]], tfms=[RegexLabeller(pat)], do_setup=False)
pipe = Pipeline(tfms, as_item=True)
pipe.setup(_tl)
pipe.vocab

>> (#4) [Siamese,basset_hound,german_shorthaired,keeshond]

It works :slight_smile:

1 Like

Can I get some advice with installing v2? On my Windows 10 machine I’m doing:

git clone git@github.com:fastai/fastai_dev.git
cd .\fastai_dev\
conda env create -f environment.yml
conda activate fastai_dev
pip install git+https://github.com/fastai/fastai_dev

then running python from the shell where I get the following error

>>> from fastai2.basics import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'fastai2'

Does anyone have any advice?

I had a thought and I want your opinion before I try to do anything large like this. We have ClassificationInterpretation, but why not just an Interpreter? Don’t limit it to classification; take regression, for example. We could plot the worst guesses (for image points), or for tabular, cluster how close the guesses were to the correct point. We could also implement permutation importance for tabular here, to make it easier to analyze tabular models. In terms of NLP, I need to revisit the NLP course again, but I imagine something along the lines of word clustering and how close we came for our language models. I can try to come up with something for the tabular or vision models if you want a visualization (which I know always helps).

Let me know your thoughts on this and if you know of any other regression-based analysis techniques that should be included in the above. :slight_smile:
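For reference, permutation importance itself is simple to sketch: shuffle one column and measure the drop in score. This is a generic illustration, not fastai code; all names below are made up:

```python
import random

def permutation_importance(score_fn, X, y, col, n_repeats=5, seed=0):
    """Average drop in score when column `col` of X (a list of rows)
    is randomly shuffled. A larger drop means the column matters more."""
    rng = random.Random(seed)
    base = score_fn(X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[:] for row in X]      # copy rows, leave X intact
        vals = [row[col] for row in shuffled]
        rng.shuffle(vals)                     # permute just this column
        for row, v in zip(shuffled, vals):
            row[col] = v
        drops.append(base - score_fn(shuffled, y))
    return sum(drops) / n_repeats

# Toy data: y is predicted by the sign of column 0; column 1 is constant noise.
X = [[1, 9], [-1, 9], [2, 9], [-2, 9]]
y = [1, 0, 1, 0]

def acc(X, y):
    return sum((row[0] > 0) == bool(t) for row, t in zip(X, y)) / len(y)

print(permutation_importance(acc, X, y, col=0))  # informative column
print(permutation_importance(acc, X, y, col=1))  # constant column: 0.0
```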

1 Like

Thought I might mention that SegmentationInterpretation and TextClassificationInterpretation classes already exist. I am not sure if it makes sense to have a general Interpreter class, as each task will have different interpretation methods; I think that’s why the current approach has been to have separate classes for separate tasks.

1 Like

I suppose that makes a lot more sense and sounds better :slight_smile: I was thinking of something along the lines of (in terms of code) a generic that can pick up what type is being passed in and apply particular functions to it, hence the Interpreter class.

1 Like

Sure, I guess it wouldn’t be hard to do something like that with type dispatching (IIRC that’s the correct term). However, it’s still hard for a user because there aren’t any set functions; there are different interpretation functions for each task.
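If it helps, the type-dispatching idea can be sketched with the standard library’s functools.singledispatch (the interpretation types here are hypothetical, just for illustration):

```python
from functools import singledispatch

# Hypothetical prediction-result types, one per task.
class VisionPreds: ...
class TabularPreds: ...

@singledispatch
def interpret(preds):
    raise NotImplementedError(f"no interpretation for {type(preds).__name__}")

@interpret.register
def _(preds: VisionPreds):
    return "plot worst-guess images"

@interpret.register
def _(preds: TabularPreds):
    return "permutation importance"

print(interpret(VisionPreds()))   # -> plot worst-guess images
print(interpret(TabularPreds()))  # -> permutation importance
```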

Thanks for the check! (We just learned about type dispatching in my Intermediate class a few weeks ago, forgot the proper term for a moment).

That makes sense, so perhaps then maybe TextInterpretation, VisionInterpretation, and TabularInterpretation? (If these regression/language inferences were added)

Again, there could be various NLP, vision, or tabular tasks, each with their own interpretation methods. However, I don’t want to be too negative, and I will let Jeremy and Sylvain decide what the best approach is here.

1 Like

You’re certainly not being too negative :slight_smile: Constructive criticism is always welcome :slight_smile: For now, I’ll wait for Jeremy or Sylvain, and mull over which specific implementation ideas could convert over to one of the existing interps easily (a feature importance or feature visualization could easily be done IMO for tabular at least)

1 Like

There is already an interpretation class, and a plot_top_losses method that has a type-dispatched counterpart (like show_batch and show_results). For now it only handles image and text classification but more visualization cases can be added with the type-dispatch.

3 Likes

Got it! I’ll look into that more. Thanks @sgugger :slight_smile:

Just a very quick code question for anyone please:

 dls = [dl_type(self.subset(i), bs=b, shuffle=s, drop_last=s, n=n if i==0 else None, **kwargs, **dk)
                for i,(b,s,dk) in enumerate(zip(bss,shuffles,dl_kwargs))]

Inside FilteredBase’s databunch method, regarding the above code: does this mean that we pass None to dl_type when i != 0?

Seems like we pass None:

[(i, b) if i==0 else None for i,b in enumerate([1,5])]

>> [(0, 1), None]

Follow-up question then: how does the second dl get its dataset? :o

My bad :slight_smile:

The if-else condition applies only to n=n if i==0 else None; I wasn’t aware we could have if-else inside function args too.

Something new to learn everyday :slight_smile:
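For anyone else who hits this, a tiny standalone example of a conditional expression inside a function argument (make_dl is a made-up function, not fastai’s):

```python
# The conditional expression `100 if i == 0 else None` is evaluated
# first, and its result becomes the value of the keyword argument n.
def make_dl(dataset, n=None):
    return (dataset, n)

dls = [make_dl(f"subset{i}", n=100 if i == 0 else None) for i in range(2)]
print(dls)  # -> [('subset0', 100), ('subset1', None)]
```

So each dl still receives its dataset; only n differs between splits.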

1 Like

I am a bit confused about Swift for TensorFlow. Will fastai v2 be plain Python, or will it shift to Swift/TF?

It’s in Python. S4TF is a separate project.

This is called the “ternary operator”: https://book.pythontips.com/en/latest/ternary_operators.html

2 Likes