Lesson 4 - Official Topic

Really quick: these lines of code

pets1 = DataBlock(blocks = (ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
pets1.summary(path/"images")

obviously return a RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension… Is this done intentionally? I didn’t catch that from Jeremy. Thank you!

Perhaps the implication is that you’ll have a dataset that can at least be loaded, and therefore trained on. It can’t be so corrupted that you can’t create a DataBlock or DataLoader! But afterwards, the model (which can be initialized and run for one epoch very quickly) can help you diagnose exactly what sort of data cleaning you may want to do.
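For example, something like this rough sketch (assuming a working `pets` DataBlock with item_tfms set, as in the notebook) surfaces the worst samples after a single quick epoch:

from fastai.vision.all import *
from fastai.vision.widgets import ImageClassifierCleaner

# Train briefly, then let the model point at likely data problems.
dls = pets.dataloaders(path/"images")
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9)                 # the samples the model struggles with most
cleaner = ImageClassifierCleaner(learn)   # widget to relabel or delete those samples
cleaner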

Did anyone get the image cleanser widgets to work in Paperspace?

Yes, it is done intentionally.
It is there to show that Resize is missing, i.e. you are trying to stack images of different sizes together in a single batch.
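A minimal sketch of the fix (the name pets2 and the 224 size below are just examples):

from fastai.vision.all import *

# Same DataBlock, but with an item_tfms resize so every image in a
# batch ends up the same shape and can be collated into one tensor.
pets2 = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(seed=42),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                  item_tfms=Resize(224))
pets2.summary(path/"images")  # now collates a batch without error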

I think there is an assumption that the majority of the data is good (maybe 80%+).

While running

pets1 = DataBlock(blocks = (ImageBlock, CategoryBlock),
                 get_items=get_image_files, 
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
pets1.summary(path/"images")

I got this error:

Setting-up type transforms pipelines
Collecting items from /home/tyoc213/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize

Building one sample
  Pipeline: PILBase.create
    starting from
      /home/tyoc213/.fastai/data/oxford-iiit-pet/images/Russian_Blue_212.jpg
    applying PILBase.create gives
      PILImage mode=RGB size=500x334
  Pipeline: partial -> Categorize
    starting from
      /home/tyoc213/.fastai/data/oxford-iiit-pet/images/Russian_Blue_212.jpg
    applying partial gives
      Russian_Blue
    applying Categorize gives
      TensorCategory(9)

Final sample: (PILImage mode=RGB size=500x334, TensorCategory(9))


Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor

Building one batch
Applying item_tfms to the first sample:
  Pipeline: ToTensor
    starting from
      (PILImage mode=RGB size=500x334, TensorCategory(9))
    applying ToTensor gives
      (TensorImage of size 3x334x500, TensorCategory(9))

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch
Error! It's not possible to collate your items in a batch
Could not collate the 0-th members of your tuples because got the following shapes
torch.Size([3, 334, 500]),torch.Size([3, 300, 239]),torch.Size([3, 332, 500]),torch.Size([3, 333, 500])

These aren’t really probabilities though, are they? Wouldn’t we need to calibrate our predictions to get this? (i.e. something like this: https://www.r-bloggers.com/calibration-affirmation/)

This is intentional.

If all probabilities sum to 1, is there a case where having too many categories makes resolving between top categories difficult?

What could be an alternative to softmax when we have images that don’t contain any of our classes?

On the topic of visualising NN layers: I just discovered OpenAI has released “OpenAI Microscope”:

Microscope systematically visualizes every neuron in several commonly studied vision models, and makes all of those neurons linkable.

Softmax won’t be useful for this. We will look at a good activation function in the next lesson (spoiler alert: it’s sigmoid for each activation).
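To make that concrete, here is a tiny sketch (the activations are made up) of why per-class sigmoids behave better than softmax when an image matches none of the classes:

import torch

# Hypothetical final-layer activations for a 3-class model shown an
# out-of-domain image: all three activations are low.
acts = torch.tensor([-2.0, -1.5, -1.8])

print(torch.softmax(acts, dim=0))  # ~[0.26, 0.43, 0.32]: forced to sum to 1, one class still "wins"
print(torch.sigmoid(acts))         # ~[0.12, 0.18, 0.14]: each class can independently be unlikely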

Good point.
They actually can be interpreted as probabilities with reasonable confidence.
The reason is that we are optimizing cross-entropy loss, which under the hood is regressing on probabilities.
Classifiers which do that (such as logistic regression) produce well calibrated outputs by construction.
Calibration is mostly needed for algorithms which DON’T optimize for probabilities directly in the loss, such as tree ensembles, e.g. random forests.
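For what it’s worth, you can check in PyTorch that cross-entropy is exactly the negative log of the softmax output for the true class, which is the sense in which the model is trained on probabilities directly (random logits here, just for illustration):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 37)            # e.g. a batch of 4 over the 37 pet breeds
targets = torch.tensor([0, 9, 3, 21])

# Cross-entropy == negative log-likelihood of the softmax probabilities.
ce  = F.cross_entropy(logits, targets)
nll = F.nll_loss(torch.log_softmax(logits, dim=1), targets)
assert torch.allclose(ce, nll)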

Will the Fast AI NLP course be redone sometime? Would love to join 🙂

IMO they are very up to date; I believe these were taught at the university and later released as a MOOC.

There are also active weekly study groups around the course materials.

There are no plans to redo the NLP course at this time, but it is less than a year old and all freely available online: https://www.fast.ai/2019/07/08/fastai-nlp/

Why does the loss function need to be negative?

1 vote to redo the NLP course soon 🙂

Thanks for sharing!!

Can you explain how the likelihood comes into play? I always thought likelihoods were related to fitting distributions.

modification → multiplication
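In case a numeric sketch helps (the probabilities are made up): the likelihood of the data is the product of the probabilities the model assigns to the correct labels, and taking logs turns that multiplication into a sum of numbers that are all ≤ 0 (since each probability is ≤ 1), so we negate to get a positive loss to minimize:

import torch

# Hypothetical probabilities assigned to the true class of 3 samples.
probs = torch.tensor([0.9, 0.8, 0.7])

likelihood = probs.prod()        # 0.504: product of per-sample probabilities
log_lik    = probs.log().sum()   # ~ -0.685: multiplication becomes addition, but negative
nll        = -log_lik            # ~  0.685: the negative log likelihood we minimize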