Lesson 4 - Official Topic

Really quick: these lines of code

pets1 = DataBlock(blocks = (ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
pets1.summary(path/"images")

obviously return a RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension… Is this done intentionally? I didn’t catch that from Jeremy. Thank you!

Perhaps the implication is that you’ll have a dataset that can at least be loaded, and therefore trained on. It can’t be so corrupted that you can’t create a DataBlock or DataLoader! But afterwards, the model (which can be initialized and run for one epoch very quickly) can help you diagnose exactly what sort of data cleaning you may want to do.
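For example, something like this rough sketch (assuming a working `pets` DataBlock with item_tfms set, as in the notebook) surfaces the worst samples after a single quick epoch:

from fastai.vision.all import *
from fastai.vision.widgets import ImageClassifierCleaner

# Train briefly, then let the model point at likely data problems.
dls = pets.dataloaders(path/"images")
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9)                 # the samples the model struggles with most
cleaner = ImageClassifierCleaner(learn)   # widget to relabel or delete those samples
cleaner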

Did anyone get the image cleanser widgets to work in Paperspace?

Yes, it is done intentionally.
It is there to show that Resize is missing, i.e. you are trying to stack images of different sizes together in a single batch.
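A minimal sketch of the fix (the name pets2 and the 224 size below are just examples):

from fastai.vision.all import *

# Same DataBlock, but with an item_tfms resize so every image in a
# batch ends up the same shape and can be collated into one tensor.
pets2 = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(seed=42),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                  item_tfms=Resize(224))
pets2.summary(path/"images")  # now collates a batch without error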

I think there is an assumption that the majority of the data is good (maybe 80%+).

While running

pets1 = DataBlock(blocks = (ImageBlock, CategoryBlock),
                 get_items=get_image_files, 
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
pets1.summary(path/"images")

I got this error:

Setting-up type transforms pipelines
Collecting items from /home/tyoc213/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize

Building one sample
  Pipeline: PILBase.create
    starting from
      /home/tyoc213/.fastai/data/oxford-iiit-pet/images/Russian_Blue_212.jpg
    applying PILBase.create gives
      PILImage mode=RGB size=500x334
  Pipeline: partial -> Categorize
    starting from
      /home/tyoc213/.fastai/data/oxford-iiit-pet/images/Russian_Blue_212.jpg
    applying partial gives
      Russian_Blue
    applying Categorize gives
      TensorCategory(9)

Final sample: (PILImage mode=RGB size=500x334, TensorCategory(9))


Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor

Building one batch
Applying item_tfms to the first sample:
  Pipeline: ToTensor
    starting from
      (PILImage mode=RGB size=500x334, TensorCategory(9))
    applying ToTensor gives
      (TensorImage of size 3x334x500, TensorCategory(9))

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch
Error! It's not possible to collate your items in a batch
Could not collate the 0-th members of your tuples because got the following shapes
torch.Size([3, 334, 500]),torch.Size([3, 300, 239]),torch.Size([3, 332, 500]),torch.Size([3, 333, 500])

These aren’t really probabilities though, are they? Wouldn’t we need to calibrate our predictions to get this? (i.e. something like this: https://www.r-bloggers.com/calibration-affirmation/)

This is intentional.

If all probabilities sum to 1, is there a case where having too many categories makes resolving between top categories difficult?

What could be an alternative to softmax when we have images that don’t contain any of our classes?

On the topic of visualising NN layers: I just discovered OpenAI has released “OpenAI Microscope”:

Microscope systematically visualizes every neuron in several commonly studied vision models, and makes all of those neurons linkable.

Softmax won’t be useful for this. We will look at a good activation function in the next lesson (spoiler alert: it’s sigmoid for each activation).
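To make that concrete, here is a tiny sketch (the activations are made up) of why per-class sigmoids behave better than softmax when an image matches none of the classes:

import torch

# Hypothetical final-layer activations for a 3-class model shown an
# out-of-domain image: all three activations are low.
acts = torch.tensor([-2.0, -1.5, -1.8])

print(torch.softmax(acts, dim=0))  # ~[0.26, 0.43, 0.32]: forced to sum to 1, one class still "wins"
print(torch.sigmoid(acts))         # ~[0.12, 0.18, 0.14]: each class can independently be unlikely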

Good point.
They actually can be interpreted as probabilities with reasonable confidence.
The reason is that we are optimizing cross-entropy loss, which under the hood is regressing on probabilities.
Classifiers which do that (such as logistic regression) produce well calibrated outputs by construction.
Calibration is mostly needed for algorithms which DON’T optimize for probabilities directly in the loss, such as tree ensembles, e.g. random forests.
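For what it’s worth, you can check in PyTorch that cross-entropy is exactly the negative log of the softmax output for the true class, which is the sense in which the model is trained on probabilities directly (random logits here, just for illustration):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 37)            # e.g. a batch of 4 over the 37 pet breeds
targets = torch.tensor([0, 9, 3, 21])

# Cross-entropy == negative log-likelihood of the softmax probabilities.
ce  = F.cross_entropy(logits, targets)
nll = F.nll_loss(torch.log_softmax(logits, dim=1), targets)
assert torch.allclose(ce, nll)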

Will the Fast AI NLP course be redone sometime? Would love to join 🙂

IMO they are very up to date; I believe these were taught at the university and later released as a MOOC.

There are also active weekly study groups around the course materials.

There are no plans to redo the NLP course at this time, but it is less than a year old and all freely available online: https://www.fast.ai/2019/07/08/fastai-nlp/

Why does the loss function need to be negative?

1 vote to redo the NLP course soon 🙂

Thanks for sharing!!

Can you explain how the likelihood comes into play? I always thought likelihoods were related to fitting distributions.

modification → multiplication
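In case a numeric sketch helps (the probabilities are made up): the likelihood of the data is the product of the probabilities the model assigns to the correct labels, and taking logs turns that multiplication into a sum of numbers that are all ≤ 0 (since each probability is ≤ 1), so we negate to get a positive loss to minimize:

import torch

# Hypothetical probabilities assigned to the true class of 3 samples.
probs = torch.tensor([0.9, 0.8, 0.7])

likelihood = probs.prod()        # 0.504: product of per-sample probabilities
log_lik    = probs.log().sum()   # ~ -0.685: multiplication becomes addition, but negative
nll        = -log_lik            # ~  0.685: the negative log likelihood we minimize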