Really quick: these lines of code

```python
pets1 = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(seed=42),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
pets1.summary(path/"images")
```

obviously return `RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension...`. Is this done intentionally? I didn't catch that from Jeremy. Thank you
Perhaps the implication is that you'll have a dataset that can be loaded and thus trained on. It can't be so corrupted that you can't create a DataBlock or DataLoader! But afterwards, the model (which can be initialized and run for one epoch super quickly) can help diagnose exactly what sort of data cleaning you may want to do.
Did anyone get the image cleanser widgets to work in paperspace?
Yes, it is done intentionally. It is there to show what happens when `Resize` is missing: you are trying to stack images of different sizes together into a single batch.
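A minimal sketch of the fix, assuming the same pets `DataBlock` from the question (the target size of 224 is just an illustrative choice, not something prescribed in the thread):

```python
from fastai.vision.all import *

# Same DataBlock as before, but with an item transform that resizes
# every image to a common size, so samples can be stacked into a batch.
pets2 = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(seed=42),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                  item_tfms=Resize(224))
pets2.summary(path/"images")  # the batch-collation step now succeeds
```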
I think there is an assumption that the majority of the data is good (maybe 80%+).
While running

```python
pets1 = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(seed=42),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
pets1.summary(path/"images")
```

I got this error:
```text
Setting-up type transforms pipelines
Collecting items from /home/tyoc213/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize

Building one sample
  Pipeline: PILBase.create
    starting from
      /home/tyoc213/.fastai/data/oxford-iiit-pet/images/Russian_Blue_212.jpg
    applying PILBase.create gives
      PILImage mode=RGB size=500x334
  Pipeline: partial -> Categorize
    starting from
      /home/tyoc213/.fastai/data/oxford-iiit-pet/images/Russian_Blue_212.jpg
    applying partial gives
      Russian_Blue
    applying Categorize gives
      TensorCategory(9)

Final sample: (PILImage mode=RGB size=500x334, TensorCategory(9))

Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline:
Setting up after_batch: Pipeline: IntToFloatTensor

Building one batch
Applying item_tfms to the first sample:
  Pipeline: ToTensor
    starting from
      (PILImage mode=RGB size=500x334, TensorCategory(9))
    applying ToTensor gives
      (TensorImage of size 3x334x500, TensorCategory(9))

Adding the next 3 samples
No before_batch transform to apply
Collating items in a batch
Error! It's not possible to collate your items in a batch
Could not collate the 0-th members of your tuples because got the following shapes
torch.Size([3, 334, 500]),torch.Size([3, 300, 239]),torch.Size([3, 332, 500]),torch.Size([3, 333, 500])
```
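What the collate step is doing can be sketched with plain NumPy (the shapes are taken from the error output; the 224×224 target is just an illustrative stand-in for what an item transform like `Resize` would produce):

```python
import numpy as np

# Simulated image tensors with the (channels, height, width) shapes
# reported in the summary -- heights and widths differ across samples.
imgs = [np.zeros((3, 334, 500)), np.zeros((3, 300, 239)),
        np.zeros((3, 332, 500)), np.zeros((3, 333, 500))]

# Collating a batch is essentially a stack, which fails on mismatched shapes.
try:
    batch = np.stack(imgs)
except ValueError as e:
    print("collate failed:", e)

# Once an item transform has resized every image to a common size,
# stacking into a single batch tensor works.
resized = [np.zeros((3, 224, 224)) for _ in imgs]
batch = np.stack(resized)
print(batch.shape)  # (4, 3, 224, 224)
```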
These aren’t really probabilities though, are they? Wouldn’t we need to calibrate our predictions to get this? (i.e. something like this: https://www.r-bloggers.com/calibration-affirmation/)
This is intentional
If all probabilities sum to 1, is there a case where having too many categories makes resolving between top categories difficult?
What could be an alternative to SoftMax when we have images without any of our classes?
On the topic of visualising NN layers: just discovered OpenAI has released "OpenAI Microscope":

> Microscope systematically visualizes every neuron in several commonly studied vision models, and makes all of those neurons linkable.
Softmax won’t be useful for this. We will look at a good activation function in the next lesson (spoiler alert: it’s sigmoid for each activation).
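To make the contrast concrete, here is a small NumPy sketch (the logits are made up for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Logits for an image that matches none of the three classes well.
logits = np.array([-2.0, -1.5, -2.5])

# Softmax is forced to spread probability 1.0 across the classes,
# so one class still comes out looking like a winner.
print(softmax(logits))   # always sums to 1, regardless of the logits

# Per-class sigmoids are independent: they can all be low at once,
# which lets the model signal "none of the above".
print(sigmoid(logits))   # each value is below 0.5 here
```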
Good point.
They can actually be interpreted as probabilities with reasonable confidence. The reason is that we are optimizing cross-entropy loss, which under the hood directly penalizes miscalibrated probability estimates. Classifiers that do this (such as logistic regression) produce fairly well-calibrated outputs by construction. Calibration is mostly needed for algorithms that DON'T optimize for probabilities directly in the loss, such as tree ensembles, e.g. random forests.
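A tiny sketch of why that holds, with made-up binary labels: minimizing cross-entropy over a single logit by plain gradient descent drives the predicted probability to the empirical frequency of the positives, which is exactly what "well calibrated" means here.

```python
import numpy as np

# Binary labels where 30% of the samples are positive.
y = np.array([1] * 3 + [0] * 7, dtype=float)

# Minimize mean cross-entropy over one shared logit with gradient descent.
logit = 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-logit))
    # For sigmoid + cross-entropy, d(mean CE)/d(logit) = mean(p - y).
    logit -= 1.0 * (p - y).mean()

p = 1 / (1 + np.exp(-logit))
print(round(p, 3))  # 0.3 -- the empirical frequency of the labels
```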
Will the Fast AI NLP course be redone sometime? Would love to join
IMO they are very up to date. I believe these were taught at the university and later released as a MOOC.
There are also active weekly study groups around the course materials.
There are no plans to redo the NLP course at this time, but it is less than a year old and all freely available online: https://www.fast.ai/2019/07/08/fastai-nlp/
Why does the loss function need to be negative?
1 vote to redo NLP course soon
Thanks for sharing!!
Can you explain how the likelihood comes into play? I always thought likelihoods were related to fitting distributions.
modification → multiplication
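For context on both questions above, a small sketch with made-up per-sample probabilities: the likelihood of the data is the product (multiplication) of the probabilities the model assigns to each true label; taking the log turns that product into a sum, and since each log of a probability ≤ 1 is negative, we negate the result to get a positive quantity to minimize, the negative log likelihood:

```python
import numpy as np

# Made-up probabilities the model assigns to the true class of 4 samples.
p = np.array([0.9, 0.8, 0.6, 0.95])

likelihood = np.prod(p)      # product (multiplication) of per-sample probs
log_lik = np.log(p).sum()    # log turns the product into a sum
nll = -log_lik               # logs of probs <= 1 are negative, so negate

print(np.isclose(-np.log(likelihood), nll))  # True
```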