Lesson 4 - Official Topic

I agree with the last comment about ReLU.

I address your first remark in this answer.

Thanks. I believe my comment is indeed consistent with your answer.

If we want to recognize numbers greater than 10, alphanumeric strings, or English words, how do we go about it?

Do we have to train on multiple images of those words, or is there a pre-trained model that can be used?

How about numbers greater than 10 like 1213123123, or alphanumeric strings like ANHRDS1021820?

Regards
Ganesh Bhat

I think what you are looking for is real OCR, which, as far as I understand, is a mix of object detection (finding the regions in an image that contain text) and character recognition (like our MNIST example). You’ll find more information here:

Videos from the NLP course on Regex

video 6 (Rachel)

video 7 (Rachel)

video 9 (Jeremy)

Notebook: https://github.com/fastai/course-nlp/blob/master/4-regex.ipynb


Thanks @florianl.

I was trying to take the MNIST example to the next level, just to understand whether digits from 0 to 9 and letters from A to Z or a to z can be recognized in a similar manner when they form words in images. Once an object is detected (say, a text block), can we use the above logic to do it? Is it less efficient than OCR, or are there more challenges?


I was reading through the statement below, where the augmentations applied to the train and validation sets are different, and wanted to understand: what exactly is different?

Random crop and augment: This is in batch_tfms, so it’s applied to a batch all at once on the GPU, which means it’s fast. On the validation set, only the resize to the final size needed for the model is done here. On the training set, the random crop and any other augmentation is done first.

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                 item_tfms=Resize(460),
                 batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = pets.dataloaders(path/"images")

@ganesh.bhat to understand how it’s applied differently to train vs valid, you should look at each transform’s source code. Each one contains a split_idx. If it’s 0, it’s applied to the training set; if it’s 1, to the validation set. (And if there’s none, it’s applied to both I think? @sgugger is that right)
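For example, here is a minimal sketch (the transform and the noise amount are made up) of setting split_idx on a custom transform:

from fastai.vision.all import *

# split_idx controls which subset a transform runs on:
#   0 -> training set only, 1 -> validation set only,
#   None (the default on Transform) -> both
class AddNoise(Transform):
    split_idx = 0  # applied to the training set only
    def encodes(self, x: TensorImage):
        return x + torch.randn_like(x.float()) * 0.01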

Yes, that’s right.


I understand the split_idx and the train/valid mapping, thanks to the session by @arora_aman.

Let me rephrase my question: on the validation set only the resize to the final size (i.e. 224) is applied, whereas on the training set all the augmentations are applied. Is my understanding correct? If yes, why is it so?

Do we do it because we treat it like a test set and try to predict?

If I wanted to look at the distribution of the dataset, what would be the easiest way to do it?
I couldn’t find any method to show the occurrence of each class in a dataset…

You can take the output of the regex labelling function, which gives the list of all the classes, and convert it into a pandas Series or DataFrame. value_counts() will give you the occurrences.
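For instance, a sketch assuming the pets DataBlock from above, with path pointing at the Oxford-IIIT Pet dataset:

import pandas as pd
from fastai.vision.all import *

# Label every file with the same regex labeller used in the DataBlock,
# then count the occurrences of each class with pandas.
files = get_image_files(path/"images")
labeller = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name')
labels = [labeller(f) for f in files]
print(pd.Series(labels).value_counts())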

Also, dls.train.vocab or dls.valid.vocab has the list of classes (though not their counts), if I am not wrong.

If we pick a different value for the threshold, do we need to adjust the value in the function below too (where the accuracy is tested against the sigmoid of the predictions)? So if we set our thresh to, say, 1.0, does it then follow that:

correct = (preds > 0.7311) == yb, for example, where 0.7311 is sigmoid(1).

def batch_accuracy(xb, yb):
    preds = xb.sigmoid()           # squash raw outputs to (0, 1)
    correct = (preds>0.5) == yb    # compare against the 0.5 threshold
    return correct.float().mean()

There are in fact two thresholds, and my answer in the discussion did not help dispel the ambiguity. I will answer your question in two points.

  1. The threshold that I mentioned in my answer refers to the first training approach described in the lesson, somewhere between

    def linear1(xb): return xb@weights + bias

    and the chapter Sigmoid. In this approach (which is not implemented, by the way, as it is not practical, as Jeremy explains), the output of linear1 is compared against a threshold to predict a category for the sample, then these predictions are compared against the targets to calculate the accuracy of the model, and the idea is to modify the model in order to improve that accuracy. For emphasis, the sigmoid function does not enter the picture at this point, and the threshold plays no role in training the model (again, in this hypothetical approach).
  2. The second point is that the batch_accuracy which you refer to concerns the second training approach. There, the accuracy plays no role in the training of the model, only in its evaluation. In turn, the threshold (here set to 0.5) entering the definition of batch_accuracy only affects the accuracy, and does not enter the training of the model.

In principle this should answer your question. On the other hand I think it might not be a waste of time to revisit the narrative of the lesson.

I am looking at notebook 04_mnist_basics.ipynb and will paraphrase quite a bit of it.

As a machine learning practitioner, you are given a sample and asked to determine whether it is a “7” or a “3”. For your model, you pick weights and bias at random and calculate the output pred of your model for your sample x using the function linear1. If pred > 0.0, then you declare that the sample is a “3”, and otherwise a “7”. Of course, there is no reason that this first iteration would give a good accuracy (which compares the predictions preds of all samples against the targets, their known labels). The idea is to find a better choice for weights and bias so that the predictions made with this new model will give a better accuracy. For emphasis, the predictions must again be compared with the same threshold value of 0.0. The strategy is that iterating this process for sufficiently long will produce a model with optimal accuracy.

In this process, one could have chosen a different value for the threshold thresh, by declaring that a sample is a “3” if pred > thresh, whatever thresh is, as long as this value is fixed throughout the iterative process. Therefore, I won’t specify this threshold in the future; it just needs to be fixed during training.

Note that the output pred of linear1 is not quite the predicted category; that is rather (pred > thresh).float(). That is, a “7” is encoded with a 0 and a “3” with a 1.

To summarize, the first training algorithm searches for weights and bias such that the function linear1 maps, hopefully, most samples which are a “3” to values pred > thresh and the others to values pred <= thresh.

This thresh is the threshold that I was referring to in my initial comment to manavk’s question. Note that at this point there is no sigmoid function.

The problem with the training algorithm described above is that there is no obvious procedure for determining how to change weights and bias in order to improve accuracy. Indeed, since the targets as well as (preds > thresh).float() are either 0. or 1., except in very special circumstances, wiggling weights and bias will not produce any change in the predictions, and therefore none in the accuracy. We need a new approach.
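A tiny illustration of this (the numbers are made up): a small wiggle in the prediction usually leaves the thresholded output, and hence the accuracy, unchanged, so there is no gradient signal to follow.

import torch

thresh = 0.0
pred = torch.tensor(0.8)
print((pred > thresh).float())          # tensor(1.)
print((pred + 1e-4 > thresh).float())   # tensor(1.) -- same prediction, no change in accuracy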

The second training approach brings two important changes. First, instead of computing our prediction in the form of (pred > thresh).float(), giving either 0. or 1. and telling us whether the sample is a “7” or a “3” respectively, we are going to calculate (a number which can be interpreted as) our level of confidence that the sample is a “3”. This number is calculated as sigmoid(pred), where pred is, as before, the output of linear1. Here, sigmoid(pred) is a number between 0. and 1. Because the sigmoid function varies continuously (as opposed to a jump from 0. to 1.), a small change in weights and bias will produce a small change in sigmoid(pred).

But this number sigmoid(pred) does not answer the question “is the sample a “3” or a “7”?” (This, I would say, really is our prediction.) I will intentionally leave this point for later, to clearly emphasize that we don’t need to make predictions (in the form of either 0. or 1.) in order to train the model.

The other important change in the second training approach is that accuracy itself is not used in the training of the model. It is only used to evaluate the model.

Instead, we are going to calculate an error between our output sigmoid(pred) and the target. This is in essence the loss function mnist_loss. The precise definition is not too important for our discussion; the point is that there is no threshold involved.
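For reference, this is essentially the definition from the notebook (torch is already imported there); note that no threshold appears anywhere:

def mnist_loss(predictions, targets):
    # squash raw outputs into (0, 1), then measure how far each
    # confidence is from its target (1 for a "3", 0 for a "7")
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()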

The training in this second approach is essentially contained in the functions calc_grad and train_epoch. Again, no threshold there.

Finally, a prediction needs to be either 0. or 1., which we obtain by comparing sigmoid(pred) against another threshold, say thresh2. In other words, our prediction is (sigmoid(pred) > thresh2).float(). This threshold is chosen to be .5 in batch_accuracy, but it plays a completely different role from thresh in the first training approach. Changing it to a value other than .5 will only change the accuracy of the model, but, to say it one last time, this threshold is not used in training the model.
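To make the distinction concrete, here is batch_accuracy with the evaluation threshold pulled out as a parameter (a sketch; thresh2 is my name, not the notebook’s). Changing thresh2 changes the reported accuracy, never the trained weights:

def batch_accuracy(xb, yb, thresh2=0.5):
    preds = xb.sigmoid()
    correct = (preds > thresh2) == yb   # thresh2 only affects evaluation
    return correct.float().mean()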


@Antoine.C Thank you for taking the time to provide such a detailed answer. It really helped. I’m looking forward to getting some sleep tonight. :)

I am moving here a discussion that started with @muellerzr in another thread, but that I believe can be of general interest for the course. If resolved, I will also add the answer there for completeness. My question was the following:

How do I evaluate the performance of the model on a test dataset after training? My dataset is cats vs dogs, and I have the new images in a folder called test with subfolders cat and dog. I trained my model with a folder called train which had subfolders cat and dog, and I split them 80/20. I treated it as a multi-label problem so that the model returns nothing if it is not confident enough in the prediction.

Dataset -> !wget https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip

pets = DataBlock(blocks = (ImageBlock, MultiCategoryBlock),
                 get_items=get_image_files, 
                 splitter=RandomSplitter(valid_pct=0.2, seed=42),
                 get_y=Pipeline([parent_label, lambda label: [label]]), # little trick to get out of a category a list of them!
                 item_tfms=Resize(460),
                 batch_tfms=[*aug_transforms(size=224), Normalize.from_stats(*imagenet_stats)])
dls = pets.dataloaders(path)

Notice the get_y - this is a nice trick to create a multi-label problem out of single-label parent folders. I find it super useful, so that your model returns an empty list [] when no class reaches the confidence level you specified. More details in the awesome repo from @muellerzr here.
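A quick check of the trick (the path is hypothetical): parent_label extracts the folder name, then the lambda wraps it in a one-element list, which is the shape MultiCategoryBlock expects.

from fastai.vision.all import *

getter = Pipeline([parent_label, lambda label: [label]])
print(getter(Path('train/cat/cat.1.jpg')))  # ['cat']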

Then, after training as usual - I move to the testing part:

test_path = Path('../Cat_Dog_data/test')
item_tfms = [Resize(460), ToTensor()]
batch_tfms = [IntToFloatTensor(), *aug_transforms(size=224), Normalize.from_stats(*imagenet_stats)]
dsets = Datasets(get_image_files(test_path), tfms=[[PILImage.create], [parent_label, MultiCategorize()]])
dls = dsets.dataloaders(after_item=item_tfms, after_batch=batch_tfms, bs=64)

test_dl = dls.test_dl(get_image_files(test_path), with_labels=True)
learn.validate(dl=test_dl)

This gives an error:

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai2/metrics.py in accuracy_multi(inp, targ, thresh, sigmoid)
    166 def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    167     "Compute accuracy when `inp` and `targ` are the same size."
--> 168     inp,targ = flatten_check(inp,targ)
    169     if sigmoid: inp = inp.sigmoid()
    170     return ((inp>thresh)==targ.bool()).float().mean()

I suspect it is because I am not getting the labels right. I should not be doing parent_label, but something similar to what I did in get_y to get back a list rather than a single label. However, I wrote the code in multiple ways and could not get it to work. Can somebody help me debug this issue?

This also made me realize I would like to understand the flow from the image to the dataloader, but I find Datasets as a class very cryptic, so any resources (e.g. videos, blog posts, the book) would also be of great help, for instance to understand the why of the nested brackets [] in tfms.


Can you share the full stack trace so we can see the error message?

Enabling the python debugger is super helpful too (if you haven’t already): just run %pdb before running the cell that gives you an error (pdb cheatsheet here). Then print the size of the tensors that are giving you trouble (inp and targ, I guess); you’ll probably find that one of them is the wrong shape (or is wrapped in a list or tuple or something else unexpected).
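For example (a sketch; it assumes the accuracy_multi frame from your traceback is where you land):

# In a notebook cell, before re-running the failing cell:
%pdb on

# then re-run learn.validate(dl=test_dl); at the (Pdb) prompt, inspect:
#   p inp.shape, targ.shape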

(Also, I could be mistaken as test_dl is a weak point for me, but do you need to recreate your dls when loading the test set? Wouldn’t test_dl = dls.test_dl(get_image_files(test_path), with_labels=True) on your first dls work?)


If you need to adjust where you get your label from, that lives in the type transforms, so you may want to adjust that in your dls before making your test_dl.


Thanks to both - I changed to the pets dataset to make collaboration easier. You can now check (and run) the notebook here. Also, you can download the test folder here.

test_path = Path('test')
item_tfms = [Resize(460), ToTensor()]
batch_tfms = [IntToFloatTensor(), *aug_transforms(size=224), Normalize.from_stats(*imagenet_stats)]
dsets = Datasets(get_image_files(test_path), tfms=[[PILImage.create], [parent_label, MultiCategorize()]])
dls = dsets.dataloaders(after_item=item_tfms, after_batch=batch_tfms, bs=64)


test_dl = dls.test_dl(get_image_files(test_path), with_labels=True)
learn.validate(dl=test_dl)

Error message is attached below:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in one_batch(self, i, b)
    160             if len(self.yb) == 0: return
--> 161             self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
    162             if not self.training: return

24 frames
ValueError: Target size (torch.Size([192])) must be the same as input size (torch.Size([128]))

During handling of the above exception, another exception occurred:

AssertionError                            Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/fastcore/test.py in test(a, b, cmp, cname)
     20     "`assert` that `cmp(a,b)`; display inputs and `cname or cmp.__name__` if it fails"
     21     if cname is None: cname=cmp.__name__
---> 22     assert cmp(a,b),f"{cname}:\n{a}\n{b}"
     23 
     24 # Cell

AssertionError: ==:
128
192

You didn’t allow access to your Colab notebook; change the permissions for the notebook.