Fastbook Chapter 6 questionnaire solutions (wiki)

Here are the questions

  1. How could multi-label classification improve the usability of the bear classifier?

It would allow the classifier to report that no bear is present. A multi-class classification model, by contrast, will always predict one of the bear classes even when no bear is in the image (unless a separate "no bear" class is explicitly added).

  1. How do we encode the dependent variable in a multi-label classification problem?

This is encoded as a one-hot encoded vector. Essentially, this means we have a zero vector whose length equals the number of classes, with ones at the indices of the classes that are present in the item.
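
As a rough illustration of that encoding, here is a minimal sketch in plain Python. The `encode_labels` helper and the three-class bear vocabulary are made up for this example and are not part of fastai.

```python
def encode_labels(labels, vocab):
    """Return a 0/1 list with a 1 at the index of every label that is present."""
    present = set(labels)
    return [1 if name in present else 0 for name in vocab]

# Hypothetical vocabulary for the bear classifier.
vocab = ["black", "grizzly", "teddy"]

print(encode_labels(["grizzly"], vocab))          # a single label
print(encode_labels(["black", "teddy"], vocab))   # two labels at once
print(encode_labels([], vocab))                   # no bear in the image
```

Note that the last case, an all-zero vector, is exactly what a single class index cannot express.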

  1. How do you access the rows and columns of a DataFrame as if it was a matrix?

You can use .iloc. For example, df.iloc[10,10] will select the element at row position 10 and column position 10 (the 11th row and 11th column, since positions are zero-based), as if the DataFrame were a matrix.

  1. How do you get a column by name from a DataFrame?

This is very simple. You can just index it! Ex: df['column_name']
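
To make the last two answers concrete, here is a small sketch with pandas. The toy table is invented for the example (it is only loosely shaped like the labels CSV used in the chapter), so treat the column names and values as assumptions.

```python
import pandas as pd

# Hypothetical toy table; the values are made up for illustration.
df = pd.DataFrame({
    "fname": ["a.jpg", "b.jpg", "c.jpg"],
    "labels": ["chair", "car person", "horse"],
    "is_valid": [True, False, True],
})

first_row = df.iloc[0]       # row at position 0, returned as a Series
first_col = df.iloc[:, 0]    # every row, column at position 0
cell = df.iloc[1, 1]         # position-based: second row, second column
by_name = df["fname"]        # column selected by name

print(cell)
print(by_name.tolist())
```

Positions passed to .iloc start at 0, so df.iloc[1, 1] is the second row and second column.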

  1. What is the difference between a Dataset and a DataLoader?

A Dataset is a collection which returns a tuple of your independent and dependent variable for a single item. A DataLoader builds on a Dataset. It is an iterator which provides a stream of mini-batches, where each mini-batch is a tuple of a batch of independent variables and a batch of dependent variables.
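
The difference can be sketched in plain Python. This is only an illustration of the idea, not how PyTorch or fastai implement it: the list-based dataset and the `simple_dataloader` helper are hypothetical.

```python
# A minimal dataset: indexable, returns one (independent, dependent) tuple per item.
dataset = [(i, i * 2) for i in range(10)]

def simple_dataloader(ds, batch_size):
    """Yield mini-batches as (batch_of_x, batch_of_y) tuples. Hypothetical helper."""
    for start in range(0, len(ds), batch_size):
        items = ds[start:start + batch_size]
        xs, ys = zip(*items)          # turn a list of pairs into two tuples
        yield list(xs), list(ys)

batches = list(simple_dataloader(dataset, batch_size=4))
print(dataset[3])     # a single item
print(batches[0])     # a mini-batch
```

A real DataLoader also handles shuffling, collating items into tensors and parallel loading, which this sketch leaves out.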

  1. What does a Datasets object normally contain?

A training set and validation set.

  1. What does a DataLoaders object normally contain?

A training dataloader and validation dataloader.

  1. What does lambda do in Python?

Lambdas are a shortcut for writing one-line anonymous functions. They are great for quick prototyping and iterating, but because lambdas cannot be pickled (serialised), they cannot be used when the Learner needs to be exported for deployment and production.
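
The serialisation point can be checked with Python's standard pickle module. As far as I understand, pickle stores a function as a reference (module plus name) rather than its code, and a lambda has no name that can be looked up again. The `can_pickle` helper below is made up for this sketch.

```python
import math
import pickle

def can_pickle(obj):
    """Return True if pickle can serialise obj, False if it raises."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

print(can_pickle(lambda x: x * 2))   # False: the lambda has no importable name
print(can_pickle(math.sqrt))         # True: stored as a reference to math.sqrt
```

A function written with def at the top level of an importable module is stored by reference in the same way as math.sqrt, which is why replacing a lambda with a named function is the usual fix before exporting.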

  1. What are the methods to customise how the independent and dependent variables are created with the data block API?

get_x and get_y

  • get_x is used to specify how the independent variables are created.
  • get_y is used to specify how the datapoints are labelled
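
As a small sketch of what such functions can look like, here are a get_x and get_y that read from one row of a labels table. The row is a hypothetical dict standing in for a DataFrame row, and the column names are assumptions.

```python
# Hypothetical row, standing in for one row of a labels DataFrame.
row = {"fname": "000005.jpg", "labels": "chair person"}

def get_x(r):
    """Independent variable: here, just the file name."""
    return r["fname"]

def get_y(r):
    """Dependent variable: split the space-separated string into a list of labels."""
    return r["labels"].split(" ")

print(get_x(row), get_y(row))
```

These would then be passed to the data block as DataBlock(get_x=get_x, get_y=get_y, ...).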

  1. Why is softmax not an appropriate output activation function when using a one hot encoded target?

Softmax wants to make the model predict only a single class, which may not be true in a multi-label classification problem. In multi-label classification problems, the input data could have multiple labels or even no labels.
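
A quick way to see this is to compare the two activations on the same numbers. The sketch below uses plain Python and made-up activations for an image that contains none of the classes.

```python
import math

def softmax(acts):
    """Exponentiate and normalise so that the outputs sum to 1."""
    biggest = max(acts)
    exps = [math.exp(a - biggest) for a in acts]   # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(a):
    """Map a single activation to (0, 1), independently of the others."""
    return 1 / (1 + math.exp(-a))

# Hypothetical final-layer activations: every class looks unlikely.
acts = [-3.0, -2.5, -4.0]

print(softmax(acts))                 # sums to 1, so one class still looks likely
print([sigmoid(a) for a in acts])    # each value stays small on its own
```

With softmax, one class ends up above 0.5 even though all activations are strongly negative. With sigmoid, every label stays below 0.5, so the model can say "none of these".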

  1. Why is nll_loss not an appropriate loss function when using a one hot encoded target?

Again, nll_loss assumes that each item has exactly one correct class (its target is a single class index), which is not the case here.

  1. What is the difference between nn.BCELoss and nn.BCEWithLogitsLoss?

nn.BCELoss does not include the initial sigmoid. It assumes that the appropriate activation function (i.e. the sigmoid) has already been applied to the predictions. nn.BCEWithLogitsLoss, on the other hand, applies both the sigmoid and binary cross-entropy in a single function.
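
The relationship between the two can be sketched in plain Python. These are hand-written stand-ins, not PyTorch's code: the function names, logits and targets are made up, and the real nn.BCEWithLogitsLoss combines the two steps in a more numerically stable way.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def bce(probs, targets):
    """Mean binary cross-entropy; expects probabilities (sigmoid already applied)."""
    losses = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
              for p, t in zip(probs, targets)]
    return sum(losses) / len(losses)

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy; expects raw activations and applies sigmoid itself."""
    return bce([sigmoid(x) for x in logits], targets)

logits = [2.0, -1.0, 0.5]     # hypothetical raw model outputs
targets = [1, 0, 1]           # multi-label target for three classes

a = bce([sigmoid(x) for x in logits], targets)   # sigmoid applied by the caller
b = bce_with_logits(logits, targets)             # sigmoid applied inside the loss
print(a, b)
```

Both calls give the same loss. The only difference is who is responsible for applying the sigmoid.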

  1. Why can’t we use regular accuracy in a multi-label problem?

The regular accuracy function assumes that the final model-predicted class is the one with the highest activation. However, in multi-label problems, there can be multiple labels. Therefore, a threshold needs to be set on the activations to choose the final predicted classes, which are then compared to the target classes.
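
Here is a minimal sketch of a thresholded accuracy, modelled on the accuracy_multi idea from the chapter but written with plain lists instead of tensors. The activations and targets are made up.

```python
import math

def accuracy_multi(activations, targets, thresh=0.5):
    """Fraction of individual label predictions that match the 0/1 targets."""
    correct = 0
    total = 0
    for acts_row, targ_row in zip(activations, targets):
        for act, targ in zip(acts_row, targ_row):
            prob = 1 / (1 + math.exp(-act))    # sigmoid of the raw activation
            pred = prob > thresh               # label counts as present above thresh
            correct += int(pred == bool(targ))
            total += 1
    return correct / total

acts = [[2.0, -1.0, 0.3], [-2.0, 1.5, -0.2]]   # hypothetical raw activations
targs = [[1, 0, 0], [0, 1, 0]]

print(accuracy_multi(acts, targs))               # default threshold of 0.5
print(accuracy_multi(acts, targs, thresh=0.6))   # a slightly stricter threshold
```

The result changes with the threshold, which is why the threshold is a hyper-parameter worth tuning.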

  1. When is it okay to tune a hyper-parameter on the validation set?

It is okay to do so when the relationship between the hyper-parameter and the metric being observed is smooth. With such a smooth relationship, we would not be picking an inappropriate outlier.

  1. How is y_range implemented in fastai? (See if you can implement it yourself and test it without peeking!)

y_range is implemented using sigmoid_range in fastai.

def sigmoid_range(x, lo, hi): return x.sigmoid() * (hi-lo) + lo
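
To test the idea without needing tensors, here is a scalar version of the same formula in plain Python. The name `sigmoid_range_scalar` is made up for this sketch.

```python
import math

def sigmoid_range_scalar(x, lo, hi):
    """Scalar version of sigmoid_range: squash x into the open interval (lo, hi)."""
    return 1 / (1 + math.exp(-x)) * (hi - lo) + lo

for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid_range_scalar(x, -1, 1))
```

Large negative inputs approach lo, large positive inputs approach hi, and 0 maps to the midpoint of the range.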

  1. What is a regression problem? What loss function should you use for such a problem?

In a regression problem, the dependent variable or labels we are trying to predict are continuous values. For such problems, the mean squared error loss function is typically used.
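
For reference, mean squared error is simple enough to write by hand. This is a plain-Python sketch with made-up numbers, not the library implementation.

```python
def mse(preds, targets):
    """Mean squared error between two equal-length sequences of numbers."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

print(mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))
```

Squaring the differences penalises large errors more heavily than small ones.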

  1. What do you need to do to make sure the fastai library applies the same data augmentation to your input images and your target point coordinates?

You need to use the correct block for the targets, which in this case is PointBlock. With PointBlock, fastai knows that the labels are coordinates and automatically applies the same data augmentation to the target points as to the input images.


@muellerzr Please wiki-fy! :slight_smile:


I got a bit confused about softmax vs sigmoid when building the loss function and question 10 is exactly about this:

  1. Why is softmax not an appropriate output activation function when using a one hot encoded target?

As far as I understand (and the answer points in this direction too), there is nothing wrong in using softmax for one-hot encoded data. The problem arises when one wants to use softmax on multi-label data, since this kind of data allows for more than one label, and therefore forcing the activations to sum to 1 would be a huge limitation.

Do you guys have any comment? I found the text in the book a bit misleading/confusing …

If you have no label for a given input, then softmax will still want to pick one. Softmax takes into account the other activations in the final layer, which is useful when you know there is one and only one label. Sigmoid just minds its own business by mapping each activation to a value between 0 and 1 independently from the other activations. This way you basically get “Yes” or “No” (depending on your threshold) for each label.

Indeed, the problem is with multi-label data (e.g. examples with no label, or examples with multiple labels), not with one-hot encoding per se. I do not see a problem in using softmax with one-hot encoding for a multi-class problem (say MNIST).

There is none, but using a one-hot encoded target is less memory efficient than using a class index. That a one-hot encoded target works makes sense when you think about the following cross-entropy formulation:

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$

p(x) will be 0 for every class except for the target, where it will be 1, yielding the final result -log(q(x)) for the target class, which is what we want.
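
This is easy to check numerically. In the sketch below, q is a made-up vector of predicted probabilities and the target is written both as a one-hot vector and as an index.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log(q(x)), skipping terms where p(x) is 0."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

q = [0.1, 0.7, 0.2]          # hypothetical predicted probabilities
p_one_hot = [0, 1, 0]        # one-hot target
target_index = 1             # the same target as an integer index

print(cross_entropy(p_one_hot, q))
print(-math.log(q[target_index]))
```

Both lines print the same value, so the index form carries the same information while storing a single integer per item.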

In Keras there are two different cross-entropy loss functions: sparse_categorical_crossentropy works with an integer class index, and categorical_crossentropy works with a one-hot encoded target.

I don’t know if this answers your question.

Thanks a lot! Technical reasons (e.g. memory usage) are totally legit reasons for discouraging something.
Thanks a lot! :slightly_smiling_face:

Can anyone explain the bit about why lambdas are not serialisable? Some context about what exactly the exported pickle file contains, and why lambdas can’t be included in it, would help. Thanks!

very useful!! :grinning: