Lesson 1 In-Class Discussion ✅

Hi,

I think there may be two possible reasons:

  1. The model has already seen the data several times (8 epochs).
  2. The high learning rate punishes even a minute error heavily, producing large weight updates that destabilize the model, hence the high error value.
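Reason 2 can be illustrated with a toy sketch (an assumption for illustration: plain SGD minimizing f(w) = w², not fastai's actual training loop):

```python
# Toy illustration of how an oversized learning rate makes each
# gradient-descent update overshoot the minimum and diverge.
def run_sgd(lr, steps=10, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w  # gradient of f(w) = w^2 is 2w
    return w

print(abs(run_sgd(0.1)))  # small LR: |w| shrinks toward the minimum at 0
print(abs(run_sgd(1.5)))  # large LR: |w| grows every step, the loss blows up
```

With lr = 1.5 each step multiplies w by (1 - 2·lr) = -2, so the iterate doubles in magnitude every step, which is the same runaway behaviour a too-high learning rate produces in real training.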

If you have resolved the issue by the time you see this, please let me know what the cause was.

I'm new to the course and looking for help with the best way to upload my own image files from my laptop's hard drive to the notebook on Paperspace.

Hi all,

I did two runs of the lesson 1 code: the first time without mounting Google Drive as a permanent disk, the second time with Gdrive mounted as permanent storage.

The Gdrive version's error rate is 1.5% higher than the non-Gdrive version's. The runtime per cycle also goes up from 1:47 to 2:02.

My question is, is there any correlation between disk performance/throughput, training time, and error rate?

In lesson1-pets ipynb, we use two lines to import the modules/libraries:
from fastai.vision import *
from fastai.metrics import error_rate

and for the dataset we use the URLs class, and pass URLs.PETS to untar_data.

Now URLs and untar_data() are defined under fastai.datasets. But we didn’t import anything from the datasets module. Then how come we are able to access the untar_data function and the URLs class?


They might have imported them inside the vision and metrics modules.

Oh yes, that may be the case. Thanks!
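For anyone curious about the mechanics, here is a minimal, fastai-independent sketch of how a star import can re-export names pulled in by another module (the module names below are made up; fastai's actual package layout may differ):

```python
import sys
import types

# Hypothetical stand-in for fastai.datasets: defines untar_data.
datasets = types.ModuleType("datasets")
datasets.untar_data = lambda url: f"extracted {url}"
sys.modules["datasets"] = datasets

# Hypothetical stand-in for fastai.vision: imports untar_data into
# its own namespace, the way a package __init__ often does.
vision = types.ModuleType("vision")
exec("from datasets import untar_data", vision.__dict__)
sys.modules["vision"] = vision

# A star import from "vision" now exposes untar_data too, even though
# we never imported the "datasets" module directly ourselves.
ns = {}
exec("from vision import *", ns)
print(ns["untar_data"]("URLs.PETS"))
```

So `from fastai.vision import *` can make names defined elsewhere visible, as long as the vision module itself imported them.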

Why is the documentation of the ClassificationInterpretation class spread over two pages: https://docs.fast.ai/train and https://docs.fast.ai/vision.learner? When I use doc(ClassificationInterpretation), it directs me to the former link. However, I found some methods, such as plot_top_losses(), in the latter link. Is there any specific reason for this segregation?

I want to use ImageDataBunch.from_lists but I don’t know what path means (in fastai).

For example, I need to write

data = ImageDataBunch.from_lists(path, fnames, labels=labels, ds_tfms=tfms, size=256)

fnames are file names. For example, fnames = array(['/kaggle/input/dogs-vs-cats-redux-kernels-edition/train/cat.0.jpg', '/kaggle/input/dogs-vs-cats-redux-kernels-edition/train/cat.1.jpg', '/kaggle/input/dogs-vs-cats-redux-kernels-edition/train/cat.10.jpg', '/kaggle/input/dogs-vs-cats-redux-kernels-edition/train/cat.100.jpg', …, '/kaggle/input/dogs-vs-cats-redux-kernels-edition/train/dog.9997.jpg', '/kaggle/input/dogs-vs-cats-redux-kernels-edition/train/dog.9998.jpg', '/kaggle/input/dogs-vs-cats-redux-kernels-edition/train/dog.9999.jpg', '/kaggle/input/dogs-vs-cats-redux-kernels-edition/train/train'], dtype='

labels = np.array([(0 if 'cat' in fname else 1) for fname in fnames])

But I don’t know why we need path in data = ImageDataBunch.from_lists(path, fnames, labels=labels, ds_tfms=tfms, size=256).

What do I need to write as path?


Hi ddd777, hope you're having an excellent day!

When I did lesson 1, my understanding was that path referred to the location where my data was stored.

The path variable was created by extracting my data to it. Two other paths were created, one of which is used in creating my DataBunch.

Your path may have to be something like

/kaggle/input/dogs-vs-cats-redux-kernels-edition/train/
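As a rough sketch of how path relates to fnames and labels (the paths below are just the ones from the question, and PurePosixPath stands in for the Path objects fastai actually uses):

```python
from pathlib import PurePosixPath

# Hypothetical image paths taken from the question above.
fnames = [
    PurePosixPath('/kaggle/input/dogs-vs-cats-redux-kernels-edition/train/cat.0.jpg'),
    PurePosixPath('/kaggle/input/dogs-vs-cats-redux-kernels-edition/train/dog.0.jpg'),
]

# path is the common parent directory the DataBunch treats as its root;
# labels can be derived from the file names, as in the original post.
path = fnames[0].parent
labels = [0 if 'cat' in f.name else 1 for f in fnames]

print(path)    # /kaggle/input/dogs-vs-cats-redux-kernels-edition/train
print(labels)  # [0, 1]
```

path is also where fastai saves things like model weights during training, which is another reason the factory methods ask for it.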

Hope this helps mrfabulous1 :smiley::grinning:


Not able to install kaggle using the command ! {sys.executable} -m pip install kaggle --upgrade
Getting:
-bash: {sys.executable}: command not found

Has anyone faced this issue?
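For what it's worth, the `{sys.executable}` substitution in `!` commands is an IPython/Jupyter feature, so it only works inside a notebook cell; plain bash performs no such substitution, which is why it reports `command not found`. A small sketch of the command the notebook actually builds:

```python
import sys

# In a notebook cell, `!{sys.executable} -m pip install kaggle --upgrade`
# first substitutes {sys.executable} with the current interpreter's path,
# then hands the resulting string to the shell. Plain bash skips that step.
cmd = f"{sys.executable} -m pip install kaggle --upgrade"
print(cmd)
```

From a plain shell, running `python -m pip install kaggle --upgrade` directly should be equivalent.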

Hey All!
I'm getting an 'IndexError: no such group' error when trying to use ImageDataBunch.from_name_re.

My code is
path_img = Path('/myDatasetNew/images'); path_img
fnames = get_image_files(path_img); fnames[-5:]
pat = re.compile(r'[^/myDatasetNew/images][a-zA-Z]+'); pat
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=tfms, bs=bs ).normalize(imagenet_stats)

My file names are like
PosixPath('/myDatasetNew/images/valueiWant_rose-165819__340.jpg'),

P.s. All the images have extension ‘.jpg’ I’ve made sure of it. :frowning:

Please note that from_name_re takes group(1) from the match. If the pattern matches but defines no capture group, group(1) raises IndexError: no such group.

Thus, you must have a mistake in your re pattern.
r'^/[A-Za-z]+/[A-Za-z]+/([A-Za-z]+)' worked for me:

import re
from pathlib import Path

pat = re.compile(r'^/[A-Za-z]+/[A-Za-z]+/([A-Za-z]+)')
fn = '/myDatasetNew/images/valueiWant_rose-165819__340.jpg'

def _get_label(fn):  # code from `from_name_re`
    if isinstance(fn, Path): fn = fn.as_posix()
    res = pat.search(str(fn))
    assert res, f'Failed to find "{pat}" in "{fn}"'
    return res.group(1)

_get_label(fn)
'valueiWant'

Note this service for RE debugging: https://regex101.com/
I hope this helps. :heart:

Hi, is there any way to load images from a CSV? I have a dataframe of shape 3000 x 784, where each row of data is an image and the last column of the dataframe is the label. Thank you.

@sfsfsf, I suppose 3000 is the number of rows (images) and each image is 28x28 pixels encoded as 0 or 1, correct? There must be 785 columns then, one for label.

Can you show one row?

Yes sir, you are right, the dimension is supposed to be 3000 x 785.

I did read the documentation, and the only way I can think of is to convert the data frame into images in JPG format so that I can make the DataBunch object.

Just wondering if there are any ways to convert my data frame into a DataBunch.

Please take a look at how I did that (Kaggle Notebook).

Briefly,

  • I read CSV into a Pandas DataFrame
  • converted rows into tensors
  • converted tensors into images
  • created customised LabelList – an MNISTClass
  • created an ImageDataBunch from a LabelList with create_from_ll

Feel free to ask further questions.
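The first two bullets can be sketched without fastai at all (the CSV below is a made-up 2-row stand-in for the MNIST-style layout; the LabelList and ImageDataBunch steps need the library itself):

```python
import io
import numpy as np

# Hypothetical 2-row CSV: 784 pixel columns followed by 1 label column.
csv_text = ",".join(str(i % 2) for i in range(784)) + ",7\n" \
         + ",".join("0" for _ in range(784)) + ",3\n"

arr = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=np.float32)
pixels, labels = arr[:, :-1], arr[:, -1].astype(int)

# Each 784-value row becomes one 28x28 image (a tensor, in fastai terms).
images = pixels.reshape(-1, 28, 28)

print(images.shape)     # (2, 28, 28)
print(labels.tolist())  # [7, 3]
```

The reshape is the key step: the flat 784-value row and the 28x28 image contain exactly the same data, so no JPG files ever need to be written.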


Thank you very much. Regarding the part about converting tensors into images: basically you will produce a bunch of images and store them in a folder? May I know which function/library you are using?

No, there is no need to create images on disk. You can create Image objects using the Image class offered by the fastai library.

I’ve created and use this helper function:

from fastai.vision import Image  # fastai v1's Image class wrapping a tensor

def tensor2Images(x, channels=3):
    assert channels == 3 or channels == 1, "Channels: 1 - mono, or 3 - RGB"
    return [Image(x[i].reshape(-1, 28, 28).repeat(channels, 1, 1) / 255.) for i in range(x.shape[0])]

Here is the example and the result:

X.shape
>>> torch.Size([42000, 784])

tensor2Images(X)
>>> 
[Image (3, 28, 28),
 Image (3, 28, 28),
 Image (3, 28, 28),
...]

Okay, I will try to apply it to my problem. Once again, thank you very much for your help.

Hi everyone, I'm really new to coding and am having a bit of a problem with my data for the first homework assignment of lesson 1. I'm trying to build a CNN that tells trees apart using images from Google Images. However, tree pictures have a lot of variance: even with precise Google searches you get a lot of unwanted pictures, such as distribution maps and essential oils, that I would like to remove completely, but you also get pictures of branches, leaves, bark, or the full tree. From my understanding of CNNs, putting birch leaves and bark in the same category will mislead the neural net during training, and it gives me an accuracy of about 50% over 30 classes. I like the neural network, but it's not nearly accurate enough.

I was thinking of building an initial CNN to preprocess the images into three or four categories while removing the "junk", and then building a CNN for each category. I think I could get higher accuracy that way.

First question: is it common practice in ML to build multiple neural nets on top of each other, where one net's output decides which net processes the input next, or should I do it with only a single neural net?

Second question: how do I use a neural net to partition a DataBunch into different DataBunch objects that I can use to train their respective models?

Third question: my first neural net will have to split the data into 4 classes W, X, Y, Z and detect those that are not (W, X, Y, Z). What kind of training set do I need for the "not" class?