Lesson 1 In-Class Discussion ✅

deniro8355 · December 23, 2019, 3:08pm

Hey All!
I’m getting an ‘IndexError: no such group’ error when trying to use ‘ImageDataBunch.from_name_re’.

My code is
path_img = Path('/myDatasetNew/images'); path_img
fnames = get_image_files(myDatasetNew); fnames[-5:]
pat = re.compile(r'[^/myDatasetNew/images][a-zA-Z]+'); pat
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=tfms, bs=bs ).normalize(imagenet_stats)

My file names look like this:
PosixPath('/myDatasetNew/images/valueiWant_rose-165819__340.jpg')

P.S. All the images have the extension ‘.jpg’; I’ve made sure of it.

kdorichev · December 23, 2019, 4:52pm

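Please note that from_name_re takes group(1) of the match, i.e. the text captured by the first pair of parentheses in your pattern. If the pattern has no capturing group, res.group(1) raises IndexError: no such group. (If the pattern does not match at all, you get an AssertionError from the assert instead.)

So the mistake must be in your regex pattern.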
r'^/[A-Za-z]+/[A-Za-z]+/([A-Za-z]+)' worked for me:

import re
from pathlib import Path

pat = re.compile(r'^/[A-Za-z]+/[A-Za-z]+/([A-Za-z]+)')
fn = '/myDatasetNew/images/valueiWant_rose-165819__340.jpg'

def _get_label(fn): # code from `from_name_re`
    if isinstance(fn, Path): fn = fn.as_posix()
    res = pat.search(str(fn))
    assert res,f'Failed to find "{pat}" in "{fn}"'
    return res.group(1)

_get_label(fn)
'valueiWant'
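To see why the original pattern raised IndexError rather than simply failing to match, here is a minimal sketch using only Python's `re` module. The second pattern is a hypothetical alternative (not from the thread) that captures the letters right after the last `/` up to the first `_`; from_name_re would use that group(1) as the label.

```python
import re

fn = '/myDatasetNew/images/valueiWant_rose-165819__340.jpg'

# The original pattern: [^...] is a negated *character class* (any single
# character NOT in that set), not a path prefix. It has no parentheses,
# so a successful match still has no group 1.
bad = re.compile(r'[^/myDatasetNew/images][a-zA-Z]+')
m = bad.search(fn)
whole_match = m.group(0)          # the search itself succeeds...
try:
    m.group(1)                    # ...but group 1 does not exist
    error_msg = None
except IndexError as e:
    error_msg = str(e)

# Hypothetical alternative: parentheses create capturing group 1.
good = re.compile(r'/([A-Za-z]+)_[^/]*\.jpg$')
label = good.search(fn).group(1)

print(whole_match, '|', error_msg, '|', label)
# valueiWant | no such group | valueiWant
```

Note that the bad pattern only "works" by accident here: the first character outside its set happens to be the `v` of `valueiWant`.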

This site is handy for debugging regular expressions: https://regex101.com/
I hope this helps.

sfsfsf · December 24, 2019, 4:04am

Hi, is there any way to load images from a CSV? I have a DataFrame of shape 3000 x 784, where each row is an image and the last column is the label. Thank you!

kdorichev · December 24, 2019, 5:20am

@sfsfsf, I suppose 3000 is the number of rows (images) and each image is 28x28 pixels encoded as 0 or 1, correct? Then there must be 785 columns: 784 (28 x 28) pixel columns plus one for the label.

Can you show one row?

sfsfsf · December 24, 2019, 5:38am

Yes sir, you are right, the dimensions are supposed to be 3000 x 785.

I did read the documentation, and the only way I can think of is to convert the DataFrame into JPG images so that I can make the DataBunch object.

Just wondering if there is any way to convert my DataFrame into a DataBunch.

kdorichev · December 24, 2019, 7:40am

Please take a look at how I did that (Kaggle Notebook).

Briefly,

read the CSV into a Pandas DataFrame
converted the rows into tensors
converted the tensors into images
created a customised LabelList – an MNISTClass
created an ImageDataBunch from the LabelList with create_from_ll
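The first three steps amount to separating the label column from the pixel columns and reshaping each row to 28x28. Below is a minimal NumPy sketch, assuming the label is in the last column as in @sfsfsf's data; `split_pixels_and_labels` is a hypothetical helper, and the synthetic array stands in for `df.values` of a DataFrame read with `pandas.read_csv`. The fastai-specific steps (the custom LabelList and `create_from_ll`) are not reproduced here.

```python
import numpy as np

def split_pixels_and_labels(values, label_col=-1, side=28):
    """Split an (N, side*side + 1) array into (N, side, side) pixels and (N,) labels.

    `values` stands in for `df.values`; set label_col=0 if the label comes first.
    """
    values = np.asarray(values)
    n_cols = values.shape[1]
    assert n_cols == side * side + 1, f"expected {side * side + 1} columns, got {n_cols}"
    labels = values[:, label_col]
    pixels = np.delete(values, label_col, axis=1)          # drop the label column
    pixels = pixels.reshape(-1, side, side).astype(np.float32)
    return pixels, labels

# Synthetic stand-in for a 3000 x 785 DataFrame: 784 pixel columns + 1 label column.
rng = np.random.default_rng(0)
fake = np.hstack([rng.integers(0, 256, size=(3000, 784)),
                  rng.integers(0, 10, size=(3000, 1))])
pixels, labels = split_pixels_and_labels(fake)
print(pixels.shape, labels.shape)  # (3000, 28, 28) (3000,)
```

The column-count assert catches the 784 vs 785 mix-up discussed above before any reshaping happens.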

Feel free to ask further questions.

sfsfsf · December 24, 2019, 12:56pm

Thank you very much. Regarding converting tensors into images: do you produce a bunch of images and store them in a folder? May I know which function/library you are using?

kdorichev · December 24, 2019, 1:08pm

No, there is no need to create images on disk. You can create Image objects directly with the Image class from the fastai library.

I’ve created and use this helper function:

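from fastai.vision import Image  # fastai v1

def tensor2Images(x, channels=3):
    "Convert an (N, 784) tensor of 28x28 images into a list of fastai `Image`s."
    assert channels == 3 or channels == 1, "Channels: 1 - mono, or 3 - RGB"
    return [Image(x[i].float().reshape(-1,28,28).repeat(channels, 1, 1)/255.) for i in range(x.shape[0])]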
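What tensor2Images does to each row can be checked without fastai or torch. Below is a NumPy mirror of the same three steps (reshape to 1x28x28, repeat the single channel, divide by 255); `row_to_chw` is a hypothetical name, and `np.repeat` along axis 0 stands in for torch's `.repeat(channels, 1, 1)`, which gives the same result when the channel axis has size 1. Repeating to 3 channels matters because an ImageNet-pretrained ResNet's first convolution expects 3-channel input.

```python
import numpy as np

def row_to_chw(row, channels=3, side=28):
    """NumPy mirror of tensor2Images for one flattened row.

    Returns a float32 array of shape (channels, side, side) scaled to [0, 1].
    """
    assert channels in (1, 3), "Channels: 1 - mono, or 3 - RGB"
    img = np.asarray(row, dtype=np.float32).reshape(-1, side, side)  # (1, 28, 28)
    img = np.repeat(img, channels, axis=0)                            # (channels, 28, 28)
    return img / 255.0                                                # scale 0..255 to 0..1

img = row_to_chw(np.arange(784) % 256)
print(img.shape)  # (3, 28, 28)
```

Casting to float before dividing also avoids any integer-division surprises with integer pixel data.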

Here is the example and the result:

X.shape
>>> torch.Size([42000, 784])

tensor2Images(X)
>>> 
[Image (3, 28, 28),
 Image (3, 28, 28),
 Image (3, 28, 28),
...]

sfsfsf · December 24, 2019, 1:21pm

Okay, I will try to apply this to my problem. Once again, thank you very much for your help.

Eriblou · December 24, 2019, 3:40pm

Hi everyone, I’m really new to coding and am having a bit of a problem with my data for the first homework assignment of Lesson 1. I’m trying to build a CNN that tells trees apart using images from Google Images. However, tree pictures vary a lot: even with precise Google searches, you get many unwanted pictures, such as distribution maps and essential oils, which I would like to remove completely, and you also get pictures of branches, leaves, bark or the full tree. From my understanding of CNNs, putting birch leaves and bark in the same category misleads the neural net during training, and I get an accuracy of about 50% over 30 classes. The network does learn, but it’s not nearly accurate enough.

I was thinking of building an initial CNN that sorts the images into three or four categories while removing the “junk”, and then building a CNN for each category. I think I could get higher accuracy that way.

First question: is it common practice in ML to chain multiple neural nets, where one net passes its input on to another net depending on its result, or should I do it with a single neural net?

Second question: how do I use a neural net to partition a databunch into separate databunches that I can use to train their respective models?

Third question: my first neural net will have to split the data into 4 classes W, X, Y, Z and detect the images that belong to none of them (not W, X, Y, Z). What kind of training set do I need for the “not” class?

aashray18521 · December 24, 2019, 4:00pm

It’s possible that the folder containing your files is not accessible from your Jupyter notebook. Try using path.ls() to list what is actually there, and check that the path in your code is correct relative to where your notebook is running.
Also, as @kdorichev said, from_name_re uses group(1), so your pattern needs a capturing group.
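One way to act on this suggestion is a small check that prints the notebook’s working directory and whether the image folder exists. This is a plain-pathlib sketch; `check_image_dir` is a hypothetical helper that roughly corresponds to fastai v1’s `path.ls()` (which lists a directory’s contents) plus a `.jpg` filter. Note that `Path('/myDatasetNew/images')` in the original post starts with `/`, so it is resolved from the filesystem root, not from the notebook’s folder.

```python
import tempfile
from pathlib import Path

def check_image_dir(path, suffix='.jpg'):
    """Print where we are and what `path` contains; return matching files (sorted)."""
    p = Path(path)
    print('working directory:', Path.cwd())
    if not p.exists():
        # A leading '/' makes the path absolute (relative to the filesystem root).
        print(f'{p} does not exist')
        return []
    files = sorted(f for f in p.iterdir() if f.suffix.lower() == suffix)
    print(f'{len(files)} {suffix} files in {p}')
    return files

# Usage demo on a throwaway directory:
demo_dir = Path(tempfile.mkdtemp())
for name in ('a.jpg', 'b.JPG', 'notes.txt'):
    (demo_dir / name).touch()
found = check_image_dir(demo_dir)                     # a.jpg and b.JPG
missing = check_image_dir(demo_dir / 'no_such_folder')
```

The suffix check is case-insensitive, since files downloaded from the web often end in `.JPG`.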

sfsfsf · December 26, 2019, 10:55am

image=tensor2Images(x)

Is there any way for me to convert these images into a DataBunch, or which function should I use? I have read the docs, but most of the functions require a path and folder, whereas I now have Image objects instead. Thanks

kdorichev · December 26, 2019, 12:54pm

I have already answered this in my previous post.

perceptron · December 26, 2019, 1:51pm

Hello all,

I’ve just started this course. After watching the first lesson, I’m planning to apply ResNet to a Kaggle dataset, but the data is given as pixel arrays rather than JPEG files.
The (grayscale) images are 28x28 and have been flattened into pixel arrays of length 784.
So I wanted to ask: how should I process this data so that I can feed it into ResNet?

Thanks,
perceptron

kdorichev · December 26, 2019, 2:06pm

Please take a look at how I did that (Kaggle Notebook).

@perceptron, Welcome to the community. And good luck!

perceptron · December 26, 2019, 2:20pm

@kdorichev Thank you so much! I’ll have a look and get back to you if there’s anything I don’t understand.

Roszko · December 29, 2019, 9:57am

Hi, the Lesson 1 notebook does not work for me out of the box. I followed all the instructions: fresh install, nothing changed (on Gradient / Paperspace). I’m really stuck :/ and would appreciate help. Is it possible that something is outdated?

On step: interp.plot_top_losses(9, figsize=(15,11))

I get an error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-26-1b2e75ee979a> in <module>
      1 #interp.plot_top_losses(9, figsize=(7,6))
----> 2 interp.plot_top_losses(6, figsize=(15,11))

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/vision/learner.py in _cl_int_plot_top_losses(self, k, largest, figsize, heatmap, heatmap_thresh, alpha, cmap, show_text, return_fig)
    174     if show_text: fig.suptitle('Prediction/Actual/Loss/Probability', weight='bold', size=14)
    175     for i,idx in enumerate(tl_idx):
--> 176         im,cl = self.data.dl(self.ds_type).dataset[idx]
    177         cl = int(cl)
    178         title = f'{classes[self.pred_class[idx]]}/{classes[cl]} / {self.losses[idx]:.2f} / {self.preds[idx][cl]:.2f}' if show_text else None

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py in __getitem__(self, idxs)
    647     def __getitem__(self,idxs:Union[int,np.ndarray])->'LabelList':
    648         "return a single (x, y) if `idxs` is an integer or a new `LabelList` object if `idxs` is a range."
--> 649         idxs = try_int(idxs)
    650         if isinstance(idxs, Integral):
    651             if self.item is None: x,y = self.x[idxs],self.y[idxs]

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/torch_core.py in try_int(o)
    365     "Try to convert `o` to int, default to `o` if not possible."
    366     # NB: single-item rank-1 array/tensor can be converted to int, but we don't want to do this
--> 367     if isinstance(o, (np.ndarray,Tensor)): return o if o.ndim else int(o)
    368     if isinstance(o, collections.Sized) or getattr(o,'__array_interface__',False): return o
    369     try: return int(o)

AttributeError: 'Tensor' object has no attribute 'ndim'

kdorichev · December 29, 2019, 4:34pm

This was fixed in PyTorch 1.2. Please upgrade PyTorch.
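Whether an environment is affected can be checked up front by comparing `torch.__version__` with 1.2, the version named in the reply above. A minimal pure-Python sketch; `version_tuple` and `torch_is_too_old` are hypothetical helpers, and the parser only looks at the leading numeric parts of the version string (a local suffix such as `+cu92` is dropped).

```python
def version_tuple(v):
    """'1.0.0' -> (1, 0, 0); '1.3.1+cu92' -> (1, 3, 1). Non-numeric tails are ignored."""
    core = v.split('+')[0]
    parts = []
    for piece in core.split('.')[:3]:
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def torch_is_too_old(v, minimum=(1, 2)):
    """True if version string `v` is older than `minimum` (compared as integers)."""
    return version_tuple(v)[:len(minimum)] < minimum

if __name__ == '__main__':
    try:
        import torch
        status = 'needs upgrade' if torch_is_too_old(torch.__version__) else 'ok'
        print(torch.__version__, status)
    except ImportError:
        print('torch is not installed')
```

Comparing integer tuples rather than raw strings matters: as strings, '1.10.0' would sort before '1.2.0'.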

ameerkat · December 29, 2019, 8:12pm

I am trying to apply the resnet34 model from Lesson 1 to the old Kaggle competition for humpback whale tail identification. Given that I don’t have much background here, I was wondering about a few things:

There are 4000+ classes, as each whale is individually labeled. Would a ResNet be suited to that many classes? Does the number of classes make a difference here? When you have a large number of classes, what would you do differently compared to having 30 or so, as in the pets example?
All unidentified whales are labeled as “new_whale”. It seems like such a class would cause issues, since the new whales are not necessarily related to each other and the class could become over-eager. Should I throw out the new whales, or is there a way to use this data effectively?

Thanks for any help!

Tom3 · December 29, 2019, 8:41pm

I’ve encountered exactly the same problem on Paperspace/Gradient with a Free-GPU notebook, but so far I haven’t found a way to upgrade PyTorch from version 1.0.0 (“quota exceeded…”). Thanks for any help!