Lesson 1 In-Class Discussion ✅

Hey All!
I’m getting an ‘IndexError: no such group’ error when trying to use ‘ImageDataBunch.from_name_re’.

My code is
path_img = Path('/myDatasetNew/images'); path_img
fnames = get_image_files(myDatasetNew); fnames[-5:]
pat = re.compile(r'[^/myDatasetNew/images][a-zA-Z]+'); pat
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=tfms, bs=bs ).normalize(imagenet_stats)

My file names are like
PosixPath('/myDatasetNew/images/valueiWant_rose-165819__340.jpg'),

P.S. All the images have the extension ‘.jpg’; I’ve made sure of it. :frowning:

Please note that from_name_re takes group(1) from the match. If the pattern has no capturing group (no parentheses), group(1) raises IndexError: no such group.

So the mistake is in your re pattern: it needs a capturing group around the label.
r'^/[A-Za-z]+/[A-Za-z]+/([A-Za-z]+)' worked for me:

fn = '/myDatasetNew/images/valueiWant_rose-165819__340.jpg'

def _get_label(fn): # code from `from_name_re`
    if isinstance(fn, Path): fn = fn.as_posix()
    res = pat.search(str(fn))
    assert res,f'Failed to find "{pat}" in "{fn}"'
    return res.group(1)

_get_label(fn)
'valueiWant'
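
To see the difference between the two patterns side by side, here is a minimal sketch that uses only the standard library (no fastai). It runs the pattern from the question and the pattern from the answer against the same file name; the variable names are just for illustration.

```python
import re
from pathlib import PurePosixPath

fn = PurePosixPath('/myDatasetNew/images/valueiWant_rose-165819__340.jpg')

# Pattern from the question: no parentheses, so it defines no group 1.
no_group = re.compile(r'[^/myDatasetNew/images][a-zA-Z]+')
# Pattern from the answer: the parentheses capture the label as group 1.
with_group = re.compile(r'^/[A-Za-z]+/[A-Za-z]+/([A-Za-z]+)')

# The first pattern does match something, but asking for group(1) fails.
match = no_group.search(fn.as_posix())
try:
    match.group(1)
    error = None
except IndexError as exc:
    error = str(exc)

label = with_group.search(fn.as_posix()).group(1)
print(error)   # message of the IndexError
print(label)   # captured label
```

Note that the square brackets in the first pattern form a character class (a set of single characters), not a literal folder prefix, which is why it does not behave as intended.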

Note this service for RE debugging: https://regex101.com/
I hope this helps. :heart:

Hi, is there any way to load images from a CSV? I have a dataframe of shape 3000 x 784, where each row is an image and the last column is the label. Thank you.

@sfsfsf, I suppose 3000 is the number of rows (images) and each image is 28x28 pixels encoded as 0 or 1, correct? There must be 785 columns then: 784 for the pixels (28 x 28) plus one for the label.

Can you show one row?

Yes sir, you are right, the dimensions are supposed to be 3000 x 785.

I did read the documentation, and the only way I can think of is to convert the data frame into images in jpg format so that I can make the databunch object.

Just wondering if there are any ways to convert my data frame into a databunch directly.

Please take a look at how I did that (Kaggle Notebook).

Briefly,

  • I read CSV into a Pandas DataFrame
  • converted rows into tensors
  • converted tensors into images
  • created customised LabelList – an MNISTClass
  • created an ImageDataBunch from a LabelList with create_from_ll
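
The first steps of that list (read the CSV, separate the label, reshape each flat row into a square image) can be sketched without fastai. This is only an illustration under stated assumptions: plain Python lists stand in for the pandas DataFrame and the tensors, the file has a header row, pixel values are 0-255, the label is in the last column as described above, and `rows_to_images` is a hypothetical helper, not part of fastai or of the notebook.

```python
import csv
import io

def rows_to_images(csv_text, side):
    """Split each CSV row into a side x side pixel grid scaled to [0, 1] and a label.

    Assumes a header row, side*side pixel columns (0-255), and the label last.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    images, labels = [], []
    for row in reader:
        *pixels, label = row
        if len(pixels) != side * side:
            raise ValueError(f"expected {side * side} pixels, got {len(pixels)}")
        scaled = [int(p) / 255 for p in pixels]
        # Cut the flat list into `side` rows of `side` values each.
        images.append([scaled[r * side:(r + 1) * side] for r in range(side)])
        labels.append(label)
    return images, labels

# Toy data: two 2x2 "images". A 785-column file would use side=28.
toy_csv = "p0,p1,p2,p3,label\n0,255,255,0,7\n255,0,0,255,3\n"
images, labels = rows_to_images(toy_csv, side=2)
print(images[0], labels)
```

With real data the same split would normally be done with pandas and torch, and the resulting tensors wrapped in fastai Image objects.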

Feel free to ask further questions.

Thank you very much. Regarding the part about converting tensors into images: do you basically produce a bunch of images and store them in a folder? May I know which function or library you are using?

No, there is no need to create images on disk. You can create Image objects directly with the Image class that the fastai library provides.

I created and use this helper function:

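from fastai.vision import Image  # fastai v1 image wrapper

def tensor2Images(x,channels=3):
    # x: (N, 784) tensor of pixel values (assumed 0-255); returns N fastai Images scaled by 1/255
    assert channels == 3 or channels == 1, "Channels: 1 - mono, or 3 - RGB"
    return [Image(x[i].reshape(-1,28,28).repeat(channels, 1, 1)/255.) for i in range(x.shape[0])]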

Here is the example and the result:

X.shape
>>> torch.Size([42000, 784])

tensor2Images(X)
>>> 
[Image (3, 28, 28),
 Image (3, 28, 28),
 Image (3, 28, 28),
...]

Okay, I will try to apply this to my problem. Once again, thank you very much for your help.

Hi everyone, I’m really new to coding and am having a bit of a problem with my data for the first homework assignment of lesson 1. I’m trying to build a CNN that tells trees apart using pictures from Google Images. However, tree pictures vary a lot even with precise Google searches. You get many unwanted pictures, such as distribution maps and essential oils, that I would like to remove completely, but you also get pictures of branches, leaves, bark or the full tree. From my understanding of CNNs, putting birch leaves and bark in the same category will mislead the neural net during training, and it gives me an accuracy of about 50% over 30 classes. I like the neural network, but it’s not nearly accurate enough.

I was thinking of building an initial CNN to sort the images into three or four categories while removing the “junk”, and then building a CNN for each category. I think I could get higher accuracy that way.

First question: is it common practice in ML to chain multiple neural nets, where the result of one net decides which net the data is fed to next, or should I do it with only a single neural net?

Second question: how do I use a neural net to partition a databunch into different databunch elements that I can use to train their respective models?

Third question: my first neural net will have to split the data into 4 classes W, X, Y, Z and detect the images that are in none of them, not(W, X, Y, Z). What kind of training set do I need for the “not” class?

It’s possible that the location of your files is not accessible from your Jupyter notebook. Try path.ls() to see where your notebook is looking, and check whether the path in your code matches that.
Also like @kdorichev said, it’s group(1) :slight_smile:
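
A quick way to run that check with only the standard library is sketched below. `describe_path` is a hypothetical helper written for this post, and the path is the one from the question; `Path.iterdir()` plays the role that `path.ls()` plays in fastai.

```python
from pathlib import Path

def describe_path(path):
    """Return a short report on whether `path` exists and how many .jpg files it holds."""
    path = Path(path)
    if not path.exists():
        # Showing the current directory helps spot relative/absolute path mix-ups.
        return f"{path} does not exist (current directory: {Path.cwd()})"
    if not path.is_dir():
        return f"{path} exists but is not a directory"
    n_jpg = sum(1 for p in path.iterdir() if p.suffix.lower() == ".jpg")
    return f"{path} contains {n_jpg} .jpg files"

print(describe_path("/myDatasetNew/images"))
```

If the report says the folder does not exist, the leading slash is a likely suspect: '/myDatasetNew/images' is an absolute path from the filesystem root, not a folder next to the notebook.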

image=tensor2Images(x)

Is there any way for me to convert the images to a databunch, or what function should I use? I have read the docs, but most of the functions require a path and folder, whereas I now have Image objects instead. Thanks.

I have already answered this in my previous post. :wink:

Hello all,

I’ve just started with this course. After watching the first lesson, I’m planning to apply ResNet to a Kaggle dataset, but the data given is in the form of pixel arrays rather than JPEG or a similar image format.
The (grayscale) images are of size 28x28 which have been flattened to a pixel array of length 784.
So, I just wanted to ask how should I process this data so that I can feed it into ResNet?

Thanks,
perceptron

Please take a look at how I did that (Kaggle Notebook).

@perceptron, Welcome to the community. And good luck!

@kdorichev Thank you so much! I will have a look and get back to you if there is anything I don’t understand. :slight_smile:

Hi, the Lesson 1 files do not work out of the box for me. I followed all the instructions with a fresh install and did not change anything (on Gradient / Paperspace). I’m really stuck :/ and would appreciate help. Is it possible that something is outdated?

On step: interp.plot_top_losses(9, figsize=(15,11))

I get an error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-26-1b2e75ee979a> in <module>
      1 #interp.plot_top_losses(9, figsize=(7,6))
----> 2 interp.plot_top_losses(6, figsize=(15,11))

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/vision/learner.py in _cl_int_plot_top_losses(self, k, largest, figsize, heatmap, heatmap_thresh, alpha, cmap, show_text, return_fig)
    174     if show_text: fig.suptitle('Prediction/Actual/Loss/Probability', weight='bold', size=14)
    175     for i,idx in enumerate(tl_idx):
--> 176         im,cl = self.data.dl(self.ds_type).dataset[idx]
    177         cl = int(cl)
    178         title = f'{classes[self.pred_class[idx]]}/{classes[cl]} / {self.losses[idx]:.2f} / {self.preds[idx][cl]:.2f}' if show_text else None

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py in __getitem__(self, idxs)
    647     def __getitem__(self,idxs:Union[int,np.ndarray])->'LabelList':
    648         "return a single (x, y) if `idxs` is an integer or a new `LabelList` object if `idxs` is a range."
--> 649         idxs = try_int(idxs)
    650         if isinstance(idxs, Integral):
    651             if self.item is None: x,y = self.x[idxs],self.y[idxs]

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/torch_core.py in try_int(o)
    365     "Try to convert `o` to int, default to `o` if not possible."
    366     # NB: single-item rank-1 array/tensor can be converted to int, but we don't want to do this
--> 367     if isinstance(o, (np.ndarray,Tensor)): return o if o.ndim else int(o)
    368     if isinstance(o, collections.Sized) or getattr(o,'__array_interface__',False): return o
    369     try: return int(o)

AttributeError: 'Tensor' object has no attribute 'ndim'

This was fixed in PyTorch 1.2. Please upgrade PyTorch.
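
To check which version a notebook is actually running, something like the following can help. It is a minimal sketch: `version_tuple` and `has_tensor_ndim_fix` are hypothetical helpers written for this post, and the only PyTorch attribute used is `torch.__version__`.

```python
import re

def version_tuple(version):
    """Turn a version string such as '1.0.0' or '1.2.0+cu92' into (major, minor)."""
    core = version.split("+")[0]  # drop a local build suffix like '+cu92'
    parts = []
    for piece in core.split(".")[:2]:
        m = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        parts.append(int(m.group()) if m else 0)
    return tuple(parts)

def has_tensor_ndim_fix(version, minimum=(1, 2)):
    """True if `version` is at least 1.2, the release named above as containing the fix."""
    return version_tuple(version) >= minimum

try:
    import torch
    installed = torch.__version__
except ImportError:
    installed = None

if installed is None:
    print("PyTorch is not installed in this environment")
else:
    status = "OK" if has_tensor_ndim_fix(installed) else "older than 1.2, upgrade needed"
    print(installed, status)
```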

I am trying to apply the resnet34 model from lesson one to the old Kaggle competition for humpback whale tail identification. Given that I don’t have a lot of background here, I was wondering about a few things:

  1. There are 4000+ classes as each whale is individually labeled. Would the resnet network be suited for that large a number of classes? Does the number of classes make a difference here? When you have a large number of classes what kind of things would you do differently compared to having 30 or so like the pet example?
  2. All unidentified whales are labeled as “new_whale”. It seems like having such a class would cause issues as all the new whales are not necessarily related and would cause the class to become over eager. Should I throw out the new whales or is there a way to effectively use this data?

Thanks for any help!

I’ve encountered exactly the same problem on Paperspace/Gradient with a Free-GPU notebook, but so far I have not found a way to upgrade PyTorch from version 1.0.0 (“quota exceeded…”). Thanks for any help!