Hey All!
I’m getting an ‘IndexError: no such group’ error when trying to use ‘ImageDataBunch.from_name_re’.
My code is:
path_img = Path('/myDatasetNew/images'); path_img
fnames = get_image_files(myDatasetNew); fnames[-5:]
pat = re.compile(r'[^/myDatasetNew/images][a-zA-Z]+'); pat
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=tfms, bs=bs).normalize(imagenet_stats)
My file names look like this:
PosixPath('/myDatasetNew/images/valueiWant_rose-165819__340.jpg'),
P.S. All the images have the extension ‘.jpg’; I’ve made sure of it.
Hi, is there any way to load images from a CSV? I have a dataframe of shape 3000 x 784, where each row is an image and the last column of the dataframe is the label. Thank you.
@sfsfsf, I suppose 3000 is the number of rows (images) and each image is 28x28 pixels encoded as 0 or 1, correct? Then there must be 785 columns: 784 for the pixels and one for the label.
Yes sir, you are right, the dimensions are supposed to be 3000 x 785.
I did read the documentation, and the only way I can think of is to convert each row of the dataframe into an image in JPG format so that I can build the databunch object.
Just wondering if there are any ways to convert my dataframe into a databunch directly.
Thank you very much. Regarding the part about converting a tensor into an image: basically, you produce a bunch of images and store them in a folder? May I know which function/library you are using?
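One way the "write the rows out as images in a folder" idea could look, as a minimal sketch rather than the method the earlier reply used. It assumes Pillow is installed, that each row holds 784 pixel values followed by the label, and that pixels are encoded as 0 or 1 (hence `scale=255`). The function name `save_rows_as_images` and the one-folder-per-label layout are choices made for this illustration.

```python
import tempfile
from pathlib import Path

from PIL import Image  # Pillow

SIDE = 28  # 28 x 28 = 784 pixel columns


def save_rows_as_images(rows, out_dir, scale=255):
    """Write each (784 pixels + label) row to out_dir/<label>/<index>.png.

    `scale` maps 0/1 pixels to 0/255; pass scale=1 if values are already 0-255.
    """
    out_dir = Path(out_dir)
    paths = []
    for i, row in enumerate(rows):
        *pixels, label = row  # last column is the label
        if len(pixels) != SIDE * SIDE:
            raise ValueError(f"row {i}: expected {SIDE * SIDE} pixels, got {len(pixels)}")
        data = bytes(min(255, int(p) * scale) for p in pixels)
        img = Image.frombytes("L", (SIDE, SIDE), data)  # "L" = 8-bit grayscale
        class_dir = out_dir / str(label)
        class_dir.mkdir(parents=True, exist_ok=True)
        path = class_dir / f"{i}.png"
        img.save(path)
        paths.append(path)
    return paths


# Tiny demonstration with two made-up rows instead of the real dataframe.
with tempfile.TemporaryDirectory() as tmp:
    demo_rows = [[0] * 784 + ["cat"], [1] * 784 + ["dog"]]
    demo_paths = save_rows_as_images(demo_rows, tmp)
    demo_names = [p.relative_to(tmp).as_posix() for p in demo_paths]
    demo_sizes = []
    for p in demo_paths:
        with Image.open(p) as im:
            demo_sizes.append(im.size)
print(demo_names, demo_sizes)
```

With a pandas dataframe `df` (an assumed name), the rows could be passed as `df.itertuples(index=False)`. A folder laid out this way should then be readable with `ImageDataBunch.from_folder`, though that step is not tested here.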
Hi everyone, I’m really new to coding and am having a bit of a problem with my data for the first homework assignment of lesson 1. I’m trying to build a CNN that tells trees apart using pictures from Google Images. However, tree pictures vary a lot even with precise Google searches. You get a lot of unwanted pictures, such as distribution maps and essential oils, that I would like to remove completely, but you also get pictures of branches, leaves, bark, or the full tree. From my understanding of CNNs, putting birch leaves and bark in the same category will mislead the neural net during training, and I get an accuracy of about 50% over 30 classes. I like the neural network, but it’s not nearly accurate enough.
I was thinking of building an initial CNN that preprocesses the images into three or four categories while removing the “junk”, and then building a CNN for each category. I think I could get a higher accuracy by doing that.
First question: is it common practice in ML to stack neural nets, so that the result of one decides which net the data is fed to next, or should I do it with a single neural net?
Second question: how do I use a neural net to partition a databunch into separate databunches that I can use to train their respective models?
Third question: my first neural net will have to split the data into 4 classes W, X, Y, Z and detect images that are not(W, X, Y, Z). What kind of training set do I need for the “not” class?
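For the second question, the routing step itself can be sketched without any fastai code: run a first-stage predictor over the items and bucket them by predicted category, sending everything outside W, X, Y, Z to a junk bucket. This is only an illustration under assumptions. `partition_by_prediction`, the `junk` label and the stub predictor are hypothetical names, and the sketch says nothing about how well such a cascade would perform.

```python
from collections import defaultdict


def partition_by_prediction(items, predict, keep=("W", "X", "Y", "Z"), junk_label="junk"):
    """Group items by the label `predict` returns; labels outside `keep` go to `junk_label`."""
    groups = defaultdict(list)
    for item in items:
        label = predict(item)
        groups[label if label in keep else junk_label].append(item)
    return dict(groups)


# Stub predictor for the demonstration: reads the category from the file name prefix.
def stub_predict(path):
    return path.split("_")[0]


files = ["W_birch1.jpg", "X_birch2.jpg", "map_birch.jpg", "W_oak1.jpg"]
groups = partition_by_prediction(files, stub_predict)
print(groups)
```

Each list in `groups` could then be used to build the dataset for the matching second-stage model. With fastai v1, `predict` might be something like `lambda p: str(learn.predict(open_image(p))[0])`, assuming `learn` is the trained first-stage learner; that wiring is an assumption and is not tested here.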
It’s possible that the location of your files is not accessible from your Jupyter notebook. Try using path.ls() to see where your notebook is looking, and check whether the path in your code matches that.
Also, like @kdorichev said, it’s group(1).
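To make the group(1) point concrete: the pattern in the question contains no parentheses, so a match has no group 1, and asking for it raises exactly the reported error. Below is a minimal sketch using only Python’s `re` module. The file name is the one from the question, assumed to contain no spaces, and `pat_fixed` is a hypothetical replacement that treats the text between the last `/` and the first `_` as the label.

```python
import re

# File name from the question (assumed to contain no spaces).
fname = '/myDatasetNew/images/valueiWant_rose-165819__340.jpg'

# Original pattern: it finds a match, but defines no capture group.
pat_orig = re.compile(r'[^/myDatasetNew/images][a-zA-Z]+')
match = pat_orig.search(fname)
try:
    match.group(1)
    error = None
except IndexError as exc:
    error = str(exc)  # same message as in the question

# Hypothetical fix: the parentheses capture the text between the last '/' and the first '_'.
pat_fixed = re.compile(r'/([^/_]+)_[^/]*\.jpg$')
label = pat_fixed.search(fname).group(1)
print(error, '|', label)
```

Note that the square brackets in the original pattern form a character class (any single character not in that set), not a literal path prefix, so the path does not need to appear in the pattern at all.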
Is there any way for me to convert the images into a databunch, or what function should I use? I have read the docs, but most of the functions require a path and a folder, and I now have image objects instead. Thanks.
I’ve just started with this course. After watching the first lesson, I’m planning to apply ResNet to a Kaggle dataset, but the data given is in the form of pixel arrays rather than JPEG or similar image files.
The (grayscale) images are of size 28x28 which have been flattened to a pixel array of length 784.
So, I just wanted to ask how I should process this data so that I can feed it into ResNet.
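A pretrained ResNet expects 3-channel input, so one common approach is to reshape each flat row back to 28x28 and repeat the grayscale channel three times. The sketch below shows only that array step with NumPy. `row_to_rgb` is a name made up for this example, and resizing or normalisation for the actual model is left out.

```python
import numpy as np


def row_to_rgb(pixels, side=28):
    """Reshape a flat grayscale row to (3, side, side) by repeating the single channel."""
    arr = np.asarray(pixels, dtype=np.float32).reshape(side, side)
    return np.stack([arr, arr, arr], axis=0)


row = np.arange(784) % 256  # stand-in for one 784-value row of the dataset
rgb = row_to_rgb(row)
print(rgb.shape)  # (3, 28, 28)
```

The same array could instead be saved as an image file, if working from a folder of images is more convenient with the lesson 1 notebook.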
Hi, the Lesson 1 files do not work for me out of the box. I followed all the instructions, with a fresh install and no changes (on Gradient / Paperspace). I’m really stuck and would appreciate help. Is it possible that something is outdated?
On step: interp.plot_top_losses(9, figsize=(15,11))
I get an error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-26-1b2e75ee979a> in <module>
1 #interp.plot_top_losses(9, figsize=(7,6))
----> 2 interp.plot_top_losses(6, figsize=(15,11))
/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/vision/learner.py in _cl_int_plot_top_losses(self, k, largest, figsize, heatmap, heatmap_thresh, alpha, cmap, show_text, return_fig)
174 if show_text: fig.suptitle('Prediction/Actual/Loss/Probability', weight='bold', size=14)
175 for i,idx in enumerate(tl_idx):
--> 176 im,cl = self.data.dl(self.ds_type).dataset[idx]
177 cl = int(cl)
178 title = f'{classes[self.pred_class[idx]]}/{classes[cl]} / {self.losses[idx]:.2f} / {self.preds[idx][cl]:.2f}' if show_text else None
/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py in __getitem__(self, idxs)
647 def __getitem__(self,idxs:Union[int,np.ndarray])->'LabelList':
648 "return a single (x, y) if `idxs` is an integer or a new `LabelList` object if `idxs` is a range."
--> 649 idxs = try_int(idxs)
650 if isinstance(idxs, Integral):
651 if self.item is None: x,y = self.x[idxs],self.y[idxs]
/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/torch_core.py in try_int(o)
365 "Try to convert `o` to int, default to `o` if not possible."
366 # NB: single-item rank-1 array/tensor can be converted to int, but we don't want to do this
--> 367 if isinstance(o, (np.ndarray,Tensor)): return o if o.ndim else int(o)
368 if isinstance(o, collections.Sized) or getattr(o,'__array_interface__',False): return o
369 try: return int(o)
AttributeError: 'Tensor' object has no attribute 'ndim'
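The last frame of the traceback shows fastai’s `try_int` reading `o.ndim` on a tensor, and a later reply in this thread reports PyTorch 1.0.0 on the same platform. That suggests the installed PyTorch may be older than this fastai version expects. Here is a quick check, as a sketch; the helper name `check_tensor_ndim` is made up for illustration, and it returns `None` values if PyTorch cannot be imported.

```python
import importlib


def check_tensor_ndim():
    """Return (version, has_ndim), or (None, None) if PyTorch is not installed."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return None, None
    return torch.__version__, hasattr(torch.Tensor, "ndim")


version, has_ndim = check_tensor_ndim()
print("PyTorch version:", version, "| Tensor.ndim available:", has_ndim)
```

If `has_ndim` comes back False, the problem is a mismatch between the fastai and PyTorch versions rather than anything in the notebook. Upgrading PyTorch is the likely remedy, though the exact version to install is not stated in this thread.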
I am trying to apply the resnet34 model from lesson one to the old Kaggle competition for humpback whale tail identification. Given that I don’t have a lot of background here, I was wondering about a few things:
There are 4000+ classes as each whale is individually labeled. Would the resnet network be suited for that large a number of classes? Does the number of classes make a difference here? When you have a large number of classes what kind of things would you do differently compared to having 30 or so like the pet example?
All unidentified whales are labeled as “new_whale”. It seems like having such a class would cause issues, since the new whales are not necessarily related to each other and the class could become overeager. Should I throw out the new whales, or is there a way to use this data effectively?
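Before deciding whether to drop “new_whale”, it may help to see how the labels are distributed: how many images sit in that catch-all class and how many identified whales have only one image. The sketch below is a small, assumption-laden example: the label list is made up, and `summarize_labels` is a hypothetical helper, not part of fastai or the competition kit.

```python
from collections import Counter


def summarize_labels(labels, unknown="new_whale"):
    """Count identified classes, images in the catch-all class, and single-image classes."""
    counts = Counter(labels)
    known = {k: v for k, v in counts.items() if k != unknown}
    return {
        "n_classes": len(known),
        "n_unknown": counts.get(unknown, 0),
        "n_singletons": sum(1 for v in known.values() if v == 1),
    }


# Made-up labels standing in for the competition's label column.
labels = ["w_1", "w_1", "w_2", "new_whale", "new_whale", "new_whale", "w_3"]
summary = summarize_labels(labels)
print(summary)
```

Running the same summary with and without the catch-all rows gives a rough idea of how much data each option leaves for training.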
I’ve encountered exactly the same problem on Paperspace/Gradient with a Free-GPU notebook, but so far I have not found a way to upgrade PyTorch from version 1.0.0 (“quota exceeded…”). Thanks for any help!