Share your work here ✅

Hey, I am playing around with the same competition, and I am confused about how to create the validation set.
I see that you have used valid_pct=0.2. From what I understand, this argument randomly sets aside 20% of the data in the train folder as the validation set. Is this the correct way to do it? I ask only because this post mentions that we should not have any driver in common between the test and validation sets, and valid_pct=0.2 will not ensure this. So what is the correct approach here?
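
A minimal sketch of one way to approach this, assuming the goal is that no driver appears in both the training and validation sets: hold out whole drivers instead of random images. The record layout (image path plus driver id) and the helper name `split_by_driver` are hypothetical and not taken from the notebook under discussion.

```python
import random
from collections import defaultdict

def split_by_driver(records, valid_frac=0.2, seed=42):
    """Hold out whole drivers so no driver appears in both splits.

    `records` is a list of (image_path, driver_id) pairs. This layout is an
    assumption made for illustration only.
    """
    by_driver = defaultdict(list)
    for path, driver in records:
        by_driver[driver].append(path)

    # Shuffle the drivers (not the images) and hold out a fraction of them.
    drivers = sorted(by_driver)
    random.Random(seed).shuffle(drivers)
    n_valid = max(1, round(len(drivers) * valid_frac))
    valid_drivers = set(drivers[:n_valid])

    train = [p for d in drivers if d not in valid_drivers for p in by_driver[d]]
    valid = [p for d in sorted(valid_drivers) for p in by_driver[d]]
    return train, valid, valid_drivers

# Toy data: 10 made-up drivers with 5 images each.
records = [(f"img_{d}_{i}.jpg", f"p{d:03d}") for d in range(10) for i in range(5)]
train, valid, valid_drivers = split_by_driver(records)
print(len(train), len(valid), sorted(valid_drivers))
```

The resulting file lists (or their indices) could then be handed to whichever index-based or function-based splitter the installed fastai version provides; which one fits is something to check against that version's documentation.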

I find this experiment very interesting, both by design and as a learning exercise, so I had a look at your notebook.
I believe you can improve your score by working more with get_transforms. You are using the default settings and could probably improve the results by analysing the effect of each transform and tuning it. Cropping doesn’t look meaningful in this case.

Another question: looking at the data, it is not clear to me what you are trying to teach the network. There is only one image of each count. Should it count, or do pattern matching?

If you want the network to be able to distinguish between 49 and 50, it would help a lot to create thousands of different 49-dot and 50-dot images, and also to try a regression objective.
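
As a rough sketch of that suggestion, the snippet below generates many random images per count and pairs each with a float target suitable for regression. It is a simplification: dots are single pixels on a small grid, not drawn circles, and the function names and image size are made up for illustration.

```python
import random

def make_dot_image(n_dots, size=32, rng=None):
    """Return a size x size grid of 0/1 pixels with exactly n_dots single-pixel dots."""
    rng = rng or random.Random()
    cells = rng.sample(range(size * size), n_dots)  # distinct positions, no overlap
    image = [[0] * size for _ in range(size)]
    for c in cells:
        image[c // size][c % size] = 1
    return image

def make_dataset(counts, per_count, seed=0):
    """Build (image, target) pairs; the target is the count as a float."""
    rng = random.Random(seed)
    data = []
    for n in counts:
        for _ in range(per_count):
            data.append((make_dot_image(n, rng=rng), float(n)))
    return data

# A thousand different images for each of the two counts mentioned above.
dataset = make_dataset(counts=[49, 50], per_count=1000)
print(len(dataset))
```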

An interesting next level would be to do the same with dots of different sizes.

I would love to see you take this further.

Maybe the result could be improved if you don’t use a pretrained model (a fastai heresy!!) and instead train from scratch. Your images are quite different from ImageNet, so even the features learned by ResNet in its earlier layers might be only slightly useful.

Just to add, I suspect that enabling vertical flips in get_transforms would be useful too:

get_transforms(flip_vert=True)

Great stuff! Thanks for sharing!

Hi Pierre, just a little note about your post on Medium. Please be aware that deep learning is far from being an algorithm: it actually uses many different ones, but it cannot be considered an algorithmic approach. Your expression “a Deep Learning algorithm” seems to me an oxymoron. Sad no one told you that before.

I can’t seem to get my versions of your example to work…
I even copied and pasted your code from GitHub, but I get the same error as with my other attempts
when creating the DataBunch:
IndexError: index 0 is out of bounds for axis 0 with size 0

!curl https://course.fast.ai/setup/colab | bash
Is this required or not?
I do get this error here:
Updating fastai… spacy 2.0.18 has requirement numpy>=1.15.0, but you’ll have numpy 1.14.6 which is incompatible. Done.
TIA, Doug

Did you download the image data before creating the DataBunch?
Before creating the DataBunch, you need to generate the image dataset.
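
One quick way to check this is to count the image files under the data path before building the DataBunch. An IndexError like the one above may simply mean that no images were found there, though that is an assumption and not something confirmed at this point in the thread. The helper `count_images` and the extension list are hypothetical; the demo runs on a throwaway directory so that it works anywhere.

```python
from pathlib import Path
import tempfile

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed extensions, adjust as needed

def count_images(root):
    """Count image files under each immediate subfolder of root."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; was the dataset downloaded?")
    counts = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[sub.name] = sum(
            1 for f in sub.rglob("*") if f.suffix.lower() in IMAGE_EXTS
        )
    return counts

# Demo on a temporary directory with one image in train/ and an empty valid/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train" / "rose").mkdir(parents=True)
    (Path(tmp) / "train" / "rose" / "a.jpg").touch()
    (Path(tmp) / "valid").mkdir()
    demo_counts = count_images(tmp)
    print(demo_counts)
```

On Colab, the same helper could be pointed at the dataset path used in the notebook, for example `count_images('/content/plants')`. A missing folder or a count of zero would suggest the download step did not complete.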

It’s not much, but I made a model that classifies whether I or someone else said a trigger word.
I converted the recordings into waveform images and used a CNN to classify those images.
Github: https://github.com/MohamedElshazly/Cool-DeepLearning-projects-using-Fastai-/blob/master/activation_word%20(1).ipynb

Yes, I ran it from the top, so I’m puzzled.

IndexError Traceback (most recent call last)
in ()
1 data = ImageDataBunch.from_folder('/content/plants/', bs=bs,
----> 2 ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
3 data

/usr/local/lib/python3.6/dist-packages/fastai/vision/data.py in from_folder(cls, path, train, valid, valid_pct, classes, **kwargs)
116 path=Path(path)
117 il = ImageItemList.from_folder(path)
--> 118 if valid_pct is None: src = il.split_by_folder(train=train, valid=valid)
119 else: src = il.random_split_by_pct(valid_pct)
120 src = src.label_from_folder(classes=classes)

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in split_by_folder(self, train, valid)
176 def split_by_folder(self, train:str='train', valid:str='valid')->'ItemLists':
177 "Split the data depending on the folder (train or valid) in which the filenames are."
--> 178 return self.split_by_idxs(self._get_by_folder(train), self._get_by_folder(valid))
179
180 def random_split_by_pct(self, valid_pct:float=0.2, seed:int=None)->'ItemLists':

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in _get_by_folder(self, name)
172
173 def _get_by_folder(self, name):
--> 174 return [i for i in range_of(self) if self.items[i].parts[self.num_parts]==name]
175
176 def split_by_folder(self, train:str='train', valid:str='valid')->'ItemLists':

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in (.0)
172
173 def _get_by_folder(self, name):
--> 174 return [i for i in range_of(self) if self.items[i].parts[self.num_parts]==name]
175
176 def split_by_folder(self, train:str='train', valid:str='valid')->'ItemLists':

IndexError: index 0 is out of bounds for axis 0 with size 0

@douglas I’m able to reproduce the steps in the notebook. Can you please confirm whether the images actually got downloaded? What do you get if you type !ls /content? Are you getting the files shown in the image below?
image

big-cats.tar.gz data models plants-classification

There is no plants dir, just plants-classification.

@douglas You should have a /plants folder. I’m not sure why you’re not getting it. Hopefully you won’t see any errors if you run the notebook directly in Colab: https://colab.research.google.com/github/leslyarun/plant-fruit-classifier/blob/master/PlantClassifier.ipynb

Hi Fabrizio. Thanks for your feedback. I share your point of view and I just edited my post on medium.

my pleasure! nice post btw :wink:

I’m working on identifying birds as a test case.

The errors are some mislabeled images, a stuffed animal…

One question I have: every time I run the notebook I get slightly different answers, trending towards better results. Why?
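
One likely contributor, offered as an assumption and not a diagnosis of this notebook, is randomness: the validation split, the initial weights of new layers, the augmentations and the batch order can all change between runs. The traceback earlier in this thread shows that fastai's random_split_by_pct accepts a seed argument. The sketch below uses a hypothetical `random_split` helper in plain Python to show how fixing a seed makes a random split repeatable.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle items and hold out valid_pct of them; repeatable when seed is set."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * valid_pct)
    return shuffled[cut:], shuffled[:cut]  # (train, valid)

# Made-up file names standing in for the bird images.
files = [f"bird_{i}.jpg" for i in range(100)]
train_a, valid_a = random_split(files, seed=42)
train_b, valid_b = random_split(files, seed=42)
same_split = valid_a == valid_b
print(same_split)
```

A fixed split only removes one source of variation. If results keep improving from run to run, it may also be worth checking whether the learner is recreated each time or keeps training from the previous weights.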

You were correct regarding the transformations. I was actually able to reach 100% accuracy after:

  1. Adding more data
  2. Removing all transforms
  3. Training the full network

Full Notebook on GitHub.

Another question: looking at the data, it is not clear to me what you are trying to teach the network. There is only one image of each count. Should it count, or do pattern matching?

I’m not sure I understand; there are a thousand different images of each count. That said, the network is not learning to “count” because of the way I have set up the problem; it can only predict which class a given image fits into. For example, if I showed it an image with 10 circles, it would have no way to output “10” as an answer. As suggested by @miko, this project would probably be better formulated as a regression problem instead of a classification problem. I will investigate this tomorrow.
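
To make the difference concrete, here is a small sketch of the two label formulations. The set of counts is made up for illustration and is not the one used in the notebook; the function names are hypothetical.

```python
# Made-up label set: the only counts the classifier has ever seen.
counts_in_training = [45, 46, 47, 48, 49, 50]

# Classification: each count becomes a class index, and a prediction
# can only ever be one of the known classes.
class_to_idx = {c: i for i, c in enumerate(counts_in_training)}

def predict_class(scores):
    """Pick the class with the highest score and map it back to a count."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return counts_in_training[best]

# Regression: the target is the count itself, so the model's single
# output can land on any number, including counts never seen in training.
def regression_target(count):
    return float(count)

def round_prediction(y):
    """Turn a continuous regression output into an integer count."""
    return int(round(y))

print(predict_class([0.1, 0.05, 0.05, 0.1, 0.3, 0.4]))
print(round_prediction(9.8), 10 in class_to_idx)
```

With the classification setup, an image of 10 circles is forced into one of the six known classes. With the regression setup, an output near 10 is at least expressible, although whether the network would actually generalize that far is a separate question.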

An interesting next level would be to do the same with dots of different sizes.

I tried this too, and the network seemed to have no trouble reaching 100% accuracy on this problem as well!

Here is what the data looked like with various sized circles:

Thanks for the suggestion. I added this transform and removed most of the others (zoom, warp etc.) and was able to achieve 100% accuracy.

Full notebook on GitHub.

Wow! Look at that, this time we're getting 100% accuracy.  
It looks like if we throw enough data at it (and use proper transforms) this is a problem that can actually be trivially solved by convolutional neural networks.   
***I honestly did not expect that at all going into this.***

To be honest, me too :slight_smile:

This is quite impressive. I would have thought that dots of different sizes would be more difficult to handle. Thanks for doing the extra investigation.