Lesson 1 official topic

You may need to clean up your dataset a bit, but lesson 2 gives some tips about that. Continue the course, and come back to your variation of the model once you have learned more.

I didn't have internet turned on :man_facepalming:

got it working now.

Next, the tutorial instructs me to build an image uploader:

import ipywidgets as widgets
from IPython.display import display
from PIL import Image

uploader = widgets.FileUpload(accept='image/*', multiple=False)
display(uploader)

I’ve tried a few versions, but I always get:

Error displaying widget: model not found

Are you able to make an image uploader on Kaggle?

This is how I’ve done it:

import ipywidgets as widgets
uploader = widgets.FileUpload()
uploader
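
If it helps, here's one way to read the uploaded file back into an image afterwards. This is just a sketch assuming ipywidgets 7.x, where FileUpload exposes the raw bytes as uploader.data (on ipywidgets 8.x the uploaded content lives under uploader.value instead):

from fastai.vision.all import PILImage

# uploader.data holds the raw bytes of each uploaded file (ipywidgets 7.x);
# on ipywidgets 8.x you'd access uploader.value[0]['content'] instead.
img = PILImage.create(uploader.data[0])
img.to_thumb(256, 256)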


Hello,

In the lesson, Jeremy uses forests to distinguish what is a bird and what is not. Could he have used any other object, such as a chair, a construction site, or really anything else?

I’m asking because I don’t know whether the object we compare against the thing we want to recognize has to be related to it somehow: e.g. we compare birds and forests because they are in the same “domain”… does it matter at all?

Thanks in advance!

It could be similar, or it could be totally different. It doesn’t matter!


Try adding the following argument to your DataBlock:

batch_tfms=aug_transforms()

What this essentially does is twist, warp, crop, and rotate your images in various ways, so your model learns to “see” them from many angles. It can improve accuracy by quite a bit.
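
For context, here is a rough sketch of where that argument fits, using the same DataBlock setup as the lesson notebook (the path, 192px squish resize, and batch size are just the notebook's defaults):

dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')],
    batch_tfms=aug_transforms()  # random flips, rotations, zooms, warps, lighting changes
).dataloaders(path, bs=32)

dls.show_batch(max_n=6)  # inspect a batch; the augmentations are applied per batch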


thx ! and also thx for the tip below :wink:


Yes, it could have been anything.


Hi everybody! I’m running the Kaggle notebook for the first lesson, but when running line 20:

from fastdownload import download_url
dest = 'bird.jpg'
download_url(urls[0], dest, show_progress=False)

from fastai.vision.all import *
im = Image.open(dest)
im.to_thumb(256,256)

I get a truly enormous error: the traceback below, followed by what looks like the HTML of one or more web pages about birds, labelled ‘HTTP error 404, ===ERROR BODY===’.

HTTPError                                 Traceback (most recent call last)
/tmp/ipykernel_17/3449441972.py in <module>
      1 from fastdownload import download_url
      2 dest = 'bird.jpg'
----> 3 download_url(urls[0], dest, show_progress=False)
      4 
      5 from fastai.vision.all import *

/opt/conda/lib/python3.7/site-packages/fastdownload/core.py in download_url(url, dest, timeout, show_progress)
     21         pbar.total = tsize
     22         pbar.update(count*bsize)
---> 23     return urlsave(url, dest, reporthook=progress if show_progress else None, timeout=timeout)
     24 
     25 # Cell

/opt/conda/lib/python3.7/site-packages/fastcore/net.py in urlsave(url, dest, reporthook, headers, timeout)
    182     dest = urldest(url, dest)
    183     dest.parent.mkdir(parents=True, exist_ok=True)
--> 184     nm,msg = urlretrieve(url, dest, reporthook, headers=headers, timeout=timeout)
    185     return nm
    186 

/opt/conda/lib/python3.7/site-packages/fastcore/net.py in urlretrieve(url, filename, reporthook, data, headers, timeout)
    147 def urlretrieve(url, filename=None, reporthook=None, data=None, headers=None, timeout=None):
    148     "Same as `urllib.request.urlretrieve` but also works with `Request` objects"
--> 149     with contextlib.closing(urlopen(url, data, headers=headers, timeout=timeout)) as fp:
    150         headers = fp.info()
    151         if filename: tfp = open(filename, 'wb')

/opt/conda/lib/python3.7/site-packages/fastcore/net.py in urlopen(url, data, headers, timeout, **kwargs)
    106         if not isinstance(data, (str,bytes)): data = urlencode(data)
    107         if not isinstance(data, bytes): data = data.encode('ascii')
--> 108     try: return urlopener().open(urlwrap(url, data=data, headers=headers), timeout=timeout)
    109     except HTTPError as e:
    110         e.msg += f"\n====Error Body====\n{e.read().decode(errors='ignore')}"

/opt/conda/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

/opt/conda/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

/opt/conda/lib/python3.7/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

/opt/conda/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

/opt/conda/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

Any ideas what’s going on here? I tried a Chromium browser (Opera) just in case Firefox was breaking something, but no luck. Pretty stumped, as I only started learning Python very recently.

I guess the URL stored in urls[0] is broken.

The previous lines might be the following:

urls = search_images('bird photos', max_images=1)
urls[0]

Make the following modifications and check if it works:

urls = search_images('bird photos', max_images=5)
urls[3]

Try with some other numbers as well.
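
Another option (just a sketch, still using the notebook's search_images and download_url) is to loop over the candidate URLs and keep the first one that downloads without an error:

from fastdownload import download_url

urls = search_images('bird photos', max_images=5)
dest = 'bird.jpg'
for url in urls:
    try:
        download_url(url, dest, show_progress=False)
        break              # stop at the first URL that works
    except Exception:      # e.g. an HTTP 404 from a stale search result
        continue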


Thanks! That worked perfectly.


Is there a way to source data for a segmentation task? Like a way to create segmentation masks from self-sourced data?

@Ifeanyi, you could try https://segment-anything.com/. It was released not long ago (a week ago?) and seems to be pretty good at segmenting anything… which I guess you could use to generate training data.
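
Once you have masks from whatever tool you use, fastai can load them the same way the CAMVID example in the lesson does, via SegmentationDataLoaders.from_label_func. A rough sketch, assuming a hypothetical folder layout with an images folder, masks named <image>_mask, and a codes.txt listing the class names:

from fastai.vision.all import *

path = Path('my-segmentation-data')  # hypothetical dataset folder
dls = SegmentationDataLoaders.from_label_func(
    path, bs=8,
    fnames=get_image_files(path/'images'),
    label_func=lambda o: path/'masks'/f'{o.stem}_mask{o.suffix}',
    codes=np.loadtxt(path/'codes.txt', dtype=str),  # class names, one per line
)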

Hello all,

In this first lesson, as I understand it, the model is fine-tuned to recognize whether something is a bird OR a forest (no other option of not-forest / not-bird).

I trained the model, then downloaded a picture of a cow and received the following results:

Code

label,index,probs = learn.predict(PILImage.create('cow.jpg'))
print(f"This is a: {label}.")
print(f"This is a: {index} label.")
print(f"Probability it's a bird: {probs[0]:.4f}")

Output

This is a: bird.
This is a: 0 label.
Probability it's a bird: 0.9228

Questions:

  1. The model can only decide between the two labels, correct? Since it does not have any other labels to use?
  2. I would imagine that, although it may pick the bird label since a bird is an animal and the cow is an animal, the confidence would not be so close to 1. Why would it give it 0.92?

Hey there! It seems like I’m taking the course a year late, and I’m not even sure if this topic is still active, but I’ll give it a try. I was experimenting with the Kaggle notebook and got a few questions I’d appreciate help with!

Question 1: when we search for the 400 images of birds and forests, where exactly are they downloaded and stored? As in: in the cloud where I’m running my Colab notebook, or somewhere on my computer that I’m just not seeing?

Question 2: how does the verify_images() function actually work? What is it verifying, and how does it do that?

Question 3: suppose I’m downloading images of birds and forests, but it turns out several of my bird images are corrupted/cannot be used. How does this imbalance in the dataset affect the predictions my model makes? Will it be biased to predict more often the category with the greatest number of valid images or something like that?

Question 4: how do I know the ideal number of fine-tuning cycles to use? In other words, how do you tell the difference between making the model better and overfitting?

Thanks a lot for the help!

Hey there! A few answers from someone also trying to figure it all out:

  1. Yes, the model can only predict one of the two categories you trained it on. If you feed it anything else (such as the photo of a cow), it will see how close the cow is to those two categories and return its best guess. To get more labels in the output, you need to train it on them.

  2. I believe it’s not so much about the bird and the cow both being animals, but rather: how similar are the features of the cow photo to the features of the bird and forest categories? The shapes, the colors, the combinations of pixels… I might be wrong here, but I think the 92% means something along the lines of “the features of the cow photo match the features we usually see in bird photos better than those we usually see in forest photos”. But when we say “features”, again, think of it in abstract terms, such as the shapes and colors present in each one.
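
To add a bit to point 2: with only two categories, the final softmax forces the probabilities to sum to 1, so the model has to look “confident” about one of them even for an image that matches neither well. A small illustration (not from the lesson, just hypothetical numbers):

import torch

logits = torch.tensor([1.5, -1.0])  # hypothetical raw scores for (bird, forest)
probs = torch.softmax(logits, dim=0)
print(probs)                        # ≈ tensor([0.9241, 0.0759]) -- the two always sum to 1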


Is it possible to understand what the weights might look like for a particular DL task, and how exactly they would impact the model? Or is that hard, and should we treat them as a black box, something hidden inside a NN?

For an image classifier, what would the weights look like? For tabular tasks, what would they look like? And so on…

An example from chapter 1:

For instance, in Samuel’s checkers program, different values of the weights would result in different checkers-playing strategies.

How to know this for other tasks?

We explored 5 different tasks in Lesson 1 (lecture and the chapter):

  1. Image Classification
  2. Image Segmentation
  3. Sentiment Analysis with NLP
  4. Tabular Analysis
  5. Collaborative Filtering

Out of these 5, we can use only 1 and 3 to test with our own input data, right?!

The others are sort of locked in, based on the data used to train them. For example, I can’t use the segmentation model to segment my own images, as it was trained on the particular CAMVID dataset. Likewise, I can’t input anything into the tabular analysis model, as it is specific to what we did. Same for collaborative filtering.

The only reason I am asking is that I want to try something with my own data, and hence want to know where it is possible and where it is not, at least as of Lesson 1.

Good day everybody,

Just finished Lesson 1 and I’m trying to train my own model. It has two categories, just like the bird and forest example, but in my case I have uploaded the dataset to Kaggle directly, and I am not using the functions to download images from the web.

I am getting stuck on how to load the images from the input/dataset directory.
Should I be reading them from input/dataset, or should I copy them to output/working/my-dataset?

I’ve tried both, but I get the same error when I create the DataBlock:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[82], line 9
      1 dls = DataBlock(
      2     blocks=(ImageBlock, CategoryBlock), 
      3     get_items=get_image_files, 
   (...)
      6     item_tfms=[Resize(192, method='squish')]
      7 ).dataloaders(path, bs=32)
----> 9 dls.show_batch(max_n=6)

File /opt/conda/lib/python3.10/site-packages/fastai/data/core.py:149, in TfmdDL.show_batch(self, b, max_n, ctxs, show, unique, **kwargs)
    147     old_get_idxs = self.get_idxs
    148     self.get_idxs = lambda: Inf.zeros
--> 149 if b is None: b = self.one_batch()
    150 if not show: return self._pre_show_batch(b, max_n=max_n)
    151 show_batch(*self._pre_show_batch(b, max_n=max_n), ctxs=ctxs, max_n=max_n, **kwargs)

File /opt/conda/lib/python3.10/site-packages/fastai/data/load.py:171, in DataLoader.one_batch(self)
    170 def one_batch(self):
--> 171     if self.n is not None and len(self)==0: raise ValueError(f'This DataLoader does not contain any batches')
    172     with self.fake_l.no_multiproc(): res = first(self)
    173     if hasattr(self, 'it'): delattr(self, 'it')

ValueError: This DataLoader does not contain any batches

Before the DataBlock, I have this code:

path = Path('/kaggle/working/carnival-dataset/carnival')

failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)

which returns zero.

Could anyone point out what I am doing wrong, please? Thank you!

I"ve worked it out. /input is readonly, so I copied the directory to /working and can progress further :slight_smile: