Lesson 1 official topic

No need to apologise - just trying to help you get help! :smiley:

4 Likes

I figured it out. Somehow the initial download of the dataset didn’t have the test folder. I deleted both the imdb and imdb_tok folders and re-ran the script. No errors. Lesson learned - check your downloads. :slight_smile:

4 Likes

Hi there,

I’m using the first lesson to see if I can train an AI to differentiate between a sardine and a codfish. Everything runs perfectly in the Jupyter notebook, but once I create a standalone Python file, I get this error:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/pjfonseca/python/is_sardine.py", line 39, in <module>
    resize_images(path/o, max_size=400, dest=path/o)
  File "/home/pjfonseca/.local/lib/python3.10/site-packages/fastai/vision/utils.py", line 105, in resize_images
    parallel(resize_image, files, src=path, n_workers=max_workers, max_size=max_size, dest=dest, n_channels=n_channels, ext=ext,
  File "/home/pjfonseca/.local/lib/python3.10/site-packages/fastcore/parallel.py", line 110, in parallel
    return L(r)
  File "/home/pjfonseca/.local/lib/python3.10/site-packages/fastcore/foundation.py", line 97, in __call__
    return super().__call__(x, *args, **kwargs)
  File "/home/pjfonseca/.local/lib/python3.10/site-packages/fastcore/foundation.py", line 105, in __init__
    items = listify(items, *rest, use_list=use_list, match=match)
  File "/home/pjfonseca/.local/lib/python3.10/site-packages/fastcore/basics.py", line 61, in listify
    elif is_iter(o): res = list(o)
  File "/usr/lib64/python3.10/concurrent/futures/process.py", line 570, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
shutil.SameFileError: Path('sardine_or_not/sardines/104deea5-c161-4d02-803a-2f0e8b2e4a4a.jpg') and Path('sardine_or_not/sardines/104deea5-c161-4d02-803a-2f0e8b2e4a4a.jpg') are the same file

Since I’m not a Python expert but do have some coding skills, it seems to me that the resized files should have different names to avoid conflicts. What am I doing wrong? Sorry if this is a dumb question.

Loved the first lesson and jumping into lesson 2 in a few.

1 Like

It looks like the exception is thrown by shutil, which is complaining that the source and destination are the same file.
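A minimal way to reproduce that error outside fastai (a sketch in plain Python, no image files involved): shutil refuses to copy a file onto itself, which appears to be what happens internally when resize_images is given the same folder for both the source and dest.

```python
import os
import shutil
import tempfile

# Create a throwaway file and try to copy it onto itself,
# reproducing the shutil.SameFileError from the traceback.
fd, path = tempfile.mkstemp(suffix=".jpg")
os.close(fd)
try:
    shutil.copyfile(path, path)
    raised = False
except shutil.SameFileError:
    raised = True
finally:
    os.remove(path)

print(raised)  # → True
```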

@PJFonseca welcome to the fastai community :smiley:. Can you provide the code, so we too can try reproducing the same issue?

1 Like

I’m facing the same issue - Same File Path Error while Resizing images - lesson 1 (I didn’t find this question asked previously, so I just created it on the forum).

1 Like

Sure, here you have it:

from fastcore.all import *
from fastai.vision.all import *
from fastdownload import download_url
import time

def search_images(term, max_images=200):
    url = 'https://duckduckgo.com/'
    res = urlread(url,data={'q':term})
    searchObj = re.search(r'vqd=([\d-]+)\&', res)
    requestUrl = url + 'i.js'
    params = dict(l='us-en', o='json', q=term, vqd=searchObj.group(1), f=',,,', p='1', v7exp='a')
    urls,data = set(),{'next':1}
    while len(urls)<max_images and 'next' in data:
        data = urljson(requestUrl,data=params)
        urls.update(L(data['results']).itemgot('image'))
        requestUrl = url + data['next']
        time.sleep(0.2)
    return L(urls)[:max_images]


urls = search_images('photos of sardines', max_images=1)
urls[0]

dest = 'sardines.jpg'
download_url(urls[0], dest, show_progress=False)

im = Image.open(dest)
im.to_thumb(256,256)
download_url(search_images('photos of cod fish', max_images=1)[0], 'codfish.jpg', show_progress=False)
Image.open('codfish.jpg').to_thumb(256,256)

searches = 'codfish','sardines'
path = Path('sardine_or_not')

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'photos of {o}'))
    resize_images(path/o, max_size=400, dest=path/o)

failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)

dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(100, method='squish')]
).dataloaders(path)

dls.show_batch(max_n=6)

learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)

is_sardine,_,probs = learn.predict(PILImage.create('codfish.jpg'))
print(f"This is a: {is_sardine}.")
print(f"Probability it's a sardine: {probs[0]:.4f}")

1 Like

Try deleting all the subfolders under sardine_or_not and see if you can run the code again.

edit: you actually don’t need to run the following code in this cell again

searches = 'forest','bird'
path = Path('bird_or_not')

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    resize_images(path/o, max_size=400, dest=path/o)

since all the images were already downloaded the previous time you ran the code, so running it a second time produces an error.

But if you really want to run this cell and repeat the download process for some reason, you may need to delete the previously downloaded images manually, or by adding shutil.rmtree, which removes all files and folders under bird_or_not:

searches = 'forest','bird'
path = Path('bird_or_not')

import shutil
shutil.rmtree(path)

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    resize_images(path/o, max_size=400, dest=path/o)

Try and see if it works.

3 Likes

Looking at your code, the way you have used the resize_images method looks wrong to me, as you are resizing into the same folder. In Jeremy’s notebook for the Paddy competition, I have seen him resize like this:

trn_path = Path('sml')
resize_images(path/'train_images', dest=trn_path, max_size=256, recurse=True)

I have tweaked your code similarly, and it runs as expected as a Python script:

from fastcore.all import *
from fastai.vision.all import *
from fastdownload import download_url
import time


def search_images(term, max_images=200):
    url = "https://duckduckgo.com/"
    res = urlread(url, data={"q": term})
    searchObj = re.search(r"vqd=([\d-]+)\&", res)
    requestUrl = url + "i.js"
    params = dict(
        l="us-en", o="json", q=term, vqd=searchObj.group(1), f=",,,", p="1", v7exp="a"
    )
    urls, data = set(), {"next": 1}
    while len(urls) < max_images and "next" in data:
        data = urljson(requestUrl, data=params)
        urls.update(L(data["results"]).itemgot("image"))
        requestUrl = url + data["next"]
        time.sleep(0.2)
    return L(urls)[:max_images]


urls = search_images("photos of sardines", max_images=1)
urls[0]

dest = "sardines.jpg"
download_url(urls[0], dest, show_progress=False)

im = Image.open(dest)
im.to_thumb(256, 256)
download_url(
    search_images("photos of cod fish", max_images=1)[0],
    "codfish.jpg",
    show_progress=False,
)

searches = "codfish", "sardines"
path = Path("sardine_or_not")
resize_path = Path("resized")

# print searches
for o in searches:
    dest = path / o
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f"photos of {o}"))
    resize_images(path / o, max_size=400, dest=resize_path / o)

failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)

dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(100, method="squish")],
).dataloaders(path)

dls.show_batch(max_n=6)

learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(1)

2 Likes

Looking into Is it a bird? Creating a model from your own data | Kaggle and comparing, your version is more understandable, and I can see why the error occurs.

Thank you so much for the help. Not knowing Python makes me ask these dumb questions, sorry.

2 Likes

There are no dumb questions :smiley:.

5 Likes

Please don’t apologise - there are no dumb questions! Thank you very much for asking :smiley:

6 Likes

I was following up with the lesson 1 notebook, and trying to use a different dataset.

I was curious and had a couple of questions :

  1. The valid_loss is currently 0.8* (* signifies followed by any digits) in the model that I trained. In the lesson 1 official notebook, I saw that the valid_loss is 0.02* after 3 epochs. The difference between the two losses seems noteworthy. Does it imply that my dataset isn’t good enough for the model to train on yet? I used the exact same commands from the lesson 1 notebook (I just modified the dataset).
  2. I see that the loss increases at the 2nd epoch, and then decreases in the 3rd epoch. Shouldn’t the loss ideally be unidirectional (i.e., decreasing after each epoch)? What does this imply?

I’m new to Deep Learning, so please excuse me if my questions are too basic :sweat_smile:
1 Like

I just cloned this Kaggle notebook to my account but when I try to run the second cell in it (the one that installs the fastai package) I get the following error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-io 0.21.0 requires tensorflow-io-gcs-filesystem==0.21.0, which is not installed.
explainable-ai-sdk 1.3.2 requires xai-image-widget, which is not installed.
tensorflow 2.6.2 requires numpy~=1.19.2, but you have numpy 1.20.3 which is incompatible.
tensorflow 2.6.2 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.
tensorflow 2.6.2 requires typing-extensions~=3.7.4, but you have typing-extensions 3.10.0.2 which is incompatible.
tensorflow 2.6.2 requires wrapt~=1.12.1, but you have wrapt 1.13.3 which is incompatible.
tensorflow-transform 1.5.0 requires absl-py<0.13,>=0.9, but you have absl-py 0.15.0 which is incompatible.
tensorflow-transform 1.5.0 requires numpy<1.20,>=1.16, but you have numpy 1.20.3 which is incompatible.
tensorflow-transform 1.5.0 requires pyarrow<6,>=1, but you have pyarrow 6.0.1 which is incompatible.
tensorflow-transform 1.5.0 requires tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,<2.8,>=1.15.2, but you have tensorflow 2.6.2 which is incompatible.
tensorflow-serving-api 2.7.0 requires tensorflow<3,>=2.7.0, but you have tensorflow 2.6.2 which is incompatible.
flake8 4.0.1 requires importlib-metadata<4.3; python_version < "3.8", but you have importlib-metadata 4.11.3 which is incompatible.
apache-beam 2.34.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.4 which is incompatible.
apache-beam 2.34.0 requires httplib2<0.20.0,>=0.8, but you have httplib2 0.20.2 which is incompatible.
apache-beam 2.34.0 requires pyarrow<6.0.0,>=0.15.1, but you have pyarrow 6.0.1 which is incompatible.
aioitertools 0.10.0 requires typing_extensions>=4.0; python_version < "3.10", but you have typing-extensions 3.10.0.2 which is incompatible.
aiobotocore 2.1.2 requires botocore<1.23.25,>=1.23.24, but you have botocore 1.24.20 which is incompatible.
3 Likes

You can safely ignore this error and proceed :slight_smile: Reference: Is it a bird? Creating a model from your own data | Kaggle

6 Likes

Thanks! I was able to reproduce the results. However, when I tried to replace ‘bird’, ‘forest’ with ‘tiger’, ‘lion’, it doesn’t seem to work completely. For example, it correctly guesses the category, but the probability is 0.0000. I’m guessing it might be related to the float formatting or rounding (the initial value is 5.174458692636108e-07). Does anyone know what is wrong with it? Here’s my notebook: https://www.kaggle.com/code/knivetes/is-it-a-bird-creating-a-model-from-your-own-data
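A quick sanity check in plain Python suggests the value itself is fine: a probability of 5.17e-07 isn’t zero, it’s just rounded away by the four-decimal format string the notebook uses. Scientific notation makes it visible:

```python
# The probability from the notebook, formatted two ways.
p = 5.174458692636108e-07
print(f"{p:.4f}")  # → 0.0000  (rounded to four decimal places)
print(f"{p:.2e}")  # → 5.17e-07  (scientific notation keeps the magnitude)
```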

1 Like

I’m not able to view your notebook (error 404); I think I might not have permission to view it. But if I understand correctly, learn.predict() returns 1) the label (or category), 2) the index to look up in the probability tensor, and 3) the probabilities for all categories (as a tensor).

For example, in the screenshot above, the learner predicted that the person is ‘sad’ based on index 2 (the 3rd position) of the probability tensor. Hope that helps!
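As a minimal sketch with hypothetical values (no fastai needed), the tuple from learn.predict() can be used like this: the returned index picks the predicted class’s probability out of the tensor.

```python
# Hypothetical values mimicking what learn.predict() returns for a
# three-class model: (decoded label, vocab index, probability tensor).
pred_label = 'sad'
pred_idx = 2                    # index 2 = 3rd position in the vocab
probs = [0.01, 0.04, 0.95]      # one probability per category, in vocab order

# The probability of the predicted class is probs[pred_idx]:
print(f"{pred_label}: {probs[pred_idx]:.4f}")  # → sad: 0.9500
```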

I just set the access to public, can you see it now?

Perhaps it’s because the order of the categories in searches isn’t the order used for the probabilities: the categories are sorted alphabetically under the hood, so the probability array matches the alphabetical order.
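A plain-Python sketch of that point (assuming, as I understand it, that fastai builds the category vocab from the sorted unique labels): the probability tensor follows the alphabetical order of the folder names, not the order you wrote in searches.

```python
searches = ('tiger', 'lion')      # order used when downloading
vocab = sorted(set(searches))     # order the category vocab would use
print(vocab)                      # → ['lion', 'tiger']
# So probs[0] would correspond to 'lion' and probs[1] to 'tiger'.
```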

2 Likes

It doesn’t seem like there is a problem there; the model is very confident in its prediction at probability 0.000000517.
I suspect the tiger stripes help the model be confident.