No need to apologise - just trying to help you get help!
I figured it out. Somehow the initial download of the dataset didn’t have the test folder. I deleted both the imdb and imdb_tok folders and re-ran the script. No errors. Lesson learned - check your downloads.
Hi there,
I’m using the first lesson to see if I can train an AI to differentiate between a sardine and a codfish. Everything runs perfectly in the Jupyter notebook, but once I create a standalone Python file, I get this error:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/pjfonseca/python/is_sardine.py", line 39, in <module>
resize_images(path/o, max_size=400, dest=path/o)
File "/home/pjfonseca/.local/lib/python3.10/site-packages/fastai/vision/utils.py", line 105, in resize_images
parallel(resize_image, files, src=path, n_workers=max_workers, max_size=max_size, dest=dest, n_channels=n_channels, ext=ext,
File "/home/pjfonseca/.local/lib/python3.10/site-packages/fastcore/parallel.py", line 110, in parallel
return L(r)
File "/home/pjfonseca/.local/lib/python3.10/site-packages/fastcore/foundation.py", line 97, in __call__
return super().__call__(x, *args, **kwargs)
File "/home/pjfonseca/.local/lib/python3.10/site-packages/fastcore/foundation.py", line 105, in __init__
items = listify(items, *rest, use_list=use_list, match=match)
File "/home/pjfonseca/.local/lib/python3.10/site-packages/fastcore/basics.py", line 61, in listify
elif is_iter(o): res = list(o)
File "/usr/lib64/python3.10/concurrent/futures/process.py", line 570, in _chain_from_iterable_of_lists
for element in iterable:
File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 609, in result_iterator
yield fs.pop().result()
File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
shutil.SameFileError: Path('sardine_or_not/sardines/104deea5-c161-4d02-803a-2f0e8b2e4a4a.jpg') and Path('sardine_or_not/sardines/104deea5-c161-4d02-803a-2f0e8b2e4a4a.jpg') are the same file
Since I’m not a Python expert but do have some coding skills, it seems to me that the resized files should get different names to avoid conflicts. What am I doing wrong? Sorry if this is a dumb question.
Loved the first lesson and jumping into lesson 2 in a few.
It looks like the exception is thrown by shutil, which is complaining that the source and destination are the same file.
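A minimal standalone sketch of that error (the file name here is made up, and my assumption is that resize_images falls back to a plain shutil copy for images that don’t need resizing, which fails when src and dest are the same folder):

```python
import os
import shutil
import tempfile

# Reproduce shutil.SameFileError: copying a file onto itself is rejected.
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "img.jpg")   # hypothetical image file
    open(p, "wb").close()
    try:
        shutil.copyfile(p, p)        # src and dest are the same path
    except shutil.SameFileError as e:
        print("raised:", type(e).__name__)  # raised: SameFileError
```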
@PJFonseca welcome to the fastai community. Can you provide the code, so we too can try reproducing the same issue?
I’m facing the same issue - Same File Path Error while resizing images in lesson 1 (I didn’t find this question asked previously, hence I just created it on the forum).
Sure, here it is:
from fastcore.all import *
from fastai.vision.all import *
from fastdownload import download_url
import time
def search_images(term, max_images=200):
    url = 'https://duckduckgo.com/'
    res = urlread(url, data={'q':term})
    searchObj = re.search(r'vqd=([\d-]+)\&', res)
    requestUrl = url + 'i.js'
    params = dict(l='us-en', o='json', q=term, vqd=searchObj.group(1), f=',,,', p='1', v7exp='a')
    urls,data = set(),{'next':1}
    while len(urls)<max_images and 'next' in data:
        data = urljson(requestUrl, data=params)
        urls.update(L(data['results']).itemgot('image'))
        requestUrl = url + data['next']
        time.sleep(0.2)
    return L(urls)[:max_images]
urls = search_images('photos of sardines', max_images=1)
urls[0]
dest = 'sardines.jpg'
download_url(urls[0], dest, show_progress=False)
im = Image.open(dest)
im.to_thumb(256,256)
download_url(search_images('photos of cod fish', max_images=1)[0], 'codfish.jpg', show_progress=False)
Image.open('codfish.jpg').to_thumb(256,256)
searches = 'codfish','sardines'
path = Path('sardine_or_not')
for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'photos of {o}'))
    resize_images(path/o, max_size=400, dest=path/o)
failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(100, method='squish')]
).dataloaders(path)
dls.show_batch(max_n=6)
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)
is_sardine,_,probs = learn.predict(PILImage.create('codfish.jpg'))
print(f"This is a: {is_sardine}.")
print(f"Probability it's a sardine: {probs[0]:.4f}")
Try deleting all subfolders under /sardine_or_not and see if you can run the code again.
edit: you actually don’t need to run the following code in this cell again
searches = 'forest','bird'
path = Path('bird_or_not')
for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    resize_images(path/o, max_size=400, dest=path/o)
since all the images were already downloaded the previous time you ran the code, so running it twice produces an error message.
But if you really want to run this cell and repeat the download process for some reason, you may need to delete those previously downloaded images manually, or by adding shutil.rmtree, which will remove all files and folders under /bird_or_not:
searches = 'forest','bird'
path = Path('bird_or_not')
import shutil
shutil.rmtree(path)
for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    resize_images(path/o, max_size=400, dest=path/o)
Try and see if it works.
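One caveat worth noting (my addition, not from the lesson): shutil.rmtree raises FileNotFoundError if the folder doesn’t exist yet, so on a completely fresh run you may want to guard it. A minimal sketch:

```python
import shutil
from pathlib import Path

path = Path("bird_or_not")
# rmtree fails on a fresh run when the folder doesn't exist yet,
# so only delete it if it is actually there
if path.exists():
    shutil.rmtree(path)
```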
Looking at your code, the way you have used the resize_images
method looks wrong to me, as you are resizing into the same folder. In Jeremy’s notebook for the Paddy competition, I have seen him resize like the code below:
trn_path = Path('sml')
resize_images(path/'train_images', dest=trn_path, max_size=256, recurse=True)
I have tweaked the code similarly and it’s running as expected when I tried running it as a python script.
from fastcore.all import *
from fastai.vision.all import *
from fastdownload import download_url
import time
def search_images(term, max_images=200):
    url = "https://duckduckgo.com/"
    res = urlread(url, data={"q": term})
    searchObj = re.search(r"vqd=([\d-]+)\&", res)
    requestUrl = url + "i.js"
    params = dict(
        l="us-en", o="json", q=term, vqd=searchObj.group(1), f=",,,", p="1", v7exp="a"
    )
    urls, data = set(), {"next": 1}
    while len(urls) < max_images and "next" in data:
        data = urljson(requestUrl, data=params)
        urls.update(L(data["results"]).itemgot("image"))
        requestUrl = url + data["next"]
        time.sleep(0.2)
    return L(urls)[:max_images]
urls = search_images("photos of sardines", max_images=1)
urls[0]
dest = "sardines.jpg"
download_url(urls[0], dest, show_progress=False)
im = Image.open(dest)
im.to_thumb(256, 256)
download_url(
    search_images("photos of cod fish", max_images=1)[0],
    "codfish.jpg",
    show_progress=False,
)
searches = "codfish", "sardines"
path = Path("sardine_or_not")
resize_path = Path("resized")
# print searches
for o in searches:
    dest = path / o
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f"photos of {o}"))
    resize_images(path / o, max_size=400, dest=resize_path / o)
failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(100, method="squish")],
).dataloaders(path)
dls.show_batch(max_n=6)
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(1)
Looking into Is it a bird? Creating a model from your own data | Kaggle and comparing, “your” version is easier to understand, and I can see why the error occurs.
Thank you so much for the help. Not knowing Python makes me ask these dumb questions, sorry.
There are no dumb questions.
Please don’t apologise - there are no dumb questions! Thank you very much for asking
I was following up with the lesson 1 notebook, and trying to use a different dataset.
I was curious and had a couple of questions :
- The valid_loss is currently 0.8* (* signifies any following digits) in the model that I trained; in the official lesson 1 notebook, I saw that the valid_loss is 0.02* after 3 epochs. The difference between the two losses seems noteworthy. Does it imply that my dataset isn’t quite good enough for the model to train on yet? I used the exact same commands as in the lesson 1 notebook (just with a different dataset).
- I see that the loss increases at the 2nd epoch, and then decreases in the 3rd epoch. Shouldn’t the loss ideally move in one direction (i.e., decrease after each epoch)? What does this imply?
I’m new to Deep Learning, so please excuse me if my questions are too basic.
I just cloned this Kaggle notebook to my account, but when I try to run the second cell in it (the one that installs the fastai package), I get the following error:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-io 0.21.0 requires tensorflow-io-gcs-filesystem==0.21.0, which is not installed.
explainable-ai-sdk 1.3.2 requires xai-image-widget, which is not installed.
tensorflow 2.6.2 requires numpy~=1.19.2, but you have numpy 1.20.3 which is incompatible.
tensorflow 2.6.2 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.
tensorflow 2.6.2 requires typing-extensions~=3.7.4, but you have typing-extensions 3.10.0.2 which is incompatible.
tensorflow 2.6.2 requires wrapt~=1.12.1, but you have wrapt 1.13.3 which is incompatible.
tensorflow-transform 1.5.0 requires absl-py<0.13,>=0.9, but you have absl-py 0.15.0 which is incompatible.
tensorflow-transform 1.5.0 requires numpy<1.20,>=1.16, but you have numpy 1.20.3 which is incompatible.
tensorflow-transform 1.5.0 requires pyarrow<6,>=1, but you have pyarrow 6.0.1 which is incompatible.
tensorflow-transform 1.5.0 requires tensorflow!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,<2.8,>=1.15.2, but you have tensorflow 2.6.2 which is incompatible.
tensorflow-serving-api 2.7.0 requires tensorflow<3,>=2.7.0, but you have tensorflow 2.6.2 which is incompatible.
flake8 4.0.1 requires importlib-metadata<4.3; python_version < "3.8", but you have importlib-metadata 4.11.3 which is incompatible.
apache-beam 2.34.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.4 which is incompatible.
apache-beam 2.34.0 requires httplib2<0.20.0,>=0.8, but you have httplib2 0.20.2 which is incompatible.
apache-beam 2.34.0 requires pyarrow<6.0.0,>=0.15.1, but you have pyarrow 6.0.1 which is incompatible.
aioitertools 0.10.0 requires typing_extensions>=4.0; python_version < "3.10", but you have typing-extensions 3.10.0.2 which is incompatible.
aiobotocore 2.1.2 requires botocore<1.23.25,>=1.23.24, but you have botocore 1.24.20 which is incompatible.
You can safely ignore this error and proceed. Reference: Is it a bird? Creating a model from your own data | Kaggle
Thanks! I was able to reproduce the results. However, when I tried to replace ‘bird’, ‘forest’ with ‘tiger’, ‘lion’, it doesn’t seem to work completely. For example, it correctly guesses the category, but the probability is 0.0000. I’m guessing it might be related to the float formatting or rounding (the initial value is 5.174458692636108e-07). Does someone know what is wrong with it? Here’s my notebook: https://www.kaggle.com/code/knivetes/is-it-a-bird-creating-a-model-from-your-own-data
I’m not able to view your notebook (Error 404); I think I might not have permission to view it. But if I understand correctly, learn.predict()
returns 1) the label (or category), 2) the index to look up in the probability tensor, and 3) the probabilities for all categories (as a tensor).
For example, in the above screenshot, the learner predicted that the person is ‘sad’ based on index 2 (i.e., the 3rd position) in the probability tensor. Hope that helps!
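To make the three return values concrete, here is a plain-Python sketch; the label and probabilities are made-up stand-ins for a real learn.predict() result, not actual model output:

```python
# Hypothetical stand-in for learn.predict()'s return tuple:
# (predicted label, index of that label in the vocab, per-class probabilities)
pred, pred_idx, probs = "sad", 2, [0.01, 0.03, 0.95, 0.01]

# Index the probability tensor with the returned index to get the
# probability of the predicted class itself
print(f"This is: {pred}.")
print(f"Probability it's {pred}: {probs[pred_idx]:.4f}")  # 0.9500
```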
I just set the access to public, can you see it now?
Perhaps it’s because the order of the categories in searches
doesn’t matter: they are sorted alphabetically under the hood, so the probability array matches the alphabetical order of the labels.
It doesn’t seem like there is a problem there: the model is very confident about its prediction, since a probability of 0.000000517 for the first (alphabetical) category means roughly 0.9999995 for the predicted one.
I suspect the tiger stripes help the model to be confident.