Same File Path Error while Resizing images - lesson 1

Hello, I just finished lesson 1 of the course (newbie here). While executing this code:

searches = 'happy person','sad person','angry person'
path = Path('happy_sad_angry')

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    resize_images(path/o, max_size=400, dest=path/o)

I’m getting this error:


---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/root/mambaforge/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/root/mambaforge/lib/python3.9/concurrent/futures/process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/root/mambaforge/lib/python3.9/concurrent/futures/process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/root/mambaforge/lib/python3.9/site-packages/fastcore/parallel.py", line 58, in _call
    return g(item)
  File "/root/mambaforge/lib/python3.9/site-packages/fastai/vision/utils.py", line 93, in resize_image
    else: shutil.copy2(file, dest_fname)
  File "/root/mambaforge/lib/python3.9/shutil.py", line 444, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/root/mambaforge/lib/python3.9/shutil.py", line 244, in copyfile
    raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
shutil.SameFileError: Path('happy_sad_angry/happy person/bea49f63-fa26-431c-9853-1d487d692591.jpg') and Path('happy_sad_angry/happy person/bea49f63-fa26-431c-9853-1d487d692591.jpg') are the same file
"""

The above exception was the direct cause of the following exception:

SameFileError                             Traceback (most recent call last)
Input In [25], in <cell line: 4>()
      6 dest.mkdir(exist_ok=True, parents=True)
      7 download_images(dest, urls=search_images(f'{o} photo'))
----> 8 resize_images(path/o, max_size=400, dest=path/o)

File ~/mambaforge/lib/python3.9/site-packages/fastai/vision/utils.py:105, in resize_images(path, max_workers, max_size, recurse, dest, n_channels, ext, img_format, resample, resume, **kwargs)
    103 files = get_image_files(path, recurse=recurse)
    104 files = [o.relative_to(path) for o in files]
--> 105 parallel(resize_image, files, src=path, n_workers=max_workers, max_size=max_size, dest=dest, n_channels=n_channels, ext=ext,
    106                img_format=img_format, resample=resample, resume=resume, **kwargs)

File ~/mambaforge/lib/python3.9/site-packages/fastcore/parallel.py:123, in parallel(f, items, n_workers, total, progress, pause, threadpool, timeout, chunksize, *args, **kwargs)
    121     if total is None: total = len(items)
    122     r = progress_bar(r, total=total, leave=False)
--> 123 return L(r)

File ~/mambaforge/lib/python3.9/site-packages/fastcore/foundation.py:97, in _L_Meta.__call__(cls, x, *args, **kwargs)
     95 def __call__(cls, x=None, *args, **kwargs):
     96     if not args and not kwargs and x is not None and isinstance(x,cls): return x
---> 97     return super().__call__(x, *args, **kwargs)

File ~/mambaforge/lib/python3.9/site-packages/fastcore/foundation.py:105, in L.__init__(self, items, use_list, match, *rest)
    103 def __init__(self, items=None, *rest, use_list=False, match=None):
    104     if (use_list is not None) or not is_array(items):
--> 105         items = listify(items, *rest, use_list=use_list, match=match)
    106     super().__init__(items)

File ~/mambaforge/lib/python3.9/site-packages/fastcore/basics.py:59, in listify(o, use_list, match, *rest)
     57 elif isinstance(o, list): res = o
     58 elif isinstance(o, str) or is_array(o): res = [o]
---> 59 elif is_iter(o): res = list(o)
     60 else: res = [o]
     61 if match is not None:

File ~/mambaforge/lib/python3.9/concurrent/futures/process.py:562, in _chain_from_iterable_of_lists(iterable)
    556 def _chain_from_iterable_of_lists(iterable):
    557     """
    558     Specialized implementation of itertools.chain.from_iterable.
    559     Each item in *iterable* should be a list.  This function is
    560     careful not to keep references to yielded objects.
    561     """
--> 562     for element in iterable:
    563         element.reverse()
    564         while element:

File ~/mambaforge/lib/python3.9/concurrent/futures/_base.py:609, in Executor.map.<locals>.result_iterator()
    606 while fs:
    607     # Careful not to keep a reference to the popped future
    608     if timeout is None:
--> 609         yield fs.pop().result()
    610     else:
    611         yield fs.pop().result(end_time - time.monotonic())

File ~/mambaforge/lib/python3.9/concurrent/futures/_base.py:439, in Future.result(self, timeout)
    437     raise CancelledError()
    438 elif self._state == FINISHED:
--> 439     return self.__get_result()
    441 self._condition.wait(timeout)
    443 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~/mambaforge/lib/python3.9/concurrent/futures/_base.py:391, in Future.__get_result(self)
    389 if self._exception:
    390     try:
--> 391         raise self._exception
    392     finally:
    393         # Break a reference cycle with the exception in self._exception
    394         self = None

SameFileError: Path('happy_sad_angry/happy person/bea49f63-fa26-431c-9853-1d487d692591.jpg') and Path('happy_sad_angry/happy person/bea49f63-fa26-431c-9853-1d487d692591.jpg') are the same file

I checked my folder and don’t see any duplicate files downloaded. Can someone please help? Thanks!

2 Likes

Try deleting all three subfolders under /happy_sad_angry and see if you can run the code again,
i.e. delete these:
/happy_sad_angry/"happy person"/
/happy_sad_angry/"sad person"/
/happy_sad_angry/"angry person"/

Yeah, I did that but still faced the same error

That’s strange. :grimacing:
I got the same error when I tried to run this cell in Kaggle the second time too.

SameFileError: Path('bird_or_not/forest/43bf75a4-3020-4ad6-9f82-86e1895a93ee.jpg') 
and Path('bird_or_not/forest/43bf75a4-3020-4ad6-9f82-86e1895a93ee.jpg') 
are the same file

But once I deleted the folders using shutil.rmtree(path), as shown below, I could run the same cell many times:

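import shutil

searches = 'forest','bird'
path = Path('bird_or_not')

# Remove images left over from a previous run; ignore_errors=True avoids
# a FileNotFoundError when the folder does not exist yet.
shutil.rmtree(path, ignore_errors=True)

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    resize_images(path/o, max_size=400, dest=path/o)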
2 Likes

Hi,

I got the same error. What I did to solve the problem was to use 2 folders: one folder for downloading the images, and the other for containing the resized images:

searches = fruits_list
downloaded_path = Path("downloaded_fruits")
resized_path = Path("resized_fruits")
for o in searches:
    print(o)
    downloaded_dest = (downloaded_path/o)
    downloaded_dest.mkdir(exist_ok = True, parents = True)
    download_images(downloaded_dest, urls = search_images(f"{o} fruit photo"))
    resize_images(downloaded_path/o, max_size = 400, dest = resized_path/o)

Best,
Long.

9 Likes

Thanks, this worked! But I’m curious: when running the notebook without changing anything (i.e., just executing the is_bird classifier that Jeremy created), I didn’t encounter this error. However, when I tried a new dataset, with almost no change to the code, the cell failed with the above error.

Thanks for the answer, but that didn’t work in my case :frowning: . I had to provide a different destination path for resize_images to proceed further.

2 Likes

@kashish18 can you check if this works for you

2 Likes

Yeah, I also found the same hypothesis to be correct. The reason is probably how the resize_images function is written in fastai:

2 Likes

Yes, that worked, thanks! However, I was referring to this notebook: Is it a bird? Creating a model from your own data | Kaggle, where the dest parameter in resize_images is the same as the source path.

2 Likes

Hello, I also had the same issue. I investigated and found that in the fastai code, resize_image (fastai/utils.py at b3ec3e1ab032a34777565d1649f091110ff2fa8f · fastai/fastai · GitHub) calls shutil.copy2 even when the file doesn’t need to be modified (which can happen if the image is already small enough and the dest folder is the same as the source folder). And shutil.copy2 raises SameFileError when the source and destination are exactly the same file.
I’ve applied the same workaround as you: define a destination folder distinct from the source path.
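
The behaviour described above can be reproduced with the standard library alone. This is a minimal sketch that assumes nothing about fastai; it only shows that shutil.copy2 refuses to copy a file onto itself. The helper name and the file name are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path

def copy_onto_itself(file: Path) -> str:
    """Try to copy `file` onto itself and report what happened."""
    try:
        shutil.copy2(file, file)
    except shutil.SameFileError as e:
        return f"SameFileError: {e}"
    return "copied"

with tempfile.TemporaryDirectory() as tmp:
    img = Path(tmp) / "small_image.jpg"   # hypothetical file name
    img.write_bytes(b"not a real image")  # content is irrelevant to the copy
    result = copy_onto_itself(img)
    print(result)
```

When source and destination folders differ, the two paths never point at the same file, which is presumably why the two-folder workaround avoids the error.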

Furthermore, I will try to propose a PR to the fastai code to fix this issue.
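
One way such a fix could look is to skip the copy when source and destination are the same file. This is only a sketch under that assumption, not the content of any actual PR; copy_unless_same is a hypothetical helper name, not a fastai function.

```python
import shutil
import tempfile
from pathlib import Path

def copy_unless_same(src: Path, dst: Path) -> bool:
    """Copy src to dst unless both refer to the same existing file.

    Returns True if a copy was made, False if it was skipped.
    """
    if dst.exists() and src.samefile(dst):
        return False  # nothing to do: the file is already in place
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return True

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a" / "img.jpg"
    src.parent.mkdir()
    src.write_bytes(b"data")
    skipped = copy_unless_same(src, src)                          # same file: no copy
    copied = copy_unless_same(src, Path(tmp) / "b" / "img.jpg")   # other folder: copy
    print(skipped, copied)
```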

5 Likes

Hi everybody,
Here is the issue: SameFileError on resizeImages · Issue #3744 · fastai/fastai · GitHub

And the PR: fix same file error message when resizing image by cvergnes · Pull Request #3743 · fastai/fastai · GitHub

Hope it helps.

1 Like

Awesome Thanks!

1 Like

Hi Long,

Thank you for helping out with the error. If possible, please share the link to your notebook; it would really help.

Regards,
Ashish

Hi Kashish, I would like to go through your corrected code. If possible, please share the link to your notebook. Thanks in advance. :innocent:

1 Like

Hi Everyone,

Ran into the same issue as well and was stuck for two days. Thanks so much for all the helpful sharing, greatly appreciate it!

1 Like

Getting Started - Creating Categorical Image Predictor Models | fastpages Here’s a blog post I made for lesson 1; the section that I shared contains the corrected code :slight_smile:

2 Likes

Thank you Kashish. I have read the blog and thanks for explaining the snippets of code at the end. This is really helpful. :innocent:

Just three days … that’s all it took for this community to identify and resolve a frustrating error. Lesson 1 is awesome, but to really master it, you have to test the code, try something different and the moment you do, you get a very difficult error to resolve.

But, not for this community. Thanks everyone.

First @longm89 discovered a workaround, then a PR was opened, then @kashish18 published a very clean, clear blog post that explains everything, with a fix, before the PR was resolved.

Thanks Everyone !!

Another workaround that worked for me, and avoids having two folders, uses a try-except statement. IMHO this also saves a little storage. Are there any disadvantages to this approach?

import shutil

try:
    resize_images(path/o, max_size=400, dest=path/o)
except shutil.SameFileError:
    print("SameFileError occurred and was ignored")
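
For what it's worth, the pattern above only swallows SameFileError; any other failure still propagates. Here is a stdlib-only sketch of that behaviour, with shutil.copy2 standing in for resize_images (an assumption made so the example runs without fastai; copy_ignoring_same_file is a hypothetical name).

```python
import shutil
import tempfile
from pathlib import Path

def copy_ignoring_same_file(src: Path, dst: Path) -> str:
    """Stand-in for the resize call: ignore SameFileError only."""
    try:
        shutil.copy2(src, dst)
    except shutil.SameFileError:
        return "ignored"
    return "copied"

with tempfile.TemporaryDirectory() as tmp:
    img = Path(tmp) / "img.jpg"
    img.write_bytes(b"data")
    same = copy_ignoring_same_file(img, img)   # SameFileError is swallowed
    try:
        copy_ignoring_same_file(Path(tmp) / "missing.jpg", img)
        other = "no error"
    except FileNotFoundError:
        other = "propagated"                   # unrelated errors are not hidden
    print(same, other)
```

One caveat to keep in mind: in the original snippet the error surfaces from the whole resize_images call, so the except clause cannot tell you which files were actually resized. It may be worth checking the folder afterwards.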