Using download_images while retaining the source file name

Hi,
I’m downloading images from URLs stored in a multi-tab CSV file using download_images. As per the master document list, this utility now saves images with the source file name. Do you have any idea which option I should use so that the downloaded files keep their source names?

I had the same question. Looking at the code, it seems that it’s not possible, so I changed the code locally to make it work:

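import re
from functools import partial
from pathlib import Path
from fastai.vision import *  # fastai v1; assumed to provide download_url, parallel, PathOrStr, Collection

def download_image(url, dest, timeout=4):
    # Download one file, overwriting any existing copy; print errors instead of raising.
    try: download_url(url, dest, overwrite=True, show_progress=False, timeout=timeout)
    except Exception as e: print(f"Error {url} {e}")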

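def _download_image_inner(dest, info, i, timeout=4):
    # `info` is a (url, name) pair; `i` is the index supplied by `parallel` and is unused here.
    url, name = info
    # Take the extension that sits right before the query string or the end of the URL; default to '.jpg'.
    suffix = re.findall(r'\.\w+?(?=(?:\?|$))', url)
    suffix = suffix[0] if len(suffix)>0 else '.jpg'
    download_image(url, dest/f"{name}{suffix}", timeout=timeout)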

def download_images(urls:Collection[str], dest:PathOrStr, names:PathOrStr=None, max_pics:int=1000, max_workers:int=8, timeout=4):
    "Download images listed in text file `urls` to path `dest`, at most `max_pics`; name them from text file `names` (one name per line) if given"
    urls = open(urls).read().strip().split("\n")[:max_pics]
    if names:
        names = open(names).read().strip().split("\n")[:max_pics]
    else:
        # No names file: fall back to zero-padded indices (00000000, 00000001, ...)
        names = [f"{index:08d}" for index in range(len(urls))]
    info_list = list(zip(urls, names))
    dest = Path(dest)
    dest.mkdir(exist_ok=True)
    parallel(partial(_download_image_inner, dest, timeout=timeout), info_list, max_workers=max_workers)
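In case the suffix handling is unclear, here is a minimal standalone sketch of what the regex in _download_image_inner does. The helper name guess_suffix and the example URLs are made up for illustration; they are not part of fastai.

```python
import re

def guess_suffix(url, default='.jpg'):
    # Hypothetical helper (not part of fastai). Same regex as in
    # _download_image_inner: keep the first ".ext" that is immediately
    # followed by "?" or by the end of the URL.
    found = re.findall(r'\.\w+?(?=(?:\?|$))', url)
    return found[0] if found else default

print(guess_suffix('https://example.com/pics/cat.png?size=large'))  # .png
print(guess_suffix('https://example.com/pics/dog.jpeg'))            # .jpeg
print(guess_suffix('https://example.com/pics/12345'))               # .jpg (fallback)
```

One caveat: a URL that ends in a bare domain, such as https://example.com, would yield '.com' as the suffix, so this is only a heuristic.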

To use the modified download_images, pass a second file containing the names (one per line, in the same order as the URLs file), the same way you pass the URLs file:

download_images(base_path/file_urls, dest, names=base_path/file_names, max_pics=100)
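If you want the names to match the source file names, the names file could be generated from the URLs file. This is only a sketch under assumptions: names_from_urls and write_names_file are hypothetical helpers, and it assumes the last path component of each URL is the file name you want to keep.

```python
from pathlib import Path
from urllib.parse import urlparse, unquote

def names_from_urls(urls):
    # Hypothetical helper: take the last path component of each URL and
    # drop its extension, because the modified download_images appends
    # the suffix itself.
    names = []
    for i, url in enumerate(urls):
        stem = Path(unquote(urlparse(url).path)).stem
        # Fall back to a zero-padded index if the URL has no file name.
        names.append(stem or f"{i:08d}")
    return names

def write_names_file(urls_file, names_file):
    # Read one URL per line and write one name per line, in the same order.
    urls = Path(urls_file).read_text().strip().split("\n")
    Path(names_file).write_text("\n".join(names_from_urls(urls)))

print(names_from_urls([
    'https://example.com/pics/cat.png?size=large',
    'https://example.com/pics/dog.jpeg',
    'https://example.com/',
]))
```

Note that two URLs with the same file name would produce the same name, and the later download would overwrite the earlier one, so you may want to check for duplicates first.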

Thanks Daniel… I have another question. I’m working on a multi-level image classification problem and need the prediction for every image in the train/validation dataset. I’m using the following to create the DataBunch:

data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.1, ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

I’m getting nice predictions. However, the output file does not contain the image names, only a series of integers (0, 1, 2, …), as shown below:

|Image URL|Category|predictions|
|---|---|---|
|0|4|(tensor(0.1108), tensor(0.0471), tensor(0.1016), tensor(0.0283), tensor(0.5477), tensor(0.1645))|
|1|3|(tensor(0.0773), tensor(0.0042), tensor(0.1109), tensor(0.5051), tensor(0.0715), tensor(0.2310))|
|2|4|(tensor(0.0959), tensor(0.0434), tensor(0.1411), tensor(0.0792), tensor(0.4816), tensor(0.1588))|

So I need either a way to pass the image names to ImageDataBunch, or a way to map the integers (0, 1, 2, …) in the first column back to the image URL names, like this:

|Image URL|Category|predictions|
|---|---|---|
|URL2|4|(tensor(0.1108), tensor(0.0471), tensor(0.1016), tensor(0.0283), tensor(0.5477), tensor(0.1645))|
|URL7|3|(tensor(0.0773), tensor(0.0042), tensor(0.1109), tensor(0.5051), tensor(0.0715), tensor(0.2310))|
|URL9|4|(tensor(0.0959), tensor(0.0434), tensor(0.1411), tensor(0.0792), tensor(0.4816), tensor(0.1588))|

Is there any way to do this?

I also tried the ImageList.from_csv option, without success.
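One way the mapping might be done is to pair each prediction row with the file path at the same position. The sketch below uses plain Python lists as stand-ins for the real data. The helpers rows_with_names and write_rows are hypothetical. It assumes that in fastai v1 the file paths are available in dataset order from something like data.valid_ds.x.items, and that the predictions come back in that same order; both should be checked against the installed version.

```python
import csv
from pathlib import Path

def rows_with_names(item_paths, categories, predictions):
    # Hypothetical helper: pair each prediction with the file name at the
    # same position. Assumes all three sequences are in the same order.
    if not (len(item_paths) == len(categories) == len(predictions)):
        raise ValueError("inputs must have the same length")
    rows = []
    for path, cat, probs in zip(item_paths, categories, predictions):
        # float() also works on 0-dim tensors, so tensor probabilities are fine.
        rows.append([Path(path).name, cat] + [float(p) for p in probs])
    return rows

def write_rows(rows, out_file, n_classes):
    # Write one row per image: file name, category, then one column per class.
    header = ["Image", "Category"] + [f"p{c}" for c in range(n_classes)]
    with open(out_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

# Toy stand-ins; with fastai v1 the paths would presumably come from
# data.valid_ds.x.items (an assumption, not verified here).
paths = ["imgs/URL2.jpg", "imgs/URL7.jpg"]
cats = [4, 3]
preds = [
    (0.1108, 0.0471, 0.1016, 0.0283, 0.5477, 0.1645),
    (0.0773, 0.0042, 0.1109, 0.5051, 0.0715, 0.2310),
]
rows = rows_with_names(paths, cats, preds)
print(rows[0][:2])
```

If the images were saved with the modified download_images, the file name (without its suffix) is the entry from the names file, so it can be matched back to the URL on the same line of the URLs file.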