Source code for verify_images function

Hey everyone,

I am new here - just working through the second chapter and came across the verify_images function. I am looking through the source code to get a better understanding. I understand the logic on a high level but I am a little unfamiliar with how it is written, this bit more specifically:

    return L(fns[i] for i,o in enumerate(parallel(verify_image, fns)) if not o)

I understand this might be super naive, but I would appreciate it if someone could either explain it or point me to where I can read more about this approach of writing code in a condensed/efficient manner.

Full source code is included for your reference:

def verify_images(fns):
    "Find images in `fns` that can't be opened"
    return L(fns[i] for i,o in enumerate(parallel(verify_image, fns)) if not o)

This is very condensed code, so I'll try to break it down.
On a high level, this function iterates over a list or tuple of filenames and tries to open each one as an image. It returns a list (specifically, the enhanced list type L) of all filenames that could not be opened.

First, verify_image is called, which tries to open the image at the filename fn. If the image can be opened, it returns True; if it can't be opened, it returns False.

def verify_image(fn):
    "Confirm that `fn` can be opened"
    try:
        im = Image.open(fn)  # uses PIL.Image to open an image at fn
        im.draft(im.mode, (32,32)) 
        im.load()
        return True
    except: return False
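
The "try it, return True, and return False on any error" pattern is not specific to images. As a minimal sketch of the same idea using only the standard library (can_parse is a hypothetical name made up for this illustration, not part of fastai):

```python
import json

def can_parse(text):
    "Return True if `text` is valid JSON, False otherwise"
    try:
        json.loads(text)   # raises an exception on invalid input
        return True
    except Exception:      # any failure counts as "could not be parsed"
        return False

# One check that should succeed and one that should fail.
checks = [can_parse('{"a": 1}'), can_parse("not json")]
```

This sketch uses except Exception rather than a bare except, so that Ctrl-C (KeyboardInterrupt) is not swallowed; that is a style choice in the illustration, not a change to the library code above.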

If you have an extensive list of filenames, this can take some time. Therefore, the function is applied in parallel.

def parallel(f, items, *args, n_workers=defaults.cpus, total=None, progress=None, pause=0,
             threadpool=False, timeout=None, chunksize=1, **kwargs):
    "Applies `func` in parallel to `items`, using `n_workers`"

The parallel function lets a task use multiple CPU cores. So instead of having one core check all filenames, multiple cores share the work, which can greatly speed up the operation. It returns the results of applying the function to each item, in the same order as the input; that ordering is what allows a result to be matched back to its filename by index later on. In this case, as verify_image was applied, the output is a list of True and False values.
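
To get a feeling for what "apply a function to every item in parallel and collect the results in order" means, here is a rough standard-library analogy. It is not how fastcore implements parallel; it uses a thread pool to keep the example simple, and is_even is a hypothetical stand-in for verify_image:

```python
from concurrent.futures import ThreadPoolExecutor

def is_even(n):
    "Hypothetical stand-in for verify_image: returns True or False per item"
    return n % 2 == 0

items = [1, 2, 3, 4]

# pool.map runs is_even on the items using up to 2 workers and
# yields the results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(is_even, items))
```

Because the results come back in input order, results[i] always belongs to items[i].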

The list is then enumerated. This means the first item o is returned together with its index i, then the second item with its index, then the third, and so on.
If the image could be opened, o is True. If the image could not be opened, o is False.
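
A small sketch of what enumerate produces, using a made-up list of results in place of the real output of parallel:

```python
# Made-up results: the second "image" could not be opened.
results = [True, False, True]

# enumerate yields (index, item) pairs, starting at index 0.
pairs = list(enumerate(results))

# Unpacking each pair gives the i and o used in verify_images.
failed_indices = [i for i, o in pairs if not o]
```

Here pairs is [(0, True), (1, False), (2, True)] and failed_indices is [1].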

In the last step, a list comprehension with a filtering condition is applied. It looks unfamiliar because it is wrapped in L, an enhanced version of the Python list. (Strictly speaking, what is passed to L is a generator expression, but it reads exactly like a list comprehension.) One could also write it as a classic list comprehension and lose the benefits of the L class:

[fns[i] for i,o in enumerate(parallel(verify_image, fns)) if not o]

If o is False (the image could not be opened), then not o is True, so the filename at position i (fns[i]) is included. Because this is a list comprehension, the result is a list of all filenames that could not be opened as images.
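
If the condensed form is hard to read, it can help to unroll it into an explicit loop. The sketch below does that with plain Python lists; looks_ok and the filenames are hypothetical stand-ins, so it runs without fastai or any image files:

```python
def looks_ok(fn):
    "Hypothetical stand-in for verify_image"
    return not fn.endswith(".broken")

fns = ["a.jpg", "b.broken", "c.jpg", "d.broken"]

# Stand-in for parallel(verify_image, fns): one True/False per filename.
results = [looks_ok(fn) for fn in fns]

# The one-liner, unrolled step by step.
failed = []
for i, o in enumerate(results):
    if not o:                  # the check failed for item i
        failed.append(fns[i])  # so keep the matching filename

# The same logic, condensed back into a comprehension.
failed_short = [fns[i] for i, o in enumerate(results) if not o]
```

Both versions give the same list; the comprehension just folds the loop, the condition, and the append into one expression.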

Thank you @BresNet for taking the time to explain that. It makes sense!
