Lesson 1: Regular expression fails to match Windows paths

Hello, I tried running the lesson 1 notebook on my local Windows setup and got an error on the line containing:

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)

AttributeError: 'NoneType' object has no attribute 'group'

It seems that the regular expression r'/([^/]+)_\d+.jpg$' is unable to match the Windows path, which is in the following format:

WindowsPath('D:/git/course-v3/nbs/dl1/data/oxford-iiit-pet/images/Abyssinian_1.jpg')

I am not very familiar with regular expressions, so I could really use some help. Here’s a gist of the notebook for a better look at the issue.

The regular expression is used here to extract class names from the file path. You can test a regex on sites such as https://regex101.com/. However, I couldn’t see anything obviously wrong with this regex, so you may want to double-check that all the files in path_img follow the expected naming format.
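For checking the file names locally, something like the following might help. It is only a rough sketch: find_unmatched is a made-up helper (not part of fastai), and the two sample file names are invented for illustration. In the notebook you would pass the real fnames and pat instead.

```python
import re

def find_unmatched(fnames, pat):
    """Return the file names (as strings) that the pattern fails to match.

    Hypothetical helper for debugging; not part of fastai.
    """
    compiled = re.compile(pat)
    return [str(fn) for fn in fnames if compiled.search(str(fn)) is None]

# Pattern from the lesson 1 notebook, as quoted in the question.
pat = r'/([^/]+)_\d+.jpg$'

# Invented sample names: one follows the expected format, one does not.
sample_fnames = ['images/Abyssinian_1.jpg', 'images/notes.txt']

unmatched = find_unmatched(sample_fnames, pat)
print(unmatched)
```

Any name that shows up in the returned list is one that would make pat.search(...) return None, which is what leads to the AttributeError on .group.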

Hey Madhurjya.

The gist looks fine. My only comment is that when you run path.ls() in the 6th cell, the output should show both the /images and /annotations paths. It may have been a faulty or incomplete download or extraction. Try looking in the path to see what is there, or re-download the dataset and run the notebook again.

Otherwise everything looks normal.

I saw that when the paths are converted to strings on Windows, the forward slashes are automatically changed to backslashes. Changing the regular expression to r'[/\\]([^/\\]+)_\d+.jpg$' fixes the code. Thank you for mentioning the website. It is super handy for testing regular expressions.
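Here is a small sketch of the difference between the two patterns. It uses pathlib's PureWindowsPath and PurePosixPath so it should behave the same on any OS; the POSIX-style path is an invented example, not one from the notebook.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

old_pat = re.compile(r'/([^/]+)_\d+.jpg$')          # original notebook pattern
new_pat = re.compile(r'[/\\]([^/\\]+)_\d+.jpg$')    # accepts / or \ as separator

# str() of a Windows path uses backslashes as separators.
win = str(PureWindowsPath('D:/git/course-v3/nbs/dl1/data/oxford-iiit-pet/images/Abyssinian_1.jpg'))
# Hypothetical Linux/macOS-style path for comparison.
posix = str(PurePosixPath('/data/oxford-iiit-pet/images/Abyssinian_1.jpg'))

old_on_win = old_pat.search(win)             # no '/' in the string, so no match
new_on_win = new_pat.search(win).group(1)    # class name from the Windows path
new_on_posix = new_pat.search(posix).group(1)

print(old_on_win, new_on_win, new_on_posix)
```

The old pattern returns None for the Windows string, which is where the 'NoneType' object has no attribute 'group' error comes from, while the revised pattern extracts the class name from both styles of path.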

Hey Kieran,

Thank you so much for reviewing my code. It seems that the download was indeed incomplete. However, that wasn’t what caused this issue. Converting the path to string changes the forward slashes to DOS-style backslashes. That is why the regular expression failed. I will raise a PR changing the regular expression, so others won’t have to deal with this.

Ah, maybe it depends on the platform? I tend to use pathlib to avoid problems with paths.

Okay, I have figured it out.

Here’s the function from_name_re from the current release of fastai on conda (1.0.38):

@classmethod
def from_name_re(cls, path:PathOrStr, fnames:FilePathList, pat:str, valid_pct:float=0.2, **kwargs):
    "Create from list of `fnames` in `path` with re expression `pat`."
    pat = re.compile(pat)
    def _get_label(fn): return pat.search(str(fn)).group(1)
    return cls.from_name_func(path, fnames, _get_label, valid_pct=valid_pct, **kwargs)

And here’s the one on the git repository’s master branch:

See that fn.as_posix() in the latter? That will generate paths with forward slashes after converting to str (even on Windows) and make the same regular expression work on both platforms.

So, the fix is either to install fastai from the git master branch or to change the regular expression from r'/([^/]+)_\d+.jpg$' to r'[/\\]([^/\\]+)_\d+.jpg$'.

OK, as_posix() is a pathlib method which, as you said, returns the path as a string with forward slashes (/).
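To illustrate, here is a minimal sketch of what as_posix() does to the path from the original question. It uses PureWindowsPath so it can be run on any platform, and it only demonstrates the pathlib behaviour; it is not the actual fastai code.

```python
import re
from pathlib import PureWindowsPath

# Original notebook pattern, which only accepts forward slashes.
pat = re.compile(r'/([^/]+)_\d+.jpg$')

p = PureWindowsPath('D:/git/course-v3/nbs/dl1/data/oxford-iiit-pet/images/Abyssinian_1.jpg')

as_str = str(p)          # backslash-separated on Windows-flavoured paths
as_posix = p.as_posix()  # forward slashes, regardless of flavour

no_match = pat.search(as_str)            # fails: no '/' in the string
label = pat.search(as_posix).group(1)    # works with the unchanged pattern

print(as_str)
print(as_posix)
print(no_match, label)
```

So if the label function searches fn.as_posix() rather than str(fn), the original pattern can stay as it is on both platforms.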