_url2path Question

KevinB · September 1, 2019, 5:26pm

Currently, _url2path is defined as follows:

def _url2path(url, c_key=ConfigKey.Archive):
    fname = url.split('/')[-1]
    local_path = URLs.LOCAL_PATH/('models' if c_key==ConfigKey.Model else 'data')/fname
    if local_path.exists(): return local_path
    return get_path(c_key)/fname

My question is, why do we look at local_path before we return get_path(c_key)/fname?

My thought is that we should always return the get_path(c_key)/fname path so that it behaves consistently. Curious to get everybody else’s thoughts on this.

My proposed change:

def _url2path(url, c_key=ConfigKey.Archive):
    fname = url.split('/')[-1]
    return get_path(c_key)/fname

nareshr8 · September 2, 2019, 3:53am

I guess it checks whether the file has already been downloaded. If we already have the file, we just use the local copy.

sgugger · September 2, 2019, 12:26pm

We look at the local path because the repo comes with 4 of the tiny datasets. That way they are already there for the tests in the CI, which then doesn’t need to download them again.
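
To make the lookup order concrete, here is a minimal sketch of the same "bundled copy first, config path otherwise" logic. It is an illustration, not fastai's code: the name url2path_sketch, the two directory arguments (standing in for URLs.LOCAL_PATH/'data' and get_path(c_key)), and the example.com URLs are all assumptions made up for this example.

```python
from pathlib import Path
import tempfile

def url2path_sketch(url, local_dir, config_dir):
    """Hypothetical stand-in for _url2path: prefer a copy bundled in
    local_dir, otherwise fall back to the path under config_dir."""
    fname = url.split('/')[-1]
    local_path = Path(local_dir) / fname
    # A file shipped with the repo wins, so nothing needs to be downloaded.
    if local_path.exists():
        return local_path
    # Otherwise return where the file would live in the configured directory.
    return Path(config_dir) / fname

with tempfile.TemporaryDirectory() as tmp:
    repo_data = Path(tmp) / 'repo' / 'data'      # assumed repo-local folder
    config_data = Path(tmp) / 'config' / 'data'  # assumed configured folder
    repo_data.mkdir(parents=True)
    config_data.mkdir(parents=True)

    # Pretend one tiny dataset ships with the repo.
    (repo_data / 'tiny.tgz').touch()

    bundled = url2path_sketch('https://example.com/tiny.tgz', repo_data, config_data)
    missing = url2path_sketch('https://example.com/big.tgz', repo_data, config_data)

    bundled_is_local = bundled.parent == repo_data
    missing_is_config = missing.parent == config_data
    print(bundled_is_local, missing_is_config)
```

With the proposed simplification, both calls would resolve under the configured folder, so the bundled file would be ignored and the tests would have to download it.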