Documentation improvements

A proposal to improve the docs of untar_data

untar_data

untar_data(url:str, fname:PathOrStr=None, dest:PathOrStr=None, data=True, force_download=False) → Path

Download url to fname if it doesn’t exist, and un-tgz to folder dest.


In the docstring above, "it" reads as referring to fname, but according to the source code it should refer to dest: download_data is executed only when not dest.exists() is True.

I would like to provide the following docs for untar_data

In general, untar_data uses a url to download a tgz file to fname, and then un-tgzs fname into a folder under dest.

After the initial download, if untar_data is run again with force_download=True, or if the tgz file at fname is corrupted somehow, the existing fname and dest are removed and the download starts again.

After the initial download, if dest does not exist (the folder could have been removed or renamed somehow), running untar_data will execute download_data. If the tgz file at fname still exists, nothing is actually downloaded and fname is simply un-tgz'd into dest; if fname does not exist either, the tgz file is downloaded first and then un-tgz'd.
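To make the cases above concrete, here is a minimal sketch of the control flow as I understand it. It is not the fastai source: untar_sketch, download and is_corrupted are hypothetical names, where download stands in for download_data and is_corrupted stands in for whatever check the library uses to detect a bad archive. I also assume the archive contains a single top-level folder with the same name as dest.

```python
import shutil
import tarfile
from pathlib import Path
from typing import Callable


def untar_sketch(
    fname: Path,
    dest: Path,
    download: Callable[[Path], None],  # hypothetical stand-in for download_data
    force_download: bool = False,
    is_corrupted: Callable[[Path], bool] = lambda p: False,  # hypothetical check
) -> Path:
    # Case 1: forced refresh or corrupted archive.
    # Remove both the archive and the extracted folder so everything is redone.
    if force_download or (fname.exists() and is_corrupted(fname)):
        if fname.exists():
            fname.unlink()
        if dest.exists():
            shutil.rmtree(dest)

    # Case 2: the extracted folder is missing. This is the condition that
    # decides whether anything happens at all.
    if not dest.exists():
        # Only hit the network when the archive itself is missing.
        if not fname.exists():
            download(fname)
        # Assumption: the archive holds one top-level folder named dest.name,
        # so extracting into dest.parent recreates dest.
        with tarfile.open(fname, "r:gz") as tar:
            tar.extractall(dest.parent)

    return dest
```

The point of the sketch is that the existence of dest, not fname, gates the whole download-and-extract step, which is why I think the docstring should say "if dest doesn't exist".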

Note: the url you feed to untar_data must be one of URLs.something.

What do you think of this version of docs? Thanks
@stas @sgugger
