Error Downloading Dataset

Hello,

I am currently trying to go through lesson 1 and I keep receiving an error when I try to execute the following code: path = untar_data(URLs.PETS); path

I noticed that it seems to be a connection error, so I checked the URL for URLs.PETS, which shows up as "https://s3.amazonaws.com/fast-ai-imageclas/oxford-iiit-pet", but that URL seems to be broken.

I am using a Mac, but I am doing the actual lesson in a Kaggle kernel. I have also been able to replicate this issue on Colab, so I was wondering if someone could point out what I could be doing wrong. I have copy-pasted the error I am receiving below.

Thank you for your time.


gaierror Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/urllib3/connection.py in _new_conn(self)
140 conn = connection.create_connection(
--> 141 (self.host, self.port), self.timeout, **extra_kw)
142

/opt/conda/lib/python3.6/site-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
59
---> 60 for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
61 af, socktype, proto, canonname, sa = res

/opt/conda/lib/python3.6/socket.py in getaddrinfo(host, port, family, type, proto, flags)
744 addrlist = []
--> 745 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
746 af, socktype, proto, canonname, sa = res

gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

NewConnectionError Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
600 body=body, headers=headers,
--> 601 chunked=chunked)
602

/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
345 try:
--> 346 self._validate_conn(conn)
347 except (SocketTimeout, BaseSSLError) as e:

/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py in _validate_conn(self, conn)
849 if not getattr(conn, 'sock', None): # AppEngine might not have .sock
--> 850 conn.connect()
851

/opt/conda/lib/python3.6/site-packages/urllib3/connection.py in connect(self)
283 # Add certificate verification
--> 284 conn = self._new_conn()
285

/opt/conda/lib/python3.6/site-packages/urllib3/connection.py in _new_conn(self)
149 raise NewConnectionError(
--> 150 self, "Failed to establish a new connection: %s" % e)
151

NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fe28738cbe0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

MaxRetryError Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
448 retries=self.max_retries,
--> 449 timeout=timeout
450 )

/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
638 retries = retries.increment(method, url, error=e, _pool=self,
--> 639 _stacktrace=sys.exc_info()[2])
640 retries.sleep()

/opt/conda/lib/python3.6/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
387 if new_retry.is_exhausted():
--> 388 raise MaxRetryError(_pool, url, error or ResponseError(cause))
389

MaxRetryError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Max retries exceeded with url: /fast-ai-imageclas/oxford-iiit-pet.tgz (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fe28738cbe0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

During handling of the above exception, another exception occurred:

ConnectionError Traceback (most recent call last)
in ()
----> 1 path = untar_data(URLs.PETS); path

/opt/conda/lib/python3.6/site-packages/fastai/datasets.py in untar_data(url, fname, dest, data, force_download)
221 if dest.exists(): shutil.rmtree(dest)
222 if not dest.exists():
--> 223 fname = download_data(url, fname=fname, data=data)
224 data_dir = Config().data_path()
225 if url in _checks:

/opt/conda/lib/python3.6/site-packages/fastai/datasets.py in download_data(url, fname, data, ext)
203 if not fname.exists():
204 print(f'Downloading {url}')
--> 205 download_url(f'{url}{ext}', fname)
206 return fname
207

/opt/conda/lib/python3.6/site-packages/fastai/core.py in download_url(url, dest, overwrite, pbar, show_progress, chunk_size, timeout, retries)
174 s = requests.Session()
175 s.mount('http://',requests.adapters.HTTPAdapter(max_retries=retries))
--> 176 u = s.get(url, stream=True, timeout=timeout)
177 try: file_size = int(u.headers["Content-Length"])
178 except: show_progress = False

/opt/conda/lib/python3.6/site-packages/requests/sessions.py in get(self, url, **kwargs)
544
545 kwargs.setdefault('allow_redirects', True)
--> 546 return self.request('GET', url, **kwargs)
547
548 def options(self, url, **kwargs):

/opt/conda/lib/python3.6/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
531 }
532 send_kwargs.update(settings)
--> 533 resp = self.send(prep, **send_kwargs)
534
535 return resp

/opt/conda/lib/python3.6/site-packages/requests/sessions.py in send(self, request, **kwargs)
644
645 # Send the request
--> 646 r = adapter.send(request, **kwargs)
647
648 # Total elapsed time of the request (approximately)

/opt/conda/lib/python3.6/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
514 raise SSLError(e, request=request)
515
--> 516 raise ConnectionError(e, request=request)
517
518 except ClosedPoolError as e:

ConnectionError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Max retries exceeded with url: /fast-ai-imageclas/oxford-iiit-pet.tgz (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fe28738cbe0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

Hey, did you find a solution to this problem? I have the exact same issue. Following the link leads to Amazon AWS returning a "NoSuchKey" error.

Maybe someone could point out an alternative location for the dataset?
I've tried using the direct link from the University of Oxford (http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz), but this throws the same error.

I also installed Anaconda to check the current path, and it is the same as the one found on Kaggle, so there are no update issues there.

If anyone else has this issue, you can easily add a dataset by clicking + ADD DATASET and searching for the Oxford pet dataset. Multiple people have uploaded it; just click on it to add it to your kernel.
The path can then be referenced as '../input/images/', which contains the images.
There is no need to run the "untar" command, as Kaggle datasets are automatically unzipped (and don't need to be downloaded).
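For example, a quick sanity check (just a sketch; it assumes the dataset was added under the default folder name "images") could look like this:

from pathlib import Path

# hypothetical check: the added Kaggle dataset is mounted read-only under ../input
path = Path('../input/images')

# the files are already extracted, so we can list them directly
files = sorted(path.glob('*.jpg'))
print(len(files), 'images found')
print(files[:3])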
cheers

OK, it has been a while, but I have found the problem.

Since Kaggle needs phone verification before your notebook can connect to the internet, you just need to verify your phone number and then enable internet access in the notebook settings. That should fix it.
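If you want to confirm it worked before re-running the lesson, a small check like this (my own addition, not part of the course code) reproduces just the name-resolution step that was failing:

import socket

# name resolution is what failed above ("Temporary failure in name resolution"),
# so resolving the S3 host is a quick way to confirm internet access is on
try:
    socket.getaddrinfo('s3.amazonaws.com', 443)
    print('DNS works - internet access looks enabled')
except socket.gaierror as err:
    print('Still no internet access:', err)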


Is this just another command line, or do I have to put this instead of path.ls()?
'../input/images/' path

Thanks in advance

It's the path, relative from your current location, to the dataset. There's a display issue though (the forum shows two dots as three); it should be
'../input/images/'
/parent
/input/images
/you

If you type Path('../').ls() you will see the parent directory.

Sorry, I don't get it. I have the dataset from Kaggle in the workspace now. I also have the file (zipped) on my computer locally.

Is it possible now to edit the line path = … to something that will make direct use of the data in the workspace? Or is there no option, and is that what the ("read only") annotation on the add-on in the workspace means?

What do you mean by:

current relative path
parent directory

And where do I enter '../input/images/'? Is it valid for everyone, or does someone have to edit this path?

Sorry for these questions; I hope I have made my confusion clear this way.

Kaggle provides you with a Linux VM. Read up on relative and absolute path names here.
To use your own dataset you must first create a Kaggle dataset and import that into your Kaggle environment. To use the Oxford pet dataset you need to search for it and add it to your kernel.
Once you have added the dataset, it will be located in the input folder, as seen in the workspace dropdown on your right. This folder is not in the same directory as your working Jupyter notebook file, but is located in the parent folder.
There are multiple ways to see and navigate paths and directories in Python. I would recommend you use os for now.
import os
os.listdir("./")
will show you your current directory and the Jupyter notebook file you are in right now.
os.listdir("../") <- two dots, not three
will show the parent path. Here you can see the input folder.
Reference this input folder by storing it as a path:
path_variable = Path("../input") <-- two dots, not three
then use that path however you want. For example in a DataBunch:
data = ImageDataBunch.from_folder(path_variable, …
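Since the Kaggle copies of the pet images are usually one flat folder with the breed encoded in the filename, the from_name_re variant from lesson 1 may be a better fit than from_folder here. A minimal sketch, assuming fastai v1 and that layout (the regex is the one lesson 1 uses):

from fastai.vision import *

path_img = Path('../input/images')   # the Kaggle-mounted dataset
fnames = get_image_files(path_img)   # all image files in the folder

# labels such as "Abyssinian" are taken from filenames like Abyssinian_1.jpg
pat = r'/([^/]+)_\d+.jpg$'
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(), size=224, bs=64
                                  ).normalize(imagenet_stats)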

Thanks for the reply; however, it doesn't work for me. I get
['.ipynb_checkpoints', 'notebook_source.ipynb']
when entering os.listdir("./")

I found another kernel that uses the dataset on Kaggle (https://www.kaggle.com/tanlikesmath/oxford-iiit-pet-dataset-classification-with-fastai) with just the images, and it contains the following:
import os
cwd = os.getcwd()
print(cwd)

which prints out:
/kaggle/working

I tried defining the path for the images as mentioned in the kernel as follows:
path_img = Path('../input/images')

  • does not work

then I tried:
path_img = Path('kaggle/working/input/images')

  • does not work

when entering the code:
fnames = get_image_files(path_img)
fnames[:5]

I get the following error:
FileNotFoundError

Please read the answers properly!

Kaggle provides you with a Linux VM. Read up on relative and absolute path names here.

  • Learn how paths work in Linux.

  • The fast.ai forum seems to format two dots as three, which is why the command looked wrong (learn relative paths). It should be "../" (two dots and a slash, with no space in between).

os.listdir("./")
will show you your current directory and the containing jupyter file you are in right now.

  • os.listdir("./") is showing you exactly what it should, as I wrote: it's your Jupyter notebook file.

Kaggle provides you with a Linux VM. Read up on relative and absolute path names here.

  • Again, learn how to use Linux paths using the link. The correct way to address absolute paths is with a / at the beginning of the path; otherwise it's a relative path from your current location. Since there is no kaggle directory in your current directory, as seen with listdir, Path("kaggle/anything") can't work. A short illustration follows below.
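To make the relative/absolute distinction concrete, here is a small illustration (it assumes the usual Kaggle layout, where the notebook runs in /kaggle/working and added datasets are mounted under /kaggle/input):

import os
from pathlib import Path

print(os.getcwd())                     # absolute path of the current directory, e.g. /kaggle/working
print(Path('../input').resolve())      # the relative path '../input' resolved to an absolute one
print(Path('/kaggle/input').exists())  # absolute paths always start with '/'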

This error may mean that your Kaggle instance doesn't have access to the outside internet turned on, or that your account is unverified. Kaggle requires you to verify your account via SMS message if you want your notebook to access the outside internet.

  1. Check your internet settings in the notebook to see if it has access to the internet or if SMS verification is required.
  2. Complete verification or turn on internet access for the notebook.
  3. Restart the kernel and your session to make sure that the settings have taken effect (see the quick check below).
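A quick check (my own sketch, assuming fastai v1, where URLs.PETS points at the S3 bucket and download_data appends '.tgz') that the notebook can now reach the dataset host before re-running the cell:

import requests
from fastai.vision import *

# if internet access is enabled, this returns a status code
# instead of raising a ConnectionError
resp = requests.head(URLs.PETS + '.tgz', timeout=10)
print(resp.status_code)

# then the original lesson line should work again
path = untar_data(URLs.PETS)
print(path)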