Lesson 1 In-Class Discussion ✅

Final update - ignore everything I’ve said below. Everything seems to be up to date on a fresh Data Science VM from the Azure templates (the cheap pre-emptible one) - it’s just not finding doc. Maybe a missing import? And it is using the GPU just fine. We might still need to work out and document how to apply future updates to the notebooks and the environment in this config.

Which version of FastAI do you have @sonicviz? The notebooks and FastAI get updated quite frequently - it sounds like you are out of sync, Paul. I have the same Azure setup, but have been using my Windows config recently. I’ll start up that VM and see if I can repro.
UPDATE: I started my VM and did a git pull on course-v3 to get the latest notebooks. I didn’t update my conda environment (FastAI at that point was 1.0.34) and, as expected, I hit issues - different from yours - untar was a more recent addition. I’ve seen others in this situation have to uninstall and reinstall conda to get the latest FastAI. Mine wouldn’t update even after updating conda. I’ll post again when I resolve it.
UPDATE 2:
My Azure VM was created quite a while ago, so it is not the current one that has the FastAI config already set up. I’ll contact the team to get the Azure setup docs changed. For my part, I just blew away my conda environment and rebuilt it using the following commands (my environment was called fastai-3.7):
```
conda env remove -n fastai-3.7
conda create -n fastai-3.7 python=3.7
conda activate fastai-3.7
conda install -c pytorch -c fastai fastai
conda install nb_conda_kernels
python -m ipykernel install --user --name fastai-3.7 --display-name "Python (fastai-3.7)"
```
I’ve noticed this is running slowly though, and it does not appear to be hitting the GPU…
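A quick way to confirm whether the rebuilt environment can actually see the GPU is to ask PyTorch directly. This is a minimal sketch - it assumes PyTorch is present (the fastai conda package pulls it in), and the `gpu_status` helper is just illustrative:

```python
def gpu_status():
    """Return a short string describing GPU visibility from PyTorch."""
    try:
        import torch  # installed alongside fastai via the pytorch channel
    except ImportError:
        return "no-torch"  # environment is missing PyTorch entirely
    return "cuda" if torch.cuda.is_available() else "cpu-only"

# "cpu-only" on a GPU VM usually means the CPU build of PyTorch was
# installed, which would explain slow training.
print(gpu_status())
```

If this reports `cpu-only`, reinstalling from the pytorch channel (as in the commands above) is the usual fix.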

Hi!

I just finished working through the lesson. I created a model from another dataset (a multi-class classification problem) and wanted to get predictions on a test set. I passed the `test` parameter when creating the data bunch, as shown:

`data = ImageDataBunch.from_csv(path=train_img, csv_labels=data_path/'full_data.csv', test=data_path/'test', size=224, fn_col=0, label_col=1)`

I now want to make predictions on the test set that I passed in, so I used `learn.get_preds(3)`. I observed that the predictions looked random. Is there a way to get the test set image names along with their corresponding predictions?

I have searched the forums but failed to find anything related to the v1 library.
Any help is appreciated :slight_smile:
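For what it’s worth, in fastai v1 `get_preds` defaults to the validation set; you need `ds_type=DatasetType.Test` to score the test set, and the predictions come back in the same order as `data.test_ds.items`, so you can zip the two together yourself. Below is a sketch of that pairing - the `pair_predictions` helper and the sample data are illustrative, not part of the library; in a real session `items`, `preds`, and `classes` would come from `data.test_ds.items`, `learn.get_preds(ds_type=DatasetType.Test)`, and `data.classes`:

```python
def pair_predictions(items, preds, classes):
    """Map each test file name to its argmax class label.

    items   -- test file paths, in the order fastai stored them
    preds   -- one row of class probabilities per item, same order
    classes -- class labels indexed by prediction column
    """
    results = {}
    for path, probs in zip(items, preds):
        best = max(range(len(probs)), key=lambda i: probs[i])
        results[str(path)] = classes[best]
    return results

# Tiny self-contained demo with made-up data:
items = ["img_001.jpg", "img_002.jpg"]
preds = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]
classes = ["cat", "dog", "bird"]
print(pair_predictions(items, preds, classes))
# {'img_001.jpg': 'dog', 'img_002.jpg': 'cat'}
```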


You might want to check out lesson 2, where Jeremy highlights how to run inference for a single image.

Thanks for confirming. I’m not using a pre-emptible instance, but an assigned build, as I want to save my work.

I don’t know if it’s missing an import - it’s the stock notebook, the introductory demo one, that fails initially. If that doesn’t work, nothing will! It seems to be missing the module nbformat, which handles the Jupyter notebook format, for some reason.

At the least the docs (!) need to be updated to make people aware this isn’t working.


Looks like the Azure image needs:
```
conda install nb_conda_kernels
```
Then the docs command worked for me.

I’ll speak with the team who built it to get it added to the base image. Also, I thought the pre-emptible option did allow saving - I have closed and re-opened my image and my notebook copies were still there. Is it just not guaranteed, perhaps?

Hmmm… I tried that in the console (after enabling boot diagnostics) and it’s still not working.

I think I’ll switch to another provider/setup, but it would be nice if this worked out of the box.
One of the biggest problems I’ve had with AI courses is that they turn into time sinks of just trying to get the environment working, when you want to get on with the actual work.

Thanks for the help though, please keep me posted if the team replies.

"Also I thought the pre-emptible did allow saving - as I have closed and re-opened my image and my notebook copies were still there. Is it just not guaranteed perhaps?"

If it gets deallocated you lose it I think.

From https://course.fast.ai/start_azure.html:

"Lowering your cloud compute cost

Azure offers pre-emptable instances at a significant discount on compute usage charges compared to standard instances. These instances maybe deallocated from your subscription at any time depending on factors like demand for the compute on Azure. Sometimes if you retry after a short period or choose a different VM size you may be able to get another pre-emptable instance. Your work from the deallocated VM is not saved by default."


Agreed - I understand your frustration. I fixed mine by connecting to the VM (x2go) and then running:
```
conda activate FastAI
conda install nb_conda_kernels
```
I also had to add some stuff to .bashrc so that I could activate.
I then ran Jupyter Notebook and it worked.
I’ll let you know if this gets fixed out of the box.

Thank you! :slight_smile:

Could you elaborate on what you did here?

I don’t have the exact details to hand as my VM isn’t up, but the command came up as a suggestion when I tried to activate - the suggested command just wasn’t quite correct.
The correct version should be something like:
```
echo ". /data/anaconda/etc/profile.d/conda.sh" >> ~/.bashrc
```
There is a space after the `.`, and you need to add a `/` in front of `data` that is not in the suggested command.

And to start Jupyter, just type `jupyter notebook` and it will start in the VM (if you are using x2go or similar), or use the remote connection via the VM’s IP on port 8000.

You can do it by URL, and jump straight to the notebook once the VM is up.

I spent some time evaluating different cloud hosts.

Azure took about 2′30″ for the 4-epoch run in notebook 1.
By comparison:
Crestle: total time 03:16 (and it crashed while updating the VM)
Floydhub: total time 04:58 (has an issue visualizing heatmaps, though)
Kaggle: total time 10:44
Google Compute Cloud w/ T4: total time 01:56

It’s a pity the Azure VM is broken for the doc help, as it’s the fastest to start up and run so far (but I guess you get what you pay for!)

Hi,
I’m trying to run the first lesson’s notebook for the first time, and I get a connection error when downloading the dataset with `untar_data`:


```
Downloading https://s3.amazonaws.com/fast-ai-imageclas/oxford-iiit-pet

gaierror                                  Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/urllib3/connection.py in _new_conn(self)
    140         conn = connection.create_connection(
--> 141             (self.host, self.port), self.timeout, **extra_kw)
    142

/opt/conda/lib/python3.6/site-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
     59
---> 60     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
     61         af, socktype, proto, canonname, sa = res

/opt/conda/lib/python3.6/socket.py in getaddrinfo(host, port, family, type, proto, flags)
    744     addrlist = []
--> 745     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    746         af, socktype, proto, canonname, sa = res

gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    600                 body=body, headers=headers,
--> 601                 chunked=chunked)
    602

/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    345         try:
--> 346             self._validate_conn(conn)
    347         except (SocketTimeout, BaseSSLError) as e:

/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py in _validate_conn(self, conn)
    849         if not getattr(conn, 'sock', None):  # AppEngine might not have .sock
--> 850             conn.connect()
    851

/opt/conda/lib/python3.6/site-packages/urllib3/connection.py in connect(self)
    283         # Add certificate verification
--> 284         conn = self._new_conn()
    285

/opt/conda/lib/python3.6/site-packages/urllib3/connection.py in _new_conn(self)
    149             raise NewConnectionError(
--> 150                 self, "Failed to establish a new connection: %s" % e)
    151

NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f35961dac18>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    448                 retries=self.max_retries,
--> 449                 timeout=timeout
    450             )

/opt/conda/lib/python3.6/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    638             retries = retries.increment(method, url, error=e, _pool=self,
--> 639                                         _stacktrace=sys.exc_info()[2])
    640             retries.sleep()

/opt/conda/lib/python3.6/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    387         if new_retry.is_exhausted():
--> 388             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    389

MaxRetryError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Max retries exceeded with url: /fast-ai-imageclas/oxford-iiit-pet.tgz (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f35961dac18>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
in ()
----> 1 path = untar_data(URLs.PETS); path

/opt/conda/lib/python3.6/site-packages/fastai/datasets.py in untar_data(url, fname, dest, data, force_download)
    160         shutil.rmtree(dest)
    161     if not dest.exists():
--> 162         fname = download_data(url, fname=fname, data=data)
    163         data_dir = Config().data_path()
    164         assert _check_file(fname) == _checks[url], f"Downloaded file {fname} does not match checksum expected! Remove that file from {data_dir} and try your code again."

/opt/conda/lib/python3.6/site-packages/fastai/datasets.py in download_data(url, fname, data)
    142     if not fname.exists():
    143         print(f'Downloading {url}')
--> 144         download_url(f'{url}.tgz', fname)
    145     return fname
    146

/opt/conda/lib/python3.6/site-packages/fastai/core.py in download_url(url, dest, overwrite, pbar, show_progress, chunk_size, timeout, retries)
    165     s = requests.Session()
    166     s.mount('http://', requests.adapters.HTTPAdapter(max_retries=retries))
--> 167     u = s.get(url, stream=True, timeout=timeout)
    168     try: file_size = int(u.headers["Content-Length"])
    169     except: show_progress = False

/opt/conda/lib/python3.6/site-packages/requests/sessions.py in get(self, url, **kwargs)
    544
    545         kwargs.setdefault('allow_redirects', True)
--> 546         return self.request('GET', url, **kwargs)
    547
    548     def options(self, url, **kwargs):

/opt/conda/lib/python3.6/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    531         }
    532         send_kwargs.update(settings)
--> 533         resp = self.send(prep, **send_kwargs)
    534
    535         return resp

/opt/conda/lib/python3.6/site-packages/requests/sessions.py in send(self, request, **kwargs)
    644
    645         # Send the request
--> 646         r = adapter.send(request, **kwargs)
    647
    648         # Total elapsed time of the request (approximately)

/opt/conda/lib/python3.6/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    514                 raise SSLError(e, request=request)
    515
--> 516             raise ConnectionError(e, request=request)
    517
    518         except ClosedPoolError as e:

ConnectionError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Max retries exceeded with url: /fast-ai-imageclas/oxford-iiit-pet.tgz (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f35961dac18>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
```

I can’t access the data by manually opening the link in the browser either.
Was the dataset removed?

Thanks

I can still access the datasets.
Can you check that you have a good internet connection?

If `untar_data` is still not working, try downloading it manually from here.
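The traceback above shows the real object key is `oxford-iiit-pet.tgz` - fastai’s progress message prints the URL without the extension, which is why the bare link returns NoSuchKey. A manual fallback might look like the sketch below; `~/.fastai/data` is assumed to be the default fastai v1 data directory, and the download lines are left commented out because the archive is large:

```shell
# Full dataset URL -- note the .tgz that the printed message omits.
URL="https://s3.amazonaws.com/fast-ai-imageclas/oxford-iiit-pet.tgz"
# Assumed default fastai v1 data directory:
DEST="$HOME/.fastai/data"
mkdir -p "$DEST"
# Uncomment to actually fetch and unpack:
# wget -c "$URL" -P "$DEST"
# tar -xzf "$DEST/oxford-iiit-pet.tgz" -C "$DEST"
echo "would fetch $URL into $DEST"
```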

Thanks for the response.
When I click https://s3.amazonaws.com/fast-ai-imageclas/oxford-iiit-pet I get:

```
NoSuchKey
The specified key does not exist.
Key: oxford-iiit-pet
RequestId: 27A2D66A1523EA9A
HostId: +sSRTNWbVpmZZwschd6IIvZbLRMVb50HbPyYLIrpVKvAK2T/ILgVpyAZlaI7rU8nxOrJZNTmAgk=
```

Please try the link here.

Good news @sonicviz - the Azure Data Science VM team responded quickly. They have fixed the doc issue (it just needed nbconvert included in their FastAI Python environment) and have also changed the pre-emptible model to not delete the storage on deallocation. This will incur a storage charge while the VM is not running, but it will still be pretty cheap. Of course, if you get deallocated while using the VM you would lose whatever is in memory and not saved - so it makes sense to save your models (which most of the notebooks do). That could still be inconvenient, but it’s a big potential saving.
I just created a new pre-emptible Azure VM from the template linked from the Azure setup page https://course.fast.ai/start_azure.html and ran through the pets notebook to the end of the resnet34 section - all looking good! (Thanks Gopi and team!)

@brismith Hey, that’s awesome - thanks for stepping up and handling that. It always pays to test :grinning:

Are you doing official support for the course or something? How do you know the Azure Data Science VM team?

I managed to test it on a few providers in the meantime; it’s always good to get a feel for other platforms. The new T4s on GCC seem fast.

:slight_smile: I work at Microsoft. ML/AI isn’t my day job but something I’m passionate about and very impressed with this course - and I wanted to make sure FastAI participants could use Azure as an option. I’m tempted to try out the Azure ND and NDv2 too.


Just a note: if someone is looking at this thread in March 2019 or (presumably) later, the Colab memory issue was fixed and Colab now works well for FastAI.