Absolute Beginner - Loading MNIST instead of dogs vs cats

telafferty · June 29, 2020, 7:22pm

Hi there!
I’m running a Jupyter notebook on a Paperspace Gradient P5000 machine. I’m attempting my first run through of lesson 1 using the MNIST dataset instead of the dogs and cats.

I changed this line of code:
path = untar_data(URLs.MNIST); path

When I step through the notebook line by line and run it by pressing Shift + Enter as I go, it doesn’t appear to do anything at all. No errors. No data. Nothing.

Here are my questions:

What have I done wrong?
Where should I look in the lesson one video for clarification?

For whatever it’s worth, I have nearly no experience with Python, and I’m completely new to machine learning and neural networks.

Ezno · July 1, 2020, 10:59am

The untar_data function downloads the data if you don’t already have it on your machine. If it has already been downloaded, it can look like nothing is happening. If you put path.ls() in its own cell and run that cell, do you get anything back?

Another thing you can try, if there is nothing in the ‘path’ variable you define, is to go to ‘kernel’ -> ‘restart kernel’ if you feel that the notebook really isn’t executing any commands. You can also try entering some simple code to make sure the notebook is executing code (e.g. 2+2). If you put that as the last line in the same cell as your untar_data call, you will know for sure that the notebook is executing that cell and getting past that function.
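
One way to tell "nothing happened" apart from "the data was already there" is to check whether the folder exists and what it contains. The sketch below uses only the standard library and a throwaway folder standing in for the dataset directory; describe_folder is a hypothetical helper, not a fastai function, and the folder layout is an assumption. In the notebook, path.ls() plays roughly the role that iterdir() plays here.

```python
import tempfile
from pathlib import Path

def describe_folder(folder):
    """Return a short status string for a folder (hypothetical helper)."""
    folder = Path(folder)
    if not folder.exists():
        return "missing"
    entries = sorted(p.name for p in folder.iterdir())
    if not entries:
        return "empty"
    return "contains: " + ", ".join(entries)

# Stand-in for the downloaded dataset folder (assumed layout, built in a temp dir).
root = Path(tempfile.mkdtemp())
(root / "training").mkdir()
(root / "testing").mkdir()

status_present = describe_folder(root)
status_missing = describe_folder(root / "images")
print(status_present)
print(status_missing)
```

If the status comes back as "missing" or "empty", the download probably did not complete; otherwise the data is in place and the quiet cell is expected.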

telafferty · July 1, 2020, 7:30pm

Thanks a million for the input - I will try it a little bit later this afternoon and will let you know if that resolved the issue. Thanks again for your help!

telafferty · July 6, 2020, 10:52pm

It seems pretty clear that I don’t know what I’m doing…

Here’s the code I’ve modified which appears to run successfully:
#=================================================
%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.vision import *
from fastai.metrics import error_rate

bs = 64
path = untar_data(URLs.MNIST)
data = ImageDataBunch.from_folder(path, train='training', valid='testing')

path.ls()

path_anno = path/'annotations'
path_img = path/'images'

path.ls()

# Print success indicator
print('ready')
#============================

It’s the next step that I’m not sure how to approach.

When I look at the images, I see that they are stored in directories named:
0
1
2
3
4
5
6
7
8
9

The png filenames seem to have no relationship to the images, which is fine. I would assume, for example, if I’m training to recognize the digit ‘0’ that I should pull all the files in the directory named ‘0’, but I don’t know how to do that.

When I run this line:
testing = get_image_files(path_img)

I would expect it to load all the images from this path:
PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training')

Instead, I get a file not found error:

FileNotFoundError                         Traceback (most recent call last)
in
----> 1 testing = get_image_files(path_img)

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/vision/data.py in get_image_files(c, check_ext, recurse)
     19 def get_image_files(c:PathOrStr, check_ext:bool=True, recurse=False)->FilePathList:
     20     "Return list of files in c that are images. check_ext will filter to image_extensions."
---> 21     return get_files(c, extensions=(image_extensions if check_ext else None), recurse=recurse)
     22
     23 def get_annotations(fname, prefix=None):

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py in get_files(path, extensions, recurse, exclude, include, presort, followlinks)
     42         return res
     43     else:
---> 44         f = [o.name for o in os.scandir(path) if o.is_file()]
     45         res = _get_files(path, path, f, extensions)
     46         if presort: res = sorted(res, key=lambda p: _path_to_same_str(p), reverse=False)

FileNotFoundError: [Errno 2] No such file or directory: '/notebooks/course-v3/nbs/dl1/data/mnist_png/images'
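
The error message names a folder ending in images, while the listing shown elsewhere in this thread only has training and testing. A minimal sketch of that situation, using the standard library and a temporary folder as a stand-in for mnist_png (the layout is an assumption based on the thread, and none of this is fastai code):

```python
import os
import tempfile
from pathlib import Path

# Throwaway folder with the two subfolders the thread reports for MNIST (assumed layout).
root = Path(tempfile.mkdtemp())
for name in ("training", "testing"):
    (root / name).mkdir()

path_img = root / "images"      # folder name carried over from another notebook; never created here

try:
    names = [entry.name for entry in os.scandir(path_img)]
    error = None
except FileNotFoundError as exc:
    names = []
    error = exc

# Listing the parent shows which folder names are actually available.
available = sorted(p.name for p in root.iterdir())
print("images folder exists:", path_img.exists())
print("available folders:", available)
```

Checking exists() on a path, or listing its parent, before passing it to a function that scans it usually makes this kind of mismatch obvious.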

Perhaps I should spend the recommended year learning Python first?

I’ve spent quite a bit of time in VBA, but it may not be sufficient.

Thanks again for your help!

Thomas

Ezno · July 7, 2020, 1:13am

The more you struggle, the faster you learn. I don’t think it’s possible to learn without struggling through details like you are now. To me, it seems like you are on a good path for learning, and if you can stick with it, I think it is a great one! It is challenging as you get started, but you will learn. My advice is to keep learning Python as you go through the fastai courses. You could also try the Intro to Machine Learning course on fastai, which may be a bit easier and would still keep you building machine learning and Python skills.

You are very close. What you are trying to do is exactly what ImageDataBunch is designed to help with, and you have actually already done it! You can run a few things from your data variable, such as:

data.show_batch(rows=3, figsize=(4,4)) will show you one of your batches of images and what it looks like.

data.train_ds[3] will give the item at index 3 in the training dataset (the fourth item, since indexing starts at 0), in my case (Image (3, 28, 28), Category 9). This shows that the input is a 28×28 pixel image with 3 channels, and that it has the label 9.
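
The idea behind loading from folders is that each image takes its label from the name of the folder it sits in. The sketch below illustrates that idea with the standard library only; it is not how fastai implements it, and the file names and tiny folder tree are made up for the example.

```python
import tempfile
from pathlib import Path

# Tiny stand-in for mnist_png/training: one subfolder per digit, empty placeholder files.
training = Path(tempfile.mkdtemp()) / "training"
for digit, count in (("0", 2), ("3", 3)):
    (training / digit).mkdir(parents=True)
    for i in range(count):
        (training / digit / f"img_{i}.png").touch()

# Pair every file with the name of the folder that contains it.
items = sorted((f, f.parent.name) for f in training.glob("*/*.png"))
labels = [label for _, label in items]

# Select only the files labelled "3".
threes = [f for f, label in items if label == "3"]

print(labels)
print(items[3][1])   # index 3 is the fourth item because counting starts at 0
```

This is also the answer to "how do I pull all the files for one digit": filter on the parent folder name, or simply list that one folder.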

If you want to train a model with it, you just pass it directly to the learner.

learn = cnn_learner(data, models.resnet18, metrics=accuracy)
learn.fit_one_cycle(1,1e-2)

In the Jupyter notebook, you can see what other methods you can run off your data variable by running data.*? in a cell.

If you really want to look at individual images outside of the data bunch, you could start by looking at this below.

threes = (path/'train'/'3').ls().sorted()

im3 = Image.open(threes[1])

# show image
im3

#look at tensor shape
tensor(im3).shape

#look at it in a dataframe
pd.DataFrame(tensor(im3)).loc[3:24,6:20].style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')

telafferty · July 7, 2020, 2:04am

AWESOME!! I don’t mean to shout, but well… - I’m just thrilled! I was just about to throw in the towel and enroll in a Python programming course.

I’m gonna play with this for a while now. Thank you very much! Any way other than verbal to give kudos on the forum?

Thomas

telafferty · July 7, 2020, 2:17am

Holy cow - this thing is accurate! When I looked at some of the images it got wrong, I would have gotten them wrong as well. Thanks again for straightening this out.

I’m a true noob at this, so this was most helpful!

Looking at this code:
threes = (path/'train'/'3').ls().sorted()

I also get an error. Changing ‘train’ to ‘training’ yields:

FileNotFoundError: [Errno 2] No such file or directory: '/notebooks/course-v3/nbs/dl1/data/mnist_png/train/3'

Ezno · July 7, 2020, 12:27pm

Check path.ls(). This just lists the files in that directory. Then you navigate to the folder with the images. If yours are in a folder called training, then try (path/'training').ls(). Keep going until you find the images. For example, in the attached you can see how I navigate to find the folder with the images of threes. ls() just lists the files in that folder, the same way you would see them in your file browser/Finder.
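
Instead of stepping down one level at a time, the search can also be done in one go. This is a sketch using the standard library; folders_with_images is a hypothetical helper and the folder tree is a made-up stand-in, so treat it as an illustration rather than a fastai feature.

```python
import tempfile
from pathlib import Path

def folders_with_images(root, suffix=".png"):
    """Return the folders under root that directly contain files ending in suffix."""
    return sorted({f.parent for f in Path(root).rglob("*" + suffix)})

# Made-up tree: two splits, two digit folders each, one placeholder file per folder.
root = Path(tempfile.mkdtemp())
for split in ("training", "testing"):
    for digit in ("3", "7"):
        folder = root / split / digit
        folder.mkdir(parents=True)
        (folder / "sample.png").touch()

found = [p.relative_to(root).as_posix() for p in folders_with_images(root)]
print(found)
```

The printed list shows the exact folder names to use when building a path such as path/'training'/'3'.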

Ezno · July 8, 2020, 3:53pm

Yeah, I think you would get less from a Python programming course than you may think. It’s like taking a badminton class to learn tennis. Obviously there are a lot of skills that are transferable, but it isn’t quite the most expedient path to the goal.

The Intro to Machine Learning course has a bit of a kinder start than the deep learning ones in my opinion, though either is doable to start from. Just experiment a lot, apply things from the lectures to your own projects, and don’t set a strict timeline for the lectures. You learn information in the lectures, but you learn how to actually apply that information in your personal experiments and projects. You’ll be surprised at how many things seem totally obvious after listening to the lecture but take a lot of work when you sit down to do them yourself! You’ll learn the pieces of Python you really need as you do that.

Later if you really enjoy the programming side, you can take a dedicated python course. Or if you really enjoy the math bits you can take some dedicated math courses. And just explore the areas that interest you or confuse you further.

At least that’s my advice/2 cents.

telafferty · July 9, 2020, 3:21am

Hi Ezno -
Your data appears to have been uploaded differently than mine. When I run:

path.ls()

I get:

[PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training'),
PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/testing')]

So when I run
path(/'training')

I get
  File "", line 1
    path(/'training')
         ^
SyntaxError: invalid syntax

When I run
path(/'train')

I get the same error. I’ve tried several variations on the same thing and get an error each time. The one different result is if I put the / inside of the single quotes; then I get a "PosixPath object is not callable" error.

??

Ezno · July 9, 2020, 4:02am

Try path/'training' with no parentheses. That should give you the path. Then call ls() on the whole thing by putting it in parentheses.

(path/'training').ls()
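
A short sketch of why the parentheses matter, using the standard library's pathlib (which fastai's path objects are built on). The path below is the one shown in this thread and is never read from disk; iterdir() stands in for ls() here.

```python
from pathlib import Path

path = Path("/notebooks/course-v3/nbs/dl1/data/mnist_png")   # never accessed on disk

training = path / "training"        # the / operator joins path components
nested = path / "training" / "3"    # and can be chained

# Calling the path like a function is what the "not callable" message refers to.
try:
    path("/training")
    call_error = None
except TypeError as exc:
    call_error = exc

# Without parentheses the method call binds to the string first, so this fails too.
try:
    path / "training".iterdir()
    attr_error = None
except AttributeError as exc:
    attr_error = exc

print(training)
print((path / "training").name)     # parentheses make the call apply to the joined path
```

So path is an object, / is an operator between that object and a string, and anything called on the result needs the join wrapped in parentheses first.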

telafferty · July 9, 2020, 4:33am

I’m going to try that right now - as soon as I’ve restarted my instance. I signed off a bit ago. Will let you know shortly, and again, a million thanks!

telafferty · July 9, 2020, 4:42am

This:
(path/'training').ls()

Gives:
[PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training/4'),
PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training/2'),
PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training/3'),
PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training/9'),
PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training/6'),
PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training/7'),
PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training/5'),
PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training/8'),
PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training/0'),
PosixPath('/notebooks/course-v3/nbs/dl1/data/mnist_png/training/1')]

But this:
threes = (path/'training/3').ls().sorted()

im3 = Image.show(threes[1])

# show image
im3

Gives:

AttributeError                            Traceback (most recent call last)
in
----> 1 threes = (path/'training/3').ls().sorted()
      2
      3 im3 = Image.show(threes[1])
      4
      5 # show image

AttributeError: 'list' object has no attribute 'sorted'

When I try sort instead of sorted:

TypeError                                 Traceback (most recent call last)
in
      1 threes = (path/'training/3').ls().sort()
      2
----> 3 im3 = Image.show(threes[1])
      4
      5 # show image

TypeError: 'NoneType' object is not subscriptable

Which I believe tells me I have an empty set?

telafferty · July 9, 2020, 4:54am

So I tried:
threes = (path/'training'/'3').ls().sort()

im3 = Image.show(threes)

# show image
im3

This is a bit closer, as it yields a very long series of code lines with errors…

abcde13 · July 9, 2020, 4:56am

Well, it may be closer or not. Highly doubt showing threes will actually get you anywhere. You definitely want a certain index. Can you just show the output of threes alone after the sort call?

Also, as you may have learned, sorted() is not a method on lists.

EDIT: Oh, I’m pretty sure it’s because sort changes the list in place and returns None instead of a new list. Try doing

threes = (path/'training'/'3').ls()
threes.sort()
im3 = Image.show(threes[1])
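
The difference between the two can be seen with plain Python lists, no fastai needed. A minimal sketch with made-up file names:

```python
names = ["9.png", "10.png", "2.png"]

in_place_result = names.sort()       # sorts names itself and returns None
print(in_place_result)
print(names)                         # strings sort character by character, so "10.png" comes first

fresh = sorted(["b.png", "a.png"])   # the built-in sorted() returns a new sorted list
print(fresh[1])

# Chaining repeats the mistake from the thread: the variable ends up holding None.
broken = ["b.png", "a.png"].sort()
try:
    broken[1]
    sub_error = None
except TypeError as exc:
    sub_error = exc
print(type(sub_error).__name__)
```

A 'NoneType' object is not subscriptable error therefore does not mean the folder was empty; it means the variable holds None because the result of sort() was assigned to it.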

telafferty · July 11, 2020, 2:09am

I figured it out! Here’s the fix:

# print out the posix path
path.ls()
# print posix paths of all directories in training path
(path/'training').ls()
# set a variable for which number you'd like to work with, for example sixes
# print out number of items in directory
sixes = (path/'training'/'6').ls()
x = len(sixes)
print(x)
# determine how many sample images you would like to display.
# in this example, I chose 5 images
# set a counter from 0-4 and use a while loop to display each image
# by its position in the list (0=position 1, 1=position 2 etc)
# there will be a pause while the code runs, so display ready when finished
# executing
sixes.sort()
c = 0
while c < 5:
    img = open_image(sixes[c])
    img.show(figsize=(5, 1), title='Image ' + str(c))
    c += 1
print('ready')
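
The same "first five" loop can be written without a manual counter by slicing the list and using enumerate. The sketch below runs on placeholder files in a temporary folder (names made up for the example) and only builds the titles; in the notebook, the open_image(...).show(...) call from the post above would go inside the loop.

```python
import tempfile
from pathlib import Path

# Placeholder folder standing in for training/6, with eight empty files.
folder = Path(tempfile.mkdtemp()) / "6"
folder.mkdir()
for i in range(8):
    (folder / f"{i:05d}.png").touch()

sixes = sorted(folder.iterdir())     # sorted() returns a new list, so assignment is safe

titles = []
for c, image_path in enumerate(sixes[:5]):   # the slice stops early if fewer than 5 files exist
    titles.append(f"Image {c}: {image_path.name}")

print(len(sixes))
print(titles)
```

Slicing also avoids an IndexError when a folder happens to hold fewer files than the number requested.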