Lesson 1 - Official topic

A 1080 Ti should be able to handle bs=32 on that cell. I should know, as I have the same configuration.
It sounds like the GPU memory is not clearing properly. I would restart the kernel, run the first cell of the notebook to do the imports, then skip ahead and run this cell.

Maybe have a second shell going to watch the GPU memory usage with nvidia-smi?
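If the memory really isn't clearing, something like the following sketch can help between runs (assumes PyTorch; `free_gpu_memory` is just a name I made up, and the `empty_cache` call only helps once your own references, e.g. `learn`, are gone):

```python
import gc
import importlib.util

def free_gpu_memory():
    # Drop your own references first (e.g. `del learn`), then force a GC pass
    # so the tensors behind them can actually be released.
    gc.collect()
    # Only touch CUDA if torch is installed and a GPU is visible.
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            # Return cached blocks to the driver so nvidia-smi reflects the drop.
            torch.cuda.empty_cache()
    return True

free_gpu_memory()
```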


Thanks, Michael, for the advice. I’ll fall back on using a cloud platform if the problem persists. However, I find it curious that I was able to run all the other cells in the notebook.

Cheers,
Nasir

Thanks for the advice. I’ll plow through the remaining notebooks and see if the problem recurs elsewhere. Despite multiple kernel restarts, re-importing the libraries, and reducing the batch size to 4, I get the same error.
Cheers,
Nasir

Hey guys,

thanks for this amazing tutorial.
So I’m not 100% sure I got everything correctly.

I am not able to open the files in Jupyter; where do I get the files? Can I open them directly from GitHub?

I don’t know how, but it works on my Google Colab.
In the first lesson you say that if this takes more than 5 minutes, something has obviously gone wrong. Mine took about 30 minutes, and I even stopped it.

Could you help me?
This course seems to be super nice, but not really well structured.

thanks in advance

Hi Community,
Can anyone help me understand why there are differences in the error_rate (first dogs/cats model) between runs? The first time I ran the “#CLICK ME” cell I got a pretty good error_rate (around 0.005); in the following executions of the cell I got better or worse rates, sometimes even pretty bad ones like 0.015.
How should I understand those differences between executions?
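Part of what you are seeing is ordinary randomness: the validation split, weight initialisation, and data augmentation all draw from RNGs, so each run trains a slightly different model. fastai has a `set_seed` helper for this; the core idea, sketched with the stdlib only (`reproducible_split` is a made-up name, not a fastai function):

```python
import random

def reproducible_split(items, valid_pct=0.2, seed=42):
    # Seeding the RNG makes the "random" train/valid split identical on every
    # run, removing one source of run-to-run variation in error_rate.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * valid_pct)
    return shuffled[cut:], shuffled[:cut]  # (train, valid)

train_a, valid_a = reproducible_split(range(10))
train_b, valid_b = reproducible_split(range(10))
assert (train_a, valid_a) == (train_b, valid_b)  # same seed, same split
```

Note that even with a fixed split, GPU training is not fully deterministic, so small metric differences between runs are normal.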
Thanks

Similar to @Nasir65 above, I am not able to run that particular cell from the 01_intro notebook, except that I am using a ml.p2.xlarge instance within Sagemaker as recommended by the book.

from fastai.text.all import *

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)

The initial code with no batch size specified returned a CUDA out-of-memory error. I then lowered the batch size to 32 and let the model run for ~25 minutes, but my output seems to show that epoch 0 repeated?

A batch size of 16 shows the same behavior.

I’ve restarted the kernel and let just this one cell run after importing the libraries at the top, but no luck. Any suggestions? All the other models train in less than a minute, so I feel like something is off here.
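On the “epoch 0 repeated” part: as far as I understand it, `fine_tune` runs two separate training phases, first with the pretrained body frozen, then unfrozen, and the progress table restarts its epoch counter for each phase, so seeing epoch 0 twice is expected. A pure-Python sketch of the schedule (not the actual fastai source):

```python
def fine_tune_schedule(epochs, freeze_epochs=1):
    # Phase 1: body frozen, only the new head trains.
    phases = [("frozen", e) for e in range(freeze_epochs)]
    # Phase 2: everything unfrozen; the epoch counter starts over at 0,
    # which is why the output appears to repeat epoch 0.
    phases += [("unfrozen", e) for e in range(epochs)]
    return phases

print(fine_tune_schedule(4))
# [('frozen', 0), ('unfrozen', 0), ('unfrozen', 1), ('unfrozen', 2), ('unfrozen', 3)]
```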

I’m new to deep learning and this book. This question is regarding Chapter 1. I’m able to run a notebook and the cats and dogs model. But when I try to “test” the model in the next step using an image of my own, I encounter an error. I’m hoping to find out what I’m doing wrong and how to remedy it. Thanks!

In the notebook I get this message:

A Jupyter widget could not be displayed because the widget state could not be found. This could happen if the kernel storing the widget is no longer available, or if the widget state was not saved in the notebook. You may be able to create the widget by running the appropriate cells.

Maybe this will help you:

from types import SimpleNamespace
uploader = SimpleNamespace(data=['images/chapter1_cat_example.jpg'])

Hi, I’m getting the below error in 01_intro.ipynb when executing:

path = untar_data(URLs.CAMVID_TINY)
dls = SegmentationDataLoaders.from_label_func(
    path, bs=8, fnames=get_image_files(path/"images"),
    label_func=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
    codes=np.loadtxt(path/'codes.txt', dtype=str)
)

learn = unet_learner(dls, resnet34)
learn.fine_tune(8)

Error:

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      7 
      8 learn = unet_learner(dls, resnet34)
----> 9 learn.fine_tune(8)

17 frames
/usr/local/lib/python3.6/dist-packages/torch/overrides.py in handle_torch_function(public_api, relevant_args, *args, **kwargs)
   1069     raise TypeError("no implementation found for '{}' on types that implement "
   1070                     '__torch_function__: {}'
-> 1071                     .format(func_name, list(map(type, overloaded_args))))
   1072 
   1073 def has_torch_function(relevant_args: Iterable[Any]) -> bool:

TypeError: no implementation found for 'torch.nn.functional.cross_entropy' on types that implement __torch_function__: [<class 'fastai.torch_core.TensorImage'>, <class 'fastai.torch_core.TensorMask'>]

Hello FastAI community,

I am a new learner of ML and related topics. I have been experimenting with TensorFlow and playing with Teachable Machine; however, I was having trouble with a large number of false positives returned by my image recognition models. This led me to search for better ways, and I ended up doing this course.

So, I started with Lesson 1, and thanks to Jeremy Howard’s excellent style in the book, I was able to train the cat vs dog model pretty quickly. However, when I pass this model images of babies with two ponytails, they get categorized as cats with more than 90% confidence. I believe that the dataset used in the book must be of very good quality compared to what I can collect on my own, and if that dataset is not good enough to train an accurate model, then mine is never going to work :frowning:

I’d appreciate it if anyone can guide me on what steps one can take to reduce the chances of false positives. Thanks!

Hi sttaq, I hope you are having a marvelous day!

You are likely always to have some false positives and negatives unless your model is 100% accurate and recognises every image sent to it.

Here are some links that discuss what you are experiencing. Your model is acting as expected based on the data it was trained on and the image sent.

One of Jeremy’s key points is making sure that the test data contains the same kinds of images that the model will see in production.

Your model is behaving as expected. You can try some of the ideas in the above threads, such as multilabel classification and training your model with more images, to help improve it.
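To make the multilabel idea concrete: a single-label (softmax) classifier must pick cat or dog even for a baby photo, while a multi-label head gives each class an independent sigmoid probability, so an image can fall below the threshold for every class and come back as “neither”. A toy sketch (made-up function names, pure Python, not fastai code):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def predict_labels(logits, classes, thresh=0.5):
    # Each class is scored independently; nothing forces a choice, so an
    # out-of-domain image can simply match no class at all.
    return [c for c, z in zip(classes, logits) if sigmoid(z) >= thresh]

print(predict_labels([3.0, -1.0], ["cat", "dog"]))   # ['cat']
print(predict_labels([-2.0, -1.5], ["cat", "dog"]))  # [] -> neither cat nor dog
```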

Hope this helps

Cheers mrfabulous1 :smiley: :smiley:

Thanks @mrfabulous1, I’ll go through the links.

Hi – I’m seeing the “CLICK ME” “first_training” cell take a very long time (>15min for first epoch) to train/fine_tune on Google Colab, even with a GPU instance. Is this expected?

I’m having the same problem now. This does not seem to be normal. I ran the same code on an AWS GPU instance and it took only a minute. There seems to be a problem with Colab.
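One thing worth ruling out when Colab is that slow: the runtime may not actually have a GPU attached (Runtime -> Change runtime type -> GPU). A small check, as a sketch (`has_cuda` is my own name; it assumes PyTorch when present):

```python
import importlib.util

def has_cuda():
    # Returns None if torch isn't installed here, otherwise whether a CUDA
    # device is visible. On Colab, False usually means a CPU-only runtime,
    # which easily explains a huge slowdown in fine_tune.
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return torch.cuda.is_available()

print(has_cuda())
```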

Personally, I have a local checkout of fastai and fastcore. I open notebooks I’m working on there. I use doc to get links to the documentation on docs.fast.ai. I know the source well enough I don’t really need an IDE to help jump to definitions of functions, but if I did I’d open the fastai modules directory in vscode or vim to allow me to directly jump to definitions.

You’ll find out in lesson 3 :slight_smile:


I’ve had similar experiences with this part of the notebook; small changes flip the sentiment from pos to neg. I would think that such changes should not affect the overall sentiment of a text fragment, but they tend to. So I’m not sure how robust this actually is.

I’ve seen similar results when uploading completely unrelated pics (like I uploaded a picture of a child and it classified it as “cat” lol) … I’m not sure if it’s worthwhile adding a third output which would basically say “other” … ?

Outstanding question!! I’ll have to admit, at first I was wondering how practical a question this would be. I thought maybe some people might build something Perceptron-like for their own amusement (heck, I still play DOS games and watch M*A*S*H), but it would seem like a massive technological step backward. But then I thought, maybe there are still applications for physical NNs today, like specialized devices that somehow need a NN framework.

Turns out, NTT/Cornell have done exactly that, with some results that are faster (though not more accurate) than traditional computer processing. First paragraph of this tantalizing article: “You may not be able to teach an old dog new tricks, but Cornell researchers have found a way to train physical systems, ranging from computer speakers and lasers to simple electronic circuits, to perform machine-learning computations, such as identifying handwritten numbers and spoken vowel sounds.”


I think IBM did some work in neuromorphic computing, and there were even some chips made. From what I gather, ANN research is somewhat “stuck” in the very early models of neural networks that came out in the 1940s and 50s. Current ANNs are basically giant functions with billions of parameters. IMO the “intelligence” in biological systems arises out of the interconnections of neurons. Single biological neurons seem to do computations which rival those of what we refer to as “neural networks” these days.

I feel that if we are ever to achieve any sort of intelligent behaviour in silicon, there will need to be more interdisciplinary work between what Neuroscience has discovered and how that can be modeled and simulated in the ANN arena.

One huge stumbling block is that the term intelligence is automatically assumed to mean “human” intelligence. Before we get to human-level intelligence, we need to get to fly-level intelligence, and there is a lot that can be done by networks that exhibit “fly-like intelligence”. One obvious example would be navigation in 3D space. A fly can navigate in three-dimensional space, without catastrophic failure for the system (death?), with only 100,000 neurons. And those 100k neurons are doing all the things needed to keep the system alive to the point where it can reproduce and transfer genes to the next generation.

So, we have a long way to go, and current implementations of NNs are just narrowly focused algorithms whose capabilities pale in comparison to what actually, demonstrably, can be accomplished using physical computing units (biological neurons for example).
