Lesson 1 In-Class Discussion

Hi KevinB, I ran into the same problem (“RuntimeError: CUDNN_STATUS_INTERNAL_ERROR”). How did you solve it? Thanks!

Try deleting your tmp folder. You will need to rerun everything at that point. If I remember correctly, that solved it for me, but I may not…
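If you'd rather do this from the notebook, here is a minimal sketch. It assumes the lesson-1 layout, where `PATH` points at the data directory (e.g. `data/dogscats/`) and fastai 0.7 caches precomputed activations in a `tmp` folder under it; `clear_tmp` is a hypothetical helper name and the path is an example to adjust.

```python
import os
import shutil

def clear_tmp(path):
    """Delete the tmp folder under `path` (where precomputed activations are cached).

    Returns True if a folder was removed, False if there was nothing to remove.
    """
    tmp_dir = os.path.join(path, "tmp")
    if not os.path.isdir(tmp_dir):
        return False
    shutil.rmtree(tmp_dir)  # removes the folder and everything inside it
    return True

# Hypothetical usage with the lesson-1 data path:
# PATH = "data/dogscats/"
# clear_tmp(PATH)
```

After clearing it, rerun the notebook from the top so the activations are recomputed.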

Paperspace Setup Problems.

I am having difficulty with the Paperspace setup. I followed the setup on two machines; both got stuck at the same point but showed different problems.

On both occasions the install “froze” at the “???-seaborne-???” install step. It reached 100% and then nothing happened: no cursor, no command prompt, nothing. If I leave it, the machine eventually goes to sleep and I have to go back to the console and launch the machine again. Refreshing the browser does nothing.
If I run the install from a terminal, the terminal disconnects with a “broken pipe” error. I also noticed that no data directory had been created.
If I try to redo the install, it fails because it cannot find a directory to remove (/etc/???confi.d or something like that), even though I can navigate to it manually.
My only option was to delete the machine and start again.

I created a new machine, which got stuck at the same point. I left it alone until it eventually went to sleep, then went back to the console and relaunched it; this time the data directory was there.
I tried git pull from the root directory, but it didn't recognise the command. After navigating to the fastai directory, git pull worked OK.
I tried to update conda, and it didn't recognise the command. I navigated to several locations with no luck.
Now the machine has gone to sleep, and I had to go to the console and use the machine actions menu to restart it.
Either the restart is super slow or it doesn't work. I just get a cursor with no command prompt.

Has anyone else encountered this? I just noticed I've now lost the cursor: the in-browser terminal is just blank.

I don't have this issue with AWS…

I ran into the same problem while running the following code:

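from fastai.conv_learner import *   # fastai 0.7 (course version): ConvLearner, ImageClassifierData, tfms_from_model, resnet34
PATH, sz = "data/dogscats/", 224     # assumed lesson-1 data path and image size; adjust to your setup
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(resnet34, sz))
learn = ConvLearner.pretrained(resnet34, data, precompute=True)
learn.fit(0.01, 3)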

I did the following steps and it was fixed:

  1. restart the VM
  2. reinstall all necessary libraries
  3. install Pillow 5.0.0 with pip3 install pillow==5.0.0

Maybe you should try installing Pillow 5.0.0 first.
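After pinning, it's worth confirming which Pillow version the notebook's own Python actually sees, since pip3 can install into a different interpreter than the one the Jupyter kernel uses. A small standard-library sketch (it needs Python 3.8+ for `importlib.metadata`; `installed_version` is a hypothetical helper name):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Run this inside the notebook kernel to check that the pin took effect.
v = installed_version("Pillow")
print("Pillow version:", v)
if v != "5.0.0":
    print("Pillow 5.0.0 is not the active version in this environment")
```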


I have changed the batch size from the default to 4, 8 and 16, but I am still facing the same problem.
How do I fix it? Please suggest a solution.

I am new to fast.ai. When I run the lesson, I get “name ‘resnet34’ is not defined”. I had a lot of errors on import, so I used the imports below:

from IPython.lib.deepreload import reload as dreload
import PIL, os, numpy as np, math, collections, threading, json, random, scipy
import pandas as pd, pickle, sys, itertools, string, re, datetime, time, shutil, copy
import seaborn as sns, matplotlib.pyplot as plt
import IPython, sklearn, warnings, pdb

Can somebody help me with the error I am getting with resnet34?
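None of the imports listed above defines `resnet34`: it is a model constructor from torchvision (`torchvision.models.resnet34`), and in the course's fastai 0.7 notebooks it came into scope through `from fastai.conv_learner import *`. When many imports fail, a quick way to see which packages are actually missing is a sketch like this (`missing_modules` is a hypothetical helper, and the module list is only an example):

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Example list; adjust it to what the notebook needs.
needed = ["numpy", "pandas", "seaborn", "sklearn", "torch", "torchvision", "fastai"]
print("Missing:", missing_modules(needed) or "none")
```

If `torchvision` is present, `from torchvision.models import resnet34` is enough to make the name available.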

I had problems with both RuntimeError: CUDNN_STATUS_INTERNAL_ERROR and an out-of-memory error showing up from time to time. What worked for me was restarting the kernel from the Jupyter notebook (from the top menu: “Kernel > Restart”).

How can I run lesson 1 in a normal notebook? It seems I need to use Paperspace machines to run it. resnet34 is giving an error. Can somebody explain this?
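The lesson-1 notebook assumes a CUDA-capable NVIDIA GPU, which is why cloud machines such as Paperspace or AWS are suggested. A quick check of whether the current environment can see a GPU through PyTorch, sketched so that it also runs when PyTorch isn't installed (`cuda_gpu_available` is a hypothetical helper name):

```python
import importlib.util

def cuda_gpu_available():
    """True if PyTorch is installed and can see a CUDA GPU, otherwise False."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch isn't installed in this environment
    import torch
    return torch.cuda.is_available()

print("CUDA GPU available:", cuda_gpu_available())
```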

Thanks, I had the same issue when setting up on AWS.

Hello folks,

I'm trying to use TensorFlow from the terminal to train on images, but I receive a similar error. I believe this is because TensorFlow currently supports Python 3.5 and not 3.6.

As a possible solution, should I delete Python 3.6 and reinstall 3.5 with Anaconda?

Traceback (most recent call last):
  File "/Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/Users/Mari/miniconda3/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/Users/Mari/miniconda3/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: dlopen(/Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 10): Library not loaded: @rpath/libcublas.8.0.dylib
  Referenced from: /Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
  Reason: image not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 51, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/Users/Mari/miniconda3/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/Users/Mari/miniconda3/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: dlopen(/Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 10): Library not loaded: @rpath/libcublas.8.0.dylib
  Referenced from: /Users/Mari/miniconda3/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
  Reason: image not found

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions. Include the entire stack trace above this error message when asking for help.
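The last ImportError says the dynamic loader could not find libcublas.8.0.dylib, the cuBLAS library from CUDA 8.0 that this TensorFlow build links against, which may point at a missing CUDA library rather than directly at the Python version. A minimal standard-library check of both (`tf_gpu_diagnostics` is a hypothetical helper; `find_library` looks for any cuBLAS, not specifically version 8.0):

```python
import ctypes.util
import sys

def tf_gpu_diagnostics():
    """Report the running Python version and whether a cuBLAS library can be located."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # find_library returns None when the dynamic loader can't locate the library
        "cublas": ctypes.util.find_library("cublas"),
    }

print(tf_gpu_diagnostics())
```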

Can someone please help me with this question? No one is answering questions on the forum; maybe it has been inactive since 2019.
Can you help me with how to evaluate an object detection model in fastai?
I have already trained the model and have the test data ready. It's a RetinaNet object detection model trained on the MIDOG 2021 challenge dataset.
I need various evaluation metrics for my model based on IoU thresholding of the model's predicted bounding boxes against the ground-truth bounding boxes (classic MS COCO-style object detection evaluation turned into classification metrics).
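For reference, the IoU used for this kind of thresholding, written for two axis-aligned boxes A and B given as corners (x1, y1, x2, y2) with x2 > x1 and y2 > y1; a prediction counts as a match when its IoU with a ground-truth box exceeds the chosen threshold (0.5 here):

```latex
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}
                   = \frac{|A \cap B|}{|A| + |B| - |A \cap B|},
\qquad |A| = (x_2^A - x_1^A)(y_2^A - y_1^A)

|A \cap B| = \max\!\bigl(0,\ \min(x_2^A, x_2^B) - \max(x_1^A, x_1^B)\bigr)
             \cdot \max\!\bigl(0,\ \min(y_2^A, y_2^B) - \max(y_1^A, y_1^B)\bigr)
```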
This is my sample code:

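from fastai.vision import *      # fastai v1: ItemLists, Learner, bb_pad_collate, ShowGraph, GradientClipping
from fastai.callbacks import *   # CSVLogger
# ObjectItemListSlide, SlideObjectCategoryList and show_results_side_by_side come from the object-detection
# helper code used for training; model, crit, anchors, tfms, patch_size, batch_size and the image lists
# are assumed to be defined exactly as they were during training.

train, valid, test = ObjectItemListSlide(train_images), ObjectItemListSlide(valid_images), ObjectItemListSlide(test_images)
item_list = ItemLists(".", train, valid)
lls = item_list.label_from_func(lambda x: x.y, label_cls=SlideObjectCategoryList)
lls = lls.transform(tfms, tfm_y=True, size=patch_size)
data = lls.databunch(bs=batch_size, collate_fn=bb_pad_collate, num_workers=0).normalize()

learn = Learner(data, model, loss_func=crit,
                callback_fns=[ShowGraph, CSVLogger, partial(GradientClipping, clip=2.0)])
learn.split([model.encoder[6], model.c5top5])
learn.freeze_to(-2)
learn.load('trained_model_bs64_GC', with_opt=True)

# Test data: put the test items in the validation slot.
item_list_t = ItemLists(".", train, test)
lls_t = item_list_t.label_from_func(lambda x: x.y, label_cls=SlideObjectCategoryList)  # was item_list, which rebuilt train/valid
lls_t = lls_t.transform(tfms, tfm_y=True, size=patch_size)
data_t = lls_t.databunch(bs=batch_size, collate_fn=bb_pad_collate, num_workers=0).normalize()
learn.data = data_t  # assumed: show_results_side_by_side reads learn.data, so point it at the test set

detect_thresh = 0.5
nms_thresh = 0.2
image_count = 15

show_results_side_by_side(learn, anchors, detect_thresh=detect_thresh, nms_thresh=nms_thresh, image_count=image_count)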

The last function shows results, but only as predicted boxes with scores drawn over random patches of my data.
I need precision, recall, accuracy, a confusion matrix, a ROC AUC curve, etc., computed over all the test images. The classification criterion is IoU = 0.5 on the bounding boxes: if a predicted bounding box has IoU > 0.5 with a positive ground-truth box, it counts as a true positive, and vice versa.
Can you please share a notebook showing how I can perform such an evaluation of the model? Any kind of notebooks, resources or code snippets are welcome.
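A minimal, framework-independent sketch of the IoU > 0.5 matching described above. It assumes the predictions (already filtered with detect_thresh and NMS) and the ground-truth boxes have been exported per image as plain (x1, y1, x2, y2) tuples in the same pixel coordinates. It uses greedy, score-ordered matching, a simplified version of the idea behind PASCAL VOC/COCO evaluation; the function names are hypothetical, not part of fastai.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2), same coordinates for predictions and ground truth.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_image(preds: Sequence[Tuple[Box, float]], gts: Sequence[Box],
                iou_thresh: float = 0.5) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) for one image.

    Predictions are visited from highest to lowest score; each one is matched to the
    unmatched ground-truth box with the largest IoU, provided that IoU > iou_thresh.
    """
    matched = [False] * len(gts)
    tp = fp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thresh
        for j, gt in enumerate(gts):
            if not matched[j]:
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1  # no unmatched ground-truth box overlaps enough
    fn = len(gts) - sum(matched)  # ground-truth boxes that no prediction found
    return tp, fp, fn


def detection_metrics(per_image: List[Tuple[Sequence[Tuple[Box, float]], Sequence[Box]]],
                      iou_thresh: float = 0.5) -> dict:
    """Aggregate TP/FP/FN over all test images and derive precision, recall and F1."""
    tp = fp = fn = 0
    for preds, gts in per_image:
        t, f, n = match_image(preds, gts, iou_thresh)
        tp, fp, fn = tp + t, fp + f, fn + n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy example with made-up boxes: one hit, one false alarm, one missed object.
example = [
    ([((10, 10, 50, 50), 0.9), ((200, 200, 240, 240), 0.6)],  # predictions (box, score)
     [(12, 12, 52, 52), (100, 100, 140, 140)]),                # ground-truth boxes
]
print(detection_metrics(example))
# → {'tp': 1, 'fp': 1, 'fn': 1, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Object detection has no natural true negatives, so accuracy and ROC AUC are not directly defined per box; the usual substitute is a precision-recall curve obtained by sweeping the score threshold, summarized as average precision. If the predictions and annotations are exported in COCO JSON format, pycocotools' COCOeval computes AP at several IoU thresholds.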
Thank you all for the great support on this wonderful platform.
You can email me or message me on this forum; all suggestions are welcome.
Warm regards,
Harshit
Harshit_joshi@iiitb.ac.in