Developer chat

So now we need to either return nothing or a dict of valid key:value pairs that can be handled. Thanks for the update
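
Assuming this refers to what a callback may return, here is a minimal plain-Python sketch of that contract. It is not fastai’s actual handler; `VALID_KEYS` and `apply_callback_result` are made-up names for illustration only.

```python
# Hypothetical sketch of the "return nothing or a dict of valid keys" contract.
# VALID_KEYS and apply_callback_result are illustrative names, not fastai API.
VALID_KEYS = {"last_loss", "stop_epoch", "stop_training"}

def apply_callback_result(state: dict, result) -> dict:
    """Merge a callback's return value into the training state."""
    if result is None:            # returning nothing leaves the state untouched
        return state
    if not isinstance(result, dict):
        raise TypeError(f"expected None or a dict, got {type(result).__name__}")
    unknown = set(result) - VALID_KEYS
    if unknown:                   # reject keys the training loop cannot handle
        raise KeyError(f"unsupported keys: {sorted(unknown)}")
    state.update(result)
    return state

state = {"last_loss": 0.9, "stop_epoch": False, "stop_training": False}
apply_callback_result(state, None)                     # no change
apply_callback_result(state, {"stop_training": True})  # valid update
```

Validating before calling `update` means a bad key never leaves the state half-modified.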

Under development, but this should work as long as you are on master for both fastai and fastprogress: automatically resuming training from where it stopped.
If AWS shuts down your spot instance, the marvelous fastec2 will wait for one to become available again, and it would be nice to restart training from where you last were (at least from the last completed epoch). This is now possible with two callbacks! Just pass:

callback_fns = [TrackEpochCallback, partial(SaveModelCallback, every='epoch')]

in your call to fit, fit_one_cycle, or a custom scheduler, and it should work. Interrupted training runs will restart from the last completed epoch!

Note that to force training to restart from the beginning, you need to remove the epoch file that will appear in your model_dir, because that’s where we keep track of the progress made.
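
To illustrate the mechanism, here is a minimal sketch in plain Python of resuming from a progress file. It is not the code behind TrackEpochCallback or SaveModelCallback; the `train` function, the file name `epoch`, and the simulated failure are assumptions made for the example.

```python
import tempfile
from pathlib import Path

def train(n_epochs, model_dir, fail_at=None):
    """Run epochs, recording the count of completed epochs in model_dir/'epoch'."""
    epoch_file = model_dir / "epoch"           # hypothetical progress file
    start = int(epoch_file.read_text()) if epoch_file.exists() else 0
    ran = []
    for epoch in range(start, n_epochs):
        if epoch == fail_at:
            raise RuntimeError("simulated spot-instance shutdown")
        ran.append(epoch)                      # stand-in for the real training work
        epoch_file.write_text(str(epoch + 1))  # persist progress after each epoch
    if epoch_file.exists():
        epoch_file.unlink()                    # done: the next run starts from scratch
    return ran

with tempfile.TemporaryDirectory() as d:
    model_dir = Path(d)
    try:
        train(5, model_dir, fail_at=3)         # interrupted after epochs 0, 1, 2
    except RuntimeError:
        pass
    resumed = train(5, model_dir)              # picks up at epoch 3
```

Deleting the progress file by hand before the second call would make it start again from epoch 0, which mirrors the note above.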

Breaking change (this is all @radek’s fault, so don’t be mad at me ^^)

create_cnn won’t exist anymore in v1.0.47 and later; it will be replaced by cnn_learner for consistency with all the other convenience functions that return a Learner. I’m updating the docs, and will do the course and the deployment kits once the release is out.

And since it’s “I’m going to break things” day:

  • random_split_by_pct becomes split_by_rand_pct
  • no_split becomes split_none
    so that all split methods begin with split_

Again, blame @radek :wink:

A product of the hard work of @ashaw, @Benudek and yours truly is now ready for your enjoyment (project thread).

  1. Each API that’s tested in the fastai test suite now has a test entry in the documentation (200+ of those):

    e.g. go here https://docs.fast.ai/basic_train.html#Learner.save and click on [test], and you will get:

So in addition to tutorials and code snippets in the docs, you can now also look at how the API is tested, including expected results and expected failures.

  2. Each API that isn’t yet tested invites you to do so:

    e.g. go here https://docs.fast.ai/basic_train.html#Recorder.plot_metrics and click on [test], and you will get:

  3. And similar to show_doc, you can also call show_test from your notebook once you install the new fastai 1.0.47 when it’s released, or using git master now:

It takes the same arguments as show_doc; just call show_test instead.

What makes it all possible is a special this_tests call in every test. https://docs.fast.ai/dev/test.html#test-registry

Bottom line, if you click on [test] or you run show_test and you get ‘No tests found’ - please contribute some!
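
For readers curious how such a registry can work, here is a toy sketch. It is not fastai’s this_tests implementation (see the link above for that); `REGISTRY`, `register_test`, `lookup_tests` and `ToyLearner` are hypothetical names, and the test name is passed in explicitly to keep the example short.

```python
from collections import defaultdict

REGISTRY = defaultdict(list)   # "module.qualname" of an API -> names of its tests

def _key(api) -> str:
    return f"{api.__module__}.{api.__qualname__}"

def register_test(test_name: str, *apis) -> None:
    """Record that test_name exercises each of the given APIs."""
    for api in apis:
        if test_name not in REGISTRY[_key(api)]:
            REGISTRY[_key(api)].append(test_name)

def lookup_tests(api) -> list:
    """Return the registered tests for an API, or a 'No tests found' marker."""
    return REGISTRY.get(_key(api)) or ["No tests found"]

class ToyLearner:              # stand-in for a real API under test
    def save(self): ...
    def load(self): ...

def test_save():
    register_test("test_save", ToyLearner.save)   # first line of the test body

test_save()
found = lookup_tests(ToyLearner.save)
missing = lookup_tests(ToyLearner.load)
```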

Kudos to @stas for leading and executing! @sgugger

I hope this is the right place to post. I have a suggestion for improving the ClassificationInterpretation class. Right now there is a very useful feature, most_confused, that shows you the classes that your model is predicting incorrectly. There is also plot_top_losses, also very useful, to visualize the cases the model had the highest loss on. However, if you discover the model is mixing up two classes that it shouldn’t be, I don’t believe there is any way to visualize the data for just those classes.

For example if we have a fruit classifier, maybe most_confused shows something like this…

interp.most_confused(min_val=2) 

[('pear', 'apple', 4),
('papaya', 'mango', 3),
('watermelon', 'apple', 3),
('apple', 'pear', 2)]

These are all reasonable except for watermelon/apple which should be very easy to distinguish so I’d think of this as a good place to take a closer look, but I don’t think there’s a way to do it easily.

Would this be useful enough to consider as a feature, or would it add bloat? Would it be best to add its own function? Or would it be better to add two additional parameters to plot_top_losses (pred_class=None, actual_class=None) to act as a filter? I’d be happy to do it and submit a PR if people would find it useful, but I’ve never submitted a PR, so I would probably need guidance on the implementation.
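
A rough sketch of the filtering idea, in plain Python rather than against ClassificationInterpretation: the function name `top_losses_for_pair` and the toy data are made up, and it assumes the watermelon/apple pair means “actually watermelon, predicted apple”.

```python
def top_losses_for_pair(losses, preds, actuals, pred_class=None, actual_class=None, k=9):
    """Indices of the k highest-loss items, optionally filtered by class."""
    idxs = [
        i for i in range(len(losses))
        if (pred_class is None or preds[i] == pred_class)
        and (actual_class is None or actuals[i] == actual_class)
    ]
    # Highest loss first, as plot_top_losses-style views usually want
    return sorted(idxs, key=lambda i: losses[i], reverse=True)[:k]

# Toy data: one loss, one prediction and one true label per validation item
losses  = [0.2, 2.5, 1.1, 3.0, 0.7]
preds   = ["apple", "apple", "pear", "apple", "mango"]
actuals = ["apple", "watermelon", "apple", "watermelon", "papaya"]

confused = top_losses_for_pair(losses, preds, actuals,
                               pred_class="apple", actual_class="watermelon")
```

With both filters left as None, the function falls back to the plain top-losses ordering.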

Thanks!

-Rob

FYI, fastai-1.0.47 is out. Here is the list of changes.

I would like to propose a context manager, progress_disabled(). When doing a quick search over batch size or image size, it would help not to have a bunch of red bars coming up.

import fastai, fastprogress
from copy import copy
from functools import partial
from fastai.basic_train import Learner, Recorder
from fastprogress.fastprogress import master_bar, progress_bar

class progress_disabled():
    ''' Context manager to disable the progress update bar and Recorder print'''
    def __init__(self,learner:Learner):
        self.learn = learner
        self.orig_callback_fns = copy(learner.callback_fns)
    def __enter__(self):
        #silence progress bar
        fastprogress.fastprogress.NO_BAR = True
        fastai.basic_train.master_bar, fastai.basic_train.progress_bar = fastprogress.force_console_behavior()
        #silence recorder (assumes Recorder is the first callback_fn and that silent is exposed in Recorder.__init__)
        self.learn.callback_fns[0] = partial(Recorder,add_time=True,silent=True)
        return self.learn

    def __exit__(self,type,value,traceback):
        #restore the original progress bars and callbacks
        fastai.basic_train.master_bar, fastai.basic_train.progress_bar = master_bar,progress_bar
        self.learn.callback_fns = self.orig_callback_fns

Used like this:

with progress_disabled(learn) as tmp_learn:
    tmp_learn.fit(1)

The code is pretty simple (above), and I’m happy to make a PR with it included. One change I would need is to Recorder: is there a reason that self.silent is not exposed in the __init__ for Recorder? See here.
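
The underlying pattern (override some settings on entry, always restore them on exit) can be sketched without fastai. This is only an illustration; `temporarily` and `Settings` are hypothetical names, and `Settings` merely stands in for module-level flags such as NO_BAR.

```python
from contextlib import contextmanager

@contextmanager
def temporarily(obj, **overrides):
    """Set attributes on obj for the duration of the block, then restore them."""
    originals = {name: getattr(obj, name) for name in overrides}
    for name, value in overrides.items():
        setattr(obj, name, value)
    try:
        yield obj
    finally:                       # runs even if the body raises
        for name, value in originals.items():
            setattr(obj, name, value)

class Settings:                    # stand-in for a module holding display flags
    no_bar = False
    silent = False

with temporarily(Settings, no_bar=True, silent=True) as s:
    inside = (s.no_bar, s.silent)
after = (Settings.no_bar, Settings.silent)
```

Putting the restore step in `finally` matters for the batch-size search use case: an out-of-memory error inside the block would otherwise leave the progress bars disabled.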

Another question is where to put it into the library. Inside basic_train.py does not quite seem right, so open to suggestions.

Thanks for any feedback or suggestions.

I updated from 1.0.43.post1 to 1.0.47 and the ImageDataBunch creation for object detection is now much, much faster, specifically the call to label_from_func, which used to take tens of seconds and is now instantaneous. I’m curious which change was responsible for this speedup. Great job!!

There was a bug where we were loading all the targets at creation, which required opening all the images to get their sizes. Fixed it and we’re back to things being loaded on the fly when needed, that’s why it’s now faster!
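
As a generic illustration of why on-the-fly loading is cheaper at creation time (this is not the fastai data block code; `LazyTargets` and `expensive_label` are invented for the example):

```python
class LazyTargets:
    """Compute each target on first access instead of at construction time."""
    def __init__(self, items, label_func):
        self.items, self.label_func = list(items), label_func
        self._cache = {}

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        if i not in self._cache:               # do the costly work only once
            self._cache[i] = self.label_func(self.items[i])
        return self._cache[i]

calls = []
def expensive_label(name):
    calls.append(name)      # records how often the costly work actually runs
    return name.upper()     # stand-in for e.g. opening an image to read its size

targets = LazyTargets(["a.jpg", "b.jpg", "c.jpg"], expensive_label)
created_calls = len(calls)  # nothing has been labelled at creation
first = targets[1]          # labels only the requested item
```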

No reason, you can add it in the init.

FYI, fastai-1.0.47-post1 has been released with all the hotfixes backported since the 1.0.47 release.

List of fixes: https://github.com/fastai/fastai/commits/release-1.0.47

fit_one_cycle seems to be broken in 1.0.47. Using Lesson #9’s SSD model, I have two modes of training:

ssd.learn.fit(50, lr=0.004)
fit_one_cycle(ssd.learn, cyc_len=30, max_lr=0.004)

The first one runs fine, but the second one, with fit_one_cycle, hangs during the metric callback. It also takes about 6x as long to finish the validation loss computation.

I’ll take a look later but just wanted to mention this in case someone has a hint what may be happening here.

Thanks for the report, @vha14

This is always a great opportunity to add a new test that fails, since our current test suite has no problems.

Since the callbacks were revamped it’s possible that some weren’t ported correctly. So having a test that fails makes it much easier to identify and fix the problem and avoid causing it again in the future.

Thank you.

And using our newly released test registry,
https://docs.fast.ai/basic_train.html#fit_one_cycle - click on [test], gives you:

Tests found for fit_one_cycle :

  • pytest -sv tests/test_callback.py::test_callbacks_fit [source]
  • pytest -sv tests/test_train.py::test_fit_one_cycle [source]

so that you have it if you need to find a starting point to build upon.

Hi @stas,
One of the changes in version 1.0.47-post1 was the removal of create_cnn and its replacement by create_cnn_model in the list of exported methods (see this commit), which doesn’t seem to match it in usage and might not be needed, as the replacement method cnn_learner has already been incorporated.

This also broke a lot of the existing notebooks and documentation, including the videos (since they all refer to create_cnn), so might I suggest adding it back to the list of exported methods, with the current deprecation in place, to allow for a smoother transition?
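
What such a transition shim usually looks like, as a sketch only: the `cnn_learner` below is a stand-in that just echoes its arguments, the `(data, arch, **kwargs)` signature is a simplification, and the warning text is invented.

```python
import warnings

def cnn_learner(data, arch, **kwargs):
    # Stand-in for the real function; it echoes its arguments so the sketch runs.
    return ("learner", data, arch, kwargs)

def create_cnn(data, arch, **kwargs):
    """Deprecated alias kept so that older notebooks keep working."""
    warnings.warn("`create_cnn` is deprecated, use `cnn_learner` instead.",
                  DeprecationWarning, stacklevel=2)
    return cnn_learner(data, arch, **kwargs)

# Old-style call: still works, but emits a DeprecationWarning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = create_cnn("data", "resnet34", metrics=[])
```

`stacklevel=2` points the warning at the caller’s line rather than at the shim itself, which makes old notebooks easier to fix.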

Best regards,
Butch

Hey guys, I would like to make a PR for a simple bug fix, but I am stuck following this guide: fast.ai - How to make a pull request

When I call make test in step 5, I get the following error:

E /bin/sh: 1: /usr/local/cuda/bin/nvcc: not found

The full error message is listed below. Also if I try to import fastai in my notebooks it no longer works. I’m guessing this is because I uninstalled fastai in the previous step but haven’t finished replacing it with my own branch. Please help if you can. Thanks.

``make test
python setup.py --quiet test
warning: no previously-included files matching '__pycache__' found under directory '*'
warning: no files found matching 'conf.py' under directory 'docs'
warning: no files found matching 'Makefile' under directory 'docs'
warning: no files found matching 'make.bat' under directory 'docs'
============================= test session starts ==============================
platform linux -- Python 3.6.8, pytest-4.3.0, py-1.8.0, pluggy-0.9.0
rootdir: /notebooks/fastai-fork, inifile: setup.cfg
plugins: xdist-1.26.1, forked-1.0.2
collected 256 items / 1 errors / 255 selected

==================================== ERRORS ====================================
___________________ ERROR collecting tests/test_text_qrnn.py ___________________
/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/cpp_extension.py:946: in _build_extension_module
check=True)
/opt/conda/envs/fastai/lib/python3.6/subprocess.py:438: in run
output=stdout, stderr=stderr)
E subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:
tests/test_text_qrnn.py:3: in <module>
from fastai.text.models import qrnn
fastai/text/models/qrnn.py:11: in <module>
forget_mult_cuda = load(name='forget_mult_cuda', sources=[fastai_path/f for f in files])
/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/cpp_extension.py:645: in load
is_python_module)
/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/cpp_extension.py:814: in _jit_compile
with_cuda=with_cuda)
/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/cpp_extension.py:863: in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
/opt/conda/envs/fastai/lib/python3.6/site-packages/torch/utils/cpp_extension.py:959: in _build_extension_module
raise RuntimeError(message)
E RuntimeError: Error building extension 'forget_mult_cuda': [1/2] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=forget_mult_cuda -DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/envs/fastai/lib/python3.6/site-packages/torch/lib/include -isystem /opt/conda/envs/fastai/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -isystem /opt/conda/envs/fastai/lib/python3.6/site-packages/torch/lib/include/TH -isystem /opt/conda/envs/fastai/lib/python3.6/site-packages/torch/lib/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/fastai/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -std=c++11 -c /notebooks/fastai-fork/fastai/text/models/forget_mult_cuda_kernel.cu -o forget_mult_cuda_kernel.cuda.o
E FAILED: forget_mult_cuda_kernel.cuda.o
E /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=forget_mult_cuda -DTORCH_API_INCLUDE_EXTENSION_H -isystem /opt/conda/envs/fastai/lib/python3.6/site-packages/torch/lib/include -isystem /opt/conda/envs/fastai/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -isystem /opt/conda/envs/fastai/lib/python3.6/site-packages/torch/lib/include/TH -isystem /opt/conda/envs/fastai/lib/python3.6/site-packages/torch/lib/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/envs/fastai/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -std=c++11 -c /notebooks/fastai-fork/fastai/text/models/forget_mult_cuda_kernel.cu -o forget_mult_cuda_kernel.cuda.o
E /bin/sh: 1: /usr/local/cuda/bin/nvcc: not found
E ninja: build stopped: subcommand failed.
!!! Interrupted: 1 errors during collection !!!
=========================== 1 error in 4.21 seconds ============================
Makefile:169: recipe for target 'test' failed
make: *** [test] Error 2``
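
The log shows the QRNN CUDA extension being compiled with nvcc at import time, and the shell not finding it at /usr/local/cuda/bin/nvcc. A quick way to check whether the compiler is reachable from your environment, as a sketch: `find_nvcc` is a hypothetical helper, and the default path is simply the one from the log above.

```python
import os
import shutil

def find_nvcc(cuda_home: str = "/usr/local/cuda"):
    """Return the path to nvcc if it can be found, else None."""
    on_path = shutil.which("nvcc")             # first look on PATH
    if on_path:
        return on_path
    candidate = os.path.join(cuda_home, "bin", "nvcc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None

nvcc = find_nvcc()
if nvcc:
    print("nvcc found at", nvcc)
else:
    print("nvcc not found; CUDA extensions cannot be compiled here")
```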

Has anyone solved this error when running tests in the recent version?

Exception: Error: libsixel is needed. See https://github.com/saitoha/libsixel

Issue logged here: https://github.com/fastai/fastai/issues/1798

Wondering if there is a simple pip install to work around it until it’s fixed.

@PegasusWithoutWinds

Did you run this command after pulling down the code?

and make an editable install with the developer prerequisites:

pip install -e ".[dev]"

https://docs.fast.ai/dev/git.html

Note that in the latest release there is an error when testing; see my post above.

I get the same error.
Yes, I’m using an editable install and the latest code.
The error pops up when running from fastai.tabular import *

Exception                                 Traceback (most recent call last)
<ipython-input-1-e6990e2f588f> in <module>
----> 1 from fastai.tabular import *
      2 from fastai.callbacks import ReduceLROnPlateauCallback,EarlyStoppingCallback
      3 from sklearn.metrics import roc_auc_score

~/fastai-fork/fastai/tabular/__init__.py in <module>
----> 1 from .. import basics
      2 from ..basics import *
      3 from .data import *
      4 from .transform import *
      5 from .models import *

~/fastai-fork/fastai/basics.py in <module>
----> 1 from .basic_train import *
      2 from .callback import *
      3 from .core import *
      4 from .basic_data import *
      5 from .data_block import *

~/fastai-fork/fastai/basic_train.py in <module>
      8 from fastprogress.fastprogress import format_time, IN_NOTEBOOK
      9 from time import time
---> 10 from fastai.sixel import plot_sixel
     11 
     12 __all__ = ['Learner', 'LearnerCallback', 'Recorder', 'RecordOnCPU', 'fit', 'loss_batch', 'train_epoch', 'validate',

~/fastai-fork/fastai/sixel.py in <module>
      3 libsixel = try_import('libsixel')
      4 if not libsixel:
----> 5     raise Exception('Error: `libsixel` is needed. See https://github.com/saitoha/libsixel')
      6 
      7 def _sixel_encode(data, width, height):

Exception: Error: `libsixel` is needed. See https://github.com/saitoha/libsixel
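
The traceback shows the exception being raised at import time, so every `from fastai... import *` fails when the optional library is missing. One common way to avoid that, sketched here with hypothetical names (`optional_import`, `require_libsixel`; this is not the fix that went into fastai), is to defer the error until the feature is actually used:

```python
import importlib

def optional_import(name: str):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

_libsixel = optional_import("libsixel")        # never raises at import time

def require_libsixel():
    """Raise only when sixel plotting is actually requested."""
    if _libsixel is None:
        raise RuntimeError("`libsixel` is needed for sixel plotting. "
                           "See https://github.com/saitoha/libsixel")
    return _libsixel
```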