Part 2 Lesson 10 wiki

To be more precise, the output of the RNNLearner is indeed a tuple with three things: decoded, raws, outs

  • decoded, the first one, is the result of the last hidden state passed through the decoder. With a softmax, you can turn it into the probabilities of each word. Its shape is sequence_length * batch_size by vocab_size.
  • raws, the hidden states of our LSTM layers. There are three of them in the language model (which is our nl), so it’s a list of three tensors, each of size sequence_length by batch_size by the hidden size of its corresponding LSTM.
  • outs, the same as raws, but after the last dropout layer.

The reason it returns all of this, and not only the decoded output, is that sometimes (when you want to build an attention layer on top of your model for instance) you need the hidden states.
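A quick way to check those shapes yourself (a rough sketch, assuming the language-model learner and its ModelData are named learner and md as in the imdb notebook, and using fastai’s V to wrap the batch):

m = learner.model
m.reset()                        # build the initial hidden state first
m.eval()                         # turn off dropout for inspection
x, y = next(iter(md.trn_dl))     # one batch from the LM data loader
decoded, raws, outs = m(V(x))    # the 3-tuple described above
print(decoded.size())            # sequence_length * batch_size by vocab_size
print([r.size() for r in raws])  # one tensor per LSTM layer
print([o.size() for o in outs])  # same shapes as raws, after the last dropout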


Does anyone have an idea of how we format the data if there is multi-label text data (in the lecture Jeremy mentions a “labels” column followed by a “text” column in the csv file)? I.e., an example can be in both classes at the same time. I have tried to look into the source code but I am not sure how I can do it. Do we have any standard API for this kind of text dataset?

In addition, how can we tweak the model to support multi-label output? I tried looking at lesson 9, where we did multi-class output for image classification, but I cannot figure out how to make it work for the LM. Would love some help, or just a pointer to things that I should look at, thanks!


Starting to go back through the part 2 class, I have a few questions on Lesson 10 and the imdb notebook:

  1. When you built the .csv files for classification you eliminated the “unsup” labels for train.csv but not for test.csv. Why?

  2. In your fixup() method you replace a bunch of things with other values based on what you discovered after looking at 12 different datasets. Given a corpus, what can/should we do to figure out what should and should not be “fixed up”?

  3. Instead of using xfld 1 for delimiting fields, would it not be better to use xfld_1, as the token “1” is likely to be used elsewhere in the corpus?

  4. Thoughts on using the entire corpus to build the vocab rather than just the training set? I’ve seen both on kaggle competitions and am wondering what the consensus is, as well as the pros/cons of each approach.

  5. At the end of this notebook you mention that “with bidir we get a 95.4% accuracy.” Did you do this by just using the pre-trained language model as is with the bwd_wt103.h5 weights -or- did you fully train a language model using the pre-trained weights to start with (as you did in the notebook)?

  6. Training an LM even with pre-trained weights takes a long time. Is the ultimate objective to be able to use the encoder from a pre-trained LM to do classification without first training an LM on their particular corpus?

  1. I don’t think there should be unsup in test
  2. I just looked for odd tokenization issues or markup in the docs manually
  3. The concept of “new field” can only be learned if xfld is a separate token. The RNN can learn about xfld 1 as a concept by using state (see the sketch after this list).
  4. If you’re training an LM, it makes sense to use the whole thing
  5. I repeated the whole process end to end for the backward model
  6. If you’ve got an LM that’s somewhat close to your target corpus, you could just fine-tune it briefly, or even skip straight to the classifier.
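To make (3) concrete, here is a rough sketch of the field mark-up the imdb notebook builds (the helper below is made up for illustration; the notebook builds these strings inline), with xfld kept as its own token and the field number following it as a separate token:

BOS = 'xbos'   # beginning-of-document marker
FLD = 'xfld'   # field marker; the field number follows as its own token

def mark_fields(fields):
    # hypothetical helper: join a list of text fields into one marked-up string
    text = f'{BOS}'
    for i, field in enumerate(fields, 1):
        text += f' {FLD} {i} {field}'
    return text

mark_fields(['A movie review title', 'The review body'])
# -> 'xbos xfld 1 A movie review title xfld 2 The review body'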

I was trying to build a multi-label model, i.e. with an output of 7 classes like [0,0,0,1,0,0,1].
I changed the model crit to F.binary_cross_entropy and got this error when running.
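(As an aside, a hedged sketch of that swap using standard PyTorch: F.binary_cross_entropy expects probabilities already in [0, 1], whereas F.binary_cross_entropy_with_logits applies the sigmoid itself, so the latter is usually the safer choice on raw model outputs.)

import torch.nn.functional as F
# targets need to be float vectors like [0,0,0,1,0,0,1]
learner.crit = F.binary_cross_entropy_with_logits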

I struggled to debug this, as there were multiple classes being passed around.

I also tried to pass in an input to visualize the output, but failed.

tmp = iter(md.trn_dl)
*xs, y = next(tmp)
m(*VV(xs))

You need to call reset on your model first, if it’s an RNN, when debugging it in this way. That’s what creates the initial hidden state.
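In other words, a minimal sketch re-using the names from the snippet above:

m.reset()              # creates the initial hidden state
tmp = iter(md.trn_dl)  # grab one batch from the training DataLoader
*xs, y = next(tmp)
preds = m(*VV(xs))     # now the forward pass has a hidden state to work with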

Oh my! I actually spent a lot of time going through the code but couldn’t find the solution… I don’t know why I didn’t see any notification… Thanks! It works like a charm.

Why is the reset function called after the loop over self.hidden? If so, how is the hidden state created before this when I’m doing fit()? I tried reading through the source code but I can’t understand the flow exactly…

Sorry I don’t understand your question - can you give more detail please?

Sorry for not being clear.

From my understanding, self.hidden is created in RNN_Encoder.reset(); however, in MultiBatchRNN.forward(), the loop over self.hidden comes earlier than super().forward(). So how is self.hidden created in the first place? This is essentially equivalent to the question: why do I have to call m.reset() manually when I pass a batch to the model, but I don’t have to do so when calling fit() or lr_find()? (It would be great if you could give me some advice on how to use pdb to answer this; I have only tried printing things inside functions, so I know the calling stack, but I don’t know how to use the debugger to find out how self.hidden gets created.)

The Stepper for RNNs handles calling reset for you.
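Roughly, the pattern looks like this (a simplified sketch of the idea only, not the actual fastai source; see fastai/model.py for the real Stepper):

class Stepper:
    # simplified sketch: fit() and lr_find() wrap the model in a Stepper
    def __init__(self, m, opt, crit):
        self.m, self.opt, self.crit = m, opt, crit
        self.reset()

    def reset(self):
        # if the model is an RNN exposing reset(), call it so that self.hidden
        # exists before the first forward pass; that is why you never need to
        # call it yourself inside fit()
        if hasattr(self.m, 'reset'): self.m.reset()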


Hi,
I’m running the latest version of the notebook from the Lesson resources on my Windows laptop with a 960m GPU, which is not supported by pytorch 0.3, so I’m using the latest conda env with pytorch 0.4.0.

I’m running into the same problem, a NotImplementedError in rnn_reg.py during the fitting of the language model:
learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)

Debugging shows that there is no Embedding in fastai\lib\site-packages\torch\nn\backends\backend.py when calling self.function_classes.get(name) where name is 'Embedding'.

I wonder, is there a path forward to solve this? Should I open this kind of issue on the pytorch github?

P.S. Is it OK not to have backward compatibility in pytorch? E.g. functions that worked in 0.3 don’t work in 0.4.

You’ll need to git pull to get the latest version which fixes this.

That fixed the error. I was then running out of memory on my 4GB GPU, so I decreased bs to 16, and now there is another error when running learner.fit:

RuntimeError                              Traceback (most recent call last)
<ipython-input-20-b544778ca021> in <module>()
----> 1 learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)

C:\Users\Developer\fastai\courses\dl2\fastai\learner.py in fit(self, lrs, n_cycle, wds, **kwargs)
    285         self.sched = None
    286         layer_opt = self.get_layer_opt(lrs, wds)
--> 287         return self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
    288 
    289     def warm_up(self, lr, wds=None):

C:\Users\Developer\fastai\courses\dl2\fastai\learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, best_save_name, use_clr, use_clr_beta, metrics, callbacks, use_wd_sched, norm_wds, wds_sched_mult, use_swa, swa_start, swa_eval_freq, **kwargs)
    232             metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, fp16=self.fp16,
    233             swa_model=self.swa_model if use_swa else None, swa_start=swa_start,
--> 234             swa_eval_freq=swa_eval_freq, **kwargs)
    235 
    236     def get_layer_groups(self): return self.models.get_layer_groups()

C:\Users\Developer\fastai\courses\dl2\fastai\model.py in fit(model, data, n_epochs, opt, crit, metrics, callbacks, stepper, swa_model, swa_start, swa_eval_freq, **kwargs)
    130             batch_num += 1
    131             for cb in callbacks: cb.on_batch_begin()
--> 132             loss = model_stepper.step(V(x),V(y), epoch)
    133             avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)
    134             debias_loss = avg_loss / (1 - avg_mom**batch_num)

C:\Users\Developer\fastai\courses\dl2\fastai\model.py in step(self, xs, y, epoch)
     55         if self.loss_scale != 1: assert(self.fp16); loss = loss*self.loss_scale
     56         if self.reg_fn: loss = self.reg_fn(output, xtra, raw_loss)
---> 57         loss.backward()
     58         if self.fp16: update_fp32_grads(self.fp32_params, self.m)
     59         if self.loss_scale != 1:

C:\Users\Developer\Anaconda3\envs\fastai\lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
     91                 products. Defaults to ``False``.
     92         """
---> 93         torch.autograd.backward(self, gradient, retain_graph, create_graph)
     94 
     95     def register_hook(self, hook):

C:\Users\Developer\Anaconda3\envs\fastai\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     87     Variable._execution_engine.run_backward(
     88         tensors, grad_tensors, retain_graph, create_graph,
---> 89         allow_unreachable=True)  # allow_unreachable flag
     90 
     91 

RuntimeError: inconsistent range for TensorList output

@jeremy Is there documentation available of the exact preprocessing and training steps for the language model (WikiText-103)? I looked at the Hindi repository, but I’m not sure if the “t_up” trick is used there.

So it would be great to have a kind of reference implementation/documentation for the WikiText-103 language model :slight_smile:

This is the script we used: https://github.com/fastai/fastai/blob/master/courses/dl2/imdb_scripts/train_tri_wt.py

Interesting, this comes from the freezing of layers. The model trains without error if you unfreeze it completely; I don’t know where this one comes from.
I’ll try to look into it.
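In the meantime, a possible workaround sketch (re-using the names from the fit call above) is simply to unfreeze before fitting:

learner.unfreeze()   # make every layer group trainable, side-stepping the error above
learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)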


Hi all. I’m using Google Colaboratory and I’m getting an AttributeError when scipy.sparse is imported: module ‘scipy’ has no attribute ‘sparse’. I’ve searched the forums and noticed others having the same issue.

Not a fast.ai problem, but just wondering if anyone has found a patch…

I have installed scipy 1.1.0 and have also imported it directly using from scipy import sparse as sp.

Any help would be greatly appreciated!


AttributeError                            Traceback (most recent call last)
in <module>()
----> 1 from fastai.text import *
      2 import html

/usr/local/lib/python3.6/dist-packages/fastai/text.py in <module>()
----> 1 from .core import *
      2 from .learner import *
      3 from .lm_rnn import *
      4 from torch.utils.data.sampler import Sampler
      5 import spacy

/usr/local/lib/python3.6/dist-packages/fastai/core.py in <module>()
----> 1 from .imports import *
      2 from .torch_imports import *
      3
      4 def sum_geom(a,r,n): return a*n if r==1 else math.ceil(a*(1-r**n)/(1-r))
      5

/usr/local/lib/python3.6/dist-packages/fastai/imports.py in <module>()
      3 import pandas as pd, pickle, sys, itertools, string, sys, re, datetime, time, shutil, copy
      4 import seaborn as sns, matplotlib
----> 5 import IPython, graphviz, sklearn_pandas, sklearn, warnings, pdb
      6 import contextlib
      7 from abc import abstractmethod

/usr/local/lib/python3.6/dist-packages/sklearn_pandas/__init__.py in <module>()
      1 __version__ = '1.6.0'
      2
----> 3 from .dataframe_mapper import DataFrameMapper  # NOQA
      4 from .cross_validation import cross_val_score, GridSearchCV, RandomizedSearchCV  # NOQA
      5 from .categorical_imputer import CategoricalImputer  # NOQA

/usr/local/lib/python3.6/dist-packages/sklearn_pandas/dataframe_mapper.py in <module>()
      5 import numpy as np
      6 from scipy import sparse
----> 7 from sklearn.base import BaseEstimator, TransformerMixin
      8
      9 from .cross_validation import DataWrapper

/usr/local/lib/python3.6/dist-packages/sklearn/__init__.py in <module>()
    132 else:
    133     from . import __check_build
--> 134     from .base import clone
    135     __check_build  # avoid flakes unused variable error
    136

/usr/local/lib/python3.6/dist-packages/sklearn/base.py in <module>()
     11 from scipy import sparse
     12 from .externals import six
---> 13 from .utils.fixes import signature
     14 from . import __version__
     15

/usr/local/lib/python3.6/dist-packages/sklearn/utils/__init__.py in <module>()
      9
     10 from .murmurhash import murmurhash3_32
---> 11 from .validation import (as_float_array,
     12                          assert_all_finite,
     13                          check_random_state, column_or_1d, check_array,

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in <module>()
     13
     14 import numpy as np
---> 15 import scipy.sparse as sp
     16
     17 from ..externals import six

AttributeError: module ‘scipy’ has no attribute ‘sparse’
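(A quick diagnostic sketch, plain Python rather than anything fastai-specific, that may help narrow this down: check which scipy installation is actually being picked up and whether the submodule imports on its own.)

import scipy
import scipy.sparse                       # fails the same way if the installed scipy is broken
print(scipy.__version__, scipy.__file__)  # confirm it is the 1.1.0 that was installed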

Is it me, or are the files no longer available at files.fast.ai?

For me, the files are still there.


Thanks, they’re back online now.