It looks like TextLMDataBunch.show_batch is actually being taken out of the code in the most recent versions, so none of this may be helpful.
I have been working through an issue where calling show_batch() on a TextLMDataBunch gave me the following error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-48-f346687833ba> in <module>
----> 1 data_lm.show_batch()
~/.conda/envs/kbird/lib/python3.6/site-packages/fastai/text/data.py in show_batch(self, sep, ds_type, rows, max_len)
227 items = [['idx','text']]
228 for i in range(rows):
--> 229 inp = self.x[:,i] if max_len is None else x[:,i][:max_len]
230 items.append([str(i), self.train_ds.vocab.textify(inp.cpu(), sep=sep)])
231 display(HTML(_text2html_table(items, [5,95])))
IndexError: index 9 is out of bounds for dimension 1 with size 9
So after digging into it a bit, I found this to be the culprit (you would think I could just look at the arrow, but for me it required digging):
inp = self.x[:,i] if max_len is None else x[:,i][:max_len]
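A toy reproduction of the failure mode, assuming (as the traceback suggests) that the batch tensor is laid out (seq_len, batch_size) and the batch only holds 9 sequences while rows defaults to 10:

```python
import torch

# Stand-in batch with the shape implied by the traceback:
# dimension 1 (the batch dimension) has size 9.
x = torch.zeros(70, 9)

failed_at = None
for i in range(10):  # rows defaults to 10 in show_batch
    try:
        inp = x[:, i][:100]  # mirrors x[:,i][:max_len]
    except IndexError:
        failed_at = i
        break

print(failed_at)  # i == 9 is out of bounds for dimension 1 with size 9
```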
A few things I want to check on here:
#1. Is there a reason that the first part uses self.x while the second uses the local x generated above?
#2. Would it make sense to instead transpose x at an earlier point?
#3. If there isn't enough text to display {rows} rows, do we want to just display the max available from one batch, or would it be better to keep iterating through the dataloader until reaching the end?
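On #2, a quick sketch of why transposing earlier tidies up the indexing (toy tensor; the (seq_len, batch) layout is my assumption about what the dataloader returns):

```python
import torch

x = torch.arange(12).reshape(4, 3)  # pretend (seq_len=4, batch=3)
xt = x.transpose(0, 1)              # now (batch, seq_len)

# Row i of the transposed tensor is column i of the original,
# so xt[i] can replace the more awkward x[:, i].
assert torch.equal(xt[1], x[:, 1])
```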
The simplest way to deal with #3 would be to load one batch and, if it isn't big enough, use its size as rows instead. Here are my total proposed changes:
def show_batch(self, sep=' ', ds_type:DatasetType=DatasetType.Train, rows:int=10, max_len:int=100):
    "Show `rows` texts from a batch of `ds_type`, tokens are joined with `sep`, truncated at `max_len`."
    from IPython.display import display, HTML
    dl = self.dl(ds_type)
    x,y = next(iter(dl))
    x = x.transpose(0,1)  # (seq_len, batch) -> (batch, seq_len)
    items = [['idx','text']]
    rows = x.shape[0] if rows > x.shape[0] else rows  # clamp rows to the batch size
    for i in range(rows):
        inp = x[i] if max_len is None else x[i][:max_len]
        items.append([str(i), self.train_ds.vocab.textify(inp.cpu(), sep=sep)])
    display(HTML(_text2html_table(items, [5,95])))
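To sanity-check the rows clamp above in isolation (dummy tensor standing in for the batch; a batch of 7 sequences is assumed):

```python
import torch

x = torch.zeros(20, 7).transpose(0, 1)  # batch of 7 sequences, seq_len 20
rows = 10
rows = x.shape[0] if rows > x.shape[0] else rows  # same guard as in show_batch
print(rows)  # 7: only as many rows as the batch actually holds
```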
This probably isn’t useful anymore, but it may at least help somebody who sees this error on the current version.
Here is my current Install:
=== Software ===
python version : 3.6.6
fastai version : 1.0.28
torch version : 1.0.0.dev20181029
nvidia driver : 396.37
torch cuda ver : 9.2.148
torch cuda is : available
torch cudnn ver : 7104
torch cudnn is : enabled
=== Hardware ===
nvidia gpus : 1
torch available : 1
- gpu0 : 16270MB | Quadro P5000
=== Environment ===
platform : Linux-3.10.0-862.11.6.el7.x86_64-x86_64-with-centos-7.5.1804-Core
distro : #1 SMP Tue Aug 14 21:49:04 UTC 2018
conda env : kbird
python : /home/kbird/.conda/envs/kbird/bin/python
sys.path :
/home/kbird/.conda/envs/kbird/lib/python36.zip
/home/kbird/.conda/envs/kbird/lib/python3.6
/home/kbird/.conda/envs/kbird/lib/python3.6/lib-dynload
/home/kbird/.local/lib/python3.6/site-packages
/home/kbird/.conda/envs/kbird/lib/python3.6/site-packages
/home/kbird/.conda/envs/kbird/lib/python3.6/site-packages/IPython/extensions
/home/kbird/.ipython