Lesson 4 flattened matrix


(Lior) #1

I do not really understand how the first matrix was flattened

next(iter(md.trn_dl))
(Variable containing:
     12    567      3  ...    2118      4   2399
     35      7     33  ...       6    148     55
    227    103    533  ...    4892     31     10
         ...            ⋱           ...         
     19   8879     33  ...      41     24    733
    552   8250     57  ...     219     57   1777
      5     19      2  ...    3099      8     48
 [torch.cuda.LongTensor of size 75x64 (GPU 0)], Variable containing:
     35
      7
     33
   ⋮   
     22
   3885
  21587
 [torch.cuda.LongTensor of size 4800 (GPU 0)])

For example, if we have a matrix that looks like this:

1   5   9
2   6   10
3   7   11
4   8   12

how will the flattened matrix look?


(urmas pitsi) #2

To see the results, try this if you have a cuda tensor:
x.cpu().numpy().flatten()

If you have a numpy array, then:
x.flatten()

Then you can observe exactly how they look. Usually it is by rows:
a = np.asarray([[1,2],[3,4],[5,6]])
a, a.flatten()
output:
(array([[1, 2],
[3, 4],
[5, 6]]), array([1, 2, 3, 4, 5, 6]))
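For completeness, numpy can also flatten in column order via the `order` argument, which is directly relevant to the row-vs-column question in this thread. A small sketch, standard numpy only:

```python
import numpy as np

a = np.asarray([[1, 2], [3, 4], [5, 6]])

# Default is row-major (C order): rows are concatenated left to right.
row_major = a.flatten()           # [1, 2, 3, 4, 5, 6]

# Column-major (Fortran order): columns are concatenated top to bottom.
col_major = a.flatten(order='F')  # [1, 3, 5, 2, 4, 6]
```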


(Lior) #3

Thanks for helping,
The result you wrote is what I expected.

But if you look at the vector in my example, you will see that the first numbers are taken from the second row and not from a column.
That is what confuses me.


(urmas pitsi) #4

I was noticing the same thing. Does your screenshot contain all the relevant code? Do you see the same result when you cast it to numpy?


(Sam Lloyd) #5

That’s right, so the dataloader returns X, y pairs, y in this case being the value at position i+1 of X AND flattened, hence it being stepped forward by one row. So really you would expect the input
[[1,2]
[3,4]]
And the output
[3,4,5,6]
It’s often easier to visualise with words than numbers.
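A minimal numpy sketch of that shift-by-one-row idea (toy data, not the actual dataloader internals):

```python
import numpy as np

# Toy token matrix: each row is one time step across a batch of 2 streams.
source = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])

seq_len = 2
x = source[0:seq_len]                  # [[1, 2], [3, 4]]
y = source[1:1 + seq_len].reshape(-1)  # next rows, flattened: [3, 4, 5, 6]
```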


(Lior) #6

But in Jeremy’s video, the line looks like this (correct me if I am wrong):
image

and that is what confuses me: the order seems wrong to me,
it changes the word order.

This picture was taken from the part 1 lesson 4 video.


(urmas pitsi) #7

You almost got me seriously confused :slight_smile: but I guess I found the answer to this magical problem. First take a look at the notebook between lines 16 and 17, it reads:

“… Each batch also contains the exact same data as labels, but one word later in the text - since we’re trying to always predict the next word. The labels are flattened into a 1d array.”

It means that we deliberately generate labels that are shifted by one position. So you can observe that el[1][:-64] is exactly el[0].flatten()[64:]; the last 64 positions of el[1] are from the future and not present in el[0].


(Sam Lloyd) #8

Apologies if I confused things :sweat_smile: but it sounds like you’ve got it now.
Yep:
x, y = next(iter(data.trn_dl))
Then
x.size() = [75, 64]
y.size() = [4800]

and of these,
x.flatten()[64:128] will be the same as y[:64]
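A toy check of that relation, using small stand-in sizes (seq_len=5, bs=4 instead of the real 75 and 64) and numpy instead of torch:

```python
import numpy as np

seq_len, bs = 5, 4                     # toy stand-ins for the real 75 and 64
source = np.arange((seq_len + 1) * bs).reshape(seq_len + 1, bs)

x = source[:seq_len]                   # shape (5, 4)
y = source[1:seq_len + 1].reshape(-1)  # shape (20,), shifted by one row

# Row 1 of x equals the first bs entries of y ...
assert (x.flatten()[bs:2 * bs] == y[:bs]).all()
# ... and more generally everything overlaps except one row at each end.
assert (x.flatten()[bs:] == y[:-bs]).all()
```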


(urmas pitsi) #9

Take a look at the fastai source ‘text.py’, line 186:

def get_batch(self, i, seq_len):
    source = self.data
    seq_len = min(seq_len, len(source) - 1 - i)
    return source[i:i+seq_len], source[i+1:i+1+seq_len].view(-1)

This is the source of the LanguageModel dataloader. As you can see, the last line has it: elem[0] = sequence, elem[1] = sequence shifted by 1 and flattened.
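A minimal numpy re-creation of that slicing (the real fastai version works on torch tensors; this is just to trace the logic on the 4×3 matrix from the first post):

```python
import numpy as np

def get_batch(data, i, seq_len):
    # Mirrors the slicing above: x is seq_len rows starting at i,
    # y is the same window shifted down by one row and flattened.
    seq_len = min(seq_len, len(data) - 1 - i)
    return data[i:i + seq_len], data[i + 1:i + 1 + seq_len].reshape(-1)

data = np.array([[1, 5, 9],
                 [2, 6, 10],
                 [3, 7, 11],
                 [4, 8, 12]])

x, y = get_batch(data, 0, 3)
# x is the first 3 rows; y is rows 1..3 flattened:
# [2, 6, 10, 3, 7, 11, 4, 8, 12]
```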


(Lior) #10

OK, I believe I understand it now.
But now I am not so sure why we even need it?
The second tensor (the flattened one) changes the word order, so why not just use the first one?


(urmas pitsi) #11

I think we want to have labels to predict, the ‘y’ vector. So we shift our x so that for each x(i) the label will be x(i+1), i.e. the following word in the sequence. We try to learn to predict the following word, given a sequence. Does that make sense?
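A plain-Python sketch of the same idea at word level (made-up tokens):

```python
# For each position in the stream, the label is simply the next token.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

x = tokens[:-1]  # ["the", "cat", "sat", "on", "the"]
y = tokens[1:]   # ["cat", "sat", "on", "the", "mat"]

# Each (input, label) pair: given a word, predict the following word.
pairs = list(zip(x, y))
```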


(Lior) #12

Yes, I understand the shifting part; what I am not understanding is the order of the vector.
If my data looks like this:

1 5 9
2 6 10
3 7 11
4 8 12
the right order to read the words is:
part 1: 1 2 3 4
part 2: 5 6 7 8
part 3: 9 10 11 12

but the vector will look like this:
[2,6,10,3,7,11,4,8,12 …]
The whole order is changed, as far as I understand.

I feel like I am missing something.
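A small numpy sketch of that, assuming row-major flattening: reading down the columns gives the three parts, while the shifted flatten interleaves them:

```python
import numpy as np

data = np.array([[1, 5, 9],
                 [2, 6, 10],
                 [3, 7, 11],
                 [4, 8, 12]])

# Reading order of the text: down each column in turn.
parts = [data[:, j].tolist() for j in range(3)]
# -> [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

# The shifted, row-major flatten interleaves those three streams:
interleaved = data[1:].reshape(-1)
# -> [2, 6, 10, 3, 7, 11, 4, 8, 12]
```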


(urmas pitsi) #13

Are you sure you see that? I see this: the 2nd element is the shifted y. Your very first screenshot at the top of the thread shows the same thing.

(Variable containing:
     64     18  19174  ...      2     33    904
     83   4392     18  ...    194     16   7129
    124   2105    800  ...      8     27    122
         ...            ⋱           ...
      9   1723    245  ...    163   2481     32
      8      4   1113  ...      5      5      8
   3033     58      3  ...    384   5905   1815
 [torch.cuda.LongTensor of size 68x64 (GPU 0)], Variable containing:
     83
   4392
     18
   ⋮
     88
    231
      8
 [torch.cuda.LongTensor of size 4352 (GPU 0)])


(Lior) #14

In which part am I mistaken?

  • The order of the words?
  • How the vector will look?

(urmas pitsi) #15

I meant the very first screenshot at the top of this thread. It shows that el[1] starts with the second row of el[0], flattened.


(Lior) #16

Yes, I see that.
But to get a line of words you should take a column, not a row:

image
So how will you take a line of words from “el flattened”?
In “el” you can do el.numpy()[:,0] and you will get a line of words.


(urmas pitsi) #17

As I understand it, words, i.e. tokens, are in the rows. And from the flattened version you can take them with the appropriate stride.
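If that is right, the stride would look like this in numpy (assuming bs parallel streams, one per column):

```python
import numpy as np

bs = 3  # number of parallel streams, one per column
flat = np.array([2, 6, 10, 3, 7, 11, 4, 8, 12])

# Every bs-th element of the flattened vector belongs to the same stream:
stream0 = flat[0::bs]  # [2, 3, 4]
stream1 = flat[1::bs]  # [6, 7, 8]
stream2 = flat[2::bs]  # [10, 11, 12]
```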


(Lior) #18

Now I understand what the problem is. By my understanding, each column contains multiple sentences.
It is even said here: https://youtu.be/gbceqO8PpBg?t=6867

or from the summary of the lesson:

Why not split by a sentence? [01:53:40] Not really. Remember, we are using columns.
So each of our columns is of length of about 1 million, so although it is true that those
columns are not always exactly finishing on a full stop, they are so darn long we do not care.
Each column contains multiple sentences.

and you said:

And what you said sits well with how the flattened matrix looks, but as far as I understand, it is not the same as what was said in the lesson.

what am I missing?
:neutral_face:


(urmas pitsi) #19

The text itself reads in rows, so we get a big chunk of data that we slice vertically, by columns. It’s like getting a piece of every sentence/line. I hope I’m right; I have to look into it.

The question ‘why not sentences…?’ kind of proves it as well: he wants to know why we slice ‘arbitrarily’ and not where the sentence ends. The answer says that as we have a lot of data, it does not matter that the slices do not always end at a sentence boundary.


(Lior) #20

Yes, I understand that; I just brought it up to show that every column is part of a sentence, or of many sentences.
Therefore it seems strange to me how the flattened matrix looks, as I described here:

The whole word order changes; to read the flattened matrix we should read it in a special way, and not just by iterating over it.
Do I understand correctly?
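One such “special way”, assuming the row-major flatten discussed above: reshape the vector back to (seq_len, bs) and read down a column (numpy sketch):

```python
import numpy as np

bs = 3
flat = np.array([2, 6, 10, 3, 7, 11, 4, 8, 12])

# Undo the row-major flatten, then read down a column
# to recover one continuous stream of words.
mat = flat.reshape(-1, bs)  # back to (seq_len, bs)
col0 = mat[:, 0]            # [2, 3, 4]
```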