Yes, it was the number of filters previously used.
Bit of background: each filter is good at highlighting a certain pattern (horizontal edges for one filter, vertical edges for another, and so on), so we add a few of those at the beginning to find the most basic patterns, like lines. This is why we find such basic pattern detectors in the initial layers of models such as ResNet.
In the sheet, Jeremy convolved two of those with the input. Each filter produces one output channel, so the result has a depth of 2, which is why the next tensor to be applied had to have a depth of 2. Note that Jeremy decided to have two such 3x3x2 tensors in the second convolution operation - this was a choice, and it decides the depth of the filters in the next convolution (if we add one).
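To make the depth rule above concrete, here is a minimal pure-Python sketch. The 7x7 input, the filter values and the helper names (`conv2d`, `shape`) are all made up for illustration and are not taken from Jeremy's sheet; the only point is how the depths line up.

```python
def conv2d(x, filters):
    """Naive convolution with no padding and stride 1.

    x       : list of channels, each an H x W list of lists
    filters : list of filters; each filter holds one k x k kernel per input channel
    Returns one output channel per filter.
    """
    in_depth = len(x)
    h, w = len(x[0]), len(x[0][0])
    out = []
    for f in filters:
        # A filter must be as deep as the tensor it is applied to.
        assert len(f) == in_depth, "filter depth must match input depth"
        k = len(f[0])
        channel = [
            [
                sum(
                    f[c][i][j] * x[c][r + i][col + j]
                    for c in range(in_depth)
                    for i in range(k)
                    for j in range(k)
                )
                for col in range(w - k + 1)
            ]
            for r in range(h - k + 1)
        ]
        out.append(channel)
    return out


def shape(t):
    """(depth, height, width) of a nested-list tensor."""
    return (len(t), len(t[0]), len(t[0][0]))


# Hypothetical single-channel 7x7 input (depth 1).
image = [[[float(r * 7 + c) for c in range(7)] for r in range(7)]]

# First convolution: two 3x3x1 filters (illustrative edge detectors).
horizontal = [[[1, 1, 1], [0, 0, 0], [-1, -1, -1]]]
vertical = [[[1, 0, -1], [1, 0, -1], [1, 0, -1]]]
conv1 = conv2d(image, [horizontal, vertical])

# Second convolution: conv1 has depth 2, so each filter must be 3x3x2.
# Using two of them is a free choice; it sets the depth of conv2.
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
negs = [[-1, -1, -1], [-1, -1, -1], [-1, -1, -1]]
filt_a = [ones, ones]
filt_b = [ones, negs]
conv2 = conv2d(conv1, [filt_a, filt_b])

print(shape(image), shape(conv1), shape(conv2))
```

The filter depth is forced by the previous layer's output, while the number of filters is up to us. In PyTorch terms this would roughly correspond to `in_channels` and `out_channels` of a conv layer.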
When we do a pooling operation, we don’t convolve with any filter - rather, we simply take small pieces of the previous layer’s activations (its outputs, not its weights) and apply some operation to each piece (taking the max of those values is called “max pooling”). So we’re reducing the height and width of the previous layer’s output, but the depth remains the same.
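A small sketch of max pooling, again in plain Python with made-up numbers. The `max_pool` helper and the 2x2 non-overlapping window are assumptions chosen for illustration; they show that each channel is pooled on its own, so height and width shrink while depth stays put.

```python
def max_pool(x, size=2):
    """Max pooling over non-overlapping size x size windows.

    x : list of channels, each an H x W list of lists
    Each channel is pooled independently, so the depth is unchanged.
    """
    out = []
    for channel in x:
        h, w = len(channel), len(channel[0])
        pooled = [
            [
                max(channel[r + i][c + j] for i in range(size) for j in range(size))
                for c in range(0, w - size + 1, size)
            ]
            for r in range(0, h - size + 1, size)
        ]
        out.append(pooled)
    return out


# Hypothetical activations with depth 2 and spatial size 4x4.
activations = [
    [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 9, 2],
     [3, 2, 4, 4]],
    [[7, 0, 1, 1],
     [2, 6, 0, 3],
     [5, 5, 8, 2],
     [1, 0, 2, 6]],
]

pooled = max_pool(activations)
print(pooled)  # [[[4, 5], [3, 9]], [[7, 3], [5, 8]]]
```

The input is 2 channels of 4x4 and the output is 2 channels of 2x2: no filter and no learned weights are involved, only a max over each window.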