Lesson 3 - Official Topic

Do you know where I should look in the DataBlock API to use an arbitrarily-channeled input? E.g., if I wanted to make a Siamese network where the images were stacked on top of each other instead (6 channels).
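(For anyone who lands here later: a minimal sketch of one possible approach, not from the lesson. open_stacked and stacked_block are hypothetical names, and the pair of file names is assumed to come from your get_items function.)

import torch
from fastai.vision.all import PILImage, image2tensor, TransformBlock

def open_stacked(pair):
    # Open the two image files and concatenate them channel-wise,
    # turning two (3, H, W) tensors into a single (6, H, W) input.
    imgs = [image2tensor(PILImage.create(fn)) for fn in pair]
    return torch.cat(imgs, dim=0)

stacked_block = TransformBlock(type_tfms=open_stacked)

The model's first convolutional layer would then also need to accept 6 input channels (if I remember correctly, cnn_learner takes an n_in argument for that).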

ok thank you very much

Hi @sgugger, when I use Resize(224, method='squish'), is the squish resize applied to both train and validation images? Is it also applied in learn.predict, or should I manually resize the test images before predicting?

Yes, the method is the same for training and validation sets. Especially since squish is not random.
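For illustration, a minimal sketch (path is a hypothetical dataset root with the usual train/valid folders). Because the Resize is an item transform, it runs on every DataLoader, and learn.predict reuses the validation transforms at inference, so no manual resizing is needed:

from fastai.vision.all import ImageDataLoaders, Resize

dls = ImageDataLoaders.from_folder(
    path,                                    # hypothetical dataset root
    item_tfms=Resize(224, method='squish'),  # applied to train, valid, and at predict time
)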

Thanks!

When scaled up, why is it that only one half of the predicted curve appears to adjust and not the other half?

Thanks,

Hi @paulsolomon,

you’re right about ImageClassifierCleaner(): if the destination class directory has an image with the same name, it fails. I just checked that out.

To correct the code, replace the line

for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)

with the following code (cc @jeremy and @sgugger):

import os, shutil

for idx,cat in cleaner.change():
    # `cleaner` is the ImageClassifierCleaner and `path` the dataset root,
    # both defined earlier in the notebook.
    real_dst = os.path.join(path/cat, cleaner.fns[idx].name)
    if os.path.exists(real_dst):
        # A file with the same name already exists in the destination class
        # directory, so prefix the old category name to avoid the collision.
        old_file_path = cleaner.fns[idx]
        old_cat = old_file_path.parent.stem
        new_file_path = f'{path/cat/old_cat}_{old_file_path.name.replace(" ","").lower()}'
        shutil.move(str(old_file_path), new_file_path)
    else:
        shutil.move(str(cleaner.fns[idx]), path/cat)

@jeremy

Looking at the trick of adding sum() to compute the derivative of a function at multiple points, two thoughts came to mind.

  1. How obvious is it to the general audience that the result indeed gives the desired collection of derivatives?
  2. The backward function takes a gradient argument (the “vector” in the “vector-Jacobian product” that backward actually computes), and populating it with ones gives precisely the collection of derivatives.

Roughly speaking, the difference between the two approaches is the order in which summation and differentiation are performed.
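To make the comparison concrete, a minimal sketch (using f(x) = x**2 as a stand-in for the lesson's function):

import torch

def f(x): return x**2

xt = torch.tensor([3., 4., 10.], requires_grad=True)

# Approach 1: sum the outputs, then differentiate the scalar sum.
f(xt).sum().backward()
print(xt.grad)   # tensor([ 6.,  8., 20.])

xt.grad = None   # reset before trying the second approach

# Approach 2: pass a vector of ones as the `gradient` argument,
# i.e. the "vector" in the vector-Jacobian product.
y = f(xt)
y.backward(gradient=torch.ones_like(y))
print(xt.grad)   # tensor([ 6.,  8., 20.]) -- the same collection of derivatives

Both print the same gradients, which is exactly the point about the order of summation and differentiation.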

I wrote a post to explain these two points (using fastpages, which I find phenomenal!)


Looks like a bias term is missing in the predicted quadratic function, but not sure :thinking:


Yeah maybe, that makes sense. Will check it out.

While plotting the function using plot_functions() (without calculating gradients), it has a parabolic shape.
[image: dbt1]

But after calculating preds, it looks as if the negative half of the curve is cropped.
[image: dbt2]

Why is that happening? :confused:


Probably because time starts from 0. That’s how it is in the original notebook (arange(0,…)), so I am guessing that’s what you did too.


Yeahhh, didn’t notice that… my bad.
Now it works fine after changing the equation to: a*((t-9)**2) + b*(t-9) + c
Had to loop through 40 times, but it worked perfectly well.
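For reference, the full loop with that change looks something like this (a minimal sketch; the synthetic speed data and the learning rate are assumptions based on the lesson notebook):

import torch

def f(t, params):
    a, b, c = params
    return a*((t-9)**2) + b*(t-9) + c                  # quadratic centred at t = 9

def mse(preds, targets): return ((preds - targets)**2).mean()

time = torch.arange(0, 20).float()
speed = torch.randn(20)*3 + 0.75*(time - 9.5)**2 + 1   # assumed synthetic data

params = torch.randn(3).requires_grad_()
lr = 1e-5                                              # assumed learning rate

for _ in range(40):                                    # the ~40 iterations mentioned above
    preds = f(time, params)
    loss = mse(preds, speed)
    loss.backward()
    params.data -= lr * params.grad.data
    params.grad = None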

Thanks,


Are you sure? Grayscale images are just 1-channel images, right? 0 for dark, 255 for white, or vice versa?


Yes, PILImageBW creates a 1-channel grayscale image.
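A quick way to check (a minimal sketch; 'example.png' is a hypothetical file):

from fastai.vision.all import PILImageBW, image2tensor

img = PILImageBW.create('example.png')   # hypothetical grayscale image file
t = image2tensor(img)
print(t.shape)                           # torch.Size([1, H, W]): a single channel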


I tried the SGD end-to-end example, using 50 time points instead of 20, and the loss didn’t go down.
Is there any limit?

And the loss increased badly.

Why is this happening in the case of 50 time points?
Thank you,

Great post, Antoine, thank you :pray:

Hey,

I can’t understand this:

This can be represented as a function and set of weight values for each possible category, for instance the probability of being the number eight:

def pr_eight(x,w): return (x*w).sum()

Considering that a probability must be a number from 0 to 1, I can’t understand why summing up the multiplication of x and w would generate a valid probability.

The * is element-wise multiplication, so the result of x*w is a vector (a rank-one tensor). The sum() then ensures that the output is a single number and not a tensor of rank > 0.
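Concretely, a minimal sketch:

import torch

x = torch.tensor([1., 1.])
w = torch.tensor([1., 2.])
print(x*w)           # tensor([1., 2.]) -- element-wise product, still rank 1
print((x*w).sum())   # tensor(3.)      -- a single rank-0 number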

Yes, I get this part, but let’s say that w is the tensor [1, 2] and x is [1, 1], so the summed result would be 3, which is not a valid probability. What I don’t understand is why this sum is a valid probability.