The "BS<=32" paper

So, in the course of another topic, a poster brought up this paper:

Would like to use this thread to open up discussion of the paper, whether praise, criticism, or otherwise. I hasten to add that I have NOT done a deep study of this paper yet. I have just barely skimmed it (and am thankful for the relative lack of Greek symbols, a rarity), and it appears to conclude that the maximum optimal BS is 32. The paper seems to base this on CIFAR/ResNet, so I am not sure how well it generalizes to other contexts.

I also do not have time to do my own BS tests at the moment but will try some soon, perhaps on some of my favorite Janelle Shane-style RNNs that string together fun nonsense. I also need to brush up on batch normalization, which I have all but forgotten. (Disclaimer: I am an old-school Coursera student; I am looking forward to continuing the classes, but my step-by-step mind has more trouble following them.)


Just chiming in with this nice tweet:

It is a classic! My concern with language models has been that I have to cut the batch size down to 4 (or sometimes even 2) to fit the model on the card, which YLC would probably agree is too small. I am trying to find the largest batch size I can fit (say up to a limit of 32, if you so desire…) without getting an OOM error.
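That search for the largest batch size that fits can be done by hand, but here is a minimal sketch in plain Python of automating it. Everything here is hypothetical: it assumes you can wrap one training step in a `fits(bs)` callable that returns `False` when the step hits an OOM error, and that fitting is monotone in batch size (if BS=8 fits, so does BS=4).

```python
def find_max_batch_size(fits, upper=32):
    """Return the largest batch size <= upper for which fits(bs) is True.

    `fits` is assumed to be a callable that attempts one training step at
    the given batch size and returns False on an out-of-memory error.
    Returns 0 if not even a batch of 1 fits.
    """
    best = 0
    lo, hi = 1, upper
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):
            best = mid      # this size fits; try something larger
            lo = mid + 1
        else:
            hi = mid - 1    # OOM; try something smaller
    return best
```

For example, with a card where only batches of 6 or fewer fit, `find_max_batch_size(lambda bs: bs <= 6)` returns 6 after a handful of probes instead of trying every size.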

This is not going to be scientific in any way, but here is the result of ONE run of Karpathy’s char-rnn, on a 2,000-recipe file, with two vastly different batch sizes. I have no idea whether this is apples to apples (probably not); I just ran each for a number of epochs that resulted in similar validation loss.

BS 32, 3 epochs (3399 iterations), validation loss = 1.0353
BS 512, 25 epochs (1750 iterations), validation loss = 1.0395
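Those iteration counts are internally consistent. Here is a quick sanity check, assuming the training set is roughly 36k sequences (a number backed out of the counts above, not one taken from the run itself):

```python
import math

def iterations(n_samples, batch_size, epochs):
    # One iteration processes one batch, so iterations per epoch is
    # ceil(n_samples / batch_size).
    return math.ceil(n_samples / batch_size) * epochs

# 3399 / 3  = 1133 batches of 32 per epoch  -> ~36,256 sequences
print(iterations(36256, 32, 3))     # 3399
# 1750 / 25 = 70 batches of 512 per epoch   -> ~35,840 sequences
print(iterations(35840, 512, 25))   # 1750
```

Note also that 1750 / 3399 ≈ 0.51, so if each iteration takes roughly the same wall time, the BS=512 run finishing in about half the time is exactly what the iteration counts predict.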

The 512 run finished in half the time. I am seeing slight evidence that 32 may have “worked” better: in my sample outputs, the 512 run seems more likely to get “stuck” and produce unnecessarily long strings of either ingredients or instructions than the 32 run, and it may also be more likely to come up with nonsense words. So perhaps there is some truth to the paper, though I don’t know how to prove that on a grander scale.

For fun, here are a couple of “recipes” that it cranked out. Again, I credit/blame Janelle Shane for introducing me to a much too addictive hobby.

Example batch size 32 recipe:

Title: Mayonana Dips
Categories: Appetizers
Servings: 6
2 T Sugar
1/4 c Minced Mushrooms
2 c Shredded Cream (It
1/2 t Vanilla (pork of half pepper

  • Packles and sugar and add flour and at 3"xifusiage
    pottoves on top with salt and pepper. Cook and 1/2" stock
    patties and place on the soup mix in cold water sticks and
    stir in the cake to boil, at 350 degrees F. Cup upauts each or
    do in a large skillet rin platter. Combine flakes and
    combine sheet before strips, then the cheese. Stir in a dork.
    Chill. Roll butter and flake in bowl. Drain the egg whites
    thoroughly. From The Gazette, about 10 minutes more tightly
    nightly sprinkle with roll is not layered potatoes and finely
    rocan cheese. Place on grease the cheese.
    Add water and set aside.

Example batch size 512 recipe. In general, it seemed harder to get recipes of reasonable length than with BS=32, and I notice about twice as many words showing up as spellcheck errors:

Title: Dips Meat Beef Fillonts
Categories: Camerion Salads Vegetables
Servings: 1
1 c Chocolate Styaked fruits; Preparaduse
1 x Dash (cal)
1 ea Large Eggs, 112"
1 c Almonds (Upting:

  • Stock the tomatoes should be cooking tomatoes half and
    large rice and remove tomatoes. Bake in 300
    degree F. on 1 1/2 hours or until together is
    to golded bowl. Cool on a simmer, stirring
    occasionally. Add pepper and cheese. Cook until
    stand if strawberries. Bake in an 60% pownres. Add
    balasting for 2 minutes. (Fi’st the egg of
    the reserves. Stir in the peppers and cut of
    boiling artraining in baking dish. Toss to a hot oil
    large for filling. Let stand in a slowly for 30 minutes.
    Preas about 1 cups of sour cream

How about the other extreme? What if you do batch size 4 or even 2?

This is really stretching my neurons! The paper always uses the same number of epochs for all batch sizes, so my comparison is indeed not apples to apples. But I had thought you could just run more epochs and still save time overall because of the bigger batch size. Instead, I am seeing that more epochs may not give the same quality of results as a smaller batch size: the validation loss may look just as good, but the actual real-world output seems to be lacking.

Anyway, 3 epochs at BS=4 would have taken over 27,000 iterations, so I took the liberty of stopping at 2, where the validation loss had already gotten to about 1.03. Again, tiny sample size, but the resulting recipe, of about the same length as those shown above, had very few non-words in it. Keeping with the culinary theme, there may be something to slow cooking: it takes longer but might be worth it, in some or most instances. Would love to hear others’ results as well, on other models. Until then, stir in a dork!

BS=4 recipe:

Title: Cookin Sweet Chili Pie Steak
Categories: Poultry Main dish
Servings: 8
4 oz syrup in 1/2 inch stick
1/4 c Raspberry to taste
1 c Parsley, sifted
1/2 c Southers garnish ground blend
8 T Carrots, chopped
1 ea Green pepper, chopped
In a large bowl in oil and (325 for 10 minutes.
Set aside to cook 1 minutes more. Spoon 1/2 t ground
jelly or in the chopped salad of the remaining water, blending
oil on top of the oil in pidring the top of the
salad on both simmer on it to make pan and stir until meat
until heady is use in a small begins in a mo .
small of the chops into greased to make in a
small can black stuffing. When slice of plates.


I built a box to run the lessons on. I call it Wimpy. In all the exercises that permit changing the batch size (not cats and dogs), I always had to go down to 2, 4, 8, or 12 to avoid OOM (Zotac GT 1030, 2 GB onboard). I tried BS=1 but got a weird error and didn’t want to track it down. I have always been able to get results similar to what the class lectures/notes show; in one case I did a trifle better. It can take a lot of time, though.
Please, no more recipes! After the last one my brain is Kentucky fried.
