So, in the course of another topic, a poster brought up this paper:
I'd like to use this thread to open up discussion of this paper, whether praise, criticism, or otherwise. I hasten to add I have NOT done a deep study of it yet. I have only barely skimmed it (and am thankful for the relative lack of Greek symbols…a rarity), and it appears to conclude that the maximum optimal batch size (BS) is 32. The paper seems to base this on CIFAR/ResNet, so I'm not sure how well it generalizes to other contexts.
I also do not have time to do my own BS tests at the moment, but I will try some soon, perhaps on some of my favorite Janelle Shane-style RNNs that string together fun nonsense. I also need to brush up on batch normalization, which I have all but forgotten. (Disclaimer: I am an old-school Coursera student - looking forward to continuing the fast.ai classes, but my step-by-step mind has more trouble following them.)
It is a classic! My concern with language models has been that I have to cut the batch size down to 4 to fit the model on the card (or even 2 sometimes), which YLC would probably agree is too small. I am trying to find the largest batch size I can fit on there (say up to a limit of 32, if you so desire…) without getting an OOM error.
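For what it's worth, here is a minimal sketch of how one might probe for the largest batch size that survives a full forward/backward pass without OOM. This is just an illustration in PyTorch, not what I actually ran; `model`, `make_batch`, and `loss_fn` are placeholder names for whatever model, data, and loss you are using:

```python
import torch

def largest_batch_size(model, make_batch, loss_fn, candidates=(32, 16, 8, 4, 2)):
    """Try batch sizes from largest to smallest; return the first one that
    survives a full forward/backward pass without a CUDA out-of-memory error."""
    for bs in candidates:
        try:
            model.zero_grad()
            x, y = make_batch(bs)        # placeholder: returns tensors already on the GPU
            loss = loss_fn(model(x), y)
            loss.backward()              # the backward pass is usually where memory peaks
            torch.cuda.synchronize()
            return bs
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise                    # some other error; don't swallow it
            torch.cuda.empty_cache()     # free the failed allocation and try a smaller batch
    return None
```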
This is not going to be scientific in any way, but here's the result of ONE run of Karpathy's char-rnn, on a 2,000-recipe file, with two vastly different batch sizes. I have no idea if this is apples to apples or not (probably not); I just went with a number of epochs that resulted in similar validation loss.
BS 32, 3 epochs (3399 iterations), validation loss = 1.0353
BS 512, 25 epochs (1750 iterations), validation loss = 1.0395
The 512 run finished in half the time. I am seeing slight evidence that 32 may have “worked” better. In my sample outputs, the 512 run seems more likely than the 32 run to get “stuck” and produce unnecessarily long strings of either ingredients or instructions. It may also be more likely to come up with nonsense words. So perhaps there is some truth to the paper…I don’t know how to prove that on a grander scale.
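For what it's worth, here is the back-of-the-envelope arithmetic behind those two runs, assuming iterations per epoch is roughly dataset size divided by batch size (the ~36k sequences-per-epoch figure is inferred from the reported numbers, not something I measured directly):

```python
# Rough arithmetic behind the two runs above (numbers taken from the post).
runs = {
    32:  {"epochs": 3,  "iterations": 3399},
    512: {"epochs": 25, "iterations": 1750},
}

for bs, r in runs.items():
    iters_per_epoch = r["iterations"] / r["epochs"]
    # iterations per epoch * batch size ~= sequences seen per epoch
    approx_sequences = iters_per_epoch * bs
    print(f"BS={bs}: ~{iters_per_epoch:.0f} iters/epoch, "
          f"~{approx_sequences:.0f} sequences/epoch, "
          f"{r['iterations']} total parameter updates")

# Both runs imply roughly the same ~36k sequences per epoch, but the BS=512
# run makes only about half as many parameter updates (1750 vs 3399), even
# though it sees the data for 25 epochs instead of 3.
```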
For fun, here are a couple of “recipes” that it cranked out. Again, I credit/blame Janelle Shane for introducing me to a much too addictive hobby.
Example batch size 32 recipe:
Title: Mayonana Dips
Categories: Appetizers
Servings: 6
2 T Sugar
1/4 c Minced Mushrooms
2 c Shredded Cream (It
1/2 t Vanilla (pork of half pepper
Packles and sugar and add flour and at 3"xifusiage
pottoves on top with salt and pepper. Cook and 1/2" stock
patties and place on the soup mix in cold water sticks and
stir in the cake to boil, at 350 degrees F. Cup upauts each or
do in a large skillet rin platter. Combine flakes and
combine sheet before strips, then the cheese. Stir in a dork.
Chill. Roll butter and flake in bowl. Drain the egg whites
thoroughly. From The Gazette, about 10 minutes more tightly
nightly sprinkle with roll is not layered potatoes and finely
rocan cheese. Place on grease the cheese.
Add water and set aside.
Example batch size 512 recipe - in general, it seemed harder to get recipes of reasonable length than with BS=32. I also notice about twice as many words showing up as spellcheck errors:
Title: Dips Meat Beef Fillonts
Categories: Camerion Salads Vegetables
Servings: 1
1 c Chocolate Styaked fruits; Preparaduse
1 x Dash (cal)
1 ea Large Eggs, 112"
1 c Almonds (Upting:
Stock the tomatoes should be cooking tomatoes half and
large rice and remove tomatoes. Bake in 300
degree F. on 1 1/2 hours or until together is
to golded bowl. Cool on a simmer, stirring
occasionally. Add pepper and cheese. Cook until
stand if strawberries. Bake in an 60% pownres. Add
balasting for 2 minutes. (Fi’st the egg of
the reserves. Stir in the peppers and cut of
boiling artraining in baking dish. Toss to a hot oil
large for filling. Let stand in a slowly for 30 minutes.
Preas about 1 cups of sour cream
This is really stretching my neurons! The paper always uses the same number of epochs for all batch sizes, so my comparison is indeed not apples to apples. But I had thought you could simply run more epochs and still save time overall because of the bigger batch size. Instead, I'm seeing that more epochs may not give the same quality of results as a smaller batch size. The validation loss may look just as good, but the actual real-world results seem to be lacking.
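One way to look at that “real-world” gap is simply to sample from each trained model and eyeball the text, which is roughly what I did above. Here is a minimal sketch of temperature sampling from a char-level RNN in PyTorch; the model interface (`model(x, hidden)` returning logits and a new hidden state) and the `stoi`/`itos` vocab maps are assumptions for illustration, not the actual char-rnn code (which is Lua/Torch):

```python
import torch
import torch.nn.functional as F

def sample(model, stoi, itos, seed="Title:", length=500, temperature=0.8):
    """Sample `length` characters from a trained char-level RNN.

    Assumes model(x, hidden) -> (logits over the vocab, new hidden state),
    with x a LongTensor of shape (1, 1). Lower temperature gives safer,
    more repetitive text; higher gives more adventurous (and more nonsense) words.
    """
    model.eval()
    hidden = None
    out = list(seed)
    with torch.no_grad():
        # Feed the seed text through to warm up the hidden state.
        for ch in seed:
            logits, hidden = model(torch.tensor([[stoi[ch]]]), hidden)
        for _ in range(length):
            probs = F.softmax(logits.squeeze() / temperature, dim=-1)
            idx = torch.multinomial(probs, num_samples=1).item()
            out.append(itos[idx])
            logits, hidden = model(torch.tensor([[idx]]), hidden)
    return "".join(out)
```

Comparing a handful of samples like this per run is obviously anecdotal, but it does surface failure modes (getting stuck in long ingredient lists, made-up words) that a single validation-loss number hides.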
Anyway, 3 epochs at BS=4 would have taken over 27,000 iterations, so I took the liberty of stopping at 2 epochs, where the validation loss had already gotten to about 1.03. Again, a tiny sample size, but a resulting recipe of about the same length as those shown above had very few non-words in it. Keeping with the culinary theme, there may be something to slow cooking: it takes longer, but it might be worth it in some or most instances. Would love to hear others’ results as well, on other models. Until then, stir in a dork!
BS=4 recipe:
Title: Cookin Sweet Chili Pie Steak
Categories: Poultry Main dish
Servings: 8
4 oz syrup in 1/2 inch stick
1/4 c Raspberry to taste
1 c Parsley, sifted
1/2 c Southers garnish ground blend
8 T Carrots, chopped
1 ea Green pepper, chopped
In a large bowl in oil and (325 for 10 minutes.
Set aside to cook 1 minutes more. Spoon 1/2 t ground
jelly or in the chopped salad of the remaining water, blending
oil on top of the oil in pidring the top of the
salad on both simmer on it to make pan and stir until meat
until heady is use in a small begins in a mo .
small of the chops into greased to make in a
small can black stuffing. When slice of plates.
I built a box to run the lessons on. I call it Wimpy. In all the exercises that permit a change to the batch size (not cats and dogs), I always had to go down to 2, 4, 8, or 12 to keep from hitting OOM (Zotac GT 1030, 2 GB onboard). I tried bs=1 but got a weird error and didn’t want to track it down. I’ve always been able to get results similar to what the class lecture/notes show. In one case, I did a trifle better. It can take a lot of time, though.
Please, no more recipes! After the last one my brain is Kentucky fried. @yeldarb @crayoneater @bfarzin