[Project] Stanford-Cars with fastai v1

Do you have the train and test image files in folders in your Google Drive? If so, you should be able to more or less just run my code, I think. I’m not familiar with linking Drive to Colab (e.g. what the “path” should look like), but I think that should be the only change needed. I think sharing your notebook might help, yep :slight_smile:

It worked!!

Thanks very much, great implementation by the way.


great to hear :smiley:

Hello!
I’m curious why you trained your model in notebook 9 with .to_fp16() even though you were training it on a P4000? That card runs fp16 at 1/64th the rate of fp32, so it’s far slower. Also, why did you pick 15e-4 as your learning rate?

Hey!

I picked that lr because it’s what had generally worked in previous experiments.

Re .to_fp16(): wouldn’t it be faster than fp32, since you can train with larger batch sizes? Or do the P4000s not work as well with fp16 as other cards (e.g. 2070/2080 etc.)?

It’s only faster when the GPU processes fp16 faster than fp32. For example:
P4000: [spec screenshot: fp16 throughput is 1/64th of fp32]
1080Ti: [spec screenshot: fp16 throughput is 1/64th of fp32]
2080Ti: [spec screenshot: fp16 throughput is 2x fp32]

Take my words with a grain of salt though, as this article claims to have gotten ~20% faster training with mixed precision training on a 1080Ti too. Also, there was a paper posted here that suggested not going over a batch size of 32: The “BS<=32” paper
I’m curious to know your opinion about this matter because it’s a little bit confusing.


iiiinteresting :thinking: I’m no expert in this, to be honest. Possibly it’s the case that FLOPS don’t correspond to memory usage? My understanding was that fp16 allows you to fit more data in RAM and so lets you use larger batches
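That memory intuition can be sketched with a rough back-of-envelope calculation (all numbers below are illustrative assumptions, not measurements): halving the bytes per value roughly doubles the batch that fits in a fixed activation-memory budget.

```python
# Rough sketch: fp16 stores 2 bytes per value vs 4 for fp32, so for a fixed
# activation-memory budget roughly twice the batch fits in GPU RAM.
# The per-image activation count below is an assumed, illustrative number.

def max_batch_size(mem_budget_bytes, values_per_image, bytes_per_value):
    """Largest batch whose activations fit in the budget (crude model)."""
    return mem_budget_bytes // (values_per_image * bytes_per_value)

budget = 8 * 1024**3   # e.g. 8 GB of GPU RAM for activations (assumed)
values = 50_000_000    # activation values per image (assumed)

bs_fp32 = max_batch_size(budget, values, 4)
bs_fp16 = max_batch_size(budget, values, 2)
print(bs_fp32, bs_fp16)  # → 42 85
```

Whether that bigger batch actually trains faster (or as accurately) is exactly the open question in this thread.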

That’s true, it’s stated in the article I linked to. The question is: do higher batch sizes maintain training quality? The paper in the BS<=32 thread claims that a batch size higher than 32 generally leads to worse accuracy and loss. I tried using fp16 to train on a modified Stanford Cars dataset and also noticed a ~20% increase in training speed. I can’t document my test, however, because for some reason all of my data disappeared from my Paperspace notebook. I’ve sent them an email to see if the data can be recovered. I was using the EfficientNet b3 + Mish and Ranger combo you used in notebook 9 in your stanford-cars repo.

Update: I’ve tried .to_fp16() on a P5000 instance on Paperspace and got ~2.5x slower training compared to leaving it off after defining the learner, and about 20% slower training on Google Colab with GPU set as the hardware accelerator. My advice? Run one epoch with fp16 and another without it and see if there’s a difference in speed. I’ll look more into this tomorrow. Note that I did not increase the batch size (32) when I used fp16. I hope to have a comprehensive test up by the end of the weekend.
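The epoch-timing comparison suggested above can be wrapped in a small stdlib timer; `train_one_epoch_fp32`/`train_one_epoch_fp16` here are hypothetical stand-ins (simulated with `sleep`) for the real `learn.fit` calls with and without `.to_fp16()`.

```python
import time

def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical stand-ins for the real training calls, e.g.
#   learn = learn.to_fp16(); learn.fit(1)   vs   plain learn.fit(1)
def train_one_epoch_fp32():
    time.sleep(0.02)  # pretend an fp32 epoch takes 20 ms

def train_one_epoch_fp16():
    time.sleep(0.01)  # pretend an fp16 epoch takes 10 ms

_, t32 = time_call(train_one_epoch_fp32)
_, t16 = time_call(train_one_epoch_fp16)
print(f"fp32: {t32:.3f}s  fp16: {t16:.3f}s  ratio: {t32 / t16:.2f}x")
```

Timing a single full epoch each way, ideally on the same instance in the same session, is enough to see a 2.5x difference like the one reported above.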


Thanks for testing! Would love to hear how fast it trains when you max out the batch size with fp16

I’m getting predictions with confidences of 5.x and 6.x when I run inference with my model trained with your notebook 9 method. Do you know how I can fix that? I just used load_learner and have ranger.py and mefficientnet in the same dir as the predictor Python script.

What do you mean by 5.x or 6.x?

Sorry I didn’t explain better. I’m just really tired right now.
[screenshot: prediction output with per-class scores above 1]
Where I used to get confidence I now get these weird numbers. Is this by design? They were between 0 and 1 before.

Are you doing anything to the raw output predictions from the model? Can you post the code of your predictor script?

I’m using this code:

```python
from starlette.applications import Starlette
from starlette.responses import JSONResponse, HTMLResponse, RedirectResponse
from fastai import *
from fastai.vision import *
import torch
from pathlib import Path
from io import BytesIO
import sys
import uvicorn
import aiohttp
import asyncio

app = Starlette()
path = Path("data")
classes = [...]  # truncated list of classes
learn = load_learner(Path("data/"), "export.pkl")

@app.route("/")
def form(request):
    return HTMLResponse("""
        <h3>This app will classify cars</h3>
        <form action="/upload" method="post" enctype="multipart/form-data">
            Select image to upload:
            <input type="file" name="file">
            <input type="submit" value="Upload Image">
        </form>
        Or submit a URL:
        <form action="/classify-url" method="get">
            <input type="url" name="url">
            <input type="submit" value="Fetch and analyze image">
        </form>
    """)

@app.route("/upload", methods=["POST"])
async def upload(request):
    data = await request.form()
    img_bytes = await (data["file"].read())  # renamed to avoid shadowing builtin bytes
    return predict_image_from_bytes(img_bytes)

@app.route("/classify-url", methods=["GET"])
async def classify_url(request):
    img_bytes = await get_bytes(request.query_params["url"])
    return predict_image_from_bytes(img_bytes)

async def get_bytes(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.read()

def predict_image_from_bytes(img_bytes):
    img = open_image(BytesIO(img_bytes))
    _, class_, losses = learn.predict(img)
    return JSONResponse({
        "scores": sorted(
            zip(learn.data.classes, map(float, losses)),
            key=lambda p: p[1],
            reverse=True
        )[:5]
    })

if __name__ == "__main__":
    if "serve" in sys.argv:
        uvicorn.run(app, host="0.0.0.0", port=80)
```

I’ve tried adding up the confidences of all the classes in each prediction and they give me seemingly random numbers.
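Scores above 1 that don’t sum to anything meaningful look like unnormalised logits rather than probabilities; applying a softmax over the class scores maps them back into the 0 to 1 range and makes them sum to 1. A minimal sketch of that normalisation (a guess at the symptom, not a confirmed fix for this particular export):

```python
import math

def softmax(scores):
    """Map raw class scores (logits) to probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

logits = [6.2, 5.1, 1.3, -0.4]  # e.g. the 5.x / 6.x values seen above
probs = softmax(logits)
print(probs, sum(probs))  # probabilities in [0, 1], summing to 1
```

If the model’s head or loss was customised (as in the notebook 9 setup), `learn.predict` may be handing back the raw activations, in which case normalising them like this before ranking would restore the old behaviour.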

@morgan I’ve published a Medium story on fp16 vs fp32 with my test and observations from another Medium article. Sorry I couldn’t add graphs; I didn’t save the raw data when I ran the tests.


@morgan Lukemelas moved EfficientNet’s hosting and that broke the old version. I’ve made a PR to your repo that relies on the new version and still uses Mish.


Nice, thanks for the write up and the PR!

Hello @morgan. I’m having an issue using this method on another dataset. Whenever I attempt to train my model, with both EfficientNet b3 and EfficientNet b7, I always get the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-21-96a7be495d6c> in <module>()
 18                ).to_fp16()
 19 
---> 20 fit_fc(learn, tot_epochs=40, lr=15e-4, start_pct=0.10, wd=1e-3, show_curve=False)
 21 
 22 learn.save(f'9_{exp_name}_run{run_count}')

8 frames
/content/MEfficientNet_PyTorch/efficientnet_pytorch/utils.py in forward(self, x)
132     def forward(self, x):
133         x = self.static_padding(x)
--> 134         x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
135         return x
136 

RuntimeError: "unfolded2d_copy" not implemented for 'Half'

I am running this in Google Colab.

My Google searches don’t turn anything up.

Thank you!


It looks like that conv op isn’t implemented for half precision on the CPU (“unfolded2d_copy” is PyTorch’s CPU convolution kernel), which usually means the notebook is actually running on the CPU rather than the GPU. Check that your Colab runtime is set to GPU, or turn off fp16 and it should work.
