Docker image went from 606MB to 2.95GB by adding fastai

Just added this to my requirements.txt …

fastai==1.0.52

… and the docker image size jumped from 606MB to 2.95GB.

Is that expected?

Assuming it is, is there a fastai-lite in the pipeline that could be used in a web app for inference only with a smaller footprint?

For full reference, here is my Dockerfile:

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

COPY ./app/api /app

ENV MODULE_NAME=app

RUN pip install --upgrade pip
RUN pip install torch==1.3.1+cpu torchvision==0.4.2+cpu -f https://download.pytorch.org/whl/torch_stable.html
RUN pip install --upgrade -r requirements.txt

and here is my requirements.txt file …

aiohttp==3.6.0
aiofiles==0.4.0
Click==7.0
fastapi==0.42.0
gunicorn==19.9.0
h11==0.8.1
httptools==0.0.13
joblib==0.14.0
pydantic==0.32.2
python-multipart==0.0.5
starlette==0.12.9
uvicorn==0.10.0
uvloop==0.13.0
websockets==8.0.2

scikit-learn==0.21.3
fastai==1.0.52

Can you see what’s taking that space? spacy is 417MB, and pytorch is 828MB. You can cut some space by removing any languages you don’t need in spacy/lang, but you can’t do much about PyTorch’s space.
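
For example, something along these lines in the Dockerfile should prune the unused language data (just a sketch, not tested — it assumes the default python3.7 site-packages path and that you only need English):

# keep only the English data under spacy/lang (path assumes the python3.7 base image)
RUN cd /usr/local/lib/python3.7/site-packages/spacy/lang \
    && find . -mindepth 1 -maxdepth 1 -type d ! -name en -exec rm -rf {} +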

I’m not sure what’s using the rest of your space. Here’s something you can use to find out:

https://dev.yorhel.nl/ncdu
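
docker history is also handy for a quick per-layer size breakdown before digging into the filesystem (the image name here is just a placeholder):

docker history myimage:latest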

Let us know what you find out.


Rebuilt my image with the following requirements.txt (I moved the pip install of the CPU-only torch/torchvision in here instead of doing it in the Dockerfile):

aiohttp==3.6.0
aiofiles==0.4.0
Click==7.0
fastapi==0.42.0
gunicorn==19.9.0
h11==0.8.1
httptools==0.0.13
joblib==0.14.0
pydantic==0.32.2
python-multipart==0.0.5
starlette==0.12.9
uvicorn==0.10.0
uvloop==0.13.0
websockets==8.0.2

--find-links https://download.pytorch.org/whl/torch_stable.html
torch==1.3.1+cpu
torchvision==0.4.2+cpu

scikit-learn==0.21.3
fastai==1.0.52

Docker Image size = 2.07GB.

It seems that how you split your pip installs across Dockerfile layers matters: with everything pinned in a single requirements.txt (including the +cpu wheels), the image came out almost 900MB smaller. My guess is that the old separate pip install --upgrade -r requirements.txt step ended up pulling in the full CUDA torch wheel rather than reusing the CPU-only one installed in the earlier layer.

Using the ncdu tool (thanks for that btw jeremy!) I notice the following:

  1. /root/.cache/pip = 249.3MB (247.9MB of this is under the /http folder). Question: Can I remove this from the image? If so, that gives me back ~250MB, which would be nice (see the sketch after this list).

  2. /usr/local/lib/python3.7/site-packages = 842.7MB. torch is the biggest package at 341.7MB, and outside of that everything looks reasonable, with scipy, numpy, spacy, and pandas collectively adding another 231MB.
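
One way to keep the pip cache out of the image entirely (rather than deleting it in a later layer, which doesn’t reclaim space already committed by the layer that created it) would be pip’s --no-cache-dir flag, or doing the cleanup in the same RUN step. A sketch, not tested:

RUN pip install --upgrade pip \
    && pip install --no-cache-dir --upgrade -r requirements.txt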

So aside from being able to remove the /root/.cache files, it doesn’t look like there is much else I can do to reduce the image size. So another question: what is a reasonable size for an inference-only Docker image? I know the answer is probably app-specific, but it would be great to hear what other folks are seeing and to know whether I’m in the ballpark or way off.

EDIT:

Here is my latest Dockerfile if anyone is interested:

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

ENV MODULE_NAME=app

RUN apt-get update && apt-get install -y ncdu \
    && rm -rf /var/lib/apt/lists/*

COPY ./app/api/requirements.txt /app
RUN pip install --upgrade pip
RUN pip install --upgrade -r requirements.txt
RUN rm -rf /root/.cache

COPY ./app/api /app

re: the question above: Can I remove /root/.cache/pip from the image? Seems like the answer is “yes” (see the Dockerfile above) … and doing so brings the image down to ~2.05GB.

Here is a screenshot of what ncdu tells me after making these changes:


UPDATE

Deployed this image to AWS ECR and their dashboard reports the size as 871.11MB.

Not sure why docker image ls reports one size, ncdu another (at least it doesn’t seem to add up to what the former says), and AWS yet another. Anyone in the know care to explain?


Hi, how do you run ncdu on the Docker image so as to see these directory/file sizes?

I found the answer to my own question:

docker run -it <image> ncdu /

Thanks


Hi
Similar observation here. Kindly take a look at these screenshots.

docker system df -v

reports close to 3.2GB, whereas…

ncdu

reports 2.3GB.

Which is the right one?

Here’s my Dockerfile:

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim

RUN apt-get update && \
    apt-get install -y git gcc ncdu && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

COPY requirements.txt .

RUN pip install --upgrade pip
RUN pip install --upgrade -r requirements.txt
RUN rm -rf /root/.cache

COPY app app/

RUN python app/server.py

EXPOSE 5000

CMD ["python", "app/server.py", "serve"]

And requirements.txt

aiofiles
aiohttp
asyncio
fastai
python-multipart

There is a difference between an image and a container … so it seems the size of the image is 3.24GB while the size of an instance of that image running is 2.2GB.
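
For what it’s worth, docker ps -s shows both numbers side by side: the size of the container’s writable layer and the “virtual” size that includes the underlying image layers.

docker ps -s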

This article may help … I’m by no means an expert on Docker, so I may be off myself. As you can see, when I deploy to AWS ECS the reported size comes back much smaller (I’m assuming AWS does some kind of compression to make that happen).
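
If you want a rough local estimate of the compressed size a registry will report, you can gzip a saved copy of the image (image name is a placeholder; registries compress per layer, so this is only an approximation):

docker save myimage:latest | gzip | wc -c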