Can you see what’s taking that space? spaCy is 417MB and PyTorch is 828MB. You can reclaim some space by removing the languages you don’t need from spacy/lang, but there isn’t much you can do about PyTorch’s footprint.
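Here is a minimal sketch of that pruning. It assumes spaCy’s layout, where each language is its own subpackage under `spacy/lang/` (e.g. `spacy/lang/en`) and the plain `.py` files directly in `spacy/lang/` are shared modules. `prune_spacy_languages` and the `keep` set are hypothetical names, and the function does a dry run by default. Language packages can import from one another, so load your pipeline after pruning to confirm nothing you need is gone.

```python
import importlib.util
import shutil
from pathlib import Path


def prune_spacy_languages(lang_dir, keep, dry_run=True):
    """Remove language subpackages under lang_dir whose name is not in keep.

    Hypothetical helper: assumes each language is a subdirectory of
    spacy/lang (e.g. spacy/lang/en). Plain files directly in lang_dir are
    shared modules and are never touched. Returns the names it would
    remove (or did remove, when dry_run is False).
    """
    lang_dir = Path(lang_dir)
    removed = []
    for child in sorted(lang_dir.iterdir()):
        # Skip shared modules (plain files) and dirs like __pycache__
        if not child.is_dir() or child.name.startswith("__"):
            continue
        if child.name in keep:
            continue
        removed.append(child.name)
        if not dry_run:
            shutil.rmtree(child)
    return removed


if __name__ == "__main__":
    spec = importlib.util.find_spec("spacy")
    if spec is None or spec.origin is None:
        print("spaCy is not installed")
    else:
        lang_dir = Path(spec.origin).parent / "lang"
        # Dry run: only list what would be deleted
        print(prune_spacy_languages(lang_dir, keep={"en"}))
```

Run it once as a dry run, check the list, then call it with `dry_run=False` in the same Dockerfile `RUN` step that installs spaCy.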
I’m not sure what’s using the rest of your space. Here’s something you can use to find out:
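Where installing a disk-usage tool isn’t an option, a rough Python stand-in can rank the biggest directories. This is a minimal sketch (`dir_sizes` and `largest` are made-up names): it sums apparent file sizes with `os.walk` and `os.lstat`, does not follow symlinks, and skips files it cannot read, so its numbers can differ from tools that report allocated disk usage instead of apparent size. Point it at something like `/usr/local/lib/python3.7/site-packages` or `/root` rather than `/`, since `/proc` holds pseudo-files whose sizes are meaningless.

```python
import os
import sys


def dir_sizes(root):
    """Return {directory: total bytes of files beneath it} for every dir under root."""
    totals = {}
    # topdown=False visits children before parents, so subtotals are ready
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = 0
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # vanished or unreadable file
        for name in dirnames:
            # Symlinked dirs are not walked, so they contribute nothing here
            total += totals.get(os.path.join(dirpath, name), 0)
        totals[dirpath] = total
    return totals


def largest(root, n=10):
    """Return the n largest directories under root as (bytes, path) pairs."""
    sizes = dir_sizes(root)
    return sorted(((size, path) for path, size in sizes.items()), reverse=True)[:n]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for size, path in largest(root):
        print(f"{size / 1e6:10.1f} MB  {path}")
```

For example, `python dir_sizes.py /usr/local/lib/python3.7/site-packages` inside the container lists the ten heaviest directories there.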
Rebuilt my image with the following requirements.txt (I moved the pip install of the CPU-only torch/torchvision builds into it, instead of doing it in the Dockerfile):
It seems that how you split the Dockerfile into layers matters: anything deleted in a later layer still takes up space in the layer that created it.
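A toy model of why layering matters (a simplification, not Docker’s actual implementation): each `RUN`, `COPY` or `ADD` instruction commits a new layer, and a deletion in a later layer is only recorded as a “whiteout” that hides the path, so the bytes still ship in the layer that added them. Below, each `Layer` is a dict of added files plus a set of deleted paths; `image_size` sums every layer, which is roughly what `docker image ls` adds up, while `visible_size` is what a running container sees. The sizes (in MB) are rounded from the numbers in this thread, and the file layout is simplified to one entry per directory.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Toy layer: files it adds (path -> size) and paths it deletes (whiteouts)."""
    added: dict = field(default_factory=dict)
    deleted: set = field(default_factory=set)


def image_size(layers):
    """Everything any layer added ships with the image, even if deleted later."""
    return sum(sum(layer.added.values()) for layer in layers)


def visible_size(layers):
    """Size of the merged filesystem that a running container sees."""
    files = {}
    for layer in layers:
        for gone in layer.deleted:
            # A whiteout hides the path and everything beneath it in lower layers
            for path in [p for p in files if p == gone or p.startswith(gone + "/")]:
                del files[path]
        files.update(layer.added)
    return sum(files.values())


SITE_PACKAGES = "/usr/local/lib/python3.7/site-packages"
PIP_CACHE = "/root/.cache/pip"

# pip install in one RUN, then rm -rf /root/.cache in a later RUN
separate = [
    Layer(added={SITE_PACKAGES: 843, PIP_CACHE: 249}),
    Layer(deleted={"/root/.cache"}),
]
# Install and rm in the same RUN: the cache never reaches a committed layer
combined = [Layer(added={SITE_PACKAGES: 843})]

if __name__ == "__main__":
    print("separate RUNs:", image_size(separate), "MB image,", visible_size(separate), "MB visible")
    print("single RUN:", image_size(combined), "MB image,", visible_size(combined), "MB visible")
```

With these numbers the two-`RUN` version reports 1092 MB for the image but only 843 MB inside the container, while the single-`RUN` version reports 843 MB for both.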
Using the ncdu tool (thanks for that btw, Jeremy!), I noticed the following:
- /root/.cache/pip = 249.3MB (247.9MB of this is under the /http folder). Question: can I remove this from the image? If so, that would give me back ~250MB, which would be nice.
- /usr/local/lib/python3.7/site-packages = 842.7MB. torch is the biggest package at 341.7MB; beyond that everything looks reasonable, with scipy, numpy, spacy and pandas adding another 231MB collectively.
So aside from removing the /root/.cache files, it doesn’t look like there is much else I can do to reduce the image size. Another question, then: what is a reasonable size for an inference-only Docker image? I know the answer is probably app-specific, but it would be great to hear what other folks are seeing, to know whether I’m in the ballpark or way off.
EDIT:
Here is my latest Dockerfile if anyone is interested:
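```dockerfile
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7
ENV MODULE_NAME=app
# ncdu: used to inspect disk usage inside the container
RUN apt-get update && apt-get install -y ncdu \
    && rm -rf /var/lib/apt/lists/*
COPY ./app/api/requirements.txt /app
# Install dependencies and delete pip's cache in the same RUN, so the cache
# is never committed to a layer (a separate `RUN rm -rf /root/.cache`
# hides the files but does not shrink the image)
RUN pip install --upgrade pip \
    && pip install --upgrade -r requirements.txt \
    && rm -rf /root/.cache
COPY ./app/api /app
```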
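To check how much each instruction in a Dockerfile like this actually adds, `docker history` lists every layer of an image with its size. The sketch below assumes output produced with `docker history --no-trunc --human=false --format '{{.Size}}\t{{.CreatedBy}}' IMAGE`, i.e. sizes in bytes, tab-separated from the command that created the layer. `parse_history` and `image_layers` are hypothetical names. A layer that only deletes files should show up with a size near zero, which is a quick way to spot a cleanup step that is not saving anything.

```python
import subprocess
import sys


def parse_history(text):
    """Parse 'SIZE<TAB>CREATED_BY' lines into (bytes, command), largest first."""
    layers = []
    for line in text.splitlines():
        if not line.strip():
            continue
        size, _, created_by = line.partition("\t")
        layers.append((int(size), created_by.strip()))
    return sorted(layers, key=lambda layer: layer[0], reverse=True)


def image_layers(image):
    """Run docker history for image (requires the docker CLI) and parse it."""
    out = subprocess.run(
        ["docker", "history", "--no-trunc", "--human=false",
         "--format", "{{.Size}}\t{{.CreatedBy}}", image],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_history(out)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python layer_sizes.py IMAGE")
    else:
        for size, cmd in image_layers(sys.argv[1])[:10]:
            print(f"{size / 1e6:10.1f} MB  {cmd[:80]}")
```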
re: Question: *Can I remove /root/.cache/pip from the image?* It seems like the answer is “yes” (see the Dockerfile above) … and doing so brings the image down to ~2.05GB.
Here is a screenshot of what ncdu tells me after making these changes:
Deployed this image to AWS ECR and their dashboard reports the size as 871.11
Not sure why `docker image ls` reports one size, ncdu another (which doesn’t seem to add up to what the former says), and AWS yet another. Anyone in the know care to explain?
There is a difference between an image and a container … so it seems the image is 3.24GB, while a running container created from it is 2.2GB.
This article may help … I’m by no means a Docker expert, so I may be off myself. As you can see, when I deploy to AWS ECR the size comes back much smaller (I’m assuming AWS applies some kind of compression to make that happen).
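The compression guess matches how registries work: layers are pushed as gzip-compressed tarballs, and the ECR console shows that compressed size, whereas `docker image ls` shows the uncompressed size on disk. To get a feel for the ratio, this sketch tars a directory and gzips it. `tar_sizes` is a made-up name, and because the tar is built in memory and Docker’s own tar layout and compression level differ, the numbers will only approximate what the registry reports. Point it at one directory (e.g. `site-packages/torch`) rather than the whole filesystem.

```python
import gzip
import io
import os
import sys
import tarfile


def tar_sizes(path):
    """Return (uncompressed tar bytes, gzip-compressed bytes) for path.

    Builds the whole tar in memory, so keep path reasonably small.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(path, arcname=os.path.basename(os.path.abspath(path)))
    raw = buf.getvalue()
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    raw, packed = tar_sizes(target)
    print(f"tar: {raw / 1e6:.1f} MB, tar.gz: {packed / 1e6:.1f} MB")
```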