For those who run their own AI box, or want to

If anyone is looking for a basic dockerfile to reference, you can check out the one I maintain for myself here: fastaiv1-docker/dockerfile at master · matdmiller/fastaiv1-docker · GitHub

It is a little more basic than some of the other ones I’ve seen and may be slightly easier for you to modify for your use cases. Some of the directories in here are specific to my machine (e.g. /home/mathewmiller) and will need to be changed for your setup. I am running Ubuntu 18.04 on my machine with a 3090 and Nvidia driver version 470.103.01. The docker run command I use is in a comment at the bottom of the dockerfile. If you’re running a different version of Ubuntu you may want to switch the base image at the top of the file to your version. I am using Portainer for container management. I am certainly not a docker expert, so there are probably some ‘dumb’ things I’m doing, but this setup works for me and I’ve been using docker to manage my fast.ai environment for several years now. I moved to docker because I’ve accidentally messed up my environment too many times to count, and recreating the container turned out to be the easiest way to get back up and running quickly when that happens.

If you have any questions or suggestions please let me know!

4 Likes

Hi Balnazzar,

This might help you out: Fast.ai Docker containers from SeeMe.ai

I’ve been building/publishing them since v1.0.60 and release them for all new versions. All the images have CUDA and Jupyter notebook support. There’s a TODO to build multiple versions using different base images (NGC, …), but I can’t give a timeline on that.

Documentation could be better, but let me know if you have issues/questions.

[Note: These are the official fast.ai docker containers, but last time I checked, there was no CUDA support or fastai version support (would love to be wrong on this :slight_smile: )]

4 Likes

Thanks to both of you :slight_smile:

1 Like

Assembling your own box is rather straightforward. What can be tricky is making sure that all the components work together. I think the easiest approach (so you don’t have to send stuff back and get something else) is to copy someone else’s build that was tested for DL workloads. The power supply unit is important, since graphics cards, especially ones used for DL, have high power demand, including transient spikes. Also check the number of PCIe lanes supported between the CPU, motherboard and GPU, etc. Most makers (like Lambda Labs) will not show all the specs they use to build a box. There are plenty of blog posts with builds, and there is also a company, System76, which is an open source company: they publish the list of materials used for the rigs they sell on GitHub. PCPartPicker is also a good source of build specs.
Happy building!

4 Likes

Thanks a lot @miwojc ! I’m always worried about touching the hardware stuff :)) I hope step by step that’ll be alright

1 Like

That’s an excellent question! I don’t know for sure, but they published a tag a couple of days ago and I sort of assumed it had the up-to-date version. The version they seem to have is 2.6.0, and I’m able to run the first notebook without problems.

https://hub.docker.com/r/paperspace/fastai/tags

3 Likes

Thanks for posting your config @matdmiller ! I have a few questions but first a general one :

q1: Would you happen to know if it’s ok to use the latest tag for ubuntu and use the matching cudatoolkit? (line #1 and line#35) (eg they have “nvidia/cuda:11.3.1-runtime-ubuntu20.04” and I do have ubuntu20.04 installed). BTW, the paperspace container I run uses cudatoolkit 11.4 .

q2: How do I build with this file? just a “docker build [[dockerfilename]]” would work?

Also, I’m just a beginner, so please excuse me if these are too basic, but I have some questions, line by line, about your dockerfile:

#3 I installed the Nvidia drivers locally on my machine. Does this line mean it’ll install “the appropriate driver” in the container itself, or skip that if a driver is already present?

#49 how do I change the password if I don’t know Jeremy’s password. Can I just copy my current setup’s hash in there? u">>my existing password hash>>"

#53 are your fastai/fastbook repos installed in the /home/mathewmiller dir on the host that the container just points to via the /home/ mapping?

#64 with the -v directive does it just map a dir named /home/mathewmiller to the host /home/mathewmiller I don’t see it being created so I think that’s probably the case ?

Thanks for posting your file, for a noob like me, this is something I can at least try to wrap my head around. I would really like to build my own container instead of relying on paperspace (which works great btw) but is a bit of a blackbox to me.

I’m not sure what you mean by the ‘latest’ tag. The tag you have listed should be fine. I don’t think there is a latest container image tag on Docker Hub, if that’s what you’re referring to. As of today, the PyTorch install instructions (https://pytorch.org/) specify CUDA toolkit 11.3. The entire image takes about 15 min to build on my machine, so you can try things out pretty quickly and easily and just delete the image if it doesn’t work. To test new images I just try to train a model, and if that works as expected then it’s probably good to go. I do always keep my old images/containers for some time so I can jump back to them if needed.
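A quicker smoke test before a full training run can be sketched as below. This is only an illustration: the helper name gpu_smoke_test is made up, and it assumes the NVIDIA Container Toolkit is installed on the host, that the image has python with PyTorch on its PATH, and that the image does not define an ENTRYPOINT that would swallow the command.

```shell
# Hypothetical helper: ask PyTorch inside a freshly built image whether it can see the GPU.
# Usage: gpu_smoke_test <image:tag>
gpu_smoke_test() {
  docker run --rm --gpus all "$1" \
    python -c 'import torch; print(torch.cuda.is_available())'
}
```

Calling it with your own image tag should print True if CUDA is usable inside the container. Training a model is still the more thorough check.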

I use Portainer (Install Portainer with Docker on Linux - Portainer Documentation) to manage my docker environment, which includes the ability to build images from a dockerfile. It’s a pretty simple web UI that itself runs in a container. I don’t mess around a whole lot with docker directly, so this tool makes things easier for me because I don’t have to remember all of the docker commands. It is certainly not required though; everything can be done on the command line. The only thing I have to do outside of Portainer is run the initial docker run ... command, because Portainer does not allow you to specify shared memory sizing (or at least it didn’t when I started using Portainer). Your command should work, but I would also supply a tag parameter so you can easily keep track of versions of your images: docker build | Docker Documentation.
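For the command-line route, a minimal sketch of the build and run steps might look like this. The image name, tag, 8g shared-memory size and port are placeholder values, not taken from the dockerfile linked above, and --gpus all assumes the NVIDIA Container Toolkit is installed on the host. Note that docker build takes a build-context directory as its positional argument (here the current directory); the dockerfile itself is passed with -f.

```shell
# Placeholder image name and tag; any scheme that lets you tell versions apart works.
IMAGE="fastai-local"
TAG="2022-05-01"

# Build from the file named "dockerfile" in the current directory and tag the result.
build_image() {
  docker build -f dockerfile -t "${IMAGE}:${TAG}" .
}

# Start the container detached, with GPU access, a larger shared-memory segment
# (8g is an assumed value) and port 8888 (Jupyter's default) published.
run_image() {
  docker run -d --gpus all --shm-size=8g -p 8888:8888 "${IMAGE}:${TAG}"
}
```

After sourcing this, run build_image once and then run_image. Docker gives a container only 64 MB of /dev/shm by default, which PyTorch DataLoader worker processes can exhaust, hence the --shm-size flag.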

I believe the GPU driver only needs to be installed on the host machine’s OS, not within the container.
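One way to see this for yourself, sketched under the assumption that the NVIDIA Container Toolkit is installed on the host, is to run nvidia-smi in a throwaway CUDA container. The helper name check_gpu is made up, and the default tag is the one quoted earlier in this thread, which may no longer be published, so substitute a current one if the pull fails.

```shell
# Hypothetical helper: run nvidia-smi in a temporary container.
# Usage: check_gpu [image:tag]
check_gpu() {
  docker run --rm --gpus all "${1:-nvidia/cuda:11.3.1-runtime-ubuntu20.04}" nvidia-smi
}
```

The driver version in the nvidia-smi header should match the driver installed on the host, since the container image does not carry a driver of its own.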

Yes, if you already have a hash you can just use your existing one, or generate a new one using jupyter notebook password. dl_course was the password from previous courses.

Yes. I do not keep any data stored inside the container so I don’t have to worry about losing it when updating or resetting containers. That directory exists on the host system. I decided to have the path inside and outside of my container be exactly the same so I can work with environments outside of docker if I want to and not have to worry about updating paths.

Yes, that maps that directory on the host system into the container. If that directory doesn’t exist inside the container, Docker creates it and then mounts the host directory there. It’s not necessary to use the same directory both inside and outside of the container, but I do it so that if I’m running anything outside of the container I don’t have to worry about changing paths. In hindsight, my user home directory probably wasn’t the optimal choice of folder to share, but I was pretty inexperienced with Linux when I set it up ~5 years ago.
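As an illustration of the same-path mapping, here is a small sketch that mounts a host directory at the identical path inside a temporary container and lists it. The directory name fastai-data and the helper name list_shared are placeholders, and ubuntu:20.04 is used only because it is a small public image.

```shell
# Placeholder host directory to share with the container.
SHARED_DIR="${HOME}/fastai-data"

# Bind-mount SHARED_DIR at the same path inside the container and list its contents.
list_shared() {
  docker run --rm -v "${SHARED_DIR}:${SHARED_DIR}" ubuntu:20.04 ls -la "${SHARED_DIR}"
}
```

Files written under that path from inside the container land on the host, so they survive when the container is deleted and recreated.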

No problem! Let me know if you have any more questions. I started off building my own container images and then switched to the fastdotai images when they became available, but their maintainer recently moved on and deprecated them, so I switched back to building my own. Unfortunately, the fastai containers that Jeremy and team maintain are CPU-only, and I believe they only use them for CI testing.

2 Likes

Hi @balnazzar,

You can go to paperspace.com, sign up for a free account and use Gradient to create a Notebook environment that spins up on the Fast.ai container.

Otherwise, you can find the fast.ai container for this course on Docker Hub with the following URI: paperspace/fastai:2.0-fastbook-2022-04-25-rc2.

Hope this helps! Let me know if you run into any issues.

2 Likes

Thanks Mathew for taking the time to give a detailed reply to my questions and explaining it, I really appreciate it!

As you mentioned, I’m going to try it, I like the idea of building it from scratch instead of relying on the paperspace containers which seem up-to-date, but I’d rather be independent of that if I can help it. I’ll definitely check out portainer, seems like it’ll make my life a lot easier :slight_smile:

Cheers!

1 Like

hey! hey! thanks for posting that! :slight_smile: it came in real handy just as the class was starting and it’s working like a charm so far (though I’m going to try my hand at building my own based on some advice I got on this forum)

Congrats again on the win!

It’s a beast of a card! :smiley: Many of us here started with 1080Tis or 2080Tis; it’s the successor to those, so I think you’ll have a great start :slight_smile:

2 Likes

Still training all of blurr with a single 1080Ti so, yah, you’ll be in great shape!

5 Likes

I’m trying this dockerfile out with some customizations, but it complains about python-qt4 not being available. I know I can get it from another repo, but is it really needed for fastai work? I didn’t see another repo being set up in your dockerfile.

BTW, Portainer seems very interesting, but when I try to build through it, it doesn’t really give any logs and just dies, so I had to go the command-line build route after all.

I’m not sure to be honest. This dockerfile has been adapted over quite a few years so this may not be needed anymore and just should be cleaned up. You can try to get rid of it and see if it works. Maybe someone else can help chime in on this one?

Portainer doesn’t do a good job of showing status updates while it’s building containers. When it’s done the logs should show up automatically I believe, but if not there is an ‘output’ tab at the top that shows the logs.

1 Like

I run PyTorch on my M1 Mac. It comes down to using miniforge (or mambaforge) in place of miniconda. Miniforge supports apple silicon and by default uses the conda-forge channel only. As of now the pytorch team is still working on M1 gpu support (ETA several months) so you’ll be limited to cpu only for now.

That said, there’re a lot of cases where it’s convenient to work on the data infrastructure of your model and use a small sample set to make sure everything’s working, before running it on a larger cloud machine to train.

As for how to set this up: a quick search should give you the most updated results, since even now I had to check if what I was saying was still current.


edit: wow, so apparently there is M1 GPU support for PyTorch. As @balnazzar linked earlier: the SHARK runtime can do it, and is built on MLIR, a project related to SwiftForTensorflow that saw some collaboration between Google Brain and the FastAI community when Chris Lattner was there.

3 Likes

Thanks, I’ll clean it up. Re: the logs tab in Portainer, I was thinking it would give me some idea of what kind of failure was happening, but it just doesn’t. The interface is really cool and makes managing all the containers so much easier! Thanks for mentioning it.

I was documenting a recipe for jupyterlab / fastai via conda setup on Debian 11 for myself, and moved it to a public repository in case this is useful to someone.

If you have time and will, do some experiments with it and let us know :wink:

4 Likes


When you build an image you can see the complete build log output in the ‘Output’ tab. You will not be able to click on this tab until the build is complete or has failed. I assume this is what you mean when you say the ‘logs’?

1 Like