For those who run their own AI box, or want to

This thread is a bit different from “Setup Help”, since it focuses exclusively on local setups: it’s meant for people who prefer to run their own ML/DL/AI machine rather than use a cloud solution.
We can use it to share opinions, tips, advice, etc. about the matter, on both hardware- and software-related aspects.

I’ll start with my two cents, mainly aimed at beginners who are thinking about building their first machine. Please note that this is just my personal opinion.

Hardware:

  1. The first thing you’ll need is, of course, a GPU. The RTX 3060 12GB is a great choice for starters. Now that the mining craze has finally come to an end, it can be had for ~$450 (or €450).
    It’s reasonably fast and has a decent amount of VRAM.
    In theory, an even better choice would be the AMD RX 6600 XT, which comes with 16GB for the same price, but that will force you to meddle with DirectML under WSL2 (see below). If you don’t know what DirectML is, go for the Nvidia card and save yourself some headaches.

  2. If you already have an Nvidia GPU, 8GB is kind of the bare minimum, and architectures from Pascal onwards can make use of fp16 computation to save a lot of VRAM (see the quick check after this list).

  3. The CPU is not terribly important. Just buy the best you can afford, with two caveats:

  • It’s better to buy a CPU with integrated graphics. You can connect the monitor(s) to it, so the main GPU is left alone for computation and you won’t occupy VRAM, which is a valuable and scarce resource. Alternatively, buy a cheap discrete card (e.g. Nvidia T400, ~$100) and stick it into a vacant slot.
  • Intel still has an edge over AMD with libraries like MKL, and their processors have integrated graphics more often than their AMD counterparts.
  4. RAM: the more you have, the better, but be sure to get at least twice your VRAM amount (e.g. a 24GB GPU calls for at least 48GB of RAM). ECC RAM can help in improving stability, but unfortunately isn’t always an option for consumer-grade CPUs. AMD consumer processors do support it ‘unofficially’.

  5. Storage: any NVMe SSD will do.

Software:

  1. Essentially, you have three (and a half) options:
  • A full-fledged Linux box. You can run fastai/pytorch in a docker NGC container or in a conda environment (see the conda sketch after this list). In both cases you don’t need to worry about CUDA and cuDNN, since the standard fastai/fastbook installation will take care of them automatically. On one hand, docker is a bit preferable for a beginner, since it provides an additional layer of insulation against mistakes. On the other hand, a conda env on the bare metal is more straightforward (and in line with the official installation instructions).
  • WSL2: that is, the Windows Subsystem for Linux, a form of lightweight virtual machine with (quasi-)direct access to the hardware. Preferable if you use Windows as your daily-driver OS. Unlike its first iteration, it’s quite mature and reliable, and the GPU will run almost as fast as it would if accessed directly.
  • Direct fastai installation in a Windows conda env. Not recommended: it’s likely that you will stumble into issues of various kinds.
  • A third-and-a-half option, if you are a die-hard Apple fan, would be installing pytorch for M1 to leverage Apple’s neural engine. It’s likely that the fastai installation would then become a journey of pain, but that’s just my guess: I have no experience with that.
  2. Tools: I’m quite happy with VS Code (or Codium, if you don’t like telemetry). It integrates beautifully with Jupyter and even has a WSL-specific extension. And of course there is nbdev, and a lot of other tools (many of them made by Jeremy or his students).

Personally, I use a Windows 11 box for my ordinary computing tasks. It has a 2060 Super that I use with WSL2 for little experiments. If I need to do more serious AI-related work, I use my other machine, a Linux box with an A6000.

Any contribution/opinion would be highly welcome.

23 Likes

Is there such a thing?

I’m sure @init_27 has some advice, tips, & tricks.

Would love to see some pick lists at different budget tiers, and hear more about setting up a docker environment to run things, so as to avoid having to mess with CUDA/cuDNN at the system level. (I have yet to be successful in getting nvidia docker images to see my GPUs.)

1 Like

Yes. Tensorflow as well. But I don’t own an M1 Mac, so it’s for others to try out. Benchmarks here and there report good performance (relative to the thermal footprint).

1 Like

It’s really easy.

  • Install docker; the simplest way is just sudo apt install docker.io. Then enable it with systemd.
  • Download an image you like from the Nvidia NGC hub, for example the CUDA base container.
  • Install miniconda (if absent) and fastai into it.

And you are done. You may want to install RAPIDS too, or just start by downloading the RAPIDS container and installing all the other stuff into it.

1 Like

Got a link handy?

Sure: https://betterprogramming.pub/how-to-install-pytorch-on-apple-m1-series-512b3ad9bc6

But there are probably newer articles.

1 Like

I think the issue is still that it won’t utilize the GPU … per that article: “The next milestone would probably be a library or a plugin that would allow Pytorch to utilize the GPU.”

That would be a game changer.

2 Likes

Indeed, it was kind of old. What about this one: PyTorch on Apple M1 MAX GPUs with SHARK – 2X faster than TensorFlow-Metal – nod.ai

1 Like

From what I’ve understood so far, there are exactly 3 “stages” that need to be cleared for you to be able to use docker with nvidia GPU support smoothly.

In the Step 3 docs, once you have “cleared” the step where it asks you to run

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

your system should be ready to use nvidia GPUs in docker containers.

If you scroll further up on the nvidia docs listed in Step 3, you can also see the docs for pre-requisites as specified by Nvidia themselves.
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#pre-requisites

I’m hoping this helps. If it still doesn’t work, and you’d like to give it a try over a video call etc., I’m more than happy to help/debug together. :beers:

7 Likes

I took that for granted, but you did well to highlight it.

Correct, it’s usually a bit older than the latest version. But if you want to install docker on a non-standard or non-LTS distro, you have to know what to do and how to troubleshoot, while docker.io from the distro repository works all the same and will be automatically taken care of (dependencies, updates, etc.).

It would be sweet if there were a repo out there, complete with a docker-compose.yml and a Dockerfile, so that you could just clone it, run docker-compose build and docker-compose up, and be done.

Thanks a lot. I’ll have an NVIDIA RTX 3080 Ti GPU in the next few days and definitely need to check this thread to build a box (I’ve always used a GPU cloud before). Do you know how good it is compared to a GPU in the cloud? I’ve never built a computer before :sweat_smile:

4 Likes

I’d like to disagree with you a bit on docker.io working all the same, but hey! If it’s working for you, that’s all that really matters! :beers:

I do have a setup just like this, but it’s slightly tuned to my preferences. If you’re willing to dig through it a bit, I’ve tried to document most of it, shared in this thread.

https://forums.fast.ai/t/setup-help/95289/17?u=suvash

1 Like

It’s difficult to estimate, since there are a lot of different solutions in the cloud. Generally, we can say that a 3080 Ti will do a lot better than the free cloud services.

But the main advantage is that you’ll have total control over the environment, so you can tailor it to your specific needs and tastes: what is to be installed and what’s not, no time limits, no worries about leaving it on, no storage constraints, etc.

1 Like

What problem(s) did you encounter with the apt package? With the official distribution of docker, I had to trick it into thinking it was being installed on 20.04 (while it was actually 21.10) and, more importantly, it tried (continuously) to have its virtual storage mounted system-wide. Both issues were solved by googling, but time is a valuable resource.
In the end, it seems it all comes down to our personal experience with such nasty things :wink:

1 Like

I knew tensorflow for M1 existed, but I didn’t know pytorch had been ported as well. Last I heard around these forums, the team was working on it, but that was a few months ago.

1 Like

I had quite a few issues trying to install the stack locally, so I ended up going with a docker solution out of sheer desperation, and it seems to work. Since I’m lazier than most people, I just use the fastai docker containers that Paperspace publishes, and they work pretty well for me. The steps are pretty much the same as the ones that Suvash mentioned in the reply below.

2 Likes

They publish docker images with the latest versions of fastai? Wonderful!
Please provide a link, I happen to be extremely lazy as well :smiley:

2 Likes