Thanks Lucas, in the end I ended up switching over to an nvidia card just recently. I had almost 6 months invested on this challenge but I’m mainly just getting started . Mainly wanted to learn a bit about Deep Learning without using an online service so nothing serious at this point. I was trying to get a upper-end budget AMD PC working with a reasonable performance on the lessons but it wasn’t to be.
Many of the issues I encountered while initially trying to get this working on the RX 580 were actually low level firmware issues in my case.
ASUS didn’t correctly set up compliant ACPI tables in the BIOS/firmware during the boot process. They seemed to use a kludge to get Ryzen CPU’s working in the first place. There were also other issues at this level which caused bricks when toggling preset values they had set. As a result rocm would not function correctly before getting to the docker stage.
Contacting ASUS was futile, they wouldn’t admit the defect, but offered to replace the board via RMA and this process in pandemic times is lengthy.
For those that may be running into similar issues, or run across this later. The critical issues for the mainboard seemed to stem from PCIE lines not being directly connected with the CPU. The only check I could find that’s visible when purchasing the board for these seemed to be whether the motherboard does/doesn’t support SLI/Crossfire which as far as I’m aware requires this.
The board/CPU may support PCIE Atomics needed by rocm but there may still be firmware level challenges to overcome. As far as I’m aware, no one with a SLI/Crossfire compatible mainboard has had issues issues.
The combo I worked with was an ASUS Prime B450 Plus which supported PCIE Atomics on the Ryzen 5 chip but would not enumerate the devices properly when an APU capable CPU was present alongside a GPU.
I read somewhere the problematic boards were translating PCIE lanes but I’m not enough of a wizard to dig that deep into the problem to confirm at this stage.