Information concerning training a NN on the Scitas cluster

Hello,
We have a lot of data, so we need to train a neural network on multiple GPUs. The lab we are doing the project with gave us access to Scitas's Izar cluster.
The code we want to run in parallel uses PyTorch, but we are having trouble using this cluster. Is there a TA who is familiar with this particular cluster, or should we ask the cluster's staff instead?

We've had some trouble installing DL libraries on Scitas. Have you asked Scitas support yet whether they support PyTorch combined with parallel training? Do you just need naive parallelization of jobs, or joint data-parallel training?
The second might be easier on public clouds, where you also get some free starting budget. For code, see for example https://mlbench.github.io/
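
To make the difference between the two options concrete, here is a toy sketch in plain Python (no PyTorch, no cluster needed). It simulates joint data-parallel training of a one-parameter model y = w * x: each simulated worker computes a gradient on its own shard of the data, the gradients are averaged, and every worker applies the same update. All function names and the toy data are made up for this illustration and are not taken from your project or from any library.

```python
# Toy illustration only: simulated workers, hypothetical helper names.

def shard_gradient(w, shard):
    """Mean gradient of 0.5 * (w*x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for the all-reduce step: every worker gets the average."""
    return sum(values) / len(values)

def data_parallel_step(w, shards, lr):
    """One synchronous step: local gradients, then one shared update."""
    local_grads = [shard_gradient(w, shard) for shard in shards]
    return w - lr * all_reduce_mean(local_grads)

def train(shards, steps=200, lr=0.05, w=0.0):
    """Run a fixed number of synchronous steps starting from w."""
    for _ in range(steps):
        w = data_parallel_step(w, shards, lr)
    return w

# Data generated from y = 3 * x, split evenly across two simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 5)]
shards = [data[:2], data[2:]]

w_parallel = train(shards)  # two workers, gradients averaged each step
w_single = train([data])    # one worker that sees all the data
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so both runs follow the same trajectory and recover the same w. In PyTorch, torch.nn.parallel.DistributedDataParallel performs this kind of gradient averaging across processes for you. Naive parallelization, by contrast, would mean launching independent jobs (for example one per hyperparameter setting) that never exchange gradients.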
