Information concerning training a NN on the Scitas cluster
Hello,
Since we have a lot of data, we need to train a neural network on multiple GPUs. The lab we are doing the project with gave us access to the Scitas Izar cluster.
The code we want to run in parallel uses PyTorch, but we are having trouble using this cluster. Is there a TA who is familiar with this particular cluster, or should we ask the cluster's staff?
We've had some trouble installing DL libraries on Scitas. Did you already ask Scitas support whether they support PyTorch combined with parallel training? Do you just need naive parallelization of jobs, or joint data-parallel training?
The second might be easier on public clouds, where you also get some free starting budget. For code, see for example https://mlbench.github.io/
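To make the distinction concrete: naive parallelization means launching independent jobs (for example, one hyperparameter setting per GPU) that never communicate, while joint data-parallel training splits each batch across workers and averages their gradients before every update. Below is a minimal sketch of the second idea in plain Python, not PyTorch and not anything specific to Izar. The toy model, the data, and the function names `grad_mse` and `data_parallel_step` are made up for illustration. In PyTorch, the gradient averaging is what `torch.nn.parallel.DistributedDataParallel` handles for you.

```python
# Toy illustration of synchronous data-parallel training (plain Python, no GPUs).
# Model: y = w * x, loss: mean squared error. All names here are hypothetical.

def grad_mse(w, shard):
    """Gradient of mean((w*x - y)^2) with respect to w over one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr):
    """One synchronous step: each worker computes a gradient on its own shard,
    the gradients are averaged (the all-reduce), and every worker applies the
    same update, so all copies of w stay identical."""
    local_grads = [grad_mse(w, shard) for shard in shards]
    avg_grad = sum(local_grads) / len(local_grads)
    return w - lr * avg_grad


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # generated by y = 2x
shards = [data[:2], data[2:]]  # two equal shards, one per simulated worker

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)

print(w)  # should end up very close to the true slope, 2
```

With equally sized shards, the averaged gradient equals the gradient over the full batch, so this behaves like single-worker training on all the data. That is why this setup needs fast communication between GPUs, whereas naive job parallelization only needs a scheduler that can start several independent runs.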