With my team, we are starting to work on a topic related to Quantized SGD and the communication-efficient SGD it enables.
For our experiments, we noticed that the Python library "multiprocessing" allows parallel computation using every CPU core of the machine.
With only two cores, is this a good way to study the effect on communication? Or would you advise us to do it another way (for example, multithreading with more threads than cores)? Or not to use real parallelization at all?
Thank you in advance!
It depends on what you want to do. If you want to measure actual timings, you should probably use an environment that is as close as possible to real life. If you want to use PyTorch, have a look at torch.distributed, for example.
If you don't care about the precise run time and instead want to study properties of the results (number of steps, bits communicated), it doesn't really matter what you do. You can even write the updates of multiple workers as batched numpy/pytorch operations.
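As a rough illustration of the batched approach, here is a minimal numpy sketch that simulates several workers in a single process: each row of a matrix is one worker's gradient, every row is quantized at once, and the "server" averages the quantized rows. It counts steps and bits instead of measuring time. Several choices here are assumptions for illustration, not a reference implementation: the least-squares toy problem, the QSGD-style stochastic quantizer, the hypothetical helper names (`qsgd_quantize`, `bits_per_message`, `simulate`), and the naive fixed-length bit count (a real scheme may use a more compact encoding).

```python
import numpy as np

def qsgd_quantize(g, s, rng):
    """Stochastically quantize each row of g (one row per worker) to s levels.

    QSGD-style sketch: coordinate i becomes norm * sign(g_i) * level / s,
    where level is rounded up or down at random so the result is unbiased.
    """
    norms = np.linalg.norm(g, axis=1, keepdims=True)           # (W, 1)
    safe = np.where(norms == 0, 1.0, norms)                    # avoid division by zero
    scaled = np.abs(g) / safe * s                               # values in [0, s]
    lower = np.floor(scaled)
    round_up = rng.random(g.shape) < (scaled - lower)           # P(up) = fractional part
    levels = lower + round_up
    return norms * np.sign(g) * levels / s

def bits_per_message(d, s):
    # Assumption: naive fixed-length encoding, i.e. one 32-bit float for the norm
    # plus 1 sign bit and ceil(log2(s + 1)) level bits per coordinate.
    return 32 + d * (1 + int(np.ceil(np.log2(s + 1))))

def simulate(num_workers=4, d=20, n_per_worker=50, s=4, lr=0.1, steps=300, seed=0):
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    # Hypothetical toy problem: each worker holds a shard of a least-squares task.
    X = rng.normal(size=(num_workers, n_per_worker, d))         # (W, n, d)
    y = X @ w_true + 0.01 * rng.normal(size=(num_workers, n_per_worker))

    w = np.zeros(d)
    bits = 0
    for _ in range(steps):
        residual = X @ w - y                                    # (W, n)
        # One local gradient of 0.5 * mean(residual**2) per worker, all at once.
        grads = np.einsum("wnd,wn->wd", X, residual) / n_per_worker
        q = qsgd_quantize(grads, s, rng)
        w -= lr * q.mean(axis=0)                                # server averages messages
        bits += num_workers * bits_per_message(d, s)            # every worker sends once
    return w, w_true, bits

if __name__ == "__main__":
    w, w_true, bits = simulate()
    print("error:", np.linalg.norm(w - w_true))
    print("bits sent:", bits)                                   # 300 * 4 * 112 = 134400
    print("full-precision bits:", 300 * 4 * 32 * 20)            # 768000
```

Changing `s` or `num_workers` and comparing the error against the bit count is then a one-line change, and no real parallelism is needed because only steps and bits are being measured.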
Hope this helps!
Thank you for your answer!
OK, so there are two quite different ways to measure the effect: in real time, or by counting steps/bits communicated.
Thank you for pointing us to the torch.distributed library.
We will keep thinking it over and look for examples in publications before deciding which way to go.