Device Placement Optimization with Reinforcement Learning
Motivation:
The main aim of the paper is to learn which operations of a TensorFlow model should run on which devices so that the memory and computational resources of the available CPUs and GPUs are used effectively when training a neural network. The environments considered are heterogeneous and distributed, with a mixture of hardware devices such as CPUs and GPUs, and the decision about which parts of a neural model to place on which device is currently made by human experts. These decisions are based on simple heuristics and intuition, which limits the approach; the authors tackle this limitation with reinforcement learning.
Main points:
1. They use a sequence-to-sequence model where the operations of a neural network are the input sequence and the device placements are the output sequence (a minimal sketch of this idea follows the list below).
2. They evaluate on three benchmark models: a Recurrent Neural Network Language Model (RNNLM), Neural Machine Translation with an attention mechanism (NMT), and Inception-V3. This gives a clear basis for comparison.
3. For the NMT graph, the reinforcement-learning-based placement shows an improvement of 19.3% in running time over the fine-tuned, expert-designed placement.
4. For Inception-V3, the reinforcement-learning-based placement shows an improvement of 19.7% in running time over the fine-tuned, expert-designed placement.
5. In an extension of their work, they increase the memory footprint (by enlarging the LSTM sizes) so that a single layer no longer fits on one device, making a human-designed placement impossible. Their approach still manages to find an efficient way to place the model across the devices.
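To make point 1 concrete, here is a minimal sketch of the idea as I understand it: a policy assigns each operation to a device, the resulting placement is executed and timed, and the negative running time is used as the REINFORCE reward. The per-operation softmax policy and the measure_runtime stub below are my own simplifications for illustration; the paper actually uses an attentional sequence-to-sequence network and times real TensorFlow graphs.

import numpy as np

# Toy setting: assign each of n_ops operations to one of n_devices devices.
n_ops, n_devices = 6, 2
rng = np.random.default_rng(0)

# Simplified policy: an independent softmax over devices for every operation
# (the paper uses an attentional seq2seq network instead).
logits = np.zeros((n_ops, n_devices))

def measure_runtime(placement):
    # Stand-in for executing the real TensorFlow graph and timing it;
    # here we simply pretend that balanced placements run faster.
    load = np.bincount(placement, minlength=n_devices)
    return 1.0 + 0.5 * load.max()

def sample_placement():
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return np.array([rng.choice(n_devices, p=p) for p in probs]), probs

baseline, lr = None, 0.1
for step in range(200):
    placement, probs = sample_placement()
    reward = -measure_runtime(placement)    # faster placements get higher reward
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    advantage = reward - baseline           # moving baseline reduces variance
    # REINFORCE update: raise the log-probability of the sampled devices
    # when the placement ran faster than the baseline.
    for i, d in enumerate(placement):
        grad = -probs[i]
        grad[d] += 1.0
        logits[i] += lr * advantage * grad

final_placement, _ = sample_placement()
print(final_placement, measure_runtime(final_placement))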
Pros:
1. They introduce the reader to Scotch, an open-source graph-partitioning library. It is very interesting how they model the operations of a neural network and the data flowing between them as a graph; this idea could be applied to other domains as well (a toy illustration follows this list).
2. The related-work section shows that they have done a lot of background research, so they are able to state clearly which techniques they use and why.
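As a toy illustration of the graph-partitioning framing (this is not Scotch's actual API; the operation names and costs below are made up), the model can be viewed as weighted nodes connected by data-flow edges, and a simple baseline partitioner tries to balance the load while cutting few edges:

ops = {"embed": 2, "lstm1": 5, "lstm2": 5, "softmax": 3}   # per-op compute cost
edges = [("embed", "lstm1"), ("lstm1", "lstm2"), ("lstm2", "softmax")]

def greedy_partition(ops, n_devices=2):
    # Assign the costliest operations first to the currently least-loaded device.
    load = [0] * n_devices
    placement = {}
    for op, cost in sorted(ops.items(), key=lambda kv: -kv[1]):
        dev = load.index(min(load))
        placement[op] = dev
        load[dev] += cost
    return placement

placement = greedy_partition(ops)
cut_edges = sum(placement[a] != placement[b] for a, b in edges)
print(placement, "edges crossing devices:", cut_edges)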
Cons:
1. The devices used are very high-end, very fast, and very expensive, so the results are hard to relate to the regular machines that students have access to.
2. They use only the execution time and the number of available devices; they do not use any other information about the hardware configuration. In an environment with heterogeneous devices, I feel that this information would be useful.
- Nishka Monteiro
Try to think about deeper issues. Have they justified the models they are using? Are there any obvious limitations or questionable assumptions? On the positive side, their approach seems to work for quite different input workloads (ML models).