AuTO: Scaling Deep Reinforcement Learning for Datacenter-Scale Automatic Traffic Optimization
Authors:
Li Chen, Justinas Lingys, Kai Chen, Feng Liu
Current method for Datacenter traffic optimization:
A monitoring system collects traffic data; once enough data has been gathered, it is presented to engineers, who analyze it and apply their application-layer knowledge to produce a heuristic. The heuristic is then tested in simulation tools to find the optimal parameter settings. This process takes weeks or even months. In this way, traffic optimization policies are generated for varying traffic load, flow size distribution, traffic concentration, etc.
The problem:
Due to the long turnaround time there is data staleness: the traffic characteristics the system was designed around may differ from the current traffic characteristics. Even when the traffic distributions are the same, there may be a parameter-environment mismatch, which leads to performance degradation. Both problems can be avoided if the turnaround time is reduced to seconds. Thus the authors want to automate the entire process with Deep Reinforcement Learning to bring the turnaround time down to seconds.
The need for Deep Reinforcement Learning:
Deep Reinforcement Learning is the combination of Deep Learning with Reinforcement Learning. There have been many recent successes where Deep Reinforcement Learning solved problems with difficult environments and complex sequential decision making, such as DeepMind's Atari agents and AlphaGo. Deep Reinforcement Learning models allow us to solve complex control problems end to end.
Method:
In flow scheduling, the authors order flows using priorities. At each time step, the Reinforcement Learning agent collects the state (the set of active and finished flows), generates an action, and updates the policy based on the reward function; a Deep Neural Network is trained to optimize this policy. Short flows come and go quickly, while long flows transfer far more bytes and can therefore tolerate delays in the model. To exploit this, a Multi-Level Feedback Queue is used: every flow starts at the highest priority, and once the bytes it has sent cross a threshold it is demoted to the next lower priority, so long flows descend to the lowest priority. The demotion thresholds are the parameters optimized by Deep Reinforcement Learning. The system has two parts: a Central System that contains the Deep Reinforcement Learning models, and a Peripheral System at each end host with a Monitoring Module that reports flow information to the Central System and an Enforcement Module that runs the Multi-Level Feedback Queue and sends packets according to their flow's queue.
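To make the MLFQ mechanism concrete, here is a minimal sketch (my own illustration, not the authors' code) of how an end host's Enforcement Module could map a flow's bytes sent to a priority level, given demotion thresholds pushed down by the central DRL agent. The threshold values and the function name are hypothetical.

```python
# Hypothetical demotion thresholds (bytes) that the central DRL agent
# would periodically recompute and push to every end host.
THRESHOLDS = [10_000, 100_000, 1_000_000]  # 3 thresholds -> 4 priority levels

def priority_for(bytes_sent, thresholds=THRESHOLDS):
    """Return the MLFQ priority (0 = highest) for a flow.

    A flow starts at the highest priority and is demoted one level each
    time its cumulative bytes sent cross the next threshold, so long
    flows sink to the lowest priority while short flows finish near the top.
    """
    for level, limit in enumerate(thresholds):
        if bytes_sent < limit:
            return level
    return len(thresholds)  # lowest priority for very long flows

# Example: a 5 KB flow stays at priority 0; a 50 MB flow ends up at priority 3.
print(priority_for(5_000), priority_for(50_000_000))
```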
Points that I liked:
- The authors are able to create an automated system that produces decisions within seconds. Thus they solve the problem they set out to fix.
- The authors ran into a problem because long and short flows are so characteristically different. I like how they came up with Multi-Level Feedback Queues to separate long flows from short flows.
- Unlike long flows, short flows occur very frequently and finish so quickly that they cannot be handled the same way long flows are. The authors came up with a new method in which short flows are handled at the end host using thresholds optimized by Deep Reinforcement Learning.
- They have implemented AuTO with reusable components. Since it is written in Python, it is compatible with machine learning frameworks like Keras and TensorFlow, so their work can be reused by the machine learning community. (A minimal sketch of what such an agent could look like is included after this list.)
- They extensively evaluate how their model performs under different scenarios that could occur in a datacenter environment.
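Since the system is built on Python frameworks like Keras/TensorFlow, the sketch below shows one way a central agent could be structured: a small tf.keras network that maps a summary of recent flow statistics to normalized MLFQ thresholds, updated with a toy REINFORCE-style step. This is an assumption-laden illustration, not the authors' actual implementation; the state size, network shape, and reward definition are all hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes: a 10-dimensional flow-statistics summary as state,
# 3 normalized MLFQ demotion thresholds as the action.
STATE_DIM, NUM_THRESHOLDS = 10, 3

actor = tf.keras.Sequential([
    tf.keras.Input(shape=(STATE_DIM,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(NUM_THRESHOLDS, activation="sigmoid"),  # thresholds in (0, 1), later scaled to bytes
])
optimizer = tf.keras.optimizers.Adam(1e-3)

def update(state, action, reward, sigma=0.1):
    """One REINFORCE-style step: raise the log-probability of threshold
    settings that produced a good reward (e.g. a drop in average flow
    completion time reported by the end-host monitoring modules)."""
    state = tf.convert_to_tensor(state[None, :], tf.float32)
    action = tf.convert_to_tensor(action[None, :], tf.float32)
    with tf.GradientTape() as tape:
        mean = actor(state)
        # Gaussian policy centered on the network output.
        log_prob = -tf.reduce_sum((action - mean) ** 2) / (2 * sigma ** 2)
        loss = -reward * log_prob
    grads = tape.gradient(loss, actor.trainable_variables)
    optimizer.apply_gradients(zip(grads, actor.trainable_variables))

# Example call with random placeholder data.
update(np.random.rand(STATE_DIM).astype("float32"),
       np.random.rand(NUM_THRESHOLDS).astype("float32"),
       reward=1.0)
```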
Points that I was concerned with:
- For experimentation they use flow generators to produce traffic based on realistic workloads, which is not the same as deploying in a live datacenter. So even though their model seems optimal in the environment they used, it may not perform the same when deployed in a live datacenter environment.
- In the dynamic-scenario experiment they describe, the performance of fixed-parameter heuristics suffered greatly when their parameters mismatched the environment. Thus there could be large degradation if the turnaround time increases.
- They run experiments on a system that models a datacenter at 1000 flows per second in order to test their algorithm, which is much lower than an actual datacenter.
-Nishka
I agree with the positive points!
They do make some claims about scaling, at least with respect to the CS operations, but you are correct that the experiments are done on a very small DC prototype with a relatively small # of flows. They hand-wave a bit and say DRL can be scaled by parallelism and GPUs.