CAPES: Unsupervised Storage Performance Tuning using Neural Network-Based Deep Reinforcement Learning
TL;DR: CAPES is a model-less, unsupervised system-parameter tuning framework based on deep reinforcement learning, implemented with a deep neural network. It uses the Q-learning paradigm to suggest changes to a system's parameters so as to optimize for performance, efficiency, or whatever the end goal is.
Problem statement
Systems often need tweaking so that they perform at their best; in other words, many systems and applications need optimizing to attain the best result. Tuning for such an optimal configuration takes a lot of time, domain knowledge, and expertise, which can be hard to come by for a small enterprise that can afford neither the time nor the money needed to reach the optimum. In this paper, Yan Li et al. propose a generalized framework that can do this parameter tuning for any target system online, so that its performance meets the specified goal.
The approach removes the need for prior domain knowledge and adapts to dynamically changing workloads.
The learning here is reinforcement learning, and the authors demonstrate it by improving the performance of a distributed Lustre file system. Lustre is a widely used file system for large clusters such as supercomputers.
Model
The idea is to maximize the action-value function, which represents the best sum of (discounted) rewards we can obtain.
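For reference, this is the standard Bellman optimality equation that Q-learning targets (standard DQN-style notation, not copied verbatim from the paper):

```latex
% Optimal action-value function: the expected discounted return from
% taking action a in state s and then acting optimally afterwards.
Q^{*}(s, a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \;\middle|\; s, a \,\right]
```

The suggested action in a given state is then simply the action a that maximizes Q(s, a).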
The CAPES framework consists of a Deep Reinforcement Learning (DRL) engine; for demonstration purposes the network has two hidden layers. An interface daemon is solely responsible for talking to the target system, i.e., gathering information (performance indicators) and suggesting actions (control actions) after consulting the DRL engine and the replay database.
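To make the architecture concrete, here is a minimal sketch of what such a two-hidden-layer Q-network looks like. The layer sizes, indicator/action counts, and the plain-NumPy forward pass are all my assumptions for illustration; the paper's engine is a trained deep network, not this toy:

```python
import numpy as np

N_INDICATORS = 8   # hypothetical number of performance indicators
N_ACTIONS = 5      # hypothetical number of discrete control actions
HIDDEN = 64        # hypothetical hidden-layer width

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """He-style initialization for one fully connected layer."""
    return rng.normal(0, np.sqrt(2 / n_in), (n_in, n_out)), np.zeros(n_out)

W1, b1 = init_layer(N_INDICATORS, HIDDEN)
W2, b2 = init_layer(HIDDEN, HIDDEN)
W3, b3 = init_layer(HIDDEN, N_ACTIONS)

def q_values(observation):
    """Map a vector of performance indicators to one Q-value per action."""
    h1 = np.maximum(0, observation @ W1 + b1)  # hidden layer 1 (ReLU)
    h2 = np.maximum(0, h1 @ W2 + b2)           # hidden layer 2 (ReLU)
    return h2 @ W3 + b3                        # linear output head

# The interface daemon would apply the highest-valued action to the
# target system (epsilon-greedy exploration during training).
obs = rng.normal(size=N_INDICATORS)            # stand-in for real indicators
best_action = int(np.argmax(q_values(obs)))
```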
Strong Points
- CAPES is liberal about the choice of input features. This is one of the distinguishing factors from previous implementations that makes CAPES a good candidate, as it removes the need for a domain expert.
- The delay between decisions, actions, and measurements greatly affects performance, and it becomes tough to identify which parameter change actually impacted the system's performance. The fact that the authors show the delay between an action and its reward does not hurt their model, since the Q-function ultimately converges to the optimal oracle-like function, makes this framework stand out in my opinion.
- CAPES is a reinforcement learning framework that can be applied to any target system; in that sense the design is universal.
- The authors also propose sanity checking for the generated parameters/configurations, which avoids egregious values (like NumCPUs=0, among others); see the sketch after this list.
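As a concrete illustration of such a sanity check, here is a minimal sketch; the parameter names and bounds are hypothetical, not taken from the paper:

```python
# Known-safe ranges for each tunable parameter (hypothetical values).
SANE_RANGES = {
    "congestion_window": (1, 1024),
    "num_io_threads": (1, 256),
}

def is_sane(params: dict) -> bool:
    """Reject suggested configurations that fall outside known-safe bounds."""
    return all(
        lo <= params.get(name, lo) <= hi
        for name, (lo, hi) in SANE_RANGES.items()
    )

# A zero-thread configuration (the NumCPUs=0 kind of mistake) is rejected
# before it ever reaches the target system.
assert not is_sane({"num_io_threads": 0})
```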
CRITIQUE
- As the number of tunable parameters increases, the search space grows exponentially for any system: ten parameters with just ten discretized values each already yield 10^10 configurations. The proposed model does not clearly state how the tuning steps are chosen at time instant t (what ranges the parameters take and what their underlying distribution is). Rather, it focuses on predicting the performance of a configuration and computing the error based on the performance at t+1.
- For the application chosen, it takes more than 12 hours of training for the network to find an optimal policy for the file-server workload. A cost-vs-benefit analysis would help here. Also, it is not clear to me from the discussion in the paper what was chosen as the baseline: was it actually tuned by a domain expert, or was it the factory tuning of the application?
- CAPES performs well for the write workload, achieving a 1.45x speedup, whereas there is not much improvement in the read workload.
- The reason quoted for this is that tuning the congestion window has much less impact on reads than on writes, but the paper does not make it clear why that is.
- There could be other workloads that would benefit rather more than the Lustre file system does.
--Shashank Hegde
Good point about the baseline. How good was it? I also agree that they don't provide much insight into how the system tuned the parameters using the DRL. It would have been interesting to see what values it picked over time. Very few papers seem to go in this direction.