Automatic Database Management System Tuning Through Large-scale Machine Learning
Authors -
Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, Bohan Zhang
Motivation –
Most prior work on improving database efficiency centers on the logical or
physical design of the database. This paper instead focuses on tuning the low-level
configuration knobs of a Database Management System (DBMS), such as memory cache
sizes, rather than on indexing. Database configuration tuning involves trying out many
combinations of knob values, which is an NP-hard problem. Database vendors such as
IBM, Oracle, and BerkeleyDB have tried to improve efficiency using heuristics or
influence diagrams, but these are manual processes that require domain experts. This
paper presents OtterTune, an ML-based tool that recommends values for DBMS knobs to
improve target performance measures such as latency and throughput for a given workload.
Positive Points -
[1] OtterTune works on any of its supported DBMS types. It recommends
knob settings for a new application by reusing the data stored in its repository: it
finds the best-matching previously seen workload and adapts that workload's knob
settings to the target measures (a rough sketch of this matching step follows).
It also uses the feedback on each recommendation to refine its models.
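A minimal sketch of the repository-matching idea, with assumed function and variable names (the real pipeline bins and normalizes the metrics before comparing workloads):

    import numpy as np

    def match_workload(target_metrics, repository):
        # repository: dict mapping workload id -> averaged runtime-metric vector.
        # Return the id of the previously seen workload whose metrics are
        # closest to the target's (Euclidean distance).
        return min(repository,
                   key=lambda wid: np.linalg.norm(repository[wid] - target_metrics))

    repository = {
        "workload_A": np.array([0.20, 0.90, 0.40]),
        "workload_B": np.array([0.70, 0.10, 0.30]),
    }
    print(match_workload(np.array([0.25, 0.85, 0.45]), repository))  # -> workload_A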
[2] OtterTune is robust to redundant knobs (knobs with different
names or strongly correlated knobs), to different knob granularities, and to the lack
of prior knowledge about which knobs are good or bad, since it considers only global
knobs. It uses factor analysis followed by k-means clustering to prune redundant and
useless metrics (sketched below).
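A rough illustration of that pruning step on toy data (the shapes and parameter values here are assumptions, not the paper's):

    import numpy as np
    from sklearn.decomposition import FactorAnalysis
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 30))   # 100 observed configurations x 30 runtime metrics

    fa = FactorAnalysis(n_components=5).fit(X)
    loadings = fa.components_.T      # one row of factor loadings per metric (30 x 5)

    km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(loadings)
    kept = []
    for c in range(km.n_clusters):
        members = np.where(km.labels_ == c)[0]
        # keep only the metric whose loadings are closest to the cluster center
        dists = np.linalg.norm(loadings[members] - km.cluster_centers_[c], axis=1)
        kept.append(members[np.argmin(dists)])
    print(sorted(kept))              # indices of the representative (pruned) metric set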
[3] OtterTune trains its models on the DBMS's internal runtime metrics, as these
metrics are directly affected by the knob settings. It uses a regularized linear
regression technique, the Lasso path algorithm, to select the most impactful DBMS
knobs, and it includes polynomial features so that dependencies between knobs are
captured, which makes tuning more efficient by reducing the search space (see the
sketch below).
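A simplified sketch of ranking knob features with the Lasso path on synthetic data (the data, the 1e-8 threshold, and the rank-by-entry-step rule are illustrative assumptions):

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler
    from sklearn.linear_model import lasso_path

    rng = np.random.default_rng(1)
    knobs = rng.uniform(size=(200, 4))   # 200 samples x 4 knob settings
    latency = (3.0 * knobs[:, 0] + 0.5 * knobs[:, 1] * knobs[:, 2]
               + rng.normal(scale=0.1, size=200))

    # Polynomial terms let the model capture pairwise knob dependencies.
    poly = PolynomialFeatures(degree=2, include_bias=False)
    X = StandardScaler().fit_transform(poly.fit_transform(knobs))
    alphas, coefs, _ = lasso_path(X, latency - latency.mean())

    # A feature is more important the earlier it enters the path (at larger alpha).
    nonzero = np.abs(coefs) > 1e-8
    entry = np.where(nonzero.any(axis=1), nonzero.argmax(axis=1), coefs.shape[1])
    for name, step in sorted(zip(poly.get_feature_names_out(), entry), key=lambda t: t[1]):
        print(name, "enters at path step", step)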
[4] OtterTune uses Gaussian Process (GP) regression to recommend
configurations. This technique is not only less computationally expensive than a deep
neural network but also nearly as powerful, which is desirable for on-line tuning
(a toy example follows).
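A toy, one-knob sketch of the recommendation step (synthetic observations; the upper-confidence-bound rule here only approximates the exploration/exploitation trade-off the paper describes):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(2)
    X_obs = rng.uniform(0.0, 1.0, size=(15, 1))        # observed knob settings
    y_obs = np.sin(6 * X_obs[:, 0]) + rng.normal(scale=0.05, size=15)  # e.g. throughput

    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                  normalize_y=True).fit(X_obs, y_obs)

    candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
    mean, std = gp.predict(candidates, return_std=True)
    ucb = mean + 2.0 * std                             # also explore where the model is uncertain
    print("next knob setting to try:", candidates[np.argmax(ucb), 0])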
Negative Points -
[1] OtterTune keeps a list of the knobs that can be
changed dynamically without requiring a restart, and it suggests changing only those
dynamic knobs when restarting the DBMS is costly or undesirable. Since knobs may
be highly correlated, this fails to take into account the consequences of tuning
only a few knobs while keeping the others unchanged, which could potentially decrease
performance.
[2] The paper proposes a tuning solution that is supposed to avoid
heuristics, yet to group similar knobs together it uses k-means
and relies on the heuristic assumption that K = 40 will do a good job.
[3] OtterTune sometimes produces extreme values for some knobs
while trying to optimize the target metrics, because it fails to consider the
negative side effects of such recommendations.
- Saurabh Gupta
For negative point 1, I believe it does try to learn knob dependencies. It is true that K-means does require K, but they point to a heuristic (not sure how good it is, though). I'm not sure of your evidence for point #3? I appreciate your review!
Hello Prof Jon,
Point 3 -
It's mentioned in the paper for knobs such as the number of log files written
between checkpoints and the memory knob. According to the paper, the DBA, the tuning
script, and AWS set the log-file knob to a value between 16 and 64, while OtterTune sets it to 540. This happens because OtterTune tries to improve I/O performance, which causes the recovery time of the DBMS after a crash to increase. Another example is memory provisioning, where OtterTune over-provisions memory at 18 GB while RDS and the DBA suggest a value between 7 and 10 GB. This happens because it tries to optimize the given metrics at any cost.
Point 1 -
Yes, you are right about learning knob dependencies. The paper mentions that polynomial Lasso regression can learn dependencies between knobs, but it says nothing about dependencies across different types of knobs, for example between a dynamic knob and a knob that requires a restart. Since it is not always practical to restart the DBMS (or there may be no root access) for every tuning session, changing only the recommended dynamic knobs could degrade performance because of those correlations.
Point 2 -
The paper mentions that K = 40 is not guaranteed to find the optimal solution; they estimated the value of K using the gap statistic [1], which works well for well-separated clusters. So I am not sure how they chose the value, or what approximation of the clusters' characteristics they used.
[1] https://statweb.stanford.edu/~gwalther/gap
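For context, a small sketch of the gap-statistic procedure from [1] on toy data (picking the K with the largest gap is a simplification of the standard-error rule in [1]):

    import numpy as np
    from sklearn.cluster import KMeans

    def within_dispersion(data, k):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
        return km.inertia_   # sum of squared distances to the assigned centers

    def gap_statistic(data, k, n_refs=10):
        # Compare the data's dispersion to that of uniform reference data drawn
        # over the same bounding box; a large gap suggests k real clusters.
        rng = np.random.default_rng(0)
        lo, hi = data.min(axis=0), data.max(axis=0)
        ref = [np.log(within_dispersion(rng.uniform(lo, hi, size=data.shape), k))
               for _ in range(n_refs)]
        return np.mean(ref) - np.log(within_dispersion(data, k))

    rng = np.random.default_rng(3)
    data = np.vstack([rng.normal(loc=c, scale=0.2, size=(50, 2)) for c in (0.0, 2.0, 4.0)])
    gaps = {k: gap_statistic(data, k) for k in range(1, 7)}
    print("estimated K:", max(gaps, key=gaps.get))   # typically 3 for this well-separated toy data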