Automatic Database Management System Tuning Through Large-scale Machine Learning


Authors - Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, Bohan Zhang

Motivation –

Most of the work on improving database efficiency pivots around the logical or physical design of the database. This paper instead focuses on tuning the low-level configuration knobs of the Database Management System (DBMS), such as memory cache sizes, rather than indexing. Database configuration tuning involves trying out various combinations of knob values, which is an NP-hard problem. Database vendors such as IBM and Oracle, as well as BerkeleyDB, have tried to improve database efficiency using heuristics or influence diagrams, a manual process that requires domain experts. This paper presents OtterTune, an ML-based tool that recommends optimal values for DBMS knobs to improve target performance measures, such as latency and throughput, for a defined workload.

Positive Points -

[1] OtterTune works well on any pre-listed DBMS type. It recommends knob settings for a new application using the data stored in its repository: it finds the best-matching previous workload and adapts that workload's configuration to the target measures. It also uses feedback on its recommendations to refine its models.

[2] OtterTune is robust to redundant knobs (knobs with different names or strongly correlated knobs), to different knob granularities, and to the lack of prior knowledge about which knobs are good or bad, as it only considers global knobs. It uses factor analysis followed by k-means clustering to prune redundant or useless metrics.
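To make the metric-pruning step concrete, here is a minimal sketch (my own illustration, not the paper's code) of factor analysis followed by k-means over a made-up metric matrix using scikit-learn; the data, shapes, and cluster count are all assumptions:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

# Hypothetical metric matrix: rows are observed configurations,
# columns are DBMS runtime metrics (values invented for illustration).
rng = np.random.default_rng(0)
metrics = rng.normal(size=(50, 12))

# Factor analysis embeds each metric (a column) into a low-dimensional space.
fa = FactorAnalysis(n_components=3, random_state=0)
fa.fit(metrics)
metric_embeddings = fa.components_.T  # one row per metric

# k-means groups redundant metrics; keep one representative per cluster.
k = 4  # illustrative; the paper uses a heuristic to pick K
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(metric_embeddings)
representatives = [
    int(np.argmin(np.linalg.norm(metric_embeddings - c, axis=1)))
    for c in km.cluster_centers_
]
print(sorted(set(representatives)))  # indices of the retained metrics
```

Keeping one representative metric per cluster collapses strongly correlated metrics into a single column for the later modeling stages.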

[3] It uses the DBMS's internal runtime metrics to train its models, as these metrics are directly affected by the knobs' settings. It uses a regularized linear regression technique, the Lasso path algorithm, to select the most impactful DBMS knobs, and it includes polynomial features to capture knob dependencies, making tuning more efficient by reducing the search space.

[4] OtterTune uses Gaussian Process regression to recommend configurations. Not only is this technique computationally less expensive than a neural network, it is comparably powerful, which is desirable for on-line tuning.
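An illustrative sketch of GP-based recommendation, assuming a made-up two-knob workload and a grid of candidates (OtterTune's actual search also injects noise to trade off exploration against exploitation, which this simple version omits):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Invented history: knob settings already evaluated and the throughput
# observed for each (a smooth bowl with its peak at (0.7, 0.7)).
rng = np.random.default_rng(2)
tried = rng.uniform(0, 1, size=(40, 2))
throughput = -((tried - 0.7) ** 2).sum(axis=1) + rng.normal(scale=0.01, size=40)

# Fit a GP to the observed (configuration -> throughput) pairs.
gp = GaussianProcessRegressor(kernel=RBF(0.3) + WhiteKernel(1e-4),
                              normalize_y=True)
gp.fit(tried, throughput)

# Score a grid of candidate configurations and recommend the one with
# the best predicted throughput.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 25),
                            np.linspace(0, 1, 25)), -1).reshape(-1, 2)
mean, std = gp.predict(grid, return_std=True)
best = grid[np.argmax(mean)]
print(best)
```

The predictive standard deviation returned alongside the mean is what makes GPs attractive here: the tuner can favor uncertain regions early on and exploit the model once it has enough observations.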

Negative Points -

[1] OtterTune keeps a list of knobs that can be changed dynamically without a restart, and it suggests changing only those knobs when restarting the DBMS is costly or undesirable. However, as knobs may be highly correlated, this fails to take into account the consequences of tuning only a few knobs while keeping the others unchanged, which could decrease performance.

[2] The paper proposes a tuning solution that is meant to avoid heuristics, yet to group similar metrics together it uses k-means and relies on the heuristic that K = 40 will do a good job.

[3] Sometimes OtterTune produces extreme values for certain knobs while optimizing the target metrics, as it fails to consider the negative side effects of such recommendations.

- Saurabh Gupta


Comments

  1. For negative point 1, I believe it does try to learn knob dependencies. It is true that k-means requires K, but they point to a heuristic (not sure how good it is, though). I'm not sure what your evidence is for point #3? I appreciate your review!

    Replies
    1. Hello Prof Jon,

      Point 3 -
      It's mentioned in the paper for knobs like "log files written
      between checkpoints" and a memory knob. As per the paper, the DBA,
      the tuning script, and AWS set the log knob's value between 16 and 64, while OtterTune sets it to 540. This happens because OtterTune tries to improve I/O performance, which causes the recovery time of the DBMS after a crash to increase. Another example is memory provisioning, where OtterTune over-provisions memory to 18 GB while RDS and the DBA suggest a value between 7 and 10 GB. This happens because it tries to optimize the given metrics at any cost.
      Point 1 -
      Yes, you are right about learning knob dependencies. The paper mentions that polynomial Lasso regression can learn knob dependencies, but it doesn't say anything about dependencies between different types of knobs. A dependency could exist between a dynamic knob and a knob that requires a restart. As it's not practical to restart the DB for every tuning session (or there may be no root access), changing only the recommended dynamic knobs could degrade performance because of such correlations.
      Point 2 -
      The paper mentions that K = 40 is not guaranteed to find the optimal solution, and that they estimated the value of K using [1], which works for well-separated clusters. So I am not sure how they chose the value or what approximation they used for the clusters' characteristics.

      [1] https://statweb.stanford.edu/~gwalther/gap
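      For reference, here is a minimal sketch of the gap statistic from [1] on toy data (all data and parameters are my own illustration, not the paper's): it compares the log within-cluster dispersion on the real data against uniform reference data, so well-separated clusters produce a pronounced gap at the true K.

```python
import numpy as np
from sklearn.cluster import KMeans

def dispersion(X, k, seed=0):
    # Log within-cluster sum of squares for a k-means fit.
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    return np.log(km.inertia_)

def gap_statistic(X, k, n_refs=5, seed=0):
    # Gap(k) = mean reference dispersion - observed dispersion,
    # with references drawn uniformly from the data's bounding box.
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref = [dispersion(rng.uniform(lo, hi, size=X.shape), k, seed)
           for _ in range(n_refs)]
    return np.mean(ref) - dispersion(X, k, seed)

# Toy data with 3 well-separated clusters (the regime where [1] works best).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.05, size=(40, 2))
               for c in ([0, 0], [1, 1], [0, 1])])
gaps = {k: gap_statistic(X, k) for k in range(1, 6)}
best = max(gaps, key=gaps.get)
print(best)
```

      On data like this the gap jumps sharply at the true cluster count; on overlapping clusters (arguably the case for DBMS metrics) the curve is much flatter, which is the concern raised above.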


