The Case for Learned Index Structures

Authors:
Tim Kraska; Alex Beutel; Ed H. Chi; Jeffrey Dean; Neoklis Polyzotis

Name:
Kayala Sai Kumar.

Motivation:
With rapid growth of data, need for fast and efficient database access has become indispensable.This paper proposes a machine learning approach that can potentially replace popular indexing data structures like B-trees.

Positive Points:

(1) Previous learned indexes used a neural network as a hash function for indexing [1,2,3] while the proposed algorithm uses neural network as a continuous function for describing the pattern in the data (CDF) to index.

(2) The mixture of experts is proposed for learning substantially large databases since a neural network capacity to learn more data is limited by its parameters. This paper leverages this state-of-the-art technique to reduce the error by sub-setting the database and reducing the overall complexity in terms of computation which is an interesting approach.

(3) Minimizing the min error of the recursive model by building a B-tree (has a min err=0) on top of that can be seen as a virtual B-tree with very fast look up time and efficient memory storage.

(4) Size of the B-Tree index limits the data that can reside in cache memory for large database when compared to the recursive model that utilize less than 3MB which  can be stored in cache reducing cache faults and increasing cache efficiency (without considering the GPU latency).


Negative Points:

(1) Despite all the advantages discussed in the paper for the recursive model, it further increases hyper-parameters like number of stages and number of models in each stage. This increase in number of hyper-parameters makes the hyper-parameter tuning difficult since training this network is very expensive.

(2) The proposed method do not support any insertions, deletions or updates, since any change in database would require re-training of the model which is not desired considering the fact that training even a single neural network is still not possible within a few cycles with the available computing power.

Comments

  1. Good discussion of negative points. I also think complexity is an issue.

    ReplyDelete

Post a Comment

Popular posts from this blog

A Machine Learning Approach to Live Migration Modeling

StormDroid: A Streaminglized Machine Learning-Based System for Detecting Android Malware

CrystalBall: Statically Analyzing Runtime Behavior via Deep Sequence Learning