CrystalBall: Statically Analyzing Runtime Behavior via Deep Sequence Learning

Authors:

Stephen Zekany, Daniel Rings, Nathan Harada, Michael A. Laurenzano, Lingjia Tang, Jason Mars

Intro:

Understanding the runtime behavior of software is critical to many aspects of program development. The conventional approach is dynamic profiling, which requires running the program multiple times in environments that are representative enough of the actual deployment conditions. Dynamic profiling has other weaknesses as well: it cannot economically profile only a subset of functions or paths, and it requires re-profiling every time the code changes. This paper presents a static profiling method that harnesses the ability of recurrent neural networks (RNNs) to uncover latent information about hot paths in the code. Moreover, because training takes the compiler's intermediate representation (IR) as input, the resulting model should generalize better across source languages.

Methodology:

Each training example is a path, represented as a sequence of basic blocks, with each block encoded as a feature vector. All hot paths are included in the dataset, along with a sampled portion of the cold paths. The model is a two-layer LSTM network: the input layer feeds one basic block per timestep into the first LSTM layer, whose output is passed to the second LSTM layer, and a final softmax layer maps the result to a value between 0 and 1 representing the likelihood that the input path is hot.
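As a concrete illustration, here is a minimal sketch of such a two-layer LSTM path classifier in PyTorch. This is not the authors' code; the feature dimension, hidden size, and model name are placeholder assumptions.

```python
# Minimal sketch of a two-layer LSTM hot-path classifier (illustrative only;
# feature and hidden sizes are assumptions, not the paper's configuration).
import torch
import torch.nn as nn

class HotPathLSTM(nn.Module):
    def __init__(self, block_feat_dim=32, hidden_dim=128):
        super().__init__()
        # Two stacked LSTM layers; one basic-block feature vector per timestep.
        self.lstm = nn.LSTM(block_feat_dim, hidden_dim,
                            num_layers=2, batch_first=True)
        # Final layer maps the last hidden state to hot/cold scores.
        self.out = nn.Linear(hidden_dim, 2)

    def forward(self, paths):
        # paths: (batch, path_length, block_feat_dim)
        _, (h_n, _) = self.lstm(paths)
        logits = self.out(h_n[-1])                  # hidden state of the top LSTM layer
        return torch.softmax(logits, dim=-1)[:, 1]  # likelihood that the path is hot

# Example: score a batch of 4 paths, each 10 basic blocks long.
model = HotPathLSTM()
hotness = model(torch.randn(4, 10, 32))
```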

Critique:

1. AUROC is not a good metric.
The paper discusses several alternative metrics and argues that they perform poorly on highly skewed data. Unfortunately, the metric the authors choose is not a good option here either. The advantage of an ROC curve is that it shows an algorithm's performance across different thresholds, so two things matter: the area under the curve and the shape of the curve itself. AUROC alone is not enough to claim that one algorithm is better than another, especially when the scores are close (a difference of 0.02). For example, based on the graph below, we cannot tell which method is better without specifying the situation we are dealing with, even though curve A has the higher AUC. A small synthetic sketch after this critique illustrates the same point numerically.
2. The comparison to prior work is not convincing enough.
In the evaluation, the authors built a baseline based on Buse and Weimer's prior work on static path classification. They mention that the original model extracts features from Java source code, and since this paper uses C++ benchmarks, their re-implementation had to drop several features that are specific to Java. I would argue that this might weaken the power of Buse and Weimer's model. Moreover, a claim of language independence has been made, so in theory the CrystalBall model should also apply to Java IR fairly easily. Testing on Java benchmarks would both demonstrate language independence and avoid weakening the baseline model, making the results more convincing, but the authors chose not to do that.
3. Lack of evidence for language independence.
Language independence seems to be a major promise of the paper; however, there is little discussion of this property throughout. The model is tested on benchmarks from three languages, but the majority of the code is written in C and C++, which are very similar to each other. The authors need to provide more evidence to support the independence claim.
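To make the first point more concrete, here is a small toy example on synthetic, skewed data (my own sketch using scikit-learn, not data or code from the paper). It shows that two classifiers with nearby AUROC values can behave very differently at the low-false-positive operating points that matter when cold paths vastly outnumber hot ones, so the curve with the higher AUC is not automatically the better choice.

```python
# Toy example on synthetic, skewed data (1% positives): similar AUROC values
# can hide very different behavior at low false-positive rates.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y = np.concatenate([np.ones(100), np.zeros(9900)])   # 1% "hot" paths

# Classifier A: tight, moderately shifted scores for the positives.
scores_a = np.concatenate([rng.normal(1.9, 0.5, 100), rng.normal(0.0, 1.0, 9900)])
# Classifier B: noisier but more strongly shifted positives.
scores_b = np.concatenate([rng.normal(3.5, 2.0, 100), rng.normal(0.0, 1.0, 9900)])

for name, s in (("A", scores_a), ("B", scores_b)):
    fpr, tpr, _ = roc_curve(y, s)
    # Best recall of hot paths while keeping false positives at or below 1%.
    tpr_low_fpr = tpr[fpr <= 0.01].max()
    print(f"{name}: AUROC = {roc_auc_score(y, s):.3f}, "
          f"TPR at FPR <= 1% = {tpr_low_fpr:.2f}")

# The two AUROC values come out within a few hundredths of each other, yet B
# recovers far more hot paths in the low-false-positive regime; which curve is
# "better" depends on the operating point the application cares about.
```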

--Yifan Hu

Comments

  1. The method is language-independent, but you wish to see that the results were also language-independent. Fair point. I'm not that familiar with the AUROC metric; perhaps we can discuss its pros and cons in class.

    Replies
    1. Sure, I added a graph to my blog; hopefully it will better showcase my point.


