Mitigating the Compiler Optimization Phase-Ordering Problem using Machine Learning
TL;DR: Modern compilers offer a large number of optimizations to choose from, and the order in which they are applied greatly affects performance, because each optimization interacts with the code and with the other optimizations. In this paper the authors model phase ordering as a Markov process and use ANNs to predict and apply the best optimization at each point in the optimization sequence.
Brief introduction
As mentioned above, modern compilers offer many optimization options.
I believe this is a rather sophisticated modeling technique that may not have been necessary given the current state of DL (especially time-series and state-space modelling). The interplay between the optimizations already applied and the optimization to apply next is modeled as a Markov process. The scenario gets even harder when the application runs alongside compilation, as in the JIT compilation paradigm. In a gist, the ANN predicts the next optimization to apply based on the current optimized state of the method; this stateful behavior fits the Markov-process framing naturally. A minimal sketch of this loop follows below.
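Below is a minimal sketch (with made-up feature extraction, network, and optimization names, not the authors' actual implementation) of what such a state-dependent phase-selection loop looks like: the current state of the method is summarized as features, the ANN scores each candidate optimization, and the loop halts when the network picks the STOP action.

```python
# Hypothetical sketch of the "predict the next phase from the current method
# state" loop; the feature extraction, network, and optimization names are
# placeholders, not the paper's actual implementation.

import numpy as np

OPTIMIZATIONS = ["copy_propagation", "constant_folding", "inlining",
                 "loop_unrolling", "dead_code_elim", "STOP"]

def extract_features(method_ir):
    """Summarize the current (partially optimized) IR, assumed here to be a
    textual form, as a fixed-length feature vector."""
    return np.array([method_ir.count(tok) for tok in
                     ("load", "store", "branch", "call")], dtype=float)

class TinyANN:
    """A small feed-forward net: features -> one score per optimization."""
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.w2 = rng.normal(0, 0.1, (n_hidden, n_out))

    def scores(self, x):
        h = np.tanh(x @ self.w1)
        return h @ self.w2

def optimize_method(method_ir, net, apply_opt, max_steps=20):
    """Markov-style loop: the next phase depends only on the current state."""
    for _ in range(max_steps):
        x = extract_features(method_ir)
        choice = OPTIMIZATIONS[int(np.argmax(net.scores(x)))]
        if choice == "STOP":          # the net itself decides when to halt
            break
        method_ir = apply_opt(method_ir, choice)
    return method_ir
```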
What already exists, and what's the need for this project?
- Most compilers apply optimizations in a fixed order that is expected to work best for most scenarios. This order is decided by design experts while the compiler is being developed.
- The proposed technique is compared against a genetic algorithm (GA), which makes real-time decisions about which optimizations to apply to a dynamically compiled program. GAs seem to perform well when a single optimization yields the majority of the performance improvement, which boils down to the fact that the interplay between optimizations is not taken into account. The authors show that the GA may be suboptimal because (a rough sketch of such a search follows this list):
- The search space is large and there are no hints to prune it; in other words, there is no knowledge transfer between methods.
- Finding an order of optimizations specific to each piece of code requires a separate exploration, and profiling such runs with fine-grained timers can add noise to the system.
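For contrast, here is a rough sketch of the kind of per-method genetic search being compared against; the population size, operators, and fitness evaluation are illustrative guesses rather than the paper's exact GA, but they show why every candidate sequence needs its own compile-and-time run and why the search space blows up.

```python
# Illustrative GA over fixed-length optimization sequences; parameters and
# operators are guesses, not the GA used in the paper.

import random

OPTS = ["copy_prop", "const_fold", "inline", "unroll", "dce", "licm"]
SEQ_LEN, POP, GENERATIONS = 15, 20, 50   # already 6**15 possible sequences

def evaluate(seq, compile_and_time):
    """Fitness = measured running time after applying `seq`; every candidate
    needs its own compile + timed run, which is where per-method exploration
    cost and timer noise come from."""
    return compile_and_time(seq)

def ga_search(compile_and_time):
    pop = [[random.choice(OPTS) for _ in range(SEQ_LEN)] for _ in range(POP)]
    for _ in range(GENERATIONS):
        pop.sort(key=lambda s: evaluate(s, compile_and_time))  # lower time = better
        survivors = pop[:POP // 2]
        children = []
        while len(survivors) + len(children) < POP:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(SEQ_LEN)
            child = a[:cut] + b[cut:]                 # crossover
            if random.random() < 0.1:                 # mutation
                child[random.randrange(SEQ_LEN)] = random.choice(OPTS)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda s: evaluate(s, compile_and_time))
```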
Highlights
- A significant number of variables and signals are considered in the modelling. The ANN selection procedure is exhaustive, so that a near-optimal ANN is selected to predict the optimization sequence; this is one reason why offline training fits the scenario well (see the selection-loop sketch after this list).
- The authors have made sure that the ANN generations are constructed so that the footprint of each generated model is small.
- For many of the benchmarks tried, the learned optimization sequences were short. This is an important metric: it shows that the proposed technique strikes a balance between performance and the time it takes for the optimization procedure to settle.
- In the paper the authors dissect the performance of each benchmark and discuss why one optimization was chosen by the neural network over another.
- The GA method optimizes well for applications where 60% of the running time comes from fewer than 3 methods; in other words, it captures the low-hanging fruit. The proposed method really shines on applications where many methods contribute comparable portions of the running time, which is exactly where the GA fails to deliver a speedup.
- Performance-wise, the per-benchmark GA works better than the proposed technique, but the authors note that it takes a large amount of time, and that time is directly tied to the benchmark being run.
- The authors conjecture that this is a very generic approach that can be extended to static compilation, not just JIT.
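As a rough illustration of the offline ANN-selection procedure mentioned above (60 networks per generation, keep the best 10, refill by mutation), here is a minimal generate-evaluate-select-mutate loop; the network encoding, fitness function, and mutation operator are placeholders, and the paper's actual neuroevolution procedure differs in detail.

```python
# Minimal sketch of a generational ANN-selection loop; the fitness, mutation,
# and network-construction callables are stand-ins supplied by the caller.

import copy
import random

def evolve_networks(make_random_net, fitness, mutate,
                    generations=100, pop_size=60, keep=10):
    """Return the best network found by generate -> evaluate -> select -> mutate."""
    population = [make_random_net() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)   # higher fitness = better
        elites = population[:keep]                   # best 10 survive as-is
        offspring = []
        while len(elites) + len(offspring) < pop_size:
            parent = random.choice(elites)
            offspring.append(mutate(copy.deepcopy(parent)))
        population = elites + offspring
    return max(population, key=fitness)
```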
Critique
- The authors compare their method with the GA for almost everything, normalized over the O3 optimization level. The GA is itself a beefy algorithm, and I'm not sure this comparison is sufficient to conclude the effectiveness of the approach; it makes me wonder whether reward-based training would make sense here. That said, I'm not a domain expert in dynamic compilation, so my analysis may be premature.
- They could have factored system errors/noise into the modelling, for instance by accounting for timer errors.
- I feel a policy-agent (RL) based formulation, coupled with LSTMs to account for the interplay between the optimizations applied in sequence, could yield better results, and I believe it could reduce training time drastically (a hypothetical sketch follows this list).
- The discussion of the stopping condition for optimization is a little vague. The paper states that one of the actions available at any point in the optimization timeline is a stopping action; if it is chosen, no further optimizations are applied. A discussion of the distribution of the optimizations that lead to a halt would have helped.
- Again, the authors say that they chose a set of 60 ANNs per generation, keep the best 10, and continue the mutation process (artificial generation of ANNs) with certain fixed probabilities. However, it is not clear what led the authors to use this paradigm, what the basis for the chosen probabilities is, or what their limitations are.
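To make the RL suggestion above concrete, here is a hypothetical REINFORCE-style sketch (not from the paper) in which an LSTM policy consumes the history of method states and scores the next optimization; PyTorch and all the names here are assumptions.

```python
# Hypothetical RL alternative: an LSTM policy over the history of method-state
# features, trained with REINFORCE on the measured speedup of the whole
# sequence. This is the reviewer's suggestion, not the paper's method.

import torch
import torch.nn as nn
from torch.distributions import Categorical

class PhasePolicy(nn.Module):
    def __init__(self, n_features, n_opts, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_opts)   # one logit per optimization (incl. STOP)

    def forward(self, feature_seq):
        # feature_seq: (1, steps_so_far, n_features) -- the history of method states
        out, _ = self.lstm(feature_seq)
        return self.head(out[:, -1, :])         # logits for the next phase

def choose_next_phase(policy, history, log_probs):
    """Sample the next optimization from the policy and record its log-prob."""
    dist = Categorical(logits=policy(history))
    action = dist.sample()
    log_probs.append(dist.log_prob(action))
    return int(action.item())

def reinforce_update(policy, optimizer, log_probs, reward):
    """One REINFORCE step: reward = measured speedup of the applied sequence."""
    loss = -reward * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```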
--Shashank Hegde
Very thorough discussion, thank you. Comparison to RL or LSTM/RNN could be a natural next step. I agree that while the selection of the NN was interesting, there was not much discussion about it.