A Machine Learning Approach to Mapping Streaming Workloads to Dynamic Multicore Processors
Authors:
Paul-Jules Micolet, Aaron Smith, Christophe Dubach
Motivation:
To determine how to partition and group the cores of a Dynamic Multicore Processor (DMP) among application threads, this paper first studies the effect of thread partitioning and hardware resource allocation, and then introduces a machine-learning methodology. The experiments are run on StreamIt applications because of StreamIt's parallelism features.
Main points:
1. The "best" performance that the paper refers to is a statistical result.
2. The paper analyses the effect of thread partitioning, of core composition, and of combining both optimizations. The results show that thread partitioning and core composition are almost independent, and that combining the two gives better performance across the 15 benchmarks tested.
3. The paper also studies the effect of loop transformation. The experimental results show that loop unrolling can improve performance, but the gain is limited by the trade-off between ILP and TLP.
4. The paper uses a KNN model to determine the number of threads for an application and a linear regression model to determine the number of cores. The features are extracted statically from the StreamIt code. On average, the machine-learning approach performs 16% below the best.
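To make point 4 concrete, here is a minimal sketch of the two-model idea: a KNN vote for the thread count and a one-feature least-squares fit for the core count. The feature choices, the training values, and the function names are all my own assumptions for illustration; they are not the paper's features, data, or implementation.

```python
from collections import Counter
import math

# Hypothetical static features per program: (number of filters, number of pipelines).
# All values below are made up for illustration; they are not from the paper.
TRAIN_FEATURES = [(4, 1), (6, 2), (20, 5), (24, 6), (40, 9), (44, 10)]
TRAIN_THREADS = [2, 2, 6, 6, 12, 12]


def knn_predict_threads(features, k=3):
    """Predict a thread count by majority vote among the k nearest training programs."""
    by_distance = sorted(
        range(len(TRAIN_FEATURES)),
        key=lambda i: math.dist(features, TRAIN_FEATURES[i]),
    )
    votes = Counter(TRAIN_THREADS[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-thread work estimate and the core count assumed best for it.
WORK = [1.0, 2.0, 3.0, 4.0]
CORES = [1.0, 2.0, 3.0, 4.0]
SLOPE, INTERCEPT = fit_line(WORK, CORES)


def predict_cores(work, max_cores=16):
    """Round the regression output and clamp it to the cores that exist."""
    return max(1, min(max_cores, round(SLOPE * work + INTERCEPT)))
```

The clamp in `predict_cores` is there because a regression can return fractional or out-of-range values, while a core composition has to be a whole number of available cores.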
Trade-offs:
1. The features are extracted with StreamIt's built-in functions, which may miss other impactful features.
2. StreamIt programs contain few conditional statements, which is a limitation.
3. The KNN method requires finding the best K. The results show that its accuracy is not as good as that of the combination. Would another method be more effective than KNN? Or is determining the number of threads less important than determining the number of cores?
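On the question of finding the best K: one common way to pick it is leave-one-out cross-validation, which suits a small benchmark set. The sketch below is my own illustration with made-up samples and hypothetical helper names; I am not claiming this is how the paper selects K.

```python
from collections import Counter
import math

# Made-up (feature vector, best thread count) pairs; not data from the paper.
SAMPLES = [
    ((4, 1), 2), ((6, 2), 2), ((5, 1), 2),
    ((20, 5), 6), ((24, 6), 6), ((22, 5), 6),
    ((40, 9), 12), ((44, 10), 12), ((42, 9), 12),
]


def knn_vote(train, query, k):
    """Majority vote over the k training samples closest to the query."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(samples, k):
    """Leave-one-out accuracy: predict each sample from all the others."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += knn_vote(rest, features, k) == label
    return hits / len(samples)


def best_k(samples, candidates):
    """Return the candidate k with the highest accuracy; the smallest k wins ties."""
    return max(candidates, key=lambda k: (loo_accuracy(samples, k), -k))
```

With only a handful of benchmarks, a large K pulls in neighbours from unrelated programs, so comparing `loo_accuracy` across a few candidate values is a cheap sanity check before trusting any single K.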
Besides, I don’t understand Table 2:
The paper says that each column of this table represents a thread and that the number in each cell is the number of cores assigned to that thread. However, according to the text, the best-performing configuration has 13 threads, while the table shows at most 10 threads. Likewise, the machine-learning approach has 8 threads according to the text but only 7 in the table.
Simin Wang
It is true that there was some notable inaccuracy though the final result looked good. For point #2, conditional dataflow would be interesting though static pipelines are fairly common. I agree that Table 2 is very confusing!