A Machine Learning Approach to Mapping Streaming Workloads to Dynamic Multicore Processors
Authors: Paul-Jules Micolet, Aaron Smith, Christophe Dubach
Motivation:
No accurate, automated solution exists for optimizing the performance of
streaming applications at both the hardware and software levels. This paper
presents a machine learning technique that aims for near-optimal performance
by tuning software-level threads and hardware-level cores. It analyzes the
effect of the number of threads and the number of cores on an application's
performance, and uses static code features of the application to determine an
appropriate number of threads and core composition for a Dynamic Multicore
Processor.
Positive Points:
[1] The paper analyzes the impact of thread partitioning on the performance
of various StreamIt applications for individual cores and composed cores. It
concludes that an application's performance follows the same trend regardless
of core composition. This reduces the problem of finding the set of threads
and the core composition to two separate problems that can be modeled
independently.
[2] It uses a k-nearest neighbors (KNN) model to predict the optimal number of
threads and linear regression to predict the optimal core composition for a
thread. It extracts over 50 features from StreamIt applications and uses only
those that are highly correlated with the optimal number of threads, which
keeps the model simple. It also synthesizes benchmarks to obtain enough data
points for training.
[3] It unrolls loops in the program code, which exposes Thread Level
Parallelism (TLP) and Instruction Level Parallelism (ILP), and it estimates
the optimal partitioning of the program by statically analyzing the code.
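To make the two-model split in point [2] concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the majority-vote rule, the tie-breaking, and the single-feature regression are all assumptions chosen for illustration, and any feature vectors passed in are hypothetical.

```python
import math
from collections import Counter

def knn_predict_threads(train, query, k=3):
    """Predict a thread count for `query` by majority vote of its k nearest
    training programs. `train` is a list of (feature_vector, best_threads)
    pairs; the features stand in for static code features (hypothetical)."""
    # Rank training programs by Euclidean distance to the query program.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # Majority vote; ties go to the smaller thread count (an assumption).
    label, _ = max(votes.items(), key=lambda kv: (kv[1], -kv[0]))
    return label

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept).
    A stand-in for the regression that maps a thread's features to a core count."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Because the review notes that the thread count and the core composition can be modeled separately, the two functions are independent: one would be called per program, the other per thread.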
Negative Points:
[1] The paper selects features that are highly correlated with the optimal
number of threads, but it considers only positive correlation and ignores
negative correlation for the KNN model. It also assumes that the features have
similar ranges, and therefore uses Euclidean distance for KNN without
normalizing the features.
[2] The model predicts the optimal number of threads with only 33% accuracy.
This improves to 67% when predictions within two threads of the optimum are
counted, but such predictions run 10-20% below optimal performance.
[3] As future work, these methods could be extended to handle multiple
applications running concurrently, optimizing the overall number of threads
and the core composition using preemptive scheduling.
[4] It uses the default partitioning scheme of the StreamIt compiler. This partitioning could be improved using [1], which could lead to better overall performance.
[5] Will the method work if the rates of the pipes in the programs are
dynamic? Would runtime analysis of the program be needed?
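The concerns in negative point [1] could be addressed roughly as follows. This is a sketch of one possible remedy, not something taken from the paper: z-score scaling, the absolute-value correlation filter, the 0.5 threshold, and all function names are assumptions.

```python
import math
import statistics

def zscore_columns(rows):
    """Scale each feature column to zero mean and unit variance so that no
    single wide-ranged feature dominates the Euclidean distance in KNN."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Guard against constant columns, whose standard deviation is zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def select_features(rows, targets, threshold=0.5):
    """Keep column indices whose |correlation| with the target meets the
    threshold, so strongly negative features are retained as well."""
    cols = list(zip(*rows))
    return [i for i, c in enumerate(cols) if abs(pearson(c, targets)) >= threshold]
```

Filtering on the absolute value keeps a feature that falls as the optimal thread count rises, since it is just as informative to a distance-based model as one that rises with it.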
- Saurabh Gupta
Good analysis. You're correct that the paper doesn't consider dynamic stream rates; in fact, very little analysis of the applications themselves is presented. One question is what types of streaming applications this methodology is likely to be effective for. Also, it seemed that feature extraction was a manual process. Would the same features be predictive for other applications? How general is the StreamIt application suite?