StormDroid: A Streaminglized Machine Learning-Based System for Detecting Android Malware
Introduction:
The paper approaches the problem of malware detection using a machine learning approach. The authors propose considering new types of features alongside two traditionally used sets of features for malware detection. Beyond this proposal, the paper makes other contributions as well: the authors provide a new framework that can process information from large data sets as streams and efficiently perform malware detection.
Motivation:
Android is an open platform for development. It has certainly made it easy for everyone to develop and publish apps; the number of apps published on the Google Play Store is in the millions. However, it has also made it easier for hackers to reach a larger population with malicious attacks. Additionally, with the surge in the number of attacks on devices, more data is available for analysis, which machine learning models can use to identify malicious apps. This is the motivation for a machine learning based approach to malware detection. The authors claim that existing static and dynamic analysis techniques may not be computationally efficient, which motivated them to develop a distributed stream-processing application that is more efficient than existing solutions.
Research Questions Addressed:
The authors provide machine learning based approaches for static and dynamic analysis to identify malicious Android apps. Traditional techniques are claimed to be inefficient in terms of resource consumption on the Android OS. The new framework is proposed to address this inefficiency.
Procedure:
The StormDroid framework is a stream-processing application for Android malware detection. Structuring it as a streaming application makes it possible to divide the process into three phases: Preamble, Feature Extraction, and Classification. In the Preamble phase, the resource files are extracted for feature extraction. In the Feature Extraction phase, features are extracted for a particular app based on the combined set of contributed features. In the last phase, classifiers are trained using large sets of labeled Android applications, and applications are then classified as benign or malicious. Dividing the process into three phases makes it possible to process different applications in different phases simultaneously, as sketched below.
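To make the phase division concrete, here is a minimal single-process sketch in Python. StormDroid itself runs these phases as a distributed stream-processing application, so the generator chain below only illustrates how apps flow through the phases; `extract_resources` and `build_feature_vector` are hypothetical stubs, not the authors' code.

```python
# A minimal, single-process sketch of the three-phase division. StormDroid
# itself runs these phases as a distributed stream-processing application,
# so this generator chain only illustrates how apps flow through the
# phases; the stub helpers are hypothetical, not the authors' code.
from typing import Iterable, Iterator, List, Tuple

def extract_resources(apk_path: str) -> dict:
    """Hypothetical stub: unpack the APK (manifest, Smali files, logs)."""
    return {"path": apk_path}

def build_feature_vector(resources: dict) -> List[int]:
    """Hypothetical stub: derive the combined binary feature vector."""
    return [0] * 162  # 59 permissions + 90 API calls + 1 sequence + 12 dynamic

def preamble(apks: Iterable[str]) -> Iterator[dict]:
    """Phase 1 (Preamble): extract resource files from each APK."""
    for path in apks:
        yield {"apk": path, "resources": extract_resources(path)}

def feature_extraction(apps: Iterable[dict]) -> Iterator[dict]:
    """Phase 2: compute features from the extracted resources."""
    for app in apps:
        app["features"] = build_feature_vector(app["resources"])
        yield app

def classification(apps: Iterable[dict], classify) -> Iterator[Tuple[str, int]]:
    """Phase 3: label each app benign (0) or malicious (1)."""
    for app in apps:
        yield app["apk"], classify(app["features"])

# Because each phase lazily consumes the previous one's output, different
# apps can be in different phases at the same time.
results = classification(
    feature_extraction(preamble(["app1.apk", "app2.apk"])),
    classify=lambda features: 0,  # dummy classifier for the sketch
)
print(list(results))
```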
Feature Extraction:
A significant portion of the paper explains the features selected for the detection process. The selected features are of two kinds. The first is the set of traditionally used features for malware detection: Permissions and Sensitive API calls. The authors select a set of permissions requested by Android applications and a set of sensitive API calls as predictors. As per my understanding, all these features are binary: 1 if the app requests the permission or makes the call, and 0 otherwise. The authors then define two new types of features: Sequence and Dynamic Behavior. These also end up as binary features, although Sequence starts out numeric: a score is computed based on the number of times a particular sensitive API call is made by malicious apps versus benign apps, and if the score crosses a threshold of 0.4 the feature takes the value 1, otherwise 0 (see the sketch below). Dynamic Behavior feature values are extracted by observing the application at run time: the application is run in DroidBox and the saved log files are then analyzed. Permissions, Sensitive API calls, and Sequence are static analysis features, whereas Dynamic Behavior features are part of the dynamic analysis.
Number of features selected per category:
Permission: 59
Sensitive API calls: 90
Sequence: 1
Dynamic Behavior: 12
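For illustration, here is a minimal sketch of the binarization step for the Sequence feature. The numeric score itself is computed from how often sensitive API calls appear in malicious versus benign apps, as defined in the paper; `sequence_score` is a hypothetical input here, not the authors' formula.

```python
# Minimal sketch of the binarization step for the Sequence feature: a
# numeric score, computed from how often a sensitive API call appears in
# malicious vs. benign apps, is thresholded at 0.4. `sequence_score` is a
# hypothetical input here; the paper defines the actual score computation.
SEQUENCE_THRESHOLD = 0.4

def binarize_sequence(sequence_score: float) -> int:
    """Map the numeric sequence score to the binary feature value."""
    return 1 if sequence_score > SEQUENCE_THRESHOLD else 0

assert binarize_sequence(0.55) == 1  # crosses the threshold -> 1
assert binarize_sequence(0.25) == 0  # below the threshold -> 0
```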
Experiment:
Once these features are extracted, the vector representing an application is fed in for training and classification. The authors corroborate their argument for the two new feature types by comparing the accuracy of a model developed using both the traditional features (permissions and API calls) and the new features (sequence and dynamic behaviors) against a model developed using only the traditional features. They compare the accuracy across several machine learning techniques; a sketch of this comparison follows.
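This is a minimal sketch of the comparison, assuming binary feature matrices and dummy labels; the classifiers below are common choices, not necessarily the exact set the authors evaluated.

```python
# A minimal sketch of the comparison, assuming binary feature matrices and
# dummy labels. X_trad holds only the traditional features (59 permissions
# + 90 sensitive API calls); X_comb appends the new ones (1 sequence + 12
# dynamic behaviors). The classifiers below are common choices, not
# necessarily the exact set the authors evaluated.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200
X_trad = rng.integers(0, 2, size=(n, 149))                      # traditional features
X_comb = np.hstack([X_trad, rng.integers(0, 2, size=(n, 13))])  # + new features
y = rng.integers(0, 2, size=n)                                  # dummy labels

for name, clf in {
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": BernoulliNB(),
    "KNN": KNeighborsClassifier(),
}.items():
    acc_trad = cross_val_score(clf, X_trad, y, cv=5).mean()
    acc_comb = cross_val_score(clf, X_comb, y, cv=5).mean()
    print(f"{name}: traditional={acc_trad:.3f}, combined={acc_comb:.3f}")
```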
Evaluation:
Positive Points:
- Traditional features only indicate the permissions requested and the sensitive API calls made by an app. However, it is also important to capture how often sensitive API calls are made, and to capture the dynamic behavior of the app. This is addressed by the two new feature types. The new features are binary as well, which adds no complexity when incorporating them into existing detection techniques.
- The authors present a thorough feature selection process. They list all the features they considered and the criteria for selecting features that correlate strongly with malicious apps.
- The feature selection process also considers whether features correlate poorly with benign apps. This helps the models learn the distinction between benign and malicious apps.
- Introduction of the new streaming framework: the streaming framework enables concurrency when processing large data sets of app information. This makes the system directly applicable as a production-level system and offers a very practical solution for performing malware detection at large scale.
- For evaluation, they use multiple machine learning techniques. The simplicity of the extracted features makes it easy to try different techniques, which highlights the flexibility of their system.
Negative Points:
- They state that StormDroid is an important contribution of the paper, yet they explain its architecture in only a short paragraph. It would have been helpful to read more about the model implementation; in particular, they should have mentioned whether training was stochastic or performed over the entire batch, since this is generally a central part of any machine learning model.
- In the Experiment Results section 4.3(ii), they mention that, to overcome overfitting, they randomly sample 1,000 malicious applications and scan them using StormDroid and other malware detection tools. I am not sure how selecting objects from a single class (in this case malicious apps) and evaluating the accuracy helps in reducing overfitting; high accuracy in this setup could itself be a result of overfitting.
- The feature extraction process depends heavily on static analysis performed after decompiling Java applications. In my opinion, this procedure of using Smali files is limited to Java code: if parts of an application are written in C++ (native code), it would be very difficult, likely impossible, to decompile them to Smali files. This poses a serious limitation.
- They evaluate the run-time performance of StormDroid against a single-threaded application running on a single node. However, a conventional malware detection technique might perform equally well, or even better, if deployed on a distributed framework.
Conclusion:
Overall, I found the paper readable rather than dense. The authors mostly focus on the importance of the new feature set and how it increases the accuracy of the malware detection process, and they support this argument by evaluating it across different machine learning techniques.
- Hrushikesh Nimkar
Very nice analysis. I agree that it would have been nice to see the system details about the framework. The overfitting issue is a problem particularly since the features are hand-engineered.