Your app has a malicious app id: Malicious Android package name detection using recurrent neural networks.

William Lee, Senior Researcher, Sophos

Joshua Saxe, Chief Data Scientist, Sophos

Attackers generate malicious Android applications with systems that automatically rewrite and obfuscate malware to avoid detection. Such malware often has randomly generated application IDs or package names and is signed with randomly generated certificates. Typical machine learning systems built on manual feature engineering, and signature-based detection systems that rely on these random strings, are not sufficiently robust against this randomization. Deep learning promises to change this by automating feature extraction and learning patterns directly from data. Recurrent neural networks, in particular, are a class of artificial neural network designed to recognize patterns in sequential data such as text or speech.

In this paper, we propose a malware classification model based on recurrent neural networks. The model learns generalizations about malicious string patterns from an app’s package name and certificate owner information. It learns to extract features and classify malware using embedding and LSTM (Long Short-Term Memory) layers. We demonstrate that the model can detect new, unseen malware families efficiently without extensive retraining, and that the trained model can be deployed on an Android device. We also describe related work, the model’s architecture, evaluation results, and future work.
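
The abstract does not say how package names and certificate owner strings are turned into input for the embedding layer. As a minimal sketch, assuming character-level encoding (a common choice for short strings, not confirmed by the abstract), each string could be mapped to a fixed-length sequence of integer ids. The alphabet, the padding and unknown ids, the maximum length, and the `encode` helper below are all hypothetical.

```python
import string

# Assumed vocabulary: lowercase letters, digits, '.' and '_'.
# Id 0 is reserved for padding and id 1 for any other character.
ALPHABET = string.ascii_lowercase + string.digits + "._"
PAD, UNK = 0, 1
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 64  # assumed fixed sequence length


def encode(text, max_len=MAX_LEN):
    """Map a string to a fixed-length list of integer character ids."""
    # Lowercase, truncate to max_len, and look up each character.
    ids = [CHAR_TO_ID.get(ch, UNK) for ch in text.lower()[:max_len]]
    # Right-pad with PAD so every sequence has the same length.
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    print(encode("com.example.app", max_len=20))
```

Sequences of this kind are what an embedding layer followed by an LSTM layer would typically consume. The layer sizes and training setup depend on the paper's architecture, which the abstract does not give.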

William Lee

William Lee is a senior researcher at Sophos. Prior to joining Sophos, he developed malware classification systems at Symantec and mobile platforms at Samsung Electronics. He has presented on mobile malware topics at Virus Bulletin and CARO conferences. His research interests include malware classification and clustering models based on deep neural networks.

Joshua Saxe

Joshua Saxe is Chief Data Scientist at Sophos, where he and his team focus on developing breakthrough security data science technologies. Highlights of his work include leading research to develop neural networks for detecting malicious PE, URL, and HTML content, developing a crowd-data-driven malware reverse engineering system, and leading research to discover malware genealogical relationships at scale. He is currently working on a book for No Starch Press on security data science.