APHMM: Advanced Profile Hidden Markov Model
The rapid growth of malicious software (malware) production in recent decades and the increasing number of threats posed by malware to network environments, such as the Internet and intelligent environments, emphasize the need for more research on the security of computer networks in information security and digital forensics. The method presented in this study identifies “species” of malware families, which are more sophisticated, obfuscated, and structurally diverse. We propose a hybrid technique combining aspects of signature detection with machine learning based methods to classify malware families. The method is carried out by utilizing Profile Hidden Markov Models (PHMMs) on the behavioral characteristics of malware species. This project explains the process of modeling and training an advanced PHMM using sequences obtained from the extraction of each malware family’s paramount features, and the canonical sequences created in the process of Multiple Sequence Alignment (MSA) production. Due to the fact that not all parts of a file are malicious, the goal is to distinguish the malicious portions from the benign ones and place more emphasis on them in order to increase the likelihood of malware detection by having the least impact from the benign portions. Based on “consensus sequences”, the experimental results show that our proposed approach outperforms other HMM-based techniques even when limited training data is available.