Deep Defense Against Mal-Doc: Utilizing Transformer and SeqGAN for Detecting and Classifying Document Type Malware
DOI: 10.3390/app15062978
Publication Date: 2025-03-10T12:46:41Z
AUTHORS (6)
ABSTRACT
Non-executable malware is on the rise and poses a major threat to users, including large public institutions and corporations. Although malware detection has been studied extensively, document-type malware has received noticeably less attention than executable files. The proposed model addresses this gap by detecting document-type malware and classifying its families from the script code embedded in documents, including the markup tags used to write the document and the scripting languages used to execute malicious functions. This script code offers insight into how the malware was constructed and how it operates on the victim's system. Our approach also leverages language models. First, we develop MalCode2Vec to learn associations between pieces of source code and represent them as numeric vectors. We then design a Transformer-based model for document malware detection and family classification. Detection is conducted at both the stream level and the file level. To address class imbalance across malware families, we use a generative adversarial network to generate additional malware samples. Our experiments focus on the Hangul (Korean) word processor, a tool that North Korea has notably used to target the South Korean government.
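The abstract states that detection runs at both the stream and file levels but does not say how stream results are combined. The sketch below illustrates one possible aggregation rule, assumed here and not taken from the paper: a file is flagged when any of its streams scores at or above a threshold. The names `StreamScore` and `file_level_verdict`, the 0.5 threshold, the stream names, and the scores are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StreamScore:
    name: str           # illustrative stream name inside the document container
    p_malicious: float  # stream-level score from some detector (made-up values below)


def file_level_verdict(streams: List[StreamScore], threshold: float = 0.5) -> Dict[str, object]:
    """Assumed rule (not from the paper): flag the file if any stream is flagged."""
    flagged = [s.name for s in streams if s.p_malicious >= threshold]
    return {"malicious": bool(flagged), "flagged_streams": flagged}


# Made-up scores for three streams of one hypothetical document.
example = [
    StreamScore("BodyText/Section0", 0.03),
    StreamScore("Scripts/DefaultJScript", 0.91),
    StreamScore("BinData/BIN0001.ps", 0.12),
]
verdict = file_level_verdict(example)
print(verdict)
```

An any-stream rule favors recall, since a single malicious stream is enough to compromise a victim; a stricter rule (for example, averaging scores) would trade recall for fewer false alarms.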