Detecting PDF Javascript Malware Using Clone Detection

Thumbnail Image
Karademir, Saruhan
NiCad , Clone Detection , PDF , Security , Acrobat , Malware
One common vector of malware is JavaScript in Adobe Acrobat (PDF) files. In this thesis, we investigate using near-miss clone detectors to find this malware. We start by collecting a set of PDF files containing JavaScript malware and a set with clean JavaScript from the VirusTotal repository. We use the NiCad clone detector to find the classes of clones in a small subset of the malicious PDF files. We evaluate how clone classes can be used to find similar malicious files in the rest of the malicious collection while avoiding files in the benign collection. Our results show that a 10% subset training set produced 75% detection of previously known malware with 0% false positives. We also used the NiCad as a pattern matcher for reflexive calls common in JavaScript malware. Our results show a 57% detection of malicious collection with no false positives. When the two experiments’ results are combined, the total coverage of malware rises to 85% and maintains 100% precision. The results are heavily affected by the third-party PDF to JavaScript extractor used. When only successfully extracted PDFs are considered, recall increases to 99% and precision remains at 100%.
External DOI