A Comparative Study of Deep Learning and Machine Learning Models for Malware Detection
Location
Hager-Lubbers Exhibition Hall
Description
PURPOSE: Artificial intelligence has become an area of interest for many professionals, particularly for how it can help cybersecurity practitioners defuse cybersecurity threats. Machine learning and deep learning are subsets of artificial intelligence that have been applied to this task. This project examines both approaches and provides a comparative analysis of them.

METHODS AND MATERIALS: In this study, we propose data mining pipelines that demonstrate the efficacy of binary classifiers in detecting malware, using both traditional machine learning and deep learning.

ANALYSES: A one-way ANOVA test at a 0.05 significance level gave a p-value close to zero, indicating significant differences among the model accuracies. We then compared the models' performance using additional metrics: F1 score, precision, and recall.

RESULTS: In our traditional machine learning experiments, all algorithms achieved detection accuracies of nearly 99%. The best-performing model was the Gradient Boosting classifier, with an accuracy of 99.73%. In the cross-validation experiments, XGBoost was the best-performing algorithm, with an accuracy of 99.05%. In our deep learning experiments, the CNN ensemble performed best, with an accuracy of 99.13%. Cross-validation marginally improved the CNN accuracy to 99.15%.

CONCLUSIONS: These results demonstrate that the proposed binary classification scheme is effective for malware detection.
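The abstract does not say how the ANOVA groups or the evaluation metrics were computed. The Python code below is a minimal sketch that assumes each ANOVA group is one model's per-fold cross-validation accuracies and that label 1 means malware. It computes a one-way ANOVA F statistic with its p-value, plus accuracy, precision, recall, and F1 from predicted labels. The function names (one_way_anova, f_survival, binary_metrics) and all sample numbers are hypothetical. They are not the study's code or results. The p-value is the upper tail of the F(k - 1, N - k) distribution, evaluated through the regularized incomplete beta function so the sketch needs only the standard library.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300

    def nz(v):  # keep Lentz's intermediate terms away from zero
        return v if abs(v) > tiny else tiny

    c, d = 1.0, 1.0 / nz(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 / nz(1.0 + aa * d)
            c = nz(1.0 + aa / c)
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # The continued fraction converges fastest below this threshold; otherwise use
    # the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_survival(f_stat, df1, df2):
    """P(F > f_stat) for an F(df1, df2) random variable."""
    if f_stat <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


def one_way_anova(groups):
    """Return (F statistic, p-value) for a one-way ANOVA over lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, f_survival(f_stat, df_between, df_within)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels (positive = malware)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical per-fold accuracies; illustrative only, not results from this study.
    per_fold_accuracy = {
        "classifier_a": [0.981, 0.984, 0.979, 0.983, 0.982],
        "classifier_b": [0.990, 0.992, 0.991, 0.989, 0.990],
        "classifier_c": [0.972, 0.975, 0.970, 0.974, 0.973],
    }
    f_stat, p_value = one_way_anova(list(per_fold_accuracy.values()))
    print(f"F = {f_stat:.2f}, p = {p_value:.3g}, significant at 0.05: {p_value < 0.05}")

    # Hypothetical labels: 1 = malware, 0 = benign.
    print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Computing the tail probability by hand keeps the example dependency-free. When SciPy is available, scipy.stats.f_oneway(*groups) returns the same F statistic and p-value. Likewise, precision_score, recall_score, and f1_score in sklearn.metrics compute the per-model metrics.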