A Comparative Benchmark of Machine Learning Models for Static Malware Analysis: From EMBER 2018 to the Challenges of 2024
David-Cristian Horvath and Imre Zsigmond
Proceedings of the 18th International Conference on Agents and Artificial Intelligence (ICAART 2026)
Abstract
The growth of malware poses a persistent threat to digital infrastructure, requiring the development of efficient detection systems. While machine learning (ML) is key to modern static analysis, the practical trade-offs between models are not always well-documented. This paper addresses this gap by presenting a benchmark of six ML models on the EMBER 2018 dataset, including three Gradient Boosted Decision Tree (GBDT) implementations (LightGBM, XGBoost, CatBoost), a Random Forest, an Extra Trees classifier, and a Multilayer Perceptron (MLP). We extend this analysis by performing hyperparameter optimization, revealing performance improvements. Our evaluation moves beyond standard classification metrics to include efficiency criteria: inference time, training time, and model size. To investigate the impact of concept drift, we conduct a comparative analysis using the newly released EMBER2024 dataset. By training models on its modern v3 feature set and evaluating them on both the standard test set and the specialized "challenge set" of initially undetected malware, we quantify the increased difficulty of the detection landscape. Our findings confirm the state-of-the-art performance of GBDT models but also highlight the performance degradation against evasive threats, providing a guide for researchers on the balance between predictive accuracy, computational cost, and model transparency in malware detection scenarios.
@conference{icaart26,
author={David{-}Cristian Horvath and Imre Zsigmond},
title={A Comparative Benchmark of Machine Learning Models for Static Malware Analysis: From EMBER 2018 to the Challenges of 2024},
booktitle={Proceedings of the 18th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2026},
pages={2284--2291},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0014218700004052},
isbn={978-989-758-796-2},
issn={2184-433X},
}