Assessing the Impact of Batch-Based Data Aggregation Techniques for Feature Engineering on Machine Learning-Based Network IDSs

  1. Magán-Carrión, Roberto
  2. Urda, Daniel
  3. Díaz-Cano, Ignacio
  4. Dorronsoro, Bernabé
  1. Universidad de Granada, Granada, Spain
  2. Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Ingeniería Informática, Escuela Politécnica Superior, Universidad de Burgos, Av. Cantabria s/n, 09006, Burgos, Spain
  3. Applied Robotics Group, Department of Automatic, Electronic, Computer Architecture and Com. Net. Engineering, University of Cádiz, Cádiz, Spain
  4. Graphical Methods, Optimization and Learning (GOAL) Group, Department of Computer Engineering, University of Cádiz, Cádiz, Spain
14th International Conference on Computational Intelligence in Security for Information Systems and 12th International Conference on European Transnational Educational (CISIS 2021 and ICEUTE 2021)

ISSN: 2194-5357, 2194-5365

ISBN: 9783030878719, 9783030878726

Year of publication: 2021

Pages: 116-125

Congress: 14th International Conference on Computational Intelligence in Security for Information Systems and 12th International Conference on European Transnational Educational (CISIS 2021 and ICEUTE 2021)

Type: Conference paper

DOI: 10.1007/978-3-030-87872-6_12


Communication networks and systems are continuously threatened by a great variety of cybersecurity attacks coming from new malware that targets vulnerabilities in both old and new systems. In this context, Intrusion Detection Systems (IDSs) and, specifically, Network IDSs (NIDSs) provide robust methods and techniques to detect and classify security attacks. An important part of the assessment of NIDSs is the Feature Engineering (FE) process, in which raw datasets are transformed into derived ones by transforming both their features and their observations. In this work, the ff4ml framework, which includes the Feature as a Counter (FaaC) FE approach, is used to transform raw features into new ones that are counters of the original ones. The FaaC approach aggregates raw observations by time intervals, which limits its use to network datasets containing timestamps. This work proposes a batch-based aggregation technique that allows FaaC to be applied to datasets without timestamps, and analyzes its impact on the performance of Machine Learning (ML)-based NIDSs in comparison to timestamp-based aggregation approaches.
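To make the difference between timestamp-based and batch-based aggregation concrete, here is a minimal Python sketch. It is not the ff4ml implementation. It assumes that each observation is a dict of categorical raw features, that the (feature, value) pairs to be counted are fixed in advance, and that a batch is a block of consecutive observations in dataset order. The function names (`faac_counts`, `aggregate_by_time`, `aggregate_by_batch`), the 60-second interval, the batch size of 2 and the toy flows are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# Assumption: one observation = one dict of raw categorical features (hypothetical format).
Observation = Dict[str, str]


def faac_counts(window: Iterable[Observation],
                vocab: Sequence[Tuple[str, str]]) -> List[int]:
    """Feature as a Counter: one counter per (feature, value) pair in vocab."""
    seen = Counter((f, v) for obs in window for f, v in obs.items())
    return [seen[pair] for pair in vocab]


def aggregate_by_time(observations: Sequence[Observation],
                      timestamps: Sequence[float],
                      interval: float,
                      vocab: Sequence[Tuple[str, str]]) -> Dict[int, List[int]]:
    """Timestamp-based grouping: observations falling in the same fixed-length
    time window are counted together. Windows with no observations are omitted."""
    windows: Dict[int, List[Observation]] = {}
    for obs, ts in zip(observations, timestamps):
        windows.setdefault(int(ts // interval), []).append(obs)
    return {w: faac_counts(group, vocab) for w, group in sorted(windows.items())}


def aggregate_by_batch(observations: Sequence[Observation],
                       batch_size: int,
                       vocab: Sequence[Tuple[str, str]]) -> List[List[int]]:
    """Batch-based grouping (assumed here to be consecutive blocks of
    batch_size observations): no timestamps needed; the last batch may be smaller."""
    return [faac_counts(observations[i:i + batch_size], vocab)
            for i in range(0, len(observations), batch_size)]


# Toy usage with hypothetical flows and timestamps (in seconds).
flows = [
    {"proto": "tcp", "dport": "80"},
    {"proto": "udp", "dport": "53"},
    {"proto": "tcp", "dport": "443"},
    {"proto": "tcp", "dport": "80"},
    {"proto": "udp", "dport": "53"},
]
stamps = [0.5, 1.2, 61.0, 62.5, 130.0]
vocab = [("proto", "tcp"), ("proto", "udp"), ("dport", "80"), ("dport", "53")]

print(aggregate_by_batch(flows, 2, vocab))
# -> [[1, 1, 1, 1], [2, 0, 1, 0], [0, 1, 0, 1]]
print(aggregate_by_time(flows, stamps, 60.0, vocab))
# -> {0: [1, 1, 1, 1], 1: [2, 0, 1, 0], 2: [0, 1, 0, 1]}
```

Both strategies produce one FaaC vector per group; only the grouping key differs (time-window index versus position in the dataset), which is why the batch variant does not need timestamps. With this toy input the two groupings happen to coincide; on real traffic they generally would not.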
