Processing of heterogeneous data in the Internet of Things

  1. Corral Plaza, David José
Supervised by:
  1. Inmaculada Medina Bulo Supervisor
  2. Guadalupe Ortiz Bellot Co-supervisor

Defense university: Universidad de Cádiz

Defense date: 8 March 2021

Committee:
  1. Gregorio Diaz Descalzo Chair
  2. Alfonso García de Prado Fontela Secretary
  3. Cesare Pautasso Member
Department:
  1. Ingeniería Informática

Type: Thesis

Teseo: 648759 DIALNET

Abstract

The number of Internet of Things (IoT) and smart devices capable of producing, consuming, and exchanging information grows considerably every day. In most cases, each device structures the information it produces differently, so the resulting data are heterogeneous. This is becoming a challenge for IoT researchers, who need to perform homogenisation and pre-processing tasks before using IoT data in their analytics. Moreover, the volume of these heterogeneous data sources is usually huge, which leads us to the term Big Data and its three V’s: Velocity, Volume, and Variety. Being able to work with these large and heterogeneous datasets, perform domain-specific analytics, and react in real time to situations of interest would be a major competitive advantage. Hence, there is a need to consume, process, and analyse these heterogeneous data. In this context, Data Serialization Systems (DSS), Stream Processing (SP) platforms, and Complex Event Processing (CEP) are put forward as tools that can help developers overcome these challenges. Firstly, DSS allow us to transmit and transport data quickly and effectively thanks to their serialization strategies. Secondly, SP platforms make it possible to build architectures capable of consuming, processing, and transforming vast amounts of data in real time. Finally, CEP is a well-established technology that facilitates the analysis of data streams, detecting and reporting anomalies in real time. However, these tools require years of training to master and to use efficiently and effectively. It is therefore essential to make these technologies accessible to domain experts, that is, users who know the domain itself well but usually lack computer science or programming skills. This is where Model-Driven Development (MDD) comes in.
MDD is a software development paradigm that makes complex technologies easier to use, because it abstracts users from implementation details and lets them focus directly on defining the problem. In this PhD thesis, we aim to address these issues. On the one hand, we have developed an architecture for processing and analysing data from heterogeneous sources with different structures in IoT settings, allowing researchers to focus on data analysis without having to worry about the structure of the data to be processed. This architecture combines the real-time SP paradigm and DSS for processing and transforming information with CEP for analysing it. The combination of these three technologies allows developers and researchers to build systems that can consume, process, transform, and analyse large amounts of heterogeneous data in real time. On the other hand, to bring this architecture to any kind of user, we have developed MEdit4CEP-SP, a model-driven system that extends the MEdit4CEP tool and integrates SP, DSS, and CEP for consuming, processing, and analysing heterogeneous data in real time. It provides domain experts with a graphical editor in which they can infer and define heterogeneous data domains and model, in a user-friendly way, the situations of interest to be detected in such domains. In this editor, MDD techniques automatically transform the graphical definitions into code, which is deployed in the processing system at runtime. In addition, all these definitions are persistently stored in a NoSQL database, so any user can reuse definitions that are already stored.
Moreover, this set of tools can be used collaboratively because it can be deployed in the cloud: several domain experts or end users can work together from their own computers, each with their own MEdit4CEP-SP instance, adding, removing, and updating event types and event patterns in the same CEP engine. Furthermore, we have evaluated our solution thoroughly. First, we tested our SP architecture to assess its scalability and computing capacity, showing that the system can process more data when more nodes are used. Its performance is outstanding, reaching a maximum Transactions Per Second (TPS) of 135 080 Mb/s using 4 nodes. Next, we tested the graphical editor with real users to show that it provides its functionality in a friendly and intuitive way. The users were asked to complete a series of tasks and then answered a questionnaire to evaluate their experience with the editor. The results of this questionnaire were positive. Finally, the benefits of this system were compared with other existing approaches in the literature, with excellent results. In this comparative analysis, we contrasted our proposal with others on a series of key features that systems for modelling, consuming, processing, and analysing heterogeneous data in real time should offer.
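
As an illustration only, the general idea of homogenising differently structured readings and then detecting a situation of interest over the resulting stream might be sketched as follows. This is a minimal sketch in plain Python, not the architecture or the MEdit4CEP-SP implementation described in the abstract: it uses no SP platform, DSS, or CEP engine, and the payload layouts, the `Reading` type, the function names, and the threshold are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    """Common (homogenised) event type; hypothetical, not taken from the thesis."""
    device_id: str
    temperature_c: float


def homogenise(raw: Dict[str, Any]) -> Reading:
    """Map two made-up payload layouts onto the common Reading type."""
    if "temp_f" in raw:
        # Layout A (assumed): flat record reporting Fahrenheit.
        return Reading(raw["id"], (raw["temp_f"] - 32.0) * 5.0 / 9.0)
    # Layout B (assumed): nested record reporting Celsius.
    return Reading(raw["sensor"]["name"], float(raw["sensor"]["celsius"]))


def detect_overheating(
    events: Iterable[Reading], window: int = 3, threshold: float = 30.0
) -> Iterator[str]:
    """Yield an alert when the mean of the last `window` readings exceeds `threshold`."""
    recent: deque = deque(maxlen=window)
    for event in events:
        recent.append(event.temperature_c)
        if len(recent) == window and sum(recent) / window > threshold:
            yield f"overheating near {event.device_id}"


# Four readings from two devices with different structures.
raw_stream = [
    {"id": "a1", "temp_f": 86.0},
    {"sensor": {"name": "b7", "celsius": 31.5}},
    {"id": "a1", "temp_f": 89.6},
    {"sensor": {"name": "b7", "celsius": 20.0}},
]
alerts = list(detect_overheating(homogenise(r) for r in raw_stream))
print(alerts)
```

The pattern logic only ever sees `Reading` objects, so supporting a further device layout would mean extending `homogenise` without touching the detection rule. A real deployment would delegate both steps to dedicated stream-processing and event-processing tools.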