DATA LAKE ARCHITECTURE IN THE EDUCATION AREA

Authors

DOI:

https://doi.org/10.31891/2307-5732-2024-331-25

Keywords:

data lake, data lake architecture, big data, metadata, educational data, data warehouses

Abstract

This paper reviews existing research on big data, and on data lakes in particular. A data lake is considered primarily as a repository that holds data for later analysis. The data lake model focuses on what information is to be stored rather than on which data is actually needed or for what purpose. Existing approaches to organizing data lake architecture are examined, both basic ones and modified ones that draw on graphs, semantic networks, and ontologies. These approaches yield functionally oriented architecture models and make it possible to create new hybrid architectures with specialized metadata management tools. Metadata management involves working with data and objects at different levels of granularity. Granularity is closely tied to the data lake concept, most often where data belonging to different entities must be recognized. Metadata is itself information about the data and processes that the data lake collects, and it requires separate management mechanisms. In recent years, several such mechanisms have been proposed that categorize metadata, enumerate metadata management functions, or combine the two. The paper presents an analysis of the field of education across all stages of the educational process. The characteristics and features of each stage are described, and data repositories are built for each, to support a better understanding of the stages and the construction of the data lake architecture. A model is built that represents the possible stages of the educational process and the data directly related to them. The concept of a complete learner profile is introduced: it should provide information about the learner based on the completed stages of the educational process, including both formal and informal education.
A formal representation of the educational data lake is presented, describing the main elements that the architecture model should include. An architecture model of the educational data lake is developed, together with a description of its components and functional levels. Data organization techniques that improve analytics and optimize memory use through a columnar data format and Spark tools are analyzed.
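
The columnar organization mentioned above can be illustrated with a minimal sketch. It does not use Spark or any particular columnar file format, and it is not the implementation from the paper; it only shows the layout idea in plain Python. The record fields (learner_id, stage, score) and the helper names are hypothetical and chosen for illustration.

```python
from array import array

# Hypothetical row-oriented records: one dict per assessment event.
rows = [
    {"learner_id": 1, "stage": "school", "score": 78.0},
    {"learner_id": 2, "stage": "university", "score": 91.5},
    {"learner_id": 1, "stage": "university", "score": 84.0},
]


def to_columns(records):
    """Pivot row-oriented records into one sequence per field."""
    return {
        # Typed arrays store numbers contiguously, without per-row dict overhead.
        "learner_id": array("q", (r["learner_id"] for r in records)),
        "stage": [r["stage"] for r in records],
        "score": array("d", (r["score"] for r in records)),
    }


def mean_score(columns, stage):
    """Average the score column for one stage, touching only two columns."""
    picked = [
        score
        for score, row_stage in zip(columns["score"], columns["stage"])
        if row_stage == stage
    ]
    return sum(picked) / len(picked) if picked else None


columns = to_columns(rows)
university_mean = mean_score(columns, "university")
```

An aggregate such as mean_score reads only the columns it needs (here stage and score) and never touches learner_id, which is the property that makes columnar layouts attractive for analytical queries over wide educational records.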

Published

2024-02-29

How to Cite

DATA LAKE ARCHITECTURE IN THE EDUCATION AREA. (2024). Herald of Khmelnytskyi National University. Technical Sciences, 331(1), 149-157. https://doi.org/10.31891/2307-5732-2024-331-25