American Journal of Computer Architecture

2023;  10(2): 21-27

doi:10.5923/j.ajca.20231002.01

Received: Aug. 9, 2023; Accepted: Aug. 21, 2023; Published: Aug. 23, 2023

 

Exploring the Synergy of Data Lakes and Edge Analytics

Anshumali Ambasht

Deloitte Consulting, Chicago, Illinois, United States of America

Correspondence to: Anshumali Ambasht, Deloitte Consulting, Chicago, Illinois, United States of America.

Email:

Copyright © 2023 The Author(s). Published by Scientific & Academic Publishing.

This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/

Abstract

The integration of data lakes and edge analytics marks a pivotal shift in the data processing landscape, enabling organizations to harness the power of vast data repositories and real-time insights at the network edge. This article delves into the intersection of data lakes and edge analytics, elucidating their individual significance and exploring their synergistic potential. By examining the benefits, challenges, and real-world applications, this article sheds light on how this convergence empowers businesses to drive informed decision-making, enhance operational efficiency, and unlock unprecedented opportunities in a data-driven world.

Keywords: Data Lake, Edge Analytics, IoT, Real Time, Machine Learning

Cite this paper: Anshumali Ambasht, Exploring the Synergy of Data Lakes and Edge Analytics, American Journal of Computer Architecture, Vol. 10 No. 2, 2023, pp. 21-27. doi: 10.5923/j.ajca.20231002.01.

Article Outline

1. Introduction
2. Data Lakes
    2.1. Understanding Data Lake Architecture
    2.2. Benefits of Data Lake Architecture
3. Edge Analytics
    3.1. Understanding Edge Analytics Architecture
    3.2. Edge Analytics Techniques
    3.3. Benefits of Edge Analytics Architecture
4. Convergence Benefits
    4.1. Benefits through the Synergy
5. Real World Applications
    5.1. Industrial IoT and Predictive Maintenance
    5.2. Smart Agriculture and Precision Farming
    5.3. Healthcare Monitoring and Patient Care
    5.4. Intelligent Transportation Systems
    5.5. Retail Customer Experience Enhancement
    5.6. Energy Management and Sustainability
    5.7. Autonomous Vehicles and Real-time Decision-making
6. Challenges and Mitigation Steps
    6.1. Data Synchronization and Consistency
    6.2. Data Quality and Governance
    6.3. Security and Privacy Concerns
    6.4. Limited Computing Resources at the Edge
    6.5. Scalability and Management
    6.6. Latency Considerations
    6.7. Edge Device Diversity
    6.8. Integration with Existing Systems
    6.9. Data Lifecycle Management
    6.10. Training and Expertise
7. Cost Estimates for Data Lake and Edge Analytics Integration
    7.1. Infrastructure Costs
    7.2. Data Transfer Costs
    7.3. Personnel and Training
    7.4. Security and Compliance Costs
    7.5. Development and Integration
    7.6. Operational Costs
    7.7. Data Lifecycle Management:
8. Future Prospects and Innovations
    8.1. Federated Edge Analytics
    8.2. Edge AI and Machine Learning
    8.3. Hybrid Edge-Cloud Architectures
    8.4. Edge Data Hubs
    8.5. Quantum Computing Impacts
    8.6. Edge Analytics Ecosystems
    8.7. Edge Analytics in 5G Networks
    8.8. Ethical and Regulatory Considerations
    8.9. Insights-Driven Business Models
    8.10. Continuous Evolution
9. Conclusions

1. Introduction

The emergence of data lakes as comprehensive storage repositories and the rise of edge analytics in enabling real-time insights have reshaped the way organizations approach data processing. This article introduces the concept of merging data lakes and edge analytics, providing an overview of their individual characteristics and setting the stage for their combined capabilities.

2. Data Lakes

A data lake, metaphorically, is a vast body of water into which data streams flow, irrespective of the data's structure or source. Unlike traditional data storage methods that rely on predefined schemas, data lakes allow organizations to ingest raw, unprocessed data, whether structured, semi-structured, or unstructured. This "schema-on-read" approach defers data structuring until the point of analysis, granting unparalleled flexibility in handling diverse data types. [1]

2.1. Understanding Data Lake Architecture

At its core, a data lake is a centralized repository that stores structured, semi-structured, and unstructured data at scale. Its architecture is characterized by its flexibility, accommodating data in its raw form, and deferring schema application until the data is accessed for analysis. This schema-on-read approach contrasts with traditional databases that enforce schemas at the time of data ingestion. Following are the key components: [2]
Storage Layer: Central to the data lake's architecture is the storage layer, often realized through distributed file systems like Hadoop HDFS or cloud-based object storage like Amazon S3 or Azure Data Lake Storage. This layer houses the actual data, which can be structured data from relational databases, semi-structured data like JSON or XML, and unstructured data such as images, videos, and text.
Ingestion Layer: The data ingestion layer is responsible for bringing data into the lake. It accommodates batch processing, where data is loaded in bulk at scheduled intervals, and real-time processing, where data arrives continuously. Data may come from various sources, including IoT devices, databases, log files, and external APIs.
Metadata Management: Metadata provides context to the stored data, enabling users to efficiently discover and understand the available information. It includes information about the data's source, structure, lineage, and any transformations applied. Effective metadata management ensures data lineage, quality, and compliance.
Processing Layer: The processing layer consists of tools and frameworks that enable data transformation, cleaning, and analysis. Technologies like Apache Spark, Apache Flink, and Hadoop MapReduce process data in parallel, extracting insights from large datasets efficiently.
Security and Governance: Ensuring data security and governance is paramount. Access controls, encryption mechanisms, and compliance policies are implemented to safeguard data. A robust data governance framework ensures data integrity, privacy, and compliance with regulations.

2.2. Benefits of Data Lake Architecture

Flexibility: The schema-on-read approach liberates organizations from the constraints of predefined schemas, allowing them to ingest and analyze diverse data types without the need for extensive data transformations.
Scalability: Data lakes offer horizontal scalability, enabling organizations to handle massive volumes of data by adding more storage and processing resources as needed.
Cost Efficiency: Cloud-based data lakes eliminate the need for upfront hardware investments. Organizations pay only for the storage and resources they consume, making data lakes a cost-effective solution.
Advanced Analytics: The architecture supports advanced analytics, machine learning, and AI, facilitating the discovery of valuable insights from the raw data. This dynamic environment encourages experimentation and innovation.

3. Edge Analytics

Edge analytics represents a paradigm shift in data processing, where data is analyzed locally on edge devices, such as sensors, IoT devices, and gateways, rather than being sent to centralized data centers. This real-time processing at the edge of the network significantly reduces latency, enabling organizations to respond swiftly to events as they happen. It is particularly crucial for applications that demand immediate action, such as industrial automation, healthcare monitoring, and autonomous vehicles. [3]

3.1. Understanding Edge Analytics Architecture

Edge analytics architecture is a framework designed to process and analyze data locally on edge devices, such as IoT sensors, gateways, and edge servers, instead of sending it to centralized data centers. This localized processing reduces the time it takes to derive insights from the data, making it well-suited for applications that demand immediate responses. Following are the key components: [4]
Edge Devices: These are the sensors, IoT devices, or machines that generate data at the edge of the network. Edge devices collect data and initiate local processing before transmitting insights.
Edge Gateways: Gateways act as intermediaries between edge devices and the central infrastructure. They preprocess, filter, and aggregate data before forwarding relevant insights to the cloud or central data repository.
Local Processing and Storage: Edge devices or gateways perform initial data processing and analysis. Local storage enables temporary data retention for immediate insights and minimizes data transfer.
Analytics Engines: These are software components that execute algorithms for data analysis and insights. Analytics engines can be embedded within edge devices, gateways, or edge servers.
Communication Protocols: Edge devices communicate using lightweight protocols optimized for low bandwidth and real-time interactions. MQTT, CoAP, and AMQP are common protocols used in edge analytics architectures.
Cloud Integration: In hybrid architectures, relevant insights from edge analytics are sent to the cloud for further analysis or historical storage.
Cloud resources can support complex analytics, long-term storage, and large-scale processing.

3.2. Edge Analytics Techniques

Edge analytics techniques are a suite of methodologies designed to process and analyze data at the edge of the network, closer to where data is generated. These techniques enable organizations to derive immediate insights, reduce latency, and optimize bandwidth usage. Let's explore some key edge analytics techniques and their applications.
Data Filtering and Aggregation: Data is filtered and aggregated at the edge before being sent to a central repository.
This technique reduces the volume of data that needs to be transferred, conserving network bandwidth. It's useful for scenarios where data granularity can be sacrificed for efficiency.
Anomaly Detection: Algorithms are used to identify anomalies or outliers in real-time data streams. Anomaly detection is crucial for applications like predictive maintenance in industrial settings. It enables timely identification of equipment malfunctions or deviations from normal behavior.
Predictive Analytics: Historical and real-time data are analyzed to predict future events or trends. Predictive analytics is valuable for applications such as demand forecasting in retail or predicting patient outcomes in healthcare.
Complex Event Processing: Patterns are detected and processed across multiple data streams to trigger predefined actions. CEP is used for scenarios that require real-time responses, such as fraud detection in financial transactions or monitoring traffic for congestion control in smart cities.
Machine Learning Inference: Trained machine learning models are deployed at the edge to infer insights from incoming data. Machine learning inference at the edge can include image recognition for surveillance cameras or natural language processing for voice assistants.
Time-Series Analysis: Historical time-series data is analyzed to identify trends, patterns, and anomalies. Time-series analysis is valuable in applications like monitoring equipment performance, weather forecasting, or energy consumption tracking.

3.3. Benefits of Edge Analytics Architecture

Bandwidth Efficiency: By filtering and processing data at the edge, only relevant information is transmitted to centralized systems, reducing network traffic and conserving bandwidth. [5]
Reduced Latency: Processing data locally significantly reduces the time it takes to generate insights. This is particularly beneficial for applications where real-time responses are critical. [6]
Privacy and Security: Edge analytics architecture allows sensitive data to be processed locally, mitigating privacy concerns and ensuring compliance with data protection regulations. [7]
Resilience: Edge devices can continue to operate even when network connectivity is disrupted, ensuring continuous data processing and decision-making. [7]

4. Convergence Benefits

In the dynamic landscape of data-driven innovation, the convergence of data lakes and edge analytics stands as a transformative alliance. This union not only addresses the challenges of managing vast and varied datasets but also amplifies the potential for real-time insights and informed decision-making.

4.1. Benefits through the Synergy

The convergence of data lakes and edge analytics creates a symbiotic relationship that leverages their unique strengths to produce amplified outcomes.
Reduced Latency and Data Efficiency: Edge analytics excels in processing data at the source, minimizing latency and enabling instant insights. By coupling it with data lakes, organizations can strategically filter and transmit only relevant data to the lake, optimizing bandwidth usage and enhancing data efficiency.
Real-time Decision-making: The amalgamation empowers organizations to make informed decisions in real time. Data lakes provide a comprehensive repository for historical and contextual data, while edge analytics ensures that the most recent data is processed promptly, enabling swift action when it matters most.
Contextual Insights: Data lakes offer a holistic view of data, allowing for the contextualization of insights derived from edge analytics. This integration helps in discerning patterns, trends, and anomalies by correlating real-time edge data with historical data stored in the lake.
Data Governance and Compliance: Data lakes provide a centralized platform for managing data governance, ensuring data quality, and complying with regulatory requirements. Integrating edge analytics with the lake enables consistent application of governance policies across the data lifecycle.
Optimized Data Processing and Storage: Edge analytics alleviate the burden on central processing and storage systems by filtering and processing data at the edge. This not only reduces the load on data lakes but also ensures that valuable insights are generated closer to the data source.
Scalability and Adaptability: Data lakes offer scalability to accommodate growing datasets, while edge analytics operates on a per-device basis. This convergence ensures that as the number of edge devices grows, the processing load is distributed efficiently, maintaining overall system scalability.

5. Real World Applications

The convergence of data lakes and edge analytics opens doors to innovative solutions across various industries. This synergy empowers organizations to address challenges, optimize processes, and derive valuable insights in real time. Here are some real-world examples that demonstrate the tangible benefits of integrating data lakes and edge analytics:

5.1. Industrial IoT and Predictive Maintenance

In manufacturing, industrial IoT devices collect data from machinery and equipment. Edge analytics processes this data locally, detecting anomalies and signs of wear in real time. Relevant insights are sent to a data lake, where historical data is stored. By analyzing historical patterns and current data, organizations can predict maintenance needs, optimize equipment utilization, and reduce downtime, resulting in enhanced operational efficiency.

5.2. Smart Agriculture and Precision Farming

Edge devices installed in fields gather data on soil moisture, weather conditions, and crop health. Edge analytics process this data locally, providing real-time insights on irrigation needs and disease detection. The processed data is then integrated with a data lake containing historical crop yield and soil data. Farmers can make informed decisions about irrigation schedules, pest control, and planting strategies, leading to increased crop yield and resource optimization.

5.3. Healthcare Monitoring and Patient Care

Wearable medical devices equipped with edge analytics continuously monitor patients' vital signs. Local analysis helps detect anomalies and critical health events. These insights are shared with a data lake containing patient histories, medical research, and treatment records. Medical professionals can access a comprehensive view of patients' health, enabling personalized treatment plans and timely interventions.

5.4. Intelligent Transportation Systems

Edge devices embedded in vehicles and traffic infrastructure capture real-time data on traffic conditions, road safety, and vehicle performance. Edge analytics process this data locally to provide immediate insights on traffic congestion, accidents, and road hazards. These insights are combined with historical traffic data stored in a data lake, enabling predictive traffic modeling, route optimization, and improved traffic management.

5.5. Retail Customer Experience Enhancement

In retail, edge devices track customer movement and behavior within stores. Edge analytics analyze this data to understand shopping patterns and optimize store layouts. Insights are integrated into a data lake containing customer purchase history and preferences. Retailers can offer personalized recommendations, improve inventory management, and enhance the overall shopping experience.

5.6. Energy Management and Sustainability

Sensors installed in energy-intensive facilities collect real-time data on energy consumption and equipment performance. Edge analytics process this data locally, identifying energy-saving opportunities and detecting inefficiencies. These insights are integrated with a data lake containing historical energy consumption patterns. Organizations can implement energy-efficient practices, optimize resource usage, and reduce operational costs.

5.7. Autonomous Vehicles and Real-time Decision-making

Autonomous vehicles are equipped with edge devices that process sensor data in real time, making split-second decisions for safe navigation. These decisions are informed by real-time edge analytics. Data from these edge devices is also sent to a data lake for further analysis and continuous improvement of autonomous driving algorithms.

6. Challenges and Mitigation Steps

The convergence of data lakes and edge analytics promises revolutionary advantages in the realm of real-time insights and data-driven decision-making. However, this amalgamation is not without its array of challenges that demand astute attention and strategic solutions. Below, we delve into the intricate fabric of these challenges and the strategies to surmount them.

6.1. Data Synchronization and Consistency

Challenge: Maintaining coherence between real-time insights from edge analytics and historical data residing in the data lake can be intricate.
Mitigation: Employ data synchronization tools that leverage timestamping, version control, and change detection mechanisms. Implement data reconciliation processes to ensure alignment between edge and data lake datasets.

6.2. Data Quality and Governance

Challenge: Upholding data quality across the diverse spectrum of edge devices and sources poses a substantial hurdle.
Mitigation: Integrate data validation mechanisms at the edge, leveraging data profiling, cleansing, and transformation techniques. Establish a comprehensive data governance framework that enforces standardized data collection protocols.

6.3. Security and Privacy Concerns

Challenge: Ensuring data security and privacy in less secure edge environments necessitates robust strategies.
Mitigation: Employ a multi-layered security approach encompassing encryption, access controls, secure boot, and regular security updates. Implement edge devices with built-in hardware security modules to fortify data protection.

6.4. Limited Computing Resources at the Edge

Challenge: Crafting analytics solutions that operate seamlessly within the constrained computational resources of edge devices is a technical hurdle.
Mitigation: Opt for edge-aware analytics algorithms that prioritize data preprocessing, aggregation, and summarization. Leverage edge-enhanced processing units and hardware accelerators to optimize resource utilization.

6.5. Scalability and Management

Challenge: The exponential growth of edge devices demands a scalable architecture that can be efficiently managed.
Mitigation: Implement edge analytics solutions using containerization or microservices, facilitating modular scalability. Leverage centralized management platforms equipped with automated deployment, monitoring, and updates.

6.6. Latency Considerations

Challenge: While edge analytics minimizes latency, ultra-low latency requirements necessitate advanced strategies.
Mitigation: Employ edge hardware with FPGA or GPU capabilities for accelerated processing. Leverage edge caching, data compression, and real-time streaming techniques to further mitigate latency.

6.7. Edge Device Diversity

Challenge: The heterogeneous landscape of edge devices demands adaptable analytics strategies.
Mitigation: Develop edge analytics frameworks that can dynamically adapt to varying device capabilities. Implement resource-aware algorithm selection to ensure optimal performance across device types.

6.8. Integration with Existing Systems

Challenge: Seamlessly integrating edge analytics outputs with central data lake architectures requires meticulous planning.
Mitigation: Utilize standardized data formats and APIs for interoperability. Employ data virtualization tools to bridge the gap between edge-generated insights and centralized data processing systems.

6.9. Data Lifecycle Management

Challenge: Managing the complete data lifecycle from edge to data lake necessitates strategic storage and archiving decisions.
Mitigation: Implement tiered data storage solutions that prioritize local storage for real-time processing and archive data of historical value in the data lake. Define data retention policies that align with regulatory compliance and cost-efficiency.

6.10. Training and Expertise

Challenge: Implementing and managing edge analytics solutions require specialized skills and knowledge.
Mitigation: Invest in comprehensive training programs to upskill existing teams. Consider hiring professionals with proficiency in both edge computing and data lake architectures to ensure effective implementation and management.

7. Cost Estimates for Data Lake and Edge Analytics Integration

As organizations embark on the journey of integrating data lakes and edge analytics, understanding the associated costs is essential for effective planning and budgeting. The costs encompass a wide spectrum, ranging from infrastructure investments to ongoing operational expenses. Here, we provide an in-depth breakdown of potential costs and considerations:

7.1. Infrastructure Costs

Edge Devices and Gateways: Estimate costs for purchasing and deploying edge devices, sensors, and IoT components. Include expenses for edge gateways that preprocess and transmit data to the data lake.
Edge Servers: If required, budget for edge servers with enhanced processing capabilities. Consider hardware costs, installation, and ongoing maintenance.
Data Lake Infrastructure: Calculate costs related to setting up and maintaining the data lake infrastructure. This includes storage resources, computing resources, and networking components.

7.2. Data Transfer Costs

Bandwidth Usage: Anticipate expenses associated with transferring data from edge devices to the data lake. Account for data volume and potential network charges.
Network Optimization: Allocate resources for network optimization measures to minimize data transfer costs. Consider implementing data compression techniques and efficient routing protocols.

7.3. Personnel and Training

Training Costs: Set aside budget for training personnel responsible for managing edge devices, gateways, and data lake operations. This includes training on architecture, security practices, and analytics.
Personnel Expenses: Consider ongoing personnel expenses for IT support, data analysts, and data engineers responsible for maintaining and optimizing the integrated system.

7.4. Security and Compliance Costs

Security Measures: Budget for implementing robust security measures, including encryption tools, authentication mechanisms, and intrusion detection systems.
Compliance Audits: Allocate funds for regular compliance audits to ensure that data handling practices adhere to industry regulations.

7.5. Development and Integration

Custom Development Costs: Estimate expenses for developing custom software solutions and integrating them with existing systems.
Testing and Quality Assurance: Consider costs for testing and quality assurance efforts to ensure seamless integration and functionality.

7.6. Operational Costs

Monitoring and Maintenance: Plan for ongoing monitoring and maintenance expenses for edge devices, gateways, and the data lake infrastructure.
Scalability Costs: As the architecture scales to accommodate more devices and data, account for costs associated with scaling edge resources, upgrading hardware, and expanding data lake capacity.

7.7. Data Lifecycle Management:

Storage Costs: Estimate costs for storing data in both the edge devices and the data lake. Consider factors such as data retention policies, storage tiers, and archival requirements.
Data Purging Costs: Factor in expenses related to data purging and archiving as data ages and becomes less relevant.

8. Future Prospects and Innovations

The landscape of data processing is a constantly evolving one, driven by technological advancements and changing business needs. The convergence of data lakes and edge analytics has opened new avenues for innovation, enabling organizations to extract greater value from their data. As we peer into the future, several trends and innovations are poised to shape the trajectory of this dynamic partnership. [8]

8.1. Federated Edge Analytics

Federated edge analytics involves orchestrating and coordinating analytics processes across multiple edge devices to achieve more comprehensive insights. This approach allows edge devices to collaborate and share insights while maintaining data privacy and security. Federated learning, a subset of this trend, allows models to be trained across multiple edge devices without sharing raw data, enhancing privacy and efficiency.

8.2. Edge AI and Machine Learning

Edge analytics is set to become more intelligent with the infusion of artificial intelligence and machine learning capabilities directly into edge devices. This will enable edge devices to not only process data but also perform advanced analytics, anomaly detection, and pattern recognition, without relying on centralized cloud resources.

8.3. Hybrid Edge-Cloud Architectures

The boundary between edge and cloud computing will blur further, giving rise to hybrid architectures that seamlessly integrate edge devices with cloud-based data lakes and processing resources. This will allow organizations to balance the benefits of local processing and real-time insights with the scalability and compute power of the cloud.

8.4. Edge Data Hubs

Edge data hubs will emerge as central points for managing and coordinating edge analytics processes. These hubs will handle data preprocessing, aggregation, and orchestration of analytics tasks across multiple edge devices, enhancing efficiency and reducing redundancy.

8.5. Quantum Computing Impacts

As quantum computing advances, it will have implications for data processing at both the edge and the cloud. Quantum edge devices could provide accelerated analytics capabilities, enabling complex computations and simulations at the edge.

8.6. Edge Analytics Ecosystems

Vendors will create comprehensive ecosystems for edge analytics, offering tools, frameworks, and platforms tailored for different use cases and industries. This will simplify the deployment and management of edge analytics solutions.

8.7. Edge Analytics in 5G Networks

The rollout of 5G networks will provide higher bandwidth and lower latency, making real-time edge analytics even more feasible. Industries like autonomous vehicles, augmented reality, and industrial automation will benefit from enhanced connectivity and processing capabilities.

8.8. Ethical and Regulatory Considerations

As edge analytics becomes more widespread, ethical considerations regarding data privacy, consent, and transparency will take center stage. Organizations will need to navigate complex regulations to ensure that their edge analytics practices align with data protection and compliance requirements.

8.9. Insights-Driven Business Models

The convergence of data lakes and edge analytics will enable organizations to develop new business models based on real-time insights. Subscription services, pay-per-insight models, and data monetization strategies could emerge as innovative revenue streams.

8.10. Continuous Evolution

The landscape of data processing is characterized by continuous evolution. As technology advances and use cases expand, data lakes and edge analytics will continue to evolve in tandem, adapting to the ever-changing needs of businesses and society.

9. Conclusions

The convergence of data lakes and edge analytics has opened a transformative path in the data-driven landscape, offering real-time insights and optimized decision-making capabilities. This integration carries both advantages and challenges that organizations must navigate to harness its potential effectively.
On the positive side, the integration brings forth real-time insights, enabling rapid and informed decisions by leveraging the latest data. The reduction in latency through edge analytics ensures quicker response times, particularly crucial for applications demanding immediate actions. The centralized data management facilitated by data lakes streamlines the handling of data from diverse sources, enhancing organization-wide efficiency. Scalability remains a hallmark, allowing the architecture to adapt to growing data volumes and device counts. Optimized resource utilization minimizes data transfer requirements, conserving bandwidth and enhancing network efficiency. Additionally, the fusion of real-time edge insights with historical data empowers more comprehensive and accurate data analysis.
However, the integration also poses challenges. Complexity arises from designing, synchronizing, and managing diverse edge devices and their insights within the broader data lake ecosystem. Ensuring data consistency across real-time edge data and historical lake data requires meticulous synchronization. Balancing real-time processing needs with data security in the less secure edge environment demands robust security measures. Resource limitations in edge devices necessitate the optimization of analytics algorithms within confined computing and storage capacities. The operational overhead of managing and scaling edge analytics processes, especially as the number of devices expands, can be intricate. Integrating insights seamlessly across edge analytics and data lake architectures presents a notable challenge.
In navigating this intricate landscape, organizations must weigh the agility of real-time insights against the intricacies of managing a diverse and complex ecosystem. By strategically approaching these challenges and capitalizing on the advantages, organizations can harness the transformative potential of data lakes and edge analytics integration, positioning themselves at the forefront of data-driven innovation.

References

[1]  Fatemeh Nargesian, Erkang Zhu, Renée J. Miller, Ken Q. Pu, and Patricia C. Arocena. 2019. Data lake management: challenges and opportunities. Proc. VLDB Endow. 12, 12 (August 2019), 1986–1989. https://doi.org/10.14778/3352063.3352116.
[2]  Mathis, C. Data Lakes. Datenbank Spektrum 17, 289–293 (2017). https://doi.org/10.1007/s13222-017-0272-7.
[3]  Sabuzima Nayak, Ripon Patgiri, Lilapati Waikhom, Arif Ahmed, A review on edge analytics: Issues, challenges, opportunities, promises, future directions, and applications, Digital Communications and Networks, 2022, ISSN 2352-8648, https://doi.org/10.1016/j.dcan.2022.10.016.
[4]  Safavat, Sunitha, Naveen Naik Sapavath, and Danda B. Rawat. "Recent advances in mobile edge computing and content caching." Digital Communications and Networks 6, no. 2 (2020): 189-194.
[5]  Gómez-Carmona, Oihane, Diego Casado-Mansilla, Diego López-de-Ipiña, and Javier García-Zubia. "Simplicity is best: Addressing the computational cost of machine learning classifiers in constrained edge devices." In Proceedings of the 9th International Conference on the Internet of Things, pp. 1-8. 2019.
[6]  Cziva, Richard, and Dimitrios P. Pezaros. "Container network functions: Bringing NFV to the network edge." IEEE Communications Magazine 55, no. 6 (2017): 24-31.
[7]  Bischoff, M., J. M. Scheuermann, C. Kiesi, and J. Hatzky. "The edge is near: an introduction to edge computing." Gepostet June (2019).
[8]  P. K. Illa and N. Padhi, "Practical Guide to Smart Factory Transition Using IoT, Big Data and Edge Analytics," in IEEE Access, vol. 6, pp. 55162-55170, 2018, doi: 10.1109/ACCESS.2018.2872799.