Exploring Database Lakehouse Architecture Design Patterns: Best Practices and Considerations

The document discusses the challenges organizations face in managing large-scale datasets and the limitations of traditional data architectures like data lakes and warehouses. It introduces Lakehouse architecture as a unified solution that combines the benefits of both, while also highlighting the complexities involved in its cloud-based implementation. Future research is recommended to focus on enhancing real-time data processing, machine learning integration, and empirical validation across various industries.
Volume 10, Issue 2, February – 2025 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.5281/zenodo.14921215

Exploring Database Lakehouse Architecture Design Patterns: Best Practices and Considerations

Krishna Prisad Bajgai¹ (M.Phil. Scholar-ICT); Dr. Bhoj Raj Ghimire² (PhD)
¹,² Faculty of Information and Communication Technology, Nepal Open University, Lalitpur, Nepal

Publication Date: 2025/02/25

Abstract: Organizations face challenges in managing diverse, large-scale datasets while ensuring scalability, efficiency, and
quality. Traditional data lakes and warehouses often fall short in modern big data environments. The Lakehouse architecture
unifies both, but its cloud implementation raises issues such as optimized ingestion, efficient storage, and integration of multiple data
engines. Sectors like healthcare and agriculture struggle with real-time data and IoT, leading to inefficiencies. Current
research highlights gaps in performance, scalability, and the integration of advanced analytics. Future work should focus on
improving large dataset handling, real-time processing, and machine learning integration for better decision-making and
performance.

Keywords: Lakehouse Architecture, Big Data Integration, Cloud Computing, Real-Time Data Processing, Federated Governance,
Machine Learning.

How to Cite: Krishna Prisad Bajgai; Dr. Bhoj Raj Ghimire (2025). Exploring Database Lakehouse Architecture Design Patterns:
Best Practices and Considerations. International Journal of Innovative Science and Research Technology, 10(2), 550-557.
https://doi.org/10.5281/zenodo.14921215

I. INTRODUCTION

Organizations are increasingly facing challenges in managing and integrating diverse, large-scale datasets while ensuring scalability, efficiency, and data quality. Traditional data management architectures, such as data lakes and data warehouses, often fail to meet the demands of modern big data environments, leading to inefficiencies and limited support for data-driven decision-making [1]. The Lakehouse architecture has emerged as a promising solution, unifying the capabilities of both data warehouses and data lakes. However, its implementation in cloud-based environments presents complexities, including the need for optimized data ingestion, efficient storage mechanisms, and seamless integration of multiple data processing engines [8]. Sectors such as healthcare and agriculture struggle with managing real-time data, IoT devices, and diverse data sources, leading to inefficiencies in decision-making and operations [5][7]. Further, traditional storage formats in lakehouses are not optimized for graph analytics, and federated governance in data mesh architectures requires further research [4][6]. Key challenges such as performance optimization and query execution also remain critical in the implementation of Lakehouse systems [13].

The limitations of current research highlight the need for further advancements in flexibility, scalability, and real-world deployment of lakehouses [1]. Future work should focus on improving the handling of large datasets, real-time data processing, and the integration of machine learning to enhance decision-making [8][9]. Additionally, empirical studies are needed to validate the practical effectiveness of these systems across diverse industries, with a particular focus on cost optimization, scalability, and performance under large-scale deployments [3][9].

IJISRT25FEB264 www.ijisrt.com 550



Fig 1 From: The Lakehouse: State of the Art on Concepts and Technologies

• Problem Statement:
Many organizations face growing challenges in managing large-scale, diverse datasets, as traditional architectures like data lakes and warehouses often fall short in scalability and efficiency [1]. The Lakehouse architecture offers a unified solution, but its cloud-based implementation poses complexities such as data ingestion, storage optimization, and processing integration [12]. Industries like healthcare and agriculture struggle with real-time data and IoT devices, highlighting the need for advancements in performance, scalability, and machine learning integration [5][7]. Future research must address these gaps to optimize cost, enhance decision-making, and validate Lakehouse systems in diverse, large-scale deployments [9].

Organizations struggle to manage large, diverse datasets due to the inefficiencies of traditional data architectures like warehouses and lakes [1]. While the Lakehouse architecture offers a unified solution, challenges persist in cloud-based implementations, including data ingestion, storage optimization, governance, and query performance [13]. Sectors like healthcare and agriculture face additional hurdles with real-time and graph data, necessitating innovative solutions to enhance scalability, integration, and decision-making [5][6][7].

II. LITERATURE REVIEW

Organizations today face significant challenges in managing and integrating diverse, large-scale datasets while ensuring scalability, efficiency, and data quality. Traditional centralized architectures, such as data warehouses and data lakes, often fail to meet the demands of modern big data landscapes. The lack of integration between these architectures results in inefficiencies, operational bottlenecks, and limited support for data-driven decision-making [1][2].

The emerging "Lakehouse" architecture offers a unified solution that combines the advanced analytics capabilities of data warehouses with the scalability and flexibility of data lakes. However, implementing lakehouses in cloud-based environments introduces complexities, including the need for optimized data ingestion, efficient storage mechanisms, and seamless integration of multiple data processing engines [4][10].

In particular, the healthcare and agriculture sectors illustrate the challenges of managing diverse data sources, such as IoT devices, sensors, and real-time monitoring systems. Existing systems struggle to handle the velocity and variety of data, leading to inefficiencies in clinical decision-making and precision farming applications [5].

Additionally, managing graph data in lakehouse environments poses unique challenges, as traditional columnar storage formats like Parquet and ORC are not optimized for graph analytics. This limitation hinders performance for operations such as neighbor retrieval and label filtering, necessitating novel storage solutions tailored for graph data [6].

Organizations also encounter difficulties in implementing federated governance and ensuring data quality within distributed architectures like data meshes. Effective data governance and quality management are critical for supporting complex big data landscapes and enhancing decision-making processes [3].

Furthermore, optimizing query execution and performance remains a key challenge in Lakehouse systems. The concept of a unified Query Optimizer as a Service (QOaaS) has been proposed to address these issues, ensuring efficient data processing across diverse engines [11][13].

To overcome these limitations, the development of scalable, efficient, and unified architectures is crucial. By addressing the gaps in data integration, governance, and performance, the Lakehouse paradigm can unlock the potential for enhanced data strategies and innovation across industries [11][8].

• Data Used:
The following summarizes the types of data used in the referenced studies, highlighting their relevance to evaluating or validating proposed cloud data lakehouse architectures.

• Publicly Available and Synthetic Datasets
Studies frequently utilized synthetic datasets, publicly available datasets, or data from specific use cases to evaluate the performance and scalability of lakehouse systems. These datasets are often employed to test key architectural improvements in scalability, query optimization, and data processing performance [1][8].

• Organizational Data (Structured, Semi-Structured, and Unstructured)
Several studies focused on real-world organizational data, encompassing diverse formats and structures derived from various departments and systems. These studies evaluated lakehouse architectures for their capability to unify data management and enable decision-making [3][2].

• Agricultural Data (Agriculture 4.0)
Studies centered on data relevant to precision agriculture, including IoT sensor data, satellite imagery, and weather data. These datasets demonstrated the integration and processing capabilities of lakehouse systems in agriculture [5].

• Graph Data
Research into graph data focused on the Labeled Property Graph (LPG) model, evaluating how effectively lakehouse architectures could manage graph-specific operations, such as neighbor retrieval and label filtering [6].

• Healthcare Data
The studies explored structured and unstructured healthcare datasets, including Electronic Health Records (EHRs), imaging data, sensor data, and patient-generated data, to evaluate real-time processing capabilities in healthcare data lakes [7].

• Open Data Formats and TPC-DS Benchmark
Open data formats, such as Apache Parquet and ORC, were emphasized, along with the TPC-DS benchmark, a standard for decision support systems, to assess system performance [10][13].

• Conceptual Focus on Architectures
Certain studies concentrated on architectural discussions, without using specific datasets, highlighting integration, query optimization, and ingestion challenges in lakehouse systems [9][11][12].

Fig 2 Mono-Zone Architecture.

• Research Data in CRIS
Studies focused on Current Research Information Systems (CRIS) data, covering projects, personnel, organizational units, funding programs, research outputs (publications, patents), facilities, and events [3][14].

A. Methods and Technologies:

• Architectural Frameworks for Data Lakehouses
The studies used cloud services such as AWS, Azure, and Google Cloud for data lakehouses. Data processing frameworks (Apache Spark, Hadoop) and storage solutions (S3, Delta Lake) are discussed as key components for solving data integration challenges [1][2][8][9].

Fig 3 High-level Framework for Building a Data Lake on AWS.

• Integration of OLAP and OLTP Systems
Novel approaches to managing data consistency and schema enforcement by integrating OLAP and OLTP within lakehouse architectures were proposed [8][9][10].

• Data Mesh Architecture
Emphasized a domain-oriented decentralized approach, treating data as a product, assigning ownership to domain teams, and implementing self-serve data platforms for enhanced accessibility and management [3].

• Federated Governance in Data Mesh
Proposed federated computational governance, which ensures consistent policies across domains while granting local autonomy [4].

• Cloud and Distributed Computing for Agriculture
Reviewed centralized and distributed cloud architectures for agriculture. These strategies optimize data storage, processing, and analysis for Agriculture 4.0 [5].

• GraphAr for Graph Data in Data Lakes
Introduced GraphAr as a specialized storage scheme leveraging Parquet for graph data management in data lakes. It focuses on labeled property graphs (LPG) and employs innovative encoding/decoding techniques [6].

• Healthcare Data Lakes
Explored technologies for real-time data processing in healthcare data lakes, including:
- Data Ingestion: Platforms like Apache Kafka and Apache Flink.
- Data Storage: Scalable solutions such as HDFS and cloud storage.
- Data Processing: Real-time analytics frameworks.
- Data Mining: Machine learning for predictive analytics and personalized care [7].

• Lakehouse Architecture Innovations
The lakehouse is built on open, direct-access data formats and incorporates features like ACID transactions, data versioning, and indexing. It supports machine learning workloads effectively [8].

• Comparative Reviews
Analyzed strengths and weaknesses of existing DW and DL technologies, highlighting desired features for Lakehouse systems [10].
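
To make the ACID transactions and data versioning noted under Lakehouse Architecture Innovations more concrete, the following is a minimal sketch of an append-only commit log over immutable data files. It is an illustration under stated assumptions, not the implementation of any surveyed system: the class `TableLog`, its methods, and the file names are all hypothetical. Each commit atomically records which files were added or removed, a stale writer is rejected (optimistic concurrency), and any earlier version can be reconstructed by replaying the log.

```python
class TableLog:
    """Toy append-only commit log; commit i produces table version i."""

    def __init__(self):
        self._commits = []

    def latest_version(self):
        """Return the newest version number, or -1 for an empty table."""
        return len(self._commits) - 1

    def commit(self, read_version, add=(), remove=()):
        """Append a commit only if nothing was committed since read_version."""
        if read_version != self.latest_version():
            raise RuntimeError("conflict: table changed since it was read")
        self._commits.append({"add": tuple(add), "remove": tuple(remove)})
        return self.latest_version()

    def snapshot(self, version=None):
        """Replay the log up to `version` to obtain the set of live files."""
        if version is None:
            version = self.latest_version()
        files = set()
        for entry in self._commits[: version + 1]:
            files -= set(entry["remove"])
            files |= set(entry["add"])
        return files


log = TableLog()
v0 = log.commit(-1, add=["part-0.parquet"])
v1 = log.commit(v0, add=["part-1.parquet"])
# Rewrite part-0 into part-2 in one atomic step (for example, a compaction).
v2 = log.commit(v1, add=["part-2.parquet"], remove=["part-0.parquet"])

print(sorted(log.snapshot()))     # current version
print(sorted(log.snapshot(v0)))   # earlier version, reconstructed from the log
```

Because data files are never modified in place, readers of an older version are unaffected by later commits, which is the property that makes versioned reads cheap in this style of design.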

• Query Optimizer as a Service (QOaaS)
Proposed a QOaaS architecture using Substrait to standardize plan specifications, integrated with Microsoft's Fabric ecosystem [11].

• Data Ingestion Patterns
Suggested a design pattern tailored for cloud-based architectures to improve big data ingestion processes [12].

• Photon: A Fast Query Engine
Introduced Photon, a C++ vectorized query engine by Databricks, optimized for SQL and Apache Spark's DataFrame API in Lakehouse environments [13].

• Combining Data Lakes and Data Wrangling
Presented a combined approach that uses data lakes for central storage and data wrangling techniques to ensure data quality [14].
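
Photon is described above as a vectorized query engine. As a rough illustration of what "vectorized" means as an execution model, the sketch below evaluates the same filter-and-aggregate query two ways: one row at a time, and over columnar batches where each operator processes a whole column before the next operator runs. This is an assumption-laden toy in pure Python, with hypothetical function names and data; it says nothing about Photon's actual design, and Python lists do not deliver the performance benefit that a native engine gets from tight loops over contiguous columns.

```python
def sum_revenue_row_at_a_time(rows, min_qty):
    """Interpret the query once per row: filter, compute, accumulate."""
    total = 0.0
    for row in rows:
        if row["qty"] >= min_qty:
            total += row["price"] * row["qty"]
    return total


def to_batches(rows, batch_size):
    """Convert row records into columnar batches of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield {"price": [r["price"] for r in chunk],
               "qty": [r["qty"] for r in chunk]}


def sum_revenue_vectorized(batches, min_qty):
    """Apply each operator to a whole column of the batch at a time."""
    total = 0.0
    for batch in batches:
        keep = [q >= min_qty for q in batch["qty"]]                       # filter
        revenue = [p * q for p, q in zip(batch["price"], batch["qty"])]   # project
        total += sum(r for r, k in zip(revenue, keep) if k)               # aggregate
    return total


# Hypothetical sample data.
rows = [
    {"price": 10.0, "qty": 3},
    {"price": 2.5, "qty": 1},
    {"price": 4.0, "qty": 5},
    {"price": 1.5, "qty": 2},
]

print(sum_revenue_row_at_a_time(rows, 2))               # 53.0
print(sum_revenue_vectorized(to_batches(rows, 2), 2))   # 53.0
```

Both functions return the same result; the difference is only in how the work is organized, which is what allows a native vectorized engine to amortize per-operator overhead across many values.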

Fig 4 Modern Data Warehouse Architecture

B. Accuracy Evaluation Methods:

Aravind Nuthalapati (2024): This paper primarily focuses on best practices and future directions for data lakehouses but does not specify a formal method for accuracy evaluation [1].

Jan Schneider et al. (2024): Evaluates the performance of the lakehouse model using the TPC-DS benchmark, comparing query execution times, data ingestion rates, and resource utilization. The results show that the Lakehouse system built on Parquet is competitive with popular cloud data warehouses [2].

Otmane Azeroual and Radka Nacheva (2023): Conceptual discussion on data mesh and its architectural benefits; however, no formal accuracy evaluation or performance benchmarks are included [3].

Anton Dolhopolov et al. (2024): Discusses federated governance in data mesh architecture but does not provide empirical accuracy evaluations [4].

Olivier Debauche et al. (2021): Reviews cloud and distributed architectures for agriculture data management but lacks empirical accuracy benchmarks [5].

Xue Li et al. (2024): Evaluates GraphAr's performance by benchmarking against conventional Parquet and Acero-based methods. Key metrics include speedup in neighbor retrieval, label filtering, and end-to-end workload efficiency [6].

Mitul Tilala et al. (2022): Explores healthcare data lakes but does not provide formal accuracy evaluation methods [7].

Michael Armbrust et al. (2021): Performance of the Lakehouse system is benchmarked using TPC-DS, demonstrating advanced query performance comparable to cloud data warehouses [8].

Dipankar Mazumdar et al. (2023): Provides conceptual discussions on the benefits of lakehouses without presenting formal accuracy evaluations [9].

Ahmed Harby and Farhana Zulkernine (2022): A comparative review of data warehouse and lakehouse technologies, but no empirical evaluations are reported [10].

Rana Alotaibi et al. (2024): Discusses the potential performance optimizations of Query Optimizer as a Service (QOaaS) but lacks empirical accuracy benchmarks [11].

Chiara Rucco et al. (2024): Proposes a cloud-based design pattern for optimizing data ingestion but does not specify accuracy evaluation methods [12].

Alexander Behm et al. (2022): Benchmarks Photon, a query engine for lakehouses, against cloud data warehouses and engines. Performance metrics include query execution times and resource efficiency [13].

Otmane Azeroual et al. (2022): Combines data lakes with wrangling techniques but does not provide empirical performance or accuracy benchmarks [14].

• Validation and Verification of Proposed Models:
Nuthalapati focuses on best practices and architectural principles for cloud-based lakehouses but does not provide experimental validation or verification [1]. Schneider et al. validate the proposed lakehouse architecture through a comparative analysis with existing data warehouse (DW) and data lake (DL) systems, demonstrating how the lakehouse addresses their limitations [2].

Azeroual and Nacheva discuss the effectiveness of the data mesh approach for enhancing scalability, data integration, and decision-making but do not include empirical validation or real-world testing [3]. Dolhopolov et al. propose integrating federated governance into data mesh architectures for improved data management but lack empirical validation or verification [4]. Debauche et al. analyze cloud and distributed architectures in agriculture data management but do not present validation or verification methodologies [5].

Li et al. validate GraphAr's effectiveness through performance benchmarks, achieving a 3,283× speedup for neighbor retrieval, 6.0× for label filtering, and 29.5× for end-to-end workloads compared to traditional methods [6]. Tilala et al. discuss the potential of real-time data processing in healthcare data lakes to enhance clinical decision-making but do not include empirical validation or testing [7].

Armbrust et al. validate the lakehouse concept through industry trends and logical reasoning, comparing it to existing data management architectures but without real-world empirical testing [8]. Mazumdar et al. provide conceptual insights into lakehouse systems but do not include experimental validation or real-world testing [9]. Harby and Zulkernine compare lakehouse architectures with data warehouses and data lakes but do not include formal validation methodologies [10]. Alotaibi et al. validate the QOaaS concept using prototypes and its integration within the Fabric ecosystem but do not detail specific validation techniques [11]. Rucco et al. propose a cloud-based design pattern for optimizing data ingestion but do not provide validation or empirical testing details [12]. Behm et al. validate Photon's performance through benchmarks, demonstrating significant speed improvements over existing cloud data warehouses in SQL workloads [13]. Azeroual et al. suggest that integrating data lakes and data wrangling processes enhances data quality but do not include empirical validation [14].

III. RESULTS AND FINDINGS OF THE STUDIES

The studies highlight advancements in data management and architecture, focusing on scalable solutions like lakehouses and data meshes. Aravind Nuthalapati (2024) demonstrates the scalability and cost-efficiency of cloud-based lakehouse architectures. Jan Schneider et al. (2024) address challenges in traditional systems and enhance performance by unifying analytics and warehousing. Otmane Azeroual & Radka Nacheva (2023) advocate for decentralized data ownership through data mesh architecture. Anton Dolhopolov et al. (2024) emphasize federated governance to improve data management. In agriculture, Olivier Debauche et al. (2021) find cloud integration enhances centralized data management. Xue Li et al. (2024) improve graph data management in data lakes with GraphAr, while Mitul Tilala et al. (2022) explore real-time data processing in healthcare. Michael Armbrust et al. (2021) present lakehouses as solutions to data staleness and reliability. Dipankar Mazumdar et al. (2023) show lakehouses support structured and unstructured data workloads. Ahmed Harby & Farhana Zulkernine (2022) discuss lakehouse strengths in efficient data processing. Rana Alotaibi et al. (2024) propose QOaaS to optimize query execution. Chiara Rucco et al. (2024) explore a cloud-based design pattern for data ingestion. Alexander Behm et al. (2022) report up to 12x query performance improvements with Photon. Otmane Azeroual et al. (2022) combine data wrangling with data lakes to enhance data quality.

IV. LIMITATIONS AND FUTURE WORK

• Aravind Nuthalapati (2024)
- Limitations: The paper discusses the general advantages of cloud-based lakehouse architectures but does not delve deeply into challenges such as handling massive datasets and real-time data processing.
- Future Work: Further research should focus on improving system flexibility, handling more complex data types, and integrating AI or machine learning to enhance data insights [1].

• Jan Schneider et al. (2024)
- Limitations: While the lakehouse model addresses key challenges, the paper does not explicitly discuss performance degradation under large-scale data or the integration of machine learning workflows.
- Future Work: Research should focus on scalability challenges, real-time data handling, and enhancing integration with machine learning systems to improve decision-making [2].

• Otmane Azeroual and Radka Nacheva (2023)
- Limitations: The paper does not explicitly identify specific limitations but suggests the need for empirical validation of the data mesh approach in real-world environments.
- Future Work: Further research could include empirical studies to evaluate the practical effectiveness of the proposed model in diverse organizational contexts [3].

• Anton Dolhopolov et al. (2024)
- Limitations: The paper does not provide extensive real-world examples or data to support the federated governance model in practice.
- Future Work: Future research should focus on scalability issues and validating the effectiveness of federated governance across larger datasets in various domains [4].

• Olivier Debauche et al. (2021)
- Limitations: The paper highlights the benefits of cloud and distributed architectures in agriculture but does not delve into the scalability of these solutions under large-scale deployments or integration challenges.
- Future Work: Future studies could include empirical evaluations of these architectures in real agricultural settings, focusing on scalability and integration with other technologies [5].

• Xue Li et al. (2024)
- Limitations: Graph data storage schemes in data lakes need further refinement for larger datasets, and the approach does not discuss performance issues when scaling.
- Future Work: Future research should explore the scalability of GraphAr, especially with very large datasets, and enhance integration with distributed systems [6].

• Mitul Tilala et al. (2022)
- Limitations: The paper focuses on real-time data processing in healthcare but does not address challenges in scaling real-time systems or integration with legacy healthcare systems.
- Future Work: Future research should examine scalability in large healthcare systems and explore integration with AI-driven diagnostic tools [7].

• Michael Armbrust et al. (2021)
- Limitations: The paper presents the lakehouse as a solution but acknowledges that real-world performance and the practicality of large-scale implementation require further evaluation.
- Future Work: Future research should explore additional features, optimize performance for various data workloads, and address challenges faced during implementation [8].

• Dipankar Mazumdar et al. (2023)
- Limitations: The article highlights benefits but does not explore the specific challenges in real-world deployments or the cost implications of lakehouses at scale.
- Future Work: Future work should involve empirical validation of the lakehouse model in diverse industries and focus on cost optimization and real-world scalability [9].

• Ahmed Harby and Farhana Zulkernine (2022)
- Limitations: The paper does not provide detailed case studies or empirical evidence regarding the actual deployment of lakehouses in large-scale systems.
- Future Work: Future studies should implement the architecture in real-world scenarios to assess scalability and performance across industries [10].

• Rana Alotaibi et al. (2024)
- Limitations: While QOaaS is promising, the paper acknowledges the challenge of implementing flexible cardinality estimation and adapting it to different cost models.
- Future Work: Research should focus on prototyping QOaaS, refining its approach, and evaluating its real-world performance in large systems [11].

• Chiara Rucco et al. (2024)
- Limitations: The paper suggests using a cloud-based design pattern for data ingestion but does not explore the limitations in processing speed or data variety under high-load scenarios.
- Future Work: Future research should address the scalability of the ingestion pattern and integrate AI-driven optimizations for processing diverse data types [12].

• Alexander Behm et al. (2022)
- Limitations: The study focuses on Photon's query engine performance but does not discuss its scalability issues or its effectiveness across different data workloads.
- Future Work: Future research could focus on optimizing Photon for a broader range of workloads and exploring integration with other data processing frameworks [13].

• Otmane Azeroual et al. (2022)
- Limitations: The paper focuses on combining data lakes with wrangling but does not deeply analyze real-time processing challenges or large-scale implementation constraints.
- Future Work: Future work could involve empirical validation of the proposed model in real-world CRIS implementations [14].

V. CONCLUSION

The studies collectively highlight advancements in data architecture, emphasizing scalability, integration, and performance improvements. Lakehouses unify data lakes and warehouses, addressing challenges like data staleness, cost, and diverse workloads. Innovations such as data mesh architecture, federated governance, GraphAr, and QOaaS enhance data management, decision-making, and query optimization. Applications in healthcare, agriculture, and big data scenarios demonstrate improvements in real-time processing, data quality, and ingestion efficiency. These findings underscore the transformative potential of modern data systems in addressing diverse industry needs.

REFERENCES

[1]. Nuthalapati, A. (2024). Architecting data lake-houses in the cloud: Best practices and future directions. 32 citations.
[2]. Schneider, J., Gröger, C., Lutsch, A., Schwarz, H., & Mitschang, B. (2024). The Lakehouse: State of the Art on Concepts and Technologies. 119 citations.
[3]. Azeroual, O., & Nacheva, R. (2023). Data Mesh for Managing Complex Big Data Landscapes and Enhancing Decision Making in Organizations. 31 citations.
[4]. Dolhopolov, A., Castelltort, A., & Laurent, A. (2024). Implementing Federated Governance in Data Mesh Architecture. 40 citations.
[5]. Debauche, O., Mahmoudi, S., Manneback, P., & Lebeau, F. (2021). Cloud and Distributed Architectures for Data Management in Agriculture 4.0: Review and Future Trends. 55 citations.
[6]. Li, X., Zeng, W., Wang, Z., Zhu, D., Xu, J., Yu, W., & Zhou, J. (2024). GraphAr: An Efficient Storage Scheme for Graph Data in Data Lakes. 77 citations.
[7]. Tilala, M., Pamulaparthyvenkata, S., Chawda, A. D., & Benke, A. P. (2022). Explore the Technologies and Architectures Enabling Real-Time Data Processing Within Healthcare Data Lakes. 30 citations.
[8]. Armbrust, M., Ghodsi, A., Xin, R., & Zaharia, M. (2021). Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. 53 citations.
[9]. Mazumdar, D., Hughes, J., & Onofré, J. B. (2023). The Data Lakehouse: Data Warehousing and More. 31 citations.
[10]. Harby, A., & Zulkernine, F. (2022). From Data Warehouse to Lakehouse: A Comparative Review. 31 citations.
[11]. Alotaibi, R., Tian, Y., Grafberger, S., Camacho-Rodríguez, J., Bruno, N., Kroth, B., et al. (2024). Towards Query Optimizer as a Service (QOaaS) in a Unified LakeHouse Ecosystem. 41 citations.
[12]. Rucco, C., Longo, A., & Saad, M. (2024). Optimizing Data Ingestion for Big Data: A Cloud-Based Design Pattern Approach. 32 citations.
[13]. Behm, A., Palkar, S., Agarwal, U., Armstrong, T., Cashman, D., Dave, A., et al. (2022). Photon: A Fast Query Engine for Lakehouse Systems. 59 citations.
[14]. Azeroual, O., Schöpfel, J., Ivanovic, D., & Nikiforova, A. (2022). Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS. 35 citations.
