Storage
Storage
Unit - 2
Prepared By
Dr M Praneesh
Sri Ramakrishna College of Arts & Science
Data Storage
• Data lakes are centralized repositories that store vast amounts of raw,
unprocessed data in its native format, enabling organizations to perform
advanced analytics, machine learning, and data exploration. Technologies like
Data Apache Hadoop, Apache Spark, and AWS Glue are commonly used for building
Lake and managing data lakes, offering support for batch and real-time data
processing, data ingestion, and data governance.
• Distributed file systems like Hadoop Distributed File System (HDFS) and Google
File File System (GFS) provide scalable, fault-tolerant storage solutions for
Syste distributed computing environments. They are optimized for storing and
m processing large datasets across multiple nodes in a distributed computing
cluster, supporting parallel data processing and fault tolerance.
Data Access Frequency
Compatibility Scalability