BDA Mod-1

The document discusses the concept of data, its storage, and scalability in systems, particularly in the context of Google Cloud Platform's BigQuery service. It outlines the architecture of BigQuery, detailing its various layers including data storage, execution engine, query processing, and security management. Additionally, it contrasts traditional data storage systems with Big Data storage solutions, highlighting the use of NoSQL databases for handling large volumes of semi-structured data.

Uploaded by

Aditya Aryan

Data is information, usually in the form of facts or statistics, that one can analyze or use
for further calculations. It can be stored and used by a computer program, and it may be
presented as numbers, letters, or other forms. Data typically comes from a series of
observations, measurements, or facts, including behavioral observations.

Scalability is the capability of a system to handle a workload in proportion to the
magnitude of the work.
• System capability needs to grow as workloads increase.
• When the workload and complexity exceed the system capacity, scale the system up
(add resources to existing machines) and scale it out (add more machines).
• Scalability enables an increase or decrease in the capacity of data storage, processing,
and analytics.
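
The two scaling directions can be illustrated with a small capacity calculation. This is only a sketch: the function names and the workload and capacity figures are hypothetical, and real capacity planning also has to account for overheads that this arithmetic ignores.

```python
import math


def nodes_needed(workload: float, capacity_per_node: float) -> int:
    """Scale out: how many identical nodes are needed to cover the workload."""
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    # Always keep at least one node, even for an empty workload.
    return max(1, math.ceil(workload / capacity_per_node))


def capacity_needed_per_node(workload: float, nodes: int) -> float:
    """Scale up: capacity each node must reach if the node count stays fixed."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return workload / nodes


# Hypothetical figures: 1000 requests/s against nodes that handle 300 each.
scale_out_nodes = nodes_needed(1000, 300)
scale_up_capacity = capacity_needed_per_node(1000, 2)
```

Scaling out raises the node count while each node stays the same size; scaling up keeps the node count and raises the capacity of each node.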

In the context of Google Cloud Platform's BigQuery service, the architecture can be
understood through the various layers that contribute to its functionality. BigQuery is a
fully-managed, serverless data warehouse that enables super-fast SQL queries using
the processing power of Google's infrastructure. The primary layers in its architecture
are as follows:

1. Data Storage Layer:

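Colossus and Google Cloud Storage (GCS): BigQuery keeps its managed tables on
Colossus, Google's distributed file system, in a columnar format. Raw data in its
native format, often Parquet or ORC, is typically staged in GCS, from where BigQuery
can load it or query it as an external table. Both allow scalable and cost-effective
storage of large datasets.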

2. Execution Engine Layer:

Dremel Engine: Dremel is the underlying execution engine of BigQuery. It is a
highly scalable, interactive ad-hoc query system designed for analysis of
read-only nested data.

3. Query Execution Layer:

Query Processor: This layer handles the execution of SQL-like queries on the
stored data. It's responsible for parsing and optimizing queries, and then
coordinating the parallel execution of these queries across multiple nodes.

4. Storage Management Layer:

Capacitor: This layer manages the storage of columnar, compressed data in a
way that allows for efficient query processing. Capacitor is responsible for
managing the storage and retrieval of data during query execution.

5. Metadata Layer:

Catalog: The metadata layer, or catalog, manages the metadata associated
with datasets, tables, and other objects in BigQuery. It keeps track of the
schema, table locations, and other essential information needed for query
planning and execution.

6. User Interface (UI) Layer:

Web UI and API: BigQuery provides a user-friendly web interface for interactive
querying and exploration. Additionally, it offers APIs for programmatic access,
allowing integration with various applications and services.

7. Security and Access Control Layer:

Identity and Access Management (IAM): Google Cloud IAM is used for
controlling access to BigQuery resources. It enables fine-grained access
control, ensuring that only authorized users and applications can interact with
specific datasets and tables.

8. Integration Layer:

Integration with Other GCP Services: BigQuery integrates seamlessly with
other Google Cloud Platform services, facilitating data transfer, analysis, and
visualization. For example, it can be integrated with Google Data Studio (now
Looker Studio) for creating interactive dashboards or with Cloud Composer for
orchestrating data workflows.

Understanding these layers provides a comprehensive view of how BigQuery functions
within the broader architecture of Google Cloud Platform's big data services.
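
Two of the ideas in the layers above, columnar storage and parallel query execution, can be sketched in a few lines. This is an illustrative toy, not how Capacitor or Dremel are implemented: the sample rows, the function names, and the use of a thread pool as a stand-in for multiple nodes are all assumptions made for the example.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row-oriented records, as an application might produce them.
rows = [
    {"user": "a", "country": "IN", "amount": 10},
    {"user": "b", "country": "US", "amount": 25},
    {"user": "c", "country": "IN", "amount": 5},
    {"user": "d", "country": "US", "amount": 40},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def parallel_sum(values, workers=2):
    """Split a column into chunks, sum each chunk in a worker, then combine."""
    size = max(1, math.ceil(len(values) / workers))
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))


columns = to_columns(rows)
# An aggregate over "amount" reads only that column, not whole rows.
total = parallel_sum(columns["amount"])
```

Because each column is stored contiguously, a query that aggregates one field can skip the others entirely, and the per-chunk partial sums mirror the way a query processor combines results from parallel workers.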

1.6 Data Storage and Analysis
1.6.1 Data Storage and Management: Traditional Systems

Traditional systems handle structured or semi-structured data.

Relational Database Management Systems (RDBMS) like MySQL and DB2 are
common.

These systems use Structured Query Language (SQL) for data management.

Working with them involves creating schemas and catalogs and using Data Definition
Language (DDL) and Data Manipulation Language (DML).

Distributed Database Management Systems (DDBMS) facilitate cooperation
between databases across a network.

In-memory column and row formats optimize data retrieval for different purposes.

Enterprise data servers integrate data from various sources into a data warehouse.

The data warehouse supports business processes like analytics, reporting, and business intelligence.
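
The DDL and DML roles mentioned above can be shown with a short example. It is a minimal sketch using Python's built-in `sqlite3` module in place of MySQL or DB2; the `orders` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database, so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# DDL: define the schema before any data is stored.
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer TEXT NOT NULL, "
    "amount REAL NOT NULL)"
)

# DML: insert rows into the table defined above.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 80.0), ("alice", 30.0)],
)
conn.commit()

# Query: aggregate the stored rows, the kind of result a report would use.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()
```

The schema is fixed up front by the DDL statement, which is the main contrast with the flexible data models of the Big Data stores described next.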

1.6.2 Big Data Storage

Big Data storage often involves NoSQL databases.

NoSQL databases handle semi-structured data with flexible data models.

NoSQL databases don't rely on SQL and feature key/value pairs, hash tables, or
ordered keys.

They typically do not use joins, and they support fault-tolerant data storage through replication.

They may relax ACID rules during transactions.

Suited for the unique demands of massive volumes and diverse types of data in Big
Data environments.
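
The key/value model and replication-based fault tolerance described above can be sketched as a toy in-process store. The class name, the hash-based placement scheme, and the failure simulation are assumptions made for illustration and do not describe any particular NoSQL product.

```python
import hashlib


class ReplicatedKVStore:
    """Toy key/value store that writes each key to several nodes."""

    def __init__(self, node_count=3, replicas=2):
        if not 1 <= replicas <= node_count:
            raise ValueError("replicas must be between 1 and node_count")
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = replicas
        self.down = set()  # indexes of nodes simulated as failed

    def _owners(self, key):
        """Pick the nodes for a key: hash to a start node, then its successors."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        """Write the value to every live replica; values need no fixed schema."""
        for index in self._owners(key):
            if index not in self.down:
                self.nodes[index][key] = value

    def get(self, key):
        """Read from the first live replica that holds the key."""
        for index in self._owners(key):
            if index not in self.down and key in self.nodes[index]:
                return self.nodes[index][key]
        raise KeyError(key)

    def fail(self, index):
        """Simulate a node failure."""
        self.down.add(index)


store = ReplicatedKVStore()
store.put("user:1", {"name": "Asha", "tags": ["new"]})
store.fail(store._owners("user:1")[0])  # take down the first replica
survivor_value = store.get("user:1")    # still served by the second replica
```

Lookups go directly by key, with no joins, and a read succeeds as long as at least one replica of the key is still available.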
