BDA Mod-1
Data is information, usually in the form of facts or statistics, that one can analyze or use
for further calculations. It may be presented as numbers, letters, or other forms, and it
may come from a series of observations, measurements, or facts, including behavioral
observations. Data can be stored and used by a computer program.
Scalability is the capability of a system to handle a workload in proportion to the
magnitude of the work.
System capacity needs to increase as the workload increases.
When the workload and complexity exceed the system capacity, scale it up (add resources
to an existing machine) and scale it out (add more machines).
Scalability enables an increase or decrease in the capacity of data storage, processing,
and analytics.
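The difference between scaling up and scaling out can be sketched with simple capacity arithmetic. This is only an illustration: the workload figure, the per-node capacity, and the function names `scale_up` and `nodes_needed` are assumptions made for the example, not values from these notes.

```python
import math


def scale_up(node_capacity: float, factor: float) -> float:
    """Vertical scaling: one node gets `factor` times more capacity."""
    return node_capacity * factor


def nodes_needed(workload: float, node_capacity: float) -> int:
    """Horizontal scaling: identical nodes required to cover the workload."""
    return math.ceil(workload / node_capacity)


# Hypothetical numbers, in arbitrary "units of work per second".
workload = 950.0
node_capacity = 100.0

bigger_node = scale_up(node_capacity, 4)              # one node, 4x larger
cluster_size = nodes_needed(workload, node_capacity)  # many small nodes

print(bigger_node >= workload)  # is the scaled-up node enough on its own?
print(cluster_size)             # nodes required when scaling out
```

In this sketch the scaled-up node still falls short of the workload, which is the situation where scaling out across several nodes becomes necessary.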
In the context of Google Cloud Platform's BigQuery service, the architecture can be
understood through the various layers that contribute to its functionality. BigQuery is a
fully managed, serverless data warehouse that runs fast SQL queries using the
processing power of Google's infrastructure. The primary layers in its architecture
are as follows:
Google Cloud Storage (GCS): This is where the raw data is stored in its native
format, often in Parquet or ORC. GCS provides scalable and cost-effective storage
for large datasets, which can be loaded into BigQuery's own managed columnar
storage or queried in place as external tables.
Query Processor: This layer handles the execution of SQL-like queries on the
stored data. It's responsible for parsing and optimizing queries, and then
coordinating the parallel execution of these queries across multiple nodes.
5. Metadata Layer:
Web UI and API: BigQuery provides a user-friendly web interface for interactive
querying and exploration. Additionally, it offers APIs for programmatic access,
allowing integration with various applications and services.
Identity and Access Management (IAM): Google Cloud IAM is used for
controlling access to BigQuery resources. It enables fine-grained access
control, ensuring that only authorized users and applications can interact with
specific datasets and tables.
8. Integration Layer:
1.6 Data Storage and Analysis
1.6.1 Data Storage and Management: Traditional Systems
Relational Database Management Systems (RDBMS) like MySQL and DB2 are
common.
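Storage involves creating schemas and catalogs, and using Data Definition Language
(DDL) to define structures and Data Manipulation Language (DML) to change the data.
In-memory column and row formats optimize data retrieval for different purposes:
column formats suit analytical scans, while row formats suit record-level lookups.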
Enterprise data servers integrate data from various sources into a data warehouse.
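As a minimal sketch of DDL versus DML, the example below uses Python's built-in `sqlite3` module as a stand-in for an RDBMS such as MySQL or DB2. The `student` table and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the schema (structure) of a table.
cur.execute(
    """
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        grade REAL
    )
    """
)

# DML: insert, update, and delete rows inside that structure.
cur.executemany(
    "INSERT INTO student (id, name, grade) VALUES (?, ?, ?)",
    [(1, "Asha", 8.5), (2, "Ravi", 7.0), (3, "Meera", 9.1)],
)
cur.execute("UPDATE student SET grade = ? WHERE id = ?", (7.5, 2))
cur.execute("DELETE FROM student WHERE id = ?", (3,))
conn.commit()

# Query the remaining rows.
rows = cur.execute("SELECT id, name, grade FROM student ORDER BY id").fetchall()
print(rows)
conn.close()
```

The `CREATE TABLE` statement is DDL because it changes the structure of the database; the `INSERT`, `UPDATE`, and `DELETE` statements are DML because they change only the data held in that structure.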
NoSQL databases don't rely on SQL and feature key/value pairs, hash tables, or
ordered keys.
They typically do not use JOINs, and they support fault-tolerant data storage through
replication.
They are suited to the massive volumes and diverse types of data found in Big Data
environments.
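The key/value model and replication-based fault tolerance can be sketched with a toy in-memory store. This is an assumption-laden illustration, not the design of any particular NoSQL product: the class `ReplicatedKVStore`, its methods, and the placement rule (hash the key, then use the next nodes in order) are all hypothetical.

```python
class ReplicatedKVStore:
    """Toy key/value store that copies each key to several nodes."""

    def __init__(self, num_nodes: int = 3, replicas: int = 2) -> None:
        if not 1 <= replicas <= num_nodes:
            raise ValueError("replicas must be between 1 and num_nodes")
        self.nodes = [{} for _ in range(num_nodes)]  # one dict per node
        self.down = set()                            # indexes of failed nodes
        self.replicas = replicas

    def owners(self, key):
        """Node indexes holding `key`: hash to a start node, then its neighbours."""
        first = hash(key) % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value) -> None:
        """Write the value to every live replica of the key."""
        for n in self.owners(key):
            if n not in self.down:
                self.nodes[n][key] = value

    def get(self, key):
        """Read from the first live replica that has the key."""
        for n in self.owners(key):
            if n not in self.down and key in self.nodes[n]:
                return self.nodes[n][key]
        raise KeyError(key)

    def fail_node(self, n: int) -> None:
        """Simulate a node failure."""
        self.down.add(n)


store = ReplicatedKVStore(num_nodes=3, replicas=2)
store.put("user:1", {"name": "Asha"})

# Take down the primary copy; the read is served by the second replica.
store.fail_node(store.owners("user:1")[0])
value = store.get("user:1")
print(value)
```

Because every key lives on two nodes in this sketch, losing one of them does not lose the data. Lookups go straight to the key, with no JOIN between tables.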