Data Engineering Notes
Agenda:
Data
Data Pipeline
Data Storage
Big Data
Types of Data
- Categorical
- Numerical
  - Continuous
    - Ratio: measurable with a true zero (e.g., height)
Data Pipeline
Data Storage for Analytics
| | Data Lake | Database | Data Warehouse |
| --- | --- | --- | --- |
| Cost | $ | $$ | $$$ |
| Use Cases | Exploratory analysis; Machine learning; Data mining; Data science research | Real-time data processing; High transactional throughput; Strong data consistency | Reporting; Analytics; BI |
Data Integrity
A → Atomicity: The entire transaction takes place at once or doesn't happen at all
C → Consistency: The database must be consistent before and after the transaction
I → Isolation: Multiple transactions occur independently, without interference
D → Durability: The changes of a successful transaction persist even if a system failure occurs
Cloud Computing
The 6 V’s
- Volume: refers to the quantity of data produced or gathered (Gartner). Storage and processing needs must be addressed.
- Variety: the complexity of the data; different data types and sources. Data engineers need tools that handle a variety of data formats in different locations.
- Velocity: the speed at which data is generated and processed; the frequency of generation, handling, recording, and publishing; the shift from batch processing to online processing.
- Veracity (data quality): origin; reliability of source and previous processing; volatility; validity.
- Valence: implies connectedness; two data items are connected when there is some relationship between them. Valence = data connections / total number of possible connections.
- Value: the ultimate goal of data science and engineering. Value refers to the valuable insights gained from the ability to investigate and identify new patterns and trends from high-volume cross-platform systems.
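A quick worked example of the valence ratio (the numbers here are invented for illustration): among $n = 5$ data items there are $\binom{5}{2} = 10$ possible pairwise connections, so 3 actual connections give

$$\text{valence} = \frac{\text{data connections}}{\text{possible connections}} = \frac{3}{10} = 0.3$$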
Agenda:
Data Architecture
5. Cost optimization
6. Sustainability
Hadoop Used when the data volume exceeds the available memory; ideal for data exploration, filtration, sampling, and summarization. Components include HDFS, MapReduce, and YARN
Apache Spark Faster alternative to MapReduce that can handle batch and real-time data and is flexible to work with HDFS and Cassandra
Apache Cassandra Processes structured data with fault-tolerance on cloud infrastructure and commodity hardware
MongoDB NoSQL database management system that stores data in flexible, JSON-like documents, making it easy to handle and scale large volumes of diverse and unstructured data
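A minimal sketch of MongoDB's document model using the pymongo driver (assumes a MongoDB instance reachable on localhost; the database, collection, and field names are invented):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["demo"]

# Documents are flexible, JSON-like records: no predefined schema required.
db.dogs.insert_one({"name": "Rex", "city": "Leeds", "hobbies": ["fetch"]})
db.dogs.insert_one({"name": "Ada", "vaccinated": True})  # different fields are fine

print(db.dogs.find_one({"name": "Rex"}))
```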
Data Legislation
The AI Act
The AI Act is a proposed European law on Artificial Intelligence, the first law on AI by a major regulator anywhere.
Example: Subliminal Manipulation
An inaudible sound is played in truck drivers' cabins to push them to drive longer than healthy and safe. AI is used to find the frequency maximising this effect on drivers, resulting in physical/psychological harm.
Example: General Purpose Social Scoring
An AI system identifies at-risk children in need of social care based on insignificant or irrelevant social 'misbehaviour' of parents.
Example: Remote biometric identification for law enforcement purposes in publicly accessible spaces (with exceptions)
All faces captured live by video cameras are checked, in real time, against a database to identify a terrorist.
- The AI Act requires high-quality, unbiased data for training AI systems, which necessitates robust data governance practices to ensure data accuracy, consistency, and fairness in big data environments.
- Organizations must maintain transparency about the data sources and processing methods used in AI models, leading to more rigorous documentation and auditing of big data processes.
- The AI Act enforces strict data privacy and security measures affecting how big data is collected, stored, and processed, ensuring compliance with privacy regulations.
Lecture 3: Data Models 1
Agenda:
Data sources
Constraints
Big Data can contain sensitive and valuable business information (assets)
Financial records
Medical History
Educational Records
Job History
The principle of least privilege requires that a person or system only be given the privileges and data necessary to complete the immediate tasks required of them, and nothing more.
Migrating to the cloud is not a security guarantee; it follows a shared responsibility model:
Provider → responsible for securing the underlying cloud infrastructure
User → responsible for securing the applications and systems in the cloud
Data Security
Data security protects digital information in the data pipeline from unauthorized access, corruption, or theft
Security measures are complemented by administrative access controls & organizational policies and procedures
Encryption
Data resiliency
Backups
Employee education
Data Encryption
Cryptographic Key → A string of characters (mathematically generated) that is fed to a cryptographic algorithm
(encryption/decryption) to secure data
Encryption using a single shared key is known as symmetric encryption; encryption using a key pair (public/private) is asymmetric encryption
Authentication → assumes only authorized users have the private keys in asymmetric encryption
```mermaid
sequenceDiagram
    participant Client
    participant Server
    Client->>Server: request public key
    Server-->>Client: public key
    Client->>Server: data encrypted with public key
```
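As a concrete illustration, a minimal symmetric-encryption sketch in Python, assuming the third-party `cryptography` package is installed (the message and variable names are invented):

```python
from cryptography.fernet import Fernet

# Symmetric encryption: the same key both encrypts and decrypts.
key = Fernet.generate_key()  # mathematically generated key string
cipher = Fernet(key)

token = cipher.encrypt(b"customer record #42")  # ciphertext, safe to store
plaintext = cipher.decrypt(token)               # requires the same key

assert plaintext == b"customer record #42"
```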
Data governance policies and procedures can then be established.
| | Characteristics | Resides in | Generated by | Typical applications | Examples |
| --- | --- | --- | --- | --- | --- |
| Structured Data | Predefined data models; usually text-only; easy to search | Relational databases; data warehouses | Humans or machines | Airline reservation systems; inventory control; CRM systems; ERP systems | Dates; phone numbers; social security numbers; credit card numbers; customer names; addresses; product names and numbers; transaction information |
| Unstructured Data | No predefined data model; may include text, images, sound, video, or other formats; difficult to search | Applications; NoSQL databases; data warehouses; data lakes | Humans or machines | Word processing; presentation software; email clients; tools for viewing or editing media | Text files; reports; email messages; audio files; video files; images; surveillance imagery |
Data Constraints
| Constraint | Example |
| --- | --- |
| Value | Age ≥ 18 |
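A value constraint like this can be enforced directly by the database; a minimal sketch using Python's built-in sqlite3 (the table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A CHECK constraint enforces the value rule Age >= 18 at the database level.
conn.execute("CREATE TABLE applicants (name TEXT, age INTEGER CHECK (age >= 18))")

conn.execute("INSERT INTO applicants VALUES ('Ada', 32)")      # accepted
try:
    conn.execute("INSERT INTO applicants VALUES ('Tim', 15)")  # rejected
except sqlite3.IntegrityError as err:
    print("Constraint violated:", err)
```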
Agenda:
Definitions
UML Diagrams
Information Systems
A system has a boundary, outside of which are external entities (people, systems) that interact with the system in focus
Within this system are components (systems, people) that implement that system’s behaviour
These interactions and internal system behaviours involve data being exchanged, transformed and stored
Any system exists within a broader context that needs to be understood and the system should be defined in terms of
how it delivers functionality and services within this context, usually in the form of requirements
Levels of Data Modelling
Conceptual: What problems are involved with the business and require immediate solutions? Captures entities, attributes, and relationships.
Logical: What should the collection look like? Captures entities, attributes, key groups, primary keys, foreign keys, and their relationships to each other.
Physical: Specific to a DBMS. Includes table structures, column names, column data types, primary keys, column constraints, and relationships between tables.
Agenda:
```mermaid
---
title: The process for creating a DFD
---
%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#0a0a0a',
      'primaryTextColor': '#fff',
      'primaryBorderColor': '#fff',
      'lineColor': '#fff',
      'tertiaryTextColor':'#fff'
    }
  }
}%%
graph LR
  step1["`Identify business data objects`"]
  step2["`Identify processes`"]
  step3["`Identify external entities`"]
  step4["`Tie diagram together`"]
  step1 --> step2 --> step3 --> step4
```
ERD Elements
Entity: A class of persons, places, objects, events, or concepts about which we need to capture and store data.
Attribute: A descriptive property or characteristic of an entity. Also known as an element, property, or field.
Cardinality: The minimum and maximum number of occurrences of one entity that may be related to a single occurrence of the other entity.
Cardinality Interpretation:
Exactly one (one and only one):
Minimum Instances: 1
Maximum Instances: 1
Zero or one:
Minimum Instances: 0
Maximum Instances: 1
One or more:
Minimum Instances: 1
Maximum Instances: many (>1)
Zero, one, or more:
Minimum Instances: 0
Maximum Instances: many (>1)
Agenda:
Systems Development
Hardware
Software
Data
Procedures
People
A system development lifecycle (SDLC) is a framework describing a process for understanding, planning, building,
testing, and deploying an information system
Example Roles
sequence diagrams
component diagrams
A linear approach describes a sequence of tasks that are completed in order, only moving to the next step once the previous step is complete (e.g. waterfall, V-model, incremental):
- breaks down the problem into distinct stages, each with a clear purpose
- everything is agreed in advance of being used, with no need to revisit later

An evolutionary approach evolves the solution through progressive versions, each more complete than the last, and often uses a prototyping approach to development (e.g. iterative, spiral):
- early delivery of value to the customer, either working versions or knowledge of project risk
- copes well with complex requirements that are fast changing or uncertain
Model Types
Incremental Life Cycle, Iterative Life Cycle, Boehm's Spiral Life Cycle

Factors in choosing a model:
- Complexity of problem
- Team experience
- Stability of requirements
- Customer involvement
- Uniqueness
[ERD example: employees, employee_roles, jobs, and states tables; attributes include employee_id, name, home_state, and state_code, with home_state referencing state_code in the states table]
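The recoverable structure of that diagram can be expressed as DDL; a minimal sketch using Python's built-in sqlite3, where the column types and exact keys are assumptions rather than details taken from the original figure:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (
    state_code TEXT PRIMARY KEY,
    name       TEXT
);
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT,
    home_state  TEXT REFERENCES states(state_code)  -- foreign key to states
);
CREATE TABLE jobs (
    job_id INTEGER PRIMARY KEY,
    name   TEXT
);
-- Associative table resolving a many-to-many between employees and jobs.
CREATE TABLE employee_roles (
    employee_id INTEGER REFERENCES employees(employee_id),
    job_id      INTEGER REFERENCES jobs(job_id),
    PRIMARY KEY (employee_id, job_id)
);
""")
```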
Lecture 8: RDBMS & SQL
Agenda:
ACID
SQL
Example Queries
ACID
ACID is a database design principle which defines how transactions are managed, specifically in a relational database.

- Atomicity: All operations will always succeed or fail completely; no partial transactions.
- Consistency: Ensures that the database will always remain in a consistent state by ensuring that only data that conforms to the constraints of the database schema can be written to the database.
- Isolation: Ensures that the results of a transaction are not visible to other operations until it is complete.
- Durability: Ensures that the results of an operation are permanent. Once a transaction has been committed, it cannot be rolled back, irrespective of any system failure.
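A minimal sketch of atomicity and durability using Python's built-in sqlite3 (the table and amounts are invented); either both statements commit or neither does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

try:
    # One logical transaction: move 50 from account 1 to account 2.
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")
    conn.commit()      # durability: changes persist once committed
except sqlite3.Error:
    conn.rollback()    # atomicity: on failure, neither update survives
```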
CRUD Operations
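CRUD (Create, Read, Update, Delete) maps onto SQL's INSERT, SELECT, UPDATE, and DELETE; a minimal sketch with Python's built-in sqlite3 (the table and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT)")

conn.execute("INSERT INTO pets (name) VALUES ('Rex')")          # Create
rows = conn.execute("SELECT id, name FROM pets").fetchall()     # Read
conn.execute("UPDATE pets SET name = 'Rexy' WHERE id = 1")      # Update
conn.execute("DELETE FROM pets WHERE id = 1")                   # Delete
conn.commit()
```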
Agenda:
Characteristics of NoSQL
Characteristics of NoSQL
| Characteristic | Description |
| --- | --- |
| Schema-less Data Model | Allows storage of data without a predefined schema, enabling flexibility in handling diverse and evolving data types. |
| Scale Out Rather Than Scale Up | Supports horizontal scaling by adding more nodes to the database cluster, as opposed to upgrading a single node's hardware. |
| Highly Available | Built on cluster-based technologies that ensure fault tolerance and high availability by replicating data across multiple nodes. |
| Lower Operational Costs | Often based on open-source platforms with no licensing fees and designed to run on cost-effective commodity hardware. |
| Eventual Consistency | Ensures that while data may not be immediately consistent across nodes after a write, it will eventually reach a consistent state. |
| BASE, Not ACID | Prioritizes availability and scalability (BASE model) over strict consistency (ACID model), with databases designed to eventually reach consistency. |
| API-Driven Data Access | Data access is typically through APIs, including RESTful APIs, with some databases offering SQL-like query capabilities. |
| Auto Sharding and Replication | Automatically partitions (shards) data across multiple nodes and replicates it to ensure high availability and support horizontal scaling. |
| Integrated Caching | Comes with built-in caching mechanisms, reducing the need for external caching solutions like Memcached. |
| Distributed Query Support | Maintains consistent query performance across multiple shards in a distributed environment. |
| Polyglot Persistence | Supports using multiple storage technologies (NoSQL and RDBMS) within the same application, allowing for a flexible approach to data persistence. |
| Aggregate-Focused | Stores de-normalized, aggregated data to eliminate the need for complex joins and mappings, although graph databases are an exception to this approach. |
CAP Theorem
A distributed data store can provide at most two of the following three guarantees at once: Consistency, Availability, and Partition tolerance.
BASE Principle
BASE is a database design principle based on the CAP theorem and leveraged by database systems that use
distributed technology.
basically available → database will always acknowledge a client’s request, either in the form of the requested
data or a success/failure notification
soft state → database may be in an inconsistent state when data is read; thus, the results may change if the
same data is requested again
eventual consistency → state in which reads by different clients, immediately following a write to the
database, may not return consistent results. Database only attains consistency once the changes have been
propagated to all nodes
Types of NoSQL databases:
- Key-value
- Document
- Column-family
- Graph
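To give a rough feel for the four data models, here is a sketch using plain Python literals (the records themselves are invented for illustration):

```python
# Key-value: an opaque value looked up by its key.
kv = {"session:42": "eyJ1c2VyIjoiYWRhIn0="}

# Document: self-describing, nested, schema-less records.
doc = {"_id": 1, "name": "Rex", "hobbies": ["fetch", "naps"]}

# Column-family: rows hold sparse sets of columns, grouped by family.
cf = {"row1": {"profile": {"name": "Rex"}, "stats": {"visits": 3}}}

# Graph: nodes plus explicitly stored relationships (edges).
graph = {"nodes": ["Rex", "Ada"], "edges": [("Ada", "owns", "Rex")]}
```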
Agenda:
Intro to GCP
Services
Coupons
ML Use Cases
Cloud Computing
Benefits:
scalability
flexibility
cost savings
global reach
Definition: Cloud computing is explained as getting tasks done using someone else's computers. Specifically,
using Google Cloud means utilizing Google's computers.
Capabilities: Google Cloud enables developers to build and host applications, store data, and analyze data using
Google's scalable and reliable infrastructure.
Overview: Google's data centers, located worldwide, house the computing, storage, and networking resources
that power Google's services like Search, Gmail, and YouTube.
Developer Access: Google Cloud shares these resources with developers, allowing them to build and run
applications on Google’s infrastructure.
Retail Example: A scenario is presented where a retailer needs to manage inventory, pricing, and demand
across thousands of stores. Handling seasonal spikes, especially during holidays, is highlighted as a significant
challenge.
On-Premise vs. Cloud: Managing an on-premise database would be costly and inefficient due to the need for
provisioning additional hardware. Conversely, using Google Cloud's managed services, like Cloud Spanner, allows
for efficient scaling and cost management.
1. Running Applications
Compute Engine: Provides virtual machines that run in Google data centers.
Cloud Run: Enables the deployment of containerized applications on a fully managed serverless platform.
App Engine: Supports the deployment of highly scalable web apps and back-end services.
2. Storing Data
Cloud Storage: Ideal for unstructured data such as images, videos, and audio files.
Cloud SQL: Offers managed versions of MySQL, Postgres, and SQL Server, allowing for familiar relational
database management without the hassle of self-management.
Cloud Firestore: A NoSQL, document-based, real-time database, popular in scenarios where up-to-date data is
crucial, like gaming.
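A minimal sketch of both storage services using Google's Python client libraries (assumes the `google-cloud-storage` and `google-cloud-firestore` packages are installed and credentials are configured; the bucket, collection, and field names are invented):

```python
from google.cloud import storage, firestore

# Cloud Storage: unstructured objects such as images, video, and audio.
bucket = storage.Client().bucket("my-example-bucket")
bucket.blob("photos/rex.jpg").upload_from_filename("rex.jpg")

# Cloud Firestore: NoSQL documents, kept up to date in real time.
db = firestore.Client()
db.collection("dogs").document("rex").set({"name": "Rex", "city": "Leeds"})
```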
3. Machine Learning and AI
Vision AI: Provides an API for image analysis, including object detection, landmark recognition, text extraction, and more.
Cloud Natural Language: Analyzes text to extract information about entities, sentiment, syntax, and content categorization.
Vertex AI: Google Cloud's platform for building, deploying, and managing custom machine learning models.
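A sketch of Vision AI's label detection via the `google-cloud-vision` Python client (the file name is invented; assumes the package is installed and credentials are configured):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("profile_photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Ask Vision AI which objects appear in the photo.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, round(label.score, 2))
```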
Example use case, combining these services:
Cloud Firestore: Stores profile information such as dog names, locations, and hobbies.
Vision AI: Analyzes profile photos to detect objects like balls, stuffed animals, and bones, providing data insights.
Cloud Run: Deploys the application to the web, allowing for automatic scaling as the user base grows.