Data Engineering Notes

The document provides an overview of data concepts, including types of data, data pipelines, storage solutions, and big data characteristics. It discusses data integrity, cloud computing, and legislation related to data and AI, as well as the challenges of big data privacy and data modeling techniques. The content also covers various data models, including conceptual, logical, and physical data modeling, along with relevant diagrams and examples.

Uploaded by

joshua.stevenson

Lecture 1 : Introduction

Agenda:

Data

Data Pipeline

Data Storage

Big Data

Characteristics of Big Data

Types of Data

Categorical

Nominal → Labels with no quantitative value → e.g. Male / Female

Ordinal → Discrete and ordered labels → e.g. Undergrad / Postgrad

Numerical

Discrete → Counts → e.g. # Students

Interval → Continuous, measurable, with no true zero → e.g. Temperature

Ratio → Continuous, measurable, with a true zero → e.g. Height

Data Pipeline
Data Storage for Analytics

Database
  Data: Structured | Processing: Schema-on-write | Scalability: Varies | Cost: $ | Users: Anyone
  Use cases: real-time data processing, high transactional throughput, strong data consistency

Data Lake
  Data: Raw & unstructured | Processing: Schema-on-read | Scalability: High | Cost: $$ | Users: Data scientists
  Use cases: exploratory analysis, machine learning, data mining, data science research

Data Warehouse
  Data: Structured | Processing: Schema-on-write | Scalability: High | Cost: $$$ | Users: Business users
  Use cases: reporting, analytics, BI
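The schema-on-write vs schema-on-read distinction in the table above can be sketched in a few lines of Python. This is a toy illustration only; the `SCHEMA`, `insert_row`, and the sample records are invented for this sketch, not part of any real system:

```python
import json

# Schema-on-write (database / warehouse): structure is enforced BEFORE storing
SCHEMA = {"id": int, "amount": float}

def insert_row(table: list, row: dict) -> None:
    """Reject any row that does not match the predefined schema."""
    assert set(row) == set(SCHEMA), "unexpected columns"
    for col, typ in SCHEMA.items():
        assert isinstance(row[col], typ), f"{col} must be {typ.__name__}"
    table.append(row)

sales = []
insert_row(sales, {"id": 1, "amount": 9.50})  # validated on write

# Schema-on-read (data lake): raw lines are stored as-is, structure is
# imposed only at query time, so schema drift is tolerated
lake = ['{"id": 2, "amount": 4.25}', '{"id": 3, "note": "no amount field"}']
amounts = [r["amount"] for r in (json.loads(line) for line in lake)
           if "amount" in r]
```

The write path fails fast on bad data; the read path accepts anything and pushes the cleaning cost onto each query, which is the usual trade-off between warehouses and lakes.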

Data Integrity

ACID Properties in DBMS:

A→ Atomicity The entire transaction takes place at once or doesn't happen at all

C→ Consistency The database must be consistent before and after the transaction

I→ Isolation Multiple transactions occur independently without interference

D→ Durability The changes of a successful transaction persist even if a system failure occurs
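Atomicity can be demonstrated with Python's built-in sqlite3 module. A minimal sketch, assuming a hypothetical accounts table; `with conn:` opens a transaction that commits on success and rolls back if the block raises:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # transaction: commit on success, rollback on exception
        conn.execute(
            "UPDATE accounts SET balance = balance - 60 WHERE name = 'alice'")
        raise RuntimeError("simulated crash before crediting bob")
except RuntimeError:
    pass

# The debit was rolled back: the transaction happened "not at all"
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Because the simulated failure aborts the transaction, neither the debit nor the (never-reached) credit is visible afterwards, which is exactly the all-or-nothing guarantee described above.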

Cloud Computing

Cloud Computing solutions mainly come in 3 forms:

IaaS Infrastructure as a Service

PaaS Platform as a Service

SaaS Software as a Service

The 6 V’s

Volume → The quantity of data produced or gathered (Gartner). Storage and processing needs must be addressed.

Variety → The complexity of the data: different data types and sources. Data engineers need tools that handle a variety of data formats in different locations.

Velocity → The speed at which data is generated and processed: frequency of generation, handling, recording and publishing; the shift from batch processing to online processing.

Veracity → Data quality: origin, reliability of source, previous processing, volatility and validity.

Valence → Connectedness. Two data items are connected when there is some relationship between them. Valence = data connections / total number of possible connections. Higher valence means higher data density, which has an impact on the efficiency of data analysis techniques.

Value → The ultimate goal of data science and engineering: the valuable insights gained from the ability to investigate and identify new patterns and trends from high-volume, cross-platform systems.
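The valence ratio defined above can be computed directly. A small sketch, assuming an undirected notion of connection between items (function name invented for illustration):

```python
def valence(n_connections: int, n_items: int) -> float:
    """Valence = actual connections / possible connections among n items."""
    possible = n_items * (n_items - 1) // 2  # undirected pairs
    return n_connections / possible

# 4 data items with 3 observed relationships out of 6 possible pairs
valence(3, 4)  # → 0.5
```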
Lecture 2: The Big Data Landscape

Agenda:

Data architecture best practice

Opensource big data technologies

Data and AI legislation

Data Architecture

”A description of the structure and interaction of the enterprise’s major types and sources of data, logical data assets, physical data assets, and data management resources”

AWS Well-Architected Framework Pillars:

1. Operational Excellence
2. Security
3. Reliability
4. Performance
5. Cost Optimization
6. Sustainability

Six Important Characteristics of Data Systems:

Scalability → Ability to increase system resources to improve performance and handle the demand

Elasticity → Scale dynamically

Availability → % of time in an operable state

Reliability → Probability of expected functionality during a specified interval

Opensource Big Data Technologies

Hadoop → Used when the data volume exceeds the available memory; ideal for data exploration, filtration, sampling, and summarization. Components include HDFS, MapReduce and YARN.

Apache Spark → Faster alternative to MapReduce that can handle batch and real-time data and is flexible enough to work with HDFS and Cassandra.

Apache Cassandra → Processes structured data with fault-tolerance on cloud infrastructure and commodity hardware.

MongoDB → NoSQL database management system that stores data in flexible, JSON-like documents, making it easy to handle and scale large volumes of diverse and unstructured data.

Data Legislation

The processing of personal information is governed by:

RSA → POPI Act → Protection of Personal Information Act
EU → GDPR → General Data Protection Regulation

The POPI Act:

Requires companies and individuals who handle personal information to take appropriate measures to protect it, involving both cybersecurity safeguards and policy development to prevent unlawful processing.

Mandates that responsible parties disclose the purpose of data collection and usage upfront, requiring informed consent from data subjects for any applications, especially multiple AI uses, ensuring transparency and compliance.

The AI Act

The AI Act is a proposed European law on Artificial Intelligence, the first law on AI by a major regulator anywhere. AI that contradicts EU values is prohibited (Title II, Article 5):

Subliminal manipulation resulting in physical / psychological harm
Example: An inaudible sound is played in truck drivers’ cabins to push them to drive longer than is healthy and safe; AI is used to find the frequency maximising this effect on drivers.

Exploitation of children or mentally disabled persons resulting in physical / psychological harm
Example: A doll with an integrated voice assistant encourages a minor to engage in progressively dangerous behaviour or challenges in the guise of a fun game.

General-purpose social scoring
Example: An AI system identifies at-risk children in need of social care based on insignificant or irrelevant social ‘misbehaviour’ of their parents.

Remote biometric identification for law enforcement purposes in publicly accessible spaces (with exceptions)
Example: All faces captured live by video cameras are checked, in real time, against a database to identify a terrorist.

Implications for big data:

The AI Act requires high-quality, unbiased data for training AI systems, which necessitates robust data governance practices to ensure data accuracy, consistency, and fairness in big data environments. Organizations must maintain transparency about the data sources and processing methods used in AI models, leading to more rigorous documentation and auditing of big data processes. The Act’s strict data privacy and security measures affect how big data is collected, stored, and processed, ensuring compliance with privacy regulations.
Lecture 3: Data Models 1

Agenda:

Challenges of big data privacy

Data sources

Definition of data modelling

Structured vs unstructured data

Constraints

Big Data Privacy

Big Data can contain sensitive and valuable business information (assets)

Private Information that must be protected can be in the form of:

Financial records

Communication → (email, texts, phone calls)

Medical History

Educational Records

Job History

Principle of Least Privilege

Requires that a person or system only be given the privileges and data necessary to complete immediate tasks
required of them and nothing more

Migrating to the cloud is not a security guarantee, it follows a shared responsibility model

Cloud vendor → provides physical security

User → responsible for securing the applications and systems in the cloud

Data Security

Data security protects digital information in the data pipeline from unauthorized access, corruptions or theft

Protection is implemented via measures such as

Physical security → Hardware and storage devices

Logical security → Software applications

Security measures are complemented by administrative access controls & organizational policies and procedures

Common security measures

Encryption

Data erasure instead of standard data wiping

Masking personally identifiable information

Data resiliency
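Masking personally identifiable information can be as simple as hiding all but the last few characters of a field. A toy sketch (the function name and sample card number are invented for illustration, not a production masking scheme):

```python
def mask(value: str, keep: int = 4, char: str = "*") -> str:
    """Mask all but the final `keep` characters of a PII field."""
    hidden = max(len(value) - keep, 0)
    return char * hidden + value[hidden:]

mask("4111111111111111")  # → "************1111"
```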

Data security strategies need to be comprehensive, incorporating


Application security updates

Backups

Employee education

Network endpoint security monitoring and controls → (multifactor authentication)

Data Encryption

Cryptographic Key → A string of characters (mathematically generated) that is fed to a cryptographic algorithm
(encryption/decryption) to secure data

Encryption using a single shared key is known as symmetric encryption; encryption using a key pair (one public, one private) is known as asymmetric encryption

Cryptographic functions can be used for

Encryption & Decryption → Scrambles text to ciphertext and vice versa

Authentication → Assumes only authorized users have private keys in asymmetric encryption

Digital Signatures → Provide digital authenticity

[Diagrams: Asymmetric Encryption / Symmetric Encryption]
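As a rough illustration of the symmetric case, here is a toy XOR stream cipher built from hashlib. This is NOT a secure construction, only a sketch of the idea that one shared key both encrypts and decrypts:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes from the key (toy counter-mode construction)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Symmetric: the same key encrypts and decrypts (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

plaintext = b"attack at dawn"
ciphertext = xor_cipher(b"shared-secret", plaintext)  # encrypt
recovered = xor_cipher(b"shared-secret", ciphertext)  # decrypt with same key
```

In asymmetric encryption, by contrast, the encrypting key (public) and decrypting key (private) are different, which is what the SSL handshake below exploits to share a symmetric key safely.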

SSL Connection Handshake

sequenceDiagram
    participant Client
    participant Server
    Client->>Server: 1. Client issues secure session request<br>(https://someserver.org/somedata)
    Server-->>Client: 2. Server sends X.509 certificate containing server's public key
    Client-->>Client: 3. Client authenticates certificate against list of known certificate authorities
    Client->>Server: 4. Client generates random symmetric key and<br>encrypts it using server's public key
    Server-->>Client: 5. Client and server now both know the symmetric key<br>and encrypt end-user data with it

Sources of Big Data

Machine → Real-time sensors in industrial machinery or vehicles that log/track activity over time; environmental sensors; medical / health trackers

Person → Social media data: status updates, tweets, photos, media

Organization → Transactional information in databases; structured data in data warehouses

Data Modelling in Data Engineering

Business requirements → During the modelling process, business requirements for data storage and processing are identified. The different data types, their relationships and the business rules that apply to them are defined.

Blueprint → The model serves as a common standard for the data structure that needs to be implemented. Providing an understanding of the infrastructure design speeds up development.

Data quality → The model ensures data is structured correctly, consistently (resolving inconsistencies and ambiguities) and efficiently (without redundancy). Data governance policies and procedures can then be established.

System design → A visual representation of the data infrastructure improves the understanding of components and their interactions. Potential issues can then be identified and addressed early, reducing costs.

Collaboration → Modelling facilitates discussion among stakeholders, promoting better understanding and leading to possible improvement in the design.

Structured vs Unstructured Data

Structured Data
  Characteristics: predefined data models; usually text-only; easy to search
  Resides in: relational databases; data warehouses
  Generated by: humans or machines
  Typical applications: airline reservation systems; inventory control; CRM systems; ERP systems
  Examples: dates, phone numbers, social security numbers, credit card numbers, customer names, addresses, product names and numbers, transaction information

Unstructured Data
  Characteristics: no predefined data model; may include text, images, sound, video, or other formats; difficult to search
  Resides in: applications; NoSQL databases; data warehouses; data lakes
  Generated by: humans or machines
  Typical applications: word processing and presentation software; email clients; tools for viewing or editing media
  Examples: text files, reports, email messages, audio files, video files, images, surveillance imagery

Data Constraints

Logical statements that must hold for the data

Value → Age ≥ 18

Uniqueness → ID number must be unique (NB for primary keys)

Cardinality → Many-to-one: many students can attend one university

Type → Age must be numeric

Domain → Gender must be M or F

Structural → # of rows = # of columns for a data table
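A sketch of enforcing some of these constraints in code. The field names and rules mirror the examples above; the `violations` helper is invented for illustration:

```python
def violations(row: dict, seen_ids: set) -> list:
    """Return human-readable names of the constraints this record violates."""
    found = []
    if not isinstance(row.get("age"), (int, float)):  # type constraint
        found.append("type: age must be numeric")
    elif row["age"] < 18:                             # value constraint
        found.append("value: age must be >= 18")
    if row.get("gender") not in {"M", "F"}:           # domain constraint
        found.append("domain: gender must be M or F")
    if row.get("id") in seen_ids:                     # uniqueness constraint
        found.append("uniqueness: duplicate id")
    return found

ok = violations({"id": 7, "age": 25, "gender": "F"}, seen_ids={101})
bad = violations({"id": 101, "age": 17, "gender": "X"}, seen_ids={101})
```

In a relational database these checks would normally live in the schema itself (CHECK, UNIQUE, and type declarations) rather than in application code.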


Lecture 4: Data Models 2

Agenda:

Definitions

Levels of data modelling

Conceptual data modelling

UML Diagrams

Information Systems

A system has a boundary outside of which are external entities (people, systems) that interact with the system in focus

Within this system are components (systems, people) that implement that system’s behaviour

These interactions and internal system behaviours involve data being exchanged, transformed and stored

Any system exists within a broader context that needs to be understood and the system should be defined in terms of
how it delivers functionality and services within this context, usually in the form of requirements
Levels of Data Modelling

Comparison As-Is To-Be

ERD Feature     Conceptual   Logical    Physical

Entity (name)   Yes          Yes        Yes
Relationship    Yes          Yes        Yes
Column          -            Yes        Yes
Column Type     -            Optional   Yes
Primary Key     -            -          Yes
Foreign Key     -            -          Yes

Conceptual Data Modelling

Goal → Capture satellite view of business requirements of an organization

What problems are involved with the business and require immediate solutions?

What are the core concepts of these problems?

How are these problems related to one another?

Is there any scoping information available?


Logical Data Modelling

Detailed structures of the system Questions to be asked:

Entities, attributes, and relationships What should the collection look like?

Independent of database management system How can we make information secure?

How should we store history information?

How can we answer business questions in a shorter


amount of time

What is the optimal way to perform sharding?

Physical Data Modelling

Visual illustration of the physical structure of the actual database

Includes table structures, column names, column data types, primary keys, column constraints, relationships
between tables

Entities, attributes, key groups, primary key, foreign keys and relationships to each other

Specific to a DBMS

Use Case Diagrams


Use case diagrams show the expected behaviour of the system.

Defining the system boundary determines what is considered external or internal to the system.

An actor represents a role played by an outside object. One object may play several roles and, therefore, is represented by several actors.

An association illustrates the participation of the actor in the use case.

A use case is a set of events that occurs when an actor uses a system to complete a process. Normally, a use case is a relatively large process, not an individual step or transaction.
Lecture 5: Data Models 3

Agenda:

UML Use Case Diagrams

Data Flow Diagrams

Entity Relationship Diagram

Data Flow Diagram


"A process model used to depict the flow of data through a
system and the work or processing performed by the
system"

---
title: The process for creating a DFD
---
%%{
init: {
'theme': 'base',
'themeVariables': {
'primaryColor': '#0a0a0a',
'primaryTextColor': '#fff',
'primaryBorderColor': '#fff',
'lineColor': '#fff',
'tertiaryTextColor':'#fff'
}
}
}%%

graph LR
step1["`Identify business data objects`"]
step2["`Identify processes`"]
step3["`Identify external entities`"]
step4["`Tie diagram together`"]
step1 --> step2 --> step3 --> step4

DFD Elements DFD Template


DFD Example

Context-level Data Flow Diagram

Entity Relationship Diagram


"A data model utilizes several notations to depict data in
terms of the entities and relationships described by that
data"

ERD Elements

Entity A class of persons, places, objects, events, or concepts about which we need to capture and store data

Entity Instance A single occurrence of an entity

Attribute A descriptive property or characteristic of an entity. Also known as element, property, or field

Relationship A natural business association between one or more entities

The minimum and maximum number of occurrences of one entity that may be related to a single occurrence of the
Cardinality
other entity

Cardinality Interpretation:

Exactly one (one and only one):
  Minimum instances: 1
  Maximum instances: 1
  Graphic notation: a single line with a vertical bar ( | ), indicating a mandatory single relationship.

Zero or one:
  Minimum instances: 0
  Maximum instances: 1
  Graphic notation: a circle ( o ) indicating optionality, followed by a line with a vertical bar ( | ), showing that the relationship can either not exist or exist exactly once.

One or more:
  Minimum instances: 1
  Maximum instances: Many (>1)
  Graphic notation: a line with a vertical bar ( | ) followed by a three-pronged fork ( < ), indicating a mandatory relationship with one or more instances.

Zero, one, or more:
  Minimum instances: 0
  Maximum instances: Many (>1)
  Graphic notation: a circle ( o ) indicating optionality, followed by a three-pronged fork ( < ), indicating that the relationship can exist with zero, one, or many instances.

More than one:
  Minimum instances: More than one (>1)
  Maximum instances: More than one (>1)
  Graphic notation: a three-pronged fork ( < ) with both ends representing multiple instances.
Lecture 6: Systems Development

Agenda:

Systems development & Systems Development Life Cycle (SDLC) Definitions

SDLC roles and deliverables

Linear vs evolutionary life cycles

Life cycle examples

How to choose an approach

Systems Development

Systems development is the process of taking a set of business requirements and, through a series of structured stages, translating these into an operational IT system.

Scopes of System Development

Hardware

Software

Data

Procedures

People

Systems Development Life Cycle

A system development lifecycle (SDLC) is a framework describing a process for understanding, planning, building,
testing, and deploying an information system

Main Stages of System Development


System Development In A Wider Context

Example Roles

Business Roles: Project Roles: Technical Roles: Implementation &


Support Roles:
Sponsor or senior project manager technical architect
responsible owner release manager
team leader solution developer
business analyst database
work package solution tester
administrator
domain expert manager
system administrator
end users
Example Deliverables

models such as:
  class or entity relationship models
  use case models
  process models
  state transition diagrams
  sequence diagrams
  component diagrams

requirement documents
test plans
test scripts
implementation plans
system components & working software

Linear vs Evolutionary Life Cycles

Linear SDLCs: A linear approach describes a sequence of tasks that are completed in order, only moving to the next step once the previous step is complete. E.g. waterfall, V-model, incremental.

Evolutionary SDLCs: An evolutionary approach evolves the solution through progressive versions, each more complete than the last, and often uses a prototyping approach to development. E.g. iterative, spiral.

Strengths of Linear SDLCs:

breaks down the problem into distinct stages, each with a clear purpose

everything is agreed in advance of being used, with no need to revisit later

provides structure to complex systems, making distributed or outsourced development easier to manage

suits a very detailed design decomposition and detailed specification approach

locking down each stage in advance of the next makes it easier to control cost and scope creep

simple and intuitive for smaller problems (people who don’t think they are using an SDLC are probably taking a linear approach)

Strengths of Evolutionary SDLCs:

early delivery of value to the customer, either working versions or knowledge of project risk

copes well with complex requirements: fast changing, uncertain or complicated

encourages collaboration with the users throughout, so customer buy-in is higher

allows ‘just enough’ to be done knowing that it can be refined later on

Weaknesses of Linear SDLCs:

depends greatly on each stage being done properly, as it is hard or impossible to go back and change it later

for complex problems, the time required to be thorough at each stage leads to long timescales

doesn’t cope well with changing requirements

customer or business value is not available until the end

if the project is stopped early there is little of business value to show for the cost

Weaknesses of Evolutionary SDLCs:

can be hard to project manage due to multiple iterations; especially hard with multiple iteration teams and complex products

without careful management, the evolving requirements can result in scope creep

overall costs can be higher due to the additional integration and test across multiple teams and iterations

easy to over-promise on early functionality

Model Types

Waterfall Model The V-Model The Extended V-Model

Incremental Life Cycle Iterative Life Cycle Boehm’s Spiral Life Cycle

How to choose an approach

Complexity of problem

Team experience

Stability of requirements

Delivery speed, and quality

Customer involvement

Uniqueness

High regulatory requirements


employee_id

name home_state

state_code

name home_state

state_code

employee_id

employee_roles

employees

jobs
home_state

state_code

state_code

home_state

employee_roles

employees

jobs

states
Lecture 8: RDBMS & SQL

Agenda:

ACID

SQL

Example Queries

ACID

ACID is a database design principle which defines how transactions are managed specifically in a relational database

Atomicity → All operations will always succeed or fail completely; no partial transactions.

Consistency → Ensures that the database will always remain in a consistent state, by ensuring that only data that conforms to the constraints of the database schema can be written to the database.

Isolation → Ensures that the results of a transaction are not visible to other operations until it is complete.

Durability → Ensures that the results of an operation are permanent. Once a transaction has been committed, it cannot be rolled back, irrespective of any system failure.

CRUD Operations

Create → SQL INSERT

Read → SQL SELECT

Update → SQL UPDATE / MERGE

Delete → SQL DELETE
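The CRUD-to-SQL mapping above, sketched with Python's built-in sqlite3 module. The table name and values are hypothetical example data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, degree TEXT)")

# Create → INSERT
conn.execute("INSERT INTO students (name, degree) VALUES (?, ?)",
             ("Thandi", "BSc"))

# Read → SELECT
name, = conn.execute("SELECT name FROM students WHERE id = 1").fetchone()

# Update → UPDATE
conn.execute("UPDATE students SET degree = 'BEng' WHERE id = 1")

# Delete → DELETE
conn.execute("DELETE FROM students WHERE id = 1")
conn.commit()
```

Parameterised queries (the `?` placeholders) are used rather than string formatting, which is the standard defence against SQL injection.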


Lecture 9: NoSQL Databases 1

Agenda:

Definition of NoSQL DBMS

Characteristics of NoSQL

The CAP Theorem

The BASE Principle

Four Types of NoSQL Storage Devices

Not-only SQL (NoSQL)

Polyglot Persistence → a term that refers to using multiple data storage technologies within a single system, in order to meet varying data storage needs.

Not-only SQL (NoSQL) database → a non-relational database that is highly scalable, fault-tolerant and specifically designed to house semi-structured and structured data.

Sharding → a type of database partitioning in which a large database is divided or partitioned into smaller parts across different nodes. These shards are not only smaller, but also faster and hence more easily manageable.
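Hash-based sharding, one common partitioning scheme, can be sketched as follows. The key format and shard count are arbitrary choices for this sketch; real systems use more elaborate schemes such as ranged or consistent hashing:

```python
import hashlib

def shard_for(key: str, n_shards: int) -> int:
    """Deterministically map a record key to one of n shards."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % n_shards

# Distribute five user records across four shard nodes
shards = {i: [] for i in range(4)}
for user_id in ["u1", "u2", "u3", "u4", "u5"]:
    shards[shard_for(user_id, 4)].append(user_id)
```

Because the mapping is deterministic, any node can compute which shard holds a given key without a central lookup; the trade-off is that changing `n_shards` remaps most keys, which is why production systems prefer consistent hashing.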

Characteristics of NoSQL

Schema-less Data Model → Allows storage of data without a predefined schema, enabling flexibility in handling diverse and evolving data types.

Scale Out Rather Than Scale Up → Supports horizontal scaling by adding more nodes to the database cluster, as opposed to upgrading a single node's hardware.

Highly Available → Built on cluster-based technologies that ensure fault tolerance and high availability by replicating data across multiple nodes.

Lower Operational Costs → Often based on open-source platforms with no licensing fees and designed to run on cost-effective commodity hardware.

Eventual Consistency → Ensures that while data may not be immediately consistent across nodes after a write, it will eventually reach a consistent state.

BASE, Not ACID → Prioritizes availability and scalability (BASE model) over strict consistency (ACID model), with databases designed to eventually reach consistency.

API-Driven Data Access → Data access is typically through APIs, including RESTful APIs, with some databases offering SQL-like query capabilities.

Auto Sharding and Replication → Automatically partitions (shards) data across multiple nodes and replicates it to ensure high availability and support horizontal scaling.

Integrated Caching → Comes with built-in caching mechanisms, reducing the need for external caching solutions like Memcached.

Distributed Query Support → Maintains consistent query performance across multiple shards in a distributed environment.

Polyglot Persistence → Supports using multiple storage technologies (NoSQL and RDBMS) within the same application, allowing for a flexible approach to data persistence.

Aggregate-Focused → Stores de-normalized, aggregated data to eliminate the need for complex joins and mappings, although graph databases are an exception to this approach.

CAP Theorem

Consistency → Every node provides the most recent state, or does not provide a state at all

Availability → Every node has constant read and write access

Partition Tolerance → The system works despite partitions in the network

Partition Tolerant + Available = Not Consistent
If availability (A) and partition tolerance (P) are required, then consistency (C) is not possible because of the data communication requirement between the nodes. So, the database can remain available (A) but with inconsistent results.

Partition Tolerant + Consistent = Not Available
If consistency (C) and partition tolerance (P) are required, nodes cannot remain available (A), as the nodes will become unavailable while achieving a state of consistency (C).

Available + Consistent = Not Partition Tolerant
If consistency (C) and availability (A) are required, available nodes need to communicate to ensure consistency (C). Therefore, partition tolerance (P) is not possible.

BASE Principle

BASE is a database design principle based on the CAP theorem and leveraged by database systems that use
distributed technology.

BASE stands for:

basically available → database will always acknowledge a client’s request, either in the form of the requested
data or a success/failure notification

soft state → database may be in an inconsistent state when data is read; thus, the results may change if the
same data is requested again

eventual consistency → state in which reads by different clients, immediately following a write to the
database, may not return consistent results. Database only attains consistency once the changes have been
propagated to all nodes

IF BASE → Availability (A) + Partition Tolerant (P)
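The BASE behaviours above (basically available, soft state, eventual consistency) can be mimicked with two in-memory "replicas" and a replication queue. This is purely a toy model; all names are invented:

```python
# Two replicas of the same key-value store, plus a queue of pending replication
replica_a, replica_b = {}, {}
replication_queue = []

def write(key, value):
    """Basically available: acknowledge immediately, replicate later."""
    replica_a[key] = value
    replication_queue.append((key, value))

def propagate():
    """Apply pending writes to the lagging replica."""
    while replication_queue:
        key, value = replication_queue.pop(0)
        replica_b[key] = value

write("stock:42", 7)
stale = replica_b.get("stock:42")   # None: soft state, read before propagation
propagate()
fresh = replica_b.get("stock:42")   # 7: consistency eventually reached
```

A client reading replica_b between the write and the propagation sees stale data, which is exactly the "soft state" window that BASE systems accept in exchange for availability.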


NoSQL Storage Devices

Key-value

Document

Column-family

Graph

Key-value Databases
  Only need the key to retrieve the value
  No fixed schema for the value
  Quick to read and write using RAM
  Highly scalable
  Value opaque to the database
  No partial updates or queries to the value’s “attributes”

Document Storage Devices
  Key-value pairs, in a document
  No fixed schema for the value
  Value NOT opaque to the database
  Value has fields which can be queried

Column-family Devices
  Store data similarly to a traditional RDBMS but group related columns together in a row, resulting in column-families
  Each column can be a collection of related columns itself, referred to as a super-column

Graph Storage Devices
  Used to persist inter-connected entities
  Emphasis on storing the linkages between entities
  Entities → stored as nodes (also called vertices)
  Linkages → stored as edges
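The storage models can be caricatured with plain Python structures. A toy sketch (all names and records invented) showing what each model lets you query:

```python
# Key-value: value is an opaque blob, retrievable only by its key
kv = {"user:1": b'{"name": "Ada", "city": "Cape Town"}'}
blob = kv["user:1"]  # cannot query inside the blob without decoding it

# Document: the value's fields are visible to the store and can be queried
docs = [{"_id": 1, "name": "Ada", "city": "Cape Town"},
        {"_id": 2, "name": "Grace", "city": "Durban"}]
in_cape_town = [d["name"] for d in docs if d["city"] == "Cape Town"]

# Column-family: related columns grouped into families under one row key
cf = {"row1": {"identity": {"name": "Ada"},
               "contact": {"email": "ada@example.com"}}}

# Graph: entities as nodes, linkages as labelled edges
edges = [("Ada", "Grace", "follows")]
followers = [src for src, dst, rel in edges
             if dst == "Grace" and rel == "follows"]
```

The contrast between the first two lines and the document query is the key distinction above: a key-value store can only fetch the whole opaque value, while a document store can filter on fields inside it.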


Lecture 12: Introduction to GCP

Agenda:

Intro to GCP

Services

Coupons

ML Use Cases

Cloud Computing

Cloud Computing — the delivery of computing services over the internet

Benefits:

scalability

flexibility

cost savings

global reach

Google Cloud Platform (GCP)

Suite of cloud computing services offered by Google

Helps businesses solve complex problems using technology and innovation

Google Cloud Essentials

1. What is Cloud Computing?

Definition: Cloud computing is explained as getting tasks done using someone else's computers. Specifically,
using Google Cloud means utilizing Google's computers.

Capabilities: Google Cloud enables developers to build and host applications, store data, and analyze data using
Google's scalable and reliable infrastructure.

2. Google Data Centers

Overview: Google's data centers, located worldwide, house the computing, storage, and networking resources
that power Google's services like Search, Gmail, and YouTube.

Developer Access: Google Cloud shares these resources with developers, allowing them to build and run
applications on Google’s infrastructure.

3. Scalability with Cloud Spanner

Retail Example: A scenario is presented where a retailer needs to manage inventory, pricing, and demand
across thousands of stores. Handling seasonal spikes, especially during holidays, is highlighted as a significant
challenge.

On-Premise vs. Cloud: Managing an on-premise database would be costly and inefficient due to the need for
provisioning additional hardware. Conversely, using Google Cloud's managed services, like Cloud Spanner, allows
for efficient scaling and cost management.

Google Cloud Products and Services


1. Running Code

Compute Engine: Provides virtual machines that run in Google data centers.

Cloud Run: Enables the deployment of containerized applications on a fully managed serverless platform.

App Engine: Supports the deployment of highly scalable web apps and back-end services.

2. Storing Data

Cloud Storage: Ideal for unstructured data such as images, videos, and audio files.

Cloud SQL: Offers managed versions of MySQL, Postgres, and SQL Server, allowing for familiar relational
database management without the hassle of self-management.

Cloud Firestore: A NoSQL, document-based, real-time database, popular in scenarios where up-to-date data is
crucial, like gaming.

3. AI & Machine Learning Tools

Vision AI: Provides an API for image analysis, including object detection, landmark recognition, text extraction,
and more.

Cloud Natural Language: Analyzes text to extract information about entities, sentiment, syntax, and content
categorization.

Vertex AI: Google Cloud’s platform for building, deploying, and managing custom machine learning models.

Use Case for Google Cloud

Scenario: Developing a social networking site for dogs.

Cloud Storage: Used to store profile photos.

Cloud Firestore: Stores profile information such as dog names, locations, and hobbies.

Vision AI: Analyzes profile photos to detect objects like balls, stuffed animals, and bones, providing data insights.

Cloud Run: Deploys the application to the web, allowing for automatic scaling as the user base grows.

Core Services: Compute; Storage; Networking; Identity & security services

Data & Analytics Services: Big data & machine learning; Data storage and management; Data analytics and visualization
