Data Masters: Data Warehousing in the Cloud

Using Cloud Data Warehousing to Analyze Structured and Semi-Structured Data Sets

Kevin Bair
Solution Architect
[email protected]
Topics this presentation will cover
1. Cloud DW Architecture
2. ETL / Data Pipeline Architecture
3. Analytics on Semi-Structured Data
4. “Instant” Datamarts without replicating TB of data
5. Analyzing Structured with Semi-Structured Data
Introducing Snowflake: An experienced team of data experts with a vision to reinvent the data warehouse

• Bob Muglia, CEO: Former President of Microsoft’s Server and Tools Business
• Benoit Dageville, PhD, CTO & Founder: Lead architect of Oracle parallel execution and a key manageability architect
• Marcin Zukowski, PhD, Founder & VP of Engineering: Inventor of vectorized query execution in databases
• Thierry Cruanes, PhD, Founder & Architect: Leading expert in query optimization and parallel execution at Oracle
Today’s data: big, complex, moving to cloud

• … of workloads will be processed in cloud data centers (Cisco)
• Surge in cloud spending and supporting technology (IDC)
• Data in the cloud today is expected to grow … in the next two years (Gigaom)
Structured data
• Transactional data
• Relational
• Fixed schema
• OLTP / OLAP

Semi-structured data
• Machine-generated
• Non-relational
• Varying schema
• Most common in cloud environments
What does Semi-Structured mean?
• Data that may be of any type
• Data that is incomplete
• Structure that can rapidly and unpredictably change
• Usually self-describing

• Examples
• XML
• AVRO
• JSON
XML Example
<?xml version="1.0" encoding="UTF-8"?>
<breakfast_menu>
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>Two of our famous Belgian Waffles with plenty of real maple
syrup</description>
<calories>650</calories>
</food>
<food>
<name>Strawberry Belgian Waffles</name>
<price>$7.95</price>
<description>Light Belgian waffles covered with strawberries and whipped
cream</description>
<calories>900</calories>
</food>
<food>
<name>Berry-Berry Belgian Waffles</name>
<price>$8.95</price>
<description>Light Belgian waffles covered with an assortment of fresh berries and
whipped cream</description>
<calories>900</calories>
</food>
</breakfast_menu>
JSON Example
{
"custkey": "450002",
"useragent": {
"devicetype": "pc",
"experience": "browser",
"platform": "windows"
},
"pagetype": "home",
"productline": "none",
"customerprofile": {
"age": 20,
"gender": "male",
"customerinterests": [
"movies",
"fashion",
"music"
]
}
}
Avro Example
• Schema: JSON
• Data: Binary
Why is this so hard for a traditional Relational DBMS?
• Pre-defined schema
• Stored in a Character Large Object (CLOB) data type
• Inefficient to query
• Constantly changing
Current architectures can’t keep up

Data Warehousing
• Complex: manage hardware, data distribution, indexes, …
• Limited elasticity: forklift upgrades, data redistribution, downtime
• Costly: overprovisioning, significant care & feeding

Hadoop
• Complex: specialized skills, new tools
• Limited elasticity: data redistribution, resource contention
• Not a data warehouse: batch-oriented, limited optimization, incomplete security
Data Pipeline / Data Lake Architecture – “ETL”

Sources: Website Logs, Operational Systems, External Providers, Stream Data
Flow: Sources → Stage (S3, 10 TB) → Data Lake (Hadoop, 30 TB) → Stage (S3, 5 TB) → EDW (MPP, 10 TB disk, summary data)
One System for all Business Data

The original diagram contrasts two stacks. On the left, separate systems: a data sink (HDFS storage), Map-Reduce jobs, relational databases, and other systems, splitting structured data (rows such as Apple 101.12 250 FIH-2316) from semi-structured data (a JSON document for "John Smith" with a nested address). On the right, Snowflake holds both.

Multiple systems → One system
Specialized skillset → One common skillset
Slower/more costly data conversion → Faster/less costly data conversion
One system handles both structured and semi-structured business data
How have other Big Data / DW vendors approached this?

Microsoft - SQL Server doesn't yet accommodate JSON queries, so instead the company announced Azure DocumentDB, a native document DBaaS (database as a service) for the Azure cloud (http://azure.microsoft.com/en-us/documentation/services/documentdb/)

Oracle Exadata - Oracle Exadata X5 has many new software capabilities, including faster pure columnar flash caching, database snapshots, flash cache resource management, near-instant server death detection, I/O latency capping, and offload of JSON and XML analytics (https://www.oracle.com/corporate/pressrelease/data-center-012115.html)

IBM Netezza - You can use the Jaql Netezza® module to read from or write to Netezza tables. (www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.analyze.doc/doc/r0057926.html)

Postgres/Redshift - We recommend using JSON sparingly. JSON is not a good choice for storing larger datasets because, by storing disparate data in a single column, JSON does not leverage Amazon Redshift's column store architecture. (http://docs.aws.amazon.com/redshift/latest/dg/json-functions.html)

Hadoop - Hive and/or Map Reduce, somewhat vendor specific
Relational Processing of Semi-Structured Data

1. Variant data type compresses storage of semi-structured data
2. Data is analyzed during load to discern repetitive attributes within the hierarchy
3. Repetitive attributes are columnar compressed and statistics are collected for relational query optimization
4. SQL extensions enable relational queries against both semi-structured and structured data
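A minimal sketch of what this looks like in practice, reusing the table, column, and stage names from the examples later in this deck (the JSON file format options are an assumption):

-- Store raw JSON in a VARIANT column; Snowflake shreds repetitive
-- attributes into compressed columnar storage during the load
CREATE TABLE json_data_table (fullrow VARIANT);

COPY INTO json_data_table
FROM @~/json/json_sample_data.gz
FILE_FORMAT = (TYPE = 'JSON');

-- SQL path syntax then queries the hierarchy relationally
SELECT fullrow:fullName::varchar, fullrow:age::int
FROM json_data_table;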
Why Support Semi-Structured Data via SQL?
• Integrate with existing data
• Reduced administrative costs
• Improved security
• Transaction management
• Better performance
• Better resource allocation
• Increased developer productivity
• SQL is a proven model for performing queries, especially joins (see the sketch below)
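For example, a join between structured and semi-structured data might look like this (a sketch only; the sales and tweets tables and their columns are hypothetical, anticipating the Twitter demo later in the deck):

SELECT s.order_date,
       COUNT(DISTINCT t.v:id)  AS tweet_count,
       SUM(s.amount)           AS sales
FROM sales s
JOIN tweets t
  ON t.v:created_at::date = s.order_date  -- VARIANT path cast joined to a structured column
GROUP BY s.order_date;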
Requirements for a Cloud-based Big Data / Data Warehouse Platform
• No contention (writers can't block readers)
• Continuous loading of data without "windows"
• Compress, and don't duplicate, the data
• Segment the data (datamarts) without replicating
• Ability to analyze structured and semi-structured data, together, at volume (TB-PB) using SQL
• Reduced complexity; easy to manage and develop with
• ELT vs. ETL, allowing processing to happen closer to the data
• Security; encrypt all data at rest
Architectural Evolution of the Data Warehouse

Scale Up (SMP – Single Server)
• RDBMS Software
• Storage
Architectural Evolution of the Data Warehouse

Scale Up (SMP – Single Server): one optimizer, metadata/schema, query engine, and storage on a single server.

Scale Out (MPP / Hadoop Cluster): a leader node holds the optimizer and metadata/schema; query engines run on data nodes, each owning a slice of storage (1/X, 2/X, 3/X, …).
Drawbacks: 1) Partition keys / OLAP, 2) Skew, 3) Redundancy, 4) Query inefficiency

Scale Up, Out or Down (Elastic / Cloud): the optimizer and metadata/schema sit in a shared service; multiple independent query engines scale against common storage.
Benefits: 1) No partitions, 2) Multiple clusters, 3) Only data needed is accessed, 4) Query efficient
Data Warehousing ETL & Data Loading

• Database is separate from Virtual Warehouse
• One Virtual Warehouse, multiple Databases
• One Database, multiple Virtual Warehouses
• Virtual Warehouse scales independently from Database
• Data loading does not impact query performance

The original diagram shows separate virtual warehouses for Marketing, Finance, Test/Dev, Sales, and Biz Dev users, all coordinated by the Cloud Service over shared databases.
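As a sketch (the warehouse names are hypothetical), creating independent compute against the same data uses standard Snowflake DDL:

-- Separate warehouses for loading and reporting; neither contends with the other
CREATE WAREHOUSE load_wh   WITH WAREHOUSE_SIZE = 'LARGE'  AUTO_SUSPEND = 300;
CREATE WAREHOUSE report_wh WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 300;

-- Resize compute without moving or copying any data
ALTER WAREHOUSE report_wh SET WAREHOUSE_SIZE = 'XLARGE';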
Data Pipeline / Snowflake Architecture – “ELT”

Sources: Website Logs (10 TB), Operational Systems, External Providers, Stream Data
Flow: Sources → Stage (S3) → EDW (Snowflake, 2 TB disk)
Amazon Cloud Data Pipeline Architecture

Stream data (“JSON”) flows through Amazon Kinesis; the application (under IAM) writes it to S3 buckets, which feed the EDW. Amazon SQS and Amazon SNS handle notification, and Amazon Glacier holds long-term storage files.
Amazon Cloud Data Pipeline Architecture (Near Real-time)

The same pipeline, with Storm / Spark consuming the Kinesis stream for near real-time processing, alongside the S3 / Amazon Glacier long-term storage path and Amazon SQS / SNS notification.
Typical customer environment

Data sources (OLTP databases, enterprise applications, third-party applications, web, and other sources such as Hadoop) feed an ETL layer into the EDW, which in turn feeds datamarts and BI / Analytics tools.
Demo Time!
Demo Scenarios
• Clickstream Analysis (load JSON, multi-table insert)
• Which product category is most clicked on?
• Which product line does the customer self-identify as having the most interest in?

• Twitter Feed (join Structured and Semi-Structured)
• From our twitter campaign, is there a correlation between twitter volume and sales?
Clickstream Example
{
"custkey": "450002",
"useragent": {
"devicetype": "pc",
"experience": "browser",
"platform": "windows"
},
"pagetype": "home",
"productline": "none",
"customerprofile": {
"age": 20,
"gender": "male",
"customerinterests": [
"movies",
"fashion",
"music"
]
}
}
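Given records shaped like the one above, a Snowflake multi-table insert can split each document into relational tables in one pass (a sketch; the clickstream_raw, clicks, and profiles names are hypothetical):

INSERT ALL
  INTO clicks   (custkey, pagetype, productline) VALUES (custkey, pagetype, productline)
  INTO profiles (custkey, age, gender)           VALUES (custkey, age, gender)
SELECT v:custkey::string                 AS custkey,
       v:pagetype::string                AS pagetype,
       v:productline::string             AS productline,
       v:customerprofile.age::int        AS age,
       v:customerprofile.gender::string  AS gender
FROM clickstream_raw;  -- one VARIANT column v per raw JSON record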
What makes Snowflake unique for
handling Semi-Structured Data?
• Compression
• Encryption / Role Based Authentication
• Shredding
• History/Results
• Clone
• Time Travel
• Flatten
• Regexp
• No Contention
• No Tuning
• Infinitely scalable
• SQL based with extremely high performance
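Two of these features combined give the “instant” datamarts from the topics slide. A sketch (the database and table names are hypothetical):

-- Zero-copy clone: a full datamart without duplicating TB of storage
CREATE DATABASE marketing_mart CLONE analytics_db;

-- Time Travel: query a table as it existed an hour ago
SELECT COUNT(*)
FROM clicks AT(OFFSET => -3600);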
Where to Get More Info

• Visit us: http://www.snowflake.net/

• Email us:
• Sales: [email protected]
• General: [email protected]

• Q&A
THANK YOU!
Functions

Regular expressions:
• REGEXP
• REGEXP_COUNT
• REGEXP_INSTR
• REGEXP_LIKE
• REGEXP_REPLACE
• REGEXP_SUBSTR
• RLIKE

Arrays:
• ARRAYAGG, ARRAY_AGG
• ARRAY_APPEND
• ARRAY_CAT
• ARRAY_COMPACT
• ARRAY_SIZE
• ARRAY_CONSTRUCT
• ARRAY_CONSTRUCT_COMPACT
• ARRAY_INSERT
• ARRAY_PREPEND
• ARRAY_SLICE
• ARRAY_TO_STRING

JSON / semi-structured:
• CHECK_JSON
• PARSE_JSON
• OBJECT_CONSTRUCT
• OBJECT_INSERT
• GET
• GET_PATH
• AS_type
• IS_type
• IS_NULL_VALUE
• TO_JSON
• TYPEOF
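A few of these in one query, run against the clickstream document shown earlier (a sketch; the clickstream_raw table and its VARIANT column v are hypothetical):

SELECT ARRAY_SIZE(v:customerprofile.customerinterests)  AS n_interests,
       GET_PATH(v, 'useragent.platform')                AS platform,
       TYPEOF(v:customerprofile.age)                    AS age_type,
       REGEXP_LIKE(v:custkey::string, '[0-9]+')         AS custkey_is_numeric
FROM clickstream_raw;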
Parsing JSON using Snowflake SQL
(After loading JSON file into Snowflake table)

Parsing JSON using SQL from a VARIANT column in a Snowflake table


SELECT 'The First Person is '||fullrow:fullName||' '||
'He is '||fullrow:age||' years of age.'||' '||
'His children are: '
||fullrow:children[0].name||' Who is a '||
fullrow:children[0].gender||' and is '||
fullrow:children[0].age||' year(s) old '
||fullrow:children[1].name||' Who is a '||
fullrow:children[1].gender||' and is '||
fullrow:children[1].age||' year(s) old '
FROM json_data_table
WHERE fullrow:fullName = 'John Doe';
FLATTEN() Function and its Pseudo-columns
FLATTEN() converts a repeated field into a set of rows.
FLATTEN() returns pseudo-columns in addition to the data result.

SELECT S.fullrow:fullName, t.value:name, t.value:age, t.SEQ, t.KEY, t.PATH, t.INDEX, t.VALUE
FROM json_data_table AS S, TABLE(FLATTEN(S.fullrow,'children')) t;

• SEQ: a unique sequence number related to the input expression
• KEY: for maps or objects, contains the key to the exploded value
• PATH: path expression of the exploded value in the input expression
• INDEX: for arrays, contains the index in the array of the exploded value
• VALUE: the expression contained in the collection
FLATTEN() in Snowflake SQL
(Removing one level of nesting)

FLATTEN() converts a repeated field into a set of rows:


SELECT S.fullrow:fullName, t.value:name, t.value:age
FROM json_data_table as S, TABLE(FLATTEN(S.fullrow,'children')) t
WHERE s.fullrow:fullName = 'Mike Jones'
AND t.value:age::integer > 6 ;
FLATTEN(): Two levels of un-nesting

Output
+---------------+-----+--------+-------------------+------------------------+
| name | age | gender | citiesLived.place | citiesLived.yearsLived |
+---------------+-----+--------+-------------------+------------------------+
| Mike Jones | 35 | Male | Los Angeles | 1989 |
| Mike Jones | 35 | Male | Los Angeles | 1993 |
| Mike Jones | 35 | Male | Los Angeles | 1998 |
| Mike Jones | 35 | Male | Los Angeles | 2002 |
| Mike Jones | 35 | Male | Washington DC | 1990 |
| Mike Jones | 35 | Male | Washington DC | 1993 |
| Mike Jones | 35 | Male | Washington DC | 1998 |
| Mike Jones | 35 | Male | Washington DC | 2008 |
| Mike Jones | 35 | Male | Portland | 1993 |
| Mike Jones | 35 | Male | Portland | 1998 |
| Mike Jones | 35 | Male | Portland | 2003 |
| Mike Jones | 35 | Male | Portland | 2005 |
| Mike Jones | 35 | Male | Austin | 1973 |
| Mike Jones | 35 | Male | Austin | 1998 |
| Mike Jones | 35 | Male | Austin | 2001 |
| Mike Jones | 35 | Male | Austin | 2005 |
+---------------+-----+--------+-------------------+------------------------+
FLATTEN() in Snowflake SQL
(Removing two levels of nesting)
Getting the cities Mike Jones lived in and when.

TABLE (Snowflake syntax):

SELECT
  p.fullrow:fullName::varchar as name,
  p.fullrow:age::int as age,
  p.fullrow:gender::varchar as gender,
  cl.value:place::varchar as city,
  yl.value::int as year
FROM json_data_table p,
  TABLE(FLATTEN(p.fullrow,'citiesLived')) cl,
  TABLE(FLATTEN(cl.value:yearsLived,'')) yl
WHERE name = 'Mike Jones';

LATERAL (ANSI syntax, also supported):

SELECT
  p.fullrow:fullName::varchar as name,
  p.fullrow:age::int as age,
  p.fullrow:gender::varchar as gender,
  cl.value:place::varchar as city,
  yl.value::int as year
FROM json_data_table p,
  LATERAL FLATTEN(p.fullrow,'citiesLived') cl,
  LATERAL FLATTEN(cl.value:yearsLived,'') yl
WHERE name = 'Mike Jones';

Output: the same rows shown on the previous slide (one row per city per year lived).
Parsing JSON using Snowflake SQL
(Without loading the JSON file into a Snowflake table)

Parsing JSON using SQL directly from the file without loading into Snowflake:

SELECT 'The First Person is '||S.$1:fullName||' '||
       'He is '||S.$1:age||' years of age.'||' '||
       'His children are: '||S.$1:children[0].name||' Who is a '||
       S.$1:children[0].gender||' and is '||S.$1:children[0].age||' year(s) old '
FROM @~/json/json_sample_data (FILE_FORMAT => 'json') as S
WHERE S.$1:fullName = 'John Doe';
Parsing JSON Records:
PARSE_JSON
Interprets an input string as a JSON document, producing a VARIANT value
SELECT s.fullrow:fullName Parent, c.value Children_Object,
c.value:name Child_Name, c.value:age Child_Age
FROM json_data_table AS S,
TABLE(FLATTEN(S.fullrow,'children')) c
WHERE PARSE_JSON(c.value:age) > 8;
Parsing JSON Records:
CHECK_JSON

Valid JSON will produce NULL:

SELECT CHECK_JSON('{"age": "15",
                    "gender": "Male",
                    "name": "John"}') ;
-- Valid JSON

Invalid JSON will produce an error message:

SELECT CHECK_JSON('{"age": "15",
                    "gender": "Male",
                    "name "John" ') ;
-- Missing :

SELECT CHECK_JSON('{"age": "15",
                    "gender": "Male",
                    "name": "John" ') ;
-- Missing }
Parsing JSON Records:
CHECK_JSON

Validate JSON records in the S3 file before loading it, using SELECT with a CSV file format:

SELECT S.$1, CHECK_JSON(S.$1)
FROM @~/json/json_sample_data (FILE_FORMAT => 'CSV') AS S ;

(Errors reported in the original output: missing matching ], missing : before [, missing attribute value.)

Validate JSON records in the S3 file before loading it, using COPY with a JSON file format:

COPY INTO json_data_table
FROM @~/json/json_sample_data.gz
FILE_FORMAT = 'JSON' VALIDATION_MODE = 'RETURN_ERRORS';
Backup Slides
Learn more at snowflake.net
Snowflake Architecture

User Interface: ODBC Driver, JDBC Driver, Web UI
Cloud Services: Optimization, Query Mgmt, Warehouse Mgmt, Security, Metadata
Virtual Warehouse Processing (EC2): Customer Service, Financial Analysts, Quality Control, Loading
Database Storage (S3): Data, Sales, Marketing, Materials
Cloud Infrastructure: Amazon AWS
Snowflake Architecture

User Interface: ODBC Driver, JDBC Driver, Web UI
Compute (EC2): a virtual warehouse (e.g., Financial Analysts) is a cluster of nodes running DML and DDL, each working on a subset of the data
Cloud Services: Optimization, Query Mgmt, Warehouse Mgmt, Security, backed by replicated Metadata and Database services
Storage (S3): the complete data set, shared by all compute clusters
Snowflake Architecture

User Interface: ODBC Driver, JDBC Driver, Web UI
Compute (EC2): multiple independent clusters (e.g., Loading and Financial Analysts) run DML and DDL concurrently against the same data
Cloud Services: Optimization, Query Mgmt, Warehouse Mgmt, Security, backed by replicated Metadata and Database services
Storage (S3): shared data (Data, Sales, Marketing) in the AWS cloud
Snowflake High Availability Architecture

A load balancer distributes SQL and REST traffic across Cloud Services clusters in every availability zone. Metadata is fully replicated, virtual warehouses run in each zone, and database storage is fully replicated across Availability Zones 1, 2, and 3.
Enterprise-class data warehouse: Security

Authentication
• Embedded multi-factor authentication server
• Federated authentication via SAML 2.0 (in development)

Access control
• Role-based access control model
• Granular privileges on objects & actions

Data encryption
• Encryption at rest for database data
• Encryption of Snowflake metadata
• Snowflake-managed keys

Controls & processes validated through SOC certification & audit
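As a sketch of the role-based model (the role, warehouse, database, and user names are hypothetical):

-- Grant an analyst role read access and compute, nothing more
CREATE ROLE analyst;
GRANT USAGE ON WAREHOUSE report_wh TO ROLE analyst;
GRANT USAGE ON DATABASE analytics_db TO ROLE analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics_db.public TO ROLE analyst;
GRANT ROLE analyst TO USER kevin;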
