
Day 5 - Data Cloud Ingestion


Data Cloud Bootcamp
Day 5 — Data Ingestion
Eliot Harper
Senior Architect, CloudKettle
Forward Looking Statement
This presentation contains forward-looking statements about, among other things, trend analyses and future events, future financial performance, anticipated growth, industry prospects,
environmental, social and governance goals, and the anticipated benefits of acquired companies. The achievement or success of the matters covered by such forward-looking statements
involves risks, uncertainties and assumptions. If any such risks or uncertainties materialize or if any of the assumptions prove incorrect, Salesforce’s results could differ materially from the
results expressed or implied by these forward-looking statements. The risks and uncertainties referred to above include those factors discussed in Salesforce’s reports filed from time to time
with the Securities and Exchange Commission, including, but not limited to: impact of, and actions we may take in response to, the COVID-19 pandemic, related public health measures
and resulting economic downturn and market volatility; our ability to maintain security levels and service performance meeting the expectations of our customers, and the resources and
costs required to avoid unanticipated downtime and prevent, detect and remediate performance degradation and security breaches; the expenses associated with our data centers and
third-party infrastructure providers; our ability to secure additional data center capacity; our reliance on third-party hardware, software and platform providers; the effect of evolving
domestic and foreign government regulations, including those related to the provision of services on the Internet, those related to accessing the Internet, and those addressing data
privacy, cross-border data transfers and import and export controls; current and potential litigation involving us or our industry, including litigation involving acquired entities such as
Tableau Software, Inc. and Slack Technologies, Inc., and the resolution or settlement thereof; regulatory developments and regulatory investigations involving us or affecting our industry;
our ability to successfully introduce new services and product features, including any efforts to expand our services; the success of our strategy of acquiring or making investments in
complementary businesses, joint ventures, services, technologies and intellectual property rights; our ability to complete, on a timely basis or at all, announced transactions; our ability to
realize the benefits from acquisitions, strategic partnerships, joint ventures and investments, including our July 2021 acquisition of Slack Technologies, Inc., and successfully integrate
acquired businesses and technologies; our ability to compete in the markets in which we participate; the success of our business strategy and our plan to build our business, including our
strategy to be a leading provider of enterprise cloud computing applications and platforms; our ability to execute our business plans; our ability to continue to grow unearned revenue and
remaining performance obligation; the pace of change and innovation in enterprise cloud computing services; the seasonal nature of our sales cycles; our ability to limit customer attrition
and costs related to those efforts; the success of our international expansion strategy; the demands on our personnel and infrastructure resulting from significant growth in our customer
base and operations, including as a result of acquisitions; our ability to preserve our workplace culture, including as a result of our decisions regarding our current and future office
environments or work-from-home policies; our dependency on the development and maintenance of the infrastructure of the Internet; our real estate and office facilities strategy and
related costs and uncertainties; fluctuations in, and our ability to predict, our operating results and cash flows; the variability in our results arising from the accounting for term license
revenue products; the performance and fair value of our investments in complementary businesses through our strategic investment portfolio; the impact of future gains or losses from our
strategic investment portfolio, including gains or losses from overall market conditions that may affect the publicly traded companies within our strategic investment portfolio; our ability to
protect our intellectual property rights; our ability to develop our brands; the impact of foreign currency exchange rate and interest rate fluctuations on our results; the valuation of our
deferred tax assets and the release of related valuation allowances; the potential availability of additional tax assets in the future; the impact of new accounting pronouncements and tax
laws; uncertainties affecting our ability to estimate our tax rate; uncertainties regarding our tax obligations in connection with potential jurisdictional transfers of intellectual property,
including the tax rate, the timing of the transfer and the value of such transferred intellectual property; uncertainties regarding the effect of general economic and market conditions; the
impact of geopolitical events; uncertainties regarding the impact of expensing stock options and other equity awards; the sufficiency of our capital resources; the ability to execute our
Share Repurchase Program; our ability to comply with our debt covenants and lease obligations; the impact of climate change, natural disasters and actual or threatened public health
emergencies; and our ability to achieve our aspirations, goals and projections related to our environmental, social and governance initiatives.
How Data Cloud Works

Data Sources > Connect > Harmonize > Unify > Analyze & Predict > Act

Data Sources: CRM Data, 1P Data, 3P Cloud Storage, Any Device, APIs & SDKs, MuleSoft
Connect: Batch and Streaming Ingestion at Scale
Harmonize: Data Models
Unify: Customer Graph
Analyze & Predict: Analytics and Insights, Business Intelligence, AI Predictions
Act: Apps and Actions, Flows, Activations
Automatically Connect & Ingest Data at Any Velocity

20x Increase in ROI
1.5 Hours weekly bandwidth returned

Ingest billions of profile & event data
Access 150+ enterprise data sources
Easily prepare and improve data in the platform
Reduce time-to-value with built-in connectors and data transforms

Data Sources: 1P Data Source, Web and Mobile Apps, APIs & SDKs
Connect: OOB Connectors; Batch Ingestion (SFTP & Cloud Storage); APIs & SDKs (Batch & Streaming); Streaming Real-Time Ingestion; Data Cloud Connector; MuleSoft Connector; Custom Integrations via AppExchange [Data Cloud Page]
Prepare: Data Prep Recipes and Data Transforms; Data Spaces

Our Customer Data Lakehouse
Deliver market leading data preparation and management capabilities

Batch / Stream > Stage > Prepare > Curate Truth
Stage: Land data in any form or quality
Prepare: Cleanse, standardize, transform data
Curate Truth: Harmonize using the CIM, resolve identity and extract insights

Land Data in Any Form or Quality and at Any Velocity
Ingest data in batch or streaming modes as is using Data Source Objects

Prepare Data
Use built-in data prep capabilities to transform data source objects into data lake objects:
Filtering
Normalization
De-normalization
Pivots and Transforms

Harmonize and Curate Truth
Map data lake objects to the CIM data model objects, resolve identity and extract insights to build a single source of truth

Diagram labels: Batch (TB data), Stream (100k rps), External Data Lake, Data Source Objects (Bronze), Data Lake Objects (Silver), Data Model Objects (Gold)


Basic Phases and Terms
Data Ingestion

Data Sources:
Systems where data resides (e.g. CRM, SFMC, etc.)

Data Stream:
An entity that can be extracted from a Data Source (e.g. Contacts)

Data Source Object (DSO):
Your raw, ingested data

Formulas:
Ways to perform minor adjustments at the time of ingestion

Data Lake Object (DLO):
Your raw, ingested data

Transformation (Bulk / Streaming):
Ways to perform major joins / filters / transformations on DLOs

Data Spaces:
Partitions of your prepared data and its utilized components
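The difference between a formula and a transformation can be sketched in plain Python. This is only an illustration and not Data Cloud syntax: the function names, field names, and sample records are all hypothetical. The formula makes a small row-level adjustment as a record arrives, while the transformation joins and filters across two already-ingested objects.

```python
from typing import Dict, List

Record = Dict[str, str]


def apply_formula(record: Record) -> Record:
    """Formula-style step: a minor, row-level adjustment made at ingestion time."""
    adjusted = dict(record)
    # Hypothetical adjustments: normalize the email and derive a full name.
    adjusted["email"] = record["email"].strip().lower()
    adjusted["full_name"] = f"{record['first_name']} {record['last_name']}"
    return adjusted


def join_transformation(contacts: List[Record], orders: List[Record]) -> List[Record]:
    """Transformation-style step: a join and filter across two ingested objects."""
    by_id = {c["contact_id"]: c for c in contacts}
    joined = []
    for order in orders:
        contact = by_id.get(order["contact_id"])
        if contact is None:
            continue  # filter out orders with no matching contact
        joined.append({**order, "email": contact["email"]})
    return joined


raw_contacts = [
    {"contact_id": "c1", "first_name": "Ada", "last_name": "Lopez", "email": " Ada@Example.COM "},
]
raw_orders = [
    {"order_id": "o1", "contact_id": "c1"},
    {"order_id": "o2", "contact_id": "missing"},
]

contacts = [apply_formula(r) for r in raw_contacts]
order_rows = join_transformation(contacts, raw_orders)
```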
Bronze, Silver, Gold: Medallion Architecture
Data Transformations

Data Source Object (DSO)
● Multi Format (JSON, CSV, Parquet, ORC)
● Multi Sourced - Cloud Storage, MuleSoft, Kafka
● Schema Preserving
● Virtual BYOL Tables

Data Lake Objects (DLO)
● Schema enforced
● Parquet formatted Iceberg Tables
● Hydrated by transformations
● Typed (Profile vs Engagement)
● Materialized Tables
● Salesforce data comes directly into Data Lake Objects

Data Model Object (DMO)
● Semantic Mapping establishes DLO to DMO
● Can be optionally materialized
● Insights, Unified Profiles are DMOs
● Simplified, curated data that powers business applications

Data is logically organized as 4 parts

Data Source Objects - the original data sources. This is the customer’s original file format (e.g. CSV) or transient data storage in case of built-in connectors (e.g. Marketing Cloud).

Data Lake Objects - the data that is transformed and actually stored in the lake. This is generally stored as Parquet files.

Data Spaces - Once your data has been ingested, it is assigned to a Data Space that acts as a partition, allowing you greater control over how your data is organized.

Data Model Objects - These are either materialized or views on top of the Data Lake Objects. These can be CIM objects or materialized ones such as Unified Individual, Computed Insights, transformations etc.
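A minimal Python sketch of the bronze, silver, gold progression follows. It is not how Data Cloud implements these layers; it only illustrates the ideas of a schema-preserving source object, a schema-enforced and typed lake object, and a semantic mapping onto model field names. The class `ContactDLO` and all field names (`IndividualId`, `EmailAddress`, `CreatedDate`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

# Bronze: a Data Source Object keeps rows as they arrived (all strings here).
dso_rows: List[Dict[str, str]] = [
    {"Id": "001", "Email": "ada@example.com", "SignupDate": "2024-01-15"},
    {"Id": "002", "Email": "", "SignupDate": "not-a-date"},
]


@dataclass(frozen=True)
class ContactDLO:
    """Silver: a typed row with an enforced schema (hypothetical fields)."""
    id: str
    email: str
    signup_date: date


def to_dlo(row: Dict[str, str]) -> ContactDLO:
    # Schema enforcement: reject rows that do not fit the declared types.
    if not row["Email"]:
        raise ValueError("Email is required")
    return ContactDLO(row["Id"], row["Email"], date.fromisoformat(row["SignupDate"]))


def to_dmo(dlo: ContactDLO) -> Dict[str, object]:
    """Gold: map DLO fields onto hypothetical data model field names."""
    return {
        "IndividualId": dlo.id,
        "EmailAddress": dlo.email,
        "CreatedDate": dlo.signup_date,
    }


dlo_rows: List[ContactDLO] = []
rejected: List[Dict[str, str]] = []
for row in dso_rows:
    try:
        dlo_rows.append(to_dlo(row))
    except ValueError:
        rejected.append(row)

dmo_rows = [to_dmo(d) for d in dlo_rows]
```

In this sketch the second source row never reaches the silver layer because it fails the schema check, which is the practical difference between a schema-preserving and a schema-enforced object.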
Data Streams & Sources
Ingestion - Marketing Cloud
What is it?
• Native Integration with Marketing Cloud to bring in any MC
data into Data Cloud
• Ingest Data from any data extension in MC in a few clicks
and any channel related data like Opens, Clicks, Bounce etc.

Unique Value Proposition


• Seamless Native Integration with Salesforce Proprietary
APIs
• Time To Value - Pre-Built Bundles
• All Engagement Data from MC available in CDP instantly.
• Clicks Not Code
• Ingest Data at High Scale & Velocity
Example Use Cases
Ingest Email Open, Click data to identify top engagers for
segmentation
Ingest Einstein scores for AI Based Segmentation
Surface Marketing Insights to CRM Agents
Ingestion - CRM
What is it?
• Ingest any data from CRM in a few clicks across Sales,
Service, Loyalty & any CRM Object.
• Pre-Made Bundles mapped to CDP Data Model
• Packaging capabilities to create industry specific/ISV
bundles

Unique Value Proposition


• Seamless Native Integration with Salesforce Proprietary
APIs
• Time To Value - Pre-Built Bundles
• All Data from CRM accessible at big data scale -
Competitors offer integrations that only allow a subset of data
to be ingested.
• Clicks Not Code
Example Use Cases
Ingest Accounts, Case, Leads, Loyalty data for segmentation.
Single repository of Data across CRM & Other sources for BI
Ingestion - Commerce Cloud
What is it?
• Ingest Commerce Cloud Order Data and Related Customer
and Catalog Data with OOTB Connector

Unique Value Proposition


• This is a unique capability offered only between Salesforce
CDP and Commerce Cloud
• Time To Value - Pre-Built Bundles
• Clicks Not Code

Example Use Cases


Unify Online data from Commerce with Offline data coming
from other sources to understand lifetime value of the
customer.

Leverage Order Data to create affinities based on previous
purchasing patterns within CDP
Ingestion - Web SDK
What is it?
• SDK/Tag to capture real-time customer events from the
brand’s website

Unique Value Proposition


• Unified SDK with Personalization allows Data Collection and
Actionability using same tag

Example Use Cases


● Collect Real-Time Web Behaviour - Views, Clicks, Add to
Cart, Form Submission, Watch Video etc
● Trigger actions based on real-time behavior on any
channel - Email, SMS, Push, Sales/Service Events,
External Webhooks, Slack Message, Stream to
Warehouse & more
● Leverage Web Data for other use cases for Insights,
Identity Resolution, Segmentation, Activation, Business
Intelligence, Personalization
Ingestion - Mobile SDK
What is it?
• Mobile SDK to capture all mobile transactions, behaviors,
and other events.

Unique Value Proposition


• Unified SDK with Marketing Cloud allows same SDK to
capture mobile events as well as trigger personalized push,
in-app messages and more
• Fully Integrated with Journey Builder to trigger
omni-channel journeys
Example Use Cases
● Collect Real-Time Mobile Behaviour - Views, Clicks, Add
to Cart, Form Submission, Watch Video etc
● Trigger actions based on real-time behavior on any
channel - Email, SMS, Push, Sales/Service Events,
External Webhooks, Slack Message, Stream to
Warehouse & more
● Leverage Mobile Data for other use cases for Insights,
Identity Resolution, Segmentation, Activation, Business
Intelligence, Personalization
Ingestion - Personalization
What is it?
• Native integration to ingest data from Interaction Studio.
• Integrate multiple datasets to get a global view across
brands and regions.
• All types of events are ingested in a few clicks.
• Ingest anonymous and known data.

Unique Value Proposition


• Faster time to value with customers already using
Personalization
• Native Connector with Personalization Engine to streamline
Data collection and Personalization.
Example Use Cases
Build affinities within CDP using Calculated Insights based on
raw data from Personalization

Create a superset of data to understand/segment on/report
on customer lifetime value, affinities across business units
and datasets for personalization.
Ingestion - Cloud Storage - S3
What is it?
• Ingest data from any system via S3 bucket
• Import data stored on public cloud seamlessly

Unique Value Proposition


• UI Driven Experience
• Clicks not Code
• Automatic delimiter, data type, and date time pattern
detection
• Ability to transform incoming data easily
• Wildcard match to accommodate date-stamped or
otherwise changing file names
• High water mark tracking to allow only reading new files
• Customized scheduler (hourly, weekly, monthly)

Example Use Cases


Ingest any and all external data sources in bulk.
Ingest data available in customer’s data lake or other services
on AWS in a few clicks.
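The wildcard match and high water mark features listed above can be sketched in Python. This is a sketch of the general technique, not the connector's actual implementation: `StoredFile`, `select_new_files`, and the sample file names are hypothetical, and the file list is an in-memory stand-in for a bucket listing.

```python
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatch
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StoredFile:
    name: str
    last_modified: datetime


def select_new_files(
    files: List[StoredFile],
    pattern: str,
    high_water_mark: Optional[datetime],
) -> Tuple[List[StoredFile], Optional[datetime]]:
    """Return files matching the pattern that are newer than the mark, plus the new mark."""
    matched = [
        f for f in files
        if fnmatch(f.name, pattern)
        and (high_water_mark is None or f.last_modified > high_water_mark)
    ]
    matched.sort(key=lambda f: f.last_modified)
    new_mark = matched[-1].last_modified if matched else high_water_mark
    return matched, new_mark


bucket = [
    StoredFile("orders_2024-01-01.csv", datetime(2024, 1, 1, 2, 0)),
    StoredFile("orders_2024-01-02.csv", datetime(2024, 1, 2, 2, 0)),
    StoredFile("customers_2024-01-02.csv", datetime(2024, 1, 2, 3, 0)),
]

# First run reads both date-stamped order files; a second run with the
# saved mark finds nothing new.
first_run, mark = select_new_files(bucket, "orders_*.csv", None)
second_run, mark_after = select_new_files(bucket, "orders_*.csv", mark)
```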
Ingestion - Cloud Storage - GCS
What is it?
• Ingest data from any system via Google Cloud Storage.
• Import data stored on public cloud seamlessly.

Unique Value Proposition


• UI-Driven Experience
• Automatic delimiter, data type, and date time pattern
detection
• Ability to transform incoming data easily
• Wildcard match to accommodate date-stamped or otherwise
changing file names
• High water mark tracking to allow only reading new files
• Customized scheduler (hourly, weekly, monthly)
Example Use Cases
Ingest any and all external data sources in bulk.
Ingest data available in customer’s data lake or other services
on GCP in a few clicks.
Ingest Google Analytics data of your choice via the
GA->Bigquery->GCS ingestion route.
Ingestion - APIs
What is it?
• Streaming and Bulk APIs
• Send data from any application to CDP

Unique Value Proposition


• Easily Configurable Schema
• Designed for High Scale, High Velocity
• Packaging support for re-usability
Example Use Cases
● Ingest Real-Time POS data from Store
● Ingest Weather Updates
● Ingest Loyalty Data
● Ingest External Data Sources from any system
Ingestion API
RESTful API with two different patterns

Streaming API
● Small micro-batches of records being updated in near-real time
● Source system built on modern streaming architectures
● Change data capture events
● Consuming data from webhooks
● Updates to individual profile
● Events and Behavioral data

Batch API
● Moving large amounts of data on a daily, weekly, or monthly schedule
● Legacy systems where you can only export data in off-peak hours
● A new Salesforce CDP org that you want to backfill with 30/60/90/X days of data
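The streaming pattern's "small micro-batches" can be illustrated with a short Python sketch that groups records before sending. It makes no network call and does not reproduce the Ingestion API's actual contract: `micro_batches`, `to_payload`, the event fields, and the `"data"` envelope are assumptions for illustration, so check the API reference for the real request shape.

```python
import json
from typing import Dict, Iterator, List

Record = Dict[str, object]


def micro_batches(records: List[Record], max_records: int) -> Iterator[List[Record]]:
    """Split records into small groups, as a streaming client might before sending."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    for start in range(0, len(records), max_records):
        yield records[start:start + max_records]


def to_payload(batch: List[Record]) -> str:
    # Assumed envelope: a JSON object with a "data" array. Verify the real
    # request shape in the API reference before relying on this.
    return json.dumps({"data": batch})


events = [{"event_id": i, "type": "page_view"} for i in range(5)]
payloads = [to_payload(b) for b in micro_batches(events, max_records=2)]
```

A batch-pattern client would instead export a large file on a schedule and upload it in one job, which suits backfills and off-peak exports better than many small requests.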
Ingestion - Mulesoft
What is it?
• Native Integration with Mulesoft to ingest data using
Streaming and Bulk APIs

Unique Value Proposition


• Mulesoft opens an ecosystem of 250+ OOTB Native
Connectors
• If you need a connector, chances are MuleSoft already has
one.
• API First strategy for data integration and complex use cases
• Reduce Time to build custom integrations
• IT agility
• Accelerator Patterns focused on time to value
Example Use Cases
Ingest data from legacy systems
Ingest data from external systems like POS, OMS, Snowflake,
Azure and other sources for which CDP does not have an
OOTB connector
Data Ingestion Timings

Connectors | Lookback Window | Data Delivery | Latency | Refresh Mode
Marketing Cloud | 90 days | Batch | Hourly - 24 Hours | Upsert or Full Refresh
CRM | No limit | Batch | Hourly; Bi-weekly | Upsert (Hourly); Full Refresh (Bi-weekly)
Cloud File Storage (S3, GCS, Azure) | None | Batch | Hourly | Upsert or Full Refresh
B2C Commerce | 30 days | Batch | Sales Order and Sales Order Customer - Hourly; Others - Daily | Sales Order - Upsert; All others - Full Refresh
Marketing Cloud Personalization | 0 days | Near Real Time | Profile - 15 minutes; Events/Engagement - 2 mins | Users - Upsert; All others - Insert
Ingestion API (Batch and Streaming) |  | Near Real Time | 15 minutes | Upsert
Web and Mobile SDK |  | Near Real Time | User Profiles - Hourly; Engagement - 15 minutes |
Mulesoft (using Ingestion API) |  | Near Real Time | 15 Minutes |
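The refresh modes named in the timings table differ in what happens to rows that already exist. The Python sketch below shows the general meaning of upsert versus full refresh on an in-memory table keyed by primary key. It is an assumption-level illustration of the two terms, not a description of Data Cloud internals, and the function names and sample rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]
Table = Dict[str, Row]  # keyed by primary key


def upsert(table: Table, incoming: List[Row], key: str) -> Table:
    """Update rows whose key already exists and add the rest; keep untouched rows."""
    result = dict(table)
    for row in incoming:
        result[row[key]] = row
    return result


def full_refresh(table: Table, incoming: List[Row], key: str) -> Table:
    """Discard the existing rows and load the incoming set in their place."""
    return {row[key]: row for row in incoming}


existing: Table = {
    "1": {"id": "1", "status": "open"},
    "2": {"id": "2", "status": "open"},
}
delivery = [{"id": "2", "status": "closed"}, {"id": "3", "status": "open"}]

after_upsert = upsert(existing, delivery, key="id")
after_refresh = full_refresh(existing, delivery, key="id")
```

After the upsert, row 1 survives and row 2 is updated; after the full refresh, row 1 is gone because it was absent from the delivery.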
Types of Data in Data Cloud - Categories

PROFILE: Party Identification, Contact Point App, Contact Point Email, Contact Point Mobile, Profile Attributes, Device
EVENT / ENGAGEMENT: Order, Case, Visit, Download
OTHER: Catalog, Lookups, Pricebooks

● Profile: Individual, Account, other profile-like entities


● Engagement: Behavioral events or transactions, such as clicks or purchases. Completed at a
snapshot in time by a profile entity, such as an Individual or Household
● Other: Related to Profile & Engagement but isn’t in that set, such as product or store info

Considerations
● You cannot change the category after saving the data stream
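The consideration above (the category is fixed once the data stream is saved) can be modeled in a few lines of Python. This is a hypothetical model for illustration only: `DataStreamDraft` and its methods are not part of any Salesforce API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    PROFILE = "Profile"
    ENGAGEMENT = "Engagement"
    OTHER = "Other"


@dataclass
class DataStreamDraft:
    """Hypothetical model of a data stream whose category locks on save."""
    name: str
    category: Optional[Category] = None
    saved: bool = False

    def set_category(self, category: Category) -> None:
        if self.saved:
            raise RuntimeError("category cannot be changed after the data stream is saved")
        self.category = category

    def save(self) -> None:
        if self.category is None:
            raise ValueError("choose a category before saving")
        self.saved = True


stream = DataStreamDraft("web_clicks")
stream.set_category(Category.PROFILE)
stream.set_category(Category.ENGAGEMENT)  # still allowed before saving
stream.save()

try:
    stream.set_category(Category.OTHER)
    change_rejected = False
except RuntimeError:
    change_rejected = True
```

The practical takeaway is to decide between Profile, Engagement, and Other before saving, since a wrong choice means recreating the data stream.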
Q&A
