Data Products

The document discusses data products, defining them as products that derive value from analytics and whose primary objective is to use data to facilitate an end goal. It categorizes data products into three types - data as a service, data-enhanced products, and data as insights - and provides examples of recommendation engines and LinkedIn's use of data products. The document also describes five broad functions of data products: raw data, derived data, algorithms, decision support, and automated decision-making.

Uploaded by

Kelvin Ting

Data Products

What comes to mind?

The next bus will arrive in 10 minutes.


The price of a hotel reservation for next week is
$97. 

2
Learning Objectives

1. To explain what a data product is.
2. To categorize the types of data products.
3. To describe the functions of data products.
4. To describe data product interfaces and interactions.
5. To become familiar with feature engineering.
6. To discuss deployment and productization.

3
The Age of the Data Product
• The information revolution, driven as it is by networked
communication systems and the Internet, is unique in that it
has created a surplus of a valuable new material, data,
and transformed us all into both consumers and
producers.
• The sheer amount of data being generated is tremendous.
Data increasingly affects every aspect of our lives.
• We have developed a reasonable expectation for products
and services that are highly personalized and finely tuned
to our needs, creating a market for a new information
technology: the data product.

https://www.oreilly.com/library/view/data-analytics-with/9781491913734/ch01.html
4
What is a Data Product?
• Data science is about insights (not information, not
technology)
• The best insights are actionable.
• Data products are products that derive substantial
value from analytics and whose primary objective
is to use data to facilitate an end goal.
– Information product: Google Analytics
– Data product: Google Search
• Data products are the reason data scientists are
lately treated like rock stars.

5
“The future belongs to the companies and people that turn
data into products.” -- Mike Loukides, O’Reilly

6
Notice the Difference!
• An information product is any product or service
that you can sell to people to provide them with
information, usually about a specific topic.
– e.g. instruction manuals, news services and
online directories.
• A knowledge product is a result of human
thought that has value.
– e.g. lessons-learned reports, summaries of best
practices, theses and journal articles.

7
Data products are created with data science workflows,
specifically through the application of models, usually
predictive or inferential, to a domain-specific dataset.

A data product is an economic engine.


It derives value from data and then
produces more data, more value, in
return.
8
3 Types of Data Products

9
Data as a Service
• Data itself is the product.
• These products are offered to users as either a
paid-for or a free service.
• All data products that create direct revenue fall
into this category.
• Companies offering this type of data product
provide data for specific interests, such as
to-the-second accurate stock-market data or
location-specific weather data.
• e.g. AccuWeather, Gro Intelligence
10
Data-Enhanced Products
• These are data-based additional functions which
modify a traditional product to increase its value.
• Data products which enhance a physical or virtual
product fall into this second category.
• The value of such a product is reflected in the change
in revenue (price or quantity) of the enhanced product.
• Most recommenders fall in this category, as they
improve the sales of products.
– Reportedly, 35% of Amazon’s revenue comes from
recommendations, and 75% of Netflix content is
consumed based on recommendations.

11
Data as Insights
• These are products that analyze data to
provide insights to decision makers within an
organization.
• e.g. Google Analytics, Tableau, Cloudera /
Hortonworks / MapR
• Insights as a Service is a software service
that specifically delivers quality, actionable
insights. Typically, such services are hosted
in the cloud.
12
13
An Example of Data Product based on
Predictive Modeling
Recommender systems
A subclass of information filtering systems that are meant to predict the
preferences or ratings that a user would give to a product. Recommender
systems are widely used in movies, news, research articles, products,
social tags, music, etc.

Amazon recommender system


Amazon examines the items customers have purchased and, based on
the similar purchase behavior of other users, makes recommendations. In
this case, order history data is combined with recommendation
algorithms to make predictions about what a customer might purchase in
the future.

14
Recommendation Engines
• Recommendation engines select the
products that a particular customer is likely
to be interested in or to buy, based on his
or her previous buying history.
• In general, the more data available about a
customer, the more accurate the
recommendations.

15
Types of Recommender Systems
The TWO main types of recommender systems are collaborative filtering and
content-based filtering.

16
Collaborative Filtering
A method of making automatic predictions (filtering) about the
interests of a user by collecting preferences or taste information
from many users (collaborating).
Typically, the workflow of a collaborative filtering system is:

1. Look for users who share the same rating patterns with the
active user (the user whom the prediction is for).
2. Use the ratings from those like-minded users found in step 1 to
calculate a prediction for the active user.
Two types of collaborative filtering techniques are used:
1. User-based collaborative filtering
2. Item-based collaborative filtering
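
The two-step workflow above can be sketched for the user-based case as follows. This is only a minimal illustration, not the method of any particular product: the ratings table, the user names, and the choice of cosine similarity with a similarity-weighted average are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings table: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "C": 4, "D": 5},
    "cat": {"A": 1, "B": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Step 1: compare two users on the items both have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(active, item):
    """Step 2: similarity-weighted average of other users' ratings."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == active or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[active], other_ratings)
        num += sim * other_ratings[item]
        den += sim
    return num / den if den else None


# "ann" has not rated item "D"; the prediction leans towards "bob",
# whose rating pattern matches hers most closely.
print(predict("ann", "D"))
```

Item-based collaborative filtering follows the same idea with the roles swapped: similarities are computed between items (columns) rather than between users (rows).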

https://towardsdatascience.com/brief-on-recommender-systems-b86a1068a4dd
17
Two types of collaborative
filtering techniques
• User-based collaborative filtering
• Item-based collaborative filtering

18
Content-based filtering
• This filtering is based on the description of, or other data
provided for, each product.
• The system finds the similarity between products based
on their content or description. The user’s previous history is
taken into account to find similar products the user may
like.

If a user likes a movie such as ‘Mission Impossible’, then we can
recommend other ‘Tom Cruise’ movies or
movies in the ‘Action’ genre.
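
As a rough sketch of content-based filtering, the example below scores movies by how many descriptive tags they share with a movie the user liked. The small catalogue, its tags, and the use of Jaccard similarity are assumptions chosen for illustration; real systems typically use richer descriptions and similarity measures.

```python
# Hypothetical catalogue: title -> set of descriptive tags (genre, lead actor)
catalogue = {
    "Mission Impossible": {"action", "tom cruise", "spy"},
    "Top Gun": {"action", "tom cruise", "aviation"},
    "Edge of Tomorrow": {"action", "tom cruise", "sci-fi"},
    "Notting Hill": {"romance", "comedy", "hugh grant"},
}


def jaccard(a, b):
    """Share of tags two products have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


def recommend(liked, k=2):
    """Return the k titles whose tags are most similar to the liked title."""
    scores = {
        title: jaccard(catalogue[liked], tags)
        for title, tags in catalogue.items()
        if title != liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("Mission Impossible"))
```

Note that no other users' ratings are needed here: the recommendation relies only on product descriptions and the user's own history.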

19
Building Products from Data at
LinkedIn

20
Active vs Passive Data
• Active / explicit data – user needs to
actively provide the data (e.g. User ratings
and reviews).
• Passive / implicit data – data is gathered
automatically, often without the user's
knowledge (e.g. user clicks and views).

21
Data Product Functions
FIVE broad groups of data product
functions:
1. Raw data,
2. Derived data,
3. Algorithms,
4. Decision support and
5. Automated decision-making

https://towardsdatascience.com/designing-data-products-b6b93edf3d23
22
(1) Raw Data
• Starting with raw data, we are collecting
and making available data as it is
(perhaps we’re doing some small
processing or cleansing steps).
• The user can then choose to use the data
as appropriate, but most of the work is
done on the user’s side.

http://Gnip.com - the official reseller of Twitter's data (tweets)

23
(2) Derived Data

• In providing users with derived data, we
are doing some of the processing on our
side.
• In the case of customer data, we could add
attributes such as a customer segment for
each customer, or their likelihood of clicking on
an ad or of buying a product from a certain
category.
24
What is Derived Data?
• A derived data element is a data element
derived from other data elements using a
mathematical, logical, or other type of
transformation, e.g. arithmetic formula,
composition, aggregation.
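
A minimal sketch of deriving data elements by aggregation and a logical rule is shown below. The order records, the field names, and the 200.0 threshold for the "high" segment are all hypothetical.

```python
# Hypothetical raw data: one record per order
orders = [
    {"customer": "c1", "amount": 120.0},
    {"customer": "c2", "amount": 40.0},
    {"customer": "c1", "amount": 95.0},
]


def derive_customer_attributes(orders, threshold=200.0):
    """Derive total spend (aggregation) and a segment (logical rule)."""
    totals = {}
    for order in orders:
        customer = order["customer"]
        totals[customer] = totals.get(customer, 0.0) + order["amount"]
    return {
        customer: {
            "total_spend": total,
            "segment": "high" if total >= threshold else "standard",
        }
        for customer, total in totals.items()
    }


print(derive_customer_attributes(orders))
```

Neither `total_spend` nor `segment` exists in the raw orders; both are computed from other data elements, which is what makes them derived data.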

25
(3) Algorithms
• Algorithms, or algorithms-as-a-service.
• We are given some data, we run it through the
algorithm - be that machine learning or otherwise 
- and we return information or insights.
• A good example is Google Images: the user
uploads a picture, and receives a set of images
that are the same as or similar to the one uploaded.
• Behind the scenes, the product extracts
features, classifies the image and matches it to
stored images, returning the ones that are most
similar.
26
(4) Decision Support
• Providing information to the user to help them with decision-
making but we are not taking the decision ourselves.
• Analytics dashboards such as Google Analytics, Flurry, or
WGSN would fall into this category.
• Give the user relevant information in an easy-to-digest format to
allow them to take better decisions.
• In the case of Google Analytics, that could mean changing the
editorial strategy, addressing leaks in the conversion funnel, or
doubling down on a given product strategy.
• The important thing to remember here is this: while we
have made design decisions about data collection, the derivation of new
data, and what data to display and how to display it, the
user is still tasked with interpreting the data themselves. They are
in control of the decision to act (or not act) on that data.
27
(5) Automated Decision-Making
• Here we outsource all of the intelligence within a
given domain.
• Netflix product recommendations or Spotify’s
Discover Weekly would be common examples.
• Self-driving cars or automated drones are
more physical manifestations of this closed
decision-loop.
• We allow the algorithm to do the work and
present the user with the final output (sometimes
with an explanation as to why the AI chose that
option, other times completely opaque).
28
Notes
• Generally speaking, these product types are
listed in order of increasing complexity.
More specifically, they are listed in order of
increasing internal complexity and (ideally)
decreasing complexity on the user’s side.

• The more computation, decision-making or
“thinking” the data product does itself, the
less thinking is required of the user.

29
• Typically raw data, derived data and
algorithms have technical users.
• They tend to be internal products
in an organization, but counter-examples
include ad exchanges and API suites.
• Decision support and automated decision-
making products tend to have a more
balanced mix of technical and non-technical
users.

30
Data Interactions
• Each of these data products can be
presented to our users in a variety of
ways.
• What are these interfaces or interactions?

31
API
• APIs are the de facto standard for building and
connecting modern applications.
• AccuWeather, via its self-service portal, offers both an
API product with up-to-the-minute weather data and an API
product with daily weather data.
https://www.accuweather.com/en/my/pantai-valley/787644/weather-forecast/787644
32
Dashboards & Visualizations
• Dashboards are data visualization tools that
allow users to understand the analytics that
matter to their business.

33
Web Elements
• More recently, these interfaces have been
broadly extended to include voice, robotics and
augmented reality, amongst others.

34
Data Product Matrix

Different products require different approaches


35
Art of Data Science
• Data science builds models that work on large
datasets, from which one makes predictions.
• The art of data science is to figure out which feature
to use when.
• A dataset consists of rows of data stored in a
table. Every column is called a feature, and the model
that we build needs to shortlist the features.
• Predictions are made based on the features; the
shortlisting of features is called feature selection.

https://www.analyticsindiamag.com/data-enabled-products-defining-future-data-science/
36
Feature Engineering
Feature engineering is the process of using
domain knowledge of the data to create
features that make machine learning algorithms
work.
THREE main tasks in feature engineering:
• Feature transformation
• Feature generation
• Feature selection

37
Feature Transformation
• Constructing new features from existing
features; this is often achieved using
mathematical mappings.
• For example, the body mass index (BMI) is a feature
obtained through feature transformation
using a mathematical formula.
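
The BMI example can be written out as a short sketch. BMI is weight in kilograms divided by the square of height in metres; the record layout and the field names `weight_kg` and `height_m` are assumptions for illustration.

```python
def add_bmi(record):
    """Construct a new 'bmi' feature from two existing features."""
    bmi = record["weight_kg"] / record["height_m"] ** 2
    return {**record, "bmi": round(bmi, 1)}


# Hypothetical input record with the two original features
person = {"weight_kg": 70, "height_m": 1.75}
print(add_bmi(person))
```

The two original features are kept alongside the new one, so a later feature selection step can decide which of them the model actually needs.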

38
Feature Generation
• Generating new features that are often not the result of
feature transformation.
• For example, one generates new usable features for images
from the pixels of the images (as the pixels are not usable
features).
• Many domain-specific ways of defining features also belong
in the feature generation category.
• Feature generation methods can be generic automatic ones,
in addition to domain-specific ones.
• Patterns mined from given data can also be used to generate
new features. Sometimes the terms “feature extraction” and
“feature construction” are used for feature generation.
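
To illustrate generating usable features from image pixels, the sketch below summarizes a small grayscale image as a mean brightness and a normalized intensity histogram. The 2x2 image, the 0-255 pixel range, and the choice of these two summary features are assumptions; practical image features are usually far richer.

```python
def image_features(pixels, bins=4):
    """Summarize a grayscale image (rows of 0-255 values) as features."""
    flat = [p for row in pixels for p in row]
    hist = [0] * bins
    for p in flat:
        # Map the 0-255 intensity to one of `bins` equal-width buckets.
        hist[min(p * bins // 256, bins - 1)] += 1
    n = len(flat)
    return {
        "mean_brightness": sum(flat) / n,
        "histogram": [count / n for count in hist],
    }


# Hypothetical 2x2 grayscale image
image = [[0, 64], [128, 255]]
print(image_features(image))
```

Whatever the image size, the output has a fixed number of features, which is what most learning algorithms expect as input.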

39
Feature Selection
• Selecting a small set of features from a very
large pool of features. The reduced feature set
size makes it computationally feasible to use
certain machine learning and data analytic
algorithms.
• Feature selection may also lead to improved
quality of the results of those algorithms.
• Feature selection has traditionally been focused
on the classification problem, but it is also
needed for other data analytic problems.
40
Feature Selection
A dataset about customers
• Goal: to find out what product customers are most likely to buy.
• We have to figure out which features are important in making
those predictions.
• We might decide that the age of the customer is a feature to be
included in the analysis, and that gender is a feature to be included,
but that for some reason the post code where they reside is not a
feature to be included.
• This process is called feature selection. Once we have our
features, we build models, which fall into a
few different categories.
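
One simple way to shortlist features, sketched below, is to rank each column by the absolute value of its Pearson correlation with the target and keep the top k. The customer data is invented for the example, and correlation ranking is only one of many possible selection criteria, not necessarily the one the slides have in mind.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def select_features(columns, target, k):
    """Keep the k columns most strongly correlated with the target."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical customer dataset (one list entry per customer)
columns = {
    "age": [22, 35, 47, 51, 63, 29],
    "gender_code": [0, 1, 0, 1, 1, 0],
    "post_code": [50000, 43000, 56000, 47000, 52000, 59000],
}
bought = [0, 1, 1, 1, 1, 0]  # target: did the customer buy the product?

print(select_features(columns, bought, k=2))
```

On this toy data the post code has the weakest relationship with the target and is the feature left out.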

41
Notes
• Automatic feature engineering is about
generic approaches for automatically
generating a large number of features and
selecting an effective subset of the
generated features in the process.
• Feature analysis and evaluation is about
evaluating the usefulness of features and
feature sets. This is sometimes included as
part of feature selection.
42
Productization
Any successful data science project must
end with productization.
This is the stage where trained models are
deployed as applications that can be easily
accessed by end users.

43
Productionalizing Machine Learning
Models
FOUR different ways of productionalizing machine learning models:
batch prediction, web service, online learning and automated ML.

44
Batch Prediction
• The simplest form of machine learning workflow
is batch prediction.
• This is typically seen in academia and places
like Kaggle.
• You take a static dataset, run your model on it,
and output a forecast. How do you
productionalize something like that?
• On Kaggle, you save your predictions to a CSV
file that you submit through an online form.
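
A minimal batch prediction sketch: run a model over a static dataset and write every prediction to a CSV file in one pass. The `model` function is a hypothetical stand-in for a trained model, and the column names `id` and `prediction` are assumptions.

```python
import csv
import io


def model(row):
    """Hypothetical stand-in for a trained model."""
    return 1 if row["age"] >= 30 else 0


def batch_predict(rows, out):
    """Score a whole static dataset and write the results as CSV."""
    writer = csv.writer(out)
    writer.writerow(["id", "prediction"])
    for row in rows:
        writer.writerow([row["id"], model(row)])


# Hypothetical static dataset
dataset = [{"id": 1, "age": 25}, {"id": 2, "age": 41}]

buffer = io.StringIO()  # a real run would use open("predictions.csv", "w", newline="")
batch_predict(dataset, buffer)
print(buffer.getvalue())
```

Everything happens offline: the dataset is fixed before the run starts, and the predictions are only available once the whole file has been written.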

45
Web Service
• The most common type of machine learning
workflow is a simple web service.
• The web service takes in some parameters and
spits out a prediction straight away.
• This is way more agile than the batch prediction
scheme.
• The difference from batch predictions, apart from
running in near real-time, is that it handles a single
record at a time, instead of processing all the data
at once.
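
The single-record request/response pattern can be sketched as a small WSGI application built only on the Python standard library. The `predict` function is a hypothetical stand-in for a trained model, and the JSON request and response shapes are assumptions for illustration.

```python
import json


def predict(features):
    """Hypothetical stand-in for a trained model."""
    return 1 if features.get("age", 0) >= 30 else 0


def app(environ, start_response):
    """WSGI application: one JSON record in, one JSON prediction out."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length) or b"{}"
    features = json.loads(raw)
    body = json.dumps({"prediction": predict(features)}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# To serve it locally (port 8000 is an arbitrary choice):
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()
```

Each request carries the parameters for exactly one record, so the caller gets its prediction back immediately instead of waiting for a batch run to finish.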
46
Online Learning
• Emerging now is real-time streaming analytics, also
known as hot path analytics.
• This works very well with the lambda architecture
that’s so popular in big data systems.
• The input data in this case would be a stream of
events, and the model would be placed right in the
firehose, so to speak, running the model on the data
as it enters the system.
• The model would typically be running as a service on
a Spark cluster or something similar. This is very
useful for sensor data.
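
The idea of placing a model directly in the event stream can be sketched with a Python generator, which scores each event as it arrives rather than waiting for a complete dataset. This is only a single-process illustration, not a Spark job; the sensor readings and the `is_anomalous` rule are hypothetical.

```python
def is_anomalous(event):
    """Hypothetical stand-in for a trained model scoring one event."""
    return event["temp"] > 80.0


def score_stream(events, model):
    """Score events one by one as they flow through the stream."""
    for event in events:
        yield {**event, "anomaly": model(event)}


# Hypothetical sensor readings; in production this would be an unbounded stream.
readings = [
    {"sensor": "s1", "temp": 21.5},
    {"sensor": "s1", "temp": 98.0},
    {"sensor": "s2", "temp": 22.1},
]

for scored in score_stream(iter(readings), is_anomalous):
    print(scored)
```

Because the generator is lazy, a result is produced for each event without the stream ever having to end.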

47
Automated ML
• There is a tendency to think of machine learning models as
something you train, deploy and forget, but that’s often not
good enough.
• Online learning means that your model learns, improves
and updates itself while in production.
• This obviously requires some engineering, but the payoff is
a dynamic model.
• An even more sophisticated version of this is automated
machine learning. Instead of updating the model, you can
run an entire machine learning pipeline online in production
that comes up with entirely new models on the go.
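
A minimal sketch of online learning: a one-feature linear model that updates its parameters after every labelled observation, so it keeps improving while it serves predictions. The class name, the learning rate, and the synthetic stream generated from y = 2x + 1 are assumptions for illustration. It does not cover automated machine learning, which would search over entire pipelines.

```python
class OnlineLinearModel:
    """Linear model y = w * x + b, updated one observation at a time."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        """One stochastic gradient descent step on the squared error."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


model = OnlineLinearModel()

# Hypothetical production stream in which the true relation is y = 2x + 1.
for step in range(2000):
    x = step % 4
    model.learn_one(x, 2 * x + 1)

print(model.predict(2))  # approaches 5 as more observations arrive
```

The model is never retrained from scratch; each new observation nudges the existing parameters, which is the engineering trade-off mentioned above.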

48
Datapreneurs
Datapreneurs are entrepreneurs focused on data science and related topics such as
Business Intelligence, Business Analytics, Predictive Modeling and
Machine Learning.
Datapreneurs are classified into 4 areas:
i. Data Products,
ii. Data Science Services,
iii. Data Science Training, and
iv. Data Science Communities
Examples for the 4 areas:
i. Jim Goodnight & John Sall (SAS), Christian Chabot (Tableau)
ii. Gurjeet Singh (Ayasdi), Carlos Guestrin (Apple)
iii. Andrew Ng (Coursera), Sebastian Thrun (Udacity)
iv. Anthony Goldbloom (Kaggle), Gregory Piatetsky-Shapiro
(KDnuggets)
49
https://www.kdnuggets.com/2015/09/top-datapreneurs-data-science-analyticsvidhya.html
Try the Demo &
Watch the Video
• https://demos.datasciencedojo.com/

Beyond Analytics - Building Data Products for Data Natives
https://databricks.com/session/beyond-analytics-building-data-products-for-data-natives
(21 minutes)

So You Want to Build a Data Product?
http://on.wsj.com/1qbvL87
https://www.oreilly.com/library/view/data-analytics-with/9781491913734/ch01.html

50
