Module 3

The document discusses strategies for solving the "chicken and egg" problem in AI entrepreneurship where a lack of data prevents building AI systems, but AI systems are needed to generate more data. Some strategies discussed are: 1) Starting with a non-AI product to generate initial data; 2) Partnering with organizations that have existing data; 3) Crowdsourcing labeled data; and 4) Leveraging public data sources. Examples like Facebook and Lemonade are provided of companies that started without AI but were later able to develop AI systems using data generated from their initial non-AI products and services.

AI Fundamentals

Natural Language Processing

Prasanna (Sonny) Tambe, Associate Professor of Operations, Information and Decisions

Text is a Valuable Form of Unstructured Data

• Can provide valuable “signals” about markets and business decisions

• Online reviews tell us a lot about product decisions
• Online chatter can tell us about financial market activity
Natural Language Processing

• Text can be used to make predictions, but we first have to convert text to
“features”
• Sentiment, spelling, number of words used, etc.
• We can pre-process text to prepare it for analysis
• Correct issues with white space, extra spaces, punctuation, etc.
• Then identify what it is about the text that might be important for making
predictions in the future
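A minimal sketch of the pre-processing and feature steps described above, using only the Python standard library. The function names, the chosen features, and the sample review are illustrative assumptions, not part of the original slides.

```python
import re
import string

def preprocess(text: str) -> str:
    """Lowercase, strip punctuation, and collapse runs of white space."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def simple_features(text: str) -> dict:
    """Turn one document into a few numeric features (hypothetical choices)."""
    words = preprocess(text).split()
    return {
        "num_words": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        # Counted on the raw text, before punctuation is removed.
        "num_exclamations": text.count("!"),
    }

review = "  Great   phone!!  Battery lasts ALL day. "  # made-up example review
print(preprocess(review))
print(simple_features(review))
```

Which features matter depends on the prediction task; these three are only placeholders for the idea of converting text into numbers.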
Text Features

Example: A word
• Which words that appear in different texts predict outcomes
• Example: Online review
• Look at particular words that have meaning for making predictions about
product sales, repeat purchases, whether or not a review is “helpful”
• Could be sentiment-laden words, positive/negative words, or words about the
product
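One way to turn individual words into features is to count how often each word from a chosen vocabulary appears in a review. The sketch below assumes a tiny hand-picked vocabulary and two invented reviews; a real system would build the vocabulary from the data.

```python
from collections import Counter

# Hypothetical vocabulary, chosen only for illustration.
VOCABULARY = ["great", "terrible", "battery", "refund"]

def word_count_features(text: str, vocabulary=VOCABULARY) -> list:
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

reviews = [
    "great battery great screen",
    "terrible battery asked for a refund",
]
for review_text in reviews:
    print(word_count_features(review_text))
```

Each review becomes a fixed-length list of numbers, which is the form a prediction model can consume.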
Text Features

• Can use combinations of words or groups of words

• Sentiment analysis
• Describing whether the words someone uses in a piece of text mean
they are feeling more positive or negative about what they are talking
about
• Other approaches map text to a larger set of emotions, not just
positive or negative
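A rough sketch of both ideas on this slide: pairs of adjacent words (bigrams) as features, and a lexicon-based sentiment score. The word lists are invented for illustration, and real sentiment tools use far larger lexicons or trained models.

```python
# Hypothetical sentiment lexicons, for illustration only.
POSITIVE = {"love", "great", "excellent"}
NEGATIVE = {"hate", "terrible", "broken"}

def bigrams(words):
    """Pairs of adjacent words, e.g. ('not', 'good')."""
    return list(zip(words, words[1:]))

def sentiment_score(text: str) -> float:
    """Score in [-1, 1]: +1 if only positive words appear, -1 if only negative."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(bigrams("not very good".split()))
print(sentiment_score("I love this great phone"))
print(sentiment_score("great screen but terrible battery"))
```

A word-by-word lexicon misses negation ("not great"), which is one reason combinations of words can be more informative than single words.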
Natural Language Processing

• These features extracted from the text allow us to predict outcomes (e.g.
sentiment predicts buying behavior)
• Deep learning gives us even more flexibility
• Can combine the text content in richer and more meaningful ways
Natural Language Processing

Example: News articles and stock price movements

• Creating an algorithm that starts with breaking news and generates
information or predictions about stock price
• Start with a database of news articles
• Generate features from the news articles
• Label to be predicted may be the stock movement for a given time unit
• Use training data to train the model
• Test performance on test data set
• Deploy
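The steps above can be sketched end to end on toy data. The headlines and labels below are invented, the classifier is a small Naive Bayes written from scratch (one possible model choice, not one the slides prescribe), and the resulting accuracy says nothing about real markets.

```python
import math
from collections import Counter, defaultdict

# Invented toy data: (headline, stock movement in the following time unit).
LABELED_NEWS = [
    ("company beats earnings forecast", "up"),
    ("record profit and strong sales", "up"),
    ("strong growth beats expectations", "up"),
    ("company misses earnings forecast", "down"),
    ("weak sales and profit warning", "down"),
    ("lawsuit and weak growth reported", "down"),
    ("profit beats forecast", "up"),
    ("sales warning after lawsuit", "down"),
]

def featurize(text):
    """Generate features: word counts for one article."""
    return Counter(text.lower().split())

def train_naive_bayes(examples):
    """Count documents per label and words per label."""
    label_counts, word_counts, vocabulary = Counter(), defaultdict(Counter), set()
    for text, label in examples:
        features = featurize(text)
        label_counts[label] += 1
        word_counts[label].update(features)
        vocabulary.update(features)
    return label_counts, word_counts, vocabulary

def predict(model, text):
    """Pick the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word, count in featurize(text).items():
            if word in vocabulary:  # words never seen in training are skipped
                score += count * math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Train on the first six articles, test on the held-out last two.
train_set, test_set = LABELED_NEWS[:6], LABELED_NEWS[6:]
model = train_naive_bayes(train_set)
correct = sum(predict(model, text) == label for text, label in test_set)
accuracy = correct / len(test_set)
print(f"test accuracy: {accuracy:.2f}")
```

Keeping the test articles out of training is what makes the accuracy number a fair estimate; deployment would wrap `predict` behind whatever system receives the breaking news.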
Natural Language Processing

• Another common application for NLP, instead of prediction, is taking text
and putting it into groups or topics
• An example of unsupervised learning
• Topic modeling
• Classifies documents by content in a way that makes them easier to
interpret
• Will tell you how these documents should be grouped together in a way
that makes it easier to take action on them from a business perspective
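As a much simpler stand-in for topic modeling (this is not LDA or any standard topic model), the sketch below groups documents by word overlap without using any labels. The similarity threshold and the sample support messages are assumptions made for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Share of words two sets have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

def group_documents(docs, threshold=0.2):
    """Greedily assign each document to the most similar existing group."""
    groups = []  # each group: {"words": set of words, "members": [doc indices]}
    for i, doc in enumerate(docs):
        words = set(doc.lower().split())
        best = max(groups, key=lambda g: jaccard(words, g["words"]), default=None)
        if best is not None and jaccard(words, best["words"]) >= threshold:
            best["members"].append(i)
            best["words"] |= words
        else:
            groups.append({"words": words, "members": [i]})
    return [g["members"] for g in groups]

docs = [  # invented customer messages
    "refund request for damaged order",
    "password reset link not working",
    "damaged order need refund",
    "cannot reset my password",
]
print(group_documents(docs))
```

No one told the code which messages were about refunds and which were about passwords; the groups emerge from the text itself, which is what makes this unsupervised.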
AI Fundamentals
GANs and VAEs

Prasanna (Sonny) Tambe, Associate Professor of Operations, Information and Decisions

Generative Models

• Instead of classifying data into two categories, a generative model asks
what the underlying process is that could have generated the type of data
that we are seeing in the sample
• Generative models can create new data instances
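A very small example of the generative idea, assuming the data came from a bell-curve (Gaussian) process: estimate that process from the sample, then draw new instances from it. The observed numbers are made up, and real generative models for images or text are vastly more complex.

```python
import random
import statistics

def fit_gaussian(sample):
    """Estimate the process that could have produced the sample."""
    return statistics.mean(sample), statistics.stdev(sample)

def generate(mean, std, n, seed=0):
    """Create n new data instances from the fitted process."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]

observed = [9.8, 10.1, 10.4, 9.9, 10.3, 9.7, 10.0, 10.2]  # invented measurements
mean, std = fit_gaussian(observed)
new_points = generate(mean, std, n=5)
print(round(mean, 2), round(std, 2))
print([round(x, 2) for x in new_points])
```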
Generative Models

• Many examples of generative models applied to art and music

• Generate new songs in particular styles, like country or jazz
• Produce art in styles that mimic those of traditional masters (e.g. GANGogh)
• Text applications (e.g. GPT-3)
• Generate text in a way that could have been written by a student or
journalist
Generative Adversarial Networks (GANs)

• Used to generate artificial content that is increasingly hard to tell apart from
real content
• Uses two networks “competing” with one another
• A generative network that produces new content
• Another network, a discriminator, is used simply to tell whether the output of
the first network is real or fake
Generator and Discriminator Networks

• Over time, the generator will learn what it needs to do to create content that is
harder and harder for the discriminator to identify as being fake content
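The back-and-forth between the two networks can be sketched with one-dimensional numbers instead of images. In this toy version, both "networks" are single linear functions with hand-derived gradients, and the real data distribution is an assumption. It shows the structure of the training loop only and makes no claim about how well it converges.

```python
import math
import random

def sigmoid(s):
    """Numerically safe logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: fake = a * z + b
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c), "probability real"
    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)  # assumed "real" data distribution
        z = rng.gauss(0.0, 1.0)     # random noise fed to the generator
        fake = a * z + b

        # Discriminator step: lower -[log D(real) + log(1 - D(fake))].
        d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
        w -= lr * (-(1 - d_real) * real + d_fake * fake)
        c -= lr * (-(1 - d_real) + d_fake)

        # Generator step: lower -log D(fake), i.e. try to fool the discriminator.
        d_fake = sigmoid(w * fake + c)
        grad_fake = -(1 - d_fake) * w
        a -= lr * grad_fake * z
        b -= lr * grad_fake
    return (a, b), (w, c)

(gen_a, gen_b), (disc_w, disc_c) = train_toy_gan()
print("generator:", gen_a, gen_b)
print("discriminator:", disc_w, disc_c)
```

Each pass updates the discriminator first and the generator second, so each side is always reacting to the other's latest move.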
GAN Example: Real vs. Fake Faces
Generative Adversarial Networks (GANs)

• A lot of controversy around these types of applications

• Concerns around deepfakes
Variational AutoEncoders (VAEs)

• Encoders take data and boil it down to a simpler representation

• AutoEncoders can take data and boil it down to a simpler representation which
can then be used to recreate itself
Variational AutoEncoders (VAEs)

• Variational AutoEncoders can be used to slightly vary some attributes or
aspects of the image in ways that we might care about
• Controlled content generation
• Generating artificial content in a way that we can start to control how it’s
different or how it gets tweaked
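The encode, decode, and "tweak" steps can be illustrated with a hand-picked linear map. This is not a trained VAE: a real one learns its encoder and decoder as neural networks and models a distribution over the compressed code. The direction vector and the sample point below are assumptions chosen so the round trip is exact.

```python
import math

# Hand-picked unit vector (an assumption); a real autoencoder would learn this.
DIRECTION = (1 / math.sqrt(5), 2 / math.sqrt(5))

def encode(point):
    """Boil a 2-number data point down to a single number."""
    return point[0] * DIRECTION[0] + point[1] * DIRECTION[1]

def decode(code):
    """Recreate a 2-number data point from the single-number code."""
    return (code * DIRECTION[0], code * DIRECTION[1])

original = (1.0, 2.0)
code = encode(original)
reconstructed = decode(code)

# Controlled generation: nudge the code slightly and decode each variant.
variants = [decode(code + delta) for delta in (-0.5, 0.0, 0.5)]
print(reconstructed)
print(variants)
```

Changing the code by a small, known amount produces outputs that differ from the original in a controlled way, which is the intuition behind controlled content generation.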
Applications: Controlled Content Generation
AI Fundamentals for Non-Data Scientists
ML Operations

Kartik Hosanagar, Professor of Operations, Information and Decisions

Traditional Dev Ops

Practices and tools used to build, test, and deploy code to production

https://docs.gitlab.com/ee/ci/introduction/
Machine Learning Workflow

• Code is not the only source of change: the data might change, and the model
itself might change as it retrains
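One way to make the three sources of change visible is to fingerprint the code, the data, and the model configuration separately and compare them between runs. This is a sketch under assumed names (`fingerprint`, `changed_inputs`, and the example values are hypothetical); dedicated ML Ops tools handle this far more thoroughly.

```python
import hashlib
import json

def fingerprint(obj) -> str:
    """Stable hash of any JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def changed_inputs(previous: dict, current: dict) -> list:
    """Names of the inputs whose fingerprint differs from the previous run."""
    return [name for name in sorted(current) if previous.get(name) != current[name]]

last_run = {
    "code": fingerprint("train.py v1"),             # stand-in for a code version
    "data": fingerprint([[1.0, 0], [2.0, 1]]),      # stand-in for a training set
    "model_config": fingerprint({"learning_rate": 0.1}),
}
# Same code and configuration, but one new row of data has arrived.
this_run = dict(last_run, data=fingerprint([[1.0, 0], [2.0, 1], [3.0, 1]]))
print(changed_inputs(last_run, this_run))
```

Here the code is unchanged, yet the pipeline still has a reason to retrain and re-test, which is the difference from traditional Dev Ops.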
Existing ML Ops Tools

• Infrastructure Management
• Data Management
• Model Management
• Deployment
• Monitoring
AI Fundamentals for Non-Data Scientists
Chicken and Egg

Kartik Hosanagar, Professor of Operations, Information and Decisions

The Need for Data Can Pose a Chicken & Egg Problem

• Performance differences between ML algorithms can be relatively small
• Existing companies generally don’t have a problem with obtaining training
data
• New products can face a chicken and egg problem
• Without users, they don’t have data
• Without data, they cannot build their AI product

[Diagram labels: Product(s), Users, Data, AI Systems]
Solving the Chicken & Egg Problem in AI Entrepreneurship

1 Start with a non-AI product that generates data
2 Partner with an organization that has data
3 Crowdsource the (labeled) data you need
4 Make use of public data
5 Rethink the need for data

[Diagram labels: Original Product(s), Users, Data, AI Systems, New AI Product(s), New Users]
1. Start with a Non-AI Product that Generates Data

• Create a non-AI service that solves customer problems and generates data in
the process
• This data can then be used to train an AI system that enhances the existing
service or creates a related service

• Facebook initially didn’t use AI, but the social networking platform
generated a lot of data
• This data was then used to train AI systems that helped personalize the
newsfeed and made it possible to run targeted advertising

• InsurTech startup Lemonade didn’t have data to build AI at first, but
over time has built AI to create quotes, process claims & detect fraud
• Now AI handles the “first notice of loss” for 96% of claims & manages
full claim resolution without human involvement in 1/3 of cases

https://www.lemonade.com/blog/the-sixth-sense/
https://www.sec.gov/Archives/edgar/data/1691421/000104746920003846/a2241899zs-1a.htm
2. Partner With An Organization That Has Data

• Partner with a company/organization that has proprietary data but lacks AI
expertise
• This is particularly useful if it is difficult to create a product that generates the
kind of data you need (e.g. medical data)

Example: Stanford Medicine + Google
• Combine patient data with Google’s cloud and AI capabilities to solve
important questions in healthcare
• Using alarm data to distinguish “false alarms from real ones” in
hospitalized patients’ monitors

https://med.stanford.edu/news/all-news/2016/08/stanford-medicine-google-team-up-to-harness-power-of-data-science.html
3. Crowdsource the (Labeled) Data You Need

• Crowdsourcing platforms like Amazon’s Mechanical Turk or Scale AI can be
used to get labeled data

CAPTCHAs
• While CAPTCHAs serve an important security purpose, Google
simultaneously uses them as a crowdsourced image labeling system
• A workflow that labels data without being distracting to the user

https://medium.com/swlh/ai-labeling-crowdsourcing-platforms-630adbc79c40
CAPTCHA Image: https://chrome.google.com/webstore/detail/buster-captcha-solver-for/mpbjkejclgfgadiemmefgebjfooflfhl
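Crowdsourced labels are noisy, so a common step is to collect several votes per item and keep the majority answer. The sketch below assumes a hypothetical vote format and agreement threshold; it does not use the Mechanical Turk or Scale AI APIs.

```python
from collections import Counter

def aggregate_labels(votes_by_item: dict, min_agreement: float = 0.6) -> dict:
    """Keep the majority label per item, or None when workers disagree too much."""
    results = {}
    for item, votes in votes_by_item.items():
        label, count = Counter(votes).most_common(1)[0]
        agreement = count / len(votes)
        results[item] = label if agreement >= min_agreement else None
    return results

votes = {  # invented worker answers for three images
    "img_001": ["car", "car", "truck"],
    "img_002": ["crosswalk", "crosswalk", "crosswalk"],
    "img_003": ["bus", "truck", "car"],
}
labels = aggregate_labels(votes)
print(labels)
```

Items that come back as `None` can be sent out for more votes or reviewed by an expert before they enter the training set.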
4. Make Use of Public Data (and Pre-Trained Models)

• Products based on public data may be less defensible, but defensibility
can be built via other product innovations
• Can also use publicly available pre-trained ML models that can be
customized with transfer learning
• Before you conclude that the data you need is not available, look harder:
there is more publicly available data than you might imagine, including
data marketplaces

https://www.cnbc.com/2020/03/03/bluedot-used-artificial-intelligence-to-predict-coronavirus-spread.html
5. Rethink the Need for Data

• Most of the practical AI today is built on ML (particularly supervised ML,
which requires large labeled training datasets)
• There are many approaches to building AI without large datasets
• Reinforcement learning
• Expert systems

https://medium.com/curai-tech/the-science-of-assisting-medical-diagnosis-from-expert-systems-to-machine-learned-models-cc2ef0b03098
Reinforcement Learning

• AI systems do not begin with large training datasets, but learn by taking
actions and observing the results
• Google DeepMind’s AlphaGo was initially trained on a large dataset of human
expert games, but its successor, AlphaGo Zero (later generalized as AlphaZero),
learned purely through reinforcement learning from self-play, yet it beat the
version of AlphaGo that had defeated world champion Lee Sedol
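Learning by taking actions and observing results can be shown with a multi-armed bandit, one of the simplest reinforcement learning settings and far simpler than the self-play used for Go. The win rates, step count, and exploration rate below are assumptions for illustration.

```python
import random

def run_bandit(true_win_rates, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: no training data, only trial and feedback."""
    rng = random.Random(seed)
    n_actions = len(true_win_rates)
    estimates = [0.0] * n_actions  # the agent's current guess of each action's value
    counts = [0] * n_actions       # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try something at random
        else:
            action = max(range(n_actions), key=lambda i: estimates[i])  # exploit
        # The environment responds; the agent never sees true_win_rates directly.
        reward = 1.0 if rng.random() < true_win_rates[action] else 0.0
        counts[action] += 1
        # Running average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.8])  # hidden win rates (assumed)
print([round(e, 2) for e in estimates])
print(counts)
```

The agent starts with no data at all; every estimate it holds comes from rewards it collected by acting.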
Expert Systems