Course2 - Cloud Digital Leader
https://cloud.google.com/training/business
Learn what data is, how to use it to make decisions, and its fundamental role in machine
learning. Concepts such as structured and unstructured data, databases, data warehouses,
and data lakes will also be introduced.
Cloud technology on its own provides only a fraction of the true value to a business. When
combined with data (lots and lots of it), it has the power to truly unlock value and create new
experiences for customers. In this course, you'll learn what data is, historical ways
companies have used it to make decisions, and why it is so critical for machine learning.
This course also introduces learners to technical concepts such as structured and
unstructured data, databases, data warehouses, and data lakes. It then covers the most
common and fastest growing Google Cloud products around data.
This is the second course in the Cloud Digital Leader series. At the end of this course, enroll
in the Infrastructure and Application Modernization with Google Cloud course.
Introduction
Welcome to Innovating with Data and Google Cloud! In this module, you'll meet the
instructor, learn about the course content, and how to get started.
Barry: Hi, I'm Barry Schmell, a course designer and facilitator at Google Cloud.
Business data is not a new term.
Businesses have leveraged information about performance and operations for centuries to
make decisions.
Traditionally, data analysis could take days or months, was often incomplete, and complex
reports were typically produced by specialized teams.
Cloud technology disrupts traditional data analysis.
Data can now be consumed, analyzed, and used at speed and scale never before
possible.
In fact, businesses can now leverage cloud technology to ingest data in real time to train
machine learning models and to take action.
In this course, we're going to explore how businesses can better use data in their digital
transformation journey.
In Module 1, I'll define data and its role in digital transformation.
I'll identify where you can find data and how you can create new insights by combining
different datasets.
In Module 2, I'll examine the differences and similarities between databases, data
warehouses, and data lakes.
I'll offer use cases for each and some relevant Google Cloud solutions for getting the most
value from your data.
Finally, in Module 3, I'll look at machine learning and artificial intelligence.
I'll highlight key opportunities for applying ML in any business and cover common Google
Cloud ML and AI solutions that you can start using right away.
Throughout the course, what I want you to remember is you don't have to be a data
scientist or technical expert to do data analysis or to use machine learning.
In fact, with the right cloud tools, data is now accessible in new ways.
Anyone in your organization can unlock the value of data to enable digital transformation.
Data is key to digital transformation. In fact, capturing, managing, and leveraging data is
central to redefining customer experience and creating new value in almost every industry.
This module begins by exploring the value of data for enabling digital transformation,
followed by a breakdown of the different types of data and considerations for collecting,
storing, and processing data when using cloud technology.
Introduction:
Barry: Data is a term that's used a lot in today's business world, and there's a good reason
for that.
Capturing, managing, and leveraging data is central to redefining customer experience
and creating new value in almost every industry.
In this module, I'll start with the basics.
What is data, and what's its role in the digital transformation of your business?
Then I'll discuss how you can leverage data in your organization.
Next, I'll break down the types of data, and finally, I'll go through some important data
considerations for every business using data in the cloud.
Let's get started.
[bright music] person: Let's start by asking a simple question: what is data?
Data is any information that is useful to an organization.
Imagine numbers on a spreadsheet or text in an email.
These are both examples of data.
Other examples include audio or video recordings, images, and even just ideas in
employees' heads.
Businesses now have access to data like never before.
This includes internal information, data from inside your organization, and external
information, customer and industry data.
For example, as organizations have digitized their operations, all kinds of business data
has become available, such as financial information, logistics data, production output, and
quality reports.
Businesses also have access to new kinds of data about their customers.
Consider digital interactions such as the length of time a user spends on a web page or
reaction to a social media post.
These are totally new and very rich sources of information about customer behavior.
The Internet has also increased access to external data, such as industry benchmarking
reports.
Capturing and leveraging this data to unlock business value is central to digital
transformation.
Large enterprises with traditional IT infrastructures face several limitations in leveraging
the value of data.
These limitations include: processing the volumes and varieties of new data, either at regular
time intervals (known as batch) or in real time; finding cost-effective solutions for setting up
and maintaining data centers; scaling resource capacity up or down and regulating that
capacity globally, especially during peak demand times throughout the year; accessing
historical data that is often stored in different formats and on different platforms; and
deriving insights from historical and new data in time- and cost-effective ways.
Public cloud services like Google Cloud offer organizations economies of scale, rapid
elasticity, and automation where there was manual overhead.
They allow organizations to bring together data points and platforms fragmented across
their whole ecosystem.
In particular, the cloud provides data solutions that were once almost impossible.
Businesses can now consume, store, and process terabytes of data in real time and run
queries on request to retrieve and use data instantly.
Resources are now distributed across a global network.
This means that multiple data centers can create resilience against data loss or service
disruption, but without any extra overhead for businesses.
And data can be combined, analyzed, and served to business teams quickly and cost-
effectively.
For the first time in many businesses, this means data insight is highly accurate and
accessible across the business, and now an enabler of transformation.
Let's look at some examples of organizations that have transformed their business by
unlocking the value of data.
Budget airlines don't provide food as part of their service.
Instead, they charge customers for meals if they want them.
This may seem like a cost-effective solution, but it's often difficult to estimate the number
of meals required onboard.
If the airline overestimates the number of meals required, they risk wasting food and losing
revenue.
But if they underestimate the number of meals needed, they risk selling out of food,
providing poor customer service, and losing potential revenue.
One budget airline in Asia embraced digital transformation and reimagined how they could
solve this problem using data.
First, they identified factors to help estimate stock such as the size of the plane and the
number of passengers, but they soon discovered that estimates based on these factors
were not highly accurate.
This meant that they had to think about their data differently.
So they analyzed additional information such as destination, time of day, and flight
connections before and after their journey.
Using this information, they uncovered actionable insights.
For example, they learned that of the total vegetarian and non-vegetarian meals required
on each flight, flights to and from India required 73% more vegetarian meals.
With these new insights, the airline was able to predict the amount of meals required more
accurately.
As a result, they provided a more positive customer experience and improved the
profitability of their food service.
Let's take a look at another example.
Traditional retailers have access to a range of data about their stores, including stock
levels, items purchased, and average spend per customer.
However, they've never been able to capture information about more nuanced in-store
customer behavior.
One video security company noticed this problem in the retail sector and reconsidered
how they could use their existing technology and data to overcome it.
Traditionally, security monitoring systems were used for one main purpose: to detect
criminal behavior in stores.
But what if they could reimagine the purpose of this technology?
By using cloud computing to mine data from video cameras and devices, this company
was able to generate insights on customer retail footpath, sentiment, and dwell time.
This means that businesses can now correlate data on shopper behaviors in the store to
improve safety, operational efficiency, and top-line growth.
Manufacturing is another great example of an industry that is using data and embracing
digital transformation.
Companies in high-speed manufacturing industries such as pharmaceuticals, food and
beverage, and consumer packaged goods require continuous production.
They can't afford any downtime because that can significantly impact revenue, customer
experience, and product quality.
In these always-on manufacturing environments, maintaining production health is key.
One technology company helps businesses perform vital monitoring of their manufacturing
and production lines.
They do this by combining IoT (Internet of Things) devices with manufacturing analytics.
With cloud computing, they analyze historical data and live information generated by
sensors to assess machine health, predict maintenance, and ensure that production lines
are always running.
We'll talk more about IoT in the next module.
These are just a few examples of how cloud technology can unlock new value by
reimagining data.
No matter where you are in your company, you too can leverage data to solve challenges.
In the next video, I'll discuss how you can start to think about and map the data in your
organization to uncover new business value.
[bright music] person: So far you've learned that leveraging data is a critical part of digital
transformation.
And no matter what your role is, you can unlock new business value by thinking about
data in new ways.
A helpful starting point is to identify and map your data.
Let me explain.
A data map is a chart of all the data used in end-to-end business processes.
For example, imagine that you own a chain of apparel retail stores.
What might you include in your data map?
A customer purchases an item in one of your stores.
That's a data point.
If you aggregate that with other purchases across stores in each region, you have a type
of data: transactions.
We call this a dataset.
Another dataset might be item returns.
Another is footfall in your stores.
You may have noticed that all of these datasets are about your users.
User data is therefore your first data bucket.
This category contains all data from customers who use or purchase your services and
products.
Now let's think about data that is more operational.
For example, data about staffing levels in each store, stock delivery dates, overall sales
performance of each store, and store staffing structure; that is, how many people are in
fitting rooms versus at the cash register.
These fall into the second bucket of data: corporate data.
Corporate data includes data about the company such as sales patterns and operations.
Can you think of more datasets that you'd have as the retail owner?
A third category serves as the umbrella for user and corporate data.
We'll call it industry data.
Industry data is data found outside an individual organization that everyone in the sector
needs to view or access to gain knowledge about a specific domain.
This could include wider trends, purchasing patterns, and publicly available research
papers.
These three buckets make up your data map.
As you add more datasets to each bucket, you'll eventually gain richer insights.
But how can you sort these datasets further?
Well, you can use a check mark to identify the datasets you currently have and a question
mark to label the datasets you think you can get.
Now your data is mapped out with a list of the different datasets you have or think you can
get.
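The mapping exercise above can be sketched in a few lines of code. This is a minimal, illustrative sketch only; the bucket names come from the course, but every dataset name and the `have`/`want` helpers are invented for illustration.

```python
# A data map for the hypothetical apparel retailer: three buckets, with True
# marking datasets we currently have ("check marks") and False marking datasets
# we think we can get ("question marks").
data_map = {
    "user data": {
        "transactions": True,
        "item returns": True,
        "store footfall": False,
    },
    "corporate data": {
        "staffing levels": True,
        "stock delivery dates": True,
        "sales performance by store": False,
    },
    "industry data": {
        "purchasing trends": False,
        "public research": True,
    },
}

def have(bucket):
    """Datasets we currently have in a bucket (the check marks)."""
    return [name for name, owned in data_map[bucket].items() if owned]

def want(bucket):
    """Datasets we think we can get (the question marks)."""
    return [name for name, owned in data_map[bucket].items() if not owned]

print(have("user data"))       # ['transactions', 'item returns']
print(want("industry data"))   # ['purchasing trends']
```

Listing the question-mark datasets per bucket gives you a concrete backlog of data to go and acquire.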
How can you make your data actionable?
Start playing with the intersections between your datasets.
Take two or more datasets and ask yourself, "What insight could I gain if these datasets
were combined?"
For example, suppose you're a sales manager at a biomedical diagnostics company.
You provide a range of diagnostic tools to hospitals and laboratories across the region.
One of your datasets is sales of each product.
You also have some datasets about the hospitals themselves, including specialties,
location, patient turnover, and central laboratory facilities.
In this case, by integrating how many products are sold at each hospital with all the other
datasets you have for the hospital, you have a clear insight into what makes your ideal
customer.
Here's another example.
If you took introduction to digital transformation with Google Cloud, you might remember
Jane.
She's a personal banker.
What does her data ecosystem look like?
Well, user datasets might include user demographics, user financial history, and previous
user interactions.
Corporate datasets might include sales by financial product, sales conversation call logs,
and performance metrics of financial portfolios.
Banking is a heavily regulated industry, so a lot of industry data is available, including
industry benchmarks.
Another set of industry data is stock performance and other investment trends.
Jane, her colleagues, and her competitors at other banks all have access to this data.
By integrating her user demographics data with product purchases, she can begin
uncovering demographic indicators that are significant for predicting product purchases.
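Jane's first step, joining user demographics to product purchases, might look like the following sketch. All records, field names, and counts here are invented for illustration; a real analysis would run over far larger datasets.

```python
from collections import Counter

# Two hypothetical datasets: user demographics keyed by user ID, and a list of
# product purchases referencing those IDs.
demographics = {
    101: {"age_band": "25-34", "region": "North"},
    102: {"age_band": "55-64", "region": "South"},
    103: {"age_band": "25-34", "region": "North"},
}
purchases = [
    {"user_id": 101, "product": "savings account"},
    {"user_id": 102, "product": "retirement fund"},
    {"user_id": 103, "product": "savings account"},
]

# Count purchases per (age band, product) pair -- a first step toward spotting
# demographic indicators that predict product purchases.
segment_counts = Counter(
    (demographics[p["user_id"]]["age_band"], p["product"]) for p in purchases
)
print(segment_counts[("25-34", "savings account")])  # 2
```

Even this toy join surfaces a pattern (the 25-34 band buying savings accounts) that neither dataset shows on its own.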
Now it's your turn.
Take a piece of paper and write user data, corporate data, and industry data.
Think about what dataset you have in each.
Then consider how different datasets can be combined to create valuable insights.
In the next section, we'll explore the different data modalities and why they matter in digital
transformation.
[bright music] person: We've talked a lot about different data formats throughout the
course, such as images, audio files, and social media interactions, as well as tabular data
like sales figures.
Most importantly, we've uncovered how you can combine and leverage data in new ways
to revolutionize business models and create new value.
To better understand how to combine this data throughout your digital transformation, let's
explore the two types of data and what they mean for businesses.
We can categorize data in two main types: structured and unstructured.
Structured data is highly organized.
Examples include customer records consisting of names, addresses, credit card numbers,
and other quantitative data.
Structured data can be easily stored and managed in databases.
By contrast, unstructured data has no organization and tends to be qualitative.
Examples of unstructured data can include word processing documents, audio files,
images, and videos.
This data can be stored as objects.
An object consists of the data itself, a variable amount of metadata, and a globally unique
identifier.
Some unstructured data can be stored in a format called a BLOB.
This stands for Binary Large Object.
Images, audio, and multimedia files can all be stored as BLOBs.
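The object model just described can be sketched as a small data structure: the raw bytes, a variable amount of metadata, and a globally unique identifier. The field names here are invented for illustration, not any particular storage product's schema.

```python
import uuid

def make_object(data: bytes, metadata: dict) -> dict:
    """Bundle unstructured data into an object: data + metadata + unique ID."""
    return {
        "data": data,             # the BLOB itself, e.g. an image or audio file
        "metadata": metadata,     # variable amount of descriptive metadata
        "id": str(uuid.uuid4()),  # globally unique identifier
    }

# Store a (truncated, fake) PNG as an object with some descriptive metadata.
photo = make_object(b"\x89PNG...", {"content_type": "image/png", "camera": "store-3"})
print(photo["metadata"]["content_type"])  # image/png
```

Because each object carries its own metadata and ID, object stores can hold billions of unstructured items without imposing any table-like structure on them.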
Organizations rely on both structured and unstructured data to gain insight and make
intelligent decisions.
However, unstructured data has historically been very difficult to analyze.
Cloud technology changes this.
With the right cloud tools, businesses can extract value from unstructured data by using an
application programming interface, or API, to create structure.
An API is a set of functions that integrate different platforms with different types of data so
that new insights can be uncovered.
Let's look at an example of a used car dealership that was able to leverage both structured
and unstructured data.
Whenever customers brought cars into the dealership, the agents had to manually upload
and label photos of each car and then set a price depending on the model and condition.
This process took them, on average, 20 minutes per car.
In this case, how could the car dealership use automation to make this process more
efficient?
Well, they decided to develop a custom-trained machine learning model, a topic I'll cover
more in detail in an upcoming module.
They combined unstructured data (the photographs) with structured data (the car prices).
They could then use this combined data to predict the price value of a vehicle.
With this new approach, the overall process to photograph and evaluate a car has now
dropped from 20 minutes to 2 to 3 minutes per car.
That's roughly a 90% reduction in time and a massive improvement in overall service.
Another example is Bloomberg.
Bloomberg aggregates data-driven news, global insights, and expert analysis from over
2,700 journalists in more than 120 countries.
They also make their content, their structured and unstructured data, available in multiple
languages worldwide.
This is inevitably a big challenge.
Their customers expect up-to-date and accurate news that's accessible in their specific
languages.
So how do you translate the data you're receiving and then localize it into your global
audiences, all in real time?
Well, Bloomberg uses the Google Translate API, which enables them to make financial news
and global insights available to their customers in as many as 92 languages.
Using an API to translate content into multiple languages is a common way that
businesses apply the power of the cloud to unstructured data, creating new value while
saving time and costs.
More importantly, they can reach more customers and provide a better, more personalized
service.
Understanding structured and unstructured data can help you define what's possible with
the data solutions you have.
We'll explore this in the next module when we cover data consolidation and analytics.
But first, any conversation about data needs to include a reference to security, privacy,
compliance, and ethics.
We'll cover that in the next section.
person: Capturing, storing, and analyzing vast amounts of data is key to adopting Cloud
technology.
But handling this volume and diversity of data comes with its own ethical considerations
and requires alternative ways of thinking about security.
Google believes that capturing and managing data demands responsibility and
accountability.
Not all information that can be captured should be captured.
In other words, businesses are accountable for making responsible decisions about which
data they collect, store, and analyze.
This also extends to the data that businesses already own.
In this case, it's essential to examine who has access to the data and how they'll be using
it.
First, consider the source of the data, how it's being collected, and where it's stored.
If it's personal or sensitive data about a customer or an employee, it needs to be securely
collected, encrypted when stored in the Cloud, and protected from external threats.
Additionally, only a subset of users should be granted permission to view or access the
private data.
Data security and privacy become more complex in a global economy.
Regional or industry-specific regulations often guide data policies.
Google Cloud offers a range of solutions and best practice resources that companies can
leverage.
Another consideration is whether all the data is relevant and appropriate.
Let me explain where this can be particularly important.
Suppose, for instance, you want to use thousands of lung X-ray images to train an ML
model to automatically identify tumor markings in new patient X-rays.
What you need are the X-ray images.
This is the relevant data.
What's not relevant is the patients' personal data.
You need to ensure that any source data about individuals such as names or addresses is
omitted or redacted.
There's also some information that is not personally identifiable but should still not be
included in the modeling for ethical reasons.
A good example of this is whether or not individuals have health insurance.
This is not relevant for educating the model to identify tumors.
And if the data is included, the solution could be discriminatory.
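The redaction step described above can be sketched as a simple filter over each record before it reaches a training pipeline. The field names (`name`, `address`, `has_insurance`, `xray_path`) are hypothetical, chosen to mirror the X-ray example.

```python
# Fields to strip before training, following the two rules from the text:
SENSITIVE = {"name", "address"}   # personally identifiable -- must be redacted
IRRELEVANT = {"has_insurance"}    # not identifying, but ethically off-limits here

def redact(record: dict) -> dict:
    """Keep only fields that are neither sensitive nor ethically irrelevant."""
    return {k: v for k, v in record.items() if k not in SENSITIVE | IRRELEVANT}

record = {
    "name": "A. Patient",
    "address": "123 Example St",
    "has_insurance": True,
    "xray_path": "scan_0042.png",
}
print(redact(record))  # {'xray_path': 'scan_0042.png'}
```

Only the X-ray reference survives: the relevant data stays, and both the identifying and the ethically problematic fields never reach the model.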
These ethical and privacy considerations are particularly complex when you're working
with unstructured data.
For example, a customer support team that resolves hundreds of customers' issues a day
via email might want to use an automated tool to find patterns in the email messages and
develop targeted solutions.
It's true that emails contain valuable data that can be mined to solve this challenge, but it's
essential to be conscious of protecting customer privacy at the same time.
Ethical and fairness considerations are particularly important and applicable when you work
with artificial intelligence (AI) and machine learning.
We'll cover ML and important factors to consider in a later module.
For now, I want you to remember that human bias can influence the way datasets are
collected, combined, and used.
Because of this, it's always important to include strategies to remove unconscious biases
as you start to leverage data to build new business value.
In this module, you learned the importance of data in digital transformation.
Unlocking the value of data enables a business to both rethink how they serve their
customers and reimagine how they operate.
Ultimately, using data effectively enables any business, large or small, to better achieve its
mission.
Move on to the next module to learn about data consolidation and analytics.
How you store and manage your data affects what you can do with it.
So in the next module, we'll examine the challenges, solutions, and use cases for different
data consolidation and storage systems on-premises or in the Cloud.
Quiz 1:
1.
Eduardo is using a machine learning model to improve recruitment efficiency for his
company. What candidate data is appropriate and relevant for training the model? Select
the two correct answers.
Address
Gender
Education
Years of experience
Ethnicity
2.
Images and videos are examples of what type of data? Select the correct answer.
Organized
Unstructured
Semi-structured
Structured
3.
What are the key benefits of using cloud technology to unlock value from data, especially
for traditional Enterprises? Select the two correct answers.
4.
Mark owns a large pharmaceutical company that manufactures essential medical supplies.
The production lines are required to operate efficiently at all times. How can Mark use
cloud technology to ensure his production lines are meeting optimal performance
requirements? Select the correct answer.
5.
Lucinda is creating a data map for her online learning company. Her datasets include
learner demographics, their purchases, and browsing history. What data 'bucket' would
these datasets fall into? Select the correct answer.
Cloud data
User data
Industry data
Corporate data
Introduction:
Barry: In the previous module, you learned that unlocking the value of data is central to
digital transformation.
Businesses that leverage data effectively can improve efficiency and productivity, deliver
fresh, personalized customer experiences, and create new business value.
In particular, I discussed the different types of data that businesses can access, and how
you can combine them to generate insights and take intelligent action.
The way that data is collected, stored, and managed is foundational to what you can do
with it.
In this module, I'll start by considering where data is now and the benefits of migrating your
data to the Cloud.
Then I'll define key terms related to data storage, including database, data warehouse,
and data lake.
For each term, I'll cover some use cases and applicable Google Cloud solutions.
Finally, I'll close the module by exploring business intelligence solutions like Looker, which
enable businesses to gain insight into their data.
Let's get started.
Cloud databases
[upbeat music] Barry: In the last video, I mentioned that how data is stored is central to
being able to use it.
There are many solutions for data storage.
One format is a database.
A database is an organized collection of data generally stored in tables and accessed
electronically from a computer system.
Companies typically use a database to keep track of their basic online transactions,
provide information that will help the company run its business efficiently, or help
managers and employees make better decisions.
For example, a hotel booking site would use a database for their customer transactions.
If a person books a room for a night, that data is captured in the database, and the room
availability is updated in real time on all customer channels.
Another example is online banking.
When someone transfers money from one account to another using their mobile app, that
figure is updated in the bank's database in real time and the user is able to see the most
up-to-date account balance.
Data integrity and scale are two priorities for businesses that use databases.
Data integrity or transactional integrity refers to the accuracy and consistency of data
stored in a database.
Data integrity is achieved by implementing a set of rules when a database is first
designed, and through ongoing error-checking and validation routines as data is collected.
Databases, therefore, also allow businesses to roll back transactions to see data history.
In our banking example, suppose the customer goes to an ATM to check their account
balance.
The balance is not the same as what's displayed on the mobile app.
In this case, the bank needs the ability to roll back the transactions to identify the source of
the problem.
Perhaps the ATM is broken, or perhaps the user didn't click the final transfer button on
their app.
This rollback integrity protects the bank from fraudulent claims and protects customers'
money.
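The transactional integrity just described can be illustrated with a few lines of Python. This sketch uses SQLite purely as a lightweight stand-in for a managed relational database like Cloud SQL; the table, the accounts, and the simulated failure are all invented for illustration.

```python
import sqlite3

# An in-memory database with two hypothetical accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail=False):
    """Move `amount` between accounts; commit only if every step succeeds."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        if fail:
            # Simulate a mid-transfer failure, e.g. the user never confirmed.
            raise RuntimeError("transfer interrupted")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.commit()
    except RuntimeError:
        conn.rollback()  # undo the partial update so balances stay consistent

transfer(conn, 1, 2, 70, fail=True)  # interrupted: nothing should change
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 100, 2: 50}
```

Because the interrupted transfer is rolled back rather than half-applied, no money is created or lost, which is exactly the guarantee the bank relies on.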
Another priority when using databases is scalability.
Going back to our example, suppose the ATM belongs to a global bank that has a large
customer base and processes high volumes of transactions every day.
They need a database that can scale to meet that demand.
Or think about our hotel booking example.
Websites that process global holiday bookings have millions of transactions happening
every day that require transactional integrity at scale.
Different cloud providers offer a range of database solutions.
Let's look at a couple of the most common Google Cloud database services and their
benefits.
We'll start with Cloud SQL.
Cloud SQL is a fully-managed Relational Database Management Service, or RDBMS.
It easily integrates with existing applications and Google Cloud services like Google
Kubernetes Engine and BigQuery, and it's built on the performance and innovation of
Compute Engine.
Cloud SQL is compatible with common database management systems and
methodologies.
It offers security, availability, and durability, and storage scales automatically when
enabled.
This makes it easy for organizations to set up, maintain, manage, and administer
databases in the Cloud.
You might want to use Cloud SQL for databases that serve websites, for operational
applications for e-commerce, and to feed into report and chart creation that informs
business intelligence.
Let's look at a specific customer example headquartered in Mumbai, India.
Living Consumer Products runs two flagship products: a casual dating mobile app called
iCrushiFlush and a contextual digital platform, CDP, that provides digital marketing
services to clients.
By signing on to iCrushiFlush through Facebook, users provide details such as gender,
location, and interest, as well as headshot images.
iCrushiFlush stores this information in a database and displays it to other iCrushiFlush
users through an algorithm depending on compatibility.
Due to the large data volumes generated by iCrushiFlush and CDP, as well as the need to
allocate scarce personnel and financial resources to business projects, Living Consumer
Products decided to operate in the Cloud from the beginning.
However, testing and early experiences with public cloud services didn't meet the
company's cost requirements.
In addition, they felt that availability and scalability were non-negotiables.
And they couldn't afford downtime that would drive users away.
Living Consumer Products migrated iCrushiFlush and CDP to Google Cloud.
Now they're running these services on Compute Engine to provide compute power, Cloud
Storage to provide unified cloud storage, and Cloud SQL to run its relational database.
This allows Living Consumer Products to both store and retrieve large volumes of data
such as user images in real time.
We'll cover Cloud Storage in more detail in upcoming videos.
By handing off to Google the time-consuming tasks required to set up and run a database
like applying patches and updates, managing backups, and configuring replications, you
can save time and money, and keep your focus on building great applications.
Cloud Spanner is another fully-managed database service, and it's designed for global
scale.
With Cloud Spanner, data is automatically and instantly copied across regions.
This replication means that if one region goes offline, the organization's data can still be
served from another region.
It also means that queries always return consistent and ordered answers, regardless of the
region.
For example, if someone in the London office updates information in the database, that
update is immediately available for someone in the New York office.
Consistency is critical to companies like Spotify.
It provides users with music streaming services.
If queries don't always reflect the latest change, this creates many challenges for the
company.
Spotify holds information about the objects it stores in the Cloud.
This is known as metadata.
Migrating their metadata storage to Cloud Spanner gave them strong consistency, so they
know their queries will always reflect the latest data.
Cloud Spanner provides strong consistency and massive scalability, which means that, for
organizations, this is no longer a trade-off.
Plus, it provides enterprise-grade security.
This makes it ideal for organizations that want scalability for their databases, whether it's
within a region or across the world.
It's great for mission-critical online transaction processing, and because it's all managed, it
dramatically reduces the operational overhead needed to keep the database online and
serving traffic.
Cloud SQL and Cloud Spanner are examples of databases that enable customers to
manage high volumes of transactional data at speed and at scale.
With Google Cloud databases, businesses can build and deploy faster, deliver
transformative applications, and maintain portability and control of their data.
Databases are one of the two main types of data storage systems in the Cloud.
The second is data warehouses.
Data warehouses allow businesses to unlock insights and take intelligent action.
Watch the next video to learn more.
[upbeat music] Barry: In the last video, we explored Cloud databases and how they enable
businesses to ingest and use high volumes of transactional data. Let's now discuss the
second type of data storage system in the Cloud: data warehouses. While databases store
transactional data in an online fashion, data warehouses assemble data
from multiple sources, including databases. Databases are built and optimized to
efficiently ingest large amounts of data from many different sources. Data warehouses,
by contrast, are built to enable rapid analysis of large and multidimensional datasets. For
example, a dataset may capture every online sale every day of the week.
But if you want to analyze those sales to identify trends or even to get the sum of total
sales each day, you need a data warehouse. Perhaps you also want to identify sales
trends by combining data such as new product rollout dates, marketing campaign
language, and operational efficiency data.
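The "sum of total sales each day" analysis above is a warehouse-style aggregate query. Here is a minimal sketch, again using SQLite with invented sample rows; a real warehouse such as BigQuery runs the same kind of SQL over far larger, multidimensional datasets:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Mon", 10.0), ("Mon", 5.0), ("Tue", 7.5)],
)

# Aggregate across all transactions to get total sales per day.
totals = dict(conn.execute("SELECT day, SUM(amount) FROM sales GROUP BY day"))
# totals -> {"Mon": 15.0, "Tue": 7.5}
```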
Think of the data warehouse as the central hub for all business data. Different types of
data can be transformed and consolidated into the warehouse, so they're useful for
analysis. In particular, a Cloud data warehouse allows businesses to consolidate data that
is structured and semi-structured. Remember that unstructured data tends
to be unorganized and qualitative. In other words, it wouldn't fit in a spreadsheet. When
combined with connector tools, data warehouses can transform unstructured data into
semi-structured data that can be used for analysis. Let's look at an example. Consider the
online hotel booking example we examined in the previous video.
Suppose they now want to do even more with their data. They want to use multiple types
and sources of data to gain insights about hotel quality, and ultimately to improve their
service to customers. They identify a list of possible data sources from their end-to-end
customer journey such as:
number of bookings by type of room, number of guests, and time of year; overall
satisfaction during their stay and satisfaction with hotel staff, amenities, food, and check-in
and check-out processes; customer posts on social media platforms by sentiment,
location, or specific event at the hotel; and customer feedback and complaints captured
via the website,
mail, or in person at the customer service desk. All these different data types and formats
are ingested and assembled into a data warehouse through different channels. The
business can query the data warehouse quickly and at scale to derive meaningful insights.
They can take their business's goal one step further
and use the source data to build machine learning models to surface personalized hotel
recommendations and tailored booking experiences for customers. We'll talk more about
that in the next module. For now, let's look at a Google Cloud leading data warehouse
solution, BigQuery. BigQuery is a fully managed data warehouse with downtime-free
upgrades
and maintenance, and seamless scaling. Most of all, BigQuery allows you to analyze
petabytes of data at incredibly fast speeds with zero operational overhead. This means
that as an organization, you can focus on analyzing your data to find meaningful insights
instead of spending time and resources on maintenance. Most data warehouse providers
link storage and compute together. So customers are charged for compute capacity,
whether they're running a query or not. Importantly, BigQuery is serverless. This doesn't
mean that there's no server. It means that resources such as compute power are
automatically provisioned behind the scenes as needed to run your queries.
So businesses do not pay for compute power unless they're actually running a query.
Ocado is one of the world's largest online-only grocery retailers. It experienced significant
growth in its early years of business. As a result, it began to find that its old databases just
weren't fast enough to meet business demands.
Now they've migrated to Google Cloud and use BigQuery. They found that query results
are delivered 80 times faster and at 30 percent lower cost. Let's look at another example.
Bueno is a Software as a Service, or SaaS, company that helps businesses meet their
sustainability goals by improving building systems.
Buildings are usually equipped with various networks that control and operate the facilities
they contain. But air conditioning, lighting, and security systems often exist in their own
silos with no link between them for communication. Bueno aims to bridge this gap by using
technology to gather data and provide better transparency
between the different systems, so that the building sector can use this information for fault
detection, optimization, and business intelligence. Bueno has used Cloud technology from
the beginning. They decided to migrate to Google Cloud for a range of reasons, one being
that they wanted to use existing data to better understand their customers and their
business.
They do this using BigQuery. There are two other tools they use: Pub/Sub and Dataflow.
Pub/Sub is a service for real-time ingestion of data, whereas Dataflow is a service for large
scale processing of data. Remember how I described data warehouses as the central hub
for data to flow into?
Well, these two different services, Pub/Sub and Dataflow, can work together to bring
unstructured data into the Cloud and transform it into semi-structured data. This
transformed data can then be sent directly from Dataflow to BigQuery, where it becomes
immediately available for analysis. These tools enabled Bueno to unlock new insights
about their customers,
such as equipment and site level data, weather data, and even the customers' last visits.
Now customers can use Bueno's system to look for insights and discover new things that
may require action to be taken. Their customers can generate a work order for
maintenance to be done and log the cost of the job.
Bueno can then use that data to verify the value and accuracy of their analytics. Now that
you understand the differences between data warehouses and databases and have
learned about a few Google Cloud services, let's briefly explain how unstructured data can
be stored in data lakes.
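The Pub/Sub-plus-Dataflow pattern described above — ingest raw messages, transform them into semi-structured rows, load them for analysis — can be sketched in plain Python. No cloud services are used here; the message format and field names are invented for illustration:

```python
# Stand-in for messages arriving via a real-time ingestion service such as Pub/Sub.
messages = [
    "ERROR 2021-06-01T10:00:00 sensor=hvac temp=31.5",
    "INFO 2021-06-01T10:05:00 sensor=lighting zone=lobby",
]

def parse(message):
    """Dataflow-style transform: free-form text -> a semi-structured record."""
    level, timestamp, *pairs = message.split()
    record = {"level": level, "timestamp": timestamp}
    for pair in pairs:
        key, value = pair.split("=", 1)
        record[key] = value
    return record

# The transformed rows are what would then be streamed into a warehouse like BigQuery.
rows = [parse(m) for m in messages]
```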
[upbeat music] Barry: In earlier videos, I covered databases and data warehouses as types
of data storage solutions.
Now let's look at data lakes.
This is another type of data management solution that stores structured, semi-structured,
and unstructured data.
Data lakes are repositories for raw data and tend to serve many purposes.
For example, they often hold backup data, which helps businesses build resilience against
unexpected harm affecting their data.
In other words, businesses are protected against data loss.
They also hold data that is historic and not relevant to day-to-day business operations.
Let's look at a Google Cloud data lake service.
One way to classify an organization's requirements for storage is by how often they need
to access the data.
Cloud Storage is a service that enables you to store and serve Binary Large OBject, or
BLOB, data.
BLOBs are typically images, audio, or other media objects.
Cloud Storage provides organizations with different options so they can tailor their object
storage based on their access needs.
In fact, some of the key benefits of Google Cloud Storage are: you can store unlimited
data with no minimum amount required, low latency-- you can retrieve your data as often
as you'd like-- and you can access it from anywhere in the world.
Suppose, for instance, your organization is storing data that is frequently accessed from
around the world.
This might be data that serves website content, or mobile applications, or streaming
videos.
For this type of data, Cloud Storage offers multiregional storage.
It's ideal for serving content to users worldwide.
We talked in the last video about Spotify.
Spotify uses Cloud Storage to serve music to users around the world.
Because Cloud Storage stores geographically-dispersed copies of your data, your
organization is less likely to lose its data in the case of a disaster.
Cloud Storage also offers regional storage.
This is ideal when your organization wants to use the data locally.
It gives you added throughput and performance by storing your data in the same region as
your compute infrastructure.
This is a great choice for internal use cases such as data analytics and machine learning
jobs.
For data that will be accessed less often, Cloud Storage offers Nearline, Coldline, and
Archive storage classes.
Nearline is best for data you don't expect to access more than once per month, such as
multimedia file storage or online backups.
Coldline is best for data that you plan to access at most once per 90 days or quarter.
Archive is best for data that you plan to access at most once per year, such as archive
data or as a backup for disaster recovery.
Now let's look at another example of Cloud Storage in use.
In the financial industry, voice transcription has always been tricky because it's
jargon-heavy, and trading conversations are sensitive in nature.
Cloud9 Technologies is a company that provides an innovative voice communication and
analytics platform specifically built for the unique compliance and management demands
of financial markets.
Their platform leverages Google Cloud machine learning services to automate
voice-to-text transcription of trading conversations.
The platform also uses Cloud Storage to house the enormous quantities of information
gathered.
The data is encrypted by default, and any sensitive information such as names is
automatically redacted in the storage process.
All right.
So far, we've covered three different types of data management systems: databases, data
warehouses, and data lakes.
Each delivers value to businesses in different ways, enabling them to leverage data at
scale.
These systems and tools like Pub/Sub, Dataflow, and BigQuery enable businesses to
ingest and analyze data.
How is that data then served to the business to generate insights?
I'll cover the answer in the next video.
[upbeat music] Barry: Throughout this module, you've learned about databases, data
warehouses, and data lakes as solutions to store and manage your data.
Now let's look at business intelligence solutions that serve your data in the form of insights
at scale.
The challenge businesses often face is identifying the right business intelligence solution.
Some solutions are too complex and not accessible to anyone outside the data
engineering or data analysis teams.
This means other teams have to put in requests and wait for answers, which defeats the
purpose of gaining real-time insights.
Other solutions let everyone in the business perform their own data analysis, but they can
only perform their analysis with portions of the available data.
This means that only a few people, or possibly no one, has a full view of the company's
business data.
Looker is a Google Cloud business intelligence solution.
Put simply, it's a data platform that sits on top of an analytics database and makes it
simple to describe your data and define business metrics.
Once you have a reliable source of truth for your business data, anyone on your team can
analyze and explore it, ask and answer their own questions, create visualizations, and
explore row-level details.
With everyone exploring this data individually, it's possible to discover greater insights and
allow teams to share their findings easily with a simple link.
And every answer becomes the inspiration to explore more.
Let's look at an example.
Gaming companies have to constantly innovate to remain relevant in a crowded market.
Mobile and video gaming analytics provides insight into user behaviors.
By investigating how users interact with their games, a business can develop a better
understanding of their audience and use that to create more compelling games.
For example, gaming analytics can be used by product managers, developers, and
marketers to see which features are used most, discover levels or areas in the game
where players are getting stuck, and identify player lifetime value.
With this information, gaming companies can then create better, more targeted content for
their players based on the needs, interests, and challenges of their users.
Using Looker, gaming companies can combine marketing and behavioral data to acquire
the right types of players for their games.
This combination allows them to connect their revenue to marketing spend and determine
which networks, campaigns, and creative strategies gain greater results.
But some data, like player retention and repeated gameplay, is traditionally more difficult
to analyze than other pieces of data.
Looker leverages the power of data warehouses like BigQuery to make this data useful.
For example, it can standardize important metrics to create greater consistency and
accuracy when other data analytics tools can only produce siloed results.
This is just one example of how an effective business intelligence solution can enable
businesses to transform to better serve their customers.
Now let's focus on how businesses can create new value with data.
In the next module, I'll discuss machine learning and artificial intelligence and explore how
they enable digital transformation.
Quiz 2:
1.
What is a data lake? Select the correct answer.
2.
How is data integrity achieved? Select the two correct answers.
3.
Lydia manages a large hotel chain. How can Looker enable Lydia to better serve her
customers? Select the correct answer.
4.
How do databases and data warehouses differ? Select the correct answer.
Databases efficiently process structured data, while data warehouses rapidly process
unstructured data.
Data warehouses efficiently process structured data, while databases rapidly process
software data.
Databases efficiently ingest large amounts of real-time data, while data warehouses
rapidly analyze multi-dimensional datasets.
Data warehouses efficiently ingest large amounts of real-time data, while databases
rapidly analyze large, multi-dimensional datasets.
5.
Which of the following is an advantage for storing and managing data in the public cloud?
Select the two correct answers.
Increased CapEx
Increased data structure
Elasticity
Speed
Increased coverage
Data analytics has historically been about understanding what has already happened in the
past. With advancements in cloud technology, and the availability of digital devices, we're
generating volumes of data everyday. This module explores the capabilities of machine
learning and what becomes possible with volumes of data. It builds on the previous module
by explaining ways a machine can learn to predict, categorize, and recommend based on
lots and lots of data. Throughout the module, you'll uncover examples of customers who
have used machine learning to bring innovative solutions to their customers.
Introduction:
Barry: In previous modules, we explored the critical role that data plays in digital
transformation.
I also covered how you can collect, store, and access data to enable effective
data-driven decisions.
Volumes of data and the right cloud-based tools are the foundation for using machine
learning and artificial intelligence, or ML and AI.
To set the foundation for this module, I'll begin with the definition for ML and AI.
Then I'll cover some important data quality considerations that influence the efficacy of
machine learning models.
Finally, I'll highlight several real-world use cases in which customers have leveraged ML to
radically transform their business.
Let's begin.
[upbeat music] person: To understand machine learning, you have to start by thinking
about data in your business.
Do you have a dashboard that analysts view every day?
Or maybe there's a report that your managers review each month.
Both the dashboard and the report are examples of backward-looking data.
They look at what happened in the past.
Most data analysis in your organization is probably backward-looking, analysis of historical
data to calculate metrics or identify trends.
But to create value in your business, you need to use that data to make decisions for
future business.
Let me give you an example.
Suppose Maya leads the business strategy and operations team for an international
airline.
She might be looking at historical annual reports to establish a trend in customer
purchasing patterns.
She'd probably use this data to forecast annual sales and operational costs, but there's
nothing new or transformational about this decision-making process.
What if Maya could predict the satisfaction rate of each flight or predict customer
complaints and get ahead of them?
To do this effectively, she'd need to access a lot more data, including number of
passengers per flight, duration of each flight, customer satisfaction ratings per flight,
number of customer complaints per flight, factors that contributed to customer complaints,
weather reports, seasonal indicators, and time to resolution for customer complaints.
With all of these various data points, she might be able to predict the quality of a single
flight and its customer complaints, but there are hundreds of flights each day.
The real value for Maya would come from being able to make predictive insights for all
flights all year round.
More importantly, it would be far more valuable if she could dynamically adjust pricing or
staff assignments or even catering based on the predictions.
ML unlocks these capabilities and more.
But what exactly is machine learning?
To understand that, we need to step back and define artificial intelligence first.
Artificial intelligence, or AI, is a broad field or term that describes any kind of machine
capable of acting autonomously.
Machine learning, or ML, is a specific branch within that field.
Specifically, ML refers to computers that can learn from data without using a complex set
of rules.
Machine learning solves many kinds of problems.
For the purposes of this course, we'll focus on a definition of ML that applies to numerical
or classification problem types.
I'll use this ML definition to guide your learning throughout the module, and here it is.
ML is a way to use standard algorithms or standard models to analyze data in order to
derive predictive insights and make repeated decisions at scale.
Put simply, it's a way of teaching a computer how to solve problems by feeding it
examples of the correct answers.
Usually these problems are about predicting something.
For example, you can predict how long it takes to travel from one location to another by
feeding the computer examples of the completed journeys.
Similarly, you can predict the estimated taxes owed by feeding the computer examples of
tax filings.
You'd do the same for predicting weather patterns over the next few days.
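The travel-time example can be sketched as the simplest possible case of "learning from examples of correct answers": fit a line to completed journeys, then predict a journey the model hasn't seen. The numbers are synthetic, and ordinary least squares stands in for a real ML model:

```python
# Completed journeys: (distance in km, observed travel time in minutes).
distances = [5, 10, 20, 40]
times = [12, 22, 42, 82]  # synthetic: roughly 2 min/km plus 2 min overhead

# Fit a line with ordinary least squares -- the "training" step.
n = len(distances)
mean_d = sum(distances) / n
mean_t = sum(times) / n
slope = sum((d - mean_d) * (t - mean_t) for d, t in zip(distances, times)) \
        / sum((d - mean_d) ** 2 for d in distances)
intercept = mean_t - slope * mean_d

def predict_minutes(distance_km):
    """Predict travel time for a journey the model hasn't seen."""
    return slope * distance_km + intercept

estimate = predict_minutes(30)  # -> 62.0 for this synthetic data
```

Real ML models work the same way in spirit: more examples (and more informative features) yield more accurate predictions.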
More technically speaking, suppose you wanted to use machine learning to accurately
label a photo of a fruit or vegetable the model has never seen before.
You train an ML model or the standard algorithm using lots of images of fruits and
vegetables, input data, and their correct labels, output data.
As you train the ML model with more input data and corresponding output data, its
predictions become more accurate when you feed it an image of a fruit or vegetable it
hasn't seen before.
Now that was an overly simplified example.
We'll cover many more real-world examples throughout the module.
Ultimately, the purpose of ML in a business is the same as all other disruptive new
technologies-- to enable organizations to better achieve their missions.
To apply machine learning effectively, you need lots of data.
In fact, you need lots of high-quality data to generate more and more accurate, meaningful
predictions.
In the next video, I'll examine factors that impact data quality.
Data Quality
[upbeat music] person: In the previous video, you learned that ML is a way to use standard
algorithms or models to analyze data. This analyzed data can be then used to derive
predictive insights and make repeated decisions. The accuracy of those predictions,
however, depends on large volumes of data
that are free of bugs. Let me use a software analogy to explain what I mean by bugs. In
traditional software development, a bug is a mistake in the code that causes unexpected
or undesired behavior. In ML, even though there can be bugs in the implementation of an
algorithm,
bugs in data are far more common. Consider this example. A few years ago, some
Googlers wanted to use ML to help diagnose diabetic retinopathy, which is the fastest
growing cause of blindness, potentially affecting more than 415 million diabetic patients
worldwide. Working closely with doctors in the US and India,
these Googlers created an ML model that would diagnose diabetic retinopathy almost as
well as ophthalmologists can. They trained an ML model using labeled images of the
backs of eyes, each label being the diagnosis. Because humans were involved in the
labeling of the images, the labeling system is not completely objective.
The data may have included incorrect labels or even human bias, which is then
propagated into the ML model itself. So how would you ensure that you have optimal data
quality when training an ML model? The best data has three qualities. One, it has
coverage. Two, it's clean or consistent.
And three, it's complete. I'll explain each one. Data coverage refers to the scope of a
problem domain and all possible scenarios it can account for. In other words, all possible
input and output data. Let's imagine an auto manufacturing use case where the goal is to
use ML to automatically identify defects in car parts.
Let's assume also that the car parts are divided into red and blue. If red and blue make up
all the possible scenarios, but you only train your model with red parts, the model might
not be able to detect defects in blue car parts when it's presented with new data.
So more data and broader coverage produce a more accurate ML model. The second
quality of good data is its cleanliness. This is sometimes called data consistency. Data is
considered dirty or inconsistent if it includes or excludes anything that might prevent an ML
model from making accurate predictions. This is a lot like the errors or bugs we talked
about earlier.
The simplest form of inconsistency in data is data format. Suppose, for instance, you want
to analyze data from multiple documents, and one of the data points on each document is
a timestamp. The timestamps from all sources have to be of the same format, otherwise
the data is considered dirty.
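Timestamp-format inconsistency like the one described is typically fixed in a cleaning step before training. Here is a minimal sketch using Python's standard library; the accepted formats are invented examples:

```python
from datetime import datetime

# Formats observed across the different source documents (illustrative).
KNOWN_FORMATS = ["%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M", "%b %d %Y %I:%M%p"]

def normalize_timestamp(raw: str) -> str:
    """Parse any known format and emit one canonical ISO-8601 string."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp format: {raw!r}")

cleaned = [normalize_timestamp(t)
           for t in ["2021-03-05 14:30:00", "05/03/2021 14:30", "Mar 05 2021 02:30PM"]]
# every entry is now "2021-03-05T14:30:00"
```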
Let's return to our manufacturing scenario. Where do you think inconsistencies could
occur? Well, if you're using photos to look for defects in car parts, you need to be careful
with which images you choose to train the model. For example, if the images have
shadows in them, the model won't know whether shadows
are part of an object or not. If you want to make predictions from images that are
supposed to have shadows, that's okay-- otherwise your data is dirty. I mentioned
incorrect labels earlier, which is another form of dirty data. In this scenario, you might have
parts that were labeled as fractured,
but in reality they were discarded because they were the wrong size. There are lots of
examples of human error that causes dirty data as well. Imagine the sales and retail
industry, for example. If someone enters incorrect purchase data in a data storage system,
this creates dirty data. If there's an error in an automated service,
or if a transaction is recorded incorrectly every time the register runs out of paper, this also
produces dirty data. The more incorrect or dirty data you have, the more correct and clean
data you'll need to provide a counterbalance so the ML model learns the correct outcome.
Another quality of good data is completeness.
This refers to the availability of sufficient data about the world to replace human
knowledge. Think of this as the various data categories or themes that help complete a
user's profile such as address, gender, or height. Incomplete data can limit the
performance of an ML model. We say there's incomplete data
when there's a lack of better data, there are mistaken expectations about how ML works
and what it's capable of, or program design and implementation are poorly executed. Let's
go back to our manufacturing example. Imagine that one of the major sources of defects is
overheating, but you're not collecting temperature data.
That's an example of incomplete data. Even if you start collecting temperature data now,
you may not have the historical data that maps to past examples of good and fractured
parts. Another form of incomplete data is the number of cases for all possible scenarios
the data is intended to cover.
In the same manufacturing example, your goal is to match the labels, good condition and
fractured, with every part. If axle is one item you're evaluating for defects, you'll need
examples of axles in good condition and fractured. If you don't have that data, your data is
incomplete. Remember--data is the tunnel
through which your model views the world. Anything the model can't see it assumes
doesn't exist. For example, if a model was given an image that only showed what's on the
left, it might think the road was open and traffic free. In reality, if I show you the full image,
the road is just closed. The good news is that most of these problems can be solved
simply by getting more data, but you have to be purposeful in collecting that data. Do you
need to improve coverage, improve cleanliness or consistency, or improve completeness?
Remember--data is central to ML.
If you're planning to use it, you'll need to account for as many possibilities when preparing
your data before training an ML model. Now you might be wondering what kinds of skills or
expertise you need to begin using machine learning in your organization. Before you go
too far down that path,
I want to reassure you that ML has become more accessible than you think. In the next
video, I'll explore some Google Cloud ML solutions that you can use-- some even right
away-- to bring new value into your business.
person: People often assume that you need a robust technical team that includes data
analysts, data engineers, and even ML engineers to leverage the capability of Cloud and
ML. They also assume that only then can you build custom ML models that meet your
organization's needs. This can seem costly and daunting.
The reality is that ML is more accessible now than ever before. In fact, Google Cloud
democratizes AI by providing a range of ML and AI solutions that enable businesses to
leverage the power of ML and AI without their traditional costs and efforts. Depending on
your organization's data science expertise and needs,
Google Cloud ML and AI offerings provide the options to use a pretrained ML model built
on Google's data, such as the Vision API; train an existing ML model with your own data;
or build a custom ML model and train it using your own data. For example, Google Cloud AI Platform
is
a unified, simply managed platform that makes machine learning easy to adopt by analysts
and developers. It's not limited to data scientists. It provides modern ML services with the
ability to generate your own tailored models and use pretrained models so that you can
add innovative capabilities to your own applications.
It also includes the Google Cloud AI Hub, a hosted repository of plug-and-play AI
components. If you have data scientists who are already working with ML, they might
already be using TensorFlow. It has a comprehensive, flexible ecosystem of tools,
libraries, and community resources. TensorFlow lets researchers push innovation in ML
and lets
developers easily build and deploy ML-powered applications. It was first developed for
Google's internal use, but it's now open-source so that everyone can benefit. TensorFlow
can also take advantage of tensor processing units or TPUs, which are hardware devices
designed to accelerate ML workloads with TensorFlow by 15-30x. Google Cloud makes
them available in
the Cloud with Compute Engine virtual machines. Each Cloud TPU offers a large amount
of performance, and because you pay only for what you use, there's no upfront capital
investment required. If your data scientists need to work on a new problem, Google's AI
Hub has notebook samples they can use to learn about,
train, and deploy the new model they need. The AI Hub is a hosted repository of
plug-and-play AI components, including end-to-end AI pipelines and out-of-the-box algorithms.
Google Cloud AI Platform is a fully managed machine learning service that allows Cloud
customers to create machine learning models, train them, and use them to integrate
predictive analytics into their applications and data processing pipelines. Suppose you
don't have specialized data scientists but do have business analysts and developers. What
do you do? This is where our machine learning as a service and platform as a service
offerings are useful. Google Cloud can help app developers build smart apps
using application programming interfaces, or APIs. APIs are simple methods and tools to
connect various applications. They can be deployed in a virtual private Cloud,
on-premises, or in Google's public Cloud. They allow developers to quickly and easily train
custom models regardless of their level of experience. To better understand this,
let's imagine a developer building a mobile app that users will submit photos to. The
developer needs the app to recognize what the images are and filter out any that aren't
safe for work. With AI Hub, the developer can search for a suitable API and easily
incorporate an ML service into their project.
For instance, the developer might choose Vision API. This offers powerful pretrained
machine learning models using Google's data to automatically detect faces, objects, text,
and even sentiment in images. The developer can therefore use Vision API to assign
labels to images and quickly classify them into millions of predefined categories.
But categorizing images is sometimes more complex. Think back to our example from the
previous video where ML was used to recognize defects in car parts. Vision API can tell
the difference between generic images found in Google's database, like the difference
between a wheel and an engine, but it won't be able to identify
good or defective parts for a specific car manufacturing company. In this case, a developer
could use AutoML Vision API. This API automates the training of your own custom
machine learning models. This means a developer can simply upload a custom batch of
images or ingest them into AutoML Vision
directly from Cloud Storage and train an image classification model with the easy-to-use
graphical interface. Models can be further optimized and deployed for use directly from the
Cloud. The APIs covered in this module that enable access to ML and AI services are just
a small sample of Google Cloud offerings.
You can also find APIs for categorizing videos, converting audio to text or text to audio,
understanding natural language, translating from one language into another, and much
more. In fact, in many of the most innovative applications of machine learning, several of
these kinds of services are combined. For example, what if whenever one
of your customers contacted your call center your application could automatically answer
simple queries in natural language and route more complex queries to an agent? The
Google Cloud AI Platform makes that kind of meaningful interactivity possible. Another
example is the Google Translate API. You might already be familiar with Google Translate,
a free service available instantly via search when you type in, for example, dog in Spanish.
The API allows global businesses to access the ML service to provide localized
information to their customers and employees in real time. You've now learned that there
are many different opportunities to use ML and
AI with Google Cloud to transform your day-to-day work. The ability to leverage ML is now
accessible across the organization through APIs that enable innovation and help
businesses achieve their mission. In the next video, I'll cover some common opportunities
for using ML in day-to-day business and highlight some real-world examples where
businesses have transformed using Google solutions.
Barry: So far, we've defined machine learning, reviewed the
importance of quality data, and explored a few Google ML and AI offerings.
In this video, I'll cover four common business problems that ML is particularly suited to
solve.
In each case, ML is uniquely placed to create new business value when it can learn from
data to automate action and processes and to customize responses to behavior.
The four common business problems are replacing or simplifying rule-based systems,
automating processes, understanding unstructured data, and creating personalized
customer experiences.
Let's start with rule-based systems.
I'll use Google Search as an example.
Suppose, for instance, you want to search for the Giants, a sports team.
Ah, but wait, if you type in "giants," should the search results show you San Francisco
Giants or New York Giants?
One is a baseball team based in California, the other is an American football team based
in New York.
A few years ago, the search engine code base used hand-coded rules to decide which
sports team to show a user.
If the query is giants and the user is in the Bay Area, show them results about San
Francisco Giants.
If the user is in the New York area, show them results about New York Giants.
If they're anywhere else, show them results about tall people.
This is for just one query.
If you multiply this by thousands of different queries and by different users each day, you
can probably imagine how complex the whole code base would become.
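As a toy illustration of the hand-coded rules just described (the region names and result strings here are illustrative, not Google's actual logic):

```python
# Toy sketch of rule-based query routing. Every new ambiguous query
# would need another branch like this, which is why a hand-coded
# code base becomes unmanageable at scale.
def giants_results(query: str, user_region: str) -> str:
    if query.lower() != "giants":
        return "other results"
    if user_region == "Bay Area":
        return "San Francisco Giants"
    if user_region == "New York":
        return "New York Giants"
    return "tall people"
```

Multiply this by thousands of queries and the motivation for a learned ranking model becomes clear.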
This is a perfect problem for ML to solve.
If we had all of the data that tells us which search results users clicked on per query, why
not train a machine learning model to predict the rank for search results?
That was the idea behind RankBrain, Google's deep neural network for search ranking,
which was introduced in 2015 by Google's engineers.
It outperformed many human-built signals, and, using ML, Google was able to replace
many of the hand-coded rules.
The neural network ended up improving search quality dramatically.
In fact, Google's neural network is a key differentiator among similar technologies in the
market.
An added benefit of RankBrain or any machine learning model is that the system could
continually improve itself based on new user queries and new user clicks.
Search is one example of how ML leverages vast amounts of data to provide highly
accurate predictions in place of a rule-based system.
A second opportunity for using machine learning is to automate processes where ML
makes predictions and repeated decisions at scale.
Let's look at an example.
Ananda Development is a property developer headquartered in Thailand that decided to
use ML to automate the handover stage of their property sales.
Before embracing cloud and ML, the handover process included multiple manual steps
and was prone to errors.
In any sale, before the buyer paid for the property, an Ananda Development inspector and
the buyer had to conduct a detailed check of the condominium for any building variations
that needed to be fixed.
Ananda Development inspectors would visually check hundreds of items a day for
problems and list any issues on paper.
Prospective buyers might also take notes and photographs of the findings.
On average, a single inspector would have to check several hundred items per day.
Multiplied across several inspectors and multiple projects, this workload adds up.
This laborious manual process was also subject to occasional human error.
That meant data could be omitted or recorded incorrectly.
Ananda Development decided to build an app using machine learning to make the
inspection process more efficient.
The app used the Google Speech-to-Text API to convert both Thai speech and the variety
of English spoken by many Thai people into text.
The company found the product had an accuracy rate of over 90 percent in recognizing
Thai speech and high accuracy rates in recognizing Thai English.
The inspection process is now more efficient and accurate.
As another benefit, buyers also receive copies of electronic inspection reports and
updated status notes as defects are repaired.
Another class of ML use case is for understanding unstructured data like images, videos,
and audio.
Before I dive into examples of how you can use ML to understand unstructured data, I
need to acknowledge a key point.
So far in the module, we've been talking about a specific type of ML that uses structured
data to make predictions at scale.
Now I'm going to cover how ML can also be used to understand unstructured data.
Unstructured data is data that can't be directly compared to other data.
For example, some characteristics of books are structured: the title, publisher, place of
publication, and number of pages.
This kind of data is known as tabular data. But it's not easy to directly compare the
content of two books or to precisely determine how they are related or different.
Even human experts might not agree on exactly how similar two books are.
Open text or language is just one example of unstructured data.
Other examples include pictures, videos, and audio.
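To see why unstructured text resists direct comparison, here is a deliberately crude lexical-similarity sketch (my own illustration, not a Google Cloud API). It only counts shared words, so two passages that mean the same thing in different vocabulary score near zero; closing that gap is exactly what ML language models are built for.

```python
def word_overlap(a: str, b: str) -> float:
    """Crude lexical similarity: fraction of shared words (Jaccard index)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not (words_a | words_b):
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)
```

For example, `word_overlap("a large dog", "a big dog")` scores only 0.5 even though the phrases are near-synonyms, because "large" and "big" don't match lexically.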
A great example of using ML for unstructured data comes from Ocado.
Ocado is one of the world's largest online-only grocery supermarkets.
Previously, all email sent to Ocado would go to a central mailbox for sorting and
forwarding by a person.
This process was time-consuming and would lead to a poor customer experience.
To improve and scale this process, Ocado used ML to automatically route emails to the
department that needs to process them.
This new process eliminated multiple rounds of reading and triaging.
Here, Ocado used ML to both automate a process and understand unstructured data.
Specifically, they used ML's ability to process natural language to identify the customer
sentiment and the topic of each message so they could route it immediately to the relevant
department.
Now let's look at a fourth example, personalization.
Many businesses use ML to personalize user experiences.
Personalization is the difference between a newspaper and an email.
A newspaper article can be interesting, but it's written to appeal to thousands or millions of
people.
However, an email is often tailored just to one person by including their name, for
example.
YouTube is a great example of personalization in action.
When you watch a video on YouTube, you've probably noticed that on the homepage, or to
the right of your video, there's a list of recommended videos that are up next.
When your video finishes, these new videos will play.
And we'd like them to be interesting and useful for you.
This feature keeps the user interested and engaged with the product.
By providing personalized recommendations using ML, YouTube can deliver a better
service to their customers while also increasing their ad revenue.
Many businesses use the same approach to feature product recommendations on their
websites personalized to individual users.
Other businesses use personalization to surface new content like music recommendations
or films to stream.
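A minimal sketch of the idea behind such recommendations (a toy tag-overlap ranker of my own; real systems like YouTube's use learned models, not this):

```python
from collections import Counter

# Toy personalization sketch: rank unwatched videos by how many tags
# they share with the user's watch history. Real recommenders learn
# these preferences from data rather than counting tags.
def recommend(watch_history, catalog, k=2):
    profile = Counter(tag for video in watch_history for tag in video["tags"])

    def score(video):
        return sum(profile[tag] for tag in video["tags"])

    unwatched = [v for v in catalog if v not in watch_history]
    ranked = sorted(unwatched, key=score, reverse=True)
    return [v["title"] for v in ranked[:k]]
```

Two users with different histories get different rankings from the same catalog, which is the essence of personalization.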
Great, let's do a recap.
So far, I discussed some common applications of machine learning such as replacing rule-
based systems, automating business processes, understanding unstructured data, and
personalizing applications.
It's important to remember that ML models aren't standalone and that solving complex
business challenges requires combinations of models.
There are, of course, many more applications of machine learning for businesses, and you
can learn even more about them in the course Managing Machine Learning Projects with
Google Cloud.
Next, I'll summarize the key topics we covered in this course.
Quiz 3:
1.
The finance team just posted an open role for a Financial Manager. Jessica, the recruiter,
wants to use a machine learning (ML) model to predict when the new position would be
filled. Why is this use case not suitable for ML? Select the correct answer.
The problem statement is too vague and wouldn’t benefit the overall company.
Jessica would need access to sensitive employee data to train a custom ML model.
This is an infrequent decision for a specific role and department.
Once the prediction is made, the ML model is no longer useful.
2.
What are two common business problems that machine learning solves? Select the two
correct answers.
3.
Olivia wants to use a machine learning (ML) model to categorize product images from
social media and use that information to make predictions about future products. Her team
includes experienced developers, but no specialized data scientists or ML experts. Which
Google Cloud solution can they leverage to do this? Select the correct answer.
4.
One characteristic of high quality, bug-free data is that it has coverage. What are the other
two qualities? Select the two correct answers.
Simplicity
Structure
Completeness
Cleanliness
Clarity
5.
Machine learning is a subset of which body of knowledge? Select the correct answer.
Augmented reality
Virtual reality
Artificial intelligence
Automated intelligence
6.
Which of the following describes data completeness? Select the correct answer.
Anything that can prevent the ML model from accurately predicting the correct outcome
The availability of sufficient data about the world to replace human knowledge
A collection of 10 or more datasets about a domain to replace human knowledge
The problem scope or knowledge domain that the data covers
Summary
This module provides a summary of the key points covered in each module and steps you
can take to continue your learning.