Google Cloud Platform for Data Science
A Crash Course on Big Data, Machine Learning, and Data Analytics Services
Dr. Shitalkumar R. Sukhdeve
Sandika S. Sukhdeve
Table of Contents
Acknowledgments
Preface
Introduction
Bibliography
Index
About the Authors
Dr. Shitalkumar R. Sukhdeve is an
experienced senior data scientist with a strong
track record of developing and deploying
transformative data science and machine
learning solutions to solve complex business
problems in the telecom industry. He has
notable achievements in developing a machine
learning–driven customer churn prediction
and root cause exploration solution, a
customer credit scoring system, and a product
recommendation engine.
Shitalkumar is skilled in enterprise data science and research
ecosystem development, dedicated to optimizing key business indicators
and adding revenue streams for companies. He is pursuing a doctorate
in business administration from SSBM, Switzerland, and an MTech in
computer science and engineering from VNIT Nagpur.
Shitalkumar has authored a book titled Step Up for Leadership
in Enterprise Data Science and Artificial Intelligence with Big Data:
Illustrations with R and Python and co-authored a book titled Web
Application Development with R Using Shiny, Third Edition. He is a
speaker at various technology and business events such as World AI Show
Jakarta 2021, 2022, and 2023, NXT CX Jakarta 2022, Global Cloud-Native
and Open Source Summit 2022, Cyber Security Summit 2022, and ASEAN
Conversational Automation Webinar. You can find him on LinkedIn at
www.linkedin.com/in/shitalkumars/.
About the Technical Reviewer
Sachin G. Narkhede is a highly skilled data
scientist and software engineer with over
12 years of experience in Python and R
programming for data analytics and machine
learning. He has a strong background in
building machine learning models using scikit-learn, Pandas, Seaborn, and NLTK, as well as
developing question-answering machines and
chatbots using Python and IBM Watson.
Sachin's expertise also extends to data visualization using Microsoft
BI and the data analytics tool RapidMiner. With a master's degree in
information technology, he has a proven track record of delivering
successful projects, including transaction monitoring, trade-based money
laundering detection, and chatbot development for banking solutions. He
has worked on GCP (Google Cloud Platform).
Sachin's passion for research is evident in his published papers on
brain tumor detection using symmetry and mathematical analysis. His
dedication to learning is demonstrated through various certifications and
workshop participation. Sachin's combination of technical prowess and
innovative thinking makes him a valuable asset in the field of data science.
Acknowledgments
We extend our sincerest thanks to all those who have supported us
throughout the writing process of this book. Your encouragement,
guidance, and unwavering belief in our abilities have contributed to
bringing this project to fruition.
Above all, we express our deepest gratitude to our parents, whose
unconditional love, unwavering support, and sacrifices have allowed us
to pursue our passions. Your unwavering belief in us has been the driving
force behind our motivation.
We are grateful to our family for their understanding and patience
during our countless hours researching, writing, and editing this book.
Your love and encouragement have served as a constant source of
inspiration.
A special thank you goes to our friends for their words of
encouragement, motivation, and continuous support throughout this
journey. Your belief in our abilities and willingness to lend an ear during
moments of doubt have been invaluable.
We would also like to express our appreciation to our mentors
and colleagues who generously shared their knowledge and expertise,
providing valuable insights and feedback that have enriched the content of
this book.
Lastly, we want to express our deepest gratitude to the readers of this
book. Your interest and engagement in the subject matter make all our
efforts worthwhile. We sincerely hope this book proves to be a valuable
resource for your journey in understanding and harnessing the power of
technology.
Once again, thank you for your unwavering support, love, and
encouragement. This book would not have been possible without each
and every one of you.
Sincerely,
Shitalkumar and Sandika
Preface
The business landscape is being transformed by the integration of data science
and machine learning, and cloud computing platforms have become
indispensable for handling and examining vast datasets. Google Cloud
Platform (GCP) stands out as a top-tier cloud computing platform, offering
extensive services for data science and machine learning.
This book is a comprehensive guide to learning GCP for data science,
using only the free-tier services offered by the platform. Regardless of
your professional background as a data analyst, data scientist, software
engineer, or student, this book offers a comprehensive and progressive
approach to mastering GCP's data science services. It presents a step-by-
step guide covering everything from fundamental concepts to advanced
topics, enabling you to gain expertise in utilizing GCP for data science.
The book begins with an introduction to GCP and its data science
services, including BigQuery, Cloud AI Platform, Cloud Dataflow, Cloud
Storage, and more. You will learn how to set up a GCP account and
project and use Google Colaboratory to create and run Jupyter notebooks,
including machine learning models.
The book then covers big data and machine learning, including
BigQuery ML, Google Cloud AI Platform, and TensorFlow. Within this
learning journey, you will acquire the skills to leverage Vertex AI for
training and deploying machine learning models and harness the power of
Google Cloud Dataproc for the efficient processing of large-scale datasets.
The book then delves into data visualization and business intelligence,
encompassing Looker Studio and Google Colaboratory. You will gain
proficiency in generating and distributing data visualizations and reports
using Looker Studio and acquire the knowledge to construct interactive
dashboards.
Introduction
Welcome to Google Cloud Platform for Data Science: A Crash Course on
Big Data, Machine Learning, and Data Analytics Services. In this book, we
embark on an exciting journey into the world of Google Cloud Platform
(GCP) for data science. GCP is a cutting-edge cloud computing platform
that has revolutionized how we handle and analyze data, making it an
indispensable tool for businesses seeking to unlock valuable insights and
drive innovation in the modern digital landscape.
As a widely recognized leader in cloud computing, GCP offers a
comprehensive suite of services specifically tailored for data science
and machine learning tasks. This book provides a progressive and
comprehensive approach to mastering GCP's data science services,
utilizing only the free-tier services offered by the platform. Whether you’re
a seasoned data analyst, a budding data scientist, a software engineer, or
a student, this book equips you with the skills and knowledge needed to
leverage GCP for data science purposes.
Chapter 1: “Introduction to GCP”
This chapter explores the transformative shift that data science and
machine learning brought about in the business landscape. We highlight
cloud computing platforms' crucial role in handling and analyzing vast
datasets. We then introduce GCP as a leading cloud computing platform
renowned for its comprehensive suite of services designed specifically for
data science and machine learning tasks.
CHAPTER 1
Introduction to GCP
Over the past few years, the landscape of data science has undergone a
remarkable transformation in how organizations manage data.
The rise of big data and machine learning has necessitated the storage,
processing, and analysis of vast quantities of data for businesses. As a
result, there has been a surge in the demand for cloud-based data science
platforms like Google Cloud Platform (GCP).
According to a report by IDC, the worldwide public cloud services
market was expected to grow by 18.3% in 2021, reaching $304.9 billion.
GCP has gained significant traction in this market, becoming the third-
largest cloud service provider with a market share of 9.5% (IDC, 2021). This
growth can be attributed to GCP’s ability to provide robust infrastructure,
data analytics, and machine learning services.
GCP offers various data science services, including data storage,
processing, analytics, and machine learning. It also provides tools for building
and deploying applications, managing databases, and securing resources.
Let’s look at a few business cases that shifted to GCP and achieved
remarkable results:
Your project is now set up, and you can start using GCP services.
Note If you are using the free tier, make sure to monitor your usage
to avoid charges, as some services have limitations. Also, you may
need to enable specific services for your project to use them.
Summary
Google Cloud Platform (GCP) offers a comprehensive suite of cloud
computing services that leverage the same robust infrastructure used by
Google’s products. This chapter introduced GCP, highlighting its essential
services and their relevance to data science.
We explored several essential GCP services for data science, including
BigQuery, Cloud AI Platform, Cloud Dataflow, Cloud DataLab, Cloud
Dataproc, Cloud Storage, and Cloud Vision API. Each of these services
serves a specific purpose in the data science workflow, ranging from data
storage and processing to machine learning model development and
deployment.
CHAPTER 2
Google Colaboratory
Google Colaboratory is a free, cloud-based Jupyter Notebook environment
provided by Google. It allows individuals to write and run code in Python
and other programming languages and perform data analysis, data
visualization, and machine learning tasks. The platform is designed to be
accessible, easy to use, and collaboration-friendly, making it a popular tool
for data scientists, software engineers, and students.
This chapter will guide you through the process of getting started
with Colab, from accessing the platform to understanding its features and
leveraging its capabilities effectively. We will cover how to create and run
Jupyter notebooks, run machine learning models, and access GCP services
and data from Colab.
Features of Colab
Cloud-based environment: Colaboratory runs on Google’s servers,
eliminating the need for users to install software on their devices.
Easy to use: Colaboratory provides a user-friendly interface for
working with Jupyter notebooks, making it accessible for individuals with
limited programming experience.
Access to GCP services: Colaboratory integrates with Google
Cloud Platform (GCP) services, allowing users to access and use GCP
resources, such as BigQuery and Cloud Storage, from within the notebook
environment.
Hands-On Example
Insert text in the notebook to describe the code by clicking the +
Text button.
import random
import matplotlib.pyplot as plt
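The cell that uses these imports is not reproduced in this extract; a minimal sketch of the kind of example they suggest (the data and plot details are assumptions) is:

# Generate 50 random values and plot them
values = [random.random() for _ in range(50)]
plt.plot(values)
plt.title('Random values generated in Colab')
plt.show()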
Importing Libraries
The following is example code for importing libraries into your notebook. If
a library is not already installed, use the “!pip install” command followed
by the library’s name to install it:
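The mounting snippet referenced in the next paragraph is not reproduced here; in Colab it is typically:

from google.colab import drive

# Mount Google Drive at /content/drive
drive.mount('/content/drive')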
Once you execute the code, a prompt will appear, asking you to
sign into your Google account and grant Colaboratory the necessary
permissions to access your Google Drive.
Once you’ve authorized Colaboratory, you can access your Google
Drive data by navigating to /content/drive in the Colaboratory file explorer.
To write data to Google Drive, you can use Python’s built-in open
function. For example, to write a Pandas DataFrame to a CSV file in Google
Drive, you can use the following code:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.to_csv('/content/drive/My Drive/Colab Notebooks/data.csv', index=False)
import pandas as pd
df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/data.csv')
Visualize Data
To visualize data in Colaboratory, you can use libraries such as Matplotlib
or Seaborn to create plots and charts.
Create a data frame in Python using the Pandas library with the
following code:
import pandas as pd
import numpy as np
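The statement that actually builds the data frame is not shown in the extract; based on the description that follows, a minimal sketch would be:

# 100 rows of random integer ages (20-80) and weights (50-100)
df = pd.DataFrame({
    'age': np.random.randint(20, 81, 100),
    'weight': np.random.randint(50, 101, 100)
})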
This will create a data frame with 100 rows and 2 columns named “age”
and “weight”, populated with random integer values between 20 and 80 for
age and 50 and 100 for weight.
Visualize the data in the Pandas data frame using the Matplotlib
library in Python. Here’s an example that plots a scatter plot of the age and
weight columns:
plt.scatter(df['age'], df['weight'])
plt.xlabel('Age')
plt.ylabel('Weight')
plt.show()
This will create a scatter plot with the age values on the x-axis and
weight values on the y-axis. You can also use other types of plots, like
histograms, line plots, bar plots, etc., to visualize the data in a Pandas
data frame.
import pandas as pd
import numpy as np
5. X = np.random.randn(n_samples, n_features): This line generates a random array of shape (n_samples, n_features) using the np.random.randn() function from Numpy. Each element in the array is drawn from a standard normal distribution (mean = 0, standard deviation = 1).
8. df = pd.DataFrame(np.hstack((X, y[:, np.newaxis])), columns=["feature_1", "feature_2", "feature_3", "feature_4", "feature_5", "target"]): This line combines the features (X) and target (y) arrays into a Pandas DataFrame. The np.hstack() function horizontally stacks the X and y arrays, and the resulting combined array is passed to the pd.DataFrame() function to create a DataFrame. The columns parameter is used to assign column names to the DataFrame, specifying the names of the features and the target.
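Only steps 5 and 8 of this listing survive in the extract; pieced together, the snippet presumably resembles the following sketch (the sample count and the way the target y is generated are assumptions):

import numpy as np
import pandas as pd

n_samples, n_features = 1000, 5               # assumed sizes
X = np.random.randn(n_samples, n_features)    # step 5: standard-normal features
y = np.random.randint(0, 2, n_samples)        # assumed: a random binary target
df = pd.DataFrame(
    np.hstack((X, y[:, np.newaxis])),         # step 8: stack features and target
    columns=["feature_1", "feature_2", "feature_3", "feature_4", "feature_5", "target"],
)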
8. clf = RandomForestClassifier(n_estimators=100):
This line creates an instance of the
RandomForestClassifier class with 100 decision
trees. The n_estimators parameter determines the
number of trees in the random forest.
9. clf.fit(X_train, y_train): This line trains the random
forest classifier (clf) on the training data (X_train
and y_train). The classifier learns patterns and
relationships in the data to make predictions.
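The listing that these steps describe is not fully reproduced; a hedged sketch of the split-and-train stage covered by items 8 and 9 (the test size is an assumption) is:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Split features and target, then train a 100-tree random forest
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["target"]), df["target"], test_size=0.2)
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)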
import joblib
model = clf  # Your trained model
# Save the model
joblib.dump(model, '/content/drive/My Drive/Colab Notebooks/model.joblib')
from flask import Flask, request
import joblib

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Read the input features from the JSON request body
    # (the "data" key is an assumption; the original line used request.json[X_test], which would fail)
    data = request.json["data"]
    model = joblib.load('/content/drive/My Drive/Colab Notebooks/model.joblib')
    predictions = model.predict(data)
    print(predictions)
    return {'predictions': predictions.tolist()}

if __name__ == "__main__":
    app.run()
This example uses the Flask library to serve the model as a REST API. You can test the API by sending a POST request to the endpoint /predict with the input data in the request body.
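For instance, from another cell or script you could exercise the endpoint roughly like this (the URL and the "data" key match the assumptions made in the sketch above):

import requests

sample = [[0.1, -1.2, 0.5, 2.0, -0.3]]   # one illustrative row of feature values
response = requests.post("http://127.0.0.1:5000/predict", json={"data": sample})
print(response.json())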
We will now examine the code step-by-step, analyzing each line:
Summary
In this chapter, we learned about Google Colaboratory, a free cloud-based
Jupyter Notebook environment offered by Google. We discovered its key
features, including its cloud-based nature, user-friendly interface,
integration with Google Cloud Platform (GCP) services, and sharing and
collaboration capabilities.
CHAPTER 3
Big Data and Machine Learning

BigQuery
BigQuery is a serverless, highly scalable, cost-effective cloud data
warehouse provided by Google Cloud Platform (GCP). It allows you
to store and analyze large amounts of structured and semi-structured
data using SQL-like queries. With BigQuery, you can run ad hoc queries
on terabytes of data in seconds without the need for any setup or
maintenance.
Set up a Google Cloud account: To use BigQuery, you must sign up for
a Google Cloud account. You can start with a free trial and then choose a
paid plan that meets your needs.
Create a project: Once you have a Google Cloud account, create a
new project in the Google Cloud Console. This will be the home for your
BigQuery data.
Load your data: You can load your data into BigQuery in several ways,
including uploading CSV or JSON files, using a cloud-based data source
such as Google Cloud Storage, or streaming data in real time.
Run a query: Once your data is loaded, you can start querying it using
BigQuery’s web UI or the command-line interface. BigQuery uses a SQL-
like syntax, so if you’re familiar with SQL, you’ll find it easy to use.
Analyze your results: After running a query, you can view the results
in the web UI or export them to a file for further analysis.
Visualize your data: You can also use tools like Google Looker Studio
to create visualizations and reports based on your BigQuery data.
BigQuery provides detailed documentation and tutorials to help you
get started. Additionally, GCP provides a free community support forum,
and if you have a paid plan, you can also access technical support.
Before jumping to the production environment and using a credit card,
you can try BigQuery Sandbox.
BigQuery Sandbox is a free, interactive learning environment
provided by Google Cloud Platform (GCP) to help users explore and learn
about BigQuery. With BigQuery Sandbox, you can run queries on public
datasets and test out the functionality of BigQuery without incurring any
charges.
BigQuery Sandbox presents a user-friendly, web-based interface
that facilitates SQL-like queries on BigQuery data. It offers a hassle-free
experience, eliminating the need for any setup or installation. This makes it
an invaluable resource for individuals such as data analysts, data scientists,
and developers who are new to BigQuery and seek a hands-on learning
experience with the tool while exploring its capabilities.
BigQuery Sandbox is not meant for production use and has limitations,
such as a smaller data size limit and slower performance than a fully
configured BigQuery instance. However, it’s a great way to get started with
BigQuery and gain familiarity with its functionality before using it in a
production environment.
Here’s a step-by-step tutorial for using BigQuery Sandbox:
Open the BigQuery Sandbox website: Go to the BigQuery Sandbox
website at https://console.cloud.google.com/bigquery.
After creating the project, you can find the following screen.
Select a dataset: Click the name of a dataset to select it and see its
schema and tables. In this example, the "bigquery-public-data.google_trends.top_terms" table is used.
Write a query: To write a query, click the Query Editor tab in the
navigation menu. Write your query using BigQuery’s SQL-like syntax in the
query editor and click the Run Query button to execute it. The following
query retrieves a list of the daily top Google Search terms using data from
the bigquery-public-data.google_trends.top_terms table:
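The query listing itself is not reproduced in the extract; a representative query against this public table (the exact columns and filters used in the original may differ) is:

SELECT
  refresh_date AS day,
  term AS top_term,
  rank
FROM `bigquery-public-data.google_trends.top_terms`
WHERE rank = 1
  AND refresh_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 WEEK)
GROUP BY day, top_term, rank
ORDER BY day DESC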
Repeat the process: Feel free to iterate through this process multiple
times, experimenting with various datasets and crafting diverse queries to
delve into the extensive capabilities offered by BigQuery.
Explore data: With Query results, there are options to save results and
explore data.
Explore data with Colab: Once you click Explore with Colab Notebook, you will see the following screen.
BigQuery ML
BigQuery ML is a feature in Google Cloud’s BigQuery that allows you to
build machine learning models directly within BigQuery, without the need
to move data out of BigQuery. This means you can use BigQuery ML to
train machine learning models on large datasets that you have stored in
BigQuery.
import pandas as pd
import numpy as np
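The lines that generate and save the example dataset are missing from the extract; continuing from the two imports above, a minimal assumed sketch that produces a comparable data.csv is:

# Assumed reconstruction: a small random dataset with numeric features and a binary label
n = 1000
df = pd.DataFrame({
    "feature_1": np.random.randn(n),
    "feature_2": np.random.randn(n),
    "label": np.random.randint(0, 2, n),
})
df.to_csv("data.csv", index=False)   # the file then appears under Files in the Colab sidebar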
You can find the data.csv in the Colab files. Download the file and save
it on your computer.
Create a BigQuery dataset: Go to the Google Cloud Console, select
BigQuery, and create a new BigQuery dataset.
Upload the data to BigQuery: Go to the BigQuery Editor and create
a new table. Then, upload the data to this table by clicking Create Table
and selecting Upload. Choose the file containing the data and select the
appropriate options for the columns.
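The CREATE MODEL statement used at this point is not reproduced in the extract; judging from the model and table names that appear in the evaluation queries below, it presumably looked roughly like this (the model type and label column are assumptions):

CREATE OR REPLACE MODEL `example_data.data_set_class_rand_model`
OPTIONS(model_type = 'logistic_reg', input_label_cols = ['label']) AS
SELECT *
FROM `practice-project-1-377008.example_data.data_set_class_rand`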
Now, your editor should look like this. If everything is fine, you will be
able to find a green tick in the right corner like the following. Go ahead and
run the query.
Evaluate the model: In the BigQuery Editor, create a new SQL query.
Use the ML.EVALUATE function to evaluate the model. Here’s an example
of how to do this:
SELECT *
FROM ML.EVALUATE(MODEL your_model_name, (
SELECT *
FROM your_dataset_name.your_table_name
))
In the preceding syntax, put all the details as before, and your query
should look like the following:
SELECT *
FROM ML.EVALUATE(MODEL `example_data.data_set_class_rand_model`, (
  SELECT *
  FROM `practice-project-1-377008.example_data.data_set_class_rand`
))
Run the query and you can find the following results.
Make predictions: In the BigQuery Editor, create a new SQL query and use the ML.PREDICT function in the same way:
SELECT *
FROM ML.PREDICT(MODEL your_model_name, (
  SELECT *
  FROM your_dataset_name.your_table_name
))
Again insert all the information as before, and your query should look
like the following:
SELECT *
FROM ML.PREDICT(MODEL `example_data.data_set_class_rand_model`, (
  SELECT *
  FROM `practice-project-1-377008.example_data.data_set_class_rand`
))
With a diverse set of tools at your disposal, you can interact with AI
Platform in the manner that best suits your individual and team needs,
selecting whichever option matches your specific requirements and preferences.
Now you are ready to access Google Cloud services. Go to the search
box and type Vertex AI, and you will be able to see the following screen.
5. Create an instance.
In Vertex AI, Cloud Storage buckets are used to store data required for
training machine learning models and store the models themselves once
they are deployed. Some common use cases for Cloud Storage buckets in
Vertex AI include
2. Click CREATE.
1. Project setup.
6. Click Continue.
If everything goes well, after some time the model will appear in the
Training menu as the following.
After completing this step, your model is deployed at the endpoint and
ready to be connected to any application to serve predictions.
Click OPEN JUPYTERLAB, and you can see the following screen.
Now under the notebook, click Python 3. You can rename the
notebook as per your choice.
2. Set the project ID: The first line of code sets the
project ID for the Vertex AI platform. The project ID
is a unique identifier for the project in the Google
Cloud Console. The project ID is set using the
following code:
import os
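The assignment that actually sets the ID is not shown in the extract; it presumably continues with a placeholder value along these lines:

project_id = "your-project-id"   # placeholder; replace with your own GCP project ID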
3. Set the region: The next line of code sets the region
for the Vertex AI platform. This is where the Vertex
AI resources will be created and stored. The region
is set using the following code:
region = "us-central1"
# Create a bucket
bucket = client.create_bucket(bucket_name, location=region)
print("Bucket {} created.".format(bucket.name))
Here, the bucket_name and bucket_uri are defined, and the timestamp
is used to create a unique name for the bucket. Then, a Google Cloud
Storage client is created using the project ID, and the new bucket is created
using the client and the specified name and region.
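The definitions this paragraph describes are not reproduced above; a hedged sketch of them (the naming scheme is an assumption) is:

from datetime import datetime
from google.cloud import storage

timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
bucket_name = f"{project_id}-bucket-{timestamp}"   # unique bucket name
bucket_uri = f"gs://{bucket_name}"

client = storage.Client(project=project_id)        # Cloud Storage client for the project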
Initialize the Vertex AI SDK for Python.
The following code initializes the Vertex AI SDK, which provides access
to the Vertex AI platform and its capabilities:
The first line imports the aiplatform module from the google.cloud
library, which provides access to the Vertex AI API.
The aiplatform.init method is then called with three arguments:
project, location, and staging_bucket. The project argument is the ID
of the Google Cloud project from which the Vertex AI platform will be
accessed. The location argument is the geographic location where the
Vertex AI platform resources will be created. The staging_bucket argument
is the URI of the Google Cloud Storage bucket that will be used to store
intermediate data during model training and deployment.
This code sets up the connection to the Vertex AI platform and
prepares it for use in your application.
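The initialization call described here is not shown in the extract; it is typically:

from google.cloud import aiplatform

aiplatform.init(project=project_id, location=region, staging_bucket=bucket_uri)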
Initialize BigQuery.
The following code sets up a BigQuery client to access the BigQuery
data storage and analysis service in the specified project:
The first line imports the bigquery module from the google.cloud
library, which provides access to the BigQuery API.
The bq_client variable is then defined as an instance of the bigquery.
Client class, using the project_id variable as an argument. This creates a
client object that is used to interact with the BigQuery API for the specified
project.
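Again, the listing itself is missing; based on the description, it is simply:

from google.cloud import bigquery

bq_client = bigquery.Client(project=project_id)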
import numpy as np
import pandas as pd
LABEL_COLUMN = "species"
# Define NA values
NA_VALUES = ["NA", "."]
# Download a table
table = bq_client.get_table(BQ_SOURCE)
df = bq_client.list_rows(table).to_dataframe()
8. training_data_uri = os.getenv("AIP_TRAINING_DATA_URI") retrieves the value of the AIP_TRAINING_DATA_URI environment variable and assigns it to the training_data_uri variable.
9. validation_data_uri = os.getenv("AIP_VALIDATION_DATA_URI") retrieves the value of the AIP_VALIDATION_DATA_URI environment variable and assigns it to the validation_data_uri variable.
%%writefile task.py
import argparse
import numpy as np
import os
import pandas as pd
import tensorflow as tf
# Read args
parser = argparse.ArgumentParser()
from google.cloud import bigquery  # needed inside task.py for the client below

# See https://cloud.google.com/vertex-ai/docs/workbench/managed/executor#explicit-project-selection
# for issues regarding permissions.
PROJECT_NUMBER = os.environ["CLOUD_ML_PROJECT_ID"]
bq_client = bigquery.Client(project=PROJECT_NUMBER)
# Download a table
def download_table(bq_table_uri: str):
    # Remove bq:// prefix if present
    prefix = "bq://"
    if bq_table_uri.startswith(prefix):
        bq_table_uri = bq_table_uri[len(prefix):]

    # Fetch the table and return it as a DataFrame (same pattern used earlier in the chapter)
    table = bq_client.get_table(bq_table_uri)
    return bq_client.list_rows(table).to_dataframe()
df_train = download_table(training_data_uri)        # (these two download calls are reconstructed;
df_validation = download_table(validation_data_uri) #  only the test download survives in the extract)
df_test = download_table(test_data_uri)

def convert_dataframe_to_dataset(
    df_train: pd.DataFrame,
    df_validation: pd.DataFrame,
):
    df_train_x, df_train_y = df_train, df_train.pop(LABEL_COLUMN)
    df_validation_x, df_validation_y = df_validation, df_validation.pop(LABEL_COLUMN)

    y_train = np.asarray(df_train_y).astype("float32")
    y_validation = np.asarray(df_validation_y).astype("float32")

    # Convert the feature frames to float arrays before slicing them into datasets
    # (these two lines are missing from the extract and are reconstructed here)
    x_train = np.asarray(df_train_x).astype("float32")
    x_test = np.asarray(df_validation_x).astype("float32")

    dataset_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    dataset_validation = tf.data.Dataset.from_tensor_slices((x_test, y_validation))
    return (dataset_train, dataset_validation)

# Create datasets
dataset_train, dataset_validation = convert_dataframe_to_dataset(df_train, df_validation)
def create_model(num_features):
    # Create model
    Dense = tf.keras.layers.Dense
    model = tf.keras.Sequential(
        [
            Dense(
                100,
                activation=tf.nn.relu,
                kernel_initializer="uniform",
                input_dim=num_features,
            ),
            Dense(75, activation=tf.nn.relu),
            Dense(50, activation=tf.nn.relu),
            Dense(25, activation=tf.nn.relu),
            Dense(3, activation=tf.nn.softmax),
        ]
    )
    # (the model.compile(...) step from the original listing is not reproduced in this extract)
    return model
# Set up datasets
dataset_train = dataset_train.batch(args.batch_size)
dataset_validation = dataset_validation.batch(args.batch_size)
tf.saved_model.save(model, os.getenv("AIP_MODEL_DIR"))
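The lines that build, compile, and fit the model before it is saved above are not reproduced in the extract; a hedged sketch of that step is:

# Assumed reconstruction of the training step that precedes tf.saved_model.save above
model = create_model(num_features=df_train.shape[1])   # df_train holds only feature columns after the label was popped
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(dataset_train, epochs=args.epochs, validation_data=dataset_validation)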
1. "--label_column=" + LABEL_COLUMN: Specifies the name of the column that holds the label data. The value for this argument is taken from the LABEL_COLUMN variable.
JOB_NAME = "custom_job_unique"
EPOCHS = 20
BATCH_SIZE = 10
CMDARGS = [
    "--label_column=" + LABEL_COLUMN,
    "--epochs=" + str(EPOCHS),
    "--batch_size=" + str(BATCH_SIZE),
]
job = aiplatform.CustomTrainingJob(
    display_name=JOB_NAME,
    script_path="task.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest",
    requirements=["google-cloud-bigquery>=2.20.0", "db-dtypes"],
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest",
)
MODEL_DISPLAY_NAME = "penguins_model_unique"
# The opening of this call is missing from the extract; "model = job.run(" is assumed here,
# and the original may pass additional arguments (such as a managed dataset) before these.
model = job.run(
    model_display_name=MODEL_DISPLAY_NAME,
    bigquery_destination=f"bq://{project_id}",
    args=CMDARGS,
)
DEPLOYED_NAME = "penguins_deployed_unique"
endpoint = model.deploy(deployed_model_display_name=DEPLOYED_NAME)
Make a prediction.
Prepare prediction test data: The following code removes the
“species” column from the data frame df_for_prediction. This is likely
because the species column is the label or target column that the model
was trained to predict. The species column is not needed for making
predictions with the trained model.
Next, the code converts the data frame df_for_prediction to a Python
list test_data_list. This is likely to prepare the data for making predictions
with the deployed model. The values attribute of the data frame is used to
convert the data frame to a Numpy array, and then the tolist() method is
used to convert the Numpy array to a list:
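The listing these two paragraphs describe is not included in the extract; a minimal reconstruction is:

# Drop the label column and convert the remaining feature rows to a plain Python list
df_for_prediction.pop(LABEL_COLUMN)
test_data_list = df_for_prediction.values.tolist()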
The following code uses the argmax method of Numpy to find the best
prediction for each set of input data. predictions.predictions is a Numpy
array containing the predicted class probabilities for each instance (i.e.,
each row) of input data. np.argmax(predictions.predictions, axis=1)
returns the indices of the highest values along the specified axis (in this
case, axis=1 means along each row), effectively giving us the predicted
class for each instance. The resulting array, species_predictions, represents
the best prediction for the penguin species based on the input data:
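Again, the code itself is not reproduced; based on the description, it is roughly:

# Send the prepared rows to the deployed endpoint, then pick the most probable class per row
predictions = endpoint.predict(instances=test_data_list)
species_predictions = np.argmax(predictions.predictions, axis=1)
species_predictions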
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2])
Clean up resources.
You can delete the project, which frees all the resources associated
with it (you can shut down projects using the Google Cloud Console), or
you can use the following steps while retaining the project:
11. Click the name of the dataset that you created for
this project and want to delete.
For this tutorial, you can also use the following code to delete the
resources:
import os
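The rest of the cleanup listing is not reproduced in the extract; a hedged sketch of the usual pattern (deleting the training artifacts and, optionally, the staging bucket) is:

# Assumed cleanup sequence; run only if you no longer need these resources
endpoint.undeploy_all()   # undeploy the model from the endpoint
endpoint.delete()         # delete the endpoint
model.delete()            # delete the uploaded model
job.delete()              # delete the custom training job

delete_bucket = False     # set to True to also remove the staging bucket
if delete_bucket:
    os.system(f"gsutil -m rm -r {bucket_uri}")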
Note Make sure to delete all resources associated with the project,
as Google Cloud charges for the use of its resources.
For proof of concept purposes, you can select the Single Node (1
master, 0 workers) cluster type.
Click Submit.
Here in the example, the Spark job uses the Monte Carlo method to
estimate the value of Pi. Check for details here:
https://cloud.google.com/architecture/monte-carlo-methods-with-hadoop-spark.
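The job itself is not listed in this extract; a minimal PySpark sketch of the same idea (not the exact job used in the example) samples random points in the unit square and counts how many fall inside the quarter circle:

import random
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("monte-carlo-pi").getOrCreate()
num_samples = 1_000_000

def inside(_):
    x, y = random.random(), random.random()
    return x * x + y * y <= 1.0

count = spark.sparkContext.parallelize(range(num_samples)).filter(inside).count()
print("Pi is roughly", 4.0 * count / num_samples)
spark.stop()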
Note that updating a cluster can cause some downtime or data loss,
depending on the type of updates you make. Review the changes carefully
and plan accordingly to minimize any impact on your data processing
workflows.
Cleaning up
1. On the cluster details page for your created cluster,
click Delete to delete the cluster.
TensorFlow
TensorFlow is a popular open source software library used for building and
training machine learning models. Originating from the Google Brain team
in 2015, it has evolved into one of the most extensively employed machine
learning libraries globally (Ahmed, 2023).
The library provides various tools and functionality for building and
training multiple types of machine learning models, including neural
networks, decision trees, and linear regression models. Furthermore, it
encompasses an extensive array of pre-built model architectures and
training algorithms, simplifying the process for developers to construct
and train their machine learning models.
A prominent attribute of TensorFlow is its capability to efficiently
handle extensive datasets and facilitate distributed training across
numerous devices or clusters. This allows developers to build and train
more complex models that can handle larger amounts of data and produce
more accurate results.
TensorFlow’s versatility and compatibility with various platforms and
programming languages, such as Python, C++, and Java, contribute to its
noteworthy reputation. This makes it easy to integrate TensorFlow models
into a variety of applications and workflows.
In summary, TensorFlow stands as a robust tool for constructing and
training machine learning models, encompassing a broad spectrum of
features and functionalities that have solidified its status as a favored
choice among developers and data scientists.
Here is a simple TensorFlow tutorial that you can run on Colab.
In this tutorial, we will build a simple TensorFlow model to classify
images from the MNIST dataset.
Step 1: Set up the environment.
The first step is to set up the environment on Colab. We will install
TensorFlow 2.0 and import the necessary libraries:
import tensorflow as tf
from tensorflow import keras
Next, we will load and preprocess the data. We will use the MNIST
dataset, which contains 60,000 training images and 10,000 test images
of handwritten digits (Reis, 2016). The following code loads the MNIST
dataset and preprocesses it by scaling the pixel values between 0 and 1 and
converting the labels to one-hot encoding:
# Load MNIST and scale pixel values to [0, 1] (these two lines are reconstructed from the description above)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
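The training step itself is not shown in the extract; a typical call (the number of epochs is an assumption) would be:

model.fit(x_train, y_train, epochs=5, validation_split=0.1)
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)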
That’s it! You have successfully built and trained a TensorFlow model
on Colab.
To run this tutorial on Colab, simply create a new notebook and copy-
paste the code into it. Make sure to select a GPU runtime for faster training.
Summary
In this chapter, we delved into the world of big data and machine learning,
exploring several Google Cloud services and tools.
We started with an introduction to BigQuery, a fully managed data
warehouse solution that enables storing and querying massive datasets.
We discussed its use cases and how it facilitates large-scale data analytics.
We implemented examples using the BigQuery Sandbox, which is free to use.
Next, we explored BigQuery ML, a feature of BigQuery that allows us to
build machine learning models directly within the BigQuery environment.
We learned how to create and deploy models using SQL queries,
simplifying the machine learning process.
CHAPTER 4
Data Visualization and Business Intelligence
These are some of the key features of Google Looker Studio. With its
powerful data modeling, interactive data visualizations, and collaboration
features, Looker Studio makes exploring, understanding, and sharing data
insights with others easier.
3. Go to Looker.
5. Go to Data source.
11. After the file gets connected, you will see the view
like the following screenshot. Then check the exact
format of the dimensions. For example, Customer
Name is in Text type, Order Date is in Date type,
Order Quantity is in Number type, etc.
This is how you can create a report and visualize your data.
Building a Dashboard
This section will guide you through developing a custom dashboard using
Looker Studio. To begin, sign into Looker Studio with your Gmail account.
As mentioned in the previous section, you have already uploaded an Excel
sheet and connected it to Looker Studio. Let’s get started with that report.
Return to your previous report and click the Page option in the header.
From there, select New page to create a blank report for you to work with.
To get started with PyGWalker, you need to install it using pip. Open a
Jupyter Notebook and install the library as follows:
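The install command itself is missing from the extract; in a notebook cell it is typically:

!pip install pygwalker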
import pandas as pd
import numpy as np
import pygwalker as pyg
data = {
    'age': np.random.randint(18, 65, 100),
    'gender': np.random.choice(['Male', 'Female'], 100),
    'income': np.random.normal(50000, 10000, 100),
    'state': np.random.choice(['CA', 'NY', 'TX'], 100)
}
df = pd.DataFrame(data)
pyg.walk(df)
You can also click Aggregation to turn it off and create graphs without
binning, like the following.
You can also change the chart type like the following.
Summary
In this chapter, we explored the world of data visualization and business
intelligence, focusing on Looker Studio and Colab for creating compelling
visualizations and reports.
CHAPTER 5
Data Processing and Transformation
In this chapter, we will explore the domain of data processing and
transformation, focusing on Google Cloud Dataflow and Google
Cloud Dataprep.
Our journey begins with an initiation into Google Cloud Dataflow,
an influential service dedicated to both batch and stream data
handling. We will delve into the diverse scenarios where Dataflow
stands out, encompassing real-time analytics, ETL (Extract, Transform,
Load) pipelines, and event-triggered processing. This will foster an
understanding of Dataflow’s role in facilitating scalable and efficient data
manipulation.
Progressing onward, we will immerse ourselves in the execution of
data manipulation pipelines within Cloud Dataflow. This phase involves
comprehending the concept of pipelines, a framework that enables the
definition of a sequence of steps required to process and reshape data.
We will investigate how Dataflow proficiently accommodates batch and
stream data manipulation and delve into its seamless integration with
other Google Cloud services, notably BigQuery and Cloud Pub/Sub.
Steering ahead, our exploration takes us to Google Cloud Dataprep, a
service tailored for data preparation and transformation. We will explore
a spectrum of contexts where Dataprep proves its utility, including data
cleaning and preparation for analysis.
Some common use cases for batch processing with Google Cloud
Dataflow include data warehousing, ETL (Extract, Transform, Load)
processing, and analytics. For example, you can use Dataflow to extract
data from multiple sources, clean and transform the data, and load it into a
data warehousing solution like BigQuery for analysis.
For stream processing, Dataflow is used for real-time analytics, fraud
detection, IoT data processing, and clickstream analysis. For example, you
can use Dataflow to process real-time data from IoT devices and analyze
the data to gain insights and make decisions in real time.
Overall, Google Cloud Dataflow is a highly scalable and flexible data
processing solution that can handle both batch and stream processing
workloads, making it a valuable tool for data engineers and data scientists.
• Dataflow API
• BigQuery API
Clean up:
10. Click the checkbox next to the bucket that you want
to delete. To delete the bucket, click Delete and follow
the instructions to complete the deletion process.
Summary
In this chapter, we delved into the realm of data processing and
transformation, focusing on Google Cloud Dataflow and Google Cloud
Dataprep.
CHAPTER 6
Data Analytics and Storage
Key Features
Google Cloud Storage is a highly scalable, durable, and secure object
storage service that provides a range of features to help users store,
manage, and access their data. Here are some of the key features of Google
Cloud Storage:
Scalability: Google Cloud Storage provides virtually unlimited storage
capacity, allowing users to store and manage petabytes of data.
Durability: Google Cloud Storage provides high durability by
automatically replicating data across multiple geographic locations, ensuring
data is protected against hardware failures and other types of disasters.
Security: Google Cloud Storage provides multiple layers of security,
including encryption at rest and in transit, fine-grained access controls,
and identity and access management (IAM).
Performance: Google Cloud Storage provides high-performance data
access, enabling fast upload and download speeds for objects of any size.
Integration with other Google Cloud Platform services: Google
Cloud Storage integrates seamlessly with other GCP services, such as
Compute Engine, App Engine, and BigQuery, to support a wide range of
use cases.
Multi-Regional and Regional storage options: Google Cloud Storage
offers Multi-Regional and Regional storage options to cater to different
data storage and access needs.
Storage Options
Google Cloud Storage provides multiple storage options that cater to
different data storage and access needs. Here are some of the storage
classes offered by Google Cloud Storage:
Storage Locations
In addition to the preceding storage classes, Google Cloud Storage also
provides two types of storage locations:
Creating a data lake for analytics with Google Cloud Storage involves
setting up a bucket, configuring access control, ingesting and organizing
data, and using GCP’s analytics and AI tools to gain insights from the data.
By adhering to these instructions, you can establish a robust data lake
that empowers you to make informed decisions based on data and gain a
competitive edge within your industry.
2. Enable APIs:
That’s it! You have now successfully created a MySQL instance using
Cloud SQL.
3. Authorize.
USE college;
5. Clean up.
Summary
This chapter explored the world of data analytics and storage, focusing on
Google Cloud Storage, Google Cloud SQL, and Google Cloud Pub/Sub.
We began by introducing Google Cloud Storage, a highly scalable
and durable object storage service. We discussed the various use cases
where Cloud Storage is beneficial, such as storing and archiving data,
hosting static websites, and serving as a backup and recovery solution. We
explored the key features of Cloud Storage, including its storage classes,
access controls, and integration with other Google Cloud services.
Next, we delved into Google Cloud SQL, a fully managed relational
database service. We discussed the advantages of using Cloud SQL for
storing and managing relational data, including its automatic backups,
scalability, and compatibility with popular database engines such as MySQL
and PostgreSQL. We explored the use cases where Cloud SQL is suitable,
such as web applications, content management systems, and data analytics.
CHAPTER 7
Advanced Topics
This chapter delves into advanced aspects of securing and managing
Google Cloud Platform (GCP) resources, version control using Google
Cloud Source Repositories, and powerful data integration tools: Dataplex
and Cloud Data Fusion.
We start by emphasizing secure resource management with Identity
and Access Management (IAM), covering roles, permissions, and best
practices for strong access controls.
Google Cloud Source Repositories is explored as a managed version
control system for scalable code management and collaboration. Dataplex
is introduced as a data fabric platform, aiding data discovery, lineage, and
governance, while Cloud Data Fusion simplifies data pipeline creation and
management.
Throughout, we’ll gain insights to enhance GCP security, streamline
code collaboration, and optimize data integration for various
processing needs.
and define the actions that individuals can execute on those resources. It
provides a centralized place to manage access control, making enforcing
security policies easier and reducing the risk of unauthorized access.
Here are some key concepts related to IAM:
2. Enable APIs:
Identify the member and the role to be revoked: Before you can
revoke an IAM role, you need to know which member has been assigned
the role and which role you want to revoke.
Navigate to the IAM page: Open the IAM page in the Google Cloud
Console by clicking the IAM tab in the left navigation menu.
Find the member: Use the search bar at the top of the page to find
the member who has been assigned the role you want to revoke. Click the
member’s name to view their assigned roles.
Remove the role: Locate the role you want to revoke and click the X
icon to remove the role from the member. A confirmation dialog box will
appear. Click Remove to confirm that you want to revoke the role.
Verify the role has been revoked: After you revoke the role, you
should verify that the member no longer has access to the resources
associated with that role. You can use the Google Cloud Console or the
Cloud Identity and Access Management API to check the member’s roles
and permissions.
It’s important to note that revoking an IAM role does not remove
the member from your project or organization. The members may still
have other roles assigned to them, and they may still be able to access
some resources. To completely remove a member from your project or
organization, you need to remove all their assigned roles and permissions.
Dataplex
Dataplex is a new data platform launched by Google Cloud in 2021. It is
designed to simplify the complexities of modern data management by
providing a unified, serverless, and intelligent data platform. Dataplex
allows users to manage their data across multiple clouds and on-premises
data stores through a single interface.
The platform offers many key features that make it a valuable tool for
data management:
14. To avoid incurring any cost, delete the resources you created if you
no longer need them. First delete the assets, then the zones, and
finally the lake.
You can start using Cloud Data Fusion with your project by completing
the following steps.
8. Click Deploy.
Summary
This chapter explored advanced topics related to securing and managing
Google Cloud Platform (GCP) resources, version control with Google
Cloud Source Repositories, and two powerful data integration and
management tools: Dataplex and Cloud Data Fusion.
With the knowledge and skills acquired from this chapter, we can
confidently navigate the advanced aspects of Google Cloud Platform,
implement robust security measures, streamline code collaboration, and
efficiently manage and integrate data for various data processing and
analytics requirements.
Bibliography
Ahmed, F. (2023, January 25). What is Tensorflow. Retrieved from www.makermodules.com/what-is-tensorflow/
Blog, Neos Vietnam (n.d.). Tutorial: TensorFlow Lite. Retrieved from https://blog.neoscorp.vn/tutorial-tensorflow-lite/
Google Cloud (2023, February 7). Introduction to Vertex AI. Retrieved from https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform
Google Cloud (2023, June 6). What is IaaS? Retrieved from https://cloud.google.com/learn/what-is-iaas
Google Cloud (n.d.). The Home Depot: Helping doers get more done through a data-driven approach. Retrieved from https://cloud.google.com/customers/the-home-depot
Google Cloud (n.d.). Spotify: The future of audio. Putting data to work, one listener at a time. Retrieved from https://cloud.google.com/customers/spotify
IDC. (2021, June). Worldwide Foundational Cloud Services Forecast, 2021–2025: The Base of the Digital-First Economy. Retrieved from www.idc.com/getdoc.jsp?containerId=US46058920
Reis, N. (2016, May). Deep Learning. Retrieved from https://no.overleaf.com/articles/deep-learning/xhgfttpzrfkz
Secureicon. (2022). Tag: cloud security incidents. Retrieved from www.securicon.com/tag/cloud-security-incidents/
Sukhdeve, S. R. (2020). Step Up for Leadership in Enterprise Data Science and Artificial Intelligence with Big Data: Illustrations with R and Python. KDP.