Machine Learning with BigQuery ML: Create, execute, and improve machine learning models in BigQuery using standard SQL queries
Ebook, 604 pages, 3 hours


About this ebook

BigQuery ML enables you to easily build machine learning (ML) models with SQL without much coding. This book will help you to accelerate the development and deployment of ML models with BigQuery ML.

The book starts with a quick overview of Google Cloud and the BigQuery architecture. You'll then learn how to configure a Google Cloud project, understand the architectural components and capabilities of BigQuery, and find out how to build ML models with BigQuery ML using SQL. You'll analyze the key phases of an ML model's lifecycle and get to grips with the SQL statements used to train, evaluate, test, and use a model. As you advance, you'll build a series of practical use cases by applying different ML techniques, such as linear regression, binary and multiclass logistic regression, k-means, ARIMA time series, and XGBoost. Moving on, you'll cover matrix factorization and deep neural networks using BigQuery ML's capabilities. Finally, you'll explore the integration of BigQuery ML with other Google Cloud Platform components, such as AI Platform Notebooks and TensorFlow, and discover best practices, tips, and tricks for hyperparameter tuning and performance enhancement.

By the end of this BigQuery book, you'll be able to build and evaluate your own ML models with BigQuery ML.
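As a hedged illustration of the train, evaluate, and use phases described in the overview above, here is a minimal sketch of the three BigQuery ML statements involved for a linear regression model. The project, dataset, table, and column names (`my_project`, `my_dataset`, `bike_trips_training`, `tripduration`, and so on) are hypothetical placeholders, not objects from the book's repository; replace them with tables you own.

```sql
-- Hypothetical names: my_project.my_dataset.* and all columns are placeholders.

-- 1. Train: fit a linear regression model that predicts trip duration.
CREATE OR REPLACE MODEL `my_project.my_dataset.trip_duration_model`
OPTIONS (
  model_type = 'linear_reg',            -- other values include 'logistic_reg' and 'kmeans'
  input_label_cols = ['tripduration']   -- the column the model learns to predict
) AS
SELECT
  start_station_name,
  usertype,
  tripduration
FROM `my_project.my_dataset.bike_trips_training`;

-- 2. Evaluate: compute regression metrics on rows held out from training.
SELECT *
FROM ML.EVALUATE(
  MODEL `my_project.my_dataset.trip_duration_model`,
  (
    SELECT start_station_name, usertype, tripduration
    FROM `my_project.my_dataset.bike_trips_evaluation`
  )
);

-- 3. Use: generate predictions for new rows (the label column is not needed).
SELECT *
FROM ML.PREDICT(
  MODEL `my_project.my_dataset.trip_duration_model`,
  (
    SELECT start_station_name, usertype
    FROM `my_project.my_dataset.bike_trips_new`
  )
);
```

For a linear regression model, `ML.EVALUATE` returns metrics such as `mean_absolute_error` and `r2_score`, and `ML.PREDICT` returns the input columns plus a `predicted_tripduration` column. Keeping the training, evaluation, and prediction rows in separate tables is one simple way to avoid evaluating the model on data it has already seen.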

Language: English
Release date: Jun 11, 2021
ISBN: 9781800562189

    Machine Learning with BigQuery ML - Alessandro Marrandino


    BIRMINGHAM—MUMBAI

    Machine Learning with BigQuery ML

    Copyright © 2021 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Group Product Manager: Kunal Parikh

    Publishing Product Manager: Sunith Shetty

    Senior Editor: David Sugarman

    Content Development Editor: Nathanya Dias

    Technical Editor: Manikandan Kurup

    Copy Editor: Safis Editing

    Project Coordinator: Aparna Ravikumar Nair

    Proofreader: Safis Editing

    Indexer: Rekha Nair

    Production Designer: Prashant Ghare

    First published: June 2021

    Production reference: 1120521

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-80056-030-7

    www.packt.com

    Contributors

    About the author

    Alessandro Marrandino is a Google Cloud customer engineer. He helps enterprises adopt cloud technologies as part of their digital transformation. He is actively focused on, and experienced in, data management and smart analytics solutions. He has spent his entire career on data and artificial intelligence projects for global companies in different industries.

    I want to thank the people who have been close to me and supported me, especially my wife, Federica. Thanks to her love and availability, I was able to dedicate most of my free time to writing this book while we were waiting for the most important person in our lives: Eva. Special thanks go to all my family. They have always believed in me and in my passion for technology and data. Just a final remark for my mom: the internet has had some success, and there are people working on it!

    About the reviewers

    Marijan Milovec currently works as a software developer. He is highly ambitious and interested in software development, DevOps, and software architecture. He is also the lead organizer of the Google Developer Group Zagreb, which focuses on software development, software architecture, artificial intelligence, machine learning, deep learning, data science, DevOps, Docker, Kubernetes, Google Cloud, and more.

    Sathish VJ is a software architect, technology trainer, and angel investor. He has all the open certifications on Google Cloud, including Google Cloud Machine Learning Engineer, and is also a Google Cloud Authorized Trainer. He runs a YouTube channel, called AwesomeGCP, where he teaches people how to apply Google Cloud to their projects and prepare for certifications.

    Sharmistha Chatterjee is a data science evangelist with 15+ years of professional experience in the field of machine learning (AI research and productionizing scalable solutions) and cloud applications. She has worked in both Fortune 500 companies and very early-stage startups. She is currently working as a Senior Manager of Data Sciences at Publicis Sapient, where she leads the digital transformation of clients across industry verticals. She is an active blogger, an international speaker at various tech conferences, and a 2X Google Developer Expert in Machine Learning and Google Cloud. She is also a Hackernoon Tech award winner for 2020, has been listed among the 40 Under 40 Data Scientists by AIM, and was named one of Google's 21 tech trailblazers of 2021.

    Table of Contents

    Preface

    Section 1: Introduction and Environment Setup

    Chapter 1: Introduction to Google Cloud and BigQuery

    Introducing Google Cloud Platform

    Interacting with GCP

    Discovering GCP's key differentiators

    Exploring AI and ML services on GCP

    Core platform services

    Building blocks

    Solutions

    Introducing BigQuery

    BigQuery architecture

    BigQuery's advantages over traditional data warehouses

    Interacting with BigQuery

    BigQuery data structures

    Discovering BigQuery ML

    BigQuery ML benefits

    BigQuery ML algorithms

    Understanding BigQuery pricing

    BigQuery pricing

    BigQuery ML pricing

    Free operations and free tiers

    Pricing calculator

    Summary

    Further resources

    Chapter 2: Setting Up Your GCP and BigQuery Environment

    Technical requirements

    Creating your GCP account and project

    Registering a GCP account

    Exploring Google Cloud Console

    Creating a GCP project

    Activating BigQuery

    Discovering the BigQuery web UI

    Exploring the BigQuery public datasets

    Searching for a public dataset

    Analyzing a table

    Summary

    Further reading

    Chapter 3: Introducing BigQuery Syntax

    Technical requirements

    Creating a BigQuery dataset

    Discovering BigQuery SQL

    CRUD operations

    Diving into BigQuery ML

    Summary

    Further resources

    Section 2: Deep Diving into Basic Models

    Chapter 4: Predicting Numerical Values with Linear Regression

    Technical requirements

    Introducing the business scenario

    Discovering linear regression

    Exploring and understanding the dataset

    Understanding the data

    Checking the data's quality

    Segmenting the dataset

    Training the linear regression model

    Evaluating the linear regression model

    Utilizing the linear regression model

    Drawing business conclusions

    Summary

    Further reading

    Chapter 5: Predicting Boolean Values Using Binary Logistic Regression

    Technical requirements

    Introducing the business scenario

    Discovering binary logistic regression

    Exploring and understanding the dataset

    Understanding the data

    Segmenting the dataset

    Training the binary logistic regression model

    Evaluating the binary logistic regression model

    Using the binary logistic regression model

    Drawing business conclusions

    Summary

    Further resources

    Chapter 6: Classifying Trees with Multiclass Logistic Regression

    Technical requirements

    Introducing the business scenario

    Discovering multiclass logistic regression

    Exploring and understanding the dataset

    Understanding the data

    Checking the data quality

    Segmenting the dataset

    Training the multiclass logistic regression model

    Evaluating the multiclass logistic regression model

    Using the multiclass logistic regression model

    Drawing business conclusions

    Summary

    Further resources

    Section 3: Advanced Models with BigQuery ML

    Chapter 7: Clustering Using the K-Means Algorithm

    Technical requirements

    Introducing the business scenario

    Discovering K-Means clustering

    Exploring and understanding the dataset

    Understanding the data

    Checking the data quality

    Creating the training datasets

    Training the K-Means clustering model

    Evaluating the K-Means clustering model

    Using the K-Means clustering model

    Drawing business conclusions

    Summary

    Further resources

    Chapter 8: Forecasting Using Time Series

    Technical requirements

    Introducing the business scenario

    Discovering time series forecasting

    Exploring and understanding the dataset

    Understanding the data

    Checking the data quality

    Creating the training dataset

    Training the time series forecasting model

    Evaluating the time series forecasting model

    Using the time series forecasting model

    Presenting the forecast

    Summary

    Further resources

    Chapter 9: Suggesting the Right Product by Using Matrix Factorization

    Technical requirements

    Introducing the business scenario

    Discovering matrix factorization

    Configuring BigQuery Flex Slots

    Exploring and preparing the dataset

    Understanding the data

    Creating the training dataset

    Training the matrix factorization model

    Evaluating the matrix factorization model

    Using the matrix factorization model

    Drawing business conclusions

    Summary

    Further resources

    Chapter 10: Predicting Boolean Values Using XGBoost

    Technical requirements

    Introducing the business scenario

    Discovering the XGBoost Boosted Tree classification model

    Exploring and understanding the dataset

    Checking the data quality

    Segmenting the dataset

    Training the XGBoost classification model

    Evaluating the XGBoost classification model

    Using the XGBoost classification model

    Drawing business conclusions

    Summary

    Further resources

    Chapter 11: Implementing Deep Neural Networks

    Technical requirements

    Introducing the business scenario

    Discovering DNNs

    DNNs in BigQuery ML

    Preparing the dataset

    Training the DNN models

    Evaluating the DNN models

    Using the DNN models

    Drawing business conclusions

    Deep neural networks versus linear models

    Summary

    Further resources

    Section 4: Further Extending Your ML Capabilities with GCP

    Chapter 12: Using BigQuery ML with AI Notebooks

    Technical requirements

    Discovering AI Platform Notebooks

    AI Platform Notebooks pricing

    Configuring the first notebook

    Implementing BigQuery ML models within notebooks

    Compiling the AI notebook

    Running the code in the AI notebook

    Summary

    Further resources

    Chapter 13: Running TensorFlow Models with BigQuery ML

    Technical requirements

    Introducing TensorFlow

    Discovering the relationship between BigQuery ML and TensorFlow

    Understanding commonalities and differences

    Collaborating with BigQuery ML and TensorFlow

    Converting BigQuery ML models into TensorFlow

    Training the BigQuery ML to export it

    Exporting the BigQuery ML model

    Running TensorFlow models with BigQuery ML

    Summary

    Further resources

    Chapter 14: BigQuery ML Tips and Best Practices

    Choosing the right BigQuery ML algorithm

    Preparing the datasets

    Working with high-quality data

    Segmenting the datasets

    Understanding feature engineering

    Tuning hyperparameters

    Using BigQuery ML for online predictions

    Summary

    Further resources

    Other Books You May Enjoy

    Preface

    Machine Learning (ML) democratization is one of the fastest-growing trends in the AI industry. In this field, BigQuery ML represents a fundamental tool for bridging the gap between data analysis and the implementation of innovative ML models. Through this book, you will learn how to use BigQuery and BigQuery ML with an incremental approach that combines technical explanations with hands-on exercises. After a brief introduction, you will immediately be able to build ML models on concrete use cases using BigQuery ML. By the end of this book, you will be able to choose the right ML algorithm to train, evaluate, and use advanced ML models.

    Who this book is for

    This book is for data scientists, data analysts, data engineers, and anyone looking to get started with Google's BigQuery ML. You'll also find this book useful if you want to accelerate the development of ML models or if you are a business user who wants to apply ML in an easy way using SQL. A basic knowledge of BigQuery and SQL is required.

    What this book covers

    Chapter 1, Introduction to Google Cloud and BigQuery, provides an overview of the Google Cloud Platform and of the BigQuery analytics database.

    Chapter 2, Setting Up Your GCP and BigQuery Environment, explains the configuration of your first Google Cloud account, project, and BigQuery environment.

    Chapter 3, Introducing BigQuery Syntax, covers the main SQL operations for working on BigQuery.

    Chapter 4, Predicting Numerical Values with Linear Regression, explains the development of a linear regression ML model to predict the trip durations of a bike rental service.

    Chapter 5, Predicting Boolean Values Using Binary Logistic Regression, explains the implementation of a binary logistic regression ML model to predict the behavior of a taxi company's customers.

    Chapter 6, Classifying Trees with Multiclass Logistic Regression, explains the development of a multiclass logistic regression ML model to automatically classify species of trees according to their natural characteristics.

    Chapter 7, Clustering Using the K-Means Algorithm, covers the implementation of a clustering system to identify the best-performing drivers in a taxi company.

    Chapter 8, Forecasting Using Time Series, outlines the design and implementation of a forecasting tool to predict and present the sales of specific products.

    Chapter 9, Suggesting the Right Product by Using Matrix Factorization, explains how to build a recommendation engine, using the matrix factorization algorithm, that suggests the best product to each customer.

    Chapter 10, Predicting Boolean Values Using XGBoost, covers the implementation of a boosted tree ML model to predict the behavior of a taxi company's customers.

    Chapter 11, Implementing Deep Neural Networks, covers the design and implementation of a Deep Neural Network (DNN) to predict the trip durations of a bike rental service.

    Chapter 12, Using BigQuery ML with AI Notebooks, explains how AI Platform Notebooks can be integrated with BigQuery ML to develop and share ML models.

    Chapter 13, Running TensorFlow Models with BigQuery ML, explains how BigQuery ML and TensorFlow can work together.

    Chapter 14, BigQuery ML Tips and Best Practices, covers ML best practices and tips that can be applied during the development of a BigQuery ML model.

    To get the most out of this book

    You will need to have a basic knowledge of SQL syntax and some experience of using databases.

    A knowledge of the fundamentals of ML is not mandatory but is advised.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you to avoid any potential errors related to the copying and pasting of code.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Machine-Learning-with-BigQuery-ML. If the code is updated, the changes will be made in the existing GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Code in Action

    Code in Action videos for this book can be viewed at https://bit.ly/3f11XbU.

    Download the color images

    We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800560307_ColorImages.pdf.

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: Sort the results of a query according to a specific list of fields with the ORDER BY clause.

    A block of code is set as follows:

    UPDATE
        `bigqueryml-packt.03_bigquery_syntax.first_table`
    SET
        description = 'This is my updated description'
    WHERE
        id_key = 1;

    Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: BigQuery supports two different SQL dialects: standard SQL and legacy SQL.

    Tips or important notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Reviews

    Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

    For more information about Packt, please visit packt.com.

    Section 1: Introduction and Environment Setup

    This section provides an introduction to machine learning and an overview of the technical tools that will be used in the next sections of the book: Google Cloud Platform, BigQuery, and BigQuery ML, as well as the SQL syntax related to it.

    This section comprises the following chapters:

    Chapter 1, Introduction to Google Cloud and BigQuery

    Chapter 2, Setting Up Your GCP and BigQuery Environment

    Chapter 3, Introducing BigQuery Syntax

    Chapter 1: Introduction to Google Cloud and BigQuery

    The adoption of the public cloud enables companies and users to access innovative and cost-effective technologies. This is particularly valuable in the big data and Artificial Intelligence (AI) areas, where new solutions offer possibilities that seemed impossible to achieve with on-premises systems only a few years ago. To be effective in a company's day-to-day business, the new AI capabilities need to be shared across different roles rather than concentrated among technical specialists. Most cloud providers are currently addressing the challenge of democratizing AI across departments and employees with different skills.

    In this context, Google Cloud provides several services to accelerate the processing of large amounts of data and build Machine Learning (ML) applications that can make better decisions.

    In this chapter, we'll gradually introduce the main concepts that will be useful in the upcoming hands-on activities. Using an incremental approach, we'll go through the following topics:

    Introducing Google Cloud Platform

    Exploring AI and ML services on GCP

    Introducing BigQuery

    Discovering BigQuery ML

    Understanding BigQuery pricing

    Introducing Google Cloud Platform

    Since the launch of Google Search in 1998, Google has developed one of the largest and most powerful IT infrastructures in the world. Today, billions of users rely on this infrastructure for services such as Gmail, YouTube, Google Photos, and Maps. Ten years later, in 2008, Google decided to open its network and IT infrastructure to business customers, turning an infrastructure initially developed for consumer applications into a public service and launching Google Cloud Platform (GCP).

    The 90+ services that Google currently provides to large enterprises and small- and medium-sized businesses cover the following categories:

    Compute: Used to support workloads or applications with virtual machines on Google Compute Engine, containers on Google Kubernetes Engine, or managed platforms such as App Engine.

    Storage and databases: Used to store datasets and objects in an easy and convenient way. Some examples are Google Cloud Storage, Cloud SQL, and Spanner.

    Networking: Used to easily connect different locations and data centers across the globe with Virtual Private Clouds (VPCs), firewalls, and fully managed global routers.

    Big data: Used to store and process large amounts of information in structured, semi-structured, or unstructured formats. Among these services are Google Cloud Dataproc, GCP's managed Hadoop service, and BigQuery, which is the main focus of this book.

    AI and machine learning: This product area provides various tools for different kinds of users, enabling them to leverage AI and ML in their everyday business. Some examples are TensorFlow, AutoML, Vision APIs, and BigQuery ML, the main focus of this book.

    Identity, security, and management tools: This area includes all the services that are necessary to prevent unauthorized access, ensure security, and monitor the rest of the cloud infrastructure. Identity and Access Management, Key Management Service, Cloud Logging, and Cloud Audit Logs are just some of these tools.

    Internet of Things (IoT): Used to connect plants, vehicles, or any other objects to the GCP infrastructure, enabling the development of modern IoT use cases. The core component of this area is Google Cloud IoT Core.

    API management: Tools to expose services to customers and partners through REST APIs, making it possible to fully leverage the benefits of interconnectivity. In this pillar, Apigee is one of the best-known products and is recognized as a leader in this market segment.

    Productivity: Used to improve productivity and collaboration for all companies that want to start working with Google and embrace its way of doing business through the powerful tools of Google Workspace (previously G Suite).

    Interacting with GCP

    All the services just mentioned can be accessed through four different interfaces:

    Google Cloud Console: The web-based user interface of GCP, easily accessible from compatible web browsers such as Google Chrome, Edge, or Firefox. For the hands-on exercises in this book, we'll mainly use Google Cloud Console:

    Figure 1.1 – Screenshot of Google Cloud Console

