Data Analytics & Visualization
ALL-IN-ONE
Copyright © 2024 by John Wiley & Sons, Inc., Hoboken, New Jersey
Media and software compilation copyright © 2024 by John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by
any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under
Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher.
Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons,
Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/
go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related
trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without
written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not
associated with any product or vendor mentioned in this book.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHORS MAKE NO REPRESENTATIONS
OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK
AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS
FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL
MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION.
THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL,
ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES
OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHORS
SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS
REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES
NOT MEAN THAT THE AUTHORS OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR
WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT
INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK
WAS WRITTEN AND WHEN IT IS READ.
For general information on our other products and services, please contact our Customer Care Department within
the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit
https://hub.wiley.com/community/support/dummies.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with
standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to
media such as a CD or DVD that is not included in the version you purchased, you may download this material at
http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Table of Contents
INTRODUCTION
About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go from Here
Table of Contents v
What’s All the Fuss about Data?
Welcome to the zettabyte era
From data to insight
Identifying Important Data Sources
Role of Big Data in Data Science and Engineering
Defining data science
Defining machine learning engineering
Defining data engineering
Connecting Big Data with Business Intelligence
Analyzing Data with Enterprise Business Intelligence Practices
Exporting reports
Perfecting reports for distribution
Diving into Dashboards
Configuring dashboards
Creating a new dashboard
Enriching your dashboard with content
Pinning reports
Structuring for Data Visualization
Binning and histograms
Distributions and outliers
Pivoting with data: Tall versus wide
Normalizing Data
Natural join
Condition join
Column-name join
Inner join
Outer join
ON versus WHERE
Join Conditions and Clustering Indexes
INDEX
Data analytics and visualization allow anyone to turn raw data into meaningful
stories and insights. You, as the analyst, act as the detective. Instead of
solving a mystery with clues, you work with datasets that, when presented clearly,
can answer complex questions through trend and pattern analysis.
If you review a dataset long enough, you'll inevitably have an aha moment in your
quest for interpretation, but if you can present the dataset visually, you can
accelerate your understanding like a race car going from 0 to 100 miles per hour in
seconds.
Data analytics and visualization help you uncover creative ways to showcase data
in a manner that is both informative and engaging. Data often starts out as
nothing more than a jumble of numbers; turning those numbers into a story
that can influence decisions and drive change is incredibly powerful. Global
enterprises rely on people with the skills this book teaches to determine
business strategies, make corporate decisions, and influence change. If you are
ready to learn these skills, you are in for a treat.
Introduction 1
Book 1 covers the foundational aspects of the data analytics and visualization
lifecycle that every user must understand to become proficient in analytics and
visualization. Books 2 and 3 focus on the two leading tools in the enterprise
business intelligence market for performing complex data analytics and
visualization tasks: Microsoft Power BI and Tableau. Books 4 through 6 cover the
key programming languages that both proprietary and open-source data analytics
and visualization platforms use to extract, assess, and visualize data at scale
when commercial off-the-shelf enterprise business intelligence platforms are
unavailable.
»» Bold text means that you’re meant to type the text just as it appears in the
book. The exception is when you’re working through a steps list: Because each
step is bold, the text to type is not bold.
»» For command sequences in software, this book uses the command arrow.
Here's an example that uses Microsoft Word: Choose Layout➪Margins➪Narrow
to decrease the default margin setting.
Foolish Assumptions
To get the most out of this book, you need the following:
»» Access to the Internet: This may sound obvious, but even if you use a
desktop application, you need an Internet connection to access datasets
hosted online.
Best Practice icons highlight points of common knowledge among seasoned
professionals in the data industry. If you don’t want to look like a complete
newbie, follow the well-worn advice described in these paragraphs.
Tips point out shortcuts or essential suggestions that help you do things
faster and more efficiently.
Remember icons are like road signs that suggest a potentially better route.
Consider these small suggestions; they're quite helpful.
The Technical Stuff icon marks information of a highly technical nature that you
can normally skip over. When appropriate, these paragraphs also suggest
specialized resources you may find helpful down the road.
The Warning icon makes you aware of a common issue or product challenge many
users face. Don’t fret, but do take note when you see this icon.
If you want to learn the essential data analytics and visualization concepts,
including the lingo of the land, head to Book 1.
SQL, a query language, is the underpinning of data analytics and visualization.
For a crash course on SQL, which is essential for working with any proprietary or
open-source data analytics and visualization platform, head to Book 4.
Book 1: Learning Data Analytics & Visualizations Foundations
Contents at a Glance
CHAPTER 1: Exploring Definitions and Roles
What Is Data, Really?
Discovering Business Intelligence
Understanding Data Analytics
Exploring Data Management
Diving into Data Analysis
Visualizing Data