Undergraduate Topics in Computer Science: Series Editor

The document outlines the 'Undergraduate Topics in Computer Science' series, which provides high-quality instructional content for undergraduate students in computing and information science. It introduces the second edition of 'Introduction to Data Science,' focusing on concepts, techniques, and applications in data science, with practical examples using Python. The book is intended for upper-tier undergraduate and beginning graduate students, as well as professionals in continuous education, and includes supplemental resources such as IPython Notebooks.

Undergraduate Topics in Computer Science

Series Editor
Ian Mackie, University of Sussex, Brighton, UK

Advisory Editors
Samson Abramsky, Department of Computer Science, University of Oxford, Oxford, UK
Chris Hankin, Department of Computing, Imperial College London, London, UK
Mike Hinchey, Lero—The Irish Software Research Centre, University of Limerick, Limerick, Ireland
Dexter C. Kozen, Department of Computer Science, Cornell University, Ithaca, NY, USA
Hanne Riis Nielson, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
Steven S. Skiena, Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
Iain Stewart, Department of Computer Science, Durham University, Durham, UK
Joseph Migga Kizza, Engineering and Computer Science, University of Tennessee at Chattanooga, Chattanooga, TN, USA
Roy Crole, School of Computing and Mathematics Sciences, University of Leicester, Leicester, UK
Elizabeth Scott, Department of Computer Science, Royal Holloway University of London, Egham, UK

‘Undergraduate Topics in Computer Science’ (UTiCS) delivers high-quality
instructional content for undergraduates studying in all areas of computing and
information science. From core foundational and theoretical material to final-year
topics and applications, UTiCS books take a fresh, concise, and modern approach
and are ideal for self-study or for a one- or two-semester course. The texts
are authored by established experts in their fields, reviewed by an international
advisory board, and contain numerous examples and problems, many of which
include fully worked solutions.
The UTiCS concept centers on high-quality, ideally and generally quite concise
books in softback format. For advanced undergraduate textbooks that are likely
to be longer and more expository, Springer continues to offer the highly regarded
Texts in Computer Science series, to which we refer potential authors.
Laura Igual · Santi Seguí

Introduction to Data Science
A Python Approach to Concepts, Techniques and Applications
Second Edition

Laura Igual, Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain
Santi Seguí, Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain

With Contribution by
Jordi Vitrià, Universitat de Barcelona, Barcelona, Spain
Eloi Puertas, Universitat de Barcelona, Barcelona, Spain
Petia Radeva, Universitat de Barcelona, Barcelona, Spain
Oriol Pujol, Universitat de Barcelona, Barcelona, Spain
Sergio Escalera, Universitat de Barcelona, Barcelona, Spain
Francesc Dantí, Universitat de Barcelona, Barcelona, Spain

ISSN 1863-7310 ISSN 2197-1781 (electronic)

Undergraduate Topics in Computer Science
ISBN 978-3-031-48955-6 ISBN 978-3-031-48956-3 (eBook)
https://doi.org/10.1007/978-3-031-48956-3

1st edition: © Springer International Publishing Switzerland 2017

2nd edition: © Springer Nature Switzerland AG 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.

In memory of our colleague and friend,
Dr. Lluís Garrido.
Preface

Subject Area of the Book

In this era, in which huge amounts of information from different fields are
gathered and stored, the analysis of that information and the extraction of
value from it have become one of the most attractive tasks for companies and
society in general. The design of solutions for the new questions emerging from
data has required multidisciplinary teams. Computer scientists, statisticians,
mathematicians, biologists, journalists, and sociologists, as well as many
others, are now working together in order to provide knowledge from data. This
new interdisciplinary field is called data science. The pipeline of any data
science project goes through asking the right questions, gathering data,
cleaning data, generating hypotheses, making inferences, visualizing data,
assessing solutions, etc.

Organization and Features of the Book

This book is an introduction to concepts, techniques, and applications in data
science. It focuses on the analysis of data, covering concepts from statistics
to machine learning and deep learning; techniques for graph analysis and
natural language processing; and applications such as recommender systems and
sentiment analysis. The book ends with a review of the ethical issues that must
be addressed to achieve responsible data science.
All chapters introduce new concepts that are illustrated by practical cases
using real data (except the last chapter, on responsible data science). Public
databases such as Eurostat, different social networks, and MovieLens are used.
Specific questions about the data are posed in each chapter. The solutions to
these questions are implemented in the Python programming language and
presented in properly commented code boxes. This allows the reader to learn
data science by solving problems whose solutions can generalize to other
problems.
This book is not intended to cover the whole set of data science methods, nor
to provide a complete collection of references. Data science is currently a
growing and evolving field, so readers are encouraged to look for specific
methods and references by searching the web with keywords.


Target Audiences

This book is addressed to upper-tier undergraduate and beginning graduate
students from technical disciplines. It is also addressed to professional
audiences following continuing education short courses and to researchers from
diverse areas following self-study courses.
Basic skills in computer science, mathematics, and statistics are required.
Programming experience in Python is beneficial. However, even if the reader is
new to Python, this should not be a problem, since the Python basics can be
acquired in a short period of time.

Previous Uses of the Materials

Parts of the presented materials have been used in the postgraduate course on
Data Science and Big Data at the Universitat de Barcelona. All contributing
authors are involved in this course.

Suggested Uses of the Book

This book can be used in any introductory data science course. The
problem-based approach adopted to introduce new concepts can be useful for
beginners. The implemented code solutions for different problems are a good set
of exercises for students. Moreover, this code can serve as a baseline when
students face bigger projects.

Supplemental Resources

This book is accompanied by a set of IPython Notebooks containing all the code
necessary to solve the practical cases of the book. The Notebooks can be found
in the following GitHub repository:
https://github.com/DataScienceUB/introduction-datascience-python-book.

Acknowledgments

We acknowledge all the contributing authors: J. Vitrià, E. Puertas, P. Radeva,
O. Pujol, S. Escalera and F. Dantí.

Barcelona, Spain
Laura Igual
Santi Seguí

Contents

1 Introduction to Data Science
1.1 What Is Data Science?
1.2 About This Book
2 Data Science Tools
2.1 Introduction
2.2 Why Python?
2.3 Fundamental Python Libraries for Data Scientists
2.4 Datascience Ecosystem Installation
2.5 Integrated Development Environments (IDE)
2.6 Useful Resources for Data Scientists
2.7 Get Started with Python and Pandas
2.8 Conclusions
3 Descriptive Statistics
3.1 Introduction
3.2 Data Preparation
3.2.1 The Adult Example
3.3 Exploratory Data Analysis
3.3.1 Summarizing the Data
3.3.2 Data Distributions
3.3.3 Outlier Treatment
3.3.4 Measuring Asymmetry: Skewness and Pearson’s Median Skewness Coefficient
3.3.5 Continuous Distribution
3.3.6 Kernel Density
3.4 Estimation
3.4.1 Sample and Estimated Mean, Variance and Standard Scores
3.4.2 Covariance, and Pearson’s and Spearman’s Rank Correlation
3.5 Conclusions
References
4 Statistical Inference
4.1 Introduction
4.2 Statistical Inference: The Frequentist Approach
4.3 Measuring the Variability in Estimates
4.3.1 Point Estimates
4.3.2 Confidence Intervals
4.4 Hypothesis Testing
4.4.1 Testing Hypotheses Using Confidence Intervals
4.4.2 Testing Hypotheses Using p-Values
4.5 But, Is the Effect E Real?
4.6 Conclusions
References
5 Supervised Learning
5.1 Introduction
5.2 The Problem
5.3 First Steps
5.4 What Is Learning?
5.5 Learning Curves
5.6 Training, Validation and Test
5.7 Two Learning Models
5.7.1 Generalities Concerning Learning Models
5.7.2 Support Vector Machines
5.7.3 Random Forest
5.8 Ending the Learning Process
5.9 A Toy Business Case
5.10 Conclusion
References
6 Regression Analysis
6.1 Introduction
6.2 Linear Regression
6.2.1 Simple Linear Regression Model
6.2.2 Model Evaluation
6.2.3 Practical Case 1: Sea Ice Data and Climate Change
6.2.4 Polynomial Regression Model
6.2.5 Regularization and Sparse Models
6.2.6 Practical Case 2: Boston Housing Data and Price Prediction
6.3 Logistic Regression
6.3.1 Practical Case 3: Winning or Losing Football Team
6.4 Conclusions
References
7 Unsupervised Learning
7.1 Introduction
7.2 Clustering
7.2.1 Similarity and Distances
7.2.2 What Is a Good Clustering? Defining Metrics to Measure Clustering Quality
7.2.3 Taxonomies of Clustering Techniques
7.3 Case Study
7.4 Conclusions
References
8 Network Analysis
8.1 Introduction
8.2 Basic Definitions in Graphs
8.3 Social Network Analysis
8.3.1 NetworkX
8.3.2 Practical Case: Facebook Dataset
8.4 Centrality
8.4.1 Drawing Centrality in Graphs
8.4.2 PageRank
8.5 Ego-Networks
8.6 Community Detection
8.7 Conclusions
References
9 Recommender Systems
9.1 Introduction
9.2 How Do Recommender Systems Work?
9.2.1 Content-Based Filtering
9.2.2 Collaborative Filtering
9.2.3 Hybrid Recommenders
9.3 Modelling User Preferences
9.4 Evaluating Recommenders
9.5 Practical Case
9.5.1 MovieLens Dataset
9.5.2 The Naïve Recommender
9.5.3 User-Based Collaborative Filtering
9.6 Conclusions
References
10 Basics of Natural Language Processing
10.1 Introduction
10.2 Data Cleaning
10.3 Text Representation
10.3.1 Bi-Grams and n-Grams
10.4 Practical Cases
10.5 Conclusions
References
11 Deep Learning
11.1 Introduction
11.2 Perceptron
11.3 Multilayer Perceptron
11.3.1 Training Neural Networks
11.4 Playing with Neural Networks
11.5 Deep Learning with Keras
11.5.1 Running Environment
11.5.2 Our First Neural Network Model
11.6 Practical Case: Building an Image Classifier
11.7 Convolutional Neural Networks
11.7.1 Building Blocks of a CNN
11.8 Practical Case: Improving Our Image Classifier
11.8.1 Avoiding Overfitting
11.9 Conclusions
References
12 Responsible Data Science
12.1 Data Science and Social Responsibility
12.2 What Does Data Have to Do with Ethics?
12.3 Responsible Data Science
12.3.1 Transparency
12.3.2 Fairness
12.3.3 Robustness and Reliability
12.4 Conclusions
References

Index

Authors and Contributors

About the Authors

Dr. Laura Igual is an Associate Professor at the Department of Mathematics and
Computer Science at the Universitat de Barcelona. She received her degree in
Mathematics from the Universitat de Valencia (Spain) in 2000 and her Ph.D.
degree from the Universitat Pompeu Fabra (Spain) in 2006. Her areas of interest
include computer vision, medical imaging, machine learning, and data science.

Dr. Santi Seguí is an Associate Professor at the Department of Mathematics and
Computer Science at the Universitat de Barcelona. He has held a Computer
Science Engineering degree from the Universitat Autònoma de Barcelona (Spain)
since 2007. He received his Ph.D. degree from the Universitat de Barcelona
(Spain) in 2011. His areas of interest include computer vision, applied machine
learning, and data science.

Contributors

Dr. Jordi Vitrià is a Full Professor at the Department of Mathematics and
Computer Science at the Universitat de Barcelona. He received his Ph.D. degree
from the Universitat Autònoma de Barcelona in 1990. He has published more than
100 papers in SCI-indexed journals and has more than 30 years of experience
working on Computer Vision, Machine Learning, Causal Inference, and Artificial
Intelligence and their applications to several fields. He is now the leader of
the “Data Science Group at Universitat de Barcelona”, a multidisciplinary
technology transfer unit that conveys results from scientific and technological
research to the market and society in general.

Dr. Eloi Puertas is an Assistant Professor in the Department of Mathematics and
Computer Science at the Universitat de Barcelona. He has held a Computer
Science Engineering degree from the Universitat Autònoma de Barcelona (Spain)
since 2002. He received his Ph.D. degree from the Universitat de Barcelona
(Spain) in 2014. His areas of interest include artificial intelligence,
software engineering, and data science.

Dr. Petia Radeva is a Full Professor at the Universitat de Barcelona. She
graduated in Applied Mathematics and Computer Science in 1989 at the University
of Sofia, Bulgaria, and received her Ph.D. degree in Computer Vision for
Medical Imaging in 1998 from the Universitat Autònoma de Barcelona, Spain. She
has been an ICREA Academia Researcher since 2015 and is head of the
Consolidated Research Group “Artificial Intelligence and Biomedical
Applications”. Her present research interests are in the development of
learning-based approaches for computer vision, deep learning, data-centric data
analysis, food data analysis, egocentric vision, and data science.

Dr. Oriol Pujol is a Full Professor at the Department of Mathematics and
Computer Science at the Universitat de Barcelona. He received his Ph.D. degree
from the Universitat Autònoma de Barcelona (Spain) in 2004 for his work in
machine learning and computer vision. His areas of interest include machine
learning, computer vision, and data science.

Dr. Sergio Escalera is a Full Professor at the Department of Mathematics and
Computer Science at the Universitat de Barcelona. He has held a Computer
Science Engineering degree from the Universitat Autònoma de Barcelona (Spain)
since 2003. He received his Ph.D. degree from the Universitat Autònoma de
Barcelona (Spain) in 2008. His research interests include, among others,
statistical pattern recognition and visual object recognition, with special
interest in behavior analysis from multi-modal data.

Francesc Dantí is an adjunct professor and system administrator at the
Department of Mathematics and Computer Science at the Universitat de Barcelona.
He holds a computer science engineering degree from the Universitat Oberta de
Catalunya (Spain). His particular areas of interest are HPC and grid computing,
parallel computing, and cybersecurity. Francesc Dantí is coauthor of Chap. 2.

9 pages
Data Science
No ratings yet
Data Science
8 pages
DSC Unit 1
No ratings yet
DSC Unit 1
59 pages
Intelligent Techniques For Data Science
100% (12)
Intelligent Techniques For Data Science
282 pages
Chapter-14 Data Science
No ratings yet
Chapter-14 Data Science
12 pages
UNIT I - Introduction - DataScience - New
No ratings yet
UNIT I - Introduction - DataScience - New
34 pages
Summer Training 2020: Advanced Data Science With IBM & Bionic Robotic Arm
No ratings yet
Summer Training 2020: Advanced Data Science With IBM & Bionic Robotic Arm
10 pages
Unit 1
No ratings yet
Unit 1
21 pages
Introduction To Data ScienceA Python Approach To Concepts, Techniques and Applications PDF
100% (10)
Introduction To Data ScienceA Python Approach To Concepts, Techniques and Applications PDF
227 pages
A Review On Data Science Technologies
No ratings yet
A Review On Data Science Technologies
3 pages
Data Science 2
No ratings yet
Data Science 2
3 pages