Undergraduate Topics in Computer Science: Series Editor

The document outlines the 'Undergraduate Topics in Computer Science' series, which provides high-quality instructional content for undergraduate students in computing and information science. It introduces the second edition of 'Introduction to Data Science,' focusing on concepts, techniques, and applications in data science, with practical examples using Python. The book is intended for upper-tier undergraduate and beginning graduate students, as well as professionals in continuous education, and includes supplemental resources such as IPython Notebooks.


Undergraduate Topics in Computer Science

Series Editor
Ian Mackie, University of Sussex, Brighton, UK

Advisory Editors
Samson Abramsky, Department of Computer Science, University of Oxford,
Oxford, UK
Chris Hankin, Department of Computing, Imperial College London, London, UK
Mike Hinchey, Lero—The Irish Software Research Centre, University of Limerick,
Limerick, Ireland
Dexter C. Kozen, Department of Computer Science, Cornell University, Ithaca,
NY, USA
Hanne Riis Nielson, Department of Applied Mathematics and Computer Science,
Technical University of Denmark, Kongens Lyngby, Denmark
Steven S. Skiena, Department of Computer Science, Stony Brook University, Stony
Brook, NY, USA
Iain Stewart, Department of Computer Science, Durham University, Durham, UK
Joseph Migga Kizza, Engineering and Computer Science, University of Tennessee at
Chattanooga, Chattanooga, TN, USA
Roy Crole, School of Computing and Mathematics Sciences, University of Leicester,
Leicester, UK
Elizabeth Scott, Department of Computer Science, Royal Holloway University of
London, Egham, UK
‘Undergraduate Topics in Computer Science’ (UTiCS) delivers high-quality
instructional content for undergraduates studying in all areas of computing and
information science. From core foundational and theoretical material to final-year
topics and applications, UTiCS books take a fresh, concise, and modern approach
and are ideal for self-study or for a one- or two-semester course. The texts
are authored by established experts in their fields, reviewed by an international
advisory board, and contain numerous examples and problems, many of which
include fully worked solutions.
The UTiCS concept centers on high-quality, ideally and generally quite concise
books in softback format. For advanced undergraduate textbooks that are likely
to be longer and more expository, Springer continues to offer the highly regarded
Texts in Computer Science series, to which we refer potential authors.
Laura Igual · Santi Seguí

Introduction to Data
Science
A Python Approach to Concepts,
Techniques and Applications
Second Edition
Laura Igual
Departament de Matemàtiques i Informàtica
Universitat de Barcelona
Barcelona, Spain

Santi Seguí
Departament de Matemàtiques i Informàtica
Universitat de Barcelona
Barcelona, Spain

With Contribution by

Jordi Vitrià
Universitat de Barcelona
Barcelona, Spain

Eloi Puertas
Universitat de Barcelona
Barcelona, Spain

Petia Radeva
Universitat de Barcelona
Barcelona, Spain

Oriol Pujol
Universitat de Barcelona
Barcelona, Spain

Sergio Escalera
Universitat de Barcelona
Barcelona, Spain

Francesc Dantí
Universitat de Barcelona
Barcelona, Spain

ISSN 1863-7310 ISSN 2197-1781 (electronic)
Undergraduate Topics in Computer Science
ISBN 978-3-031-48955-6 ISBN 978-3-031-48956-3 (eBook)
https://doi.org/10.1007/978-3-031-48956-3

1st edition: © Springer International Publishing Switzerland 2017
2nd edition: © Springer Nature Switzerland AG 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.


In memory of our colleague and friend,
Dr. Lluís Garrido.
Preface

Subject Area of the Book

In this era, where a huge amount of information from different fields is gathered
and stored, its analysis and the extraction of value have become one of the
most attractive tasks for companies and society in general. The design of solutions
for the new questions emerging from data has required multidisciplinary teams.
Computer scientists, statisticians, mathematicians, biologists, journalists, and
sociologists, as well as many others, are now working together in order to provide
knowledge from data. This new interdisciplinary field is called data science. The
pipeline of any data science project goes through asking the right questions,
gathering data, cleaning data, generating hypotheses, making inferences,
visualizing data, assessing solutions, etc.

Organization and Features of the Book

This book is an introduction to concepts, techniques, and applications in data
science. It focuses on the analysis of data, covering concepts from statistics
to machine learning and deep learning; techniques for graph analysis and natural
language processing; and applications such as recommender systems or sentiment
analysis. The book ends with a review of the ethical issues that must be addressed
to achieve responsible data science.
All chapters introduce new concepts that are illustrated by practical cases using
real data (except the last chapter on responsible data science). Public databases
such as Eurostat, different social networks, and MovieLens are used. Specific
questions about the data are posed in each chapter. The solutions to these questions
are implemented using the Python programming language and presented in properly
commented code boxes. This allows the reader to learn data science by solving
problems whose solutions can generalize to other problems.
This book is not intended to cover the whole set of data science methods, nor
to provide a complete collection of references. Currently, data science is
a growing and evolving field, so readers are encouraged to look for specific
methods and references using keywords on the web.


Target Audiences

This book is addressed to upper-tier undergraduate and beginning graduate
students from technical disciplines. Moreover, this book is also addressed to
professional audiences following continuous education short courses and to
researchers from diverse areas following self-study courses.
Basic skills in computer science, mathematics, and statistics are required.
Programming experience in Python is beneficial. However, even if the reader is new
to Python, this should not be a problem, since acquiring the Python basics is
manageable in a short period of time.

Previous Uses of the Materials

Parts of the presented materials have been used in the postgraduate course on Data
Science and Big Data at the Universitat de Barcelona. All contributing authors are
involved in this course.

Suggested Uses of the Book

This book can be used in any introductory data science course. The problem-based
approach adopted to introduce new concepts can be useful for beginners. The
implemented code solutions for different problems are a good set of exercises for
the students. Moreover, this code can serve as a baseline when students face
bigger projects.

Supplemental Resources

This book is accompanied by a set of IPython Notebooks containing all the code
necessary to solve the practical cases of the book. The Notebooks can be found in
the following GitHub repository:
https://github.com/DataScienceUB/introduction-datascience-python-book.

Acknowledgments

We acknowledge all the contributing authors: J. Vitrià, E. Puertas, P. Radeva,
O. Pujol, S. Escalera and F. Dantí.

Barcelona, Spain
Laura Igual
Santi Seguí
Contents

1 Introduction to Data Science
1.1 What Is Data Science?
1.2 About This Book

2 Data Science Tools
2.1 Introduction
2.2 Why Python?
2.3 Fundamental Python Libraries for Data Scientists
2.4 Datascience Ecosystem Installation
2.5 Integrated Development Environments (IDE)
2.6 Useful Resources for Data Scientists
2.7 Get Started with Python and Pandas
2.8 Conclusions

3 Descriptive Statistics
3.1 Introduction
3.2 Data Preparation
3.2.1 The Adult Example
3.3 Exploratory Data Analysis
3.3.1 Summarizing the Data
3.3.2 Data Distributions
3.3.3 Outlier Treatment
3.3.4 Measuring Asymmetry: Skewness and Pearson’s Median Skewness Coefficient
3.3.5 Continuous Distribution
3.3.6 Kernel Density
3.4 Estimation
3.4.1 Sample and Estimated Mean, Variance and Standard Scores
3.4.2 Covariance, and Pearson’s and Spearman’s Rank Correlation
3.5 Conclusions
References

4 Statistical Inference
4.1 Introduction
4.2 Statistical Inference: The Frequentist Approach
4.3 Measuring the Variability in Estimates
4.3.1 Point Estimates
4.3.2 Confidence Intervals
4.4 Hypothesis Testing
4.4.1 Testing Hypotheses Using Confidence Intervals
4.4.2 Testing Hypotheses Using p-Values
4.5 But, Is the Effect E Real?
4.6 Conclusions
References

5 Supervised Learning
5.1 Introduction
5.2 The Problem
5.3 First Steps
5.4 What Is Learning?
5.5 Learning Curves
5.6 Training, Validation and Test
5.7 Two Learning Models
5.7.1 Generalities Concerning Learning Models
5.7.2 Support Vector Machines
5.7.3 Random Forest
5.8 Ending the Learning Process
5.9 A Toy Business Case
5.10 Conclusion
References

6 Regression Analysis
6.1 Introduction
6.2 Linear Regression
6.2.1 Simple Linear Regression Model
6.2.2 Model Evaluation
6.2.3 Practical Case 1: Sea Ice Data and Climate Change
6.2.4 Polynomial Regression Model
6.2.5 Regularization and Sparse Models
6.2.6 Practical Case 2: Boston Housing Data and Price Prediction
6.3 Logistic Regression
6.3.1 Practical Case 3: Winning or Losing Football Team
6.4 Conclusions
References

7 Unsupervised Learning
7.1 Introduction
7.2 Clustering
7.2.1 Similarity and Distances
7.2.2 What Is a Good Clustering? Defining Metrics to Measure Clustering Quality
7.2.3 Taxonomies of Clustering Techniques
7.3 Case Study
7.4 Conclusions
References

8 Network Analysis
8.1 Introduction
8.2 Basic Definitions in Graphs
8.3 Social Network Analysis
8.3.1 NetworkX
8.3.2 Practical Case: Facebook Dataset
8.4 Centrality
8.4.1 Drawing Centrality in Graphs
8.4.2 PageRank
8.5 Ego-Networks
8.6 Community Detection
8.7 Conclusions
References

9 Recommender Systems
9.1 Introduction
9.2 How Do Recommender Systems Work?
9.2.1 Content-Based Filtering
9.2.2 Collaborative Filtering
9.2.3 Hybrid Recommenders
9.3 Modelling User Preferences
9.4 Evaluating Recommenders
9.5 Practical Case
9.5.1 MovieLens Dataset
9.5.2 The Naïve Recommender
9.5.3 User-Based Collaborative Filtering
9.6 Conclusions
References

10 Basics of Natural Language Processing
10.1 Introduction
10.2 Data Cleaning
10.3 Text Representation
10.3.1 Bi-Grams and n-Grams
10.4 Practical Cases
10.5 Conclusions
References

11 Deep Learning
11.1 Introduction
11.2 Perceptron
11.3 Multilayer Perceptron
11.3.1 Training Neural Networks
11.4 Playing with Neural Networks
11.5 Deep Learning with Keras
11.5.1 Running Environment
11.5.2 Our First Neural Network Model
11.6 Practical Case: Building an Image Classifier
11.7 Convolutional Neural Networks
11.7.1 Building Blocks of a CNN
11.8 Practical Case: Improving Our Image Classifier
11.8.1 Avoiding Overfitting
11.9 Conclusions
References

12 Responsible Data Science
12.1 Data Science and Social Responsibility
12.2 What Does Data Have to Do with Ethics?
12.3 Responsible Data Science
12.3.1 Transparency
12.3.2 Fairness
12.3.3 Robustness and Reliability
12.4 Conclusions
References

Index

Authors and Contributors

About the Authors

Dr. Laura Igual is an Associate Professor at the Department of Mathematics and
Computer Science at the Universitat de Barcelona. She received her degree in
Mathematics from the Universitat de Valencia (Spain) in 2000 and her Ph.D. degree
from the Universitat Pompeu Fabra (Spain) in 2006. Her areas of interest include
computer vision, medical imaging, machine learning, and data science.

Dr. Santi Seguí is an Associate Professor at the Department of Mathematics and
Computer Science at the Universitat de Barcelona. He received his Computer Science
Engineering degree from the Universitat Autònoma de Barcelona (Spain) in 2007 and
his Ph.D. degree from the Universitat de Barcelona (Spain) in 2011. His areas of
interest include computer vision, applied machine learning, and data science.

Contributors

Dr. Jordi Vitrià is a Full Professor at the Department of Mathematics and Computer
Science at the Universitat de Barcelona. He received his Ph.D. degree from
the Universitat Autònoma de Barcelona in 1990. Dr. Jordi Vitrià has published
more than 100 papers in SCI-indexed journals and has more than 30 years of
experience working on Computer Vision, Machine Learning, Causal Inference, and
Artificial Intelligence and their applications to several fields. He is now the leader
of the “Data Science Group at Universitat de Barcelona”, a multidisciplinary
technology transfer unit that conveys results from scientific and technological
research to the market and society in general.
Dr. Eloi Puertas is an Assistant Professor in the Department of Mathematics and
Computer Science at the Universitat de Barcelona. He received his Computer Science
Engineering degree from the Universitat Autònoma de Barcelona (Spain) in 2002 and
his Ph.D. degree from the Universitat de Barcelona (Spain) in 2014. His areas of
interest include artificial intelligence, software engineering, and data science.

Dr. Petia Radeva is a Full Professor at the Universitat de Barcelona. She graduated
in Applied Mathematics and Computer Science in 1989 at the University of Sofia,
Bulgaria, and received her Ph.D. degree in Computer Vision for Medical Imaging
in 1998 from the Universitat Autònoma de Barcelona, Spain. She has been
an ICREA Academia Researcher since 2015 and is head of the Consolidated Research
Group “Artificial Intelligence and Biomedical Applications”. Her present research
interests are in the development of learning-based approaches for computer vision,
deep learning, data-centric data analysis, food data analysis, egocentric vision, and
data science.
Dr. Oriol Pujol is a Full Professor at the Department of Mathematics and Com-
puter Science at the Universitat de Barcelona. He received his Ph.D. degree from
the Universitat Autònoma de Barcelona (Spain) in 2004 for his work in machine
learning and computer vision. His areas of interest include machine learning,
computer vision, and data science.
Dr. Sergio Escalera is a Full Professor at the Department of Mathematics and
Computer Science at the Universitat de Barcelona. He received his Computer Science
Engineering degree from the Universitat Autònoma de Barcelona (Spain) in 2003 and
his Ph.D. degree from the same university in 2008. His research interests include,
among others, statistical pattern recognition and visual object recognition, with
special interest in behavior analysis from multi-modal data.
Francesc Dantí is an adjunct professor and system administrator at the Department
of Mathematics and Computer Science at the Universitat de Barcelona. He holds
a computer science engineering degree from the Universitat Oberta de Catalunya
(Spain). His particular areas of interest are HPC and grid computing, parallel
computing, and cybersecurity. Francesc Dantí is coauthor of Chap. 2.
