Data Science in R With Rubén Sánchez Sancho
The objective of this book is to give you a solid foundation in the vast majority
of the tools used in Data Science. Our model of the tools needed in a typical Data
Science project is shown in the following figure:
https://i.imgur.com/DO2BKK7.png
The book is organized according to the necessary tools in a typical Data Science
project, in the order in which we will use them in our data analysis.
Programming in R
In the first part of the book, we will learn the programming language R:
1. The Syntax of R.
2. Data Structures in R.
3. Control Structures in R.
4. Functions in R.
5. Packages in R.
Import Data in R
In the second part of the book, we will deal with the two tasks of the data import
phase, which we describe next:
First, we will see how to import our data into R. By this we mean that our data will
be stored in files, in databases, or behind a web API, and the objective of this
task is to load it into a data frame.
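As a minimal sketch of the import task described above, the following base R snippet loads comma-separated text into a data frame. The column names and values are hypothetical, and the data is supplied inline through the `text` argument so that the example does not depend on any file; with a real file you would pass its path instead.

```r
# Hypothetical CSV content, given inline so the example needs no external file.
csv_text <- "city,distance_km,time_h
A,10,0.5
A,30,1.0
B,20,0.5"

# read.csv() parses delimited text into a data frame.
# Here `text =` stands in for the path of a real file.
trips <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the structure of the imported data frame.
str(trips)
```

Databases and web APIs need additional packages, but the goal is the same: end up with the data in a data frame.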
Once we have imported our data, the next task is to tidy it. The objective of this
task is to store our data in a consistent format in which the semantics of the data
set match the way it is stored. In short, our data is in tidy format when each
variable is in its own column and each observation is in its own row.
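To illustrate the idea of tidy format, here is a small sketch that uses base R's `reshape()` to turn a "wide" table into a tidy one. The data set, its column names, and its values are invented for the example; in the wide table the variable "year" is hidden in the column names, while in the tidy table it has its own column.

```r
# Hypothetical "wide" data: one column per year, so the variable `year`
# is encoded in the column names instead of in a column of its own.
wide <- data.frame(
  city = c("A", "B"),
  temp_2020 = c(14.1, 17.3),
  temp_2021 = c(14.6, 17.0)
)

# Reshape to long (tidy) format: one row per city and year,
# with `year` and `temp` as separate columns.
tidy <- reshape(
  wide,
  direction = "long",
  varying = c("temp_2020", "temp_2021"),
  v.names = "temp",
  timevar = "year",
  times = c(2020, 2021),
  idvar = "city"
)
rownames(tidy) <- NULL  # drop the row names that reshape() generates

print(tidy)
```

This is only one way to do it; dedicated packages offer more convenient functions for the same task.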
Data Exploration
In the third part of the book, we will deal with the tasks of the data exploration
phase, also known as data wrangling.
First, a common task is to transform our data. Transforming our data includes
filtering the observations we are interested in (such as all the people in a city,
or all the data from the last year), creating new variables computed from existing
ones (for example, calculating speed from distance and time), and calculating a set
of summary statistics (such as means or standard deviations).
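The three transformations just described can be sketched in base R as follows. The data frame `trips` and its columns are hypothetical, chosen to mirror the speed example in the text; this is an illustration under those assumptions, not the only way to write these steps.

```r
# Hypothetical data: distance travelled and time taken for four trips.
trips <- data.frame(
  city = c("A", "A", "B", "B"),
  distance_km = c(10, 30, 20, 40),
  time_h = c(0.5, 1.0, 0.5, 2.0)
)

# 1. Filter: keep only the observations from city "A".
trips_a <- subset(trips, city == "A")

# 2. Create a new variable: speed computed from distance and time.
trips$speed_kmh <- trips$distance_km / trips$time_h

# 3. Summarize: mean and standard deviation of speed for each city.
speed_mean <- aggregate(speed_kmh ~ city, data = trips, FUN = mean)
speed_sd <- aggregate(speed_kmh ~ city, data = trips, FUN = sd)

print(speed_mean)
print(speed_sd)
```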
Afterwards, we will visualize and model our data. In this course we will cover only
visualization.