R Programming

– An Approach to Data Analytics

Libros de Estadística-Ciencia de Datos|Statistics-Data Science Books (PDF)



Dr. G. Sudhamathy
Assistant Professor, Department of Computer Science,
Avinashilingam University, Coimbatore – 641043.

Dr. C. Jothi Venkateswaran
Principal, Government Arts and Science College,
Perumbakkam, Chennai – 600100.

Chennai New Delhi Tirunelveli

Honour Copyright & Exclude Piracy

This book is protected by copyright. Reproduction of any part in any form,
including photocopying, shall not be done except with authorization from
the publisher.

ISBN 978-81-8094-408-6
All rights reserved
Printed and bound in India
MJP 371
© Publishers, 2018

MJP Publishers
No. 44, Nallathambi Street,
Triplicane, Chennai 600 005
Publisher: C. Janarthanan
Project Editor: C. Ambica



FOREWORD

If you are looking for complete step-by-step instructions for learning R programming
for statistical data analysis, graphical visualization and data mining, “R Programming -
An Approach to Data Analytics” by Dr. Sudhamathy and Dr. Jothi Venkateswaran
is a hands-on book packed with examples and references that will help
you get started coding in R for a variety of data science problems.

As the authors explain in their book, understanding the techniques and
algorithms of data analytics for large datasets is critical for effective data classification.
This helps a developer not just learn R programming but also apply the right
algorithms and statistical models.

Hopefully you can use the instructions provided in this book to get started in R
programming for your next data analysis project, and to do some exciting data
visualization and data mining on your own.

Mr. Sajeev Madhavan


Director of Architecture, ORACLE,
San Francisco Bay Area, USA

FOREWORD

It is my immense pleasure to pen this foreword for a book that is quite impressive
for any techie who is interested in R programming. It is equally a joy to have a
book written by experts, Dr. G. Sudhamathy and Dr. C. Jothi Venkateswaran. When
a book can teach you and guide you as you work hands-on with the tool, you are
headed in the right direction on your learning path.

This book on R programming starts with simple concepts like programming
logic statements and data types, and moves on to cover advanced topics like statistical
analysis using R and data mining using R, in addition to graphics programming.
The concepts are well supported by practical work carried out in R, with
sufficient illustrations wherever they are essential to the explanation.
The Analytics chapters show an in-depth knowledge of the tool and place a
strong emphasis on analytics, supported by the five most important case studies
given towards the end of the book.

One can be sure that this book will be of great help and guidance to learners
carrying out analytics work using R, whether in research, in practice
or just to learn the tool.

My heartfelt appreciation goes to the authors, who have done a wonderful job in
bringing out this book. It is very much needed at this point in time, when
the entire world is diving into data and Big Data and analyzing them for newer
knowledge and perceptions to drive everyday business.

Best wishes for this book to become a bestseller in academia, research and practice.

Dr. S. Justus
Associate Professor & Chair - Software Engineering Research Group
VIT University, Chennai
PREFACE

Huge volumes of data are generated daily by many sources, such as commercial
enterprises, scientific domains and the general public. According to recent
research, data production will be 44 times greater in 2020 than it was in 2010.
Since data is a vital resource for business organizations and for other domains such as
education, health and manufacturing, its management and analysis are becoming
increasingly important. This data, due to its volume, variety and velocity, often
referred to as Big Data, also includes highly unstructured data in the form of textual
documents, web pages, graphical information and social media comments. Since
Big Data is characterised by massive sample sizes, high dimensionality and intrinsic
heterogeneity, traditional approaches to data management, visualisation and
analytics are no longer satisfactorily applicable. There is therefore an urgent need
for newer tools, better frameworks and workable methodologies for such data to
be appropriately categorised, logically segmented, efficiently analysed and securely
managed. This requirement has given rise to the emerging discipline of Data
Science, which is now gaining much attention among researchers and practitioners in
the field of Data Analytics.

R is a programming language and a free, open source software environment for
data analytics. It is growing exponentially by most measures, counts over a million
users, and has over 10,865 standard and add-on packages contributed by the
community, with that number increasing by about 25% each year. R is a powerful
tool for approaching statistical, graphical and data mining problems. It is used
by many organizations and individuals daily to perform serious data analytics. R
does not require its users to have basic programming knowledge, as it is made up
of many inbuilt packages and functions that can achieve very complex processing
easily in a fraction of a second. This book is full of easy, simple steps to achieve great
results with complex data. It also details how to model a specific problem and
come up with predictions for the future. The main motivation of this book is to
break down the complexities remaining in the minds of students and researchers about
the R programming language and make it easy for anyone to approach. The chapters
are designed so that the first 4 chapters target beginners
and the next 3 chapters target learners of advanced concepts. The book also
provides a list of all the packages and functions it uses, along
with the page numbers where their usage is shown. Every concept discussed in the
various sections of this book is illustrated with a proper example, consisting of a set of
code and its results (as text or as graphs).

The book is organized into 7 chapters, and the concepts discussed in each chapter
are detailed below.

Chapter 1 introduces the programming language R and briefly explains how to install
RStudio, how to use the editor and how to write simple code in R. This chapter also details
how to get help in R from its manuals and how to perform simple mathematical
operations using R. The chapter then progresses to introduce the
concepts of packages, environments and functions. Finally, it covers
the programming concepts of flow control and loops.

Chapter 2 discusses the basic data types in R: the primitive data types such
as vectors, matrices, arrays, lists and factors. It also deals with the complex data
types such as data frames, strings, dates and times. The chapter covers not only
data creation but also basic operations on data of the different types.

Chapter 3 deals with data preparation: how and where to fetch datasets, and
how to import and export data from various sources of different types,
such as CSV files and XML files. It also discusses ways of
accessing various databases. Data cleaning and transformation techniques, such
as data reshaping and grouping functions, are also outlined in this chapter.

Chapter 4 is about using the graphical features in R for exploratory data analysis.
It gives examples of pie charts, scatter plots, line plots, histograms, box plots and
bar plots using the various graphical packages such as base, lattice and ggplot2.

Chapter 5 deals with statistical analysis concepts using R, starting with basic
statistical measures such as mean, median, mode, standard deviation, variance and
range. It discusses data distributions, namely the normal distribution and the binomial
distribution, and how they can be viewed and analyzed using R. The chapter then
explores more complex statistical techniques such as correlation analysis, regression
analysis, ANOVA and hypothesis testing, which can be implemented using R.

Chapter 6 explores the data mining techniques available in R. It
covers the K-Means, K-Medoids, Hierarchical and Density-Based Clustering
techniques using proper examples and case studies. Decision tree classification
techniques are also discussed with suitable examples. Outlier detection is
explored using various techniques, such as univariate, multivariate, LOF (Local
Outlier Factor) and clustering-based detection. Dimensionality reduction is done using
PCA and feature selection. Association rule mining is done using the Titanic dataset,
and a proper case study analysis is presented in this chapter.

Chapter 7 explores various essential case studies, such as text
analytics, credit risk analysis, social network analysis and a few exploratory data
analyses. The main purpose of this chapter is to apply the basic and advanced concepts
presented in the previous chapters of this book.

ACKNOWLEDGEMENTS

One of the authors (Dr. G. Sudhamathy) thanks the authorities of Avinashilingam
University, Coimbatore, for providing all the support for making this book a reality.

The author expresses her reverential gratitude to Shri. Dr. P. R. Krishnakumar,
Chancellor, Dr. Premavathy Vijayan, Vice Chancellor and Dr. A. Kowsalya, Registrar,
Avinashilingam University, Coimbatore, for providing the opportunity to publish
this book.

The author would like to extend her special regards and thanks to Dr. G.
P. Jeyanthi, Research and Consultancy Director, Dr. A. Parvathi, Dean, Faculty of
Science and Dr. V. Radha, Head, Department of Computer Science, Avinashilingam
University, Coimbatore, for their constant encouragement and support to turn this
work into a useful product.

Special thanks go to Dr. G. Padmavathi, Professor, Department of Computer
Science, Avinashilingam University, Coimbatore, who was the motivational support
for acquiring the technical knowledge behind this book.

The author wishes to thank all the faculty members of the Department of
Computer Science, Avinashilingam University, Coimbatore, for their continuous
support and suggestions for this book.

We are grateful to the students and teacher community who kept us on our
toes with their constant bombardment of queries which prompted us to learn more,
simplify our learning and findings and place them neatly in a book.

Our special regards go to the experts Mr. Sajeev Madhavan, Director of
Architecture, Oracle, USA and Dr. S. Justus, Associate Professor, VIT, Chennai, who
gave their expert opinion in shaping this book into a more appealing format.


Most importantly we would like to thank our family members without whose
support this book would not have been a reality.

Last but not least, this work is dedicated to God, the Almighty, whose
grace has been showered upon us in making our dream come true.

G. Sudhamathy
C. Jothi Venkateswaran
