Chapter 2 Introduction To R and Python

CHAPTER 2
INTRODUCTION TO R AND PYTHON

“Introduction to Data Science: Practical Approach with R and Python”

B. Uma Maheswari and R. Sujatha
Copyright © 2021 Wiley India Pvt. Ltd. All rights reserved.
LEARNING OBJECTIVES
After reading this chapter, you will be able to:
• Understand the basics of the R and Python programming languages.
• Work in the RStudio environment and the Python Jupyter notebook.
• Identify the libraries in R and Python used for data analysis.
• Understand fundamentals such as variables, operators, data types, and functions in R and Python.
• Create data frames and perform database operations such as append, select, group, and drop on rows and columns.
R
R is a statistical programming language developed by Ross Ihaka and Robert Gentleman in the 1990s.
The R language draws on two other languages, S and Scheme.
R is open source software. The source code is made available under the terms of the Free Software Foundation’s GNU General Public License.
R has good memory management and fewer paging problems.
R uses lexical scoping: a function can access the variables that were in effect when the function was defined.
R uses a device-independent 24-bit model for colour graphics.
R provides flexible plot layouts and has a World Wide Web interface called Rweb.
RSTUDIO ENVIRONMENT
RStudio is an integrated development environment (IDE).
It has a graphical user interface that lets a user interact with the R programming language very easily.
RStudio provides drop-down menus and many user-friendly features for programmers.
The environment is readily customizable as per the user’s requirements.
The RStudio screen is divided into four windows:
1. Source
2. Console
3. Environment/History
4. Files/Plots/Packages
FOUR WINDOWS IN RSTUDIO
First Window
• The first window is the source window, where the R code is entered.
• A new R Script window is opened from the menu File -> New File -> R Script.
• The source code is typed in the script window.
• The code is executed by clicking ‘Run’ in the same window or by pressing “Control + Enter” on the keyboard.
Second Window
• The second window is the console window.
• Code typed in the script window is executed in the console window.
• In this window, the code is executed at the R prompt, symbolized by ‘>’.
FOUR WINDOWS IN RSTUDIO
Third Window
• The third window is the environment/history window.
• The Environment tab of this window displays the names of all the variables and dataframes that are currently defined in the R session.
Fourth Window
• The fourth window shows the list of files and packages.
• The Files tab provides access to the file directory of the hard drive.
• Users can navigate to a folder and set it as the working directory by clicking ‘More’ and ‘Set as Working Directory’.
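SET WORKING DIRECTORY IN R
The working directory is the folder path on the computer that R uses as the default location for reading and saving files.
The path is written with straight quotes and forward slashes (or doubled backslashes), because a single backslash starts an escape sequence in an R string.
setwd("D:/datascience/yourfoldername")
The working directory can also be set using the menu Session -> Set Working Directory -> Choose Directory.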
COMMENT STATEMENTS IN R
Comment statements begin with the # symbol.
Any statement starting with the # symbol is ignored and is not executed.
It is always good practice to write explanations of the code using # statements.

### My First Program ###


x<-10
y<-20
VARIABLES IN R
Variables in R can be used to store integers, complex numbers, strings, vectors and matrices.
Variables need not be declared, because R is a dynamically typed language.
Variable names can contain only letters, digits, dots and underscores.
Variable names should start with a letter (or with a dot that is not followed by a digit), followed by alphanumeric characters.
x <- 21
course <- "programming"
newlist <- c(1,2,3,4)
department1 <- "Marketing"
new_number <- 431
var.r <- 3678
DATA TYPES IN R
Data Type    Description
Numeric      Stores decimal values. The as.numeric() function converts values into the numeric data type.
Integer      Stores numbers without decimal digits. Integers can be created by appending the letter ‘L’ to a number (for example, 5L) or with the as.integer() function.
Complex      Stores a real part and an imaginary part; the imaginary part is written with the suffix ‘i’ (for example, 3+2i).
Character    Represents strings. Strings are written within single or double quotes. The as.character() function converts objects into character values.
Logical      Stores one of two values, TRUE or FALSE.
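OPERATORS IN R

Arithmetic Operator    Operation
+      Addition
-      Subtraction
*      Multiplication
/      Division
^      Exponent
%%     Modulus. Outputs the remainder after division.
%/%    Integer division

Assignment Operator    Description
<-     Leftward assignment (the value on the right is assigned to the name on the left)
<<-    Leftward assignment into an enclosing (typically global) environment
=      Assignment (leftward)
->     Rightward assignment (the value on the left is assigned to the name on the right)
->>    Rightward assignment into an enclosing (typically global) environment

Relational Operator    Description
<      Less than
>      Greater than
<=     Less than or equal to
>=     Greater than or equal to
==     Equal to
!=     Not equal to

Logical Operator    Description
!      Logical NOT
&      Element-wise logical AND
&&     Logical AND on single values (compares only the first element of a vector; R 4.3 and later require operands of length one)
|      Element-wise logical OR
||     Logical OR on single values (compares only the first element of a vector; R 4.3 and later require operands of length one)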
BUILT-IN FUNCTIONS IN R
Mathematical     Description
abs()            Absolute value
log(x,base=y)    Logarithm of x with base y; if base is not specified, returns the natural logarithm
exp(x)           Exponential value
sqrt(x)          Square root of the data
factorial(x)     Factorial of the data (x!)
sign()           Sign of the data

Statistical      Description
mean(x)          Mean of x
median(x)        Median of x
var(x)           Variance of x
sd(x)            Standard deviation of x
quantile(x)      Returns the quantiles of x (by default the minimum, the three quartiles and the maximum)

General Purpose  Description
range(x)         Minimum and maximum values
sort(x)          Sorts the data in ascending order
order(x)         Returns the index positions that would put the values in sorted order
length()         Length of the dataset
tolower()        Converts letters to lowercase
toupper()        Converts letters to uppercase
ls()             Lists the objects defined in the current environment
USER-DEFINED FUNCTIONS IN R
Users can write their own functions for any specific purpose.

function_name <- function(argument list) {
  body of function
}
VECTORS IN R
A vector is a sequence of elements.
The elements of a vector can be of the same or of different data types.
An atomic vector is one whose elements all have the same data type.
A vector with elements of different data types is called a list.
The data type of the elements can be logical, integer, double, character or complex.

Vector Creation    Description
c() function       Combines the elements and returns a vector in the form of a one-dimensional array.
‘:’ operator       Generates the sequence of consecutive numbers within a given range (for example, 1:5).
LISTS IN R
A list can contain elements of different data types, such as numbers, strings, logical values or vectors.
A list is created using the built-in function list().

Creating a list
> firstlist <- list("Green", 'Yellow', c(102, 110, 210), c(FALSE, TRUE, FALSE), 128.5, 10.6)

Accessing the first element from the list
> firstlist[1]
[[1]]
[1] "Green"

Accessing the third element from the list
> firstlist[3]
[[1]]
[1] 102 110 210

Accessing the first three elements from the list
> firstlist[1:3]
[[1]]
[1] "Green"

[[2]]
[1] "Yellow"

[[3]]
[1] 102 110 210
DATAFRAMES IN R
A dataframe is a two-dimensional array-like object represented in a tabular format.
Each column in the data frame represents a variable, and each row represents one record, that is, one set of values of the variables.
The number of rows gives the number of observations in a data frame.
The column names cannot be empty.
A dataframe can be expanded by adding columns and rows.
R also provides options for slicing and dicing the data frame by rows and columns.
PYTHON LANGUAGE
Python is a general-purpose language developed by Guido van Rossum.
Python is used in data science, web programming, game development, desktop applications and many scientific applications.
Python is a scripting language with a structured programming style.
Python has built-in dynamic data structures.
Python has database connectivity interfaces to MySQL, Oracle, PostgreSQL etc.
Python is an extensible language that can be easily integrated with other programming languages such as C, C++ etc.
Python has libraries for internet data formats such as HTML, XML and JSON, which help in web application development.
PYTHON ENVIRONMENT
Anaconda is a widely used Python distribution for data science and comes pre-loaded with many of the most popular libraries and tools.
Anaconda Navigator is a desktop GUI that comes along with Anaconda Individual Edition.
The Jupyter notebook application allows you to create and edit documents that display the input and output of a Python or R language script.
A Jupyter notebook integrates code and its output into a single document that combines visualizations, narrative text, mathematical equations, and other rich media.
MENUS IN JUPYTER NOTEBOOK
The File menu is used to create a new notebook or open an existing notebook.
The Make a Copy option creates a copy of the file.
The Save and Checkpoint option saves the notebook and updates its checkpoint.
The Edit menu is used to cut, copy and paste cells, and also has options to delete, split or merge cells.
The View menu toggles the visibility of the header and toolbar. Cells can be viewed as a slideshow using this menu.
The Insert menu is used for inserting cells above or below the currently selected cell.
The Cell menu is used to run a cell, a group of cells, or the cells above or below the current cell.
Headings can be created in Markdown cells by using the # symbol.
MENUS IN JUPYTER NOTEBOOK
The Kernel menu is used to restart the kernel, reconnect to it or shut it down.
Kernels are processes that run interactive code and return output to the user.
Kernels are available for different programming languages.
A user can restart the kernel and clear all outputs so that the commands can be executed afresh.
Widgets are used to build interactive graphical user interface elements such as sliders, toolbars and dynamic dashboards.
SET WORKING DIRECTORY IN PYTHON
The Python function chdir() changes the current working directory to the given path.
The os module in Python provides functions for interacting with the operating system.
The function chdir() is available in the os module.

import os
os.chdir("D:/datascience/yourfoldername")
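A minimal sketch of checking and changing the working directory. The temporary folder is created only for illustration, because the path shown in the slide may not exist on every machine; os.getcwd() is not mentioned in the slides but is the standard call for reading the current directory.

```python
import os
import tempfile

original_dir = os.getcwd()          # remember where we started

# Create a throwaway folder to switch into (a stand-in for a real project folder).
with tempfile.TemporaryDirectory() as project_dir:
    os.chdir(project_dir)
    # realpath resolves symbolic links so the two paths compare equal on every OS
    changed = os.path.realpath(os.getcwd()) == os.path.realpath(project_dir)
    os.chdir(original_dir)          # switch back before the folder is deleted

print(changed)
```

Switching back before the temporary folder is removed keeps the session in a directory that still exists.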
COMMENT STATEMENTS IN PYTHON
As in the R programming language, comment statements are written with the # symbol at the beginning.
Comment statements are not executed.

### My First Python Program ###


x=10
y=50
VARIABLES IN PYTHON
Variables store values in designated memory locations.
A variable reserves memory space depending on the type of value assigned, such as integers, decimals or characters.
When a value is assigned to a variable, the variable is created with a data type based on the assigned value.
Variable names must start with a letter or an underscore character and not with a digit.
A variable name can contain only alphanumeric characters and underscores.
DATA TYPES IN PYTHON
The data types available in Python are:
• Numbers - Number data types store numeric values.
• String - Strings in Python are sequences of characters enclosed in single or double quotation marks.
• List - A list contains items separated by commas and enclosed within square brackets [ ].
• Tuple - A tuple consists of a number of values separated by commas, usually enclosed in parentheses ( ). Unlike a list, a tuple cannot be changed after it is created.
• Dictionary - A dictionary consists of key-value pairs. A key can be any immutable Python type (such as a string or a number), and a value can be any Python type.
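One value of each data type listed above, as a small sketch. All names and values are invented for illustration.

```python
count = 42                                   # number (integer)
price = 19.99                                # number (float)
course = "Data Science"                      # string
marks = [78, 85, 91]                         # list: square brackets, can be changed
point = (3, 4)                               # tuple: cannot be changed after creation
student = {"name": "Asha", "marks": marks}   # dictionary: key-value pairs

marks.append(66)                             # lists can grow in place

values = (count, price, course, marks, point, student)
type_names = [type(v).__name__ for v in values]
print(type_names)
# prints: ['int', 'float', 'str', 'list', 'tuple', 'dict']
```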
OPERATORS IN PYTHON

Assignment Operator    Description
=      Assigns the result of the expression on the right side to the operand on the left side
+=     Adds the right operand to the left operand and assigns the result to the left operand
-=     Subtracts the right operand from the left operand and assigns the result to the left operand
*=     Multiplies the left operand by the right operand and assigns the result to the left operand
/=     Divides the left operand by the right operand and assigns the result to the left operand
%=     Calculates the modulus of the left and right operands and assigns the result to the left operand
//=    Divides the left operand by the right operand and assigns the floor quotient to the left operand
**=    Raises the left operand to the power of the right operand and assigns the result to the left operand

Arithmetic Operator    Description
+      Addition
-      Subtraction
*      Multiplication
/      Division
%      Modulus. Gives the remainder after division
**     Exponentiation
//     Floor division. Gives the quotient after division
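The arithmetic and assignment operators can be tried out with two sample numbers. The values 17 and 5 are arbitrary choices for this sketch.

```python
a, b = 17, 5

results = {
    "a + b": a + b,     # addition        -> 22
    "a - b": a - b,     # subtraction     -> 12
    "a * b": a * b,     # multiplication  -> 85
    "a / b": a / b,     # division        -> 3.4
    "a % b": a % b,     # modulus         -> 2
    "a ** 2": a ** 2,   # exponentiation  -> 289
    "a // b": a // b,   # floor division  -> 3
}

# Each compound assignment updates the left operand in place.
total = 10
total += 5     # 15
total -= 3     # 12
total *= 2     # 24
total //= 5    # 4
total **= 2    # 16
total %= 7     # 2

print(results)
print(total)
```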
OPERATORS IN PYTHON
Relational Operator    Name
==     Equal
!=     Not equal
>      Greater than
<      Less than
>=     Greater than or equal to
<=     Less than or equal to

Membership Operator    Description
in        Returns True if a sequence with the specified value is present in the object
not in    Returns True if a sequence with the specified value is not present in the object

Logical Operator    Description
and    Returns True if both statements are true
or     Returns True if one of the statements is true
not    Reverses the result; returns False if the result is true
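A short sketch of the relational, membership and logical operators working together. The list of marks is invented for the example.

```python
marks = [78, 85, 91]

is_equal = marks[0] == 78             # relational: True
is_greater = marks[1] > marks[2]      # relational: False (85 is not greater than 91)

has_85 = 85 in marks                  # membership: True
missing_100 = 100 not in marks        # membership: True
has_substring = "Sci" in "Data Science"   # membership also works on strings: True

both = is_equal and has_85            # logical AND: True
either = is_greater or has_85         # logical OR: True
negated = not is_greater              # logical NOT: True

print(is_equal, is_greater, has_85, missing_100, has_substring, both, either, negated)
```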
USER-DEFINED FUNCTIONS IN PYTHON
A function in Python is defined by a def statement.
Indentation is mandatory in Python: the function body must be indented under the def line.
The function body can contain a return statement, which optionally passes an expression back to the caller and then exits the function.
If there is no return statement, the function ends when execution reaches the end of the function body.
def function_name(parameter list):
    """function docstring"""
    statement 1
    statement 2

    return expression
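The template above can be filled in as follows. The function names rectangle_area and greet are hypothetical, chosen only to show one function with a return statement and one without.

```python
def rectangle_area(length, width=1.0):
    """Return the area of a rectangle."""
    area = length * width
    return area                 # hands the value back to the caller


def greet(name):
    """Print a greeting; this function has no return statement."""
    print("Hello,", name)


area = rectangle_area(4, 2.5)        # 10.0
default_width = rectangle_area(3)    # width falls back to 1.0, so 3.0
nothing = greet("Asha")              # a function without return gives back None

print(area, default_width, nothing)
```

A function that reaches the end of its body without a return statement returns None, which is why the last variable holds None.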
LAMBDA FUNCTION IN PYTHON
Small one-line functions can be created using the lambda keyword.
They contain only a single expression.
A lambda function takes any number of arguments and returns the value of that one expression.

lambda argument(s) : expression
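A few lambda functions following the syntax above. The names square, add and words are made up for this sketch; passing a lambda as the key argument of sorted() is one common use.

```python
square = lambda x: x ** 2            # one argument
add = lambda a, b: a + b             # two arguments

words = ["python", "r", "pandas"]
# A lambda is often passed directly to another function, here as a sort key.
by_length = sorted(words, key=lambda w: len(w))

print(square(6), add(2, 3), by_length)
# prints: 36 5 ['r', 'python', 'pandas']
```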


PYTHON LIBRARIES
A Python library is a collection of reusable modules (functions, classes and constants) that can be imported into programs.
The standard library is bundled with the core Python distribution; third-party libraries such as NumPy and Pandas are installed separately.
Libraries save programmers from rewriting large chunks of code, because commonly needed functionality is already packaged.
Python libraries handle basic functionality such as input/output operations and visualization, as well as complicated database operations.
Libraries enhance the portability of programs and make them platform independent.
Some popular Python libraries are NumPy, Pandas, SciPy, Matplotlib, Keras etc.
NUMPY LIBRARY
NumPy stands for Numerical Python.
NumPy is a popular package used for array processing.
It also has functions for mathematical operations such as matrix computations and linear algebra.

import numpy as np
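A minimal sketch of array processing with NumPy, assuming the package is installed. The sample marks and the 2 x 2 matrix are invented for illustration.

```python
import numpy as np

marks = np.array([78, 85, 91, 66])

scaled = marks / 100               # arithmetic is applied to every element
above_80 = marks[marks > 80]       # boolean mask keeps 85 and 91
total = marks.sum()                # 320

matrix = np.array([[1, 2],
                   [3, 4]])
product = matrix @ matrix          # matrix multiplication

print(above_80, total)
print(product)
```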
PANDAS LIBRARY
Pandas is a popular open-source Python library for data analysis.
It is built on the top of the NumPy library.
Pandas is used for working with relational or labelled data. It provides various data structures and operations for manipulating numerical data and time series.
It has many built-in methods for manipulating, grouping and extracting data.
Pandas can create data frames from Excel files, Comma Separated Values (CSV) files and many other formats.
DATAFRAMES IN PANDAS
A Pandas DataFrame is a two-dimensional, heterogeneous tabular data structure with labeled axes (rows and columns).
A DataFrame object has two axes: “axis 0” and “axis 1”. “axis 0” represents rows and “axis 1” represents columns.
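A small sketch of a DataFrame and its two axes, assuming Pandas is installed. The column names and values are invented; note that an operation along axis 0 runs down the rows and gives one result per column, while axis 1 runs across the columns and gives one result per row.

```python
import pandas as pd

# Hypothetical sample data: each key becomes a column, each position a row.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "dept": ["Marketing", "Finance", "Marketing"],
    "maths": [78, 85, 91],
    "stats": [82, 74, 88],
})

n_rows, n_cols = df.shape                     # 3 rows, 4 columns

scores = df[["maths", "stats"]]
column_totals = scores.sum(axis=0)            # down the rows: one total per column
row_totals = scores.sum(axis=1)               # across the columns: one total per row

print(n_rows, n_cols)
print(column_totals)
print(row_totals)
```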
SLICING AND DICING THE DATA FRAME
A column can be extracted from a data frame by giving the column name in quotes inside square brackets [ ].
When multiple columns are extracted, the column names are given as a list inside the square brackets, which results in double square brackets [[ ]].
Columns can also be selected by providing the column indexes.
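The three ways of selecting columns can be sketched as follows, assuming Pandas is installed. The data frame is hypothetical, and selecting by position through df.columns is one of several ways to use column indexes.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "dept": ["Marketing", "Finance", "Marketing"],
    "maths": [78, 85, 91],
})

one_column = df["maths"]                 # single brackets return a Series
two_columns = df[["name", "maths"]]      # double brackets return a DataFrame
by_position = df[df.columns[[0, 2]]]     # column indexes 0 and 2 (name, maths)
first_two_rows = df[0:2]                 # a slice inside [ ] selects rows

print(type(one_column).__name__, two_columns.shape)
print(list(by_position.columns), len(first_two_rows))
```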
.LOC AND .ILOC IN PYTHON
loc and iloc are two indexers used to retrieve rows and columns from a data frame.
DataFrame.loc is label-based.
DataFrame.loc[] takes index labels and returns the rows or columns whose labels exist in the data frame.

DataFrame.iloc[] is integer-position based. The rows and columns have to be specified by their integer index.
If the index label is not known, iloc is very useful for data filtering.
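A minimal sketch contrasting the two indexers, assuming Pandas is installed. The data frame and its row labels are invented for the example.

```python
import pandas as pd

# Hypothetical sample data; the names are used as row labels.
df = pd.DataFrame(
    {
        "dept": ["Marketing", "Finance", "Marketing"],
        "maths": [78, 85, 91],
        "stats": [82, 74, 88],
    },
    index=["Asha", "Ravi", "Meena"],
)

# loc: rows and columns are named by their labels
ravi_maths = df.loc["Ravi", "maths"]                            # 85
by_label = df.loc[["Asha", "Meena"], ["maths", "stats"]]

# iloc: rows and columns are given by integer position, counting from 0
first_cell = df.iloc[0, 1]                                      # row 0, column 1 -> 78
by_position = df.iloc[0:2, 1:3]                                 # rows 0-1, columns 1-2

# loc also accepts a condition, which is a common way to filter rows
marketing = df.loc[df["dept"] == "Marketing"]

# A label slice includes its end point; a position slice does not
label_slice = df.loc["Asha":"Ravi"]                             # 2 rows
position_slice = df.iloc[0:1]                                   # 1 row

print(ravi_maths, first_cell, list(marketing.index))
```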
