Chapter 2 Introduction To R and Python
FOUR WINDOWS IN RSTUDIO
First Window
The first window is the source window where the R code is entered.
A new R Script window is opened from the menu File -> New File -> R Script.
The source code is typed in the script window.
The code is executed by clicking ‘Run’ in the same window or by pressing “Control + Enter” on the keyboard.
Second Window
The second window is the console window.
Code typed in the script window is executed in the console window.
In this window, code is executed at the R prompt, which is shown as ‘>’.
Third Window
The third window is the environment/history window.
The Environment tab of this window displays the names of all the variables and dataframes that are currently defined in the R session.
Fourth Window
The fourth window shows the list of files and packages.
The Files tab provides access to the file directory of the hard drive.
Users can navigate to a folder and set it as the working directory by clicking ‘More’ and then ‘Set As Working Directory’.
SET WORKING DIRECTORY IN R
The working directory is the folder path on the computer that serves as the default location for reading and saving files in R.
setwd("D:/datascience/yourfoldername")
The working directory can also be set using the menu Session -> Set Working Directory -> Choose Directory.
COMMENT STATEMENTS IN R
Comment statements are typed with # in the beginning.
Any statement starting with the # symbol is ignored and will not be executed.
It is always a good practice to write explanations about the code using # statements.
Integer: Integers are data types that hold numbers without decimal digits. Integer variables can be created by appending the letter ‘L’ to the number, or by using the as.integer() function.
Complex: Complex data types contain two numeric values, a real part and an imaginary part marked with ‘i’.
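Arithmetic Operator   Description
+                     Addition
-                     Subtraction
*                     Multiplication
/                     Division
^                     Exponent
%%                    Modulus operator. Outputs the remainder after division.
%/%                   Integer division

Assignment Operator   Description
<-                    Leftward assignment (the value on the right is assigned to the name on the left)
<<-                   Leftward assignment to a variable in an enclosing environment
=                     Assignment
->                    Rightward assignment (the value on the left is assigned to the name on the right)
->>                   Rightward assignment to a variable in an enclosing environment

Relational Operator   Description
>=                    Greater than or equal to
==                    Equal to
!=                    Not equal to

Logical Operator      Description
&&                    Logical AND (compares the first element in vector)
|                     Element-wise logical OR
||                    Logical OR (compares the first element in vector)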
BUILT-IN FUNCTIONS IN R
Mathematical          Description
abs()                 Absolute value
log(x, base=y)        Logarithm of x with base y; if base is not specified, returns the natural logarithm
exp(x)                Exponential value
sqrt(x)               Square root of the data
factorial(x)          Factorial of data (x!)
sign()                Sign of the data

General Purpose       Description
range(x)              Minimum and maximum values
sort(x)               Sorts the data in ascending order
order(x)              Returns the positions of the values in sorted order
length()              Length of the dataset
tolower()             Converts letters to lowercase
toupper()             Converts letters to uppercase
ls()                  Lists the objects defined in the current environment

Statistical           Description
mean(x)               Mean of x
median(x)             Median of x
var(x)                Variance of x
sd(x)                 Standard deviation of x
quantile(x)           Returns the quartiles of the dataset
USER DEFINED FUNCTIONS IN R
Users can write their own functions for any specific purpose.
Operation: Creating a list
> firstlist <- list("Green", 'Yellow', c(102, 110, 210), c(FALSE, TRUE, FALSE), 128.5, 10.6)

Operation: Accessing the first element from the list
> firstlist[1]
[[1]]
[1] "Green"

Operation: Accessing the third element from the list
> firstlist[3]
[[1]]
[1] 102 110 210

Operation: Accessing the first three elements from the list
> firstlist[1:3]
[[1]]
[1] "Green"
[[2]]
[1] "Yellow"
[[3]]
[1] 102 110 210
DATAFRAMES IN R
A dataframe is a two-dimensional array-like object represented in a tabular format.
Each column in the data frame represents a variable and each row represents one record, that is, one set of values of the variables.
The number of rows gives the number of observations in a data frame.
The column names cannot be empty.
A dataframe can be expanded by adding columns and rows.
R also provides options for slicing and dicing the data frame by rows and columns.
PYTHON LANGUAGE
Python is a general purpose language developed by Guido van Rossum.
Python is used in data science, web programming, game development, desktop
applications and many other scientific applications.
Python is a scripting language with a structured programming style.
Python has built-in dynamic data structures.
Python has database connectivity interfaces to MySQL, Oracle, PostgreSQL etc.
Python is an extensible language that can be easily integrated with other
programming languages such as C, C++ etc.
Python has libraries for internet data formats such as HTML, XML and JSON, which help in web application development.
PYTHON ENVIRONMENT
Anaconda is a widely used Python distribution for data science and comes pre-loaded with many of the most popular libraries and tools.
Anaconda Navigator is a desktop GUI that comes along with Anaconda Individual
Edition.
The Jupyter notebook application allows you to create and edit documents that
display the input and output of a Python or R language script.
Jupyter notebook integrates code and its output into a single document that combines
visualizations, narrative text, mathematical equations, and other rich media.
MENUS IN JUPYTER NOTEBOOK
The File menu is used to create a new notebook or open an existing notebook.
The Make a Copy option creates a copy of the file.
The Save and Checkpoint option saves the notebook and updates its checkpoint.
The Edit menu is used to cut, copy and paste cells, and also has options to delete, split or merge cells.
The View menu toggles the visibility of the header and toolbar. Cells can be viewed as a slideshow using this menu.
The Insert menu is used for inserting cells above or below the currently selected cell.
The Cell menu allows the user to run a cell, a group of cells, or the cells above or below the current cell.
Headers can be created in Markdown cells by using # symbol.
The Kernel menu allows the user to restart the kernel, reconnect to it or shut it down.
Kernels are processes that run interactive code and return output to the user.
Kernels are available for different programming languages.
A user can restart the kernel and clear all outputs so that the commands can be executed afresh.
Widgets are used to build interactive graphical user interface elements, such as sliders or toolbars, and dynamic dashboards.
SET WORKING DIRECTORY IN PYTHON
The Python function chdir() changes the current working directory to the given path.
The os library in Python provides functions for interacting with the operating system.
The chdir() function is available in the os library.
import os
os.chdir("D:/datascience/yourfoldername")
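The effect of chdir() can be checked with os.getcwd(). The following is a minimal sketch under stated assumptions: a temporary folder stands in for a real project folder, and the file name notes.txt is hypothetical.

```python
import os
import tempfile

# Remember the starting directory so it can be restored afterwards.
original_dir = os.getcwd()

# A temporary folder stands in for a real project folder (assumption).
with tempfile.TemporaryDirectory() as project_dir:
    os.chdir(project_dir)
    print(os.getcwd())  # now points at the temporary folder

    # Relative paths are resolved against the working directory,
    # so this file (hypothetical name) is created inside project_dir.
    with open("notes.txt", "w") as handle:
        handle.write("saved in the working directory")
    saved_in_project = os.path.exists(os.path.join(project_dir, "notes.txt"))

    # Move back before the temporary folder is deleted.
    os.chdir(original_dir)

print(saved_in_project)
```

Restoring the original directory at the end keeps later relative paths in the same session from pointing at the wrong folder.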
COMMENT STATEMENTS IN PYTHON
As in the R programming language, comment statements are written with the # symbol at the beginning.
Comment statements are not executed.
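A short illustration of how comments behave in Python is sketched below. The variable names and values are made up for the example.

```python
# This whole line is a comment and is ignored by the interpreter.
radius = 2.0  # A comment can also follow a statement on the same line.

# The next statement is commented out, so radius keeps its value of 2.0.
# radius = 10.0

area = 3.14159 * radius ** 2  # approximate area of a circle
print(area)
```

Commenting out a statement, as done with the second assignment, is a common way to disable code temporarily without deleting it.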
Assignment Operator   Description
%=                    Calculates modulus using left and right operands and assigns the result to the left operand
//=                   Divides the left operand by the right operand and assigns the quotient to the left operand
**=                   Calculates exponent value using the operands and assigns the result to the left operand

Arithmetic Operator   Description
%                     Modulus. Gives the remainder after division
**                    Exponentiation
//                    Floor division. Gives the quotient after division
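The modulus, floor division and exponentiation operators, together with their assignment forms, can be tried out as in the sketch below. The operand values 17 and 5 are arbitrary choices for illustration.

```python
a, b = 17, 5

remainder = a % b   # modulus: remainder after division
quotient = a // b   # floor division: quotient after division
power = b ** 2      # exponentiation

# The assignment forms apply the operator and store the result
# back in the left operand.
m = 17
m %= 5    # same as m = m % 5
q = 17
q //= 5   # same as q = q // 5
p = 5
p **= 2   # same as p = p ** 2

print(remainder, quotient, power)  # 2 3 25
print(m, q, p)                     # 2 3 25
```

Floor division rounds towards negative infinity, so -17 // 5 evaluates to -4 rather than -3.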
OPERATORS IN PYTHON
Relational Operator   Name
==                    Equal
!=                    Not equal
>                     Greater than
<                     Less than
>=                    Greater than or equal to
<=                    Less than or equal to
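Each relational operator returns a Boolean value (True or False). The sketch below evaluates all six on two arbitrary example values.

```python
x, y = 10, 20

# Each comparison evaluates to True or False.
results = {
    "==": x == y,
    "!=": x != y,
    ">": x > y,
    "<": x < y,
    ">=": x >= y,
    "<=": x <= y,
}

for operator, outcome in results.items():
    print(operator, outcome)
```

With x = 10 and y = 20, the operators !=, < and <= give True and the other three give False.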
import numpy as np
PANDAS LIBRARY
Pandas is a popular open-source Python library for data analysis.
It is built on top of the NumPy library.
Pandas is used for working with relational or labelled data. Pandas provides various data structures and operations for manipulating numerical data and time series.
It has many built-in methods for manipulating, grouping and extracting data.
Pandas can create data frames from Excel, Comma Separated Values (CSV) and many other formats.
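As a minimal sketch of reading CSV data and grouping it, the example below uses pd.read_csv() and groupby(). The CSV content, column names and values are invented for illustration and are held in a string so that no file on disk is needed.

```python
import io

import pandas as pd

# Invented sample data; a real project would pass a file path instead.
csv_text = """name,department,salary
Asha,Finance,52000
Ravi,Marketing,48000
Meena,Finance,61000
"""

# read_csv accepts a file path or any file-like object.
employees = pd.read_csv(io.StringIO(csv_text))
print(employees.shape)  # (3, 3)

# Group the rows by department and take the mean salary of each group.
mean_salary_by_dept = employees.groupby("department")["salary"].mean()
print(mean_salary_by_dept)
```

For Excel files, pd.read_excel() plays the same role, although it needs an additional Excel reader package to be installed.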
DATAFRAMES IN PANDAS
A Pandas DataFrame is a two-dimensional, heterogeneous tabular data structure with labeled axes (rows and columns).
A DataFrame object has two axes: “axis 0” and “axis 1”. “axis 0” represents rows and “axis 1” represents columns.
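The meaning of the two axes is easiest to see with an aggregation. In the sketch below, the marks and labels are made-up example data; sum(axis=0) adds down the rows to give one total per column, while sum(axis=1) adds across the columns to give one total per row.

```python
import pandas as pd

# Made-up marks for three students in two subjects.
scores = pd.DataFrame(
    {"maths": [80, 65, 90], "science": [70, 85, 95]},
    index=["s1", "s2", "s3"],
)

print(scores.shape)  # (3, 2): 3 rows along axis 0, 2 columns along axis 1

column_totals = scores.sum(axis=0)  # one value per column
row_totals = scores.sum(axis=1)     # one value per row

print(column_totals)
print(row_totals)
```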
SLICING AND DICING THE DATA FRAME
A column can be extracted from a data frame by giving its name in quotes inside square brackets [ ].
When multiple columns are extracted, the column names are provided as a list inside the brackets, which gives double square brackets [[ ]].
Columns can also be selected by providing the column indexes.
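The three ways of selecting columns described above can be sketched as follows. The data frame and its column names are invented for illustration.

```python
import pandas as pd

# Invented example data.
sales = pd.DataFrame({
    "region": ["North", "South", "East"],
    "units": [120, 95, 143],
    "price": [9.5, 11.0, 8.75],
})

# Single brackets with one column name return a Series.
units = sales["units"]

# Double brackets (a list of names) return a DataFrame.
subset = sales[["region", "price"]]

# Column positions can be used through iloc: all rows, first two columns.
by_position = sales.iloc[:, [0, 1]]

print(type(units).__name__)       # Series
print(list(subset.columns))       # ['region', 'price']
print(list(by_position.columns))  # ['region', 'units']
```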
.LOC AND .ILOC IN PYTHON
loc and iloc are two indexers used to retrieve rows and columns from a data frame.
DataFrame.loc is label-based.
DataFrame.loc[] takes index labels and returns rows or columns if the label exists in the data frame; a missing label raises a KeyError.
DataFrame.iloc[] is integer-position based. The rows and columns have to be specified by their integer index.
If the index label is not known, iloc is very useful for data filtering.
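The difference between the two can be seen on a small data frame with string row labels. The sketch below uses invented data; note that a label slice with loc includes both end points, whereas a position slice with iloc excludes the end position.

```python
import pandas as pd

# Invented example data with string row labels.
stock = pd.DataFrame(
    {"item": ["pen", "book", "bag"], "qty": [40, 15, 7]},
    index=["a", "b", "c"],
)

by_label = stock.loc["b", "qty"]  # row label "b", column label "qty"
by_position = stock.iloc[1, 1]    # second row, second column

label_slice = stock.loc["a":"b"]  # labels "a" and "b", end point included
position_slice = stock.iloc[0:2]  # positions 0 and 1, end point excluded

# A label that does not exist in the index raises a KeyError.
try:
    stock.loc["z"]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(by_label, by_position)  # 15 15
print(missing_label_raises)   # True
```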