
Week 1


Introduction to Python

Popular tools used in data science


 Data pre-processing and analysis
◦ Python, R, Microsoft Excel, SAS, SPSS

 Data exploration and visualization


◦ Tableau, QlikView, Microsoft Excel

 Parallel and distributed computing in case of big data
◦ Apache Spark, Apache Hadoop

Python for Data Science 2


Evolution of Python
 Python was developed by Guido van Rossum in the late eighties at the ‘National Research Institute for Mathematics and Computer Science’ (CWI) in the Netherlands
 Python editions
◦ Python 1.0
◦ Python 2.0
◦ Python 3.0


Python as a programming language
 Supports multiple programming paradigms
◦ Functional, structured, object-oriented (OOP), etc.
 Dynamic typing
◦ Type safety is checked at runtime
 Reference counting
◦ Deallocates objects once nothing refers to them any longer
 Late binding
◦ Methods are looked up by name at runtime
 Python’s design is guided by the aphorisms of the Zen of Python by Tim Peters (20 were intended; 19 are written down)

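A minimal sketch of dynamic typing and late binding as described above. The names `value`, `Duck`, `Robot` and `speak` are hypothetical and chosen only for illustration.

```python
# Dynamic typing: the same name can refer to objects of different types,
# and the type is only checked when an operation actually runs.
value = 10
first_type = type(value).__name__      # 'int'
value = "ten"
second_type = type(value).__name__     # 'str'


# Late binding: speak() is looked up by name at call time,
# so any object that has a speak() method can be used here.
class Duck:
    def speak(self):
        return "quack"


class Robot:
    def speak(self):
        return "beep"


sounds = [obj.speak() for obj in (Duck(), Robot())]
print(first_type, second_type, sounds)
```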
Python as a programming language
 The standard CPython interpreter is managed by the Python Software Foundation
 Other interpreters include Jython (formerly JPython; Java), IronPython (C#), Stackless Python (C, used for concurrency) and PyPy (written in Python, with JIT compilation)
 Much of the standard library is written in Python itself
 High standards of readability


Python as a programming language
 Cross-platform (Windows, Linux, Mac)
 Supported by a large community
 Better error handling


Python as a programming language
 Comparison to Java
◦ Java is statically typed, i.e. type safety is checked during compilation
◦ Thus in Java more time is typically required to develop the code
◦ Python is dynamically typed and needs no separate compilation step, which saves the compilation time that Java requires
◦ Dynamically typed code tends to be less verbose, which helps readability


Advantages of using python
 Python has several features that make it well suited for data science
 Open source and community development
◦ Distributed under an Open Source Initiative approved license, making it free to use and distribute, even commercially
 Syntax is simple to understand and to write
 Libraries designed for specific data science tasks
 Combines well with the majority of cloud platform service providers

Coding environment
 A software program can be written using a terminal, a command prompt (cmd), a text editor or an Integrated Development Environment (IDE)
 The program needs to be saved in a file with an appropriate extension (.py for Python, .m for MATLAB, etc.) and can be executed in the corresponding environment (Python, MATLAB, etc.)
 An Integrated Development Environment (IDE) is a software product developed solely to support software development in various or specific programming language(s)

Coding environment
 Python 2.x support ended in 2020
 Python 3.x is an enhanced version of 2.x; post 2020 only 3.x is maintained, from 3.6.x onward
 Install a basic Python version or use the online Python console at https://www.python.org/
 Execute the following commands and view the outputs in a terminal or command prompt
• Basic print statement
• Naming conventions for variables and functions, operators
• Conditional operations, looping statements (nested)
• Function declaration and calling
• Installing modules

[Screenshots: https://www.python.org/]


Integrated development environment (IDE)
 Software application consisting of a cohesive unit of tools required for development
 Designed to simplify software development
 Utilities provided by IDEs include tools for managing, compiling, deploying and debugging software


Coding environment- IDE
 An IDE usually comprises
◦ Source code editor
◦ Compiler
◦ Debugger
◦ Additional features such as syntax and error highlighting, code completion
 Offers support for building and executing the program, along with debugging the code, from within the environment


Coding environment- IDE
 The best IDEs provide version control features
 Eclipse+PyDev, Sublime Text, Atom, GNU Emacs, Vi/Vim, Visual Studio and Visual Studio Code are general-purpose IDEs and editors with Python support
 Apart from these, Python-specific environments include PyCharm, Jupyter, Spyder and Thonny


Spyder
 Supported across Linux, Mac OS X and Windows platforms
 Available as an open source version
 Can be installed separately or through the Anaconda distribution
 Developed for Python and specifically for data science
 Features include
◦ Code editor with robust syntax and error highlighting
◦ Code completion and navigation
◦ Debugger
◦ Integrated documentation
 Interface similar to MATLAB and RStudio

[Screenshot: Spyder]


PyCharm
 Supported across Linux, Mac OS X and Windows platforms
 Available as community (free open source) and professional (paid) version
 Supports only Python
 Can be installed separately or through Anaconda distribution

 Features include
◦ Code editor provides syntax and error highlighting
◦ Code completion and navigation
◦ Unit testing
◦ Debugger
◦ Version control



Jupyter Notebook
 Web application that allows creation and manipulation of documents called ‘notebooks’
 Supported across Linux, Mac OS X and Windows platforms
 Available as an open source version

Source: https://jupyter.org/

Jupyter Notebook
 Bundled with the Anaconda distribution or can be installed separately
 Supports Julia, Python, R and Scala
 Consists of an ordered collection of input and output cells that contain code, text, plots etc.
Source: https://jupyter.org/


Jupyter Notebook
 Allows sharing of code and narrative text through output formats like PDF, HTML etc.
◦ Education and presentation tool
 Lacks most of the features of a good IDE
Source: https://jupyter.org/


How to choose the best IDE?
 Requirements
 Working with different IDEs helps us understand our own
requirement

Introduction to Spyder
In this lecture
 How does Spyder look?
 How to set the working directory?
 How to create a Python file and save it?


Appearance of Spyder
[Screenshots: the Spyder window running Python version 3.6, with panes labelled Scripts, Files/Variables/Help and Console]


Setting working directory
 There are three ways to set a working directory
◦ Icon
◦ Using library os
◦ Using command cd


Setting working directory
Method 1
 To choose a working directory, click on the icon
 Choose a suitable location by clicking on the indicated icon


Setting working directory
 Type the following in the console
Method 2: [code screenshot using library os not recovered]
Method 3: cd C:/Users/DELL/Desktop
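The slide's own code for the os-based method is not recoverable, so the following is only a sketch of one common way to do it with the standard library functions `os.chdir` and `os.getcwd`. The `target` folder is a stand-in (the system temporary folder) so the example runs anywhere; in practice it would be a path such as the Desktop folder used for the cd method.

```python
import os
import tempfile

# Hypothetical target folder: the system temp folder is used here only so
# the example runs on any machine. Replace it with your own folder.
target = tempfile.gettempdir()

os.chdir(target)         # change the working directory
current = os.getcwd()    # read it back to confirm the change
print(current)
```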


Accessing file explorer
[Screenshot: click the file explorer pane to check for files after setting the working directory]

File creation


Creating a script file
 There are two ways of creating a script file
Method 1: By clicking the new file icon below the menu bar
Method 2: By clicking the “File” menu in the menu bar and selecting “New File”


Variable
 An identifier that holds a known piece of information
 The information is referred to as the value
 A variable name points to a memory address or storage location and is used to reference the stored value
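A small sketch of the idea that a variable name references a stored value. The names `a` and `b` and their values are illustrative only.

```python
a = 10               # the name 'a' references an object holding the value 10
b = a                # 'b' now references the same object as 'a'
same_object = a is b
print(id(a) == id(b), same_object)

a = 14               # 'a' is re-pointed to a new value; 'b' is unaffected
print(a, b)
```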


Creating variables
[Screenshot: variables created in a script]

Saving script files
[Screenshots: saving a script file, and saving a script file for the first time]


Summary
 Interface of Spyder
 Setting the working directory
 Create and save Python script file

Introduction to Spyder
In this lecture
 How to execute a Python file?
 How to execute pieces of code - Run?
 How to add comments?
 How to reset and clear the console?


File execution

Executing script files
To run the full code, either
1. Press ‘Run file’ from icon bar
2. Press F5
To run a chosen line, select the line and either
1. Press ‘Run selection’ from icon bar
2. Press Ctrl+Enter or F9


Executing script files using Run file/F5
[Screenshot: result of running the full file]

Executing script files using Run selection/F9
Step 1: Assign a new value of 14 to ‘a’ in the script and press F9
Step 2: Select line 2 and press F9
Step 3: Select line 3 and press F9
[Screenshots: console output after each step]


Commenting script files

Commenting lines of codes
 Adding comments helps in understanding the algorithms used while developing code
 In practice, comments are added before the code they describe and begin with a ‘#’
 Multiple lines can also be commented

Commenting multiple lines
 Select the lines that have to be commented and then press “Ctrl + 1”
 Alternatively, select “Edit” in the menu and select “Comment/Uncomment”
 Uses: to add descriptions, or to render lines of code inert during testing


Clearing console and environment

Clearing an overpopulated console
 Type %clear in the console, or
 Place the cursor on the console and press Ctrl+L
[Screenshot: the console after clearing]


Removing/deleting variable(s)
[Screenshot: the environment (variable explorer)]
 Using del followed by variable name
◦ Removing a single variable
◦ Removing multiple variables
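A short sketch of `del` for a single variable and for several variables at once. The variable names are hypothetical; the `try`/`except` is only there to show that the name no longer exists afterwards.

```python
a, b, c = 1, 2, 3

del a        # removing a single variable
del b, c     # removing multiple variables in one statement

# Referring to a deleted name raises NameError
try:
    a
    a_exists = True
except NameError:
    a_exists = False
print(a_exists)
```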


Clearing the entire environment at once
 There are two ways to clear the environment
Method 1: Type %reset in the console and type ‘y’ after the prompt
Method 2: Click the symbol to remove variables in the environment


Basic libraries in Python
 Basic libraries
◦ NumPy – Numerical Python
◦ Pandas – Data frames in Python
◦ Matplotlib – Visualization
◦ Sklearn (scikit-learn) – Machine learning
 Modules within a library. E.g.-


Help in Python
 Type the name of the library in ‘Object’; the Help pane then lists its sub-libraries
Note: You can check the details of a sub-library by typing libraryname.sublibraryname under ‘Object’
E.g. numpy.lib in ‘Object’


Summary
 Execute Python script file
 Commenting lines of code
 Clearing console and environment
 Basic libraries in Python

Variables and Data Types
In this lecture
 Naming variables
 Basic data types
◦ Identify data type of an object
◦ Verify if an object is of a certain data type
◦ Coerce object to new data type



Naming variables
 Values are assigned to variables using the assignment operator ‘=’
 A variable name should be short and descriptive
◦ Avoid using variable names that clash with inbuilt functions
 The name is designed to indicate the intent of its use to the end user
 Avoid one-character variable names
◦ One-character variable names are usually used in looping constructs, functions, etc.

Naming variables
 Variables can be named alphanumerically
 However, the first character must be a letter (lowercase or uppercase)


Naming variables
 Other special character
◦ Underscore ( _ )
◦ Use of any other special character will throw an error
◦ Variable names should not begin or end with an underscore, even though both are allowed
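The naming rules above can be checked with the built-in string method `str.isidentifier()`. This is only an illustrative sketch; the candidate names are made up.

```python
# Hypothetical candidate names, checked against Python's identifier rules
candidates = ["age", "Age2", "emp_salary", "2age", "emp-salary", "emp salary", "_age"]

valid = {name: name.isidentifier() for name in candidates}
for name, ok in valid.items():
    print(f"{name!r}: {'allowed' if ok else 'not allowed'}")
```

Names starting with a digit or containing a hyphen or space are rejected; a leading underscore is accepted, although the slide advises against it.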


Naming conventions
 Commonly accepted case types
◦ Camel (lower and upper)
◦ Snake
◦ Pascal


Assigning values to multiple variables
[Screenshots: code, and the values reflected in the environment]
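The slide's code is not recoverable, so here is a minimal sketch of assigning values to several variables in one statement. The variable names and values are hypothetical.

```python
# One statement, three variables: values are matched to names by position
physics, chemistry, mathematics = 75, 84.5, "absent"
print(physics, chemistry, mathematics)

# The same value can also be assigned to several names at once
x = y = z = 0
print(x, y, z)
```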


Data types

Basic data types

Basic data types   Description                                                                  Values                                                               Representation
Boolean            represents two values of logic and associated with conditional statements   True and False                                                       bool
Integer            positive and negative whole numbers                                         set of all integers, Z                                               int
Complex            contains real and imaginary part (a+ib)                                     set of complex numbers                                               complex
Float              real numbers                                                                floating point numbers                                               float
String             sequence of characters                                                      all strings or characters enclosed between single or double quotes   str


Identifying object data type
 Find the data type of an object using type()
 Syntax: type(object)
[Screenshot: checking the data type of an object]
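A brief sketch of `type(object)` applied to one value of each basic data type. The variable names and values are hypothetical.

```python
employee_name = "Ram"    # str
age = 55                 # int
height = 153.4           # float
is_member = True         # bool
impedance = 2 + 3j       # complex

type_names = [type(v).__name__
              for v in (employee_name, age, height, is_member, impedance)]
print(type_names)
```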


Verifying object data type
 Verify if an object is of a certain data type
 Syntax: type(object) is datatype
[Screenshot: verifying the data type of an object]
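The syntax above returns a Boolean. A minimal sketch, with a hypothetical variable `age`:

```python
age = 55

age_is_int = type(age) is int        # True: age holds a whole number
age_is_float = type(age) is float    # False: it is not a float
print(age_is_int, age_is_float)
```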


Coercing object to new data type
 Convert the data type of an object to another
 Syntax: datatype(object)
 Changes can be stored in the same variable or in a different variable
[Screenshot: coercing the data type of an object]
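A short sketch of `datatype(object)` coercion, storing each result in a different variable. The names and values are hypothetical.

```python
height = 153.4
age = 55

height_int = int(height)     # float to int drops the fractional part
age_float = float(age)       # int to float
age_text = str(age)          # int to str
print(height_int, age_float, age_text)
```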


Coercing object to new data type
 Only a few coercions are accepted
 Consider the variable ‘Salary_tier’, which is of string data type
 ‘Salary_tier’ contains an integer enclosed between single quotes
[Screenshot: coercing the data type of an object]


Coercing object to new data type
 However, if the value enclosed within the quotes is non-numeric text, then conversion to a number will not be possible
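A minimal sketch of both cases: a string holding an integer converts cleanly, while non-numeric text raises `ValueError`. `Salary_tier` comes from the slides; the value `'1'` and the variable `label` are assumptions for illustration.

```python
Salary_tier = '1'                  # an integer enclosed between single quotes
tier_as_int = int(Salary_tier)     # works: the text is a whole number

label = 'high'                     # hypothetical non-numeric string
try:
    int(label)
    converted = True
except ValueError:
    converted = False              # int('high') cannot be converted
print(tier_as_int, converted)
```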


Summary
 Conventions to name a variable
 Basic data types
◦ Get data type of a variable
◦ Verify if a variable is of a certain data type
◦ Coerce variable to new data type

Operators
In this lecture
 Operators and operands
 Different types of operators
◦ Arithmetic
◦ Assignment
◦ Relational or comparison
◦ Logical
◦ Bitwise
 Precedence of operators



Operators and operands
 Operators are special symbols that help in carrying out an assignment operation or an arithmetic or logical computation
 The value that the operator operates on is called the operand


Arithmetic operators
 Used to perform mathematical operations between two operands
 Create two variables a and b with values 10 and 5 respectively

Symbol   Operation
+        Addition
-        Subtraction
*        Multiplication
/        Division
%        Remainder
**       Exponent
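The example column of the slides is not recoverable; the following sketch applies each arithmetic operator to the slide's values a = 10 and b = 5.

```python
a = 10
b = 5

results = {
    "a + b": a + b,      # addition
    "a - b": a - b,      # subtraction
    "a * b": a * b,      # multiplication
    "a / b": a / b,      # division always returns a float
    "a % b": a % b,      # remainder
    "a ** b": a ** b,    # exponent
}
for expression, value in results.items():
    print(expression, "=", value)
```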


Hierarchy of arithmetic operators
 Example: A = 7 − 2 × 27/3² + 4

Decreasing order of precedence                                            Operation
Parentheses                                                               ()
Exponent                                                                  **
Multiplication and division (same precedence, evaluated left to right)   *, /
Addition and subtraction                                                  +, -
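A small sketch of how Python evaluates such an expression, assuming the slide's example reads A = 7 − 2 × 27/3² + 4. Note that in Python `*` and `/` share one precedence level and are evaluated left to right.

```python
# Exponent first (3 ** 2 = 9), then * and / left to right (2 * 27 / 9 = 6.0),
# then subtraction and addition left to right (7 - 6.0 + 4).
A = 7 - 2 * 27 / 3 ** 2 + 4

# The same order of evaluation spelled out with parentheses
A_explicit = (7 - ((2 * 27) / (3 ** 2))) + 4

# * and / have equal precedence: evaluated left to right
left_to_right = 8 / 2 * 4        # (8 / 2) * 4, not 8 / (2 * 4)
print(A, A_explicit, left_to_right)
```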


Assignment operators
 Used to assign values to variables

Symbol   Operation
=        Assigns the value of the right side operand to the left side operand
+=       Adds right operand to left operand and stores result on left side operand (a=a+b)
-=       Subtracts right operand from left operand and stores result on left side operand (a=a-b)
*=       Multiplies left operand by right operand and stores result on left side operand (a=a*b)
/=       Divides left operand by right operand and stores result on left side operand (a=a/b)
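A sketch of the assignment operators applied one after another, assuming the starting values a = 10 and b = 5 used for the arithmetic operators. The `history` list is only there to record each intermediate value.

```python
a = 10
b = 5
history = []

a += b              # a = a + b
history.append(a)
a -= b              # a = a - b
history.append(a)
a *= b              # a = a * b
history.append(a)
a /= b              # a = a / b (the result is a float)
history.append(a)
print(history)
```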


Relational or comparison operators
 Tests numerical equalities and inequalities between two operands and returns a boolean value
 All operators have the same precedence
 Create two variables x and y with values 5 and 7 respectively

Symbol   Operation
<        Strictly less than
<=       Less than or equal to
>        Strictly greater than
>=       Greater than or equal to
==       Equal to
!=       Not equal to
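The example column of the slides is not recoverable; this sketch evaluates each comparison for the slide's values x = 5 and y = 7.

```python
x = 5
y = 7

comparisons = {
    "x < y": x < y,
    "x <= y": x <= y,
    "x > y": x > y,
    "x >= y": x >= y,
    "x == y": x == y,
    "x != y": x != y,
}
for expression, value in comparisons.items():
    print(expression, "->", value)
```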


Logical operators
 Used when operands are conditional statements and returns a boolean value
 In Python, logical operators are designed to work with scalars or boolean values

Symbol   Operation
or       Logical OR
and      Logical AND
not      Logical NOT
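A minimal sketch of the three logical operators on conditional statements, assuming the values x = 5 and y = 7 from the comparison slides.

```python
x = 5
y = 7

either = (x > y) or (x < y)      # True: the second condition holds
both = (x > y) and (x < y)       # False: the first condition fails
negated = not (x == y)           # True: x and y are not equal
print(either, both, negated)
```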


Bitwise operators
 Used when operands are integers
 Integers are treated as a string of binary digits
 Operates bit by bit
 Can also operate on conditional statements which compare scalar values or arrays
 Bitwise OR (|), AND (&)


Bitwise operators
 Create two variables x and y with values 5 and 7 respectively
 Binary code for 5 is 0000 0101 and for 7 is 0000 0111
 0 corresponds to False and 1 corresponds to True
 In bitwise OR ( | ), the operator copies a bit to the result if it exists in either operand
 In bitwise AND ( & ), the operator copies a bit to the result if it exists in both operands


Bitwise OR on integers
[Screenshot: code and output in console]

Binary code for 5    0 0 0 0 0 1 0 1
Binary code for 7    0 0 0 0 0 1 1 1
Result (5 | 7)       0 0 0 0 0 1 1 1

 Positions 1-5 contain 0 in both operands, therefore the resultant cells also contain 0
 In the 6th position, 1 is present in both operands and hence the resultant will also contain 1
 The 7th position has 0 in the first operand and 1 in the second; since this is an OR operator, only the True condition is considered and the resultant contains 1
 In the 8th position, 1 is present in both operands, so the resultant contains 1

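The bit-by-bit walk-through can be reproduced in code. This sketch uses the slide's values x = 5 and y = 7; the helper `bits` is a hypothetical name introduced here to print 8-digit binary strings.

```python
x = 5
y = 7


def bits(n):
    """Return n as an 8-digit binary string (helper for display only)."""
    return format(n, "08b")


or_result = x | y      # a bit is set if it is set in either operand
and_result = x & y     # a bit is set only if it is set in both operands

print(bits(x), "|", bits(y), "=", bits(or_result), "->", or_result)
print(bits(x), "&", bits(y), "=", bits(and_result), "->", and_result)
```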
Bitwise operators
 Bitwise operators can also operate on conditional statements

Symbol   Operation
|        Bitwise OR
&        Bitwise AND
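A sketch of bitwise operators on conditional statements, assuming x = 5 and y = 7 as before. The parentheses matter: `|` and `&` bind more tightly than the comparison operators, so each condition is wrapped before it is combined.

```python
x = 5
y = 7

# Each comparison yields a Boolean; | and & then combine the two Booleans
or_of_conditions = (x < y) | (x == y)     # True | False
and_of_conditions = (x < y) & (x == y)    # True & False
print(or_of_conditions, and_of_conditions)
```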


Precedence of operators

Decreasing order of precedence                  Operation
Parentheses                                     ()
Exponent                                        **
Multiplication and division (same precedence)   *, /
Addition and subtraction                        +, -
Bitwise AND                                     &
Bitwise OR                                      |
Relational/comparison operators                 ==, !=, >, >=, <, <=
Logical NOT                                     not
Logical AND                                     and
Logical OR                                      or
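A few illustrative expressions showing the precedence order at work. The expressions and variable names are made up for this sketch; each is paired with a fully parenthesised version that evaluates the same way.

```python
# Exponent before multiplication before addition
arithmetic = 2 + 3 * 4 ** 2
arithmetic_explicit = 2 + (3 * (4 ** 2))

# Bitwise AND before bitwise OR
bitwise = 5 | 3 & 1
bitwise_explicit = 5 | (3 & 1)

# Comparisons first, then not, then and, then or
logical = not 5 > 7 and 3 < 4 or False
logical_explicit = ((not (5 > 7)) and (3 < 4)) or False

print(arithmetic, bitwise, logical)
```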


Summary
 Important operators
◦ Arithmetic
◦ Assignment
◦ Relational
◦ Logical
◦ Bitwise

4/19/2020 Operators - Jupyter Notebook

Operators

Unary Operators

+, -, *, /, //, ** and % are known as operators and these operators can be unary or binary
A unary operator has only one operand
The unary - (minus) operator yields the negation of its numeric argument
The unary + (plus) operator yields its numeric argument unchanged
The unary ~ (invert) operator yields the bit-wise inversion of its integer argument

Example

The - (minus) operator is used to negate any positive number

In [1]:

-15 # in this case the - (minus) operator is acting as a unary operator

Out[1]:

-15

In [2]:

100 - 40 # The -(minus) operator is acting as a binary operator

Out[2]:

60
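The notebook shows only the unary minus. As a small additional sketch, the unary + and ~ operators behave as follows; for an integer n, ~n equals -n - 1. The variable names are illustrative.

```python
n = 15

unchanged = +n      # unary plus leaves the value unchanged
negated = -n        # unary minus negates the value
inverted = ~n       # bitwise inversion: equal to -n - 1
print(unchanged, negated, inverted)
```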

Identity operators
Identity operators are used to check whether two objects are the same object at the same memory location
They are often used to check the data type of a variable
The identity operators are 'is' and 'is not'

‘is’ operator - evaluates to true if the variables on either side of the operator
point to the same object and false otherwise


In [3]:

a = 15
if type(a) is float:
    print("true")
else:
    print("false")  # prints false because the data type of 'a' is 'int', not 'float'

false

‘is not’ operator - evaluates to false if the variables on either side of the operator
point to the same object and true otherwise

In [4]:

b = 15.6
if type(b) is not float:
    print("true")
else:
    print("false")  # prints false because the data type of 'b' is 'float', so 'is not float' is False

false
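Beyond type checks, identity differs from equality. In this sketch (list names are hypothetical), two lists with equal contents are still distinct objects:

```python
first = [1, 2, 3]
alias = first            # a second name for the same list object
copy = [1, 2, 3]         # a different object with equal contents

same_as_alias = first is alias       # True: both names point to one object
same_as_copy = first is copy         # False: two separate objects
equal_to_copy = first == copy        # True: the contents are equal
not_same = first is not copy         # True
print(same_as_alias, same_as_copy, equal_to_copy, not_same)
```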

Membership Operators

Membership operators are operators used to validate the membership of a value in a sequence
The membership operators are 'in' and 'not in'

'in' operator - checks if a value exists in a sequence or not

It evaluates to true if it finds the value in the specified sequence and false otherwise

In [5]:

x = [1,2,3,4,5]

print(4 in x) # returns True because 4 exists in x

True

‘not in’ operator - evaluates to true if it does not find a variable in the specified
sequence and false otherwise

In [6]:

y = [1,2,3,4,5]

print(8 not in y) # returns True because 8 doesn't exist in y

True


Bitwise Operators

END OF SCRIPT
