2. Python for Data Science

The document covers the fundamentals of Python for data science, including its versatility, variable usage, expressions, string operations, and data structures like lists and tuples. It highlights Python's high-level programming features, ease of learning, and applications in data analytics, web scraping, and machine learning. Additionally, it provides guidance on installation, using Jupyter Notebook and Google Colab, and introduces Python operators and string operations.

Uploaded by

qwerty123

PYTHON FOR DATA SCIENCE
Learning Objectives
UNDERSTANDING THE FUNDAMENTALS OF PYTHON AND ITS VERSATILITY.

UNDERSTANDING VARIABLES, THEIR USAGE, AND IMPORTANCE WHILE CODING.

DIFFERENT TYPES OF EXPRESSIONS AND THEIR USE IN ARITHMETIC OPERATIONS.

VARIOUS METHODS OF STRING OPERATIONS.

PROPERTIES OF LIST AND TUPLE. VARIOUS OPERATIONS PERFORMED ON BOTH LIST AND TUPLE.

CONDITION-BASED USE OF “IF” STATEMENTS AND OF “FOR” AND “WHILE” LOOPS.

FILE HANDLING AND ITS MANIPULATION.


Introduction – Python Basics: Why Python?
• It's more productive.
• It has an expansive, supportive community of users (over 137,000 libraries).
• It has high corporate demand.
• High level (human friendly rather than computer friendly).
• General purpose (applicable to a wide range of problems).
• Open source and free.
• Easy to learn.
• Cross-platform compatibility and integration capabilities.
• Rapid prototyping and development.
Introduction – Programming Language
A programming language is a way for programmers (developers) to communicate with computers.

High-Level Language:
A high-level language is any programming language that allows program development in a much more user-friendly programming context and is generally independent of the computer's hardware architecture.
E.g.: Python, JavaScript, VBA, PHP, C#, Java.

Low-Level Language:
A low-level language is a programming language that works directly with the hardware elements and limitations of a computer. It has either a low level of abstraction in relation to the computer or no abstraction at all.
E.g.: Assembly and machine code. C, C++, and Fortran sit closer to the hardware than Python, although they are usually classed as high-level languages.
How Python is a high-level Programming Language
Example of a high-level programming language: the code below emails a sales report from the system.

#Example of a Script
import yagmail

user = yagmail.SMTP(user='[email protected]', password='zpsgowuwqlansmfv')

user.send(to='[email protected]', subject='Sales_Reports', contents='This is test mail for python automation',
          attachments='Sales.xlsx')

From the above piece of code it is easy to guess what each line does:
Line 1 imports the yagmail library.
Line 2 sets the sender mail ID '[email protected]' and the password 'zpsgowuwqlansmfv'.
Line 3 sends the email to '[email protected]' with the subject Sales_Reports, the given body text, and the file Sales.xlsx as an attachment.

ChatGPT Prompt: Explain how the code <Paste code here> works in Python.
Introduction – Use case scenarios
• Data analytics
• Office automation
• Web scraping
• Machine learning
• Share market (data analytics)
• Application development
• Web applications (Django and Flask)

For further reading, refer to
https://brochure.getpython.info/media/releases/python-brochure-current
How web scraping works! - Refer to the RPA chapter for more examples
Methodology for DA and ML
Some ways to run Python code

Jupyter Notebook deployment mode

.py file – IDLE mode

Colab – Google (https://colab.google/)

For further reading, students may refer to:

- IDLE - https://docs.python.org/3/library/idle.html
- Jupyter Notebook - https://jupyter.org/try
- Google Colab - https://colab.google/
How to Install Python

Go to https://www.python.org/downloads/
Download the installer that matches your operating system and the Python version you need.
12. Pip in Python
• pip is the package installer for Python. The name stands for "Pip Installs Packages" (originally "Pip Installs Python"). It is a tool used to install and manage Python packages (libraries or modules) that are not part of the Python standard library.
• E.g.:
  pip install package_name
  pip install Django
  pip install pandas

Note : Python Libraries are discussed in Module 2


13. Jupyter Notebook Deployment Mode

• pip install jupyterlab
• pip install notebook
14. PYTHON IDLE
IDLE is Python's Integrated Development and Learning Environment. It is bundled with the standard Python installer and runs on Windows, macOS, and Linux.
• Multi-window text editor with syntax highlighting, auto-completion, smart indent, and other features.
• Python shell with syntax highlighting.
• Integrated debugger with stepping, persistent breakpoints, and call stack visibility.

- IDLE - https://docs.python.org/3/library/idle.html
15. COLAB - GOOGLE
• To use notebooks with Google Colab, refer to this URL:
• https://colab.google/


Introduction - Python Comments and Variables
Python Comments: Comments can be used to explain Python code. Comments can be used to
make the code more readable. Comments can be used to prevent execution when testing
code.
Example: #This is a comment
print("Hello, ICAI!")

Variables: Variables are containers for storing data values. Python has no command for
declaring a variable. A variable is created the moment you first assign a value to it.
Example: x = 5
y = "John"
print(x)
print(y)
For more on variable naming, refer to
https://www.w3schools.com/python/python_variables_names.asp
Introduction - Python Comments and Variables
Exercise:
• Comments – (INT18)
• Variables – (INT19)


Introduction - Python Indentation
• Indentation is a very important concept in Python: if the code is not indented properly, Python raises an IndentationError and the code will not run.
• To indicate a block of code in Python, you must indent each line of the block by the same amount of whitespace.
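The indentation rule can be seen in a short sketch. The function name `describe` and the sample source string are made up for illustration; `compile()` is used only so that the badly indented code can be shown without stopping the script.

```python
def describe(number):
    # Every line of this block is indented by the same four spaces.
    if number > 0:
        return "positive"
    return "zero or negative"

# Source text with a missing indent under "if"; Python rejects it.
bad_source = "if True:\nprint('no indent')\n"
try:
    compile(bad_source, "<example>", "exec")
    error_name = None
except IndentationError as exc:
    error_name = type(exc).__name__

print(describe(5))
print(error_name)
```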
Introduction - Python Operators
Python Operators: Operators are used to perform operations on variables and values.
Python divides the operators into the following groups:
• Arithmetic operators
• Assignment operators
• Comparison operators
• Logical operators
• Identity operators
• Membership operators
• Bitwise operators
Introduction - Python Operators
Arithmetic Operators: Arithmetic operators are used with numeric values to perform
common mathematical operations.
Operator   Name             Example
+          Addition         x + y
-          Subtraction      x - y
*          Multiplication   x * y
/          Division         x / y
%          Modulus          x % y
**         Exponentiation   x ** y
//         Floor division   x // y
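As a quick illustration of the table above, the sketch below applies each arithmetic operator to two sample values. The values 17 and 5 are arbitrary choices, not taken from the slides.

```python
x, y = 17, 5

total = x + y         # 22
difference = x - y    # 12
product = x * y       # 85
quotient = x / y      # true division always returns a float
remainder = x % y     # 2
power = x ** 2        # 289
floor_q = x // y      # 3
neg_floor = -x // y   # -4: floor division rounds toward negative infinity

print(total, difference, product, quotient, remainder, power, floor_q, neg_floor)
```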
Introduction - Python Operators
Assignment Operators: Assignment operators are used to assign values to variables:

Operator   Example    Same As
=          x = 5      x = 5
+=         x += 3     x = x + 3
-=         x -= 3     x = x - 3
*=         x *= 3     x = x * 3
/=         x /= 3     x = x / 3
%=         x %= 3     x = x % 3
//=        x //= 3    x = x // 3
**=        x **= 3    x = x ** 3
&=         x &= 3     x = x & 3
|=         x |= 3     x = x | 3
^=         x ^= 3     x = x ^ 3
>>=        x >>= 3    x = x >> 3
<<=        x <<= 3    x = x << 3
Introduction - Python Operators
Comparison Operators: Comparison operators are used to compare two values:
Operator   Name                       Example
==         Equal                      x == y
!=         Not equal                  x != y
>          Greater than               x > y
<          Less than                  x < y
>=         Greater than or equal to   x >= y
<=         Less than or equal to      x <= y

Logical Operators: Logical operators are used to combine conditional statements:

Operator   Description                                                 Example
and        Returns True if both statements are true                    x < 5 and x < 10
or         Returns True if at least one of the statements is true      x < 5 or x < 4
not        Reverses the result; returns False if the result is true    not(x < 5 and x < 10)
Introduction - Python Operators
Identity Operators: Identity operators are used to compare objects, not to check whether they are equal, but whether they are actually the same object, with the same memory location:
Operator   Description                                              Example
is         Returns True if both variables are the same object       x is y
is not     Returns True if both variables are not the same object   x is not y

Membership Operators: Membership operators are used to test whether a value is present in a sequence or other container:
Operator   Description                                                        Example
in         Returns True if the specified value is present in the object       x in y
not in     Returns True if the specified value is not present in the object   x not in y
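The difference between equality and identity is easiest to see with two lists that hold the same values. This is a minimal sketch; the variable names are illustrative only.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate list object
c = a           # a second name for the same object

same_value = (a == b)    # compares contents
same_object = (a is b)   # compares identity
alias = (a is c)

has_two = 2 in a
missing = 9 not in a
substring = "ata" in "data science"   # "in" also works on strings

print(same_value, same_object, alias, has_two, missing, substring)
```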
Introduction - Python Operators - Precedence
Operator Precedence: Operator precedence describes the order in which operations are performed. Parentheses have the highest precedence, meaning that expressions inside parentheses are evaluated first. Example: print((6 + 3) - (6 + 3)).
If two operators have the same precedence, the expression is evaluated from left to right, except for exponentiation (**), which groups from right to left.
The precedence order is described in the table below, starting with the highest precedence at the top:
Operator                                        Description
()                                              Parentheses
**                                              Exponentiation
+x  -x  ~x                                      Unary plus, unary minus, and bitwise NOT
*  /  //  %                                     Multiplication, division, floor division, and modulus
+  -                                            Addition and subtraction
<<  >>                                          Bitwise left and right shifts
&                                               Bitwise AND
^                                               Bitwise XOR
|                                               Bitwise OR
==  !=  >  >=  <  <=  is  is not  in  not in    Comparisons, identity, and membership operators
not                                             Logical NOT
and                                             Logical AND
or                                              Logical OR
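A few expressions make the precedence table concrete. The numbers below are arbitrary and chosen only so that each rule gives a visibly different result.

```python
r1 = 2 + 3 * 4             # multiplication before addition: 14
r2 = (2 + 3) * 4           # parentheses first: 20
r3 = 10 - 4 - 3            # same precedence, left to right: 3
r4 = 2 ** 3 ** 2           # exponentiation groups right to left: 2 ** 9 = 512
r5 = -2 ** 2               # ** binds tighter than unary minus: -4
r6 = not 1 > 2 and 3 > 2   # comparisons, then not, then and: True

print(r1, r2, r3, r4, r5, r6)
```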
Introduction - Python Operators

Exercise
• Arithmetic operators - (INT20)
• Comparison operators - (INT20)
• Logical operators - (INT20)
• Membership operators - (INT20)
Introduction – String operations
String: Python uses string operations to work with strings. Strings in python are surrounded by
either single quotation marks, or double quotation marks.
Method         Description
capitalize()   Converts the first character of the string to a capital (uppercase) letter
count()        Returns the number of occurrences of a substring in the string
index()        Returns the position of the first occurrence of a substring in a string
isalnum()      Checks whether all the characters in a given string are alphanumeric
isalpha()      Returns True if all characters in the string are alphabetic
isdecimal()    Returns True if all characters in the string are decimal characters
isdigit()      Returns True if all characters in the string are digits
islower()      Checks whether all cased characters in the string are lowercase
isnumeric()    Returns True if all characters in the string are numeric characters
join()         Returns a string formed by concatenating the items of an iterable
lower()        Converts all uppercase characters in a string into lowercase
replace()      Replaces all occurrences of a substring with another substring
startswith()   Returns True if a string starts with the given prefix
strip()        Returns the string with leading and trailing whitespace (or the given characters) removed
swapcase()     Converts all uppercase characters to lowercase and vice versa
title()        Converts the string to title case
upper()        Converts all lowercase characters in a string into uppercase
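The methods in the table can be tried on a single sample string. The sketch below is illustrative only; the sample text is an assumption, not taken from the exercises.

```python
text = "  python for data science  "

clean = text.strip()                          # remove leading/trailing spaces
capitalized = clean.capitalize()              # first letter only
titled = clean.title()                        # first letter of every word
a_count = clean.count("a")                    # occurrences of "a"
data_pos = clean.index("data")                # position of the first match
replaced = clean.replace("data", "web")
joined = "-".join(["2024", "01", "31"])       # join is called on the separator

checks = ("abc123".isalnum(), "abc123".isalpha(), "2024".isdigit())

print(capitalized, titled, a_count, data_pos, replaced, joined, checks)
```

Strings are immutable, so every method returns a new string and `text` itself is unchanged.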
Introduction – String operations
Escape character: To insert characters that are illegal in a string, use an escape character. An escape
character is a backslash \ followed by the character you want to insert.

Code    Result
\'      Single Quote
\\      Backslash
\n      New Line
\r      Carriage Return
\t      Tab
\b      Backspace
\f      Form Feed
\ooo    Octal value
\xhh    Hex value
Introduction – String operations
Exercise:
• Print a string – (INT01)
• Print a string with a variable – (INT02)
• Multiline Strings – (INT03)
• String position – (INT04)
• String slicing – (INT05)
• Modify string – (INT06)
• Concatenate string – (INT07)
• Escape character – (INT07A)
Introduction - Numeric
Numeric: The numeric data type in Python represents data that has a numeric value. A numeric value can be an integer, a floating-point number, or a complex number. These values are defined by the int, float, and complex classes in Python.
Integers – This value is represented by the int class. It contains positive or negative whole numbers (without fractions or decimals). In Python, there is no limit to how long an integer value can be. Example: X = 1
Float – This value is represented by the float class. It is a real number with a floating-point representation, specified with a decimal point. Optionally, the character e or E followed by a positive or negative integer may be appended to specify scientific notation. Example: Y = 2.8
Complex Numbers – A complex number is represented by the complex class. It is specified as (real part) + (imaginary part)j. For example: 2+3j
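The three numeric classes can be inspected with `type()`. This is a small sketch with arbitrary sample values.

```python
x = 1              # int
big = 2 ** 100     # ints grow as needed; there is no fixed size limit
y = 2.8            # float
sci = 1.5e3        # scientific notation, equal to 1500.0
z = 2 + 3j         # complex

type_names = (type(x).__name__, type(y).__name__, type(z).__name__)
real_part, imag_part = z.real, z.imag

print(type_names, big, sci, real_part, imag_part)
```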
Introduction - Numeric
Exercise
 Numeric examples – (INT08)



Introduction – Data Structure


Introduction - List
List: A list is a built-in Python data structure that stores an ordered sequence of items, which may be of different data types. Every item in the list has a position, known as its index. The index value begins at 0 and continues up to the final item (positive indexing). You can also retrieve elements from last to first using negative indexing, which starts at -1.
Lists are created using square brackets, with the elements placed inside them. If no elements are entered in the square brackets, an empty list is created.
Commonly used list operations include the list() constructor and the methods append(), insert(), extend(), sort(), reverse(), remove(), pop(), and clear().
Example: Var = [10, 20, 14]
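Starting from the example list above, the following sketch walks through indexing and the most common list methods. The extra values added to the list are arbitrary.

```python
values = [10, 20, 14]

first, last = values[0], values[-1]   # positive and negative indexing

values.append(7)         # add at the end       -> [10, 20, 14, 7]
values.insert(1, 99)     # add at position 1    -> [10, 99, 20, 14, 7]
values.extend([3, 5])    # add several items    -> [10, 99, 20, 14, 7, 3, 5]
values.remove(99)        # remove by value      -> [10, 20, 14, 7, 3, 5]
popped = values.pop()    # remove and return the last item
values.sort()            # sort in place, ascending
ascending = list(values) # copy before reversing
values.reverse()         # reverse in place

print(first, last, popped, ascending, values, len(values))
```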
Introduction - List
Exercise:
• List – (INT09)
• Modification of list – (INT10)
• Length of list - (INT11)
• Sorting of list - (INT11)
• Reverse of list - (INT11)
Introduction - Tuples
Tuples: A tuple is a built-in data structure in Python that is similar to a list but differs in one significant way: once elements are assigned to a tuple, they cannot be changed. Tuples are used to store an ordered series of elements and are defined by parentheses (). Commonly used tuple operations include the tuple() constructor and the methods count() and index(). Tuples allow duplicate values.
Example of a tuple: thistuple = ("apple", "banana", "cherry", "apple", "cherry")

Python Lists                                    Python Tuples
Lists are mutable.                              Tuples are immutable.
Since lists are mutable, they have several      Due to their immutability, tuples have fewer
built-in methods.                               built-in methods than other data types.
Iteration is comparatively slower.              Iteration is comparatively faster.
Lists are created using square brackets [].     Tuples are created using parentheses ().
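The sketch below reuses the example tuple to show count(), index(), immutability, and the concatenation and repetition operations mentioned in the exercises. The value "kiwi" is an arbitrary addition.

```python
thistuple = ("apple", "banana", "cherry", "apple", "cherry")

apple_count = thistuple.count("apple")     # duplicates are allowed
cherry_index = thistuple.index("cherry")   # position of the first "cherry"

# Assigning to an element fails because tuples are immutable.
try:
    thistuple[0] = "kiwi"
    changed = True
except TypeError:
    changed = False

combined = thistuple[:2] + ("kiwi",)   # concatenation builds a new tuple
repeated = ("a", "b") * 2              # repetition builds a new tuple
x, y = (3, 4)                          # unpacking, e.g. for coordinates

print(apple_count, cherry_index, changed, combined, repeated, x, y)
```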
Introduction - Tuples
Exercise
• Tuple – (INT12)
• Tuple iterations – (INT13)
• Tuple functions concatenate and repeating – (INT14)
• Tuple coordinates – (INT15)
Introduction – Dictionary
Dictionary: A dictionary in Python is a data structure that stores values in key: value pairs. Commonly used dictionary methods include get(), keys(), values(), items(), update(), pop(), and clear().
Example: Dict = {1: 'ICAI', 2: 'For', 3: 'CA'}
As you can see from the example, data is stored in key: value pairs in dictionaries, which makes it easier to find values.
Dictionary vs JSON

JSON: JSON keys can only be strings.
Dictionary: The dictionary's keys can be any hashable object.

JSON: The keys in JSON are ordered sequentially and can be repeated.
Dictionary: The keys in the dictionary cannot be repeated and must be distinct.

JSON: The keys in JSON have a default value of undefined.
Dictionary: There is no default value in dictionaries.

JSON: The values in a JSON file are accessed by using the "." (dot) or "[]" operator.
Dictionary: The subscript operator is used to access the values in the dictionary. For example, if dict = {'A': '123R', 'B': '678S'}, we can retrieve the related data by simply calling dict['A'].

JSON: For the string object, we must use double quotation marks.
Dictionary: For string objects, we can use either single or double quotation marks.

JSON: In JSON, the return object type is a 'string' object type.
Dictionary: The 'dict' object type is the return object type in a dictionary.
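A short sketch can show both the dictionary methods and the dictionary-to-JSON round trip using the standard-library json module. The extra key 4 and the value "Students" are made up for illustration.

```python
import json

record = {1: "ICAI", 2: "For", 3: "CA"}

name = record[1]                      # subscript access by key
fallback = record.get(9, "missing")   # get() avoids a KeyError
record.update({4: "Students"})        # add or overwrite entries
keys = list(record.keys())

text = json.dumps(record)     # dict -> JSON text (a str)
restored = json.loads(text)   # JSON text -> dict

# JSON keys can only be strings, so the integer keys come back as "1", "2", ...
restored_keys = list(restored.keys())

print(name, fallback, keys, text, restored_keys)
```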
Introduction - Python Functions
Functions: A function is a block of code which only runs when it is called. You can pass data, known as parameters, into a function. A function can return data as a result. In Python a function is defined using the def keyword:
def my_function():
    print("Hello from a function")
Arguments in a Function
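The exercise topics (basic functions, default parameters, a variable number of arguments) can be sketched as follows. All function and parameter names here are hypothetical.

```python
def greet(name, greeting="Hello"):
    # "greeting" has a default value, so callers may omit it.
    return f"{greeting}, {name}!"

def total(*numbers):
    # *numbers collects any number of positional arguments into a tuple.
    return sum(numbers)

def describe(**details):
    # **details collects keyword arguments into a dictionary.
    return ", ".join(f"{key}={value}" for key, value in details.items())

print(greet("ICAI"))
print(greet("ICAI", greeting="Welcome"))
print(total(1, 2, 3, 4))
print(describe(course="Python", module=2))
```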
Introduction - Python Functions
Exercise:
• Function Basic – (INT21)
• Function with Operators - (INT22)
• Functions with default parameters - (INT23)
• Functions with Variable Number of Arguments - (INT24)
Regex
Regex: A RegEx, or Regular Expression, is a sequence of characters that
forms a search pattern. RegEx can be used to check if a string contains
the specified search pattern.
RegEx Functions:
Function   Description
findall    Returns a list containing all matches
search     Returns a Match object if there is a match anywhere in the string
split      Returns a list where the string has been split at each match
sub        Replaces one or many matches with a string
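The four functions in the table all live in the standard-library re module. The sketch below runs each of them on one sample sentence, which is an assumption chosen for illustration.

```python
import re

text = "The rain in Spain stays mainly in the plain"

all_ai = re.findall(r"ai", text)          # every occurrence of "ai"
first_space = re.search(r"\s", text)      # Match object for the first whitespace
space_pos = first_space.start()
parts = re.split(r"\s", text, maxsplit=2) # split at the first two whitespaces
dashed = re.sub(r"\s", "-", text)         # replace every whitespace with "-"

print(all_ai, space_pos, parts, dashed)
```

`re.search` returns `None` when nothing matches, so real code should check the result before calling `.start()`.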
Regex
Meta Characters:
Character   Description                                                                   Example
[]          A set of characters                                                           "[a-m]"
\           Signals a special sequence (can also be used to escape special characters)    "\d"
.           Any character (except newline character)                                      "he..o"
^           Starts with                                                                   "^hello"
$           Ends with                                                                     "planet$"
*           Zero or more occurrences                                                      "he.*o"
+           One or more occurrences                                                       "he.+o"
?           Zero or one occurrences                                                       "he.?o"
{}          Exactly the specified number of occurrences                                   "he.{2}o"
|           Either or                                                                     "falls|stays"
()          Capture and group
Regex
Special Sequences:
Character   Description                                                                                       Example
\A          Returns a match if the specified characters are at the beginning of the string                     "\AThe"
\b          Returns a match where the specified characters are at the beginning or at the end of a word        r"\bain", r"ain\b"
            (the "r" in the beginning makes sure that the string is treated as a "raw string")
\B          Returns a match where the specified characters are present, but NOT at the beginning               r"\Bain", r"ain\B"
            (or at the end) of a word (the "r" in the beginning makes sure that the string is treated
            as a "raw string")
\d          Returns a match where the string contains digits (numbers from 0-9)                                "\d"
\D          Returns a match where the string DOES NOT contain digits                                           "\D"
\s          Returns a match where the string contains a white space character                                  "\s"
\S          Returns a match where the string DOES NOT contain a white space character                          "\S"
\w          Returns a match where the string contains any word characters (characters from a to z and          "\w"
            A to Z, digits from 0-9, and the underscore _ character)
\W          Returns a match where the string DOES NOT contain any word characters                              "\W"
\Z          Returns a match if the specified characters are at the end of the string                           "Spain\Z"
Regex
Sets:
Set          Description
[arn]        Returns a match where one of the specified characters (a, r, or n) is present
[a-n]        Returns a match for any lower case character, alphabetically between a and n
[^arn]       Returns a match for any character EXCEPT a, r, and n
[0123]       Returns a match where any of the specified digits (0, 1, 2, or 3) are present
[0-9]        Returns a match for any digit between 0 and 9
[0-5][0-9]   Returns a match for any two-digit number from 00 to 59
[a-zA-Z]     Returns a match for any character alphabetically between a and z, lower case OR upper case
[+]          In sets, +, *, ., |, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string
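The metacharacters, special sequences, and sets from the tables above can be combined in a few small searches. The sample strings below are invented for illustration, and every pattern is written as a raw string so that backslashes reach the regex engine unchanged.

```python
import re

text = "The rain in Spain: call 07:45 or 23:59"

starts_with_the = re.search(r"\AThe", text) is not None   # \A anchors at the start
ends_in_ain = re.findall(r"ain\b", text)      # "ain" at the end of a word
starts_in_ain = re.findall(r"\bain", text)    # "ain" at the start of a word
times = re.findall(r"[0-2][0-9]:[0-5][0-9]", text)   # sets of digits
non_letters = re.findall(r"[^a-z\s]", "a1 b2!")      # ^ negates a set
hello = re.findall(r"he.{2}o", "hello planet")       # exactly two characters
ends_with_planet = re.search(r"planet$", "hello planet") is not None

print(starts_with_the, ends_in_ain, starts_in_ain, times, non_letters, hello, ends_with_planet)
```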
Regex
Exercises:
• Regex Simple Match (REG01) – Search
• Regex extract Email - (REG02)
• Regex Split REG03
• Regex Date Split REG04
Date and Time in Python
In Python, date and time are not built-in data types of their own, but the datetime module can be imported to work with dates as well as times.

The datetime module comes built into Python, so there is no need to install it separately.
Examples

Date – 01
Date – 02
Date – 03
For further reading:
https://www.w3schools.com/python/python_datetime.asp
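As a minimal sketch of the datetime module, the example below builds, formats, parses, and compares dates. The specific dates are arbitrary and are not the ones used in the Date exercises.

```python
from datetime import date, datetime, timedelta

d = date(2024, 1, 31)

formatted = d.strftime("%d-%m-%Y")                                 # date -> text
parsed = datetime.strptime("2024-03-15 09:30", "%Y-%m-%d %H:%M")   # text -> datetime
next_week = d + timedelta(days=7)                                  # date arithmetic
gap_days = (date(2024, 3, 1) - d).days                             # difference in days
weekday_number = d.weekday()                                       # Monday is 0

print(formatted, parsed, next_week, gap_days, weekday_number)
```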
Regex – Ai
Use any AI tool, such as https://chat.openai.com
Prompt:
Explain the regex pattern attached below
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
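Besides asking an AI tool, the pattern can be tried directly with re.findall. The sample sentence and the example.com / example.org addresses below are made up for illustration.

```python
import re

pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
sample = "Contact sales@example.com or support@mail.example.org today."

emails = re.findall(pattern, sample)
print(emails)

# Inside a character set, "|" is a literal character rather than "or",
# so [A-Z|a-z] also accepts "|"; [A-Za-z] expresses the likely intent.
stricter = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
same_result = re.findall(stricter, sample) == emails
```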
Python If and Else
Python supports the usual logical conditions from mathematics:

• Equals: a == b
• Not Equals: a != b
• Less than: a < b
• Less than or equal to: a <= b
• Greater than: a > b
• Greater than or equal to: a >= b

These conditions can be used in several ways, most commonly in "if statements"
and loops. Python relies on indentation (whitespace at the beginning of a line) to
define scope in the code.
Python If and Else
• An "if statement" is written by using the if keyword.
Example:
a = 55
b = 400
if b > a:
    print("b is greater than a")
• The elif keyword is Python's way of saying "if the previous conditions were not true, then try this condition".
Example:
a = 55
b = 55
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
Python If and Else
• The else keyword catches anything which isn't caught by the preceding conditions.
Example:
a = 400
b = 55
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")
Python If and Else
• If you have only one statement to execute, you can put it on the same line as the if statement. (Short hand if)
Example:
if a > b: print("a is greater than b")
• If you have only one statement to execute for if and one for else, you can put it all on the same line. (Short Hand If ... Else)
Example:
a = 4
b = 550
print("A") if a > b else print("B")
• And - The and keyword is a logical operator, and is used to combine conditional statements:
Example:
a = 200
b = 33
c = 500
if a > b and c > a:
    print("Both conditions are True")
Python If and Else
• Or - The or keyword is a logical operator, and is used to combine conditional statements:
Example:
a = 200
b = 33
c = 500
if a > b or a > c:
    print("At least one of the conditions is True")
• Not - The not keyword is a logical operator, and is used to reverse the result of the conditional statement:
Example:
a = 33
b = 200
if not a > b:
    print("a is NOT greater than b")
Python If and Else
• Nested If - You can have if statements inside if statements; this is called nested if statements.
Example:
x = 41
if x > 10:
    print("Above ten,")
    if x > 20:
        print("and also above 20!")
    else:
        print("but not above 20.")
Python If and Else
Exercises:
• Basic if statement (IFEL01)
• Indentation in if (code to explain error) (IFEL02)
• Elif (IFEL03)
• Else (IFEL04)
• IF And (IFEL05)
• IF Or (IFEL06)
• IF Not (IFEL07)
• Nested IF (IFEL08)
Python Loop – For and While
Loop: Python has two primitive loop commands:
• while loops
• for loops
The while loop: With the while loop we can execute a set of statements as long as a condition is true.
Example:
i = 1
while i < 6:
    print(i)
    i += 1
The for loop: A for loop is used for iterating over a sequence (that is, a list, a tuple, a dictionary, a set, or a string).
Example:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)
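Beyond the two basic examples, loops are often combined with range(), dictionary iteration, and the break and continue statements. The sketch below is illustrative; the fruit prices are made-up values.

```python
# for loop over a range of numbers
squares = []
for n in range(1, 6):
    squares.append(n * n)

# for loop over the key: value pairs of a dictionary
prices = {"apple": 30, "banana": 10}
labels = []
for fruit, price in prices.items():
    labels.append(f"{fruit}: {price}")

# while loop with continue (skip odd numbers) and break (stop after 10)
evens = []
i = 0
while True:
    i += 1
    if i > 10:
        break
    if i % 2 == 1:
        continue
    evens.append(i)

print(squares, labels, evens)
```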
Python Loop – For and While

Exercises:
• While loop (FW01)
• For loop (FW02)
File operations

“r” – Read
“w” – Write
“a” – Append
“x” – Create
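The four modes can be exercised with the built-in open() function. This is a minimal sketch that works in a temporary folder; the file name notes.txt is hypothetical.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")

with open(path, "x") as f:      # "x" creates the file; it fails if the file exists
    f.write("created\n")

with open(path, "w") as f:      # "w" overwrites any existing content
    f.write("first line\n")

with open(path, "a") as f:      # "a" appends to the end of the file
    f.write("second line\n")

with open(path, "r") as f:      # "r" reads the file
    lines = f.read().splitlines()

# Trying to create the same file again with "x" raises FileExistsError.
try:
    open(path, "x")
    recreated = True
except FileExistsError:
    recreated = False

print(lines, recreated)
```

Using `with` closes the file automatically, even if an error occurs inside the block.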

Exercises:
“r” – FAR 01
“w” - FAR 02
“a” – FAR 03
“x” – FAR 04
DATA ANALYSIS FOR PYTHON
Learning Objectives
TO UNDERSTAND THE IMPORTANCE OF PYTHON LIBRARIES IN DATA ANALYSIS.

LEARN HOW TO IMPORT AND UTILIZE EXTERNAL LIBRARIES IN PYTHON.

MASTER NUMPY'S ROLE IN NUMERICAL COMPUTING AND ARRAY MANIPULATION.

TO UNDERSTAND PANDAS' IMPORTANCE FOR STRUCTURED DATA MANIPULATION AND ANALYSIS.

TO UNDERSTAND THE IMPORTANCE OF DATA PREPROCESSING IN PREPARING DATA.

RECOGNIZE EDA'S ROLE IN DATA UNDERSTANDING AND VISUALIZATION.


Introduction – Libraries
• A Python library is a collection of related modules.
• It contains bundles of code that can be used repeatedly in different programs.
• It makes Python programming simpler and more convenient, as the programmer does not need to write the same code again and again for different programs.
• Python libraries play a vital role in fields such as machine learning, data science, and data visualization.
Introduction – Important Libraries/Packages
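• Pandas – Data analysis
• NumPy – Numerical computing and data analysis
• Matplotlib – Visualisation
• Seaborn – Visualisation
• Scikit-learn – ML
• Requests – APIs (HTTP requests)
• Selenium – Web scraping / browser automation
• pyodbc – Database access over ODBC
• xml.etree.ElementTree – XML parsing (standard library)
• openpyxl – Reading and writing Excel files
• XlsxWriter – Writing Excel files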
Numpy
NumPy, short for "Numerical Python," is a foundational library for numerical and scientific computing in the Python programming language.

It is the go-to library for performing efficient numerical operations on large datasets, and it serves as the backbone for numerous other scientific and data-related libraries.
Numpy
• Array Representation
• Data Storage
• Vectorized Operations
• Universal Functions (ufuncs)
• Broadcasting
• Indexing and Slicing
• Mathematical Functions
BASIC METHODS IN NUMPY
1. Importing NumPy
To use NumPy in Python, you first need to import it. The common convention is to alias NumPy as `np`.


BASIC METHODS IN NUMPY
2. Creating Arrays
NumPy arrays are the fundamental data structure. You can create arrays in several ways, for example from Python lists or with built-in constructor functions.
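The original slide example is not recoverable, so the following is only a sketch of common ways to create arrays, assuming NumPy is installed. The chosen functions and values are illustrative.

```python
import numpy as np   # conventional alias

from_list = np.array([1, 2, 3, 4])            # 1-D array from a list
matrix = np.array([[1, 2, 3], [4, 5, 6]])     # 2-D array from nested lists
zeros = np.zeros((2, 3))                      # 2 x 3 array of 0.0
ones = np.ones(4)                             # four 1.0 values
steps = np.arange(0, 10, 2)                   # start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)               # 5 evenly spaced values, ends included

print(from_list, matrix, zeros, ones, steps, grid, sep="\n")
```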
BASIC METHODS IN NUMPY
3. Basic Operations
NumPy allows you to perform element-wise operations on arrays.
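Element-wise means that the operator is applied to each pair of matching elements. A minimal sketch with two small sample arrays, assuming NumPy is installed:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

added = a + b      # [11 22 33]
scaled = a * 2     # [2 4 6]
product = a * b    # [10 40 90], element by element (not a matrix product)
mask = b > 15      # comparison also works element by element

print(added, scaled, product, mask)
```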


BASIC METHODS IN NUMPY
4. Array Shape and Dimensions
Check the shape and dimensions of an array using the `shape` and `ndim` attributes.


BASIC METHODS IN NUMPY
5. Indexing and Slicing
NumPy supports indexing and slicing to access elements or subsets of arrays. Indexing starts at 0.
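The sketch below shows zero-based indexing, slicing, and boolean selection on a small sample matrix. The data values are arbitrary; NumPy is assumed to be installed.

```python
import numpy as np

data = np.array([[10, 20, 30],
                 [40, 50, 60],
                 [70, 80, 90]])

first_row = data[0]          # row at index 0
element = data[1, 2]         # row 1, column 2
last_col = data[:, -1]       # every row, last column
block = data[:2, 1:]         # first two rows, columns 1 onward
large = data[data > 50]      # boolean mask selects matching elements

print(first_row, element, last_col, block, large, sep="\n")
```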
BASIC METHODS IN NUMPY
6. Aggregation and Statistics
NumPy provides functions for computing various statistics on arrays:
i. Aggregation
ii. Statistics
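A minimal sketch of both groups, using a made-up 2 x 3 array of sales figures and assuming NumPy is installed:

```python
import numpy as np

sales = np.array([[120, 80, 100],
                  [90, 110, 130]])

# i. Aggregation
total = sales.sum()                  # sum of all elements
per_column = sales.sum(axis=0)       # sum down each column
highest, lowest = sales.max(), sales.min()

# ii. Statistics
mean = sales.mean()
median = np.median(sales)
std = sales.std()                    # population standard deviation

print(total, per_column, highest, lowest, mean, median, std)
```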


7. Reshaping and Transposing
Reshaping and transposing are fundamental operations when working with multi-dimensional data, such as matrices or arrays. These operations allow you to change the structure or dimensions of your data.
i. Reshaping:
Reshaping involves changing the shape or dimensions of your data while maintaining the total number of elements. This operation is often used in machine learning and data preprocessing to prepare data for modeling.
ii. Transposing:
Transposing involves switching the rows and columns of a two-dimensional data structure like a matrix or array. This operation is particularly useful for linear algebra operations or when working with tabular data.
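Both operations can be sketched on a six-element array. The values are arbitrary; NumPy is assumed to be installed.

```python
import numpy as np

flat = np.arange(6)              # [0 1 2 3 4 5]

table = flat.reshape(2, 3)       # 2 rows x 3 columns, same 6 elements
auto = flat.reshape(-1, 2)       # -1 lets NumPy work out the row count
flipped = table.T                # transpose: rows become columns

print(table, auto, flipped, sep="\n")
print(table.shape, table.ndim, flipped.shape)
```

Reshaping to a shape with a different total number of elements, such as `flat.reshape(4, 2)`, raises a ValueError.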


8. Universal Functions (ufuncs)
NumPy provides universal functions that operate element-wise on arrays, including trigonometric, logarithmic, and exponential functions.
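A short sketch of a few ufuncs, each applied to a whole array in one call. The input values are arbitrary; NumPy is assumed to be installed.

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)                        # trigonometric, approximately [0, 1, 0]

roots = np.sqrt(np.array([1.0, 4.0, 9.0]))    # [1. 2. 3.]
growth = np.exp(np.array([0.0, 1.0]))         # exponential: [1, e]
logs = np.log(growth)                         # natural log undoes exp

print(sines, roots, growth, logs)
```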


9. Random Number Generation
NumPy includes functions for generating random numbers from various distributions, such as `np.random.rand`, `np.random.randn`, and `np.random.randint`.
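The three functions can be sketched as follows, assuming NumPy is installed. The seed value 42 is an arbitrary choice that only makes the run repeatable.

```python
import numpy as np

np.random.seed(42)                          # fix the seed for repeatable output

uniform = np.random.rand(2, 3)              # uniform values in [0, 1), shape 2 x 3
normal = np.random.randn(1000)              # standard normal samples
dice = np.random.randint(1, 7, size=10)     # integers from 1 to 6 (upper bound excluded)

print(uniform)
print(normal.mean(), normal.std())
print(dice)
```

Newer NumPy code often uses `np.random.default_rng()` instead; the functions above are shown because they are the kind of call this section refers to.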


10. Broadcasting
NumPy allows you to perform operations on arrays of different shapes, often automatically aligning their shapes, thanks to broadcasting rules.
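Broadcasting can be sketched by adding a row and a column to a small matrix. The shapes and values below are illustrative; NumPy is assumed to be installed.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,)
column = np.array([[100], [200]])       # shape (2, 1)

plus_row = matrix + row        # the row is applied to every row of the matrix
plus_column = matrix + column  # the column is applied to every column

# Shapes that cannot be aligned raise an error: (2, 3) and (2,) do not match.
try:
    matrix + np.array([1, 2])
    mismatch_failed = False
except ValueError:
    mismatch_failed = True

print(plus_row, plus_column, mismatch_failed, sep="\n")
```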

11. Reshaping Arrays
Reshape arrays into different dimensions using np.reshape or the reshape method.
Pandas - Data Analysis
Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis".
The library was created by Wes McKinney in 2008.
Pandas - Data Analysis - Contents
 Data Structures
- Series
- Data Frame
 Data Alignment
 Label Based Indexing
 Data Cleaning
 Data Aggregation
 Data Merging and Joining
 Data Visualisation Integration
Pandas - Data Analysis
 Examples – Creating and Loading Dataframe
 Creating Data Frame
- From Dictionary
 Loading Data to Dataframe
- From External Data Sources
- CSV
- JSON
- XML
- Excel
- Database (Tally / Access) using SQL
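
A minimal sketch of the two routes listed above: building a DataFrame from a dictionary and loading one from external data. The names, ages, and inline CSV/JSON text are made-up sample data. In-memory text is used here only so the example runs without any files.

```python
import io
import pandas as pd

# From a dictionary: keys become column names (sample data, hypothetical)
data = {"Name": ["Asha", "Ravi", "Meena"], "Age": [28, 35, 41]}
df = pd.DataFrame(data)

# From CSV: read_csv accepts a file path or a file-like object
csv_text = "Name,Age\nAsha,28\nRavi,35\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# From JSON: read_json likewise accepts a path or a file-like object
json_text = '[{"Name": "Asha", "Age": 28}, {"Name": "Ravi", "Age": 35}]'
df_json = pd.read_json(io.StringIO(json_text))
```

With real files, a path would replace the `io.StringIO(...)` argument. pandas also offers `pd.read_excel`, `pd.read_xml`, and `pd.read_sql` for the other sources in the list.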
Pandas - Data Analysis - Viewing Data
 Examples - Viewing Data

 df.head()
 df.tail()
 df.shape
 df.info()
 df.describe()
 df.sample(n)

These methods are invaluable for getting an initial sense of your data's structure
and content.
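
A short sketch of these viewing methods on a small hypothetical DataFrame (the names and ages are made up for illustration).

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John", "Sara"],
    "Age": [28, 35, 41, 23, 52],
})

first_rows = df.head(2)                     # first 2 rows
last_rows = df.tail(2)                      # last 2 rows
n_rows, n_cols = df.shape                   # (rows, columns) tuple
summary = df.describe()                     # count, mean, std, min, quartiles, max
random_rows = df.sample(2, random_state=0)  # 2 randomly chosen rows
df.info()                                   # prints column types and non-null counts
```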
Pandas - Data Analysis - Indexing and Selecting Data
 Examples - Indexing and Selecting Data

 Viewing Data

 name_column = df['Name']
 subset = df[['Name', 'Age']]
 young_people = df[df['Age'] < 30]
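
The selections above can be sketched end to end as follows. The DataFrame contents are hypothetical, and the `loc`/`iloc` lines are an addition showing label-based and position-based access.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Age": [28, 35, 41],
    "City": ["Pune", "Delhi", "Chennai"],
})

name_column = df["Name"]            # single column -> Series
subset = df[["Name", "Age"]]        # list of columns -> DataFrame
young_people = df[df["Age"] < 30]   # boolean mask keeps matching rows

by_label = df.loc[0, "Name"]        # row label 0, column "Name" -> "Asha"
by_position = df.iloc[1, 1]         # second row, second column -> 35
```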

 Hint : For further reference


https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
Pandas - Data Analysis – Sorting Data
 Examples - Sorting Data
 Viewing Data

 Hint : For further reference


https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
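
A minimal sketch of sorting with `sort_values`, assuming a small hypothetical DataFrame. Sorting returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ravi", "Asha", "Meena"],
    "Age": [35, 28, 35],
})

# Sort by one column, ascending by default
by_age = df.sort_values("Age")

# Sort by Age descending, then by Name ascending to break ties
by_age_then_name = df.sort_values(["Age", "Name"], ascending=[False, True])
```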
Pandas – DATA AGGREGATION AND SUMMARY STATISTICS
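
A minimal sketch of aggregation with `groupby`, assuming a hypothetical sales table. `agg` computes several summary statistics in one call.

```python
import pandas as pd

df = pd.DataFrame({
    "Dept": ["A", "A", "B", "B"],
    "Sales": [100, 150, 200, 50],
})

# One total per department
total_by_dept = df.groupby("Dept")["Sales"].sum()

# Several statistics per department at once
summary = df.groupby("Dept")["Sales"].agg(["sum", "mean", "max"])
```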
Pandas – ADDING AND DROPPING COLUMNS
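
A minimal sketch of adding and dropping columns. The column names and the 18% rate are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"Price": [100.0, 250.0], "Qty": [2, 4]})

# Add a column by assigning to a new name
df["Total"] = df["Price"] * df["Qty"]

# assign() returns a new DataFrame with the extra column
df = df.assign(Tax=df["Total"] * 0.18)

# drop() returns a copy without the listed columns
trimmed = df.drop(columns=["Qty"])
```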
Pandas – Handling Missing Data
Pandas – Merging and Concatenating Data Frames
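
A minimal sketch of both operations, using hypothetical customer and order tables: `pd.merge` joins on a key column, and `pd.concat` stacks DataFrames that share the same columns.

```python
import pandas as pd

customers = pd.DataFrame({"CustID": [1, 2, 3], "Name": ["Asha", "Ravi", "Meena"]})
orders = pd.DataFrame({"CustID": [1, 1, 3], "Amount": [500, 300, 700]})

# Left join: keep every customer, even those without orders (Amount becomes NaN)
merged = pd.merge(customers, orders, on="CustID", how="left")

# Concatenate two tables row-wise and renumber the index
q1 = pd.DataFrame({"CustID": [1], "Amount": [500]})
q2 = pd.DataFrame({"CustID": [3], "Amount": [700]})
stacked = pd.concat([q1, q2], ignore_index=True)
```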
Pandas – Saving and Loading Data
Pandas – Saving Data from Data Frame
Saving the data to a CSV file
df.to_csv(r'C:\Users\Ram Office\Desktop\file3.csv')

Saving the data to an Excel file

df.to_excel("output.xlsx")

Further reading on formatting Excel files

https://xlsxwriter.readthedocs.io/working_with_pandas.html

Note: Loading data was already discussed under "Creating and Loading Dataframe".
Data Preprocessing Steps

IMPORTANCE OF DATA PREPROCESSING

 Data Quality Improvement
 Enhanced Model Performance
 Feature Extraction and Engineering
 Normalization and Scaling
 Handling Categorical Data
 Dimensionality Reduction
 Improved Interpretability
Data Preprocessing Steps
 DATA COLLECTION
GATHER THE RAW DATA FROM VARIOUS SOURCES, SUCH AS DATABASES, FILES, APIS, OR SENSORS.
 DATA CLEANING
 Handling Missing Values
IDENTIFY AND HANDLE MISSING DATA, WHICH CAN INVOLVE FILLING IN MISSING VALUES WITH
DEFAULT VALUES, USING INTERPOLATION, OR REMOVING ROWS/COLUMNS WITH MISSING DATA.
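
A minimal sketch of the three options described above (default values, interpolation, removal), assuming a small hypothetical DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Temp": [20.0, np.nan, 24.0],
    "City": ["Pune", None, "Delhi"],
})

missing_counts = df.isna().sum()                          # missing values per column

filled = df.fillna({"Temp": 0.0, "City": "Unknown"})      # fill with default values
interpolated = df["Temp"].interpolate()                   # linear interpolation -> 22.0 in the gap
dropped_rows = df.dropna()                                # remove rows with any missing value
dropped_cols = df.dropna(axis=1)                          # remove columns with any missing value
```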

 DATA REDUCTION
 Dimensionality Reduction
 Principal Component Analysis (PCA)
 Feature Selection
 Recursive Feature Elimination (RFE)
 DATA IMBALANCE HANDLING
 Oversampling
 Undersampling
 Synthetic Data Generation (SMOTE)
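
As a rough illustration of imbalance handling, the sketch below performs plain random oversampling and undersampling with pandas. This is a simpler stand-in and not SMOTE itself, which synthesizes new points and is usually taken from a dedicated library. The data and column names are hypothetical.

```python
import pandas as pd

# Imbalanced toy data: 8 rows of class 0, 2 rows of class 1
df = pd.DataFrame({"feature": range(10), "label": [0] * 8 + [1] * 2})

counts = df["label"].value_counts()
majority_size = counts.max()
minority_size = counts.min()

# Oversampling: draw each class with replacement up to the majority size
oversampled = pd.concat(
    [grp.sample(n=majority_size, replace=True, random_state=0)
     for _, grp in df.groupby("label")],
    ignore_index=True,
)

# Undersampling: draw each class without replacement down to the minority size
undersampled = pd.concat(
    [grp.sample(n=minority_size, random_state=0)
     for _, grp in df.groupby("label")],
    ignore_index=True,
)
```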
Pandas – Extracting data from different data sources
Practical Approach
 Module Case Study - 1

Conversion of JSON Data to Excel

Students may use any GSTR2A, GSTR2, or GSTR3B file to convert data to Excel.

Approach - 1
Use a pandas DataFrame to read the JSON file and then write it to Excel.
Approach – 2
Use the openpyxl library to read the JSON parts and write them to Excel directly.
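
A minimal sketch of Approach 1. The JSON layout and field names below are hypothetical and much simpler than a real GSTR file, whose structure should be inspected first. `save_to_excel` is a helper introduced here for illustration.

```python
import json
import pandas as pd

# Hypothetical JSON layout, not the actual GSTR schema
raw = """
{
  "supplier": "Sample Supplier",
  "invoices": [
    {"number": "INV-1", "taxable_value": 1000.0, "tax": 180.0},
    {"number": "INV-2", "taxable_value": 2500.0, "tax": 450.0}
  ]
}
"""

data = json.loads(raw)

# Flatten the nested list of invoices into rows, carrying the supplier along
invoices = pd.json_normalize(data, record_path="invoices", meta=["supplier"])


def save_to_excel(frame, path):
    """Write the frame to an Excel file (needs an Excel engine such as openpyxl)."""
    frame.to_excel(path, index=False)
```

Calling `save_to_excel(invoices, "output.xlsx")` would then write the flattened rows to a workbook.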
 Module Case Study - 2

Conversion of XML Data to Excel

Students may use any income tax return file to extract the ITR balance sheet and profit and loss data to Excel.

Approach - 1
Use the `xml.etree.ElementTree` module
https://docs.python.org/3/library/xml.etree.elementtree.html
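
A minimal sketch of this approach. The XML below is a hypothetical, heavily simplified layout and not the actual ITR schema. Real return files have their own tag names, which need to be inspected first.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical XML layout for illustration only
xml_text = """
<Return>
  <BalanceSheet>
    <Item name="Capital" amount="500000"/>
    <Item name="Loans" amount="200000"/>
  </BalanceSheet>
  <ProfitAndLoss>
    <Item name="Sales" amount="900000"/>
  </ProfitAndLoss>
</Return>
"""

root = ET.fromstring(xml_text)

# Walk each section and collect one row per Item element
rows = [
    {
        "Section": section.tag,
        "Name": item.get("name"),
        "Amount": float(item.get("amount")),
    }
    for section in root
    for item in section.findall("Item")
]

df = pd.DataFrame(rows)
```

The resulting DataFrame can then be written out with `df.to_excel(...)`.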
 Module Case Study - 3

Consolidate multiple Excel files into a single file

Students may use the Excel files provided and consolidate them into a single file.

Approach:

Use pandas DataFrames and the merging features.


 Module Case Study - 4

Convert the 26AS text file to Excel

Approach:

Use pandas DataFrames and the merging features.

Use regular expressions (regex).
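
A rough sketch of the regex step. The `^`-separated record layout and the sample lines are assumptions made for illustration. The pattern must be adapted to the actual text file.

```python
import re
import pandas as pd

# Hypothetical text: two record lines and one line that is not a record
text = """1^ABC Traders^10000.00^1000.00
2^XYZ Services^25000.50^2500.05
Some heading line that is not a record
"""

# serial number ^ name ^ amount ^ tax, with ^ as a literal separator
pattern = re.compile(r"^(\d+)\^([^^]+)\^([\d.]+)\^([\d.]+)$")

rows = []
for line in text.splitlines():
    match = pattern.match(line.strip())
    if match:  # skip lines that do not look like records
        serial, name, amount, tax = match.groups()
        rows.append({
            "Serial": int(serial),
            "Name": name,
            "Amount": float(amount),
            "Tax": float(tax),
        })

df = pd.DataFrame(rows)
```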
 Module Case Study - 5

Get ledger master data from Tally using an SQL query

Query
Select $Name, $Parent, $_PrimaryGroup, $OpeningBalance, $_ClosingBalance
from Ledger

Libraries used

pyodbc
DATA VISUALIZATION WITH PYTHON
Learning Objectives

UNDERSTANDING DIFFERENT LIBRARIES USED FOR DATA VISUALIZATION IN PYTHON.

LEARN ABOUT DIFFERENT TYPES OF PLOTS AND THEIR USE.

USE OF MATPLOTLIB, SEABORN, AND PLOTLY FOR DATA VISUALIZATION


Basic Plots
LINE PLOT
SCATTER PLOT
HISTOGRAM
BAR PLOT
PIE PLOT
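
A minimal Matplotlib sketch of the five basic plots, drawn side by side. All data values and labels are made up for illustration. The `Agg` backend is chosen here only so the code also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in Jupyter or Colab
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 8, 6]
categories = ["A", "B", "C"]
counts = [5, 3, 7]

fig, axes = plt.subplots(1, 5, figsize=(20, 4))

axes[0].plot(x, y)                 # line plot: trend across ordered points
axes[0].set_title("Line plot")

axes[1].scatter(x, y)              # scatter plot: relationship between two variables
axes[1].set_title("Scatter plot")

axes[2].hist(y, bins=4)            # histogram: distribution of one variable
axes[2].set_title("Histogram")

axes[3].bar(categories, counts)    # bar plot: values per category
axes[3].set_title("Bar plot")

axes[4].pie(counts, labels=categories, autopct="%1.0f%%")  # pie plot: share of a whole
axes[4].set_title("Pie plot")

fig.tight_layout()
```

In a notebook, `plt.show()` displays the figure. `fig.savefig("plots.png")` would write it to a file.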
DATA PREPARATION FOR MODEL BUILDING

INDEPENDENT FEATURES

DEPENDENT FEATURES
TYPES OF ML ALGORITHMS

SUPERVISED ALGORITHM

UNSUPERVISED ALGORITHM

REINFORCEMENT LEARNING
Basic Plots
SUN BURST
THANK YOU
