Unit 1

The Advanced Data Analytics Course is designed for Arts and Science students, focusing on project-based learning over 45 hours with a curriculum centered on Python and its libraries for data analysis. Key topics include data manipulation with Pandas, numerical computing with NumPy, and data visualization using Matplotlib and Seaborn. The course aims to equip students with essential skills for various data analyst roles across multiple disciplines.

Uploaded by

Jaikishen Nehru
Advanced Data Analytics Course – Arts & Science Students

Course Introduction - Advanced Data Analytics

Learning Methodology

● Project Based Learning


● Total number of Hours: 45 hours & 2 credits
● 5 Days On-site class
● Target Audience: Arts & Science Students
● Hardware & Devices to be used: PC/Laptop
● Software Licenses to be taught/used: Python using Google Colab
Course Outline (FDP) - Advanced Data Analytics Using Python
Growth In Advanced Data Analytics
Roles and responsibilities of the Data Analyst
Job Roles

Mathematics Students

● Data Scientist
● Quantitative Analyst
● Operations Research Analyst
● Cryptanalyst

Physics Students

● Data Scientist (Physics Emphasis)
● Research Scientist (Physics)
● Environmental Analyst

Statistics Students

● Statistician
● Biostatistician
● Market Research Analyst (Statistics)

Economics Students

● Economic Analyst
● Policy Analyst (Economics Emphasis)
● Financial Analyst (Economics)
Companies related to Data Analytics
Today’s Agenda

• Data Analysis with Python

• Python Basics

• Pandas: Data manipulation and analysis

• NumPy: Numerical computing in Python

• Matplotlib: Basic data visualization

• Seaborn: Statistical data visualization

• Real-world Applications and Case Studies


What is Data Analytics?

Definition:

Data analytics helps in collecting, cleaning, analyzing, and extracting meaningful information from the abundance of available data. Data analytics software aids this process and enables fast, data-driven, and data-informed decision making.
Data Analytics Process Steps
Types of Data Analytics
Applications of Data Analytics

Why Data Analytics Using Python?

• Python is easy to learn and understand and has a simple syntax.
• The programming language is scalable and flexible.
• It has a vast collection of libraries for numerical computation and data manipulation.
• Python provides libraries for graphics and data visualization to build plots.
Some Basic functions of Python

print() Used to display text or variables in the console.

df.head() Returns the first rows of the particular DataFrame (5 by default).

df.tail() Returns the last rows of the particular DataFrame (5 by default).

df.describe() Returns the basic statistical information of the DataFrame.
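A minimal sketch of these four functions in use. The DataFrame contents and the column names "name" and "score" are hypothetical sample data, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "score": [10, 20, 30, 40, 50, 60],
})

print("Rows:", len(df))   # print() displays text and variables

first_two = df.head(2)    # first 2 rows (5 when no number is given)
last_two = df.tail(2)     # last 2 rows (5 when no number is given)
summary = df.describe()   # count, mean, std, min, quartiles, max

print(first_two)
print(last_two)
print(summary)
```

In a Google Colab cell, writing df.head() as the last line displays the result without print().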
Introduction To Python

• Python is a general-purpose programming language. It is easy to learn, and its simple syntax and readability are among the reasons why developers are switching to Python from other programming languages.

• We can use Python as an object-oriented language and as a procedure-oriented language.
Python Variables

Creating Variables

x = 5
y = "John"
print(x)
print(y)

Output
5
John

Casting

x = str(3)    # x will be '3'
y = int(3)    # y will be 3
z = float(3)  # z will be 3.0
print(x)
print(y)
print(z)

Output
3
3
3.0

Get the Type

x = 5
y = "John"
print(type(x))
print(type(y))

Output
<class 'int'>
<class 'str'>

Case-Sensitive

a = 4
A = "Spiderman"
# A will not overwrite a
print(a)
print(A)

Output
4
Spiderman
Introduction to Python data types

Each variable that is used in Python is associated with a data type. In Python, programmers do not need to declare the type: whereas many programming languages require the data type to be stated when the variable is declared, Python infers it from the assigned value. Python also supports explicit data type conversion with functions such as int(), float(), and str().

We can use the type() function to know the data type of any variable in Python.

a = 5
print("Type of a:", type(a))

b = 5.0
print("\nType of b:", type(b))

c = 2 + 4j
print("\nType of c:", type(c))

Output

Type of a: <class 'int'>

Type of b: <class 'float'>

Type of c: <class 'complex'>
Python data types

Common Data Types:


Integer (int): Whole numbers without decimal points.
Float (float): Numbers with decimal points.
String (str): Ordered sequence of characters.
Boolean (bool): Represents True or False.
Integer Data Type

Integers – This value is represented by int class. It contains positive or negative whole numbers (without fractions or
decimals). In Python, there is no limit to how long an integer value can be.

Example

# Example of integer variables
x = 5
y = -10
# Mathematical operations with integers
sum_result = x + y
difference_result = x - y
print("Sum:", sum_result)
print("Difference:", difference_result)

Output

Sum: -5
Difference: 15
Float Data Type

Float – This value is represented by the float class. It is a real number with a floating-point representation. It is specified by a
decimal point.

Example

# Example of float variables
a = 3.14
b = -2.5
# Mathematical operations with floats
sum_result_float = a + b
product_result_float = a * b
# round() hides the tiny binary floating-point error in the raw results
print("Sum (float):", round(sum_result_float, 2))
print("Product (float):", round(product_result_float, 2))

Output

Sum (float): 0.64
Product (float): -7.85
String Data Type

Strings in Python are sequences of Unicode characters. A string is a collection of one or more characters put in single quotes, double quotes, or triple quotes.

Creating a String

Strings in Python can be created using single quotes, double quotes, or even triple quotes.

Example

String1 = 'Welcome'
print("Good Morning")
print(String1)

Output

Good Morning
Welcome
Boolean Data Type

Data type with one of the two built-in values, True or False. Boolean objects that are equal to True are truthy (true), and
those equal to False are falsy (false).

Example

# Numeric variables
x = 5
y = 10
# Boolean comparison
greater_than = x > y
# Output
print("x > y:", greater_than)

Output

x > y: False
Python Operators

Operators are used to perform operations on variables and values.

Example
x = 5

x += 3  # same as x = x + 3

print(x)

Output

8
Python Conditions and If statements

Python supports the usual logical conditions from mathematics:

• Equals: a == b
• Not Equals: a != b
• Less than: a < b
• Less than or equal to: a <= b
• Greater than: a > b
• Greater than or equal to: a >= b

If statement

a = 33
b = 200
if b > a:
    print("b is greater than a")

Output

b is greater than a
Python Conditions and If statements

Elif

a = 33
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")

Output

a and b are equal

Else

a = 200
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")

Output

a is greater than b

Short Hand If

a = 200
b = 33
if a > b: print("a is greater than b")

Output

a is greater than b
Python Loops

Python has two primitive loop commands:

• while loops
• for loops

With the while loop we can execute a set of statements as long as a condition is true.

While loop

i = 1
while i < 6:
    print(i)
    i += 1

Output

1
2
3
4
5
Python While Loops

The continue Statement

i = 0
while i < 6:
    i += 1
    if i == 3:
        continue
    print(i)

Output

1
2
4
5
6

The else Statement

i = 1
while i < 6:
    print(i)
    i += 1
else:
    print("i is no longer less than 6")

Output

1
2
3
4
5
i is no longer less than 6

The break Statement

i = 1
while i < 6:
    print(i)
    if i == 3:
        break
    i += 1

Output

1
2
3
Python For Loops
A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or a string).

For Loop

fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)

Output

apple
banana
cherry

The break Statement

fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)
    if x == "banana":
        break

Output

apple
banana

The continue Statement

fruits = ["apple", "banana", "cherry"]
for x in fruits:
    if x == "banana":
        continue
    print(x)

Output

apple
cherry
Python For Loops
A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or a string).

Nested Loops

adj = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]
for x in adj:
    for y in fruits:
        print(x, y)

Output

red apple
red banana
red cherry
big apple
big banana
big cherry
tasty apple
tasty banana
tasty cherry

The range() Function

for x in range(6):
    print(x)

Output

0
1
2
3
4
5
Python Functions

• A function is a block of code which only runs when it is called.

• You can pass data, known as parameters, into a function.

• A function can return data as a result.

Calling a Function

def my_function():
    print("Hello from a function")

my_function()

Output

Hello from a function
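The points about parameters and return values can be sketched as follows. The function name rectangle_area and its arguments are hypothetical, chosen only for illustration.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle (hypothetical example function)."""
    return width * height


# width and height are parameters; 3 and 4 are the values passed in.
area = rectangle_area(3, 4)
print(area)  # 12
```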


Python Arrays

• An array is a special variable, which can hold more than one value at a time.

• Python has no built-in array type; a list is commonly used in its place.

Accessing an Element

cars = ["Ford", "Volvo", "BMW"]

x = cars[0]

print(x)

Output

Ford
Difference between List, Tuple, Dictionary, Set
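A minimal sketch of how the four collection types behave, assuming the commonly cited distinctions (ordering, mutability, duplicates, key-value access). The variable names and values are hypothetical.

```python
fruits_list = ["apple", "banana", "apple"]    # list: ordered, changeable, allows duplicates
fruits_tuple = ("apple", "banana", "apple")   # tuple: ordered, unchangeable, allows duplicates
fruits_set = {"apple", "banana", "apple"}     # set: unordered, keeps unique items only
prices = {"apple": 30, "banana": 10}          # dictionary: key -> value pairs

fruits_list.append("cherry")   # a list can grow after creation
prices["cherry"] = 50          # a dictionary can gain new keys

print(len(fruits_list))    # 4
print(len(fruits_set))     # 2, the duplicate "apple" is dropped
print(fruits_tuple[0])     # apple
print(prices["banana"])    # 10
```

Trying fruits_tuple[0] = "kiwi" raises a TypeError, because a tuple cannot be changed after it is created.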
Python Classes and Objects

• Python is an object oriented programming language.

• Almost everything in Python is an object, with its properties and methods.

• A Class is like an object constructor, or a "blueprint" for creating objects.
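A short sketch of a class used as a "blueprint". The class name Student, its properties, and its method are hypothetical, chosen only for illustration.

```python
class Student:
    """Hypothetical class used only for illustration."""

    def __init__(self, name, marks):
        # Properties stored on each object created from the class.
        self.name = name
        self.marks = marks

    def average(self):
        # A method: a function that belongs to the object.
        return sum(self.marks) / len(self.marks)


s = Student("Asha", [80, 90, 70])   # create an object from the blueprint
print(s.name)       # Asha
print(s.average())  # 80.0
```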
Popular data science libraries
Pandas Introduction

What is Pandas?

• Pandas is a Python library used for working with data sets.

• It has functions for analyzing, cleaning, exploring, and manipulating data.

• The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was created by Wes McKinney in
2008.

Why Use Pandas?

• Pandas allows us to analyze big data and make conclusions based on statistical theories.

• Pandas can clean messy data sets, and make them readable and relevant.
Introducing the pandas DataFrame

The pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels.

Pandas DataFrames are data structures that contain:

• Data organized in two dimensions, rows and columns

• Labels that correspond to the rows and columns
Introducing the pandas DataFrame
Pandas

Installation of Pandas

• pip install pandas

Import Pandas

• Once Pandas is installed, import it in your applications by adding the import keyword:

• import pandas as pd

Example

import pandas as pd

mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)

print(myvar)

Output

    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2
Pandas DataFrame

Dataframe from CSV

To create a DataFrame from a CSV file, we use the read_csv('file_name') function, which takes the file name as input and returns a DataFrame as output.
Importing Convention

• pd.read_csv("filename")
• pd.read_table("filename")
• pd.read_excel("filename")
• pd.read_sql(query, connection_object)
• pd.read_json(json_string)
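A minimal sketch of read_csv. So that it runs without a file on disk, the CSV text is held in a string and wrapped in io.StringIO; in practice a file name such as "data.csv" (hypothetical) would be passed instead. The column names and values are sample data.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "EmpID,Skill\n1,Python\n2,SQL\n3,Excel\n"

# read_csv accepts a file name or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)  # (3, 2): three rows, two columns
```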
DataFrame Attributes

Definition

A DataFrame provides many built-in attributes. Attributes are accessed without parentheses and do not modify the underlying data; they are used to get more details about the DataFrame.
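A few commonly used attributes, sketched on a small DataFrame. The data is sample data for illustration, and the selection of attributes is an assumption about which ones matter most here.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela"],
    "Age": [26, 28, 27],
})

print(df.shape)    # (rows, columns) -> (3, 2)
print(df.size)     # total number of cells -> 6
print(df.columns)  # column labels
print(df.index)    # row labels
print(df.dtypes)   # data type of each column
```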
Pandas DataFrame
Pandas DataFrame

Selection

The original dataframe:

     Name  Age       Department
0     Jim   26            Sales
1  Dwight   28            Sales
2  Angela   27       Accounting
3    Tobi   32  Human Resources

Select columns by name in pandas

df_selected = df[['Name', 'Department']]

Using the .loc property

df_selected = df.loc[:, ['Name', 'Department']]

The .loc property expects the row selection first, so ":" is used to keep all rows. Writing df.loc[['Name', 'Department']] would look for rows with those labels and raise a KeyError.

Dataframe with the selected columns:

     Name       Department
0     Jim            Sales
1  Dwight            Sales
2  Angela       Accounting
3    Tobi  Human Resources
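A runnable sketch of column selection that rebuilds the dataframe shown on this slide. With .loc, the row selection comes first and the column selection second. The row filter in the last step is an added illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jim", "Dwight", "Angela", "Tobi"],
    "Age": [26, 28, 27, 32],
    "Department": ["Sales", "Sales", "Accounting", "Human Resources"],
})

by_name = df[["Name", "Department"]]         # select columns by name
by_loc = df.loc[:, ["Name", "Department"]]   # same result using .loc

# .loc can also pick rows and columns together.
sales_rows = df.loc[df["Department"] == "Sales", ["Name", "Age"]]

print(by_name.equals(by_loc))  # True
print(sales_rows)
```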
Pandas DataFrame

DataFrame selection
Pandas DataFrame

Slicing

print(df.head(2))

print(df.tail(2))
Pandas DataFrame

Indexing

Selecting a single column

df["Skill"]

Selecting multiple columns

df[["EmpID", "Skill"]]
Pandas DataFrame

Filtering

Syntax: DataFrame.filter(items=None, like=None, regex=None, axis=None)

Definition

The filter() function is used to subset rows or columns of a DataFrame according to labels in the specified index. It filters on the row or column labels, not on the values stored in the DataFrame.
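A minimal sketch of the three ways to filter by label. The column "SkillLevel" and the row labels are hypothetical, added only to show the like and regex parameters at work.

```python
import pandas as pd

df = pd.DataFrame(
    {"EmpID": [1, 2, 3], "Skill": ["Python", "SQL", "Excel"], "SkillLevel": [3, 2, 1]},
    index=["row_a", "row_b", "other"],
)

cols_by_items = df.filter(items=["EmpID", "Skill"])   # exact column names
cols_by_like = df.filter(like="Skill")                # column names containing "Skill"
rows_by_regex = df.filter(regex="^row_", axis=0)      # row labels starting with "row_"

print(cols_by_items)
print(cols_by_like)
print(rows_by_regex)
```

To keep rows based on their values rather than their labels, a boolean condition such as df[df["SkillLevel"] > 1] is used instead of filter().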
Pandas DataFrame Visualization Tools
NumPy

NumPy stands for ‘Numerical Python’. It is a package in Python to work with arrays. It is a basic scientific library. Its most
important feature is the n-dimensional array object. It has uses in statistical functions, linear algebra, arithmetic operations,
bitwise operations, etc.
Installation of NumPy

• pip install numpy

Import NumPy

• Once NumPy is installed, import it in your applications by adding the import keyword:

• import numpy as np

Module names in Python are case-sensitive, so the import must be written in lowercase as numpy.
Representation of NumPy multidimensional arrays
Why NumPy array instead of Python List ?
Why use NumPy for machine learning, Deep Learning, and
Data Science?
Uses of NumPy
NumPy Operations
NumPy String Operations

The string functions live in the np.char module.

np.char.lower() It converts all the uppercase characters in the string to lowercase. If there are no upper-case characters, then it returns the original string.

import numpy as np
# lower case
print(np.char.lower(['Data', 'Flair']))

Output

['data' 'flair']

np.char.split() This function returns a list of strings after breaking the string, separated by an input separator.

# split
print(np.char.split("Data Flair"))

Output

['Data', 'Flair']

Other string functions: np.char.upper(), np.char.join(), np.char.strip(), np.char.capitalize(), np.char.title()
How to create 1 & 2 dimensional NumPy array?
Creating Matrices

We can pass python lists of lists in the following shape to have NumPy create a matrix to represent them:
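A minimal sketch of creating a 1-dimensional and a 2-dimensional array with np.array. The values are sample data for illustration.

```python
import numpy as np

# 1-dimensional array from a Python list
vector = np.array([1, 2, 3])

# 2-dimensional array (matrix) from a list of lists
matrix = np.array([[1, 2], [3, 4], [5, 6]])

print(vector.ndim, vector.shape)  # 1 (3,)
print(matrix.ndim, matrix.shape)  # 2 (3, 2)
```

Each inner list becomes one row, so all inner lists should have the same length.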
Matrix Indexing & Matrix Aggregation

Matrix Indexing

Indexing and slicing operations become even more useful when we're manipulating matrices.

Matrix Aggregation

We can aggregate matrices the same way we aggregated vectors.
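A short sketch of matrix indexing and aggregation on a small sample matrix. The values are chosen only for illustration.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])

# Indexing and slicing: [row, column]
print(data[0, 1])   # 2, element in row 0, column 1
print(data[1:3])    # rows 1 and 2
print(data[:, 0])   # first column -> [1 3 5]

# Aggregation over the whole matrix
print(data.max())   # 6
print(data.min())   # 1
print(data.sum())   # 21

# Aggregation along one axis
print(data.max(axis=0))  # maximum of each column -> [5 6]
print(data.sum(axis=1))  # sum of each row -> [ 3  7 11]
```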
Difference between Pandas Vs Numpy

Objective
NumPy: Used for numerical and mathematical operations on arrays.
Pandas: Used for data manipulation, cleaning, and analysis with labeled data, especially tabular data.

Data Structures
NumPy: Provides arrays (ndarray) that are homogeneous, multi-dimensional, and fixed-size.
Pandas: Introduces DataFrames and Series, which are flexible, labeled structures suitable for handling heterogeneous tabular data. A Series is one-dimensional and a DataFrame is two-dimensional.

Indexing
NumPy: Uses integer-based indexing similar to traditional arrays.
Pandas: Employs labeled indexing, making it easier to work with and reference specific data points.
Matplotlib
Definition:

Matplotlib is one of the plotting libraries in Python. It is widely used in machine learning applications, together with its numerical mathematics extension NumPy, to create static, animated, and interactive visualisations.
TYPES OF PLOT

• Bar Graph
• Histogram
• Pie Chart
• Scatter Plot
• Box Plot
BASIC FUNCTIONS IN LIBRARY

# You can import pyplot using the following code:
from matplotlib import pyplot as plt

plt.show() Displays all the figures created in your code.

plt.xlabel() Creates a label for the x-axis.

plt.ylabel() Creates a label for the y-axis.

plt.title() Creates a title for the figure.

plt.annotate() Adds comments inside the figure to make the figure meaningful.

plt.grid() Displays a grid inside the figure.
What Is Matplotlib

Bar Graph
Syntax:
matplotlib.pyplot.bar(x, height, width=0.5)

Histogram
Syntax:
plt.hist(x, optional parameters)

Scatter Plot
Syntax:
plt.scatter(x, y, optional parameters)

Box Plot
Syntax:
plt.boxplot(x, optional parameters)

Pie Chart
Syntax:
plt.pie(x, optional parameters)

Function names in Python are case-sensitive, so they must be written in lowercase as shown.
Let’s create a simple plot

pip install matplotlib
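A minimal sketch of a simple plot, using the pyplot functions listed earlier in this unit. The department names and student counts are hypothetical data chosen only for illustration.

```python
from matplotlib import pyplot as plt

# Hypothetical data for illustration.
departments = ["Maths", "Physics", "Statistics", "Economics"]
students = [40, 35, 30, 45]

bars = plt.bar(departments, students, width=0.5)  # one bar per department
plt.xlabel("Department")
plt.ylabel("Number of students")
plt.title("Students per department")
plt.grid(True)
plt.show()  # displays the figure (shown inline in Google Colab)
```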
Data Visualization Using Seaborn

Introduction:

Seaborn is a Python data visualization library based on the Matplotlib library. It provides a high-level
interface for drawing attractive and informative statistical graphs.

TYPES OF PLOT

• Scatter Plot
• Histogram
• Bar Plot
• Pairwise Plots
• Box and Whiskers Plot

Let’s create a simple plot

pip install seaborn

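A minimal sketch of a simple Seaborn plot. The DataFrame and its column names "hours_studied" and "score" are hypothetical data chosen only for illustration.

```python
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Hypothetical data for illustration.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "score": [35, 45, 50, 62, 70, 82],
})

# Seaborn reads the columns by name and labels the axes automatically.
ax = sns.scatterplot(data=df, x="hours_studied", y="score")
ax.set_title("Score vs. hours studied")
plt.show()
```

Because Seaborn is built on Matplotlib, the returned object is a Matplotlib Axes, so its title and labels can be adjusted in the usual way.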
Data science techniques to real-world problems

Regression Analysis Application: Predicting sales, house prices, or any continuous variable based on historical data.

Classification Algorithms Application: Spam detection in emails, sentiment analysis in social media, or disease diagnosis.

Clustering Application: Customer segmentation for targeted marketing or grouping similar documents.

Time Series Analysis Application: Forecasting stock prices, predicting demand for products, or analyzing trends over time.

NLP Application: Sentiment analysis, chatbot development, or language translation.


Data science techniques to real-world problems

Image Recognition Application: Facial recognition, object detection, or medical image analysis.

Association Rule Mining Application: Market basket analysis of products, or identifying patterns in customer behavior.

Optimization Algorithms Application: Supply chain optimization, resource allocation, or route planning.

Anomaly Detection Application: Fraud detection, network security, or equipment failure prediction.

Reinforcement Learning Application: Game playing, robotic control, or autonomous vehicles.


THANK YOU
