Unit 1
Learning Methodology
• Python Basics
Definition:
Applications of Data Analytics
Why Data Analytics Using Python?
5 3 <class 'int'> 4
John 3 <class 'str'> Spiderman
3.0
Introduction to Python data types
Every value in Python has a data type, but programmers do not need to declare the type of a variable: Python determines it at
run time from the value assigned (dynamic typing). Many other programming languages require the data type to be stated when
the variable is declared. When a value of a different type is needed, Python offers explicit type conversion through
built-in functions such as int(), float() and str().
c = 2 + 4j
print("\nType of c: ", type(c))
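A minimal sketch of dynamic typing and explicit conversion. The variable names and values are illustrative and not taken from the original slides.

```python
# The same name can be rebound to values of different types.
value = 5
first_type = type(value)      # int
value = "John"
second_type = type(value)     # str

# Explicit conversion uses the built-in constructors.
as_float = float(3)           # 3.0
as_int = int("4")             # 4
as_text = str(3)              # "3"

print(first_type, second_type, as_float, as_int, as_text)
```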
Python data types
Integers – This value is represented by the int class. It contains positive or negative whole numbers (without fractions or
decimals). In Python, there is no fixed limit on how long an integer value can be; it is bounded only by available memory.
Example Output
Float – This value is represented by the float class. It is a real number with a floating-point representation. It is specified by a
decimal point.
Example Output
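The numeric types can be tried out as follows. This is a small sketch with made-up values; only the complex number 2 + 4j comes from the slides.

```python
big = 2 ** 100          # integers grow as needed; there is no fixed width
ratio = 7 / 2           # true division always returns a float
c = 2 + 4j              # complex number with real and imaginary parts

print(type(big), big)
print(type(ratio), ratio)
print(type(c), c.real, c.imag)
```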
Strings – In Python 3, a string (the str class) is an immutable sequence of Unicode characters. A string is a collection of
one or more characters enclosed in single quotes, double quotes, or triple quotes.
Creating a String
Strings in Python can be created using single quotes, double quotes, or even triple quotes.
Example Output
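A short sketch of the three quoting styles; the text values are illustrative.

```python
single = 'Welcome'
double = "I'm learning Python"     # double quotes let the text contain an apostrophe
triple = """Line one
Line two"""                        # triple quotes may span several lines

print(single)
print(double)
print(triple)
print(single[0], len(single))      # indexing and length work on any string
```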
Boolean – A data type with one of two built-in values, True or False (the bool class). Objects that evaluate to True in a
Boolean context are called truthy, and those that evaluate to False are called falsy.
Example Output
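Truthy and falsy values can be checked with the built-in bool(). A minimal sketch with illustrative values:

```python
a = 10
b = 3
is_greater = a > b                 # a comparison yields a bool
print(type(is_greater), is_greater)

# bool() reports the truth value of any object.
truthy_examples = [bool(1), bool("text"), bool([0])]
falsy_examples = [bool(0), bool(""), bool([]), bool(None)]
print(truthy_examples)
print(falsy_examples)
```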
Example
x=5
x += 3  # same as x = x + 3
print(x)
Output
8
Python Conditions and If statements
• Equals: a == b
• Not Equals: a != b
• Less than: a < b
• Less than or equal to: a <= b
• Greater than: a > b
• Greater than or equal to: a >= b
If statement
a = 33
b = 200
if b > a:
    print("b is greater than a")
Output
b is greater than a
Elif
a = 33
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
Output
a and b are equal
Else
a = 200
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")
Output
a is greater than b
Short-hand if
a = 200
b = 33
if a > b: print("a is greater than b")
Output
a is greater than b
Python Loops
• while loops
• for loops
i = 1
while i < 6:
    print(i)
    i += 1
Output
1
2
3
4
5
Output of three further while-loop examples (their code is not preserved here):
1 2 4 5 6
1 2 3 4 5, followed by "i is no longer less than 6"
1 2 3
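A while loop can also be combined with break, continue and an else clause. The sketch below is an assumption about how such loops might look, not the original slide code; the results are collected in lists (hypothetical names) so they are easy to inspect.

```python
# break: leave the loop as soon as i reaches 3
seen_break = []
i = 1
while i < 6:
    seen_break.append(i)
    if i == 3:
        break
    i += 1

# continue: skip the rest of the body when i is 3
seen_continue = []
i = 0
while i < 6:
    i += 1
    if i == 3:
        continue
    seen_continue.append(i)

# else: runs once when the condition becomes false (not after a break)
seen_else = []
i = 1
while i < 6:
    seen_else.append(i)
    i += 1
else:
    message = "i is no longer less than 6"

print(seen_break, seen_continue, seen_else, message)
```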
Python For Loops
A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or a string).
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)

fruits = ["apple", "banana", "cherry"]
for x in fruits:
    if x == "banana":
        ...  # the rest of this example is cut off in the source
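The same for syntax works on every sequence type named above. A minimal sketch; apart from the fruits list, all values are made up for illustration.

```python
fruits = ["apple", "banana", "cherry"]    # list
point = (3, 4)                            # tuple
prices = {"apple": 3, "banana": 7}        # dictionary
unique_tags = {"red", "green"}            # set (iteration order is not guaranteed)
word = "abc"                              # string

for fruit in fruits:
    print(fruit)

for coordinate in point:
    print(coordinate)

for name, price in prices.items():        # .items() yields key/value pairs
    print(name, price)

for tag in sorted(unique_tags):           # sorted() gives a repeatable order
    print(tag)

letters = [letter for letter in word]     # iterating a string yields its characters
print(letters)
```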
Calling a Function
def my_function():
    print("Hello from a function")

my_function()
Output
Hello from a function
• An array is a special variable, which can hold more than one value at a time.
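Accessing an Array Element
cars = ["Ford", "Volvo", "BMW"]  # assumed list; only the first element "Ford" is confirmed by the output
x = cars[0]
print(x)
Output
Ford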
Difference between List, Tuple, Dictionary, Set
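The main differences between the four collection types can be sketched as follows. The values are illustrative only.

```python
fruits_list = ["apple", "banana", "apple"]    # ordered, mutable, allows duplicates
fruits_tuple = ("apple", "banana", "apple")   # ordered, immutable, allows duplicates
fruits_set = {"apple", "banana", "apple"}     # unordered, mutable, duplicates are dropped
fruit_prices = {"apple": 3, "banana": 7}      # key/value pairs, keys are unique

fruits_list.append("cherry")                  # a list can be changed in place

try:
    fruits_tuple[0] = "cherry"                # a tuple cannot be changed
except TypeError as error:
    tuple_error = str(error)

print(fruits_list)
print(len(fruits_set))
print(fruit_prices["apple"])
print(tuple_error)
```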
Python Classes and Objects
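A minimal sketch of a class and an object created from it. The class name Person, its fields and the greet method are hypothetical and chosen only for illustration.

```python
class Person:
    """A small illustrative class with two attributes and one method."""

    def __init__(self, name, age):
        # __init__ runs when a new object is created
        self.name = name
        self.age = age

    def greet(self):
        return f"Hello, my name is {self.name}"


p1 = Person("John", 36)   # create an object (instance) of the class
print(p1.greet())
print(p1.age)
```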
What is Pandas?
• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in
2008.
• Pandas allows us to analyze big data and make conclusions based on statistical theories.
• Pandas can clean messy data sets, and make them readable and relevant.
Introducing the pandas DataFrame
The pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels.
Pandas
Installation of Pandas
• pip install pandas
Import Pandas
• Once Pandas is installed, import it in your applications by adding the import keyword:
• import pandas as pd
Example
import pandas as pd
mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
Output
    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2
Pandas DataFrame
• pd.read_csv("filename")
• pd.read_table("filename")
• pd.read_excel("filename")
• pd.read_sql(query, connection_object)
• pd.read_json(json_string)
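A minimal sketch of two of these readers. To keep it self-contained, the data is read from in-memory text via io.StringIO instead of a file on disk; the column names EmpID and Skill and all values are assumptions made for illustration.

```python
import io

import pandas as pd

# CSV text standing in for the contents of a file
csv_text = "EmpID,Skill\n101,Python\n102,SQL\n103,Excel\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# JSON text standing in for a JSON document
json_text = '[{"EmpID": 101, "Skill": "Python"}, {"EmpID": 102, "Skill": "SQL"}]'
df_json = pd.read_json(io.StringIO(json_text))
print(df_json)
```

With a real file, the path is passed directly, for example pd.read_csv("filename").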
DataFrame Attributes
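Some commonly used DataFrame attributes, shown on a small made-up table (the column names and values are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "EmpID": [101, 102, 103],
    "Skill": ["Python", "SQL", "Excel"],
})

print(df.shape)     # (number of rows, number of columns)
print(df.size)      # total number of cells
print(df.ndim)      # number of dimensions
print(df.columns)   # column labels
print(df.index)     # row labels
print(df.dtypes)    # data type of each column
```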
Definition
DataFrame selection
Slicing
print(df.head(2))
print(df.tail(2))
Indexing
df["Skill"]
df[["EmpID", "Skill"]]
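Slicing and indexing can be tried on a small table. The DataFrame below is a made-up stand-in for df; only the column names EmpID and Skill come from the slides.

```python
import pandas as pd

df = pd.DataFrame({
    "EmpID": [101, 102, 103, 104],
    "Skill": ["Python", "SQL", "Python", "Excel"],
})

first_two = df.head(2)            # first two rows
last_two = df.tail(2)             # last two rows
middle = df.iloc[1:3]             # rows at positions 1 and 2

skills = df["Skill"]              # one column, returned as a Series
subset = df[["EmpID", "Skill"]]   # list of columns, returned as a DataFrame

print(first_two)
print(last_two)
print(middle)
print(skills)
print(subset)
```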
Filtering
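Filtering keeps only the rows for which a condition is true. A minimal sketch using Boolean masks; the table, including the Experience column, is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "EmpID": [101, 102, 103, 104],
    "Skill": ["Python", "SQL", "Python", "Excel"],
    "Experience": [2, 5, 7, 1],
})

# One condition: rows whose Skill is "Python"
python_rows = df[df["Skill"] == "Python"]

# Two conditions: combine masks with & (and) or | (or), each in parentheses
senior_python = df[(df["Skill"] == "Python") & (df["Experience"] > 3)]

print(python_rows)
print(senior_python)
```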
Definition
NumPy stands for ‘Numerical Python’. It is a Python package for working with arrays and a fundamental library for scientific
computing. Its most important feature is the n-dimensional array object (ndarray). It provides statistical functions, linear
algebra, arithmetic operations, bitwise operations, etc.
Installation of NumPy
• pip install numpy
Import NumPy
• import numpy as np
Representation of NumPy multidimensional arrays
Why a NumPy array instead of a Python list?
Why use NumPy for Machine Learning, Deep Learning, and Data Science?
Uses of NumPy
NumPy Operations
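A minimal sketch of element-wise NumPy operations, with a plain Python list shown for contrast. The array values are made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

total = a + b          # element-wise addition
product = a * b        # element-wise multiplication
scaled = a * 2.5       # the scalar is applied to every element
mean_value = a.mean()  # aggregate over the whole array

# A plain list needs an explicit loop or comprehension for the same result.
list_doubled = [x * 2 for x in [1, 2, 3, 4]]

print(total, product, scaled, mean_value, list_doubled)
```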
NumPy String Operations
The string functions live in the np.char module: np.char.lower(), np.char.upper(), np.char.join(), np.char.strip(),
np.char.capitalize(), np.char.title(), np.char.split().
np.char.lower()
It converts all the uppercase characters in the string to lowercase. If there are no upper-case characters, then it returns
the original string.
import numpy as np
#lower case
print(np.char.lower(['Data', 'Flair']))
Output
['data' 'flair']
np.char.split()
This function returns a list of substrings after breaking the string at an input separator (whitespace by default).
#split
print(np.char.split("Data Flair"))
Output
['Data', 'Flair']
How to create 1 & 2 dimensional NumPy array?
Creating Matrices
We can pass a Python list of lists to np.array() to have NumPy create a matrix (a 2-dimensional array) to represent it:
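A minimal sketch of creating 1-dimensional and 2-dimensional arrays; the numbers are illustrative.

```python
import numpy as np

one_d = np.array([1, 2, 3])                    # from a flat list
two_d = np.array([[1, 2], [3, 4], [5, 6]])     # from a list of lists

print(one_d.ndim, one_d.shape)
print(two_d.ndim, two_d.shape)

# Helper constructors build arrays of a given shape directly.
zeros = np.zeros((2, 3))
ones = np.ones((3, 2))
print(zeros)
print(ones)
```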
Matrix Indexing & Matrix Aggregation
Indexing and slicing operations become even more useful when we’re manipulating matrices:
Figures: Matrix Indexing, Matrix Aggregation
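Matrix indexing and aggregation can be sketched on a small 3 x 2 matrix. The values are made up for illustration.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# Indexing and slicing use [row, column]
element = m[0, 1]          # row 0, column 1
last_rows = m[1:3]         # rows 1 and 2, all columns
first_column = m[:, 0]     # all rows, column 0

# Aggregation over the whole matrix or along one axis
overall_max = m.max()
overall_sum = m.sum()
column_max = m.max(axis=0)   # one result per column
row_max = m.max(axis=1)      # one result per row

print(element, last_rows, first_column)
print(overall_max, overall_sum, column_max, row_max)
```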
Difference between Pandas and NumPy
Pandas NumPy
Box Plot
Plots of Matplotlib
# You can import pyplot using the following code
from matplotlib import pyplot as plt
plt.show(): This function in pyplot is used to display all the figures created in your code.
plt.annotate(): This function is used to add comments inside the figure to make the figure meaningful.
plt.grid(): This function is used to display a grid inside the figure.
What Is Matplotlib
Bar Graph
Syntax:
matplotlib.pyplot.bar(x, height, width=0.5)
Histogram
Syntax:
plt.hist(x, optional parameters)
Scatter Plot
Syntax:
plt.scatter(x, y, optional parameters)
Box Plot
Syntax:
plt.boxplot(x, optional parameters)
Pie Chart
Syntax:
plt.pie(x, optional parameters)
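The five plot types can be sketched side by side, assuming matplotlib is installed. The data, the labels and the output file name plot_types.png are made up for illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; omit this and call plt.show() when working interactively
from matplotlib import pyplot as plt

values = [2, 3, 3, 5, 7, 8, 8, 8, 10]
labels = ["A", "B", "C"]
heights = [3, 7, 2]

fig, axes = plt.subplots(1, 5, figsize=(15, 3))

axes[0].bar(labels, heights, width=0.5)        # bar graph
axes[1].hist(values, bins=5)                   # histogram
axes[2].scatter(range(len(values)), values)    # scatter plot
axes[3].boxplot(values)                        # box plot
axes[4].pie(heights, labels=labels)            # pie chart

for ax, title in zip(axes, ["Bar", "Histogram", "Scatter", "Box", "Pie"]):
    ax.set_title(title)

fig.savefig("plot_types.png")
```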
Let’s create a simple plot
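A minimal sketch of a simple line plot, assuming matplotlib is installed. The data values, the labels and the output file name simple_plot.png are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; omit this and call plt.show() when working interactively
from matplotlib import pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")       # line plot with a marker at each point
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple plot")
ax.grid(True)                   # show a grid inside the figure

fig.savefig("simple_plot.png")
```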
Output:
Data Visualization Using Seaborn
Introduction:
Seaborn is a Python data visualization library based on the Matplotlib library. It provides a high-level
interface for drawing attractive and informative statistical graphs.
TYPES OF PLOT
Box and Whiskers Plot
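A box and whiskers plot can be drawn with Seaborn as sketched below, assuming seaborn and pandas are installed. The table, its column names group and score, and the output file name seaborn_boxplot.png are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so the sketch runs without a display
import pandas as pd
import seaborn as sns

scores = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "score": [55, 60, 72, 65, 80, 91],
})

# One box per group; seaborn returns the Matplotlib Axes it drew on
ax = sns.boxplot(data=scores, x="group", y="score")
ax.set_title("Scores by group")

ax.figure.savefig("seaborn_boxplot.png")
```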
Plots of Seaborn
Output:
Applying data science techniques to real-world problems
• Regression Analysis. Application: Predicting sales, house prices, or any continuous variable based on historical data.
• Classification Algorithms. Application: Spam detection in emails, sentiment analysis in social media, or disease diagnosis.
• Time Series Analysis. Application: Forecasting stock prices, predicting demand for products, or analyzing trends over time.
• Image Recognition. Application: Facial recognition, object detection, or medical image analysis.
• Association Rule Mining. Application: Market basket analysis for products, or identifying patterns in customer behavior.
• Optimization Algorithms. Application: Supply chain optimization, resource allocation, or route planning.
• Anomaly Detection. Application: Fraud detection, network security, or equipment failure prediction.