EX-02-Data manipulation pandas matplot
To perform data manipulation using NumPy and pandas, and data visualization using Matplotlib.
Program:
A. Data Manipulation using NumPy
NumPy stands for Numerical Python. NumPy is a Python library used for working with arrays.
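The output that follows shows a two-dimensional array, but the program that printed it is missing from the listing. A minimal sketch that would print it, assuming a plain nested-list literal (the name arr2d is hypothetical):

```python
import numpy as np

# A 2-D array is built from a list of equal-length lists, one list per row
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2d)
```

Here arr2d.ndim is 2, arr2d.shape is (2, 3), and arr2d[1, 2] selects 6 (row 1, column 2).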
Output:
[[1 2 3]
[4 5 6]]
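import numpy as np
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()  # copy() creates a new array with its own data
arr[0] = 42     # changing the original array...
print(arr)
print(x)        # ...does not affect the copy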
Output:
[42 2 3 4 5]
[1 2 3 4 5]
[42 12 13 14 15]
[42 12 13 14 15]
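For contrast with copy(), a view is a new array object that shares the original's data, so a change to one is visible through the other. A minimal sketch (the name v is illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
v = arr.view()          # a view: new array object, same underlying data
arr[0] = 42             # modifying the original...
print(arr)              # [42  2  3  4  5]
print(v)                # [42  2  3  4  5]  ...is visible through the view
print(v.base is arr)    # True: the view does not own its data
print(arr.copy().base)  # None: a copy owns its data
```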
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))  # join the two arrays end to end
print(arr)
print(np.split(arr, 2))  # split back into 2 equal parts
Output:
[1 2 3 4 5 6]
[array([1, 2, 3]), array([4, 5, 6])]
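np.split only works when the array divides into equal parts; for uneven sections NumPy provides np.array_split. A short sketch on the same six-element array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

try:
    np.split(arr, 4)            # 6 elements cannot be divided into 4 equal parts
except ValueError as err:
    print("np.split failed:", err)

parts = np.array_split(arr, 4)  # allows unequal sections
print(parts)  # [array([1, 2]), array([3, 4]), array([5]), array([6])]
```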
Output:
[0 1 2 3]
3
B. Data Manipulation using Pandas
Pandas is a Python library that allows us to analyze large datasets and draw conclusions from them using statistical methods.
read_csv(filename): Used to load a dataset from a CSV file into a DataFrame
dropna(): Used to remove rows (or, with axis=1, columns) that contain missing values
fillna(): Used to replace NaN values with other values
mean(): Returns the mean of the values along the requested axis
sum(): Returns the sum of the values along the requested axis
count(): Counts the non-NA/null observations along the requested axis
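A small sketch exercising these functions, including the axis argument, on a hypothetical two-column frame built inline (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column x
df = pd.DataFrame({"x": [1.0, 2.0, np.nan], "y": [4.0, 5.0, 6.0]})

print(df.count())         # non-NA values per column: x 2, y 3
print(df.sum(axis=0))     # column sums, NaN skipped: x 3.0, y 15.0
print(df.mean(axis=1))    # row means: 2.5, 3.5, 6.0
print(df.dropna(axis=1))  # drops column x, the only column containing NaN
print(df.fillna(0))       # NaN replaced by 0
```

By default these reductions skip NaN, which is why the last row mean is 6.0 rather than NaN.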
import pandas as pd
# Load dataset
dframe = pd.read_csv('Data.csv')
print(dframe)
# Drop every row that contains at least one missing (NaN) value
dframe.dropna(inplace=True)
print(dframe)
Output:
a b c
0 23 10.0 0.0
1 24 12.0 NaN
2 22 NaN NaN
a b c
0 23 10.0 0.0
a b c
0 23 10.0 0.0
1 24 12.0 0.0
2 22 11.0 0.0
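The third table in the output above (with the NaN values filled in) has no matching code in the listing. Its values agree with filling each column's NaNs with that column's mean. A sketch under that assumption, rebuilding the table from the printed values with io.StringIO because Data.csv is not included:

```python
import io

import pandas as pd

# The three-row table printed above, read from an in-memory CSV string
csv_text = "a,b,c\n23,10.0,0.0\n24,12.0,\n22,,\n"
dframe = pd.read_csv(io.StringIO(csv_text))

# Replace each NaN with the mean of its own column (b -> 11.0, c -> 0.0)
filled = dframe.fillna(dframe.mean())
print(filled)
```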
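import pandas as pd
left = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'], 'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'], 'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
# merge: combine rows whose 'Key' values match ('Key' stays a column)
print(pd.merge(left, right, on='Key'))
# join: combines on the index, so make 'Key' the index of both frames first
# (left.join(right) alone raises ValueError because both frames have a 'Key' column)
print(left.set_index('Key').join(right.set_index('Key')))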
Output:
Key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3
A B C D
K0 A0 B0 C0 D0
K1 A1 B1 C1 D1
K2 A2 B2 C2 D2
K3 A3 B3 C3 D3
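When the keys only partly overlap, the how parameter of pd.merge decides which rows survive. A sketch with two hypothetical frames:

```python
import pandas as pd

# Hypothetical frames whose keys only partly overlap
left = pd.DataFrame({"Key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"Key": ["K1", "K2", "K3"], "C": ["C1", "C2", "C3"]})

inner = pd.merge(left, right, on="Key", how="inner")  # only keys in both: K1, K2
outer = pd.merge(left, right, on="Key", how="outer")  # all keys, NaN where missing
print(inner)
print(outer)
```

how="inner" is the default, which is why identical key sets give the same result either way.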
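import pandas as pd
df = pd.read_csv('data.csv')
print(df)
# Mean
print(df.mean())
# Count
print(df.count())
# Median
print(df.median())
# Sum
print(df.sum())
# Mean absolute deviation (df.mad() was removed in pandas 2.0)
print((df - df.mean()).abs().mean())
# Standard deviation
print(df.std())
# Var (Variance)
print(df.var())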
Output:
A B
0 0.374540 0.155995
1 0.950714 0.058084
2 0.731994 0.866176
3 0.598658 0.601115
4 0.156019 0.708073
A 0.562385
B 0.477888
dtype: float64
A 5
B 5
dtype: int64
A 0.598658
B 0.601115
dtype: float64
A 2.811925
B 2.389442
dtype: float64
A 0.237685
B 0.296679
dtype: float64
A 0.308748
B 0.353125
dtype: float64
A 0.095325
B 0.124697
dtype: float64
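data.csv is not included with the exercise. The printed values coincide with the first ten draws of NumPy's legacy random generator seeded with 42 (column A takes draws 1 to 5, column B draws 6 to 10), so a stand-in frame can be built as follows. This is an assumption about how the file was produced, not the author's stated method:

```python
import numpy as np
import pandas as pd

# Assumption: data.csv held the first ten draws of np.random after seed(42)
np.random.seed(42)
df = pd.DataFrame({"A": np.random.rand(5), "B": np.random.rand(5)})

# count, mean, std, min, quartiles and max in one call
print(df.describe())
```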
C. Data Visualization using Matplotlib
1. Line Drawing
Program:
import matplotlib.pyplot as plt
import numpy as np
# 'seaborn-whitegrid' is named 'seaborn-v0_8-whitegrid' from Matplotlib 3.6 onward
plt.style.use('seaborn-v0_8-whitegrid')
fig = plt.figure()
plt.title('Line styles')
plt.xlabel('x')
plt.ylabel('Y')
x = np.linspace(1, 20, 10)
# Four parallel lines, each drawn with a different line style
plt.plot(x, x + 0, linestyle='solid', linewidth=5, label='solid')
plt.plot(x, x + 1, linestyle='dashed', label='dashed')
plt.plot(x, x + 2, linestyle='dashdot', label='dash dot')
plt.plot(x, x + 3, linestyle='dotted', label='dotted')
# Explicit axis limits, set after plotting
plt.xlim(1, 10)
plt.ylim(1, 20)
plt.legend()
plt.savefig('linestyles.png')  # Output graph saved in PNG format in the current folder
Output:
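The same line-style options apply to curves. A minimal sketch plotting sine and cosine with the object-oriented Matplotlib API; writing the figure to an in-memory buffer instead of a file, and using the non-interactive Agg backend, are choices made here so the sketch runs anywhere:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), linestyle="solid", label="sin(x)")
ax.plot(x, np.cos(x), linestyle="dashed", label="cos(x)")
ax.set_title("Sin and Cos curves")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # write the PNG to memory instead of a file
```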
2. Scatter Plot
Program:
import matplotlib.pyplot as plt
import numpy as np
plt.xlim(1, 10)
plt.ylim(0, 10)
plt.title('Scatter plots')
plt.xlabel('x axis')
plt.ylabel('Y axis')
x = np.linspace(0, 10, 10)
y = x + 2
plt.plot(x, y, 'o', color='green')  # scatter plot using plt.plot()
# OR (either call alone produces the scatter plot)
plt.scatter(x, y, marker='o', color='green')  # scatter plot using plt.scatter()
plt.savefig('scatter.pdf')  # Output graph saved in PDF format in the current folder
Output:
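plt.plot is sufficient when every marker looks the same; plt.scatter (or Axes.scatter) is the one to use when each point needs its own size or color. A sketch on randomly generated points (the seed and data are illustrative only):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
x = rng.random(20) * 10
y = rng.random(20) * 10
sizes = rng.random(20) * 200    # per-point marker area in points^2
colors = rng.random(20)         # per-point value mapped through the colormap

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes, c=colors, cmap="viridis", alpha=0.6)
fig.colorbar(points, ax=ax)     # shows which color corresponds to which value
ax.set_title("Scatter plot with per-point size and color")
```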
3. Bar chart
Program:
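import matplotlib.pyplot as plt
# heights of bars
height = [10, 24, 36, 40, 5]
# x positions of the bars, one per height
x_pos = range(len(height))
# draw the bars
plt.bar(x_pos, height)
# plot title
plt.title('My bar chart!')
plt.show()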
Output:
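Bars are easier to read with category names on the x axis; Axes.bar accepts them through its tick_label parameter. A sketch reusing the same heights, with hypothetical category labels:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

height = [10, 24, 36, 40, 5]
labels = ["A", "B", "C", "D", "E"]  # hypothetical category names

fig, ax = plt.subplots()
bars = ax.bar(range(len(height)), height, tick_label=labels, color="steelblue")
ax.set_title("My bar chart!")
ax.set_ylabel("height")
```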