
EX-02-Data manipulation pandas matplot

The document outlines data manipulation techniques using NumPy and Pandas, as well as data visualization methods with Matplotlib. It includes various programs demonstrating array creation, reshaping, copying, concatenating, and handling missing data, along with aggregation functions in Pandas. Additionally, it covers plotting techniques like line graphs, scatter plots, and bar charts using Matplotlib.
Copyright
© All Rights Reserved

AIM:

To perform data manipulation using NumPy and pandas, and data visualization using Matplotlib.

Program:
A. Data Manipulation using NumPy

NumPy stands for Numerical Python. NumPy is a Python library used for working with arrays.

array(): Creates a NumPy ndarray.
copy(): Returns a new array that owns its data; changes made to the copy do not affect the original array, and vice versa.
view(): Returns a view of the original array. The view does not own the data, so changes made to the view affect the original array, and vice versa.
reshape(): Changes the dimensions (shape) of an array without changing its data.
concatenate(): Joins two or more arrays into a single array.
split(): The reverse of joining. The array and the number of splits are passed as arguments.
where(): Searches an array for a certain value and returns the indexes that match.
sort(): Puts the elements in an ordered sequence and returns the sorted array.
random: Module for working with random numbers.
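The `random` module listed above is not used in the programs that follow. As a minimal sketch, assuming NumPy 1.17 or later (which provides the `default_rng` Generator API), random arrays can be created reproducibly like this; the seed value 42 is an arbitrary choice:

```python
import numpy as np

# Create a random number generator with a fixed seed so results are repeatable
rng = np.random.default_rng(seed=42)

ints = rng.integers(low=0, high=10, size=5)   # 5 random integers in [0, 10)
floats = rng.random(size=(2, 3))              # 2x3 array of floats in [0.0, 1.0)
choice = rng.choice([10, 20, 30])             # one element picked at random

print(ints)
print(floats)
print(choice)
```

Running the script again with the same seed prints the same numbers.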

1. Program to create array and perform reshape:


import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
x=arr.reshape(2,3)
print(x)

Output:
[[1 2 3]
[4 5 6]]
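`reshape()` can also infer one dimension: passing `-1` tells NumPy to compute that axis from the total number of elements. A short sketch using the same array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# -1 lets NumPy infer the number of columns: 6 elements / 3 rows = 2 columns
y = arr.reshape(3, -1)
print(y)          # [[1 2]
                  #  [3 4]
                  #  [5 6]]

# reshape(-1) flattens the array back to one dimension
flat = y.reshape(-1)
print(flat)       # [1 2 3 4 5 6]

# The new shape must hold exactly the same number of elements
try:
    arr.reshape(4, 2)
except ValueError as err:
    print("Cannot reshape:", err)
```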

2. Program to demonstrate copy and view

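import numpy as np

# copy(): x owns its data, so changing arr does not affect x
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42
print(arr)
print(x)

# view(): y shares arr's data, so changing arr also changes y
arr = np.array([11, 12, 13, 14, 15])
y = arr.view()
arr[0] = 42
print(arr)
print(y)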

Output:
[42 2 3 4 5]
[1 2 3 4 5]

[42 12 13 14 15]
[42 12 13 14 15]
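To check whether an array owns its data, NumPy arrays expose a `base` attribute: it is `None` for an array that owns its memory (such as a copy) and refers to the original array for a view. A small sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()   # independent copy
y = arr.view()   # view sharing arr's data

print(x.base)          # None -> the copy owns its data
print(y.base is arr)   # True -> the view borrows arr's data

# Changes made through the view are visible in the original array
y[1] = 99
print(arr)             # [ 1 99  3  4  5]
```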

3. Program to demonstrate concatenate and split

import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
print(np.split(arr,2))

Output:
[1 2 3 4 5 6]
[array([1, 2, 3]), array([4, 5, 6])]
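`np.split()` requires the array to divide into equal parts; otherwise it raises a `ValueError`. For uneven splits NumPy provides `np.array_split()`, which makes the earlier parts one element longer. `np.split()` also accepts a list of cut positions. A short sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# 6 elements cannot be divided into 4 equal parts
try:
    np.split(arr, 4)
except ValueError as err:
    print("split failed:", err)

# array_split allows unequal parts
parts = np.array_split(arr, 4)
print(parts)   # [array([1, 2]), array([3, 4]), array([5]), array([6])]

# A list of indices cuts the array at those positions: [:2], [2:4], [4:]
head, middle, tail = np.split(arr, [2, 4])
print(head, middle, tail)   # [1 2] [3 4] [5 6]
```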

4. Program to demonstrate sorting and searching


import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
print(np.where(arr==2))

Output:
[0 1 2 3]
(array([1]),)
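`np.where()` returns a tuple with one index array per dimension, which is why the result above is a tuple. The condition can be any Boolean expression, and with three arguments `where()` picks values element by element instead of returning indexes. `np.argsort()` is the searching counterpart of `sort()`: it returns the indexes that would sort the array. A sketch:

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

# Indexes of all even elements (2 at index 1, 0 at index 2)
even_idx = np.where(arr % 2 == 0)[0]
print(even_idx)            # [1 2]

# Three-argument form: keep values greater than 1, replace the rest with -1
clipped = np.where(arr > 1, arr, -1)
print(clipped)             # [ 3  2 -1 -1]

# argsort returns the indexes that would sort the array
order = np.argsort(arr)
print(order)               # [2 3 1 0]
print(arr[order])          # [0 1 2 3]
```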

B. Data Manipulation using Pandas

Pandas is a Python library that allows us to analyze large datasets and draw conclusions based on statistical methods.
read_csv(dataset name): Loads a dataset from a CSV file into a DataFrame.
dropna(): Removes rows or columns that contain missing values.
fillna(): Replaces NaN values with other values.
mean(): Returns the mean of the values along the requested axis.
sum(): Returns the sum of the values along the requested axis.
count(): Counts the number of non-NA/null observations along the requested axis.
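Several of these functions take an `axis` argument: `axis=0` (the default) aggregates down each column, while `axis=1` aggregates across each row. Missing values are skipped. A minimal sketch on a small hypothetical DataFrame (the column names `x` and `y` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame with one missing value
df = pd.DataFrame({"x": [1.0, 2.0, np.nan], "y": [4.0, 5.0, 6.0]})

print(df.sum())          # per column: x 3.0, y 15.0 (NaN is skipped)
print(df.sum(axis=1))    # per row: 5.0, 7.0, 6.0
print(df.count())        # non-missing values per column: x 2, y 3
print(df.mean())         # per column: x 1.5, y 5.0
```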

1. Program for handling missing data:


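import numpy as np
import pandas as pd

# Load dataset (Data.csv holds the three rows shown in the first output below)
dframe = pd.read_csv('Data.csv')
print(dframe)

# Dropping missing data: every row containing a NaN is removed
dropped = dframe.dropna()
print(dropped)

# Filling missing data with the column mean (computed on the original data)
filled = dframe.fillna(value=dframe.mean(numeric_only=True))
print(filled)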

Output:
    a     b    c
0  23  10.0  0.0
1  24  12.0  NaN
2  22   NaN  NaN

    a     b    c
0  23  10.0  0.0

    a     b    c
0  23  10.0  0.0
1  24  12.0  0.0
2  22  11.0  0.0
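Because `Data.csv` itself is not included, the sketch below rebuilds the same three rows inline (values taken from the first output above) so the example runs on its own. It also uses `isna()` to count the missing values per column, `dropna(subset=...)` to drop rows only when a chosen column is missing, and a dictionary passed to `fillna()` to use a different fill value per column; the constant `-1.0` for column `c` is only an illustrative choice.

```python
import numpy as np
import pandas as pd

# Same data as the Data.csv output above, built inline
dframe = pd.DataFrame({"a": [23, 24, 22],
                       "b": [10.0, 12.0, np.nan],
                       "c": [0.0, np.nan, np.nan]})

# Count the missing values in each column
print(dframe.isna().sum())       # a 0, b 1, c 2

# Drop only the rows where column 'b' is missing
print(dframe.dropna(subset=["b"]))

# Fill each column with its own value: mean for 'b', a constant for 'c'
filled = dframe.fillna({"b": dframe["b"].mean(), "c": -1.0})
print(filled)
```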

2. Program for merge and join of data frames using pandas:

import numpy as np
import pandas as pd

left = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'], 'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'Key': ['K0', 'K1', 'K2', 'K3'], 'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Merging the dataframes on the common 'Key' column
print(pd.merge(left, right, how='inner', on='Key'))

# Joining the dataframes on their index labels
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']}, index=['K0', 'K1', 'K2', 'K3'])

right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']}, index=['K0', 'K1', 'K2', 'K3'])

print(left.join(right))

Output:
  Key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K1  A1  B1  C1  D1
2  K2  A2  B2  C2  D2
3  K3  A3  B3  C3  D3
     A   B   C   D
K0  A0  B0  C0  D0
K1  A1  B1  C1  D1
K2  A2  B2  C2  D2
K3  A3  B3  C3  D3
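In the example above every key appears in both DataFrames, so `how='inner'`, `'left'`, `'right'` and `'outer'` all give the same result. When the keys only partly overlap, the choice matters. A sketch with hypothetical keys, using `indicator=True` to add a `_merge` column that records where each row came from:

```python
import pandas as pd

left = pd.DataFrame({"Key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"Key": ["K1", "K2", "K3"], "C": ["C1", "C2", "C3"]})

# inner: only keys present in both frames (K1, K2)
inner = pd.merge(left, right, how="inner", on="Key")

# outer: all keys from both frames, with NaN where a side has no match
outer = pd.merge(left, right, how="outer", on="Key", indicator=True)

print(inner)
print(outer)
```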

3. Program for aggregation in pandas:

import numpy as np
import pandas as pd

df = pd.read_csv('data.csv')
print(df)

# Mean
print(df.mean())

# Count
print(df.count())

# Median
print(df.median())

# Sum
print(df.sum())

# MAD (mean absolute deviation around the mean); DataFrame.mad() was
# removed in pandas 2.0, so it is computed directly here
print((df - df.mean()).abs().mean())

# Std (sample standard deviation)
print(df.std())

# Var (sample variance)
print(df.var())

Output:
          A         B
0  0.374540  0.155995
1  0.950714  0.058084
2  0.731994  0.866176
3  0.598658  0.601115
4  0.156019  0.708073
A    0.562385
B    0.477888
dtype: float64

A    5
B    5
dtype: int64

A    0.598658
B    0.601115
dtype: float64

A    2.811925
B    2.389442
dtype: float64

A    0.237685
B    0.296679
dtype: float64

A    0.308748
B    0.353125
dtype: float64

A    0.095325
B    0.124697
dtype: float64

C. Data Visualization using Matplotlib

Matplotlib is a multi-platform data visualization library built on NumPy arrays. It supports many kinds of plots, such as line plots, bar charts, scatter plots and histograms.
figure(): Can be thought of as a single container that holds all the objects representing axes, graphics, text and labels.
axes(): A bounding box with ticks and labels, which will eventually contain the plot elements that make up the visualization.
xlim(), ylim(): Set the x and y limits of the axes.
axis([xmin, xmax, ymin, ymax]): Sets the x and y limits in a single call.
xlabel(), ylabel(): Set the labels of the x and y axes.
legend(): Displays a legend that identifies each labeled plot line in the graph.
plt.axis('equal'): Uses equal scaling, so one unit on the x-axis has the same length as one unit on the y-axis.
plt.axis('tight'): Sets the axis limits just large enough to show all the plotted data.
plt.savefig(): Saves the current figure to a file; the format (PNG, PDF, etc.) is taken from the file extension.
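The functions listed above belong to the `pyplot` state-based interface. Matplotlib also offers an object-oriented interface in which `plt.subplots()` returns the figure and axes objects explicitly, and the same settings become methods on the axes. A minimal sketch; the file name `example.png` is hypothetical, and the non-interactive `Agg` backend is selected so the script also runs without a display:

```python
import matplotlib
matplotlib.use("Agg")            # render to files only, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)

fig, ax = plt.subplots()         # figure (container) and one axes (plot area)
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlim(0, 10)               # same effect as plt.xlim()
ax.set_ylim(-1.5, 1.5)           # same effect as plt.ylim()
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("example.png")       # format inferred from the .png extension
```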

1. Line Drawing
Program:
import matplotlib.pyplot as plt
import numpy as np

# This style is called 'seaborn-whitegrid' in Matplotlib versions before 3.6
plt.style.use('seaborn-v0_8-whitegrid')
fig = plt.figure()
plt.title('Line styles')
plt.xlabel('x')
plt.ylabel('y')
x = np.linspace(1, 20, 10)
plt.plot(x, x + 0, linestyle='solid', linewidth=5, label='solid')
plt.plot(x, x + 1, linestyle='dashed', label='dashed')
plt.plot(x, x + 2, linestyle='dashdot', label='dash dot')
plt.plot(x, x + 3, linestyle='dotted', label='dotted')
plt.axis('tight')   # fit the axis limits tightly around the plotted data
# plt.xlim(1, 10); plt.ylim(1, 20)   # alternatively, set the limits explicitly
plt.legend()
plt.savefig('sincos.png')   # output graph saved in PNG format in the current folder

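Output: [figure: four parallel straight lines drawn in solid, dashed, dash-dot and dotted styles, with a legend]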
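Matplotlib also accepts short codes for the same line styles: `'-'` (solid), `'--'` (dashed), `'-.'` (dash-dot) and `':'` (dotted). They can be combined with a color letter in a single format string. A sketch of the same four lines written that way, using the `Agg` backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1, 20, 10)

# Format strings: color letter followed by a line-style code
plt.figure()
plt.plot(x, x + 0, 'b-', label="solid")
plt.plot(x, x + 1, 'g--', label="dashed")
plt.plot(x, x + 2, 'r-.', label="dash dot")
plt.plot(x, x + 3, 'k:', label="dotted")
plt.legend()

# Read back the style of each line that was drawn
styles = [line.get_linestyle() for line in plt.gca().get_lines()]
print(styles)   # ['-', '--', '-.', ':']
```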

2. Scatter Plot
Program:
import matplotlib.pyplot as plt
import numpy as np

plt.xlim(1, 10)
plt.ylim(0, 10)
plt.title('Scatter plots')
plt.xlabel('x axis')
plt.ylabel('Y axis')
x = np.linspace(0, 10, 10)
y = x + 2
plt.plot(x, y, 'o', color='green')   # scatter plot using plt.plot()
# OR, equivalently:
# plt.scatter(x, y, marker='o', color='green')   # scatter plot using plt.scatter()
plt.savefig('scatter.pdf')   # output graph saved in PDF format in the current folder

Output: [figure: green circular markers along the line y = x + 2]
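The main reason to prefer `plt.scatter()` over `plt.plot(..., 'o')` is that it can vary the size and color of each point individually. A sketch with made-up sizes, coloring each point by its x value through the `viridis` colormap:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 10)
y = x + 2

sizes = np.linspace(20, 200, 10)   # marker area grows from left to right

plt.figure()
points = plt.scatter(x, y, s=sizes, c=x, cmap="viridis", alpha=0.7)
plt.colorbar(label="x value")      # show how colors map to values
plt.xlabel("x axis")
plt.ylabel("Y axis")
```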
3. Bar chart
Program:
import matplotlib.pyplot as plt

# x-coordinates of the bars (bars are centered on these values by default)
left = [1, 2, 3, 4, 5]

# heights of bars
height = [10, 24, 36, 40, 5]

# labels for bars
tick_label = ['one', 'two', 'three', 'four', 'five']

# plotting a bar chart (the two colors repeat: red, green, red, ...)
plt.bar(left, height, tick_label=tick_label,
        width=0.8, color=['red', 'green'])

# naming the x-axis
plt.xlabel('x - axis')
# naming the y-axis
plt.ylabel('y - axis')

# plot title
plt.title('My bar chart!')

# display the chart
plt.show()

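Output: [figure: five bars labeled one to five with heights 10, 24, 36, 40 and 5, alternating red and green]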
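For long category names a horizontal bar chart is often easier to read. `plt.barh()` takes the same kind of data with the roles of the axes swapped: the values become bar lengths along the x-axis. A sketch reusing the values from the bar chart program:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

positions = [1, 2, 3, 4, 5]
values = [10, 24, 36, 40, 5]
tick_label = ['one', 'two', 'three', 'four', 'five']

plt.figure()
# barh: bar lengths run along the x-axis, categories along the y-axis
bars = plt.barh(positions, values, tick_label=tick_label,
                height=0.8, color=['red', 'green'])
plt.xlabel('value')
plt.ylabel('category')
plt.title('Horizontal bar chart')
```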