APP Lab Manual Final
IV Sem CSE
Week-1: Study and implementation of various Basic Slicing and Advanced Indexing operations
of NumPy arrays using Python over an example data series.
Week-2: Implement a Python program using aggregations such as min, max, etc.
Example: Consider the heights of all US presidents and find the average height of the
presidents of the United States. This data is available in the file “president_heights.csv”.
Week-3: Write a Python program using NumPy comparisons, masks, and Boolean logic.
Example: Consider the series of data that represents the amount of precipitation each day for a
year in a given city and count the rainy days.
Week-4: Write a Python program using NumPy fancy indexing in single and multiple
dimensions by selecting random points.
Week-9: Implement the python program for the following matplotlib features
i) Color bars.
ii) Annotation
iii) Matplotlib to Text.
iv) Histograms
v) Scatter Plots
vi) Box plot
Week-10: Write a Python program to implement various subpackages of SciPy.
Week-11: Write a Python program to create a parent class and child class along with their own
methods. Access parent class members in the child class to implement the following scenarios.
a) Constructors & destructors
b) Polymorphism
Example:
Create a class ATM and define ATM operations to create account, deposit, check_balance,
withdraw and delete account. Use constructor to initialize members.
Week-12: Implement the various data cleaning steps on example data sets using Python NumPy
and pandas.
Week-13: Implement feature selection on a data set using appropriate sklearn libraries.
Week-1: Study and implementation of various Basic Slicing and Advanced Indexing
operations of NumPy arrays using Python over an example data series.
A Python slice object is constructed by giving start, stop, and step parameters to the
built-in slice function.
import numpy as np
a = np.arange(10)
s = slice(2,7,2)   # start at index 2, stop before index 7, step 2
print(a[s])
Its output is as follows −
[2 4 6]
A single item can be selected by its index.
a = np.arange(10)
b = a[5]
print(b)
Its output is as follows −
5
# slice items starting from index 2
import numpy as np
a = np.arange(10)
print(a[2:])
Now, the output would be −
[2 3 4 5 6 7 8 9]
import numpy as np
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
Slicing can also include an ellipsis (…) to make a selection tuple of the same length as the
number of dimensions of the array. If an ellipsis is used at the row position, the result is an
ndarray made up of the items selected from every row.
Advanced indexing always returns a copy of the data, whereas basic slicing only presents a view
of the original array.
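The two statements above can be checked with a short sketch. The array is the 3×3 example used earlier; the variable names are chosen here only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

# Ellipsis in the row position: take column 1 from every row.
second_column = a[..., 1]      # same as a[:, 1]
# Ellipsis in the column position: take every item of row 1.
second_row = a[1, ...]         # same as a[1, :]
print(second_column)
print(second_row)

# Basic slicing returns a view: writing through it changes the original array.
view = a[0:2, 0:2]
view[0, 0] = 100
print(a[0, 0])

# Advanced (integer-array) indexing returns a copy: the original is untouched.
copy = a[[0, 1], [0, 1]]
copy[0] = -1
print(a[0, 0])

# np.shares_memory reports whether two arrays use the same underlying buffer.
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```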
Integer Indexing
In integer indexing, each selected element is addressed by pairing a row index with the
column index in the same position. In the following example, the row and column index arrays
together select the four corner elements of a 4×3 ndarray, and the result has the shape of
the index arrays.
import numpy as np
x = np.array([[ 0, 1, 2],[ 3, 4, 5],[ 6, 7, 8],[ 9, 10, 11]])
rows = np.array([[0,0],[3,3]])
cols = np.array([[0,2],[0,2]])
y = x[rows,cols]
print(y)
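Integer indexing can also pick one element of a chosen column from each row. The following is a minimal sketch of that case; the small 3×2 array is an assumed example, not part of the original listing.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])

# The row index lists every row; the column index names the column wanted in that row.
rows = [0, 1, 2]
cols = [0, 1, 0]

# Selects the elements at positions (0, 0), (1, 1) and (2, 0).
y = x[rows, cols]
print(y)
```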
Advanced and basic indexing can be combined by using one slice (:) or ellipsis (…) with an
index array. The following example uses a slice for the row and an advanced index for the column.
The result is the same as when a slice is used for both, but the advanced index produces a copy
and may have a different memory layout.
import numpy as np
x = np.array([[ 0, 1, 2],[ 3, 4, 5],[ 6, 7, 8],[ 9, 10, 11]])
# slicing
z = x[1:4,1:3]
print(z)
# using an advanced index for the column
y = x[1:4,[1,2]]
print(y)
In this example, items greater than 5 are returned as a result of Boolean indexing.
import numpy as np
x = np.array([[ 0, 1, 2],[ 3, 4, 5],[ 6, 7, 8],[ 9, 10, 11]])
# a Boolean mask selects the items for which the condition is True
print(x[x > 5])
Description: Usage of different NumPy aggregate functions, such as mean, median, std, var, sum,
min, and max, over a randomly generated array of 100 elements.
import numpy as np
array1 = np.random.randint(1,1000,size = (100))
print(array1)
print("Mean: ", np.mean(array1))
print("median: ", np.median(array1))
print("Std: ", np.std(array1))
print("Var: ", np.var(array1))
print("Sum: ", np.sum(array1))
# cast to float first: the integer product of 100 such values overflows int64
print("Prod: ", np.prod(array1.astype(float)))
print("min: ", np.min(array1))
print("max: ", np.max(array1))
print("argmin: ", np.argmin(array1))
print("argmax: ", np.argmax(array1))
Aggregates available in NumPy can be extremely useful for summarizing a set of values. As a
simple example, let's consider the heights of all US presidents. This data is available in the file
president_heights.csv, which is a simple comma-separated list of labels and values:
import numpy as np
import pandas as pd
data = pd.read_csv('president_heights.csv')
heights = np.array(data['height(cm)'])
print(heights)
print("Mean height: ", heights.mean())
print("Standard deviation:", heights.std())
print("Minimum height: ", heights.min())
print("Maximum height: ", heights.max())
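Description: A NumPy mask is a Boolean array produced by applying a condition to an array. It is
used to select the subset of elements that satisfy the condition. Each comparison below is
evaluated element by element, so every element of the result is of Boolean type.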
import numpy as np
x = np.array([1, 2, 3, 4, 5])
print('x<3 : \n', x < 3)
print('x>3 : \n', x > 3)
print('x<=3 : \n', x <= 3)
print('x>=3 : \n', x >= 3)
print('x==3 : \n', x == 3)
print('x!=3 : \n', x != 3)
print('2 * x == x ** 2: \n', (2 * x) == (x ** 2))
import numpy as np
import pandas as pd
# construct a mask of all summer days (June 21st is the 172nd day)
days = np.arange(365)
summer = (days > 172) & (days < 262)
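The rainy-day count asked for in this exercise can be sketched as follows. The precipitation file is not reproduced in this manual, so the daily amounts below are synthetic values generated only for illustration. The name `inches` and the 0.5 inch threshold are assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic data (assumed): about 40% of days get some rain, the rest get none.
rain_amount = rng.exponential(scale=0.3, size=365)
is_wet = rng.rand(365) < 0.4
inches = np.where(is_wet, rain_amount, 0.0)

# Masks: each comparison yields a Boolean array with one entry per day.
days = np.arange(365)
summer = (days > 172) & (days < 262)
rainy = inches > 0

# Summing a Boolean array counts its True entries.
print("Number of days without rain:", np.sum(inches == 0))
print("Number of rainy days:       ", np.sum(rainy))
print("Days with more than 0.5 in: ", np.sum(inches > 0.5))
print("Rainy days in summer:       ", np.sum(rainy & summer))
print("Median rain on rainy days:  ", np.median(inches[rainy]))
```

With a real data file, only the line that builds `inches` would change; the masks and counts stay the same.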
Week-4: Write a Python program using NumPy fancy indexing in single and multiple
dimensions by selecting random points.
Description: Fancy indexing is conceptually simple: it means passing an array of indices to
access multiple array elements at once. For example, consider the following array:
import numpy as np
rand = np.random.RandomState(42)
x = rand.randint(100, size=10)
print(x)
# When using fancy indexing, the shape of the result reflects the shape of the index
# arrays rather than the shape of the array being indexed:
ind = np.array([[3, 7],
                [4, 5]])
print(x[ind])
# Fancy indexing also works in multiple dimensions. Consider the following array:
X = np.arange(12).reshape((3, 4))
print(X)
# Like with standard indexing, the first index refers to the row, and the second to the
# column:
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
print(X[row, col])
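The "selecting random points" part of the exercise is not shown in the listing above. A minimal sketch follows, assuming a set of 100 two-dimensional points drawn from a normal distribution; the mean and covariance values are illustrative only.

```python
import numpy as np

rand = np.random.RandomState(42)

# Assumed example data: 100 points in two dimensions, shape (100, 2).
mean = [0, 0]
cov = [[1, 2], [2, 5]]
X = rand.multivariate_normal(mean, cov, 100)

# Choose 20 distinct row indices at random.
indices = rand.choice(X.shape[0], 20, replace=False)

# Fancy indexing with the index array selects those 20 points in one step.
selection = X[indices]
print(indices)
print(selection.shape)
```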
Week-5: Study and implementation of various Pandas operations on
i) Data sets ii) Data Frames iii) Crosstab iv) Group by
v) Filter vi) Missing values
Description
Data Frames
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.
Features of DataFrame
● Columns can be of different types
● Size – Mutable
● Labeled axes (rows and columns)
● Can perform arithmetic operations on rows and columns
import pandas as pd
# example data: the original listing does not show how data was defined,
# so these values are assumed for illustration
data = {'name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
# addition of a column
df['gender'] = ['male','female','male','male']
print(df)
import pandas as pd
df = pd.read_csv("survey.csv")
print(df)
X=pd.crosstab(df.Nationality,df.Handedness)
print(X)
X = pd.crosstab(df.Gender, df.Handedness)
print(X)
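Since survey.csv is not reproduced here, the sketch below builds a small stand-in table to show what crosstab returns. The six rows are invented for illustration; only the column names Gender and Handedness come from the listing above.

```python
import pandas as pd

# Hypothetical survey rows (assumed values, not the contents of survey.csv).
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Handedness": ["Right", "Right", "Left", "Left", "Right", "Right"],
})

# Frequency table: rows are Gender, columns are Handedness.
counts = pd.crosstab(df.Gender, df.Handedness)
print(counts)

# margins=True appends row and column totals labelled "All".
with_totals = pd.crosstab(df.Gender, df.Handedness, margins=True)
print(with_totals)

# normalize="index" converts each row to proportions that sum to 1.
row_share = pd.crosstab(df.Gender, df.Handedness, normalize="index")
print(row_share)
```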
Group By
weather_by_cities.csv
    day       city      temperature  windspeed  event
0   1/1/2017  new york  32           6          Rain
1   1/2/2017  new york  36           7          Sunny
2   1/3/2017  new york  28           12         Snow
3   1/4/2017  new york  33           7          Sunny
4   1/1/2017  mumbai    90           5          Sunny
5   1/2/2017  mumbai    85           12         Fog
6   1/3/2017  mumbai    87           15         Fog
7   1/4/2017  mumbai    92           5          Rain
8   1/1/2017  paris     45           20         Sunny
9   1/2/2017  paris     50           13         Cloudy
10  1/3/2017  paris     54           8          Cloudy
11  1/4/2017  paris     42           10         Cloudy
1. What was the maximum temperature in each of these 3 cities?
2. What was the average windspeed in each of these 3 cities?
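import pandas as pd
df = pd.read_csv("weather_by_cities.csv")
g = df.groupby("city")
# iterate over the groups: each item is a (city, sub-DataFrame) pair
for city, data in g:
    print("city:",city)
    print("\n")
    print("data:",data)
g.get_group('mumbai')
# 1. maximum temperature in each city
g['temperature'].max()
# 2. average windspeed in each city
g['windspeed'].mean()
g.max()
# restrict the mean to numeric columns, since day and event are text
g.mean(numeric_only=True)
g.min()
# describe g
g.describe()
g.size()
g.count()
# Jupyter notebook magic that draws plots inline (omit outside a notebook)
%matplotlib inline
g.plot()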
Group data using a custom function: Suppose you want to group your data using a custom function,
and the requirement is to create three groups.
For this you need to write a custom grouping function and pass it to groupby.
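A custom grouping function can be sketched as below. The temperatures are taken from the table above, but the three temperature bands (80 to 90, 50 to 60, everything else) are an assumption made for this illustration, as is the helper name `temperature_band`.

```python
import pandas as pd

# Temperatures from the weather table, in the same row order.
df = pd.DataFrame({
    "city": ["new york"] * 4 + ["mumbai"] * 4 + ["paris"] * 4,
    "temperature": [32, 36, 28, 33, 90, 85, 87, 92, 45, 50, 54, 42],
})

def temperature_band(frame, idx, col):
    """Return the group label for the row with index label idx (bands are assumed)."""
    value = frame[col].loc[idx]
    if 80 <= value <= 90:
        return "80-90"
    elif 50 <= value <= 60:
        return "50-60"
    return "others"

# groupby calls the function once per index label and groups rows by its return value.
g = df.groupby(lambda idx: temperature_band(df, idx, "temperature"))
sizes = g.size()
print(sizes)

for key, group in g:
    print("Group:", key)
    print(group)
```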
Missing Values
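No listing accompanies this heading, so the following is a minimal sketch of the usual pandas tools for missing values: detecting them with isnull, removing rows with dropna, and filling gaps with fillna or interpolate. The small table is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps (assumed values).
df = pd.DataFrame({
    "temperature": [32.0, np.nan, 28.0, np.nan, 31.0],
    "event": ["Rain", "Sunny", None, "Snow", "Sunny"],
})

# Count the missing entries in each column.
print(df.isnull().sum())

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: fill each column with a chosen replacement value.
filled = df.fillna({"temperature": df["temperature"].mean(), "event": "Unknown"})

# Option 3: estimate numeric gaps linearly from the neighbouring rows.
interpolated = df["temperature"].interpolate()

print(dropped)
print(filled)
print(interpolated)
```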
Filters
description: A common operation in data analysis is to filter values based on a condition or
multiple conditions. Pandas provides a variety of ways to filter data points (i.e. rows).
import numpy as np
import pandas as pd
df = pd.DataFrame({
'name':['Jane','John','Ashley','Mike','Emily','Jack','Catlin'],
'ctg':['A','A','C','B','B','C','B'],
'val':np.random.random(7).round(2),
'val2':np.random.randint(1,10, size=7)
})
print(df)
print("\n")
# We can use the logical operators on column values to filter rows.
# Rows in which the value in the “val” column is greater than 0.5:
print(df[df.val > 0.5], "\n")
# The “&” sign stands for “and”; the “|” sign stands for “or”.
print(df[(df.val < 0.5) | (df.val2 == 7)])
print("\n")
#isin method is another way of applying multiple condition for filtering.
names = ['John','Catlin','Mike']
print(df[df.name.isin(names)])
print("\n")
# The nlargest and nsmallest functions select the rows that have the largest or smallest
# values in a column.
print(df.nlargest(3, 'val'))
print("\n")
print(df.nsmallest(2, 'val2'))
print("\n")
Description: Both join and merge can be used to combine two dataframes, but the
join method combines two dataframes on the basis of their indexes, whereas the
merge method is more versatile and allows us to specify columns besides the
index to join on for both dataframes.
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame(
{'id':[1,2,3,4,5],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
print(left)
print (right)
df= pd.merge(left,right,on='id')
print(df)
# merge the two data sets based on both 'id' and 'subject_id'
df= pd.merge(left,right,on=['id','subject_id'])
print(df)
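The kind of merge is controlled by the how parameter of pd.merge. The sketch below reuses the left and right frames defined above and merges them on subject_id with each of the four common settings. The dictionary name `results` is chosen here for illustration.

```python
import pandas as pd

left = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "subject_id": ["sub1", "sub2", "sub4", "sub6", "sub5"]})
right = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "subject_id": ["sub2", "sub4", "sub3", "sub6", "sub5"]})

# inner: keys present in both; left/right: all keys of one side; outer: union of keys.
results = {}
for how in ["inner", "left", "right", "outer"]:
    results[how] = pd.merge(left, right, on="subject_id", how=how)
    print(how)
    print(results[how])

# Merging on both columns keeps only rows where id and subject_id both match.
both = pd.merge(left, right, on=["id", "subject_id"])
print(both)
```

Because both frames have an id column that is not a merge key, pandas adds the suffixes _x and _y to tell them apart in the result.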
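Description: To join dataframes, we use the .join() function. It combines the columns of two
potentially differently-indexed DataFrames into a single result DataFrame.
import pandas as pd
# example frames with partly overlapping indexes (values assumed for illustration)
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj']}, index=['K0', 'K1', 'K2', 'K3'])
df1 = pd.DataFrame({'Qualification': ['MCA', 'B.Tech', 'Phd']}, index=['K0', 'K2', 'K4'])
# joining dataframe (a left join on the index by default)
res = df.join(df1)
print(res)
# getting union of the two indexes
res1 = df.join(df1, how='outer')
print(res1)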
A pivot table is a statistical table that summarizes a larger table, such as a big data set. It is
part of data processing. The summary in a pivot table may include the mean, median, sum, or other
statistics. We can create a pivot table in Python with Pandas using the
DataFrame.pivot_table() method.
Parameters –
index: Column for making new frame’s index.
columns: Column for new frame’s columns.
values: Column(s) for populating new frame’s values.
aggfunc: function, list of functions, dict, default numpy.mean
# importing pandas
import pandas as pd
# creating dataframe
df = pd.DataFrame({'Product' : ['Carrots', 'Broccoli', 'Banana', 'Banana',
'Beans', 'Orange', 'Broccoli',
'Banana'],
'Category' : ['Vegetable', 'Vegetable', 'Fruit', 'Fruit',
'Vegetable', 'Fruit', 'Vegetable',
'Fruit'],
'Quantity' : [8, 5, 3, 4, 5, 9, 11, 8],
'Amount' : [270, 239, 617, 384, 626, 610, 62, 90]})
print(df)
Get the total sales of each product
# total Amount for each product
pivot = df.pivot_table(index =['Product'],
                       values =['Amount'],
                       aggfunc ='sum')
print(pivot)
Get the total sales of each category
# total Amount for each category
pivot = df.pivot_table(index =['Category'],
                       values =['Amount'],
                       aggfunc ='sum')
print(pivot)
Aim: Program using Pandas to perform vectorized string operations.
Program: Strings are among the most popular types in Python. We can create them simply
by enclosing characters in quotes. Python treats single quotes the same as double
quotes.
import pandas as pd
data = ['peter', 'Paul', None, 'MARY', 'gUIDO']
names = pd.Series(data)
names  # printing names
0 peter
1 Paul
2 None
3 MARY
4 gUIDO
dtype: object
We can now call a single method that will capitalize all the entries, while skipping over any
missing values:
names.str.capitalize()
0 Peter
1 Paul
2 None
3 Mary
4 Guido
dtype: object
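monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
                   'Eric Idle', 'Terry Jones', 'Michael Palin'])
monte.str.lower()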
0 graham chapman
1 john cleese
2 terry gilliam
3 eric idle
4 terry jones
5 michael palin
dtype: object
monte.str.len()
0 14
1 11
2 13
3 9
4 11
5 13
dtype: int64
monte.str.startswith('T')
0 False
1 False
2 True
3 False
4 True
5 False
dtype: bool
monte.str.split()
0 [Graham, Chapman]
1 [John, Cleese]
2 [Terry, Gilliam]
3 [Eric, Idle]
4 [Terry, Jones]
5 [Michael, Palin]
dtype: object
Methods using regular expressions
Method      Description
match()     Call re.match() on each element, returning a boolean.
extract()   Call re.match() on each element, returning matched groups as strings.
findall()   Call re.findall() on each element.
replace()   Replace occurrences of pattern with some other string.
contains()  Call re.search() on each element, returning a boolean.
count()     Count occurrences of pattern.
split()     Equivalent to str.split(), but accepts regexps.
rsplit()    Equivalent to str.rsplit(), but accepts regexps.
monte.str.extract('([A-Za-z]+)', expand=False)
0 Graham
1 John
2 Terry
3 Eric
4 Terry
5 Michael
dtype: object
monte.str.findall(r'^[^AEIOU].*[^aeiou]$')
0 [Graham Chapman]
1 []
2 [Terry Gilliam]
3 []
4 [Terry Jones]
5 [Michael Palin]
dtype: object
The get() and slice() operations, in particular, enable vectorized element access from each array.
For example, we can get a slice of the first three characters of each array using str.slice(0, 3).
Note that this behavior is also available through Python's normal indexing syntax–for example,
df.str.slice(0, 3) is equivalent to df.str[0:3]:
monte.str[0:3]
0 Gra
1 Joh
2 Ter
3 Eri
4 Ter
5 Mic
dtype: object
These get() and slice() methods also let you access elements of arrays returned by split(). For
example, to extract the last name of each entry, we can combine split() and get():
monte.str.split().str.get(-1)
0 Chapman
1 Cleese
2 Gilliam
3 Idle
4 Jones
5 Palin
dtype: object
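The pieces shown in this section can be combined in one self-contained sketch. The series are the ones used above; the derived names (`first`, `last`, `initials`, `has_terry`) are chosen here for illustration.

```python
import pandas as pd

monte = pd.Series(["Graham Chapman", "John Cleese", "Terry Gilliam",
                   "Eric Idle", "Terry Jones", "Michael Palin"])

# split() produces a list per entry; get() then picks one item from each list.
first = monte.str.split().str.get(0)
last = monte.str.split().str.get(-1)

# Indexing through .str works element-wise, so this builds two-letter initials.
initials = first.str[0] + last.str[0]

# contains() returns a Boolean Series that can be used as a mask.
has_terry = monte.str.contains("Terry")
print(initials)
print(monte[has_terry])

# Missing values are skipped rather than raising an error.
names = pd.Series(["peter", "Paul", None, "MARY", "gUIDO"])
capitalized = names.str.capitalize()
print(capitalized)
```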
WEEK 8
Program:
data['2015']
Output:
2015-07-04 2
2015-08-04 3
dtype: int64
output:
DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
'2015-07-08'],
dtype='datetime64[ns]', freq=None)
dates.to_period('D')
Output:
dates - dates[0]
Output:
TimedeltaIndex(['0 days', '1 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq=None)
pd.date_range('2015-07-03', '2015-07-10')
Output:
pd.date_range('2015-07-03', periods=8)
Output:
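The fragments above use the variables data and dates without showing how they were built. The sketch below gives one possible setup that is consistent with the printed outputs. The two 2014 entries in the series are an assumption, since only the 2015 rows appear in the output.

```python
import pandas as pd

# A series indexed by dates; the 2014 rows are assumed for illustration.
index = pd.DatetimeIndex(["2014-07-04", "2014-08-04", "2015-07-04", "2015-08-04"])
data = pd.Series([0, 1, 2, 3], index=index)

# Passing only a year selects every entry from that year.
in_2015 = data.loc["2015"]
print(in_2015)

# to_datetime turns a list of date strings into a DatetimeIndex.
dates = pd.to_datetime(["2015-07-03", "2015-07-04", "2015-07-06",
                        "2015-07-07", "2015-07-08"])

# Convert timestamps to daily periods, and subtract a date to get durations.
periods = dates.to_period("D")
offsets = dates - dates[0]
print(periods)
print(offsets)

# date_range accepts either an end date or a number of periods.
by_end = pd.date_range("2015-07-03", "2015-07-10")
by_count = pd.date_range("2015-07-03", periods=8)
print(by_count)
```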
i) Color bars.
ii) Annotation
iii) Matplotlib to Text.
iv) Histograms
v) Scatter Plots
vi) Box plot
# Dataset
# List of total number of items purchased
# from each products
purchaseCount = [100, 200, 150, 23, 30, 50,
156, 32, 67, 89]
# scatterplot
plt.scatter(x=purchaseCount, y=likes, c=ratio, cmap="summer")
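The scatter call above needs two more lists and a colorbar call to produce a figure. A minimal sketch follows; purchaseCount is taken from the listing, while the values of likes and ratio are invented for illustration because the original listing does not show them.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

purchase_count = [100, 200, 150, 23, 30, 50, 156, 32, 67, 89]
# Assumed values: one "likes" count and one like/dislike ratio per product.
likes = [50, 70, 100, 10, 10, 34, 56, 18, 35, 45]
ratio = [1.0, 0.5, 2.0, 0.8, 0.5, 2.1, 0.6, 1.3, 1.1, 1.0]

fig, ax = plt.subplots()

# c= supplies the values to colour by; cmap= names the colour map to use.
points = ax.scatter(purchase_count, likes, c=ratio, cmap="summer")

# The colorbar is a legend for the colours: it maps colour back to the ratio.
cbar = fig.colorbar(points, ax=ax, label="Like/dislike ratio")

ax.set_xlabel("Purchase count")
ax.set_ylabel("Likes")
fig.canvas.draw()
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() to display the figure.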
Annotation
# Implementation of matplotlib.pyplot.annotate()
# function
geeeks.set_ylim(-2, 2)
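Only the last line of the annotation listing survives above. The following sketch shows a complete use of annotate under assumed data: a cosine curve with an arrow pointing at one of its maxima. The axis limits match the surviving set_ylim call; everything else is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Assumed example curve: cos(2*pi*t) has a maximum of 1 at t = 2.
t = np.arange(0.0, 5.0, 0.01)
s = np.cos(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, s)

# xy is the point being annotated; xytext is where the label text is placed.
note = ax.annotate("local maximum",
                   xy=(2, 1),
                   xytext=(3, 1.5),
                   arrowprops=dict(facecolor="green", shrink=0.05))

# Leave room above the curve for the label.
ax.set_ylim(-2, 2)
fig.canvas.draw()
```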
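Matplotlib to Text.
import matplotlib.pyplot as plt
import numpy as np
# sample curve: x and y are not defined in the original listing, so a sine wave is assumed
x = np.arange(0, 10, 0.1)
y = np.sin(x)
plt.plot(x, y, c='g')
# place a text label at data coordinates (4, 0.5)
plt.text(4, 0.5, 'Sine wave')
plt.show()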
Histograms
from matplotlib import pyplot as plt
import numpy as np
# Creating dataset
a = np.array([22, 87, 5, 43, 56,
73, 55, 54, 11,
20, 51, 5, 79, 31,
27])
# Creating histogram
fig, ax = plt.subplots(figsize =(10, 7))
ax.hist(a, bins = [0, 25, 50, 75, 100])
# Show plot
plt.show()
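Scatter Plots
import matplotlib.pyplot as plt
# dataset-1
x1 = [89, 43, 36, 36, 95, 10,
      66, 34, 38, 20]
# dataset2
x2 = [26, 29, 48, 64, 6, 5,
      36, 66, 72, 40]
# plot the two data sets against each other
plt.scatter(x1, x2, c="blue", marker="o")
plt.xlabel("dataset-1")
plt.ylabel("dataset-2")
plt.show()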
Box plot
# Import libraries
import matplotlib.pyplot as plt
import numpy as np
# Creating dataset
np.random.seed(10)
data = np.random.normal(100, 20, 200)
# Creating plot
plt.boxplot(data)
# show plot
plt.show()
Week-10: Write a Python program to implement various subpackages of SciPy.
SciPy is an open-source Python library used for solving mathematical, scientific, engineering, and technical
problems. It allows users to manipulate and visualize data using a wide range of high-level Python
commands. SciPy is built on the NumPy extension of Python.
scipy constants
from scipy import constants
print(constants.minute) #60.0
print(constants.hour) #3600.0
print(constants.day) #86400.0
print(constants.week) #604800.0
print(constants.year) #31536000.0
print(constants.Julian_year) #31557600.0
print(constants.inch) #0.0254
print(constants.foot) #0.30479999999999996
print(constants.yard) #0.9143999999999999
print(constants.mile) #1609.3439999999998
print(constants.mil) #2.5399999999999997e-05
print(constants.pt) #0.00035277777777777776
print(constants.point) #0.00035277777777777776
print(constants.survey_foot) #0.3048006096012192
print(constants.survey_mile) #1609.3472186944373
print(constants.nautical_mile) #1852.0
print(constants.fermi) #1e-15
print(constants.angstrom) #1e-10
print(constants.micron) #1e-06
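print(constants.au) #149597870691.0 (149597870700.0 in recent SciPy releases)
print(constants.astronomical_unit) #149597870691.0 (149597870700.0 in recent SciPy releases)
print(constants.light_year) #9460730472580800.0
print(constants.parsec) #3.0856775813057292e+16 (derived from au, so recent releases print a slightly different value)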
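The listing above covers only scipy.constants. As a hedged sketch of a few other subpackages, the example below solves a linear system with scipy.linalg, integrates a function with scipy.integrate, and minimizes a function with scipy.optimize. The particular equations are chosen here for illustration.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# scipy.linalg: solve the system 3x + 2y = 8, x + 4y = 6.
A = np.array([[3.0, 2.0], [1.0, 4.0]])
b = np.array([8.0, 6.0])
solution = linalg.solve(A, b)
print("x, y =", solution)

# scipy.integrate: integrate sin(x) from 0 to pi (the exact answer is 2).
area, abs_error = integrate.quad(np.sin, 0, np.pi)
print("integral =", area)

# scipy.optimize: find the minimum of (x - 3)^2 + 1, which lies at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)
print("minimum at x =", result.x)
```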
Week-11: Write a Python program to create a parent class and child class along with their own
methods. Access parent class members in the child class to implement the following scenarios.
a) Constructors & destructors
b) Polymorphism
Example:
Create a class ATM and define ATM operations to create account, deposit, check_balance,
withdraw and delete account. Use constructor to initialize members.
Constructors are generally used for instantiating an object. The task of a constructor is to
initialize (assign values to) the data members of the class when an object of the class is created.
In Python, the __init__() method is called the constructor and is always called when an object is
created.
class Addition:
    first = 0
    second = 0
    answer = 0
    # parameterized constructor
    def __init__(self, f, s):
        self.first = f
        self.second = s
    def display(self):
        print("First number = " + str(self.first))
        print("Second number = " + str(self.second))
        print("Addition of two numbers = " + str(self.answer))
    def calculate(self):
        self.answer = self.first + self.second
# creating an object of the class; this invokes the parameterized constructor
# (the argument values are illustrative)
obj = Addition(1000, 2000)
# perform Addition
obj.calculate()
# display result
obj.display()
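class Employee:
    # Initializing
    def __init__(self):
        print('Employee created')
    # Calling destructor
    def __del__(self):
        print("Destructor called")
def Create_obj():
    print('Making Object...')
    obj = Employee()
    print('function end...')
    return obj
# the returned object stays alive until its last reference is removed
obj = Create_obj()
# deleting the last reference lets Python call the destructor
del obj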
What is Polymorphism: The word polymorphism means having many forms. In programming,
polymorphism means the same function name (but different signatures) being used for different
types.
class Bird:
    def intro(self):
        print("There are many types of birds.")
    def flight(self):
        print("Most of the birds can fly but some cannot.")
class sparrow(Bird):
    def flight(self):
        print("Sparrows can fly.")
class ostrich(Bird):
    def flight(self):
        print("Ostriches cannot fly.")
obj_bird = Bird()
obj_spr = sparrow()
obj_ost = ostrich()
obj_bird.intro()
obj_bird.flight()
obj_spr.intro()
obj_spr.flight()
obj_ost.intro()
obj_ost.flight()
class Bank_Account:
    # constructor: every new account starts with a zero balance
    def __init__(self):
        self.balance = 0
    def deposit(self):
        amount = float(input("Enter amount to be Deposited: "))
        self.balance += amount
        print("\n Amount Deposited:", amount)
    def withdraw(self):
        amount = float(input("Enter amount to be Withdrawn: "))
        if self.balance >= amount:
            self.balance -= amount
            print("\n You Withdrew:", amount)
        else:
            print("\n Insufficient balance ")
    def display(self):
        print("\n Net Available Balance=", self.balance)
# Driver code
# creating an object of class
s = Bank_Account()
# calling the methods of the class
s.deposit()
s.withdraw()
s.display()
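The ATM exercise itself can be sketched with a parent class and a child class. This is one possible design, not a prescribed solution: the class Account, the daily_limit attribute, and the customer names are all assumptions made for illustration. Amounts are passed as arguments rather than read with input() so that the sketch runs unattended.

```python
class Account:
    """Parent class: holds the account data shared by every kind of account."""

    def __init__(self, holder, balance=0.0):
        # Constructor: creates the account and initializes its members.
        self.holder = holder
        self.balance = balance
        print("Account created for", holder)

    def check_balance(self):
        return self.balance

    def describe(self):
        return "Account of " + self.holder

    def __del__(self):
        # Destructor: runs when the object is deleted.
        print("Account deleted for", self.holder)


class ATM(Account):
    """Child class: reuses the parent members and adds ATM operations."""

    def __init__(self, holder, balance=0.0, daily_limit=1000.0):
        super().__init__(holder, balance)  # call the parent constructor
        self.daily_limit = daily_limit

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount

    def withdraw(self, amount):
        # Refuse the request if it exceeds the limit or the available balance.
        if amount > self.daily_limit or amount > self.balance:
            return False
        self.balance -= amount
        return True

    def describe(self):
        # Polymorphism: same method name as the parent, different behaviour.
        return "ATM account of " + self.holder


atm = ATM("Asha", 500.0)
atm.deposit(250.0)
ok = atm.withdraw(300.0)
rejected = atm.withdraw(5000.0)
balance = atm.check_balance()

# The same call works on both classes and dispatches to the right method.
descriptions = [acct.describe() for acct in (Account("Ravi"), atm)]
print(descriptions)

del atm  # delete account: triggers the destructor
```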
Week-12: Implement the various data cleaning steps on example data sets using Python NumPy
and pandas.
Data cleaning is the process of correcting or removing corrupt, incorrect, or unnecessary data
from a data set before data analysis. Here are the basic data cleaning tasks:
1. Importing Libraries
2. Input Customer Feedback Dataset
3. Locate Missing Data
4. Check for Duplicates
5. Detect Outliers
6. Normalize Casing
# display the data set
data
# drop duplicates (assign the result back, since drop_duplicates returns a new DataFrame)
data = data.drop_duplicates()
#Detect Outliers
data['Rating'].describe()
data.loc[10,'Rating'] = 1
data
#Normalize Casing
data['Review Title'] = data['Review Title'].str.lower()
data
#Normalize Casing
data['Customer Name'] = data['Customer Name'].str.title()
data
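The six steps listed at the start of this section can be run end to end on a small stand-in table. The customer feedback data set is not reproduced in this manual, so the rows below are invented for illustration. The column names follow the listing above, and the valid rating range of 1 to 5 is an assumption.

```python
# 1. Importing libraries
import numpy as np
import pandas as pd

# 2. Input data set (hypothetical rows, assumed for illustration)
data = pd.DataFrame({
    "Customer Name": ["alice SMITH", "bob jones", "bob jones", "CARLA diaz", "dev patel"],
    "Review Title": ["Great Product", "NOT GOOD", "NOT GOOD", None, "Okay"],
    "Rating": [5, 2, 2, 4, 40],
})

# 3. Locate missing data, then fill the gaps with a placeholder.
missing_per_column = data.isnull().sum()
data["Review Title"] = data["Review Title"].fillna("no title")

# 4. Check for duplicates and drop the repeated rows.
duplicate_rows = int(data.duplicated().sum())
data = data.drop_duplicates().reset_index(drop=True)

# 5. Detect outliers: ratings outside 1..5 are replaced by the median valid rating.
print(data["Rating"].describe())
valid = data["Rating"].between(1, 5)
median_valid = int(data.loc[valid, "Rating"].median())
data.loc[~valid, "Rating"] = median_valid

# 6. Normalize casing.
data["Review Title"] = data["Review Title"].str.lower()
data["Customer Name"] = data["Customer Name"].str.title()

print(data)
```

Replacing an outlier with the median is only one choice; depending on the analysis, dropping the row or correcting the value by hand may be more appropriate.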