0% found this document useful (0 votes)
40 views28 pages

GE Practical Sem 2

The document outlines a series of programming tasks in Python using NumPy and Pandas libraries. It includes operations such as computing statistical measures, manipulating arrays and data frames, merging data from Excel files, and performing data visualizations with the Iris dataset. Additionally, it involves analyzing family income data and the Titanic dataset for various statistical insights.

Uploaded by

diveejac
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
40 views28 pages

GE Practical Sem 2

The document outlines a series of programming tasks in Python using NumPy and Pandas libraries. It includes operations such as computing statistical measures, manipulating arrays and data frames, merging data from Excel files, and performing data visualizations with the Iris dataset. Additionally, it involves analyzing family income data and the Titanic dataset for various statistical insights.

Uploaded by

diveejac
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 28

1.

Write programs in Python using NumPy library to do the following:

a. Compute the mean, standard deviation, and variance of a two dimensional random integer array
along the second axis.
b. Create a 2-dimensional array of size m x n integer elements, also print the shape, type and data
type of the array and then reshape it into an n x m array, where n and m are user inputs given at the
run time.
c. Test whether the elements of a given 1D array are zero, non-zero and NaN. Record the indices of
these elements in seperate arrays.
d. Create three random arrays of the same size: Array1, Array2 and Array3. Subtract Array 2 from
Array3 and store in Array4, Create another array Array5 having two times the values in Array1. Find
Co- variance and Correlation of Array1 with Array4 and Array5 respectively.
e. Create two random arrays of the same size 10: Array1, and Array2. Find the sum of the first half of
both the arrays and product of the second half of both the arrays.
2. Do the following using PANDAS Series:

a. Create a series with 5 elements. Display the series sorted on index and also sorted on values
separately.
b. Create a series with N elements with some duplicate values. Find the minimum and maximum
ranks assigned to the values using 'first' and 'max' methods.
c. Display the index value of the minimum and maximum elements of a Series.
3. Create a data frame having at least 3 columns and 50 rows to store numeric data generated using a
random function. Replace 10% of the values by null values whose index positions are generated using
random function.
a. Identify and count missing values in a data frame.

b. Drop the column having more than 5 null values.


c. Identify the row label having maximum of the sum of all values in a row and drop that row.
d. Sort the data frame on the basis of the first column.
e. Remove all duplicates from the first column.
f. Find the correlation between first and second column and covariance between second and third
column.

g. Discretize the second column and create 5 bins.


4. Consider two excel files having attendance of two workshops. Each file has three fields 'Name',
'Date, duration (in minutes) where names are unique within a file. Note that duration may take one
of three values (30, 40, 50) only Import the data into two data frames and do the following:

a. Perform merging of the two data frames to find the names of student who had attended both
workshops.

b. Find names of all students who have attended a single workshop only.
c. Merge two data frames row-wise and find the total number of records in the data frame.

d. Merge two data frames row-wise and use two columns viz. names and dates as multi-row indexes.
Generate descriptive statistics for this hierarchical data frame.
5. Using iris data, plot the following with proper legend and axis labels: (Download IRIS data from:
https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn datasets)
a. Plot bar chart to show the frequency of each class label in the data.
b. Draw a scatter plot for Petal width vs sepal width and fit a regression line.
c. Plot density distribution for feature petal length.
d. Use a pair plot to show pairwise bivariate distribution in the Iris Dataset.
e. Draw heatmap for the four numeric attributes
f. Compute mean, mode, median, standard deviation, confidence interval and standard error for each
feature.
g. Compute correlation coefficients between each pair of features and plot heatmap.
6. Consider the following data frame containing a family name, gender of the family member and
her/his monthly income in each record.

Name Gender MonthlyIncome (Rs.)

Shah Male 114000.00

Vats Male 65000.00

Vats Female 43150.00

Kumar Female 69500.00

Vats Female 155000.00

Kumar Male 103000.00

Shah Male 55000.00

Shah Female 112400.00

Kumar Female 81030.00

Vats Male 71900.00

Write a program in Python using Pandas to perform the following:


a. Calculate and display family wise gross monthly income.

b. Calculate and display the member with the highest monthly income.

c. Calculate and display monthly income of all members with income greater than Rs 60000.00.

d. Calculate and display the average monthly income of the female members.
7. Using Titanic dataset, to do the following:

a. Find total number of passengers with age less than 30.

b. Find total fare paid by passengers of first class.

c. Compare number of survivors of each passenger class.

d. Compute descriptive statistics for any numeric attribute gender wise.

You might also like