DATASCIENCE

The document outlines a series of practical exercises focused on data science using Python and libraries like Pandas and NumPy. It includes tasks such as creating dataframes, performing statistical analysis, handling missing values, and generating various types of plots. Additionally, it covers operations on datasets, including importing, cleaning, and visualizing data.

Uploaded by

jesaboc231

Subject: Foundation of Data Science

Questions for the practical:


1. Write a Python program to create a dataframe containing columns name, age
and percentage. Add 10 rows to the dataframe. View the dataframe.
2. Write a Python program to print the shape, number of rows and columns, data
types, feature names and the description of the data.
3. Write a Python program to view basic statistical details of the data.
4. Write a Python program to add 5 rows with duplicate values and missing
values. Add a column ‘remarks’ with empty values. Display the data.
5. Write a Python program to get the number of observations, missing values
and duplicate values.
6. Write a Python program to drop ‘remarks’ column from the dataframe. Also
drop all null and empty values. Print the modified data.
7. Write a Python program to generate a line plot of name vs percentage.
8. Write a Python program to generate a scatter plot of name vs percentage.
9. Write a Python program to find the maximum and minimum value of a
given flattened array.
Expected Output:
Original flattened array: [[0 1] [2 3]]
Maximum value of the above flattened array: 3
Minimum value of the above flattened array: 0
10. Write a Python program to compute the Euclidean distance between two data
points in a dataset. [Hint: Use the linalg.norm function from NumPy]
11. Create one dataframe of data values. Find the mean, range and IQR for this
data.
12. Write a Python program to compute the sum of Manhattan distances between
all pairs of points.
13. Write a NumPy program to compute the histogram of nums against the bins.
Sample Output:
nums: [0.5 0.7 1. 1.2 1.3 2.1]
bins: [0 1 2 3]
Result: (array([2, 3, 1], dtype=int64), array([0, 1, 2, 3]))
14. Create a dataframe for students’ information such as name, graduation
percentage and age. Display the average age of the students and the average
graduation percentage. Also describe all basic statistics of the data. (Hint:
use describe()).
15. Import a dataset and do the following: a) Describe the dataset b) Show the
shape of the dataset c) Display the first 3 rows of the dataset.
16. Handling missing values: a) Replace the missing values of the salary and
age columns with the mean of the respective column.
17. Data.csv has two categorical columns (the country column and the
purchased column). a. Apply one-hot encoding on the country column. b. Apply
label encoding on the purchased column.
18. Generate a random array of 50 integers and display them using a line chart,
scatter plot, histogram and box plot. Apply appropriate color, labels and
styling options.
19. Add two outliers to the above data and display the box plot.
20. Create two lists, one representing subject names and the other representing
marks obtained in those subjects. Display the data in a pie chart and bar
chart.
21. Write a Python program to create a bar plot to get the frequency of the three
species of the Iris data.
22. Write a Python program to create a pie plot to get the frequency of the three
species of the Iris data.
23. Write a Python program to create a histogram of the three species of the Iris
data.
24. Write a Python program to create a graph to find the relationship between
the petal length and petal width.
25. Download any dataset from UCI (do not repeat it from set B). Read this csv
file using the read_csv() function. Describe the dataset using an appropriate
function. Display the mean value of the numeric attributes. Check whether any
data values are missing.
26. Download the nursery dataset from UCI. Split the dataset on any one
categorical attribute. Compare the means of each split. (Use groupby)
27. Create one dataframe with 5 subjects and marks of 10 students for each
subject. Find the arithmetic mean, geometric mean, and harmonic mean.
28. Download the heights and weights dataset and load the dataset from the given
csv file into a dataframe. Print the first and last 10 rows and 20 random rows.
(https://www.kaggle.com/burnoutminer/heights-and-weights-dataset)
29. Write a Python program to find the shape, size and datatypes of the dataframe
object.
30. Write a Python program to view basic statistical details of the data.
31. Write a Python program to get the number of observations, missing values
and nan values.
32. Write a Python program to add a column “BMI” to the dataframe, which is
calculated as: weight/height²
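
The NumPy questions above (9, 10, 12 and 13) can be approached along the following lines. This is only a minimal sketch, not a model answer: the sample points p, q and points are made-up values chosen for illustration, and the variable names are assumptions. The array and the nums/bins values are the ones given in the expected outputs of questions 9 and 13.

```python
import numpy as np

# Question 9: maximum and minimum over all elements of the 2x2 array
# shown in the expected output.
a = np.arange(4).reshape(2, 2)
max_value = a.max()
min_value = a.min()
print("Maximum:", max_value, "Minimum:", min_value)

# Question 10: Euclidean distance between two data points, using the
# hinted np.linalg.norm. p and q are hypothetical sample points.
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])
euclidean = np.linalg.norm(p - q)
print("Euclidean distance:", euclidean)

# Question 12: sum of Manhattan distances over all unordered pairs.
# points is a hypothetical sample; broadcasting builds the full
# pairwise distance matrix, and the upper triangle counts each pair once.
points = np.array([[0, 0], [1, 2], [3, 1]])
pairwise = np.abs(points[:, None, :] - points[None, :, :]).sum(axis=2)
manhattan_sum = int(np.triu(pairwise, k=1).sum())
print("Sum of Manhattan distances:", manhattan_sum)

# Question 13: histogram of nums against the bins from the sample output.
nums = np.array([0.5, 0.7, 1.0, 1.2, 1.3, 2.1])
bins = np.array([0, 1, 2, 3])
counts, edges = np.histogram(nums, bins=bins)
print("Result:", (counts, edges))
```

The pairwise matrix in the question 12 part uses memory proportional to the square of the number of points, so it suits small datasets; a double loop over pairs is an equally valid answer.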
