The document outlines a series of practical exercises focused on data science using Python and libraries like Pandas and NumPy. It includes tasks such as creating dataframes, performing statistical analysis, handling missing values, and generating various types of plots. Additionally, it covers operations on datasets, including importing, cleaning, and visualizing data.
DATASCIENCE
Subject: Foundation of Data Science
Questions for the practical:
1. Write a Python program to create a dataframe containing the columns name, age and percentage. Add 10 rows to the dataframe. View the dataframe.
2. Write a Python program to print the shape, number of rows and columns, data types, feature names and the description of the data.
3. Write a Python program to view basic statistical details of the data.
4. Write a Python program to add 5 rows with duplicate values and missing values. Add a column 'remarks' with empty values. Display the data.
5. Write a Python program to get the number of observations, missing values and duplicate values.
6. Write a Python program to drop the 'remarks' column from the dataframe. Also drop all null and empty values. Print the modified data.
7. Write a Python program to generate a line plot of name vs percentage.
8. Write a Python program to generate a scatter plot of name vs percentage.
9. Write a Python program to find the maximum and minimum value of a given flattened array.
   Expected Output:
   Original flattened array:
   [[0 1]
    [2 3]]
   Maximum value of the above flattened array: 3
   Minimum value of the above flattened array: 0
10. Write a Python program to compute the Euclidean distance between two data points in a dataset. [Hint: Use the linalg.norm function from NumPy]
11. Create one dataframe of data values. Find the mean, range and IQR for this data.
12. Write a Python program to compute the sum of Manhattan distances between all pairs of points.
13. Write a NumPy program to compute the histogram of nums against the bins.
   Sample Output:
   nums: [0.5 0.7 1. 1.2 1.3 2.1]
   bins: [0 1 2 3]
   Result: (array([2, 3, 1], dtype=int64), array([0, 1, 2, 3]))
14. Create a dataframe for students' information such as name, graduation percentage and age. Display the average age of the students and the average graduation percentage. Also describe all basic statistics of the data. (Hint: use describe()).
15. Import a dataset and do the following:
   a) Describe the dataset.
   b) Display the shape of the dataset.
   c) Display the first 3 rows of the dataset.
16. Handling missing values:
   a) Replace the missing values in the salary and age columns with the mean of the respective column.
17. Data.csv has two categorical columns (the country column and the purchased column).
   a) Apply one-hot encoding to the country column.
   b) Apply label encoding to the purchased column.
18. Generate a random array of 50 integers and display them using a line chart, scatter plot, histogram and box plot. Apply appropriate color, labels and styling options.
19. Add two outliers to the above data and display the box plot.
20. Create two lists, one representing subject names and the other representing marks obtained in those subjects. Display the data in a pie chart and a bar chart.
21. Write a Python program to create a bar plot of the frequency of the three species in the Iris data.
22. Write a Python program to create a pie plot of the frequency of the three species in the Iris data.
23. Write a Python program to create a histogram of the three species in the Iris data.
24. Write a Python program to create a graph showing the relationship between petal length and petal width.
25. Download any dataset from UCI (do not repeat it from set B). Read the csv file using the read_csv() function. Describe the dataset using an appropriate function. Display the mean value of each numeric attribute. Check whether any data values are missing.
26. Download the nursery dataset from UCI. Split the dataset on any one categorical attribute. Compare the means of each split. (Use groupby)
27. Create one dataframe with 5 subjects and the marks of 10 students for each subject. Find the arithmetic mean, geometric mean and harmonic mean.
28. Download the heights and weights dataset and load it from the given csv file into a dataframe. Print the first 10 rows, the last 10 rows and 20 random rows. (https://www.kaggle.com/burnoutminer/heights-and-weights-dataset)
29. Write a Python program to find the shape, size and data types of the dataframe object.
30. Write a Python program to view basic statistical details of the data.
31. Write a Python program to get the number of observations, missing values and NaN values.
32. Write a Python program to add a column "BMI" to the dataframe, calculated as: weight / height^2
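
The NumPy questions (9, 10, 12 and 13) state or imply concrete results, so one possible approach to them can be sketched briefly. This is only an illustrative sketch, not a model answer: the points p and q and the array points are hypothetical sample data that do not appear in the question sheet, while the array for question 9 and the nums and bins for question 13 are taken from the expected output shown in those questions.

```python
import numpy as np

# Question 9: maximum and minimum of the 2x2 array from the expected output.
a = np.arange(4).reshape(2, 2)
max_value = a.max()
min_value = a.min()
print("Original flattened array:")
print(a)
print("Maximum value of the above flattened array:", max_value)
print("Minimum value of the above flattened array:", min_value)

# Question 10: Euclidean distance via numpy.linalg.norm.
# p and q are hypothetical sample points (assumption, not from the sheet).
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])
euclidean = np.linalg.norm(p - q)
print("Euclidean distance:", euclidean)

# Question 12: sum of Manhattan distances between all pairs of points.
# points is hypothetical sample data (assumption, not from the sheet).
points = np.array([[0, 0], [1, 2], [3, 1]])
# pairwise[i, j] holds |x_i - x_j| + |y_i - y_j| for every pair (i, j).
pairwise = np.abs(points[:, None, :] - points[None, :, :]).sum(axis=-1)
# Keep only the upper triangle so that each unordered pair is counted once.
manhattan_total = np.triu(pairwise, k=1).sum()
print("Sum of Manhattan distances:", manhattan_total)

# Question 13: histogram of nums against the bins from the sample output.
nums = np.array([0.5, 0.7, 1.0, 1.2, 1.3, 2.1])
bins = np.array([0, 1, 2, 3])
counts, edges = np.histogram(nums, bins=bins)
print("nums:", nums)
print("bins:", bins)
print("Result:", (counts, edges))
```

The dtype printed for the histogram counts depends on the platform and NumPy version, so it may not match the int64 shown in the sample output exactly.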