DSBDA Lab Plan

The document outlines the lab plan for the Data Science and Big Data Analytics Laboratory for the academic year 2024-25 at Alard College of Engineering and Management. It includes a series of experiments focused on data wrangling, analytics, visualization, and machine learning using various datasets and Python programming. Each experiment details specific tasks, expected outcomes, and the use of different statistical and analytical techniques.

Alard Charitable Trust

ALARD COLLEGE OF ENGINEERING AND MANAGEMENT

We are the Way Finder
S.No. 50, Marunje, Rajiv Gandhi Infotech Park, Pune-411 057
Ph.No: 020-66523701 Website: www.alardinstitues.com

Department of Computer Engineering

Academic Year: 2024-25 Term-II
LAB PLAN

Class: TE Subject Name: 310256: Data Science and Big Data Analytics Laboratory
Faculty Name: Prof. Manali S. Patil

Sr. No.   Experiments   Planned Date   Actual Date   Remark

1 Data Wrangling I
Perform the following operations using Python on any open-source dataset (e.g.,
data.csv).
1. Import all the required Python libraries.
2. Locate an open-source dataset on the web (e.g., https://www.kaggle.com).
Provide a clear description of the data and its source (i.e., the URL of the website).
3. Load the dataset into a pandas DataFrame.
4. Data Preprocessing: Check for missing values in the data using pandas isnull(), and use the
describe() function to get some initial statistics. Provide variable descriptions and the
types of variables. Check the dimensions of the DataFrame.
5. Data Formatting and Data Normalization: Summarize the types of variables by
checking the data types (i.e., character, numeric, integer, factor, and logical) of the
variables in the dataset. If variables are not in the correct data type, apply proper
type conversions.
6. Turn categorical variables into quantitative variables in Python.

In addition to the code and outputs, explain every operation that you do in the
above steps and explain everything that you do to import/read/scrape the dataset.
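The steps of this experiment can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed solution: the inline CSV text and its column names (name, age, gender, score) are made-up stand-ins for data.csv, and the median fill and one-hot encoding are only one reasonable choice among several.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv so the sketch runs without a download.
CSV_TEXT = """name,age,gender,score
Asha,21,F,78.5
Ravi,,M,64.0
Meera,22,F,
Kiran,23,M,81.0
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.shape)           # dimensions: (rows, columns)
print(df.isnull().sum())  # number of missing values per column
print(df.describe())      # initial statistics for the numeric columns
print(df.dtypes)          # data type of each variable

# Type conversion: age is read as float because of the missing value,
# so fill it with the median and convert the column to integer.
df["age"] = df["age"].fillna(df["age"].median()).astype(int)

# Categorical to quantitative: one-hot encode the gender column.
encoded = pd.get_dummies(df, columns=["gender"], dtype=int)
print(encoded.head())
```

With a real file, replacing the StringIO object with the path to the downloaded CSV should be the only change needed in the loading step.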
2 Data Wrangling II
Create an “Academic performance” dataset of students and perform the following
operations using Python.
1. Scan all variables for missing values and inconsistencies. If there are missing
values and/or inconsistencies, use any suitable technique to deal with them.
2. Scan all numeric variables for outliers. If there are outliers,
use any suitable technique to deal with them.
3. Apply data transformations on at least one of the variables. The purpose of this
transformation should be one of the following reasons: to change the scale for
better understanding of the variable, to convert a non-linear relation into a linear
one, or to decrease the skewness and convert the distribution into a normal
distribution.

Reason and document your approach properly.
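One possible approach to the three steps above is sketched here. The dataset, its column names (student, math_score, study_hours), and the chosen techniques (median imputation, capping at the 1.5 * IQR fences, a log transform) are all assumptions made for illustration; other techniques are equally acceptable if the reasoning is documented.

```python
import numpy as np
import pandas as pd

# Made-up "Academic performance" data with one missing value and one outlier.
df = pd.DataFrame({
    "student": ["S1", "S2", "S3", "S4", "S5", "S6"],
    "math_score": [62.0, 71.0, np.nan, 68.0, 75.0, 300.0],  # 300 is an entry error
    "study_hours": [1.0, 2.0, 2.5, 3.0, 4.0, 40.0],
})

# 1. Missing values: impute with the median, which is robust to outliers.
df["math_score"] = df["math_score"].fillna(df["math_score"].median())

# 2. Outliers: flag values outside the 1.5 * IQR fences, then cap them.
q1, q3 = df["math_score"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["math_score"] < lower) | (df["math_score"] > upper)]
print(outliers)
df["math_score"] = df["math_score"].clip(lower, upper)

# 3. Transformation: log1p compresses large values, which typically
#    reduces the right skew of a variable such as study_hours.
df["log_study_hours"] = np.log1p(df["study_hours"])
print(df)
```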

3 Descriptive Statistics: Measures of Central Tendency and Variability
Perform the following operations on any open-source dataset (e.g., data.csv).
1. Provide summary statistics (mean, median, minimum, maximum, standard
deviation) for a dataset (age, income, etc.) with numeric variables grouped by one of
the qualitative (categorical) variables. For example, if your categorical variable is age
group and your quantitative variable is income, then provide summary statistics of
income grouped by age group. Create a list that contains a numeric value for
each response to the categorical variable.
2. Write a Python program to display some basic statistical details such as percentile,
mean, and standard deviation for the species ‘Iris-setosa’, ‘Iris-versicolor’, and
‘Iris-virginica’ in the iris.csv dataset.
Provide the code with outputs and explain everything that you do in this step.
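The grouped summary in step 1 can be sketched with pandas groupby. The age_group and income values below are invented for illustration only.

```python
import pandas as pd

# Made-up data: one categorical variable and one quantitative variable.
df = pd.DataFrame({
    "age_group": ["18-25", "18-25", "26-40", "26-40", "41-60", "41-60"],
    "income": [20000, 30000, 45000, 55000, 60000, 80000],
})

# Summary statistics of income for each age group.
summary = df.groupby("age_group")["income"].agg(["mean", "median", "min", "max", "std"])
print(summary)

# A list of numeric values for each response to the categorical variable.
income_by_group = df.groupby("age_group")["income"].apply(list).to_dict()
print(income_by_group)
```

The same pattern should carry over to step 2: grouping an iris DataFrame by its species column and calling describe() reports the count, mean, standard deviation, and percentiles per species.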
4 Data Analytics I
Create a Linear Regression Model using Python/R to predict home prices using the
Boston Housing Dataset (https://www.kaggle.com/c/boston-housing).
The Boston Housing dataset describes houses in Boston
through a set of parameters. There are 506 samples and 14 feature variables in this
dataset. The objective is to predict house prices using the given
features.
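A minimal sketch of the modelling workflow with scikit-learn follows. It uses synthetic data with two made-up features in place of the Boston Housing file, so the coefficients and score say nothing about real house prices; with the actual dataset, X and y would instead be taken from the columns of the loaded CSV.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a housing table: two made-up features and a price
# generated from a known linear rule plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 0.5, size=200)

# Hold out a quarter of the rows to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("R^2 on test data:", r2)
```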
5 Data Analytics II
1. Implement logistic regression using Python/R to perform classification on
Social_Network_Ads.csv dataset.
2. Compute Confusion matrix to find TP, FP, TN, FN, Accuracy, Error rate,
Precision, Recall on the given dataset.
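The evaluation measures in step 2 can be computed directly from actual and predicted labels, as in the sketch below. The two label lists are invented for illustration; in the experiment they would come from the test split and from the fitted logistic regression model.

```python
# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 1)
tn = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 0)
fp = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 1)
fn = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 0)

total = tp + tn + fp + fn
accuracy = (tp + tn) / total    # share of correct predictions
error_rate = (fp + fn) / total  # share of wrong predictions
precision = tp / (tp + fp)      # how many predicted positives are real
recall = tp / (tp + fn)         # how many real positives are found

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"accuracy={accuracy} error_rate={error_rate}")
print(f"precision={precision} recall={recall}")
```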
6 Data Analytics III
1. Implement Simple Naïve Bayes classification algorithm using Python/R on
iris.csv dataset.
2. Compute Confusion matrix to find TP, FP, TN, FN, Accuracy, Error rate,
Precision, Recall on the given dataset.
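A possible sketch for this experiment is shown below. It assumes the Gaussian variant of Naive Bayes and loads the copy of the Iris data bundled with scikit-learn instead of iris.csv. Because Iris has three classes, TP, FP, FN, and TN are computed per class in a one-vs-rest fashion from the confusion matrix.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = GaussianNB().fit(X_train, y_train)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))

# One-vs-rest counts for each of the three classes.
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp
fn = cm.sum(axis=1) - tp
tn = cm.sum() - (tp + fp + fn)

accuracy = tp.sum() / cm.sum()
error_rate = 1 - accuracy
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(cm)
print("accuracy:", accuracy, "error rate:", error_rate)
print("precision per class:", precision)
print("recall per class:", recall)
```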
7 Text Analytics
1. Extract a sample document and apply the following document preprocessing methods:
Tokenization, POS Tagging, stop words removal, Stemming and Lemmatization.
2. Create a representation of the document by calculating Term Frequency and Inverse
Document Frequency.
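Tokenization, stop word removal, and the TF and IDF calculations can be sketched in plain Python as below. The three sentences and the tiny stop word list are made up, the IDF uses the basic log(N / df) form (libraries often apply smoothed variants), and POS tagging, stemming, and lemmatization are left out because they would normally rely on a library such as NLTK.

```python
import math
import re
from collections import Counter

# Made-up corpus of three short documents.
docs = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "Dogs and cats are pets.",
]
STOP_WORDS = {"the", "on", "and", "are"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_frequency(tokens):
    """TF: count of a term divided by the number of tokens in the document."""
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}


def inverse_document_frequency(corpus):
    """IDF: log of (number of documents / documents containing the term)."""
    n_docs = len(corpus)
    vocab = {term for doc in corpus for term in doc}
    return {
        term: math.log(n_docs / sum(term in doc for doc in corpus))
        for term in vocab
    }


tokenized = [tokenize(d) for d in docs]
idf = inverse_document_frequency(tokenized)
tfidf = [
    {term: tf * idf[term] for term, tf in term_frequency(doc).items()}
    for doc in tokenized
]
print(tfidf[0])
```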
8 Data Visualization I
1. Use the inbuilt dataset 'titanic'. The dataset contains 891 rows and contains
information about the passengers who boarded the unfortunate Titanic ship. Use the
Seaborn library to see if we can find any patterns in the data.
2. Write code to check how the price of the ticket (column name: 'fare') for each
passenger is distributed by plotting a histogram.
9 Data Visualization II
1. Use the inbuilt dataset 'titanic' as used in the above problem. Plot a box plot for
distribution of age with respect to each gender along with the information about
whether they survived or not. (Column names: 'sex' and 'age')
2. Write observations on the inference from the above statistics.
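The box plot in step 1 might look like the sketch below, which splits age by sex and colours the boxes by survival. The eight rows are invented so the code runs offline; the real data would come from sns.load_dataset("titanic"), and the 'survived' column name is assumed to match that dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up passengers for illustration; not taken from the real Titanic data.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "male", "female", "male", "female"],
    "age": [22, 38, 26, 35, 54, 27, 2, 14],
    "survived": [0, 1, 1, 0, 0, 1, 1, 0],
})

fig, ax = plt.subplots()
# One box per (sex, survived) combination.
sns.boxplot(data=df, x="sex", y="age", hue="survived", ax=ax)
ax.set_title("Age by sex and survival")
```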
10 Data Visualization III
Download the Iris flower dataset or any other dataset into a DataFrame (e.g.,
https://archive.ics.uci.edu/ml/datasets/Iris). Scan the dataset and give the inference
as:
1. List down the features and their types (e.g., numeric, nominal) available in the
dataset.
2. Create a histogram for each feature in the dataset to illustrate the feature
distributions.
3. Create a boxplot for each feature in the dataset.
4. Compare distributions and identify outliers.
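The four steps above can be sketched with pandas plotting helpers. This sketch assumes the copy of Iris bundled with scikit-learn rather than a downloaded file, and it uses the 1.5 * IQR rule as one common way to identify outliers.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Four numeric features plus an integer 'target' column for the species.
df = load_iris(as_frame=True).frame
features = df.drop(columns="target")

# 1. Feature names and types.
print(features.dtypes)

# 2. One histogram per feature.
features.hist(bins=15, figsize=(8, 6))

# 3. One box per feature on a separate figure.
plt.figure()
features.boxplot()

# 4. Count values outside the 1.5 * IQR fences for each feature.
q1, q3 = features.quantile(0.25), features.quantile(0.75)
iqr = q3 - q1
is_outlier = (features < q1 - 1.5 * iqr) | (features > q3 + 1.5 * iqr)
outlier_counts = is_outlier.sum()
print(outlier_counts)
```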
11 Write code in Java for a simple WordCount application that counts the number of
occurrences of each word in a given input set using the Hadoop MapReduce
framework on a local-standalone set-up.
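Before writing the Java job, it may help to see the map, shuffle, and reduce phases in isolation. The sketch below is a plain Python illustration of that logic only: it does not use Hadoop or its Java API, and the function names and input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


lines = ["big data is big", "data science uses data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)
```

In the Hadoop version, the mapper and reducer become Java classes and the shuffle step is performed by the framework itself.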
12 Design a distributed application using MapReduce which processes a log file of a
system.
13 Locate a dataset (e.g., sample_weather.txt) for working on weather data, and write a program that reads
the text input files and finds the average temperature, dew point, and wind speed.
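The averaging logic can be sketched as below. The layout of sample_weather.txt is not specified here, so the sketch assumes a hypothetical format with one whitespace-separated record per line (date, temperature, dew point, wind speed); the parsing would need to be adapted to the real file.

```python
import io
from statistics import mean

# Hypothetical file contents: date, temperature, dew point, wind speed.
SAMPLE = """2024-01-01 21.5 12.0 3.4
2024-01-02 23.0 13.5 4.1
2024-01-03 19.5 10.5 2.5
"""


def average_weather(lines):
    """Return the mean temperature, dew point, and wind speed over all records."""
    temps, dews, winds = [], [], []
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        temps.append(float(parts[1]))
        dews.append(float(parts[2]))
        winds.append(float(parts[3]))
    return {
        "temperature": mean(temps),
        "dew_point": mean(dews),
        "wind_speed": mean(winds),
    }


averages = average_weather(io.StringIO(SAMPLE))
print(averages)
```

With a real file, passing an open file handle to average_weather in place of the StringIO object should work the same way.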
14 Develop a movie recommendation model using the scikit-learn library in Python.
Refer to the dataset:
https://github.com/rashida048/SomeNLPProjects/blob/master/movie_dataset.csv
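One common content-based approach is to vectorize a text description of each movie and recommend the titles with the highest cosine similarity. The sketch below assumes that approach and uses four invented movies with made-up feature strings; with the referenced CSV, the strings would be built from whichever descriptive columns the file provides.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up movies and feature strings for illustration.
titles = ["Space Quest", "Star Voyage", "Love Letters", "Heartstrings"]
features = [
    "sci-fi space adventure",
    "sci-fi space exploration",
    "romance drama",
    "romance comedy",
]

# Bag-of-words counts, then pairwise cosine similarity between movies.
matrix = CountVectorizer().fit_transform(features)
similarity = cosine_similarity(matrix)


def recommend(title, top_n=2):
    """Return the top_n titles most similar to the given title."""
    idx = titles.index(title)
    ranked = sorted(
        range(len(titles)), key=lambda i: similarity[idx][i], reverse=True
    )
    return [titles[i] for i in ranked if i != idx][:top_n]


print(recommend("Space Quest", top_n=1))
```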
15 Use the covid_vaccine_statewise.csv dataset and perform the following
analytics on it:
a. Describe the dataset
b. Number of persons vaccinated state-wise for the first dose in India
c. Number of persons vaccinated state-wise for the second dose in India
d. Number of males vaccinated
e. Number of females vaccinated
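The analytics above can be sketched with pandas as follows. The rows and the column names (State, First Dose Administered, Second Dose Administered, Male Vaccinated, Female Vaccinated) are assumptions made for illustration and should be replaced with the headers actually found in covid_vaccine_statewise.csv. The sketch takes the maximum per state on the assumption that the file stores cumulative daily figures; if it stores daily increments, a sum would be appropriate instead.

```python
import pandas as pd

# Made-up rows with assumed column names; not real vaccination figures.
df = pd.DataFrame({
    "State": ["Maharashtra", "Kerala", "Gujarat"],
    "First Dose Administered": [1000, 600, 800],
    "Second Dose Administered": [400, 300, 350],
    "Male Vaccinated": [520, 290, 430],
    "Female Vaccinated": [480, 310, 370],
})

# a. Describe the dataset.
print(df.describe())

# b, c. State-wise first and second dose counts (latest cumulative value).
first_dose = df.groupby("State")["First Dose Administered"].max()
second_dose = df.groupby("State")["Second Dose Administered"].max()
print(first_dose)
print(second_dose)

# d, e. Total males and females vaccinated across all states.
total_males = int(df.groupby("State")["Male Vaccinated"].max().sum())
total_females = int(df.groupby("State")["Female Vaccinated"].max().sum())
print("males:", total_males, "females:", total_females)
```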
Subject Incharge H.O.D.
