Data Analysis and Visualization
Certificate
This is to certify that Mr./Ms.
Enrollment No. _ of B.E. Semester _
Information Technology of this Institute (GTU Code: 028 ) has satisfactorily
completed the Practical / Tutorial work for the subject Data Analysis and
Visualization (3161613) for the academic year 2023-24.
Place:
Date:
Preface
The main aim of any laboratory, practical, or field work is to enhance the required skills and
to build students' ability to solve real-world problems by developing relevant competencies in
the psychomotor domain. With this in view, GTU has designed a competency-focused,
outcome-based curriculum for engineering degree programs in which sufficient weightage is
given to practical work. The curriculum underlines the importance of skill enhancement among
students and encourages students, instructors, and faculty members to use every second of the
time allotted for practicals to achieve relevant outcomes by performing experiments rather
than merely studying them. For a competency-focused, outcome-based curriculum to be
implemented effectively, every practical must be carefully designed to serve as a tool for
developing and enhancing, in every student, the competencies required by industry. These
psychomotor skills are very difficult to develop through the traditional chalk-and-board method
of content delivery in the classroom. Accordingly, this lab manual is designed to focus on
industry-defined, relevant outcomes rather than on the old practice of conducting practicals
only to prove a concept or theory.
By using this lab manual, students can go through the relevant theory and procedure before the
actual performance, which creates interest and gives them a basic idea of the experiment in
advance. This in turn helps students achieve the pre-determined outcomes. Each experiment in
this manual begins with the competency, industry-relevant skills, course outcomes, and
practical outcomes (objectives). Students will also learn the safety measures and necessary
precautions to be taken while performing the practical.
This manual also provides guidelines for faculty members to facilitate student-centric lab
activities through each experiment by arranging and managing the necessary resources, so that
students follow the procedures with the required safety and necessary precautions to achieve
the outcomes. It also indicates how students will be assessed by providing rubrics.
Data analytics involves data discovery that helps in making smart decisions and in suggesting
options based on previous choices. Data visualization reveals patterns in data and also
highlights data points that do not fit a pattern.
Utmost care has been taken while preparing this lab manual; however, there is always scope for
improvement. We therefore welcome constructive suggestions for improvement and for the
removal of any errors.
Parekh Janvi (210130116029) DAV
The following industry relevant competencies are expected to be developed in the student by
undertaking the practical work of this laboratory.
1. Data Visualization and Representation
2. Summarization of Data for Interpretations
Index
(Progressive Assessment Sheet)
4 KNN CLASSIFICATION
5 OUTLIER DETECTION AND REMOVAL FOR NORMALLY DISTRIBUTED FEATURES
6 OUTLIER DETECTION AND REMOVAL FOR SKEWED DISTRIBUTED FEATURES
7 SHOWING THE DATA AS A BAR CHART USING D3.JS
8 SHOWING THE DATA AS A LINE CHART USING D3.JS
9 SHOWING THE DATA AS A PIE CHART USING D3.JS
10 TABLEAU MAPS FROM SPATIAL FILES
Total
Experiment No: 1
DATA STATISTICAL OPERATIONS
Date:
Competency and Practical Skills:
Relevant CO: CO1
Objectives: Perform various statistical operations on a sample dataset.
Equipment/Instruments: Basic statistics, NumPy, Python statistics library, pandas library
Theory:
Descriptive statistics is about describing and summarizing data. It uses two main approaches:
1. The quantitative approach describes and summarizes data numerically.
2. The visual approach illustrates data with charts, plots, histograms, and other graphs.
Learnings:
A. What numerical quantities you can use to describe and summarize your datasets
B. How to calculate descriptive statistics in pure Python
C. How to get descriptive statistics with available Python libraries
D. How to visualize your datasets
E. Central tendency tells you about the centers of the data. Useful measures include the mean,
median, and mode.
F. Variability tells you about the spread of the data. Useful measures include variance and standard
deviation.
G. Correlation or joint variability tells you about the relation between a pair of variables in a
dataset. Useful measures include covariance and the correlation coefficient.
https://realpython.com/python-statistics/
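The measures listed above can be sketched in pure Python using only the standard statistics module. The two sample lists x and y are hypothetical values chosen for illustration and are not taken from any dataset in this manual. Covariance and correlation are computed by hand here, assuming the sample (n - 1) definitions.

```python
import statistics

# Hypothetical sample data (not from the manual)
x = [2, 4, 4, 5, 7, 9]
y = [1, 3, 2, 5, 6, 8]

# Central tendency: mean, median, and mode
mean_x = statistics.mean(x)
median_x = statistics.median(x)
mode_x = statistics.mode(x)

# Variability: sample variance and sample standard deviation (n - 1 denominator)
var_x = statistics.variance(x)
std_x = statistics.stdev(x)

# Joint variability: sample covariance and Pearson correlation coefficient
mean_y = statistics.mean(y)
n = len(x)
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
corr_xy = cov_xy / (std_x * statistics.stdev(y))

print(f"mean={mean_x:.3f}, median={median_x}, mode={mode_x}")
print(f"variance={var_x:.3f}, std={std_x:.3f}")
print(f"covariance={cov_xy:.3f}, correlation={corr_xy:.3f}")
```

The same quantities are also available through NumPy and pandas (for example, the describe() method of a pandas DataFrame), which is usually more convenient for larger datasets.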
Set up diagram:
Procedure:
Observations:
Calculation:
Result:
Conclusion:
Quiz:
Suggested Reference:
References used by the students:
Rubric wise marks obtained:
Rubrics 1 2 3 4 5 Total
Marks
Experiment No: 2
AIM: Perform various data pre-processing operations on a dataset using
machine learning libraries.
Date:
Competency and Practical Skills:
Relevant CO: CO1
Experiment No: 3
Aim: Demonstrate K-MEANS data clustering methods
Date:
Competency and Practical Skills:
Relevant CO: CO2
Objectives: Demonstrate the K-means data clustering technique using a sample dataset.
1. Get introduced to K-Means Clustering.
2. Understand the properties of clusters and the various evaluation metrics for
clustering.
3. Get acquainted with some of the many real-world applications of K-Means
Clustering.
4. Implement K-Means Clustering in Python on a real-world dataset.
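As a rough illustration of objective 4, the following is a minimal sketch of K-means (Lloyd's algorithm) in pure Python. The six sample points, the function names, and the choice of the within-cluster sum of squares (inertia) as the evaluation metric are assumptions made for this example. In the actual practical, a library implementation on a real-world dataset would normally be used instead.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    for _ in range(max_iter):
        labels = assign(points, centroids)  # assignment step
        new_centroids = []
        for j in range(k):  # update step: move each centroid to its cluster mean
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    # Inertia: within-cluster sum of squared distances (lower means tighter clusters)
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical 2-D points forming two visibly separate groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids, inertia = k_means(points, k=2)
print(labels, centroids, round(inertia, 3))
```

Running the sketch for several values of k and comparing the inertia is one simple way to judge how many clusters suit the data.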
Conclusion: K-means clustering effectively segmented the dataset into distinct groups,
revealing underlying patterns and structures. Evaluation metrics confirmed the quality of
clustering, indicating meaningful separation between clusters. These clusters provide
valuable insights for targeted strategies or further analysis, such as in marketing where they
can represent different customer segments with unique preferences. Overall, k-means
clustering yielded actionable insights, guiding decision-making and strategy development
based on the inherent characteristics of the data.
Experiment No: 4
Aim: Demonstrate the implementation of KNN classification
Date:
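A minimal sketch of KNN classification is given below, assuming Euclidean distance and a simple majority vote among the k nearest neighbours. The training points, the labels "low" and "high", and the function name knn_predict are illustrative and are not taken from the manual.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Pair every training point's distance to the query with its label, nearest first
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k nearest neighbours is the prediction
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training data: two small groups of 2-D points
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["low", "low", "low", "high", "high", "high"]

prediction_a = knn_predict(train_points, train_labels, (2, 2), k=3)
prediction_b = knn_predict(train_points, train_labels, (6.5, 6.5), k=3)
print(prediction_a, prediction_b)
```

An odd value of k is a common choice for two-class problems because it avoids tied votes.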
Experiment No: 5
Aim: Demonstrate outlier detection and removal for normally distributed
features.
Date:
Competency and Practical Skills:
Relevant CO: CO2
Objectives: Demonstrate outlier detection and removal for normally distributed
features.
1. An Overview of outliers and why it’s important for a data scientist to
identify and remove them from data.
2. Understand different techniques for outlier treatment: trimming, capping,
treating as a missing value, and discretization.
3. Understanding different plots and libraries for visualizing and treating outliers in
a dataset.
Conclusion: Utilizing outlier detection methods such as the Z-score and 3 standard
deviation approach on a normally distributed dataset facilitated the identification and
treatment of anomalous data points. By quantifying the deviation of each data point from
the mean in terms of standard deviations, these methods effectively pinpointed outliers
beyond a certain threshold. This process allowed for the detection of data points that
significantly deviate from the expected distribution, potentially indicating errors or
anomalies in the dataset. Subsequently, appropriate actions could be taken, such as data
cleansing or further investigation into the nature of the outliers. Overall, the application of
these outlier detection techniques contributed to the robustness and reliability of the data
analysis process, ensuring the integrity of the insights derived from the dataset.
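The Z-score approach described in the conclusion can be sketched as follows. The sample values and the function name split_by_zscore are hypothetical. The threshold of 3 standard deviations follows the text, and the sample standard deviation is assumed.

```python
import statistics


def split_by_zscore(values, threshold=3.0):
    """Split values into (kept, outliers) using |z| > threshold as the outlier rule."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    if std == 0:
        return list(values), []  # all values identical: nothing can be an outlier
    kept, outliers = [], []
    for v in values:
        z = (v - mean) / std  # number of standard deviations away from the mean
        (outliers if abs(z) > threshold else kept).append(v)
    return kept, outliers


# Hypothetical feature: twenty values near 50 plus one extreme value
values = [48, 49, 50, 51, 52, 47, 53, 50, 49, 51,
          50, 48, 52, 50, 49, 51, 50, 50, 49, 51, 120]
cleaned, outliers = split_by_zscore(values)
print("outliers:", outliers)
print("remaining values:", len(cleaned))
```

Note that with very few observations no point can reach a Z-score of 3, so this rule is only meaningful on reasonably large samples.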
Experiment No: 6
AIM: Demonstrate outlier detection and removal for skewed distributed
features.
Date:
Competency and Practical Skills:
Relevant CO: CO2
Objectives: Demonstrate outlier detection and removal for skewed distributed
features.
1. An Overview of outliers and why it’s important for a data scientist to identify
and remove them from data.
2. Understand different techniques for outlier treatment: trimming, capping,
treating as a missing value, and discretization.
3. Understanding different plots and libraries for visualizing and treating outliers in a
dataset.
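The manual does not name a specific method for skewed features. A common choice, assumed here, is the interquartile range (IQR) rule with fences at 1.5 times the IQR beyond the first and third quartiles. The sketch below applies two of the treatments listed above, trimming and capping, to a hypothetical right-skewed sample. All names and values are illustrative.

```python
import statistics

# Hypothetical right-skewed feature (for example, incomes in thousands)
values = [12, 14, 15, 15, 16, 18, 19, 21, 24, 30, 95]

# Quartiles and IQR; statistics.quantiles with n=4 returns [Q1, Q2, Q3]
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Detection: anything outside the fences is flagged as an outlier
outliers = [v for v in values if v < lower or v > upper]

# Trimming: drop the flagged values
trimmed = [v for v in values if lower <= v <= upper]

# Capping: keep every row but clip extreme values to the nearest fence
capped = [min(max(v, lower), upper) for v in values]

print("fences:", lower, upper)
print("outliers:", outliers)
print("trimmed size:", len(trimmed), "capped max:", max(capped))
```

Trimming shrinks the dataset, while capping preserves its size at the cost of distorting the tail, so the choice depends on how much data can be spared.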
Experiment No: 7
AIM: Develop a program showing the data as a bar chart using d3.js.
Date:
Relevant CO: CO3
Objectives: Develop a program showing the data as a bar chart using d3.js.
Source code:
<!DOCTYPE html>
<html>
<head>
<title>Bar Chart Example</title>
<script src="https://d3js.org/d3.v7.min.js"></script>
<style>
/* Add your CSS styles here, if needed */
.bar {
fill: steelblue;
}
</style>
</head>
<body>
<h1>Bar Chart Example</h1>
<div id="bar-chart"></div>
<script>
// Sample data for the bar chart
var data = [
{ category: "A", value: 10 },
{ category: "B", value: 20 },
{ category: "C", value: 15 },
{ category: "D", value: 30 },
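{ category: "E", value: 25 }
];
// NOTE: the chart setup was missing from this listing; the margin and size
// values below are assumed for illustration
var margin = { top: 20, right: 20, bottom: 30, left: 40 };
var width = 500 - margin.left - margin.right;
var height = 300 - margin.top - margin.bottom;
// Create the SVG container inside the #bar-chart div
var svg = d3.select("#bar-chart")
.append("svg")
.attr("width", width + margin.left + margin.right)
.attr("height", height + margin.top + margin.bottom)
.append("g")
.attr("transform", "translate(" + margin.left + "," + margin.top + ")");
// Band scale for the categories on the x-axis
var xScale = d3.scaleBand()
.domain(data.map(function(d) { return d.category; }))
.range([0, width])
.padding(0.1);
// Linear scale for the values on the y-axis
var yScale = d3.scaleLinear()
.domain([0, d3.max(data, function(d) { return d.value; })])
.nice()
.range([height, 0]);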
// Create the bars
svg.selectAll(".bar")
.data(data)
.enter().append("rect")
.attr("class", "bar")
.attr("x", function(d) { return xScale(d.category); })
.attr("y", function(d) { return yScale(d.value); })
.attr("width", xScale.bandwidth())
.attr("height", function(d) { return height - yScale(d.value); });
// Create x-axis
svg.append("g")
.attr("transform", "translate(0," + height + ")")
.call(d3.axisBottom(xScale));
// Create y-axis
svg.append("g")
.call(d3.axisLeft(yScale));
</script>
</body>
</html>
Experiment No: 8
AIM: Develop a program showing the data as a line chart using d3.js.
Date:
Competency and Practical Skills:
Relevant CO: CO3
Objectives: Develop a program showing the data as a line chart using d3.js.
// Create x-axis
svg.append("g")
.attr("transform", "translate(0," + height + ")")
.call(d3.axisBottom(xScale));
// Create y-axis
svg.append("g")
.call(d3.axisLeft(yScale));
</script>
</body>
</html>
Experiment No: 9
Aim: Develop a program showing the data as a pie chart using d3.js.
Date:
<!DOCTYPE html>
<html>
<head>
<title>Pie Chart Example</title>
<script src="https://d3js.org/d3.v7.min.js"></script>
</head>
<body>
<h1>Pie Chart Example</h1>
<div id="pie-chart"></div>
<script>
// Sample data for the pie chart
var data = [
{category: "AUS", value: 10 },
{category: "ENG", value: 20 },
{category: "NED", value: 15 },
{category: "IND", value: 30 },
{category: "NZL", value: 25 }
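];
// NOTE: the original listing ends after the data; the drawing code below is a
// reconstruction with assumed sizes and colors
var width = 400, height = 400;
var radius = Math.min(width, height) / 2;
var svg = d3.select("#pie-chart").append("svg")
.attr("width", width).attr("height", height)
.append("g")
.attr("transform", "translate(" + width / 2 + "," + height / 2 + ")");
var color = d3.scaleOrdinal(d3.schemeCategory10);
// Convert the values into start and end angles, then draw one arc per category
var pie = d3.pie().value(function(d) { return d.value; });
var arc = d3.arc().innerRadius(0).outerRadius(radius);
svg.selectAll("path").data(pie(data)).enter().append("path")
.attr("d", arc)
.attr("fill", function(d) { return color(d.data.category); });
</script>
</body>
</html>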
Experiment No: 10
AIM: Create Tableau Maps from Spatial Files
Date:
Competency and Practical Skills:
Relevant CO: CO4
We need to install Tableau Desktop to generate spatial maps. You can get a free
trial version or a paid licence from the Tableau website.
By harnessing the features of Tableau and integrating spatial files, users can
uncover patterns, trends, and relationships within their data, facilitating
data-driven decision-making and enhancing the understanding of geospatial
information.