Case Study DSBDA
Mini Project
Submitted by
1. Phase 1: Discovery
The initial problem statement was to build a chatbot that displays students'
results and test analysis. A chatbot can help increase student engagement.
Initial Hypothesis -
1. Examining the data and coming up with ideas on how to develop our
chatbot model by identifying the goals and motives of our project.
2. Discovering Plotly’s various features for developing the dashboard of
our project.
3. Searching for the data required by the user and displaying it on the
dashboard as per the requirements.
2. Phase 2: Data Preparation
To better understand the nature of the data, data analysts use data
visualization and statistical tools to characterize the dataset, for example
its size, volume, and accuracy.
The data sources for this project are complete CSV files of the students'
results, which include the subjects, seat number, student name, internal
marks, theory marks and grades. This data is then cleaned and made ready for
visualization and further analysis.
Data preparation involves data cleaning. Data scientists spend a large amount
of their time cleaning datasets and getting them down to a form with which
they can work. In fact, a lot of data scientists argue that the initial steps of
obtaining and cleaning data constitute 80% of the job.
Therefore, if you are just stepping into this field or planning to step into
this field, it is important to be able to deal with messy data, whether
that means missing values, inconsistent formatting, malformed records,
or nonsensical outliers.
In our dataset the raw data was very noisy, with mixed data types, missing
columns and a complex file structure. Our data was both structured and
unstructured: the student result data was structured, while the chats used to
train the chatbot were unstructured. To clean the data we used pandas and
Excel filtering. We segregated the data by criteria such as Semester Marks,
Summary, Subjects and All Student Data, which made it efficient to query the
data whenever needed.
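As a minimal sketch of the cleaning and segregation described above, assuming pandas is installed: the column names (`seat_no`, `name`, `subject`, `internal`, `theory`, `grade`), the sample rows and the `clean_results` helper are hypothetical stand-ins, since the real result files are not reproduced in this report.

```python
import io

import pandas as pd

# Hypothetical raw export with stray whitespace, a non-numeric mark ("AB"),
# an exact duplicate row and a row with no theory mark.
RAW_CSV = """seat_no,name,subject,internal,theory,grade
S001, Asha ,DSBDA,25,60,A
S001, Asha ,DSBDA,25,60,A
S002,Ravi,DSBDA,AB,45,C
S002,Ravi,WT,20,,
"""


def clean_results(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a tidy copy of the raw result table."""
    df = raw.copy()
    # Trim whitespace in every text column.
    for col in ["seat_no", "name", "subject", "grade"]:
        df[col] = df[col].str.strip()
    # Force mark columns to numbers; anything unparseable becomes NaN.
    for col in ["internal", "theory"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Drop exact duplicates and rows with no usable theory mark.
    df = df.drop_duplicates().dropna(subset=["theory"])
    return df.reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV), dtype=str)
clean = clean_results(raw)

# Segregate by subject so later queries touch only the slice they need.
by_subject = {subject: part for subject, part in clean.groupby("subject")}
```

Reading every column as text first keeps malformed marks visible instead of letting them silently change a column's type; the conversion to numbers then happens in one explicit step.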
3. Phase 3: Model Planning
Our model integrates a chatbot with a dashboard that displays the exam results
of students according to their seat numbers. The technologies that we used for
building it are as follows:
1. Chatbot- We built the chatbot as a Flask app. The libraries and techniques
used are the Natural Language Toolkit (NLTK), PyTorch and deep neural
networks.
2. Dashboard- We built the dashboard as a Dash app, using Plotly for
visualization.
The common tools used in our project are the Python programming language,
Flask, Dash and Plotly.
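To illustrate how the chatbot side can hand a seat number to the result data, here is a minimal sketch in plain Python. It does not reproduce the project's NLTK/PyTorch model: the seat-number format, the regular expression, the `RESULTS` table and the `reply_for` function are all hypothetical. In a Flask app, a function like this would typically be called from a route handler.

```python
import re

# Hypothetical lookup table keyed by seat number; in the project this data
# would come from the cleaned result files.
RESULTS = {
    "S001": {"name": "Asha", "DSBDA": 85, "WT": 72},
    "S002": {"name": "Ravi", "DSBDA": 45, "WT": 58},
}

# Assumed seat-number format: the letter S followed by three digits.
SEAT_PATTERN = re.compile(r"\bS\d{3}\b", re.IGNORECASE)


def reply_for(message: str) -> str:
    """Build a chatbot reply for a message that may contain a seat number."""
    match = SEAT_PATTERN.search(message)
    if match is None:
        return "Please enter your seat number, for example S001."
    seat_no = match.group(0).upper()
    record = RESULTS.get(seat_no)
    if record is None:
        return f"No result found for seat number {seat_no}."
    # List every subject mark, skipping the student's name field.
    marks = ", ".join(f"{k}: {v}" for k, v in record.items() if k != "name")
    return f"Results for {record['name']} ({seat_no}): {marks}"
```

Keeping the lookup in a plain function separates it from the web framework, so it can be exercised without starting a server.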
4. Phase 4: Model Building
Model building is the phase that uses the technologies chosen in the Model
Planning phase and follows the planned workflow to build the chatbot model.
Although the modeling techniques and logic required to develop models can
be highly complex, the actual duration of this phase can be short compared to
the time spent preparing the data and defining the approaches. Creating
robust models that are suitable to a specific situation requires thoughtful
consideration to ensure the models being developed ultimately meet the
objectives outlined in Phase 1.
The common tools that we used in this phase of model building are Visual
Studio Code and Live Server.
One of the most important things in model building is data visualization,
because it helps in discovering trends in the data. It is much easier to
observe trends when the data is laid out in visual form than when it sits in
a table. For example, a Tableau dashboard of sales data can show the sum of
sales made by each customer in descending order, with red denoting loss and
gray denoting profit. Such a visualization makes it easy to see that some
customers have huge sales and are still at a loss, which would be very
difficult to observe from a table.
The different data visualizations used in our Data Science Life Cycle are:
1. Bar charts
A bar plot (or bar chart) is one of the most common types of graphics. It shows
the relationship between a numeric and a categorical variable. Each entity of
the categorical variable is represented as a bar, and the size of the bar
represents its numeric value. In the dashboard, a bar chart is used to display
the minimum, maximum and average marks of the subjects given in the Excel sheet.
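A minimal sketch of how the per-subject summary behind this bar chart could be computed and described as a grouped bar figure. The marks are made-up sample data and `summary_bar_figure` is a hypothetical helper. The figure is written as a plain dictionary in Plotly's figure format, which `plotly.graph_objects.Figure` and Dash's `dcc.Graph` accept, so the sketch runs without a plotting library; the project's actual dashboard code is not shown in this report.

```python
from statistics import mean

# Made-up marks per subject, standing in for the Excel sheet.
MARKS = {
    "DSBDA": [80, 40, 60],
    "WT": [70, 50, 90],
}


def summary_bar_figure(marks_by_subject: dict) -> dict:
    """Return a grouped bar chart (Plotly figure dict) of min, max and average."""
    subjects = list(marks_by_subject)
    stats = {"Minimum": min, "Maximum": max, "Average": mean}
    # One bar trace per statistic, each with one bar per subject.
    traces = [
        {
            "type": "bar",
            "name": label,
            "x": subjects,
            "y": [func(marks_by_subject[s]) for s in subjects],
        }
        for label, func in stats.items()
    ]
    layout = {"barmode": "group", "yaxis": {"title": {"text": "Marks"}}}
    return {"data": traces, "layout": layout}


figure = summary_bar_figure(MARKS)
```

With `barmode` set to `group`, the three bars for each subject sit side by side, which makes the spread between minimum and maximum easy to compare across subjects.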
2. Pie chart
A pie chart is a circle divided into slices. Each slice represents a numerical
value, and its size is proportional to that value.
In this project we have used a pie chart to show how many students fall into
each result class, i.e. Distinction, First Class, Pass, Fail, etc.
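The counting behind such a pie chart can be sketched as follows. The percentage cut-offs are assumptions made for illustration (the report does not state them), the percentages are made-up, and `result_class` and `class_pie_figure` are hypothetical helpers. As before, the figure is a plain dictionary in Plotly's figure format.

```python
from collections import Counter

# Assumed percentage cut-offs, highest first; the report does not state them.
CLASS_CUTOFFS = [(70, "Distinction"), (60, "First Class"), (40, "Pass")]


def result_class(percentage: float) -> str:
    """Map a percentage to a result class using the assumed cut-offs."""
    for cutoff, label in CLASS_CUTOFFS:
        if percentage >= cutoff:
            return label
    return "Fail"


def class_pie_figure(percentages: list) -> dict:
    """Return a pie chart (Plotly figure dict) of students per result class."""
    counts = Counter(result_class(p) for p in percentages)
    trace = {
        "type": "pie",
        "labels": list(counts),
        "values": list(counts.values()),
    }
    return {"data": [trace], "layout": {"title": {"text": "Result classes"}}}


# Made-up percentages for six students.
figure = class_pie_figure([82, 75, 64, 55, 41, 30])
```

Each slice's size is proportional to the number of students in that class, so only the counts need to be supplied; the chart works out the proportions.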
3. Scatter Plots
A scatter plot (also called a scatterplot, scatter graph, scatter chart,
scattergram, or scatter diagram) is a type of plot or mathematical diagram
using Cartesian coordinates to display values for typically two variables for a
set of data.
In this project we have used scatter plots to compare the different subjects
with each other and to find the correlation between two subjects.
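As a sketch of comparing two subjects, the Pearson correlation coefficient can be computed next to the scatter figure. The report does not say which correlation measure was used, so Pearson's is an assumption; the marks are made-up and `pearson` and `subject_scatter_figure` are hypothetical helpers. The figure is again a plain dictionary in Plotly's figure format.

```python
from math import sqrt
from statistics import mean


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def subject_scatter_figure(xs: list, ys: list, x_name: str, y_name: str) -> dict:
    """Return a scatter plot (Plotly figure dict) with r shown in the title."""
    r = pearson(xs, ys)
    trace = {"type": "scatter", "mode": "markers", "x": xs, "y": ys}
    layout = {
        "title": {"text": f"{x_name} vs {y_name} (r = {r:.2f})"},
        "xaxis": {"title": {"text": x_name}},
        "yaxis": {"title": {"text": y_name}},
    }
    return {"data": [trace], "layout": layout}


# Made-up marks of the same five students in two subjects.
dsbda = [40, 50, 60, 70, 80]
wt = [45, 55, 60, 75, 85]
figure = subject_scatter_figure(dsbda, wt, "DSBDA", "WT")
```

A value of r close to 1 means students who score well in one subject tend to score well in the other; a value close to 0 means the marks are largely unrelated.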
4. Histograms
A histogram is a graphical representation that organizes a group of data
points into user-specified ranges. Similar in appearance to a bar graph, the
histogram condenses a data series into an easily interpreted visual by taking
many data points and grouping them into logical ranges or bins.
Here, we have used histograms to display the results of each individual subject.
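The grouping into bins can be sketched as follows, assuming integer marks out of 100 and a bin width of 10 (both assumptions, as are the sample marks and the helper names). `bin_counts` shows the grouping explicitly, while the figure dictionary leaves the binning to Plotly through the `xbins` setting of a histogram trace.

```python
from collections import Counter


def bin_counts(marks: list, size: int = 10) -> dict:
    """Count marks per bin of the given width, keyed by the bin's lower edge."""
    counts = Counter((m // size) * size for m in marks)
    return dict(sorted(counts.items()))


def subject_histogram_figure(marks: list, subject: str, size: int = 10) -> dict:
    """Return a histogram (Plotly figure dict) of one subject's marks."""
    trace = {
        "type": "histogram",
        "x": marks,
        # Plotly bins the raw marks itself; the end is extended by one bin
        # so that a full mark of 100 still falls inside the range.
        "xbins": {"start": 0, "end": 100 + size, "size": size},
    }
    layout = {
        "title": {"text": f"{subject} marks"},
        "xaxis": {"title": {"text": "Marks"}},
        "yaxis": {"title": {"text": "Number of students"}},
    }
    return {"data": [trace], "layout": layout}


# Made-up marks for one subject.
marks = [35, 42, 48, 55, 57, 63, 78]
counts = bin_counts(marks)
figure = subject_histogram_figure(marks, "DSBDA")
```

A narrower bin width shows more detail but makes the shape noisier with few students, so the width is worth adjusting to the class size.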
6. Phase 6: Operationalize
The following are a few snapshots of our project results. The chatbot shows
the results of a student after the student enters a seat number. The dashboard
shows generalized results in pie charts, histograms, bar graphs and scatter
plots.