Unit 5 Ids

Unit 5 of the Data Science course focuses on data visualization and prototype application development, emphasizing the importance of effectively communicating insights through various methods such as presentations, dashboards, and reports. It discusses the selection of appropriate visualization techniques based on data type and context, and introduces tools like dc.js and Crossfilter for creating interactive dashboards. A case study illustrates the application of these concepts in a hospital pharmacy setting, showcasing how to manage and visualize data efficiently.

Uploaded by

Mamatha

Introduction to Data Science Unit-5

UNIT-V
Data Visualization and Prototype Application Development: Data Visualization options,
Crossfilter, the JavaScript MapReduce library, Creating an interactive dashboard with dc.js,
Dashboard development tools. Applying the Data Science process for real world problem solving
scenarios as a detailed case study.
5.1 Introduction
Data scientists must deliver their new insights to the end user. The results can be communicated
in several ways:
• A one-time presentation: Research questions are one-shot deals because the business
decision derived from them will bind the organization to a certain course for many years to
come.
Example: Company investment decisions:
➢ Do we distribute our goods from two distribution centers or only one?
➢ Where do they need to be located for optimal efficiency?
When the decision is made, the exercise may not be repeated until you’ve retired.
• A new viewport on your data: The most obvious example is customer segmentation. The
segments themselves will be communicated via reports and presentations. When a clear and
relevant customer segmentation is discovered, it can be fed back to the database as a new
dimension on the data from which it was derived. From then on, people can make their own
reports, such as how many products were sold to each segment of customers.
• A real-time dashboard: Sometimes the task of a data scientist doesn’t end when the newly
discovered information is sent to the database. When other people start making
reports on this newly discovered data, they might interpret it incorrectly and make reports
that don’t make sense. The data scientist should therefore make the first refreshable report so
that others, mainly reporters and IT, can understand it and follow its example. This in turn
shortens the delivery time of our insights to the end user who wants to use them on an
everyday basis.
Important factors that we come across while preparing a final report are:
• What kind of decision are you supporting? Is it a strategic or an operational one? Strategic
decisions often only require you to analyze and report once, whereas operational decisions
require the report to be refreshed regularly.

• How big is your organization? In smaller ones you’ll be in charge of the entire cycle: from
data gathering to reporting. In bigger ones a team of reporters might be available to make the
dashboards for you. But even in this last situation, delivering a prototype dashboard can be
beneficial because it presents an example and often shortens delivery time.
5.2 Data visualization options
• The art of presenting your data and information as graphs, charts, or maps is known as data
visualization.
• Data visualization's purpose is to emphasize observations that would not otherwise jump out
when looking at a linear list of values and numbers to enable people to quickly and easily
grasp their data.
How to Select the Appropriate Graph or Chart for Your Data?
To successfully express our message and insights, selecting the appropriate chart or graph for the
data is essential. The following factors need to be considered while choosing the optimal data
visualization:
Purpose
What are you trying to visualize? Are you attempting to demonstrate contrasts, patterns, or
connections in your data?
Type of Data
What kind of data do you have? Is it numerical or categorical? Continuous or discrete?
This will aid in choosing the best types of data visualization charts.
Context
What context does your data come from? Is it recent or historical? Local or worldwide? This will
enable you to choose the proper scale and coverage for your visualization.
Most Common Types of Data Visualization are:
1. Column Chart
2. Line Graph
3. Pie Chart
4. Bar Chart
5. Heat Maps
6. Scatter Plot
7. Bubble Chart

8. Funnel Chart
9. Radar Chart
10. Tree Chart

Fig. 5.1 Top data visualization tools

Case Study:
• Consider a hospital pharmacy with a stock of a few thousand medicines. The government
issued a new rule for all pharmacies: all medicines should be checked for their
sensitivity to light and be stored in new, special containers.
• One thing the government didn’t supply to the pharmacies was an actual list of
light-sensitive medicines. This is no problem for you as a data scientist because every medicine
has a patient information leaflet that contains this information. You distill the information
with the clever use of text mining and assign a “light sensitive” or “not light sensitive” tag to
each medicine. This information is then uploaded to the central database. In addition, the
pharmacy needs to know how many containers will be necessary. For this they give you
access to the pharmacy stock data. When you draw a sample with only the variables you
require, the data set looks like figure 5.2 when opened in Excel.

Fig. 5.2 Pharmacy medicines data set opened in Excel: the first 10 lines of stock data are
enhanced with a light-sensitivity variable
• As we can see, the information is time-series data for an entire year of stock movement, so
every medicine has 365 entries in the data set. For the example’s sake we’ll use a
fraction of this amount.
• The data set is limited to 29 medicines, a little more than 10,000 lines of data
(29 × 365 = 10,585).
• It’s not recommended to load your entire database into the user’s browser; the browser
will freeze while loading, and if it’s too much data, the browser may even crash.
• Normally data is precalculated on the server and parts of it are requested using, for example,
a REST service.
• We use the data visualization option dc.js, which is a cross-breed between the JavaScript
MapReduce library Crossfilter and the data visualization library d3.js.
• Crossfilter was developed by Square, a company that handles payment transactions.
Square developed Crossfilter to give its customers extremely fast slicing and dicing of
their payment history. Crossfilter is not the only JavaScript library capable of MapReduce
processing, but it most certainly does the job, is open source, is free to use, and is
maintained by an established company (Square).
• Example alternatives to Crossfilter are Map.js, Meguro, and Underscore.js.
• d3.js can safely be called the most versatile JavaScript data visualization library; it was
developed by Mike Bostock as a successor to his Protovis library. Many JavaScript libraries
are built on top of d3.js.

• NVD3, C3.js, xCharts, and Dimple all offer roughly the same thing: an abstraction layer on
top of d3.js that makes it easier to draw simple graphs. They mainly differ in the types of
graphs they support and in their default design.
• The main reason for choosing dc.js among many options is: dc.js can easily set up an
interactive dashboard where clicking one graph will create filtered views on related graphs.

Fig. 5.3 A dc.js interactive example on its official website

5.3 Crossfilter, the JavaScript MapReduce library
• JavaScript isn’t the greatest language for data crunching. But that didn’t stop people, like the
folks at Square, from developing MapReduce libraries for it.
• If we’re dealing with data, every bit of speed gain helps.

• We don’t want to send enormous loads of data over the internet or even your internal
network though, for these reasons:
➢ Sending a bulk of data will tax the network to the point where it will bother other
users.
➢ The browser is on the receiving end, and while loading in the data it will temporarily
freeze. For small amounts of data this is unnoticeable, but when we start looking at
100,000 lines, it can become a visible lag. When we go over 1,000,000 lines,
depending on the width of our data, our browser could give up.
• For the data we do send, Crossfilter handles it once it arrives in the browser. In
this case study, the pharmacist requested from the central server the 2015 stock data for the 29
medicines she was particularly interested in.
5.3.1 Setting up everything
• dc.js is the visualization library we will use to create our interactive dashboard.
• To build the actual dc.js application we require the following libraries:
➢ jQuery: To handle the interactivity
➢ Crossfilter.js: A MapReduce library and prerequisite to dc.js
➢ d3.js: A popular data visualization library and prerequisite to dc.js
➢ Bootstrap: A widely used layout library we’ll use to make it all look better
• We write only three files:
➢ index.html: The HTML page that contains our application
➢ application.js: To hold all the JavaScript code
➢ application.css: For CSS (Cascading Style Sheets)
• In addition, we need to run our code on an HTTP server. We could set up a LAMP
(Linux, Apache, MySQL, PHP), WAMP (Windows, Apache, MySQL, PHP), or XAMPP
(cross-platform, Apache, MySQL, PHP, Perl) server.
• For the sake of simplicity we won’t set up any of those servers here. Instead we can do it
with a single Python command.
• Use the command-line tool (Linux shell or Windows CMD) and move to the folder
containing the index.html.
• The following command will launch a Python HTTP server on our localhost (Python 2):
python -m SimpleHTTPServer
For Python 3:
python -m http.server 8000

Fig. 5.4 Starting up a simple Python HTTP server

• The following files should also be available in the same folder as our index.html. We can
download them from the Manning website or from their creators’ websites.
➢ dc.css and dc.min.js: https://dc-js.github.io/dc.js/
➢ d3.v3.min.js: http://d3js.org/
➢ crossfilter.min.js: http://square.github.io/crossfilter/
• Now we write the HTML code in the index.html page. Using a jQuery onload handler, our
application will be loaded when the rest of the page is ready.
• Now that the HTML is set up, it’s time to write code in the application.js file. First, we
wrap the entire code “to be” in a jQuery onload handler.
$(function()
{
//All future code will end up in this wrapper
})
• Use the following code to load in data.
d3.csv('medicines.csv',function(data) {
main(data)
});
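To make the shape of the data argument concrete, here is a minimal sketch of what a CSV loader hands to the callback: an array of objects keyed by the header row, with every value still a string. The helper parseCsv is a simplified, hypothetical stand-in and not d3's implementation; the column names MedName, Stock, and Date are taken from the code in this unit, while the sample values and the date format are invented for illustration.

```javascript
// Simplified stand-in for a CSV loader (assumes no quoted fields or escaping).
function parseCsv(text) {
  const lines = text.trim().split(/\r?\n/);
  const header = lines[0].split(',');
  return lines.slice(1).map(function (line) {
    const cells = line.split(',');
    const row = {};
    header.forEach(function (name, i) { row[name] = cells[i]; });
    return row;
  });
}

// Invented sample rows, for illustration only.
const sampleCsv = [
  'MedName,Stock,Date',
  'Medicine A,12,1/1/2015',
  'Medicine A,10,1/2/2015',
  'Medicine B,-3,1/1/2015'
].join('\n');

const rows = parseCsv(sampleCsv);
console.log(rows.length);          // number of data rows
console.log(typeof rows[0].Stock); // values arrive as strings
```

Because the values arrive as strings, the reduce functions later in this unit convert them with Number(v.Stock) before summing.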
• Apart from the main function we have a CreateTable function to create the tables, as shown
in the following listing.

Listing 5.1 The CreateTable function

CreateTable()
CreateTable() requires three arguments:
➢ data: The data it needs to put into a table.
➢ variablesInTable: What variables it needs to show.
➢ title: The title of the table.
CreateTable() uses a predefined variable, tableTemplate, that contains our overall table layout.
CreateTable() can then add rows of data to this template.
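The idea behind CreateTable() can be sketched without jQuery. The version below is not the original listing: createTableHtml and escapeHtml are hypothetical names, and the function returns an HTML string rather than filling the tableTemplate described above. It takes the same three arguments (data, variablesInTable, title).

```javascript
// Hypothetical plain-JavaScript counterpart of CreateTable(); it returns an
// HTML string instead of a jQuery object and is not the original listing.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function createTableHtml(data, variablesInTable, title) {
  // One header cell per requested variable.
  const head = variablesInTable
    .map(function (name) { return '<th>' + escapeHtml(name) + '</th>'; })
    .join('');
  // One table row per record, showing only the requested variables.
  const body = data
    .map(function (row) {
      const cells = variablesInTable
        .map(function (name) { return '<td>' + escapeHtml(row[name]) + '</td>'; })
        .join('');
      return '<tr>' + cells + '</tr>';
    })
    .join('');
  return '<table><caption>' + escapeHtml(title) + '</caption>' +
    '<thead><tr>' + head + '</tr></thead>' +
    '<tbody>' + body + '</tbody></table>';
}

const tableHtml = createTableHtml(
  [{ MedName: 'Medicine A', Stock: '12' }], // invented sample row
  ['MedName', 'Stock'],
  'Sample table'
);
console.log(tableHtml);
```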

Listing 5.2 JavaScript main function

We show our data on the screen, but preferably not all of it; only the first five entries, as shown
in figure 5.5. Our data contains a date variable, and to make sure Crossfilter recognizes it as a
date later on, we first parse it and create a new variable called Day. We keep the original, Date,
in the table for now, but later on we’ll use Day for all our calculations.

Fig. 5.5 Input medicine table shown in browser: first five lines
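The two steps described for the main function (parse the date into a new Day variable, then show only the first five records) can be sketched as follows. This is not Listing 5.2: the helper names parseDay and prepare are hypothetical, and the "M/D/YYYY" date format is an assumption, since the format of the Date column is not stated here.

```javascript
// Assumption: dates look like "M/D/YYYY"; adjust the parser to the real file.
function parseDay(dateText) {
  const parts = dateText.split('/').map(Number);
  return new Date(parts[2], parts[0] - 1, parts[1]); // year, zero-based month, day
}

// Adds a parsed Day to every record and returns the first `n` records.
function prepare(medicineData, n) {
  medicineData.forEach(function (d) { d.Day = parseDay(d.Date); });
  return medicineData.slice(0, n);
}

// Invented sample records, for illustration only.
const sampleData = [
  { MedName: 'Medicine A', Stock: '12', Date: '1/1/2015' },
  { MedName: 'Medicine A', Stock: '10', Date: '1/2/2015' }
];
const firstRows = prepare(sampleData, 5);
console.log(firstRows[1].Day.getDate()); // day of the month of the second record
```

Keeping Date as text and adding Day as a real date object means the table can still show the original value while sorting and filtering use proper date comparisons.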
5.3.2 Unleashing Crossfilter to filter the medicine data set
• Now let’s go into Crossfilter to use filtering and MapReduce. We should put all our code
now within the main() function.
• The first thing we’ll need to do is declare a Crossfilter instance and initiate it with our data.
var CrossfilterInstance = crossfilter(medicineData);
• On this instance we can register dimensions, which are the columns of the table.
• Currently Crossfilter is limited to 32 dimensions. If we are handling data wider than 32
dimensions, we should consider narrowing it down before sending it to the browser.

• Let’s create our first dimension, the medicine name dimension:

var medNameDim = CrossfilterInstance.dimension(function(d) {return d.MedName;});
• The first dimension is the name of the medicines, and we can already use this to filter the
data set and show the filtered data using our CreateTable() function.
var dataFiltered = medNameDim.filter('Grazax 75 000 SQ-T');
var filteredTable = $('#filteredtable');
filteredTable
.empty()
.append(CreateTable(dataFiltered.top(5), variablesInTable, 'Our First Filtered Table'));

Fig. 5.6 Data filtered on medicine name Grazax 75 000 SQ-T

• The top(5) function sorts the data on the dimension and returns the top 5 entries of the data set.
• Let’s register another dimension, the date dimension:
var DateDim = CrossfilterInstance.dimension( function(d)
{return d.Day;});
• Now we can sort on date instead of medicine name:
filteredTable
.empty()
.append(CreateTable(DateDim.bottom(5), variablesInTable, 'Our First Filtered Table'));

Fig. 5.7 Data filtered on medicine name Grazax 75 000 SQ-T and sorted by day
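Why does a filter on the medicine name dimension affect what the date dimension returns? The toy model below illustrates the idea: all dimensions share one set of filters, and top() or bottom() on any dimension returns only the records that pass every filter, ordered by that dimension. miniCrossfilter is a hypothetical name and this is a sketch of the concept only, not Crossfilter's code or performance characteristics; the sample records are invented and Day is simplified to an integer.

```javascript
// Toy model of the ideas behind Crossfilter dimensions; not the library's code.
function miniCrossfilter(records) {
  const filters = []; // one predicate per dimension, shared by all dimensions
  function visible() {
    return records.filter(function (r) {
      return filters.every(function (keep) { return keep(r); });
    });
  }
  function dimension(accessor) {
    const slot = filters.push(function () { return true; }) - 1;
    function sorted() {
      // visible() returns a fresh array, so sorting in place is safe.
      return visible().sort(function (a, b) {
        const x = accessor(a), y = accessor(b);
        return x < y ? -1 : x > y ? 1 : 0;
      });
    }
    const dim = {
      filter: function (value) {
        filters[slot] = function (r) { return accessor(r) === value; };
        return dim; // like Crossfilter, filter() returns the dimension
      },
      filterAll: function () {
        filters[slot] = function () { return true; };
        return dim;
      },
      top: function (k) { return sorted().reverse().slice(0, k); },
      bottom: function (k) { return sorted().slice(0, k); }
    };
    return dim;
  }
  return { dimension: dimension };
}

// Invented sample records.
const stock = [
  { MedName: 'Medicine A', Day: 1, Stock: 12 },
  { MedName: 'Medicine B', Day: 1, Stock: 7 },
  { MedName: 'Medicine A', Day: 2, Stock: 10 },
  { MedName: 'Medicine A', Day: 3, Stock: 9 }
];
const cf = miniCrossfilter(stock);
const nameDim = cf.dimension(function (d) { return d.MedName; });
const dayDim = cf.dimension(function (d) { return d.Day; });

nameDim.filter('Medicine A');
const earliestTwo = dayDim.bottom(2); // filtered on name, ordered by day ascending
const latestOne = dayDim.top(1);      // filtered on name, ordered by day descending
console.log(earliestTwo, latestOne);
```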
• Next, we would like to know how many observations we have per medicine. Logic dictates that
we should end up with the same number for every medicine: 365, or one observation per day in
2015.
• Crossfilter comes with two MapReduce functions: reduceCount() and reduceSum().
• If we want to do anything apart from counting and summing, we need to write reduce
functions for it.
• The countPerMed variable now contains the data grouped by the medicine dimension and a
line count for each medicine in the form of a key and a value.
• To create the table we need to address the variable key instead of medName and value for
the count.
var countPerMed = medNameDim.group().reduceCount();
variablesInTable = ["key","value"]
filteredTable
.empty()
.append(CreateTable(countPerMed.top(Infinity), variablesInTable,'Reduced Table'));

Fig. 5.8 MapReduced table with the medicine as the group and a count of data lines as the value
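The key/value shape of the grouped result can be illustrated in plain JavaScript. The sketch below counts records per key and returns {key, value} pairs, which is why the table addresses "key" and "value" rather than the original column name. countPerKey is a hypothetical helper and the sample records are invented; this is not how Crossfilter computes its groups internally.

```javascript
// Plain-JavaScript sketch of a grouped count, producing {key, value} pairs.
function countPerKey(records, keyOf) {
  const counts = new Map();
  records.forEach(function (r) {
    const key = keyOf(r);
    counts.set(key, (counts.get(key) || 0) + 1);
  });
  // Convert the Map entries [key, count] into {key, value} objects.
  return Array.from(counts, function (entry) {
    return { key: entry[0], value: entry[1] };
  });
}

// Invented sample records.
const observations = [
  { MedName: 'Medicine A' }, { MedName: 'Medicine B' }, { MedName: 'Medicine A' }
];
const countPerMedSketch = countPerKey(observations, function (d) { return d.MedName; });
console.log(countPerMedSketch);
```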

• Apart from the reduceCount() and reduceSum() functions, Crossfilter has the more general
reduce() function. This function takes three arguments:
➢ The reduceAdd() function: A function that describes what happens when an extra
observation is added.
➢ The reduceRemove() function: A function that describes what needs to happen when
an observation disappears (for instance, because a filter is applied).
➢ The reduceInit() function: This one sets the initial values for everything that’s
calculated. For a sum and count the most logical starting point is 0.
• A custom reduce function requires three components: an initiation, an add function, and a
remove function.
• The initial reduce function sets the starting values of the p object; it takes no arguments:
var reduceInitAvg = function() {
return {count: 0, stockSum: 0, stockAvg: 0};
};
• The add and remove functions each take two arguments.
➢ p is an object that holds the combined result so far; it persists over all
observations. This variable keeps track of the sum and count for you and thus
represents your goal, your end result.
➢ v represents a record of the input data and has all its variables available to you.
• reduceInit() is called once, when a group is created, but reduceAdd() is called every time a
record is added and reduceRemove() every time a record is removed.
• The reduceInit() function, here called reduceInitAvg() because we’re going to
calculate an average, initializes the p object by defining its components
(count, sum, and average) and setting their initial values.
Let’s look at reduceAddAvg():
var reduceAddAvg = function(p,v){
p.count += 1;
p.stockSum = p.stockSum + Number(v.Stock);
p.stockAvg = Math.round(p.stockSum / p.count);
return p; }

• reduceAddAvg() takes the p and v arguments described above. The Stock is summed up for every
record we add, and then the average is calculated based on the accumulated sum and record
count.
• The reduceRemoveAvg() function looks similar but does the opposite: when a record is
removed, the count and sum are lowered. The average falls back to 0 when no records
remain, which avoids dividing by zero:
var reduceRemoveAvg = function(p,v){
p.count -= 1;
p.stockSum = p.stockSum - Number(v.Stock);
p.stockAvg = p.count ? Math.round(p.stockSum / p.count) : 0;
return p;
};
• Now apply this MapReduce function to the data set by passing the three functions to the
group’s reduce() method.

Fig. 5.9 MapReduced table with average stock per medicine
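The behaviour of the three reduce functions can be checked without Crossfilter. In the sketch below, groupReduce is a hypothetical helper that groups records by key and folds each one in with the add function; the init, add, and remove functions follow the ones in this section, with a guard so the average is 0 when a group is empty. The sample records are invented. In Crossfilter itself the three functions are passed to the group's reduce() method in the order add, remove, initial.

```javascript
function reduceInitAvg() {
  return { count: 0, stockSum: 0, stockAvg: 0 };
}
function updateAvg(p) {
  p.stockAvg = p.count ? Math.round(p.stockSum / p.count) : 0; // avoid 0/0
  return p;
}
function reduceAddAvg(p, v) {
  p.count += 1;
  p.stockSum += Number(v.Stock);
  return updateAvg(p);
}
function reduceRemoveAvg(p, v) {
  p.count -= 1;
  p.stockSum -= Number(v.Stock);
  return updateAvg(p);
}

// Hypothetical helper: groups records by key and folds each in with `add`.
function groupReduce(records, keyOf, add, init) {
  const groups = new Map();
  records.forEach(function (v) {
    const key = keyOf(v);
    if (!groups.has(key)) groups.set(key, init());
    groups.set(key, add(groups.get(key), v));
  });
  return groups;
}

// Invented sample records; Stock is a string, as it would be after CSV loading.
const stockRows = [
  { MedName: 'Medicine A', Stock: '12' },
  { MedName: 'Medicine A', Stock: '9' },
  { MedName: 'Medicine B', Stock: '-4' }
];
const avgGroups = groupReduce(stockRows, function (d) { return d.MedName; },
                              reduceAddAvg, reduceInitAvg);
console.log(avgGroups.get('Medicine A')); // running count, sum and average

// Filtering a record out triggers the remove function for its group.
reduceRemoveAvg(avgGroups.get('Medicine A'), stockRows[1]);
console.log(avgGroups.get('Medicine A')); // the group after one record is removed
```

Keeping the running sum and count in p is what makes removal cheap: the average can be updated from the stored totals instead of rescanning every record in the group.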

• The results speak for themselves, as shown in figure 5.9.
• It seems we’ve borrowed Cimalgex from other hospitals, going into an average negative
stock.
• This is all the Crossfilter you need to know to work with dc.js, so let’s move on and bring
out those interactive graphs.
5.4 Creating an interactive dashboard with dc.js
• Now that you know the basics of Crossfilter, it’s time to take the final step: building the
dashboard.

• First, add placeholders for the graphs to the index.html page.
• In application.js, all the upcoming code goes inside the main() function.
• dc.renderAll() is dc’s command to draw the graphs; it should be called only once, at the
bottom of the main() function.
• The first graph we need is the “total stock over time,” as shown in the following listing. We
already have the time dimension declared, so all we need is to sum the stock by the time
dimension.
Listing 5.3 Code to generate "total stock over time" graph

• .dimension() takes the time dimension and represents the x-axis.

• .group() takes the summed data as input and represents the y-axis.

Figure 5.10 dc.js graph: sum of medicine stock over the year 2015
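The series behind this graph is the stock summed per day, which is what reduceSum() produces when applied to a group on the date dimension. The sketch below computes the same kind of series in plain JavaScript, together with the earliest and latest day that bound the x-axis. sumPerDay and the sample records are hypothetical; this is not Listing 5.3.

```javascript
// Sketch of the series behind a "total stock over time" line chart.
function sumPerDay(records) {
  const totals = new Map();
  records.forEach(function (d) {
    const key = d.Day.getTime(); // Date objects compare by identity, so key on the timestamp
    totals.set(key, (totals.get(key) || 0) + Number(d.Stock));
  });
  return Array.from(totals, function (entry) {
    return { key: new Date(entry[0]), value: entry[1] };
  }).sort(function (a, b) { return a.key - b.key; });
}

// Invented sample records.
const dailyRows = [
  { Day: new Date(2015, 0, 2), Stock: '10' },
  { Day: new Date(2015, 0, 1), Stock: '12' },
  { Day: new Date(2015, 0, 1), Stock: '-3' }
];
const series = sumPerDay(dailyRows);
const minDate = series[0].key;                 // left edge of the x-axis
const maxDate = series[series.length - 1].key; // right edge of the x-axis
console.log(series.map(function (p) { return p.value; }));
```

Each key becomes a position on the x-axis and each value a height on the y-axis.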
• Now let’s create a row chart that represents the average stock per medicine.
Listing 5.4 Code to generate “average stock per medicine” graph

• Since we used a custom-defined reduce() function this time, dc.js doesn’t know what data to
represent. With the .valueAccessor() method we can specify p.value.stockAvg as the value
of our choice.
• The font color of the dc.js row chart’s labels is gray; this makes our row chart somewhat hard to
read. We can remedy this by overriding the following CSS rule in the application.css file:
.dc-chart g.row text {fill: black;}
• One simple line can make the difference between a clear and an obscure graph.
• Now when we select an area on the line chart, the row chart is automatically adapted to
represent the data for the correct time period. Inversely, we can select one or multiple
medicines on the row chart, causing the line chart to adjust accordingly.

Fig. 5.11 dc.js line chart and row chart interaction
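What the interaction amounts to can be sketched in plain JavaScript: selecting a period on the line chart is a date-range filter, after which the average stock per medicine is recomputed from the remaining records. The accessor at the end plays the role of .valueAccessor(), picking stockAvg out of each group's value object. All names and sample records are hypothetical, and Day is simplified to an integer day number to keep the sketch short.

```javascript
// Sketch of the line-chart/row-chart interaction; names are illustrative.
function averageStockPerMedicine(records, fromDay, toDay) {
  const groups = new Map();
  records
    .filter(function (d) { return d.Day >= fromDay && d.Day <= toDay; })
    .forEach(function (d) {
      const p = groups.get(d.MedName) || { count: 0, stockSum: 0, stockAvg: 0 };
      p.count += 1;
      p.stockSum += Number(d.Stock);
      p.stockAvg = Math.round(p.stockSum / p.count);
      groups.set(d.MedName, p);
    });
  return Array.from(groups, function (entry) {
    return { key: entry[0], value: entry[1] };
  });
}

// Same role as dc.js's .valueAccessor(): pick the number to draw from each group.
const valueAccessor = function (p) { return p.value.stockAvg; };

// Invented sample records.
const yearRows = [
  { MedName: 'Medicine A', Day: 1, Stock: '10' },
  { MedName: 'Medicine A', Day: 2, Stock: '20' },
  { MedName: 'Medicine A', Day: 9, Stock: '90' },
  { MedName: 'Medicine B', Day: 2, Stock: '4' }
];
const wholePeriod = averageStockPerMedicine(yearRows, 1, 9).map(valueAccessor);
const brushed = averageStockPerMedicine(yearRows, 1, 2).map(valueAccessor);
console.log(wholePeriod, brushed);
```

Narrowing the period changes the average for Medicine A but not for Medicine B, because only Medicine A has a record outside the selected range.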

• Finally, let’s add the light-sensitivity dimension so the pharmacist can distinguish between
stock for light-sensitive medicines and non-light-sensitive ones.
Listing 5.6 Adding the light-sensitivity dimension

• We first need to register the light-sensitivity dimension on the Crossfilter instance. We can
also add a reset button, which causes all filters to reset, as shown in the following listing.
Listing 5.7 The dashboard reset filters button

• The .filterAll() method removes all filters on a specific dimension.

• dc.redrawAll() then manually triggers all dc charts to redraw. The final result is an
interactive dashboard (figure 5.12), ready to be used by the pharmacist to gain insight into
her stock’s behavior.

Fig. 5.12 dc.js fully interactive dashboard on medicines and their stock within the hospital
pharmacy
5.5 Dashboard development tools
• We have the tried and true software packages of renowned developers such as Tableau,
MicroStrategy, Qlik, SAP, IBM, SAS, Microsoft, Spotfire, and so on.
• These companies all offer dashboard tools worth investigating, but they are paid tools.
• Some developers also offer free public versions with limited functionality.
• Some companies will at least give us a trial version. In the end we have to pay for the full
version of any of these packages.
• We can also get visualization libraries that come only with a trial period and no free
community edition, such as Wijmo, Kendo, and FusionCharts. They are worth looking into
because they also provide support and guarantee regular updates.
• HTML5 offers a free route to data visualization, with a proliferation of free JavaScript
libraries to plot any data we want.
• Some of the visualization tools are:

➢ HighCharts: One of the most mature browser-based graphing libraries. The free license
applies only to noncommercial pursuits. If you want to use it in a commercial context,
prices range anywhere from $90 to $4000.
➢ Chartkick: A JavaScript charting library for Ruby on Rails fans.
➢ Google Charts: The free charting library of Google. As with many Google products, it is
free to use, even commercially, and offers a wide range of graphs.
➢ d3.js: This is the odd one out because it isn’t a graphing library but a data visualization
library. Libraries such as HighCharts and Google Charts are meant to draw certain
predefined charts; d3.js doesn’t lay down such restrictions. d3.js is currently the most
versatile JavaScript data visualization library available.
• Even though we have many options, why or when would we consider building our own
interface with HTML5 instead of using alternatives such as SAP’s BusinessObjects, SAS
JMP, Tableau, QlikView, or one of the many others?
• Here are a few reasons:
➢ No budget: When we work in a startup or other small company, the licensing costs
accompanying this kind of software can be high.
➢ High accessibility: The data science application is meant to release results to any kind of
user, especially people who might only have a browser at their disposal—our own
customers, for instance. Data visualization in HTML5 runs fluently on mobile.
➢ Big pools of talent out there: Although there aren’t that many Tableau developers, scads
of people have web-development skills. When planning a project, it’s important to take
into account whether you can staff it.
➢ Quick release: Going through the entire IT cycle might take too long at the company,
and we want people to enjoy our analysis quickly. Once the interface is available and
being used, IT can take all the time they want to industrialize the product.
➢ Prototyping: The better you can show IT its purpose and what it should be capable of,
the easier it is for them to build or buy a sustainable application that does what you want
it to do.
➢ Customizability: Although the established software packages are great at what they do,
an application can never be as customized as when you create it yourself.
And why wouldn’t you do this?

➢ Company policy: This is the biggest one: it’s not allowed. Large companies have IT
backup teams that allow only a certain number of tools to be used so they can keep their
supporting role under control.
➢ Mature reporting team: If you have a good reporting department, why would you still
bother?
➢ Customization is satisfactory: Not everyone wants the shiny stuff; basic can be enough.
Several of the bigger platforms are browser interfaces with JavaScript running under the
hood. Tableau, BusinessObjects Webi, SAS Visual Analytics, and so on all have HTML
interfaces; their tolerance to customization might grow over time.
