DV Lab Manual (Ex - No.1-10)
Ex.No.1:
Understanding Data: what is data, where to find data, foundations for building Data
Visualizations, and creating your first visualization.
Aim:
To understand what data is, where to find it, the foundations for building data
visualizations, and how to create a first visualization.
Solution:
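What is Data?
Data is a collection of recorded facts or observations. It is commonly classified as
structured or unstructured: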
• Structured Data: This type of data is organized into a specific format, such as tables
or databases, and is easily searchable and analysable. Examples include spreadsheets,
relational databases, and CSV files.
• Unstructured Data: Unstructured data lacks a specific format and can include text
documents, social media posts, images, audio recordings, and more. Analysing
unstructured data often requires advanced techniques like natural language processing
and image recognition.
You can find data from various sources, depending on your specific needs:
• Open Data Portals: Many governments and organizations provide free access to a
wide range of data through open data portals. Examples include Data.gov (United
States) and data.gov.uk (United Kingdom).
• Web Scraping: You can extract data from websites using web scraping tools and
libraries like BeautifulSoup and Scrapy. However, be mindful of the website's terms of
use and legal restrictions.
• Surveys and Questionnaires: You can conduct your own surveys or collect data through
questionnaires and interviews.
• IoT Devices: Internet of Things (IoT) devices generate vast amounts of data that can
be used for various purposes.
Creating effective data visualizations requires a strong foundation in several key areas:
• Data Analysis: Before creating visualizations, you should thoroughly analyze your
data to understand its structure, relationships, and any patterns or trends. Exploratory
data analysis (EDA) techniques can help with this.
• Statistical Knowledge: Understanding basic statistics is essential for making
meaningful interpretations of data. Concepts like mean, median, standard deviation, and
correlation are commonly used in data visualization.
• Domain Knowledge: Having knowledge of the specific domain or subject matter
related to your data is crucial for creating contextually relevant visualizations. It helps
you ask the right questions and provide valuable insights.
• Visualization Tools: Familiarize yourself with data visualization tools and libraries
such as matplotlib, Seaborn, ggplot2, D3.js, and Tableau. Each tool has its strengths
and can be used for different types of visualizations.
• Design Principles: Study design principles, including color theory, typography, and
visual hierarchy, to create visually appealing and effective visualizations. Avoid
common pitfalls like misleading visualizations.
• Interactivity: Learn how to add interactive elements to your visualizations to engage
users and allow them to explore the data. This can be achieved using tools like
JavaScript, Python libraries, or dedicated visualization software.
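The statistical concepts named in the list above can be computed directly in R. The following is a minimal sketch that reuses the English and Maths marks from the student table later in this manual; the variable names are illustrative and not part of the original exercises.

```r
# Marks of ten students (from the student marks table in this manual)
english <- c(72, 95, 95, 75, 18, 93, 95, 61, 75, 99)
maths   <- c(80, 78, 90, 52, 52, 89, 71, 85, 70, 82)

eng_mean   <- mean(english)    # arithmetic mean
eng_median <- median(english)  # middle value, less affected by the low mark of 18
eng_sd     <- sd(english)      # sample standard deviation

# Pearson correlation between the two subjects (always between -1 and 1)
eng_maths_cor <- cor(english, maths)

print(c(mean = eng_mean, median = eng_median, sd = eng_sd, cor = eng_maths_cor))
```

The mean and median differ noticeably here because a single low mark pulls the mean down, which is the kind of pattern worth checking before choosing a chart.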
Creating Your First Visualization:
• Select Your Data: Choose a dataset that aligns with your goals and interests. Ensure
that the data is clean and well-structured.
• Define Your Objective: Clearly define what you want to communicate or explore with
your visualization. Are you looking to show trends, comparisons, or distributions?
• Choose the Right Visualization Type: Select a visualization type that suits your data
and objectives. Common types include bar charts, line charts, scatter plots, histograms,
and pie charts.
• Prepare and Transform Data: Preprocess your data as needed. This may involve
aggregating, filtering, or transforming the data to fit the chosen visualization.
• Create the Visualization: Use a suitable tool or library to create your visualization.
Customize it with labels, colors, and other design elements.
• Interactivity (Optional): If appropriate, add interactive features to your visualization
to allow users to interact with the data.
• Test and Iterate: Review your visualization for accuracy and clarity. Seek feedback
from others and make improvements as necessary.
• Publish or Share: Once you are satisfied with your visualization, publish it on a
platform, embed it in a report, or share it with your intended audience.
• Document and Explain: Provide context and explanations for your visualization.
Clearly communicate what the viewer should take away from it.
• Maintain and Update: If the data changes or new insights emerge, update your
visualization accordingly.
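The first few steps above can be sketched in R as follows. This is only an illustration: it assumes the built-in mtcars dataset (shipped with R) as the data source and a bar chart as the chosen visualization type; neither choice comes from this manual.

```r
# 1. Select data: mtcars ships with R in the datasets package
cars <- mtcars

# 2. Objective: compare average fuel economy across cylinder counts
# 3. Chart type: a bar chart suits a comparison between a few categories

# 4. Prepare: aggregate miles per gallon by number of cylinders
avg_mpg <- aggregate(mpg ~ cyl, data = cars, FUN = mean)

# 5. Create and customize the chart with labels, a title and a colour
barplot(avg_mpg$mpg, names.arg = avg_mpg$cyl, col = "steelblue",
        xlab = "Number of cylinders", ylab = "Average miles per gallon",
        main = "Average fuel economy by cylinder count")

print(avg_mpg)
```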
22CS307PC- DATA VISUALIZATION LAB
LAB PROBLEM: Getting started with Tableau Software using Data file formats, connecting your Data to
Tableau, creating basic charts (line, bar charts, Tree maps), Using the Show me panel.
AIM: Create basic charts using Data file format and R graphics packages.
PROGRAMS:
Ex. No. 2(a): Create Line Chart using R Programming
# Three series of article counts, one value per month
v <- c(17, 25, 38, 13, 41)
t <- c(22, 19, 36, 19, 23)
m <- c(25, 14, 16, 34, 29)
# Draw the first series, then overlay the other two on the same axes
plot(v, type = "o", col = "blue", xlab = "Month", ylab = "Articles Written",
     main = "Articles Written Chart")
lines(t, type = "o", col = "red")
lines(m, type = "o", col = "green")
OUTPUT: 2(a)
OUTPUT: 2(b)
OUTPUT: 2(c)
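The programs for outputs 2(b) and 2(c) are not reproduced in this manual. Assuming they correspond to the bar chart and tree map named in the lab problem, a possible sketch using the same three series is shown below. The month labels 1 to 5 and the chart titles are assumptions, and the tree map relies on the separate treemap package, so that part only runs when the package is installed.

```r
# Same three series as in Ex. No. 2(a)
v <- c(17, 25, 38, 13, 41)
t <- c(22, 19, 36, 19, 23)
m <- c(25, 14, 16, 34, 29)

# One row per series, one column per month (labels 1..5 are assumed)
articles <- rbind(v = v, t = t, m = m)
colnames(articles) <- 1:5

# Grouped bar chart: one cluster of three bars per month
barplot(articles, beside = TRUE, col = c("blue", "red", "green"),
        xlab = "Month", ylab = "Articles Written",
        main = "Articles Written (bar chart)",
        legend.text = rownames(articles))

# Tree map of the total per series: rectangle area is proportional to the total
totals <- data.frame(series = rownames(articles), total = rowSums(articles))
if (requireNamespace("treemap", quietly = TRUE)) {
  treemap::treemap(totals, index = "series", vSize = "total",
                   title = "Articles Written (tree map)")
}
print(totals)
```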
LAB PROBLEM: Tableau Calculations, Overview of SUM, AVG, and Aggregate features, Creating
custom calculations and fields.
PROGRAM:
# Read the long-format marks file; strip.white drops the spaces that follow the commas
df <- read.csv('D:/R_PRG/csv/student_data.csv', strip.white = TRUE)
# One aggregate per statistic, grouped by subject
adf1 <- aggregate(df$marks, by=list(df$subject), FUN=sum)
adf2 <- aggregate(df$marks, by=list(df$subject), FUN=mean)
adf3 <- aggregate(df$marks, by=list(df$subject), FUN=min)
adf4 <- aggregate(df$marks, by=list(df$subject), FUN=max)
adf5 <- aggregate(df$marks, by=list(df$subject), FUN=length)
adf6 <- aggregate(df$marks, by=list(df$subject), FUN=sd)
# Combine the statistics into one table with one row per subject
adf <- cbind(adf1, adf2$x, adf3$x, adf4$x, adf5$x, adf6$x)
colnames(adf) <- c('Subject', 'Total', 'Average', 'Min', 'Max', 'count', 'Std. Deviation')
adf
INPUT:
                         Marks
sno  Student Name  English  Maths  Science
1    Bala          72       80     68
2    Damu          95       78     82
3    Gopu          95       90     92
4    John          75       52     95
5    Mary          18       52     86
6    Raju          93       89     27
7    Ram           95       71     90
8    Sita          61       85     88
9    Sudha         75       70     85
10   Syed          99       82     60
student_data.csv
sno, student, subject, marks
1, Bala, English, 72
2, Damu, English, 95
3, Gopu, English, 95
4, John, English, 75
5, Mary, English, 18
6, Raju, English, 93
7, Ram, English, 95
8, Sita, English, 61
9, Sudha, English, 75
10, Syed, English, 99
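The listing above shows only the English rows, while the wide marks table holds all three subjects. One way to derive the full long-format data from the wide table is sketched below in base R; the data frame names wide and long are illustrative, and this is not necessarily how the original CSV file was produced.

```r
# Wide table: one row per student, one column per subject
wide <- data.frame(
  sno     = 1:10,
  student = c("Bala", "Damu", "Gopu", "John", "Mary",
              "Raju", "Ram", "Sita", "Sudha", "Syed"),
  English = c(72, 95, 95, 75, 18, 93, 95, 61, 75, 99),
  Maths   = c(80, 78, 90, 52, 52, 89, 71, 85, 70, 82),
  Science = c(68, 82, 92, 95, 86, 27, 90, 88, 85, 60)
)

# Long table: one row per (student, subject) pair, as in student_data.csv
subjects <- c("English", "Maths", "Science")
long <- data.frame(
  sno     = rep(wide$sno, times = length(subjects)),
  student = rep(wide$student, times = length(subjects)),
  subject = rep(subjects, each = nrow(wide)),
  marks   = unlist(wide[subjects], use.names = FALSE)
)

# Totals per subject, matching the Total column of the OUTPUT table
totals <- aggregate(marks ~ subject, data = long, FUN = sum)
print(totals)
```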
OUTPUT:
Subject Total Average Min Max count Std. Deviation
English 778 77.8 18 99 10 24.66577
Maths 749 74.9 52 90 10 13.75540
Science 773 77.3 27 95 10 20.75813
LAB PROBLEM: Applying new data calculations to your visualizations, Formatting Visualizations,
Formatting Tools and Menus, Formatting specific parts of the view.
AIM: Apply new data calculations and format in visualization using R package.
PROGRAM:
library(plotly)
# strip.white removes the spaces that follow the commas in the CSV
df <- read.csv('D:/R_PRG/csv/sales-data.csv', strip.white = TRUE)
# New calculated field: price including tax
df$Total_Price <- df$Price + df$Tax
# Total sales per item group
agg_df <- aggregate(df$Total_Price, by=list(df$ITEM_GROUP), FUN=sum)
colnames(agg_df) <- c('Items', 'Price')
fig <- plot_ly(type='bar', x=agg_df$Items, y=agg_df$Price, text=agg_df$Price)
# Format the chart title and axis titles (bold text, red axes)
fig <- fig %>% layout(title = '<b>Super Market - Sales Data</b>',
  xaxis = list(title="<b>Grocery Items category</b>", color='red'),
  yaxis = list(title="<b>Total sales (in Rupees)</b>", color='red'))
fig
sales-data.csv
ITEM_GROUP, ITEM_NAME, Price, Tax
Fruit, Apple, 100, 5
Fruit, Banana, 50, 5
Fruit, Orange, 100, 10
Fruit, Mango, 60, 6
Vegetable, Potato, 50, 5
Vegetable, Brinjal, 40, 4
Vegetable, Radish, 40, 4
Dairy, Ghee, 100, 10
Dairy, Curd, 40, 4
Dairy, Milk, 50, 5
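To check the bar heights independently of plotly, the calculated field and the group totals can be reproduced in base R. The sketch below embeds the rows listed above as inline text instead of reading the file, which is an assumption made only so that the example is self-contained.

```r
# The sales data listed above, embedded as text
sales <- read.csv(text = "ITEM_GROUP,ITEM_NAME,Price,Tax
Fruit,Apple,100,5
Fruit,Banana,50,5
Fruit,Orange,100,10
Fruit,Mango,60,6
Vegetable,Potato,50,5
Vegetable,Brinjal,40,4
Vegetable,Radish,40,4
Dairy,Ghee,100,10
Dairy,Curd,40,4
Dairy,Milk,50,5")

# Calculated field: price including tax
sales$Total_Price <- sales$Price + sales$Tax

# Total per item group; groups come back in alphabetical order
group_totals <- aggregate(Total_Price ~ ITEM_GROUP, data = sales, FUN = sum)
print(group_totals)
```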
OUTPUT:
LAB PROBLEM: Editing and Formatting Axes, Manipulating Data in Tableau data, Pivoting Tableau
data.
PROGRAM:
## Data manipulation using the dplyr package
library(dplyr)
# strip.white removes the spaces that follow the commas in the CSV files
d1 <- read.csv('D:/R_PRG/csv/emp.csv', strip.white = TRUE)
d2 <- read.csv('D:/R_PRG/csv/dept.csv', strip.white = TRUE)
## Employees with a salary greater than 10000, sorted by first name
select(d1, EMP_ID, JOB_ID, F_NAME, L_NAME, SALARY) %>%
  filter(SALARY > 10000) %>% arrange(F_NAME) %>%
  rename(DESIGNATION=JOB_ID)
## Join and mutate: convert SALARY from USD to INR and rank within each department
df <- left_join(d1, d2, by="DEPT_ID") %>%
  select(EMP_ID, F_NAME, L_NAME, DEPT_NAME, SALARY) %>%
  group_by(DEPT_NAME) %>%
  mutate(SALARY = SALARY * 83, rank = min_rank(desc(SALARY)))
df
## Summarize SALARY department-wise
df %>% group_by(DEPT_NAME) %>% summarise(sum(SALARY), mean(SALARY))
## Summarize SALARY of all employees
summarise(d1, sum(SALARY), mean(SALARY))
emp.csv
EMP_ID, F_NAME, L_NAME, JOB_ID, SALARY, DEPT_ID
100, Steven, King, PRESIDENT,24000,90
101, Neena, Kochhar, VICE PRESIDENT,17000,90
102, Lex, De Haan, VICE PRESIDENT,17000,90
103, Alexander, Hunold, IT_PROGRAMMER,9000,60
104, Bruce, Ernst, IT_PROGRAMMER,6000,60
105, David, Austin, IT_PROGRAMMER,4800,60
106, Valli, Pataballa, IT_PROGRAMMER,4800,60
107, Diana, Lorentz, IT_PROGRAMMER,4200,60
108, Nancy, Greenberg, FI_MANAGER,12008,100
109, Daniel, Faviet, ACCOUNTANT,9000,100
110, John, Chen, ACCOUNTANT,8200,100
111, Ismael, Sciarra, ACCOUNTANT,7700,100
112, Jose Manuel, Urman, ACCOUNTANT,7800,100
113, Luis, Popp, ACCOUNTANT,6900,100
114, Den, Raphaely, PU_MAN,11000,30
115, Alexander, Khoo, PU_CLERK,3100,30
116, Shelli, Baida, PU_CLERK,2900,30
117, Sigal, Tobias, PU_CLERK,2800,30
118, Guy, Himuro, PU_CLERK,2600,30
119, Karen, Colmenares, PU_CLERK,2500,30
120, Matthew, Weiss, ST_MAN,8000,50
121, Adam, Fripp, ST_MAN,8200,50
122, Payam, Kaufling, ST_MAN,7900,50
123, Shanta, Vollman, ST_MAN,6500,50
dept.csv
DEPT_ID, DEPT_NAME
30, PRODUCTION UNIT
40, HUMAN RESOURCE
50, STORE
60, INFORMATION TECHNOLOGY
90, ADMINISTRATIVE
100, FINANCE
110, ACCOUNTING
OUTPUT:
EMP_ID DESIGNATION F_NAME L_NAME SALARY
114 PU_MAN Den Raphaely 11000
102 VICE PRESIDENT Lex De Haan 17000
108 FI_MANAGER Nancy Greenberg 12008
101 VICE PRESIDENT Neena Kochhar 17000
100 PRESIDENT Steven King 24000
sum(SALARY) mean(SALARY)
216308 6977.677
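The ranking step in the program uses min_rank(desc(SALARY)) within each department, so tied salaries share the same rank. A base R sketch of the same idea on the first five employees is shown below; it swaps dplyr for ave() and rank(), and the column names SALARY_INR and rank are illustrative.

```r
# First five employees from emp.csv (departments 90 and 60)
emp <- data.frame(
  F_NAME  = c("Steven", "Neena", "Lex", "Alexander", "Bruce"),
  DEPT_ID = c(90, 90, 90, 60, 60),
  SALARY  = c(24000, 17000, 17000, 9000, 6000)
)

# Same conversion factor as in the program above
emp$SALARY_INR <- emp$SALARY * 83

# Rank within each department, highest salary first; ties get the smallest rank
emp$rank <- ave(emp$SALARY_INR, emp$DEPT_ID,
                FUN = function(s) rank(-s, ties.method = "min"))
print(emp)
```

Neena and Lex earn the same salary, so both receive rank 2 in department 90.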
LAB PROBLEM: Structuring your data, Sorting and filtering Tableau data, Pivoting Tableau data.
PROGRAM:
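library(tidyverse)
# strip.white removes the space after each comma so that colour == 'green' matches
fd <- read.csv('D:/R_PRG/csv/fruits.csv', strip.white = TRUE)
fd %>% arrange(desc(quantity)) %>%
  filter(colour == 'green') %>%
  mutate(fruits = fct_reorder(fruit, quantity)) %>%   # order the bars by quantity
  ggplot(aes(fruits, quantity, fill = colour)) +
  geom_bar(stat = "identity") +
  # Note: scales::percent multiplies by 100, so a quantity of 15 is labelled as 1500 percent;
  # drop the labels argument to show the raw quantities
  scale_y_continuous("", labels = scales::percent) +
  coord_flip() +
  scale_fill_manual(values = c("orange"="orange", "green"="green",
                               "red"="red", "yellow"="yellow"))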
fruits.csv
fruit, colour, quantity
apples, green,15
apples, red,25
bananas, green,10
bananas, red,40
bananas, yellow,55
oranges, orange,35
mangos, green,25
mangos, yellow,20
grapes, green,60
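The filter and sort steps of the program can be checked on their own in base R. The sketch below embeds the rows listed above as inline text (an assumption made to keep it self-contained) and keeps only the green fruits, largest quantity first.

```r
# The fruits data listed above, embedded as text
fd <- read.csv(text = "fruit,colour,quantity
apples,green,15
apples,red,25
bananas,green,10
bananas,red,40
bananas,yellow,55
oranges,orange,35
mangos,green,25
mangos,yellow,20
grapes,green,60")

# Filter: keep the green rows only
green <- fd[fd$colour == "green", ]

# Sort: largest quantity first
green <- green[order(-green$quantity), ]
print(green)
```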
OUTPUT:
LAB PROBLEM:
Advanced Visualization Tools: Using Filters, Using the Detail panel, using the Size panels, customizing
filters, Using and Customizing tooltips, Formatting your data with colors.
PROGRAM:
library(plotly)
# strip.white removes the spaces that follow the commas in the CSV
basket <- read.csv("D:/R_PRG/csv/sales.csv", strip.white = TRUE)
fig <- plot_ly(
  type = 'scatter',
  x = basket$ITEM_NAME,
  y = basket$Price + basket$Tax,   # price including tax
  mode = 'markers',
  color = ~basket$ITEM_GROUP,
  symbol = ~basket$ITEM_GROUP,     # one marker symbol per item group
  size = 2, alpha = 0.5,
  text = ~basket$ITEM_GROUP,
  # Custom tooltip: %{y}, %{x} and %{text} are filled in for each point
  hovertemplate = paste('<b>Price:</b>Rs.%{y:.2f}', '<br><b>Item:</b>%{x}',
                        '<br><b>Group:</b>%{text}'),
  # Marker style: yellow fill with a red outline
  marker = list(size = 8, color = "yellow", line = list(color = "red", width = 1)),
  # Filter: keep only the points whose y value is greater than 100
  transforms = list(list(type = 'filter', target = 'y', operation = '>', value = 100))
)
fig <- fig %>% layout(title = "<b>Super Market Items > Rs.100</b>",
  xaxis = list(title = "<b>Items</b>", color = 'blue'),
  yaxis = list(title = "<b>Price (in Rupees)</b>", color = 'blue'))
fig
INPUT: sales.csv
ITEM_GROUP, ITEM_NAME, Price, Tax
Fruit, Apple, 200, 10
Fruit, Banana, 80, 4
Fruit, Orange, 100, 5
Fruit, Mango, 60, 3
Fruit, Papaya, 40, 2
Fruit, Lemon, 10, 1
Vegetable, Potato, 20, 1
Vegetable, Brinjal, 20, 1
Vegetable, Radish, 40, 2
Vegetable, Tomato, 40, 2
Vegetable, Onion, 40, 2
Vegetable, Cucumber, 40, 1
Dairy, Butter Milk, 10, 1
Dairy, Ghee, 200, 10
Dairy, Curd, 100, 5
Dairy, Cheese, 100, 5
Dairy, Milk, 60, 3
Dairy, Paneer, 100, 5
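The filter in the program keeps only the points whose y value (Price plus Tax) is greater than 100. To see which items pass that filter without drawing the chart, the same condition can be applied in base R. The sketch below embeds the rows listed above as inline text, which is an assumption made to keep it self-contained; the column name Total is illustrative.

```r
# The sales data listed above, embedded as text
basket <- read.csv(text = "ITEM_GROUP,ITEM_NAME,Price,Tax
Fruit,Apple,200,10
Fruit,Banana,80,4
Fruit,Orange,100,5
Fruit,Mango,60,3
Fruit,Papaya,40,2
Fruit,Lemon,10,1
Vegetable,Potato,20,1
Vegetable,Brinjal,20,1
Vegetable,Radish,40,2
Vegetable,Tomato,40,2
Vegetable,Onion,40,2
Vegetable,Cucumber,40,1
Dairy,Butter Milk,10,1
Dairy,Ghee,200,10
Dairy,Curd,100,5
Dairy,Cheese,100,5
Dairy,Milk,60,3
Dairy,Paneer,100,5")

# Same quantity that the chart plots on the y axis
basket$Total <- basket$Price + basket$Tax

# Same condition as the filter transform: y > 100
above_100 <- subset(basket, Total > 100, select = c(ITEM_GROUP, ITEM_NAME, Total))
print(above_100)
```

No vegetable in the listing exceeds Rs.100 including tax, so only Fruit and Dairy items remain.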
OUTPUT:
RESULT:
The above experiment is successfully executed and output is verified.
LAB PROBLEM:
Creating Dashboards and Storytelling, creating your first dashboard and Story, Design for different displays,
adding interactivity to your Dashboard, Distributing and Publishing your Visualization.
PROGRAM:
library(shiny)
library(shinydashboard)
library(ggplot2)
library(dplyr)
# Dashboard row with two collapsible boxes, each holding one plot output
frow2 <- fluidRow(
  box(title = "Revenue by Sales Rep", status = "primary", solidHeader = TRUE,
      collapsible = TRUE, plotOutput("revenuebyRep", height = "300px")),
  box(title = "Regionwise Sales Data", status = "primary", solidHeader = TRUE,
      collapsible = TRUE, plotOutput("revenuebyRegion", height = "300px")))
INPUT: Sales_Sample.csv
SalesRep, Region, QTR, Sales, Units_Sold
Amy, North, Q1,24971,84
Amy, South, Q2,25749,557
Amy, East, Q3,24437,95
Amy, West, Q4,25355,706
Bob, North, Q1,25320,231
Bob, South, Q2,25999,84
Bob, East, Q3,22639,260
Bob, West, Q4,23949,109
Chuck, North, Q1,20280,453
Chuck, South, Q2,21584,114
Chuck, East, Q3,19625,83
Chuck, West, Q4,19832,70
Doug, North, Q1,25150,242
Doug, South, Q2,29061,146
Doug, East, Q3,27113,120
Doug, West, Q4,25953,81
John, North, Q1,34971,184
John, South, Q2,35749,657
John, East, Q3,34437,295
John, West, Q4,35355,806
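The program above defines only one dashboard row; the user interface and server parts are not reproduced in this manual. The sketch below shows one way the pieces could fit together. The dashboard title, the disabled sidebar, the use of bar charts and the inline data are all assumptions made for illustration, and the app only starts in an interactive session where shiny, shinydashboard and ggplot2 are installed.

```r
# The sales sample listed above, embedded as text
sales <- read.csv(text = "SalesRep,Region,QTR,Sales,Units_Sold
Amy,North,Q1,24971,84
Amy,South,Q2,25749,557
Amy,East,Q3,24437,95
Amy,West,Q4,25355,706
Bob,North,Q1,25320,231
Bob,South,Q2,25999,84
Bob,East,Q3,22639,260
Bob,West,Q4,23949,109
Chuck,North,Q1,20280,453
Chuck,South,Q2,21584,114
Chuck,East,Q3,19625,83
Chuck,West,Q4,19832,70
Doug,North,Q1,25150,242
Doug,South,Q2,29061,146
Doug,East,Q3,27113,120
Doug,West,Q4,25953,81
John,North,Q1,34971,184
John,South,Q2,35749,657
John,East,Q3,34437,295
John,West,Q4,35355,806")

# Data behind the two plots: total sales per rep and per region
rev_by_rep    <- aggregate(Sales ~ SalesRep, data = sales, FUN = sum)
rev_by_region <- aggregate(Sales ~ Region, data = sales, FUN = sum)

have_pkgs <- all(sapply(c("shiny", "shinydashboard", "ggplot2"),
                        requireNamespace, quietly = TRUE))
if (interactive() && have_pkgs) {
  library(shiny); library(shinydashboard); library(ggplot2)
  frow2 <- fluidRow(
    box(title = "Revenue by Sales Rep", status = "primary", solidHeader = TRUE,
        collapsible = TRUE, plotOutput("revenuebyRep", height = "300px")),
    box(title = "Regionwise Sales Data", status = "primary", solidHeader = TRUE,
        collapsible = TRUE, plotOutput("revenuebyRegion", height = "300px")))
  ui <- dashboardPage(dashboardHeader(title = "Sales Dashboard"),
                      dashboardSidebar(disable = TRUE),
                      dashboardBody(frow2))
  server <- function(input, output) {
    # Output ids must match the plotOutput ids used in frow2
    output$revenuebyRep <- renderPlot(
      ggplot(rev_by_rep, aes(SalesRep, Sales)) + geom_col())
    output$revenuebyRegion <- renderPlot(
      ggplot(rev_by_region, aes(Region, Sales)) + geom_col())
  }
  runApp(shinyApp(ui, server))
}
```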
OUTPUT:
LAB PROBLEM:
Tableau file types, publishing to Tableau Online, Sharing your visualizations, printing, and Exporting.
PROCEDURE:
R-files
⁕ .R - R Script file
⁕ .Rproj - R Project
⁕ .RData - R Data file
Data Files
⁕ .csv – CSV (comma-separated values) file
⁕ .txt – Text file (tab-separated / tab-delimited data)
⁕ .dta – Stata (syllabic abbreviation of the words statistics and data) data file
⁕ .sav – SPSS (Statistical Package for the Social Sciences) data file
⁕ .xlsx – Microsoft Excel format
GitHub:
GitHub is a web-based platform that allows you to store, manage, and version control your code and files.
It is widely used by developers, researchers, and data analysts to collaborate on projects, track changes, and
host websites.
RStudio Cloud:
RStudio Cloud (now Posit Cloud) is a web-based platform that allows you to create, run, and share your R
projects online. It is similar to the RStudio IDE, but you don't need to install anything on your computer.
You can access your
projects from any browser and any device. RStudio Cloud also lets you collaborate with others in real time,
share your code and data, and publish your results as websites or apps.
If you are using RStudio, you can export a plot from the Export menu of the Plots pane.
The menu offers three options: save the plot as an image, save it as a PDF, or copy it to the
clipboard.
Save as image:
If you select Save as Image..., a dialog opens in which you can select the image format
(PNG, JPEG, TIFF, BMP, Metafile, SVG, EPS), the width and height in pixels,
the directory where the file will be saved, and the file name.
Save as PDF:
If you select Save as PDF..., you can select the PDF size, the orientation, whether to use the cairo
PDF device, the directory, and the file name.
Copy to clipboard:
The last option copies the image to the clipboard as a Bitmap or Metafile. You can also
specify the width and the height in pixels.
In the R GUI, go to File → Save as and select the type of file you prefer. If you select JPEG,
you can also specify the quality of the resulting image. The last option is copying the image to the
Clipboard.
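The same exports can also be scripted instead of using the menus. The sketch below is one possible approach using base R graphics devices and file writers; the file names and the use of a temporary directory are assumptions made for illustration.

```r
# Write into a temporary directory so that the example leaves no files behind
out_dir <- tempdir()

# Small data frame to export (article counts from Ex. No. 2(a))
articles <- data.frame(month = 1:5, written = c(17, 25, 38, 13, 41))

# Save a plot as PDF: open the device, draw, then close the device
pdf_file <- file.path(out_dir, "articles.pdf")
pdf(pdf_file, width = 7, height = 5)   # size in inches
plot(articles$month, articles$written, type = "o", col = "blue",
     xlab = "Month", ylab = "Articles Written")
invisible(dev.off())
# png(), jpeg(), tiff(), bmp() and svg() are used in the same open/draw/close way

# Export the data as a .csv file and as an .RData file
csv_file <- file.path(out_dir, "articles.csv")
write.csv(articles, csv_file, row.names = FALSE)
rdata_file <- file.path(out_dir, "articles.RData")
save(articles, file = rdata_file)

print(file.exists(c(pdf_file, csv_file, rdata_file)))
```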
LAB PROBLEM: Creating custom charts, cyclical data and circular area charts, Dual Axis charts.
AIM: To create circular area and dual axis charts for cyclical data.
INPUT: temp.csv
Year, Month, Temperature
2018,JAN,4.1
2018,FEB,2.1
2018,MAR,3.8
2018,APR,8.6
2018,MAY,12.2
2018,JUN,14.6
2018,JUL,17.8
2018,AUG,16.2
2018,SEP,13
2018,OCT,10
2018,NOV,7.4
2018,DEC,5.4
2019,JAN,3.4
2019,FEB,6
2019,MAR,7
2019,APR,7.8
2019,MAY,10.3
2019,JUN,13.4
2019,JUL,17
2019,AUG,16.5
2019,SEP,13.4
2019,OCT,8.8
2019,NOV,5.6
2019,DEC,4.9
2020,JAN,5.7
2020,FEB,5.2
2020,MAR,5.8
2020,APR,9.1
2020,MAY,11.7
2020,JUN,14
2020,JUL,14.8
2020,AUG,16
2020,SEP,13
2020,OCT,9.5
2020,NOV,7.7
2020,DEC,4
2021,JAN,1.9
2021,FEB,4
2021,MAR,6.9
2021,APR,5.5
2021,MAY,9.2
2021,JUN,14.6
2021,JUL,16.9
2021,AUG,15.4
2021,SEP,15.3
2021,OCT,11.1
2021,NOV,7
2021,DEC,4.8
2022,JAN,4.3
2022,FEB,5.7
2022,MAR,6.9
2022,APR,8.2
2022,MAY,12.2
2022,JUN,14.4
2022,JUL,17.6
2022,AUG,17.2
2022,SEP,13.7
2022,OCT,11.7
2022,NOV,8
2022,DEC,2.9
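The program for the circular area chart is not reproduced in this manual. The sketch below shows one possible approach in base R graphics: each month is placed at an angle around a circle and the temperature is used as the distance from the centre. It uses only the 2018 and 2022 rows from temp.csv to stay short; the helper to_xy, the colours and the title are assumptions.

```r
months <- c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
            "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")
# 2018 and 2022 rows of temp.csv
temp_2018 <- c(4.1, 2.1, 3.8, 8.6, 12.2, 14.6, 17.8, 16.2, 13, 10, 7.4, 5.4)
temp_2022 <- c(4.3, 5.7, 6.9, 8.2, 12.2, 14.4, 17.6, 17.2, 13.7, 11.7, 8, 2.9)

# Angle of each month: JAN at 12 o'clock, then clockwise
theta <- 2 * pi * (seq_along(months) - 1) / 12

# Convert a series of radii to x/y coordinates (hypothetical helper)
to_xy <- function(r) cbind(x = r * sin(theta), y = r * cos(theta))

# Empty square canvas, large enough for the biggest value plus the labels
lim <- 1.2 * max(temp_2018, temp_2022)
plot(0, 0, type = "n", asp = 1, axes = FALSE, xlab = "", ylab = "",
     xlim = c(-lim, lim), ylim = c(-lim, lim),
     main = "Monthly temperature, 2018 vs 2022")

# One semi-transparent filled polygon per year
polygon(to_xy(temp_2018), col = adjustcolor("steelblue", alpha.f = 0.4),
        border = "steelblue")
polygon(to_xy(temp_2022), col = adjustcolor("tomato", alpha.f = 0.4),
        border = "tomato")

# Month labels around the outside
text(to_xy(rep(lim, 12)), labels = months, cex = 0.8)
legend("topright", legend = c("2018", "2022"),
       fill = c("steelblue", "tomato"), bty = "n")
```

Because the data is cyclical, December ends next to January on the circle, which a straight line chart cannot show.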
OUTPUT:
INPUT: rain_temp.csv
year, rainfall, temperature
2010,638.3,28.16
2011,561.7,29.85
2012,943,28.81
2013,605.5,28.85
2014,731.2,30.01
2015,715.5,29.38
2016,678.2,29.38
2017,754.5,29.71
2018,621.6,29.64
2019,805.9,29.54
2020,733.8,29.73
2021,899.8,29.4
2022,610.2,30.24
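The program for the dual axis chart is likewise not reproduced here. A minimal base R sketch is given below: rainfall is drawn against the left axis, then a second plot is overlaid with par(new = TRUE) and temperature gets its own axis on the right. The data is embedded as inline text, and the colours, title and axis labels are assumptions (the manual does not state the units).

```r
# The rain_temp.csv rows listed above, embedded as text
rt <- read.csv(text = "year,rainfall,temperature
2010,638.3,28.16
2011,561.7,29.85
2012,943,28.81
2013,605.5,28.85
2014,731.2,30.01
2015,715.5,29.38
2016,678.2,29.38
2017,754.5,29.71
2018,621.6,29.64
2019,805.9,29.54
2020,733.8,29.73
2021,899.8,29.4
2022,610.2,30.24")

# Extra right margin leaves room for the second axis
op <- par(mar = c(5, 4, 4, 5))

# First series on the left axis
plot(rt$year, rt$rainfall, type = "b", col = "blue",
     xlab = "Year", ylab = "Rainfall",
     main = "Rainfall and temperature by year")

# Overlay the second series on the same x range without redrawing the axes
par(new = TRUE)
plot(rt$year, rt$temperature, type = "b", col = "red",
     axes = FALSE, xlab = "", ylab = "")

# Second axis on the right-hand side for temperature
axis(side = 4, col.axis = "red")
mtext("Temperature", side = 4, line = 3, col = "red")
legend("topleft", legend = c("Rainfall", "Temperature"),
       col = c("blue", "red"), lty = 1, pch = 1, bty = "n")

par(op)   # restore the previous margins
```

The two series have very different ranges, which is why each needs its own vertical scale.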
OUTPUT:
RESULT: The above experiments are successfully executed and outputs are verified.