03/06/2015

Data visualization guide for SAS

1. Doing Comparison
a) Bar Chart
A bar chart, also known as a bar graph, represents grouped data using rectangular bars whose lengths are
proportional to the values they represent. The bars can be plotted vertically or horizontally; a vertical
bar chart is sometimes called a column chart.

Illustration
Objective: We want to know the total number of views in each category, represented graphically as a bar
chart.

Code:
proc sgplot data=discuss;
  hbar category / response=views stat=sum
       datalabel datalabelattrs=(weight=bold);
  title 'Total Views by Category';
run;

Output:

http://www.analyticsvidhya.com/blog/2015/06/datavisualizationguidesas/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+Analyt

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/3.png)

b) Column Chart
A column chart is simply the vertical version of a bar chart: the length of each bar equals the magnitude
of the value it represents. In other words, if you turn the bar chart shown above on its side so that the bars
point upward, you get a column chart.

Code:

proc sgplot data=discuss;
  vbar category / response=views stat=sum
       datalabel datalabelattrs=(weight=bold)
       barwidth=0.5;   /* assign width to bars */
  title 'Total Views by Category';
run;

Output:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/4.png)

> Code explanation for Bar Chart and Column Chart:

- category: the variable by which the data is grouped (one bar per value).
- response=views: the statistic named in the stat= option (here, the sum) is calculated for the variable views within each value of category.
- datalabel: displays the calculated value on each bar.
- datalabelattrs=(weight=bold): displays the data labels in bold.
- barwidth=: sets the width of the bars as a ratio of the maximum possible width. The default is 0.8 and the valid range is 0.1 to 1.
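The discuss dataset is not distributed with this guide. Below is a minimal sketch that reproduces both charts on sashelp.cars, a sample table that ships with SAS. Vehicle Type and MSRP (averaged with stat=mean) are our own illustrative choices and are not the author's data. The only difference between the bar chart and the column chart is HBAR versus VBAR.

```sas
/* Bar chart: horizontal bars, one per vehicle type.            */
/* sashelp.cars is a stand-in for the discuss data (assumption). */
proc sgplot data=sashelp.cars;
  hbar type / response=msrp stat=mean
       datalabel datalabelattrs=(weight=bold);
  title 'Average MSRP by Vehicle Type';
run;

/* Column chart: same statistic, vertical bars, narrower than the 0.8 default */
proc sgplot data=sashelp.cars;
  vbar type / response=msrp stat=mean
       datalabel datalabelattrs=(weight=bold)
       barwidth=0.5;
  title 'Average MSRP by Vehicle Type';
run;
title;  /* clear the title so it does not carry over to later graphs */
```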

c) Clustered Bar Chart / Column Chart


This type of chart is useful when we want to compare a measure across the values of two categorical
variables at once.
Objective: We want to analyze the total views of topics in the discussion forum by category and by the month
in which they were posted.

Code:
data discuss_date;
  set discuss;
  month = month(DatePosted);                /* month number, 1-12 */
  month_name = put(DatePosted, monname.);   /* month name, e.g. January */
  put month_name= @;                        /* optional: echo month names to the log */
run;

proc sgplot data=discuss_date;
  vbar category / response=views group=month_name groupdisplay=cluster
       datalabel datalabelattrs=(weight=bold) dataskin=gloss;
  yaxis grid;
run;

Output:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/5.png)
However, there is a problem with this chart: the months are not in chronological order. To solve this, we
group by the numeric month instead and use PROC FORMAT to display each month number as its name.

Code with PROC FORMAT:

data discuss_date;
  set discuss;
  month_num = month(DatePosted);   /* numeric month (1-12), so it sorts chronologically */
run;

proc format;
  value monthfmt
    1 = 'January'
    2 = 'February'
    3 = 'March'
    4 = 'April';
run;

proc sgplot data=discuss_date;
  vbar category / response=views group=month_num groupdisplay=cluster
       datalabel datalabelattrs=(weight=bold) dataskin=gloss
       grouporder=ascending;
  format month_num monthfmt.;   /* show month names instead of numbers */
  yaxis grid;
run;

Output:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/image.png)

d) Line Chart
A line chart or line graph is a type of chart that displays information as a series of data points, called
markers, connected by straight-line segments. A line chart is often used to visualize trends in data over
intervals of time, i.e. a time series (http://www.analyticsvidhya.com/blog/2015/02/step-step-guide-learntime-series/), so the line is usually drawn in chronological order. In these cases, line charts are known as run charts
(http://en.wikipedia.org/wiki/Run_chart).
For this illustration, we use data from the Analytics Vidhya Discuss thread "PGDBA from IIT + IIM C + ISI vs. Praxis Business School
PGPBA" (http://discuss.analyticsvidhya.com/t/pgdba-from-iit-iim-c-isi-vs-praxis-business-schoolpgpba/526/14).

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/6.png)

Code:
proc sgplot data=clicks;
  vline date / response=PGDBA_IIM_;
  vline date / response=PGPBA_Praxis_;
  yaxis label="Clicks";
run;

Output:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/7.png)
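VLINE treats the x variable as a set of categories and summarizes the response within each one. For a long time series, a SERIES plot on a continuous date axis is often a better fit. The sketch below uses the sashelp.stocks sample table (Stock, Date and Close columns), which ships with SAS. It shows an alternative technique and is not the code behind the chart above.

```sas
/* Sort a copy so each stock's points are connected in date order */
proc sort data=sashelp.stocks out=stocks;
  by stock date;
run;

proc sgplot data=stocks;
  /* SERIES uses a continuous time axis; GROUP= draws one line per stock */
  series x=date y=close / group=stock;
  yaxis label="Closing price";
  title 'Closing Price by Stock';
run;
title;
```

SERIES connects points in the order they appear in the data, which is why the sort comes first.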

e) Bar-Line Chart
This combination chart combines the features of the bar chart and the line chart. It displays the data using a
number of bars and/or lines, each of which represents a particular category. Combining bars and lines
in the same visualization is useful when comparing values across different categories, such as actual versus planned figures.
Objective: We want to compare the projected sales with the actual sales for different time periods.

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/8.png)

Code:
proc sgplot data=barline;
  vbar month / response=actual_sales datalabel datalabelattrs=(weight=bold)
       fillattrs=(color=tan);
  vline month / response=predicted_sales
        lineattrs=(thickness=3) markers;
  xaxis label="Month";
  yaxis label="Sales";
  keylegend / location=inside position=topleft across=1;
run;

Note: The data needs to be sorted by the x-axis variable.

Output:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/9.png)
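The barline dataset is not provided. Below is a minimal, reproducible sketch of the same bar-line chart using sashelp.prdsale, whose ACTUAL and PREDICT columns hold actual and predicted sales.

Two choices here are our own assumptions:
- restricting the data to 1994;
- summing by month with PROC MEANS.

PROC MEANS writes the months in chronological order. DISCRETEORDER=DATA (assumed available, SAS 9.3 or later) keeps that order on the axis, which is what the sorting note above is about.

```sas
/* Total actual and predicted sales per month for 1994 (illustrative subset) */
proc means data=sashelp.prdsale(where=(year=1994)) noprint nway;
  class month;
  var actual predict;
  output out=monthly sum=actual_sales predicted_sales;
run;

proc sgplot data=monthly;
  vbar month / response=actual_sales fillattrs=(color=tan);
  vline month / response=predicted_sales lineattrs=(thickness=3) markers;
  xaxis label="Month" discreteorder=data;   /* keep the sorted month order */
  yaxis label="Sales";
  keylegend / location=inside position=topleft across=1;
run;
```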

2. Studying Relationship
a) Bubble Chart
A bubble chart is a type of chart that displays three dimensions of data. Each entity, with its triplet (v1, v2, v3)
of associated data, is plotted as a disk that expresses two of the values through the disk's x-y location and
the third through its size. Source: Wikipedia.

Data for OS:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/11.png)

Code:
proc sgplot data=os;
  bubble x=expenses y=sales size=profit
         / fillattrs=(color=teal) datalabel=profit;
run;

Output:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/12.png)
As we can see, one record has the highest Sales and Profit, even though its expenses are lower than those
of some other data points.
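The os data is only shown as an image, so here is a sketch of the same kind of chart on sashelp.class (Name, Sex, Age, Height and Weight), which ships with SAS. Height and Weight set the x and y positions and Age sets the bubble size. Grouping by Sex and labeling by Name are illustrative additions, not part of the original example.

```sas
/* sashelp.class is a stand-in for the os data (assumption) */
proc sgplot data=sashelp.class;
  /* two variables give the position; the third (Age) gives the bubble size */
  bubble x=height y=weight size=age / group=sex datalabel=name;
  title 'Height vs. Weight, Bubble Size = Age';
run;
title;
```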

b) Scatter Plot for Relationship


A simple scatter plot of two variables can give us an idea of the relationship between them (linear,
exponential, etc.). This information can be helpful during further analysis.

Code:

proc sgplot data=os;
  title 'Relationship of Profit with Sales';
  scatter x=sales y=profit /
          markerattrs=(symbol=circlefilled size=15);
run;

Output:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/16.png)
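To judge whether a relationship looks linear or curved, it can help to overlay fitted curves on the scatter plot. The sketch below uses sashelp.class as a stand-in for the os data and adds two fits:
- a straight-line REG fit;
- a LOESS smoother.

If the two curves diverge noticeably, a simple linear model may not be appropriate. This is an illustration only and is not part of the original analysis.

```sas
/* sashelp.class is a stand-in for the os data (assumption) */
proc sgplot data=sashelp.class;
  scatter x=height y=weight / markerattrs=(symbol=circlefilled size=10);
  reg   x=height y=weight / nomarkers;   /* straight-line (least squares) fit */
  loess x=height y=weight / nomarkers;   /* smooth fit that follows curvature */
  title 'Weight vs. Height with Linear and LOESS Fits';
run;
title;
```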

3. Studying Distribution
a) Histogram
A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the
probability distribution of a continuous variable. To construct a histogram, the first step is to bin the range
of values (that is, divide the entire range of values into a series of small intervals) and then count how many
values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a
variable. The bins (intervals) must be adjacent and are usually of equal size. The rectangles of a histogram are
drawn so that they touch each other to indicate that the original variable is continuous.

Code:
proc sgplot data=sashelp.cars;
  histogram msrp / fillattrs=(color=steel) scale=proportion;
  density msrp;
run;

Output:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/10.png)
We have used the sashelp.cars dataset here. A histogram of the MSRP variable gives us the figure above.
It tells us that MSRP is skewed to the right: most of the data points are below $50,000, with a long tail of
more expensive cars. These kinds of simple but meaningful insights can be found from histograms.
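To make the binning step described above concrete, the sketch below does it by hand:
- it places each MSRP value in a $10,000-wide interval;
- it counts the cars in each interval.

Those counts are what a histogram's bars show with SCALE=COUNT. The $10,000 width is an arbitrary choice for illustration. The final step asks PROC SGPLOT for the same bin width. SGPLOT chooses its own bin edges, so its counts need not match the table exactly.

```sas
/* Assign each car to a $10,000-wide MSRP bin, labeled by the bin's lower edge */
data msrp_bins;
  set sashelp.cars;
  bin_start = floor(msrp / 10000) * 10000;
  format bin_start dollar8.;
run;

/* Count how many cars fall into each bin: the bar heights of a count histogram */
proc freq data=msrp_bins;
  tables bin_start / nocum;
run;

/* Same idea done by SGPLOT, with an explicit bin width */
proc sgplot data=sashelp.cars;
  histogram msrp / binwidth=10000 scale=count;
run;
```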

b) Scatter Plot
In a scatter plot, the data is displayed as a collection of points, each having the value of one variable
determining the position on the horizontal axis and the value of the other variable determining the position
on the vertical axis. It can be used both to see the distribution of data and to assess the relationship
between variables.
Note: For illustration, we will use a dataset, discuss, taken from Analytics Vidhya Discuss
(http://discuss.analyticsvidhya.com/)

Code:
proc sgplot data=discuss;
  scatter x=dateposted y=views / group=category
          markerattrs=(symbol=circlefilled size=15);
run;

Output:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/13.png)
The SGSCATTER procedure can also be used for scatter plots. Its advantage is that it can produce
multiple scatter plots in a single graph. Below are the code and output using PROC SGSCATTER:

Code:
proc sgscatter data=discuss;
  compare y=views x=(replies category)
          / group=month markerattrs=(symbol=circlefilled size=10);
run;

Output:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/14.png)
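Besides COMPARE, PROC SGSCATTER has a MATRIX statement that draws a scatter plot for every pair of the listed variables. The sketch below uses sashelp.cars, and the choice of variables is ours, for illustration only.

```sas
/* sashelp.cars and these four variables are illustrative choices */
proc sgscatter data=sashelp.cars;
  /* one scatter plot per pair of variables, histograms on the diagonal */
  matrix msrp horsepower weight mpg_city / diagonal=(histogram);
  title 'Pairwise Relationships in sashelp.cars';
run;
title;
```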

An important use of scatter plots is interpreting the residuals of a linear regression. A scatter plot of the
residuals against the predicted values of the dependent variable helps us determine whether the errors are
homoskedastic (roughly constant spread) or heteroskedastic (spread that changes with the predicted value).
Figure panels: Homoskedastic, Heteroskedastic

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/15.png)
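Here is a sketch of how such a residual plot can be produced in SAS. It assumes a simple regression of MSRP on Horsepower in sashelp.cars; this model is chosen only for illustration.
1. PROC REG writes predicted values and residuals to a dataset through its OUTPUT statement.
2. PROC SGPLOT plots the residuals against the predicted values, with a reference line at zero.

A band of roughly constant width suggests homoskedastic errors. A funnel that widens or narrows suggests heteroskedasticity.

```sas
/* Illustrative model: MSRP regressed on Horsepower (assumption) */
proc reg data=sashelp.cars plots=none;
  model msrp = horsepower;
  output out=reg_out predicted=pred_msrp residual=resid_msrp;
run;
quit;

/* Residuals vs. predicted values, with a dashed reference line at zero */
proc sgplot data=reg_out;
  scatter x=pred_msrp y=resid_msrp / markerattrs=(symbol=circlefilled size=6);
  refline 0 / axis=y lineattrs=(pattern=dash);
  xaxis label="Predicted MSRP";
  yaxis label="Residual";
run;
```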

4. Composition
a) Stacked Column Chart:
In a stacked bar chart, bars representing different groups are stacked on top of each other, and the height of the
resulting bar shows the combined result of the groups.
For example, to see each item's sales, broken down by location, as a share of the total sales in the OS
dataset (stat=percent), we can use a stacked column chart. Below is the illustration:

Code:

proc sgplot data=os;
  title 'Actual Sales by Location and Item';
  /* grouped bars are stacked by default (GROUPDISPLAY=STACK) */
  vbar Item / response=Sales group=Location stat=percent datalabel;
  xaxis display=(nolabel);
  yaxis grid label='Sales';
run;

Output:

(http://www.analyticsvidhya.com/wp-content/uploads/2015/06/17.png)
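The OS dataset is only shown as an image. Below is a reproducible sketch on sashelp.prdsale (columns PRODUCT, COUNTRY and ACTUAL), which ships with SAS. It differs from the chart above in two ways:
- it uses stat=sum, so bar heights are totals rather than percentages;
- it sets groupdisplay=stack explicitly, although stacking is already the default for grouped bars.

```sas
/* sashelp.prdsale is a stand-in for the OS data (assumption) */
proc sgplot data=sashelp.prdsale;
  title 'Actual Sales by Product and Country';
  vbar product / response=actual group=country groupdisplay=stack stat=sum;
  xaxis display=(nolabel);
  yaxis grid label='Actual sales';
run;
title;
```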
