Data Visualization
1. Numerical Data:
Visual Encodings: Numerical data can be encoded using visual properties
such as:
Position: For example, in a bar chart, the height or position of bars
represents numerical values.
Length: The length of lines or bars can also encode numerical values.
Size: The size of visual elements like circles or bubbles can represent
numerical magnitudes.
Color intensity: Gradations in color intensity can indicate numerical
differences.
Examples: Bar charts, line graphs, scatter plots, bubble charts.
2. Categorical Data:
Visual Encodings: Categorical data can be encoded using visual properties
such as:
Color: Different colors or color categories can represent different
groups or categories.
Shape: Different shapes can be used to differentiate between
categories.
Position (for a small number of categories): When there are only a
few categories, position along a common scale can be used.
Examples: Pie charts, stacked bar charts, stacked area charts, treemaps.
3. Temporal Data:
Visual Encodings: Temporal data can be encoded using visual properties such
as:
Position: Time can be represented on the x-axis of a chart, with data
points plotted at corresponding time intervals.
Length or size: The duration or magnitude of events can be represented
by the length of bars or the size of visual elements.
Color: Color can be used to highlight specific time periods or trends.
Examples: Time series line charts, Gantt charts, calendar heatmaps.
4. Spatial Data:
Visual Encodings: Spatial data can be encoded using visual properties such
as:
Position: Geographic locations can be plotted on a map using latitude
and longitude coordinates.
Color: Different colors or shades can represent different values or
categories within geographic regions.
Size: The size of visual elements like bubbles or markers can represent
quantitative data within specific geographic areas.
Examples: Choropleth maps, dot density maps, proportional symbol maps.
5. Multivariate Data:
Visual Encodings: When dealing with multiple variables, combinations of
visual encodings can be used to represent different dimensions of the data. For
example:
Position and color: Using both position and color to encode different
variables in a scatter plot.
Size and shape: Using both size and shape to encode different
categories or variables in a bubble chart.
Examples: Parallel coordinates plots, radar charts, heatmaps.
By understanding the characteristics of different data types and selecting appropriate visual
encodings, data visualizers can effectively communicate insights and patterns within datasets,
making information more accessible and understandable to a wider audience.
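The length encoding described above can be sketched in a few lines of plain Python. This is only an illustration: the function names linear_scale and text_bar_chart and the sales figures are hypothetical, and a real charting tool would handle scaling internally.

```python
def linear_scale(value, domain_min, domain_max, range_min, range_max):
    """Map a value from the data domain onto a visual range."""
    if domain_max == domain_min:
        return range_min
    fraction = (value - domain_min) / (domain_max - domain_min)
    return range_min + fraction * (range_max - range_min)


def text_bar_chart(data, max_width=20):
    """Encode each numeric value as the length of a row of '#' marks."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        # Bars start at zero so that length stays proportional to value.
        width = round(linear_scale(value, 0, largest, 0, max_width))
        lines.append(f"{label:<6}|{'#' * width} {value}")
    return lines


# Hypothetical sales figures, invented for illustration only.
sales = {"North": 40, "South": 25, "East": 10}
for line in text_bar_chart(sales):
    print(line)
```

The same linear_scale function could map a value onto a color intensity or a marker size instead of a length. Only the output range changes.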
VISUALIZING TIME
Visualizing time data effectively is crucial for understanding trends,
patterns, and relationships over time. Here are some common techniques
for visualizing time:
1. Time Series Line Chart: This is one of the most common and
effective ways to visualize temporal data. Time is typically
represented on the x-axis, while the y-axis shows the values
corresponding to each time point. Line charts are useful for showing
trends and patterns over time, such as stock prices, temperature
variations, or website traffic over days, months, or years.
3. Area Chart: Area charts are similar to line charts but with the area
below the line filled with color. They are effective for showing
cumulative data or comparing the proportions of different categories
over time.
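As a rough illustration of the cumulative data an area chart often shows, the sketch below turns raw monthly values into a running total. The month labels and sign-up counts are invented for the example.

```python
from itertools import accumulate

# Hypothetical monthly sign-up counts, invented for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
signups = [120, 90, 150, 60]

# A cumulative area chart plots the running total rather than the raw values.
running_total = list(accumulate(signups))

for month, raw, total in zip(months, signups, running_total):
    print(f"{month}: raw={raw:>4} cumulative={total:>4}")
```

A line chart would plot the raw series, while a cumulative area chart would plot running_total with the region under the line filled.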
The Grammar of Graphics defines a set of basic elements and rules for
mapping data to visual properties. Here are the key components:
WILKINSON'S GRAMMAR
Leland Wilkinson's Grammar of Graphics is a foundational concept in
data visualization. It provides a framework for constructing
visualizations by breaking them down into fundamental components.
Here's an overview:
WICKHAM'S GRAMMAR
Hadley Wickham's Grammar of Graphics, which builds on Wilkinson's work,
is a framework for creating data visualizations, primarily implemented in
the R programming language through the ggplot2 package. Here's an
overview of Wickham's Grammar of Graphics:
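One way to picture the idea of building a plot from components is to write a plot specification as a small data structure. The sketch below is a hypothetical Python illustration, not the ggplot2 API. The class names PlotSpec and Layer and the function resolve are invented, and the component names (data, aesthetics, geom, stat, scales, coord, facet) are assumed from commonly used ggplot2 terminology.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Layer:
    geom: str                 # geometric object, e.g. "point" or "line"
    aesthetics: dict          # maps a visual property to a data column
    stat: str = "identity"    # statistical transformation applied first


@dataclass
class PlotSpec:
    data: list                                  # rows as dictionaries
    layers: list = field(default_factory=list)
    scales: dict = field(default_factory=dict)
    coord: str = "cartesian"
    facet: Optional[str] = None

    def add(self, layer):
        """Append a layer and return the spec so calls can be chained."""
        self.layers.append(layer)
        return self


def resolve(spec, layer):
    """Apply a layer's aesthetic mappings to every data row."""
    return [
        {aes: row[column] for aes, column in layer.aesthetics.items()}
        for row in spec.data
    ]


# Hypothetical rows, invented for illustration only.
rows = [
    {"year": 2020, "sales": 10, "region": "North"},
    {"year": 2021, "sales": 14, "region": "South"},
]
spec = PlotSpec(data=rows).add(
    Layer(geom="point", aesthetics={"x": "year", "y": "sales", "color": "region"})
)
print(resolve(spec, spec.layers[0]))
```

The point of the sketch is that the same data can be redrawn by changing only the geom or the aesthetic mappings, without touching the data itself.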
By carefully selecting and manipulating these aesthetic attributes, data visualizers can create
informative and visually appealing visualizations that effectively convey insights from data to
viewers.
UNIT 2
INTRODUCTION TO DATA VISUALIZATION
WITH POWER BI
Power BI is a powerful business analytics tool developed by Microsoft that
allows users to visualize and analyze data from various sources in
interactive and insightful ways. Here's an introductory overview of data
visualization with Power BI:
2. Data Modeling:
After connecting to data sources, users can model and transform
the data within Power BI using the Power Query Editor.
Data modeling involves tasks such as cleaning, transforming, and
shaping the data to prepare it for analysis and visualization.
Power BI's data modeling capabilities allow users to create
relationships between different datasets, define calculated columns
and measures, and apply data modeling best practices.
3. Visualization Design:
Power BI offers a wide range of visualization types, including bar
charts, line charts, pie charts, scatter plots, maps, tables, matrices,
and more.
Users can drag and drop fields from their dataset onto the canvas to
create visualizations quickly.
Power BI provides extensive formatting options to customize the
appearance of visualizations, including colors, fonts, labels, axes,
and legends.
4. Interactive Dashboards:
Users can combine multiple visualizations into interactive
dashboards, allowing stakeholders to explore and interact with the
data dynamically.
Power BI dashboards support drill-down, cross-filtering, slicing, and
other interactive features to facilitate data exploration and analysis.
Dashboards can be shared securely with colleagues and
stakeholders within an organization or embedded into other
applications and websites.
7. Mobile Experience:
Power BI offers a mobile app for iOS, Android, and Windows devices,
allowing users to access and interact with their reports and
dashboards on the go.
The mobile app provides a responsive and touch-friendly
experience, enabling users to stay connected to their data anytime,
anywhere.
3. Visualization Capabilities:
Power BI: Power BI provides a wide range of visualization types, including basic
charts, maps, tables, matrices, and custom visuals created by the community. While it
offers robust visualization capabilities, some users find it less flexible compared to
Tableau for complex visualizations.
Tableau: Tableau is renowned for its extensive visualization options and flexibility. It
offers a rich library of built-in visualizations, advanced charting options, and the
ability to create highly customized and interactive dashboards.
6. Pricing:
Power BI: Power BI offers a range of pricing options, including a free version with
limited features and paid plans with additional capabilities. Pricing is based on per-
user licensing, with options for Power BI Pro and Power BI Premium.
Tableau: Tableau offers various pricing tiers, including a free public version and paid
plans for Tableau Desktop, Tableau Server, and Tableau Online. Pricing is based on a
subscription model, with options for individual users, teams, and organizations.
In summary, both Power BI and Tableau are powerful data visualization tools with their
strengths and advantages. The choice between Power BI and Tableau often depends on
factors such as budget, existing technology infrastructure, user preferences, and specific
requirements for data analysis and visualization within an organization.
TYPES OF GRAPHS
1. Bar Chart: A bar chart is used to compare values across categories. It
consists of rectangular bars whose lengths are proportional to the values they
represent.
2. Column Chart: Similar to a bar chart, a column chart represents data using
vertical bars. It's often used to compare data across different categories or
time periods.
3. Line Chart: A line chart is used to show trends over time or to represent
continuous data. It's especially useful for visualizing time series data.
4. Area Chart: An area chart is similar to a line chart but with the area below
the line filled with color. It's commonly used to show cumulative data or to
represent proportions over time.
5. Scatter Plot: A scatter plot is used to display the relationship between two
continuous variables. Each data point is represented by a marker, and the
position of the marker on the chart corresponds to the values of the variables.
6. Pie Chart: A pie chart is used to show the proportion of each category in a
dataset. It's a circular chart divided into slices, with each slice representing a
different category and its size proportional to the value it represents.
7. Donut Chart: Similar to a pie chart, a donut chart also represents proportions
of a whole but with a hole in the center. It's often used to emphasize the total
value while still showing individual categories.
8. Tree Map: A tree map visualizes hierarchical data using nested rectangles.
The size and color of each rectangle represent different measures, making it
easy to compare values within categories and subcategories.
9. Gauge: A gauge chart is used to visualize a single value within a predefined
range. It resembles a speedometer or gauge, with a pointer indicating the
value on a scale.
10. KPI (Key Performance Indicator): A KPI visual represents a single value
and its target or benchmark. It's commonly used to track progress toward
specific goals or objectives.
11. Card: A card visual displays a single value or metric, often with additional
context or comparison to other values. It's useful for highlighting key metrics
or summary statistics.
12. Matrix: A matrix visualizes data in a tabular format, similar to a pivot table. It
allows users to display data across multiple dimensions, with rows and
columns representing different categories or attributes.
13. Slicer: A slicer is a filter control that allows users to interactively filter data
displayed in other visuals. It's often used to segment data based on specific
criteria or categories.
These are just a few examples of the types of graphs and charts you can create in
Power BI. Power BI offers a wide range of visualization options, allowing users to
choose the most appropriate chart type based on their data and analysis
requirements.
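To make the idea of "size proportional to value" in pie and donut charts concrete, here is a minimal Python sketch that converts category values into slice angles. The function name slice_angles and the category totals are hypothetical.

```python
def slice_angles(values):
    """Return the angle in degrees of each slice, proportional to its value."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    return {label: 360 * value / total for label, value in values.items()}


# Hypothetical category totals, invented for illustration only.
shares = {"A": 50, "B": 30, "C": 20}
angles = slice_angles(shares)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

The angles always add up to 360 degrees, which is why pie and donut charts only make sense for parts of a single whole.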
1. Bar Graph:
Used to compare discrete categories or groups.
Vertical or horizontal bars represent the values of different
categories.
Useful for visualizing categorical data and making
comparisons.
2. Histogram:
Displays the distribution of continuous data.
Bars represent the frequency or count of data within
predefined intervals (bins).
Helps visualize the shape, central tendency, and spread of the
data.
3. Line Graph:
Shows trends and changes over continuous or ordered
categories, usually time.
Points are connected by lines to represent the relationship
between variables.
Useful for illustrating trends, patterns, and relationships in
data.
4. Pie Chart:
Represents parts of a whole as slices of a circular pie.
Each slice's size corresponds to the proportion of the whole it
represents.
Suitable for displaying percentages or proportions of
categorical data.
5. Scatter Plot:
Displays the relationship between two continuous variables.
Each point represents an observation with values on both
axes.
Helps identify correlations, clusters, or outliers in data.
6. Area Chart:
Similar to a line graph but with the area below the line filled
with color.
Useful for showing cumulative totals or proportions over time.
Helps visualize changes in magnitude over time while
emphasizing the overall trend.
7. Box Plot (Box-and-Whisker Plot):
Summarizes the distribution of continuous data and identifies
outliers.
Includes a box representing the interquartile range (IQR) and
whiskers representing variability outside the IQR.
Provides insights into the spread, central tendency, and
symmetry of the data distribution.
8. Heatmap:
Represents data values in a matrix format using colors.
Each cell's color intensity indicates the value of the data point.
Useful for visualizing relationships and patterns in large
datasets, especially in multidimensional data.
9. Bubble Chart:
Similar to a scatter plot but with a third variable represented
by the size of the markers (bubbles).
Combines the features of a scatter plot and a proportional
symbol map.
Useful for visualizing three-dimensional data and highlighting
patterns among multiple variables.
10. Stacked Bar Chart:
A variation of the bar graph where bars are stacked on top of
each other to represent the total value.
Each segment within a bar represents a different category,
and the total height of the bar is the sum of its segments (it
stays constant only in a 100% stacked bar chart).
Useful for comparing the total values across different groups
while showing the contribution of each category.
These are just a few examples of the types of graphs commonly used in
data visualization. The choice of graph depends on the nature of the data,
the relationships being explored, and the insights you want to convey.
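The box plot described in the list above can be summarized numerically before it is drawn. The sketch below uses Python's standard statistics module. It assumes the common convention that points more than 1.5 times the IQR beyond the quartiles are outliers, which the notes do not specify. The function name and the sample values are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Compute the numbers a box plot draws, using the 1.5 * IQR convention."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme points that are not outliers.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


# Hypothetical measurements, invented for illustration only.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Different tools compute quartiles slightly differently, so the exact box edges may not match what a given charting package draws.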
2. Get Data:
Once Power BI Desktop is open, click on the "Get Data" button
located in the Home tab on the ribbon.
5. Load Data:
After connecting to the data source, Power BI will display a
Navigator window showing a preview of the data available in the
selected source.
Select the specific data tables, sheets, or views you want to import
into Power BI by checking the boxes next to them.
You can preview the data by clicking on a table or sheet to ensure
you're selecting the correct data.
Once you've selected the desired data, click on the "Load" button to
import it into Power BI.
7. Data Model:
After loading the data, Power BI will create a data model based on
the imported tables and relationships between them.
You can view and manage the data model by clicking on the "Model"
view in Power BI Desktop.
In the data model view, you can define relationships between
tables, create calculated columns and measures, and perform other
data modeling tasks.
By following these steps, you can easily load data into Power BI and start
creating visualizations and reports to analyze your data.
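What a relationship between two tables makes possible can be sketched outside Power BI in plain Python. The tables, column names, and values below are hypothetical. The example assumes a one-to-many relationship from customers to orders on customer_id.

```python
# Hypothetical tables: one customer row relates to many order rows.
customers = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 101, "customer_id": 1, "amount": 250},
    {"order_id": 102, "customer_id": 2, "amount": 100},
    {"order_id": 103, "customer_id": 1, "amount": 50},
]

# Index the "one" side by its key so each order can look up its customer.
customer_by_id = {c["customer_id"]: c for c in customers}

# Follow the relationship from each order to its customer's name.
for order in orders:
    order["customer_name"] = customer_by_id[order["customer_id"]]["name"]

# Aggregate across the relationship: total order amount per customer.
total_by_customer = {}
for order in orders:
    name = order["customer_name"]
    total_by_customer[name] = total_by_customer.get(name, 0) + order["amount"]

print(total_by_customer)
```

In Power BI the lookup is handled by the data model once the relationship is defined, so no explicit index like customer_by_id is written by hand.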
UNIT 3
NEW COLUMN POWER BI
Creating a new column in Power BI involves adding a calculated column to
your dataset. You can achieve this using Power Query Editor or by
creating calculated columns directly in the Data View. Here's how you can
add a new column using both methods:
By following these steps, you can create a new column in Power BI using
either Power Query Editor or the Data View, depending on your preference
and workflow.
By following these steps, you can create a new measure in Power BI using
either the Data View or the Data Model, depending on your preference
and workflow.
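The difference between a calculated column and a measure can be illustrated with a plain-Python analogy. This is not DAX. The table, the column names, and the function total_sales are hypothetical.

```python
# Hypothetical sales table, invented for illustration only.
sales = [
    {"region": "North", "quantity": 2, "unit_price": 10.0},
    {"region": "South", "quantity": 1, "unit_price": 25.0},
    {"region": "North", "quantity": 3, "unit_price": 10.0},
]

# Calculated column: evaluated once per row and stored with the row.
for row in sales:
    row["line_total"] = row["quantity"] * row["unit_price"]


# Measure: an aggregate re-evaluated over whatever rows the filters leave.
def total_sales(rows, region=None):
    visible = [r for r in rows if region is None or r["region"] == region]
    return sum(r["line_total"] for r in visible)


print(total_sales(sales))            # all regions
print(total_sales(sales, "North"))   # filtered to one region
```

The column value is fixed per row, while the measure returns a different result depending on the filter applied, which is roughly how the two behave in a report.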
1. Aggregate Functions:
SUM: Calculates the sum of values in a column.
AVERAGE: Calculates the arithmetic mean of values in a column.
MIN / MAX: Returns the minimum or maximum value in a column.
COUNT / COUNTROWS: COUNT counts the non-blank values in a
column; COUNTROWS counts the rows in a table.
3. Filter Functions:
FILTER: Filters a table based on a condition.
RELATED / RELATEDTABLE: RELATED retrieves a related value from
another table; RELATEDTABLE returns the related rows as a table.
ALL / ALLEXCEPT / ALLSELECTED: Removes filters from a column
or table.
4. Logical Functions:
IF / SWITCH: Conditionally evaluates expressions.
AND / OR / NOT: Perform logical operations.
5. Text Functions:
CONCATENATE / CONCATENATEX: CONCATENATE joins two strings into
one; CONCATENATEX joins the result of an expression evaluated for
each row of a table.
LEFT / RIGHT / MID: Extracts substrings from a string.
LEN: Returns the length of a string.
6. Statistical Functions:
STDEV.S / STDEV.P: Calculates the standard deviation of a sample or
population.
VAR.S / VAR.P: Calculates the variance of a sample or population.
7. Math Functions:
ROUND / ROUNDDOWN / ROUNDUP: Rounds a number to a
specified number of digits.
ABS / SQRT: Calculates the absolute value or square root of a
number.
8. Information Functions:
ISBLANK / ISNUMBER / ISTEXT: Checks if a value is blank,
numeric, or text.
CONTAINS / CONTAINSSTRING: CONTAINS checks whether a table
contains a row with the specified values; CONTAINSSTRING checks
whether a string contains a specific substring.
9. Time Intelligence Functions:
TOTALYTD / TOTALMTD / TOTALQTD: Calculates year-to-date,
month-to-date, or quarter-to-date totals.
DATESYTD / DATESMTD / DATESQTD: Returns a set of dates for
the year-to-date, month-to-date, or quarter-to-date period.
These are just a few examples of the many DAX functions available in
Power BI. DAX provides powerful capabilities for data manipulation,
calculation, and analysis, allowing users to derive insights from their data
effectively. Depending on your specific requirements, you can use DAX
functions to create calculated columns, measures, and complex
calculations in Power BI.
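To show what a year-to-date total computes, here is a plain-Python analogy. It is not DAX and does not reproduce the full behavior of TOTALYTD. The function name total_ytd and the rows are hypothetical, and the sketch assumes a calendar year starting on 1 January.

```python
from datetime import date


def total_ytd(rows, as_of):
    """Sum amounts from 1 January of as_of's year up to and including as_of."""
    start = date(as_of.year, 1, 1)
    return sum(amount for day, amount in rows if start <= day <= as_of)


# Hypothetical (date, amount) rows, invented for illustration only.
rows = [
    (date(2023, 12, 15), 500),
    (date(2024, 1, 10), 100),
    (date(2024, 2, 5), 200),
    (date(2024, 3, 20), 300),
]
print(total_ytd(rows, date(2024, 2, 29)))
```

The December 2023 row is excluded because it falls in the previous year, and the March row is excluded because it is after the as-of date.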
POWER BI QUERY
In Power BI, the term "query" typically refers to the process of extracting, transforming, and
loading (ETL) data into your dataset using Power Query Editor. Power Query is a data
connectivity and preparation tool that allows you to connect to various data sources,
transform data, and shape it into the desired format before loading it into Power BI.
Here's an overview of how you can use Power Query to create and manipulate queries in
Power BI:
5. Refresh Data:
After loading the data into Power BI, you can refresh it to reflect any changes made to
the source data.
Click on "Refresh" in the Home tab to refresh the dataset and update it with the latest
data from the source.
By using Power Query in Power BI, you can efficiently prepare and shape your data to meet
your analysis and reporting needs, ensuring that you work with clean, well-structured data for
visualization and analysis.
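The extract, transform, and load stages described above can be sketched in plain Python to show the kind of work Power Query does. This is an analogy, not Power Query's M language. The sample data, column names, and function names are hypothetical.

```python
import csv
import io

# Hypothetical raw source with untidy text and a missing amount.
RAW = """order_id,region,amount
1, north ,100
2,South,
3,NORTH,250
"""


def extract(text):
    """Read the source into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean text, convert types, and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().title(),
            "amount": float(amount),
        })
    return cleaned


def load(rows, table):
    """Replace the table's contents, as a refresh would."""
    table.clear()
    table.extend(rows)
    return table


dataset = load(transform(extract(RAW)), [])
print(dataset)
```

Running the same three stages again after the source changes is the equivalent of a refresh: the transformations stay the same and only the data is replaced.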