
Data Visualization

Uploaded by

Ayushi Rajput

UNIT 1

INTRODUCTION TO DATA VISUALIZATION:

Data visualization is a powerful tool used to transform raw data into
meaningful insights, patterns, and trends through graphical
representation. It combines elements of art, statistics, and storytelling to
make complex information accessible and understandable to a wide
audience. Here's a basic overview of key concepts in data visualization:

1. Purpose: The primary goal of data visualization is to communicate
information effectively. Whether it's presenting sales figures,
scientific findings, or demographic trends, data visualization helps
convey insights in a clear and compelling manner.
2. Types of Data Visualization: There are various types of
visualizations, including:
   • Charts and graphs (e.g., bar charts, line graphs, pie charts)
   • Maps (choropleth maps, heat maps)
   • Diagrams (network diagrams, flowcharts)
   • Infographics (combining visual elements with text)
   • Dashboards (interactive displays of multiple visualizations)
3. Data Types: Data can be categorized into different types,
including:
   • Numerical data: Quantitative information such as sales
     figures, temperatures, or population counts.
   • Categorical data: Qualitative information grouped into
     categories, like types of products or customer segments.
   • Temporal data: Time-based information, such as daily sales
     over a month or quarterly financial data.
   • Spatial data: Geographical information represented on maps,
     like population density or election results by region.
4. Visual Encoding: Visual encoding refers to the mapping of data
attributes to visual properties such as position, size, color, and
shape. For example, in a bar chart, the height of the bars represents
the magnitude of the data, while different colors may indicate
categories.
5. Best Practices:
   • Choose the right visualization type for your data and message.
   • Simplify complex data while retaining key insights.
   • Use appropriate color schemes, labels, and annotations to
     enhance clarity.
   • Ensure accuracy and integrity of the data being visualized.
   • Consider the audience's level of expertise and design
     visualizations accordingly.
   • Embrace interactivity to allow users to explore data
     dynamically.
6. Tools: There are many tools available for creating data
visualizations, ranging from simple spreadsheet software like
Microsoft Excel to more advanced platforms like Tableau, Python
libraries like Matplotlib and Seaborn, or web-based tools like D3.js.
7. Applications: Data visualization is used across various domains,
including business (sales analysis, market trends), science
(experimental results, research findings), journalism (storytelling
with data), healthcare (patient outcomes, disease trends), and
more.

In essence, data visualization empowers decision-makers, analysts, and
researchers to uncover insights, detect patterns, and communicate
findings effectively, ultimately driving informed decision-making and
understanding of complex phenomena.

DATA TYPES AND VISUAL ENCODING:


Data types and visual encoding play a crucial role in the creation of effective data
visualizations. Here's an overview of common data types and how they can be visually
encoded:

1. Numerical Data:
   • Visual Encodings: Numerical data can be encoded using visual properties
     such as:
      • Position: For example, in a bar chart, the height or position of bars
        represents numerical values.
      • Length: The length of lines or bars can also encode numerical values.
      • Size: The size of visual elements like circles or bubbles can represent
        numerical magnitudes.
      • Color intensity: Gradations in color intensity can indicate numerical
        differences.
   • Examples: Bar charts, line graphs, scatter plots, bubble charts.
2. Categorical Data:
   • Visual Encodings: Categorical data can be encoded using visual properties
     such as:
      • Color: Different colors can represent different groups or categories.
      • Shape: Different shapes can be used to differentiate between
        categories.
      • Position (for a small number of categories): When there are only a
        few categories, each can be given its own position along a common
        scale.
   • Examples: Pie charts, stacked bar charts, stacked area charts, treemaps.
3. Temporal Data:
   • Visual Encodings: Temporal data can be encoded using visual properties such
     as:
      • Position: Time can be represented on the x-axis of a chart, with data
        points plotted at corresponding time intervals.
      • Length or size: The duration or magnitude of events can be represented
        by the length of bars or the size of visual elements.
      • Color: Color can be used to highlight specific time periods or trends.
   • Examples: Time series line charts, Gantt charts, calendar heatmaps.
4. Spatial Data:
   • Visual Encodings: Spatial data can be encoded using visual properties such
     as:
      • Position: Geographic locations can be plotted on a map using latitude
        and longitude coordinates.
      • Color: Different colors or shades can represent different values or
        categories within geographic regions.
      • Size: The size of visual elements like bubbles or markers can represent
        quantitative data within specific geographic areas.
   • Examples: Choropleth maps, dot density maps, proportional symbol maps.
5. Multivariate Data:
   • Visual Encodings: When dealing with multiple variables, combinations of
     visual encodings can be used to represent different dimensions of the data.
     For example:
      • Position and color: Using both position and color to encode different
        variables in a scatter plot.
      • Size and shape: Using both size and shape to encode different
        categories or variables in a bubble chart.
   • Examples: Parallel coordinates plots, radar charts, heatmaps.

By understanding the characteristics of different data types and selecting appropriate visual
encodings, data visualizers can effectively communicate insights and patterns within datasets,
making information more accessible and understandable to a wider audience.
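
As a rough illustration of the encodings listed above, the sketch below maps a few records to the visual properties of scatter-plot marks. The records, field names, pixel ranges, and two-color palette are all made up for the example; only the Python standard library is used.

```python
from math import sqrt

# Hypothetical records: a time step, a numerical value, a category, a magnitude.
records = [
    {"month": 1, "sales": 120.0, "segment": "retail", "orders": 16},
    {"month": 2, "sales": 180.0, "segment": "online", "orders": 64},
    {"month": 3, "sales": 150.0, "segment": "retail", "orders": 36},
]


def linear_scale(value, domain, pixel_range):
    """Map a numerical value from its data domain onto a pixel range."""
    (d0, d1), (r0, r1) = domain, pixel_range
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


# Categorical data -> color (an assumed palette, one color per category).
palette = {"retail": "#1f77b4", "online": "#ff7f0e"}


def encode(record):
    """Turn one data record into the visual properties of one mark."""
    return {
        # Temporal data -> horizontal position.
        "x": linear_scale(record["month"], (1, 3), (0, 200)),
        # Numerical data -> vertical position (screen y grows downward).
        "y": linear_scale(record["sales"], (0, 200), (100, 0)),
        # Categorical data -> color.
        "color": palette[record["segment"]],
        # Numerical data -> size; the square root keeps circle AREA
        # proportional to the value.
        "radius": sqrt(record["orders"]),
    }


marks = [encode(r) for r in records]
for mark in marks:
    print(mark)
```

Each record becomes one mark, so changing a data type only means swapping the function that feeds a given visual property.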

VISUALIZING TIME
Visualizing time data effectively is crucial for understanding trends,
patterns, and relationships over time. Here are some common techniques
for visualizing time:

1. Time Series Line Chart: This is one of the most common and
effective ways to visualize temporal data. Time is typically
represented on the x-axis, while the y-axis shows the values
corresponding to each time point. Line charts are useful for showing
trends and patterns over time, such as stock prices, temperature
variations, or website traffic over days, months, or years.

2. Bar Chart or Column Chart: Bar or column charts can also be
used to display time-based data, especially when comparing
discrete time intervals or categories. Each bar or column represents
a specific time period, and the height or length corresponds to the
value of the data at that time.

3. Area Chart: Area charts are similar to line charts but with the area
below the line filled with color. They are effective for showing
cumulative data or comparing the proportions of different categories
over time.

4. Heatmap Calendar: This visualization technique involves
representing time-based data using a calendar format, where each
cell corresponds to a day or other time unit. The color intensity or
shading of each cell represents the value of the data for that time
period.

5. Gantt Chart: Gantt charts are commonly used in project
management to visualize the scheduling and progress of tasks over
time. Each task is represented as a horizontal bar along a time axis,
showing its start and end dates.

6. Time Series Scatter Plot: Scatter plots can be used to visualize
relationships between two variables over time. Each data point
represents a specific time period, with one variable plotted on the
x-axis and the other on the y-axis.

7. Time Series Decomposition Plot: This technique involves
decomposing a time series into its trend, seasonal, and residual
components, which are plotted separately. It helps in understanding
the underlying patterns and irregularities in the data.

When visualizing time data, it's essential to choose the appropriate
visualization technique based on the nature of the data and the insights
you want to convey. Additionally, labeling axes, providing context, and
using intuitive color schemes can enhance the clarity and effectiveness of
the visualization.
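
To make the decomposition plot in item 7 more concrete, here is a minimal sketch of a classical additive decomposition, where each value is treated as trend + seasonal + residual. It assumes an odd seasonal period and uses a simple centered moving average for the trend; statistical packages use more refined methods. The series is invented for the example.

```python
from statistics import mean


def decompose_additive(series, period):
    """Split a series into trend, seasonal, and residual parts.

    Assumes an odd `period`, so the centered window is symmetric, and a
    series long enough that every position in the cycle occurs at least
    once away from the edges. Trend and residual are None at the edges,
    where the window does not fit.
    """
    half = period // 2
    n = len(series)

    # Trend: centered moving average over one full period.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(series[i - half : i + half + 1])

    # Seasonal: average detrended value at each position in the cycle,
    # shifted so the seasonal effects average to zero over one period.
    by_phase = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            by_phase[i % period].append(series[i] - t)
    phase_means = [mean(values) for values in by_phase]
    offset = mean(phase_means)
    seasonal = [phase_means[i % period] - offset for i in range(n)]

    # Residual: what trend and seasonality together do not explain.
    residual = [
        None if t is None else series[i] - t - seasonal[i]
        for i, t in enumerate(trend)
    ]
    return trend, seasonal, residual


# Hypothetical data: a steadily rising level plus a repeating 3-step pattern.
pattern = [2.0, -1.0, -1.0]
series = [10.0 + i + pattern[i % 3] for i in range(12)]

trend, seasonal, residual = decompose_additive(series, period=3)
print(trend[1:4], seasonal[:3], residual[1:4])
```

Plotting the three returned lists as stacked line charts that share one time axis gives the decomposition plot described above.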

STRATEGIES OF ILLUSTRATING TRENDS AND RESULTS IN DATA VISUALIZATION
Illustrating trends and results effectively in data visualization involves
employing various strategies to communicate insights clearly and
intuitively. Here are some strategies to consider:

1. Choose the Right Visualization Type: Selecting the appropriate
visualization type is crucial for illustrating trends and results. For
example:
   • Use line charts for showing trends over time.
   • Use bar charts for comparing discrete categories or time periods.
   • Use area charts to highlight cumulative trends or proportions
     over time.
   • Use scatter plots for visualizing relationships between
     variables.
2. Highlight Key Insights: Emphasize important trends or results by
using visual cues such as bold colors, annotations, or callouts. Direct
the viewer's attention to significant points or changes in the data.
3. Provide Context: Help viewers understand the significance of the
trends by providing context. This could include adding labels, titles,
captions, or descriptions that explain the data, its source, and any
relevant background information.
4. Use Consistent and Intuitive Scales: Ensure that scales on axes
are consistent and easy to interpret. Avoid distorting scales or using
non-linear axes unless necessary. Use clear labels and units to aid
understanding.
5. Employ Interactive Features: Incorporate interactive elements
where possible to allow users to explore the data further. Interactive
visualizations can enable users to drill down into specific time
periods, filter data based on criteria, or hover over data points to
view detailed information.
6. Show Uncertainty or Variability: If applicable, illustrate
uncertainty or variability in the data using techniques such as error
bars, confidence intervals, or shaded regions. This helps viewers
understand the reliability of the trends or results being presented.
7. Use Trend Lines or Smoothing: Add trend lines or smoothing
curves to highlight underlying patterns in the data and make trends
more apparent. This can help viewers identify long-term trends
amidst noise or fluctuations.
8. Utilize Annotations and Labels: Incorporate textual annotations
or labels to provide additional context or explanations for specific
data points or trends. Annotations can help clarify complex patterns
or outliers in the data.
9. Consider Animation for Time-based Data: For time-based data,
consider using animation to show changes over time dynamically.
Animated visualizations can effectively convey temporal trends and
patterns, especially when dealing with large datasets or complex
temporal relationships.
10. Keep It Simple: Avoid cluttering the visualization with
unnecessary elements. Simplify the design to focus on the most
critical trends and results, ensuring that the visualization is easy to
interpret at a glance.

By employing these strategies, you can create data visualizations that
effectively illustrate trends and results, enabling viewers to gain insights
and make informed decisions based on the data presented.
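
Strategy 7 (trend lines and smoothing) can be sketched in a few lines. The example below computes a trailing moving average and an ordinary least-squares trend line; the weekly figures are invented, and a charting tool would normally draw both curves on top of the raw points.

```python
from statistics import mean


def moving_average(values, window):
    """Smooth a series with a trailing moving average of the given window."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


def trend_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


# Hypothetical weekly measurements: an upward trend with noise.
weeks = [1, 2, 3, 4, 5, 6]
visits = [10, 14, 11, 17, 15, 20]

smoothed = moving_average(visits, window=3)   # one value per full window
slope, intercept = trend_line(weeks, visits)
print(smoothed)
print(f"trend: visits = {slope:.2f} * week + {intercept:.2f}")
```

A wider window gives a smoother curve but reacts more slowly to real changes, so the window size is a design choice that should be stated next to the chart.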

IMPORTANCE OF DATA VISUALIZATION


The importance of data visualization stems from its ability to transform
complex datasets into understandable, actionable insights. Here are
several reasons why data visualization is crucial:

1. Enhanced Understanding: Visual representations of data make it
easier for individuals to grasp complex information quickly and
intuitively. Patterns, trends, and relationships that might be difficult
to discern from raw data become more apparent when presented
visually.
2. Effective Communication: Visualizations enable effective
communication of data-driven insights to diverse audiences,
including stakeholders, decision-makers, and the general public.
Visual representations transcend language barriers and facilitate
clearer communication of complex ideas.
3. Decision Making: Data visualizations empower decision-makers to
make informed decisions based on data-driven insights. By
presenting information visually, decision-makers can evaluate
options, identify trends, and anticipate outcomes more effectively.
4. Identifying Patterns and Trends: Visualizations help uncover
hidden patterns, trends, and correlations within datasets that may
not be apparent through other means. By visually exploring data,
analysts can discover insights that lead to innovation, problem-
solving, and strategic planning.
5. Storytelling with Data: Visualizations enable storytelling with
data, allowing analysts to craft narratives that convey a compelling
message. By combining data with visual elements, storytellers can
engage audiences, evoke emotions, and drive action.
6. Detecting Anomalies and Outliers: Visualizations make it easier
to identify anomalies, outliers, and irregularities within datasets. By
visually inspecting data, analysts can quickly spot deviations from
expected patterns and investigate potential issues or opportunities.
7. Facilitating Exploration and Discovery: Interactive data
visualizations enable users to explore data dynamically, drilling
down into specific subsets, filtering data based on criteria, and
gaining deeper insights through exploration. This fosters a culture of
curiosity and discovery within organizations.
8. Increased Transparency and Accountability: Transparent data
visualizations provide stakeholders with visibility into underlying
data sources, methodologies, and assumptions. This transparency
fosters trust, accountability, and credibility in data-driven decision-
making processes.
9. Efficient Data Analysis: Visualizations streamline the data
analysis process by condensing large volumes of information into
concise, visually digestible formats. Analysts can quickly identify
key insights and focus their attention on areas of interest, saving
time and resources.
10. Driving Innovation and Creativity: Data visualizations
inspire innovation and creativity by encouraging analysts to
experiment with different visualization techniques, explore
unconventional perspectives, and communicate insights in novel
ways. This fosters a culture of innovation and continuous
improvement within organizations.

In summary, data visualization plays a critical role in transforming data
into actionable insights, facilitating effective communication, driving
decision-making, and fostering innovation. It is an essential tool for
navigating the complexities of the data-driven world and unlocking the full
potential of data for individuals, organizations, and society as a whole.

GRAMMAR OF GRAPHICS IN DATA VISUALIZATION


The Grammar of Graphics is a conceptual framework for creating data
visualizations introduced by Leland Wilkinson in his book "The Grammar of
Graphics" published in 1999. It provides a systematic approach to
understanding and constructing visualizations by breaking them down into
fundamental components.

The Grammar of Graphics defines a set of basic elements and rules for
mapping data to visual properties. Here are the key components:

1. Data: The raw data that you want to visualize. It could be
structured in various formats such as tables, databases, or
spreadsheets.
2. Aesthetic Mapping: Aesthetic mappings define how variables in
your data are mapped to visual properties such as position, color,
size, shape, and texture. For example, mapping a numerical variable
to the y-axis of a chart or mapping a categorical variable to the
color of data points.
3. Geometric Objects (Geoms): Geometric objects represent the
basic visual elements that you see in a plot, such as points, lines,
bars, and polygons. Each geom corresponds to a different type of
visualization.
4. Statistical Transformations (Stats): Statistical transformations
are operations that modify or summarize the data before plotting.
Examples include aggregating data points, smoothing curves, or
calculating confidence intervals.
5. Scales: Scales define the mapping between data values and the
visual properties of the plot. They include axes, legends, and color
scales. Scales are responsible for converting data values into visual
attributes that can be perceived by viewers.
6. Coordinate Systems: Coordinate systems define the spatial layout
of the plot. They determine how data coordinates are mapped onto
the visualization space, including the position, orientation, and
scaling of axes.
7. Faceting (or Conditioning): Faceting involves dividing the data
into subsets and creating separate visualizations for each subset. It
allows you to explore relationships within the data across different
levels of one or more categorical variables.

By understanding and applying the Grammar of Graphics framework, you
can create a wide range of data visualizations in a systematic and
consistent manner. This approach provides flexibility and scalability,
allowing you to create custom visualizations tailored to your specific data
and analytical needs. Additionally, several modern data visualization
libraries, such as ggplot2 in R and plotnine in Python, are built on
Grammar of Graphics principles, making it easier to implement complex
visualizations efficiently.
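
The components above can be read as stages of a pipeline. The sketch below is a toy illustration, not the API of any real library: it takes made-up survey rows through faceting, a counting stat, a length scale, and a text "bar" geom. The coordinate system is implicit here (plain horizontal text bars), and all function names are invented for the example.

```python
from collections import Counter, defaultdict

# Data: hypothetical survey rows.
data = [
    {"city": "Pune", "rating": 4}, {"city": "Pune", "rating": 5},
    {"city": "Pune", "rating": 4}, {"city": "Delhi", "rating": 3},
    {"city": "Delhi", "rating": 4}, {"city": "Delhi", "rating": 3},
]

# Aesthetic mapping: which data variable feeds which visual channel.
aes = {"x": "rating"}
facet_by = "city"


def stat_count(rows, variable):
    """Stat: summarize rows into sorted (value, count) pairs before drawing."""
    return sorted(Counter(row[variable] for row in rows).items())


def scale_length(count, max_count, max_chars=20):
    """Scale: convert a count into a bar length measured in characters."""
    return round(count / max_count * max_chars)


def geom_bar(value, length):
    """Geom: draw one horizontal bar as text."""
    return f"{value} | " + "#" * length


# Faceting: split the data into one subset per city.
facets = defaultdict(list)
for row in data:
    facets[row[facet_by]].append(row)

# One shared scale across panels keeps bar lengths comparable.
counts = {key: stat_count(rows, aes["x"]) for key, rows in facets.items()}
max_count = max(count for pairs in counts.values() for _, count in pairs)

panels = {
    key: [geom_bar(value, scale_length(count, max_count)) for value, count in pairs]
    for key, pairs in counts.items()
}
for key, lines in panels.items():
    print(key, *lines, sep="\n")
```

Swapping any single stage (a different stat, scale, or geom) changes the chart without touching the others, which is the main point of the framework.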

WILKINSON'S GRAMMAR
Leland Wilkinson's Grammar of Graphics is a foundational concept in data
visualization. It provides a framework for constructing visualizations by
breaking them down into fundamental components. Here's an overview:

Components of Wilkinson's Grammar of Graphics:


1. Data: The raw information being visualized. It could be in the form
of numerical data, categorical data, or other types.
2. Aesthetic Mappings: Aesthetic mappings define how data
attributes (such as numerical values, categories, or dates) are
mapped to visual properties (such as position, color, size, shape).
For example, mapping a numeric variable to the y-axis of a chart or
mapping categories to different colors in a scatter plot.
3. Geometric Objects (Geoms): Geometric objects represent the
visual elements used to display data, such as points, lines, bars, and
shapes. Different geoms are used to create different types of charts
and visualizations.
4. Statistical Transformations (Stats): Statistical transformations
are operations applied to the raw data to calculate summary
statistics or perform other transformations before visualization.
Common statistical transformations include aggregations (e.g., sum,
mean), smoothing, and binning.
5. Scales: Scales define the mapping between data values and their
visual representation. Scales determine how data values are
translated into positions, sizes, colors, or shapes in the visualization.
Examples include linear scales, logarithmic scales, and categorical
scales.
6. Coordinates: Coordinates define the spatial reference system used
to position and arrange visual elements within the visualization.
Common coordinate systems include Cartesian coordinates (x, y),
polar coordinates, and geographic coordinates.
7. Facets (Subplots): Facets are used to divide the data into subsets
and display them as separate plots or panels. Faceting allows for
comparisons between different subsets of data and is commonly
used for exploring multi-dimensional datasets.

By understanding and leveraging these components, data visualization
practitioners can create a wide variety of visualizations to effectively
communicate insights from data. Various visualization libraries and tools,
such as ggplot2 in R and plotnine in Python, implement the Grammar of
Graphics principles, making it easier for users to create custom and
expressive visualizations.
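
Components 5 and 6 (scales and coordinates) are easy to show numerically. The following sketch compares a linear and a logarithmic scale on an assumed 0 to 300 pixel axis and converts one polar position to Cartesian x/y. The function names are chosen for this example only.

```python
from math import cos, log10, pi, sin


def linear_scale(value, domain, out_range):
    """Linear scale: equal differences in the data become equal distances."""
    (d0, d1), (r0, r1) = domain, out_range
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


def log_scale(value, domain, out_range):
    """Log scale: equal ratios in the data become equal distances."""
    d0, d1 = domain
    return linear_scale(log10(value), (log10(d0), log10(d1)), out_range)


def polar_to_cartesian(angle, radius):
    """Coordinates: place an (angle in radians, radius) pair on the x/y plane."""
    return radius * cos(angle), radius * sin(angle)


# The same values under two scales, for an axis running from 0 to 300 pixels.
values = [1, 10, 100, 1000]
linear_px = [linear_scale(v, (1, 1000), (0, 300)) for v in values]
log_px = [log_scale(v, (1, 1000), (0, 300)) for v in values]
print([round(p, 1) for p in linear_px])
print([round(p, 1) for p in log_px])

# In a polar layout, a category can set the angle and a value the radius.
x, y = polar_to_cartesian(pi / 2, 5.0)
print(round(x, 6), round(y, 6))
```

On the linear scale the three smallest values crowd into the first tenth of the axis, while the log scale spaces them evenly, which is why log scales suit data spanning several orders of magnitude.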

WICKHAM'S GRAMMAR
Hadley Wickham's layered grammar of graphics builds on Wilkinson's work.
It is a framework for creating data visualizations, implemented primarily
in the R programming language through the ggplot2 package. Here's an
overview of Wickham's Grammar of Graphics:

Components of Wickham's Grammar of Graphics:


1. Data: Like in Wilkinson's Grammar of Graphics, the raw data is the
starting point. This could be in the form of a data frame or another
structured format.
2. Aesthetic Mapping: Aesthetic mappings define how variables in
the data are mapped to visual properties such as color, shape, size,
and position. This mapping tells ggplot2 how to visually represent
the data.
3. Geometric Objects (Geoms): Geometric objects represent the
actual marks that are used to visualize data points. Examples
include points, lines, bars, polygons, and text. Each geom creates a
different type of plot.
4. Statistical Transformations (Stats): Statistical transformations
modify the raw data in some way before plotting. Examples include
summarizing data (e.g., mean, median), smoothing data (e.g., loess
smoothing), and binning data (e.g., creating histograms).
5. Scales: Scales control how data values are mapped to visual
properties. Scales can be continuous (e.g., linear, log) or discrete
(e.g., categorical). They also control the appearance of axes and
legends.
6. Coordinate Systems: Coordinate systems define the spatial layout
of the plot. Common coordinate systems include Cartesian
coordinates (x and y axes), polar coordinates, and map projections
for geographic data.
7. Facets (Subplots): Faceting involves creating multiple small plots,
each showing a different subset of the data. This is useful for
exploring relationships within data subsets or for comparing
different groups.

Example Usage in ggplot2:
A scatter plot in ggplot2 is typically assembled from the following calls:

   • ggplot() initializes the plot.
   • aes() defines the aesthetic mappings, specifying which variables in
     the dataset should be mapped to the x and y axes.
   • geom_point() adds points to the plot to represent the data.
   • labs() sets the plot title and axis labels.

By following the principles of Wickham's Grammar of Graphics, ggplot2
allows for flexible and customizable data visualizations that can effectively
communicate insights from diverse datasets.
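
The layering idea behind these calls can be imitated in plain Python. The sketch below is not ggplot2 and not any real library's API: `Plot`, `Layer`, and the helper functions are hypothetical stand-ins that only record what was added, to show how a plot is built up by combining components with `+`. The data rows are made up.

```python
class Layer:
    """A single plot component; `kind` names its role in the grammar."""

    def __init__(self, kind, **settings):
        self.kind = kind
        self.settings = settings


class Plot:
    """Holds data plus an ordered list of layers, combined with `+`."""

    def __init__(self, data, layers=()):
        self.data = data
        self.layers = list(layers)

    def __add__(self, layer):
        # Return a new Plot so that a partial plot can be reused safely.
        return Plot(self.data, self.layers + [layer])

    def describe(self):
        """List the components in the order they were added."""
        return [f"{layer.kind}: {layer.settings}" for layer in self.layers]


# Hypothetical helpers named after the calls described above.
def aes(**mapping):
    return Layer("aes", **mapping)


def geom_point(**settings):
    return Layer("geom_point", **settings)


def labs(**labels):
    return Layer("labs", **labels)


cars = [{"weight": 2.6, "mpg": 21.0}, {"weight": 3.4, "mpg": 18.7}]  # made-up rows

base = Plot(cars) + aes(x="weight", y="mpg")
scatter = base + geom_point(size=2) + labs(title="Fuel economy by weight")
print(*scatter.describe(), sep="\n")
```

Because `+` returns a new object, `base` can be reused to try a different geom or label set without rebuilding the mapping.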

AESTHETIC ATTRIBUTES IN DATA VISUALIZATION
In data visualization, aesthetic attributes are visual properties that can be manipulated to
encode data values or characteristics. These attributes are used to visually represent data
points and convey information effectively to viewers. Common aesthetic attributes include:

1. Position: Position is one of the most fundamental aesthetic attributes. It determines
the spatial location of data points along the x-axis, y-axis, or other coordinate systems.
Position encoding is precise and intuitive, making it suitable for representing
quantitative data.
2. Color: Color is a powerful aesthetic attribute used to encode categorical or
continuous data. It can be used to differentiate between categories, represent
magnitude or intensity, highlight important points, or group related data. Care should
be taken to choose colors that are distinguishable and accessible to all viewers,
including those with color vision deficiencies.
3. Shape: Shape refers to the visual appearance of data points, such as circles, squares,
triangles, or other geometric shapes. Shape encoding is commonly used to represent
categories or groups within the data, especially when combined with color or size
variations.
4. Size: Size represents the physical dimensions of data points, such as the diameter of
circles or the height of bars. Size encoding is effective for representing quantitative
data, allowing viewers to compare magnitudes or proportions easily.
5. Texture/Pattern: Texture or pattern refers to the surface characteristics of data
points, such as solid fill, stripes, dots, or hatching. Texture encoding can be used to
represent categories or groups, but it should be used sparingly to avoid visual clutter
and confusion.
6. Opacity/Transparency: Opacity or transparency controls the degree to which data
points or graphical elements are visible. It can be used to emphasize certain data
points while de-emphasizing others, or to create visual overlays and layering effects.
7. Line Type: Line type refers to the style or pattern of lines used in line charts or other
line-based visualizations. Different line types, such as solid, dashed, or dotted lines,
can be used to represent different groups or categories within the data.
8. Angle: Angle encoding represents the orientation or direction of data points. While
less commonly used than other aesthetic attributes, angle encoding can be effective
for representing directional data, such as wind direction or spatial relationships.
9. Text: Text encoding involves the use of labels, annotations, or textual elements to
provide additional information or context within the visualization. Text can be used to
label data points, axes, legends, titles, or other components of the visualization.

By carefully selecting and manipulating these aesthetic attributes, data visualizers can create
informative and visually appealing visualizations that effectively convey insights from data to
viewers.
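
Two of these attributes, color (for magnitude) and opacity (for emphasis), can be computed directly from data values. The sketch below interpolates between two assumed RGB endpoints, white and a dark blue, and fades every mark except the highlighted one. The endpoint colors, the dimming level, and the readings are choices made for this example.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def sequential_color(value, vmin, vmax, low=(255, 255, 255), high=(8, 48, 107)):
    """Map a value to a hex color between `low` (white) and `high` (dark blue)."""
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the domain
    rgb = [round(lerp(start, end, t)) for start, end in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def emphasis_opacity(is_highlighted, dimmed=0.25):
    """Opacity: keep highlighted marks solid and fade the rest."""
    return 1.0 if is_highlighted else dimmed


temperatures = [12, 20, 28]  # hypothetical readings
colors = [sequential_color(t, 12, 28) for t in temperatures]
opacities = [emphasis_opacity(t == max(temperatures)) for t in temperatures]
print(colors)
print(opacities)
```

A single-hue ramp like this one varies mainly in lightness, which tends to remain readable for viewers with color vision deficiencies.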

UNIT 2
INTRODUCTION TO DATA VISUALIZATION WITH POWER BI
Power BI is a powerful business analytics tool developed by Microsoft that
allows users to visualize and analyze data from various sources in
interactive and insightful ways. Here's an introductory overview of data
visualization with Power BI:

1. Connecting Data Sources:


   • Power BI allows users to connect to a wide range of data sources,
     including Excel spreadsheets, databases (e.g., SQL Server, MySQL),
     cloud services (e.g., Microsoft Azure, Google Analytics), and web
     services (e.g., Salesforce, Facebook).
   • Users can import data into Power BI or establish a live connection to
     keep the data synchronized with its source.

2. Data Modeling:
   • After connecting to data sources, users can clean, transform, and
     shape the data in the Power Query Editor to prepare it for analysis
     and visualization.
   • Data modeling then organizes the prepared tables: users create
     relationships between tables and define calculated columns and
     measures using DAX (Data Analysis Expressions).

3. Visualization Design:
   • Power BI offers a wide range of visualization types, including bar
     charts, line charts, pie charts, scatter plots, maps, tables, matrices,
     and more.
   • Users can drag and drop fields from their dataset onto the canvas to
     create visualizations quickly.
   • Power BI provides extensive formatting options to customize the
     appearance of visualizations, including colors, fonts, labels, axes,
     and legends.

4. Interactive Dashboards:
   • Users can combine multiple visualizations into interactive
     dashboards, allowing stakeholders to explore and interact with the
     data dynamically.
   • Power BI dashboards support drill-down, cross-filtering, slicing, and
     other interactive features to facilitate data exploration and analysis.
   • Dashboards can be shared securely with colleagues and
     stakeholders within an organization or embedded into other
     applications and websites.

5. Data Insights and Exploration:


   • Power BI includes built-in AI capabilities: Quick Insights
     automatically searches a dataset for patterns, trends, and outliers,
     and Q&A lets users ask questions about the data in natural language.
   • Users can explore data using interactive visuals, filters, and slicers
     to gain deeper insights and uncover hidden relationships.

6. Collaboration and Sharing:


   • Power BI facilitates collaboration and sharing of insights within
     teams and organizations.
   • Users can publish reports and dashboards to the Power BI service,
     where they can be accessed by others with appropriate permissions.
   • Power BI integrates with Microsoft Teams, SharePoint, and other
     collaboration tools, enabling seamless sharing and communication.

7. Mobile Experience:
   • Power BI offers a mobile app for iOS, Android, and Windows devices,
     allowing users to access and interact with their reports and
     dashboards on the go.
   • The mobile app provides a responsive and touch-friendly
     experience, enabling users to stay connected to their data anytime,
     anywhere.

In summary, Power BI provides a comprehensive platform for data
visualization and analysis, empowering users to turn raw data into
actionable insights through intuitive visualizations, interactive
dashboards, and collaborative features.
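
The data modeling ideas in section 2 (relationships, calculated columns, and measures) can be illustrated with a small Python analogy. This is not DAX and not how Power BI is implemented; the tables, field names, and figures are invented to show the difference between a value stored per row and an aggregate recomputed for whatever filter is active.

```python
# Two hypothetical tables linked by product_id (a one-to-many relationship:
# one product, many sales rows).
products = {
    1: {"name": "Laptop", "category": "Hardware"},
    2: {"name": "Mouse", "category": "Hardware"},
    3: {"name": "Antivirus", "category": "Software"},
}
sales = [
    {"product_id": 1, "units": 2, "unit_price": 900.0},
    {"product_id": 2, "units": 10, "unit_price": 20.0},
    {"product_id": 3, "units": 5, "unit_price": 40.0},
    {"product_id": 1, "units": 1, "unit_price": 900.0},
]

# Calculated column: computed once per row and stored with the row.
for row in sales:
    row["amount"] = row["units"] * row["unit_price"]


def total_sales(rows, category=None):
    """Measure: an aggregate evaluated over whichever rows pass the filter."""
    return sum(
        row["amount"]
        for row in rows
        # Follow the relationship from the sales row to its product.
        if category is None or products[row["product_id"]]["category"] == category
    )


print(total_sales(sales))              # no filter: every row counts
print(total_sales(sales, "Hardware"))  # as if a slicer selected Hardware
```

The same measure returns a different number under each filter, which is what allows one definition to drive many visuals on an interactive dashboard.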

DIFFERENCE BETWEEN POWER BI AND TABLEAU
Power BI and Tableau are both popular data visualization tools used for analyzing and
visualizing data, but they have some differences in terms of features, pricing, and target
audience. Here's a comparison between Power BI and Tableau:

1. Company and Licensing Model:


   • Power BI: Developed by Microsoft, Power BI follows a subscription-based licensing
     model. It offers both desktop and cloud-based versions, with various pricing tiers,
     including a free version with limited features and paid plans with advanced
     capabilities.
   • Tableau: Tableau was originally developed by Tableau Software, which Salesforce
     acquired in 2019. It also follows a subscription-based licensing model, offering
     desktop, server, and online versions (Tableau Online has since been renamed
     Tableau Cloud). Tableau offers a free public version and paid plans with
     additional features.

2. Ease of Use and Learning Curve:


   • Power BI: Power BI is known for its user-friendly interface and seamless integration
     with other Microsoft products such as Excel and SharePoint. It is often preferred by
     organizations already using Microsoft technologies, and users familiar with Excel find
     it relatively easy to learn.
   • Tableau: Tableau is praised for its intuitive drag-and-drop interface and powerful
     visualization capabilities. It has a steeper learning curve compared to Power BI but
     offers more advanced features for data analysis and visualization.

3. Visualization Capabilities:
   • Power BI: Power BI provides a wide range of visualization types, including basic
     charts, maps, tables, matrices, and custom visuals created by the community. While it
     offers robust visualization capabilities, some users find it less flexible compared to
     Tableau for complex visualizations.
   • Tableau: Tableau is renowned for its extensive visualization options and flexibility. It
     offers a rich library of built-in visualizations, advanced charting options, and the
     ability to create highly customized and interactive dashboards.

4. Data Connectivity and Integration:


   • Power BI: Power BI integrates seamlessly with various data sources, including
     Microsoft Excel, SQL Server, Azure, SharePoint, and numerous third-party
     connectors. It offers native integration with other Microsoft products and services,
     simplifying data connectivity.
   • Tableau: Tableau also supports a wide range of data sources and offers robust
     connectivity options. It integrates with databases, cloud services, spreadsheets, and
     other data formats, allowing users to access and analyze data from diverse sources.

5. Collaboration and Sharing:


   • Power BI: Power BI enables collaboration and sharing through its cloud-based
     service. Users can publish reports and dashboards to the Power BI service, where they
     can be accessed, shared, and collaborated on by others within the organization.
   • Tableau: Tableau Server and Tableau Online facilitate collaboration and sharing of
     visualizations within organizations. Users can publish workbooks to Tableau Server
     or Tableau Online, where they can be accessed, shared, and governed centrally.

6. Pricing:
 Power BI: Power BI offers a range of pricing options, including a free version with
limited features and paid plans with additional capabilities. Pricing is based on per-
user licensing, with options for Power BI Pro and Power BI Premium.
 Tableau: Tableau offers various pricing tiers, including a free public version and paid
plans for Tableau Desktop, Tableau Server, and Tableau Online. Pricing is based on a
subscription model, with options for individual users, teams, and organizations.

In summary, both Power BI and Tableau are powerful data visualization tools, each with
its own strengths. The choice between them often depends on factors such as budget,
existing technology infrastructure, user preferences, and specific requirements for data
analysis and visualization within an organization.

TYPES OF GRAPHS
1. Bar Chart: A bar chart is used to compare values across categories. It
consists of rectangular bars whose lengths are proportional to the values they
represent.
2. Column Chart: Similar to a bar chart, a column chart represents data using
vertical bars. It's often used to compare data across different categories or
time periods.
3. Line Chart: A line chart is used to show trends over time or to represent
continuous data. It's especially useful for visualizing time series data.
4. Area Chart: An area chart is similar to a line chart but with the area below
the line filled with color. It's commonly used to show cumulative data or to
represent proportions over time.
5. Scatter Plot: A scatter plot is used to display the relationship between two
continuous variables. Each data point is represented by a marker, and the
position of the marker on the chart corresponds to the values of the variables.
6. Pie Chart: A pie chart is used to show the proportion of each category in a
dataset. It's a circular chart divided into slices, with each slice representing a
different category and its size proportional to the value it represents.
7. Donut Chart: Similar to a pie chart, a donut chart also represents proportions
of a whole but with a hole in the center. It's often used to emphasize the total
value while still showing individual categories.
8. Tree Map: A tree map visualizes hierarchical data using nested rectangles.
The size and color of each rectangle represent different measures, making it
easy to compare values within categories and subcategories.
9. Gauge: A gauge chart is used to visualize a single value within a predefined
range. It resembles a speedometer or gauge, with a pointer indicating the
value on a scale.
10. KPI (Key Performance Indicator): A KPI visual represents a single value
and its target or benchmark. It's commonly used to track progress toward
specific goals or objectives.
11. Card: A card visual displays a single value or metric, often with additional
context or comparison to other values. It's useful for highlighting key metrics
or summary statistics.
12. Matrix: A matrix visualizes data in a tabular format, similar to a pivot table. It
allows users to display data across multiple dimensions, with rows and
columns representing different categories or attributes.
13. Slicer: A slicer is a filter control that allows users to interactively filter data
displayed in other visuals. It's often used to segment data based on specific
criteria or categories.

These are just a few examples of the types of graphs and charts you can create in
Power BI. Power BI offers a wide range of visualization options, allowing users to
choose the most appropriate chart type based on their data and analysis
requirements.
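
The idea behind pie and donut charts, where each slice is proportional to its share of the whole, can be sketched in a few lines. This is only an illustration in Python, not something Power BI requires you to write; the category names and numbers are made up.

```python
# Hypothetical category totals; the names and numbers are illustrative only.
sales_by_category = {"Furniture": 120.0, "Technology": 200.0, "Office Supplies": 80.0}


def pie_slices(values):
    """Return each category's share of the whole and its slice angle in degrees."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    return {
        name: {"share": value / total, "angle": 360.0 * value / total}
        for name, value in values.items()
    }


slices = pie_slices(sales_by_category)
for name, s in slices.items():
    print(f"{name}: {s['share']:.1%} of total, {s['angle']:.0f} degrees")
```

Because every slice is a fraction of the same total, the angles always add up to 360 degrees, which is why a pie chart only makes sense for parts of a single whole.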
1. Bar Graph:
 Used to compare discrete categories or groups.
 Vertical or horizontal bars represent the values of different
categories.
 Useful for visualizing categorical data and making
comparisons.
2. Histogram:
 Displays the distribution of continuous data.
 Bars represent the frequency or count of data within
predefined intervals (bins).
 Helps visualize the shape, central tendency, and spread of the
data.
3. Line Graph:
 Shows trends and changes over continuous or ordered
categories, usually time.
 Points are connected by lines to represent the relationship
between variables.
 Useful for illustrating trends, patterns, and relationships in
data.
4. Pie Chart:
 Represents parts of a whole as slices of a circular pie.
 Each slice's size corresponds to the proportion of the whole it
represents.
 Suitable for displaying percentages or proportions of
categorical data.
5. Scatter Plot:
 Displays the relationship between two continuous variables.
 Each point represents an observation with values on both
axes.
 Helps identify correlations, clusters, or outliers in data.
6. Area Chart:
 Similar to a line graph but with the area below the line filled
with color.
 Useful for showing cumulative totals or proportions over time.
 Helps visualize changes in magnitude over time while
emphasizing the overall trend.
7. Box Plot (Box-and-Whisker Plot):
 Summarizes the distribution of continuous data and identifies
outliers.
 Includes a box representing the interquartile range (IQR) and
whiskers representing variability outside the IQR.
 Provides insights into the spread, central tendency, and
symmetry of the data distribution.
8. Heatmap:
 Represents data values in a matrix format using colors.
 Each cell's color intensity indicates the value of the data point.
 Useful for visualizing relationships and patterns in large
datasets, especially in multidimensional data.
9. Bubble Chart:
 Similar to a scatter plot but with a third variable represented
by the size of the markers (bubbles).
 Combines the features of a scatter plot and a proportional
symbol map.
 Useful for visualizing three variables at once and highlighting
patterns among multiple variables.
10. Stacked Bar Chart:
 A variation of the bar graph where bars are stacked on top of
each other to represent the total value.
 Each segment within a bar represents a different category,
and the total height of the bar equals the sum of its segments
(only in a 100% stacked bar chart is every bar the same height).
 Useful for comparing the total values across different groups
while showing the contribution of each category.

These are just a few examples of the types of graphs commonly used in
data visualization. The choice of graph depends on the nature of the data,
the relationships being explored, and the insights you want to convey.
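
To make the histogram and box plot descriptions above more concrete, here is a minimal Python sketch of the numbers behind them. The sample values are made up, and the 1.5 x IQR rule used to flag outliers is a common convention assumed here, not something the notes above prescribe.

```python
import statistics

# Illustrative sample; the values are made up for the example.
data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9, 10, 30]


def histogram_counts(values, bin_width, start=0):
    """Count how many values fall in each bin [lower, lower + bin_width)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)
        lower = start + k * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


def box_plot_summary(values):
    """Quartiles, IQR, and points beyond the assumed 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


counts = histogram_counts(data, bin_width=5)
summary = box_plot_summary(data)
print(counts)
print(summary)
```

The histogram keys are the lower edges of the bins, and the box plot summary holds the values a chart would draw as the box, the median line, and the outlier markers.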

HOW TO LOAD DATA INTO POWER BI


Here's a step-by-step guide on how to load data into Power BI:

1. Launch Power BI Desktop:


 Start by opening Power BI Desktop, which you can download and
install for free from the Microsoft website.

2. Get Data:
 Once Power BI Desktop is open, click on the "Get Data" button
located in the Home tab on the ribbon.

3. Choose Data Source:


 In the "Get Data" window, you'll see a list of available data sources.
Choose the data source you want to connect to (e.g., Excel, SQL
Server, CSV file, Web, etc.).
 Click on the data source you want to use and then click "Connect."

4. Connect to Data Source:


 Depending on the data source you selected, you may need to
provide additional information such as file path, server name,
database name, or authentication credentials.
 Enter the required information and click "OK" or "Connect" to
establish the connection.

5. Load Data:
 After connecting to the data source, Power BI will display a
Navigator window showing a preview of the data available in the
selected source.
 Select the specific data tables, sheets, or views you want to import
into Power BI by checking the boxes next to them.
 You can preview the data by clicking on a table or sheet to ensure
you're selecting the correct data.
 Once you've selected the desired data, click on the "Load" button to
import it into Power BI.

6. Data Transformation (Optional):


 Instead of clicking "Load" in the Navigator window, you can click
"Transform Data" to open the Power Query Editor before the data is
loaded.
 In the Power Query Editor, clean, filter, and reshape the data as
needed.
 Once you're done with transformations, click "Close & Apply" to
apply the changes and return to Power BI.

7. Data Model:
 After loading the data, Power BI will create a data model based on
the imported tables and relationships between them.
 You can view and manage the data model by clicking on the "Model"
view in Power BI Desktop.
 In the data model view, you can define relationships between
tables, create calculated columns and measures, and perform other
data modeling tasks.

8. Refresh Data (For Imported Data):


 If you imported the data, Power BI works on a stored copy, so you
need to refresh it periodically to ensure your reports and
visualizations reflect the latest data. Live connections and
DirectQuery sources are queried directly instead.
 Click on the "Refresh" button in the ribbon to refresh the data from
the connected data source.

By following these steps, you can easily load data into Power BI and start
creating visualizations and reports to analyze your data.
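
As a rough analogy for the preview-then-load flow above, the following Python sketch reads a small CSV in two stages. It does not use any Power BI API; the column names and values are hypothetical, and the in-memory string stands in for a file on disk.

```python
import csv
import io

# Stand-in for a CSV file on disk; the columns and values are hypothetical.
RAW_CSV = """OrderID,Region,Amount
1,North,120.50
2,South,80.00
3,North,
4,East,42.25
"""


def preview(source, n=2):
    """Like the Navigator preview: read only the first n rows to check the shape."""
    reader = csv.DictReader(io.StringIO(source))
    return [row for _, row in zip(range(n), reader)]


def load(source):
    """Like clicking Load: read every row and type the columns (blank Amount -> None)."""
    table = []
    for row in csv.DictReader(io.StringIO(source)):
        amount = float(row["Amount"]) if row["Amount"] else None
        table.append(
            {"OrderID": int(row["OrderID"]), "Region": row["Region"], "Amount": amount}
        )
    return table


print(preview(RAW_CSV))
table = load(RAW_CSV)
print(len(table), "rows loaded")
```

The blank Amount in the third row is the kind of issue the optional transformation step is meant to catch before the data reaches a report.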

UNIT 3
NEW COLUMN POWER BI
Creating a new column in Power BI means adding a column whose values are
computed from existing data. You can do this with a custom column in
Power Query Editor (written in the M language) or with a calculated
column in the Data View (written in DAX). Here's how you can add a new
column using both methods:

Using Power Query Editor:


1. Load Data: Load your data into Power BI Desktop.
2. Open Power Query Editor: Click on "Transform Data" or navigate
to "Home" > "Transform Data" to open Power Query Editor.
3. Add New Column: In Power Query Editor, navigate to the "Add
Column" tab in the ribbon.
4. Choose New Column Type: Click on "Custom Column" to create a
new calculated column.
5. Enter Formula: In the "Custom Column" dialog box, enter the
formula for your new column using the M language. You can
reference existing columns in your dataset and use built-in functions
to define the calculation.
6. Click OK: After entering the formula, click OK to apply it.
7. Close and Apply: Once you've added the new column, close Power
Query Editor and apply the changes to your dataset.

Using Data View:


1. Open Data View: Click on the "Data" view in the left-hand panel to
see the tables in your dataset.
2. Select Table: Select the table to which you want to add the new
column.
3. Go to Modeling Tab: Go to the "Modeling" tab in the ribbon at the
top of the screen.
4. Click New Column: Click on the "New Column" button in the
"Modeling" tab.
5. Enter DAX Formula: A new column will appear in the table, and
the formula bar will become active. Enter your DAX formula in the
formula bar to define the calculation for the new column.
6. Press Enter: After entering the formula, press Enter to apply it.
Power BI will calculate the new column values based on the
expression you provided.
7. Rename Column (Optional): You can rename the newly created
column by double-clicking on its header and entering a new name.
8. Review Data: Review the data in the new column to ensure that
the calculation was performed correctly.
9. Refresh Data: If necessary, refresh your dataset to reflect the
changes made to the data model.

By following these steps, you can create a new column in Power BI using
either Power Query Editor or the Data View, depending on your preference
and workflow.
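
A calculated column is evaluated once per row and stored with the table. In DAX this might look like `Profit = Sales[Revenue] - Sales[Cost]`, where the table and column names are hypothetical. The Python sketch below only mimics that row-by-row behaviour; it is not how Power BI is implemented.

```python
# Hypothetical "Sales" table; each dict is one row.
sales = [
    {"Product": "A", "Revenue": 100.0, "Cost": 60.0},
    {"Product": "B", "Revenue": 250.0, "Cost": 200.0},
    {"Product": "C", "Revenue": 80.0, "Cost": 95.0},
]


def add_column(table, name, formula):
    """Return a new table with `name` computed once per row from that row's values."""
    return [{**row, name: formula(row)} for row in table]


# The formula sees one row at a time, like a calculated column's row context.
sales_with_profit = add_column(sales, "Profit", lambda r: r["Revenue"] - r["Cost"])

for row in sales_with_profit:
    print(row["Product"], row["Profit"])
```

Because the result is stored on every row, a calculated column can be used for slicing and filtering, at the cost of extra storage in the model.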

NEW MEASURE POWER BI


To create a new measure in Power BI, you can use the "New Measure"
option in the Data View or directly in the data model. Here's how you can
create a new measure in Power BI:

Using Data View:


1. Open Power BI Desktop: Launch Power BI Desktop and open the
report or dataset where you want to create the new measure.
2. Navigate to Data View: Click on the "Data" view in the left-hand
panel to see the tables in your dataset.
3. Select Table: Select the table to which you want to add the new
measure.
4. Go to Modeling Tab: Go to the "Modeling" tab in the ribbon at the
top of the screen.
5. Click New Measure: Click on the "New Measure" button in the
"Modeling" tab.
6. Enter DAX Formula: The formula bar will become active, and the new
measure will appear in the Fields pane under the selected table.
Enter your DAX formula in the formula bar to define the calculation
for the new measure.
7. Press Enter: After entering the formula, press Enter to apply it.
Unlike a calculated column, a measure is not stored row by row;
Power BI evaluates it whenever it is used in a visual.
8. Rename Measure (Optional): You can rename the newly created
measure by double-clicking on it in the Fields pane and entering a
new name.
9. Review Data: Add the measure to a visual, such as a card or table,
to check that the calculation was performed correctly.
10. Refresh Data: If necessary, refresh your dataset to reflect
the changes made to the data model.

Using Data Model:


1. Open Power BI Desktop: Launch Power BI Desktop and open the
report or dataset where you want to create the new measure.
2. Go to Data Model: Click on the "Model" view icon in the left-hand
panel to access the data model view.
3. Click New Measure: Right-click on the table where you want to
create the measure and select "New Measure" from the context
menu.
4. Enter DAX Formula: A new measure placeholder will appear in the
table. Enter your DAX formula in the formula bar to define the
calculation for the new measure.
5. Press Enter: After entering the formula, press Enter to apply it.
Power BI will calculate the measure values based on the expression
you provided.
6. Rename Measure (Optional): You can rename the newly created
measure by right-clicking on it and selecting "Rename."
7. Review Data: Add the measure to a visual in the Report view to
check that the calculation was performed correctly.
8. Refresh Data: If necessary, refresh your dataset to reflect the
changes made to the data model.

By following these steps, you can create a new measure in Power BI using
either the Data View or the Data Model, depending on your preference
and workflow.
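
A measure differs from a calculated column in that it is an aggregation evaluated on demand, under whatever filters the visual applies. In DAX a simple measure might be `Total Revenue = SUM(Sales[Revenue])`, with hypothetical names. The Python sketch below imitates that idea with keyword filters; it is a simplified analogy for filter context, not Power BI's actual engine.

```python
# Hypothetical "Sales" table; each dict is one row.
sales = [
    {"Region": "North", "Year": 2023, "Revenue": 100.0},
    {"Region": "North", "Year": 2024, "Revenue": 150.0},
    {"Region": "South", "Year": 2024, "Revenue": 70.0},
]


def total_revenue(table, **filters):
    """Sum Revenue over the rows that match every filter given by the caller."""
    return sum(
        row["Revenue"]
        for row in table
        if all(row[col] == value for col, value in filters.items())
    )


# The same definition gives different answers depending on the filters in effect.
print(total_revenue(sales))                  # no filters: grand total
print(total_revenue(sales, Region="North"))  # as if a slicer selected North
print(total_revenue(sales, Year=2024))       # as if a chart axis selected 2024
```

Nothing is stored per row here: the one definition is re-evaluated for each cell of a visual, which is why measures suit totals, averages, and ratios.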

DAX FUNCTIONS IN POWER BI


DAX (Data Analysis Expressions) functions are used in Power BI to perform
calculations, create calculated columns, and define measures. These
functions help manipulate and analyze data within Power BI. Here's an
overview of some commonly used DAX functions in Power BI:

1. Aggregate Functions:
 SUM: Calculates the sum of values in a column.
 AVERAGE: Calculates the arithmetic mean of values in a column.
 MIN / MAX: Returns the minimum or maximum value in a column.
 COUNT / COUNTROWS: COUNT counts the non-blank values in a
column; COUNTROWS counts the rows in a table.

2. Date and Time Functions:


 DATE / DATEVALUE: DATE creates a date value from year, month,
and day components; DATEVALUE converts a date stored as text into
a date value.
 TODAY: Returns the current date.
 YEAR / MONTH / DAY: Extracts the year, month, or day component
from a date.
 DATEDIFF: Calculates the difference between two dates.

3. Filter Functions:
 FILTER: Filters a table based on a condition.
 RELATED / RELATEDTABLE: RELATED retrieves a related value from
another table; RELATEDTABLE returns the related rows as a table.
 ALL / ALLEXCEPT / ALLSELECTED: Removes filters from a column
or table.

4. Logical Functions:
 IF / SWITCH: Conditionally evaluates expressions.
 AND / OR / NOT: Perform logical operations.

5. Text Functions:
 CONCATENATE / CONCATENATEX: CONCATENATE joins two strings
into one; CONCATENATEX joins the result of an expression evaluated
for each row of a table.
 LEFT / RIGHT / MID: Extracts substrings from a string.
 LEN: Returns the length of a string.

6. Statistical Functions:
 STDEV.S / STDEV.P: Calculates the standard deviation of a sample
or population.
 VAR.S / VAR.P: Calculates the variance of a sample or population.

7. Math Functions:
 ROUND / ROUNDDOWN / ROUNDUP: Rounds a number to a
specified number of digits.
 ABS / SQRT: Calculates the absolute value or square root of a
number.

8. Information Functions:
 ISBLANK / ISNUMBER / ISTEXT: Checks if a value is blank,
numeric, or text.
 CONTAINS / CONTAINSSTRING: CONTAINS checks whether a table
has a row with the given column values; CONTAINSSTRING checks
whether a string contains a specific substring.

9. Time Intelligence Functions:
 TOTALYTD / TOTALMTD / TOTALQTD: Calculates year-to-date,
month-to-date, or quarter-to-date totals.
 DATESYTD / DATESMTD / DATESQTD: Returns a set of dates for
the year-to-date, month-to-date, or quarter-to-date period.

10. Aggregate Table Functions:


 SUMMARIZE / SUMMARIZECOLUMNS: Creates a summary table
based on grouping and aggregation.

These are just a few examples of the many DAX functions available in
Power BI. DAX provides powerful capabilities for data manipulation,
calculation, and analysis, allowing users to derive insights from their data
effectively. Depending on your specific requirements, you can use DAX
functions to create calculated columns, measures, and complex
calculations in Power BI.
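
To show what a few of these functions compute, here is a Python sketch of rough analogues for COUNTROWS, COUNT, FILTER, and TOTALYTD. The table, its column names, and the helper functions are all hypothetical, and the real DAX functions work through the data model and filter context rather than plain lists (TOTALYTD, for instance, expects a date column and assumes a year ending 31 December by default).

```python
from datetime import date

# Hypothetical "Orders" table; None stands in for a blank cell.
orders = [
    {"OrderDate": date(2023, 12, 20), "Region": "North", "Amount": 50.0},
    {"OrderDate": date(2024, 1, 15), "Region": "North", "Amount": 100.0},
    {"OrderDate": date(2024, 2, 10), "Region": "South", "Amount": None},
    {"OrderDate": date(2024, 3, 5), "Region": "South", "Amount": 40.0},
]


def count_rows(table):
    """Rough analogue of COUNTROWS: every row counts."""
    return len(table)


def count(table, column):
    """Rough analogue of COUNT: only rows with a non-blank value in `column`."""
    return sum(1 for row in table if row[column] is not None)


def filter_rows(table, predicate):
    """Rough analogue of FILTER: keep the rows where the condition holds."""
    return [row for row in table if predicate(row)]


def total_ytd(table, column, as_of):
    """Rough analogue of TOTALYTD: sum from 1 January of as_of's year through as_of."""
    year_start = date(as_of.year, 1, 1)
    in_period = filter_rows(table, lambda r: year_start <= r["OrderDate"] <= as_of)
    return sum(r[column] for r in in_period if r[column] is not None)


print(count_rows(orders), count(orders, "Amount"))
print(total_ytd(orders, "Amount", date(2024, 3, 31)))
```

The difference between the two counts comes from the blank Amount, and the December 2023 order is excluded from the 2024 year-to-date total.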

POWER BI QUERY
In Power BI, the term "query" typically refers to the process of extracting, transforming, and
loading (ETL) data into your dataset using Power Query Editor. Power Query is a data
connectivity and preparation tool that allows you to connect to various data sources,
transform data, and shape it into the desired format before loading it into Power BI.

Here's an overview of how you can use Power Query to create and manipulate queries in
Power BI:

1. Connect to Data Source:


 Open Power BI Desktop and click on the "Home" tab.
 Click on "Get Data" to connect to your desired data source. Power BI supports a wide
range of data sources, including databases, files (Excel, CSV), online services (Azure,
Google Analytics), and more.

2. Transform Data with Power Query Editor:


 After connecting to a data source, click "Transform Data" to open Power Query Editor.
 In Power Query Editor, you can perform various data transformation tasks, such as:
 Filtering rows/columns.
 Renaming columns.
 Removing duplicates.
 Splitting columns.
 Merging/combining tables.
 Adding custom columns with calculated values using M language or Power
Query functions.
3. Query Steps:
 Each transformation you apply in Power Query Editor is recorded as a query step.
 You can view and manage query steps in the "Applied Steps" pane on the right-hand
side of the Power Query Editor.
 You can reorder, modify, or delete query steps to refine your data transformation
process.

4. Load Data into Power BI:


 Once you've completed your data transformations, click on "Close & Apply" to load
the transformed data into Power BI.
 Each query is loaded as a table in the dataset. To add its rows to an existing table
instead, combine the queries with "Append Queries" before loading.

5. Refresh Data:
 After loading the data into Power BI, you can refresh it to reflect any changes made to
the source data.
 Click on "Refresh" in the Home tab to refresh the dataset and update it with the latest
data from the source.

6. Advanced Data Modeling:


 Beyond basic data transformations, Power BI Desktop supports more advanced data
modeling, such as creating relationships between tables, defining hierarchies, and
applying row-level security. These are set up in the Model view and the "Modeling"
tab after the data is loaded, rather than in Power Query Editor.

7. Query Editor Options:


 Power Query Editor offers various options and features to enhance your data
transformation experience, such as data profiling, query diagnostics, and query
parameters.

By using Power Query in Power BI, you can efficiently prepare and shape your data to meet
your analysis and reporting needs, ensuring that you work with clean, well-structured data for
visualization and analysis.
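
The way query steps build on each other can be sketched as a small pipeline. This Python example is only an analogy for the "Applied Steps" pane (Power Query itself records steps in the M language); the source rows, column names, and step functions are hypothetical.

```python
# Hypothetical source rows, as they might arrive from a file or database.
source = [
    {"cust_nm": "Asha", "region": "North", "amt": 120.0},
    {"cust_nm": "Ravi", "region": "South", "amt": 0.0},
    {"cust_nm": "Asha", "region": "North", "amt": 120.0},
    {"cust_nm": "Meera", "region": "East", "amt": 75.0},
]


def filter_rows(table):
    """Keep rows with a positive amount."""
    return [row for row in table if row["amt"] > 0]


def rename_columns(table):
    """Give the columns readable names."""
    names = {"cust_nm": "Customer", "region": "Region", "amt": "Amount"}
    return [{names[k]: v for k, v in row.items()} for row in table]


def remove_duplicates(table):
    """Drop rows that repeat an earlier row exactly, keeping first occurrences."""
    seen, result = set(), []
    for row in table:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# The ordered list plays the role of the "Applied Steps" pane.
applied_steps = [filter_rows, rename_columns, remove_duplicates]


def run_query(table, steps):
    """Apply each step to the output of the previous one; the source is never modified."""
    for step in steps:
        table = step(table)
    return table


result = run_query(source, applied_steps)
print(result)
```

Since each step only sees the previous step's output, removing or reordering a step changes everything after it, which is also why editing an early applied step in Power Query can break later ones.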
