Data Visualisation Report
PROJECT REPORT
on
Submitted By
23CS172 – SRIMAHA K
23CS203 – VINISHA B
23CS205 – VISALINI K
24CSL13 – SANJAI V
PROBLEM STATEMENT:
Analysis of energy consumption and production patterns across the world, grouped
by continent.
Graphs: Stacked area chart, Bar chart, Heatmap, Scatter plot, Line chart, Donut
chart, Histogram
Scenario: Analyze energy consumption and production trends across different energy
sources like solar, wind, and fossil fuels. Visualize energy usage by sector, region, and
time to identify sustainability trends.
INSTALLING PACKAGES IN R:
ggplot2:
A versatile library for creating static, publication-quality charts using the grammar
of graphics.
Supports a wide range of customizable plots, including bar charts, scatter plots,
and line graphs.
plotly:
Adds interactivity to visualizations with zooming, tooltips, and dynamic features.
Integrates well with ggplot2 and supports advanced visualizations like 3D plots
and interactive maps.
dplyr:
Simplifies data manipulation with functions for filtering, transforming, and
summarizing data.
Works seamlessly with visualization libraries like ggplot2 for smooth data-to-plot
workflows.
tidyr:
Reshapes messy datasets into tidy formats, essential for effective plotting.
Provides tools like pivot_longer() and pivot_wider() for reorganizing data
columns.
shiny:
Enables the creation of interactive web applications and dashboards in R.
Allows integration of visualizations and user controls for real-time data
exploration.
corrplot:
Specialized in visualizing correlation matrices with intuitive styles like heatmaps
or circle diagrams.
Helps identify and interpret relationships between multiple variables at a glance.
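The packages described above can be checked and installed in one pass. The following is a minimal sketch: the helper name packages_to_install is hypothetical, and the install call is left commented out so that nothing is downloaded by accident.

```r
# Packages named in this section
pkgs <- c("ggplot2", "plotly", "dplyr", "tidyr", "shiny", "corrplot")

# Hypothetical helper: return the packages that are not yet installed
packages_to_install <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- packages_to_install(pkgs)
print(to_install)

# Uncomment to install anything that is missing (needs a CRAN mirror)
# if (length(to_install) > 0) install.packages(to_install)

# Load a package once it is installed, for example:
# library(ggplot2)
```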
Data visualization in R provides numerous graphing options to help analyze and present
data effectively. Here are some of the most commonly used basic graphs along with
their descriptions:
1. Line Chart
o Description: A line chart connects data points with a line, making it ideal
for showing trends over time or sequences.
o Use Case: Monitoring the growth of renewable energy production over
decades.
o Advantages: Clearly shows upward or downward trends.
2. Bar Chart
o Description: Displays categorical data as rectangular bars, where the
length of each bar is proportional to its value.
o Use Case: Comparing energy production across different countries.
o Advantages: Excellent for visualizing comparative data across categories.
3. Scatter Plot
o Description: Uses dots to represent the relationship between two
continuous variables.
o Use Case: Analyzing the relationship between energy production and
energy consumption.
o Advantages: Highlights correlations, clusters, and outliers.
4. Pie Chart
o Description: A circular chart divided into slices to show proportions of a
whole.
o Use Case: Representing energy production proportions among various
energy types.
o Advantages: Intuitive for visualizing percentages or proportions.
5. Histogram
o Description: Represents the frequency distribution of a numeric variable.
o Use Case: Showing the distribution of energy consumption quantities.
o Advantages: Highlights patterns such as skewness or clustering.
6. Box Plot
o Description: Summarizes a data set using minimum, first quartile, median,
third quartile, and maximum values.
o Use Case: Analyzing variability in energy usage across countries.
o Advantages: Great for identifying outliers and variability.
7. Heatmap
o Description: Represents data in a matrix format with different intensities
or colors to highlight patterns.
o Use Case: Analyzing energy usage by country and year.
o Advantages: Provides a quick overview of patterns in complex data.
8. Donut Chart
o Description: A variation of the pie chart with a blank center.
o Use Case: Highlighting proportions in energy production categories.
o Advantages: Provides similar insights to pie charts but with a modern
appearance.
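As an illustration of two of the basic graphs listed above, the sketch below draws a line chart and a bar chart with ggplot2. The data frame is toy data invented for this example and is not taken from the report's dataset; the column names year, category, and quantity are chosen to match the dataset columns named in this report.

```r
library(ggplot2)

# Toy data, invented for illustration only
energy <- data.frame(
  year     = rep(2000:2004, times = 2),
  category = rep(c("solar", "wind"), each = 5),
  quantity = c(1, 2, 4, 7, 11, 3, 4, 6, 9, 12)
)

# Line chart: one line per energy category, showing the trend over time
line_plot <- ggplot(energy, aes(x = year, y = quantity, colour = category)) +
  geom_line() +
  geom_point() +
  labs(title = "Production over time", x = "Year", y = "Quantity")

# Bar chart: total quantity per category
totals <- aggregate(quantity ~ category, data = energy, FUN = sum)
bar_plot <- ggplot(totals, aes(x = category, y = quantity)) +
  geom_col() +
  labs(title = "Total production by category", x = "Category", y = "Quantity")

print(line_plot)
print(bar_plot)
```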
FEATURES:
Supports diverse graphs (line, bar, heatmaps, etc.).
Highly customizable through libraries like ggplot2.
Enables interactive visuals using plotly or shiny.
MERITS:
Free and open-source.
Reproducible and scalable for large datasets.
Integrates statistical analysis with visuals.
The dataset provides insights into global energy production and consumption,
with the following characteristics:
1. Structure:
o Columns: Key variables include country_or_area, year, quantity,
category, and quantity_footnotes.
o Rows: Over 1 million records.
2. Variable Types:
o Numeric: quantity (energy data) shows wide variability with missing
and extreme values.
o Categorical: category (energy types) and country_or_area (regions)
are ideal for grouping and filtering.
3. Missing Values:
o Significant missing data in quantity and quantity_footnotes, requiring
cleaning.
o Use colSums(is.na(energy_data)) to calculate missing values.
4. Key Insights:
o Covers multiple decades, enabling trend analysis.
o Includes diverse countries and energy types, with some regions or
categories likely to dominate.
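The missing-value check mentioned in the list above can be tried on a small stand-in data frame. This is only a sketch: the rows below are invented, the column names follow the dataset description, and dropping rows with a missing quantity is one possible cleaning step rather than the method used in this report.

```r
# Toy stand-in for energy_data; the values are invented for illustration
energy_data <- data.frame(
  country_or_area    = c("A", "A", "B", "B", "C"),
  year               = c(2000, 2001, 2000, 2001, 2000),
  quantity           = c(10.5, NA, 7.2, 8.1, NA),
  category           = c("solar", "solar", "wind", "wind", "solar"),
  quantity_footnotes = c(NA, 1, NA, NA, NA)
)

# Number of missing values in each column
na_counts <- colSums(is.na(energy_data))
print(na_counts)

# One possible cleaning step: keep only rows where quantity is present
clean_data <- energy_data[!is.na(energy_data$quantity), ]
print(nrow(clean_data))
```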
CREATING GRAPHS IN R:
INFERENCE:
The line chart shows the variation of sentiment over time, helping to
understand whether sentiments are becoming more positive or negative across
the periods observed.
If the trend fluctuates significantly, it could indicate periods of strong sentiment
shifts (positive or negative).
The chart can be used to monitor and analyze sentiment dynamics and identify
possible events or changes influencing sentiment.
DESCRIPTION:
The line chart shows the trend of sentiment scores over 2023, with time on the x-
axis and sentiment scores on the y-axis ranging from -1 to 1. The blue line
connects monthly sentiment data points (red dots), illustrating changes in
sentiment over the year. This visualization helps identify trends, fluctuations, and
patterns in sentiment, highlighting periods of positivity or negativity.
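A chart of the kind described above (blue line, red points, y-axis from -1 to 1) could be produced roughly as follows. The monthly scores are placeholder values invented for this sketch, not the data behind the report's figure.

```r
library(ggplot2)

# Placeholder monthly scores for 2023, invented for illustration
sentiment <- data.frame(
  month = seq(as.Date("2023-01-01"), by = "month", length.out = 12),
  score = c(0.1, 0.3, -0.2, 0.0, 0.4, 0.6, 0.2, -0.1, -0.4, 0.1, 0.5, 0.7)
)

# Blue line connecting the monthly values, with red dots at each data point
sentiment_line <- ggplot(sentiment, aes(x = month, y = score)) +
  geom_line(colour = "blue") +
  geom_point(colour = "red") +
  scale_y_continuous(limits = c(-1, 1)) +
  labs(title = "Sentiment over 2023", x = "Month", y = "Sentiment score")

print(sentiment_line)
```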
b. Bar chart (sentiment distribution across platforms)
INFERENCE:
The bar chart compares sentiment distribution across different platforms (e.g.,
Twitter, Facebook, Instagram).
A higher bar indicates a more positive or negative sentiment on a given
platform, helping to identify which platforms have more favorable or unfavorable
sentiment.
It can guide platform-specific strategies for engagement or content adjustment
based on sentiment distribution.
DESCRIPTION:
The bar chart shows sentiment distribution across four platforms: Twitter,
Facebook, Instagram, and LinkedIn. Each bar's height represents the sentiment
score, with Facebook having the highest score (50) and Instagram the lowest (20).
This visualization helps compare sentiment levels across platforms, identifying
where sentiment is strongest or weakest.
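A bar chart matching this description might be sketched as below. Only the Facebook (50) and Instagram (20) scores come from the description; the Twitter and LinkedIn values are assumptions added so that the example runs.

```r
library(ggplot2)

platforms <- data.frame(
  platform = c("Twitter", "Facebook", "Instagram", "LinkedIn"),
  score    = c(35, 50, 20, 30)  # Twitter and LinkedIn values are assumed
)

# One bar per platform; bar height is the sentiment score
platform_bar <- ggplot(platforms, aes(x = platform, y = score)) +
  geom_col(fill = "steelblue") +
  labs(title = "Sentiment by platform", x = "Platform", y = "Sentiment score")

print(platform_bar)
```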
INFERENCE:
The pie chart shows the proportion of positive vs. negative sentiment in the
dataset.
A larger slice of "Positive" indicates that the majority of posts or interactions
are favorable, while a larger "Negative" slice indicates discontent or
dissatisfaction.
DESCRIPTION:
The pie chart visualizes the distribution of positive and negative sentiments, with
slices representing their proportions. The "Positive" segment (60%) is larger,
indicating a majority of positive sentiment, while the "Negative" segment (40%) is
smaller. This chart provides a quick, clear overview of the sentiment balance
within the dataset.
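The 60/40 split described here can be drawn as a pie chart in ggplot2 by plotting a single stacked bar in polar coordinates. This is a sketch of one common approach, not necessarily how the report's figure was made.

```r
library(ggplot2)

# Proportions taken from the description: 60% positive, 40% negative
shares <- data.frame(
  sentiment = c("Positive", "Negative"),
  percent   = c(60, 40)
)

# A single stacked bar wrapped around a circle becomes a pie chart
pie_plot <- ggplot(shares, aes(x = "", y = percent, fill = sentiment)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void() +
  labs(title = "Positive vs. negative sentiment")

print(pie_plot)
```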
d. Heatmap (engagement level across regions)
INFERENCE:
The heatmap visualizes the engagement level across different regions and energy
categories.
Regions and categories with darker colors represent higher engagement levels,
while lighter regions indicate lower engagement.
This can help identify regions that are more engaged with specific energy sources,
potentially guiding regional strategies for energy campaigns.
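A heatmap of this kind, with darker tiles for higher values, could be sketched with geom_tile() as follows. The region names, energy categories, and engagement values are all invented for illustration.

```r
library(ggplot2)

# Every region/category combination, with invented engagement values
engagement_grid <- expand.grid(
  region   = c("Africa", "Asia", "Europe"),
  category = c("solar", "wind", "fossil fuels"),
  stringsAsFactors = FALSE
)
engagement_grid$engagement <- c(20, 65, 80, 35, 50, 90, 70, 85, 40)

# Darker fill means higher engagement
heat_plot <- ggplot(engagement_grid,
                    aes(x = category, y = region, fill = engagement)) +
  geom_tile(colour = "white") +
  scale_fill_gradient(low = "lightyellow", high = "darkred") +
  labs(title = "Engagement by region and energy category",
       x = "Energy category", y = "Region", fill = "Engagement")

print(heat_plot)
```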
DESCRIPTION:
INFERENCE:
The scatter plot shows the relationship between sentiment score and
engagement rate for each post or data point.
Positive sentiment may correlate with higher engagement, or vice versa,
depending on the data spread.
This plot can identify if higher engagement rates lead to more positive or negative
sentiments, providing insights into how engagement influences sentiment.
DESCRIPTION:
The scatter plot displays the relationship between sentiment scores (ranging from
-1 to 1) and engagement rates (0 to 100). Each blue point represents an
observation, showing how engagement varies with sentiment. This chart helps
identify trends or correlations, such as whether positive or negative sentiments
drive higher engagement.
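A scatter plot with the ranges given above (sentiment from -1 to 1, engagement from 0 to 100) can be sketched as below. The points are randomly generated placeholders, so any correlation they show is meaningless; cor() is included only to show how the relationship could be quantified on real data.

```r
library(ggplot2)

# Randomly generated placeholder observations
set.seed(1)
posts <- data.frame(
  sentiment  = runif(50, min = -1, max = 1),
  engagement = runif(50, min = 0, max = 100)
)

# Each blue point is one observation
scatter_plot <- ggplot(posts, aes(x = sentiment, y = engagement)) +
  geom_point(colour = "blue") +
  labs(title = "Sentiment score vs. engagement rate",
       x = "Sentiment score", y = "Engagement rate")

print(scatter_plot)

# Pearson correlation between the two variables
r <- cor(posts$sentiment, posts$engagement)
print(r)
```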
INFERENCE:
The bubble chart reveals how post reach correlates with sentiment impact, with
larger bubbles representing posts with higher reach.
It helps identify if posts with higher reach tend to have a more positive or
negative sentiment, and how reach affects sentiment impact.
Larger bubbles further emphasize high-impact posts, which can be used for
strategic decisions in content amplification.
DESCRIPTION:
The bubble chart visualizes the relationship between post reach and sentiment
impact, with each bubble representing a post. The bubble size indicates an
additional variable (e.g., interactions), and colors range from red (negative
sentiment) to green (positive sentiment). This chart helps identify how sentiment
impact varies with post reach and highlights influential posts.
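A bubble chart is a scatter plot with an extra variable mapped to point size. The sketch below follows the description above (size for interactions, colour from red to green for sentiment impact); the data are randomly generated placeholders and the column names are assumptions.

```r
library(ggplot2)

# Randomly generated placeholder posts
set.seed(2)
bubble_data <- data.frame(
  reach        = runif(30, min = 100, max = 10000),
  impact       = runif(30, min = -1, max = 1),
  interactions = runif(30, min = 10, max = 500)
)

# Size encodes interactions; colour runs from red (negative) to green (positive)
bubble_plot <- ggplot(bubble_data,
                      aes(x = reach, y = impact,
                          size = interactions, colour = impact)) +
  geom_point(alpha = 0.7) +
  scale_colour_gradient(low = "red", high = "green") +
  labs(title = "Post reach vs. sentiment impact",
       x = "Post reach", y = "Sentiment impact")

print(bubble_plot)
```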
Additional graphs:
Hexbin chart (Sentiment score vs. Engagement Rate)
INFERENCE:
DESCRIPTION:
The hexbin chart visualizes the relationship between sentiment score and
engagement rate by grouping the data into hexagonal bins. Each bin's color
intensity indicates the count of data points within it, with darker blue areas
representing higher concentrations. The chart helps identify patterns and
correlations between sentiment and engagement, especially in areas where the
data points cluster. This visualization is particularly useful for large datasets with a
dense distribution of points.
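A hexbin chart like the one described can be sketched with geom_hex(), which needs the separate hexbin package at drawing time. The data are randomly generated placeholders, and the bin count of 15 is an arbitrary choice for this example.

```r
library(ggplot2)

# Randomly generated placeholder data; large enough for binning to be useful
set.seed(3)
hex_data <- data.frame(
  sentiment  = rnorm(2000, mean = 0.2, sd = 0.3),
  engagement = rnorm(2000, mean = 50, sd = 15)
)

# Each hexagon is coloured by the number of points inside it (darker = more)
hex_plot <- ggplot(hex_data, aes(x = sentiment, y = engagement)) +
  geom_hex(bins = 15) +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Sentiment score vs. engagement rate",
       x = "Sentiment score", y = "Engagement rate", fill = "Count")

# Drawing requires the hexbin package, so only print when it is available
if (requireNamespace("hexbin", quietly = TRUE)) {
  print(hex_plot)
}
```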
POWER BI DESCRIPTION
Power BI is a business analytics tool developed by Microsoft that enables users to
visualize and share insights from their data. It provides interactive visualizations,
business intelligence capabilities, and data analysis tools to help organizations
make informed decisions. Power BI integrates with various data sources and
allows users to create reports, dashboards, and interactive data models without
the need for programming skills.
POWER BI FEATURES
1. Data Connectivity:
o Power BI allows seamless connection to a wide variety of data
sources such as Excel, SQL Server, cloud services, and APIs, enabling
users to easily pull data from different platforms for analysis.
2. Interactive Dashboards:
o Users can create dynamic, interactive dashboards that offer drill-
down functionality, allowing them to explore data from different
perspectives in real-time.
3. Powerful Analytics with DAX:
o Power BI includes advanced analytics capabilities with DAX (Data
Analysis Expressions), allowing users to create custom calculations,
trends, and metrics to gain deeper insights into their data.
CONCLUSION: