Data Visualisation Report[1]

The project report analyzes global energy consumption and production patterns using various data visualization techniques in R, including line charts, bar charts, and heatmaps. It emphasizes the importance of data cleaning and the use of libraries like ggplot2 and plotly for effective visualization. Additionally, a Power BI dashboard is created to provide interactive insights into energy trends, enhancing the understanding of sustainability efforts.

Department of Computer Science and Engineering

KPR Institute of Engineering and Technology

U21CS303 – DATA VISUALIZATION

PROJECT REPORT

on

ENERGY CONSUMPTION AND PRODUCTION PATTERNS

Submitted By

23CS172 – SRIMAHA K
23CS203 – VINISHA B
23CS205 – VISALINI K
24CSL13 – SANJAI V
PROBLEM STATEMENT:

Analyze energy consumption and production patterns across the world, with countries
grouped by continent.
Graphs: Stacked area chart, Bar chart, Heatmap, Scatter plot, Line chart, Donut
chart, Histogram
Scenario: Analyze energy consumption and production trends across different energy
sources like solar, wind, and fossil fuels. Visualize energy usage by sector, region, and
time to identify sustainability trends.

INTRODUCTION ABOUT DATA VISUALISATION:

Data visualization enables efficient understanding and communication of data insights.
It involves presenting data in graphical formats such as charts, graphs, and maps,
facilitating decision-making. In fields like energy production and consumption, data
visualization allows analysts to:
• Identify key trends (e.g., growth in renewables).
• Pinpoint anomalies (e.g., sudden drops in fossil fuel production).
• Compare variables across multiple dimensions.

INSTALLING PACKAGES IN R:
ggplot2:
• A versatile library for creating static, publication-quality charts using the grammar
of graphics.
• Supports a wide range of customizable plots, including bar charts, scatter plots,
and line graphs.
plotly:
• Adds interactivity to visualizations with zooming, tooltips, and dynamic features.
• Integrates well with ggplot2 and supports advanced visualizations like 3D plots
and interactive maps.
dplyr:
• Simplifies data manipulation with functions for filtering, transforming, and
summarizing data.
• Works seamlessly with visualization libraries like ggplot2 for smooth data-to-plot
workflows.
tidyr:
• Reshapes messy datasets into tidy formats, essential for effective plotting.
• Provides tools like pivot_longer() and pivot_wider() for reorganizing data
columns.
shiny:
• Enables the creation of interactive web applications and dashboards in R.
• Allows integration of visualizations and user controls for real-time data
exploration.
corrplot:
• Specialized in visualizing correlation matrices with intuitive styles like heatmaps
or circle diagrams.
• Helps identify and interpret relationships between multiple variables at a glance.

Fig 1. Installing the packages in RStudio
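
The code shown in Fig 1 is not reproduced in the text. A minimal sketch of one way to check for the six packages listed above is given here; the variable names are illustrative, and the install call is left commented out so that the snippet does not touch the network unless the reader opts in.

```r
# Packages named in this section
required_pkgs <- c("ggplot2", "plotly", "dplyr", "tidyr", "shiny", "corrplot")

# requireNamespace() returns TRUE/FALSE without attaching the package
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```

Once installed, each package is attached in a session with library(), for example library(ggplot2).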

BASIC GRAPHS AND THEIR DESCRIPTIONS:

Data visualization in R provides numerous graphing options to help analyze and present
data effectively. Here are some of the most commonly used basic graphs along with
their descriptions:

1. Line Chart
o Description: A line chart connects data points with a line, making it ideal
for showing trends over time or sequences.
o Use Case: Monitoring the growth of renewable energy production over
decades.
o Advantages: Clearly shows upward or downward trends.
2. Bar Chart
o Description: Displays categorical data as rectangular bars, where the
length of each bar is proportional to its value.
o Use Case: Comparing energy production across different countries.
o Advantages: Excellent for visualizing comparative data across categories.
3. Scatter Plot
o Description: Uses dots to represent the relationship between two
continuous variables.
o Use Case: Analyzing the relationship between energy production and
energy consumption.
o Advantages: Highlights correlations, clusters, and outliers.
4. Pie Chart
o Description: A circular chart divided into slices to show proportions of a
whole.
o Use Case: Representing energy production proportions among various
energy types.
o Advantages: Intuitive for visualizing percentages or proportions.
5. Histogram
o Description: Represents the frequency distribution of a numeric variable.
o Use Case: Showing the distribution of energy consumption quantities.
o Advantages: Highlights patterns such as skewness or clustering.
6. Box Plot
o Description: Summarizes a data set using minimum, first quartile, median,
third quartile, and maximum values.
o Use Case: Analyzing variability in energy usage across countries.
o Advantages: Great for identifying outliers and variability.
7. Heatmap
o Description: Represents data in a matrix format with different intensities
or colors to highlight patterns.
o Use Case: Analyzing energy usage by country and year.
o Advantages: Provides a quick overview of patterns in complex data.

8. Donut Chart
o Description: A variation of the pie chart with a blank center.
o Use Case: Highlighting proportions in energy production categories.
o Advantages: Provides similar insights to pie charts but with a modern
appearance.

BASICS OF R FOR VISUALISATION: FEATURES AND MERITS

R is a powerful language for data analysis and visualization, offering extensive
flexibility and depth for creating visualizations. Below are its key features and merits:

FEATURES:
• Supports diverse graphs (line, bar, heatmaps, etc.).
• Highly customizable through libraries like ggplot2.
• Enables interactive visuals using plotly or shiny.
MERITS:
• Free and open-source.
• Reproducible and scalable for large datasets.
• Integrates statistical analysis with visuals.

IMPORTING AND CLEANING DATASET IN R:

IMPORTING THE DATASET:

Fig 2. Importing the dataset in R
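
The import code in Fig 2 is not available in the text. As a hedged sketch, a CSV with the column names reported later in the Data Summary could be read with read.csv(). The file below is a three-row stand-in written to a temporary path so that the example runs anywhere; the rows are invented and the real file name is not known.

```r
# Write a tiny stand-in CSV (invented rows); in practice csv_path would
# point at the real dataset file, whose name is not given in the report.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "country_or_area,year,quantity,category,quantity_footnotes",
  "India,2012,150.5,solar,",
  "Germany,2012,,wind,1",
  "Brazil,2013,320,hydro,"
), csv_path)

# Read the file into a data frame; text columns stay as character
energy_data <- read.csv(csv_path, stringsAsFactors = FALSE)

head(energy_data)  # first rows
dim(energy_data)   # number of rows and columns
```

Blank fields in numeric columns are read as NA automatically, while blank fields in text columns remain empty strings, which is why the cleaning step that follows converts them explicitly.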


DATA CLEANING:

1. Replace empty strings and handle missing values

Fig 3. Replacing the missing data
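
Since the code in Fig 3 is missing, the following is only a sketch of one common base-R way to turn empty or whitespace-only strings into NA. The data frame and the helper blank_to_na() are hypothetical and exist purely for illustration.

```r
# Invented sample with blank strings in text columns
energy_data <- data.frame(
  country_or_area = c("India", "", "Brazil"),
  category        = c("solar", "wind", "  "),
  quantity        = c(150.5, NA, 320),
  stringsAsFactors = FALSE
)

# Hypothetical helper: blank or whitespace-only strings become NA
blank_to_na <- function(col) {
  if (is.character(col)) {
    col[!is.na(col) & trimws(col) == ""] <- NA
  }
  col
}

# Apply to every column while keeping the data frame structure
energy_data[] <- lapply(energy_data, blank_to_na)

colSums(is.na(energy_data))  # missing values per column
```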

2. Removing rows with missing data

Fig 4. Removing the rows with empty data
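
The code in Fig 4 is not reproduced, so the sketch below shows one possible approach on invented rows. It assumes that only the columns needed for plotting (here called key_cols, a hypothetical choice) must be complete. Calling na.omit() on the whole data frame would also drop every row whose quantity_footnotes is empty, which the Data Summary suggests is common.

```r
# Invented sample: footnotes are mostly missing, as in the real data
energy_data <- data.frame(
  country_or_area    = c("India", NA, "Brazil", "Kenya"),
  year               = c(2012, 2012, 2013, 2013),
  quantity           = c(150.5, 80, NA, 42),
  quantity_footnotes = c(NA, NA, 1, NA)
)

# Keep rows that are complete in the columns that matter (assumed set)
key_cols <- c("country_or_area", "year", "quantity")
cleaned_data <- energy_data[complete.cases(energy_data[, key_cols]), ]

nrow(cleaned_data)            # rows kept with the targeted filter
nrow(na.omit(energy_data))    # rows kept if every column must be complete
```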


3. Checking for duplicates

Fig 5. Checking for duplicates in the dataset
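
In place of the missing Fig 5 code, a minimal base-R sketch of a duplicate check on invented rows could look like this. duplicated() flags every repeat of an earlier row, and unique() drops those repeats.

```r
# Invented sample containing one exact duplicate row
energy_data <- data.frame(
  country_or_area = c("India", "India", "Brazil"),
  year            = c(2012, 2012, 2013),
  quantity        = c(150.5, 150.5, 320)
)

# Count rows that repeat an earlier row exactly
n_duplicates <- sum(duplicated(energy_data))

# Drop the repeats, keeping the first occurrence of each row
deduplicated <- unique(energy_data)
```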


DATA SUMMARY:

The dataset provides insights into global energy production and consumption,
with the following characteristics:
1. Structure:
o Columns: Key variables include country_or_area, year, quantity,
category, and quantity_footnotes.
o Rows: Over 1 million records.
2. Variable Types:
o Numeric: quantity (energy data) shows wide variability with missing
and extreme values.
o Categorical: category (energy types) and country_or_area (regions)
are ideal for grouping and filtering.
3. Missing Values:
o Significant missing data in quantity and quantity_footnotes, requiring
cleaning.
o Use colSums(is.na(energy_data)) to count the missing values in each column.
4. Key Insights:
o Covers multiple decades, enabling trend analysis.
o Includes diverse countries and energy types, with some regions or
categories likely to dominate.

Fig 6. Code for displaying the Structure of cleaned dataset


Fig 7. Structure of the dataset
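
The code and output of Fig 6 and Fig 7 are not available. The sketch below shows the standard base-R calls that produce this kind of summary, applied to a small invented data frame that reuses the column names listed above.

```r
# Invented sample using the column names from the Data Summary
energy_data <- data.frame(
  country_or_area = c("India", "Germany", "Brazil", "Kenya"),
  year            = c(2012L, 2012L, 2013L, 2014L),
  quantity        = c(150.5, NA, 320, 42),
  category        = c("solar", "wind", "hydro", "solar"),
  stringsAsFactors = FALSE
)

str(energy_data)               # column names, types, and sample values
summary(energy_data$quantity)  # range, quartiles, and NA count

missing_per_column <- colSums(is.na(energy_data))  # missing values per column
year_range <- range(energy_data$year)              # time span covered
category_counts <- table(energy_data$category)     # records per energy type
```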

CREATING GRAPHS IN R:

a. Line Chart: Trends in Energy Production Over Time


Fig 8 & 9. Code and the resultant line graph

INFERENCE:

• The line chart shows the variation of sentiment over time, helping to
understand whether sentiments are becoming more positive or negative across
the periods observed.
• If the trend fluctuates significantly, it could indicate periods of strong sentiment
shifts (positive or negative).
• The chart can be used to monitor and analyze sentiment dynamics and identify
possible events or changes influencing sentiment.

DESCRIPTION:

The line chart shows the trend of sentiment scores over 2023, with time on the
x-axis and sentiment scores on the y-axis ranging from -1 to 1. The blue line
connects monthly sentiment data points (red dots), illustrating changes in
sentiment over the year. This visualization helps identify trends, fluctuations, and
patterns in sentiment, highlighting periods of positivity or negativity.
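
The code in Fig 8 is not reproduced in the text. The sketch below rebuilds a chart matching the description above (monthly points for 2023, a blue line, red dots, y-axis from -1 to 1) with ggplot2. The scores are randomly generated placeholders, not the report's data, and all object names are illustrative.

```r
library(ggplot2)

# Placeholder data: one random score per month of 2023
set.seed(1)
sentiment_trend <- data.frame(
  month = seq(as.Date("2023-01-01"), by = "month", length.out = 12),
  sentiment_score = round(runif(12, min = -1, max = 1), 2)
)

line_plot <- ggplot(sentiment_trend, aes(x = month, y = sentiment_score)) +
  geom_line(colour = "blue") +             # trend line
  geom_point(colour = "red", size = 2) +   # monthly data points
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  ylim(-1, 1) +
  labs(title = "Sentiment trend over 2023", x = "Month", y = "Sentiment score")

# print(line_plot) renders the chart
```
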
b. Bar chart (sentiment distribution across platforms)

Fig 10. Code for the bar chart

Fig 11. The resultant bar graph


INFERENCE:

• The bar chart compares sentiment distribution across different platforms (e.g.,
Twitter, Facebook, Instagram).
• A taller bar indicates a higher sentiment score on that platform, which helps
identify the platforms where sentiment is most and least favorable.
• It can guide platform-specific strategies for engagement or content adjustment
based on sentiment distribution.

DESCRIPTION:

The bar chart shows sentiment distribution across four platforms: Twitter,
Facebook, Instagram, and LinkedIn. Each bar's height represents the sentiment
score, with Facebook having the highest score (50) and Instagram the lowest (20).
This visualization helps compare sentiment levels across platforms, identifying
where sentiment is strongest or weakest.
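
As the code in Fig 10 is missing, here is a hedged ggplot2 sketch of a bar chart consistent with the description. Only the Facebook (50) and Instagram (20) values come from the text; the Twitter and LinkedIn values are placeholders chosen to fall between them.

```r
library(ggplot2)

platform_sentiment <- data.frame(
  platform = c("Twitter", "Facebook", "Instagram", "LinkedIn"),
  # 50 and 20 are stated in the text; 35 and 30 are placeholder values
  sentiment_score = c(35, 50, 20, 30)
)

bar_plot <- ggplot(platform_sentiment,
                   aes(x = platform, y = sentiment_score, fill = platform)) +
  geom_col() +                              # bar height = value in the data
  labs(title = "Sentiment distribution across platforms",
       x = "Platform", y = "Sentiment score") +
  theme_minimal() +
  theme(legend.position = "none")           # legend would repeat the x-axis

# print(bar_plot) renders the chart
```

geom_col() is used rather than geom_bar() because the heights are already computed; geom_bar() would count rows instead.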

c. Pie chart (positive vs. negative sentiment)

Fig 12. Code for the pie chart


Fig 13. The resultant pie chart

INFERENCE:

• The pie chart shows the proportion of positive vs. negative sentiment in the
dataset.
• A larger slice of "Positive" indicates that the majority of posts or interactions
are favorable, while a larger "Negative" slice indicates discontent or
dissatisfaction.

DESCRIPTION:

The pie chart visualizes the distribution of positive and negative sentiments, with
slices representing their proportions. The "Positive" segment (60%) is larger,
indicating a majority of positive sentiment, while the "Negative" segment (40%) is
smaller. This chart provides a quick, clear overview of the sentiment balance
within the dataset.
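
The code in Fig 12 is not shown. ggplot2 has no dedicated pie geom, so a common approach, sketched here with the 60/40 split from the description, is a single stacked bar drawn in polar coordinates. Object names are illustrative.

```r
library(ggplot2)

sentiment_share <- data.frame(
  sentiment = c("Positive", "Negative"),
  share     = c(60, 40)   # percentages stated in the description
)

pie_plot <- ggplot(sentiment_share, aes(x = "", y = share, fill = sentiment)) +
  geom_col(width = 1) +                     # one stacked bar
  coord_polar(theta = "y") +                # bend the bar into a circle
  geom_text(aes(label = paste0(share, "%")),
            position = position_stack(vjust = 0.5)) +  # label slice centres
  labs(title = "Positive vs. negative sentiment", fill = "Sentiment") +
  theme_void()

# print(pie_plot) renders the chart
```
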
d. Heatmap (engagement level across regions)

Fig 14 & 15. Code and the resultant heatmap

INFERENCE:
• The heatmap visualizes the engagement level across different regions and energy
categories.
• Regions and categories with darker colors represent higher engagement levels,
while lighter regions indicate lower engagement.
• This can help identify regions that are more engaged with specific energy sources,
potentially guiding regional strategies for energy campaigns.

DESCRIPTION:

The heatmap visualizes engagement levels across different regions (North
America, Europe, Asia, Africa) and energy categories (Renewables, Fossil Fuels,
Hydropower). Each tile's color intensity, ranging from white (low) to blue (high),
represents the engagement level. This chart helps identify regions and categories
with the highest or lowest engagement, revealing patterns and areas of interest.
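
In the absence of the Fig 14 code, the following ggplot2 sketch builds a tile heatmap with the regions, categories, and white-to-blue scale named in the description. The engagement values are random placeholders.

```r
library(ggplot2)

# Every region/category combination, with placeholder engagement levels
set.seed(42)
engagement <- expand.grid(
  region   = c("North America", "Europe", "Asia", "Africa"),
  category = c("Renewables", "Fossil Fuels", "Hydropower")
)
engagement$level <- round(runif(nrow(engagement), min = 0, max = 100))

heatmap_plot <- ggplot(engagement, aes(x = region, y = category, fill = level)) +
  geom_tile(colour = "grey80") +                        # one tile per pair
  scale_fill_gradient(low = "white", high = "blue") +   # low to high
  labs(title = "Engagement level by region and energy category",
       x = "Region", y = "Energy category", fill = "Engagement")

# print(heatmap_plot) renders the chart
```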

e. Scatter plot (sentiment score vs. engagement rate)

Fig 16. Code for the scatter plot


Fig 17. The resultant scatter plot

INFERENCE:

• The scatter plot shows the relationship between sentiment score and
engagement rate for each post or data point.
• Positive sentiment may correlate with higher engagement, or vice versa,
depending on the data spread.
• The plot can indicate whether higher engagement rates are associated with more
positive or more negative sentiment; it shows association, not causation.

DESCRIPTION:

The scatter plot displays the relationship between sentiment scores (ranging from
-1 to 1) and engagement rates (0 to 100). Each blue point represents an
observation, showing how engagement varies with sentiment. This chart helps
identify trends or correlations, such as whether positive or negative sentiments
drive higher engagement.
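
The code in Fig 16 is not reproduced. A minimal ggplot2 sketch matching the described axes and blue points follows, using random placeholder observations. The correlation coefficient is added as one simple way to quantify the association that the plot shows visually.

```r
library(ggplot2)

# Placeholder observations within the ranges given in the description
set.seed(7)
posts <- data.frame(
  sentiment_score = runif(50, min = -1, max = 1),
  engagement_rate = runif(50, min = 0, max = 100)
)

scatter_plot <- ggplot(posts, aes(x = sentiment_score, y = engagement_rate)) +
  geom_point(colour = "blue") +
  labs(title = "Sentiment score vs. engagement rate",
       x = "Sentiment score", y = "Engagement rate")

# Pearson correlation between the two variables (between -1 and 1)
pearson_r <- cor(posts$sentiment_score, posts$engagement_rate)

# print(scatter_plot) renders the chart
```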

f. Bubble chart (post reach vs. sentiment impact)

Fig 18 & 19. Code and the resultant bubble chart


INFERENCE:

• The bubble chart reveals how post reach correlates with sentiment impact, with
larger bubbles representing posts with higher reach.
• It helps identify if posts with higher reach tend to have a more positive or
negative sentiment, and how reach affects sentiment impact.
• Larger bubbles further emphasize high-impact posts, which can be used for
strategic decisions in content amplification.

DESCRIPTION:

The bubble chart visualizes the relationship between post reach and sentiment
impact, with each bubble representing a post. The bubble size indicates an
additional variable (e.g., interactions), and colors range from red (negative
sentiment) to green (positive sentiment). This chart helps identify how sentiment
impact varies with post reach and highlights influential posts.
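
With the Fig 18 code unavailable, this ggplot2 sketch follows the description: reach on the x-axis, sentiment impact on the y-axis, bubble size mapped to interactions, and a red-to-green colour scale. All values are random placeholders and the column names are assumptions.

```r
library(ggplot2)

# Placeholder posts; column names are assumed for illustration
set.seed(3)
posts <- data.frame(
  post_reach       = runif(30, min = 100, max = 10000),
  sentiment_impact = runif(30, min = -1, max = 1),
  interactions     = sample(10:500, 30)
)

bubble_plot <- ggplot(posts, aes(x = post_reach, y = sentiment_impact,
                                 size = interactions,
                                 colour = sentiment_impact)) +
  geom_point(alpha = 0.7) +                             # semi-transparent bubbles
  scale_colour_gradient(low = "red", high = "green") +  # negative to positive
  scale_size(range = c(2, 10)) +                        # bubble size limits
  labs(title = "Post reach vs. sentiment impact",
       x = "Post reach", y = "Sentiment impact",
       size = "Interactions", colour = "Sentiment")

# print(bubble_plot) renders the chart
```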

Additional graphs:
Hexbin chart (Sentiment Score vs. Engagement Rate)

Fig 20. Code for Hexbin chart


Fig 21. The resultant Hexbin chart

INFERENCE:

• X-axis (Sentiment Score): Represents sentiment, ranging from negative (-1)
to positive (+1).
• Y-axis (Engagement Rate): Shows engagement level, ranging from 0 to 100.
• Hexagonal Bins: Group data points with similar sentiment and engagement,
colored by the number of points in each bin.
• Fill Color: Darker blue indicates higher data concentration; lighter blue
indicates fewer data points.
• Purpose: Visualizes the relationship between sentiment and engagement,
highlighting patterns and clustering.

DESCRIPTION:

The hexbin chart visualizes the relationship between sentiment score and
engagement rate by grouping the data into hexagonal bins. Each bin's color
intensity indicates the count of data points within it, with darker blue areas
representing higher concentrations. The chart helps identify patterns and
correlations between sentiment and engagement, especially in areas where the
data points cluster. This visualization is particularly useful for large datasets with a
dense distribution of points.
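
The Fig 20 code is not available, so the sketch below shows how such a chart can be built with ggplot2's geom_hex(), which needs the separate hexbin package to be installed when the plot is rendered. The 5,000 points are simulated placeholders, clipped to the axis ranges given above.

```r
library(ggplot2)  # geom_hex() also requires the hexbin package at render time

# Simulated placeholder data, clipped to [-1, 1] and [0, 100]
set.seed(99)
posts <- data.frame(
  sentiment_score = pmin(pmax(rnorm(5000, mean = 0.2, sd = 0.4), -1), 1),
  engagement_rate = pmin(pmax(rnorm(5000, mean = 55, sd = 18), 0), 100)
)

hexbin_plot <- ggplot(posts, aes(x = sentiment_score, y = engagement_rate)) +
  geom_hex(bins = 30) +                                  # count points per hexagon
  scale_fill_gradient(low = "lightblue", high = "darkblue", name = "Count") +
  labs(title = "Sentiment score vs. engagement rate",
       x = "Sentiment score", y = "Engagement rate")

# print(hexbin_plot) renders the chart
```
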
POWER BI DESCRIPTION
Power BI is a business analytics tool developed by Microsoft that enables users to
visualize and share insights from their data. It provides interactive visualizations,
business intelligence capabilities, and data analysis tools to help organizations
make informed decisions. Power BI integrates with various data sources and
allows users to create reports, dashboards, and interactive data models without
the need for programming skills.

POWER BI FEATURES
1. Data Connectivity:
o Power BI allows seamless connection to a wide variety of data
sources such as Excel, SQL Server, cloud services, and APIs, enabling
users to easily pull data from different platforms for analysis.
2. Interactive Dashboards:
o Users can create dynamic, interactive dashboards that offer
drill-down functionality, allowing them to explore data from different
perspectives in real-time.
3. Powerful Analytics with DAX:
o Power BI includes advanced analytics capabilities with DAX (Data
Analysis Expressions), allowing users to create custom calculations,
trends, and metrics to gain deeper insights into their data.

FEATURES OF THE DASHBOARD:


The Power BI dashboard offers a comprehensive, interactive view of energy
consumption and production trends across solar, wind, and fossil fuels. It
integrates key visualizations like stacked area charts, bar charts, heatmaps, and
scatter plots to help track energy usage, identify patterns, and analyze
sustainability efforts.
Key Features:
• Real-Time Data Updates: Provides current insights into energy
consumption and production.
• Interactive Visualizations: Users can explore trends across regions, sectors,
and time periods.
• Customizable Layout: Dashboards can be tailored to meet specific user needs.
• Collaboration & Sharing: Easily shareable for teamwork and
decision-making.
• Secure Access: Role-based access to maintain data privacy.
DASHBOARD USING POWER BI:

Fig 22. Dashboard Created using Power BI

CONCLUSION:

This project demonstrated the effectiveness of R for data visualization and
analysis, highlighting key insights through various charts such as line charts, bar
charts, heatmaps, and hexbin plots. Data cleaning ensured accurate results, and R's
flexibility enabled detailed visualizations for trend analysis and pattern
identification.
The Power BI dashboard complemented this by offering interactive exploration of
the dataset, consolidating key metrics and visualizations into a user-friendly
format. This integration of static and dynamic tools provided a comprehensive
approach to understanding the data and communicating findings effectively.
