Unit 4

The document outlines the steps in data analysis, including defining the problem, collecting and cleaning data, analyzing it, and visualizing results using tools like Tableau and Looker. It emphasizes the importance of data visualization for effective communication and decision-making, detailing various chart types such as bar charts, line charts, pie charts, scatter plots, and histograms. Each chart type is described with its purpose and when to use it, highlighting their roles in understanding and interpreting data.

Uploaded by

permeshwar245
Copyright
© All Rights Reserved

Tools and Techniques for Data Analytics
Contents
• 4.1 Steps in Data Analysis
• 4.2 Working with methods for analyzing variety of data
• 4.3 Working with large data
• 4.4 Data Visualization using advanced graphs
Steps in Data Analysis
1. Define the Problem or Research Question
• In the first step of the process, the data analyst is given a problem or business task.
• The analyst has to understand the task and the stakeholders' expectations for the solution.
• A stakeholder is a person who has invested money and resources in a project.
• The analyst must be able to ask different questions in order to find the right solution to the problem.
• The analyst has to find the root cause in order to fully understand the problem.
• Communicate effectively with the stakeholders and other colleagues to completely understand what the underlying problem is.
• Questions to ask yourself in the Ask phase are:
• What are the problems that are being mentioned by my stakeholders?
• What are their expectations for the solutions?
2. Collect Data
• The second step is to prepare or collect the data.
• This step includes collecting data and storing it for further analysis.
• The analyst has to collect data from multiple sources, based on the task given.
• All data fit into one of three categories: first-party, second-party, and third-party data.
• First-party data are data that you, or your company, have directly collected from customers.
• Second-party data is the first-party data of another organization, i.e. data that the organization collected itself and then shares or sells.
• This might be available directly from the company or through a private marketplace.
• Third-party data is data that has been collected and aggregated from numerous sources by a third-party organization. Often (though not always) third-party data contains a vast amount of unstructured data points (big data).
3. Cleaning the data
• Removing major errors, duplicates, and outliers: all of these are inevitable problems when aggregating data from numerous sources.
• Removing unwanted data points: dropping irrelevant observations that have no bearing on your intended analysis.
• Bringing structure to your data: general 'housekeeping', i.e. fixing typos or layout issues, which will help you map and manipulate your data more easily.
• Filling in major gaps: as you're tidying up, you might notice that important data are missing. Once you've identified gaps, you can go about filling them.
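The four cleaning tasks above can be sketched in a few lines of Python. This is only an illustration using the standard library: the records, the `clean` function, the "test" region rule, the `max_amount` threshold, and the median fill are all assumptions made for the example, not rules from the slides.

```python
from statistics import median

# Hypothetical raw records merged from several sources (illustrative data only).
raw_rows = [
    {"customer": "Asha ", "region": "north", "amount": 120.0},
    {"customer": "Asha ", "region": "north", "amount": 120.0},  # exact duplicate
    {"customer": "Ben", "region": "South", "amount": None},     # missing value
    {"customer": "Chen", "region": "south", "amount": 95.0},
    {"customer": "Dev", "region": "test", "amount": 110.0},     # irrelevant row
    {"customer": "Eve", "region": "North", "amount": 99999.0},  # implausible outlier
]


def clean(rows, max_amount=10_000.0):
    """Apply the four cleaning tasks and return a new list of rows."""
    seen, cleaned = set(), []
    for row in rows:
        # Bring structure: trim whitespace and normalise capitalisation.
        name = row["customer"].strip()
        region = row["region"].strip().title()
        amount = row["amount"]
        # Remove unwanted data points (assumed rule: drop the test region).
        if region == "Test":
            continue
        # Remove outliers (assumed rule: amounts above max_amount are errors).
        if amount is not None and amount > max_amount:
            continue
        # Remove duplicates, comparing rows after normalisation.
        key = (name, region, amount)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"customer": name, "region": region, "amount": amount})
    # Fill gaps: replace missing amounts with the median of the known ones.
    known = [r["amount"] for r in cleaned if r["amount"] is not None]
    fill = median(known)
    for r in cleaned:
        if r["amount"] is None:
            r["amount"] = fill
    return cleaned


cleaned_rows = clean(raw_rows)
for r in cleaned_rows:
    print(r)
```

Three rows survive here, and the missing amount is replaced by the median of the remaining known amounts. Whether a median fill is appropriate depends on the data; it is used only to keep the sketch short.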
4. Analyzing the data

• The cleaned data is used for analyzing and identifying trends.
• The analyst also performs calculations and combines data for better results.
• Common tools for performing calculations are Excel and SQL.
• Excel provides built-in functions for calculations, while in SQL the calculations are written as queries.
• In Excel, we can create pivot tables and perform calculations, while in SQL we can create temporary tables to hold intermediate results.
• Programming languages are another way of solving problems. The most widely used programming languages for data analysis are R and Python.
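As a rough illustration of the SQL and pivot-table ideas above, the following sketch uses Python's built-in `sqlite3` module. The `sales` table, its columns, and its values are invented for the example; a real analysis would run against the organization's own database.

```python
import sqlite3

# Hypothetical cleaned sales records: (region, product, amount).
sales = [
    ("North", "Pen", 120.0),
    ("North", "Book", 80.0),
    ("South", "Pen", 95.0),
    ("South", "Book", 105.0),
    ("South", "Pen", 50.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales)

# A temporary table holds an intermediate result: total sales per region.
conn.execute(
    """
    CREATE TEMP TABLE region_totals AS
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    """
)
region_totals = dict(
    conn.execute("SELECT region, total FROM region_totals ORDER BY region")
)

# A pivot-table-like summary: one row per region, one column per product.
pivot = {}
for region, product, total in conn.execute(
    "SELECT region, product, SUM(amount) FROM sales GROUP BY region, product"
):
    pivot.setdefault(region, {})[product] = total

conn.close()
print(region_totals)
print(pivot)
```

The nested dictionary plays the role of an Excel pivot table with regions as rows and products as columns.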
5. Data Visualization
• The fifth step is visualizing the data.
• A good visualization is often the most compelling way to present findings.
• The data now has to be made into a visual (chart, graph).
• Visualizations are needed because part of the audience, mostly stakeholders, may be non-technical.
• Visualizations make complex data simple to understand.
• Tableau and Looker are two popular tools for creating data visualizations.
• Tableau is a drag-and-drop tool that helps in creating compelling visualizations.
• Looker is a data visualization tool that connects directly to the database and creates visualizations.
• Both Tableau and Looker are widely used by data analysts.
• R and Python also have packages for creating data visualizations.
• A presentation is given based on the data findings. Sharing the insights with team members and stakeholders helps them make more informed decisions, which leads to better outcomes.
Why is data visualization important?
• Data visualization provides a quick and effective way to communicate
information in a universal manner using visual information. Business
professionals have different areas and levels of expertise, but visualizations
are meant to be understandable by anyone. Visualizations make it easier
for employees in an organization to make decisions and act based on
insights derived from them.
• Visualizations help businesses in many ways. Some examples include the
following:
• They help isolate factors that affect customer behavior.
• They identify products or services that need to be improved.
• They make data more memorable for stakeholders.
• They help organizations understand when and where to place specific
products.
• They can help predict sales or revenue volumes.
Data Visualization using advanced graphs

• Basic Charts for Data Visualization
• Basic charts serve as foundational tools in data visualization, offering quick insights into datasets. The most commonly used charts are:
• Bar Chart
• Line Chart
• Pie Chart
• Scatter Plot
• Histogram
Bar Charts
• Bar charts are one of the most common visualization tools, used to represent and compare categorical data by showing rectangular bars.
• A bar chart has an X and a Y axis, where the X axis represents the categories and the Y axis represents the value.
• The height of the bar represents the value for that category on the Y axis. Longer bars indicate higher values.
• There are various types of bar charts, such as the horizontal bar chart, stacked bar chart, grouped bar chart, and diverging bar chart.
• When to Use Bar Chart:
• Comparing Categories: Showing contrast among distinct categories to evaluate, summarize, or discover relationships in the data.
• Ranking: When we have data with categories that need to be ranked from highest to lowest.
• Relationship between categories: When you have a dataset with multiple categorical variables, a bar chart can help display the relationship between them and reveal patterns and trends.
Bar Chart
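The ranking use case and the rule that longer bars mean higher values can be shown with a small text-only sketch. The category names, values, and the `text_bar_chart` helper are hypothetical; a charting package or a tool such as Tableau would normally draw the bars.

```python
# Hypothetical category totals (illustrative values only).
sales_by_category = {"Books": 40, "Pens": 25, "Bags": 55, "Lamps": 10}


def text_bar_chart(values, width=20):
    """Return one line per category, ranked from highest to lowest value."""
    largest = max(values.values())
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for name, value in ranked:
        # Bar length is proportional to the value, so longer bars mean higher values.
        bar = "#" * round(value / largest * width)
        lines.append(f"{name:<6}{bar} {value}")
    return lines


chart_lines = text_bar_chart(sales_by_category)
print("\n".join(chart_lines))
```

This prints a horizontal bar chart with the largest category first, which is the layout typically used for rankings.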
Line Charts
• A line chart, or line graph, is used to represent data over time (a time series).
• It displays data as a series of data points, called markers, connected by line segments that show how values change over time.
• This chart is commonly used to analyze trends, view patterns, or examine price movements.
• When to Use Line Chart:
• Line charts can be used to analyze trends across individual values.
• Line charts are also used to compare trends among more than one data series.
• Line charts are best used for time series data.
Line chart
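The markers and line segments described above correspond to a simple data structure: points ordered by time, and the change between each consecutive pair. The sketch below uses invented monthly figures and the hypothetical names `markers` and `segments`.

```python
# Hypothetical monthly values, keyed by ISO month so they sort chronologically.
monthly_sales = {"2024-03": 130, "2024-01": 100, "2024-02": 120, "2024-04": 125}

# Markers: the (time, value) points in time order.
markers = sorted(monthly_sales.items())

# Segments: the change in value between each pair of consecutive markers.
segments = [
    (start[0], end[0], end[1] - start[1])
    for start, end in zip(markers, markers[1:])
]

for start, end, change in segments:
    direction = "up" if change > 0 else "down" if change < 0 else "flat"
    print(f"{start} -> {end}: {change:+d} ({direction})")
```

A rising segment has a positive change and a falling one a negative change, which is the trend a reader picks up visually from the slope of the line.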
Pie Charts
• A pie chart is a circular data visualization tool that is divided into slices to represent numerical proportions or percentages of a whole.
• Each slice in a pie chart corresponds to a category in the dataset, and the angle of the slice is proportional to the share it represents.
• Pie charts are only effective with a small number of categories.
• The simple pie chart and the exploded pie chart are different types of pie charts.
• When to Use Pie Chart:
• Pie charts are used with categorical data to show the proportion of parts to the whole. They depict how different categories make up a total.
• Useful in scenarios where the data has a small number of categories.
• Useful in emphasizing a particular category by highlighting a dominant slice.
Pie Chart
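Because the angle of each slice is proportional to its share, the slice geometry follows directly from the category totals: share = value / total, and angle = share × 360°. A minimal sketch, with a hypothetical `pie_slices` helper and made-up budget figures:

```python
def pie_slices(values):
    """Return {category: (share_percent, angle_degrees)} for a pie chart."""
    total = sum(values.values())
    return {
        name: (value * 100 / total, value * 360 / total)
        for name, value in values.items()
    }


# Hypothetical budget categories (illustrative values only).
budget = {"Rent": 50, "Food": 30, "Travel": 20}
slices = pie_slices(budget)

for name, (percent, angle) in slices.items():
    print(f"{name}: {percent:.1f}% of the whole, {angle:.1f} degrees")
```

The angles always add up to 360 degrees, which is why a pie chart only makes sense when the categories together form a complete whole.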
Scatter Chart (Plots)
• A scatter chart, or scatter plot, is a powerful data visualization tool that uses dots to represent data points.
• A scatter chart is used to display and compare two variables, which helps find the relationship between those variables.
• A scatter chart uses two axes, X and Y.
• The X-axis represents one numerical variable and the Y-axis represents another numerical variable.
• The variable on the X-axis is the independent variable, plotted against the dependent variable on the Y-axis.
• Types of scatter chart include the simple scatter chart, the scatter chart with a trendline, and the scatter chart with color coding.
• When to Use Scatter Chart:
• Scatter charts are well suited to exploring relationships between numerical variables and to identifying trends, outliers, and subgroup differences.
• They are used when we have to plot two sets of numerical data as one series of X and Y coordinates.
• Scatter charts are best used for identifying outliers or unusual observations in your data.
Scatter Plot
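The relationship that a scatter chart with a trendline makes visible can also be computed. The sketch below fits a least-squares line and a Pearson correlation coefficient by hand; the data (hours studied against exam score) and the function names are assumptions chosen for illustration.

```python
from statistics import mean


def trendline(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(xs, ys):
    """Pearson correlation coefficient between two numerical variables."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: independent variable on X, dependent variable on Y.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 68]

slope, intercept = trendline(hours_studied, exam_score)
r = pearson(hours_studied, exam_score)
print(f"trendline: y = {slope:.2f} * x + {intercept:.2f}, r = {r:.3f}")
```

A coefficient close to 1 means the dots lie near an upward-sloping line; points that sit far from the trendline are the candidates for outliers.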
Histogram
• A histogram represents the distribution of numerical data by dividing it into intervals (bins) and displaying the frequency of the data in each interval as bars.
• It is commonly used to visualize the underlying distribution of a dataset and discover patterns.
• Histograms are valuable tools for exploring data distributions, detecting outliers, and assessing data quality.
• When to Use Histogram:
• Distribution Visualization: Histograms are best for visualizing the distribution of numerical data, allowing users to understand the spread and shape of the data.
• Data Exploration: They facilitate data exploration by revealing patterns, trends, and outliers within datasets, aiding in hypothesis generation and data-driven decision-making.
• Quality Control: Histograms help assess data quality by identifying anomalies, errors, or inconsistencies within the data distribution, supporting data validation and cleaning.
Histogram
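Dividing data into bins and counting the frequency in each bin can be sketched as follows. The `histogram` helper, the bin width of 10, and the list of ages are hypothetical choices for the example.

```python
def histogram(values, bin_width, start):
    """Count how many values fall into each [low, low + bin_width) interval."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        low = start + index * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical ages (illustrative values only).
ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 79]
bins = histogram(ages, bin_width=10, start=20)

for low, count in bins.items():
    # Each bar's length is the frequency of values in that bin.
    print(f"{low}-{low + 9}: {'#' * count} ({count})")
```

Most values fall in the 20s to 40s, while the single value in the 70-79 bin sits apart from the rest, which is how a histogram exposes a possible outlier.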
