Midterm Module 1a


Generating inference from Data

1. Pivot Table:

PivotTables allow you to summarize data and create dynamic reports by rearranging the PivotTable’s contents. You can use pivot tables to extract the important figures from a large dataset, which makes them one of the most practical tools for data analysis. After inserting a PivotTable, you can drag fields, sort, filter, or change the summary calculation. Two-dimensional PivotTables are also possible. Important related features include Group PivotTable Items, Multi-level PivotTables, Frequency Distribution, PivotCharts, Slicers, Update (Refresh) PivotTable, Calculated Field/Item, and the GETPIVOTDATA function.

Whenever you work with company data, you look for answers to questions such as “How much revenue is contributed by the branches of the North region?” or “What was the average number of customers for product A?”, among many others.

Excel’s PivotTable helps you answer these questions quickly. A pivot table is a summary table that lets you count, average, sum, and perform other calculations grouped by the field you select; in other words, it converts a raw data table into an inference table that helps us make decisions. Look at the snapshot below:

Module in Data Science Analytics – mid2023-fdl


Above, you can see that the table on the left has sales details for each customer, along with the region and product mapping. The table on the right summarizes this information at the region level, which lets us infer that the South region has the highest sales.
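To show what the region-level summary is doing behind the scenes, here is a minimal Python sketch that groups rows by region and sums the sales, then picks the region with the largest total. The rows and the helper name sum_by are hypothetical, invented for illustration; they are not the data in the snapshot.

```python
from collections import defaultdict

# Hypothetical rows: (customer, region, product, sales).
# The values are invented for illustration; they are not the snapshot's data.
rows = [
    ("C1", "North", "A", 120),
    ("C2", "South", "B", 300),
    ("C3", "East", "A", 150),
    ("C4", "South", "A", 200),
    ("C5", "North", "B", 180),
]

def sum_by(rows, key_index, value_index):
    """Group rows by one column and total another (a one-field pivot table)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)

region_totals = sum_by(rows, key_index=1, value_index=3)
# The "inference": the region with the largest total.
best_region = max(region_totals, key=region_totals.get)

print(region_totals)
print("Highest sales:", best_region)
```

With these made-up rows, South totals 500 and is reported as the region with the highest sales.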

Steps to create a PivotTable:


Step-1: Click anywhere in the list of data. Choose the Insert tab and click PivotTable. Excel will automatically select the area containing data, including the headings. If it does not select the area correctly, drag over the area to select it manually. Placing the PivotTable on a new sheet is usually best, so choose New Worksheet as the location and then click OK.

Step-2: You will now see the PivotTable Field List panel, which contains the fields (column headings) from your list. All you need to do is drag them into the boxes (Filters, Columns, Rows and Values) at the foot of the panel. Once you have done that, the placeholder on the left of the sheet becomes your PivotTable.

Above, you can see that we have placed “Region” in Rows, “Product id” in Columns, and the sum of “Premium” in Values. The resulting pivot table shows the sum of premium by region and product. You can also use count, average, min, max, and other summary metrics.
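The Region × Product layout and the choice of summary metric can be sketched in Python as follows. This is a minimal illustration, assuming hypothetical (region, product_id, premium) rows and hypothetical helper names pivot and print_pivot; it mimics the idea of a two-dimensional PivotTable, not Excel's implementation. Combinations with no data are simply left out, much like blank cells in a PivotTable.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (region, product_id, premium). Invented for illustration.
rows = [
    ("North", "P1", 100),
    ("North", "P2", 250),
    ("South", "P1", 300),
    ("South", "P1", 200),
    ("South", "P2", 150),
    ("East", "P2", 400),
]

# Summary metrics, like "Summarize value field by" in Excel.
AGGREGATES = {"sum": sum, "count": len, "average": mean, "min": min, "max": max}

def pivot(rows, row_idx, col_idx, val_idx, agg="sum"):
    """Return {row_label: {col_label: aggregated value}} for each observed pair."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r[row_idx], r[col_idx])].append(r[val_idx])
    func = AGGREGATES[agg]
    table = defaultdict(dict)
    for (row_label, col_label), values in groups.items():
        table[row_label][col_label] = func(values)
    return dict(table)

def print_pivot(table, row_title="Region"):
    """Print the pivot as a text grid; missing combinations print as blanks."""
    cols = sorted({c for cells in table.values() for c in cells})
    print(row_title.ljust(8) + "".join(c.rjust(8) for c in cols))
    for row_label in sorted(table):
        cells = table[row_label]
        print(row_label.ljust(8) + "".join(str(cells.get(c, "")).rjust(8) for c in cols))

premium_sum = pivot(rows, 0, 1, 2, "sum")
print_pivot(premium_sum)
print_pivot(pivot(rows, 0, 1, 2, "average"))
```

Swapping the agg argument plays the same role as changing the summary calculation in the Values box.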

2. Creating Charts: Building a chart or graph in Excel requires nothing more than selecting the range of data you wish to chart and pressing F11. This creates a chart in the default chart style on a new chart sheet, but you can change it by selecting a different chart style. If you prefer the chart to be on the same worksheet as the data, press Alt + F1 instead of F11.

In either case, once you have created the chart, you can customize it to your particular needs to communicate your desired message.

Microsoft Excel offers several types of charts and graphs to help you visualize your spreadsheet data. All you need to do is organize your data, select it, and insert a chart from the Insert tab. You can also customize, resize, and reposition the chart.

Types of Data Visualizations in Excel

Excel offers a variety of charts and graphs to help represent your data.
Here are some common data visualization types available in Excel:

• Column Chart: Displays data using vertical bars, with each bar representing a category. It is useful for comparing values across categories or showing trends over time.
• Bar Chart: Similar to a column chart, but with horizontal bars instead of vertical ones. Bar charts are great for comparing values across categories when the category names are long or there are many categories.
• Line Chart: Plots data points connected by lines, showing trends or changes over time. Line charts are ideal for illustrating continuous data, such as stock prices or temperature measurements.
• Pie Chart: Represents data as slices of a circle, with each slice's size proportional to the value it represents. Pie charts are suitable for showing the proportion of each category within a whole.
• Area Chart: Similar to a line chart, but with the area between the line and the horizontal axis filled in. Area charts emphasize the magnitude of change over time and are useful for showing cumulative data or comparing multiple data series.
• Scatter Plot: Displays data points on a Cartesian coordinate system, with each axis representing a variable. Scatter plots are used to show the relationship between two variables and identify patterns or correlations.
• Bubble Chart: A variation of the scatter plot that adds a third variable represented by the size of the bubbles. Bubble charts can help visualize the relationship between three continuous variables.
• Radar Chart (Spider Chart): Displays multivariate data on multiple axes that radiate from a central point. Radar charts are useful for comparing multiple variables across different categories or entities.
• Stock Chart: Designed specifically to show stock market data, stock charts can display open, high, low and close values over a period of time.
• Surface Chart: Displays three-dimensional data on a grid, where the color or shading indicates the value. Surface charts are useful for visualizing complex data with two independent variables.
• Waterfall Chart: Displays the cumulative effect of sequential positive and negative values, typically used for financial data to show the progression of a starting value to an ending value.
• Treemap: Represents hierarchical data as nested rectangles, with each rectangle's size and color representing specific values. Treemaps are helpful for visualizing large data sets with multiple categories and subcategories.
• Sunburst Chart: Similar to a treemap but with a radial layout. Sunburst charts display hierarchical data as concentric rings, showing how categories and subcategories relate to the whole.

Other visualizations available in Excel include histograms, box and whisker plots, Pareto charts, funnel charts, combination charts and error bars. Gauge charts, heat maps, Gantt charts, waffle charts, violin plots, control charts and KPI dashboards are not built-in chart types, but they can be built from standard charts, conditional formatting or add-ins.
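To make the waterfall chart from the list above more concrete, here is a minimal Python sketch of how its floating bars can be laid out: each change starts at the previous running total and ends at the new one, while the Start and End bars rise from zero. The figures and the function name waterfall_bars are hypothetical; Excel's built-in Waterfall chart does this layout for you.

```python
# Hypothetical starting value and sequential changes (invented figures).
start = 1000
changes = [("Sales", 400), ("Refunds", -150), ("Costs", -300), ("Other", 50)]

def waterfall_bars(start, changes):
    """Return (label, bottom, top) for each bar of a waterfall chart."""
    bars = [("Start", 0, start)]
    running = start
    for label, delta in changes:
        new_total = running + delta
        # A positive change rises from the old total; a negative one falls from it.
        bars.append((label, min(running, new_total), max(running, new_total)))
        running = new_total
    bars.append(("End", 0, running))
    return bars

for label, bottom, top in waterfall_bars(start, changes):
    print(f"{label:8} {bottom:6} -> {top:6}")
```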

Data Cleaning

1. Remove duplicate values: Excel has a built-in feature to remove duplicate values from a table. It removes duplicates based on the columns you select: if you select two columns, a row counts as a duplicate only when its combination of values in both columns matches an earlier row. Excel keeps the first occurrence and deletes the later ones.

Above, you can see that A001 and A002 both have duplicate values, but if we select both columns, “ID” and “Name”, there is only one duplicate (A002, 2).
Follow these steps to remove duplicate values: Select the data –> Go to the Data ribbon –> Remove Duplicates.
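The column-combination rule described above can be sketched in Python. This is a minimal illustration with a hypothetical table and a hypothetical helper remove_duplicates; like Excel, it keeps the first row for each distinct combination of the selected columns. It compares values exactly, so Excel's own matching rules (for example around letter case) may differ.

```python
def remove_duplicates(rows, columns):
    """Keep the first row for each distinct combination of the selected columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Hypothetical table (invented values, not the snapshot's data).
table = [
    {"ID": "A001", "Name": "Ravi"},
    {"ID": "A002", "Name": "Meena"},
    {"ID": "A001", "Name": "Arun"},
    {"ID": "A002", "Name": "Meena"},
]

print(remove_duplicates(table, ["ID"]))          # checks ID only
print(remove_duplicates(table, ["ID", "Name"]))  # checks the ID + Name combination
```

Selecting only “ID” leaves two rows here, while selecting both columns leaves three, because only the second (A002, Meena) row repeats an earlier combination.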

2. Text to Columns: Let’s say you have data stored in a single column, as shown in the snapshot below.

Above, you can see that the values are separated by a semicolon (“;”). To split these values into separate columns, I recommend using the “Text to Columns” feature in Excel. Follow the steps below to convert the data into separate columns:

1. Select the range A1:A6

2. Go to “Data” ribbon –> “Text to Columns”

Above, we have two options: “Delimited” and “Fixed width”. I have selected Delimited because the values are separated by a delimiter (;). If we wanted to split the data by character position instead, for example the first four characters into the first column and the 5th to 10th characters into the second column, we would choose Fixed width.

3. Click Next –> check the “Semicolon” box –> Next –> Finish.
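The two splitting modes can be sketched in Python as follows. This is a minimal illustration, assuming hypothetical cell contents and hypothetical helper names split_delimited and split_fixed_width; the fixed-width example uses the widths from the text (four characters, then characters 5 to 10). Unlike Excel, this sketch drops any characters beyond the last width instead of placing them in an extra column.

```python
def split_delimited(lines, delimiter=";"):
    """Split each line into fields at the delimiter (the 'Delimited' option)."""
    return [line.split(delimiter) for line in lines]

def split_fixed_width(lines, widths):
    """Cut each line into fields of the given widths (the 'Fixed width' option).

    Characters beyond the last width are dropped in this sketch.
    """
    result = []
    for line in lines:
        fields, pos = [], 0
        for w in widths:
            fields.append(line[pos:pos + w])
            pos += w
        result.append(fields)
    return result

# Hypothetical column A contents (invented, not the snapshot's data).
column_a = ["Name;City;Age", "Asha;Pune;31", "Vikram;Delhi;27"]
print(split_delimited(column_a))

# First four characters to column 1, characters 5 to 10 to column 2.
codes = ["2023JAN001", "2024FEB002"]
print(split_fixed_width(codes, [4, 6]))
```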

Essential keyboard shortcuts

Keyboard shortcuts let you navigate cells and enter formulas more quickly. We’ve listed our favorites below.

1. Ctrl + Down/Up Arrow: Moves to the bottom or top edge of the current data region in the column; Ctrl + Left/Right Arrow moves to the left or right edge of the data region in the current row.
2. Ctrl + Shift + Down/Up Arrow: Extends the selection from the current cell to the edge of the data region below or above it.
3. Ctrl + Home: Navigates to cell A1.
4. Ctrl + End: Navigates to the last used cell on the worksheet (the lowest used row in the rightmost used column).
5. Alt + F1: Creates a chart based on the selected data set.
6. Ctrl + Shift + L: Turns AutoFilter on or off for the data table.
7. Alt + Down Arrow: Opens the AutoFilter drop-down menu for the selected header cell.
8. Alt + D + S: Opens the Sort dialog box to sort the data set.
9. Ctrl + O: Opens an existing workbook.
10. Ctrl + N: Creates a new workbook.
11. F4: While editing a formula, place the cursor in (or select) a cell reference and press F4 to cycle it through absolute ($A$1), mixed (A$1, $A1) and relative (A1) forms.
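The F4 cycle in item 11 can be modeled as a small state machine. This minimal Python sketch, with the hypothetical function name cycle_reference, handles single A1-style references only (not ranges, sheet names or R1C1 notation) and returns the next form in the order A1 → $A$1 → A$1 → $A1 → A1.

```python
import re

def cycle_reference(ref):
    """Return the next form of a cell reference in F4 order: A1 -> $A$1 -> A$1 -> $A1 -> A1."""
    m = re.fullmatch(r"(\$?)([A-Z]+)(\$?)(\d+)", ref)
    if m is None:
        raise ValueError(f"not a simple A1-style reference: {ref!r}")
    col_abs, col, row_abs, row = m.groups()
    state = (bool(col_abs), bool(row_abs))  # (column locked?, row locked?)
    next_state = {
        (False, False): (True, True),   # relative -> absolute
        (True, True): (False, True),    # absolute -> mixed (row locked)
        (False, True): (True, False),   # mixed (row locked) -> mixed (column locked)
        (True, False): (False, False),  # mixed (column locked) -> relative
    }[state]
    return ("$" if next_state[0] else "") + col + ("$" if next_state[1] else "") + row

ref = "A1"
for _ in range(4):
    ref = cycle_reference(ref)
    print(ref)
```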

Descriptive Statistics

You can use the Analysis ToolPak add-in to generate descriptive statistics. For example, you may have the test scores of 14 participants.

To generate descriptive statistics for these scores, execute the following steps.

1. On the Data tab, in the Analysis group, click Data Analysis.

Note: can't find the Data Analysis button? Load the Analysis ToolPak add-in first (File > Options > Add-ins, choose Excel Add-ins in the Manage box, click Go, then check Analysis ToolPak).
2. Select Descriptive Statistics and click OK.

3. Select the range A2:A15 as the Input Range.

4. Select cell C1 as the Output Range.

5. Make sure Summary statistics is checked.

6. Click OK.

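Result: Excel writes a summary table starting at cell C1 that lists the Mean, Standard Error, Median, Mode, Standard Deviation, Sample Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum and Count of the scores.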
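To show what each line of the Summary statistics output means, here is a minimal Python sketch that computes the same set of measures. The 14 scores are hypothetical (the document's values are only in a screenshot), and descriptive_statistics is a hypothetical helper. Skewness and kurtosis use the sample-adjusted formulas of Excel's SKEW and KURT functions, so the sketch needs at least four values; None stands in for the #N/A that Excel shows when no value repeats.

```python
import math
import statistics
from collections import Counter

def descriptive_statistics(xs):
    """Summary statistics similar to the Analysis ToolPak's 'Summary statistics' output."""
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    counts = Counter(xs)
    top = max(counts.values())
    # First most-frequent value; None when no value repeats (Excel shows #N/A).
    mode = next(x for x in xs if counts[x] == top) if top > 1 else None
    # Sample skewness and excess kurtosis, as computed by Excel's SKEW and KURT.
    z = [(x - mean) / sd for x in xs]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(xs),
        "Mode": mode,
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(xs),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(xs) - min(xs),
        "Minimum": min(xs),
        "Maximum": max(xs),
        "Sum": sum(xs),
        "Count": n,
    }

# Hypothetical scores for 14 participants (invented values).
scores = [72, 85, 90, 64, 78, 88, 95, 70, 82, 77, 91, 68, 85, 80]

for name, value in descriptive_statistics(scores).items():
    print(f"{name:<20} {value}")
```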
