Excel Basic For Data Analysis

This document provides an introduction to using spreadsheets for data analysis. It discusses spreadsheet applications and terminology like workbooks, worksheets, and cells. It explains how spreadsheets can help with tasks like data cleaning, analysis, and visualization.

Uploaded by

wiena wien
Copyright
© All Rights Reserved

Course Introduction

Do you want to learn how to use spreadsheets and start analyzing data using Excel?
This course from IBM is designed to help you work with Excel and gives you a good grounding
in the cleaning and analyzing of data which are important parts of the skill set required
to become a data analyst.
You will not only learn data analysis techniques using spreadsheets, but also practice using
multiple hands-on labs throughout the course.
In module one you will learn about the basics of spreadsheets, including spreadsheet terminology,
the interface and navigating around worksheets and workbooks.
In module two you will learn about selecting data, entering and editing data, copying and
auto filling data, formatting data, and using functions and formulas.
In module three you will learn about cleaning and wrangling data using a spreadsheet, including
the fundamentals of data quality and data privacy, removing duplicated and inaccurate
data, removing empty rows, removing data inconsistencies and white spaces, and using the flash fill
and text to columns features.
In module four you will learn about analyzing data using spreadsheets, including filtering
data, sorting data, using common data analysis functions, creating and using pivot tables,
and creating and using slicers and timelines.
At the end of this course in module five, you will complete a series of hands-on labs which
will guide you on how to create your first deliverable as a data analyst.
This will involve you understanding what the business scenario is, cleaning and preparing
your data, and analyzing your data.
You will follow two different business scenarios throughout the course, each using its
own data set.
These different scenarios and data sets will be used in the lesson videos and in the hands-on
labs.
After completing this course, you will be able to understand how spreadsheets can be
used as a data analysis tool; understand when to use spreadsheets as a data analysis tool
and their limitations; create a spreadsheet and explain its basic functionality; perform
data wrangling and data cleaning tasks using Excel; and analyze data using the filter, sort, and
pivot table features within Excel spreadsheets.
You will also perform some intermediate level data wrangling and data analysis tasks to
address a business scenario.
The course team and other peers are available to help in the course discussion forums in
case you require any assistance.
Let's get started with your next video where you will get an introduction to spreadsheets.
Introduction to Spreadsheets

In this first video of the course, we will list some of the common spreadsheet applications available,
learn about the key capabilities of spreadsheets, and discuss why spreadsheets might be a useful
tool for a Data Analyst. There are several spreadsheet applications available in the marketplace;
some of them are more widely known and used than others, and some are free, while others need to
be paid for. By far the most commonly used spreadsheet application, and the most fully featured of
them all is Microsoft Excel. The desktop version comes in a paid form as part of the Office suite and
some Microsoft 365 subscriptions, but there is also a web-based cut-down version called Excel for
the web, also known as Excel Online. The online version is free to users with a Microsoft account,
but does not offer all the advanced features that the desktop version provides. The next most
popular is Google Sheets, which offers many, though not all, of the features that Excel provides, and is
free with a Google account. This is a web-based application and it integrates nicely with other
Google apps, such as Google Forms, Google Analytics, and Google Data Studio. Then there is
LibreOffice Calc, a totally free and open source desktop spreadsheet application that offers more
basic functionality than Excel or Google Sheets, but still has a lot of the tools you need for data
analysis, such as charts, conditional formatting, and pivot tables. Other spreadsheet apps include
Zoho Sheet (a fully-featured web-based application that is comparable with Google Sheets),
OpenOffice Calc, Quip from Salesforce, Smartsheet (which is predominantly for project
management), and Apple Numbers (which is included with Apple devices such as Mac computers
and is also available on the App Store for other Apple devices). So, there are many spreadsheet
application options open to you, from fully-featured to basic, from cloud-based to desktop apps, from
paid-for to free versions. It’s up to you to decide which one best fits your needs and your budget.
Spreadsheets provide several advantages over manual calculation methods. For example, once you
have your formulas correctly written, you can be assured that your calculations are accurate, and
that the calculations will be performed automatically for you. Spreadsheets also help keep your data
organized and easily accessible. Your data can be easily formatted, filtered, and sorted to suit your
needs. If you do make mistakes in your data entry or your calculations you can easily edit them,
undo them, or use error-checking tools to help remedy those mistakes. And lastly, you can analyze
data in spreadsheets, and create charts, graphs, and reports to help visualize your data analysis.
Since spreadsheet software for personal computers first appeared on the market in the 1970s, with
VisiCalc on the Apple II, spreadsheets have come a long way in terms of the capabilities and
features they now offer businesses, from uncomplicated tables and relatively simple computations to
powerful tools for the analysis, management, and visualization of enormous sets of data. The most
common business uses for spreadsheet applications include the following: Data Entry and Storage,
Comparing Large Datasets, Modelling and Planning, Charting, Identifying Trends, Flowcharts for
Business Processes, Tracking Business Sales, Financial Forecasting, Statistical Analysis, Profit and
Loss Accounting, Budgeting, Forensic Auditing, Payroll and Tax Reporting, Invoicing, and
Scheduling. And away from the business side of things, other typical uses include Personal
Expenses, Household Budgeting, Recipe library, Fitness Tracking, Calorie Counting & Weight
Monitoring, Sports Leagues such as Fantasy Football, Cataloging Music Libraries, and even Contact
Lists, Shopping Lists and Christmas Card Lists. As a Data Analyst, you can use spreadsheets as a
tool for your data analysis tasks, including: Collecting and harvesting data from one or
more distributed and different sources. Cleaning data to remove duplicates, inaccuracies, errors, and
resolve missing values to improve the quality of the data. Analyzing data by filtering, sorting, and
interpreting it to determine what useful information can be gleaned from it. And visualizing data, to
help you tell a story about your data analysis findings to key business stakeholders and any other
interested parties within your organization. In this video, we had an introduction to spreadsheets. We
learned about some common spreadsheet applications, what the main capabilities of spreadsheets
are, and why spreadsheets might be a useful tool for a Data Analyst. In the next video, we will look
at the basics of spreadsheets, including common spreadsheet terminology.
Spreadsheet Basics 1

Now that we have a basic understanding of
what spreadsheet software is available, and why spreadsheets might be a useful tool for
a Data Analyst, let’s get started on looking at some of the basics of using a spreadsheet
application. In these videos we will be using the full
‘desktop’ version of Excel, but the majority of the tasks that we will perform can also
be done using Excel ‘on the web’, also known as Excel Online, and other spreadsheet
applications such as Google Sheets. Let’s first cover some basic spreadsheet
terminology. When you open Excel, you have the option of
creating a new blank workbook or opening an existing workbook. We’re going to choose New, and
then Blank
workbook. Workbooks are the highest-level component
in Excel and are represented as a .XLSX file. So, when you open an existing workbook or
create a new workbook you are in fact working with a .XLSX file. The workbook contains all your
data, calculations,
and functions, and contains several other underlying elements that make up a workbook. A
workbook consists of one or more worksheets,
each of which is represented by a tab in Excel. Each worksheet is given a name which is displayed
on the corresponding tab for the worksheet. By default, each tab is named Sheet1, then
Sheet2, and so on. To make these worksheet tabs more meaningful
it is usual to rename them, so they make more sense in relation to the worksheet’s purpose. For
example, you might call a worksheet January
Sales, or perhaps the name of a region or store, or even an office or department. To do this, right-
click the tab and choose
Rename. Instead of right-clicking to rename, you can
also just double-click the name of a worksheet tab to rename it. Essentially, worksheet tabs can be
named anything
you want to fit your particular needs to make it easier to understand what that worksheet
represents. Note that a worksheet that is highlighted,
as the Tire Sales worksheet tab is here, is referred to as the active worksheet. If you want to order
your worksheets in a
different way, that is very simple to do. Either drag a worksheet tab to the left or
right and drop it in the place you want, which is represented by the little black arrow,
or if you are not comfortable with dragging and dropping, then the longer way of doing
that is to right-click the worksheet tab, select Move or Copy, and then in the list
titled Before sheet, select where you want your worksheet tab to be placed, and click
OK. Every worksheet is made up of a lot of rectangular
boxes called cells. These cells will contain your data, which
may be text, numbers, formulas, or calculation results. Cells are organized in columns, which run
vertically down the screen and use a letter system; this is column B for instance. And rows, which
run horizontally across the
screen and use a numeric system; this is row 7 for example. Each cell is represented by a cell
reference
which is essentially just its column letter and row number. For example, if we click somewhere near
the
center of this worksheet, we now have the cell M20 selected. This is usually referred to as the ‘active
cell’. This is not only indicated by the highlighted
edges of the cell but also if you look in the top left corner of the worksheet, you
will see its cell reference is noted in the little box. Here you can see it says M20. One important thing
to note here is that cells
are always referenced by their column letter first then their row number; so, column M,
and row 20. The last element of a workbook I want to mention
is a cell range. This identifies a collection of several cells
selected together; that could mean a few cells in the same row or the same column, or it
could mean several rows and columns together. This can either be done using the mouse by
selecting the first cell then ‘dragging’ down or across to include other cells; or
you can use SHIFT+ arrow keys. This range of cells is often referred to as
an array, and it’s most commonly used as a reference in calculations and formulas. For example, if
you wanted to add up all the
values in a column between cells D9 and D19 you would specify this cell range within a
formula. Note that cell ranges are notated using a
full colon (:) between the cell references; so, in this example it would be D9:D19, or
to specify a few cells in the same row it might be D9:H9, or to select several rows
and columns it might be D9:H19. We will see this notation in use later in
this course when we start looking at calculations and formulas. A cell range can also refer
to cells on another worksheet, and a reference that spans the same cells across several worksheets
is usually referred to as a 3D reference. We can now close this workbook and we don’t
need to save it. In this video, we learned about some of the
basic terminology of spreadsheet elements. In the next video, we will discuss how to
navigate around a spreadsheet, how to use the ribbon and menus, and how to select data.
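
As a rough illustration of the cell reference and cell range notation covered in this video, the following Python sketch splits a reference such as M20 into its column and row, and lists the cells covered by a range such as D9:H19. It is not part of Excel or of the course materials; the function names (parse_cell, expand_range, and so on) are made up for this example.

```python
import re


def column_to_index(letters: str) -> int:
    """Convert a column letter such as 'A', 'M' or 'AA' to a 1-based number."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def index_to_column(index: int) -> str:
    """Convert a 1-based column number back to its letter form."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def parse_cell(ref: str) -> tuple[int, int]:
    """Split a reference like 'M20' into (column number, row number)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    # Column letter comes first, then the row number.
    return column_to_index(match.group(1)), int(match.group(2))


def expand_range(cell_range: str) -> list[str]:
    """List every cell in a range written with a colon, such as 'D9:H19'."""
    start, end = cell_range.split(":")
    col1, row1 = parse_cell(start)
    col2, row2 = parse_cell(end)
    return [
        f"{index_to_column(col)}{row}"
        for row in range(min(row1, row2), max(row1, row2) + 1)
        for col in range(min(col1, col2), max(col1, col2) + 1)
    ]


print(parse_cell("M20"))       # column M is the 13th column, row 20
print(expand_range("D9:H9"))   # a few cells in the same row
print(len(expand_range("D9:H19")))  # several rows and columns together
```

Here D9:D19 expands to the eleven cells in column D, D9:H9 to the five cells in row 9, and D9:H19 to the full block of rows and columns between those two corners.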

Now that we have a basic understanding of the main elements that make up a worksheet, let’s see how to move around a spreadsheet, get familiar with the ribbon and menus, and learn how to select data in a worksheet.
To open a sample file, we click File.
This opens Backstage View.
Here you can create a new workbook, or open, save, or print a workbook.
You can also access Excel Options.
Now, we want to open our sample file.
So, we click Open, and either select it from the Recent list, or click Browse to find the data file we want.
The first thing we should do is get acquainted with the ribbon and menus.
Notice that on the ribbon at the top we have several tabs.
Some of these tabs may be familiar to you from other Office products, such as the Home, Insert, and View tabs, while others might be new to you, such as Formulas, Data, and Power Pivot.
To make a little more workspace for ourselves we can hide this ribbon by double-clicking any tab, and to unhide it, we do the same.
The other option is to use the shortcut key CTRL+F1.
The ribbon is organized into groups of buttons to make them easier to find.
So, on the Home tab we have groups for Font, Alignment, Number, Styles, and so on.
Some of these groups show all their available buttons on the ribbon when viewing in full screen, such as Styles and Cells, but other ribbon groups have more options, which we access by clicking the little arrow icon in the bottom right corner of the group, as can be seen here on the Number group for example.
The next item I want to point out is the Quick Access Toolbar at the top of the screen above the ribbon.
As the name suggests, this is where you can quickly access the tools you use most often.
You can see we already have some tools in this toolbar such as Save, Undo, Redo, New, and Open.
But we can add other tools to the toolbar if we wish.
So if we click the drop-down arrow in the toolbar and then select a tool we will use a lot, such as Sort Ascending, that will be added, and we will add the Sort Descending button too.
Now we need to be comfortable with moving
around a worksheet.
You can simply use the arrow keys to move left, right, up, and down one cell at a time.
But you can also use Page Down and Page Up to move around a bit faster, which is especially useful if you have lots of rows of data.
And to move even quicker up or down a large datasheet, use the vertical scroll bar, and to move left or right, use the horizontal scroll bar.
Again, these can be very useful when you have a large data set.
There are also some useful shortcuts you can use.
CTRL+Home, for example, takes you back to the start of the worksheet (i.e. cell A1).
CTRL+End takes you to the cell at the end of your data in the worksheet.
CTRL+Down arrow takes you to the last cell of data in the column you’re in, while CTRL+Up arrow takes you back to the top of that column.
So a quick way to find out how many rows of data you have in your worksheet is to go to the first cell in your data and press CTRL+Down arrow to see the last row of data.
So here you can see we have 160 rows.
Now how do we go back to the top again?
CTRL+Home will do it.
So far, we have seen how to navigate around our worksheet and its data; now we need to look at how we select data.
This is very important because you often need to select data to move it, copy it, or use it in a formula.
The simplest selection is a single cell, usually done with a mouse or maybe a directional arrow key.
The next step up is to select multiple cells together, and this can be done either with a mouse by dragging from one cell to additional adjoining cells, or you can use the SHIFT key with directional arrow keys.
Next up is selecting a single column or row
which is done simply by selecting the letter at the top of a column, or the number on the left of a row.
Then we can progress to selecting multiple columns and rows, by clicking the mouse button, holding it down, and dragging across more columns.
Or if you are not comfortable with dragging, you can also select the column first, then hold SHIFT and use the arrow keys to select multiple columns.
The same applies to rows too.
However, if you have data in non-contiguous rows or columns (i.e. not next to each other), you can select the first column, then use the CTRL key to select another unconnected column, such as columns C and F here.
The largest thing you might want to select is the whole worksheet, which you can do by clicking the corner button at the top left of the grid, where the row and column headings meet.
However, this selects the entire worksheet including all the empty rows and columns; so if you only want the data in your worksheet, you can use the shortcut CTRL+A.
A word of warning when selecting data in cells, rows, and columns: there are three types of cross symbols that you might see when working with selected cells.
The first one is the large white cross that you see when you select a cell, as can be seen here in cell A4; this is the Select cross that we have been using already in this video to select cells.
The second type you might see is when you hover over the bottom edge of a cell and see a thin black cross-type symbol with arrows on each point; this is the Move symbol and would move the cell data to another location.
The last type is the small thin black cross that is seen when you hover over the bottom right corner of a cell; this is the Fill Handle or Copy symbol and it fills (or copies) the cell data to another location.
In this video, we learned how to move around a spreadsheet, became familiar with the ribbon and menus, and learned how to select data in a worksheet.
In the next video, we will discuss how to enter data, how to copy and paste data, and how to format data in a spreadsheet.
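
The CTRL+Down arrow trick for counting rows, described in this video, can be pictured with a small Python sketch. This is a deliberate simplification rather than Excel's actual behavior: it assumes the column is a plain list, that empty cells are None or an empty string, and that the starting cell sits inside one unbroken block of data. The function name ctrl_down is made up for this example.

```python
def ctrl_down(column: list, start: int) -> int:
    """Return the 0-based position of the last filled cell reached from `start`.

    Simplified model: walk down while the next cell holds data, and stop at
    the last filled cell before the first empty one.
    """
    row = start
    while row + 1 < len(column) and column[row + 1] not in (None, ""):
        row += 1
    return row


# A hypothetical column with one heading and 159 data values (160 filled rows),
# followed by some empty cells.
column = ["Model"] + list(range(159)) + [None, None]

last = ctrl_down(column, 0)
print(last + 1)  # worksheet rows are numbered from 1, so add 1 to the position
```

Starting from the first cell, the sketch stops on the last filled cell, so its 1-based row number is the number of rows of data in the column.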
Viewpoints: Using Spreadsheets as a Data Analysis Tool
In this video, we will listen to several data professionals discuss the advantages and limitations of using spreadsheets as a tool for data analysis.
Let us start with, “What are the benefits and advantages of using spreadsheets as a tool for data analysis?”

My experience using spreadsheets as a tool for data analysis is somewhat mixed.
I think they can be really, really useful in the right context, but using spreadsheets definitely has its limitations.
The big pro of using spreadsheets is you can see all the data cleanly laid out in front of you in a table.
So, I think it's very clear to anyone looking at a spreadsheet exactly what the data is, what format it comes in, all of that.
You can just easily, visually inspect it.

As a CPA, I use Microsoft Excel on a daily basis and I have done so for the duration of my career.
The functionalities, the pivot tables, the charts, etc.
But also, being able to use formulas.
My personal favorite is INDEX MATCH, a pretty simple way to take just thousands of lines of information and sift through all of that to find specifically what you're looking for.
Excel is really that one-stop-shop where you can perform calculations, analyze financial ratios, and even export reports out of the ERP that I spoke of earlier to customize it as you need.

My experience using spreadsheets is that they're great for simple analysis.
I will say spreadsheets, over the years, the process itself has just improved; as systems improve, as technology improves, spreadsheets are the way to go.
Spreadsheets overall, when you do have probably anywhere from zero to twenty thousand lines of data, it's a good way to go; you can really pull out the data.
Whether I'm trying to see how much a client’s making per month, but they may have, you know, a thousand transactions, all of that's helpful.
I can use this spreadsheet to whittle down what is actually going on per month, or if I want to do a SUMIF, or, you know, if this happens, give me this number.
It's really helpful to be able to dig in and wrap your hands around it and take something that seems, on the surface, almost unmanageable at twenty thousand lines; but if I take it and I massage it, put it in a spreadsheet and then sort it, filter it, make it pretty, put in a pivot table, I can get what I need.
It’s just all about not looking at it as being this intimidating thing but making it more manageable and breaking it down into bite-size chunks.

Spreadsheets are the easiest way to analyze data and present data.
We don't need any fancy tools or additional software for spreadsheets.
It's like the commonly utilized language to communicate.

Thank you for that insight, but let's move on to look at the other side of the coin.
What are the drawbacks and limitations of using spreadsheets as a tool for data analysis?

I think one of the big cons in terms of analyzing data within spreadsheets is it's really hard to reproduce state.
So, in other words, if you load in some data and you filter out some bad values, or you impute some missing values, there's no way to tell your colleagues or your future self exactly the different steps you took to create that data set, or to modify that data set.

It's almost a dilemma because of the plethora of options available within Excel and all of the functions that are there, supposedly to make your life easier, but it's nearly impossible to know everything.
And you can find yourself in what we accountants call analysis paralysis when you're looking at something for too long or you're not well versed in a particular Excel function.
So, you may spend a lot more time, energy, and effort trying to figure that one thing out.
And had you done it a different way, or maybe a manual way, you probably could have gotten to the solution a lot easier.

And the downside of using spreadsheets is that if you have complex formulas, VLOOKUPs, IF statements, at times they just stop working and you have to rebuild them.
So, I have found that it's better to use Excel just for simple analysis and for a download of information.

I love a good spreadsheet.
I love using Excel and pivot tables to get to the data, but I find that if I start to get over ten or twenty thousand lines of data, it gets a little tricky.
And sometimes the spreadsheets will crash.
So that's when we might move to Access and some of the other tools that we use.

It is very difficult to handle extremely large data sets in spreadsheets.
Besides, spreadsheets have less flexibility for complicated analysis and presentation.

Week 2 (Viewing, Entering, and Editing Data)

Now that you have learned basic spreadsheet terminology and learned how to navigate your
way around worksheets and select data in Excel, it’s now time to start entering some data.
First, we will look at some of the handy viewing features provided in Excel, and then we’ll
enter some data, and then edit that data.
When you have a lot of data in your worksheet it can be useful to zoom in closer to a specific
area of the data.
The Zoom Slider at the bottom right corner of the worksheet allows you to do just that.
You can either click on the plus and minus buttons or drag the slider to select your
preferred zoom value.
You also have some zoom controls in the ribbon on the View tab.
Zoom lets you pick a predefined zoom level or a custom one, the 100% button zooms the
worksheet back to its original size, and Zoom to Selection enables you to select an area
of data and then zoom into that specific selection only.
If you want to see several areas of your data at the same time while zoomed in, you can
use the Split button.
This splits the screen into multiple sections; and you can scroll each section separately.
If you only want two sections, you can remove either the horizontal or the vertical split
by double-clicking on it.
If you have headings in your columns like a header row, then you might want those to
remain on screen while you move down the sheet.
To do that you need to use Freeze Panes.
You can freeze only the top row if you wish, or if that doesn’t suit, as is the case
here, then you can select the row (or even just a cell in the row) below the row or rows
you want to freeze, and then select Freeze Panes.
You can do a similar thing for columns you want to freeze too.
And you can even freeze both rows and columns at the same time.
The trick here is to first select the cell that is both one row below where you want
to freeze, and one column to the right of where you want to freeze.
In this case, that is cell C4.
Now we can scroll down the worksheet and across the worksheet and we can still see the header
row and the Manufacturer and Model columns.
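
The Freeze Panes rule described above (select the cell one row below and one column to the right of what you want to freeze) can be expressed as a small calculation. The Python sketch below is only an illustration of that rule, not an Excel API; the function name frozen_area is made up for this example.

```python
import re


def frozen_area(selected_cell: str) -> tuple[int, int]:
    """Return (rows frozen, columns frozen) when Freeze Panes is applied
    with `selected_cell` as the active cell.

    Everything above the selected cell's row and everything to the left of
    its column stays on screen.
    """
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", selected_cell)
    if match is None:
        raise ValueError(f"not a cell reference: {selected_cell!r}")
    letters, row = match.groups()
    column = 0
    for ch in letters.upper():
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(row) - 1, column - 1


print(frozen_area("C4"))  # rows 1 to 3 and columns A and B stay visible
print(frozen_area("A2"))  # only the top row is frozen
```

With C4 selected, three rows and two columns are frozen, which matches the header row plus the Manufacturer and Model columns staying in view.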
Now, if you have multiple workbooks open (notice I said workbooks and not worksheets) then
you can switch between them by using View, Switch Windows, or the faster method is to
use the CTRL+F6 shortcut.
Now let’s enter some data into a blank worksheet.
The easiest way to open a new blank workbook from within Excel is to click the New button in
the Quick Access Toolbar (or press CTRL+N if you prefer keyboard shortcuts).
So let’s enter some headings across the top of the worksheet; this is typically referred
to as a ‘header row’.
Note, that if you press Enter after typing data into a cell the next active cell is the
one directly below, which is not what we want in this case.
But, if we press Tab after we enter data in a cell, it selects the next cell along in
the row as the active cell.
Now we’ll enter some headings and press Tab after each entry.
Notice that the text is slightly longer in some of the cells and it either gets partly
hidden by the next cell or overlaps it.
If you click and hold the divider line between two columns, you can drag it left and right
to resize it manually.
If you want to do that automatically, you can double-click the divider line between
two columns.
As these are going to be headings for our columns, let’s make them bold.
Now let’s add another column between the parts and accessories columns.
Simply select the right-hand of those two columns, then right-click and choose Insert
to put another column to the left of the selected column.
Let’s call it Servicing Sales.
To tidy up all our column widths simultaneously, we select all the columns from A to E, then
double-click any of the divider lines between columns; this automatically reduces or increases
each column’s width to fit the data in that column.
OK, now we have some headings, let’s enter some month data in column A.
So, if we type Jan in cell A2 and press Enter then it takes us to the cell below, which
is what we want in this case and we can type Feb in cell A3 and so on until
we get to Dec in A13.
Now, let’s suppose you need to change a couple of your headings.
You have several ways of editing existing data in a cell:
You can either select the cell and then just start typing over the existing data.
Or you can select the cell and press F2 on your keyboard to put the cursor at the end
of the cell and make your changes.
Or you can simply double-click somewhere on the cell to put the cursor at that position
in the cell and make your changes.
And you can even select the cell and then click in the formula bar to edit your cell
data.
Now let’s do the same for the parts and accessories column headings.
In this video, we learned about some of the viewing options in Excel, and we learned how
to enter and edit data in cells.
In the next video, we will learn how to copy and fill data, and how to format the cells
and data in a worksheet.

Copying, Filling, and Formatting Cells and Data

Now that we have learned about some of the handy viewing features provided in Excel, and entered and edited some data, let’s discuss how to move, copy, and fill data, and how to format cells and data to suit our needs.
The first thing we are going to discuss is how to move data.
If you select a range of cells, in this case the headings in A1 to E1, and then hover over the top or bottom edge of a selected cell, you will see the Move pointer, and you can then drag the selection to another place on the worksheet.
Alternatively, if you want to copy the data instead, you do the same thing but this time you also hold the CTRL key as you drag the selection to another location, and you will see the Copy pointer.
If you are not comfortable with dragging, you can also use the Copy and Paste menu commands or keyboard shortcuts.
So you select some data in column A and copy it to the clipboard.
Then you simply select the new location and paste the copied data.
You can also move or copy between worksheets, so let’s create a new worksheet.
Then select some data from Sheet1, and this time let’s use the CTRL+C keyboard shortcut to copy it to the clipboard.
Then choose the other worksheet and use the CTRL+V shortcut to paste the data.
However, notice that the column widths are not the same as the original source data, so let’s undo that and try another paste option.
By default, when you paste the copied data, it uses the column width settings of the destination cells.
So, to paste it and retain the column widths of the source data, you choose the special option under the Paste command, called Keep Source Column Widths.
As an alternative to having to enter data manually in a worksheet, you can use an Excel feature that automatically fills cells with data when it follows a sequential series or pattern.
The feature is called AutoFill, and it can be especially useful when you need to enter lots of repetitive data into Excel, such as date information.
For example, if you enter a month in a cell, even using a shortened version of the name, you can use what’s called the Fill Handle to select down to the end of the series, and AutoFill will work out what the series is, based on the selected data.
Let’s try the same thing with days of the week.
If you enter Mon in a cell, then drag the fill handle to use AutoFill, it will determine that you want to enter the days of the week sequentially.
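
To make the idea of AutoFill working out a series more concrete, here is a minimal Python sketch. It is a simplified model and not how Excel is implemented: it assumes the selected cells hold either abbreviated day names, abbreviated month names, or numbers, and it looks at no more than the first two selected cells to work out the step. The function name autofill and the DAYS and MONTHS lists are made up for this example. The same rule also covers the two-cell and numeric patterns covered in this video.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def autofill(seed: list, count: int) -> list:
    """Extend the selected cells in `seed` to `count` values.

    - Day or month names continue their series, wrapping around at the end.
    - Two selected cells define the step between entries.
    - A single number has no pattern yet, so it is simply copied.
    """
    for series in (DAYS, MONTHS):
        if all(value in series for value in seed):
            positions = [series.index(value) for value in seed]
            step = positions[1] - positions[0] if len(seed) > 1 else 1
            return [series[(positions[0] + i * step) % len(series)]
                    for i in range(count)]
    if len(seed) == 1:
        return seed * count          # nothing to infer: copy the value
    step = seed[1] - seed[0]         # numeric step from the first two cells
    return [seed[0] + i * step for i in range(count)]


print(autofill(["Mon"], 7))          # the days of the week in order
print(autofill(["Mon", "Wed"], 5))   # every other day
print(autofill([5], 4))              # a single number is copied
print(autofill([5, 10], 5))          # increments of 5
```

The key design point is that one selected cell is only enough when the value belongs to a known series; otherwise at least two cells are needed before a step can be inferred.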
However, if you also enter Wed (for Wednesday) in the next cell down, and select both cells in the series,
i.e. A16 and A17, and then drag the fill handle

down, AutoFill determines that the sequence has changed to every other day, and fills

in the data series for you. It’s important to select all cells that

define the pattern when using AutoFill so that it can best determine what the pattern is, in this case cells
A16 and A17. A similar thing applies to numerical patterns;

if you enter 5 in a cell, and then use the fill handle to fill the data down the column. Because the data is
not the name of a day

or month for example, AutoFill can’t determine what the pattern is yet. So, In this case, it just copies the
value

5 into every selected cell. However, if you enter the value 10 in B3,

and then use the fill handle to fill the data down the column, AutoFlll determines that

the pattern is incrementing by 5 each time and it fills in the remainder of the data

pattern for you. We are now going to look at formatting our data, and there are essentially two distinct
parts to this. First, there’s formatting of the cells themselves (with a fill color, a bold border, and bold text within them, for example). And then there’s formatting the data in the cells (for example, making it text format, number format, or a specific currency or accounting format).

Let’s open the car sales worksheet we used previously. Select the headings in cells A3 to P3, either using the mouse or using the shortcut keys CTRL+SHIFT+Right Arrow. On the Home tab, click the Styles drop-down arrow, and select a style color for your cells. Then you can make the selected cells bold. Next, select the data in the Manufacturer column, either using the mouse or the shortcut keys CTRL+SHIFT+Down Arrow. In the Styles drop-down list, select another style color for the selected cells. Again, you can make the cells bold. Then select the data in the Model column, again either using the mouse or the shortcut keys CTRL+SHIFT+Down Arrow. In the Styles drop-down list, select another style color for the selected cells. This time you could make the selected cells italic. You can also change the font size and style. Lastly, you can select all the other cells in the data by using the mouse or CTRL+SHIFT+Right Arrow then Down Arrow, and apply borders to the data cells.

Now it’s time to format the cell data. The sales figures in columns C and D can be formatted to display only two decimal places; just select the data and click the Decrease Decimal button. We also have an issue with a couple of the car models. If you look in cells B129 and B130, where the model name is supposed to be displayed, you can see there are actually two dates listed instead. And if you look in the Number Format box, the format type is Custom. This has happened because the models are supposed to be the Saab 9-5 and the Saab 9-3, but when the data was imported from CSV files, these two cells must have been incorrectly interpreted as date values and not just model numbers. You can fix this by formatting these two cells as Text, and then entering the correct values of 9-5 and 9-3.

The last thing we shall do is format some data as currency. If you look at the heading in column F, it says Price in thousands of dollars, and cell F4 is using the General format. So, let’s change the format of this column to American currency format. We select the column, F in this case, then select More Number Formats from the drop-down list, then choose the Currency option and the correct currency symbol and format. And we’re done.

In this video, we learned how to move, copy, and fill data, and how to format cells and cell data to suit our needs. In the next video, we will look at the basics of formulas, learn how to perform simple calculations, and learn how to select ranges and copy formulas.
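
The AutoFill behavior shown in this video can be approximated in a few lines of Python. This is a simplified sketch, not Excel's actual algorithm: the function name autofill, the DAYS and MONTHS lists, and the rule of using only the first two selected cells to find the step are all assumptions made for illustration.

```python
# Simplified sketch of AutoFill-style series extension (an illustration,
# not Excel's actual algorithm).
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def autofill(seed, count):
    """Extend the selected seed cells to `count` values in total.

    Only the first two seed cells are used to work out the step.
    """
    # Named series (days or months) wrap around at the end of the list.
    for names in (DAYS, MONTHS):
        if all(value in names for value in seed):
            start = names.index(seed[0])
            # One seed cell: step by one name. Two cells: use their gap.
            if len(seed) == 1:
                step = 1
            else:
                step = (names.index(seed[1]) - start) % len(names)
            return [names[(start + i * step) % len(names)] for i in range(count)]

    # A single number gives no pattern, so the value is simply copied.
    if len(seed) == 1:
        return [seed[0]] * count

    # Two numbers define a constant increment.
    step = seed[1] - seed[0]
    return [seed[0] + i * step for i in range(count)]
```

For example, autofill(["Mon", "Wed"], 4) continues every other day, while autofill([5], 3) just repeats 5 because a single number defines no pattern.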

The Basics of Formulas


Now that we have learned how to move, copy, and fill data, and how to format cells and
data, next we will take a look at the basics of formulas, including some basic calculations,
selecting ranges in formulas, and how to copy formulas.
A typical formula is made of several key components.
The equal sign starts the formula off and lets Excel know you are creating a formula
in this cell.
The next part is the function, which performs the calculation.
For example, the SUM function adds up the values in referenced cells or cell ranges.
Then comes the reference, which is the cell or range of cells you want to include in your
calculation, and these need to be enclosed in parentheses.
You also have operators, which specify what type of calculation to perform.
Common arithmetic operators include: addition, subtraction, multiplication and division.
And these are represented by symbols.
The plus symbol for addition, the minus symbol for subtraction, the asterisk for multiplication,
and the forward slash for division.
There are other types of operators too.
Namely comparison, text concatenation, and reference.
You may also use constants in your formulas, which as the name suggests are numbers or
values which you can enter directly into a formula, and which don’t change.
This might be a whole number such as 5, it might be a percentage such as 10%, or it might
even be a date.
So, a typical formula might be =SUM(B5*20), which would take the value in cell B5 and
multiply it by 20.
Let’s start with a few basic calculations.
Suppose you want to add up January and February sales of accessories.
You would start by typing an equal sign, which lets Excel know you are entering a formula.
Then you type in the function you wish to use, in this case the SUM function.
Note the description.
Next you type an open parenthesis, then you enter your cells, which in this case
would be E2 and E3, so you could enter that as ‘E2,E3’, then a close parenthesis, and
press Enter.
And if you wanted to add March sales as well, then you would have to extend the cell range
to include E4.
So you could type E2,E3,E4 as your range and it will work.
Remember, to edit a cell, you select the cell, and either edit it directly in the formula
bar, or press F2, or double-click the cell.
However, it’s very cumbersome and not very flexible to do it this way, because if you
wanted to add up the entire column then you’d have to type every cell reference, one after
the other.
So thankfully, there’s a better way.
Instead of typing each cell to include in the reference, you just put a colon between
the first and last cells in your range, so E2:E4, in this case.
And if you wanted the whole column, then you would enter E2:E13 in your formula.
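
To make the difference between listing cells and using a range concrete, here is a small Python sketch. It is only an illustration: the expand_range helper, the cells dictionary, and the sales figures in it are hypothetical and do not come from the course workbook.

```python
import re


def expand_range(reference):
    """Expand a single-column range such as 'E2:E4' into its cell names."""
    match = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", reference)
    if match is None:
        raise ValueError(f"not a range reference: {reference!r}")
    first_col, first_row, last_col, last_row = match.groups()
    if first_col != last_col:
        raise ValueError("this sketch only handles ranges within one column")
    return [f"{first_col}{row}" for row in range(int(first_row), int(last_row) + 1)]


# Hypothetical sales figures for cells E2..E4 (not from the course workbook).
cells = {"E2": 120.0, "E3": 95.5, "E4": 130.25}

listed = sum(cells[name] for name in ["E2", "E3", "E4"])     # like =SUM(E2,E3,E4)
ranged = sum(cells[name] for name in expand_range("E2:E4"))  # like =SUM(E2:E4)
```

Both sums give the same total; the range form simply saves you from typing every cell, which matters more as the range grows (E2:E13 covers twelve cells).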
But there’s another way of doing it, and that’s by using your mouse to select the
range, so you still type =SUM then an open parenthesis, but select the range with your mouse
(or SHIFT + arrow keys) and just press Enter.
Excel will add the close parenthesis for you.
To total these columns up, and add some tax, you’d add some headings first for Subtotals,
and Tax at 20%.
Then your formula will need to multiply the value in Subtotals by 20%.
If you want to add up all the column subtotals and calculate the taxes, then you could repeat
the previous process for each column, but that’s very time consuming, and you don’t
need to, because Excel has some neat tricks to do this for you.
Just select the fill handle in the bottom right corner of the cell, and drag across
to the other cells to copy the formula; this is called AutoFill.
Notice how the formula is copied, but the column references change in relation to the cells’
position on the worksheet.
So what was E2:E13 has become B2:B13.
These are known as relative references, but more on that later in the course.
And you can do the same thing for the tax values in row 16.
Now, you need a row for showing the totals.
The calculation here is simply the subtotal value in cell B15, added to the tax in B16.
And again, you can use the fill handle to copy the formula across.
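
The subtotal, tax, and total calculation described above works out as follows. This is a minimal Python sketch of the arithmetic only; the monthly_sales figures are made up for illustration, and the cell addresses in the comments follow the walkthrough (subtotal in B15, tax in B16).

```python
# Hypothetical monthly sales for one product column (rows 2 to 13).
monthly_sales = [100, 120, 90, 110, 130, 95, 105, 115, 125, 140, 150, 160]

TAX_RATE = 0.20  # "Tax at 20%" from the walkthrough

subtotal = sum(monthly_sales)   # like =SUM(B2:B13) in B15
tax = subtotal * TAX_RATE       # like =B15*20% in B16
total = subtotal + tax          # like =B15+B16 in the totals row
```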
If you want to total the sales of all products by month, you’d add a column heading; notice
how the cell style is copied to the new heading automatically.
Remember, to widen a column, either drag the divider manually, or double-click the divider.
Then enter the formula in cell F2 as you’ve done before.
However, Excel has another trick up its sleeve.
It’s called AutoSum and is found on the Home tab, in the Editing group.
This is a great little shortcut for some simple common functions like Sum, Average, Count,
Max, and Min, but you can choose other functions too.
You want ‘Sum’ for this particular calculation.
Notice that it also has a keyboard shortcut, ALT+=; press the shortcut and then press Enter,
and it’s done.
Now you can use the fill handle to copy down the remaining values.
But hold on, there is one more Excel trick to show and it’s a good one!
Suppose your column of data was very long; you might have to drag the fill handle down
over several pages, which isn’t easy to do and can easily lead to errors when selecting
large lists of data values.
Rather than needing to drag down to the rest of the column, you can just double-click the
fill handle, and it will automatically copy the formula to all the remaining cells in
that column.
This one is a real time-saver.
Finally, let’s format all these values to use the US dollar currency format.
In this video, we learned about the basics of formulas, how to perform simple calculations,
how to select ranges in formulas, and how to copy formulas.
In the next video, we will look at how to use some of the common functions used by Data
Analysts and discover some more advanced functions.

Intro to Functions
Now that you have learned about the basics of formulas, learned how to perform some basic
calculations, and how to select ranges and copy formulas, next we will have an introduction
to functions, including using some common statistical functions.
And then we will learn about some more advanced functions that a Data Analyst might also use.
First, let’s look at some common functions used for statistical calculations.
So, we’ll add some row headings for average, minimum, maximum, count, and median.
Then in cell B20, let’s work out the average of the car sales for the year, from the table
above.
On the Home tab, in the Editing group, we click the AutoSum drop-down list and choose
Average.
Now, because AutoSum tries to use the values directly above it in the column, we need to
modify the cell range here to B2 to B13.
Then we can use the Fill Handle as we’ve seen before to copy the formula across to
column E.
For the minimum calculation in B21, we select Min from the AutoSum list.
And again, we need to modify the cell range.
So this calculates the lowest value in our range.
And fill across to column E. And for the maximum calculation, we select
Max from the list.
And then modify the range.
And once again, copy the formula across.
This calculates the highest value in our range.
In B23 we will calculate the Count, which basically just means the number of values
that exist in the selected range.
So, we select Count Numbers from the list.
Then modify the range.
For the median calculation, we can select ‘More Functions’ from the AutoSum list, then
select ‘Statistical’ as the category, and scroll down to find the MEDIAN function.
The ‘median’ returns the exact middle of a range of selected values.
Note that if you’re selecting an odd number of values it will return the figure that is
the middle value in your selected range, but if you have selected an even number of values
in your range, it will return the average of the two middle values in your range.
Once again, we need to change the cell range to B2 to B13.
And we can then copy this formula across to column E.
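
If it helps to see the same five statistics outside Excel, here is a short Python sketch using the standard statistics module. The car_sales figures are hypothetical, and the comments pair each line with the worksheet function it loosely corresponds to.

```python
import statistics

# Hypothetical monthly car sales for rows 2 to 13 (twelve values).
car_sales = [42, 38, 51, 47, 55, 60, 58, 49, 44, 53, 61, 40]

average = statistics.mean(car_sales)    # like =AVERAGE(B2:B13)
lowest = min(car_sales)                 # like =MIN(B2:B13)
highest = max(car_sales)                # like =MAX(B2:B13)
count = len(car_sales)                  # like =COUNT(B2:B13) when every cell is numeric
median = statistics.median(car_sales)   # like =MEDIAN(B2:B13)

# With an odd number of values the median is the single middle value;
# with an even number it is the mean of the two middle values.
odd_median = statistics.median([3, 1, 2])       # middle of 1, 2, 3
even_median = statistics.median([4, 1, 3, 2])   # mean of 2 and 3
```

Because the twelve-month range has an even number of values, its median is the average of the sixth and seventh values once they are sorted.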
You’ve seen AutoSum and some of the common statistical functions in Excel, but there
are more than 400 other functions available, so let’s explore just a few of those now.
On the Formulas tab, in the Function Library group, there are drop-down lists for several
function categories.
The first is a list of ‘Recently Used’ functions, which updates automatically as
you use them.
Then you have functions related to ‘Financial’ calculations.
If you hover over the name of a function, you see a short description for each one;
so here we have the accrued interest function, and here is the interest rate function.
The ‘Logical’ list has BOOLEAN operator functions such as AND, IF, and OR.
There are several functions related to Text, such as CONCAT, which is an updated version
of a previous function called CONCATENATE (which is still supported by the way for backwards
compatibility), FIND, and SEARCH.
There are also several functions related to dates and times, such as NETWORKDAYS, WEEKDAY,
and WEEKNUM.
In the ‘Lookup & Reference’ list there are functions such as AREAS, HLOOKUP, SORTBY,
and VLOOKUP.
In the ‘Math & Trig’ list you’ll find lots of useful mathematical functions, such
as POWER, SUMIF, and SUMPRODUCT, alongside many functions for trigonometric purposes,
such as cosine, sine and tangent.
There is also a ‘More Functions’ list which provides several more function categories,
such as Statistical, Engineering, and Information.
In the ‘Statistical’ list you’ll find functions such as Average, Count, Max, Median,
and Min; we saw some of these used earlier in this video.
If you’re struggling to find the function you want in these lists, you can also search
for a function; just click the ‘Insert Function’ button on the Formulas tab, and then either
browse the category lists available, or choose ‘All’ and look down the alphabetical list
for the function you want.
Alternatively, type the name of a function you want to find, and click ‘Go’ to search
for it, then select the one you want from the returned search.
In this video, we learned about the basics of functions, how to use some of the more
common functions that a Data Analyst might employ, and looked at some of the more advanced
functions available in Excel.
In the next video, we will look at referencing data in formulas; specifically differentiating
between relative and absolute references, and error handling in formulas.
Referencing Data in Formulas
Now that you've had an introduction to functions, seen the use of some common statistical functions, and learned about some of the more advanced functions that a data analyst might use, in this video we'll look at the difference between relative, absolute, and mixed references in formulas, as well as how to use them. And we'll learn about formula errors in Excel.

It's important to understand the difference between relative and absolute references when creating your formulas. By default, in Excel, cell references are relative references. The term relative is the key here, because it means that when you reference a cell, you are in fact referencing the cell's position in relation to the cell that the formula is in. That is why, when we have been copying formulas from one cell to another so far in this course, using either copy and paste or the fill handle, we haven't needed to modify the cell references: Excel assumes you are using relative references. When the formulas are copied, the cell references are changed to match the relative positions of the cells that are being copied to.

So now we know that relative references are the default in Excel, but how do we make it so that the cell references don't change when we copy them? For that you need to use absolute references. In contrast to relative references, absolute references to cells stay the same when you copy a formula containing such references.

Lastly, there may also be some instances where you only want one of the cell reference identifiers to be absolute and the other one to be relative.

For example, you might want the row identifier to be absolute, but the column identifier to be relative, or vice versa. These are called mixed references. An example of this would be =A$1+$A3, where A$1 has a relative column and an absolute row, and $A3 has an absolute column and a relative row. In contrast to relative and absolute references, when you copy a formula containing mixed cell references, any relative parts of the cell references will change, whereas any absolute parts will stay the same in the copied formula.

First, let's look at an example of using relative references in a formula. For example, if we enter the formula =A1+A3 in cell C4, note the blue and red highlighted cells in A1 and A3. These denote the cells being relatively referenced in the formula. If we copy the formula to the cell directly below using the fill handle, we can see that the result changes, and if we look at the copied formula, you can see that the blue and red cell references have changed relative to their position on the worksheet. The formula has been changed to =A2+A4 in the copied formula. That is, each cell reference has moved one cell down. And if we copy and paste the formula to C7, you can see that the result also changes, and again we can see that the blue and red cell references in the copied formula have changed.

Now let's look at an example of how

to use absolute references in a formula. All you need to do to make a cell reference absolute is put a dollar sign in front of both the column and row identifiers in the formula. For example, if we enter the formula =$A$1+$A$3 in cell E4, note the blue and red highlighted cells in A1 and A3. These denote the cells being absolutely referenced in the formula. When we copy the formula using the fill handle, you can see that the result stays the same this time, and if we look at the copied formula, you can see that the blue and red cell references haven't changed. The formula is still =$A$1+$A$3 in the copied formula. Similarly, if we then copy and paste the formula to E7, you can again see that the result stays the same, and the blue and red cell references haven't changed. The formula is still =$A$1+$A$3 in the copied formula. That is, the cell references haven't changed.
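
The way references adjust when a formula is copied can be sketched in Python. This is an illustration under stated assumptions, not how Excel is implemented: the copy_formula function and its helpers are hypothetical, they only understand plain A1-style references on one sheet, and they would be confused by function names that contain digits. A dollar sign pins the column or row it precedes, and anything without one moves by the copy offset, which covers relative, absolute, and mixed references alike.

```python
import re

# A cell reference: optional $, column letters, optional $, row digits.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def col_to_num(col):
    """Convert column letters to a 1-based number (A -> 1, Z -> 26, AA -> 27)."""
    num = 0
    for ch in col:
        num = num * 26 + (ord(ch) - ord("A") + 1)
    return num


def num_to_col(num):
    """Convert a 1-based column number back to letters."""
    col = ""
    while num > 0:
        num, rem = divmod(num - 1, 26)
        col = chr(ord("A") + rem) + col
    return col


def copy_formula(formula, rows_down=0, cols_right=0):
    """Return the formula as it would read after copying by the given offset."""
    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        if not col_abs:  # a relative column moves with the copy
            col = num_to_col(col_to_num(col) + cols_right)
        if not row_abs:  # a relative row moves with the copy
            row = str(int(row) + rows_down)
        return f"{col_abs}{col}{row_abs}{row}"

    return REF.sub(shift, formula)
```

Copying =A1+A3 one row down with copy_formula("=A1+A3", rows_down=1) gives =A2+A4, whereas =$A$1+$A$3 comes back unchanged whatever the offset.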

Lastly, we'll look at an example of how to use mixed references in a formula. If we enter the formula =A$1+$A3 in cell G4, note the blue and red highlighted cells in A1 and A3. These denote the cells being referenced in the formula. If we copy the formula to the cell below using the fill handle, you can see that the result changes, but it's a different result from the previous examples. And if we look at the copied formula, you can see that the first, blue cell reference has stayed the same, but the second, red cell reference has changed. If we copy and paste the formula to G7, you can see that the same thing happens. The result changes, and again we can see that the first, blue cell reference has stayed the same in the copied formula, while only the red cell reference has changed.

Now we'll have a quick

introduction to dealing with formula errors in Excel. Because of the complexity of writing formulas, especially the more complicated ones, there are bound to be occasions when you make a mistake in the syntax or in the data selection, which will lead to a formula error. Errors are typically denoted by displaying, in the cell that is supposed to be displaying the result, one of the error codes in this list.

When you see multiple hash symbols in a cell, it's not really an error; it just means the column either isn't wide enough to display the whole value, or it contains a negative date or time value. So if we type CTRL+; (semicolon), then a space, then CTRL+SHIFT+; it enters today's date and the current time. But the cell is too narrow to display it, so what we see is multiple hash symbols. If we adjust the column width, we can now see the cell contents. So, as I said, this really shouldn't be considered an error.

However, if we enter the formula seen in cell I7, when we press Enter, we see a #NAME? error. This error was caused by trying to use an X as a multiplication operator when in fact it should be an asterisk. Note the small green triangle in the top left corner of the cell. Also note that when you select the cell, an exclamation mark appears, providing you with a hint about what caused the error. In this case it says the formula contains unrecognized text.

When you click the drop-down arrow next to the exclamation mark for an error, you see several options. The first line also gives you a clue about the nature of the error. This one says Invalid Name Error, so it was probably a mistyped cell reference, value, or function name. If you click Help on this Error, a help pane opens with specific information related to this error. If you click Show Calculation Steps, a dialog box opens displaying the current syntax with the error underlined, and you can try to evaluate the error. If you are certain the formula is not actually in error, you can choose Ignore Error, and if you want to edit the formula, click Edit in Formula Bar and the cursor will be focused in the formula bar so that you can try to correct the formula error. If you click Error Checking Options, the Excel Options dialog box opens at the section related to error checking rules, and you can modify these options to suit your needs.

Each of the errors you make which generates one of the error codes listed at the start of this video will have a different reason and a different solution. For more information on each of these errors and typical solutions, visit the link provided.

In this video we learned about referencing data in formulas, specifically differentiating between relative, absolute, and mixed references, and how to use them. And we learned about formula errors in Excel.

WEEK 3 - Introduction to Data Quality
Data analysis can play a pivotal role in business decisions and processes. In order to use the data to make confident decisions, we must have the right information for the project and the data must be free from errors. In this video we will learn how to profile data to discover inconsistencies. Whether we are working with small sets of data or analyzing a spreadsheet with thousands of rows, one of the most difficult parts of data analysis is finding and keeping clean data.

To help with this process and qualify the data, look for these five traits: Accuracy, Completeness, Reliability, Relevance, and Timeliness. Accuracy is the first and most significant aspect of data quality. A data analyst must clean the data set by removing duplicates, correcting formatting errors, and removing blank rows. Another important aspect of data quality is completeness: determining whether the information required to complete the data set is readily available. Why does this matter as a trait for quality data? Let’s say we are given the task of calculating the revenues of all sales per region. After collecting the data, we discover that no regions were specified. This data would then be considered incomplete, and other sources would have to be considered to obtain the data required.
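
The accuracy and completeness checks described above can be sketched as a small profiling function in Python. This is only an illustration under assumptions: profile_rows, the field names, and the sample sales records are all hypothetical, and every value is assumed to be a text string (or None) as it would be after reading a text file.

```python
def profile_rows(rows, required_fields):
    """Count blank rows, duplicate rows, and rows missing a required field."""
    report = {"blank": 0, "duplicate": 0, "incomplete": 0}
    seen = set()
    for row in rows:
        # Normalize each value so stray spaces do not hide blanks or duplicates.
        cleaned = {field: (value or "").strip() for field, value in row.items()}
        if not any(cleaned.values()):
            report["blank"] += 1
            continue
        fingerprint = tuple(sorted(cleaned.items()))
        if fingerprint in seen:
            report["duplicate"] += 1
        seen.add(fingerprint)
        if any(not cleaned.get(field) for field in required_fields):
            report["incomplete"] += 1
    return report


# Hypothetical sales records; the field names are assumptions for illustration.
sales = [
    {"customer": "Acme", "region": "North", "revenue": "1200"},
    {"customer": "Acme", "region": "North", "revenue": "1200"},   # duplicate
    {"customer": "Birch", "region": "", "revenue": "800"},        # no region
    {"customer": "", "region": "", "revenue": ""},                # blank row
]

report = profile_rows(sales, required_fields=["region"])
```

Here the report flags one blank row, one duplicate, and one record with no region, which is the kind of gap that would make a revenue-per-region calculation incomplete.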

Reliability is another vital factor in determining the quality of the data. For instance, let’s say we are given the task
to determine the agent revenue by customer. When gathering the data, we find the agents keep their own records
and do not always update the information in the shared company database. With those factors in mind, we would
then determine that the data in the shared company database was unreliable and new processes would need to be
established to ensure reliable data.

Relevance is another trait of quality data. When collecting information, a data analyst must consider if the data
being assembled is really necessary for the project. For example, when reviewing the data related to the sales
revenue per customer, information such as customer birthdays and other personal information is also included. By
making the determination early to exclude the personal information from the data set, the analyst would save
themselves from having to review unnecessary information.

The last factor in determining the quality of the data is timeliness. This trait refers to the availability and
accessibility of the selected data. Let’s say our sales report is going to be used for weekly employee reviews, but
our report is only refreshed once a month. This error in refreshing the data would cause our report to become
outdated, and would have serious consequences for employee reviews.

In this video we learned the important role of a data analyst in qualifying data. By considering the five traits of
good quality data, an analyst can save time, avoid serious issues, and have data that is free from errors. In the
next video we will take the collected data and learn how to import it to our spreadsheet.
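
Two of the traits above, completeness and timeliness, can also be checked mechanically. The following is only a minimal sketch in Python rather than in a spreadsheet; the records, the field names, and the seven-day refresh limit are assumptions made up for illustration and do not come from the course data.

```python
from datetime import date

# Hypothetical sales records; the field names and values are illustrative only.
records = [
    {"agent": "A. Lee", "region": "East", "revenue": 1200.0},
    {"agent": "B. Cruz", "region": "", "revenue": 950.0},
    {"agent": "C. Diaz", "region": "West", "revenue": 400.0},
]


def incomplete_rows(rows, required_fields):
    """Return the rows where any required field is absent or empty."""
    return [
        row for row in rows
        if any(row.get(field) in (None, "") for field in required_fields)
    ]


def is_stale(last_refreshed, today, max_age_days):
    """Return True when the data is older than the reporting cycle allows."""
    return (today - last_refreshed).days > max_age_days


# Completeness: the second record has no region, so it is flagged.
missing = incomplete_rows(records, ["agent", "region", "revenue"])
print(len(missing))

# Timeliness: a report refreshed 28 days ago is too old for a weekly review.
stale = is_stale(date(2024, 1, 1), date(2024, 1, 29), max_age_days=7)
print(stale)
```

A check like this only flags the problem; deciding where to obtain the missing regions, or how often to refresh the report, is still the analyst's call.
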
Importing File Data
Now that you have learned about the importance of data quality, in this video you will learn
how to import data from a text file using the Text Import Wizard, learn how to adjust
column widths, and learn how to add and remove columns and rows. As you know, by default Excel
works with .xlsx or .xls files and opens them as workbooks. But Excel can also use data that is in
other formats, such as plain text, or data that has been comma-separated and tab-separated.
Sometimes, these source files will be saved with a .txt extension and referred to as ‘text’
files, but others might be saved with a .CSV file extension, and are typically referred
to as CSV files. Here in Notepad, I have opened a text file that contains data about car sales, and it
uses comma separated values (or CSVs) to separate each bit of data in a record. Notice that the top
line holds headings, such as Manufacturer, Model, Engine_size, and so on, and each one is
separated by a comma. We want these to become our headers when we import the file into Excel.
The line below these headings is the first line of real data, and again you can see that each piece of
data is also separated by a comma. There are 16 headings and there are also 16 pieces of data on
each of the lines below the headings. If we scroll to the bottom, we can see that the last data record is
for the Volvo S80. Now, to open the file in Excel, we choose File, Open, and then either select the
file from the recently used list, or click Browse to find the file we want to import. When we open the
file, the Text Import Wizard launches automatically and tries to determine what kind of file you
have. Note that it has been detected as being a delimited file; that is, one that has its data fields
separated by a character such as a comma or a tab. As we want the headings to become headers in
Excel, we need to ensure that we select the option ‘My data has headers’. We can see a mini
preview of the data in the preview box below. Then we click Next to proceed in the wizard.
In step 2 of the wizard, we need to select our delimiter; that is, which character is separating our
pieces of data; so we select Comma, and deselect any others. Note the data preview now starts to
show us what the imported data will look like. You can scroll down and across this preview window
to ensure that the data is going to look as you want and expect. It all looks OK, so we’ll continue
with the wizard. In step 3 of the wizard, we can set the data format for each column. For example,
you might want to change a column to Text or Date format. In this case we can just accept the
default General format, and finish the import wizard. In Excel we can see that the headings in the
text file have been imported as a header row. But also notice that some of the columns are not
showing all the data; some of the headings are not showing in full and some of the data is not shown
either; all you can see are a number of hashes in the cells. This is because the column widths are
too narrow in some cases. If you remember, we can manually adjust a column’s width by dragging
the divider across. But to change them all in one go, we select all the columns first, then double-click
one of the selected column dividers. We can do a similar thing with rows by dragging to make them
bigger or smaller, or double-clicking a row divider to autosize it. There are some columns that we
have decided we don’t really need; namely Vehicle_type and Latest_Launch, so let’s remove those.
This can be done either by using the Delete drop-down menu in the Cells group on the Home tab and
selecting Delete Sheet Columns, or by selecting and right-clicking a column and deleting it that way.
To add another column, you simply select the column to the right of where you want your new column to
be, then right-click the column and choose Insert. And let’s give the header a name, such as Year.
To delete a row you don’t need, select the row, right-click it, and choose Delete.
And to add a row, select the row below the place you want to add your new row, right-click
the row, and choose Insert. If you want to save the file as an Excel file, you can either choose File,
Save As, or you can click Save As in the yellow tooltip that appeared at the top of the worksheet
when we imported the file, and then you would choose ‘Excel Workbook (*.xlsx)’ in the ‘Save
as type’ box. In this video, we learned how to import data using the Text Import Wizard, we learned
how to adjust column widths, and we learned how to add and remove columns and rows.
In the next video, we will discuss the importance of data privacy, including sensitive information,
and personally identifiable data.
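
For readers who want to see the same import steps outside of Excel, here is a minimal sketch in Python using the standard csv module. It is not what the Text Import Wizard does internally. The sample text is a made-up stand-in for the car sales file: the column names come from the headings mentioned above, only five of the 16 columns are shown, and the values are placeholders.

```python
import csv
import io

# Made-up stand-in for the car sales text file; values are placeholders.
raw_text = (
    "Manufacturer,Model,Vehicle_type,Engine_size,Latest_Launch\n"
    "Acura,Integra,Passenger,1.8,2/2/2012\n"
    "Volvo,S80,Passenger,2.9,11/14/2011\n"
)


def import_delimited(text, delimiter=","):
    """Parse delimited text, treating the first line as the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


def drop_columns(rows, names):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


def add_column(rows, name, default=""):
    """Return copies of the rows with a new, initially empty column appended."""
    return [{**row, name: default} for row in rows]


rows = import_delimited(raw_text)                             # comma delimiter, first line as headers
rows = drop_columns(rows, {"Vehicle_type", "Latest_Launch"})  # columns we decided we don't need
rows = add_column(rows, "Year")                               # new, empty Year column
print(list(rows[0].keys()))
```

Note that, unlike the General format in Excel, this sketch leaves every value as text; converting Engine_size to a number would be a separate step.
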

Basics of Data Privacy


In this video, we will learn about data privacy and the regulations that govern the collected
data. When collecting customer data, specific regulations apply to how that data can be used. By
understanding data privacy regulations and getting familiar with three fundamentals (Confidentiality,
Collection and Use, and Compliance), you can reduce the risk of financial penalties and keep the trust
of your customers. Confidentiality is an important element in data privacy and it
acknowledges that the customer’s personal information belongs to them. The types of information
that can be accessed by a data analyst can range from sales forecasts, to employee information,
or even patient records. When accessing these types of records the analyst must be able
to recognize the different types of personal data. Personal Information or PI is any type of
information that can be traced back to a specific individual. This type of information can include
anything from emails to images. Personally Identifiable Information or PII is specific information that
could be used to identify an individual. This type of information could include a social security
number or a driver’s license number. And lastly, Sensitive Personal Information or SPI may not
necessarily identify a specific individual, but contains private information that needs to be protected
because, if made public, it could be used to harm the individual. This type of information can
include data about race, sexual orientation, and biometric or genetic information. By understanding
personal data and the associated regulations, we can efficiently anonymize our data by removing
unnecessary information.
This type of action can help build consumer confidence and continue to develop the free
flow of information. When searching through data, the analyst must know the location of the
company collecting the data and the location of the respondent. Knowing where the data was
collected is an essential element of data privacy and determines which regulations must be applied. The General
Data Protection Regulation or GDPR is a regulation specific to the European Union, and only applies
to the jurisdiction of the individual. A new law created in Brazil, the LGPD, will take effect in August
2020. These new data policy regulations apply to individuals within Brazil, and ignore the location of
the data processor. The United States does not have one country-wide principal law for data
privacy. Because of this, individual states began to make their own regulations. For instance,
California created the California Consumer Privacy Act (CCPA) to better protect customer
data. There are also industry specific regulations that govern the collection and use of sensitive and
personal data. For example, in Healthcare, HIPAA privacy rules govern the collection and disclosure
of protected health information.

In retail, the PCI standards govern credit card data, and failure to safeguard cardholder information
can result in hefty fines. With a basic understanding of these policies, we are able to remain
compliant when handling any sensitive information. Unfortunately, breaches of customer data are an
all too common occurrence, and understanding how to remain compliant is essential. Understanding
the data privacy regulations of the European Union, the United States, and other countries as well as
industries is key to keeping data safe. Companies must comply with these privacy regulations at all
times and also make sure policies are readily accessible to employees. For example, let’s say a data
analyst downloaded a spreadsheet of sensitive information. In order to complete the report by Monday
morning, the analyst decided to take their work laptop home for the weekend. After driving home,
the analyst accidentally left the laptop in their car. The next morning, they found their car had been
stolen along with the laptop. Because it is the responsibility of the company to keep customer data
safe, this was a breach of privacy when the data left company property.

This type of action could not only cost the company large amounts of money in fines and penalties,
but could also reduce consumer confidence, causing a significant impact to revenue. While data
privacy applies to most data that is collected, there are some instances where these regulations do
not apply. In order for these laws and regulations not to apply, the particular collection of data must
be completely anonymous. To make data anonymous means to exclude all data which ties it back to
a particular individual. While this approach might not be practical in all circumstances, collecting
data with privacy in mind could remove privacy limitations and make data collections more
accessible. In this video we learned about the importance of data privacy and the challenges that a
data analyst can face when collecting and sorting through data. In the videos in the next lesson, we
will learn about different methods for cleaning data in a spreadsheet.
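
The idea of excluding the data that ties a record back to an individual can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names and the customer record are hypothetical, and the list of personal fields would in practice come from a project's own privacy review.

```python
# Hypothetical column names flagged as personal data (an assumption for this sketch).
PERSONAL_FIELDS = {"name", "email", "birthday", "social_security_number"}


def anonymize(rows, personal_fields=PERSONAL_FIELDS):
    """Return copies of the rows with every personal field removed."""
    return [
        {key: value for key, value in row.items() if key not in personal_fields}
        for row in rows
    ]


# Made-up customer record for illustration.
customers = [
    {"name": "Pat Example", "email": "pat@example.com", "birthday": "1990-01-01",
     "region": "East", "revenue": 1200.0},
]

safe_rows = anonymize(customers)
print(safe_rows)
```

Dropping the obvious identifying columns is a starting point rather than a guarantee; the remaining fields can sometimes still single out a person when combined, so the result should be reviewed before it is treated as anonymous.
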

Viewpoints: Data Quality and Privacy


In this video, we will listen to several data professionals discuss the importance of data
quality and data privacy as they relate to data analysis. Let us start with, “What is the importance of
data quality as it relates to data analysis?”

Data quality is of the utmost importance in terms of data and analytics, and the reason behind this is
that as soon as what you're presenting does not align with what someone expects, that's the first thing
that they tend to go after. Where did you get the data? What's happened to the data? How's it been
transformed? Because people like to think that they know and understand their business. And when you
start to challenge that, if you don't have the ground to stand on that the data is quality, that it's
clean, and that it is from a trusted source, that's when you start to get into a lot of discussions, a
lot of debate. And ultimately, the plot of what you're trying to present gets lost.

The backbone of any successful data analysis project is good quality data. There is a common term in
computer science called garbage in, garbage out, which essentially means that if you read in bad quality
data, you can expect to get bad quality results. So, there's really nothing more important when doing a
data analysis than making sure that you're working with good quality data, and it's really important to
sense-check the data yourself and really feel comfortable that the data you're using is of a really high
quality.

Data accuracy is above all: garbage in, garbage out. It's a waste of time to analyze data of poor
quality, and it might mislead the business direction.

The integrity of the data that you're using or providing for someone else to use is of the utmost
importance. Data is used to determine when or where to launch a product, or if a division is profitable
or not, and it's easy to get things confused if you're not paying attention to the details. Using
inventory as an example, if you're looking at inventory at a SKU level and you accidentally pick the
wrong SKU to analyze, you might draw the conclusion that this particular item isn't profitable when in
fact it is. So, that's a major, major decision for a company to make, obviously, so the expectation is
that there will be lots of due diligence, but if in the beginning you start off with that data and then
you build on that, only to later realize that it wasn't a good idea, you've lost time, energy, effort,
and in some cases, trust.

Thank you for those viewpoints. What about the importance of data privacy as it relates to data analysis?

Data privacy is incredibly
important, especially when you're working in industries like pharmaceuticals or healthcare, but that's
not where it stops. We have to have the ability to make sure that the users are getting the
appropriate level of data based on their roles and their permissions. Now we can do this through a
number of cuts of the data specific to each geography or each function, or in some tools such as
Cognos Analytics, we can start to build that out as part of our model. Within there you can say who
has access to what, whether it's at a granular level of this person can see data in Canada or the US,
or whether it's simply this person can see this report in its entirety or not. There are lots of
different ways to handle this, but data privacy is of the utmost importance across all industries.

In today's world, data privacy is a huge thing, especially on the tax side of our business. We have
what we call PII: personally identifiable information. We have to protect that, and so we can't just
send things through email. In our business, we don't send tax returns, or anything else that has
sensitive PII data in it, through unprotected email. We encrypt it. We make sure the email is
encrypted, or we use certain software that will allow us to not show the social security numbers, the
names, or the dates of birth. What will happen is it has a certain sequence, and we share that with the
client by calling them. We don't put that in an email, and we certainly don't put that in the same
email with the encrypted information, because we want to make sure that you are always safe. So, we
have to make sure we're protecting it at all costs.

Week 4

Now that we’ve learned how to use the VLOOKUP and HLOOKUP functions, in this video we’ll look at how to
create and use Pivot Tables in Excel. We’ll first look at how to format our data as a table, then how to
create Pivot Tables and use fields in a Pivot Table to analyze data, and lastly we’ll see how to perform
calculations in a Pivot Table.

Having a worksheet full of informational data is all very well, but to really get some use out of it we
need to analyze it from different perspectives to find answers to questions related to the data. Now,
we’ve already used features such as filters and formulas to draw mathematical and logical conclusions
about our data, but not all questions can be answered easily using filters and formulas alone. In order
to obtain usable and presentable insights into your data you need something else, and that something
else is Pivot Tables.

Pivot Tables provide a simple and quick way, in spreadsheets, to summarize and analyze data, to observe
trends and patterns in your data, and to make comparisons of your data. A Pivot Table is dynamic, so as
you change and add data to the original dataset on which the Pivot Table is based, the analysis and
summary information changes too when the Pivot Table is refreshed. A Data Analyst can use Pivot Tables
to draw useful and relevant conclusions about, and create insights into, an organization’s data in order
to present those insights to interested parties within the company.

Before you start to create a Pivot Table in
Excel, it can be very helpful to first format your data as a table. The reason for this is not only to
make it more organized and defined and to add table styles to your data, but primarily that it makes it
a lot easier when adding records to the dataset. In the car sales worksheet, let’s first select any cell
within the data, and then on the Home tab, in the Styles group, choose ‘Format as Table’. Then choose a
style from the gallery. Note that Excel automatically knows the boundaries of our data range, but we can
change this if we need to. And ensure you select ‘My table has headers’, if indeed it does. After you
click OK and the data has been formatted as a table, note the filter drop-downs at the top of each
column; these are automatically added when you format as a table. If we now scroll down to the bottom of
the table and start adding another row of data for another vehicle, when you press Tab or Enter, note
that it is automatically formatted and included as part of our table.

OK, now let’s see how to create a basic
Pivot Table, and how to use fields to arrange data in a Pivot Table. Just before we do that, there are a
few things you should use as a checklist to ensure your data is in a fit state to make a Pivot Table
from, and these are: format your data as a table for best results; ensure column headings are correct,
and there is only one header row, as these column headings become the field names in a Pivot Table;
remove any blank rows and columns, and try to eliminate blank cells also; ensure value fields are
formatted as numbers, and not text; and ensure date fields are formatted as dates, and not text.

In the worksheet, we can just select any cell in the table. Then, on the Insert tab, we click
PivotTable. Note that in the ‘Select a table or range’ box, the table name, Table1, is already entered
for us. If we hadn’t just formatted this data as a table, we would specify the cell range here instead.
Under that, we need to decide whether we want to create the Pivot Table on a separate new blank
worksheet, or on this worksheet; a new worksheet is the default and is the most commonly used option.
So, a new blank worksheet opens, displaying some basic Pivot Table instructions in the graphic on the
left of the worksheet, and a ‘PivotTable Fields’ pane on the right. You can rename the worksheet for the
Pivot Table if you wish.

To build the Pivot Table report we need to
add some fields from the top of the PivotTable Fields pane, to one or more of the sections in the bottom
part of the pane. For example, if we want to find out the total sales for each model of car, let’s drag
the Manufacturer field to the Rows section of the report, and then we’ll drag the Model field there too.
But this isn’t really the way we want it to look, so we’ll drag the Manufacturer field to appear at the
top of the Rows section above the Model, which makes more sense with our data. Next, we’ll add the Price
field to the Columns section, but again that really isn’t the way we want to view the data, so we’ll
drag Price to the Values section instead, which makes a lot more sense and looks a lot better. Next,
we’ll add the Unit Sales field to Values too, so now we can see both the individual price for each model
and the number of unit sales of each model. Let’s add the Vehicle-type field to Columns, but that
doesn’t seem very useful, so let’s remove that field, which we can do in two ways: either by using the
drop-down menu or, if we undo that, by simply dragging the field out of the Columns section, either to
the left over the worksheet, or to the top over the fields list above.

Let’s now look at how to perform a simple
calculation in a Pivot Table. If we look in the ‘Sum of Price’ column in our Pivot Table, we can see
that the figures are formatted as General. So first, let’s change the format for these figures to US
currency. This can be done by modifying the value field settings for the field in the relevant section
of the PivotTable Fields pane. We’ll format the field as US dollars and show no decimal places. Next,
we’ll add a calculated field from the ‘PivotTable Analyze’ tab, using the ‘Fields, Items & Sets’ button.
We want this field to calculate the total sales for each model by multiplying the price by the number of
unit sales. When we create and add this formula, it gets added to the PivotTable Fields pane, as a field
called Total Model Sales. And we can change the format to make it US dollars again. A new column called
‘Sum of Total Model Sales’ has now appeared in the Pivot Table in our worksheet. In row 5 we can see
that there have been over 360 million dollars of sales of the Acura Integra model, and in row 7 we can
see that there has been over a billion dollars in sales of the Acura TL model.

In this video, we learned how to format data as a table, how to create a Pivot Table and use fields to
analyze data in a Pivot Table, and how to perform calculations using Pivot Table data. In the next
video, we’ll look at some other features of Pivot Tables.
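
To make the grouping and the calculated field more concrete, here is a minimal sketch of the same idea in plain Python. It is not how Excel builds a Pivot Table. The field names follow the ones used in this video, but the rows and numbers are made-up placeholders, and the helper names pivot and add_calculated_field are hypothetical.

```python
from collections import defaultdict

# Made-up placeholder rows; the numbers are illustrative only.
car_sales = [
    {"Manufacturer": "Acura", "Model": "Integra", "Price": 20000.0, "Unit Sales": 10},
    {"Manufacturer": "Acura", "Model": "TL", "Price": 30000.0, "Unit Sales": 20},
    {"Manufacturer": "Volvo", "Model": "S80", "Price": 35000.0, "Unit Sales": 5},
]


def pivot(rows, row_fields, value_fields):
    """Group rows by row_fields (the Rows section) and sum each value field."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        key = tuple(row[field] for field in row_fields)
        for field in value_fields:
            totals[key][field] += row[field]
    return {key: dict(sums) for key, sums in totals.items()}


def add_calculated_field(table, name, formula):
    """Add a field computed from each group's summed values."""
    for sums in table.values():
        sums[name] = formula(sums)
    return table


table = pivot(car_sales, ["Manufacturer", "Model"], ["Price", "Unit Sales"])
add_calculated_field(table, "Total Model Sales", lambda s: s["Price"] * s["Unit Sales"])

for (manufacturer, model), sums in sorted(table.items()):
    print(manufacturer, model, f"${sums['Total Model Sales']:,.0f}")
```

In this sketch the calculated field works on the summed values of each group, so it gives the expected total only when each model appears in a single row, as it does here.
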


Viewpoints: Pivot Tables
In this video we will listen to several data professionals discuss their experience using pivot tables
to analyze data. What are your experiences using pivot tables to analyze data?

My experience using pivot tables in Excel is extensive. I use them all the time. The thing to keep in
mind is that you can sum, average, and count easily. You can set it to group-by so people can choose
what the parameters are at the top. It's great if you've got a couple of thousand records all the way up
to whatever Excel can handle. So, a pivot table is just a real simple way of manipulation without having
to do any actual querying or development language.

I once had a huge e-commerce sales dataset. I needed to analyze the KPIs, including gross merchandise
volume and take rate. However, I could only generate limited insights if I stayed at a high level. With
pivot tables I was able to group the data in terms of countries, type of stores, and type of products,
which enabled me to view the data and analyze the key KPIs at different levels of granularity.

I use pivot tables and we use pivot tables
in our firm, especially during audits, to assist us and help us to kind of drill down on the data,
because what a pivot table does is, it helps you to take a large set of data and whittle it down to
something that's meaningful. So, in the case of audits, a client might have, you know, $500,000 worth of
maintenance and repair bills that are made up of three hundred invoices. But we don't want to see every
invoice for every dollar; we want to see the high dollar invoices, so we're going to use that pivot
table to narrow it down to the invoices that actually are going to have the highest level of impact on
the financial statement.

Much like Excel, pivot tables are a great way to understand your data quickly and effectively. Being
able to just open up an Excel sheet, put it into a pivot table, drag and drop things in to get a sense
of what the numbers look like, what the values are, really can help you get a good sense of the data in
order to then start to build out something a little bit more robust. Being able to understand the
fields, what they mean, what they look like. These are all things that can help you at the start of a
project, as you're looking to do your analysis.

Pivot tables are incredibly useful to get a quick view of your data and to look at multiple levels of
your data in a very quick and clean way. It's just very, very easy to create a pivot table on a set of
raw data, aggregate it by some level of interest, be it the country the user is from, be it the year the
user joined, or anything else, be it something related to time. It's really good for quickly seeing and
understanding some of the more high-level summaries that are hidden within your data.

Pivot Table Features
