Excel Basics for Data Analysis
Do you want to learn how to use spreadsheets and start analyzing data using Excel?
This course from IBM is designed to help you work with Excel and gives you a good grounding
in cleaning and analyzing data, which are important parts of the skill set required
to become a data analyst.
You will not only learn data analysis techniques using spreadsheets, but also practice them
in multiple hands-on labs throughout the course.
In module one you will learn about the basics of spreadsheets, including spreadsheet terminology,
the interface and navigating around worksheets and workbooks.
In module two you will learn about selecting data, entering and editing data, copying and
autofilling data, formatting data, and using functions and formulas.
In module three you will learn about cleaning and wrangling data using a spreadsheet, including
the fundamentals of data quality and data privacy, removing duplicated and inaccurate
data, removing empty rows, removing data inconsistencies and white space, and using the Flash Fill
and Text to Columns features.
In module four you will learn about analyzing data using spreadsheets, including filtering
data, sorting data, using common data analysis functions, creating and using pivot tables,
and creating and using slicers and timelines.
At the end of this course, in module five, you will complete a series of hands-on labs that
guide you through creating your first deliverable as a data analyst.
This involves understanding the business scenario, cleaning and preparing
your data, and analyzing your data.
You will follow two different business scenarios throughout the course, each with its
own data set.
These different scenarios and data sets will be used in the lesson videos and in the hands-on
labs.
After completing this course, you will be able to understand how spreadsheets can be
used as a data analysis tool; understand when to use spreadsheets as a data analysis tool
and their limitations; create a spreadsheet and explain its basic functionality; perform
data wrangling and data cleaning tasks using Excel; and analyze data using the filter, sort, and
pivot table features within Excel spreadsheets.
You will also perform some intermediate level data wrangling and data analysis tasks to
address a business scenario.
The course team and other peers are available to help in the course discussion forums in
case you require any assistance.
Let's get started with your next video where you will get an introduction to spreadsheets.
Introduction to Spreadsheets
In this first video of the course, we will list some of the common spreadsheet applications available,
learn about the key capabilities of spreadsheets, and discuss why spreadsheets might be a useful
tool for a Data Analyst. There are several spreadsheet applications available in the marketplace;
some of them are more widely known and used than others, and some are free, while others need to
be paid for. By far the most commonly used spreadsheet application, and the most fully featured of
them all, is Microsoft Excel. The desktop version comes in a paid form as part of the Office suite and
some Microsoft 365 subscriptions, but there is also a web-based cut-down version called Excel for
the web, also known as Excel Online. The online version is free to users with a Microsoft account,
but does not offer all the advanced features that the desktop version provides. The next most
popular is Google Sheets, which offers a lot, though not all, of the features that Excel provides, and is
free with a Google account. This is a web-based application and it integrates nicely with other
Google apps, such as Google Forms, Google Analytics, and Google Data Studio. Then there is
LibreOffice Calc, a totally free and open source desktop spreadsheet application that offers more
basic functionality than Excel or Google Sheets, but still has a lot of the tools you need for data
analysis, such as charts, conditional formatting, and pivot tables. Other spreadsheet apps include
Zoho Sheet (a fully-featured web-based application that is comparable with Google Sheets),
OpenOffice Calc, Quip from Salesforce, Smartsheet (which is predominantly for project
management), and Apple Numbers (which is included with Apple devices such as Mac computers
and is also available on the App Store for other Apple devices). So, there are many spreadsheet
application options open to you, from fully-featured to basic, from cloud-based to desktop apps, from
paid-for to free versions. It’s up to you to decide which one best fits your needs and your budget.
Spreadsheets provide several advantages over manual calculation methods. For example, once you
have your formulas correctly written, you can be assured that your calculations are accurate, and
that the calculations will be performed automatically for you. Spreadsheets also help keep your data
organized and easily accessible. Your data can be easily formatted, filtered, and sorted to suit your
needs. If you do make mistakes in your data entry or your calculations, you can easily edit them,
undo them, or use error-checking tools to help remedy those mistakes. And lastly, you can analyze
data in spreadsheets, and create charts, graphs, and reports to help visualize your data analysis.
Since spreadsheet software for personal computers first appeared on the market in the 1970s, with
VisiCalc on the Apple II PC, spreadsheets have come a long way in terms of the capabilities and
features they now offer businesses, from uncomplicated tables and relatively simple computations to
powerful tools for the analysis, management, and visualization of enormous sets of data. The most
common business uses for spreadsheet applications include the following: Data Entry and Storage,
Comparing Large Datasets, Modelling and Planning, Charting, Identifying Trends, Flowcharts for
Business Processes, Tracking Business Sales, Financial Forecasting, Statistical Analysis, Profit and
Loss Accounting, Budgeting, Forensic Auditing, Payroll and Tax Reporting, Invoicing, and
Scheduling. And away from the business side of things, other typical uses include Personal
Expenses, Household Budgeting, Recipe library, Fitness Tracking, Calorie Counting & Weight
Monitoring, Sports Leagues such as Fantasy Football, Cataloging Music Libraries, and even Contact
Lists, Shopping Lists and Christmas Card Lists. As a Data Analyst, you can use spreadsheets as a
tool for your data analysis tasks, including: Collecting and harvesting data from one or
more distributed and different sources. Cleaning data to remove duplicates, inaccuracies, and errors, and to
resolve missing values to improve the quality of the data. Analyzing data by filtering, sorting, and
interpreting it to determine what useful information can be gleaned from it. And visualizing data, to
help you tell a story about your data analysis findings to key business stakeholders and any other
interested parties within your organization. In this video, we had an introduction to spreadsheets. We
learned about some common spreadsheet applications, what the main capabilities of spreadsheets
are, and why spreadsheets might be a useful tool for a Data Analyst. In the next video, we will look
at the basics of spreadsheets, including common spreadsheet terminology.
Spreadsheet Basics 1
the main elements that make up a worksheet, let’s see how to move around a spreadsheet,
get familiar with the ribbon and menus, and learn how to select data in a worksheet. To open a sample
file, we click File. This opens Backstage View. Here you can create a new workbook, or open,
save or print a workbook. You can also access Excel Options. Now, we want to open our sample file. So,
we click Open, and either select it from
the Recent list, or click Browse to find the data file we want. The first thing we should do is get
acquainted
with the ribbon and menus. Notice that on the ribbon at the top we have
from other Office products, such as the Home, Insert, and View tabs, while others might
be new to you, such as Formulas, Data, and Power Pivot. To make a little more workspace for ourselves
we can hide this ribbon by double-clicking any tab, and to unhide it, we do the same. The other option is
to use the shortcut key
to make them easier to find. So, on the Home tab we have groups for Font,
Alignment, Number, Styles, and so on. Some of these groups contain all the available
buttons on the ribbon when viewing in full screen, such as Styles and Cells, but other
ribbon groups have more options, which we access by clicking the little arrow icon in
the bottom right corner of the group, as can be seen here on the Number group for example. The next
item I want to point out is the Quick
Access Toolbar at the top of the screen above the ribbon. As the name suggests this is where you can
quickly access the tools you use most often. You can see we already have some tools in
this toolbar such as Save, Undo, Redo, New, and Open. But we can add other tools to the toolbar
toolbar and then select a tool we will use a lot, such as Sort Ascending, that will be
added, and we will also add the Sort Descending button too. Now we need to be comfortable with
moving
around a worksheet. You can simply use the arrow keys to move
left, right, up, and down 1 cell at a time. But you can also use Page Down and Page Up
to move around a bit faster, which is especially useful if you have lots of rows of data. And to move even
quicker up or down a large
worksheet, use the vertical scroll bar, and to move left or right use the horizontal scroll
a large data set. There are also some useful shortcuts you can
the start of the worksheet (i.e. cell A1). CTRL+End takes you to the cell at the end
of your data in the worksheet. CTRL+Down arrow takes you to the last filled cell of the current block of data in the
column you’re in, while CTRL+Up arrow takes you back to the first filled cell of that block. So a quick way to find
out how many rows of data you have in your worksheet is to go to the first cell in your data and press CTRL+Down
arrow to see the last row of data (this works as long as the column has no blank cells). So here you can see we have 160 rows. Now how do we go back to the
top again? CTRL+Home will do it. So far, we have seen how to navigate around
our worksheet and its data, now we need to look at how we select data. This is very important because
you often need
to select data to move it, copy it, or select it in a formula. The simplest selection is a single cell, usually
done with a mouse or maybe a directional arrow key. The next step up is to select multiple cells
together, and this can be done either with a mouse by dragging from one cell to additional
adjoining cells, or you can use the SHIFT key with directional arrow keys. Next up is selecting a single
column or row
which is done simply by selecting the letter at the top of a column, or the number on the
columns and rows, by clicking the mouse button, holding it down and dragging across more columns. Or
if you are not comfortable with dragging
you can also select the column first, then hold SHIFT+Arrow keys to select multiple columns. The same
applies to rows too. However, if you have data in non-contiguous
rows or columns (i.e. not next to each other) you can select the first column, then use
the CTRL key to select another unconnected column, such as columns C and F here. The largest thing you
might want to select
is the whole worksheet, which you can do by clicking the Select All button in the top-left corner of the grid. However, this
selects the entire worksheet,
including all the empty rows and columns; so if you only want the data in your worksheet,
click a cell inside the data and use the shortcut CTRL+A. A word of warning when selecting data in cells, rows,
and columns; there are 3 types of cross symbols that you might see when working with selected
you see when you select a cell as can be seen here in cell A4, this is the Select cross
that we have been using already in this video to select cells. The second type you might see is when you
hover over the bottom edge of a cell and see a thin black cross-type symbol with arrows
on each point; this is the Move symbol, and it moves the cell data to another location. The last type
is the small thin black cross
that is seen when you hover over the bottom right corner of a cell; this is the Fill Handle
or Copy symbol and it fills (or copies) the cell data to another location. In this video, we learned how to
move around
a spreadsheet, became familiar with the ribbon and menus, and learned how to select data in
enter data, how to copy and paste data, and how to format data in a spreadsheet.
Viewpoints: Using Spreadsheets as a Data Analysis Tool
In this video, we will listen to several data professionals discuss the advantages and limitations of using
spreadsheets as a tool for data analysis. Let us start with, “What are the benefits and advantages of
using spreadsheets as a tool for data analysis?” My experience using spreadsheets as a tool for data
analysis is somewhat mixed. I think they can be really, really useful in the right context, but using
spreadsheets definitely has its limitations, so the big pro of using spreadsheets is you can see all the data
cleanly laid out in front of you in a table. So, I think it's very clear to anyone looking at a spreadsheet
exactly what the data is, what format it comes in, all of that. You can just easily, visually inspect it. As a
CPA, I use Microsoft Excel on a daily basis and I have done so for the duration of my career. The
functionality, the pivot tables, the charts, etc., but also being able to use formulas. My
personal favorite is INDEX MATCH, which is a pretty simple way to take thousands of lines of
information and sift through all of that to find specifically what you're looking for. Excel is really that one-
stop-shop where you can perform calculations, analyze financial ratios, and even export reports out of
the ERP that I spoke of earlier to customize it as you need. My experience using spreadsheets is that
they're great for simple analysis. I will say spreadsheets, over the years, the process itself has just
improved as systems improve, as technology improves, spreadsheets are the way to go. Spreadsheets
overall, when you do have probably anywhere from zero to twenty-thousand lines of data, it's a good
way to go, you can really pull out the data. Whether I'm trying to see how much a client’s
making per month, but they may have, you know, a thousand transactions. All of that's helpful. I can use
this spreadsheet to whittle down what is actually going on per month, or if I want to do a SUMIF, or, you
know, if this happens, give me this number. It's really helpful to be able to dig in and wrap your hands
around it and take something that seems, on the surface, twenty-thousand lines seems almost
unmanageable, but if I take it and I massage it, put it in a spreadsheet and then sort it filter it, make it
pretty, put in a pivot table, I can get what I need. It’s just all about not looking at it as being this
intimidating thing but making it more manageable and breaking it down into bite size chunks.
Spreadsheets are the easiest way to analyze data and present data. We don't need any fancy tools or
additional software for spreadsheets. It's like the commonly utilized language to
on to look at the other side of the coin. What are the drawbacks and limitations of
using spreadsheets as a tool for data analysis? I think one of the big cons in terms of analyzing
data within spreadsheets is it's really hard to reproduce state. So, in other words, if you load in some
data
and you filter out some bad values, or you impute some missing values, there's no way
to tell your colleagues or your future self exactly the different steps you took to create
that data set. Or to modify that data set. It's almost a dilemma because of the plethora
of options available within Excel and all of the functions that are there, supposedly
to make your life easier, but it's nearly impossible to know everything. And you can find yourself in what
we accountants
call analysis paralysis when you're looking at something for too long or you're not well
versed in a particular Excel function. So, you may spend a lot more time, energy,
and effort trying to figure that one thing out, when, had you done it a different way, or maybe a manual
way, you probably could have gotten to the solution
that if you have complex formulas, VLOOKUPs, or IF statements, at times they just stop working
and you have to rebuild them. So, I have found that it's better to use Excel
just for simple analysis and for a download of information. I love a good spreadsheet. I love using Excel
and pivot tables to get
to the data, but I find that if I start to get over ten or twenty thousand lines of
data, it gets a little tricky. And sometimes the spreadsheets will crash. So that's when we might move to
Access and
some of the other tools that we use. It's very difficult to handle the extremely
Now that you have learned basic spreadsheet terminology and learned how to navigate your
way around worksheets and select data in Excel, it’s now time to start entering some data.
First, we will look at some of the handy viewing features provided in Excel, and then we’ll
enter some data, and then edit that data.
When you have a lot of data in your worksheet it can be useful to zoom in closer to a specific
area of the data.
The Zoom Slider at the bottom right corner of the worksheet allows you to do just that.
You can either click on the plus and minus buttons or drag the slider to select your
preferred zoom value.
You also have some zoom controls in the ribbon on the View tab.
Zoom lets you pick a predefined zoom level or a custom one, the 100% button zooms the
worksheet back to its original size, and Zoom to Selection enables you to select an area
of data and then zoom into that specific selection only.
If you want to see several areas of your data at the same time while zoomed in, you can
use the Split button.
This splits the screen into multiple sections, and you can scroll each section separately.
If you only want two sections, you can remove either the horizontal or the vertical split
by double-clicking on it.
If you have headings in your columns like a header row, then you might want those to
remain on screen while you move down the sheet.
To do that you need to use Freeze Panes.
You can freeze only the top row if you wish, or if that doesn’t suit, as is the case
here, then you can select the row (or just the cell in column A of that row) below the row or rows
you want to freeze, and then select Freeze Panes.
You can do a similar thing for columns you want to freeze too.
And you can even freeze both rows and columns at the same time.
The trick here is to first select the cell that is both one row below the last row you want
to freeze, and one column to the right of the last column you want to freeze.
In this case, that is cell C4.
Now we can scroll down the worksheet and across the worksheet and we can still see the header
row and the Manufacturer and Model columns.
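To make the freeze rule concrete, here is a small Python sketch that works out which rows and columns stay on screen when Freeze Panes is applied at a given selected cell. It only illustrates the rule described above and is not something Excel itself exposes; the helper names column_index and frozen_area are hypothetical.

```python
def column_index(letters):
    """Convert a column label such as 'A', 'C', or 'AB' to a 1-based index (hypothetical helper)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def frozen_area(active_cell):
    """Return (rows_frozen, columns_frozen) if Freeze Panes is applied at active_cell.

    Everything above the active cell's row and to the left of its column stays visible.
    """
    letters = "".join(ch for ch in active_cell if ch.isalpha())
    digits = "".join(ch for ch in active_cell if ch.isdigit())
    return int(digits) - 1, column_index(letters) - 1


print(frozen_area("C4"))  # (3, 2): rows 1-3 and columns A-B stay visible
print(frozen_area("A2"))  # (1, 0): only the top row stays visible
```

Selecting C4, as in the example above, keeps the header row (row 3 and the rows above it) and the Manufacturer and Model columns (A and B) on screen.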
Now, if you have multiple workbooks open (notice I said workbooks and not worksheets) then
you can switch between them by using View, Switch Windows, or the faster method is to
use the CTRL+F6 shortcut.
Now let’s enter some data into a blank worksheet.
The easiest way to open a new, blank workbook from within Excel is to click the New button in
the Quick Access Toolbar (or press CTRL+N if you prefer keyboard shortcuts).
So let’s enter some headings across the top of the worksheet; this is typically referred
to as a ‘header row’.
Note that if you press Enter after typing data into a cell, the next active cell is the
one directly below, which is not what we want in this case.
But, if we press Tab after we enter data in a cell, it selects the next cell along in
the row as the active cell.
Now we’ll enter some headings and press Tab after each entry.
Notice that some of the text is longer than its cell, so it is either partly
hidden by the next cell or overflows into it.
If you click and hold the divider line between two columns, you can drag it left and right
to resize it manually.
If you want to do that automatically, you can double-click the divider line between
two columns.
As these are going to be headings for our columns, let’s make them bold.
Now let’s add another column between the parts and accessories columns.
Simply select the right-hand column of those two, then right-click and choose Insert
to put another column to the left of the selected column.
Let’s call it Servicing Sales.
To tidy up all our column widths simultaneously, we select all the columns from A to E, then
double-click any of the divider lines between columns; this automatically reduces or increases
each column’s width to fit the data in that column.
OK, now we have some headings, let’s enter some month data in column A.
So, if we type Jan in cell A2 and press Enter then it takes us to the cell below, which
is what we want in this case and we can type Feb in cell A3 and so on until
we get to Dec in A13.
Now, let’s suppose you need to change a couple of your headings.
You have several ways of editing existing data in a cell:
You can select the cell and then simply start typing to overwrite its contents.
Or you can select the cell and press F2 on your keyboard to put the cursor at the end
of the cell and make your changes.
Or you can simply double-click somewhere on the cell to put the cursor at that position
in the cell and make your changes.
And you can even select the cell and then click in the formula bar to edit your cell
data.
Now let’s do the same for the parts and accessories column headings.
In this video, we learned about some of the viewing options in Excel, and we learned how
to enter and edit data in cells.
In the next video, we will learn how to copy and fill data, and how to format the cells
and data in a worksheet.
Now that we have learned about some of the handy viewing features provided in Excel, and entered and
edited some data, let’s
discuss how to move, copy, and fill data, and how to format cells and data to suit our needs. The first
thing we are going to discuss is
how to move data. If you select a range of cells, in this case the headings in A1
to E1, and then hover over the top or bottom edge of the selection, you will see
the Move pointer; you can then drag the selection to another place on the worksheet. Alternatively, if
you want to copy the data
instead, you do the same thing, but this time you also hold the CTRL key as you drag the selection
to another location, and you will see the Copy pointer. If you are not comfortable with dragging,
you can also use the Copy and Paste commands or keyboard shortcuts. For example, you can select some data in
column A and copy it to the clipboard, then simply select the new location and paste the copied
data. You can also move or copy between worksheets, so let’s create a new worksheet. Then select some
data from Sheet1, and this time let’s use the CTRL+C keyboard shortcut to copy it to the clipboard. Then
choose the other worksheet and use the
CTRL+V shortcut to paste the data. However, notice that the column widths are
not the same as the original source data, so let’s undo that and try another paste
option. By default, when you paste the copied data, it uses the column width settings of the destination
cells. So, to paste it and retain the column widths
of the source data, choose the Keep Source Column Widths option under the Paste command. As an alternative to having to enter data
manually in a worksheet, you can use an Excel feature that automatically fills cells with
data when it follows a sequential series or pattern. The feature is called AutoFill, and it can
be especially useful when you need to enter lots of repetitive data into Excel, such as
date information. For example, if you enter a month in a cell,
even using a shortened version of the name, you can use what’s called the Fill Handle
to drag down to the end of the series, and AutoFill will work out what the series is,
based on the selected data. Let’s try the same thing with days of the
fill handle to use AutoFill, it will determine that you want to enter the days of the week sequentially.
However, if you also enter Wed (for Wednesday) in the next cell down, and select both cells in the series,
i.e. A16 and A17, and then drag the fill handle
down, AutoFill determines that the sequence has changed to every other day, and fills
in the data series for you. It’s important to select all cells that
define the pattern when using AutoFill so that it can best determine what the pattern is, in this case cells
A16 and A17. A similar thing applies to numerical patterns.
If you enter 5 in a cell and then use the fill handle to fill the data down the column, AutoFill can’t determine a pattern yet,
because the data is not the name of a day or month, for example. So, in this case, it just copies the
value 5 into every selected cell. However, if you also enter the value 10 in B3, select both cells,
and then use the fill handle to fill the data down the column, AutoFill determines that
the pattern is incrementing by 5 each time and it fills in the remainder of the data
pattern for you. We are now going to look at formatting our data, and there are essentially two distinct
parts to this. First, there’s formatting of the cells themselves (with a fill color and a bold border for
example and bold text within it). And then there’s formatting the data in
the cells (for example, making it text format, number format, or a specific currency or accounting
format). Let’s open the car sales worksheet we used previously. Then select the headings in cells A3 to
P3
either using the mouse, or you could use the shortcut keys CTRL+SHIFT+Right Arrow. On the Home tab,
in the Styles group, click the Cell Styles drop-down arrow and select a style color for your cells. Then you can make the selected
cells bold. Then select the data in the Manufacturer column, either using the mouse or the shortcut
keys CTRL+SHIFT+Down Arrow. From the Cell Styles drop-down list, select another style color for the selected
cells. Again, you can make the cells bold. Then select the data in the Model column, again either using
the mouse or the shortcut keys CTRL+SHIFT+Down Arrow. From the Cell Styles drop-down list, select another
style color for the selected cells. This time you could make the selected cells italic. And you can also
change the font size and style. Lastly, you can select all the other cells
in the data by using the mouse or the CTRL+SHIFT+Right Arrow then Down Arrow, and apply borders to
the data cells. Now it’s time to format the cell data. The sales figures in columns C and D can be
formatted to display only two decimal places; just select the data and click the Decrease Decimal button. This changes only how the values are displayed, not the values stored in the cells.
We also have an issue with a couple of the car models. If you look in cells B129 and B130, where
the model name is supposed to be displayed, you can see there are actually two dates listed instead. And
if you look in the Number Format box,
the format type is Custom. This has happened because the model names are supposed to be the Saab
9-5 and the Saab 9-3, but when the data was imported from
CSV files, these two values must have been incorrectly interpreted as dates rather than as
text. You can fix this by formatting these two cells as Text and then entering the correct values of 9-5
and 9-3. The last thing we shall do is format some
says it is Price in thousands of dollars, and cell F4 is using the General format. So, let’s change the format
of this column
select More Number Formats from the drop-down list, then we choose the Currency option,
and the correct currency symbol and format. And we’re done. In this video, we learned how to move,
copy,
and fill data, and how to format cells and cell data to suit our needs. In the next video, we will look at the
basics of formulas, learn how to perform simple calculations, and learn how to select ranges and copy
formulas.
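The AutoFill behavior described in this video (month and day names continuing in sequence, a two-cell selection defining the step, and a single number simply being copied) can be approximated with the Python sketch below. This is a simplified, hypothetical model, not how Excel is implemented: it only recognizes three-letter month and day names and, for numbers, it uses the step between the first two seed cells, whereas Excel's AutoFill handles many more series types.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def autofill(seeds, count):
    """Return `count` new values continuing the pattern in `seeds` (hypothetical model)."""
    for names in (MONTHS, DAYS):
        if all(s in names for s in seeds):
            positions = [names.index(s) for s in seeds]
            # One seed steps forward by 1; two or more seeds use the gap between the first two.
            step = positions[1] - positions[0] if len(positions) > 1 else 1
            last = positions[-1]
            # Wrap around the list, so Dec is followed by Jan and Sun by Mon.
            return [names[(last + step * i) % len(names)] for i in range(1, count + 1)]
    if len(seeds) == 1:
        # A lone number has no pattern yet, so it is simply copied.
        return [seeds[0]] * count
    # Two or more numbers: continue with the difference between the first two.
    step = seeds[1] - seeds[0]
    return [seeds[-1] + step * i for i in range(1, count + 1)]


print(autofill(["Jan"], 3))         # ['Feb', 'Mar', 'Apr']
print(autofill(["Mon", "Wed"], 3))  # ['Fri', 'Sun', 'Tue']: every other day
print(autofill([5], 3))             # [5, 5, 5]
print(autofill([5, 10], 3))         # [15, 20, 25]
```

The last two calls mirror the numeric example above: a single 5 is copied, while 5 and 10 together define a step of 5.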
Intro to Functions
Now that you have learned about the basics of formulas, learned how to perform some basic
calculations, and how to select ranges and copy formulas, next we will have an introduction
to functions, including using some common statistical functions.
And then we will learn about some more advanced functions that a Data Analyst might also use.
First, let’s look at some common functions used for statistical calculations.
So, we’ll add some row headings for average, minimum, maximum, count, and median.
Then in cell B20, let’s work out the average of the car sales for the year, from the table
above.
On the Home tab, in the Editing group, we click the AutoSum drop-down list and choose
Average.
Now, because AutoSum tries to add up the values directly above it in the column, we need to
modify the cell range here to B2 to B13.
Then we can use the Fill Handle as we’ve seen before to copy the formula across to
column E.
For the minimum calculation in B21, we select Min from the AutoSum list.
And again, we need to modify the cell range.
So this calculates the lowest value in our range.
And fill across to column E. And for the maximum calculation, we select
Max from the list.
And then modify the range.
And once again, copy the formula across.
This calculates the highest value in our range.
In B23 we will calculate the Count, which is the number of cells in the selected range
that contain numbers.
So, we select Count Numbers from the list.
Then modify the range.
For the median calculation, we can select ‘More Functions’ from the AutoSum list, then
select ‘Statistical’ as the category, and scroll down to find the MEDIAN function.
The ‘median’ returns the middle value of a range of selected values once they are sorted in order.
Note that if you’re selecting an odd number of values, it will return the figure that is
the middle value in your selected range, but if you have selected an even number of values
in your range, it will return the average of the two middle values.
Once again, we need to change the cell range to B2 to B13.
And we can then copy this formula across to column E.
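To check your understanding of what these five functions return, the following Python sketch reproduces them with the standard statistics module. It assumes, as Excel does for cell ranges, that text and empty cells are ignored; the helper names and sample values are hypothetical and made up for illustration.

```python
import statistics


def numeric_cells(cells):
    """Keep only numbers, mimicking how these functions skip text and empty cells in a range."""
    return [c for c in cells if isinstance(c, (int, float)) and not isinstance(c, bool)]


def summarize(cells):
    """Hypothetical helper returning AVERAGE, MIN, MAX, COUNT and MEDIAN for a range."""
    nums = numeric_cells(cells)
    return {
        "AVERAGE": statistics.mean(nums),
        "MIN": min(nums),
        "MAX": max(nums),
        "COUNT": len(nums),  # COUNT only counts cells that contain numbers
        "MEDIAN": statistics.median(nums),
    }


# None stands for an empty cell and "n/a" for a text entry; both are skipped.
print(summarize([10, "n/a", None, 20, 30]))
# With an even number of values, the median is the average of the two middle ones (2 and 3).
print(summarize([4, 1, 3, 2])["MEDIAN"])  # 2.5
```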
You’ve seen AutoSum and some of the common statistical functions in Excel, but there
are more than 400 other functions available, so let’s explore just a few of those now.
On the Formulas tab, in the Function Library group, there are drop-down lists for several
function categories.
The first is a list of ‘Recently Used’ functions, which updates automatically as
you use them.
Then you have functions related to ‘Financial’ calculations.
If you hover over the name of a function, you see a short description for each one;
so here we have the accrued interest function, and here is the interest rate function.
The ‘Logical’ list has functions that work with TRUE and FALSE values, such as AND, IF, and OR.
There are several functions related to Text, such as CONCAT, which is an updated version
of a previous function called CONCATENATE (which is still supported by the way for backwards
compatibility), FIND, and SEARCH.
There are also several functions related to dates and times, such as NETWORKDAYS, WEEKDAY,
and WEEKNUM.
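As an example of what a date function like NETWORKDAYS calculates, here is a minimal Python sketch: it counts the Monday-to-Friday dates between a start date and an end date, inclusive, and skips any holidays you pass in. It is a simplified, hypothetical re-implementation for illustration only; for instance, it assumes the start date is not after the end date.

```python
from datetime import date, timedelta


def networkdays(start, end, holidays=()):
    """Simplified, hypothetical model of NETWORKDAYS.

    Counts Monday-to-Friday dates from start to end (both inclusive), skipping
    any date listed in holidays. Assumes start <= end.
    """
    holiday_set = set(holidays)
    count = 0
    current = start
    while current <= end:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        if current.weekday() < 5 and current not in holiday_set:
            count += 1
        current += timedelta(days=1)
    return count


# 1 January 2024 was a Monday, so these two weeks contain 10 working days.
print(networkdays(date(2024, 1, 1), date(2024, 1, 12)))                               # 10
print(networkdays(date(2024, 1, 1), date(2024, 1, 12), holidays=[date(2024, 1, 1)]))  # 9
```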
In the ‘Lookup & Reference’ list there are functions such as AREAS, HLOOKUP, SORTBY,
and VLOOKUP.
In the ‘Math & Trig’ list you’ll find lots of useful mathematical functions, such
as POWER, SUMIF, and SUMPRODUCT, alongside many functions for trigonometric purposes,
such as cosine, sine and tangent.
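SUMIF, mentioned above, adds up the values in one range where the matching cell in another range meets a condition. The Python sketch below illustrates that idea under simplifying assumptions: it is a hypothetical version that only supports an exact-match condition (with text compared case-insensitively), whereas the real function also accepts conditions such as ">100" and wildcards. The manufacturer and sales values are made up.

```python
def sumif(criteria_range, criterion, sum_range=None):
    """Simplified, hypothetical model of SUMIF with an exact-match condition only."""
    if sum_range is None:
        # With no separate sum range, sum the matching cells of the criteria range itself.
        sum_range = criteria_range

    def matches(cell):
        # Text comparisons ignore case in this sketch.
        if isinstance(cell, str) and isinstance(criterion, str):
            return cell.lower() == criterion.lower()
        return cell == criterion

    total = 0
    for cell, value in zip(criteria_range, sum_range):
        # Only numeric values are added; text in the sum range is skipped.
        if matches(cell) and isinstance(value, (int, float)):
            total += value
    return total


manufacturers = ["Ford", "Audi", "ford", "BMW"]  # made-up sample data
sales = [10, 5, 7, 3]
print(sumif(manufacturers, "Ford", sales))  # 17
```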
There is also a ‘More Functions’ list which provides several more function categories,
such as Statistical, Engineering, and Information.
In the ‘Statistical’ list you’ll find functions such as Average, Count, Max, Median,
and Min; we saw some of these used earlier in this video.
If you’re struggling to find the function you want in these lists, you can also search
for a function; just click the ‘Insert Function’ button on the Formulas tab, and then either
browse the category lists available, or choose ‘All’ and look down the alphabetical list
for the function you want.
Alternatively, type the name of the function you want to find, click ‘Go’ to search
for it, and then select the one you want from the search results.
In this video, we learned about the basics of functions, how to use some of the more
common functions that a Data Analyst might employ, and looked at some of the more advanced
functions available in Excel.
In the next video, we will look at referencing data in formulas; specifically differentiating
between relative and absolute references, and error handling in formulas.
Referencing Data in Formulas
Now that you've had an
position in relation to the cell that the formula is in. That is why when we have been
relative positions of the cells that are being copied to. So now we know that relative
that the cell references don't change when we copy them? For
same when you copy a formula containing such references. Lastly, there may also be some
=A$1+$A3, where A$1 has a relative column and an absolute row, and $A3 has an absolute column and a relative row. In contrast to
will stay the same in the copied formula. First, let's look at an
the fill handle, we can see that the result changes, and if we look at the copied formula, you can see that the blue and
A4 in the copied formula. That is, each cell reference has moved one cell down. And if we copy and paste the formula to
C7, you can see that the result also changes, and again we can see that the blue and red cell references in the copied
$A$3 in cell E4. Note the blue and red highlighted cells, A1 and A3. These denote the cells being absolutely referenced in the
you can see that the result stays the same this time, and if we look at the copied formula, you can see that the blue and
copied formula. That is, the cell references haven't changed. Similarly, if we then copy and
paste the formula to E7, you can again see that the result stays
the same this time and we can see that the blue and red cell
A$1+$A3 in cell G4. Note the blue and
cell below using the fill handle, you can see that the result changes, but it's a different result from the
can see that the first blue cell reference has stayed the same. But the second red cell
G7, you can see that the same thing happens. The result
changes and again we can see that the first blue cell
reference has stayed the same in the copied formula, while only
the red cell reference has changed. Now we'll have a quick
isn't wide enough to display the whole word or value, or it contains a negative date
semicolon, then a space, then Ctrl+Shift+semicolon, it
enters today's date and the current time, but the cell is
symbols. If we adjust the column width, we can now see the cell
as an error. However, if we enter the formula seen in cell I7, when we press Enter we see a
asterisk. Note the small green triangle in the top left corner
hint about what caused the error. In this case it says the
you see several options. The first line also gives you a clue
on the nature of the error. This one says ‘Invalid Name Error’, so
it was probably a mistyped cell reference, value, or function name. If you click Help on this
underlined. And you can try to evaluate the error if you are
that you can try and correct the formula error. If you click Error Checking
you make that generate one of the error codes listed at the
errors in Excel.
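To make the copying behaviour concrete, the following Python sketch mimics how the column letters and row numbers of A1-style references shift when a formula is copied, while any part marked with a dollar sign stays fixed. It is a simplified, hypothetical model rather than Excel's own logic: it assumes simple formulas in which function names never end in digits (a name like LOG10 would be misread as a cell reference), and it does not handle references pushed off the edge of the sheet, where Excel shows #REF!.

```python
import re

# An A1-style reference: optional $ before the column letters, optional $ before the row number.
REFERENCE = re.compile(r"(\$?)([A-Z]{1,3})(\$?)(\d+)")


def col_to_num(col):
    """Convert column letters to a 1-based number (A -> 1, Z -> 26, AA -> 27)."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a 1-based column number back to letters (27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def copy_formula(formula, rows_down, cols_right):
    """Shift the relative parts of every reference by the copy offset.

    A $ in front of the column or row freezes that part (absolute);
    parts without a $ move with the copy (relative).
    """
    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        new_col = col if col_abs else num_to_col(col_to_num(col) + cols_right)
        new_row = row if row_abs else str(int(row) + rows_down)
        return f"{col_abs}{new_col}{row_abs}{new_row}"

    return REFERENCE.sub(shift, formula)


print(copy_formula("=A1+A3", 1, 0))      # relative: → =A2+A4
print(copy_formula("=$A$1+$A$3", 1, 0))  # absolute: → =$A$1+$A$3
print(copy_formula("=A$1+$A3", 1, 0))    # mixed, copied down: → =A$1+$A4
print(copy_formula("=A$1+$A3", 0, 2))    # mixed, copied right: → =C$1+$A3
```

The mixed example shows why the result changes differently from the relative case: copied down, only the row of $A3 moves; copied across, only the column of A$1 moves.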
WEEK 3 - Introduction to Data Quality
Data analysis can play a pivotal role in business decisions and processes. In order to use the data to make confident
decisions, we must have the right information for the project and the data must be free from errors. In this video
we will learn how to profile data to discover inconsistencies. Whether we are working with small sets of data or
analyzing a spreadsheet with thousands of rows, one of the most difficult parts of data analysis is finding and
keeping clean data.
To help with this process and qualify the data, look for these five traits: Accuracy, Completeness, Reliability,
Relevance and Timeliness. Accuracy is the first and most significant aspect of data quality. A data analyst must
clean the data set by removing duplicates, correcting formatting errors, and removing blank rows. Completeness is
another important aspect of data quality: it means determining whether the information required to complete the data set is readily
available. Why does this matter as a trait for quality data? Let’s say we are given the task of calculating the revenue
from all sales per region. After collecting the data, we discover that no regions were specified. This data would then
be considered incomplete, and other sources would have to be considered to obtain the data required.
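The accuracy and completeness checks described above (duplicates, blank rows, and a required field such as region that was never filled in) can be sketched in Python as a small profiling function. The records and the field names Region, Agent, and Revenue are hypothetical, and treating only exact repeats as duplicates is a simplifying assumption.

```python
def is_blank(value):
    """Treat None and empty or whitespace-only values as blank cells."""
    return value is None or str(value).strip() == ""


def profile(rows, required_fields):
    """Count exact duplicate records, fully blank rows, and blanks in required fields."""
    seen = set()
    report = {"duplicates": 0, "blank_rows": 0,
              "missing": {field: 0 for field in required_fields}}
    for row in rows:
        if all(is_blank(v) for v in row.values()):
            report["blank_rows"] += 1
            continue
        # A record's identity is its sorted (field, value) pairs.
        key = tuple(sorted((k, str(v).strip()) for k, v in row.items()))
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
        for field in required_fields:
            if is_blank(row.get(field)):
                report["missing"][field] += 1
    return report


# Hypothetical sales records (not from the course data set).
sales = [
    {"Region": "East", "Agent": "Kim", "Revenue": 1200},
    {"Region": "East", "Agent": "Kim", "Revenue": 1200},  # exact duplicate
    {"Region": "", "Agent": "Lee", "Revenue": 800},       # region never recorded
    {"Region": "", "Agent": "", "Revenue": None},         # blank row
]
print(profile(sales, ["Region"]))
# → {'duplicates': 1, 'blank_rows': 1, 'missing': {'Region': 1}}
```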
Reliability is another vital factor in determining the quality of the data. For instance, let’s say we are given the task
to determine the agent revenue by customer. When gathering the data, we find the agents keep their own records
and do not always update the information in the shared company database. With those factors in mind, we would
then determine that the data in the shared company database was unreliable and new processes would need to be
established to ensure reliable data.
Relevance is another trait of quality data. When collecting information, a data analyst must consider if the data
being assembled is really necessary for the project. For example, when reviewing the data related to the sales
revenue per customer, we might find that customer birthdays and other personal information are also included. By
making the determination early to exclude the personal information from the data set, the analyst would save
themselves from having to review unnecessary information.
The last factor in determining the quality of the data is timeliness. This trait refers to the availability and
accessibility of the selected data. Let’s say our sales report is going to be used for weekly employee reviews, but
our report is only refreshed once a month. This mismatch between how often the report is refreshed and how often it is used would cause our report to become
outdated, which could have serious consequences for employee reviews. In this video we learned the important role
of a data analyst in qualifying data. By considering the five traits of good quality data, an analyst can save time,
avoid serious issues, and have data that is free from errors. In the next video we will take the collected data and
learn how to import it to our spreadsheet.
Importing File Data
Now that you have learned about the importance of data quality, in this video you will learn
how to import data from a text file using the Text Import Wizard, learn how to adjust
column widths, and learn how to add and remove columns and rows. As you know, by default Excel
works with .xlsx or .xls files and opens them as workbooks. But Excel can also use data that is in
other formats, such as plain text files containing comma-separated or tab-separated data.
Sometimes, these source files will be saved with a .txt extension and referred to as ‘text’
files, but others might be saved with a .CSV file extension, and are typically referred
to as CSV files. Here in Notepad, I have opened a text file that contains data about car sales, and it
uses commas to separate each piece of data in a record; hence the name comma-separated values (CSV). Notice that the top
line holds headings, such as Manufacturer, Model, Engine_size, and so on, and each one is
separated by a comma. We want these to become our headers when we import the file into Excel.
The line below these headings is the first line of real data, and again you can see that each piece of
data is also separated by a comma. There are 16 headings and there are also 16 pieces of data on
each of the lines below the headings. If we scroll to the bottom, we can see that the last data record is
for the Volvo S80. Now, to open the file in Excel, we choose File, Open, and then either select the
file from the recently used list, or click Browse to find the file we want to import. When we open the
file, the Text Import Wizard launches automatically and tries to determine what type of
file it is. Note that it has been detected as being a delimited file; that is, one that has its data fields
separated by a character such as a comma or a tab. As we want the headings to become headers in
Excel, we need to ensure that we select the option ‘My data has headers’. We can see a mini
preview of the data in the preview box below. Then we click Next to proceed in the wizard.
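The core of what the Text Import Wizard does with a delimited file whose first line holds headings can be approximated with Python's standard `csv` module. The sketch splits each line on the delimiter and uses the first line as field names. It also checks that every record has as many fields as there are headings (16 in the car sales file). The sample text is a hypothetical three-column excerpt, not the real file. Unlike Excel's General format, the sketch leaves every value as text.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse delimited text whose first line holds the headings.

    Returns (headings, records), where each record is a dict keyed by heading.
    Raises ValueError if a line has a different number of fields than the headings.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    headings = next(reader)
    records = []
    for line_no, fields in enumerate(reader, start=2):
        if not fields:  # skip completely empty lines
            continue
        if len(fields) != len(headings):
            raise ValueError(
                f"line {line_no}: expected {len(headings)} fields, got {len(fields)}"
            )
        records.append(dict(zip(headings, fields)))
    return headings, records


# Hypothetical excerpt; the real car sales file has 16 headings.
sample = "Manufacturer,Model,Engine_size\nAcura,Integra,1.8\nVolvo,S80,2.9\n"
headings, records = read_delimited(sample)
print(headings)              # → ['Manufacturer', 'Model', 'Engine_size']
print(records[-1]["Model"])  # → S80
```

For a tab-separated file, pass `delimiter="\t"`, which corresponds to ticking Tab instead of Comma in step 2 of the wizard.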
In step 2 of the wizard, we need to select our delimiter; that is, which character is separating our
pieces of data; so we select Comma, and deselect any others. Note the data preview now starts to
show us what the imported data will look like. You can scroll down and across this preview window
to ensure that the data is going to look as you want and expect. It all looks OK, so we’ll continue
with the wizard. In step 3 of the wizard, we can set the data format for each column. For example,
you might want to change a column to Text or Date format. In this case we can just accept the
default General format, and finish the import wizard. In Excel we can see that the headings in the
text file have been imported as a header row. But also notice that some of the columns are not
showing all the data; some of the headings are not showing in full and some of the data is not shown
either; all you can see are a number of hashes in the cells. This is because the column widths are
too narrow in some cases. If you remember, we can manually adjust a column’s width by dragging
the divider across. But to change them all in one go, we select all the columns first, then double-click
one of the selected column dividers. We can do a similar thing with rows by dragging to make them
bigger or smaller, or double-clicking a row divider to autosize it. There are some columns that we
have decided we don’t really need, namely Vehicle_type and Latest_Launch, so let’s remove those.
This can be done either by using the Delete drop-down menu in the Cells group on the Home tab and
selecting Delete Sheet Columns, or by selecting and right-clicking a column and choosing Delete.
To add another column, you simply select the column to the right of where you want your new column to
be, then right-click the column and choose Insert. And let’s give the header a name, such as Year.
To delete a row you don’t need, select the row, right-click it, and choose Delete.
And to add a row, select the row below the place you want to add your new row, right-click
the row, and choose Insert. If you want to save the file as an Excel file, you can either choose File,
Save As, or you can click Save As in the yellow tooltip that appeared at the top of the worksheet
when we imported the file, and then you would choose ‘Excel Workbook (*.xlsx)’ in the ‘Save
as type’ box. In this video, we learned how to import data using the Text Import Wizard, we learned
how to adjust column widths, and we learned how to add and remove columns and rows.
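The column and row edits from this video can be sketched in Python on a worksheet held as a list of rows, with the header row first. The edits covered are deleting Vehicle_type and Latest_Launch, inserting a Year column to the left of a selected column, and deleting a row. The helper names and the shortened sample rows are hypothetical.

```python
def delete_columns(table, names):
    """Return a copy of the table without the columns whose header is in names."""
    keep = [i for i, heading in enumerate(table[0]) if heading not in names]
    return [[row[i] for i in keep] for row in table]


def insert_column(table, before, name, fill=""):
    """Insert a new column to the left of the column headed `before`,
    as Excel does when you right-click a selected column and choose Insert."""
    idx = table[0].index(before)
    return [row[:idx] + [name if r == 0 else fill] + row[idx:]
            for r, row in enumerate(table)]


def delete_row(table, index):
    """Return a copy of the table without the row at `index` (0 is the header)."""
    return table[:index] + table[index + 1:]


# Hypothetical, shortened worksheet (header row first).
sheet = [
    ["Manufacturer", "Model", "Vehicle_type", "Price", "Latest_Launch"],
    ["Acura", "Integra", "Passenger", "20.0", "2020-01-01"],
    ["Volvo", "S80", "Passenger", "35.0", "2020-06-01"],
]
sheet = delete_columns(sheet, {"Vehicle_type", "Latest_Launch"})
sheet = insert_column(sheet, before="Price", name="Year")
sheet = delete_row(sheet, 2)
print(sheet)
# → [['Manufacturer', 'Model', 'Year', 'Price'], ['Acura', 'Integra', '', '20.0']]
```

Each helper returns a new table instead of changing the original, so an edit can be undone simply by keeping the earlier version.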
In the next video, we will discuss the importance of data privacy, including sensitive information
and personally identifiable data.
In retail, the PCI standards govern credit card data, and failure to safeguard cardholder information
can result in hefty fines. With a basic understanding of these policies, we are able to remain
compliant when handling any sensitive information. Unfortunately, breaches of customer data are an
all-too-common occurrence, and understanding how to remain compliant is essential. Understanding
the data privacy regulations of the European Union, the United States, and other countries, as well as those of specific
industries, is key to keeping data safe. Companies must comply with these privacy regulations at all
times and also make sure policies are readily accessible to employees. For example, let’s say a data
analyst downloads a spreadsheet of sensitive information. In order to complete the report by Monday
morning, the analyst decided to take their work laptop home for the weekend. After driving home,
the analyst accidentally left the laptop in their car. The next morning, they found their car had been
stolen along with the laptop. Because it is the responsibility of the company to keep customer data
safe, this was a breach of privacy when the data left company property.
This type of action could not only cost the company large amounts of money in fines and penalties,
but could also reduce consumer confidence, causing a significant impact on revenue. While data
privacy applies to most data that is collected, there are some instances where these regulations do
not apply. In order for these laws and regulations not to apply, the particular collection of data must
be completely anonymous. Making data anonymous means excluding all data that ties it back to
a particular individual. While this approach might not be practical in all circumstances, collecting
data with privacy in mind could remove privacy limitations and make data collections more
accessible. In this video we learned about the importance of data privacy and the challenges that a
data analyst can face when collecting and sorting through data. In the videos in the next lesson, we
will learn about different methods for cleaning data in a spreadsheet.
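Excluding the data that ties records back to individuals can be illustrated with a short Python sketch that drops a chosen set of identifying fields. The field names (Name, Email, Birthday) and the records are hypothetical. Removing direct identifiers like these does not by itself guarantee anonymity, because the remaining fields can sometimes be combined to re-identify a person. The analyst still has to judge which fields to exclude.

```python
def drop_identifying_fields(records, identifying_fields):
    """Return copies of the records with the identifying fields removed."""
    return [
        {field: value for field, value in record.items()
         if field not in identifying_fields}
        for record in records
    ]


# Hypothetical customer sales records.
customers = [
    {"Name": "A. Customer", "Email": "a@example.com", "Birthday": "1990-01-01",
     "Region": "East", "Revenue": 1200},
    {"Name": "B. Customer", "Email": "b@example.com", "Birthday": "1985-06-15",
     "Region": "West", "Revenue": 800},
]
anonymous = drop_identifying_fields(customers, {"Name", "Email", "Birthday"})
print(anonymous)
# → [{'Region': 'East', 'Revenue': 1200}, {'Region': 'West', 'Revenue': 800}]
```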
Week 4
and HLOOKUP functions, in this video we’ll look at how to create and use Pivot Tables
as a table, then how to create Pivot Tables and use fields in a Pivot Table to analyze
data, and lastly we’ll see how to perform calculations in a Pivot Table. Having a worksheet full of informational data
is all very well, but to really get some use out of it we need to analyze it from different
perspectives to find answers to questions related to the data. Now, we’ve already used features such as
filters and formulas to draw mathematical and logical conclusions about our data
using filters and formulas alone. In order to obtain usable and presentable
insights into your data, you need something else… and that something else is Pivot Tables. Pivot Tables provide a simple and quick way,
in spreadsheets, to summarize and analyze data, to observe trends and patterns in your
data and to make comparisons of your data. A Pivot Table is dynamic, so as you change
and add data to the original dataset on which the Pivot Table is based, the analysis
and summary information changes too once you refresh the Pivot Table. A Data Analyst can use Pivot Tables to draw
useful and relevant conclusions about, and create insights into, an organization’s
data in order to present those insights to interested parties within the company. Before you start to create a Pivot Table in
Excel, it can be very helpful to first format your data as a table. The reason for this is not only to make it
more organized and defined and to add table styles to your data, but primarily it makes
it a lot easier when adding records to the dataset. In the car sales worksheet, let’s first
select any cell within the data, and then on the Home tab, in the Styles group, choose
‘Format as Table’. Then choose a style from the gallery… note that Excel automatically knows the boundaries
of our data range, but we can change this if we need to. And ensure you select ‘My table has headers’,
if indeed it does. After you click OK and the data has been formatted
as a table, note the filter drop-downs at the top of each column – these are automatically
added when you format as a table. If we now scroll down to the bottom of the
is automatically formatted and included as part of our table. OK, now let’s see how to create a basic
Pivot Table, and how to use fields to arrange data in a Pivot Table. Just before we do that, there are a few things
you should use as a checklist to ensure your data is in a fit state to make a Pivot Table
from, and these are: format your data as a table for best results; ensure column headings are correct and there
is only one header row, as these column headings become the field names in a Pivot Table; remove any blank rows
and columns, and try to eliminate blank cells too; and ensure value fields are formatted as numbers,
not text. In the worksheet, we can just select any cell
in the table. Then, on the Insert tab, we click PivotTable. Note that in the ‘Select a table or range’
box, the table name – Table1 – is already entered for us. If we hadn’t just formatted this data as
a table, we would specify the cell range here instead. Under that, we need to decide whether we want
to create the Pivot Table on a separate new blank worksheet or on this worksheet; a
new worksheet is the default and is the most commonly used option. So, a new blank worksheet opens, displaying
some basic Pivot Table instructions in the graphic on the left of the worksheet, and
a ‘PivotTable Fields’ pane on the right. You can rename the worksheet for the Pivot
add some fields from the top of the PivotTable Fields pane, to one or more of the sections
in the bottom part of the pane. For example, if we want to find out the total
sales for each model of car, let’s drag the Manufacturer field to the Rows section
of the report, … and then we’ll drag the Model field there
to look, so we’ll drag the Manufacturer field to appear at the top of the Rows section
above the Model, which makes more sense with our data. Next, we’ll add the Price field to the Columns
we want to view the data, so we’ll drag Price to the Values section instead, which
makes a lot more sense and looks a lot better. Next, we’ll add the Unit Sales field to
Values too, so now we can see both the individual price for each model and the number of unit
but that doesn’t seem very useful, so let’s remove that field, which we can do in two ways: either by using the
drop-down menu or, if we undo that, by simply dragging the field
out of the Columns section, either to the left over the worksheet or to the top over
the fields list above. Let’s now look at how to perform a simple
in our Pivot Table, we can see that the figures are formatted as General. So first, let’s change the format for these
settings for the field in the relevant section of the PivotTable Fields pane. We’ll format the field as US dollars and
the ‘PivotTable Analyze’ tab, using the ‘Fields, Items & Sets’ button. We want this field to calculate the total
sales for each model by multiplying the price by the number of unit sales. When we create and add this formula, it gets
added to the PivotTable Fields pane as a field called Total Model Sales. And we can change the format to make it US
Sales’ has now appeared in the Pivot Table in our worksheet. In row 5 we can see that there have been over
360 million dollars of sales of the Acura Integra model, … and in row 7 we can see that there have been
over a billion dollars in sales of the Acura TL model. In this video, we learned how to format data
as a table, how to create a Pivot Table and use fields to analyze data in a Pivot Table,
and how to perform calculations using Pivot Table data. In the next video, we’ll look at some other
professionals discuss their experience using pivot tables to analyze data. What are your experiences using pivot tables?
is extensive. I use them all the time. The thing to keep in mind is that you can
sum, average, and count easily. You can set it to group by so people can choose
what the parameters are at the top. It's great if you've got a couple of thousand
records all the way up to whatever Excel can handle. So, a pivot table is just a really simple way
of manipulating data without having to do any actual querying or use a development language. I once had a huge set of ecommerce
sales data. I needed to analyze the KPIs, including gross
merchandise volume and take rate. However, I could only generate limited insights
if I stayed at a high level. With pivot tables I was able to group the
data in terms of countries, type of stores, and type of products, which enabled me to view
the data and analyze the key KPIs at different levels of granularity. I use pivot tables and we use pivot tables
in our firm, especially during audits to assist us and help us to kind of drill down on the
data because what a pivot table does is, it helps you to take a large set of data and
whittle it down to something that's meaningful. So, in the case of audits, a client might
have, you know, $500,000 worth of maintenance and repair bills that are made up of three-hundred
every dollar we want to see the high dollar invoices, so we're going to use that pivot
table to narrow it down to the invoices that actually are going to have the highest level
of impact on the financial statement. Much like Excel, pivot tables are a great
way to understand your data quickly and effectively. Being able to just open up an Excel sheet,
put it into a pivot table, drag and drop things in to get a sense of what the numbers look
like, what the values are, really can help you get a good sense of the data in order
to then start to build out something a little bit more robust. Being able to understand the fields, what
they mean, what they look like. These are all things that can help you at
the start of a project, as you're looking to do your analysis. Pivot tables are incredibly useful to get
a quick view of your data and to look at multiple levels of your data in a very quick and clean
way. It's just very, very easy to create a pivot
it, you know, the country the user is from, be it the year the user joined, or anything else,
be it something related to time. It's really good for quickly seeing and understanding
some of the more high-level summaries that are hidden within your data.
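The Pivot Table walkthrough from this week can be approximated in Python as a group-and-sum. It covers the readiness checklist, Manufacturer and Model in the Rows section, Price and Unit Sales in the Values section, and a calculated field that multiplies price by unit sales. This is a hypothetical sketch with made-up figures, not how Excel implements Pivot Tables. One detail it mirrors on purpose: an Excel calculated field is applied to the summed values of each group. So when a model appears on more than one row, the result is the sum of Price times the sum of Unit Sales, not the sum of each row's product.

```python
from collections import defaultdict


def check_ready(headings, rows, value_fields):
    """Apply part of the checklist: unique headings, no blank rows, numeric value fields."""
    problems = []
    if len(set(headings)) != len(headings):
        problems.append("duplicate column headings")
    for row_no, row in enumerate(rows, start=2):  # row 1 is the header row
        if all(v in ("", None) for v in row.values()):
            problems.append(f"row {row_no} is blank")
            continue
        for field in value_fields:
            if not isinstance(row.get(field), (int, float)):
                problems.append(f"row {row_no}: {field} is not a number")
    return problems


def pivot(rows, row_fields, value_fields):
    """Group rows by the row fields and sum each value field."""
    totals = defaultdict(lambda: {field: 0 for field in value_fields})
    for row in rows:
        key = tuple(row[field] for field in row_fields)
        for field in value_fields:
            totals[key][field] += row[field]
    return dict(sorted(totals.items()))


def add_calculated_field(table, name, formula):
    """Add a field computed from each group's summed values."""
    for sums in table.values():
        sums[name] = formula(sums)
    return table


# Hypothetical figures, not the course's car sales data.
headings = ["Manufacturer", "Model", "Price", "Unit Sales"]
cars = [
    {"Manufacturer": "Acura", "Model": "Integra", "Price": 20, "Unit Sales": 10},
    {"Manufacturer": "Acura", "Model": "TL", "Price": 30, "Unit Sales": 5},
    {"Manufacturer": "Volvo", "Model": "S80", "Price": 40, "Unit Sales": 2},
]
print(check_ready(headings, cars, ["Price", "Unit Sales"]))  # → []

table = pivot(cars, ["Manufacturer", "Model"], ["Price", "Unit Sales"])
add_calculated_field(table, "Total Model Sales",
                     lambda s: s["Price"] * s["Unit Sales"])
for (make, model), sums in table.items():
    print(make, model, sums["Total Model Sales"])
# Acura Integra 200
# Acura TL 150
# Volvo S80 80
```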