4 Data Wrangling With Excel
Serious data analysis is usually done using specialized software. For several decades, the main
tools were SPSS, SAS and other commercial statistical software tools. In more recent years,
there has been wide acceptance of open-source tools such as R and Python. For modest
problems and quick-and-dirty tasks, Excel is an excellent platform. It is highly visual, making it
accessible to new and occasional users.
The focus of this text is introducing concepts and ways of thinking, and it is easier to illustrate
these in Excel. Most people have some familiarity with Excel and it is on everyone’s desktop.
Having strong Excel skills is an asset in every organization.
But Excel cannot handle large data sets without serious performance problems. It lacks
advanced tools for building models. For these reasons, many topics may be introduced with
Excel but then we show how to do the same tasks in Python.
Cleaning and transforming data to make it ready for analysis can be a tedious but necessary
first step. Data preparation can take up as much as 80% of your time in doing a data mining
project! We need to ensure that the data is what we think it is and arrange it in a format that
will make analysis and modelling easier. Data scientists often call this data wrangling.
Wrangling usually means to argue or wrestle, but is also used to describe “taming” or
“controlling” cattle or other animals. Some also call it data munging.
To illustrate a flat file, we will use the responses of 811 Saint Mary’s University students to a
national student survey in March 2010. Saint Mary's University is a primarily undergraduate
university, located in Halifax, Nova Scotia, Canada. It offers programs in Arts, Business and
Science, and had approximately 6,500 undergraduate students enrolled in winter 2010. The
data file is an Excel workbook with three sheets (the raw data, a list of questions, and a
summary of question responses).
Frequently, data files are saved as a csv file. These are simple text files in which each line is a
single record of observations. Each record includes the values for each variable, separated by a
comma. Hence the name Comma Separated Values or csv. csv files can be read by any data
analysis software, making them highly portable.
Excel can easily convert these text files to a data file. Select the column with the text, click on
the Data tab and select Text to Columns. Select Delimited, then select Comma and Finish. For
convenience, the data files used in this text are already saved as Excel files.
If our data file had been customer transactional information, it might be equally cryptic. The
firm’s database would store records in a number of fields (variables) and the field names are
often kept very short (often only 8 characters). Even if the field name is more detailed, there
may be many fields with similar names (e.g., customer address, billing address, shipping
address, alternate shipping address, ….) and the field name may not completely describe what
the field contains. The database documentation should have a dictionary. Unfortunately, the
meanings of some variables may change over time. The original data definition may be
grounded in the context in which it was created and over time, institutional memory is lost so
the definition may be incorrectly interpreted. It is very easy to extract data from the wrong
field.
If we go to the second sheet, we find that q5 means In what program are you currently
enrolled? Unfortunately, the sheet does not tell us what program corresponds to a response of
4, or 6, or any other value. Thankfully, the third sheet summarizes the responses for each
variable, including q5.
Excel has a variety of database tools for querying and extracting data from a database. Use of
these tools ensures that you do not damage the original dataset and can also keep track of
query and transformations done to the data. Exploration of these tools is beyond the scope of
this book.
You will be prompted to select what cells are part of the table. Excel chooses what it thinks is
the table (and it usually guesses correctly). Click OK. The sheet will change its appearance.
You can remove the shading if you like, but most people find it easier to read with different
shading in alternating rows.
Python is “open source” software that is free to download and is maintained and upgraded by a
large community of users. Although there are many books you can read, it is often best to start
with following a YouTube tutorial. I found the video, “Python Course for Excel Users”, produced
by freeCodeCamp.org, to be the best one to get me started.
This video got me started and I found the book “Python for Data Analysis”, by Wes McKinney,
O’Reilly Media Inc., 2018, taught me many of the Python features in more depth.
The version of Python used in the text is Python 3 and illustrations are done using Jupyter
Notebook. Both were downloaded from www.anaconda.com.
After downloading and installing Python, open the Anaconda Navigator and then launch Jupyter
Notebook.
The notebook looks nothing like an Excel workbook. Before starting work, name your
notebook. To the right of Jupyter, at the top of the screen, my page shows “Untitled3”. Simply
type over this and give your notebook a name, say “Student Survey Data 2010”.
Our Python notebook will be a record of all the “instructions” that we ask to be executed. We
want to read the data file “Student Survey Data 2010.xlsx”, so will need to tell Python to do
this. This instruction must be typed out and it will be recorded in our notebook. Think of every
point and click action that you do in Excel as an instruction. Now, rather than pointing and
clicking, you will need to write this out and it will be recorded. This takes some getting used to.
Python makes extensive use of libraries. These are like “add-ins” in Excel. We will be using
several libraries in this text.
In Excel, you select a file by first selecting a folder where the file is located. Python assumes
everything is in the current folder. This is the root folder on your computer unless told
otherwise. Let us start by creating a folder that will contain the data and the notebook for this
data analysis project. Creating folders and navigating among them requires us to import the os
library (it ships with Python, so nothing extra needs to be installed).
The blue means we are now in Command mode, whereas before we were in Edit mode. If you
press h, you will see the many shortcuts for things we can do in Command mode. There are a
lot!
To insert a new cell below the current one in Command mode, type b (and type a to insert a new cell above).
I want to create a new folder (directory) for my Python files and within it, one for this project.
Your root directory is likely not C:\\Users\\s1687448, so you will need to change that in the
script below.
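A minimal sketch of such a script (this version uses a relative path under the current folder so it runs on any machine; substitute your own absolute path, such as your directory under C:\\Users, if you prefer):

```python
import os

print(os.getcwd())                                     # show the current working directory
# exist_ok=True avoids an error if the folder already exists (an assumption added here;
# without it, os.makedirs raises an error when rerun)
os.makedirs("Student Survey Project", exist_ok=True)   # create the project folder
os.chdir("Student Survey Project")                     # move into the new folder
print(os.getcwd())                                     # confirm the move
```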
If you are like me, you will make typos and get error messages. Simply go back and edit the
script and run it again.
Notice that each of these instructions had some very fussy syntax.
The os library is treated as an object that has methods (functions) associated with it. We used
several “methods”:
• getcwd(): get the current working directory
• makedirs(): make a new directory (folder)
• chdir(): change the current working directory
Observe that we must tell Python both the object and the method, separated by a dot (e.g.
os.getcwd()). Also, methods are always followed by (), even if there is nothing in the brackets.
Copy the file Student Survey Data 2010.xlsx into your new Student Survey Project folder. You
can check that it is there by typing the line below.
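The check itself is one line; os.listdir() returns the contents of the current working directory:

```python
import os

# list the files and folders in the current working directory
print(os.listdir())
```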
Now we want to read the data set and start exploring it. There are 2 libraries that will be useful
to us, Numpy and Pandas. Numpy (Numerical Python) is a library of tools to do numerical
transformations. Pandas is a library of tools for data analysis. Its name is short for “panel
data”, a term for data sets in econometrics, as well as a play on words for “Python data
analysis”. To load these two libraries, type
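The import statements, using the conventional short aliases np and pd, are:

```python
import numpy as np
import pandas as pd
```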
We want to read in the Excel file Student Survey Data 2010.xlsx, so type
pd.read_excel("Student Survey Data 2010.xlsx")
Remember to enclose the file name in quotes.
The output shows the first and last 5 rows of data and the first 10 and last 10 of the 110
columns. Python automatically codes blanks as NaN (Not a Number). The first column is the
“Index” for the data frame.
In Python, a data set that has multiple columns is called a Data Frame. A single column of data
(excluding the index column) is called a Series.
Python only read in the first sheet of the Excel workbook. It did not read in the additional
sheets that contain valuable information that is useful for interpreting what the variables are and what
the values mean. We will need a separate dictionary, outside Python, to keep track of this.
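If you do want another sheet inside Python, read_excel accepts a sheet_name argument. A small sketch, using a hypothetical two-sheet workbook built on the fly (the file name demo.xlsx and its contents are made up for illustration):

```python
import pandas as pd

# build a small two-sheet workbook as a stand-in for the survey file
with pd.ExcelWriter("demo.xlsx") as writer:
    pd.DataFrame({"q5": [4, 6]}).to_excel(writer, sheet_name="Data", index=False)
    pd.DataFrame({"question": ["q5"],
                  "text": ["In what program are you currently enrolled?"]}
                 ).to_excel(writer, sheet_name="Questions", index=False)

# read_excel loads only the first sheet by default; pass sheet_name to get others
questions = pd.read_excel("demo.xlsx", sheet_name="Questions")
print(questions)
```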
The Jupyter notebook can function as a document editor in which we can save all kinds of text
information. We can even format it using a convention called Markdown. This is beyond the
scope of this text, but it is valuable for you to investigate if you plan to use Python regularly.
Columns in a data frame are often called features. The names of the columns (features) appear
at the top of each column. They should be in the top row of the Excel or CSV file.
The subsequent rows in the data file (Excel or CSV) are known as the observations. If a cell is
empty, it will appear with a value of NaN, for Not a Number. Python expects that each value in
a column will have the same data type. Common data types are Integer, Float (decimal
numbers), and String (text).
To explore this data frame, we need to give it a name. We can be exploring many different data
frames when we are in a Jupyter notebook, so naming them allows Python to understand what
we are referring to. In contrast to Excel where many actions are simply point and click, Python
wants us to write out the instructions associated with every action we take.
Let us call our data frame df_SSD. When naming objects in Python, do not use blanks. I have
used the underscore, _, to represent a blank. To remind me that this object is a data frame, I
started the name with df.
An “object” has associated attributes. For example, what is the shape of the dataframe? A
data frame always has an Index with unique values for each row/observation. I can get a list of
the names of the various columns and I can request what their data types (dtypes) are.
To obtain these attributes, I simply type the name of the data frame, followed by dot and the
attribute. For example:
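A sketch of these attribute requests, using a tiny made-up data frame in place of the real survey data:

```python
import pandas as pd

# tiny stand-in for the survey data; the real df_SSD has 811 rows and 110 columns
df_SSD = pd.DataFrame({"q1": [1, 2, 3], "q5": [4.0, 6.0, None]})

print(df_SSD.shape)     # (number of rows, number of columns)
print(df_SSD.index)     # the unique row labels
print(df_SSD.columns)   # the column (feature) names
print(df_SSD.dtypes)    # the data type of each column
```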
Also associated with objects, we have methods. Methods are functions that we can apply to an
object. We will explore many different methods throughout the text.
In Excel:
Extract this data by copying and pasting these columns into a new worksheet and call it Grad
Expect. Every time you make significant changes to a worksheet, you should copy the data to
a new sheet so that you maintain a trail of your changes.
The variable names should reflect what they represent. We wish to improve our data
understanding, so I recommend that you relabel your variables with descriptors that are short
but intuitive.
Note that in a Data Table, each variable name must be unique. If you try typing the same name
for two variables, Excel will change the second one to name2. You should update your data
dictionary with the old and new variable names. Keep a record of what changes you are making
to your data and include this in your spreadsheet on a separate sheet. Excel does not keep
notes of your changes.
https://youtu.be/bgNmRmszsQ0
In Python, we can extract the columns we want by copying them into a new data frame. Note
that we do not have the risks inherent in modifying a data set in Excel, since we never actually
change the original data file. Further, we can track all of our actions because they are always
saved in our notebook.
Let us name our new data frame df_Grad_Exp. To pull the columns we want out of df_SSD, we
must list the names of the columns we want. The list is enclosed in double square brackets, [[]].
The names of the columns are text strings, so we must enclose them in quotes ‘ ‘ or “ “. If some
columns have names that contain apostrophes, then you must use double quotes “ “, otherwise
Python will interpret the apostrophe as a single quote and give you an error. Below, is the
script to create df_Grad_Exp and then a display of what it looks like. Typing the name of an
object and then Ctrl+Enter (Run) will display the value(s) of the object.
If you wanted the columns in df_Grad_Exp to be in a different sequence than in df_SSD, simply
list the columns in the desired sequence.
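A minimal sketch of this column selection, with made-up column names standing in for the survey's:

```python
import pandas as pd

# tiny stand-in for the survey data; the real columns come from df_SSD
df_SSD = pd.DataFrame({"Program": [4, 6], "Gender": [1, 2],
                       "Salary": [45000, 60000], "Age": [19, 21]})

# double square brackets: the inner list names the columns to keep, in the order we want
df_Grad_Exp = df_SSD[["Program", "Gender", "Salary"]]
print(df_Grad_Exp)
```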
To rename the columns, we must use the rename “method”. A “method” is Python’s term for a
function. We must say that it is the column headings we wish to rename and then list the old
and new names. If we wish to make this change to the data frame and not assign the result to a
new data frame, we tell Python that change will be inplace.
The script is quite long, so you may wish to break it up over several lines as done below. Python
sent a warning about using inplace rather than copying my data frame to a new object. You
can’t undo the inplace action once it is done.
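A sketch of the rename, again with illustrative old and new names:

```python
import pandas as pd

df_Grad_Exp = pd.DataFrame({"q5": [4, 6], "q38": [45000, 60000]})  # illustrative old names

# rename maps old column names to new ones; inplace=True changes the frame directly,
# breaking the script across lines for readability
df_Grad_Exp.rename(columns={"q5": "Program",
                            "q38": "Salary"},
                   inplace=True)
print(df_Grad_Exp.columns.tolist())
```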
In Excel (VLOOKUP):
Let us look at the Program variable in Column C. Insert a column to the left of C. If you label the
column Program, Excel will change the name of the original Program variable to Program2. We
would like to map the values of Program2 into the text equivalent in Program. We need a table
to map the number to text. For example,
Saint Mary’s does not offer Education, Fine Arts, Medicine, Health, Services or Law, but some
students selected these programs. We also do not offer any programs that are “Other” than
those in Humanities, Social Sciences, Business, Science, Mathematics, Environment and
Engineering. What should we do? It is not uncommon to have some data values that are invalid.
We could classify them all as Invalid and maybe will choose to investigate them later or to
exclude them from our analysis later.
Open a new worksheet and Rename it Lookup Tables. Create a look up table in which
• Humanities and Social Sciences are classified as Arts,
• Business as Business,
• Science, Math, Engineering and Environment are Science, and
• all others are classified as Invalid.
Go back to the Grad Expect worksheet and the first cell in the Program column (C2). Type
• =VLOOKUP(D2,
• now select the Lookup Tables worksheet and highlight the table you created.
• In the formula bar at the top of the screen you should see that the VLOOKUP formula is
capturing this information.
• Continue typing in the formula bar to add ,2,False).
• Press Enter.
The rest of the Program column is filled in, but incorrectly. This is because we need to lock the
location of the look up table.
Similarly we can create additional tables in our Lookup Tables worksheet and use them to
create new variables that have the text equivalents for Home, Gender, Parent Ed, and
Language.
VLOOKUP is a function that assigns a value based upon a look up table. It has four arguments.
VLOOKUP(value, table, column, match)
The value is the cell location whose value you would like to match within the look up table.
The table is the location of the lookup table, like the Program table we used.
• You must give the location of the top-left cell and the bottom-right cell, separated by :
• Don’t put a comma between the two cell locations, else Excel will give you an error.
• Since the lookup table will always stay in the same location, you should lock the location
by putting $ signs in front of the row and column (e.g., $A$1:$B$15 in the previous
example).
The column is the column with the new value to assign. Your look up table can have many
columns. Excel will match the value in the first column and assign the value in the column that
you have named in the function.
The match can be TRUE (approximate match) or FALSE (exact match). In the example of
program, we exactly matched each numeric value with a particular program.
In Python:
In Python, there are a variety of ways to perform the equivalent of an exact match. One uses a
very similar method to a lookup table, except that it is a list. To improve readability, we can
make it look like a table. We begin by creating a dict. A dict is a dictionary. It is a list of pairs
with the first entry being the old value and the second being the new value. We will call our
first dict Program_map.
We will use this dict to populate a new variable we will call Program2. In Excel, when we tried
naming a new variable with the same label as an existing variable, Excel automatically renamed
the old variable. This does not happen in Python: assigning to an existing column name overwrites it.
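A sketch of the dict and the mapping step, with made-up code-to-program pairs (the real pairs come from the workbook's summary sheet); the Series .map() method applies the dict:

```python
import pandas as pd

df_Grad_Exp = pd.DataFrame({"Program": [1, 3, 4]})   # illustrative numeric codes

# illustrative code-to-name pairs; the actual mapping comes from the survey documentation
Program_map = {1: "Arts", 2: "Arts", 3: "Business", 4: "Science"}

# .map() looks each value up in the dict, much like an exact-match VLOOKUP
df_Grad_Exp["Program2"] = df_Grad_Exp["Program"].map(Program_map)
print(df_Grad_Exp)
```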
BUT!!!
Python is a very rich programming language. Frequently, there are many ways to achieve the
same outcome. A VLOOKUP is equivalent to joining databases (tables) that share a common
variable. A table is simply a 2-dimensional data frame (a flat file in Excel). “Joining” means
connecting files that have a variable in common. A lookup table is such a file. Suppose we
create a simple array, Program_Name, that has just two columns, with the program numeric
value and its text value, just like Program_map.
Note that Program_Name is a set of “nested” lists. Something enclosed in [] is a list, and here we
have lists within lists. This becomes an “array”, a 2-dimensional matrix of elements.
Merge and Join are important tools for merging databases. Consult Python documentation for
more information on these methods.
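A sketch of the join approach, assuming made-up program codes:

```python
import pandas as pd

# Program_Name as nested lists: each inner list is one row of the lookup "table"
Program_Name = [[1, "Arts"], [3, "Business"], [4, "Science"]]
lookup = pd.DataFrame(Program_Name, columns=["Program", "Program_Text"])

df = pd.DataFrame({"Program": [4, 1, 3]})
# merge performs the database-style join, matching rows on the shared Program column;
# how="left" keeps every row of df even if a code has no match in the lookup table
df = df.merge(lookup, on="Program", how="left")
print(df)
```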
We can also do approximate matches as a way of grouping values. Grouping changes a numeric
variable into a categorical (ordinal) variable. Grouping is also referred to as binning. This can
often simplify interpretation. Look at the salary expectations. The average is around $50,000,
so we could group salaries into categories, such as
We can group the salaries by building a Look Up table in which the range of salaries is defined
by its lowest value.
VLOOKUP will classify salaries into the highest category it can select. For example, $54,000 is
greater than 40000 but less than 60000, so it will be classified into the category starting with
40000. In this case, we would assign the match value to be TRUE to get an approximate match.
=VLOOKUP(L2,'Lookup Tables'!$O$1:$P$7,2,TRUE)
In Python:
In Pandas, there is a function cut that takes numeric data and cuts it into categories. Similar to
Excel, Python creates bins. Whereas Excel asks for the lower limit for each bin, Python asks for
the upper limit. Create a list of upper limits and then use the cut function to assign values to a
new variable, Salary_Grp.
You must also include the start of the first bin (here, 19999) in the list; values at or below the first edge fall outside all of the bins.
Python assigns default category labels showing the limits of each interval. A round bracket, (,
indicates that this value is NOT included in the interval and a square bracket, ], indicates that
the value is included. (19,999.0, 39,999.0] indicates that values must be greater than 19,999
and less than or equal to 39,999.
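A sketch of the cut call, with illustrative salaries and bin edges (the variable names are stand-ins):

```python
import pandas as pd

Salary = pd.Series([25000, 54000, 72000])    # illustrative salary expectations

# the first entry is the start of the first bin; each later entry is a bin's upper limit
bins = [19999, 39999, 59999, 79999]
Salary_Grp = pd.cut(Salary, bins=bins)
print(Salary_Grp)
```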
VLOOKUP treats blanks as zeroes. If the look up table maps 0 to a new value (0 = Male, or 0 =
Very Low), then VLOOKUP applies this relationship. Missing values can be a data quality
problem, but erroneously recoding data is an even more serious data quality issue.
One way to address this issue is to use the IF function. The IF function takes a logical argument
and assigns one value if it is true and another value if it is false. IF(argument, TRUE result, FALSE
result).
For example, with Salary, we could change our VLOOKUP formula to read
=IF(L2<>"",VLOOKUP(L2,'Lookup Tables'!$O$1:$P$7,2,TRUE),"")
This says that if the value in L2 is not equal (<>) to blank (“”), then use VLOOKUP, but otherwise,
keep it blank.
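Pandas sidesteps this particular trap: both map and cut leave missing values as NaN rather than treating them as zero. A quick check with made-up data:

```python
import numpy as np
import pandas as pd

Salary = pd.Series([25000, np.nan, 54000])
Salary_Grp = pd.cut(Salary, bins=[19999, 39999, 59999])
print(Salary_Grp)   # the missing entry stays NaN instead of being binned
```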
https://youtu.be/fWK0shgaHvc
Image Citations:
Figures 4-1 to 4-10: Images courtesy of author using Microsoft Excel