Excel Analytics and Programming
Create dynamic algorithms to approach cases
When data is changed but retains its original format, the algorithm should handle the transition automatically
Created by George Zhao
Workshop Structure
Instead of providing function and programming syntax to memorize, this workshop emphasizes case studies, through which the skills are utilized
Cases: applicable situational tasks
Tutorials: supplemental teaching material to understand foundational materials
Self-assessments: more relevant case study samples
  Encouraged to attempt them with resources
  Solutions will be posted
Workshop Resources
All here: http://www.columbia.edu/~gz2165/excel.html
Lesson material:
  LearningSlides [.pdf]
  Exercises Blank [.xlsx]
  Exercises Filled [.xlsm]
Self-assessments:
  Assignments
  Solutions
Recorded sessions
Contents Overview
Case 1: Multiplication Table
Case 2: Percentile Calculations
Tutorial 1: Variables and Arrays
Tutorial 5: Userform
Case 7: Subway Data All-Around Analysis
Multiplication Table
Task: If given the following on an Excel worksheet,
1
1 2 3 4 5
initially:
beginning in cell B1, going rightward
We also want both sets of numbers to be bolded
until an empty cell (or the end of capacity limit of the worksheet) is reached
Shift + Control + (Up / Down / Left / Right) arrow all work similarly
Where to Start
Use fixed reference cells
Show Formulas
Formulas > Show Formulas
Toggle on and off between showing / not showing
$BB$32
fixed
If fixing BOTH row and column, press F4 while the cursor is over the reference in the formula editing
Think About It
Focus on any single row: we are traversing through various columns, but want to fix the first term (A4), so fix the column letter (A)
Focus on any single column: we are traversing through various rows, but want to fix the second term (E1), so fix the row number (1)
Result of Fixing
Fix the column (A) from the first reference, and the row (1) from the second reference
F2 to illustrate the formula and the references (colored)
the cell
This pastes the formula down the column and avoids the effort of dragging the formula down across rows
columns
Performance Evaluation
How would performance have been if we were dealing
occurs in constant time
Double-clicking to paste the formulas down occurs in constant time
Manually dragging the formulas across columns depends linearly on the number of columns
everything in-between, and hit Alt + Enter
Difficult to capture the desired region efficiently with the Shift + Control + arrow method
Remarks
When fixing a cell reference, think:
  Are we fixing the column?
  Are we fixing the row?
  Both? Neither?
The Alt + Enter way to paste formulas is more useful in 1-dimensional situations
Task 1
We are given a 20 x 10 matrix of random numbers
Upon supplying various integer values between 0 and
=PERCENTILE()
=PERCENTILE([array], k)
Let k be within [0, 1]
=PERCENTILE(A1:A10, .75) gives the 75th percentile value of the data from A1 to A10
=PERCENTILE(B1:B10, .05) gives the 5th percentile value of the data from B1 to B10
Right Formula?
B1:B20
Stores the column of data points to be analyzed
Think of what happens when the formula is dragged on to adjacent cells
DO NOT want to shift down to B2:B21 and so forth: fix the row references
But DO want to shift right to C1:C20: do not fix the column references
B$1:B$20
A23/100
Stores the k value
Think of what happens when the formula is dragged on to adjacent cells
DO want to shift down to A24: do not fix the row references
DO NOT want to shift right to B23: fix the column references
$A23/100
Refined Formula
=PERCENTILE(B$1:B$20,$A23/100)
Dynamic Formulas
Results update automatically for different k values
Task 2
Given several integers, called x, calculate the percentile rank of those integers
What percentile would these integers fit into?
If x is out of the range, an error would be returned
In that case, display a message that it's out of the range
=PERCENTRANK([array], x)
=IF()
Returns different values given the certainty of a
Suppose cell A1 contains 24
Suppose A2 should show the value in A1, but only if the value is divisible by 11; otherwise leave it blank
=IF(MOD(A1,11)=0,A1,"")
Nested =IF()
Suppose cell A1 contains 24
In cell A2, type =IF(MOD(A1,2)=0,IF(MOD(A1,3)=0,"DIV BY 6","DIV BY 2"),"NOT DIV BY 2")
If divisible by 2:
  If further divisible by 3, show that it's divisible by 6
  If not further divisible by 3, show that it's merely divisible by 2
If not divisible by 2:
  Display that it's not divisible by 2
=IFERROR()
=IFERROR([normal value], [value if error])
For B23 cell for example, we want =IFERROR(PERCENTRANK(B$1:B$20,$A32),"Out of Range")
Return PERCENTRANK(B$1:B$20,$A32) to B23 cell, but if that results in an error, return "Out of Range" instead
=IFERROR(PERCENTRANK(B$1:B$20,$A32),"Out of Range") is essentially this: =IF(PERCENTRANK(B$1:B$20,$A32)=#N/A,"Out of Range")
However, we can't use the latter
PERCENTRANK(B$1:B$20,$A32) immediately throws an error, so the comparison to #N/A never takes place
So =IFERROR() is the simplest way to trap that error
1 through 20
However, if the sum is less than 1000, display < 1000
In B23 cell:
=SUMIF()
In the previous example, output changed depending
What if we want conditions for each entry?
In row 40, sum the entries of rows 1 to 20, but only those individual entries that are greater than 70
Criteria
Boolean condition within quotation marks
Examples:
  Less than 100: "<100"
  Equal to 100: "=100"
  Greater than 100: ">100"
  Not 100: "<>100"
  Greater than or equal to 100: ">=100"
  Less than or equal to 100: "<=100"
=SUMIF(B1:B20, ">70") in this scenario
exercise
Most times, if a dynamic formula in a cell can give us all the information we need, use it instead of a program
Faster, easier debugging
Declaring Variables
Dim size As Integer
Dim location As String
Dim passFail As Boolean
Dim avgGrade As Double
Dim ltrGrade As String (VBA has no Char type, so a one-letter grade is stored as a String)
of procedure)
Variable names: begin with a letter, contain only letters, numbers, or underscores, and cannot be a reserved word
Initializing Variables
Can do all the declarations in one line as follows:
Dim size as Integer, location as String, passFail as
ltrGrade = "A"
Variants
Variables not restricted to a specific type
No need to declare by type
size = 30
location = "Hamilton Hall"
passFail = False
avgGrade = 94.4
ltrGrade = "A"
Difficult to keep track of all of the variables
Difficult to access each of the variables
Gets particularly difficult when the number of entries grows higher
Solution: Arrays
Array: block of pigeonholes
7 pigeonholes, each representing a day of week:
Day         Index   Price
Sunday      0       $5.03
Monday      1       $0.13
Tuesday     2       $1.51
Wednesday   3       $7.75
Thursday    4       $7.24
Friday      5       $1.99
Saturday    6       $0.64
Arrays
Dim prices(6) As Double
prices(1) to retrieve the entry from index 1 (second entry)
prices(7) will give an out-of-bounds error
Benefit: the index can be accessed by other variables:
Dim i As Integer
prices(i) gives the (i+1)th entry
Arrays
prices(0) = 5.03
prices(1) = 0.13
prices(2) = 1.51
prices(3) = 7.75
prices(4) = 7.24
prices(5) = 1.99
prices(6) = 0.64
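A minimal sketch of why an index variable is useful: the loop below walks through the prices array to find the cheapest day. The names SamplePrices and CheapestDayIndex are illustrative assumptions, not part of the workshop files.

```vba
' Illustrative helper (assumed name): the seven prices from the slide, Sunday = index 0
Function SamplePrices() As Double()
    Dim prices() As Double
    ReDim prices(6)
    prices(0) = 5.03
    prices(1) = 0.13
    prices(2) = 1.51
    prices(3) = 7.75
    prices(4) = 7.24
    prices(5) = 1.99
    prices(6) = 0.64
    SamplePrices = prices
End Function

' Returns the index of the lowest price in a one-dimensional array
Function CheapestDayIndex(prices() As Double) As Long
    Dim i As Long, best As Long
    best = LBound(prices)
    For i = LBound(prices) + 1 To UBound(prices)
        ' prices(i) is read through the index variable i
        If prices(i) < prices(best) Then best = i
    Next i
    CheapestDayIndex = best
End Function
```

With the prices above, the lowest entry is $0.13 at index 1 (Monday).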
Multidimensional Array
Row x column
Dim matrix(1, 2) As Integer
Creates a 2x3 matrix with 2 rows, 3 columns
Dynamic Array
Dim sample() As String
ReDim sample(9)
Creates a string array of size 10 (declared without a fixed size, since only such arrays can be resized later)
sample(0) = "Introduction"
Now suppose we want to increase the array size to 100
ReDim sample(99)
This would erase existing data, such as "Introduction" in index 0
ReDim Preserve sample(99)
This preserves existing data and changes the size
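A short sketch contrasting ReDim with ReDim Preserve. It assumes the array is declared without a fixed size (Dim sample() As String), because VBA only allows ReDim on dynamic arrays. The function names are illustrative assumptions.

```vba
' True if the first entry is still there after resizing with Preserve
Function SurvivesPreserve() As Boolean
    Dim sample() As String
    ReDim sample(9)
    sample(0) = "Introduction"
    ReDim Preserve sample(99)      ' keeps existing entries
    SurvivesPreserve = (sample(0) = "Introduction" And UBound(sample) = 99)
End Function

' True if the first entry is still there after a plain ReDim
Function SurvivesPlainReDim() As Boolean
    Dim sample() As String
    ReDim sample(9)
    sample(0) = "Introduction"
    ReDim sample(99)               ' every entry is reset to an empty string
    SurvivesPlainReDim = (sample(0) = "Introduction")
End Function
```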
Practice Exercise
What's the result after executing this code?
Dim dat()
ReDim dat(2, 1)
dat(0, 0) = "Criterion"
dat(0, 1) = "Value"
dat(1, 0) = "Budget"
dat(1, 1) = 5123.21
dat(2, 0) = "Enough?"
dat(2, 1) = True
ReDim Preserve dat(2, 1)
Practice Exercise
Did you catch that:
dat originally was a 3x2 two-dimensional variant array?
The very last line did nothing, since the new dimension is the same?
Excel Hierarchy
Workbook (Data.xls)
  Worksheet (Sheet1)
    Row / Column (1 / A)
      Cell (A1)
workbook codes
Code is run when the workbook is opened
Code is run when the worksheet is modified
Etc.
Worksheet Codes
Task: Display Hello World to the window when
ALT + F11
Hello World
Recall, task: Display Hello World to the window
Hello World
Select Worksheet
Hello World
Since we want code to run when this worksheet is
Hello World
Private Sub Worksheet_Change(ByVal Target As Range)
    MsgBox "Hello World"
End Sub
"Hello World" is a string
Deactivate
FollowHyperlink
PivotTableUpdate
SelectionChange
Workbook Actions
Open: codes to be run when the workbook opens
Practice Exercise
Suppose you want a pop-up that says "Welcome to the Database" when you open the workbook, and then want a pop-up that says "Are you super sure?" before any calculations are performed on Sheet1.
Practice Exercise
ThisWorkBook > WorkBook > Open:
A Word on Functions
Excel has pre-defined functions, including:
  =SUM() returns the sum of an array of numbers
  =AVERAGE() returns the average value of an array of numbers
  Etc.
We can write, define, and use our own functions
For example, a function that takes an array of numbers, and returns the product of the maximum value and the minimum value
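A possible sketch of the example function just described (product of the maximum and the minimum). The name MaxTimesMin is an assumption, and the sketch assumes the input is a non-empty one-dimensional VBA array such as Array(3, 7, 2, 5), not a worksheet range.

```vba
' Returns (largest value) * (smallest value) of a one-dimensional array
Function MaxTimesMin(values As Variant) As Double
    Dim i As Long
    Dim largest As Double, smallest As Double
    largest = values(LBound(values))
    smallest = largest
    For i = LBound(values) + 1 To UBound(values)
        If values(i) > largest Then largest = values(i)
        If values(i) < smallest Then smallest = values(i)
    Next i
    MaxTimesMin = largest * smallest
End Function
```

For Array(3, 7, 2, 5) the maximum is 7 and the minimum is 2, so the function returns 14.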
Intro to Modules
Subroutines (Sub)
  Piece of code that performs a set of actions or calculations or a combination of the two
  Does not return a value
Functions
  Exactly like a subroutine except returning a value
  Think of f(x, y, z)
  Can have as many inputs as needed, but returns one value
Function Example
Suppose given three digits a, b, c, return the number abc
If digits 4, 5, 6 are passed, the function returns 456
Algorithm: 100*a+10*b+c
Let's call this function Concat
Module
Can no longer write code under the worksheet objects
Right-click Sheet > Insert > Module
Concat
Function Concat(a, b, c)
    Concat = 100 * a + 10 * b + c
End Function
Function name equals the value to be returned
Concat(3, 4, 5)
Should return 345
And indeed it does
Practice Exercise 1
Without using pre-existing Excel functions, write your
Practice Exercise 1
Function Special(a, b)
    Special = (a ^ b) Mod b
End Function
In A1 cell, can type =special(2,3)
Practice Exercise 2
Without using pre-existing Excel functions, write your
Practice Exercise 2
Function takes in two variables:
  A string
  An integer
Need to compute the length of the string
Need to compare the length of the string to the integer
Return whether or not (true or false) the two values are equal
Practice Exercise 2
Function StrLength(a, b)
    Length = Len(a)
    StrLength = (Length = b)
End Function
Note that (Length = b) is a boolean statement. It is
Sample Exercise
Print out 1, 2, ..., 1000 in column A of the first 1000 rows
Do Until
Similar to While loop
Word on Loops
Usually interchangeable
Choice of which loop to use usually comes down to personal preference
For loop usually best when the number of iterations is known
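To illustrate that the loop types are usually interchangeable, here is a small sketch that adds up 1 through n three ways. The function names are illustrative assumptions; all three are meant to return the same total.

```vba
' For loop: natural choice when the number of iterations is known
Function SumWithFor(ByVal n As Long) As Long
    Dim i As Long, total As Long
    For i = 1 To n
        total = total + i
    Next i
    SumWithFor = total
End Function

' While loop: keeps going as long as the condition is true
Function SumWithWhile(ByVal n As Long) As Long
    Dim i As Long, total As Long
    i = 1
    While i <= n
        total = total + i
        i = i + 1
    Wend
    SumWithWhile = total
End Function

' Do Until loop: keeps going until the condition becomes true
Function SumWithDoUntil(ByVal n As Long) As Long
    Dim i As Long, total As Long
    i = 1
    Do Until i > n
        total = total + i
        i = i + 1
    Loop
    SumWithDoUntil = total
End Function
```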
If Statements
Given 1, 2, 3, ..., 1000 printed in column A
Display in column B whether each integer is divisible
Algorithm Approach
Traverse through 1, 2, ..., 1000
If divisible by 6, note it
Otherwise check if it's divisible by 2 or 3 and note if so
Important: if it's divisible by 6 already, no need to check if it's divisible by 2 or 3
Recall: a Mod b gives the remainder of a / b
In other words, a is divisible by b if a Mod b = 0
The Code
For Row = 1 To 1000
    If Cells(Row, 1) Mod 6 = 0 Then
        Cells(Row, 2) = "Divisible by 6"
    ElseIf Cells(Row, 1) Mod 2 = 0 Then
        Cells(Row, 2) = "Divisible by 2"
    ElseIf Cells(Row, 1) Mod 3 = 0 Then
        Cells(Row, 2) = "Divisible by 3"
    End If
Next Row
If Ladder
An If ladder begins with If ... Then
Can include multiple ElseIf ... Then
Ladder ends with End If
Within a ladder, once the first satisfying condition is met, the other conditions are skipped even if they are true
Think About It
What will happen after this switch of ordering?
For Row = 1 To 1000
    If Cells(Row, 1) Mod 2 = 0 Then
        Cells(Row, 2) = "Divisible by 2"
    ElseIf Cells(Row, 1) Mod 3 = 0 Then
        Cells(Row, 2) = "Divisible by 3"
    ElseIf Cells(Row, 1) Mod 6 = 0 Then
        Cells(Row, 2) = "Divisible by 6"
    End If
Next Row
Think About it
Take the number 12:
Since 12 Mod 2 = 0, satisfying the first condition, it will display "Divisible by 2" and exit the If ladder
Even though it's divisible by 3 and 6 also
Instead, breaking this ladder into multiple different If
Hypothetical Revision
Overwriting each time is just inefficient
If Cells(Row, 1) Mod 2 = 0 Then
    Cells(Row, 2) = "Divisible by 2"
End If
If Cells(Row, 1) Mod 3 = 0 Then
    Cells(Row, 2) = "Divisible by 3"
End If
If Cells(Row, 1) Mod 6 = 0 Then
    Cells(Row, 2) = "Divisible by 6"
End If
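The effect of ordering inside an If ladder can be checked without a worksheet. The sketch below wraps both orderings in functions that take the number directly; the function names are illustrative assumptions.

```vba
' Most specific test first: 6, then 2, then 3
Function DescribeDivisibility(ByVal n As Long) As String
    If n Mod 6 = 0 Then
        DescribeDivisibility = "Divisible by 6"
    ElseIf n Mod 2 = 0 Then
        DescribeDivisibility = "Divisible by 2"
    ElseIf n Mod 3 = 0 Then
        DescribeDivisibility = "Divisible by 3"
    Else
        DescribeDivisibility = ""
    End If
End Function

' Switched ordering: the "6" branch can never be reached,
' because any multiple of 6 already satisfies the "2" branch
Function DescribeDivisibilityReordered(ByVal n As Long) As String
    If n Mod 2 = 0 Then
        DescribeDivisibilityReordered = "Divisible by 2"
    ElseIf n Mod 3 = 0 Then
        DescribeDivisibilityReordered = "Divisible by 3"
    ElseIf n Mod 6 = 0 Then
        DescribeDivisibilityReordered = "Divisible by 6"
    Else
        DescribeDivisibilityReordered = ""
    End If
End Function
```

For 12, the first version reports "Divisible by 6" while the reordered version stops at "Divisible by 2".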
Remember This?
Nested for loop
(i, j) = (2,2), (2,3), ..., (2,6), (3,2), (3,3), ..., (3,6), (4,2), ...
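As a sketch of a nested For loop, the function below builds the multiplication table from Case 1 in a two-dimensional array instead of on a worksheet. The name MultiplicationTable and the 1-based bounds are assumptions made for this illustration.

```vba
' Returns an n x n array where entry (i, j) holds i * j
Function MultiplicationTable(ByVal n As Long) As Variant
    Dim table() As Long
    Dim i As Long, j As Long
    ReDim table(1 To n, 1 To n)
    For i = 1 To n              ' outer loop: rows
        For j = 1 To n          ' inner loop: columns, runs fully for each row
            table(i, j) = i * j
        Next j
    Next i
    MultiplicationTable = table
End Function
```

The inner loop finishes all of its columns before the outer loop moves to the next row, which is the (i, j) ordering listed above.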
Task
Suppose this grading scheme:
With the caveat that up to 2 assignments can be dropped
Compute the overall grade
Assignment   Weight   Points   Total
HW 1         10%
Quiz 1       15%
HW 2         10%
Quiz 2       15%
Test 1       20%
HW 3         10%
Test 2       20%
Illustration
For example, if Quiz 1 and Test 1 are dropped:
35% of the assignments are dropped
HW 1 is now worth (10%) / (1 - 0.35), about 15.38%, and so on
Approach
Take into consideration:
  Varying total points for assignments
  Unexpected position of empty slots
Assignment   Weight   Points   Total
HW 1         10%      18       20
Quiz 1       15%               10
HW 2         10%      43       50
Quiz 2       15%      23       24
Test 1       20%               50
HW 3         10%      9        20
Test 2       20%      49       50
Approach
Calculate grade for each assignment
If score cell is empty, keep track of the weight of the
cell is blank
Approach
If no points are listed in the POINTS column, a grade
Approach
If the Points column is blank, return blank in the %
Approach
Multiply each assignment grade by the assignment weight, sum the products
Normalize the overall weight, to take into consideration dropped assignments
Sum: 56.08%
Normalized sum: 86.27%
Calculation
10% * 90% + 10% * 86% + 15% * 95.83% + 10% * 45% + 20% * 98% = 56.08%
Numbers in red represent the weights, each multiplied by its corresponding grade %
We multiply each weight by its corresponding %, and tally them: dot product of vectors
Normalized Sum
56.08% / (100% - 15% - 20%) = 86.27%
If Points column is empty, take the Weight value of
Sum: 56.08%
Normalized Sum: 86.27%
Normalized Sum
Added Column F to display the weight of the
Programming Approach
We already have the algorithms down
Just need to translate to VBA code
Given the following structure to begin with:
Pseudocode
Start from row 2
Traverse down the rows, as long as there is an assignment
For each row, compute % if there is a score
Otherwise, keep track of the weight, to be backed out later
normalized sum
The Code
Sub Tally()
    Row = 2
    BackedOut = 0
    Sum = 0
    While Cells(Row, 1) <> ""
        If Cells(Row, 3) <> "" Then
            Sum = Sum + Cells(Row, 2) * Cells(Row, 3) / Cells(Row, 4)
        Else
            BackedOut = BackedOut + Cells(Row, 2)
        End If
        Row = Row + 1
    Wend
    Cells(2, 9) = Sum / (1 - BackedOut)
End Sub
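The same tally can be sketched as a function on arrays, which makes the arithmetic easy to check away from the worksheet. The name NormalizedGrade and the use of Empty to mark a dropped assignment are assumptions for this illustration; the numbers in the usage note are the ones from the grading example.

```vba
' weights, points, totals: one-dimensional arrays of equal length.
' An Empty entry in points marks a dropped assignment.
Function NormalizedGrade(weights As Variant, points As Variant, totals As Variant) As Double
    Dim i As Long
    Dim sumWeighted As Double, backedOut As Double
    For i = LBound(weights) To UBound(weights)
        If IsEmpty(points(i)) Then
            ' no score: remember this weight so it can be backed out
            backedOut = backedOut + weights(i)
        Else
            sumWeighted = sumWeighted + weights(i) * points(i) / totals(i)
        End If
    Next i
    NormalizedGrade = sumWeighted / (1 - backedOut)
End Function
```

Called with weights Array(0.1, 0.15, 0.1, 0.15, 0.2, 0.1, 0.2), points Array(18, Empty, 43, 23, Empty, 9, 49) and totals Array(20, 10, 50, 24, 50, 20, 50), the weighted sum is 0.56075 and the result is 0.56075 / 0.65, about 86.27%.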
Set-Up
Sub Tally()
    Row = 2
    BackedOut = 0
    Sum = 0
Start from row 2
Sum and weights to back out begin with 0
known as cell A2
<> means not equal to
<> "" means not equal to a blank cell; in other words, the cell contains something
Normal Entries
If Cells(Row, 3) <> "" Then
    Sum = Sum + Cells(Row, 2) * Cells(Row, 3) / Cells(Row, 4)
Recall: column 3 (column C) is the Points column
If content of column C for each row is empty, then:
Empty Entries
Else
    BackedOut = BackedOut + Cells(Row, 2)
End If
Otherwise, the weight of that assignment entry
Traversing
    Row = Row + 1
Wend
Cells(2, 9) = Sum / (1 - BackedOut)
Cells(2, 9).NumberFormat = "0.00%"
End Sub
Move onto the next row
Wend to denote the end of a While loop
End Sub to denote the end of the Subroutine
Remarks
This code can handle any number of assignment
Task
Give the percentage of days in which the stock index (S&P 500) closed above the previous day
The last date on the table wouldn't be factored
If there are 19 dates of raw data, there'll only be 18
added below
Perform the task without programming, and then with programming
Algorithm Reasoning
In column C, return 1 if the value in the same row's column B is greater than the previous row's column B
Otherwise, return 0
Drag the formula down to the penultimate entry
Sum up column C, divided by the number of entries in column C
No Programming
Programming Reasoning
For loop would not be appropriate, since the number of iterations isn't known at first
While loop or Do Until loop
Similar reasoning as the non-programming approach
While Loop
Row = 2
Sum = 0
Total = 0
While Cells(Row, 1) <> ""
    If Cells(Row, 2) > Cells(Row + 1, 2) Then
        Sum = Sum + 1
    End If
    Total = Total + 1
    Row = Row + 1
Wend
Cells(Row - 1, 3) = ""
Cells(2, 5) = (Sum - 1) / (Total - 1)    ' to discount the last entry
ActiveCell
ActiveCell, as the name suggests, refers to the currently selected cell
ActiveCell.Value returns the value of the current cell
Say cell B4 is to be selected:
Cells(4, 2).Select
By the way, Range("B4") and Cells(4, 2) refer to the same cell
ActiveCell.Value now returns the value of cell B4
Offset
Very important function used both in programming
ActiveCell.Offset(1, 0).Select
Select the cell one row beneath, in the same column
Potential for loops?
Offset Looping
Given rows of entries, begin with cell A1 and loop down the rows, until there are no more entries
No more need to keep track of a Row variable
Range("A1").Select
Do Until IsEmpty(ActiveCell.Value)
    [Can do something here for each row]
    ActiveCell.Offset(1, 0).Select
Loop
Programming
Range("B2").Select
Sum = 0
Total = 0
Do Until IsEmpty(ActiveCell.Value)
    Total = Total + 1
    If ActiveCell.Value > ActiveCell.Offset(1, 0).Value Then
        Sum = Sum + 1
    End If
    ActiveCell.Offset(1, 0).Select
Loop
Range("E2").Value = (Sum - 1) / (Total - 1)
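A worksheet-free sketch of the same count, useful for checking the logic on a handful of numbers. It assumes the closing values are passed newest first (each entry is compared with the next one, which is the previous day) and that at least two values are given. The name FractionOfUpDays is an assumption.

```vba
' closes: one-dimensional array of closing values, newest first.
' Returns the fraction of days that closed above the previous day.
Function FractionOfUpDays(closes As Variant) As Double
    Dim i As Long, upDays As Long, comparisons As Long
    ' stop one short of the end: the oldest day has nothing to compare against
    For i = LBound(closes) To UBound(closes) - 1
        If closes(i) > closes(i + 1) Then upDays = upDays + 1
        comparisons = comparisons + 1
    Next i
    FractionOfUpDays = upDays / comparisons
End Function
```

Because the loop never visits the last entry, there is no need to subtract 1 from the sum and the total afterwards.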
give the code (macro) that, once run, would perform the same actions
But what about tweaking a mini detail?
Sorting
Refer to Excel file
Wish to sort by values in column B, smallest to largest, expanding the selection
Easy to perform manually
Record macro to see the syntax to automate the action:
  Record macro
  Perform action manually
  Stop macro
  Read macro code
Record Macro
Essentially subroutines
Expanded Exercise
Suppose in a table of data, spanning from columns A to E, rows 1 to 100, sort by values in column A
Replace references of column B with column A
Range("A1:B20") becomes Range("A1:E100")
displayed as percentage
Copying, cutting, pasting
Moving selection to the top row of a table entry
Defining a formula in a cell
  RC format just like offset
order to know how and where to change the recorded code to fit the specific task
This workshop will not be able to cover all types of scenarios in Excel, but knowing how to approach new problems is the most crucial component of problem solving
Task
Create a customized data table, whereby the user can enter and change 5 selected dates
For those 5 dates, the customized table would pull all of the data to display from the original table
For those 5 dates, make a bar graph, with the Opening and Closing values displayed on top of one another for each date
When the user changes the dates, the customized table and graphs should update automatically
Algorithm
Need a function that would take a value (user-inputted date) and search for that value within a bigger table
=VLOOKUP() very useful when data is presented row-by-row
  For example, if the look-up value is 7/20/2012
  Looks through the master table, looking for the row entry with 7/20/2012 in the first column
  Function can return a specific column entry
=HLOOKUP() not used as often, for when each data entry is presented column-by-column
=VLOOKUP()
User inputs 7/20/2012
For Opening value: =VLOOKUP(I2,A1:F31,2,FALSE)
I2 is the look-up value
A1:F31 is the master table to search from
2 signifies return the 2nd column value from the master table
FALSE means return only exact results (TRUE would yield an approximate result when no exact result is found)
=VLOOKUP()
We don't want this; instead want it fixed for all searches, across columns and rows
Fix both the row and column: $A$1:$F$31
gets shifted
We want to fix only the column reference, not the row reference
Fix just the column: $I2
Refined Formula
=VLOOKUP($I2,$A$1:$F$31,2,FALSE)
Problem
Upon dragging the formula to the right, nothing
Possible Solutions
Manually change
If the customized table will not be moved, can use a function that determines the current column value, and rearrange accordingly
=COLUMN() returns the column number of the cell
  Column A returns 1
  Column B returns 2
  Etc.
As It Stands
Formula in J2 would've been: =VLOOKUP($I2,$A$1:$F$31,2,FALSE)
Formula in K2 would've been: =VLOOKUP($I2,$A$1:$F$31,3,FALSE)
Formula in L2 would've been: =VLOOKUP($I2,$A$1:$F$31,4,FALSE)
Formula in M2 would've been: =VLOOKUP($I2,$A$1:$F$31,5,FALSE)
Formula in N2 would've been: =VLOOKUP($I2,$A$1:$F$31,6,FALSE)
Use =COLUMN()
Column J is really column 10
Column K is really column 11
Column L is really column 12
Column M is really column 13
Column N is really column 14
Refined Formula
=VLOOKUP($I2,$A$1:$F$31,COLUMN()-8,FALSE)
Improvement
=VLOOKUP() returns #N/A error if the look-up value is not found in the original table
We can error trap using =IFERROR() function
=IFERROR([value if no error], [value if error])
Suppose we want to display ***NO DATA*** if there is no data
In cell J2, to be dragged in both directions:
=IFERROR(VLOOKUP($I2,$A$1:$F$31,COLUMN()-8,FALSE),"***NO DATA***")
Refined Table
Manually efficient
Dynamic
Bar Chart
Select the customized table, and choose bar graph
Horrendous output, nothing like what we're looking for
Don't Worry
We can manipulate a lot about the chart
Right click > Select Data
End Result
[Chart: bar graph of the Open and Close series (vertical axis 500 to 3000) for 6/18/2012, 6/24/2012, 6/25/2012, 7/18/2012, and 7/20/2012, with Series and Categories labeled]
Series
Series 1 name: cell containing Open
Series 1 value: cell range containing opening prices
Series 2 name: cell containing Closing
Series 2 value: cell range containing closing prices
Category
Select the 5 dates
[Chart: the Open and Close bars on a continuous date axis running from 6/18/2012 to 7/20/2012, vertical axis 0 to 3000]
The Problem
Axis is arranged numerically, treating the dates on a continuous spectrum
We want a discrete spectrum, treating the dates not as numerical, but as text
Luckily, there's a feature for that
Right click axis > Format Axis > Axis Type: Text Axis
Message Box
So far, we've worked with only the simplest type of message box: user can only click Okay
MsgBox "Message Body", vbInformation, "Optional Title Goes Here"
Goes Here
MsgBox "Message Body", vbQuestion, "Optional Title Goes Here"
MsgBox "Message Body", vbExclamation, "Optional Title Goes Here"
Yes or No
Capture the user response onto the variable response
response = MsgBox("Choose yes or no", vbYesNoCancel, "Choices")
If response = vbYes Then
    MsgBox "You clicked yes"
ElseIf response = vbNo Then
    MsgBox "You clicked no"
Else
    MsgBox "Why can't you follow directions?"
End If
Customized Userform
Begin by inserting a blank userform
Blank Userform
Userform Hierarchy
Book1 (.xls file)
  Userform1
Properties Tab
Important to keep track of for what the information is
Label
Create event-driven programs (much like worksheet
Continued
ListBox
  Similar to ComboBox, but multiple columns allowed
CheckBox
  Useful for boolean (true / false) conditions
OptionButton
  Can only choose one option, given multiple
Rather than demonstrating the use for each, why not
ub_annual.htm
Clean out all of the train icon text from the station names
Current format: 161 St-Yankee Stadium B train icon D train icon 4 train icon
New format: 161 St-Yankee Stadium: [B, D, 4]
Adding Identifier
Insert blank column to the left of column A
Type the borough name, and drag it down until the end of the list for that borough
Since there are only 4 boroughs (Staten Island is not part of the Subway system), it's faster to do this one-time step completely manually
Need to recognize when to perform tasks manually vs.
Off-Align
For each station:
  Take the numerical data, and move them up one row
  Delete the row where the data used to reside
Do this for all stations in the borough
Do this for the other boroughs
Pseudocode
Think from a top-down overview approach:
Have a variable that tracks the number of boroughs processed
Run the program in a loop until all 4 boroughs are traversed through:
  For each station, copy the numerical data, and then delete the row with that data
  End of a section of boroughs is reached when the value in column A is empty
  Delete that row and continue on
The Codes
Sub Align()
    Range("A5").Select
    boroughsTraversedThrough = 0
    Do Until boroughsTraversedThrough = 4
        ' copy the numerical data up from the row below
        For i = 2 To 9
            Range(ActiveCell.Offset(0, i), ActiveCell.Offset(0, i)) = Range(ActiveCell.Offset(1, i), ActiveCell.Offset(1, i))
        Next i
        ' delete the row the data used to reside in
        Rows(ActiveCell.Row + 1).Delete Shift:=xlUp
        ActiveCell.Offset(1, 0).Select
        ' an empty cell in column A marks the end of a borough
        If IsEmpty(ActiveCell.Value) Then
            Rows(ActiveCell.Row).Delete Shift:=xlUp
            boroughsTraversedThrough = boroughsTraversedThrough + 1
        End If
    Loop
End Sub
Processed Data
All entries on one row, with the borough identified
Clean transition from borough to borough
entries
InStr Function
We need a function that finds a string (train icon)
train icon", "train icon") returns 25 (first instance)
InStr(26, "161 St-Yankee Stadium B train icon D train icon 4 train icon", "train icon") returns 38 (second instance)
InStr(52, "161 St-Yankee Stadium B train icon D train icon 4 train icon", "train icon") returns 0 (none found anymore)
Mid Function
Similar to substring() function in Java
Mid("abcdefg", 2, 3) returns "bcd"
  2 is the start position (starts counting from 1, not 0)
  3 is the length to extract
Mid("abcdefg", 2) returns "bcdefg"
  If the length is not specified, assumes the rest of the string
Pseudocode
For all entries (station names) in column B:
  Start with position 1, and run the InStr() function to search for "train icon"
  Loop through the text until the InStr() function returns 0
  Record the letter / number (B, 4, etc), which is always two characters to the left of the InStr() value given
  Search again, except starting from the InStr() value + 1, so we can search for subsequent instances of "train icon"
  Retain the station name, and add the appropriate brackets syntax
The Code
Sub NameProcess()
    Range("B3").Select
    Do Until IsEmpty(ActiveCell.Value)
        fullText = ActiveCell.Value
        startPosition = 1
        bracketedText = "["
The Code
        While InStr(startPosition, fullText, "train icon") <> 0
            result = InStr(startPosition, fullText, "train icon")
            bracketedText = bracketedText & Mid(fullText, result - 2, 1) & ", "
            startPosition = result + 1
        Wend
The Code
        stationName = Left(fullText, InStr(fullText, "train icon") - 3) & bracketedText
        stationName = Left(stationName, Len(stationName) - 2) & "]"
        ActiveCell.Value = stationName
        ActiveCell.Offset(1, 0).Select
    Loop
End Sub
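The InStr / Mid steps can also be sketched as a function on a single string, which is easier to try out than a loop over the sheet. The name FormatStationName is an assumption, the output follows the "New format" shown earlier (with a colon), and the sketch assumes every "train icon" is preceded by a one-character line name and a space.

```vba
' Turns "161 St-Yankee Stadium B train icon D train icon 4 train icon"
' into "161 St-Yankee Stadium: [B, D, 4]"
Function FormatStationName(ByVal fullText As String) As String
    Const MARKER As String = "train icon"
    Dim firstHit As Long, hit As Long
    Dim trains As String

    firstHit = InStr(1, fullText, MARKER)
    If firstHit = 0 Then
        ' no icons in the text: leave the name untouched
        FormatStationName = fullText
        Exit Function
    End If

    hit = firstHit
    Do While hit <> 0
        ' the line letter / number sits two characters left of the marker
        trains = trains & Mid(fullText, hit - 2, 1) & ", "
        hit = InStr(hit + 1, fullText, MARKER)
    Loop

    trains = Left(trains, Len(trains) - 2)      ' drop the trailing ", "
    FormatStationName = Trim(Left(fullText, firstHit - 3)) & ": [" & trains & "]"
End Function
```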
data:
Decreasing ridership in each successive year
Summarize in a condensed table, the percentage of stations in each borough that satisfied this quality
213
Strategy
For stations with increasing ridership in each successive year:
Value in column G > value in column F
AND Operator
In VBA programming: If [condition 1] And [condition 2] Then
But in Excel cells: AND([condition 1], [condition 2])
Results Table
change in ridership from 2007 to 2011
Determine the max and min from the column, and determine the station name for those rows
Error Trap
Even with a calculation as simple as percentage change: (G3-C3)/C3
Needs to take bad data into consideration
Having a #DIV/0! error will not allow for further calculations, such as the max and min
=IFERROR((G3-C3)/C3,"")
column)
Problem is that the lookup value must be in the first column of the search table
Alas, the lookup values (% change values) are in the very last column
Offset?
Can we locate the row containing the max value, and then use offset to move over to the correct column?
=OFFSET() needs a reference cell
Unfortunately, =VLOOKUP() returns only the value, not the reference
Need a function that returns a reference in an array, given a lookup value
=MATCH()
=MATCH([lookup value], [lookup array], [match
min values
Think about it:
=OFFSET() needs a baseline reference, and two directions to branch out to
=MATCH() needs a data set, and returns a subsequent reference
Any way to use both simultaneously?
What We Want
We want the station name (column B)
Recall from previous slide: MATCH(U2,$R$3:$R$423,0) is the number of rows to offset down from R2
What about the number of columns?
-16 (with reference at column R, column B is 16 columns to the left)
Putting both together: =OFFSET($R$2,MATCH(U2,$R$3:$R$423,0),-16)
was between the user-chosen range
Given the stations that fall into this criteria, create a chart sorting the stations by boroughs
Even though the second task did not specify that the borough be pulled also, it may seem helpful to do that as well, given this final task
Userform Design
Which option in the Toolbox would be the most
Label merely with text TextBox Label with an option to run more programs when clicked
Userform Design
Unbounded Values
If TextBox1.Value is blank, we could set it to 0
Similarly, if TextBox2.Value is blank, set to some enormous number like 999,999,999
In general, this is not a good practice, especially when we don't know the extent of the data we're dealing with
But the benefit is simplicity: otherwise we need more If-Else for when they are blank
Since we roughly know the range of data here, acceptable to hardcode the values here
Unbounded Values
Private Sub Label4_Click()
    If TextBox1.Value = "" Then
        Min = 0
    Else
        Min = 1# * TextBox1.Value
    End If
    If TextBox2.Value = "" Then
        Max = 999999999
    Else
        Max = 1# * TextBox2.Value
    End If
use IsEmpty in place of TextBox1.Value = ""
By inputting nothing, we entered a blank entry, but not an empty entry
Best way to capture blank entry is to test equality to
In 1# * TextBox1.Value, it was necessary to include 1# * because otherwise, it would treat TextBox1.Value as text, not a number, and make the inequality impossible to check
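One way to keep the blank check and the text-to-number conversion in a single place is a small helper, sketched below. The name ParseBound is an assumption; it uses CDbl for the conversion instead of multiplying by 1#, and it assumes the text, when not blank, is a valid number in the local number format.

```vba
' Returns fallback when the text is blank, otherwise the text converted to a number
Function ParseBound(ByVal rawText As String, ByVal fallback As Double) As Double
    If Trim(rawText) = "" Then
        ParseBound = fallback
    Else
        ParseBound = CDbl(rawText)   ' raises an error if rawText is not numeric
    End If
End Function
```

Inside the click handler this could be used as Min = ParseBound(TextBox1.Value, 0) and Max = ParseBound(TextBox2.Value, 999999999).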
Traversing Data
We could use ActiveCell and Offset to traverse down
Range("G3").Select
currentRow = 1
Traversing Data
Columns("X:Y").Clear
Do Until IsEmpty(ActiveCell)
    If ActiveCell.Value >= Min And ActiveCell.Value <= Max Then
        Cells(currentRow, 24) = ActiveCell.Offset(0, -6)
        Cells(currentRow, 25) = ActiveCell.Offset(0, -5)
        currentRow = currentRow + 1
    End If
    ActiveCell.Offset(1, 0).Select
Loop
syntax for the auto-fitting
After recording the macro, recognize the need to change the parameters to columns X and Y
Able to add other customization (font size, cell style, etc) this way (recall the tutorial on Recording Macros)
To Run This
Recall this code is within the userform
We need the userform to be displayed
We can write a short piece of code for that
Then simply assign an object (text box, etc) to run that short code upon being clicked
Pull Data
Use =COUNTIF() method for each borough
Look up the borough names in column X
Efficient coding:
  We're repeating the same operation 4 times, with the only difference being the borough name
  To avoid writing similar code, use arrays to store the borough names
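A sketch of the array idea: the borough names live in one array, and a single loop does the counting for all four. To keep it self-contained the entries are passed in as an array rather than read from column X, and the counting is done by a small helper instead of =COUNTIF(). CountMatches and TallyBoroughs are assumed names.

```vba
' Counts how many entries equal target
Function CountMatches(entries As Variant, ByVal target As String) As Long
    Dim i As Long, n As Long
    For i = LBound(entries) To UBound(entries)
        If entries(i) = target Then n = n + 1
    Next i
    CountMatches = n
End Function

' Returns one count per borough, in the order of the boroughs array
Function TallyBoroughs(entries As Variant) As Variant
    Dim boroughs As Variant
    Dim counts() As Long
    Dim b As Long
    boroughs = Array("The Bronx", "Brooklyn", "Manhattan", "Queens")
    ReDim counts(LBound(boroughs) To UBound(boroughs))
    For b = LBound(boroughs) To UBound(boroughs)
        ' same operation each time, only the borough name changes
        counts(b) = CountMatches(entries, boroughs(b))
    Next b
    TallyBoroughs = counts
End Function
```

Adding or renaming a borough then means editing one array rather than four near-identical blocks of code.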
Create Chart
Record a macro and manually create the desired chart
[Pie chart titled "Ridership between 5000000 and 20000000", with slices for The Bronx, Brooklyn, Manhattan, and Queens; data labels 2, 16, 10, and 50]
Recording Macro
Range("AA1:AB4").Select
ActiveSheet.Shapes.AddChart.Select
ActiveChart.SetSourceData Source:=Range("'Case 7'!$AA$1:$AB$4")
ActiveChart.ChartType = xlPie
ActiveChart.SeriesCollection(1).Select
ActiveChart.SeriesCollection(1).ApplyDataLabels
ActiveChart.ChartArea.Select
ActiveChart.SetElement (msoElementChartTitleCenteredOverlay)
ActiveChart.ChartTitle.Text = "Ridership between " & Min & " and " & Max
Assignments
Short mini case studies, demonstrating the use of Excel and VBA to handle data in different ways
Includes function syntaxes as references
  More information included than needed
  Need to discern which functions will be useful in each case
sides
Assignment 1
At a fair, the participants were asked to input their name in the first column and their Columbia University UNI in the second column of an Excel document. Write the code in VBA that automates the process by which the collected UNIs are bundled together into one text, which can be directly pasted into an email to send out to all participants
Assignment 2
the responses were typed up onto column A in an Excel worksheet. Participants may have used different cases of letters or added spaces between the words, but they refer to the same entry. For example, the following entries should all be treated the same: Coldplay, ColdPlay, Cold play, c o Ld p lAy
Write the code in VBA that will create a sorted tally table that displays the distinct entries with their count in the dataset
Assignment 3
Suppose that at an institution, the unique ID code for each student consists of two letters of the alphabet (use lower-case in this exercise), followed by one, two, or three digits. If two or more digits are present in the ID, the numerical portion of the ID will not begin with 0. Write a VBA program that will generate all of the combinations of the IDs in column A, and then randomize the order
Final Marks
This workshop was designed to give an overview of some of the noteworthy functions and the foundations of VBA programming. There are many more functionalities in Excel. One can always learn more by browsing through the functions list or searching for them online. For VBA, one can always record some macros. Having a solid foundation enables one to more easily learn new essentials.