VBA, Macros, and User-Defined Functions: Fixing Messy Formatting With VBA
Welcome to our fourth lesson in this module on how to use VBA, macros, and user-defined
functions to customize Excel. This time around, we're going to be extending some of the
concepts that we learned previously, specifically with how to enter visual basic code and how to
set up and edit macros and also user-defined functions within VBA.
You can see all the shortcuts for these over here on the left-hand side, in case you need a
refresher or reminder, and we're going to be applying these to a more specialized case, which is
how you actually use them to fix formatting, and specifically here, the task at hand is to fix
these phone numbers. You can see how somehow we've imported them or gotten them from
another source and these are all wrong, because some of these phone numbers have dots,
periods in the middle between numbers, others have spaces, others have parentheses, others
are just numbers altogether.
[01:00]
So we have a wide variety of different formats here and it is not at all what we want, because
we want them to be ideally formatted using one standard, so we can go in and easily apply, for
example, the special number format and go to phone number here, and then apply that. So
ideally I'd like to have all of these formatted the same way, but they're not, and we can use
VBA and user-defined functions to fix this.
Now you might look at this and say, Okay, well, this is kind of a ridiculous scenario, because
how could you ever bother with something so trivial when you're working at a bank or a large
financial investment firm? And the short answer is that you spend a lot more time on tasks like
this than you think, especially when you're working with outside clients, or even if you're on the
buy side at a hedge fund or private equity firm, and you're evaluating a company as an
investment.
You are going to deal with a lot of horribly formatted data that is completely out of place, and
you have to know how to format it properly, if you want to make any type of progress in
analyzing the company. This example of course, these phone numbers aren't really critical to
our financial analysis of the company, but this is really just to illustrate the point, and to show
you how you might apply it to this one specific scenario.
[02:00]
http://breakingintowallstreet.com
Now you might also ask yourself, Wait a minute, why do we need to do this at all, because we
already learned the text to columns function? If you remember that, let me just highlight this
phone numbers column and go to ALT + AE for text to columns. Remember, we went
through this whole exercise in some of the formatting lessons in that module, and we learned
how you could separate for example numbers and fields like this, if we use a space for the
delimiter and you can see how it breaks it up properly. So why not just go through this and use
that instead?
And the short answer is that we could do that. It would just take a lot more time and it would
take several extra steps, because what we'd have to do is for each different set of data that's
formatted differently here, we'd have to go in and apply one style, one set of instructions for
text to column for this set, and then another one for the one that has periods in between, and
then other formats for the ones that have spaces in between, and then still others for things
that have no spaces, no periods, no hyphens, nothing like that at all.
[03:00]
So we could do it that way, but using VBA and user-defined functions is a real, real time
saver. Now before you get started with this exercise, a couple of requirements. For the
formatting of these cells, you should have these formatted in the general format, so highlight
this whole column and go to CTRL + 1. You want the general category over here; you do not want
these to be formatted as numbers, or text, or special, or anything else. You want them to be
general; this is really the best way to ensure the function works properly.
Also as always, to create these user-defined functions, we're going to have to go to ALT + F11
and then insert new module, and we've already been over this. We saw examples of how this
worked last time around. A few other things to mention, in terms of useful VBA functions here,
so LEN, MID, LEFT, RIGHT, REPLACE, those all work essentially the same way as the equivalent
functions worked for text manipulation.
[04:00]
So remember in the lesson on text manipulation, we learned the LEN function for determining
the length of text. We learned the MID function, which basically lets us retrieve characters from
the middle of text, and then we learned LEFT, which lets us return a specified number of
characters at the start of a text string, and RIGHT does something similar, except starting from
the right side, the end of a string instead.
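As a quick sketch of how these text functions read once you're inside VBA: the sample string and the Sub name below are made up for illustration, and the results print to the Immediate window (CTRL + G in the VBA editor).

```vba
Sub DemoTextFunctions()
    Dim Sample As String
    Sample = "(467) 447-7430"              ' made-up sample, 14 characters long

    Debug.Print Len(Sample)                ' 14
    Debug.Print Left(Sample, 5)            ' (467)
    Debug.Print Right(Sample, 4)           ' 7430
    Debug.Print Mid(Sample, 7, 3)          ' 447 (three characters from position 7)
    Debug.Print Replace(Sample, "-", "")   ' (467) 4477430 (hyphen swapped for nothing)
End Sub
```

The arguments are wrapped in parentheses and separated by commas, just as on the worksheet; the only real difference is that the result goes into a variable or an expression rather than a cell.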
Replace is the one small exception: in VBA it swaps one piece of text for another, so it behaves
more like the SUBSTITUTE worksheet function than the position-based REPLACE. Format is pretty
much like the text function that we learned there, so these are a few useful functions. I'm
listing them here, because we are going to be using them in VBA with this function, and they
work the same way, it's just that the syntax, how you write them, is a little bit different.
Really, the only new function here is the ASC function, which stands for ASCII. This is not even
a function that works in Excel under that name (the closest worksheet function is CODE), but
essentially what it does is convert text to a number. Specifically, if you give it a single
character, the ASC function will return the numeric character code that corresponds to that
character, so the character 0 comes back as 48 and the character 9 comes back as 57.
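To make the character codes concrete, here is a small sketch you could run from the VBA editor. The Sub name is made up; the codes themselves are the standard ASCII values.

```vba
Sub DemoAsc()
    Debug.Print Asc("0")    ' 48, the code for the character 0
    Debug.Print Asc("9")    ' 57, the code for the character 9
    Debug.Print Asc("4")    ' 52, which sits between 48 and 57
    Debug.Print Asc("-")    ' 45, outside the 48 to 57 range
    Debug.Print Asc("(")    ' 40, also outside that range
End Sub
```

The digits occupy one unbroken run of codes, which is why a simple "between the code for 0 and the code for 9" test is enough to pick them out.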
[05:00]
But what happens if you enter something like 87? ASC only looks at the first character, so it
returns 56, the code for the character 8, and not the number 87. If you want to turn the text 87
into the number 87, so you can add, subtract, multiply, and divide, you'd use a conversion
function such as VAL instead. So let's go in and see firsthand how to do this, how we can use
VBA to accomplish this task. Now before we get started, I'm going to create an extra column
here, so CTRL + spacebar, ALT + IC or CTRL + SHIFT + plus, and I can say new phone.
Okay, so let's think about how this function works. I'm going to press ALT + F11 to go into our
VBA editor and shrink this window, so we can see it in the recording window, and then module.
So we already have four modules from some of the macros that we created before. What I
normally like to do is if we're creating a new set of functions, I like to create a new module
here. And these are essentially just places to store some of your own user-defined functions,
some of your own VBA and macros that you have from previous sessions and previous files in
Excel.
[06:00]
So I'm going to right click this and go to insert, and then module. Then let's call this function fix
US phone number, and we're going to be accepting a phone number as text, so phone number
as string. That just forces Excel to interpret whatever we pass into this function as a string, as
text as opposed to interpreting it accidentally, as a number or a date or something like that.
And then I'm going to say, as string at the end. Why am I saying as string at the end? Well,
remember, at the very end of the function, we are going to be sending something back to Excel.
We want to ensure that we are always sending back text as opposed to something formatted as
a date or a number, or an integer or something else like that. So we're forcing it to always
return it in the form of normal text.
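Written out, the opening line described above might look like the sketch below. The exact spelling of the names (FixUSPhoneNumber, PhoneNumber) is an assumption, since the lesson only says them aloud, and the body is a placeholder that gets filled in over the rest of the lesson.

```vba
' Skeleton only. The first "As String" types the input,
' the second "As String" types the value sent back to Excel.
Function FixUSPhoneNumber(PhoneNumber As String) As String
    ' Placeholder body: hand the input straight back for now.
    FixUSPhoneNumber = PhoneNumber
End Function
```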
So here's our function. Now as you can see, since we're not calling this sub, this really is a
user-defined function. As a result we're not going to be able to apply formatting or to be able
to shift around columns or anything like that, but we will be able to perform a lot of
operations on the data that we pass into this.
[07:00]
So I'm going to start by declaring a variable up here, and I'm going to call this number so far, as
string. So this is essentially a storage variable for the phone number as we begin assembling it.
Now to begin thinking about how to actually construct this function, you have to look at what's
wrong with this data, so a few observations first. Every single phone number on here contains
exactly 10 digits, so these are all US phone numbers.
We have the area code in the beginning, and then over here at the end, we have the more
specific numbers, the seven digits that actually identify who you are. So for each of these
they're written in different ways, but you can go through and count yourself, all of these are
going to have exactly ten digits, and so our task is to go through and throw out anything
that's not a digit and then reassemble this into something that is in the format that we
want. What is the format that we actually want here?
[08:00]
Well, over here on the next tab over, international numbers, I'm just going to copy over the
proper formatting here to show you. So we want it to have the country code in front, the plus
one right here for the US, and then we want the area code in parentheses and then the rest we
want... I'm actually going to change my mind about that one and include a dash in between, so
we want to have a dash in between the last two groups of digits to set them apart, so this is
our ideal format. So we have to get to the task of extracting each of the numbers here and then
making sure it shows up properly in this format. How can we do that?
Well, think about for a second, I'm going to move this down a little bit, so you can see both the
data and then the function, as we write it here. Think about what we have to do. What we
really have to do here is cycle through each character of this phone number. Check whether or
not it is a number or whether it is a special character such as parentheses, or a period, or a space
or something like that, and then we have to take all those numbers and then apply the correct
format within Excel.
[09:00]
So let's start by writing a loop. So For i Equals 1 to LEN, and I'll just move this up a little bit, so
you can see this better. Essentially, this is telling Excel, I'm going to put a next at the bottom so
we don't forget about that. This is telling Excel, okay for this one at the top, 467-447-7430, let's
cycle through each of the different characters here and for each of them, we are going to
perform a different operation or sometimes maybe nothing at all.
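A bare-bones version of that loop, before any checking is added, could look like this. It is only a sketch: the Sub name is made up, and it simply prints each position and character so you can watch the loop walk across the text.

```vba
Sub DemoLoop()
    Dim PhoneNumber As String
    Dim i As Long

    PhoneNumber = "467-447-7430"     ' the first number from the lesson

    ' Len gives 12 here, so the loop runs for i = 1, 2, ..., 12.
    For i = 1 To Len(PhoneNumber)
        ' Mid(text, i, 1) picks out the single character at position i.
        Debug.Print i, Mid(PhoneNumber, i, 1)
    Next i
End Sub
```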
So here's what we're going to say for this. What we want to say is, I'll go up one space here, so
IF ASC, remember this is going to convert it into a number, MID phone number, i, 1. So what is
this part doing? Well, the MID function, remember what this does. It takes the text that we
want to retrieve our character from, we're starting at position i, and then we want to take
exactly one character starting at position i from this.
[10:00]
So initially how is this going to work? Well, it's going to start at the four right here, and for this
first run through this loop, the ASC MID phone number, 1, 1 function, that's really just going
to get us the four, except here the four is written as part of text. What this is going to do, this
function by putting ASC in front, it's actually going to convert it to a number instead, the
character code for that four, and that's what it's going to retrieve and return to us. So we have
this. And I'm going to say, IF that is greater than or equal to ASC zero, so essentially what
we're doing is converting this.
We have converted this part into a number, and we're also converting the zero here into a
number, because we want to make sure that we have the correct numerical code within Excel,
and VBA that represents literally the number zero, so we have that. And then we're going to do
that same thing and make sure it's also less than or equal to nine, so we want to ensure that we
are only looking at characters here that have ASCII values that correspond to the numbers
between 0 and 9.
[11:00]
So what I can do to be lazy, is I'm just going to copy over this function over here, since we
already have it. Let me move this over a little bit, so you can see this better. Okay, so we have
this and now we're going to say, IF this is less than or equal to ASC 9. So essentially, this whole
thing looks complicated, but it's really not if you break down the individual parts. Really we're
just checking to see if the value of the character that we're on, as we cycle through and loop
through this string, is between 0 and 9.
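If it helps to see that check in isolation, here is a sketch that pulls it out into its own small function. IsDigitChar is a hypothetical helper, not something the lesson defines; the lesson writes the same comparison directly inside the IF.

```vba
' Hypothetical helper: True when Ch is one of the characters 0 through 9.
Function IsDigitChar(Ch As String) As Boolean
    If Len(Ch) = 0 Then
        ' Asc raises an error on an empty string, so bail out early.
        IsDigitChar = False
    Else
        ' Compare the character's code against the codes for "0" and "9".
        IsDigitChar = (Asc(Ch) >= Asc("0") And Asc(Ch) <= Asc("9"))
    End If
End Function
```

So IsDigitChar("4") comes back True, while IsDigitChar("(") and IsDigitChar(".") come back False.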
If it is, then we are going to take our number so far, and we're going to add in whatever this
character is. So number so far, so we're taking whatever is existing there right now and then we
are going to add in the character that we're on. I'm going to say MID phone number, i, 1 and this
corresponds, of course, to the current character.
[12:00]
Now one improvement to this function that you could make is MID phone number, i, 1 looks a
little bit intimidating. Someone who's new to VBA or Excel might not know exactly what that
means, so one thing that you could do here to make it a little bit better is to do the following.
You could say DIM current character or current CHAR as string, and then we could say current
CHAR equals MID phone number, i, 1.
And that makes it a little bit more clear what exactly this is, and then of course, we could go
in and actually rewrite this, and we could say current character, so I'm just going through and
doing that. So again, it doesn't really make it better or worse, it's just a little bit more
understandable to someone who's new to VBA and programming in Excel, if you write it like
this.
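Put together, the loop with the current character stored in its own variable might read as follows. This is a sketch under a few assumptions: DigitsOnly is a made-up name for a function that stops after the extraction step, the identifier spellings are guesses at what is typed on screen, and the & operator is used for joining text (the lesson says "plus"; & is the safer way to join strings in VBA).

```vba
' Hypothetical helper: returns only the digits found in PhoneNumber.
Function DigitsOnly(PhoneNumber As String) As String
    Dim NumberSoFar As String     ' digits collected so far
    Dim CurrentChar As String     ' the single character at position i
    Dim i As Long

    For i = 1 To Len(PhoneNumber)
        CurrentChar = Mid(PhoneNumber, i, 1)
        ' Keep the character only if its code lies between "0" and "9".
        If Asc(CurrentChar) >= Asc("0") And Asc(CurrentChar) <= Asc("9") Then
            NumberSoFar = NumberSoFar & CurrentChar
        End If
    Next i

    DigitsOnly = NumberSoFar
End Function
```

For example, DigitsOnly("(467) 447.7430") returns the text 4674477430.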
[13:00]
But that's essentially all we're doing, and with that in place we can actually say END IF and we're
almost done with this function now. What I want to do now is get outside of this loop, and the
last thing I want to do here is say fix US phone number. Remember we're calling it this, because
we want it to correspond to the function title itself up here, so the last thing we assign to has
the same name as the function; that's how VBA knows what to send back. We're going to say that
equals and I'm going to pull this up a little bit, so you can see this better onscreen. That's
equal to format number so far, and what type of format do we want for this number?
Well, remember what I inserted into Excel, we want plus one space parentheses three digits
parentheses space three digits dash, and then four digits over there. So we're going to say plus
one and then parentheses zero, zero, zero. Excel is smart enough to know that by entering
zeros there, we don't literally want zeros.
[14:00]
What we really want is, just as with custom number formats, which we learned in the
formatting module, what we really want is to use the digits that are part of this string, and we
want it to insert the first three digits right here, in between these parentheses. And then we
can have a space and then zero, zero, zero, dash, zero, zero, zero, zero, and we have that,
and so we are pretty much done. That's how the function works. We'll test it out in a bit and
see if this actually works properly.
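Here is a sketch of what the finished function might look like when typed out. The identifier spellings are assumptions, & is used to join text, and the backslash in front of the 1 in the format string is a defensive choice of mine to mark it as literal text; the lesson types the 1 on its own.

```vba
Function FixUSPhoneNumber(PhoneNumber As String) As String
    Dim NumberSoFar As String
    Dim CurrentChar As String
    Dim i As Long

    ' Step 1: keep only the characters whose codes fall between "0" and "9".
    For i = 1 To Len(PhoneNumber)
        CurrentChar = Mid(PhoneNumber, i, 1)
        If Asc(CurrentChar) >= Asc("0") And Asc(CurrentChar) <= Asc("9") Then
            NumberSoFar = NumberSoFar & CurrentChar
        End If
    Next i

    ' Step 2: each 0 is a digit placeholder; +, parentheses, spaces and the
    ' dash are literal characters placed around the digits.
    FixUSPhoneNumber = Format(NumberSoFar, "+\1 (000) 000-0000")
End Function
```

On the worksheet you would call it like any built-in function, for instance =FixUSPhoneNumber(A2), assuming A2 holds one of the messy numbers.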
You may have a few questions on this, so I encourage you, if anything is unclear, leave a
comment below this video, leave a question, and I'll be happy to answer anything that you're
confused about. One point of confusion, looking back at this now, may be: Wait a minute, for
this number so far, the part right here plus current character, why is it that we have to convert
this into a number up here, by using the ASC function, when we're just leaving it alone here?
And the answer is, remember, we want our current text here, so this is called a string in
programming language and in VBA. So we want to return this as an actual block of text.
[15:00]
If we were to convert this, and if we made number so far an actual number or if we made
current character an actual number, this would not work correctly, because now we'd be
building an actual number and each time we'd loop through this, we'd be literally adding these
numbers. So if we got zero, two, and three for the first three digits, we would not return this as
023, we would literally say zero plus two, plus three equals five, and the number so far would
turn into five, and it just would not work correctly.
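The difference is easy to see side by side. This is just an illustrative sketch with a made-up Sub name:

```vba
Sub DemoTextVersusNumber()
    Dim AsText As String
    Dim AsNumber As Long

    ' Joining text keeps every digit, in order.
    AsText = "0" & "2" & "3"
    Debug.Print AsText       ' 023

    ' Adding numbers collapses them into one total.
    AsNumber = 0 + 2 + 3
    Debug.Print AsNumber     ' 5
End Sub
```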
So we want to literally be returning the text as opposed to the number. That's the distinction
and that is why we're not using the ASC function for this part, but why we are using it for the
check up here, because in this case we do want to check this against normal numbers. So we
have this in place. I'm going to press ALT + Q now to close out of this, and moment of truth;
let's see if this actually works. So I'll enter, fix US phone number and for the input, let's take this
phone number and let's see what happens.
[16:00]
Okay, it looks promising. Let's do the usual thing and copy this down with ALT + ESF or CTRL +
ALT + V, or CMD + CTRL + V on the Mac, and then F for formulas, or CMD + F on the Mac.
And boom, here we go, so we have done this correctly. We have formatted our phone numbers
properly and that's pretty much what I wanted to show you in this lesson. Just a very
specialized use case for how to combine some of the features in VBA, user-defined functions,
and also text manipulation, to fix the phone numbers in a messy, formatted file like this.
What I can do now is just copy and paste values to these, so SHIFT + CTRL + down arrow key,
CTRL + C, and then ALT + ESV, or CMD + CTRL + V, CMD + V on the Mac to paste these as
values, and then I can copy and paste these over here. And then we don't even need this
column anymore, so CTRL + spacebar, CTRL + minus sign to delete it. And now we have fixed
our phone numbers. I could press ALT + OCA, but it's already the appropriate width there, and
we can get rid of this improper formatting.
[17:00]
So that is how we'd go through and fix the phone numbers in Excel. Now what I want you to do for
your exercise is based on what we just did here. I want you to go over to this other tab for
international numbers. It's called international-numbers. And you can see the twist is that
we're going to be doing something similar, except now we have to fix both US and UK numbers.
And to make things more fun, take a look at this data. Looking at this, the basic idea is that
for anything here you can sort of tell in advance what the difference is.
Normal standard digits, phone numbers like this, the 376-443-5911, that's clearly a US number.
On the other hand, anything with a 020 in front, the 020 signifies a London city code, within the
UK, so that corresponds to a UK number for certain. But as you go down you can see that this
formatting sort of breaks down, because in some cases we have parentheses and in some cases
we have spaces, and in some cases we have dashes, and so on and so forth.
[18:00]
So, your task here is this: we can actually re-use a lot of the function that we entered before, but
you'll have to modify it slightly, because of the following. First off, we have to tell the
difference between UK and US phone numbers, and then we also have to format them
differently. For the UK numbers we want to have the +44 country code in front, and then
you can see the formatting for yourself right here.
That essentially we want to have a space and then the parentheses with the three digits for the
local code, another space and then four digits, a dash, and then another four digits. Whereas
US numbers are just slightly different, they're set up differently. You can take a look at the
formatting for yourself right here. And then we also want to add another user-defined function
to just simply return the country code, based on the number over here. Now that's the easy
part, because really all you're doing is checking to see, Okay, do we have a +1 or a +44 in front.
So that's the easy part. But what I want you to do here is really think about how to tell the
difference between these and how to modify the function. As I say here, you should not assume
that the 020 always indicates the UK.
[19:00]
The issue with that is if you think about it, what if we have a US number that has a 020, in front
here? Now I'm not sure if that actually corresponds to a certain region in the US, it may not. But
you could have that case. So the point is that that's not necessarily the best way to tell them
apart. Instead you want to try to find something else, and this applies by the way, not only for
this set of data and this exercise, but just in general when you're writing these formatting
functions in VBA and creating your own functions. You want to look at the data very carefully to
tell what kind of conditions you want to set up.
There is no universal function that will always fix all phone numbers or all other data for you. It
really depends on what the data looks like and what the problems with it are. So you always
have to look at the underlying data, before you jump in and complete an exercise like this. In
any case though, I want you to try this yourself. Pause the video right now and I want you to go
into VBA and create a new function, call it just Fix Phone Number or anything else you want
besides Fix US Phone Number, and go in and try yourself.
[20:00]
Modify the function as necessary, and then write those two other functions or really the one
other function to determine what country you're in. So pause the video right now, give it a shot
yourself. When you're done, or you want to check your work un-pause it and then we'll walk
through it together. Okay, good. So let's go through this. We're going to start off in VBA
actually, and let's start by actually just copying our function up here, but changing around a few
things. So I'll just press enter a few times to add in some spaces, and I'm going to call this
function Fix Phone Number instead, just so it's called something different.
We don't want duplicate function names within VBA, and within these user-defined functions.
So let's think about how this works and what we need to actually do differently here. So this
whole first part, where we pass in the phone number as the input into the function, this can
pretty much stay the same. We're not going to change anything here, because what's relevant
about this function is how it extracts only the actual numbers from whatever text we pass into
it.
[21:00]
It doesn't matter how many times we loop through it or how long the text is or anything like
that, all that matters is getting the text in the first place. So bottom line is what's going to
change is this last part. We have it right now set to always format it as a US phone number, but
we need to build-in the option to format it as a UK-style phone number, as well. So let's go back
and take a look at the phone data here. How can we tell these apart? Well, I already told you
that the 020 in the beginning is not the best way to do it.
In this case, really the best way to tell them apart is the length. If you look at this, the US
numbers always have 10 digits, so 1, 2, 3, 4, 5, 6, 7, 8, 9, 10; whereas the UK numbers always
have 11 digits. So we have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, right there. So that's really the
best way to tell them apart
here. Now depending on your dataset, you may have scenarios that are much more difficult
than this one.
[22:00]
You may have scenarios where some of the countries will have country codes; some do not
have country codes. And as you can predict in advance, some of that is going to get very
complicated, you're going to have to check a lot of different conditions. But for our purposes
for this one, we're going to tell these apart by length, which is a pretty common way to tell
apart phone numbers from different countries, different regions in some cases.
If you had a dataset where you had, for example, some numbers that had country codes, some
that didn't, well what you'd have to do there is actually look at the country code, and then if it has
a country code, figure out what it is. If it's a +44, or a +1, or a +75, or +86 for China or whatever
it actually is. Figure out what the country code is and then change around the formatting for
the text based on that.
If on the other hand, you have a mix of both, well that's where it gets tricky and you have to
split it apart in different cases, and do one case for country codes, one case for without country
codes, and take it from there. But you really have to look at the data and break down all the
different types of phone numbers first, and how to distinguish them before doing this.
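As a rough idea of what "look at the country code first" could mean in code, here is a sketch. GuessCountryCode is a hypothetical function that is not part of the lesson's workbook, and it only knows about three of the codes mentioned above; a real dataset would need its own list of cases.

```vba
' Hypothetical helper: returns the leading country code, or "" if none is found.
Function GuessCountryCode(RawNumber As String) As String
    Dim Cleaned As String
    Cleaned = Trim(RawNumber)              ' ignore stray leading/trailing spaces

    If Left(Cleaned, 3) = "+44" Then
        GuessCountryCode = "+44"           ' UK
    ElseIf Left(Cleaned, 3) = "+86" Then
        GuessCountryCode = "+86"           ' China
    ElseIf Left(Cleaned, 2) = "+1" Then
        GuessCountryCode = "+1"            ' US
    Else
        GuessCountryCode = ""              ' no code present: fall back to another rule
    End If
End Function
```

When the function returns an empty string, you'd drop back to some other test, such as the length check used in this lesson.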
[23:00]
So here's what we're going to do. I'm going to say IF length of the number so far, if that's
greater than ten, then this will be the UK phone number, because we saw that those always
have 11 characters, otherwise we are just going to apply our US formatting down here, so we
have that, and this is pretty much what I had laid out previously, so this is fine. Now for the UK
formatting, let's just copy and paste this function, and then think about how to do it.
So for this one we want to have that +44 country code in front, so we have that right there, and
then for the rest of it, really most of it is very similar. We have three digits right here, in this
case for this set of data, it's really for the London code of 020, and then this next part we should
have four zeros, instead of three zeros. But other than that it's fairly similar, so this is what our
function looks like.
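Typed out, the modified function might look like the sketch below. As before, the identifier spellings are assumptions and the backslashes in front of the literal 1 and 4s are a defensive choice of mine; the extraction loop is unchanged from the US-only version and only the last step branches on length.

```vba
Function FixPhoneNumber(PhoneNumber As String) As String
    Dim NumberSoFar As String
    Dim CurrentChar As String
    Dim i As Long

    ' Same extraction loop as before: keep digits, drop everything else.
    For i = 1 To Len(PhoneNumber)
        CurrentChar = Mid(PhoneNumber, i, 1)
        If Asc(CurrentChar) >= Asc("0") And Asc(CurrentChar) <= Asc("9") Then
            NumberSoFar = NumberSoFar & CurrentChar
        End If
    Next i

    If Len(NumberSoFar) > 10 Then
        ' More than 10 digits: treat it as a UK number in this dataset.
        FixPhoneNumber = Format(NumberSoFar, "+\4\4 (000) 0000-0000")
    Else
        ' Otherwise apply the US layout.
        FixPhoneNumber = Format(NumberSoFar, "+\1 (000) 000-0000")
    End If
End Function
```

With this in place, a 10-digit entry should come back in the +1 layout and an 11-digit entry starting with 020 in the +44 layout. The length rule only holds for this particular dataset, so check your own data before relying on it.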
[24:00]
Now let's go in and actually test this out, so ALT + Q to close this, and let's just try this. So fix
phone number, we have that. And again, when I press equals, I can go down using the arrow
keys, and then press tab to auto fill that in. Moment of truth, let's see what happens. Okay, so it
looks like it works for US numbers. Let's copy and paste this down with CTRL + C, CTRL + V
here, and it looks like it's also working for the UK numbers properly, so it's properly
distinguishing between those.
Let's copy and paste this all the way down, just to verify that it actually works properly, and it
looks like it does. I'm just kind of scrolling through very quickly and it looks like for the most
part it's telling, really for all of these I shouldn't say for the most part, for all of these it's telling
the difference between US versus UK numbers, so that is in place, and what we can do now
actually is copy and paste these as values.
[25:00]
Actually, I'm not going to do that. In this case, I'm going to leave in both sets of data, phone
number and then new number, just for our own reference, and just so you have a reminder of
how a user-defined function like this works. Now for the country, this part is very easy, so let's
think about how we might do this. Really all we have to do is check the first three characters
and see if it's a +44. So let's go back in to VBA by pressing ALT + F11, and let's have a function
here, function which country phone number as string and then say as string for this.
Okay, so we're passing in our phone number and I'm going to declare a variable, DIM country
code as string, and then we're going to set country code equal to the three leftmost characters
on our phone numbers. So I'm going to use the LEFT function, same exact function you use
normally in Excel when you're manipulating text, except here we're using it in the context of a
programming language instead. So I'm going to say LEFT phone number and then three.
[26:00]
So for this function I'm going to say IF country code equals double quotes +44, then I'm going to
say which country equals UK; otherwise, we're going to say which country equals US, and then
END IF. So it's pretty straightforward. There really isn't too much to this function. As I said, this
would be the easy part; you don't have to do too much thinking here. But in any case, let's just
go in and try this function.
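For reference, the function as described might be written like this. The identifier spellings (WhichCountry, CountryCode) are assumptions, and it expects the already-formatted number, the one that starts with +1 or +44.

```vba
Function WhichCountry(PhoneNumber As String) As String
    Dim CountryCode As String

    ' Take the three leftmost characters, e.g. "+44" or "+1 ".
    CountryCode = Left(PhoneNumber, 3)

    If CountryCode = "+44" Then
        WhichCountry = "UK"
    Else
        WhichCountry = "US"
    End If
End Function
```

So WhichCountry("+44 (020) 7946-0958") returns UK, and WhichCountry("+1 (376) 443-5911") returns US, because its first three characters are +1 followed by a space.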
Really the only part here that you didn't already know before starting this lesson was this bit
about how you use the LEFT, RIGHT, MID, and other functions to manipulate text within VBA, as
well. So ALT + Q to close this, and let's just enter this now, which country and let's make sure
we always refer to the new number, so let's copy and paste this down. And here, it looks like it
is always correctly returning US or UK. It's a little hard to distinguish, because they look similar
when printed and written out like this, but it looks like it is working correctly.
[27:00]
So that's it for our lesson on how to use VBA and user-defined functions to manipulate data and
to format phone numbers properly and go from a messy dataset, to a better one in Excel. As I
said in real life, this can get more complicated, because you can have data that's much messier
than anything you saw here. For example, you could have data where some numbers include
country codes and some don't, and where some have regional codes, or city codes, or extra
things, or leading or trailing zeros. The
bottom line is that it can get more complicated, but if you really break it down and look at all
the different types of phone numbers, and how they should be formatted you can always come
up with a set of IF statements, a set of cases and conditions that will allow you to fix this, even
if it takes some extra work.
Even if you have to do a lot of extra conditional checking and extra loops to check phone
numbers and to format them properly. So that's it for this lesson. I hope you got a lot out of it,
and see a more specialized way that you could actually use this. Again, this is going to come up
all the time when you're working with data that clients send you, when you're evaluating
companies, and you need to format and fix data properly.
[28:00]
So it's valuable and it's also fairly simple. It's a really good demonstration of the power of
user-defined functions, as we have them right here. So that's it for this lesson. Coming up next
we're going to be jumping into dynamic and interactive charts with form controls. You're going to
learn how to place buttons, and check boxes, and radio buttons, and other features like that
into your Excel spreadsheets, to allow more flexibility with displaying charts and with some of
the other data and the visualizations of that data within Excel.