Automate the Web - create web automation bots with Excel & VBA
Automate the Web - create web automation bots with Excel & VBA
TL;DR »
Let’s start by building a basic web bot to open a search page, submit a search request,
evaluate and scrape the results page, and return the results to Excel.
1. Excel Basics – but first, if you’re new to Excel you should start here.
automatetheweb.net 1/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
2. VBA Basics – it would also help to know some VBA fundamentals (eg. what’s a
variable, what’s a subroutine, for-next loops, the VBA Editor, etc) — these courses are
good.
3. HTML and the DOM – and for creating web-bots, you should understand basic HTML
structure – how a webpage is laid out – the Document Object Model.
Ok, you’re grounded in the basics. Now let’s jump into building a simple bot to search for a
keyword and return the results…
Enter a keyword – let’s say auto parts – in cell A2 , and a location – let’s say Houston TX
– in cell C1 .
Click the Developer tab (not showing?), and click Visual Basic .
automatetheweb.net 2/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
Then click Tools › References , and add the reference libraries Microsoft HTML Object
Library and Microsoft Internet Controls …
Next, in the Visual Basic Editor (VBE), click View › Project Explorer .
automatetheweb.net 3/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
'in the search box put cell "A2" value, the word "in" and cell "C1" value
objIE.document.getElementById("search_form_input_homepage").Value = _
Sheets("Sheet1").Range("A2").Value & " in " & Sheets("Sheet1").Range("C
automatetheweb.net 4/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
'for each <a> element in the collection of objects with class of 'result_
For Each aEle In objIE.document.getElementsByClassName("result__a")
'...get the href link and print it to the sheet in col C, row y
result = aEle
Sheets("Sheet1").Range("C" & y).Value = result
'...get the text within the element and print it to the sheet in col
Sheets("Sheet1").Range("D" & y).Value = aEle.innerText
Debug.Print aEle.innerText
Your module should look like the image below. To run this subroutine you’d press the green
play button or just press F5 . But first, let’s have a closer look. Click the first line, Sub
SearchBot() . Then press F8 to begin stepping through the code line-by-line (also called
debugging).
automatetheweb.net 5/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
Each time you press F8 the line of code highlighted in yellow will execute, and then the
next line down of code will turn yellow and wait for your key press. Say you change the
code in a line and want to re-run it… you’d move the yellow line back up by dragging the
yellow arrow with your mouse, or by clicking the line you want and then pressing
CTRL + F9
Notice that after executing the line objIE.Visible = True , Internet Explorer opens? Now
position all 3 applications (Excel, VBE, and IE) on the screen so that each is visible, and
continue stepping through the code using F8 .
automatetheweb.net 6/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
you want. CamelCase naming convention, with first letters capitalized, is typical for naming
subroutines and functions.
Think of variables (like objIE) as floating references or containers that refer to objects or
numbers we want to manipulate. But before we get to use them, we have to declare, or
Dim , them, which sets aside some memory for whatever kind of use we have in mind for
them. To manipulate Internet Explorer as an object in VBA, we need to Dim the browser —
either generically as an Object variable (an example of late binding objects), or more
specifically as an InternetExplorer special object variable like we did here (an example
of early binding objects).
Here we reserve aEle as a different type of object variable. Specifically, we early bind it for
use as a webpage link (the <a> tags on a webpage). The other way would be to declare it
more generically as Dim aEle as Object then later put it to use in one or more specific
roles as an object — either way works. Once again, I could choose to call it anything, even
mySuperHrefThingy, but I like the simplicity of aEle because it’s short and makes me think
<a> element.
Dim y as Integer
Let’s use the variable y to refer to the row number on the Excel sheet where we’ll have the
bot put our search results. Integer type variables in VBA are round numbers between
automatetheweb.net 8/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
Let’s look at the code line by line. Watch the video too if you want.
The first thing you’ll notice are lots of comments I added to the code. VBA ignores lines
beginning with a single quote ( ' ), highlighting them green, as comments. Adding your own
personal comments to your code is an especially good idea for beginners, or for more
complex code that requires a memory-jogger when you go back to it.
Sub SearchBot()
This tells VBA we are starting a new subroutine, or Procedure . A module can contain
subroutines ( Sub … End Sub ) and functions ( Function … End Function ). In the video,
we start the subroutine by putting the cursor within the code and pressing F5 (run) or F8
(execute line-by-line). But usually a Sub or Function is called by another Sub or
Function , or by a launch button. The name of a Sub or Function can be almost anything
automatetheweb.net 7/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
-32,768 and 32,768. If we needed a variable to hold a value greater than 32,768 we’d Dim
y as Long instead. Or if we needed it to have decimals we’d use Dim y as Double .
For each aEle we find, we’ll copy its href value over to a string (alphanumeric text) variable
called result, then print the string to the Excel sheet. Ok, I admit we don’t really need this –
we could just print aEle’s value to the sheet directly– but I added it to introduce you to
string variables.
Setting an object instantiates or activates a new instance of its object type. So we’re saying
let’s start a new instance of the IE browser.
objIE.Visible = True
Awesome – your bot just opened your IE browser! Properties (characteristics, such as
making IE Visible) and Methods (actions) are invoked after typing a dot ( . ) after an object,
like you see here. In your VBE, delete this line and start retyping it. You’ll notice after typing
objIE. that the Intellisense helper pops up, listing all the properties and methods
available for the InternetExplorer object. Go ahead and scroll through the list.
objIE.navigate "https://duckduckgo.com"
The .navigate event tells IE to navigate to whatever webpage is within the quotes. Yay
automation time!
But the code wants to run all at once, in a millisecond. By the time IE actually loads the
page of search results, the entire subroutine is long over. We have to slow things down and
wait for the browser while its navigating. We could use something like
Application.Wait(Now + TimeValue("00:00:05")) to wait 5 seconds, but if the page
loads faster we’re wasting time waiting for nothing, and if the page takes longer we’re in
the same pickle as before. Do While objIE.Busy = True Or objIE.readyState <> 4:
DoEvents: Loop is an elegant Do...Loop solution consisting of 3 VBA statements we’ve
tidied up on one line, using colons ( : ) to connect them. It’s saying: while IE is busy
navigating, just sit here and run in circles. objIE.Busy = True and objIE.readyState <>
4 ( <> means not equal) are essentially the same thing –one could probably be ommitted–
automatetheweb.net 9/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
but just to make sure the page is really done loading we make both of these conditions
requirements for getting out of the loop, by using the logical operator Or . The DoEvents
part is technically not needed but we add it in the loop to free up processing resources and
allow other things to happen within Excel during this time, asynchronous to the running
loop. For the over-achievers out there, here are some other ways we could have written
this line of code to do the same thing.
objIE.document.getElementById("search_form_input_homepage").Value = _
Sheets("Sheet1").Range("A2").Value & " in " &
Sheets("Sheet1").Range("C1").Value
This VBA statement says: Find the search box element on the webpage by its ID , which
is search_form_input_homepage , and make its text value what we have in cell A2 , plus
the string " in ", plus what we have in cell C1 . The underscore ( _ ) just lets us put longer
pieces of code on multiple lines for better readability. How did I know what the search
input box’s ID was, or that it even had an ID ? Firebug of course! If you don’t have
Firefox get it, and then install the Firebug add-on. Then right-click the search box, click
Inspect Element with Firebug , and behold…
automatetheweb.net 10/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
automatetheweb.net 11/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
…the line of code is highlighted in the Firebug console (I added the red box around the
ID ). The image above reminds us a webpage is organized in a nested, tree-like structure,
starting with .document as the root object and branching out from there into smaller
objects. This is called the Document Object Model. .getElementById("__") is the go-to
tool for grabbing one of those objects from the webpage. With it you can scrape data from
a page, as well as writing to a webpage as we’re doing here. Here’s a sneak peak at some
other ways we could have found the search box and entered data into it (shown in the
order I would have tried them, if an ID was not available).
objIE.document.getElementById("search_button_homepage").Click
I used Firebug to find the ID of the search button (magnifying glass) on the page, then
used .getElementById again to grab it, and the .Click method to click it. Alternatively,
we could have submitted the search box form like this instead
objIE.document.getElementById("search_form_homepage").Submit . Actually, there’s a
simpler method still that would allow us to skip the previous 3 steps altogether. Can you
see it? Hint: take a look at the url we ended up with… https://duckduckgo.com/?
q=auto+parts+in+Houston+TX. Got it? If you still don’t see it, and you don’t think you’ll be
able to sleep tonight, email me 🙂
We need to wait for the browser for a second (an eternity in CPU land) while the search is
running.
y = 2
Now we have a page full of search results. We’ll put them in column C , but we need to
start in row 2 since C1 contains our heading Houston TX.
This is the opening statement in our action-packed For...Next loop, where the procedure
seeks out each aEle (link) object that’s found in the collection of all objects on the page
that have a class of result__a (again I used Firebug to locate this particular class name to
automatetheweb.net 12/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
target). For each match found, we temporarily assign it to the aEle variable and do the
following with it…
result = aEle
Sheets("Sheet1").Range("C" & y).Value = result
We take the found <a> element ( aEle ) and assign its value to a string variable, result .
Ok, we really didn’t need to do this… we could just work with aEle directly without passing
it around… I just wanted to demonstrate what a string variable is! Then we print the string
to the sheet in column C , row y .
Let’s say we also grab the anchor text for each link and plop that into the cell to the right.
The .innerText property (or .textContent ) gets us that text. Side-note: I’ve built many
bots with the code If InStr(objIE.document.body.innerText, "some text") > 0 then
... where I needed to know if the web page had some text in it somewhere.
Debug.Print aEle.innerText
We’ll also print this text in the debugging console, the Immediate Window , as well (note:
you may have to press CTRL + G to show the Immediate Window , or open it from the
View menu).
Let’s say we want to single out Yellow Pages links. InStr looks for one string inside
another, using the format InStr(haystack, needle) and returns the numerical starting
point of the first occurence of needle, or 0 if not found. If ‘yp’ or ‘yellowpages’ is found in a
link we make the interior of the cell red with the .ColorIndex VBA property, and place a ‘1’
in the cell to the left. FYI, short and simple If...Then statements, like this one: If x = y
Then Debug.print "Yo, x is the same as y, bro!" , can be written on one line with the
closing End If ommitted. If...Then statements are the backbone of any great A.I. …
Watson, Wall-E, and now SearchBot()!
y = y + 1
automatetheweb.net 13/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
Now we increment y by 1, so each new result found will be printed on the next row down.
Next
Sheets("Sheet1").Range("B1").Value = _
Application.WorksheetFunction.Sum(Sheets("Sheet1").Range("B2:B100"))
Since we put a 1 in row B each time we found a Yellowpages link, we can now just add
them up, making cell B1 display the total. Alternatively, we could have used another
For...Next loop to look at each cell in column B one at a time, adding up all the 1’s. But
using Excel’s powerful built-in WorksheetFunction ‘s whenever possible is better.
objIE.Quit
End Sub
Finally, let’s save the file. We need to save the file with the extension .xlsm , since this is
most definitely a macro-enabled workbook.
automatetheweb.net 14/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
Congratulations! You built your first bot, and in the process learned a good deal about the
most common VBA commands used to interact with and automate the web.
To download this Excel VBA project: xl-searchbot.xlsm (keyword search & scrape
bot)
follow:
Automate the Web / Proudly powered by WordPress
Bots should never be created that harm or impede the functionality of a website. By using this site you agree to these
terms.
automatetheweb.net 15/16
8/19/2020 Automate the Web - create web automation bots with Excel & VBA
automatetheweb.net 16/16