BGP Best Path Selection Algorithm With Examples
BGP is the protocol used to announce prefixes throughout the Internet. It’s a very robust
protocol, well suited to carrying a large number of prefixes, such as the Internet prefixes or the
internal client prefixes of an ISP.
When a prefix is received in BGP, each path goes through two steps before it can be chosen as a
candidate to populate the RIB.
The first step consists of checking whether the path is valid. If it is, the path is installed in
the BGP table, and the second step, the selection itself, can start.
In order to pass this first check, the path must meet the following requirements:
In the second step, the best path to reach the prefix is selected. If there is only one path,
no comparison is needed. If there are several paths to reach the prefix, BGP uses a specific
algorithm to select the best one, and this is what I want to talk about.
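The order of the checks covered in this post can be sketched in Python as a pairwise comparison. This is only a simplified model under my own assumptions, not router code: the Path class, its field names and the better function are hypothetical, point 3 is reduced to a single "locally originated" flag, and any tie-breakers after point 8 are left out.

```python
from dataclasses import dataclass
from functools import reduce

# Origin ranking: IGP beats EGP, EGP beats INCOMPLETE (lower number is better).
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    """Hypothetical, simplified view of one BGP path (not a real router structure)."""
    next_hop: str
    weight: int = 0                  # Cisco-specific, local to the router
    local_pref: int = 100            # 100 is the default when nothing is configured
    locally_originated: bool = False
    as_path: tuple = ()              # empty for prefixes originated inside our own AS
    origin: str = "igp"
    med: int = 0
    ebgp: bool = False               # True if learned over an eBGP session
    igp_cost: int = 0                # IGP metric to reach next_hop


def better(a: Path, b: Path) -> Path:
    """Return the preferred of two valid paths, checking points 1 to 8 in order."""
    # Every pair is arranged so that the HIGHER value wins.
    steps = [
        (a.weight, b.weight),                              # 1. highest weight
        (a.local_pref, b.local_pref),                      # 2. highest local preference
        (a.locally_originated, b.locally_originated),      # 3. locally originated
        (-len(a.as_path), -len(b.as_path)),                # 4. shortest AS_PATH
        (-ORIGIN_RANK[a.origin], -ORIGIN_RANK[b.origin]),  # 5. best origin code
    ]
    if a.as_path[:1] == b.as_path[:1]:                     # 6. lowest MED, same neighboring AS only
        steps.append((-a.med, -b.med))
    steps += [
        (a.ebgp, b.ebgp),                                  # 7. eBGP over iBGP
        (-a.igp_cost, -b.igp_cost),                        # 8. lowest IGP cost to the next hop
    ]
    for value_a, value_b in steps:
        if value_a != value_b:
            return a if value_a > value_b else b
    return a  # still tied: further tie-breakers are outside this sketch


# Two paths to the same prefix: one learned over eBGP from AS 65002, one over iBGP.
via_r4 = Path("4.4.4.4", as_path=(65002,), ebgp=True)
via_r6 = Path("6.6.6.6")
print(reduce(better, [via_r4, via_r6]).next_hop)  # 6.6.6.6 (shorter AS_PATH)
```

Using reduce over the list mirrors a router comparing candidates two at a time; because MED is only compared between some pairs, the result of such a pairwise walk can depend on the order of the list.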
Let’s study points 1 through 8 and how we can influence them within the following lab. The
prefix we are going to be working with is 100.100.100.0/24, announced by R4 and R6:
1.- WEIGHT
Weight is a Cisco-specific attribute, which means it’s not part of the BGP standard. It is local
to the router on which it’s configured, so it’s not advertised with the prefix to other peers. It
tells the router which path to use to reach the prefix. The highest value
wins.
It’s the first attribute checked by BGP, so if there are two different paths for the same prefix
but with different Weight values, the path with the highest value wins.
In the lab scenario, R4 and R6 both announce the prefix 100.100.100.0/24, one through an
eBGP session and the other through an iBGP session. Let’s check how R2 and R1 see this prefix
without changing anything:
R2#show ip bgp
BGP table version is 3, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*  100.100.100.0/24 4.4.4.4                  0             0 65002 i
*>i                 6.6.6.6                  0    100      0 i
R2 gets two paths for the prefix 100.100.100.0/24: one from an eBGP peer and the other from an
iBGP peer. We might expect R2 to choose the path through the eBGP peer, because the
Administrative Distance of eBGP is lower than that of iBGP, but that’s not what happens.
Administrative Distance is only used when the RIB has to choose between routes from different
sources; it plays no part in BGP’s own selection.
R2 picks the one from the iBGP peer as the best one because, as we will see later, it’s the
one with the shortest AS_PATH. Both paths (through R4 and through R6) have the
same weight and local-preference, and neither is originated locally by R2. So the tie-breaker is
the shorter AS_PATH, that is, the path through R6.
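To read the first column of the show ip bgp table, it helps to spell out the status codes from the legend. The small Python helper below is only an illustration (decode_status is a made-up name, not part of any tool); it maps each character to the word printed in the legend.

```python
# Meaning of each status character, copied from the legend printed by "show ip bgp".
STATUS_CODES = {
    "s": "suppressed",
    "d": "damped",
    "h": "history",
    "*": "valid",
    ">": "best",
    "i": "internal",
    "r": "RIB-failure",
    "S": "Stale",
}


def decode_status(flags: str) -> list:
    """Translate the leading status characters of a table row into words."""
    return [STATUS_CODES[ch] for ch in flags if ch != " "]


print(decode_status("*>i"))  # ['valid', 'best', 'internal']
print(decode_status("*"))    # ['valid']
```

So the row starting with *>i is a valid path, learned from an iBGP peer, and selected as best, while the row starting with * alone is valid but not selected.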
Let’s see what happens when the weight parameter is configured on R2:
R2#conf term
R2(config)#router bgp 65001
R2(config-router)#neig 4.4.4.4 weight 200
R2(config-router)#end
R2#clear ip bgp 4.4.4.4
R2#sh ip bgp
BGP table version is 4, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.100.0/24 4.4.4.4                  0           200 65002 i
* i                 6.6.6.6                  0    100      0 i
Now R2 takes the path through R4 and announces it to R1 as its best path. But, as we said, the
weight attribute is not attached to the prefix, so if R1 had its own BGP session with R6,
it would prefer the path through R6, as R2 did at the beginning.
Let’s build this BGP session between R1 and R6, and let’s see which path R1 chooses:
Although R2 prefers the path through R4, R1 prefers the path through R6 because it has a
shorter AS_PATH.
So as I said before, the weight attribute only has local significance, and it’s not attached to
the prefix when announced via BGP.
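The local significance of weight can be illustrated with a short Python sketch. It is a toy model under my own assumptions: paths are plain dictionaries, the helper names advertise, receive and best are hypothetical, and only weight and AS_PATH length are compared because the other attributes are equal in this lab.

```python
def advertise(path: dict) -> dict:
    """Attributes that travel with the prefix; weight is deliberately left out."""
    return {key: value for key, value in path.items() if key != "weight"}


def receive(update: dict, neighbor_weight: int = 0) -> dict:
    """The receiver assigns its own weight (0 for learned paths unless configured)."""
    return {**update, "weight": neighbor_weight}


def best(paths: list) -> dict:
    """Compare weight first, then AS_PATH length (other steps omitted for brevity)."""
    return max(paths, key=lambda p: (p["weight"], -len(p["as_path"])))


# R2's view after "neighbor 4.4.4.4 weight 200".
r2_paths = [
    {"next_hop": "4.4.4.4", "as_path": [65002], "weight": 200},
    {"next_hop": "6.6.6.6", "as_path": [], "weight": 0},
]
r2_best = best(r2_paths)
print(r2_best["next_hop"])  # 4.4.4.4

# R1 hears R2's choice and, over its own session, R6's path; both arrive without weight.
r1_paths = [
    receive(advertise(r2_best)),
    receive({"next_hop": "6.6.6.6", "as_path": []}),
]
print(best(r1_paths)["next_hop"])  # 6.6.6.6
```

R2's weight of 200 never reaches R1, so R1 falls through to the AS_PATH comparison and picks the path through R6.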
2.- LOCAL PREFERENCE
When all the paths to the destination have the same weight value, the next attribute to be
checked is Local-Preference.
Local-preference is a standard attribute, and it’s transmitted only between iBGP peers.
It is set on outgoing or incoming prefixes by applying a route-map to the peer. If the route-map
has no statement matching specific prefixes, the local-preference is set for all the prefixes
sent to or received from that peer. The highest value wins.
Let’s get back to the original scenario. R4, R3, and R6 are announcing the same
100.100.100.0/24 prefix, but R3 is announcing it with a local-preference of 150:
R2#sh ip bgp
BGP table version is 7, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*>i100.100.100.0/24 3.3.3.3                  0    150      0 i
*                   4.4.4.4                  0             0 65002 i
* i                 6.6.6.6                  0    100      0 i
This makes R2 select the path through R3 as the best one and announce it to its other
iBGP neighbors, as we can see on R1:
R1#sh ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 17
Paths: (1 available, best #1, table default)
Not advertised to any peer
Local
3.3.3.3 (metric 11) from 2.2.2.2 (2.2.2.2)
Origin IGP, metric 0, localpref 150, valid, internal, best
Originator: 3.3.3.3, Cluster list: 2.2.2.2
To change this decision, we can configure a route-map on R2 with a higher local-preference
value and apply it inbound to the session with R6. After resetting the session with R6 on
R2, the path announced by R6 has the highest local-preference value, so R2 chooses
this new path. R2 also announces it this way to its clients:
R2#configure t
R2(config)#route-map LP-200
R2(config-route-map)#set local-preference 200
R2(config-route-map)#exit
R2(config)#router bgp 65001
R2(config-router)#neig 6.6.6.6 route-map LP-200 in
R2(config-router)#end
R2#clear ip bgp 6.6.6.6
R2#sh ip bgp
BGP table version is 8, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*>i100.100.100.0/24 6.6.6.6                  0    200      0 i
* i                 3.3.3.3                  0    150      0 i
*                   4.4.4.4                  0             0 65002 i
R1#show ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 18
Paths: (1 available, best #1, table default)
Not advertised to any peer
Local
6.6.6.6 (metric 21) from 2.2.2.2 (2.2.2.2)
Origin IGP, metric 0, localpref 200, valid, internal, best
Originator: 6.6.6.6, Cluster list: 2.2.2.2
A path without LOCAL_PREF is treated as having the value set with the bgp
default local-preference command or, if that command is not configured, the default value of 100.
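That fallback rule can be written down as a tiny Python function. The function and parameter names are hypothetical and only illustrate the order in which the three possible values are considered.

```python
DEFAULT_LOCAL_PREF = 100  # used when "bgp default local-preference" is not configured


def effective_local_pref(received=None, configured_default=None):
    """Return the local preference used in the comparison for one path."""
    if received is not None:
        return received              # LOCAL_PREF carried by the path or set by a route-map
    if configured_default is not None:
        return configured_default    # value from "bgp default local-preference"
    return DEFAULT_LOCAL_PREF


print(effective_local_pref(150))                     # 150
print(effective_local_pref())                        # 100
print(effective_local_pref(configured_default=300))  # 300
```

This is why a path learned over eBGP, which arrives without LOCAL_PREF, still competes with a value of 100 against the iBGP paths.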
3.- LOCALLY ORIGINATED PATHS
This point is reached if all of the above attributes have the same value for all the feasible
paths. Here the router prefers a path it originated itself over a path learned from a peer.
Local paths that are sourced by the network or redistribute commands are preferred over
local aggregates that are sourced by the aggregate-address command.
R3#show ip bgp
BGP table version is 4, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
s>i100.100.100.0/30 5.5.5.5                  0    100      0 i
*  100.100.100.0/24 0.0.0.0                            32768 i
*>                  0.0.0.0                  0         32768 ?
R3 prefers the path originated via the redistribute command over the one created by the
aggregate-address command, and that is the path it announces to R2.
4.- AS_PATH
If none of the above attributes breaks the tie and the router hasn’t originated the prefix
locally, the next parameter to check is the AS_PATH attribute.
The AS_PATH is a well-known mandatory attribute: every prefix carries it, and every router must
understand it. The shorter the AS_PATH, the more preferable the path.
Let’s get back again to the original scenario, with all the attributes seen so far left at their
defaults. In this scenario, the prefix received from R4 has the longest AS_PATH, because R4 adds
its own AS (65002) when it announces the prefix over the eBGP session, while the path from R6
stays inside AS 65001 and has an empty AS_PATH.
R2#sh ip bgp
BGP table version is 61, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*>i100.100.100.0/24 6.6.6.6                  0    100      0 i
*                   4.4.4.4                  0             0 65002 i
That’s why R2 prefers the iBGP path over the eBGP path.
The AS_PATH attribute has to be manipulated on an eBGP session. Between iBGP peers it’s not
possible to manipulate the AS_PATH (although you can hide it with the
aggregate-address command, or alter it with confederations).
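The way the AS_PATH grows only on eBGP sessions can be sketched in Python. The function below is a hypothetical illustration, and the extra_prepends parameter models AS-path prepending, which is one common way of manipulating the attribute and is my own example rather than something configured in this lab.

```python
def advertise_as_path(as_path, local_as, ebgp, extra_prepends=0):
    """Return the AS_PATH a neighbor receives from a router in local_as."""
    if not ebgp:
        return list(as_path)  # iBGP: the AS_PATH is passed on unchanged
    # eBGP: the sender adds its own AS once, plus any policy-driven extra copies.
    return [local_as] * (1 + extra_prepends) + list(as_path)


# R4 (AS 65002) originates the prefix and sends it to R2 over eBGP.
print(advertise_as_path([], 65002, ebgp=True))    # [65002]
# R6 (AS 65001) originates the prefix and sends it to R2 over iBGP.
print(advertise_as_path([], 65001, ebgp=False))   # []
# Hypothetical: R4 prepends its AS two extra times to look less attractive.
print(advertise_as_path([], 65002, ebgp=True, extra_prepends=2))  # [65002, 65002, 65002]
```

The first two calls reproduce the path lengths R2 sees in the lab: one AS through R4 and none through R6.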
5.- ORIGIN
Origin is also a well-known mandatory attribute, like NEXT_HOP and AS_PATH, so every BGP
prefix has this attribute.
IGP is preferred over Exterior Gateway Protocol (EGP), and EGP is preferred over
INCOMPLETE.
Typically, when a prefix is generated by the network command, it gets the origin IGP, and
when it’s redistributed from another protocol, it gets the origin INCOMPLETE.
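As a quick illustration, the origin ranking and the typical origin per source command fit in a few lines of Python. The names are hypothetical, and the two example paths simply assume one router uses network and the other uses redistribute.

```python
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}  # lower rank is preferred

# Typical origin per source command (a route-map can override it).
TYPICAL_ORIGIN = {"network": "IGP", "redistribute": "INCOMPLETE"}


def preferred_origin(paths):
    """Pick the (next_hop, origin) pair with the most preferable origin code."""
    return min(paths, key=lambda p: ORIGIN_RANK[p[1]])


paths = [
    ("6.6.6.6", TYPICAL_ORIGIN["redistribute"]),
    ("3.3.3.3", TYPICAL_ORIGIN["network"]),
]
print(preferred_origin(paths))  # ('3.3.3.3', 'IGP')
```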
R6#show route-map
route-map CONN, permit, sequence 10
Match clauses:
interface Loopback100
Set clauses:
Policy routing matches: 0 packets, 0 bytes
R6#conf term
R6(config)#router bgp 65001
R6(config-router)#redistribute connected route-map CONN
R6(config-router)#end
R6#clear ip bgp
R2#sh ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 76
Paths: (3 available, best #1, table default)
Advertised to update-groups:
13 18
Local, (Received from a RR-client)
3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
Origin IGP, metric 0, localpref 100, valid, internal, best
Local, (Received from a RR-client)
6.6.6.6 (metric 11) from 6.6.6.6 (6.6.6.6)
Origin incomplete, metric 0, localpref 100, valid, internal
65002
4.4.4.4 (metric 11) from 4.4.4.4 (4.4.4.4)
Origin IGP, metric 0, localpref 100, valid, external
R6#conf term
Enter configuration commands, one per line. End with CNTL/Z.
R6(config)#route-map CONN
R6(config-route-map)#set origin igp
R6(config-route-map)#end
R6# clear ip bgp 2.2.2.2
R2#sh ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 76
Paths: (3 available, best #1, table default)
Advertised to update-groups:
13 18
Local, (Received from a RR-client)
6.6.6.6 (metric 11) from 6.6.6.6 (6.6.6.6)
Origin IGP, metric 0, localpref 100, valid, internal, best
Local, (Received from a RR-client)
3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
Origin IGP, metric 0, localpref 100, valid, internal
65002
4.4.4.4 (metric 11) from 4.4.4.4 (4.4.4.4)
Origin IGP, metric 0, localpref 100, valid, external
6.- MED
MED comparison only occurs if the first (the neighboring) AS is the same in the two paths being
compared. There are other implications (check this Cisco reference to know more about this
parameter).
It’s an optional non-transitive attribute, so it is not passed on beyond the neighboring AS, and
its usage as a tie-breaker between several paths depends on each AS’s policy. The lowest MED is
the most preferable.
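The condition on the neighboring AS is easy to miss, so here is a small Python sketch of it. The helper names and the MED values are illustrative, and treating two paths with an empty AS_PATH as comparable is an assumption of this sketch.

```python
def first_as(as_path):
    """Neighboring AS of a path, or None for a path originated inside our own AS."""
    return as_path[0] if as_path else None


def prefer_by_med(a, b):
    """Return the path with the lowest MED, or None when MED does not decide."""
    if first_as(a["as_path"]) != first_as(b["as_path"]):
        return None  # different neighboring AS: MED is not compared
    if a["med"] == b["med"]:
        return None  # same MED: move on to the next tie-breaker
    return a if a["med"] < b["med"] else b


via_r3 = {"next_hop": "3.3.3.3", "as_path": [], "med": 2000}
via_r6 = {"next_hop": "6.6.6.6", "as_path": [], "med": 1000}
via_r4 = {"next_hop": "4.4.4.4", "as_path": [65002], "med": 0}

print(prefer_by_med(via_r3, via_r6)["next_hop"])  # 6.6.6.6
print(prefer_by_med(via_r6, via_r4))              # None
```

Note that the path through R4 has the lowest MED of the three, yet it cannot win on MED against R6 because its neighboring AS is different.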
R3#conf term
R3(config)#route-map MED
R3(config-route-map)#set metric 20000
R3(config-route-map)#router bgp 65001
R3(config-router)#neig 2.2.2.2 route-map MED out
R3(config-router)#end
R3#clear ip bgp 2.2.2.2
R6#conf term
R6(config)#route-map MED
R6(config-route-map)#set metric 1000
R6(config-route-map)#exit
R6(config)#router bgp 65001
R6(config-router)#neig 2.2.2.2 route-map MED out
R6(config-router)#end
R6#clear ip bgp 2.2.2.2
R2#sh ip bgp
BGP table version is 81, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
* i100.100.100.0/24 3.3.3.3               2000    100      0 i
*>i                 6.6.6.6               1000    100      0 i
*                   4.4.4.4                  0             0 65002 i
7.- PREFER EBGP OVER IBGP
We have reached the most interesting point. In the first part of the post, we saw that the path
through R6, which is an iBGP peer, was preferred over the path through R4, which is an eBGP
peer.
This is because whether the route was learned via iBGP or eBGP is not considered until all
the above attributes are equal. Only then is the path learned through an eBGP session
preferred over the one learned through an iBGP session.
To try this, I have changed the scenario a little. Now R5 keeps an eBGP session
with R3 and announces the prefix 100.100.100.0/24.
R4 has an eBGP session with R2 and also announces the prefix 100.100.100.0/24.
Between R2 and R3 there is an iBGP session, but R2 filters everything towards R3.
In this situation, R2 gets two paths for the prefix 100.100.100.0/24. Both paths
have the same attributes, but one of them comes through an iBGP peer and the other through
an eBGP peer:
R2#sh ip bgp
BGP table version is 84, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
* i100.100.100.0/24 5.5.5.5                  0    100      0 65003 i
*>                  4.4.4.4                  0             0 65002 i
R2 prefers the path through the eBGP peer, although it has another path through an iBGP
peer.
8.- IGP COST TO THE NEXT HOP
If all the above attributes are equal and no path has been chosen yet, the next parameter to
check is the IGP cost to reach the next hop of each path. The path with the lowest cost wins.
Getting back to the original scenario, I changed the OSPF cost of R3’s loopback. Now only
R6 and R3 are announcing the prefix 100.100.100.0/24:
R2#sh ip bgp
BGP table version is 88, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
* i100.100.100.0/24 3.3.3.3                  0    100      0 i
*>i                 6.6.6.6                  0    100      0 i