Designing large-scale BGP networks

Border Gateway Protocol (BGP) has a solid place in large-scale IP networks, but knowing when and how to deploy it is the key.

Considering the relative complexity of Border Gateway Protocol (BGP), it's not surprising that you would consider various design aspects before rushing head-on into implementing it in your network. If nothing else, good design and careful planning could save you a few tense troubleshooting sessions.

In this article, I'll try to give you a few generic guidelines that you should follow when designing your BGP network. Don't forget that experience comes only with practice, however. When designing your first few BGP networks, you should get expert help, either in-house, from your vendor or from a qualified professional services organisation.

Use a public autonomous system number

BGP uses autonomous system (AS) numbers to track the networks through which traffic has to pass to reach its final destination. AS numbers visible in the public Internet have to be globally unique and are allocated by various Internet registries. If you want to offer public Internet services, having a public AS number is mandatory. If you are in a hurry and just need BGP to offer other IP-based services (for example, Layer 3 VPN services based on MPLS VPN), you could use the private AS numbers specified in RFC 1930 (AS 64512 through AS 65535), but then you might face challenging migration scenarios if you ever want to offer public Internet services.
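To make the migration risk concrete, here is a minimal sketch of a check you might run before letting an AS path reach the public Internet. The function names are hypothetical, and the range is simply the one quoted above; verify it against current registry documentation before relying on it.

```python
# Private AS range as quoted in the text (RFC 1930); treated as an assumption.
PRIVATE_AS_FIRST = 64512
PRIVATE_AS_LAST = 65535


def is_private_asn(asn: int) -> bool:
    """Return True if the AS number falls inside the private range above."""
    return PRIVATE_AS_FIRST <= asn <= PRIVATE_AS_LAST


def private_asns_in_path(as_path: list[int]) -> list[int]:
    """List the private AS numbers that would leak if this path were advertised."""
    return [asn for asn in as_path if is_private_asn(asn)]
```

A path such as `[3320, 65001]` would be flagged because of AS 65001, which is exactly the situation you would have to clean up when moving from a private to a public AS number.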

Use BGP only in combination with another routing protocol

BGP was designed to be a robust, conservative routing protocol able to carry hundreds of thousands of IP prefixes. It was never meant to be the fast-converging protocol needed to implement modern IP-based services (for example, Voice-over-IP or Triple Play services). You should always use BGP on top of a modern, fast-converging Interior Gateway Protocol (IGP), for example OSPF or IS-IS. In such a design, the IGP provides optimum paths through the network core and BGP provides edge-to-edge routing across these paths.

Run internal BGP between loopback interfaces

BGP uses TCP as a reliable transport to exchange routing information between manually configured BGP peers (there is no neighbour discovery in BGP). A TCP session is always tied to a pair of local and remote IP addresses. Should either of these become unreachable, the TCP session, and consequently BGP routing, would be disrupted even though the routers are still operational.

Internal BGP sessions (BGP sessions between routers in your network) should thus always be run between loopback interfaces, ensuring that the TCP session stays operational as long as there is at least one path between the BGP neighbours (even though the physical interfaces through which the neighbours are reached might change).
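The difference can be illustrated with a toy model. This is only a sketch under simple assumptions: routers are names, links are undirected pairs, a breadth-first search stands in for the IGP, and all function names are hypothetical.

```python
from collections import deque


def edge(a: str, b: str) -> frozenset:
    """An undirected link between two routers."""
    return frozenset((a, b))


def reachable(links: set, src: str, dst: str) -> bool:
    """Breadth-first search over the links; a stand-in for what the IGP computes."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return False


def interface_session_up(links: set, a: str, b: str) -> bool:
    """Session bound to the addresses of the direct a-b link: it needs that link."""
    return edge(a, b) in links


def loopback_session_up(links: set, a: str, b: str) -> bool:
    """Session bound to loopbacks: any remaining path between a and b will do."""
    return reachable(links, a, b)


# Hypothetical triangle topology; the direct A-B link then fails.
triangle = {edge("A", "B"), edge("B", "C"), edge("A", "C")}
after_failure = triangle - {edge("A", "B")}
```

With `after_failure`, the interface-bound session between A and B is down, while the loopback-bound session survives over the path through C.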

External BGP neighbours are usually directly connected (your BGP router is directly attached to your customer's or peering partner's BGP router). The external BGP sessions are thus commonly run between adjacent IP addresses assigned to physical interfaces.

Run BGP in the whole network

Historically, some service providers tried to avoid running BGP in the whole network to reduce the memory requirements and CPU utilisation of their routers, relying on ingenious designs that inevitably became too complex once their networks started to grow. It's best to accept the fact that BGP is inevitable in a serious service provider network and design the whole network for it from the very start.

Obviously, you don't need to run BGP on every router in your network. For example, dial-up servers or DSL concentrators can rely on default routing supplied by the network core, but the edge routers connecting enterprise customers may well need BGP to serve multihomed customers.

Statically configure advertised prefixes

If you're offering public Internet services, you have to advertise the public IP address space assigned to you by the Internet registries into BGP. Engineers sometimes try to reach this goal through a complex process of route redistribution from IGP into BGP and subsequent route aggregation within BGP. It's much simpler to advertise the exact prefixes you've been allocated on a few key BGP routers.

When you decide to split the routing of your Internet customers from your core routing (highly recommended) and carry customer IP prefixes in BGP, they could be redistributed from IGP (or from static routes on the edge routers), but tagged with the well-known NO_EXPORT community to prevent their propagation into adjacent autonomous systems.
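As a rough sketch of how the NO_EXPORT tag keeps customer prefixes inside your autonomous system: the `Route` class and function names below are hypothetical, confederations are ignored for simplicity, and the community value is the well-known one (65535:65281) from RFC 1997.

```python
from dataclasses import dataclass, field

NO_EXPORT = 0xFFFFFF01  # well-known community 65535:65281


@dataclass
class Route:
    prefix: str
    communities: set[int] = field(default_factory=set)


def redistribute_customer_route(prefix: str) -> Route:
    """Bring a customer prefix into BGP, tagged so it never leaves the AS."""
    return Route(prefix, {NO_EXPORT})


def may_advertise(route: Route, peer_is_external: bool) -> bool:
    """Internal peers always get the route; external peers only if it is untagged."""
    return not (peer_is_external and NO_EXPORT in route.communities)


customer = redistribute_customer_route("192.0.2.0/24")
allocated = Route("198.51.100.0/22")  # statically originated, no tag
```

Here `customer` is still propagated to internal peers but withheld from external ones, while the statically originated `allocated` block is advertised everywhere.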

Note: Different rules apply when you run BGP in MPLS VPN environments, where two-way redistribution between BGP and the customer's IGP is very common.

Do not change BGP attributes within your network

Any routing protocol (BGP included) works best if all routers in the network have a consistent view of the network. To ensure consistent routing in your network, do not change any BGP attributes on updates sent to IBGP neighbours, even though most router vendors allow you to do that. On the other hand, it's OK to change BGP attributes on:

  • Routes received from external BGP neighbours. Most commonly, the local preference attribute is set to indicate preferred/backup exit points.
  • Routes redistributed into BGP from other sources. Some BGP attributes (for example, Multi-Exit Discriminator) are set automatically, others can be set on the redistributing router.
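The rule above can be sketched as follows: policy is applied once, where a route enters the AS, and the route is then passed to internal peers untouched, so every router picks the same exit. The names are hypothetical, the default local preference of 100 is an assumption based on common vendor defaults, and best-path selection is reduced to local preference only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Path:
    prefix: str
    exit_router: str
    local_pref: int = 100  # assumed default; check your platform


def ebgp_inbound(path: Path, local_pref: int) -> Path:
    """Policy applied where the route enters the AS: mark preferred/backup exits."""
    return replace(path, local_pref=local_pref)


def ibgp_outbound(path: Path) -> Path:
    """Updates to internal peers carry the attributes unchanged."""
    return path


def best_exit(paths: list[Path]) -> str:
    """Simplified best-path selection: highest local preference wins."""
    return max(paths, key=lambda p: p.local_pref).exit_router


primary = ebgp_inbound(Path("203.0.113.0/24", "edge1"), local_pref=200)
backup = ebgp_inbound(Path("203.0.113.0/24", "edge2"), local_pref=50)
# Every internal router receives the same two paths and reaches the same decision.
views = [best_exit([ibgp_outbound(primary), ibgp_outbound(backup)]) for _ in range(3)]
```

If `ibgp_outbound` rewrote the local preference differently per neighbour, the entries in `views` could disagree, which is the inconsistency this guideline is meant to prevent.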

Redistribute external subnets into your IGP

Each IP prefix carried by BGP has a next hop attribute, specifying the IP address of the next-hop BGP router. It's the job of the IGP to figure out the optimum path toward the next hop.

By default, BGP advertises IP prefixes received from an external neighbour (from your peering partner, for example) with the next hop attribute pointing to the IP address of the external peer. This property allows you to implement perfect load sharing toward those Internet Exchange Points (IXPs) where you have deployed multiple routers for redundancy purposes. However, the external IP addresses advertised as the next hop by BGP have to be reachable; you should redistribute them into your IGP. Failure to do so might result in interesting troubleshooting exercises.

Note: If you haven't deployed multiple routers connected to the same IXP, you could also use an alternate design, where your edge BGP router resets the next hop attribute to point to its own loopback address.
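The dependency between the BGP next hop and the IGP can be sketched as a recursive lookup. This is an illustration only: the IGP table is a plain dictionary, the interface names and addresses are made up, and `resolve_next_hop` is a hypothetical helper.

```python
import ipaddress
from typing import Optional


def resolve_next_hop(next_hop: str, igp_routes: dict[str, str]) -> Optional[str]:
    """Longest-prefix match of a BGP next hop against the IGP routes.

    Returns the outgoing interface, or None if the next hop is unreachable
    (in which case the BGP route cannot be used).
    """
    addr = ipaddress.ip_address(next_hop)
    best = None
    for prefix, interface in igp_routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, interface)
    return best[1] if best else None


# IGP knows only internal prefixes: the external peer's address does not resolve.
igp = {"10.0.0.0/24": "core-1", "10.0.0.1/32": "core-2"}
before = resolve_next_hop("192.0.2.1", igp)

# Option 1: redistribute the external subnet into the IGP.
igp_with_external = {**igp, "192.0.2.0/30": "core-3"}
after = resolve_next_hop("192.0.2.1", igp_with_external)

# Option 2: the edge router rewrites the next hop to its own loopback (10.0.0.1).
next_hop_self = resolve_next_hop("10.0.0.1", igp)
```

In this sketch `before` is `None`, which corresponds to the "interesting troubleshooting exercises" mentioned above; both options make the next hop resolvable.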

Use BGP route reflectors

Due to BGP loop avoidance rules, an IP prefix received from an internal BGP peer should not be advertised to another internal peer. Consequently, every BGP-speaking router in your autonomous system should have a BGP session with every other BGP-speaking router in your network. Obviously, the overhead of such a scheme in large service provider networks is enormous, and tools were developed years ago to make internal BGP scalable.
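To put a number on that overhead: a full mesh of n routers needs n(n-1)/2 sessions. A small sketch (the function name is hypothetical):

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of IBGP sessions when every router peers with every other router."""
    return routers * (routers - 1) // 2


for n in (10, 100, 500):
    print(n, full_mesh_sessions(n))
# prints:
# 10 45
# 100 4950
# 500 124750
```

The count grows quadratically, and every new router requires a configuration change on all existing ones.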

There are two approaches to scalable internal BGP: BGP route reflectors and BGP confederations. Confederations are rarely used; most designs use BGP route reflectors.

A BGP route reflector (RR) is a BGP router that is allowed to propagate IP prefixes between internal BGP neighbours (additional BGP attributes are used to detect loops). The route reflectors could be connected in a hierarchy; for example, a regional route reflector might be a client of a core route reflector. The hierarchy should not have too many levels, as each level introduces additional delay in the BGP convergence process.
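The reflection rules and the loop-detection attributes can be sketched roughly as below, following the behaviour described in RFC 4456 in simplified form. The `Update` class and `reflect` function are hypothetical, peer names stand in for router IDs, only routes learned over IBGP are handled, and the update is not sent back to the peer it came from.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Update:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: list[str] = field(default_factory=list)


def reflect(update: Update, received_from: str, cluster_id: str,
            clients: set[str], non_clients: set[str]) -> dict[str, Update]:
    """Return the updates a route reflector would send, keyed by peer."""
    if cluster_id in update.cluster_list:
        return {}  # our own cluster ID is already present: a loop, so drop it
    out = Update(
        update.prefix,
        update.originator_id or received_from,   # set ORIGINATOR_ID once
        [cluster_id] + update.cluster_list,      # prepend our cluster ID
    )
    if received_from in clients:
        targets = (clients | non_clients) - {received_from}
    else:
        targets = set(clients)  # from a non-client: reflect to clients only
    return {peer: out for peer in sorted(targets)}


clients, non_clients = {"pe1", "pe2"}, {"rr2"}
from_client = reflect(Update("198.51.100.0/24"), "pe1", "rr1", clients, non_clients)
from_non_client = reflect(Update("198.51.100.0/24"), "rr2", "rr1", clients, non_clients)
looped = reflect(from_client["rr2"], "rr2", "rr1", clients, non_clients)
```

A route from client `pe1` goes to `pe2` and `rr2`; a route from non-client `rr2` goes to the clients only; and an update that comes back carrying cluster ID `rr1` is discarded.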

You could use regular routers as BGP route reflectors when the number of clients is low. In large networks, the core route reflectors should be dedicated devices that do not forward a significant amount of traffic.

For example, the distribution-layer routers connecting your Points-of-Presence to the network core could act as BGP RR for the BGP routers in the POP. The core route reflectors would then be dedicated boxes distributing BGP routes to all core- and distribution-layer routers.
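As a back-of-the-envelope comparison, the sketch below counts the sessions in a two-level design like the one just described. The model and all numbers are assumptions made for illustration: every POP router peers with each distribution-layer reflector in its POP, every distribution-layer and core router peers with each core reflector, and the core reflectors are fully meshed among themselves.

```python
def full_mesh_sessions(routers: int) -> int:
    """Sessions needed when every router peers with every other router."""
    return routers * (routers - 1) // 2


def two_level_rr_sessions(pops: int, routers_per_pop: int, rrs_per_pop: int,
                          core_routers: int, core_rrs: int) -> int:
    """Session count for the hypothetical two-level route reflector design."""
    pop_sessions = pops * routers_per_pop * rrs_per_pop
    core_clients = pops * rrs_per_pop + core_routers
    core_sessions = core_clients * core_rrs
    return pop_sessions + core_sessions + full_mesh_sessions(core_rrs)


# 20 POPs with 10 routers and 2 reflectors each, 10 core routers, 2 core reflectors.
total_routers = 20 * 10 + 20 * 2 + 10 + 2           # 252 BGP speakers
with_rr = two_level_rr_sessions(20, 10, 2, 10, 2)   # 501 sessions
full_mesh = full_mesh_sessions(total_routers)       # 31626 sessions
```

Under these assumptions the same 252 routers need 501 sessions instead of 31,626, and adding a POP router touches only the reflectors in that POP.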

Use peer templates

Most router vendors allow you to configure a large number of options controlling BGP behaviour toward individual BGP neighbours or per-neighbour inbound/outbound filtering policies. Keeping these settings consistent in an environment with a large number of BGP neighbours is a management nightmare. You can easily avoid it if you use configuration scalability tools (commonly called peer groups and peer templates).
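The idea behind templates is ordinary inheritance of settings. A minimal sketch, with entirely hypothetical setting names and a made-up `resolve_peer` helper (real vendor syntax and semantics differ):

```python
def resolve_peer(peer: dict, templates: dict[str, dict]) -> dict:
    """Merge the template chain (most generic first), then per-neighbour overrides."""
    chain = []
    name = peer.get("template")
    while name is not None:
        chain.append(name)
        name = templates[name].get("inherit")
    settings: dict = {}
    for name in reversed(chain):
        settings.update({k: v for k, v in templates[name].items() if k != "inherit"})
    settings.update({k: v for k, v in peer.items() if k != "template"})
    return settings


templates = {
    "ibgp-base": {"remote_as": 65000, "update_source": "Loopback0"},
    "rr-client": {"inherit": "ibgp-base", "route_reflector_client": True},
}
peer = {"address": "10.0.0.7", "template": "rr-client", "description": "pop1-edge"}
effective = resolve_peer(peer, templates)
```

Changing `update_source` in `ibgp-base` would then change it for every neighbour that inherits from that template, which is what keeps a large configuration consistent.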


While BGP is undoubtedly a complex routing protocol, you can design reliable large-scale BGP networks based on well-known best practices and design guidelines including these:

  • If at all possible, get a public AS number and use it.
  • Run BGP throughout your network, at least on all of your core routers (unless you've deployed MPLS, in which case this is no longer a requirement).
  • Scale your network with BGP route reflectors and peer templates.
  • Always run BGP in combination with a fast IGP. Establish IBGP sessions between routers' loopback interfaces.
  • Do not redistribute/aggregate routes into the public Internet. Use static IP prefix origination.

About the author: Ivan Pepelnjak, CCIE No. 1354, is a 25-year veteran of the networking industry.
