
Distributed Backbone

Designing Large-Scale LANs, Chapter 3: Design Types > Distributed Backbone

The alternative to the Collapsed Backbone architecture is a Distributed Backbone. Later in this
chapter, I describe the concept of hierarchical network design. At that point, the implementation of
distributed backbone structures will become clearer. For now, I need to discuss some general
principles.

A Distributed Backbone just indicates more than one collapse point. It literally distributes the
backbone functions across a number of devices. In a network of any size, it would be extremely
unusual to have a true single collapsed backbone. A large network with a single collapsed
backbone would have a terrible single point of failure. It would also probably suffer from serious
congestion problems if all inter-segment traffic had to cross through one point. Even if that collapse
point had extremely high capacity, it would probably be difficult to get a high enough port density
for it to be useful in a large network.

All practical large-scale networks use some sort of distributed backbone. Moving the backbone
functions outside of a single chassis introduces two main problems: trunk capacity and fault
tolerance.

Trunk capacity

Suppose you want to distribute your backbone-switching functions among two or more large
switches. The central question is how much capacity you should provide to the trunk. By a trunk I
mean any high-speed connection that carries traffic for many end-device segments. In this book, I
often use the term trunk to refer specifically to a connection that carries several VLANs. I want to
consider the more general case here.

A naïve approach would be simply to add up the total burst capacity of all segments feeding this
trunk. If you had, for example, 5 Fast Ethernet (100Mbps half-duplex) LAN segments flowing into
one trunk, then you would need 500Mbps of trunk capacity. But this scenario presents a serious
problem. How do you practically and inexpensively get this much bandwidth? Do you really have to
go to Gigabit Ethernet or ATM just because you're trying to run a few trunks? Even load
sharing isn't much of an option because you would need as many Fast Ethernet trunks as you have
segments, so why use trunks at all in that case?

Needless to say, this approach is not very useful. You have two options for more efficient ways to
think about trunk sizing. You could either develop some generally useful rules of thumb, or you
could give up completely and just keep throwing bandwidth at it until the congestion goes away.
You could actually take a rigorous approach to this second idea by using simulation tools. In the
end, you will always have to monitor your trunks for congestion and increase their capacity when
you start to get into trouble. A few good rules would give a useful starting point. Trunks should
have more capacity than the average utilization. The only question is how big a peak the
network can deal with. Congestion on these trunk links is not a disaster in itself. Later in this book I talk
about prioritization schemes to ensure that the important data gets through no matter how heavy the
flow is. But there needs to be enough capacity for the normal peak periods, and this capacity needs
to be balanced against cost because the higher speed technologies are significantly more expensive
to implement.

The key to this discussion is that, statistically, the end segments are not all expected to peak at
once. Most of the time, there will be an average load associated with all of them. Every once in a
while, one or (at most) two experience a burst to full capacity. The basic rule for sizing trunks is to
make sure that they have enough capacity for two end (shared) segments to peak at the same time
plus 25% of capacity for all the remaining end segments. If the trunk has full-duplex transmission,
consider the directions separately.
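
As a quick illustration, here is a minimal sketch of this rule in Python (the function name and
parameters are mine, invented for illustration): it lets the two largest segments burst to full
capacity and adds 25% of the capacity of each remaining segment.

    # Sketch of the trunk-sizing rule of thumb described above.
    # All capacities are in Mbps; the names are illustrative, not from any tool.
    def shared_segment_trunk_capacity(segment_mbps, peaks=2, background=0.25):
        """Allow `peaks` segments to burst to full capacity at once,
        plus a `background` fraction of every remaining segment."""
        caps = sorted(segment_mbps, reverse=True)
        peak_load = sum(caps[:peaks])                     # segments bursting flat out
        background_load = background * sum(caps[peaks:])  # average load on the rest
        return peak_load + background_load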

For an example, look at Figure 3-8. A central trunk connects four user segments with a server
segment. First assume that this is a half-duplex trunk and that all end segments are 10Mbps
Ethernet segments. Then the rule says to allow for two times 10Mbps plus 25% of three times
10Mbps, which works out to be 27.5Mbps. It would be completely safe to use a Fast Ethernet
trunk in this case.
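
Plugging the example's numbers into the sketch above reproduces this figure:

    # Five 10Mbps segments (four user segments plus the server segment)
    # feeding one half-duplex trunk:
    shared_segment_trunk_capacity([10, 10, 10, 10, 10])   # 27.5 (Mbps)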

If the trunk technology is capable of full-duplex transmission, then you need to consider the two
directions separately. Suppose that all traffic is between the users and the servers, with little or no
user segment to user segment communication. This situation will help to establish the directions. For
the user-to-server direction, there are four 10Mbps Ethernet segments. If two of these segments
burst to capacity at the same time while the other two run at 25% of their capacity, the trunk load
will be 25Mbps in this direction. In the other direction, there is only one segment, so if it bursts to
capacity, then the trunk carries only 10Mbps in the return direction. As a side benefit, this exercise
shows that upgrading the server segment to full-duplex Fast Ethernet doesn't force an upgrade of a
full-duplex Fast Ethernet trunk.
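
Treating each direction of the full-duplex trunk separately, the same sketch reproduces both
numbers from this example:

    # Full-duplex trunk: size each direction on its own.
    shared_segment_trunk_capacity([10, 10, 10, 10])   # user to server: 25.0 Mbps
    shared_segment_trunk_capacity([10])               # server to user: 10.0 Mbps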

But this rule doesn't work very well for LANs that have every PC connected to a full-duplex Fast
Ethernet port of its own. The rule allows for two PCs bursting simultaneously and adds 25Mbps to
the trunk for every other PC on the network. Fifty PCs connected in this way would need a full-duplex
trunk with 1.4Gbps in each direction, which doesn't make much sense.

Individual workstations do not behave like nice statistical collections of workstations. The problem
is not in assuming that two will burst simultaneously, but rather in the 25% of capacity for the rest.
When workstations are connected to a switch like this, the typical utilization per port looks like
silence interspersed with short hard bursts. A completely different sort of rule is necessary to
express this sort of behavior.

A simple way to say it is that some small percentage of the workstations will operate at capacity,
while the rest do nothing. The actual percentage value unfortunately changes radically depending on
the organization and even on the department. A graphic design group that spends its time sending
large graphic image files might have a relatively high number. A group that only uses the network for
printing the occasional one-page document will have a much smaller number. A general rule
requires a reasonable mid-point number that is useful for Distribution trunks in a large network. A
fairly safe number for this purpose is 5%. This percentage may be a little on the high side for many
networks, so you can consider reducing it to 2.5%. Bear in mind that the smaller this number is, the
less room your network leaves for growth.

Consider another example to demonstrate this rule. Suppose the end-segments in the network
shown in Figure 3-8 have switched full-duplex Fast Ethernet to every desk. Suppose that 25
workstations are in each of the four groups. Then, for the user to server traffic, the trunk should
allow for 5% of these 4 × 25 = 100 workstations to burst to their full 100Mbps capacity
simultaneously. Thus, the trunk must handle 500Mbps in at least this direction. Gigabit Ethernet
or ATM can achieve these bandwidths, as can various vendor-proprietary Ethernet multiplexing
technologies.
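
The switched-port version of the rule is even simpler to sketch (again, the function is
hypothetical, and the 5% default is just the rule of thumb from the text):

    # Sketch of the switched-port rule: assume some small fraction of the
    # workstations run at full port speed while the rest sit idle.
    def switched_port_trunk_capacity(n_ports, port_mbps, burst_fraction=0.05):
        return n_ports * port_mbps * burst_fraction

    # 100 workstations on full-duplex 100Mbps ports, 5% bursting at once:
    switched_port_trunk_capacity(100, 100)   # 500.0 (Mbps)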

But wait, there's a twist in this example. So far, the discussion has assumed that all traffic is between
the users and the servers. So what good does it do if the network can burst to 500Mbps on the
trunk for traffic destined for the server segment, if the server segment can't deal with this much
traffic? If 5 or more servers are all connected similarly to full-duplex Fast Ethernet switch ports,
then this is possible. But the burst would have to be conveniently balanced among these servers. In
this case, because traffic patterns are known very precisely, it is possible to reduce the trunk
capacity to save money. The point is that this rule is just a starting point. You should always re-
evaluate according to your own network conditions. Also note that the rule doesn't apply at all on
the server side because you should always expect the servers to work the network very hard.

Trunk fault tolerance

A trunk, like any other part of the network, can fail. If it happens to carry all traffic from some part
of the network at the time, though, it could be disastrous. Since trunk failures are potentially serious,
it is always wise to include some sort of redundancy in every trunk. In fact, in most organizations I
have seen personally, trunk failure is more common than hardware failure on key network
equipment. This information is anecdotal, and I have no statistics on it, but it makes sense that
delicate strands of optical fiber stretching long distances might be more vulnerable than a tank-like
Ethernet switch chassis. If that switch is located in a locked room while the fiber has to run through
a conduit shared with other building tenants, there's an even stronger reason to worry about the
fiber. In some cases, it is physically damaged while technicians are doing other work. But even if
fiber is never touched and the conduit remains sealed forever, eventually it degrades due to a host
of environmental hazards, such as background radiation.

All of this information is intended to scare the reader into worrying about trunk failures. In most
network designs, the trunks are the first things I would want to provide redundancy for. There are
many ways to do so. The actual redundancy mechanism depends on trunk type. If the trunk is itself
a multiplexed collection of links (like Cisco's EtherChannel or Nortel's MultiLink Trunking), then
redundancy is inherent in the design. In this case, it would be wise to employ an N+1 redundancy
system. This means that the trunk capacity should be sized as discussed in the previous section, and
then increased by one extra link. This way, there is still sufficient capacity if any one link fails.
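
Assuming a trunk built from equal-speed links, N+1 sizing might be sketched like this (the
helper is hypothetical):

    import math

    # N+1 sizing for a multiplexed trunk: enough links to carry the
    # required capacity, plus one spare so any single link can fail.
    def n_plus_one_links(required_mbps, link_mbps):
        n = math.ceil(required_mbps / link_mbps)   # links needed for capacity
        return n + 1                               # one extra for fault tolerance

    n_plus_one_links(500, 100)   # 6 links of 100Mbps for a 500Mbps requirement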

However, if a single fiber pair carries the trunk, then the only useful way to add redundancy is by
running a second full-capacity trunk link. Since one of the main concerns is environmental or
physical damage to the fiber, putting this second link through a different conduit makes sense.

The only remaining question is whether to make the backup trunk link a hot standby or to have it
actively share the load with the primary link. And the answer, unfortunately, depends on what you
can get with the technology you're using. In general, if you can do it, load sharing is better for two
reasons:

- In case you inadvertently underestimate your trunk capacity requirements, or in case those
  requirements grow over time, load sharing gives you extra bandwidth all the time.

- If the primary can fail, so can the backup. The difference is that you notice when the primary
  fails, and you don't necessarily know when the backup fails. If traffic goes through it all the
  time, then you'll usually know pretty quickly that you've had a failure of your backup link.
