Spanning Tree Protocol


The Spanning Tree Protocol (STP) is a network protocol that builds a logical loop-free
topology for Ethernet networks. The basic function of STP is to prevent bridge loops and the
broadcast radiation that results from them. Spanning tree also allows a network design to include
spare (redundant) links to provide automatic backup paths if an active link fails. This is done
without the danger of bridge loops, or the need for manual enabling or disabling of these backup
links.

As the name suggests, STP creates a spanning tree within a network of connected layer-2
bridges, and disables those links that are not part of the spanning tree, leaving a single active
path between any two network nodes.

STP was originally standardized as IEEE 802.1D, but the functionality of spanning tree, rapid
spanning tree and multiple spanning tree, previously specified in 802.1D, 802.1w and 802.1s
respectively, has been incorporated into IEEE 802.1Q-2014.[3]

Broadcast Storms

Assume that PC0 performs an ARP request to find the MAC address of Server. ARP (Address
Resolution Protocol) uses a broadcast to locate the MAC address of a device.

In this circumstance PC0 will generate a single broadcast frame. Switch S1 will receive it from
PC0 and flood it out of all remaining ports except the incoming port.

Without any loop-removing mechanism, switches will flood broadcasts endlessly throughout the
network. This is known as a broadcast storm. The next figure illustrates how a broadcast frame is
continually flooded throughout the network.

Endless Cycle One

PC0 => S1 => S2 => S3 => S6 => (Server and) S5 => S4 => S1 => (PC0 and) S2 => S3 .......

Endless Cycle Two

PC0 => S1 => S4 => S5 => S6 => (Server and) S3 => S2 => S1 => (PC0 and) S4 => S5 .......
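
The two cycles above can be reproduced with a small simulation. This is only a sketch: it assumes the six switches form a single ring (S1-S2-S3-S6-S5-S4-S1), ignores host ports, and uses hypothetical names (`LINKS`, `flood`) that do not come from the original text.

```python
# Ring of six switches inferred from the two cycles listed above (an assumption).
LINKS = {
    "S1": ["S2", "S4"],
    "S2": ["S1", "S3"],
    "S3": ["S2", "S6"],
    "S4": ["S1", "S5"],
    "S5": ["S4", "S6"],
    "S6": ["S3", "S5"],
}

def flood(first_switch, max_hops):
    """Return the number of copies of one broadcast in flight after each hop."""
    # Each entry is (switch holding a copy, neighbor that copy arrived from).
    in_flight = [(first_switch, None)]
    copies_per_hop = []
    for _ in range(max_hops):
        next_round = []
        for switch, came_from in in_flight:
            for neighbor in LINKS[switch]:
                if neighbor != came_from:  # flood out of every port except the incoming one
                    next_round.append((neighbor, switch))
        in_flight = next_round
        copies_per_hop.append(len(in_flight))
    return copies_per_hop

print(flood("S1", 10))
```

However many hops are simulated, the count never drops to zero: one copy circulates clockwise and one counter-clockwise, which are the two endless cycles.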

Duplicate frame copies

In a looped network, a device can receive duplicate copies of the same frame from different
switches. Assume that PC0 sends a unicast frame to Server. A switch floods an unknown unicast
frame out of all of its ports except the incoming port. The figure above demonstrates how the
Server will receive duplicate copies of the unicast frame. Duplicate frame copies create
additional overhead on the network.

Unstable MAC Table

When a switch receives a frame, it checks the source MAC address in the frame and associates the
receiving interface with that MAC address. The next time the switch receives a frame for this MAC
address, it will forward that frame out of this interface. These entries are stored in the MAC
Address Table, which the switch uses to forward frames. A looped network can make the MAC
Address Table unstable. For example, assume that PC0 sends a unicast frame to Server. Switch
S6 receives this frame on two interfaces (the interface connected to S3 and the interface
connected to S5). When it receives the frame from S3, it associates PC0's MAC address with the
interface that is connected to S3. When it receives the same frame from S5, it concludes that
the location of PC0 has changed and updates the entry in the MAC address table.

The same thing happens when it receives the frame from switch S3 again. The MAC address table
cannot settle on PC0's location because switch S6 is receiving PC0's frame on more than one
link. The situation goes from bad to worse when the switch gets stuck constantly updating the
MAC Address Table with source locations and fails to forward frames. This is known as
thrashing the MAC Table.
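
The flapping entry can be illustrated with a minimal sketch of source-address learning. The port labels `to-S3` and `to-S5` and the function name `learn` are made up for the illustration.

```python
def learn(mac_table, src_mac, ingress_port):
    """Record src_mac on ingress_port; return True if an existing entry moved."""
    moved = src_mac in mac_table and mac_table[src_mac] != ingress_port
    mac_table[src_mac] = ingress_port
    return moved

# S6 keeps seeing PC0's frame alternately on its two looped links.
s6_table = {}
arrivals = ["to-S3", "to-S5", "to-S3", "to-S5"]
moves = [learn(s6_table, "PC0", port) for port in arrivals]
print(moves)  # every arrival after the first rewrites the entry
```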

STP is the answer to all of the issues explained above. STP was developed to solve each problem
that is triggered by layer-2 loops. Before we explain how STP works, you need to be familiar
with some basic STP terms and their function within STP.

Protocol operation:
A local area network (LAN) can be depicted as a graph whose nodes are bridges and LAN segments (or
cables), and whose edges are the interfaces connecting the bridges to the segments. To break loops in
the LAN while maintaining access to all LAN segments, the bridges collectively compute a spanning
tree.[a] The spanning tree that the bridges compute using the Spanning Tree Protocol can be determined
using the following rules. The example network at the right, below, will be used to illustrate the rules.

Select a root bridge. The root bridge of the spanning tree is the bridge with the smallest
(lowest) bridge ID. Each bridge has a configurable priority number and a MAC address; the
bridge ID is the concatenation of the bridge priority and the MAC address (e.g., the ID of a
bridge with priority 32768 and MAC 0200.0000.1111 is 32768.0200.0000.1111). The bridge
priority default is 32768 and can only be configured in multiples of 4096.[b] When comparing
two bridge IDs, the priority portions are compared first and the MAC addresses are compared
only if the priorities are equal. The switch with the lowest priority of all the switches will be the
root; if there is a tie, then the switch with the lowest priority and lowest MAC address will be the
root. For example, if switches A (MAC=0200.0000.1111) and B (MAC=0200.0000.2222) both
have a priority of 32768 then switch A will be selected as the root bridge.[c] If the network
administrators would like switch B to become the root bridge, they must set its priority to be less
than 32768.[d]
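
The election rule can be sketched in a few lines. The helper names (`bridge_id`, `elect_root`) are illustrative, and the upper bound of 61440 is an assumption derived from a 4-bit priority field in steps of 4096.

```python
def bridge_id(priority, mac):
    """Return a sortable (priority, mac_as_int) pair for a bridge."""
    if priority % 4096 != 0 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    return (priority, int(mac.replace(".", ""), 16))

def elect_root(bridges):
    """bridges maps a name to (priority, mac); the lowest bridge ID wins."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))

switches = {"A": (32768, "0200.0000.1111"), "B": (32768, "0200.0000.2222")}
print(elect_root(switches))   # priorities tie, so the lower MAC (A) wins

switches["B"] = (28672, "0200.0000.2222")
print(elect_root(switches))   # a lower priority makes B the root
```

Comparing tuples mirrors the rule in the text: the priority is compared first and the MAC address only breaks ties.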

Determine the least cost paths to the root bridge. The computed spanning tree has the
property that messages from any connected device to the root bridge traverse a least cost path,
i.e., a path from the device to the root that has minimum cost among all paths from the device to
the root. The cost of traversing a path is the sum of the costs of the segments on the path.
Different technologies have different default costs. An administrator can configure the cost of
traversing a particular network segment. The property that messages always traverse least-cost
paths to the root is guaranteed by the following two rules.

Least cost path from each bridge. After the root bridge has been chosen, each bridge determines
the cost of each possible path from itself to the root. The calculation is done by comparing the
'root path cost' of the BPDUs that each bridge gets on each of its ports. The root bridge sends
BPDUs with path cost equal to zero, and once a non-root bridge gets a BPDU it increments the
path cost by adding the cost of the incoming link and propagates it on the network. The port that
gets the BPDU with the smallest path cost (i.e., the port connecting the switch to the least-cost
path) then becomes the root port (RP) of the bridge.[4]

Least cost path from each network segment. The bridges on a network segment collectively
determine which bridge has the least-cost path from the network segment to the root. The port
connecting this bridge to the network segment is then the designated port (DP) for the segment.

Disable all other root paths. Any active port that is not a root port or a designated port is a
blocked port (BP).

Modifications in case of ties. The above rules over-simplify the situation slightly, because it is
possible that there are ties, for example, the root bridge may have two or more ports on the same
LAN segment, two or more ports on a single bridge are attached to least-cost paths to the root or
two or more bridges on the same network segment have equal least-cost paths to the root. To
break such ties:

Breaking ties for root ports. When multiple paths from a bridge are least-cost paths, the chosen
path uses the neighbor bridge with the lower bridge ID. The root port is thus the one connecting
to the bridge with the lowest bridge ID. For example, in figure 3, if switch 4 was connected to
network segment d instead of segment f, there would be two paths of length 2 to the root, one
path going through bridge 24 and the other through bridge 92. Because there are two least cost
paths, the lower bridge ID (24) would be used as the tie-breaker in choosing which path to use.

Breaking ties for designated ports. When the root bridge has more than one port on a single LAN
segment, the bridge ID is effectively tied, as are all root path costs (all equal zero). The
designated port then becomes the port on that LAN segment with the lowest port ID. It's put into
Forwarding mode while all other ports on the root bridge on that same LAN segment become
non-designated ports and are put into blocking mode.[5] Not all bridge/switch manufacturers
follow this rule, instead making all root bridge ports designated ports, and putting them all in
forwarding mode. A final tie-breaker is required as noted in the section "The final tie-breaker."

The final tie-breaker. In some cases, there may still be a tie, as when the root bridge has multiple
active ports on the same LAN segment (see above, "Breaking ties for designated ports") with
equally low root path costs and bridge IDs, or, in other cases, multiple bridges are connected by
multiple cables and multiple ports. In each case, a single bridge may have multiple candidates for
its root port. In these cases, candidates for the root port have already received BPDUs offering
equally-low (i.e. the "best") root path costs and equally-low (i.e. the "best") bridge IDs, and the
final tie breaker goes to the port that received the lowest (i.e. the "best") port priority ID, or port
ID.[6]

In summary, the sequence of events to determine the best received BPDU (which is the best path
to the root) is:

1. Lowest root bridge ID: determines the root bridge.
2. Lowest cost to the root bridge: favors the upstream switch with the least cost to root.
3. Lowest sender bridge ID: serves as a tie breaker if multiple upstream switches have
equal cost to root.
4. Lowest sender port ID: serves as a tie breaker if a switch has multiple (non-
EtherChannel) links to a single upstream switch.

Here:
o Bridge ID = priority (4 bits) + locally assigned system ID extension (12 bits) + ID
[MAC address] (48 bits); the default bridge priority is 32768, and
o Port ID = priority (4 bits) + ID (interface number) (12 bits); the default port
priority is 128
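
The four-step comparison maps naturally onto tuple ordering. The sketch below is illustrative only: the `Bpdu` type, its field names and the sample values are assumptions, and `port_id` assumes the port priority is given as 0 to 240 in steps of 16 (so the default 128 occupies the upper 4 bits).

```python
from typing import NamedTuple

class Bpdu(NamedTuple):
    root_id: tuple        # (priority, mac) of the advertised root
    root_path_cost: int
    sender_id: tuple      # (priority, mac) of the sending bridge
    sender_port_id: int   # port priority and port number combined

def port_id(priority, number):
    """Pack a 4-bit priority field and a 12-bit interface number into 16 bits."""
    return ((priority // 16) << 12) | number

def best_bpdu(bpdus):
    """Lowest wins; a NamedTuple compares its fields in declaration order."""
    return min(bpdus)

root = (32768, 0x020000001111)
via_24 = Bpdu(root, 4, (32768, 0x24), port_id(128, 1))
via_92 = Bpdu(root, 4, (32768, 0x92), port_id(128, 1))
print(best_bpdu([via_92, via_24]).sender_id)   # equal cost, so the lower sender ID wins
```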

Data rate and STP path cost

The access speeds of the links determine the path cost that STP/RSTP assumes. The STP path
cost default was originally calculated by the formula (1 Gbit/s) / bandwidth. When faster
speeds became available, the default values were adjusted, as otherwise speeds above 1 Gbit/s
would have been indistinguishable by STP. Its successor RSTP uses a similar formula with a
larger numerator: (20 Tbit/s) / bandwidth. These formulas lead to the sample values in the
table below:[7]:154

Data rate      STP cost (802.1D-1998)      RSTP cost (802.1W-2004, default value)[7]:154
4 Mbit/s       250                         5,000,000
10 Mbit/s      100                         2,000,000
16 Mbit/s      62                          1,250,000
100 Mbit/s     19                          200,000
1 Gbit/s       4                           20,000
2 Gbit/s       3                           10,000
10 Gbit/s      2                           2,000
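
As a quick check of the two formulas against the table, here is a small sketch. The function names are illustrative, and integer division is assumed as the rounding rule because it reproduces the 16 Mbit/s entry.

```python
def original_stp_cost(mbit_per_s):
    """Original formula: 1 Gbit/s divided by the link bandwidth."""
    return 1000 // mbit_per_s

def rstp_cost(mbit_per_s):
    """RSTP default: 20 Tbit/s (20,000,000 Mbit/s) divided by the link bandwidth."""
    return 20_000_000 // mbit_per_s

for rate in (4, 10, 16, 100, 1000, 2000, 10000):
    print(rate, original_stp_cost(rate), rstp_cost(rate))
```

The original formula matches the STP column only up to 16 Mbit/s. It yields 10 for 100 Mbit/s, 1 for 1 Gbit/s and 0 beyond that, which is why the STP defaults for faster links (19, 4, 3, 2) are adjusted values rather than formula results.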

Path Costs

Path cost is the accumulated value of port costs from the Root Bridge to another switch in the
network. It is always calculated from the Root Bridge, where the path cost is 0. BPDUs carry the
path cost information. When the Root Bridge advertises a BPDU out of its interface, it sets the
path cost to 0. When a connected switch receives this BPDU, it increments the path cost by
adding the port cost value of its incoming port. For example, if a switch receives this BPDU on
a Gigabit interface, the path cost is 0 (the value received from the Root Bridge) + 4 (the port
cost value, see the table above) = 4. This switch then sets the path cost value to 4 in the BPDU
and forwards it. Assume that the next switch is connected to this switch and receives the
updated BPDU on a Fast Ethernet port. The path cost for that switch is the path cost received
in the BPDU plus its port cost (4 + 19 = 23).
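
The same accumulation, written as a short sketch. `PORT_COST` holds only the two values used in the example, and the function name is illustrative.

```python
PORT_COST = {"gigabit": 4, "fastethernet": 19}  # STP costs from the table above

def root_path_costs(ingress_ports):
    """Return the root path cost seen by each switch along a chain from the root."""
    cost = 0  # the root bridge advertises a path cost of 0
    costs = []
    for port in ingress_ports:
        cost += PORT_COST[port]  # each switch adds the cost of its incoming port
        costs.append(cost)
    return costs

print(root_path_costs(["gigabit", "fastethernet"]))
```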

Bridge Protocol Data Units


Main article: Bridge Protocol Data Unit

The above rules describe one way of determining what spanning tree will be computed by the
algorithm, but the rules as written require knowledge of the entire network. The bridges have to
determine the root bridge and compute the port roles (root, designated, or blocked) with only the
information that they have. To ensure that each bridge has enough information, the bridges use
special data frames called Bridge Protocol Data Units (BPDUs) to exchange information about
bridge IDs and root path costs.

A bridge sends a BPDU frame using the unique MAC address of the port itself as a source
address, and a destination address of the STP multicast address 01:80:C2:00:00:00.

There are two types of BPDUs in the original STP specification[7]:63 (the Rapid Spanning Tree
(RSTP) extension uses a specific RSTP BPDU):

Configuration BPDU (CBPDU), used for Spanning Tree computation


Topology Change Notification (TCN) BPDU, used to announce changes in the network topology

BPDUs are exchanged regularly (every 2 seconds by default) and enable switches to keep track
of network changes and to start and stop forwarding at ports as required.
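
For readers who want to see what these frames carry, the sketch below packs the two BPDU types with Python's `struct` module. The field layout (35-byte configuration BPDU, 4-byte TCN BPDU, timers in units of 1/256 second) follows the 802.1D frame format rather than anything stated in this text, and the helper names and sample values are illustrative.

```python
import struct

STP_MULTICAST = "01:80:C2:00:00:00"  # destination address named in the text

def bridge_id_bytes(priority, mac):
    """8-byte bridge ID: 2 bytes of priority, then the 6-byte MAC address."""
    return struct.pack("!H", priority) + bytes.fromhex(mac.replace(".", ""))

def config_bpdu(root_id, root_path_cost, sender_id, port_id,
                message_age=0, max_age=20, hello=2, forward_delay=15, flags=0):
    """Pack a configuration BPDU; timer fields are stored in 1/256 s units."""
    return struct.pack(
        "!HBBB8sI8sHHHHH",
        0,       # protocol identifier
        0,       # protocol version (0 for the original STP)
        0x00,    # BPDU type: configuration
        flags,
        root_id, root_path_cost, sender_id, port_id,
        message_age * 256, max_age * 256, hello * 256, forward_delay * 256,
    )

def tcn_bpdu():
    """Pack a Topology Change Notification BPDU (type 0x80)."""
    return struct.pack("!HBB", 0, 0, 0x80)

root = bridge_id_bytes(32768, "0200.0000.1111")
sender = bridge_id_bytes(32768, "0200.0000.2222")
frame = config_bpdu(root, 4, sender, 0x8001)
print(len(frame), len(tcn_bpdu()))
```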

When a device is first attached to a switch port, it will not immediately start to forward data. It
will instead go through a number of states while it processes BPDUs and determines the
topology of the network. When a host such as a computer, printer or server is attached, the port
will always go into the forwarding state, albeit after a delay of about 30 seconds while it goes
through the listening and learning states (see below). The time spent in the listening and learning
states is determined by a value known as the forward delay (default 15 seconds and set by the
root bridge). However, if instead another switch is connected, the port may remain in blocking
mode if it is determined that it would cause a loop in the network. Topology Change Notification
(TCN) BPDUs are used to inform other switches of port changes. TCNs are injected into the
network by a non-root switch and propagated to the root. Upon receipt of the TCN, the root
switch will set a Topology Change flag in its normal BPDUs. This flag is propagated to all other
switches to instruct them to rapidly age out their forwarding table entries.

STP port states

A port on a switch running STP is in one of five states. During STP convergence,
switches move their root and designated ports through the various states (blocking, listening,
learning, and forwarding), whereas any other ports remain in the blocking state.

Blocking

In the blocking state, the switch only listens for and processes BPDUs on its ports. All frames
other than BPDUs are dropped. In this state, the switch tries to find out which port will be the
root port, which ports will be designated ports and which ports will remain in the blocking
state to remove loops. A port remains in this state for twenty seconds. By default, all ports are
in the blocking state when the switch is powered on. Only the root port and designated ports
move into the next state. All remaining ports stay in this state.

Listening

After twenty seconds, the root port and designated ports move into the listening state. In this
state, ports still listen for and process only BPDUs. All frames other than BPDUs are dropped.
In this state, the switch double-checks the layer-2 topology to make sure that no loops occur on
the network before processing data frames. Ports remain in this state for fifteen seconds.

Learning

The root port and designated ports enter the learning state from the listening state. In this
state, ports still listen for and process BPDUs, but they also start processing user frames: the
switch examines the source address in each frame and updates its MAC Address Table. The switch
does not forward user frames to destination ports in this state. Ports stay in this state for
fifteen seconds.

Forwarding

In forwarding state, ports will listen and process BPDUs. In this state ports will also process user
frames, update MAC Address Table and forward user traffic through the ports.

Disabled

Disabled ports are manually shut down or removed from STP by an administrator. All unplugged
ports also remain in the disabled state. Disabled ports do not participate in STP.

Convergence

Convergence is the state in which all ports on a switch have transitioned to either the
forwarding or the blocking mode. While STP is converging, all user data frames are dropped. No
user data frame is forwarded until convergence is complete. Convergence usually takes fifty
seconds (20 seconds in the blocking state + 15 seconds in the listening state + 15 seconds in
the learning state).
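
The port states and their default timers can be summarized in a small lookup table. This is a sketch of the behavior described above; the `STATES` structure and function name are illustrative.

```python
# state: (learns MAC addresses, forwards user frames, default seconds spent in state)
STATES = {
    "blocking":   (False, False, 20),    # BPDUs only
    "listening":  (False, False, 15),    # BPDUs only
    "learning":   (True,  False, 15),    # builds the MAC table, no forwarding yet
    "forwarding": (True,  True,  None),  # final state, no timer
}

def time_to_forwarding(states=STATES):
    """Sum the time spent in every state that precedes forwarding."""
    return sum(seconds for _, _, seconds in states.values() if seconds is not None)

print(time_to_forwarding())  # 20 + 15 + 15
```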

STP Convergence in General

As we know, STP follows a simple procedure to calculate the loop-free subset of
the network topology. In some sense, STP can be compared to RIP. Both execute a
version of the Bellman-Ford iterative algorithm, which can be described as a gradient method
(meaning it iteratively looks for the optimal solution, selecting the closest candidate every
time). Every switch accepts and retains only the best current root bridge information. The
switch then blocks alternate paths to the root bridge, leaving only the single optimal (in terms
of path cost) uplink, and continues relaying the optimal information. If a switch learns about a
better (superior) root bridge than the one it currently knows (e.g. a better bridge ID, or a
shorter path to the root), the old information is erased and the new information is immediately
accepted and relayed. Note that the switch stores the most recent BPDU with every port that
receives one. Therefore, for a given switch, there is a BPDU stored with every root or
alternate (blocked) port.

There are certain features in STP designed to improve the algorithm stability and ensure the
aging out of the old information. Every BPDU contains two fields: Max_Age and Message_Age.
The Message_Age field is incremented every time a BPDU traverses a switch (so it might be
compared to the hop count). When a switch stores the BPDU with the respective port, it will
count the time in seconds, starting from Message_Age and up to the Max_Age. If during this
interval, no further BPDUs are received, the current BPDU is wiped out and the port is declared
designated. This procedure ensures that the old information is eventually aged out of the
topology.

There is one more thing, similar to the hold-down feature found in RIP. It is the way in which
STP deals with inferior BPDUs. A BPDU is considered inferior if it carries information
about the root bridge that is worse than the information currently stored for the port, or if it
advertises a longer distance to the current root bridge (compare this to RIP's increase in
metric). Inferior BPDUs may appear when a neighboring switch suddenly loses its uplink and
claims itself the new root of the topology. By default, every switch should ignore inferior
BPDUs until the currently stored BPDU expires (time = Max_Age - Message_Age). This feature is
intended to stabilize the STP topology in situations where an uplink on some switch flaps,
causing the switch to start sending inferior information.
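
The aging and inferior-BPDU rules can be sketched together. The `PortInfo` class and the integer-tuple encoding of a BPDU (root ID, root path cost, sender ID) are simplifications made up for this illustration; lower tuples are better.

```python
class PortInfo:
    """Best BPDU stored on one port, together with its remaining lifetime."""

    def __init__(self, vector, max_age, message_age):
        self.vector = vector                     # (root_id, root_path_cost, sender_id)
        self.remaining = max_age - message_age   # seconds until the stored BPDU expires

    def tick(self, seconds):
        """Let time pass without receiving a BPDU."""
        self.remaining = max(0, self.remaining - seconds)

    def receive(self, vector, max_age, message_age):
        """Return True if the BPDU is accepted, False if it is ignored as inferior."""
        if vector > self.vector and self.remaining > 0:
            return False  # inferior, and the stored BPDU has not expired yet
        self.vector = vector
        self.remaining = max_age - message_age
        return True

port = PortInfo(vector=(1, 19, 2), max_age=20, message_age=1)
print(port.receive((2, 0, 2), 20, 0))  # neighbor claims to be root: ignored for now
port.tick(19)                          # stored BPDU ages out after Max_Age - Message_Age
print(port.receive((2, 0, 2), 20, 0))  # now the inferior information is considered
```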

STP convergence in case of Direct Link failure:


Figure 9-6 shows a network that has converged into a stable STP topology. The VLAN is forwarding on all
trunk links except port 1/2 on Catalyst C, where it is in the Blocking state.

This network has just suffered a link failure between Catalyst A and Catalyst C. The sequence of
events unfolds as follows:

1. Catalyst C detects a link down on its port 1/1; Catalyst A detects a link down on its port
1/2.
2. Catalyst C removes the previous "best" BPDU it had received from the Root over port
1/1. Port 1/1 is now down so that BPDU is no longer valid.

Normally, Catalyst C would try to send a TCN message out its Root Port, to reach the
Root Bridge. Here, the Root Port is broken, so that isn't possible. Without an advanced
feature such as STP UplinkFast, Catalyst C isn't yet aware that another path exists to the
Root.

Also, Catalyst A is aware of the link down condition on its own port 1/2. It normally
would try to send a TCN message out its Root Port, to reach the Root Bridge. Here,
Catalyst A is the Root, so that isn't really necessary.

3. The Root Bridge, Catalyst A, sends a Configuration BPDU with the TCN bit set out its
port 1/1. This is received and relayed by each switch along the way, informing each one
of the topology change.
4. Catalysts B and C receive the TCN message. The only reaction these switches take is to
shorten their bridging table aging times to the Forward Delay time. At this point, they
don't know how the topology has changed; they only know to force fairly recent bridging
table entries to age out.
5. Catalyst C basically just sits and waits to hear from the Root Bridge again. The Config
BPDU TCN message is received on port 1/2, which was previously in the Blocking state.
This BPDU becomes the "best" one received from the Root, so port 1/2 becomes the new
Root Port.

Catalyst C now can progress port 1/2 from Blocking through the Listening, Learning, and
Forwarding states.

As a result of a direct link failure, the topology has changed and STP has converged again.
Notice that only Catalyst C has undergone any real effects from the failure. Switches A and B
heard the news of the topology change but did not have to move any links through the STP
states. In other words, the whole network did not go through a massive STP reconvergence.

The total time that users on Catalyst C lost connectivity was roughly the time that port 1/2 spent
in the Listening and Learning states. With the default STP timers, this amounts to about two
times the Forward Delay period (15 seconds), or 30 seconds total.

STP convergence in case of Indirect Link failure:

Consider the topology on Fig 2.

In this case, SW2 has a better Bridge ID than SW3, and thus Port D is designated on the segment
between SW2 and SW3. SW3 blocks the redundant uplink via SW2 (Port B) and elects Port A
as the root port. Now imagine that SW2 detects loss of carrier on the link connected to SW1
(Port C). The switch will immediately invalidate the best BPDU stored for Port C and will
assume itself to be the root of the spanning tree, as there are no other ports receiving BPDUs.
SW2 will start advertising BPDUs to SW3, setting both the designated bridge and the root bridge
to itself in the configuration BPDUs. Those are, by definition, inferior BPDUs, and SW3 will
ignore them, as it still hears better information from SW1. SW3 will also keep the previous BPDU
associated with Port B for the duration of Max_Age - Message_Age. When this timer expires, SW3
will start considering the inferior BPDUs. Port B will move to the Listening state, and SW3 will
start relaying SW1's BPDUs to SW2, as those are superior to SW2's BPDUs. Now SW2 will detect the
better information on its formerly designated port (Port D) and will cycle the port through the
Listening and Learning states. Both switches (SW2 and SW3) will eventually move their ports
into the forwarding state, recovering connectivity. Therefore, it takes Max_Age -
Message_Age + 2 x Forward_Time to recover from an indirect link failure.

Take a look at the picture above. SW1 is the root bridge and the fa0/16 interface on SW3 has
been blocked. Suddenly the link between SW1 and SW2 fails. From SW3's perspective this is an
indirect link failure.

This is what will happen:

1. SW2 will detect this link failure immediately since it's a directly connected link. Since it
doesn't receive any BPDUs from the root anymore, it assumes it is now the new root
bridge and will send BPDUs towards SW3 claiming to be the new root.
2. SW3 will receive these BPDUs from SW2, but it will realize that this new BPDU is
inferior compared to the old one it has currently stored on its fa0/16 interface and will
ignore this new BPDU. When a switch receives an inferior BPDU, it means that the
neighbor switch has lost its connection to the root bridge.
3. After 20 seconds (default timer) the max age timer will expire for the old BPDU on the
fa0/16 interface of SW3. The interface will go from blocking to the listening state and
will send BPDUs towards SW2.
4. SW2 will receive this BPDU from SW3 and discover that it isn't the root bridge. It
won't send BPDUs towards SW3 anymore.
5. The fa0/16 interface on SW3 will continue from the listening state (15 seconds) to the
learning state (15 seconds) and end up in the forwarding state.

Connectivity is now restored, but it took 20 seconds for the max age timer to expire, 15 seconds
for the listening state and another 15 seconds for the learning state before the port reached
the forwarding state. That's a total of 50 seconds of downtime.
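
The two recovery times discussed so far reduce to simple arithmetic on the default timers. A minimal sketch, with illustrative function names:

```python
def direct_failure_downtime(forward_delay=15):
    """Direct link failure: the new root port goes through listening and learning."""
    return 2 * forward_delay

def indirect_failure_downtime(max_age=20, message_age=0, forward_delay=15):
    """Indirect link failure: wait for the stored BPDU to expire, then listen and learn."""
    return (max_age - message_age) + 2 * forward_delay

print(direct_failure_downtime())     # 30 seconds
print(indirect_failure_downtime())   # 50 seconds with Message_Age = 0
```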
The effect of topology changes

Switches forward Ethernet frames based on their MAC address tables (filtering tables) that bind
MAC addresses to egress ports. When a change in topology occurs (e.g. a link failure), the MAC
address tables may become invalid, as the paths between switches have changed. The
switches will eventually re-learn the new information, but it may take considerable time,
especially if traffic is scarce and the MAC address aging time is large (5 minutes by default).
For that reason, if a switch detects a change in the topology (e.g. a link going up or down), it
should notify all other switches that something has changed. In response to this notification,
all switches reduce their MAC address aging time to Forward_Time (15 seconds by default),
effectively speeding up the aging process.

As we know, topology changes are signaled via a special TCN BPDU, which is sent
upstream from the originating switch (the one that detected the change) to the root switch via
the root ports. Each upstream switch acknowledges the TCN BPDU by setting the Topology Change
Acknowledgment (TCA) flag in its next configuration BPDU. When the root switch hears the TCN
BPDU, it sets the Topology Change (TC) flag in all its outgoing configuration BPDUs for the
duration of Max_Age + Forward_Time. All switches that see this flag set their MAC address table
aging time to Forward_Time. Once the switch that originated the TCN BPDU hears the
acknowledgment, it stops signaling the topology change.
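
The effect on the MAC address table can be sketched as follows. The function names and the timestamp-based table are assumptions made for the illustration; the timer values are the defaults quoted in the text.

```python
def aging_time(topology_change_signaled, default_aging=300, forward_time=15):
    """MAC aging time in seconds: shortened while a topology change is signaled."""
    return forward_time if topology_change_signaled else default_aging

def signaling_duration(max_age=20, forward_time=15):
    """How long the root keeps signaling the change in its configuration BPDUs."""
    return max_age + forward_time

def age_out(entries, now, aging):
    """entries maps MAC -> last-seen time; keep only entries seen within `aging` seconds."""
    return {mac: seen for mac, seen in entries.items() if now - seen < aging}

table = {"host-A": 0, "host-B": 90}   # last-seen timestamps in seconds
print(age_out(table, now=100, aging=aging_time(False)))  # both entries survive
print(age_out(table, now=100, aging=aging_time(True)))   # the idle host-A entry is dropped
```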

Now what is the effect of a topology change event? Two major things are impacted:

1) Connectivity. In some cases, it may take an additional Forward_Delay seconds to expire the old
MAC address information and recover connectivity. This happens only if the old information
persists in some switches and the frames are black-holed.

2) Network performance. Shortening the MAC address table aging time results in less stable
forwarding state. When a switch loses a MAC address, it starts flooding frames for this
destination, effectively acting like a hub. If the flow of packets in your network is not
intense enough, the switches may start losing MAC address table information, resulting in
excessive traffic flooding.

The second issue can become quite dangerous with a high number of topology changes.
Excessive flooding might severely impact your network performance. Note that this issue also
pertains to L2 topologies that run RSTP, as topology changes are handled in a similar way.
In order to reduce the number of topology changes, configure all edge ports in the topology
(connected to hosts, IP phones, servers) as spanning-tree portfast. Portfast ports do not
generate TC events when they go up or down.

Uplink Failure With Uplink Fast Enabled (This is for direct link failure)

Here's the big difference. When UplinkFast is enabled, a non-designated port will go to the
forwarding state immediately if the root port fails. Instead of 30 seconds of downtime,
connectivity is restored immediately.

This section details the steps for UplinkFast recovery. Use the network diagram that was
introduced at the beginning of the document.

Immediate Switch Over to the Alternate Uplink

Complete these steps for an immediate switch over to the alternate uplink:

1. The uplink group of A consists of P1 and its non-self-looped blocked port, P2.
2. When the link between D1 and A fails, A detects a link down on port P1.

It knows immediately that its unique path to the root bridge is lost, and other paths are
through the uplink group, for example, port P2 , which is blocked.

3. A places port P2 in forwarding mode immediately, thus it violates the standard STP
procedures.

There is no loop in the network, as the only path to the root bridge is currently down.
Therefore, recovery is almost immediate.
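
The switchover decision can be sketched as a simple selection from the uplink group. This is a simplified model, not the vendor's implementation; the function name and the `link_up` map are assumptions.

```python
def uplinkfast_failover(root_port, uplink_group, link_up):
    """Return the port that should be forwarding toward the root, or None."""
    if link_up[root_port]:
        return root_port              # nothing failed, keep the current root port
    for port in uplink_group:
        if port != root_port and link_up[port]:
            return port               # placed straight into forwarding, no listening/learning
    return None                       # no alternate uplink is available

# Switch A from the steps above: root port P1 fails, blocked port P2 takes over.
print(uplinkfast_failover("P1", ["P1", "P2"], {"P1": False, "P2": True}))
```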
CAM Table Update

Once UplinkFast has achieved a fast-switchover between two uplinks, the Content-Addressable
Memory (CAM) table in the different switches of the network can be momentarily invalid and
slow down the actual convergence time.

In order to illustrate this, two hosts are added, named S and C, to this example:

The CAM tables of the different switches are represented in the diagram. You can see that, in
order to reach C, packets originated from S have to go through D2, D1, and then A.

As shown in this diagram, the backup link is brought up:


The backup link is brought up so quickly, however, that the CAM tables are no longer accurate.
If S sends a packet to C, it is forwarded to D1, where it is dropped. Communication between S
and C is interrupted as long as the CAM table is incorrect. Even with the topology change
mechanism, it can take up to 15 seconds before the problem is solved.

In order to solve this problem, switch A begins to flood dummy packets with the different MAC
addresses that it has in its CAM table as a source. In this case, a packet with C as a source
address is generated by A. Its destination is a Cisco proprietary multicast MAC address that
ensures that the packet is flooded on the whole network and updates the necessary CAM tables
on the other switches.

The rate at which the dummy multicasts are sent can be configured.

Spanning-Tree Backbone Fast:

Backbone Fast is used to recover from an indirect link failure.


Take a look at the picture above. SW1 is the root bridge and the fa0/16 interface on SW3 has been
blocked. Suddenly the link between SW1 and SW2 fails.

Without Backbone Fast enabled, the fa0/16 interface on SW3 goes through the listening and
learning states and ends up in the forwarding state only after the max age timer (20 seconds)
has expired for the old BPDU from SW2.

Without Backbone Fast, spanning tree will discard the inferior BPDUs that SW3 receives on its
fa0/16 interface, and it will have to wait until the max age timer expires (20 seconds).

If we enable Backbone Fast, the max age timer is skipped, which saves 20 seconds.

Consider the following topology.

If Backbone Fast is enabled in the network, Spanning Tree Protocol (STP) behaves as below.

When SW3 receives an inferior BPDU from SW2, it will send a Root Link Query (RLQ) PDU
on all non-designated ports (except the port where it received the inferior BPDU) to check
whether the root bridge is still available.

The port on which SW3 received the inferior BPDU from SW2 is excluded because that
path has already failed.

When an RLQ response is received on a port and the answer is negative, that port has
lost its connection to the root and its BPDU can be aged out. If all other non-designated ports
receive a negative answer, SW3 has lost its connection to the root bridge and
can start the STP calculation from the beginning.

If SW3 receives any positive response, it will assume the current root bridge is
still reachable. In our case, SW3 will receive a positive response from SW1 (the root
bridge) and start relaying SW1's BPDUs to SW2.

Backbone Fast is proactive (it uses RLQ). By enabling Backbone Fast, the max
age timer can be skipped and the delay is reduced from 50 seconds to 30 seconds.
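
The RLQ decision and its effect on downtime can be sketched as below. The return labels, function names and the port name `fa0/14` are hypothetical; only the decision rule and the timer values come from the text.

```python
def handle_rlq_replies(replies):
    """replies maps each queried non-designated port to True (root reachable) or False."""
    if any(replies.values()):
        return "root-still-reachable"   # expire the stale BPDU now instead of waiting Max Age
    return "recompute-spanning-tree"    # every reply was negative: start STP from the beginning

def indirect_downtime(backbone_fast, max_age=20, forward_delay=15):
    """Seconds of downtime after an indirect link failure."""
    return 2 * forward_delay + (0 if backbone_fast else max_age)

print(handle_rlq_replies({"fa0/14": True}))
print(indirect_downtime(backbone_fast=False), indirect_downtime(backbone_fast=True))
```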
Differences between STP and RSTP
The following table outlines the main differences between Rapid STP (802.1w) and the legacy
STP (802.1d):

STP (802.1d): In a stable topology only the root sends BPDUs, which are relayed by the others.
Rapid STP (802.1w): In a stable topology all bridges generate a BPDU every Hello interval
(2 sec); BPDUs are used as a keepalive mechanism.

Port states

STP (802.1d): Disabled, Blocking, Listening, Learning, Forwarding.
Rapid STP (802.1w): Discarding (replaces Disabled, Blocking and Listening), Learning,
Forwarding.

To avoid flapping, it takes 3 seconds for a port to migrate from one protocol to another (STP /
RSTP) in a mixed segment.

Port roles

STP (802.1d): Root (Forwarding), Designated (Forwarding), Non-Designated (Blocking).
Rapid STP (802.1w): Root (Forwarding), Designated (Forwarding), Alternate (Discarding),
Backup (Discarding).

STP (802.1d): Additional configuration is needed to make an end node port a PortFast port (in
case a BPDU is received).
Rapid STP (802.1w): An edge port (end node port) is an integrated concept. The link type
depends on the duplex setting: point-to-point for full duplex and shared for half duplex.

Topology changes and convergence

STP (802.1d): Uses timers for convergence (advertised by the root): Hello (2 sec), Max Age
(20 sec = 10 missed hellos), Forward Delay (15 sec).
Rapid STP (802.1w): Introduces a proposal and agreement process for synchronization (< 1 sec).
The Hello, Max Age and Forward Delay timers are used only for backward compatibility with
standard STP. Only an RSTP port that receives STP (802.1d) messages behaves as standard STP.

STP (802.1d): Slow transition (50 sec): Blocking (20s) => Listening (15s) => Learning (15s)
=> Forwarding.
Rapid STP (802.1w): Faster transition, on point-to-point and edge ports only. Fewer states (no
Listening state). It does not wait to be informed by others; instead it actively looks for
possible failure using RLQ (Request Link Query), a feedback mechanism.

STP (802.1d): Uses only 2 bits of the flag octet: Bit 7: Topology Change Acknowledgment;
Bit 0: Topology Change.
Rapid STP (802.1w): Uses the other 6 bits of the flag octet as well (BPDU type 2 / version 2):
Bit 1: Proposal; Bits 2, 3: Port role; Bit 4: Learning; Bit 5: Forwarding; Bit 6: Agreement;
Bits 0, 7: TC and TCA, kept for backward compatibility.

STP (802.1d): The bridge that discovers a change in the network informs the root, which in turn
informs all others by sending BPDUs with the TC bit set and instructs them to clear their DB
entries after a short timer (~Forward Delay) expires.
Rapid STP (802.1w): The TC is flooded through the network. Every bridge generates a TC
(Topology Change) and informs its neighbors when it becomes aware of a topology change, and
immediately deletes old DB entries.

If a non-root bridge doesnt receive Hello
for 10*Hello (advertised from the root), Wait for 3*Hello on a root port (advertised from
start claiming the root role by generating the root) before deciding to act.
its own Hello.
Wait until TC reach the root + short timer
Delete immediately local DB except MAC of the
(~Forward delay) expires, then flash all
port receiving the topology changes (proposal)
root DB entries
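
The flag octet layout listed in the comparison above can be decoded with a few bit masks. The
Python sketch below is illustrative only: the function and dictionary names are assumptions, and
the numeric port role codes (0 unknown, 1 alternate/backup, 2 root, 3 designated) are taken from
the 802.1w encoding rather than from this document.

```python
# Port role codes carried in bits 2-3 of the RSTP flag octet
# (assumed from the 802.1w encoding, not stated in the text above).
PORT_ROLES = {0: "unknown", 1: "alternate/backup", 2: "root", 3: "designated"}


def decode_rstp_flags(flags: int) -> dict:
    """Split the one-byte flag field of an RSTP BPDU into named fields."""
    return {
        "topology_change": bool(flags & 0x01),      # bit 0
        "proposal": bool(flags & 0x02),             # bit 1
        "port_role": PORT_ROLES[(flags >> 2) & 3],  # bits 2-3
        "learning": bool(flags & 0x10),             # bit 4
        "forwarding": bool(flags & 0x20),           # bit 5
        "agreement": bool(flags & 0x40),            # bit 6
        "topology_change_ack": bool(flags & 0x80),  # bit 7
    }


# 0x0E = 0b00001110: proposal set, port role bits = 3 (designated).
print(decode_rstp_flags(0x0E))
```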

Root Port Roles

The port that receives the best BPDU on a bridge is the root port. This is the port that is
the closest to the root bridge in terms of path cost. The STA elects a single root bridge in
the whole bridged network (per-VLAN). The root bridge sends BPDUs that are more
useful than the ones any other bridge sends. The root bridge is the only bridge in the
network that does not have a root port. All other bridges receive BPDUs on at least one
port.
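
To make "best BPDU" concrete, here is a minimal Python sketch of root port selection. It assumes
the usual comparison order (root bridge ID, then root path cost, then sender bridge ID, then
sender port ID, lower is better), which this document does not spell out. The class and function
names, the port names and the numeric values are all hypothetical, and the path costs are assumed
to already include the cost of the receiving port.

```python
from typing import Dict, NamedTuple


class Bpdu(NamedTuple):
    # Field order matters: tuples compare field by field, lower wins.
    root_id: int
    root_path_cost: int
    sender_id: int
    sender_port: int


def pick_root_port(received: Dict[str, Bpdu]) -> str:
    """Return the port that received the best (lowest) BPDU."""
    return min(received, key=lambda port: received[port])


received = {
    "fa0/14": Bpdu(root_id=4096, root_path_cost=19, sender_id=4096, sender_port=1),
    "fa0/16": Bpdu(root_id=4096, root_path_cost=38, sender_id=8192, sender_port=1),
}
print(pick_root_port(received))  # fa0/14
```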

Designated Port Role

A port is designated if it can send the best BPDU on the segment to which it is connected.
802.1D bridges link together different segments, such as Ethernet segments, to create a
bridged domain. On a given segment, there can only be one path toward the root bridge.
If there are two, there is a bridging loop in the network. All bridges connected to a given
segment listen to the BPDUs of each and agree on the bridge that sends the best BPDU as
the designated bridge for the segment. The port on that bridge that corresponds is the
designated port for that segment.
Alternate and Backup Port Roles

These two port roles correspond to the blocking state of 802.1D. A blocked port is
defined as not being the designated or root port. A blocked port receives a more useful
BPDU than the one it sends out on its segment. Remember that a port absolutely needs to
receive BPDUs in order to stay blocked. RSTP introduces these two roles for this
purpose.
An alternate port receives more useful BPDUs from another bridge and is a blocked port.
This is shown in this diagram:

A backup port receives more useful BPDUs from the same bridge it is on and is a blocked port.
This is shown in this diagram:

This distinction is already made internally within 802.1D. This is essentially how Cisco
UplinkFast functions. The rationale is that an alternate port provides an alternate path to the root
bridge and therefore can replace the root port if it fails.
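
The difference between the two blocked roles comes down to who sent the more useful BPDU. A
minimal Python sketch, with a hypothetical function name and made-up bridge IDs:

```python
def blocked_port_role(own_bridge_id: int, best_bpdu_sender_id: int) -> str:
    """Classify a blocked port from the sender of the better BPDU it hears."""
    if best_bpdu_sender_id == own_bridge_id:
        # The better BPDU came from another port of this same bridge.
        return "backup"
    # The better BPDU came from a different bridge.
    return "alternate"


print(blocked_port_role(own_bridge_id=8192, best_bpdu_sender_id=4096))  # alternate
print(blocked_port_role(own_bridge_id=8192, best_bpdu_sender_id=8192))  # backup
```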
Let's walk through the other things that have changed:

BPDUs are now sent every hello time. In classic spanning tree only the root bridge generated
BPDUs, and the non-root switches relayed them when they received them on their root port. Rapid
spanning tree works differently: all switches generate BPDUs every two seconds (hello time).
This is the default hello time but you can change it.

The classic spanning-tree uses a max age timer (20 seconds) for BPDUs before they are
discarded. Rapid spanning-tree works differently! BPDUs are now used as a keepalive
mechanism similar to what routing protocols like OSPF or EIGRP use. If a switch misses three
BPDUs from a neighbor switch it will assume connectivity to this switch has been lost and it will
remove all MAC addresses immediately.

Rapid spanning tree will accept inferior BPDUs. The classic spanning tree ignores them. Does
this ring a bell? This is pretty much the backbone fast feature of classic spanning-tree.

Transition speed (convergence time) is the most important feature of rapid spanning tree. The
classic spanning tree had to walk through the listening and learning states before it would move
an interface to the forwarding state; this took 30 seconds with the default timers. The classic
spanning tree was based on timers.

Rapid spanning tree doesn't use timers to decide whether an interface can move to the forwarding
state or not. It uses a negotiation mechanism for this. I'll show you how this works in a bit.

Do you remember portfast? If we enable portfast while running the classic spanning tree it will
skip the listening and learning state and put the interface in forwarding state right away. Besides
moving the interface to the forwarding state it will also not generate topology changes when
the interface goes up or down. We still use portfast for rapid spanning tree but it's now referred
to as an edge port.

Rapid spanning tree can only put interfaces in the forwarding state really fast on edge ports
(portfast) or point-to-point interfaces. It will take a look at the link type and there are only two
link types:

Point-to-point (full duplex)


Shared (half duplex)

Normally we are using switches and all our interfaces are configured as full duplex, so rapid
spanning tree sees these interfaces as point-to-point. If we introduce a hub to our network we'll
have half duplex, which rapid spanning tree sees as a shared interface.
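
The link type rule is simple enough to write down as code. The Python sketch below is
illustrative; the function names and the string values used for duplex are assumptions.

```python
def link_type(duplex: str) -> str:
    """Map a port's duplex setting to the link type RSTP derives from it."""
    if duplex == "full":
        return "point-to-point"
    if duplex == "half":
        return "shared"
    raise ValueError(f"unknown duplex setting: {duplex!r}")


def rapid_transition_possible(duplex: str, edge_port: bool) -> bool:
    """Fast transition is only available on edge ports or point-to-point links."""
    return edge_port or link_type(duplex) == "point-to-point"


print(rapid_transition_possible("full", edge_port=False))  # True
print(rapid_transition_possible("half", edge_port=False))  # False
```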

Let's take a closer look at the negotiation mechanism that I described earlier:
Let me describe the rapid spanning tree synchronization mechanism by using the picture above.
SW1 on top is the root bridge. SW2, SW3 and SW4 are non-root bridges.

As soon as the link between SW1 and SW2 comes up their interfaces will be in blocking mode.
SW2 will receive a BPDU from SW1 and now a negotiation will take place called sync:
After SW2 receives the BPDU from the root bridge it immediately blocks all its non-edge
designated ports. Non-edge ports are the interfaces that connect to other switches, while edge
ports are the interfaces that have portfast configured. As soon as SW2 blocks its non-edge ports
the link between SW1 and SW2 will go into forwarding state. SW2 will now do the following:

SW2 will also perform a sync operation with both SW3 and SW4 so they can quickly move to
the forwarding state.

Are you following me so far? The lesson to learn here is that rapid spanning tree uses this sync
mechanism instead of the timer-based mechanism that the classic spanning tree uses
(listening > learning > forwarding). I'm going to show you what this looks like on real switches
in a bit. Let's take a closer look at the sync mechanism and at what exactly happens
between SW1 and SW2:
At first the interfaces will be blocked until they receive a BPDU from each other. At this moment
SW2 will figure out that SW1 is the root bridge because it has the best BPDU information. The
sync mechanism will start because SW1 will set the proposal bit in the flag field of the BPDU.
When SW2 receives the proposal it has to do something with it:

SW2 will block all its non-edge interfaces and will start the synchronization towards SW3 and
SW4, once this is done SW2 will let SW1 know about this:

Once SW2 has its interfaces in sync mode it will let SW1 know about this by sending an
agreement. This agreement is a copy of the proposal BPDU where the proposal bit has been
switched off and the agreement bit is switched on. The fa0/14 interface on SW2 will now go into
forwarding mode. When SW1 receives the agreement, here's what happens:
Once SW1 receives the agreement from SW2 it will put its fa0/14 interface in forwarding mode
immediately.
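
Here is a small Python sketch of what SW2 does with the proposal, following the description
above: block the non-edge designated ports, move the root port to forwarding, and answer with a
copy of the proposal flags where the proposal bit is cleared and the agreement bit is set. The
function name, the dictionary layout and every port name except fa0/14 are assumptions made for
the example.

```python
PROPOSAL = 0x02   # bit 1 of the flag octet
AGREEMENT = 0x40  # bit 6 of the flag octet


def handle_proposal(ports: dict, root_port: str, proposal_flags: int) -> int:
    """Sync a switch after a proposal arrives on its root port.

    `ports` maps a port name to {"role": str, "edge": bool, "state": str}.
    Returns the flag octet of the agreement that is sent back.
    """
    for name, port in ports.items():
        if name != root_port and port["role"] == "designated" and not port["edge"]:
            port["state"] = "discarding"  # sync: block non-edge designated ports
    ports[root_port]["state"] = "forwarding"
    # Agreement = copy of the proposal, proposal bit off, agreement bit on.
    return (proposal_flags & ~PROPOSAL) | AGREEMENT


sw2_ports = {
    "fa0/14": {"role": "root", "edge": False, "state": "discarding"},
    "fa0/16": {"role": "designated", "edge": False, "state": "forwarding"},  # hypothetical
    "fa0/1": {"role": "designated", "edge": True, "state": "forwarding"},    # hypothetical
}
agreement = handle_proposal(sw2_ports, "fa0/14", proposal_flags=0x0E)
print(hex(agreement))  # 0x4c
```

Note that the edge port fa0/1 is left alone during the sync, which is why hosts behind edge
ports are not disturbed by the negotiation.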

Topology Change Propagation

When a bridge receives a BPDU with the TC bit set from a neighbor, these occur:

It clears the MAC addresses learned on all its ports, except the one that receives the
topology change.
It starts the TC While timer and sends BPDUs with TC set on all its designated ports and
root port (RSTP no longer uses the specific TCN BPDU, unless a legacy bridge needs to
be notified).
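
The two actions above can be sketched in Python as follows. The function name, the data
structures and the port names are hypothetical, and the sketch assumes the TC is not sent back
out of the port it arrived on. The TC While timer is left out.

```python
from typing import Dict, List, Set


def handle_tc(mac_table: Dict[str, Set[str]], port_roles: Dict[str, str],
              rx_port: str) -> List[str]:
    """React to a BPDU with the TC bit set received on `rx_port`.

    Flushes learned MAC addresses on every port except `rx_port` and
    returns the ports on which the TC should be propagated.
    """
    for port, macs in mac_table.items():
        if port != rx_port:
            macs.clear()
    # Assumption: propagate on designated and root ports other than rx_port.
    return sorted(port for port, role in port_roles.items()
                  if role in ("designated", "root") and port != rx_port)


mac_table = {"p1": {"aa"}, "p2": {"bb"}, "p3": {"cc"}}
port_roles = {"p1": "root", "p2": "designated", "p3": "alternate"}
print(handle_tc(mac_table, port_roles, rx_port="p1"))  # ['p2']
```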

This way, the TCN floods very quickly across the whole network. The TC propagation is now a
one step process. In fact, the initiator of the topology change floods this information throughout
the network, as opposed to 802.1D where only the root did. This mechanism is much faster than
the 802.1D equivalent. There is no need to wait for the root bridge to be notified and then
maintain the topology change state for the whole network for <max age plus forward delay>
seconds.

In just a few seconds, or a small multiple of hello-times, most of the entries in the CAM tables of
the entire network (VLAN) flush. This approach results in potentially more temporary flooding,
but on the other hand it clears potential stale information that prevents rapid connectivity
restitution.
Convergence with 802.1w(RSTP)

Now, you can see how RSTP deals with a similar situation. Remember that the final topology is
exactly the same as the one calculated by 802.1D (that is, one blocked port at the same place as
before). Only the steps used to reach this topology have changed.

Both ports on the link between A and the root are put in designated blocking as soon as they
come up. Thus far, everything behaves as in a pure 802.1D environment. However, at this stage,
a negotiation takes place between Switch A and the root. As soon as A receives the BPDU of the
root, it blocks the non-edge designated ports. This operation is called sync. Once this is done,
Bridge A explicitly authorizes the root bridge to put its port in the forwarding state. This diagram
illustrates the result of this process on the network. The link between Switch A and the root
bridge is blocked, and both bridges exchange BPDUs.

Once Switch A blocks its non-edge designated ports, the link between Switch A and the root is
put in the forwarding state and you reach the situation:

There still cannot be a loop. Instead of blocking above Switch A, the network now blocks below
Switch A. However, the potential bridging loop is cut at a different location. This cut travels
down the tree along with the new BPDUs originated by the root through Switch A. At this stage,
the newly blocked ports on Switch A also negotiate a quick transition to the forwarding state
with their neighbor ports on Switch B and Switch C that both initiate a sync operation. Other
than the root port towards A, Switch B only has edge designated ports. Therefore, it has no port
to block in order to authorize Switch A to go to the forwarding state. Similarly, Switch C only
has to block its designated port to D. The state shown in this diagram is now reached:

Remember that the final topology is exactly the same as the 802.1D example, which means that
port P1 on D ends up blocking. This means that the final network topology is reached, just in the
time necessary for the new BPDUs to travel down the tree. No timer is involved in this quick
convergence. The only new mechanism introduced by RSTP is the acknowledgment that a
switch can send on its new root port in order to authorize immediate transition to the forwarding
state, and bypasses the twice-the-forward-delay long listening and learning stages. The
administrator only needs to remember these to benefit from fast convergence:

This negotiation between bridges is only possible when bridges are connected by point-
to-point links (that is, full-duplex links unless explicit port configuration).
Edge ports play an even more important role now that PortFast is enabled on ports in
802.1D. For instance, if the network administrator fails to properly configure the edge
ports on B, their connectivity is impacted by the link between A and the root that comes
up.

RSTP in case of direct link failure:

Another form of immediate transition to the forwarding state included in RSTP is similar to the
Cisco UplinkFast proprietary spanning tree extension. Basically, when a bridge loses its root
port, it is able to put its best alternate port directly into the forwarding mode (the appearance of a
new root port is also handled by RSTP). The selection of an alternate port as the new root port
generates a topology change. The 802.1w topology change mechanism clears the appropriate
entries in the Content Addressable Memory (CAM) tables of the upstream bridge. This removes
the need for the dummy multicast generation process of UplinkFast.

UplinkFast does not need to be configured further because the mechanism is included natively
and enabled in RSTP automatically.
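
A rough Python sketch of this failover: when the root port is lost, the bridge picks the
alternate port that holds the best stored BPDU. The function name, port names and values are
hypothetical, and the priority tuple (root ID, root path cost, sender bridge ID, sender port ID,
lower is better) is an assumption about the comparison order.

```python
from typing import Dict, Optional, Tuple

# (root_id, root_path_cost, sender_id, sender_port); lower tuples are better.
Priority = Tuple[int, int, int, int]


def new_root_port(alternates: Dict[str, Priority]) -> Optional[str]:
    """Pick the best alternate port after the root port fails."""
    if not alternates:
        return None  # no alternate path to the root
    return min(alternates, key=lambda port: alternates[port])


alternates = {
    "fa0/16": (4096, 38, 8192, 1),
    "fa0/17": (4096, 57, 12288, 1),
}
print(new_root_port(alternates))  # fa0/16
```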
Faster Aging of Information

On a given port, if hellos are not received three consecutive times, protocol information can be
immediately aged out (or if max_age expires). Because of the previously mentioned protocol
modification, BPDUs are now used as a keep-alive mechanism between bridges. A bridge
considers that it loses connectivity to its direct neighbor root or designated bridge if it misses
three BPDUs in a row. This fast aging of the information allows quick failure detection. If a
bridge fails to receive BPDUs from a neighbor, it is certain that the connection to that neighbor is
lost. This is opposed to 802.1D where the problem might have been anywhere on the path to the
root.
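
With the default timers quoted in this document, the difference in failure detection time is
easy to compute. The names in this Python sketch are assumptions.

```python
HELLO_TIME = 2      # seconds, default hello time
MAX_AGE = 20        # seconds, default 802.1D max age
MISSED_HELLOS = 3   # RSTP ages out information after three missed BPDUs


def rstp_detection_time(hello_time: int = HELLO_TIME) -> int:
    """Seconds until an RSTP bridge declares its neighbor lost."""
    return MISSED_HELLOS * hello_time


print(rstp_detection_time())  # 6
print(MAX_AGE)                # 20
```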

Accepts Inferior BPDUs

This concept is what makes up the core of the BackboneFast engine. The IEEE 802.1w
committee decided to incorporate a similar mechanism into RSTP. When a bridge receives
inferior information from its designated or root bridge, it immediately accepts it and replaces the
one previously stored.

Because Bridge C still knows the root is alive and well, it immediately sends a BPDU to Bridge
B that contains information about the root bridge. As a result, Bridge B does not send its own
BPDUs and accepts the port that leads to Bridge C as the new root port.

RSTP's BackboneFast Equivalent:

Normally, an RSTP bridge ignores proposal messages received on blocked ports. However, in
one special situation this rule is not observed. When a blocked port receives an inferior BPDU
(a BPDU with different root bridge information), the local bridge does one of the following:

1. If the information received overrides the currently known root, a new
synchronization process begins.
2. If the local bridge knows better root bridge information, it immediately sends
back a proposal with this information encoded. This allows the inferior bridge
to quickly adapt to a new path to the root bridge.
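
The two cases can be sketched in Python as a single comparison. This is a simplification: it
compares root bridge IDs only (lower is better), and the function name and return strings are
assumptions.

```python
def on_bpdu_at_blocked_port(known_root_id: int, received_root_id: int) -> str:
    """Decide how to react to root information received on a blocked port."""
    if received_root_id < known_root_id:
        # Case 1: the received root is better than the one we know.
        return "start new sync"
    if received_root_id > known_root_id:
        # Case 2: we know a better root, so tell the neighbor right away.
        return "send proposal with known root"
    return "no change"


print(on_bpdu_at_blocked_port(known_root_id=4096, received_root_id=32768))
```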

Notice how different this process is from the original BackboneFast. The legacy process used
explicit RLQ messages to validate the currently known root bridge. The RSTP process relies on
previously cached information to respond immediately. This is possible by virtue of the RSTP
sync process, which is assumed to always maintain valid root bridge information in the
topology.

Multiple Spanning Tree:


Where to Use MST

This diagram shows a common design that features access Switch A with 1000 VLANs
redundantly connected to two distribution Switches, D1 and D2. In this setup, users connect to
Switch A, and the network administrator typically seeks to achieve load balancing on the access
switch Uplinks based on even or odd VLANs, or any other scheme deemed appropriate.

These sections are example cases where different types of STP are used on this setup:

PVST+ Case

In a Cisco Per-VLAN Spanning Tree (PVST+) environment, the spanning tree parameters are
tuned so that half of the VLANs forward on each Uplink trunk. In order to easily achieve this,
elect Bridge D1 to be the root for VLANs 501 through 1000, and Bridge D2 to be the root for
VLANs 1 through 500. These statements are true for this configuration:

In this case, optimum load balancing results.


One spanning tree instance for each VLAN is maintained, which means 1000 instances
for only two different final logical topologies. This considerably wastes CPU cycles for
all of the switches in the network (in addition to the bandwidth used for each instance to
send its own Bridge Protocol Data Units (BPDUs)).

Standard 802.1q Case

The original IEEE 802.1q standard defines much more than simply trunking. This standard
defines a Common Spanning Tree (CST) that only assumes one spanning tree instance for the
entire bridged network, regardless of the number of VLANs. If the CST is applied to the
topology of this diagram, the result resembles the diagram shown here:
In a network running the CST, these statements are true:

No load balancing is possible; one Uplink needs to block for all VLANs.
The CPU is spared; only one instance needs to be computed.

Note: The Cisco implementation enhances the 802.1q in order to support one PVST. This feature
behaves exactly as the PVST in this example. The Cisco per-VLAN BPDUs are tunneled by pure
802.1q bridges.

MST Case

MSTs (IEEE 802.1s) combine the best aspects from both the PVST+ and the 802.1q. The idea is
that several VLANs can be mapped to a reduced number of spanning tree instances because most
networks do not need more than a few logical topologies. In the topology described in the first
diagram, there are only two different final logical topologies, so only two spanning tree instances
are really necessary. There is no need to run 1000 instances. If you map half of the 1000 VLANs
to a different spanning tree instance, as shown in this diagram, these statements are true:

The desired load balancing scheme can still be achieved, because half of the VLANs
follow one separate instance.
The CPU is spared because only two instances are computed.
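
The saving is easy to see by counting instances. In this Python sketch the instance numbers 1
and 2 and the function name are hypothetical; the VLAN split follows the even halves used in the
PVST+ case.

```python
VLANS = range(1, 1001)  # the 1000 VLANs on access Switch A


def mst_instance(vlan: int) -> int:
    """Map a VLAN to one of two MST instances (hypothetical numbering)."""
    return 1 if vlan <= 500 else 2


pvst_instance_count = len(VLANS)  # PVST+: one instance per VLAN
mst_instance_count = len({mst_instance(v) for v in VLANS})

print(pvst_instance_count, mst_instance_count)  # 1000 2
```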

From a technical standpoint, MST is the best solution. From an end-user's perspective, the main
drawbacks associated with a migration to MST are:

The protocol is more complex than the usual spanning tree and requires additional
training of the staff.
Interaction with legacy bridges can be a challenge. For more information, refer to the
Interaction Between MST Regions and the Outside World section of this document.
MST Region

As previously mentioned, the main enhancement introduced by MST is that several VLANs can
be mapped to a single spanning tree instance. This raises the problem of how to determine which
VLAN is to be associated with which instance. More precisely, the question is how to tag BPDUs
so that the receiving devices can identify the instances and the VLANs to which each BPDU applies.

The issue is irrelevant in the case of the 802.1q standard, where all instances are mapped to a
unique instance. In the PVST+ implementation, the association is as follows:

Different VLANs carry the BPDUs for their respective instance (one BPDU per VLAN).

The Cisco MISTP sent a BPDU for each instance, including a list of VLANs that the BPDU was
responsible for, in order to solve this problem. If by error, two switches were misconfigured and
had a different range of VLANs associated to the same instance, it was difficult for the protocol
to recover properly from this situation.

The IEEE 802.1s committee adopted a much easier and simpler approach that introduced MST
regions. Think of a region as the equivalent of Border Gateway Protocol (BGP) Autonomous
Systems, which is a group of switches placed under a common administration.

MST Configuration and MST Region

Each switch running MST in the network has a single MST configuration that consists of these
three attributes:

1. An alphanumeric configuration name (32 bytes)
2. A configuration revision number (two bytes)
3. A 4096-element table that associates each of the potential 4096 VLANs supported on the
chassis to a given instance. (MST instance to VLAN mapping table.)

In order to be part of a common MST region, a group of switches must share the same
configuration attributes. It is up to the network administrator to properly propagate the
configuration throughout the region. Currently, this step is only possible by the means of the
command line interface (CLI) or through Simple Network Management Protocol (SNMP). Other
methods can be envisioned, as the IEEE specification does not explicitly mention how to
accomplish that step.

Note: If for any reason two switches differ on one or more configuration attribute, the switches
are part of different regions. For more information refer to the Region Boundary section of this
document.

When switches have the same attributes configured they will be in the same region. If the
attributes are not the same the switch is seen as being at the boundary of the region. It can be
connected to another MST region but also talk to a switch running another version of spanning
tree.
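
The region rule can be sketched in Python as a comparison of the three attributes. The class and
function names and the region name are hypothetical, and the VLAN ranges are only an example.
Real switches exchange a digest of the mapping table in their BPDUs rather than the table itself;
that detail is left out here.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class MstConfig(NamedTuple):
    name: str                           # configuration name
    revision: int                       # configuration revision number
    vlan_to_instance: Tuple[int, ...]   # 4096-element mapping table


def build_table(mapping: Dict[int, Iterable[int]]) -> Tuple[int, ...]:
    """Build the 4096-element table; unmapped VLANs stay in instance 0."""
    table = [0] * 4096
    for instance, vlans in mapping.items():
        for vlan in vlans:
            table[vlan] = instance
    return tuple(table)


def same_region(a: MstConfig, b: MstConfig) -> bool:
    """Two switches share a region only if all three attributes match."""
    return a == b


table = build_table({1: range(100, 201), 2: range(201, 301)})
sw1 = MstConfig("REGION-A", 1, table)
sw2 = MstConfig("REGION-A", 1, table)
sw3 = MstConfig("REGION-A", 2, table)  # different revision number

print(same_region(sw1, sw2))  # True
print(same_region(sw1, sw3))  # False
```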
The MST configuration name is just something you can make up; it's used to identify the MST
region. The MST configuration revision number is also something you can make up, and the
idea behind this number is that you can change it whenever you change your
configuration. It doesn't matter what you pick as long as it's the same on all switches within the
MST region. VLANs will be mapped to an instance by using the MST instance to VLAN
mapping table. This is something we have to do ourselves.

Within the MST region we will have one instance of spanning tree that will create a loop free
topology within the region. When you configure MST there is always one default instance used
to calculate the topology within the region. We call this the IST (Internal Spanning Tree). By
default Cisco will use instance 0 to run the IST. In case you were wondering, it's rapid spanning
tree that we run within the MST.

I could create instance 1 for VLANs 100 - 200 and instance 2 for VLANs 201 - 300. Depending
on which switch becomes the root bridge for each instance, a different port will be blocked. It
could look like this:
The switch outside the MST region doesn't see what the MST region looks like. For this switch
it's like it's talking to one big switch or a black box:
