IP Multicast Tutorial PDF


Multicast Standards

Marshall Eubanks
[email protected]
Bali
February, 2007

APRICOT Meeting 2007

Information
http://www.multicasttech.com/faq/
http://www.sprint.net/multicast/faq.html
ftp://ftpeng.cisco.com/ipmulticast.html
Multicast hardening: http://www.juniper.net/solutions/literature/app_note/350051.pdf
Tutorial-style paper at: http://multicast.internet2.edu/almeroth.pdf

Status of inter-domain deployment: http://www.multicasttech.com/status

Books: See http://www.multicasttech.com/faq/index.html#Books


Interdomain Multicast Routing,
Edwards, Giuliano, Wright

(Addison-Wesley)

Interdomain Multicast Solutions Guide,


Beau Williamson (Cisco Press)


Acknowledgements
Greg Shepherd
Beau Williamson
Bill Nickless
Caren Litvanyi
Patrick Dorn
Leonard Giuliano
Alan Crosswell
University of Oregon
Cisco Systems
Juniper Networks
Columbia University
Internet 2 Multicast Workshop Team

Contents
Overview
Multicast on the LAN
Source-Specific Multicast (SSM)
Any-Source Multicast (ASM)
Intra-domain SSM
Inter-domain ASM
Troubleshooting Methodology
Making the Case for Multicast

Overview


The Basic Idea


Rather than sending a separate copy
of the data for each recipient, the source
sends the data only once, and routers
along the way to the destinations
make copies as needed.
Unicast does mass mailings;
multicast does chain letters.


Unicast vs. Multicast


Unicast

Multicast


Some Uses for Multicast


Any application with multiple receivers
one-to-many or many-to-many
Live video distribution
Collaborative groupware
Periodic data delivery - push technology
stock quotes, sports scores, magazines,
newspapers
advertisements


Some More Uses for Multicast


Server/web site replication
Reducing network/resource overhead
more efficient to establish multicast tree
rather than multiple point-to-point links
Resource discovery
Distributed interactive simulation
war games
virtual reality


What Happened to Multicast?


By 1995, multicast seemed well on its way to
adoption.
The MBone (Multicast backBone) had been set up and
was growing.
Audiocasts and Videocasts of meetings, seminars, etc.,
were fairly routine.
Serious interest was coming from industry.

So why isn't it ubiquitous now?


The hype got ahead of the technology!
The original technology was not suitable for adoption
throughout the Internet. Basic parts had to be reengineered.
This took from 1997 to early 2001.


The MBone
The original multicast network was called the MBone. It used
a simple routing protocol called DVMRP (Distance Vector
Multicast Routing Protocol).
As there were only isolated subnetworks that wanted to deal
with DVMRP, the old MBone used tunnels to get multicast
traffic between DVMRP subnetworks.
i.e., the multicast traffic was hidden and sent between the
subnetworks via unicast.
This mechanism was simple, but required manual
administration and absolutely could not scale to the entire
Internet.
Worse, DVMRP requires substantial routing traffic behind
the scenes and this grew with the size of the MBone.
Thus, the legend grew that multicast was a
bandwidth hog.


Multicast Grows Up


Starting about 1997, the building blocks for a multicast-enabled
Internet were put into place.
An efficient modern multicast routing protocol, Protocol
Independent Multicast Sparse Mode (PIM-SM), was
deployed.
The mechanisms for multicast peering were established,
using an extension to BGP called Multiprotocol BGP (MBGP),
and peering became routine.
The service model was split into:
a many-to-many part (e.g., for videoconferencing):
Any-Source Multicast (ASM), and
a one-to-many (or broadcast) part:
Source-Specific Multicast (SSM).
By 2001, these had completely replaced the old MBone.
This path is not unusual for new technology...


The Life Cycle of New Technologies in General

(From Lawrence Orans of Gartner)


The Life Cycle of Multicast in Particular


($6 billion USD can't be wrong)

Triple Play !


Deployment of Multicast


Multicast Technologies has been monitoring the state of multicast since
April 2001... and the number of multicast-enabled Autonomous
Systems is higher than ever (although the growth is slow).

From http://www.multicasttech.com/status/


Multicast Terminology: the basics


[Figure: a source (e.g., video server) sends a multicast stream to several receivers]

source = origin of multicast stream
multicast address = an IP address in the Class D range (224.0.0.0 - 239.255.255.255),
used to refer to multiple recipients. A multicast address is also called a multicast group
or channel.
multicast stream = stream of IP packets with a multicast address as the IP destination
address.
(S,G) = (source, group) reference
Multicast applications use UDP packets; TCP, being point-to-point, cannot be multicast.
receiver(s) = recipient(s) of multicast stream


Multicast Protocol Summary


(Some Alphabet Soup!)
IGMP - Internet Group Management Protocol, used by
hosts and routers to tell each other about group
membership. Two versions in wide use, v2 and v3.
PIM-SM - Protocol Independent Multicast-Sparse
Mode, used to propagate forwarding state between
routers.
MBGP - Multiprotocol Border Gateway Protocol, used
to exchange routing information for inter-domain RPF
checking. AKA BGP 4+.
MSDP - Multicast Source Discovery Protocol, used to
exchange active source information between RPs.
ASM (Any Source Multicast) - The original Service
Model
SSM, BiDir - New PIM Service models for one to
many and many to many multicasts

(S,G) notation
For every multicast source there must be two
pieces of information: the source IP address, S, and
the group address, G.
These correspond to the sender and receiver
addresses in unicast.
This is generally expressed as (S,G).
(*,G) is every source for a particular group.
You join a group: that is, you join either (*,G)
(for ASM) or (S,G) (for SSM), using IGMP.
Internal to the protocol, ASM may generate (*,G) or (S,G)
joins or leaves.


IP Multicast building blocks


The SENDERS send without worrying about receivers
Packets are sent to a multicast address (RFC 1700)
This is in the class D range (224.0.0.0 - 239.255.255.255)

The RECEIVERS inform their first hop router what
they want to receive
done via Internet Group Management Protocol (IGMP),
version 2 (RFC 2236) or later

The network makes sure the STREAMS make it to the
correct receiving networks.
Current Multicast routing protocol: PIM-SM
Older ones included DVMRP and MOSPF


Essential IP Multicast Protocols


[Figure: senders and receivers joined by a delivery tree; receivers send membership reports. Labels: Multicast Routing Protocol (e.g. PIM-SM), Group Management Protocol (e.g. IGMP)]

Group Management Protocol - enables hosts to dynamically join/leave multicast
groups. Receivers send group membership reports to the nearest router.

Multicast Routing Protocol - enables routers to build a delivery tree between the
sender(s) and receivers of a multicast group.

This split between routing protocol and edge protocol means that end users do
not need to know about the routing protocol, nor do they directly interact with it.


Multicast Routing

Multicast routing can be thought of as the
reverse of unicast forwarding.
Unicast forwarding is concerned with where
the packet is going.
Multicast routing is concerned with where
the packet will be coming from.
Multicast paths to receivers form a tree. The
tree is built (or torn down) from the receiver
back toward the source.

PIM-SM
Protocol Independent Multicast - Sparse Mode
The core multicast protocol: builds and tears down
multicast trees
RFC 4601 has obsoleted RFC 2362; bootstrap router
removed from PIM spec and the description is much
improved.
Sparse Mode means explicit joins : the protocol assumes
that not everyone wants the data
Uses externally-provided (i.e., unicast) reachability
table to build forwarding topology.
Each router maintains an outgoing interface list (OIL)
for each (S,G) and (*,G) for which it has downstream
listeners. Multicast packets received from a given
source for a given group are sent out only on the
interfaces specified in the appropriate OIL.
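
The OIL idea can be sketched in a few lines of Python. This is a toy model, not how any router implements it: the class name, method names, and interface names are made up for illustration, and the fallback from (S,G) to (*,G) is a simplification of the real PIM-SM rules.

```python
from collections import defaultdict


class ForwardingState:
    """Toy per-router state: one outgoing interface list per (S,G) or (*,G)."""

    def __init__(self):
        # key: (source or "*", group) -> set of downstream interface names
        self.oil = defaultdict(set)

    def add_listener(self, source, group, interface):
        """Record that a downstream listener for this entry sits behind interface."""
        self.oil[(source, group)].add(interface)

    def outgoing(self, source, group, arrival_interface):
        """Interfaces a packet from source to group is copied to."""
        # Prefer the source-specific entry; otherwise use the shared (*,G) entry.
        key = (source, group) if (source, group) in self.oil else ("*", group)
        # Never send a packet back out the interface it arrived on.
        return sorted(self.oil.get(key, set()) - {arrival_interface})


state = ForwardingState()
state.add_listener("*", "224.1.1.1", "eth1")
state.add_listener("10.0.0.5", "224.1.1.1", "eth2")
print(state.outgoing("10.0.0.5", "224.1.1.1", "eth0"))
print(state.outgoing("10.0.0.9", "224.1.1.1", "eth0"))
```

A group with no entry yields an empty list, i.e. the packet is not forwarded anywhere.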


Reverse Path Forwarding


A multicast join starts at the receiver. How does it know
the direction packets should take from the source?
Reverse Path Forwarding is used.

The idea is simple. At each hop from the receiver, the
router looks up where a multicast packet should have
come from, and that is the direction in which the join is
sent.
At each hop, the routers create state, and when the joins
reach the source (or an already existing part of the tree),
the multicast data flows down the created tree to the
receiver.
Used for filtering too: a packet from a given source is
accepted only if it arrives on the interface that is the RPF
direction for that source.

Reachability and RPF


When a unicast packet shows up on an interface, the destination
address is looked up in the unicast forwarding table to determine
where the router should send the packet next.
When a multicast (S,G) Join shows up on an interface, the source
address, S, is looked up in the reverse-path forwarding (RPF) table
to determine how the router should join the source-based
forwarding tree. This information is used to build the multicast
forwarding tree.
For a (*,G) Join the RPF is towards the RP
The unicast forwarding table and the RPF table contain the same
kind of information (unicast routes, or reachability
information) and may in fact be the same table.
The point of having separate tables is to enable separate policies
and paths for unicast forwarding and RPF. You need MBGP, IS-IS,
or static mroutes to do this.
Once the multicast forwarding tree is built, multicast forwarding
works similarly to unicast forwarding.
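
A minimal sketch of the RPF lookup, assuming the RPF table is just a mapping from prefixes to interface names (the table contents, function names, and interface names here are invented for the example). The source address is matched against the table with longest-prefix match, exactly as a destination would be in unicast forwarding.

```python
import ipaddress


def rpf_interface(rpf_table, source):
    """Longest-prefix match of the SOURCE address in the RPF table."""
    addr = ipaddress.ip_address(source)
    matches = [(net, ifname) for net, ifname in rpf_table.items() if addr in net]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_check(rpf_table, source, arrival_interface):
    """Accept a packet only if it arrived on the RPF interface for its source."""
    expected = rpf_interface(rpf_table, source)
    return expected is not None and expected == arrival_interface


# Hypothetical RPF table: a default route plus one more specific route.
RPF_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "eth0",
    ipaddress.ip_network("10.1.0.0/16"): "eth1",
}

print(rpf_interface(RPF_TABLE, "10.1.2.3"))      # direction an (S,G) Join is sent
print(rpf_check(RPF_TABLE, "10.1.2.3", "eth0"))  # wrong interface, packet dropped
```

For a (*,G) Join the same lookup would be done on the RP address instead of the source address.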


Multicast Distribution Trees


The path taken by multicast data is called a tree.
Routing loops are not allowed, so there is always a
unique series of branches between the root of the
tree and the receivers for any group.
The root of the tree is either the RP or the Source.
Edge nodes in the tree are called leaves.

The tree is built based on RPF reachability
information.
Joins travel the same directions as unicast packets
would towards the RP or Source.
(Assuming the reachability is the same)

State is remembered in the tree.
Data floods down the tree in the opposite direction.
So data flows from where unicast packets would go to.


Multicast Distribution Trees


A shortest path tree (SPT) is a tree rooted in a
multicast source. An SPT is sometimes called a
source tree.
A rendezvous point tree (RPT) is a tree rooted in a
multicast rendezvous point (RP). An RPT is
sometimes called a shared tree.
In the original multicast service model, a
connection between a source and a receiver is first
set up by building an RPT from the receiver back
to the RP, and an SPT from the RP back to the
source. Once data starts flowing to the receiver, an
SPT is built directly from the receiver back to the
source.

Shortest Path Tree


[Figure: shortest path tree from the Source to Group Member 1 and Group Member 2]

State Information: (S, G)
S = Source
G = Group


Rendezvous Point Tree


[Figure: Source 1 and Source 2 reach the Rendezvous Point over Shortest Path Trees; the RP Tree runs from the Rendezvous Point to Group Member 1 and Group Member 2]

State Information: (*, G)
* = Any Source
G = Group


Multicast Distribution Trees Compared

Shortest Path Tree
More resource-intensive; requires more state, on the
order of (S x G) entries
You get optimal paths from source to all
receivers, which minimizes delay
Best for one-to-many distribution

Rendezvous Point Tree
Uses fewer resources; requires less memory, on the
order of (G) entries
You may get suboptimal paths from source to all
receivers, depending on topology
The RP itself and its location may affect
performance
Best for many-to-many distribution
Necessary for in-band source discovery

Multicast Addressing
IPv4 Multicast Group Addresses
224.0.0.0 - 239.255.255.255
Class D Address Space
High order bits of 1st Octet = 1110

Source sends to group address


Receivers receive traffic sent to group address


CIDR Address Notation


The multicast address block is 224.0.0.0 to
239.255.255.255
It is cumbersome to refer to address blocks in the above
fashion. Address blocks are usually described using
CIDR notation
This specifies the start of a block, and the number of bits
THAT ARE FIXED.
In this shorthand, the multicast address space can be described
as 224.0.0.0/4 or, even more simply, as 224/4. The fixed part of
the address is referred to as the prefix, and this block would be
pronounced "two twenty four slash four."

Note that the LARGER the number after the slash, the
LONGER the prefix and the SMALLER the address block.
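
The CIDR arithmetic can be checked with Python's standard ipaddress module. This is only an illustration; 239.0.0.0/8 is used as an example of a longer prefix inside 224/4.

```python
import ipaddress

# 224.0.0.0/4: the first 4 bits are fixed, the other 28 are free.
block = ipaddress.ip_network("224.0.0.0/4")
print(block.network_address, "-", block.broadcast_address)
print(block.num_addresses == 2 ** (32 - 4))

# A longer prefix (/8) is a smaller block contained in the /4.
smaller = ipaddress.ip_network("239.0.0.0/8")
print(smaller.subnet_of(block), smaller.num_addresses < block.num_addresses)

# Membership test for a single address.
print(ipaddress.ip_address("239.1.2.3") in block)
```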


Multicast Addressing
RFC 3171
http://www.iana.org/assignments/multicast-addresses
Examples of Reserved & Link-local Addresses
224.0.0.0 - 224.0.0.255 reserved & not forwarded
224.0.0.1 - All local hosts
224.0.0.2 - All local routers
224.0.0.4 - DVMRP
224.0.0.5 - OSPF
224.0.0.6 - Designated Router OSPF
224.0.0.9 - RIP2
224.0.0.13 - PIM
224.0.0.15 - CBT
224.0.0.18 - VRRP
239.0.0.0 - 239.255.255.255 Administrative Scoping
Ordinary multicasts don't have to request a multicast address
from IANA.


Scoping

The old way: use the TTL value to define scope
IP multicast packet must have TTL > some assigned limit TTL
at a boundary or it is discarded.
No longer recommended.
Administratively Scoped Addresses (RFC 2365)
239.0.0.0 - 239.255.255.255 or 239/8
Private address space
Similar to RFC 1918 unicast addresses
Not used for global Internet traffic
Same addresses may be in use in different sub-networks
for different multicast sessions
Examples
Site-local scope: 239.253.0.0/16 (RFC 2365 itself defines the IPv4 Local Scope as 239.255.0.0/16)
Organization-local scope: 239.192.0.0/14
These are not universally followed.

IPv6 allows for a lot more scoping!



Multicast Address Allocation


For a long time, this was a sore spot. There was no
way to claim or register a Multicast Class D address
like unicast address blocks can be registered.
For temporary teleconferences, this is not such a
problem, but it does not fit well into a broadcast
model.

Now, there are solutions:


For SSM, addresses don't matter, as the broadcast
address is really unique as long as the (S,G) pair is
unique.
For ASM, there is GLOP.
We are working to instantiate Extended GLOP
(eGLOP) - come to the APNIC meeting!


Multicast Addressing
GLOP addresses
Provides globally available private Class D
space
233.x.x/24 per AS number
RFC 2770
How?
Insert the 16-bit AS number into the
middle two octets of the 233/8
Online GLOP calculator:
www.shepfarm.com/multicast/glop.html
If you have an AS, you have multicast
addresses.
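
The GLOP mapping is simple enough to compute by hand or in a few lines of Python. A minimal sketch (the function name is made up, and AS 5662 is just a sample value): the high byte of the AS number becomes the second octet and the low byte becomes the third.

```python
def glop_block(asn):
    """Return the GLOP /24 for a 16-bit AS number, as a CIDR string."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("GLOP covers only 16-bit AS numbers")
    # High byte -> second octet, low byte -> third octet of 233/8.
    return f"233.{asn >> 8}.{asn & 0xFF}.0/24"


# 5662 = 22 * 256 + 30
print(glop_block(5662))
```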

Expanding Multicast
Address Assignment
GLOP based address assignment has
worked well.
Every organization gets the same amount of
space, a /24.

What if you need more?


There is a mechanism for requesting more
GLOP space: RFC 3138.
There is demand for this, and we are
working on instantiating it.


Multicast on the LAN


Multicast Addressing at Layer 2


An IPv4 multicast address is 32 bits, of which the first 4 bits are
always the same, leaving 28 bits.
A MAC multicast address is 48 bits, of which the first 24 bits are
always the same. One of the remaining bits is reserved, leaving 23
bits.
So, one multicast MAC address maps to 32 multicast IP addresses.
This means that multicast addresses can collide on the LAN.
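
The mapping and the collision are easy to demonstrate. A small sketch, assuming the standard 01-00-5E prefix plus the low 23 bits of the group address (the function name is invented for the example):

```python
import ipaddress


def multicast_mac(group):
    """Map an IPv4 multicast group address to its Ethernet multicast MAC."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(addr) & 0x7FFFFF        # keep only the low 23 bits
    mac = 0x01005E000000 | low23        # prepend the fixed 01-00-5E prefix
    return "-".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))


# Two different groups that land on the same MAC address:
print(multicast_mac("224.1.1.1"))
print(multicast_mac("225.129.1.1"))
```

The two groups differ only in the 5 bits that are thrown away, so a host listening to one will also receive frames for the other and must filter at the IP layer.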


Ethernet Multicast Addressing


IANA owns 01-00-5E vendor address block; half of it is assigned for IP multicast.

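[Figure: mapping a 32-bit IP Class D address to a 48-bit Ethernet address. The leading 1110 of the IP address can be ignored, leaving 28 bits. The Ethernet address starts with the 24-bit prefix 01-00-5E (binary 00000001 00000000 01011110, which includes the IEEE Ethernet multicast bit), followed by one bit (0 = Internet multicast, 1 = Reserved for other use), leaving 23 bits available: 00-00-00 thru 7F-FF-FF.]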

Since 23 bits is < 28 bits, there is a 2^5 or 32 to 1 degeneracy: for each
multicast address, there are 31 others with the same MAC address!


IGMP
Internet Group Management Protocol - how hosts tell routers about group
membership
Routers also solicit group membership from directly connected hosts
RFC 1112 specifies version 1 of IGMP
Supported on Windows 95
RFC 2236 specifies version 2 of IGMP
Supported on latest service pack for Windows, newer Windows releases,
and most UNIX systems
RFC 3376 specifies version 3 of IGMP
Provides source include-list capabilities (SSM!)
Included in Linux kernel 2.6 and later

Supported in XP and Windows Server 2003


Mac OS X support soon. (I hear.)


IGMPv2
Router:
sends Membership Query messages to All Hosts (224.0.0.1)
query-interval = 125 secs default
router with lowest IP address is Querier (rest non-queriers)
If lower-IP address query heard, back off to non-querier state
Other Querier Present Interval default: (robust-count x query-interval) + (0.5 x query-response-interval) = 255 secs
listens for reports (whether querier or not) and adds group to
membership list for that interface
query-response-interval = 10 secs default
timeout (Group member interval) default:
(robust-count x query-interval) + (1 x query-response-interval) =
260 sec
robust-count - provides fine-tuning to allow for expected packet loss
on a subnet. Default = 2 (tunable from 2-10)
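
The default timer values above follow directly from the formulas. A quick check in Python, together with the lowest-address querier rule (function names are made up for this sketch; addresses are compared numerically, not as strings):

```python
import ipaddress


def igmpv2_timers(robust_count=2, query_interval=125, query_response_interval=10):
    """Derived IGMPv2 router timers, in seconds."""
    return {
        "other_querier_present": robust_count * query_interval
        + query_response_interval / 2,
        "group_membership": robust_count * query_interval
        + query_response_interval,
    }


def elect_querier(router_addresses):
    """The router with the numerically lowest IP address becomes Querier."""
    return str(min(ipaddress.IPv4Address(a) for a in router_addresses))


print(igmpv2_timers())
print(elect_querier(["10.0.0.10", "10.0.0.9"]))
```

Raising robust-count to 3 with the other defaults gives a group membership timeout of 385 seconds, which shows the trade-off: more tolerance for lost packets, slower cleanup.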


IGMPv2
Host:
sends Membership Report messages to groups
it is a member of
waits 0-10 sec (default)
Hosts listen to other host reports
Only 1 host responds (all listen)
sends unsolicited Membership Reports (i.e., Join
Messages) to group address (e.g. 224.10.8.5)
sends Leave messages to All Routers (224.0.0.2)
reports group membership ONLY, no sources. Only
the existence of local group members is reported,
not the actual members themselves

IGMP v2 Protocol Flow - Join a Group


[Figure: a host sends a report for 230.0.0.1 ("I want to JOIN!", "I want 230.0.0.1"); the router adds the group and forwards the 230.0.0.1 stream]

Router triggers group membership request to PIM.
Hosts can send unsolicited join membership messages, called reports in
the RFC (usually more than 1)
Or hosts can join by responding to periodic query from router
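
From an application's point of view, the join is a single socket option; the host's IP stack then sends the membership report. A minimal sketch using the standard sockets API (the function names and the port number are arbitrary choices for the example, and which IGMP version actually goes on the wire is decided by the operating system, not by this code):

```python
import socket


def membership_request(group, interface="0.0.0.0"):
    """Pack a struct ip_mreq: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_group(group, port, interface="0.0.0.0"):
    """Open a UDP socket and join the group; the kernel sends the IGMP report."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        membership_request(group, interface),
    )
    return sock


# Example use (needs a multicast-capable interface, so it is not run here):
#   sock = join_group("230.0.0.1", 5004)
#   data, sender = sock.recvfrom(2048)
#   sock.close()   # closing the socket drops the membership
```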


IGMP v2 Protocol Flow - Querier


[Figure: the querier sends a general query ("Still interested?") to 224.0.0.1 every 125 sec; a member of the 230.0.0.1 group replies ("Yes, me!", "I want 230.0.0.1") with a report to 230.0.0.1 after a 0-10 sec delay]

Hosts respond to query to indicate (new or continued) interest in group(s)
only one host should respond per group
This implies that all hosts have to listen for this traffic!

Hosts fall into idle-member state when same-group report
heard.
After 260 sec with no response, router times out group.


IGMP v2 Protocol Flow - Leave a Group

[Figure: a host in the 230.0.0.1 group sends a Leave for <230.0.0.1> to 224.0.0.2 ("I want to leave!", "I don't want 230.0.0.1 anymore"); the router asks "Anyone still want this group?" with queries labeled 224.0.0.1 <230.0.0.1>, 1 sec apart (re-transmit timer)]

Hosts that support IGMPv2 send leave messages to all-routers
group indicating group they're leaving.
Router follows up with 2 group-specific query messages
IGMPv1 hosts leave by not responding to queries (260 sec timeout)


Soft State
Say I set up an active multicast group, for example by issuing a
membership report. What happens if my computer goes down and
never explicitly leaves the group?
This is fixed with Soft State
Everything has a timer, and if the state is not periodically refreshed,
the timer will expire and the state will be removed.
So there is no danger of some rogue group lasting forever.
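
Soft state can be modelled as a table of expiry times. A toy sketch, with a made-up class name and with time passed in explicitly (in seconds) so the behaviour is easy to follow; 260 is the IGMPv2 default group timeout quoted earlier:

```python
class SoftState:
    """Toy soft-state table: entries vanish unless they are refreshed."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.expires = {}                 # group -> time at which it expires

    def refresh(self, group, now):
        """Called whenever a membership report for the group is heard."""
        self.expires[group] = now + self.lifetime

    def expire(self, now):
        """Remove and return every group whose timer has run out."""
        dead = sorted(g for g, t in self.expires.items() if t <= now)
        for g in dead:
            del self.expires[g]
        return dead

    def active(self):
        return sorted(self.expires)


table = SoftState(lifetime=260)
table.refresh("230.0.0.1", now=0)
table.refresh("230.0.0.2", now=0)
table.refresh("230.0.0.1", now=125)   # this host is still alive and reporting
print(table.expire(now=260))          # the silent group is removed
print(table.active())
```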


IGMPv3
Specified in RFC 3376
Enables hosts to listen only to a specified subset of the sources
sending to the group
[Figure: two video servers, Source = 1.1.1.1 and Source = 2.2.2.2, both sending to Group = 224.1.1.1 through routers R1, R2, and R3; H1 is a member of 224.1.1.1 and sends IGMPv3: MODE_IS_INCLUDE, Join 1.1.1.1, 224.1.1.1]

H1 wants to receive from S = 1.1.1.1 but not from S = 2.2.2.2
With IGMPv3, specific sources can be pruned back - S = 2.2.2.2 in this case


IGMPv3 Enhancements
Group-Source Report message is defined. Enables
a host to specify which senders it will or will not
receive data from.
Group-Source Leave message is defined. Enables
a host to specify the source IP addresses of the
(source,group) pairs that it wishes to leave.


Switches and Snooping


IGMP host reports (Joins) tell the router to start sending
multicast traffic to the LAN, since one or more hosts on the
LAN are members of the group.
In a conventional shared broadcast LAN using switches
that have no multicast smarts, the traffic is sent to all
hosts.
With multiple high bandwidth multicast sources (e.g. video
at 5 Mbps), this does not scale beyond approximately one
source.
There are a few techniques used to deal with this...


IGMP Snooping
Implemented by several vendors. Support for IGMPv2 is common;
support for IGMPv3 is rare, but becoming more common.
Alas, v2 Snooping switches tend not to do the right thing when v3
traffic is present.

What happens at the MAC layer:


IGMP snoopers add a bridge table entry for each multicast
group destination address (GDA) to each switch port that has
the interested member's unicast source address (USA) already
on it. (Remember that there are likely to be dumb hubs
downstream of switches, so more than one USA can be on a
single port.)
When an IGMP Leave is received, the GDA entries are pruned.
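
The bridge table described above can be pictured as a map from GDA to a set of ports. This is a deliberately naive sketch with invented names; it prunes a port as soon as a Leave is seen, which is only safe when there is exactly one host per port, and real switches differ in many details.

```python
from collections import defaultdict


class SnoopingTable:
    """Naive IGMP-snooping bridge table: GDA -> set of member ports."""

    def __init__(self):
        self.ports = defaultdict(set)

    def report(self, gda, port):
        """A membership report for gda was seen on port."""
        self.ports[gda].add(port)

    def leave(self, gda, port):
        """A Leave for gda was seen on port (assumes one host per port)."""
        self.ports[gda].discard(port)
        if not self.ports[gda]:
            del self.ports[gda]           # last member gone: drop the entry

    def member_ports(self, gda):
        return sorted(self.ports.get(gda, set()))


snoop = SnoopingTable()
snoop.report("01-00-5E-01-01-01", 3)
snoop.report("01-00-5E-01-01-01", 5)
snoop.leave("01-00-5E-01-01-01", 3)
print(snoop.member_ports("01-00-5E-01-01-01"))
```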


Why IGMP snooping is harder than it looks
The IGMP membership reports have to be captured from each
host and suppressed to other hosts to prevent the others from
going into idle-member state; every interested host has to be
spoofed into thinking it is the only member of the group, so that it
actively sends membership reports. The IGMP snooper then
forwards one of these membership reports up to the router (or
makes up a fake membership report for itself).
Effectively, the snooping switch has to act like an IGMP
router.

Why IGMP snooping is harder than it looks, continued

Since multiple USAs can be on a port (via dumb hub), the switch
has to actually do the IGMP membership query/timeout before
pruning a port.
Since membership reports are sent to the same GDA as the
(possibly high-bandwidth) multicast traffic, there is a potential for
heavy loading of the switch CPU, unless you use more expensive
ASICs that can separate the IGMP protocol messages from
general traffic and route only the IGMP messages to the CPU.
The switch has to know which is the multicast router port. It does
this by snooping for IGMP queries.


Problems with Multicast on the LAN


In general, multicast on the LAN is not as well
understood as multicast on the WAN.
Switch behaviors are not standardized.
Problems with switches:
when snooping is enabled, they may drop
packets that shouldn't be dropped.
even without snooping, sometimes they step
outside their bailiwick, trying to do non-Layer-2
tasks.


Case Study
This author traveled to Los Alamos, New Mexico to help
debug a multicast problem that had everyone stumped.
Everyone was assuming the only known router on the
subnet was also acting as the multicast gateway.
Unfortunately, this wasn't the case. A nominally Layer
2 switch on the subnet was accidentally configured with
PIM active, and won the PIM Designated Router
election. Of course, this Layer 2 switch had no upstream
to anywhere.
Bill Nickless


One Approach to
Multicast on the LAN
Avoid snooping, as it causes more problems than it solves.
Keep subnets small. A smaller subnet is less likely to have
people joining several different multicast groups, traffic for
each of which is sent to the entire subnet.
If at all possible, use routers, not switches or bridges.
If you have to use switches, try to at least buy them all from
the same vendor, so you won't have inconsistent behavior as
well as unexpected behavior.


Another Approach to
Multicast on the LAN
The previous approach reflects gigaPoP/WAN bias.
On a campus, it just isn't possible to use routers
everywhere.
Switches and snooping may be evils, but they are necessary
evils. Learn to cope with them. http://www.cisco.com/warp/public/473/22.html
is a good place to start.


SSM


PIM-SM
SM stands for Sparse Mode.
RFC 2362 and RFC 4601
There is also a PIM-DM (Dense Mode), but I don't
recommend using it.
Cisco has a proprietary Sparse-Dense mode which is used
for RP discovery with auto-RP.
This can be dangerous if the RP is unreachable.

PIM-SM allows for both RPTs and SPTs.

There are two ways to use PIM-SM


ASM and SSM


ASM: Any-Source Multicast. Traditional multicast: data
and joins are forwarded to an RP.
All routers in a PIM domain must have RP mapping.
When load exceeds threshold, forwarding switches to an
SPT. The default threshold is one packet; in this case,
the sole purpose of the RPT is to learn which sources
are active. (With IGMPv2, the receiver can only specify
the group, not specific sources.)
State increases (not everywhere) as number of sources
and number of groups increase.
SPT state is refreshed when data is forwarded and with
Join/Prune control messages.
SSM: Source-Specific Multicast. PIM-SM without RPs:
instead, the source is learned out-of-band, and the SPT is
built directly to it.


SSM
Source-Specific Multicast (SSM) is a subset of ASM, so
SSM concepts apply directly to ASM, but
SSM is a lot simpler than ASM.
For these reasons, we cover SSM first in this workshop.
232/8 is assigned to SSM as an address space. Other
address ranges can also be set up for SSM; this is
primarily a function of the receiving network.
Source activity and IP addresses are assumed known.
IGMPv3 allows for Include lists of (S,G) pairs.


SSM
SSM was given 232/8 (IANA assigned)
No RPTs or Register packets or encapsulation or ...
Guarantees ONE source on any delivery tree
Content security: no unwanted sources
Reduced protocol dependence (more later...)
Solves address allocation issues for inter-domain one-to-many
tree address is 64 bits: (S,G)
Host must learn source address out-of-band (e.g., from a web page)
Host-to-router join request specifies source as well as group
requires IGMPv3 for include-source list
SSM behavior in 232/8 by default
Configurable to expand range
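
Two of the points above, sketched in Python with invented names: whether a group falls in the SSM range (232/8 by default, with the range configurable), and the fact that an SSM channel is identified by the pair (S,G), i.e. 2 x 32 = 64 bits for IPv4.

```python
import ipaddress

DEFAULT_SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")


def is_ssm_group(group, ssm_ranges=(DEFAULT_SSM_RANGE,)):
    """True if the group address falls inside one of the configured SSM ranges."""
    addr = ipaddress.ip_address(group)
    return any(addr in net for net in ssm_ranges)


def channel(source, group):
    """An SSM channel is the (S,G) pair; two sources can share one G."""
    return (ipaddress.ip_address(source), ipaddress.ip_address(group))


print(is_ssm_group("232.1.1.1"))
print(is_ssm_group("239.1.1.1"))
# Same group address, different sources: two distinct channels.
print(channel("1.1.1.1", "232.1.1.1") != channel("2.2.2.2", "232.1.1.1"))
```

Passing a wider ssm_ranges tuple models a network that has been configured to treat additional address blocks as SSM.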


SSM in Action
Each (S,G) pair listed in the IGMPv3 include list generates a
(S,G) Join directly towards the source.
That's it. It's very simple. All you need to implement is:
Edge routers need IGMPv3
Interior routers need filters to prevent RP (*,G) Joins &
other RP state for the SSM address block

Well, not quite


Snooping Switches need upgrading
OS stack and applications need upgrading


SSM Group Addresses


232/8 is assigned to SSM as an address space.
BUT, you can use 239/8 also for SSM.

You don't have to ask, you can just pick one
and use it.
How can this be?
Note that all joins are unique as long as the
combination of S and G are unique. Not only
can one source support multiple groups, but if
there are two sources using the same group
address, everything works just fine.
If you have an IP address, you can source
traffic.

SSM
[Figure: RP, Source, and Receiver; legend: IGMPv3 host report, (S, G) Join, Shortest Path Tree, Traffic Flow]

Receiver announces desire to join group G AND source S
with an IGMPv3 include-list.
Last-hop router joins the SPT.
(S,G) state is built between the source and the receiver.


SSM
[Figure: RP, Source, and Receiver; legend: Shortest Path Tree, Traffic Flow]

Data flows down the shortest path tree to the receiver.


ASM


ASM
Allows both SPTs and RPTs
RP:
Matches senders with receivers
Provides network source discovery

Typically uses RPT to bootstrap to get SPT
Switch either after 1 packet, or never

RPs can be learned via:
Static configuration - recommended
Anycast-RP - recommended
Auto-RP (PIM-SM v1 & v2) - not recommended
Bootstrap Router (PIM-SM v2) - not recommended


ASM: the original multicast service model


From RFC 1112 :
Packet transmission is based on UDP, so packet delivery is
best-effort, with no loss detection or retransmission
A source can send multicast packets at any time, with no need to
register or schedule transmissions.
Sources do not know the group membership. A group may have many
sources and many members.
Group members may come and go at will, with no need to coordinate
with a central authority.
And, critically, group members know only the group. They don't need to
know anything about sources, not even whether or not any sources
exist.

This is the ASM paradigm. It requires sender registration and tree-switching, which make it much more complex than SSM.


Designated Router (DR)


DR sends
Join/Prune messages toward the RP from receiver network
Register messages toward the RP from source network
Include encapsulation of the data packets received

Selecting the DR:


Neighboring PIM-SM routers multicast periodic Hello messages to
each other (default is every 30 seconds; the hello-interval is tunable
for faster convergence).
On receipt of a Hello message, a router stores the IP address and
priority for that neighbor.
The router with highest IP address is selected as the DR, if the
priorities match.
When DR goes down, a new one is selected by scanning all neighbors on
the interface and choosing the one with the highest IP address.
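
The election rule can be written as a single comparison. A small sketch with a made-up function name, assuming that a higher priority wins outright and that the IP address only breaks ties between equal priorities; addresses are compared numerically.

```python
import ipaddress


def elect_dr(neighbors):
    """neighbors: iterable of (ip_address_string, priority) learned from Hellos.

    Highest priority wins; among equal priorities, the highest IP address wins.
    """
    best = max(neighbors, key=lambda n: (n[1], int(ipaddress.IPv4Address(n[0]))))
    return best[0]


routers = [("10.0.0.1", 1), ("10.0.0.2", 1), ("10.0.0.10", 1)]
print(elect_dr(routers))

# When the DR goes down, rerun the election over the remaining neighbors.
remaining = [r for r in routers if r[0] != "10.0.0.10"]
print(elect_dr(remaining))
```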


Intra-domain ASM


ASM RP Tree Join


[Figure: RP and Receiver; legend: IGMPv2 host report, (*, G) Join, RP Tree]

Receiver announces desire to join group G with IGMPv2
host report (*,G).
DR sends PIM (*,G) Join toward the RP; subsequent routers
do likewise.
(*, G) state created from the RP to the receiver.


ASM Sender Registration


[Figure: RP, Source, and Receiver; legend: RP Tree, Traffic Flow, (S, G) Register (unicast), (S, G) Join, Shortest Path Tree]

Active source triggers DR to send (S,G) Register
message to RP.
RP sends (S,G) Join to source.
(S, G) state created only along the SPT.


ASM Sender Registration


[Figure: RP, Source, and Receiver; legend: RP Tree, Traffic Flow, (S, G) Register (unicast), Shortest Path Tree, (S, G) Register-Stop (unicast)]

(S, G) traffic begins arriving at the RP via the SPT.
RP sends a Register-Stop back to the first-hop
router to stop the Register process.


ASM Sender Registration


[Figure: RP, Source, and Receiver; legend: Traffic Flow, RP Tree, Shortest Path Tree]

Source traffic flows natively along SPT to RP.
From RP, traffic flows down the RPT to the receiver.


How can PIM do both SPTs and RPTs?
PIM-SM includes a mechanism for switching
from an RPT to an SPT.
Generally, this is configured to happen
immediately, i.e., at most one packet flows
down the RPT.
This means that the placement of the RP and
the speed of the links to it are not so
important.

This gets pretty complicated!



How Tree-Switching Works


1) Source sends data to its First Hop Router.
2) First Hop Router encapsulates it and unicasts it
to the RP in a Register message.
3) If the RP has receivers that belong to the group,
it sends the data down the RPT.
Receivers may join before or after the source
starts.
Receivers join just the group (i.e., a (*,G) Join).
4) The RP issues a (S,G) Join towards the source, S,
in order to receive the data through multicast.

Tree-Switching (continued)
1) The routers in the path learn about the
particular S from the traffic down the RPT.
2) The receiver's first-hop router issues an (S,G)
Join toward the source.
3) This travels hop by hop towards the source
until it finds a router with (S,G) state, maybe
at the source, maybe closer.
4) The (S,G) data starts flowing directly from the
source to the receiver.


Tree-Switching (continued)
1) Some router is receiving two copies of the data,
from the SPT and the RPT.
2) This router drops the data from the RPT, and
issues an (S,G) Prune message toward the RP.
3) This travels hop by hop toward the RP,
pruning the state, until it reaches a router
that needs the (S,G) data for some other
receiver.


ASM SPT Cutover

Last-hop router joins the SPT.
Additional (S, G) state is created along a new part of the SPT.

[Diagram: Source, RP, Receiver. Legend: Traffic Flow, RP Tree, Shortest Path Tree, (S, G) Join]


ASM SPT Cutover

Traffic begins flowing down the new branch of the SPT.
Additional (S, G) state is created along the RPT to prune
off (S, G) traffic.

[Diagram: Source, RP, Receiver. Legend: Traffic Flow, RP Tree, Shortest Path Tree, (S, G) RP-bit Prune]


ASM SPT Cutover

(S,G) traffic flow is now pruned off of this branch of
the RPT and is flowing to the receiver via the SPT.
Traffic for other sources may still be flowing down
the RPT.

[Diagram: Source, RP, Receiver. Legend: Traffic Flow, RP Tree, Shortest Path Tree]


ASM SPT Cutover

(S, G) traffic flow is no longer needed by the RP, so
it prunes the flow of (S, G) traffic.

[Diagram: Source, RP, Receiver. Legend: Traffic Flow, RP Tree, Shortest Path Tree, (S, G) Prune]


ASM SPT Cutover

(S, G) traffic is now flowing to the receiver only via a
single branch of the SPT.
As long as the source remains active, its first-hop
router sends Null-Register messages to the RP, enabling
the RP to maintain a list of all active sources.

[Diagram: Source, RP, Receiver. Legend: Traffic Flow, RP Tree, Shortest Path Tree]


RP Discovery Options
Static RP
Recommended
Easy transition to Anycast-RP
Allows for a hierarchy of RPs
Auto-RP (Cisco proprietary)
Fixed convergence timers (slow)
Must flood RP mapping traffic
Bootstrap Router (BSR)
Fixed convergence timers (slow)
Allows for a hierarchy of RPs


RP Options
In many cases, static RP is the best option:
simple: just tell every router the RP address (once!)
flexible: use a /32 on a loopback interface so it can
be moved
scalable: add more instances of same RP address for
redundancy, load splitting, topological localization,
etc.
survivable: fail-over from one RP to another is as
fast as IGP convergence
blessed: RFC 3446 (just 8 pages!)
Only use more complicated options if you really need
to:
different RP(s) for different groups
see later Anycast-RP slides for details

Commonly-Used
Cisco Multicast Forwarding
State Flags


S Sparse Mode Flag


The S flag indicates that the multicast group is
a sparse mode group.
Appears only on (*,G) entries


D Dense Mode Flag


The D flag indicates that the multicast group is
a dense mode group.
Appears only on (*,G) entries


F Register flag
F flag means that Register messages are being
sent to the Rendezvous Point (RP)
Set on Designated Router (DR) state for each
(S,G)
Also set on the parent (*,G) state if any child
(S,G) state entry has the F flag set.


J Join SPT flag on (*,G)


Appears on (*,G) entries when the C flag is set
(there's a locally connected receiver).
Set when traffic rate on the RP Tree exceeds
the SPT-Threshold.
When set, the next packet down the (*,G) RP
Tree will create separate (S,G) state (SPT-cutover).


J Join SPT flag on (S,G)


(S,G) created by SPT-cutover.
PIM checks the rate every minute to decide
whether there's enough traffic to justify
maintenance of the (S,G) state.


T SPT Flag
The T (or SPT) flag indicates that traffic is
being forwarded via the (S,G) entry, that is,
on the shortest path tree.
Appears only on (S,G) entries


R RP-bit flag
Shows up on (S,G) entries only
The (S,G) entry's RPF is towards the RP
(not towards S)
The outgoing interface list is often Null, or a
subset of the outgoing interface list on the (*,G)
parent entry
Used to prune (S,G) traffic from the RP tree
when downstream routers have joined the SPT
instead

C Connected Flag
The C flag indicates that there's a directly
connected receiver.
Shows up on (*,G) state from IGMPv2 Host
Membership reports.
Shows up on (S,G) state from IGMPv3 Host
Membership reports (without INCLUDE lists).


P Pruned Flag
The P flag indicates that either:
(1) the outgoing interface list is Null
-or-
(2) all interfaces in the outgoing interface list
are in the Prune state
The P flag results in a Prune being sent to the
upstream neighbor for this (S,G) entry.


A Advertised via MSDP


Shows up on (S,G) entries at the Rendezvous
Point (RP)
Indicates that Register messages are being
received, and converted by the RP into MSDP
Source Active messages for transmission to
peers.


M MSDP Created Entry


Shows up on (S,G) entries at the Rendezvous
Point (RP)
Indicates that the (S,G) entry was created
when an MSDP Source Active message was
received on a group for which there is group
interest.
Group Interest: a (*,G) entry exists, created
either by local IGMP or PIM (*,G) joins.


L Local Flag
The L flag indicates that the router itself is a
member of the group.
The router will process the traffic at the route
processor.
Example: PIM RP-Discovery (224.0.1.40)
Enabling "ip sdr listen" will cause this flag to
be seen


s SSM Group Flag


Shows up on (*,G) entries for SSM groups
Incoming and Outgoing Interfaces are Null
SSM group (*,G) parent entries exist only for
internal data structure purposes


Less-Commonly-Used
Cisco Multicast Forwarding
State Flags


U URD
Shows up on (S,G) entries created by the Cisco-proprietary
URL Redirect protocol (URD)
URD provides SSM semantics on networks/hosts
that don't support IGMPv3 natively.
Host sends special HTTP request towards the
last-hop router on TCP port 659, which
specifies the source and group address.
Router intercepts that HTTP request and
creates this (S,G) state.


X Proxy-Join Timer
Shows up on (S,G) entries when a router
becomes the Turnaround Router
Deals with situations like RP On A Stick
State times out if not refreshed


I Source Specific Host Report


The I flag indicates that the directly connected
receiver indicated that it wants traffic from a
specific source on the specific group.
Shows up on (S,G) state created by IGMPv3
Host Membership reports (with INCLUDE
lists).


B Bi-Dir Mode Flag


The B flag indicates that the multicast group is a bidirectional group.
New PIM mode; hardware support is still limited.
http://www.ietf.org/internet-drafts/draft-ietf-pim-bidir-06.txt
Bi-directional RPT
Intended for apps with many sources, but operating
within a single PIM domain
Appears only on (*,G) entries


Inter-domain ASM


MBGP
Multiprotocol extensions to BGP
MBGP overview
MBGP capability negotiation
Multicast Reachability


MBGP Overview
MBGP: Multiprotocol BGP
(aka BGP4+ but NOT Multicast BGP)
Makes it possible for multicast routing policies to differ from
unicast routing policies
Defined in RFC 2283 (extensions to BGP; since superseded by RFC 2858 and RFC 4760)
Can carry different route types for different purposes
Unicast
Multicast
IPv6

Both route types carried in same BGP session


Does not propagate multicast state information
Same path selection and validation rules
AS-Path, LocalPref, MED,


MBGP
Tag unicast prefixes as multicast source prefixes for intra-domain
mcast routing protocols to do RPF checks.
WHY? Allows for inter-domain RPF checking where unicast and
multicast paths are non-congruent.
DO I REALLY NEED IT?
YES, if:
ISP to ISP peering
Multiple-homed networks

NO, if:
You are single-homed


New multiprotocol attributes


MP_REACH_NLRI and MP_UNREACH_NLRI
Address Family Information (AFI) = 1 (IPv4)
Sub-AFI = 1 (NLRI is used for unicast forwarding)
Sub-AFI = 2 (NLRI is used for multicast RPF
check and MSDP peer-RPF check)
SAFI = 3 is defined but (AFAIK) not implemented
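
As a rough illustration of where the AFI and Sub-AFI live, the sketch below reads them from the start of an MP_REACH_NLRI or MP_UNREACH_NLRI attribute value, where the AFI is a 2-octet field followed by a 1-octet SAFI. The function names and the small lookup tables are assumptions made for this example, not part of any BGP implementation.

```python
import struct

# Only the values discussed on the slide, plus AFI 2 (IPv6).
AFI_NAMES = {1: "IPv4", 2: "IPv6"}
SAFI_NAMES = {1: "unicast forwarding", 2: "multicast RPF check"}


def decode_afi_safi(attr_value: bytes):
    """Return (afi, safi) from the first three octets of the attribute value."""
    if len(attr_value) < 3:
        raise ValueError("attribute value too short")
    afi, safi = struct.unpack("!HB", attr_value[:3])
    return afi, safi


def describe(attr_value: bytes) -> str:
    """Human-readable summary of the address family pair."""
    afi, safi = decode_afi_safi(attr_value)
    afi_name = AFI_NAMES.get(afi, "AFI %d" % afi)
    safi_name = SAFI_NAMES.get(safi, "SAFI %d" % safi)
    return afi_name + " / " + safi_name


# AFI = 1 (IPv4), SAFI = 2: routes used for the multicast RPF check.
summary = describe(bytes([0x00, 0x01, 0x02]))
```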


MBGP Capability Negotiation


BGP routers establish BGP sessions through the OPEN message
OPEN message contains optional parameters
BGP session is terminated if OPEN parameters are not recognised
New parameter: CAPABILITIES
Multiprotocol extension
Multiple routes for same destination
Configures router to negotiate either or both NLRI
If neighbor configures both or subset, common NLRI is used in both
directions
If there is no match, notification is sent and peering doesn't come up
If neighbor doesn't include the capability parameters in open, session
backs off and reopens with no capability parameters
Peering comes up in unicast-only mode


RIB Groups and JUNOS


In JUNOS, a routing table is called a RIB (Routing
Information Base)
Different RIBs are used for different functions
inet.0 - IPv4 unicast routes
inet.1 - IPv4 multicast forwarding cache (incoming/
outgoing interface lists)
inet.2 - IPv4 multicast RPF table
inet.3 - MPLS next-hops for BGP resolution
inet.4 - MSDP SAs
inet6.0 - IPv6 unicast routes, etc.

RIB group configuration is very powerful, but not
very intuitive
You can do anything if you can figure out how!


Multicast RPF Table


By default, PIM and MSDP use inet.0 for
RPF lookups
To use a dedicated RPF table
populate inet.2
configure PIM and MSDP to use inet.2

RPF table contains only unicast routes
Used for RPF checks for source or RP
address
Never contains multicast groups!


Create RIB Groups


Create RIB groups, and put in static, connected and OSPF routes
routing-options {
    interface-routes {
        rib-group inet if-rib;
    }
    static {
        rib-group static-rib;
    }
    rib-groups {
        mcast-rpf-rib {
            import-rib inet.2;
        }
        if-rib {
            import-rib [ inet.0 inet.2 ];
        }
        static-rib {
            import-rib [ inet.0 inet.2 ];
        }
        ospf-rib {
            import-rib [ inet.0 inet.2 ];
        }
    }
}
protocols {
    ospf {
        rib-group ospf-rib;
    }
}


MBGP
By default, any routes with SAFI=2 will be
put into inet.2
Just need to configure the BGP sessions to
support multicast NLRI
Can be applied to all peers, or individual
peers
protocols {
    bgp {
        family inet {
            unicast;
            multicast;
        }
    }
}


PIM and MSDP


Finally, tell PIM and MSDP to use inet.2
when they do RPF checks
protocols {
    msdp {
        rib-group inet mcast-rpf-rib;
    }
    pim {
        rib-group inet mcast-rpf-rib;
    }
}


Cisco IOS Multicast Reachability


Unicast forwarding never uses the AF=Multicast
reachability topology
PIM uses both the AF=Unicast and the
AF=Multicast topologies
Use "show ip rpf" to see what's really going on
"show ip mbgp" lets you see some AF=Multicast
routes


Cisco IOS Multicast Reachability


Distance-preferred lookups (default)
Every route has an administrative distance
Best match is the route with the lowest administrative
distance, regardless of Address Family
Longest-prefix match (hidden command option)
Top-level configuration: ip multicast longest-match
Best match is the most specific route in either the
AF=Multicast or AF=Unicast tables.
Ties between AF=Multicast and AF=Unicast routes
Broken in favor of the AF=Multicast route


Cisco IOS Multicast Reachability


Example 1:
Prefix               Address family   AD    Interface
140.221.201.0/24     AF=Multicast     20    Ethernet 0/0
140.221.201.128/27   AF=Unicast       200   Ethernet 0/1

Distance-Preferred Lookup of 140.221.201.129
Administrative Distance of 20 dominates, so "show ip rpf
140.221.201.129" will select Ethernet 0/0
Longest-prefix Match Lookup of 140.221.201.129
Longest prefix of /27 dominates, so "show ip rpf 140.221.201.129"
will select Ethernet 0/1
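
The two lookup modes can be compared with a small Python model. It is a simplification written for this tutorial, not IOS behaviour in full: the `Route` class and function names are invented, and the model simply ranks all matching routes at once, breaking ties in favor of the AF=Multicast route.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Route:
    prefix: str      # e.g. "140.221.201.0/24"
    family: str      # "multicast" or "unicast"
    distance: int    # administrative distance
    interface: str


def _matching(routes, source):
    addr = ip_address(source)
    return [r for r in routes if addr in ip_network(r.prefix)]


def _family_rank(route):
    # Ties are broken in favor of the AF=Multicast route.
    return 0 if route.family == "multicast" else 1


def rpf_distance_preferred(routes, source):
    """Lowest administrative distance wins, regardless of address family."""
    return min(_matching(routes, source),
               key=lambda r: (r.distance, _family_rank(r)), default=None)


def rpf_longest_match(routes, source):
    """Most specific prefix wins, regardless of administrative distance."""
    return min(_matching(routes, source),
               key=lambda r: (-ip_network(r.prefix).prefixlen, _family_rank(r)),
               default=None)


example1 = [
    Route("140.221.201.0/24", "multicast", 20, "Ethernet0/0"),
    Route("140.221.201.128/27", "unicast", 200, "Ethernet0/1"),
]
by_distance = rpf_distance_preferred(example1, "140.221.201.129")
by_prefix = rpf_longest_match(example1, "140.221.201.129")
```

Here `by_distance` picks the Ethernet 0/0 route and `by_prefix` picks the Ethernet 0/1 route, matching Example 1.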


Cisco IOS Multicast Reachability


Example 2:
Prefix               Address family   AD    Interface
140.221.201.0/24     AF=Multicast     200   Ethernet 0/0
140.221.201.0/24     AF=Unicast       200   Serial 0/0

Ties are broken in favor of the AF=Multicast route
"show ip rpf 140.221.201.161" will select Ethernet 0/0
Note that the default in IOS is to make the unicast AD and
multicast AD equal, as in this example.


MBGP Summary
Solves part of inter-domain problem
Can exchange unicast prefixes for multicast RPF checks
Uses standard BGP configuration knobs
Permits separate unicast and multicast topologies
if desired
Still must use PIM to:
Build distribution trees
Actually forward multicast traffic


Inter-domain ASM and MSDP


A PIM domain is a network in which all routers use
the same RP for any given multicast group.
Inter-domain SSM is easy. Because you know the IP
address of the source, you can issue PIM joins that
leave your PIM domain and travel hop by hop across
as many PIM domains as necessary.
Inter-domain ASM requires another protocol:
Multicast Source Discovery Protocol (MSDP).
Why? Because the receiver is restricted to sending only
(*,G) joins to its RP. And its RP doesn't know where
the source is, because the source is registered to a
different RP. MSDP is needed for the receiver's RP to
find the (S,G).
Officially, MSDP is a temporary solution. We shall see.


MSDP
MSDP sets up peering between RPs in different domains.
RFC 3618
These MSDP Peer RPs pass Source Active (SA) messages for
every (S,G) they know about.
The receiver's RP thus knows about the source, and can
implement a direct (S,G) Join from the RP to the source.
This sets up an SPT to the local RP.
Then the routing can switch over to a direct SPT to the
receiver in the usual fashion.


MSDP Operation Flood & Join


Flood
SA (source active) packets periodically sent to MSDP peers
indicating:
source address of active streams
group address of active streams
IP address of RP originating the SA
An RP only originates SAs for sources within its own domain
Join
Interested parties can send joins towards source
(this creates inter-domain shortest path trees)


MSDP Source Active Messages


Initial SA message sent when source DR first registers
May optionally encapsulate first data packet
Supports SAP / SDR

Should be treated as if it was a PIM register packet


Originating RP sends subsequent SA messages every 60 seconds, for
as long as source remains active
Other MSDP peers don't originate this SA but only forward it if
received
SA messages must be cached on router
Recent change to Draft
Reduces join latency for new group members that might join
Prevents SA storm propagation


MSDP Overview

[Diagram: Domains A, B, C, D and E, each with an RP; the RPs are MSDP peers.
Labels: Register 192.1.1.1, 224.2.2.2; SA Message 192.1.1.1, 224.2.2.2
(Source Active messages passed between the MSDP peers); Join (*, 224.2.2.2)]


MSDP Overview

[Diagram: Domains A, B, C, D and E, each with an RP; the RPs are MSDP peers.
Labels: Join (S, 224.2.2.2), shown three times along the path; Multicast Traffic]


MSDP Peers (inter-domain case)


MSDP establishes a neighbor relationship between MSDP peers
Peers connect using TCP port 639
Peers send keepalives every 60 secs (fixed)
Peer connection reset after 75 seconds if no MSDP packets or keepalives
are received
MSDP peers must have knowledge of multicast topology.
May be an MBGP peer, a BGP peer or both
Required for peer-RPF checking of the RP address in the SA to prevent SA
looping. Note that this is not the same thing as the multicast routing RPF
check.
Done with the RPF AS, not the RPF interface !

Exception: BGP is unnecessary when peering with only a single MSDP
peer (default-peer)


MSDP so far
Allows RPs to share information about which sources
in their domains are active sending.
Interconnects RPs (MSDP Peers) between domains,
using TCP connections to pass source active messages
(SAs).
SAs are Peer-RPF checked before accepting or
forwarding.
RPs may trigger (S,G) Joins on behalf of local
receivers.
MSDP connections typically parallel MBGP
connections.
Next: Peer-RPF checking in detail. This is complex.

MSDP RPF Rules


If any of the following tests pass, the SA is
accepted. For any given (S,G), there can be one
or more accepted SAs in the SA cache.
1. The MSDP peer sending the SA is the originating RP
2. The MSDP peer sending the SA is the eBGP next hop for
the originating RP
3. The MSDP peer sending the SA is the iBGP advertiser
for the originating RP
4. The MSDP peer sending the SA is in the same AS as the
next hop for the originating RP
5. The MSDP peer sending the SA is statically configured to
be the RPF peer
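
The five tests can be written down as a short Python function, which may help when reading the examples that follow. This is a sketch under assumptions: `RpRoute`, its field names, and the way rule 4 is modelled (comparing the peer's AS with the first AS in the best path) are choices made for this illustration and are not taken from any router implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RpRoute:
    """Best (M)BGP route toward the originating RP (hypothetical model)."""
    next_hop: str           # BGP next hop of the route
    advertiser: str         # peer that advertised the route to this router
    learned_via_ebgp: bool  # True for eBGP, False for iBGP
    first_as: int           # first AS in the AS path


def peer_rpf_rule(peer, peer_as, originator, route=None, static_rpf_peer=None):
    """Return the number of the first rule that accepts the SA, or None."""
    if peer == originator:
        return 1
    if route is not None:
        if route.learned_via_ebgp and peer == route.next_hop:
            return 2
        if not route.learned_via_ebgp and peer == route.advertiser:
            return 3
        if peer_as == route.first_as:
            return 4
    if static_rpf_peer is not None and peer == static_rpf_peer:
        return 5
    return None


# Peer is itself the originating RP: accepted by rule 1.
rule_a = peer_rpf_rule("172.16.0.2", 1, "172.16.0.2")

# Peer 172.16.3.1 is the eBGP next hop toward RP 172.16.0.2: rule 2.
route_b = RpRoute("172.16.3.1", "172.16.3.1", True, 1)
rule_b = peer_rpf_rule("172.16.3.1", 1, "172.16.0.2", route_b)

# Best path starts with AS 3 but the peer is in AS 1: no rule passes.
route_c = RpRoute("172.16.4.1", "172.16.4.1", True, 3)
rule_c = peer_rpf_rule("172.16.3.1", 1, "172.16.0.2", route_c)
```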


Receiving SA Messages
RPF Check rule example cases
Case 1: Sending MSDP Peer = iMBGP peer
Case 2: Sending MSDP Peer = eMBGP peer
Is best path to RP via this MBGP peer?
Case 3: Sending MSDP Peer != BGP peer
Is the next AS in best path to RP = AS of the
sending MSDP peer?


RPF Check Example #1

SA info:
Group address   Source address   Peer address   Originator
233.0.87.1      129.79.19.170    172.16.0.2     172.16.0.2

1.) Is the MSDP peer == originating RP? Yes.

RPF Success!

rp {
    local {
        address 172.16.0.2;

[Diagram: AS1, AS2, AS3; RP, A, Source; link labels 2.1, 2.2, 3.1, 3.2, 4.1, 4.2.
Legend: MSDP SA, MSDP/iMBGP peering]


RPF Check Example #2

SA info:
Group address   Source address   Peer address   Originator
233.0.87.1      129.79.19.170    172.16.3.1     172.16.0.2

1.) Is the MSDP peer == originating RP? No.
2.) Is the MSDP peer == eBGP next hop for originating RP route? Yes.

MBGP Table router B
A Destination       Next hop       AS path
* 172.16.0.2/32    >172.16.3.1     1 i

RPF Success!

rp {
    local {
        address 172.16.0.2;

[Diagram: AS1, AS2, AS3; RP, A, Source; link labels 2.1, 2.2, 3.1, 3.2, 4.1, 4.2.
Legend: MSDP SA, MSDP/iMBGP peering]

RPF Check Example #3

SA info:
Group address   Source address   Peer address   Originator
233.41.184.1    134.68.1.1       172.16.2.2     172.16.10.2

1.) Does the MSDP peer == originating RP? No.
2.) Does the MSDP peer == eBGP next hop for originating RP route? No.

MBGP Table router B
A Destination        Next hop       AS path
* 172.16.10.2/32    >172.16.2.2     1 i

3.) Does the MSDP peer == iBGP/IGP advertiser for originating RP route? Yes.

MBGP Table router B
A Destination        Next hop       AS path
* 172.16.10.2/32    >172.16.2.2     1 i

RPF Success!

rp {
    local {
        address 172.16.10.2;

[Diagram: AS1, AS2; RP, A, Source; link labels 2.1, 2.2, 3.1, 3.2, 4.1, 4.2.
Legend: MSDP SA, MSDP/MBGP peering]

RPF Check Example #4

SA info:
Group address   Source address   Peer address   Originator
233.0.87.1      129.79.19.170    172.16.3.1     172.16.0.2

1.) Does the MSDP peer == originating RP? No.
2.) Does the MSDP peer == BGP next hop for originating RP route? No.
3.) Does the MSDP peer == iBGP advertiser for originating RP route? No.

MBGP Table router B
A Destination       Next hop       AS path
* 172.16.0.2/32    >172.16.4.1     1 i

4.) Is the MSDP peer in the same AS as the first AS in the best path for the
originating RP route? Yes.

MBGP Table router B
A Destination       Next hop       AS path
* 172.16.0.2/32    >172.16.4.1     1 i
* 172.16.3.1/32    >172.16.4.1     1 i

RPF Success!

rp {
    local {
        address 172.16.0.2;

[Diagram: AS1, AS2; RP, A, Source; link labels 2.1, 2.2, 3.1, 3.2, 4.1, 4.2.
Legend: MSDP SA, MSDP/MBGP peering]

RPF Check Example #5

SA info:
Group address   Source address   Peer address   Originator
233.0.87.1      129.79.19.170    172.16.3.1     172.16.0.2

1.) Does the MSDP peer == originating RP? No.
2.) Does the MSDP peer == eBGP next hop for originating RP route? No.
3.) Does the MSDP peer == iBGP advertiser for originating RP route? No.

MBGP Table router B
A Destination       Next hop       AS path
* 172.16.0.2/32    >172.16.4.1     3 1 i

4.) Is the MSDP peer in the same AS as the first AS in the best path for the
originating RP route? No.

MBGP Table router B
A Destination       Next hop       AS path
* 172.16.0.2/32    >172.16.4.1     3 1 i
* 172.16.3.1/32    >172.16.4.1     1 i

RPF Failure!

rp {
    local {
        address 172.16.0.2;

[Diagram: AS1, AS2, AS3; RP, A, Source; link labels 2.1, 2.2, 3.1, 3.2, 4.1, 4.2.
Legend: MSDP SA, MSDP/iMBGP peering]

MSDP debug
The only way to check why an MSDP SA is/isn't
accepted is to turn on MSDP debug printout. If
you do this, watch out! Lots of messages will
follow! It may kill your router.
See current MSDP Internet Draft for details.


MSDP Application: Anycast-RP

MSDP used intra-domain to provide RP redundancy


Becoming best common practice for large networks
Specified in RFC 3446
Allows deployment of multiple RPs within a domain (for the same
group range)
Adding more RPs does not require changes to non-RP routers
Sources and receivers use closest RP, as determined by the IGP
RPs share information about sources via MSDP mesh group
Note: MSDP peering uses normal address, not
Anycast-RP address


MSDP Application: Anycast-RP


Rules are fairly simple
Have e-MSDP peers and i-MSDP peers, similar to BGP
If a mesh group member originates a SA message
Send to all i-MSDP peers and any e-MSDP peers
If a mesh group member receives a SA message from an i-MSDP
peer
Send to any e-MSDP peers
Do NOT send to other i-MSDP peers
If a mesh group member receives a SA message from an e-MSDP
peer
Check RPF; if it passes, then
Flood to all i-MSDP peers and any other e-MSDP peers.
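
The mesh-group forwarding rules above can be sketched as a single Python function. The function name, its arguments, and the sample peer addresses are assumptions for illustration only; `received_from=None` stands for "this router originated the SA".

```python
def sa_forward_targets(i_peers, e_peers, received_from=None, rpf_ok=True):
    """Return the set of peers an SA is sent to under the mesh-group rules."""
    i_peers, e_peers = set(i_peers), set(e_peers)
    if received_from is None:
        # Originated locally: send to all i-MSDP and any e-MSDP peers.
        return i_peers | e_peers
    if received_from in i_peers:
        # From an i-MSDP peer: e-MSDP peers only, never other i-MSDP peers.
        return e_peers
    if received_from in e_peers:
        # From an e-MSDP peer: forward only if the peer-RPF check passed.
        if not rpf_ok:
            return set()
        return i_peers | (e_peers - {received_from})
    raise ValueError("SA received from an unknown peer")


mesh = ["10.0.0.100", "10.0.0.200"]      # i-MSDP peers (mesh group members)
external = ["192.0.2.1", "192.0.2.2"]    # e-MSDP peers

from_mesh = sa_forward_targets(mesh, external, received_from="10.0.0.100")
from_outside = sa_forward_targets(mesh, external, received_from="192.0.2.1")
```

Because an SA from one mesh member is never passed to another mesh member, the full mesh does not loop SAs and needs no RPF check internally.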


Anycast-RP

RP1: lo0 10.0.0.100, lo1 10.0.0.1
RP2: lo0 10.0.0.200, lo1 10.0.0.1
Anycast RP address is 10.0.0.1
MSDP peering is between 10.0.0.100 and 10.0.0.200

[Diagram: RP1 and RP2 joined by an MSDP session, with sources (Src) and receivers (Rec) attached around them]

Anycast-RP

RP1: lo0 10.0.0.100, lo1 10.0.0.1
RP2: lo0 10.0.0.200, lo1 10.0.0.1
Anycast RP address is 10.0.0.1
MSDP peering is between 10.0.0.100 and 10.0.0.200

[Diagram: RP1 and RP2 with sources (Src) and receivers (Rec) attached around them]

SSM Revisited


Rationale for SSM


Why go to all the trouble involved in using RPs
(tree-switching, MSDP) when the RPT is
dropped for the SPT as soon as the first packet
flows down the RPT?
The RP is not really forwarding data, just
doing source discovery.
Isn't there an easier way?
This is the rationale for Source-Specific
Multicast (SSM).


SSM Makes MSDP Unnecessary

Receiver learns S AND G out of band; e.g., from a Web page
Source in 232/8
ASM MSDP peers are irrelevant to SSM

[Diagram: Domains A, B, C, D and E, each with an RP]


Summary: Advantages of SSM

No RPTs
No register packets
No RP mapping required (no RP required!)
No RP-to-RP source discovery (no MSDP required!)
No RP means no concentration of traffic towards the
RP, and no single point to attack
Rogue sources cannot easily spoof traffic
SSM can use entire multicast address space, but
232/8 is reserved for SSM exclusively
The biggest disadvantage is simply that you have to
know the source
Can be a problem in many to many communications.


Bi-Directional Multicast


Bi-Directional Multicast
SSM is a subset of PIM to deal with one-to-many broadcasts.
What's left are few-to-many and many-to-many group communications.
Can PIM be optimized for the many-to-many
case?
In many-to-many, every source is also a
receiver.
So in standard PIM, trees must be built to the
source as well as from the source.

Bi-Directional PIM
Still an I-D
draft-ietf-pim-bidir-08.txt

However, it is in commercial use


Example: Cisco Multicast Hoot 'n' Holler

http://www.cisco.com/en/US/netsol/ns340/ns394/ns165/ns70/networking_solutions_white_paper09186a00800a8479.shtml
Hoot 'n' Holler connects a large number of push-to-talk
terminals, all of which can hear a transmission from any
other one source


What is Bi-Directional PIM?


In Bi-Dir, communication is always two-way. However, PIM trees
are always uni-directional.
Bi-Dir could be achieved several ways:
Modify PIM to allow bi-directional trees
Set up trees to and from every source
However, the actual path taken was to drop SPT.
All traffic must go through an RP
No Register Messages
No (S,G) traffic
(*,G) traffic only


The Designated Forwarder (DF)


Bi-Dir PIM does not use either encapsulation of data to the RP or
SPT.
There is a specific RP Address (RPA) for any given
Bi-Dir group.
RPF always points toward the RPA.

For each link (hop) for a Bi-Dir group, there is a Designated
Forwarder (DF).
DFs are elected if need be based on the routing metric toward the
RPA.
The DF forwards any multicast data toward the RPA.
The DF also acts as a PIM DR as needed.
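
A DF election based on the routing metric toward the RPA might look like the following Python sketch. The slide only says the election uses the routing metric; comparing metric preference first and using the highest IP address as the final tie-break are assumptions added here, as are the tuple layout and sample values.

```python
from ipaddress import IPv4Address


def elect_df(candidates):
    """Pick the Designated Forwarder for one link.

    candidates: iterable of (address, metric_preference, metric) tuples,
    where the metrics describe each router's route toward the RPA.
    Lower preference and lower metric are better; the highest address
    is used as an assumed final tie-break.
    """
    best = min(
        candidates,
        key=lambda c: (c[1], c[2], -int(IPv4Address(c[0]))),
    )
    return best[0]


link_routers = [
    ("10.1.1.1", 110, 20),
    ("10.1.1.2", 110, 10),
    ("10.1.1.3", 110, 10),
]
df = elect_df(link_routers)
```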


PIM Bi-Dir Operation


PIM Bi-Dir uses normal PIM-SM mechanisms to pass
traffic down the tree, like a PIM-SM group before SPT
switch-over.
PIM Bi-Dir uses the DF to pass traffic up the tree
toward the RPA.
This is a new routing feature.
DF election rules prevent forwarding loops.


PIM Bi-Dir Pros and Cons


Pros
PIM Bi-Dir solves the bursty source problem.
No traffic-based signaling.
No (S,G) state.
Reduces the router load and eases debugging.

Cons
SPT and RPT cannot be mixed.
If PIM Bi-Dir is deployed in a PIM domain, it must be
deployed in every router.
Otherwise forwarding loops will result.

The RP is once again a single point of failure.


Its placement and its ability to handle load become important.

As yet, no interdomain solution.


IPv6 Multicast :
The Multicast Address Space
T.M. Eubanks


The Same, only Different


IPv4 and IPv6 multicast are based on the same
concepts:
packets are sent to a group address, which has
one or more senders and one or more receivers
the sender only transmits one packet, which is
replicated in the network as needed
requires ways to identify sources and receivers,
and to separate routing and forwarding rules


IPv6 and Multicast


In some ways IPv6 changes little for multicast,
except for the increased address length.
PIM is hardly changed.
IGMP is replaced by Multicast Listener
Discovery (MLD) Protocol.
With
IGMPv2 ---> MLDv1 (RFC 2710) and
IGMPv3 ---> MLDv2 (draft-vida-mld-v2-08.txt)


MTU and Multicast in IPv6


One thing that has changed is the MTU and
the treatment of fragmentation.
MTU = Maximum Transmission Unit
In IPv6, fragmentation must be performed at
the edges.
Packets that are too large for an intermediate
hop are dropped.
In unicast, hosts are supposed to perform
Path MTU (PMTU) discovery.


IPv6 Multicast MTU


What to do in multicast?
Multicast MTU discovery has been implemented, but it
does not scale well.
The source could receive many ICMPv6 Packet Too Big
messages.
A source with many listeners could Auto-DoS itself by
sending out too big a packet!

What to do?
Set the Multicast MTU to 1280 (the IPv6 minimum
MTU).


Group Addresses
IPv4 - 224.0.0.0/4 (224.0.0.0-239.255.255.255)
IPv6 - ff00::/8
but the next octet is reserved for flags:
FF | lifetime (4 bits) | scope (4 bits)
Lifetime: 0 = permanent, 1 = temporary
Scope: 1 = node, 2 = link, 5 = site, 8 = organization, E = global


IPv6 Addressing
The Foundations of all IPv6 addressing are set
up in RFC 3513
IPv6 Addressing Architecture
This includes Multicast

Rather more structure than for IPv4


I.e., the IPv6 multicast address space is subdivided


Basic IPv6 Multicast Addressing : RFC 3513

|   8    |  4 |  4 |                  112 bits                   |
+--------+----+----+---------------------------------------------+
|11111111|flgs|scop|                  group ID                   |
+--------+----+----+---------------------------------------------+

binary 11111111 at the start of the address identifies the
address as being a multicast address.

flgs is a set of 4 flags:
+-+-+-+-+
|0|0|0|T|
+-+-+-+-+
T = 0 indicates a permanently-assigned ("well-known") multicast address,
assigned by the Internet Assigned Number Authority (IANA).
T = 1 indicates a non-permanently-assigned ("transient") multicast address.


IPv6 Multicast Scoping


scop is a 4-bit multicast scope value used to limit the scope
of the multicast group. The values are:
0 reserved
1 interface-local scope
2 link-local scope
3 reserved
4 admin-local scope
5 site-local scope
6 (unassigned)
7 (unassigned)
8 organization-local scope
9 (unassigned)
A (unassigned)
B (unassigned)
C (unassigned)
D (unassigned)
E global scope
F reserved
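
To make the flag and scope fields concrete, the following Python sketch pulls them out of the second octet of a multicast address using the standard `ipaddress` module. The function name, the returned dictionary, and the label used for reserved or unassigned values are choices made for this example.

```python
from ipaddress import IPv6Address

# Scope names from the table above; other values are reserved or unassigned.
SCOPES = {
    0x1: "interface-local",
    0x2: "link-local",
    0x4: "admin-local",
    0x5: "site-local",
    0x8: "organization-local",
    0xE: "global",
}


def describe_group(addr):
    """Decode the T flag and scope nibble of an IPv6 multicast address."""
    ip = IPv6Address(addr)
    if not ip.is_multicast:
        raise ValueError("not an IPv6 multicast address (ff00::/8)")
    second_octet = ip.packed[1]
    flags = second_octet >> 4       # flgs: high nibble
    scope = second_octet & 0x0F     # scop: low nibble
    return {
        "transient": bool(flags & 0x1),  # T bit
        "scope": SCOPES.get(scope, "reserved/unassigned"),
    }


all_nodes = describe_group("ff02::1")        # well-known, link-local
transient = describe_group("ff1e::1234")     # transient, global
```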


IPv6 Multicast Scope Values

interface-local scope spans only a single interface on a node, and is useful only for
loopback transmission of multicast.

link-local and site-local multicast scopes span the same topological regions as the
corresponding unicast scopes.

admin-local scope is the smallest scope that must be administratively configured,
i.e., not automatically derived from physical connectivity or other, non-multicast-related configuration.

organization-local scope is intended to span multiple sites belonging to a single
organization.

scopes labeled "(unassigned)" are available for administrators to define additional
multicast regions.


RFC 3306 : Unicast Prefix Based Multicast Addressing

|   8    |  4 |  4 |   8    |    8   |       64       |    32    |
+--------+----+----+--------+--------+----------------+----------+
|11111111|flgs|scop|reserved|  plen  | network prefix | group ID |
+--------+----+----+--------+--------+----------------+----------+

flgs is a set of 4 flags:
+-+-+-+-+
|0|0|P|T|
+-+-+-+-+
o P = 0 indicates a multicast address that is not assigned
based on the network prefix. This indicates a multicast
address as defined in RFC3513.
o P = 1 indicates a multicast address that is assigned based
on the network prefix.
o If P = 1, T MUST be set to 1, otherwise the setting of the T
bit is defined in Section 2.7 of RFC3513.


Unicast Prefix Based Multicast Addressing (P = 1)

plen indicates the actual number of bits in the
network prefix field for the subnet.
The network prefix field contains the unicast
network prefix assigned to the domain owning,
or allocating, the multicast address.
The network prefix portion of the multicast
address will be at most 64 bits.
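
As a sketch of how such an address is assembled from the fields above, the Python function below packs flags, scope, plen, network prefix, and group ID into a 128-bit value. The function name, its defaults, and the use of the 2001:db8::/32 documentation prefix are assumptions for illustration.

```python
from ipaddress import IPv6Address, IPv6Network


def unicast_prefix_based_group(prefix, group_id, scope=0xE):
    """Build a unicast-prefix-based multicast address (P = 1, T = 1)."""
    net = IPv6Network(prefix)
    if net.prefixlen > 64:
        raise ValueError("network prefix must be at most 64 bits")
    if not 0 <= group_id < 2 ** 32:
        raise ValueError("group ID must fit in 32 bits")
    if not 0 <= scope <= 0xF:
        raise ValueError("scope must fit in 4 bits")
    flags = 0x3                                         # 0|0|P|T with P = T = 1
    prefix64 = int(net.network_address) >> 64           # upper 64 bits of prefix
    value = (
        (0xFF << 120)             # 11111111
        | (flags << 116)          # flgs
        | (scope << 112)          # scop
        | (net.prefixlen << 96)   # plen (the reserved octet stays zero)
        | (prefix64 << 32)        # network prefix
        | group_id                # group ID
    )
    return IPv6Address(value)


group = unicast_prefix_based_group("2001:db8:1::/48", group_id=1)
```

For the /48 prefix above, plen is 0x30, so the resulting global-scope group is ff3e:30:2001:db8:1::1.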


Unicast Prefix Based Multicast Addressing
Why would you do this?
This allows network operators to assign their
own multicast addresses without having to
either coordinate or interfere with other users.
Really intended for isolated networks, so the
intended use is similar to the 239/8 space.
But it need not be: these addresses could be used for any
scope.


Embedded-RP
Now approved as RFC 3956; it builds on RFC 3306.
Takes advantage of the huge address size to embed the
RP's address inside the group address.
Other sites need only extract the RP address and send
their join messages to it.
Cisco added support in IOS while the specification
was still only a draft.


Embedded RP Address Space

|   8    |  4 |  4 |  4 |  4 |  8 |       64       |    32    |
+--------+----+----+----+----+----+----------------+----------+
|11111111|flgs|scop|rsvd|RIID|plen| network prefix | group ID |
+--------+----+----+----+----+----+----------------+----------+

                             +-+-+-+-+
flgs is a set of four flags: |0|R|P|T|
                             +-+-+-+-+
When the highest-order bit is 0, R = 1 indicates a multicast address
that embeds the address of the RP. Then P MUST be set to 1, and
consequently T MUST be set to 1, as specified in RFC3306. In
effect, this implies the prefix FF70::/12.
The plen and network prefix fields are used to calculate the unicast address prefix of the RP.
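
The flag rule above (0, R = 1, P = 1, T = 1) means the first 12 bits of such a group address are always 1111 1111 0111, which is the prefix FF70::/12. A quick way to test for this is sketched below; the names EMBEDDED_RP_PREFIX and embeds_rp are made up for this example.

```python
import ipaddress

# 8 bits of ones followed by flgs = 0111 gives the 12-bit prefix 0xFF7.
assert (0xFF << 4) | 0b0111 == 0xFF7

EMBEDDED_RP_PREFIX = ipaddress.IPv6Network("ff70::/12")


def embeds_rp(address: str) -> bool:
    """Return True if the group address falls in the Embedded-RP range."""
    return ipaddress.IPv6Address(address) in EMBEDDED_RP_PREFIX
```

A group such as ff7e:140:2001:db8:beef:feed:0:1 passes this check, while an ordinary unicast-prefix-based group (ff3e:...) or a well-known group such as ff02::1 does not.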


Embedded RP Address Space
The RP address is obtained from the group address in two steps:
1. Copy the first "plen" bits of the "network prefix" to a zeroed
128-bit address structure, and
2. replace the last 4 bits with the contents of "RIID".
These two steps could be illustrated as follows:

| 20 bits | 4  | 8  |       64       |    32    |
+---------+----+----+----------------+----------+
|xtra bits|RIID|plen| network prefix | group ID |
+---------+----+----+----------------+----------+
            ||    \\  vvvvvvvvvvv
            ||     ``====> copy plen bits of "network prefix"
            ||       +------------+--------------------------+
            ||       | network pre| 0000000000000000000000   |
            ||       +------------+--------------------------+
            \\
             ``=================> copy RIID to the last 4 bits
                     +------------+---------------------+----+
                     | network pre| 0000000000000000000 |RIID|
                     +------------+---------------------+----+
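
The two steps can be sketched in a few lines of Python. This is an illustration only: the function name embedded_rp_address is hypothetical, the sample addresses use the 2001:db8::/32 documentation prefix, and rejecting a plen outside 1..64 is an assumption based on the 64-bit width of the network prefix field.

```python
import ipaddress


def embedded_rp_address(group: str) -> ipaddress.IPv6Address:
    """Derive the RP's unicast address from an Embedded-RP group address."""
    value = int(ipaddress.IPv6Address(group))
    if value >> 116 != 0xFF7:            # 11111111 + flgs 0111, i.e. FF70::/12
        raise ValueError(f"{group} is not an Embedded-RP group address")

    riid = (value >> 104) & 0xF          # 4-bit RP interface ID
    plen = (value >> 96) & 0xFF          # 8-bit prefix length
    if not 1 <= plen <= 64:              # assumption: treat other lengths as invalid
        raise ValueError(f"unusable prefix length {plen}")
    network_prefix = (value >> 32) & 0xFFFFFFFFFFFFFFFF

    # Step 1: copy the first plen bits of the network prefix into a zeroed
    # 128-bit value (they become the most significant bits).
    kept = (network_prefix >> (64 - plen)) << (64 - plen)
    rp = kept << 64
    # Step 2: replace the last 4 bits with RIID.
    rp |= riid
    return ipaddress.IPv6Address(rp)
```

For the group ff7e:140:2001:db8:beef:feed:0:1 (RIID = 1, plen = 0x40 = 64) this gives the RP address 2001:db8:beef:feed::1, so a router in any domain can work out where to send its join from the group address alone.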


Embedded RP in Use
Embedded RP means that anyone in another
PIM domain who wants to join your group
knows where your RP is.
The RP is a single point of failure.
Interdomain joins are now straightforward,
and MSDP is not needed.
This was implemented in GÉANT from scratch
in a few days!


Information Online
Goals of the M6Bone : www.m6bone.net/article.php3?id_article=1
Paper by Stig Venaas : http://www.6net.org/events/workshop2003/venaas.pdf
Cisco Press Release : http://newsroom.cisco.com/dlls/2005/prod_091905b.html
