Multicast Troubleshooting Guidelines
While filing any multicast-related PR, please collect the basic information listed below and the
logs for the respective area (pim, igmp, etc.) in which the problem is observed. Please refer to a basic
overview of multicast and the configuration as required.
* Collect instance-specific logs if the problem is in a particular routing-instance.
* In a scaled config, it is preferred to collect show command outputs for the problematic multicast
groups/sources and prefixes (and output for a few working groups/sources and prefixes), not the
complete output.
* When NSR is enabled, collect the same logs on the backup RE as well.
Basic information
1. Topology diagram.
- Indicate the source, FHR (the router that connects the source), PIM RP (Rendezvous Point), LHR (the router
that connects the receiver), receivers, and the expected forwarding path of multicast traffic.
2. show configuration output.
3. Steps to reproduce the problem.
4. Collect the trace log for the respective area (pim, igmp, etc.) in which the problem is observed (a sample
traceoptions sketch follows this list).
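As an example, a PIM traceoptions configuration like the following can be used to capture the trace log (the
file name, size and flag are placeholders; adjust them to the area being debugged, e.g. igmp or mld):
edit exclusive
set protocols pim traceoptions file pim-trace size 10m files 5
set protocols pim traceoptions flag all
commit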
For any multicast-related problem, the following set of commands has to be collected on all routers in
the multicast path of the problematic group.
1. show pim join extensive <group>
2. show pim rps extensive
3. show pim neighbors
4. show multicast route extensive group <group>
5. show route <source | rp>
6. show multicast rpf <source | rp>
7. show igmp/mld group (only from LHR)
8. show igmp/mld interface (only from LHR)
9. show pim bidirectional df-election (only when bidir is configured)
Additionally, collect the logs for the respective area (pim, igmp, etc., as given below) in which the problem is
observed.
Troubleshooting Notes
The following set of basic multicast aspects has to be checked while filing the PR.
1: PIM DR
On the FHR, it is the DR that should send the traffic; on the LHR, it is the DR that should initiate the PIM join.
A static igmp group configured on a non-DR interface is a commonly seen problem -- nobody represents
the igmp group and translates it into a PIM join.
Example: a static igmp join is reported not to pull any traffic. It turns out that the static group is configured
on a non-DR interface.
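As a minimal sketch of the check (the interface name and group address are placeholders): first verify which
router is the DR on the receiver LAN, then make sure the static join is configured on that router's
downstream interface.
show pim interfaces   (the DR address is listed per interface; confirm it is the local router)
edit exclusive
set protocols igmp interface ge-0/0/1.0 static group 232.1.1.1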
2: FHR/register process
Normally the pe (or pime on some other platforms) interface is in the OIF of the forwarding route for only a
short while. If you see a pe interface in the OIF all the time, it is an indication that the register / switch-to-
SPT-at-RP process is not going well. Either the register packet doesn't reach the RP, or the RP doesn't
respond to the register with a register-stop and/or switch to the SPT by sending the (S,G) join to the FHR, or the
SPT join doesn't make it all the way to the FHR (say, there is no route back to the FHR).
Summary: in a steady state on the FHR, the pe/pime interface should not be in the OIF list of the mcast route. On
the RP, we should see a NULL register received every 60 seconds.
Examples: there is no mcast route on the routers between the FHR and the RP; the SPT switch never finished; etc.
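A minimal set of checks for this, with placeholder group/source addresses: on the RP, look at the register
state for the group; on the FHR, confirm that in steady state the pe/pime interface has left the OIF list.
show pim rps extensive   (on the RP: register state for the group)
show pim join extensive 232.1.1.1
show multicast route extensive group 232.1.1.1   (on the FHR: the OIF list should not contain pe-/pime-)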
3: rib group and proper routes installed
Quite a few systest/regression test scripts have a rib group configured. I am not sure why this is the
case, but when it is, we should check the RPF using the routes defined in the rib group.
Example: "show pim join" shows no route to the upstream, but "show route" seems to have the RPF route
-- check the right rib group.
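A minimal sketch of the check (the source address and the inet.2 table are assumptions; use whatever rib the
rib group actually imports into): find out which rib group PIM uses, then verify the RPF route in that rib
rather than in inet.0.
show configuration protocols pim | match rib-group
show multicast rpf 10.0.0.1
show route table inet.2 10.0.0.1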
4: show pim statistics
This is a very useful command. Any non-zero count at the bottom of its output ("Global Statistics")
usually indicates a problem and needs to be checked/traced down before you can know what is really going
wrong.
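For example, you can jump straight to that part of the output (the filter is only a convenience; the exact
layout may vary by release):
show pim statistics | find "Global Statistics"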
5: BSR
To verify BSR is working correctly, check these two things:
1) "show pim bsr" should show the same router as bsr in all routers.
2) "show pim rp" should show the same rp set for bootstrap reported RPs for all routers.
Example: different pim routers have different view who the BSR is.
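A quick way to compare is to run the following on every router in the PIM domain and diff the outputs:
show pim bootstrap
show pim rps extensive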
6: auto-rp and dense mode
To use auto-rp in sparse mode, dense mode for groups 224.0.1.39 and 224.0.1.40 needs to be
configured so that RP information can be exchanged via flooding to all routers in the same PIM
domain. That also means almost all PIM interfaces need to be configured in sparse-dense mode.
Example: the RP view is not the same on all PIM routers using auto-rp, because dense mode for groups
224.0.1.39 and 224.0.1.40 is not properly set.
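A minimal config sketch under these assumptions ("interface all" is a placeholder for the actual PIM
interfaces; this is for a router that only listens to auto-rp, not for the RP/mapping agent itself):
edit exclusive
set protocols pim dense-groups 224.0.1.39
set protocols pim dense-groups 224.0.1.40
set protocols pim interface all mode sparse-dense
set protocols pim rp auto-rp discovery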
7: Resolve request and IIF mismatch explanation for non-bidir PIM
Checking whether there is a resolve request message is normally the first step in finding out why no
multicast forwarding entry is created. Without this message, it means the first packet (or any
multicast packet for this flow) never reached rpd/pim, and the lower level (infra, kernel or PFE) needs to be
checked. The SPT switch is normally associated with an IIF mismatch message.
Example: no multicast route is created after traffic is sent. Checking the pim trace shows no
resolve request -- need to track where the multicast packet gets lost: PFE or kernel?
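A simple way to see whether any resolve request ever reached rpd on the router is the kernel resolve counter
(the match filter is only a convenience):
show multicast statistics | match "[Rr]esolve"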
8: pe/pd should be there (pd-a/b/c.xxxxx)
In the SM/SSM case, all routers should have either a pe interface (routers that learn an RP which is not
themselves) or a pd interface (routers that are the RP). For each different RP, a non-RP router should have a
corresponding pe interface. A good pe/pd interface should have a name like pe-a/b/c.xxxxx or pd-a/b/c.xxxxx. If you
only see pe-a/b/c, it means the pe interface is not correctly created. If you don't even see pe-a/b/c
or pd-a/b/c, it means you don't have a service card (hardware or software).
Example: traffic is not going anywhere from the FHR because there is no hardware service card and no soft
service interface is configured.
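On MX routers with Trio MPCs, a soft tunnel PIC can be enabled as sketched below (the FPC/PIC numbers and the
bandwidth value are placeholders); afterwards the pe-/pd- units should appear:
edit exclusive
set chassis fpc 0 pic 0 tunnel-services bandwidth 1g
commit
show interfaces terse | match "^(pe|pd)-"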
PIM:
1. show pim interface
2. show pim neighbors [detail]
3. show pim rps extensive
4. show pim join extensive [<group>]
5. show pim join summary
6. show pim source detail
7. show pim statistics [interface <interface name>]
8. show pim bootstrap (only when bsr is configured on the testbed)
9. show pim bidirectional df-election (only when bidir is configured)
Multicast forwarding:
1. show multicast route extensive [group <group addr>] [source-prefix <source addr>]
2. show multicast rpf
3. show multicast statistics
4. show route extensive [<source | RP>]
5. show route forwarding-table
6. show route forwarding-table extensive destination <source | RP | multicast group>
7. show pfe route <ip | inet6>
8. show interfaces extensive [<interface>]
9. show interfaces <interface> | match rate
MVPN (Draft-Rosen / NG-MVPN):
1. show pim mdt instance <instance-name>
2. show pim mvpn
3. show pim join extensive instance <instance-name>
4. show pim join extensive (in the master instance)
5. show mvpn instance <instance-name> display-tunnel-name
6. show mvpn neighbor extensive
7. show multicast route extensive instance <instance-name> display-tunnel-name
8. show interface mt*
9. show route table bgp.mdt.0
10. show route table <instance-name>.mdt.0
11. show pim neighbors instance <instance-name>
12. show pim neighbors
Multicast on TRIO
This post describes how multicast next-hop resolution and caching are done on Junos MX routers
with Trio-based cards. For more information regarding multicast replication, see my
previous post: Multicast Replication.
1/ Introduction:
PIM join / prune messages sent by downstream routers create/remove nodes/leaves of the
multicast tree. When an MX router receives a PIM join for a given (S;G), the kernel first allocates
to this new mcast entry (inet.1) a multicast next-hop that refers to a list of outgoing
interfaces (called the OIL). Each combination of outgoing interfaces has its own multicast next-hop.
A given OIL, or mcast NH, can be used by several streams; see the example below:
Kernel Multicast NH Allocation - there are 2 cases:
[1] If the PIM Join (S;G) received refers to a known combination of output interfaces (known
by the kernel), the kernel allocates this multicast NH (linked to the known OIL) to the multicast
route. The kernel then sends the new multicast route to the PFEs.
Mcast Route (S;G) > OILx (NHx)
OILx and NHx are already known by the kernel (because they are used by other mcast streams)
[2] If the PIM Join (S;G) received triggers a new combination of output interfaces (unknown
by the kernel), the kernel generates a new multicast NH for the multicast route. The kernel
then sends the new multicast route, and the new multicast NH + OIL, to the PFEs.
Mcast Route (S;G) > OILy (NHy)
OILy and NHy are created by the kernel (because they are not used by any other mcast stream)
2/ In practice:
Let's start playing with CLI and PFE commands to better understand these concepts. The setup
is depicted below. The mcast stream is (10.128.1.10;232.0.7.1).
Patrick sends a PIM Join to Bob:
The PIM Join (10.128.1.10;232.0.7.1) is handled by Bob's RE (RPD), which first creates a
new PIM join entry:
Then, the kernel checks whether the OIL for (10.128.1.10;232.0.7.1) already exists. You can check
the "known OIL / NH" mapping via the following command:
In our case, the stream's OIL, made of ae91.0, is unknown. So the kernel allocates a new multicast
NH for our OIL and then creates a new mcast route within the inet.1 table:
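The OIL-to-NH mapping and the resulting inet.1 entry can be checked with commands like these (a sketch using
the group/source of the example):
sponge@bob> show multicast nexthops
sponge@bob> show route table inet.1 | match 232.0.7.1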
And now the multicast NH is added there:
After that, the kernel creates a multicast forwarding cache entry. In our case the multicast
sender does not send the multicast stream yet. By default the cache lifetime is equal to 360
seconds (RFC 4601 recommends 210 sec for the KAT). The timeout of this entry is reset each
time the router receives a (data) packet referring to this entry. In our case the cache lifetime
decreases (the sender does not send traffic).
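You can watch the cache entry and its remaining lifetime with the usual command (group taken from the example):
sponge@bob> show multicast route extensive group 232.0.7.1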
In parallel, the kernel creates the mcast route, OIL, and NH on the PFEs. Via an RE shell command (needs root uid)
you can monitor kernel updates. The command is rtsockmon:
# rtsockmon -t
[15:23:31] rpd P NH-comp add nh=comp idx=1173 refcnt=0, af=inet, tid=0
hw_token=0 fn=multicast comp=1048583, derv=1339,1341,
[15:23:31] rpd P nexthop add inet nh=indr flags=0x4 idx=1048590 ifidx=0 filteridx=0
[15:23:31] rpd P route add inet 232.0.7.1,10.128.1.10 tid=0 plen=64 type=user
flags=0x800e nh=indr nhflags=0x4 nhidx=1048590 rt_nhiflist = 0 altfwdnhidx=0 filtidx=0
What do we see?
The kernel first creates a composite NH 1173 (CNH) that contains (as we will see later) the binary tree.
Then it links this composite NH to an Indirect NH (INH) 1048590 (the entry point of the NH
chain). And finally, it creates the multicast route: a /64 route (G,S/64) which is linked to the
INH.
The NH chain can be summarized as follows:
232.0.7.1,10.128.1.10/64 -> INH 1048590 -> CNH 1173 -> LIST OF NHs (each NH can be a Unicast
or Aggregate NH)
To view the complete NH chain, I recommend using the following PFE command:
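As an assumption-laden sketch (the exact PFE shell syntax varies by platform and release), the chain can be
walked from the FPC shell by resolving the indirect NH index seen in the rtsockmon trace:
sponge@bob> start shell pfe network fpc3
NPC3(bob vty)# show nhdb id 1048590 recursive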
If the multicast stream starts before the keepalive timer expires, there will be no kernel resolution
and all packets will be handled by the PFE only. In other words, the ingress PFE will check the
RPF (check that the source is reachable via the ingress interface), then the lookup will be done and
the ingress PFE will find the right INH and finally the Composite NH.
The composite NH lists the different combinations of the dual binary tree (or unary tree if you use
enhanced-IP mode at the chassis level). Indeed, when you have a LAG as an outgoing interface there
are several possible binary trees. Let's take a simple example as follows:
AE2 is a LAG with a single member, so for every combination of S and G there is only one
forwarding NH (xe-1/3/0.0). On the other hand, AE3 is made of 2 members, so load balancing
will be done based at least on S and G. As you can see, in this case there are 2 possible binary
trees: one where member xe-3/3/0.0 is selected for a given (S;G) and another one where xe-
3/2/0.0 is selected.
If we take our example again, you can see the different combinations of the binary tree by
resolving the Composite NH (remember AE91 is made of 2 10GE interfaces):
Sublist 1:
1387(Unicast, IPv4, ifl:400:xe-3/3/1.0, pfe-id:15)
--------------
Sublist 0:
mcast-tree:
nfes:1, hash:0
13,
Root
13
reverse-mcast-tree:
nfes:1, hash:0
13,
Root
13
[..]
Sublist 1:
mcast-tree:
nfes:1, hash:0
15,
Root
15
reverse-mcast-tree:
nfes:1, hash:0
15,
Root
15
[...]
NB: The reverse tree is explained in a previous post: Multicast Replication.
Binary tree combinations are called "Sublists". In our case the OIL is made of AE91.0, which is made
of 2 members: xe-3/1/0.0 (hosted on PFE 13) and xe-3/3/1.0 (hosted on PFE 15).
At the end of the multicast lookup a composite NH is found, then the hashing algorithm selects
the right sublist for the given (S;G). Indeed, as the hash keys are configurable (and can include
more fields, like layer-4), the composite next-hop resolution (see above) can only provide the
combinations of the binary tree, not the one selected at the end.
3/ Kernel Resolution:
When a PIM prune is received, the multicast entry is removed at PFE and RE level. But what
happens when PIM joins are still sent periodically (every minute) but the multicast stream stops
for at least the keepalive timeout (default 360 secs)?
When no packet of a given (S;G) is received before the keepalive timeout expires, the multicast
cache entry is removed at RE level but also at PFE level. In our example, if my sender stops
sending (10.128.1.10,232.0.7.1), you can see that the cache entry will be removed at RE level
(after 360 seconds):
sponge@bob> show multicast route group 232.0.7.1
empty
sponge@bob> show route table inet.1 | match 232.0.7.1
empty
But also at PFE level after those kernel updates (rtsockmon traces):
# rtsockmon -t
[15:54:48] rpd P nexthop delete inet nh=indr flags=0x6 idx=1048584 ifidx=0 filteridx=0
Note: PIM entry is still there:
Now, when the multicast stream starts again, a lookup of the first packet is performed, but
the mcast route has been removed from the PFE, so the lookup result will be a KERNEL Resolve NH:
RT flags: 0x000a, Ignore: 0x00000000, COS index: 0
DCU id: 0, SCU id: 0, RPF ifl list id: 0
Then a notification with the S and G fields of the first multicast packet is punted to the RE
(kernel). The resolution is sent over the em0 internal interface.
FPC 3, which hosts ae90 (the ingress PFE that receives the stream), sends the resolve request to the
master RE:
17:09:14.564414 In IP (tos 0x0, ttl 255, id 14405, offset 0, flags [none], proto: TCP (6), length:
88) 128.0.0.19.14340 > 128.0.0.1.6234: P 258841:258877(36) ack 82496 win 65535
<nop,nop,timestamp 428760235 430306943>
-----IPC message payload packet-----
Packet 1:
type: next-hop(6), subtype: resolve addr request(14), length: 20, opcode: 1,
error: 0,
[|ipc]
0201 0000 0005 0200 0000 0013 0800 4500
0058 3845 0000 ff06 8346 8000 0013 8000
0001 3804 185a 0b14 e1e9 213a 0fe9 8018
ffff 7d78 0000 0101 080a 198e 5cab 19a5
f67f 0100 0000 1c00 0000 0600 0e00 1400
0100 0000 03d7 0000 014b 0040 02ef e800
0701 0a80 010a
N.B.:
e8000701 = 232.0.7.1
0a80010a = 10.128.1.10
sponge@bob> show tnp addresses | match "fpc3|master" | match em0
master 0x1 02:01:00:00:00:05 em0 1500 0 0 3
fpc3 0x13 02:00:00:00:00:13 em0 1500 4 0 3
0x13= 19 = internal IP address of FPC3 = 128.0.0.19
0x1= 1 = internal IP address of Master RE = 128.0.0.1
The kernel first checks whether a PIM entry is available for this (S;G) and then allocates a new INH /
Composite NH for this multicast route. The multicast cache entry is created at RE level and the
kernel creates the different NHs and the multicast route on the PFEs again.
# rtsockmon -t
[16:44:16] rpd P NH-comp add nh=comp idx=1174 refcnt=0, af=inet, tid=0
hw_token=0 fn=multicast comp=1048583, derv=1179,1182,1241,1341,
[16:44:16] rpd P nexthop add inet nh=indr flags=0x4 idx=1048584 ifidx=0 filteridx=0
[16:44:16] rpd P route add inet 232.0.7.1,10.128.1.10 tid=0 plen=64 type=user
flags=0x800e nh=indr nhflags=0x4 nhidx=1048584 rt_nhiflist = 0 altfwdnhidx=0 filtidx=0
You can check kernel resolution "hits" via the following CLI command:
sponge@bob> show multicast statistics
Instance: master Family: INET
Interface: ae90.0
Routing protocol: PIM Mismatch error: 0
Mismatch: 0 Mismatch no route: 0
Kernel resolve: 1 Routing notify: 0
Resolve no route: 0 Resolve error: 0
Resolve filtered: 0 Notify filtered: 0
In kbytes: 11964471 In packets: 9156666
Out kbytes: 0 Out packets: 0
Subsequent packets will be resolved at PFE level by the more specific multicast route previously added by the kernel.
Note: You can experience some packet drops during the kernel resolution process.
In scaled multicast networks, kernel resolution might consume a lot of kernel "ticks".
To avoid that, Juniper throttles resolution requests at the PFE level. Each Trio-based card is
limited to 66 resolutions per second. This PFE command gives you this information:
To override the default cache timeout (360s) you can use the following command:
edit exclusive
set routing-options multicast forwarding-cache timeout ?
Possible completions:
<timeout> Forwarding cache entry timeout in minutes (1..720)
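For instance (the value is only an illustration; note the knob is expressed in minutes while the default
lifetime is quoted in seconds):
set routing-options multicast forwarding-cache timeout 10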
And just for fun, in a lab, you can override the resolve-rate (default 66/sec) with the
following HIDDEN command:
edit exclusive
set forwarding-options multicast resolve-rate ?
Possible completions:
<resolve-rate> Multicast resolve rate (100..1000 per second)
4/ Can I DoS the kernel if I try to play with multicast kernel resolution?
NOOOOOO !!!!
Indeed, you could imagine sending a lot of multicast packets with random S and G. Each new
stream would trigger a kernel resolution. Resolve requests will first be rate-limited by the
resolve-rate on the MPC (default 66/sec) and then by a multicast discard mechanism explained
below.
As I explained, the first packet of a given (S;G) triggers a kernel resolution if a multicast route
is not found at PFE level. The kernel that receives the resolve request will first check whether the
(S;G) matches a known PIM entry. If not, the RE first sends a PIM prune to the upstream node
to force the upstream router to stop forwarding toward itself. If the router is directly
connected to the source, it can't send a PIM prune. So, in parallel, the kernel adds a specific route for
the given (S;G) that will discard this stream at PFE level (no more kernel resolutions toward the
kernel (RE) will be requested for this (S;G)).
Example: the sender sends an unknown multicast stream (10.128.1.10,232.0.8.1).
On Bob, no PIM entry is available for this stream.
A resolve request for (10.128.1.10;232.0.8.1) is sent by ae90.0 to the kernel.
As I said previously, in this architecture Bob can't send a PIM prune for this (S;G) because it is
directly connected to the source, so instead it sends a kernel route add request for this (S;G)
with a specific NH (NH=35=multicast discard):
# rtsockmon -t
[18:20:58] rpd P route add inet 232.0.8.1,10.128.1.10 tid=0 plen=64 type=user
flags=0xe nh=mdsc nhflags=0x0 nhidx=35 rt_nhiflist = 0 altfwdnhidx=0 filtidx=0
At PFE level the multicast route is now:
A PIM join for this (S;G) can change the NH, or if the sender stops sending this stream the entry
will be removed automatically after the keepalive timer expires (default 360s).
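A hedged way to verify this from the CLI, using the group of the example: the (S;G) should show up with a
discard forwarding state until the entry ages out.
sponge@bob> show multicast route extensive group 232.0.8.1
sponge@bob> show route table inet.1 | match 232.0.8.1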
David.