Tutorial On TI C6678
TMS320C6678 (Shannon)
DSP Training
Brighton Feng
November, 2010
[Block diagram: C66x core with L1 D, L1 P and L2 Memory; Navigator; Peripherals and I/O (sRIO, TSIP, Flash, PCIe, UART, SPI, I2C); TeraNet 2; Hyperlink50; Multicore Memory System with Multicore Shared Memory Controller, Shared Memory and 64b DDR-3; Crypto/IPSec CoProcessor; Packet CoProcessor; Enet Switch with two SGMII ports]
• Navigator
– Multicore eco system
• Packet Infrastructure
• Network Coprocessor
– IP Network solution for IPv4/6
– 1.5M packets per sec (1Gb Ethernet wire-rate)
– IPsec, SRTP, Air Interface Encryption fully offloaded
• 3-port GigE Switch (Layer 2)
• Low Power Consumption
– Adaptive Voltage Scaling (SmartReflex™)
• Hyperlink 50
– 50G Expansion port
– Transparent to Software
• Multicore Debugging
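The "1.5M packets per sec (1Gb Ethernet wire-rate)" figure can be checked with a short calculation. The sketch below assumes standard Ethernet framing overhead (8 bytes of preamble and 12 bytes of inter-frame gap per frame), which the slide does not state; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-frame overhead on the wire: 8-byte preamble + 12-byte inter-frame gap. */
#define ETH_OVERHEAD_BYTES 20u

/* Maximum frames per second on a link of link_bps bits/s for a given frame size. */
static uint64_t wire_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
    assert(frame_bytes > 0u);
    /* Bits occupied on the wire by one frame, including the assumed overhead. */
    uint64_t bits_per_frame = (uint64_t)(frame_bytes + ETH_OVERHEAD_BYTES) * 8u;
    return link_bps / bits_per_frame;
}
```

With minimum-size 64-byte frames on a 1 Gb/s link, `wire_rate_pps(1000000000, 64)` gives about 1.49 million packets per second, which matches the rounded 1.5M figure.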
Copyright © 2010 Texas Instruments. All rights reserved.
Enhanced DSP core
• C66x
– Performance improvement
– Increased fixed and floating point capability
• C64x+
– SPLOOP and 16-bit instructions for smaller code size
– Flexible level one memory architecture
– iDMA for rapid data transfers between local memories
• C67x+
– 2x registers
– Enhanced floating-point add capabilities
• C64x
– Advanced fixed-point instructions
– Four 16-bit or eight 8-bit MACs
– Two-level cache
• C67x
– Native instructions for IEEE 754, SP&DP
– Advanced VLIW architecture
C66x Core
[Block diagram: Instruction Fetch, SPLOOP Buffer, Instruction Dispatch and Instruction Decode feed Data Path 1 (units L1, S1, M1, D1) and Data Path 2 (units D2, M2, S2, L2); Control Registers, Interrupt Control and In-Circuit Emulation sit alongside. Labeled widths: 256 Bits, 2x64 Bits]
[Figure: shared L2 memory architecture. Labels: Core 0 … Core N (S and M ports), Shared L2 Control, Shared L2 RAM, L2 ROM, DMA, Internal SCR, External Memory, EDMA (M). Path widths: 256 bit at (Core speed)/2 and 128 bit at (Core speed)/3]
[Figure: shared memory architecture. Labels: Core 0 … Core N, Shared Memory Control, Shared L2 RAM, DMA, Internal SCR, External Memory, EDMA (M). Path widths: 256 bit at (Core speed)/2 and 128 bit at (Core speed)/3]
• The L2 memory controller extends the MAR registers with a “PFX” field. The L2 memory controller uses this bit to tell the XMC whether a given address range is prefetchable.
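As an illustration of the PFX field, the sketch below computes which MAR register covers an address and how its value changes when prefetchability is toggled. The bit positions (PC in bit 0, PFX in bit 3), the 16 MB range per MAR and the register base address are assumptions recalled from the C66x CorePac documentation and should be verified against the user guide; all function names are hypothetical, and no hardware register is touched.

```c
#include <assert.h>
#include <stdint.h>

#define MAR_BASE_ADDR   0x01848000u  /* assumed address of MAR0 */
#define MAR_COUNT       256u         /* assumed: one MAR per 16 MB of a 4 GB space */
#define MAR_RANGE_SHIFT 24u          /* assumed: 16 MB covered by each MAR */
#define MAR_PC_BIT      (1u << 0)    /* assumed: range is cacheable */
#define MAR_PFX_BIT     (1u << 3)    /* assumed: range is prefetchable by the XMC */

/* Index of the MAR register that covers a 32-bit logical address. */
static uint32_t mar_index(uint32_t addr)
{
    return addr >> MAR_RANGE_SHIFT;
}

/* Address of MAR[index], under the base-address assumption above. */
static uint32_t mar_register_address(uint32_t index)
{
    assert(index < MAR_COUNT);
    return MAR_BASE_ADDR + 4u * index;
}

/* Return mar_value with the PFX bit set or cleared; other bits are preserved. */
static uint32_t mar_set_prefetchable(uint32_t mar_value, int prefetchable)
{
    return prefetchable ? (mar_value | MAR_PFX_BIT) : (mar_value & ~MAR_PFX_BIT);
}
```

On the device, the resulting value would be written to the register at `mar_register_address(mar_index(addr))` from supervisor code; the write itself is left out so that the sketch stays host-runnable.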
[Figure: MSMC block diagram. Labels: N CGEM cores, each with a 256-bit CGEM Slave Port; MSMC Core; MSMC Datapath; System Slave Port for shared SRAM (SMS) and System Slave Port for external memory (SES), each VBUSM 256 with a Memory Protection and Extension Unit (MPAX); arbitration for RAM banks; SRAM banks, 256 bits per bank; EDC for SRAM; SCR]
• One slave interface per C66x Megamodule (256 bits @ CPUCLK/2)
– Uses a 36-bit address extended inside a C66x Megamodule core
• Two slave interfaces (256 bits @ CPUCLK/2) for access from system masters
– SMS interface for accesses to MSMC SRAM space
– SES interface for accesses to DDR3 space
– Both interfaces support memory protection and address extension
• One master interface (256 bits @ CPUCLK/2) for access to the DDR3 EMIF
• One master interface (256 bits @ CPUCLK/2) for access to system slaves
[Figure labels: MSMC System Master Port (VBUSM 256) to SCR; MSMC EMIF Master Port (VBUSM 256) to EMIF – 64 bit and DDR3; events]
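The interface widths and clock ratios above translate directly into peak bandwidth. The sketch below works that arithmetic; the 1 GHz core clock used in the usage note is an assumed example value, not a figure from the slides, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Peak bytes per second of a bus that moves bus_bits once per bus clock,
 * where the bus clock is the CPU clock divided by divider. */
static uint64_t bus_peak_bytes_per_sec(uint32_t bus_bits, uint64_t cpu_hz, uint32_t divider)
{
    assert(divider != 0u);
    assert(bus_bits % 8u == 0u);
    return (uint64_t)(bus_bits / 8u) * (cpu_hz / divider);
}
```

For a 256-bit interface at CPUCLK/2 and an assumed 1 GHz CPU clock, `bus_peak_bytes_per_sec(256, 1000000000, 2)` returns 16,000,000,000 bytes per second, that is 16 GB/s per interface. This is a theoretical peak; arbitration and bank conflicts reduce what is achieved in practice.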
MSMC Shared Memory
• 4 banks x 2 sub-banks; each sub-bank is 256 bits wide
– Reduces conflicts between C66x Megamodule cores and system masters
• Features dynamic fair-share bank arbitration for each transfer
• Supports bandwidth management, which avoids indefinite starvation of lower-priority requests by higher-priority requests
Features Not Supported
• Cache coherency between L1/L2 caches in C66x Megamodule cores and MSMC memory
• Cache coherency between XMC prefetch buffers and MSMC memory
• C66x Megamodule to C66x Megamodule cache coherency via MSMC
[Figure: MPAX mapping of the CGEM 32-bit logical memory map (0000_0000 to FFFF_FFFF) onto the 36-bit system address space (Lower 4GB from 0:0000_0000 to 0:FFFF_FFFF, Upper 60GB from 1:0000_0000)
Segment 0: BADDR = 00000h; RADDR = 000000h; Size = 2GB (logical 0000_0000 to 7FFF_FFFF maps to 0:0000_0000 to 0:7FFF_FFFF)
Segment 1: BADDR = 80000h; RADDR = 800000h; Size = 2GB (logical 8000_0000 to FFFF_FFFF maps to 8:0000_0000 to 8:7FFF_FFFF)
Segments 2 to 15: Disabled
Logical 0000_0000 to 0BFF_FFFF is not remappable; MSMC RAM occupies 0:0C00_0000 to 0:0C1F_FFFF]
[Figure: CGEM 32-bit memory map with Segment 0 (BADDR = 00000h; RADDR = 000000h; Size = 2GB), MSMC RAM Alias 1 at 20xx_xxxx and MSMC RAM Alias 2 at 21xx_xxxx; other labels: 0:0C00_0000, 0:5004_2xxx]
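The segment values in the memory-map figure above (for example BADDR = 80000h, RADDR = 800000h, Size = 2GB) suggest how a 32-bit logical address is extended to 36 bits. The following is a minimal sketch of that translation, assuming BADDR holds logical address bits 31:12 and RADDR holds system address bits 35:12, which is consistent with the figure but should be confirmed in the device documentation. The struct and function names are hypothetical, and permission checks are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define MPAX_NO_MATCH UINT64_MAX  /* returned when the segment does not cover the address */

typedef struct {
    uint32_t baddr;   /* logical base address >> 12 (assumed encoding) */
    uint32_t raddr;   /* 36-bit replacement address >> 12 (assumed encoding) */
    uint64_t size;    /* segment size in bytes, a power of two */
    int      enabled; /* 0 models a "Disabled" segment */
} mpax_segment_t;

/* Translate a 32-bit logical address through one segment. */
static uint64_t mpax_translate(const mpax_segment_t *seg, uint32_t logical)
{
    assert(seg->size != 0u && (seg->size & (seg->size - 1u)) == 0u);
    uint64_t offset_mask = seg->size - 1u;
    uint64_t base = (uint64_t)seg->baddr << 12;

    if (!seg->enabled)
        return MPAX_NO_MATCH;
    /* The address bits above the segment size must match the base. */
    if (((uint64_t)logical & ~offset_mask) != (base & ~offset_mask))
        return MPAX_NO_MATCH;
    /* Replace the upper bits with RADDR and keep the offset within the segment. */
    return (((uint64_t)seg->raddr << 12) & ~offset_mask) | ((uint64_t)logical & offset_mask);
}
```

With Segment 1 from the figure, logical address 8000_1000 translates to 8:0000_1000, while Segment 0 maps logical 0C00_0000 to 0:0C00_0000 unchanged.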
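[Figure: App.out (code and read/write data) appears at the same place in CGEM address space (1) through CGEM address space (n); in the SoC address space there is one App.out image with separate data areas Data 0, Data 1 and Data 2]
[Figure: MPAX maps data3 in virtual address space (1) through virtual address space (n) onto the SoC address space (labels: data2, data3)]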
[Figure: three C6000 cores (Core 0, Core 1, Core 2), each with L1 Data and L1 Prog memories]
[Figure: debug and trace on the device block diagram. Labels: Debug, EDMA, ETB, TRACE, Data Visualization; C66x core with L1 D, L1 P and L2 Memory; Navigator; Peripherals and I/O (sRIO, TSIP, Flash, PCIe, UART, SPI, I2C); TeraNet 2; Hyperlink50; Multicore Memory System with Multicore Shared Memory Controller, Shared Memory and 64b DDR-3; Crypto/IPSec CoProcessor; Packet CoProcessor; Enet Switch with two SGMII ports]
[Figure: interconnect diagram. Labels: CPU/2 256b VBUSM SCR with masters VUSR, EDMA_0 (16ch DMA, TC0 and TC1) and slaves Shared L2 RAM, MSMC_SS, XMC, DDR3; CPU/3 128b VBUSM SCR with EDMA_1,2 (64ch DMA each, TC2 to TC9), GEM masters and slaves, SRIO, PA_SS, QM_SS, PCIe, TSIP 0,1, EMIF16; CPU/3 32b CONFIG VBUSP SCR]
[Figure: Queue Manager subsystem. Labels: Queue Manager with queue interfaces Q0, Q1 … Qx; (Internal) Link RAM; Descriptor RAM; Buffer Memory; Packet DMA; two APDSPs; Queue Interrupts; Queue Events; VBUS]
… bytes in length.
• 20 configurable memory regions (for descriptor storage), Region 0 to Region 19
• The number of elements in a region must be a power of 2, from 32 buffers to 4096 buffers.
[Figure: memory regions up to Region 19, shown as a sequence of 256 byte buffers]
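The rule on region size (a power of 2 between 32 and 4096 elements) is easy to check before programming a region. A minimal sketch follows; the function names are hypothetical, and the 256-byte element size used in the usage note is only the value shown in the figure.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero if count is a power of two in the range 32..4096 inclusive. */
static int region_count_is_valid(uint32_t count)
{
    return count >= 32u && count <= 4096u && (count & (count - 1u)) == 0u;
}

/* Total bytes occupied by a region of count elements of elem_bytes each. */
static uint32_t region_size_bytes(uint32_t elem_bytes, uint32_t count)
{
    assert(region_count_is_valid(count));
    return elem_bytes * count;
}
```

For example, a region of 32 buffers of 256 bytes occupies `region_size_bytes(256, 32)`, that is 8192 bytes, while a count of 48 is rejected because it is not a power of 2.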
Linking RAM
• Linking RAM contains one entry for each descriptor. A Linking RAM entry is effectively an extension of the descriptor.
• Linking RAM stores the forward pointer that is critical for the PUSH / POP operations performed by the Queue Manager.
• Linkage between the physical address of a descriptor and the physical address of its Linking RAM entry is performed inside the QM using information provided in the Queue Management configuration registers.
[Figure: Linking RAM as a Forward Pointer Table (example entries 17, 5, 19, x) next to the Queue Contents of Queue 0 and Queue 1]
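The role of the forward pointer can be illustrated with a small software model. This is a sketch of the idea only, not the Queue Manager's actual implementation: each descriptor has one linking entry holding the index of the next descriptor in its queue, and a queue only stores head and tail indices. All names, the table size and the end marker are hypothetical, and the address-to-index formula is an assumption about how region base, element size and start index could be combined.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DESC 32u          /* hypothetical number of descriptors */
#define LINK_END 0xFFFFFFFFu  /* hypothetical "no next descriptor" marker */

/* One forward pointer per descriptor, indexed by descriptor index. */
static uint32_t linking_ram[NUM_DESC];

typedef struct { uint32_t head, tail, count; } queue_t;

static void queue_init(queue_t *q) { q->head = q->tail = LINK_END; q->count = 0u; }

/* Map a descriptor address to its linking entry index (assumed formula). */
static uint32_t desc_to_index(uint32_t region_base, uint32_t desc_size,
                              uint32_t first_index, uint32_t desc_addr)
{
    assert(desc_size != 0u && desc_addr >= region_base);
    return first_index + (desc_addr - region_base) / desc_size;
}

/* PUSH to tail: the old tail's forward pointer is updated to the new descriptor. */
static void queue_push(queue_t *q, uint32_t desc)
{
    assert(desc < NUM_DESC);
    linking_ram[desc] = LINK_END;
    if (q->count == 0u) q->head = desc; else linking_ram[q->tail] = desc;
    q->tail = desc;
    q->count++;
}

/* POP from head: the forward pointer of the popped descriptor becomes the new head. */
static uint32_t queue_pop(queue_t *q)
{
    uint32_t desc = q->head;
    if (q->count == 0u) return LINK_END;
    q->head = linking_ram[desc];
    if (--q->count == 0u) q->tail = LINK_END;
    return desc;
}
```

Pushing descriptors 17, 5 and 19 onto one queue leaves entry 17 pointing to 5 and entry 5 pointing to 19; popping then returns them in the same order. Because the links live in a separate table, a descriptor can belong to only one queue at a time in this model.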
[Figure: transmit flow between the Host Processor, the Queue Manager (Free Descriptor Queue, Tx Queue, Rx Queue), the Tx Port and the Rx Port
TX 3: Port transmits the buffer being pointed to by the descriptor
TX 4: Port posts the Packet Descriptor to the return Queue]
[Figure: receive flow between the Host Processor, the Queue Manager (Free Descriptor Queue, Tx Queue, Rx Queue), the Tx Port and the Rx Port
INIT: Host allocates Rx Free Descriptors and initializes queues
RX 3: Not Empty Level Status Interrupt Generator
RX 4: Host Processor is interrupted according to pacing rules, or polls; the descriptor is optionally prefetched to L2 prior to interrupting]
[Figure: DSP Core X and DSP Core Y, each with L1 Cache, L2 RAM and L2 Cache, connected through the Data Switch Fabric Center to DDR2 SDRAM]
[Figure: the same two cores exchanging data through Shared L2 or DDR]
[Figure: EDMA moving Data between the L2 RAM of Core X and the L2 RAM of Core Y]
[Figure: Packet DMA moving data from a source queue (Src Que) to a destination queue (Dst Que)]
[Figure: Queue Manager between Core X and Core Y]
Shared Queue
• Core X pushes data to the Shared Queue; Core Y pops data from the Shared Queue.
• Multiple cores can access the Shared Queue simultaneously without mutual exclusion.
• Software needs to maintain cache coherency.
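Why software has to maintain cache coherency around a shared queue can be shown with a toy model. The sketch below is not device code: it models each core's cache as a private copy of one shared buffer, and every function name is hypothetical (the real operations would be the cache write-back and invalidate calls of whatever library is used on the device).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_WORDS 4u

/* Stands in for a buffer in shared memory (shared L2 or DDR). */
static uint32_t shared_mem[BUF_WORDS];

/* Toy write-back cache: a private copy of the buffer plus a valid flag. */
typedef struct { uint32_t line[BUF_WORDS]; int valid; } core_cache_t;

/* Load the buffer into the cache on first access. */
static void cache_fill(core_cache_t *c)
{
    if (!c->valid) { memcpy(c->line, shared_mem, sizeof c->line); c->valid = 1; }
}

static uint32_t core_read(core_cache_t *c, uint32_t i)
{
    assert(i < BUF_WORDS);
    cache_fill(c);
    return c->line[i];          /* may be stale if another core wrote since the fill */
}

static void core_write(core_cache_t *c, uint32_t i, uint32_t v)
{
    assert(i < BUF_WORDS);
    cache_fill(c);
    c->line[i] = v;             /* stays in the cache until written back */
}

/* Producer side: make cached writes visible in shared memory before the push. */
static void cache_writeback(const core_cache_t *c)
{
    if (c->valid) memcpy(shared_mem, c->line, sizeof c->line);
}

/* Consumer side: drop the cached copy after the pop so the next read refetches. */
static void cache_invalidate(core_cache_t *c) { c->valid = 0; }
```

In this model, Core X must call `cache_writeback` before pushing the descriptor and Core Y must call `cache_invalidate` after popping it; if either step is skipped, Core Y keeps reading the old contents even though the queue handed over the right buffer.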
Outline
C6678 DSP Overview
Multi-core DSP programming
Interconnection and resource sharing
Interconnection Architecture
Shannon Hardware queue
Inter-core communication
Shared Resource Management
Peripherals overview
[Figure: SRIO topologies for connecting multiple C6678 DSPs: Ring, Switch, Chain and Mesh]