Floating Point DSPs by Bhaskar
Floating-point/integer multiplier
The multiplier performs single-cycle
multiplications on 32-bit floating-point and
24-bit integer values; the results are 40-bit
floating-point and 32-bit integer values, respectively.
Arithmetic logic unit (ALU)
The ALU performs single-cycle operations on
32-bit integer, 32-bit logical, and 40-bit
floating-point data (Input 24-bit integer and
32-bit floating point)
32-bit barrel shifter
The barrel shifter is used to shift up to 32
bits left or right in a single cycle.
Internal buses (CPU1/CPU2 and REG1/REG2)
CPU buses and register file buses
CPU buses (CPU1 and CPU2): two 32-bit buses
Register file buses (REG1 and REG2): two 40-bit buses
Auxiliary register arithmetic units (ARAUs)
Two auxiliary register arithmetic units
(ARAU0 and ARAU1) can generate two
addresses in a single cycle. They support
addressing with displacements, index
registers (IR0 and IR1), and circular and bit-
reversed addressing.
CPU register file
Dr. M. Bhaskar, Professor, ECE, NIT, Trichy-15
’C3x CPU Register file
The ’C3x provides 28 registers in a multiport register file that is tightly coupled to the CPU.
Extended-precision registers (R7-R0): 40-bit (all other registers are 32-bit)
Registers R7 – R0 can be operated upon by the multiplier and ALU and can be used as
general-purpose registers.
Auxiliary registers (AR7-AR0)
Index registers (IR1 and IR0)
Block size register (BK)
The auxiliary, index, and block size registers are used for indirect addressing mode
Data page pointer (DP)
Used for direct addressing mode
System stack-pointer (SP)
Used for stack management
Status register (ST)
Holds global information about the state of the CPU
CPU/DMA interrupt-enable register (IE)
CPU interrupt flag register (IF)
Used for interrupts
I/O flag register (IOF)
Used for I/O activity
Repeat start-address register (RS)
Repeat end-address register (RE)
Repeat count register (RC)
Used for repeat operations
’C3x Memory and Buses
Internal Buses
All address buses are 24 bits wide and all data buses are 32 bits wide
Program buses: PADDR and PDATA
Data buses: DADDR1, DADDR2, and DDATA
DMA buses: DMAADDR and DMADATA
External bus : External address bus and External data bus
The instruction cache contains 64 x 32-bit words of RAM. It is divided into two 32-word segments
(Segment 0 and Segment 1). A 19-bit segment start address (SSA) register is associated with each
segment. For each word in the cache, there is a corresponding single-bit present (P) flag.
Cache Control Bits
Cache Clear Bit (CC), Cache Enable Bit (CE) and Cache Freeze Bit (CF).
Cache Algorithm
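Cache Hit - The cache contains the instruction word (IW). The IW is read from the cache and the LRU stack number is changed.
Cache Miss - The cache does not contain the IW. There are two types of cache miss:
Sub-segment Miss - An SSA register matches, but the P flag is not set. The IW is copied from memory to the
cache, the P flag is set, and the LRU stack number is changed.
Segment Miss - Neither SSA register matches. The SSA register of the least recently used segment is loaded with the
19 MSBs of the program address (PA) and the P flags of that segment are cleared. The IW is then copied from
memory to the cache, its P flag is set, and the LRU stack number is changed.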
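The cache algorithm above can be sketched as a small Python model. This is only an illustration, not the hardware implementation: the class name `InstructionCache`, the `fetch` method, and the use of a dictionary for memory are assumptions made for the example, and the split of a 24-bit address into 19 MSBs (segment start address) and 5 LSBs (word within the 32-word segment) is inferred from the sizes given above.

```python
class InstructionCache:
    """Toy model of a cache with two 32-word segments (names are hypothetical)."""

    SEGMENTS = 2
    WORDS_PER_SEGMENT = 32

    def __init__(self):
        self.ssa = [None] * self.SEGMENTS  # segment start address registers
        self.present = [[False] * self.WORDS_PER_SEGMENT for _ in range(self.SEGMENTS)]
        self.words = [[0] * self.WORDS_PER_SEGMENT for _ in range(self.SEGMENTS)]
        self.lru = [0, 1]  # lru[0] = most recently used, lru[-1] = least recently used

    def _touch(self, seg):
        # Move the segment to the top of the LRU stack.
        self.lru.remove(seg)
        self.lru.insert(0, seg)

    def fetch(self, address, memory):
        tag = (address >> 5) & 0x7FFFF  # 19 MSBs of the 24-bit address
        offset = address & 0x1F         # word position inside the segment
        if tag in self.ssa:
            seg = self.ssa.index(tag)
            outcome = "hit" if self.present[seg][offset] else "sub-segment miss"
        else:
            seg = self.lru[-1]          # replace the least recently used segment
            self.ssa[seg] = tag
            self.present[seg] = [False] * self.WORDS_PER_SEGMENT
            outcome = "segment miss"
        if outcome != "hit":
            self.words[seg][offset] = memory[address]  # copy the IW from memory
            self.present[seg][offset] = True           # set the P flag
        self._touch(seg)
        return self.words[seg][offset], outcome
```

Fetching the same address twice gives a segment miss followed by a hit; fetching a neighbouring address in the same segment gives a sub-segment miss.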
Floating point processor - Data formats
Unsigned-Integer Formats
Short Unsigned-Integer Format
It is a 16-bit integer format. This format can be zero-filled to 32 bits.
The range of a short unsigned integer is 0 ≤ si ≤ 2^16 - 1.
Floating-Point Formats
Floating-point formats consist of an exponent field (e) and a mantissa field (man). The
mantissa field has a single-bit sign field (s) and a fraction field (f).
The exponent field is a 2s-complement number that determines the factor of 2 by which
the number is multiplied.
The mantissa is a signed, normalized 2s-complement number. In the
normalized representation, the most significant non-sign bit is implied rather than stored.
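The way the exponent, sign, and fraction fields combine into a value can be sketched in Python. The field widths are not given on the slide, so this example assumes the 32-bit single-precision layout (8-bit exponent in the top bits, then the sign bit, then a 23-bit fraction) and assumes that the most negative exponent is reserved for zero. The function name `decode_c3x_float` is hypothetical.

```python
def decode_c3x_float(word):
    """Decode a 32-bit word assumed to be laid out as [exponent:8 | sign:1 | fraction:23]."""
    e = (word >> 24) & 0xFF
    if e >= 0x80:
        e -= 0x100                 # exponent is a 2s-complement number
    s = (word >> 23) & 1
    f = word & 0x7FFFFF
    if e == -128:
        return 0.0                 # assumption: this exponent encodes zero
    # The implied non-sign bit gives 01.f for s = 0 and 10.f (that is, -2 + 0.f) for s = 1.
    mantissa = (-2.0 if s else 1.0) + f / 2**23
    return mantissa * 2.0**e
```

Under these assumptions, `decode_c3x_float(0x00400000)` gives 1.5 and `decode_c3x_float(0xFF800000)` gives -1.0.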
Register Addressing
Immediate Addressing
Direct Addressing
Indirect Addressing
General indirect addressing
Indexed addressing
Circular addressing mode
Bit-reversed addressing mode
PC-Relative Addressing
Register Addressing
The CPU registers contain the operands, both source and destination.
Syntax : ADDF R1,R2 - floating-point contents of R1 and R2 are added, result in R2
SUBI R2,R1 - fixed-point content of R2 is subtracted from R1, result in R1
MPYI3 R1,R2,R3 - fixed-point contents of R1 and R2 are multiplied, result in R3
Immediate Addressing
If the operand fits in 16 bits, it is short immediate (SI) addressing mode; if it needs more than 16 and up to
24 bits, it is long immediate (LI) addressing mode. No symbol is required for the operand.
Syntax : ADDI 1000h,R2 - immediate operand 1000h added to R2, result in R2 (SI)
MPYI 11111h,R3 - immediate operand 11111h multiplied with R3, result in R3 (LI)
Direct Addressing
The ’C3x memory space is divided into 256 pages, and each page has 64K words (64K x 32 bits).
The data address is formed by the concatenation of the eight LSBs of the data-page
pointer (DP) with the 16 LSBs of the instruction word.
The symbol used to specify the operand is @. Assume DP = 20h.
Syntax : ADDF @1200h,R2 - The floating-point content of page 32, location 1200h is added to register R2, result in R2
ADDI @0200h,R3 - The fixed-point content of page 32, location 0200h is added to register R3, result in R3
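The address concatenation used in direct addressing can be checked with a short Python sketch. The function name `direct_address` is made up for this example; it only models the bit arithmetic described above.

```python
def direct_address(dp, instruction_word):
    """Concatenate the 8 LSBs of DP with the 16 LSBs of the instruction word."""
    page = dp & 0xFF                      # eight LSBs of the data-page pointer
    offset = instruction_word & 0xFFFF    # 16 LSBs of the instruction word
    return (page << 16) | offset          # 24-bit data address
```

With DP = 20h, the operand @1200h refers to address 201200h and @0200h refers to address 200200h.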
Floating point processor Addressing Modes cont…
Indirect Addressing
Indirect addressing mode instruction format
The current content of ARn is used as the data memory address (dma). If a displacement of up to ±255 dma
locations (an 8-bit value) is applied to it, the mode is general indirect addressing.
Syntax:
LDI 1200h,AR2 - load the address 1200h into AR2
ADDI *AR2(0),A – access the data using the content of AR2 as dma with zero displacement
ADDI *+AR2(6),A - access the data using the content of AR2+disp (1206h) as dma
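The effective address for indirect addressing with a displacement can be sketched as follows. This is an illustration only: the function name `indirect_address` is hypothetical, and the example assumes an 8-bit displacement and 24-bit addresses.

```python
def indirect_address(arn, disp=0, subtract=False):
    """Effective address for ARn plus or minus a displacement (ARn itself is not modified)."""
    if not 0 <= disp <= 0xFF:
        raise ValueError("displacement is assumed to fit in 8 bits")
    ea = arn - disp if subtract else arn + disp
    return ea & 0xFFFFFF  # keep the result within a 24-bit address
```

With AR2 = 1200h, a displacement of 0 gives 1200h and a displacement of 6 gives 1206h, as in the syntax lines above.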
Circular Addressing
Many DSP algorithms, such as convolution and correlation, require a circular buffer in
memory.
In convolution and correlation, the circular buffer acts as a sliding window that contains
the most recent data to process.
As new data is brought in, it overwrites the oldest data, and the pointer to the data
advances through the buffer in counter-clockwise fashion.
When the pointer reaches the end of the buffer, the device sets the pointer back to the
beginning of the buffer.
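The sliding-window behaviour of a circular buffer can be sketched in Python. This models only the wrap-around idea; it does not reproduce the exact ’C3x address arithmetic (for example, any buffer alignment rules). The class `CircularBuffer` and its methods are hypothetical names, with `size` playing the role of the block size register (BK).

```python
class CircularBuffer:
    """Fixed-size window holding the most recent samples."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.size = size   # plays the role of BK
        self.index = 0     # position where the next sample is written

    def push(self, sample):
        # The new sample overwrites the oldest one, then the pointer advances.
        self.data[self.index] = sample
        # On reaching the end of the buffer, wrap back to the beginning.
        self.index = (self.index + 1) % self.size

    def newest_first(self):
        # Read the window from the most recent sample to the oldest.
        return [self.data[(self.index - 1 - k) % self.size] for k in range(self.size)]
```

After pushing 1, 2, 3, 4 into a buffer of size 3, the window read newest first is 4, 3, 2; the sample 1 has been overwritten.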
Floating point processor Addressing Modes cont…
The status register (ST) contains global information about the state of the CPU.
Operations usually set the condition flags of the status register according to whether the
result is 0, negative, etc.
C Carry flag
V Overflow flag
Z Zero flag
N Negative flag
UF Floating-point underflow flag
LV Latched overflow flag
LUF Latched floating-point underflow flag
OVM Overflow mode flag
RM Repeat mode flag
CE Cache enable
CF Cache freeze
CC Cache clear
GIE Global interrupt-enable
c) Based on flags
NN - Non-negative
N - Negative
NZ - Nonzero
Z - Zero
NV - No overflow
V - Overflow
NUF - No underflow
UF - Underflow
NC - No carry
C - Carry
NLV - No latched overflow
LV - Latched overflow
NLUF - No latched floating-point underflow
LUF - Latched floating-point underflow
ZUF - Zero or floating-point underflow
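How a flag-based condition is evaluated against the status register flags can be sketched in Python. The dictionary of flags and the names `CONDITIONS` and `condition_true` are assumptions for illustration; each entry simply tests the flag that the condition name refers to.

```python
# Each condition maps to a test on the status flags (flag values are 0 or 1).
CONDITIONS = {
    "NN": lambda f: not f["N"],
    "N": lambda f: f["N"],
    "NZ": lambda f: not f["Z"],
    "Z": lambda f: f["Z"],
    "NV": lambda f: not f["V"],
    "V": lambda f: f["V"],
    "NUF": lambda f: not f["UF"],
    "UF": lambda f: f["UF"],
    "NC": lambda f: not f["C"],
    "C": lambda f: f["C"],
    "NLV": lambda f: not f["LV"],
    "LV": lambda f: f["LV"],
    "NLUF": lambda f: not f["LUF"],
    "LUF": lambda f: f["LUF"],
    "ZUF": lambda f: f["Z"] or f["UF"],
}


def condition_true(cond, flags):
    """Return True if the named condition holds for the given flag values."""
    return bool(CONDITIONS[cond](flags))
```

For example, after an operation whose result is zero (Z = 1, all other flags 0), the conditions Z, ZUF, and NN are true and NZ is false.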
Un-conditional instructions
BR, CALL and RET
Conditional instructions
B cond , CALL cond , RET cond, LDF cond, LDI cond, TRAP cond
Delayed conditional instructions
BRD, B cond D
No multi-conditional instructions
No execute conditional instruction
The ’C3x supports multiple internal and external interrupts, which can be used for a
variety of applications
Interrupt location
The interrupts INT0 through TINT1 can be used by either the CPU or the DMA controller.
Interrupt-trap table pointer (ITTP): allows the relocation of the interrupt and trap vector tables
Generation of Interrupt or trap vector address
The ITTP bit field dictates the starting location (base) of the interrupt-trap vector table.
This base address is formed by left shifting by eight bits the value of the ITTP bit field.
This shifted value is called the effective base address and is referenced as EA[ITTP].
The location of an interrupt or trap vector is given by the addition of the effective base
address formed by the ITTP bit field (EA[ITTP]) and the offset of the interrupt or trap vector in
the interrupt trap vector table.
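The vector address calculation described above can be written as a short Python sketch. The function name `vector_address` is hypothetical, and the sample ITTP value and offset used below are arbitrary numbers chosen for illustration, not values from a real vector table.

```python
def vector_address(ittp, offset):
    """Return EA[ITTP] + offset, where EA[ITTP] is the ITTP field shifted left by 8 bits."""
    effective_base = ittp << 8       # EA[ITTP]
    return effective_base + offset   # location of the interrupt or trap vector
```

For example, an ITTP value of 1234h gives an effective base address of 123400h, so a vector at offset 01h is located at 123401h.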
Repeat Operations
Pipeline phases
Fetch - Fetches the instruction words from memory & updates the program counter (PC).
Decode - Decodes the instruction word and performs address generation. Also, the
decode unit controls modification of the ARn registers in the indirect
addressing mode and of the stack pointer when PUSH to/POP from the stack
occurs.
Read - If required, reads the operands from memory.
Execute - If required, reads the operands from the register file, performs the necessary
operation, and writes results to the register file. If required, results of previous
operations are written to memory.
DMA - Direct memory access activity through DMA bus
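The overlap of the four CPU pipeline phases can be visualized with a small Python sketch. It models an ideal pipeline with no conflicts, where each instruction advances one phase per cycle; the names `STAGES` and `pipeline_schedule` are made up for this example, and the DMA activity is left out.

```python
STAGES = ["Fetch", "Decode", "Read", "Execute"]


def pipeline_schedule(instructions):
    """Return one row per cycle mapping each stage to the instruction in it (or None)."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            i = cycle - depth  # instruction i enters stage `depth` at cycle i + depth
            row[stage] = instructions[i] if 0 <= i < len(instructions) else None
        rows.append(row)
    return rows
```

With three instructions I1, I2, I3, the fourth cycle has I1 in Execute, I2 in Read, and I3 in Decode, so up to four instructions are in flight once the pipeline is full.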
Pipeline Conflicts
Pipeline conflicts in the ’C3x can be grouped into the following categories:
Branch conflicts
Branch conflicts involve most of those instructions or operations that read and/or modify
the PC.
Register conflicts
Register conflicts involve delays that can occur when reading from, or writing to,
registers that are used for address generation.
Memory conflicts
Memory conflicts occur when the internal units of the ’C3x compete for memory
resources.
On-chip peripherals
Timers
The ’C3x has two 32-bit general-purpose timer modules.
The timer modules can be used to signal to the ’C3x or the external world at specified
intervals or to count external events.
Timer pins
Each timer has one associated pin, the timer clock (TCLK) pin.
This pin can be used as a general-purpose I/O signal, as a timer output, or as an input
for an external clock for the timer.
Timer Registers
Each timer has three memory-mapped registers:
Global-control register - determines the operating mode of the timer
Period register - specifies the timer’s signaling frequency
Counter register - contains the current value of the incrementing counter
FUNC     Controls the function of TCLK. FUNC = 0: TCLK is a general-purpose digital I/O port.
         FUNC = 1: TCLK is configured as a timer pin.
I/O      Input/output pin. I/O = 0: TCLK is an input pin. I/O = 1: TCLK is configured as an output pin.
DATOUT   Data output
DATIN    Data input
GO       Resets and starts the timer counter.
HLD      Counter hold signal
GO  HLD  Result
0   0    All timer operations are held. No reset is performed (reset value).
0   1    Timer proceeds from state before write.
1   0    All timer operations are held, including zeroing of the counter. The GO bit is not cleared
         until the timer is taken out of hold.
1   1    Timer resets and starts.
C/P      Clock/pulse mode control. C/P = 1: clock mode, C/P = 0: pulse mode
CLKSRC   Clock source. CLKSRC = 1: internal clock, CLKSRC = 0: external clock
INV Inverter control bit
TSTAT Timer status bit - indicates the status of the timer
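The GO/HLD table above can be expressed as a small Python sketch that returns the counter value and whether the timer is running after GO and HLD are written. The function name `apply_go_hld` and the (counter, running) return form are assumptions for illustration only.

```python
def apply_go_hld(counter, go, hld):
    """Return (new_counter, running) for a write of GO and HLD, following the table."""
    if go and hld:
        return 0, True          # timer resets and starts
    if hld:
        return counter, True    # timer proceeds from state before write
    if go:
        return 0, False         # held, and the counter is zeroed
    return counter, False       # held, no reset is performed
```

For a counter value of 5, writing GO = 1 and HLD = 1 restarts the count from 0, while GO = 0 and HLD = 1 lets the count continue from 5.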
The ’C30 has two totally independent bidirectional serial ports. Both serial ports are
identical, and there is a complementary set of control registers in each one.
You can configure each serial port to transfer 8, 16, 24, or 32 bits of data per word
simultaneously in both directions.
The clock for each serial port can originate either internally, through the serial port
timer, or externally, through a supplied clock.
Serial port pins
CLKR - Receive clock signal & its complement
CLKX - Transmit clock signal & its complement
FSR - Receive frame synchronization signal & its complement
FSX - Transmit frame synchronization signal & its complement
DR - Receive serial data & its complement
DX - Transmit serial data & its complement
The ’C3x resets the AIC through the external pin XF0.
It also generates the master clock for the AIC through the timer 0 output pin, TCLK0.
In turn, the AIC generates the CLKR0 and CLKX0 shift clocks as well as the FSR0
and FSX0 frame synchronization signals.
1. The A/D signals the ’C3x via the A/D’s SYNC signal (connected to the FSR0 pin) that serial data is
to be transmitted.
2. The 32-bit word is then serially transmitted, MSB first, out the SOUTA serial pin of the DSP102 to
the DR0 pin of the ’C3x serial port.
3. The ’C3x is programmed to drive the analog interface bit clock from the CLKX0 pin of the ’C3x.
4. The bit clock drives both the A/D’s and D/A’s XCLK input.
5. The ’C3x transmit clock also acts as the input clock on the receive side of the ’C3x serial port.
6. Since the receive clock is synchronous to the internal clock of the ’C3x, the receive clock can
run at full speed (that is, f(H1)/2).
1. A DMA operation transfers a block of data from external memory to on-chip memory
without passing the data through the CPU.
2. One DMA access reads data from the source address location and writes it to the
destination address location. It consists of one read and one write activity.
3. A DMA access needs two addresses: a source address and a destination address.
4. To count the number of words transferred, a count register is needed.
DMA Signals
BR Bus request signal externally driven low in hold mode to indicate a request for
DMA access.
HOLD External request for control of address, data, and control lines.
HOLDA Indication to external circuitry that the memory address, data, and control lines
are in a high-impedance state, allowing external access.
IAQ Acknowledge BR request for access while HOLDA is low.
Apart from the above signals, the normal parallel port signals are used.
R/W Read/write signal indicates the data bus direction for DMA reads (high) and DMA
writes (low).
A(15-0) Address lines
D(15–0) Data lines
STRB When IAQ and HOLDA are low, STRB selects the memory access and determines
its duration.
’C3x DMA Controller
1. The DMA controller is a programmable peripheral that transfers blocks of data to any
location in the memory map without interfering with CPU operation.
2. The ’C3x can interface to slow, external memories and peripherals without reducing
throughput to the CPU.
3. The ’C3x DMA controller features are:
a) Transfers to and from anywhere in the processor’s memory map. E.g. transfers can be
made to and from on-chip memory, off-chip memory, and on-chip serial ports.
b) Concurrent CPU and DMA controller operation with DMA transfers at the same rate as
the CPU (supported by separate internal DMA address and data buses).
c) Source and destination-address registers with auto increment/decrement.
d) Synchronization of data transfers via external and internal interrupts.
DMA Registers
DMA Control register: contains the status and mode information about the associated
DMA channel
Source-address register: contains the memory address of data to be read
Destination-address register: contains the memory address where data is written
Transfer-counter register: contains the block size to move
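How the source-address, destination-address, and transfer-counter registers work together during a block move can be sketched in Python. This is a software model only, not the controller's actual behaviour: the function `dma_transfer`, its step parameters (standing in for auto increment/decrement), and the list used as memory are assumptions for the example.

```python
def dma_transfer(memory, source, destination, count, src_step=1, dst_step=1):
    """Move `count` words; each access is one read followed by one write."""
    while count > 0:
        word = memory[source]        # read from the source address
        memory[destination] = word   # write to the destination address
        source += src_step           # auto increment (+1) or decrement (-1)
        destination += dst_step
        count -= 1                   # transfer counter counts down to zero
    return source, destination, count
```

For example, with a 16-word memory holding 0 to 15, `dma_transfer(mem, 0, 8, 4)` copies words 0 to 3 into locations 8 to 11 and leaves the counter at zero.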
’C3x DMA Controller cont…
End of Part-7