Physical Synthesis Tutorial
Gord Allan
September 3, 2003
This tutorial is designed to take a simple digital design from RTL through
to a routed layout.
Contents
1 Introduction
1.1 Introduction to UNIX
1.2 Tutorial Installation
1.3 Related Documentation
2 Design Flow
5 Verilog Simulation
5.1 Setting up NC-Verilog
5.2 Simulating a Design
5.3 Waveforms in UNIX simulations
5.3.1 Recording
5.3.2 Viewing with SimVision
5.4 Running Gate-Level Simulations
6 Quick Synthesis
6.1 Scripting Repeated Commands
8 Digital Libraries
8.1 Logical Libraries
8.2 Physical Libraries
8.3 Section Summary
10 Floorplanning
10.1 Power Planning
10.2 Rearranging the Layout
ls                    List the items in the current directory.
cd <dir>              Change to directory <dir>.
cp <source> <dest>    Copy the source file to the destination.
rm <file>             Remove (or delete) <file>.
more <file>           Display the contents of a file, pausing on each page.
lp <file>             Print a file to the standard printer.
man <command>         Give help on any UNIX command, e.g. man ls.
1 Introduction
This tutorial accompanies a set of files which can be obtained from
www.doe.carleton.ca/~gallan/digflow.gz. Together, they document how to take a
sample design, a 16-bit x 8-bit signed multiplier, through the CMC-supported
design flow from RTL description through to layout.
1.3 Related Documentation
The documentation can be divided into the following categories:
Cadence Tools
Online documentation is available via the cdsdoc command. This brings
up a document browser which allows you to select or search for help on any
of the Cadence tools. Selecting a document in the browser will, eventually,
open a Netscape window pointing to the relevant document (see note 2). All
of this documentation is provided in both .html and .pdf form and is
physically located at /CMC/tools/cadence/{tool-stream}/doc/{tool}. Within
cdsdoc, there are many possible libraries. To get access to all relevant
libraries, overwrite the file ~/.cdsdoc/cdsdoc.ini with the one from
digflow/samples/cdsdoc.ini.
Standard Cells
There are two standard cell libraries available to us in the 0.18 um
technology: one from Virtual Silicon Technologies (VST) and one from Artisan.
Shortcuts to the standard cell documentation (.pdfs) are located in
digflow/vstlib and digflow/artlib. More information is available within the
/CMC/kits/cmosp18/... directory structure if necessary.
Technology Parameters
As with the standard cells, a shortcut to the process parameter
documentation is provided in digflow/tech. This file contains all of the
electrical characteristics regarding resistance and capacitance for
different layers and operating conditions.
Synopsys Documentation
If using Synopsys tools, the Synopsys On-Line Documentation (SOLD)
can be accessed by typing the sold command. Within this documentation
there is a very good description of RTL coding styles for proper synthesis,
which applies to both Synopsys and Cadence synthesis tools.
2 If Netscape is slow to start, it will not be pointing to the proper
document when it opens.
2 Design Flow
ASIC design flows vary widely according to the current state of EDA (Electronic
Design Automation) tools and company preferences. The current flow is based
primarily on tools provided by Cadence Design Systems, but where appropriate,
competing tools are mentioned.
In this document we will focus on the steps from RTL Design through to
Global Routing, but for completeness the entire ASIC flow is described.
in order to fully determine whether a design will meet timing and area
requirements, it must be physically laid out. During this step, the basic
floorplan of the chip is described so that the interconnect delays can be
estimated during compilation.
Power Planning - Each cell must be connected to power and ground
along its edges. To protect the chip wiring, the current through any
particular wire must be kept below some threshold. Based on your design's
speed, layout, and toggling activity, power rails must be distributed across
the design so that this limit is not violated.
Compiling - From the generic HW mapping, the tool picks elements from
the digital library and logically arranges them to perform the required
tasks within the timing constraints.
Scan Insertion - If all of a design's flip-flops can be configured to form
a long shift-register, manufacturing faults can be detected. Tools can
automatically place multiplexers at the input of every flip-flop and link
them together into a scan-chain. During normal operation the circuit
is unaffected, but when a test signal is asserted the scan-chain can be
used to isolate manufacturing defects. Synopsys DC, Cadence's PKS,
and Mentor's FastScan can automatically insert the additional circuitry
to allow scan-testing.
Clock Tree Insertion - Ideally the clock signal will arrive at all flip-flops
at the same time. Due to variations in buffering, loading, and interconnect
lengths, however, the clock's arrival is skewed. A clock-tree insertion tool
evaluates the loading and positioning of all clock related signals and places
clock buffers in the appropriate spots to reduce skew to acceptable
levels. Some clock tree insertion tools, all from Cadence, include CTSGen,
ctgen, and CTPKS.
Optimization - After placing the cells, adding scan circuitry and inserting
a clock-tree, the design may no longer meet timing requirements. This
optimization step can restructure logic, re-size cells, and vary cell
placement in order to meet constraints.
Routing - Up until this point, all timing estimates assume that signals can
be routed without the detours that wiring congestion can cause.
After initial optimization, the routing is performed in two steps:
1. Global Routing creates a coarse routing map of the design. It evaluates
which areas are highly congested and plans how signals should go
around those areas. After global routing, the design can be re-timed
using more accurate interconnect data.
2. Final Routing uses the plan from the global route and lays out the
metal tracks and vias to physically connect the cells. Two final
routers are available: WarpRoute and NanoRoute.
Parasitic Extraction - Once the detailed routing tracks are inserted, an
extraction tool is used to more accurately determine the resistance and
capacitance of each net. Two such extraction tools are Fire and Ice
and HyperExtract. These tools can also be used to determine the
cross-coupling capacitance between two signals, which is important when
evaluating signal integrity.
Post-Routing In-Place-Optimization - After importing the parasitic
information (usually in the form of a .rspf file), timing is re-evaluated to
ensure it meets the constraints. At this stage limited changes can be
performed, such as cell re-sizing and net re-routing, in an attempt to close
timing.
Signal Integrity Fixes - If the cross-coupling capacitance between two
signal lines is high, quick transitions on one net can affect the other.
Within the EDA tools, these nets are referred to as victims and
aggressors. Aggressors are characterized by large drivers and quick
transition times, whereas victims possess the opposite characteristics.
Signal integrity violations can be divided into two categories:
1. Crosstalk is caused when a victim and aggressor pair transition at
the same time. The victim may be either sped up (if both signals
transition in the same direction), or delayed (if they transition in
opposite directions). This variation is then taken into account for
either best or worst case timing analysis.
2. Glitching is caused when a transition on the aggressor net causes
a logical change (from 1-to-0 or 0-to-1) on the victim net.
In either case, the signal integrity tool (Cadence's CeltIC) identifies the
victim and aggressor nets for repair. To fix such a violation, buffers can be
inserted, nets can be re-routed, or shielding can be inserted between the
offending nets. After any signal integrity fixes, extraction is re-done and
timing closure must be verified.
Physical Checks - Once timing closure has been assured, various physical
checks are carried out. If any changes are made, extraction should be
re-done and timing re-evaluated:
Antenna Check - During manufacture, when a metal patch is being
deposited, charge builds upon it. If the charge builds faster than it can
be dissipated then a large voltage can develop. If a transistor's
gate is exposed to this large voltage it can be destroyed. This is
referred to as an antenna violation. To prevent this, leakage diodes
can be inserted to drain excess charge, or long metal traces on a
single layer can be avoided.
Layout vs. Schematic (LVS) - The LVS tool extracts the connectivity
information from the routed layout and compares it with the
final logical netlist. An LVS match confirms that errors were not
introduced during the physical layout of the design. Tools to perform
LVS include Cadence's Assura (formerly Diva, formerly Dracula)
and Mentor's Calibre.
Design Rule Checking (DRC) - The design rule check validates that
the spacing and geometry in the design meet the requirements of the
foundry. The same tools used for LVS are used to perform DRC.
3 HDL Coding Guidelines
Many of these items are taken, with permission, from HDL Coding Guidelines,
by Damjan Lampret and Jamil Khatib, June 7, 2001, www.opencores.org
3.1 Description
The guidelines are of different importance, and fall into three classes:
Good practice - signifies a rule that is common good practice and should
be used in most cases. This means that in some cases there are specific
problems that justify violating this rule.
Recommendation - signifies a rule that is recommended. It is uncommon
that a problem cannot be solved without violating this rule.
Strong recommendation - signifies a hard rule; this should be used in all
situations unless a very good reason exists to violate it.
3.2 Resets
Resets make the design deterministic. They prevent the design from reaching
prohibited states and avoid simulation/synthesis mismatches.
Recommendation: All flip-flops should have a reset. This prevents
simulation/synthesis mismatches.
Recommendation: Resets should be active-low. Cell libraries contain
active-low reset flops. Coding them as such prevents the insertion of
unwanted buffering on the reset logic.
Recommendation: Resets should be asynchronous. Most flops have an
asynchronous reset, it maintains compatibility between ASIC and FPGA code,
and it makes debugging easier.
Good Practice: The active-low reset should be applied asynchronously and
de-asserted synchronously.
// reset comes off once when pushbutton is high AND posedge clk
assign rst_an = rst_sn & rst_an_pushbutton;
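As a minimal sketch of how rst_sn might be produced (the tutorial's own circuit is not reproduced here), the module below clears two flops asynchronously from the pushbutton and releases them on the clock. Only rst_an, rst_sn and rst_an_pushbutton appear in the line above; the module name, rst_meta and the two-stage depth are assumptions made for illustration.

```verilog
// Sketch: reset asserted asynchronously, de-asserted synchronously.
// Module name, rst_meta and the two-stage depth are assumptions.
module reset_sync_sketch (clk, rst_an_pushbutton, rst_an);
  input  clk;
  input  rst_an_pushbutton;  // raw active-low reset, asynchronous to clk
  output rst_an;             // conditioned active-low reset for the design

  reg rst_meta;  // first stage, may go metastable on release
  reg rst_sn;    // second stage, release aligned to clk

  // Both stages are cleared immediately when the pushbutton goes low,
  // and are refilled with 1s on successive rising clock edges.
  always @(posedge clk or negedge rst_an_pushbutton)
    if (!rst_an_pushbutton) begin
      rst_meta <= 1'b0;
      rst_sn   <= 1'b0;
    end else begin
      rst_meta <= 1'b1;
      rst_sn   <= rst_meta;
    end

  // reset comes off once when pushbutton is high AND posedge clk
  assign rst_an = rst_sn & rst_an_pushbutton;
endmodule
```

With this arrangement rst_an falls as soon as the pushbutton is pressed, and rises on the second rising clock edge after it is released.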
Strong Recommendation: Active-low, asynchronously reset flops are coded
as follows:
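The listing that belongs here is not available in this copy. The conventional template for such a flop is sketched below; the module and port names (other than rst_an) are hypothetical. The essential points are the negedge rst_an term in the sensitivity list and the reset condition being tested first.

```verilog
// Sketch: active-low, asynchronously reset D flip-flop.
// Module and port names (except rst_an) are hypothetical.
module dff_arst_sketch (clk, rst_an, d, q);
  input  clk;
  input  rst_an;  // active-low asynchronous reset
  input  d;
  output q;
  reg    q;

  always @(posedge clk or negedge rst_an)
    if (!rst_an)
      q <= 1'b0;  // reset takes effect without waiting for a clock edge
    else
      q <= d;     // normal operation: capture d on the rising clock edge
endmodule
```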
3.3 Clocks
Recommendation: Signals that cross between clock domains should be
sampled before and after crossing domains (double sampling is
preferred). This guards against metastability.
Good practice: Use as few clock domains as possible in any design.
Recommendation: Do not use clocks or resets as data or as enables. Do not
use data as a clock or as a reset. Code such as this must be prevented:
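The example that followed is not available in this copy. As a hypothetical illustration, the first module below uses a data signal as a clock, which is the pattern to avoid; the second is one common alternative that keeps a single clock and treats the same signal as a synchronous enable (the two are not cycle-for-cycle equivalent). The third module sketches the double sampling recommended above for a signal arriving from another clock domain. All module and signal names except rst_an are invented for this sketch.

```verilog
// To be avoided: the data signal 'valid' is used as a clock.
module data_as_clock_bad (valid, rst_an, d, q);
  input  valid, rst_an, d;
  output q;
  reg    q;

  always @(posedge valid or negedge rst_an)
    if (!rst_an) q <= 1'b0;
    else         q <= d;
endmodule

// Alternative: one real clock, with 'valid' used as a synchronous enable.
module data_as_enable_sketch (clk, rst_an, valid, d, q);
  input  clk, rst_an, valid, d;
  output q;
  reg    q;

  always @(posedge clk or negedge rst_an)
    if (!rst_an)    q <= 1'b0;
    else if (valid) q <= d;
endmodule

// Double sampling of a signal that originates in another clock domain.
module sync2_sketch (clk_dst, rst_an, d_async, q_sync);
  input  clk_dst;   // clock of the receiving domain
  input  rst_an;
  input  d_async;   // registered in the sending domain
  output q_sync;    // safe to use in the receiving domain
  reg    meta;      // first sample, may be metastable
  reg    q_sync;    // second sample

  always @(posedge clk_dst or negedge rst_an)
    if (!rst_an) begin
      meta   <= 1'b0;
      q_sync <= 1'b0;
    end else begin
      meta   <= d_async;
      q_sync <= meta;
    end
endmodule
```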
Recommendation: Start buses at bit 0. Some tools don't support buses
that don't start at bit 0.
Recommendation: Use MSB to LSB for buses. This avoids misinterpretation
through the design hierarchy.
Good Practice: Compare buses with the same width. The missing bits
may take unexpected values in the comparison.
Strong recommendation: Avoid long if-then-else statements and use a
case statement instead. This prevents the inference of large priority
decoders and makes the code easier to read.
Strong Recommendation: Avoid using internal tri-state signals. They
increase power consumption and make backend tuning more difficult.
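Several of the guidelines above can be seen together in a small sketch. The module below is hypothetical and not part of the tutorial files: its buses are declared MSB to LSB starting at bit 0, a case statement selects between inputs instead of a chain of if-then-else branches, and the comparison is made against a constant of the same width as the bus.

```verilog
// Sketch: 4:1 multiplexer following the bus and case guidelines.
// All names are hypothetical.
module mux4_sketch (sel, a, b, c, d, y, is_max);
  input  [1:0] sel;          // MSB to LSB, starting at bit 0
  input  [7:0] a, b, c, d;
  output [7:0] y;
  output       is_max;
  reg    [7:0] y;

  // Mutually exclusive case items describe a parallel selection,
  // whereas a long if-then-else chain implies priority between branches.
  always @(sel or a or b or c or d)
    case (sel)
      2'b00:   y = a;
      2'b01:   y = b;
      2'b10:   y = c;
      default: y = d;
    endcase

  // Both sides of the comparison are 8 bits wide.
  assign is_max = (y == 8'hFF);
endmodule
```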
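[Figure: Block diagram of the signed multiplier. The 16-bit input A and the
8-bit input B are each registered (EN, RB) and multiplied as signed values;
the 24-bit result is also registered (EN, RB). Clocks, resets, and enables
are common to all registers.]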
Parameterized Behavioural Description: This description and architecture
is equivalent to the sign-extension solution earlier, but in this case the
operand widths of A and B are specified as parameters. This allows the
code to be re-used in any situation and is highly encouraged. On the other
hand, parameterized code is often more difficult to read and understand.
A UNIX symbolic link is used to make this file the default for this tutorial.
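A parameterized description along these lines might look like the sketch below. It is not the tutorial's signed_mult.v: the signal names A, B, Z, en, rst_an, Areg, Breg, Aext and Bext follow the names used in this tutorial, but the module name, the parameter names and the exact register arrangement are assumptions.

```verilog
// Sketch: parameterized signed multiplier using sign extension.
// Module and parameter names are assumptions.
module signed_mult_sketch (clk, rst_an, en, A, B, Z);
  parameter WIDTH_A = 16;
  parameter WIDTH_B = 8;
  parameter WIDTH_Z = WIDTH_A + WIDTH_B;  // derived, not meant to be overridden

  input                clk, rst_an, en;
  input  [WIDTH_A-1:0] A;
  input  [WIDTH_B-1:0] B;
  output [WIDTH_Z-1:0] Z;

  reg [WIDTH_A-1:0] Areg;
  reg [WIDTH_B-1:0] Breg;
  reg [WIDTH_Z-1:0] Z;

  // Sign-extend both registered operands to the full result width.
  wire [WIDTH_Z-1:0] Aext = {{WIDTH_B{Areg[WIDTH_A-1]}}, Areg};
  wire [WIDTH_Z-1:0] Bext = {{WIDTH_A{Breg[WIDTH_B-1]}}, Breg};

  // The low WIDTH_Z bits of this product equal the two's complement
  // signed product of the original operands.
  wire [WIDTH_Z-1:0] product = Aext * Bext;

  always @(posedge clk or negedge rst_an)
    if (!rst_an) begin
      Areg <= {WIDTH_A{1'b0}};
      Breg <= {WIDTH_B{1'b0}};
      Z    <= {WIDTH_Z{1'b0}};
    end else if (en) begin
      Areg <= A;
      Breg <= B;
      Z    <= product;
    end
endmodule
```

Multiplying the unextended Areg and Breg instead would treat negative operands as large positive numbers, which is why the extension step matters.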
[Figure: Simulation file structure. A simulation wrapper such as tb/gatesim.v
includes the main testbench (../tb/main_tb.v), the gate-level netlist
(../release/filter.v), and the standard cell definitions
(../../artlib/cells.v). Its module gatesim calls $sdf_annotate(...) and
$shm_probe("AS"), and instantiates the testbench as tb tb_inst(). The
testbench and verification suite includes ../tb/functions.v for vector IO
stimulation. Individual tests (tb/test2.v ... tb/testn.v) set up the IO
vectors and sequence each test.]
5 Verilog Simulation
Within the UNIX environment we will use Cadence's NC-Verilog for our
simulations.
ncverilog ../rtl/signed_mult.v
Though this will not run a simulation, it will compile the design and inform
you of any syntax errors. Note that the output from any ncverilog run is
captured in the file ncverilog.log.
3. Familiarize yourself with the main testbench ../tb/main_tb.v:
Line 1: The timescale directive should only be included once, at the
beginning of a simulation.
Line 7: The VERBOSE constant determines the extent of debugging
information displayed: 0 for none, and higher values to dump more
information.
Lines 27-28: The check_vectors routine in verilog_lib/src/vector_search.v
searches for the occurrence of expected_buffer in output_buffer. Since
arrays cannot be passed in standard Verilog, these must be global
variables.
Line 34: The instantiation of the multiplier, or the device-under-test
(DUT).
Line 47: If the vector search routines are used they must be included
within the module definition.
Line 53: The result from the DUT is converted to an integer using
sign-extension.
Lines 56-63: The interface to the DUT should behave like hardware,
capturing the result on the positive edge of the clock like a register.
The integer results are stored sequentially in output_buffer for later
comparison.
Lines 66-67: It is convenient to specify the inputs A and B as integers.
This truncates them for application to the DUT.
Line 73: Displays the IO vectors if the VERBOSE constant is above 0.
Line 87: Start of main test sequencing.
Lines 104-110: Reset the system at the start of each test. A good
rule of thumb is not to change inputs at the active clock edge. As
such we use the negative edge of the clock to trigger all changes to
DUT inputs.
Lines 115-121: Prepare random inputs for the DUT within the proper
range of values.
Line 123: Calculate the expected result using Verilog's integer
multiplication abilities.
Line 127: Call the check_vector function to search for 90 consecutive
matching positions between output_buffer and expected_buffer. The
routine displays whether a match was found or not.
Line 133: Start the next test using the same format as lines 104
through 130.
5 For speedy operation, by default, NC-Verilog does not record waveform
traces, even when told to. Using the +access+r option overrides this
behaviour. Running the setup script in this tutorial aliases ncverilog to
ncverilog +access+r so that signal recording is on by default.
4. Having looked at the RTL and the testbench, run the simulation from the
multiply/sim directory, with the command:
Examine the output and note how the search function reports that the
expected vectors were found in the recorded output stream. To get more
detailed information, change Line 7 of main_tb.v to define VERBOSE
2 and re-run the simulation. Now each result is displayed as it occurs,
and the output and expected buffers are displayed by the search routine.
Change the VERBOSE level to 1 and re-run the simulation to observe the
difference.
5. Now we'll intentionally introduce a bug and view the simulation result. In
rtl/signed_mult.v, change Line 78 to use the unextended inputs Areg and
Breg instead of Aext and Bext. Re-run the simulation and examine the
output to see how the errors are reported. Ensure you fix rtl/signed_mult.v
before moving on.
6. Rather than using NC-Verilog, we'll try using the slightly older (and
slower) Verilog-XL for the next simulation (just so you can say you've
used Verilog-XL). Replace ncverilog with verilog on the command
line.
In this design, the output is not registered within the module and so the
results appear a cycle earlier. Note how the search routine reports that
the expected string was found at position 2 in the output buffer, not 3 as
before. Without a flexible routine to match up the output and expected
vectors, the test would have improperly failed.
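To illustrate what such a flexible match involves, the self-contained sketch below searches output_buffer for the first position at which every entry of expected_buffer appears in sequence. It is not the tutorial's vector_search.v; the module name, buffer lengths and data values are invented for illustration, and only the buffer names follow the description above.

```verilog
// Sketch: locate expected_buffer inside output_buffer at any offset.
// Module name, buffer lengths and data values are hypothetical.
module vector_search_sketch;
  parameter OUT_LEN = 8;
  parameter EXP_LEN = 4;

  integer output_buffer   [0:OUT_LEN-1];
  integer expected_buffer [0:EXP_LEN-1];
  integer pos, i, match_pos;
  reg     match;

  initial begin
    // Example data: the expected values show up after two idle entries,
    // as they would for a design with two cycles of latency.
    for (i = 0; i < OUT_LEN; i = i + 1)
      output_buffer[i] = 0;
    for (i = 0; i < EXP_LEN; i = i + 1) begin
      expected_buffer[i]   = 10 * (i + 1);
      output_buffer[i + 2] = 10 * (i + 1);
    end

    // Try each possible starting position until the first full match.
    match_pos = -1;
    for (pos = 0; pos <= OUT_LEN - EXP_LEN; pos = pos + 1)
      if (match_pos == -1) begin
        match = 1'b1;
        for (i = 0; i < EXP_LEN; i = i + 1)
          if (output_buffer[pos + i] !== expected_buffer[i])
            match = 1'b0;
        if (match)
          match_pos = pos;
      end

    if (match_pos >= 0)
      $display("Expected vectors found at position %0d", match_pos);
    else
      $display("Expected vectors NOT found");
  end
endmodule
```

With the data above, the expected sequence starts two entries into the output buffer, so the sketch reports position 2. A design with one more cycle of latency would simply be reported at position 3 rather than failing.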
Figure 4: SimVision Waveforms for Signed Multiplier
ncverilog ../tb/rtlsim.v
The simulation will run as before, but will record the waveforms in the rtlsim
subdirectory.
This launches the tool, loads the rtlsim database, and returns the command
prompt. The tool opens to the design browser. Expand the signal hierarchy by
highlighting the rtlsim folder and selecting Edit - Explode. Select the tb
icon. Note how the signals are displayed in the viewer. Choose Select - All
from the menu, and click on the waveform icon to view the selected traces
(Figure 4). In the waveform viewer you can zoom in and out, pan around, go to
specific time periods, etc. As in many graphical systems, there are many ways
to perform any task and it is usually easiest to learn through exploration.
If there is a particular waveform setup that you wish to record, you can save
a Command script from the File menu. Note that this only saves the setup,
such as the list of signals, cursors, zoom settings, etc., but does NOT save
the underlying signal data.
6 The previous version was called Signalscan and is still available.
6 Quick Synthesis
Cadence and Synopsys are the two primary providers of ASIC synthesis tools.
Synopsys Design Compiler (DC) has long been the standard, but Cadence's
Buildgates and Physical Synthesis (PKS) tools have recently emerged as a
comparable, lower cost solution.
For the purpose of this tutorial we will focus on Cadence tools, but we'll
also introduce you to basic synthesis in Synopsys DC.
The Cadence tool-set can be subdivided into 3 classes:
Buildgates (BG) - Basic synthesis tool. Started with bg_shell.
Buildgates Extreme (BGX) - Adds advanced synthesis techniques for
datapath components. Started with bgx_shell.
Physical Synthesis (PKS) - Adds physical awareness to BGX. Started with
pks_shell.
All 3 flavours have the same interface, but with different capabilities. The
original Buildgates is highly crippled and generates very poor results. For
normal synthesis, BGX is the flavour to use, but if the design is timing
critical or floorplanning is required then PKS is the appropriate tool.
Often during initial design phases, area and timing estimates are required
long before a project is ready for layout. Tables 2 and 3 list the required
commands to quickly synthesize an RTL or behavioural design using the Cadence
or Synopsys tools.
Start the tools from their respective directories (multiply/pks and
multiply/syn). In the GUI version of PKS, the command prompt is available
along the bottom of the screen (Figure 5). To get the command prompt in
Design Analyzer (which is the GUI version of dc_shell), select Setup -
Command Window from the menu bar (Figure 6).
Following the commands listed in Tables 2 and 3, synthesize the signed
multiplier in both tools.
By examining the generated reports, try to compare the results in terms of
speed and area before we go further into the details. Exit the tools using
either the GUIs or the quit command.
Figure 5: Screenshot of Multiplier in PKS
Figure 6: Screenshot of Multiplier in Design Analyzer
Examine the files multiply/pks/tcl/quicksynth.tcl and multiply/syn/scr/quicksynth.scr,
and compare them with Tables 2 and 3. Note how values such as the clock pe-
riod and root pin have been replaced with variables, allowing the script to be
re-used for other designs.
From the multiply/pks directory, re-synthesize the multiplier automatically
by issuing the command:
pks_shell -f tcl/quicksynth.tcl
This will start PKS in text mode, and immediately run the referenced script.
Once synthesis is finished, it will end with the PKS command prompt. From
there, you can issue further PKS commands or quit to the UNIX shell.
Remember, the GUIs are useful for learning and experimentation, but once
issues are settled, scripts should be written to automatically generate your layout
from RTL.
[Figure: The PKS GUI, with labelled regions: Main Menu, Default Toolbar,
Quickbuttons, Text Editor, Schematic Viewer, Hierarchy Design Browser, and
Layout Viewer.]
Commands and switches do not need to be fully specified (i.e.
set_clock_root -clock myclk clkpin and set_clock_ro clkpin -cl myclk are
equivalent).
Most synthesis commands begin with one of:
8 Digital Libraries
8.1 Logical Libraries
The first step in ASIC synthesis is to read the library data for standard
cells and any macro blocks (e.g. RAMs). The logical and timing data for the
library may be provided in any of the following (roughly) equivalent forms
(see note 7):
.tlf - Cadence Timing Library Format
.ctlf - Compiled (Binary) TLF
.alf - Cadence Ambit Library Format
.lib - Synopsys Library Format
.db - Synopsys Database Format
These libraries contain:
Design Rules
  Maximum Slew
  Maximum Load
  Maximum Fanout
Default Design Units (typical unit)
  Capacitance (pF)
  Delay (ns)
  Area (um^2)
  Power (Dynamic - mW, Static - uW)
  Resistance (kohm)
And then for Best, Worst, and Typical process conditions:
  Process, Temperature, Voltage Ratings
  Wireload Estimates - Average Interconnect RC vs Net Fanout
  Cell Data
    Logical Function
    Timing Delay Tables (Delay versus Load and Slew)
    Pin Capacitance Estimates
    Static and Dynamic Power Dissipation
    Cell Area
Typically a library vendor will provide the cell data in separate files for
best, worst, and typical environments. Most circuit synthesis should be
performed using the worst-case delays; however, best-case models must be
considered when fixing hold-time violations. In the quick synthesis of
Section 6 we loaded only the worst case libraries, but for full synthesis we
should merge the best and worst case libraries. After the merge operation,
PKS will choose the fast or slow model appropriately.
7 Though tools can convert from one format to another, the process is
typically buggy and frustrating.
To use the Artisan cells, and merge the best and worst case data into a
library called cells, issue the PKS command:
read_tlf -min ~/digflow/artlib/cells_bc.tlf \
-max ~/digflow/artlib/cells_wc.tlf \
-name cells
You can safely ignore the warnings "Missing Input( ) expression for
LATCH( )".
After having read in the data, use the command report_library -wireload
-operating_cond to view the global information listed in the library files.
Using another variation of the report_library command we'll experiment with
pattern matching. Issue the commands:
1. report_library -help to see the syntax of the command.
2. report_library -cell NAND2* to list all variations of 2-input NAND gates.
3. report_library -cell NAND*XL to list all low-power (XL) NAND gates.
4. report_library -cell NAND?X? to list all NAND gates with un-inverted
inputs.
8.2 Physical Libraries
As device sizes shrink, interconnect RC delays are becoming more significant
than traditional gate delays. As such, wireload models, which assume an
interconnect delay based on chip area and fan-out, are inaccurate. To
decrease estimation errors, Physical Synthesis tools perform the placement
and global routing of cells as part of the mapping process.
In order to perform the layout, the tool needs additional information. A .tf
(technology file) or LEF (Library Exchange Format, see note 8) file normally
contains data regarding a process's parasitic information (e.g. TSMC
CMOSP18), and often a separate LEF file contains the physical dimensions of
the standard cells. In the case of the Artisan cells, all of the data has
been combined in a single file and can be read using the command (see
note 9):
read_lef ~/digflow/artlib/cells.lef
Unfortunately, there is some overlap between what is specified in the
logical libraries and what is in a LEF file. Specifically, they both include
data regarding a cell's area and logical function. The dual specifications
can create inconsistencies. To ensure this is not the case, run the command:
check_library cells
Though all logical cells should have physical equivalents, there are rare
cells, such as loading capacitors or antenna diodes, that may not have
logical equivalents.
Scripts to load either the VST or Artisan cell libraries are provided as
tcl/load_vstlib.tcl and tcl/load_artlib.tcl. These scripts also load
additional libraries for the IO pads which are available. Once PKS starts,
either of these can be run using source tcl/<script name>.tcl
8 ... PKS (as well as SE) can only read the older version, whereas the router
(wroute) can use the newer version.
9 In cases where the process and cell information are in separate files, the
process information must be read first, and then the cell data is read with
a read_lef_update command.
Read logical libraries           read_tlf -min cells_bc.tlf -max cells_wc.tlf -name cells
Report on logical libraries      report_library
Read physical libraries          read_lef tech.lef
Additional physical libraries    read_lef_update morecells.lef
Check consistency of library     check_library cells

Physical Libraries (normally .LEFs) contain the data necessary for layout.
Consistency should be verified between physical and logical libraries.
Wildcards (* and ?) can be used in TCL based pattern matching.
The scripts tcl/load_vstlib.tcl and tcl/load_artlib.tcl are provided.
9.3 Timing Constraints
In order to synthesize a design properly we need to inform the tool of all
relevant boundary conditions and constraints. In large projects this is often
the most complex part of the design.
In Section 6 we constrained the design merely by asserting the clock period.
This assumes that our IO will not be a factor in timing analysis. If the
critical path is internal to the circuit then this is okay for
experimentation purposes. When the design is integrated into a larger
project, however, we need to consider the boundary conditions. This involves
setting:
Input Delay - The time, after the clock edge, that it will take for the
signal to reach the input port. This should be specified for both the best
and worst-case scenarios.
External Delay - The delay a signal will experience outside of our design's
boundary, before it reaches a register. Again, external delay should be
specified for the best and worst case.
Port Capacitances - The capacitance that our design must drive, or any
additional capacitance that must be driven by input drivers.
Driving Cell/Resistance - This determines how fast the input driver can
charge the port capacitance, and is added on to the specified input delay.
When every IO of a design is registered, such as in our reference design
(Figure 8), the constraints are simplified.
Unfortunately, this is not often the case and more elaborate constraints need
to be applied. In Figure 9 we illustrate how to accommodate:
Elaborate IO timing variations
False and multi-cycle timing paths
Clock Insertion Delay and Uncertainty
Constraints become even more complex when we need to consider data transfer
across clock domains. In this case, all clocks must be synchronously related,
and the timing relationship between each domain is explicitly stated. Since
constraint description can be quite involved, we will not go into such
complications.
Following the commands outlined in Figure 8, finish properly constraining
the tutorial design. Note the use of the command find -inputs *, which
returns a list of input object-ids. To return the names instead of
object-ids, use get_names [find -inputs *]. Feel free to experiment with
different variations of the find command as it can be very useful in larger
designs.
When finished, save the result in the Ambit Database Format (ADB):
write_adb adb/constrained.adb
[Figure 8: Constraints for the reference design. The schematic shows the
design (ports A, en, rst_an, clk and Z) with an assumed ideal flop driving
its inputs and another capturing its output.]

Primary Constraints
Input                               Output                             Internal
Delay       set_input_delay         Delay       set_external_delay     Clock Period       set_clock
Load        set_port_capacitance    Load        set_port_capacitance   False Path         set_false_path
Drive Cell  set_drive_cell          Resistance  set_wire_resistance    MultiCycle         set_cycle_addition
Resistance  set_drive_resistance    Fanout      set_fanout_load*       Clock Insertion    set_clock_insertion_delay
                                                                       Clock Uncertainty  set_clock_uncertainty

Commands shown in the figure:
set_clock refclk -period 20.0
    Create a symbolic clock called refclk with a 20ns period.
set_clock_root -clock refclk clk
    Bind the clock waveform to the actual CLK pin.
set_drive_cell -cell DFFX1 -pin Q [find -inputs * -noclocks]
    Assumes all inputs, other than the CLK, are driven by a 1X flop.
set_drive_resistance 0 clk
    Assumes infinite drive strength on the CLK and RST pins.
set_input_delay -clock refclk -early 0.2 [find -inputs * -noclocks]
    Set the best-case input delay to a fast tcq (200ps).
set_input_delay -clock refclk -late 0.5 [find -inputs * -noclocks]
    Set the worst-case input delay to a slow tcq (500ps).
set_port_capacitance 0.01 [find -ports *]
    Assume a load of 10fF (about 2 standard loads) for all ports.
set_external_delay -clock refclk -late 0.5
    The data must arrive at least a setup time before the next edge.
set_external_delay -clock refclk -early 0.1
    Ensures that we meet hold time requirements.
[Figure 9: Example of more elaborate constraints. The figure repeats the
Primary Constraints table of Figure 8 above a schematic with input ports A
to E and CLK, output ports V to Z, and assumed ideal flops at the boundary.
The schematic is annotated with delays (0.2n to 3n), 1X and 2X drive cells,
5f port loads, a 2k resistance, a multicycle path, and a false path.]
Read verilog sources             read_verilog ../rtl/rtl_files.v
Map to generic hardware          do_build_generic
Create a clock waveform          set_clock refclk -period 10
Bind the waveform to clk pin     set_clock_root -clock refclk clk
Set input drive strengths        set_drive_cell -cell DFFX1 -pin Q {A B en rst_an}
Set port loads                   set_port_capacitance 0.01 {A B Z en rst_an}
Prepare infinite clock drive     set_drive_resistance 0 CLK
Set best case input delay        set_input_delay -early 0.1 {A B en rst_an}
Set worst case input delay       set_input_delay -late 0.5 {A B en rst_an}
Set best case external delay     set_external_delay -early -0.1 Z
Set worst case external delay    set_external_delay -late 0.3 Z
Save the design                  write_adb adb/constrained.adb
Key Points:
All Verilog (or VHDL) source files must be read into the tool.
The top level module must be mapped to generic hardware.
Basic timing constraints must be applied including:
Clock Period
Clock Root
Port Loading
Any Input/Output External Delays
The find command can be used to return a list of object-ids.
The get_names command converts object-ids to names.
10 Floorplanning
The floorplanning process takes a logical netlist and lays out the standard cells
in groups of rows.
The chicken and the egg phenomenon is alive and well when it comes to
floorplanning. We can't floorplan until we have a netlist, but we can't get
an accurate netlist until we have an idea of the floorplan. The solution is
to floorplan an initial netlist but leave enough flexibility for
optimization and for the addition of test features and clock buffers.
To get an idea of the available floorplanning options, issue the command:
report_floorplan_parameters
We will start with these, just to get an idea of how our design will eventually
look. To generate the initial floorplan:
Restart PKS, and load the appropriate libraries.
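Figure 10: Result of an Initial Floorplan
[Figure 11: Initial power plan. A single stripe (W = 0.415um) runs down the
middle of the ~140um design, joined to the cell rows through M1-M2 vias and
tied to ideal connections at top and bottom, each supplying ~415uA.
Estimated average power is 530uW; the plan is designed for a peak of ~1.5mW,
or 830uA.]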
The result indicates that the circuit will consume an average of 0.53mW,
before consideration of clock buffering and test insertion. To accommodate
these additions, and respect peak power conditions, we'll design for a system
that consumes 1.5mW. At 1.8V, this means that the circuit will draw up to
830uA. Viewing the initial layout, it appears that there are about 25 rows of
standard cells. If we assume roughly equal power distribution, each row will
draw about 35uA. If we power the design with a single stripe down the
middle of the design, connected at both top and bottom, then:
The stripe must handle 830uA total. Referring to Figure 11, if evenly
drawing from top and bottom sources, the current bottleneck will be 415uA.
To satisfy current density limits, the power and ground supply stripes must
be at least 0.415um wide.
[Figure 12: Power plan with added engineering margin. The design, still
sized for a peak of ~1.5mW (830uA), gains several connection points to the
ideal supply, with M1-M2 vias joining the power routing to the cell rows.]
then across the row to an extreme edge on M1. From the initial layout this
would be about 70um of 0.415um wide M2 + 70um of 0.8um wide M1.
This is 170 squares of M2 + 87 squares of M1. Referring to the process
documentation, the metal sheet resistance is RsqMx = 0.08 ohms per square,
whereas a typical via resistance is 6 ohms. The maximum resistance this path
would face would therefore be 26 ohms. Even if all 800uA of the circuit's
current were consumed by this single cell at the extreme edge of power, the
voltage drop would only be 21mV, well under the 90mV allowance.
As the above calculations show, IR drop is not a problem, but we must ensure
the supply rails are at least 0.415um wide to prevent
self-heat/electromigration problems. Since the standard cell width is
0.66um, we'll use this width and spacing for our power stripes as well.
There are tools available for more detailed analysis. Having used these
relatively ballpark figures, we should add even more margin to the design:
as shown in Figure 12, we'll add a power ring, with a width of
2*0.66um, and make multiple connections to the ideal supplies.
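The arithmetic behind these figures can be written out step by step. This is only a restatement of the numbers quoted above: the symbol names are introduced here for convenience, and the resistances are taken to be in ohms, which is the reading consistent with the quoted 21mV drop.

```latex
\begin{align*}
I_{\mathrm{peak}}   &= \frac{P_{\mathrm{peak}}}{V_{DD}}
                     = \frac{1.5\,\mathrm{mW}}{1.8\,\mathrm{V}}
                     \approx 830\,\mu\mathrm{A} \\
I_{\mathrm{stripe}} &= \tfrac{1}{2}\, I_{\mathrm{peak}} \approx 415\,\mu\mathrm{A}
                     \quad \text{(fed equally from top and bottom)} \\
N_{M2} &= \frac{70\,\mu\mathrm{m}}{0.415\,\mu\mathrm{m}} \approx 170\ \text{squares},
\qquad
N_{M1}  = \frac{70\,\mu\mathrm{m}}{0.8\,\mu\mathrm{m}} \approx 87\ \text{squares} \\
R_{\mathrm{path}} &\approx (170 + 87) \times 0.08\,\Omega + 6\,\Omega
                   \approx 26.6\,\Omega \quad \text{(quoted as } 26\,\Omega\text{)} \\
V_{\mathrm{drop}} &\approx 800\,\mu\mathrm{A} \times 26.6\,\Omega
                   \approx 21\,\mathrm{mV} \;<\; 90\,\mathrm{mV}
\end{align*}
```

The minimum stripe width of 0.415um for 415uA implies a current density limit of about 1mA per um of wire width; that limit is inferred from the figures here rather than quoted from the process documentation.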
When the power grid is eventually in place, it will occupy a portion of the
die which could otherwise be used for placing cells and routing signals.
Though PKS does not perform power routing, we need to inform it of these
planned obstructions so that it can work around them. Issue the following
command, and note that all horizontal specifications should be a multiple of
the cell pitch (0.66um).
To reserve space for the eventual power ring, we increase the
core-to-boundary offsets. This is done through the set_floorplan_parameters
command as follows:
11 Clock Tree Insertion
11.1 What is a clock tree?
Ideally the clock signal will arrive at all flip-flops at the same time. Due
to variations in buffering, loading, and interconnect lengths, however, the
clock's arrival is skewed. A clock-tree insertion tool evaluates the loading
and positioning of all clock related signals and places clock buffers in the
appropriate spots to reduce skew to acceptable levels. Some clock tree
insertion tools, all from Cadence, include CTSGen, ctgen, and CTPKS. We will
use CTPKS to create a clock tree within PKS.