Guidelines For Design Synthesis Using Synopsys Design Compiler


Department of Computer Science Engineering
University of South Carolina, Columbia
Revised: December 2000. By: Sreesa Akella

Design Synthesis

Introduction

One of the most important steps in ASIC design is the synthesis phase. Synthesis
is an automatic method of converting a higher level of abstraction to a lower level of
abstraction. In other words, the synthesis process converts Register Transfer Level (RTL)
descriptions to gate-level netlists. These gate-level netlists can be optimized for area,
speed, testability, etc. The synthesis process is shown in Figure 1.

[Figure 1. Synthesis process overview: the RTL description, the technology library, and the
environment and constraints are the inputs to synthesis, which produces gate-level netlists.]

The inputs to the synthesis process are the RTL HDL description, circuit constraints
and attributes for the design, and a technology library. The synthesis process produces an
optimized gate-level netlist from all these inputs. Synthesizing a design is an iterative
process and begins with defining the constraints for each block of the design. In addition
to these constraints, a file defining the synthesis environment is also needed. The
environment file specifies the technology cell libraries and other relevant information that
the tool uses during synthesis.

Synthesis Environment

The most commonly used synthesis tool in the ASIC industry is Synopsys Design
Compiler. Design constraints and other synthesis options are given as commands or
default settings to the tool.
The Synopsys synthesis tool is divided into two sections, Design Analyzer and Design
Compiler. The former is the graphical interface and the latter is the command shell
interface to the synthesis tool. These two interfaces can be invoked using the following
commands.
Graphical interface, Design Analyzer:
unix_prompt> design_analyzer

Shell interface, Design Compiler:
unix_prompt> dc_shell

Startup File

The Synopsys synthesis tool, when invoked through either the Design Analyzer or the
Design Compiler command, reads a startup file named .synopsys_dc.setup. There should be two
startup files present, one in the current working directory and the other in the root directory
in which Synopsys is installed. The startup file in the installation root directory does not
contain design-dependent data; its function is to load the Synopsys technology-independent
libraries and other parameters. The local startup file in the current working directory
should be used to specify individual design specifications, and this is where the user
specifies the design-dependent data. The settings provided in the current working directory
override the ones specified in the root directory.

There are four important parameters that should be set up before one can start
using the tool. They are:

• search_path
  This parameter tells the synthesis tool all the paths that it should
  search when looking for a synthesis technology library for reference during
  synthesis.
• target_library
  This parameter specifies the file that contains all the logic cells that should be used
  for mapping during synthesis. In other words, the tool during synthesis maps a
  design to the logic cells present in this library.
• symbol_library
  This parameter points to the library that contains the "visual" information on the
  logic cells in the synthesis technology library. All logic cells have a symbolic
  representation and information about the symbols is stored in this library.
• link_library
  This parameter points to the library that contains information on the logic gates in
  the synthesis technology library. The tool uses this library solely for reference and
  does not use the cells present in it for mapping as in the case of target_library.

An example of the use of these four variables from a .synopsys_dc.setup file is given below.

search_path = ". /synopsys/libraries/syn/cell_library/libraries/syn"
target_library = class.db
link_library = class.db
symbol_library = class.db

Once these variables are set up properly, one can invoke the synthesis tool at the
command prompt using any of the commands given for the two interfaces.

Design Objects

There are eight different types of objects categorized by Design Compiler.

Design: It corresponds to the circuit description that performs some logical function. The
design may be stand-alone or may include other sub-designs. Although a sub-design may
be part of the design, it is treated as another design by Synopsys.

Cell: It is the instantiated name of the sub-design in the design. In Synopsys terminology,
there is no differentiation between the cell and instance; both are treated as cell.

Reference: This is the definition of the original design to which the cell or instance
refers. For example, a leaf cell in the netlist must be referenced from the link library, which
contains the functional description of the cell. Similarly, an instantiated sub-design must
be referenced in the design, which contains the functional description of the instantiated
sub-design.

Ports: These are the primary inputs, outputs or IOs of the design.

Pin: It corresponds to the inputs, outputs or IOs of the cells in the design. (Note the
difference between port and pin.)

Net: These are the signal names, i.e., the wires that hook up the design together by
connecting ports to pins and/or pins to each other.

Clock: The port or pin that is identified as a clock source. The identification may be
internal to the library or it may be done using dc_shell commands.

Library: Corresponds to the collection of technology-specific cells that the design is
targeting for synthesis, or linking for reference.

Design Entry

Before synthesis, the design must be entered into the Design Compiler or Design
Analyzer (referred to as DC/DA from now on) in the RTL format. DC/DA provides the
following two methods of design entry:

• read command
• analyze & elaborate commands
The analyze & elaborate commands are two different commands, allowing designers to
initially analyze the design for syntax errors and RTL translation before building the
generic logic for the design. The generic logic or GTECH components are part of the
Synopsys generic technology-independent library. They are unmapped representations of
boolean functions and serve as placeholders for the technology-dependent library.
The analyze command also stores the result of the translation in the specified
design library, where it may be used later. So a design analyzed once need not be analyzed
again and can be merely elaborated, thus saving time. Conversely, the read command
performs the function of the analyze and elaborate commands but does not store the
analyzed results, therefore making the process slow by comparison.
Parameterized designs (such as usage of the generic statement in VHDL) must use
the analyze and elaborate commands in order to pass the required parameters while
elaborating the design. The read command should be used for entering pre-compiled
designs or netlists in DC/DA.
One other major difference between the two methods is that, in analyze and
elaborate design entry of a design in VHDL format, one can specify different
architectures during elaboration for the same analyzed design. This option is not available
in the read command.
The commands used for both methods in DC are given below:

Read command:
dc_shell> read -format <format> <list of file names>
The -format option specifies the format of the input file, e.g. VHDL.
A sample command for reading the file "adder.vhd" in VHDL format is given below:
dc_shell> read -format vhdl adder.vhd

Analyze and Elaborate commands:
dc_shell> analyze -format <format> <list of file names>
dc_shell> elaborate <.syn file> -arch "<architecture>" -param "<parameter>"
The .syn file is the file in which the analyzed information of the design is stored.
e.g.: The adder entity in adder.vhd has a generic parameter "width" which can be
specified during elaboration. The architecture used is "beh", defined in the adder.vhd file.
The commands for analyze and elaborate are given below:
dc_shell> analyze -format vhdl adder.vhd
dc_shell> elaborate adder -arch "beh" -param "width = 32"

Technology Library

Technology libraries contain the information that the synthesis tool needs to
generate a netlist for a design based on the desired logical behavior and constraints on the
design. The tool, referring to the information provided in a particular library, makes
appropriate choices to build a design. The libraries contain not only the logical function
of an ASIC cell, but also the area of the cell, the input-to-output timing of the cell, any
constraints on fanout of the cell, and the timing checks that are required for the cell.
Other information stored in the technology library may be the graphical symbol of the
cell for use in creating the netlist schematic.

The target_library, link_library, and symbol_library parameters in the startup
file are used to set the technology library for the synthesis tool.

The Synopsys .lib technology library contains the following information:
• Wire-load models for net length and data estimation. Wire-load models available in
  the technology library are statistical and hence inaccurate when estimating data.
• Operating conditions along with scaling k-factors for different delay components to
  model the effects of temperature, process, and voltage on the delay numbers.
• Specific delay models like piece-wise linear, non-linear, cmos2 etc. for calculation of
  delay values.
For each of the technology primitive cells the following information is modeled:
• Interface pin names, direction and other information.
• Functional descriptions for both combinational and sequential cells which can be
  modeled in Synopsys
• Pin capacitance and drive capabilities
• Pin-to-pin timing
• Area

Register Transfer-Level Description

A Register Transfer-Level description is a style that specifies a particular design
in terms of registers and the combinational logic in between. This is shown by the
"register and cloud" diagram in Figure 2.

[Figure 2. Register and cloud diagram: Data-in feeds a register, whose output passes through
combinational logic to a second register that drives Data-out; both registers are driven by
the same clock, CLK.]

The registers can be described explicitly, through component instantiation, or
implicitly, through inference. The combinational logic is described either by logical
equations, sequential control statements (CASE, IF then ELSE, etc.), subprograms, or
through concurrent statements, and is represented by the cloud objects in Figure 2 between
the registers.
RTL is the most popular form of high-level design specification. A good coding
style helps the synthesis tool generate a design with minimal area and maximum
performance.
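
As a minimal sketch of the register-and-cloud style, the following VHDL infers two registers from an edge-sensitive process and describes the combinational cloud between them with a concurrent statement. The entity name reg_cloud, the port names, the 8-bit width and the choice of an incrementer as the cloud are illustrative assumptions, not taken from the original text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: all names and the 8-bit width are assumptions.
entity reg_cloud is
  port (
    clk      : in  std_logic;
    data_in  : in  unsigned(7 downto 0);
    data_out : out unsigned(7 downto 0)
  );
end entity reg_cloud;

architecture rtl of reg_cloud is
  signal stage1 : unsigned(7 downto 0);  -- output of the first register
  signal cloud  : unsigned(7 downto 0);  -- output of the combinational logic
begin
  -- Combinational "cloud" between the two registers (concurrent statement).
  cloud <= stage1 + 1;

  -- Both registers are inferred from this edge-sensitive process.
  regs : process (clk)
  begin
    if rising_edge(clk) then
      stage1   <= data_in;  -- first register
      data_out <= cloud;    -- second register
    end if;
  end process regs;
end architecture rtl;
```

Because the registers are inferred rather than instantiated, the same description can be mapped to whichever flip-flop cells the target library provides.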
General Guidelines

Given below are some guidelines which, if followed, might improve the performance
of the synthesized logic and produce a cleaner design that is suited for automating the
synthesis process.

• Clock logic, including clock gating and reset generation, should be kept in one
  block, to be synthesized once and not touched again. This helps in a clean
  specification of the clock constraints. Another advantage is that the modules that
  are being driven by the clock logic can be constrained using the ideal clock
  specifications.
• No glue logic at the top: The top block is to be used only for connecting modules
  together. It should not contain any combinational glue logic. This removes the
  time-consuming top-level compile; the top level can now be simply stitched together
  without undergoing additional synthesis.
• The module name should be the same as the file name, and one should avoid describing
  more than one module or entity in a single file. This avoids any confusion while
  compiling the files and during synthesis.
• While coding finite state machines, the state names should be described using
  enumerated types. The combinational logic for computing the next state should be
  in its own process, separate from the state registers. Implement the next-state
  combinational logic with a case statement. This helps in optimizing the logic
  much better and results in a cleaner design.
• Incomplete sensitivity lists must be avoided, as they might result in simulation
  mismatches between the source RTL and the synthesized logic.
• Memory elements, latches and flip-flops: A latch is inferred when an incomplete if
  statement with a missing else part is specified. A flip-flop, or a register as it is
  also referred to, is inferred when an edge-sensitive statement is specified in the process
  body. A latch is more troublesome than a flip-flop, as it complicates static timing
  analysis of designs containing latches. So designers try to avoid latches and prefer
  flip-flops to latches.
• Multiplexer inference: A case statement is used for implementing multiplexers. To
  prevent latch inference in case statements, the default part of the case statement
  should always be specified. On the other hand, an if statement is used for writing
  priority encoders. Multiple if statements with multiple branches result in the
  creation of a priority encoder structure.

  Ex: process(A, B, C)
      begin
        if A = '0' then D <= B; end if;
        if A = '1' then D <= C; end if;
      end process;

  The above example infers priority logic: the if statements are evaluated in
  order, and where more than one condition holds the last assignment takes effect.
  The same code can be written using a case statement to implement a
  multiplexer as follows.

      process(A, B, C)
      begin
        case A is
          when '0' => D <= B;
          when others => D <= C;
        end case;
      end process;

  The same code can be written using an if statement along with elsif statements to cover
  all possible branches.
• Three-state buffers: A tri-state buffer is inferred whenever a high impedance (Z) is
  assigned to an output. Tri-state logic is generally not recommended
  because it reduces testability and is difficult to optimize, since it cannot be
  buffered.
• Signals versus variables in VHDL: Signal assignments are order independent, i.e.
  the order in which they are placed within the process statement does not have any
  effect on the order in which they are executed, as all the signal assignments are
  done at the end of the process. Variable assignments, on the other hand, are
  order dependent. Signal assignments are generally used within sequential
  processes and variable assignments are used within combinational processes.

Design Attributes and Constraints

A designer, in order to achieve optimum results, has to methodically constrain the
design by describing the design environment, target objectives and design rules. The
constraints contain timing and/or area information, usually derived from the design
specifications. The synthesis tool uses these constraints to perform synthesis and tries to
optimize the design with the aim of meeting the target objectives.

Design Attributes

Design attributes set the environment in which a design is synthesized. The
attributes specify the process parameters, I/O port attributes, and statistical wire-load
models. The most common design attributes and the commands for setting them are given
below:

Load: Each output can specify the drive capability that determines how many loads can
be driven within a particular time. Each input can have a load value specified that
determines how much it will slow a particular driver. Signals that are arriving later than
the clock can have an attribute that specifies this fact. The load attribute specifies how
much capacitive load exists on a particular output signal. The load value is specified in
the units of the technology library, in terms of picofarads or standard loads, etc. The
command for setting this attribute is given below:
set_load <value> <object_list>
e.g. dc_shell> set_load 1.5 x_bus

Drive: The drive specifies the drive strength at the input port. It is specified as a
resistance value. This value controls how much current a particular driver can source. The
larger a driver is (i.e., the closer its resistance is to 0), the faster a particular path will
be, but a larger driver will take more area, so the designer needs to trade off speed and area
for the best performance. The command for setting the drive for a particular object is given
below:

set_drive <value> <object_list>
e.g. dc_shell> set_drive 2.7 ybus

Design Constraints

Design constraints specify the goals for the design. They consist of area and
timing constraints. Depending on how the design is constrained, DC/DA tries to meet
the set objectives. Realistic specification is important, because unrealistic constraints
might result in excess area, increased power and/or degraded timing. The basic
commands to constrain the design are:

set_max_area: This constraint specifies the maximum area a particular design should
have. The value is specified in the units used to describe the gate-level macro cells in the
technology library.
e.g. dc_shell> set_max_area 0
Specifying an area of 0 makes the tool try its best to get the design as small as
possible.

create_clock: This command is used to define a clock object with a particular period and
waveform. The -period option defines the clock period, while the -waveform option
controls the duty cycle and the starting edge of the clock. This command is applied to
pin or port object types.
The following example specifies that a port named CLK is of type "clock" and has a period
of 40 ns, with 50% duty cycle. The positive edge of the clock starts at time 0 ns, with the
falling edge occurring at 20 ns. By changing the falling edge value, the duty cycle of the
clock may be altered.
e.g. dc_shell> create_clock -period 40 -waveform {0 20} CLK

set_dont_touch_network: This is a very important command, usually used for clock
networks and resets. This command is used to set a dont_touch property on a port, or on
the net. Note that setting this property will also prevent DC from buffering the net. In
addition, any gate coming in contact with the dont_touch net will also inherit the attribute.
e.g. dc_shell> set_dont_touch_network {CLK, RST}

set_dont_touch: This is used to set a dont_touch property on the current_design, cells,
references, or nets. This command is frequently used during hierarchical compilation of
blocks to prevent DC from optimizing the dont_touch object.
e.g. dc_shell> set_dont_touch current_design
current_design is the variable referencing the current working design. It can be set using
the current_design command as follows:
dc_shell> current_design <design_name>

set_input_delay: It specifies the input arrival time of a signal in relation to the clock. It is
used at the input ports, to specify the time it takes for the data to be stable after the clock
edge. The timing specification of the design usually contains this information, as the
setup/hold time requirements for the input signals. From the top-level timing
specifications the sub-level timing specifications may also be extracted.
e.g. dc_shell> set_input_delay -max 23.0 -clock CLK {datain}
     dc_shell> set_input_delay -min 0.0 -clock CLK {datain}
The CLK has a period of 30 ns with 50% duty cycle. For the above specification of
max and min input delays for datain with respect to CLK, the setup-time requirement
for the input signal datain is 7 ns (30 ns - 23 ns), while the hold-time requirement is 0 ns.

set_output_delay: This command is used at the output port, to define the time it takes for
the data to be available before the clock edge. This information is usually provided in
the timing specification.
e.g. dc_shell> set_output_delay -max 19.0 -clock CLK {dataout}
The CLK has a period of 30 ns with 50% duty cycle. For the above specification of
max output delay for dataout with respect to CLK, the data is valid for 11 ns after the
clock edge.

set_max_delay: It defines the maximum delay required, in terms of time units, for a
particular path. In general it is used for blocks that contain combinational logic only.
However, it may also be used to constrain a block that is driven by multiple clocks, each
with a different frequency. This command has precedence over DC-derived timing
requirements.
e.g. dc_shell> set_max_delay 5 -from all_inputs() -to all_outputs()

set_min_delay: It defines the minimum delay required, in terms of time units, for a
particular path. It is the opposite of the set_max_delay command. This command has
precedence over DC-derived timing requirements.
e.g. dc_shell> set_min_delay 3 -from all_inputs() -to all_outputs()

Optimizing Designs

A fully optimized design is one which has met the timing requirements and
occupies the smallest area. The optimization can be done in two stages, one at the code
level and the other during synthesis. The optimization at the code level involves
modifications to RTL code that has already been simulated and tested for its functionality.
This level of modification to the RTL code is generally avoided as sometimes it leads to
inconsistencies between simulation results before and after modifications. However, there
are certain standard model optimization techniques that might lead to a better synthesized
design.

Model Optimization

Model optimizations are important to a certain level, as the logic that is generated
by the synthesis tool is sensitive to the RTL code that is provided as input. Different RTL
codes generate different logic. Minor changes in the model might result in an increase or
decrease in the number of synthesized gates and also change its timing characteristics.
A logic optimizer reaches different endpoints for best area and best speed
depending on the starting point provided by a netlist synthesized from the RTL code. The
different starting points are obtained by rewriting the same HDL model using different
constructs. Some of the optimizations which can be used to modify the model for
obtaining a better quality design are listed below.

Resource Allocation
This method refers to the process of sharing a hardware resource under mutually
exclusive conditions. Consider the following if statement.

    if A = '1' then
      E <= B + C;
    else
      E <= B + D;
    end if;

The above code would generate two ALUs, one for the addition B + C and the other for the
addition B + D, which are executed under mutually exclusive conditions. Therefore a
single ALU can be shared for both additions. The hardware synthesized for the above
code is given in Figure 3 (a).

[Figure 3 (a). Without resource allocation: two adders, ALU(+), compute B + C and B + D, and
a multiplexer controlled by A selects one of the two sums as E.]

The above code is rewritten with only one addition operator being employed. The
hardware synthesized is given in Figure 3 (b).

    if A = '1' then
      temp := C;  -- A temporary variable is introduced.
    else
      temp := D;
    end if;
    E <= B + temp;

[Figure 3 (b). With resource allocation: a multiplexer controlled by A selects C or D as temp,
and a single adder, ALU(+), computes E from B and temp.]

It is clear from the figure that one ALU has been removed, with one ALU being
shared for both addition operations. However, a multiplexer is introduced at the inputs of
the ALU, which contributes to the path delay. Earlier the timing path of the select signal
went through the multiplexer alone, but after resource sharing it goes through the
multiplexer and the ALU datapath, increasing its path delay. However, due to resource
sharing the area of the design has decreased. This is therefore a trade-off that the designer
may have to make. If the design is timing-critical it would be better if no resource sharing
is performed.

Common sub-expressions and Common factoring

It is often useful to identify common subexpressions and to reuse the computed
values wherever possible. A simple example is given below.

    B := R1 + R2;
    .....
    C <= R3 - (R1 + R2);

Here the subexpression R1 + R2 in the signal assignment for C can be replaced by
B as given below. This might generate only one adder for the computation instead of two.

    C <= R3 - B;
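
The resource-sharing rewrite can be sketched as a complete VHDL entity, which makes explicit that the multiplexer is described before the single "+" operator. This is a sketch under stated assumptions: the entity name shared_adder, the port names and the 8-bit unsigned operands are hypothetical, and whether a given synthesis tool would share the adder on its own without this rewrite is not claimed here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: entity name, port names and widths are assumptions.
entity shared_adder is
  port (
    a       : in  std_logic;
    b, c, d : in  unsigned(7 downto 0);
    e       : out unsigned(7 downto 0)
  );
end entity shared_adder;

architecture rtl of shared_adder is
begin
  process (a, b, c, d)
    variable temp : unsigned(7 downto 0);  -- selected second operand
  begin
    -- The multiplexer is written first, so only one "+" appears in the code.
    if a = '1' then
      temp := c;
    else
      temp := d;
    end if;
    e <= b + temp;
  end process;
end architecture rtl;
```

The sensitivity list names every signal read in the process, so simulation and synthesis agree on the behavior.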
Common factoring is the extraction of common subexpressions in mutually exclusive
branches of an if or case statement.

    if test then
      A <= B and (C + D);
    else
      J <= (C + D) or T;
    end if;

In the above code the common factor C + D can be placed outside the if statement, which
might result in the tool generating only one adder instead of two as in the above case.

    temp := C + D;  -- A temporary variable is introduced.
    if test then
      A <= B and temp;
    else
      J <= temp or T;
    end if;

Such minor changes, if made by the designer, can cause the tool to synthesize
better logic and also enable it to concentrate on optimizing more critical areas.

Moving Code
In certain cases an expression whose value does not change from one iteration to the
next might be placed within a for/while loop statement. Typically a synthesis
tool handles a for/while loop statement by unrolling it the specified number of times.
In such cases redundant code might be generated for that particular expression, causing
additional logic to be synthesized. This could be avoided if the expression is moved
outside the loop, thus optimizing the design. Such optimizations performed at a higher
level, that is, within the model, help the optimizer to concentrate on more critical
pieces of the code. An example is given below.

    C := A + B;
    ............
    for i in 0 to 5 loop
      ............
      T := C - 6;
      -- Assumption: C is not assigned a new value within the loop, thus the above
      -- expression would remain constant on every iteration of the loop.
      ............
    end loop;

The above code would generate six subtracters for the expression when only one
is necessary. By modifying the code as given below we could avoid the generation
of unnecessary logic.

    C := A + B;
    ............
    temp := C - 6;  -- A temporary variable is introduced.
    for i in 0 to 5 loop
      ............
      T := temp;
      -- Assumption: C is not assigned a new value within the loop, thus the above
      -- expression would remain constant on every iteration of the loop.
      ............
    end loop;

Constant folding and Dead code elimination
There are cases where the designer leaves in the code certain expressions which are
constant in value. This can be avoided by computing the value of the expression directly,
instead of implementing the logic and then relying on the logic optimizer to eliminate the
additional logic.
Ex:
    C := 4;
    ....
    Y := 2 * C;
Computing the value of Y as 8 and assigning it directly within your code avoids the
above unnecessary code. This method is called constant folding.

The other optimization, dead code elimination, refers to removing those sections of code
which are never executed.

Ex.
    A := 2;
    B := 4;
    if (A > B) then
      ......
    end if;

The body of the above if statement would never be executed and thus should be eliminated
from the code.
The logic optimizer performs these optimizations by itself, but if the
designer optimizes the code accordingly, the tool optimization time is reduced,
resulting in faster tool running times.

Flip-flop and Latch optimizations
Earlier, in the RTL code section, it was described how flip-flops and latches
are inferred from the code by the synthesis tool. However, there are only certain cases
where the inference of the above two elements is necessary. The designer thus should try
to eliminate all the unnecessary flip-flop and latch elements in the design. Placing only
the clock sensitive signals under the edge sensitive statement can eliminate the
unnecessary flip-flops. Similarly the unwanted latches can be avoided by specifying the
values for the signals under all conditions of an if/case statement.

Using Parentheses.
The usage of parentheses is critical to the design, as correct usage might result
in better timing paths.
Ex.
    Result <= R1 + R2 - P + M;

The hardware generated for the above code is given in Figure 4 (a).

[Figure 4 (a). Without using parentheses: three ALUs in series. ALU(+) adds R1 and R2, ALU(-)
subtracts P from that sum, and a final ALU(+) adds M to produce Result.]

If the expression is written using parentheses as given below, the hardware
synthesized would be as given in Figure 4 (b).

    Result <= (R1 + R2) - (P - M);

[Figure 4 (b). After using parentheses: ALU(+) computes R1 + R2 and ALU(-) computes P - M in
parallel, and a final ALU(-) subtracts the second result from the first to produce Result.]

It is clear that after using the parentheses the timing path for the datapath has been
reduced, as it does not need to go through one more ALU as in the earlier case.

Partitioning and structuring the design.
A design should always be structured and partitioned, as this helps in reducing
design complexity and also improves the synthesis run times, since smaller sub-blocks
synthesize faster. Good partitioning results in the synthesis of a good quality
design. General recommendations for partitioning are given below.

• Keep related combinational logic in the same module.
• Partition for design reuse.
• Separate modules according to their functionality.
• Separate structural logic from random logic.
• Limit blocks to a reasonable size (perhaps a maximum of 10K gates per block).
• Partition the top level.
• Do not add glue logic at the top level.
• Isolate the state machine from other logic.
• Avoid multiple clocks within a block.
• Isolate the block that is used for synchronizing the multiple clocks.

Optimization using Design Compiler/Design Analyzer

Optimizing a design to achieve minimum area and maximum speed requires a
lot of experimentation and iterative synthesis. The process of analyzing the
design for speed and area to achieve the fastest logic with minimum area is termed
design space exploration.
Changing the HDL code for the sake of optimization may impact other blocks in the
design or the test benches. For this reason, changing the HDL code to help synthesis is less
desirable and is generally avoided. It is then the designer's responsibility to minimize the
area and meet the timing requirements through synthesis and optimization. The later
versions of DC, starting from DC98, have a compile flow different from previous
versions. In DC98 and later versions, timing is prioritized over area. Another
difference is that DC98 performs compilation to reduce "total negative slack" instead of
"worst negative slack". This ability of DC98 produces better timing results but has some
impact on area. Also, DC98 requires designers to specify area constraints explicitly, as
opposed to the previous versions that automatically handled area minimization. Generally
some area cleanup is performed, but better results are obtained when constraints are
specified.

DC has three different compilation strategies. It is up to the user's discretion to
choose the most suitable compilation strategy for a design.

a) Top-down hierarchical compile method.
b) Time-budget compile method.
c) Compile-characterize-write-script-recompile (CCWSR) method.
Top-down hierarchical Compile
Prior to the release of DC98 this method was used to synthesize small designs, as
it was extremely memory intensive and took a lot of time for large designs. In
this method the source is compiled by reading the entire design, with constraints and
attributes applied only at the top level.
DC98 provided Synopsys the capability to synthesize million-gate designs by
tackling much larger blocks (>100K) at a time. This approach is feasible for some designs
depending on the design style (single clock etc.) and other factors. One may use this
technique to synthesize larger blocks at a time by grouping the sub-blocks together and
flattening them to improve timing.

Advantages
• Only top-level constraints are needed.
• Better results due to optimization across the entire design.
Disadvantages
• Long compile time.
• Incremental changes to the sub-blocks require complete re-synthesis.
• Does not perform well if the design contains multiple clocks or generated clocks.

Time-budgeting compile.
This process is best for properly partitioned designs with timing
specifications defined for each sub-block. Because timing requirements are specified for
each block, multiple synthesis scripts for individual blocks are produced. The synthesis is
usually performed bottom-up, i.e., starting at the lowest level and going up to the topmost
level. This method is useful for medium to very large designs and does not require large
amounts of memory.

Advantages
• Design easier to manage due to individual scripts.
• Incremental changes to sub-blocks do not require complete re-synthesis.
• Can be used for any style of design, e.g. multiple and generated clocks.
Disadvantages
• Difficult to keep track of multiple scripts.
• Critical paths seen at the top level may not be critical at a lower level.
• Incremental compilations may be needed for fixing DRCs.

Advantages
• Less memory intensive.
• Good quality of results because of optimization between sub-blocks of the design.
• Produces individual scripts, which may be modified by the user.
Disadvantages
• The generated scripts are not easily readable.
• It is difficult to achieve convergence between blocks.
• Lower block changes might need complete re-synthesis of the entire design.

Resolving Multiple instances
Before proceeding to optimization, one needs to resolve multiple instances of the
sub-blocks of the design. This is a necessary step, as DC does not permit compilation until
multiple instances are resolved.
Ex: Let us say moduleA has been synthesized. Now moduleB, which has two
instantiations of moduleA as U1 and U2, is being compiled. The compilation will be
stopped with an error message stating that moduleA is instantiated 2 times in moduleB.
There are two methods of resolving this problem. You can set a dont_touch attribute on
moduleA before synthesizing moduleB, or uniquify moduleB. uniquify, a dc_shell
command, creates unique definitions of multiple instances. So for the above case it
generates moduleA_u1 and moduleA_u2 (in VHDL), corresponding to instances U1 and
U2 respectively.

Optimization Techniques
Various optimization techniques that help in achieving better area and speed for
your design are given below.

Compile the design
The compilation process maps the HDL code to actual gates specified in the
target library. This is done through the compile command. The syntax is given below:

    compile -map_effort <low | medium | high>
            -incremental_mapping
            -in_place
            -no_design_rule | -only_design_rule
            -scan

Compile-Characterize-Write-Script-Recompile
This is an advanced synthesis approach, useful for medium to very large designs The compile command by default uses the –map_effort medium option. This
that do not have good inter-block specifications defined. It requires constraints to be usually produces the best results for most of the designs. It also default settings for the
applied at the top level of the design, with each sub-block compiled beforehand. The sub- structuring and flattening attributes. The map_effort high should only be used, if target
blocks are then characterized using the top-level constraints. This in effect propagates the objectives are not met through default compile.
required timing information from the top-level to the sub-blocks. Performing a The -incremental_mapping is used only after initial compile as it works only at
write_script on the characterized sub-blocks generates the constraint file for each sub- gate-level. It is used to improve timing of the logic.
block. The constraint files are then used to re-compile each block of the design.
Flattening and structuring
Flattening implies reducing the logic of a design to a 2-level AND/OR
representation. This approach is used to optimize the design by removing all intermediate
variables and parentheses. This option is set to “false” by default.
The optimization is performed in two stages. The first stage involves
flattening and structuring, and the second stage involves mapping of the resulting design
to actual gates, using mapping optimization techniques.

Flattening
Flattening reduces the design logic to a two-level sum-of-products form,
with few logic levels between the input and output. This results in faster logic. It is
recommended for unstructured designs with random logic. The flattened design can then
be structured before final mapping optimization to reduce area. This is important as
flattening has a significant impact on the area of the design.
In general one should compile the design using default settings (flatten is set to
false and structure to true). If timing objectives are not met, flattening and structuring
should be employed. If the design is still failing its goals, then just flatten the design
without structuring it. The command for flattening is given below.

    set_flatten <true | false>
                -design <list of designs>
                -effort <low | medium | high>
                -phase <true | false>

The -phase option, if set to true, enables DC to compare the logic produced by
inverting the equation versus the non-inverted form of the equation.

Structuring
The default setting for this is “true”. This method adds intermediate variables that
can be factored out. This enables sharing of logic, which in turn results in a reduction of
area. For example:

    Before structuring        After structuring
    P = ax + ay + c           P = aI + c
    Q = x + y + z             Q = I + z
                              I = x + y

The shared logic generated might affect the total delay of the logic. Thus one
should be careful to specify realistic timing constraints, in addition to using
default settings.
Structuring can be set for timing (default) or Boolean optimization. The latter
helps in reducing area, but has a greater impact on timing. Thus circuits that are timing
sensitive should not be structured for Boolean optimization. Good examples for Boolean
optimization are random logic structures and finite state machines. The command for
structuring is given below.

    set_structure <true | false>
                  -design <list of designs>
                  -boolean <low | medium | high>
                  -timing <true | false>

If the design is not timing critical and you want to minimize for area only, then set
the area constraints (set_max_area 0) and perform Boolean optimization. In all other
cases, structure with respect to timing only.

Removing hierarchy
DC by default maintains the original hierarchy that is given in the RTL code. The
hierarchy is a logic boundary that prevents DC from optimizing across this boundary.
Unnecessary hierarchy leads to cumbersome designs and synthesis scripts and also limits
the DC optimization to within that boundary, without optimizing across hierarchy.
To allow DC to optimize across hierarchy one can use the following commands.

    dc_shell> current_design <design name>
    dc_shell> ungroup -flatten -all

This allows DC to optimize the logic separated by boundaries as one block of logic,
resulting in better timing and an optimal solution.

Optimizing for Area
DC by default tries to optimize for timing. Designs that are not timing critical but
area intensive can be optimized for area. This can be done by initially compiling the
design with specification of area requirements, but no timing constraints. In addition, by
using the don_touch attribute on the high-drive strength gates that are larger in size, used
by default to improve timing, one can eliminate them, thus reducing the area
considerably.
Once the design is mapped to gates, the timing and area constraints should again
be specified (normal synthesis) and the design re-compiled incrementally. The
incremental compile ensures that DC maintains the previous structure and does not bloat
the logic unnecessarily.

The following points can be kept in mind for further area optimization:

1) Bind all combinational logic as much as possible. If combinational logic were spread
   over different blocks of the design, the optimization of the logic would not be perfect,
   resulting in large areas. So better partitioning of the design, with combinational logic
   not spread out among different blocks, would result in better area.
2) At the top level avoid any kind of glue logic. It is better to incorporate glue logic in
   one of the sub-components, thus letting the tool optimize the logic better.
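
The structuring example given earlier in this section (P = ax + ay + c, Q = x + y + z) can be
checked by brute force. The following is a minimal sketch in Python, not something Design
Compiler runs: the function names are hypothetical, and '+' and juxtaposition are read as
Boolean OR and AND. It simply confirms that factoring out the shared term I = x + y leaves
both outputs unchanged for every input combination.

```python
from itertools import product


def before_structuring(a, x, y, z, c):
    """Original equations: P = ax + ay + c, Q = x + y + z."""
    p = (a and x) or (a and y) or c
    q = x or y or z
    return p, q


def after_structuring(a, x, y, z, c):
    """Structured equations: the shared term I = x + y is factored out."""
    i = x or y
    p = (a and i) or c
    q = i or z
    return p, q


# Compare both forms over all 2**5 combinations of the five inputs.
equivalent = all(
    before_structuring(*bits) == after_structuring(*bits)
    for bits in product([False, True], repeat=5)
)
print("structured form matches original:", equivalent)
```

The check says nothing about delay: the shared term I adds a logic level on the path to P,
which is why realistic timing constraints still matter when structuring is enabled.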
Timing issues
There are two kinds of timing issues that are important in a design: setup and hold
timing violations.

Setup Time: It indicates the time before the clock edge during which the data should be
valid, i.e. it should be stable during this period and should not change. Any change during
this period would trigger a setup timing violation. Figure 5(a) illustrates an example with
setup time equal to 2 ns. This means that signal DATA must be valid 2 ns before the clock
edge; i.e. it should not change during this 2 ns period before the clock edge.

Hold Time: It indicates the time after the clock edge during which the data should be held
valid, i.e. it should not change but remain stable. Any change during this period would
trigger a hold timing violation. Figure 5(b) illustrates an example with hold time equal to
1 ns. This means that signal DATA must be held valid 1 ns after the clock edge; i.e. it
should not change during the 1 ns period after the clock edge.

Figure 5(a). Timing diagram for setup time on DATA (setup time > 2 ns)
Figure 5(b). Timing diagram for hold time on DATA (hold time > 1 ns)

The synthesis tool automatically runs its internal static timing analysis engine to check
for setup and hold time violations for the paths that have timing constraints set on them.
It mostly uses the following two equations to check for the violations.

    Tprop + Tdelay < Tclock - Tsetup    (1)
    Tdelay + Tprop > Thold              (2)

Here Tprop is the propagation delay from input clock to output of the device in question
(mostly a flip-flop); Tdelay is the propagation delay across the combinational logic
through which the input arrives; Tsetup is the setup time requirement of the device;
Tclock is the clock period; Thold is the hold time requirement of the device.
So if the propagation delay across the combinational logic, Tdelay, is such that
equation (1) fails, i.e. Tprop + Tdelay is more than Tclock - Tsetup, then a setup timing
violation is reported. Similarly, if equation (2) fails, i.e. Tdelay + Tprop is less than
Thold, then a hold timing violation is reported. In the case of the setup violation the
input data arrives late due to a large Tdelay across the combinational logic and thus is not
valid (or is unstable) during the setup time period, triggering a violation. The flip-flop
needs a certain time to read the input. During this period the data must remain stable and
unchanged; any change would result in improper working of the device and thus a
violation. In the case of a hold timing violation the data arrives faster than usual because
Tdelay + Tprop is not enough to delay the data sufficiently. The flip-flop needs some time
to store the data, during which the data should remain stable. Any change during this
period would result in a violation. The data changes without giving the flip-flop
sufficient time to read it, thus triggering a violation.
When the synthesis tool reports timing violations the designer needs to fix them. There
are three options for the designer to fix these violations.

1) Optimization using synthesis tool: this is the easiest of the options. A few of
   the techniques have been discussed in the section Optimization Techniques above.
   A few other techniques will be dealt with later in this section.

2) Microarchitectural Tweaks: This is a manual approach compared to the previous one.
   Here the designer should modify code to make microarchitectural changes that affect
   the timing of the design. Some of these techniques were discussed in the section
   Optimization Techniques and a few new ones are dealt with in this section.

3) Architectural changes: This is the last option, as the designer needs to change the
   whole architecture of the design under consideration, which would take a long time.

Optimization using synthesis tool
The tool can be used to tweak the design for improving performance. A designer
can employ the following ways for performance optimization.

a) Compilation with a map_effort high option;
b) Group critical paths together and give them a weight factor;
c) Register balancing;
d) Choose a specific implementation for a module;
e) Balancing heavy loading.

Compilation with a map_effort high
The initial compilation of a design is done with map_effort as medium when
employing design constraints. This usually gives the best results with flattening and
structuring options. In case the desired results are not met, i.e. the design generates some
timing violations, then a map_effort of high can be set. This usually takes a long time
to run and thus is not used as the first option. This compilation could improve design
performance by about 10%.

Group critical paths and assign a weight factor
We can use the group_path command to group critical timing paths and set a
weight factor on these critical paths. The weight factor indicates the effort the tool needs
to spend to optimize these paths. The larger the weight factor, the greater the effort. This
command allows the designer to prioritize the critical paths for optimization using the
weight factor.

    group_path -name <group_name> -from <starting_point> -to <ending_point> -weight <value>

Register balancing
This command is particularly useful with designs that are pipelined. The
command reshuffles the logic from one pipeline stage to another. This allows extra logic
to be moved away from overly constrained pipeline stages to less constrained ones with
additional timing. The command is simply balance_registers.

Choose a specific implementation for a module
A synthesis tool infers high-level functional modules for operators like ‘+’, ‘-’,
‘*’, etc. However, depending upon the map_effort option set, the design compiler would
choose the implementation for the functional module. For example the adder has the
following kinds of implementation.

a) Ripple carry - rpl
b) Carry look ahead - cla
c) Fast carry look ahead - clf
d) Simulation model - sim

The implementation type sim is only for simulation. Implementation types rpl, cla,
and clf are for synthesis; clf is the fastest implementation, followed by cla, the slowest
being rpl.
If compilation with map_effort low is set, the designer can manually set the
implementation using the set_implementation command. Otherwise the selection will
not change from the current choice. If the map_effort is set to medium the design compiler
would automatically choose the appropriate implementation depending upon the
optimization algorithm. A choice of medium map_effort is suitable for better optimization,
or even a manual setting can be used for better performance results.

Balancing heavy loading
Designs generally have certain nets with heavy fanout generating a heavy load on
a certain point. A large load would be difficult to drive by a single net. This leads to
unnecessary delays and thus timing violations. The balance_buffers command comes in
handy to solve such problems. This command makes the design compiler create
buffer trees to drive the large fanout and thus balance the heavy load.

Microarchitectural Tweaks
The design can be modified for both setup timing violations as well as hold timing
violations. Let us deal with setup timing violations first.
When a design with setup violations cannot be fixed with tool optimizations,
code or microarchitectural implementation changes should be employed.
The following methods can be used for this purpose.
a) Logic duplication to generate independent paths
b) Balancing of logic between flip-flops
c) Priority decoding versus multiplex decoding

Logic duplication to generate independent paths
Consider the figure 5(a). Assuming a critical path exists from A to Q2, logic optimization
on combinational logic X, Y, and Z would be difficult because X is shared with Y and Z.
We can duplicate the logic X as shown in figure 5(b). In this case Q1 and Q2 have
independent paths and the path for Q2 can be optimized in a better fashion by the tool to
ensure better performance.

Figure 5(a). Logic with Q2 critical path (inputs A, B and C; shared logic X feeding Y
for output Q1 and Z for output Q2).
Figure 5(b). Logic duplication giving Q2 an independent path (A and B drive a combined
X+Y block for Q1; A, B and C drive a separate X+Z block for Q2).

Logic duplication can also be used in cases where a module has one signal
arriving late compared to other signals. The logic can be duplicated in front of the
fast-arriving signals such that timing of all the signals is balanced. Figure 6(a)&(b)
illustrate this fact quite well. The signal Q might generate a setup violation as it might
be delayed due to the late-arriving select signal of the multiplexer. The combinational
logic present at the output could be put in front of the inputs (fast arriving). This would
cause the delay
due to the combinational logic to be used appropriately to balance the timing of the inputs
of the multiplexer, thus avoiding the setup violation for Q.

Figure 6(a). Multiplexer with late-arriving sel signal (inputs A and B, select sel, logic C
at the multiplexer output driving Q).
Figure 6(b). Logic duplication for balancing of timing between the signals (logic C
duplicated on inputs A and B ahead of the multiplexer).

Balancing of logic between flip-flops
This concept is similar to the balance_registers command we have come across
in the Tool optimization section. The difference is that the designer does this at the code
level. To fix setup violations in designs using pipeline stages the logic between each stage
should be balanced. Consider a pipeline stage consisting of three flip-flops and two
combinational logic modules, one between each pair of flip-flops. Suppose the delay of the
first logic module is such that it violates the setup time of the second flip-flop by a large
margin, and the delay of the second logic module is so small that the data on the third
flip-flop is comfortably meeting the setup requirement. We can then move part of the first
logic module to the second logic module so that the setup time requirement of both the
flip-flops is met. This would ensure better performance without any violations taking
place. Figure 7 illustrates the example.

Figure 7. Logic with pipeline stages (three D flip-flops in series on a common clock, with
logic modules X and Y between them).

Priority encoding versus multiplex encoding
When a designer knows for sure that a particular input signal is arriving late then
priority encoding would be a good bet. The signals arriving earlier could be given more
priority and thus can be encoded before the late-arriving signals.
Consider the boolean equation:

    Q = A.B.C.D.E.F

It can be designed using five AND gates with A, B at the first gate. The output of the
first gate is ANDed with C, the output of the second gate with D, and so on. This would
ensure proper performance if signal F is the latest to arrive and A is the earliest to arrive.
If the propagation delay of each AND gate were 1 ns, the output signal Q
would be valid only 5 ns after A is valid, or only 1 ns after signal F is valid.

Multiplex decoding is useful if all the input signals arrive at the same time. This
would ensure that the output would be valid at a faster rate. Thus multiplex decoding is
faster than priority decoding if all input signals arrive at the same time. In this case, for
the boolean equation above, the inputs would be ANDed in parallel, two at a time, in the
form of A.B, C.D and E.F; these outputs would then be ANDed again to get the final
output. This would ensure Q to be valid in about 2 ns after A is valid.

Fixing Hold time violations
Hold time violations occur when signals arrive too fast, causing them to change
before they are read in by the devices. The best method to fix paths with hold time
violations is to add buffers in those paths. The buffers generate additional delay, slowing
the path considerably. One has to be careful while fixing hold time violations. Too many
buffers would slow down the signal a lot and might result in setup violations, which is a
problem again.
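
The buffer trade-off described under Fixing Hold time violations can be illustrated
numerically with equations (1) and (2). The sketch below is only an illustration in Python
and is not part of the Design Compiler flow. The function names, the 10 ns clock period,
the 0.3 ns clock-to-output delay, the 0.2 ns logic delay and the 0.3 ns buffer delay are all
assumptions chosen for the example; the 2 ns setup and 1 ns hold requirements are the
values used in the Figure 5 timing examples. All times are in picoseconds so that the
arithmetic stays in integers.

```python
# Assumed example values, in picoseconds (not taken from any real library).
T_CLOCK = 10_000   # clock period
T_SETUP = 2_000    # setup requirement (2 ns, as in the Figure 5(a) example)
T_HOLD = 1_000     # hold requirement (1 ns, as in the Figure 5(b) example)
T_PROP = 300       # clock-to-output delay of the launching flip-flop
T_DELAY = 200      # combinational delay before any buffers are added
T_BUFFER = 300     # delay added by each inserted buffer


def setup_slack(t_prop, t_delay, t_clock, t_setup):
    """Equation (1): Tprop + Tdelay < Tclock - Tsetup. Positive slack passes."""
    return (t_clock - t_setup) - (t_prop + t_delay)


def hold_slack(t_prop, t_delay, t_hold):
    """Equation (2): Tdelay + Tprop > Thold. Positive slack passes."""
    return (t_prop + t_delay) - t_hold


def check_path(n_buffers):
    """Return (setup slack, hold slack) after inserting n_buffers buffers."""
    t_delay = T_DELAY + n_buffers * T_BUFFER
    return (
        setup_slack(T_PROP, t_delay, T_CLOCK, T_SETUP),
        hold_slack(T_PROP, t_delay, T_HOLD),
    )


for n in (0, 2, 26):
    s, h = check_path(n)
    print(f"{n:2d} buffers: setup slack {s:6d} ps, hold slack {h:6d} ps")
```

With these assumed numbers the unbuffered path fails the hold check, two buffers fix it
while leaving ample setup slack, and an excessive number of buffers turns the hold fix into
a setup violation.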