Adaptive Dynamic Relaxation Algorithm For Non-Linear Hyperelastic Structures


Computer methods in applied mechanics and engineering
ELSEVIER    Comput. Methods Appl. Mech. Engrg. 126 (1995) 91-109

Adaptive Dynamic Relaxation algorithm for non-linear hyperelastic structures
Part II. Single-processor implementation
David R. Oakley a,*,1, Norman F. Knight, Jr. b,2
a Applied Research Associates, Inc., Raleigh, NC 27615, USA
b Old Dominion University, Norfolk, VA 23529-0247, USA

Received 20 December 1993

Abstract

An efficient single-processor implementation of the Adaptive Dynamic Relaxation (ADR) algorithm is developed. It is
designed to exploit data locality and can exploit vectorization of the finite element computations. The ADR algorithm is used to
solve for the non-linear static response of two- and three-dimensional hyperelastic systems involving frictionless contact.
Performance is compared with an existing finite element code which utilizes a direct solution method. ADR is found to be reliable
and highly vectorizable, and it outperforms the direct solution method for the highly non-linear problems considered. In addition,
it permits the use of a very simple and efficient contact algorithm. In contrast to direct solution methods, ADR has minimal
memory requirements and is easily parallelizable and scalable to more processors. For the class of problems addressed, it
represents a very promising approach for parallel-vector processing.

1. Introduction

Use of the finite element method to solve structural problems of increasing computational size and
complexity continues to be the focus of intense research. Yet even on current high-speed vector
computers, solution costs, especially for transient dynamic analyses, are often prohibitive. Emerging
high-performance computers offer tremendous speedup potential for these types of applications,
provided an optimal solution strategy is implemented. Existing sequential solution procedures may be
adapted to operate on these computers. However, these procedures have been developed and
customized for sequential operation and may not be the best approach for multiprocessor computers.
To exploit this potential fully, problem formulations and solution strategies need to be re-evaluated in
light of their suitability for parallel and vector processing. As such, the overall goal of this research is to
develop an adaptive algorithm for predicting static and dynamic response of non-linear hyperelastic
structures which exploits these emerging high-performance computing systems.
The basic formulation for the adaptive dynamic relaxation (ADR) algorithm for hyperelastic
structures is given by Oakley and Knight [1]. Dynamic relaxation is a technique by which the static
solution is obtained by determining the steady-state response to the transient dynamic analysis for an
autonomous system. In this case, the transient part of the solution is not of interest, only the

* Corresponding author.
1 Senior Engineer. Former Dean's Scholar, Department of Mechanical Engineering, Clemson University, Clemson, SC.
2 Associate Professor, Department of Aerospace Engineering.

0045-7825/95/$09.50 © 1995 Elsevier Science S.A. All rights reserved
SSDI 0045-7825(95)00806-3

steady-state response is desired. Since the transient solution is not desired, fictitious mass and damping
matrices which no longer represent the physical system are chosen to accelerate the determination of
the steady-state response. These matrices are redefined (using existing equations) so as to produce the
most rapid convergence. For highly non-linear problems where stiffness changes significantly during the
analysis, adaptive techniques exist which automatically update the integration parameters when
necessary [4].
An ADR algorithm represents a unified approach for both static and transient dynamic analyses, and
is known to be very competitive for certain problems with high non-linearities and instabilities [4-6].
Reliability is ensured by integration parameters which are adaptively changed throughout the analysis
to accommodate these non-linear effects. Although a very small time step is generally required to
ensure numerical stability, the computational cost per time step is very low and is mostly associated
with evaluation of the internal force vector.
The present paper represents the second part of a three-part study to evaluate the potential of ADR
for solving complex structural problems on parallel-vector computers. The formulation of an ADR
algorithm with application to non-linear hyperelastic structures is presented in [1] including a complete
derivation of the algorithm and the problem-adaptive scheme used to ensure reliability and improve
performance. Finite element equations are derived for the non-linear analysis of elastic and hyperelastic
solids subject to large deformations. A very simple and efficient algorithm based on solver constraints is
developed to enforce frictionless contact conditions. Only structured meshes of a single material are
considered. This allows a single time step to be used for the entire spatial domain. Otherwise, efficient
analysis could require the use of different time steps in different parts of the finite element mesh leading
to a mixed time integration or subcycling implementation of ADR (see [7]).
The objective of this paper is to develop and evaluate the performance of a single-processor
implementation of ADR. A new organization of the finite element computations is implemented to
exploit vector processing. Two- and three-dimensional test cases are used to assess the analysis and
performance capabilities of the algorithm. ADR performance for the non-linear static analysis of each
test case is compared with that of an existing finite element code which employs the Newton-Raphson
procedure and a highly optimized Cholesky skyline solver. Relative speedups due to vectorization are
also presented.
The remainder of this paper is organized as follows. In Section 2, the basic formulation of ADR is
reviewed, and the sequential implementation is presented. In Section 3, details regarding the
vectorization approach are given. In Section 4, the test cases are described and then analysis and
performance results are summarized and discussed. Conclusions are given in Section 5.

2. Sequential implementation

This section begins with a review of the basic formulation of the ADR algorithm given in [1].
Afterwards, a flowchart of the complete sequential solution process for single-processor computers is
presented. A brief description of a Convex C240 computer is also given.

2.1. ADR formulation

The ADR algorithm is based on the following semi-discrete equations of motion governing structural
dynamic response for the nth time increment

$$M \ddot{D}^{n} + C \dot{D}^{n} + F(D^{n}) = P^{n} \qquad (2.1)$$

where M is a diagonal mass matrix, C is a damping matrix, F is the internal force vector, and P is a
vector of external loads. The vectors $\ddot{D}$, $\dot{D}$ and $D$ represent the acceleration, velocity and displacement
vectors, respectively. Internal force F is a function of the displacements and may be assembled on an
element-by-element basis [1].
ADR involves the use of an explicit numerical time integration technique to solve Eq. (2.1). In the
current algorithm, a half-station, central-difference technique is used which provides the following
approximations for the temporal derivatives

$$\dot{D}^{n+1/2} = \frac{1}{h}\left(D^{n+1} - D^{n}\right) \qquad (2.2)$$

$$\ddot{D}^{n} = \frac{1}{h}\left(\dot{D}^{n+1/2} - \dot{D}^{n-1/2}\right) \qquad (2.3)$$

where h is a fixed time increment and $\dot{D}^{n}$ is averaged over a time step as

$$\dot{D}^{n} = \frac{1}{2}\left(\dot{D}^{n+1/2} + \dot{D}^{n-1/2}\right) \qquad (2.4)$$

Substituting Eqs. (2.3) and (2.4) into Eq. (2.1) and assuming mass-proportional damping (C = cM)
yields the fundamental time-marching equations for advancing the velocity and displacement vectors to
the next time step. Thus,

$$\dot{D}^{n+1/2} = \frac{2 - ch}{2 + ch}\,\dot{D}^{n-1/2} + \frac{2h}{2 + ch}\,M^{-1}\left(P^{n} - F^{n}\right) \qquad (2.5)$$

$$D^{n+1} = D^{n} + h\,\dot{D}^{n+1/2} \qquad (2.6)$$


Using an explicit time integration technique, the resulting system of equations is linear, even for
non-linear problems. Also, if a diagonal mass matrix is used, the matrix inverse of M is a trivial
computation, and these equations represent an uncoupled system of algebraic equations in which each
solution component may be computed independently. For transient dynamic analysis, a time history of
displacements (system response) is sought. Mass and damping vectors which best model the physical
properties of the system are used. Techniques for estimating the maximum allowable time step size are
available, such that the time step size may change during the transient dynamic analysis. As such,
explicit time integration techniques are attractive candidates for implementation on high-performance
computers. These techniques generally have low memory and communication requirements but are also
only conditionally stable numerically.
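The uncoupled character of the time-marching update can be illustrated with a minimal sketch. It assumes a diagonal mass stored as a plain list and mass-proportional damping C = cM; the function name `advance`, the two-spring stand-in for F(D), and all numerical values are hypothetical and are not taken from the authors' code.

```python
def advance(d, v_half, m_diag, c, h, p, f):
    """Advance velocity and displacement by one explicit step.

    v_new = (2 - c*h)/(2 + c*h) * v_half + 2*h/(2 + c*h) * (p - f)/m
    d_new = d + h * v_new
    Each component is updated independently because the mass is diagonal.
    """
    a = (2.0 - c * h) / (2.0 + c * h)   # carry-over factor for the old velocity
    b = 2.0 * h / (2.0 + c * h)         # scaling of the force imbalance
    v_new = [a * vi + b * (pi - fi) / mi
             for vi, pi, fi, mi in zip(v_half, p, f, m_diag)]
    d_new = [di + h * vi for di, vi in zip(d, v_new)]
    return d_new, v_new


# Hypothetical two-DOF example: uncoupled linear springs stand in for F(D).
stiffness = [4.0, 9.0]
d0, v0 = [0.0, 0.0], [0.0, 0.0]
f0 = [k * x for k, x in zip(stiffness, d0)]
d1, v1 = advance(d0, v0, m_diag=[1.0, 1.0], c=0.0, h=0.1, p=[1.0, 1.0], f=f0)
```

No linear system is solved at any point: the only division is by the diagonal mass entries, which is why the cost per step is dominated by the evaluation of the internal force.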
The objective of a static analysis using ADR is to obtain the steady-state solution of the pseudo-transient
response. Thus, in ADR each time step is in fact an iteration. The mass and damping
parameters need not represent the physical system. Instead, they are defined so as to produce the most
rapid convergence, where convergence herein is based on a relative error of the force imbalance or

$$\frac{\|P - F^{n}\|}{\|P\|} = \frac{\|R^{n}\|}{\|P\|} \le e_{tol} \qquad (2.7)$$

where P is the static load and $R^{n}$ is the residual force vector for time step n. Note that when
convergence is obtained, the internal forces balance the external forces and the inertial terms vanish;
that is, the steady-state or static solution is obtained.
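As a small illustration of the force-imbalance criterion, the sketch below evaluates the relative residual norm and compares it with a tolerance. The function names and the default tolerance `e_tol=1.0e-3` are assumptions made for the example only; the source does not state the tolerance value used.

```python
import math


def norm(x):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(xi * xi for xi in x))


def force_imbalance_error(p, f):
    """Relative force imbalance ||P - F|| / ||P||."""
    residual = [pi - fi for pi, fi in zip(p, f)]
    return norm(residual) / norm(p)


def has_converged(p, f, e_tol=1.0e-3):
    """True when the relative force imbalance is within the (assumed) tolerance."""
    return force_imbalance_error(p, f) <= e_tol


# Hypothetical load and internal force vectors.
P = [3.0, 4.0]
err_far = force_imbalance_error(P, [3.0, 4.05])    # residual norm 0.05, ||P|| = 5
done_far = has_converged(P, [3.0, 4.05])
done_near = has_converged(P, [3.0, 4.0001])
```

Because ||P|| is constant for a static load, it only needs to be computed once and can be reused at every iteration.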
A derivation of the fictitious mass and damping equations for ADR is given in [1]. Based on
Gerschgorin's theorem, the new fictitious mass matrix is defined as

$$M_{ii} \ge \frac{h^{2}}{4}\sum_{j}\left|K_{ij}\right| = \frac{h^{2}}{4}\,S_{ii} \qquad (2.8)$$

where S represents a lumped stiffness which may be computed on an element-by-element basis as

$$S = \sum_{e} s_{e}, \quad \text{where} \quad (s_{e})_{ii} = \sum_{j}\left|k_{ij}\right|_{e} \qquad (2.9)$$

The quantity $k_{ij}$ in this expression corresponds to the element stiffness matrix $k_{e}$ (see [1]). As shown in
Eq. (2.8), the fictitious mass M and time step size h are not independent. However, once either is
specified the other value may be readily computed. Herein, the time step size is arbitrarily set to one.
Numerical experiments with other values (e.g. 10, 100, 1000) confirm that this choice is arbitrary and no
change in algorithm performance or convergence characteristics is observed. The new damping
coefficient c is computed using

$$c = 2\sqrt{\lambda_{0}}, \quad \text{where} \quad \lambda_{0} \approx \frac{(\dot{D}^{n-1/2})^{T} S^{n} \dot{D}^{n-1/2}}{(\dot{D}^{n-1/2})^{T} M \dot{D}^{n-1/2}} \qquad (2.10)$$

where $\lambda_{0}$ represents the lowest eigenvalue estimated from the mass-stiffness Rayleigh quotient shown.
The vector $S^{n}$ is a diagonal estimator of the directional stiffness after step n, the components of which
are given by

$$S_{ii}^{n} = \frac{F_{i}^{n} - F_{i}^{n-1}}{h\,\dot{D}_{i}^{n-1/2}} \qquad (2.11)$$
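The integration parameters described above can be sketched as follows, assuming the usual Gerschgorin row-sum form of the lumped stiffness and a Rayleigh-quotient estimate of the lowest eigenvalue. The helper names, the two-bar example and the fallback of c = 0 for a non-positive quotient are assumptions of this sketch, not details confirmed by the source.

```python
import math


def lumped_stiffness(element_k, connectivity, ndof):
    """Assemble S_ii as the sum over elements of the row sums of |k_ij|."""
    S = [0.0] * ndof
    for k_e, dofs in zip(element_k, connectivity):
        for a, gi in enumerate(dofs):
            S[gi] += sum(abs(kij) for kij in k_e[a])
    return S


def fictitious_mass(S, h=1.0):
    """Smallest diagonal mass satisfying M_ii >= (h^2 / 4) * S_ii."""
    return [0.25 * h * h * s for s in S]


def damping_coefficient(v_half, S_dir, M):
    """c = 2*sqrt(lambda_0), with lambda_0 from the mass-stiffness Rayleigh quotient."""
    num = sum(v * s * v for v, s in zip(v_half, S_dir))
    den = sum(v * m * v for v, m in zip(v_half, M))
    if num <= 0.0 or den <= 0.0:
        return 0.0                      # assumed fallback when the estimate is unusable
    return 2.0 * math.sqrt(num / den)


# Hypothetical mesh: two 2-node bar elements with stiffness 2, three global DOFs.
k_bar = [[2.0, -2.0], [-2.0, 2.0]]
S = lumped_stiffness([k_bar, k_bar], [(0, 1), (1, 2)], ndof=3)
M = fictitious_mass(S, h=1.0)
c = damping_coefficient([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], M)
```

Since S and M are diagonal, both are stored as vectors, and the damping update costs only two dot products per step.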

The new damping coefficient c is updated every time step since it only involves minimal computations.
The new mass matrix must be evaluated at the beginning of the analysis, and in addition,
subsequent updates are sometimes necessary to maintain numerical stability. Stability is determined
using the perturbed apparent-frequency error measure [4] where values of $e$ greater than one represent
unstable conditions. This error measure is given by

$$e = \frac{h^{2}}{4}\,\frac{\left|\ddot{D}_{i}^{n} - \ddot{D}_{i}^{n-1}\right|}{\left|D_{i}^{n} - D_{i}^{n-1}\right|} \qquad (2.12)$$

2.2. Solution process

A flowchart of the sequential implementation of ADR is shown in Fig. 1. It begins with initialization
of time step (or iteration) counter n, and the two key flags istif and iend. The istif flag controls
evaluation of lumped stiffness matrix S and fictitious mass matrix M. After the first iteration, these
quantities are only updated as necessary to maintain numerical stability. The iend flag controls exits
from the solution process. It is activated when convergence is achieved, when numerical problems (such
as collapsing elements) are detected, or when the number of time steps (or iterations) has reached the
user-specified limit.

Phase 1   for each element: compute f_e {and s_e}
          compute F {and S}
Phase 2   {update M (2.8)}
          compute c (2.10)
Phase 3   compute \dot{D}^{n+1/2} (2.5)
          compute D^{n+1} (2.6)
          adjust D^{n+1} for contact
          evaluate stability (2.12)
          if (unstable) istif = 1
Phase 4   compute force imbalance
          compute error measure (2.7)

Fig. 1. Sequential ADR algorithm. Numbers in parentheses refer to equation numbers in the text.
As shown in Fig. 1, a four-phase process is executed for each time step. Phase 1 consists of finite
element computations in which internal force F and possibly lumped stiffness S are evaluated on an
element-by-element basis. Note that items inside the curly brackets { . . . } may or may not be evaluated
on each pass through this phase. Phase 2 is the adaptive stage in which integration parameters are
updated. If necessary, S may be used to re-evaluate M based on Eq. (2.8). Scalar quantities $\dot{D}^{T} S \dot{D}$ and
$\dot{D}^{T} M \dot{D}$ are computed and used to evaluate damping coefficient c in accordance with Eq. (2.10). Phase 3
is the solution step for the n + 1 step in which displacements for the next time step are computed from
Eqs. (2.5) and (2.6). Contact conditions are also enforced in this phase using the solver-constraints
technique described in [1]. If unstable conditions exist as determined from Eq. (2.12), istif is reset to 1.
Phase 4 evaluates the force imbalance and convergence. Scalar quantity $\|R\|$ is computed and then the
error is determined using Eq. (2.7). The scalar quantity $\|P\|$ in Eq. (2.7) is constant and is determined
once at the beginning of the analysis.
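The four-phase process can be sketched on a deliberately small problem. The example below is not the authors' implementation: it assumes a chain of four mildly hardening springs fixed at the left end, omits contact and the stability check of Phase 3, evaluates M only once with an assumed 10% margin, and tests convergence before the update so that the returned displacements match the reported residual. All names and constants are hypothetical.

```python
import math

K_LIN, BETA = 100.0, 100.0   # hypothetical spring constants (mild hardening)
N_EL = 4                     # bar elements in series; the left end is fixed
H = 1.0                      # pseudo time step, set to one as in the text


def phase1(d, with_stiffness):
    """Phase 1: element loop for internal force F and, if requested, lumped S."""
    F = [0.0] * N_EL
    S = [0.0] * N_EL if with_stiffness else None
    for el in range(N_EL):
        left, right = el - 1, el                 # free-DOF indices; -1 is the fixed node
        stretch = d[right] - (d[left] if left >= 0 else 0.0)
        force = K_LIN * stretch + BETA * stretch ** 3
        F[right] += force
        if left >= 0:
            F[left] -= force
        if with_stiffness:
            row_sum = 2.0 * (K_LIN + 3.0 * BETA * stretch ** 2)   # |k| + |-k|
            S[right] += row_sum
            if left >= 0:
                S[left] += row_sum
    return F, S


def solve(P, e_tol=1.0e-6, max_steps=10000):
    d, v = [0.0] * N_EL, [0.0] * N_EL
    M, F_prev, istif = None, None, 1
    p_norm = math.sqrt(sum(x * x for x in P))
    for step in range(max_steps):
        F, S = phase1(d, bool(istif))                              # Phase 1
        if istif:                                                  # Phase 2: mass
            M = [1.1 * 0.25 * H * H * s for s in S]                # assumed 10% margin
            istif = 0
        c = 0.0                                                    # Phase 2: damping
        if F_prev is not None:
            # v^T S v with S_ii = (F_i^n - F_i^{n-1}) / (h v_i), written without division
            num = sum(vi * (fi - fp) / H for vi, fi, fp in zip(v, F, F_prev))
            den = sum(vi * mi * vi for vi, mi in zip(v, M))
            if num > 0.0 and den > 0.0:
                c = 2.0 * math.sqrt(num / den)
        R = [pi - fi for pi, fi in zip(P, F)]                      # Phase 4: imbalance
        err = math.sqrt(sum(r * r for r in R)) / p_norm
        if err <= e_tol:
            return d, step, err
        a = (2.0 - c * H) / (2.0 + c * H)                          # Phase 3: update
        b = 2.0 * H / (2.0 + c * H)
        v = [a * vi + b * ri / mi for vi, ri, mi in zip(v, R, M)]
        d = [di + H * vi for di, vi in zip(d, v)]
        F_prev = F
    raise RuntimeError("no convergence within max_steps")


d, steps, err = solve([0.0, 0.0, 0.0, 1.0])
```

In this statically determinate chain every spring carries the full tip load at equilibrium, which gives a simple check on the converged displacements.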

2.3. Convex C240

The sequential algorithm of Fig. 1 is implemented on several single-processor computers including
workstations and a single processor of a Convex C240 mini-supercomputer. The C240 is a four-processor,
shared-memory mini-supercomputer. The system used in this study has 512 Mbytes of main
memory. Each processor has multiple vector arithmetic and logic functional units that access main
memory directly through eight high-speed vector registers. The maximum computation rate is achieved
when both addition and multiplication vector functional units are operating simultaneously, producing
two results per clock period. The C240 has a clock speed of 25 MHz which implies a theoretical peak
vector speed of 50 MFLOPS per processor, resulting in a combined peak performance of 200 MFLOPS.

3. Vectorization

Most of the emerging high-performance computers allow for vector as well as parallel processing. The
exploitation of vector-processing capabilities is essential as it represents a better utilization of available
CPU resources. For the parallel execution of ADR, the sequential algorithm is essentially executed on
each processor for a subset of elements (see [2]). As such, any vectorization of the sequential code will
directly benefit parallel performance.
The computations in each ADR time step are performed in four phases. Nearly all of the
computation time is consumed in the first phase where the internal force vector F and possibly lumped
stiffness matrix S are evaluated using the finite element equations. Computations in the remaining
phases are minimal and are easily vectorizable since they involve only vector quantities. Thus,
vectorization efforts focused on the finite element computations in Phase 1. To exploit vector processing
fully, a new organization of these computations is implemented.

3.1. Original algorithm

The organization of the original algorithm used to compute F and S is shown in Fig. 2(a). It consists
of two primary loops: an outer loop over the elements and an inner loop over the numerical
integration or Gauss points. Nodal displacements $d_{e}$ for a given element are first extracted from the
global displacement vector D. This process represents a global-to-local mapping. The mapping function
is a connectivity-based array which provides the global indices for local nodal quantities of a given
element. The inner loop is then executed to accomplish the numerical integration (via Gauss
quadrature) required for evaluating the element internal force vector $f_{e}$ and tangent stiffness matrix $k_{e}$.
Full integration is used in this work which implies 2 x 2 or 4 Gauss points for the 2-D bilinear element
and 2 x 2 x 2 or 8 Gauss points for the 3-D trilinear element. Upon completion of the inner loop over
the number of Gauss points, element lumped stiffness matrix $s_{e}$ is computed based on Eq. (2.9).

(a) Original organization
    for each element
        map D to d_e
        for each Gauss point
            update f_e and k_e
        continue
        compute s_e
        map f_e and s_e to F and S
    continue

(b) New organization
    for each element: map D to d_e
    for each Gauss point
        for each element: update f_e
        for each element: update k_e
    continue
    for each element: compute s_e
    for each element: map f_e and s_e to F and S

Fig. 2. Computation of F and S.

Element internal force vector $f_{e}$ and lumped stiffness matrix $s_{e}$ are then mapped into the global internal
force F and lumped stiffness S, respectively. This is a local-to-global mapping used to implement the
assembly process for F and S. The same mapping function described previously based on element
connectivity is utilized.
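The two mappings can be sketched with a connectivity array that lists the global indices of each element's local degrees of freedom. The helper names `gather` and `scatter_add`, the 1-D bar mesh and the stiffness value are assumptions of this illustration.

```python
def gather(global_vec, dofs):
    """Global-to-local mapping: extract element values from a global vector."""
    return [global_vec[g] for g in dofs]


def scatter_add(global_vec, local_vec, dofs):
    """Local-to-global mapping: accumulate element values into a global vector."""
    for a, g in enumerate(dofs):
        global_vec[g] += local_vec[a]


# Hypothetical 1-D mesh: two 2-node bar elements, three global DOFs.
connectivity = [(0, 1), (1, 2)]
D = [0.0, 0.1, 0.3]
k_bar = 10.0                      # assumed linear element stiffness

F = [0.0] * len(D)
for dofs in connectivity:
    d_e = gather(D, dofs)                        # D -> d_e
    stretch = d_e[1] - d_e[0]
    f_e = [-k_bar * stretch, k_bar * stretch]    # element internal force
    scatter_add(F, f_e, dofs)                    # f_e -> F
```

The shared node 1 receives contributions from both elements, which is the accumulation that the assembly process has to handle.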
For applications with nested loops, vectorization may only be applied to the innermost loop. With the
original organization, the inner loop is executed for each element. Thus, the number of vector
instructions would correspond to the number of elements. However, the length of each vector
instruction is limited to the number of Gauss points which in this case is either 4 or 8. These conditions
are highly inefficient for vector processing and would probably degrade, rather than enhance,
performance.

3.2. New organization

Organization of the internal force and lumped stiffness algorithm is modified to enhance the
vector-processing potential. The new organization is depicted in Fig. 2(b). The primary difference is
that the nested loop structure is reversed such that the outer loop now occurs over the number of Gauss
points and inner loops occur over the number of elements. This organization allows the element loops
to be vectorized. For a given Gauss point, the element internal force vector and tangent stiffness matrix
are now evaluated and updated for every element. With this organization, there are two vector
instructions per Gauss point (one for the $f_{e}$ update loop and a second for the $k_{e}$ update loop), and
vector lengths equal the number of elements. As such, the associated vector processing is extremely
efficient, and substantial vector speedups are achievable.
As shown in Fig. 2(b), the computation of $s_{e}$ and the two mapping processes are isolated and
vectorized separately over the number of elements. This results in three vector instructions having a
length equal to the number of elements. The same advantages described earlier are therefore realized.
Other secondary changes are made to minimize computation time. A noteworthy example concerns the
computation of shape function quantities. Due to the total Lagrangian formulation, the shape
functions are computed with respect to the initial undeformed coordinates and remain unchanged
during the analysis. To avoid redundant computation of these quantities at each time step, they are
calculated once at the beginning of the analysis and then stored.
The new organization does increase the memory requirements. Additional memory is needed to store
the shape function quantities, stress and constitutive tensors, internal force vectors, and tangent
stiffness matrices of all elements. However, these increases do not substantially affect advantages ADR
has with respect to traditional direct solution schemes.
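The loop interchange can be illustrated structurally as follows. This sketch only shows the reordering: plain Python does not vectorize either loop, and the function `contribution` is a hypothetical stand-in for the real Gauss-point computation of the element internal force.

```python
GAUSS_WEIGHTS = [1.0, 1.0, 1.0, 1.0]   # 2 x 2 rule for a 2-D bilinear element


def contribution(e, g):
    """Hypothetical Gauss-point contribution to the internal force of element e."""
    return 0.5 * (e + 1) * (g + 1)


def original_organization(n_el):
    """Outer loop over elements; the inner loop has length 4 (or 8 in 3-D)."""
    f = [0.0] * n_el
    for e in range(n_el):
        for g, w in enumerate(GAUSS_WEIGHTS):
            f[e] += w * contribution(e, g)
    return f


def new_organization(n_el):
    """Outer loop over Gauss points; the inner loop has length n_el."""
    f = [0.0] * n_el                     # per-element storage for all elements
    for g, w in enumerate(GAUSS_WEIGHTS):
        for e in range(n_el):            # long, independent iterations
            f[e] += w * contribution(e, g)
    return f


f_old = original_organization(1024)
f_new = new_organization(1024)
```

Both orderings accumulate the same terms for each element, so the results agree; what changes is that the innermost loop now runs over all elements, at the price of storing per-element quantities for the whole mesh.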

4. Numerical results

The performance of ADR has been evaluated using a single processor of a Convex C240 for several
different test cases. Non-linear static solutions have been generated using vectorized ADR and a highly
optimized Cholesky skyline solver driven by a Newton-Raphson procedure (subsequently referred to as
the direct solution method). In this section, numerical performance results are presented. The test cases
utilized are described first. Displacement and contact results for each solution method are then
compared to verify accuracy. The vector performance of ADR is discussed. Factors affecting the
sequential performance of both solution methods are reviewed, and then performance results for each
are compared.

4.1. Test cases

Test cases have been created to demonstrate performance and accuracy of ADR with respect to the
direct solution method for non-linear static analysis. They are intended to represent some of the
problems which occur in tire modeling and analysis. As such, they are designed to include contact,
curved geometry, large deformations and non-linear hyperelastic materials. Models of different size and
bandwidth have been created to evaluate the influence of these effects on performance. In addition,
some of the models are designed to exhibit structural instabilities (corresponding to the loss of positive
definiteness) to further test the robustness of each solution method.
Thirteen models have been developed for analyzing four cantilever beam problems and four curved
beam problems. Their fundamental specifications are summarized in Table 1. Associated coordinate
systems and geometric parameters are defined on subsequent figures.

Straight test cases


The straight test cases shown in Fig. 3 correspond to the 2-D plane stress and 3-D analysis of elastic
and hyperelastic cantilever beams subjected to tip loading and frictionless contact. Two discretizations
are considered for each problem, one with 1024 elements and another with 8192. Fig. 4 shows the 8192
element model of the 3-D hyperelastic beam in its final 'steady-state' deformed configuration. While
these test cases include contact and a hyperelastic material, the structural response does not exhibit
strong non-linearities even though large deflections occur. As such, the ADR algorithm may not be
competitive with the standard static solution procedure.
The dimensions of each beam are given in Table 1. The first two represent the length L and width W
as defined in Fig. 3(a), and the third corresponds to thickness or height H in the Z direction as shown in
Fig. 3(b). The column labeled Mesh indicates the number of elements used along each coordinate
direction. The discretization is uniform such that all elements in a given mesh are the same size. The
material used in each model is indicated, where E implies elastic and H implies hyperelastic. The elastic
material constants represent Young's modulus and Poisson's ratio (taken as 1000 N/mm^2 and 0,
respectively), and the hyperelastic constants represent coefficients $C_{1}$ and $C_{2}$ of the Mooney-Rivlin
material law (taken as 0.03427 and 0.00283, respectively). The loads given in Table 1 correspond to the
total vertical force uniformly distributed over the right end of each beam (the left end of each beam is
completely fixed).

Table 1
Model specifications
Model       Total      Dimensions         Mesh             Material   Load
            elements   (mm)                                           (N)
2D beam     1024       160 x 20           128 x 8          E          50
                                                           H          0.01
            8192       160 x 20           512 x 16         E          50
                                                           H          0.01
3D beam     1024       160 x 20 x 20      64 x 4 x 4       E          1000
                                                           H          0.2
            8192       160 x 20 x 20      128 x 8 x 8      E          1000
                                                           H          0.2
3D arch     128        9.5 x 1 x 1        32 x 2 x 2       H          0.0038
            1024       9.5 x 1 x 1        64 x 4 x 4       H          0.0038
3D tunnel   8192       9.5 x 1 x 32       64 x 4 x 32      E          4950
3D torus    8192       180 x 120 x 120    32 x 16 x 16     E          85000
                                                           H          17
E = elastic, Hookean material; H = hyperelastic.

[Figure: (a) 2-D geometry (dimensions: L x W); (c) contact problem]
Fig. 3. Straight test cases.
For each model, an inclined contact surface is imposed representing a plane described by the
equation given in Fig. 3(c). An inclined surface is used so that more than one node will come into
contact. Contact is only enforced for those nodes on the bottom surface of each beam, which are
between 40 and 60 mm in the X direction.

Fig. 4. Deformed configuration of the large 3-D cantilever beam test case.

Curved test cases

The curved test cases represent a 3-D hyperelastic circular arch, a 3-D elastic cylindrical thick shell or
'tunnel,' and a 3-D elastic and hyperelastic torus subjected to a line load at the summit (see Fig. 5).
Similar to the straight test cases, two discretizations of the arch are considered consisting of 128 and
1024 8-node solid elements. Fig. 6 shows the 1024 element circular arch in its deformed state. The
tunnel is shown in Fig. 7(a) and (b) in its undeformed and deformed configurations, respectively. The
deformed configuration of the torus model when loaded against a flat surface is shown in Fig. 8. For
these test cases, structural instabilities and snap-through may occur. As such, the ADR algorithm
should be competitive with the standard static solution procedure.
The first two dimensions given in the lower half of Table 1 now represent inner radius R and radial
width W as defined in Fig. 5. The third corresponds to thickness in the Z direction. The mesh
specifications refer to the number of elements in the circumferential, radial and Z directions. Young's
modulus is taken to be 20700 N/mm^2 and 207 N/mm^2 for the 3-D elastic tunnel and torus models,
respectively. Poisson's ratio is taken to be 0.292 for both of these models. As before, the hyperelastic
constants $C_{1}$ and $C_{2}$ are taken to be 0.03427 and 0.00283, respectively. The loads shown represent the
total vertical force imposed for each model. Due to symmetry, only half of the arch and tunnel are
actually modeled. As such, the loads given in Table 1 represent the total load applied to the finite
element model. The base of each model is completely fixed, and all nodes in the YZ plane are
restrained in the X direction. For the torus models, the contact surface is defined to be a horizontal flat
plane tangent to the base and initially in contact with just a single line of nodes.

[Figure: (a) 3-D circular arch; (b) 3-D cylindrical thick shell or tunnel; (c) 3-D circular torus with rectangular cross-section]
Fig. 5. Curved test cases.

Additional features

Performance related attributes of each model are given in Table 2. The total number of elements, the
total number of equations or degrees of freedom, and the average total bandwidth of the tangent
stiffness matrix are given. The number of blocks refers to the direct solution method. If the size of the
tangent stiffness matrix K exceeds available local memory, an in-core solution is not possible. When this
occurs, K is divided into blocks and written to a secondary-storage device. Each block is then read into
main memory and factored separately. The number of blocks required is a direct indication of the size
of K. Cases where one block is specified correspond to an in-core solution. In the present context,
structural instabilities refer to conditions during the analysis (such as snap-through buckling) which
cause the lowest eigenvalue to become negative and positive definiteness of the tangent stiffness matrix
to be lost. As shown in Table 2, instabilities are exhibited by the curved test case problems.
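The number of equations for the straight test cases can be cross-checked from the mesh specifications. The sketch below assumes a structured mesh of bilinear or trilinear elements, one displacement component per coordinate direction at each node, and all nodes on the left end fully fixed; the function name `beam_dof` is hypothetical. Under these assumptions it gives 2304, 17408, 4800 and 31104 degrees of freedom for the four beam meshes.

```python
def beam_dof(mesh):
    """Free DOFs of a structured beam mesh whose left-end nodes are fully fixed.

    mesh lists the number of elements along each coordinate direction,
    with the beam axis first, e.g. (128, 8) or (64, 4, 4).
    """
    dim = len(mesh)
    nodes = 1
    for n in mesh:
        nodes *= n + 1                       # nodes = elements + 1 per direction
    fixed_nodes = nodes // (mesh[0] + 1)     # one cross-section of nodes at the left end
    return dim * (nodes - fixed_nodes)


dof_2d_small = beam_dof((128, 8))
dof_2d_large = beam_dof((512, 16))
dof_3d_small = beam_dof((64, 4, 4))
dof_3d_large = beam_dof((128, 8, 8))
```

Multiplying the number of equations by the average bandwidth gives a rough feel for how quickly the storage of K grows for the direct solution method, whereas ADR stores only vectors of this length.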

4.2. Accuracy

Non-linear static analysis results obtained using the two solution methods (ADR algorithm and direct
solution method) are summarized in Table 3. The Y values are maximum vertical displacements for the
free end of the beam and for the summit of the arch, tunnel and torus. The X values are the
corresponding horizontal displacements for the free end of the beam. Large deformations are present in
each case as indicated by the magnitude of the displacements. Good agreement is achieved: the
average difference is less than one percent in magnitude.

Fig. 6. Deformed configuration of the 3-D circular arch test case.

The discrepancies are most likely the result of using different convergence criteria in each solution
method. In the direct solution procedure, the convergence criterion is based on the relative change in
the incremental displacements compared with the total displacement estimate, or

$$e = \frac{\|\Delta D_{k}\|}{\|D_{k+1}\|} \qquad (4.1)$$

where k corresponds to the kth Newton-Raphson iteration. A similar convergence criterion was
initially attempted with the ADR algorithm using

$$\frac{\|D^{n+1} - D^{n}\|}{\|D^{n+1}\|} \qquad (4.2)$$

where n denotes the nth time step. However, in the form shown, it proved to be ineffective since the
change in displacements between consecutive time steps is generally very small throughout the entire
analysis. ADR is more amenable to a convergence criterion based on the residual force imbalance error
indicator defined by Eq. (2.7).
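Why a displacement-change criterion can be misleading for a method that takes many small steps can be seen in a short sketch. The iteration below is a hypothetical slowly converging update on a single linear spring, not ADR itself, and the step factor 0.001 is chosen only to make the effect visible.

```python
import math


def norm(x):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(xi * xi for xi in x))


def displacement_change_error(d_new, d_old):
    """Relative change in displacement between consecutive steps."""
    return norm([a - b for a, b in zip(d_new, d_old)]) / norm(d_new)


def force_imbalance_error(p, f):
    """Relative residual force imbalance ||P - F|| / ||P||."""
    return norm([a - b for a, b in zip(p, f)]) / norm(p)


# Hypothetical slowly converging iteration on one linear spring (k = 1, P = 1).
k, P = 1.0, [1.0]
d_old, d_new = [0.0], [0.0]
for _ in range(1000):
    d_old = d_new
    residual = P[0] - k * d_new[0]
    d_new = [d_new[0] + 0.001 * residual]

disp_err = displacement_change_error(d_new, d_old)
force_err = force_imbalance_error(P, [k * d_new[0]])
```

After 1000 steps the relative displacement change is already below 0.001, although more than 30% of the load is still unbalanced; a residual-based measure does not suffer from this false convergence.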
The number of sectors of the finite element model in contact predicted by each solution method is
presented in Table 4, where a sector implies one node for 2-D models and a row of nodes for 3-D
models. For the beam models, contact only occurred at nodes near the left and right (40 and 60 mm)
edges or boundaries of the beam contact region. Contact occurred at the bottom row or sector of nodes
for the 3-D torus models. For each sector in contact, both solution methods predicted vertical
displacements corresponding to the exact value at the boundary (to four decimal places). As indicated,
each solution method predicts essentially the same number of sectors to be in contact. In a few cases,
the number of sectors differs by one, which is likely due to differences in the contact formulation and
convergence criteria between the two solution methods.

Fig. 7. (a) Undeformed configuration of the 3-D tunnel test case; (b) deformed configuration of the 3-D tunnel test case.

Fig. 8. Deformed configuration of the 3-D torus test case.

Table 2
Model features
Model Total Total Avcr~lge Toted Contac! Ir;st~lbilJty
elements dof bandwidth blocks
2D beam 11124 23114 21 I Y
8192 i 74(JF, 37 1 Y
3D beam 11124 481111 911 1 Y
8192 311(14 272 ~ Y
3D arch !28 855 35 I
111.'.4 .~,775 89 I
3D tunnel ~':192
, 31515 5111 14
3D torus gl92 27744 1594 3~ Y

4.3. Vector performance

V e c t o r p e r f o r m a n c e is e v a l u a t e d b y d e t e r m i n i n g t h e C P U t i m e r e q u i r e d to c o m p l e t e t h e th-st five t i m e
s t e p s f o r e a c h m o d e l u s i n g t h e b a s e l i n e a n d v e c t o r i z e d v e r s k ~ n s o f tile A D R i m p l e m e n t a t i o n . R e l a t i v e
104 D.R. Oakley. N.F. Knight, Jr. / Cornput. Methods Appl. Mech. Engrg. 126 (1995) 91-109

Table 3
Displacement results
Model       Total      Material   ADR      Direct   Diff    ADR      Direct   Diff
            elements              Y (mm)   Y (mm)   (%)     X (mm)   X (mm)   (%)
2D beam     1024       E          44.69    44.82    -0.3    15.56    15.65    -0.6
                       H          41.88    41.90    -0.0    13.52    13.52    -0.0
            8192       E          44.72    44.88    -0.4    15.56    15.70    -0.9
                       H          42.10    41.98     0.3    13.64    13.59     0.4
3D beam     1024       E          44.37    43.67     1.6    15.38    15.13     1.3
                       H          41.71    40.76     2.4    13.44    13.06     2.9
            8192       E          44.70    44.44     0.6    15.56    15.50     0.4
                       H          -        41.71    -       -        13.53
3D arch     128        H          18.54    18.49     0.3    -
            1024       H          18.83    18.70     0.7    -
3D tunnel   8192       E          16.85    17.19    -2.0    -
3D torus    8192       E          80.22    81.25    -1.3    -
                       H          13.04    -        -

Table 4
Number of sectors in contact
Model       Total      Material   Left end   Left end   Right end   Right end
            elements              ADR        Direct     ADR         Direct
2D beam     1024       E          1          1          2           2
                       H          1          1          2           2
            8192       E          3          4          9           9
                       H          3          4          7           8
3D beam     1024       E          ~1         ~)         1           1
                       H          1          1
            8192       E          1          1          2           3
3D torus    8192       E          1          1          -           -
                       H          1          1          -           -

speedups are computed and are presented in Table 5. They range from 3.89 to 7.35 with an average of
5.17. Some general trends are observed and are presented next.
Higher speedups occur for the elastic version of each model because the hyperelastic stiffness
calculations are more complex than those for elastic materials. Hyperelastic models involve many more
variables and memory references and are therefore more difficult to vectorize efficiently. Data locality
issues become critical in their implementation. The 2-D beam models realize better speedups than their
3-D counterparts. For 3-D models, a larger percentage of the time is spent in the stiffness computation
portion of the code. Speedup suffers since the vectorization of these computations is less than that for
internal force computation. Overall, the vector performance of the code is very good and indicates that
approximately 80 percent of the computations are vectorizable based on Amdahl's law [3].
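
The 80 percent figure can be reproduced approximately by inverting Amdahl's law. The following is a minimal sketch, not the authors' calculation: it assumes the vector unit is infinitely faster than scalar execution, which yields a lower bound on the vectorizable fraction. The function name and the `vector_rate` parameter are introduced here for illustration only.

```python
# Vector speedups as listed in Table 5 (13 model/material combinations).
SPEEDUPS = [7.35, 5.65, 6.88, 5.82, 6.20, 3.99, 5.21, 3.89,
            3.94, 4.09, 4.99, 4.89, 4.25]


def vectorizable_fraction(speedup, vector_rate=float("inf")):
    """Solve Amdahl's law S = 1 / ((1 - f) + f / r) for the fraction f.

    speedup     -- observed overall speedup S (>= 1)
    vector_rate -- assumed speed of vector code relative to scalar code, r (> 1);
                   the default (infinite) is an assumption, not a measured value
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / vector_rate)


mean_speedup = sum(SPEEDUPS) / len(SPEEDUPS)
fraction = vectorizable_fraction(mean_speedup)
print(f"mean speedup {mean_speedup:.2f}, vectorizable fraction >= {fraction:.2f}")
```

With the average speedup of 5.17 the bound comes out at roughly 0.81, in line with the quoted figure. A finite `vector_rate` would give a larger fraction.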

4.4. Sequential performance factors

Many factors affect the sequential performance of ADR and the direct solution method. A review of
the primary factors is given in the following discussion.

Direct solution method performance


Performance of the direct solution method is primarily a function of the number of elements, the
bandwidth of K, and the extent of non-linearities and instabilities. An increase in the number of
elements results in a proportional increase in computations required for the equation assembly process.
It also increases computations in the solution process, where the extent of this depends on the

Table 5
ADR vector performance
Model       Total      Material   Vector
            elements              speedup
2D beam     1024       E          7.35
                       H          5.65
            8192       E          6.88
                       H          5.82
3D beam     1024       E          6.20
                       H          3.99
            8192       E          5.21
                       H          3.89
3D arch     128        H          3.94
            1024       H          4.09
3D tunnel   8192       E          4.99
3D torus    8192       E          4.89
                       H          4.25

associated bandwidth. A large bandwidth has two effects. It significantly increases the number of
computations required to factor and solve the linearized system of equations. In addition, it increases
the size (in terms of non-zero entries) of the resulting matrix K. That is, the matrix may no longer fit
into main memory. When this occurs, an in-core solution is prevented, and memory paging to
secondary-storage devices is required in which one block of the matrix is read into main memory at a
time. Factorization is then performed on a block-by-block basis. The associated paging or disk accesses
are costly in terms of performance.
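
To give a feel for how strongly bandwidth drives the direct method, the sketch below uses two standard leading-order estimates for a symmetric banded matrix of order n and half-bandwidth b: about n(b + 1) stored entries and on the order of n b^2 operations to factor. Several things here are assumptions rather than statements from the paper: that the "average bandwidth" of Table 2 can be read as a half-bandwidth, that reals occupy 8 bytes, and the function names themselves. The dof and bandwidth figures for the small arch and the torus are taken from Table 2 at face value.

```python
def banded_storage_entries(n_dof, half_bandwidth):
    """Approximate stored entries: diagonal plus one band (corner rows ignored)."""
    return n_dof * (half_bandwidth + 1)


def banded_factor_ops(n_dof, half_bandwidth):
    """Leading-order operation count, n * b**2, for a banded factorization."""
    return n_dof * half_bandwidth ** 2


BYTES_PER_REAL = 8  # assumed double precision

# (dof, average bandwidth) as read from Table 2; interpretation is an assumption.
small_arch = (855, 35)
torus = (27744, 1594)

for name, (n, b) in (("3D arch, 128 elements", small_arch),
                     ("3D torus, 8192 elements", torus)):
    megabytes = banded_storage_entries(n, b) * BYTES_PER_REAL / 1e6
    print(f"{name}: ~{megabytes:.1f} MB, ~{banded_factor_ops(n, b):.2e} factor ops")

ops_ratio = banded_factor_ops(*torus) / banded_factor_ops(*small_arch)
```

Under these assumptions the torus matrix needs several hundred megabytes and tens of thousands of times more factorization work than the small arch, which is consistent with the need for a block-by-block out-of-core solution described above.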
The number of Newton-Raphson iterations directly affects performance of the direct solution
method. As shown in Table 7, many iterations may be required when hyperelastic materials, non-linear
contact and incompressibility constraints, large deformations, and structural instabilities are present as
in the applications considered here.

ADR performance
ADR performance is a function of the number of time steps (or iterations) required for convergence
and the computational effort required for each step or iteration. The number of time steps is dependent
on the size of the time step and on the maximum displacement exhibited by the given model.
Time step size is sensitive to the stiffness characteristics of the model. It is inversely proportional to
the maximum frequency of the discretized model, and therefore directly proportional to the minimum
distance Lmin between two nodes. Thus, a decrease in element size results in a proportional decrease in
time step size. Time step size is also sensitive to the penalty method formulations as these artificially
inflate system stiffness characteristics. The formulation for enforcing incompressibility in hyperelastic
models leads to reductions in time step size by an order of magnitude. This problem is reduced by
delaying the enforcement of incompressibility until the force imbalance error given by Eq. (2.7) is less
than one percent or shortly before final convergence. In this way, an initial configuration representing a
first approximation is achieved relatively quickly using large time steps. Then the incompressibility
condition is imposed such that the displacements are fine-tuned to the correct configuration using small
time steps. This approach is successfully implemented and does result in considerable savings in CPU
time.
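
The delayed-enforcement strategy amounts to a two-phase iteration, which can be sketched as follows. This is an illustrative control loop only, not the authors' code: `step` and `error` are hypothetical callbacks standing in for one ADR time step and the force imbalance error, the final tolerance is an arbitrary choice, and the toy scalar model merely mimics the effect of a penalty that forces a smaller step.

```python
def solve_two_phase(step, error, state, switch_tol=0.01, final_tol=1e-6,
                    max_steps=100_000):
    """Relax `state` with the penalty off, then switch it on near convergence.

    step(state, penalty_on)  -> state after one pseudo-time step (hypothetical)
    error(state, penalty_on) -> relative force imbalance error (hypothetical)
    """
    penalty_on = False
    for n in range(1, max_steps + 1):
        state = step(state, penalty_on)
        err = error(state, penalty_on)
        if not penalty_on:
            if err < switch_tol:
                penalty_on = True  # begin enforcing incompressibility
        elif err < final_tol:
            return state, n
    raise RuntimeError("no convergence within max_steps")


# Toy stand-in for a structural model: a scalar relaxing toward a target.
# With the penalty on, the target shifts slightly and the step is 10x smaller.
def toy_step(x, penalty_on):
    target, rate = (1.1, 0.05) if penalty_on else (1.0, 0.5)
    return x + rate * (target - x)


def toy_error(x, penalty_on):
    target = 1.1 if penalty_on else 1.0
    return abs(target - x) / target


x_final, steps_used = solve_two_phase(toy_step, toy_error, 0.0)
print(x_final, steps_used)
```

In the toy model the first phase needs only a handful of large steps, so the expensive small-step phase starts close to the final answer, which is the saving the text describes.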
Under normal convergence conditions (i.e. no instabilities or delayed incompressibility enforcement)
the number of steps for convergence is related to the ratio of time step size (which is proportional to
Lmin) and the maximum displacement Ymax. The relative number of iterations between two different
models of the same material can be estimated by comparing the Ymax/Lmin ratios of each. This is
illustrated using Table 6, which shows Ymax/Lmin ratios and the corresponding number of time steps for
five different elastic models. Models with the smallest Ymax/Lmin should achieve the best performance.

Table 6
Number of time steps versus Ymax/Lmin ratio
Model       Total      Material   Ymax   Lmin     Ymax/Lmin   Time    Ymax/Lmin*   Time
            elements              (mm)   (mm)                 steps                steps*
2D beam     1024       E          45     1.25     36          6799    1.0          1.0
            8192       E          45     0.3125   144         31526   4.0          4.6
3D beam     1024       E          44     2.5      18          3133    0.5          0.5
            8192       E          45     1.25     36          6854    1.0          1.0
3D torus    8192       E          80     7.5      11          1865    0.3          0.3
* Normalized with respect to the first model.
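
The normalized columns of Table 6 can be recomputed directly from the displacement, element size, and step counts. The sketch below does this for the five elastic models; the tuple layout and function name are illustrative choices, and the input values are as read from the table.

```python
# (model, total elements, Ymax in mm, Lmin in mm, time steps), from Table 6.
MODELS = [
    ("2D beam", 1024, 45.0, 1.25, 6799),
    ("2D beam", 8192, 45.0, 0.3125, 31526),
    ("3D beam", 1024, 44.0, 2.5, 3133),
    ("3D beam", 8192, 45.0, 1.25, 6854),
    ("3D torus", 8192, 80.0, 7.5, 1865),
]


def normalized_rows(models):
    """Return (model, elements, ratio*, steps*) normalized to the first model."""
    _, _, y0, l0, steps0 = models[0]
    ref_ratio = y0 / l0
    rows = []
    for name, elements, y_max, l_min, steps in models:
        rows.append((name, elements,
                     (y_max / l_min) / ref_ratio,
                     steps / steps0))
    return rows


rows = normalized_rows(MODELS)
for name, elements, ratio_n, steps_n in rows:
    print(f"{name:9s} {elements:5d}  ratio* {ratio_n:.1f}  steps* {steps_n:.1f}")
```

The two normalized columns track each other closely, which is the basis for using Ymax/Lmin as a predictor of the relative number of time steps.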

The computational effort per time step in the ADR algorithm is a function of the number of
elements. An increase in the number of elements results in a proportional increase in computations,
independent of the mesh bandwidth. Computational load is also affected by the adaptive needs of each
time step. The fictitious mass M must sometimes be updated to maintain numerical stability. This
process is computationally expensive (as it involves the evaluation of element stiffness matrices) and
requires approximately the same computational effort as required for five steps. For the test cases
considered here, mass updates during the course of the analysis are unnecessary. The fictitious mass is
evaluated once at the start of the analysis and, in the case of hyperelastic models, re-evaluated a second
time at the start of incompressibility enforcement.
Relative to the direct solution method, ADR performance is generally much less sensitive to strong
non-linearities and structural instabilities unless these substantially increase the maximum frequency of
the discretized structure. Numerical stability during the solution process is automatically maintained by
the adaptive features of ADR. Application of ADR to problems with structural instabilities associated
with snap-through buckling, where the lowest eigenvalues may become negative, can result in more rapid
convergence than the direct solution method. Under these conditions, the ADR damping coefficient is
set to zero, since it is proportional to the square root of the lowest eigenvalue. This results in a faster
rise time for the pseudo-transient response and improves the rate of convergence.
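
The damping rule described here can be sketched in a few lines. The text only states that the coefficient is proportional to the square root of the lowest eigenvalue and is set to zero when that eigenvalue is negative; the Rayleigh-quotient estimate with a diagonal mass and the proportionality factor of 2 used below are assumptions borrowed from common dynamic relaxation formulations, and the function names are hypothetical.

```python
import math


def rayleigh_quotient(x, kx, mass_diag):
    """Estimate the lowest eigenvalue as (x . Kx) / (x . Mx) for diagonal M."""
    numerator = sum(xi * kxi for xi, kxi in zip(x, kx))
    denominator = sum(mi * xi * xi for mi, xi in zip(mass_diag, x))
    return numerator / denominator


def damping_coefficient(lowest_eigenvalue, scale=2.0):
    """Damping proportional to sqrt(eigenvalue); zero if it is not positive.

    The default scale of 2.0 is an assumed proportionality constant.
    """
    if lowest_eigenvalue <= 0.0:
        return 0.0
    return scale * math.sqrt(lowest_eigenvalue)


# Stable configuration: positive quotient gives positive damping.
stable = damping_coefficient(
    rayleigh_quotient([1.0, 1.0], [3.0, 5.0], [1.0, 1.0]))
# Snap-through: negative quotient, so damping is switched off.
snapping = damping_coefficient(
    rayleigh_quotient([1.0, 1.0], [-2.0, 1.0], [1.0, 1.0]))
print(stable, snapping)
```

With damping switched off, the pseudo-transient response is free to move quickly through the unstable region instead of being slowed by a damping term tuned for a stable structure.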

4.5. Sequential performance results

The solution times for each test case are given in Table 7. The ratios of these solution times, also
tabulated, provide an indication of the relative performance of ADR with respect to the conventional
direct solution method on a sequential machine. Values less than one imply that the ADR solution is

Table 7
Sequential performance
Model       Total      Material   NR           NR CPU   ADR     ADR CPU   Time
            elements              iterations   (mins)   steps   (mins)    ratio
2D beam     1024       E          8            1        6799    6         5.13
                       H          8            2        71S5    8         5.25
            8192       E          8            11       31526   237       22.40
                       H          8            14       33878   285       20.~7
3D beam     1024       E          8            5        3133    28        5.2~
                       H          125          116      32169   330       2.84
            8192       E          8            70       6854    463       6.60
                       H          126          1376
3D arch     128        H          391          42       14409   19        0.45
            1024       H          397          368      21844   246       0.67
3D tunnel   8192       E          24           404      6097    432       1.07
3D torus    8192       E          7            612      1865    123       0.20
                       H                                13838   1297

faster than the direct solution method. Explanations regarding the performance of each problem are
summarized in the following discussion.
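
As a quick check on how the time ratio is read, the sketch below recomputes it as ADR CPU time divided by direct (Newton-Raphson) CPU time for three rows of Table 7. The helper name and the dictionary layout are illustrative. Because the table lists CPU times in whole minutes, ratios for the smallest models cannot be reproduced exactly this way; the tabulated values were presumably computed from unrounded times.

```python
def time_ratio(adr_cpu_minutes, direct_cpu_minutes):
    """ADR time relative to the direct method; below 1 means ADR is faster."""
    return adr_cpu_minutes / direct_cpu_minutes


# (ADR CPU minutes, direct CPU minutes) as read from Table 7.
CASES = {
    "3D arch, 128 elements, H": (19, 42),
    "3D arch, 1024 elements, H": (246, 368),
    "3D torus, 8192 elements, E": (123, 612),
}

ratios = {name: round(time_ratio(adr, direct), 2)
          for name, (adr, direct) in CASES.items()}
for name, ratio in ratios.items():
    verdict = "ADR faster" if ratio < 1.0 else "direct faster"
    print(f"{name}: {ratio:.2f} ({verdict})")
```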

2-D beam
Performance for the elastic and hyperelastic versions of each model is similar because they are plane
stress analyses in which the incompressibility effect is compensated for in the formulation [8]. As such,
the hyperelastic models are not subjected to any penalties, and the associated reductions in time step
size do not occur. Performance for the small 2-D models is poor compared to the direct solution
method, primarily because only a few Newton-Raphson iterations are required and an
in-core solution is possible due to the small problem size and small bandwidth. For the larger
companion 2-D models, both solution schemes exhibit an 8-fold increase in CPU time per
iteration. However, for ADR, the 4-fold reduction in element width results in a 4-fold increase in the
number of time steps since the maximum frequency increases. Thus, ADR performed much worse than
the direct solution method.
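
The scaling argument for the 2-D beam can be written out as simple arithmetic: total cost grows as (growth in cost per iteration) times (growth in number of iterations). The sketch below is only an illustration of that reasoning; the function name is hypothetical, and the observed figures are the step counts and time ratios listed for the elastic 2-D beam in Table 7.

```python
def cost_growth(per_iteration_growth, iteration_count_growth):
    """Growth in total cost = growth per iteration x growth in iteration count."""
    return per_iteration_growth * iteration_count_growth


# Refining from 1024 to 8192 elements: 8-fold cost per iteration for both methods.
adr_growth = cost_growth(8, 4)     # element width / 4 -> 4x the time steps
direct_growth = cost_growth(8, 1)  # 8 Newton-Raphson iterations in both meshes
predicted_ratio_growth = adr_growth / direct_growth

# Observed values for the elastic 2-D beam (Table 7).
observed_step_growth = 31526 / 6799
observed_ratio_growth = 22.40 / 5.13

print(predicted_ratio_growth, observed_step_growth, observed_ratio_growth)
```

The predicted 4-fold deterioration in the time ratio is close to the observed change, which supports attributing the loss to the increased number of time steps rather than to the cost per step.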

3-D beam
Performance for the small 3-D elastic model is similar to the corresponding 2-D model since the
Ymax/Lmin ratios are essentially the same in both cases. Also, the same favorable conditions mentioned
earlier for the direct solution method are still present (i.e. small bandwidth, in-core storage, few
Newton-Raphson iterations). ADR is more competitive for the hyperelastic version of this model.
Incompressibility enforcement is delayed as described earlier. The direct solution method experiences
convergence difficulties and requires 15 times as many Newton-Raphson iterations as the elastic
counterpart.
In going from the small 3-D to the large 3-D model, performance drops somewhat, but not nearly as much
as it does for the corresponding 2-D models. There are two reasons for this. First, for the 3-D model,
the element width is reduced by a factor of 2 rather than 4, and the number of time steps is only
doubled. Secondly, the direct solution method begins to suffer from memory limitations. An in-core
solution is no longer possible and some paging is required. As a result, the CPU time per iteration for
the direct solution method increases by a factor of 12 relative to the small 3-D model, as opposed to the
factor of 8 increase exhibited by the large 2-D versions. Due to limited computer resources, the ADR
algorithm was not applied to the large 3-D hyperelastic beam problem. It is anticipated that
performance similar to the small 3-D beam results would be obtained.

3-D arch
ADR outperforms the direct solution method for these test cases. Even though it requires a large
number of time steps to accommodate incompressibility, the time step size reductions needed to
preserve numerical stability do not penalize its rate of convergence. ADR performance is lower for the
1024 element 3-D model since the element width (and therefore time step size) is reduced by a factor of
two, resulting in a two-fold increase in the number of steps or iterations. The direct solution method
benefits from an in-core solution; however, it requires a large number of Newton-Raphson iterations to
converge due to the structural instabilities and incompressibility constraints.

3-D tunnel
Performance of ADR and the direct solution method for this test case is essentially equivalent.
ADR requires fewer time steps to converge than for the small 3-D elastic beam, even though it should
need approximately twice as many steps based on its Ymax/Lmin ratio. The rapid convergence is likely
due to reductions in the damping coefficient which occur during structural instabilities as described
earlier. Structural instability and increased bandwidth have a negative effect on performance of the
direct solution method. With respect to the large 3-D elastic beam, the number of Newton-Raphson
iterations triples due to the structural instability or snap-through behavior, and the CPU time per
iteration doubles due to the increased bandwidth.

3-D torus
ADR outperforms the direct solution method for both the elastic and hyperelastic versions of this
model. Considering the elastic version first, the number of time steps required for convergence is less
than one third the number of time steps required for the large 3-D elastic beam. This is consistent with
the low Ymax/Lmin ratio of the torus. Performance of the direct solution method is heavily penalized by
the large bandwidth. In comparison with the large 3-D elastic beam, the CPU time per iteration
increases by an order of magnitude. For the hyperelastic model, incompressibility enforcement leads to
a nine-fold increase in the number of time steps for ADR. Resources did not permit completion of this
analysis with the direct solution method. After 54 CPU hours on a Convex C240, the direct solution
method had still not converged, and the job was terminated.

5. Conclusions

The overall objective of this research is to develop efficient sequential and parallel implementations
of the ADR algorithm and evaluate their performance for the static analysis of non-linear, hyperelastic
systems involving frictionless contact. For problems of this nature, the ADR method may represent one
of the best approaches for parallel processing, which is the ultimate interest. In contrast to direct
solution methods, it has minimal memory requirements, is easily parallelizable, and is scalable to more
processors. It also avoids the ill-conditioning related convergence problems of other iterative methods
for non-linear problems. This paper represents the second of a three-part series and is focused on the
sequential implementation of ADR.
Performance of a sequential implementation of ADR is evaluated on a Convex C240 computer. The
algorithm is developed from the formulations outlined in [1]. A new organization of the finite element
computations is implemented to exploit vector processing. ADR is used to obtain the non-linear static
solution of 2-D and 3-D cantilever beam problems and 3-D arch, tunnel and torus problems.
Performance is compared with that of an existing finite element code which utilizes a direct solution
method.
ADR is found to be very reliable: it successfully converged in all cases despite the presence of strong
non-linearities and structural instabilities. Contact regions and specific displacement variable results
agree well with those obtained using the direct solution method. The contact algorithm used with ADR
is found to be very simple and efficient. It involves only trivial computations and is therefore well suited
for parallel processing. The minimal memory requirements of ADR are demonstrated, as all of its
solutions are performed in-core. In contrast, the direct solution method had to resort to out-of-core
solutions using block factorization for the larger problems.
The vector performance of ADR is very good: speedups on the order of five are achieved.
Vectorization does increase memory requirements, but not enough to substantially reduce the memory
advantages ADR has with respect to direct solution schemes. The high vectorization potential of ADR
will further augment its performance on parallel-vector computers.
Performance of the direct method is primarily influenced by bandwidth and the degree of
non-linearities and structural instabilities present. In contrast, ADR performance is found to be sensitive to
the maximum displacement (which affects the number of time steps) and element size (which governs
time step size). Reductions in element size due to mesh refinement lead to proportional decreases in
algorithm performance. Models with the smallest ratio of displacement to element size tend to achieve
the best performance. Penalty method formulations are found to reduce ADR performance as they lead
to the use of smaller time steps. In fact, enforcing contact using a penalty method is not recommended
for this reason. The impact of incompressibility penalties is reduced by delaying their enforcement until
shortly before convergence.
In comparison with the direct solution method, ADR performs best for models exhibiting high
bandwidth, strong non-linearities and structural instabilities, and a low ratio of displacement to element
size. It is found to be especially attractive for models having large bandwidths. ADR outperforms the
direct solution method for the 3-D arch and torus test cases and achieves equivalent performance for

the tunnel problem. It performs much worse for the 2-D and 3-D beam test cases, as these problems are
characterized by relatively low bandwidths and stable solutions.
Exceptional sequential performance relative to the direct solution method was not expected in this
study. ADR is considered primarily because of its potential for parallel processing and its reliability and
competitiveness for the class of problems addressed. The parallel- and vector-processing potential of
ADR may enable speedups to be achieved which are one or even two orders of magnitude higher than
those possible with the direct solution method. Accordingly, the sequential performance achieved here
for the 3-D problems (on the order of 5 times slower at worst and 5 times faster at best) is impressive
and warrants further research on parallel computing systems.

Acknowledgment

The authors gratefully acknowledge the support provided by NASA Langley Grant NAG-1-1505 and
access to the Convex C240 mini-supercomputer. Mr. Ronnie E. Gillian is the technical monitor. In
addition, the first author gratefully acknowledges the support provided by the Dean's Scholar Program
at Clemson University.

References

[1] D. Oakley and N. Knight, Adaptive Dynamic Relaxation algorithm for nonlinear hyperelastic structures--Part I. Formulation, Comput. Methods Appl. Mech. Engrg. 126 (1995) 67
[2] D. Oakley, N. Knight and D. Warner, Adaptive Dynamic Relaxation algorithm for nonlinear hyperelastic structures--Part III. Parallel implementation, Comput. Methods Appl. Mech. Engrg. 126 (1995) 111.
[3] G. Amdahl, Validity of the single processor approach to achieving large-scale computing capabilities, AFIPS Conference Proceedings, AFIPS Press, Montvale, NJ 30 (1967) 483-485.
[4] P. Underwood, Dynamic relaxation, in: T. Belytschko and T.J.R. Hughes, eds., Computational Methods for Transient Dynamic Analysis (North Holland, Amsterdam, 1983) 246-265.
[5] M. Papadrakakis, Post-buckling analysis of spatial structures by vector iteration methods, Comput. Struct. 14 (1981) 393-402.
[6] M. Papadrakakis, A family of methods with three-term recursion formulae, Int. J. Numer. Methods Engrg. 18 (1982) 1785-1799.
[7] T. Belytschko and N. Gilbertsen, Concurrent and vectorized mixed time, explicit nonlinear structural dynamics algorithms, in: A.K. Noor, ed., Parallel Computations and Their Impact on Mechanics (American Society of Mechanical Engineers, New York, 1987) 279-29~).
[8] B. Haggblad and J. Sundberg, Large strain solutions of rubber components, Comput. Struct. 17 (1983) 835-843.
