APT Config
The Datastage environment variables are grouped and each variable falls into one of the categories described below.
The default values set during installation are reasonable and in most cases there is no need to modify them.
Setting environment variables for parallel execution in Datastage Administrator
Parallel properties
The operator-specific variables under parallel properties are stage-specific settings, usually set during installation. The settings apply to the supported parallel database engines (DB2, Oracle, SAS and Teradata).
APT_DBNAME - default DB2 database name to use
APT_RDBMS_COMMIT_ROWS - RDBMS commit interval
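A minimal sketch of overriding these defaults, assuming they are exported in the engine environment file (typically $DSHOME/dsenv); the database name and commit interval below are hypothetical values, and both variables can also be set per project in Administrator or per job as parameters:

    APT_DBNAME=SAMPLEDB; export APT_DBNAME
    APT_RDBMS_COMMIT_ROWS=5000; export APT_RDBMS_COMMIT_ROWS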
Reporting
The reporting variables control logging options and take True/False values only.
APT_DUMP_SCORE - shows operators, datasets, nodes, partitions,
combinations and processes used in a job.
APT_RECORD_COUNTS - helps detect and analyze load imbalance. It prints the
number of records consumed by getRecord() and produced by putRecord()
OSH_PRINT_SCHEMAS - shows unformatted metadata for all stages (interface
schema) and datasets (record schema). Set it to verify that the runtime schemas
match the job design column definitions (especially when reading from Oracle).
OSH_DUMP - shows an OSH script and produces a verbose description of a step
before executing it
APT_NO_JOBMON - disables performance statistics and process metadata
reporting in Designer.
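A minimal sketch of switching the reporting variables on for a diagnostic run (shown here as shell exports; in practice they are usually enabled per project in Administrator or per job as parameters):

    export APT_DUMP_SCORE=True
    export APT_RECORD_COUNTS=True
    export OSH_PRINT_SCHEMAS=True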
Compiler
APT_COMPILER - path to the C++ compiler needed to compile Transformer stages
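For example, on a Linux installation the variable might point to the GNU C++ compiler; the path below is an assumption and depends on the platform and the compiler required by the installation:

    APT_COMPILER=/usr/bin/g++; export APT_COMPILER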
The Datastage EE configuration file is a master control file (a text file which sits on the server
side) for Enterprise Edition jobs which describes the parallel system resources and
architecture. The configuration file provides the hardware configuration for supporting such
architectures as SMP (a single machine with multiple CPUs, shared memory and disk), Grid,
Cluster or MPP (multiple nodes, each with its own CPUs and dedicated memory).
The configuration file defines all processing and storage resources and can be edited with any
text editor or within Datastage Manager.
The main benefit of the configuration file is the separation of the software and hardware
configuration from the job design. It allows hardware and software resources to be changed
without changing the job design. Datastage EE jobs can point to different configuration files by
using job parameters, which means that a job can utilize different hardware architectures without
being recompiled.
The Datastage EE configuration file is specified at runtime by the $APT_CONFIG_FILE
environment variable.
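As a sketch of how this works, the same compiled job could be run against different configuration files by overriding a job parameter (the project, job and file names below are hypothetical, and the parameter name must match how $APT_CONFIG_FILE was added to the job design):

    # development run on a 1-node configuration
    dsjob -run -param APT_CONFIG_FILE=/etl/configs/1node.apt devproject tutorial_job
    # production run on a 4-node configuration
    dsjob -run -param APT_CONFIG_FILE=/etl/configs/4node.apt prodproject tutorial_job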
Configuration file structure
The Datastage EE configuration file defines the number of nodes, assigns resources to each node and
provides advanced resource optimization and configuration.
The configuration file structure and key instructions:
node - defines a logical processing node; one physical machine can host several logical nodes.
fastname - the hostname (or IP address) of the physical machine a node runs on.
pools - defines resource allocation. Pools can overlap across nodes or can be
independent.
resource disk / resource scratchdisk - file system locations assigned to a node for
persistent data sets and for temporary (scratch) storage.
A basic configuration file for a single machine configured as a two node server (2 CPUs) is shown below. The
file defines 2 nodes (dev1 and dev2) on a single etltools-dev server (an IP address might be
provided instead of a hostname) with 3 disk resources (d1 and d2 for the data and temp as
scratch space):
{
    node "dev1"
    {
        fastname "etltools-dev"
        pools ""
        resource disk "/data/etltools-tutorial/d1" { }
        resource disk "/data/etltools-tutorial/d2" { }
        resource scratchdisk "/data/etltools-tutorial/temp" { }
    }
    node "dev2"
    {
        fastname "etltools-dev"
        pools ""
        resource disk "/data/etltools-tutorial/d1" { }
        resource scratchdisk "/data/etltools-tutorial/temp" { }
    }
}
A sample configuration file for cluster or grid computing on 4 machines is shown below.
The configuration defines 4 nodes (prod1-prod4 on machines etltools-prod[1-4]), node pools (n[1-4] and s[1-4]),
resource pools bigdata and sort, and a temporary space.
{
    node "prod1"
    {
        fastname "etltools-prod1"
        pools "" "n1" "s1" "tutorial2" "sort"
        resource disk "/data/prod1/d1" {}
        resource disk "/data/prod1/d2" {"bigdata"}
        resource scratchdisk "/etltools-tutorial/temp" {"sort"}
    }
    node "prod2"
    {
        fastname "etltools-prod2"
        pools "" "n2" "s2" "tutorial1"
        resource disk "/data/prod2/d1" {}
        resource disk "/data/prod2/d2" {"bigdata"}
        resource scratchdisk "/etltools-tutorial/temp" {}
    }
    node "prod3"
    {
        fastname "etltools-prod3"
        pools "" "n3" "s3" "tutorial1"
        resource disk "/data/prod3/d1" {}
        resource scratchdisk "/etltools-tutorial/temp" {}
    }
    node "prod4"
    {
        fastname "etltools-prod4"
        pools "n4" "s4" "tutorial1"
        resource disk "/data/prod4/d1" {}
        resource scratchdisk "/etltools-tutorial/temp" {}
    }
}
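With pools defined this way, processing can be constrained to a subset of the nodes: for example, sorting could be limited to the nodes in the "sort" pool and large persistent data sets directed to the disks in the "bigdata" pool. In Designer this is typically done on a stage's Advanced tab using node pool and resource constraints (the pool names used here are only the ones defined in the sample file above).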
The easiest way to validate the configuration file is to export the APT_CONFIG_FILE variable
pointing to the newly created configuration file and then issue the following command:
orchadmin check
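For example (the configuration file path below is hypothetical):

    export APT_CONFIG_FILE=/etl/configs/4node.apt
    orchadmin check

orchadmin is part of the parallel engine and typically lives in $APT_ORCHHOME/bin. If the file is valid the check completes without errors; otherwise the problems found are reported.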