DataStage Material
DATA STAGE
2/12/2011
[Diagram: the DataStage Designer and DataStage Director clients connect to the DataStage Server over TCP/IP.]
When DataStage is installed on a PC, four client components appear (with blue icons): DATASTAGE ADMINISTRATOR, DATASTAGE DESIGNER, DATASTAGE DIRECTOR, and DATASTAGE MANAGER. These are the client components.
DS client components:
1) DataStage Administrator: this component is used to create or delete projects, clean up metadata stored in the repository, and install NLS (National Language Support).
File stages
Note: all file stages are passive stages, which means they are defined only for read or write access.
Sequential File stage: one of the file stages; it can be used to read data from a file or write data to a file. It supports a single input link or a single output link, as well as a reject link.
Dataset: also one of the file stages; it stores data in an internal format tied to the operating system and parallel engine, so it takes less time to read or write the data.
File Set: also one of the file stages; it can be used to read or write data in a file set. The file is saved with the extension .fs, and the stage operates in parallel.
10) What is the exact difference between Dataset and File Set?
Dataset is an internal format of DataStage. The main points to consider about a dataset before using it:
1) It stores data in binary, in the internal format of DataStage, so it takes less time to read/write from a dataset than from any other source/target.
2) It preserves the partitioning scheme, so you don't have to partition the data again.
3) You cannot view the data without DataStage.
Now, about File Set:
1) It stores data in a format similar to a sequential file.
2) The only advantage of a file set over a sequential file is that it preserves the partitioning scheme.
3) You can view the data, but in the order defined by the partitioning scheme.
Complex Flat File: this stage is used to read data from mainframe files. Using CFF we can read ASCII or EBCDIC (Extended Binary Coded Decimal Interchange Code) data. We can select only the required columns and omit the rest. We can collect rejects (badly formatted records) by setting the Reject property to Save (other options: Continue, Fail). We can also flatten arrays (COBOL files).
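As a rough illustration of what reading EBCDIC involves (this is plain Python, not the CFF stage itself; the cp037 code page is an assumption, since mainframe files may use other EBCDIC variants):

# Decode EBCDIC bytes into readable text; cp037 (US EBCDIC) is assumed.
ebcdic_bytes = "HELLO".encode("cp037")   # stand-in for bytes from a mainframe file
print(ebcdic_bytes)                      # b'\xc8\xc5\xd3\xd3\xd6' - not ASCII
print(ebcdic_bytes.decode("cp037"))      # HELLO - readable after decoding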
Processing Stages
Aggregator stage: one of the processing stages; it is used to compute summaries for groups of input data. It supports a single input link, which carries the input data, and a single output link, which carries the aggregated data.
To see the properties of the Aggregator stage, double-click the stage; it shows the following.
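Conceptually, the stage groups rows on a key and computes one summary per group. A minimal Python sketch of that idea (the dept and salary columns are made up for illustration):

# Group rows by a key column and produce a sum per group.
from collections import defaultdict

rows = [
    {"dept": "HR", "salary": 1000},
    {"dept": "HR", "salary": 1500},
    {"dept": "IT", "salary": 2000},
]

totals = defaultdict(int)
for row in rows:
    totals[row["dept"]] += row["salary"]   # aggregation: sum per group

for dept, total in totals.items():
    print(dept, total)                     # HR 2500, IT 2000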
To see the properties of the Copy stage, double-click the stage; it shows the following.
To see the properties of the Filter stage, double-click the stage; it shows the following.
To see the properties of the Switch stage, double-click the stage.
Join stage: also one of the processing stages; it combines two or more input datasets based on a key field. It supports two or more input links and one output link, and does not support a reject link. The Join stage can perform inner, left outer, right outer, and full outer joins.
An inner join displays the matched records from both tables. A left outer join shows the matched records from both sides as well as the unmatched records from the left table. A right outer join shows the matched records from both sides as well as the unmatched records from the right table. A full outer join shows the matched as well as the unmatched records from both sides. To see the properties of the Join stage, double-click the stage.
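A minimal Python sketch of the four join types on a key field (the table contents are made up for illustration):

left  = {1: "Alice", 2: "Bob"}          # key -> left value
right = {2: "Sales", 3: "Support"}      # key -> right value

inner       = {k: (left[k], right[k]) for k in left.keys() & right.keys()}
left_outer  = {k: (left[k], right.get(k)) for k in left}
right_outer = {k: (left.get(k), right[k]) for k in right}
full_outer  = {k: (left.get(k), right.get(k)) for k in left.keys() | right.keys()}

print(inner)        # {2: ('Bob', 'Sales')} - matched records only
print(left_outer)   # keeps unmatched left key 1 with None on the right
print(right_outer)  # keeps unmatched right key 3 with None on the left
print(full_outer)   # all keys from both sides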
Merge stage: also one of the processing stages; it merges multiple input datasets. It supports multiple input links: the first input link is called the master link and the remaining links are called update links. It can perform inner join and left outer join only.
To see the properties of the Merge stage, double-click the stage.
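A minimal Python sketch of the idea: master rows are enriched with columns from a keyed update input, and unmatched update records resemble what the real stage can send to reject links. The data values here are made up:

master  = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
updates = {2: {"bonus": 500}, 9: {"bonus": 100}}   # update link keyed by id

merged = []
matched = set()
for row in master:
    extra = updates.get(row["id"])
    if extra:
        matched.add(row["id"])
    merged.append({**row, **(extra or {})})        # left outer w.r.t. master

rejects = [v for k, v in updates.items() if k not in matched]
print(merged)    # both master rows, Bob enriched with the bonus column
print(rejects)   # the unmatched update record (key 9)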
Look-up stage: also one of the processing stages; it is used to look up reference data such as relational tables. It supports multiple input links, a single output link, and a single reject link. Below is a simple job illustrating the Look-up stage.
This stage can perform inner join and left outer join. To see the properties of the Look-up stage, double-click the stage; it shows a window like the diagram below.
Q) In which case does it perform an inner join, and in which case a left outer join? In the picture above there is a Constraints icon. Double-click that icon and a window like the following appears.
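The behaviour is driven by the lookup-failure constraint: Continue passes unmatched stream rows through with NULL reference columns (left outer join behaviour), while Drop discards them (inner join behaviour). A minimal Python sketch of that rule (names and values are made up):

stream = [{"id": 1}, {"id": 2}]
reference = {1: {"city": "Pune"}}

def lookup(rows, ref, on_failure="continue"):
    out = []
    for row in rows:
        match = ref.get(row["id"])
        if match is None and on_failure == "drop":
            continue                                      # inner-join behaviour
        out.append({**row, **(match or {"city": None})})  # left outer behaviour
    return out

print(lookup(stream, reference, "continue"))  # both rows, NULL city for id 2
print(lookup(stream, reference, "drop"))      # only id 1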
Sort stage: also one of the processing stages; it sorts data on a key field in either ascending or descending order. It supports a single input link and a single output link.
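A minimal Python sketch of key-based sorting in both directions (column name made up):

rows = [{"id": 3}, {"id": 1}, {"id": 2}]
print(sorted(rows, key=lambda r: r["id"]))                # ascending order
print(sorted(rows, key=lambda r: r["id"], reverse=True))  # descending order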
Modify stage: also one of the processing stages; it is used for null handling and data type changes. For example, if the source column is varchar and the target column is integer, we use the Modify stage to convert the type according to the requirement. We can also make some modifications to column lengths.
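A minimal Python sketch of what a Modify specification does: convert a varchar column to integer and substitute a default for NULLs (the column name and default are made up):

rows = [{"amount": "42"}, {"amount": None}, {"amount": "7"}]

converted = [
    {"amount": int(r["amount"]) if r["amount"] is not None else 0}  # null handling + type change
    for r in rows
]
print(converted)   # [{'amount': 42}, {'amount': 0}, {'amount': 7}]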
Pivot stage: also one of the processing stages. Many people have the following misconceptions about the Pivot stage: 1) it converts rows into columns; 2) by using a Pivot stage we can convert 10 rows into 100 columns and 100 columns into 10 rows; 3) you can add more points here! Let me first tell you that a Pivot stage only CONVERTS COLUMNS INTO ROWS and nothing else. Some DS professionals refer to this as normalization. Another fact about the Pivot stage is that it is irreplaceable, i.e. no other stage has this functionality of converting columns into rows, and that makes it unique. Let's cover how exactly it does this. For example, take a file with the following fields: sno, sname, m1, m2, m3.
Basically, you would use a Pivot stage when you need to convert those three fields m1, m2, m3 into a single field, marks, which contains one value per row; i.e. you would need the following (see the sketch below).
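A minimal Python sketch of that conversion: the columns m1, m2, m3 become one marks column, producing one output row per pivoted column:

rows = [{"sno": 1, "sname": "Ravi", "m1": 50, "m2": 60, "m3": 70}]

pivoted = [
    {"sno": r["sno"], "sname": r["sname"], "marks": r[m]}
    for r in rows
    for m in ("m1", "m2", "m3")
]
for p in pivoted:
    print(p)
# {'sno': 1, 'sname': 'Ravi', 'marks': 50}
# {'sno': 1, 'sname': 'Ravi', 'marks': 60}
# {'sno': 1, 'sname': 'Ravi', 'marks': 70}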
Q) What is the difference between a primary key and a surrogate key? A surrogate key is an artificial identifier for an entity; surrogate key values are generated by the system, sequentially. A primary key is a natural identifier for an entity; its values come from the data itself and uniquely identify each record, with no repetition.
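A minimal Python sketch of the distinction: system-generated sequential surrogate keys assigned alongside natural (primary) key values (names are illustrative):

from itertools import count

surrogate = count(start=1)              # system-generated sequence
customers = ["C-100", "C-205"]          # natural/primary key values from the data

dim_rows = [{"sk": next(surrogate), "customer_id": ck} for ck in customers]
print(dim_rows)  # [{'sk': 1, 'customer_id': 'C-100'}, {'sk': 2, 'customer_id': 'C-205'}]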
This editor has stage variables, derivations, and constraints. Stage variable: an intermediate processing variable that retains its value during the read and does not pass its value into a target column.
Q) What is the order of execution in the Transformer stage? The order of execution is: 1) stage variables, 2) constraints, 3) derivations (each output link's constraint is evaluated before that link's column derivations).
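A minimal Python sketch of that per-row sequence (the columns and logic are made up for illustration):

rows = [{"qty": 2, "price": 10}, {"qty": 0, "price": 5}]

out = []
for row in rows:
    sv_total = row["qty"] * row["price"]   # 1) stage variable
    if sv_total <= 0:                      # 2) constraint: decides if the row passes
        continue
    out.append({"total": sv_total})        # 3) derivation: builds the output column
print(out)   # [{'total': 20}] - the second row fails the constraint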
Change Capture stage: also one of the active processing stages; it captures the changes between two input sources, before and after. The source used as the reference to capture the change is called the after dataset; the source we examine for changes is called the before dataset. A change code is added to the output dataset, and by this change code we recognize a delete, insert, or update.
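A minimal Python sketch of the comparison: the before and after datasets are matched on a key and each difference is tagged with a change code. The codes follow the convention used later in this material (1 = insert, 3 = update); 2 = delete is an assumption here:

before = {1: "Alice", 2: "Bob"}
after  = {1: "Alice", 2: "Bobby", 3: "Carol"}

changes = []
for key, value in after.items():
    if key not in before:
        changes.append((key, value, 1))        # insert
    elif before[key] != value:
        changes.append((key, value, 3))        # update (edit)
for key in before.keys() - after.keys():
    changes.append((key, before[key], 2))      # delete
print(changes)   # [(2, 'Bobby', 3), (3, 'Carol', 1)]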
Development and Debug Stages
Row Generator stage: produces a set of data matching the specified metadata. It is useful when you want to test a job but have no real data available to process. It has no input links and a single output link.
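A minimal Python sketch of the idea: generate mock rows that fit a declared metadata (the column names and generators are made up):

import random, string

metadata = {
    "id":   lambda i: i,                                                   # sequential integers
    "name": lambda i: "".join(random.choices(string.ascii_uppercase, k=5)),  # random 5-letter strings
}

rows = [{col: gen(i) for col, gen in metadata.items()} for i in range(3)]
print(rows)   # three rows of generated test data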
Tail stage: this stage is helpful for testing and debugging applications with large datasets. It selects the BOTTOM N rows from the input dataset and copies the selected rows to an output dataset. It has a single input link and a single output link.
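A minimal Python sketch of keeping only the bottom N rows; collections.deque with maxlen discards earlier rows as it reads:

from collections import deque

rows = range(1, 101)          # pretend this is a large input dataset
bottom_n = deque(rows, maxlen=5)
print(list(bottom_n))         # [96, 97, 98, 99, 100]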
Sample stage: this stage has a single input link and any number of output links, and it operates in percent mode or period mode.
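A minimal Python sketch of the two modes (the percentage and period values are illustrative): percent mode passes roughly a percentage of rows, while period mode passes every Nth row:

import random

rows = list(range(1, 21))
random.seed(0)                                                # deterministic demo

percent_sample = [r for r in rows if random.random() < 0.25]  # roughly 25 percent of rows
period_sample  = rows[4::5]                                   # every 5th row
print(percent_sample)
print(period_sample)   # [5, 10, 15, 20]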
Peek stage: it can have a single input link and any number of output links. It is used to print record column values to the job log.
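A minimal Python sketch of the behaviour: rows pass through unchanged while their column values are printed, the way Peek writes them to the job log:

rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

for row in rows:            # the "output link": rows continue onward untouched
    print("Peek:", row)     # the "job log": printed column values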
SCD Type 2 is one of the common problems in DWH: maintaining history information for a particular organization in the target, so that for every update in the source, a new record is inserted in the target.

In this implementation there are two input datasets, a before dataset and an after dataset. Both are connected to a Change Capture stage, which is connected to a Transformer stage with two output links: an insert link and an update link. The insert link is connected to a stored-procedure stage, then to a Transformer, then to the target insert stage. The update link of the Transformer is joined with the target table after duplicate records are removed with a Remove Duplicates stage; the output link of that Join stage goes to a Transformer connected to the target update stage.

For example, suppose the source is an EMP table with 100 records. When the job first runs, the records are loaded into the target through the insert path: the two input datasets are compared, and since there are no existing records to match, the Change Capture stage assigns change code = 1 (insert). The Transformer stage then moves the records from source to target, generating a sequence (surrogate key) for the records via the stored-procedure stage.

If an update later occurs at the source, the updated records are stored on the target side (TGT_UPDATE). First the two input datasets are compared; since a change has occurred at the source, the Change Capture stage assigns change code = 3 (update). Using this change code, the Transformer stage sends the records through the update link to the Join stage. The Join stage joins the updated records with the target records, with duplicates removed by the Remove Duplicates stage, and its output is connected to a Transformer stage that writes the update records to the target update stage.
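A minimal Python sketch of the SCD Type 2 logic described above: inserts get a new surrogate key, and updates insert a NEW row so history is preserved rather than overwritten. The table layout and the current flag are made up for illustration:

from itertools import count

sk = count(start=1)
target = []                                    # the dimension table

def apply_change(key, value, change_code):
    if change_code == 3:                       # update: expire the current row first
        for row in target:
            if row["key"] == key and row["current"]:
                row["current"] = False
    # both insert (code 1) and update (code 3) add a new current row
    target.append({"sk": next(sk), "key": key, "value": value, "current": True})

apply_change("E100", "Clerk", 1)               # initial load: insert
apply_change("E100", "Manager", 3)             # source update: new version row
for row in target:
    print(row)
# E100 now has two rows: the old one flagged current=False, plus the new one.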
Controlling DataStage jobs from other DataStage jobs.
Example: consider two jobs, XXX and YYY. Job YYY can be executed from job XXX by using DataStage macros in routines. To execute one job from another, the following steps are followed in the routine:
1. Attach the job using the DSAttachJob function.
2. Run the job using the DSRunJob function.
3. Stop the job, if required, using the DSStopJob function.
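The routine functions above belong to DataStage BASIC. As a hedged illustration of the same control flow from outside a routine, here is a Python sketch that drives a job through the dsjob command-line client instead; the project and job names are made up, and dsjob must be installed and on PATH for this to run:

import subprocess

project, job = "MyProject", "YYY"   # hypothetical project and job names

# Run the job and wait for it; with -jobstatus, dsjob's exit code reflects
# the finishing status of the job rather than just command success.
result = subprocess.run(["dsjob", "-run", "-jobstatus", project, job])
print("job finished with status code:", result.returncode)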