IBM InfoSphere Metadata Workbench v8.7 Tutorial
Version 8 Release 7
Tutorial
SC19-3606-00
Note: Before using this information and the product that it supports, read the information in Notices and trademarks.
Copyright IBM Corporation 2010, 2011. US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Tutorial: Preparing data for lineage reports
  Setting up the tutorial environment
    Copying the installed tutorial files
  Importing assets into the metadata repository
    Importing databases, database tables, data files, and BI models
    Importing an IBM InfoSphere DataStage job
    Creating application and file extended data source assets
    Importing extension mapping documents
    Module summary
  Completing tasks before running lineage reports
    Managing lineage
    Identifying schemas as identical
    Module summary
  Running lineage reports
    Running data lineage reports
    Running business lineage reports
    Module summary
Product accessibility
Accessing product documentation
Links to non-IBM Web sites
Notices and trademarks
Contacting IBM
Index
Learning objectives
After you complete the lessons in this module, the tutorial is ready for use.
Time required
The time needed to complete the setup depends on the overall performance of your system and on other IBM InfoSphere Information Server product modules that you installed.
Prerequisites
You must know the Web address of IBM InfoSphere Metadata Workbench and of InfoSphere Information Server. You must know the user name and password of an account that has the Metadata Workbench Administrator role, the Suite Administrator role, and the DataStage and QualityStage Administrator role. These roles might be in different accounts or all roles might be in the same account.

The following software must be installed on the appropriate tier for InfoSphere Information Server:
- IBM InfoSphere Metadata Workbench
- IBM InfoSphere DataStage and QualityStage
- Istool (This software is typically installed in InfoSphere_installation_directory\Clients\istools\cli where InfoSphere_installation_directory is the top-level installation directory of InfoSphere Information Server.)

The following software must be installed on the same computer from which you run the tutorial:
- InfoSphere DataStage and QualityStage Designer client
- IBM InfoSphere DataStage and QualityStage Administrator
- Any version of the Microsoft Windows operating system
- Microsoft Internet Explorer versions 6 or 7, or Mozilla Firefox version 2
Learning objectives
The lessons in this module explain how to do the following actions:
- Import databases, database tables, and data files from files that were created by vendor software.
- Import business intelligence (BI) reports and models that were created by IBM Cognos.
- Create extended data source assets of type application and of type file by importing a file that is in a comma-separated value (CSV) format.
- Import extension mapping documents with two rows of source-to-target mappings. The extension mapping document is in a CSV format.
- Import jobs that were created by IBM InfoSphere DataStage and QualityStage Designer.

In this tutorial, you import assets into the metadata repository by using import files. Typically, however, assets are created or imported into the metadata repository when an IBM InfoSphere Information Server product module connects to the data source.
Time required
This module takes approximately 45 minutes to complete.
- USRNAME and PASSWORD are the user name and password of an account on InfoSphere Information Server with the Suite Administrator role.
- FULL_PATH_TO_FILE is the file name of an extracted tutorial file with the directory path. An example might be C:\temp\tutorial\EWS.ISX.

Note: The command to import Report.isx is

    import -dom SERVERNAME -u USRNAME -p PASSWORD -ar FULL_PATH_TO_FILE -cm -replace

You can check that the databases, database tables, data files, and BI models are in the metadata repository by doing these steps:
1. Open the Web browser and connect to the Web address of IBM InfoSphere Metadata Workbench.
2. Type your user name and password, and click Login. The Welcome page of the metadata workbench is displayed.
3. In the left pane of the metadata workbench, click the Discover tab.
4. In the Additional Types list, select Database and click Display. The newly created databases are displayed in the right pane of the metadata workbench.
5. Right-click DW_MART and select Open Details in New Window to display its details in the asset information page.
6. In the Additional Types list in the left pane, select Data Files and click Display. The newly created data files are displayed.
7. Right-click C:\EWS\Prod\GlobalSales and select Open Details in New Window to display its details in the asset information page.
8. In the Additional Types list in the left pane, select BI Model and click Display. The newly created BI model EWS is listed.
9. Right-click EWS and select Open Details in New Window. Note the BI collections and BI report that are listed in the asset information page of the BI model EWS.
Lesson checkpoint
In this lesson, you imported databases, database tables, data files, and BI models. You can display these imported assets in the Browse tab of the left pane of the metadata workbench. You imported these databases, schemas, database tables, and data files into the metadata repository:
Figure 6. List of imported assets that are displayed in the Browse tab of left pane
You created the BI model EWS that contains BI collections, PROD_MRT and SALES_MRT. You created the BI report ProductionRunReport.
Figure 7. List of imported BI model and report assets that are displayed in the Browse tab of left pane
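The import command shown in the note earlier in this lesson can be assembled in a small script so that the same options are applied each time. The following Python sketch only builds and prints the argument list; it does not run anything. The option names come from the command in this lesson. The helper name build_import_args, the placeholder server name and credentials, and the archive path (which reuses the directory from the example path in this lesson) are illustrative assumptions, so check them against your own installation before use.

```python
def build_import_args(server, user, password, archive_path):
    """Return the import arguments from this lesson as a list of strings.

    Hypothetical helper: it only arranges the options that the lesson
    shows (-dom, -u, -p, -ar, -cm, -replace) and does not validate them.
    """
    return [
        "import",
        "-dom", server,
        "-u", user,
        "-p", password,
        "-ar", archive_path,
        "-cm",
        "-replace",
    ]


# Placeholder values; replace them with the values for your environment.
args = build_import_args(
    "SERVERNAME",
    "USRNAME",
    "PASSWORD",
    r"C:\temp\tutorial\Report.isx",  # assumed location of the extracted file
)
print(" ".join(args))
```

Keeping the arguments as a list, rather than one long string, makes it easier to pass them to a process launcher without quoting problems if the path contains spaces.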
8. Click OK to import the project.
9. Click File > Exit to close IBM InfoSphere DataStage and QualityStage Designer.

You can check that the jobs are in the metadata repository by doing these steps in IBM InfoSphere Metadata Workbench:
1. In the left pane of the metadata workbench, click the Discover tab.
2. In the Additional Types list, select Job and click Display. A list of all jobs is displayed.
3. Narrow your search to display only those jobs whose name begins with EWS by typing this string in the Narrow Your Results field.
Figure 9. List of new jobs that begin with "EWS" in the metadata repository
Lesson checkpoint
In this lesson, you imported InfoSphere DataStage jobs into the metadata repository.
1. In the left pane of the metadata workbench, click the Advanced tab and select Import Extended Data Sources.
2. Click Add in the Import Extended Data Sources window. Browse to the directory where you extracted the tutorial files and select these files:
   - ExtensionApplication_Source.csv
   - Extended_File_Source.csv
Figure 10. Two files whose assets are created in the metadata repository
3. Click OK. The Status window displays the import status as the files are read. File and application assets are created in the metadata repository.
4. Click OK to close the Status window.

You can check that the extended data source assets are in the metadata repository by doing these steps:
1. In the left pane of the metadata workbench, click the Discover tab.
2. In the Asset Type list, select Application and click Find. Right-click the newly created application asset CRM and select Open Details in New Window to display its details.
Figure 11. Asset information page of CRM, a new application asset in the metadata repository
3. In the Asset Type list, select File and click Find. Right-click the newly created file asset Customer Data Upload and select Open Details in New Window to display its details.
Lesson checkpoint
In this lesson, you imported two files in a CSV format by using InfoSphere Metadata Workbench. The import created new extended data source assets of type file and of type application in the metadata repository.

You learned the following tasks:
- How to create extended data source assets in the metadata repository by importing a CSV file.
- How to view the new extended data source assets in the metadata repository after the import.
2. In the Import Extension Mapping Documents window, click Add in the top pane. Browse to the directory where you copied the tutorial files and select EWS Mapping1.csv and EWS Mapping2.csv. Leave the Source, Target, and Configuration fields blank.
3. Click OK and then click Save to save and import the extension mapping document into the metadata repository.
4. Click OK to close the Status window.
5. Right-click the extension mapping document EWS Mapping 2.csv and select Open Details in New Window. In the Extension Mappings pane of the asset information page, the two mapping rows, both called SP Read, are listed. Inventory Data is a source asset of type file and is mapped to the target assets AmericaProd and Plant. AmericaProd is an asset of type file structure. Plant is an asset of type file field.
Figure 12. Asset information page of an extension mapping document with two mapping rows
To see the asset information page of the source or target assets, right-click the asset name and select Open Details in New Window. In this example, the asset information page of InventoryData displays the same information about the extension mapping rows as the asset information page of the extension mapping document.
Lesson checkpoint
In this lesson, you created an extension mapping document with two mapping rows. You noted the source-to-target mapping assignment by looking at the asset information page of the extension mapping document, or of the source or target assets.

You learned the following tasks:
- How to import an extension mapping document with source and target assets.
- How to see the source-to-target mappings when you view the asset information page of the extension mapping document.
Module summary
In this module, you imported assets into the metadata repository that are needed to create a lineage report. You imported the following assets:
- Database
- Database file
- IBM Cognos business intelligence (BI) report
- Extended data source assets of type application and of type file
- Extension mapping documents with source-to-target mappings
- Compiled IBM InfoSphere DataStage and QualityStage jobs
The metadata repository now has the assets that are needed for the lineage report. The next step is to perform administrative tasks to prepare the data.
Lessons learned
In this module, you learned how to do the following tasks:
- Import databases, database files, BI reports, and compiled IBM InfoSphere DataStage and QualityStage jobs by using the istool command-line interface.
- Create extended data source assets of type application and of type file by using IBM InfoSphere Metadata Workbench.
- Import extension mapping documents with source-to-target mappings by using InfoSphere Metadata Workbench.
Learning objectives
The lessons in this module explain how to do the following actions:
- Set relationships between stages and tables or data file structures, between stages, and between database tables and views.
- Map a database alias to ensure that stages and database tables are correctly linked in lineage reports.
- Run data source identity to identify duplicate databases and schemas.
Time required
This module takes approximately 30 minutes to complete.
Managing lineage
In this lesson, you learn how to manage lineage by setting relationships between stages and tables or data file structures, between stages, and between database tables and views.
When the target stage in one job is matched to the source stage in the next job, the metadata workbench reports display the cross-job analysis. When views are matched to their source database tables, the metadata workbench displays the relationships in the lineage report.

The Manage Lineage utility in the metadata workbench works with stages that connect to databases and data files. The most commonly used stages are supported.

To run the Manage Lineage utility:
1. Click the Advanced tab in the left pane of the metadata workbench and then click Manage Lineage.
2. Select the EWS project, which has new jobs, and then click the Detect Associations icon. This step detects associations between stages and data sources.
3. Click the Map Database Alias icon and do the following steps:
   a. In the Mapped Database Aliases table, locate the row of the database alias, DW_Mart. Click Select in that row.
   b. In the Select a Database for DW_Mart window, click Find and then select EWS in the results list. Click Select.
   c. Click Save to define EWS as the alias for the database DW_Mart.
   d. Repeat step 2 to update the database alias.
   These steps map a database name to an alias name.

The Last Run field of the row in the Transformation Project results table displays the date and time that Manage Lineage was last run.
Lesson checkpoint
In this lesson, you assigned a database alias to a database to ensure that stages and database tables are correctly linked in lineage reports. You learned how to run the Manage Lineage utility.
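The effect of a database alias can be pictured as a simple lookup: a name that a stage uses is translated to a database in the repository before stages and tables are linked. The Python sketch below is only a conceptual model of that idea, not a description of how the metadata workbench implements it. It follows steps a and b of this lesson, where the alias row DW_Mart is paired with the database EWS; the function name resolve_database and the list of stage references are hypothetical.

```python
# Conceptual model only: alias name -> database selected for that alias.
alias_map = {"DW_Mart": "EWS"}


def resolve_database(name, aliases):
    """Return the database that a stage's connection name resolves to.

    Names without a mapped alias are returned unchanged.
    """
    return aliases.get(name, name)


# Hypothetical connection names used by two stages.
stage_references = ["DW_Mart", "EWS"]
resolved = [resolve_database(ref, alias_map) for ref in stage_references]
print(resolved)  # ['EWS', 'EWS']
```

In this model, both stages end up pointing at the same database, which is why a lineage report can link them to the same tables once the alias is mapped.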
Figure 14. Two schemas from different databases are identified as identical
In the Browse tab of the left pane of the metadata workbench, you can see that the database tables of the matched schemas SCHEMA1 have the same table names. In this case, the database tables are also identified as identical.
Figure 15. Database tables from different schemas are also identified as identical
Lesson checkpoint
In this lesson, you identified SCHEMA1 of database DW_MART and SCHEMA1 of database EWS as identical schemas. Database tables in these schemas have the same name. As a result, these database tables are also identified as identical.
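The name-matching rule described in this lesson can be illustrated with a short Python sketch. It assumes that two schemas have already been marked as identical and then reports which tables would also be treated as identical because their names match. The table names, the function name identical_tables, and the use of exact, case-sensitive matching are assumptions made for illustration; the tutorial does not list the tables or state the comparison rules in detail.

```python
def identical_tables(tables_a, tables_b):
    """Return the table names that appear in both schemas, sorted.

    Assumes exact, case-sensitive name matching.
    """
    return sorted(set(tables_a) & set(tables_b))


# Hypothetical table lists for SCHEMA1 in each database.
dw_mart_schema1 = ["CUSTOMER", "SALES"]
ews_schema1 = ["CUSTOMER", "SALES", "STAGING_ONLY"]

print(identical_tables(dw_mart_schema1, ews_schema1))  # ['CUSTOMER', 'SALES']
```

A table that exists in only one of the two schemas, such as the hypothetical STAGING_ONLY, has no counterpart and is therefore not reported.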
Module summary
In this module, you performed administrative tasks on the assets that you created or imported into the metadata repository. The administrative tasks prepared the data for correct lineage.
Lessons learned
In this module, you learned how to do the following tasks:
- Define a database alias so that the Manage Lineage utility can set relationships between stages of a job and database tables.
- Run the Manage Lineage utility. This step links the target stage in one job to the source stage in the next job, and links views to database tables.
- Define two schemas as identical. All database tables and database columns that are contained by identical schemas are also marked as identical when their names match.
Learning objectives
The lessons in this module explain how to do the following actions:
- Run data lineage reports
- Run business lineage reports
Time required
This module takes approximately 20 minutes to complete.
You can display information about the assets in a data lineage report in any of the following ways:
- Open each twisty of an asset type (in the right pane) to display the assets.
- Move the mouse pointer over the graphic of an asset in the left pane.
- Click the graphic of an asset to expand the asset information in the right pane.

You can get additional information about an asset by clicking the icons at the bottom of each graphic.

The results of data lineage vary according to the assets that you select as the start and end points for the data flow. If you select the EWS_SalesStaging job instead of CRM as the starting point for data lineage, the data lineage report is different.
Lesson checkpoint
In this lesson, you learned how to run a data lineage report.
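To see why the choice of starting asset changes a data lineage report, it can help to think of data flow as a directed graph and of the report as the set of assets reachable from the start. The Python sketch below walks such a graph breadth-first. The asset names CRM, EWS_SalesStaging, DW_MART, and ProductionRunReport appear in this tutorial, but the edges between them and the function name downstream are assumptions made for illustration; they are not taken from the actual tutorial data.

```python
from collections import deque


def downstream(graph, start):
    """Return the assets reachable from start, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


# Hypothetical data flow between assets named in this tutorial.
flows = {
    "CRM": ["EWS_SalesStaging"],
    "EWS_SalesStaging": ["DW_MART"],
    "DW_MART": ["ProductionRunReport"],
}

print(downstream(flows, "CRM"))
print(downstream(flows, "EWS_SalesStaging"))
```

Starting from CRM includes the application itself in the result, while starting from EWS_SalesStaging begins one step later and omits it.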
Business lineage reports show a scaled-down view of lineage without the detailed information that is not needed by a business user. Business lineage reports show data flows only through those assets that have been configured to be included in business lineage reports. In addition, business lineage reports do not include extension mapping documents or jobs from IBM InfoSphere DataStage and QualityStage.

To run business lineage reports:
1. Click the Discover tab in the left pane of the metadata workbench.
2. In the Asset Type list, select BI Report. Click Find.
3. In the Find Results pane, right-click ProductionRunReport and select Business Lineage.
Lesson checkpoint
In this lesson you learned how to create a business lineage report.
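The difference between a data lineage path and its business lineage view can be sketched as a filter. The Python example below drops jobs and extension mapping documents and keeps only the assets that are configured to be included, which mirrors the two rules stated in this lesson. The asset types assigned to each name, the sample path, the set of included assets, and the function name business_view are assumptions made for illustration only.

```python
# Asset types that this lesson says never appear in business lineage.
EXCLUDED_TYPES = {"job", "extension mapping document"}


def business_view(path, included):
    """Return the names on a lineage path that business lineage would show.

    path is a list of (asset name, asset type) pairs; included is the set
    of asset names configured for business lineage.
    """
    return [
        name
        for name, kind in path
        if kind not in EXCLUDED_TYPES and name in included
    ]


# Hypothetical data lineage path and configuration.
data_path = [
    ("CRM", "application"),
    ("EWS_SalesStaging", "job"),
    ("DW_MART", "database"),
    ("ProductionRunReport", "BI report"),
]
included_assets = {"CRM", "ProductionRunReport"}

print(business_view(data_path, included_assets))  # ['CRM', 'ProductionRunReport']
```

In this sketch the job is removed because of its type, and the database is removed because it was not configured for inclusion.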
Module summary
In this module you ran data lineage and business lineage reports.
Lessons learned
In this module, you learned how to do the following tasks:
- Configure and run data lineage reports.
- Run business lineage reports.
Product accessibility
You can get information about the accessibility status of IBM products.

The IBM InfoSphere Information Server product modules and user interfaces are not fully accessible. The installation program installs the following product modules and components:
- IBM InfoSphere Business Glossary
- IBM InfoSphere Business Glossary Anywhere
- IBM InfoSphere DataStage
- IBM InfoSphere FastTrack
- IBM InfoSphere Information Analyzer
- IBM InfoSphere Information Services Director
- IBM InfoSphere Metadata Workbench
- IBM InfoSphere QualityStage
For information about the accessibility status of IBM products, see the IBM product accessibility information at http://www.ibm.com/able/product_accessibility/index.html.
Accessible documentation
Accessible documentation for InfoSphere Information Server products is provided in an information center. The information center presents the documentation in XHTML 1.0 format, which is viewable in most Web browsers. XHTML allows you to set display preferences in your browser. It also allows you to use screen readers and other assistive technologies to access the documentation.
Notices
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A. For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to: Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan Ltd. 1623-14, Shimotsuruma, Yamato-shi Kanagawa 242-8502 Japan The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. 
Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web
sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact: IBM Corporation J46A/G4 555 Bailey Avenue San Jose, CA 95141-1003 U.S.A. Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee. The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us. Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. 
All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only. This information is for planning purposes only. The information herein is subject to change before the products described become available. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to
IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs. Each copy or any portion of these sample programs or any derivative work, must include a copyright notice as follows: (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. Copyright IBM Corp. _enter the year or years_. All rights reserved. If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml. The following terms are trademarks or registered trademarks of other companies: Adobe is a registered trademark of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office UNIX is a registered trademark of The Open Group in the United States and other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. The United States Postal Service owns the following trademarks: CASS, CASS Certified, DPV, LACSLink, ZIP, ZIP + 4, ZIP Code, Post Office, Postal Service, USPS and United States Postal Service. IBM Corporation is a non-exclusive DPV and LACSLink licensee of the United States Postal Service. Other company, product or service names may be trademarks or service marks of others.
Contacting IBM
You can contact IBM for customer support, software services, product information, and general information. You also can provide feedback to IBM about products and documentation. The following table lists resources for customer support, software services, training, and product and solutions information.
Table 1. IBM resources

Resource: IBM Support Portal
Description and location: You can customize support information by choosing the products and the topics that interest you at www.ibm.com/support/entry/portal/Software/Information_Management/InfoSphere_Information_Server

Resource: Software services
Description and location: You can find information about software, IT, and business consulting services, on the solutions site at www.ibm.com/businesssolutions/

Resource: My IBM
Description and location: You can manage links to IBM Web sites and information that meet your specific technical support needs by creating an account on the My IBM site at www.ibm.com/account/

Resource: Training and certification
Description and location: You can learn about technical training and education services designed for individuals, companies, and public organizations to acquire, maintain, and optimize their IT skills at http://www.ibm.com/software/swtraining/

Resource: IBM representatives
Description and location: You can contact an IBM representative to learn about solutions at www.ibm.com/connect/ibm/us/en/
Providing feedback
The following table describes how to provide feedback to IBM about products and product documentation.
Table 2. Providing feedback to IBM

Type of feedback: Product feedback
Action: You can provide general product feedback through the Consumability Survey at www.ibm.com/software/data/info/consumability-survey

Type of feedback: Documentation feedback
Action: To comment on the information center, click the Feedback link on the top right side of any topic in the information center. You can also send comments about PDF file books, the information center, or any other documentation in the following ways:
- Online reader comment form: www.ibm.com/software/data/rcf/
- E-mail: [email protected]
Printed in USA