Data Warehouse Concepts Presentation
[Figure: Top Management, Middle Management, Enterprise Data Warehouse, Metadata Repository, Marketing Data Mart, Production Data Mart]
Data Warehousing
Operational System
Highly Volatile Data
Generally normalized to third normal form (3NF)
Users interact with the system directly
Very little redundancy (almost none)
[Figure: Source Systems -> ETL / DQ -> Data Warehouse -> BI Tools (Query Tools, OLAP Tools, Data Mining, Data Visualization)]
[Figure: data warehouse architecture]
Source systems: CRM, ERP, Legacy, e-Commerce
External data: Purchased Market Data, Spreadsheets
Extract, Transformation, and Load (ETL) Layer: Cleanse Data, Standardize Values, Apply Business Rules, Merge Records
Data stores: ODS, Enterprise Data Warehouse, Data Marts, Metadata Repository
Tools: Reporting Tools, OLAP Tools, Ad Hoc Query Tools, Data Mining Tools
ETL Process
The extraction of data from many heterogeneous systems
The transformation of this extracted data into structures and types that follow the business rules of the data warehouse
The loading of this transformed (cleansed) data into the data warehouse structures in preparation for data analysis
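The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source layouts, the `COUNTRY_CODES` business rule, and the `dim_customer` table are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical rows as two heterogeneous source systems might deliver them.
crm_rows = [{"cust_id": "17", "name": " alice ", "country": "uk"}]
erp_rows = [{"CustomerId": 42, "FullName": "Bob", "Country": "United Kingdom"}]

COUNTRY_CODES = {"uk": "GB", "united kingdom": "GB"}  # assumed business rule


def extract():
    """Extract: pull raw rows from each source, tagging where they came from."""
    for row in crm_rows:
        yield "crm", row
    for row in erp_rows:
        yield "erp", row


def transform(source, row):
    """Transform: map each source layout onto one warehouse layout."""
    if source == "crm":
        cust_id, name, country = int(row["cust_id"]), row["name"], row["country"]
    else:
        cust_id, name, country = row["CustomerId"], row["FullName"], row["Country"]
    return (cust_id, name.strip().title(), COUNTRY_CODES[country.strip().lower()])


def load(conn, rows):
    """Load: write the cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country_code TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run extract, transform, and load in order and return the loaded rows."""
    load(conn, [transform(source, row) for source, row in extract()])
    return conn.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads both customers into one table with a single layout, whatever shape they had in the source systems.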
The ETL design process is perhaps the most time-consuming stage of a data warehouse project.
Often more than 50% of the time dedicated to the project is spent designing and developing the ETL processes.
Your ETL processes determine the quality of the data that ends up in your data warehouse.
Data Extraction
Data Cleaning
Find and Replace – for instance, to standardize a building name that appears under different abbreviations, e.g. 'London Health Centre', 'London HC', 'London Hth Cen'
Convert Case – for example, on the 'title' column of a 'customers' table, converting all instances of 'MRS', 'mrs' and 'MRs' to 'Mrs'
Merging data from different data sources
NULL value handling – conversion to a default value
Data Type conversion – to synchronize data from
different systems, i.e. the CustomerId in one
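The cleaning operations listed above can be sketched as small Python functions. The mapping table, the default value `"Unknown"`, and all function names are assumptions made for this illustration, not part of any particular ETL tool.

```python
# Assumed mapping of known variants to one canonical building name.
BUILDING_NAMES = {
    "london hc": "London Health Centre",
    "london hth cen": "London Health Centre",
    "london health centre": "London Health Centre",
}


def clean_building(name):
    """Find and replace: map known variants to the canonical spelling."""
    return BUILDING_NAMES.get(name.strip().lower(), name.strip())


def clean_title(title):
    """Convert case: 'MRS', 'mrs' and 'MRs' all become 'Mrs'."""
    return title.strip().capitalize()


def clean_null(value, default="Unknown"):
    """NULL handling: replace missing values with a default."""
    return default if value is None or value == "" else value


def to_int_id(value):
    """Data type conversion: store identifiers as integers whatever the source type."""
    return int(str(value).strip())
```

Keeping each rule in its own function makes it easier to test the rules one at a time and to apply them column by column during transformation.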
[Figure: Integration. Labels: Client, Client, Metadata, Warehouse, Paper Reports. Integration modes: Periodic, On-demand]
Aggregation
Once the tables are ready for loading into the data warehouse, we can perform summary calculations (aggregations) and store the summary data so that queries run faster.
When creating the dimensional data model, it is essential that good aggregation paths form part of the design of the dimension tables.
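As a rough illustration of precomputed aggregation, the sketch below rolls daily sales rows up one step of a date hierarchy (day to month) and keeps the result as a summary. The sample rows and the name `monthly_summary` are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows ready for loading: (date, product, sales_amount).
sales = [
    ("2024-01-05", "widget", 100.0),
    ("2024-01-20", "widget", 50.0),
    ("2024-02-02", "widget", 70.0),
    ("2024-02-03", "gadget", 30.0),
]


def monthly_summary(rows):
    """Roll daily rows up the date hierarchy (day -> month) per product."""
    totals = defaultdict(float)
    for date, product, amount in rows:
        month = date[:7]  # 'YYYY-MM' prefix of an ISO date
        totals[(month, product)] += amount
    return dict(totals)


# Stored once at load time, so later queries read the summary, not the detail.
summary = monthly_summary(sales)
```

The day-to-month path only works because the date carries its month; this is the kind of aggregation path a dimension table needs to provide.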
Loading Data
Incremental vs. refresh
Off-line vs. on-line
Frequency of loading
Nightly, once a week or month, or continuously
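The difference between a refresh and an incremental load can be sketched as follows. This assumes each source row carries an `id` and an `updated_at` value that increases with every change; both field names are hypothetical, and the incremental version shown here does not detect rows deleted in the source.

```python
def full_refresh(target, source_rows):
    """Refresh: discard the target and reload everything from the source."""
    target.clear()
    target.update({row["id"]: row for row in source_rows})


def incremental_load(target, source_rows, last_loaded_at):
    """Incremental: apply only rows changed since the previous load."""
    changed = [r for r in source_rows if r["updated_at"] > last_loaded_at]
    for row in changed:
        target[row["id"]] = row  # insert new rows, overwrite changed ones
    # Return the new high-water mark for the next run.
    return max((r["updated_at"] for r in changed), default=last_loaded_at)
```

A refresh is simpler but reprocesses every row; an incremental load touches less data and therefore suits more frequent loading schedules.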
ETL Tools
Informatica
Ab Initio
Oracle Express / Warehouse Builder
MS-DTS / SSIS from Microsoft
DataStage from Ascential Software
SAS System from SAS Institute
Data Modeling
Definition:
[Figure: logical model vs. physical model]
Logical Model: Sales Detail (Sales Record ID), Customer (Customer ID), Product (Product SKU)
Physical Model: SALES_DETAIL (SALES_RECORD_ID, CUSTOMER_ID, PRODUCT_SKU), CUSTOMER (CUSTOMER_ID), PRODUCT (PRODUCT_SKU)
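One way to read the physical model in the figure is as table definitions. The sketch below creates the three tables in an in-memory SQLite database; the column types and the primary/foreign key constraints are assumptions, since the figure only names the tables and columns.

```python
import sqlite3

# Column types and key constraints are assumed; the figure gives names only.
DDL = """
CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER PRIMARY KEY);
CREATE TABLE PRODUCT (PRODUCT_SKU TEXT PRIMARY KEY);
CREATE TABLE SALES_DETAIL (
    SALES_RECORD_ID INTEGER PRIMARY KEY,
    CUSTOMER_ID INTEGER NOT NULL REFERENCES CUSTOMER (CUSTOMER_ID),
    PRODUCT_SKU TEXT NOT NULL REFERENCES PRODUCT (PRODUCT_SKU)
);
"""


def build_physical_model():
    """Create the three tables and turn on foreign key enforcement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn
```

With foreign keys enabled, a SALES_DETAIL row that points at a missing customer or product is rejected, which is how the physical model enforces the relationships drawn in the logical one.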
Normalized Structures
Denormalized Structures
Dimension Model Overview
[Figure: a central Fact table connected to Dimension1, Dimension2, Dimension3, ..., Dimension n]
Dimension Model
Holds the same information as a ‘transactional’ database in 3NF
Goals
◦ User understandability
◦ Query performance
Measure / Fact:
A business performance measurement, typically numeric and additive, that is stored in a fact table.
Types of Fact
Additive
Semi-Additive
Non-Additive
Dimension Model
Conforming the dimensions
Dimensions shared across fact tables or data marts must be exactly the same as the main dimension table, or a subset of it.
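The conformance rule above can be checked mechanically. The sketch below is one possible check, assuming dimension rows are dictionaries and `customer_id` is the shared key; the function name and the key name are hypothetical.

```python
def is_conformed(master_rows, mart_rows, key="customer_id"):
    """True when every mart row agrees with the master row that has the same key
    on every attribute the mart carries (identical table or a subset of it)."""
    master = {row[key]: row for row in master_rows}
    for row in mart_rows:
        reference = master.get(row[key])
        if reference is None:
            return False  # the mart has a member the master does not know
        for col, val in row.items():
            if col not in reference or reference[col] != val:
                return False  # unknown attribute or conflicting value
    return True
```

A mart dimension that carries fewer rows or fewer columns than the master still passes; one that spells a name differently or adds its own members does not.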
[Figure: conforming the dimensions in matrix view]
Types of Facts
Additive
Facts / measures that are numeric and can be summed across all dimensions.
For example, Unit Quantity and Sales Amount.
Semi-Additive
Facts / measures that are numeric but can be summed across only some dimensions, typically not across time. For example, Bank Balances and Inventory Levels.
Non-Additive
Facts that cannot be meaningfully summed across any dimension; they must be calculated at each level of aggregation.
For example, Age and Temperature.
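A small worked example may make the three kinds of fact concrete. The account snapshots and temperature readings below are made-up sample data.

```python
from statistics import mean

# Hypothetical daily snapshots: (day, account, deposits_that_day, closing_balance).
rows = [
    ("Mon", "A", 100.0, 100.0),
    ("Mon", "B", 50.0, 50.0),
    ("Tue", "A", 20.0, 120.0),
    ("Tue", "B", 0.0, 50.0),
]

# Additive: deposits can be summed across accounts and across days.
total_deposits = sum(r[2] for r in rows)

# Semi-additive: balances sum across accounts for one day, but not across days.
tuesday_balance = sum(r[3] for r in rows if r[0] == "Tue")
misleading_total = sum(r[3] for r in rows)  # counts Monday's money again on Tuesday

# Non-additive: summing temperatures is meaningless; recompute, e.g. an average.
temperatures = [18.0, 22.0, 20.0]
average_temperature = mean(temperatures)
```

Here the deposits total and Tuesday's combined balance are both 170.0, while summing balances over both days gives 320.0, more money than was ever deposited.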
Schema and Types
What is a Schema?
The logical or physical design of a set of database tables, indicating the relationships among the tables.
Schema Types
STAR SCHEMA
SNOWFLAKE SCHEMA
HYBRID SCHEMA
Star Schema
Simple and easy overview -> ease of use
Relatively flexible
Fact table is normalized
Dimension tables are often relatively small
“Recognized” by many RDBMSs -> good performance
Hierarchies are “hidden” in the columns
Dimension tables are denormalized
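The properties above can be seen in a tiny star schema built in SQLite. The table names, columns, and sample rows are all hypothetical. Note how the date hierarchy (day, month, year) is stored as plain columns of one denormalized dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date (date_key),
    product_key INTEGER REFERENCES dim_product (product_key),
    sales_amount REAL
);
INSERT INTO dim_date VALUES (1, '2024-01-05', '2024-01', 2024), (2, '2024-02-02', '2024-02', 2024);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 30.0), (2, 1, 70.0);
""")


def sales_by(level):
    """Aggregate the fact table at one level of the date hierarchy."""
    if level not in ("day", "month", "year"):
        raise ValueError(level)
    query = (
        f"SELECT d.{level}, SUM(f.sales_amount) FROM fact_sales f "
        f"JOIN dim_date d ON d.date_key = f.date_key "
        f"GROUP BY d.{level} ORDER BY d.{level}"
    )
    return conn.execute(query).fetchall()
```

Every query follows the same shape: join the fact table to a dimension on its key, then group by a dimension column. Moving from `sales_by("month")` to `sales_by("year")` changes only the column name, because the hierarchy lives in the columns.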
Snowflake Schema
Hybrid Schema
Snowflake Schema