Data Mining and Business Intelligence Module:1 Data Warehouse (DWH)



Faculty Name : Dr. Pallavi Chavan


Topics (4 hrs)

• DWH characteristics (B3, Ch 2, pg 20)
• Dimensional modeling (B3, Ch 10, pg 210)
• Star and snowflake schemas (B3, Ch 11, pg 235)
• OLAP operations (B3, Ch 15, pg 343)
• OLTP vs OLAP
• Data mining as a step in KDD (B1, Ch 1, pg 07)
• Kinds of patterns to be mined (B1, Ch 1, pg 08)
• Technologies used (B1, Ch 1, pg 23)
• Data mining applications (B1, Ch 1, pg 27)

Lecture 3

Data Warehouse
Need for Data Warehousing

• Integrated, company-wide view of high-quality information (from disparate databases)
• Separation of operational and informational systems and data

Separating Operational and Informational Systems

• Operational system: a system that is used to run a business in real time, based on current data; also called a system of record
• Informational system: a system designed to support decision making based on historical point-in-time and prediction data, for complex queries or data-mining applications
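The contrast can be sketched with two queries against the same data. This is only an illustration using Python's built-in sqlite3 module; the orders table, its columns, and the sample rows are hypothetical and not taken from the lecture. The operational query reads one current record, while the informational query summarizes history.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, order_date TEXT, status TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "2022-03-10", "shipped", 120.0),
            (2, "2022-11-02", "shipped", 80.0),
            (3, "2023-01-15", "open", 200.0),
        ],
    )
    return conn

def operational_lookup(conn, order_id):
    """Operational style: fetch the current state of a single record."""
    row = conn.execute(
        "SELECT status FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else None

def informational_summary(conn):
    """Informational style: aggregate historical data for decision support."""
    return conn.execute(
        "SELECT substr(order_date, 1, 4) AS year, SUM(amount) "
        "FROM orders GROUP BY year ORDER BY year"
    ).fetchall()

conn = build_demo_db()
print(operational_lookup(conn, 3))
print(informational_summary(conn))
```

In practice the two workloads run on separate systems, so that long analytical queries do not slow down day-to-day transactions.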

Definition

A data warehouse is a

– Subject-oriented
– Integrated
– Time-variant
– Non-volatile

collection of data in support of management's decision-making process.
Definitions

• Subject-oriented: organized around subjects, e.g. customers, patients, students, products
• Integrated: consistent naming conventions, formats, and encoding structures, drawn from multiple data sources
• Time-variant: data is kept over time, so trends and changes can be studied
• Non-volatile: read-only, periodically refreshed
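The time-variant and non-volatile properties can be illustrated with a small sketch. The class names (PriceSnapshot, SnapshotStore) and the sample data are hypothetical and chosen only for this example: rows are immutable, each periodic load only appends, and the retained history makes it possible to study change over time.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen: a stored row cannot be modified (non-volatile)
class PriceSnapshot:
    product: str
    as_of: date   # every row carries a point in time (time-variant)
    price: float

class SnapshotStore:
    """Append-only store: periodic loads add rows, nothing is updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, snapshots):
        """Periodic refresh: append a new batch of snapshots."""
        self._rows.extend(snapshots)

    def history(self, product):
        """All snapshots for one product, oldest first."""
        rows = [s for s in self._rows if s.product == product]
        return sorted(rows, key=lambda s: s.as_of)

    def change(self, product):
        """Price change between the first and the latest snapshot."""
        h = self.history(product)
        return h[-1].price - h[0].price if h else None

store = SnapshotStore()
store.load([PriceSnapshot("pen", date(2023, 1, 1), 10.0)])
store.load([PriceSnapshot("pen", date(2023, 6, 1), 12.5)])
print(store.change("pen"))
```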

Data Mart

• A simple form of data warehouse focused on a single subject
• A data warehouse that is limited in scope

Uses of a data warehouse

• Presentation of standard reports and graphs
• Dimensional analysis
• Data mining
Advantages

• Lowers cost of information access
• Improves customer responsiveness
• Identifies hidden business opportunities
• Supports strategic decision making
Roadmap to Data Warehousing

• Data extracted, transformed and cleaned
• Stored in a database - RDBMS
• Query and Reporting systems
• Executive Information System and Decision Support System
Data Extraction and Load

• Find sources of data: tables, files, documents, commercial databases, emails, Internet
• Tool to clean data - Apertus
• Tool to convert codes, aggregate and calculate derived values - SAS
• Data Reengineering tools
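The cleaning, code conversion, aggregation, and derived-value steps can be sketched in plain Python. This does not reproduce what Apertus or SAS do; the field names, the REGION_CODES table, and the sample rows are hypothetical and serve only to show the order of the steps.

```python
from collections import defaultdict

# Hypothetical code table: source systems store region as a one-letter code.
REGION_CODES = {"N": "North", "S": "South"}

def clean(rows):
    """Cleaning: drop rows with a missing amount, normalize text and types."""
    cleaned = []
    for r in rows:
        if r.get("amount") in (None, ""):
            continue
        cleaned.append({
            "region": r["region"].strip().upper(),
            "amount": float(r["amount"]),
            "units": int(r["units"]),
        })
    return cleaned

def convert_codes(rows):
    """Code conversion: replace source codes with warehouse-wide names."""
    return [{**r, "region": REGION_CODES.get(r["region"], "Unknown")} for r in rows]

def aggregate(rows):
    """Aggregation plus a derived value (average price per unit) per region."""
    totals = defaultdict(lambda: {"amount": 0.0, "units": 0})
    for r in rows:
        totals[r["region"]]["amount"] += r["amount"]
        totals[r["region"]]["units"] += r["units"]
    return {
        region: {**t, "avg_price": t["amount"] / t["units"] if t["units"] else 0.0}
        for region, t in totals.items()
    }

source_rows = [
    {"region": " n ", "amount": "100", "units": "4"},
    {"region": "N", "amount": "50", "units": "1"},
    {"region": "s", "amount": "", "units": "2"},   # missing amount: dropped
    {"region": "S", "amount": "30", "units": "3"},
]
print(aggregate(convert_codes(clean(source_rows))))
```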
Metadata

• Database that describes various aspects of data in the warehouse
• Administrative Metadata: source database and contents, transformations required, history of migrated data
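An administrative metadata entry could be modeled roughly as follows. The class, its field names, and the sample values are hypothetical and simply mirror the three items listed above (source, transformations, migration history); real warehouse tools keep this information in their own repositories.

```python
from dataclasses import dataclass, field

@dataclass
class AdministrativeMetadata:
    """Describes where one warehouse table came from and how it was loaded."""
    table: str
    source_database: str
    source_contents: str
    transformations: list = field(default_factory=list)
    migration_history: list = field(default_factory=list)

    def record_load(self, load_date, row_count):
        """Append one entry to the history of migrated data."""
        self.migration_history.append({"load_date": load_date, "rows": row_count})

meta = AdministrativeMetadata(
    table="sales_summary",
    source_database="orders_db",
    source_contents="one row per order line",
    transformations=["convert region codes", "aggregate by region"],
)
meta.record_load("2023-01-31", 1200)
print(meta.migration_history)
```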
Storage

• Relational databases
• Multidimensional databases (MDD)

Measurements are numbers that quantify the business process.
Dimensions are attributes that describe measurements.
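A small relational sketch of measurements and dimensions, using Python's built-in sqlite3 module. The table names (fact_sales, dim_product), columns, and rows are hypothetical: units and revenue are the measurements, while the product name and category are dimension attributes that describe them.

```python
import sqlite3

def build_star():
    """Create one dimension table and one fact table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name TEXT,
            category TEXT
        );
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            units INTEGER,   -- measurement
            revenue REAL     -- measurement
        );
        INSERT INTO dim_product VALUES
            (1, 'Pen', 'Stationery'),
            (2, 'Notebook', 'Stationery'),
            (3, 'Lamp', 'Furniture');
        INSERT INTO fact_sales VALUES
            (1, 10, 20.0), (2, 5, 25.0), (3, 1, 40.0), (1, 4, 8.0);
        """
    )
    return conn

def revenue_by_category(conn):
    """Summarize the measurements along one dimension attribute."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.units), SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()

print(revenue_by_category(build_star()))
```

Grouping by a different dimension attribute (for example the product name) gives a different view of the same measurements, which is the basis of dimensional analysis.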
Tools

• Data Extraction - SAS (Statistical Analysis System)
• Data Cleaning - Apertus, Trillium
• Data Storage - ORACLE, SYBASE
• Optimizers - Advanced Parallel Optimizer, Bitmap Indices, Star Index
A data warehouse has five main components:

1) Database
2) ETL tools
3) Metadata
4) Query tools
5) Data marts

Lect 42 Architecture of Data Warehouse

Thank You
