Part B
In 1973 Willis Ware of the RAND Corporation chaired a committee to advise the Secretary of the U.S. Department
of Health, Education, and Welfare on privacy issues. The committee's report proposed a set of principles of fair information practice.
Ware and his committee expected these principles to apply to all collections of personal data on individuals.
Unfortunately, that is not the way the legislation developed. The Ware committee report led to the 1974 Privacy Act
(5 USC 552a), which embodies most of these principles, although that law applies only to data maintained by the
U.S. government. The Privacy Act is a broad law, covering all data collected by the government. It is the strongest
U.S. privacy law because of its breadth: It applies to all personal data held anywhere in the government.
Comparing privacy statements before and after HIPAA shows mixed results:
Statements on data transfer (to other organizations) were more explicit than before HIPAA.
Consumers still had little control over the disclosure or dissemination of their data.
Statements were longer and more complex, making them harder for consumers to understand.
Even within the same industry branch (such as drug companies), statements varied substantially, making it
hard for consumers to compare policies.
Statements were unique to specific web pages, meaning they covered more precisely the content and
function of a particular page.
Controls on U.S. Government Web Sites
Because privacy is ambiguous, privacy policies are an important way to both define the concept in a particular
setting and specify what should or will be done about it.
The Federal Trade Commission (FTC) determined that in order to obey the Privacy Act, government web sites
would have to address five privacy factors.
Notice. Data collectors must disclose their information practices before collecting personal information
from consumers.
Choice. Consumers must be given a choice as to whether and how personal information collected from
them may be used.
Access. Consumers should be able to view and contest the accuracy and completeness of data collected
about them.
Security. Data collectors must take reasonable steps to ensure that information collected from consumers
is accurate and secure from unauthorized use.
Enforcement. A reliable mechanism must be in place to impose sanctions for noncompliance with these
fair information practices.
In 2002, the U.S. Congress enacted the e-Government Act of 2002, requiring that federal government agencies post
privacy policies on their web sites. Those policies must disclose, among other things, what information is collected,
why it is collected, its intended use, with whom it will be shared, and how it will be secured.
The e-Government Act places strong controls on government data collection through web sites. As we described,
privacy outside the government is protected by law in some areas, such as credit, banking, education, and healthcare.
But there is no counterpart to the e-Government Act for private companies.
No Deceptive Practices
The Federal Trade Commission has the authority to prosecute companies that engage in deceptive trade or unfair
business practices. If a company advertises in a false or misleading way, the FTC can sue. The FTC has used that
approach on web privacy: If a company advertises a false privacy protection (that is, if the company says it will
protect privacy in some way but does not do so), the FTC considers that false advertising and can take legal action.
Because of the FTC, privacy notices at the bottom of web sites do have meaning.
In 1981, the Council of Europe (an international body of 46 European countries, founded in 1949) adopted
Convention 108 for the protection of individuals with regard to the automatic processing of personal data, and in
1995, the European Union (E.U.) adopted Directive 95/46/EC on the processing of personal data. Directive
95/46/EC, often called the European Privacy Directive, requires that individuals' rights of privacy be maintained
and that data about them be protected. Its principles include the following:
Special protection for sensitive data. There should be greater restrictions on data collection and processing
that involves "sensitive data."
Data transfer. This principle explicitly restricts authorized users of personal information from transferring
that information to third parties without the permission of the data subject.
Independent oversight. Entities that process personal data should not only be accountable but should also
be subject to independent oversight. In the case of the government, this requires oversight by an office or
department that is separate and independent from the unit engaged in the data processing.
Anonymity and Multiple Identities
One way to preserve privacy is to guard our identity. Not every context requires us to reveal our identity, so some
people wear a form of electronic mask.
Anonymity
A person may want to do some things anonymously. For example, a rock star buying a beach house might want to
avoid unwanted attention from neighbors, or someone posting to a dating list might want to view replies before
making a date.
Multiple Identities
Most people already have multiple identities. To your bank you might be the holder of account 123456, to your
motor vehicles bureau you might be the holder of driver's license number 234567, and to your credit card company
you might be the holder of card 345678. For their purposes, these numbers are your identity; the fact that each may
(or may not) be held in your name is irrelevant.
Pseudonymity
Sometimes, full anonymity is not wanted. A person may want to order flower bulbs but not be placed on a dozen
mailing lists for gardening supplies. But the person does want to be able to place similar orders again, asking for
the same color tulips as before. This situation calls for pseudonyms, unique identifiers that can be used to link
records in a server's database but that cannot be used to trace back to a real identity.
Multiple identities can also be convenient, for example, having a professional e-mail account and a social one.
Similarly, disposable identities (that you use for a while and then stop using) can be convenient.
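The pseudonym mechanism described above can be sketched with a keyed hash: the server derives a stable identifier from the customer's real identity and a secret key, so repeat orders link together, but without the key the pseudonym cannot be traced back to the real identity. This is only an illustrative sketch; the key handling and identity format are assumptions.

```python
import hashlib
import hmac

SERVER_SECRET = b"example-secret-key"  # hypothetical; would be kept out of the database

def pseudonym(real_identity: str) -> str:
    """Derive a stable identifier from a real identity using a keyed hash (HMAC)."""
    mac = hmac.new(SERVER_SECRET, real_identity.encode(), hashlib.sha256)
    return mac.hexdigest()[:16]

# The same customer always maps to the same pseudonym, so the flower-bulb
# orders can be linked in the server's database without storing the real name.
order1 = pseudonym("alice@example.com")
order2 = pseudonym("alice@example.com")
assert order1 == order2
assert pseudonym("bob@example.com") != order1
```

Because HMAC is one-way, a database holding only pseudonyms cannot be walked back to customer identities, yet the server (which holds the key) can still recognize a returning customer.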
The government gathers and stores data on citizens, residents, and visitors. Government facilitates and regulates
commerce and other kinds of personal activities such as healthcare, employment, education, and banking. In those
roles the government is both an enabler and regulator of privacy and a user of private data.
Authentication
Government plays a complex role in personal authentication. Many government agencies (such as the motor vehicles
bureau) use identifiers to perform their work. Authentication documents (such as passports and insurance cards)
often come from the government. The government may also regulate the businesses that use identification and
authentication keys.
The committee recognized risks when the government started to acquire data from other parties.
The committee recommended several steps the government can take to help safeguard private data.
Data minimization. Obtain the least data necessary for the task. For example, if the goal is to study the
spread of a disease, only the condition, date, and vague location (city or county) may suffice; the name or
contact information of the patient may be unnecessary.
Data anonymization. Where possible, replace identifying information with untraceable codes (such as a
record number); but make sure those codes cannot be linked to another database that reveals sensitive data.
Audit trail. Record who has accessed data and when, both to help identify responsible parties in the event
of a breach and to document the extent of damage.
Security and controlled access. Adequately protect and control access to sensitive data.
Training. Ensure people accessing data understand what to protect and how to do so.
Quality. Take into account the purpose for which data were collected, how they were stored, their age, and
similar factors to determine the usefulness of the data.
Restricted usage. Different from controlling access, review all proposed uses of the data to determine if
those uses are consistent with the purpose for which the data were collected and the manner in which they
were handled (validated, stored, controlled).
Data left in place. If possible, leave data in place with the original owner. This step helps guard against
possible misuses of the data from expanded mission just because the data are available.
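The audit-trail step above can be sketched as a thin wrapper around data access, so every read leaves a record of who saw what and when. The record fields and storage are illustrative assumptions, not a prescribed format.

```python
import datetime

audit_log = []  # in practice this would be append-only, protected storage

def audited_read(user: str, record_id: str, store: dict):
    """Return a record, logging who accessed it and when."""
    audit_log.append({
        "user": user,
        "record": record_id,
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return store.get(record_id)

# Minimized data: condition, date, and vague location only -- no patient name.
patients = {"r1": {"condition": "flu", "date": "2024-01-05", "county": "Kings"}}
record = audited_read("analyst7", "r1", patients)
# audit_log now identifies the responsible party for this access.
```

In the event of a breach, the log both identifies responsible parties and bounds the extent of the damage, exactly the two purposes the audit-trail step names.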
Explain how a NIDS captures and analyzes packets coming across a network connection, looking for content that
matches known attacks.
NIDS: A network intrusion detection system (NIDS) is an intrusion detection system that tries to detect
malicious activity such as denial of service attacks, port scans or even attempts to crack into computers by
monitoring network traffic.
Snort: an open-source network intrusion prevention and detection system. It uses a rule-based language
combining signature, protocol, and anomaly inspection methods. Snort is the most widely deployed intrusion
detection and prevention technology and has become the de facto standard worldwide in the industry.
Snort can be configured to run in several modes:
Packet sniffer: captures and displays packets from the network, with different levels of detail, on the console
Packet logger: logs packet data to a text file
Honeypot monitor: deceives hostile parties
NIDS: network intrusion detection system
Requirements of Snort:
Lightweight NIDS
Small and flexible
Highly capable system
Snort architecture: Refer PDF
Explain the Enterprise Architecture Framework that solves a complex problem by decomposing it into multiple
subcategories of Master Data Management.
Refer PDF
List and elaborate the elements leveraged in implementing a Customer Data Integration solution.
Refer PDF
Explain the principles of organizing and managing information to make sense of varied and contradictory
assertions about MDM.
Refer PDF
Explain the component that monitors activity to identify malicious or suspicious events, state the goals and
functions of that component, and compare its different types.
Intrusion detection systems complement these preventive controls as the next line of defense. An intrusion detection
system (IDS) is a device, typically another separate computer, that monitors activity to identify malicious or
suspicious events.
The two general types of intrusion detection systems are signature based and heuristic.
Signature-based intrusion detection systems perform simple pattern-matching and report situations that match a
pattern corresponding to a known attack type. Heuristic intrusion detection systems, also known as anomaly
based, build a model of acceptable behavior and flag exceptions to that model; for the future, the administrator can
mark a flagged behavior as acceptable so that the heuristic IDS will now treat that previously unclassified behavior
as acceptable.
Intrusion detection devices can be network based or host based. A network-based IDS is a stand-alone device
attached to the network to monitor traffic throughout that network; a host-based IDS runs on a single workstation
or client or host, to protect that one host.
A simple signature for a known attack type might describe a series of TCP SYN packets sent to many different ports
in succession and at times close to one another, as would be the case for a port scan. An intrusion detection system
would probably find nothing unusual in the first SYN, say, to port 80, and then another (from the same source
address) to port 25. But as more and more ports receive SYN packets, especially ports that are not open, this pattern
reflects a possible port scan. Similarly, some implementations of the protocol stack fail if they receive an ICMP
packet with a data length of 65535 bytes, so such a packet would be a pattern for which to watch.
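The port-scan signature described above can be sketched as a counter keyed by source address: many SYNs to distinct ports from one source within a short time window trips the rule. The window and threshold values below are invented for illustration; a real IDS would tune them.

```python
from collections import defaultdict

WINDOW = 10.0        # seconds to remember SYNs (illustrative)
PORT_THRESHOLD = 15  # distinct ports before we raise an alarm (illustrative)

syn_history = defaultdict(list)  # source IP -> list of (timestamp, port)

def observe_syn(src: str, port: int, t: float) -> bool:
    """Record a SYN packet; return True if src now matches the port-scan signature."""
    history = syn_history[src]
    history.append((t, port))
    # keep only packets inside the time window
    recent = [(ts, p) for (ts, p) in history if t - ts <= WINDOW]
    syn_history[src] = recent
    return len({p for _, p in recent}) >= PORT_THRESHOLD

# A few SYNs to ordinary ports look normal...
assert not observe_syn("10.0.0.5", 80, 0.0)
assert not observe_syn("10.0.0.5", 25, 0.5)
# ...but SYNs to many different ports in quick succession match the signature.
alarm = any(observe_syn("10.0.0.5", p, 1.0) for p in range(1000, 1020))
assert alarm
```

This mirrors the text: the first SYN to port 80 and the second to port 25 raise no alarm, but the accumulating pattern across many ports does.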
Because signatures are limited to specific, known attack patterns, another form of intrusion detection becomes
useful. Instead of looking for matches, heuristic intrusion detection looks for behavior that is out of the ordinary.
The original work in this area (for example, [TEN90]) focused on the individual, trying to find characteristics of
that person that might be helpful in understanding normal and abnormal behavior. For example, one user might
always start the day by reading e-mail, write many documents with a word processor, and occasionally back up files.
The two styles of intrusion detection, pattern matching and heuristic, represent different approaches, each of
which has advantages and disadvantages. Actual IDS products often blend the two approaches.
1. Responding to alarms:
Whatever the type, an intrusion detection system raises an alarm when it finds a match. The alarm can range
from something modest, such as writing a note in an audit log, to something significant, such as paging the system
security administrator. Particular implementations allow the user to determine what action the system should take
on what events.
In general, responses fall into three major categories (any or all of which can be used in a single response):
Monitor: collect data, perhaps increasing the amount of data collected
Protect: act to reduce exposure
Call a human
2. False Results:
Intrusion detection systems are not perfect, and mistakes are their biggest problem. Although an IDS might
detect an intruder correctly most of the time, it may stumble in two different ways: by raising an alarm for something
that is not really an attack (called a false positive, or type I error in the statistical community) or not raising an alarm
for a real attack (a false negative, or type II error).
Too many false positives means the administrator will be less confident of the IDS's warnings, perhaps
leading to a real alarm's being ignored. But false negatives mean that real attacks are passing the IDS without action.
We say that the degree of false positives and false negatives represents the sensitivity of the system. Most IDS
implementations allow the administrator to tune the system's sensitivity, to strike an acceptable balance between
false positives and negatives.
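The sensitivity tradeoff can be made concrete with a small confusion-matrix calculation. The alert scores and thresholds below are invented for illustration: lowering the threshold eliminates false negatives at the price of more false positives, and raising it does the reverse.

```python
# Each event: (suspicion score assigned by the IDS, whether it was truly an attack)
events = [(0.95, True), (0.80, True), (0.60, False),
          (0.40, True), (0.30, False), (0.10, False)]

def rates(threshold: float):
    """Return (false-positive rate, false-negative rate) at a given alarm threshold."""
    fp = sum(1 for s, attack in events if s >= threshold and not attack)  # type I errors
    fn = sum(1 for s, attack in events if s < threshold and attack)       # type II errors
    benign = sum(1 for _, attack in events if not attack)
    attacks = sum(1 for _, attack in events if attack)
    return fp / benign, fn / attacks

# A low threshold catches every attack but raises more false alarms;
# a high threshold quiets the alarms but misses real attacks.
sensitive = rates(0.2)
conservative = rates(0.7)
```

Tuning the threshold is exactly the knob most IDS implementations expose to the administrator to strike an acceptable balance.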
Strengths: Refer PDF
Enumerate the eight dimensions of privacy and elaborate the steps to protect against privacy loss.
Rezgui et al. list eight dimensions of privacy (specifically as it relates to the web, although the definitions carry over
naturally to other types of computing).
Information collection: Data are collected only with knowledge and explicit consent.
Information usage: Data are used only for certain specified purposes.
Information retention: Data are retained for only a set period of time.
Information security: Appropriate mechanisms are used to ensure the protection of the data.
Access control: All modes of access to all forms of collected data are controlled.
Policy changes: Less restrictive policies are never applied after-the-fact to already obtained data.
Here are the privacy issues that have come about through use of computers.
Data Collection
Advances in computer storage make it possible to hold and manipulate huge numbers of records. Disks on ordinary
consumer PCs are measured in gigabytes (10^9 bytes), and commercial storage capacities often measure in terabytes
(10^12 bytes). We never throw away data; we just move it to slower secondary media or buy more storage.
No Informed Consent
Where do all these bytes come from? Although some are from public and commercial sources (newspapers, web
pages, digital audio, and video recordings) and others are from intentional data transfers (tax returns, a statement to
the police after an accident, readers' survey forms, school papers), still others are collected without announcement.
Telephone companies record the date, time, duration, source, and destination of each telephone call.
Loss of Control
We realize that others may keep data we give them. When you order merchandise online, you know you have just
released your name, probably some address and payment data, and the items you purchased. Or when you use a
customer appreciation card at a store, you know the store can associate your identity with the things you buy. Having
acquired your data, a merchant can redistribute it to anyone. The fact that you booked one brand of hotel room
through a travel agent could be sold to other hotels. You have little control over dissemination of your data.
In the cases just described, customer details are being marketed. Information about you is being sold, and you have
no control; nor do you get to share in the profit. Even before computers, customer data were valuable. Mailing lists
and customer lists were company assets that were safeguarded against access by the competition. Sometimes
companies rented their mailing lists when there was not a conflict with a competitor.
Steps to Protect against Privacy Loss
The committee recommended several steps the government can take to help safeguard private data.
Data minimization. Obtain the least data necessary for the task. For example, if the goal is to study the
spread of a disease, only the condition, date, and vague location (city or county) may suffice; the name or
contact information of the patient may be unnecessary.
Data anonymization. Where possible, replace identifying information with untraceable codes (such as a
record number); but make sure those codes cannot be linked to another database that reveals sensitive data.
Audit trail. Record who has accessed data and when, both to help identify responsible parties in the event
of a breach and to document the extent of damage.
Security and controlled access. Adequately protect and control access to sensitive data.
Training. Ensure people accessing data understand what to protect and how to do so.
Quality. Take into account the purpose for which data were collected, how they were stored, their age, and
similar factors to determine the usefulness of the data.
Restricted usage. Different from controlling access, review all proposed uses of the data to determine if
those uses are consistent with the purpose for which the data were collected and the manner in which they
were handled (validated, stored, controlled).
Data left in place. If possible, leave data in place with the original owner. This step helps guard against
possible misuses of the data from expanded mission just because the data are available.
We use the term authentication to mean three different things [KEN03]: We authenticate an individual, identity,
or attribute. An individual is a unique person. Authenticating an individual is what we do when we allow a person
to enter a controlled room: We want only that human being to be allowed to enter. An identity is a character string
or similar descriptor, but it does not necessarily correspond to a single person, nor does each person have only one
name. We authenticate an identity when we acknowledge that whoever (or whatever) is trying to log in as admin
has presented an authenticator valid for that account. Similarly, authenticating an identity in a chat room as SuzyQ
does not say anything about the person using that identifier: It might be a 16-year-old girl or a pair of middle-aged
male police detectives, who at other times use the identity FrereJacques.
Finally, we authenticate an attribute if we verify that a person has that attribute. An attribute is a characteristic.
Here is an example of authenticating an attribute. Some places require one to be 21 or older in order to drink
alcohol. A club's doorkeeper verifies a person's age and stamps the person's hand to show that the patron is over
21. Note that to decide, the doorkeeper may have looked at an identity card listing the person's birth date, so the
doorkeeper knew the person's exact age to be 24 years, 6 months, 3 days, or the doorkeeper might be authorized
to look at someone's face and decide if the person looks so far beyond 21 that there is no need to verify. The stamp
signifies only that the person possesses the attribute of being 21 or over.
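The doorkeeper example can be sketched in code: the verifier looks at the birth date but hands back only a yes/no attribute, never the date itself. Function and field names here are hypothetical.

```python
import datetime

def is_over_21(birth_date: datetime.date, today: datetime.date) -> bool:
    """Authenticate the attribute 'age 21 or over' without exposing the exact age.
    (A leap-day birth date would need special handling; omitted in this sketch.)"""
    cutoff = birth_date.replace(year=birth_date.year + 21)
    return today >= cutoff

# The 'hand stamp' records only the attribute, not the birth date:
stamp = is_over_21(datetime.date(1999, 3, 14), datetime.date(2024, 9, 20))
assert stamp is True
```

The caller learns one bit, that the patron possesses the attribute, which is the privacy point of attribute authentication: nothing about the identity or the exact age leaks.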
In computing applications we frequently authenticate individuals, identities, and attributes. Privacy issues arise
when we confuse these different authentications and what they mean. For example, the U.S. social security number
was never intended to be an identifier, but now it often serves as an identifier, an authenticator, a database key, or
all of these. When one data value serves two or more uses, a person acquiring it for one purpose can use it for
another.
Individual Authentication
There are relatively few ways of identifying an individual. When we are born, for most of us our birth is registered
at a government records office, and we (probably our parents) receive a birth certificate. A few years later our
parents enroll us in school, and they have to present the birth certificate, which then may lead to receiving a school
identity card. We submit the birth certificate and a photo to get a passport or a national identity card. We receive
many other authentication numbers and cards throughout life.
The whole process starts with a birth certificate issued to (the parents of) a baby, whose physical description
(height, weight, even hair color) will change significantly in just months. Birth certificates may contain the baby's
fingerprints, but matching a poorly taken fingerprint of a newborn baby to that of an adult is challenging at best.
Fortunately, in most settings it is acceptable to settle for weak authentication for individuals: A friend who has
known you since childhood, a schoolteacher, neighbors, or coworkers can support a claim of identity.
Identity Authentication
We all use many different identities. When you buy something with a credit card, you do so under the identity of
the credit card holder. In some places you can pay road tolls with a radio frequency device in your car, so the
sensor authenticates you as the holder of a particular toll device. You may have a meal plan that you can access
by means of a card, so the cashier authenticates you as the owner of that card.
You check into a hotel and get a magnetic stripe card instead of a key, and the door to your room authenticates
you as a valid resident for the next three nights. If you think about your day, you will probably find 10 or more
different ways some identity of yours has been authenticated.
From a privacy standpoint, there may or may not be ways to connect all these different identities. A credit card
links to the name and address of the card payer, who may be you, your spouse, or anyone else willing to pay your
expenses. Your auto toll device links to the name and perhaps address of whoever is paying the tolls: you, the car
owner, or an employer. When you make a telephone call, there is an authentication to the account holder of the
telephone, and so forth.
Discuss the new regulatory and compliance legislation that has become the primary driver for the emergence
and adoption of integrated Risk Management.
Regulatory Compliance Requirements and Their Impact on MDM IT Infrastructure
The regulations that are focused on protecting customer financial and personally identifiable data, as well as the
risks associated with its misuse, include but are not limited to the following:
• The Sarbanes-Oxley Act of 2002 (SOX) defines requirements for the integrity of the financial data and
availability of appropriate security controls.
• The USA Patriot Act includes provisions for Anti-Money Laundering (AML) and Know Your Customer (KYC).
• The Gramm-Leach-Bliley Act (GLBA) mandates strong protection of personal financial information through its
data protection provisions.
• The Basel II and Basel III Capital Requirements define various requirements for operational and credit risks.
• FFIEC guidelines require strong authentication to prevent fraud in banking transactions.
• The Payment Card Industry (PCI) Standard defines the requirement for protecting sensitive cardholder data
inside payment networks.
• California’s SB1386 is a state regulation requiring public written disclosure in situations when a customer file
has been compromised.
• Do-Not-Call and other opt-out preference requirements protect customers’ privacy.
• International Accounting Standards Reporting IAS2005 defines a single, high-quality international financial
reporting framework.
• The Health Insurance Portability and Accountability Act (HIPAA) places liability on anyone who fails to
properly protect patient health information including bills and health-related financial information.
• New York Reg. 173 mandates the active encryption of sensitive financial information sent over the Internet.
• Homeland Security Information Sharing Act (HSISA, H.R. 4598) prohibits public disclosure of certain
information.
• The ISO 17799 Standard defines an extensive approach to achieve information security including
communications systems requirements for information handling and risk reduction.
• The European Union Data Protection Directive mandates the protection of personal data.
• Japanese Protection for Personal Information Act.
• Federal Trade Commission, 16 CFR Part 314 defines standards for safeguarding customer information.
• SEC Final Rule, Privacy of Consumer Financial Information (Regulation S-P), 17 CFR Part 248 RIN 3235-
AH90.
• OCC 2001-47 defines third-party data-sharing protection.
• 17 CFR Part 210 defines rules for records retention.
• 21 CFR Part 11 (SEC and FDA regulations) defines rules for electronic records and electronic signatures.
The Sarbanes-Oxley Act
The Sarbanes-Oxley Act (SOX) addresses a set of business risk management concerns and contains a number of
sections defining specific reporting and compliance requirements.
They require the company’s CEO/CFO to prepare quarterly and annual certifications attesting that:
• The CEO/CFO has reviewed the report.
• The report does not contain any untrue or misleading statement of a material fact or omit to state a material fact.
• Financial statements and other financial information fairly present the financial condition.
• The CEO/CFO is responsible for establishing and maintaining disclosure controls and has performed an evaluation
of such controls and procedures at the end of the period covered by the report.
• The report discloses to the company’s audit committee and external auditors:
• Any significant deficiencies and material weaknesses in Internal Control over Financial Reporting (ICFR)
• Protect against any reasonably anticipated threats or hazards to the security or integrity of the data
• Protect against unauthorized access to or use of such data that would result in substantial harm or inconvenience
to any customer
Other Regulatory/Compliance Requirements
USA Patriot Act: Anti-Money Laundering (AML) and Know Your Customer (KYC) Provisions
The USA Patriot Act of 2001 is an abbreviation of the 2001 law with the full name "Uniting and Strengthening
America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism Act." It requires information
sharing among the government and financial institutions, implementation of programs that are concerned with
verification of customer identity, implementation of enhanced due-diligence programs; and implementation of anti-
money laundering programs across the financial services industry.
USA Patriot Act Technology Impact
Business process requirements of the USA Patriot Act include:
• Development of the AML policies and procedures.
• Designation of a compliance officer.
• Establishment of a training program.
• Establishment of a corporate testing/audit function.
• Business units that manage private banking accounts held by noncitizens must identify owners and sources of
funds.
Basel II Capital Accord Technical Requirements
The Basel Committee on Banking Supervision introduced a capital measurement system, commonly referred to as
the Basel Capital Accord. This system addressed the design and implementation of a credit risk measurement
framework for a minimum capital requirement standard.
Commonly referred to as "Basel II," this capital framework consists of three pillars:
• Pillar I: Minimum capital requirements
• Pillar II: Supervisory review of an institution’s internal assessment process and capital adequacy
• Pillar III: Effective use of disclosure to strengthen market discipline as a complement to supervisory efforts
FFIEC Compliance and Authentication Requirements
The Federal Financial Institutions Examination Council (FFIEC) issued new guidance on customer authentication
for online banking services. According to the FFIEC guidance, the authentication techniques employed by the
financial institution should be appropriate to the risks associated with those products and services used by the
authenticated users. The new regulation guides banks to apply two major methods:
• Risk assessment: Banks must assess the risk of the various activities taking place on their Internet banking site.
• Risk-based authentication: Banks must apply stronger authentication for high-risk transactions.
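The FFIEC's risk-based idea can be sketched as a policy table that maps a transaction's assessed risk to an authentication strength. The tiers, thresholds, and method names below are assumptions for illustration, not FFIEC-prescribed values.

```python
def required_authentication(risk_score: float) -> str:
    """Map an assessed transaction risk (0..1) to an authentication method (illustrative tiers)."""
    if risk_score < 0.3:
        return "password"               # low risk, e.g., viewing balances
    if risk_score < 0.7:
        return "password+otp"           # medium risk, e.g., bill payment
    return "password+otp+out_of_band"   # high risk, e.g., a large wire transfer

assert required_authentication(0.1) == "password"
assert required_authentication(0.9) == "password+otp+out_of_band"
```

The point of the sketch is the shape of the policy: the institution first assesses risk per activity, then lets that assessment select the authentication strength, rather than applying one method uniformly.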
A customer can interact with the ABC Company across a variety of different channels, including Web self-service,
Interactive Voice Response (IVR) telephone systems, customer service representative access via telephone or in person,
in-branch interactions, and so on. The ABC Company's stated goal for the MDM project is to enable and ensure a consistent
customer experience regardless of the channel or the interaction mode, and this experience should be achieved by
creating a holistic view of the customer, available on demand, in real time, to all channels supported by the ABC
Company.
As stated in our use case scenario, a customer can interact with the ABC Company across a variety of different
channels, including Web self-service, Interactive Voice Response (IVR) telephone system, customer service
representative access via telephone or in person, in-branch interactions, and so on.
Let’s start with the reconciliation engine style design approach. This architecture style positions the MDM Data
Hub as a “slave” (reconciliation engine) for the existing data sources and applications. With this background, let’s
consider the specific process steps and Data Hub components involved when a customer interacts with one of the
ABC Company’s customer touch points.
1. At the point of contact, the customer provides his or her identification information (for example, name and address
or the user ID established at the time of enrollment in the service).
2. The message is forwarded to the EMB Transaction Manager. The message gets assigned a unique transaction ID
by the EMB Transaction Manager’s Key Generator.
3. The Transaction Manager can optionally log transaction information, including the transaction type, transaction
originator, time of origination, and possibly some other information about the transaction.
4. The Transaction Manager forwards the message to the Match Suspect Extractor.
5. The Match Suspect Extractor reads the identification parameters in the message and creates an MDM Data Hub
extract with suspect matches. It can also contain records that were selected using various fuzzy matching algorithms.
6. The Match Suspect Extractor sends the extract message to the Match Engine. The Match Engine is an MDM Data
Hub component that performs the match.
7. The Transaction Manager orchestrates other MDM Data Hub services to gather additional information about the
identified customer. The customer identifier is sent to the Enterprise Record Locator, which contains a cross-
reference facility with linkage information between the Data Hub keys and the source system keys.
8. The Transaction Manager invokes the Enterprise Attribute Locator service to identify the best sources of data for
each attribute in the enterprise data model.
9. The Transaction Manager sends inquiry messages to the source systems through the EMB, where it invokes the
Distributed Query Constructor service, which generates a distributed query against multiple systems.
10. The source systems process the query request messages and send the messages to the Response Assembler. The
Response Assembler assembles the messages received from the Data Hub and source systems into the Response
Message.
11. The assembled message is published to the EMB. The Transaction Manager recognizes the message as part of
the transaction that was initiated by the request in step 2. The Transaction Manager returns the message to the
requestor and marks the transaction complete.
12. The ABC Company customer service employee who initiated this transaction can review the returned message
and take advantage of the accurate and complete information about the customer and his or her transactions.
13. The Data Hub initiates a change transaction by sending a transaction message to the EMB. The transaction may
result in changes to the critical attributes used for matches.
14. The EMB makes messages available to the source systems in either a push or pull mode. In either case,
operational source systems can accept or reject a change.
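The transaction flow above can be sketched in miniature. The class below is a simplified illustration, not an actual EMB implementation; the message fields and source-system callables are hypothetical:

```python
import itertools
import time

class TransactionManager:
    """Steps 2-11 in miniature: assign an ID, log, fan out, assemble."""

    def __init__(self):
        self._ids = itertools.count(1)   # step 2: the Key Generator
        self.log = []                    # step 3: optional transaction log

    def process(self, message, source_systems):
        tx_id = next(self._ids)
        self.log.append((tx_id, message["type"],
                         message["originator"], time.time()))
        # steps 9-10: distributed inquiry against each source system
        parts = [query(message["customer_id"]) for query in source_systems]
        # step 10: the Response Assembler merges the partial responses
        return {"tx_id": tx_id, "records": parts}   # step 11: return to requestor
```

A real Transaction Manager would also track transaction state and correlate asynchronous responses; this sketch only shows the ID assignment, logging, and fan-out/fan-in shape of the flow.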
3.5.2 Batch Processing
Let’s consider whether the MDM system still needs a batch data synchronization facility that may use Extract,
Transform, and Load (ETL) processes and components.
The process begins with file extraction from legacy systems. The file extracts are placed in the Loading Area and
loaded into tables. The purpose of this area is to bring all the data onto a common platform. The Loading Area
preserves the data in its original structure (in our example, account centric) and makes the data available for loading
into the staging area. The following types of transformations occur when the data is loaded into the staging area:
• Core transformations to the customer-centric view. In addition, the staging area must preserve legacy system keys.
They will be used to build the cross-reference Record Locator Service.
• Reference code translations are used to bring the codes into the Data Hub–specific format.
• Data validation occurs, and exception processing begins for records that do not comply with established
standards. These records enter the exception processing framework from the staging area.
• The data defaults are checked against declared definitions.
• If data enrichment processing from an external vendor or vendors is in scope, it can optionally be done in the
staging area. Otherwise, data enrichment occurs in the Data Hub.
From the staging area, the data is loaded into the MDM Data Hub. The Data Hub data load performance has to be
compatible with the processing throughput maintained by ETL.
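The staging-area transformations listed above can be illustrated with a minimal sketch. The field names and the code table are hypothetical; they stand in for the legacy layouts and reference code tables of a real implementation:

```python
# Hypothetical reference code table: legacy code -> Data Hub format.
CODE_MAP = {"M": "MALE", "F": "FEMALE"}

def stage_record(legacy_rec, system_id):
    """Transform one legacy extract row for the staging area."""
    staged = {
        # preserve the legacy key for the cross-reference Record Locator Service
        "source_system": system_id,
        "source_key": legacy_rec["account_no"],
        # core transformation to the customer-centric view
        "customer_name": legacy_rec["name"].strip().upper(),
        # reference code translation into the Data Hub-specific format
        "gender": CODE_MAP.get(legacy_rec["gender"], "UNKNOWN"),
    }
    # data validation: flag non-compliant rows for exception processing
    staged["exception"] = staged["customer_name"] == ""
    return staged
```

Records with `exception` set would be routed into the exception processing framework rather than loaded into the Data Hub.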
Let’s take a closer look at these components and services.
Legacy System Data Entry Validation Component This component serves as the first barrier preventing
erroneous data entry into the system. Good information management practice shows that it is important to bring
data validation upstream as close to the point of data entry or creation as possible. Data validation components
restrict the formats and range of values entered on the user interface screen for a data entry application.
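A data entry validation component of this kind might look like the following sketch. The field names and the rules themselves are assumptions for illustration; a production component would draw its rules from the application's data dictionary:

```python
import re

# Hypothetical per-field rules restricting formats and ranges at data entry.
RULES = {
    "zip":   lambda v: bool(re.fullmatch(r"\d{5}", v)),   # format check
    "state": lambda v: v in {"NY", "NJ", "CT"},           # assumed value list
    "age":   lambda v: v.isdigit() and 0 < int(v) < 130,  # range check
}

def validate_entry(form):
    """Return the names of the form fields that fail validation."""
    return [f for f, ok in RULES.items() if f in form and not ok(form[f])]
```

Rejecting bad values at the screen, before they reach the legacy system of record, is exactly the "validation upstream" principle the paragraph describes.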
Legacy System Message Creation and Canonical Message Format Component Each legacy system that needs
to be integrated into the real-time synchronization framework must have components responsible for creating and
publishing the message to the Enterprise Message Bus.
Legacy System Message-Processing Components Each legacy system should be able to receive and process
messages in a canonical format. The processing includes message interpretation and orchestration in terms of native
legacy system functions and procedures.
Message Validation and Translations Message validation components must be able to validate the message
structure and message content (payload). These components should also be “code translation aware” in order to
translate system-specific reference code semantics into enterprise values, and vice versa.
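"Code translation awareness" can be sketched as a small bidirectional lookup. The system names and codes below are hypothetical:

```python
# Hypothetical translation table: (system, system code) -> enterprise value.
TO_ENTERPRISE = {("CRM", "01"): "ACTIVE", ("CRM", "02"): "CLOSED"}
# Inverted table for the reverse direction.
TO_SYSTEM = {(sys, ev): code for (sys, code), ev in TO_ENTERPRISE.items()}

def translate_in(system, code):
    """System-specific reference code -> enterprise value."""
    return TO_ENTERPRISE[(system, code)]

def translate_out(system, enterprise_value):
    """Enterprise value -> system-specific reference code."""
    return TO_SYSTEM[(system, enterprise_value)]
```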
Transaction Manager and Transaction Logging Service As the name implies, the Transaction Manager is
responsible for the execution and control of each transaction. The Transaction Manager registers each transaction
by assigning a transaction identifier (transaction ID).
All transactions in the transaction life cycle are recorded in the Transaction Log using Transaction Logging Service,
regardless of whether they are successful or not. Transaction Log structures, attributes, and service semantics are
defined in the MDM Metadata Repository.
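The "log every transaction, successful or not" behavior can be sketched as follows; the log attributes shown are illustrative stand-ins for the structures defined in the Metadata Repository:

```python
import itertools
import time

class TransactionLog:
    """Sketch: register every transaction and record its outcome."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.entries = []

    def run(self, tx_type, originator, work):
        tx_id = next(self._next_id)          # register with a transaction ID
        try:
            result = work()
            status = "SUCCESS"
        except Exception:
            result, status = None, "FAILED"
        # logged regardless of whether the transaction succeeded
        self.entries.append({"tx_id": tx_id, "type": tx_type,
                             "originator": originator,
                             "time": time.time(), "status": status})
        return tx_id, result
```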
Match Suspect Extractor When a new piece of customer information arrives (new customer record, change in the
existing customer record, or deletion of an existing record), the matching engine needs to receive an extract with
suspected records for which the match groups must be recalculated.
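A minimal suspect extraction might use a fuzzy similarity measure to select candidate records. This sketch uses Python's standard-library `difflib` as a stand-in for the fuzzy matching algorithms a real matching engine would provide, and matches on a single hypothetical name field:

```python
import difflib

def extract_suspects(changed, identity_store, threshold=0.8):
    """Pull records fuzzily similar to the changed record's name."""
    suspects = []
    for rec in identity_store:
        score = difflib.SequenceMatcher(
            None, changed["name"].lower(), rec["name"].lower()).ratio()
        if score >= threshold:
            suspects.append(rec)
    return suspects
```

In practice the extractor would first narrow the candidate set with blocking keys (for example, postal code or a phonetic name key) rather than scan hundreds of millions of Identity Store records.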
Identity Store The Identity Store maintains the customer data with the superset of records that includes records
from all participating systems in scope. Also, the Identity Store includes the Match Group keys. In a typical MDM
system for a large enterprise, the Identity Store may have hundreds of millions of records.
Change Capture This component is responsible for capturing the record or records that have been changed, added,
or deleted. Pointers to these records are the entry information required by the Match Suspect Extractor to begin
match processing.
Purge, Archival, and Audit Support Purge and archival components are responsible for purging history records
that exceed the predefined retention threshold. The audit features allow MDM Data Hub solutions to archive purged
records for potential audit or investigation purposes.
Enterprise Record Locator The Enterprise Record Locator contains information about all system source keys and
the Identity Store keys. The Enterprise Record Locator is the key component that stores cross-system reference
information to maintain the integrity of the customer records.
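The cross-reference facility can be sketched as a two-way mapping between Data Hub keys and source system keys; the key values below are hypothetical:

```python
class RecordLocator:
    """Sketch of the hub-key <-> source-key cross-reference."""

    def __init__(self):
        self._by_hub = {}      # hub_key -> {(system, source_key), ...}
        self._by_source = {}   # (system, source_key) -> hub_key

    def link(self, hub_key, system, source_key):
        self._by_hub.setdefault(hub_key, set()).add((system, source_key))
        self._by_source[(system, source_key)] = hub_key

    def source_keys(self, hub_key):
        return self._by_hub.get(hub_key, set())

    def hub_key(self, system, source_key):
        return self._by_source.get((system, source_key))
```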
Enterprise Attribute Locator This component enables the data governance group to specify the best trusted source
of data for each attribute in the canonical data model. The Enterprise Attribute Locator stores pointers to the best
source of data, and can be implemented as a subject area within the Metadata Repository. An administrative interface
is required to maintain these pointers. The Enterprise Attribute Locator information can be defined at different levels
of trust:
• System-level trust, when a certain system is selected as the trusted source for all profile data.
• Attribute-level trust, when a single trusted source is defined for each attribute in the canonical data model.
• Trust at the level of attributes and record types, when a single trusted source is defined for each attribute in the
canonical data model with additional dependencies on the record type, account type, and so on.
• Trust at the level of attributes, record types, and timestamps is similar to the trust level defined in the previous
bullet, except that it includes the timestamp of when the attribute was changed. This timestamp attribute is an
additional factor that can impact the best source rules.
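The first three trust levels can be sketched as a resolution function in which the most specific rule wins; the system names and trust assignments below are hypothetical, and the timestamp-dependent level is omitted for brevity:

```python
# Hypothetical trust rules, most specific first: (attribute, record type)
# beats attribute-level trust, which beats the system-level default.
ATTR_RECORD_TRUST = {("address", "BUSINESS"): "ERP"}
ATTR_TRUST = {"address": "CRM", "phone": "BILLING"}
SYSTEM_TRUST = "CRM"   # system-level trust: default source for all profile data

def best_source(attribute, record_type=None):
    """Resolve the best trusted source for an attribute."""
    if (attribute, record_type) in ATTR_RECORD_TRUST:
        return ATTR_RECORD_TRUST[(attribute, record_type)]
    return ATTR_TRUST.get(attribute, SYSTEM_TRUST)
```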
Race Condition Controller The Race Condition Controller is responsible for defining what change must prevail
and survive when two or more changes conflict with each other. This component should resolve the conflicts based
on the evaluation of business rules that consider, among other factors, the source of change by attribute and the
timestamp at the attribute level.
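A minimal survivorship rule of this kind, considering source priority first and the attribute-level timestamp as a tiebreaker, might look like the following; the source priorities are assumptions for illustration:

```python
# Hypothetical per-source priorities (lower number = more trusted).
SOURCE_PRIORITY = {"CRM": 1, "BILLING": 2, "WEB": 3}

def resolve(changes):
    """Pick the surviving change: best source wins; newest timestamp breaks ties.

    Each change is a dict like
    {"source": "CRM", "timestamp": 1700000000, "value": "..."}.
    """
    return min(changes,
               key=lambda c: (SOURCE_PRIORITY[c["source"]], -c["timestamp"]))
```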
Distributed or Federated Query Constructor When the customer data is distributed or federated across multiple
data stores, this component should be able to parse a message and transform it into a number of queries or messages
against legacy systems.
Message Response Assembler Once each of the source systems participating in a response successfully generates
its portion of the response message, the complete response message must be assembled for return to the requestor.
That assembly is the Message Response Assembler's responsibility.
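The fan-out and fan-in just described can be sketched as a pair of functions; the message fields and the merging rule are hypothetical:

```python
def build_queries(message, systems):
    """Distributed Query Constructor: split one inquiry into per-system queries."""
    return [{"system": s, "customer_id": message["customer_id"]}
            for s in systems]

def assemble(hub_part, source_parts):
    """Response Assembler: merge the Data Hub record with source-system parts."""
    response = dict(hub_part)
    for part in source_parts:
        response.setdefault("accounts", []).extend(part.get("accounts", []))
    return response
```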
Error Processing, Transactional Integrity, and Compensating Transactions At a high level, there are two ways
to handle transactional errors. The conservative approach enforces all-or-nothing transactional semantics of
atomicity, consistency, isolation, and durability (ACID properties) and requires the entire distributed transaction to
complete without any errors in order to succeed. The alternative approach allows a transaction to complete partially
and relies on compensating transactions to undo the effects of the steps that did succeed when an error occurs.
Hub Master Components In the case of the Hub Master, such as the Transaction Hub, all attributes that are
maintained in the Hub Master must be protected from the changes in the source systems.