Difference Between Teradata Primary Index and Primary Key
Primary Key and Primary Index are easy to confuse in Teradata. A Primary KEY is a logical concept, whereas a Primary INDEX is a physical one. In Teradata, the Primary INDEX is used to find the best access path for data retrieval and data insertion, and the Primary KEY is used to identify each row uniquely, just as in other RDBMSs. Below are a few differences between PRIMARY KEY and PRIMARY INDEX:
PRIMARY KEY
1. PRIMARY KEY cannot be NULL.
2. PRIMARY KEY is not mandatory in Teradata.
3. PRIMARY KEY does not help in data distribution.
PRIMARY INDEX
1. PRIMARY INDEX can be NULL.
2. PRIMARY INDEX is mandatory in Teradata.
3. PRIMARY INDEX helps in data distribution.
4. PRIMARY INDEX can be UNIQUE (Unique Primary Index).
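One way to see how the PRIMARY INDEX drives data distribution is to count rows per AMP with Teradata's hash functions. The query below is a minimal sketch; the table employee and its primary index column emp_id are hypothetical names used only for illustration.

```sql
-- Hypothetical table and column (employee, emp_id); substitute your own.
-- HASHROW hashes the primary index value, HASHBUCKET maps that hash to a
-- hash bucket, and HASHAMP maps the bucket to the AMP that owns it.
SELECT HASHAMP(HASHBUCKET(HASHROW(emp_id))) AS amp_no,
       COUNT(*)                             AS row_count
FROM   employee
GROUP  BY 1
ORDER  BY 1;
```

Roughly equal row counts per AMP suggest an even distribution. A few AMPs holding most of the rows suggests that the chosen PRIMARY INDEX column is skewed.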
Now let us look at a few scenarios to see how Teradata handles the two:
PRIMARY KEY and PRIMARY INDEX in TERADATA
a) Neither a PRIMARY INDEX nor a PRIMARY KEY is defined on the table: In this case, Teradata checks whether any column is defined as UNIQUE. If so, it makes that column the UNIQUE PRIMARY INDEX; otherwise the first column becomes the PRIMARY INDEX.
b) No PRIMARY INDEX is defined, but a column is defined as PRIMARY KEY: In this case, Teradata makes the PRIMARY KEY column the UNIQUE PRIMARY INDEX of the table.
c) Both PRIMARY KEY and PRIMARY INDEX are defined, on different columns: In this case, Teradata makes the PRIMARY KEY column a UNIQUE SECONDARY INDEX, i.e. a UNIQUE INDEX on the table.
So one must understand the importance of the PRIMARY INDEX in Teradata. Generally, the PRIMARY KEY concept is taken care of by the UNIQUE PRIMARY INDEX in a Teradata environment.
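The three scenarios can be sketched as table definitions. All table and column names here are hypothetical, and the resulting index choices are the ones described in the scenarios; the exact defaults may vary with the Teradata release and system settings, so it is worth confirming them with SHOW TABLE or HELP INDEX.

```sql
-- All table and column names are hypothetical, for illustration only.

-- a) Neither PRIMARY INDEX nor PRIMARY KEY is declared.
--    dept_code is UNIQUE, so it is expected to become the UNIQUE PRIMARY INDEX.
--    Without the UNIQUE column, the first column (dept_id) would be used.
CREATE TABLE dept_a (
    dept_id   INTEGER,
    dept_code CHAR(5) NOT NULL UNIQUE,
    dept_name VARCHAR(50)
);

-- b) Only a PRIMARY KEY is declared.
--    dept_id is expected to become the UNIQUE PRIMARY INDEX.
CREATE TABLE dept_b (
    dept_id   INTEGER NOT NULL PRIMARY KEY,
    dept_name VARCHAR(50)
);

-- c) PRIMARY KEY and PRIMARY INDEX are declared on different columns.
--    dept_code is the PRIMARY INDEX; dept_id is expected to be enforced
--    through a UNIQUE SECONDARY INDEX.
CREATE TABLE dept_c (
    dept_id   INTEGER NOT NULL PRIMARY KEY,
    dept_code CHAR(5),
    dept_name VARCHAR(50)
) PRIMARY INDEX (dept_code);

-- Inspect what Teradata actually created.
SHOW TABLE dept_c;
HELP INDEX dept_c;
```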
What is Secondary Index in Teradata?
A Secondary Index in Teradata provides an alternate path to retrieve data. It is used only for data retrieval and has nothing to do with data storage. For data storage, Teradata uses the PRIMARY INDEX.
So why do we need a Secondary Index when the Primary Index is available for data storage as well as data retrieval? In some situations a query does not use the Primary Index column for data retrieval, and retrieval can then be very slow. In such situations we can create a Secondary Index on columns that are not part of the PRIMARY INDEX but are used very often in JOIN conditions or other conditions for data retrieval.
Like the PRIMARY INDEX, the SECONDARY INDEX comes in two types:
Unique Secondary Index: CREATE UNIQUE INDEX (COLUMN_NAME) ON TABLENAME;
Non-Unique Secondary Index: CREATE INDEX (COLUMN_NAME) ON TABLENAME;
Creating a SECONDARY INDEX may help in performance optimization, but it also comes at some cost in terms of resources. Whenever a SECONDARY INDEX is created on a table, a subtable is created on all the AMPs, which holds the following information:
SECONDARY INDEX VALUE || SECONDARY INDEX ROW_ID || PRIMARY INDEX ROW_ID
When a query filters on a column defined as a SECONDARY INDEX, the subtable is searched for that value. The AMP that finds a match retrieves the corresponding PRIMARY INDEX row_id from its subtable, and the AMP holding that PRIMARY INDEX row_id is then asked to retrieve the respective record. Hence, data retrieval via a Secondary Index involves two or more AMPs.
For a NUSI [Non-Unique Secondary Index], each subtable row is stored on the same AMP that holds the PRIMARY row_id, so every AMP is asked to check its own subtable. For a USI [Unique Secondary Index], the subtable row is placed by hashing the index value, so a subtable can hold information about rows on different AMPs; a USI lookup is typically a two-AMP operation.
A Secondary Index can avoid a FULL TABLE scan. However, one should collect STATS on the Secondary Index columns so that the Optimizer can choose the Secondary Index rather than a Full Table Scan.
Advantages of Secondary Index:
Avoids a FULL TABLE scan by providing an alternate data retrieval path.
Enhances performance.
Can be dropped and created at any time.
A table may have multiple Secondary Indexes defined, whereas only one Primary Index is permissible.
Disadvantages of Secondary Index:
Needs extra storage space for the SUBTABLE.
Needs extra I/O to maintain the SUBTABLE.
Collecting STATS is required in order to avoid a FULL TABLE SCAN.
To drop a Secondary Index, use the command below:
DROP INDEX (COLUMN_NAME) ON TABLENAME;
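Putting the pieces together, the life cycle of a Secondary Index might look like the sketch below. The table employee (assumed to have its PRIMARY INDEX on emp_id) and the columns dept_id and email are hypothetical names chosen for illustration.

```sql
-- Hypothetical table: employee, with PRIMARY INDEX on emp_id.

-- Non-Unique Secondary Index on a frequently filtered or joined column.
CREATE INDEX (dept_id) ON employee;

-- Unique Secondary Index on a column whose values must not repeat.
CREATE UNIQUE INDEX (email) ON employee;

-- Collect statistics so the Optimizer can judge whether the index is
-- cheaper than a full table scan.
COLLECT STATISTICS ON employee INDEX (dept_id);

-- Check the plan: the EXPLAIN text shows whether the index is used.
EXPLAIN
SELECT *
FROM   employee
WHERE  dept_id = 10;

-- List the indexes currently defined on the table.
HELP INDEX employee;

-- Drop the indexes when they are no longer needed.
DROP INDEX (dept_id) ON employee;
DROP INDEX (email) ON employee;
```

Whether the Optimizer actually picks the index depends on the statistics and on how selective the condition is, so the EXPLAIN output is the thing to check rather than assuming the index will be used.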