Secondary Index
"last but not the least!!" is what i would say when it comes to secondary index. This is very much true since secondary index is concept when it comes to handy when others doesn't!
A secondary index is one which provides alternate access paths to the data thus helping in speeding of the queries.They are usually defined on columns which are more frequently used rather than PI column in joining condition.
A secondary index can be defined as unique secondary index or non unique secondary index.
syntax:
CREATE UNIQUE INDEX (Columns) ON dbname.tablename ;
This creates Unique secondary index on Employees table
CREATE INDEX (Columns) ON dbname.tablename ;
This creates non unique secondary index on employees table.
A worthy tip:
Please make sure to use naming for index when multiple columns are defined . It would be easy to drop a index on that table just by using indexname,
create index SI_emp (firstname,lastname) from employees; --- index created as SI_emp for table employees
Drop index SI_emp on employees; --- easy to drop index by giving index name
So what exactly does this Secondary index do?? Does it hash the data values based on secondary index and redistribute data? No. It doesn't'
It just creates a subtable for defined secondary index , which contains there columns
-secondary index value
-secondary index rowid (hashed value for SI)
-hash value for primary index(pointing to target amp)
This subtable is created on each amp which in turn points to target AMP. Hence Secondary index can be TWO amp operation for USI and MANY amp operation for NUSI. So whenever a SI column is used in a query, based on that column optimizer quickly reaches subtable by way of row hash generated for SI value and using PI row id corresponding to it, .it would reach target amp.
The secondary index Sub table is key to alternate access paths provided in teradata. With this benefit we end up paying the price for extra perm space consumed by subtables and overhead for their maintenance. So think wisely before going for Secondary index or use the beauty of dynamic secondary index concept which supports creation of Secondary index when required and dropping of it when not used hence saving considerable amount of space,
I want to share one of the scenarios where we had a table with set of date columns and Only PI defined on that table, A query was run on particular date range of that table ,which went on for full table scan taking more than 4 hours to complete.
so decided to focus on this date column since it looked like main culprit for performance issue.
Two things strike my mind when date column came into Picture . Should i go for PPI or SI??
1. In case of PPI,
create table with PPI and repopulate data . But this adds to overhead since there was addition of 2 bytes to every row increasing size of the table by 2*n rows.
Second issue was PPI causing redistribution of PI in partitions resulting in Primary index based queries taking more than normal time. Joins happening on PPI columns to non PPI table would result in sliding window join which would further decrease the performance.
Most important of all, how frequently was this date column used in queries . Since frequency of usage was less and seeing drawbacks of PPI , i decided to go on with Secondary index which could be created any time and dropped any time.
Since date column is range based , I created value ordered secondary index as follows
create index (date-column) order by values on employees ;
Stats were collected on to ensure optimizer to pick secondary index. I could see the performance improvement in query after creation of valued ordered NUSI . Please note that Value ordered secondary index can be created only as NUSI and supports date, int,decimal upto 4 bytes.
Please make sure to collect stats on secondary index since optimizer might have to decide whether reading SI subtable is faster than going for full table scan ( reading all the amps)
I want to point out some Limitations of secondary index
1. Secondary index is not compatible with fastload as there is overhead for fastload on duplicate secondary index checking with subtable creation
Drop SI before doing fastload on that table and then recreate them,
2. Table can have upto 32 secondary indices in teradata
3. Multicolumn secondary index can have upto 16 columns defined in a index
4. USI is not supported on Multiload, but NUSI is supported by multiload
5. USI cannot be created on table which already has duplicate data in columns to be defined as part of USI.
6. SI cannot be created on column which is already defined as PI.
"last but not the least!!" is what i would say when it comes to secondary index. This is very much true since secondary index is concept when it comes to handy when others doesn't!
A secondary index is one which provides alternate access paths to the data thus helping in speeding of the queries.They are usually defined on columns which are more frequently used rather than PI column in joining condition.
A secondary index can be defined as unique secondary index or non unique secondary index.
syntax:
CREATE UNIQUE INDEX (Columns) ON dbname.tablename ;
This creates Unique secondary index on Employees table
CREATE INDEX (Columns) ON dbname.tablename ;
This creates non unique secondary index on employees table.
A worthy tip:
Please make sure to use naming for index when multiple columns are defined . It would be easy to drop a index on that table just by using indexname,
create index SI_emp (firstname,lastname) from employees; --- index created as SI_emp for table employees
Drop index SI_emp on employees; --- easy to drop index by giving index name
So what exactly does this Secondary index do?? Does it hash the data values based on secondary index and redistribute data? No. It doesn't'
It just creates a subtable for defined secondary index , which contains there columns
-secondary index value
-secondary index rowid (hashed value for SI)
-hash value for primary index(pointing to target amp)
This subtable is created on each amp which in turn points to target AMP. Hence Secondary index can be TWO amp operation for USI and MANY amp operation for NUSI. So whenever a SI column is used in a query, based on that column optimizer quickly reaches subtable by way of row hash generated for SI value and using PI row id corresponding to it, .it would reach target amp.
The secondary index Sub table is key to alternate access paths provided in teradata. With this benefit we end up paying the price for extra perm space consumed by subtables and overhead for their maintenance. So think wisely before going for Secondary index or use the beauty of dynamic secondary index concept which supports creation of Secondary index when required and dropping of it when not used hence saving considerable amount of space,
I want to share one of the scenarios where we had a table with set of date columns and Only PI defined on that table, A query was run on particular date range of that table ,which went on for full table scan taking more than 4 hours to complete.
so decided to focus on this date column since it looked like main culprit for performance issue.
Two things strike my mind when date column came into Picture . Should i go for PPI or SI??
1. In case of PPI,
create table with PPI and repopulate data . But this adds to overhead since there was addition of 2 bytes to every row increasing size of the table by 2*n rows.
Second issue was PPI causing redistribution of PI in partitions resulting in Primary index based queries taking more than normal time. Joins happening on PPI columns to non PPI table would result in sliding window join which would further decrease the performance.
Most important of all, how frequently was this date column used in queries . Since frequency of usage was less and seeing drawbacks of PPI , i decided to go on with Secondary index which could be created any time and dropped any time.
Since date column is range based , I created value ordered secondary index as follows
create index (date-column) order by values on employees ;
Stats were collected on to ensure optimizer to pick secondary index. I could see the performance improvement in query after creation of valued ordered NUSI . Please note that Value ordered secondary index can be created only as NUSI and supports date, int,decimal upto 4 bytes.
Please make sure to collect stats on secondary index since optimizer might have to decide whether reading SI subtable is faster than going for full table scan ( reading all the amps)
I want to point out some Limitations of secondary index
1. Secondary index is not compatible with fastload as there is overhead for fastload on duplicate secondary index checking with subtable creation
Drop SI before doing fastload on that table and then recreate them,
2. Table can have upto 32 secondary indices in teradata
3. Multicolumn secondary index can have upto 16 columns defined in a index
4. USI is not supported on Multiload, but NUSI is supported by multiload
5. USI cannot be created on table which already has duplicate data in columns to be defined as part of USI.
6. SI cannot be created on column which is already defined as PI.