Tuesday, November 16, 2010

Teradata Performance Tuning - Basic Tips

Performance tuning thumb rules.

Here are very basic steps which are used to PT any given query in given environment . As a pre-requiste , make sure 
- user has proper select rights and actual profile settings
- Enough space available to run and test the queries

1. Run explain plan (pressing F6 or “EXPLAIN sel * …”,)
Then see for potential information like
- No or low confidence
- Product joins conditions
- By way of an all row scan - FTS
- Translate

Also check for
- Distinct or group by keywords in SQL query
- In/ not in keywords and check for the list of values generated for the same

APPROACHES

A. In case of product join scenarios,check for
- Proper usage of alias
- joining on matching columns
- Usage of join keywords - like specifying type of joins (ex. inner or outer )
- use union in case of "OR” scenarios
- Ensure statistics are collected on join columns and this is especially important if the columns you are joining on are not unique.

B. collects stats
- Run command "diagnostic help stats on for the session"
- Gather information on columns on which stats has to be collected
- Collect stats on suggestions columns
- Also check for stats missing on PI, SI or columns used in joins - "help stats <databasename>.<tablename>
- Make sure stats are re-collected when at-least 10% of data changes
- remove unwanted stats or stat which hardly improves performance of the queries
- Collect stats on columns instead of indexes since index dropped will drop stats as well!!
- collect stats on index having multiple columns, this might be helpful when these columns are used in join conditions
- Check if stats are re-created for tables whose structures have some changes 

c. Full table scan scenarios
- Try to avoid FTS scenarios as, it might take very long time to access all the data in every amp in the system
- Make sure SI is defined on the columns which are used as part of joins or Alternate access path.
- Collect stats on SI columns else there are chances where optimizer might go for FTS even when SI is defined on that particular column

2. If intermediate tables are used to store results, make sure that
- It has same PI of source and destination table

3. Tune to get the optimizer to join on the Primary Index of the largest table, when possible, to ensure that the large table is not redistributed on AMPS

4. For large list of values, avoid using IN /NOT IN in SQLs. Write large list values to a temporary table and use this table in the query

5. Make sure when to use exists/not exists condition since they ignore unknown comparisons (ex. - NULL value in the column results in unknown) . Hence this leads to inconsistent results

6. Inner Vs Outer Joins
Check which join works efficiently in given scenarios.Some examples are 
- Outer joins can be used in case of large table joining with small tables (like fact table joining with Dimension table based on reference column)
- Inner joins can be used when we get actual data and no extra data is loaded into spool for processing
Please note for outer join conditions:
1. Filter condition for inner table should be present in "ON" condition
2. Filter condition for outer table should be present in "WHERE" condition

1 comment: