Sunday, September 5, 2010

FAQ's on Collect statistics

Collect stats just derives the data demographics of the table. These demographics are useful for optimizer to decide the execution of given query which in turn improves performance.

It collects the information like:
  • total row counts of the table,
  • how many distinct values are there in the column,
  • how many rows per value, is the column indexed,
  • if so unique or non unique etc.

What if collect stats is not done on the table?
Teradata uses a cost based optimizer and cost estimates are done based on statistics.
So if you donot have statistics collected then optimizer will use a Dynamic AMP Sampling method to get the stats. If your table is big and data was unevenly distributed then dynamic sampling may not get right information and your performance will suffer.

How can i know the tables for which the collect stats has been done?
Run the Help Stats command on that table.
this will give you Date and time when stats were last collected. You will also see stats for the columns ( for which stats were defined) for the table

Whenever collect stats is done on the particular  table(say on index/column) where can I find information regarding these entries ?
Collected statistics are stored in DBC.TVFields or DBC.Indexes tables. However, these two tables cannot be queried .

When to collect stats on  tables which have stats ?

1.    Typical guideline is  roughly 10% of the data has changed. (By measuring delta in perm space since last collected.) 
2.    Recollect based on stats that have aged 60-90 days. (say last time stats collected was 2 months ago) .

Please note :
  Collect stats could be pretty resource consuming for large tables. So it is always advisable to schedule the job at off peak period .


  1. collect stats on the table.
    if we delete table , collect stats also removed from table or can we drop it manually?

  2. hi,

    there are two things here
    1. If you delete the table , the old stats still exists as there is no automatic stats updation in Teradata.
    2. If you drop the table, then stats will also be dropped

    Hope this Helps

    Vinay SHet

  3. We generally collect stats to increase confidence levels but in some cases it doesn't impact on the confidence levels while working with multiple tables. Is there any chance to improve the confidence levels?

    1. Hi,

      There are 4 types of statistics collection ,
      Single, multicolumn,Index and Sample .

      Some quick tips I can suggest
      1. make sure stats are collected on INDEX of table.
      2. Multi column stats is not a good option to go for join scenarios as it can bring down the performance of queries
      3. please note that Multicolumn stats information is stored in 16bytes. If combination of columns used in multicolumn is greater than 16bytes , stats are not efficient
      4. Please make sure to collect single column stats on columns which have very high NULL values