Wednesday, October 27, 2010

Teradata interview questions PART 4

Teradata interview questions PART 4, consolidated over a period of time from various sources.

Explain the different types of temporary tables in Teradata
Answer:

Derived temporary tables
  # Derived tables are local to an SQL query.
  # Not included in the DBC data dictionary database, the definition is kept in cache.
  # They are specified at the query level, with an AS clause in the SQL statement

Volatile temporary tables
  # Local to a session (deleted automatically when the session terminates)
  # Not included in the DBC data dictionary database; the table definition is stored in cache. Volatile table names must be unique within the session.
  # Created with the CREATE VOLATILE TABLE SQL statement

Global temporary tables
  # The contents of a global temporary table are local to a session and are deleted automatically when the session terminates
  # A permanent table definition is stored in the DBC data dictionary database (DBC.Temptables)
  # Defined with the CREATE GLOBAL TEMPORARY TABLE SQL statement
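
A minimal sketch of all three forms, assuming a hypothetical employee table:

-- Derived table: defined inline with AS, lives only for this query
SELECT dt.dept_no, dt.avg_sal
FROM (SELECT dept_no, AVG(salary) FROM employee GROUP BY dept_no)
     AS dt (dept_no, avg_sal)
WHERE dt.avg_sal > 50000;

-- Volatile table: session-local, definition kept in cache
CREATE VOLATILE TABLE vt_emp AS
( SELECT * FROM employee )
WITH DATA
ON COMMIT PRESERVE ROWS;

-- Global temporary table: permanent definition in DBC, session-local contents
CREATE GLOBAL TEMPORARY TABLE gt_emp
( emp_id INTEGER,
  emp_name VARCHAR(30) )
ON COMMIT PRESERVE ROWS;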


How do you find out the number of AMPs in a given system?
Answer:

SELECT HASHAMP() + 1;

HASHAMP() with no argument returns one less than the number of AMPs in the system, hence the + 1.


List the types of HASH functions used in Teradata.
Answer:

There are HASHROW, HASHBUCKET, HASHAMP and HASHBAKAMP.
The SQL hash functions are:
  • HASHROW (column(s))
  • HASHBUCKET (hashrow)
  • HASHAMP (hashbucket)
  • HASHBAKAMP (hashbucket)

Example:
SELECT
            HASHROW ('Teradata')   AS "Hash Value"
            , HASHBUCKET (HASHROW ('Teradata')) AS "Bucket Num"
            , HASHAMP (HASHBUCKET (HASHROW ('Teradata'))) AS "AMP Num"
            , HASHBAKAMP (HASHBUCKET (HASHROW ('Teradata')))  AS "AMP Fallback Num" ;


What are the basic criteria to select Primary Index column for a given table?
Answer:
A rule of thumb known as the ADV (Access, Distribution, Volatility) demographics is followed.
Access Demographics
Identify index candidates that maximize one-AMP operations.
Columns most frequently used for access (Value and Join).

Distribution Demographics
Identify index candidates that optimize parallel processing.
Columns that provide good distribution.

Volatility Demographics
Identify index candidates with low maintenance I/O.

What are the basic rules that define how a PI is defined in Teradata?
Answer:

The following rules govern how Primary Indexes in a Teradata Database must be defined as well as how they function:

One Primary Index per table.
A Primary Index value can be unique or non-unique.
The Primary Index value can be NULL.
The Primary Index value (the data value) can be modified.
The Primary Index definition of a populated table cannot be modified.
A Primary Index has a limit of 64 columns.  


How to make sure Optimizer chooses NUSI over Full Table Scan (FTS)?
Answer:

The optimizer prefers an FTS over a NUSI when there are no statistics defined on the NUSI columns.
It is therefore always suggested to collect statistics whenever NUSI columns are defined on the table.
Verify whether the index is being used by checking the EXPLAIN plan.
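
A minimal sketch, assuming a hypothetical employee table with a NUSI on dept_no:

COLLECT STATISTICS ON employee COLUMN (dept_no);

-- Confirm in the plan that the NUSI is used rather than an all-rows scan
EXPLAIN SELECT * FROM employee WHERE dept_no = 100;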


What are the advantages and disadvantages of secondary indexes?
Answer:

Advantages:
1. A secondary index can be created and dropped dynamically.
2. A table may have up to 32 secondary indexes.
3. A secondary index can be created on any column, either unique or non-unique.
4. It is used as an alternate access path for less frequently used cases. E.g. defining an SI on a non-indexed column can improve performance if that column is used in the join or filter condition of a given query.
5. Collecting statistics on SI columns helps make sure the Optimizer actually uses the index.

Disadvantages:
1. Since subtables are created, there is always an overhead of additional space.
2. They require additional I/Os to maintain their subtables.
3. The Optimizer may, or may not, use a NUSI, depending on its selectivity.
4. If the base table is Fallback, the secondary index subtable is Fallback as well.
5. If statistics are not collected accordingly, the optimizer may go for a full table scan.
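
A minimal sketch of creating and dropping secondary indexes dynamically (hypothetical table and column names):

-- Non-unique secondary index (NUSI)
CREATE INDEX idx_dept (dept_no) ON employee;

-- Unique secondary index (USI)
CREATE UNIQUE INDEX idx_ssn (ssn) ON employee;

-- Drop the index when it is no longer needed
DROP INDEX idx_dept ON employee;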


When should the statistics be collected?
Answer:
Here are some excellent guidelines on when to collect statistics:
·       All non-unique indices
·       Non-index join columns
·       The Primary Index of small tables
·       The Primary Index of a join index
·       Secondary indices defined on any join index
·       Join index columns that frequently appear in WHERE search conditions
·       Any other columns that frequently appear in WHERE search conditions or in the WHERE clause of joins
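
The corresponding syntax, sketched against a hypothetical orders table:

COLLECT STATISTICS ON orders INDEX (cust_id);      -- a non-unique index
COLLECT STATISTICS ON orders COLUMN (order_date);  -- a join/filter column

-- Re-collect (refresh) all previously collected statistics on the table
COLLECT STATISTICS ON orders;
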
Where does TD store transient journal?
Answer:

In perm space: DBC.TransientJournal.
That special table can grow beyond DBC's perm limit, until the whole system runs out of perm space.
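
A sketch for checking how much perm space the transient journal currently occupies:

SELECT DatabaseName, TableName, SUM(CurrentPerm)
FROM DBC.TableSize
WHERE TableName = 'TransientJournal'
GROUP BY 1,2;
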
How can you find the Teradata Release and Version from Data Dictionary Table?
Answer:

SELECT * FROM DBC.DBCINFO;   
    

How can you determine I/O and CPU usage at a user level in Teradata?
Answer:

SELECT ACCOUNTNAME, USERNAME, SUM (CPUTIME) AS CPU, SUM (DISKIO) AS DISKIO
FROM DBC.AMPUSAGE
GROUP BY 1,2
ORDER BY 3 DESC;


How can you track Login Parameters of users in Teradata?
Answer:
SELECT LOGDATE, LOGTIME, USERNAME, EVENT FROM DBC.LOGONOFF;

Difference between Create table (copy) and Create table (select)

Whenever we need to create a copy of an existing table, we tend to use CREATE TABLE (copy) from the existing table or CREATE TABLE (select) from the existing table.

Many ignore the difference between running CREATE TABLE in these two ways, assuming the structures created will be the same. In actual fact, they are not!
Let us try out the two types of CREATE TABLE using examples to understand the differences.

Create a table check123 which includes NOT NULL, DEFAULT, UPI and USI definitions in it:

SHOW TABLE check123;
/*
CREATE SET TABLE check123 ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      int1 INTEGER DEFAULT 0 ,
     int12 INTEGER NOT NULL DEFAULT 0 ,
      int2 INTEGER NOT NULL,
      int3 INTEGER NOT NULL)
UNIQUE PRIMARY INDEX prim1 ( int3 )
UNIQUE INDEX uniq1 ( int2 );
*/
 
Step 1: Create table check_COPY from check123 using the CREATE TABLE (COPY) method

CREATE TABLE check_COPY AS check123 WITH NO DATA;
Run the SHOW TABLE command to check the table structure:
SHOW TABLE check_COPY;
/*
CREATE SET TABLE  check_COPY ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      int1 INTEGER DEFAULT 0 ,
      int12 INTEGER NOT NULL DEFAULT 0 ,
      int2 INTEGER NOT NULL,
      int3 INTEGER NOT NULL)
UNIQUE PRIMARY INDEX prim1 ( int3 )
UNIQUE INDEX uniq1 ( int2 );
*/

From the above observation we can see that the table created using the COPY method retains all datatypes and index definitions, such as the UPI and USI.

Step 2: Create table Check_SELECT from check123 using the CREATE TABLE (SELECT) method

CREATE TABLE Check_SELECT AS
( SELECT * FROM check123 ) WITH NO DATA;

Run the SHOW TABLE command to check the table structure:
SHOW TABLE Check_SELECT;
/*
CREATE SET TABLE Check_SELECT ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT
     (
      int1 INTEGER,                          --DEFAULT MISSING
      int12 INTEGER,                -- DEFAULT and NOTNULL  MISSING
      int2 INTEGER,                          -- NOTNULL  MISSING
      int3 INTEGER)                          -- NOTNULL  MISSING
PRIMARY INDEX ( int1 );
 */

Hence, when a table is created using the CREATE TABLE ... AS (SELECT ...) method, the created table does not retain the following from the original table:
·       DEFAULT
·       NOT NULL
·       UNIQUE PRIMARY INDEX
·       UNIQUE INDEX
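
If the SELECT form is needed but the primary index must be preserved, it can be specified explicitly; a sketch (note that the DEFAULT and NOT NULL attributes are still not carried over):

CREATE TABLE Check_SELECT AS
( SELECT * FROM check123 ) WITH NO DATA
UNIQUE PRIMARY INDEX ( int3 );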

Monday, October 25, 2010

DISTINCT vs GROUP BY in Teradata

There is always a debate when it comes to finding the unique values in a given table. The question is whether to use GROUP BY or DISTINCT to find them.
Both return the same number of rows, but with some difference in execution time between them. Execution time is always a very important factor, performance being one of the major considerations in a Teradata warehouse.

So which is more efficient: DISTINCT or GROUP BY?

Since DISTINCT redistributes the rows immediately, more data may move between the AMPs, whereas GROUP BY sends only unique values between the AMPs.
So we can say that GROUP BY sounds more efficient.
    But when the data in a table is nearly unique, GROUP BY spends time attempting to eliminate duplicates that barely exist. It therefore wastes its time checking for duplicates the first time, and then must redistribute almost the same amount of data anyway.

Let us see the steps used in each case for the elimination of duplicates
(these can be found using the EXPLAIN plan):

DISTINCT
1. Reads each row on the AMP.
2. Hashes the column value identified in the DISTINCT clause of the SELECT statement.
3. Redistributes the rows by that hash value to the appropriate AMP.
4. Once redistribution is completed, it
    a. Sorts the data to group duplicates on each AMP
    b. Removes all the duplicates on each AMP and keeps the unique values

P.S.: There are cases when DISTINCT fails with "Error: 2646 No more Spool Space". In such cases, try using GROUP BY.

GROUP BY
1. Reads all the rows that are part of the GROUP BY.
2. Removes the duplicates within each AMP for the given set of values, using the "buckets" concept.
3. Hashes the unique values on each AMP.
4. Redistributes them to the appropriate AMPs.
5. Once redistribution is completed, it
    a. Sorts the data to group duplicates on each AMP
    b. Removes any remaining duplicates on each AMP and keeps the unique values


Hence it is better to go for:

  • GROUP BY  -  when there are many duplicates
  • DISTINCT  -  when there are few or no duplicates
  • GROUP BY  -  when SPOOL space is exceeded
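
The two equivalent forms, written against a hypothetical sales table:

SELECT DISTINCT cust_id FROM sales;

SELECT cust_id FROM sales GROUP BY cust_id;

-- Same result set; compare the EXPLAIN output of each to see where
-- the duplicate elimination happens.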

Thursday, October 21, 2010

Teradata HELP commands

Teradata provides a number of user-friendly features, one of which is the HELP command.

Let us see the various help commands and their uses.

HELP SESSION;   
This command is used to display the user name, account name, logon date and time, current database name, collation code set and character set being used, as well as transaction semantics, time zone and character set data.

HELP DATABASE <database-name> ;   
This command is used to display the names of all the tables (T), views (V), macros (M), and triggers (G) stored in a database, along with table comments.

HELP USER <user-name> ;   
This command is used to display the names of all the tables (T), views (V), macros (M), and triggers (G) stored in a user area, along with table comments.

HELP TABLE <table-name> ;   
This command is used to display the column names, type identifier, and any user written comments on the columns within a table.

HELP VOLATILE TABLE ;   
This command is used to display the names of all Volatile temporary tables active for the current  user session.

HELP VIEW <view-name> ;   
This command is used to display the column names, type identifier, and comments on the columns within a view.

HELP MACRO <macro-name> ;   
This command is used to display the characteristics of parameters passed to it at execution time.

HELP TRIGGER <trigger-name> ;   
This command is used to display details created for a trigger, like action time and sequence.

HELP PROCEDURE <procedure-name> ;   
This command is used to display the characteristics of parameters passed to it at execution time.

HELP COLUMN <table-name>.* ;  OR  HELP COLUMN <view-name>.* ;  OR  HELP COLUMN <table-name>.<column-name>, … ;
This command is used to display detail data describing the column level characteristics.

HELP INDEX <table-name> ;   
This command is used to display the indexes and their characteristics like unique or non-unique and the column or columns involved in the index. This information  is used by the Optimizer to create a plan for SQL.

HELP STATISTICS <table-name> ;   
This command is used to display values associated with the data demographics collected on the table. This information is used by the Optimizer to create a plan for SQL.

HELP CONSTRAINT <table-name>.<constraint-name> ;   
This command is used to display the checks to be made on the data when it is inserted or updated and the columns are involved.

HELP 'SQL';   
This command is used to display a list of all  available SQL commands and functions.

HELP 'SQL  <command>';   
This command is used to display the basic syntax and options for the SQL command used in place of the <command>.

HELP 'SPL';   
This command is used to display a list of available SPL commands.

HELP 'SPL <command>';   
This command is used to display the basic syntax and options for the SPL command used in place of the <command>.
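
For example, to see the syntax of a specific SQL command:

HELP 'SQL CREATE TABLE';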

Sunday, October 17, 2010

Working around with Transposition of Table data

I was working on some transpositions and want to share one of the samples.

Consider the customer table, having customer and month details:
customer...... month
Ron................ 1
Kev................. 2
joh................. 1
Nel................. 2
Ave................. 11
Cin................. 10
tra................. 3

CASE statements play a very important role in transposing rows to columns and vice versa. In the following scenarios, we can see extensive usage of the CASE statement.

Scenario 1:
Display total number of customers for each month

jan....feb....mar....apr....may....jun....jul....aug....sep....oct....nov....dec
2......2......1......0......0......0......0......0......0......1......1......0....

The SQL query is as follows:

sel
count(case when month = '1' then customer else null end) "jan",
count(case when month = '2' then customer else null end) "feb",
count(case when month = '3' then customer else null end) "mar",
count(case when month = '4' then customer else null end) "apr",
count(case when month = '5' then customer else null end) "may",
count(case when month = '6' then customer else null end) "jun",
count(case when month = '7' then customer else null end) "jul",
count(case when month = '8' then customer else null end) "aug",
count(case when month = '9' then customer else null end) "sep",
count(case when month = '10' then customer else null end) "oct",
count(case when month = '11' then customer else null end) "nov",
count(case when month = '12' then customer else null end) "dec"
from CUST_TABLE ;


Scenario 2:
Display customer and month details, with every customer mapped to the corresponding month

customer....jan....feb....mar....apr....may....jun....jul....aug....sep....oct....nov....dec
Ron...........1......0......0......0......0......0......0......0......0......0......0......0....
Kev...........0......1......0......0......0......0......0......0......0......0......0......0....
joh...........1......0......0......0......0......0......0......0......0......0......0......0....
Nel...........0......1......0......0......0......0......0......0......0......0......0......0....
Ave...........0......0......0......0......0......0......0......0......0......0......1......0....
Cin...........0......0......0......0......0......0......0......0......0......1......0......0....
Tra...........0......0......1......0......0......0......0......0......0......0......0......0....

The SQL query is as follows:

sel
customer,
count(case when month = '1' then customer else null end) "jan",
count(case when month = '2' then customer else null end) "feb",
count(case when month = '3' then customer else null end) "mar",
count(case when month = '4' then customer else null end) "apr",
count(case when month = '5' then customer else null end) "may",
count(case when month = '6' then customer else null end) "jun",
count(case when month = '7' then customer else null end) "jul",
count(case when month = '8' then customer else null end) "aug",
count(case when month = '9' then customer else null end) "sep",
count(case when month = '10' then customer else null end) "oct",
count(case when month = '11' then customer else null end) "nov",
count(case when month = '12' then customer else null end) "dec"
from CUST_TABLE
group by customer;

Find column type using built in function!!!

Did you know that the datatype of a column can be found using a built-in function called TYPE?

Until I came across this function, I used to follow the ancient method of digging into the data dictionary (dbc.columns) to find the type of a given column.

There are some cases in which people use a CASE expression to match on the type of a column, and execute steps based on the match.
E.g.: check whether a column is a timestamp; if yes, extract the date, and so on.

To avoid using the dbc.columns table, we can use the TYPE(column name) function:

SELECT TYPE(TAB1.COL1);

Type(COL1)
-----------
INTEGER

Calculate average on Time field

Strange, but yes, we can calculate an average on time fields. Consider a scenario in which a table has the following structure:

City            Mode        Start-time             end-time
char(10)    char(10)     char(5)(HH:MI)    char(5)(HH:MI)
------------------------------------------------------------------
Paris          Taxi                04:55                09:33
Rome        Taxi                02:14                 08:44
Madrid      Shuttle            01:21                 03:19

The requirement is to calculate the average end-time from the table.

Since end-time is char(5), it has to be converted into the time format HH:MI:SS before calculating the average.

When a cast is applied as follows:
cast(end-time||':00' as time), and we then try to get an average, we get the following error:

Error - "invalid operation on an ANSI date-time or interval"

To fix this issue, the concept of INTERVAL is applied:

Select CITY, avg (cast(trim(end-time)||':00' as interval hour(4) to second (0) ))
from TRAVEL group by CITY ;

Hence, using INTERVAL, we can work around the ANSI date-time error and calculate an average on a time field.

Thursday, October 7, 2010

Teradata Timestamp tips & tricks - 3


Tip1: Extracting month from date as 'MMM'

select current_date (format 'MMM');

Date
----
Oct

Please use " sel cast((cast(current_date as format 'mmm')) as char(3)); " while querying in SQL Assistant (Queryman).


Tip2: Converting number of seconds to Minutes and Seconds

If I want to convert an input number of seconds into minutes+seconds, the following query is used.
E.g.: 300 seconds --> 0005:00, or 5:00 mins

create table table1
( second_s integer );
insert into table1 (300);

Select second_s * INTERVAL '0000:01' MINUTE TO SECOND from Table1;

Please note that if the resulting interval is more than 9999 minutes (approx. 7 days), it will result in a "7453: Interval field overflow" error.


Tip3: Converting number of seconds to DAY TO SECOND format
Similarly, we can convert seconds to the DAY TO SECOND format as well!

Select second_s* INTERVAL '0000 00:00:01' DAY TO SECOND from table1;

This will work for up to about 27 years, after which it results in the "7453: Interval field overflow" error.


Tip4: How to add/subtract minutes to timestamp!!

Problem: given a timestamp value, how can we add or subtract a given number of minutes?
ex:
 Timestamp1                          Minutes1         Result 
2010-01-04 02:10:08   (+)    330               = 2010-01-04 07:40:08
2010-01-02 18:04:32   (-)     245               = 2010-01-02 13:59:32

create table Table_1
(Timestamp1 timestamp(0) ,Minutes1 integer );
insert into table_1 values('2010-01-04 02:10:08',330);
insert into table_1 values('2010-01-02 18:04:32',245);

select Timestamp1 + (Minutes1 * interval '1' minute) from table_1;
select Timestamp1 - (Minutes1 * interval '1' minute) from table_1;

How to insert a string having quotes into a column?

Problem:
Say there are some values coming for a given column, say comments, like:
'It's a wonderful day'
'I don't know'

As we know, Teradata considers anything between two quotes to be one string literal. But what if we need to insert a value having a quote as an apostrophe in it (as in the examples above)?

CREATE SET TABLE SAMPLE1
(
col1 VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( col1 );

To insert these values, the insert statements are written as below.
insert into SAMPLE1 values('It''s a wonderful day');
insert into SAMPLE1 values('I don''t know');

In Teradata (as in standard SQL), two consecutive apostrophes are used to represent a single apostrophe within a string literal.

Can I generate random numbers in Teradata?

There is a scenario in which I need to generate random numbers for a given integer column. In Teradata we can use a function called RANDOM(min_value, max_value).

This is implemented in SQL as follows:

Select RANDOM(1,500) from table;

Please note: there may be cases where the RANDOM function produces duplicate values. So while designing tables, make sure that a column populated using RANDOM is not part of a UNIQUE PRIMARY INDEX.




How to find tables that have a PPI

In Teradata we use a lot of data dictionary views (DBC) to find definitions/details about anything related to the data.

How do we find the tables which have a PPI (partitioned primary index)?
We make use of the DBC.Indices view to find details regarding indexes.

Query :


SEL * FROM dbc.indices
WHERE indextype = 'Q'          ---- 'Q' means Partitioned Primary Index
AND databasename = 'DATABASENAME' ;

In fact, DBC.Indices can be used to find the various types of indexes associated with the tables in a given database.
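
Extending the same idea, a sketch that counts PPI tables per database:

SELECT DatabaseName, COUNT(DISTINCT TableName) AS ppi_tables
FROM DBC.Indices
WHERE IndexType = 'Q'
GROUP BY 1
ORDER BY 1;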

Sunday, October 3, 2010

Which is faster? select * from table or select <all columns> from table?

Many of us would have come across a scenario where listing the column names in the SELECT SQL was found to be faster than using the select * from table command. This is indeed interesting.


The reason being:

In the case of "select * from table", an extra step is added in which Teradata replaces * with the column names, and only then fetches the data.

Using "select <all columns> from table" eliminates this extra step of resolving the columns of the table.

Hence it is always recommended to use "select <all columns> from table".

How to convert column data to rows in Teradata

Problem :
There is a requirement to convert column data into rows.
Consider the country table with data as follows

The table has 4 columns (country, value1, value2, value3):
NZ  50  60  70
AUS 110 120 130

The output should be displayed in following manner
country    value
NZ  50
NZ  60
NZ  70
AUS 110
AUS 120
AUS 130

Resulting query:
select country, value1 as value from T_country
union all
select country, value2 as value from T_country
union all
select country, value3 as value from T_country;

There may be various approaches to this problem; UNION ALL was found to be efficient and served the purpose.

Materialised views in Teradata? Is it possible?

Most databases use the concept of materialised views (which store data). Sadly, in the case of Teradata there is no such concept of a materialised view.

The closest option to a materialised view in Teradata is the JOIN INDEX (or aggregate join index). A view can then be created over the join index. The command used to create the index is as follows:

 "CREATE JOIN INDEX ... AS (select ...)".

When a join index is used, its data is stored as a physical table in permanent space, hence making the join much faster.

 Please note:
There are drawbacks to join indices:
1. The Optimizer determines whether the index is beneficial; only if so does it invoke it.
2. The join index cannot be accessed directly, so a view is created that looks like the join index. However, this approach does not guarantee that the join index will be used when the view is queried.
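
A minimal sketch of an aggregate join index, with hypothetical table and column names:

CREATE JOIN INDEX ji_sales AS
SELECT store_id, SUM(amount) AS tot_amount
FROM sales
GROUP BY store_id;

-- The Optimizer may now answer a matching aggregate query from ji_sales
-- instead of re-aggregating the base table:
SELECT store_id, SUM(amount)
FROM sales
GROUP BY store_id;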