Tuesday, December 14, 2010

Teradata Interview questions - Part 6

Teradata  Interview questions  - Part 6
What is a Sparse Index?
Answer:

Sparse Join Indexes are a type of Join Index which contains a WHERE clause that reduces the number of rows which would otherwise be included in the index. All types of join indexes, including single table, multitable, simple or aggregate can be sparse.

Difference between Stored Procedure and Macro?
Answer:

SP:
It does not return rows to the user.
It has to use cursors to fetch multiple rows
It used Inout/Out to send values to user
It Contains comprehensive SPL
It is stored in DATABASE or USER PERM
A stored procedure also provides output/Input capabilities

Macros:
It returns set of rows to the user.
It is stored in DBC PERM space
A macro that allows only input values

How do you execute the given SQL statement repeatedly in BTEQ?
Answer:

Select top 1* from database.table1;
=n
Here "=n” is to run the previous sql statement, "n" number of times.

What are types of PARTITION PRIMARY INDEX (PPI) in Teradata?
Answer:


1. Partition by CASE
CREATE      TABLE ORDER_Table
(
ORD_number  integer NOT NULL,
customer_number integer NOT NULL,
order_date  date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY case1 (
            order_total < 10000 ,
            order_total < 20000 ,
            order_total < 30000,
NO           CASE     OR        UNKNOWN ) ;

2. Partition by Range  - example using date range
CREATE TABLE ORDER_Table
(
ORD_number  integer NOT NULL,
customer_number integer NOT NULL,
order_date  date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY range1 (
            Order_date BETWEEN date '2010-01-01'       AND      date '2010-12-01'
            EACH interval '1' month ,
NO RANGE
        OR  UNKNOWN);

P.S:If  we use NO RANGE or NO CASE  - then all values not in this range will be in a single partition.
If we specify UNKNOWN, then all null values will be placed in this partition

Can we define PARTITION PRIMARY INDEX (PPI) on a Primary Index column in Table? Explain Why?
Answer:

PPI cannot be defined on PI column in Table.  Since PI is used to locate an AMP and store the data based on HASH value (ROW ID ) it cannot be used as PPI column.
In case of PPI , The data stored in AMP's are Partitioned based on PPI column after they are row hashed  (ROW KEY = ROW ID +PPI value )

What is Cross Join?
Answer:

It is a Teradata specified Join, which is used as equivalent to product join.
There is no “On” clause in case of CROSS join
SELECT  EMP.ename , DPT.Dname
FROM     employee EMP
CROSS JOIN
Department DPT
WHERE
EMp.deptno = DPT.depto ;



What does SLEEP function does in Fast load?
Answer

The SLEEP command specifies the amount minutes to wait before retrying to logon and establish all sessions.
Sleep command can be used with all load utilities not only fastload.
This situation can occur if all of the loader slots are used or if the number of requested sessions is not available. The default value is 6 minutes. If tenacity was set to 2 hours and sleep 10 minutes, Sleep command will try to logon for every 10 minutes up to 2 hours duration.


What is TENACITY? What is its default value?
Answer

TENACITY specifies the amount of time in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. The default for Fast Load is “no tenacity”, meaning that it will not retry at all. If several FastLoad jobs are executed at the same time, we recommend setting the TENACITY to 4, meaning that the system will continue trying to logon for the number of sessions requested for up to four hours.


How to skip 1st record while using Bteq IMPORT?
Answer:

By using SKIP=1 ;   , we can skip first record.
 .import infile=<filename>, skip=1;

How many types of Index are present in teradata?
Answer:

There are 5 different indices present in Teradata
1. Primary Index
      a.Unique primary index
      b. non Unique primary index
2. Secondary Index
      a. Unique Secondary index
      b. non Unique Secondary index
3. Partitioned Primary Index
      a. Case partition (ex. age, salary...)
      b. range partition ( ex. date)
4. Join index
      a. Single table join index
      b. multiple table join index
      c. Sparse Join index ( constraint applied on join index in where clause)
5. Hash index
   

Monday, December 13, 2010

Sparse Index ( SPARSE JOIN INDEX)

It is type of join index which contains a WHERE clause   , that  reduces number of rows participating  from the base tables.
All types of join indexes, including single table, multitable, simple or aggregate can be sparse.

By default , a given join index will have a NUPI defined on the first column if it is not specified in create JOIN index statement.   We can explicitly mention a column to be NUPI for the join Index.

Any combination of AND, OR, IN  (LOGICAL) conditions  can be used along with  WHERE clause in Sparse Index.

Check the following example for Sparse Join Index.
CREATE      JOIN INDEX DEV.testjoin AS
sel  A.DEPTno  ,  b.sal
FROM 
dept  A
INNER JOIN
 employee B
ON  A.deptno=b.dept
WHERE a.deptno=20   --- sparse index
PRIMARY INDEX (sal);
 
In the following DDL for Join INDEX,
- Sal is explicitly mentioned as NUPI
- a filter clause is applied on Dept.DeptNO = 20, which reduces the number of rows  ( an Ideal case of Sparse Index )


P.s:  Join Index never allows a Unique Index to be created.

Wednesday, December 8, 2010

Peformance tuning Tips : LIKE Clause


While tuning queries in Teradata , We take care of major performance issues but ignore small cases which might still cause the query to perform badly.

I wanted to mention about one such case of  LIKE clause , which many people good at performance tuning miss it  assuming like patterns does not harm the performance . But in reality this is not so!!

If LIKE is used in a WHERE clause, it is better to try to use one or more leading character in the clause, if at all possible.

eg;   LIKE '%STRING%'  will be processed differently compared to LIKE 'STRING%'

If a leading character   'STRING%'   is used in the begining of like clause , the the Qptimizer makes use of an index to perform on query thereby increasig the performance

But if the leading character' in '%STRING%' is a wildcard(say '%') , then the Optimizer will not be able to use an index, and a full table scan  (FTS ) must be run, which reduces performance and takes more time.

Hence it is suggested to go for '%STRING%'   only if STRING is a part of entire pattern say 'SUBSTRING'

Rank vs Row_Number in Teradata

Anyone working on Teradata would be aware of the two most common OLAP functions in Teradata which are used for common use i.e.
  1. Rank()
  2. Row_Number()

  In one of my posts I have mentioned about how to generate sequence number (ex. Surrogate Key) using Row_Number.   Today let us find out the difference in usage of rank and row number functions by taking following scenario on tables (sales_daily)


SELECT    
item_id,
sales_date,
Actual_sales,
RANK() OVER (ORDER  BY Actual_sales DESC)  ,                                  -- RANK function
ROW_NUMBER() OVER (ORDER     BY Actual_sales DESC)                   -- ROW_NUMBER function
FROM   Sales_daily
WHERE    
sales_date BETWEEN DATE  '2010-01-01'     AND DATE '2010-03-01';


The result set is as follows:
Item_id
Sales_date
Actual_sales
Rank
Row_Number()
120
4/1/2010
550
1
1
120
4/2/2017
550
1
2
120
4/2/2020
450
3
3
120
4/2/2006
350
4
4
120
4/2/2027
350
4
5
120
4/1/2005
350
4
6
120
4/1/2003
250
7
7
120
4/2/2003
250
7
8
120
4/1/2025
200
9
9
120
4/1/2002
200
9
10
120
4/1/2021
150
11
11
120
4/2/2001
150
11
12
120
4/1/2001
150
11
13
120
4/1/1931
100
14
14


In the result set, we can observe the following,  
1)       Rows 1 and 2 have same rank assigned  since that have same actual_sales value
2)       Rows 1 and 2 have different row numbers assigned even though same actual_sales value
      3)   Row 3 has rank “3” assigned as ranking positions

Hence , 
rows are always assigned an incremented sequence number with ROW_NUMBER.
Rows are always assigned the same number with RANK function.