Tuesday, December 14, 2010

Teradata Interview questions - Part 6

Teradata  Interview questions  - Part 6
What is a Sparse Index?
Answer:

Sparse Join Indexes are a type of Join Index which contains a WHERE clause that reduces the number of rows which would otherwise be included in the index. All types of join indexes, including single table, multitable, simple or aggregate can be sparse.

Difference between Stored Procedure and Macro?
Answer:

SP:
It does not return rows to the user.
It has to use cursors to fetch multiple rows
It used Inout/Out to send values to user
It Contains comprehensive SPL
It is stored in DATABASE or USER PERM
A stored procedure also provides output/Input capabilities

Macros:
It returns set of rows to the user.
It is stored in DBC PERM space
A macro that allows only input values

How do you execute the given SQL statement repeatedly in BTEQ?
Answer:

Select top 1* from database.table1;
=n
Here "=n” is to run the previous sql statement, "n" number of times.

What are types of PARTITION PRIMARY INDEX (PPI) in Teradata?
Answer:


1. Partition by CASE
CREATE      TABLE ORDER_Table
(
ORD_number  integer NOT NULL,
customer_number integer NOT NULL,
order_date  date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY case1 (
            order_total < 10000 ,
            order_total < 20000 ,
            order_total < 30000,
NO           CASE     OR        UNKNOWN ) ;

2. Partition by Range  - example using date range
CREATE TABLE ORDER_Table
(
ORD_number  integer NOT NULL,
customer_number integer NOT NULL,
order_date  date ,
order_total integer
)
PRIMARY INDEX (customer_number)
PARTITION BY range1 (
            Order_date BETWEEN date '2010-01-01'       AND      date '2010-12-01'
            EACH interval '1' month ,
NO RANGE
        OR  UNKNOWN);

P.S:If  we use NO RANGE or NO CASE  - then all values not in this range will be in a single partition.
If we specify UNKNOWN, then all null values will be placed in this partition

Can we define PARTITION PRIMARY INDEX (PPI) on a Primary Index column in Table? Explain Why?
Answer:

PPI cannot be defined on PI column in Table.  Since PI is used to locate an AMP and store the data based on HASH value (ROW ID ) it cannot be used as PPI column.
In case of PPI , The data stored in AMP's are Partitioned based on PPI column after they are row hashed  (ROW KEY = ROW ID +PPI value )

What is Cross Join?
Answer:

It is a Teradata specified Join, which is used as equivalent to product join.
There is no “On” clause in case of CROSS join
SELECT  EMP.ename , DPT.Dname
FROM     employee EMP
CROSS JOIN
Department DPT
WHERE
EMp.deptno = DPT.depto ;



What does SLEEP function does in Fast load?
Answer

The SLEEP command specifies the amount minutes to wait before retrying to logon and establish all sessions.
Sleep command can be used with all load utilities not only fastload.
This situation can occur if all of the loader slots are used or if the number of requested sessions is not available. The default value is 6 minutes. If tenacity was set to 2 hours and sleep 10 minutes, Sleep command will try to logon for every 10 minutes up to 2 hours duration.


What is TENACITY? What is its default value?
Answer

TENACITY specifies the amount of time in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. The default for Fast Load is “no tenacity”, meaning that it will not retry at all. If several FastLoad jobs are executed at the same time, we recommend setting the TENACITY to 4, meaning that the system will continue trying to logon for the number of sessions requested for up to four hours.


How to skip 1st record while using Bteq IMPORT?
Answer:

By using SKIP=1 ;   , we can skip first record.
 .import infile=<filename>, skip=1;

How many types of Index are present in teradata?
Answer:

There are 5 different indices present in Teradata
1. Primary Index
      a.Unique primary index
      b. non Unique primary index
2. Secondary Index
      a. Unique Secondary index
      b. non Unique Secondary index
3. Partitioned Primary Index
      a. Case partition (ex. age, salary...)
      b. range partition ( ex. date)
4. Join index
      a. Single table join index
      b. multiple table join index
      c. Sparse Join index ( constraint applied on join index in where clause)
5. Hash index
   

Monday, December 13, 2010

Sparse Index ( SPARSE JOIN INDEX)

It is type of join index which contains a WHERE clause   , that  reduces number of rows participating  from the base tables.
All types of join indexes, including single table, multitable, simple or aggregate can be sparse.

By default , a given join index will have a NUPI defined on the first column if it is not specified in create JOIN index statement.   We can explicitly mention a column to be NUPI for the join Index.

Any combination of AND, OR, IN  (LOGICAL) conditions  can be used along with  WHERE clause in Sparse Index.

Check the following example for Sparse Join Index.
CREATE      JOIN INDEX DEV.testjoin AS
sel  A.DEPTno  ,  b.sal
FROM 
dept  A
INNER JOIN
 employee B
ON  A.deptno=b.dept
WHERE a.deptno=20   --- sparse index
PRIMARY INDEX (sal);
 
In the following DDL for Join INDEX,
- Sal is explicitly mentioned as NUPI
- a filter clause is applied on Dept.DeptNO = 20, which reduces the number of rows  ( an Ideal case of Sparse Index )


P.s:  Join Index never allows a Unique Index to be created.

Wednesday, December 8, 2010

Peformance tuning Tips : LIKE Clause


While tuning queries in Teradata , We take care of major performance issues but ignore small cases which might still cause the query to perform badly.

I wanted to mention about one such case of  LIKE clause , which many people good at performance tuning miss it  assuming like patterns does not harm the performance . But in reality this is not so!!

If LIKE is used in a WHERE clause, it is better to try to use one or more leading character in the clause, if at all possible.

eg;   LIKE '%STRING%'  will be processed differently compared to LIKE 'STRING%'

If a leading character   'STRING%'   is used in the begining of like clause , the the Qptimizer makes use of an index to perform on query thereby increasig the performance

But if the leading character' in '%STRING%' is a wildcard(say '%') , then the Optimizer will not be able to use an index, and a full table scan  (FTS ) must be run, which reduces performance and takes more time.

Hence it is suggested to go for '%STRING%'   only if STRING is a part of entire pattern say 'SUBSTRING'

Rank vs Row_Number in Teradata

Anyone working on Teradata would be aware of the two most common OLAP functions in Teradata which are used for common use i.e.
  1. Rank()
  2. Row_Number()

  In one of my posts I have mentioned about how to generate sequence number (ex. Surrogate Key) using Row_Number.   Today let us find out the difference in usage of rank and row number functions by taking following scenario on tables (sales_daily)


SELECT    
item_id,
sales_date,
Actual_sales,
RANK() OVER (ORDER  BY Actual_sales DESC)  ,                                  -- RANK function
ROW_NUMBER() OVER (ORDER     BY Actual_sales DESC)                   -- ROW_NUMBER function
FROM   Sales_daily
WHERE    
sales_date BETWEEN DATE  '2010-01-01'     AND DATE '2010-03-01';


The result set is as follows:
Item_id
Sales_date
Actual_sales
Rank
Row_Number()
120
4/1/2010
550
1
1
120
4/2/2017
550
1
2
120
4/2/2020
450
3
3
120
4/2/2006
350
4
4
120
4/2/2027
350
4
5
120
4/1/2005
350
4
6
120
4/1/2003
250
7
7
120
4/2/2003
250
7
8
120
4/1/2025
200
9
9
120
4/1/2002
200
9
10
120
4/1/2021
150
11
11
120
4/2/2001
150
11
12
120
4/1/2001
150
11
13
120
4/1/1931
100
14
14


In the result set, we can observe the following,  
1)       Rows 1 and 2 have same rank assigned  since that have same actual_sales value
2)       Rows 1 and 2 have different row numbers assigned even though same actual_sales value
      3)   Row 3 has rank “3” assigned as ranking positions

Hence , 
rows are always assigned an incremented sequence number with ROW_NUMBER.
Rows are always assigned the same number with RANK function.

Thursday, November 25, 2010

Types of Teradata Joins

Teradata joins

When we join two or more tables on a column or set of columns, Joining takes place. This will result in data resulting from matching records in both the tables. This Universal concept remains the same for all the databases.
In Teradata, we have Optimizer (a very smart Interpreter), which determines type of join strategy to be used based on user input taking performance factor in mind.

In Teradata, some of common join types are used like
- Inner join (can also be "self join" in some cases)
- Outer Join (Left, Right, Full)
- Cross join (Cartesian product join)

When User provides join query, optimizer will come up with join plans to perform joins. These Join strategies include
- Merge Join
- Nested Join
- Hash Join
- Product join
- Exclusion Join


 Merge Join
--------------------

Merge join is a concept in which rows to be joined must be present in same AMP. If the rows to be joined are not on the same AMP, Teradata will either redistribute the data or duplicate the data in spool to make that happen based on row hash of the columns involved in the joins WHERE Clause.
            If two tables to be joined have same primary Index, then the records will be present in Same AMP and Re-Distribution of records is not required.

There are four scenarios in which redistribution can happen for Merge Join
Case 1: If joining columns are on UPI = UPI, the records to be joined are present in Same AMP and redistribution is not required. This is most efficient and fastest join strategy
Case 2: If joining columns are on UPI = Non Index column, the records in 2nd table has to be redistributed on AMP's based on data corresponding to first table.
Case 3: If joining columns are on Non Index column = Non Index column , the both the tables are to be redistributed so that matching data lies on same amp , so the join can happen on redistributed data.  This strategy is time consuming since complete redistribution of both the tables takes across all the amps
Case 4: For join happening on Primary Index, If the Referenced table (second table in the join) is very small, then this table is duplicated /copied on to every AMP.

 Nested Join
-------------------
Nested Join is one of the most precise join plans   suggested by Optimizer .Nested Join works on UPI/USI used in Join statement and is used to retrieve the single row from first table . It then checks for one more matching rows in second table based on being used in the join using an index (primary or secondary) and returns the matching results.

Example:

Select EMP.Ename , DEP.Deptno, EMP.salary
from
EMPLOYEE EMP ,
DEPARTMENT DEP
Where EMP.Enum = DEP.Enum
and EMp.Enum= 2345;    -- this results in nested join

 Hash join
----------------
Hash join is one of the plans suggested by Optimizer based on joining conditions. We can say Hash Join to be close relative of Merge based on its functionality. In case of merge join, joining would happen in same amp.   In Hash Join, one or both tables which are on same amp are fit completely inside the AMP's Memory   . Amp chooses to hold small tables in its memory for joins happening on ROW hash.

Advantages of Hash joins are
1. They are faster than Merge joins since the large table doesn’t need to be sorted.
2. Since the join happening b/w table in AMP memory and table in unsorted spool, it happens so quickly.

 Exclusion Join
-------------------------

These type of joins are suggested by optimizer when following are used in the queries
- NOT IN
- EXCEPT
- MINUS
- SET subtraction operations


Select EMP.Ename , DEP.Deptno, EMP.salary
from
EMPLOYEE EMP
WHERE EMP.Enum NOT IN
( Select Enum from
DEPARTMENT DEP
where Enum is NOT NULL );

Please make sure to add an additional WHERE filter “with <column> IS NOT NULL” since usage of NULL in a NOT IN <column> list will return no results.

Exclusion join for following NOT In query has 3 scenarios
Case 1: matched data in "NOT IN" sub Query will disqualify that row
Case 2: Non-matched data in "NOT IN" sub Query will qualify that row
Case 3: Any Unknown result in "NOT IN" will disqualify that row - ('NULL' is a typical example of this scenario). 

How to Export without junk characters in FastExport?

How to Export without junk characters in FastExport?

Solution1: Create a small OUTMOD to strip off the two-byte VARCHAR length field.

Solution2: Another alternative if you are in a UNIX environment is to use AWK or SED or CUT the file after it has been output to strip the leading to characters from each record.

STEP1: Write the query in FASTEXPORT file as follows

SELECT '|'||Col1||';'||Col2 from TAB1; -- Here '|' pipe operator is used as delimiter.
So result generated will be as follows
^A|Col1;Col2

"^A" is a junk value generated during EXPORT.

STEP2: Write UNIX script and pass filename as first parameter

#####UNIX SOLUTION #############
##create a file trim.ksh ############
## Run the file trim.ksh <user_filename>###

FileName=$1
cat $FileName | cut -f2- -d '|' > $FileName.tmp
mv $FileName.tmp $Join_Export.txt1

The Result after cutting all junk characters till occurrence of PIPE operation will be as follows
Col1;Col2

Solution 3: by casting Entire result to CHAR

When we are exporting a table in record mode , Please make sure that data exported does not contain extra 2 characters by casting entire result to CHAR().

For example: if table1 has 2 columns, Col1 is of type char (6), Col2 is of type char (2)
Write the query as follows:

Select cast (c1 || c2) as char (8) from table1;

Note: If we write query like “select c1 || c2 from table1",
The records exported will contain 2 byte length at the beginning, because concatenation returns VARCHAR () type not CHAR () type.