Thursday, November 18, 2010

Teradata interview questions PART 5

Which is faster – MultiLoad delete or Delete command?

MultiLoad delete is faster then normal Delete command, since the deletion happens in data blocks of 64Kbytes, where as delete command deletes data row by row.  Transient journal maintains entries only for Delete command since Teradata utilities doesn’t support Transient journal loading.

How to Skip or Get first and Last Record from Flat File through MultiLoad?

In .IMPORT command in Mload we have a option to give record no. from which processing should begin. i.e. ‘FROM m’ ‘m’ is a logical record number, as an integer, of the record in the identified data source where processing is to begin. You can mention ’m’ as 2 and processing will  start from second record.

THRU k and FOR n are two options in the same Mload command, functions same towards the end of the processing.

Adding to the above, if from n"start record" and for n "stop record" are not mentioned, mload considers records from   start till the end of the file.

What do High confidence, Low confidence and No confidence mean in EXPLAIN plan?

Explain gives the execution strategy means what are the different steps that the query will go through.

HIGH CONFIDENCE:  Statistics are collected.
LOW CONFIDENCE:   Statistics are not collected.  But the where condition is having the condition on indexed column. Then estimations can be based on sampling.
NO CONFIDENCE: Statistics are not collected and the condition is on non indexed column.

What is the default join strategy in Teradata???
# 1
The Teradata Cost based optimizer will decide the join strategy based on the optimum path. The common strategies followed are from Merge, Hash & Nested Join

# 2 
Three strategies followed by optimizer are:
1. Duplication of rows of one table to every amp
--> This one is opted by optimizer when the non-PI column is on a small table.
2. Redistribution of the non PI rows of the table to the amp containing the matching PI row amp.
--> This one is opted when the tables are large. The non PI column is in one table is redistributed to the amp containing the matching PI.
3. Redistribute both the rows of the table by hash values.
--> This is chosen when the join is on a column that is not the PI in either table. This will require the most spool space.

Can we take collect stats on Derived Tables and Volatile tables and Temporary tables?
What is Global Temporary table?
What is the use of this Global Temporary table?

1. No for volatile and derived tables and yes for global tables.
2. Global tables are temp tables like volatile tables but unlike volatile tables, their definitions are retained in dd.
3. It is used whenever there is a need for a temporary table with same table definition for all users.

How to handle nulls in Teradata???
How many columns can be there in a table???
How to find average sal with out using avg function????
1. Use zeroifnull, nullifzero in select and NULL in insert directly.
2. 256 columns max per table.
3.  Without using "avg" we can find the avg salary by using sum (sal)/count (sal);
  sel sum(sal)/count(sal) as avgsal from tablename

Why Fload doesn’t support multiset table?

Fload does not support Multiset table because of following reason.

Say, the fastload job fails. Till the fastload failed, some number of rows was sent to the AMP's.
Now if you restart FLOAD, it would start loading record from the  last checkpoint and some of the consecutive rows are sent for the second time. These will be caught as duplicate rows are found after sorting of data.
   This restart logic is the reason that Fastload will not load duplicate rows into a MULTISET table. It assumes they are duplicates because of this logic. Fastload support Multiset table but does not support the duplicate rows. Multiset tables are tables that allow duplicate rows. When Fastload finds the duplicate rows it discards it. Fast Load can load data into multiset table but will not load the duplicate rows.

How many types of Skew exist?

If you utilized unequally TD resources (CPU,AMP,IO,Disk and etc) this is called skew exists. Major are 3 types of skews (CPU skew, AMP/Data skew, IO Skew).

-Data skew?
When data is not distributed equally on all the AMPs.
-Cpu skew?
Who is taking/consuming more CPU called cpu skew.
-IO skew?
Who perform more IO Operation? Resulting in IO Skew

Why does varchar occupy 2 extra bytes?

The two bytes are for the number of bytes for the binary length of the field.
It stores the exact no of characters stored in varchar

What is the difference between start schema and Fsldm? 

Star Schema
--> Demoralized for query performance
--> A star schema has one Fact tables and many dimensional tables to store the related.

FSLDM (Financial Services Logical Data Model)
--> 3rd normal form
-->Proposed and designed by NCR to cater the needs of financial sectors, especially for the Banking domain.
--> It has many Fact tables and dimension tables.
--> The FSLDM can be customized to cater to bank's specific needs.
Ex. The major fact tables include Party (Customers of the bank), Product (Services offered by the bank), Event (An event occurred ex: An ATM transaction), Agreement (A deal between
the party and the bank for a product) , etc..
Can you load multiple data files for same target table using Fastload?

Yes, we can Load a table using multiple datafiles in Fastload.

Before giving "end loading" statement user can define file path and use insert sql for multiple source files and give "end loading" statement at the end

Explain types of re-distribution of data happening for joining of columns from two tables
Case 1 - P.I = P.I joins
Case 2 - P.I = non Index joins
Case 3 -
non Index = non Index joins

Case1 - there is no redistribution of data over amp's. Since amp local joins happen as data are present in same AMP and need not be re-distributed. These types of joins on unique primary index are very fast.
Case2 - data from second table will be re-distributed on all amps since joins are happening on PI vs.
non Index column.  Ideal scenario is when small table is redistributed to be joined with large table records on same amp
case3 - data from both the tables are redistributed on all AMPs.  This is one of the longest processing queries , Care should be taken to see that stats are collected on these columns
How to skip the header row in the fastload script

RECORD 2;   /* this skips first record in the source file */
How to find out list of indexes in Teradata?

Nonpartitioned Primary
Partitioned Primary
join index
hash index
primary key
unique constraint
value ordered secondary
hash ordered ALL covering secondary
valued ordered ALL covering secondary
ordering column of a composite secondary index
Multi column statistics
Derived column partition statistics
field1 column of a join or hash index
field2 column of a join or hash index

databasename, tablename, columnname, indextype, indexnumber, indexname
databasename,  tablename, indexnumber;

How to identify PPI columns?

Select databasename , tablename , columnposition ,columnname from dbc.indices
where indextype ='Q'
order by 1 ,2,3 ;

What are the scenarios in which Full Table Scans occurs?

1. The where clause in SELECT statement does not use either primary index or secondary index
2. SQL Statement which uses a partial value (like or not like), in the WHERE statement.
3. SQL Statement which does not contain where clause.
4. SQL statement using range in where clause. Ex. (col1 > 40 or col1 < =10000)

What are Differences between Teradata and ANSI Session modes in Teradata?

1. Comparison is not Case sensitive
2. Create table are default to SET tables
3. Each transaction is committed implicitly
4. Supports all Teradata commands 
5. It follows BTET (Begin and End Transaction) Mode

1. Comparison is CASE sensitive
2. Create table are default to MULTISET tables
3. Each transaction has to be committed explicitly
4. Does not support all Teradata commands 
5. It does not follow BTET Mode

How do you create materialized view in Teradata?

There is no such thing as a "materialized view" in Teradata. The equivalent in Teradata would be a join index (or aggregate index) with a corresponding view put on top of it. The command to create one is "CREATE JOIN INDEX...(lots of options)".

Join indices are maintained when the data is maintained. They cannot be accessed directly, but are invoked when the optimizer determines it is beneficial. Even though you cannot invoke a join index directly, you create a view that looks like the join index and in that way, it would be similar to having direct access. However, this approach does not guarantee that the join index will be used.

What is difference B/w User and database in Teradata?

- User is a database with password but database cannot have password
- Both can contain Tables , views and macros
- Both users and databases may or may not hold privileges
- Only users can login, establish a session with Teradata database and they can submit requests

No comments:

Post a Comment