Monday, August 23, 2010

Teradata interview questions PART 1



How many error tables are there in FastLoad and MultiLoad, and what is their significance/use?
Can we see the data of the error tables?
When a MultiLoad job fails, can we access the MultiLoad tables? If yes, then how?
Answers:

FastLoad uses 2 error tables
ET TABLE 1: holds rows whose data format is not correct.
ET TABLE 2: holds rows that violate the UPI (unique primary index).
ET TABLE 1 keeps only the error field name, the error code and the data parcel.


MultiLoad also uses 2 error tables (ET and UV), 1 work table and 1 log table
1. ET TABLE - Data error
MultiLoad uses the ET table, also called the Acquisition Phase error table, to store data errors found during the acquisition phase of a MultiLoad import task.

2. UV TABLE - UPI violations
MultiLoad uses the UV table, also called the Application Phase error table, to store data errors found during the application phase of a MultiLoad import or delete task.


Apart from the error tables, it also has work and log tables
3. WORK TABLE - WT
MultiLoad loads the selected records into the work table before they are applied to the target table.

4. LOG TABLE
A log table maintains a record of all checkpoints related to the load job. It is mandatory to specify a log table in a MultiLoad job. This table is what allows the job to restart from the last checkpoint after an abort or restart for any reason.
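
The split between the two error tables can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the function name, the two-column record layout and the validation rules are hypothetical, and real MultiLoad internals are far more involved.

```python
# Hypothetical sketch: names and rules are illustrative, not MultiLoad internals.

def load_with_error_tables(records, target=None):
    """Route bad records to ET (data errors) or UV (uniqueness violations)."""
    target = {} if target is None else target  # UPI value -> amount
    et_table, uv_table, work_table = [], [], []

    # Acquisition phase: check the data format and stage good records.
    for cust_id, amount in records:
        try:
            work_table.append((int(cust_id), float(amount)))
        except ValueError:
            et_table.append((cust_id, amount))  # data error -> ET table

    # Application phase: apply staged rows to the target table.
    for cust_id, amount in work_table:
        if cust_id in target:
            uv_table.append((cust_id, amount))  # UPI violation -> UV table
        else:
            target[cust_id] = amount

    return target, et_table, uv_table
```

Calling it with a mix of malformed and duplicate records returns the loaded rows together with the contents of both error tables, which is the information you would look for after a real job.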


How does Teradata make sure that no duplicate rows are inserted into a SET table?
Answers:

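Teradata sends the newly inserted row to the target AMP according to its PI (on the basis of its row hash value). If it finds rows with the same row hash value on that AMP (the same PI value or hash synonyms), it compares the whole row against each of them to find out whether it is a duplicate.
If it is a duplicate, the row is not stored. An INSERT ... SELECT skips the duplicate silently without throwing any error, whereas a single-row INSERT is rejected with a duplicate row error.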
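
A minimal Python sketch of this check, assuming a made-up `SetTableSketch` class and using CRC32 as a stand-in for Teradata's real hashing algorithm (which is different):

```python
import zlib


class SetTableSketch:
    """Toy model of a SET table spread over a few AMPs (hypothetical)."""

    def __init__(self, n_amps=4):
        # One dict per AMP: row hash -> rows stored under that hash.
        self.amps = [{} for _ in range(n_amps)]

    @staticmethod
    def row_hash(pi_value):
        # Stand-in hash function, not Teradata's actual algorithm.
        return zlib.crc32(repr(pi_value).encode("utf-8"))

    def insert(self, row, pi_index=0):
        """Return True if the row was stored, False if it was a duplicate."""
        h = self.row_hash(row[pi_index])
        amp = self.amps[h % len(self.amps)]      # the row hash picks the AMP
        same_hash_rows = amp.setdefault(h, [])
        if row in same_hash_rows:                # full-row comparison
            return False
        same_hash_rows.append(row)
        return True
```

Only rows that share a row hash are compared in full, which is why a highly non-unique PI on a SET table makes inserts more expensive: every new row has to be checked against all existing rows with the same hash.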


What is Basic Teradata Query (BTEQ)?
Answers:

1. It allows us to write SQL statements along with BTEQ commands. We can use BTEQ for importing, exporting and reporting purposes.
2. BTEQ commands start with a dot (.) and can be terminated with a semicolon (;), although the semicolon is not mandatory.
3. BTEQ assumes that anything written without a leading dot is a SQL statement, and it requires a semicolon (;) to terminate it.


What are cliques? What is a vdisk and how does it communicate with physical data storage when data is retrieved through an AMP?
Answers:

A clique is a set of Teradata nodes that share a common set of disk arrays. Cabling a subset of nodes to the same disk arrays creates a clique.

Each AMP vproc must have access to an array controller, which in turn accesses the physical disks. AMP vprocs are associated with one or more ranks (or mirrored pairs) of data. The total disk space associated with an AMP is called a vdisk. A vdisk may have up to three ranks. Hence the vdisk communicates with physical storage through the array controllers.


What is the difference between MultiLoad and FastLoad in terms of performance?
Answers:

If you want to load an empty table, use FastLoad. For this case it is more useful than MultiLoad because FastLoad loads the data in 2 phases and does not need a work table.
So it is faster, and it follows the steps below to load the data into the table
Phase1 - Acquisition: the records are sent in blocks to the AMPs. Each AMP hashes the records it receives and redistributes them to the appropriate AMPs, without sorting them.
Phase2 - Application: after the END LOADING command is given, each AMP sorts its rows by row hash and writes them to the target table.
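
The two FastLoad phases can be sketched in Python as follows. This is a simplified model under stated assumptions: it follows the usual documented description (rows are hashed and redistributed in phase 1, then sorted by row hash in phase 2), and `row_hash`, `fastload_sketch`, the block size and the CRC32 hash are all hypothetical stand-ins.

```python
import zlib


def row_hash(pi_value):
    # Stand-in hash function, not Teradata's actual algorithm.
    return zlib.crc32(repr(pi_value).encode("utf-8"))


def fastload_sketch(records, n_amps=4, block_size=2):
    """Return one list of (row_hash, row) pairs per AMP after both phases."""
    owned = [[] for _ in range(n_amps)]

    # Phase 1: records travel in blocks. Whichever AMP receives a block
    # hashes each row and forwards it to the AMP that owns that hash.
    blocks = [records[i:i + block_size]
              for i in range(0, len(records), block_size)]
    for block in blocks:
        for row in block:
            h = row_hash(row[0])             # first column plays the PI
            owned[h % n_amps].append((h, row))

    # Phase 2: each AMP sorts its own rows by row hash before storing them.
    return [sorted(rows, key=lambda pair: pair[0]) for rows in owned]
```

No work table appears anywhere in the sketch, which mirrors the reason given above for FastLoad being the quicker choice for empty tables.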

MultiLoad:
It does the loading in 5 phases
Phase1 - Preliminary: it checks the script, logs on the sessions and sets up the work and error tables.
Phase2 - DML Transaction: the DML statements from the script are sent to the database and stored in the work table.
Phase3 - Acquisition: it reads the records from the import file, stores them in the work table and locks the target table header.
Phase4 - Application: the DML operations are applied to the target tables.
Phase5 - Cleanup: the table locks are released and the work tables are dropped.

Which two statements are true about a foreign key?
Answers:

1. Each Foreign Key must exist as a Primary Key.
2. Foreign Keys can change values over time.
Both statements are true: every foreign key value must match a primary key value in the referenced table, and a foreign key value may be changed later as long as the new value also exists as a primary key.
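
Both properties can be tried out with Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in, not as a model of Teradata's referential integrity implementation, and the `department` and `employee` tables are hypothetical.

```python
import sqlite3


def foreign_key_demo():
    """Return (updated FK value, whether an orphan FK was rejected)."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    con.execute("CREATE TABLE department (dept_no INTEGER PRIMARY KEY)")
    con.execute(
        "CREATE TABLE employee ("
        " emp_no INTEGER PRIMARY KEY,"
        " dept_no INTEGER REFERENCES department(dept_no))"
    )
    con.executemany("INSERT INTO department VALUES (?)", [(10,), (20,)])
    con.execute("INSERT INTO employee VALUES (1, 10)")

    # A foreign key value can change over time (employee moves department).
    con.execute("UPDATE employee SET dept_no = 20 WHERE emp_no = 1")
    changed = con.execute(
        "SELECT dept_no FROM employee WHERE emp_no = 1"
    ).fetchone()[0]

    # A foreign key value must exist as a primary key: 99 does not.
    try:
        con.execute("INSERT INTO employee VALUES (2, 99)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    con.close()
    return changed, rejected
```

The update succeeds because department 20 exists, while the insert that references the non-existent department 99 raises an integrity error.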

How does indexing improve query performance?
Answers:

Indexing is a way to organize access to the records so that frequently used queries run faster.

The index can be used as a pointer into the large table. It helps to locate the required row quickly and then return it to the user.
or
The frequently used queries need not hit the large table for data at all. They can get what they want from the index itself (covered queries).

An index comes with the overhead of maintenance. Teradata maintains its indexes by itself. Each time an insert/update/delete is done on the table, the indexes also need to be updated and maintained.

Indexes cannot be accessed directly by users. Only the optimizer has access to the index.
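
The trade-off between faster lookups and maintenance overhead can be sketched in plain Python. The `IndexedTable` class is hypothetical and models a generic value-to-position index, not Teradata's hash-based index structures.

```python
class IndexedTable:
    """Toy table with one index on a chosen column (hypothetical)."""

    def __init__(self, key_col=0):
        self.key_col = key_col
        self.rows = []    # the base table
        self.index = {}   # column value -> positions of matching rows

    def insert(self, row):
        self.rows.append(row)
        # Maintenance overhead: every insert must also update the index.
        position = len(self.rows) - 1
        self.index.setdefault(row[self.key_col], []).append(position)

    def full_scan(self, value):
        # Without an index: examine every row in the table.
        return [r for r in self.rows if r[self.key_col] == value]

    def index_lookup(self, value):
        # With an index: jump straight to the matching rows.
        return [self.rows[pos] for pos in self.index.get(value, [])]
```

Both access paths return the same rows, but the scan touches every row while the lookup touches only the matches. The cost is the extra work done in `insert`.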

What is a common data source for the central enterprise data warehouse?
Answers:

ODS => Operational Data Store


What are the enhanced features in Teradata V2R5 and V2R6?
Answers:

V2R6 included the replication feature, in which a copy of the database is available on another system. This means V2R6 provides additional data protection compared to V2R5 in case the data on one system is lost.


Where is the GRANT table option usually done? When tables are newly created, what is the default role and what the default privileges which get assigned?
Answers:

The GRANT option for any particular table depends on the privileges of the user. If it is an admin user, you can grant privileges at any point in time.
The default roles associated with newly created tables depend on the schema in which they are created.


What is error table? What is the use of error table?
Answers:

The Error Table contains information concerning:
- Data conversion errors, constraint violations and other error conditions:
* Contains rows which failed to be manipulated due to constraint violations or translation errors.
* Captures rows that contain duplicate values for UPIs.
* It logs errors and exceptions that occur during the apply phase.
* It logs errors that occur during the acquisition phase.


What is optimization and performance tuning and How does it work in practical projects. Explain with an example.
Answers:

Optimization is the technique of selecting the least expensive plan (fastest plan) for the query to fetch results.
How well a query can be optimized depends on the availability of:
1. CPU resources
2. System resources - AMPs, PEs etc.
Teradata performance tuning is the technique of improving the process so that a query performs faster with minimal use of CPU resources.


Does SDLC changes when you use Teradata instead of Oracle?
Answers:

If Teradata is going to be used only as a database, it won't change the system development life cycle (SDLC).
If you are going to use the Teradata utilities, then it will change the architecture or SDLC.
If your schema is going to be in 3NF, then there won't be a huge change.


What is an optimization and performance tuning and how does it really work in practical projects?
Answers:

Performance tuning and optimization of a query involves collecting statistics on join columns, avoiding cross product joins, selecting an appropriate primary index (to avoid skew in storage) and using secondary indexes.
Avoiding a NUSI where possible is advisable, since NUSI access is an all-AMP operation.
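
The effect of primary index choice on skew can be estimated with a short Python sketch. It rests on several assumptions: SHA-256 stands in for Teradata's hash function, the AMP count is arbitrary, and the skew factor is computed as (1 - average/maximum) * 100, which is one commonly used definition.

```python
import hashlib
from collections import Counter


def amp_distribution(pi_values, n_amps=4):
    """Count how many rows each AMP would receive for a candidate PI."""
    def amp_for(value):
        # Stand-in hash function, not Teradata's actual algorithm.
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % n_amps

    counts = Counter(amp_for(v) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(n_amps)]


def skew_factor(rows_per_amp):
    """Skew as (1 - avg/max) * 100; 0 means perfectly even distribution."""
    biggest = max(rows_per_amp)
    if biggest == 0:
        return 0.0
    average = sum(rows_per_amp) / len(rows_per_amp)
    return (1 - average / biggest) * 100
```

Comparing a nearly unique column (such as a customer id) with a low-cardinality column (such as gender) shows why the former makes the better primary index: the low-cardinality column can occupy at most as many AMPs as it has distinct values.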


What are two examples of an OLTP environment?
Answers:

The two examples of OLTP are:
1- ATM (automated teller machine)
2- POS (point of sale)
