Teradata interview questions PART 2
Teradata interview questions PART 2 consolidated over period of time from various sources. Thanks! to the original posters
Teradata performance tuning and optimization
1. collecting statistics
2. Explain Statements
3. Avoid Product Joins when possible
4. select appropriate primary index to avoid skewness in storage
5. Avoid Redistribution when possible
6. Use sub-selects instead of big "IN" lists
7. Use derived tables
8. Use GROUP BY instead of DISTINCT ( GROUP BY sorts the data locally on the VPROC. DISTINCT sorts the data after it is redistributed)
9. Use Compression on large tables
How many codd's rules are satisfied by teradata database?
Answers: 12 codd’s rules.
What is the difference between Global temporary tables and Volatile temporary tables?
Global Temporary tables (GTT) –
1. When they are created, its definition goes into Data Dictionary.
2. When materialized data goes in temp space.
3. That's why, data is active upto the session ends, and definition will remain there up-to its not dropped using Drop table statement.If dropped from some other session then its should be Drop table all;
4. you can collect stats on GTT.
Volatile Temporary tables (VTT) -
1. Table Definition is stored in System cache
2. Data is stored in spool space.
3. thats why, data and table definition both are active only upto session ends.
4. No collect stats for VTT.If you are using volatile table, you can not put the default values on column level ( while creating table )
What is the difference between start schema and Fsldm?
FSLDM --> Financial Services Logical Data Model (Teradata FS-LDM 7.0) its developed by Teradata for the financial sectors (Specifically for Bank). it can be customised based on the user requirement.
StarSchema --> Its a relational database schema for representing multidimensional data. The data is stored in a central fact table, with one or more tables holding information on each dimension.
Join Strategies There are 2 tables, table A with 10 million records, table B has 100 million records, now we are joining both tables, when we seen Explain Plan the plan showing TD will took the table A and it will redistributes itNow the Question is: By that plan is the optimizer is correct job or not ? Justify Ans2. From the same above example now the optimizer is taking Table B (100 million records) and it is distributing it, Now is the optimizer is doing best? and How you avoid this situation
Teradata is smart enough to decide when to redistribute and when to copy.
It compares the tables. Are they comparable? or one is big as compared to the other?Based on simple logic it decides whether to distribute the smaller table on all the AMPs or to copy.
what I mean is the small table is copied into all the AMPs in the SPOOL space.Remember all always the Join's has to take place on the AMPs SPOOL Space.By redistributing it is making sure that the 100 million rows table gets the feeling that
it is making AMP local JOIN. Remember the basic thing what ever Teradata does.
It does keeping in consideration for Space and Performance and not to forget the Efficiency.
My simple formula:
If the table is small redistribute them to all the AMPs to have the AMP local Join.
Always JOINs are made AMP local if it cannot then you have the high chance of running out of SPOOL space.
What are set tables and multiset tables in Teradata?Explain with an appropriate example?
1) Set tables cannot accept duplicate at row level not Index or key level.
Example of rows for set table:
R1 c2 c3 c4 ..cn
1 2 3 4 ... 9
1 2 3 4 ... 9
Duplicate is Rejected
2 1 2 4 ... 9
3 2 4 4 ... 9
4 3 4 4 ... 9
2) Multi set Tables can accept
duplicate at row level not Index or key level.Exmaple of rows for multi set table:
R1 c2 c3 c4 .. cn
1 2 3 4 ... 9
1 2 3 4 ... 9
Duplicate is Accepted
2 1 2 4 ...9
3 2 4 4 ...9
3 2 4 4 ...9
Duplicate is Accepted
What is FILLER command in Teradata?
While using the mload of fastload if you don't want to load a particular filed in the datafile to the
target then use this filler command to achieve this
How a Referential integrity is handled in Teradata?
By use of TPUMP utility, referential integrity is handled in teradata
Explain about Skew Factor?
The data distribution of table among AMPs is called Skew Factor . Generally For Non-Unique PI we get duplicate values so the more duplicate vales we get more the data have same row hash so all the same data will come to same amp, it makes data distribution inequality,One amp will store more data and other amp stores less amount of data, when we are accessing full table,
The amp which is having more data will take longer time and makes other amps waiting which leads processing wastage In this situation (unequal distribution of data)we get Skew Factor HighFor this type of tables we should avoid full table scans.
ex:AMP0 AMP110000(10%) 9000000(90%)in this situation skew factor is very high 90%
Why AMP & PE are called Vprocs?
Vprocs:Virtual process From PE to AMP (This is the network root via MSP(message passing layer),The processing data will store in Disks(These are Physical disks),Each Amp have too many P.disks,to migrate these P.disks The Bynet Network maintains Virtual disks.These V.disks will responsible for data migration.hence they are called as Virtual Process(VPROCS).
Why Fload doesn’t support multiset table?
Fast Load will load the Data Fastly,Mean Concurrently,Bulk Insert Internally TeraData server will Create Bulk insert DML statement (Like: insert into Table1 as sel * from Table2 with data) For Fast Performance it will not support the duplicate values,Referential Integrity.
What is use of compress in teradata?Explain?
Compression is used to Minimize the table size, for example when the table size is increasing anonymously We can use Compression to reduce the size of the table
Conditions:1.Compression can be declared at the time of table creation2.We can compress up to 256 column values(not columns) 3.We can't compress variable length fields (vartext,varchar..)
for 2 condition:create table tab1(::Order_type char(25) compress ('air','sea','road'):)in the above example order type have 3 fields, one should be selected by the user, so one of the field will repeat for every order, like these column values we can use compress statement because these are repeating for
entire table, like these column values TD supports 256 col generally NCR people will provides ready made scripts for these type of compressions However, we can store only one value per column and not 3(air, sea, road). The compressed value is
stored in column header and is to be used as default of that column unless a value is present.
Dept of Taxes in Washington has a database that tracks all people working in Washington. Around 99.9% of the tax payers would have Washington as their state code on their address. Instead of storing “Washington” in millions of records the compress will store the value “Washington” in the table header. Now, a value can be assumed in a row as a default unless another value exists inside the column
What is the process to restart the multiload if it fails?
MULTILOAD will creates 2 error tables, 1 work table When MULTILOAD fails We have to unlock the Main Table, here error tables and work tables are not locked like FLOAD.
To Unlock the Main Table in case of acquisation Phase :
Mload RELEASE MLOAD ;
To release lock in application phase failure :
MLOADRELEASE MLOAD .IN APPLY;
Can we load a Multi set table using MLOAD?
We can Load SET, MULTISET tables using Mload, But here when loading into MULTISET table using MLOAD duplicate rows will not be rejected, we have to take care of them before loading.But in case of Fload when we are loading into MULTISET duplicate rows are automatically rejected, FLOAD will not load duplicate rows weather table is SET or MULTISET
Can I use “drop” statement in the utility “fload”?
YES,But you have to declare it out of the FLOAD Block it means it should not come between .begin loading,.end loading FLOAD also supports DELETE,CREATE,DROP statements which we have to declare out of FLOAD blocking the FLOAD Block we can give only INSERT
Is it possible that there r two primary key will be in a same table?
1. A table should have only one primary key
2. More than one column can consist of a primary key – upto 64 columns
3. Can not be NULL values (missing values)
4. Should be unique values (no duplicate value)
1. A table may have zero or more than that up-to 32 keys
2. More than one column can consist a primary key – up to 64 columns
3. Can have NULL values
4. Can have duplicate values
5. There should be a corresponding primary key in the parent table to enforce referential integrity for a foreign key
In a table can we use primary key in one column and in another column both unique and not null constrains.if yes how?
Yes, you can have a column for Primary key and have another column which will have no duplicates or null.e.g.A Salary Table will have employee ID as primary key.
The table also contains TAX-ID which can not be null or duplicate
What are the types of tables in Teradata ?
1. set table
2. multiset table
3. derived table
4. global temporary table(temporary table)
5. volatile table