Thursday, September 2, 2010

Teradata interview questions PART 3

Teradata interview questions PART 3 consolidated over period of time from various sources. Thanks! to the original posters

=================================================================

Can you recover the password of a user in Teradata?

Answers:

No, you can’t recover the password of a user in Teradata. Passwords are stored in this data dictionary table (DBC.DBASE) using a one-way encryption method. You can view the encrypted passwords using the following query

SELECT * FROM DBC.DBASE;

=================================================================

Explain Ferret Utility in Teradata?

Ferret (File Reconfiguration tool) is an utility which is used to display and set Disk Space Utilization parameters within Teradata RDBMS. When you select the Ferret Utility parameters, it dynamically reconfigures the data on disks. We can run this utility through Teradata Manager; to start the Ferret Utility type (START FERRET) in the database window.

Answers:
Following commands can be used within Ferret Utility:

1. SHOWSPACE – Well this command reports you the amount of Disk Cylinder Space is in use and the amount of Disk Cylinder Space is available in the system. This will give you an information about Permanent Space cylinders, Spool Space cylinders, Temporary Space cylinders, Journaling cylinders, Bad cylinders and Free cylinders. For each of these 5 things it will present you 3 parameters i.e. Average Utilization per cylinder, % of total avaliable cylinders and number of cylinders.
2. SHOWBLOCKS – This command will help you in identifying the Data Block size and the number of Rows per data block. This command displays the Disk Space information for a defined range of Data Blocks and Cylinders.

=================================================================

Explain TPUMP (Teradata Parallel Data Pump) Utility in Teradata?

Answers: 

* TPUMP allows near real time updates from Transactional Systems into the Data Warehouse.
* It can perform Insert, Update and Delete operations or a combination from the same source.
* It can be used as an alternative to MLOAD for low volume batch maintenance of large databases.
* TPUMP allows target tables to have Secondary Indexes, Join Indexes, Hash Indexes, Referential Integrity, Populated or Empty Table, Multiset or Set Table or Triggers defined on the Tables.
* TPUMP can have many sessions as it doesn’t have session limit.
* TPUMP uses row hash locks thus allowing concurrent updates on the same table.

=================================================================

How can you determine I/O and CPU usage at a user level in Teradata?

Answers: 

You can find out I/O and CPU Usage from this Data Dictionary Table DBC.AMPUSAGE;

SELECT ACCOUNTNAME, USERNAME, SUM(CPUTIME) AS CPU, SUM(DISKIO) AS DISKIO FROM DBC.AMPUSAGE GROUP BY 1,2 ORDER BY 3 DESC;

=================================================================

How can you find the Table Space Size of your table across all AMPs?

Answers:

You can find the Table Space Size of your table from this Data Dictionary Table DBC.TABLESIZE

SELECT DATABASENAME, TABLENAME, SUM(CURRENTPERM) FROM DBC.TABLESIZE WHERE DATABASENAME = ‘’ AND TABLENAME = ‘’ GROUP BY DATABASENAME , TABLENAME;

=================================================================
How can you find the Teradata Release and Version information from Data Dictionary Table?
Answers:


To find Release and Version information you can query this Data Dictionary table DBC.DBCINFO

SELECT * FROM DBC.DBCINFO;

=================================================================
How can you track Login Parameters of users in Teradata?
Answers: 


You can view all these parameters in this Data Dictionary Table DBC.LOGONOFF
SELECT LOGDATE, LOGTIME, USERNAME, EVENT FROM DBC.LOGONOFF;

=================================================================
How can you use HASH FUNCTIONS to view Data Distribution across all AMPs in Teradata?
Answers: 


Hash Functions can be used to view the data distribution of rows for a chosen primary index.

SELECT HASHAMP(HASHBUCKET(HASHROW())) AS “AMP#”, COUNT(*) FROM GROUP BY 1 ORDER BY 2 DESC;

HASHROW – returns the row hash value for a given value
HASHBUCKET – the grouping of a specific hash value
HASHAMP – the AMP that is associated with the hash bucket

This is really good, by looking into the result set of above written query you can easily find out the Data Distribution across all AMPs in your system and further you can easily identify un-even data distribution.

=================================================================
How do you transfer large amount of data in Teradata?
Answers: 


Transferring of large amount of data can be done using various Application Teradata Utilities which resides on the host computer ( Mainframe or Workstation) i.e. BTEQ, FastLaod, MultiLoad, Tpump and FastExport.

* BTEQ (Basic Teradata Query) supports all 4 DMLs: SELECT, INSERT, UPDATE and DELETE. BTEQ also support IMPORT/EXPORT protocols.
* Fastload, MultiLoad and Tpump transfer the data from Host to Teradata.
* FastExport is used to export data from Teradata to the Host.

=================================================================
How does Hashing happens in Teradata?

Answers: 

* Hashing is the mechanism through which data is distributed and retrieved to/from AMPs.
* Primary Index (PI) value of a row is the input to the Hashing Algorithm.
* Row Hash (32-bit number) value is the output from this Algorithm.
* Table Id + Row Hash is used to locate Cylinder and Data block.
* Same Primary Index value and data type will always produce same hash value.
* Rows with the same hash value will go to the same AMP.

So data distribution depends directly on the Row Hash uniqueness; be careful while Choosing Indexes in Teradata.

=================================================================
How to eliminate Product Joins in a Teradata SQL query?
Answers: 

 
1. Ensure statistics are collected on join columns and this is especially important if the columns you are joining on are not unique.
2. Make sure you are referencing the correct alias.
3. Also, if you have an alias, you must always reference it instead of a fully qualified tablename.
4. Sometimes product joins happen for a good reason. Joining a small table (100 rows) to a large table (1 million rows) a product join does make sense.


=================================================================
How to select first N Records in Teradata?
Answers: 


To select N records in Teradata you can use RANK function. Query syntax would be as follows
SELECT BOOK_NAME, BOOK_COUNT, RANK(BOOK_COUNT) A FROM LIBRARY QUALIFY A <= 10;

=================================================================
How to view every column and the columns contained in indexes in Teradata?
Answers:


Following query describes each column in the Teradata RDBMS
SELECT * FROM DBC.TVFields;

Following query describes columns contained in indexes in the Teradata RDBMS
SELECT * FROM DBC.Indexes;

=================================================================
What are the 5 phases in a MultiLoad Utility?
Answers:

* Preliminary Phase – Basic Setup
* DML Phase – Get DML steps down on AMPs
* Acquisition Phase – Send the input data to the AMPs and sort it
* Application Phase – Apply the input data to the appropriate Target Tables
* End Phase – Basic Cleanup

=================================================================
What are the functions of a Teradata DBA?
Answers:

Following are the different functions which a DBA can perform:
1. User Management – Creation and managing Users, Databases, Roles, Profiles and Accounts.
2. Space Allocation – Assigning Permanent Space, Spool Space and Temporary Space.
3. Access of Database Objects – Granting and Revoking Access Rights on different database objects.
4. Security Control – Handling logon and logoff rules for Users.
5. System Maintenance – Specification of system defaults, restart etc.
6. System Performance – Use of Performance Monitor(PMON), Priority Scheduler and Job Scheduling.
7. Resource Monitoring – Database Query Log(DBQL) and Access Logging.
8. Data Archives, Restores and Recovery – ARC Utility and Permanent Journals.

=================================================================
What are the MultiLoad Utility limitations?
Answers:

MultiLoad is a very powerful utility; it has following limitations:

* MultiLoad Utility doesn’t support SELECT statement.
* Concatenation of multiple input data files is not allowed.
* MultiLoad doesn’t support Arithmatic Functions i.e. ABS, LOG etc. in Mload Script.
* MultiLoad doesn’t support Exponentiation and Aggregator Operators i.e. AVG, SUM etc. in Mload Script.
* MultiLoad doesn’t support USIs (Unique Secondary Indexes), Refrential Integrity, Join Indexes, Hash Indexes and Triggers.
* Import task require use of PI (Primary Index).

=================================================================
What are TPUMP Utility Limitations?
Answers:

Following are the limitations of Teradata TPUMP Utility:
* Use of SELECT statement is not allowed.
* Concatenation of Data Files is not supported.
* Exponential & Aggregate Operators are not allowed.
* Arithmatic functions are not supported.

=================================================================
What is FILLER command in Teradata?
Answers:

While running Fastload or Multiload if you don’t want to load a particular field from the datafile to the target table then use the FILLER command to achieve this. Syntax for FILLER command would be as following:

.LAYOUT FILE_PRODUCT; /* It is input file layout name */
.FIELD Prod_No * char(11); /* To load data into Prod_No */
.FIELD Prod_Name * char(11); /* To load data into Prod_Name */
.FIELD Location * char(11); /* To load data into Location */
.FILLER Prod_Chars * char(20); /* To skip the value for the next 5 locations */

=================================================================
What is the difference between Access Logging and Query Logging in Teradata?
Answers:

1. Access Logging is concerned with security (i.e. who’s is doing what). In access logging you ask the database to log who’s doing what on a given object. The information stored is based on the object not the SQL fired or the user who fired it.
2. Query Logging (DBQL) is used for debugging (i.e. what’s happening around ?). Incase of DBQL database keep on tracking various parameters i.e. the SQLs, Resource, Spool Usage, Steps and other things which help you understand what’s going on, the information is fruitful to debug a problem. Further DBQL is enabled on a User id rather than an object like say Table or so.

=================================================================
What is the difference between Sub-Query & Co-Related Sub-Query?
Answers: 


When queries are written in a nested manner then it is termed as a sub-query. A Sub-Query get executed once for the parent statement whereas Co-Related Sub-Query get executed once for each row of the parent query.

Select Empname, Deptno, Salary from Employee Emp where Salary = (Select Max(Salary) from Employee where Deptno = Emp.Deptno) order by Deptno

=================================================================
What is Reconfiguration Utility in Teradata and What it is used for?
Answers:

* When we feed Primary Index value to Hashing Algorithm then it gives us Row Hash(32 bit number) value which is used to make entries into Hash Maps.
* Hash Maps are the mechansim for determining which AMP will be getting that row.
* Each Hash Map is an array of 65,536 entries and its size is close to 128KB.

When Teradata is installed on a system then there are some scrpits which we need to execute i.e. DIP Scripts. So it creates a Hash Maps of 65,536 entries for the current configuration. But what if you want to add some more AMPs into your system?

Reconfiguration (Reconfig) is a technique for changing the configuration (i.e. changing the number of AMPs in a system) and is controlled by the Reconfiguration Hash Maps. System builds Reconfiguration Hash Maps by reassigning hash map entries to reflect new configuration of system.

Lets understand this concept with the help of an example; suppose you have a 4 AMPs system which holds 65,536 entries. Each AMP is responsible for holding (65,536/4=16,384) 16,384 entries.

Now you have added 2 more AMPs in your current configuration so you need to reconfigure your system. Now each AMP would be responsible for holding (65,536/6=10922) 10,922 entries.

=================================================================

2 comments:

  1. diff b/n explain select coommend nd select commend which is faster

    ReplyDelete
    Replies
    1. Explain select command will display the explain plan of the query only .

      Select command will execute and provide the output as per explain plan generated by optimizer.

      Br,
      Vinay SHet

      Delete