Saturday, August 28, 2010

Teradata Timestamp tips & tricks

Some tips which might be useful while working on date and timestamp. (queries run and verified on V2R6)

Tip 1: Create Timestamp from Date and Time


Tip 2: Convert timestamp field in Teradata into format 'MMM-YYYY'

Current TimeStamp(6)

Tip 3: Inserting current_timestamp value into timestamp(0) column

Create table stats(record_time TIMESTAMP(0));

Whenever we try to populate the current_timestamp value into this column using
insert into stats(record_time) select current_timestamp ; we get following error.
7454: Date Time field overflow.

To avoid this, we have two solutions
1. Data type for CURRENT_TIMESTAMP is TIMESTAMP(6) and you are trying to insert a
TIMESTAMP(6) data to a TIMESTAMP(0) and that's why you are getting this error.
insert into stats(record_time)
Sel cast(CURRENT_DATE as TimeStamp(0))
((CURRENT_TIME - time '00:00:00') hour to second(0)) ;

2. insert into stats(record_time) select CURRENT_TIMESTAMP(0);

Tip 4: Add minutes to timestamp

How to add 10 minutes to current timestamp in TD?
The following example is taken straight from Teradata Functions and Operators reference manual.
Example 1: Subtract two TIMESTAMP Types

Consider the following table:
(start_time TIMESTAMP(0),end_time TIMESTAMP(0));
Determine the difference, specifying an Interval unit of DAY TO SECOND for the result:
SELECT (end_time - start_time) DAY(4) TO SECOND FROM BillDateTime;

The DAY(4) specifies four digits of precision, and allows for a maximum of 9999 days, or
Approximately 27 years.
The result is an Interval that looks like this:
5 16:49:20.340000

Tip 5: Convert varchar to timestamp
have a column in source strdate1 as VARCHAR(23) and the value being '20100310181010'.

Tip 6: Display Timestamp with AM /PM format
SELECT CAST(TIMESTAMP'2010-03-12 14:32:45' AS FORMAT 'yyyymmddbhh:mi:ssbt') (char(20));

Tip 7: Display timestamp in 22 Mar 2010 00:26:54.840  format
select cast(current_TIMESTAMP(3)  as timestamp(3) FORMAT 'ddbmmmbyyyybhh:mi:ss.s(3)')(char(25));

 Tip 8: Subtract two TIMESTAMP Types

Consider the following table:
CREATE TABLE BillDateTime (start_time TIMESTAMP(0),end_time TIMESTAMP(0) );

Determine the difference, specifying an Interval unit of DAY TO SECOND for the result:
SELECT (end_time - start_time) DAY(4) TO SECOND FROM BillDateTime;

The DAY(4) specifies four digits of precision, and allows for a maximum of 9999 days, or
Approximately 27 years.
The result is an Interval that looks like this:
5 16:49:20.340000

 Tip 9: Extract function on  DATE & TIME  
Extracting From Current Date(March 20,2010)
·               Query                                                                                      Result
  SELECT DATE;                                                                 10/03/20 (Default format)
   SELECT EXTRACT(YEAR FROM DATE);               2010
   SELECT EXTRACT(DAY FROM DATE);                  20
   SELECT EXTRACT(YEAR FROM DATE + 365);      2011
   SELECT EXTRACT(DAY FROM DATE + 12);          01

Extracting From Current Time ( 2:42 PM )
               Query                                                                           Result  
SELECT TIME;                                                                    14:42:32 (Default format)
SELECT EXTRACT(HOUR FROM TIME);                               14
SELECT EXTRACT(MINUTE FROM TIME);                            42

Monday, August 23, 2010

Export /Import using Teradata SQL assistant

please Use this option  for few reocrds only (say in 1000's) . For large data volumes use fastload (empty table loading) or Mload utilities

1. Login to the Source Database using Teradata SQL Assistant. ( QA / PROD)

2. Paste the Source SQL Query in the "Query" window.

3. Select the following Menu Option "File=>Export Results"

4. Go to "Tools=>Options" and select the Date Format as 'YYYY-MM-DD'.
P.S:  This option is used when a date column exists in the table. 

5. Run the Query (Press - F5) and Provide the Target Location where you want the output file to be saved:

6. Open the Saved File (ex. Test_Export.txt) and remove the First Line from the File Which contains all the column Names.

7. Login to the Target Database using Teradata SQL Assistant (Dev).

8. Select the following Menu Option "File=>Import Data"

9. Paste the SQL Insert Query in the "Query" window.
ex. Insert into TBENW_RVLTN_REPL_LOC_SKU_FW values (?,?,?,?,?);

NOTE: There should be a '?' as a Place-Holder for all the columns in your Source File.

10. Run the Query (Press - F5).

11. Turn off the Import Data mode and resume querying.

Importing Excel Data into Teradata Table

1. Make sure you have the columns in Excel in the same order as the table you created. Remove the header rows from the Excel file and Save the file as comma delimited file from Excel (CSV File).
2. In SQL Assistant set the Delimiter as ","(comma) .
( Delimiter is available in Menu Tools , Options, Import/Export)

3. Go to "Tools=>Options" and select the Date Format as 'YYYY-MM-DD'.
P.S:  This option is used when a date column exists in the table. 

4. From Teradata SQL Assistant, click 'File', then 'Import Data'.

5. Run the command: insert into TABLENAME values (?,?,?....) 
You must supply a ? (question mark) for every column. So, for a table with 5 columns the values command would look like: values (?,?,?,?,?).

6. Highlight the insert query and Press F5. This will prompt you for a file location. Select the appropriate file.

7. This will load in the order in which they appear in your text file, which is why it is important to match the columns in Excel with the order of the columns in your table.

8. Turn off the Import Data mode and resume querying.


1. Difference between Teradata and Other RDBMS?
2. Draw the picture of Teradata warehouse?
3. What are the differences between Star schema and Snow flake schema?
4. Differentiate Dimension table and Fact table?
5. What is Fact less fact table?
6. Describe the different types of dimension tables? And explain with
7. Explain Teradata important components with architecture?
8. Difference between SMP and MPP?
9. Explain the Shared nothing architecture?
10.What is a set table?
When we will use multiset table?
11.What is the usage of Fallback?
12.How many primary keys we will apply on a table?
13.How does rows store in Teradata?
14.Tell me the types of Indexes?
15.Difference between UPI and NUPI and the situation where exactly
We will use these.?
16.The role of SI in teradata and explain the types?
17. How do we create a join and hash indexes and explain the scenerios?
When we will use exactly?
18.What is the meaning of Transient journal overhead?
19.What is skewness in teradata?
20.What is a subtable and when it requires?
21.What is the usage of coalesce function?
22.When we will use null-if function?
23. Can you write the syntax of substring and case expression.?
24.What are the different types of tables?
25.Explain the differences between global temporary table and volatile table?
26.Describe the use of Explain command and its features?
27.Have you collected statistics in your project?
28.Difference between FastLoad and Multiload?
29.Difference between BTEQ export and Fast Export?
30.Difference between general Delete and Multiload delete?
31.What is the structure for importing a file in BTEQ?
32.What is the structure for importing a file in Fastload?
33.What is the structure for importing a file in Multiload?
34.What is the structure for importing a file in Tpump?
35.How do you pass multiple files in fastload and multiload?
36.Explain different phases of fastload?
37.Explain the phases in multiload?
38.Describe the error tables in fastload with structures?
39.Describe the error tables in multiload with structures?
40.Difference between multiload and Tpump?
41.What is the use of Error code and Activitycount in Bteq?
42.How do you run the scripts in unix environment and windows
43.How do you handle the errors in ER and UV tables?
44.How do you check the status of scripts ruined?
45.Tell me the limitations of multiload?
46.Tell me the limitations of fastload?
47.What is the usage of filler in multiload?
48.What is the DIF format and when we will use?
49.Difference between Subquery and Corelated sub query?
50.What is a node and how many node system you have in your project?
51.What are the various ways to execute a query in teradata?
52.What is the difference between procedure and function?
53.What is the difference between macro and procedure?
54.What is a view and how it is useful?If so have you created any
Views in your project?
55.What are join strategies?Explain clearly with examples?
56.Have you done any performance tuning in your project if so
How do start?
57.What the various ways to tune a teradata query?
58. How do you select an index for a table?
59.Write a query for finding second maximum salary?
60.Write a query for a.finding duplicate rows?
b.removing duplicate rows?
c.keeping in another table?
61.There are two tables called customer table and calls table.
If a customer made a call, that record should be there in calls table.
I need the details of customers who didn’t made any call?Write a query?
62.I have two tables called emp and dept.thers is relationship between
Deptcode. I need all deptcodes and their corresponding employees?
63.Write a view with more then one table?
64.How do you handle if multiload script got failed in acquisition phase?
65.How do you handle if multiload script got failed in apply phase?
66.How do you handle if fastload script got failed in first phase?
67.How do you handle if fastload script got failed in second phase?
68.How do you handle Bteq script failures?
69.How do you pass parameters to a fastexport script?
70.Describe the layout of multiload?
71.Describe the layout of fastloadjavascript:void(0)?
72.Describe the layout of fastexport?

Teradata interview questions PART 2

Teradata interview questions PART 2
Teradata interview questions PART 2 consolidated over period of time from various sources. Thanks! to the original posters

Teradata performance tuning and optimization

1. collecting statistics
2. Explain Statements
3. Avoid Product Joins when possible
4. select appropriate primary index to avoid skewness in storage
5. Avoid Redistribution when possible
6. Use sub-selects instead of big "IN" lists
7. Use derived tables
8. Use GROUP BY instead of DISTINCT ( GROUP BY sorts the data locally on the VPROC. DISTINCT sorts the data after it is redistributed)
9. Use Compression on large tables

How many codd's rules are satisfied by teradata database?
Answers: 12 codd’s rules.

What is the difference between Global temporary tables and Volatile temporary tables?

Global Temporary tables (GTT) –
1. When they are created, its definition goes into Data Dictionary.
2. When materialized data goes in temp space.
3. That's why, data is active upto the session ends, and definition will remain there up-to its not dropped using Drop table statement.If dropped from some other session then its should be Drop table all;
4. you can collect stats on GTT.

Volatile Temporary tables (VTT) -
1. Table Definition is stored in System cache
2. Data is stored in spool space.
3. thats why, data and table definition both are active only upto session ends.
4. No collect stats for VTT.If you are using volatile table, you can not put the default values on column level ( while creating table )

What is the difference between start schema and Fsldm?
FSLDM --> Financial Services Logical Data Model (Teradata FS-LDM 7.0) its developed by Teradata for the financial sectors (Specifically for Bank). it can be customised based on the user requirement.

StarSchema --> Its a relational database schema for representing multidimensional data. The data is stored in a central fact table, with one or more tables holding information on each dimension.

Join Strategies There are 2 tables, table A with 10 million records, table B has 100 million records, now we are joining both tables, when we seen Explain Plan the plan showing TD will took the table A and it will redistributes itNow the Question is: By that plan is the optimizer is correct job or not ? Justify Ans2. From the same above example now the optimizer is taking Table B (100 million records) and it is distributing it, Now is the optimizer is doing best? and How you avoid this situation

Teradata is smart enough to decide when to redistribute and when to copy.
It compares the tables. Are they comparable? or one is big as compared to the other?Based on simple logic it decides whether to distribute the smaller table on all the AMPs or to copy.
what I mean is the small table is copied into all the AMPs in the SPOOL space.Remember all always the Join's has to take place on the AMPs SPOOL Space.By redistributing it is making sure that the 100 million rows table gets the feeling that
it is making AMP local JOIN. Remember the basic thing what ever Teradata does.
It does keeping in consideration for Space and Performance and not to forget the Efficiency.

My simple formula:
If the table is small redistribute them to all the AMPs to have the AMP local Join.
Always JOINs are made AMP local if it cannot then you have the high chance of running out of SPOOL space.

What are set tables and multiset tables in Teradata?Explain with an appropriate example?
1) Set tables cannot accept duplicate at row level not Index or key level.
Example of rows for set table:
R1 c2 c3 c4
1 2 3 4 ... 9

1 2 3 4 ... 9
Duplicate is Rejected
2 1 2 4 ... 9
3 2 4 4 ... 9
4 3 4 4 ... 9

2) Multi set Tables can accept
duplicate at row level not Index or key level.Exmaple of rows for multi set table:
R1 c2 c3 c4 .. cn
1 2 3 4 ... 9
1 2 3 4 ... 9
Duplicate is Accepted
2 1 2 4 ...9
3 2 4 4 ...9
3 2 4 4 ...9
Duplicate is Accepted

What is FILLER command in Teradata?

While using the mload of fastload if you don't want to load a particular filed in the datafile to the
target then use this filler command to achieve this

How a Referential integrity is handled in Teradata?
 By use of  TPUMP utility,  referential integrity is handled in teradata

Explain about Skew Factor?

The data distribution of table among AMPs is called Skew Factor . Generally For Non-Unique PI we get duplicate values so the more duplicate vales we get more the data have same row hash so all the same data will come to same amp, it makes data distribution inequality,One amp will store more data and other amp stores less amount of data, when we are accessing full table,
The amp which is having more data will take longer time and makes other amps waiting which leads processing wastage In this situation (unequal distribution of data)we get Skew Factor HighFor this type of tables we should avoid full table scans.
ex:AMP0 AMP110000(10%) 9000000(90%)in this situation skew factor is very high 90%

Why AMP & PE are called Vprocs?

Vprocs:Virtual process From PE to AMP (This is the network root via MSP(message passing layer),The processing data will store in Disks(These are Physical disks),Each Amp have too many P.disks,to migrate these P.disks The Bynet Network maintains Virtual disks.These V.disks will responsible for data migration.hence they are called as Virtual Process(VPROCS).

Why Fload doesn’t support multiset table?

Fast Load will load the Data Fastly,Mean Concurrently,Bulk Insert Internally TeraData server will Create Bulk insert DML statement (Like: insert into Table1 as sel * from Table2 with data) For Fast Performance it will not support the duplicate values,Referential Integrity.

What is use of compress in teradata?Explain?

Compression is used to Minimize the table size, for example when the table size is increasing anonymously We can use Compression to reduce the size of the table
Conditions:1.Compression can be declared at the time of table creation2.We can compress up to 256 column values(not columns) 3.We can't compress variable length fields (vartext,varchar..)

for 2 condition:create table tab1(::Order_type char(25) compress ('air','sea','road'):)in the above example order type have 3 fields, one should be selected by the user, so one of the field will repeat for every order, like these column values we can use compress statement because these are repeating for
entire table, like these column values TD supports 256 col generally NCR people will provides ready made scripts for these type of compressions However, we can store only one value per column and not 3(air, sea, road). The compressed value is
stored in column header and is to be used as default of that column unless a value is present.
Dept of Taxes in Washington has a database that tracks all people working in Washington. Around 99.9% of the tax payers would have Washington as their state code on their address. Instead of storing “Washington” in millions of records the compress will store the value “Washington” in the table header. Now, a value can be assumed in a row as a default unless another value exists inside the column

What is the process to restart the multiload if it fails?

MULTILOAD will creates 2 error tables, 1 work table When MULTILOAD fails We have to unlock the Main Table, here error tables and work tables are not locked like FLOAD.

To Unlock the Main Table in  case of acquisation Phase :
To release lock in application phase failure :
Can we load a Multi set table using MLOAD?

We can Load SET, MULTISET tables using Mload, But here when loading into MULTISET table using MLOAD duplicate rows will not be rejected, we have to take care of them before loading.But in case of Fload when we are loading into MULTISET duplicate rows are automatically rejected, FLOAD will not load duplicate rows weather table is SET or MULTISET

Can I use “drop” statement in the utility “fload”?

YES,But you have to declare it out of the FLOAD Block it means it should not come between .begin loading,.end loading FLOAD also supports DELETE,CREATE,DROP statements which we have to declare out of FLOAD blocking the FLOAD Block we can give only INSERT

Is it possible that there r two primary key will be in a same table?

Primary key
1. A table should have only one primary key
2. More than one column can consist of a primary key – upto 64 columns
3. Can not be NULL values (missing values)
4. Should be unique values (no duplicate value)

Foreign key
1. A table may have zero or more than that up-to 32 keys
2. More than one column can consist a primary key – up to 64 columns
3. Can have NULL values
4. Can have duplicate values
5. There should be a corresponding primary key in the parent table to enforce referential integrity for a foreign key

In a table can we use primary key in one column and in another column both unique and not null constrains.if yes how?

Yes, you can have a column for Primary key and have another column which will have no duplicates or null.e.g.A Salary Table will have employee ID as primary key.
The table also contains TAX-ID which can not be null or duplicate

What are the types of tables in Teradata ?

1. set table
2. multiset table
3.  derived table
4. global temporary table(temporary table)
5. volatile table

Teradata interview questions PART 1

How many error tables are there in fload and Mload and what is their significance/use?
Can we see the data of error tables?
When mload job fails, can we access mload tables? If yes then how?

Fload uses 2 error tables
ET TABLE 1: where format of data is not correct.
ET TABLE 2: violations of UPI
it maintains only error field name, errorcode and data-parcel only.

Mload also uses 2 error tables (ET and UV), 1 work table and 1 log table
1. ET TABLE - Data error
MultiLoad uses the ET table, also called the Acquisition Phase error table, to store data errors found during the acquisition phase of a MultiLoad import task.

2. UV TABLE - UPI violations
MultiLoad uses the UV table, also called the Application Phase error table, to store data errors found during the application phase of a MultiLoad import or delete task

Apart from error tables, it also has work and log tables
Mload loads the selected records in the work table

A log table maintains record of all checkpoints related to the load job, it is essential l/ mandatory to specify a log table in mload job. This table will be useful in case you have a job abort or restart due to any reason.

How Teradata makes sure that there are no duplicate rows being inserted when its a SET table?

Teradata will redirect the new inserted row as per its PI to the target AMP (on the basis of its row hash value), and if it find same row hash value in that AMP (hash synonyms) then it start comparing the whole row, and find out if duplicate.
If it’s a duplicate it silently skips it without throwing any error.

What is basic Teradata query language?

1. It allows us to write SQL statements along with BTEQ commands. We can use BTEQ for importing, exporting and reporting purposes.
2. The commands start with a (.) dot and can be terminated by using (;), it is not mandatory to use (;).
3. BTEQ will assume any thing written with out a dot as a SQL statement and requires a (;) to terminate it.

What are cliques? What is Vdisk and how it will communicate with physical data storage at the time of data retrieval through AMP?

A clique is a set of Teradata nodes that share a common set of disk arrays. Cabling a subset of nodes to the same disk arrays creates a clique.

Each AMP vproc must have access to an array controller, which in turn accesses the physical disks. AMP vprocs are associated with one or more ranks (or mirrored pairs) of data. The total disk space associated with an AMP is called a vdisk. A vdisk may have up to three ranks. Hence Vdisk will communicate with physical storage through array controllers.

What is the difference between MultiLoad & Fast load interns of Performance?

If you want to load, empty table then you use the fastload, so it will very useful than the MultiLoad ,because fastload performs the loading of the data in 2phase and its no need a work table for loading the data .
So it is faster as well as it follows the below steps to load the data in the table
Phase1 - It moves all the records to the entire AMP first without any hashing
Phase2 - After giving end loading command, Amp will hashes the record and send it to the appropriate AMPS.

It does the loading in the 5 phases
Phase1 - It will get the import file and checks the script
Phase2 - It reads the record from the base table and store in the work table
Phase3 - In this Application phase it locks the table header
Phase4 - In the DML operation will done in the tables
Phase5 - In this table locks will be released and work tables will be dropped.

Which two statements are true about a foreign key?

Each Foreign Key must exist as a Primary Key.
Foreign Keys can change values over time.
First: True
Second: False
1. Foreign Keys can change values over time.
2. Each Foreign Key must exist as a Primary Key.

How does indexing improve query performance?

Indexing is a way to physically reorganize the records to enable some frequently used queries to run faster.

The index can be used as a pointer to the large table. It helps to locate the required row quickly and then return it back to the user.
The frequently used queries need not hit a large table for data. they can get what they want from the index itself. - cover queries.

Index comes with the overhead of maintenance. Teradata maintains its index by itself. Each time an insert/update/delete is done on the table the indexes will also need to be updated and maintained.

Indexes cannot be accessed directly by users. Only the optimizer has access to the index

What is a common data source for the central enterprise data warehouse?

ODS=>Operational Data Source

What are the enhanced features in Teradata V2R5 and V2R6?

V2R6 included the feature of replica in it. in which copy of data base are available on another system which means V2R6 provide the additional data protection as comparison to V2R5 while if data from one system has been vanishes.

Where is the GRANT table option usually done? When tables are newly created, what is the default role and what the default privileges which get assigned?

The GRANT option for any particular table depends on the privileges of the user. If it is an admin user you can grant privileges at any point of time.
The default roles associated with the newly created tables depend on he schema in which they are created.

What is error table? What is the use of error table?

The Error Table contains information concerning:
- Data conversion errors Constraint violations and other error conditions:
* Contains rows which failed to be manipulated due to constraint violations or Translation error
* Captures rows that contain duplicate Values for UPIs.
* It logs errors & exceptions that occurs during the apply phase.
* It logs errors that are occurs during the acquisition phase.

What is optimization and performance tuning and How does it work in practical projects. Explain with an example.

Optimization is the technique of selecting the least expensive plan (fastest plan) for the query to fetch results.
Optimization is directly proportional to the availability of --
1. CPU resources
2. Systems resources - amps PEs etc.
Teradata performance tuning is a technique of improving the process in order for query to perform faster with the minimal use of CPU resources.

Does SDLC changes when you use Teradata instead of Oracle?

If the teradata is going to be only a data base means It won?t change the System development life cycle (SDLC)
If you are going to use the Teradata utilities then it will change the Architecture or SDLC
If your schema is going to be in 3NF then there won’t be huge in change

What is an optimization and performance tuning and how does it really work in practical projects?

Performance tuning and optimization of a query involves collecting statistics on join columns, avoiding cross product join, selection of appropriate primary index (to avoid skewness in storage) and using secondary index.
Avoiding NUSI is advisable.

What are two examples of an OLTP environment?

The two examples of OLTP are:
1- ATM.
2- POS

Sunday, August 22, 2010

Teradata SQL Assistant

Teradata SQL Assistant


Teradata SQL Assistant (TSA), as part of Teradata Tools and Utilities (TTU), is an ODBC-based client utility used to access and manipulate data on ODBC-compliant database servers

TSA for Windows:

Teradata SQL Assistant is an information discovery tool designed for Windows XP and Windows 2000. Teradata SQL Assistant retrieves data from any ODBC-compliant database server. The data can then be manipulated and stored on the desktop PC.


Teradata SQL Assistant for Microsoft Windows, originally called "Queryman" (before V. 6.0)or "QueryMan" (V. 6.0 and up), is also known as "SQLA" among programmers. It supports import / export tasks, but not the serious ones.
           With its user-friendly GUI, TSA for Windows is generally oriented to business users and casual data consumers. (Refer to BTEQ for comparison.) .

The key features of Teradata SQL Assistant are:

    * Send queries to any ODBC database or the same query to many different databases;
    * Export data from the database to a file on a PC;
    * Create reports from any RDBMS that provides an ODBC interface;
    * Import data from a PC file directly to the database;
    * Use an import file to create many similar reports (query results or Answersets). For example, display the DDL (SQL) that was used to create a list of tables;
    * Create a historical record of the submitted SQL with timings and status information such as success or failure;
    * Use SQL syntax examples to tailor statements;
    * Use the Database Explorer Tree to easily view database objects;
    * Use a procedure builder that provides a list of valid statements for building the logic of a stored procedure;
    * Limit data returned to prevent runaway queries.