Wednesday, January 23, 2013

Packdisk in Teradata

PACKDISK

It is one of the last resorting options available to DBA's when the system runs out of Cylinders.Packdisk helps to pack up cylinders in AMP making space for more data. We need to note that
PACKDISK is an expensive operation on the system and it should be planned out so no parallel activities are run

Running PACKDISK on regular basis is part of Best Teradata practices. It also helps improve overall performance of the system .

How do you determine whether system is running out of space?

1. By using DBC.Diskspace ,

Select 
sum(currentperm)/1024/1024/1024 "cur_space_gb" ,
sum(maxperm)/1024/1024/1024 "max_space_gb",
case when ((max_space_gb - cur_space_gb)/max_space_gb) *100  < 25
then  'Warning! Packdisk has to be run'
else  'No Packdisk activity required!' end
from DBC.Diskspace ;

2. By using Showspace from Ferret utility.
this would display free space per cylinder.  When the free space is low,it would run minicylpacks which can impact the performance of system . So DBA's need to keep an eye on Free space per cylinder.

3. SHOWFSP another Ferret utility which is very useful in identifying particular tables that needs to packing.
It also shows amount of cylinders that can freed up by specifying Free space percent (FSP)

How to start PACKDISK?

please note that We can start /pause/ abort & continue packdisk whenever needed.
(type help to figure more options is you are not sure)

1. To start packdisk, go to ferret-> type PACKDISK
2. you can define range of FSP to start packdisk (say fsp =30 will start for cylinders having less than 30% free space)
3. Above option can be ignored and PACKDISK can be invoked for all cylinders)


Monday, January 7, 2013

Minicylpack in Teradata


MINICYLPACK is File system activity and is handled by Teradata by default.
This activity will exactly free one cylinder in an AMP and it is run automatically when the user request would need free cylinder or huge amount of space.
Cylinders are dedicated for PERM and SPOOL space separately.

MINICYLPACK is kind of last resort in system when the system is completely out of space and it is initiated. The Teradata file system will start to minicylpack when the number of free cylinders drops to the value set by MiniCylPackLowCylProd. The default is 10.


Say in above scenario, an AMP with 6 cylinders, the MiniCylPackLowCylProd is set to 1.
When the number of free cylinders goes below 1 i.e. 0, MINICYLPACKS is initiated and one cylinder is freed up during the process.

MINICYLPACK is resource intense activity because it causes lot of I/O's  in process of freeing up the data blocks  . It is suggested to run SHOWFSP from Ferret to see number of free cylinders available and cylinders that can be freed up and run PACKDISK to avoid unnecessary MINICYLPACK's

Thursday, January 3, 2013

Defragmentation in Teradata


DEFRAGMENTATION

This activity is automatically handled by Teradata File System when there is lot of fragmented sectors across cylinders.

Whenever a new table is created and data is loaded into it, the data is stored in Cylinders on a particular AMP. Data is always pushed into AMP cylinders as DATA BLOCKS.
IF the table is abused so much by doing DML operations like Update, delete and inserts on regular basis the data would have scattered across various cylinders as fragments.  When data blocks are scattered across cylinder lot of empty sectors occur. Well this space might be sufficient for small updates or inserts happening on table.
What if we needed large amount of Data to be inserted into table? The data blocks may not fit into cylinder as there are no continuous sectors available.

To overcome this problem Teradata File system (FS) runs the defragmentation to push all the data into continuous blocks.  This will create all the data blocks to be packed towards top of the cylinder making way for free sectors towards the bottom.   


Defragmentation does not free up cylinders, it only groups empty sectors and push towards bottom of cylinder
Can I run defragmentation manually?
Yes indeed!! By using ferret --> defrag   y ;
We should note that defragmentation will not run unless 25% of sectors are fragmented on cylinder.

By using “FORCE” option we can override this 25% factor and defragmentation is initiated.

Please note

  • Defragmentation is entirely controlled by DefragLowCylprod parameter in DBSControl . Setting this parameter high will result defragmentation to run frequently.  We can set up-to 65535 as it is maximum limit for cylinders per AMP.
  • Defragmentation consumes lot of CPU resources and hence should be planned out !! 




Wednesday, January 2, 2013

Teradata 13.10 new features (part 3)


Td13.10 not only enhanced existing Multi valued compression but also introduced two new methods

3. Algorithmic compression
Users can now write their own compression and decompression algorithms for characters. 
These will have to be implemented in UDF and needs to be tested thoroughly before implementing it in full scale.
There are some samples functions Teradata has provide and can be found in SYSLIB.
TransUnicodeToUTF8, TransUTF8ToUnicode, LZCOMP, LZDECOMP, LZCOMP_L, LZDECOMP_L, CAMSET, DECAMSET, CAMSET_L,DECAMSET_L.



4. Block level compression (applicable to DBA’s, Capacity planners)
BLC is applied at data blocks and allows row data to be stored in datablocks* in a new compressed format.
To enable BLC on tables, new command "COMPRESS" is available in FERRET utility. After BLC is enabled on table, all tows added to table are compressed automatically. Similarly "UNCOMPRESS" command can be used to uncompress packed Data.

BLC is one of most straight forward methods where one doesn’t have to worry about type of data across various columns in table. Row header and internal Teradata structures are also compressed using BLC.

BLC plays major role in saving lot of I/O and Space but at same time it can have some performance implications as more CPU can be utilized to compress and uncompress data dynamically. 

datablocks*
(Data blocks contains data rows .Block size can vary between 64kb to 128kb in latest version)

To be Continued...

Tuesday, January 1, 2013

Teradata 13.10 new features (part 2)

2. Multi valued Compression.

This is one of my most favourite functionality in Teradata.  Since its introduction in V2R5, it has gained popularity considering the enormous space savings and reduced I/O in huge data warehouses.

Top keep upto its promise, Teradata has gone one step further to include more data types for compression.
Previously multivalued compression was limited to fixed length columns types.

In TD13.10, some new features in MVC include
- Support for varchar and varbyte
- Longer columns support upto 510 characters. Previously 255 in Teradata 12, 13


-- Multi Value Compression in TD12,13
CREATE SET TABLE EMPLOYEE_TBL
(
EMPID INTEGER NOT NULL,
EMP_NAME_FIRST VARCHAR(20),  -- > compression not possible
EMP_NAME_LAST VARCHAR(20),
DT_BIRTH DATE FORMAT 'YYYY-MM-DD' NOT NULL,
DT_JOINING DATE FORMAT 'YYYY-MM-DD' NOT NULL,
DEPT_ID INTEGER,
LOCATION CHAR(20) COMPRESS ('ALABAMA','CALIFORNIA','COLORADO','FLORIDA','GEORGIA','ILLINOIS','INDIANA') --> Location needs to be char to be compressed
)
UNIQUE PRIMARY INDEX (EMPID);

-- Multi Value Compression in TD13.10, more columns can be compressed.
CREATE SET TABLE EMPLOYEE_TBL
(
EMPID INTEGER NOT NULL,
EMP_NAME_FIRST VARCHAR(20)  COMPRESS ('JAMES','JOHN','ROBERT','MICHAEL','WILLIAM','DAVID','RICHARD','CHARLES','JOSEPH','THOMAS','PAUL','MARK','GEORGE','EDWARD','BRIAN'), -- > compression applied on VARCHAR column with criteria of common first names
EMP_NAME_LAST VARCHAR(20),
DT_BIRTH DATE FORMAT 'YYYY-MM-DD' NOT NULL,
DT_JOINING DATE FORMAT 'YYYY-MM-DD' NOT NULL,
DEPT_ID INTEGER ,
LOCATION VARCHAR(20) COMPRESS ('ALABAMA','CALIFORNIA','COLORADO','FLORIDA','GEORGIA','ILLINOIS','INDIANA') -- > Datatype changed to VARCHAR and Compression applied which results in 5-15% more space savings than CHAR
)
UNIQUE PRIMARY INDEX (EMPID);



To be continued...

Teradata 13.10 new features (part 1)

Well..I thought it’s still not late to share some interesting things about  new features of Teradata 13.10  . I want to keep this topic as simple and useful as possible for someone who needs to catch a train or bus in next 30 mins.

Here are few new features of Teradata 13.10 from development point of view as developers need to be aware of what is in house for new version.

1. New calendar functions in TD13.10
There are lot of calendar functions introduced in 13.10   which are 
·         day_of_week
·         day_of_month
·         day_of_year
·         day_of_calendar
·         weekday_of_month
·         week_of_month
·         week_of_year
·         week_of_calendar
·         month_of_quarter
·         month_of_year
·         month_of_calendar
·         quarter_of_year
·         quarter_of_calendar
·         year_of_calendar

These can be found in SYSLIB database.

The biggest problem when it comes to migrating from previous versions to 13.10 is "calendar functions".  As we have implemented these concepts using SYS_CALENDAR.CALENDAR table in previous versions.
Every client going through migration has to work on calendar functions . this is a mammothic task if sys_calendar is used in many places in ETL framework.

Following is an scenario. Which is beneficial?
-- Sample Query written in TD12, TD13
SEL A.*,DAY_OF_WEEK
 FROM DBC.TABLES  A
 ,SYS_CALENDAR.CALENDAR  -- calendar table referred to pick current date
 WHERE DATABASENAME ='DBC'
 AND CALENDAR_DATE=CURRENT_DATE;


 -- Sample QUERY rewrite for TD13.10
 SEL A.*,
 SYSLIB.DAY_OF_WEEK(CURRENT_DATE)  --elimination of calendar table and usage of  function in TD13.10
  FROM DBC.TABLES  A
 WHERE DATABASENAME ='DBC' ;

Advantages of calendar functions
1. Elimination of Calendar tables resulting in unnecessary spool. 
2. Avoiding product joins in some cases
3. Complexity of query & logic is reduced


Please note - In some cases, if you still need to retain the original functionality of queries then "table qualifier should be used as follows"
 -- Sample Query written in TD12, TD13
 -- addition of table qualifier CAL to make query compactible with TD13.10
SEL A.*,
CAL.DAY_OF_WEEK  -- Table qualifier CAL is used
 FROM DBC.TABLES  A
 ,SYS_CALENDAR.CALENDAR CAL  -- calendar table with qualifier CAL  referred to pick current date
  WHERE DATABASENAME ='DBC'
 AND CALENDAR_DATE=CURRENT_DATE;


To be continued….