Learnings

Wednesday, February 25, 2009

Unique partitioning scenario - roll in/roll out with selective retention

The common example quoted for range partitioning with roll-out/roll-in (or partition exchange) is one of, say, holding daily sales data in a transaction table partitioned by day. Every day, the oldest partition is rolled out and the newest one is rolled in (or exchanged). This is fine if no one ever wants to see data older than, say, 365 days. But if some specific transactions need to be retained for longer than a year, then a simple drop/create partition scheme won't work very well, because we will be left with old partitions containing very little data but still taking up a lot of space (unless they are reorganized).
In a case like this, we can think of having a table with just 365 partitions (called P1, P2, ..., P365, say; 366 if leap days need a slot of their own, as in the example below) and a small mapping table which stores the mapping between transaction date and partition. In other words, have a mapping table such as:

Trans_Date   Part_No
23-2-09      54
24-2-09      55
25-2-09      56
25-2-08      56
26-2-09      57
27-2-09      58
27-2-08      58

As we can see, for dates whose transactions we want to retain, we store them in the same partition as that calendar date of the current year. Thus, data for 25th Feb for ALL years will be stored in partition number 56, data for 26th Feb for ALL years will be stored in partition number 57, and so on.
Finally, this mapping table has to be joined to the main transaction table in all queries (this can be hidden inside a view) to assist in partition elimination.

DB2 example follows:
Create table TP_Sales (
Trans_Date timestamp,
Partition_No int,
Trans_No int,
Trans_Amt decimal(5,2),
Retain_Flag int)
Partition by Range(Partition_No)
(Starting 1 Ending 366 every 1)

Create table Part_Hash (
Trans_Date timestamp not null primary key,
Partition_No int)

-- Set up data
INSERT INTO PART_HASH
SELECT trans_date,
dayofyear(trans_date)
FROM
(SELECT to_date('31/12/2007','DD/MM/YYYY') + rn days as trans_date
FROM (SELECT row_number() over() as rn
FROM sysibm.columns
FETCH FIRST 730 ROWS ONLY) x
) y


-- 10 rows for every day
INSERT INTO TP_SALES
SELECT trans_date,
dayofyear(trans_date),
trans_no,
trans_amt,
1
FROM
(SELECT to_date('31/12/2007','DD/MM/YYYY') + rn days as trans_date,
x2.trans_no,
x2.trans_amt
FROM (SELECT row_number() over() as rn
FROM sysibm.columns
FETCH FIRST 730 ROWS ONLY) x,
(SELECT row_number() over()+1000 as trans_no
,10.5 as trans_amt
FROM sysibm.columns
FETCH FIRST 10 ROWS ONLY) x2
) y

-- Statistics
runstats on table tp_sales;

-- We will need to create a view to hide the partition number from end users
CREATE VIEW SALES AS
SELECT A.TRANS_DATE,
TRANS_NO,
TRANS_AMT,
RETAIN_FLAG
FROM TP_SALES A JOIN PART_HASH B
ON A.TRANS_DATE = B.TRANS_DATE
AND A.PARTITION_NO = B.PARTITION_NO


Finally, a few typical queries to see if partition elimination is happening:
a. Select sum(trans_amt) From Sales where Trans_Date = current timestamp
Only some partitions being picked up (not known at compile time though)

b. Select max(trans_amt) From Sales where Trans_Date between (current date - 10 days) and current date
Only some partitions being picked up (not known at compile time though)

c. Select sum(trans_amt) From Sales where Trans_Date between (current date - 100 days ) and (current date - 40 days)
Only some partitions being picked up (not known at compile time though)

d. Select max(trans_amt) From TP_Sales where Partition_no between 45 and 55
As expected, only partitions 45 to 55 picked up.
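With this structure, the daily "roll-out" amounts to purging the year-old, non-retained rows from the partition that is about to be reused, rather than dropping a partition. A rough sketch of what that could look like (treating RETAIN_FLAG = 0 as "not to be retained" is an assumption here):

-- Purge last year's non-retained rows from the partition being reused today
DELETE FROM TP_SALES
WHERE PARTITION_NO = DAYOFYEAR(CURRENT DATE)
AND RETAIN_FLAG = 0
AND TRANS_DATE < TIMESTAMP(CURRENT DATE);

-- Record the mapping for the new day
INSERT INTO PART_HASH VALUES (TIMESTAMP(CURRENT DATE), DAYOFYEAR(CURRENT DATE));

Mappings in PART_HASH for old dates that no longer have any rows left can be cleaned up in the same pass.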

Thursday, February 19, 2009

On Nullable columns

Varchar columns can use a length of 0 to indicate the lack of a value, so there is little point in defining them as nullable. Making them nullable adds an extra byte to the row on top of the 2 bytes that already hold the varchar length. Also, checking for a 0 length sidesteps the whole controversy around whether '' and NULL are equivalent. In databases where length(NULL) returns NULL, a small UDF can be used to return 0 instead.
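A minimal sketch of such a UDF in DB2 SQL (the name VLENGTH is made up for illustration):

-- Returns 0 for a NULL argument instead of NULL
CREATE FUNCTION VLENGTH (S VARCHAR(4000))
RETURNS INTEGER
LANGUAGE SQL
DETERMINISTIC
NO EXTERNAL ACTION
CONTAINS SQL
RETURN COALESCE(LENGTH(S), 0)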

In Oracle, nulls are not stored in an index while in DB2 they are. If I have a unique index defined on a nullable column in Oracle, it will allow any number of nulls while enforcing uniqueness on the non-null values. In DB2, this won't happen by default unless the index is defined as being UNIQUE WHERE NOT NULL (supported on DB2 z/OS 9.1 but not LUW 9.5).
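A quick way to see the DB2 LUW behaviour (the table and index names here are just for illustration):

-- In DB2 LUW, NULL is treated as a value by a unique index
CREATE TABLE NTEST (C1 INT);
CREATE UNIQUE INDEX NTEST_UX ON NTEST (C1);
INSERT INTO NTEST VALUES (NULL);
INSERT INTO NTEST VALUES (NULL);   -- fails with SQL0803N (duplicate key)
-- In Oracle, both inserts would succeed, since entirely-NULL keys are not indexed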

Index entries for NULL have their first byte set to X'FF' in DB2 and hence sort as the rightmost (highest) value. When rows are being inserted in ascending order, if the last page is full and the new value is the highest in the index, DB2 does not split the last page but instead allocates a fresh new page; that way, leaf pages won't remain half empty. However, the presence of a single NULL value in the table can defeat this rule, because the incoming value is then no longer the highest key in the index, and the normal page splits leave many half-empty pages.

Wednesday, February 18, 2009

Tracking open cursors in DB2

The Event Monitor can be used to track whether an application is leaving cursors open, as follows.
From the DB2 CLP, connect to the database of interest and:
reset monitor all;
update monitor switches using statement on;
drop event monitor evmon;

CREATE EVENT MONITOR evmon
FOR STATEMENTS
WRITE TO TABLE
STMT (TABLE administrator.STMT_evmon
INCLUDES (AGENT_ID,
APPL_ID,
BLOCKING_CURSOR,
CREATOR,
CURSOR_NAME,
FETCH_COUNT,
INT_ROWS_DELETED,
INT_ROWS_INSERTED,
INT_ROWS_UPDATED,
ROWS_READ,
ROWS_WRITTEN,
SQLCODE,
SQLERRD1,
SQLERRM,
SQLSTATE,
SQLWARN,
START_TIME,
STMT_OPERATION,
STMT_TEXT,
STMT_TYPE,
STOP_TIME ) );

commit;
set event monitor evmon state=1;

run the suspect application for some time (in any other session)

set event monitor evmon state=0;

To check for cursors that were opened but never closed, issue:

Select A.Cursor_Name,
A.Stmt_text,
A.Start_Time,
A.Stop_Time,
A.Stmt_Operation
From STMT_EVMON A
Where
A.STMT_OPERATION = 4          -- 4 = open
AND NOT EXISTS (Select 1
From STMT_EVMON B
Where B.CURSOR_NAME = A.CURSOR_NAME
AND B.AGENT_ID = A.AGENT_ID
AND B.STMT_OPERATION = 6)     -- 6 = close

Any rows returned represent cursors left open by the application (which probably means their connections are still open too)!


When do cursors get closed?

Cursors can remain open in a program if they are not explicitly closed, not all records have been fetched from them, and the connection is not closed. In other words:
1. If all records are fetched, then the SQLCODE 100 also triggers a cursor close (regardless of whether the cursor was declared WITH HOLD or not).
2. Similarly, if the connection is closed, then the cursors are automatically closed (again regardless of WITH HOLD or not).
3. If the transaction is rolled back, that also closes open cursors automatically (regardless of WITH HOLD or not).
4. If the transaction is committed, that also closes open cursors automatically (unless declared WITH HOLD).

So, if a program opens many WITH HOLD cursors and forgets to close connections, it is possible for these cursors to hang around.
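A small SQL PL sketch of rules 3 and 4 (the procedure name DEMO_HOLD and the dummy query are made up; when creating it from the CLP, use an alternate statement terminator, e.g. db2 -td@):

CREATE PROCEDURE DEMO_HOLD ()
LANGUAGE SQL
BEGIN
  DECLARE V INT;
  DECLARE C1 CURSOR WITH HOLD FOR SELECT 1 FROM SYSIBM.SYSDUMMY1;
  OPEN C1;
  COMMIT;            -- C1 survives the commit because it is WITH HOLD (rule 4)
  FETCH C1 INTO V;   -- still works after the commit
  ROLLBACK;          -- this closes C1 even though it is WITH HOLD (rule 3)
END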


Thursday, February 12, 2009

Applying DB2 fixpacks

There is a Fixpackreadme.txt which contains specific instructions for each fixpack. In general:

as db2inst1:

db2
db2stop
update dbm cfg using sysadm_group db2grp1
db2start
force applications all
db2stop force
terminate
db2licd -end
exit
db2_kill

as dasusr1:
db2admin stop

as root:
db2fmcu -d
db2fm -D
$HOME/sqllib/bin/ipclean
ipclean
slibclean
ipcrm

fixpack:
./installFixPak -y

and possibly a db2iupdt after that

Monday, February 09, 2009

db2cli.ini and JDBC

The JDBC Type-2 driver goes through CLI and is hence influenced by the settings in db2cli.ini.
The JDBC Type-4 driver is a pure Java driver and is not affected by db2cli.ini.

Tuesday, February 03, 2009

More about Cursor Blocking

To control the number of rows sent per block, one can tune either RQRIOBLK (for remote clients) or ASLHEAPSZ (for local clients). If using CLI or the JDBC Type-2 driver, the db2cli.ini keyword BlockForNRows gives much more precise control over the number of rows per block (the equivalent of Oracle's array size).
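A minimal db2cli.ini sketch (the data source name SAMPLE and the value 100 are just for illustration):

[SAMPLE]
BlockForNRows=100

The section name matches the database alias being connected to; a [COMMON] section applies to all of them.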

Monday, February 02, 2009

Cursor Blocking in DB2

Just as it is inefficient to pick single rows off disk (which is why the database fetches data in pages/blocks), it can be equally inefficient to fetch rows from the server to the client one at a time. In Oracle, the array size can be configured to alter this behaviour and make the server send blocks of rows at a time. In DB2, the same is called cursor blocking and is influenced by either the RQRIOBLK size (if the client is remote) or the ASLHEAPSZ (if the client is local) parameter.

With cursor blocking, if a qualifying row is locked on the server, the cursor will block till it can get the row. In other words, if DB2 is trying to send a block of 1000 qualifying rows to the client and row number 900 is locked, DB2 does not instead send 899 rows and wait - it waits for that row to be unlocked before sending the whole block over. If one does not keep this in mind, it might appear that DB2 is doing page-level locking or something.

To disable cursor blocking, one can either explicitly bind with the blocking option set to NO or issue a Select For Update instead of a plain select (assuming the logic warrants it). This can be seen in action as follows:
In session 1, create a table T with 10 rows, commit and then update the 10th row.
In session 2, try a Select * From T and it will block without returning anything.

In session 3, try a Select * from T For Update and it will return 9 rows and then block.
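A minimal CLP sketch of the three sessions (the table name T and its contents are assumptions; autocommit must be off in each session, e.g. start the CLP with db2 +c):

-- Session 1: create and populate T, commit, then update the 10th row without committing
CREATE TABLE T (ID INT, VAL INT);
INSERT INTO T VALUES (1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10);
COMMIT;
UPDATE T SET VAL = 100 WHERE ID = 10;

-- Session 2: a plain (blocked) cursor waits for the whole block
SELECT * FROM T;                -- hangs without returning any rows

-- Session 3: FOR UPDATE disables blocking, so rows come back one at a time
SELECT * FROM T FOR UPDATE;     -- returns 9 rows, then waits on row 10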
