Unique partitioning scenario - roll in/roll out with selective retention
The common example quoted for range partitioning and the rollout/rollin or partition exchange option is one of, say, holding daily sales data in a transaction table partitioned by day. Every day, the oldest partition is rolled out and the newest one is rolled in (or exchanged). This is fine if no one ever wants to see data older than, say, 365 days. But if there are some specific transactions which need to be retained for longer than a year, then a simple drop/create partition scheme won't work very well because we will be left with some old partitions containing very little of data but still taking up a lot of the space (unless they are reorganized).
In a case like this, we can think of having a table with just 365 partitions (called P1, P2, ..., P365 say) and have a small mapping table which stores the mapping between transaction date and partition. In other words, have a mapping table such as:
Trans_Date Part_No
23-2-09 54
24-2-09 55
25-2-09 56
25-2-08 56
26-2-09 57
27-2-09 58
27-2-08 58
As we can see, for dates where we want to retain transactions, we do so by storing them in the corresponding partition for the current year. Thus, data for 25th Feb for ALL years will be stored in partition number 56, data for 26th Feb for ALL years will be stored in partition number 57 and so on.
Finally, this table has to be joined to the main transaction table (this can be hidden inside a view) for all queries to assist in partition elimination.
DB2 example follows:
Create table TP_Sales (
Trans_Date timestamp,
Partition_No int,
Trans_No int,
Trans_Amt decimal(5,2),
Retain_Flag int)
Partition by Range(Partition_No)
(Starting 1 Ending 366 every 1)
Create table Part_Hash (
Trans_Date timestamp not null primary key,
Partition_No int)
// Set up data
INSERT INTO PART_HASH
SELECT trans_date,
dayofyear(trans_date)
FROM
(SELECT to_date('31/12/2007','DD/MM/YYYY') + rn days as trans_date
FROM (SELECT row_number() over() as rn
FROM sysibm.columns
FETCH FIRST 730 ROWS ONLY) x
) y
// 10 rows for every day
INSERT INTO TP_SALES
SELECT trans_date,
dayofyear(trans_date),
trans_no,
trans_amt,
1
FROM
(SELECT to_date('31/12/2007','DD/MM/YYYY') + rn days as trans_date,
x2.trans_no,
x2.trans_amt
FROM (SELECT row_number() over() as rn
FROM sysibm.columns
FETCH FIRST 730 ROWS ONLY) x,
(SELECT row_number() over()+1000 as trans_no
,10.5 as trans_amt
FROM sysibm.columns
FETCH FIRST 10 ROWS ONLY) x2
) y
// Statistics
runstats on table tp_sales;
// We will need to create a view to hide the partition number from end users
CREATE VIEW SALES AS
SELECT A.TRANS_DATE,
TRANS_NO,
TRANS_AMT,
RETAIN_FLAG
FROM TP_SALES A JOIN PART_HASH B
ON A.TRANS_DATE = B.TRANS_DATE
AND A.PARTITION_NO = B.PARTITION_NO
Finally, a few typical queries to see if partition elimination is happening:
a. Select sum(trans_amt) From Sales where Trans_Date = current timestamp
Only some partitions being picked up (not known at compile time though)
b. Select max(trans_amt) From Sales where Trans_Date between (current date - 10 days) and current date
Only some partitions being picked up (not known at compile time though)
c. Select sum(trans_amt) From Sales where Trans_Date between (current date - 100 days ) and (current date - 40 days)
Only some partitions being picked up (not known at compile time though)
d. Select max(trans_amt) From TP_Sales where Partition_no between 45 and 55
As expected, only partitions 45 to 55 picked up.
In a case like this, we can think of having a table with just 365 partitions (called P1, P2, ..., P365 say) and have a small mapping table which stores the mapping between transaction date and partition. In other words, have a mapping table such as:
Trans_Date Part_No
23-2-09 54
24-2-09 55
25-2-09 56
25-2-08 56
26-2-09 57
27-2-09 58
27-2-08 58
As we can see, for dates where we want to retain transactions, we do so by storing them in the corresponding partition for the current year. Thus, data for 25th Feb for ALL years will be stored in partition number 56, data for 26th Feb for ALL years will be stored in partition number 57 and so on.
Finally, this table has to be joined to the main transaction table (this can be hidden inside a view) for all queries to assist in partition elimination.
DB2 example follows:
Create table TP_Sales (
Trans_Date timestamp,
Partition_No int,
Trans_No int,
Trans_Amt decimal(5,2),
Retain_Flag int)
Partition by Range(Partition_No)
(Starting 1 Ending 366 every 1)
Create table Part_Hash (
Trans_Date timestamp not null primary key,
Partition_No int)
// Set up data
INSERT INTO PART_HASH
SELECT trans_date,
dayofyear(trans_date)
FROM
(SELECT to_date('31/12/2007','DD/MM/YYYY') + rn days as trans_date
FROM (SELECT row_number() over() as rn
FROM sysibm.columns
FETCH FIRST 730 ROWS ONLY) x
) y
// 10 rows for every day
INSERT INTO TP_SALES
SELECT trans_date,
dayofyear(trans_date),
trans_no,
trans_amt,
1
FROM
(SELECT to_date('31/12/2007','DD/MM/YYYY') + rn days as trans_date,
x2.trans_no,
x2.trans_amt
FROM (SELECT row_number() over() as rn
FROM sysibm.columns
FETCH FIRST 730 ROWS ONLY) x,
(SELECT row_number() over()+1000 as trans_no
,10.5 as trans_amt
FROM sysibm.columns
FETCH FIRST 10 ROWS ONLY) x2
) y
// Statistics
runstats on table tp_sales;
// We will need to create a view to hide the partition number from end users
CREATE VIEW SALES AS
SELECT A.TRANS_DATE,
TRANS_NO,
TRANS_AMT,
RETAIN_FLAG
FROM TP_SALES A JOIN PART_HASH B
ON A.TRANS_DATE = B.TRANS_DATE
AND A.PARTITION_NO = B.PARTITION_NO
Finally, a few typical queries to see if partition elimination is happening:
a. Select sum(trans_amt) From Sales where Trans_Date = current timestamp
Only some partitions being picked up (not known at compile time though)
b. Select max(trans_amt) From Sales where Trans_Date between (current date - 10 days) and current date
Only some partitions being picked up (not known at compile time though)
c. Select sum(trans_amt) From Sales where Trans_Date between (current date - 100 days ) and (current date - 40 days)
Only some partitions being picked up (not known at compile time though)
d. Select max(trans_amt) From TP_Sales where Partition_no between 45 and 55
As expected, only partitions 45 to 55 picked up.
0 Comments:
Post a Comment
<< Home