Learnings

Thursday, February 28, 2008

DB2 Design Advisor - First thoughts...

The DB2 Design Advisor is seen as an important step forward in making database performance management more automated, so I thought I would play with it to see what it can do.
Had a few initial struggles getting it to work:
db2advis -d sample -s "select count(*) from tpcd.orders" -m I -a db2dmin/db2admin -o advise -c SMALL -n Administrator
and it kept failing with the following error:
0 solutions were evaluated by the advisor
exiting with error code [-30082]

Well, after scratching my head quite a bit, I finally looked into db2diag.log and the error was plainly stated there - the password or username was wrong!
On fixing it, I hit the next error:
db2advis -d sample -s "select count(*) from tpcd.orders" -m MICP -o advise -c SMALL
Using user id as default schema name.
Use -n option to specify schema
Explain tables not set up properly for schema ADMINISTRATOR
The insert into the ADVISE_INSTANCE table has failed.
0 solutions were evaluated by the advisor
exiting with error code [-219]

To fix that, I ran db2 -tvf EXPLAIN.DDL (the file is under sqllib\misc) and issued a COMMIT, after which db2advis started working :-)
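For the record, the whole fix amounts to something like this (a sketch; run it from a DB2 command window and adjust the path to wherever your sqllib lives):
db2 connect to sample
db2 -tvf sqllib\misc\EXPLAIN.DDL
db2 commit work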
After trying it with one or two sample queries, I also tried out the dynamic SQL cache option. First, I reset the database monitor (db2 reset monitor for database sample) to clear out the dynamic SQL cache. Then I simulated a real-life application by running a small TPC-H benchmark for half an hour or so. Ideally, you would just let your applications run normally for some time so that all the query patterns get captured.
Then, I issued:
db2advis -d sample -g -m MICP -o advise -c SMALL -n TPCD
The -g option tells the advisor to use the statements in the dynamic SQL cache as the workload to analyze, which it duly did, coming back with recommendations. Some points to note in the recommendations:
  1. It was not able to use any of the queries with bind variables for the analysis (I had fired the queries from JDBC, so the ones run through prepared statements showed up as dynamic SQL with parameter markers). However, there is an option to source the workload from packages, so I guess that any 'regular' static SQL will work with the Design Advisor; it is only prepared statements from JDBC that may have an issue, as they are treated like static SQL by the application but as dynamic SQL by DB2 (since no package is created for them).
  2. It gives a LOT of recommendations, so I guess a DBA will still be needed even with the advisor, if only to pick and choose the suggestions to implement :-)
  3. It even tells you which indexes are not being used, which existing ones are being used well (like certifying the DBA), and of course which new ones to create.
  4. For MDC, it came up with some interesting options which I don't understand. For the standard TPC-H PART table, it suggested MDC on the P_SIZE column?
  5. It also prints out all the DDL for the recommendations, so I can be really lazy and just copy-paste.
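To give a flavour of the output, the generated DDL looks roughly like this (the index name and column here are invented for illustration, not the advisor's actual suggestions):
-- recommended index on the TPC-H ORDERS table
CREATE INDEX TPCD.IDX_O_ORDERDATE ON TPCD.ORDERS (O_ORDERDATE) ALLOW REVERSE SCANS;
RUNSTATS ON TABLE TPCD.ORDERS AND INDEXES ALL;
COMMIT;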

All in all, looks promising. Hoping to be able to use it on a live problem sometime!

Wednesday, February 27, 2008

DB2 Deep Compression

Excellent article explaining DB2's deep compression in depth. Looks like this can be a really powerful feature, provided one understands how space is used and how it can be reclaimed.
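If I understand it right, turning it on is just a table attribute plus a classic REORG to build the compression dictionary; a minimal sketch (the table name is only an example):
ALTER TABLE tpcd.lineitem COMPRESS YES;
REORG TABLE tpcd.lineitem RESETDICTIONARY;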
Also, one interesting point that was mentioned in it: in DB2, the page number that is used as part of the RID of a row is a tablespace-relative page number, not a table-relative one. Does this have a bearing on why LOADing into a table locks the whole tablespace? Will take it up as a separate topic...

DB2 Partitioning - The Why and How?

If you are dealing with large DB2 databases, you may want to think about partitioning with DB2 DPF (Database Partitioning Feature) at some point. DPF is already available with DB2 Enterprise Server Edition but needs to be separately licensed. Before you jump into it, you will probably have questions like:
  1. I have a large SMP box with many CPUs and tons of memory; can't DB2's intra-partition parallelism take care of it? Why should I consider DPF for one box?
  2. Is DPF suitable only for data warehousing or can I safely use it for OLTP systems also?
  3. Is performance the only reason for moving to DPF or are there other benefits as well?

To start with, even for a single large SMP box, there is a limit to what parallelism can achieve without partitioning. For instance, even if you have a LARGE amount of RAM, a single DB2 instance may still not give you the best possible throughput, because some internal operations will still have to serialize (critical latches protecting linked lists in shared memory can end up becoming a bottleneck).

Also, having 12 CPUs does not mean that the optimizer will always start 12 parallel threads for all my large queries (some constructs in a query can prevent parallelism, for instance); this again can lead to under-utilization of resources.

And then there is the issue of manageability: things like backups or data loads can end up taking a lot of time. With DPF, DB2 can do an effective "divide and conquer", making your large database look like many smaller ones, and give you better throughput, better utilization of available resources and better manageability of the whole setup, even on a single SMP box.
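As a taste of the "how": a DPF table is just an ordinary table created in a tablespace that spans a partition group, with a distribution key. A minimal sketch in DB2 9.x syntax (all names here are invented, and the database partitions themselves are assumed to be configured already):
CREATE DATABASE PARTITION GROUP pg_all ON ALL DBPARTITIONNUMS;
CREATE TABLESPACE ts_orders IN DATABASE PARTITION GROUP pg_all MANAGED BY AUTOMATIC STORAGE;
CREATE TABLE tpcd.orders_dpf (
    o_orderkey   BIGINT NOT NULL,
    o_custkey    INTEGER,
    o_orderdate  DATE
) IN ts_orders
  DISTRIBUTE BY HASH (o_orderkey);
DB2 hashes each row on o_orderkey to decide which partition it lives on, so scans, backups and loads can then proceed partition by partition.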

That was just a teaser; if I have got you interested, you should read this.

Thursday, February 14, 2008

Avoiding logging in DB2 and Oracle

For performance reasons, especially when dealing with large volumes of data that is easily recoverable, one may want to suppress the generation of logs (redo logs). This is possible both in Oracle and in DB2, but there are some interesting differences.
ORACLE
In Oracle, it is possible to avoid generating redo for inserts by specifying the /*+ APPEND */ hint (a direct-path insert). After such an insert, the table cannot be queried by the same transaction until a commit is issued. However, it is only INSERTs for which log generation can be avoided; UPDATEs, DELETEs and the index maintenance done by DML are ALWAYS logged, and there is no way around it (a CREATE INDEX, though, can itself be run NOLOGGING).
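A minimal sketch (the table names are invented; strictly speaking, the data redo is skipped only when the table is NOLOGGING or the database is in NOARCHIVELOG mode, while undo for the data is skipped either way):
INSERT /*+ APPEND */ INTO sales_hist
SELECT * FROM sales_stage;
-- until the transaction ends, a SELECT on sales_hist from this
-- session fails with ORA-12838
COMMIT;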
DB2
In DB2, it is possible to avoid generating logs for INSERTs, UPDATEs, DELETEs and index maintenance by using the NOT LOGGED INITIALLY option on the CREATE TABLE (or ALTER TABLE) and then performing the DML in the same unit of work. This can be especially useful in data warehouses but, as with everything, it is important to understand what happens under the covers in order to use the feature effectively. With NOT LOGGED in effect, an insert creates pages and fills them with data in the bufferpool, and the same happens for indexes. When a COMMIT is issued, all of these pages are flushed to disk before the commit returns (I guess Oracle must be doing something similar). It is possible, however, that some of the pages get flushed to disk even before the commit is issued, and that raises a couple of interesting questions:
1. What if I specify NOT LOGGED and then issue a DELETE FROM TABLE and the database crashes before I COMMIT or ROLLBACK?
2. Better still, what if I ACTIVATE NOT LOGGED and then DELETE FROM TABLE and subsequently change my mind and ROLLBACK?
Tough luck! You will receive an error saying the table can't be accessed:
SQL1477N For table "ADMINISTRATOR.AB" an object "516" in table space "3" cannot be accessed. SQLSTATE=55019
Moral of the story:
NOT LOGGED can be very useful, especially in data warehouses, but test it out and understand all the implications before activating it.
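A minimal CLP sketch of the happy path (the table is invented; the +c flag turns autocommit off so that the ACTIVATE and the DELETE fall in the same unit of work, as required):
db2 connect to sample
db2 "create table tpcd.stage (k int, v varchar(20)) not logged initially"
db2 +c "alter table tpcd.stage activate not logged initially"
db2 +c "delete from tpcd.stage"
db2 commit work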

Tuesday, February 12, 2008

Measuring the amount of log generated by a statement

If you want to measure the amount of log generated by an SQL statement in Oracle and DB2, you can use the following commands:

In Oracle, you can:
SQL> set autotrace on
SQL> (run your insert/update/delete statement)

Check for the entry "redo size" in the autotrace output. For example:
Statistics
----------------------------------------------------------
     12975  recursive calls
   1798143  db block gets
     56538  consistent gets
      9882  physical reads
 125409912  redo size
      1015  bytes sent via SQL*Net to client
       874  bytes received via SQL*Net from client
         4  SQL*Net roundtrips to/from client
        31  sorts (memory)
         0  sorts (disk)
    297585  rows processed

That tells you how many bytes of redo were generated by the DML. As an aside, you can sometimes see non-zero redo for a SELECT too, due to delayed block cleanout.

In DB2, you first need to turn the UOW monitor switch ON:
db2 => update monitor switches using uow on
db2 => (run your DML statements)
db2 => get snapshot for applications on <dbname>
You will get a detailed output; look for the entry "UOW log space used (bytes)". If you want to repeat this exercise several times, you can either use the RESET MONITOR ALL command between invocations or just disconnect and reconnect, because the snapshot information is specific to the application that set the switch.
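Putting it together for a single statement, the round trip looks something like this (the database and table names are placeholders; findstr is the Windows counterpart of grep):
db2 connect to sample
db2 update monitor switches using uow on
db2 "delete from tpcd.orders where o_orderkey < 1000"
db2 get snapshot for applications on sample | findstr /C:"UOW log space used"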

Wednesday, February 06, 2008

Package Implementation Vs Software Development

Having been involved primarily in software development, I don't really understand how package implementations work, but I came across some useful material today that talks about a typical CRM implementation. The methodology is:
Define (what you want) -> Discover (detailed requirements) -> Design (solutions tailored to the process) -> Configure (the package) -> Validate (that it does what you want) -> Deploy.
The key difference from software development is in the Configure and Validate steps. Interestingly, there is a lot of stress on trying to meet the user requirements with the available functionality; customization is frowned upon and, if unavoidable, should be kept under 20%. My own guess is that if you customize more than 10%, you may be going dangerously off-track. Makes sense in a way, because it is, after all, a package and not a panacea.
Also, there is a lot of stress on making sure that requirements-gathering sessions are focussed on fitting the user requirements into what is available in the product, and on steering clear of gathering user wish lists for features!
I may be wrong here but, in a previous assignment, I was discussing Siebel reports with a user who was involved in a data warehouse project; she wanted some of the Siebel data pulled out and presented in a DW through a standard reporting tool. I asked her why she couldn't do it in Siebel itself (the reporting looked very slick, after all) and she said (if I understood correctly) that in Siebel, every report (or applet, in Siebel terms) is tied to one underlying Business Object (such as Account or Orders) and any customization is limited to showing/hiding attributes of that underlying object.
So, if I want a report showing some fields of Orders and not others, yes, that is possible. But a report showing attributes from different Business Objects (for example, Order details along with Customer Name, Customer City, Customer State and Customer Country) is not, because the information may be spread across several Business Objects.

Friday, February 01, 2008

Causing a simple deadlock

If users A and B both want to update a given row, here is how they can get into a deadlock in an application that uses cursors. Assume the code first does a plain SELECT of the row that is to be updated:
1. User A fetches the row first and gets an S lock.
2. User B fetches the same row and gets an S lock too.
3. User A tries to update the row; this will try to get an X lock and wait for B.
4. User B tries to update the row; this will try to get an X lock and wait for A. Deadlock!

Thus, if we declare cursors with just a SELECT and then run multiple copies of the program, it is very easy to deadlock. To avoid this, declare the cursors as SELECT ... FOR UPDATE (see the sketch after the list below). In that case:
1. User A fetches the row first and gets a U lock.
2. User B tries to fetch the same row and blocks (only one process can hold a U lock at a time)
3. User A tries to update the row and gets an X lock (remember, even though B is waiting, A's upgrade request will be honoured first since it already holds a lower lock on that object!)
4. User A makes the change and commits, releasing the lock
5. User B now gets the U lock (unless it has timed out waiting by now).
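In cursor terms, the difference is just the FOR UPDATE clause on the declaration; a sketch with invented table and column names:
-- deadlock-prone: fetching takes only an S lock
DECLARE c1 CURSOR FOR
    SELECT qty FROM stock WHERE item_id = 42;
-- safer: fetching takes a U lock, so concurrent updaters serialize
DECLARE c2 CURSOR FOR
    SELECT qty FROM stock WHERE item_id = 42
    FOR UPDATE OF qty;
-- the change itself is then made positioned, via the cursor
UPDATE stock SET qty = qty - 1 WHERE CURRENT OF c2;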

Lock Modes

The beginning of a brave attempt to understand locking in detail. Let us start with lock modes:

Share Lock (S)
The lock owner can read the object but cannot modify it. Other concurrent processes can obtain a Share lock (or, for one of them, an Update lock) on the same object. No one else can get an Exclusive lock, though.

Exclusive Lock (X)
The lock owner can read, change or delete the object. Other concurrent processes cannot get any lock on the object and have to wait, unless they are running in Uncommitted Read mode. Sometimes, if lock-avoidance techniques are used, other concurrent processes may still be able to read committed data (not sure how yet).

Update Lock (U)
The lock owner can read the data and intends to update it in the future. Other concurrent processes can read the data, but no one else can change it. Before actually making the change, the owner will take an Exclusive (X) lock on the object; if other processes hold an S lock, this upgrade can make the lock owner wait.
The main reason for having an Update lock is to prevent the deadlock that can otherwise happen between two processes holding a Share lock on the same object: the first tries to upgrade to an X lock and waits on the second's S lock, the second then tries the same upgrade and waits on the first's, and neither can proceed. With Update locks, this is prevented as follows:
T1 (transaction 1) takes a U lock because it intends to update. T2 takes an S lock (which means it cannot attempt to update) and can read the data. If T1 now tries the update, it will first try to convert to an X lock and will block (due to T2's S lock), but no deadlock results. T2 cannot upgrade its lock to a U lock (if it wanted to update, it should have taken a U lock initially and can't change its mind now?). What if T2 tries to upgrade straight to an X lock; will that not cause a deadlock? Need to check.