Learnings

Thursday, July 31, 2008

Real-time data warehousing

I have never really understood the need for real-time data warehouses (so what, there are so many other things that I haven't understood:-). I have come across a few instances where people were proposing a real-time DW. For example, in one case, a company wanted to capture up-to-the-minute measurements about the operation of all its compressors in the country. There were, if I remember, over 30 measurements to be captured every second for every machine. In addition, they wanted to be able to perform trend analysis on this data with data collected from several months back. It hence seemed like a real-time DW was the only solution.

On closer analysis though, it looked like they were going to do two different things with the data:
1. Examine the real-time data and try to predict machine failures
2. Examine data trends over a long period of time and probably try to predict failures as well.

For the first analysis, they needed very detailed measurements but they were only interested in that level of detail for readings from the last day or two (not last one year!). For the second case, they did not really need every reading from every second for every years. Rather, a reading taken every 5 mins, say, for a period of 1 year would have sufficed.
In cases like this, it may be much simpler to leave the main DW as it is (with data upto yesterday) and let the ODS store real-time data for the last few days. This immediately makes things simpler - there is no need to recompute any costly aggregates each time a new feed comes in (which is every second). The ODS table is likely to be reasonably small and can be well indexed to support all the typical queries. Importantly, there is never any AD-HOC analysis performed on recent real-time data. Instead, people typically use statistical tools to calculate things like moving averages, std deviations, etc and these are all well-defined queries known beforehand.


Here is a much better article on the same topic:
http://www.theregister.co.uk/2006/07/31/real_time_data_warehousing/

Here is one on using DataMirror for real-time DW but I wish the author had given one or two convincing business scenarios where a real-time DW is the only viable solution:
http://www.grcdi.nl/considerations.pdf

Tuesday, July 22, 2008

DB2 Explain Plan

On LUW, it is possible to explain a statement using the syntax "EXPLAIN PLAN FOR ". This actually fills in a lot of information about the statement into no less than 7 tables (EXPLAIN_INSTANCE, EXPLAIN_STATEMENT, EXPLAIN_STREAM,EXPLAIN_OBJECT,EXPLAIN_OPERATOR,EXPLAIN_PREDICATE and EXPLAIN_ARGUMENT). While there is bound to be a wealth of info in these tables, one thing in particular is very useful: the EXPLAIN_STATEMENT table will have both the query submitted by the user and also the internally re-written query by DB2! This can be really useful in debugging and can also teach a few things about how to write efficient queries.
z/OS has something similar although it only writes into 3 tables, one of which has detailed info about function calls.

Labels: , ,

Monday, July 14, 2008

SQLCODE=-180 SQLSTATE=22007 error with executeUpdate()

If you are facing an error with a java prepared statement that involves setting a date column, you are probably hitting a bug in the DB2 JDBC driver. I got the following error only on the first invocation of a prepared statement involving setDate.
com.ibm.db2.jcc.b.jm: DB2 SQL Error: SQLCODE=-180, SQLSTATE=22007, SQLERRMC=7, DRIVER=3.50.152
at com.ibm.db2.jcc.b.wc.a(wc.java:571)
at com.ibm.db2.jcc.b.wc.a(wc.java:57)
at com.ibm.db2.jcc.b.wc.a(wc.java:126)
at com.ibm.db2.jcc.b.tk.b(tk.java:1593)
at com.ibm.db2.jcc.b.tk.c(tk.java:1576)
at com.ibm.db2.jcc.t4.db.k(db.java:353)
at com.ibm.db2.jcc.t4.db.a(db.java:59)
at com.ibm.db2.jcc.t4.t.a(t.java:50)
at com.ibm.db2.jcc.t4.tb.b(tb.java:200)
at com.ibm.db2.jcc.b.uk.Gb(uk.java:2355)
at com.ibm.db2.jcc.b.uk.e(uk.java:3129)
at com.ibm.db2.jcc.b.uk.zb(uk.java:568)
at com.ibm.db2.jcc.b.uk.executeUpdate(uk.java:551)

Labels:

Monday, July 07, 2008

Prepared Statements and parameter markers

If you feel your SQL (from JDBC or CLI) is right but DB2 is still giving you an SQLCODE=-418, SQLSTATE=42610 error, try casting all your parameter markers to the right data types, that should fix the errors.