Learnings

Monday, January 03, 2011

AIX errors while compiling TPC-H data generator

If you get an error about missing DBGenOptions while trying to make the dbgen program from TPC-H on AIX using the xlc compiler, add the -qcpluscmt compiler option in the makefile. That should make sure the build goes through.
If you then get a runtime error on AIX 5.3 or higher such as:
$ ./dbgen
Could not load program ./dbgen:
System error: Not enough space

This could be because the program has to be compiled for 64 bit. Use the -q64 option in the makefile and rebuild. In other words, the following line in the makefile will help:
CC =xlc -qcpluscmt -q64

The other reason this can happen is if ulimit settings for data/stack are too low for the user. Try resetting these to unlimited first.

Labels:

Monday, December 06, 2010

Unix - tips, scripts, tricks, etc

1. Create a 1GB file:
/usr/sbin/lmktemp filename1 10737418424
2. Create a second 1GB file:
cp filename1 filename2
3. To create a 2GB file, append the first file to the second file:
cat filename1 >> filename2
4. To create a 3GB file, append the first file to the second file again:
cat filename1 >> filename2

Labels:

Monday, November 22, 2010

InfoSphere CDC Refresh causing SQL0668N

A Refresh operation using InfoSphere CDC uses the LOAD utility and this can put the target tables in SET INTEGRITY PENDING NO ACCESS state. In this case, issue the following command to get the table out of this state:
SET INTEGRITY FOR IMMEDIATE CHECKED

In some cases, the error SQL3608N may result when the above is issued as there could be a dependent table linked via a foreign key which is in no access state. In that case, take the parent table out of this state first. In an extreme case, there may be two tables which are mutually linked by foreign keys to each other and where both go into no access state. In that case, issue the command with both tablenames.
E.g:
set integrity for department, employee immediate checked

Labels:

Friday, November 12, 2010

DB2 correct way of taking a backup to avoid sql1035N

Quiesce the db, deactivate it and then backup.
From http://www.db2ude.com/?q=node/43

db2 CONNECT TO SAMPLE
db2 QUIESCE DATABASE IMMEDIATE FORCE CONNECTIONS
db2 TERMINATE
db2 DEACTIVATE DB SAMPLE
db2 BACKUP DATABASE SAMPLE TO "C:\DB2Backup" WITHOUT PROMPTING
db2 CONNECT TO SAMPLE
db2 UNQUIESCE DATABASE
db2 TERMINATE

Thursday, November 11, 2010

Infosphere CDC issue

TCPClient error: java.net.ConnectException: Connection refused java.net.ConnectException: Connection refused

If you get that error during any of the dm commands like dmcreateuser, dmlistuser, etc, it means your Access Server is down.

Labels:

Tuesday, November 09, 2010

Automatic Client Reroute test results

ACR works with HADR and Replication. Very simple to setup, the Alternate Server parameters have to be setup during the Catalog Database step. The feature works transparently with minimal changes required to the application.
Some points observed:
1. If ACR is setup, then a connect from CLP goes silently to the right database without warning or error in case primary is down.
2. Normal jdbc program (using regular drivers and such) will report an error when primary goes down but if ACR is setup, they will automatically re-establish connection (no need for java program to reconnect) and continue.
When connection is re-established automatically, the following exception is thrown:
1288696202906: EXCEPTION:[jcc][t4][2027][11212][3.61.36] A connection failed but has been re-established. The host name or IP address is "ls129.in.ibm.com" and the service name or port number is 50,000.
Special registers may or may not be re-attempted (Reason code = 1). ERRORCODE=-4498, SQLSTATE=08506

It has to be caught and basically ignored in the catch block and the application has to re-attempt the transaction just as if it had failed due to a lock timeout, for example.
3. When connection is re-established, the same connection object, statement objects, etc continue to work (no need to reconnect or re-prepare statements).
4. In a HADR setup, if you try to connect to the standby instead of to the primary, then the following exception is raised:
Exception in thread "main" com.ibm.db2.jcc.am.SqlException: DB2 SQL Error: SQLCODE=-1776, SQLSTATE= , SQLERRMC=1, DRIVER=3.61.36
5. ACR will return the alternate connection details only on successful connection to the main database as such. If the main connection itself fails to go through, there is no question on getting info about the alternate server.
6. Enhancement in 9.5 FP1 for seamless failover - 4498 is suppressed and failing statement is automatically retried if it is the first one in UOW. Use of seamless failover probably needs a slightly different way of using the db driver (some javax.* classes to be used)

Labels:

HADR setup basic steps

1. Enable archive logging
update db cfg for sample using logarchmeth1 DISK:/tmp/samplearchlogs

2. Check that there is either no firewall between the two servers or if there is a firewall, then the following ports must be open:
523 (for DB2 DAS), main DB2 port (50000) and two ports for HADR

3. Create das instance on both servers if not already done
ls128 is primary and ls129 is standby

4. Catalog admin node on primary
db2 catalog admin tcpip node standadm remote ls129.in.ibm.com remote_instance db2inst1 system ls129.in.ibm.com

5. Catalog remote tcpip node and standby database
db2 catalog tcpip node db2_std remote ls129.in.ibm.com server 50000
db2 catalog database sample as sam_std at node db2_std

6. Catalog admin node on standby
db2 catalog admin tcpip node primadm remote ls128.in.ibm.com remote_instance db2inst1 system ls128.in.ibm.com

7. Catalog remote tcpip node and primary database
db2 catalog tcpip node db2_pri remote ls128.in.ibm.com server 50000
db2 catalog database sample as sam_pri at node db2_pri

8. Verify that svcename is set to 50000 on both instances.

9. Verify that db2comm is set to tcpip on both

10. We want to log all index creation, re-creation and re-organization so set LOGINDEXBUILD on
db2 update db cfg for sample using LOGINDEXBUILD ON

11. Backup db on primary and restore on secondary
db2 backup database sample compress
db2 restore database sample replace history file

12. Configure ACR on both servers (Automatic Client Reroute). Optional but recommended.
On primary,
db2 update alternate server for database sample using hostname ls129.in.ibm.com port 50000
On standby,
db2 update alternate server for database sample using hostname ls128.in.ibm.com port 50000

13. Update the services file on both primary and standby with following entries.
# HADR entries
db2_hadr_1 55001/tcp
db2_hadr_2 55002/tcp

14. Update db config params for primary
update db cfg for sample using hadr_local_host ls128
update db cfg for sample using hadr_local_svc db2_hadr_1
update db cfg for sample using hadr_remote_host ls129
update db cfg for sample using hadr_remote_svc db2_hadr_2
update db cfg for sample using hadr_remote_inst db2inst1
update db cfg for sample using hadr_syncmode sync
update db cfg for sample using hadr_timeout 3
update db cfg for sample using hadr_peer_window 120
db2 connect to sample
db2 quiesce database immediate force connections
db2 unquiesce database
db2 connect reset

15. Update db config params on standby
update db cfg for sample using hadr_local_host ls129
update db cfg for sample using hadr_local_svc db2_hadr_2
update db cfg for sample using hadr_remote_host ls128
update db cfg for sample using hadr_remote_svc db2_hadr_1
update db cfg for sample using hadr_remote_inst db2inst1
update db cfg for sample using hadr_syncmode sync
update db cfg for sample using hadr_timeout 3
update db cfg for sample using hadr_peer_window 120

16. Start HADR on standby
db2 deactivate db sample
db2 start hadr on database sample as standby

17. Start HADR on primary
db2 deactivate db sample
db2 start hadr on db sample as primary

Wednesday, October 27, 2010

SQL1277W Connection fails because a previous restore operation is still in progress

If you get an SQL1277W after having performed a DB2 redirected restore while trying to connect to the database, one possible cause could be that the backup was taken as a different user (such as root) instead of the instance owner. For databases that use automatic storage, you will likely first get warnings during the restore process saying that tablespaces using automatic storage can get their containers redefined at the end of the restore process.
In short, if you are doing a redirected restore of a database using automatic storage and the restore is not successful completely and you are unable to connect to the db after restoring, it is worth checking if the backup was taken as instance owner or not. Other symptoms could be that one or more tablespaces go offline in drop pending state.

Labels:

SQL1031N if some data files go missing

Sometimes, people inadvertently delete containers belonging to a DB2 database directly from the file system instead of dropping the database. In such cases, it can be frustrating to try and restore the database from a backup because the restore command fails as does the attempt to first try and drop the database.
The following might help...

$ db2 drop db cogcondb
SQL1031N The database directory cannot be found on the indicated file system.
SQLSTATE=58031

That could mean that the entry for the db is present in the system db directory but it could not be found in the local db directory (which will be in DFTDBPATH by default). So, let us uncatalog it from the system db directory and manually remove any leftover files for this db (including its logs) from the local db directories.

Go to each local db directory (such as C: , D: etc on Windows and /home/db2inst1, etc on Unix. Any directory that contains NODE0000 is two levels under a db directory).
Run the command "db2 list db directory on c:" or "db2 list db directory on /home/db2inst1" for example and make a note of what database is present under which home directory.
Next, issue a db2 list db directory command which will list all the databases that the system as such is aware of. If you find that there are databases which got listed on local but which are not present in the system directory, you can first catalog them (by issuing a catalog database on /home/db2inst1 for example). Cataloguing will create an entry for the database in the system directory after which you can drop it with the usual DROP database command.

Labels: