Where do I put the files - LOB or filesystem?
Here is another issue on which I wish I could speak authoritatively from hard-won experience but unfortunately, I can't do that yet. The question is, if I have an application where I need to store lots of files (such as audio, video, resumes, etc), should I store them in LOB columns inside the database or should I just store them on the filesystem and hold the path in the database? What are the pros and cons of each approach?
Like I said, since I haven't done it myself, I can only compile the aspects that need to be considered while making this decision:
Data Integrity
This is obviously going to be higher in the LOBs and is in fact the biggest positive of storing the information inside the database as opposed to outside it. There is no danger of having invalid links due to lost files (someone deleted a directory on the filesystem by mistake). This point becomes even stronger if the files are not just going to be inserted and possibly deleted later but if they are also going to be actively updated sometimes. Having them inside the database will give you the benefit of automatic locking and concurrency control so LOBs win here hands down.
Indexing and Searching
If you are going to be not just shuffling the files up and down but are also going to be doing a lot of searching on keywords, content, etc, then you have to consider the fact that having it in LOBs might give you the benefits of database indexing (such as interMedia indexes for Oracle or DB2 Extenders). This is not to say that it can't be done in the filesystem - there are probably products (i vaguely remember that Thunderbird could do this) which can do this stuff very well on the filesystem too. So i would say that both options are even here -depending on what you have and what you need, decide on LOBs or filesystem.
Security
If you have stringent security reqts, it is probably better to leave the stuff in LOBs and let db security take care, especially if the security is tied up to other relational data within the database (such as, if I have access to one department's employees, i should only be able to view their resumes, etc). This sort of stuff is possible in the filesystem also but it can get messy if it gets complex. So LOBs are a marginal winner here I would think.
Disk Space
If files are stored in LOBs, they will almost always take more space than storing on the filesystem. There are various factors involved here - one is that the RDBMS (whether Oracle or DB2) will allocate space in chunks for LOBs and hence you will always have to sacrifice some free space on the last few blocks/pages if you store in a LOB. With a filesystem, it probably means that you only have a little bit of wasted space on the last sector. So be aware of how "chunking" works for LOBs in your DB.
Another factor which can increase size is the need for handling rollbacks on LOBs - in Oracle, LOBs dont go into the rollback segment, instead different versions are stored in the LOB segment itself and the amount of space reserved for this is controllable by pctfree or by the retention_time (if you are using Automatic Undo Management). Check asktom.oracle.com for an explanation of this issue (in fact his new book on Designing Oracle has a chapter devoted to LOBS). Of course, this is configurable so it is not to say that you will certainly lose space - if you dont want rollback, then make sure you dont configure for it. However, this may work differently in other DBMS's so keep it in mind still.
A third factor which can increase size is if your DBMS doesn't support incremental backups or if it is too complex to set it up. With filesystems, one may argue that, if the directories are dated and made read-only after some time, some scripting can be done to perform incremental backups but if you are in a situation where all the files are inside the DB and you have to take a full backup, then this can become a big issue. Especially if you are also logging the LOB changes to the redo log files, then recovery times may also be affected. Again, it depends on what you need - if you need logging of LOBs for some reason, then the filesystem can't do it anyway but if you don't, then there may not be a strong reason to keep the files inside the DB. Something to consider.
Lastly, size may also be affected by compression - the operating system provides file compression options while this may not be available in LOBs or may not perform as well. I don't know, I am just guessing but this could be a factor too.
On the whole, with the size issue, filesystems seem to be the winner.
Size Limitations
Be aware of the LOB size limitations in your DBMS and compare with filesize limitations and any limits on the number of files and directories in your filesystem. The DBMS may only support LOBs upto 2 GB in some cases, so please check this.
Administration
Folder management can become pretty complex if you are storing 10s of thousands of files on the filesystem. Think about how it will be archived, purged, etc. Even with LOBs, the DBA will have to do some careful planning of where to put the LOBs, how to backup, etc, so again it may be more or less even here but it may be somewhat easier inside the DB.
Complexity
In general, it may be a bit painful to understand LOBs and work with them from your application compared to just pointing to the file from a browser and picking it all up. Be aware of limitations with jdbc, versions, etc, etc and understand LOBs fully before starting work on them.
Other issues
Also consider how things like load balancing, having multiple applications read the same file at the same time, etc may affect you. Load balancing may be especially thorny to handle with the DBMS compared to setting up on a router.
Lastly, even if you want to store the files in the filesystem, dont hardcode the entire path to the files inside the db - use UNC instead as that will make it more portable.
On the whole, the tilt may be slightly towards filesystems but then, I can't say for sure as I haven't worked hands-on on this. Your comments welcome!
Like I said, since I haven't done it myself, I can only compile the aspects that need to be considered while making this decision:
Data Integrity
This is obviously going to be higher in the LOBs and is in fact the biggest positive of storing the information inside the database as opposed to outside it. There is no danger of having invalid links due to lost files (someone deleted a directory on the filesystem by mistake). This point becomes even stronger if the files are not just going to be inserted and possibly deleted later but if they are also going to be actively updated sometimes. Having them inside the database will give you the benefit of automatic locking and concurrency control so LOBs win here hands down.
Indexing and Searching
If you are going to be not just shuffling the files up and down but are also going to be doing a lot of searching on keywords, content, etc, then you have to consider the fact that having it in LOBs might give you the benefits of database indexing (such as interMedia indexes for Oracle or DB2 Extenders). This is not to say that it can't be done in the filesystem - there are probably products (i vaguely remember that Thunderbird could do this) which can do this stuff very well on the filesystem too. So i would say that both options are even here -depending on what you have and what you need, decide on LOBs or filesystem.
Security
If you have stringent security reqts, it is probably better to leave the stuff in LOBs and let db security take care, especially if the security is tied up to other relational data within the database (such as, if I have access to one department's employees, i should only be able to view their resumes, etc). This sort of stuff is possible in the filesystem also but it can get messy if it gets complex. So LOBs are a marginal winner here I would think.
Disk Space
If files are stored in LOBs, they will almost always take more space than storing on the filesystem. There are various factors involved here - one is that the RDBMS (whether Oracle or DB2) will allocate space in chunks for LOBs and hence you will always have to sacrifice some free space on the last few blocks/pages if you store in a LOB. With a filesystem, it probably means that you only have a little bit of wasted space on the last sector. So be aware of how "chunking" works for LOBs in your DB.
Another factor which can increase size is the need for handling rollbacks on LOBs - in Oracle, LOBs dont go into the rollback segment, instead different versions are stored in the LOB segment itself and the amount of space reserved for this is controllable by pctfree or by the retention_time (if you are using Automatic Undo Management). Check asktom.oracle.com for an explanation of this issue (in fact his new book on Designing Oracle has a chapter devoted to LOBS). Of course, this is configurable so it is not to say that you will certainly lose space - if you dont want rollback, then make sure you dont configure for it. However, this may work differently in other DBMS's so keep it in mind still.
A third factor which can increase size is if your DBMS doesn't support incremental backups or if it is too complex to set it up. With filesystems, one may argue that, if the directories are dated and made read-only after some time, some scripting can be done to perform incremental backups but if you are in a situation where all the files are inside the DB and you have to take a full backup, then this can become a big issue. Especially if you are also logging the LOB changes to the redo log files, then recovery times may also be affected. Again, it depends on what you need - if you need logging of LOBs for some reason, then the filesystem can't do it anyway but if you don't, then there may not be a strong reason to keep the files inside the DB. Something to consider.
Lastly, size may also be affected by compression - the operating system provides file compression options while this may not be available in LOBs or may not perform as well. I don't know, I am just guessing but this could be a factor too.
On the whole, with the size issue, filesystems seem to be the winner.
Size Limitations
Be aware of the LOB size limitations in your DBMS and compare with filesize limitations and any limits on the number of files and directories in your filesystem. The DBMS may only support LOBs upto 2 GB in some cases, so please check this.
Administration
Folder management can become pretty complex if you are storing 10s of thousands of files on the filesystem. Think about how it will be archived, purged, etc. Even with LOBs, the DBA will have to do some careful planning of where to put the LOBs, how to backup, etc, so again it may be more or less even here but it may be somewhat easier inside the DB.
Complexity
In general, it may be a bit painful to understand LOBs and work with them from your application compared to just pointing to the file from a browser and picking it all up. Be aware of limitations with jdbc, versions, etc, etc and understand LOBs fully before starting work on them.
Other issues
Also consider how things like load balancing, having multiple applications read the same file at the same time, etc may affect you. Load balancing may be especially thorny to handle with the DBMS compared to setting up on a router.
Lastly, even if you want to store the files in the filesystem, dont hardcode the entire path to the files inside the db - use UNC instead as that will make it more portable.
On the whole, the tilt may be slightly towards filesystems but then, I can't say for sure as I haven't worked hands-on on this. Your comments welcome!
Labels: LOB CLOB BLOB filesystem pointer locator files storage
1 Comments:
This comment has been removed by the author.
By
Chandan Prabhudeva, at 10:09 PM
Post a Comment
<< Home