Saturday, August 11, 2012

Oracle recovery strategies

Database administrators must simulate data loss scenarios and test their recovery plans. Unfortunately, many database administrators never test their backups or their recovery strategies. Testing the recovery plan validates the processes in place and keeps administrators abreast of recovery issues and pitfalls. The four major causes of data loss are media failures, human errors, data corruption, and natural disasters. A recovery plan must be devised to deal with each event.

  • Media Failure
A common example of media failure is a disk head crash that causes the loss of all database files on a disk drive. All files associated with a database are vulnerable to a disk crash, including datafiles, control files, online redo logs, and archived logs. Recovery from media failure involves restoring all the affected files from backup and then recovering them. In the worst-case scenario, the full database may have to be restored and recovered. Media failure is the primary concern of a backup and recovery strategy, because it typically requires restoring some or all database files and applying redo during recovery.

  • Human Error
Users can accidentally delete data or drop tables. A good way to minimize such situations is to grant each user only the database privileges that are absolutely required. Human errors will still occur, however, and database administrators must be prepared to deal with them. Taking logical backups (exports) of tables that are likely targets of human error is a good way to handle accidental table drops. Flashback Query, a feature introduced in Oracle9i, is well suited to recovering from accidental deletes or inserts. As a last resort, the entire tablespace containing the affected object may have to be restored from backup and recovered. Each of these recovery methods for human errors should be tested.
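As an illustration of the Flashback Query approach, here is a minimal sketch in Python that only builds the SQL text; it does not connect to a database. The helper names (`flashback_select`, `deleted_rows_select`), the bind variable `:minutes_ago`, and the table `hr.employees` are assumptions made for this example. The `AS OF TIMESTAMP` clause is available from Oracle9i Release 2 onward, and how far back it can reach depends on how much undo the database has retained.

```python
import re

# Plain Oracle identifier, optionally schema-qualified (e.g. "hr.employees").
# The table name is validated because identifiers cannot be passed as bind variables.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*(\.[A-Za-z][A-Za-z0-9_$#]*)?")


def flashback_select(table: str) -> str:
    """Return a query that reads `table` as it was :minutes_ago minutes ago."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"not a plain table name: {table!r}")
    return (
        f"SELECT * FROM {table} "
        "AS OF TIMESTAMP (SYSTIMESTAMP - NUMTODSINTERVAL(:minutes_ago, 'MINUTE'))"
    )


def deleted_rows_select(table: str) -> str:
    """Return a query listing rows that existed :minutes_ago minutes ago but are gone now."""
    return f"{flashback_select(table)} MINUS SELECT * FROM {table}"


# Hypothetical table name, used only to show the generated statement.
print(flashback_select("hr.employees"))
```

The statement returned by `deleted_rows_select` could be wrapped in an `INSERT INTO ... SELECT` to put accidentally deleted rows back. Note that `MINUS` with `SELECT *` does not work on tables with LOB columns, so a real recovery script would likely list columns explicitly.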

  • Data Block Corruption
Faulty hardware or an operating system bug can result in an Oracle block that is not in a recognizable Oracle format, or whose contents are internally inconsistent. Oracle's Block Media Recovery feature restores and recovers only the corrupt blocks and does not require an entire datafile to be recovered. This lowers the Mean Time to Recovery (MTTR) because only the blocks needing recovery are restored and recovered, while the affected datafiles remain online.

  • Disasters
Earthquakes and floods can damage an entire data center facility. Such events require restore and recovery at an alternate site, which means backup copies must be kept at the alternate site in addition to the primary site. Off-site restores should be thoroughly tested and timed to ensure that businesses can survive such disasters without compromising their service level agreements. By thoroughly testing and documenting the recovery plan, database administrators can substantially limit unexpected outages and enhance their database system's availability.
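One way to make "tested and timed" concrete is to wrap each restore drill in a small timing harness and compare the elapsed time with the recovery-time target from the service level agreement. The sketch below is a minimal illustration, not a prescribed tool: `time_restore_drill`, `DrillResult`, the `fake_restore` stand-in, and the four-hour target are all assumptions made for this example. In practice the callable would launch the actual off-site restore and recovery procedure.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    elapsed_seconds: float
    target_seconds: float

    @property
    def within_target(self) -> bool:
        """True when the drill finished inside the recovery-time target."""
        return self.elapsed_seconds <= self.target_seconds


def time_restore_drill(
    restore: Callable[[], None],
    target_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> DrillResult:
    """Run `restore` once and record how long it took against the target."""
    start = clock()
    restore()
    return DrillResult(elapsed_seconds=clock() - start, target_seconds=target_seconds)


def fake_restore() -> None:
    # Stand-in for the real procedure (hypothetical); it only sleeps briefly.
    time.sleep(0.01)


# Assumed target of four hours, chosen purely for illustration.
result = time_restore_drill(fake_restore, target_seconds=4 * 3600)
print(f"restore took {result.elapsed_seconds:.2f}s, within target: {result.within_target}")
```

Recording these results after every drill gives a documented history showing whether restore times are drifting toward the limit as the database grows.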

