- Use set processing
There are several reasons why set processing is so much faster. First of all, you can take
advantage of multi-block reads. Multi-block processing can literally give you a 100x speedup. In contrast, performing a huge number of single-block reads is rarely a good choice. The multi-block method is especially attractive if your tables are partitioned such that Oracle only scans a few partitions.
Set processing also avoids the degradation due to issuing thousands (or even millions) of separate SQL statements. This degradation is not really eliminated by the use of bind variables. Even if you use bind variables, Oracle must still process each statement, and send back the result set. In extreme cases, the time to send the SQL statement repeatedly over the network actually becomes a bottleneck.
SQL scripts, rather than a procedural language, are oftentimes the better approach. Of course,
with SQL scripts, you are pretty much forced to handle the data as a group. Set processing means
you may need staging tables to hold intermediate results. This is common practice.
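As a sketch (the table and column names here are hypothetical), compare a row-at-a-time loop with a single set-based statement:

```sql
-- Row-at-a-time (slow): a procedural loop issues one INSERT per row,
-- paying the per-statement overhead thousands of times.
-- Set-based (fast): one statement moves the whole group and can take
-- advantage of multi-block reads.
INSERT INTO orders_archive (order_id, customer_id, order_date)
SELECT order_id, customer_id, order_date
FROM   orders
WHERE  order_date < DATE '2004-01-01';
```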
- Take advantage of multi-block reads
Multi-block reads are a key reason why a full table scan can be performed so quickly. Keep in mind, however, that two conditions must be met:
- The database parameter Db_File_Multiblock_Read_Count must be set correctly
- The table or index being scanned must have extent sizes of at least 1 MB
If the multiblock parameter is set too low at the database level, you can easily alter your session
to set the parameter higher. The second point above recognizes that Oracle will not continue a
multi-block scan across extent boundaries. If most of the extents are greater than 1 MB, it’s
probably okay. (This is one of those cases where extent sizing really does matter.)
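For example, assuming the DBA has left the parameter low at the database level, a session-level override might look like this (the value 128 is illustrative; the useful maximum depends on your block size and I/O subsystem):

```sql
ALTER SESSION SET db_file_multiblock_read_count = 128;
```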
It is especially important to optimize your reads if you are using Oracle parallelism. Why launch
many extra processes if you don’t first optimize what a single process can do?
- Optimize Queries before using Oracle Parallelism
You can often speed up a long-running query by using the PARALLEL hint. But first, make sure the query runs as efficiently as possible. If
the query runs badly with one process, it won’t be much better with six or eight--plus, you will
be consuming server resources unnecessarily.
As a general rule of thumb, a parallel degree of 6 typically gives excellent performance. Of
course, this assumes that you have adequate reserves (both CPU and disk throughput) on your server.
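A minimal sketch, using a hypothetical sales table, of requesting a parallel degree of 6 only after the serial plan is already efficient:

```sql
SELECT /*+ PARALLEL(s, 6) */ region, SUM(amount)
FROM   sales s
GROUP  BY region;
```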
- Avoid massive deletes
Deleting most of a large table is very slow. Instead, extract the rows you DO want, truncate the table, and then put the rows back in. This can easily be 10x faster.
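One possible shape of this technique (names are hypothetical, and indexes, grants, and constraints are ignored for brevity):

```sql
-- 1. Extract only the rows you want to keep.
CREATE TABLE orders_keep NOLOGGING AS
  SELECT * FROM orders
  WHERE  order_date >= DATE '2004-01-01';

-- 2. Empty the original table instantly.
TRUNCATE TABLE orders;

-- 3. Put the kept rows back with a direct-path insert.
INSERT /*+ APPEND */ INTO orders
  SELECT * FROM orders_keep;
COMMIT;
```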
- Use summary tables and materialized views
A summary table pre-computes the answer to an expensive query ahead of time. A materialized view used in this manner is just like a table, but you can rebuild it with a single
command. Remember to use Oracle parallelism to speed up the refresh.
Whether you use a materialized view, or actual table, the idea is to create a “pre-digested” form
of the data. You include only the columns and rows that meet your conditions. Once you build
the summary table, you can also build custom indexes that optimize your query.
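A sketch of a summary materialized view (object and column names are invented for illustration):

```sql
-- Pre-digested daily totals; PARALLEL speeds up the build and refresh.
CREATE MATERIALIZED VIEW sales_daily_mv
  PARALLEL 6
  BUILD IMMEDIATE
AS
  SELECT TRUNC(sale_date) AS sale_day,
         SUM(amount)      AS total_amount
  FROM   sales
  GROUP  BY TRUNC(sale_date);
```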
- Avoid Excessive Commits
Committing too often adds needless overhead; avoid committing more frequently than every 1,000 rows. Committing every 10,000 rows is a good rule of thumb.
Unless you are working with many millions of rows, further increasing the commit interval
doesn’t really offer much improvement.
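In PL/SQL, the commit interval can be handled with a simple counter (a sketch with hypothetical tables; real processing replaces the comment):

```sql
DECLARE
  v_count PLS_INTEGER := 0;
BEGIN
  FOR rec IN (SELECT order_id FROM orders_stage) LOOP
    -- ... process rec here ...
    v_count := v_count + 1;
    IF MOD(v_count, 10000) = 0 THEN
      COMMIT;   -- commit every 10,000 rows, not every row
    END IF;
  END LOOP;
  COMMIT;       -- pick up the remainder
END;
/
```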
- Analyze new tables
When you create a new table, gather optimizer statistics on it right away. This avoids the common performance bottleneck caused by incomplete or inaccurate table statistics. It’s not necessary to
sample all the rows when gathering statistics; the “estimate” option is usually fine.
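For example, a 10 percent sample via the classic ANALYZE syntax (DBMS_STATS.GATHER_TABLE_STATS with an estimate_percent argument is the more modern route; the table name is hypothetical):

```sql
ANALYZE TABLE orders_archive ESTIMATE STATISTICS SAMPLE 10 PERCENT;
```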
- Turn off logging when feasible
Oracle allows you to deactivate logging (redo generation) for certain operations, such as creating indexes, inserting rows, and CREATE TABLE AS SELECT. To do this, set the object to NOLOGGING (ALTER TABLE [name] NOLOGGING) and, for inserts, add the /*+ APPEND */ hint. Remember that deactivating logging means the data
cannot be recovered after a disk crash and database recovery, so this is not appropriate in every case.
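Two illustrative cases (object names invented):

```sql
-- Build an index without generating redo.
CREATE INDEX orders_date_ix ON orders (order_date) NOLOGGING;

-- CREATE TABLE AS SELECT without generating redo.
CREATE TABLE orders_copy NOLOGGING AS
  SELECT * FROM orders;
```

After a NOLOGGING operation, take a fresh backup so the object remains recoverable going forward.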
- Speed Up Inserts
Inserts can run dramatically faster when combined with the NOLOGGING feature. For inserting rows, set the table to NOLOGGING (using ALTER TABLE …),
then use this syntax: INSERT /*+ APPEND */ .
Remember, however, that all the new rows will be placed at the end of the table--above the
“high-water mark.” This means that if you are performing deletes from the table, these “holes” will
never be filled. The table will grow rapidly. Eventually, you will need to rebuild the table to
crunch it down.
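A minimal sketch of the pattern (table names hypothetical):

```sql
ALTER TABLE sales_hist NOLOGGING;

-- Direct-path insert: rows are appended above the high-water mark.
INSERT /*+ APPEND */ INTO sales_hist
  SELECT * FROM sales_stage;
COMMIT;   -- the session must commit before it can query the table again
```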
- Avoid building flat files
Rather than extracting data to a flat file and reloading it, it is usually much faster to transfer the data over a database link. In other words, don’t take the data out of the database if you’re just
going to put it back into another Oracle database.
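For instance, with a hypothetical link named remote_db:

```sql
-- Moves the rows directly, with no flat file or SQL*Loader step.
INSERT INTO orders_copy
  SELECT * FROM orders@remote_db;
COMMIT;
```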
- Use Oracle parallelism
Large jobs can often benefit from Oracle's Parallel Query Option (PQO). On most systems, using a parallel degree of 6 gives excellent performance. Of
course, this assumes that you have the spare resources (disk and CPU).
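Besides the PARALLEL hint, a default degree can be set on the table itself, so queries against it may run in parallel without hints (hypothetical table name):

```sql
ALTER TABLE sales PARALLEL 6;
```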
- Use bind variables
When the same statement is executed many times with different values, use bind variables. In this way, the database engine avoids a re-parse of each statement. Also, a large
number of unique SQL statements tends to flood the shared pool, causing other SQL statements
to be released as the engine makes room for the new SQL. This will annoy the DBA.
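A sketch in dynamic PL/SQL (hypothetical table; the literal version is shown commented out for contrast):

```sql
DECLARE
  v_id NUMBER := 1001;
BEGIN
  -- Literal (bad): concatenating v_id creates a distinct statement
  -- text per value, forcing a hard parse each time.
  -- EXECUTE IMMEDIATE 'DELETE FROM orders WHERE order_id = ' || v_id;

  -- Bind variable (good): one shared statement, parsed once.
  EXECUTE IMMEDIATE 'DELETE FROM orders WHERE order_id = :id'
    USING v_id;
END;
/
```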
- Design partitions to match the queries
A good partitioning scheme allows the optimizer to reduce (or “prune”) most of the data to be searched, leaving a small group that still meets your
conditions. In order for this to happen, the partition key must match the condition used in your
queries. For example, if your queries contain a date restriction, then you should partition by date.
There are two main ways that partitioning can speed up queries:
- If we need to perform a full table scan, we can substitute a full partition scan. In this
case, having lots of partitions really helps.
- If we perform an index scan, the index will be much smaller, and the scan will be slightly
more efficient. This boost may not be too noticeable unless you are doing a huge number of index scans.
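A sketch of date-based range partitioning (names and partition boundaries are illustrative):

```sql
CREATE TABLE sales (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION p2003 VALUES LESS THAN (DATE '2004-01-01'),
  PARTITION p2004 VALUES LESS THAN (DATE '2005-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- A matching date restriction lets the optimizer prune to p2004 only:
SELECT SUM(amount)
FROM   sales
WHERE  sale_date >= DATE '2004-01-01'
AND    sale_date <  DATE '2005-01-01';
```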