THIS BLOG INCLUDES THE LATEST REVIEWS BY BASSOCANTOR

Oracle Performance: Measuring RAC Cache Fusion Internode Time

The Overhead of Running RAC

Each RAC cluster relies on a fast private interconnect among the nodes in the cluster. A block needed by one node can be sent quickly from another node that already has that block cached. This is called "Cache Fusion."

There IS a cost to sending these blocks around the nodes; the transmission is not instantaneous, and in some cases it can actually become a bottleneck. Of course, modern Cache Fusion is FAR faster than in the old days of OPS (Oracle Parallel Server), where a block had to be written to disk by one node, then read by another node. That "ping" could easily cause a 10 ms delay just for one block. Well, we are much better now!

I always blame the network

If you check the AWR report for a node on your cluster, you can see the SQL statements that are slowed by cluster wait time. On the large systems I have analyzed, some statements are slowed by 10% or more due to these delays.
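
The same kind of information can be pulled straight from the AWR history views. Here is a minimal sketch, assuming you are licensed to query the DBA_HIST views. The substitution variables &begin_snap and &end_snap and the top-20 cutoff are placeholders chosen for illustration; they are not part of the AWR report itself.

```sql
-- Sketch: SQL ranked by time spent on cluster waits, for one instance.
-- &begin_snap / &end_snap are hypothetical SQL*Plus substitution variables.
SELECT *
FROM (
  SELECT sql_id,
         ROUND(SUM(elapsed_time_delta) / 1000000, 1) AS elapsed_sec,
         ROUND(SUM(clwait_delta) / 1000000, 1)       AS cluster_wait_sec,
         -- share of elapsed time attributed to cluster waits
         ROUND(100 * SUM(clwait_delta) / SUM(elapsed_time_delta), 1) AS cluster_pct
  FROM dba_hist_sqlstat
  WHERE instance_number = 1
    AND snap_id BETWEEN &begin_snap AND &end_snap
  GROUP BY sql_id
  HAVING SUM(elapsed_time_delta) > 0   -- avoid dividing by zero
  ORDER BY SUM(clwait_delta) DESC
)
WHERE ROWNUM <= 20;
```

Statements with a high CLUSTER_PCT are the ones most exposed to interconnect delays.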

It is not unusual to blame "the network" for RAC performance issues. The only problem with that idea is that it's often tough to prove. So, how do you figure out how fast your Cache Fusion really is?

An Easy Way to Measure Cache Fusion

Here is an easy way to check the RAC internode time. One of the quickest events that Oracle uses to communicate is the "gc cr grant 2-way" wait event. It's normally very fast (typically 1 ms or less). This is similar to a fast network "ping."

Here's the key point:  Just think of what would happen if the time to send a block in the cluster took much longer than 1 ms.  If that time doubled, for instance, your application could be seriously degraded.

We can get a historical listing, ordered by snapshot, of this fast "grant" event. In this way, you can see if RAC has been having trouble communicating among the nodes.

Plug-in Power

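-- Average "gc cr grant 2-way" wait time per AWR snapshot interval, for one instance.
-- Supply your own snapshot range for &begin_snap and &end_snap.
WITH BASE AS (SELECT instance_number, snap_id, total_waits, time_waited_micro/1000 timemsec,
 -- the counters are cumulative, so fetch the previous snapshot's values
 LAG(time_waited_micro/1000, 1) OVER (PARTITION BY instance_number ORDER BY snap_id) AS prev_time_msec,
 LAG(total_waits, 1) OVER (PARTITION BY instance_number ORDER BY snap_id) AS prev_waits
FROM dba_hist_system_event
WHERE event_name = 'gc cr grant 2-way'
AND instance_number = 1
AND snap_id BETWEEN &begin_snap AND &end_snap
)
SELECT b.snap_id, b.instance_number NODE,
TO_CHAR(s.begin_interval_time, 'dd-mon-yy-hh24:mi') BEG,
 (total_waits-prev_waits) "#WAITS",
 -- the .001 guards against division by zero
ROUND((timemsec-prev_time_msec)/(.001+total_waits-prev_waits), 1) "RATE"
FROM BASE b, dba_hist_snapshot s
WHERE b.instance_number = s.instance_number
AND b.snap_id = s.snap_id
AND (total_waits-prev_waits) > 99900 -- skip intervals with too few waits to be meaningful
ORDER BY 1
/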

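In the above script, I use the analytic function LAG to find the difference between two consecutive snapshot rows. The counters in the table are cumulative, so the difference gives the waits and wait time for just that one interval.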
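
To see the LAG arithmetic in isolation, here is a small sketch that runs against made-up numbers instead of the AWR tables. The three sample rows and the column names are invented for illustration only.

```sql
-- Three made-up cumulative readings standing in for AWR snapshots.
WITH samples AS (
  SELECT 1 AS snap_id, 1000000 AS total_waits, 300000 AS time_msec FROM dual UNION ALL
  SELECT 2, 3000000, 900000 FROM dual UNION ALL
  SELECT 3, 4000000, 1400000 FROM dual
)
SELECT snap_id,
       -- waits that occurred since the previous snapshot
       total_waits - LAG(total_waits) OVER (ORDER BY snap_id) AS waits_in_interval,
       -- time accumulated in the interval divided by waits in the interval
       ROUND((time_msec - LAG(time_msec) OVER (ORDER BY snap_id))
             / (total_waits - LAG(total_waits) OVER (ORDER BY snap_id)), 2) AS avg_msec
FROM samples
ORDER BY snap_id;
```

The first row comes back with NULLs, since it has no previous row to compare against. The second row shows 2,000,000 waits averaging 0.3 ms, and the third shows 1,000,000 waits averaging 0.5 ms.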

Expected Output

On most systems I analyze, the internode time is 1 ms or less.  In the output below, you can see that the internode time is rock-steady at just .3 ms.  In my experience, that is about the best possible.

  SNAP_ID       NODE BEG                 #WAITS       RATE
--------- ---------- --------------- ---------- ----------
    17236          7 13-apr-09-05:00    1942375         .3
    17237          7 13-apr-09-06:00    1913682         .3
    17238          7 13-apr-09-07:00    3763238         .3
    17239          7 13-apr-09-08:00    2360403         .4
    17240          7 13-apr-09-09:00    1694804         .3
    17241          7 13-apr-09-10:00    1564779         .3
    17242          7 13-apr-09-11:00     551387         .3

On a well-designed system, the interconnect rate doesn't change much.  I typically see a few spikes to about 1.5 ms, but that's about it.

In the script above, be sure to put in your own snapshot IDs. Also, you may want to check the internode performance among all the nodes, not just node 1, as shown in the script above.
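
For a quick look at every node at once, one option is to read the live counters in gv$system_event. This is only a sketch and is not a replacement for the historical script: it reports the average since each instance started, so short spikes get smoothed out.

```sql
-- Sketch: average "gc cr grant 2-way" wait per instance since startup.
SELECT inst_id,
       total_waits,
       -- NULLIF avoids dividing by zero on an instance with no waits yet
       ROUND(time_waited_micro / 1000 / NULLIF(total_waits, 0), 2) AS avg_msec
FROM gv$system_event
WHERE event = 'gc cr grant 2-way'
ORDER BY inst_id;
```

If one node shows a noticeably higher average than the others, that is the node to run the historical script against first.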

More Info

Of course, Cache Fusion is a lot more than just this one "grant" event. Don Burleson has a nice overview of Cache Fusion here.
