Sunday, April 24, 2011

Nulls in Ordering

You want to find the tables with the highest number of rows in a database. Pretty simple, right? You whip up the following query:

select owner, table_name, num_rows
from dba_tables
order by num_rows desc;

And, here is the output:

OWNER         TABLE_NAME                       NUM_ROWS
------------- ------------------------------ ----------
CRM_ETL       GTT_RES_DLY
CRM_ETL       GTT_RES_PRDCT_CT
CRM_ETL       GTT_RES_PRDCT_RATE_CT
CRM_ETL       GTT_RRSD_DRVR
CRM_ETL       GTT_SUS_RES
SYS           L$1
SYS           L$2
SYS           WRI$_ADV_OBJSPACE_TREND_DATA
SYS           WRI$_ADV_OBJSPACE_CHROW_DATA
SQLTXPLAIN    SQLG$_TAB_SUBPART_COLUMNS
SQLTXPLAIN    SQLG$_DBA_SUBPART_HISTOGRAMS
SQLTXPLAIN    SQLG$_WARNING
... output truncated ...

Whoa! The NUM_ROWS column comes up blank. Actually those are nulls. Why are they coming up first? Because these tables have not been analyzed. CRM_ETL appears to be an ETL user, and the tables with GTT_ in their names look like global temporary tables, hence no statistics. The others belong to SYS and SQLTXPLAIN, which are Oracle default users and probably never analyzed. Nulls are not comparable to actual values, so they are neither less than nor greater than the others. Oracle treats nulls as larger than any value, so by default they come last in an ascending sort and first in a descending one - and a descending sort is what you want for the highest row counts.

You need to find the tables with the highest number of rows, fast. If you scroll down you will eventually see these rows, but that takes time and tries your patience. You could add a predicate such as WHERE num_rows IS NOT NULL, but it's not really elegant; the nulls still have to be processed. And what if you want the tables with null num_rows shown as well? That construct eliminates them entirely. So you need a different approach.
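For reference, here is a sketch of that filtering workaround described above; note that the unanalyzed tables vanish from the output entirely:

```sql
-- Workaround: filter the nulls out before sorting.
-- Tables with null num_rows no longer appear at all.
select owner, table_name, num_rows
from dba_tables
where num_rows is not null
order by num_rows desc;
```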


Nulls Last

If you want to fetch the nulls but push them to the end of the list rather than the top, add the NULLS LAST clause to the ORDER BY, as shown below.

select owner, table_name, num_rows
from dba_tables
order by 3 desc nulls last;

Here is the output:

OWNER            TABLE_NAME       NUM_ROWS
---------------- --------------- ----------
CRM_ETL          F_SALES_SUMM_01 1664092226
CRM_ETL          F_SALES_SUMM_02 948708587
CRM_ETL          F_SALES_SUMM_03 167616243
... output truncated ...

This solves the problem. The nulls are still shown, but after the last of the rows with non-null num_rows values.
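For completeness, the opposite clause NULLS FIRST also exists, should you explicitly want the null rows at the top (this is also the default behavior for a descending sort):

```sql
-- Explicitly place the nulls at the top of the list
select owner, table_name, num_rows
from dba_tables
order by num_rows desc nulls first;
```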

Saturday, April 16, 2011

Can I Fit an 80MB Database Completely in an 80MB Buffer Cache?

This is in the Series "100 Things You Probably Didn't Know About Oracle". If you haven't already, I urge you to read the other parts -

  • Part 1 (Commit does not force writing of buffers into the disk), 
  • Part 2 (How Oracle Locking Works), 
  • Part 3 (More on Interested Transaction Lists).

During the recently concluded Collaborate 11 (http://collaborate11.ioug.org) I was asked a question which led me to the idea for this entry - the fourth in the series. If the database size is 100GB and you have a 100GB buffer cache, all the blocks will be in the memory and there will never be an I/O to the datafiles other than the regular DBWR lazy writes, right?

This is a very important question and you must consider the implications carefully. Many folks assume that a large buffer cache eliminates or reduces buffer-related I/O - a very wrong assumption. A large buffer cache helps, but the relationship between buffers and blocks is not one-to-one. A block may have more than one buffer in the buffer cache. How so? Let's see how that happens. We will examine this in a non-RAC database to keep it simple.

Setup

First, let's create a table and insert some rows into it.

SQL> create table bhtest (col1 number, col2 char(2000));

Table created.

SQL> insert into bhtest values (&n,'&c');
Enter value for n: 1
Enter value for c: x
old   1: insert into bhtest values (&n,'&c')
new   1: insert into bhtest values (1,'x')

1 row created.

SQL> /
Enter value for n: 2
Enter value for c: x
old   1: insert into bhtest values (&n,'&c')
new   1: insert into bhtest values (2,'x')

1 row created.

... insert rows 3, 4 and 5 the same way ...
commit;

Note how I used char(2000) instead of varchar2(2000). The reason is simple: the char datatype always occupies the full declared length - all 2000 bytes - regardless of the actual value stored. So even though I put just 'x' there, each row is quite long.
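You can verify the padding with the vsize() function, which returns the number of bytes a value actually occupies in the row. Assuming a single-byte character set, every col2 value should report the full declared width:

```sql
-- char(2000) is blank-padded, so every value occupies 2000 bytes
select col1, vsize(col2) as bytes_used
from bhtest;
-- expect bytes_used = 2000 for each row, even though we stored just 'x'
```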

After the insertion, check which blocks the rows went into. We can do that by calling the dbms_rowid.rowid_block_number() function:

select col1, dbms_rowid.rowid_block_number(rowid)
from bhtest;

      COL1 DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
---------- ------------------------------------
         4                                 4419
         5                                 4419
         1                                 4423
         2                                 4423
         3                                 4423

5 rows selected.

From the output we can see that rows 1 through 3 went to block 4423, and rows 4 and 5 went to block 4419. We also need the object ID of the table:

SQL> select object_id from dba_objects
  2  where object_name = 'BHTEST'
  3  /

 OBJECT_ID
----------
     99360

This completes the setup. In summary, we know that these rows are in blocks 4419 and 4423 and the object ID is 99360.

Experiment


If possible, clear out the cache by shutting down and restarting the database; this brings up an empty cache. It's not absolutely necessary, though. Now select from the table:

SQL> select * from bhtest;  

This will bring all the blocks of the table into the buffer cache. To check, examine the view V$BH (the buffer headers). The column OBJD is the object_id (actually the DATA_OBJECT_ID; in this case the two are the same, but they may not always be). Here are the columns of interest to us:

  • FILE# - the file_id 
  • BLOCK# - the block number 
  • CLASS# - the type of the block, e.g. data block, segment header, etc. Shown as a code 
  • STATUS - the status of the buffer, Exclusive Current, Current, etc. 
  • LOCK_ELEMENT_ADDR - if there is a buffer lock on the buffer, then the address of the lock 


To make it simpler to understand, we will use a decode() on the class# field to show the type of the block. With that, here is our query:

select file#, block#,
    decode(class#,1,'data block',2,'sort block',3,'save undo block', 4,
'segment header',5,'save undo header',6,'free list',7,'extent map',
8,'1st level bmb',9,'2nd level bmb',10,'3rd level bmb', 11,'bitmap block',
12,'bitmap index block',13,'file header block',14,'unused',
15,'system undo header',16,'system undo block', 17,'undo header',
18,'undo block') class_type, status, lock_element_addr
from v$bh
where objd = 99360
order by 1,2,3
/

Save this query because we will be using it a lot in this experiment. Here is the output.

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     cr         00
         7       4418 segment header     xcur       00
         7       4419 data block         xcur       00
         7       4420 data block         xcur       00
         7       4421 data block         xcur       00
         7       4422 data block         xcur       00
         7       4423 data block         xcur       00

7 rows selected.


There are 7 buffers. In this example we have not restarted the database, so there are two buffers for the segment header. There is one buffer for each data block, from 4419 to 4423. The status is "xcur", which stands for Exclusive Current. I will explain that in detail in a later blog, but in summary it means the buffer was acquired (or filled by a block) with the intention of being modified. If the intention were merely to select, the status would have shown CR (Consistent Read). In this case, since the rows were inserted - modifying the buffers - the blocks were fetched in xcur mode.

Now, from a different session, update a single row. For easier identification I have used Sess2> as the prompt:

Sess2> update bhtest set col2 = 'Y' where col1 = 1;

1 row updated.

From the original session, check the buffers:

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     cr         00
         7       4418 segment header     xcur       00
         7       4419 data block         xcur       00
         7       4420 data block         xcur       00
         7       4421 data block         xcur       00
         7       4422 data block         xcur       00
         7       4423 data block         cr         00
         7       4423 data block         xcur       00

8 rows selected. 

There are 8 buffers now, up one from the previous seven. Note there are two buffers for block ID 4423. One CR and one xcur. Why two?

It's because the update statement modified the block. Instead of modifying the existing buffer, Oracle creates a "copy" of the buffer and modifies that. This copy is now in XCUR status, because it was acquired for the purpose of being modified. The previous buffer of this block, which used to be xcur, is converted to "CR". There can't be more than one XCUR buffer for a specific block; that's why it is called exclusive. Anyone who wants the most recently updated version of a block need only look for the copy with the XCUR status; all others are marked CR.
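If you just want a quick tally rather than the full listing, a grouped variant of the same V$BH query (using the object ID 99360 from our setup) shows how many copies of each block exist in each status:

```sql
-- Count the buffer copies per block and status
select block#, status, count(*) copies
from v$bh
where objd = 99360
group by block#, status
order by block#, status;
```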

Let's continue with this experiment. From a third session, update a different row in the same block.

Sess3> update bhtest set col2 = 'Y' where col1 = 2;

1 row updated.

From the original session, find out the buffers.

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     xcur       00
         7       4418 segment header     cr         00
         7       4419 data block         xcur       00
         7       4420 data block         xcur       00
         7       4421 data block         xcur       00
         7       4422 data block         xcur       00
         7       4423 data block         cr         00
         7       4423 data block         xcur       00
         7       4423 data block         cr         00
         7       4423 data block         cr         00

There are 4 buffers for block 4423 now, up from 2. What happened? Since the buffer had to be modified once more, Oracle created yet another "copy", marked it "xcur", and relegated the older one to "cr". What about the extra CR copy? That appeared because Oracle had to perform something called CR processing to create a CR copy from another CR copy or from an XCUR copy.

Notice how the buffers proliferate. Let's change the experiment a little. From a fourth session, select from the table instead of updating a row:

Sess4>  select * from bhtest ;

From the original session, check for the buffers.

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     xcur       00
         7       4418 segment header     cr         00
         7       4419 data block         xcur       00
         7       4420 data block         xcur       00
         7       4421 data block         xcur       00
         7       4422 data block         xcur       00
         7       4423 data block         cr         00
         7       4423 data block         cr         00
         7       4423 data block         cr         00
         7       4423 data block         cr         00
         7       4423 data block         cr         00
         7       4423 data block         xcur       00

12 rows selected.

Whoa! There are 12 buffers now. Block 4423 alone has 6 buffers, up from 4 earlier. This was merely a select statement, which, by definition, does not change data. Why did Oracle create more buffers for it?

Again, the answer is CR processing. CR processing creates copies of the buffer and rolls them back (or forward) to produce the CR copy as of the correct SCN. This created two additional CR copies. From one block you now have 6 buffers, some of them created by a mere select statement. This should answer the question of whether a buffer cache the size of the database can hold all the blocks of the database.

Free Buffers

While we are on the subject, let's ponder another question: what happens when you flush the buffer cache? Let's see.

SQL> alter system flush buffer_cache;

System altered.

Checking the buffers using the script shown earlier:

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     free       00
         7       4418 segment header     free       00
         7       4419 data block         free       00
         7       4420 data block         free       00
         7       4421 data block         free       00
         7       4422 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00

The buffers are still there, marked as belonging to the object. However, the status is "free", i.e. the buffers can be reused if some session wants them for some other block. If a session wants a free buffer and can't find one, it waits on the wait event "free buffer waits". At that point, Oracle makes room in the buffer cache for the blocks requested by the session by forcing older buffers out: the CR copies are discarded (since they were never written to) and the XCUR copies are written to disk (if not written already).

Who does the writing? The process known as DBWR - the Database Writer - whose processes are generally named DBW0, DBW1, and so on. DBWR is a very lazy process; it sleeps most of the time unless it is woken up by someone. In this case the session (actually its server process) kicks DBWn to write the dirty buffers to disk and mark them non-dirty. This is why a SELECT statement may sometimes cause writes to the data files. Until the buffers are freed up, the session waits patiently, displaying to all that it is waiting on "free buffer waits". You can check this by selecting the EVENT column from V$SESSION.
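To see whether any session is currently waiting like this, you can query V$SESSION directly; the wait event name is "free buffer waits":

```sql
-- Sessions currently waiting for a free buffer
select sid, event, state, seconds_in_wait
from v$session
where event = 'free buffer waits';
```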

Impact on Segment Header

So far we have talked about data blocks. When does the segment header see some action? The segment header does not contain any data, so it does not need to be updated every time a row changes. But when the segment itself is modified, the segment header is updated. Let's see an example by adding a column:

SQL> alter table bhtest add (col3 number);

Table altered.

Check the buffers:

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     free       00
         7       4418 segment header     free       00
         7       4419 data block         free       00
         7       4420 data block         free       00
         7       4421 data block         free       00
         7       4422 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00

16 rows selected.

There are 6 copies of the segment header. The table alteration changed the segment header block - block# 4418. Just like any other block, the buffer was copied and modified, creating multiple copies of the block. When you issue another DDL operation - truncate - the result is the same:

SQL> truncate table bhtest; 

     FILE#     BLOCK# CLASS_TYPE         STATUS     LOCK_ELE
---------- ---------- ------------------ ---------- --------
         7       4416 1st level bmb      free       00
         7       4417 2nd level bmb      free       00
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     free       00
         7       4418 segment header     free       00
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     cr         00
         7       4418 segment header     free       00
         7       4419 data block         free       00
         7       4420 data block         free       00
         7       4421 data block         free       00
         7       4422 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00
         7       4423 data block         free       00

20 rows selected.

There are now additional segment header buffers, since truncate is a DDL command and the segment header is modified. Even though there was no block of the table in the buffer cache (remember, we flushed it), the segment header still needed to be updated. Truncate also marks all blocks as free, resets the high water mark, and updates the bitmap blocks. The bitmap block (BMB) is used in tablespaces with automatic segment space management (ASSM) to record whether a block is free, similar to the functionality of freelists. The truncate caused the BMBs (there are two: 1st level and 2nd level) to be modified as well, so they also came into the buffer cache.
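If you want to look at just the space management blocks, you can filter the same V$BH query on the bmb class codes (8, 9 and 10 in the decode shown earlier):

```sql
-- Show only the bitmap (space management) blocks of our object
select file#, block#, class#, status
from v$bh
where objd = 99360
and class# in (8, 9, 10)
order by block#;
```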

Takeaways

From the above discussion you saw how a table with just two populated blocks ended up with 20 buffers in the buffer cache. Imagine a normal database with, say, 10,000 filled blocks (8 KB x 10,000 = 80 MB). It might easily fill 200,000 buffers, which with an 8 KB block size amounts to 8 KB x 200,000 = 1,600 MB, or about 1.6 GB of buffer cache. The number of buffers consumed depends on several factors:

(1) selects consume fewer buffers
(2) more frequent commits consume fewer buffers, since the demand for CR processing is lower
(3) updates consume more buffers
(4) the more the logical I/O, the greater the need for buffers
(5) DDL activity increases the demand for buffers, even if the blocks are empty

Food for Thought

Now that you know how the buffers are consumed, what do you think the situation is in a RAC environment, where buffers are transferred from one instance to another - better, worse, or about the same? The answer will come in a future blog. Stay tuned.

Fourth Day at Collaborate 11

Thank you to those who attended my fourth and last session Designing Applications for RAC at Collaborate 11. Considering it was at 8:30 AM on Thursday, right after the big party night, you must all be really committed to your work. Much appreciated.

You can get the slides here.

I look forward to seeing you all at Collab next year.

Second Day at Collaborate 11

Many thanks to those who attended my session RAC for Beginners despite the 8 AM timeslot. You must be really motivated. I hope you found the session useful and learned something you can start using.

You can download the slides here.

Thanks to those who came to my second session of the day - Secure Your Database in a Single Day. Hopefully that met your expectations. The slides are found here.

Monday, April 11, 2011

Day One of Collaborate 11 - My Session

Thank you everyone who came to my session which got off with a rocky start due to a recording issue in the facilities. I was also the manager of the Manageability Bootcamp which ran the whole day. My sincere thanks to the speakers who spoke on the track. We had some of the best speakers and best topics.

If you want to get a copy of my session - Wait Events in RAC - you can download it here. Three more sessions and two expert panels to go.

My next sessions

(1) RAC Customer Panel - 4/12 9:15 - 10:15
(2) HA Bootcamp Panel - 4/12 11:45 - 12:15
(3) RAC for Beginners - 4/13 8:00 - 9:00
(4) Secure your Database in a Single Day - 4/13 2:15-3:15
(5) Designing Applications for RAC - 4/14 8:30 - 9:30

I hope to see you all there.
