Wednesday, August 31, 2005

New Orleans and surrounding area

I was just there, in June of this year.  It is so sad.  Watching CNN in Prague and it is the only thing on the news.  I cannot believe the shots I’m seeing.  They are describing roads I’ve been on, cities I’ve been in and they are unrecognizable.

20,000 people in the Superdome.  By bus to Texas.  That is over 400 buses of people.  400.  Hundreds, maybe more, didn’t make it through the flooding. I can only say “stunned”.

So much for technology.

My heart goes out to them down there.  I cannot imagine. Watching the people on TV is hard.

Hurricane Katrina relief: please visit the Red Cross and give what you can. Oracle employees - Oracle will match your gifts; see www.easymatch.com/oracle for more details.

Part II Seeing a Restart

It is easier to see a restart than you might at first think. We’ll be able to observe one, in fact, using a simple one-row table. This is the table we’ll use to test with:
ops$tkyte@ORA10G> create table t ( x int, y int );
Table created.

ops$tkyte@ORA10G> insert into t values ( 1, 1 );
1 row created.

ops$tkyte@ORA10G> commit;
Commit complete.
To observe the restart, all we need is a trigger to print out some information. We’ll use a BEFORE UPDATE FOR EACH ROW trigger to simply print out the before and after image of the row as the result of an update:
ops$tkyte@ORA10G> create or replace trigger t_bufer
  2  before update on t for each row
  3  begin
  4          dbms_output.put_line
  5          ( 'old.x = ' || :old.x ||
  6            ', old.y = ' || :old.y );
  7          dbms_output.put_line
  8          ( 'new.x = ' || :new.x ||
  9            ', new.y = ' || :new.y );
10  end;
11  /
Trigger created.
Now we’ll update that row:
ops$tkyte@ORA10G> set serveroutput on
ops$tkyte@ORA10G> update t set x = x+1;
old.x = 1, old.y = 1
new.x = 2, new.y = 1
1 row updated.
So far, everything is as we expect: the trigger fired once, and we see the old and new values. Note that we have not yet committed, however—the row is still locked. In another session, we’ll execute this update:
ops$tkyte@ORA10G> set serveroutput on
ops$tkyte@ORA10G> update t set x = x+1 where x > 0;
That will immediately block, of course, since the first session has that row locked. If we now go back to the first session and commit, we’ll see this output (the update is repeated for clarity) in the second session:
ops$tkyte@ORA10G> update t set x = x+1 where x > 0;
old.x = 1, old.y = 1
new.x = 2, new.y = 1
old.x = 2, old.y = 1
new.x = 3, new.y = 1
1 row updated.
As you can see, that row trigger saw two versions of that row here. The row trigger was fired two times: once with the original version of the row and what we tried to modify that original version to, and again with the final row that was actually updated. Since this was a BEFORE FOR EACH ROW trigger, Oracle saw the read-consistent version of the record and the modifications we would like to have made to it. However, Oracle retrieved the block in current mode to actually perform the update after the BEFORE FOR EACH ROW trigger fired. It waits until after this trigger fires to get the block in current mode, because the trigger can modify the :NEW values. So Oracle cannot modify the block until after this trigger executes, and the trigger could take a very long time to execute. Since only one session at a time can hold a block in current mode, Oracle needs to limit the time we have it in that mode.
After this trigger fired, Oracle retrieved the block in current mode and noticed that the column used to find this row, X, had been modified. Since X was used to locate this record and X was modified, the database decided to restart our query. Notice that the update of X from 1 to 2 did not put this row out of scope; we’ll still be updating it with this UPDATE statement. Rather, it is the fact that X was used to locate the row, and the consistent read value of X (1 in this case) differs from the current mode read of X (2). Now, upon restart, the trigger sees the value of X=2 (following modification by the other session) as the :OLD value and X=3 as the :NEW value.
So, that shows that these restarts happen. It takes a trigger to see them in action; otherwise, they are generally “undetectable.” That does not mean you cannot see other symptoms—such as a large UPDATE statement rolling back work after updating many rows and then discovering a row that causes it to restart—just that it is hard to definitively say, “This symptom is caused by a restart.”
An interesting observation is that triggers themselves may cause restarts to occur even when the statement itself doesn’t warrant them. Normally, the columns referenced in the WHERE clause of the UPDATE or DELETE statement are used to determine whether or not the modification needs to restart. Oracle will perform a consistent read using these columns and, upon retrieving the block in current mode, it will restart the statement if it detects that any of them have changed. Normally, the other columns in the row are not inspected. For example, let’s simply rerun the previous example and use WHERE Y>0 to find the rows:
ops$tkyte@ORA10G> update t set x = x+1 where y > 0;
old.x = 1, old.y = 1
new.x = 2, new.y = 1
old.x = 2, old.y = 1
new.x = 3, new.y = 1
1 row updated.
You might at first wonder, “Why did Oracle fire the trigger twice when it was looking at the Y value? Does it examine the whole row?” As you can see from the output, the update was in fact restarted and the trigger again fired twice, even though we were searching on Y>0 and did not modify Y at all. But, if we re-create the trigger to simply print out the fact that it fired, rather than reference the :OLD and :NEW values
ops$tkyte@ORA10G> create or replace trigger t_bufer
  2  before update on t for each row
  3  begin
  4          dbms_output.put_line( 'fired' );
  5  end;
  6  /
Trigger created.

ops$tkyte@ORA10G> update t set x = x+1;
fired
1 row updated.
and go into that second session again and run the update, we observe it gets blocked (of course). After committing the blocking session, we’ll see the following:
ops$tkyte@ORA10G> update t set x = x+1 where y > 0;
fired
1 row updated.
The trigger fired just once this time, not twice. This shows that the :NEW and :OLD column values, when referenced in the trigger, are also used by Oracle to do the restart checking. When we referenced :NEW.X and :OLD.X in the trigger, X’s consistent read and current read values were compared and found to be different. A restart ensued. When we removed the reference to that column from the trigger, there was no restart.
So the rule is that the set of columns used in the WHERE clause to find the rows plus the columns referenced in the row triggers will be compared. The consistent read version of the row will be compared to the current read version of the row, and if any of them are different the modification will restart.
Note     You can use this bit of information to further understand why using an AFTER FOR EACH ROW trigger is more efficient than using a BEFORE FOR EACH ROW. The AFTER trigger fires after the block has already been retrieved in current mode, so the columns it references do not enter into this restart checking.
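If you want to verify that for yourself, here is a sketch of the experiment (the trigger name T_AUFER is just a name I made up; I have not reproduced the two-session transcript here). Drop the BEFORE trigger and re-create it as an AFTER trigger that references the same columns:

ops$tkyte@ORA10G> drop trigger t_bufer;
Trigger dropped.

ops$tkyte@ORA10G> create or replace trigger t_aufer
  2  after update on t for each row
  3  begin
  4          dbms_output.put_line
  5          ( 'old.x = ' || :old.x ||
  6            ', new.x = ' || :new.x );
  7  end;
  8  /
Trigger created.

Now repeat the two-session experiment (UPDATE T SET X = X+1 in one session and, while that is uncommitted, UPDATE T SET X = X+1 WHERE Y > 0 in the other). Since the AFTER trigger fires after the current mode get, the blocked session should print the old and new values just once when it is released, with no restart.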
Which leads us to the “Why do we care?” question.
To be continued….

Tuesday, August 30, 2005

Something Different Part I of III


I was giving my seminar today and did the read/write consistency section. This is by far my favorite part of Oracle. It is the feature that made me choose Oracle above all other databases back in the early 1990s. The following is from the forthcoming book – but I find it interesting enough to print here as well. You should also check out http://asktom.oracle.com/~tkyte/wc.html for links to some other “more raw” (not edited) reading on the topic. I think it is important for us all to understand this with regard to using Oracle. The following is quoted from the book:

Part I of III; Write Consistency
So far, we’ve looked at read consistency: Oracle’s ability to use undo information to provide non-blocking query and consistent (correct) reads. We understand that as Oracle reads blocks for queries out of the buffer cache, it will ensure that the version of the block is “old” enough to be seen by that query.

But that raises the following question: What about writes/modifications? What happens when you run the following UPDATE statement:

Update t set x = 2 where y = 5;


and while that statement is running, someone updates a row it has yet to read from Y=5 to Y=6 and commits? That is, when your UPDATE began, some row had the value Y=5. As your UPDATE reads the table using consistent reads, it sees that the row was Y=5 when the UPDATE began. But, the current value for Y is now 6—it’s not 5 anymore—and before updating the value of X, Oracle will check to see that Y is still 5. Now what happens? How are the updates affected by this?
Obviously, we cannot modify an old version of a block—when we go to modify a row, we must modify the current version of that block. Additionally, Oracle cannot just simply skip this row, as that would be an inconsistent read and unpredictable. What we’ll discover is that in such cases, Oracle will restart the write modification from scratch.

Consistent Reads and Current Reads


Oracle does two types of block gets when processing a modification statement. It performs

* Consistent reads: When “finding” the rows to modify
* Current reads: When getting the block to actually update the row of interest

We can see this easily using TKPROF. Consider this small one-row example, which reads and updates the single row in table T from earlier:

ops$tkyte@ORA10GR1> alter session set sql_trace=true;

Session altered.

ops$tkyte@ORA10GR1> select * from t;

X
----------
10001

ops$tkyte@ORA10G> update t t1 set x = x+1;
1 row updated.

ops$tkyte@ORA10G> update t t2 set x = x+1;
1 row updated.

When we run TKPROF and view the results, we’ll see something like this (note that I removed the ELAPSED, CPU, and DISK columns from this report):

select * from t


call     count  query  current  rows
-------  -----  -----  -------  ----
Parse        1      0        0     0
Execute      1      0        0     0
Fetch        2      3        0     1
-------  -----  -----  -------  ----
total        4      3        0     1

update t t1 set x = x+1

call     count  query  current  rows
-------  -----  -----  -------  ----
Parse        1      0        0     0
Execute      1      3        3     1
Fetch        0      0        0     0
-------  -----  -----  -------  ----
total        2      3        3     1

update t t2 set x = x+1

call     count  query  current  rows
-------  -----  -----  -------  ----
Parse        1      0        0     0
Execute      1      3        1     1
Fetch        0      0        0     0
-------  -----  -----  -------  ----
total        2      3        1     1

So, during just a normal query, we incur three query (consistent) mode gets. During the first UPDATE, we incur the same three I/Os (the search component of the update involves finding all of the rows that are in the table when the update began, in this case) and three current mode gets as well. The current mode gets are performed in order to retrieve the table block as it exists right now, the one with the row on it, to get an undo segment block to begin our transaction, and an undo block. The second update has exactly one current mode get—since we did not have to do the undo work again, we had only the one current get on the block with the row we want to update. The very presence of the current mode gets tells us that a modification of some sort took place. Before Oracle will modify a block with new information, it must get the most current copy of it.
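Incidentally, you do not need SQL_TRACE and TKPROF to watch these counters. The same numbers are exposed as the session statistics consistent gets (query mode) and db block gets (current mode); a quick sketch of checking them from the session itself (statistic names as they appear in 9i/10g):

select n.name, s.value
  from v$statname n, v$mystat s
 where n.statistic# = s.statistic#
   and n.name in ( 'consistent gets', 'db block gets' );

Snapshot the values before and after the UPDATE and take the difference; the deltas should line up with the query and current columns of the TKPROF report (plus whatever any recursive SQL contributes).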

So, how does read consistency affect a modification? Well, imagine you were executing the following UPDATE statement against some database table:

Update t set x = x+1 where y = 5;


We understand that the WHERE Y=5 component, the read-consistent phase of the query, will be processed using a consistent read (query mode gets in the TKPROF report). The set of WHERE Y=5 records that was committed in the table at the beginning of the statement’s execution are the records it will see (assuming READ COMMITTED isolation—if the isolation is SERIALIZABLE, it would be the set of WHERE Y=5 records that existed when the transaction began). This means if that UPDATE statement were to take five minutes to process from start to finish, and someone added and committed a new record to the table with a value of 5 in the Y column, then that UPDATE would not “see” it because the consistent read would not see it. This is expected, and normal. But, the question is, what happens if two sessions execute the following statements in order:

Update t set y = 10 where y = 5;

Update t Set x = x+1 Where y = 5;

Table 7-8 demonstrates the timeline:

Table 7-8. Sequence of Updates

T1  Session 1: Update t set y = 10 where y = 5; This updates the one row that matches the criteria.

T2  Session 2: Update t set x = x+1 where y = 5; Using consistent reads, this will find the record session 1 modified, but it won’t be able to update it since session 1 has it blocked. Session 2 will block and wait for this row.

T3  Session 1: Commit; This releases session 2; session 2 becomes unblocked. It can finally do the current read on the block containing this row, where Y was equal to 5 when session 2 began its update.

So the record that was Y=5 when you began the UPDATE is no longer Y=5. The consistent read component of the UPDATE says, “You want to update this record because Y was 5 when we began,” but the current version of the block makes you think, “Oh, no, I cannot update this row because Y isn’t 5 anymore—it would be wrong”.

If we just skipped this record at this point and ignored it, then we would have a nondeterministic update. It would be throwing data consistency and integrity out the window. The outcome of the update (how many and which rows were modified) would depend on the order in which rows got hit in the table and what other activity just happened to be going on. You could take the exact same set of rows in two different databases, run the transactions in exactly the same mix, and observe different results, just because the rows were in different places on disk.

In this case, Oracle chose to restart the update. When the row that was Y=5 when you started is found to contain the value Y=10, Oracle will silently roll back your update and restart it—assuming you are using READ COMMITTED isolation. If you are using SERIALIZABLE isolation, then at this point you would receive an ORA-08177 can't serialize access error for this transaction. In READ COMMITTED mode, after the transaction rolls back your update, the database will restart the update (i.e., change the point in time at which the update is “as of”), and instead of updating the data again, it will go into SELECT FOR UPDATE mode and attempt to lock all of the rows WHERE Y=5 for your session. Once it does this, it will run the UPDATE against that locked set of data, thus ensuring this time that it can complete without restarting.
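The SERIALIZABLE case is easy to try for yourself; here is a sketch of the two-session experiment (I am describing the expected outcome, not pasting a captured transcript):

-- Session 1:
set transaction isolation level serializable;
select * from t;                -- fixes this transaction's snapshot

-- Session 2:
update t set y = 10 where y = 5;
commit;

-- Session 1 again:
update t set x = x+1 where y = 5;
-- expected: ORA-08177: can't serialize access for this transaction

In READ COMMITTED mode, the same sequence would simply update zero rows (the restart would pick up the committed change); in SERIALIZABLE mode, Oracle cannot move the statement’s “as of” point forward without violating the isolation level, so it raises the error instead.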

But to continue on with the “but what happens . . .” train of thought, what happens if, after restarting the update and going into SELECT FOR UPDATE mode (which has the same read-consistent and read current block gets going on as an update does), a row that was Y=5 when you started the SELECT FOR UPDATE is found to be Y=11 when you go to get the current version of it? That SELECT FOR UPDATE will restart and the cycle begins again.
There are two questions to be addressed here—two questions that interested me, anyway. The first was, Can we observe this? Can we see this actually happen? And the second was, So what? What does this actually mean to us as developers? We’ll address these questions in turn now.

To be continued….

Time Flies

Just getting ready to do the second day of a three-day seminar. I find these to be very tiring. When I'm done talking for an entire day, you can just "stick a fork in me, I am done". Well, by lunch today, I'll be halfway done anyway.

Looking forward to getting home Friday for dinner and having the three-day weekend. If the weather holds as forecast, we’ll have 3 days of sun and temperatures in the low 80s F (about 29 C).

I noticed that Jonathan Lewis has updated his site. I wish he would “blog” :) so that I could get notified instead of having to poll it from time to time. He has an interesting followup to the discussion we were having last week posted there.  You have to appreciate the quoting of Lewis Carroll’s The Hunting of the Snark.

Anyway, to anyone in the area affected by the hurricane we just had – I hope you are safe and well. The pictures I’ve seen so far are incredible, hard to process. I was just down there recently and I cannot recognize it from the pictures I’ve been seeing on CNN over here in Prague. The destruction looks incredible.

Saturday, August 27, 2005

Hospitality

I'm tired right now. Spent the entire day walking around Prague. It really is a beautiful city. We started at the Prague Castle and worked our way down to the Golden Lane. After that, we hit the Charles Bridge and ate a late lunch there. My hosts were Roman and Rita, two locals who have lived in Prague all their lives (Roman works for Oracle Czech):

[photo: Roman and Rita]

That is a picture of them outside of Kafka's house on the Golden Lane. The weather was perfect (70F/20C) and the sky was clear. Quite a change from Iceland, where the sky was clear but the temperature was more like 40F/5C. A perfect day for walking, and a good thing too - we were out for almost 6 hours. Of course, there is a lot of the city left that we didn't see - but they planned it out well and we hit a lot of the major sites. I truly appreciate them taking a day to show me around; their hospitality was wonderful.

If you want to see more pictures, I have 19 of them here.

In short, a perfect day - and now it is "Pilsner Urquell" time, I understand there might be one waiting for me downstairs.

Friday, August 26, 2005

Is reality the same for everyone

A couple of weeks ago, Lewis Cunningham sent me an interesting URL. It was to an old article by Philip K. Dick; you can read it here (warning: it is long!).

I really liked his bit about the dog. Think about reality from the perspective of a dog for a minute: how might the world look different? To a dog, a garbage man might not be a “good person”.  From the perspective of the dog, perhaps a garbage can is a place his owners carefully place food, and the garbage men are therefore thieves.  The author goes on to describe how the dog could extrapolate from this and how these extrapolations would seem sensible to the dog, given the knowledge/view of reality the dog has.

He then wrote:

 And then I began to think, Maybe each human being lives in a unique world, a private world, a world different from those inhabited and experienced by all other humans.

And you know what, given the flurry of activity over the last couple of days – I believe he has something there.  Some worlds are just very different from others. It would explain why two people can look at the same thing and see totally different meanings. It would definitely explain lots of things :)

It is a long and rambling article near the end, but the first parts at least are worth a read, something to make you go “hmmm”.

Wednesday, August 24, 2005

Greetings from Iceland

Iceland has been great so far. I'm one and a half days into the seminar (one and a half left to go). The first day I was here, my host took me on a tour and we ended up at some hot springs.

[photo: Me in Iceland]

After that, we went to a place called the Blue Lagoon - a hot spring we could go swimming in, that was pretty cool (hot really, but you know what I mean). Now it has just been talking for two days. Still looking forward to Prague, going to spend the weekend there being a tourist, something I don't often get the chance to do.

In earlier blog entries, people had asked "why don't more Oracle people seem to be doing this?", so I just wanted to point out a new blog that came online recently. Abhinav Agarwal has started an Oracle Business Intelligence blog; you might want to check it out.

Sunday, August 21, 2005

On travel

I'm at the airport, waiting for my flight to Iceland. After that - to Prague for the weekend and next week. I don't know when or if I'll have internet connectivity during the next two weeks, so it'll definitely be catch-as-catch-can for questions and blogging. That, and I'll be doing two three-day seminars back to back and a day with a customer in Prague. So, out of 12 days of travel, 7 of them will be spent on my feet, talking. Meaning - I just might not feel like getting on a computer at night.

Normal programming to resume the beginning of September, although I’m sure I’ll be on from time to time, just not as much as normal.

 

Friday, August 19, 2005

Download part of the new book

If you are interested in seeing the table of contents, description of each chapter, the most amazing foreword (I could not have paid someone for a nicer, better one) by Ken Jacobs (also known as "Dr. DBA") and other bits about the new book, I've made the pdf of it available here.

I would like to point out a comment to this blog by one of the technical reviewers of the book - Gabe Romanescu.  While reviewing, I could not help but notice that, based on his comments, Gabe seemed to have had the old book right next to the new one. If anyone is able to give a fairly unbiased “how different is it” overview, I think it would be him. I feel the new material was a rewrite more than an “improved, now mentions 9i and 10g!” book. But I am very biased (obviously). And now you might be able to figure out who thought one of the chapters was “weak” :)

Each of the four reviewers had a distinctly different personality; I would be hard pressed to stack rank them, as each had their time to put the book back on the correct path. I take their input very seriously. Gabe was outstanding; I hope he stays on for the next one.

768 pages. And that was just the first third of the old book (which was 1,265 altogether). I’m dreading, but at the same time sort of looking forward to, doing the second half. The goal (in my mind) will be to make it 1,000 pages or fewer, but to fit in as much as I can. Lots of new ideas – instrumentation will definitely be a new chapter, as well as the new tools available with 9i and 10g that didn’t really exist before.

We just sent it off to be turned into paper tonight. No more changes, no more updates, no more reviews.

I am going to reproduce a bit of the content you can download from the above link here:

First, I would like to thank Tony Davis for his work making my work read well. If you enjoy the flow of the sections, the number of section breaks, and the clarity, then that is probably in some part due to him. I have worked with Tony writing technical material since the year 2000 and have watched his knowledge of Oracle grow over that time. He now has the ability to not only “edit” the material, but in many cases “tech edit” it as well. Many of the examples in this book are there because of him (pointing out that the casual reader was not going to “get it” without them). This book would not be what it is without him.

I consider Tony not only the editor of this book and the original Expert One on One Oracle, but a good friend as well. It is good to work with people you like. So, to Dr. Tony Davis – thanks much (what a Dr. is doing editing books I still have yet to figure out but hey, it works for me).  I owe him a martini or two at Oracle Open World.

But first, two weeks, no writing. I’m excited to go to Prague next weekend.  I have Saturday and Sunday to be a “tourist”, something I don’t get the chance to do often. There will be pictures of that, I am sure. (I am less excited about having to speak all day for 7 days in the next two weeks, but…)

Thursday, August 18, 2005

Way too connected

Just thought this was funny. I'm in San Francisco today at Oracle corporate. Went downtown to visit a friend. I got downtown early and found a Starbucks, got online. Saw they were online as well and instant messaged them to let them know I was here early.

Turns out we were both in the same Starbucks, sitting maybe 25 feet from each other. Good thing the network was networking or we might have sat there for an hour without knowing the other was there ;)

Now off to dinner, then redeye back to the east coast for tomorrow. I’ll be home for two days and then I’m off to Iceland and Prague for two weeks.  Everyone who says “asktom never takes questions in my time zone” might have a chance over the next two weeks – I’ll be on European time.  Have two seminars to give so I might not take a lot of new questions, but when the queues are open, they’ll be relative to GMT, not east coast time.

Wednesday, August 17, 2005

A little more on the new book

As the print date draws close for the new book, I thought I would share the details of what is in it. The price is now right on Amazon, by the way. I think everyone should know what is 'inside' before they commit to buying.

Feel free to ask any question you would like about the book here and I'll be glad to address them.

Chapter 1, Developing Successful Oracle Applications. This chapter sets out my essential approach to database programming. All databases are not created equal and in order to develop database-driven applications successfully and on time, you need to understand exactly what your particular database can do and how it does it. If you do not know what your database can do, you run the risk of continually 're-inventing the wheel' – developing functionality that the database already provides. If you do not know how your database works you are likely to develop applications that perform poorly and do not behave in a predictable manner.
  The chapter takes an empirical look at some applications where a lack of basic understanding of the database has led to project failure. With this example-driven approach, the chapter discusses the basic features and functions of the database that you, the developer, need to understand. The bottom line is that you cannot afford to treat the database as a black box that will simply 'churn out the answers' and take care of scalability and performance by itself.


Chapter 2, Architecture Overview.  This chapter covers the basics of Oracle architecture. We start with some clear definitions of two terms that are much misunderstood in the Oracle world, namely Instance and Database.  We also take a quick look at the SGA (System Global Area) and the processes behind the Oracle instance, and describe how the simple act of “connecting to Oracle” takes place.

Chapter 3, Files. This chapter covers in depth the eight types of files that make up an Oracle database and instance. From the simple parameter file to the data and redo log files, we explore what they are, why they are there, and how you use them.

Chapter 4, The Memory Structures. This chapter covers how Oracle uses memory – both in the individual processes (PGA, or Program Global Area, memory) and shared memory (SGA).  We explore the differences between manual and automatic PGA and, in Oracle 10g, SGA memory management, and see when each would be appropriate.  After reading this chapter, you will have an understanding of exactly how Oracle uses and manages memory.

Chapter 5, Oracle Processes. This chapter offers an overview of the types of Oracle processes (server processes versus background processes).  It also goes into much more depth on the differences between connecting to the database via a shared server or a dedicated server process.  We’ll also take a look, process by process, at most of the background processes (such as LGWR, DBWR, PMON, and SMON) you’ll see when you start an Oracle instance, and explain the functions of each.

Chapter 6, Locking and Latching. Different databases have different ways of doing things (what works well in SQL Server may not work as well in Oracle) and understanding how Oracle implements locking and concurrency control is absolutely vital to the success of your application.
  This chapter discusses Oracle's basic approach to these issues, the types of locks that can be applied (DML, DDL, latches...) and the problems that can arise if locking is not implemented carefully (deadlocking, blocking and escalation).

Chapter 7, Concurrency and Multi-versioning. In this chapter, we’ll explore my favorite Oracle feature – multi-versioning – and how it affects concurrency controls and the very design of your application. Here you will learn that all databases are not created equal and that their very implementation can have an impact on the design of your applications. We’ll start by reviewing the various transaction isolation levels as defined by the ANSI SQL standard and see how they map to the Oracle implementation (as well as how the other databases map to this standard). Then we’ll take a look at what implications multi-versioning, the feature that allows Oracle to provide non-blocking reads in the database, might have for you.


Chapter 8, Transactions. Transactions are a fundamental feature of all databases – they are part of what distinguishes a database from a file system. And yet, they are often misunderstood and many developers do not even know that they are accidentally not using them. This chapter examines how transactions should be used in Oracle and also exposes some 'bad habits' that have been picked up when developing with other databases. In particular, we look at the implications of atomicity and how it affects statements in Oracle. We also discuss transaction control statements (COMMIT, SAVEPOINT, ROLLBACK), integrity constraints, distributed transactions (the two-phase commit) and finally autonomous transactions.


Chapter 9, Redo and Undo. It can be said that the developer does not need to understand the detail of redo and undo as much as the DBA, but developers do need to know the roles they play in the database. After first defining redo, we examine what exactly a COMMIT does. We discuss how to find out how much redo is being generated, turning off logging, and analyzing redo.
  In the undo section of the chapter, we examine the role of undo data and the operations that generate the most/least undo. Finally, we investigate the infamous ORA-01555, snapshot too old, error: its possible causes and how to avoid it.


Chapter 10, Database Tables. Oracle now supports numerous types of table. This chapter looks at each different type – heap organized (the default, 'normal' table), index organized, index clustered, hash clustered, nested, temporary, and object – and discusses when, how, and why you should use them. Most of the time the heap-organized table is sufficient, but this chapter will help you recognize when one of the other types might be more appropriate.


Chapter 11, Indexes. Indexes are a crucial aspect of your application design. Correct implementation requires an in-depth knowledge of the data, how it is distributed, how it will be used. Too often, indexes are treated as an afterthought in application development, and performance suffers as a consequence.
  In this chapter, we look in detail at the different types of indexes, including B*Tree, bitmap, function-based, and application domain indexes, and discuss where they should and should not be used. We'll also answer some of those common queries in the Frequently Answered Questions section, such as 'Do indexes work on views?' and 'Why isn't my index getting used?'


Chapter 12, Datatypes.  Datatypes – there are a lot to choose from. This chapter explores each of the 22 built-in datatypes, explaining how they are implemented, and how and when to use each one. First up is a brief overview of NLS (National Language Support), a basic knowledge of which is necessary to fully understand the simple string types in Oracle. We then move on to the ubiquitous NUMBER type and look at the new Oracle 10g options for storage of numbers in the database. The LONG and LONG RAW types are covered, but mostly from an historical perspective. The main objective here is to show how to deal with legacy LONG columns in applications and migrate them to the LOB type. Next, we delve into the various datatypes for storing dates and time, investigating how to manipulate the various datatypes to get what we need from them.  The ins and outs of time zone support are also covered.
  Next up is the LOB datatype, covering how LOBs are stored and what each of the many settings, such as IN ROW, CHUNK, RETENTION, CACHE, and so on, mean to you.  When dealing with LOBs, it is important to understand how they are implemented and how they are stored by default – especially when it comes to tuning their retrieval and storage. We close up with the ROWID and UROWID types.  These are special types, proprietary to Oracle, that represent the address of a row. We’ll describe when to use them as a column datatype in a table (which is almost never!).


Chapter 13, Partitioning. Partitioning is designed to facilitate the management of very large tables and indexes, by implementing a 'divide-and-conquer' logic – basically breaking up a table or index into many smaller, and more manageable, pieces. It is an area where the DBA and developer must work together to maximize application availability and performance. This chapter covers both table and index partitioning. We look at partitioning using local indexes (common in data warehouses) and global indexes (common in OLTP systems).


Chapter 14, Parallel Execution.  This chapter introduces the concept of, and uses for, parallel execution in Oracle.  We start by describing when parallel processing is useful and should be considered, as well as when it should not be considered. After gaining that understanding, we move into the mechanics of parallel query, the feature most people associate with parallel execution.  Next we cover parallel DML (PDML), which allows us to perform modifications using parallel execution.  We’ll see how PDML is physically implemented and why that implementation leads to a series of restrictions regarding PDML.
  We then move into parallel DDL. This, in my opinion, is where parallel execution really shines. Typically DBAs have small maintenance windows in which to perform large operations.  Parallel DDL gives the DBA the ability to fully exploit the machine resources they have available, permitting them to finish large, complex operations in a fraction of the time it would take if done serially.
  The chapter closes on procedural parallelism, the means by which you can execute your application code in parallel.  We cover two techniques here. The first is parallel pipelined functions – the ability of Oracle to execute your stored function in parallel dynamically.  The second is do-it-yourself parallelism, whereby you design your application to run concurrently.


Chapter 15, Data Loading and Unloading. The first half of this chapter focuses on SQLLDR and covers the various ways in which we can use this tool to load and modify data in the database. Issues covered include loading delimited data, updating existing rows and inserting new ones, unloading data, and how to call SQLLDR from a stored procedure. Again, SQLLDR is a well-established and crucial tool but is the source of many questions with regard to its practical use. The second half focuses on external tables – an alternative and highly efficient means by which to bulk load and unload data.

 

Monday, August 15, 2005

About the Technical Reviewers

As the print date draws close for the new book (they have the price wrong; working on getting that fixed - the LIST will be $45, not the $69.99 as listed!), I'd like to introduce the review team and thank them.

Without a technical review team of the caliber I had during the writing of this book, I would be nervous about the content.  Jonathan Lewis, Roderick Manalac, Michael Moller, and Gabe Romanescu spent many hours poring over the material and verifying it was technically accurate as well as useful in the real world.  I firmly believe a technical book should be judged not only by who wrote it but, as importantly, by who reviewed it.

Jonathan Lewis has been involved in database work for more than nineteen years, specializing in Oracle for the last sixteen years, working at consultant level for the last twelve years. Jonathan is currently a director of the UK Oracle User Group, and is well known for his many presentations at the UKOUG conferences and SIGs. He is also renowned for his tutorials and seminars about the Oracle database engine, which he has held in various countries around the world.

Jonathan is author of the acclaimed book Practical Oracle 8i (Addison-Wesley, ISBN: 0201715848), writes regularly for the UK user group magazine, and occasionally for other publications, including OTN and DBAZine. He also finds time occasionally to publish Oracle-related material on his website http://www.jlcomp.demon.co.uk.

Roderick Manalac graduated from UC Berkeley in 1989 with a BS in Electrical Engineering and Computer Science. He's been an employee of Oracle Support Services ever since. Practically all of that time has been spent in assisting external customers and internal employees (around the globe) gain a better understanding of the subtleties involved with running the Oracle database product on Unix platforms. Other than that, he spends way too much time playing video games, watching TV, eating snacks, and willing the San Francisco Giants to win the World Series.

Michael Moller is an Oracle Certified Master (OCM) and has been in computers since his 10th birthday, approximately 40 years ago. He's been involved in pretty much everything related to building and running software systems, as a programmer, principal senior systems design engineer, project manager, and quality assurance manager. He worked in the computer business in the USA, England, and Denmark before joining Oracle Denmark 10 years ago, where he worked in Support and later in Premium Support. He has often taught in Oracle Education, even taking the Oracle Internals seminar on a whistle-stop tour of Europe. He spent the last 2 years of his time with Oracle working in Development in the USA, creating the course materials for advanced courses, including Internals on NLS and RAC. Nowadays Michael is gainfully employed at Miracle A/S in Denmark, doing consultancy and education.

Gabe Romanescu has a B.Sc. in Mathematics and works as an independent Oracle consultant. He discovered relational theory and technologies in 1992 and has found comfort ever since in the promise of applied logic in the software industry. He mostly benefits from, and occasionally contributes to, the AskTom and OTN forums. He lives in Toronto with his wife Arina and their two daughters, Alexandra and Sophia.

I really appreciate the time they spent, the comments they provided. I know myself how many hours it takes to do the work they did. It would not be the book that it is without them.

Saturday, August 13, 2005

Why do they do it

Once upon a time, I got tired of the spyware/whatever that was globbing up the machine we used as a bridge to our internet connection. My wife used that machine, so did the kids. So, I got a really cheap machine just to do this one thing, and it had to run Windows (a requirement of the satellite stuff we use for broadband).

Everything was going well for about a year; then Symantec tells me I have “Backdoor.Mosuck” (thank you very much). Symantec cannot get rid of it (it is running).  Funny thing is – this is a clean machine, nothing going on, no extra services – firewalled. No idea how it got there (everyone promises “no, I did nothing on that machine” :)

So, shutdown, boot in safe mode, move the file that is the root cause, go to clean it out with Symantec.  But – wait.  Not allowed to use taskmgr.  Not allowed to use regedit.  Many things cut off by “the administrator” (hey, I thought that was, well, me, and I didn’t do that).

So, google around, figure out how to get back into my own machine – which is now pretty whacked out.

So, only a couple of hours lost time. No internet bridge. Much scratching of head.

I just wonder what people that don’t work with computers do sometimes. Really. And I totally don’t get the virus writing mentality.

Why.

Friday, August 12, 2005

At the park

The book is done (volume one anyway). We are at the park for the day. Close to 100F - hot. But no computers. Just the phone :)

Tuesday, August 09, 2005

Order in the court!

As I was proofing my chapters yesterday, I couldn't help but notice that in three chapters I said the same thing, stronger and stronger each time. It was about using (or, more specifically, not using) ORDER BY. I see it time and time and time again. Someone assumes that because they observe the data coming out of a query sorted, or that a query is using an index, they can and should leave off the ORDER BY. That is a career-limiting move, to borrow a phrase from Connor McDonald (he used it in reference to reorganizing your database, but it applies totally here). It is a time bomb ticking away in your code.

Did you know that a simple optimizer change can mean no more sort?  Data could appear to be sorted today and, after statistics are gathered, not be sorted tomorrow.

Did you know that reading data out via an index does not imply the data is sorted by the index key!

Did you know that a GROUP BY does not (necessarily) sort, and even if it does, it is a binary sort – which is different from a character sort, for those of us outside of US7ASCII land?  Unfortunately, part of that thread is missing for some reason (I remember it well; I loved the comment, “I have frequently thought that such 'preaching' is simply an indication that the person answering isn't willing to admit that they don't know how to do it”, said right before they tried to show someone how to sort using GROUP BY.  Pot, kettle, and black all popped into my head at that).
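To see the binary versus character sort difference for yourself, here is a minimal sketch (the table and data are made up purely for illustration):

SQL> create table words ( w varchar2(10) );
SQL> insert into words values ( 'apple' );
SQL> insert into words values ( 'Banana' );

SQL> alter session set nls_sort = binary;
SQL> select w from words order by w;
-- binary compares byte values, and uppercase ASCII codes sort
-- before lowercase ones, so expect 'Banana' first

SQL> alter session set nls_sort = xwest_european;
SQL> select w from words order by w;
-- a linguistic sort interleaves the cases, so expect 'apple' first

The point: unless you spell out what “sorted” means, with ORDER BY (and, for character data, the NLS settings), you have no business assuming anything about the order you get back.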

Anyway, the bottom line – you want data sorted?  You better use ORDER BY because you cannot have any expectations on the sorted nature of the data otherwise!

In chapter 10 on tables, I wrote:

You should think of a heap organized table as a big unordered collection of rows. These rows will come out in a seemingly random order, and depending on other options being used (parallel query, different optimizer modes and so on), they may come out in a different order with the same query. Do not ever count on the order of rows from a query unless you have an ORDER BY statement on your query!

In chapter 11 on indexes, I wrote:

Note Do not be tempted to ever leave an ORDER BY off a query. Just because your query plan includes an index does not mean the data will be returned in "some order." The only way to retrieve data from the database in some sorted order is to include an ORDER BY on your query. There is no substitute for ORDER BY.

And finally, in the chapter on partitioning, I did it again, a tad stronger.

USING ORDER BY


This example brought to mind an unrelated but very important fact. When looking at hash partitioned indexes, we are faced with another case where the use of an index to retrieve data would not automatically retrieve the data sorted. Many people assume that if the query plan shows an index is used to retrieve the data, the data will be retrieved sorted. This has never been true. The only way we can retrieve data in any sort of sorted order is to use ORDER BY on the query. If your query does not contain an ORDER BY clause, you cannot make any assumptions about the sorted order of the data.

A quick example demonstrates this. We create a small table as a copy of ALL_USERS and create a hash partitioned index with four partitions on the USER_ID column:

SQL> create table t
  2  as
  3  select *
  4    from all_users
  5  /
Table created.
 
SQL> create index t_idx
  2  on t(user_id)
  3  global
  4  partition by hash(user_id)
  5  partitions 4
  6  /
Index created.

Now, we will query that table and use a hint to have Oracle use the index. Notice the ordering (actually, the lack of ordering) of the data:

SQL> set autotrace on explain
SQL> select /*+ index( t t_idx ) */ user_id
  2    from t
  3   where user_id > 0
  4  /
 
   USER_ID
----------
        11
        34
       ...
        81
       157
        19
        22
       ...
       139
       161
         5
        23
       ...
       163
       167
        35
        37
       ...
        75
       160
38 rows selected.
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=38 Bytes=494)
   1    0   PARTITION HASH (ALL) (Cost=4 Card=38 Bytes=494)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=4 Card=38 Bytes=494)
 
SQL> set autotrace off

So, even though Oracle used the index in a range scan, the data is obviously not sorted. In fact, you might observe a pattern in this data: there are four sorted runs here. The ... lines replace values that were increasing; between the rows with USER_ID = 34 and 81, for example, the values were increasing in the output. Then the row with USER_ID = 19 appeared. What we are observing is Oracle returning "sorted data" from each of the four hash partitions, one after the other.

This is just a warning that unless your query has an ORDER BY, you have no reason to anticipate the data being returned to you in any kind of sorted order whatsoever. (And no, GROUP BY doesn't have to sort either! There is no substitute for ORDER BY.)
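And the fix costs almost nothing when the data is already coming back in order anyway; a minimal sketch using the familiar SCOTT.EMP demo table (assumed installed):

SQL> select deptno, count(*)
  2    from emp
  3   group by deptno;
-- might come back sorted today and unsorted tomorrow

SQL> select deptno, count(*)
  2    from emp
  3   group by deptno
  4   order by deptno;
-- the only way to guarantee the order; if the GROUP BY happened to
-- produce sorted data already, the optimizer need not sort again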

Caveat Emptor

How funny is this

This has to be a joke! But wait, there is some math there and all. And best of all, the ingredients don't lie "Contains 99% water, 1% secret ingredient."

Good for a laugh anyway.  Saw it on Gizmodo (a neat blog for up-and-coming “stuff”).

Monday, August 08, 2005

What 5,000 will get you

The first PC I bought was a Tandy 1000EX, single unit (keyboard and computer in one) with a CGA monitor, a big 256k of RAM (later upgraded to 640k) and two floppy disk drives. It cost me just over $2,000 as I recall, and if you added in the upgrades - a 1200 baud modem, the memory, and a whopping 40 MB hard disk - it was probably $3,500 all told.

My second PC was a Gateway.  They had a brilliant marketing slogan back then: “Out-standing in their field”.  It was a two-page spread with a bunch of pictures of cows and computers.  I just had to buy from them.  It was a 386sx, with a whopping 80 MB hard drive, 1 meg of RAM, and both kinds of floppy disk drives (the modem was extra).  It was touch and go – lots of problems with it (they used the motherboard of the day back then, hardware du jour, and just hoped it would all work).  Mine didn’t.  After many go-arounds with tech support, I told them I wanted to send it back.  They said I had to pay for the shipping, but – alas for them, I discovered “no, in fact I didn’t”.  I told them it would be on my front porch, come pick it up.  I actually got a call from Ted Waitt that afternoon (at least he said it was he – this was the late 80’s, Gateway was still pretty small).  He had someone come and get the machine and replaced it with a working machine, at no additional cost.  I was hooked.  (Good customer service grabs my attention, and they cost less than the other guys.)

It was about $4,500.  Going up in price.  My 3rd computer, a full-blown 486 – about $5,000.  The best money could buy.  I bet it would seem exceedingly slow today, but back then, it was supremely fast.

Then I read this PC Magazine review tonight.  This is what $5,000 gets you today.  I was sure the hard drive capacity was wrong, a typo – but it was not.  It has 1024 GB; yes, that is right, a desktop gaming machine with a terabyte of storage.  A terabyte.  Just keep saying that.  A terabyte.  They seriously skimped on the RAM though (just 1 GB?).  But the monitor — a 24 inch LCD.  The difference 15 years makes (I still seem to buy a computer per year though – I cannot remember them all anymore, should have taken pictures).

A terabyte. Wow.  As the review says “… with an obscene 1TB of hard drive space …”.  That’s my kind of obscenity.

Friday, August 05, 2005

CTP Ratio

For the last chapter, the comment-to-page (CTP) ratio is 100 to 50: 100 comments, 50 pages.  This’ll be a piece of cake ;)

I can't see a tie

Yesterday someone commented on the pictures on my books.  They said they could not tell if I was wearing a tie. As there is a funny story in there, well, here we go.

Expert One on One -- a tie. On the original WROX one (the red book), you can see it very clearly at the bottom of the page.  In later editions (which proved that I have hair, since they stopped cutting my head off at my forehead), it is harder to tell, but it is there.

Beginning Oracle -- a tie. But the guy in the middle of that picture, his name is Chris Beck, and that is not his body.  They did not like what he was wearing so, with a little photoshop, Chris has the shoulders he’s always wanted.  Did they ask?  Nooooo. Was he surprised?  Bigtime.  Was it funny?  Absolutely – we would tease him that his shoulders must have looked really wimpy for them to replace them like that.

Effective Oracle by Design -- a funny story. I was wearing a tie for that photograph, but due to the creative use of photoshop, my head is now on someone else's shoulders and body! (No idea whose body that is.)

That is my Oracle Magazine column picture. The magazine staff told me the photographer was coming in and gave me no "dress code", so I wore what I would for a customer visit (suit and tie). The magazine didn't feel it fit in with the rest of the pictures and therefore (unbeknownst to me) edited it. I laughed a lot upon seeing the picture for the first time — I call it my “californization” picture.  It looks very “California Corporate”, with the black t-shirt (of which I own – well, zero).

Anyway, Oracle Press really wanted my picture on the book and I really didn’t want it.  I wouldn’t send them one figuring that was that.  Well, they went to Oracle Magazine and got it from them.  If you look closely at the book, you can see the line on my neck where they slid my head on.  It also explains why I look a little “funny” in that picture, the head is not in a natural position.

This is perhaps the only thing I have in common with Hollywood.

Thursday, August 04, 2005

Words you just never want to hear

What happens when you rush and change your modus operandi.

Email from editor:

Tom, here it is. I think it needs a reasonably heavy edit, I’m afraid.

That is editor-speak for “this chapter stinks, fix it”.  And then when the second comment in the chapter by the reviewer is:

I would rank this chapter as the weakest link for the first volume

Ouch.  Well, one of the chapters by definition must be the weakest link.  But they were right.  What I did in this chapter was different from the rest.  It was slightly opinionated, and lacking in examples.  A recurring comment was

Here is where I think the chapter is missing a classic TK test run…

what I'm thinking about here is the classic TK introductory example that shows how to…

It was missing those; they were 100% dead on.  And in re-reading it myself, I could see that it didn’t fit (the chapter didn’t). But boy, did it make for a long day.  The chapter increased in volume by 50% but is now very much reorganized, reordered, and full of examples that build on each other.  I’ll not say which chapter it is (maybe we can have a contest after the book comes out to see if you can spot it from the others – I would hope not, now).

It is amazing how much better it is after putting in the meat.  For whatever reason, I was just saying things (I knew them to be true, I knew what I meant), but it wasn’t coming across, it wasn’t compelling.  My favorite comment

I would so like some proof here.

I could have looked at the number of comments (over 100) versus the number of pages (about 23), realized that ratio was bad compared to the other chapters (a higher than average comment-to-page ratio), and “punted” – that is, cosmetically fixed the chapter, addressing the comments but not really “fixing it”.  In fact, after reading the comments part way through, I closed the document and did some other stuff for a while.  Then I came back and finished reading them. It was comment #58 that got me to say “ah hah”.

Funny how one thing like that can set the tone.  I would so like some proof here.  That is all it took.  I said something in the text (it was true), but perhaps it sounded implausible.  Prove it.  Show me.  Convince me.  That was all it took.

Well, that and quite a few hours.  The chapter is longer, but it is very different.  Out with the old and in with the new – parts of the chapter were gutted, removed.  Lots of new stuff put in.  Now it is back in the review cycle for another tech edit and copy edit.

Word from the editor this morning on it:

Tom, It looks really good.

But you know, those British people are just so emotional and all – from him that is really high praise :)

7 days and counting.  Then Volume I goes off to be turned into paper – just in time for Oracle Open World in September.

Tuesday, August 02, 2005

Three Things

Thing 1: why do people think it is OK to update the data dictionary? Thing 2: remember punch cards? I found an article that will put them in perspective. Thing 3: are you ready to get a headache staring at some optical illusions?

First, I’ve written about this before, but it keeps happening.  Why why WHY do people feel obliged to update the data dictionary?  That has to be the most reckless thing I can think of anyone doing or suggesting – unless it was support (and they had development ‘bless’ it).  It happened again today – someone asked how to update the ALL_DIRECTORIES view (it wouldn’t update, thankfully).  Why?  They wanted to change the path a directory pointed to.  I was very thankful the view wasn’t updatable – I explained that DDL is the only way to modify the dictionary.  Said “don’t go there”.  Unfortunately, they figured out DIR$ is the table underlying the view and updated it. Sigh. I cannot understand why this happens.  It is the reason asktom got started – someone suggesting a dictionary update.  I will never understand the mindset that thinks it would be OK to do this.
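For the record, the supported way to change where a directory object points is a single piece of DDL (the directory name and path here are made up for illustration):

SQL> create or replace directory my_dir as '/new/path';

One statement, no dictionary hacking; and since it is a REPLACE rather than a drop and re-create, the grants made on the directory should remain intact.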

On a lighter note, remember punch cards?  In high school, we loaded up the card punch, punched our code into it (Cobol and RPG) and mailed them downtown to be compiled (not making that up, mailed them downtown).  We would get the compiled listings back the next week.  You made darn sure things were going to be as perfect as possible.  It made you program differently, that is for sure – no typing in and seeing what sticks. You wanted it perfect the first time around.  Well, today someone related an MP3 in terms of punch cards.  They calculated how tall the stack of punch cards would need to be to hold a 3 minute MP3.  Turns out that I would be able to hide behind the stack (if it were wide enough, it would be just tall enough).

Lastly, I stumbled upon these three very cool optical illusions.  I agree with the comment on that site that illusion 3 is the most incredible.  Very cool, you cannot believe what your eyes see all of the time.  Seeing is not always believing!

Oh, I’ll add a fourth thing: I just downloaded RH 4.0 and will be giving that a go.  Doing a trial install in vmware right now, and if I like what I see, I’ll be rebuilding my server with it tonight.  I got the kernel, I got it compiled, but I ran out of time trying to boot with it last night.  Thought it might be easier just to go to version 4.0.

Monday, August 01, 2005

Install Fest

I decided to upgrade to VMware 5.0 and reorganize my VMs. In the last couple of days I've installed and patched, all of the way up, 8iR3, 9iR2, and 10gR1 on Windows, and am getting set to do 9iR1, 10gR1, and 10gR2 for Linux (running the Linux OS install right now).

I’ve decided to build a VM per database release this time and have the OSes be as pristine as possible. VMware 5.0 has a very nice clone-with-a-softlink feature – meaning, I installed the OS once, got the settings just the way I like them, and then cloned it 3 times to lay down 8i, 9i, and 10g in their own environments. Working very well for now (so far so good).

My plan with Linux is to lay down Red Hat Advanced Server 3.0 and then upgrade the kernel from 2.4 to 2.6.  For this, the cloning will be a really nice feature, as I’ve not upgraded a kernel myself yet, so I’m sure it’ll take one or two tries to get it “just right”.  The reason I want to get to 2.6 is for stable USB 2.0 and Firewire support on my machine at home. I’ve been itching to have a lot of disk available, and the only cost-effective solution for playing will be direct attach storage using one of those. I have a terabyte for $1 a gigabyte; now I just have to get it attached.

It is amazing to think that when I joined Oracle one dozen years ago (happy anniversary to me, it is twelve years ago today that I signed up), a 400 MB hard disk was considered “huge” (and huge in two ways, they were physically BIG and the storage was considered almost infinite).  One of my first benchmarks back then was a 150 GB database, that was enormous.  Things have changed for sure.  I now travel with 140 GB of storage (80 GB in the laptop, 60 GB in an external USB 2.0 powered drive that I keep my VM’s on).

I’m pretty sure I’ll be able to fill up a terabyte pretty fast playing with stuff. It’ll be nice to have almost unlimited “room” (but I’m sure it’ll feel very confining in a year or two).

One thing I did realize doing install after install (besides the fact that installing is boring… good thing I can install and do other things at the same time) is that the patch process has gotten a lot better over time.  The 8iR3 patch to 8.1.7.4 – 13 manual steps, updates of the dictionary! Not pretty.  10gR1 to get to 10.1.0.4, @catpatch.sql – done. I still miss the character mode installer though.

Back to install fest – wish me luck on the kernel upgrade, not really sure how that will work out.  Testing on the VM, but my plan is to do it for real on the PowerEdge when I feel like I know what I’m doing.