Thursday, August 31, 2006

Do I ever agree with this...

Stumbled on this posting by Linus Torvalds.  I cannot remember the last time I fired up a debugger to debug a problem.  I have demonstrated the debugger for PL/SQL in Jdeveloper/SQL Developer, but I've never used it.  I believe the last time I used a debugger for real was in 1992.  I gave up, threw it (the debugger) away.  It was a multi-threaded Ada application and when run inside the debugger - it would work fine.  Outside of the debugger - it would fail.  This would happen to single threaded programs too (things just run differently in the artificial environment of the debugger).  I made the developer redo their code heavily instrumented - and we solved the insolvable problem in minutes afterwards.  I've written about this art of instrumentation many times (here is one).  I think this guy is definitely on the right track.

The reason I don't like debuggers in general - I don't think stepping through code a line at a time is a good thing.  You are hoping against hope that the problem will just "come out and hit you smack in the face".  It is like the difference between touch typing and hunt and peck typing.  Using a debugger - you are hunting and pecking.  You don't necessarily have a good idea where to look, what to look for - you are just looking for something to "go wrong".

I've said "debuggers are for wimps".  In my opinion if you are forced into using one (as a crutch), it probably means you don't have a really good understanding of the code (you might find and fix one bug in the debugger, but miss the 500 other bugs sitting there) and haven't been programming defensively.  You become dependent on this debugger thing to find your problems - and when it can't (and they can't many times) you get totally stuck, the problem becomes intractable, insolvable. (this seems to be a major failing with many developers - the inability to perform good problem determination, to create test cases, to whittle the problem back to its bare essentials.  It is the single more important skill a developer can have.)

In the 'real code' that I've written (not the quickie demos I do on asktom), literally every other line of developed code is "debug/instrumentation code".  I like to say that good developers write two bits of code:

  • The code they want to
  • The code they have to

But then go a bit further and say that the code they want to write is the defensive code, the instrumentation/trace code.  The code they have to write is the code the end user is interested in - to the developer that code that does "file/open" in their application - that is the stuff they had to write.  The code before and after the "file/open" bit of code that can be used to diagnose issues, validate return values, validate inputs - double, triple check everything - that is the code they want to write.

The reason they want to write that "debug/instrumentation" code?  So they don't have to spend the rest of their natural lives maintaining and debugging their first bit of production code.  Which, if they do not make it traceable - they will be. 

Once upon a time ago, I wrote a utility called "OWAREPL" - it was the OWA Replacement "cartridge".  I wrote it for OWS 2.0 (Oracle Web Server 2.0 - which came right after OIS 1.0 (Oracle Internet Server) and right before OWS 2.1.1, OWAS (Oracle Web Application Server) 3.0, OAS (Oracle Application Server) 4.0 and then AS (just Application Server) now...  So, this was about 10 years ago (it later became the foundation for the webdb lightweight listener and then mod_plsql).  50% of this code was debug/trace code - easily, if not more.  It was downloaded and used thousands of times.  And not once did I have to arrive on site somewhere to debug this code - if something went wrong, I just asked them to turn on tracing for specific modules (a configuration parameter) and send me the trace file.  We fixed every problem 100% of the time in this fashion.

Sort of like what happens with Oracle, it is fully instrumented.  Have a problem? Get a trace.  V$ tables - instrumentation.  SQL_TRACE=TRUE, debug/trace stuff.

But anyway - read the rant by Linus - well said.

Tuesday, August 29, 2006

APEX 2.2 released...

Was just reading the newswire and found this article. It quotes a friend of mine, Mike Hichwa - the "inventor" of APEX (formerly known as HTML DB). It makes an interesting read - it could go along with this asktom thread.

If you haven't tried APEX - you should, if you use it - you might be interested in the new feature set. Check it out here. I have to say, the developers of APEX man the otn forum on APEX as well as any group could - you'll find them to be very responsive.

Monday, August 28, 2006

Ok, this guy is "not smart"

Incredible, I cannot believe someone would do this. Just what do you think the thought process there was.  Man oh man.  Unbelievable.

Spam Spam Spam, again...

A while ago, I wrote about some strange looking spam I got from Russia. Well, I keep getting it and today I got a really strange looking one:

longarm

Now, I ran the text through babelfish and it seems to be something about "coats". I have no idea what the purpose of the really long arm is though (unless of course, they specialize in making 'odd sizes'). That is what is fun about Russian spam for me - I can make up anything I want, since I cannot really read what is there.

I got this one as well - and if it does what I think it does - I might want one of those. It appears to be a small satellite dish that looks transportable. I could have used one on vacation this last week (little to no mobile phone coverage). It almost looks like (and if the translation could be even a little bit trusted sounded like) it would be a mobile solution. Sure is small enough - unless that is the worlds largest add-in card for a PC:

internet_from_anywhere

Then it was back to the "spam of the strange", here we have a girl that (if the translation is to be trusted) can diagnose that which ails you - before modern medicine can. The picture, coupled with the translation, is a bit strange:

healer

I get surprisingly little spam (pretty good filters at Oracle), but about 90% or more of the spam I do get is from .ru. A waste of bytes since I cannot even read it - but hey, from time to time it makes me raise an eyebrow and go "huh, interesting".

Sunday, August 27, 2006

How far we've come...

In 1922, the state of the art in word processing might have been this Oliver typewriter.

It cost a mere $49.50 back then, quite the deal considering the average price was over $100.00.  But applying an inflation calculator to that $49.50 we see the lowly typewriter would cost a whopping $512.39 in 2005 dollars!  For that much you could get a low end PC with basic word processing, a printer and probably a years worth of ink refills (ok, I'm exaggerating on the refills.. Has anyone else been tempted to buy a NEW printer for less than the refills on an existing one??)

But I see they did have upgrades for the typewriter we just don't seem to have for our PC's today.  I've never seen a mink case mod before - have you?  But then again, changing fonts is much easier today then it was with typewriters.  It was interesting to discover that "ascii art" was around way before I ever thought it was.  And I see that major keyboard modifications had as much chance of becoming mainstream back then as they do today (I'm wondering what it will take to change the mouse/keyboard as the predominant input interface).  Look - this one even "threatens to revo-lutionize the present office typewriter".

 

The Modern Mechanic blog is one of my favorite feeds - good for a grin.

Saturday, August 26, 2006

I always wondered about that...

I always wondered why firefox/thunderbird would do a 3,2,1 countdown before installing an extension. I couldn't come up with a reasonable idea on my own - but now I see it is...

A security feature.

Neat. It has to do with tricksters that might get you to install an extension without realizing you did so. How many times have you been typing away when all of a sudden some application steals focus and you end up hitting enter in a dialog box you didn't realize you were in? My instant messenger - IM - (I use gaim - like it a lot) used to do that all of the time. I'd be happily typing away when all of a sudden some bit of SQL would get sent in response to an IM popping up (didn't realize that gaim took focus, accepted my end bit of SQL and hitting the enter key sent it). I've since always configured gaim to NOT pop up windows when I get a new message.

Now, if I can only figure out how to get gaim's buddy pounce to not steal focus - what usually ends up is I'm typing away, the buddy pounce shows up with an "OK" button, steals focus and eats the characters until I hit enter and then disappears. So I end up typing into the wrong application (having to re-type what I just misdirected) and I miss the buddy pounce dialog since I just dismissed it.

Stealing focus should be against the law :)

And I also learned what those annoying "enter the characters you see in this picture" are called from that page. Captcha - an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart". I've been calling them something else (not fit for print). I'm glad google finally reviewed my blog and verified it is not a "splog" - something I just read about last week in Wired (will be online september 5th if you don't get the print version - http://www.wired.com/wired/)

Friday, August 25, 2006

Back from a time out...

I just got back from a short week in the woods:

IMG_3427

We were very much bandwidth challenged - I was using my Verizon wireless card but connectivity was spotty to say the least. The best place to receive a signal was not a very convenient one:

IMG_3469

We used the second floor balcony and perched the laptop on the railing as far away from the cottage as possible. There we achieve blazing speeds of up to 80/90 kbps! (yes, that is about 10 to 11 Kilobytes per second). So, once I realized that - I just checked out. I would check my email in the morning and the evening - attend to anything that needed immediate attention and either said "sorry, on vacation, but lots of forums out there to try!" or filed it away for later (now being later - I have a scroll bar on my inbox and that drives me nuts). My goal is to achieve this "nirvana" again.

Anyway, it was a very secluded place

IMG_0757

and for some of the time the only cottage with anyone in it (there are only three to begin with) was ours. That is my idea of down time. The best part of this vacation - I did not have to pull my wallet out of my back pocket all week - the nearest store was far far away and I wasn't leaving the grounds.

Anyway - was just digging through the email and got a question from someone about how to do something (imagine that). It piqued my curiosity (meaning, it was interesting enough that I didn't just reply "sorry, please don't email me" :) In short, they needed a function that would accept two strings and return Y if the second string was an anagram of the first string, N otherwise. They sent their code with the comment "I think this can be done better".

So I gave it a try. Wow, did it feel good to write a piece of code after a week of not doing much at all computerwise. It woke me right up. Here is my solution:

Now, one up it - any easier approaches you can think of?

ops$tkyte%ORA10GR2> create or replace package anagram_pkg
2 as
3 type anagram_array is table of varchar2(1);
4 function compare
5 ( p_src in varchar2, p_tgt in varchar2 )
6 return varchar2
7 deterministic;
8 end;
9 /

Package created.

ops$tkyte%ORA10GR2> create or replace package body anagram_pkg
2 as
3
4 function compare
5 ( p_src in varchar2, p_tgt in varchar2 )
6 return varchar2
7 deterministic
8 is
9 l_data1 anagram_array default anagram_array();
10 l_data2 anagram_array default anagram_array();
11 l_len number default length(p_src);
12 l_return varchar2(1);
13 begin
14 if ( l_len <> length(p_tgt)
15 or p_src is null
16 or p_tgt is null )
17 then
18 l_return := 'N';
19 else
20 l_data1.extend(l_len);
21 l_data2.extend(l_len);
22 for i in 1 .. l_len
23 loop
24 l_data1(i) := upper(substr( p_src,i,1 ));
25 l_data2(i) := upper(substr( p_tgt,i,1 ));
26 end loop;
27
28 if ( l_data1 = l_data2 )
29 then
30 l_return := 'Y';
31 else
32 l_return := 'N';
33 end if;
34 end if;
35 return l_return;
36 end compare;
37
38 end;
39 /

Package body created.
ops$tkyte%ORA10GR2> select a, b, substr(anagram_pkg.compare(a,b),1,1)
2 from t;

A B S
------------ ------------ -
a N
N
b N
Richard Rihcard Y
Richard RihcaDR Y
Richard Rihhard N
Thomas Kyte They AskTom Y
abba baab Y
abba baaa N

9 rows selected.


In Oracle 9i and before, you can use this query:


    select decode( count(*), l_len, 'Y', 'N' )
into l_return
from (select column_value a,
row_number()
over (order by column_value) rn
from TABLE(cast(l_data1 as anagram_array) )
) src,
(select column_value a,
row_number()
over (order by column_value) rn
from TABLE(cast(l_data2 as anagram_array) )
) tgt
where src.rn = tgt.rn
and src.a = tgt.a;


to compare the two collections (lines 28..33). In order to use the query, you will use "create or replace type anagram_array as ..." and create a SQL type. That is, the anagram_type will be a SQL object type - not a PL/SQL type.

Saturday, August 19, 2006

Ouch, that hurts

This one just came across the wire and I groaned out loud. I really wish "WHEN OTHERS" would simply raise "illegal instruction" when executed.

I see selecting sequence.nextval from dual - totally not necessary and an utter waste of CPU cycles.

I see an append with a values clause. insert /*+ append */ ... values works just as well as insert /*+ dont_understand_how_this_works */ ... values does. (insert append works with SELECT only, never never values)

I see a when others then NULL. That is the worst thing a programmer could ever do. Basically you are saying "it does not matter if this code executes or not". Well guess what - if it does not matter DO NOT EXECUTE IT ever!

I see a nologging attempt that is as effective as using the identifier I_truly_dont_get_it. Nologging is an attribute of a segment that can be used for certain bulk operations. It is not a modifier on any DML clause.

I'm suspect of an autonomous transaction with a when others null even more than usual here, since it neither COMMITS nor ROLLSBACK - in this case, not an issue but given the use of the when others, likely a problem else where in this system.

All I can say is ouch, ouch, OUCH.

And - I hate "when others", it should be removed given that it is never used properly in real life.

Friday, August 18, 2006

Google Analytics

The other day I read that Google Analytics was open for business. I was curious as to what it was all about so I signed up and instrumented my blog template to incorporate it. So far - it looks pretty neat. It tracks activity on the site and generates since nice looking graphics:

Google Analytics

That is a view of the "Executive Dashboard". From there you can drill down into more details. I was interested in the "Visits by Source" pie chart there. I can get similar information from "statcounter", but it'll only show me the last 1,100 requests (I pay for the extra 1,000 records). Google analytics is gathering up all of the history for me and will even tell me over time the relative "up/down" contribution of a site. That is, if a new site comes along that starts sending traffic my way, it'll show as a "big up" whereas if another site decreases the amount of traffic they send - it'll show as a "big down".

In any case - it is 'free' for now (as long as you get less than 5,000,000 hits per month - the blog is pretty safe on that account).

It does raise an interesting question though in my mind. After reading through a lot of the postings/news releases on the AOL debacle (read more here and here) is Google just collecting too much information about "me" or "you"? Right now the motto is "do no evil", but who knows what could be done with some of this information in the future. I'm not a huge fan of this easily identifiable information being warehoused outside of my control. I don't necessarily like the "supermarket" clubs where you swipe a card to identify yourself and get a discount (hey, give me that discount just because I use your store, don't make me give data to you for a fee).

Anyway, I'll leave it on for a while and see what I see. Maybe it'll let me drop the monthly fee I pay to statcounter for the extra 1,000 records in the audit trail. The only reason I pay for that is to have a bit of history from which to estimate where traffic comes from. But since the extra 1,000 really only gives me on average an 8 hour window - it isn't that accurate of a picture. To get a view of "where in the world people are coming from", I'd have to check it while I'm asleep and other parts of the world are awake.

Speaking of sleeping, I'm vacationing next week. We take two or so short vacations instead of one big one and next week we'll be spending some days next to a lake:

Cabin

Looks like a nice place to take a nap with a fishing pole doesn't it.

Monday, August 14, 2006

Words of Wisdom

Jonathan Lewis has added a couple of new writings as of August 13th 2006.

The one that spurred me into making a comment however was "Hinted SQL". I really liked this quote:

Hints can be very useful to solve urgent problems – but my general advice is (a) don’t use them as a first resort, (b) check whether the real problem is in the statistics (c) if you really need to hint your SQL, you probably need an average of at least one hint per table to lock in the execution path you expect and (d) assume that you’re going to have to revisit and retest any hinted SQL on the next upgrade.

(emphasis added by me)

Well said, of course if you've seen my short seminar bit "to hint or not to hint" you'd know why I say "well said"...

It is exactly what I say in conclusion during that talk. Well, almost, I state (c) a little differently, but I like Jonathan's way of saying it - so maybe I'll borrow that.

My Favorite Kind of Update

No, not a SQL update - an update on a question. When I ask for more information or a test case - I frequently write:

Give me a short test case. Something small, yet 100% complete. Concise - but self contained. Remove anything not relevant to the problem. Tiny is good, but complete is a requirement.

999 times out of 1,000 - when I build these myself - I find my own mistake or omission.

I said that just recently and the update to the question in asktom was simply:

Update -
Yes you were right! It turned out to be a "missing grant" problem which we
detected why trying to build the test case.

thank you

That just makes it so easy. Peeling away the layers of the problem and making the example have as few lines of code as possible is an "art form". It is perhaps the most vital capability one could have in software development.

A test of MS Livewriter

I've been using the MS Word plugin to create posts here on blogger. It works OK but has some quirks. The biggest quirk recently has been the inability to post directly from Word - rather, I have to save as draft and then log into blogger and 'publish' the entry after adding the hard to read code (like they use for comments). Another long standing issue was the Word plugins annoying feature whereby it would lose the #<tag name> in a URL - meaning it was hard to point "inside a page" like this test link.

Well, I was reading this weekend about Livewriter - a currently "for free" blogging tool. In theory it should ease linking to pictures:

Where we spent the week

Which so far seems to be true (we'll see what happens when I upload this blog entry). Using the word plugin, I had to save as draft, log into blogger, go into HTML edit mode and paste in the flickr information. This tool let me just go into HTML edit mode and paste in the flickr stuff (I'm wondering why the tools don't have easier to use interfaces to the various photo upload sites..)

It has a spell checker which is important. Just wish the spell check ran as you were typing rather than as a "batch process" at the end.

What it doesn't have is the look and feel of a MS user interface - or maybe it has been so long since I've upgraded (I still use office 2000) that I don't know what a current MS interface looks like. The UI looks familiar though.

The real question for me is - can it "do code":

$ plus

SQL*Plus: Release 10.2.0.2.0 - Production on Mon Aug 14 07:49:28 2006

Copyright (c) 1982, 2005, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options

ops$tkyte%ORA10GR2> select * from dual;

D
-
X

so far, so good - we'll see how it works out.

Updated right before posting

Ok, Livewriter has already justified its existence. The word plugin would just say "I failed" when it failed to post. Livewriter says "I failed for this reason". In this case, the reason was:

Server Error 0 Occurred

com.google.blogger.exceptions.AutospamException: Your post has been saved as a draft instead of published. You must go to Blogger.com to publish your post. To prevent these errors in the future, request a review at:
http://www.blogger.com/unlock-blog.g?blogID=xxxxxxx
Sorry for the inconvenience

Nice, now I understand why I couldn't post directly from my machine anymore - but rather had to log into blogger and release it. That really confused me since I do a couple of other blogs (nothing of interest, for the boy scouts and a girls soccer team). Those let me post directly without doing the word verification.

Hopefully they'll "review" my blog to verify it isn't a spam blog and let me post directly again. That would be nice.

Friday, August 11, 2006

About one year ago...

About one year ago, I was lamenting the problems we lefties have. Well, I take it all back. I’m a lefty and sticking with it. Apparently, being left handed (and a guy) has the effect of boosting the paycheck.

About time I get paid back for being left handed… I think I'll print this one and keep it in my wallet :)

Thursday, August 10, 2006

Time Sink..

Time Sink. This entry is a total time sink – nothing but a collection of links to things that will waste your time (well, except for the first one). You have been warned.

Silver Bullets
A couple of people forwarded me the link to this ACM paper. It discusses “silver bullets” in software. I loved the XML example. Perfect. This should be mandatory reading for Computer Science (any IT related course) students.

GetHuman Database
I spend a lot of time on the phone some days – trying to get to a human being. This little database has the numbers and techniques for many companies (in the US mostly) to “get a human”.

Time Fountain
I might have to build one of these (unless someone can point me to a place to buy one of course). I think my son Alan would enjoy building it – maybe we’ll do one together. A bit of water, a strobe light and some fluorescent dye and you can play with time (and water).

Free Movies!
They might be old, but they can be really good sometime. Here is a site that has a large set of movies that are out of copyright – and available for download. I just pulled “Plan 9 from Outer Space” – never actually watched it. I’m hoping it is as bad and as cheesy as I think it will be. The price is right. I didn’t think my T1 line would feel slow, but after hitting this site…

Thinking outside of the box
Here are some neat questions. Don’t over analyze them, don’t over think them. The less you analyze them, probably the better you’ll be able to answer them.

It is like the database sometimes. We approach it like everything is going to be complex and are so surprised when it is not. Here is a case in point.

Can you pass 8th grade science?
I did very well (I could pass the 8th grade again) – how about you?

Someday, I want to have this much spare time
The graffiti machine – looks like a fun computer science project. Imagine – you could just set it up and leave while it defaces the building – and come back and collect your stuff afterwards if the police are not around.

Just say no to sending MS-Word
I fully agree with this article. I normally return word documents sent to me by people I don’t know – asking for plain text, or at most – html. Most of the times, the word document is simply hold a bmp file that is a screen image of a character mode sqlplus session!!! Cut and Paste – simple, lo-tech, but small and it works.

Funny Math
Ok, I was a math major in college so this funny math page resonated with me. #3 is my all time favorite – “Here it is” – laughing out loud. #6 is no slouch either.

Some Images
You need 8 feet of space between you and your computer for this one to work. It is pretty freaky though.

You know those motivation posters some companies have hanging in the conference room. Here is a “true” one. You always knew the guy in the red shirt that you never saw before was going to buy it big time.

This needs no explanation.

Learning curves for various editors. Visual studio is the best (curve, not editor).

And lastly, this image seemed most fitting to close up the time sink. You are not paid to think after all.


Now, get back to being productive.

Monday, August 07, 2006

Update on an interesting data set...

The consumerist wrote about this (the AOL release of query strings and results)

... "Combine these ego searches with porn queries and you have a serious embarrassment"...

... They might come across the entry for User 17556639 ...

interesting.

Actually, the consumerist wrote a lot about this today. Scary now.

And like I thought - it is hard to put the cat back into the bag. Mirror sites are up now. That is a genie that is not going back into the bottle.

Sunday, August 06, 2006

An interesting data set...

An interesting data set. Remember not too long ago – a big deal was made out of the US government request of the big search engines for their search details (what the search strings where). Well, now AOL is giving that away for all.

Of course, I just had to download it and put it into Oracle to take a peek at it. It was a 500mb download, a tgz file (tar gzipped). Surprise surprise, the tgz file contains – 10 gzipped files! One wonders why they tgz’ed them – when just tar would have resulted in a smaller file (compressing compressed data makes it bigger in most cases). Anyway, their readme said it had 36,389,567 rows of data – and all but 2 of them loaded up (bad data). I used an external table and a simple create table as select to load it up in a couple of seconds:

CREATE TABLE "AOL_ET"
(
"ANONID" NUMBER,
"QUERY_STR" VARCHAR2(4000),
"QUERY_TIME" DATE,
"ITEM_RANK" NUMBER,
"CLICK_URL" VARCHAR2(4000)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
BADFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'test_%p.bad'
LOGFILE 'test.log_xt'
READSIZE 1048576
FIELDS TERMINATED BY 0x'09' LDRTRIM
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS
(
"ANONID" CHAR(255)
TERMINATED BY 0x'09',
"QUERY_STR" CHAR(4000)
TERMINATED BY 0x'09',
"QUERY_TIME" CHAR(255)
TERMINATED BY 0x'09'
DATE_FORMAT DATE MASK "yyyy-mm-dd hh24:mi:ss",
"ITEM_RANK" CHAR(255)
TERMINATED BY 0x'09',
"CLICK_URL" CHAR(4000)
TERMINATED BY 0x'09'
)
)
location
(
'user-ct-test-collection-01.txt',
'user-ct-test-collection-02.txt',
'user-ct-test-collection-03.txt',
'user-ct-test-collection-04.txt',
'user-ct-test-collection-05.txt',
'user-ct-test-collection-06.txt',
'user-ct-test-collection-07.txt',
'user-ct-test-collection-08.txt',
'user-ct-test-collection-09.txt',
'user-ct-test-collection-10.txt'
)
)REJECT LIMIT UNLIMITED
parallel
/

And now I have a new set of test data to play with. Some of the query strings are plain strange, interesting to read sometimes.

Of course, I had to see how many of them lead to asktom – AOL users are not big asktom users apparently – only 2 searches out of the 36 million plus led to my site. The two searches used where:

random number generator errors windows xp
mean mod median

I was curious where most of these searches did lead, so a simple SQL query:

aol%ORA10GR2> l
1 select click_url, count(*)
2 from aol_data
3 group by click_url
4 having count(*) > 500
5* order by 2

showed me:


http://www.ask.com 37752
http://www.hotmail.com 38391
http://www.geocities.com 40547
http://www.bankofamerica.com 48534
http://mail.yahoo.com 53856
http://www.ebay.com 77947
http://www.mapquest.com 96136
http://www.imdb.com 98549
http://www.amazon.com 106119
http://en.wikipedia.org 122539
http://www.yahoo.com 161082
http://www.myspace.com 167070
http://www.google.com 366623
16946938

3932 rows selected.

Almost half of the searches led nowhere (that’s disappointing), the user never clicked on anything. By far – the most destination from AOL search - google! The AOL search leads people to google frequently. I was wondering why this might be – so I peeked at the QUERY_STR used often that resulted in a hit on the google site:

aol%ORA10GR2> select query_str, count(*)
2 from aol_data
3 where click_url = 'http://www.google.com'
4 group by query_str
5 having count(*) >= 100
  1. order by 2;

QUERY_STR COUNT(*)
------------------------------ ----------
google.cm 101
www. google.com 110
google' 116
www google 118
www google com 118
goggle 126
googl search 128
www google search com 132
http www.google.com 134
googles 139
enable cookies 144
www.google .com 147
www.google com 154
www.porn.com 191
www.googl.com 205
googl.com 235
google .com 380
google. com 412
google. 587
www google.com 846
google search 1377
google com 1795
www.google 1961
- 5918
www.google.com 26009
google.com 51489
google 257402

27 rows selected.

Guess that answers that – people most often use AOL search to find out where google is… That search term that was used 191 times above – that is a recurring theme in this data set by the way. Believe it or not.

Does make me wonder if this is a breach of privacy/trust here. Not sure that I’m expecting my ISP/service provider to publish stuff like this. They say it is anonymized, but some of the search strings could contain identifying information of some sort.

updated about 7 hours later...


Yeah, I wasn't sure it was a good move on their part - I don't use AOL myself - my family does. I'm not sure I like the data being released - and some others, well they really don't.

Too late to put the cat back into the bag, when I downloaded it - 195 others preceded me, now it is well over 700 and climbing.

Funny to see that others have noticed that the biggest click away from AOL search is.... google.

16:42 by anonymous: Interesting Results, quite a few (366623) searches resulted in a user going to google. Ha.

Updated a day later...


It seems they pulled the content after about 1,000 downloads. Easy come, easy go.

Friday, August 04, 2006

The third time is a charm...

The third time is a charm. After the “forward” and “insults” typos – I finally got some printed material that looks “typo free” on the cover:

They spelled foreword correctly this time!

Two books just arrived at home via UPS – two books I was a technical editor on (as well as Chris Beck):

XE Books

They are all about Oracle Database 10g Express Edition (one for Windows and one for Linux) and are geared towards someone wanting to get started with this “Oracle thing”. For convenience they ship with the software on CD-ROM (you don’t need to even download the stuff from http://otn.oracle.com/ if you don’t want to). If you were thinking about getting started with Oracle – these books would be a good place to begin. While there are separate books for Windows and Linux –they are pretty much interchangeable – for once you have the product installed, there isn’t that much different between the two platforms. You’ll want to get the one for your platform however – for the correct software.

Here is the foreword I wrote for these books:

In 1987 I was just graduating from college and starting my career as a software developer. I started as a PL/I programmer on IBM mainframes using two databases – SQL/DS on VM/CMS and DB2 on MVS. I became familiar with SQL, but was limited as to what I could do on these production environments.

One day while reading a magazine, Dr. Dobbs Journal, I noticed an advertisement for a relational database that ran on DOS – simple PC’s. It was a product named “Oracle”. I clipped out the coupon – filled it in and ordered this relational database for $99. About 2 weeks later – a dozen or so 5-1/4” floppy disks showed up in my mailbox and I had Oracle version 5.1.5c and all of the development tools I needed to start playing, learning and exploring with. I was hooked.

That was then, this is now – and now, you have the ability to do in 10 minutes, for free what took me weeks and $99 ($166 in 2006 dollars!) in 1987 accomplish. With the introduction of Oracle Database Express Edition – you can download, develop, deploy and distribute your applications for free.

The book you are looking at now is the roadmap to exploiting this free software offering, the guide you need to successfully learn the ins and outs of this thing called Oracle. When I first started in 1987 – all I had was the software and documentation to go on (and much slower computers). Today you have access to not only the software and documentation (as well as discussion forums) – but the roadmap and guide you need to be successful and productive with Oracle Database XE in hours or days. Not the weeks and months it took in the past.

Steve Bobrowski has written the definitive guide on getting started with Oracle Database XE and you have it in your hands right now. I like things that are simple, easy to understand, easy to read – all of that is present in this book.

Steve starts with a quick introduction to databases and Oracle in general. After this very quick overview – we get into the meat of the book with the installation and getting started with Oracle Database XE. Installation is quick and easy – and within minutes, you’ll be on your way to the rest of the book.

The remainder of the book is logically broken into two main components. The first part is geared towards application development (my favorite topic). Steve builds up from using simple SQL, to implementing with PL/SQL and finally building true applications with Oracle Database XE’s Application Express (APEX) tool – the web based development/administration tool that comes with Oracle Database XE. Steve uses both the character mode, command line tool SQL*Plus (an invaluable tool everyone should be familiar with) as well as the graphical environment built with APEX – interchanging between the two environments to provide a comprehensive overview of both. After you are done with this section, you will be able to not only get into Oracle Database XE and play around – but build “real” applications that can be deployed to a large group simply by advertising a URL.

The next section contains the technical background you need to administer, secure and tune your database and applications. Steve covers everything from access control to how to run and interpret statspack reports in a step by step tutorial fashion. Everything is covered here – from not only how to read and understand the statspack reports (a report showing how the “system” is performing, the work it is doing), but how to install it and get your first reports generated. Nothing is left out.

Every chapter is full of examples – concrete examples. The examples are well thought out and demonstrate the key concepts Steve is trying to get across. A friend of mine ends his emails with the latin quote “nullius in verba” – or “nothing in words” – another way of saying “don’t trust just words”. This book exemplifies that trait and take pains to prove out time and time again that what is written here, is true.

If you are just beginning Oracle – whether you are an experienced SQL developer coming from another database as I was in 1987, or whether you are a complete “newbie” to this database thing – you will find this book truly useful in getting started – quickly and easily.

Switching keyboards...

Switching keyboards is hard. I got a new one yesterday based on a recommendation. It is the IBM UltraNav and in general, I’m very pleased with it. The play on the keys is pretty good, feels comfortable. I really like not having to reach and grab the mouse since it has an eraser mouse and a touchpad. It will take some getting used to – I still reach for the mouse when I don’t need to out of habit – but I’ll grow out of that over time.

There are three things I don’t like about the keyboard thought. First and foremost – the escape key is totally in the wrong place. Escape is supposed to be to the left of F1, on the top row of keys. Everyone knows this! If you look at a larger image of the keyboard though:

UltraNav keyboard

You’ll see the escape key is on a row above the F-keys. I spent most of yesterday afternoon dismissing the help window in Linux! Every time I edited something in vi, I ended up hitting the F1 key.

The second thing I didn’t like was – there is no “windows” key. I have gotten used to hitting Windows-E, Windows-L – a lot. That missing key is a big problem.

The third thing I didn’t like was the placement of the function (FN) key used to invoke the “third function” on some keys. It is exactly where everyone knows the CTL key belongs!!! I keep hitting it instead of CTL.

I found I can easily fix the escape key and Windows key problems. I found a small utility – KeyTweak – that let me map the F1 as escape and turned my right CTL key into the Windows key. Unfortunately, the FN key is not a key that can be remapped – it doesn’t send out a scancode when you press it.

So, there is only one thing that makes this keyboard less functional than it could be now – that FN key. I suppose over time I’ll get used to it – but then I’ll be hitting the wrong key on other keyboards…

Thursday, August 03, 2006

No matter which side...

No matter which side of the argument you are on – this just doesn’t seem right. Interesting idea on the part of the public relations firm, but it just doesn’t sit right with me for some reason.

Probably because I don’t like being deceived. Pretending to be an individual without a paid agenda just doesn’t seem right.

Evolution...

Evolution. A frequently asked question is “how can I get a comma delimited list of values grouped by some column”. That is, given the EMP table – I would like a column that is the ‘JOB’ column and another column that is a list of all employee names that have that job assignment, the result would look like this:

JOB ENAMES
--------- ------------------------------
ANALYST FORD,SCOTT
CLERK ADAMS,JAMES,MILLER,SMITH
MANAGER BLAKE,CLARK,JONES
PRESIDENT KING
SALESMAN ALLEN,MARTIN,TURNER,WARD

How can we achieve that. This question is so frequently asked that it is in fact one of the very first asktom questions ever. Back then the release was 8i – and the answer was very different from what we would say in later releases (that thread has the entire history). Back then, a PLSQL function could be used - or if you had a fixed number of rows per group by column (eg: you knew there would never be more than 25 employees per job) you could use DECODE.

Then 9i came out with user defined aggregates and “stragg” was first born (and itself has undergone many iterations). Later someone noticed that with analytics and CONNECT BY (with the then ‘new’ sys_connect_by_path function) we could perform this operation directly in SQL.

So, the answer today – when someone asked today how to develop this comma delimited list by some “group by” set of columns and to have the list sorted would be:

ops$tkyte%ORA10GR2> with data
2 as
3 (
4 select job,
5 ename,
6 row_number() over (partition by job order by ename) rn,
7 count(*) over (partition by job) cnt
8 from emp
9 )
10 select job, ltrim(sys_connect_by_path(ename,','),',') scbp
11 from data
12 where rn = cnt
13 start with rn = 1
14 connect by prior job = job and prior rn = rn-1
15 order by job
16 /

JOB SCBP
--------- ----------------------------------------
ANALYST FORD,SCOTT
CLERK ADAMS,JAMES,MILLER,SMITH
MANAGER BLAKE,CLARK,JONES
PRESIDENT KING
SALESMAN ALLEN,MARTIN,TURNER,WARD

Lets look at each bit in a little more detail. Starting with the “WITH” subquery. I find that to be a nice construct to factor out an implementation detail. In order to do this pivot trick – we need the data broken up virtually by JOB (hence the partition by in the analytics – we partition by our GROUP BY columns – the thing we want to “distinct” on). After the data is broken up by JOB – we assign each row in each JOB set an ascending ROW_NUMBER. Since in this case – we wanted the list of employee names sorted – we order by ENAME. If you didn’t care how the list of ENAMEs was sorted – you could order by anything you wanted. We’ll also gather the COUNT of rows by JOB in this step as well. We’ll need this later to keep just the ‘last’ row from this set of rows by job.

The result of the WITH subquery would look like this:

ops$tkyte%ORA10GR2> select job,
2 ename,
3 row_number() over (partition by job order by ename) rn,
4 count(*) over (partition by job) cnt,
5 decode( row_number() over (partition by job order by ename),
6 count(*) over (partition by job),
7 '<<<=== last one' ) tag
8 from emp
9 /

JOB ENAME RN CNT TAG
--------- ---------- ---------- ---------- ---------------
ANALYST FORD 1 2
ANALYST SCOTT 2 2 <<<=== last one
CLERK ADAMS 1 4
CLERK JAMES 2 4
CLERK MILLER 3 4
CLERK SMITH 4 4 <<<=== last one
MANAGER BLAKE 1 3
MANAGER CLARK 2 3
MANAGER JONES 3 3 <<<=== last one
PRESIDENT KING 1 1 <<<=== last one
SALESMAN ALLEN 1 4
SALESMAN MARTIN 2 4
SALESMAN TURNER 3 4
SALESMAN WARD 4 4 <<<=== last one

14 rows selected.

We have broken the data up by “JOB”, assigned the ROW_NUMBER() to each row and since the COUNT by JOB is assigned to each row – we can find the “last” row. Now all we need to do is assemble the delimited list. Enter: SYS_CONNECT_BY_PATH. This is a “new” function in 9i and above that can assemble the ‘genealogy’ of a row in a hierarchical query. For example:

ops$tkyte%ORA10GR2> select rpad('*',2*level,'*') || ename emp_name,
2 sys_connect_by_path(ename,',') enames
3 from emp
4 start with mgr is null
5 connect by prior empno = mgr
6 /

EMP_NAME ENAMES
--------------- ------------------------------
**KING ,KING
****JONES ,KING,JONES
******SCOTT ,KING,JONES,SCOTT
********ADAMS ,KING,JONES,SCOTT,ADAMS
******FORD ,KING,JONES,FORD
********SMITH ,KING,JONES,FORD,SMITH
****BLAKE ,KING,BLAKE
******ALLEN ,KING,BLAKE,ALLEN
******WARD ,KING,BLAKE,WARD
******MARTIN ,KING,BLAKE,MARTIN
******TURNER ,KING,BLAKE,TURNER
******JAMES ,KING,BLAKE,JAMES
****CLARK ,KING,CLARK
******MILLER ,KING,CLARK,MILLER

So, you can see how for the SMITH record – we can see that SMITH reports to FORD reports to JONES reports to KING – all in the result of the SYS_CONNECT_BY_PATH. But – how is that useful to us here? We don’t want to build a reporting hierarchy – we want a comma delimited list of names by JOB. Well, since we added that ROW_NUMBER() column, after breaking the data up by JOB and sorting by ENAME – we have added a new “reporting relationship” dimension to the data. In a given JOB, row number 1 is the parent of row number 2, is the parent of 3, is the parent of 4 and so on. So, if we were to START WITH all RN=1 rows (the first row of each job) and CONNECT BY that row to the RN=2 row for that JOB and then RN=3 for that JOB and so on, we would achieve:

ops$tkyte%ORA10GR2> with data
2 as
3 (
4 select job,
5 ename,
6 row_number() over (partition by job order by ename) rn,
7 count(*) over (partition by job) cnt
8 from emp
9 )
10 select job, ltrim(sys_connect_by_path(ename,','),',') enames
11 from data
12 start with rn = 1
13 connect by prior job = job and prior rn = rn-1
14 order by job
15 /

JOB ENAMES
--------- ------------------------------
ANALYST FORD
ANALYST FORD,SCOTT
CLERK ADAMS
CLERK ADAMS,JAMES
CLERK ADAMS,JAMES,MILLER
CLERK ADAMS,JAMES,MILLER,SMITH
MANAGER BLAKE
MANAGER BLAKE,CLARK,JONES
MANAGER BLAKE,CLARK
PRESIDENT KING
SALESMAN ALLEN
SALESMAN ALLEN,MARTIN,TURNER
SALESMAN ALLEN,MARTIN
SALESMAN ALLEN,MARTIN,TURNER,WARD

14 rows selected.

Now, we have what we need there – but we have more than we need – we don’t really want all of the rows by JOB, we just want the “last row” by JOB. That is where the count came in – just add a WHERE RN=CNT and:

ops$tkyte%ORA10GR2> with data
2 as
3 (
4 select job,
5 ename,
6 row_number() over (partition by job order by ename) rn,
7 count(*) over (partition by job) cnt
8 from emp
9 )
10 select job, ltrim(sys_connect_by_path(ename,','),',') enames
11 from data
12 where rn = cnt
13 start with rn = 1
14 connect by prior job = job and prior rn = rn-1
15 order by job
16 /

JOB ENAMES
--------- ------------------------------
ANALYST FORD,SCOTT
CLERK ADAMS,JAMES,MILLER,SMITH
MANAGER BLAKE,CLARK,JONES
PRESIDENT KING
SALESMAN ALLEN,MARTIN,TURNER,WARD

There you go – the desired result, straight in SQL, no need for PLSQL (not that PLSQL is a bad thing mind you, but if we don’t have to use it we will not use it – if you can do it in SQL, do it).

It is fun to go back and read things that started some six years ago and see how the evolved over time sometimes. I like the interaction as well – I never would have thought to use analytics and sys_connect_by_path myself, especially not after having developed stragg (once you have a hammer….). I like the sys_connect_by_path approach now – might be a little more “complex” looking at first, but once you get analytics into your brain – it becomes easy.

Wednesday, August 02, 2006

When advertising....

When advertising becomes ironic. (I’ve always wondered if I am using the word ironic correctly – I think so, but even if not – the advertisement coupled with recent events is sort of interesting).

Travel is glamorous...

Travel is glamorous. That is the old saying. I have over 70 airline segments this year and will have my 100K locked in for next year any day now.

It isn’t glamorous. It is tedious sometimes, fun sometimes. I read three stories recently that drive this point home.

The first one was “woman gets kicked off plane because of handbag”. The bag was Gucci you see – far too precious to be stowed away like everyone else’s bag. So, rather than stow it away, she delayed the entire flight for an hour. I don’t know who I would have been more upset with – the woman or the airline, for not removing her more expediently. I’ve seen many a person get bent out of shape because they had to do something (trivial) that they didn’t want to do. It is not much use arguing with the flight attendants – they know who will win that argument.

The next one is sadder. If you are going on a trip to get psychological treatment, perhaps a nice train ride or a bus would be better than a plane. I often sit near the door (economy plus seating on United). I don’t know what I would do if someone went a bit over the edge and started to try and open it. That would be alarming.

Lastly – this one is inexcusable on part of the airline (worse for me – the airline I fly the most on). To be locked up in a plane on the tarmac for over an hour at or above 100 degrees F (about 37 C) – that is crazy. I have experienced the 6 to 7 hour flight from Chicago to Washington (yes, it is more like an hour flight normally). You push back and then wait, and wait, and wait. At least we had power, and cooling. It was an entirely full flight. I had the aisle seat, and a brand new – very thick book to read. The poor guy in the middle seat, not only in the middle seat but the only thing he had to read was – the FAR (Federal Accounting Regulations). That made me feel better – it could have been much worse.

Tuesday, August 01, 2006

Happy anniversary....

Happy anniversary to me. 13 years to the day – I started work at Oracle. I’ve lived through production versions 7, 7.1, 7.2, 7.3, 8.0, 8iR1, 8iR2, 8iR3, 9iR1, 9iR2, 10gR1, 10gR2 – and if the next release comes out before next August – it’ll be 13 versions of Oracle in the 13 years of my tenure here.

Given the amount of change in the last 13 years (remember what it was like in 1993? The world was a different place), I just wonder what the next 13 will look like.