Generally, one of the first mistakes someone will make when they write SQL is to create a cartesian join by leaving off part or all of their where clause. This generally results in queries that take forever to finish (if they ever do), and have far too much information in them. If you do an explain plan and you see “cartesian join”, the natural impulse is that you've done something bone-headedly stupid and that you need to fix it. And this is almost always the case.
But when I was building the NaNo tracker (see the previous post), I found myself needing a cartesian join. It felt weird, and I admit that I had trouble typing it, but it was the right tool for the job.
When a user first logs into the tracker and starts a challenge (there's only one to chose from so far, but I've got plans), they are taken to a page with a calendar on it. All of the days of the challenge are populated with default data. I briefly considered putting a loop into the registration process that would auto-populate these defaults, but I was concerned that that might result in a lot of useless data cluttering up my tables; alternatively, I could add a lot of special-case code to my application to handle the “null” situations, but that was also an unattractive solution. No, I only wanted to save a record when a user actually changed something, and avoiding special-case code is always good. So, instead, I drive the calendar off of a view. Ok, a series of views. And the core data of the view actually doesn't exist in any table.
First, I use a nifty trick with hierarchical SQL to get all of the dates between the start and end dates of the challenge:
select c.challenge_id, c.start_date + v.lvl cal_date from challenge c, ( select challenge_id, level - 1 lvl from challenge c2 connect by level <= (c2.end_date - c2.start_date + 1)) v where c.challenge_id = v.challenge_id
Then, I join this against the user data to get a view that has a record for each user for each date using--yep--a cartesian join:
select d.challenge_id, u.user_id, d.cal_date from dates_v d, users u
No where clause at all (ok, that's not quite true--I filter the users down to those who can use the application; then again, I'm also changing the table names out of paranoia). This allows me to treat my default values as if they are actual rows in the database, without having to actually create them until they're needed, and I don't have to create any special-case logic within my application. All of this magic is encapsulated into two simple views, and because of the nature of cartesian joins, it just works.