[Tfug] RDBMS reprise
Bexley Hall
bexley401 at yahoo.com
Wed Jan 23 17:14:11 MST 2008
Hi, Tim,
--- Tim Ottinger <tottinge at gmail.com> wrote:
> Bexley Hall wrote:
> > I don't see any way to *avoid* making two (or
> more)
> > passes at the server to get all the data that an
> > application may need (e.g., the PDA example I
> > cited is a great model for thinking about the
> > sorts of issues that can arise... "How large of
> > an address book do you expect the user to have?
> > How many fields in each 'contact'?" etc.)
>
> Okay. Have you worked on making more, narrower
> queries?
> For instance, if you want the address book, you
> probably don't really want the address book.
<scratches head> Huh? (typos)
> You might want record ID and
> name first, which is less bandwidth and local
I don't worry about bandwidth. Just storage on
the client.
> storage. Maybe
> you also only want As and Bs for the first screen.
> When
> someone selects a person by name, then you might
> want the one person's record.
>
> Did you already do that? The initial question
> sounded like you wanted too much at once.
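For concreteness, Tim's "narrow query" idea might look something like the sketch below, using an in-memory SQLite table as a stand-in for the real server (the table, columns, and data are all hypothetical): fetch only (id, name) pairs, and only for the letters on the first screen.

```python
# Sketch of the "more, narrower queries" approach: first screen pulls
# only (id, name) for A/B names, not whole records. All names here
# are made up for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
db.executemany("INSERT INTO people (name, phone) VALUES (?, ?)",
               [("Anders, Al", "555-0101"),
                ("Baker, Bo", "555-0102"),
                ("Carter, Cy", "555-0103")])
# Index so the per-letter queries stay cheap, as Tim notes below.
db.execute("CREATE INDEX idx_people_name ON people (name)")

def first_screen(prefixes=("A", "B")):
    """Fetch only (id, name) for names starting with the given letters."""
    rows = []
    for p in prefixes:
        rows += db.execute(
            "SELECT id, name FROM people WHERE name LIKE ? ORDER BY name",
            (p + "%",)).fetchall()
    return rows

print(first_screen())  # only two narrow rows reach the client
```

The full record is then fetched by id only when the user actually selects a name.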
If I take things "in small bites" (as you suggest)
then I run the risk of the dataset changing between
the time I look at the first *portion* of the data
and the *rest* of the data. (e.g., John Doe's
record can disappear, be renamed, etc. between
the time I pick "John Doe" and the time I issue the
query to retrieve the balance of his record.)
While this is unavoidable, it violates the Principle
of Least Surprise: the user *saw* there was a
"John Doe" in his address book and, now, when he
goes to look at John Doe's *record*, the device
complains that there is no such person!
(recall *users* don't understand that this is two
separate queries... they just see data that *was*
present is suddenly *not* present!)
> BTW: Selecting by first letter of name is generally
> much cheaper than setting the 'first' and 'length'
> on a bigger query. Especially if the db is indexed
> appropriately.
>
> > Currently, I qualify each successive query with
> > those from which it was (obviously) derived so
> > that I can tell if a record has "disappeared",
> > etc. I can see no way around this possibility
> > short of explicit locking *or* implicit locking
> > (even client-side locking!)
>
> Probably okay to be optimistic, then. I get
> (id,name) as (1234, "Ottinger, Tim") then when I
> select my name, I select from people where
> id = 1234. If there is no
> such record, then I know that something happened
> on the other end.
*You* (I, the programmer) know this is what happened.
But, Joe User doesn't understand this. Instead, you
get a call to tech support (angry/confused customer)
and you have to explain what *might* have happened.
[note the PDA example falls down here as it is too
simplistic. Imagine looking at a directory's
contents, selecting a file and then being told that
the file doesn't exist! "Huh? I just *saw* it?!!!"
Sure, even if you refresh the directory listing and
show the user that it is no longer present, it
results in confusion on his part: "Where did it go?
Why??"]
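The optimistic two-step Tim describes, and the "disappeared record" case it has to detect, might be sketched like this (again against a hypothetical in-memory table; the real client would turn the None into a user-facing refresh, not an error):

```python
# Optimistic re-fetch by id: the first screen showed (1234, "Ottinger,
# Tim"); the second query selects by that id. If the row is gone, the
# *program* knows what happened -- the UI still has to explain it.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
db.execute("INSERT INTO people VALUES (1234, 'Ottinger, Tim', '555-0199')")

def fetch_record(person_id):
    """Second, narrow query. Returns the row, or None if the record
    vanished between the list query and this one."""
    return db.execute("SELECT id, name, phone FROM people WHERE id = ?",
                      (person_id,)).fetchone()

assert fetch_record(1234) is not None
db.execute("DELETE FROM people WHERE id = 1234")  # changed server-side
if fetch_record(1234) is None:
    # Recovery logic lives here: refresh the list and tell the user
    # the address book changed, rather than "no such person".
    print("record changed on the server; refreshing list")
```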
> If you can't save data, and can't hold locks, then
> you'll have to have some recovery logic. Tradeoffs,
> y'know?
Of course! Hence my point that the cleanest
implementation is to burden the server with this
requirement. And, if the server ends up unable to
satisfy subsequent requests from *other* clients,
then these look like simple "server busy, please
try again later" errors (which, IMHO, are much
easier for people to relate to than "disappearing
names/files").
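A toy sketch of that server-side burden, under the assumption (mine, for illustration) that the server pins a snapshot of the result set per client session and simply refuses new sessions when its memory budget runs out:

```python
# Hypothetical server-side cache: each session gets a pinned copy of
# its result set, so later edits can never make a listed row vanish.
# When the budget is exhausted, *new* clients get a plain "busy" error.
BUDGET = 2     # max concurrent snapshots the server will hold (made up)
snapshots = {} # session id -> pinned rows

def open_session(session_id, rows):
    if len(snapshots) >= BUDGET:
        return "server busy, please try again later"
    snapshots[session_id] = list(rows)  # copy; immune to later edits
    return "ok"

def fetch(session_id, index):
    # Always consistent with what this client's first screen showed.
    return snapshots[session_id][index]

assert open_session("a", ["John Doe"]) == "ok"
assert open_session("b", ["Jane Roe"]) == "ok"
assert open_session("c", ["Latecomer"]).startswith("server busy")
```

Adding memory to the server raises BUDGET for every client at once, which is the upgrade path mentioned below.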
> > Datasets vary. Some may be wide, others narrow.
> > E.g., consider the smarts that you might add to
> > fetch the area code for a particular address
> > based on its ZIP code (I, for one, don't enter
> > area codes in my address book so if I ever
> > forgot Tucson's -- for example -- I would have
> > to make this extra step manually).
>
> Yeah, but you can always get the very least you
> need,
> and not select any data that you *might* need later.
> So you get the fewest columns of the fewest rows you
> can possibly get away with. I don't know any better
> way to deal with having a lack of space on the
> client.
That was my point. Taking bite-sized pieces
complicates the recovery logic, forces it to vary
as a function of the data/application, *and* confuses
the end user. In light of that, it seems that
server-side caching is the better approach. It
also gives a simple upgrade path -- add more memory
to the server and *all* clients benefit.
> Well, there is the idea of pulling in data and
> compressing
> it one way or another. Then you get a little
> storage back
> for a bit of processing. But that's hardly a
> panacea either.
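The storage-for-CPU trade Tim mentions is easy to demonstrate with a stock compressor (zlib here, purely as an example; a real record format would differ):

```python
# Trade a bit of processing for client storage: compress a fetched
# record before caching it locally. The record contents are made up.
import zlib

record = (b'{"id": 1234, "name": "Ottinger, Tim", "notes": "'
          + b"x" * 500 + b'"}')
packed = zlib.compress(record)

print(len(record), "->", len(packed))  # smaller on the client...
restored = zlib.decompress(packed)     # ...at the cost of CPU each access
```

As noted, this only stretches the balloon; it doesn't fix the staleness problem.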
Exactly.
> More data on the client == more stale data on
> client.
But the data makes sense to the user. If John Doe's
address has changed in the intervening
seconds/minutes/hours/days since he saw the list
of names, at least the data that he had represents
a valid data point that *did* exist (at one time).
Contrast that with having a name that *existed*
but a record behind it that no longer does!
> If you're low on resources, you have to be high on
> finesse.
>
> You probably know the flyweight pattern, too. Maybe
> that's helpful. Possibly not.
>
> Area-zip isn't one-to-one, either. That makes it
> harder.
I think ZIP to area code is damn near one-to-one
(though area code to ZIP is not). Regardless, it
is only germane to this bogus example...
> But a very focused query could get just the area
> codes
> for the one zip code, and then it's a matter of
> whether
> you REALLY need the city name and state name.
>
> Sorry I'm not more helpful.
<grin> I suspect it's just one of those problems
that *has* no real (clean) solution. Can't squeeze
both ends of the balloon at the same time!
Thx,
--don