[Tfug] Trusting DBMS results

Mon Nov 3 13:07:20 MST 2008

Hi, John,

--- On Sat, 11/1/08, johngalt1 <johngalt1 at uswest.net> wrote:

> How much is an answer worth?
> 
> more specificity please.
> 
> What is the sensitivity of the data?

"Sensitivity"?  Not sure I understand that in this context.

The data is not sensitive in the sense of "personal/private"
that should be protected from disclosure.

OTOH, some of it can be considered highly sensitive to *change*
(i.e., sensitivity in the electronic sense).  E.g., you'd be
annoyed if your bank account records were mysteriously "off"
by 1024 dollars, etc.

> What is the magnitude of risk and consequence of failure?

Failures are inconveniences/annoyances.  What is the cost
of your VCR/Tivo failing to record a program that you
*know* you told it to record?  (What happens if that
program will never be aired again -- i.e., it is lost
forever?)  I'm not worried about safety issues as I
don't trust anything in that regard...

But, folks quickly get annoyed with devices that, in their
mind, "are broken" like this.

> ----- Original Message ----- From: "Bexley Hall"
> 
> > I'm *heavily* integrating a DBMS into an application.
> > I.e., things that I would often "hard code" in the
> > application have been moved into the database.
> > 
> > As a matter of principle, I "never trust inputs".
> > (regardless of whether they come from users or
> > "sensors", etc.)
> 
> If you don't trust your sensors, how can things
> function?

That's why you don't trust them!  E.g., some of the
devices I work with control *tons* (literally thousands
of pounds) of mechanisms in motion.  If you aren't
100.0000% sure that you know what the mechanism is
*doing*, you don't *do* anything!  (i.e., "you make
the system safe").

If, OTOH, you implicitly trust the data you are receiving,
then, at the very least, you will "do something wrong";
or, potentially, "cause harm" (i.e., to a person or an
organization -- perhaps by throwing thousands of dollars
of "product" away).

For similar reasons, you don't trust the actuators
that you have control over.  I.e., just because you are
telling a motor to "spin clockwise", that doesn't mean
that the motor won't end up spinning COUNTERclockwise!
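
As a rough sketch of "command, then confirm": issue the command, read
back an *independent* indication of what the mechanism is doing, and
make the system safe if the two disagree.  Every name here (`Direction`,
`drive`, and the three callbacks) is hypothetical, and a real controller
would also have to allow time for the mechanism to respond before
checking the feedback.

```python
from enum import Enum


class Direction(Enum):
    STOPPED = 0
    CLOCKWISE = 1
    COUNTERCLOCKWISE = 2


def drive(commanded, command_motor, read_feedback, make_safe):
    """Issue a command, then confirm it against independent feedback.

    command_motor, read_feedback and make_safe are caller-supplied
    callables (assumptions for this sketch, not a real driver API).
    Returns True if the observed motion matches the command.
    """
    command_motor(commanded)
    observed = read_feedback()
    if observed == commanded:
        return True
    make_safe()  # e.g., remove power or apply a brake
    return False
```

The point of the design is that the "safe" path is taken on *any*
disagreement, not just on the failures you anticipated.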

> > But, how pedantic should I treat the data that "I"
> > (i.e., my application) has placed in the database?
> 
> How pedantic can you get?

I can be incredibly aggressive in validating the data.
But, this all comes at a cost.  When creating the data
(i.e., at build-time or even at run-time), I have lots
of associated information to draw on.  But, when looking
at a "retrieved datum", that information isn't readily
available -- perhaps not even readily "retrievable".

E.g., I can encode the data in such a way that I increase
the chances of detecting "invalid data".  But, that implies
that the DBMS and/or its backing store are unreliable
(i.e., if the DBMS can't guarantee that the data I give it
comes back to me unaltered, isn't it a defective "component"?)
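
A minimal sketch of what "encoding the data to detect invalid data"
might look like: append a CRC-32 to each value before handing it to the
DBMS and check it on retrieval.  The names `seal` and `unseal` are
hypothetical, and CRC-32 is just one possible check; it catches
accidental corruption, not deliberate tampering.

```python
import struct
import zlib


def seal(payload: bytes) -> bytes:
    """Return payload with a 4-byte big-endian CRC-32 appended."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unseal(blob: bytes) -> bytes:
    """Return the original payload, or raise ValueError if the check fails."""
    if len(blob) < 4:
        raise ValueError("blob too short to contain a checksum")
    payload = blob[:-4]
    (stored,) = struct.unpack(">I", blob[-4:])
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: stored value was altered")
    return payload
```

The cost is visible even here: four extra bytes per value and a pass
over the data on every read and write.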

> > For example, tables that are "written during manufacture"
> > (and never altered, in theory, at run time)... should I
> > have faith in their sanity?  I.e., should I treat the
> > DBMS as a reliable medium?  Or, one that is "suspect"?
> 
> It sounds as though you may need to do an experiment for
> yourself to have commensurate trust. Or, do research on
> systems designed for very high reliability.

My question is more fundamental than that.  Do you *trust*
the databases that you use -- as well as those that are
used by others which *affect* you -- in daily life?  Or,
do you *assume* the data to be unreliable?  (Note: you must
separate the human operator's unreliability from the DBMS's
in making this assessment.)

Do you trust the CPU in your PC to perform every computation
correctly?  Do you trust the memory system to return exactly
the values that are stored in it?  Do you trust the disk
system?  Are you sure that a software RAID implementation is
as reliable as an (allegedly) *hardware* RAID implementation?
(If not, how do you employ these things?)

*That* is what I am asking re: DBMS.  Are they "reliable
components" or "flaky" (in the situation I have described)?

For example, when I deal with NVRAM technologies, I *don't*
expect every location in the device to accurately reflect
the data that I have stored in it 100% of the time.  This
is a direct consequence of how that data is maintained.
I.e., most such technologies are vulnerable to "short writes".
Depending on the technology, that could compromise a single
datum ("byte") *or* the entire device (or a large portion
of it -- like blowing a row in a DRAM).

Some technologies with limited wear are maintained in ways
that make them particularly susceptible to this sort of
problem.  For example, early EAROM, MNOS, "Flash" devices
(etc.) had a very limited number of write cycles (10^4).
You simply could not treat them as "regular memory" since
you could easily consume the device's lifespan in a week
of *normal* use (i.e., assuming you aren't deliberately
trying to wear them out!).

As a result, one often kept a shadow copy of the data in question
and used that for all references (read *and* write).  When the
device was shutting down (i.e., if explicitly commanded to do so
*or* if a shutdown was implicitly detected by means of a power fail
sensor), you would quickly update the memory device from a
non-maskable interrupt, etc.
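
The shadow-copy scheme can be sketched roughly as follows.  This is a
toy model, not real driver code: `ShadowedStore` and its methods are
hypothetical names, a `bytearray` stands in for the write-limited
device, and `flush` is what you would call on a commanded shutdown or
from the power-fail handler.

```python
class ShadowedStore:
    """RAM shadow in front of a write-limited device (toy model)."""

    def __init__(self, device: bytearray):
        self.device = device              # stands in for the EAROM/Flash part
        self.shadow = bytearray(device)   # working copy used for all accesses
        self.device_writes = 0            # device cells written so far

    def read(self, addr: int) -> int:
        return self.shadow[addr]

    def write(self, addr: int, value: int) -> None:
        self.shadow[addr] = value         # consumes no device write cycles

    def flush(self) -> None:
        """Copy the shadow back, touching only cells that changed."""
        for addr, value in enumerate(self.shadow):
            if self.device[addr] != value:
                self.device[addr] = value
                self.device_writes += 1
```

Ten thousand updates to one location then cost the device a single
write at shutdown.  The exposure is the window between the last update
and a flush that actually completes.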

Depending on how "precious" the data is and the costs of
recreating or replacing it, I use different protection
techniques to safeguard it.  But, this comes at a cost
in resources (time and space).

Scaling this up to many {mega,giga}bytes in a DBMS can quickly
get overwhelming (costwise).

> > Note that I implicitly trust values that I "store" in RAM
> > from my computations.  Should the integrity of the DBMS
> > be considered on a par with that?  I.e., assume the DBMS
> > has mechanisms to detect and repair problems just like
> > ECC memory catches alpha particles, etc.
> 
> Paper will catch alpha particles. Are your ECC memory dies
> exposed to air?
> 
> Unless the goal is philosophical conceptualizing about the
> reliability of generalized DBMSes on virtual hardware, try
> to consider what this query sounds like to the reader and
> modify accordingly.

That is the essence of the question!  Are (mature) DBMS's
worthy of being treated as "reliable components" -- as
reliable as the processor, memory subsystem, disk system, etc.?

--don