[Tfug] OT? "Hamming distance" in non-binary codes?

Bexley Hall bexley401 at yahoo.com
Wed Feb 5 17:48:12 MST 2014


I need to come up with a (large) set of identifiers.
Ideally, numeric (decimal).  And, as "short" as possible.

But, a naive implementation (1, 2, 3, 4, 5, ...) is
prone to problems in transcription, etc.

For example, depending on typeface, an '8' can be
misread as a '3'.  Or even a '6' or '9'.  Perhaps
a '6' misread as a '5', etc.

*If* you can choose the graphic representation of
each symbol (i.e., typeface and size), you can probably
minimize the chances for these sorts of "misreads".
However, if you are at the mercy of others in how they
record and present the data, then there is nothing that
you can do to prevent the choice of a typeface that
adds to this ambiguity (and, for hand-written identifiers,
all bets are off:  "Is that a 4 or a 9?")

In effect, you want to increase the Hamming distance between
"valid" identifiers so these sorts of transcription errors
are minimized and/or easily recognized ("Um, '33333' is not
a valid identifier -- are you sure you don't mean '38838'?").

Additionally, this helps provide some protection against
folks "guessing" valid identifiers.  Credit card issuers
(the phreaking topic reminded me of this) exploit this to
minimize the chance of a CC account number being "mis-read"
(or, mis-heard!) orally, etc.
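
For reference, the check digit on payment card numbers is computed
with the Luhn formula.  A rough sketch follows; the function names are
mine, and whether spending one digit per identifier on a check is
acceptable here is an open question.  It detects any single wrong digit
and most swaps of adjacent digits, but it cannot say which digit is
wrong.

```python
def luhn_check_digit(payload: str) -> str:
    """Check digit to append to `payload` (a string of decimal digits)."""
    total = 0
    # Walk right to left; the digit that will sit next to the check
    # digit is doubled, then every second digit after that.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9          # same as adding the two digits of d
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(identifier: str) -> bool:
    """True if the last digit is the Luhn check digit of the rest."""
    if not identifier.isdigit() or len(identifier) < 2:
        return False
    return luhn_check_digit(identifier[:-1]) == identifier[-1]


print(luhn_check_digit("7992739871"))   # check digit for this payload
print(luhn_valid("79927398713"))        # intact identifier
print(luhn_valid("79927898713"))        # one '3' misread as '8'
```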

Any suggestions as to how to approach this deterministically?
Ruling out digits (symbols) that can be misread/miswritten takes
a huge chunk out of the identifier space (i.e., each "digit place"
can only represent ~5 different symbols instead of *10*).  OTOH,
I think smarter coding schemes can provide me with something more
than this.  E.g., disallow certain combinations of "susceptible
digits" but still allow their use (trivial example: 83 and 38 are
valid but 33 and 88 aren't).
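
One deterministic way to explore this is a greedy construction: walk
the candidates in order and keep each one that is "far enough" from
everything kept so far.  The sketch below is only an illustration under
assumptions of mine: the CONFUSABLE pairs are the misreads mentioned
above, a confusable substitution costs 1 and any other substitution
costs 2, and an identifier is kept if it is at least 2 away from all
earlier ones.  It is quadratic in the number of candidates, so it only
suits short identifiers.

```python
from itertools import product

# Assumed confusable digit pairs (8/3, 8/6, 8/9, 6/5, 4/9).
CONFUSABLE = {frozenset(p) for p in ("38", "68", "89", "56", "49")}


def misread_distance(a: str, b: str) -> int:
    """Weighted distance: 1 per confusable mismatch, 2 per other mismatch."""
    total = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        total += 1 if frozenset((x, y)) in CONFUSABLE else 2
    return total


def greedy_code(length: int, min_distance: int = 2) -> list[str]:
    """Keep each candidate that is >= min_distance from all kept so far."""
    chosen: list[str] = []
    for digits in product("0123456789", repeat=length):
        cand = "".join(digits)
        if all(misread_distance(cand, c) >= min_distance for c in chosen):
            chosen.append(cand)
    return chosen


print(greedy_code(1))          # symbols that survive at length 1
print(len(greedy_code(2)))     # size of the two-digit code
```

At length 1 this keeps 7 symbols (0, 1, 2, 3, 4, 5, 7).  At length 2 it
keeps more than the 7*7 = 49 you would get by banning 6, 8 and 9
outright, because combinations such as 66 are still safe.  Which member
of a confusable pair survives depends on the walk order: here 33 is
kept and 38 is dropped, the mirror image of the example above.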

