[Tfug] CPU Query
Bexley Hall
bexley401 at yahoo.com
Wed Apr 4 11:56:25 MST 2007
--- Jim Secan <jim at nwra.com> wrote:
> > I assume you mean to imply that your tasks are
> > "compute-bound" and not I/O-bound?  Do you have
> > enough awareness of what the actual algorithms
> > entail (e.g., fixed point vs. floating point, etc.)
>
> I wrote all the code, so I know exactly what it's
> doing (OK, so I didn't
Well, at least you *hope* you know! ;-)
> write the SVD package, but that's from one of the
> optimized libraries).
> I/O has all been optimized such that you read it
> all in (binary
Actually, this is one place where two processors
could have saved you something -- since the I/O
can happen concurrently with the processing
(assuming there is a LOT of I/O...).
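To make that concrete, here's a minimal sketch (C,
pthreads) of double-buffering the input on a second
thread.  read_chunk()/crunch_chunk() are made-up
stand-ins for your binary read and your number
crunching, so treat it as an illustration rather
than a drop-in:

    #define _POSIX_C_SOURCE 200112L
    #include <pthread.h>
    #include <stdio.h>

    #define NCHUNKS 100
    #define CHUNK   (1 << 18)          /* doubles per chunk */

    static double bufs[2][CHUNK];
    static pthread_barrier_t handoff;

    /* stand-in for the binary read */
    static void read_chunk(double *dst)
    {
        for (int i = 0; i < CHUNK; i++)
            dst[i] = i;
    }

    /* stand-in for the matrix crunching */
    static void crunch_chunk(const double *src)
    {
        volatile double s = 0;
        for (int i = 0; i < CHUNK; i++)
            s += src[i];
    }

    static void *reader(void *arg)
    {
        (void)arg;
        for (int i = 0; i < NCHUNKS; i++) {
            read_chunk(bufs[i % 2]);         /* fill buffer i...        */
            pthread_barrier_wait(&handoff);  /* ...then hand it to main */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        pthread_barrier_init(&handoff, NULL, 2);
        pthread_create(&t, NULL, reader, NULL);
        for (int i = 0; i < NCHUNKS; i++) {
            pthread_barrier_wait(&handoff);  /* wait for buffer i to fill  */
            crunch_chunk(bufs[i % 2]);       /* crunch it while the reader
                                                is already filling i+1     */
        }
        pthread_join(t, NULL);
        puts("done");
        return 0;
    }

(Compile with something like "gcc -std=c99 -O2 -pthread".)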
> unformatted), crunch numbers, and then write it all
> out. The "bind" is in
> floating-point operations (mostly matrix
> manipulations - this is a largish
Yes, floating point is almost always a pig.
But the time required to *accurately* do away
with floating point in favor of fixed point math
is rarely worth spending on "run once" applications
(by that, I mean anything that doesn't run
frequently -- for sufficiently large values of
"frequently"...)
> inverse problem). My interest is in whether the OS
> can take advantage of
> the 2X CPUs without my having to get a compiler
> (Fortran) that will do
Recompiling with a modern (?) compiler would be
an inexpensive first step.  I assume you have
taken care to look at just how you access the
matrices so as not to defeat the effectiveness
of the D-cache on the machine?  E.g.,
    for (r = 0; r < ROWMAX; r++)
        for (c = 0; c < COLMAX; c++)
            matrix(r,c) = F(r,c);

behaves quite differently from:

    for (c = 0; c < COLMAX; c++)
        for (r = 0; r < ROWMAX; r++)
            matrix(r,c) = F(r,c);
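If you want to see how much the ordering matters on
a given box, here's a throwaway C toy you could
compile (gcc -std=c99 -O2) and time.  C is
row-major, so the first sweep below walks memory
contiguously; Fortran is column-major, so the roles
flip there:

    #include <stdio.h>
    #include <time.h>

    #define N 2048
    double m[N][N];                    /* ~32 MB of doubles */

    int main(void)
    {
        clock_t t0, t1, t2;

        t0 = clock();
        for (int r = 0; r < N; r++)    /* row-major: contiguous accesses */
            for (int c = 0; c < N; c++)
                m[r][c] = r + c;

        t1 = clock();
        for (int c = 0; c < N; c++)    /* strided by a whole row each    */
            for (int r = 0; r < N; r++)/* step: cache-hostile            */
                m[r][c] = r + c;

        t2 = clock();
        printf("row sweep:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
        printf("column sweep: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }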
Also, make sure you have enough physical memory
to avoid any paging as this would quickly lead to
thrashing when manipulating a large matrix.
> this. Either that, or get into the manual "loop
> unrolling" business,
> which loses me more (in my time) than I would gain.
> I want to know if
> paying a little more for a 2X CPU will gain me in
> throughput without my
> having to do anything other than copy codes over
> from my current FC3
"Measure, then optimize". Why not try running the
code on a small data set and time it? I wonder
if the multicore boxes do anything notably different
than a multiCPU box? I.e., perhaps find a generic
2 CPU box, run the code. Pull one of the CPU's and
run it again? If there was a notable decrease in
performance, I would be encouraged. Unfortunately,
if there was *minimal* difference, I wouldn't
conclude anything from it (since there may be
differences brought about by the fact that dual CPU
designs have to bring everything out through the pad
drivers (significant delay) while a dual *core*
can skip this...
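On the "time it" front, one cheap check is whether a
single run can even use a second processor: compare
wall-clock time to CPU time.  If the job is a single
thread, user+sys will track the wall time and a
second CPU can't shorten the run by itself.  A
minimal sketch (run_the_job() is a made-up stand-in
for your real workload):

    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/time.h>

    static void run_the_job(void)      /* stand-in for the real crunching */
    {
        volatile double s = 0;
        for (long i = 0; i < 100000000L; i++)
            s += (double)i * 1e-9;
    }

    int main(void)
    {
        struct timeval w0, w1;
        struct rusage  ru;

        gettimeofday(&w0, NULL);
        run_the_job();
        gettimeofday(&w1, NULL);
        getrusage(RUSAGE_SELF, &ru);

        double wall = (w1.tv_sec - w0.tv_sec)
                    + (w1.tv_usec - w0.tv_usec) / 1e6;
        double user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
        double sys  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;

        printf("wall %.2f s, user %.2f s, sys %.2f s\n",
               wall, user, sys);
        return 0;
    }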
> system and go. As a related side issue, I could
> care less about video
> performance - I work at the command line and could
> live with this on a TTY
> user interface.
>
> I have heard that some OS's (distros) will do a sort
> of load-leveling, but
> I don't know what sort of gain this would provide
> for a single process. I
Most of the "system" time spent on a *workstation*
(as distinct from a "server") is negligible.
Running the network stack can eat up some resources,
but I suspect you aren't *moving* any data across
the wire, so that would be negligible.  Likewise,
from your description of your implementation, any
threads servicing I/O would be minimal.
> have doubts about that, and that's why I'm asking.
> I certainly don't want
> to find that I pay more for a 2X only to find that
> my processing runs
> slower than a comparable speed (and cheaper) 1X
Exactly. This is the Windows model... :>
> because I gain nothing
> from the second processor and lose from extra things
> the OS is doing
> because it knows it has more than one processing
> path through the CPU. I
> have seen this sort of thing happen to people trying
> to parallelize or
> vectorize their codes. Definitely a YMMV situation
> (and possibly also a
> TANSTAAFL situation RE gain without pain).
My approach (assuming the code/system is portable):
try running the code on the sort of box(es) you
are looking at and time them.  If you don't see
at least a 25% improvement, your better bet is just
to wait 3 months for a faster 1X CPU to become
available.
*If* you have a multiCPU box handy, perhaps try the
"pull one CPU" trick and see if it gives you any
marked difference. (it would be interesting to
*know*...)
How long can you afford to wait? :> I.e., I tend
to operate on the adage "do nothing unless you
can realize a 2X performance increase -- cuz the
time it takes to implement/debug/validate can better
be spent WAITING for faster hardware..."
HTH,
--don
P.S. *I* would appreciate hearing how this shakes
out for you...