[Tfug] Question tres: Overbuffering overkill

Mr Brevity Bexley410 at aim.com
Thu Sep 4 11:05:25 MST 2014


Hi John,

On 9/4/2014 12:48 AM, John Gruenenfelder wrote:
> And now for my third question.  This has to do with, I believe, an absurdly
> excessive amount of I/O buffering being done by the system as I attempt to
> copy approximately 9 GB of data from my super fast SSD to a microSD card
> (plugged into my laptop's SD card slot via an adapter).
>
> I, of course, don't have a problem with the system buffering and caching data
> to improve performance, but in this case it is actually causing the whole
> system to become broken, most directly my wireless network connections.

> The data in question was my music collection.  From the command prompt I did
> something along the lines of a "cp -a" to copy all of the files and subdirs.
> Immediately, pages and pages of "copy src to dest" (I had used the verbose
> switch) flew across the terminal.  Obviously there was no way it could write
> data to the card that rapidly so I assumed that it had simply read all of
> these source files into the FS cache in memory.  After that, the messages
> slowed as the data was steadily, and slowly, written to the card.

> Unfortunately, for reasons I am completely unsure about, this caused immense
> problems with my ongoing network connections.  At the time I was SSHd into two
> other machines, one on my LAN and the other on the Internet.  I also had my
> browser open.  While this super-buffered copy was going on, I began to have
> enormous latencies in my SSH connections.  Occasionally they would respond in
> a semi-timely manner, but most of the time I would get no response at all.
> Eventually, both connections were dropped due to, I believe, timeouts or maybe
> packet loss.  At the same time I was also unable to browse to any web pages or
> establish any new SSH connections.  My NFS connection to that same computer on
> my LAN also timed out and eventually the automounter decided the remote
> machine was unavailable.

This *sounds* like a crappy driver interface.  One where the driver
spins (nonpreemptible) EXPECTING some operation (i.e., X lines of code)
to happen "in short order" -- and it actually *doesn't*.  E.g., this
often is manifest in error handling code where the algorithm loops
on a condition (waiting for it to become true/false) INDEFINITELY:
     while (operation != COMPLETE) {
         // spin
     }
instead of imposing a specific timeout on that operation to limit the
time spent waiting for it (i.e., the above will "hang" until the
operation *is* COMPLETE).
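
For contrast, a bounded wait might look something like the sketch below.
This is just a userland toy (real driver code would use the kernel's own
timing/yield primitives); the names, and the 500 ms figure, are
placeholders rather than anything from an actual driver:

     #include <stdbool.h>
     #include <stdio.h>
     #include <time.h>

     enum { COMPLETE = 1 };              /* assumed "done" status code      */
     static volatile int operation = 0;  /* stand-in for a hardware status  */

     /* milliseconds elapsed since 'start' */
     static long elapsed_ms(struct timespec start)
     {
         struct timespec now;
         clock_gettime(CLOCK_MONOTONIC, &now);
         return (now.tv_sec - start.tv_sec) * 1000
              + (now.tv_nsec - start.tv_nsec) / 1000000;
     }

     /* spin, but only for at most timeout_ms -- then give up and report it */
     static bool wait_for_completion(long timeout_ms)
     {
         struct timespec start;
         clock_gettime(CLOCK_MONOTONIC, &start);

         while (operation != COMPLETE) {
             if (elapsed_ms(start) > timeout_ms)
                 return false;           /* bounded: error out, don't hang  */
         }
         return true;
     }

     int main(void)
     {
         /* the "hardware" never completes here, so this returns after ~500 ms */
         printf("completed: %d\n", wait_for_completion(500));
         return 0;
     }

The point being: when the card wedges, the driver reports an error and
the rest of the system keeps moving.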

In userland, this isn't a problem as the kernel can activate another
(user) process while you are spinning.  If the device driver (and
kernel itself) is not preemptible, then a naive code fragment like
the above can bring the system to its knees in short order.

[I.e., I write my drivers as free-standing processes so *each* can
be preempted, giving no "implied preference" to *any*!  If the interface
to the memory card is "indefinitely busy", then the *memory card*
suffers, not any other device.]

Alternatively, if the system has a unified buffer cache (i.e., buffers
for the file system are also shared with the network stack), then it
is possible that the filesystem's rapid consumption of those buffers
(i.e., reading everything off the SSD into kernel memory) is starving
other devices that similarly rely on the availability of those buffers.

[You could possibly test this by reducing the total volume of data
transferred such that you never exhaust *all* of the buffers in
memory -- even if it still takes a long time to complete the write!
I.e., while the write is still in progress (because of slow/flakey
memory card), see if the network remains responsive.  If it does,
then it is a dearth of buffers causing the initial problem.  If it
does NOT, then it is a flakey driver.  For example.]

Do you have any other "slow" peripherals that you could similarly
use as targets?
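
And if it *does* turn out to be buffer exhaustion, one mitigation on the
copy side is to flush and drop the cached pages every so often, so the
copy can never pin an unbounded amount of kernel memory.  A rough sketch
(plain read/write loop; the chunk and flush sizes are arbitrary and the
error handling is minimal):

     #include <fcntl.h>
     #include <stdio.h>
     #include <unistd.h>

     #define FLUSH_BYTES (8 * 1024 * 1024)   /* arbitrary: flush every 8 MB */

     /* copy src to dst, periodically syncing and dropping cached pages so
        dirty data can't pile up without bound */
     int bounded_copy(const char *src, const char *dst)
     {
         char buf[64 * 1024];
         off_t written = 0, since_flush = 0;
         ssize_t n;

         int in  = open(src, O_RDONLY);
         int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
         if (in < 0 || out < 0)
             return -1;

         while ((n = read(in, buf, sizeof buf)) > 0) {
             if (write(out, buf, n) != n)
                 return -1;
             written     += n;
             since_flush += n;
             if (since_flush >= FLUSH_BYTES) {
                 fdatasync(out);                      /* push dirty pages out */
                 posix_fadvise(out, 0, written,
                               POSIX_FADV_DONTNEED);  /* ...and drop them     */
                 posix_fadvise(in, 0, written, POSIX_FADV_DONTNEED);
                 since_flush = 0;
             }
         }
         fdatasync(out);
         close(in);
         close(out);
         return (n < 0) ? -1 : 0;
     }

     int main(int argc, char **argv)
     {
         if (argc != 3) {
             fprintf(stderr, "usage: %s SRC DST\n", argv[0]);
             return 1;
         }
         return bounded_copy(argv[1], argv[2]) ? 1 : 0;
     }

The fdatasync() before each DONTNEED matters: dropping only works on
pages that have already been written back.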

> To make things a little worse, when the file copying was finally done, the
> network didn't recover.  Using Gnome's network manager, I turned off wifi,
> waited a few seconds, then turned it back on.  When it reconnected to the AP
> it once again behaved properly.
>
> Back in the "old days", I can remember poor/spotty system performance when the
> system would be bogged down by really heavy I/O, but that usually meant
> copying large volumes of data from one HDD to another or from one HDD to
> another area on the same HDD.  The data rates were much higher, the disks were
> much slower, and system performance suffered.  In this case, however, the rest
> of the system never skipped a beat.  The disk in question is an SSD, so the
> max possible data rate is much higher and with SATA uses a lot less system
> resources than those older drives.  In this case, the much slower write speed
> of the SD card was the limiting factor.
>
> Fortunately, this isn't something I do frequently, but it is still puzzling
> and I'd like to have some idea of why it happened and if there is anything I can
> do/configure to make it better.  If it helps, I'm not using the normal HDD I/O
> scheduler, "cfq".  Rather, after reading some things on the Net about
> optimizing for SSD systems, I'm using the "deadline" scheduler.  If I recall
> correctly, cfq has a lot of code that worries about optimizing for seek times
> and platter locations, details which have no meaning with an SSD.  Perhaps
> this has had some unforeseen consequences.
>
> Anybody have any ideas?

I assume the same algorithm is being used by the memory stick device
as well (?)

Deadline schedulers effectively impose a strict ordering on operations.
Your task set has no *real* differences in deadlines:  all write
operations have the same artificial deadline imposed, now+something.
It's not as if *this* write has a deadline of now+23 and the next has
a deadline of now+906, etc.  So the time required for the first task
(buffer write) to complete will be "seen" by all subsequent tasks.
I.e., every buffer's deadline will be N time units closer (assuming the
first buffer required N time units to complete).

So, the scheduler will always pick the buffers in the order they were
"created".  I.e., NO buffers will be emptied until the buffer at the
head of the queue is emptied.
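
A toy illustration of that point (nothing to do with the kernel's actual
queue structures):  if every request's deadline is just its arrival time
plus the same constant, sorting by deadline reproduces arrival order
exactly:

     #include <stdio.h>
     #include <stdlib.h>

     struct req { long arrival; long deadline; };   /* a queued buffer write */

     static int by_deadline(const void *a, const void *b)
     {
         const struct req *x = a, *y = b;
         return (x->deadline > y->deadline) - (x->deadline < y->deadline);
     }

     int main(void)
     {
         enum { N = 5, SLACK = 50 };     /* same "now + something" for all  */
         struct req q[N];

         for (int i = 0; i < N; i++) {   /* requests arrive at t = 0..4     */
             q[i].arrival  = i;
             q[i].deadline = i + SLACK;  /* deadline = arrival + constant   */
         }

         qsort(q, N, sizeof q[0], by_deadline);

         for (int i = 0; i < N; i++)     /* prints 0 1 2 3 4 -- plain FIFO  */
             printf("%ld ", q[i].arrival);
         printf("\n");
         return 0;
     }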

[I don't imagine there are provisions for the deadlines of queued tasks
to be altered (unlike a real-time system where deadlines can be dynamic,
deadline handlers can resubmit tasks, etc.).]

So, if a device is reluctant to accept a buffer write (i.e., flakey
or doing some internal wear leveling), EVERY buffer operation behind
it waits.  I.e., there is no way to "free" one of those buffers by
processing it "out of order".

Deadline schedulers are a silly option in non-real-time systems
where there is no concept of *hard* real-time (i.e., a point at
which you abandon a task because its deadline has expired).  They
form bottlenecks that the system can't work around, because the
highest priority task *remains* the highest priority task when it
has trouble completing -- like a hardware problem.  The system
never has a chance to make progress on some *other* task because
it is of a lower "priority".

Try changing the scheduling algorithm on the memory card device
(the SSD's algorithm shouldn't be a part of this problem).
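
For reference, on Linux the scheduler is selectable per block device
through sysfs, so the card can be switched without touching the SSD --
normally you'd just echo the scheduler name into
/sys/block/<device>/queue/scheduler as root.  The same thing as a small
C sketch; "mmcblk0" and "cfq" are only examples, substitute the card's
actual device node and whichever scheduler you want to try:

     #include <stdio.h>

     int main(void)
     {
         /* "mmcblk0" is a placeholder -- use the card's actual block device */
         const char *knob = "/sys/block/mmcblk0/queue/scheduler";
         char line[256];
         FILE *f;

         f = fopen(knob, "r");           /* list schedulers; active one in [] */
         if (f) {
             if (fgets(line, sizeof line, f))
                 printf("current: %s", line);
             fclose(f);
         }

         f = fopen(knob, "w");           /* needs root; affects only this disk */
         if (f) {
             fputs("cfq\n", f);          /* e.g., put the card back on cfq     */
             fclose(f);
         } else {
             perror(knob);
         }
         return 0;
     }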

Have you tried a different memory card as well?



