Opened 2 years ago

Closed 20 months ago

Last modified 20 months ago

#17651 closed bug (fixed)

Super "fast" copy BeFS to BeFS files

Reported by: Windes
Owned by: nobody
Priority: normal
Milestone: R1/beta4
Component: Applications/Tracker
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

When copying, for example, 2 GB of files from one partition to another, Tracker shows the copy as done in about 2 seconds! Wow. But afterwards the USB LED keeps blinking for about 2 minutes while the copy is actually still being performed. This leaves the user shocked and confused. Please keep the progress bar window open until the copy is really done.

Change History (12)

comment:1 by waddlesplash, 2 years ago

Component: File Systems/BFS → Applications/Tracker
Owner: changed from axeld to nobody
Platform: x86-64 → All

comment:2 by waddlesplash, 20 months ago

Milestone: Unscheduled → R1/beta4
Resolution: fixed
Status: new → closed

Should be fixed by hrev56429, which now sync()s during the copy loop.
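For illustration, a minimal sketch (not the actual hrev56429 change) of what a per-file copy that sync()s in the copy loop could look like with the Storage Kit; the function name and buffer size here are made up:

```cpp
// Sketch only: copy one file and force it to disk before reporting it done.
#include <File.h>

#include <vector>

status_t
CopyFileSynced(const char* sourcePath, const char* destPath)
{
	BFile source(sourcePath, B_READ_ONLY);
	BFile dest(destPath, B_WRITE_ONLY | B_CREATE_FILE | B_ERASE_FILE);
	if (source.InitCheck() != B_OK)
		return source.InitCheck();
	if (dest.InitCheck() != B_OK)
		return dest.InitCheck();

	std::vector<char> buffer(1024 * 1024);
	ssize_t bytesRead;
	while ((bytesRead = source.Read(buffer.data(), buffer.size())) > 0) {
		if (dest.Write(buffer.data(), bytesRead) != bytesRead)
			return B_IO_ERROR;
	}
	if (bytesRead < 0)
		return (status_t)bytesRead;

	// Block until the data actually reaches the disk (BNode::Sync() goes
	// through _kern_fsync, as noted later in this ticket).
	return dest.Sync();
}
```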

comment:3 by axeld, 20 months ago

Do we behave any differently from other operating systems here? AFAICT it's perfectly common to do it like this. Maybe limit the copy buffer for removable media directly in the kernel instead, to make the difference smaller? I think it's perfectly okay that differences exist here, and I don't think calling Sync() is such a good idea, either.

comment:4 by waddlesplash, 20 months ago

Why would we limit the file cache buffers in the kernel? That would just cripple operation of Haiku on removable media, where speed depends heavily on the caches when writing to the slower external storage.

comment:5 by pulkomandy, 20 months ago

Why not just call Sync once at the end of the copy, just before closing the copy window? We'd lose some accuracy in the progress bar reporting, but that's probably fine. I don't see why it needs to be called between each file.

The problem currently is that the copy appears "done" but in fact nothing was written to disk at all.

Windows actually has an option for how it manages file operations on removable media: it will either use its normal caching (in which case you have to "eject"/unmount before disconnecting your flash drive), or use a mode where all operations are synchronous, so that when the software says the operation is done, everything is written to disk.

On BeOS, a lot of caching was used; on floppy disks, for example, nothing would be written to disk until unmounting. The case of floppies is of course a bit extreme, but this made a huge difference and made the system a lot more pleasant to use (and saved a lot of writes to floppies if you were moving files in and out).

Why would we limit the file cache buffers in the kernel? That would just cripple operation of Haiku on removable media, where speed depends heavily on the caches when writing to the slower external storage.

Calling sync() between each operation is not really different. It's just doing the same in an even less efficient way.

comment:6 by bipolar, 20 months ago

Just my two cents as an end-user that gets bitten by this regularly on most OSes. A simple use-case...

Copying haiku-r1beta3-x86_64-anyboot.iso (718.81 MiB), from an NTFS partition on HDD, to a BFS partition on a slow USB 2.0 Pendrive:


1- Haiku hrev56417 (pre "Tracker-Sync"):

Using Tracker to copy the file the first time, the progress dialog reported speeds from 33 MiB/s at the start down to a more realistic 3 MiB/s later on... until near, say, 75%, when it just rushed to 100%. This took around 3 minutes.

Running sync right after that from a Terminal took a couple of extra minutes according to my watch.

Removed the file from the pendrive, copied it again via Tracker. This time the progress dialog progressed far faster (due to the read cache, I assume), and running time sync after that dialog disappeared... took 6 minutes and 41 seconds.

2- Haiku hrev56441 (post "Tracker-Sync"):

Repeated the same copy operation (removed the iso before updating/rebooting):

The progress dialog was more in line with the actual writing speed. time sync returned immediately.

Repeating the operation was a bit faster (due to the read cache, I assume).


For comparison:

1 - Linux behaves like in the first case (far worse, even), unless you set vm.dirty_bytes and vm.dirty_background_bytes to smallish fixed values (instead of relying on their *_ratio counterparts, which use a percentage of the installed RAM size).

Using 16 and 48 MiB for those makes this particular use case behave as expected (copy dialogs more closely reflect what is actually written).

With those settings, I haven't noticed side effects, neither on my slower hardware (Atom CPU, 2 GB RAM, slow HDD) nor on my "faster" one (Phenom II, 8 GB RAM, slow SSD). Can't comment on how that behaves on faster systems.

2 - Same operation on Windows with default settings (using NTFS on the same slow pendrive)... Progress reports fast speeds until a certain point, where it slows to the actual write speed of the pendrive. Not ideal, yet better than default Linux. Being able to successfully eject the USB drive afterwards... a lottery.


From the perspective of a simple end-user like me, having the progress bar show the actual write speed, and a decent approximation of what has actually been written to the hardware, is what is expected.

I can't, of course, comment on the merits of the technical solution in hrev56429, but having the progress dialogs "lie" about reaching 100%, or having to wait 5+ minutes for unmount (or a manual call to sync) to complete, is... not ideal.

comment:7 by nephele, 20 months ago

Maybe I am missing something here, but both cases of calling sync seem unsatisfactory to me.

The per-file sync is slightly more accurate in reporting, but it makes the case of waiting on a slow medium worse, since a lot of time is now spent waiting on writes to the second medium while the first medium could still be read to fill the cache. (It is also bad if you, say, copy something off a drive: if I have 32 GB of RAM I expect the OS to use it and read a 1 GB file off the drive immediately, as bandwidth permits, and then write it to the slower medium again as bandwidth permits.)

The second solution is a bit nicer but makes the progress bar completely inaccurate.

I wonder if there is a way for the OS to report how much has actually been written (and read) and use that for a "2-part" progress bar (upper half read, lower half written), or one based only on write progress.

comment:8 by korli, 20 months ago (in reply to comment:7)

Replying to nephele:

I wonder if there is no way for the OS to report the actual progress of how much is actually written (and read) and use that for a "2-part" progressbar (upper half read, lower half written) or one based only on written progress.

Maybe using fdatasync() asynchronously after each write. https://pubs.opengroup.org/onlinepubs/9699919799/functions/fdatasync.html
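A rough POSIX-level sketch of this idea (hypothetical helper, not existing Tracker code); whether the call is made from the copy loop or handed off to a worker thread is the "asynchronously" part left open here:

```cpp
// Sketch only: flush a destination file's data with fdatasync(), which,
// unlike fsync(), may skip metadata-only updates (see the POSIX spec above).
#include <fcntl.h>
#include <unistd.h>

int
FlushFileData(const char* destPath)
{
	int fd = open(destPath, O_WRONLY);
	if (fd < 0)
		return -1;

	// Blocks until the file's data is on stable storage.
	int result = fdatasync(fd);

	close(fd);
	return result;
}
```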

Last edited 20 months ago by korli

comment:9 by waddlesplash, 20 months ago

There was some discussion related to this on the mailing list: https://www.freelists.org/post/haiku-commits/haiku-hrev56429-srckitstracker,7

Maybe using fdatasync() asynchronously after each write.

Well, actually we should resize the file at the beginning of the copy so that it is not continually resized by the underlying filesystem, leading to fragmentation or slow allocation. That should also provide a performance gain.
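A minimal sketch of that pre-sizing idea (hypothetical helper; assumes the destination is a BFile, which exposes SetSize()):

```cpp
// Sketch only: reserve the destination's final size up front so later
// writes fill already-allocated blocks instead of growing the file
// on every Write().
#include <File.h>

status_t
PrepareDestination(BFile& dest, off_t sourceSize)
{
	return dest.SetSize(sourceSize);
}
```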

Calling sync() between each operation is not really different. It's just doing the same in an even less efficient way.

We only call sync() in this one place (copies in Tracker), we don't call it for general I/O, rsyncs, or any of the other hundreds to thousands of things that use the system cache. Limiting the buffers would have a huge impact on performance that isn't necessarily warranted. Further, this affects writes to standard HDDs, not just external ones, too.

I agree with korli's comments on the mailing list that doing the sync() somewhat asynchronously might be nice, but we do not really have a good facility to do that at present. Read performance is almost always much greater than write performance, so I expect the major difference in speed here will be from waiting for things to be written out to the disk.

If we really wanted to, we could experiment with sync'ing just once at the end; however, based on the ticket's original comments, I suspect that would mean an apparent "hang" of Tracker, which is probably much worse.

comment:10 by pulkomandy, 20 months ago

We only call sync() in this one place (copies in Tracker), we don't call it for general I/O, rsyncs, or any of the other hundreds to thousands of things that use the system cache. Limiting the buffers would have a huge impact on performance that isn't necessarily warranted. Further, this affects writes to standard HDDs, not just external ones, too.

Yes, the question is more on how to do it efficiently.

We use BNode::Sync(), which calls _kern_fsync. It's important to be clear about what we use here: this syncs only a single file to disk, unlike sync(), which syncs everything.

I assume this is synchronous and blocks the caller until the sync is done. While the caller is blocked, we are not starting the read for the next file. As a result, this creates a serialization that is not needed at all, as finishing the write of one file is preventing the read of the next one.

So this defeats the purpose of the file cache, which is to allow these operations to run in parallel on the disk side. Yes, it is only for Tracker copy operations, and we should probably keep it that way for now, even if more aggressive flushing on removable media overall would be a good idea. And also on non-removable media, since no one really wants hundreds of megabytes of data they thought they had saved to in fact still be pending in the cache, and lost in a power cut. Anyway, that's a story for another ticket.

So, if we'd like to keep the parallelism of operations, we don't want the Sync to block the thread that would read the next file. There are many ways to solve this:

  • Don't call Sync at all. As we know, this leads to the confusing behavior mentioned in this ticket
  • Do all the Sync calls at the end. We would first do all the copies, and then go over and sync all the files. This means keeping a lot of file descriptors open until the end of the copy. We can still do a reasonably accurate progress bar: it would progress to 50% after doing all the copies, and then to 100% after doing all the Sync, for example
  • Make fsync non-blocking or do something like io_uring. A lot of new stuff to add to the kernel and a lot of quite complex userspace code to write too. No one is willing to do that work and it would be needlessly complicated for just a file copy operation.

I think a possible simpler solution is to use two threads for the copy: one thread does the copy (without sync), then sends the file descriptors to another thread that just calls Sync() on each destination file and then closes it. This way the sync does not block the reading of the next file to copy. The progress report can be based on when the files are Sync'ed and closed by the second thread. In this case I think we still get a copy going as fast as it can (since the Sync does not prevent starting more reads), the copy progress is tracked file by file, and the files are actually written to disk once the work is done.

That would not need too much work: there is no need to introduce new system calls or completely redesign the copy operations; we can keep using high-level BNode/BFile code. And it would result in more efficient use of available resources (scheduling a read while waiting for a write to complete, to be sure we maximize disk usage and allow the I/O scheduler to run these operations in the most efficient order, even if right now our I/O scheduler is maybe overly simplified).
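A rough sketch of what the two-thread split described above could look like (hypothetical names, standard C++ threading plus the Storage Kit; not an actual patch):

```cpp
// Sketch only: the copy thread pushes each finished destination BFile onto
// this queue; the sync thread Sync()s and closes them and drives progress.
#include <File.h>

#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>

struct SyncQueue {
	std::mutex lock;
	std::condition_variable wakeup;
	std::queue<std::unique_ptr<BFile>> files;
	bool copyFinished = false;
};

void
SyncThreadLoop(SyncQueue& queue)
{
	for (;;) {
		std::unique_ptr<BFile> file;
		{
			std::unique_lock<std::mutex> locker(queue.lock);
			queue.wakeup.wait(locker, [&] {
				return !queue.files.empty() || queue.copyFinished;
			});
			if (queue.files.empty())
				return;
			file = std::move(queue.files.front());
			queue.files.pop();
		}
		// The blocking Sync() happens here, so the copy thread can already
		// read and write the next file through the file cache.
		file->Sync();
		// Update progress ("this file is actually on disk") here; the BFile
		// is closed when the unique_ptr goes out of scope.
	}
}
```

The copy thread would push each written destination file onto the queue and call wakeup.notify_one(), then set copyFinished and notify once more after handing over the last file.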

comment:11 by waddlesplash, 20 months ago

We can still do a reasonably accurate progress bar: it would progress to 50% after doing all the copies, and then to 100% after doing all the Sync, for example

But then on most systems the progress bar will just sit stuck at 50% for a long time and then jump to 100% instantly, or it will jump from 50 to 100 without any delay. Neither is a good idea.

The only different way to solve this problem that seems acceptable to me would be doing write+sync in one stage in one thread, with reads in a separate thread. But in my experience, again, reads are so much faster than writes that the architectural modifications to support this are likely not worth making.

comment:12 by pulkomandy, 20 months ago

But then on most systems the progress bar will just sit stuck at 50% for a long time and then jump to 100% instantly, or it will jump from 50 to 100 without any delay. Neither is a good idea.

No, that isn't what would happen in that situation.

In the second part of the copy you would do a lot of file.Sync() on all the files. If anything I expect this second part to be slower than the first, not faster. But you can keep track of how many files you have already synced, and if you also keep track of their sizes, you can probably have more reliable progress reporting in that part than in the first part of the copy, which doesn't yet know the file sizes and computes progress only in terms of the number of files copied.

The only different way to solve this problem that seems acceptable to me would be doing write+sync in one stage in one thread, with reads in a separate thread.

No, doing it that way would result in complex code, because then you need to send the data from one thread to the other. Which means the threads will be waiting for each other in some way, and you are essentially re-inventing the file cache in between them.

You want read+write in one thread, and just fsync in the other. This way the only thing you need to communicate between the two threads is "I finished writing this file, please sync and close it".

But in my experience, again, reads are so much faster than writes that the architectural modifications to support this are likely not worth making.

Yes, reads on SSDs are faster. That's why we want a thread to do read+write: the read operation will be fast and the write operation will also be fast, because it's just reading and writing to the file and/or block cache. And then we want another thread doing the sync and actually writing the data to disk.

In the extreme case, this means the read/write thread is CPU bound since it only deals with the cache and the very fast read operations, and the syncing thread is IO bound since it is the one monitoring actual writes to disk.

Also this keeps the reads and writes in the same thread, which means we don't have a lot of "in flight" data to worry about. If you separate the read and write into different threads, what you end up doing is moving the data caching into userspace. Which is not what we want, we just want to make the best possible use of the already available kernel-side caching.

Another way to see this is that we have one thread scheduling read and write operations as fast as it can, and another waiting for the completion of these operations using Sync and tracking the progress. This is not fully asynchronous io, but it is reasonably close and sufficient for this use case, and at a much lower cost in terms of complexity.
