Opened 3 weeks ago
Last modified 39 hours ago
#19213 new bug
utimensat assigns arbitrary subsecond timestamps
Reported by: | bhaible | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta6 |
Component: | File Systems/BFS | Version: | R1/beta5 |
Keywords: | Cc: | axeld | |
Blocked By: | Blocking: | #14311, #19212, #19214, #19215 | |
Platform: | All |
Description
The utimensat() function, specified by POSIX https://pubs.opengroup.org/onlinepubs/9799919799/functions/utimensat.html, ought to set the access time and modification time of a file to the specified values. The values consist of an entire seconds part (tv_sec) and a subsecond part (tv_nsec).
On Haiku (I'm using a July 2024 snapshot) the subsecond part is ignored, and instead an arbitrary value is used. If this arbitrary value was the same for all files, it would be OK. But no, the arbitrary value are different!
This has the effect that files that should have the same modification time suddenly have different modification times. And thus, when I have a Makefile rule that generates one of the files from the other one, this Makefile rule will trigger, although it shouldn't.
How to reproduce: Save fu.c (attached).
$ gcc -Wall fu.c $ ./a.out $ ls -lrt --full-time dummy* -rw-r--r-- 1 user root 0 2024-10-30 16:48:30.019922944 +0100 dummy0 -rw-r--r-- 1 user root 0 2024-10-30 16:48:30.020185088 +0100 dummy6 -rw-r--r-- 1 user root 0 2024-10-30 16:48:30.020185088 +0100 dummy5 -rw-r--r-- 1 user root 0 2024-10-30 16:48:30.020185088 +0100 dummy4 -rw-r--r-- 1 user root 0 2024-10-30 16:48:30.020185088 +0100 dummy3 -rw-r--r-- 1 user root 0 2024-10-30 16:48:30.020185088 +0100 dummy2 -rw-r--r-- 1 user root 0 2024-10-30 16:48:30.020185088 +0100 dummy1 -rw-r--r-- 1 user root 0 2024-10-30 16:48:30.020447232 +0100 dummy9 -rw-r--r-- 1 user root 0 2024-10-30 16:48:30.020447232 +0100 dummy8 -rw-r--r-- 1 user root 0 2024-10-30 16:48:30.020447232 +0100 dummy7
You can see that dummy0 and dummy9, for example, have different time stamps.
Attachments (2)
Change History (14)
by , 3 weeks ago
comment:1 by , 3 weeks ago
Blocking: | 19212 added |
---|
comment:2 by , 3 weeks ago
Blocking: | 19215 added |
---|
comment:3 by , 3 weeks ago
Blocking: | 19214 added |
---|
comment:4 by , 3 weeks ago
Component: | System/POSIX → File Systems/BFS |
---|---|
Owner: | changed from | to
This is due to logic in BFS that assigns different nsec values when no nsecs are specified. If you specify a nonzero nsec value, then you should wind up with that value set exactly.
comment:5 by , 3 weeks ago
Source code analysis:
utimensat is defined in src/system/libroot/posix/sys/utimes.c. The essential code for the modification time is:
if (times[1].tv_nsec != UTIME_OMIT) { mask |= B_STAT_MODIFICATION_TIME; if (times[1].tv_nsec != UTIME_NOW) { if (times[1].tv_nsec < 0 || times[1].tv_nsec > 999999999) RETURN_AND_SET_ERRNO(EINVAL); } stat.st_mtim = times[1]; } ... status = _kern_write_stat(fd, path, (flag & AT_SYMLINK_NOFOLLOW) == 0, &stat, sizeof(struct stat), mask);
For the Be file system, bfs_write_stat is defined in src/add-ons/kernel/file_systems/bfs/kernel_interface.cpp. There the code for the modification time is:
if (!inode->InLastModifiedIndex()) { // directory modification times are not part of the index node.last_modified_time = HOST_ENDIAN_TO_BFS_INT64(bfs_inode::ToInode(stat->st_mtim)); } else if (!inode->IsDeleted()) { // Index::UpdateLastModified() will set the new time in the inode Index index(volume); index.UpdateLastModified(transaction, inode, bfs_inode::ToInode(stat->st_mtim)); }
bfs_inode::ToInode is defined in src/add-ons/kernel/file_systems/bfs/bfs.h:
static int64 ToInode(bigtime_t time) { return ((time / 1000000) << INODE_TIME_SHIFT) + unique_from_nsec((time % 1000000) * 1000); } static int64 ToInode(const timespec& tv) { return ((int64)tv.tv_sec << INODE_TIME_SHIFT) + unique_from_nsec(tv.tv_nsec); }
and unique_from_nsec is in the same file:
/*! Converts the nano seconds given to the internal 16 bit resolution that BFS uses. If \a time is zero, 12 bits will get a monotonically increasing number. For all other values, only the lower 4 bits are changed this way. This is done to decrease the number of duplicate time values, which speeds up the way BFS handles the time indices. */ inline uint32 unique_from_nsec(uint32 time) { static vint32 number; if (time != 0) return (((time + 16383) >> 14) & INODE_TIME_MASK) | (++number & 0xf); return ++number & 0xfff; }
This means that this file system intentionally prevents a program from creating 10 files with the same time stamp.
It intentionally prevents GNU tar (and unzip etc.) from extracting files with correct time stamps.
This is fundamentally broken.
comment:6 by , 3 weeks ago
A correct way to add information for speed-up is to do so in a field that is not visible through 'ls -l --full-time'. It could be a separate field. Or the file system could store a time stamp with picoseconds, in such a way that the nanoseconds are visible through system call and the sub-nanosecond part (picoseconds modulo 1000) are hidden from the POSIX APIs.
comment:7 by , 3 weeks ago
Cc: | added |
---|
There appears to be some logic in here to try and avoid these randomization values showing back up in userland, but I suppose it only works when the time specified is nonzero in the first place.
We can't change the BFS on-disk format at this point, only what we store in it and how we process that information. So storing picoseconds won't be possible.
axeld, it looks like you introduced this code. Any ideas of how we might solve this?
comment:8 by , 3 weeks ago
This started with a file 'testdir2.tar.gz', that contains a file named aclocal.m4 and a file named Makefile.in. In the tarball they have the same time stamp. When extracted on Haiku (Be FS), the file aclocal.m4 is newer than Makefile.in. The build then tries to rebuild Makefile.in, by invoking automake-1.17. This should not happen. This Makefile rule is not meant to be triggered if the tarball was simply extracted and no file was modified.
comment:9 by , 3 weeks ago
Any ideas of how we might solve this?
Change the stat, fstat, lstat etc. system calls so that the lowest 16 bits of the st_crtim, st_mtim, st_ctim fields are cleared. These are the only bits that are affected by the unique_from_nsec part of
static int64 ToInode(const timespec& tv) { return ((int64)tv.tv_sec << INODE_TIME_SHIFT) + unique_from_nsec(tv.tv_nsec); }
comment:10 by , 3 weeks ago
The lowest 16 bits are where the nsec/usec are stored, if we clear them then they'll always be 0. This isn't what we want.
It appears the low 4 bits are the part randomized when time is nonzero. Maybe we can get away with only randomizing the lowest 4 bits all the time?
comment:11 by , 3 weeks ago
Are you willing to change the definition of the unique_from_nsec inline function?
If no, then clearing the lowest 16 bits is the only solution. This reduces the file system time stamp resolution to 1 second. Which is what the systems had in the 1980ies and 1990ies. It's not a high resolution, but it is POSIX compliant.
If yes, then you can decide how many of the bits represent actual time, and how many are randomization "for speeding up the time indices". For example 10 bits for time, and 6 bits of randomization. Clear the lowest 6 bits in the stat[at]() system calls. This would give a file system time stamp resolution to 1 millisecond. What is exactly the performance problem with the time indices, and how many bits are needed to mitigate it?
comment:12 by , 39 hours ago
Blocking: | 14311 added |
---|---|
Milestone: | Unscheduled → R1/beta6 |
We will probably need to change the definition of the function, yes.
test case fu.c