Opened 5 years ago

Last modified 10 months ago

#11117 new bug

FAT: issues with lowercase 8.3 filenames

Reported by: MatejHorvat Owned by: nobody
Priority: normal Milestone: R1
Component: File Systems/FAT Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

  1. Using an NT-based Windows such as Windows XP (this is crucial - see below), create a file with an entirely lowercase but 8.3 filename, such as "example.txt".
  2. Mount the FAT/FAT32 partition in Haiku for reading and writing.
  3. Open the file, modify it, save it, and close it.
  4. Try to open it again. You won't be able to do it. Additionally, if you used Pe to save it, it will complain with an error. This is a sign that something is wrong.
  5. Unmount the partition and mount it again.
  6. You will see that the file's name has been changed to uppercase (as of hrev47617). I have also seen files get corrupted (on alpha 4.1 at least), but I cannot reliably reproduce that.

My guess is that when recreating the file's directory entry, Haiku ignores the two flags that NT-based Windows uses to indicate lowercase 8.3 filenames. See: https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system#VFAT_long_file_names ("If a filename contains only lowercase letters...")

Attachments (2)

0001-fat-correctly-read-lowercase-8.3-filenames.patch (3.7 KB) - added by MatejHorvat 5 years ago.
MixedCase.zip (2.4 KB) - added by MatejHorvat 4 years ago.
Disk image that exhibits the problem

Download all attachments as: .zip

Change History (15)

comment:1 Changed 5 years ago by diver

Keywords: FAT FAT32 lowercase removed
Milestone: R1/alpha5R1

comment:2 Changed 5 years ago by MatejHorvat

Has a Patch: set

comment:3 Changed 5 years ago by MatejHorvat

Here's a patch that at least makes it read (and write?) such filenames correctly. However, Pe still seems to trigger an edge case that makes it corrupt files.

comment:4 Changed 5 years ago by korli

toLower is of bool type: why don't you replace "& 0x18" with "& 0x8"?

comment:5 Changed 5 years ago by MatejHorvat

I don't understand what you mean. I changed it to a uint8 because bits 3 and 4 of the byte at offset 0xC of a directory entry are flags that specify whether the filename and/or its extension are lowercase (you can have a filename that is lowercase but has an uppercase extension, or the opposite). The msdos_to_utf8 function must therefore receive both flags to correctly convert the name.

comment:6 Changed 5 years ago by korli

Sorry, I misread the type, combined with the code formatting.

comment:7 Changed 5 years ago by pulkomandy

The variable name "toLower" used for an uint8 is confusing (we would expect a bool with that name). I would either pass two booleans to the function (lowerBase and lowerExtension for example) or find a better name for the variable (caseFlags maybe?)

A comment explaining what's going on and how the flags are used may be helpful, as there are a lot of magic numbers here (0xC, 0x10, 0x8, ...). These should be constants.

comment:8 Changed 4 years ago by MatejHorvat

While "toLower" could be changed, eliminating things like "buffer[0xC]" would require changes through the whole program to use structs (packed, and with endian conversion if required by the architecture) for directory entries. I could do that (and I think it should be done), but it would break my patch for #11120, so I humbly suggest merging the changes first and then cleaning everything up.

comment:9 Changed 4 years ago by pulkomandy

I suggested to replace the magic number by constants. So instead of writing buffer[0xc] you would write:

static const kWhateverOffset = 0xc;

Then later in the code:

buffer[kWhateverOffset] = x;

This makes it clear what the byte is used for, making the code easier to read. No need to convert to structures (which would not necesarily be a good idea as it could have endianness problems).

Changed 4 years ago by MatejHorvat

Attachment: MixedCase.zip added

Disk image that exhibits the problem

comment:10 Changed 4 years ago by MatejHorvat

Would const int (vs. #define) really be appropriate here? This is a mixture of C and C++ code.

Anyway, I have not had much progress with solving the problem, but here's a disk image that can be used to reproduce it (use "diskimage register", then mount it).

If you edit the files, unmount, and mount again, data will almost certainly be lost and the filenames of lower.UPP and UPPER.low will be converted to all capitals (Mixed.cas will remain as it is because it uses LFN entries).

If you use my patch, you will see the correct filenames before editing, but after editing, they will be uppercased regardless.

comment:11 Changed 3 years ago by waddlesplash

Yes, const uint8/uint16 as per style guide.

Have you made any more progress?

comment:12 Changed 10 months ago by pulkomandy

comment:13 Changed 10 months ago by pulkomandy

Has a Patch: unset
Note: See TracTickets for help on using tickets.