Opened 11 years ago
Closed 2 years ago
#10336 closed bug (fixed)
TRIM / fstrim can destroy data on SSD's when executed
Reported by: | kallisti5 | Owned by: | axeld |
---|---|---|---|
Priority: | high | Milestone: | R1/beta4 |
Component: | Drivers/Disk | Version: | R1/Development |
Keywords: | TRIM fstrim | Cc: | |
Blocked By: | Blocking: | ||
Platform: | All |
Description
fstrim fails to function on OCZ Agility 3. May be due to the ranges being too large for the SSD / AHCI implementation.
Attachments (4)
Change History (75)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → in-progress |
AHCI dumps some info about what the disk supports upon boot. Would be nice to have this info included here. In any case, the range limit is not yet correctly implemented; I'm very slowly working on that, I just haven't found much development time lately, and for this I do need some contiguous time span.
comment:3 by , 11 years ago
Yup. Wasn't poking you too much, just wanted to get the current issues on paper :-)
KERN: ahci: generic AHCI controller found! vendor 0x1022, device 0x7804 KERN: ahci: ahci_register_device KERN: ahci: ahci_init_driver KERN: ahci: ahci_sim_init_bus KERN: ahci: ahci_sim_init_bus: pciDevice 0x82b6b360 KERN: ahci: AHCIController::Init 0:17:0 vendor 1022, device 7804 KERN: ahci: PCI SATA capability found at offset 0x70 KERN: ahci: satacr0 = 0x00100012, satacr1 = 0x0000000f KERN: ahci: pcicmd old 0x0007 KERN: ahci: pcicmd new 0x0006 KERN: allocate_io_interrupt_vectors: allocated 1 vectors starting from 24 KERN: msi_allocate_vectors: allocated 1 vectors starting from 24 KERN: msi enabled: 0x0089 KERN: ahci: using MSI vector 24 KERN: ahci: registers at 0xf034e000, size 0x800 KERN: ahci: mapping physical address 0xf034e000 with 2048 bytes for AHCI HBA regs KERN: add_memory_type_range(672, 0xf034e000, 0x1000, 0) KERN: ahci: physical = 0xf034e000, virtual = 0x81bfc000, offset = 0, phyadr = 0xf034e000, mapadr = 0x81bfc000, size = 4096, area = 0x000002a0 KERN: ahci: cap: Interface Speed Support: generation 3 KERN: ahci: cap: Number of Command Slots: 32 (raw 0x1f) KERN: ahci: cap: Number of Ports: 2 (raw 0x1) KERN: ahci: cap: Supports Port Multiplier: yes KERN: ahci: cap: Supports External SATA: no KERN: ahci: cap: Enclosure Management Supported: no KERN: ahci: cap: Supports Command List Override: yes KERN: ahci: cap: Supports Staggered Spin-up: no KERN: ahci: cap: Supports Mechanical Presence Switch: yes KERN: ahci: cap: Supports 64-bit Addressing: yes KERN: ahci: cap: Supports Native Command Queuing: yes KERN: ahci: cap: Supports SNotification Register: yes KERN: ahci: cap: Supports Command List Override: yes KERN: ahci: cap: Supports AHCI mode only: no KERN: ahci: ghc: AHCI Enable: yes KERN: ahci: Ports Implemented Mask: 0x000003 KERN: ahci: Number of Available Ports: 2 KERN: ahci: AHCI Version 1.0 KERN: ahci: Interrupt 24 KERN: ahci: AHCIPort::Init1 port 0 KERN: ahci: allocating 4096 bytes for AHCI port 0 KERN: ahci: area = 673, size = 4096, virt = 
0x81bfd000, phy = 0xa0d4000 KERN: ahci: PRD table is at 0x81bfd580 KERN: ahci: AHCIPort::Init1 port 1 KERN: ahci: allocating 4096 bytes for AHCI port 1 KERN: ahci: area = 674, size = 4096, virt = 0x81bfe000, phy = 0xa0d3000 KERN: ahci: PRD table is at 0x81bfe580 KERN: ahci: AHCIPort::Init2 port 0 KERN: ahci: AHCIPort::ResetPort port 0 KERN: ahci: AHCIPort::ResetPort port 0, deviceBusy 0, forceDeviceReset 1 KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00400000, ci 0x00000000 KERN: ahci: ssts 0x00000001 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04090000 KERN: ahci: sact 0x00000000 rt port 0, deviceBusy 0, forceDeviceReset 1 KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00400000, ci 0x00000000 KERN: ahci: ssts 0x00000001 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04090000 KERN: ahci: sact 0x00000000 KERN: ahci: PhyReady Change KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000000 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04080000 KERN: ahci: sact 0x00000000 KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000000 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04080000 KERN: ahci: sact 0x00000000 KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000000 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04080000 KERN: ahci: sact 0x00000000 KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000000 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04080000 KERN: ahci: sact 0x00000000 KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::InterruptErrorHandler port 0, 
fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000133 KERN: ahci: sctl 0x00000300 KERN: ahci: serr 0x040d0000 KERN: ahci: sact 0x00000000 KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::PostReset port 0 KERN: ahci: device signature 0x00000101 (ATA) KERN: ahci: ie 0x7dc0007f KERN: ahci: is 0x00000000 KERN: ahci: cmd 0x0000e017 KERN: ahci: ssts 0x000KERN: 00133 KERN: ahci: sctl 0x00000300 KERN: ahci: serr 0x00000000 KERN: ahci: sact 0x00000000 KERN: ahci: tfd 0x00000150 KERN: ahci: AHCIPort::Init2 port 1 KERN: ahci: AHCIPort::ResetPort port 1 KERN: ahci: AHCIPort::ResetPort port 1, deviceBusy 0, forceDeviceReset 1 KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00400000, ci 0x00000000 KERN: ahci: ssts 0x00000000 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04090000 KERN: ahci: sact 0x00000000 KERN: ahci: PhyReady Change KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000000 ssts 0x00000000 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04090000 KERN: ahci: sact 0x00000000 KERN: ahci: PhyReady Change KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000000 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04080000 KERN: ahci: sact 0x00000000 KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000000 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04080000 KERN: ahci: sact 0x00000000 KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000000 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04080000 KERN: ahci: sact 0x00000000 KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 
0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000001 KERN: ahci: sctl 0x00000301 KERN: ahci: serr 0x04080000 KERN: ahci: sact 0x00000000 KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000001 KERN: ahci: sctl 0x00000300 KERN: ahci: serr 0x040c0000 KERN: ahci: sact 0x00000000 KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000 KERN: ahci: ssts 0x00000001 KERN: ahci: sctl 0x00000300 KERN: ahci: serr 0x040c0000 KERN: ahci: sact 0x00000000 KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00400040, ci 0x00000000 KERN: ahci: ssts 0x00000113 KERN: ahci: sctl 0x00000300 KERN: ahci: serr 0x040d0000 KERN: ahci: sact 0x00000000 KERN: ahci: PhyReady Change KERN: ahci: Port Connect Change KERN: ahci: AHCIPort::PostReset port 1 KERN: ahci: device signature 0xeb140101 (ATAPI) KERN: ahci: ie 0x7dc0007f KERN: ahci: is 0x00000000 KERN: ahci: cmd 0x0100e017 KERN: ahci: ssts 0x00000113 KERN: ahci: sctl 0x00000300 KERN: ahci: serr 0x00000000 KERN: ahci: sact 0x00000000 KERN: ahci: tfd 0x00000100 KERN: ahci: cookie = 0x8280b900 KERN: ahci: ahci_path_inquiry, cookie 0x8280b900 Last message repeated 1 time KERN: ahci: ahci_scan_bus, cookie 0x8280b900 KERN: ahci: AHCIPort::ScsiTestUnitReady port 0 KERN: ahci: AHCIPort::ScsiInquiry port 0 KERN: ahci: lba 1, lba48 1, fUse48BitCommands 1, sectors 117231408, sectors48 117231408, size 60022480896 KERN: ahci: trim supported, 1 ranges blocks, reads are deterministic, random. 
KERN: ahci: model number: OCZ-AGILITY3 KERN: ahci: serial number: OCZ-X78XWFG4D28DS609 KERN: ahci: firmware rev.: 2.15 KERN: ahci: trim support: yes KERN: ahci: sg_memcpy phyAddr 0x253442c, size 96 KERN: ahci: ahci_get_restrictions, cookie 0x8280b900 KERN: ahci: AHCIPort::ScsiGetRestrictions port 0: isATAPI 0, noAutoSense 0, maxBlocks 65536 KERN: publish device: node 0x82b59d00, path disk/scsi/0/0/0/raw, module drivers/disk/scsi/scsi_disk/device_v1 KERN: ahci: ahci_get_restrictions, cookie 0x8280b900 KERN: ahci: AHCIPort::ScsiGetRestrictions port 1: isATAPI 1, noAutoSense 1, maxBlocks 256 KERN: publish device: node 0x82b59c60, path disk/scsi/0/1/0/raw, module drivers/disk/scsi/scsi_cd/device_v1 KERN: ata 0: controller doesn't support DMA, disabling KERN: ata 0: _DevicePresent: device selection failed for device 0 KERN: ata 0: _DevicePresent: device 1, presence 0 KERN: ata 0: deviceMask 0 KERN: ata 0: ignoring device 0 KERN: ata 0: ignoring device 1 KERN: ata 0 error: target device not present Last message repeated 1 time KERN: ata 0 error: invalid target device KERN: Last message repeated 12 times. KERN: ata 1: controller doesn't support DMA, disabling KERN: ata 1: _DevicePresent: device selection failed for device 0 KERN: ata 1: _DevicePresent: device 1, presence 0 KERN: ata 1: deviceMask 0 KERN: ata 1: ignoring device 0 KERN: ata 1: ignoring device 1 KERN: ata 1 error: target device not present Last message repeated 1 time KERN: ata 1 error: invalid target device KERN: Last message repeated 12 times. KERN: KDiskDeviceManager::_Scan(/dev/disk/scsi) i: tfd 0x00000100 KERN: ahci: cookie = 0x8280b900 KERN: ahci: ahci_path_inquiry, cookie 0x8280b900
comment:4 by , 11 years ago
it works as of hrev46819!
And by works I mean it erases everything on my SSD. :-\ Screenshot attached
by , 11 years ago
Attachment: | IMG_20140204_232334.jpg added |
---|
as of hrev46819 shortly before everything locks up and bursts into flames.
comment:5 by , 11 years ago
I can't concur here: on my Corsair the results seem to remain as before, which is to say the partition is still usable/readable, but there's no obvious indicator as to whether the trim operation actually succeeded. Steps:
- unmount partition
- dd if=/dev/zero
- mkbfs
- mount
- verify blocks are still generally zeroed apart from basic filesystem structures via DiskProbe
- fstrim
- check blocks again - still zeroed. According to the drive specs, trimmed blocks should return deterministic random data on read, which should rule out seeing straight zero blocks, but that's not currently the case.
comment:6 by , 11 years ago
Milestone: | R1 → R1/alpha5 |
---|---|
Priority: | normal → blocker |
we likely should fix this pre-alpha5 or remove fstrim from the alpha5 branch images.
Don't want users running it unknowingly and potentially erasing their data.
comment:7 by , 11 years ago
Just tried it on my OCZ Agility 3 again (hrev46931): booted Haiku from a USB stick and ran fstrim on the mounted BeFS / Haiku SSD filesystem.
Got the attached syslog output before the system froze up.
Interesting...
KERN: [30] 65535 : 19708
A bunch of 65535's... that sounds like some kind of overflow.
comment:8 by , 11 years ago
I was obviously not really awake when I added that debug output. Not only did I write "3%" instead of "%3", I also mixed up block offset and block length in the AHCI driver. 65535 is just the maximum number of blocks that the ATA spec allows there.
Anyway, those numbers look just right. However, they are far from complete. How did you receive those lines?
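The 65535 limit comes from the 16-bit length field of each trim range entry. A minimal sketch, in Python rather than the driver's own language, of how one long run of free blocks could be cut into ranges that fit that field. The function name `split_range` is hypothetical and not taken from the Haiku source:

```python
MAX_RANGE_BLOCKS = 0xFFFF  # 16-bit length field: at most 65535 blocks per range


def split_range(start_lba, block_count, max_blocks=MAX_RANGE_BLOCKS):
    """Split one run of blocks into (lba, count) pairs of at most max_blocks each."""
    ranges = []
    while block_count > 0:
        count = min(block_count, max_blocks)
        ranges.append((start_lba, count))
        start_lba += count
        block_count -= count
    return ranges
```

With this approach a large free area shows up as many consecutive entries of length 65535 followed by one shorter remainder, which would be consistent with the repeated 65535 values in the debug output.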
comment:9 by , 10 years ago
Milestone: | R1/alpha5 → R1/beta1 |
---|
Pushing to R1B1.
We need to include a blurb in the R1A5 release notes that trim functionality is experimental and may eat your data.
comment:10 by , 10 years ago
Milestone: | R1/beta1 → R1/alpha5 |
---|
Since this still requires action for R1a5, I changed the milestone back, so that it is not forgotten.
It might make more sense to remove the fstrim command unless it gets more testing before the release.
comment:11 by , 10 years ago
FWIW I managed to duplicate kallisti5's issue using VirtualBox. Since version 4.2 it can use trimming to shrink dynamically sized VDIs; one just has to enable it manually using:
$ vboxmanage storageattach Haiku --storagectl "SATA Controller" --port 0 --discard on
Where "Haiku" needs to be replaced by your VM name, and you may also need to specify a different port.
The result after an fstrim command is an unbootable system, hooray! I'll look into it over the next few months ;-)
comment:12 by , 10 years ago
Milestone: | R1/alpha5 → R1/beta1 |
---|
comment:13 by , 9 years ago
Keywords: | TRIM fstrim added |
---|---|
Summary: | TRIM fails on OCZ Agility 3 → TRIM / fstrim can destroy data on SSD's when executed |
comment:14 by , 9 years ago
I've looked into this, but after having compiled my own version of VirtualBox that adds debugging output to the trimming, I can't reproduce it anymore at all -- it works like a charm over here.
Does anyone else feel like trying again on real hardware? ;-)
comment:15 by , 9 years ago
Last time I tried, it didn't seem to destroy any data for me, but it was very slow with the disk spending a lot of time handling each trim command sent, and ultimately replying with failure. SSDs of small capacity should be cheap and easy to come by these days, I would suggest dedicating one to testing purposes?
comment:16 by , 9 years ago
It should be slow ATM, as we use the synchronous version of the trim command (the queueable one didn't exist back then, but we don't support command queuing anyway). But since you currently have to manually issue the command, that shouldn't be much of an issue. Replying with failure is more of a problem, though.
Anyway, I don't think a single SSD will do the trick. We should test with a number of different ones to be sure it works as it should.
comment:17 by , 9 years ago
I tested this again on my machine (Intel SSDSC2CT180A4). Soon after running fstrim on a test partition which resides on the same drive as my boot one, I hit a KDL in get_next_team_info. After a reboot using the reboot command, my SSD wasn't visible in the BIOS, nor in DriveSetup after I booted from a USB disk.
I did a cold reboot, and after that the SSD is back and no data was erased, at least.
A possible interpretation is that the commands we sent to the SSD managed to somehow confuse the firmware enough that it didn't reply to anything after that.
comment:18 by , 8 years ago
fstrim still "seems" to work on VirtualBox as of hrev50590 / x86_64. (I even got zesty and did a while true; do fstrim /boot; sleep 1; done)
I wonder if the AHCI work had any impact on real hardware TRIM?
comment:19 by , 8 years ago
Actually... on reboot after issuing lots of fstrim commands, the OS no longer boots. It seems like it definitely still has the potential to destroy data in VirtualBox or on real hardware.
comment:20 by , 8 years ago
In hrev50664 I added support for trim to our ramdisk device. This makes it possible to test the BFS code with a different disk driver. (and is also useful to release the RAM used by the ramdisk when space is free on the filesystem).
I did that (with an 8MB disk image) and did not manage to corrupt the filesystem yet. So either the bug is in the SCSI implementation of trim, or it needs a disk larger than 8MB to start having problems. Anyway, this makes it possible to test the BFS side of the code without running it on actual data.
comment:21 by , 8 years ago
I'm not sure we should ship fstrim with R1 if there is a good chance of data loss. We may need to disable fstrim in the R1 branch unless this one is fixed.
comment:22 by , 8 years ago
As mentioned above, the command itself and the BFS logic were shown to work fine when using a RAM disk. If we remove anything, it would be the support in the ATA driver. I do plan to get back to this and try to fix the problems; I have a spare SSD to experiment with.
comment:23 by , 8 years ago
Sure, if we don't get it ready in time, we should simply remove the fstrim command; it doesn't serve any purpose then anyway.
Since I could reproduce the issue with VirtualBox, I created a debug version of it (that gives me more insight what ends up in the device), to see what is going wrong. But of course, I didn't manage to reproduce it with that version anymore. Maybe it only happens when doing a bit more before trimming.
follow-up: 64 comment:24 by , 8 years ago
I have set up a machine for testing purposes. Thinkpad X200 with Kingston 60GB SSD, SV300S37A60G. I'm using a 3GB partition near the start of the disk, on which I installed Haiku. I am trimming the boot volume, without other activity happening.
So far I have not managed to corrupt the drive this way. However, the trim command will time out. The HDD led stays on for some time, but then the command is aborted.
port reset: port 0 undergoing COMRESET
ExecuteAtaRequest port 0: device timeout
sata_request::abort called for command 0x06
trim failed (64 ranges)!
It seems the command is simply taking too long to execute, and eventually the port is reset to "unlock" the situation. On this SSD, it seems to not have any effect (trimming a partition that was cleaned with dd if=/dev/zero first does not change its data). But, it could be that other drives/firmwares are much less happy about being reset while they are TRIMing stuff, and it could lead to loss of data if they don't handle their transactions properly?
I'm going to reduce the number of blocks to trim per command, so that it executes faster and does not time out.
comment:25 by , 8 years ago
I tried various things:
- Always send only 1 range: same timeout
- Reduce all ranges to only 1 sector: same timeout
I noticed that the command expects a number of "range blocks" to trim (a block is 512 bytes, or 64 entries of 8 bytes each). For the last command we send, there are fewer than 64 entries, and we send a shorter command block. Does that work? Or should we round the buffer up to the next multiple of 512 bytes and fill it with zeroes? The spec says that unused entries in a block should have their "range" field set to 0. I'm wondering if this could cause the disk to interpret random data at the end of the buffer as trim commands, which would lead to erasing random areas of the disk.
But first, I need to understand why the command times out, even with small ranges. I did see some disk sectors turning into 0xFF, so it is at least partially working.
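The padding question above can be illustrated with a small sketch. It is written in Python for brevity, not in the driver's language, and is not the AHCI driver's code. It assumes the usual ATA DATA SET MANAGEMENT entry layout (little-endian 64-bit value, LBA in bits 47:0, range length in bits 63:48) and zero-fills the tail so that unused entries have a length of 0. The function name `build_trim_payload` is made up for this example:

```python
import struct

BLOCK_SIZE = 512          # one "range block"
ENTRY_SIZE = 8            # one range entry
ENTRIES_PER_BLOCK = BLOCK_SIZE // ENTRY_SIZE  # 64


def build_trim_payload(ranges):
    """Pack (lba, count) pairs into whole 512-byte range blocks.

    Returns (payload_bytes, number_of_range_blocks). Unused entries in the
    last block are zero, i.e. their range length is 0.
    """
    if not ranges:
        raise ValueError("at least one range is required")
    payload = bytearray()
    for lba, count in ranges:
        if not 0 <= lba < (1 << 48):
            raise ValueError("LBA does not fit in 48 bits")
        if not 1 <= count <= 0xFFFF:
            raise ValueError("range length must be 1..65535 blocks")
        # Assumed layout: length in the top 16 bits, LBA in the low 48 bits.
        payload += struct.pack("<Q", (count << 48) | lba)
    remainder = len(payload) % BLOCK_SIZE
    if remainder:
        payload += bytes(BLOCK_SIZE - remainder)  # zero-fill the tail
    return bytes(payload), len(payload) // BLOCK_SIZE
```

Zero-filling means the device never sees leftover buffer contents as extra ranges, which is the failure mode the comment worries about. Whether the shorter buffer was actually the cause of the corruption is not established here.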
comment:26 by , 7 years ago
Haiku hrev51346 gcc2h on a Samsung 850 EVO SSD (250 GiB with two 125 GiB partitions - Haiku is on the 2nd partition).
When I ran fstrim /boot, everything went fine; no errors or trouble occurred.
Just out of curiosity: it takes 42 seconds to trim 24410886144 bytes, which is about 24 gigabytes. Why? I have just 1.71 GiB of files on the Haiku partition.
comment:27 by , 7 years ago
It does indeed work on some SSD's.
I would reboot and ensure all your data is safe before calling things 100% ok :-) Keep in mind trim can damage data on other partitions as well beyond Haiku. I've seen trim fail and corrupt data on OCZ and Sandisk SSD's
Giova84: Could you grab a syslog from that trim you executed and post it here? (/var/log/syslog.old if you've rebooted, otherwise /var/log/syslog)
comment:28 by , 7 years ago
Not sure if you mean the size or the time it requires:
- Trimming is not necessarily a fast operation. That's why it's usually not done when actually deleting files, but later on as some kind of scrubbing service.
- Trimming clears free space, not used space. The more files there are on your partition, the less space is subject to trimming.
comment:29 by , 7 years ago
If the partition is 125GB big, and there is only 1.71GB used, then trim should clear about 123GB. So only 24GB sounds wrong? Where did these extra 100GB go?
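The mismatch can be put into numbers with a few lines of Python. This is only arithmetic on the figures reported in this ticket, assuming the partition is exactly 125 GiB and that 1.71 GiB is in use. The variable names are chosen for this example:

```python
GIB = 1024 ** 3

partition_bytes = 125 * GIB      # partition size as reported (assumed exact)
used_bytes = 1.71 * GIB          # space occupied by files, as reported
trimmed_bytes = 24410886144      # byte count printed by fstrim

expected_free = partition_bytes - used_bytes
unaccounted = expected_free - trimmed_bytes

print(f"expected free space: {expected_free / GIB:.2f} GiB")
print(f"reported as trimmed: {trimmed_bytes / GIB:.2f} GiB")
print(f"not accounted for:   {unaccounted / GIB:.2f} GiB")
```

The gap is on the order of 100 GiB, so rounding or GB versus GiB confusion does not explain it.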
comment:30 by , 7 years ago
Oh, you're right Adrien, I overlooked that. In that case, I don't have an explanation without looking deeper. In any case, 42 seconds is very long, too.
comment:31 by , 7 years ago
As another test, yesterday I downloaded some zip files with source code, unzipped them on the disk, compiled them, and then deleted everything (about 10 MiB in total). I ran fstrim again, and it took about one minute to trim 244082240128 bytes (yes: about 244 gigabytes!). No data corruption occurred on either partition (the first one holds Win 7 on NTFS, however).
After reading kallisti5's suggestion, today I ran fstrim again, and this time, after one minute, fstrim triggered a KDL involving vm_page_fault and read_fault (sorry, ATM I don't have a camera to take pictures). After this KDL I had to force a reboot, since it was impossible to exit from the KDL properly. At the next boot no corruption was present on the partition and, as far as I can tell, no data was lost. However, I have saved both logs (one of them mentions fstrim and the KDL), which I attach here.
comment:32 by , 7 years ago
patch: | 0 → 1 |
---|
by , 7 years ago
comment:33 by , 7 years ago
PS: I noticed that when I run checkfs -c /boot after the fstrim command, the node count decreases. E.g., before fstrim, checkfs -c reports 7756 nodes; after fstrim it reports 7589 nodes.
However, as I said, no data seems to be lost and checkfs gives no errors.
Does this mean that the fstrim command works properly?
comment:34 by , 7 years ago
I'm continuing to do little tests.
After rebooting Haiku I deleted a zip file of about 5 MiB, then I ran fstrim again: it immediately (not after some time) triggered the same KDL and again I had to force the reboot (using CTRL ALT DEL). At the next boot I was puzzled, because yesterday the fstrim command ran fine without trouble, so I tried again, but this time I ran the sync command before fstrim. I don't know if it was just coincidence, but now fstrim didn't trigger the KDL. Like yesterday, it took 42 seconds to trim 24412811264 bytes.
Obviously I want to avoid damaging the disk or the Haiku partition, and I want to avoid losing or damaging my data (for what it's worth, checkfs gave no errors). After some reading on Google (forums of Linux users, since they also have the fstrim command, and some people run it using cron), I learned that a manual fstrim usually has to be run daily or weekly.
I'd like to maintain my SSD properly. So: how can I check whether the fstrim command on Haiku really clears free space?
comment:35 by , 7 years ago
What I did to test this (but it is a destructive test):
1) with dd, clear a section of the partition with all 0xE5 (or some other value, or use data from /dev/random)
2) format the partition as bfs (only some sectors are modified)
3) run fstrim on the partition
If fstrim works properly, the fixed value used at step 1 should be gone from the sectors, and the default erased value of the disk should be there instead (usually 00 or FF). You can check this with DiskProbe.
Note that current SSDs do well even without trimming, there is a performance loss but not a lifetime reduction as it used to be.
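The final check of that procedure (is the fill value gone, and what replaced it?) can be sketched as a small Python helper. It only classifies a buffer that has already been read from the partition, for example one sector copied out with DiskProbe. The function name `classify_sector` and the labels it returns are made up for this example:

```python
def classify_sector(data, pattern_byte=0xE5):
    """Describe what a sector looks like after the fill / mkbfs / fstrim test."""
    if not data:
        raise ValueError("empty buffer")
    if data == bytes([pattern_byte]) * len(data):
        return "fill pattern still present"   # trim had no visible effect here
    if data == bytes(len(data)):
        return "erased to 0x00"
    if data == b"\xff" * len(data):
        return "erased to 0xFF"
    return "other data"                        # filesystem structures, or non-deterministic reads
```

A sector that still holds the fill pattern after fstrim suggests the trim did not reach it. "Other data" is inconclusive, since it could be filesystem metadata written during formatting.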
comment:36 by , 7 years ago
A destructive test is the last thing that I would want to do (well, if necessary, since I want to be sure that fstrim really works for me, I will do it: please explain to me, step by step, how to clear a section of the partition with all 0xE5 using dd, and where to look). So I've tried another test, although I'm not sure whether it is really reliable.
On the Haiku partition I have a zip file for the mame emulator. The file is called cheat.dat and contains some lines of text description, like "this is the cheat file. For more info visit the site www.mame.co.uk", plus more entries.
When I probe the Haiku partition (/dev/disk/scsi/0/0/0/1) with DiskProbe and look at block 0xc8b7e6, I can in fact see the text content of this cheat.dat file.
Then I deleted the cheat.dat file, ran sync and then fstrim (which trimmed 24458731520 bytes) and analyzed the /dev/disk/scsi/0/0/0/1 partition again. At block 0xc8b7e6 there was still the content of the cheat.dat file.
Was this a reliable test or a useless one? Please forgive me, but I'm not very expert.
follow-up: 38 comment:37 by , 7 years ago
Yes, in that case the block should be erased as well. So it looks like in your case, fstrim does nothing at all, or maybe not as much as it could.
comment:38 by , 7 years ago
Replying to pulkomandy:
Yes, in that case the block should be erased as well. So it looks like in your case, fstrim does nothing at all, or maybe not as much as it could.
Is it actually required/guaranteed for the SSD controller to execute that command synchronously? Or for that matter, to physically erase the page at that point? Depending on the impl, it could conceivably simply mark the page as available for erasure internally, and not actually touch it until needed, but I'm not so familiar with the details of the specs.
comment:39 by , 7 years ago
According to Wikipedia:
There are different types of TRIM defined by SATA Words 69 and 169 returned from an ATA IDENTIFY DEVICE command:
- Non-deterministic TRIM: Each read command to the Logical block address (LBA) after a TRIM may return different data.
- Deterministic TRIM (DRAT): All read commands to the LBA after a TRIM shall return the same data, or become determinate.
- Deterministic Read Zero after TRIM (RZAT): All read commands to the LBA after a TRIM shall return zero.
So, it depends on the disk and needs to be checked in the device identification.
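As a rough illustration of that check, here is a Python sketch that classifies a drive from its 256-word IDENTIFY DEVICE data. The bit positions are the ones I recall from the ACS specification and from Linux's `ata.h` (word 169 bit 0 for TRIM support, word 69 bit 14 for DRAT, word 69 bits 14 and 5 together for RZAT). Treat them as assumptions to verify against the spec. The function name `trim_type` is hypothetical:

```python
WORD_ADDITIONAL_SUPPORTED = 69
WORD_DATA_SET_MANAGEMENT = 169

BIT_TRIM_SUPPORTED = 1 << 0    # word 169
BIT_DRAT = 1 << 14             # word 69: deterministic read after TRIM
BIT_RZAT = 1 << 5              # word 69: read zeroes after TRIM


def trim_type(identify_words):
    """Classify TRIM behaviour from a list of 256 16-bit IDENTIFY DEVICE words."""
    if len(identify_words) < 256:
        raise ValueError("IDENTIFY DEVICE data is 256 words long")
    if not identify_words[WORD_DATA_SET_MANAGEMENT] & BIT_TRIM_SUPPORTED:
        return "no TRIM support"
    additional = identify_words[WORD_ADDITIONAL_SUPPORTED]
    if additional & BIT_DRAT and additional & BIT_RZAT:
        return "RZAT"               # trimmed LBAs read back as zero
    if additional & BIT_DRAT:
        return "DRAT"               # trimmed LBAs read back as stable data
    return "non-deterministic"      # reads after TRIM may differ each time
```

Only on an RZAT drive is "the block reads back as zero" a meaningful success criterion. On the other types, old data or arbitrary data after a trim does not by itself prove that the command failed.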
comment:40 by , 7 years ago
Also, it's not a good test unless you a) reboot (to make sure no one has the file open still), and b) have run checkfs on the partition, to make sure its space has really been freed.
follow-up: 42 comment:41 by , 7 years ago
After that test I rebooted Haiku and ran checkfs: it told me nothing about freed space.
comment:42 by , 7 years ago
Replying to Giova84:
After that test I rebooted Haiku and ran checkfs: it told me nothing about freed space.
The only time checkfs frees up space is hypothetically in the case of a power loss or other catastrophic crash that doesn't allow the filesystem to unmount cleanly. Under normal circumstances, space is freed automatically by filesystem operations, so checkfs won't have anything to report.
comment:43 by , 7 years ago
That procedure would just make sure that the space is actually freed. Afterwards, you'd have to trim.
comment:44 by , 7 years ago
I've done the "ultimate" test: first of all I backed up all my data to another partition (BeFS) on another disk.
Then I booted the Haiku live CD (hrev51346 gcc2h) and opened DiskProbe to probe the SSD's Haiku partition, and the block (0xc8b7e6) still showed the content of the file which I deleted. Then I ran fstrim again on the SSD partition (still from the live CD) and triggered the same KDL again. I rebooted into the live CD and ran the sync command before running fstrim again, and this time fstrim did the job (4749070336 bytes in 15 seconds). I rebooted into the live CD again and checked the SSD partition with DiskProbe: block 0xc8b7e6 still had the content of the deleted file.
Then I deleted the BeFS partition on the SSD and created it from scratch. I rebooted the live CD again, checked the SSD partition with DiskProbe, and block 0xc8b7e6 still contained the content of my previous installation. After a deeper check, all the content of my text files was also still present on the empty disk despite several fstrim runs.
comment:45 by , 7 years ago
I'm afraid it's not an ultimate test either: the drive combines several blocks together as an "erase block". AFAIK this is about 1.5 MB on the EVO. This means that if, within this 1.5 MB block, BFS could not trim even a single disk block (4K), the drive cannot erase it just yet. So you might just have hit such a situation.
But anyway, until trim works reliably, it won't be part of the release.
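The erase-block argument can be made concrete with a simplified Python model. It assumes a drive that can only erase whole, aligned erase blocks and takes the 1.5 MB figure above at face value. Real SSD firmware remaps blocks internally, so this is an illustration of the reasoning rather than a description of the EVO. The function name and parameters are made up for this example:

```python
def fully_trimmed_erase_blocks(trimmed_ranges, erase_block_size):
    """Return indices of erase blocks completely covered by a trimmed range.

    trimmed_ranges: (offset, length) byte ranges, assumed already merged so
    that no two ranges touch or overlap.
    """
    covered = set()
    for offset, length in trimmed_ranges:
        first = -(-offset // erase_block_size)          # round start up
        end = (offset + length) // erase_block_size     # round end down (exclusive)
        covered.update(range(first, end))
    return sorted(covered)
```

For example, with a 1.5 MiB erase block, a trimmed range that starts 4 KiB into a block and is exactly one erase block long covers no erase block completely, so in this model nothing would be erased yet.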
comment:46 by , 7 years ago
You're right, Axel. However, I have some more info about this. I bought this SSD on 6 August and have checked the S.M.A.R.T. status day by day, and it was always OK. I can say with certainty that the various KDLs which I encountered on Haiku with fstrim caused the SMART errors C7 "CRC Error" and EB "POR Recovery Count", which usually occur when the system doesn't shut down cleanly or when the SATA cable is poor. In fact, when the KDLs occurred I was forced to reboot using CTRL ALT DEL, and my SATA cable is good. Currently, after all the KDLs which I encountered, the C7 counter has reached a value of 000000000008 (8) and the EB counter a value of 4: in fact I had 4 KDLs in total. When I shut down or reboot cleanly, these values don't increase.
However, if I wanted to really and properly free the space on the BeFS partition of my SSD, what could I do? Sorry, but as I've said, I'm not an expert in these things. Thank you for your patience, really!
POST EDITED
comment:47 by , 7 years ago
However, if I wanted to really and properly free the space on the BeFS partition of my SSD, what could I do? Sorry, but as I've said, I'm not an expert in these things. Thank you for your patience, really!
Sorry, I wasn't clear: I meant to ask whether there is an alternative way to trim the SSD's BeFS partition. As Pulkomandy said, it is a matter of filesystem support, so I guess that "Samsung Magician" can't run trim on the BeFS partition either, because it doesn't know anything about BeFS. As I've previously said, deleting and recreating the partition obviously didn't free the space on the SSD. Does anyone know if there is a "universal" utility to execute the trim regardless of the filesystem? I also read that this could depend on the hardware controller of the drive. I ask all these questions because I still see all my old data (using DiskProbe) despite fstrim and the re-initialization of the partition, and also because I have the habit of filling my Haiku partitions with a lot of data which I often delete.
comment:48 by , 7 years ago
patch: | 1 → 0 |
---|
comment:49 by , 7 years ago
patch: | 0 |
---|
Ouch, what happened to that previous_syslog file? It's filled with corrupt characters at the end. Did fstrim erase some memory?
comment:50 by , 7 years ago
I'm not able to tell you if fstrim erased some memory :-)
However, I checked whether trim works properly under Windows 7 by following these instructions: http://www.win-raid.com/t24f34-Easy-TRIM-test-methods.html (the "B. Easy and very effective TRIM test by using a Hex Editor" paragraph). There my SSD was properly trimmed, so it seems that fstrim doesn't work under Haiku, at least for me.
I tried to do the same test again under Haiku: I triggered the same KDL again, followed by a forced and unclean reboot, and the SMART value C7 "CRC Error count" increased by one (from 8 to 9). I am totally sure that this occurs after the KDL when I am forced to reboot.
comment:51 by , 6 years ago
Milestone: | R1/beta1 → Unscheduled |
---|---|
patch: | → 0 |
Priority: | blocker → high |
SCSI TRIM disabled in hrev52134; removing from beta1.
comment:52 by , 6 years ago
I would have disabled it only in the beta1 branch. Maybe it's time to create the branch?
comment:53 by , 6 years ago
Why? It is known to be broken, so until someone has time to fix it, it does not make sense to leave it enabled, even on nightlies. And no, I'm holding off beta branch creation until we fix the other remaining blockers.
comment:54 by , 4 years ago
Maybe a candidate for Beta 3? It's an important feature that continues to affect many more with each passing day, as SSDs are getting more and more common.
comment:55 by , 4 years ago
It's not really that important on modern SSDs, just a nice to have performance speedup.
Was it tested on NVMe? I think the problem is in the SATA driver so it should work there.
comment:56 by , 4 years ago
I never implemented the ioctl on NVMe due to concerns about whether or not it corrupts disks due to a BFS driver bug or due to a SATA driver bug, and I didn't feel like investigating with all the other NVMe bugs at the time. Maybe it should be revisited.
comment:57 by , 4 years ago
I had implemented it for ramdisks (freeing the memory to other things in the OS) and found no problems there, but I had not done a lot of testing. But given the failure pattern (things like the SSD not even being seen by the BIOS after a failed fstrim until a cold reboot) it seems very unlikely that the main problem is with BFS.
comment:58 by , 4 years ago
Hi,
Today I added support to the SD/MMC driver. I did a test on a mostly empty BFS filesystem on my SD card (which explains the rather large areas being trimmed, and there are few of them).
Here is the log of trimming:
KERN: TRIM FS:
KERN: [ 0] 8884224 : 1064857600
KERN: [ 1] 1073745920 : 1073737728
KERN: [ 2] 2147487744 : 1623191552
KERN: mmc_disk: trim_device()
KERN: mmc_disk: trim 1064857600 bytes from 8884224
KERN: sdhci_pci: ExecuteCommand(32, 43c8)
KERN: sdhci_pci: ExecuteCommand(33, 200000)
KERN: sdhci_pci: ExecuteCommand(38, 1)
KERN: mmc_disk: trim 1073737728 bytes from 1073745920
KERN: sdhci_pci: ExecuteCommand(32, 200008)
KERN: sdhci_pci: ExecuteCommand(33, 400000)
KERN: sdhci_pci: ExecuteCommand(38, 1)
KERN: mmc_disk: trim 1623191552 bytes from 2147487744
KERN: sdhci_pci: ExecuteCommand(32, 400008)
KERN: sdhci_pci: ExecuteCommand(33, 706000)
KERN: sdhci_pci: ExecuteCommand(38, 1)
And here is the log of trying to unmount then remount the partition:
KERN: sdhci_pci: Read 1024 bytes at 4194304
KERN: sdhci_pci: ExecuteCommand(18, 2000)
KERN: sdhci_pci: Read 4096 bytes at 4196352
KERN: sdhci_pci: ExecuteCommand(18, 2004)
KERN: sdhci_pci: Read 4096 bytes at 4200448
KERN: sdhci_pci: ExecuteCommand(18, 200c)
KERN: sdhci_pci: Read 2048 bytes at 1077936128 ***
KERN: sdhci_pci: ExecuteCommand(18, 202000)
bfs: KERN: inode at block 524288 corrupt!
sdhci_pci: Read 4096 bytes at 4204544
sdhci_pci: ExecuteCommand(18, 2014)
bfs: KERN: could not create root node!
I have annotated a line with ***. As you can see, BFS tries to read at an offset that was erased by the trimming. So it looks like there is indeed a problem in the BFS code or in the partitioning system manager (which, as I understand it, should translate BFS requests into offsets on the raw disk). The trimming seems to have worked: the data is no longer there, and BFS fails to mount the partition.
Then I made a test with 4 smaller partitions on the SD card, and ran fstrim on each of them. For all 4 of them, fstrim ends up erasing data about 4MB into the disk (so it always starts erasing data that's in the first partition). It really looks like the partition start offset isn't taken into account?
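The numbers in the two logs above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the `TrimRange` struct and the helper names are made up for this example, the 512-byte sector size is inferred from the `ExecuteCommand` arguments, and the 4 MiB partition start is an assumption taken from the first read in the remount log.

```cpp
#include <cassert>
#include <cstdint>

struct TrimRange {
	uint64_t offset;	// bytes
	uint64_t size;		// bytes
};

// The three ranges printed under "TRIM FS:" in the first log.
static const TrimRange kRanges[] = {
	{8884224ULL, 1064857600ULL},
	{1073745920ULL, 1073737728ULL},
	{2147487744ULL, 1623191552ULL},
};

// Assumption: the partition starts where the first read of the remount
// log lands (4194304 bytes, i.e. 4 MiB into the card).
static const uint64_t kPartitionStart = 4194304ULL;

// Byte offset of the read marked with *** in the second log.
static const uint64_t kFailedRead = 1077936128ULL;

// Assumption: 512-byte sectors, inferred from the command arguments.
static const uint64_t kSectorSize = 512;

// Convert a byte offset to the sector number passed to the card.
static uint64_t
ToSector(uint64_t byteOffset)
{
	assert(byteOffset % kSectorSize == 0);
	return byteOffset / kSectorSize;
}

// Returns true if diskOffset falls inside one of the logged ranges once
// they are shifted by rangeBase (0 = applied to the raw disk as-is).
static bool
IsTrimmed(uint64_t diskOffset, uint64_t rangeBase)
{
	for (const TrimRange& range : kRanges) {
		uint64_t start = rangeBase + range.offset;
		if (diskOffset >= start && diskOffset - start < range.size)
			return true;
	}
	return false;
}
```

With these values, `IsTrimmed(kFailedRead, 0)` is true and `IsTrimmed(kFailedRead, kPartitionStart)` is false: the failed read only lands in a trimmed area if the ranges were applied relative to the start of the disk rather than the start of the partition.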
comment:59 by , 4 years ago
So I dug a bit further into this...
I looked at the bfs code and confirmed that it uses read_pos and write_pos to access the partition expecting that the partition device will do the translation. For example, reading at offset 0 gets the superblock.
I then checked where this translation is done. It appears to be done in src/kernel/device_manager/devfs.cpp, by translate_partition_access. However, this translation is currently not done for the B_TRIM_DEVICE ioctl. As a result, the trim is executed using the start of the disk as a reference point, instead of the start of the partition. And, quite possibly, one ends up erasing the partition table, or in general, things that should not have been erased.
This also explains why I had no problems when testing with the ramdisk: I had not created a partition table there, and the offset between the filesystem and the disk did, in fact, match. I suspect when Axel tested on virtualbox, he used a similar setup?
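As a rough sketch of the missing step, the translation for a trim request could look something like the following. This is not the devfs code: `TrimRange`, `TranslateTrimRanges` and its parameters are hypothetical names used only for illustration. Ranges are shifted by the partition start and clamped so that they cannot extend past the end of the partition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical range type: offset and size in bytes.
struct TrimRange {
	uint64_t offset;
	uint64_t size;
};

// Hypothetical helper: converts partition-relative trim ranges into
// disk-relative ones, dropping or shortening anything that would reach
// beyond the end of the partition.
std::vector<TrimRange>
TranslateTrimRanges(const std::vector<TrimRange>& ranges,
	uint64_t partitionOffset, uint64_t partitionSize)
{
	// The partition itself must fit into the 64-bit address space.
	assert(partitionOffset <= UINT64_MAX - partitionSize);

	std::vector<TrimRange> result;
	for (const TrimRange& range : ranges) {
		if (range.size == 0 || range.offset >= partitionSize)
			continue;

		// Clamp first, so a request like "0 to uint64_max" cannot overflow.
		uint64_t size = std::min(range.size, partitionSize - range.offset);
		result.push_back({partitionOffset + range.offset, size});
	}
	return result;
}
```

With a partition starting at 4194304 and a request covering offset 0 with the maximum 64-bit size, this yields a single range that starts at 4194304 and is exactly as long as the partition.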
comment:61 by , 4 years ago
I'm looking at the fstrim source code and I'm wondering... @kallisti5, do you remember how you used it?
It appears fstrim just opens a path and sends a B_TRIM_DEVICE for the range 0 to uint64_max. Apparently, if you do that on the mount point (fstrim /boot), the ioctl will be handled by bfs, which will then trim only the relevant parts of the underlying disk partition. However, if you run it on the disk device (fstrim /dev/disk/...), it will bypass the filesystem part, and simply erase the whole partition. Combined with the previous bug (see my above comments), it would in fact erase the start of the disk, for a size equal to the partition size.
Is there any chance that's what you tried when you ended up with a completely erased disk?
Probably we should guard against that?
comment:62 by , 4 years ago
TRIM seems to be a nonsensical command to send to raw devices directly instead of through a filesystem, so yes, we should probably prohibit that.
comment:63 by , 4 years ago
Well, it can make sense if you want to reset a whole disk; for example, DriveSetup could do that when creating a new partition table on a disk. But it probably shouldn't be done from the fstrim command, at least not without an extra flag to enable that behavior.
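A guard along those lines could be as small as the check below. This is only a sketch of the idea discussed above, not what fstrim actually does: `ShouldRefuseTrim` and the `forceRawDevice` flag are hypothetical, and treating both block and character device nodes as "raw devices" is an assumption.

```cpp
#include <sys/stat.h>

// Hypothetical policy check: refuse to trim when the path is a device
// node rather than a mounted volume, unless the user explicitly asked
// for it. The mode would come from stat() on the path given to fstrim.
static bool
ShouldRefuseTrim(mode_t mode, bool forceRawDevice)
{
	// Assumption: device nodes show up as block or character devices.
	bool isDeviceNode = S_ISBLK(mode) || S_ISCHR(mode);
	return isDeviceNode && !forceRawDevice;
}
```

A mount point such as /boot is a directory, so it would pass the check, while a path under /dev/disk/ would be rejected unless the flag is set.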
comment:64 by , 4 years ago
Replying to pulkomandy:
I have set up a machine for testing purposes. Thinkpad X200 with Kingston 60GB SSD, SV300S37A60G. I'm using a 3GB partition near the start of the disk, on which I installed Haiku. I am trimming the boot volume, without other activity happening.
So far I have not managed to corrupt the drive this way. However, the trim command will time out. The HDD led stays on for some time, but then the command is aborted.
port reset: port 0 undergoing COMRESET
ExecuteAtaRequest port 0: device timeout
sata_request::abort called for command 0x06
trim failed (64 ranges)!
Found out that I still have that particular SSD around (now in a different machine). Same problem still happens as of hrev54949 (after re-enabling trim support in the AHCI driver).
I increased the timeout from 20 to 2000 seconds (just in case the problem was indeed that we didn't wait long enough). It still timed out, and then I got a KDL: a null pointer dereference in BFS BlockAllocator::Trim.
comment:65 by , 3 years ago
For me, TRIM support is especially useful in virtual machines with dynamically allocated storage. I use KVM with Virtio-SCSI storage. Currently, there are some pieces missing in the SCSI code that are necessary to make TRIM work properly. For example, reading VPD pages of the device to get information about the maximum supported size of an unmapped block, the correct SCSI operation to use (unmap/writesame 16/writesame 10), etc.
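To illustrate why the maximum supported size matters, here is a minimal sketch of splitting one large range into commands that respect a per-command limit. It is not taken from the SCSI code: `BlockRange`, `SplitForDevice` and `maxBlocksPerCommand` are hypothetical names, and the limit is assumed to have been obtained from the device beforehand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical range type: first logical block and number of blocks.
struct BlockRange {
	uint64_t lba;
	uint64_t blockCount;
};

// Hypothetical helper: cuts a range into consecutive chunks, none of
// which is larger than what the device accepts in a single command.
std::vector<BlockRange>
SplitForDevice(BlockRange range, uint64_t maxBlocksPerCommand)
{
	assert(maxBlocksPerCommand > 0);

	std::vector<BlockRange> chunks;
	while (range.blockCount > 0) {
		uint64_t count = std::min(range.blockCount, maxBlocksPerCommand);
		chunks.push_back({range.lba, count});
		range.lba += count;
		range.blockCount -= count;
	}
	return chunks;
}
```

For example, a range of 10 blocks starting at block 100 with a limit of 4 blocks per command becomes three chunks: 4 blocks at 100, 4 blocks at 104, and 2 blocks at 108.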
I will have some time later this week and next week, so I am planning to review the existing code and implement some of the missing features.
comment:66 by , 3 years ago
If you are using TRIM in a VM, you may find it much easier to just implement it in the NVMe driver and use that instead of SCSI. I think the underlying libnvme already supports it, so the relevant ioctl just needs to be wired up inside nvme_disk.
comment:67 by , 3 years ago
Here is an update on my effort to improve SCSI trim support. I think I now have working trim support on SCSI and SATA drives, plus some other minor fixes. I need to do more testing, but so far it works reliably. I am hoping to submit my code for review later this week.
comment:68 by , 3 years ago
nice! Keep in mind @dasebek you're running a little late for this to be included in R1/Beta3.
Excited to see the patch in review.haiku-os.org :-)
comment:69 by , 3 years ago
Here are my improvements to bin/fstrim:
https://review.haiku-os.org/c/haiku/+/4154
and to bfs, devfs, scsi:
https://review.haiku-os.org/c/haiku/+/4155
https://review.haiku-os.org/c/haiku/+/4156
https://review.haiku-os.org/c/haiku/+/4157
I tested it in a KVM virtual machine (both Virtio-SCSI and SATA) and on my old laptop's SSD (Samsung SSD 850 EVO 250GB SATA). I haven't hit any data corruption and the trim operation was always quick. Trimmed regions return zeros on subsequent reads and the size of a virtual machine's disk image shrinks as expected.
comment:71 by , 2 years ago
Milestone: | Unscheduled → R1/beta4 |
---|---|
Resolution: | → fixed |
Status: | in-progress → closed |
I think we can consider this one fixed at this point; I also implemented NVMe TRIM in the meantime.