Opened 11 years ago

Closed 2 years ago

#10336 closed bug (fixed)

TRIM / fstrim can destroy data on SSD's when executed

Reported by: kallisti5 Owned by: axeld
Priority: high Milestone: R1/beta4
Component: Drivers/Disk Version: R1/Development
Keywords: TRIM fstrim Cc:
Blocked By: Blocking:
Platform: All

Description

fstrim fails to function on OCZ Agility 3. May be due to the ranges being too large for the SSD / AHCI implementation.

Attachments (4)

IMG_20140204_232334.jpg (202.5 KB ) - added by kallisti5 11 years ago.
as of hrev46819 shortly before everything locks up and bursts into flames.
trim-oczagility.txt (16.3 KB ) - added by kallisti5 11 years ago.
hrev46931
previous_syslog (241.3 KB ) - added by Giova84 7 years ago.
syslog with info about KDL and fstrim
syslog (385.9 KB ) - added by Giova84 7 years ago.

Change History (75)

comment:1 by kallisti5, 11 years ago

First attempt:

KERN: bfs: mounted "Data" (root node at 524288, device = /dev/disk/scsi/0/0/0/1)
KERN: [ACPI Debug]  String KERN: [0x1A] "_Q80 : Temperature Up/Down"
KERN: [ACPI Debug]  String KERN: [0x1A] "_Q80 : Temperature Up/Down"
KERN: slab memory manager: created area 0xd3001000 (8514)
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000001, is 0x40000001, ci 0x00000001
KERN: ahci: ssts 0x00000133
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x00000000
KERN: ahci: sact 0x00000000
KERN: ahci: Task File Error
KERN: ahci: AHCIPort::ResetPort port 0
KERN: ahci: AHCIPort::ResetPort port 0, deviceBusy 0, forceDeviceReset 0
KERN: ahci: AHCIPort::PostReset port 0
KERN: ahci: device signature 0x00000101 (ATA)
KERN: ahci: sata_request::finish ATA command 0x06 failed
KERN: ahci: sata_request::finish status 0x51, error 0x04
KERN: ahci: trim failed (179 ranges)!

Second attempt:

KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000001, is 0x40000001, ci 0x00000001
KERN: ahci: ssts 0x00000133
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x00000000
KERN: ahci: sact 0x00000000
KERN: ahci: Task File Error
KERN: ahci: AHCIPort::ResetPort port 0
KERN: ahci: AHCIPort::ResetPort port 0, deviceBusy 0, forceDeviceReset 0
KERN: ahci: AHCIPort::PostReset port 0
KERN: ahci: device signature 0x00000101 (ATA)
KERN: ahci: sata_request::finish ATA command 0x06 failed
KERN: ahci: sata_request::finish status 0x51, error 0x04
KERN: ahci: trim failed (179 ranges)!

comment:2 by axeld, 11 years ago

Owner: changed from nobody to axeld
Status: changed from new to in-progress

AHCI dumps some info about what the disk supports upon boot. Would be nice to have this info included here. In any case, the range limit is not yet correctly implemented; I'm very slowly working on that, I just haven't found much development time lately, and for this I do need some contiguous time span.

comment:3 by kallisti5, 11 years ago

Yup. Wasn't poking you too much, just wanted to get the current issues on paper :-)

KERN: ahci: generic AHCI controller found! vendor 0x1022, device 0x7804
KERN: ahci: ahci_register_device
KERN: ahci: ahci_init_driver
KERN: ahci: ahci_sim_init_bus
KERN: ahci: ahci_sim_init_bus: pciDevice 0x82b6b360
KERN: ahci: AHCIController::Init 0:17:0 vendor 1022, device 7804
KERN: ahci: PCI SATA capability found at offset 0x70
KERN: ahci: satacr0 = 0x00100012, satacr1 = 0x0000000f
KERN: ahci: pcicmd old 0x0007
KERN: ahci: pcicmd new 0x0006
KERN: allocate_io_interrupt_vectors: allocated 1 vectors starting from 24
KERN: msi_allocate_vectors: allocated 1 vectors starting from 24
KERN: msi enabled: 0x0089
KERN: ahci: using MSI vector 24
KERN: ahci: registers at 0xf034e000, size 0x800
KERN: ahci: mapping physical address 0xf034e000 with 2048 bytes for AHCI HBA regs
KERN: add_memory_type_range(672, 0xf034e000, 0x1000, 0)
KERN: ahci: physical = 0xf034e000, virtual = 0x81bfc000, offset = 0, phyadr = 0xf034e000, mapadr = 0x81bfc000, size = 4096, area = 0x000002a0
KERN: ahci: cap: Interface Speed Support: generation 3
KERN: ahci: cap: Number of Command Slots: 32 (raw 0x1f)
KERN: ahci: cap: Number of Ports: 2 (raw 0x1)
KERN: ahci: cap: Supports Port Multiplier: yes
KERN: ahci: cap: Supports External SATA: no
KERN: ahci: cap: Enclosure Management Supported: no
KERN: ahci: cap: Supports Command List Override: yes
KERN: ahci: cap: Supports Staggered Spin-up: no
KERN: ahci: cap: Supports Mechanical Presence Switch: yes
KERN: ahci: cap: Supports 64-bit Addressing: yes
KERN: ahci: cap: Supports Native Command Queuing: yes
KERN: ahci: cap: Supports SNotification Register: yes
KERN: ahci: cap: Supports Command List Override: yes
KERN: ahci: cap: Supports AHCI mode only: no
KERN: ahci: ghc: AHCI Enable: yes
KERN: ahci: Ports Implemented Mask: 0x000003
KERN: ahci: Number of Available Ports: 2
KERN: ahci: AHCI Version 1.0
KERN: ahci: Interrupt 24
KERN: ahci: AHCIPort::Init1 port 0
KERN: ahci: allocating 4096 bytes for AHCI port 0
KERN: ahci: area = 673, size = 4096, virt = 0x81bfd000, phy = 0xa0d4000
KERN: ahci: PRD table is at 0x81bfd580
KERN: ahci: AHCIPort::Init1 port 1
KERN: ahci: allocating 4096 bytes for AHCI port 1
KERN: ahci: area = 674, size = 4096, virt = 0x81bfe000, phy = 0xa0d3000
KERN: ahci: PRD table is at 0x81bfe580
KERN: ahci: AHCIPort::Init2 port 0
KERN: ahci: AHCIPort::ResetPort port 0
KERN: ahci: AHCIPort::ResetPort port 0, deviceBusy 0, forceDeviceReset 1
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00400000, ci 0x00000000
KERN: ahci: ssts 0x00000001
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04090000
KERN: ahci: sact 0x00000000
rt port 0, deviceBusy 0, forceDeviceReset 1
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00400000, ci 0x00000000
KERN: ahci: ssts 0x00000001
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04090000
KERN: ahci: sact 0x00000000
KERN: ahci: PhyReady Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000133
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x040d0000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::PostReset port 0
KERN: ahci: device signature 0x00000101 (ATA)
KERN: ahci: ie   0x7dc0007f
KERN: ahci: is   0x00000000
KERN: ahci: cmd  0x0000e017
KERN: ahci: ssts 0x00000133
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x00000000
KERN: ahci: sact 0x00000000
KERN: ahci: tfd  0x00000150
KERN: ahci: AHCIPort::Init2 port 1
KERN: ahci: AHCIPort::ResetPort port 1
KERN: ahci: AHCIPort::ResetPort port 1, deviceBusy 0, forceDeviceReset 1
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00400000, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04090000
KERN: ahci: sact 0x00000000
KERN: ahci: PhyReady Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04090000
KERN: ahci: sact 0x00000000
KERN: ahci: PhyReady Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000001
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000001
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x040c0000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000001
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x040c0000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00400040, ci 0x00000000
KERN: ahci: ssts 0x00000113
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x040d0000
KERN: ahci: sact 0x00000000
KERN: ahci: PhyReady Change
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::PostReset port 1
KERN: ahci: device signature 0xeb140101 (ATAPI)
KERN: ahci: ie   0x7dc0007f
KERN: ahci: is   0x00000000
KERN: ahci: cmd  0x0100e017
KERN: ahci: ssts 0x00000113
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x00000000
KERN: ahci: sact 0x00000000
KERN: ahci: tfd  0x00000100
KERN: ahci: cookie = 0x8280b900
KERN: ahci: ahci_path_inquiry, cookie 0x8280b900
Last message repeated 1 time
KERN: ahci: ahci_scan_bus, cookie 0x8280b900
KERN: ahci: AHCIPort::ScsiTestUnitReady port 0
KERN: ahci: AHCIPort::ScsiInquiry port 0
KERN: ahci: lba 1, lba48 1, fUse48BitCommands 1, sectors 117231408, sectors48 117231408, size 60022480896
KERN: ahci: trim supported, 1 ranges blocks, reads are deterministic, random.
KERN: ahci: model number: OCZ-AGILITY3                            
KERN: ahci: serial number: OCZ-X78XWFG4D28DS609
KERN: ahci: firmware rev.: 2.15    
KERN: ahci: trim support: yes
KERN: ahci: sg_memcpy phyAddr 0x253442c, size 96
KERN: ahci: ahci_get_restrictions, cookie 0x8280b900
KERN: ahci: AHCIPort::ScsiGetRestrictions port 0: isATAPI 0, noAutoSense 0, maxBlocks 65536
KERN: publish device: node 0x82b59d00, path disk/scsi/0/0/0/raw, module drivers/disk/scsi/scsi_disk/device_v1
KERN: ahci: ahci_get_restrictions, cookie 0x8280b900
KERN: ahci: AHCIPort::ScsiGetRestrictions port 1: isATAPI 1, noAutoSense 1, maxBlocks 256
KERN: publish device: node 0x82b59c60, path disk/scsi/0/1/0/raw, module drivers/disk/scsi/scsi_cd/device_v1
KERN: ata 0: controller doesn't support DMA, disabling
KERN: ata 0: _DevicePresent: device selection failed for device 0
KERN: ata 0: _DevicePresent: device 1, presence 0
KERN: ata 0: deviceMask 0
KERN: ata 0: ignoring device 0
KERN: ata 0: ignoring device 1
KERN: ata 0 error: target device not present
Last message repeated 1 time
KERN: ata 0 error: invalid target device
KERN: Last message repeated 12 times.
KERN: ata 1: controller doesn't support DMA, disabling
KERN: ata 1: _DevicePresent: device selection failed for device 0
KERN: ata 1: _DevicePresent: device 1, presence 0
KERN: ata 1: deviceMask 0
KERN: ata 1: ignoring device 0
KERN: ata 1: ignoring device 1
KERN: ata 1 error: target device not present
Last message repeated 1 time
KERN: ata 1 error: invalid target device
KERN: Last message repeated 12 times.
KERN: KDiskDeviceManager::_Scan(/dev/disk/scsi)
i: tfd  0x00000100
KERN: ahci: cookie = 0x8280b900
KERN: ahci: ahci_path_inquiry, cookie 0x8280b900
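
A rough sketch of the range-limit check mentioned in comment:2, assuming the "1 ranges blocks" line in the log above reflects the drive's maximum number of 512-byte range blocks per TRIM command, with each block holding 64 entries of 8 bytes. The function name and constant are illustrative and are not taken from the Haiku AHCI driver.

```python
import math

ENTRIES_PER_RANGE_BLOCK = 64  # one 512-byte block / 8-byte entries


def commands_needed(range_count, max_range_blocks):
    """Return how many TRIM commands are needed to send range_count ranges
    when one command may carry at most max_range_blocks range blocks."""
    if range_count < 0 or max_range_blocks < 1:
        raise ValueError("invalid arguments")
    per_command = max_range_blocks * ENTRIES_PER_RANGE_BLOCK
    return math.ceil(range_count / per_command)


# 179 ranges (from the failed attempts) against a limit of 1 range block
print(commands_needed(179, 1))
```

Under these assumptions, the 179 ranges from the failed attempts would not fit in a single command and would have to be split across several.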

comment:4 by kallisti5, 11 years ago

it works as of hrev46819!

And by works I mean it erases everything on my SSD. :-\ Screenshot attached

The partition table is gone, so it definitely trims the first 512 bytes of the disk.

by kallisti5, 11 years ago

Attachment: IMG_20140204_232334.jpg added

as of hrev46819 shortly before everything locks up and bursts into flames.

comment:5 by anevilyak, 11 years ago

I can't concur here, on my Corsair the results seem to remain as before, which is to say the partition is still usable/readable, but there's no obvious indicator as to whether the trim operation actually succeeded. Steps:

  • unmount partition
  • dd if=/dev/zero
  • mkbfs
  • mount
  • verify blocks are still generally zeroed apart from basic filesystem structures via DiskProbe
  • fstrim
  • check blocks again - still zeroed. According to the drive specs, trimmed blocks should theoretically return deterministic random blocks on read, so this should theoretically rule out seeing straight zero blocks, but that's not currently the case.

comment:6 by kallisti5, 11 years ago

Milestone: changed from R1 to R1/alpha5
Priority: changed from normal to blocker

we likely should fix this pre-alpha5 or remove fstrim from the alpha5 branch images.

Don't want users running it unknowingly and potentially erasing their data.

comment:7 by kallisti5, 11 years ago

Just tried it on my OCZ Agility 3 again (hrev46931): booted Haiku from a USB stick and ran fstrim on the mounted BeFS / Haiku SSD filesystem.

Got the attached syslog output before the system froze up.

Interesting...

KERN: [30] 65535 : 19708

A bunch of 65535's... that sounds like some kind of overflow.

by kallisti5, 11 years ago

Attachment: trim-oczagility.txt added

comment:8 by axeld, 11 years ago

I was obviously not really awake when I added that debug output. Not only did I write "3%" instead of "%3", I also mixed up block offset, and block length in the AHCI driver. 65535 is just the maximum number of blocks that the ATA spec allows there.

Anyway, those numbers look just right. However, they are far from complete. How did you receive those lines?
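
To illustrate why 65535 shows up so often: in the ATA TRIM payload each range entry is 8 bytes, with a 48-bit starting LBA and a 16-bit block count, so a long free extent has to be split into pieces of at most 65535 blocks. A minimal sketch follows; the helper names are hypothetical and not the driver's.

```python
import struct

MAX_BLOCKS_PER_RANGE = 0xFFFF  # limit of the 16-bit length field


def split_extent(lba, count):
    """Split one extent into (lba, length) pieces of at most 65535 blocks."""
    pieces = []
    while count > 0:
        length = min(count, MAX_BLOCKS_PER_RANGE)
        pieces.append((lba, length))
        lba += length
        count -= length
    return pieces


def encode_range(lba, length):
    """Pack one range entry: bits 0-47 hold the LBA, bits 48-63 the length."""
    if not 0 <= lba < (1 << 48):
        raise ValueError("LBA does not fit in 48 bits")
    if not 0 < length <= MAX_BLOCKS_PER_RANGE:
        raise ValueError("length must be 1..65535")
    return struct.pack("<Q", (length << 48) | lba)
```

A free extent of 65535 + 19708 blocks would come out as two entries, which matches the shape of the "65535 : 19708" line quoted in comment:7.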

comment:9 by kallisti5, 10 years ago

Milestone: changed from R1/alpha5 to R1/beta1

Pushing to R1B1.

We need to include a blurb in the R1A5 release notes that trim functionality is experimental and may eat your data.

comment:10 by axeld, 10 years ago

Milestone: changed from R1/beta1 to R1/alpha5

Since this still requires action for R1a5, I changed the milestone back, so that it is not forgotten.

It might make more sense to remove the fstrim command unless it gets more testing before the release.

comment:11 by axeld, 10 years ago

FWIW I managed to duplicate kallisti5's issue using VirtualBox. Since version 4.2 it can use trimming to shrink dynamically sized VDIs; one just has to enable it manually using: $ vboxmanage storageattach Haiku --storagectl "SATA Controller" --port 0 --discard on

Where "Haiku" needs to be replaced by your VM name, and you may also need to specify a different port.

The result after an fstrim command is an unbootable system, hooray! I'll look into it over the next few months ;-)

comment:12 by pulkomandy, 10 years ago

Milestone: changed from R1/alpha5 to R1/beta1

comment:13 by kallisti5, 9 years ago

Keywords: TRIM fstrim added
Summary: changed from "TRIM fails on OCZ Agility 3" to "TRIM / fstrim can destroy data on SSD's when executed"

comment:14 by axeld, 9 years ago

I've looked into this, but after having compiled my own version of VirtualBox that adds debugging output to the trimming, I can't reproduce it anymore at all -- it works like a charm over here.

Does anyone else feel like trying again on real hardware? ;-)

comment:15 by pulkomandy, 9 years ago

Last time I tried, it didn't seem to destroy any data for me, but it was very slow with the disk spending a lot of time handling each trim command sent, and ultimately replying with failure. SSDs of small capacity should be cheap and easy to come by these days, I would suggest dedicating one to testing purposes?

comment:16 by axeld, 9 years ago

It should be slow ATM, as we use the synchronous version of the trim command (the queueable one didn't exist back then, but we don't support command queuing anyway). But since you currently have to manually issue the command, that shouldn't be much of an issue. Replying with failure is more of a problem, though.

Anyway, I don't think a single SSD will do the trick. We should test with a number of different ones to be sure it works as it should.

comment:17 by pulkomandy, 9 years ago

I tested this again on my machine (Intel SSDNow) on a partition with only data and it apparently erased the whole disk, or at least the boot sector and part of the system partition (then I got a KDL before I could capture any logs).

comment:18 by kallisti5, 8 years ago

fstrim still "seems" to work on VirtualBox as of hrev50590 / x86_64. (I even got zesty and did a while true; do fstrim /boot; sleep 1; done)

I wonder if the AHCI work had any impact on real hardware TRIM?

comment:19 by kallisti5, 8 years ago

actually.. on reboot after issuing lots of fstrim commands, the OS no longer boots... seems like it definitely still has the potential to destroy data in VirtualBox or on real hardware.

comment:20 by pulkomandy, 8 years ago

In hrev50664 I added support for trim to our ramdisk device. This makes it possible to test the BFS code with a different disk driver. (and is also useful to release the RAM used by the ramdisk when space is free on the filesystem).

I did that (with an 8MB disk image) and did not manage to corrupt the filesystem yet. So either the bug is in the SCSI implementation of trim, or it needs a disk larger than 8MB to start having problems. Anyway, this makes it possible to test the BFS side of the code without running it on actual data.

comment:21 by kallisti5, 8 years ago

I'm not sure we should ship fstrim with R1 if there is a good chance of data loss. We may need to disable fstrim in the R1 branch unless this one is fixed.

comment:22 by pulkomandy, 8 years ago

As mentioned above, the command itself and the BFS logic were shown to work fine when using a RAM disk. If we remove anything, it would be the support in the ATA driver. I do plan to get back to this and try to fix the problems, I have a spare SSD to experiment with.

comment:23 by axeld, 8 years ago

Sure, if we don't get it ready in time, we should simply remove the fstrim command; it doesn't serve any purpose then anyway.

Since I could reproduce the issue with VirtualBox, I created a debug version of it (that gives me more insight into what ends up in the device), to see what is going wrong. But of course, I didn't manage to reproduce it with that version anymore. Maybe it only happens when doing a bit more before trimming.

comment:24 by pulkomandy, 8 years ago

I have set up a machine for testing purposes. Thinkpad X200 with Kingston 60GB SSD, SV300S37A60G. I'm using a 3GB partition near the start of the disk, on which I installed Haiku. I am trimming the boot volume, without other activity happening.

So far I have not managed to corrupt the drive this way. However, the trim command will time out. The HDD led stays on for some time, but then the command is aborted.

port reset: port 0 undergoing COMRESET
ExecuteAtaRequest port 0: device timeout
sata_request::abort called for command 0x06
trim failed (64 ranges)!

It seems the command is simply taking too long to execute, and eventually the port is reset to "unlock" the situation. On this SSD, it seems to not have any effect (trimming a partition that was cleaned with dd if=/dev/zero first does not change its data). But, it could be that other drives/firmwares are much less happy about being reset while they are TRIMing stuff, and it could lead to loss of data if they don't handle their transactions properly?

I'm going to reduce the number of blocks to trim per command, so that it executes faster and does not time out.

comment:25 by pulkomandy, 8 years ago

I tried various things:

  • Always send only 1 range: same timeout
  • Reduce all ranges to only 1 sector: same timeout

I noticed that the command expects a number of "range blocks" to trim (a block is 512 bytes, or 64 entries of 8 bytes each). For the last command we send, there are fewer than 64 entries, and we send a shorter command block. Does that work? Or should we round the buffer up to the next multiple of 512 bytes and fill it with zeros? The spec says that unused entries in a block should have their "range" field set to 0. I'm wondering if this could cause the disk to interpret random data at the end of the buffer as trim commands, which would lead to erasing random areas of the disk.
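
The padding question can be sketched as follows, assuming (as the spec reading above suggests) that unused entries must be zero so that they describe no range. The sketch takes already-encoded 8-byte entries; the function name is hypothetical and this is not the driver's code.

```python
RANGE_BLOCK_SIZE = 512  # bytes per range block
ENTRY_SIZE = 8          # bytes per range entry


def build_payload(entries):
    """Concatenate 8-byte range entries and zero-pad to a 512-byte multiple."""
    for entry in entries:
        if len(entry) != ENTRY_SIZE:
            raise ValueError("each entry must be exactly 8 bytes")
    data = b"".join(entries)
    remainder = len(data) % RANGE_BLOCK_SIZE
    if remainder:
        # Zero-filled tail entries have a length field of 0, i.e. no range.
        data += bytes(RANGE_BLOCK_SIZE - remainder)
    return data
```

With 65 entries this yields two full blocks, the second one holding a single real entry followed by zeros, rather than a short buffer whose tail is whatever happened to be in memory.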

But first, I need to understand why the command times out, even with small ranges. I did see some disk sectors turning into 0xFF, so it is at least partially working.

comment:26 by Giova84, 7 years ago

Haiku hrev51346 gcc2h on a Samsung 850 EVO SSD (250 GiB with two 125 GiB partitions; Haiku is on the 2nd partition).

When I ran fstrim /boot everything went fine, no errors or troubles occurred. Just out of curiosity: it took 42 seconds to trim 24410886144 bytes, which is about 24 gigabytes. Why? I only have 1.71 GiB of files on the Haiku partition.

comment:27 by kallisti5, 7 years ago

It does indeed work on some SSDs.

I would reboot and ensure all your data is safe before calling things 100% ok :-) Keep in mind trim can damage data on other partitions as well beyond Haiku. I've seen trim fail and corrupt data on OCZ and Sandisk SSDs.

Giova84: Could you grab a syslog from that trim you executed and post it here? (/var/log/syslog.old if you've rebooted, otherwise /var/log/syslog)

comment:28 by axeld, 7 years ago

Not sure if you mean the size or the time it requires:

  1. Trimming is not necessarily a fast operation. That's why it's usually not done when actually deleting files, but later on as some kind of scrubbing service.
  2. Trimming clears free space, not used space. The more files there are on your partition, the less space is subject to trimming.

comment:29 by pulkomandy, 7 years ago

If the partition is 125GB big, and there is only 1.71GB used, then trim should clear about 123GB. So only 24GB sounds wrong? Where did these extra 100GB go?
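
As a quick check of the numbers in this exchange, here is a small sketch. The partition size and used space are the figures reported in comment:26; treating both as GiB is an assumption.

```python
GIB = 1024 ** 3


def to_gib(byte_count):
    """Convert a byte count to GiB."""
    return byte_count / GIB


partition_gib = 125.0              # from comment:26
used_gib = 1.71                    # from comment:26
trimmed_gib = to_gib(24410886144)  # value printed by fstrim

expected_gib = partition_gib - used_gib
print(f"trimmed {trimmed_gib:.2f} GiB, expected about {expected_gib:.2f} GiB")
```

The reported amount comes out at well under a quarter of the free space that should have been eligible for trimming.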

comment:30 by axeld, 7 years ago

Oh, you're right Adrien, I overlooked that. In that case, I don't have an explanation without looking deeper. In any case, 42 seconds is very long, too.

comment:31 by Giova84, 7 years ago

To do some more testing, yesterday I downloaded some zip files with source code, unzipped them on the disk, compiled them and then deleted everything (well, everything could be about 10 MiB in total). I then ran fstrim again, and it took about one minute to trim 244082240128 bytes (yes: about 244 gigabytes!). No data corruption occurred on either partition (the first one holds Win 7 on NTFS, however).

After reading kallisti5's suggestion, today I ran fstrim again, and this time, after one minute, fstrim triggered a KDL about vm_page_fault and read_fault (sorry, ATM I don't have a camera to take pictures). After this KDL I had to force a reboot, since it was impossible to exit from the KDL properly. At the next boot no corruption was present on the partition and, as far as I can tell, no data was lost. However, I have saved both logs (one of them mentions fstrim and the KDL), which I attach here.

by Giova84, 7 years ago

Attachment: previous_syslog added

syslog with info about KDL and fstrim

comment:32 by Giova84, 7 years ago

patch: changed from 0 to 1

by Giova84, 7 years ago

Attachment: syslog added

comment:33 by Giova84, 7 years ago

PS: I noticed that when I run checkfs -c /boot after the fstrim command, the node count decreases. E.g. before fstrim, checkfs -c reports 7756 nodes; after fstrim it reports 7589 nodes. However, as I said, no data seems lost and checkfs gives no errors. Does this mean that the fstrim command works properly?

comment:34 by Giova84, 7 years ago

I keep doing little tests.

After rebooting Haiku I deleted a zip file of about 5 MiB, then I ran fstrim again: it immediately (not after some time) triggered the same KDL and again I had to force the reboot (using CTRL ALT DEL). At the next boot I was puzzled, because yesterday the fstrim command ran fine without trouble, so I tried again, but this time I ran the sync command before fstrim. I don't know if it was just chance, but this time fstrim didn't trigger the KDL. Like yesterday, it took 42 seconds to trim 24412811264 bytes.

Obviously I want to avoid damaging the disk or the Haiku partition, and I want to avoid losing or damaging my data (for what it's worth, checkfs gave no errors). After some searching on Google, I read that a manual fstrim (I read some forums of Linux users, since they also have the fstrim command, and some people run fstrim using cron) usually should be run daily or weekly.

I'd like to properly maintain my SSD. So: how can I check whether the fstrim command on Haiku really clears free space?

comment:35 by pulkomandy, 7 years ago

What I did to test this (but it is a destructive test):

  1. with dd, clear a section of the partition with all 0xE5 (or some other value, or use data from /dev/random)
  2. format the partition as bfs (only some sectors are modified)
  3. run fstrim on the partition

If fstrim works properly, the fixed value used at step 1 should be gone from the sectors, and the default erased value of the disk should be there instead (usually 00 or FF). You can check this with DiskProbe.
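
The check at the end of this procedure can be sketched as a small classifier for a block read back from the partition. This is only an illustration of the comparison described above; the function name is hypothetical, and how the block is read (DiskProbe, dd, ...) is left out on purpose.

```python
def classify_block(block, fill_byte=0xE5):
    """Classify a block read back after fstrim.

    Returns "untrimmed" if it still holds the fill pattern, "erased" if it
    is all 0x00 or all 0xFF, and "other" for anything else (for example
    filesystem structures written by the format step).
    """
    if not block:
        raise ValueError("empty block")
    unique = set(block)
    if unique == {fill_byte}:
        return "untrimmed"
    if unique == {0x00} or unique == {0xFF}:
        return "erased"
    return "other"
```

Note that the fill value should not itself be 0x00 or 0xFF, otherwise an untouched block cannot be told apart from an erased one.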

Note that current SSDs do well even without trimming, there is a performance loss but not a lifetime reduction as it used to be.

comment:36 by Giova84, 7 years ago

A destructive test is the last thing I would want to do (well, if necessary, since I want to be sure that fstrim really works for me, I will do it: please explain to me, step by step, how to clear a section of the partition with all 0xE5 using dd, and where to look). So I've tried another test, although I'm not sure it is really reliable.

On the Haiku partition I have a zip file for the MAME emulator; the file is called cheat.dat. It contains some lines of text description, like "this is the cheat file. For more info visit the site www.mame.co.uk", plus more entries.

When I probe the Haiku partition (/dev/disk/scsi/0/0/0/1) with DiskProbe and look at block 0xc8b7e6, I can in fact see the text content of the cheat.dat file.

Then I deleted the cheat.dat file, ran sync and then fstrim (which trimmed 24458731520 bytes) and analyzed the /dev/disk/scsi/0/0/0/1 partition again. At block 0xc8b7e6 there was still the content of the cheat.dat file.

Have I done a reliable test or a useless one? Please forgive me, but I'm not much of an expert.

comment:37 by pulkomandy, 7 years ago

Yes, in that case the block should be erased as well. So it looks like in your case, fstrim does nothing at all, or maybe not as much as it could.

in reply to:  37 comment:38 by anevilyak, 7 years ago

Replying to pulkomandy:

Yes, in that case the block should be erased as well. So it looks like in your case, fstrim does nothing at all, or maybe not as much as it could.

Is it actually required/guaranteed for the SSD controller to execute that command synchronously? Or for that matter, to physically erase the page at that point? Depending on the impl, it could conceivably simply mark the page as available for erasure internally, and not actually touch it until needed, but I'm not so familiar with the details of the specs.

comment:39 by pulkomandy, 7 years ago

According to Wikipedia:

There are different types of TRIM defined by SATA Words 69 and 169 returned from an ATA IDENTIFY DEVICE command:

  • Non-deterministic TRIM: Each read command to the Logical block address (LBA) after a TRIM may return different data.
  • Deterministic TRIM (DRAT): All read commands to the LBA after a TRIM shall return the same data, or become determinate.
  • Deterministic Read Zero after TRIM (RZAT): All read commands to the LBA after a TRIM shall return zero.

So, it depends on the disk and needs to be checked in the device identification.
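
The three cases above could be told apart roughly as follows. The bit positions used here (word 169 bit 0 for TRIM support, word 69 bit 14 for deterministic reads and bit 5 for read-zero) are an assumption based on the ATA ACS-2 IDENTIFY DEVICE layout and should be checked against the spec; the function name is hypothetical.

```python
def trim_capabilities(identify_words):
    """Decode TRIM behaviour from the 16-bit IDENTIFY DEVICE words."""
    if len(identify_words) < 170:
        raise ValueError("need at least words 0..169")
    word69 = identify_words[69]
    word169 = identify_words[169]
    supported = bool(word169 & 0x0001)        # assumed: bit 0 = TRIM supported
    deterministic = bool(word69 & (1 << 14))  # assumed: bit 14 = DRAT
    read_zero = bool(word69 & (1 << 5))       # assumed: bit 5 = RZAT
    if not supported:
        return "no TRIM"
    if deterministic and read_zero:
        return "RZAT"
    if deterministic:
        return "DRAT"
    return "non-deterministic"
```

Only a drive reporting read-zero behaviour can be expected to show zeroed blocks in DiskProbe after a successful trim; for the other types, unchanged or arbitrary data does not by itself prove that the trim failed.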

comment:40 by axeld, 7 years ago

Also, it's not a good test unless you a) reboot (to make sure no one has the file open still), and b) have run checkfs on the partition, to make sure its space has really been freed.

comment:41 by Giova84, 7 years ago

After such test I rebooted Haiku and ran checkfs: it told me nothing about freed space.

in reply to:  41 comment:42 by anevilyak, 7 years ago

Replying to Giova84:

After such test I rebooted Haiku and ran checkfs: it told me nothing about freed space.

The only time checkfs frees up space is hypothetically in the case of a power loss or other catastrophic crash that doesn't allow the filesystem to unmount cleanly. Under normal circumstances, space is freed automatically by filesystem operations, so checkfs won't have anything to report.

comment:43 by axeld, 7 years ago

That procedure would just make sure that the space is actually freed. Afterwards, you'd have to trim.

comment:44 by Giova84, 7 years ago

I've done the "ultimate" test: first of all I backed up all my data to another partition (BeFS) on another disk.

Then I booted the Haiku live CD (hrev51346 gcc2h) and opened DiskProbe to probe the SSD's Haiku partition, and the block (0xc8b7e6) still showed the content of the file which I deleted. Then I ran fstrim again on the SSD partition (still from the live CD) and triggered the same KDL again; I rebooted into the live CD and, before running fstrim again, I ran the sync command: this time fstrim did the job (4749070336 bytes in 15 seconds). I rebooted again into the live CD and checked the SSD partition with DiskProbe again: block 0xc8b7e6 still had the content of the deleted file.

Then I deleted the BeFS partition on the SSD and created it from scratch: I rebooted the live CD again, checked the SSD partition with DiskProbe, and block 0xc8b7e6 still contained the content of my previous installation. After a deeper check, all the content of my text files was still present on the empty disk as well, despite various fstrim runs.

comment:45 by axeld, 7 years ago

I'm afraid it's not an ultimate test either: the drive combines several blocks together as an "erase block". AFAIK this is about 1.5 MB on the EVO. This means that if BFS could not trim even a single disk block (4K) within such a 1.5 MB erase block, the drive cannot erase that erase block just yet. So you might just have hit such a situation.

But anyway, until trim works reliably, it won't be part of the release.
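
The erase-block argument can be sketched like this. It is a simplification: it assumes erase blocks are aligned to logical offsets, whereas the real mapping is internal to the drive, and the erase block size is passed in as a parameter because the figure quoted here is only approximate. The function name is hypothetical.

```python
def fully_free_erase_blocks(free_extents, erase_block_size):
    """Return indices of erase blocks entirely covered by free extents.

    free_extents: iterable of (offset, length) in bytes, non-overlapping.
    erase_block_size: erase block size in bytes (drive specific).
    """
    if erase_block_size <= 0:
        raise ValueError("erase_block_size must be positive")
    covered = {}
    for offset, length in free_extents:
        end = offset + length
        pos = offset
        while pos < end:
            index = pos // erase_block_size
            block_end = (index + 1) * erase_block_size
            chunk = min(end, block_end) - pos
            covered[index] = covered.get(index, 0) + chunk
            pos += chunk
    return sorted(i for i, n in covered.items() if n == erase_block_size)
```

In this model a single in-use 4K block is enough to keep the whole erase block around it from being reported as erasable, so old data in its neighbouring blocks can remain readable after a trim.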

comment:46 by Giova84, 7 years ago

You're right, Axel. However, I have some more info. I bought this SSD on the 6th of August, and I've checked the S.M.A.R.T. status day by day and it was always OK. I can say for sure that the various KDLs which I encountered on Haiku with fstrim caused the SMART errors C7 "CRC Error" and EB "POR Recovery Count", which usually occur when the system doesn't shut down cleanly or when the SATA cable is poor. In fact, when the KDLs occurred, I had to force the reboot using CTRL ALT DEL, and my SATA cable is good. Currently, after all the KDLs which I encountered, the C7 counter has reached the value 000000000008 (8) and the EB counter a value of 4: in fact I had 4 KDLs in total. When I shut down or reboot cleanly, these values don't increase.

However, if I wanted to really and properly free the space on the BeFS partition of my SSD, what could I do? Sorry, but as I've said, I'm not an expert in these things. Thank you for your patience, really!

POST EDITED

Last edited 7 years ago by Giova84

comment:47 by Giova84, 7 years ago

However, if I wanted to really and properly free the space on the BeFS partition of my SSD, what could I do? Sorry, but as I've said, I'm not an expert in these things. Thank you for your patience, really!

Sorry, I wasn't clear: I meant to ask whether there is an alternative way to trim the SSD's BeFS partition. As Pulkomandy said, it is a matter of filesystem support, so I guess that "Samsung Magician" can't run trim on the BeFS partition either, because it doesn't know anything about BeFS. As I said previously, deleting and recreating the partition obviously did not free the space on the SSD. Does anyone know of a "universal" utility that executes trim regardless of the filesystem? I also read that this could depend on the drive's hardware controller. I ask all these questions because I still see all my old data (using DiskProbe) despite fstrim and the re-initialization of the partition, and because I have the habit of filling my Haiku partitions with a lot of data which I often delete.

comment:48 by pulkomandy, 7 years ago

patch: 10

comment:49 by pulkomandy, 7 years ago

patch: 0

Ouch, what happened to that previous_syslog file? It's filled with corrupt characters at the end. Did fstrim erase some memory?

comment:50 by Giova84, 7 years ago

I'm not able to tell you if fstrim erased some memory :-)

However, I checked whether trim works properly under Windows 7 by following these instructions: http://www.win-raid.com/t24f34-Easy-TRIM-test-methods.html (the paragraph "B. Easy and very effective TRIM test by using a Hex Editor"). There my SSD was trimmed properly, so it seems that fstrim doesn't work under Haiku, at least for me.

I tried the same test again under Haiku: I triggered the same KDL again, followed by a forced, unclean reboot, and the SMART value C7 "CRC Error count" increased by one (from 8 to 9). I am completely sure that this happens after the KDL, when I am forced to reboot.

comment:51 by waddlesplash, 6 years ago

Milestone: R1/beta1 → Unscheduled
patch: 0
Priority: blocker → high

SCSI TRIM disabled in hrev52134; removing from beta1.

comment:52 by pulkomandy, 6 years ago

I would have disabled it only in the beta1 branch. Maybe it's time to create the branch?

Last edited 6 years ago by pulkomandy

comment:53 by waddlesplash, 6 years ago

Why? It is known to be broken, and so until someone has time to fix it, it does not make sense to leave it enabled, even on nightlies. And no, I'm holding off beta branch creation until we fix the other remaining blockers.

comment:54 by bitigchi, 4 years ago

Maybe a candidate for Beta 3? It's an important feature that continues to affect many more with each passing day, as SSDs are getting more and more common.

comment:55 by pulkomandy, 4 years ago

It's not really that important on modern SSDs, just a nice-to-have performance speedup.

Was it tested on NVMe? I think the problem is in the SATA driver so it should work there.

comment:56 by waddlesplash, 4 years ago

I never implemented the ioctl on NVMe due to concerns about whether or not it corrupts disks due to a BFS driver bug or due to a SATA driver bug, and I didn't feel like investigating with all the other NVMe bugs at the time. Maybe it should be revisited.

comment:57 by pulkomandy, 4 years ago

I had implemented it for ramdisks (freeing the memory to other things in the OS) and found no problems there, but I had not done a lot of testing. But given the failure pattern (things like the SSD not even being seen by the BIOS after a failed fstrim until a cold reboot) it seems very unlikely that the main problem is with BFS.

comment:58 by pulkomandy, 4 years ago

Hi,

Today I added support to the SD/MMC driver. I did a test on a mostly empty BFS filesystem on my SD card (which explains the rather large areas being trimmed, and there are few of them).

Here is the log of trimming:

KERN: TRIM FS:
KERN: [  0] 8884224 : 1064857600
KERN: [  1] 1073745920 : 1073737728
KERN: [  2] 2147487744 : 1623191552
KERN: mmc_disk: trim_device()
KERN: mmc_disk: trim 1064857600 bytes from 8884224
KERN: sdhci_pci: ExecuteCommand(32, 43c8)
KERN: sdhci_pci: ExecuteCommand(33, 200000)
KERN: sdhci_pci: ExecuteCommand(38, 1)
KERN: mmc_disk: trim 1073737728 bytes from 1073745920
KERN: sdhci_pci: ExecuteCommand(32, 200008)
KERN: sdhci_pci: ExecuteCommand(33, 400000)
KERN: sdhci_pci: ExecuteCommand(38, 1)
KERN: mmc_disk: trim 1623191552 bytes from 2147487744
KERN: sdhci_pci: ExecuteCommand(32, 400008)
KERN: sdhci_pci: ExecuteCommand(33, 706000)
KERN: sdhci_pci: ExecuteCommand(38, 1)
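
The command arguments in this log appear to be the byte offsets from the TRIM FS lines divided by a 512-byte sector size. The following sketch reproduces that arithmetic; the sector size is inferred from the logged values, and `EraseArguments` and `ToEraseArguments` are hypothetical names for this example, not the mmc_disk driver's code.

```cpp
#include <cstdint>

// Assumption: 512-byte sectors, inferred from the values in the log.
constexpr uint64_t kSectorSize = 512;

// Hypothetical type for illustration only.
struct EraseArguments {
	uint32_t start;	// value logged as the argument of command 32
	uint32_t end;	// value logged as the argument of command 33
};

// Converts a byte range into the sector arguments seen in the log.
// As logged, the end value is the first sector after the range.
constexpr EraseArguments ToEraseArguments(uint64_t offset, uint64_t size)
{
	return EraseArguments{
		static_cast<uint32_t>(offset / kSectorSize),
		static_cast<uint32_t>((offset + size) / kSectorSize)
	};
}
```

For the first range, `ToEraseArguments(8884224, 1064857600)` gives 0x43c8 and 0x200000, which matches the two ExecuteCommand lines that follow the first "mmc_disk: trim" message.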

And here is the log of trying to unmount then remount the partition:

KERN: sdhci_pci: Read 1024 bytes at 4194304
KERN: sdhci_pci: ExecuteCommand(18, 2000)
KERN: sdhci_pci: Read 4096 bytes at 4196352
KERN: sdhci_pci: ExecuteCommand(18, 2004)
KERN: sdhci_pci: Read 4096 bytes at 4200448
KERN: sdhci_pci: ExecuteCommand(18, 200c)
KERN: sdhci_pci: Read 2048 bytes at 1077936128 ***
KERN: sdhci_pci: ExecuteCommand(18, 202000)
bfs: KERN: inode at block 524288 corrupt!
sdhci_pci: Read 4096 bytes at 4204544
sdhci_pci: ExecuteCommand(18, 2014)
bfs: KERN: could not create root node!

I have annotated a line with ***. As you can see, BFS tries to read at an offset that was erased by the trimming. So it looks like there indeed is a problem in the BFS code or in the partitioning system manager (which I understand should translate BFS requests into offsets on the raw disk). The trimming seems to have worked: the data is no longer there, and BFS fails to mount the partition.

Then I made a test with 4 smaller partitions on the SD card, and ran fstrim on each of them. For all 4 of them, fstrim ends up erasing data about 4MB into the disk (so it always starts erasing data that's in the first partition). It really looks like the partition start offset isn't taken into account?
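
The numbers in the two logs can be cross-checked with a little arithmetic. The sketch below assumes a partition start of 4194304 bytes (the offset of the first read in the remount log) and a 2048-byte BFS block size (the size of the annotated read); both are inferred from the logs rather than stated in the ticket, and the helper names are made up for this example.

```cpp
#include <cstdint>

// Inferred from the logs, not stated explicitly in the ticket.
constexpr uint64_t kPartitionStart = 4194304;
constexpr uint64_t kBlockSize = 2048;
constexpr uint64_t kRootNodeBlock = 524288;	// "inode at block 524288"

// Disk offset at which the root inode is read after partition translation.
constexpr uint64_t RootNodeDiskOffset()
{
	return kPartitionStart + kRootNodeBlock * kBlockSize;
}

// True if offset lies inside [rangeOffset, rangeOffset + rangeSize).
constexpr bool Contains(uint64_t rangeOffset, uint64_t rangeSize,
	uint64_t offset)
{
	return offset >= rangeOffset && offset - rangeOffset < rangeSize;
}
```

Under these assumptions `RootNodeDiskOffset()` is 1077936128, the offset of the annotated read. Trim range [1] (1073745920 : 1073737728) contains that offset when applied to the raw disk as logged, but no longer contains it once the range is shifted by the partition start.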

Last edited 4 years ago by pulkomandy

comment:59 by pulkomandy, 4 years ago

So I dug a bit further into this...

I looked at the bfs code and confirmed that it uses read_pos and write_pos to access the partition expecting that the partition device will do the translation. For example, reading at offset 0 gets the superblock.

I then checked where this translation is done. It appears to be in src/kernel/device_manager/devfs.cpp using the translate_partition_access. However, this translation is currently not done for the B_TRIM_DEVICE ioctl. As a result, the trim is executed using the start of the disk as a reference point, instead of the start of the partition. And, quite possibly, one ends up erasing the partition table, or in general, things that should not have been erased.

This also explains why I had no problems when testing with the ramdisk: I had not created a partition table there, and the offset between the filesystem and the disk did, in fact, match. I suspect when Axel tested on virtualbox, he used a similar setup?
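
The missing translation can be sketched as follows: each partition-relative range is clamped to the partition size and then shifted by the partition's start offset. This is a minimal illustration under assumptions, not the devfs code; `TrimRange` and `TranslateTrimRanges` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical type for illustration: one range of a trim request.
struct TrimRange {
	uint64_t offset;
	uint64_t size;
};

// Converts partition-relative trim ranges into raw-disk ranges.
// Ranges are clamped so that they never extend past the partition end.
std::vector<TrimRange> TranslateTrimRanges(
	const std::vector<TrimRange>& ranges, uint64_t partitionOffset,
	uint64_t partitionSize)
{
	std::vector<TrimRange> result;
	for (const TrimRange& range : ranges) {
		if (range.offset >= partitionSize)
			continue;	// starts beyond the partition, nothing to trim

		// Clamp without computing offset + size, which could overflow.
		uint64_t size = range.size;
		if (size > partitionSize - range.offset)
			size = partitionSize - range.offset;
		if (size == 0)
			continue;

		result.push_back({partitionOffset + range.offset, size});
	}
	return result;
}
```

The clamping matters for requests that cover "everything", such as a single range from 0 to the maximum 64-bit value: after translation it covers exactly the partition and nothing beyond it.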

comment:61 by pulkomandy, 4 years ago

I'm looking at the fstrim source code and I'm wondering... @kallisti5, do you remember how you used it?

It appears fstrim just opens a path and sends a B_TRIM_DEVICE for the range 0 to uint64_max. Apparently, if you do that on the mount point (fstrim /boot), the ioctl will be handled by bfs, which will then trim only the relevant parts of the underlying disk partition. However, if you run it on the disk device (fstrim /dev/disk/...), it will bypass the filesystem part, and simply erase the whole partition. Combined with the previous bug (see my above comments), it would in fact erase the start of the disk, for a size equal to the partition size.

Is there any chance that's what you tried when you ended up with a completely erased disk?

Probably we should guard against that?

comment:62 by waddlesplash, 4 years ago

TRIM seems to be a nonsensical command to send to raw devices directly instead of through a filesystem, so yes, we should probably prohibit that.

comment:63 by pulkomandy, 4 years ago

Well, it can make sense if you want to reset a whole disk; for example, DriveSetup could do that when creating a new partition table on a disk. But it probably shouldn't be done from the fstrim command, at least not without an extra flag to enable that behavior.
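
A guard along the lines discussed above could look roughly like this. It is only a sketch: the "/dev/" prefix test and the `forceRawDevice` option are assumptions made for illustration, and the function name is hypothetical. An actual check in fstrim might inspect the file type instead of the path.

```cpp
#include <string>

// Hypothetical policy check: trimming a path below /dev/ is refused
// unless the caller explicitly opted in.
bool ShouldAllowTrim(const std::string& path, bool forceRawDevice)
{
	const std::string devicePrefix = "/dev/";
	const bool isRawDevice
		= path.compare(0, devicePrefix.size(), devicePrefix) == 0;

	return !isRawDevice || forceRawDevice;
}
```

With this check, trimming a mount point such as /boot is always allowed, while trimming a node below /dev/disk/ requires the explicit opt-in.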

in reply to:  24 comment:64 by pulkomandy, 4 years ago

Replying to pulkomandy:

I have set up a machine for testing purposes. Thinkpad X200 with Kingston 60GB SSD, SV300S37A60G. I'm using a 3GB partition near the start of the disk, on which I installed Haiku. I am trimming the boot volume, without other activity happening.

So far I have not managed to corrupt the drive this way. However, the trim command will time out. The HDD led stays on for some time, but then the command is aborted.

port reset: port 0 undergoing COMRESET
ExecuteAtaRequest port 0: device timeout
sata_request::abort called for command 0x06
trim failed (64 ranges)!

Found out that I still have that particular SSD around (now in a different machine). Same problem still happens as of hrev54949 (after re-enabling trim support in the AHCI driver).

I increased the timeout from 20 to 2000 seconds (just in case the problem was indeed that we didn't wait long enough). It still timed out, and then I got a KDL, null pointer dereference in BFS BlockAllocator::Trim.

comment:65 by dasebek, 3 years ago

For me, TRIM support is especially useful in virtual machines with dynamically allocated storage. I use KVM with Virtio-SCSI storage. Currently, there are some pieces missing in the SCSI code that are necessary to make TRIM work properly. For example, reading VPD pages of the device to get information about the maximum supported size of an unmapped block, the correct SCSI operation to use (unmap/writesame 16/writesame 10), etc.
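
Once the device's maximum unmap size is known, larger ranges have to be split before they are sent. The following is a minimal sketch of that splitting step, assuming the limit is expressed as a block count per command; `BlockRange` and `SplitRange` are hypothetical names and this is not the SCSI driver's code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration: a run of logical blocks.
struct BlockRange {
	uint64_t lba;
	uint64_t blockCount;
};

// Splits one range into chunks of at most maxBlocksPerCommand blocks.
// Returns an empty list if the limit is zero (unmap not usable).
std::vector<BlockRange> SplitRange(BlockRange range,
	uint64_t maxBlocksPerCommand)
{
	std::vector<BlockRange> chunks;
	if (maxBlocksPerCommand == 0)
		return chunks;

	while (range.blockCount > 0) {
		const uint64_t count
			= std::min(range.blockCount, maxBlocksPerCommand);
		chunks.push_back({range.lba, count});
		range.lba += count;
		range.blockCount -= count;
	}
	return chunks;
}
```

For example, a range of 10 blocks with a limit of 4 blocks per command is split into chunks of 4, 4 and 2 blocks.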

I will have some time later this week and next week, so I am planning to review the existing code and implement some of the missing features.

Last edited 3 years ago by dasebek

comment:66 by waddlesplash, 3 years ago

If you are using TRIM in a VM, you may find it much easier to just implement it in the NVMe driver and use that instead of SCSI. I think the underlying libnvme already supports it, so the relevant ioctl just needs to be wired up inside nvme_disk.

comment:67 by dasebek, 3 years ago

Here is an update on my effort to improve SCSI trim support. I think I now have working trim support on SCSI and SATA drives, plus some other minor fixes. I need to do more testing, but so far it works reliably. I am hoping to submit my code for review later this week.

comment:68 by kallisti5, 3 years ago

nice! Keep in mind @dasebek you're running a little late for this to be included in R1/Beta3.

Excited to see the patch in review.haiku-os.org :-)

comment:69 by dasebek, 3 years ago

Here are my improvements to bin/fstrim:

https://review.haiku-os.org/c/haiku/+/4154

and to bfs, devfs, scsi:

https://review.haiku-os.org/c/haiku/+/4155

https://review.haiku-os.org/c/haiku/+/4156

https://review.haiku-os.org/c/haiku/+/4157

I tested it in a KVM virtual machine (both Virtio-SCSI and SATA) and on my old laptop's SSD (Samsung SSD 850 EVO 250GB SATA). I haven't hit any data corruption and the trim operation was always quick. Trimmed regions return zeros on subsequent reads and the size of a virtual machine's disk image shrinks as expected.

comment:70 by pulkomandy, 3 years ago

Patches merged in hrev55239. Please confirm the issue is fixed :)

comment:71 by waddlesplash, 2 years ago

Milestone: Unscheduled → R1/beta4
Resolution: fixed
Status: in-progress → closed

I think we can consider this one fixed at this point; I also implemented NVMe TRIM in the meantime.
