Opened 5 years ago

Last modified 5 months ago

#10336 in-progress bug

TRIM / fstrim can destroy data on SSD's when executed

Reported by: kallisti5 Owned by: axeld
Priority: high Milestone: Unscheduled
Component: Drivers/Disk Version: R1/Development
Keywords: TRIM fstrim Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

fstrim fails to function on OCZ Agility 3. May be due to the ranges being too large for the SSD / AHCI implementation.

Attachments (4)

IMG_20140204_232334.jpg (202.5 KB) - added by kallisti5 5 years ago.
as of hrev46819 shortly before everything locks up and bursts into flames.
trim-oczagility.txt (16.3 KB) - added by kallisti5 5 years ago.
hrev46931
previous_syslog (241.3 KB) - added by Giova84 16 months ago.
syslog with info about KDL and fstrim
syslog (385.9 KB) - added by Giova84 16 months ago.

Change History (57)

comment:1 Changed 5 years ago by kallisti5

First attempt:

KERN: bfs: mounted "Data" (root node at 524288, device = /dev/disk/scsi/0/0/0/1)
KERN: [ACPI Debug]  String KERN: [0x1A] "_Q80 : Temperature Up/Down"
KERN: [ACPI Debug]  String KERN: [0x1A] "_Q80 : Temperature Up/Down"
KERN: slab memory manager: created area 0xd3001000 (8514)
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000001, is 0x40000001, ci 0x00000001
KERN: ahci: ssts 0x00000133
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x00000000
KERN: ahci: sact 0x00000000
KERN: ahci: Task File Error
KERN: ahci: AHCIPort::ResetPort port 0
KERN: ahci: AHCIPort::ResetPort port 0, deviceBusy 0, forceDeviceReset 0
KERN: ahci: AHCIPort::PostReset port 0
KERN: ahci: device signature 0x00000101 (ATA)
KERN: ahci: sata_request::finish ATA command 0x06 failed
KERN: ahci: sata_request::finish status 0x51, error 0x04
KERN: ahci: trim failed (179 ranges)!

Second attempt:

KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000001, is 0x40000001, ci 0x00000001
KERN: ahci: ssts 0x00000133
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x00000000
KERN: ahci: sact 0x00000000
KERN: ahci: Task File Error
KERN: ahci: AHCIPort::ResetPort port 0
KERN: ahci: AHCIPort::ResetPort port 0, deviceBusy 0, forceDeviceReset 0
KERN: ahci: AHCIPort::PostReset port 0
KERN: ahci: device signature 0x00000101 (ATA)
KERN: ahci: sata_request::finish ATA command 0x06 failed
KERN: ahci: sata_request::finish status 0x51, error 0x04
KERN: ahci: trim failed (179 ranges)!
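
For reference, the status and error bytes in these logs can be decoded with the standard ATA register bit names. This is only a sketch: the helper name decode_bits is made up for illustration, and the bit tables list just the commonly used bits. Command 0x06 is the ATA DATA SET MANAGEMENT opcode, which is what carries TRIM.

```python
# ATA status register bits (bit 4 is called DSC in older ATA revisions).
STATUS_BITS = {
    0x80: "BSY",
    0x40: "DRDY",
    0x20: "DF",
    0x10: "DSC",
    0x08: "DRQ",
    0x01: "ERR",
}

# ATA error register bits (subset).
ERROR_BITS = {
    0x80: "ICRC",
    0x40: "UNC",
    0x10: "IDNF",
    0x04: "ABRT",
}


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True)
            if value & mask]


print(decode_bits(0x51, STATUS_BITS))  # ['DRDY', 'DSC', 'ERR']
print(decode_bits(0x04, ERROR_BITS))   # ['ABRT']
```

Read this way, "status 0x51, error 0x04" means the device reported an error and aborted the command, rather than timing out.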

comment:2 Changed 5 years ago by axeld

Owner: changed from nobody to axeld
Status: new → in-progress

AHCI dumps some info about what the disk supports upon boot. Would be nice to have this info included here. In any case, the range limit is not yet correctly implemented; I'm very slowly working on that, I just haven't found much development time lately, and for this I do need some contiguous time span.

comment:3 Changed 5 years ago by kallisti5

Yup. Wasn't poking you too much, just wanted to get the current issues on paper :-)

KERN: ahci: generic AHCI controller found! vendor 0x1022, device 0x7804
KERN: ahci: ahci_register_device
KERN: ahci: ahci_init_driver
KERN: ahci: ahci_sim_init_bus
KERN: ahci: ahci_sim_init_bus: pciDevice 0x82b6b360
KERN: ahci: AHCIController::Init 0:17:0 vendor 1022, device 7804
KERN: ahci: PCI SATA capability found at offset 0x70
KERN: ahci: satacr0 = 0x00100012, satacr1 = 0x0000000f
KERN: ahci: pcicmd old 0x0007
KERN: ahci: pcicmd new 0x0006
KERN: allocate_io_interrupt_vectors: allocated 1 vectors starting from 24
KERN: msi_allocate_vectors: allocated 1 vectors starting from 24
KERN: msi enabled: 0x0089
KERN: ahci: using MSI vector 24
KERN: ahci: registers at 0xf034e000, size 0x800
KERN: ahci: mapping physical address 0xf034e000 with 2048 bytes for AHCI HBA regs
KERN: add_memory_type_range(672, 0xf034e000, 0x1000, 0)
KERN: ahci: physical = 0xf034e000, virtual = 0x81bfc000, offset = 0, phyadr = 0xf034e000, mapadr = 0x81bfc000, size = 4096, area = 0x000002a0
KERN: ahci: cap: Interface Speed Support: generation 3
KERN: ahci: cap: Number of Command Slots: 32 (raw 0x1f)
KERN: ahci: cap: Number of Ports: 2 (raw 0x1)
KERN: ahci: cap: Supports Port Multiplier: yes
KERN: ahci: cap: Supports External SATA: no
KERN: ahci: cap: Enclosure Management Supported: no
KERN: ahci: cap: Supports Command List Override: yes
KERN: ahci: cap: Supports Staggered Spin-up: no
KERN: ahci: cap: Supports Mechanical Presence Switch: yes
KERN: ahci: cap: Supports 64-bit Addressing: yes
KERN: ahci: cap: Supports Native Command Queuing: yes
KERN: ahci: cap: Supports SNotification Register: yes
KERN: ahci: cap: Supports Command List Override: yes
KERN: ahci: cap: Supports AHCI mode only: no
KERN: ahci: ghc: AHCI Enable: yes
KERN: ahci: Ports Implemented Mask: 0x000003
KERN: ahci: Number of Available Ports: 2
KERN: ahci: AHCI Version 1.0
KERN: ahci: Interrupt 24
KERN: ahci: AHCIPort::Init1 port 0
KERN: ahci: allocating 4096 bytes for AHCI port 0
KERN: ahci: area = 673, size = 4096, virt = 0x81bfd000, phy = 0xa0d4000
KERN: ahci: PRD table is at 0x81bfd580
KERN: ahci: AHCIPort::Init1 port 1
KERN: ahci: allocating 4096 bytes for AHCI port 1
KERN: ahci: area = 674, size = 4096, virt = 0x81bfe000, phy = 0xa0d3000
KERN: ahci: PRD table is at 0x81bfe580
KERN: ahci: AHCIPort::Init2 port 0
KERN: ahci: AHCIPort::ResetPort port 0
KERN: ahci: AHCIPort::ResetPort port 0, deviceBusy 0, forceDeviceReset 1
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00400000, ci 0x00000000
KERN: ahci: ssts 0x00000001
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04090000
KERN: ahci: sact 0x00000000
KERN: ahci: PhyReady Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 0, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000133
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x040d0000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::PostReset port 0
KERN: ahci: device signature 0x00000101 (ATA)
KERN: ahci: ie   0x7dc0007f
KERN: ahci: is   0x00000000
KERN: ahci: cmd  0x0000e017
KERN: ahci: ssts 0x00000133
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x00000000
KERN: ahci: sact 0x00000000
KERN: ahci: tfd  0x00000150
KERN: ahci: AHCIPort::Init2 port 1
KERN: ahci: AHCIPort::ResetPort port 1
KERN: ahci: AHCIPort::ResetPort port 1, deviceBusy 0, forceDeviceReset 1
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00400000, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04090000
KERN: ahci: sact 0x00000000
KERN: ahci: PhyReady Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04090000
KERN: ahci: sact 0x00000000
KERN: ahci: PhyReady Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000000
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000001
KERN: ahci: sctl 0x00000301
KERN: ahci: serr 0x04080000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000001
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x040c0000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00000040, ci 0x00000000
KERN: ahci: ssts 0x00000001
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x040c0000
KERN: ahci: sact 0x00000000
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::InterruptErrorHandler port 1, fCommandsActive 0x00000000, is 0x00400040, ci 0x00000000
KERN: ahci: ssts 0x00000113
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x040d0000
KERN: ahci: sact 0x00000000
KERN: ahci: PhyReady Change
KERN: ahci: Port Connect Change
KERN: ahci: AHCIPort::PostReset port 1
KERN: ahci: device signature 0xeb140101 (ATAPI)
KERN: ahci: ie   0x7dc0007f
KERN: ahci: is   0x00000000
KERN: ahci: cmd  0x0100e017
KERN: ahci: ssts 0x00000113
KERN: ahci: sctl 0x00000300
KERN: ahci: serr 0x00000000
KERN: ahci: sact 0x00000000
KERN: ahci: tfd  0x00000100
KERN: ahci: cookie = 0x8280b900
KERN: ahci: ahci_path_inquiry, cookie 0x8280b900
Last message repeated 1 time
KERN: ahci: ahci_scan_bus, cookie 0x8280b900
KERN: ahci: AHCIPort::ScsiTestUnitReady port 0
KERN: ahci: AHCIPort::ScsiInquiry port 0
KERN: ahci: lba 1, lba48 1, fUse48BitCommands 1, sectors 117231408, sectors48 117231408, size 60022480896
KERN: ahci: trim supported, 1 ranges blocks, reads are deterministic, random.
KERN: ahci: model number: OCZ-AGILITY3                            
KERN: ahci: serial number: OCZ-X78XWFG4D28DS609
KERN: ahci: firmware rev.: 2.15    
KERN: ahci: trim support: yes
KERN: ahci: sg_memcpy phyAddr 0x253442c, size 96
KERN: ahci: ahci_get_restrictions, cookie 0x8280b900
KERN: ahci: AHCIPort::ScsiGetRestrictions port 0: isATAPI 0, noAutoSense 0, maxBlocks 65536
KERN: publish device: node 0x82b59d00, path disk/scsi/0/0/0/raw, module drivers/disk/scsi/scsi_disk/device_v1
KERN: ahci: ahci_get_restrictions, cookie 0x8280b900
KERN: ahci: AHCIPort::ScsiGetRestrictions port 1: isATAPI 1, noAutoSense 1, maxBlocks 256
KERN: publish device: node 0x82b59c60, path disk/scsi/0/1/0/raw, module drivers/disk/scsi/scsi_cd/device_v1
KERN: ata 0: controller doesn't support DMA, disabling
KERN: ata 0: _DevicePresent: device selection failed for device 0
KERN: ata 0: _DevicePresent: device 1, presence 0
KERN: ata 0: deviceMask 0
KERN: ata 0: ignoring device 0
KERN: ata 0: ignoring device 1
KERN: ata 0 error: target device not present
Last message repeated 1 time
KERN: ata 0 error: invalid target device
KERN: Last message repeated 12 times.
KERN: ata 1: controller doesn't support DMA, disabling
KERN: ata 1: _DevicePresent: device selection failed for device 0
KERN: ata 1: _DevicePresent: device 1, presence 0
KERN: ata 1: deviceMask 0
KERN: ata 1: ignoring device 0
KERN: ata 1: ignoring device 1
KERN: ata 1 error: target device not present
Last message repeated 1 time
KERN: ata 1 error: invalid target device
KERN: Last message repeated 12 times.
KERN: KDiskDeviceManager::_Scan(/dev/disk/scsi)

comment:4 Changed 5 years ago by kallisti5

it works as of hrev46819!

And by works I mean it erases everything on my SSD. :-\ Screenshot attached

The partition table is gone, so it definitely trims the first 512 bytes of the disk.

Last edited 5 years ago by kallisti5

Changed 5 years ago by kallisti5

Attachment: IMG_20140204_232334.jpg added

as of hrev46819 shortly before everything locks up and bursts into flames.

comment:5 Changed 5 years ago by anevilyak

I can't concur here: on my Corsair the results seem to remain as before, which is to say the partition is still usable/readable, but there's no obvious indicator as to whether the trim operation actually succeeded. Steps:

  • unmount partition
  • dd if=/dev/zero
  • mkbfs
  • mount
  • verify blocks are still generally zeroed apart from basic filesystem structures via DiskProbe
  • fstrim
  • check blocks again - still zeroed. According to the drive specs, trimmed blocks should return deterministic random data on read, which should rule out seeing straight zero blocks, but that's not currently the case.

comment:6 Changed 5 years ago by kallisti5

Milestone: R1 → R1/alpha5
Priority: normal → blocker

We should probably fix this before alpha5 or remove fstrim from the alpha5 branch images.

Don't want users running it unknowingly and potentially erasing their data.

comment:7 Changed 5 years ago by kallisti5

Just tried it on my OCZ Agility 3 again (hrev46931): booted Haiku from a USB stick and ran fstrim on the mounted BeFS / Haiku SSD filesystem.

Got the attached syslog output before the system froze up.

Interesting...

KERN: [30] 65535 : 19708

A bunch of 65535's... that sounds like some kind of overflow.

Changed 5 years ago by kallisti5

Attachment: trim-oczagility.txt added

comment:8 Changed 5 years ago by axeld

I was obviously not really awake when I added that debug output. Not only did I write "3%" instead of "%3", I also mixed up block offset and block length in the AHCI driver. 65535 is just the maximum number of blocks that the ATA spec allows there.
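
To illustrate the 65535 limit: a free extent longer than that has to be split into several range entries. A minimal sketch, not the driver's actual code; split_extent and MAX_RANGE_BLOCKS are hypothetical names, and offsets and lengths are counted in device blocks.

```python
MAX_RANGE_BLOCKS = 0xFFFF  # 65535: the per-range block limit mentioned above


def split_extent(offset, length):
    """Split one (offset, length) extent into ranges of at most 65535 blocks."""
    ranges = []
    while length > 0:
        chunk = min(length, MAX_RANGE_BLOCKS)
        ranges.append((offset, chunk))
        offset += chunk
        length -= chunk
    return ranges


print(split_extent(0, 140000))
# [(0, 65535), (65535, 65535), (131070, 8930)]
```

With a splitter like this, a large free area naturally shows up as a run of 65535-block ranges followed by one shorter range.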

Anyway, those numbers look just right. However, they are far from complete. How did you receive those lines?

comment:9 Changed 5 years ago by kallisti5

Milestone: R1/alpha5 → R1/beta1

Pushing to R1B1.

We need to include a blurb in the R1A5 release notes that trim functionality is experimental and may eat your data.

comment:10 Changed 5 years ago by axeld

Milestone: R1/beta1 → R1/alpha5

Since this still requires action for R1a5, I changed the milestone back, so that it is not forgotten.

It might make more sense to remove the fstrim command unless it gets more testing before the release.

comment:11 Changed 5 years ago by axeld

FWIW I managed to duplicate kallisti5's issue using VirtualBox. Since version 4.2 it can use trimming to shrink dynamically sized VDIs; one just has to enable it manually using:

$ vboxmanage storageattach Haiku --storagectl "SATA Controller" --port 0 --discard on

Where "Haiku" needs to be replaced by your VM name, and you may also need to specify a different port.

The result after an fstrim command is an unbootable system, hooray! I'll look into it over the next few months ;-)

comment:12 Changed 4 years ago by pulkomandy

Milestone: R1/alpha5 → R1/beta1

comment:13 Changed 3 years ago by kallisti5

Keywords: TRIM fstrim added
Summary: TRIM fails on OCZ Agility 3 → TRIM / fstrim can destroy data on SSD's when executed

comment:14 Changed 3 years ago by axeld

I've looked into this, but after having compiled my own version of VirtualBox that adds debugging output to the trimming, I can't reproduce it anymore at all -- it works like a charm over here.

Does anyone else feel like trying again on real hardware? ;-)

comment:15 Changed 3 years ago by pulkomandy

Last time I tried, it didn't seem to destroy any data for me, but it was very slow with the disk spending a lot of time handling each trim command sent, and ultimately replying with failure. SSDs of small capacity should be cheap and easy to come by these days, I would suggest dedicating one to testing purposes?

comment:16 Changed 3 years ago by axeld

It should be slow ATM, as we use the synchronous version of the trim command (the queueable one didn't exist back then, but we don't support command queuing anyway). But since you currently have to manually issue the command, that shouldn't be much of an issue. Replying with failure is more of a problem, though.

Anyway, I don't think a single SSD will do the trick. We should test with a number of different ones to be sure it works as it should.

comment:17 Changed 3 years ago by pulkomandy

I tested this again on my machine (Intel SSDSC2CT180A4). Soon after running fstrim on a test partition which resides on the same drive as my boot one, I hit a KDL in get_next_team_info. After a reboot using the reboot command, my SSD wasn't visible in the BIOS, nor in DriveSetup after I booted from a USB disk.

I did a cold reboot, and after that the SSD is back and no data was erased, at least.

A possible interpretation is that the commands we sent to the SSD managed to somehow confuse the firmware enough that it didn't reply to anything after that.

Last edited 3 years ago by pulkomandy

comment:18 Changed 2 years ago by kallisti5

fstrim still "seems" to work on VirtualBox as of hrev50590 / x86_64. (I even got zesty and did a while true; do fstrim /boot; sleep 1; done)

I wonder if the AHCI work had any impact on real hardware TRIM?

comment:19 Changed 2 years ago by kallisti5

Actually, on reboot after issuing lots of fstrim commands, the OS no longer boots. It seems it definitely still has the potential to destroy data in VirtualBox or on real hardware.

comment:20 Changed 2 years ago by pulkomandy

In hrev50664 I added support for trim to our ramdisk device. This makes it possible to test the BFS code with a different disk driver. (and is also useful to release the RAM used by the ramdisk when space is free on the filesystem).

I did that (with an 8MB disk image) and did not manage to corrupt the filesystem yet. So either the bug is in the SCSI implementation of trim, or it needs a disk larger than 8MB to start having problems. Anyway, this makes it possible to test the BFS side of the code without running it on actual data.

comment:21 Changed 2 years ago by kallisti5

I'm not sure we should ship fstrim with R1 if there is a good chance of data loss. We may need to disable fstrim in the R1 branch unless this one is fixed.

comment:22 Changed 2 years ago by pulkomandy

As mentioned above, the command itself and the BFS logic were shown to work fine when using a RAM disk. If we remove anything, it would be the support in the ATA driver. I do plan to get back to this and try to fix the problems; I have a spare SSD to experiment with.

comment:23 Changed 2 years ago by axeld

Sure, if we don't get it ready in time, we should simply remove the fstrim command; it doesn't serve any purpose then anyway.

Since I could reproduce the issue with VirtualBox, I created a debug version of it (that gives me more insight into what ends up in the device), to see what is going wrong. But of course, I didn't manage to reproduce it with that version anymore. Maybe it only happens when doing a bit more before trimming.

comment:24 Changed 2 years ago by pulkomandy

I have set up a machine for testing purposes. Thinkpad X200 with Kingston 60GB SSD, SV300S37A60G. I'm using a 3GB partition near the start of the disk, on which I installed Haiku. I am trimming the boot volume, without other activity happening.

So far I have not managed to corrupt the drive this way. However, the trim command will time out. The HDD LED stays on for some time, but then the command is aborted.

port reset: port 0 undergoing COMRESET
ExecuteAtaRequest port 0: device timeout
sata_request::abort called for command 0x06
trim failed (64 ranges)!

It seems the command is simply taking too long to execute, and eventually the port is reset to "unlock" the situation. On this SSD, it seems to not have any effect (trimming a partition that was cleaned with dd if=/dev/zero first does not change its data). But, it could be that other drives/firmwares are much less happy about being reset while they are TRIMing stuff, and it could lead to loss of data if they don't handle their transactions properly?

I'm going to reduce the number of blocks to trim per command, so that it executes faster and does not time out.

comment:25 Changed 2 years ago by pulkomandy

I tried various things:

  • Always send only 1 range: same timeout
  • Reduce all ranges to only 1 sector: same timeout

I noticed that the command expects a number of "range blocks" to trim (a block is 512 bytes, or 64 entries of 8 bytes each). For the last command we send, there are fewer than 64 entries, and we send a shorter command block. Does that work? Or should we round the buffer up to the next multiple of 512 bytes and fill it with zeros? The spec says that unused entries in a block should have their "range" field set to 0. I'm wondering if this could cause the disk to interpret random data at the end of the buffer as trim commands, which would lead to erasing random areas of the disk.
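
The padding question above can be sketched as follows. This is an illustration, not the AHCI driver's code: build_trim_payload is a hypothetical name, and the entry layout (48-bit LBA in the low bits, 16-bit block count in the high bits, little-endian) is my reading of the ATA DATA SET MANAGEMENT definition and should be checked against the spec.

```python
import struct

ENTRY_SIZE = 8                                # bytes per range entry
ENTRIES_PER_BLOCK = 64
BLOCK_SIZE = ENTRY_SIZE * ENTRIES_PER_BLOCK   # 512 bytes per "range block"


def build_trim_payload(ranges):
    """Pack (lba, count) pairs into whole, zero-padded 512-byte range blocks."""
    payload = bytearray()
    for lba, count in ranges:
        if not 0 <= lba < (1 << 48):
            raise ValueError("LBA does not fit in 48 bits")
        if not 0 < count <= 0xFFFF:
            raise ValueError("range length must be 1..65535 blocks")
        payload += struct.pack("<Q", (count << 48) | lba)
    # Pad to a multiple of 512 bytes. An all-zero entry has a range length
    # of 0, which marks it as unused.
    payload += bytes(-len(payload) % BLOCK_SIZE)
    return bytes(payload)


payload = build_trim_payload([(2048, 8), (4096, 65535), (100, 1)])
print(len(payload))  # 512
```

Padding this way means the device never sees leftover buffer contents as range entries, which is the failure mode suspected above.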

But first, I need to understand why the command times out, even with small ranges. I did see some disk sectors turning into 0xFF, so it is at least partially working.

comment:26 Changed 16 months ago by Giova84

Haiku hrev51346 gcc2h on a Samsung 850 EVO SSD (250 GiB with two 125 GiB partitions; Haiku is on the 2nd partition).

When I ran fstrim /boot, everything went fine; no errors or trouble occurred. Just out of curiosity: it took 42 seconds to trim 24410886144 bytes, which is about 24 gigabytes. Why? I only have 1.71 GiB of files on the Haiku partition.

comment:27 Changed 16 months ago by kallisti5

It does indeed work on some SSDs.

I would reboot and ensure all your data is safe before calling things 100% ok :-) Keep in mind trim can damage data on other partitions as well beyond Haiku. I've seen trim fail and corrupt data on OCZ and Sandisk SSDs.

Giova84: Could you grab a syslog from that trim you executed and post it here? (/var/log/syslog.old if you've rebooted, otherwise /var/log/syslog)

comment:28 Changed 16 months ago by axeld

Not sure if you mean the size or the time it requires:

  1. Trimming is not necessarily a fast operation. That's why it's usually not done when actually deleting files, but later on as some kind of scrubbing service.
  2. Trimming clears free space, not used space. The more files there are on your partition, the less space is subject to trimming.

comment:29 Changed 16 months ago by pulkomandy

If the partition is 125GB big, and there is only 1.71GB used, then trim should clear about 123GB. So only 24GB sounds wrong? Where did these extra 100GB go?
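
A quick sanity check of these numbers, as a sketch. The partition size, used space and trimmed byte count are the figures quoted in comment 26; treating the sizes as GiB is an assumption.

```python
GIB = 2 ** 30

partition_gib = 125.0        # partition size from comment 26
used_gib = 1.71              # files on the partition, from comment 26
trimmed_bytes = 24410886144  # byte count printed by fstrim

expected_gib = partition_gib - used_gib
trimmed_gib = trimmed_bytes / GIB

print(f"expected free space: {expected_gib:.2f} GiB")
print(f"actually trimmed:    {trimmed_gib:.2f} GiB")
print(f"fraction trimmed:    {trimmed_gib / expected_gib:.1%}")
```

So fstrim reported less than a fifth of the space one would expect to be free.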

comment:30 Changed 16 months ago by axeld

Oh, you're right Adrien, I overlooked that. In that case, I don't have an explanation without looking deeper. In any case, 42 seconds is very long, too.

comment:31 Changed 16 months ago by Giova84

As another test, yesterday I downloaded some zip files with source code, unzipped them on the disk, compiled them, and then deleted everything (about 10 MiB in total). I then ran fstrim again, and it took about one minute to trim 244082240128 bytes (yes: about 244 gigabytes!). No data corruption occurred on either partition (the first one holds Win 7 on NTFS, however).

After reading Kallisti's suggestion, I ran fstrim again today, and this time, after one minute, fstrim triggered a KDL involving vm_page_fault and read_fault (sorry, ATM I don't have a camera to take pictures). I had to force a reboot, since it was impossible to exit the KDL properly. At the next boot there was no corruption on the partition and, as far as I can tell, no data was lost. I have saved both logs (one of them covers fstrim and the KDL) and attached them here.

Changed 16 months ago by Giova84

Attachment: previous_syslog added

syslog with info about KDL and fstrim

comment:32 Changed 16 months ago by Giova84

Has a Patch: set

Changed 16 months ago by Giova84

Attachment: syslog added

comment:33 Changed 16 months ago by Giova84

PS: I noticed that when I run checkfs -c /boot after the fstrim command, the node count decreases. E.g., before fstrim, checkfs -c reports 7756 nodes; after fstrim it reports 7589 nodes. However, as I said, no data seems to be lost and checkfs gives no errors. Does this mean that the fstrim command works properly?

comment:34 Changed 16 months ago by Giova84

I keep doing little tests.

After rebooting Haiku I deleted a zip file of about 5 MiB, then ran fstrim again: it immediately (not after some time) triggered the same KDL, and again I had to force a reboot (using CTRL ALT DEL). At the next boot I was puzzled, because yesterday the fstrim command ran fine without trouble, so I tried again, but this time I ran the sync command before fstrim. I don't know if it was just chance, but this time fstrim didn't trigger the KDL. Like yesterday, it took 42 seconds to trim 24412811264 bytes.

Obviously I want to avoid damaging the disk or the Haiku partition, and I want to avoid losing or damaging my data (for what it's worth, checkfs gave no errors). From some reading on Linux user forums (they also have the fstrim command, and some people run it from cron), I gather that a manual fstrim is usually run daily or weekly.

I'd like to maintain my SSD properly. So: how can I check whether the fstrim command on Haiku really clears free space?

comment:35 Changed 16 months ago by pulkomandy

What I did to test this (but it is a destructive test):

  1. With dd, fill a section of the partition with 0xE5 (or some other value, or use data from /dev/random).
  2. Format the partition as BFS (only some sectors are modified).
  3. Run fstrim on the partition.

If fstrim works properly, the fixed value used at step 1 should be gone from the sectors, and the default erased value of the disk should be there instead (usually 00 or FF). You can check this with DiskProbe.
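
Instead of inspecting sectors by hand in DiskProbe, the check in the last step could be scripted. A minimal sketch, assuming 512-byte sectors and the 0xE5 fill value from step 1; count_pattern_sectors is a hypothetical helper, and the demo runs on an in-memory image rather than a real device.

```python
import io

SECTOR_SIZE = 512


def count_pattern_sectors(device, start_sector, sector_count, fill=0xE5):
    """Count sectors in the given span that still consist only of `fill`."""
    expected = bytes([fill]) * SECTOR_SIZE
    device.seek(start_sector * SECTOR_SIZE)
    remaining = 0
    for _ in range(sector_count):
        if device.read(SECTOR_SIZE) == expected:
            remaining += 1
    return remaining


# Demo: four sectors, of which the last two have been "trimmed" to zero.
image = io.BytesIO(b"\xE5" * (2 * SECTOR_SIZE) + b"\x00" * (2 * SECTOR_SIZE))
print(count_pattern_sectors(image, 0, 4))  # 2
```

On a real system the first argument would be the partition opened read-only in binary mode, for example open("/dev/disk/scsi/0/0/0/1", "rb"). A count that stays at its pre-trim value suggests the trim had no visible effect on that span.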

Note that current SSDs do well even without trimming, there is a performance loss but not a lifetime reduction as it used to be.

comment:36 Changed 16 months ago by Giova84

A destructive test is the last thing I would want to do. (If necessary I will do it, since I want to be sure that fstrim really works for me: please explain, step by step, how to clear a section of the partition with 0xE5 using dd, and where to look.) So I tried another test, though I'm not sure it is really reliable.

On the Haiku partition I have a zip file for the MAME emulator. The file is called cheat.dat and contains some lines of descriptive text, like "this is the cheat file. For more info visit the site www.mame.co.uk", plus more entries.

When I probe the Haiku partition (/dev/disk/scsi/0/0/0/1) with DiskProbe and look at block 0xc8b7e6, I can indeed see the text content of the cheat.dat file.

Then I deleted the cheat.dat file, ran sync and then fstrim (which trimmed 24458731520 bytes), and examined the /dev/disk/scsi/0/0/0/1 partition again. Block 0xc8b7e6 still held the content of the cheat.dat file.

Was this a reliable test or a useless one? Please forgive me, but I'm not much of an expert.

comment:37 Changed 16 months ago by pulkomandy

Yes, in that case the block should be erased as well. So it looks like in your case, fstrim does nothing at all, or maybe not as much as it could.

comment:38 in reply to:  37 Changed 16 months ago by anevilyak

Replying to pulkomandy:

Yes, in that case the block should be erased as well. So it looks like in your case, fstrim does nothing at all, or maybe not as much as it could.

Is it actually required/guaranteed for the SSD controller to execute that command synchronously? Or for that matter, to physically erase the page at that point? Depending on the impl, it could conceivably simply mark the page as available for erasure internally, and not actually touch it until needed, but I'm not so familiar with the details of the specs.

comment:39 Changed 16 months ago by pulkomandy

According to Wikipedia:

There are different types of TRIM defined by SATA Words 69 and 169 returned from an ATA IDENTIFY DEVICE command:

  • Non-deterministic TRIM: Each read command to the Logical block address (LBA) after a TRIM may return different data.
  • Deterministic TRIM (DRAT): All read commands to the LBA after a TRIM shall return the same data, or become determinate.
  • Deterministic Read Zero after TRIM (RZAT): All read commands to the LBA after a TRIM shall return zero.

So, it depends on the disk and needs to be checked in the device identification.
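
A sketch of how those IDENTIFY DEVICE words could be interpreted. The bit positions used here (word 169 bit 0 for TRIM support, word 69 bit 14 for DRAT and bit 5 for RZAT, word 105 for the maximum number of 512-byte range blocks per command) are my understanding of the ATA/ACS specification and should be verified against it; trim_capabilities is a hypothetical helper.

```python
def trim_capabilities(identify_words):
    """Summarize TRIM support from the 256 16-bit IDENTIFY DEVICE words."""
    supported = bool(identify_words[169] & 0x0001)
    deterministic = bool(identify_words[69] & (1 << 14))
    read_zero = bool(identify_words[69] & (1 << 5))

    if not supported:
        mode = "none"
    elif deterministic and read_zero:
        mode = "RZAT"
    elif deterministic:
        mode = "DRAT"
    else:
        mode = "non-deterministic"

    return {
        "trim": supported,
        "mode": mode,
        "max_range_blocks": identify_words[105],
    }


# Synthetic example: TRIM supported, deterministic reads, one range block.
words = [0] * 256
words[169] = 0x0001
words[69] = 1 << 14
words[105] = 1
print(trim_capabilities(words))
# {'trim': True, 'mode': 'DRAT', 'max_range_blocks': 1}
```

The AHCI boot log in comment 3 prints the same kind of summary ("trim supported, 1 ranges blocks, reads are deterministic").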

comment:40 Changed 16 months ago by axeld

Also, it's not a good test unless you a) reboot (to make sure no one has the file open still), and b) have run checkfs on the partition, to make sure its space has really been freed.

comment:41 Changed 16 months ago by Giova84

After such test I rebooted Haiku and ran checkfs: it told me nothing about freed space.

comment:42 in reply to:  41 Changed 16 months ago by anevilyak

Replying to Giova84:

After such test I rebooted Haiku and ran checkfs: it told me nothing about freed space.

The only time checkfs frees up space is hypothetically in the case of a power loss or other catastrophic crash that doesn't allow the filesystem to unmount cleanly. Under normal circumstances, space is freed automatically by filesystem operations, so checkfs won't have anything to report.

comment:43 Changed 16 months ago by axeld

That procedure would just make sure that the space is actually freed. Afterwards, you'd have to trim.

comment:44 Changed 16 months ago by Giova84

I've done the "ultimate" test. First of all I backed up all my data to another partition (BeFS) on another disk.

Then I booted the Haiku live CD (hrev51346 gcc2h) and opened DiskProbe to probe the Haiku partition on the SSD: block 0xc8b7e6 still showed the content of the file I had deleted. I ran fstrim on the SSD partition again (still from the live CD) and triggered the same KDL again. I rebooted into the live CD and ran the sync command before running fstrim again; this time fstrim did the job (4749070336 bytes in 15 seconds). I rebooted into the live CD once more and checked the SSD partition with DiskProbe: block 0xc8b7e6 still had the content of the deleted file.

Then I deleted the BeFS partition on the SSD and created it again from scratch. I rebooted into the live CD again and checked the SSD partition with DiskProbe: block 0xc8b7e6 still held content from my previous installation. On closer inspection, all the content of my text files was still present on the empty disk despite the various fstrim runs.

comment:45 Changed 16 months ago by axeld

I'm afraid it's not an ultimate test either: the drive combines several blocks together as an "erase block". AFAIK this is about 1.5 MB on the EVO. This means that if BFS could not trim even a single disk block (4K) within such a 1.5 MB erase block, the drive cannot erase it just yet. So you might just have hit such a situation.
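
The erase-block argument can be made concrete with a simplified model. This is only a sketch: it assumes erase blocks are 1.5 MiB (the unverified figure above), aligned to the start of the device and mapped linearly, which real drives do not guarantee, and it assumes the trimmed ranges do not overlap. fully_trimmed_erase_blocks is a hypothetical helper.

```python
FS_BLOCK = 4096                         # BFS block size assumed above (4K)
ERASE_BLOCK = 1536 * 1024               # 1.5 MiB, the figure quoted above
BLOCKS_PER_ERASE = ERASE_BLOCK // FS_BLOCK  # 384


def fully_trimmed_erase_blocks(trimmed_ranges):
    """Return indices of erase blocks completely covered by trimmed ranges.

    trimmed_ranges: non-overlapping (start_block, count) pairs in 4K blocks.
    """
    covered = {}
    for start, count in trimmed_ranges:
        block, end = start, start + count
        while block < end:
            index = block // BLOCKS_PER_ERASE
            boundary = (index + 1) * BLOCKS_PER_ERASE
            span = min(end, boundary) - block
            covered[index] = covered.get(index, 0) + span
            block += span
    return sorted(i for i, n in covered.items() if n == BLOCKS_PER_ERASE)


# Erase block 0 has one 4K block still in use, erase block 1 is fully free.
print(fully_trimmed_erase_blocks([(0, 383), (384, 384)]))  # [1]
```

In this model a single in-use 4K block is enough to keep its whole erase block from being erased, so old data next to it can remain readable.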

But anyway, until trim works reliably, it won't be part of the release.

comment:46 Changed 16 months ago by Giova84

You're right, Axel. However, I have some more info. I bought this SSD on 6 August and have checked its S.M.A.R.T. status daily; it was always OK. I can say with confidence that the various KDLs I encountered on Haiku with fstrim caused the SMART attributes C7 "CRC Error" and EB "POR Recovery Count" to increase. These usually increase when the system doesn't shut down cleanly or when the SATA cable is poor. Indeed, when the KDLs occurred I had to force a reboot using CTRL ALT DEL, and my SATA cable is good. After all the KDLs I encountered, the C7 counter has reached a value of 000000000008 (8) and the EB counter a value of 4; I had 4 KDLs in total. When I shut down or reboot cleanly, these values don't increase.

However, if I want to really and properly free the space on the BeFS partition of my SSD, what can I do? Sorry, but as I've said, I'm no expert on these things. Thank you for your patience, really!

POST EDITED

Last edited 16 months ago by Giova84

comment:47 Changed 16 months ago by Giova84

However, if I want to really and properly free the space on the BeFS partition of my SSD, what can I do? Sorry, but as I've said, I'm no expert on these things. Thank you for your patience, really!

Sorry, I wasn't clear: I meant to ask whether there is an alternative way to trim the BeFS partition on the SSD. As Pulkomandy said, it is a matter of filesystem support, so I guess that "Samsung Magician" can't run trim on the BeFS partition either, because it knows nothing about BeFS. As I said before, deleting and recreating the partition obviously didn't free the space on the SSD. Does anyone know of a "universal" utility that executes trim regardless of the filesystem? I also read that this could depend on the drive's hardware controller. I ask all these questions because I still see all my old data (using DiskProbe) despite fstrim and the re-initialization of the partition, and also because I habitually fill my Haiku partitions with a lot of data which I often delete.

comment:48 Changed 16 months ago by pulkomandy

Has a Patch: unset

comment:49 Changed 16 months ago by pulkomandy

Has a Patch: unset

Ouch, what happened to that previous_syslog file? It's filled with corrupt characters at the end. Did fstrim erase some memory?

comment:50 Changed 16 months ago by Giova84

I can't tell you whether fstrim erased some memory :-)

However, I checked whether trim works properly under Windows 7 by following these instructions: http://www.win-raid.com/t24f34-Easy-TRIM-test-methods.html (the paragraph "B. Easy and very effective TRIM test by using a Hex Editor"). There my SSD was trimmed properly, so it seems that fstrim doesn't work under Haiku, at least for me.

I tried the same test again under Haiku: I triggered the same KDL again, followed by a forced and unclean reboot, and the SMART value C7 "CRC Error count" increased by one (from 8 to 9). I am completely sure that this occurs after the KDL when I am forced to reboot.

comment:51 Changed 5 months ago by waddlesplash

Has a Patch: unset
Milestone: R1/beta1 → Unscheduled
Priority: blocker → high

SCSI TRIM disabled in hrev52134; removing from beta1.

comment:52 Changed 5 months ago by pulkomandy

I would have disabled it only in the beta1 branch. Maybe it's time to create the branch?

Last edited 5 months ago by pulkomandy

comment:53 Changed 5 months ago by waddlesplash

Why? It is known to be broken, so until someone has time to fix it, it does not make sense to leave it enabled, even on nightlies. And no, I'm holding off beta branch creation until we fix the other remaining blockers.
