Opened 11 months ago
Last modified 3 months ago
#18778 new bug
KDL when I turn off the laptop: SMAP Violation user-mapped address
Reported by: | atomozero | Owned by: | tqh |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta6 |
Component: | Drivers/ACPI | Version: | R1/beta4 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
Restarting my laptop works perfectly, but when I shut it down the system drops into Kernel Debugging Land (KDL), presumably because of the ACPI drivers. I am using a nightly build of Haiku (hrev57564), and I cannot say when this problem started.
Attachments (9)
Change History (30)
by , 11 months ago
Attachment: | listdev.txt added |
---|
by , 11 months ago
Attachment: | syslog hrev57564 added |
---|
by , 11 months ago
Attachment: | photo_2024-02-02_23-37-32.jpg added |
---|
by , 11 months ago
Attachment: | photo_2024-02-02_23-37-35.jpg added |
---|
comment:1 by , 11 months ago
comment:2 by , 11 months ago
Summary: | KDL when I turn off the laptop → KDL when I turn off the laptop: SMAP Violation user-mapped address |
---|
comment:3 by , 11 months ago
I've the same problem (warning: text recognition is a bit incorrect)
kdebug> area contains 0xffffffff823058d8
AREA: 0xffffffff80b88e78
name: "acpi_physical_mem_area"
owner: 0x1
id: 0x9a35
base: 0xffffffff82305000
size: 0x1000
protection: 0x30
page_protection: 0x0000000000000000
wiring: 0x2
memory_type: 0x10808800
cache: 0xffffffff806930f0
cache_type: device
cache_offset: 0x8
cache_next: 0x0000000000000000
cache_prev:
page mappings: 0
kdebug>
comment:4 by , 11 months ago
Strange, that area is mapped with KERNEL permissions only: https://xref.landonf.org/source/xref/haiku/src/add-ons/kernel/bus_managers/acpi/ACPICAHaiku.cpp#472
And 0x30 is only those two kernel protections; plus page_protection is NULL as expected. So what's going on here? How is this an SMAP violation?
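To make the 0x30 reading concrete, here is a small sketch that decodes an area protection value into flag names. The bit values in the table are an assumption written from memory of Haiku's headers (user flags in the low bits, kernel flags starting at 0x10), so check them against the actual headers before relying on this; `decode_protection` and `has_user_access` are hypothetical helper names, not kernel functions.

```python
# Assumed bit values for area protection flags; verify against Haiku's headers.
AREA_PROTECTION_BITS = {
    0x01: "B_READ_AREA",
    0x02: "B_WRITE_AREA",
    0x04: "B_EXECUTE_AREA",
    0x10: "B_KERNEL_READ_AREA",
    0x20: "B_KERNEL_WRITE_AREA",
    0x40: "B_KERNEL_EXECUTE_AREA",
}

# Mask of the (assumed) user-accessible protection bits.
USER_PROTECTION_MASK = 0x01 | 0x02 | 0x04


def decode_protection(protection):
    """Return the names of the known protection bits set in `protection`."""
    return [
        name
        for bit, name in sorted(AREA_PROTECTION_BITS.items())
        if protection & bit
    ]


def has_user_access(protection):
    """True if any of the assumed user protection bits is set."""
    return bool(protection & USER_PROTECTION_MASK)


if __name__ == "__main__":
    print(decode_protection(0x30), has_user_access(0x30))
```

Under those assumed values, 0x30 decodes to the kernel read and kernel write flags with no user bit set, which matches the observation above.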
comment:5 by , 11 months ago
Keywords: | Power Off removed |
---|---|
Milestone: | Unscheduled → R1/beta5 |
follow-up: 7 comment:6 by , 11 months ago
Uh, unless I am missing something, neither of the areas returned by "area contains" (both from korli and atomozero) actually contains the address in question; both are too small. That seems very strange?
comment:7 by , 11 months ago
Replying to waddlesplash:
Uh, unless I am missing something, neither of the areas returned by "area contains" (both from korli and atomozero) actually contains the address in question; both are too small. That seems very strange?
I don't get it. The size is 0x1000, so the areas actually do contain the addresses in question.
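The containment check can be done by hand with the values from the "area contains" output in comment 3 (taken from OCR, so the digits may not be exact). A minimal sketch, where `area_contains` is a hypothetical helper and not the kernel's implementation:

```python
def area_contains(base, size, address):
    """True if `address` lies in the half-open range [base, base + size)."""
    return base <= address < base + size


# Values as read from the KDL output in comment 3.
BASE = 0xFFFFFFFF82305000
SIZE = 0x1000
ADDRESS = 0xFFFFFFFF823058D8

if __name__ == "__main__":
    print(area_contains(BASE, SIZE, ADDRESS))
    print(hex(ADDRESS - BASE))  # offset of the address inside the area
```

The address sits at an offset below 0x1000 from the base, so it falls inside the one-page area.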
comment:8 by , 11 months ago
Ah, I was looking at the wrong field, you're correct.
But there still remains the question of how we are getting a SMAP violation on an area that doesn't appear to be user-mapped.
comment:9 by , 11 months ago
I see the area in question wasn't allocated yet before initiating the shutdown process.
comment:10 by , 9 months ago
I'm looking at src/system/kernel/arch/x86/arch_int.cpp and I see we have a long chain of if/else to decide what type of fault it is. It could be that the conditions for classifying a fault as a SMAP violation are somehow satisfied when they shouldn't be, and this is really just another type of fault.
For example, Linux checks the U bit in the fault error code, but we check only the P bit.
https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/fault.c#L1277
According to the comments they added this because of the WRUSS instruction which introduces a special case where the code is running from the kernel, but actually explicitly trying to write to userspace:
https://www.felixcloutier.com/x86/wrussd:wrussq
I don't see why we would be using that, but maybe there are some other edge cases here. Did you have a look at the disassembly of the crashing code to see exactly what it is doing?
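For reference, here is a sketch of how the x86 page-fault error code bits relate to this question. The bit layout (P, W/R, U/S, RSVD, I/D) is the architectural one; `could_be_smap_violation` is a hypothetical helper that only tests necessary conditions and is not a copy of the logic in arch_int.cpp or in Linux. A real check would also need to know that the faulting address is user-mapped.

```python
# x86 page-fault error code bits, as defined by the architecture.
PF_PRESENT = 1 << 0  # P: set = protection violation, clear = page not present
PF_WRITE = 1 << 1    # W/R: set = write access
PF_USER = 1 << 2     # U/S: set = access was made in user mode
PF_RSVD = 1 << 3     # RSVD: reserved bit set in a paging structure
PF_INSTR = 1 << 4    # I/D: set = instruction fetch

_BIT_NAMES = [
    (PF_PRESENT, "present"),
    (PF_WRITE, "write"),
    (PF_USER, "user"),
    (PF_RSVD, "reserved"),
    (PF_INSTR, "instruction"),
]


def decode_error_code(error_code):
    """Return the names of the error code bits that are set."""
    return [name for bit, name in _BIT_NAMES if error_code & bit]


def could_be_smap_violation(error_code, eflags_ac):
    """Necessary (not sufficient) conditions for a SMAP fault.

    SMAP blocks supervisor-mode data accesses to user pages while
    EFLAGS.AC is clear, so the fault must be a protection violation
    (P set), raised in supervisor mode (U clear), on a data access
    (I/D clear), with AC clear.
    """
    return (
        bool(error_code & PF_PRESENT)
        and not (error_code & PF_USER)
        and not (error_code & PF_INSTR)
        and not eflags_ac
    )


if __name__ == "__main__":
    print(decode_error_code(0x3), could_be_smap_violation(0x3, False))
    print(decode_error_code(0x2), could_be_smap_violation(0x2, False))
```

A kernel write to a page that is simply not present (error code 0x2) fails the P test here, which is the kind of case that should be reported as an ordinary page fault rather than a SMAP violation.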
comment:11 by , 9 months ago
I think this may not really be a SMAP violation except by coincidence. I've encountered "SMAP violations" before that didn't make much sense, but when I "co"'d, I got a standard page fault, if memory serves. Can someone who can reproduce this problem try that and see if you get a second, different, KDL?
comment:12 by , 9 months ago
If you tell me what commands to give, I'll be happy to try them tonight. :)
comment:14 by , 7 months ago
Was this a problem in Beta4? If not, I'd like to unschedule this from beta5.
comment:15 by , 7 months ago
if not i'd like to unschedule this from beta5.
Why? It's a crashing bug; we should probably try to fix it.
There is no need to unschedule things until we are closer to the release. It's not set as a blocker, but it is something we should look into if possible. I think at least Waddlesplash should look into the extra captures that he requested? Indeed the SMAP violation turns into a page fault at the same address, but personally, I don't know what to make of it.
comment:16 by , 7 months ago
Why? It's a crashing bug; we should probably try to fix it.
Yes, but should this need to be fixed in beta5?
I'm currently trying to see what is still blocking beta5, and this seems to be about 5 or so regressions. I would rather we focus on those and get beta5 released. :)
comment:17 by , 7 months ago
If you want to see what's blocking beta5, have a look at the tickets with priority blocker or critical in the beta5 milestone:
There is no need to remove the other tickets from the milestone to see that.
comment:18 by , 7 months ago
I don't think that is accurate, since the regressions are not visible there, and the tasks are things to do just before the release.
Otherwise, apart from the exception handling we could release it right now. But afaik we try to fix the regressions before a release.
comment:19 by , 7 months ago
If there are regressions that should be blocking, we can (and probably should) increase their priority.
The other task (the update to OpenSSL) is not to be done just before the release, on the contrary, it should be done as early as possible, so that we can test it in nightlies for a while before the release. Doing it at the last minute would be a great way to introduce new regressions in the next release.
I plan to look into it, but I have had little time and energy to spend on Haiku lately (due to being busy with paid work and other projects). Not sure when things will clear up for me; the next few months keep filling up with other things...
comment:20 by , 7 months ago
Indeed the SMAP violation turns into a page fault at the same address, but personally, I don't know what to make of it.
Well, it probably means there isn't really a page there, somehow. How that happens I don't know, though.
If you can type at the KDL prompt, can you please run "area <address>" on the address from the initial message ("SMAP violation at ...")?