Opened 10 months ago

Last modified 2 months ago

#18778 new bug

KDL when I turn off the laptop: SMAP Violation user-mapped address

Reported by: atomozero
Owned by: tqh
Priority: normal
Milestone: R1/beta6
Component: Drivers/ACPI
Version: R1/beta4
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

When I restart my laptop everything works perfectly, but when I turn it off the system drops into Kernel Debugging Land (KDL), presumably because of the ACPI drivers. I am using a nightly build of Haiku (hrev57564) and I cannot say when this problem started.

Attachments (9)

listdev.txt (3.9 KB ) - added by atomozero 10 months ago.
syslog hrev57564 (512.0 KB ) - added by atomozero 10 months ago.
photo_2024-02-02_23-37-32.jpg (244.7 KB ) - added by atomozero 10 months ago.
photo_2024-02-02_23-37-35.jpg (263.5 KB ) - added by atomozero 10 months ago.
IMG_20240203_132844886.jpg (3.6 MB ) - added by korli 9 months ago.
Area info
photo_2024-02-03_22-07-58.jpg (244.1 KB ) - added by atomozero 9 months ago.
area 0xffffffff825630aa
PXL_20240331_200237410.jpg (3.9 MB ) - added by atomozero 8 months ago.
KDL 1° Screenshot
PXL_20240331_200251575.jpg (4.2 MB ) - added by atomozero 8 months ago.
KDL 2° Screenshot - 1° Continue
PXL_20240331_200302777.jpg (3.9 MB ) - added by atomozero 8 months ago.
KDL 3° Screenshot - 2° Continue

Change History (30)

by atomozero, 10 months ago

Attachment: listdev.txt added

by atomozero, 10 months ago

Attachment: syslog hrev57564 added

by atomozero, 10 months ago

Attachment: photo_2024-02-02_23-37-32.jpg added

by atomozero, 10 months ago

Attachment: photo_2024-02-02_23-37-35.jpg added

comment:1 by waddlesplash, 10 months ago

If you can type at the KDL prompt, can you please run "area <address>" on the address from the initial message ("SMAP violation at ...")?

comment:2 by tqh, 10 months ago

Summary: KDL when I turn off the laptop → KDL when I turn off the laptop: SMAP Violation user-mapped address

by korli, 9 months ago

Attachment: IMG_20240203_132844886.jpg added

Area info

comment:3 by korli, 9 months ago

I've the same problem (warning: text recognition is a bit incorrect)

kdebug> area contains 0xffffffff823058d8
AREA: 0xffffffff80b88e78
name: "acpi_physical_mem_area"
owner: 0x1
id: 0x9a35
base: 0xffffffff82305000
size: 0x1000
protection: 0x30
page_protection: 0x0000000000000000
wiring: 0x2
memory_type: 0x10808800
cache: 0xffffffff806930f0
cache_type: device
cache_offset: 0x8
cache_next: 0x0000000000000000
cache_prev:
page mappings: 0
kdebug>
Last edited 9 months ago by korli

comment:4 by waddlesplash, 9 months ago

Strange, that area is mapped with KERNEL permissions only: https://xref.landonf.org/source/xref/haiku/src/add-ons/kernel/bus_managers/acpi/ACPICAHaiku.cpp#472

And 0x30 is only those two kernel protections; plus page_protection is NULL as expected. So what's going on here? How is this an SMAP violation?
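To make the reading of `0x30` concrete, here is a small Python sketch that decodes an area protection value. The bit values are an assumption: they are meant to mirror the area protection flags in Haiku's headers (`B_READ_AREA` = 0x01 up to `B_KERNEL_STACK_AREA` = 0x80) and should be checked against the tree. `decode_protection` and `is_user_accessible` are hypothetical helpers, not kernel functions.

```python
# Assumed flag values, intended to mirror Haiku's area protection bits.
AREA_PROTECTION_FLAGS = {
    0x01: "B_READ_AREA",
    0x02: "B_WRITE_AREA",
    0x04: "B_EXECUTE_AREA",
    0x08: "B_STACK_AREA",
    0x10: "B_KERNEL_READ_AREA",
    0x20: "B_KERNEL_WRITE_AREA",
    0x40: "B_KERNEL_EXECUTE_AREA",
    0x80: "B_KERNEL_STACK_AREA",
}

# The low four bits are the userland-facing protections.
USER_PROTECTION_MASK = 0x0F


def decode_protection(protection):
    """Return the names of all flags set in an area protection value."""
    return [
        name
        for bit, name in sorted(AREA_PROTECTION_FLAGS.items())
        if protection & bit
    ]


def is_user_accessible(protection):
    """True if any userland protection bit is set."""
    return bool(protection & USER_PROTECTION_MASK)


print(decode_protection(0x30))   # ['B_KERNEL_READ_AREA', 'B_KERNEL_WRITE_AREA']
print(is_user_accessible(0x30))  # False
```

Under these assumptions, `0x30` carries no userland bits at all, which is why an SMAP violation on this area is surprising.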

Last edited 9 months ago by waddlesplash

comment:5 by waddlesplash, 9 months ago

Keywords: Power Off removed
Milestone: Unscheduled → R1/beta5

by atomozero, 9 months ago

Attachment: photo_2024-02-03_22-07-58.jpg added

area 0xffffffff825630aa

comment:6 by waddlesplash, 9 months ago

Uh, unless I am missing something, neither of the areas returned by "area contains" (both from korli and atomozero) actually contains the address in question; both are too small. That seems very strange?

in reply to:  6 comment:7 by korli, 9 months ago

Replying to waddlesplash:

Uh, unless I am missing something, neither of the areas returned by "area contains" (both from korli and atomozero) actually contains the address in question; both are too small. That seems very strange?

I don't get it. The size is 0x1000, so the areas do contain the addresses in question.

comment:8 by waddlesplash, 9 months ago

Ah, I was looking at the wrong field, you're correct.
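The containment check can be verified by hand. The following Python sketch uses the base, size, and faulting address from the area dump in comment:3 (read with the OCR-damaged hex prefixes normalized to `0x`); `area_contains` is a hypothetical helper written for this illustration.

```python
def area_contains(base, size, address):
    """True if address lies in the half-open range [base, base + size)."""
    return base <= address < base + size


BASE = 0xffffffff82305000
SIZE = 0x1000
FAULT_ADDRESS = 0xffffffff823058d8

print(area_contains(BASE, SIZE, FAULT_ADDRESS))  # True
print(hex(FAULT_ADDRESS - BASE))                 # 0x8d8
```

The address sits 0x8d8 bytes into a 0x1000-byte area, so it is inside the mapping.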

But there still remains the question of how we are getting a SMAP violation on an area that doesn't appear to be user-mapped.

comment:9 by korli, 9 months ago

I see the area in question wasn't allocated yet before initiating the shutdown process.

comment:10 by pulkomandy, 8 months ago

I'm looking at src/system/kernel/arch/x86/arch_int.cpp and I see we have a long list of if/else to decide what type of fault it is. It could be that the conditions for deciding that it is a SMAP violation somehow end up being verified when they shouldn't, and it's just another type of fault.

For example, Linux checks the U bit in the fault error code, but we check only the P bit.

https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/fault.c#L1277

According to the comments they added this because of the WRUSS instruction which introduces a special case where the code is running from the kernel, but actually explicitly trying to write to userspace:

https://www.felixcloutier.com/x86/wrussd:wrussq

I don't see why we would be using that, but maybe there are some other edge cases here. Did you have a look at the disassembly of the crashing code to see exactly what it is doing?
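As a rough illustration of the difference, here is a Python sketch of two ways to classify a fault from the x86 page-fault error code. The bit positions are architectural (P = bit 0, W/R = bit 1, U/S = bit 2, RSVD = bit 3, I/D = bit 4). Both functions are hypothetical: `p_bit_only_check` is a simplified reading of the description above, not Haiku's actual code, and `stricter_check` is only one possible tightening, assuming SMAP is enabled.

```python
# x86 page-fault error code bits (architectural).
PF_PRESENT = 1 << 0  # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1    # 0: read access, 1: write access
PF_USER = 1 << 2     # 0: supervisor-mode access, 1: user-mode access
PF_RSVD = 1 << 3     # reserved bit set in a paging structure
PF_INSTR = 1 << 4    # fault caused by an instruction fetch


def decode_error_code(error_code):
    """Break a page-fault error code into named booleans."""
    return {
        "present": bool(error_code & PF_PRESENT),
        "write": bool(error_code & PF_WRITE),
        "user": bool(error_code & PF_USER),
        "reserved": bool(error_code & PF_RSVD),
        "instruction_fetch": bool(error_code & PF_INSTR),
    }


def p_bit_only_check(error_code):
    """Hypothetical: treat any protection fault as a SMAP violation."""
    return bool(error_code & PF_PRESENT)


def stricter_check(error_code, eflags_ac, address_is_user):
    """Hypothetical: also require a supervisor-mode access (U clear),
    EFLAGS.AC clear, and a target address in the user range."""
    return (
        bool(error_code & PF_PRESENT)
        and not (error_code & PF_USER)
        and not eflags_ac
        and address_is_user
    )


# A supervisor-mode write that hits a protection fault on a kernel address.
code = PF_PRESENT | PF_WRITE
print(decode_error_code(code))
print(p_bit_only_check(code))              # True
print(stricter_check(code, False, False))  # False
```

With the looser check, a protection fault on a kernel address would be reported as a SMAP violation even though SMAP only covers user-accessible pages.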

comment:11 by waddlesplash, 8 months ago

I think this may not really be a SMAP violation except by coincidence. I've encountered "SMAP violations" before that didn't make much sense, but when I "co"'d, I got a standard page fault, if memory serves. Can someone who can reproduce this problem try that and see if you get a second, different, KDL?

comment:12 by atomozero, 8 months ago

If you tell me what commands to give, I'll be happy to try them tonight. :)

comment:13 by pulkomandy, 8 months ago

The command is "continue", or "co" for short

by atomozero, 8 months ago

Attachment: PXL_20240331_200237410.jpg added

KDL 1° Screenshot

by atomozero, 8 months ago

Attachment: PXL_20240331_200251575.jpg added

KDL 2° Screenshot - 1° Continue

by atomozero, 8 months ago

Attachment: PXL_20240331_200302777.jpg added

KDL 3° Screenshot - 2° Continue

comment:14 by nephele, 6 months ago

Was this a problem in beta4? If not, I'd like to unschedule this from beta5.

comment:15 by pulkomandy, 6 months ago

If not, I'd like to unschedule this from beta5.

Why? It's a crashing bug; we should probably try to fix it.

There is no need to unschedule things until we are closer to the release. It's not set as a blocker, but it is something we should look into if possible. I think at least Waddlesplash should look into the extra captures that he requested? Indeed the SMAP violation turns into a page fault at the same address, but personally, I don't know what to make of it.

comment:16 by nephele, 6 months ago

Why? It's a crashing bug; we should probably try to fix it.

Yes, but does this need to be fixed in beta5?

I'm currently trying to see what is still blocking beta5, and this seems to be about 5 or so regressions. I would rather we focus on those and get beta5 released. :)

comment:17 by pulkomandy, 6 months ago

If you want to see what's blocking beta5, have a look at the tickets with priority blocker or critical in the beta5 milestone:

https://dev.haiku-os.org/query?milestone=R1%2Fbeta5&priority=blocker&priority=critical&status=assigned&status=in-progress&status=new&status=reopened&group=status

There is no need to remove the other tickets from the milestone to see that.

comment:18 by nephele, 6 months ago

I don't think that is accurate, since the regressions are not visible there, and the tasks are things to do just before the release.

Otherwise, apart from the exception handling we could release it right now. But afaik we try to fix the regressions before a release.

comment:19 by pulkomandy, 6 months ago

If there are regressions that should be blocking, we can (and probably should) increase their priority.

The other task (the update to OpenSSL) is not to be done just before the release. On the contrary, it should be done as early as possible, so that we can test it in nightlies for a while before the release. Doing it at the last minute would be a great way to introduce new regressions in the next release.

I plan to look into it, but I have had little time and energy to spend on Haiku lately (due to being busy with paid work and other projects). Not sure when things will clear up for me; the next few months keep filling up with other things...

comment:20 by waddlesplash, 6 months ago

Indeed the SMAP violation turns into a page fault at the same address, but personally, I don't know what to make of it.

Well, it probably means there isn't really a page there, somehow. How that happens I don't know, though.

comment:21 by waddlesplash, 2 months ago

Milestone: R1/beta5 → R1/beta6

move remaining tickets to beta6
