Opened 21 months ago

Closed 2 months ago

#13911 closed bug (fixed)

hrev51714-x86_gcc2_hybrid_anyboot kernel issue due to mild terminal usage and webpositive

Reported by: domcaf
Owned by: bonefish
Priority: normal
Milestone: Unscheduled
Component: File Systems/packagefs
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description (last modified by diver)

hrev51714-x86_gcc2_hybrid_anyboot kernel issue due to mild terminal usage and webpositive

REPRODUCTION STEPS:

  • Run from USB thumbdrive on physical hardware.
  • Make it to desktop.
  • Enter WiFi/WPA credentials.
  • Open Terminal and issue the following commands:
  • ping -c2 www.yahoo.com, for which successful responses were received.
  • set -o vi
  • ping -c2 www.google.com, for which successful responses were received.
  • perl -v, which produced the expected output.
  • Closed Terminal with the Ctrl-D key sequence.
  • Tried to open WebPositive, with no success.
  • Tried to open Terminal again, with no success.
  • The kernel debugger started up by itself. See the attached photos of the white-background debugger output and the "bt" command output. Multiple photos of each are provided in case one is easier to read than another.
  • I was able to get the same behavior multiple times using the above steps. Additional hardware info is also attached in case this is processor/hardware specific.
  • Thanks for reviewing!

Attachments (16)

IMG_20171228_153212.jpg (1.0 MB ) - added by domcaf 21 months ago.
Initial debugger output after it started automatically, background is white.
IMG_20171228_153234.jpg (926.9 KB ) - added by domcaf 21 months ago.
Initial debugger output after it started automatically, background is white, alternate photo in case more readable
bt-IMG_20171228_183238.jpg (943.0 KB ) - added by domcaf 21 months ago.
kernel "bt" command output
bt-IMG_20171228_183315.jpg (844.2 KB ) - added by domcaf 21 months ago.
kernel "bt" command output; alternate photo for readability
TestBed-system-info_20171227.txt (2.2 KB ) - added by domcaf 21 months ago.
Hardware info on which problem occurred using "sysinfo" utility when same system booted into Lubuntu Linux
IMG_20171125_164446.jpg (1.1 MB ) - added by domcaf 21 months ago.
Bios info # 1
IMG_20171125_164309.jpg (1.0 MB ) - added by domcaf 21 months ago.
Bios info # 2
IMG_20171125_164250.jpg (1.1 MB ) - added by domcaf 21 months ago.
Bios info # 3
IMG_20171125_164224.jpg (1.1 MB ) - added by domcaf 21 months ago.
Bios info # 4
IMG_20171125_164131.jpg (1002.0 KB ) - added by domcaf 21 months ago.
Bios info # 5
memtest86plus-IMG_20171229_003013.jpg (999.6 KB ) - added by domcaf 21 months ago.
memtest86+ results screenshot
memtest86plus-IMG_20171229_003025.jpg (1003.6 KB ) - added by domcaf 21 months ago.
memtest86+ results screenshot alternate for readability
bt.jpg (779.4 KB ) - added by domcaf 20 months ago.
bt output
bt_alt.jpg (961.3 KB ) - added by domcaf 20 months ago.
bt_alt for readability problems
syslog.jpg (966.0 KB ) - added by domcaf 20 months ago.
syslog | tail from KDL
syslog_alt.jpg (872.5 KB ) - added by domcaf 20 months ago.
syslog | tail from KDL alt for readability problems

Change History (31)

by domcaf, 21 months ago

Attachment: IMG_20171228_153212.jpg added

Initial debugger output after it started automatically, background is white.

by domcaf, 21 months ago

Attachment: IMG_20171228_153234.jpg added

Initial debugger output after it started automatically, background is white, alternate photo in case more readable

by domcaf, 21 months ago

Attachment: bt-IMG_20171228_183238.jpg added

kernel "bt" command output

by domcaf, 21 months ago

Attachment: bt-IMG_20171228_183315.jpg added

kernel "bt" command output; alternate photo for readability

by domcaf, 21 months ago

Attachment: TestBed-system-info_20171227.txt added

Hardware info on which problem occurred using "sysinfo" utility when same system booted into Lubuntu Linux

by domcaf, 21 months ago

Attachment: IMG_20171125_164446.jpg added

Bios info # 1

by domcaf, 21 months ago

Attachment: IMG_20171125_164309.jpg added

Bios info # 2

by domcaf, 21 months ago

Attachment: IMG_20171125_164250.jpg added

Bios info # 3

by domcaf, 21 months ago

Attachment: IMG_20171125_164224.jpg added

Bios info # 4

by domcaf, 21 months ago

Attachment: IMG_20171125_164131.jpg added

Bios info # 5

comment:1 by waddlesplash, 21 months ago

Keywords: kernel debugger self start removed
Priority: high → normal

Can you please get the file /var/log/syslog from the booted Haiku system (before you reproduce the crash, of course) and upload it here? (Or is that somehow not possible?)

Secondly, have you run a memtest86+ on this system recently?

comment:2 by domcaf, 21 months ago

memtest86+ ran through 3 full cycles with no errors before I stopped testing. Please see the attached screenshots. I will try to get you the syslog soon.

by domcaf, 21 months ago

Attachment: memtest86plus-IMG_20171229_003013.jpg added

memtest86+ results screenshot

comment:3 by domcaf, 21 months ago

Has a Patch: set

by domcaf, 21 months ago

Attachment: memtest86plus-IMG_20171229_003025.jpg added

memtest86+ results screenshot alternate for readability

comment:4 by diver, 21 months ago

Component: - General → File Systems/packagefs
Description: modified (diff)
Owner: changed from nobody to bonefish
Platform: x86 → All

comment:5 by diver, 21 months ago

Looks like Haiku KDLs because packagefs fails to read package content. Might be a problem with the thumb drive.

comment:6 by domcaf, 21 months ago

I'll try a different thumbdrive and post back what happens.

comment:7 by domcaf, 21 months ago

I was NOT able to reproduce this bug when I used a different thumb drive. Sorry for the "red herring". I will try multiple thumb drives BEFORE reporting other issues that I might experience. Thank you all for your patience. I consider this issue resolved.

comment:8 by waddlesplash, 21 months ago

Resolution: invalid
Status: new → closed

comment:9 by mmlr, 21 months ago

Resolution: invalid
Status: closed → reopened

It should still not crash in this way. The page fault indicates that some error condition isn't checked/handled. Probably a buffer that failed to allocate/transfer but was then still used. The error should bubble up and eventually be shown to the user somehow, not KDL.

Since it seems that you are able to readily reproduce the issue, can you keep that stick around for a while so that any investigation may get more info if needed? Could you try to reproduce the issue and print the last part of the syslog in KDL with 'syslog | tail' and post the output here? Maybe the issue triggering this leaves some traces which may make it easier to find the problem.
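As an illustration of the kind of check described in this comment, here is a minimal sketch. It is not Haiku's packagefs code: `Status`, `FakeDevice`, and `ReadFirstByte` are hypothetical names made up for this example. It only shows the general pattern of checking both the allocation and the transfer before the buffer is used, and handing any failure back to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical status codes for this sketch (not Haiku's real status_t values).
enum Status { kOk = 0, kIoError = -1, kNoMemory = -2 };

// Hypothetical stand-in for a block device whose transfers may fail.
struct FakeDevice {
    bool failReads = false;

    // Fills `out` with `size` bytes on success; leaves it untouched on failure.
    Status Read(uint8_t* out, size_t size) const
    {
        assert(out != nullptr);
        if (failReads)
            return kIoError;
        std::memset(out, 0xAB, size);
        return kOk;
    }
};

// Reads one block and reports its first byte through `firstByte`.
// Every failure is returned to the caller; the buffer is only
// dereferenced after both the allocation and the read succeeded.
Status ReadFirstByte(const FakeDevice& device, size_t blockSize,
    uint8_t& firstByte)
{
    if (blockSize == 0)
        return kIoError;

    std::unique_ptr<uint8_t[]> buffer(new (std::nothrow) uint8_t[blockSize]);
    if (!buffer)
        return kNoMemory;   // allocation failed: do not touch the buffer

    Status status = device.Read(buffer.get(), blockSize);
    if (status != kOk)
        return status;      // transfer failed: buffer contents are undefined

    firstByte = buffer[0];
    return kOk;
}
```

With a failing device, `ReadFirstByte` returns `kIoError` and leaves `firstByte` unchanged, so the caller can report the problem instead of working on garbage data.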

comment:10 by domcaf, 21 months ago

I've set aside the thumb drive for additional research on this issue. Do you want additional reproduction attempts with the same build as initially reported, or with a more recent one?

comment:11 by mmlr, 21 months ago

It is quite possible that the problem only occurs when hitting certain blocks of the stick or only with a certain access pattern. Since modifying the FS layout by updating packages may make the nicely reproducible test case go away, I'd stick to the current setup for now. Once the problem is better understood, updating may make sense.

comment:12 by pulkomandy, 20 months ago

Has a Patch: unset

comment:13 by domcaf, 20 months ago

I got to the point where I couldn't get the system to boot off the thumb drive, so I re-imaged it with the same build as reported and tried again. With the re-imaged thumb drive, I'd get to the blue background of the desktop and then, after about 30 seconds, end up in the kernel debugger (KDL). Attached are photos of the "bt" and "syslog | tail" output.

by domcaf, 20 months ago

Attachment: bt.jpg added

bt output

by domcaf, 20 months ago

Attachment: bt_alt.jpg added

bt_alt for readability problems

by domcaf, 20 months ago

Attachment: syslog.jpg added

syslog | tail from KDL

by domcaf, 20 months ago

Attachment: syslog_alt.jpg added

syslog | tail from KDL alt for readability problems

comment:14 by mmlr, 20 months ago

Thanks for the update.

The syslog shows that the read fails because of a timeout on the USB device. Since the block number seems to have changed, it doesn't look like a specific region of the device is faulty. So there probably is a compatibility issue that leads to the device timeouts. This is an issue that should be investigated separately though.

What should be solved here is that the read error apparently isn't handled and therefore leads to the KDL instead of bubbling up.
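A rough sketch of what "bubbling up" could look like across layers follows. All names here (`UsbTransfer`, `ReadPackageData`, `OpenApplication`, and the `Status` values) are hypothetical and chosen only for illustration; the real call chain in Haiku is different and more involved. The point is only that each layer passes the timeout on unchanged until something can report it to the user.

```cpp
#include <string>

// Hypothetical status codes for this sketch.
enum Status { kOk = 0, kTimedOut = -1, kIoError = -2 };

// Lowest layer: a pretend USB transfer that times out when the
// device does not respond.
Status UsbTransfer(bool deviceResponds)
{
    return deviceResponds ? kOk : kTimedOut;
}

// Middle layer: a pretend file-system read built on the transfer.
// A failure is passed along instead of continuing with missing data.
Status ReadPackageData(bool deviceResponds)
{
    Status status = UsbTransfer(deviceResponds);
    if (status != kOk)
        return status;
    return kOk;
}

// Top layer: turns the status into a message the user can see.
std::string OpenApplication(bool deviceResponds)
{
    Status status = ReadPackageData(deviceResponds);
    switch (status) {
        case kOk:
            return "launched";
        case kTimedOut:
            return "could not launch: device timed out";
        default:
            return "could not launch: I/O error";
    }
}
```

In this sketch a non-responding device ends in a failed launch with a readable message rather than a crash.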

comment:15 by waddlesplash, 2 months ago

Resolution: fixed
Status: reopened → closed

The actual read timeouts were likely solved by the XHCI changes, and the KDL is yet another one that was solved by hrev52646. (I think we are somewhere in excess of 15-20 tickets closed by that now.)
