Opened 6 years ago

Closed 3 years ago

Last modified 3 years ago

#10784 closed bug (no change required)

My laptop turns off because of GPU overheating

Reported by: Premislaus Owned by: kallisti5
Priority: normal Milestone: R1
Component: Drivers/Graphics/radeon_hd Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

My laptop turns off because the GPU overheats. I had the same problem on Linux with kernels older than 3.13.

I ran in VESA mode because of #9894.

I have two graphics cards in my laptop: a Radeon HD 7520G and a Radeon HD 7670M. My laptop is a Samsung NP355V5C-S05PL with an A6-4400M APU.

Attachments (3)

listdev (3.5 KB ) - added by Premislaus 6 years ago.
syslog.old (512.1 KB ) - added by Premislaus 6 years ago.
syslog (160.9 KB ) - added by Premislaus 6 years ago.


Change History (19)

by Premislaus, 6 years ago

Attachment: listdev added

by Premislaus, 6 years ago

Attachment: syslog.old added

by Premislaus, 6 years ago

Attachment: syslog added

comment:1 by Premislaus, 6 years ago

It runs for at most 10 minutes before turning off.

No problems on Windows 8.1.

Last edited 6 years ago by Premislaus

comment:2 by Premislaus, 6 years ago

Last edited 6 years ago by Premislaus

comment:3 by Premislaus, 6 years ago

No problems on Ubuntu 14.04 and Windows 8.1. On Ubuntu it even runs cooler than on Windows.

On Haiku the cooling fan runs at full power, but the laptop still gets hot and turns off after a few minutes.

comment:4 by Premislaus, 6 years ago

It probably works "fine" in VESA mode. I have to blacklist the radeon_hd accelerant and use the fail-safe video mode.

After a while I got this KDL:

KERN: vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x71162ff0, ip 0x115b196, write 1, user 1, thread 0x153d
KERN: vm_page_fault: thread "w:846:offscreen" (5437) in team "app_server" (520) tried to write address 0x71162ff0, ip 0x115b196 ("app_server_seg0ro" +0x82196)
KERN: debug_server: Thread 5437 entered the debugger: Segment violation
KERN: stack trace, current PC 0x115b196  HasClipping__C9DrawState + 0x6:
KERN:   (0x71163008)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163038)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163068)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163098)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711630c8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711630f8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163128)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163158)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163188)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711631b8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711631e8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163218)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163248)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163278)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711632a8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711632d8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163308)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163338)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163368)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163398)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711633c8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711633f8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163428)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163458)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163488)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711634b8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711634e8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163518)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163548)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163578)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711635a8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711635d8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163608)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163638)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163668)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163698)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711636c8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711636f8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163728)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163758)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163788)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711637b8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711637e8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163818)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163848)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163878)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711638a8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x711638d8)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163908)  0x115b1d3  HasClipping__C9DrawState + 0x43
KERN:   (0x71163938)  0x115b1d3  HasClipping__C9DrawState + 0x43

comment:5 by Premislaus, 6 years ago

Unfortunately, the laptop shut down again :/.

comment:6 by kallisti5, 6 years ago

Hm. We don't touch GPU power management for this very reason (we let the AtomBIOS manage power on its own). I'll reach out to my AMD contacts to see if this is a known issue.

It could be that the laptop vendor kept the voltages a bit too high to boost fps and didn't adjust the ASIC settings to match. A lot of vendors only test their machines running "Windows and the stock drivers", and thus miss these kinds of bugs. Linux likely doesn't see the issue because it takes over power management and probably has a wider safety margin than the Windows drivers.

The only solution in this case would be to take over GPU power management ourselves, but that requires a lot of new code and a *LOT* of testing, since we would be more likely to overheat a wider range of systems.

Version 0, edited 6 years ago by kallisti5

comment:7 by kallisti5, 5 years ago

Something feels off about your laptop, given the error you saw when booting with VESA. Could you run memtest on it, just to rule out memory corruption? Most Linux ISOs include a memtest boot option (Ubuntu, for example, has one).

comment:8 by Premislaus, 5 years ago

I ran Memtest86+ v4.20. No errors.

comment:9 by umccullough, 5 years ago

I'm not sure I've seen an overheating GPU cause a machine to shut down - usually you just get visual artifacts, and occasionally a hardware hang.

Thermal shutdown is usually a CPU feature, however, and depending on the CPU model the shutdown temperature may vary. I've seen some set at 75C, while many Intel CPUs shut down at 90C (I've had this happen, btw... when I disabled the thermal protection in the BIOS on a machine that didn't have the heatsink/fan properly seated on the CPU).

I would guess it could also happen if the northbridge chipset overheats - or maybe your laptop has some additional thermal protection built in that cuts power when it hits some specific case temp.

I don't suppose you have any way of tracking the various CPU, motherboard, etc. temperatures as you approach a shutdown event? Can you reproduce the behavior on, say, Linux with a heavy load applied to the machine?
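On Linux, the temperatures asked about above can be logged without extra tools. The following is a minimal sketch, assuming a kernel that exposes its sensors under /sys/class/thermal (which zones exist varies by machine, and the radeon GPU sensor itself lives under hwmon and is not read here). The function names and the temps.log file are hypothetical.

```python
import glob
import os
import time


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"thermal_zoneN (type)": degrees_celsius} for each readable zone.

    Linux reports thermal_zone*/temp in millidegrees Celsius.
    """
    readings = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                kind = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # missing file or unparsable value: skip this zone
        readings[f"{os.path.basename(zone)} ({kind})"] = millideg / 1000.0
    return readings


def log_temperatures(interval_s=5.0, logfile="temps.log", base="/sys/class/thermal"):
    """Append one line of readings every interval_s seconds until interrupted.

    Each line is flushed and fsync'ed so the last readings should survive
    an abrupt power-off.
    """
    with open(logfile, "a") as out:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            temps = read_thermal_zones(base)
            line = ", ".join(f"{name}={t:.1f}C" for name, t in temps.items())
            out.write(f"{stamp} {line}\n")
            out.flush()
            os.fsync(out.fileno())
            time.sleep(interval_s)
```

Calling log_temperatures() from a Python shell while a heavy load runs should leave, after a shutdown, the last readings from just before power was cut at the end of temps.log.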

comment:10 by kallisti5, 4 years ago

Resolution: invalid
Status: new → closed

This one seemed strange when reported. Since we don't touch the radeon_hd power management, I'm going to attribute the issue to a quirk in the hardware implementation. Linux had the same issue, but since I didn't see any Linux quirks documented and GPU thermal management is left to the GPU, I'm going to close this one as an issue with the hardware rather than with our driver.

in reply to:  9 comment:11 by Premislaus, 3 years ago

Replying to kallisti5:

This one seemed strange when reported. Since we don't touch the radeon hd power management, i'm going to attribute the issue to a quirk in the implementation. Linux had the same issue, but since I didn't see any linux quirks documented and GPU thermal management is left to the GPU, i'm going to close this one as not an issue with our driver but the hardware.

This ticket is still valid. My laptop turns off from time to time.

On Linux you have DPM (dynamic power management) for Radeon and power saving for the CPU.

Replying to umccullough:

I don't suppose you have any way of tracking the various CPU/motherboard, etc. temps when you're approaching a shutdown event? Can you duplicate the behavior on say Linux with a heavy load applied to the machine?

On Linux it runs slightly hotter than on Windows, but I haven't had this problem since DPM was introduced.

https://wiki.archlinux.org/index.php/ATI#Dynamic_power_management

comment:12 by pulkomandy, 3 years ago

Resolution: invalid
Status: closed → reopened
Last edited 3 years ago by pulkomandy

comment:13 by Premislaus, 3 years ago

I think this ticket can finally be closed. I cleaned the dust out of my laptop and checked the memory with a proprietary memtest. Haiku has been fine for several days.

Haiku needs proper power management. Under Haiku my laptop runs a lot hotter than on Linux or Windows; even when idle, the air from the fan is hot. That is why my laptop shuts down from time to time: insane temperatures. But that belongs in another ticket for Haiku.

comment:14 by diver, 3 years ago

Resolution: no change required
Status: reopened → closed

comment:15 by tqh, 3 years ago

Have you checked whether there are any firmware updates for your computer? If the firmware followed the specs, it should not do that.

in reply to:  15 comment:16 by Premislaus, 3 years ago

Replying to tqh:

Have you checked if there are any firmware updates for your computer? It sounds like your firmware should not do that if they follow specs.

I have the latest EFI, and there are no more updates from Samsung for this particular laptop.

I think my CPU runs constantly at 2.7 GHz, without reclocking.

Years ago, this commit helped me on my old desktop: http://cgit.haiku-os.org/haiku/commit/?id=cc586f1655b94c248be58ba1752b42bc39fbaf03
