Opened 7 years ago

Closed 7 years ago

#8508 closed bug (fixed)

Various classes no longer appear to be resolving members with recent gcc upgrades

Reported by: anevilyak
Owned by: anevilyak
Priority: normal
Milestone: R1
Component: Applications/Debugger
Version: R1/Development
Keywords:
Cc: bonefish
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

Tracker's Model class and several others no longer show data members in the Debugger when compiled with gcc 4.6. This is most likely due to the latter making use of some newer DWARF features that we don't yet grok. Needs to be investigated.

Attachments (1)

debuginfo.zip (4.8 MB) - added by anevilyak 7 years ago.
Debug information

Change History (14)

by anevilyak, 7 years ago

Attachment: debuginfo.zip added

Debug information

comment:1 by anevilyak, 7 years ago

Cc: bonefish added

I'm somewhat puzzled as to what's going on here:

The definition for the Model class is a DIE with a tag type of structure (0x189f1d20). It contains children that define the values for several enum and union types contained within it, but not the data members. There is also a second compound type entry which does contain the data members (0x189f8148). That one has the Model's type entry tagged as a specification.

This seems backwards to me: normally, if we can't find what we need on the DIE itself, we look at the specification or abstract origin to resolve it, but in this case that's not possible because they're chained the opposite way around. I do notice that the DWARF type factory gets called for both types, but I'm wondering if, due to the order in which things happen, the one that lacks the data members makes it into the type cache first and consequently prevents the other from being added.

Any ideas, Ingo? This is readily reproducible with Tracker by putting a breakpoint at e.g. src/kits/tracker/Model.h:391.

Corresponding DIE parsing and local variable creation trace is attached.
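
To make the reversed chain described above concrete, here is a minimal sketch of one way a lookup could cope with it: index the specification links in both directions and collect members from every DIE that describes the same type. The `Die` and `DieTable` types are hypothetical simplifications for illustration only, not the Debugger's actual classes, and this is not necessarily how the issue was resolved.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified DIE (not the Debugger's real class).
struct Die {
    uint64_t offset = 0;               // offset of the DIE in .debug_info
    uint64_t specification = 0;        // DW_AT_specification target, 0 if absent
    std::vector<std::string> members;  // names of DW_TAG_member children
};

class DieTable {
public:
    void Add(const Die& die)
    {
        fDies[die.offset] = die;
        // Remember the reverse direction of the specification link too.
        if (die.specification != 0)
            fCompletedBy[die.specification].push_back(die.offset);
    }

    // Collects members from the DIE itself, from the DIE it names as its
    // specification, and from every DIE that names it as a specification.
    std::vector<std::string> MembersOf(uint64_t offset) const
    {
        std::vector<std::string> result;
        auto it = fDies.find(offset);
        if (it == fDies.end())
            return result;
        Append(result, it->second);

        if (it->second.specification != 0) {
            auto spec = fDies.find(it->second.specification);
            if (spec != fDies.end())
                Append(result, spec->second);
        }

        auto reverse = fCompletedBy.find(offset);
        if (reverse != fCompletedBy.end()) {
            for (uint64_t other : reverse->second) {
                auto found = fDies.find(other);
                assert(found != fDies.end());  // Add() inserted it
                Append(result, found->second);
            }
        }
        return result;
    }

private:
    static void Append(std::vector<std::string>& out, const Die& die)
    {
        out.insert(out.end(), die.members.begin(), die.members.end());
    }

    std::unordered_map<uint64_t, Die> fDies;
    std::unordered_map<uint64_t, std::vector<uint64_t>> fCompletedBy;
};
```

With this arrangement the order in which the two DIEs are encountered no longer matters, since the member list is assembled at lookup time rather than when the first DIE is cached.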

comment:2 by bonefish, 7 years ago

IIRC I've never read anything about how abstract origins or specifications should be handled, at least not in sufficient detail and for all cases. I believe I implemented it according to what I thought made sense from the debug data I encountered. It may well be that different handling is actually required, for example that all DIEs representing the same type need to be merged. It might be worth rereading the latest DWARF specs with that in mind and consulting other debugger implementations (gdb).

I also wouldn't rule out that gcc 4.6 produces incorrect debugging info.

comment:3 by anevilyak, 7 years ago (in reply to: 2)

Replying to bonefish:

> It might be worth rereading the latest DWARF specs with that in mind and consulting other debugger implementations (gdb).

What I've looked at in the DWARF 4 specification isn't entirely unambiguous in that respect, though it sounds reasonably similar to your interpretation and doesn't really suggest the scenario I'm currently seeing. Finding out what gdb does might take some time, since I'm not familiar with its codebase at all. Testing with our old in-tree build is inconclusive as well: it doesn't seem to think the corresponding lines exist in the debug info at all, so I'm unable to draw a comparison there yet.

> I also wouldn't rule out that gcc 4.6 produces incorrect debugging info.

Also entirely possible :/.

comment:4 by anevilyak, 7 years ago

Status: new → in-progress

comment:5 by anevilyak, 7 years ago

Problem found: gcc has apparently switched back to using .eh_frame for all the relevant information, and .debug_frame is either not present or, when it is, useless. Fixed in hrev44209.

comment:6 by anevilyak, 7 years ago

Resolution: fixed
Status: in-progress → closed

One note: in the cases where .debug_frame is generated at all, I do notice that it contains at least a CIE and maybe 3-4 CFI sections, though I wasn't yet able to ascertain exactly what functions they resolve to (it's observable with a debug build of libbe here at least). Do you think it'd be better to simply evaluate both sections for a potential match if they're present?

Also, I notice that currently we always do a brute force scan through all the data to try and find a match when reconstructing a call frame. Would it be worth filing an enhancement to perform a startup scan of the call frame information in order to build a search tree by address range so we can quickly locate the exact section we want? For larger libraries it often takes a noticeable amount of time to find the right CFI via the current methodology.
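
As a rough illustration of the address-range lookup proposed above, the sketch below keeps one record per FDE sorted by start address and finds the covering entry with a binary search. `FdeEntry` and `FdeIndex` are hypothetical names, and the code assumes that FDE ranges within one section do not overlap; it is not the Debugger's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for one FDE: the address range it covers and where
// its data starts within the frame section.
struct FdeEntry {
    uint64_t start;          // first covered address
    uint64_t length;         // number of bytes covered
    uint64_t sectionOffset;  // offset of the FDE within the section
};

class FdeIndex {
public:
    void Add(const FdeEntry& entry)
    {
        assert(entry.length > 0);
        fEntries.push_back(entry);
        fSorted = false;
    }

    // Returns the FDE covering the address, or nullptr if there is none.
    // The pointer is only valid until the next call to Add().
    const FdeEntry* Find(uint64_t address)
    {
        if (!fSorted) {
            std::sort(fEntries.begin(), fEntries.end(),
                [](const FdeEntry& a, const FdeEntry& b) {
                    return a.start < b.start;
                });
            fSorted = true;
        }

        // First entry whose start lies beyond the address; the candidate
        // is the entry directly before it.
        auto it = std::upper_bound(fEntries.begin(), fEntries.end(), address,
            [](uint64_t value, const FdeEntry& entry) {
                return value < entry.start;
            });
        if (it == fEntries.begin())
            return nullptr;
        --it;
        return address - it->start < it->length ? &*it : nullptr;
    }

private:
    std::vector<FdeEntry> fEntries;
    bool fSorted = true;
};
```

A sorted vector is used here instead of a tree because the entries are only written once per image; each lookup then costs O(log n) instead of a scan over the whole section.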

comment:7 by bonefish, 7 years ago (in reply to: 6)

Replying to anevilyak:

> Do you think it'd be better to simply evaluate both sections for a potential match if they're present?

Unless there's a good reason to do that -- like both sections containing different parts of the information -- I wouldn't do that. I wonder anyway why the .debug_frame is no longer correctly written. After all it is documented to contain the information. Have you read anything about this (e.g. in the changelog, some announcement, or discussion)? Or might this just be a bug?

> Also, I notice that currently we always do a brute force scan through all the data to try and find a match when reconstructing a call frame. Would it be worth filing an enhancement to perform a startup scan of the call frame information in order to build a search tree by address range so we can quickly locate the exact section we want? For larger libraries it often takes a noticeable amount of time to find the right CFI via the current methodology.

Or do that lazily/cache information. The startup already takes quite some time and doing more preprocessing will make that even worse. Although that's probably quite a bit of work, it might be necessary to preload even less information (load the CIEs on demand with caching). I may recall that incorrectly, but didn't debugging WebPositive (with full debug info for WebKit) even hit the address space limit?
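
The on-demand loading with caching suggested above could look roughly like the following sketch. `Cie`, `CieCache`, and the loader callback are hypothetical placeholders: the loader stands in for whatever code would parse a CIE at a given section offset, and nothing is parsed until the first request for that offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical subset of the data stored in a CIE.
struct Cie {
    uint64_t codeAlignment;
    int64_t dataAlignment;
};

class CieCache {
public:
    // The loader parses the CIE at the given section offset.
    using Loader = std::function<Cie(uint64_t)>;

    explicit CieCache(Loader loader)
        : fLoader(std::move(loader))
    {
        assert(fLoader != nullptr);
    }

    // Parses the CIE on first use and returns the cached copy afterwards.
    const Cie& Get(uint64_t offset)
    {
        auto it = fCache.find(offset);
        if (it == fCache.end())
            it = fCache.emplace(offset, fLoader(offset)).first;
        return it->second;
    }

    std::size_t LoadedCount() const { return fCache.size(); }

private:
    Loader fLoader;
    std::unordered_map<uint64_t, Cie> fCache;
};
```

The trade-off is that the cost moves from startup to the first stack unwind that touches a given CIE, and memory is only spent on entries that are actually used.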

comment:8 by anevilyak, 7 years ago (in reply to: 7)

Replying to bonefish:

> Unless there's a good reason to do that -- like both sections containing different parts of the information -- I wouldn't do that. I wonder anyway why the .debug_frame is no longer correctly written. After all it is documented to contain the information. Have you read anything about this (e.g. in the changelog, some announcement, or discussion)? Or might this just be a bug?

I'd guess bug, but I haven't had a chance to go through the mailing lists and announcements to find anything yet. For now I simply changed it as I did so we'd adapt to whatever the current situation happens to be.

> Or do that lazily/cache information. The startup already takes quite some time and doing more preprocessing will make that even worse. Although that's probably quite a bit of work, it might be necessary to preload even less information (load the CIEs on demand with caching). I may recall that incorrectly, but didn't debugging WebPositive (with full debug info for WebKit) even hit the address space limit?

We do indeed have that problem, though gdb does as well iirc. If memory serves, we actually hit the issue up front while parsing the DIEs, though I would need to try again to confirm. In any case, I will file an enhancement ticket for lazy evaluation/loading.

comment:9 by anevilyak, 7 years ago (in reply to: 7)

Replying to bonefish:

> Unless there's a good reason to do that -- like both sections containing different parts of the information -- I wouldn't do that. I wonder anyway why the .debug_frame is no longer correctly written. After all it is documented to contain the information. Have you read anything about this (e.g. in the changelog, some announcement, or discussion)? Or might this just be a bug?

The only related discussion I've managed to find is this: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40521

comment:10 by anevilyak, 7 years ago

Resolution: fixed (cleared)
Status: closed → reopened

The heuristic I added is apparently not quite correct either. I sometimes wind up with libraries where .debug_frame is several times larger than .eh_frame, but the appropriate FDE/CIE are contained in the .eh_frame section instead. I've checked out a git clone of gdb's latest source to see how they handle it, and their methodology is simply to try both sections in order: .debug_frame first, then .eh_frame if no match is found in the former. Should I simply adapt us to do likewise, since it seems this is the only sane way to cope with the changing whims of the gcc developers?
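
The lookup order described above (try .debug_frame, then fall back to .eh_frame) can be sketched as follows. `FrameSection` and `SelectFrameSection` are hypothetical stand-ins for illustration: a real implementation would search parsed FDEs rather than a plain list of address ranges, and this is neither gdb's code nor the change that went into the Debugger.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed frame section. Each pair is a
// half-open range [start, end) of addresses covered by one FDE.
struct FrameSection {
    std::string name;
    std::vector<std::pair<uint64_t, uint64_t>> ranges;

    bool Covers(uint64_t address) const
    {
        for (const auto& range : ranges) {
            assert(range.first < range.second);
            if (address >= range.first && address < range.second)
                return true;
        }
        return false;
    }
};

// Returns the first section, in preference order, that has an entry for
// the address, or nullptr if neither does. Either argument may be
// nullptr when the image lacks that section.
const FrameSection* SelectFrameSection(const FrameSection* debugFrame,
    const FrameSection* ehFrame, uint64_t address)
{
    const FrameSection* candidates[] = { debugFrame, ehFrame };
    for (const FrameSection* section : candidates) {
        if (section != nullptr && section->Covers(address))
            return section;
    }
    return nullptr;
}
```

Deciding per address rather than per image avoids any size-based guess about which section is the useful one.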

comment:11 by bonefish, 7 years ago

Sure, I don't see any other solution. I wonder if there's a logic behind all of this.

comment:12 by anevilyak, 7 years ago (in reply to: 11)

Replying to bonefish:

> Sure, I don't see any other solution. I wonder if there's a logic behind all of this.

I wish I knew as well. Will work on that in the next few days then.

comment:13 by anevilyak, 7 years ago

Resolution: fixed
Status: reopened → closed

Implemented in hrev44316.
