Opened 17 months ago

Closed 7 months ago

Last modified 6 months ago

#9140 closed bug (fixed)

KDL when configuring bison-2.6.4

Reported by: diger Owned by: axeld
Priority: critical Milestone: R1
Component: System/Kernel Version: R1/alpha4
Keywords: Cc:
Blocked By: Blocking: #10111
Has a Patch: no Platform: All

Description

hrev44732 gcc4, gcc4.6.3

When trying to configure bison, I get a KDL.

Attachments (4)

diger.png (97.1 KB) - added by diger 17 months ago.
bison-kdl.txt (26.7 KB) - added by siarzhuk 14 months ago.
cutout of crash log
kdl-groff.png (660.6 KB) - added by tidux 12 months ago.
bison-kdl-on-nightly-hrev46038.png (62.4 KB) - added by siarzhuk 7 months ago.
The KDL reproduced on newest available hrev46038 nightly.

Change History (28)

Changed 17 months ago by diger

comment:1 Changed 17 months ago by diger

KDL in vfs code. See screenshot for more details

comment:2 Changed 17 months ago by diver

  • Component changed from - General to System/Kernel
  • Owner changed from nobody to axeld

comment:3 Changed 17 months ago by diver

Might be a dupe of #1988.

comment:4 follow-up: Changed 14 months ago by siarzhuk

This test reproduces the behaviour:

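#include <fcntl.h>
#include <unistd.h>	/* symlink(), close() */

int
main ()
{
	int result = 0;
	static char const sym[] = "conftest.sym";
	/* Create a symlink named conftest.sym in the current directory. */
	if (symlink ("/dev/null", sym) != 0)
		result |= 2;
	else
	{
		/* With O_NOFOLLOW, open() is expected to fail on the existing
		   symlink. On affected revisions this call ends in the KDL. */
		int fd = open (sym, O_WRONLY | O_NOFOLLOW | O_CREAT, 0);
		if (fd >= 0)
		{
			close (fd);
			result |= 4;
		}
	}
	return result;
}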


comment:5 Changed 14 months ago by diger

This bug is reproduced when configuring gettext-runtime 0.18.2 & gettext-tools 0.18.2

comment:6 in reply to: ↑ 4 ; follow-up: Changed 14 months ago by anevilyak

Replying to siarzhuk:

This test reproduces the behaviour:

Hi Siarzhuk,

I don't suppose there's anything special about your and/or diger's system configuration? Thus far neither the above test nor any of the configure scripts mentioned are reproducing the panic over here. As a first hunch I tried switching to a cyrillic locale but that made no difference. hrev45350, gcc4, 8GB of RAM and 8 CPU cores over here for reference.

comment:7 in reply to: ↑ 6 Changed 14 months ago by siarzhuk

Replying to anevilyak:

I don't suppose there's anything special about your and/or diger's system configuration? Thus far neither the above test nor any of the configure scripts mentioned are reproducing the panic over here. As a first hunch I tried switching to a cyrillic locale but that made no difference. hrev45350, gcc4, 8GB of RAM and 8 CPU cores over here for reference.

Strange, it is reproducible both in VirtualBox and on real hardware on my home system. Maybe our /Sources partitions, which were created years ago, affect this. I'll check more widely then.

comment:8 Changed 14 months ago by siarzhuk

I have checked bison.c test on following systems:

hrev42604 GCC4/Hybrid (in VirtualBox)
hrevalpha4-44594
hrev45141 x86_64
hrev43037 GCC4/Hybrid
hrev45223 GCC4/Hybrid
hrev44869 GCC4/Hybrid

The test fails on all systems with the same error. :-|

comment:9 follow-up: Changed 14 months ago by anevilyak

Always with the same set of partitions? Or does e.g. a completely clean virtualbox image with no other partitions mounted exhibit the same issue?

comment:10 in reply to: ↑ 9 ; follow-up: Changed 14 months ago by siarzhuk

Replying to anevilyak:

Always with the same set of partitions? Or does e.g. a completely clean virtualbox image with no other partitions mounted exhibit the same issue?

By the way, the VirtualBox case above (#1 in the list) is exactly such a "completely clean" image. Cases 2, 3 and 4 are different partitions of the same PC; 5 and 6 are different partitions of another PC. Also, diger.png was captured in VirtualBox on my work PC. In all cases I copied bison.c to the home directory, ran "gcc bison.c" and then ran the resulting a.out file.

comment:11 in reply to: ↑ 10 ; follow-ups: Changed 14 months ago by anevilyak

Replying to siarzhuk:

By the way, the VirtualBox case above (#1 in the list) is exactly such a "completely clean" image. Cases 2, 3 and 4 are different partitions of the same PC; 5 and 6 are different partitions of another PC. Also, diger.png was captured in VirtualBox on my work PC. In all cases I copied bison.c to the home directory, ran "gcc bison.c" and then ran the resulting a.out file.

Tried exactly those steps, still no luck. Could you by any chance try enabling VFS tracing (http://cgit.haiku-os.org/haiku/tree/src/system/kernel/fs/vfs.cpp#n66 ), and then paste the resulting serial output from vbox here?

Changed 14 months ago by siarzhuk

cutout of crash log

comment:12 in reply to: ↑ 11 Changed 14 months ago by siarzhuk

Replying to anevilyak:

Could you by any chance try enabling VFS tracing (http://cgit.haiku-os.org/haiku/tree/src/system/kernel/fs/vfs.cpp#n66 ), and then paste the resulting serial output from vbox here?

It was a bit tricky. First I had to disable syslog because, I suspect, it never stops tracing into the system log about its own writes to the system log. Then I waited about 3 hours, without success, for it to finish loading app_server and the other bells and whistles. Finally I just hardcoded "launch /bin/consoled" into the boot script, which let me run a.out and get the KDL. :-) I hope it helps.

comment:13 Changed 12 months ago by tidux

I was able to reproduce this on both hrev45480 and hrev45526 when attempting to configure groff 1.22.2, on a physical machine and a virtual machine. Here's a screenshot of the VM crashing.

Changed 12 months ago by tidux

comment:14 Changed 11 months ago by diger

hrev45703 gcc4.7.3

reproduced when configuring gettext-runtime 0.18.2 & gettext-tools 0.18.2 & bison & groff

comment:15 in reply to: ↑ 11 ; follow-up: Changed 10 months ago by siarzhuk

Replying to anevilyak:

Tried exactly those steps, still no luck. Could you by any chance try enabling VFS tracing (http://cgit.haiku-os.org/haiku/tree/src/system/kernel/fs/vfs.cpp#n66 ), and then paste the resulting serial output from vbox here?

Any news here? Diger tells me that more and more software packages are affected by this problem. He is the maintainer of the Haiku port of the PKGSRC system and can watch the problem grow in real time. ;-)

This looks like an issue with the newest autoconf versions (>=2.69) and may become a serious problem as soon as we try to recompile the optional packages in preparation for the next Haiku release, IMO.

comment:16 in reply to: ↑ 15 Changed 10 months ago by anevilyak

Replying to siarzhuk:

Any news here? Diger tells me that more and more software packages are affected by this problem. He is the maintainer of the Haiku port of the PKGSRC system and can watch the problem grow in real time. ;-)

Speaking for myself only, it's still completely impossible to reproduce on my own hardware, and my knowledge of the VFS is otherwise too limited to go off the log output alone. I'd hoped one of the other kernel developers who had more exposure/experience with that code would comment. There's a possibility it could in some way be related to some of the other races involving get_vnode() i.e. #5262 or #9839 though.

comment:17 Changed 7 months ago by diger

  • Priority changed from normal to critical

comment:18 Changed 7 months ago by diger

hrev46032 gcc4.7.3

reproduced when configuring gtexinfo & libidn

BTW, in my 10 months' experience, about 2-3 such KDLs are enough to damage the FS unrecoverably.

Changed 7 months ago by siarzhuk

The KDL reproduced on newest available hrev46038 nightly.

comment:19 Changed 7 months ago by siarzhuk

Reproduced in VirtualBox 4.2.18.hrev88780 using bison.c on the latest available GCC2 nightly hrev46038

comment:20 follow-up: Changed 7 months ago by siarzhuk

Hm... Just a quick look: create_vnode's openMode parameter is 524801, which corresponds to O_CREAT | O_WRONLY | O_NOFOLLOW (0x80201). So the traverse variable in the code below should be set to false.

The only call of VNodePutter::Put is "protected" by if (... && traverse), so it should not be issued when traverse is false. But it was.

Was O_NOFOLLOW defined to a value other than 0x00080000 when vfs.cpp was compiled? Or have I missed something? ;-)

static int
create_vnode(struct vnode* directory, const char* name, int openMode,
	int perms, bool kernel)
{
	bool traverse = ((openMode & (O_NOTRAVERSE | O_NOFOLLOW)) == 0);

[...]
			// If the node is a symlink, we have to follow it, unless
			// O_NOTRAVERSE is set.
			if (S_ISLNK(vnode->Type()) && traverse) {
				putter.Put();

comment:21 in reply to: ↑ 20 ; follow-up: Changed 7 months ago by bonefish

Replying to siarzhuk:

The only call of VNodePutter::Put is "protected" by if (... && traverse), so it should not be issued when traverse is false. But it was.

VNodePutter is a RAII style class. Put() is also called in the destructor.

There's an erroneous put_vnode() (probably overlooked when changing the code to use VNodePutter) in an error case. So I suppose there already exists a symlink where the file shall be created.

comment:22 in reply to: ↑ 21 Changed 7 months ago by siarzhuk

Replying to bonefish:

VNodePutter is a RAII style class. Put() is also called in the destructor.

Ah... That is what I missed. :)

There's an erroneous put_vnode() (probably overlooked when changing the code to use VNodePutter) in an error case. So I suppose there already exists a symlink where the file shall be created.

Yes, that is exactly what this configure test does: it attempts to create the file in place of an existing symlink. Thank you for pointing it out!

comment:23 Changed 7 months ago by siarzhuk

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in hrev46039. Thanks again!

comment:24 Changed 6 months ago by anevilyak

  • Blocking 10111 added

(In #10111) Duplicate indeed.