Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#9798 closed bug (fixed)

mount_nfs hangs, blocked in NFS add-on

Reported by: jua Owned by: pdziepak
Priority: normal Milestone: R1
Component: Network & Internet/UDP Version: R1/Development
Keywords: nfs nfs_mount Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

hrev45633

When I try to mount NFS shares (old NFS, not v4) using mount_nfs, something goes wrong: the command-line tool hangs and cannot be killed, neither with Ctrl+C in the terminal nor via ProcessController. The only way to get rid of it is a reboot.

I've tracked the problem down to the NFS file system add-on. The following happens there:

(1) fs_mount() in nfs_add_on.c calls nfs_mount(), which fails for some reason (why it fails may be material for another bug report; I have not investigated that yet).
(2) fs_mount() therefore goes to its error handling and calls shutdown_postoffice().
(3) shutdown_postoffice() sets the quit flag for the postoffice thread to true, closes the socket, and then waits for the postoffice thread to exit using wait_for_thread(). This wait_for_thread() call never returns, which causes the hang.
(4) Meanwhile, in the postoffice thread: the thread is blocked in recvfrom() inside its main loop in postoffice_func(). Since nothing is received anymore, it waits there forever and never sees that its quit flag was set, so it never terminates.
(5) => deadlock!

I'm not quite sure what the correct way to handle this would be. Maybe a simple workaround would be to set a read timeout on the socket, so that the postoffice thread would at least terminate eventually. Any ideas?
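The read-timeout workaround suggested above can be sketched in portable C. This is a minimal sketch under stated assumptions, not the add-on's code: POSIX threads stand in for the add-on's own thread calls (the ticket mentions wait_for_thread()), and struct postoffice, start_postoffice(), and the 200 ms timeout are hypothetical. The functions named postoffice_func() and shutdown_postoffice() are simplified stand-ins, not the real ones from nfs_add_on.c. SO_RCVTIMEO makes recvfrom() fail with EAGAIN or EWOULDBLOCK once the timeout expires, which gives the loop a chance to re-check its quit flag.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-in for the add-on's postoffice state. */
struct postoffice {
    int sock;
    atomic_bool quit;
};

/* Receive loop: recvfrom() gives up after SO_RCVTIMEO, so the quit
 * flag is re-checked at least once per timeout period. */
static void *postoffice_func(void *arg)
{
    struct postoffice *po = arg;
    char buf[1024];

    while (!atomic_load(&po->quit)) {
        ssize_t n = recvfrom(po->sock, buf, sizeof(buf), 0, NULL, NULL);
        if (n >= 0)
            continue; /* a real add-on would dispatch the reply here */
        if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
            continue; /* timeout or signal: loop back to the flag check */
        break;        /* any other error: give up */
    }
    return NULL;
}

int start_postoffice(struct postoffice *po, pthread_t *thread)
{
    struct sockaddr_in addr;
    struct timeval tv = { .tv_sec = 0, .tv_usec = 200000 }; /* hypothetical 200 ms */

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    atomic_init(&po->quit, false);
    po->sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (po->sock < 0)
        return -1;
    if (bind(po->sock, (struct sockaddr *)&addr, sizeof(addr)) < 0
        || setsockopt(po->sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0
        || pthread_create(thread, NULL, postoffice_func, po) != 0) {
        close(po->sock);
        return -1;
    }
    return 0;
}

/* Set the flag, wait for the thread, and only then close the socket,
 * so the descriptor cannot be reused while the thread still reads it. */
int shutdown_postoffice(struct postoffice *po, pthread_t thread)
{
    int err;

    atomic_store(&po->quit, true);
    err = pthread_join(thread, NULL); /* returns within about one timeout */
    close(po->sock);
    return err;
}
```

The cost of this approach is that shutdown can take up to one timeout period, and the idle thread wakes up periodically. Note also that, unlike the sequence described in the ticket, the sketch closes the socket only after the thread has exited.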

Change History (6)

comment:1 Changed 6 years ago by diver

Component: changed from Network & Internet to File Systems/NFS
Owner: changed from nobody to mmu_man

comment:2 Changed 6 years ago by axeld

Component: changed from File Systems/NFS to Network & Internet/UDP
Owner: changed from mmu_man to zooey

While the behavior of the socket is undefined in this case, the policy Haiku follows is that functions waiting on a file descriptor will return when the file descriptor is closed.

If UDP (I assume?) does not follow that policy, it should be fixed. Thanks for the investigation!
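On a platform that does not follow that policy, a common portable way to make a blocked receiver wakeable is the self-pipe technique. This is a different approach from the one taken in this ticket: the thread waits in poll() on both the socket and the read end of a pipe, and shutdown writes one byte to the pipe. A minimal sketch, again with POSIX threads and hypothetical names (struct postoffice, start_postoffice(), the wake pipe) rather than the add-on's actual code:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical postoffice state: the UDP socket plus a wake-up pipe. */
struct postoffice {
    int sock;
    int wake[2]; /* wake[0] is polled by the thread, wake[1] is written on shutdown */
};

static void *postoffice_func(void *arg)
{
    struct postoffice *po = arg;
    char buf[1024];

    for (;;) {
        struct pollfd fds[2] = {
            { .fd = po->sock, .events = POLLIN },
            { .fd = po->wake[0], .events = POLLIN },
        };
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (fds[1].revents != 0)
            break; /* shutdown requested */
        if (fds[0].revents & (POLLERR | POLLNVAL))
            break; /* the socket itself is unusable */
        if (fds[0].revents & POLLIN)
            (void)recv(po->sock, buf, sizeof(buf), 0); /* a real add-on would dispatch the reply */
    }
    return NULL;
}

int start_postoffice(struct postoffice *po, pthread_t *thread)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (pipe(po->wake) < 0)
        return -1;
    po->sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (po->sock < 0
        || bind(po->sock, (struct sockaddr *)&addr, sizeof(addr)) < 0
        /* non-blocking, so a datagram dropped after poll() cannot stall recv() */
        || fcntl(po->sock, F_SETFL, fcntl(po->sock, F_GETFL) | O_NONBLOCK) < 0
        || pthread_create(thread, NULL, postoffice_func, po) != 0) {
        if (po->sock >= 0)
            close(po->sock);
        close(po->wake[0]);
        close(po->wake[1]);
        return -1;
    }
    return 0;
}

int shutdown_postoffice(struct postoffice *po, pthread_t thread)
{
    char byte = 1;
    int err;

    if (write(po->wake[1], &byte, 1) != 1)
        return -1; /* could not signal the thread; do not wait forever */
    err = pthread_join(thread, NULL); /* poll() wakes on the pipe byte */
    close(po->sock);
    close(po->wake[0]);
    close(po->wake[1]);
    return err;
}
```

Compared with a receive timeout, shutdown here is immediate and the idle thread never wakes up periodically. The trade-off is one extra pipe per postoffice.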

comment:3 Changed 6 years ago by pdziepak

Owner: changed from zooey to pdziepak
Status: changed from new to assigned

I came across this issue last summer when working on NFS4; unfortunately, I completely forgot to commit the patch.

hrev45719 should fix this bug for UDP. I have no idea, though, whether the problem exists in the implementations of other transport layer protocols (i.e. TCP, since we do not support SCTP). Anyway, our NFS2 client is UDP-only, so I believe this ticket can be closed.

comment:4 Changed 6 years ago by pdziepak

Resolution: set to fixed
Status: changed from assigned to closed

comment:5 Changed 6 years ago by axeld

I've definitely developed software that relied on that specific feature in TCP, so I'm pretty sure it worked at least at one point, and it probably still does :-)

comment:6 Changed 6 years ago by jua

Great, thanks!
