Opened 7 years ago

Last modified 11 months ago

#10169 assigned bug

Check kernel functions' stack usage

Reported by: korli Owned by: nobody
Priority: normal Milestone: Unscheduled
Component: System/Kernel Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: x86-64


I reproduced on real hardware and QEmu 1.4.0

Here is the QEmu command line (090c:1000 is the vendor_id:device_id for the usb key)

qemu-system-x86_64 -m 512 -usb -serial stdio  --enable-kvm -usbdevice host:090c:1000 -boot menu=on

then press F12, then select the USB device.

As you may understand this bug is a bit blocking to test x86_64 on real hardware :)

Attachments (1)

log_usb_boot_hrev46281_x86_64.txt (37.7 KB ) - added by korli 7 years ago.
panic serial log

Download all attachments as: .zip

Change History (19)

by korli, 7 years ago

panic serial log

comment:1 by korli, 7 years ago

It seems the kernel stack is the same size on 32-bit and 64-bit, which might explain the behavior difference between architectures. The following patch fixes the problem by increasing the kernel stack size by another page.

Is it OK as is or would it be better to differentiate with B_HAIKU_64_BIT?

diff --git a/headers/private/kernel/kernel.h b/headers/private/kernel/kernel.h
index a5c01e3..d068a48 100644
--- a/headers/private/kernel/kernel.h
+++ b/headers/private/kernel/kernel.h
@@ -47,7 +47,7 @@
        // At least, you then know that the stack overflows in this case :)
 /** Size of the kernel stack */
-#define KERNEL_STACK_SIZE              (B_PAGE_SIZE * 3)       // 12 kB
+#define KERNEL_STACK_SIZE              (B_PAGE_SIZE * 4)       // 16 kB

comment:2 by axeld, 7 years ago

IIRC it's only 3 pages because of the guard pages -- at least IIRC it used to be 8 kB. Anyway, it should be a 64-bit only change. The kernel stacks can take a lot of (unpageable) memory when you have many threads around.

comment:3 by korli, 7 years ago

Fine. But isn't the 2832 bytes stack usage in vm_page_write_modified_page_range() a bit high anyway?

ffffffff811a8010 (+2832) ffffffff8010f3b4   <kernel_x86_64> vm_page_write_modified_page_range() + 0x4cb

in reply to:  3 comment:4 by bonefish, 7 years ago

Replying to korli:

Fine. But isn't the 2832 bytes stack usage in vm_page_write_modified_page_range() a bit high anyway?

Yes, that's way too much. The main issue is

PageWriteWrapper* wrappers[maxPages];

With maxPages being up to 256, this means 2048 bytes on a 64 bit architecture. It should be allocated in the same way wrapperPool is allocated (i.e. on the heap with small stack array fallback).

Another issue is the PageWriteTransfer object allocated on the stack. The class contains the following attribute:

generic_io_vec      fVecs[32]; // TODO: make dynamic/configurable

That's another 512 bytes.

One can see several heavy stack users in the stack trace. The file cache's read_into_cache() being the most problematic, since there's no limit for nested mounting (e.g. an image file in an image file in an image file ...), with certain calls going through as many file cache layers.

I suppose many functions could use a review wrt stack usage. Even 200 bytes is quite a bit. And all code that isn't performance critical and is allowed to fail could allocate variables on the heap.

comment:5 by korli, 7 years ago

Applied a working patch in hrev46314.

comment:6 by korli, 7 years ago

Here is a patch for the wrappers stack usage issue. I have no idea better than stackWrappers2 for the variable name ATM.

Testing with a 128MB VM seemed OK. I encountered "PageWriteWrapper: Failed to write page" messages in the syslog on low memory condition with or without this patch.

diff --git a/src/system/kernel/vm/vm_page.cpp b/src/system/kernel/vm/vm_page.cpp
index ee20257..34d7370 100644
--- a/src/system/kernel/vm/vm_page.cpp
+++ b/src/system/kernel/vm/vm_page.cpp
@@ -3081,7 +3081,14 @@ vm_page_write_modified_page_range(struct VMCache* cache, uint32 firstPage,
        int32 nextWrapper = 0;
-       PageWriteWrapper* wrappers[maxPages];
+       PageWriteWrapper* stackWrappers2[1];
+       PageWriteWrapper** wrappers
+               = new(malloc_flags(allocationFlags)) PageWriteWrapper*[maxPages];
+       if (wrappers == NULL) {
+               // don't fail, just limit our capabilities
+               wrappers = stackWrappers2;
+               maxPages = 1;
+       }
        int32 usedWrappers = 0;
        PageWriteTransfer transfer;
@@ -3153,6 +3160,8 @@ vm_page_write_modified_page_range(struct VMCache* cache, uint32 firstPage,
        if (wrapperPool != stackWrappers)
                delete[] wrapperPool;
+       if (wrappers != stackWrappers2)
+               delete[] wrappers;
        return B_OK;

comment:7 by bonefish, 7 years ago

Regarding the variable naming, I would rename the old stackWrappers to stackWrapperPool (to match the naming of the actual variable) and go with stackWrappers for the new variable.

I'd also merge the two allocation checks into a single if, like this:

PageWriteWrapper* wrapperPool = ... allocate ...
PageWriteWrapper** wrappers = ... allocate ...
if (wrapperPool == NULL || wrappers == NULL) {
  delete both
  use stack

comment:8 by korli, 7 years ago

Thanks for the feedback. Applied with your ideas in hrev46354.

read_into_cache() has already a TODO about the stack usage. It would be interesting if there is any static analysis tool to provide stack usage information, but I doubt any exists.

in reply to:  8 ; comment:9 by umccullough, 7 years ago

Replying to korli:

It would be interesting if there is any static analysis tool to provide stack usage information, but I doubt any exists.

Not sure what you're looking for specifically. Coverity does have a "stack depth" checker but I have little clue how to configure it. I found this snippet in the documentation:

A.  __coverity_stack_depth__(<max_memory>)

Indicates to the STACK_USE checker that the function and its callees should not use more memory (in bytes) than specified by the constant integer <max_memory>. This feature is useful for situations where threads are created with different stack sizes. The primitive should be used in the thread entry-point function.

It seems perhaps you can create a "model" to declare this primitive and it will check the stack below any given function that is modeled.

in reply to:  9 ; comment:10 by korli, 7 years ago

Replying to umccullough:

It seems perhaps you can create a "model" to declare this primitive and it will check the stack below any given function that is modeled.

I suppose Coverity should fit our use case then. You can try with 8KB or less for kernel threads. We probably have to insert coverity_stack_depth in our code. Kernel entry points would be kernel threads and syscall entry points.

in reply to:  10 comment:11 by umccullough, 7 years ago

Replying to korli:

We probably have to insert coverity_stack_depth in our code. Kernel entry points would be kernel threads and syscall entry points.

I think they also provide a "model file" submission for analysis purposes, to direct Coverity how to treat certain functions so you don't have to do anything with the code.

I admit, I have only a vague idea of how it is supposed to work, but I have made you an admin if you'd like to investigate and mess with it. You can login and then visit this page:

Further documentation is available here: and here:

comment:12 by luroh, 6 years ago

Blocking: 7665 added

comment:13 by luroh, 6 years ago

Milestone: R1Unscheduled

Moving non-x86 related tickets out of R1 milestone.

comment:14 by axeld, 3 years ago

Owner: changed from axeld to nobody
Status: newassigned

comment:15 by waddlesplash, 2 years ago

Keywords: boot-failure added

comment:16 by waddlesplash, 2 years ago

Blocking: 7665 removed

comment:17 by waddlesplash, 13 months ago

Keywords: boot-failure removed
Summary: Double Fault Exception on usb boot x86_64Check kernel functions' stack usage

Coverity's support for this seems too limited. Instead we should probably use the GCC flag -fstack-usage, which dumps the maximum stack usage of all functions in a file, and look over the kernel as a whole for this.

comment:18 by waddlesplash, 11 months ago

Below are all the functions which use more than 256 bytes of stack in the kernel (the utility I used to generate it is here.)

Likely the node_monitor should be fixed. Otherwise most of these functions using 1-2KB of stack is probably OK; they are mostly immediate syscalls that do not call much else.

user_debugger.cpp2336status_t debug_nub_thread(void*)
elf.cpp1856status_t UserSymbolLookup::LookupSymbolAddress(addr_t, addr_t*, const char**, const char**, bool*)
elf.cpp1648status_t elf_load_user_image(const char*, BKernel::Team*, uint32, addr_t*)
user_debugger.cpp1600status_t thread_hit_debug_event_internal(debug_debugger_message, const void*, int32, bool, bool&)
ddm_userland_interface.cpp1408status_t _user_find_disk_system(const char*, user_disk_system_info*)
safemode_settings.cpp1344status_t _user_get_safemode_option(const char*, char*, size_t*)
node_monitor.cpp1328status_t NodeMonitorService::NotifyEntryMoved(dev_t, ino_t, const char*, ino_t, const char*, ino_t)
VMAnonymousCache.cpp1312void swap_init_post_modules()
node_monitor.cpp1296status_t NodeMonitorService::NotifyAttributeChanged(dev_t, ino_t, ino_t, const char*, int32)
node_monitor.cpp1296status_t NodeMonitorService::NotifyEntryCreatedOrRemoved(int32, dev_t, ino_t, const char*, ino_t)
team.cpp1280status_t team_create_thread_start_internal(void*)
node_monitor.cpp1280status_t NodeMonitorService::NotifyStatChanged(dev_t, ino_t, ino_t, uint32)
elf.cpp1200void register_elf_image(elf_image_info*)
team.cpp1184status_t _user_exec(const char*, const char* const*, size_t, int32, int32, mode_t)
node_monitor.cpp1184status_t notify_query_entry_event(int32, port_id, int32, dev_t, ino_t, const char*, ino_t)
user_debugger.cpp1168void user_debug_image_deleted(const image_info*)
user_debugger.cpp1168void user_debug_image_created(const image_info*)
image.cpp1168status_t _user_get_next_image_info(team_id, int32*, image_info*, size_t)
image.cpp1152status_t _user_get_image_info(image_id, image_info*, size_t)
image.cpp1152image_id _user_register_image(extended_image_info*, size_t)
image.cpp1136void _user_image_relocated(image_id)
signal.cpp1120void handle_signals(BKernel::Thread*)
vfs_net_boot.cpp1088virtual status_t NetBootMethod::Init()
vfs_boot.cpp1072void vfs_mount_boot_file_system(kernel_args*)
hostname.cpp1072int sethostname(const char*, size_t)
hostname.cpp1072int gethostname(char*, size_t)
find_directory.cpp1008status_t __find_directory(directory_which, dev_t, bool, char*, int32)
file_cache.cpp976status_t write_to_cache(file_cache_ref*, void*, off_t, int32, addr_t, size_t, bool, vm_page_reservation*, size_t)
file_cache.cpp944status_t read_into_cache(file_cache_ref*, void*, off_t, int32, addr_t, size_t, bool, vm_page_reservation*, size_t)
signal.cpp880int64 _user_restore_signal_frame(signal_frame_data*)
vfs_net_boot.cpp768void NetStackInitializer::_ScanDevices(BPrivate::DiskDevice::KPath&)
vm_page.cpp768status_t vm_page_write_modified_page_range(VMCache*, uint32, uint32)
user_debugger.cpp752void user_debug_watchpoint_hit()
user_debugger.cpp752void user_debug_single_stepped()
user_debugger.cpp752void user_debug_breakpoint_hit(bool)
block_cache.cpp704status_t block_notifier_and_writer(void*)
block_cache.cpp688void block_cache_discard(void*, off_t, size_t)
block_cache.cpp672status_t cache_sync_transaction(void*, int32)
block_cache.cpp656status_t block_cache_sync(void*)
irq_routing_table.cpp656status_t choose_link_device_configurations(acpi_module_info*, IRQRoutingTable&, interrupt_available_check_function)
block_cache.cpp656status_t block_cache_sync_etc(void*, off_t, size_t)
KPartition.cpp640void BPrivate::DiskDevice::KPartition::SetVolumeID(dev_t)
KPartition.cpp640void BPrivate::DiskDevice::KPartition::SetDiskSystem(BPrivate::DiskDevice::KDiskSystem*, float)
KDiskDeviceManager.cpp640void BPrivate::DiskDevice::KDiskDeviceManager::_NotifyDeviceEvent(BPrivate::DiskDevice::KDiskDevice*, int32, uint32)
block_cache.cpp608static status_t {anonymous}::BlockWriter::WriteBlock({anonymous}::block_cache*, {anonymous}::cached_block*)
block_cache.cpp608status_t write_blocks_in_previous_transaction({anonymous}::block_cache*, {anonymous}::cache_transaction*)
device_manager.cpp592status_t control_device_manager(const char*, uint32, void*, size_t)
KPartition.cpp576virtual status_t BPrivate::DiskDevice::KPartition::PublishDevice()
vfs.cpp576status_t common_rename(int, char*, int, char*, bool)
AVLTreeBase.cpp576int AVLTreeBase::_Insert(AVLTreeNode*)
thread.cpp560void thread_exit()
IOCache.cpp560status_t IOCache::_TransferPages(size_t, size_t, bool, bool)
thread.cpp560status_t _user_unblock_threads(thread_id*, uint32, status_t)
debug.cpp544status_t _user_kernel_debugger(const char*)
arch_cpu.cpp544void dump_feature_string(int, cpu_ent*)
vfs_boot.cpp544uint32 compute_check_sum(BPrivate::DiskDevice::KDiskDevice*, off_t)
debug.cpp544void _user_debug_output(const char*)
vfs.cpp528status_t _user_read_fs_info(dev_t, fs_info*)
vfs.cpp528status_t _user_write_fs_info(dev_t, const fs_info*, int)
system_info.cpp496status_t _user_get_system_info(system_info*)
vfs.cpp496dev_t fs_mount(char*, const char*, const char*, uint32, const char*, bool)
thread.cpp496thread_id thread_create_thread(const BKernel::ThreadCreationAttributes&, bool)
vm.cpp480area_id vm_create_anonymous_area(team_id, const char*, addr_t, uint32, uint32, uint32, addr_t, const virtual_address_restrictions*, const physical_address_restrictions*, bool, void**)
arch_cpu.cpp480status_t arch_cpu_init_percpu(kernel_args*, int)
team.cpp448thread_id load_image_internal(char**&, size_t, int32, int32, int32, team_id, uint32, port_id, uint32)
vfs.cpp448status_t _user_stat_attr(int, const char*, attr_info*)
arch_debug.cpp448ssize_t arch_debug_gdb_get_registers(char*, size_t)
vm.cpp448area_id _vm_map_file(team_id, const char*, void**, uint32, size_t, uint32, uint32, bool, int, off_t, bool)
irq_routing_table.cpp432status_t read_irq_routing_table_recursive(acpi_module_info*, pci_module_info*, acpi_handle, uint8, IRQRoutingTable&, IRQRoutingTable&, bool, interrupt_available_check_function)
vfs.cpp416status_t _user_read_index_stat(dev_t, const char*, stat*)
irq_routing_table.cpp416status_t read_irq_routing_table(acpi_module_info*, IRQRoutingTable&, interrupt_available_check_function)
vfs.cpp416status_t vfs_create_special_node(const char*, fs_vnode*, mode_t, uint32, bool, fs_vnode*, vnode**)
ddm_userland_interface.cpp400status_t _user_get_next_disk_system_info(int32*, user_disk_system_info*)
vfs.cpp400int create_vnode(vnode*, const char*, int, int, bool)
vfs.cpp400status_t dir_vnode_to_path(vnode*, char*, size_t, bool)
vfs.cpp400dev_t _user_mount(const char*, const char*, const char*, uint32, const char*, size_t)
socket.cpp384status_t _user_get_next_socket_stat(int, uint32*, net_stat*)
arch_debug.cpp384int stack_trace(int, char**)
ddm_userland_interface.cpp384status_t _user_get_disk_system_info(disk_system_id, user_disk_system_info*)
vm.cpp384status_t vm_soft_fault(VMAddressSpace*, addr_t, bool, bool, bool, vm_page**)
team.cpp384thread_id fork_team()
node_monitor.cpp384status_t NodeMonitorService::NotifyMount(dev_t, dev_t, ino_t)
vfs.cpp384status_t vfs_write_pages(vnode*, void*, off_t, const generic_io_vec*, size_t, uint32, generic_size_t*)
vfs.cpp384status_t _user_create_fifo(int, const char*, mode_t)
vfs.cpp384status_t vfs_read_pages(vnode*, void*, off_t, const generic_io_vec*, size_t, uint32, generic_size_t*)
system_info.cpp368virtual void SystemNotificationService::EventOccurred(NotificationService&, const BPrivate::KMessage*)
vm.cpp368area_id vm_create_null_area(team_id, const char*, void**, uint32, addr_t, uint32)
driver_settings.cpp368void* load_driver_settings(const char*)
legacy_drivers.cpp368status_t legacy_driver_probe(const char*)
vfs.cpp368int _user_open_attr(int, const char*, const char*, uint32, int)
vfs.cpp368int _user_open_parent_dir(int, char*, size_t)
vfs.cpp368status_t normalize_path(char*, size_t, bool, bool)
vfs.cpp368status_t _user_entry_ref_to_path(dev_t, ino_t, const char*, char*, size_t)
syscalls.cpp352status_t _user_generic_syscall(const char*, uint32, void*, size_t)
vfs.cpp352status_t entry_ref_to_vnode(dev_t, ino_t, const char*, bool, bool, vnode**)
vfs_request_io.cpp352status_t do_synchronous_iterative_vnode_io(vnode*, void*, io_request*, iterative_io_get_vecs, iterative_io_finished, void*)
vfs.cpp352status_t vfs_init(kernel_args*)
vfs.cpp352status_t dir_remove(int, char*, bool)
device_manager.cpp352void publish_directories(const char*)
VMUtils.cpp352status_t get_mount_point(BPrivate::DiskDevice::KPartition*, BPrivate::DiskDevice::KPath*)
vfs.cpp336status_t get_vnode_name(vnode*, vnode*, char*, size_t, bool)
KPartition.cpp336virtual void BPrivate::DiskDevice::KPartition::Dump(bool, int32)
vfs.cpp336status_t common_create_link(int, char*, int, char*, bool, bool)
devfs.cpp336status_t devfs_open(fs_volume*, fs_vnode*, int, void**)
thread.cpp336void ThreadNotificationService::Notify(uint32, team_id, thread_id, BKernel::Thread*)
node_monitor.cpp336status_t NodeMonitorService::NotifyUnmount(dev_t)
devfs.cpp320status_t devfs_ioctl(fs_volume*, fs_vnode*, void*, uint32, void*, size_t)
vfs.cpp320status_t common_create_symlink(int, char*, const char*, int, bool)
vfs.cpp320ssize_t _user_write_attr(int, const char*, uint32, off_t, const void*, size_t)
vfs.cpp320int file_create(int, char*, int, int, bool)
debug.cpp320void debug_init_post_modules(kernel_args*)
device_manager.cpp320status_t device_node::_GetNextDriver(void*, driver_module_info*&)
vm.cpp320status_t vm_resize_area(area_id, size_t, bool)
device_manager.cpp320void init_node_tree()
cpu.cpp304void load_cpufreq_module()
vm.cpp304status_t _user_get_next_area_info(team_id, ssize_t*, area_info*)
file_cache.cpp304status_t file_cache_control(const char*, uint32, void*, size_t)
vfs.cpp304status_t _user_create_index(dev_t, const char*, uint32, uint32)
thread.cpp304status_t arch_thread_enter_userspace(BKernel::Thread*, addr_t, void*, void*)
IOSchedulerSimple.cpp304status_t IOSchedulerSimple::_Scheduler()
KPartition.cpp304virtual status_t BPrivate::DiskDevice::KPartition::RepublishDevice()
vm.cpp304status_t _user_set_memory_protection(void*, size_t, uint32)
signal.cpp304status_t _user_sigwait(const sigset_t*, siginfo_t*, uint32, bigtime_t)
vm.cpp304area_id _user_create_area(const char*, void**, uint32, size_t, uint32, uint32)
cpu.cpp304void load_cpuidle_module()
vfs.cpp304ssize_t _user_read_attr(int, const char*, off_t, void*, size_t)
user_debugger.cpp304void nub_thread_cleanup(BKernel::Thread*)
UserTimer.cpp304int32 _user_create_timer(clockid_t, thread_id, uint32, const sigevent*, const thread_creation_attributes*)
vfs.cpp304status_t dir_create(int, char*, int, bool)
vfs.cpp304int _user_open_entry_ref(dev_t, ino_t, const char*, int, int)
vfs.cpp304status_t common_unlink(int, char*, bool)
vfs.cpp304status_t _user_create_dir_entry_ref(dev_t, ino_t, const char*, int)
arch_cpu.cpp304status_t arch_cpu_init_post_modules(kernel_args*)
user_debugger.cpp288port_id install_team_debugger(team_id, port_id, thread_id, bool, bool)
vfs.cpp288status_t _user_remove_attr(int, const char*)
vm.cpp288status_t vm_set_area_protection(team_id, area_id, uint32, bool)
AVLTreeBase.cpp288int AVLTreeBase::_RemoveRightMostChild(AVLTreeNode**, AVLTreeNode**)
KPartition.cpp288virtual status_t BPrivate::DiskDevice::KPartition::GetPath(BPrivate::DiskDevice::KPath*)
vfs.cpp288status_t _user_remove_index(dev_t, const char*)
vfs.cpp288status_t common_file_io_vec_pages(vnode*, void*, const file_io_vec*, size_t, const iovec*, size_t, uint32*, size_t*, size_t*, bool)
image.cpp288image_id register_image(BKernel::Team*, extended_image_info*, size_t, bool)
socket.cpp288ssize_t _user_recvmsg(int, msghdr*, int)
team.cpp288pid_t _user_wait_for_child(thread_id, uint32, siginfo_t*, team_usage_info*)
KDiskDevice.cpp288void BPrivate::DiskDevice::KDiskDevice::_InitPartitionData()
vfs.cpp288int _user_open_dir_entry_ref(dev_t, ino_t, const char*)
module.cpp272status_t read_next_module_name(void*, char*, size_t*)
image.cpp272status_t unregister_image(BKernel::Team*, image_id)
module.cpp272status_t module_init_post_boot_device(bool)
vm.cpp272area_id vm_clone_area(team_id, const char*, void**, uint32, uint32, uint32, area_id, bool)
vm.cpp272area_id vm_copy_area(team_id, const char*, void**, uint32, uint32, area_id)
Note: See TracTickets for help on using tickets.