Opened 10 years ago

Closed 10 years ago

#4115 closed bug (fixed)

Failed to relocate error when attempting to boot PPC kernel.

Reported by: kallisti5
Owned by: mmu_man
Priority: normal
Milestone: R1
Component: System/Kernel
Version:
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: PowerPC

Description

When booting the Haiku PPC kernel, the following error is shown for a split second before control returns to the PPC bootloader:

boot_arch_elf_relocate_rela(): Failed to relocate entry index

The "Failed to relocate" error comes from kernel/arch/ppc/arch_elf.cpp.

I am working on getting the *exact* message now by inserting a temporary while(1) after the error condition.

Attachments (4)

ppcarch_elf_forcerelocate.patch (528 bytes) - added by kallisti5 10 years ago.
temporary solution to show the larger problem.
synccleanup.patch (2.6 KB) - added by kallisti5 10 years ago.
isyncmissingcpuinit.diff (3.1 KB) - added by kallisti5 10 years ago.
ppc-isync-relocation-final.diff (3.7 KB) - added by kallisti5 10 years ago.
Fix for this issue.


Change History (28)

comment:1 Changed 10 years ago by kallisti5

boot_arch_elf_relocate_rela(): Failed to relocate entry index 1339, rel type 1, offset 0x8017a568, sym 0xe30, addend 0x0

comment:2 Changed 10 years ago by kallisti5

err... think I found where the error is coming from...

src/tests/system/boot/loader/platform_misc.cpp:

extern "C" status_t boot_arch_elf_relocate_rel(struct preloaded_image *image,
	struct Elf32_Rel *rel, int rel_len)
{
	return B_ERROR;
}

extern "C" status_t boot_arch_elf_relocate_rela(struct preloaded_image *image,
	struct Elf32_Rela *rel, int rel_len)
{
	return B_ERROR;
}

This looks incomplete... right?

comment:3 in reply to:  2 Changed 10 years ago by kallisti5

Please ignore the above comment. Nothing wrong there.

Turning on CHATTY in src/system/kernel/arch/ppc/arch_elf.cpp and adding some additional CHATTY text.

comment:4 Changed 10 years ago by kallisti5

After some trial and error I figured out the following:

On my PPC PowerBook Lombard, kernel/arch/ppc/arch_elf.cpp is not being used; kernel/arch/m68k/arch_elf.cpp is used instead, and that is where the error is coming from.

I am working on enabling CHATTY in the m68k code. Is this right? Should kernel/arch/m68k be used? If the m68k directory is correct, what is the PPC directory in the source tree for?

--Alex

comment:5 Changed 10 years ago by anevilyak

Owner: changed from axeld to mmu_man

That is most definitely incorrect, and implies an error in the Jamfiles somewhere. M68K is, as its name implies, code specific to the Motorola 680x0 CPUs (specifically the Atari Falcon), and as such is fundamentally incompatible with the PPC port.

comment:6 Changed 10 years ago by kallisti5

It seems there is a bug in the Jamfile: changing a source file and rebuilding without cleaning does not include the changes in the new haiku-image. We are in fact running kernel/arch/ppc/, not m68k. Please ignore my previous comment. (Man, I wish there was an edit option in Trac for users.)

I am attaching a patch which lets the boot continue further and provides some context around the issue. (The patch ppcarch_elf_forcerelocate.patch is NOT a fix; it simply shows the issue better.)

I will post the results here shortly.

Changed 10 years ago by kallisti5

Attachment: ppcarch_elf_forcerelocate.patch added

temporary solution to show the larger problem.

comment:7 Changed 10 years ago by kallisti5

OK, here is what happens: each time a relocation fails, vlErr is set to -2147478780.

Here are the last of the "Failed to relocate entry index X type X" messages:
Entry  Type
1346   21
1347    1
1348   21
1349    1
1350   21
7002    1
7003   21
7004    1
7005   21
7006    1
7007   21
7008    1
7009   21
7010    1
7011   21
7012    1
7013   21
9087   21

Kernel entry at: 0x8007b7c8
Kernel stack top: 0x80004000
<kernel startup locks here>

comment:8 Changed 10 years ago by mmu_man

Status: new → assigned

The error is: 0x80001304: Symbol not found
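The hex code quoted here and the decimal value logged in comment 7 are the same 32-bit pattern; the decimal form is just the signed reading of it. A minimal sketch of that conversion follows. The helper name `status_bits` is made up for illustration and is not part of the Haiku sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reinterpret a signed 32-bit status code as its
// unsigned bit pattern, which is how such codes are usually written in hex.
constexpr uint32_t status_bits(int32_t status)
{
	return static_cast<uint32_t>(status);
}

// The value logged in comment 7 and the code quoted in comment 8.
void example_status_bits()
{
	assert(status_bits(-2147478780) == 0x80001304u);
}
```

Printing the logged value with a `%x`-style format instead of `%d` would show the hex form directly.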

comment:9 Changed 10 years ago by kallisti5

The boot process is getting hung up right after the "Kernel stack top" message. Looking through the sources, this is where we jump to the kernel, which has now been copied into memory.

mmu_man, where are you getting the "Symbol not found" message from?

comment:10 Changed 10 years ago by kallisti5

Quick note... Tried the same image on a G4 quicksilver PPC and got the exact same error and result.

Changed 10 years ago by kallisti5

Attachment: synccleanup.patch added

comment:11 Changed 10 years ago by kallisti5

OK, using readelf I was able to extract a listing of the missing relocations, matched by the offsets in the errors. Magic fingers!

8017b3b0  000e3201 R_PPC_ADDR32      00000000   _Z18_user_atomic_set64 + 0
8018d208  000e3215 R_PPC_JMP_SLOT    00000000   _Z18_user_atomic_set64 + 0
8017b3b8  0009ef01 R_PPC_ADDR32      00000000   _Z27_user_atomic_test_ + 0
8018c340  0009ef15 R_PPC_JMP_SLOT    00000000   _Z27_user_atomic_test_ + 0
8017b3c0  000ae201 R_PPC_ADDR32      00000000   _Z18_user_atomic_add64 + 0
8018c6f0  000ae215 R_PPC_JMP_SLOT    00000000   _Z18_user_atomic_add64 + 0
8017b3c8  0000d301 R_PPC_ADDR32      00000000   _Z18_user_atomic_and64 + 0
8018a428  0000d315 R_PPC_JMP_SLOT    00000000   _Z18_user_atomic_and64 + 0
8017b3d0  0008cc01 R_PPC_ADDR32      00000000   _Z17_user_atomic_or64P + 0
8018bf08  0008cc15 R_PPC_JMP_SLOT    00000000   _Z17_user_atomic_or64P + 0
8017b3d8  00063501 R_PPC_ADDR32      00000000   _Z18_user_atomic_get64 + 0
8018b640  00063515 R_PPC_JMP_SLOT    00000000   _Z18_user_atomic_get64 + 0
8018d1c0  000e0d15 R_PPC_JMP_SLOT    00000000   arch_cpu_init_percpu + 0
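The second column of this listing is the ELF32 `r_info` field, which packs the symbol table index into the upper 24 bits and the relocation type into the low 8 bits. That is why the "Type" values 1 and 21 in comment 7 line up with R_PPC_ADDR32 and R_PPC_JMP_SLOT here. A minimal sketch of the decoding, using illustrative helper names rather than the macros from the real ELF headers:

```cpp
#include <cassert>
#include <cstdint>

// PowerPC relocation type numbers as defined by the PPC ELF ABI.
constexpr uint32_t kRelocAddr32 = 1;	// R_PPC_ADDR32
constexpr uint32_t kRelocJmpSlot = 21;	// R_PPC_JMP_SLOT

// Illustrative equivalents of ELF32_R_SYM / ELF32_R_TYPE.
constexpr uint32_t reloc_symbol(uint32_t info) { return info >> 8; }
constexpr uint32_t reloc_type(uint32_t info) { return info & 0xff; }

// Decode two r_info values taken from the listing above.
void example_decode_info()
{
	assert(reloc_symbol(0x000e3201) == 0xe32);
	assert(reloc_type(0x000e3201) == kRelocAddr32);
	assert(reloc_type(0x000e3215) == kRelocJmpSlot);
}
```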

comment:12 Changed 10 years ago by kallisti5

arch_cpu_init_percpu is defined in the following files

src/system/kernel/arch/m68k/arch_cpu.cpp:arch_cpu_init_percpu(kernel_args *args, int curr_cpu)
src/system/kernel/arch/x86/arch_cpu.cpp:arch_cpu_init_percpu(kernel_args *args, int cpu)

Oh look, the function is not defined for PPC! For now, a function which returns 0 should be enough until multi-CPU support is added for PPC.
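A stub along those lines might look like the sketch below. The `status_t`, `B_OK`, and `kernel_args` definitions are stand-ins so the snippet compiles on its own; in the kernel they come from the real headers. The `extern "C"` linkage is an assumption based on the unmangled symbol name in the readelf output, and this is not necessarily what the attached patch contains.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the kernel's own definitions (assumptions for this sketch).
typedef int32_t status_t;
const status_t B_OK = 0;
struct kernel_args;	// opaque here; the stub never looks inside it

// Minimal placeholder: nothing to set up per CPU yet, so report success.
extern "C" status_t arch_cpu_init_percpu(kernel_args* args, int curr_cpu)
{
	(void)args;
	(void)curr_cpu;
	return B_OK;
}

void example_percpu_stub()
{
	assert(arch_cpu_init_percpu(nullptr, 0) == B_OK);
}
```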

comment:13 Changed 10 years ago by kallisti5

Adding a patch that adds arch_cpu_init_percpu to the PPC code.

This patch also cleans up my previous isync patch that fixes the PPC bootloader not starting.

attachment: isyncmissingcpuinit.diff

Changed 10 years ago by kallisti5

Attachment: isyncmissingcpuinit.diff added

comment:14 Changed 10 years ago by kallisti5

Just a quick note: after the patch above we now get one fewer "failed to relocate entry" error (the arch_cpu_init_percpu one) and the kernel drops back to the Open Firmware prompt.

I think the atomic_set, test, etc. errors occur because the kernel is having trouble accessing the functions in libroot/os/arch/ppc/atomic.S.

comment:15 Changed 10 years ago by kallisti5

alex@linux:~/develop/haiku/generated/objects/haiku/ppc/release/system/kernel$ readelf -s kernel_ppc  | grep "UND"
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
   211: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z18_user_atomic_and64PVx
  1589: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z18_user_atomic_get64PVx
  2252: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z17_user_atomic_or64PVxx
  2543: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z27_user_atomic_test_and
  2786: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z18_user_atomic_add64PVx
  3634: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z18_user_atomic_set64PVx
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
  2042: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z18_user_atomic_and64PVx
  3420: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z18_user_atomic_get64PVx
  4083: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z17_user_atomic_or64PVxx
  4374: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z27_user_atomic_test_and
  4617: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z18_user_atomic_add64PVx
  5465: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _Z18_user_atomic_set64PVx

I think these are coming from src/system/kernel/arch/ppc/arch_atomic.c. arch_atomic defines int64's on 32-bit platforms. Looking into commenting them out for now.

comment:16 Changed 10 years ago by kallisti5

Ingo pointed out that this was due to a missing extern "C" statement in headers/private/kernel/user_atomic.h

This fixes the undefined user_atomic errors!
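The undefined names in the readelf output are C++-mangled: `_Z`, then the name length (18), then the name, then the parameter types (`PVx` is a pointer to volatile long long). A header included from C++ without `extern "C"` makes the compiler reference those mangled names, while the implementation exports the plain C name, so the symbols never match. A minimal sketch of the usual guard follows. The prototype is inferred from the mangled name and is an assumption, and the function body is a non-atomic stand-in so the snippet links on its own.

```cpp
#include <cassert>

// With this guard, C++ callers reference the plain symbol
// "_user_atomic_set64" instead of "_Z18_user_atomic_set64PVx...".
#ifdef __cplusplus
extern "C" {
#endif

long long _user_atomic_set64(volatile long long* value, long long newValue);

#ifdef __cplusplus
}
#endif

// Stand-in definition for this sketch only; it is NOT atomic.
extern "C" long long _user_atomic_set64(volatile long long* value,
	long long newValue)
{
	long long oldValue = *value;
	*value = newValue;
	return oldValue;
}

void example_extern_c()
{
	volatile long long counter = 5;
	assert(_user_atomic_set64(&counter, 9) == 5);
	assert(counter == 9);
}
```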

The *final* attached patch contains lots of good PPC fixes and gets rid of all of the relocation errors. We still choke when jumping to the kernel, but at least we get the kernel into memory properly now.

After checking in the attached ppc-isync-relocation-final.diff patch, this ticket should be resolved.

Changed 10 years ago by kallisti5

Attachment: ppc-isync-relocation-final.diff added

Fix for this issue.

comment:17 Changed 10 years ago by mmu_man

Do we still need this #ifdef on _BOOT_MODE now ?

comment:18 Changed 10 years ago by kallisti5

I vote to leave it in.

What happens is that on the first missing relocation we return an error immediately, which throws us back to the bootloader before the end user can see the error message.

With the _BOOT_MODE #ifdef in there, when it is the kernel that is booting we push on, show all the relocation errors, and then freeze when trying to jump into the kernel.

Missing relocation errors may still occur in the future, and this is a handy way to fish them out.

comment:19 Changed 10 years ago by kallisti5

Also a quick note on those sync instructions: they always have to be called prior to, and sometimes after, those context changes. Since you always have to do it, we might as well avoid bugs and do them in the assembly rather than in the C. If we accidentally call isync/sync multiple times somewhere, no harm will come of it.

comment:20 Changed 10 years ago by mmu_man

Well, it should panic when getting an error from relocate_foo(). Skipping errors and hoping for the kernel to crash (which might not obviously happen right at boot, depending on which symbol is missing) is not really clean.

comment:21 Changed 10 years ago by kallisti5

Good point. A panic would be ideal.. or maybe set some kind of $panic = true thing while relocating then after everything is relocated trigger the panic if something wasn't right?
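That "remember the failure, report everything, then fail once" idea could be sketched roughly as follows. All names here (`RelocEntry`, `relocate_one`, `relocate_all`) are hypothetical and the status constants are stand-ins; the real loader code is structured differently. The caller would panic when the function returns an error.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Stand-ins for the kernel's status codes (assumptions for this sketch).
typedef int32_t status_t;
const status_t B_OK = 0;
const status_t B_ERROR = -1;

// Hypothetical relocation record: only tracks whether its symbol resolves.
struct RelocEntry {
	bool resolvable;
};

// Hypothetical per-entry relocation step.
static status_t relocate_one(const RelocEntry& entry)
{
	return entry.resolvable ? B_OK : B_ERROR;
}

// Process every entry, log each failure, and report an error only at the end.
status_t relocate_all(const RelocEntry* entries, int count, int* failedCount)
{
	int failed = 0;
	for (int i = 0; i < count; i++) {
		if (relocate_one(entries[i]) != B_OK) {
			std::fprintf(stderr, "Failed to relocate entry index %d\n", i);
			failed++;
		}
	}
	if (failedCount != nullptr)
		*failedCount = failed;
	return failed == 0 ? B_OK : B_ERROR;
}

void example_relocate_all()
{
	RelocEntry entries[] = {{true}, {false}, {true}, {false}};
	int failed = 0;
	assert(relocate_all(entries, 4, &failed) == B_ERROR);
	assert(failed == 2);
}
```

This keeps the diagnostic value of seeing every missing symbol while still refusing to jump into a kernel that was not fully relocated.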

For now feel free to remove the _BOOT_MODE bit, as it's not required and not the most important part of this fix.

Thanks!

-- Alex

comment:22 Changed 10 years ago by mmu_man

committed as hrev32067, minus the _BOOT_MODE part.

comment:23 Changed 10 years ago by kallisti5

This is working great. The system locks up when jumping to the kernel, but all the relocations are now correct.

Safe to close.

comment:24 Changed 10 years ago by pulkomandy

Resolution: fixed
Status: in-progress → closed