#3281 closed enhancement (fixed)
ffs() arch optimization
Reported by: | tqh | Owned by: | tqh |
---|---|---|---|
Priority: | low | Milestone: | R1/beta2 |
Component: | System/libroot.so | Version: | |
Keywords: | Cc: | fredrik.holmqvist@… | |
Blocked By: | Blocking: | ||
Platform: | All |
Description
Our ffs can be optimized at least for x86. Providing a test-program based on FreeBsd's ffs. No real implementation as I don't know how you want to do arch optimizations and fallbacks.
On newer x86, it can even be done in two ops with cmov, but that would require runtime configuration which might be overkill. Other platforms tries to inline function as well.
Attachments (3)
Change History (18)
comment:1 by , 16 years ago
Priority: | normal → low |
---|---|
Type: | bug → enhancement |
by , 16 years ago
Attachment: | ffs_arch_x86_opt.patch added |
---|
comment:2 by , 16 years ago
Learning by doing :) Might have missed something in the patch, but it works fine here.
comment:3 by , 16 years ago
Cc: | added |
---|
comment:4 by , 16 years ago
This patch may be overkill though and the function just defined in the arch_string.S, I just wanted to learn to be able to go all the way to runtime cpu-optimized functions.
comment:5 by , 15 years ago
Owner: | changed from | to
---|---|
Version: | R1/pre-alpha1 |
I guess that's better on your plate now :-)
comment:6 by , 15 years ago
Also check out this one: http://osdir.com/ml/solaris.opensolaris.performance/2006-01/msg00000.html
Hi, Not sure if this can be of any interest, but I noticed the ffs function is implemented using a loop. A faster implementation exists using the "pcount" assembler instruction (where available and fast - this does not seem to be the case on all processors), or through another algorithm using only a few instructions. Check out: http://supertech.csail.mit.edu/papers/debruijn.pdf For a 32-bit word, the optimized function could be written this way: int ffs(int field) { static const int index[] = { 1, 2, 29, 3, 30, 15, 25, 4, 31, 23, 21, 16, 26, 18, 5, 9, 32, 28, 14, 24, 22, 20, 17, 8, 27, 13, 19, 7, 12, 6, 11, 10 }; unsigned int w = field; if (w == 0) return (0); w &= -w; w *= 125613361U; w >>= 27; return index[w]; } Regards, Olivier. This message posted from opensolaris.org
comment:9 by , 15 years ago
Sounds like a plan. Other OS'es seem to try to inline functions like these, it would be nice to try to reduce function overhead on all these small helper functions. Any ideas in this regard or experiments to perform?
by , 15 years ago
Attachment: | test_ffs.cpp added |
---|
Updated test with new example, and also with builtin version.
comment:10 by , 15 years ago
For gcc 4 builtin_ffs is always at least twice as fast. It should probably be ok on gcc2.95 as well. Should we just change the header to #define ffs(x) builtin_ffs(x) ?
Since we use gcc on other archs it should work there as well.
comment:11 by , 9 years ago
Milestone: | R1 → Unscheduled |
---|
comment:14 by , 5 years ago
Assign tickets with status=closed and resolution=fixed within the R1/beta2 development window to the R1/beta2 Milestone
comment:15 by , 5 years ago
Milestone: | R1 → R1/beta2 |
---|
x86 ffs optimization