#8481 closed bug (fixed)
strncpy doesn't pad the destination string with NULs
Reported by: | hamish | Owned by: | tqh |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | System/libroot.so | Version: | R1/Development |
Keywords: | strncpy | Cc: | |
Blocked By: | Blocking: | ||
Platform: | All |
Description
See the relevant standard: http://pubs.opengroup.org/onlinepubs/9699919799/functions/strncpy.html
Attachments (2)
Change History (19)
by , 13 years ago
Attachment: | 0001-Pad-the-destination-string-with-zeroes-in-strncpy.patch added |
---|
comment:1 by , 13 years ago
patch: | 0 → 1 |
---|
follow-up: 3 comment:2 by , 13 years ago
comment:3 by , 13 years ago
Replying to hamish:
On x86, gcc2+4 O0/O2/O3, zeroing everything beforehand is faster than zeroing the remainder afterwards in most cases
Hmm, surely correct when the source string is empty, but incorrect when the source string has the specified length or more.
I think you should better memset afterward with dest and count, which should already have the correct values (unless I'm mistaken).
comment:4 by , 13 years ago
dest
and count
won't have the right values afterwards. The loop semantics need to be changed a little so that count doesn't wrap around. Something like:
while (count && (*dest = *src++) != '\0') { dest++; count--; }
So memsetting before with the original loop turns out a little faster in most cases (other than where the buffer gets almost entirely filled). The difference is small though, so I'm OK to memset after.
follow-up: 7 comment:5 by , 13 years ago
What about this?
while (count--) { if ((*dest++ = *src++) == '\0') { memset(dest, '\0', count); break; } }
comment:6 by , 13 years ago
You could take a peek at strlen or strnlen which checks four bytes at a time for null, so you don't have to do it bytewise.
comment:7 by , 13 years ago
Replying to korli:
What about this?
Performs about the same as the loop in comment 4.
Checking four bytes at a time beats the others by a long shot at -O0
, but only seems a little faster, if at all, at -O3
.
follow-up: 9 comment:8 by , 13 years ago
Interesting probably hard to do better, maybe align writes on four bytes to ease up on the write combining.
Btw I've been looking at -D__NO_STRING_INLINES
as a build parameter, which from what I can tell, tells gcc to not trust glib with inlining but do it in gcc. This might be the wrong assumption though as it is very poorly documented.
comment:9 by , 13 years ago
Replying to tqh:
Interesting probably hard to do better, maybe align writes on four bytes to ease up on the write combining.
Yep, I'm already aligning the destination buffer.
Btw I've been looking at
-D__NO_STRING_INLINES
as a build parameter, which from what I can tell, tells gcc to not trust glib with inlining but do it in gcc. This might be the wrong assumption though as it is very poorly documented.
As far as I can tell that only affects whether glibc's string.h defines a bunch of inline assembly versions of the string functions. Since we don't use glibc's string.h I don't think it has any effect.
comment:10 by , 13 years ago
The code can be simplified by changing
34 if (((addr_t)src & 3) != 0) {
to
34 if (((addr_t)src & 3) == 0) {
removing line 35 to 45 and moving
// Deal with the remainder. 60 while (count-- != 0) { 61 if ((*dest++ = *src++) == '\0') { 62 memset(dest, '\0', count); 63 return tmp; 64 } 65 }
after the if. Also I'm not sure whether we want to do source alignment (on x86 at least).
comment:11 by , 13 years ago
To my understanding, unaligned access is not necessarily OK on all architectures. Given that this is a generic strncpy it's probably a good idea to avoid it?
follow-up: 15 comment:12 by , 13 years ago
Yes, mine too. I was fishing for comments on if we should have special case for x86. Probably not worth it.
by , 13 years ago
Attachment: | 0001-strncpy-pad-the-destination-with-NULs.patch added |
---|
comment:14 by , 13 years ago
Btw we need source aligned so that the source word never steps over a page boundry if it reads a byte or three past the \0.
comment:15 by , 13 years ago
Replying to tqh:
Yes, mine too. I was fishing for comments on if we should have special case for x86. Probably not worth it.
Functions like strncpy
, strncmp
, strcpy
, memcpy
, memcmp
, etc are usually implemented in assembly (thus there are different versions of them for each architecture). Apparently it is still worth to do that. I wrote a simple test and compiled it on gcc 4.6.3. memcpy1
takes about 300ms to complete, while memcpy2
needs only 100ms. Obviously, memcpy
is the easiest of these functions to make run much faster when implemented in assembly, but then again it may be worth to consider such option.
comment:16 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | in-progress → closed |
Yes, I've been playing a bit with that as well in my spare time. Patch applied in hrev44069, nice work.
comment:17 by , 13 years ago
Unfortunatly it's not presented that well with the rename I did from .c to .cpp. Should have pushed in two steps. Sorry.
On x86, gcc2+4 O0/O2/O3, zeroing everything beforehand is faster than zeroing the remainder afterwards in most cases