Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#8481 closed bug (fixed)

strncpy doesn't pad the destination string with NULs

Reported by: hamish Owned by: tqh
Priority: normal Milestone: R1
Component: System/libroot.so Version: R1/Development
Keywords: strncpy Cc:
Blocked By: Blocking:
Platform: All

Description

Attachments (2)

0001-Pad-the-destination-string-with-zeroes-in-strncpy.patch (763 bytes ) - added by hamish 13 years ago.
0001-strncpy-pad-the-destination-with-NULs.patch (2.0 KB ) - added by hamish 13 years ago.

Download all attachments as: .zip

Change History (19)

comment:1 by hamish, 13 years ago

patch: 01

comment:2 by hamish, 13 years ago

On x86, gcc2+4 O0/O2/O3, zeroing everything beforehand is faster than zeroing the remainder afterwards in most cases

in reply to:  2 comment:3 by korli, 13 years ago

Replying to hamish:

On x86, gcc2+4 O0/O2/O3, zeroing everything beforehand is faster than zeroing the remainder afterwards in most cases

Hmm, surely correct when the source string is empty, but incorrect when the source string has the specified length or more.

I think you should better memset afterward with dest and count, which should already have the correct values (unless I'm mistaken).

comment:4 by hamish, 13 years ago

dest and count won't have the right values afterwards. The loop semantics need to be changed a little so that count doesn't wrap around. Something like:

while (count && (*dest = *src++) != '\0') {
  dest++; count--;
}

So memsetting before with the original loop turns out a little faster in most cases (other than where the buffer gets almost entirely filled). The difference is small though, so I'm OK to memset after.

comment:5 by korli, 13 years ago

What about this?

while (count--) {
	if ((*dest++ = *src++) == '\0') {
		memset(dest, '\0', count);
		break;
	}
}

comment:6 by tqh, 13 years ago

You could take a peek at strlen or strnlen which checks four bytes at a time for null, so you don't have to do it bytewise.

in reply to:  5 comment:7 by hamish, 13 years ago

Replying to korli:

What about this?

Performs about the same as the loop in comment 4.

Checking four bytes at a time beats the others by a long shot at -O0, but only seems a little faster, if at all, at -O3.

comment:8 by tqh, 13 years ago

Interesting probably hard to do better, maybe align writes on four bytes to ease up on the write combining.

Btw I've been looking at -D__NO_STRING_INLINES as a build parameter, which from what I can tell, tells gcc to not trust glib with inlining but do it in gcc. This might be the wrong assumption though as it is very poorly documented.

in reply to:  8 comment:9 by hamish, 13 years ago

Replying to tqh:

Interesting probably hard to do better, maybe align writes on four bytes to ease up on the write combining.

Yep, I'm already aligning the destination buffer.

Btw I've been looking at -D__NO_STRING_INLINES as a build parameter, which from what I can tell, tells gcc to not trust glib with inlining but do it in gcc. This might be the wrong assumption though as it is very poorly documented.

As far as I can tell that only affects whether glibc's string.h defines a bunch of inline assembly versions of the string functions. Since we don't use glibc's string.h I don't think it has any effect.

comment:10 by tqh, 13 years ago

The code can be simplified by changing

34	    if (((addr_t)src & 3) != 0) { 

to

34	    if (((addr_t)src & 3) == 0) { 

removing line 35 to 45 and moving

        // Deal with the remainder. 
 	60	        while (count-- != 0) { 
 	61	            if ((*dest++ = *src++) == '\0') { 
 	62	                memset(dest, '\0', count); 
 	63	                return tmp; 
 	64	            } 
 	65	        } 

Also I'm not sure whether we want to do source alignment (on x86 at least).

Version 0, edited 13 years ago by tqh (next)

comment:11 by hamish, 13 years ago

To my understanding, unaligned access is not necessarily OK on all architectures. Given that this is a generic strncpy it's probably a good idea to avoid it?

comment:12 by tqh, 13 years ago

Yes, mine too. I was fishing for comments on if we should have special case for x86. Probably not worth it.

comment:13 by tqh, 13 years ago

Owner: changed from axeld to tqh
Status: newin-progress

Looks good to me.

comment:14 by tqh, 13 years ago

Btw we need source aligned so that the source word never steps over a page boundry if it reads a byte or three past the \0.

in reply to:  12 comment:15 by pdziepak, 13 years ago

Replying to tqh:

Yes, mine too. I was fishing for comments on if we should have special case for x86. Probably not worth it.

Functions like strncpy, strncmp, strcpy, memcpy, memcmp, etc are usually implemented in assembly (thus there are different versions of them for each architecture). Apparently it is still worth to do that. I wrote a simple test and compiled it on gcc 4.6.3. memcpy1 takes about 300ms to complete, while memcpy2 needs only 100ms. Obviously, memcpy is the easiest of these functions to make run much faster when implemented in assembly, but then again it may be worth to consider such option.

comment:16 by tqh, 13 years ago

Resolution: fixed
Status: in-progressclosed

Yes, I've been playing a bit with that as well in my spare time. Patch applied in hrev44069, nice work.

comment:17 by tqh, 13 years ago

Unfortunatly it's not presented that well with the rename I did from .c to .cpp. Should have pushed in two steps. Sorry.

Note: See TracTickets for help on using tickets.