Ruby Buzz Forum - The strcpy function is dead! Long live memcpy!

4 replies on 1 page. Most recent reply: Feb 5, 2006 9:41 AM by Daniel Berger


Daniel Berger

Posts: 1383
Nickname: djberg96
Registered: Sep, 2004

Daniel Berger is a Ruby Programmer who also dabbles in C and Perl

The strcpy function is dead! Long live memcpy!

Posted: Feb 2, 2006 6:22 PM

This post originated from an RSS feed registered with Ruby Buzz by Daniel Berger.
Original Post: The strcpy function is dead! Long live memcpy!
Feed Title: Testing 1,2,3...
Feed URL: http://djberg96.livejournal.com/data/rss
Feed Description: A blog on Ruby and other stuff.

I got a very interesting bug report from Brian Marick (yes, that one) for the win32-clipboard package. He reported that Unicode characters with null bytes in them (e.g. the Tibetan character 0x0F00) were causing the string to be terminated prematurely.

It turns out the problem was with strcpy() and strlen(). Here are the original two lines that caused the problems:

hMem = GlobalAlloc(GHND, strlen(data) + sizeof(TCHAR*));
strcpy((TCHAR *)GlobalLock(hMem), data);

Those two lines were changed to:

hMem = GlobalAlloc(GHND, RSTRING(rbData)->len + sizeof(TCHAR*));
memcpy((TCHAR *)GlobalLock(hMem), data, RSTRING(rbData)->len);
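The difference between the two versions can be reproduced without any Windows API. The following is a minimal sketch, assuming the clipboard data is UTF-16LE text held in a char buffer; the names utf16_sample, length_seen_by_strlen and copy_with_length are hypothetical and malloc() stands in for GlobalAlloc(). It shows strlen() stopping at the first zero byte while a copy with an explicit length keeps everything.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* UTF-16LE bytes for "A" followed by U+0F00, plus a two-byte terminator.
   The high byte of 'A' (0x00) already looks like a terminator to strlen(). */
const char utf16_sample[] = { 0x41, 0x00, 0x00, 0x0F, 0x00, 0x00 };
#define UTF16_SAMPLE_LEN 4  /* payload bytes, excluding the terminator */

/* Length as strlen() sees it: it stops at the first zero byte. */
size_t length_seen_by_strlen(void)
{
    return strlen(utf16_sample);
}

/* Copy len bytes regardless of embedded zeros, then append two zero
   bytes so the result is still terminated as UTF-16. Caller frees. */
char *copy_with_length(const char *src, size_t len)
{
    char *dst;

    assert(src != NULL);
    dst = malloc(len + 2);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len);
    dst[len] = 0;
    dst[len + 1] = 0;
    return dst;
}
```

Here length_seen_by_strlen() reports 1 even though the payload is 4 bytes long, which is the premature termination described in the bug report.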

At first I thought that Microsoft's _tcslen() and _tcscpy() functions would Do The Right Thing™. But, no, they didn't work.

Given that Unicode characters can contain null bytes (a fact, btw, which I was unaware of until now), why would I ever use strcpy() again in lieu of its inability to handle Unicode properly?


Don McCaughey

Posts: 7
Nickname: donmcc
Registered: Feb, 2006

Re: The strcpy function is dead! Long live memcpy!

Posted: Feb 2, 2006 8:09 PM

That's not a bug in Windows, it's a bug in the code. strcpy() is only for single byte character sets. For multibyte character sets, use _mbscpy. strlen() works correctly for both single byte and multibyte character strings, but for Unicode, you need to use wcscpy() and wcslen() (wcs = wide character string). Wide characters are 16 bits on Windows (32 bits on OS X and other Unixes), so yeah, some characters may have null bytes in them.

_tcslen() and _tcscpy() are not functions, but macro aliases for the versions of these functions that match your program's default character set, as determined by whether or not _UNICODE and/or _MBCS are #defined. Likewise, TCHAR is an alias for either CHAR or WCHAR.
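As a rough illustration of that aliasing, here is a portable sketch that mimics the mechanism with made-up names. SKETCH_UNICODE, sketch_tchar, sketch_tcslen and SKETCH_T are hypothetical stand-ins, not the real <tchar.h> definitions. One detail to keep in mind: the C runtime's generic-text macros key off _UNICODE (with the underscore), while the Windows API headers key off UNICODE, so defining only one of the two can leave them out of step.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR, _tcslen and _T(); not the real <tchar.h>. */
#ifdef SKETCH_UNICODE
typedef wchar_t sketch_tchar;
#define sketch_tcslen wcslen
#define SKETCH_T(s) L##s
#else
typedef char sketch_tchar;
#define sketch_tcslen strlen
#define SKETCH_T(s) s
#endif

/* Size of one character under the selected mapping. */
size_t sketch_char_size(void)
{
    return sizeof(sketch_tchar);
}

/* The same source line compiles to strlen() or wcslen() depending on
   whether SKETCH_UNICODE was defined before this point. */
size_t sketch_length(void)
{
    const sketch_tchar *text = SKETCH_T("abc");

    assert(text != NULL);
    return sketch_tcslen(text);
}
```

Compiled as shown, without SKETCH_UNICODE, the sketch resolves to char and strlen(); adding -DSKETCH_UNICODE on the compiler command line switches both the type and the function in one step.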

This should do what was intended:


WCHAR* buffer = (WCHAR*) GlobalAlloc(GPTR, (wcslen(data) + 1) * sizeof(WCHAR));
wcscpy(buffer, data);
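For anyone who wants to try the same pattern away from Windows, here is a minimal portable sketch, assuming the data really is a null-terminated wide string. wide_dup is a hypothetical helper name and malloc() stands in for GlobalAlloc(). The point is that wcslen() counts wide characters rather than bytes, so the allocation has to be scaled by sizeof(wchar_t).

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Duplicate a null-terminated wide string. malloc() stands in for
   GlobalAlloc() so the sketch builds anywhere. Caller frees. */
wchar_t *wide_dup(const wchar_t *src)
{
    size_t n;
    wchar_t *dst;

    assert(src != NULL);
    n = wcslen(src);                          /* wide characters, not bytes */
    dst = malloc((n + 1) * sizeof(wchar_t));  /* +1 for the terminator */
    if (dst == NULL)
        return NULL;
    wcscpy(dst, src);                         /* copies the terminator too */
    return dst;
}
```

A character such as 0x0F00 survives this copy because the terminator is a whole zero wchar_t, not a single zero byte.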

Daniel Berger

Posts: 1383
Nickname: djberg96
Registered: Sep, 2004

Re: The strcpy function is dead! Long live memcpy!

Posted: Feb 2, 2006 10:16 PM

First....how the heck did my post end up in a forum?! Anyhoo...

> That's not a bug in Windows, it's a bug in the code.
> strcpy() is only for single byte character sets.

I understand that. See below.

> For multibyte character sets, use _mbscpy. strlen() works
> correctly for both single byte and multibyte character
> strings, but for Unicode, you need to use wcscpy() and
> wcslen() (wcs = wide character string). Wide characters
> are 16 bits on Windows (32 bits on OS X and other Unixes),
> so yeah, some characters may have null bytes in
> them.
>
> _tcslen() and _tcscpy() are not functions, but macro
> aliases for the versions of these functions that match
> your program's default character set, as determined
> by whether or not _UNICODE and/or _MBCS are #defined.
> Likewise, TCHAR is an alias for either CHAR or WCHAR.

I understand that they're macros. The UNICODE constant is defined (if you look at the source) and MBCS is not, which means wcslen is being used behind the scenes according to the MSDN docs. Maybe I should just default to defining MBCS for all my C extensions. Is there a downside to that?

> This should do what was intended:
>


> WCHAR* buffer = (WCHAR*) GlobalAlloc(GPTR, (wcslen(data) + 1) * sizeof(WCHAR));
> wcscpy(buffer, data);

Unless there's a drawback to the code I'm using now, I'll leave it alone. :)

Dan

Matt Gerrans

Posts: 1153
Nickname: matt
Registered: Feb, 2002

Re: The strcpy function is dead! Long live memcpy!

Posted: Feb 5, 2006 12:02 AM

Just FYI, "in lieu of" means "instead of" not "in view of."

Daniel Berger

Posts: 1383
Nickname: djberg96
Registered: Sep, 2004

Re: The strcpy function is dead! Long live memcpy!

Posted: Feb 5, 2006 9:41 AM

Thanks Matt, fixed. I keep thinking "lieu" stems from the Latin "light" instead of "place" (locus).
