The Artima Developer Community
Ruby Buzz Forum
The strcpy function is dead! Long live memcpy!

4 replies on 1 page. Most recent reply: Feb 5, 2006 9:41 AM by Daniel Berger

Daniel Berger

Posts: 1383
Nickname: djberg96
Registered: Sep, 2004

Daniel Berger is a Ruby Programmer who also dabbles in C and Perl
The strcpy function is dead! Long live memcpy! Posted: Feb 2, 2006 6:22 PM

This post originated from an RSS feed registered with Ruby Buzz by Daniel Berger.
Original Post: The strcpy function is dead! Long live memcpy!
Feed Title: Testing 1,2,3...
Feed URL: http://djberg96.livejournal.com/data/rss
Feed Description: A blog on Ruby and other stuff.
I got a very interesting bug report from Brian Marick (yes, that one) for the win32-clipboard package. He reported that Unicode characters with null bytes in them (e.g. the Tibetan character 0x0F00) were causing the string to be terminated prematurely.

It turns out the problem was with strcpy() and strlen(). Here are the original two lines that caused the problems:
hMem = GlobalAlloc(GHND, strlen(data) + sizeof(TCHAR*));
strcpy((TCHAR *)GlobalLock(hMem), data);
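
To make the failure concrete, here is a minimal, portable sketch (not the win32-clipboard code). It assumes the text is UTF-16LE, where U+0F00 is stored as the bytes 0x00 0x0F; the helper names and the sample buffer are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes as strlen() sees it: it stops at the first zero byte. */
size_t byte_len_via_strlen(const unsigned char *buf) {
    assert(buf != NULL);
    return strlen((const char *)buf);
}

/* Length in bytes of UTF-16LE text: it stops only at a zero 16-bit unit. */
size_t utf16le_byte_len(const unsigned char *buf) {
    assert(buf != NULL);
    size_t i = 0;
    while (buf[i] != 0 || buf[i + 1] != 0) {
        i += 2;  /* advance one 16-bit code unit at a time */
    }
    return i;
}

/* Hypothetical sample: U+0F00, U+0F40 in UTF-16LE, then a 16-bit terminator. */
static const unsigned char tibetan_sample[] = {
    0x00, 0x0F,  /* U+0F00: the low byte is zero */
    0x40, 0x0F,  /* U+0F40 */
    0x00, 0x00   /* terminator */
};
```

With this sample, the strlen()-based helper reports 0 bytes because the very first byte is zero, while the 16-bit-aware helper reports 4.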

Those two lines were changed to:
hMem = GlobalAlloc(GHND, RSTRING(rbData)->len + sizeof(TCHAR*));
memcpy((TCHAR *)GlobalLock(hMem), data, RSTRING(rbData)->len);
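
The same idea in portable C, as a hedged sketch: calloc() stands in for GlobalAlloc(GHND, ...) so it can be compiled anywhere, and the two zeroed padding bytes are an assumption of this example rather than the package's actual sizing. The name copy_counted is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy exactly len bytes, embedded zero bytes included.
   The buffer is zero-filled and two bytes longer than the data,
   so the copy ends with a 16-bit terminator. */
unsigned char *copy_counted(const void *data, size_t len) {
    assert(data != NULL);
    unsigned char *dst = calloc(len + 2, 1);
    if (dst == NULL) {
        return NULL;  /* allocation failed; the caller must check */
    }
    memcpy(dst, data, len);  /* the length comes from the caller, not from scanning */
    return dst;
}
```

The point of the design is that the length is supplied by whoever owns the string (here, Ruby's own length field) instead of being rediscovered by scanning for a zero byte.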

At first I thought that Microsoft's _tcslen() and _tcscpy() functions would Do The Right Thing™. But, no, they didn't work.

Given that Unicode characters can contain null bytes (a fact, btw, which I was unaware of until now), why would I ever use strcpy() again in lieu of its inability to handle Unicode properly?


Don McCaughey

Posts: 7
Nickname: donmcc
Registered: Feb, 2006

Re: The strcpy function is dead! Long live memcpy! Posted: Feb 2, 2006 8:09 PM
That's not a bug in Windows, it's a bug in the code. strcpy() is only for single byte character sets. For multibyte character sets, use _mbscpy. strlen() works correctly for both single byte and multibyte character strings, but for Unicode, you need to use wcscpy() and wcslen() (wcs = wide character string). Wide characters are 16 bits on Windows (32 bits on OS X and other Unixes), so yeah, some characters may have null bytes in them.

_tcslen() and _tcscpy() are not functions, but macro aliases for the versions of these functions that match your program's default character set, as determined by whether or not _UNICODE and/or _MBCS are #defined. Likewise, TCHAR is an alias for either CHAR or WCHAR.
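
A rough sketch of how that kind of aliasing works. This is not the real tchar.h: every DEMO_/demo_ name below is invented for illustration, and the _MBCS case is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Pick the character type and matching functions at compile time. */
#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define demo_tcslen wcslen
#define demo_tcscpy wcscpy
#define DEMO_T(x) L##x   /* turns "abc" into L"abc" */
#else
typedef char DEMO_TCHAR;
#define demo_tcslen strlen
#define demo_tcscpy strcpy
#define DEMO_T(x) x
#endif

/* Copies src into dst; the caller promises cap elements of room. */
void demo_copy(DEMO_TCHAR *dst, size_t cap, const DEMO_TCHAR *src) {
    assert(demo_tcslen(src) < cap);
    demo_tcscpy(dst, src);
}
```

Compiled without DEMO_UNICODE, demo_copy() is plain strcpy() on char; compiled with it, the same source line becomes wcscpy() on wchar_t. Either way the macros only help if the data really is in the matching representation.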

This should do what was intended:

WCHAR* buffer = (WCHAR*) GlobalAlloc(GPTR, (wcslen(data) + 1) * sizeof(WCHAR));
wcscpy(buffer, data);
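
For anyone who wants to try the wide-character version outside Windows, here is a portable sketch of the same pattern with malloc() in place of GlobalAlloc(). The name dup_wide is hypothetical, and wchar_t is 32 bits on most Unix systems, so this shows the API shape rather than the UTF-16 layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Duplicate a wide string. wcslen() counts wide characters, not bytes,
   so a character such as U+0F00 does not end the scan early. */
wchar_t *dup_wide(const wchar_t *src) {
    assert(src != NULL);
    size_t n = wcslen(src) + 1;  /* characters, terminator included */
    wchar_t *dst = malloc(n * sizeof(wchar_t));
    if (dst == NULL) {
        return NULL;  /* allocation failed; the caller must check */
    }
    wcscpy(dst, src);
    return dst;
}
```

Note the multiplication by sizeof(wchar_t): the allocation is sized in bytes, while wcslen() answers in characters.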

Daniel Berger

Re: The strcpy function is dead! Long live memcpy! Posted: Feb 2, 2006 10:16 PM
First... how the heck did my post end up in a forum?! Anyhoo...

> That's not a bug in Windows, it's a bug in the code.
> strcpy() is only for single byte character sets.

I understand that. See below.

> For multibyte character sets, use _mbscpy. strlen() works
> correctly for both single byte and multibyte character
> strings, but for Unicode, you need to use wcscpy() and
> wcslen() (wcs = wide character string). Wide characters
> are 16 bits on Windows (32 bits on OS X and other Unixes),
> so yeah, some characters may have null bytes in
> them.
>
> _tcslen() and _tcscpy() are not functions, but macro
> aliases for the versions of these functions that match
> your program's default character set, as determined
> by whether or not _UNICODE and/or _MBCS are #defined.
> Likewise, TCHAR is an alias for either CHAR or WCHAR.

I understand that they're macros. The UNICODE constant is defined (if you look at the source) and MBCS is not, which means wcslen is being used behind the scenes according to the MSDN docs. Maybe I should just default to defining MBCS for all my C extensions. Is there a downside to that?

> This should do what was intended:
>
> WCHAR* buffer = (WCHAR*) GlobalAlloc(GPTR, (wcslen(data) + 1) * sizeof(WCHAR));
> wcscpy(buffer, data);
>


Unless there's a drawback to the code I'm using now, I'll leave it alone. :)

Dan

Matt Gerrans

Posts: 1153
Nickname: matt
Registered: Feb, 2002

Re: The strcpy function is dead! Long live memcpy! Posted: Feb 5, 2006 12:02 AM
Just FYI, "in lieu of" means "instead of" not "in view of."

Daniel Berger

Re: The strcpy function is dead! Long live memcpy! Posted: Feb 5, 2006 9:41 AM
Thanks Matt, fixed. I keep thinking "lieu" stems from the Latin "light" instead of "place" (locus).
