From 3747751369148941413
X-Google-Thread: f78e5,d2c294613fb6ef04,start
X-Google-Attributes: gidf78e5,public
X-Google-Language: ENGLISH,ASCII
Path: g2news2.google.com!news3.google.com!border1.nntp.dca.giganews.com!nntp.giganews.com!local01.nntp.dca.giganews.com!nntp.sun.com!news.sun.com.POSTED!not-for-mail
NNTP-Posting-Date: Mon, 08 May 2006 11:55:02 -0500
From: =?ISO-8859-1?Q?Martin_Vejn=E1r?= <avakar@volny.cz>
Subject: Defect report: handling of extended source characters in string literals
Sender: fjh@cs.mu.OZ.AU
Message-id: <e3krdi$fhd$1@ns.felk.cvut.cz>
MIME-version: 1.0
X-MIME-Autoconverted: from 8bit to quoted-printable by ns.felk.cvut.cz id
 k47D6xAB015920
X-MIME-Autoconverted: from quoted-printable to 8bit by mulga.cs.mu.OZ.AU id
 k47D7iqx008899
Content-type: text/plain; format=flowed; charset=ISO-8859-1
Content-transfer-encoding: 8BIT
X-Original-NNTP-posting-date: Sun, 7 May 2006 13:06:58 +0000 (UTC)
Delivered-to: std-c++@mailman.ucar.edu
Delivered-to: std-c++@ucar.edu
X-PMX-Version: 5.1.2.240295
X-Original-To: std-c++@mailman.ucar.edu
X-Virus-Scanned: amavisd-new at ucar.edu
X-Virus-Scanned: amavisd-new at cs.mu.OZ.AU
Newsgroups: comp.std.c++
X-NNTP-posting-host: rb3h201.chello.upc.cz
User-Agent: Thunderbird 1.5 (Windows/20051201)
Original-recipient: rfc822;stephen.clamage@sun.com
Organization: Czech Technical University
Approved: stephen.clamage@sun.com (comp.std.c++)
Originator: clamage@cafe1
Cache-Post-Path: news1nwk!unknown@cafe1.sfbay.sun.com
X-Cache: nntpcache 3.0.1 (see http://www.nntpcache.org/)
Date: Mon, 08 May 2006 11:55:02 -0500
Lines: 54
NNTP-Posting-Host: 192.18.42.249
X-Trace: sv3-YS99geaA9T67u9Na1nchzNCk1LJbptNmlXgKOTXu4YjQb22nQV6CIgeCRYxFi04EqeMDP9UoJlTDHiz!wh/+ae+cX/Xy1FBKq61SFsCoG+pwwaRWF57ROLCteDZX9kg3IS15b5FUqoQ3gQ==
X-Complaints-To: abuse@sun.com
X-DMCA-Complaints-To: abuse@sun.com
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.32
Xref: g2news2.google.com comp.std.c++:1820


[ Note: Forwarded to C++ Committee.  -sdc ]

Consider the following code:

     #include <iostream>
     int main()
     {
         std::cout << "\\u00e1" << std::endl;

         // Following line contains Unicode character
         // "latin small letter a with acute" (U+00E1)
         std::cout << "\á" << std::endl;
     }

The first statement in main outputs the characters "u00e1" preceded by a
backslash.

The Standard says:
[2.1 - Phases of translation, paragraph 1.1]
     Physical source file characters are mapped, in an 
implementation-defined manner, to the basic source character set 
(introducing new-line characters for end-of-line indicators) if 
necessary. Trigraph sequences (2.3) are replaced by corresponding 
single-character internal representations. Any source file character not 
in the basic source character set (2.2) is replaced by the 
universal-character-name that designates that character. (An 
implementation may use any internal encoding, so long as an actual 
extended character encountered in the source file, and the same extended 
character expressed in the source file as a universal-character-name 
(i.e. using the \uXXXX notation), are handled equivalently.)

During this translation phase, the extended character in the second
statement is replaced by a universal-character-name, so the string
literal becomes a backslash followed by that universal-character-name.
The statement then resembles the first and outputs one of the following:

     \u00e1
     \u00E1
     \U000000e1
     \U000000E1

C99 (at least in the draft I have available) avoids this problem by not
introducing any universal character names in this phase and by not
restricting the (basic) source character set to 96 characters as C++ does.

-- 
Martin Vejnár


[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]



