Path: news.gmane.org!.POSTED!not-for-mail
From: martinho.fernandes@native-instruments.de
Newsgroups: gmane.comp.lang.c++.isocpp.proposals
Subject: Re: Re: Unicode support by extending std::locale. Can
 we make it by 2020?
Date: Wed, 28 Mar 2018 08:28:25 -0700 (PDT)
Lines: 84
Approved: news@gmane.org
Message-ID: <f5400abd-c89d-4a89-a7a9-f41a5e906c69@isocpp.org>
References: <45303792-68f2-4545-8ce4-4a3e1ec35b1b@isocpp.org> <8174836d-21fd-4030-aee9-bcb43d83d0fb@isocpp.org>
 <CAORbL+Mw=VCaPuOvo-4NcekDSPgsvHXA6v2QD0wmFUfC8OgOsw@mail.gmail.com>
Reply-To: std-proposals@isocpp.org
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_1233_2140794702.1522250905350"
X-Trace: blaine.gmane.org 1522250785 22906 195.159.176.226 (28 Mar 2018 15:26:25 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Wed, 28 Mar 2018 15:26:25 +0000 (UTC)
To: ISO C++ Standard - Future Proposals <std-proposals@isocpp.org>
In-Reply-To: <CAORbL+Mw=VCaPuOvo-4NcekDSPgsvHXA6v2QD0wmFUfC8OgOsw@mail.gmail.com>
X-Original-Sender: martinho.fernandes@native-instruments.de
Precedence: list
Mailing-list: list std-proposals@isocpp.org; contact std-proposals+owners@isocpp.org
List-ID: <std-proposals.isocpp.org>
X-Spam-Checked-In-Group: std-proposals@isocpp.org
X-Google-Group-Id: 399137483710
List-Post: <https://groups.google.com/a/isocpp.org/group/std-proposals/post>, <mailto:std-proposals@isocpp.org>
List-Help: <https://support.google.com/a/isocpp.org/bin/topic.py?topic=25838>, <mailto:std-proposals+help@isocpp.org>
List-Archive: <https://groups.google.com/a/isocpp.org/group/std-proposals/>
List-Subscribe: <https://groups.google.com/a/isocpp.org/group/std-proposals/subscribe>,
 <mailto:std-proposals+subscribe@isocpp.org>
List-Unsubscribe: <mailto:googlegroups-manage+399137483710+unsubscribe@googlegroups.com>,
 <https://groups.google.com/a/isocpp.org/group/std-proposals/subscribe>
Xref: news.gmane.org gmane.comp.lang.c++.isocpp.proposals:37535
Archived-At: <http://permalink.gmane.org/gmane.comp.lang.c++.isocpp.proposals/37535>

------=_Part_1233_2140794702.1522250905350
Content-Type: multipart/alternative; 
	boundary="----=_Part_1234_363284222.1522250905350"

------=_Part_1234_363284222.1522250905350
Content-Type: text/plain; charset="UTF-8"

Oops, I forgot to reply to the list.


On Wednesday, March 28, 2018 at 5:11:44 PM UTC+2, Dimitrij Mijoski wrote:
> AFAIK Unicode defines both simple 1-to-1 case transformations at character
> level and more complex language-sensitive and context-sensitive case
> transformations at string level. You cannot just care about the second and
> throw away the first. The ctype<char32_t> would handle the first.

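But the 1:1 mapping isn't language-sensitive, as you said yourself. std::locale
interfaces like `charT toupper(charT, const locale&)` are supposed to be
language-sensitive, and they rest on fundamentally broken assumptions, such as
that changing the case of one character always yields exactly one character
(U+00DF LATIN SMALL LETTER SHARP S, for instance, uppercases to "SS").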
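
To make the one-in, one-out problem concrete: the `<locale>` convenience function is declared as `template<class charT> charT toupper(charT c, const locale& loc)`, so whatever locale is passed, it can only ever return a single `charT`. The sketch below checks that at compile time and contrasts it with a string-returning function. `full_toupper` is a hypothetical name, not a standard or proposed API, and its table deliberately covers only U+00DF and ASCII a-z. This is an assumption made to keep the illustration short.

```cpp
#include <cassert>
#include <locale>
#include <string>
#include <type_traits>

// The standard <locale> convenience function has the shape
//   template<class charT> charT toupper(charT c, const locale& loc);
// so, whichever locale is passed, the result is exactly one charT.
static_assert(std::is_same_v<decltype(std::toupper(L'a', std::locale::classic())),
                             wchar_t>,
              "std::toupper(charT, const locale&) returns a single charT");

// Hypothetical string-returning alternative (not a standard API).
// Only U+00DF and ASCII a-z are mapped here; every other code point is
// returned unchanged, which is NOT correct Unicode casing in general.
std::u32string full_toupper(char32_t c) {
    if (c == U'\u00DF') {
        return U"SS";  // LATIN SMALL LETTER SHARP S: one character becomes two
    }
    if (c >= U'a' && c <= U'z') {
        return std::u32string(1, static_cast<char32_t>(c - U'a' + U'A'));
    }
    return std::u32string(1, c);
}
```

Here `full_toupper(U'\u00DF')` returns `U"SS"`, a result that the single-`charT` signature has no way to express.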

Arguably the simple case mappings are not important for "Unicode support"
either: the Unicode Standard defines the toUppercase and toLowercase
operations in section 3.13 (rules R1 and R2), and both use the full case
mappings, not the simple ones. The simple mappings are just a poor fallback
for when one doesn't have full casing support. They are a poor fallback
because they case characters with diacritics inconsistently: LATIN SMALL
LETTER A WITH GRAVE uppercases to LATIN CAPITAL LETTER A WITH GRAVE, but the
aforementioned LATIN SMALL LETTER J WITH CARON does not uppercase at all,
because Unicode has no precomposed LATIN CAPITAL LETTER J WITH CARON. Only the
full mapping gives the expected result: LATIN CAPITAL LETTER J followed by
COMBINING CARON.
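
The inconsistency shows up in a small excerpt of the case tables. The sketch below hard-codes only the entries needed here. This is an assumption for illustration: real tables are generated from the Unicode Character Database files `UnicodeData.txt` (simple mappings) and `SpecialCasing.txt` (full mappings). The sketch also ignores the language- and context-conditional entries in `SpecialCasing.txt`. The names `simple_upper`, `full_upper` and `to_uppercase` are hypothetical. The last function follows R1 as stated in the standard: map each character C to Uppercase_Mapping(C), which is the full mapping, and concatenate the results.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative excerpt only, not complete Unicode data.

// Simple (1:1) uppercase mapping: one code point to one code point.
// U+01F0 and U+00DF have no simple uppercase mapping, so they map to themselves.
char32_t simple_upper(char32_t c) {
    static const std::map<char32_t, char32_t> table = {
        {U'\u00E0', U'\u00C0'},  // a with grave -> A with grave
    };
    if (c >= U'a' && c <= U'z') {
        return static_cast<char32_t>(c - U'a' + U'A');
    }
    auto it = table.find(c);
    return it != table.end() ? it->second : c;
}

// Full uppercase mapping: one code point to a string of code points.
// For characters without an unconditional SpecialCasing.txt entry,
// the full mapping equals the simple one.
std::u32string full_upper(char32_t c) {
    static const std::map<char32_t, std::u32string> special = {
        {U'\u01F0', U"J\u030C"},  // j with caron -> J + COMBINING CARON
        {U'\u00DF', U"SS"},       // sharp s -> "SS"
    };
    auto it = special.find(c);
    if (it != special.end()) {
        return it->second;
    }
    return std::u32string(1, simple_upper(c));
}

// toUppercase(X) in the spirit of R1: concatenate the full mappings.
std::u32string to_uppercase(const std::u32string& x) {
    std::u32string out;
    for (char32_t c : x) {
        out += full_upper(c);
    }
    return out;
}
```

With these tables, `simple_upper` turns à into À but leaves ǰ unchanged, while `full_upper` turns ǰ into J plus COMBINING CARON. `to_uppercase(U"stra\u00DFe")` yields `U"STRASSE"`.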

-- 
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/f5400abd-c89d-4a89-a7a9-f41a5e906c69%40isocpp.org.


------=_Part_1234_363284222.1522250905350--

------=_Part_1233_2140794702.1522250905350--

