Path: news.gmane.org!.POSTED!not-for-mail
From: "T. C." <rs2740@gmail.com>
Newsgroups: gmane.comp.lang.c++.isocpp.proposals
Subject: Re: Removing trivial undefined behaviour
Date: Wed, 26 Oct 2016 00:11:26 -0700 (PDT)
Lines: 133
Approved: news@gmane.org
Message-ID: <9e242562-aba1-4f8d-8e0a-b2de34f6a10b@isocpp.org>
References: <35daf87a-bf1d-46de-b16c-c2965fbadc8f@isocpp.org>
Reply-To: std-proposals@isocpp.org
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_1398_56848167.1477465886441"
X-Trace: blaine.gmane.org 1477465904 6857 195.159.176.226 (26 Oct 2016 07:11:44 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Wed, 26 Oct 2016 07:11:44 +0000 (UTC)
To: ISO C++ Standard - Future Proposals <std-proposals@isocpp.org>
In-Reply-To: <35daf87a-bf1d-46de-b16c-c2965fbadc8f@isocpp.org>
X-Original-Sender: rs2740@gmail.com
Precedence: list
Mailing-list: list std-proposals@isocpp.org; contact std-proposals+owners@isocpp.org
List-ID: <std-proposals.isocpp.org>
X-Google-Group-Id: 399137483710
List-Post: <https://groups.google.com/a/isocpp.org/group/std-proposals/post>, <mailto:std-proposals@isocpp.org>
List-Help: <https://support.google.com/a/isocpp.org/bin/topic.py?topic=25838>, <mailto:std-proposals+help@isocpp.org>
List-Archive: <https://groups.google.com/a/isocpp.org/group/std-proposals/>
List-Subscribe: <https://groups.google.com/a/isocpp.org/group/std-proposals/subscribe>,
 <mailto:std-proposals+subscribe@isocpp.org>
List-Unsubscribe: <mailto:googlegroups-manage+399137483710+unsubscribe@googlegroups.com>,
 <https://groups.google.com/a/isocpp.org/group/std-proposals/subscribe>
Xref: news.gmane.org gmane.comp.lang.c++.isocpp.proposals:29084
Archived-At: <http://permalink.gmane.org/gmane.comp.lang.c++.isocpp.proposals/29084>

------=_Part_1398_56848167.1477465886441
Content-Type: multipart/alternative; 
	boundary="----=_Part_1399_901643724.1477465886441"

------=_Part_1399_901643724.1477465886441
Content-Type: text/plain; charset=UTF-8



On Wednesday, October 26, 2016 at 3:05:45 AM UTC-4, Hans Guijt wrote:
>
> I'd like to make a case for fixing the undefined behaviour in the 
> character classification functions (isdigit, isxdigit, isalpha, 
> isalnum, etc.). These functions exhibit undefined behaviour when confronted 
> with negative values, and it is entirely too easy to accidentally call them 
> with such:
>
> #include <cctype>
>
> int main (int argc, char *argv[])
> {   if (argc)
>         isdigit (argv [0][0]);   // undefined if argv[0][0] is negative
> }
>
> ...done. At this point the user can call the program with some character 
> string containing any value above 0x7f, and assuming char acts as a signed 
> type (not uncommon), this program will exhibit undefined behaviour. 
>
> Since this undefined behaviour could pop up for any character string that 
> is not baked into the program itself (where, at least, the programmer can 
> know in advance that no characters with values above 0x7f are 
> present), we are left with the rather ridiculous situation that we must 
> 'pre-classify' characters before we are allowed to classify them:
>
> return is_safe_char (c) && isdigit (c);
>
> ...where is_safe_char would be something like:
>
> bool is_safe_char (int c)
> {   return c >= 0 && c <= 0x7f;
> }
>
> But why stop there? Writing your own character classification functions is 
> easy, after all. In my experience projects invariably fall into one of two 
> categories: those that write their own versions of the character 
> classification functions, and those that are blissfully unaware of the 
> problem and wonder why their software sometimes fails.
>    
> It has been made clear to me that some implementations use lookup tables 
> for the implementation of this group of functions. Such behaviour can be 
> maintained; all that is needed is a single additional test to see if the 
> input value lies somewhere within the table range or not. This test is 
> effectively already mandatory anyway (since not having it is pretty much a 
> guarantee of undefined behaviour somewhere down the line), so why not stick 
> it in the library where it belongs, instead of in every single piece of C++ 
> code out there? It removes a risk not everyone is aware of and that is 
> completely unnecessary.
>
> There may be other cases of undefined behaviour around that could be 
> removed with a trivial change, but these have been bugging me for years... 
>
>
>
>
>
Because they are meant to be used as isdigit((unsigned char) c).
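
A minimal sketch of that idiom, in case it helps. The names is_digit_safe
and count_digits are made up for illustration and are not part of any
library; the only standard call used is std::isdigit from <cctype>.
Converting to unsigned char first maps every possible char value into
the range 0..UCHAR_MAX, which is the range the function is specified to
accept (besides EOF).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: convert to unsigned char before calling
// std::isdigit, so a negative char value is never passed through.
inline bool is_digit_safe(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: count the decimal digits in a string, including
// strings that contain bytes above 0x7f.
inline std::size_t count_digits(const std::string& s)
{
    std::size_t n = 0;
    for (char c : s)
        if (is_digit_safe(c))
            ++n;
    return n;
}
```

With a wrapper like this, the call in the quoted example would become
is_digit_safe (argv [0][0]), which is well defined whether char is
signed or unsigned.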


------=_Part_1399_901643724.1477465886441--

------=_Part_1398_56848167.1477465886441--

