Path: news.gmane.org!.POSTED!not-for-mail
From: Hans Guijt <hguijtra@xs4all.nl>
Newsgroups: gmane.comp.lang.c++.isocpp.proposals
Subject: Removing trivial undefined behaviour
Date: Wed, 26 Oct 2016 00:05:44 -0700 (PDT)
Lines: 121
Approved: news@gmane.org
Message-ID: <35daf87a-bf1d-46de-b16c-c2965fbadc8f@isocpp.org>
Reply-To: std-proposals@isocpp.org
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_1736_601913448.1477465544923"
X-Trace: blaine.gmane.org 1477465575 23858 195.159.176.226 (26 Oct 2016 07:06:15 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Wed, 26 Oct 2016 07:06:15 +0000 (UTC)
To: ISO C++ Standard - Future Proposals <std-proposals@isocpp.org>
Xref: news.gmane.org gmane.comp.lang.c++.isocpp.proposals:29083
Archived-At: <http://permalink.gmane.org/gmane.comp.lang.c++.isocpp.proposals/29083>

------=_Part_1736_601913448.1477465544923
Content-Type: multipart/alternative; 
	boundary="----=_Part_1737_673616425.1477465544923"

------=_Part_1737_673616425.1477465544923
Content-Type: text/plain; charset=UTF-8

I'd like to make a case for fixing the undefined behaviour in the character 
classification functions (isdigit, isxdigit, isalpha, isalnum, etc.). 
These functions exhibit undefined behaviour when called with a negative 
value other than EOF, and it is entirely too easy to accidentally do so:

#include <cctype>

int main (int argc, char *argv[])
{   if (argc > 1)
        std::isdigit (argv [1][0]);   // undefined if this char is negative
}

...done. At this point the user can call the program with some character 
string containing any byte value above 0x7f, and assuming char acts as a 
signed type (not uncommon), that byte reaches isdigit as a negative int and 
the program exhibits undefined behaviour. 
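
To make the failure mode concrete, here is a minimal sketch of the commonly recommended workaround: convert the char to unsigned char before the call. The helper names to_ctype_arg and is_digit_safe are hypothetical and not part of the original message; the sketch assumes 8-bit bytes.

```cpp
#include <cctype>

// Convert a char to the int value the <cctype> functions are specified
// to accept: a value representable as unsigned char (0..UCHAR_MAX).
// Without the cast, a signed char holding the byte 0xE9 is passed as -23.
inline int to_ctype_arg (char c)
{   return static_cast<unsigned char> (c);
}

// Hypothetical wrapper: well-defined for every possible char value.
inline bool is_digit_safe (char c)
{   return std::isdigit (to_ctype_arg (c)) != 0;
}
```

Calling is_digit_safe (argv [1][0]) instead of isdigit (argv [1][0]) avoids the undefined behaviour, but only if every call site remembers to do so, which is exactly the burden described here.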

Since this undefined behaviour could pop up for any character string that 
is not baked into the program itself (where, at least, the programmer can 
know in advance that no characters with values above 0x7f are 
present), we are left with the rather ridiculous situation that we must 
'pre-classify' characters before we are allowed to classify them:

return is_safe_char (c) && isdigit (c);

...where is_safe_char would be something like:

bool is_safe_char (int c)
{   return c >= 0 && c <= 0x7f;
}

But why stop there? Writing your own character classification functions is 
easy, after all. In my experience, projects invariably fall into one of two 
categories: those that write their own versions of the character 
classification functions, and those that are blissfully unaware of the 
problem and wonder why their software sometimes fails.
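
As an illustration of how little code such a home-grown replacement takes, here is a sketch of two hand-written classifiers. The names my_is_digit and my_is_xdigit are hypothetical, and the functions deliberately ignore the locale: they recognise only the basic digits and the letters a-f and A-F.

```cpp
// Hypothetical hand-written replacements: defined for every int value,
// including negative ones, and independent of the current locale.
constexpr bool my_is_digit (int c)
{   return c >= '0' && c <= '9';
}

constexpr bool my_is_xdigit (int c)
{   return my_is_digit (c)
        || (c >= 'a' && c <= 'f')
        || (c >= 'A' && c <= 'F');
}
```

Because these are plain range comparisons, passing a sign-extended char such as -23 simply yields false.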
   
It has been made clear to me that some implementations use lookup tables 
for the implementation of this group of functions. Such behaviour can be 
maintained; all that is needed is a single additional test to see if the 
input value lies somewhere within the table range or not. This test is 
effectively already mandatory anyway (since not having it is pretty much a 
guarantee of undefined behaviour somewhere down the line), so why not stick 
it in the library where it belongs, instead of in every single piece of C++ 
code out there? Doing so removes a completely unnecessary risk that not 
everyone is aware of.
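
The following is a rough sketch of what a table-based implementation with that one extra test could look like. It is not taken from any real library: the namespace sketch, the digit_table type and its layout are all assumptions made for illustration.

```cpp
#include <climits>

namespace sketch {

// Hypothetical lookup table: one entry per unsigned char value,
// nonzero only for the characters '0'..'9'.
struct digit_table
{   unsigned char entry [UCHAR_MAX + 1];
    digit_table () : entry ()   // value-initialise every entry to zero
    {   for (int c = '0'; c <= '9'; ++c)
            entry [c] = 1;
    }
};

// Table lookup guarded by the single range test argued for above.
inline bool is_digit (int c)
{   static const digit_table table;
    if (c < 0 || c > UCHAR_MAX)   // outside the table, EOF included
        return false;
    return table.entry [c] != 0;
}

}
```

With the guard in place, sketch::is_digit (-23) returns false instead of indexing outside the table. An implementation could likely fold both comparisons into one unsigned comparison, so the added cost is a single branch.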

There may be other cases of undefined behaviour around that could be 
removed with a trivial change, but these have been bugging me for years... 





------=_Part_1737_673616425.1477465544923--

------=_Part_1736_601913448.1477465544923--

