{"id":87,"date":"2014-09-21T10:00:00","date_gmt":"2014-09-21T01:00:00","guid":{"rendered":"https:\/\/www.rapapaing.com\/blog\/?p=87"},"modified":"2020-02-02T17:25:50","modified_gmt":"2020-02-02T08:25:50","slug":"geographically-isolated-bugs","status":"publish","type":"post","link":"https:\/\/rapapaing.com\/blog\/2014\/09\/geographically-isolated-bugs\/","title":{"rendered":"Geographically isolated bugs"},"content":{"rendered":"\n<p>I find it interesting how a certain class of programming problems can be so common in one geographical location that it is considered common knowledge there, yet be completely unknown in the rest of the world.<br>Consider this piece of code in C\/C++:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"c\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\/\/ \u30ed\u30dc\u30c3\u30c8\u6a5f\u80fd\nif (robotRequired)\n{\n    CreateRobot();\n}<\/pre>\n\n\n\n<p>See anything wrong with it? No?<\/p>\n\n\n\n<p>Well, if you happen to save that little piece of code in Shift-JIS, an encoding very commonly used for Japanese text, some compilers will ignore the \u201cif\u201d statement completely, and its body will be executed every time.<\/p>\n\n\n\n<p>What\u2019s going on? If your compiler is not designed or configured to recognize Shift-JIS sequences as multi-byte characters, most of the time your program will still work as intended, since everything after the \u201c\/\/\u201d gets ignored as a comment. 
However, in this case, this is what the compiler will see:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"c\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\/\/ \u0192\u008d\u0192{\u0192b\u0192g\u2039@\u201d\\\nif (robotRequired)\n{\n    CreateRobot();\n}<\/pre>\n\n\n\n<p>Notice the last character in the comment? That backslash escapes the line break and makes the comment continue on to the next line. After preprocessing, this is what the code looks like:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"c\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">{\n    CreateRobot();\n}<\/pre>\n\n\n\n<p>This happens because Shift-JIS does not guarantee that the second byte of a multi-byte sequence will be above 0x7f. In this case, the character \u80fd is encoded as 0x94&nbsp;0x5c. That second byte happens to be the same as the ASCII backslash, which causes this problem.<\/p>\n\n\n\n<p>This is certainly not isolated to the \u80fd symbol. Any symbol whose last byte is 0x5c will exhibit the same problem. 
These are some of the symbols that end in 0x5c (list courtesy of <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/sites.google.com\/site\/fudist\/Home\/grep\/sjis-damemoji-jp\" target=\"_blank\">this site<\/a>):<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\u30fc\u30bd\u042b\u2168\u5642\u6d6c\u6b3a\u572d\u69cb\u8695\u5341\u7533\u66fe\u7baa\u8cbc\u80fd\u8868\u66b4\u4e88\u7984\u5154\u5580\u5abe\u5f4c\u62ff\u6764\u6b43\u6fec\u755a\u79c9\u7db5\u81c0\u85f9\u89f8\u8ec6\u9414\u9945\u9ded\u5046\u7821\u7e8a\u72be<\/pre>\n\n\n\n<p>Most of these are not obscure symbols. Many of them are quite likely to appear in comments, string literals, or anywhere else a stray backslash can cause a lot of trouble.<\/p>\n\n\n\n<p>It turns out that in Japan this is very common knowledge among programmers, and you should not call yourself a programmer there if you don\u2019t know about it. The trouble is that when this bug does occur, it can be very difficult to track down: at first sight there is nothing wrong with the code, and in a 100,000+ line program the comments are the last place you will look.<\/p>\n\n\n\n<p>Of course, I had never heard of this issue until I ran into it a few days ago. Thankfully, I was not the one who introduced the bug, just the guy who found it after noticing that the compiler was generating no assembly code for that line, even with all optimizations disabled.<\/p>\n\n\n\n<p>What can you do to avoid this problem? Use UTF-8. Always.<\/p>\n\n\n\n<p>Every byte of a UTF-8 multi-byte sequence is 0x80 or above, so none of them can be mistaken for a backslash, and this problem cannot occur if you use UTF-8. 
And your compiler does not really have to be UTF-8 aware to compile UTF-8 source files, as long as you don\u2019t start naming your variables and functions with multi-byte characters.<\/p>\n\n\n\n<p>Cool bug though. Exploiting it would make an awesome entry in the <a href=\"http:\/\/www.underhanded-c.org\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">Underhanded C Contest<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I find it interesting how a certain class of programming problems can be so common in one geographical location that it is considered common knowledge there, yet be completely unknown in the rest of the world. Consider this piece of code in C\/C++ See anything wrong with it? No? Well, if you happen to save that little &hellip; <a href=\"https:\/\/rapapaing.com\/blog\/2014\/09\/geographically-isolated-bugs\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Geographically isolated bugs&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[26,18],"class_list":["post-87","post","type-post","status-publish","format-standard","hentry","category-programming","tag-bugs","tag-c"],"_links":{"self":[{"href":"https:\/\/rapapaing.com\/blog\/wp-json\/wp\/v2\/posts\/87","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rapapaing.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rapapaing.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rapapaing.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rapapaing.com\/blog\/wp-json\/wp\/v2\/comments?post=87"}],"version-history":[{"count":1,"href":"https:\/\/rapapaing.com\/blog\/wp-json\/wp\/v2\/posts\/87\/revisions"}],"predecessor-version":[{"id":88,"href":"https:\
/\/rapapaing.com\/blog\/wp-json\/wp\/v2\/posts\/87\/revisions\/88"}],"wp:attachment":[{"href":"https:\/\/rapapaing.com\/blog\/wp-json\/wp\/v2\/media?parent=87"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rapapaing.com\/blog\/wp-json\/wp\/v2\/categories?post=87"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rapapaing.com\/blog\/wp-json\/wp\/v2\/tags?post=87"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}