ときどきの雑記帖 RE* (新南口)

Fighting Vipers

September 14, 2022

RubyKaigi 2022

I wish I could have gone.

Weak yen

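It's about time I got next year's planner, so I've been looking at the Moleskine Pocket and Large Diary. They were always fairly expensive, but they seem to have gotten even pricier. The effect of the weak yen, I suppose. Sigh.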

技術書典 13

There was a book on the Wolfram Language.

無料で始めるWolfram言語入門：マトラ研究所

Successor program?

Judging from the time slot, it looks like the successor to 「笑わない数学」.

NHKの“ゲーム教養番組”がレギュラー化！「ゲームゲノム」が10月5日23時より放送 - GAME Watch

Hornets

Twice in the past few days I have run into a hornet (probably) near my house. Why here, of all places? (I haven't seen one since elementary school.)

退屈なことはPythonにやらせよう第2版

I think the planned release date was announced quite early, but at some point it became January 2023.

—

There was an interesting issue on onetrueawk: Ancient awk regexp compatibility bug · Issue #161 · onetrueawk/awk.

Here is what it says:

Using the code from master as of today, I found the following bug. Given:
BEGIN {
  print match("abc-def", /[qrs---tuv]/)
}
The One True Awk prints a result of 0, whereas gawk and mawk print 4. Ancient awks (and I think it’s even documented in the awk book) allowed a “range” of minus through minus to mean a real actual minus sign. The current code doesn’t support this anymore.

In other words, given the regular expression [qrs---tuv], gawk and mawk treat - as a character to match, but One True Awk does not.

Huh, I thought, and when I looked at the gawk and mawk code, they do indeed handle --- specially. First gawk, or rather the regex code from gnulib:

gnulib regcomp.c

/* Peek a token from INPUT, and return the length of the token.
   We must not use this function out of bracket expressions.  */

static int
peek_token_bracket (re_token_t *token, re_string_t *input, reg_syntax_t syntax)
{

(bulk of the function omitted)

  switch (c)
    {
    case ']':
      token->type = OP_CLOSE_BRACKET;
      break;
    case '^':
      token->type = OP_NON_MATCH_LIST;
      break;
    case '-':
      /* In V7 Unix grep and Unix awk and mawk, [...---...]
         (3 adjacent minus signs) stands for a single minus sign.
         Support that without breaking anything else.  */
      if (! (re_string_cur_idx (input) + 2 < re_string_length (input)
             && re_string_peek_byte (input, 1) == '-'
             && re_string_peek_byte (input, 2) == '-'))
        {
          token->type = OP_CHARSET_RANGE;
          break;
        }
      re_string_skip_bytes (input, 2);
      FALLTHROUGH;
    default:
      token->type = CHARACTER;
    }
  return 1;
}

Here is mawk. It took me a while to understand what it is doing 😓

mawk-20121129/rexp0.c at master · ThomasDickey/mawk-20121129

case '-':

    if (prevc == -1 || p + 1 == q) {
        prevc = '-';
        char_on(*bvp, '-');
        p++;
    } else {
        int c;
        char *mark = ++p;

        if (*p != '\\')
            c = (UChar) * p++;
        else {
            ++p;
            c = escape(&p);
        }

        if (prevc <= c) {
            block_on(*bvp, prevc, c);
            prevc = -1;
        } else {        /* back up */
            p = mark;
            prevc = '-';
            char_on(*bvp, '-');
        }
    }
    break;

It tries to process s-- as a range → goes through the /* back up */ branch →

prevc = '-';
char_on(*bvp, '-');

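is executed → and when control reaches case '-' again with prevc == '-', the remaining -- is consumed as the range from - to -. So that's how it works.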

And onetrueawk has nothing of the sort: awk/b.c at master · onetrueawk/awk

The gnulib comment says "In V7 Unix grep", but I can't find any code that looks like it there?

v7unix/grep.c at master · v7unix/v7unix

str[nls]?c(py|at)

Revisiting a topic I also touched on in strscpy.

strlcpy

Implementation of strlcpy

string.c - lib/string.c - Linux source code (v5.19.3) - Bootlin

#ifndef __HAVE_ARCH_STRLCPY
/**
 * strlcpy - Copy a C-string into a sized buffer
 * @dest: Where to copy the string to
 * @src: Where to copy the string from
 * @size: size of destination buffer
 *
 * Compatible with ``*BSD``: the result is always a valid
 * NUL-terminated string that fits in the buffer (unless,
 * of course, the buffer size is zero). It does not pad
 * out the result like strncpy() does.
 */
size_t strlcpy(char *dest, const char *src, size_t size)
{
        size_t ret = strlen(src);

        if (size) {
                size_t len = (ret >= size) ? size - 1 : ret;
                memcpy(dest, src, len);
                dest[len] = '\0';
        }
        return ret;
}
EXPORT_SYMBOL(strlcpy);
#endif

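Since it calls strlen on src unconditionally, it will "overrun" when src is not properly NUL-terminated, just as has been pointed out. And because the return value is strlen(src), the return value by itself does not tell you whether truncation occurred. The caller has to determine whether it will happen (or has happened).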

Looking at 1999 USENIX Annual Technical Conference, June 6-11, 1999, Monterey, California, USA, I found that it also gives usage examples, so here they are.

Example 1c is a trivial conversion to the strlcpy()/strlcat() API. It has the advantage of being as simple as Example 1a, but it does not take advantage of the new API’s return value.
strlcpy(path, homedir, sizeof(path));
strlcat(path, "/", sizeof(path));
strlcat(path, ".foorc", sizeof(path));
len = strlen(path);

This example does not use the return value. It is simply a drop-in replacement for strncpy.

The next example does use the return value.

Since Example 1c is so easy to read and comprehend, it is simple to add additional checks to it. In Example 1d, we check the return value to make sure there was enough space for the source string. If there was not, we return an error. This is slightly more complicated but in addition to being more robust, it also avoids the final strlen() call.
len = strlcpy(path, homedir, sizeof(path));
if (len >= sizeof(path))
 return (ENAMETOOLONG);
len = strlcat(path, "/", sizeof(path));
if (len >= sizeof(path))
 return (ENAMETOOLONG);
len = strlcat(path, ".foorc", sizeof(path));
if (len >= sizeof(path))
 return (ENAMETOOLONG);

Here is what the strlcat(3) manual page says about return values:

RETURN VALUES
Besides quibbles over the return type (size_t versus int) and signal handler safety (snprintf(3) is not entirely safe on some systems), the following two are equivalent:

          n = strlcpy(dst, src, len);
          n = snprintf(dst, len, "%s", src);

Like snprintf(3), the strlcpy() and strlcat() functions return the total
length of the string they tried to create.  For strlcpy() that means the
length of src.  For strlcat() that means the initial length of dst plus
the length of src.

If the return value is >= dstsize, the output string has been truncated.
It is the caller's responsibility to handle this.

It spells out that this is the "caller's responsibility".

From the examples in the manual page:

EXAMPLES
The following code fragment illustrates the simple case:
char *s, *p, buf[BUFSIZ];

...

(void)strlcpy(buf, s, sizeof(buf));
(void)strlcat(buf, p, sizeof(buf));
To detect truncation, perhaps while building a pathname, something like the following might be used:
char *dir, *file, pname[MAXPATHLEN];

...

if (strlcpy(pname, dir, sizeof(pname)) >= sizeof(pname))
        goto toolong;
if (strlcat(pname, file, sizeof(pname)) >= sizeof(pname))
        goto toolong;
Since it is known how many characters were copied the first time, things can be sped up a bit by using a copy instead of an append:
char *dir, *file, pname[MAXPATHLEN];
size_t n;

...

n = strlcpy(pname, dir, sizeof(pname));
if (n >= sizeof(pname))
        goto toolong;
if (strlcpy(pname + n, file, sizeof(pname) - n) >= sizeof(pname) - n)
        goto toolong;
However, one may question the validity of such optimizations, as they defeat the whole purpose of strlcpy() and strlcat(). As a matter of fact, the first version of this manual page got it wrong.

So, well, there is no such luck as just calling it and being done.

Other implementations

While searching around (partly for unrelated reasons), I learned that there are also implementations that take a different approach.

For example, according to usskim / strlcpy, the OpenBSD implementation apparently looks like this.

/*
 * Copy src to string dst of size siz.  At most siz-1 characters
 * will be copied.  Always NUL terminates (unless siz == 0).
 * Returns strlen(src); if retval >= siz, truncation occurred.
 */
size_t
strlcpy(char *dst, const char *src, size_t siz)
{
    char *d = dst;
    const char *s = src;
    size_t n = siz;

    /* Copy as many bytes as will fit */
    if (n != 0 && --n != 0)
    {
        do
        {
            if ((*d++ = *s++) == '\0')
                break;
        }
        while (--n != 0);
    }

    /* Not enough room in dst, add NUL and traverse rest of src */
    if (n == 0)
    {
        if (siz != 0)
            *d = '\0';     /* NUL-terminate dst */
        while (*s++)
            continue;
    }

    return s - src - 1;    /* count does not include NUL */
}

This one does not call strlen on the source unconditionally. And here is yet another implementation that was introduced on the same blog.

/**
 * Copy a string to a sized buffer. The result is always nul-terminated
 * (contrary to strncpy()).
 *
 * @param dest destination buffer
 * @param src string to be copied
 * @param len maximum number of characters to be copied plus one for the
 * terminating nul.
 *
 * @return strlen(src)
 */
#ifndef HAVE_STRLCPY
extern size_t vlc_strlcpy (char *dst, const char *src, size_t siz)
{
    size_t len;

    for (len = 1; (len < siz) && *src; len++)
        *dst++ = *src++;

    if (siz)
        *dst = '\0';

    while (*src++)
        len++;

    return len - 1;
}
#endif

The same goes for this one.

strscpy

strscpy returns -E2BIG when truncation occurs, so you can tell just by looking at the return value.

Implementation of strscpy

It is rather complicated for the sake of efficiency, but here is the implementation of strscpy.

#ifndef __HAVE_ARCH_STRSCPY
/**
 * strscpy - Copy a C-string into a sized buffer
 * @dest: Where to copy the string to
 * @src: Where to copy the string from
 * @count: Size of destination buffer
 *
 * Copy the string, or as much of it as fits, into the dest buffer.  The
 * behavior is undefined if the string buffers overlap.  The destination
 * buffer is always NUL terminated, unless it's zero-sized.
 *
 * Preferred to strlcpy() since the API doesn't require reading memory
 * from the src string beyond the specified "count" bytes, and since
 * the return value is easier to error-check than strlcpy()'s.
 * In addition, the implementation is robust to the string changing out
 * from underneath it, unlike the current strlcpy() implementation.
 *
 * Preferred to strncpy() since it always returns a valid string, and
 * doesn't unnecessarily force the tail of the destination buffer to be
 * zeroed.  If zeroing is desired please use strscpy_pad().
 *
 * Returns:
 * * The number of characters copied (not including the trailing %NUL)
 * * -E2BIG if count is 0 or @src was truncated.
 */
ssize_t strscpy(char *dest, const char *src, size_t count)
{
        const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
        size_t max = count;
        long res = 0;

        if (count == 0 || WARN_ON_ONCE(count > INT_MAX))
                return -E2BIG;


#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
        /*
         * If src is unaligned, don't cross a page boundary,
         * since we don't know if the next page is mapped.
         */
        if ((long)src & (sizeof(long) - 1)) {
                size_t limit = PAGE_SIZE - ((long)src & (PAGE_SIZE - 1));
                if (limit < max)
                        max = limit;
        }
#else
        /* If src or dest is unaligned, don't do word-at-a-time. */
        if (((long) dest | (long) src) & (sizeof(long) - 1))
                max = 0;
#endif

        while (max >= sizeof(unsigned long)) {
                unsigned long c, data;

                c = read_word_at_a_time(src+res);
                if (has_zero(c, &data, &constants)) {
                        data = prep_zero_mask(c, data, &constants);
                        data = create_zero_mask(data);
                        *(unsigned long *)(dest+res) = c & zero_bytemask(data);
                        return res + find_zero(data);
                }
                *(unsigned long *)(dest+res) = c;
                res += sizeof(unsigned long);
                count -= sizeof(unsigned long);
                max -= sizeof(unsigned long);
        }

        while (count) {
                char c;

                c = src[res];
                dest[res] = c;
                if (!c)
                        return res;
                res++;
                count--;
        }

        /* Hit buffer length without finding a NUL; force NUL-termination. */
        if (res)
                dest[res-1] = '\0';

        return -E2BIG;
}
EXPORT_SYMBOL(strscpy);
#endif

string: Introduce strtomem() and strtomem_pad() [LWN.net]

A vulnerability introduced by (?) the specification of strlcat

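I mentioned that replacing strcat is hard. In PHP, as a belt-and-braces security measure, code was rewritten strcat→strncat→strlcat, and as a result a critical vulnerability crept in, contrary to expectations. https://t.co/T3I5jvNKBi https://t.co/FSqkP4IohM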
— 徳丸浩 (@ockeghem) August 27, 2022

PHP5.3.7のcrypt関数のバグはこうして生まれた | 徳丸浩の日記

Unfortunately, this is where the bug got in. The third parameter means different things in strncat and strlcat.

FORTRAN Compiler on IBM 704

Data types

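The 704 can handle two numeric types, 36-bit fixed point and 36-bit floating point (was there "double precision" as well?), but the numbers handled by FORTRAN II on the 704 (and probably FORTRAN I too) are implemented as follows.

Kind       Implementation
Real       36-bit floating point
Integer    15-bit fixed point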

http://archive.computerhistory.org/resources/text/Fortran/102649787.05.01.acc.pdf

p.9

1 to 5 decimal digits. A preceding + or - sign is optional. The magnitude of the constant must be less than 32768.

Any number of decimal digits, with a decimal point at the beginning, at the end, or between two digits. A preceding + or - sign is optional. A decimal exponent preceded by an E may follow.

The magnitude of the number thus expressed must be zero, or must lie between the approximate limits of 10^-38 to 10^38 . The number will appear in the object program as a normalised single-precision floating point number.

A fixed point variable can assume any integral value whose magnitude is less than 32768. However, see the section on Fixed Point Arithmetic in Chapter 7.

A floating point variable can assume any value expressible as a normalised 704 floating point number; i.e. zero, or with magnitude between approximately 10^-38 and 10^38.

CHAPTER 7. MISCELLANEOUS DETAILS ABOUT FORTRAN

Fixed Point Arithmetic

The use of fixed point arithmetic is governed by the following considerations.

Fixed point constants specified in the source program must have magnitudes less than 2^15.

Fixed point data read in by the object program ~~are treated mod 2^15~~ must be smaller than 2^15

Fixed point arithmetic in the object program is arithmetic mod 2^15.

Indexing in the object program is mod (size of the object machine).

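Why? I wondered, but perhaps it is because integers were not expected to be used for much beyond array subscripts. If so, 15 bits is enough (since that is the size of the index registers), and anything beyond that range would only cause trouble.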

仕方がない ("It can't be helped")

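Something I learned today: one of the techniques malware uses to morph its code and evade detection is apparently called the "Shikata-Ga-Nai encoder". It has the excellent(?) property of using FPU instructions so that the decoder itself is also hard to detect. https://t.co/5AjR9FW8c1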
— 新山祐介 (Yusuke Shinyama) (@mootastic) September 11, 2022

Shikata-Ga-Nai encoder

There is an image reading 「仕方が無い」 on the page. I wondered if that was it, and sure enough the name does come from that Japanese phrase. But why?

Teaching C

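In most schools, C is taught badly. C will be needed for a long time yet for maintaining systems, but many teaching materials are old, teach bad habits, and do not touch on the security considerations and undefined behavior that matter in today's low-level programming. https://t.co/R7dYmDBO3u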
— 新山祐介 (Yusuke Shinyama) (@mootastic) September 11, 2022

Teaching C – Embedded in Academia

So this is not just a Japanese problem.

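The original article is dated May 10, 2016, so it is fairly old. I wonder how things have gone since then. Also, I feel like I have seen Embedded in Academia before, but the page layout looks quite different from what I remember.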

Embedded in Academia – John Regehr, Professor of Computer Science, University of Utah, USA

One might argue that we shouldn’t be teaching C any longer, and I would certainly agree that C is probably a poor first or second language. On the other hand, even if we were in a position where no new projects should be written in C (that day is coming, but slowly — probably at least a decade off), we’re still going to be stuck maintaining C for many decades. A random CS graduate has pretty good odds of running into C during her career. But beyond that, even after we replace C, the systems programming niche will remain. A lot of what we learn when we think we’re learning C is low-level programming and that stuff is important.

Quite right.

Hugo notes

It crashed for some reason.

panic: runtime error: index out of range [0] with length 0

goroutine 359 [running]:
github.com/gohugoio/hugo/hugolib.(*pageState).shiftToOutputFormat(0xc004c52120, 0x1, 0x0)
        /root/project/hugo/hugolib/page.go:861 +0x3c5
github.com/gohugoio/hugo/hugolib.(*pageState).initOutputFormat(...)
        /root/project/hugo/hugolib/page.go:457
github.com/gohugoio/hugo/hugolib.(*Site).preparePagesForRender.func1(0xc0017f54c0?)
        /root/project/hugo/hugolib/hugo_sites.go:853 +0x2a
github.com/gohugoio/hugo/hugolib.(*pageMap).withEveryBundlePage.func1({0x1?, 0x2?}, 0xc002683470?)
        /root/project/hugo/hugolib/content_map_page.go:691 +0x2a
github.com/gohugoio/hugo/hugolib.contentTrees.Walk.func1({0xc000ed8dc0?, 0xc0017f54c0?}, {0x2a50e80?, 0xc002ceebd0?})
        /root/project/hugo/hugolib/content_map.go:876 +0x3b
github.com/armon/go-radix.recursiveWalk(0xc002ceec30, 0xc0017f54c0)
        /go/pkg/mod/github.com/armon/go-radix@v1.0.0/radix.go:519 +0x45
github.com/armon/go-radix.recursiveWalk(0xc002308180?, 0xc0017f54c0)
        /go/pkg/mod/github.com/armon/go-radix@v1.0.0/radix.go:525 +0xb3
github.com/armon/go-radix.recursiveWalk(0x1e1864a?, 0xc0017f54c0)
        /go/pkg/mod/github.com/armon/go-radix@v1.0.0/radix.go:525 +0xb3
github.com/armon/go-radix.recursiveWalk(0xc001ad0b28?, 0xc0017f54c0)
        /go/pkg/mod/github.com/armon/go-radix@v1.0.0/radix.go:525 +0xb3
github.com/armon/go-radix.recursiveWalk(0xc002405d70?, 0xc0017f54c0)
        /go/pkg/mod/github.com/armon/go-radix@v1.0.0/radix.go:525 +0xb3
github.com/armon/go-radix.recursiveWalk(0x2030001?, 0xc0017f54c0)
        /go/pkg/mod/github.com/armon/go-radix@v1.0.0/radix.go:525 +0xb3
github.com/armon/go-radix.recursiveWalk(0x2cd7393?, 0xc0017f54c0)
        /go/pkg/mod/github.com/armon/go-radix@v1.0.0/radix.go:525 +0xb3
github.com/armon/go-radix.(*Tree).Walk(...)
        /go/pkg/mod/github.com/armon/go-radix@v1.0.0/radix.go:447
github.com/gohugoio/hugo/hugolib.contentTrees.Walk({0xc0024f7da0?, 0x4?, 0x1e23419?}, 0xc0017f5500?)
        /root/project/hugo/hugolib/content_map.go:874 +0x5b
github.com/gohugoio/hugo/hugolib.(*pageMap).withEveryBundlePage(0xc000756ea0?, 0x0?)
        /root/project/hugo/hugolib/content_map_page.go:689 +0x47
github.com/gohugoio/hugo/hugolib.(*Site).preparePagesForRender(0x2c69280?, 0xc0?, 0x1e22b4a?)
        /root/project/hugo/hugolib/hugo_sites.go:852 +0x5e
github.com/gohugoio/hugo/hugolib.(*HugoSites).render(0xc001f46a80, 0xc002fa5590)
        /root/project/hugo/hugolib/hugo_sites_build.go:308 +0x597
github.com/gohugoio/hugo/hugolib.(*HugoSites).Build.func4()
        /root/project/hugo/hugolib/hugo_sites_build.go:147 +0x2a
runtime/trace.WithRegion({0x33e3cc0?, 0xc003389680?}, {0x2cdcbeb, 0x6}, 0xc0017f58a0)
        /usr/local/go/src/runtime/trace/annotation.go:141 +0xe3
github.com/gohugoio/hugo/hugolib.(*HugoSites).Build(0xc001f46a80, {0x0, {0x0, 0x0}, 0x0, 0x0, 0x0, 0x0, 0xc003389650, 0x0, ...}, ...)
        /root/project/hugo/hugolib/hugo_sites_build.go:149 +0x589
github.com/gohugoio/hugo/commands.(*commandeer).rebuildSites(0xc0002d5a40, {0xc00336f740?, 0x4?, 0x4?})
        /root/project/hugo/commands/hugo.go:765 +0x210
github.com/gohugoio/hugo/commands.(*commandeer).handleEvents.func4(0xc0002d5a40, 0xc0033e1098)
        /root/project/hugo/commands/hugo.go:1157 +0x96
github.com/gohugoio/hugo/commands.(*commandeer).handleEvents(0xc0002d5a40, 0xc0024b8300, 0xc0015332d8, {0xc00427e060?, 0x4, 0x4}, 0xc0033359e0?)
        /root/project/hugo/commands/hugo.go:1160 +0xbc5
github.com/gohugoio/hugo/commands.(*commandeer).newWatcher.func1()
        /root/project/hugo/commands/hugo.go:895 +0x26c
created by github.com/gohugoio/hugo/commands.(*commandeer).newWatcher
        /root/project/hugo/commands/hugo.go:886 +0x3ca
