Occasional Notebook RE* (New South Exit)

Soylent Green

October 22, 2021

2022

(The original novel apparently differs, but) so the film is set next year.

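In 2022, unchecked population growth has filled the streets with people who have lost food and shelter, and the world has become a deeply unequal society made up of a small privileged class and a great many poor. Real food such as meat and vegetables has become rarer and more expensive than jewels, and apart from the privileged few, nearly everyone barely survives on rations of synthetic food that the Soylent Corporation supposedly makes from ocean plankton.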

It's a little hard to read in this image, but you can see "2022" on the package too.

FWIW

While browsing the issues of the One True AWK, I came across this:

Fix #121 by awkfan77 · Pull Request #132 · onetrueawk/awk

I committed the same fix to OpenBSD some time ago FWIW.

An unfamiliar acronym.

So I looked it up.

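What FWIW (for what it's worth) means - Shimauma Yōgoshū (Zebra Glossary)

FWIW is short for "for what it's worth." It is an English abbreviation and internet slang meaning "I don't know whether this is useful (to you), but", "just to be safe, let me mention", or "this is just my opinion, but". It is one of the phrases added as a disclaimer or aside, signaling that the speaker can't tell whether the information is of value to the other person.

(used as is); not sure whether it helps, but; For What It's Worth; true or not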

I see.

operator precedence

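An article on MONOist, "Finding Bugs Through Testing! (4) Hunt Down the Bug in the Nostalgic 'Quadratic Formula' Program: Tsuneo Yamaura's 'Embedded' Talk (146) (page 3/4)", says:

In programming, division and multiplication have the same precedence*2). In that case, evaluation proceeds from left to right, so the numerator gets divided by 2 before (2 * a) is computed.
*2) This may differ depending on the programming language.

But is there actually a programming language in which multiplication and division have different precedence? (Well, I think the cautious wording is fine.)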

Order of operations - Wikipedia
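To make the quoted bug concrete, here is a minimal sketch using POSIX shell arithmetic (integer only), where, as in C, `*` and `/` share one precedence level and associate left to right. The values 12 for the numerator (-b + sqrt(D)) and 3 for `a` are arbitrary, picked so the divisions come out even.

```sh
#!/bin/sh
# Hypothetical example values: num stands for -b + sqrt(D), a for the coefficient.
num=12
a=3

# Buggy form: '/' and '*' have equal precedence and group left to right,
# so this computes (num / 2) * a rather than num / (2 * a).
buggy=$(( num / 2 * a ))

# Intended form: parenthesize the denominator explicitly.
fixed=$(( num / (2 * a) ))

echo "buggy=$buggy fixed=$fixed"
```

This prints `buggy=18 fixed=2`: the unparenthesized version divides by 2 and then multiplies by `a`.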

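化 ("-ize")

Sedgewick: Algorithms in C, Part 5: Graph Algorithms | R. Sedgewick

The world-renowned classic on graph algorithms has finally been 翻訳化 ("translation-ized")!

What is "翻訳化" supposed to mean? (翻訳 alone already means "translation", so tacking on 化 is odd.)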

tr

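Continuing from the previous post.

It said so, so I looked at several man pages, but I couldn't find anything like the description in that article. Where could it be from?

About this: searching for "SET1 と SET2 の両方で指定した場合には CHAR1-CHAR2 と同じ" ("same as CHAR1-CHAR2 when specified in both SET1 and SET2") turned up several hits.

CHAR1-CHAR2     CHAR1 から CHAR2 までを昇順に展開した文字列
[CHAR1-CHAR2]   SET1 と SET2 の両方で指定した場合には CHAR1-CHAR2 と同じ

(That is: "CHAR1-CHAR2: the characters from CHAR1 to CHAR2 in ascending order" and "[CHAR1-CHAR2]: same as CHAR1-CHAR2 when used in both SET1 and SET2".)

Hmm. Just as the article said (sorry for doubting it).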

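And then

coreutils 5.2.1

Besides man pages, the search turned up a few other things. The one that helped most in understanding the situation was this: Onlive-Source-Backup/ja.po at master · samdmarshall/Onlive-Source-Backup · GitHub

It is from a fairly old version, 5.2.1, but it appears to be (a copy or backup of) the coreutils translation resources. Looking inside: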

#: src/tr.c:351
msgid ""
"  \\v              vertical tab\n"
"  CHAR1-CHAR2     all characters from CHAR1 to CHAR2 in ascending order\n"
"  [CHAR*]         in SET2, copies of CHAR until length of SET1\n"
"  [CHAR*REPEAT]   REPEAT copies of CHAR, REPEAT octal if starting with 0\n"
"  [:alnum:]       all letters and digits\n"
"  [:alpha:]       all letters\n"
"  [:blank:]       all horizontal whitespace\n"
"  [:cntrl:]       all control characters\n"
"  [:digit:]       all digits\n"
msgstr ""
"  \\v              垂直タブ\n"
"  CHAR1-CHAR2     CHAR1 から CHAR2 までを昇順に展開した文字列\n"
"  [CHAR1-CHAR2]   SET1 と SET2 の両方で指定した場合には CHAR1-CHAR2 と同じ\n"
"  [CHAR*]         SET2 として, CHAR を SET1 の長さ分展開した文字列\n"
"  [CHAR*REPEAT]   CHAR を REPEAT 個展開した文字列, REPEAT の値を 0 から\n"
"                      始めた場合には, 8 進数として解釈\n"
"  [:alnum:]       全てのアルファベットと数字\n"
"  [:alpha:]       全てのアルファベット\n"
"  [:blank:]       全ての水平方向空白文字\n"
"  [:cntrl:]       全ての制御文字\n"
"  [:digit:]       全ての数字\n"

#: src/tr.c:362

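The English and Japanese strings disagree: the Japanese string contains the [CHAR1-CHAR2] line in question, but the English string does not. This is still the case in the latest coreutils (9.0). Note, however, that the git repository has no files under the po directory; you can check this in the tar.xz distribution archive instead.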
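A mismatch like this can be spotted mechanically by looking for entries whose msgstr contains a string the msgid lacks. Below is a rough sketch, assuming a plain gettext .po layout (quoted continuation lines, entries separated by blank lines, no plural forms or msgctxt); the function name `po_extra_in_msgstr` is hypothetical and the sample file is a trimmed excerpt modeled on the entry above.

```sh
#!/bin/sh
# Print the "#:" reference line of each entry whose msgstr contains PATTERN
# while its msgid does not. Simplified parser: no plurals, no msgctxt.
po_extra_in_msgstr() {
    pattern=$1 file=$2
    awk -v pat="$pattern" '
        function flush() {
            if (index(str, pat) && !index(id, pat)) print ref
            id = ""; str = ""; ref = ""; state = ""
        }
        /^#:/          { ref = $0; next }
        /^msgid /      { state = "id" }
        /^msgstr /     { state = "str" }
        /^[ \t]*$/     { flush(); next }
        /"/ {
            # Keep only the text between the first and last double quote.
            s = $0; sub(/^[^"]*"/, "", s); sub(/"[^"]*$/, "", s)
            if (state == "id") id = id s
            else if (state == "str") str = str s
        }
        END { flush() }
    ' "$file"
}

tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#: src/tr.c:351
msgid ""
"  CHAR1-CHAR2     all characters from CHAR1 to CHAR2 in ascending order\n"
msgstr ""
"  CHAR1-CHAR2     CHAR1 から CHAR2 までを昇順に展開した文字列\n"
"  [CHAR1-CHAR2]   SET1 と SET2 の両方で指定した場合には CHAR1-CHAR2 と同じ\n"

#: src/tr.c:362
msgid "ok"
msgstr "ok"
EOF

result=$(po_extra_in_msgstr '[CHAR1-CHAR2]' "$tmp")
echo "$result"
rm -f "$tmp"
```

`index()` does a literal substring search, so the brackets in the pattern need no escaping. Run against the sample, only `#: src/tr.c:351` is reported.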

textutils

Wondering how on earth this happened, I went back through older releases and found that it had crept in before the merge into coreutils, back in the textutils days.

Textutils - GNU Project - Free Software Foundation

Fileutils, Shellutils, and Textutils have been combined into the GNU Coreutils package. All further development and discussion is now taking place as Coreutils. The last separate versions were fileutils-4.1.11, textutils-2.1, and sh-utils-2.0.15. The first major release of coreutils-5.0 was announced on Fri, 4 April 2003.
Please refer to the new coreutils home page at http://www.gnu.org/software/coreutils/ for more information.

The regular GNU FTP site no longer carries the textutils archives (so the mirrors don't either), but they can be found at Index of /old-gnu/textutils. The textutils archives in that directory are:

textutils-1.19.tar.gz	1996-07-12 03:00
textutils-1.20.tar.gz	1996-12-12 03:00
textutils-1.21.tar.gz	1997-01-09 03:00
textutils-1.22.tar.gz	1997-01-27 03:00
textutils-2.0.tar.gz	1999-08-07 15:05
textutils-2.1.tar.bz2	2002-07-31 02:53
textutils-2.1.tar.gz	2002-07-31 02:53

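I haven't checked the contents of every one of them, but this is what I found:

2.0
- no ja.po
- the description in question is present in the source code
2.1
- ja.po present
- the description is gone from the source code, and at this point the English and Japanese versions already disagree
  - the English msgid does not have it
  - the Japanese msgstr does

Here is a diff of tr.c between 2.0 and 2.1 (with parts unrelated to this topic removed):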

--- textutils-2.0/src/tr.c	1999-04-04 04:46:19.000000000 +0900
+++ textutils-2.1/src/tr.c	2002-07-02 14:15:06.000000000 +0900
@@ -1,5 +1,5 @@
 /* tr -- a filter to translate characters
-   Copyright (C) 91, 1995-1998, 1999 Free Software Foundation, Inc.
+   Copyright (C) 91, 1995-2002 Free Software Foundation, Inc.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -350,9 +354,10 @@ Interpreted sequences are:\n\
   \\n              new line\n\
   \\r              return\n\
   \\t              horizontal tab\n\
+"), stdout);
+     fputs (_("\
   \\v              vertical tab\n\
   CHAR1-CHAR2     all characters from CHAR1 to CHAR2 in ascending order\n\
-  [CHAR1-CHAR2]   same as CHAR1-CHAR2, if both SET1 and SET2 use this\n\
   [CHAR*]         in SET2, copies of CHAR until length of SET1\n\
   [CHAR*REPEAT]   REPEAT copies of CHAR, REPEAT octal if starting with 0\n\
   [:alnum:]       all letters and digits\n\
@@ -360,6 +365,8 @@ Interpreted sequences are:\n\

The line in question is cleanly gone as of 2.1. So I looked at the ChangeLog:

changelog

2000-08-10  Paul Eggert  <eggert@twinsun.com>

* doc/textutils.texi: Recommend against the System V syntax
for tr ranges, and don't use it in examples.  Use POSIX
classes rather than ranges, for portability.
* src/tr.c (usage): Don't describe System V syntax, as it
doesn't always work.

* src/sort.c (usage): Describe -d and -i in a locale-independent way.

* doc/Makefile.am (constants.texi): Use the C locale so that
[A-Z] works as expected.

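This looks like the relevant change. So it is a relic from the last century. Still, how did it end up like this?

While I was at it, I also compared the tr section of the Info manual (more precisely, the texinfo source it is generated from) in each version: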

2.0

@item Ranges
@cindex ranges

The notation @samp{@var{m}-@var{n}} expands to all of the characters
from @var{m} through @var{n}, in ascending order.  @var{m} should
collate before @var{n}; if it doesn't, an error results.  As an example,
@samp{0-9} is the same as @samp{0123456789}.  Although GNU @code{tr}
does not support the System V syntax that uses square brackets to
enclose ranges, translations specified in that format will still work as
long as the brackets in @var{string1} correspond to identical brackets
in @var{string2}.

2.1

@item Ranges
@cindex ranges

The notation @samp{@var{m}-@var{n}} expands to all of the characters
from @var{m} through @var{n}, in ascending order.  @var{m} should
collate before @var{n}; if it doesn't, an error results.  As an example,
@samp{0-9} is the same as @samp{0123456789}.

@sc{gnu} @command{tr} does not support the System V syntax that uses square
brackets to enclose ranges.  Translations specified in that format
sometimes work as expected, since the brackets are often transliterated
to themselves.  However, they should be avoided because they sometimes
behave unexpectedly.  For example, @samp{tr -d '[0-9]'} deletes brackets
as well as digits.

Many historically common and even accepted uses of ranges are not
portable.  For example, on @acronym{EBCDIC} hosts using the @samp{A-Z}
range will not do what most would expect because @samp{A} through @samp{Z}
are not contiguous as they are in @acronym{ASCII}.
If you can rely on a @acronym{POSIX} compliant version of @command{tr}, then
the best way to work around this is to use character classes (see below).
Otherwise, it is most portable (and most ugly) to enumerate the members
of the ranges.

The text got longer, but it does match "Don't describe System V syntax, as it doesn't always work."
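The behavior the 2.1 text describes is easy to reproduce. A small sketch, assuming GNU tr in the C locale (other tr implementations may treat the brackets differently): with matching brackets in both sets, the brackets translate to themselves, so the System V form appears to work; with `-d` there is no second set, so the brackets are simply deleted along with the digits; a plain range or a POSIX class avoids the problem.

```sh
#!/bin/sh
export LC_ALL=C

# '[' and ']' in SET1 line up with '[' and ']' in SET2, so they map to
# themselves and the System V style range seems to work.
upper=$(printf 'a[b]c\n' | tr '[a-z]' '[A-Z]')

# With -d, '[' and ']' are ordinary members of SET1 and are deleted too.
deleted=$(printf 'x[1]y\n' | tr -d '[0-9]')

# Portable alternatives: a bare range or a POSIX character class.
plain=$(printf 'x[1]y\n' | tr -d '0-9')
classed=$(printf 'x[1]y\n' | tr -d '[:digit:]')

printf '%s\n' "$upper" "$deleted" "$plain" "$classed"
```

Under GNU tr this prints `A[B]C`, `xy`, `x[]y` and `x[]y`, which is exactly the `tr -d '[0-9]'` pitfall the manual warns about.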

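As for the man pages: as far as I can tell, the coreutils (and textutils) man pages are generated from the output of running each command with the --help option. That is why the English man page doesn't have the description (it was removed long ago), while the Japanese man page, where it somehow survived, still does.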
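To check this on a given machine, one can grep the `--help` output of whatever `tr` is on PATH under different locales. A minimal sketch; the locale names are assumptions, and which ones exist (and whether the Japanese catalog is installed) depends on the system. If a locale is not available, the program typically stays in the C locale and prints the untranslated English text, so that case simply reports "absent".

```sh
#!/bin/sh
# Report whether `tr --help` mentions "[CHAR1-CHAR2]" under locale $1.
help_mentions_bracket_range() {
    if LC_ALL=$1 tr --help 2>/dev/null | grep -qF '[CHAR1-CHAR2]'; then
        echo "$1: present"
    else
        echo "$1: absent"
    fi
}

for loc in C ja_JP.UTF-8; do
    help_mentions_bracket_range "$loc"
done
```

On a setup like the Ubuntu one below, the Japanese locale would presumably report "present" and C "absent".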

Here are the results of trying it on Ubuntu 20.04 (under WSL).

First, English (C.UTF8):

kbk@toybox4:/mnt/c/Users/kbk$ tr --help
Usage: tr [OPTION]... SET1 [SET2]
Translate, squeeze, and/or delete characters from standard input,
writing to standard output.

  -c, -C, --complement    use the complement of SET1
  -d, --delete            delete characters in SET1, do not translate
  -s, --squeeze-repeats   replace each sequence of a repeated character
                            that is listed in the last specified SET,
                            with a single occurrence of that character
  -t, --truncate-set1     first truncate SET1 to length of SET2
      --help     display this help and exit
      --version  output version information and exit

SETs are specified as strings of characters.  Most represent themselves.
Interpreted sequences are:

  \NNN            character with octal value NNN (1 to 3 octal digits)
  \\              backslash
  \a              audible BEL
  \b              backspace
  \f              form feed
  \n              new line
  \r              return
  \t              horizontal tab
  \v              vertical tab
  CHAR1-CHAR2     all characters from CHAR1 to CHAR2 in ascending order
  [CHAR*]         in SET2, copies of CHAR until length of SET1
  [CHAR*REPEAT]   REPEAT copies of CHAR, REPEAT octal if starting with 0
  [:alnum:]       all letters and digits
  [:alpha:]       all letters
  [:blank:]       all horizontal whitespace
  [:cntrl:]       all control characters
  [:digit:]       all digits
  [:graph:]       all printable characters, not including space
  [:lower:]       all lower case letters
  [:print:]       all printable characters, including space
  [:punct:]       all punctuation characters
  [:space:]       all horizontal or vertical whitespace
  [:upper:]       all upper case letters
  [:xdigit:]      all hexadecimal digits
  [=CHAR=]        all characters which are equivalent to CHAR

Translation occurs if -d is not given and both SET1 and SET2 appear.
-t may be used only when translating.  SET2 is extended to length of
SET1 by repeating its last character as necessary.  Excess characters
of SET2 are ignored.  Only [:lower:] and [:upper:] are guaranteed to
expand in ascending order; used in SET2 while translating, they may
only be used in pairs to specify case conversion.  -s uses the last
specified SET, and occurs after translation or deletion.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report tr translation bugs to <http://translationproject.org/team/>
Full documentation at: <http://www.gnu.org/software/coreutils/tr>
or available locally via: info '(coreutils) tr invocation'

Next, Japanese (ja_JP.UTF8):

kbk@toybox4:/mnt/c/Users/kbk$ LC_ALL=ja_JP.UTF8 tr --help
使用法: tr [OPTION]... SET1 [SET2]
Translate, squeeze, and/or delete characters from standard input,
writing to standard output.

  -c, -C, --complement    use the complement of SET1
  -d, --delete            delete characters in SET1, do not translate
  -s, --squeeze-repeats   replace each sequence of a repeated character
                            that is listed in the last specified SET,
                            with a single occurrence of that character
  -t, --truncate-set1     first truncate SET1 to length of SET2
      --help     この使い方を表示して終了する
      --version  バージョン情報を表示して終了する

SET は文字列によって指定します。多くの場合その文字自身を表現します。
解釈のされ方は以下の通りです:

  \NNN            文字の八進数表現(1 から 3 個の 八進数)
  \\              バックスラッシュ
  \a              ベル
  \b              バックスペース
  \f              フォームフィード
  \n              改行
  \r              復帰
  \t              水平タブ
  \v              垂直タブ
  CHAR1-CHAR2     CHAR1 から CHAR2 までを昇順に展開した文字列
  [CHAR1-CHAR2]   SET1 と SET2 の両方で指定した場合には CHAR1-CHAR2 と同じ
  [CHAR*]         SET2 として, CHAR を SET1 の長さ分展開した文字列
  [CHAR*REPEAT]   CHAR を REPEAT 個展開した文字列, REPEAT の値を 0 から
                      始めた場合には八進数として解釈する
  [:alnum:]       全てのアルファベットと数字
  [:alpha:]       全てのアルファベット
  [:blank:]       全ての水平方向空白類文字
  [:cntrl:]       全ての制御文字
  [:digit:]       全ての数字
  [:graph:]       全ての表示可能文字。空白は含まない
  [:lower:]       全ての小文字アルファベット
  [:print:]       全ての表示可能文字。空白も含む
  [:punct:]       全ての句読点
  [:space:]       全ての水平及び垂直タブ文字
  [:upper:]       全ての大文字アルファベット
  [:xdigit:]      全ての十六進数数値
  [=CHAR=]        全ての CHAR と等価な文字

Translation occurs if -d is not given and both SET1 and SET2 appear.
-t may be used only when translating.  SET2 is extended to length of
SET1 by repeating its last character as necessary.  Excess characters
of SET2 are ignored.  Only [:lower:] and [:upper:] are guaranteed to
expand in ascending order; used in SET2 while translating, they may
only be used in pairs to specify case conversion.  -s uses the last
specified SET, and occurs after translation or deletion.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
tr の翻訳に関するバグは <http://translationproject.org/team/ja.html> に連絡してください。
Full documentation at: <http://www.gnu.org/software/coreutils/tr>
or available locally via: info '(coreutils) tr invocation'

So,

I think this ought to be fixed, but where should I report it?

Somewhere around here?

bootstrap

Now, about the missing .po files under coreutils/po at master · coreutils/coreutils in the git repository: looking at coreutils/bootstrap at master · coreutils/coreutils, they are indeed maintained separately from coreutils and copied into the developer's local tree as needed (as with Gnulib).

coreutils/bootstrap at master · coreutils/coreutils

# Get translations.

download_po_files() {
  subdir=$1
  domain=$2
  echo "$me: getting translations into $subdir for $domain..."
  cmd=$(printf "$po_download_command_format" "$subdir" "$domain")
  eval "$cmd"
}

# Mirror .po files to $po_dir/.reference and copy only the new
# or modified ones into $po_dir.  Also update $po_dir/LINGUAS.
# Note po files that exist locally only are left in $po_dir but will
# not be included in LINGUAS and hence will not be distributed.
update_po_files() {
  # Directory containing primary .po files.
  # Overwrite them only when we're sure a .po file is new.
  po_dir=$1
  domain=$2

  # Mirror *.po files into this dir.
  # Usually contains *.s1 checksum files.
  ref_po_dir="$po_dir/.reference"

  test -d $ref_po_dir || mkdir $ref_po_dir || return
  download_po_files $ref_po_dir $domain \
    && ls "$ref_po_dir"/*.po 2>/dev/null |
      sed 's|.*/||; s|\.po$||' > "$po_dir/LINGUAS" || return

  langs=$(cd $ref_po_dir && echo *.po | sed 's/\.po//g')
  test "$langs" = '*' && langs=x
  for po in $langs; do
    case $po in x) continue;; esac
    new_po="$ref_po_dir/$po.po"
    cksum_file="$ref_po_dir/$po.s1"
    if ! test -f "$cksum_file" ||
        ! test -f "$po_dir/$po.po" ||
        ! $SHA1SUM -c "$cksum_file" < "$new_po" > /dev/null 2>&1; then
      echo "$me: updated $po_dir/$po.po..."
      cp "$new_po" "$po_dir/$po.po" \
          && $SHA1SUM < "$new_po" > "$cksum_file" || return
    fi
  done
}
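The core idea of update_po_files (derive LINGUAS from the mirrored file names, and copy a .po into the po directory only when it is new or its SHA-1 differs from the recorded one) can be tried offline. A trimmed sketch, assuming GNU `sha1sum` is what `$SHA1SUM` resolves to and using locally created fake .po files instead of a download; the function name `sync_po` is hypothetical and the loop is simplified from the original.

```sh
#!/bin/sh
set -eu
SHA1SUM=sha1sum   # assumption: stands in for bootstrap's $SHA1SUM

# Copy each reference .po into $1 only if it is new or its checksum changed.
sync_po() {
    po_dir=$1
    ref_po_dir="$po_dir/.reference"
    # LINGUAS: basenames of the mirrored .po files, minus the extension.
    ls "$ref_po_dir"/*.po 2>/dev/null |
        sed 's|.*/||; s|\.po$||' > "$po_dir/LINGUAS"
    for new_po in "$ref_po_dir"/*.po; do
        [ -f "$new_po" ] || continue
        po=$(basename "$new_po" .po)
        cksum_file="$ref_po_dir/$po.s1"
        # The .s1 file holds "HASH  -", so -c re-hashes stdin.
        if ! test -f "$cksum_file" ||
            ! test -f "$po_dir/$po.po" ||
            ! $SHA1SUM -c "$cksum_file" < "$new_po" > /dev/null 2>&1; then
            echo "updated $po_dir/$po.po"
            cp "$new_po" "$po_dir/$po.po"
            $SHA1SUM < "$new_po" > "$cksum_file"
        fi
    done
}

work=$(mktemp -d)
mkdir -p "$work/po/.reference"
printf 'msgid "a"\nmsgstr "A"\n' > "$work/po/.reference/ja.po"
printf 'msgid "a"\nmsgstr "a"\n' > "$work/po/.reference/de.po"

first=$(sync_po "$work/po")    # both files are new, so both are copied
second=$(sync_po "$work/po")   # nothing changed, so nothing is copied
printf 'msgid "a"\nmsgstr "B"\n' > "$work/po/.reference/ja.po"
third=$(sync_po "$work/po")    # only ja.po changed
linguas=$(cat "$work/po/LINGUAS")
rm -rf "$work"
printf '%s\n---\n%s\n---\n%s\n' "$first" "$third" "$linguas"
```

Keeping the checksum next to the mirrored copy is what lets a re-run skip files whose content did not change, leaving locally edited files in the po directory alone.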

The command that actually does the download, $po_download_command_format, looks like this:

coreutils/bootstrap at master · coreutils/coreutils

# The command to download all .po files for a specified domain into a
# specified directory.  Fill in the first %s with the destination
# directory and the second with the domain name.
po_download_command_format=\
"wget --mirror --level=1 -nd -nv -A.po -P '%s' \
 https://translationproject.org/latest/%s/"
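To see the command line this produces, the format can be expanded with printf the same way download_po_files does, printing the result instead of passing it to eval, so nothing is downloaded. The directory and domain arguments here are example values.

```sh
#!/bin/sh
# Same format string as in bootstrap: first %s = destination directory,
# second %s = translation domain.
po_download_command_format=\
"wget --mirror --level=1 -nd -nv -A.po -P '%s' \
 https://translationproject.org/latest/%s/"

# Example values; bootstrap passes "$po_dir/.reference" and the domain.
cmd=$(printf "$po_download_command_format" "po/.reference" "coreutils")

# Print instead of eval, so no network access happens.
echo "$cmd"
```

The output is a single wget invocation that mirrors every `.po` file under `https://translationproject.org/latest/coreutils/` into `po/.reference`.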

So this is where the downloads come from.

Welcome to the Translation Project

Maybe it's better to report the problem here rather than at the place above?

m

By the way, the file listing of coreutils.git - GNU coreutils contains this line:

m---------    gnulib @ dd0af10        0       log

What does the leading m stand for?

foreach

Previous:

foreach my $key (keys %hash) {
    my $val = $hash{$key};
    
    print "$key => $val\n";
}

New:

foreach my ($key, $val) (%hash) {
    print "$key => $val\n";
}

Couldn't you already do this? I wondered, and checked:

Translation of the Perl built-in function each - perldoc.jp

    while (my ($key, $value) = each %hash) {
        print $key, "\n";
        delete $hash{$key};   # This is safe
    }

Ah, right. The point is that there was no form you could iterate over with foreach.
