ときどきの雑記帖 RE* (新南口)

きみのせい

September 20, 2021

Shizue-san

For certain reasons I (probably) won't be able to see Shizue-san this week (or next).

p-code

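Long ago, MS-C (before it became Visual C) had an option to use p-code, mainly to make the generated code smaller. Is there any information about this p-code still around somewhere?

...and then I found this: pcode/PCODE.TXT at master · jrmuizel/pcode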

PCODE.TXT is the decoded version of the PCODE.hlp from Microsoft C/C++ 7.0

Programming explained with Music

I use Scala so I'm cool right?......right? https://t.co/fcbTigxEq4 pic.twitter.com/Z5GnWPVfmM
— Programmer Humor (@PR0GRAMMERHUM0R) September 17, 2021

The embedded tweet only shows the first four, but eight programming languages are featured in total. Here is the full list.

PHP
JavaScript
C#
css
Haskell
Swift
Java
Lisp

I can't tell whether each of the analogies is apt (especially the one for Lisp) 😄

The post the tweet refers to, I use Scala so I’m cool right?……right? : ProgrammerHumor, also covers languages beyond the ones above (though those didn't get illustrations).

gawk -v

I'm going to drag this topic (the -v option) out even further.

gawk.git - gawk

diff --git a/node.c b/node.c
index 54ea6627..4ad41ef1 100644
--- a/node.c
+++ b/node.c
@@ -451,6 +451,8 @@ make_str_node(const char *s, size_t len, int flags)
	if (c < 0) {
		if (do_lint)
			lintwarn(_("backslash at end of string"));
+		if ((flags & ELIDE_BACK_NL) != 0)
+			continue;
 		c = '\\';
 	}
 	*ptm++ = c;

Looking closely, the message that lintwarn prints in this commit differs from the one in 5.1.0. In 4.1.4 it is

lintwarn(_("backslash at end of string"));

and in 5.1.0 it is

lintwarn(_("backslash string continuation is not portable"));

So when did that change go in? It was this commit, dated 2018-08-02 18:20:53 +0300.

Add lint warning for escaped newlines.

gawk.git - gawk

diff --git a/node.c b/node.c
index 4ad41ef1..2f9e6d4c 100644
--- a/node.c
+++ b/node.c
@@ -450,7 +450,7 @@ make_str_node(const char *s, size_t len, int flags)
	c = parse_escape(&pf);
	if (c < 0) {
		if (do_lint)
-			lintwarn(_("backslash at end of string"));
+			lintwarn(_("backslash string continuation is not portable"));
		if ((flags & ELIDE_BACK_NL) != 0)
			continue;
		c = '\\';

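Judging from the commit message and the content of the fix, the current behavior for a \ at the end of a string looks unintended to me, but I can't know for sure without asking Arnold (the maintainer).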

kbk@toybox4:/mnt/c/Users/kbk$ gawk --lint -v bs='a\' 'BEGIN{print bs "" }'
gawk: warning: backslash at end of string
a\
kbk@toybox4:/mnt/c/Users/kbk$ gawk --lint  'BEGIN{bs="a\\"; print bs "" }'
a\
kbk@toybox4:/mnt/c/Users/kbk$ gawk --lint  'BEGIN{bs="a\"; print bs "" }'
gawk: cmd. line:1: BEGIN{bs="a\"; print bs "" }
gawk: cmd. line:1:                          ^ unterminated string
gawk: cmd. line:1: BEGIN{bs="a\"; print bs "" }
gawk: cmd. line:1:                          ^ syntax error

Hmm?

gensub is a gawk-specific extension

【一行野郎】コロン「:」区切りによる最短一致と最長一致の取り出し - Qiita

Split the string "1A:2B:3C:4D" on the colon ":".
To extract "1A", the shortest match:

$ echo '1A:2B:3C:4DF' | \
    awk '{ $0 = gensub(/[(.+):](.+)+/, "\\\\2", "G", $0); print $0 }'

Split the string "1A:2B:3C:4D" on the colon ":". To extract "1A:2B:3C", the longest match:

$ echo '1A:2B:3C:4DF' | \
    awk '{ $0 = gensub(/(.+):(.+)+/, "\\\\1", "G", $0); print $0 }'

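(The echo argument ending in 4DF is as in the original.) Calling gensub (9.1.3 String-Manipulation Functions - The GNU Awk User’s Guide) through gawk would be one thing, but using it under the name awk (even if the actual binary is the same) is surely out of bounds.

That aside, I also can't resist poking at the way it assigns to $0 only to print $0 right afterwards 😄

Without gensub, it might look something like this: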

kbk@toybox4:/mnt/c/Users/kbk$ echo 'aa:bb:cc:dd' | awk -F':' '{print $1}'
aa
kbk@toybox4:/mnt/c/Users/kbk$ echo 'aa:bb:cc:dd' | awk  -F':' -vOFS=':' '{print $1,$2,$3}'
aa:bb:cc

The second one is not very pretty, but the cut(1) version in the original article also hard-codes the number of fields.

Or perhaps like this.

kbk@toybox4:/mnt/c/Users/kbk$ echo 'aa:bb:cc:dd' | awk '{gsub(/:.+$/, "");print}'
aa
kbk@toybox4:/mnt/c/Users/kbk$ echo 'aa:bb:cc:dd' | awk '{gsub(/:[^:]+$/, "");print}'
aa:bb:cc
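
For comparison, the same two extractions can be sketched outside awk. This is only a minimal Python sketch and not something from the Qiita article; the helper names `shortest_prefix` and `longest_prefix` are made up for illustration. The "shortest match" is treated as everything before the first colon, and the "longest match" as everything before the last colon.

```python
def shortest_prefix(s, sep=":"):
    """Return everything before the first separator (the "shortest match")."""
    return s.split(sep, 1)[0]


def longest_prefix(s, sep=":"):
    """Return everything before the last separator (the "longest match")."""
    return s.rsplit(sep, 1)[0]


line = "aa:bb:cc:dd"
print(shortest_prefix(line))  # aa
print(longest_prefix(line))   # aa:bb:cc
```

Unlike the `-F':'` variant, neither helper needs to know the number of fields in advance.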

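Shell scripts

While reading Mukai-san's piece ↑, I noticed (well, learned) that he has apparently changed jobs: 転職した – Jun Mukai’s blog.

I wonder whether the paper-introduction podcast will ever resume…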

pascal

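"ソ能表"問題 - Qiita

In fact, old Pascal compilers had a similar problem. A comment could be delimited either by the pair (* and *) or by the pair { and }, but the bytes for { and } can also occur as the second byte of a Shift_JIS double-byte character, so anyone who wanted to write Japanese in comments had to use the former... or so the story goes.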
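
To make the collision concrete, here is a small Python sketch (my own illustration, not taken from the Qiita article) that encodes text as Shift_JIS and reports the characters whose second byte equals an ASCII brace. The function name `second_byte_collisions` and the sample string are assumptions made for the example.

```python
def second_byte_collisions(text, danger=b"{}"):
    """List (character, ASCII look-alike) pairs whose Shift_JIS second byte is in `danger`."""
    hits = []
    for ch in text:
        raw = ch.encode("shift_jis")
        # Only double-byte characters have a second byte that can collide.
        if len(raw) == 2 and raw[1] in danger:
            hits.append((ch, chr(raw[1])))
    return hits


# ボ is 0x83 0x7B and マ is 0x83 0x7D, so both trip a byte-oriented { } scanner.
print(second_byte_collisions("ボタンとマウス"))  # [('ボ', '{'), ('マ', '}')]

# The classic "ソ" case is the same effect with the backslash byte 0x5C.
print(second_byte_collisions("ソ", danger=b"\\"))  # [('ソ', '\\')]
```

A compiler that scans comments byte by byte would treat the second byte of マ as a closing } and end the comment early.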

Translation

1. Japanese only version

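2021-09-02 (rev10): incorporated the mistranslation fixes and improvements pointed out by Shiro Kawai

Seeing that, I got curious about what had been pointed out, so I looked at the diff.

The diffs below are from Revisions - 1. Japanese only version · GitHub, and the original text is from http://call-with-current-continuation.org/articles/forth.txt.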

@@ -20,8 +24,9 @@ Forthは、一人の人間や少人数のプログラマのグループに合わ
  Forthの一つの見方に過ぎません。
 
  私の認識では、Forthプログラマには2つの考え方の流派が存在します。一方は、極端
- なシンプルさ、ミニマリズム、短い定義、可能な限り金属的であることを好む「古典派」
- です。もう一方は、他の言語に見られる機能で古典的なForthモデルを拡張し、一般的に
+ なシンプルさ、ミニマリズム、短い定義、そして可能な限りハードウェアに近いところ
+ まで降りていくことを好む「古典派」です。
+ もう一方は、他の言語に見られる機能で古典的なForthモデルを拡張し、一般的に
  機能やツールを追加することで、よりコンピューティングの主流にアクセスしやすい言語
  にしようと努力する「現代派」です。後者は、概してスタック、逆ポーランド記法、重い
  リファクタリングといったアイデアを、一般的に現代のプログラミング言語に不可欠で

As I perceive it, there exist two schools of thought among Forth
programmers: the "classical" school that prefers extreme simplicity,
minimalism, short definitions and being down to the metal as much
as possible, and on the other hand the "modern" school that extends
the classic Forth model with features found in other languages and
which generally strives for making the language more accessible to the
computing mainstream by adding features, tools and generally attempts to
mix the ideas of stacks, reverse polish notation and heavy refactoring
with that what is commonly thought as both essential and desirable for
contemporary programming languages, but what just ends up being some
sort of Lisp with a funny notation.

@@ -52,7 +57,7 @@ Forthはボキャブラリーを使った非常に強力な名前空間システ
 
  Forthは通常、対話的に動作します。基本的な read-eval-print-loop は非常にシンプル
  なので、すべての機能を対話的な方法でテストできます。Forthを実装する際に最初に行う
- ことの一つは、トップレベルインタプリタを育てることです。デバイスドライバを対話的に
+ ことの一つは、トップレベルインタプリタを動くようにすることです。デバイスドライバを対話的に
  デバッグできることを想像してみてください(それも、高レベル言語でデバッグできるように
  なる前に膨大な量のコードがすでに存在して動作しなければならないLispマシンの前に
  座らずに)。組込みシステムのプログラミング、テスト、調査がどれだけ簡単にできるかを

Forth is usually interactive. Since the basic read-eval-print-loop is so
simple, you are able to test all functionality in an interactive manner,
as one of the first things you do is when implementing a forth is bringing
up the toplevel interpreter.  Imagine being able to debug device drivers
interactively (and not sitting in front of a Lisp machine where already
huge amounts of code must exist and work before you are even able to do
so in a high-level language) and you get a glimpse of how much easier
embedded systems programming, testing and epxloration can be.

@@ -70,8 +75,9 @@ Forthにはメモリしかありません。メモリ内のストレージのワ
  聞こえるかもしれませんが、メモリレイアウトのあらゆる側面を完全に制御することが
  できます。あなたとマシンのアーキテクチャの間には人工的な障壁はありませんし、
 フォン・ノイマン・マシンに乗っていないと思わせようとする抽象化もありません。
- なぜなら、あなたが何であるかを認めたものが、最終的に真の、公開されたマシン
- リソースへのアクセスになるからです。これは良いことで、すべてのC言語プログラマは
+ 実際、フォン・ノイマン・マシンの上に乗っているのですから。
+ そして、そう認めることで、最終的に真の公開されたマシンリソースへのアクセスが
+ 手に入るのです。これは良いことで、すべてのC言語プログラマは
 このレベルのアクセスが可能(そしてそれを必要としている)ですが、Forthプログラマは、
 コンパイラの透過性の無い操作や、不明確な構造レイアウトや定義されていない動作の
 ような、邪魔になる脆い抽象化には驚かされないでしょう。言語標準に反してプログラム

In Forth there is only memory - word and byte-sized cells of storage
in memory, and stacks. You step down on the level of assembly language
which may sound daunting, yet gives you full control over every aspect
of memory layout. There are no artificial barriers between you and the
machine's architecture, no abstractions that try to make you believe
you are not on a von Neumann machine, because that's what you are and
acknowleding this gives you in the end true, unveiled access to the
machine's resources. This is good, every C programmer has this level of
access (and needs it), but the Forth programmer will not be surprised
by intransparent machinations of the compiler and brittle abstraction
that just get in the way, like unclear structure layout or undefined
behaviour.

@@ -92,11 +98,10 @@ Forthには「yak-shaving」はありません: 自分とゴールとの間に
 約束する人は誰でも信じますが、それはすべて失敗しています。マシンから離れすぎて、
 言語、コンパイラ、または「パラダイム」を喜ばせるためだけに何かをしてしまうからです。
 
- 今日ではパフォーマンスの側面で一部の人はさらに先を行っており、最近の複雑な言語は
- 基本的に解釈実行される言語ではサポートできないような重度の最適化を提供しています
- (とはいえ、Forthインタプリタは非常に低レベルでなので、コンパイルされたコードと
- 解釈されたコードの境界線は非常に薄くなっており、多くのCPUにおいてインタプリタは
- 2つまたは3つのマシン命令で構成されています)。これは技術的には正しいのですが、
+ ここで性能の話を持ち出す人がいることでしょう。モダンで複雑な言語は単純なインタプリタ型
+ 言語にはできないヘビーな最適化を提供しているのだと(とはいえ、Forthインタプリタは非常に
+ 低レベルで、コンパイルされたコードと解釈されるコードの境界は非常に薄く、多くのCPUで
+ インタプリタは2つか3つのマシン命令で構成されていますが)。これは技術的には正しいのですが、
  このような重い最適化作業が必要かどうかは議論の余地があります。あなたのコードの
  ほとんどは、最近作り出されているブロートウェア(bloatware)の中で実行されることは
  めったに(あるいは決して)ないでしょうが、実行時パフォーマンスの調査は、やや宗教的な

Now some may bring forward the performance side of things, that more
modern and more complex languages provide heavy duty optimizations that
a language that is basically interpreted can not support (albeit the
Forth interpreter is so low-level that the boundary between compiled code
and interpreted code becomes very thin, and on many CPUs the interpreter
consists of two or three machine instructions). This is technically true,
but whether you need all this heavy optimization work is an open question:
most of your code runs seldom (or never), particularly in the bloatware
that is produced nowadays, and the search for runtime performance has
taken on a somewhat religious meaning.

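I haven't read the whole original, but the sentences are long and look hard to translate.

How to split a long original sentence into several shorter ones, which Japanese word to choose for a given English word, and so on: all of it was very instructive (whether I can put it to use is another matter).

Also, while writing bloatware in katakana is probably unavoidable, it might have been worth adding a footnote to explain it (says a bystander).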

glob zsh 13

Last time I wrote

So next, instead of continuing from here, let's look inside scanner.

but this function turned out to be fairly long (and recursive, on top of that) 😄

zsh/glob.c at master · zsh-users/zsh

/* Do the globbing:  scanner is called recursively *
 * with successive bits of the path until we've    *
 * tried all of it.                                */

/**/
static void
scanner(Complist q, int shortcircuit)
{
    Patprog p;
    int closure;
    int pbcwdsav = pathbufcwd;
    int errssofar = errsfound;
    struct dirsav ds;

    if (!q || errflag)
        return;
    init_dirsav(&ds);

    if ((closure = q->closure)) {
        /* (foo/)# - match zero or more dirs */
        if (q->closure == 2)        /* (foo/)## - match one or more dirs */
            q->closure = 1;
        else {
            scanner(q->next, shortcircuit);
            if (shortcircuit && shortcircuit == matchct)
                return;
        }
    }
    p = q->pat;

Skipping the closure business and moving on.

    /* Now the actual matching for the current path section. */
    if (p->flags & PAT_PURES) {
        /*
         * It's a straight string to the end of the path section.
         */
        char *str = (char *)p + p->startoff;

This if (and its matching else) is fairly long.

    /* Now the actual matching for the current path section. */
    if (p->flags & PAT_PURES) {
  (omitted)
    } else {
        /* Do pattern matching on current path section. */
        char *fn = pathbuf[pathbufcwd] ? unmeta(pathbuf + pathbufcwd) : ".";
        int dirs = !!q->next;
        DIR *lock = opendir(fn);
        char *subdirs = NULL;
        int subdirlen = 0;

        if (lock == NULL)
            return;
        while ((fn = zreaddir(lock, 1)) && !errflag) {
            /* prefix and suffix are zle trickery */
  (omitted)
    }
}

The else branch looks like the main act, so I'll look at that one first.
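
To keep the overall shape in mind while reading, here is a rough Python sketch of a scanner-like recursion: one path section is consumed per call, a plain-string section is simply appended, and a pattern section is matched against the directory entries. This is my own simplified model and not zsh's algorithm: it ignores closures, glob qualifiers and everything else in glob.c. The names `scan`, `sections` and `limit` are hypothetical; `limit` only loosely mirrors the way `shortcircuit` is compared against `matchct`.

```python
import fnmatch
import os
import tempfile


def scan(base, sections, matches, limit=0):
    """Consume one path section per call, recursing for the remainder."""
    if limit and len(matches) >= limit:
        return
    if not sections:
        matches.append(base)
        return
    head, rest = sections[0], sections[1:]
    if not any(ch in head for ch in "*?["):
        # Plain string section: no directory read needed, just check the name.
        candidate = os.path.join(base, head)
        if os.path.lexists(candidate):
            scan(candidate, rest, matches, limit)
        return
    try:
        names = sorted(os.listdir(base))
    except OSError:
        return  # unreadable or non-directory: give up on this branch
    for name in names:
        if name.startswith(".") and not head.startswith("."):
            continue  # hide dotfiles unless the pattern asks for them
        if not fnmatch.fnmatchcase(name, head):
            continue
        candidate = os.path.join(base, name)
        if rest and not os.path.isdir(candidate):
            continue  # more sections follow, so this entry must be a directory
        scan(candidate, rest, matches, limit)
        if limit and len(matches) >= limit:
            return  # stop early once enough matches were collected


with tempfile.TemporaryDirectory() as root:
    for rel in ("src/a.c", "src/b.c", "src/c.h", "doc/x.c"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    found = []
    scan(root, ["src", "*.c"], found)
    all_c = [os.path.relpath(p, root).replace(os.sep, "/") for p in found]

    first_only = []
    scan(root, ["*", "*.c"], first_only, limit=1)
    first_c = [os.path.relpath(p, root).replace(os.sep, "/") for p in first_only]

print(all_c)    # ['src/a.c', 'src/b.c']
print(first_c)  # ['doc/x.c']
```

The second call shows the early exit: once the match count reaches the limit, every level of the recursion returns without reading further directories.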

Incidentally, about matchct, which the caller of scanner also refers to: scanner itself only reads it, and the value is actually updated inside insert, a function that scanner calls.

Where scanner refers to matchct

Where insert is called

zsh/glob.c at 00d20ed15e18f5af682f0daec140d6b8383c479a · zsh-users/zsh

/* This may be set by qualifier functions to an array of strings to insert
 * into the list instead of the original string. */

static char **inserts;

/* add a match to the list */

/**/
static void
insert(char *s, int checked)
{

As it happens, insert is also fairly long, so how should I proceed from here?

When you write C, you don't worry that there's no map or no reduce, do you?
Just write Go in that same spirit (just saying)
— Shinya Kato (@0x19f) September 17, 2021
