ときどきの雑記帖 RE* (新南口)

タタール人の砂漠

June 14, 2022

216 = 65536

屈辱の数学史 A Comedy of math errors

という数学的読み物の本を読んでいたら

216 = 65536

という式が出てきた(p.183)。

なんじゃこりゃと一瞬思ったが

2¹⁶ = 65536

の誤植のようだ。縦書きの本なので(この式も縦書きの文に含まれていた)気がつきにくいのかとも思うけど…ねえ?

しかし例によって直訳は難しいとはいえなんで邦題がこうなったんたんだろう?

Windows termial

wsl に ubuntu 20.04を入れたのだけど以前から入れていた18.04も残ってしまい、どうしたもんかと思っていたのだけど「アプリと機能」から探してアンインストールすれば良いだけだった。

ただそれだけだと wslの起動キーボードショートカットが生きたままなので、 wslのjsonファイルから

    "copyFormatting": "none",
    "copyOnSelect": false,
    "defaultProfile": "{61c54bbd-c2c6-5271-96e7-009a87ff44bf}",
    "profiles": 
    {
        "defaults": {},
        "list": 
        [
            {

という辺りに

                "name": "Ubuntu-20.04",
                "source": "Windows.Terminal.Wsl"

のような記述があるので(これは20.04のものだけど) 18.04に対する部分を削除しておしまい。ディスクが2GiBちょい空いた。

IBM 704 もしくはWikipedia

うぃきぺでIBM 704の項目を読んでいたら以下のような文章に遭遇した。

日本語版 IBM 704 - Wikipedia

デクリメントレジスタとはインデックスレジスタであり、ベースアドレスからその値を引いて実効アドレスとするため、このように呼ばれた。1命令で3つのデクリメントレジスタ全てを関与させることもできる。命令にある3ビットのタグフィールドは各ビットがそれぞれデクリメントレジスタに対応しているので、複数をONにすれば複数のデクリメントレジスタをアドレス計算に使用できる。しかし、複数のインデックス・レジスタが選択された場合、デクリメントが行われる前に、それらの内容は加算されるのではなく、一緒に加算される。

それらの内容は加算されるのではなく、一緒に加算される。 という部分にどういうこっちゃ? と理解に苦しんだので英語版にあたってみると…

英語版

The contents of the index registers are subtracted from the base address, so the index registers are also called “decrement registers”. All three index registers can participate in an instruction: the 3-bit tag field in the instruction is a bit map specifying which of the registers participate in the operation. However, when more than one index register is selected, then their contents are ORed – not added – together before the decrement takes place.

あー。「OR」が「加算」に化けちゃったのか。

コードレビュー

Code Review: How to make enemies | RepoHealth

最初のやつで笑った

Step 1: Code Style comments

petzolt

プログラミングWindowsの新しいのはもう出ないどうこうという話を以前書いたことがあるけど本を書くこと自体を止めたわけではなかったっぽい。＞Charles Petzold

(3.1の、すごい値段がついてるな)

前のCODEも読んだ覚えがあるのだけどもう20年前(日本語版が2003年刊行)なのか。ひー。

ksh

例の話題をまだ引っ張ってみる😄

今回はkshの算術式の解析の辺り。 A_POWといういかにもな名前があるのでそれでソースコードを検索してみると

ast/streval.c at cc1bca276ca82e6d9687af77531420a3a779a5a8 · att/ast

if(peekchr(vp)== '=' && !(strval_precedence[op]&NOASSIGN))
{
	if((!lvalue.value || precedence > 3))
		ERROR(vp,e_notlvalue);
	if(precedence==3)
		precedence = 2;
	assignop = lvalue;
	getchr(vp);
	c = 3;
}
else
{
	c = (strval_precedence[op]&PRECMASK);
	if(c==MAXPREC || op==A_POW)
		c++;
	c *= 2;
}

や

ast/streval.c at cc1bca276ca82e6d9687af77531420a3a779a5a8 · att/ast

case A_POW:
    num = pow(sp[-1],num);
    break;

といったものが見つかる。前者はともかく後者が実際にべき乗を計算しているところだろう。

単項マイナスについてもA_UMINUS というものがあるので

#define A_POP   	41
#define A_SWAP  	42
#define A_UMINUS	43
#define A_JMPZ  	44
#define A_JMPNZ		45

同様に検索すると

ast/streval.c at cc1bca276ca82e6d9687af77531420a3a779a5a8 · att/ast

case A_MINUS:
    op =  A_UMINUS;
    goto common;

というのが見つかる。この部分だけではよくわからないけど (なにか「すり替えている」らしいとはわかる)、これを含む関数を見ると

ast/streval.c at cc1bca276ca82e6d9687af77531420a3a779a5a8 · att/ast

static int expr(register struct vars *vp,register int precedence)
{
        register int    c, op;
        int             invalid,wasop=0;
        struct lval     lvalue,assignop;
        const char      *pos;
        Sfdouble_t      d;

        lvalue.value = 0;
        lvalue.nargs = 0;
        lvalue.fun = 0;
        lvalue.shp =  vp->shp;
again:
        op = gettok(vp);
        c = 2*MAXPREC+1;
        switch(op)
        {
            case A_PLUS:
                goto again;
            case A_EOF:
                if(precedence>2)
                        ERROR(vp,e_moretokens);
                return(1);
            case A_MINUS:
                op =  A_UMINUS;
                goto common;
            case A_NOT:
                goto common;
            case A_MINUSMINUS:
                c = A_LVALUE;
                op = A_DECR|T_NOFLOAT;
                goto common;
            case A_PLUSPLUS:
                c = A_LVALUE;
                op = A_INCR|T_NOFLOAT;
                /* FALL THRU */
            case A_TILDE:
                op |= T_NOFLOAT;
            common:
                if(!expr(vp,c))
                        return(0);
                stakputc(op);
                break;
            default:
                vp->nextchr = vp->errchr;
                wasop = 1;
        }
        invalid = wasop;
        while(1)
        {
                assignop.value = 0;
                op = gettok(vp);
                if(op==A_DIG || op==A_REG || op==A_LIT)
                {
                        if(!wasop)
                                ERROR(vp,e_synbad);
                        goto number;
                }
                if(wasop++ && op!=A_LPAR)
                        ERROR(vp,e_synbad);
                /* check for assignment operation */
                if(peekchr(vp)== '=' && !(strval_precedence[op]&NOASSIGN))
                {
                        if((!lvalue.value || precedence > 3))
                                ERROR(vp,e_notlvalue);
                        if(precedence==3)
                                precedence = 2;
                        assignop = lvalue;
                        getchr(vp);
                        c = 3;
                }
                else
                {
                        c = (strval_precedence[op]&PRECMASK);
                        if(c==MAXPREC || op==A_POW)
                                c++;
                        c *= 2;
                }
                /* from here on c is the new precedence level */

「式」の先頭に-が現れるとそれをA_UMINUSとして

case A_MINUS:
    op =  A_UMINUS;
    goto common;

スタックに放り込むと思ったら先に expr(自分自身)を呼び出していた。

common:
    if(!expr(vp,c))
            return(0);
    stakputc(op);
    break;

-に後続する「式」を(再帰呼び出しした先で)スタックに積んだ後で先ほどのA_UMINUSをスタックに積んでいるわけだけどここのexprの呼び出しで渡している第二引数は関数の先頭の方でc = 2*MAXPREC+1;として設定していたもの。

MAXPRECはstreval.hで15として定義されていて、一方A_POWの優先順位はstrdata.cで

/* POW */   14|NOASSIGN|RASSOC,

14とされている。ということで

c = (strval_precedence[op]&PRECMASK);
if(c==MAXPREC || op==A_POW)
        c++;
c *= 2;

という操作はあるのだけど

if(precedence >= c)
        goto done;

このifの条件が真になる(31 > 30)ので、単項マイナス数値リテラルべき乗演算子 … とあったときには単項マイナスを優先して解釈される。と。

$0=$2

onetrueawk のリポジトリ追いかけてたらこんなissueがあった。

$0=$2 “glitch” · Issue #147 · onetrueawk/awk

$ git branch 
* master
$ git log -n1 --oneline 
2402014 (HEAD -> master, origin/staging, origin/master, origin/HEAD) Eliminate file management memory leak
$ echo 1: 2 | ./a.out 'NF==2 && $0=$2'
$ echo 1: 2 | ./a.out 'NF==2 && $0=$2 "" '
2

ちょっとわかりづらいのだけどコードを追いかけると

diff --git a/run.c b/run.c
index df616fc..277e5c7 100644
--- a/run.c
+++ b/run.c
@@ -1133,8 +1133,9 @@ Cell *assign(Node **a, int n)	/* a[0] = a[1], a[0] += a[1], etc. */
 		if (x == y && !(x->tval & (FLD|REC)) && x != nfloc)
 			;	/* self-assignment: leave alone unless it's a field or NF */
 		else if ((y->tval & (STR|NUM)) == (STR|NUM)) {
+			yf = getfval(y);
 			setsval(x, getsval(y));
-			x->fval = getfval(y);
+			x->fval = yf;
 			x->tval |= NUM;
 		}
 		else if (isstr(y))

getfvalがyで指定されたフィールドの(数値としての)値を取得する関数なんだけど、 x->fval = getfval(y);で取得する値がその直前で呼び出しているsetsval(x, getsval(y));でレコード($0)の再構築をされてしまうので意図したものと違う値になってしまう(場合がある)と。

副作用こわい。というやつですか(多分違う)。

swift

Swiftの正規表現、PCRE2と鬼車とICUの正規表現と.NETの正規表現のスーパーセットを受け付けるとか恐しいことが書いてある。https://t.co/5zks3zPeL8
— ると (@cocoa_ruto) June 12, 2022

え、どういうこと? と思いリンク先に飛ぶ。色々書かれているけどこの辺かしら。

swift-evolution/0355-regex-syntax-run-time-construction.md at main · apple/swift-evolution

Syntax

We propose accepting a syntactic “superset” of the following existing regular expression engines:

PCRE 2, an “industry standard” and a rough superset of Perl, Python, etc.

Oniguruma, a modern engine with additional features.

ICU, used by NSRegularExpression, a Unicode-focused engine.

.NET, which adds delimiter-balancing and some interesting minor details around conditional patterns.

To our knowledge, all other popular regex engines support a subset of the above syntaxes.
We also support UTS#18’s full set of character class operators (to our knowledge no other engine does). Beyond that, UTS#18 deals with semantics rather than syntax, and what syntax it uses is covered by the above list. We also parse Java’s properties (e.g. \p{javaLowerCase}), meaning we support a superset of Java 8 as well.
Note that there are minor syntactic incompatibilities and ambiguities involved in this approach. Each is addressed in the relevant sections below.
Regex syntax will be part of Swift’s source-compatibility story as well as its binary-compatibility story. Thus, we present a detailed and comprehensive design.

ふむ。

操車場アルゴリズム(infix → RPN)

convert infix notation expression to reverse polish のつづき。

演算子順位法のページが日本語版のうぃきぺにはなくて不思議に思ったのだけどこれはあった。と言っても他の検索をしていて偶然見つけたのだけど。このアルゴリズムの解説は竹内先生の本でも見かけた覚えがあるんだけどどの本だっけか。

操車場アルゴリズム - Wikipedia

操車場アルゴリズムを後に一般化したのが演算子順位構文解析（英語版）である。

ということで単純な中置記法の算術式からreverse polish のそれへ変換するのには十分なので、これを使って変換をしてみる。

>python porandmake.py
式を入力してください
(a+b)(c-d)
[['a', 'b', '+'], ['c', 'd', '-'], '*']
>python porandmake.py
式を入力してください
a+bcd
['a', ['b', ['c', 'd', '*'], '*'], '+']
>python porandmake.py
式を入力してください
a-b-c
['a', ['b', 'c', '-'], '-']

「右結合」しちゃだめでしょ😓

なおコメントにあったスクリプトだと

>python cnv2rpn.py
数式を入力してください: (a+b)/(c-d)
['a', [['b', [')/(', 'c', '*'], '*'], 'd', '-'], '+']`
>python cnv2rpn.py
数式を入力してください: a+b/c/d
['a', ['b', ['c', 'd', '/'], '/'], '+']

ということでやっつけスクリプト。

class Lexer
  def initialize(input)
    @s = input.chars
  end
  def next
    if @s.length > 0
      @s.shift
    else
      nil
    end
  end
end

class Stack
  def initialize
    @values = []
  end
  def push(v)
    @values.push(v)
  end
  def pop
    @values.pop if @values.count>0
  end
  def top
    @values[-1] if @values.count>0
  end
end

def getprec(op)
  if op=="^"
    prec = 4
  elsif %w[* /].include?(op)
    prec = 3
  elsif %w[+ -].include?(op)
    prec = 2
  else
    prec = 1
  end
end

def op?(token)
  %w[^ * / + -].include?(token)
end

def lassoc?(op)
  %w[* / + -].include?(op)
end

def expr2rpn(inputtext)
  lexer = Lexer.new(inputtext)
  stack = Stack.new

  result = []
  while token=lexer.next
    if /[a-z]/.match(token)
      result << token
    elsif token=="("
      stack.push(token)
    elsif token==")"
      while stack.top!="("
        result << stack.pop
      end
      stack.pop
    elsif op?(token)
      while getprec(token)<getprec(stack.top) || lassoc?(token) && getprec(token)==getprec(stack.top)
        result << stack.pop
      end
      stack.push(token)
    else
      puts "error"
      break
    end
  end
  while stack.top
    result << stack.pop
  end
  result
end

s=ARGV.shift
puts "input: #{s}"
puts "output: #{expr2rpn(s).join(' ')}"

$ ruby cnv2rpn.rb '(a+b)/(c-d)'
input: (a+b)/(c-d)
output: a b + c d - /

$ ruby cnv2rpn.rb 'a-b-c'
input: a-b-c
output: a b - c -

$ ruby cnv2rpn.rb 'a+b/c/d'
input: a+b/c/d
output: a b c / d / +

二三個目のwhileの条件がちょっと長い(重い)けど、演算子の優先順位を保持する配列を二個 (先読みトークン用とスタックトップ用)持たせてそれぞれの優先順位を調整してやると条件式を簡単にできる。

links

ところで操車場アルゴリズム - Wikipedia には

操車場アルゴリズムは前置記法（ポーランド記法）の生成にも使える。その場合は、入力トークン列を最後尾から先頭に向かって処理し、出力キューを反転させ（つまり、出力キューは出力スタックとなる）、右括弧と左括弧の扱いを反転させればよい（つまり、左括弧については右括弧を見つけるまでスタックをポップし続ける）。

とあるんだけど、英語版 Shunting yard algorithm - Wikipedia を見ると微妙に違う。

The shunting yard algorithm can also be applied to produce prefix notation (also known as Polish notation). To do this one would simply start from the end of a string of tokens to be parsed and work backwards, reverse the output queue (therefore making the output queue an output stack), and flip the left and right parenthesis behavior (remembering that the now-left parenthesis behavior should pop until it finds a now-right parenthesis). And changing the associativity condition to right.

最後のAnd changing the associativity condition to right.は日本語版ではどこ行っちゃったんだろう?

いずれにしてもこのアルゴリズム(の変形)で中置記法の算術式を前置記法に変換できるとは知らなかった。

新山さん

新山祐介 (Yusuke Shinyama)（@mootastic）さん / Twitter のtwiterのbioをみてたら

日本語の要約は間違っていることもあるので引用は慎重に。

という一文があったのに今更のように気がついた。さすがだ。

tuc

日本語発音だと’tac’と区別がつかなそうという気もするけどなかなか面白い仕様のcutだ。

≪ prev 悲しみよこんにちは

next ≫ 許されざるいのち