ときどきの雑記帖 RE* (新南口)

The Mule

November 22, 2021

AWK to Go compiler

AWKGo: basic AWK to Go compiler by benhoyt · Pull Request #75 · benhoyt/goawk

お、これは面白そう。

テキシコー

10月から再放送してたのか!

テキシコー | NHK for School

今日まで気がつかなかったorz

BCD

今では顧みられない36bit幅アーキテクチャは10進数10桁のBCD表現が35bitである事に由来するとか。36bit機は実物を見たことがなくハネウェル6000くらいしか知らない。 https://t.co/vlAvKd8ZQ3
— アビ(精神年齢低め)(v) (@strnh) November 20, 2021

Honeywell 6000 series - Wikipedia

BCD (二進化十進表現 - Wikipedia) って packed 形式でも十進一桁4ビットなので、十進10桁を表現するには 4bit * 10 = 40bit 必要になりそうな気がする(符号分含まず)んだけどどうなんだろう?

プログラマー(エンジニア)なら暗記しているであろう

log₁₀2 ≓ 0.301

を使って計算すると

10 / 0.301 = 33.22259

なので、「値としては」34ビットあれば収まるのはわかる。たとえばIEEE 754-2008で規格に入った十進3桁を10ビットで表す手法を使うと十進9桁が30ビットで表現できてさらに残り5ビットあるから、全体として35ビットで十進10桁を表現するのはいけそうだけど

Densely packed decimal - Wikipedia

1971年、陳天機(Tien Chi Chen)とIrving T. Hoが現在Chen-Ho符号化として知られる手法を考案した。これは3桁の10進数を10ビットに無劣化でパックする接頭符号表現であり、わずか2、3ゲートのハードウェア遅延でBCDとの間の変換が可能であった。 DPDはこれを洗練させたものであり、マイク・カウリッショウ（英語版）(Mike Cowlishaw)により考案された。 DPDは浮動小数点表現の標準のIEEE 754-2008の、十進浮動小数点に取り入れられた。

実際のところはどうなんだろう?

この世代のCPUで3bitという単位が頻繁に使われるのは、単に丁度いいサイズってだけでなくてワードマシンの影響がある気がするのだが、すると「じゃあ何で多くのワードマシンのワード幅が3の倍数なのか」という新たな疑問が発生して、これはもはや調べるのが難しい気はする https://t.co/9PTYOn2tkA
— AoiMoe a.k.aしお兄P (@AoiMoe) November 20, 2021

どうして昔の人は八進数でしゃべるのか?

当時の文字コードが 5bit や 6bit だったというところと関連してそうな気もする。
— AoiMoe a.k.aしお兄P (@AoiMoe) November 20, 2021

UNIVAC 1101 が 24bit 、IBM 701 が 36bit で、この辺で固まってるんだと思うのだが
— AoiMoe a.k.aしお兄P (@AoiMoe) November 20, 2021

時代は下るけど PDP-8 - Wikipediaも12ビットマシンだったし、最初期のマイクロプロセッサーにも12ビットのものがあったような。

数値計算には BCD が扱える 4bit 単位が便利だが、これだと英文字が表現できず、最低 6bit は必要となる。さらに、当時の記憶装置の信頼性の低さからワードの頭にパリティ付けて奇数ビット数になってたり、大昔はいろんなマシンが見られる
— AoiMoe a.k.aしお兄P (@AoiMoe) November 20, 2021

60 年代に IBM が無駄を承知で BDC を 8bit に拡張した EBCDIC を採用して IBM/360 に実装したのは先見の明があったんだろな。
— AoiMoe a.k.aしお兄P (@AoiMoe) November 20, 2021

ところで68000 MC68000 - Wikipedia のオペコードも眺めていくとそこかしこに3bitの塊が見え隠れしていたり。ついでに書くとam29000 AMD Am29000 - Wikipedia はレジスターが256個(と書くのはちょっと不正確なんだけどそこはそれ) あり、さらに命令は基本3オペランドなので、 1命令32bitのうち24bitがレジスターの指定という。

線形と非線形のあいだ

“36DA5B7FEC924801” の続き (まだ調べてるのよ😄)

このシリーズ(笑)の最初の方で

パラメトロン計算機: ビットごとの秘法と技法から

最も右のビットの位置を知る法
TAOCPにある第二の方法はde Bruijnサイクルを利用するものだ.
4ビットのde Bruijnサイクルは例えば次のようなものである.

というのがあったけど、前回触れたのは M系列(Maximum length sequence) というもので、これら二つは似ているけれども0の並びを含むかどうかが違っている。

この二つの違いは結局のところなんなのかを色々調べて回ったけど、どうも、Maximum length sequenceが「線形」で de Bruijn sequenceが「非線形」であるということらしい?

じゃあその「線形」と「非線形」の違いとは…となり。

perl5

とある理由でPerlの構文定義を眺めていたら、あることに気がついた。

perl5/perly.y at blead · Perl/perl5

/* An expression which may have a side-effect */
sideff	:	error
			{ $$ = NULL; }
	|	expr[body]
			{ $$ = $body; }
	|	expr[body] IF condition
			{ $$ = newLOGOP(OP_AND, 0, $condition, $body); }
	|	expr[body] UNLESS condition
			{ $$ = newLOGOP(OP_OR, 0, $condition, $body); }
	|	expr[body] WHILE condition
			{ $$ = newLOOPOP(OPf_PARENS, 1, scalar($condition), $body); }
	|	expr[body] UNTIL iexpr
			{ $$ = newLOOPOP(OPf_PARENS, 1, $iexpr, $body); }
	|	expr[body] FOR condition
			{ $$ = newFOROP(0, NULL, $condition, $body, NULL);
			  parser->copline = (line_t)$FOR; }
	|	expr[body] WHEN condition
			{ $$ = newWHENOP($condition, op_scope($body)); }
	;

なんだこのexpr[body]というのは?

yaccにはこんなのなかったよなあと思いつつ perl5/Makefile.SH at blead · Perl/perl5 を見るとこんな記述が見つかった。

# I now supply perly.c with the kits, so the following section is
# used only if you force bison to run by saying
# 	make regen_perly
# You normally shouldn't remake perly.[ch].
.PHONY: regen_perly
run_byacc run-byacc:
	@echo "run_byacc is obsolete; try 'make regen_perly' instead"
# this outputs perly.h, perly.act and perly.tab
regen_perly regen-perly:
	perl regen_perly.pl
# We don't want to regenerate perly.c and perly.h, but they might
# appear out-of-date after a patch is applied or a new distribution is
# made.
perly.c: perly.y
	-@sh -c true
perly.h: perly.y
	-@sh -c true
SYM  = globvar.sym perlio.sym
SYMH = perlvars.h intrpvar.h

ん－、いつの頃からかbisonの使用が強制されるようになってた?

ということでregen_perly.pl(ソースツリーの一番上のレベルにあった)を見ると

perl5/regen_perly.pl at blead · Perl/perl5

#!/usr/bin/perl
#
# regen_perly.pl, DAPM 12-Feb-04
#
# Copyright (c) 2004, 2005, 2006, 2009, 2010, 2011 Larry Wall
#
# Given an input file perly.y, run bison on it and produce
# the following output files:
#
# perly.h	standard bison header file with minor doctoring of
#		#line directives plus adding a #ifdef PERL_CORE
#
# perly.tab	the parser table C definitions extracted from the bison output
#		plus an extra table generated by this script.
#
# perly.act	the action case statements extracted from the bison output
#
# Note that perly.c is *not* regenerated - this is now a static file which
# is not dependent on perly.y any more.
#
# If a filename of the form foo.y is given on the command line, then
# this is used instead as the basename for all the files mentioned
# above.
#
# Note that temporary files of the form perlytmp.h and perlytmp.c are
# created and then deleted during this process
#
# Note also that this script is intended to be run on a UNIX system;
# it may work elsewhere but no specific attempt has been made to make it
# portable.

use 5.006;
sub usage { die "usage: $0 [ -b bison_executable ] [ file.y ]\n" }

use warnings;
use strict;

our $Verbose;
BEGIN { require './regen/regen_lib.pl'; }

my $bison = 'bison';

if (@ARGV >= 2 and $ARGV[0] eq '-b') {
    shift;
    $bison = shift;
}

my $y_file = shift || 'perly.y';

略

# creates $tmpc_file and $tmph_file
my_system("$bison -d -o $tmpc_file $y_file");
以下略

このファイルが増えたのは2004年のことで、バージョンでは5.8 → 5.10でのことらしい。一方、bisonの先の表記は現状の最新のマニュアル Bison 3.8.1 によればきちんと記載があってどんなものかはよくわかるのだけどさていつ頃入ったんだこれ。

3.4.6 Actions

省略

The C code in an action can refer to the semantic values of the components matched by the rule with the construct $n, which stands for the value of the nth component. The semantic value for the grouping being constructed is $$. In addition, the semantic values of symbols can be accessed with the named references construct $name or $[name]. Bison translates both of these constructs into expressions of the appropriate type when it copies the actions into the parser implementation file. $$ (or $name, when it stands for the current grouping) is translated to a modifiable lvalue, so it can be assigned to.

Here is a typical example:
…
| exp '+' exp     { $$ = $1 + $3; }
Or, in terms of named references:
exp[result]:
…
| exp[left] '+' exp[right]  { $result = $left + $right; }

3.6 Named References

3.6 Named References

As described in the preceding sections, the traditional way to refer to any semantic value or location is a positional reference, which takes the form $n, $$, @n, and @$. However, such a reference is not very descriptive. Moreover, if you later decide to insert or remove symbols in the right-hand side of a grammar rule, the need to renumber such references can be tedious and error-prone.

To avoid these issues, you can also refer to a semantic value or location using a named reference. First of all, original symbol names may be used as named references. For example:
 { $invocation = new_invocation ($op, $args, @invocation); }
Positional and named references can be mixed arbitrarily. For example:
 { $$ = new_invocation ($op, $args, @$); }
However, sometimes regular symbol names are not sufficient due to ambiguities:
exp: exp '/' exp
 { $exp = $exp / $exp; } // $exp is ambiguous.
 
exp: exp '/' exp
 { $$ = $1 / $exp; } // One usage is ambiguous.

exp: exp '/' exp
 { $$ = $1 / $3; } // No error.
When ambiguity occurs, explicitly declared names may be used for values and locations. Explicit names are declared as a bracketed name after a symbol appearance in rule definitions. For example:
exp[result]: exp[left] '/' exp[right]
 { $result = $left / $right; }
In order to access a semantic value generated by a midrule action, an explicit name may also be declared by putting a bracketed name after the closing brace of the midrule action code:
exp[res]: exp[x] '+' {$left = $x;}[left] exp[right]
 { $res = $left + $right; }
In references, in order to specify names containing dots and dashes, an explicit bracketed syntax $[name] and @[name] must be used:
if-stmt: "if" '(' expr ')' "then" then.stmt ';'
 { $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); }
It often happens that named references are followed by a dot, dash or other C punctuation marks and operators. By default, Bison will read ‘$name.suffix’ as a reference to symbol value $name followed by ‘.suffix’, i.e., an access to the suffix field of the semantic value. In order to force Bison to recognize ‘name.suffix’ in its entirety as the name of a semantic value, the bracketed syntax ‘$[name.suffix]’ must be used.

もうひとつ、 3.3.2 Empty Rules というのも増えていた。

3.3.2 Empty Rules

A rule is said to be empty if its right-hand side (components) is empty. It means that result in the previous example can match the empty string. As another example, here is how to define an optional semicolon:
semicolon.opt: | ";";
It is easy not to see an empty rule, especially when | is used. The %empty directive allows to make explicit that a rule is empty on purpose:
semicolon.opt:
 %empty
| ";"
;

Flagging a non-empty rule with %empty is an error. If run with -Wempty-rule, bison will report empty rules without %empty. Using %empty enables this warning, unless -Wno-empty-rule was specified.

The %empty directive is a Bison extension, it does not work with Yacc. To remain compatible with POSIX Yacc, it is customary to write a comment ‘/* empty */’ in each rule with no components:
semicolon.opt:
 /* empty */
| ";"
;

追記

Bison 3.8.2 のChangeLog-2012に以下のような記述あり。

2009-10-03  Alex Rozenman  <rozenman@gmail.com>

	Document named references.
	* doc/bison.texinfo (Actions): Add new example and xref to
	Using Named References node.
	(Using Named References): New node.

2009-09-19  Alex Rozenman  <rozenman@gmail.com>

	Keep sub-messages aligned. Fix strings for translation.
	* src/location.h (location_print): Add return value.
	* src/location.c (location_print): Return number of printed
	characters.
	* src/complain.h (complain_at_indent, warn_at_indent): Prototype
	new functions.
	* src/complain.cpp (indent_ptr): New static variable.
	(error_message, complain_at_indent, warn_at_indent): Implement
	the alignment mechanism.
	* src/scan-code.l (parse_ref, show_sub_messages): Fix strings
	for translations. Use new alignment mechanism.
	* tests/named-ref.at: Adjust test-cases.
	* NEWS (2.5): Add an announcement about named references.

2009-06-27  Alex Rozenman  <rozenman@gmail.com>

	Implement support for named symbol references.
	* src/parse-gram.y: Add new syntax (named_ref.opt).
	* src/reader.c: Store named refs in symbol lists.
	* src/reader.h: New argument for symbol_append and
	action_append functions.
	* src/scan-code.h: Add new field (named_ref) into
	code_props data structure. Keeps named ref of midrule
	actions.
	* src/scan-code.l: Support for named refs in semantic
	action code. New function 'parse_named_ref'.
	* src/scan-gram.l: Support bracketed id.
	* src/symlist.c: Store named refs in symbol lists.
	* src/symlist.h: New field in symbol list: named_ref.
	* src/named-ref.h: New file, a struct for named_ref.
	* src/named-ref.cp: New file, named_ref_new function.
	* src/local.mk: Add two new files.
	* tests/testsuite.at: Include new test group:
	* tests/named-refs.at: this new file.

2009年頃にはnamed referenceのための作業が行われていた模様。一方NEWSでは

* Noteworthy changes in release 2.5 (2011-05-14)

** Grammar symbol names can now contain non-initial dashes:

  Consistently with directives (such as %error-verbose) and with
  %define variables (e.g. push-pull), grammar symbol names may contain
  dashes in any position except the beginning.  This is a GNU
  extension over POSIX Yacc.  Thus, use of this extension is reported
  by -Wyacc and rejected in Yacc mode (--yacc).

** Named references:

  Historically, Yacc and Bison have supported positional references
  ($n, $$) to allow access to symbol values from inside of semantic
  actions code.

  Starting from this version, Bison can also accept named references.
  When no ambiguity is possible, original symbol names may be used
  as named references:

    if_stmt : "if" cond_expr "then" then_stmt ';'
    { $if_stmt = mk_if_stmt($cond_expr, $then_stmt); }

  In the more common case, explicit names may be declared:

    stmt[res] : "if" expr[cond] "then" stmt[then] "else" stmt[else] ';'
    { $res = mk_if_stmt($cond, $then, $else); }

  Location information is also accessible using @name syntax.  When
  accessing symbol names containing dots or dashes, explicit bracketing
  ($[sym.1]) must be used.

  These features are experimental in this version.  More user feedback
  will help to stabilize them.
  Contributed by Alex Rozenman.

とあるので、公式には2011年5月リリースの2.5から。ということでいいのかな。

ndbm, dbm, hsearch

Berkeley DB作者の Margo Seltzer & Mike Olsonインタビュー。ことの始まりは、データベースの授業を受けていたMargoが「誰かndbmの代わりを実装する人いない?」と聞かれ「やるやる!」と答えてしまった、いわゆる「大学院生のよくある罠」にひっかかったため。https://t.co/DNK77qfYpS
— 新山祐介 (Yusuke Shinyama) (@mootastic) November 21, 2021

twitter だと140字の制限があるし、ある意味仕方のないことなのだろうけど

A Conversation with Margo Seltzer and Mike Olson - ACM Queue

Margo Seltzer This started with a standard stupid grad-student trick. I had taken Mike Stonebraker’s graduate database course at Berkeley, and I thought Litwin’s extensible linear hashing was really cool. Keith [Bostic], recognizing “stupid grad-student syndrome,” said, “Hey, how would you like to implement it? I need a replacement for ndbm, dbm, and hsearch.” I innocently said, “Sure!”

He introduced me to Ozan Yigit, who had written gdbm, which was specifically an ndbm replacement. We brainstormed together and figured out a way to have a single hash package that would support both the persistent (n)dbm, as well as the in-memory hsearch replacement. That became what was known as, cleverly enough, hash.

原文でこうあるのをndbmだけにしちゃうのはどうなんだろう。

あと、extensible linear hashing についてはこの辺を。

≪ prev The Sky Crawlers

next ≫ The Six Million Dollar Man