ときどきの雑記帖 RE* (新南口)

依頼人から一言

September 7, 2022

スプラトゥーンナワバリクイズ

JE東の車内放送で流れていた 60秒クイズがスプラトゥーンナワバリクイズに変わっていた。

『スプラトゥーン3』発売日（9/9）に合わせて、JR東日本電車内の雑学クイズがリニューアル。3人のインクリングが正解をかけてナワバリバトル！ | ゲーム・エンタメ最新情報のファミ通.com

ということらしい。

Learn Languages 2022

8末や9頭の週末は(ぴー)な予定が入るようになってここ数年はいけてないのだけどまだ続いているのだねえ。

Learn Languages 2022

このイベントは2003年にLL Saturdayとして開始しました。

来年は20周年?(ひー)

ところで

潰しの効かない言語Elmを学んで潰しが効くスキルを身につけよう

これが非常に気になった(笑)

ズームバックオチアイ

エピソード - ズームバック×オチアイ - NHK

5月に放送があったのか! 気がつかなかった orz

ERE

シェルスクリプトの世界から基本正規表現(BRE)をなくそう！ - Qiita

主張としては理解できなくもないんだけど、前提として Issue 8ではBREとEREで能力差はなくなっているんだろうか?

The Open Group Base Specifications Issue 7, 2018 edition の Regular Expressions で

9.3.7 BRE Precedence

と

9.4.8 ERE Precedence

を比較すると後者には

Subexpressions/back-references () \n

が見当たらない(他にも違いがあるけどとりあえずこれを挙げる)。 glibcのregexではどちらも同じ能力を持つようにされているので BREを廃止したとしても問題はないのだけど、そのほかのものではどうなんだろう?

gawk

5.2に向けたベータ版が。という話がありましたが 5.2.0が無事(?)リリースされたようで。

Gawk 5.2.0 released [LWN.net]

さて、gawkで使っている hash関数の話の続き。 FNV hashでも GNU smalltalk からのものでもないものの変遷。

1.0.1 awk1.c

#define HASHSTEP(old, c) ((old << 1) + c)
#define MAKE_POS(v) (v & ~0x80000000) /* make number positive */

/*
 * return hash function on name.  must be compatible with the one
 * computed a step at a time, elsewhere  (JF: Where?  I can't find it!)
 */
int
hashf(name, len, hashsize)
     register char *name;
     register int len;
     int hashsize;
{
  register int r = 0;
  
  while (len--)
    r = HASHSTEP(r, *name++);
  
  return MAKE_POS(r) % hashsize;
}

ずいぶんとたんじゅんなものに見えるけどまあ最初(?)はこんなものなのかも。

2.11 array.c

/*
 * calculate the hash function of the string subs, also returning in *typtr
 * the type (string or number) 
 */
static int
hash_calc(subs)
NODE *subs;
{
	register int hash1 = 0, i;

	subs = force_string(subs);
	for (i = 0; i < subs->stlen; i++)
		hash1 = HASHSTEP(hash1, subs->stptr[i]);

	hash1 = MAKE_POS(STIR_BITS((int) hash1)) % ASSOC_HASHSIZE;
	return (hash1);
}

ここで使っているマクロの定義は以下の通り。

#define ASSOC_HASHSIZE 127
#define STIR_BITS(n) ((n) << 5 | (((n) >> 27) & 0x1f))
#define HASHSTEP(old, c) ((old << 1) + c)
#define MAKE_POS(v) (v & ~0x80000000)	/* make number positive */

STIR_BITSが増えたのが大きなポイントかな。それと最後にとる剰余で1.01では変数を使っていたのが固定値(127)に変わっている。

実は1.01でも上記の関数には(変数に格納された)固定値(101)を渡していたので、この辺はコンパイル時の最適化を期待してのものもあるのかもしれない。

awkで

#define HASHSIZE 101

という定義していて、ハッシュ関数(hashf)の呼び出し部分はこう。

  i = sizeof (HASHNODE) + len + 1;
  hp = (HASHNODE *)obstack_alloc(&other_stack,i);
  bucket = hashf(name, len, HASHSIZE);
  hp->next = table[bucket];

  len = bp - name;
  bucket = table[hashf(name, len, HASHSIZE)];
  while (bucket) {
    if (bucket->length == len && strncmp(bucket->name, name, len) == 0)
      return bucket->value;
    bucket = bucket->next;
  }

で、次。

3.1 array.c

/* awk_hash --- calculate the hash function of the string in subs */

static unsigned long
awk_hash(register const char *s, register size_t len, unsigned long hsize, size_t *code)
{
	register unsigned long h = 0;

	/*
	 * This is INCREDIBLY ugly, but fast.  We break the string up into
	 * 8 byte units.  On the first time through the loop we get the
	 * "leftover bytes" (strlen % 8).  On every other iteration, we
	 * perform 8 HASHC's so we handle all 8 bytes.  Essentially, this
	 * saves us 7 cmp & branch instructions.  If this routine is
	 * heavily used enough, it's worth the ugly coding.
	 *
	 * OZ's original sdbm hash, copied from Margo Seltzers db package.
	 */

	/*
	 * Even more speed:
	 * #define HASHC   h = *s++ + 65599 * h
	 * Because 65599 = pow(2, 6) + pow(2, 16) - 1 we multiply by shifts
	 */
#define HASHC   htmp = (h << 6);  \
		h = *s++ + htmp + (htmp << 10) - h

	unsigned long htmp;

	h = 0;

#if defined(VAXC)
	/*	
	 * This was an implementation of "Duff's Device", but it has been
	 * redone, separating the switch for extra iterations from the
	 * loop. This is necessary because the DEC VAX-C compiler is
	 * STOOPID.
	 */
	switch (len & (8 - 1)) {
	case 7:		HASHC;
	case 6:		HASHC;
	case 5:		HASHC;
	case 4:		HASHC;
	case 3:		HASHC;
	case 2:		HASHC;
	case 1:		HASHC;
	default:	break;
	}

	if (len > (8 - 1)) {
		register size_t loop = len >> 3;
		do {
			HASHC;
			HASHC;
			HASHC;
			HASHC;
			HASHC;
			HASHC;
			HASHC;
			HASHC;
		} while (--loop);
	}
#else /* ! VAXC */
	/* "Duff's Device" for those who can handle it */
	if (len > 0) {
		register size_t loop = (len + 8 - 1) >> 3;

		switch (len & (8 - 1)) {
		case 0:
			do {	/* All fall throughs */
				HASHC;
		case 7:		HASHC;
		case 6:		HASHC;
		case 5:		HASHC;
		case 4:		HASHC;
		case 3:		HASHC;
		case 2:		HASHC;
		case 1:		HASHC;
			} while (--loop);
		}
	}
#endif /* ! VAXC */
	if (code != NULL)
		*code = h;

	if (h >= hsize)
		h %= hsize;
	return h;
}

ifdef/elseがあったりでだいぶ行数が増えた。ハッシュ関数そのものの違いはさておき Duff's Deviceを使っているのが目をひきますね。

/* awk_hash --- calculate the hash function of the string in subs */

static unsigned long
awk_hash(const char *s, size_t len, unsigned long hsize, size_t *code)
{
	unsigned long h = 0;
	unsigned long htmp;

	/*
	 * Ozan Yigit's original sdbm hash, copied from Margo Seltzers
	 * db package.
	 *
	 * This is INCREDIBLY ugly, but fast.  We break the string up into
	 * 8 byte units.  On the first time through the loop we get the
	 * "leftover bytes" (strlen % 8).  On every other iteration, we
	 * perform 8 HASHC's so we handle all 8 bytes.  Essentially, this
	 * saves us 7 cmp & branch instructions.  If this routine is
	 * heavily used enough, it's worth the ugly coding.
	 */

	/*
	 * Even more speed:
	 * #define HASHC   h = *s++ + 65599 * h
	 * Because 65599 = pow(2, 6) + pow(2, 16) - 1 we multiply by shifts
	 *
	 * 4/2011: Force the results to 32 bits, to get the same
	 * result on both 32- and 64-bit systems. This may be a
	 * bad idea.
	 */
#define HASHC   htmp = (h << 6);  \
		h = *s++ + htmp + (htmp << 10) - h ; \
		htmp &= 0xFFFFFFFF; \
		h &= 0xFFFFFFFF

	h = 0;

	/* "Duff's Device" */
	if (len > 0) {
		size_t loop = (len + 8 - 1) >> 3;

		switch (len & (8 - 1)) {
		case 0:
			do {	/* All fall throughs */
				HASHC;
		case 7:		HASHC;
		case 6:		HASHC;
		case 5:		HASHC;
		case 4:		HASHC;
		case 3:		HASHC;
		case 2:		HASHC;
		case 1:		HASHC;
			} while (--loop);
		}
	}

	if (code != NULL)
		*code = h;

	if (h >= hsize)
		h %= hsize;
	return h;
}

ifdefがなくなってスッキリと。

2.15あたりでもハッシュ関数が変わっていたような気もするんだけどめんｄ(ry

ところで連想配列(辞書、ハッシュ)で使っているハッシュ関数といえば他の言語でも色々あったわけで、調べてみるとFNVを使っていたものもあったようだ。

2009/09/23 — Pythonのハッシュ関数は、今でもFNV っぽいアルゴリズムを用いています … なので、 RubyからわかったことはMurmurを使っている、そのことだけでした …

よく使われるハッシュ関数
FNVハッシュ
murmurハッシュ

FNVハッシュの詳しい説明はこちら

Fowler–Noll–Vo hash function - Wikipedia

Fowler–Noll–Vo (or FNV) is a non-cryptographic hash function created by Glenn Fowler, Landon Curt Noll, and Kiem-Phong Vo.

The basis of the FNV hash algorithm was taken from an idea sent as reviewer comments to the IEEE POSIX P1003.2 committee by Glenn Fowler and Phong Vo in 1991. In a subsequent ballot round, Landon Curt Noll improved on their algorithm. In an email message to Landon, they named it the Fowler/Noll/Vo or FNV hash.[1]

swiss table

ハッシュ関数に関係して。

Go の map の実装が hashmap から SwissTable になる可能性が出てきた。今の実装から仕様を変える事なしに SwissTable に置き換える事が可能で、それでありながら read/write 共に 20～50% 改善、iterate が 10% 改善。https://t.co/e0nF4z4pCZ
— mattn (@mattn_jp) September 1, 2022

swiss tableってどんなんだっけ? と思ったが過去にも調べていた(ここには書かなかったけど)

哈希算法工具 - 头条搜索

谷歌开源的cpp 类库abseil中含了一个非常高性能的hash表实现swisstable：我们非常 … Rust 类库：hashbrown 这个crate 是Google 的高性能SwissTable哈希映射的Rust …

GNU grep 3.8

GNU Grep 3.8がリリース。egrepとfgrepの利用でdeprecatedの警告表示開始。grep の -E と -Fのオプションを使うよう説明が入る。egrepとfgrepは2007年から既にdeprecated状態であった。1970年代の小さな計算機ではコマンドを3つに分けることに意味は有った。POSIXではegrepとfgrepは不採用 https://t.co/jJwIhfl0UQ
— 高梨陣平 (@jingbay) September 4, 2022

使用するPCREも変更されているようだ。

It’s Past Time To Stop Using egrep & fgrep Commands, Per GNU grep 3.8 - Phoronix

In addition to the egrep/fgrep warnings, GNU Grep 3.8 has its -P option now based on PCRE2 rather than the older PCRE, regular expressions with stray backslashes now cause warnings, and there are various bug fixes.

そういえば ~~少し前(だっけ?)~~ 結構な以前から GNU grepのegrepとfgrepはシェルスクリプトに差し替えられていたよなあと思いだしたのでリポジトリをみると

egrep.sh - grep.git - grep

#!@SHELL@
cmd=${0##*/}
echo "$cmd: warning: $cmd is obsolescent; using @grep@ @option@" >&2
exec @grep@ @option@ "$@"

なるほど。

リポジトリにはegrep.shだけでfgrepが見当たらなかったが、これはmakeで作るようだ。確かにegrep.shもいかにもテンプレートですって作り (@で囲まれた文字列(configureなどでもよく使われている)がある) だものなあ。

Makefile.am\src - grep.git - grep

## Process this file with automake to create Makefile.in
# Copyright 1997-1998, 2005-2022 Free Software Foundation, Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.

LN = ln

AM_CFLAGS = $(WARN_CFLAGS) $(WERROR_CFLAGS) $(PCRE_CFLAGS)

# Tell the linker to omit references to unused shared libraries.
AM_LDFLAGS = $(IGNORE_UNUSED_LIBRARIES_CFLAGS)

bin_PROGRAMS = grep
bin_SCRIPTS = egrep fgrep
grep_SOURCES =					\
  dfasearch.c					\
  die.h						\
  grep.c					\
  kwsearch.c					\
  kwset.c					\
  searchutils.c
if USE_PCRE
grep_SOURCES += pcresearch.c
endif

noinst_HEADERS = grep.h kwset.h search.h system.h

# Sometimes, the expansion of $(LIBINTL) includes -lc which may
# include modules defining variables like 'optind', so libgreputils.a
# must precede $(LIBINTL) in order to ensure we use GNU getopt.
# But libgreputils.a must also follow $(LIBINTL), since libintl uses
# replacement functions defined in libgreputils.a.
LDADD = \
  ../lib/libgreputils.a $(LIBINTL) ../lib/libgreputils.a $(LIBICONV) \
  $(LIBTHREAD)

grep_LDADD = $(LDADD) $(PCRE_LIBS) $(LIBCSTACK)
localedir = $(datadir)/locale
AM_CPPFLAGS = -I$(top_builddir)/lib -I$(top_srcdir)/lib

EXTRA_DIST = egrep.sh

egrep fgrep: egrep.sh Makefile
	$(AM_V_GEN)grep=`echo grep | sed -e '$(transform)'` &&		\
	case $@ in egrep) option=-E;; fgrep) option=-F;; esac &&	\
	shell_does_substrings='set x/y && d=$${1##*/} && test "$$d" = y' && \
	if $(SHELL) -c "$$shell_does_substrings" 2>/dev/null; then	\
	  edit_substring='s,X,X,';					\
	else								\
	  edit_substring='s,\$${0##\*/},`expr "X$$0" : '\''X\\(.*\\)/'\''`,g'; \
	fi &&								\
	sed -e 's|[@]SHELL@|$(SHELL)|g'					\
	    -e "$$edit_substring"					\
	    -e "s|[@]grep@|$$grep|g"					\
	    -e "s|[@]option@|$$option|g" <$(srcdir)/egrep.sh >$@-t
	$(AM_V_at)chmod +x $@-t
	$(AM_V_at)mv $@-t $@

CLEANFILES = egrep fgrep *-t

なぞのすくりぷと

require 'pp'

Inf = Float::INFINITY

tbl = [
 [  0,  10,  20,  12,  15, Inf, Inf, Inf, Inf],
 [ 10,   0, Inf, Inf,  10, Inf, Inf, Inf, Inf],
 [ 20, Inf,   0,  10, Inf,  25,  20,  30, Inf],
 [ 12, Inf,  10,   0,  15, Inf, Inf,  20, Inf],
 [ 15,  10, Inf,  15,   0, Inf, Inf,  15,  18],
 [Inf, Inf,  25, Inf, Inf,   0,   5, Inf, Inf],
 [Inf, Inf,  20, Inf, Inf,   5,   0,  35, Inf],
 [Inf, Inf,  30,  20,  15, Inf,  35,   0,  12],
 [Inf, Inf, Inf, Inf,  18, Inf, Inf,  12,   0]]

result = [*1..8].permutation.filter do |root|
  root.unshift(0)
  ans = root.each_cons(2).inject(0) do |acc, t|
    acc += tbl[t[0]][t[1]]
    #break Inf if acc==Inf 無限大になったらbreakしたいんだけど…
  end
  ans==Inf ? false : root.unshift(ans)
end

pp result.sort_by{ _1[0] }

実行結果

[[105, 0, 1, 4, 8, 7, 3, 2, 6, 5],
 [110, 0, 1, 4, 8, 7, 3, 2, 5, 6],
 [122, 0, 1, 4, 3, 2, 5, 6, 7, 8],
 [125, 0, 1, 4, 8, 7, 6, 5, 2, 3],
 [127, 0, 3, 2, 5, 6, 7, 8, 4, 1]]

FORTRAN Compiler on IBM 704

FORTRANと言えばプログラミング言語では少数派の column-major orderな言語なわけですが、なぜそうなったのかがわかったような気がする。が、今回はまだ書かない

Attendez. mais j'espère.

😄

ひょっとしたら暗黙の型宣言のアレみたいにどこかでインタビューに答えているかもしれないけど (求む情報)。

Row- and column-major order - Wikipedia

FORTRAN以外のcolumn-major orderな言語は FORTRANの影響というか関係の深さでそうした(そうなった)。という流れな気がする(要確認)。

Richard Stallman Announces GNU C Language Reference Manual

以前にちょっと触れた、rmsが書いているという Cのリファレンス本のドラフトが公開されたとか。

This manual explains the C language for use with the GNU Compiler Collection (GCC) on the GNU/Linux system and other systems. We refer to this dialect as GNU C. If you already know C, you can use this as a reference manual.

If you understand basic concepts of programming but know nothing about C, you can read this manual sequentially from the beginning to learn the C language.

If you are a beginner to programming, we recommend you first learn a language with automatic garbage collection and no explicit pointers, rather than starting with C@. Good choices include Lisp, Scheme, Python and Java. C’s explicit pointers mean that programmers must be careful to avoid certain kinds of errors.

Good choicesに LispとSchemeがあるのは当然として、 PythonとJavaも挙げているのが意外っちゃ意外な気がする。

にしても「なんでTexinfoなんて使ってんだ」的な意見が目立っていて苦笑せざるを得ない (個人的には結構好きなんだけどなあTexinfo)。

Richard Stallman Announces GNU C Language Reference Manual | Hacker News

From Common Lisp to Julia

Why I No Longer Use Common Lisp
Common Lisp is a fantastic language, but it just isn’t for me. Most of the problems I have with the language are social issues more than technical issues. The combination of all of the following issues, are what led me to become increasingly frustrated with the language over time, in no particular order.

social issueというのが気になったのだけど The combination of all of the following issues で挙げられたものは以下のようなもの(個々の詳細は略)

Editor support
Language Evolution
Software Versioning and Deployment
Documentation
Software Quality
Programming Paradigm
Community
Performance

追記

中学生が｢BUMP OF CHICKENの天体観測は僕たちの生まれる前の名曲です｣といっていたという怖い話がバズっていたのをみて、妻と「これを言われたらざわつく曲」を考えていたら「おじいちゃんが好きだったYMOのRYDEEN」というフレーズが出てきて心臓が止まりそうになった。自爆。
— くらげ＠怪奇全裸斧男 (@kurage313book) September 5, 2022

≪ prev とんぼ

next ≫ 帝都物語