ときどきの雑記帖 RE* (新南口)

帝都物語

September 10, 2022

池袋ジュンク堂 25周年

【おかげさまで25周年】
ジュンク堂書店池袋本店はおかげさまで25周年を迎えました！これまでご愛顧いただき、本当にありがとうございます。
25周年を記念して、イベントやフェアなどを各フロアで行なっております。
ぜひこの機会にお越しくださいませ。#25周年 #ジュンク池袋 pic.twitter.com/B9SIHWpNd4
— ジュンク堂書店池袋本店 (@junkudo_ike) August 29, 2022

もうそんなになるのか。

新宿店もなくなってからだいぶ経つしなあ。そして渋谷店は…

八重洲ブックセンター

ここもか!

【重要なお知らせ】
八重洲ブックセンター本店は、街区の再開発計画に伴い、2023年3月にて営業を終了いたします。長年のご愛顧に心より御礼申し上げます。弊社本店は、当地に建設予定の大規模複合ビルへの将来的な出店を計画しております。詳細はこちらをご覧ください↓https://t.co/8isA5pZev1 pic.twitter.com/e0oIZHGUa0
— 八重洲ブックセンター本店 (@yaesu_honten) September 9, 2022

チューリング賞

【8階理工学】注目の新刊
『因果推論の科学「なぜ？」の問いにどう答えるか』（文藝春秋）
人工知能分野の巨人、ジューディア・パール氏の切り開くデータ分析・統計学を超える新しい学問。重厚かつエキサイティングな一冊‼️「凄い内容」との松尾豊さん推薦帯にも引き込まれます。8F新刊台C04にて。HRD pic.twitter.com/gnlWMrPxLL
— 紀伊國屋書店新宿本店 (@KinoShinjuku) September 10, 2022

というのを見かけ、面白そうではないかと

を見に行くと

著者は人工知能界のノーベル賞にあたるチューリング賞受賞!

…え? ＞人工知能界のノーベル賞にあたるチューリング賞

66

N3051 Operator overloading in C を読んでいたら Algol 66という記述を見かけた。 68の間違いだろうと思ったが、どうも本当にAlgol 66というものがあったらしい?

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3051.pdf

1.5 Computer science context
Operator overloading is a common feature in modern programming languages. It is supported in:
Ada, Algol66 (yes, since 1966!) C#, C++, D, Eiffel, Fortran, F#, Haskell, Java, Lua, MATLAB, Object Pascal, Free Pascal, Delphi (since 2005), PHP3, Perl, Python, R, Ruby, Visual Basic.NET and that is not an exhaustive list.

Operator overloading - Wikipedia には見当たらないなあ＞Algol 66

PL/M

RMS、GNU C 言語リファレンスマニュアルをアナウンス | スラドデベロッパーの Re:初心者が本気でやりたいなら (#4321704) | RMS、GNU C 言語リファレンスマニュアルをアナウンス | スラドに

PL/Mをご存知だろうか。
PL/1のサブセットとしCP/Mの開発者でもあるゲイリー・キルドールが作ったシンプルな言語。 CよりもPascalよりもシンプルでコンパイラが出力するコードが想像しやすい。
まさに高級マクロアセンブラと言って良いと思う。

という発言があった。

実は大学時代の恩師がC↓ PL/M↑だったのだけど (わたしは使う機会に恵まれなかった)、言うほど「シンプル」だったかなあ。などと思ったりしつつ CP/Mのソースコードに含まれていた submit.plmから手続きを一つ。例に出してみると

move: procedure(s,d,n);
    declare (s,d) address, n byte;
    declare a based s byte, b based d byte;
        do while (n := n - 1) <> 255;
        b = a; s = s + 1; d = d + 1;
        end;
    end move;

なんか見慣れない`:=‘なんて記号が。

Pascalじゃないので代入ではないし等値比較も=だったよなあと調べてみると、代入演算子の一種ではあるらしい。

http://bitsavers.trailing-edge.com/pdf/intel/PLM/9800268B_PLM-80_Programming_Manual_Jan80.pdf のp.32 4.6.3 EMBEDDED ASSIGNMENTSにはこうある

A special form of the assignment in used within PL/M expression, The form of this ’embedded assignment’ is
 variable := expression
and may appear anywhere an expression is allowed. The expression to the right of the := assignment symbol is evaluated and then stored into the variable on the left. The value of the embedded assignment is the same as that of its right half. For example, the expression
 ALT + (CORR := TCORR+PCORR) - (ELEV := MT/SCALE)
result in exactly the same value as
 ALT + (TCORR+PCORR)-(MT/SCALE)
The only difference is the side-effect of storing the intermediate results TCORR+PCORR and HT/SCALE into CORR and ELEV, respectively. These intermediate resuts can then be used at a later point in the program without calcuating them again.

=が代入演算子でもあり等値比較演算子でもあるので、たとえば前述のwhileの条件部を(n = n - 1) <> 255 としてしまうと変なことに。一方でこのnのdecrementを別に書くのも…ということでこれが導入されたんだろうか。

ところでその前のページにも面白い(興味深い)仕様が記載されていて、それは

p.31 4.6.2 MULTIPLE ASSIGNMENTS

LEFT, CENTER, RIGHT = INIT + CORR;

こういうもの。Cでいうところの

LEFT = CENTER = RIGHT = INIT + CORR;

か。

最後に、値を返す「関数」はこう。

SUMSQUARE: PROCEDURE (A, B) ADDRESS;
   DECLARE (A, B) ADDRESS;
   RETURN A+A+B*B;
END SUMSQUARE;

引数リストの後に関数の(返す値の)型を書くのは Pascalっぽい。値を返すのはreturnを使うんですな。

脱線

PL/Mに関連してCP/M - Wikipedia を見ていたら

2016年に、ザイドマン・コンサルティングのボッブ・ザイドマンは、デジタルリサーチが開発したCP/Mとティム・パターソンが開発し、長年前者のコードを基にしたと疑われたDOSのソースコードを比較し、初版のDOSのソースコードがCP/Mのソースコード基にしたかを調べた。 (略) ザイドマンの結論は、DOSはCP/Mのコードを一切基にしていないとのことであるものの、システムコールの多くの部分が真似られた[53]。

てな記述があり、注釈を見ると

Bob Zeidman (2016-10-18). “Source Code Comparison of DOS and CP/M” (英語). Journal of Computer and Communications (Scientific Research Publishing) 4 (No.12). doi:10.4236/jcc.2016.412001 2021年10月3日閲覧。. https://www.scirp.org/journal/paperinformation.aspx?paperid=71259

ということで後で読む(たぶん)

sort

RAM に乗らないならマージソート(外部マージソート)じゃないすかね。 https://t.co/ZMg1NNHApi
— mattn (@mattn_jp) August 31, 2022

ちょっと前に盛り上がっていたらしいこの話題巨大テキストファイルをsortコマンドでソートしてみるにあるようにsortコマンドで片づけるのが一番手間がかからない (最速とは限らない)んじゃないかなあ。

そもそも

なんでそんなファイルできたの
意地悪なCSV(クォートに囲まれた文字列に,がある)だったりしない?
一回やればよい? or 定期/不定期に繰り返される?
そもそもなんのためのソート?

等々疑問があるのだけどそれはまあ「問題のための問題」なので触れちゃあいかんポイントなのだろうなあとスルーしておいて

🦐100GBソート実験結果発表🦐

＜速度＞
簡易バッチ：19分
sortコマンド：59分
DBサーバ：27分

＜準備＞
簡易バッチ：10分
sortコマンド：0秒
DBサーバ：20分

実行環境、改善の目途等はリプへ続く
— えび＠プログラマー (@ebiebi_pg) September 1, 2022

その「簡易バッチ」で使うであろう自作プログラムの特にPerl版に疑問が。

プログラムとしてはこんな感じ(コメントは削除)

file = Fi;eRead("hoge.csv", "read");
sortedKeys = new SortedColect(int, int);
linecount = 0;
while (linestr = file.Read()){
  colms[] = linestr.split(",");
  sortedKeys.add(parseInt(colms[0], linecount);
}
file.close()

inputfile = FileRead("hoge.csv", "read");
outputfile = FileRead("hoge_out.csv", "add");
sortKeys.foreach(unko){
  inputfile.seek(unko.value, 0);
  inputline = inputfile.Read():
}
outputfile.close();
inputfile.close();

linestr.split(",");をみると、「いじわるCSV」の可能性は考えないでもよさそうですね。

仮に自分がこの種のプログラムを書かなければならなくなったら、メモリ上にインデックスを作って入力ファイルをseekしまくるのとどっちが早いかという話もあるけど、純粋な(?)マージソートにしちゃうかなあ。

(外部)マージソートで入力が100GB(GiB?)だとソートのためのファイル丸ごとの読み書きが40回ほど発生するから、それだけでも時間かかりそうだ(笑)

さてPerlの方。

my $startbyte = 0;
open DATS, "<hoge_100G.csv";
while(<DATS>) {
  my $nowbyte = tell(DATS);
  @colms = split /,/;
  push(@bytes, [int($colms[0]), $startbyte, ($nowbyte - $satrtbyte)]);
  $startbyte = $nowbyte;
}
close(DATS);
pop @bytes;
@bytes = sort {$a->[0] <=> $b->[0]} @bytes;
open READ, "<hoge_100G.csv";
open WRITE, "+>hoge_out_100G.csv";
for (@bytes) {
  seek(READ, $$_[1], 0);
  read(READ, $buff, $$_[2]);
  print WRITE $buff;
}
close(WRITE);
close(READ);

だいぶtraditionalというか色々ツッコミ入れたくなる書き方のスクリプトだけど

whileループの後にあるpop @bytesの目的が良くわからない。 kotlin版にはこれに該当するものないよねえ?

use strict;
use warnings;
use feature ':5.10';

open my $f, "<", "in.txt";
my @bytes;
while (my $line = <$f>) {
    #print $line;
    my $nowbyte = tell $f;
    my @f = split /,/, $line;
    push @bytes, [$f[0], $nowbyte];
    say $f[0];
}
close($f);
#pop @bytes;
for my $e (@bytes) {
    say $e->[0];
}

うん。popは不要だと思うんだがなあ…

grepとeol

LinuxでCRLFとLFを一括で相互変換するを見ていたら

CRLF to LF

$ grep -Ilrs `printf "\r\n"` . | xargs nkf -Lu --overwrite

LF to CRLF

$ grep -Ilrs `printf "\n"` . | xargs nkf -Lw --overwrite

というのが出てきたのだけど、後者は元がCRLFでもそのLFに引っかかってしまう気がするし、そもそもgrepの検索対象に改行はあるんだろうか? という疑問がある。それとというのも、sedにしてもawkにしても改行(レコードの区切り)はパターンバッファや$0には入ってこないから。

そして古のコード(UNIX V7のgrep)を見ると

execute(file)
char *file;
{
	register char *p1, *p2;
	register c;

	if (file) {
		if (freopen(file, "r", stdin) == NULL)
			errexit("grep: can't open %s\n", file);
	}
	lnum = 0;
	tln = 0;
	for (;;) {
		lnum++;
		p1 = linebuf;
		while ((c = getchar()) != '\n') {
			if (c == EOF) {
				if (cflag) {
					if (nfile>1)
						printf("%s:", file);
					printf("%D\n", tln);
				}
				return;
			}
			*p1++ = c;
			if (p1 >= &linebuf[LBSIZE-1])
				break;
		}
		*p1++ = '\0';
		p1 = linebuf;
		p2 = expbuf;
以下略

linebufに\nは入ってない。ですよねえ。

んが、WSLのUbuntuで試してみると

kbk@toybox4:~$ printf 'hello\r\n' | grep -oe 'o.' | hexdump -C
00000000  6f 0d 0a                                          |o..|
00000003
kbk@toybox4:~$ printf 'hello\n' | grep -e "$(printf 'o\n')" | hexdump -C
00000000  68 65 6c 6c 6f 0a                                 |hello.|
00000006
kbk@toybox4:~$ printf 'hello\r\n' | grep -e "$(printf 'o\r')" | hexdump -C
00000000  68 65 6c 6c 6f 0d 0a                              |hello..|
00000007
kbk@toybox4:~$ printf 'hello\r\n' | grep -e "$(printf '\n')" | hexdump -C
00000000  68 65 6c 6c 6f 0d 0a                              |hello..|
00000007
kbk@toybox4:~$ printf 'hello\rworld\n' | grep -e "$(printf 'o\r')" | hexdump -C
00000000  68 65 6c 6c 6f 0d 77 6f  72 6c 64 0a              |hello.world.|
0000000c
kbk@toybox4:~$ printf 'hello\rworld\n' | grep -oe "$(printf 'o\r')" | hexdump -C
00000000  6f 0d 0a                                          |o..|
00000003

バッファの最後に\nが入っているとしか思えないような動作をしている。ということでつい最近リリースされたGNU grep 3.8を調べると

grep.c

/* Search a given (non-directory) file.  Return a count of lines printed.
   Set *INEOF to true if end-of-file reached.  */
static intmax_t
grep (int fd, struct stat const *st, bool *ineof)
{
(略)
      /* Determine new residue (the length of an incomplete line at the end of
         the buffer, 0 means there is no incomplete last line).  */
      oldc = beg[-1];
      beg[-1] = eol;
      /* FIXME: use rawmemrchr if/when it exists, since we have ensured
         that this use of memrchr is guaranteed never to return NULL.  */
      lim = memrchr (beg - 1, eol, buflim - beg + 1);
      ++lim;
      beg[-1] = oldc;
      if (lim == beg)
        lim = beg - residue;
      beg -= residue;
      residue = buflim - lim;

      if (beg < lim)
        {
          if (outleft)
            nlines += grepbuf (beg, lim);

\n(eol)まで入ってるっぽいですねえ。ふむ。

巡回セールスマン問題

なんだこれ、ぜんぜんわからん。 pic.twitter.com/7PkUr6wbON
— ふるさと｜インフラ学習ならEnvader(エンベーダー) (@furusatojuku) November 10, 2021

このシリーズで最近見かけたのが「巡回セールスマン問題」だろうそれ。というもの。

なんだけど、訪れるポイントが9個なのでしらみつぶしに試しても問題なかったという (前回の「なぞのすくりぷと」)

そのほか前回紹介して以降に見かけた問題には

誕生日のパラドックス
髪の毛の本数(鳩の巣原理)

に関係したものがあった (問題の写真の載ったツイートを探したけど見つからなかった)。

誕生日のパラドックスの方の応用問題が奥深そうだった (N人のクラスでk人同じ誕生日の人がいる確率。みたいな)。

ハッシュ関数の衝突確率
The probability of a hash collision : programming

GNU Pascal Coding Standards

めも：GNU Pascal Coding Standards https://t.co/zfX99gAWuS
— sava (@lpproj) September 7, 2022

GNU の Pascal 用 Coding Standards ではなく、 GNU Pascal の Coding Standards らしい。

GNU Pascal Coding Standards

loop condition

while(1)派だったけど、昔々何かのコンパイラでアセンブルコード出したら1との比較命令入っていたので、それ以来for(;;)派に宗旨替えした https://t.co/M2vGmEG1J3
— だーま (@machidafumihiko) September 7, 2022

(1と)0との比較ならわかるんだけど、1との比較。とは?

unlearn

You must unlearn what you have learned.
学んできたことは一旦忘れよ。

―ヨーダ [Ep.V]

【unlearn】 (身に付けたこと)をわざと捨てる、意識的に忘れる；を学び直す
— スター・ウォーズで学ぶ英会話 (@SW_EIKAIWA) October 28, 2017

FORTRAN Compiler on IBM 704

unary minus

以前書いたFORTRAN IIでの単項マイナスの扱いのはなし。 704_FORTRAN_MiscMemos.pdf にあったけど算術式じゃなくて論理式云々という話だった。

p.16

I. Description of Boolean Statement
Because - is a unary operator it is part of the expression or symbol to which it applies. Therefore it must be bound to its expression or symbol by parentheses, with one exception.
If E is a variable or function name the complement can be written as - E when it is not part of a larger expression in the same statement. C = -E is correct, but C = -E + D is not correct. When E is part of a larger expression it must be writen as (-E). Thus, the latter expression must be written as C = (-E) + D

算術式に関しては FORTRAN に以下のような記述があった。

FORTRAN p.14 Expressions

If E is an expression, and if its first character is not + or -, then +E and -E are expression of the same mode as E. Thus -A is an expression, but +-A is not.

If E and F are expressions of the same mode, and if the first character of F is not + or =, then
 E + F
 E - F
 E * F
 E / F
are expressions of the same mode. Thus A-+B and A/+B are not expressions. The character +, -, *, and / denote addition, subtraction, multiplication, and division.

6, If E and F are expression, and F is not floting point unless E is too, and the first character of F is not + or -, and neither E and F is of the form A**B, then
 E**F
is an expression of the same mode as E. Thus A**(BC) is an expression, but I(BC) and ABC are not. The symbol ** denotes expression: i.e. AB means A^B

FORTRAN p.15 Orderring within a Hierarchy.

Parentheses which have been omitted from a sequence of consecutive multiplication and divisions (or cosecutive additions and subtractions) will be understood to be grouped from the keft. Thus, if . represents either * or / (or either + or -), then A.B.C.D.E will be taken to mean
 ((((A.B).C).D).E)

≪ prev 依頼人から一言

next ≫ Fighting Vipers