ときどきの雑記帖 RE* (新南口)

What Shall We Do with Our Old?

September 16, 2023

Calendar fair

I went to a certain large bookstore and found that a sales corner for 2024 calendars had been set up.

Memo

Pikuma: Understanding the Origins and the Evolution of Vi & Vim

Somehow

All sorts of things are no good (mystery)

New and forthcoming books

for Dummies

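[Dummies Fair] From Dummies, the English-language how-to series that sums up all sorts of fields in an easy-to-follow way, we have gathered books on ChatGPT and AI, the metaverse, data analysis, cryptocurrency, security and more. Buyers receive a ballpoint pen from Wiley, the parent publisher of Dummies (first come, first served). At the new-arrivals table. pic.twitter.com/YfWH0PYqRC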
— 洋書専門店 Books Kinokuniya Tokyo (@Kino_BKT) September 15, 2023

As expected of for Dummies, they cover a wide range. Come to think of it, has nothing from this series been translated lately?

デジタル社会の罠　生成AIは日本をどう変えるか

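I follow Nishigaki's books fairly closely, so this one is worth checking too.

I wonder how much it differs from the third edition? Also, what about volumes 2 and 3, or a combined edition? And on top of that, I am a little worried about the price 😄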

C++ソフトウェア設計 ―高品質設計の原則とデザインパターン

ハッキング思考

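A book that was introduced in the YAMDAS現更新履歴 post "ブルース・シュナイアー先生の新刊の邦訳『ハッキング思考』が出るぞ！" (the Japanese translation of Bruce Schneier's new book, 『ハッキング思考』, is coming out).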

Today's nitpick

Microsoft、“Unicode"のススメ | TECH+（テックプラス）

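For example, performance degradation requires memory and CPU resources for other code pages and table-based code page conversion, but conversion hindrance cannot be avoided. The latest Windows is said to have faster UTF-8 encoding/decoding than the Windows-1252 encoding traditionally aimed at English-speaking regions.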

That is a passage whose meaning I cannot really make out. So, as usual, I went looking for the corresponding part of the original article.

Use Unicode! - Dr. International

On Windows, it is much faster to encode or decode data using UTF-8 than with the legacy Windows-1252 encoding.

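I think this is the original of the second half. Read together with what precedes it, it seems to be saying (unsurprisingly enough) that when you compare conversion between Unicode (UTF-16) and either UTF-8 or Windows-1252, UTF-8 is the one that is "much faster to encode or decode data".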

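Trimming the branches and leaves off the first half as well leaves either 「性能低下は～必要とする」 ("performance degradation ... requires") or 「性能低下は～変換の支障は免れない」 ("performance degradation ... conversion hindrance cannot be avoided"), which makes even less sense.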

Something like "conversion requires (resources), so a performance drop is unavoidable" would make sense to me, though. So I looked at the original once more:

Performance
Modern operating systems are Unicode, usually UTF-16 or UTF-8 internally. Converting between other codepages, particularly table-based ones, consumes resources and takes time. Even if there’s a difference between UTF-8 data and a UTF-16 platform, that conversion is algorithmic and heavily optimized to be fast. You can’t get that from other “national” codepages. Because they require tables, which have to be mapped into memory, or at least the CPU cache, it slows down the conversions.

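Conversion between UTF-8 and UTF-16 can be done algorithmically, and (being heavily optimized) it is fast. Conversion between the other "national" codepages and Unicode (UTF-8 or UTF-16) cannot always be done algorithmically; some of them need tables. Those tables have to be mapped into memory or the CPU cache, and that "slows down the conversions". That is the point being made.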
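To make "algorithmic" a bit more concrete, here is a minimal sketch in C of going from UTF-8 to UTF-16 with nothing but shifts and masks. The function names utf8_decode and utf16_encode are made up for this note, and this is certainly not how Windows does it; real implementations are far more heavily optimized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (n bytes available).
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 on malformed input. No per-character mapping table is involved. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point allowed for each sequence length (rejects overlongs). */
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */
    if (n < len) return 0;

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/* Hypothetical helper: encode cp as UTF-16.
   Returns the number of 16-bit units written to out (1 or 2). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

A table-based codepage, by contrast, needs an array lookup per character, and that array is what has to sit in memory or cache.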

awk

awkが新しくなる！？本家AwkがUnicode (UTF-8)とCSV対応に！ - Qiita

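As an aside, while browsing the One True Awk issues I learned that One True Awk interprets -F 't' not as splitting fields on the character t but as tab-separated, with the same meaning as -F '\t'. Of course, neither the other awks nor POSIX awk have such a specification. Using t as a separator may be rare, but it is still quite a nasty trap. Incidentally, this trap has existed for a long time and there are uses of it in the wild, so apparently it will not be fixed.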

As for why it is like this (why such a thing is in there):

gawktexi.in

@c @cindex historical features
As a special case, in compatibility mode
(@pxref{Options}),
if the argument to @option{-F} is @samp{t}, then @code{FS} is set to
the TAB character.  If you type @samp{-F\t} at the
shell, without any quotes, the @samp{\} gets deleted, so @command{awk}
figures that you really want your fields to be separated with TABs and
not @samp{t}s.  Use @samp{-v FS="t"} or @samp{-F"[t]"} on the command line
if you really do want to separate your fields with @samp{t}s.
Use @samp{-F '\t'} when not in compatibility mode to specify that TABs
separate fields.

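So it is there to rescue the ~~scatterbrained~~ absent-minded people who forget the quotes. I suppose it belongs to the "a small kindness (rest omitted)" category.

By the way, this suddenly appeared in the middle of that exchange: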

awk: Make the historic -Ft to signify field separator as tab optional

That wasn’t what I meant (checking POSIX at compile time). Rather, gawk pays attention to the POSIXLY_CORRECT environment variable, and if it’s there, turns on the --posix option. There is one place in the OTA where this is done as well.

What is this "OTA"?

What does OTA stand for?

There seemed to be all sorts of expansions and I puzzled over it for a while, but it is an acronym for One True Awk, isn't it.

Unicode separated values (USV) don’t work · Issue #193 · onetrueawk/awk

indeed. USV is not supported in OTA.

I think this is the first time I have seen it. I wonder since when it has been in use?

By the way (part 2), looking at the awk source in UNIX V7:

v7unix/v7/usr/src/cmd/awk/main.c at master

} else if (argv[0][0] == '-' && argv[0][1] == 'F') {    /* set field sep */
        if (argv[0][2] == 't')  /* special case for tab */
                **FS = '\t';
        else
                **FS = argv[0][2];
        continue;

So the -F option takes only a single character as its argument? According to the V7 manual, awk(1) [v7 man page]:

A single character c may be used to separate the fields by starting the program with
   BEGIN { FS = "c" }
or by using the -Fc option.

As I thought, it looks like a single plain character only, not a regular expression. So that must have changed with nawk.

By the way (part 3),
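Pulled out into a stand-alone function, the V7 behaviour looks roughly like this. The name v7_field_sep is a hypothetical helper made up here, not something in the V7 source; it only mirrors the branch quoted above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the field separator that a V7-style "-Fc"
   argument would select, or 0 if arg is not a -F option (or has no
   character after -F). */
char v7_field_sep(const char *arg)
{
    assert(arg != NULL);
    if (strncmp(arg, "-F", 2) != 0)
        return 0;
    if (arg[2] == 't')      /* historic special case: -Ft means TAB */
        return '\t';
    return arg[2];          /* only the first character after -F is used */
}
```

So -Ft gives a tab, -F: gives a colon, and anything after the first character (as in -Fabc) is simply ignored.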

v7unix/v7/usr/src/cmd/awk/main.c at master

if (strcmp(argv[0], "a.out"))
        logit(argc, argv);

So it calls logit unless its own name is a.out (strcmp returns nonzero when the strings differ). Is that for debugging?
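Since strcmp returns 0 on a match, the condition is easy to read backwards. A tiny sketch, where should_log is a hypothetical name that merely wraps the same test:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical predicate mirroring the V7 condition:
   nonzero means "logit would be called". */
int should_log(const char *progname)
{
    assert(progname != NULL);
    /* strcmp() is 0 only when the two strings are identical. */
    return strcmp(progname, "a.out") != 0;
}
```

Note that the comparison is against the whole of argv[0], so a name such as ./a.out would still count as different.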

issues

The repository was lively, perhaps because of this release, so here is a memo.

Serial values

I wonder how the conversion logic between Excel serial values and dates is implemented.

Date → serial value seems relatively easy, but
serial value → date looks a bit of a pain.
— はけた＠できるExcel2021 (@excelspeedup) September 6, 2023

This has long been a question of mine too.
I wish they would publish it.
The algorithm in which 1900/2/29 exists. https://t.co/9pe99C7KMX
— エクセルの神髄 (@yamaoka_ss) September 6, 2023

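Wouldn't the calculation actually be simpler if 1900 is treated as a leap year as well (only up to 2099, though)? That is because, counting from January 1, 1900, every four-year block is then a fixed 366 + 365 × 3 = 1461 days. Alternatively, treat a year as 365.25 days, fiddle with that, and round or truncate the result suitably to get the number of years since 1900; then subtract the days up to the previous New Year's Eve, and the month and day follow from the days that remain.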
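A minimal sketch of the 1461-day idea in C. This is only my guess at one way to do it, not Excel's actual implementation (which is not public); struct ymd and serial_to_ymd are names invented here, and the serial convention assumed is 1 = 1900-01-01 with the fictitious 1900-02-29 as serial 60.

```c
#include <assert.h>

struct ymd { int year, month, day; };

/* Hypothetical helper: convert an Excel-style serial value to a date,
   assuming 1900 is a leap year. Valid for 1900-01-01 .. 2099-12-31 only,
   where every fourth year really is treated as a leap year. */
struct ymd serial_to_ymd(long serial)
{
    static const int mdays[2][12] = {
        {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31},
        {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}
    };
    struct ymd r;
    long d;
    int cycles, rem, leap, year_in_cycle, yday, m;

    assert(serial >= 1 && serial <= 73050);  /* 73050 = 50 * 1461 = 2099-12-31 */

    d = serial - 1;                 /* days since 1900-01-01 */
    cycles = (int)(d / 1461);       /* whole 4-year blocks */
    rem = (int)(d % 1461);          /* day within the block */

    leap = rem < 366;               /* first year of each block is the leap year */
    year_in_cycle = leap ? 0 : 1 + (rem - 366) / 365;
    yday = leap ? rem : (rem - 366) % 365;   /* 0-based day of year */

    r.year = 1900 + 4 * cycles + year_in_cycle;
    for (m = 0; yday >= mdays[leap][m]; m++)
        yday -= mdays[leap][m];
    r.month = m + 1;
    r.day = yday + 1;
    return r;
}
```

With this, serial 60 comes out as 1900-02-29 and serial 45185 as 2023-09-16. Past 2099 the fixed cycle breaks, because 2100 is not a leap year.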

Come to think of it, the granularity (or resolution) is different, but converting from UNIX time is much the same problem, so I had a look at glibc: glibc/time/offtime.c - bminor/glibc

/* Compute the `struct tm' representation of T,
   offset OFFSET seconds east of UTC,
   and store year, yday, mon, mday, wday, hour, min, sec into *TP.
   Return nonzero if successful.  */
int
__offtime (__time64_t t, long int offset, struct tm *tp)
{
  __time64_t days, rem, y;
  const unsigned short int *ip;

  days = t / SECS_PER_DAY;
  rem = t % SECS_PER_DAY;
  rem += offset;
  while (rem < 0)
    {
      rem += SECS_PER_DAY;
      --days;
    }
  while (rem >= SECS_PER_DAY)
    {
      rem -= SECS_PER_DAY;
      ++days;
    }
  tp->tm_hour = rem / SECS_PER_HOUR;
  rem %= SECS_PER_HOUR;
  tp->tm_min = rem / 60;
  tp->tm_sec = rem % 60;
  /* January 1, 1970 was a Thursday.  */
  tp->tm_wday = (4 + days) % 7;
  if (tp->tm_wday < 0)
    tp->tm_wday += 7;
  y = 1970;

#define DIV(a, b) ((a) / (b) - ((a) % (b) < 0))
#define LEAPS_THRU_END_OF(y) (DIV (y, 4) - DIV (y, 100) + DIV (y, 400))

  while (days < 0 || days >= (__isleap (y) ? 366 : 365))
    {
      /* Guess a corrected year, assuming 365 days per year.  */
      __time64_t yg = y + days / 365 - (days % 365 < 0);

      /* Adjust DAYS and Y to match the guessed year.  */
      days -= ((yg - y) * 365
	       + LEAPS_THRU_END_OF (yg - 1)
	       - LEAPS_THRU_END_OF (y - 1));
      y = yg;
    }
  tp->tm_year = y - 1900;
  if (tp->tm_year != y - 1900)
    {
      /* The year cannot be represented due to overflow.  */
      __set_errno (EOVERFLOW);
      return 0;
    }
  tp->tm_yday = days;
  ip = __mon_yday[__isleap(y)];
  for (y = 11; days < (long int) ip[y]; --y)
    continue;
  days -= ip[y];
  tp->tm_mon = y;
  tp->tm_mday = days + 1;
  return 1;
}

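This one earnestly(?) subtracts in a loop, although each pass guesses a year from days / 365 and corrects by the leap-day count rather than stepping one year at a time. Is there some deep reason for doing it this way?

One more, from musl.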
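To see how the loop actually behaves, here is a stand-alone copy of just the year part with a pass counter added. The names is_leap, div_floor, leaps_thru_end_of and days_to_year are my own stand-ins for the glibc macros and are assumptions made for this sketch, not glibc API.

```c
#include <assert.h>

/* Gregorian leap-year test (stand-in for glibc's __isleap). */
static int is_leap(long y)
{
    return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
}

/* Division rounding toward negative infinity (stand-in for DIV). */
static long div_floor(long a, long b)
{
    return a / b - (a % b < 0);
}

/* Leap years up to and including year y (stand-in for LEAPS_THRU_END_OF). */
static long leaps_thru_end_of(long y)
{
    return div_floor(y, 4) - div_floor(y, 100) + div_floor(y, 400);
}

/* Hypothetical helper: days is the day count since 1970-01-01.
   Returns the year, stores the 0-based day of year in *yday and the
   number of loop passes in *passes. */
long days_to_year(long days, int *yday, int *passes)
{
    long y = 1970;

    *passes = 0;
    while (days < 0 || days >= (is_leap(y) ? 366 : 365)) {
        /* Guess a year assuming 365 days per year. */
        long yg = y + days / 365 - (days % 365 < 0);

        /* Move DAYS and Y to the guessed year, accounting for leap days. */
        days -= (yg - y) * 365
                + leaps_thru_end_of(yg - 1)
                - leaps_thru_end_of(y - 1);
        y = yg;
        ++*passes;
    }
    *yday = (int)days;
    return y;
}
```

For day 19616 (2023-09-16) this reaches 2023 with day-of-year 258 in a single pass, so the loop is a guess-and-correct step rather than a walk through every year.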

__secs_to_tm.c\time\src - musl

#include "time_impl.h"
#include <limits.h>

/* 2000-03-01 (mod 400 year, immediately after feb29 */
#define LEAPOCH (946684800LL + 86400*(31+29))

#define DAYS_PER_400Y (365*400 + 97)
#define DAYS_PER_100Y (365*100 + 24)
#define DAYS_PER_4Y   (365*4   + 1)

int __secs_to_tm(long long t, struct tm *tm)
{
	long long days, secs, years;
	int remdays, remsecs, remyears;
	int qc_cycles, c_cycles, q_cycles;
	int months;
	int wday, yday, leap;
	static const char days_in_month[] = {31,30,31,30,31,31,30,31,30,31,31,29};

	/* Reject time_t values whose year would overflow int */
	if (t < INT_MIN * 31622400LL || t > INT_MAX * 31622400LL)
		return -1;

	secs = t - LEAPOCH;
	days = secs / 86400;
	remsecs = secs % 86400;
	if (remsecs < 0) {
		remsecs += 86400;
		days--;
	}

	wday = (3+days)%7;
	if (wday < 0) wday += 7;

	qc_cycles = days / DAYS_PER_400Y;
	remdays = days % DAYS_PER_400Y;
	if (remdays < 0) {
		remdays += DAYS_PER_400Y;
		qc_cycles--;
	}

	c_cycles = remdays / DAYS_PER_100Y;
	if (c_cycles == 4) c_cycles--;
	remdays -= c_cycles * DAYS_PER_100Y;

	q_cycles = remdays / DAYS_PER_4Y;
	if (q_cycles == 25) q_cycles--;
	remdays -= q_cycles * DAYS_PER_4Y;

	remyears = remdays / 365;
	if (remyears == 4) remyears--;
	remdays -= remyears * 365;

	leap = !remyears && (q_cycles || !c_cycles);
	yday = remdays + 31 + 28 + leap;
	if (yday >= 365+leap) yday -= 365+leap;

	years = remyears + 4*q_cycles + 100*c_cycles + 400LL*qc_cycles;

	for (months=0; days_in_month[months] <= remdays; months++)
		remdays -= days_in_month[months];

	if (months >= 10) {
		months -= 12;
		years++;
	}

	if (years+100 > INT_MAX || years+100 < INT_MIN)
		return -1;

	tm->tm_year = years + 100;
	tm->tm_mon = months + 2;
	tm->tm_mday = remdays + 1;
	tm->tm_wday = wday;
	tm->tm_yday = yday;

	tm->tm_hour = remsecs / 3600;
	tm->tm_min = remsecs / 60 % 60;
	tm->tm_sec = remsecs % 60;

	return 0;
}

This one does "compute" it.

T-shirt

Wow! Wow!
We have kindly been given a Monty Hall problem T-shirt by 近代科学社 (Kindai Kagaku Sha)!!!!!!!!
S o  c u t e ! #モンティ・ホール問題 pic.twitter.com/bqaYX4tkeM
— 書泉_MATH (@rikoushonotana) September 11, 2023

I like this T-shirt 😄

0 or 4

C++: "It's 0, obviously."
Python: "It's 0."
Nim: "Huh, surely it's 4?"
Julia: "It's 4." https://t.co/VW0n4iDQKq
— けむにく@競プロ (@kemuniku) September 10, 2023

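Memo

I like this kind of topic.

The slides are published here ↓https://t.co/QlcKbz0gmg
I have been doing research in this field for years, but this time I studied its history afresh, thought about where it stands when seen from outside, and went through trial and error to get the story across. I would be truly delighted if this talk gives all sorts of people a reason to take an interest in the field.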
— Yusuke Matsushita (@shiatsumat) September 11, 2023

ソフトウェアの科学～バグのない世界を目指して～

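Power over life and death

If an anime became popular in which someone who slays demons used the term open source in an appropriate and needlessly cool scene, perhaps the term open source would stop being misunderstood ()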
— 専門性・売上・原稿 (@golden_lucky) September 15, 2023

Lyrics

The part everyone sings as "♪Co-bra, hmm hmm hmm hmm" is actually

Leaving me blue
Missing you true
Only few memories after you
— おちょごさん　10/7(土) わらリーマン2部出演 (@chogo2009) September 15, 2023

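Sakudaira

A statue of Jagi from Fist of the North Star has arrived ✊

A statue of Jagi from Fist of the North Star has been installed inside Sakudaira Station!
Please take a look when you use Sakudaira Station ☺️ pic.twitter.com/gbfbccREuI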
— 長野県佐久市(公式） (@saku_city) September 15, 2023
