ときどきの雑記帖 RE* (新南口)

Winter Magic

February 28, 2022

Crowdfunding

Oh, what's this.

髙荷義之原画展2022開催プロジェクト(By 株式会社アルト出版) - クラウドファンディング | Kibidango【きびだんご】

Refunds

自販機にお金がのまれた。どうしよう？（デジタルリマスター） :: デイリーポータルZ

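I too have had a vending machine fail to dispense a product, just once, and that was when I paid with Suica. As in the article, I called the number printed on the machine and a refund was arranged, but to my surprise it arrived by registered cash mail (sent to the address I gave over the phone). I thought the postage must have cost a good deal more than a single drink, but if the refund had to be handed over at that station instead (assuming it can't be done through the station staff), then once you count the labor cost, maybe registered mail really is cheaper 😄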

Sealed class

Groovy has them too.

Groovy 4.0.0 Introduces Switch Expressions and Sealed Types

What’s the more productive?

What’s the more productive? : smalltalk

without alphanumerics

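A bit of fun about whether Clojure lets you write programs without using any alphanumeric characters.

Apparently (+) is 0 and (*) is 1 :-)

Building up from function literals, unquote splice (~@), the fact that vectors are functions, and so on, it goes as far as implementing recursion https://t.co/oRoJWbgpJp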
— 高梨陣平 (@jingbay) February 24, 2022

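(+) being 0 and (*) being 1 is the same kind of thing as the story that went around a while ago about JavaScript's max returning -Infinity when called with no arguments, and min likewise returning Infinity.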

$ deno
Deno 1.13.1
exit using ctrl+d or close()
> Math.min
[Function: min]
> Math.min()
Infinity
> Math.max()
-Infinity

Trying it in xyzzy's Lisp instead of Clojure (the error message 引数が少なすぎます means "too few arguments"):

(+)
0
(*)
1
(min)
引数が少なすぎます: (min)
(max)
引数が少なすぎます: (max)

Hmm? 🤔

By the way,
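One way to read these results: an operation called with zero arguments can only return something sensible if it has an identity element, which is 0 for +, 1 for *, Infinity for min and -Infinity for max. A language then either returns that identity or treats the empty call as an error, as xyzzy's Lisp does for min and max. A minimal Python sketch of both choices (the helper name fold is made up for illustration):

```python
import math
import operator
from functools import reduce

def fold(op, identity, xs):
    """Fold xs with op, starting from op's identity element."""
    return reduce(op, xs, identity)

# With an identity element, the empty case is well defined.
print(fold(operator.add, 0, []))     # like (+)
print(fold(operator.mul, 1, []))     # like (*)
print(fold(min, math.inf, []))       # like Math.min()
print(fold(max, -math.inf, []))      # like Math.max()

# Python's built-in min takes the other route and rejects the empty case,
# unless the caller supplies a default explicitly.
try:
    min([])
except ValueError as e:
    print("error:", e)
print(min([], default=math.inf))
```

Seen this way, JavaScript and xyzzy simply made different choices for the empty case of min and max.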

Swearjure - Clojure without alphanumerics

posted 09 Jan 2013

So that was quite an old topic being dug up again.

intersection of character classes

I had the impression this was still an experimental feature, but it turns out another document already uses it in an example.

perldata - Perl data types - Perldoc Browser

Identifier parsing

(omitted)

/ (?[ ( \p{Word} & \p{XID_Start} ) + [_] ])
 (?[ ( \p{Word} & \p{XID_Continue} ) ]) *    /x

In fact, looking at perlrecharclass - Perl Regular Expression Character Classes - Perldoc Browser:

Extended Bracketed Character Classes

This is a fancy bracketed character class that can be used for more readable and less error-prone classes, and to perform set operations, such as intersection. An example is
/(?[ \p{Thai} & \p{Digit} ])/

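The passage above is then followed by "This is an experimental feature available starting in 5.18, and is subject to change as we gain field experience with it".

Incidentally, other programming languages (and libraries) also have this character class feature, but the syntax differs subtly from one to the next.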

Character Class Intersection in Regular Expressions

Character Class Intersection

Character class intersection is supported by Java, JGsoft V2, and by Ruby 1.9 and later. It makes it easy to match any single character that must be present in two sets of characters. The syntax for this is [class&&[intersect]]. You can use the full character class syntax within the intersected character class.

Regex character class operations: subtraction, intersection, union

Character Class Subtraction in Java and Ruby 1.9+

Java and Ruby do not have dedicated syntax for character class subtraction. Rather, the feature is just a logical by-product of their character intersection syntax. For instance,
[a-z&&[^aeiou]]
matches characters that are both English lowercase letters and not vowels. In effect, it subtracts the vowel class [aeiou] from the class of letters [a-z]. The effect is to match all English lowercase consonants.

Character Class Subtraction in .NET

In .NET, with […-[…]], you can specify that the character to be matched belongs to a certain class (everything before the hyphen), except if it belongs to another class (the embedded character class, which is “subtracted” by the hyphen).

For instance, the class

[a-z-[aeiou]]
matches an English lower-case consonant.
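As far as I know, Python's standard re module has neither the && syntax nor the -[...] syntax. As a rough sketch, and as an emulation only (it is not any of the syntaxes quoted above), the same "lowercase consonant" class can be approximated with a lookahead: assert one class at the current position, then match a character from the other. The names CONSONANT and consonants are just for illustration.

```python
import re

# Emulated intersection of [a-z] and [^aeiou]:
# the lookahead requires the next character to be in [a-z],
# then [^aeiou] consumes it only if it is not a vowel.
CONSONANT = re.compile(r"(?=[a-z])[^aeiou]")

def consonants(text):
    """Return every lowercase ASCII consonant in text, in order."""
    return CONSONANT.findall(text)

print(consonants("hello, world"))
```

The comma and the space are rejected by the lookahead, and the vowels are rejected by the negated class.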

langinfo

I have been tracing through glibc's locale-related code (or rather its data structures), but honestly I don't really get it 😄

glibc/wctype/wchar-lookup.h

glibc/C-ctype.c defines the tables for the classes that exist from the start (the built-in ones):

#define STRUCT_CTYPE_CLASS(p, q) \
  struct                                                                      \
    {                                                                         \
      uint32_t isctype_data[8];                                               \
      uint32_t header[5];                                                     \
      uint32_t level1[1];                                                     \
      uint32_t level2[1 << q];                                                \
      uint32_t level3[1 << p];                                                \
    }

... (omitted)

const STRUCT_CTYPE_CLASS(1, 1) _nl_C_LC_CTYPE_class_upper attribute_hidden =
{
  { 0x00000000, 0x00000000, 0x07fffffe, 0x00000000,
    0x00000000, 0x00000000, 0x00000000, 0x00000000
  },
  { 7, 1, 6, 1, 1 },
  /* 1st-level table */
  { 6 * sizeof (uint32_t) },
  /* 2nd-level table */
  { 0, 8 * sizeof (uint32_t) },
  /* 3rd-level table */
  { 0x07fffffe, 0x00000000 }
};

There are several definitions like this one, but I couldn't work out what the values placed in header[5] mean:

const STRUCT_CTYPE_CLASS(1, 1) _nl_C_LC_CTYPE_class_upper attribute_hidden =
  { 7, 1, 6, 1, 1 },
const STRUCT_CTYPE_CLASS(1, 1) _nl_C_LC_CTYPE_class_lower attribute_hidden =
  { 7, 1, 6, 1, 1 },
const STRUCT_CTYPE_CLASS(1, 1) _nl_C_LC_CTYPE_class_alpha attribute_hidden =
  { 7, 1, 6, 1, 1 },
const STRUCT_CTYPE_CLASS(1, 0) _nl_C_LC_CTYPE_class_digit attribute_hidden =
  { 6, 1, 6, 0, 1 },
const STRUCT_CTYPE_CLASS(2, 0) _nl_C_LC_CTYPE_class_xdigit attribute_hidden =
  { 7, 1, 7, 0, 3 },
const STRUCT_CTYPE_CLASS(1, 0) _nl_C_LC_CTYPE_class_space attribute_hidden =
  { 6, 1, 6, 0, 1 },
const STRUCT_CTYPE_CLASS(2, 0) _nl_C_LC_CTYPE_class_print attribute_hidden =
  { 7, 1, 7, 0, 3 },
const STRUCT_CTYPE_CLASS(2, 0) _nl_C_LC_CTYPE_class_graph attribute_hidden =
  { 7, 1, 7, 0, 3 },
const STRUCT_CTYPE_CLASS(1, 0) _nl_C_LC_CTYPE_class_blank attribute_hidden =
  { 6, 1, 6, 0, 1 },
const STRUCT_CTYPE_CLASS(2, 0) _nl_C_LC_CTYPE_class_cntrl attribute_hidden =
  { 7, 1, 7, 0, 3 },
const STRUCT_CTYPE_CLASS(2, 0) _nl_C_LC_CTYPE_class_punct attribute_hidden =
  { 7, 1, 7, 0, 3 },
const STRUCT_CTYPE_CLASS(2, 0) _nl_C_LC_CTYPE_class_alnum attribute_hidden =
  { 7, 1, 7, 0, 3 },

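When the macro is given the same arguments, the header seems to come out the same as well... I puzzled over that for a while, but come to think of it, the header is exactly what the lookup reads.

glibc/wctype/wchar-lookup.h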

static __inline int
__attribute ((always_inline))
wctype_table_lookup (const char *table, uint32_t wc)
{
  uint32_t shift1 = ((const uint32_t *) table)[0];
  uint32_t index1 = wc >> shift1;
  uint32_t bound = ((const uint32_t *) table)[1];
  if (index1 < bound)
    {
      uint32_t lookup1 = ((const uint32_t *) table)[5 + index1];
      if (lookup1 != 0)
        {
          uint32_t shift2 = ((const uint32_t *) table)[2];
          uint32_t mask2 = ((const uint32_t *) table)[3];
          uint32_t index2 = (wc >> shift2) & mask2;
          uint32_t lookup2 = ((const uint32_t *)(table + lookup1))[index2];
          if (lookup2 != 0)
            {
              uint32_t mask3 = ((const uint32_t *) table)[4];
              uint32_t index3 = (wc >> 5) & mask3;
              uint32_t lookup3 = ((const uint32_t *)(table + lookup2))[index3];

              return (lookup3 >> (wc & 0x1f)) & 1;
            }
        }
    }
  return 0;
}

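Judging by the names of the variables that receive ((const uint32_t *) table)[0] through ((const uint32_t *) table)[4], these correspond to the five-(long)word header.

Even so it still didn't quite click, but when I took another look at the comment immediately before wctype_table_lookup, the answer was right there.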
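To check that reading, here is a rough Python port of wctype_table_lookup, run against the _nl_C_LC_CTYPE_class_upper data quoted earlier. It assumes that the table pointer handed to the function points at header[0] (that is, just past isctype_data[8]), which I have not confirmed in the glibc sources. The offsets stored in the tables are in bytes, so the port divides them by 4 to index a flat list of words.

```python
WORD = 4  # sizeof (uint32_t)

# header, level1, level2, level3 of _nl_C_LC_CTYPE_class_upper, flattened
UPPER = [
    7, 1, 6, 1, 1,            # header: shift1, bound, shift2, mask2, mask3
    6 * WORD,                 # 1st-level table
    0, 8 * WORD,              # 2nd-level table
    0x07FFFFFE, 0x00000000,   # 3rd-level table
]

def wctype_table_lookup(table, wc):
    """Return 1 if wc belongs to the class described by table, else 0."""
    shift1, bound, shift2, mask2, mask3 = table[:5]
    index1 = wc >> shift1
    if index1 >= bound:
        return 0
    lookup1 = table[5 + index1]                 # byte offset of a 2nd-level block
    if lookup1 == 0:
        return 0
    index2 = (wc >> shift2) & mask2
    lookup2 = table[lookup1 // WORD + index2]   # byte offset of a 3rd-level block
    if lookup2 == 0:
        return 0
    index3 = (wc >> 5) & mask3
    lookup3 = table[lookup2 // WORD + index3]   # one word = 32 bits of class data
    return (lookup3 >> (wc & 0x1F)) & 1

for ch in "AZaz1@[":
    print(repr(ch), wctype_table_lookup(UPPER, ord(ch)))
```

Under that assumption the port reports membership for A through Z only, which is at least consistent with the table being the "upper" class.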

/* Bit tables are accessed by cutting wc in four blocks of bits:
   - the high 32-q-p bits,
   - the next q bits,
   - the next p bits,
   - the next 5 bits.

            +------------------+-----+-----+-----+
     wc  =  +     32-q-p-5     |  q  |  p  |  5  |
            +------------------+-----+-----+-----+

   p and q are variable.  For 16-bit Unicode it is sufficient to
   choose p and q such that q+p+5 <= 16.

   The table contains the following uint32_t words:
   - q+p+5,
   - s = upper exclusive bound for wc >> (q+p+5),
   - p+5,
   - 2^q-1,
   - 2^p-1,
   - 1st-level table: s offsets, pointing into the 2nd-level table,
   - 2nd-level table: k*2^q offsets, pointing into the 3rd-level table,
   - 3rd-level table: j*2^p words, each containing 32 bits of data.
*/

As a table (the columns are the (p, q) arguments given to the macro), it looks something like this:

(1,0)  (1,1)  (2,0)  Expression  Name
6      7      7      q+p+5       shift1
1      1      1      s           bound
6      6      7      p+5         shift2
0      1      0      2^q-1       mask2
1      1      3      2^p-1       mask3

                                 lookup1
                                 lookup2
                                 lookup3
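The five header words follow mechanically from p and q (plus s). A small sketch, with a made-up function name ctype_header, that reproduces the three distinct headers seen in C-ctype.c; s defaults to 1 because every table quoted above uses 1.

```python
def ctype_header(p, q, s=1):
    """Header words in the order listed in the wchar-lookup.h comment."""
    return [
        q + p + 5,       # shift1
        s,               # bound
        p + 5,           # shift2
        (1 << q) - 1,    # mask2
        (1 << p) - 1,    # mask3
    ]

for p, q in [(1, 0), (1, 1), (2, 0)]:
    print((p, q), ctype_header(p, q))
```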

s = upper exclusive bound for wc >> (q+p+5)

Hmm.

So handling characters in the upper range of code points would make the 1st-level table larger?
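Going only by the definition of s in that comment, the arithmetic would be as follows. This is a back-of-the-envelope sketch: the function name and the sample (p, q) values are mine, and I have not checked which values glibc's locale compiler actually chooses.

```python
def first_level_entries(max_wc, p, q):
    """Smallest s such that (max_wc >> (q+p+5)) < s."""
    return (max_wc >> (q + p + 5)) + 1

print(first_level_entries(0x7F, 1, 1))       # ASCII only, with the C-locale shift
print(first_level_entries(0x10FFFF, 1, 1))   # same p and q, all of Unicode
print(first_level_entries(0x10FFFF, 7, 4))   # larger p and q, all of Unicode
```

With p and q fixed, the 1st-level table does grow with the highest code point covered. Raising p and q keeps it small, at the cost of larger 2nd-level blocks (2^q offsets each) and 3rd-level blocks (2^p words each).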

Hugo notes

It looks like 0.93.0 has been released:

Release v0.93.0 · gohugoio/hugo

Markdown diagrams and code block render hooks. Now it’s possible create custom templates for Markdown code blocks, either for all or just for specific programming languages. This can also be used to render diagrams from Markdown code blocks. We provide GoAT (Go ASCII Tool) natively from Hugo, but you can also add your own template to get Mermaid support. The implementation of GoAT is a Go implementation by @blampe of markdeep.mini.js’ diagrams.

Now this...!

"Everyone thinks it has to be done"
"And"
"everyone also thinks they don't want to do it" #何かを受信
— 幻鳥@沼先案内人 (@2nd_junkey) February 26, 2022
