ときどきの雑記帖 RE* (新南口)
東京の優しい掟
素数夜曲
ちょっと前のサンデー毎日なんだけど トミノカントクが「本棚探偵」 という連載でゲストとして登場していて
その記事の中でこんなくだりが。
「タイトルだけで決めることもあって、 わけのわからない本も集まる。 『素数夜曲』なんて、数学の本なんだけど、 シャレが効いているでしょう(笑)。 最初の2ページくらいで挫折したけど、 全然理解できなくてもすごい熱量をもって書かれているのはわかる。 そういう感覚を味わえるのも面白いんですよ」
確かに「手強い」本だよねえ。 ってもう出てから10年なのか>素数夜曲
gawk
5.1.65 をちょっと眺めてみた。
hash
fnv1a ハッシュに関するChangeLogのエントリはこの辺か。
2021-03-21 Arnold D. Robbins <arnold@skeeve.com>
* str_array.c (fnv1a_hash_string): New function.
(str_array_init): Use fnv1a_hash_string if AWK_HASH env var
set to "fnv1a".
で、そのstr_array.cを見るとハッシュ関数が三つあった。
ひとつめ。
/* awk_hash --- calculate the hash function of the string in subs */
static unsigned long
awk_hash(const char *s, size_t len, unsigned long hsize, size_t *code)
{
unsigned long h = 0;
unsigned long htmp;
/*
* Ozan Yigit's original sdbm hash, copied from Margo Seltzers
* db package.
*
* This is INCREDIBLY ugly, but fast. We break the string up into
* 8 byte units. On the first time through the loop we get the
* "leftover bytes" (strlen % 8). On every other iteration, we
* perform 8 HASHC's so we handle all 8 bytes. Essentially, this
* saves us 7 cmp & branch instructions. If this routine is
* heavily used enough, it's worth the ugly coding.
*/
/*
* Even more speed:
* #define HASHC h = *s++ + 65599 * h
* Because 65599 = pow(2, 6) + pow(2, 16) - 1 we multiply by shifts
*
* 4/2011: Force the results to 32 bits, to get the same
* result on both 32- and 64-bit systems. This may be a
* bad idea.
*/
#define HASHC htmp = (h << 6); \
h = *s++ + htmp + (htmp << 10) - h ; \
htmp &= 0xFFFFFFFF; \
h &= 0xFFFFFFFF
h = 0;
/* "Duff's Device" */
if (len > 0) {
size_t loop = (len + 8 - 1) >> 3;
switch (len & (8 - 1)) {
case 0:
do { /* All fall throughs */
HASHC;
case 7: HASHC;
case 6: HASHC;
case 5: HASHC;
case 4: HASHC;
case 3: HASHC;
case 2: HASHC;
case 1: HASHC;
} while (--loop);
}
}
if (code != NULL)
*code = h;
if (h >= hsize)
h %= hsize;
return h;
}
ふたつめ。 GNU Smalltalkで使われているものと同じもの?
/*
Here is the hash function I'm using in GNU Smalltalk. The scrambling is
needed if you use powers of two as the table sizes. If you use primes it
is not needed.
To use double-hashing with power-of-two size, you should use the
_gst_hash_string(str, len) as the primary hash and
scramble(_gst_hash_string (str, len)) | 1 as the secondary hash.
Paolo
*/
/*
* ADR: Slightly modified to work w/in the context of gawk.
*/
static unsigned long
gst_hash_string(const char *str, size_t len, unsigned long hsize, size_t *code)
{
unsigned long hashVal = 1497032417; /* arbitrary value */
unsigned long ret;
while (len--) {
hashVal += *str++;
hashVal += (hashVal << 10);
hashVal ^= (hashVal >> 6);
}
ret = scramble(hashVal);
if (code != NULL)
*code = ret;
if (ret >= hsize)
ret %= hsize;
return ret;
}
static unsigned long
scramble(unsigned long x)
{
if (sizeof(long) == 4) {
int y = ~x;
x += (y << 10) | (y >> 22);
x += (x << 6) | (x >> 26);
x -= (x << 16) | (x >> 16);
} else {
x ^= (~x) >> 31;
x += (x << 21) | (x >> 11);
x += (x << 5) | (x >> 27);
x += (x << 27) | (x >> 5);
x += (x << 31);
}
return x;
}
みっつめ。 今回追加されたfnv1a。
/* fnv1a_hash_string --- fnv1a hash function */
/*
* FNV-1a hash function
* http://www.isthe.com/chongo/tech/comp/fnv/index.html
*/
static unsigned long
fnv1a_hash_string(const char *str, size_t len, unsigned long hsize, size_t *code)
{
/* FNV-1a */
register unsigned ret = 2166136261U;
while (len > 0) {
ret ^= (unsigned char) (*str++);
ret *= 16777619U;
len-- ;
}
if (code != NULL)
*code = ret;
if (ret >= hsize)
ret %= hsize;
return ret;
}
なんかずいぶん短いなこれ。
pma
お次はpma(Persistent Memory Allocation)
2022-06-09 Arnold D. Robbins <arnold@skeeve.com>
* custom.h: Deal with use/non-use of persistent malloc.
* main.c (main): Bracket mtrace call in #ifndef. Call pma_init()
first thing.
* configure.ac (GAWK_USE_PERSISTENT_MALLOC): Add call.
custom.hを見ると単純にプリプロセッサを使った置き換えで ごにょごにょしているようだ。
#ifdef USE_PERSISTENT_MALLOC
#include <stdlib.h>
#include "pma.h"
#define malloc pma_malloc
#define calloc pma_calloc
#define realloc pma_realloc
#define free pma_free
#else /* ! USE_PERSISTENT_MALLOC */
#define pma_init(verbose, file) 0
#define pma_get_root() NULL
#define pma_set_root(rootptr) /* nothing */
#define pma_errno 0
#endif /* ! USE_PERSISTENT_MALLOC */
実装のあるpma.cを見ると最初の著作権表記でちょっと気になることが
/* "pma", a persistent memory allocator (implementation)
Copyright (C) 2019, 2022 Terence Kelly
Contact: tpkelly @ { acm.org, cs.princeton.edu, eecs.umich.edu }
Home: http://web.eecs.umich.edu/~tpkelly/pma/ [or "https"]
Design: HTML: https://queue.acm.org/detail.cfm?id=3534855
PDF: https://dl.acm.org/doi/pdf/10.1145/3534855
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
Code is intended to be (mostly) C99 / POSIX 2017.
Compile and link with your programs in the obvious way. If
assertions are enabled (the default) extensive integrity checks
are frequently performed; these checks may be slow, depending on
the size and state of the persistent heap.
*/
「GNU Affero General Public License」とな。 とは言え either version 3 of the License, or (at your option) any later version. と続いているから、 gawk本体のライセンスに影響はない という理解でよいのだろうか。
pma.hを見ると custom.hで置き換えマクロに登場していた関数がある。
extern int pma_init(int verbose, const char *file);
/* The following four functions may be used like their standard
counterparts. See notes on pma_init for fallback to standard
allocator. See notes on fork in README. */
extern void * pma_malloc(size_t size);
extern void * pma_calloc(size_t nmemb, size_t size);
extern void * pma_realloc(void *ptr, size_t size);
extern void pma_free(void *ptr);
/* The following two functions access the persistent heap's root
pointer, which must either be NULL or must point within a block of
persistent memory. If the pointer passed to pma_set_root is not a
pointer returned by pma_malloc/calloc/realloc, that is likely a
bug, though the implementation is not obliged to check for such
bugs. */
extern void pma_set_root(void *ptr);
extern void * pma_get_root(void);
/* Prints to stderr details of internal data structures, e.g., free
lists, and performs integrity checks, which may be very slow. */
extern void pma_check_and_dump(void);
/* Set non-metadata words of free blocks to given value. Useful for
debugging (v == 0xdeadDEADbeefBEEF) and for preparing backing file
to be re-sparsified with fallocate (v == 0x0). */
extern void pma_set_avail_mem(const unsigned long v);
root云々はよくわからんけど ふつーのmallocなどと呼び出しは互換ということか。
TODO
ところでTODOも見てみると興味深いものがあって。
大きく以下の分類があるのだけど
- Minor Cleanups and Code Improvements
- Minor New Features
- Major New Features
- Things To Think About That May Never Happen
- Things That We Decided We Will Never Do
その中のMajor New Featuresにはこうある。
Major New Features
------------------
Think about how to generalize indirect access. Manuel Collado
suggests things like
foo = 5
@"foo" += 4
Also needed:
Indirect through array elements, not just scalar variables
Add ability to do decimal arithmetic.
Rework management of array index storage. (Partially DONE.)
Consider using an atom table for all string array indices.
DBM storage of awk arrays. Try to allow multiple dbm packages.
?? A RECLEN variable for fixed-length record input. PROCINFO["RS"]
would be "RS" or "RECLEN" depending upon what's in use.
*** Could this be done as an extension?
?? Use a new or improved dfa and/or regex library.
Rewrite in C++.
Rewrite in C++.
?!
あと
DBM storage of awk arrays
って
(dbm使ってないけど)
今回のpma導入で実現できてない?