ときどきの雑記帖' 2013年6月(下旬)

翔泳社が電書を。というアナウンスがあったんだけど [販売開始] 電子書籍3点ざっと思いつくだけでもこの辺が出てこないのは納得いかない (いやまあ作るときに電書を考慮してなかったというのはありそうだけれども)

目標額到達したかー The Untold History of Japanese Game Developers by John Szczepaniak » We did it! The goal has been reached! — Kickstarter We've made it! As of typing this, we are over the £50k mark. 今朝方見たときはゴールまで3000ポンド近く残ってたような気がするのだけど怒濤の追い込み (ってまだ数日ある)。しかし、ひとりで5000ポンド(以上) 出している人がいらっしゃるのね…

やっと読んだ泳ぐやる夫シアターやらない夫が南北戦争を戦いぬくようです　第６回　【西部戦域】　アルバート・ジョンストンの憂鬱

半額セール。apressもアレを(ry Save 50% on Early Release Ebooks - Deals - O'Reilly Media

ああっ。早く七色星団に向けて艦隊が出撃するとき(いや、その前に集結するときだったか?) のBGMをリピート再生しまくりたい。余裕があればアキバまで買いに出たんだけど

■_

■_ 完結編

数式の部分なあ… Bernoulli » Hummus and Magnets

Bernoulli

by Christian Plesner Hansen Posted on December 10, 2012	

It's Ada Lovelace's 197th birthday today and the perfect day for my last post about Babbage‘s computing
engines. This one is about the famous first program for calculating the Bernoulli series which appeared
in Note G of her notes on Babbage's analytical engine. One of the odd things about this program is that
it's widely known and recognized as an important milestone – but as far as I can determine not widely 
understood. I haven't found even one description of what it does and how it does it, except for the
original note.

今日 (12月10日) は Ada Lovelace の 197回目の誕生日であり、Babbage の computing engine に関してわたしが
行った一連の投稿の最後を飾るのにふさわしい日です。今回はバベジの解析機関についての Ada によるノートの
Note G に記載されている Bernoulli series を計算する有名な first program に触れます｡ このプログラムの
odd things の一つが、(このプログラムは) 重要なマイルストーンであると広く知られていて、また、そのよ
うに認識がされているにもかかわらず、わたしが判断できる限りそのことが広く理解されていないということです。
このプログラムがやっていること (what it does) や、どのようにやっているのか (how it does) についての
description (解説、説明）を、original note を除いてわたしは一つとして見たことがありませんでした。


In this post I'll give such a description. It has roughly three parts. First I'll give a quick context
– what is Note G, that kind of thing. Then I'll derive the mathematical rules the program is based on.
I'll roughly follow the same path as Note G but I'll include a lot more steps which should make it
easier to digest. Note G is very concise and leaves a lot of steps to the reader and the derivation is
quite clever so some of those steps can be tricky. (You can also just skip that part altogether and
take the result at the end as given.)

今回の投稿でわたしはそういった解説をしようと考えています。記事は大まかに三つの部分から構成されています。
まずは quick context について。つまり、Note G とはなんなのか、どういったものかということについてで
す。次に、program が based on している mathematical rules を 導き出し (derive) ます。大枠では Note G と同
じ path に従いますが、digest をより簡単にするために多くのステップを含んでいます。Note G は very concise
ですが多くのステップを読者任せにしていて､その derivation はとても clever であり幾つかのステップは tricky
になっています (You can also just skip that part altogether and take the result at the end as given)。


In the last part I'll write up a modern-style program in C that does the same as the Note G program
and then break it down to get the program itself. Finally I'll do what you might call a code review of
the program.

最後のパートでは Note G にあったプログラムと同じことをする modern-style のプログラムを C で書きます。
そしてそのプログラム自身を理解するために break it down します。最後にあなたがコードレビューと呼ぶだろ
うものをプログラムに対して行います。

Okay, let's get started.
では始めましょう。


Background 背景


I won't go into too many general details about Babbage, Ada Lovelace, or the analytical engine. There's
plenty of resources on them and their history on the web, including other posts I've written about the
difference engine (what motivated it, how did it work) and programming the analytical engine (table code,
microcode, and punched cards).

バベジや Ada Lovelace，あるいは解析機関に関する非常に多くの general details には踏み込みません。そう
いったものに関する resource は、わたしがこの post の他に階謝幾関について書いたもの (今回一連の post
を行った動機) や (table code, micro code, punched cards のような) 解析機関のプログラミングについて書い
たものを含めて web 上に大量にあります。


In 1840, after Babbage had been working on the analytical engine for a while, he went to Turin and
discussed his ideas with a group of engineers and mathematicians. Based on those conversations one of
them, Luigi Menabrea, published an article about the engine in French in 1842. Soon after that Ada
Lovelace translated the article into English and Babbage encouraged her to write an original article.
She did – sort of – but rather than write a separate article she did it in the form of notes to her 
translation of Menabrea's article. Just to give an idea of how much she added, her translation with
notes, published in 1843, was three times as long as the original. One of the notes, Note G (they were
named from A to G), presented the program I'll describe in this post.

1840年、バベジはしばらくの間解析機関のために作業した後で Turin に行き、彼は自分のアイデアについて数学
者やエンジニアのグループとディスカッションしました。参加者の一人であった Luigi Menabrea は
conversations に基づいた engine についての article を 1842年にフランス語で出版しました｡ それを即座に
Ada Lovelace が英語に翻訳しましたが、バベジは original article を書くよう彼女を促しました。Ada は執筆
こそしましたが、しかしそれは separate article というよりはむしろ、彼女が行った Menabrea の article の
翻訳に対する注釈といったものでした。彼女がどのくらい追加したのかという an ideaを give するためだけに
彼女の翻訳は注釈つきで1843年に出版されましたが、これは元となったものの三倍ほどの分量がありました。その
ノートの一つである note G (AからGまでありました) に、今回わたしが説明しようとしているプログラムがあります。


We'll get to that very soon but first a few words about what it calculates, the Bernoulli series.

最初の数語を見ればすぐに何を計算しているかは理解できるでしょう。そう、Bernoulli series
(ベルヌーイ数列）です。 


Bernoulli


This is the first few values of the Bernoulli series:
以下に示すのは Bernoulli series の先頭からの幾つかです

    Name        B0 	B1 	B2 	B3 	B4 	B5 	B6 	B7 	…
    Value 	1 	-1/2 	1/6 	0 	-1/30 	0 	1/42 	0 	…

It's one of those mathematical objects, like e and π, that keep appearing in different places and seem
to have a special significance within the structure of mathematics. One place it appears is in Taylor
expansions of exponential and trigonometric functions – for instance, it holds that

Bernoulh series は e や π と同じくmathematical Objects の一つで、diffrentplaces で appear し続けているも
のです。この Bernoulli series は structure of mathematics において special signinficance を持っているよ
うに思われます｡ Bernoulli series は指数関数や三角関数の Taylor expansions にも現れます。たとえば､
it holds that


  \frac{x}{e^x-1} = \sum_{i=0}^{\infty}\frac{x^i}{i!}B_i = \frac{1}{0!}B_0 + \frac{x}{1!}B_1 + \frac{x^2}{2!}B_2 + \cdots


The Bernoulli series is not that difficult to compute but it's not trivial either. If 
you want to demonstrate the power of a computing engine it's not a bad choice.

この Bernoulli series は計算が難しいものではありませんが、trivial なものでもありません。
computing engine の威力をデモンストレーションするのなら悪くない選択です。


Part of Note G is concerned with deriving a way to calculate the series and that's what we'll start with,
using the formula above in combination with a second identity, one you're probably already familiar with,
the Taylor expansion of e^x:

Note G のpart は series を計算するための deriving a way (導出方法?) と関係しています。そして、 
that's what we'll start with,
using the formula above in combination with a second identity、
すでに見おぼえがあるだろう e^x の Taylore expansion から始めます：


  e^x = \sum_{i=0}^{\infty}\frac{x^i}{i!} = \frac{1}{0!} + \frac{x}{1!} + \frac{x^2}{2!} + \cdots


If we plug this into the left-hand side of the previous equation in place of e^x we get

これを前述の equation の左辺にあった ex に plug into すると次のようになります 


  \frac{x}{e^x-1} = \frac{x}{\left(\frac{1}{0!} + \frac{x}{1!}+\frac{x^2}{2!}+\cdots\right)-1}=\frac{x}{\frac{x}{1!}+\frac{x^2}{2!}+\frac{x^3}{3!}+\cdots}=\frac{1}{\frac{1}{1!}+\frac{x}{2!}+\frac{x^2}{3!}+\cdots}

This means that the original equation can also be written as
これはつまり、original equation は次のようにも書けるということです

  \frac{1}{\frac{1}{1!} + \frac{x}{2!} + \frac{x^2}{3!} + \cdots} =\frac{1}{0!}B_0+\frac{x}{1!}B_1+\frac{x^2}{2!}B_2+\cdots

Multiplying the denominator from the left-hand side onto both sides we get this identity:
denominator from the left side を両辺に乗ずると次の identity が得られます


  1 = \left(\frac{1}{0!}B_0+\frac{x}{1!}B_1+\frac{x^2}{2!}B_2+\cdots\right)\left(\frac{1}{1!} + \frac{x}{2!} + \frac{x^2}{3!} + \cdots\right)


This looks like we've gone from bad to worse, right? Both series on the right are infinite and we have
an inconvenient variable x stuck in there which we need to get rid of. This is where the cleverness
comes in.

This looks like we've gone from bad to worse, right？
右辺にある series はどちらも infnite であり、取り除く必要のある場所にはまり込んだ inconvenient variable
の x があります。
This is where the clevernes comes in.


We're not going to multiply these two series together, but if we were to do it we know 
what the result would look like. It would be a new series in x of the form:

これらふたつの series を一緒に multiply はしませんが、もし行なったならわたしたちは結果が似たようなもの
になると理解したでしょう。それは new series in x of the form でこうなります 

  c_0+c_1x+c_2x^2+\cdots

The 1 on the left-hand side even tells us what those coefficients are going to be: c0 is going to be 1
and all the remaining cis are going to be 0. And even though the full product will be infinite, the
individual coefficients are all nice and finite; those we can derive. Here is the first one, which we
know will be 1:

C0 は 1に近づき、それ以外の Ci はすべて 0に近づいていきます。そして full product は無限大になりますが、
個々の coefficients (係数はすべて nice and finite で微分可能です。以下に示すのはその最初の一つで、
これが1になることをわたしたちは知っています。


  1 = c_0 = \frac{1}{1!}\left(\frac{1}{0!}B_0\right)

The only unknown here is B0 and if we solve for it we get that it's 1. Let's try the next one:

ここで未知なのは B0 だけで、これが解決できれば式全体が 1 になります。次に進みましょう。


  0 = c_1 = \frac{1}{2!}\left(\frac{1}{0!}B_0\right) + \frac{1}{1!}\left(\frac{1}{1!}B_1\right)

Since we now know what B0 is B1 is the only unknown; solving for it gives us -1/2. One more time,

ここでB0 が判明したので、わからないのは B1 だけになりました。これを解決すると結果は 1/2 になります。


  0 = c_2 = \frac{1}{3!}\left(\frac{1}{0!}B_0\right)+\frac{1}{2!}\left(\frac{1}{1!}B_1\right)+\frac{1}{1!}\left(\frac{1}{2!}B_2\right)

Solving for B2 gives us 1/6.
さらにこれをB2について解決すると1/6という結果になります。


In general, if we know the first k-1 Bernoulli numbers we can now calculate the k'th by solving this:
一般的に最初の k-1 個の Bernounll numbers が判っていれば、以下の式を解くことによって k番目を計算できます：

  0 = c_k = \frac{1}{(k+1)!}\left(\frac{1}{0!}B_0\right)+\frac{1}{k!}\left(\frac{1}{1!}B_1\right)+\cdots+\frac{1}{1!}\left(\frac{1}{k!}B_k\right)

This only gets us part of the way though, it needs to be cleaned and simplified before we can code it.
(And again, if you find yourself getting bored feel free to skip to the next section where we start coding).

コードに落とす前に clean にして simplify しなければなりません(そして繰返しますが、退屈しているようなら
コーディングを始めている次のセクションへ進んでしまってもかまいません)。


The first simplification is to solve Bk up front. As you can see above the term we're interested in
is always the last one and has the form

最初の simplifcation は Bk の解決です。先に示したようにわたしたちが注目している term は常に最後の一つ
であり、次のような form を持っています。


  \frac{1}{1!}\left(\frac{1}{k!}B_k\right)=\frac{1}{k!}B_k

This we can solve for in the original equation:


  0 = \frac{1}{(k+1)!}\left(\frac{1}{0!}B_0\right)+\frac{1}{k!}\left(\frac{1}{1!}B_1\right)+\cdots+\frac{1}{k!}B_k


  \frac{1}{k!}B_k=-\frac{1}{(k+1)!}\left(\frac{1}{0!}B_0\right)-\frac{1}{k!}\left(\frac{1}{1!}B_1\right)-\cdots-\frac{1}{2!}\left(\frac{1}{(k-1)!}B_{k-1}\right)


  B_k=-\frac{k!}{(k+1)!}\left(\frac{1}{0!}B_0\right)-\frac{k!}{k!}\left(\frac{1}{1!}B_1\right)-\cdots-\frac{k!}{2!}\left(\frac{1}{(k-1)!}B_{k-1}\right)


This simplifies the process of calculating Bk, we now just have to plug the previous values into this
equation to get the next one.


We've already computed the first few values of the series so we can calculate the first two terms up front:
わたしたちはすでに数列の最初の二つの値を計算していますから、最初の二つの terms が計算できます


  -\frac{k!}{(k+1)!}\left(\frac{1}{0!}B_0\right)-\frac{k!}{k!}\left(\frac{1}{1!}B_1\right)=-\frac{1}{k+1}B_0-B_1=-\frac{1}{k+1}+\frac 12=\frac 12\cdot\frac{k-1}{k+1}

Pluggin them back into the formula we get:
それらをわたしたちが得た数式に逆向きに pluggin します


  B_k=\frac 12\cdot\frac{k-1}{k+1}-\frac{k!}{(k-1)!}\left(\frac{1}{2!}B_2\right)-\frac{k!}{(k-2)!}\left(\frac{1}{3!}B_3\right)-\cdots-\frac{k!}{2!}\left(\frac{1}{(k-1)!}B_{k-1}\right)

As the table of Bernoulli values at the beginning suggested all the numbers at odd indexes greater than
1, B3, B5, etc., are zero. And since we've already handled 1 as a special case we can just drop all the
odd terms:

最初にあった Bernolli values の table にあったように、1より大きな odd indexes の数値はすべて 0 であるこ
とを示唆しています。そして index の 1 はすでに special case として handle していますから、
odd terms (奇数項) はすべて単に drop できます。 


  B_k=\frac 12\cdot\frac{k-1}{k+1}-\frac{k!}{(k-1)!}\left(\frac{1}{2!}B_2\right)-\frac{k!}{(k-3)!}\left(\frac{1}{4!}B_4\right)-\frac{k!}{(k-5)!}\left(\frac{1}{6!}B_6\right)-\cdots

Above each Bi has two separate factors being multiplied onto it; we can multiply those 
together to make just one factor:

上記の Bi はそれぞれ二つの separate factor を持っています
それらはひとつの factor を作るために multiply together できます。


 B_k=\frac 12\cdot\frac{k-1}{k+1}-\frac{k!}{2!(k-1)!}B_2-\frac{k!}{4!(k-3)!}B_4-\frac{k!}{6!(k-5)!}B_6-\cdots


Since we're only even interested in even values of k we don't lose generality if we assume that that
k=2n for some integer n.

わたしたちが関心を持っているのは k が偶数のときの値だけなので、
some integer n に対して k=2n と仮定しても generality を失うことはありません。


  B_{2n}=\frac 12\cdot\frac{2n-1}{2n+1}-\frac{2n!}{2!(2n-1)!}B_2-\frac{2n!}{4!(2n-3)!}B_4-\frac{2n!}{6!(2n-5)!}B_6-\cdots

Okay, now we're getting close to the final form that we'll code up but first I'll have to stop and do 
something distasteful: I'll change how we denote the values in the Bernoulli series. What you've seen up
until this point is the modern, standard use of indices: B1 is -1/2, B2 is 1/6, B3 is 0 etc. In Note G,
however, Lovelace numbers the values differently. She skips the first two values and the rest are shifted
by 1 so that B1 denotes the value we would call B2, B2 is what we would call B3 and so on:

さて、final form にだいぶ近づきましたがまずは stop して distasteful な something をしなければなりません。
Bernoulli series における values の denote のやり方を変えます。ここまで見てきたやり方は
modern で標準的な indices の使い方で、B1 が -1/2, B2 が 1/6, B3 が 0 のようになっていました。
しかし Note G で Lovelace は values に異なる番号付けをしていたのでした。
彼女は値の最初の二つをスキップしてさらに残りをひとつシフトしていたので、
Note G でいう B1 はわたしたちが B2 と呼んでいるものの値を denotes していて、
Note G でいう B2 はわたしたちが B3 と呼んでいるものと同じといった具合になっています。


  Value 	1 	-1/2 	1/6 	0 	-1/30 	0 	1/42 	0 	…
  Modern 	B0 	B1 	B2 	B3 	B4 	B5 	B6 	B7 	…
  Note G 	not used 	B1 	B2 	B3 	B4 	B5 	B6 	…


Up until now I've been using (what we call) B0 and B1 so it made sense to use the modern convention but
now we only have terms left that have a name in Note G, so I'll switch to her convention. This means
that I'll write the formula as

ここまでは BO と B1 (とわたしたちが呼んでいるもの) を使ってきましたから modern convention を使うこと
には意味がありますが、現時点ではわたしたちには Note G に残されていた terms しかありません。ですから
ここでは、Ada の使った convention に従うことにします｡つまり、式を次のように記述するということです。 


  B_{2n-1}=\frac 12\cdot\frac{2n-1}{2n+1}-\frac{2n!}{2!(2n-1)!}B_1-\frac{2n!}{4!(2n-3)!}B_3-\frac{2n!}{6!(2n-5)!}B_5-\cdots

Note that the meaning of n doesn't change so the formula is almost the same as it was before, the only
difference is the numbering of the Bis.

n の意味を変えてはいないので数式は以前のものとほぼ同じであり、
唯一違うのは B の numbering だけであることに注意してください。


At this point it's convenient to give the factors being multiplied onto Bi a name; we'll call them A^n_i.
So the above is the same as

この時点で factor に名前をつけるのは convenient なので、A^n_i と呼ぶことにしましょう。ですから、上記の式は
次の式と同じことです


  B_{2n-1}=A^n_0-A^n_1B_1-A^n_3B_3-A^n_5B_5-\cdots-A^n_{2n-3}B_{2n-3}

where
ここで

  A^n_0 = \frac 12\cdot\frac{2n-1}{2n+1}

  A^n_i = \frac{2n!}{2i!(2(n-i)+1)!} \quad \mathbf{for}\quad i > 0

です。

The second one looks hairy but can be boiled down:
二番目のものは複雑に見えますが boiled down 可能です。


 \frac{1 \cdot 2 \cdots (2n-1) \cdot 2n}{(1 \cdot 2 \cdots (2i-1) \cdot 2i)(2(n-i)+1)!}=\frac{(2n-(i-1))\cdot(2n-(i-2))\cdots (2n-1) \cdot 2n}{1 \cdot 2 \cdots i \cdot (i+1)}=\frac {2n}{2} \cdot \frac{2n-1}{3} \cdot\cdots\cdot \frac{2n-(i-1)}{i+1}


The last step is possible because there is, very conveniently, the same number of terms in the numerator
and denominator. What's useful about this form is that it makes it clear that each of the Ais is the
previous one with another factor multiplied onto it (except for A0 which is special):

この最後のステップは possible です。なぜならとても都合のよいことに、numerator と denominator に同じ数の
terms があるからです。この form が useful なのは、Ai のそれぞれが
the previous one with another factor multiplied onto it
であることを明確にすることです(特別である A0を除きます):


  A^n_1 = \frac{2n}{2}

  A^n_2=\frac{2n}{2}\cdot\frac{2n-1}{3}=A^n_1\left(\frac{2n-1}{3}\right)

  A^n_3=\frac{2n}{2}\cdot\frac{2n-1}{3}\cdot\frac{2n-2}{4}=A^n_2\left(\frac{2n-2}{4}\right)


We're slowly starting to see a program take shape here: if we know the previous Bi we can iteratively
calculate the sequence of Ais and multiply them onto those values, ultimately giving us the next Bi.
Now we'll switch gears and code that up as an actual program.

We're slowly starting to see a program take shape here:
直前の Bi がわかっていれば、Ai の sequence は iteratively に計算できます。
そしてそれらは those values に multiply できるので、最終的に next Bi が得られます。
ではここでギアを切り替えて実際のプログラムのコードへ移りましょう 


The Program

Let's quickly recap what all the work above gave us. This formula gives us the next 
Bernoulli number, given the previous values:

all the work above gave us を quickly recap しましょう。
わたしたちがプログラムを記述する必要のある二つの key bits はこうでした 


  B_{2n-1}=A^n_0-A^n_1B_1-A^n_3B_3-A^n_5B_5-\cdots-A^n_{2n-3}B_{2n-3}

where
ここで

  A^n_0=\frac 12 \cdot \frac{2n-1}{2n+1}

  A^n_1=\frac {2n}{n}

  A^n_i=A^n_{i-1}\left(\frac{2n-(i-1)}{i+1}\right)

です。


Say we wanted to write a program that calculated B7 using this approach, that is, the special case
where n is 4 (since, as you may vaguely remember, n is the value such that k = 2n – 1).

このアプローチを使って B7 を計算するプログラムを記述したいとしましよう。つまり、n が 4 である
special case です (as you may vaguely remember; なぜなら n が k = 2n-1 であるような値だからです)。 


  B_7=A^4_0-A^4_1B_1-A^4_3B_3-A^4_5B_5


Let's take a crack at implementing this in C. We'll do it twice: first a straightforward implementation
that's quite similar to the Note G program and then we'll go back over it again and make some
modifications to make is (almost) step-by-step identical to Note G.

これを C で実装してみましょう。実装は二回行います。一回目は Note G のプログラムに非常に良く似た
straightforward の実装です。それから go back over it again して Note G に対して(ほぼ) step-by-step で
identical なものにするために幾つかの変更を加えます。


First, we can assume that the program has been given n and that we have already computed the preceding
Bi. We'll store those in some variables.

まずプログラムには n が与えられていて、かつすでに計算済みの Bi があると仮定ができます。
そのような値を変数に格納しましょう。


  double n = 4;
  double B1 = 0.166667;                             // 1/6
  double B3 = -0.0333333;                           // -1/30
  double B5 = 0.0238095;                            // 1/42

The general idea will be to keep a variable A that we'll multiply successive factors onto so it takes
the value of A^4_1, then A^4_3, etc. We'll the multiply those factors onto the previous Bi, which we've
already been given, and accumulate the result in another variable, result.

general idea では successive factors を multiply する variable A を keep して、A^4_1 の値、それからA^4_3
の値、さらに…と値を得ていきます。それらの factors をすでに与えられている previous Bi に multiply して
その結果を別の変数 result に accumlate します。


The first term A^4_0 we'll calculate directly:
最初の項 A^4_0 は直接計算します

  double A = 0.5 * (2 * n - 1) / (2 * n + 1);       // A0
  double result = A;

Then we calculate the second term, A^4_1B_1, and subtract it from the result:
それから第二項 A^4_1 を計算して result からそれを減じます


  A = 2 * n / 2;                                    // A1
  double term = B1 * A;                             // B1 A1
  result -= term;                                    // A0 - B1 A1

The we calculate A^4_3 by multiplying the appropriate factor onto A^4_1:
A^4_1 に appropriate factor を乗ずることにより A^4_3 を計算します



  A *= (2 * n - 1) / 3 * (2 * n - 2) / 4;           // A3
  term = B3 * A;                                    // B3 A3
  result -= term;                                   // A0 - B1 A1 - B3 A3

And for the last term, A^4_5B_5, we follow exactly the same pattern except that the 
factor is slightly different:

そして last term の A^4_5 B_5 に対して、factor が slightly diffrent な点を除けば
まったく same pattern に従います


  A *= (2 * n - 3) / 5 * (2 * n - 4) / 6;           // A5
  term = B5 * A;                                    // B5 A5
  result -= term;                                   // A0 - B1 A1 - B3 A3 - B5 A5
  printf("The result is: %g\n", result);


If you run this program it will print
このプログラムを実行したときの出力は以下のようになります 

The result is: -0.0333333

which is indeed -1/30 also known as B7. This is, in essence, what the Note G program does. It calculates
B7 using this sequence of steps. However, if you were to look at Note G now what you'd see would still
look a bit foreign. That's because C let's us do things that the analytical engine doesn't. For instance,
C lets us write complex expressions where the analytical engine only does one operation at a time. In C
we also don't need to worry about recurring expressions like n * 2 because the compiler will make sure it
only gets calculated once. The analytical engine obviously had no compiler so the programmer has to do
that kind of optimizations by hand.

この値は -1/30、つまりB7です。これは本質的に Note G の program が行っていることであり、上記の sequence of
steps を用いて B7 を計算しています。ただし Note G を見ても何か違ったものに思えるでしょう。それは解析機関
が行わないことを C では行なってくれるからです。たとえば解析機関が一度に一つしか行えないような複雑な式が
C では書けます。コンパイラーが一度しか計算しないことを保証するので、C では n * 2 のような recurring
expression について心配する必要もありません。解析機関にはコンパイラーはありませんから、プログラマーはこの
種の最適化を自分の手で行わなければなりません。


To take all this into account I'll rewind and go over the program again but this time I'll include all the
extra work. In the code comments I'll give how the variables in my program maps to variables in Note G.

take all this into account のためには rewind して go over the program again しますが、
今回は extra work の すべてを include します。
プログラム中の variables がどのように Note G の variables に map されているのか
をコメントで説明しています。

First off we have the values we're given as input:
初めに入力として与えられた値があります。 


  one = 1.0;                                        // V1
  two = 2.0;                                        // V2
  n = 4.0;                                          // V3
  B1 = 0.166667;                                    // V21
  B3 = -0.0333333;                                  // V22
  B5 = 0.0238095;                                   // V23


Note G doesn't use constant values, only variables. So to use the values 1 and 2 it needs two variables that
are pre-initialized with those values. Also, we'll assume that all variables are pre-declared and start out
with the value 0, we won't bother declaring then anymore.

Note G では constant values を使わず、変数だけを使っていました。そこで、1 と 2 という値のためにはその値で事
前に初期化されているふたつの variable が必要になります。同様にすべての variable が事前に宣言されて値 0 で初
期化されていると仮定し、このあとに bother declaring を行わないようにします。


The first step is to calculate A^4_0. The C version was:

最初のステップは A^4_0 の計算です。C バージョンではこうでした：


  double A = 0.5 * (2 * n - 1) / (2 * n + 1);       // A0
  double result = A;

and here is the same thing in analytical-engine-style C:
そしてこれを解析機関スタイルの C で書くとこうなります

  two_n_minus_one = two_n_plus_one = numerator = two * n;  // V4 V5 V6
  two_n_minus_one -= one;                           // V4
  two_n_plus_one += one;                            // V5
  A = two_n_minus_one / two_n_plus_one;             // V11
  A /= two;
  result += A;                                      // V13
  current_n = n - one;                              // V10


If you compare the two versions it should be clear that they do the same thing except that in the latter
you get all the excruciating details of the temporary variables. But it's still pretty straightforward.
Notice in the next to last step how we add A to result before result has ever been assigned – we're using
the fact that variables that haven't been used yet can be assumed to be 0.

上記二つのバージョンを比較したなら、後者では temporary variables の all the excruciatmg details を get
していることを除けばどちらも同じことを行っていることが明確になっているでしょう。しかしそれはまだ pretty
straightforward です。result に対する代入が行われる前に Aをどのように result に足しこんでいるのか、next
to last step に注意してください。‐未使用の variables は 0 であると仮定できるという事実を利用しています。


The next part of the C program calculates A^4_1B_1:
Cプログラムの次のパートはA^4_1 B1の計算です： 


  A = 2 * n / 2;
  double term = B1 * A;                             // B1 A1
  result -= term;                                    // A0 - B1 A1

In analytical-engine-style this corresponds to:
これに対応する解析機関スタイルはこうです

  denominator += two;                               // V7
  A = numerator / denominator;
  term = B1 * A;                                    // V12
  result -= term;
  current_n -= one;

Instead of recalculating 2*n we already have it in a variable, numerator, and will be decrementing it
as we go. Similarly with the denominator which we'll increment as we go rather than recalculate. Notice
again how we increment denominator before it has been initialized and rely on it being 0 so it ends up
containing 2.

既に変数 numerator に値がありますから 2 * n の再計算はしません。そして numerator はそうする意図があった
かのように decrementing します。同様に、increment をする denominator も再計算をせずに値を使います。どの
ように denominator をそれが初期化される前に increment するか、また初期状態に 0 が入っていることを期待し
ていて increment の結果値が2になるということにもう一度注意してください。


The next step is to calculate A^4_3B_3:
続くステップはA:B3:の計算です



  A *= (2 * n - 1) / 3 * (2 * n - 2) / 4;           // A3
  term = B3 * A;                                    // B3 A3
  result -= term;                                   // A0 - B1 A1 - B3 A3

In analytical engine style we get:
解析機関形式ではこうなります

  numerator -= one;
  denominator += one;
  factor_1 = numerator / denominator;               // V8
  A *= factor_1;

  numerator -= one;
  denominator += one;
  factor_2 = numerator / denominator;               // V9
  A *= factor_2;

  term = B3 * A;
  result -= term;
  current_n -= one;

The first two blocks together calculate A^4_3 and the last part subtracts another term from result. The
last part looks very similar:

最初二つのブロックは共に A^4_3 を計算していて、そして最後のパートでは result からanother term を 減じてい
ます。その最後のパートはとてもよく似ています:

  A *= (2 * n - 3) / 5 * (2 * n - 4) / 6;           // A5
  term = B5 * A;                                    // B5 A5
  result -= term;                                   // A0 - B1 A1 - B3 A3 - B5 A5

In this code we're using different constants to calculate A than the previous block, and we're using B5
instead of B3 in the second step. In the analytical-engine-style code we've already decremented the
variables we're using to calculate A so that part is exactly the same as for the previous block, but we
still need to use B5 instead of B3:

このコードでは A を計算するのに previou sblock とは異なる定数を使っていて、また､二番目のステップでは
B 3の代わりに B5 を使っています。解析機関形式のコードではすでに A を計算するために使っている variables
を decrement しています。そのためその部分は previous block と全く一緒なのですが、
B3ではなく、B5を使う必要があるのです。

  numerator -= one;
  denominator += one;
  factor_1 = numerator / denominator;
  A *= factor_1;

  numerator -= one;
  denominator += one;
  factor_2 = numerator / denominator;
  A *= factor_2;

  term = B5 * A;
  result += term;
  current_n -= one;

That's it – the value of B7 is now stored in result. The only thing left to do is to store the value in a
dedicated variable and then reset some of the other variables so we're ready to calculate the next Bi.

B7の値は result に格納されています。やるべきことで残っているのは dedicated variable に値を格納することだ
けです。そして次の Bi を計算するために some of the other variables をリセットします。 


  B7 += result;                                     // V24
  n += one;
  // Also reset numerator and denominator. numerator と denominator もリセットする


Now, it might look like we've just written the world's most excruciating C program. In fact what's
we've done here is step through, instruction-by-instruction, the Bernoulli program in Note G. If you
look through that program here on the right you will notice that for each step there is a corresponding
line in the analytical-engine-style C program above. (For details on how the table program format works
see my post on analytical programming.) Click the program to go to an expanded version.

ここまでは、わたしたちがただ単に the world's most excruciating C program を書いただけのように思えるか
もしれません。実際ここでわたしたちが行ったことは、Note G にあった Bernoulli program を
instruction-by-instruction で step through しただけです。もし program here on the right を look through
したら、前述の各ステップが解析機関形式の C program の行と対応していることに (table program がどのように
works しているのかの詳細はわたしの analytical programming についての投稿を参照してください）気がつくこ
とでしょう。Click the program to go to an expanded version. 



Comparing the programs
プログラムを比較する


Now, if you were to go though the program in details you would notice that what I just said isn't actually
true. There are some small differences between the C-style version and what you see in the original notes.
Those differences are, I'm afraid, bugs in the original program.

さてここでプログラムを詳細に見ていけば、わたしが単に actually true ではないと言っていることに気がつくでしょ
う。C 形式でのプログラムのアウトラインと original notes に書かれていたものとではちょっとした違いがあります
が、わたしはそういった違いは original program のバグではないかと疑っています。


There are three separate issues. The first one is trivial: in step 4 the instruction 
三つの個別のissuesがあります。最初の一つは trivial です。ステップ 4 においては次のようにすべきなのですが


  A = two_n_minus_one / two_n_plus_one;

is actually switched around and does
実際には switched around されて次のようになっています

  A = two_n_plus_one / two_n_minus_one;


That's a tiny issue that could have been introduced by the printer, and the comment 
for that step in the table has it the right way round.

これはprinter によって introduce された可能性のある tiny issue です。
そしてtable 中の that step に対するコメントでは right way round していました。


Secondly, remember how similar the code for calculating A^4_3B_3 and A^4_5B_5 was, the only difference
being that one used B3 and the other B5? The second issue is that Note G actually overlooks this
difference and where the code for A^4_5B_5 should be simply says to repeat the steps for calculating
A^4_3B_3. As you'll see if you click through to the expanded version, in the white space between steps
23 and 24 there is a comment saying “Here follows a repetition of Operations thirteen to twenty-three”.


二番目に、A^4_3 B3 を計算するコードと A^4_5 B5 を計算するコードとがどのくらい似ていたかを思い出してください。
唯一の違い は B3 を使うか B5 を使うかという点だけです。この二番目の issue は Note G が実際にはこの差異を
overlook していたということです。ここで､A:B5のためのコードは単に A^4_3 B3 の calculating のために
the steps を repeat すると主張すべきものです。expand version を click throughす ればわかるように、
step23 と step24 の間の whitespace に
"Here follows a repetition of Operations thirteen to twenty-three"
というコメントがあります。


The third issue is more systematic: the signs are reversed. If you fix the two first issues the result
will be correct – except that it has the wrong sign. The sign reversal is also present in the text of
the note so I'm not sure if it's deliberate, but if it is the program calculates something we wouldn't
call the Bernoulli series.

三番目の問題はもっと systematic で、結果の符号が反転しているというものです。先の二つの issues を修正
すれば結果は正しくなりますが、その符号が間違っているのです。この符号反転は注釈の本文にも存在している
ので、もしこのプログラムが Bernoulli series とわたしたちが呼んでいるものではない何かを計算するもので
あるのなら、それが deliberate (意図的なもの）なものかわたしには判断できません。



There is also a fourth bonus issue, which is not a bug exactly. At the end in step 25 the code resets
some variables such that it's ready to calculate B9. But if you loop around and run the main program
again it won't give the right result. This is both because some of the variables which the program
assumed were 0 on the first run won't be on the second and because the “loop” after step 23 would
need yet another block, this one for calculating A^5_7B_7. But this is only an issue if you consider
the program to calculate the whole Bernoulli sequence rather than just B7.


四番目の bonus issue もあります。とはいえこれは bug ではなく、単なる Observation です。コードでは step25
の最後で計算の準備ができていた B9 のような幾つかの variables をリセットしていました。しかしもし、loop
around して main program を再度実行したならば、そのときは正しい結果が得られないでしょう。これはプログラ
ム上で一度目の実行では 0 であると仮定されていた一部の変数が二度目に実行されたときにそうではないことと、
ステップ23のあとの“ループ”は他の A^5_7B_7 を計算するブロックがまだ必要としていることの両方が原因です。 


Now, I want to be sure to put these issues in the right perspective. This is a complex program. I've
been programming most of my life and it still took me a significant amount of time to understand how
it worked. This program was written by a mathematician with no background in programming – they were
inventing programming as they went along. On paper. There is an almost frightening intellectual power
behind this, both in the program itself and the underlying framework. For me, the fact that we modern
programmers, with the benefit of education and experience, can spot a few details out of place is only
what you would expect and takes little away from that.

さてここで、わたしはこれらの issues を right perspective に置きたいのですが、それは複雑な問題です。わた
しは自分の人生の大部分でプログラミングしてきました。そして今でもプログラムがどのように動作しているのか
を理解するために多くの時間を掛けています。このプログラムはプログラミングの background を持たない数学者
によって書かれました。彼らは they went a long として紙の上でプログラミングを発明しました。プログラムそ
のものと underlyng framework の両方において、その背後には an almost frightening intellectual power が
隠れています。わたしにとっては、わたしたち modern programmers が few details out of place を spot でき
るという the fact は、あなたが期待でき、かつ little away from that するであろう唯一のことです。

That's it, you now know how the Note G program works and what each step does. If you've found this
interesting there's a good chance you'll find Lovelace's original notes interesting too, I recommend
reading them. If you want to play around with the program yourself I've “ported” it to C; you can
find it on github.

これであなたは Note G のプログラムがどのように動作するのかということと、各ステップで行っていることを
理解しました。もしあなたがこれに興味を抱いたのなら、Lovelace の original notes にも興味を持つであろう
good change がありますから、それらを読むことをお勧めします。わたしが C へ「移植した」プログラムを自分
でもいじってみたいというのであれば github にあります。

2013年06月24日

■_

(意図的に空白にしています)

■_

非破壊「自炊」の衝撃複写技術進化の先は… （東京新聞） | 「日々担々」資料ブログ

■_

来月予定の新刊で

●Team Geek
Googleのギークたちはいかにしてチームを作るのか

Brian W. Fitzpatrick、Ben Collins-Sussman　著
角 征典　訳
224ページ
定価2,310円（税込）
ISBN978-4-87311-630-3
-----------------------------------
●コネクト
企業と顧客が相互接続された未来の働き方

Dave Gray、Thomas Vander Wal　著
野村 恭彦　監訳
牧野 聡　訳
318ページ
定価2,310円（税込）
ISBN978-4-87311-619-8

この二つがちょっと気になる

■_

FM-8ーっ泳ぐやる夫シアターやる夫と学ぶホビーパソコンの歴史　第六話　バウンドレス・エイト

■_

シンプルさの必要性 | eed3si9n
2012年4月23日にテキサスの Austin で行われた RailsConf 2012 での Rich Hickey (@richhickey) さんによる基調講演、Simplicity Matters を書き起こして翻訳しました。
Programmers know the benefits of everything and the tradeoff of nothing.
RLE in Perl5 - tokuhirom's blog.
プログラミングの勉強仲間をつくるには | C・C++のQ&A【OKWave】
作曲ソフト界には凡庸性の高い拡張子は無いんですか？ | その他(ソフトウェア)のQ&A【OKWave】
Twitter / houmei: "今は京の100倍を目指したスパコンをしくしくと開発している ...
Raspberry Pi［実用］入門――手のひらサイズのARM/Linuxコンピュータを満喫！：書籍案内｜技術評論社
軽々しく使われすぎだろうと思う『言葉』 : ひまログ2ch
【コラム】鉄道とIT (28) 鉄道の運行や安全と通信網の関わりいろいろ(2) | エンタープライズ | マイナビニュース

■_ Structurally fixing injection bugs

ま、この辺で。

More magic - Structurally fixing injection bugs

Structurally fixing injection bugs Posted on 2012-09-23

The two biggest threats to the web are caused by the same underlying mistake. It is time this problem
was fixed at its root. This article will attempt to provide the tools do so.

web に対するふたつの biggest threat は同一の underlying mistake によって引き起こされます。今こそその
問題を根本から解決するときです。この article では解決のためのツールを提供することを目指します。


Input sanitation, input filtering or output escaping?


The Open Web Application Security Project (OWASP) does a great job at educating people and suggesting
practical solutions to avoid common weaknesses. Unfortunately, most security bloggers focus on
vulnerabilities rather than the prevention of attacks, and those that do often give bad advice. For
example, common advice is to avoid XSS (cross-site scripting) and SQL injection bugs by "sanitizing"
or "validating" input. Now, by itself this is good advice.

OWASP (Open Web Application Security Project) は、人びとに対する教育と common weaknesses を排除する
practical solution の提案というすばらしい仕事を成し遂げました。残念ながら security blogger たちの大半は
prevention of attack ではなく脆弱性 vulerrabilities に focus していて、しばしば bad advice を与えてし
まっています。そういった一般的なアドバイスにはたとえば入力を「sanitizing」したり「validating」することで
XSS や SQL インジェクションを防止しろというものがあります。
Now, by itself this is good advice.


Unfortunately, the phrase "sanitize your inputs" is often misunderstood and the advice itself
can be misguided. For example, Chris Shiflett says:

残念なことに「入力を sanitize しろ」というフレーズはしばしば誤解されていて、このアドバイス自体が misguide
してしまっている可能性があります。たとえば Chris Shiflett はこう言っています。

    If you reject [anything but alphanumerics], Berners-Lee and O'Reilly will be rejected, despite
    being valid last names.   However, this problem is easily resolved.  A quick change to also
    allow single quotes and hyphens is all you need to do.  Over time, your input filtering
    techniques will be perfected.

    (アルファベットと数字以外)を reject したならば、Berners-Lee や O'Reilly は正しい last name にも
    かかわらず reject されてしまう。しかしこの場合は single quotes や hyphens を許可するような
    quick change をすればよい。いずれ、あなたの input filtering techniques は完璧なものになるだろう


I think this advice is a little unhealthy. Those are valid names, and rejecting them will only scare
away customers and reinforce the idea that the "security Nazis" are out to inconvenience
people. I wish people would place less emphasis on filtering and sanitizing. Citing this XKCD comic has
become a cliché, which (while funny) makes it worse:

わたしはこのアドバイスを少々 unhealthy なものだと考えています。正当な氏名であるのにそれを reject してしまう
ということは単に顧客を遠ざけてしうまうだけであるし、さらにユーザーに不便を強いる「Security Nazis」な考え方を
強化してしまうでしょう。
I wish people would place less emphasis on filtering and sanitizing.
Citing this XKCD comic has become a cliche', which (while funny) make it worse:


Validating and sanitizing input is good when it refers to parsing input into meaningful values immediately
when receiving it, so that you don't, say, get a URL when you are expecting an integer. The horror story of
ROBCASHFLOW shows how important input restrictions can be (but see also this cautionary list. Tl;dr: you're
doomed either way).

入力を受け取ったときに即座にそれを meanigful な values への parsing を行うことを意味しているのなら、入力の
validating や sanitizing はよいことです。つまり、整数を期待しているときに URL を得てはいけないということです。
ROBCASHFLOW の horror story は input restriction が以下に重要であるかを show しています
(but see also this cautionary list. Tl;dr; you're doomed either way)。


However, input sanitation will (in general) not prevent XSS or SQL injection. The OWASP XSS prevention
"cheat-sheet" recognizes input validation and sanitation for what it is; a good secondary security
measure in a broader "defense in depth" strategy.

しかしながら、入力の sanitation は(一般的には) XSS や SQLインジェクションを防げないでしょう。OWASP の XSS
prevation "cheat-sheat" では input の validation と sanitation をそういうものであると recognize しています。
そして a good secondary security measure in a broader "defense in depth" strangy.


Instead, there are only two correct ways to prevent "injection" bugs. The best is often even omitted
from advice, which is to avoid the problem entirely (see below). The other is to escape output.  Unfortunately,
advice to escape often seems to imply that you should manually call escaping procedures; "just use
mysql_real_escape_string()". This is a very bad idea; it's tedious, it's easy to forget, it makes code
less readable and it requires everyone working on the code to be equally informed about security issues (a
great idea, but not very realistic).

それどころか、"injection" bug を防ぐ方法というものはたったの二つしかありません。最善の方法は問題そのものを
排除するという、アドバイスとはかけ離れたものになってしまうことがしばしばです。もうひとつの方法は出力をエス
ケープするというものです。残念なことに、エスケープしろというアドバイスは往々にしてエスケープのための手続き
を手作業で呼び出せという暗黙の了解があるように思えます。たとえば "just use mysq_real_escape_string()" のよ
うなものです。これは非常によくないアイデアです。退屈であり、忘れてしまいがちであり、コードの読みやすさを低
下させ、さらにそのコードで作業するひとすべてに対して security issues を等しく inform させろということを要求
するのです(それはとてもgreat なアイデアですが、まったく非現実的です)。


Let's investigate how we can prevent these vulnerabilities easily and automatically. This will help us
secure applications in a structural rather than an ad-hoc way.

わたしたちはこれらの vulnerabilities をどのようにして easily かつ automatically に prevent できるのかを
investigate してみましょう。これは ad-hoc でない手法による secure なアプリケーションを手助けするでしょう。



The trouble with strings (文字列にまつわるトラブル)


The underlying problem of all these vulnerabilities is that a tree structure (e.g., the SQL script's AST or
the HTML DOM tree) is represented as a string, and user input which should be a node in the tree is inserted
into this string. If this includes characters from the meta-language which describes the tree's structure, it
can influence that structure. Here's an example:

こういった vulnerabilities に関する問題すべての根本にあるのは、木構造 (たとえばSQLスクリプトの AST であるとか
HTML の DOM tree のようなもの) が文字列として表現されていること、そして tree のノードであるべきユーザーの入力
がその文字列に挿入されていることです。もしそれに木構造を記述しているメタ言語からのキャラクターが含まれたならば、
それも structure に影響を及ぼす可能性があります。例を挙げましょう:


    <p>{username} said the following: {message}</p>


When message is "So you see, if a<b and c<a, then b>c.", you get output like this
(depending on the browser, HTML version and phase of the moon):

ここで message が "So you see, if ...." であったとすると、次のような出力を得ることになります
(depending on the browser, HTML version and phase of the moon):

   Math teacher said the following: So you see, if ac.

This code is simply incorrect, and this bug will frustrate users like the math teacher. But this can turn
into a security nightmare; any punk can make you look like a fool by making your images dance around,
taking over your users' sessions by stealing cookies, or do much worse. The underlying reason this nonsense
is possible at all is the fact that you are mixing user input strings with HTML.

単純にこのコードは間違っていますが、そのバグは数学教師のようなユーザーを frustrate させるだけでしょう。しか
しこれは security nightmare になりうるのです。すべての punk はあなたを愚かに見せる可能性があります。
by making your image dance around, taking over your users' sessions by stealing cookies, or do much worse.
stealing cookies によってユーザーのセッションを take over する

その underlying reason this nonsense は possible at all がユーザー入力文字列と
HTML を混ぜようとしていることなのです。


In other words, you're performing string surgery on the serialized representation of a tree structure. Just
stop and think how insane that really sounds! Why don't we use real data types? While researching this topic,
I found an insightful article called "Safe String Theory for the Web". The author has a good grasp
on the problem and comes close to the solution, but he never transcends the idea of representing everything
as a string.

言い換えると、あなたはシリアライズされた representation of a tree structure に string surgery を施そうとして
いるのです。手を止めて、それがいかに狂気あふれる行為なのかを考えましょう! なぜわたしたちは real data types を
使っていないのでしょうか? この topic を調査している間に、わたしは "Safe String Theory for the Web" というタイ
トルの insightful な article を見つけました。その作者はこの問題に対する good grasp を持っていて solution に
近づいたのですが、結局彼はすべてを文字列として表現するというアイデアを捨てることはできなかったのです。


Many people don't, so despite the flawed concept, there are several solutions that take string splicing as a
given. For instance, some frameworks have a sort of "safe HTML buffers", which automatically
HTML-escape strings. These solutions don't deal with the context problem from "Safe String Theory for
the Web". There's only one built-in string type, and making it context-aware is extremely hard, maybe
even impossible. Strongly typed languages have an advantage here, though!


Many people don't,ですから (すべてを文字列で表現する) flawed concept であるにもかかわらず
string slicing を与えられたものとして受け取るいくつかの solution があるのです。
たとえば一部の framework では、文字列自動的に HTML-escape がなされる "safe HTML buffers" といった類のものを用
意しています。これらの solution は "Safe String Theory for the Web" のような context problem に対する対処はし
ません。ただひとつの組み込みの文字列しかないのでそれを context-aware にするのは非常に困難で、たぶん不可能なこ
とでしょう。しかし Strongly typed languages にはここでも advantage があります!


Representing HTML as a tree helps preventing injection bugs, and has other advantages over automatic escaping.
For example, we need to worry less about generating invalid HTML; our output is always guaranteed to be
well-formed. The essence of an XSS attack is that it breaks up your document structure and re-shapes it.
These are just two sides to the same coin: By taking control of the HTML's shape, XSS is also avoided.

HTML を木として表現することは injection bug を防ぐ手助けになります。また、automatic escaping を上回る
advantage を持っています。たとえば invaild な HTML を生成することを心配する必要が少なくなります。また、出力が
常に well-formed なものであることが保証されます。XSS 攻撃の essence とはドキュメントの構造を壊してそれを
re-shape することです。そこにはコインと同じくただ二つの面だけがあります。HTML の shape の制御を握ることによっ
て、XSS も同時に avoid されるのです。


There's another, more insidious problem with splicing HTML strings, which I haven't seen discussed much either.
It's another context problem; if your complex web application contains many "helper" functions, it
becomes very hard to keep track of which helper functions accept HTML and which accept text. For example, is
the following PHP function safe?

もうひとつ別の、more insidious な問題が HTML strings の splicing にありますが、これについては I haven't seen
discussed much either です。これは another context problem です。もしあなたの複雑な web アプリケーションが多
数の“ヘルパー”関数を抱えていたならば、どのヘルパー関数が HTML を受け付け、どのヘルパー関数がテキストを受け
付けるのかといったことを track し続けることがとても困難になってきます。一つ例を挙げましょう。次の PHP 関数は
安全でしょうか?


    function render_latest_topicslist() {
        $out = '';
        foreach(Forum::latestPosts(10) as $topic) {
            $link = render_link('forum/show/'.(int)$topic['id'], $topic['title']);
            $out .= "<li>{$link}</li>";
        }
        return "<ul id=\"latest-topics\">{$out}</ul>";
    }

This is (of course) a trick question. Consider:
(もちろん)これは trick question です。次の例で考えてみましょう。

    $dest = htmlspecialchars($dest, ENT_QUOTES, 'UTF-8');
    echo render_link($dest_url, "<span>Go to <em>{$dest}</em> directly.</span>");

Either this second example is wrong and the tags will come out literally (i.e., as &lt;span&gt;...&lt;/span&gt;
in the HTML source), or the first example was wrong and you have an injection bug. You can't tell without
consulting render_link's API documentation or implementation. With many helper procedures, how can you keep
track of which accept fully formed HTML and which escape their input? What happens when a function which
auto-encodes suddenly needs to be changed to accept HTML?

タグは literally に出力される (つまり、HTML ソースの中で &lt;span&gt;...&lt;/span&gt; とな
る) 二番目の例が間違っているか、あるいは最初の例が間違っていて injection bug を抱えているかです。render_link
API のドキュメントと実装を確認しない限りは (コードが安全かどうかの) 判断を下すことはできません。多数の helper
関数がある中で、どうすれば accept fully formed HTML と escape their input とを track し続けられるでしょうか?
auto-encode する関数が突然 accept する HTML に変更を加えなければならなくなったときに何が起こるでしょうか?


This style of programming results in ad-hoc security. You add escaping in just the right places, decided
on a case-by-case basis. This is unsafe by default; you must remember to escape, which makes it error-prone.
It's also hard to spot mistakes in this style. The alternative to ad-hoc security is structural security:
a style which makes it virtually impossible to write insecure code by accident, thus eliminating entire
classes of vulnerabilities.

このプログラミングスタイルの結果は ad-hoc security になります。それは case-by-case でエスケープを挿入すべき
正しい場所を決めて挿入するというものです。このやり方はデフォルトでは安全ではなく、あなたはコードを
error-prone にするためにエスケープすることを覚えておかなければなりません。このスタイルにおいて間違いを指摘
するのは難しくもあります。ad-hoc security の alternative は structural security です。この structural
security とは、間違って insecure なコードを書いてしまうことを事実上不可能にして valnerabiolities の要因その
ものを取り除いてしまうスタイルです。


For example, in PHP we could use the DOM library to represent an HTML tree:
たとえば PHP では HTML tree を表現するのに DOM ライブラリを使えます:


    function get_latest_topicslist($document) {
        $ul = $document->createElement('ul');
        $ul->setAttribute('id', 'latest-topics');

        foreach(Forum::latestPosts(10) as $topic) {
            $title = $document->createTextNode($topic['title']);
            $link = get_link($document, 'forum/show/'.(int)$topic['id'], $title);

            $li = $document->createElement('li');
            $li->appendChild($link);
            $ul->appendChild($li);
        }
        return $ul;
    }

And the second example:

    $contents = $document->createElement('span');
    $contents->appendChild($document->createTextNode('Go to '));
    $em = $document->createElement('em');
    $em->appendChild($document->createTextNode($dest));
    $contents->appendChild($em);
    $contents->appendChild($document->createTextNode(' directly.'));
    $link = get_link($document, $dest_url, $contents);


Unfortunately, this code is very verbose. The stuff that really matters gets lost in the noise of DOM
manipulation. The advantage is that this is safe; text content cannot influence the tree structure, since
the type of every function argument is enforced to be a DOM object and string contents are automatically
XML-encoded on output.

残念ながらこのコードはとても冗長です。コードの本当に重要な部分は DOM 操作のノイズの中に埋もれてしまっています。
このやり方の advantage はそれが安全だということです。すべての関数の引数の型は DOM オブジェクトであることを強
制されていて、また、string contents は自動的に XML-encode されて出力されるので text content が tree structure
に影響することはありません。


Language design to the rescue!

Language design can help a great deal to improve security. For example, domain-specific languages like SXML
and SSQL can save the programmer from having to remember to escape while writing most "normal",
day-to-day code. This frees precious brain cycles to think about more essential things, like the program's
purpose. Here's the example again, using SXML:

言語設計は security の改善に大いに貢献します。たとえば SXML や SSQL のようなドメイン特化言語は、大部分の
"normal" な日々のコードを書くときに escape のことを気にしなければならないわずらわしさからプログラマーを助
けます。そしてそれはプログラマーの目的のような more essential things について考えるための precious brain
cycles を解放します。再度、SXMLを使った例を挙げましょう:


    (define (latest-topics-list)
      `(ul (@ (id "latest-topics"))
           ,(map (lambda (topic)
                   `(li ,(make-link `("forum" "show" (alist-ref 'id topic))
                                    (alist-ref 'title topic)))))
                 (forum-latest-posts 10)))

And the second example:

    (make-link destination-url `(span "Go to " (em ,destination) " directly."))

This code is safe from XSS, like the PHP DOM example. However, this code is (to a Schemer) just as readable
as the naive PHP version. And, most importantly, the safety is achieved without any effort from the programmer.

このコードはPHPのDOMを使った例と同じくXSSに対しては安全です。しかしこのコードは naive PHP version のものと
同じくらい読みやすいのです (Schemerにとっては)。さらに、最も重要なことはプログラマーの any effort なしに安
全性が achieve されているということです。


This shows the immense safety and security advantages that can be gained from language design. Of course,
this isn't completely foolproof: We still need to ensure URIs used in href attributes have an allowed
scheme like http: or ftp: and not, say, javascript:. Note that input filtering and sanitation can help in
situations like these! Also, just like with automatic escaping, strings in sub-languages (like JS or CSS)
aren't automatically escaped. However, there is less "magic" involved; this is a representation
for HTML, so it's obvious that only HTML meta-characters will be encoded. If we're also using DSLs for
sub-languages, this auto-escaping effect can be nested, solving the "context problem" in a way
automatic escaping cannot.

この例は言語設計から得られる計り知れないほど大きな安全性とセキュリティ上のadvantage を showしています。も
ちろん、これは完全な foolproof ではありません。href 属性で使われている URI が http: や ftp: のように許可
されている scheme であるか、あるいは javascript: のような許可されていないものであるかの保証を行う必要があ
ります。入力のフィルタリングや sanitation はこのようなシチュエーションにおいて助けになりうることに注意し
てください! また automatic escaping を使うことだけでは (JS や CSS のような) sub-language 中の文字列が自動
的にescape されることはありません。しかし there is less "magic" involved です。これは representation for
HTML なので、HTMLのメタキャラクターだけが encode されるであろうことは明白なのです。sub-language のために
DSLを使った場合その auto-escaping effect はネストする可能性があり、automatic escaping 手法では対応できない
"context problem" を解決します。


SXML rewards programmers for writing safe code by making it look clean, concise, and easy to write. String
splicing looks ugly and verbose in Scheme. In plain PHP this looks clean and simple, while DOM manipulation
looks ugly. This subtly guides programmers into writing unsafe code. However, there are some PHP libraries
that make safe code look clean. For example, Drupal has a "Forms API". It's a little ugly, but
it's idiomatic in Drupal, which means code that uses it is considered cleaner than code that doesn't.
Facebook is another attractive target for attackers, so they had to come up with a structural solution.
Their solution is a language extension called XHP which adds native support for HTML literals.

SXML は見かけを clean にし、一貫性を持たせ、記述を容易にして安全にコードを書かせることによってプログラマー
に rewards します。string splicing は Scheme では ugly かつ verbose です。plain PHP での string splicing は
clean かつ simple に見えますが、DOM 操作は ugly に見えます。このことは usafe なコードを書いてしまうことに
プログラマーを subtly (微妙に、巧妙に) に導いてしまいます。しかし、安全なコードを clean に見せる PHP ライ
ブラリがいくつか存在しています。たとえば Drupal は "Forms API" なるものを持っています。これは little ugly
ですが Drupal の中では idiomatic なもので、それを使っていないコードよりもつかっているコードのほうがより
cleand であるとみなされるものです。Facebook は攻撃者にとっての another attractive target であったので、彼
らは structural solution を行わなければなりませんでした。彼らのとった solution は、HTML リテラルに対する
native support を付加するXHP という名称の language extension です。


These solutions are all specific to some codebase, not part of basic PHP. A framework or an existing
codebase has "default" libraries, but when writing from scratch most programmers prefer to use
what's available in the base language. This means a language should only include libraries that are safe
by default. Otherwise, alternative safe libraries have to compete with the standard ones, which is an
unfair disadvantage!

これらの solution は part of basic PHP ではなく、すべて some codebase に specific なものです。framework や
既存の codebase は“default”のライブラリを持っていますが、0 から書き始めた場合、大部分のプログラマーは
base language で利用可能なものを好みます。これはつまり、プログラミング言語は safe by default なライブラリ
だけを include すべきだということです。さもなければ、alternativle safe libraries は (unsafeな) standard
ライブラリと競争しなければならないという unfair な disadvandage を持ってしまいます。


Sidestepping the SQL injection problem entirely

Even though it's possible to write safe code in almost any language if you try hard enough, the basic design
of a language itself subtly influences how people will program in it by default. Consider the following
example, using the Ruby PG gem:

try hard enough すればおおよそどんな言語であっても safe code を書くことは可能ですが、言語自身の basic design
はユーザーが (by default で)プログラムをどのように書くかに微妙に影響を及ぼします。Ruby PG gem を使った次のコ
ードで考えてみましょう:

    # This code is vulnerable to SQL injection if the variables store user input
    res = db.query("SELECT first, last FROM users "
                   "WHERE login = '#{login}' "
                   "AND customer = '#{customer}' "
                   "AND department = '#{department}'")

Here we're using string interpolation, which is the expansion of variable names within a string. We saw this
before, in PHP, but in Ruby you can drop back to the full language, which makes the safe solution a little
easier to write:

ここでは文字列に含まれる変数名を(その名前に対応する変数の内容に)展開する string interpolation というものを
使っています。string interpolation は以前も PHP の例を出していますが、Ruby では drop back to  the full language
が可能で、safe solution を次のようにちょっとだけ簡単にできます:


    # This code is safe
    res = db.query("SELECT first, last FROM users "
                   "WHERE login = '#{db.escape_string(login)}' "
                   "AND customer = '#{db.escape_string(customer)}' "
                   "AND department = '#{db.escape_string(department)}'")


Still, it looks uglier than the first example.
しかしまだ最初の例よりも uglier に見えます。


The documentation says the escape_string method is considered deprecated. That's because sidestepping the
problem entirely is much smarter than escaping. This is done by passing the user-supplied values
completely separate ("out of band") from the SQL command string. This way, the data can't
possibly influence the structure of the command. They are kept separate even in the network protocol, so
it is enforced all the way up into the server. As an added bonus, this is only slightly more verbose than
the naive version:

ドキュメントには escape_string メソッドは deprecated (非推奨) であると書かれています。それは sidestepping the
problem entirely が escaping よりもずっと賢いものだからです。この問題に対する sidestepping はユーザーから渡さ
れた値を SQL のコマンド文字列とは完全に切り離 ("out of band"にする) して渡すことによって行われます。このやり
方ではデータはコマンドの構造に影響を及ぼす可能性がありません。ユーザーからのデータと SQL コマンド文字列とはネ
ットワークプロトコルにおいても分離され続けるので、サーバーにおいてこの手法が行われることが強制されるのです。
As an added bonus, this is only slighly more verbose than the naive version:


    # This code is even safer
    res = db.query("SELECT first, last FROM users "
                   "WHERE login = $1 AND customer = $2 AND department = $3",
	           [login, customer, department])


This scales only to about a dozen parameters. With more, it becomes hard to mentally map the correct parameter
to the correct position. A DSL can do this automatically for you. For example, Microsoft's LinqToSQL language
extension seems to do this. SSQL currently auto-escapes, but it could transparently be changed to use
positional parameters.

このやり方は a dozen parameters に対してしか scale しません。さらに多くの parameters に対しては、個々の
parameter を正しい位置に map することが mentally に難しいものになります。DSL はこれを自動的に行ってくれます。
たとえば Microsoft の LinqToSQL language extension はこれを行っているように思われます。SSQL は現状
auto-escape を行っていますが、positional parameters を使うよう transparetly に変更できるようです。



Pervasive (in)security through (bad) design
===========================================

I'm not a native English speaker, so I looked up the word "interpolation" on Merriam-Webster:
わたしは native の English speaker ではないので、"interpolation" という単語を Merram-Webster で調べてみました:

    interpolate, transitive verb:
    To alter or corrupt (as a text) by inserting new or foreign matter

To corrupt, indeed!

Interpolation of user-supplied strings is rarely correct, and it puts almost any conceivable safe API at
a disadvantage by making the wrong thing easier and shorter to write than the right thing. Beginners, unaware
of the security risks, will try to use this "neat feature". It's put in there for a reason, right?
Some people are trying to fix string interpolation, which is a noble goal but I wouldn't expect this to be
adopted as the "native" string interpolation mechanism in a language any time soon.

user-supplied strings の interpolation が正しいことは滅多になくて、right thing を書くことよりも wrong thing
を書くことの方を容易かつ簡潔にしてしまうことで safe API と考えられるものほぼ全てを不利な状況に置いてしまいます。
security risks に無頓着な初心者はこの“neat feature”を使おうとするでしょう。It's put in there for reason,
right? 一部には string interpolation を修正しようとする人たちもいました。それは noble goal ではありますが、そ
れがある言語において即座に“native”な string interpolation 機構に適用されることをわたしは期待していませんでし
た。


The Ruby examples show the importance of good documentation and library design. The docs pointed us in the
right direction by marking the escape_string method as deprecated. Its good design is more apparent when
contrasting it with the MySQL gem. This has no support for positional arguments in query, having only
escape_string and prepare. The latter allows you to pass parameters separately, but it conflates value
separation with statement caching and has an unwieldy API. Finally, the docs are quite sparse. Taken
together, this all gently nudges developers into the direction of string interpolation by making that the
easiest way to do it. Much of this is due to the design of MySQL's wire protocol, which dictates the API
of the C library, which in turn guides the design of "high-level" libraries built on top of it.

この Ruby での例は優れたドキュメント、優れたライブラリ設計の重要性を明らかにしています。このドキュメントは
escape_string メソッドを非推奨 (deprecated) にしたことによって right direction をわたしたちに示しています。
その優れた設計は MySQL gem と対比したときに一層明確なものとなります。MySQL gem は positional arguments を
サポートしておらず、escape_string と prepare だけをサポートしています。このうち後者の prepare ではパラメ
ータを separately に渡すことが可能ですが、これは value separation with statement caching を合成してしまう
もので、また、an unwiedly なAPIを持っているのです。結局のところドキュメントが quite sparse なのです。Taken
together, これはすべて string interpolation を最も容易な方法にしてしまうことによって、developer たちをそち
らの direction へ gently に nudge しています。この原因は主として、top で built されている“high-level”ラ
イブラリの設計を guideしている C ライブラリの API を dictates している MySQL の wire protocol の設計に起
因するものです。


I think high-level libraries should strive to abstract away unsafe or painful aspects of the lower levels.
For example, the Chicken MySQL-client egg emulates value separation:

わたしは high-level ライブラリは abstract away unsafe もしくは painful aspects of the lower levels を strive
すべきだと考えています。たとえば Chicken MySQL-client の egg は value separation を emulate しています:

    (conn (string-append "SELECT first, last FROM users "
                         "WHERE login = '?login' "
                         "AND customer = '?cust' "
                         "AND department = '?dept'")
          `((?login . ,login) (?cust . ,customer) (?dept . ,department)))


Ruby's MySQL gem could easily have done this, but they chose to restrict themselves to making a thin layer
which maps closely to the C library.

Ruby の MySQL gem は同じことを簡単に行えましたが、Cライブラリに closely に map する layer を薄くするために
選択されたのは restrict themselves でした。


Not all is lost with crappy libraries: Abstractions can solve such problems at an even higher level. Rails
can safely pass query parameters via Arel, in a database-independent way, even though MySQL is one of the
back-ends. This is true for SQLAlchemy, PDO and many others.

crappy なライブラリによってすべてが失われたわけではありません。抽象化はそのような問題をより高いレベルでさえ
解決可能です。Arel を経由することで Rails は MySQL が one of the back-ends であった場合でもデータベース独立
の方法で安全に query パラメータを渡すことが可能です。This is true for SQLAlchemy, PDO and may others.


Other examples その他の例
=========================

This section will show more examples of the same bug. They can all be structurally solved in two simple
ways: Automatic escaping (by using proper data structures) or passing data separately from the control
language. But let's start with one where this won't work :)

このセクションでは more examples of the same bug を show します。それらのバグはすべて、二つの単純な手法に
よって structurally に解決可能です。その二つの手法とは automatic escaping と passing data separately from
the control language です。But let's start with one where this won't work :)


Poisoned NUL bytes 汚染された NUL バイト
========================================

As you may know, strings in the C language are pointers to a character array terminated by a NUL (ASCII 0)
byte. Many other languages represent strings as a pointer plus a length, allowing NUL "characters"
to simply occur in strings, with no special meaning.

ご存じかもしれませんが、C というプログラミング言語での文字列は NULバイト(ASCII の0) で終端された文字配列に
対するポインターです。他の多くのプログラミング言語では文字列はポインターと長さを併せて表現しています。そし
て文字列中に NUL「文字」を置くことも許していて、特別な意味を NUL 文字に持たせてはいません。


This representational mismatch can be a problem when calling C functions from these languages. In many
cases, a C character array of the length of the string plus 1 is allocated, the string contents are copied
from the "native" string to the array and a NUL byte is inserted at the end. This causes a
reinterpretation of the string's value if it contains a NUL byte, which opens up a potential vulnerability to a "poisoned" NUL byte attack.

このような representational mismatch はそれらのプログラミング言語から C で記述された関数を呼び出すときに問
題となる可能性があります。よくあるのは、文字列の長さに1加えた長さを持ち、その内容は“native”文字列から配
列にコピーされたものでさらに NUL byte が終端に存在している C の character array です。これはその文字列の値
の reinterpretation を引き起こします。それが NUL byte を保持していたときに “poisoned”NUL byte attack に
対する vunerability の potential を opens up してしまうのです。


Let's look at a toy example in Chicken Scheme:
Chiken Scheme での toy example を見てみましょう


    (define greeting "hello\x00, world!")

    (define calculate-length-in-c
      (foreign-lambda int "strlen" c-string))

    (print "Scheme says: " (string-length greeting))
    (print "C says: " (calculate-length-in-c greeting))

As far as Scheme is concerned, the NUL byte is perfectly legal and the string's length is 14, but for C, the
string ends after hello, which makes for a length of 5. There is no way in C to "escape" NUL bytes,
and we can't sidestep it here, either. Our only option is to raise an error:


この NUL バイトは Scheme においては完全に合法なものであり、文字列の長さは 14 です。しかし C ではこの文字列は
hello の直後で終了していて、長さは 5となってしまいます。C では NUL バイトを「エスケープ」する手段はありません。
また、ここで sidestep もできませんので、わたしたちが取り得る選択肢は error を raise することだけなのです。

    Scheme says: 14
 
    Error: (##sys#make-c-string) cannot represent string with
    NUL bytes as C string: "hello\x00, world!"

This is a good example of structural security; it doesn't matter whether the programmer is caffeine-deprived,
on a tight deadline or simply unaware of this particular vulnerability. He or she is protected from
accidentally making this mistake because it's handled at the boundary between C and Scheme, which is exactly
where it should be handled.

これは structural security の good example です。プログラマーが caffeine-deprived であるかどうか、あるいは
tight な deadline 上にいるか、特定の valunerrability に simply unawared だったかどうかは重要ではありません。
プログラマーはこの mistake を誤って行ってしまうことから protect されています。なぜなら、handle を実行すべき
C と Scheme の境界で handle されているからです。


HTTP response splitting/Header injection
========================================

HTTP response splitting and HTTP header injection are two closely related attacks, based on a single
underlying weakness.

HTTP レスポンスの splitting と HTTPヘッダーのインジェクションは single underlying weakness に基づく互いに
密接に関連した二つの攻撃です。

The idea is simple: HTTP (response) headers are separated by a CRLF combination. If user input ends up in
a header (like in a Location header for a redirect), this can allow an attacker to split a header in two
by putting a separator in it. Let's say that http://example.com/foo gets redirected to
http://example.com/blabla?q=foo.

そのアイデアは単純です。HTTP (のレスポンス)ヘッダーは CRLF combination によって分割されていますが、もし
ユーザーの入力が(たとえばリダイレクトのための Location ヘッダーのような) ヘッダーの中で終わっていたならば、
それは攻撃者にヘッダーの分割を許すことになります。例として

    http://www.example.com/abc%0d%0aSet-Cookie%3a%20SESSION%3dwhatever-i-want

で考えてみましょう。

An attacker can trick someone (or their browser) into following this link (%0d%0a is an URI-encoded CRLF pair):

攻撃では誰か(もしくはその人が使っているブラウザー)を以下のリンクを使って引っかけることができます
(%0d%0a は URI-encode された CRLF ペアです)。


This could cause the victim's session cookie for example.com to be overwritten:

    Location: http://www.example.com/blabla?q=abc
    Set-Cookie: SESSION=whatever-i-want

This is a session fixation attack. For this particular bug, the real solution is of course to properly
percent-encode the destination URI, but the general solution can be as simple as disallowing newlines in the
header-setting mechanism (e.g., PHP does this since 5.1.2). Doing it in the only procedure which is capable of
emitting headers is a structurally secure approach, but it won't protect against all attacks.

これは session fixation attack です。この particular bug に対する real solution はもちろん desitnation URI を
適切に percent-encode することです。しかしその general solution は header-setting mechanism における newline
の disallowing にまで単純にすることができます (e.g., PHP は5.1.2からこれを行っています)。emitting headers が
capable な precedure でのみこれを行うことは structurally secure approach ですが、しかしそれはすべての攻撃を防
ぐというものではありません。

For example, even if we disallow newlines it is still possible to set a parameter (attribute) or a second
value for a header, splitting it with a semicolon or a comma, respectively:

たとえば newline を禁じたとしても、ヘッダーに対してセミコロンやカンマで分割されることが期待されるパラメータ
(や属性)や second value をセットすることはまだ可能です。

    Accept: text/html;q=0.5, text/{user-type}

If this is done unsafely, extra content-types can be added. They can even be given preference:

もしこれが unsafely に行われた場合、extra content-types が付加される可能性があります。
それには preference を与えることすら可能です。

    Accept: text/html;q=0.5, text/plain;q=0.1, application/octet-stream;q=1.0

Protecting against these sorts of attacks can only be done with libraries which know each header's syntax and
use rich objects to represent them. This approach is taken by intarweb and Guile's HTTP library, and is
similar to representing HTML as a (DOM) tree. I'm not aware of any other libraries which use fully parsed
"objects" to represent HTTP header values.

この種の攻撃から防御することは、それぞれのヘッダーの構文とそれらを表現するために rich object を使用している
ライブラリでのみ可能です。この approach は intarweb や Guile の HTTPライブラリで採用されている HTML を (DOM
の) tree として表現する手法と似ています。I'm not aware of any other libraries which use fully parsed
"objects" to represent HTTP header values.


Running subprocesses サブプロセスの実行
=======================================

For some reason, often people use a procedure like system() to invoke subprocesses. It's the most convenient
way to quickly run a program just like you would from the command line. It will pass this string to the Unix
shell, which expands globs ("wildcards") and runs the program:

いくつかの理由で、サブプロセスを起動するためにユーザーは system() のような手続きをしばしば使用します。それ
はコマンドラインで起動するときのようにプログラムをさっと実行するもっとも convenient な方法です。このやり方は
UNIX シェルにワイルドカードを展開した結果と実行するプログラムを文字列として渡します。


    (system (sprintf "echo \"~A\"" input))  ;; UNSAFE:   byebye files"; rm -rf / " 


Several languages have specialized syntax for invoking the shell and putting the output in a string using
backticks, e.g., `echo hi`. The really bad part is that string interpolation is supported within the backtick
operator, e.g., `echo Hi, "{$name}"`. This is dangerous because the shell is yet another interpreter
with its own language, and we've learned by now that we shouldn't embed user input directly into a sublanguage.
Here too, string interpolation makes the wrong thing very convenient, which increases the risk of abuse and
bugs. After all, spaces and quotes are perfectly legal inside filenames, but when used with unsafely
interpolated parameters, they will cause errors.

いくつかのプログラミング言語ではシェルを起動してその出力を文字列に put する `echo in` のような bakcticks を使
った specialized syntax を持っています。reall bad part なのは backtick operator の中での string interpolation、
つまり `echo Hi, "${name}"` のような記述をサポートしている点です。これはとても危険です。なぜならシェルは独自
の言語を持ったもう一つのインタープリターであり、また、わたしたちはここまでユーザーからの入力を直接
sublanguage に埋め込むべきではないことを学んでいるからです。ここでもまた string interpolation は wrong thing
を very convenient にして risk of abuse and bugs を増大させています。結局のところ、スペースやクォートをファイ
ル名に使うのが perfectly legal なのにもかかわらず、それを unsafely に interpolated されたパラメータと共に使っ
たときにエラーを引き起こしてしまうのです。


It is possible to escape shell arguments, but it's very tricky: no two shells provide exactly the same command
language with the same meta-characters. Is your /bin/sh really bash, dash, ash, ksh or something else? It is
even unspecified whether the sh used is /bin/sh.

シェルの引数をエスケープすることは可能ですが、それは非常に tricky です。なぜなら同一のメタキャラクターをサポ
ートする全く同じコマンド言語を提供する二つのシェルというものは存在しないからです。あなたの使っている /bin/sh
は実際には bash か dash か ash か ksh かはたまたそれ以外の何かだったりしませんか? さらに言えば、sh が /bin/sh
であるかさえも unspefied なことなのです

However, a better approach is often available. Many programming languages offer an interface to one or more
members of the POSIX exec() function family. These allow passing the arguments to the program in a separate
array, and they don't go through the shell to invoke the program at all. This is faster and a lot more
secure.

しかし大抵の場合はより優れたアプローチが可能です。多くのプログラミング言語では POSIX の exec() ファミリーの
one or more members に対する interface を offer しています。これらの関数では separate array でプログラムを
引数に渡すことが可能で、また、プログラムを起動するのにシェルを経由しません。高速であるとのと同時にずっと
secureなのです。


    (use posix)
    ;; Accepts a list of arguments:
    (process "echo" (list "Hello, " first-name " " last-name))


By sidestepping the problem we've made it simpler, shorter than the system call above and safer, which
is our goal. In languages with string interpolation this will probably be slightly more verbose than
the system() version.

問題を sidestepping することにより前述のシステムコールを使ったものよりも単純で短く、さらにわたしたちの目
標であった安全なものにもしました。string interpolation を備えた言語ではこれはおそらく system() バージョン
よりも slightly more verbose なものになってしまうでしょう。


There is one small problem: by eliminating a call to the shell, we've also lost the ability to easily
construct pipelines. This can be done by calling several procedures, but this is way more complicated
than it is in the shell language. The obvious solution to that is to design a safe DSL. This is what the
Scheme Shell does with its notation for UNIX pipelines:

小さな問題が一つあります。call to the shell を eliminate することにより、わたしたちは pipeline を簡単に構
築できる ability も失ってしまうのです。pipleline の構築は calling several procedures によって行えますが、
これは shell 言語でやるよりもずっと複雑なやり方です。これに対する obvious solution が安全な DSL を設計す
るというものです。これは UNIX piplelines のための記法を使って Scheme Shell が行っていることです。


    ;; This will echo back the input, regardless of "special" characters
    (define output (run/string (| (echo input) (caesar 5) (caesar 21))))
    (display output)

Almost as convenient as the backtick operator, but without its dangers.
これは backtick 演算子とほぼ同じ利便性ですが、危険ではありません。



Summary まとめ
==============

Language design can help us write applications which are structurally secure. We should strive to make
writing the right thing easier than the wrong thing so that even naively written, quick and dirty code
has some chance of being safe. To reach this goal, we can use the following approaches, in roughly
decreasing order of safety:

プログラミング言語の設計はわたしたちが structurally secure なアプリケーションを記述するのを支援します。わ
たしたちはたとえ naivelyに書かれたコードであるとか quick and dirty なコードであっても some chance of being
safe があるような wrong thing を書いてしまうことよりも、right thing を書くことを容易にするように努力すべき
です。この目標に到達するために、わたしたちは次に挙げるようなアプローチがとれます。以下のものは安全度が高い
ものから並んでいます:


    "Sidestep" the issue by keeping data separated from commands.
    データをコマンドから隔離することにより問題を“sidestep”する


    Represent data in proper data structures, not strings. On output, escape where needed.
    データを文字列ではなく適切なデータ構造で表現する。出力時には必要なところでエスケープを行う。

    Use "safe buffers" which auto-escape concatenated strings.
    連結された文字列を自動エスケープする“safe-buffer”を使う


    If escaping or separation is impossible, raise an error on bad data.
    もしエスケープもseparation もできないのなら、error on bad data を raise する


    If all else fails you can escape manually, but use coding conventions that make unsafe code stand out.
    もし上記のことをすべてに失敗しても手作業でのエスケープができる。ただし、unsafe なコードを目立たせるよ
    うなコーディング規約を用いること


These approaches are your first line of defense. Besides using these, you should also filter and sanitize
your input. Just don't mistake that as the fix for injection vulnerabilities!

これらのアプローチは defense の first line です。これらのアプローチを用いる以外にも、入力に対するフィルタ
リングや sanitize を行うべきです。Just don't mistake that as the fix for injection vulnerabilities!


This is the positive advice I can give you. The negative advice is simply to avoid building language or
library features which make unsafe code easier to write than safe code. An example of such a feature is
string interpolation, which causes more harm than good.

以上はわたしができる positive advice です。negative adivce は単に言語やライブラリの、安全なコードを記述する
よりも安全でないコードを容易に作り出してしまうような機能を building してしまうことを避けなさいというものです。
そういった機能の例が、利益を得るよりも害を受けることの方が多い string interpolation です。

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License.

All code fragments on this site are hereby put in the public domain.

2013年06月23日

■_

オフィス北極星電書になってないんか喃。ホーム｜講談社コミックプラスこの辺で手を打つかしらん。 Amazon.co.jp: こちらもどうぞ: オフィス北極星全10巻完結(モーニングKC) [マーケットプレイスコミックセット] 東周英雄伝もか。

とある本を読んでいましたらパラダイムシフト(発想の転換) とかいう記述があって一気に読む気が失せたという。いくら何でも違うだろう…関係ないとは言わんけども。

とある100円ショップで。ポロシャツの襟立て+セカンドバックというバブル期そのまんまの出で立ちの男性が店員をつかまえてなにやら探している商品がその店にないことについてぐだぐだぶーたれていて、トドメが「もっと勉強しといた方が良いよ」って。なんつーか笑えた。

高層マンションがぽこぽこ建設されている武蔵小杉ですが、駅近くにあった JA (JA共済?) の建物が移転して真新しい建物に…ってあの辺もう耕作地とかないんじゃないの? (川沿いに元住吉の方に行けばまだあったかな) JA共済とは - コトバンク JAバンク - Wikipedia

■_

なんか気になった新刊・近刊。まあいちまんえんオーバーのはそうそう手が出せませんが○|￣|＿

■_

よーやっと http://www.iiitd.edu.in/~jalote/papers/CommonBugs.pdf を読んだ。が、ほぼ予想の範疇で意外なものというのはなかったかなあ。あと、LL やらきちんとした高水準言語ならこれは大丈夫だろうというのも。 coding standard (F-35 のアレで出てきた)と絡めて話をしてみたいんだけどあんま興味ある人いなさそうなのよねえ (「発表したい」とかゆー話ではありません)。 common bugs in programming - Google 検索

■_

Tech Thoughts: First steps with Hy

Tech Thoughts: First steps with Hy

Hy, write Python in Lisp

Hy is a dialect of Lisp that compiles immediately into a Python Abstract Syntax Tree. This allows the developer
to take advantage of the Lisp (lack of) syntax while putting all the powerful batteries included in Python at
her disposal.

I first got interested in Lisp when I started working my way through SICP. Among the many wonderful thing this
book introduced me to, was the language used throughout, Scheme. Scheme is amazing, but being minimal by
definition, it can get pretty tedious to get any non-trivial work done. On the other hand, I'm pretty familiar
with Python, its libraries and its native types. Having the ability to call these directly from a Lisp code has
immediately caught my attention.

In this article, I want to detail the first steps I do in Hy. We will work through a simple example, yet adding
some (minor) complexity that will show us how to leverage the awesomeness of Hy. I will assume prior
programming knowledge from my readers, as well as some familiarity with the Lisp parens-and-prefix notation (If
you don't, Wikipedia has a good article on it). Obviously, I'm also assuming prior familiarity with Python.

なんか面白そうと思ってリンク辿ってもソースのリポジトリもアーカイブも見あたらないので変だなあと思ったら

Tech Thoughts: First steps with Hy

A few things to keep in mind:

    The project is very young. Do not deploy this in production. Yet.
    Check out the documentation. It's young but good enough to get you started quickly.
    I don't know about other editors, but emacs users, you're going to need this to check out this one.

ぐはっ。

■_

author に$5.00ほど行くようにpayしてみた。 Lisp, the Universe and Everything: Free "Lisp Hackers" Ebook

■_ みっつの

~~しもべに命令~~

.NET Memory Profiling in Visual Studio 2013

The three most common mistakes made in .NET programming are:

    Memory leaks, usually via delegates or event handlers that weren’t properly released.
    Inefficient memory use, basically just holding onto more memory than necessary.
    Unnecessary allocations, a subtle problem that can become quite expensive over time.

Java だとどうなんだろうか。 Scala や Clojure、Groovy 辺りの JVM言語も含めたら。とか。

■_

2013年06月22日

■_

某方面でそこそこ権威のある資格試験の受験料を会社が負担する。但し合格したときのみ。そして合格しても給料に影響なし(昇給もないし別途手当が出るわけでもない)。さあこれで合格者(受験者)は増えるでしょうか?

ムダの効用みたいな感じでなにかこう書きたいものがあるのですが(ry

花屋で移植ポットに植えられていてしかも花まで咲いているひまわりを見かけてびっくりしたんですがそういうのあるんですねえ。いつ頃からだろう。ひまわり鉢植え - Google 検索種まき初心者のガーデニング～花別種まき育苗方法（ミニヒマワリ）～ミニひまわり小夏｜商品情報いろいろ検索｜タネ・苗・園芸用品・農業用資材の総合案内：サカタのタネ Amazon.co.jp: ミニヒマワリ小夏　サカタのヒマワリ種です: DIY・工具しかし、いくら何でも小さすぎではなかろうかという気が…

Bankers' rounding ってのがありますやね。日本語だと「銀行家の～」「銀行員の～」「銀行型～」ということになると思うんですがさすがに「銀行丸め」はないんじゃないかと… 銀行丸め - Google 検索「銀行の丸め」だったらありかなあ。

半額(apressは40%オフだっけか)セールの対象になってくんないかなあ Moving from C to C++

■_ Lambda cube

lambda cube - Google 検索

ラムダ・キューブ - Wikipedia Lambda cube - Wikipedia, the free encyclopedia Lambda Cube and programming languages | Lambda the Ultimate

■_

SaaS：スラムダンク・アズ・ア・スタートアップ
Business Media 誠：外食産業のバイトが足りない……各社はあの手この手で知恵を絞る (1/3)
clang+llvmでさりげなくすごいコードが生成されていた話。 - 組み込みの人。
これからのコードを書く者の技量として、どう書くとどう最適化が効くか、いかに最適化を邪魔しないようにコードを書くかということが問われるのかなと思いました。
つぎのようなC風のプログラムの値渡し参照渡し名前渡しのやり方を教えていた... - Yahoo!知恵袋
なえなえ | 普通に地デジチューナーを売る→一週間ぐらい毎日設定出来ないから来いって問い合わせで仕事にならない...
Buying Lisp Hackers by Vsevolod Dyomkin [Leanpub PDF/iPad/Kindle]
末期がん激痛の原因突き止めた…福岡大チーム(読売新聞) - 注目の社会トピックス - 国内 - livedoor ニュース
wishの意味とはー wishと過去分詞を一緒に使うと後悔の気持ちを英語で表せます | 英語 with Luke
(* programming -> girl -> ? *) | なぜ初心者でも関数型にトライしてみるべきか
2013/06/22 『レガシーコード改善ガイド』討論・品評会 #1 #lckaizen - Togetter
［虎四ミーティング］　駒田徳広（プロ野球解説者）＜後編＞「強運のプロ野球人生」 | スポーツ | 現代スポーツ | 現代ビジネス [講談社]
ただ、あの時期の迷いがあったからこそ、その後の自分があったのだと思いますね。自分にとって何が合っていて、何がダメなのかがわかりましたし、引き出しを増やすこともできたのかなと。

@node Dynamic Extensions
@chapter Writing Extensions for @command{gawk}

It is possible to add new functions written in C or C++ to @command{gawk} using
dynamically loaded libraries. This facility is available on systems
that support the C @code{dlopen()} and @code{dlsym()}
functions.  This @value{CHAPTER} describes how to create extensions
using code written in C or C++.

If you don't know anything about C programming, you can safely skip this
@value{CHAPTER}, although you may wish to review the documentation on the
extensions that come with @command{gawk} (@pxref{Extension Samples}),
and the information on the @code{gawkextlib} project (@pxref{gawkextlib}).
The sample extensions are automatically built and installed when
@command{gawk} is.

@quotation NOTE
When @option{--sandbox} is specified, extensions are disabled
(@pxref{Options}).
@end quotation

@menu
* Extension Intro::             What is an extension.
* Plugin License::              A note about licensing.
* Extension Mechanism Outline:: An outline of how it works.
* Extension API Description::   A full description of the API.
* Finding Extensions::          How @command{gawk} finds compiled extensions.
* Extension Example::           Example C code for an extension.
* Extension Samples::           The sample extensions that ship with
                                @code{gawk}.
* gawkextlib::                  The @code{gawkextlib} project.
@end menu

@node Extension Intro
@section Introduction

@node Extension API Description
@section API Description

This (rather large) @value{SECTION} describes the API in detail.

@menu
* Extension API Functions Introduction:: Introduction to the API functions.
* General Data Types::                   The data types.
* Requesting Values::                    How to get a value.
* Constructor Functions::                Functions for creating values.
* Registration Functions::               Functions to register things with
                                         @command{gawk}.
* Printing Messages::                    Functions for printing messages.
* Updating @code{ERRNO}::                Functions for updating @code{ERRNO}.
* Accessing Parameters::                 Functions for accessing parameters.
* Symbol Table Access::                  Functions for accessing global
                                         variables.
* Array Manipulation::                   Functions for working with arrays.
* Extension API Variables::              Variables provided by the API.
* Extension API Boilerplate::            Boilerplate code for using the API.
@end menu

@node Extension API Functions Introduction
@subsection Introduction

だいぶ力が入っているっぽい。

Bug #8552: ifやwhile構文の条件式中に多重代入文を入れると正しく解釈されません - ruby-trunk - Ruby Issue Tracking System
「人材採用に一貫性がない」ことがビッグデータで明らかに--グーグルがNYTに語る - CNET Japan
株式会社を退職しませんでした。 - ライフ・イズ・カルアミルク
今村博宣 - 深圳のお月様偶然、Twitterで生産現場の写真を見ちゃったんだけど・・・...
Scala: 空飛ぶサンドイッチのパーツ | eed3si9n
もし Scala をリアルワールドな制約である Java 的な親しみやすさや互換性を無視してワンダーランド的な設定でサブセットを作ったらどうなるだろうかと考えている。
Business Media 誠：牛丼並盛280円、その価格戦争に戦略的見通しはあるか