ときどきの雑記帖 迷宮編






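I feel like I nearly get hit by some recklessly ridden bicycle at least once a day. Just once, I'd like to hit back.

I have a faint memory that the manual for the first Sangokushi (the Koei game, that is) contained a sentence along the lines of "This game was developed using the C language". Was that really the case?


Japanese is hard. The difference between 校閲 (kōetsu) and 校正 (kōsei): "What is 校閲?" - Kotobank; "What is 査読?" - Kotobank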

■_ × 1.25

2011-05-19 - はてなるせだいあり

    <nurse> http://perldoc.jp/docs/perl/5.14.0/perl5140delta.pod#String32appending32is3210032times32faster

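    <nurse> Looks like Perl 5.14 has quite a few changes to things like Unicode upper/lower case handling

    <nurse> "String appending is 100 times faster", huh

    <unak> That's impressive

    <unak> I wonder what exactly they did.

    <unak> Hmm, hmm

    <unak> I see, a geometric series.

    <unak> Yeah, I don't get it at all

    {unak} minlen += (minlen >> PERL_STRLEN_EXPAND_SHIFT) + 10;

    <unak> Is this it, maybe?

    <unak> It's a bit hard to track down without fetching the source and grepping.

    <eban> I wonder what the +10 is for

    <unak> SvLEN() probably returns the capacity, and SvCUR() returns the length actually in use at the moment, I think.

    {unak} http://www.nntp.perl.org/group/perl.perl5.porters/2010/08/msg162611.html

    <unak> Hmm.

    <unak> So they take a quarter and add 10.

    <eban> Was there something inconvenient about a plain 1.25x?

    <eban> https://groups.google.com/group/perl.perl5.porters/browse_thread/thread/e383d31f079f6149

    <eban> This one might be the better link

    <eban> As for the +10: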

    <eban> That's similar to my analysis. I think it's more useful to round up to a

    <eban> multiple of the pointer size for small values.

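Multiplying by 1.25 would drag in floating-point arithmetic, whereas adding the value shifted right by 2 bits can be done with integers alone. That was my guess, but I wonder whether this is really a place where that makes a difference.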
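As a rough check on that reasoning, here is a minimal Python sketch of the growth rule quoted in the log. It assumes the shift is 2, which is what the "take a quarter" remark suggests; `grown_capacity` and `count_reallocations` are names made up for this illustration, not anything from the Perl source.

```python
EXPAND_SHIFT = 2  # assumption: stands in for PERL_STRLEN_EXPAND_SHIFT (a 1/4 increment)


def grown_capacity(minlen: int) -> int:
    """Integer-only 'about 1.25x, plus 10' growth, as in the quoted line."""
    return minlen + (minlen >> EXPAND_SHIFT) + 10


def count_reallocations(total_len: int, geometric: bool) -> int:
    """Append one byte at a time and count how often the buffer has to grow."""
    capacity = 0
    reallocs = 0
    for needed in range(1, total_len + 1):
        if needed > capacity:
            # Exact fit grows to just what is needed; the geometric rule over-allocates.
            capacity = grown_capacity(needed) if geometric else needed
            reallocs += 1
    return reallocs


if __name__ == "__main__":
    n = 100_000
    print("exact fit :", count_reallocations(n, geometric=False))
    print("1.25x + 10:", count_reallocations(n, geometric=True))
```

With exact-fit growth every single append reallocates, while the shift-and-add rule reallocates far less often, and no floating-point arithmetic is involved.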

■_ As a developer, my biggest gripe is non-standardize data from Excel.

As a developer, my biggest gripe is non-standardize data from Excel. Just found out about Google Refine, which seems to have a great system to fix that. : programming I noticed this thread had grown quite a bit, so I took a look, and it seems fairly interesting. In fact, I might be able to benefit from it myself?

google-refine - Google Refine, a power tool for working with messy data (formerly Freebase Gridworks) - Google Project Hosting

Google Refine is a power tool for working with messy data, cleaning it up, transforming
it from one format into another, extending it with web services, and linking it to
databases like Freebase.





intro/p6-regex-intro.pod at master from perlpilot/perl6-docs - GitHub

Introduction to Perl 6 Regex


Over the years programming languages have incorporated features for regular expressions.
Some, such as Javascript, have added syntax specifically to support regular expressions.
Others, such as PHP, have just reused their native string type and utilize special
subroutines to parse strings as regular expressions. But one thing almost all of them
have in common is that they have mimicked the extended regular expression syntax of Perl.

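Over the years, programming languages have taken in regular expression features. Some,
such as Javascript, added dedicated syntax to support them. Others, such as PHP, reused
their native string type and utilized special subroutines that parse such strings as
regular expressions. Almost all of them, though, imitate Perl's extended regular
expression syntax.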

Of course, Perl wasn't the first programming language to have support for regular 
expressions. But it did make them popular. Perl has been so successful as a text 
processing and glue language and regular expressions so well interwoven into the 
language that anyone who uses Perl almost has to learn regular expressions. Also, by 
applying some of Perl's philosophy to regular expressions, common usages became easy 
and complex usages became possible. Here are just a few features that resulted: 
character class shortcuts, annotated regular expressions, ability to match unicode 
properties, zero-width assertions, independent subexpressions, and code execution 
inside of a regular expression.

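Of course, Perl was not the first programming language to support regular expressions,
but it did make them mainstream. Perl has been so successful as a text-processing and
glue language, with regular expressions so woven into it, that anyone who uses Perl more
or less has to learn them. Features such as character class shortcuts, annotated regular
expressions, matching on Unicode properties, zero-width assertions, independent
subexpressions, and execution of code placed inside a regular expression came out of this.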

Unfortunately, as the regular expressioning public put more demand on Perl's regular 
expression syntax, it accumulated some crufty items-- little inconsistencies that were 
to maintain backward compatibility or were introduced because they were needed, but 
before they were fully thought out. In designing Perl 6, Larry Wall not only looked at 
the syntax and semantics of Perl proper, but he also took a hard look at the 
sub-language that is regular expressions and refactored it into something that makes 
better sense.

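Unfortunately, as demands on Perl's regular expression syntax grew, it picked up cruft
along the way. In designing Perl 6, Larry Wall looked not only at the syntax and semantics
of Perl itself but also took a hard look at regular expressions as a sub-language, and
refactored them into something better.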

In this article I'm going to give an introduction to Perl 6 regex (we call them "regex"
to maintain the historical association with regular expressions though they've strayed
quite far from the mathematical sense of regular languages). I'll point out differences
from Perl 5 syntax but no knowledge of Perl 5's regular expression syntax should be
necessary to understand this document. If you're a Perl 5 geek, you may be bored for
a while, but read anyway so that you can pick up the syntactic and semantic differences.

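This article introduces Perl 6 regex. I will point out differences from Perl 5 syntax,
but no knowledge of Perl 5's regular expression syntax should be needed to follow along.
If you are a Perl 5 geek you may be a little bored at times, but read on anyway to pick
up the syntactic and semantic differences.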


Is R an ideal language to teach the fundamentals of programming to beginners? | (R news & tutorials)

Is R an ideal language to teach the fundamentals of programming to beginners?
Is R an ideal language for teaching beginners the fundamentals of programming?

May 6, 2011
By hayward

I'm helping out some colleagues learn programming from having zero experience with it in any
shape or form. It's quite a daunting task in some senses, because, well, it may not be easy!
They are researchers, so they'll need it for processing data and generating output, and
perhaps processing BIG DATA at some point too.

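I am helping some colleagues who have no programming experience whatsoever learn to
program. In some ways it is quite a daunting task because, well, it may not be easy! They
are researchers, so they need it for processing data and generating output, and perhaps
for processing BIG DATA at some point too.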

After some debate about the best way to go ahead, I've settled with R as being my weapon of
choice to train these lucky individuals. The choices were as follows; note that I don't know
that many programming languages, so it's not a huge list. I thought it would be worth sharing
the pros and cons of each.

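After some debate about the best way forward, I settled on R as my weapon of choice for
training these lucky individuals. The choices were as follows. Note that I don't know
that many programming languages, so it is not a huge list. I thought it worth sharing the
pros and cons of each.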


Pros: Dead easy to use. Nice and easy integration with databases which can be used to
deal with data processing. Can be extended to, for example, generate images (a plus
for these people who study visual cognition, so often need to make pretty pictures to
show to participants in experiments). There's also an immense number of tutorials and
guides on the net, and people who aren't into research can help you out just by
knowing their PHP.

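Pros: Dead easy to use. Integrates nicely and easily with databases that can be used for
data processing. Can be extended to, for example, generate images (a plus for these
people, who study visual cognition and so often need to make pretty pictures to show to
participants in experiments). There are also an immense number of tutorials and guides on
the net, and people outside research may be able to help you with their knowledge of PHP.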

Cons: Probably overkill. Running a webserver all the time can be a pain, even if XAMPP
is used. It's not easy (or even possible, as far as I am aware) to run statistical
tests using PHP or any classes that can be added in.

Cons: Probably overkill. Keeping a web server running all the time can be a pain, even
when XAMPP is used. Running statistical tests with PHP, or with any classes that can be
added to it, is not easy (or, as far as I know, even possible).


Pros: Forces users to write clean code, and again it's very easy to use. Possible to
integrate with databases to churn through datasets. Like PHP, it can be used to
generate images for use in experiments (pygame), and again there are plenty of
examples and tutorials. Plenty of extensions to do stats and plot graphs (NumPy and
Matplotlib). Oh, and it's named after Monty Python. Ni.

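Pros: Forces users to write clean code, and again it is very easy to use. It can be
integrated with databases to churn through datasets. Like PHP, it can be used to generate
images for use in experiments (pygame), and again there are plenty of examples and
tutorials. There are plenty of extensions for doing statistics and plotting graphs (NumPy
and Matplotlib). Oh, and it is named after Monty Python. Ni.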

Cons: again, probably overkill. Forcing people to worry about indentation can get
horribly confusing when they are barely aware of what they are doing, and they can get
tripped up. Just a personal issue I guess, but I've not quite managed to get to grips
with OOP in python. Maybe that's because I did it first in PHP and never could do more
than crash my computer when trying to learn Java. Ho hum.

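Cons: Again, probably overkill. Forcing people to worry about indentation can get horribly
confusing when they barely know what they are doing, and it can trip them up. This is
probably just a personal issue, but I have never quite got to grips with OOP in Python.
Maybe that is because I first did it in PHP, and never managed more than crashing my
computer when trying to learn Java. Ho hum.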


Pros: Easy syntax, and its power is growing with the new HTML 5 specifications. I mention it
because I recently saw this illustration of basic programming and it seemed worth considering.
There's no need to compile anything which is often good for beginners too.

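Pros: Easy syntax, and its power is growing along with the new HTML 5 specifications. I
mention it because I recently saw this illustration of basic programming and it seemed
worth considering. There is also no need to compile anything, which is often good for
beginners.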

Cons: not really intended for churning big datasets and the kind of things I have in
mind. Quite a bit of the decent libraries out there need to be paid for to be used.

Cons: Not really intended for churning through big datasets or the kinds of things I have
in mind. Quite a few of the decent libraries out there have to be paid for.


Pros: syntax is very simple, with few gotchas present in other languages (e.g., ending
lines with a semicolon or forcing tabs in lines and so on). As it's loosely typed,
this can be both a blessing and a curse. It's a blessing because users don't have to
worry about declaring variables. It's a curse because they can slip into bad habits
and not understand variable types properly. Oh, and I don't need to say that it can
work on all sorts of databases, churn through data very rapidly, generate images, run
statistical tests and plot graphs that are of publication quality.

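Pros: The syntax is very simple, with few of the gotchas found in other languages (such as
ending lines with a semicolon or forcing tabs in lines). Its loose typing can be both a
blessing and a curse. It is a blessing because users do not have to worry about declaring
variables. It is a curse because they can slip into bad habits and fail to understand
variable types properly. And it goes without saying that it works with all sorts of
databases, churns through data very rapidly, generates images, runs statistical tests, and
plots graphs of publication quality.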

Cons: Had to really think about this, but I guess that R is a nightmare to google for
any kind of help when you're stuck. I think it's a fundamental issue relating to the
fact that calling something a letter of the alphabet probably doesn't help SEO
rankings all that much. The official documentation would benefit from being a bit more
like the PHP documentation (though maybe there is a site like that for R, I've just
not found it), with users able to comment and give better examples than those provided
initially. That being said, there are more blogs on R than you can shake even a very
large proverbial stick at, which more than make up for it. I always search the
legendary R-bloggers.com search box before googling anything to do with R now. I've
never had to look any further than that.

Cons: I really had to think about this one, but I suppose R is a nightmare to google when
you are stuck and need help. I think the fundamental issue is that naming something after
a single letter of the alphabet probably does not do much for its SEO rankings. The
official documentation would benefit from being a bit more like the PHP documentation
(though maybe such a site exists for R and I just have not found it), where users can
comment and supply better examples than the ones originally provided. That being said,
there are more blogs on R than you can shake even a very large proverbial stick at, which
more than makes up for it. These days I always try the legendary R-bloggers.com search box
before googling anything to do with R. I have never had to look any further than that.

Is R an ideal language to teach the fundamentals of programming to beginners?

I think the answer is “yes”. The beginners I have in mind are researchers and have
specific needs regarding data processing, and it would benefit them to learn how to
run stats in R, opening up future possibilities as well (e.g., LMEs). I've not
mentioned Matlab, which I know is a favourite for researchers, because (1) it's a
gigantic monster to download and install, (2) I don't know it that well and (3) it's
prohibitively expensive. I was also tempted to evaluate the use of LOLCODE to see if
there was any mileage in using it (“IM IN YR LOOP UPPIN YR VAR TIL BOTH SAEM VAR”).

I think the answer is "yes". The beginners I have in mind are researchers with specific
data-processing needs, and they would benefit from learning how to run stats in R, which
also opens up future possibilities (e.g., LMEs). I did not mention Matlab, which I know is
popular with researchers, because
(1) it is a gigantic monster to download and install,
(2) I do not know it that well, and
(3) it is prohibitively expensive.
I was also tempted to evaluate LOLCODE to see if there was any mileage in using it.

I myself first dabbled in programming back when I had a Sinclair back in the old days,
and we did some very basic BASIC at primary school. Later on, I used BASIC to make
emulators that mimicked my friends' phrases and behaviour. Some of them were spot on!
I guess I've always been trying to model human behaviour. I'll post up the material I
use to teach my colleagues to help them out and have a permanent copy of the material
we go through.

I first dabbled in programming back in the old days when I had a Sinclair, and we did some
very basic BASIC at primary school. Later I used BASIC to make emulators that mimicked my
friends' phrases and behaviour. Some of them were spot on! I suppose I have always been
trying to model human behaviour. I will post the material I use to teach my colleagues,
both to help them out and to keep a permanent copy of what we go through.

That's it for now, please feel free to share any other languages you may have found to
be good for beginners. I'm sure there are some things that I have missed.





かみちゅ! 大全ちゅ~!


よくわかる現代魔法 2 ガーベージコレクター (集英社スーパーダッシュ文庫)

How Garbage Collection differs in the three big JVMs | Application Performance, Scalability and Architecture - The dynaTrace Blog

How Garbage Collection differs in the three big JVMs
How garbage collection differs among the three major JVMs

by Michael Kopp, May 11, 11

Most articles about Garbage Collection ignore the fact that the Sun Hotspot JVM is not
the only game in town. In fact whenever you have to work with either IBM WebSphere or
Oracle WebLogic you will run on a different runtime. While the concept of Garbage
Collection is the same, the implementation is not and neither are the default settings
or how to tune it. This often leads to unexpected problems when running the first load
tests or in the worst case when going live. So let's look at the different JVMs, what
makes them unique and how to ensure that Garbage Collection is running smooth.

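Most articles about Garbage Collection ignore the fact that the Sun Hotspot JVM is not the
only game in town. In fact, whenever you work with IBM WebSphere or Oracle WebLogic you
will be running on a different runtime. While the concept of Garbage Collection is the
same, the implementation is not, and neither are the default settings or the way to tune
it. This often leads to unexpected problems in the first load tests or, in the worst case,
when going live. So let's look at the different JVMs, what makes each one unique, and how
to make sure Garbage Collection runs smoothly.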

The Garbage Collection ergonomics of the Sun Hotspot JVM
Garbage Collection usability in the Sun Hotspot JVM

Everybody believes to know how Garbage Collection works in the Sun Hotspot JVM, but
let's take a closer look for the purpose of reference.

Everybody believes they know how Garbage Collection works in the Sun Hotspot JVM, but
let's take a closer look for reference.

The memory model of the Sun Hotspot JVM
The Sun Hotspot JVM's memory model

The Generational Heap

The Hotspot JVM is always using a Generational Heap. Objects are first allocated in
the young generation, specifically in the Eden area. Whenever the Eden space is full a
young generation garbage collection is triggered. This will copy the few remaining
live objects into the empty survivor space. In addition objects that have been copied
to Survivor in the previous garbage collection will be checked and the live ones will
be copied as well. The result is that objects only exist in one survivor, while eden
and the other survivor is empty. This form of Garbage Collection is called copy
collection. It is fast as long as nearly all objects have died. In addition allocation
is always fast because no fragmentation occurs. Objects that survive a couple of
garbage collections are considered old and are promoted into the Tenured/Old space.

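The Hotspot JVM always uses a Generational Heap. Objects are first allocated in the young
generation, specifically in the Eden area. Whenever the Eden space is full, a young
generation garbage collection is triggered. It copies the few remaining live objects into
the empty survivor space. In addition, objects that were copied to Survivor in the
previous garbage collection are checked, and the live ones are copied as well. As a
result, objects exist in only one survivor, while Eden and the other survivor are empty.
This form of Garbage Collection is called copy collection. It is fast as long as nearly
all objects have died, and allocation is always fast because no fragmentation occurs.
Objects that survive a couple of garbage collections are considered old and are promoted
into the Tenured/Old space.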
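
The copy collection described above can be sketched roughly as follows. This is a toy model under stated assumptions, not Hotspot's implementation: liveness is handed in as a set of names instead of being traced from roots, and `PROMOTE_AFTER`, `Obj`, and `YoungGen` are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass, field

PROMOTE_AFTER = 2  # assumption: promote after surviving this many young collections


@dataclass
class Obj:
    name: str
    age: int = 0


@dataclass
class YoungGen:
    eden: list = field(default_factory=list)
    survivor_from: list = field(default_factory=list)  # the occupied survivor
    survivor_to: list = field(default_factory=list)    # the empty survivor
    tenured: list = field(default_factory=list)

    def collect(self, live_names: set) -> None:
        """Copy live objects out of Eden and the occupied survivor space."""
        for obj in self.eden + self.survivor_from:
            if obj.name not in live_names:
                continue  # dead objects are simply never copied
            obj.age += 1
            if obj.age >= PROMOTE_AFTER:
                self.tenured.append(obj)      # old enough: promote
            else:
                self.survivor_to.append(obj)  # still young: copy to the empty survivor
        # Eden and the previously occupied survivor are now empty; swap survivor roles.
        self.eden = []
        self.survivor_from, self.survivor_to = self.survivor_to, []
```

The cost of `collect` depends only on the objects that are still alive, which is why this style of collection is cheap when nearly everything in the young generation has died.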

Tenured Generation GCs

The Mark and Sweep algorithms used in the Tenured space are different because they do
not copy objects. As we have seen in one of my previous posts garbage collection takes
longer the more objects are alive. Consequently GC runs in tenured are nearly always
expensive which is why we want to avoid them. In order to avoid GCs we need to ensure
that objects are only copied from Young to Old when they are permanent and in addition
ensure that the tenured does not run full. Therefore generation sizing is the single
most important optimization for the GC in the Hotspot JVM. If we cannot prevent
objects from being copied to Tenured space once in a while we can use the Concurrent
Mark and Sweep algorithm which collects objects concurrent to the application.

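The Mark and Sweep algorithms used in the Tenured space are different because they do not
copy objects. As shown in one of my previous posts, garbage collection takes longer the
more objects are alive. Consequently, GC runs in tenured are nearly always expensive,
which is why we want to avoid them. To avoid them we need to ensure that objects are
copied from Young to Old only when they are permanent, and also that the tenured space
does not fill up. Generation sizing is therefore the single most important optimization
for GC in the Hotspot JVM. If we cannot prevent objects from being copied to the Tenured
space once in a while, we can use the Concurrent Mark and Sweep algorithm, which collects
objects concurrently with the application.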

Comparison of the different Garbage Collector Strategies

While that shortens the suspensions it does not prevent them and they will occur more
frequently. The Tenured space also suffers from another problem, fragmentation.
Fragmentation leads to slower allocation, longer sweep phases and eventually out of
memory errors when the holes get too small for big objects.

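While that shortens the suspensions, it does not prevent them, and they will occur more
frequently. The Tenured space also suffers from another problem: fragmentation.
Fragmentation leads to slower allocation, longer sweep phases, and eventually out of
memory errors when the holes become too small for big objects.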

Java Heap before and after compacting
The Java heap before and after compacting

This is remedied by a compacting phase. The serial and parallel compacting GC perform
compaction for every GC run in the Tenured space. Important to note is that, while the
parallel GC performs compacting every time, it does not compact the whole Tenured heap
but just the area that is worth the effort. Worth the effort means when the heap has
reached a certain level of fragmentation. In contrast, the Concurrent Mark and Sweep
does not compact at all. Once objects cannot be allocated anymore a serial major GC is
triggered. When choosing the concurrent mark and sweep strategy we have to be aware of
that side effect.

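This is remedied by a compacting phase. The serial compacting GC and the parallel
compacting GC perform compaction on every GC run in the Tenured space. It is important to
note that, although the parallel GC compacts every time, it does not compact the whole
Tenured heap but only the area that is worth the effort, meaning the heap has reached a
certain level of fragmentation. In contrast, the Concurrent Mark and Sweep does not
compact at all. Once objects can no longer be allocated, a serial major GC is triggered.
We have to be aware of that side effect when choosing the concurrent mark and sweep
strategy.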
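
To make the fragmentation problem and the effect of compacting concrete, here is a small sketch that models the heap as a list of cells, where `None` means free. The cell model and the helper names `largest_hole` and `compact` are assumptions for illustration only; no real JVM lays out its heap this way.

```python
def largest_hole(heap: list) -> int:
    """Length of the longest run of free (None) cells."""
    best = run = 0
    for cell in heap:
        run = run + 1 if cell is None else 0
        best = max(best, run)
    return best


def compact(heap: list) -> list:
    """Slide live cells to the front so all free space becomes one hole."""
    live = [cell for cell in heap if cell is not None]
    return live + [None] * (len(heap) - len(live))


if __name__ == "__main__":
    heap = ["a", None, "b", None, "c", None, "d", None]
    print("free cells       :", heap.count(None))
    print("largest hole     :", largest_hole(heap))
    print("after compacting :", largest_hole(compact(heap)))
```

Half of this heap is free, yet an object needing two adjacent cells cannot be placed until the live cells have been moved together.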

The second big tuning option is therefore the choice of the right GC strategy. It has
big implications for the impact the GC has on the application performance. The last
and least known tuning option is around fragmentation and compacting. The Hotspot JVM
does not provide a lot of options to tune it, so the only way is to tune the code
directly and reduce the number of allocations.

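The second big tuning option is therefore the choice of the right GC strategy. It has big
implications for the impact GC has on application performance. The last and least known
tuning option concerns fragmentation and compacting. The Hotspot JVM does not offer many
options for tuning it, so the only way is to tune the code directly and reduce the number
of allocations.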

There is another space in the Hotspot JVM that we all came to love over the years, the
Permanent Generation. It holds classes and string constants that are part of those
classes. While Garbage Collection is executed in the permanent generation, it only
happens during a major GC. You might want to read up what a Major GC actually is, as
it does not mean a Old Generation GC. Because a major GC does not happen often and
mostly nothing happens in the permanent generation, many people think that the Hotspot
JVM does not do garbage collection there at all.

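There is one more space in the Hotspot JVM, one we have all come to love over the years:
the Permanent Generation. It holds classes and the string constants that are part of
those classes. Garbage Collection is executed in the permanent generation, but only
during a major GC. You may want to read up on what a Major GC actually is, because it
does not mean an Old Generation GC. Since a major GC does not happen often, and mostly
nothing happens in the permanent generation, many people think the Hotspot JVM does no
garbage collection there at all.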

Over the years all of us run into many different forms of the OutOfMemory situations
in PermGen and you will be happy to hear that Oracle intends to do away with it in the
future versions of Hotspot.

Over the years all of us have run into OutOfMemory situations in PermGen in many different
forms, and you will be happy to hear that Oracle intends to do away with it in future
versions of Hotspot.

Oracle JRockit

Now that we had a look at Hotspot, let us look at the difference in the Oracle JRockit.
JRockit is used by Oracle WebLogic Server and Oracle has announced that it will merge
it with the Hotspot JVM in the future.

Now that we have looked at Hotspot, let us look at how Oracle JRockit differs.
JRockit is used by Oracle WebLogic Server, and Oracle has announced that it will merge it
with the Hotspot JVM in the future.

Heap Strategy

The biggest difference is the heap strategy itself. While Oracle JRockit does have a
generational heap it also supports a so called continuous heap. In addition the
generational heap looks different as well.

The biggest difference is the heap strategy itself. While Oracle JRockit does have a
generational heap, it also supports a so-called continuous heap. In addition, the
generational heap itself looks different.

Heap of the Oracle JRockit JVM
The heap of the Oracle JRockit JVM

The Young space is called Nursery and it only has two areas. When objects are first
allocated they are placed in a so called Keep Area. Objects in the Keep Area are not
considered during garbage collection while all other objects still alive are
immediately promoted to tenured. That has major implications for the sizing of the
Nursery. While you can configure how often objects are copied between the two
survivors in the Hotspot JVM, JRockit promotes objects in the second Young Generation GC.

The Young space is called the Nursery, and it has only two areas. When objects are first
allocated they are placed in a so-called Keep Area. Objects in the Keep Area are not
considered during garbage collection, while all other objects that are still alive are
promoted to tenured immediately. That has major implications for the sizing of the
Nursery. Whereas in the Hotspot JVM you can configure how often objects are copied between
the two survivors, JRockit promotes objects in the second Young Generation GC.

In addition to this difference JRockit also supports a completely continuous Heap that
does not distinguish between young and old objects. In certain situations, like
throughput orientated batch jobs, this results in better overall performance. The
problem is that this is the default setting on a server JVM and often not the right
choice. A typical Web Application is not throughput but response time orientated and
you will need to explicitly choose the low pause time garbage collection mode or a
generational garbage collection strategy.

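Beyond this difference, JRockit also supports a completely continuous Heap that does not
distinguish between young and old objects. In certain situations, such as
throughput-oriented batch jobs, this gives better overall performance. The problem is that
it is the default setting on a server JVM and often not the right choice. A typical Web
Application is oriented toward response time rather than throughput, so you will need to
explicitly choose the low pause time garbage collection mode or a generational garbage
collection strategy.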

Mostly Concurrent Mark and Sweep

If you choose Concurrent Mark and Sweep strategy you should be aware about a couple of
differences here as well. The mostly concurrent mark phase is divided into four parts:

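If you choose the Concurrent Mark and Sweep strategy, you should be aware of a couple of
differences here as well. The mostly concurrent mark phase is divided into four parts: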

    * Initial marking, where the root set of live objects is identified. This is done
      while the Java threads are paused.
      Here the root set of live objects is identified.
      This happens while the Java threads are paused.

    * Concurrent marking, where the references from the root set are followed in order
      to find and mark the rest of the live objects in the heap. This is done while the
      Java threads are running.

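      Here the references from the root set are followed to find and mark the rest of
      the live objects in the heap. This happens while the Java threads are running.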

    * Precleaning, where changes in the heap during the concurrent mark phase are
      identified and any additional live objects are found and marked. This is done while
      the Java threads are running.
      Here changes made to the heap during the concurrent mark phase are identified, and
      any additional live objects are found and marked. This happens while the Java
      threads are running.

    * Final marking, where changes during the precleaning phase are identified and any
      additional live objects are found and marked. This is done while the Java threads
      are paused.
      Here changes made during the precleaning phase are identified, and any additional
      live objects are found and marked. This happens while the Java threads are paused.
The sweeping is also done concurrent to your application, but in contrast to Hotspot
in two separate steps. It is first sweeping the first half of the heap. During this
phase threads are allowed to allocate objects in the second half. After a short
synchronization pause the second half is swept. This is followed by another short
final synchronization pause. The JRockit algorithm therefore stops more often than the
Sun Hotspot JVM, but the remark phase should be shorter. Unlike the Hotspot JVM you
can tune the CMS by defining the percentage of free memory that triggers a GC run.

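The sweeping is also done concurrently with your application, but, in contrast to Hotspot,
in two separate steps. First the first half of the heap is swept. During this phase
threads are allowed to allocate objects in the second half. After a short synchronization
pause the second half is swept, followed by another short final synchronization pause.
The JRockit algorithm therefore stops more often than the Sun Hotspot JVM, but the remark
phase should be shorter. Unlike in the Hotspot JVM, you can tune the CMS by defining the
percentage of free memory that triggers a GC run.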


The JRockit does compacting for all Tenured Generation GCs, including the Concurrent
Mark and Sweep. It does so in an incremental mode for portions of the heap. You can
tune this with various options like percentage of heap that should be compacted each
time or how many objects are compacted at max. In addition you can turn off compacting
completely or force a full one for every GC. This means that compacting is a lot more
tunable in the JRockit than in the Hotspot JVM and the optimum depends very much on
the application itself and needs to be carefully tested.

JRockit compacts in all Tenured Generation GCs, including the Concurrent Mark and Sweep.
It does so incrementally, for portions of the heap. You can tune this with various
options, such as the percentage of the heap that should be compacted each time or the
maximum number of objects compacted. You can also turn compacting off completely or force
a full compaction on every GC. Compacting is thus far more tunable in JRockit than in the
Hotspot JVM, and because the optimum depends heavily on the application itself, it needs
to be tested carefully.

Thread Local Allocation

Hotspot does use thread local allocation, but it is hard to find anything in the
documentation about it or how to tune it. The JRockit uses this on default. This
allows threads to allocate objects without any need for synchronization, which is
beneficial for allocation speed. The size of a TLA can be configured and a large TLA
can be beneficial for applications where multiple threads allocate a lot of objects.
On the other hand a too large TLA can lead to more fragmentation. As a TLA is used
exclusively by one thread, the size is naturally limited by the number of threads.
Thus both decreasing and increasing the default can be good or bad depending on your
applications architecture.

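Hotspot does use thread local allocation, but it is hard to find anything about it, or
about how to tune it, in the documentation. JRockit uses it by default. It lets threads
allocate objects without any synchronization, which benefits allocation speed. The size of
a TLA is configurable, and a large TLA can be beneficial for applications in which
multiple threads allocate a lot of objects. On the other hand, a TLA that is too large can
lead to more fragmentation. Because a TLA is used exclusively by one thread, its size is
naturally limited by the number of threads. Decreasing or increasing the default can
therefore be good or bad, depending on your application's architecture.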
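
A minimal sketch of the idea behind thread local allocation, assuming a simple bump-pointer heap: each thread reserves a chunk from the shared heap under a lock and then allocates inside that chunk with no synchronization at all. `Heap`, `ThreadLocalArea`, and the lock counter are hypothetical names for illustration and do not correspond to any JVM API.

```python
import threading


class Heap:
    """Shared heap: reserving space from it requires taking the lock."""

    def __init__(self, size: int):
        self.size = size
        self.top = 0
        self.lock = threading.Lock()
        self.lock_acquisitions = 0  # counted only to show how rarely we synchronize

    def reserve(self, nbytes: int) -> int:
        with self.lock:
            self.lock_acquisitions += 1
            if self.top + nbytes > self.size:
                raise MemoryError("heap exhausted")
            start = self.top
            self.top += nbytes
            return start


class ThreadLocalArea:
    """Chunk owned by one thread: bump-pointer allocation without locking."""

    def __init__(self, heap: Heap, tla_size: int):
        self.heap = heap
        self.tla_size = tla_size
        self.start = self.end = 0  # empty until the first allocation

    def allocate(self, nbytes: int) -> int:
        if nbytes > self.tla_size:
            # Assumption of this sketch: oversized requests go straight to the heap.
            return self.heap.reserve(nbytes)
        if self.start + nbytes > self.end:
            # Chunk used up: whatever is left over is abandoned (a source of
            # fragmentation) and a fresh chunk is reserved under the lock.
            self.start = self.heap.reserve(self.tla_size)
            self.end = self.start + self.tla_size
        address = self.start
        self.start += nbytes
        return address
```

A larger `tla_size` means fewer trips to the lock but more abandoned space per thread, which is the trade-off the paragraph above describes.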

Large and small objects

The JRockit differentiates between large and small objects during allocation. The
limit for when an object is considered large depends on the JVM version, the heap size,
the garbage collection strategy and the platform used. It is usually somewhere between
2 and 128 KB. Large objects are allocated outside the thread local area and, in the case
of a generational heap, directly in the old generation. This makes a lot of sense when you
start thinking about it. The young generation uses a copy collection. At some point
copying an object becomes more expensive than traversing it in every garbage collection.

JRockit treats large and small objects differently during allocation. The limit beyond
which an object counts as large depends on the JVM version, the heap size, the garbage
collection strategy, and the platform. It is usually somewhere between 2 and 128 KB. Large
objects are allocated outside the thread local area and, in the case of a generational
heap, directly in the old generation.

This makes a lot of sense once you think about it. The young generation uses a copy
collection. At some point, copying an object becomes more expensive than traversing it in
every garbage collection.

No permanent Generation

And finally it needs to be noted that the JRockit does not have a permanent generation.
All classes and string constants are allocated within the normal heap area. While that
makes life easier on the configuration front it means that classes can be garbage
collected immediately if not used anymore. In one of my future posts I will illustrate
how this can lead to some hard to find performance problems.

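Finally, it should be noted that JRockit has no permanent generation. All classes and
string constants are allocated in the normal heap area. While that makes life easier on
the configuration front, it means that classes can be garbage collected immediately once
they are no longer used. In a future post I will show how this can lead to some
hard-to-find performance problems.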


The IBM JVM shares a lot of characteristics with JRockit: The default heap is a
continuous one. Especially in WebSphere installation this is often the initial cause
for bad performance. It differentiates between large and small objects with the same
implications and uses thread local allocation on default. It also does not have a
permanent generation, but while the IBM JVM also supports a generational Heap model it
looks more like Sun's rather than JRockit.

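The IBM JVM shares many characteristics with JRockit. The default heap is a continuous
one; in WebSphere installations in particular, this is often the initial cause of bad
performance. It distinguishes between large and small objects, with the same implications,
and uses thread local allocation by default. It also has no permanent generation. However,
while the IBM JVM also supports a generational Heap model, that model looks more like
Sun's than JRockit's.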

The IBM JVM generational heap

Allocate and Survivor act like Eden and Survivor of the Sun JVM. New objects are
allocated in one area and copied to the other on garbage collection. In contrast to
JRockit the two areas are switched upon gc. This means that an object is copied
multiple times between the two areas before it gets promoted to Tenured. Like JRockit
the IBM JVM has more options to tune the compaction phase. You can turn it off or
force it to happen for every GC. In contrast to JRockit the default triggers it due to
a series of triggers but will then lead to a full compaction. This can be changed to
an incremental one via a configuration flag.

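Allocate and Survivor act like Eden and Survivor in the Sun JVM. New objects are allocated
in one area and copied to the other on garbage collection. In contrast to JRockit, the two
areas are switched on each GC, which means that an object is copied between the two areas
several times before it is promoted to Tenured. Like JRockit, the IBM JVM has more options
for tuning the compaction phase. You can turn it off or force it on every GC. In contrast
to JRockit, by default compaction is set off by a series of triggers but then results in a
full compaction. This can be changed to an incremental one via a configuration flag.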


We see that while the three JVMs are essentially trying to achieve the same goal, they
do so via different strategies. This leads to different behaviour that needs tuning.
With Java 7 Oracle will finally declare the G1 (Garbage First) production ready and
the G1 is a different beast altogether, so stay tuned.

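We see that while the three JVMs are essentially trying to achieve the same goal, they do
so through different strategies, which leads to different behaviour that needs tuning.
With Java 7, Oracle will finally declare G1 (Garbage First) production ready. G1 is a
different beast altogether, so stay tuned.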


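■_ DRM removal


Gmail - O'Reilly Japan News No. 159 - hogemuta@gmail.com


● O'Reilly Japan Ebooks are going DRM Free

Starting in May 2011, the Ebooks sold by O'Reilly Japan will become DRM Free.
For the move to DRM Free, we will be changing the programs on our servers.
For that reason the Ebook Store will be temporarily closed on Monday, May 23, 2011.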

■_ Tidbits




430 名無しんぼ@お腹いっぱい [sage] 2011/05/19(木) 20:15:55.56 ID:PrUXrwNi0 Be:
    There's no such thing as a needless regent, is there




I bought this month's Software Design after seeing the serialized article (Software Designer #26). Its subject is apparently the author of Spacewar! and the developer of the first Lisp interpreter.
Software Design, June 2011 issue [magazine]

■_ Cheat sheet

Encodings in Ruby (1.9).

wycats's gist: 83c011e40e1970df0ef4 — Gist

Ruby Encoding Cheat Sheet

   1. Only call force_encoding on BINARY Strings.
      Use force_encoding only on binary strings (an admittedly odd way to render "BINARY String")

   2. When receiving a BINARY string from the network or file system, make sure to 
      force_encode it to its correct encoding.
      force_encode it to its correct encoding.

          * In general, the encoding information is provided in an out-of-band channel, 
            such as the Content-Type header in HTTP

            In general, the encoding information is found in an out-of-band channel
            such as the HTTP Content-Type header.

          * If you don't know the encoding, the String is BINARY forever and should 
            not be concatenated with non-BINARY strings

   3. When calling force_encoding on a BINARY String, immediately call encode! 
      afterwards. This will transcode the String to the default_internal encoding

      After calling force_encoding on a binary string, call encode! immediately
      afterwards. This converts the string to the default_internal encoding.

   4. When using a regular expression with /u, make sure that only Unicode Strings are 

      When using a regular expression with /u, make sure only Unicode strings reach it.

   5. When using a regular expression with /n, make sure that only BINARY Strings are 

      When using a regular expression with /n, make sure only binary strings reach it.

   6. If you get an incompatible encoding between BINARY (ASCII-8BIT) and another encoding,
      the correct debugging approach is to identify where the BINARY String came from.
      Usually, this means that a library read in BINARY data from the network and didn't
      give it an encoding.


   7. In app code, never use force_encoding to convert BINARY data into a particular 
      encoding. By the time you've reached app code, you have lost the information about 
      which encoding is being used. Instead, find where the String came into Ruby, and fix 
      it to set up the encoding based on the information it knows.

      Never use force_encoding. If you are working in application code,

   8. In library code, only use force_encoding to convert BINARY data into an encoding if
      you have information about what encoding is being used. This means that you have a 
      header in network protocols or a magic comment in templates (like ERB) or source files.


   9. Only include the magic comment in source files that actually contain characters 
      from that encoding


  10. To combine two Strings with known, but different encodings, use encode to transcode
      the Strings into the same encoding, then combine them.

      To concatenate two strings, use encode to transcode so that the strings' encodings
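
As a note to self, rules 2, 3, 6 and 10 above can be sketched in a few lines of Ruby. This is only a minimal sketch under assumptions: the sample text, the Shift_JIS label and the use of UTF-8 in place of default_internal are all made up for illustration, and real code would take the encoding from whatever out-of-band information the protocol provides.

```ruby
# Bytes as they might arrive from a socket or file: tagged BINARY (ASCII-8BIT).
# "\u65E5\u672C\u8A9E" is the sample text (three kanji); Shift_JIS is an assumption.
raw = "\u65E5\u672C\u8A9E".encode("Shift_JIS").b

# Rule 2: attach the encoding learned out of band (e.g. a Content-Type header).
# force_encoding only relabels the bytes; it does not change them.
tagged = raw.dup.force_encoding("Shift_JIS")

# Rule 3: transcode right away (UTF-8 stands in for default_internal here).
utf8 = tagged.encode("UTF-8")

# Rule 6: combining a non-ASCII UTF-8 string with non-ASCII BINARY data raises.
begin
  "caf\u00E9" + raw
  mixed_ok = true
rescue Encoding::CompatibilityError
  mixed_ok = false
end

# Rule 10: transcode to a common encoding first, then combine.
combined = "text: " + tagged.encode("UTF-8")
```

The point of the rescue branch is that the error shows up at the concatenation, far from where the untagged bytes entered the program, which is why rule 6 says to trace the BINARY string back to its source.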

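■_ Licenses

Open source software licenses: developers adopt the GPL most, enterprises Apache - News: ITpro

GNU General Public License)", whereas the license enterprises use most when adopting OSS
was the "Apache License".

 In the same survey, the site OpenLogic operates to showcase and provide OSS packages, "OpenLogic Exchange
Second was the "Apache License" at 7.6%, and third the "LGPL (GNU Lesser General Public License)"
at 6.7%. Fourth was the "BSD License" (5.3%), followed by the "MIT License" (4.1%) in fifth.

 Meanwhile, 32.7% of the packages downloaded from OLEX were published under the "Apache License".
Second was "LGPL" (21.0%), third "GPL" (14.4%), fourth "BSD License" (3.8%),
and fifth "MIT License" (1.6%).

 In addition, a look at the licenses used in enterprise applications found the "Apache License"
(15.3%) to be the most common, followed by the "MIT License" (10.8%), the "BSD License" (10.5%),

Press Release: OpenLogic Scanning Data Reveals OSS Developers Choose GPL, Enterprises Prefer Apache Seeing these differences in how the licenses get used gave me a few thoughts and I meant to write something about them, but I'm sleepy so I gave up.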

■_ Monads


Suggestions for Learning Monads? : haskell

I need to learn Monads (Okay well I really don't need to learn Monads as Haskell is a 
hobby language, but I want to learn them). What I wonder is any suggestions for 
understanding them?

Everytime I have read something on Monads my head has exploded, so any other resource 
that very slowly breaks them down?

The idea that understanding monads is necessary to be productive in Haskell is a myth. 
If you do not yet understand monads then it is probably not yet your time to 
understand monads. Despite appearances, this is not intended to be a non-answer. Just 
keep going with Haskell generally, learn libraries as you need them (including their 
Monad instances), and eventually you will catch on to the "monadic pattern," 
at which point it will be as though you had known it all along, and you will not 
understand what all the fuss is about.

My suggestion would be "stop trying". Seriously, just write Haskell and read 
other people's Haskell code. There's nothing that fundamentally needs monads, but as 
you gain more familiarity with the type system and common repeated patterns, you'll 
probably start picking up hints about what they're for.

I'd be willing to bet that most people try to learn them prematurely (mostly because 
for some reason people talk about them all the fucking time) and end up quitting 
Haskell in frustration (or concluding that it's a bullshit academic language).

When you don't understand monads yet, chances are that you don't have a good grasp of 
your previous Haskell knowledge yet. My recommendation is to know the Prelude by 
heart* before attempting to learn about monads.

*Within reason. There is no need to know the intricacies of the Fractional hierarchy, 
but you should be able to implement any Prelude function from their type signature in 
1 minute cold.

Learn what Functors are, then Applicative Functors, then Monoids, then Monads (then 

Start here, then read the next two chapters as well: 

After that, if you're still hungry for more Category Theoretic typeclasses, start 
learning about arrows (or comonads, etc etc)!



It's just your imagination. Probably. Twitter / @Kakutani Shintaro: If the link label looks like it is being dissed, that is just ...



rubykaigi.org:System down for maintenance

■_ PHP

Consistency in PHP : PHP

I enjoy working with PHP. It's not as elegant a language as Ruby or Python perhaps. 
It's a little bit dirty but I like that! However, there's a few things which annoy the 
hell out of me, in particular consistency.

I enjoy working with PHP. As a language it is perhaps not as elegant as Ruby or
Python. It is a little bit dirty, but I like that! However, there are a few things
that bother me, consistency in particular.

For example:

str_replace (needle, replacement, haystack)

strstr (haystack, needle)

stripos (haystack, needle)

I'm trying to think of other examples off the top of my head but I can't at the moment. 
However, I know they're out there!

Any reason for this, that I'm missing out on?

    Any reason for this, that I'm missing out on?

PHP is a thin wrapper around a whole bunch of C libraries. In a lot of cases, the 
naming convention comes from the underlying library. For example, the mysql functions 
are exact mirrors of the mysql C library. Same for the GD graphics library. In other 
cases, it's just different systems designed by different people.

The more I think about it in these terms the more I start to think that this was a 
good decision by the people behind the language as it allows people already familiar 
with those libraries to get straight to work.

And for that matter, I've always thought it was arrogant the way the likes of Ruby 
throw out 40 years of familiarity with C-like syntax 'just because'.

PHP is the Slackware of programming languages now that I think about it. Vanilla 
everything and not as shiny as the competitors.

Having used PHP for over a decade, C for years, Java, and more recently Ruby, Python I 
completely disagree with you. Inconsistency is a pain.

I take your point about those who've worked with the C libs underlying the PHP 
functionalities, but how many of those people are doing what you suggest and using PHP 
after having used the C libs? Versus people who've come from nowhere and learnt PHP as 
their first language? Versus people learning PHP for work, having come from another 
language altogether.

It's a bad design decision.

explode ($needle, $haystack) always get me.

Agreed, but I think that's because it's both a string and an array function, so you 
tend to go for the string version ($haystack, $needle) first. I do personally, since 
in other languages the string is the object being operated on (e.g. Python's 

Great point, I failed to notice that myself! Upvote for you, Sir.

Python's ','.join() is a wart too -- it makes logical sense (I understand why it was 
done), but visually it's just wrong...

Intellisense and experience help.

Upvote. Been using NetBeans for a few years now and function naming/param ordering 
became non-issue for me.

In general, I think it goes:

String functions (haystack, needle)

Array functions (needle, haystack)

In many cases, but not all. preg_* are string functions, basically, and they're 
(pattern, text), whereas the str_replace(/etc.) are (text, pattern)

Welcome to PHP. Looks like you're starting to get the hang of it.

The whole strip_tags vs stripslashes is a popular beef too. When you have so many 
people developing a language these things are bound to happen.

Yeah, everyone agrees: this stuff sucks. They did make object oriented versions of a 
lot of the library, but I think String stuff is untouched...

I'd like to see a ground-up redesign in, say, PHP7. Make it more object-oriented, fix 
the inconsistencies, etc. And then set up aliases to the old functions so as not to 
break backwards-compatibility.

I'll wait to even see PHP6 become real...


There have been a couple of forks over the years that attempted to cleanup the 
inconsistencies but none of them took off. Mostly because it's too late for such large 
changes that affect every single piece of PHP code.

Personally, I'm not sure what the best solution is. I think I'd adjust fairly quickly, 
but I would not be looking forward to updating old code. Especially if you're trying 
to support old and new versions of PHP.

No good reason, it was just bad planing. Unfortunately it isn't an easy fix without 
breaking everything.

why post responses if you genuinely have no clue as to the correct answer?

It's because many of these functions are based on functions from other languages, and 
generally adopted the same parameter ordering.

When you say "elegant", referring to Ruby, do you mean less "magical"?

Exactly. Less "magical" but more "snappy".





I moved a directory in DropBox, and when I accessed it from another PC the move turned into a file delete plus copy, so I couldn't do anything for a while.

■_ 2.8

GNU grep 2.8 has been released. They bumped the minor version rather than the tertiary one, so were there some reasonably big changes?

GNU grep NEWS                                    -*- outline -*-

* Noteworthy changes in release 2.8 (2011-05-13) [stable]

** Bug fixes

  echo c|grep '[c]' would fail for any c in 0x80..0xff, and in many locales.
  E.g., printf '\xff\n'|grep "$(printf '[\xff]')" || echo FAIL
  would print FAIL rather than the required matching line.
  [bug introduced in grep-2.6]

  grep's interpretation of range expression is now more consistent with
  that of other tools.  [bug present since multi-byte character set
  support was introduced in 2.5.2, though the steps needed to reproduce
  it changed in grep-2.6]

  grep erroneously returned with exit status 1 on some memory allocation
  failure. [bug present since "the beginning"]

Copyright (C) 1992, 1997-2002, 2004-2011 Free Software Foundation, Inc.

  Copying and distribution of this file, with or without modification,
  are permitted in any medium without royalty provided the copyright
  notice and this notice are preserved.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts.  A copy of the license is included in the ``GNU Free
Documentation License'' file as part of this distribution.

Presumably a countermeasure for the slowdown when using multibyte locales.


  Copyright (C) 1992, 1997-2002, 2004-2011 Free Software Foundation, Inc.

  Copying and distribution of this file, with or without modification,
  are permitted in any medium without royalty provided the copyright
  notice and this notice are preserved.

Short term work

See where we are with UTF-8 performance.

Merge Debian patches 55-bigfile.patch, 69-mbtowc.patch and
70-man_apostrophe.patch.  Go through patches in Savannah.

Cleanup of the grep(), grepdir(), recursion (the "main loop") to use fts.
Fix --directories=read.

Write better Texinfo documentation for grep.  The manual page would be a
good place to start, but Info documents are also supposed to contain a
tutorial and examples.

Some test in tests/spencer2.tests should have failed!  Need to filter out
some bugs in dfa.[ch]/regex.[ch].


GNU grep does 32-bit arithmetic, it needs to move to 64-bit (i.e.

Lazy dynamic linking of libpcre.

Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one
binary. Is there a possibility of doing even better by automatically
checking the magic of binary files ourselves (0x1F 0x8B for gzip, 0x1F
0x9D for compress, and 0x42 0x5A 0x68 for bzip2)?  Once what to do with
libpcre is decided, do the same for libz and libbz2.
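
The magic-number check floated in the item above is easy to sketch. This is not grep code, just a minimal Ruby illustration under assumptions: `detect_compression` and the `MAGIC` table are hypothetical names of my own, and only the three signatures listed in the TODO are covered.

```ruby
# Magic numbers exactly as listed in the TODO item.
MAGIC = {
  gzip:     "\x1F\x8B".b,
  compress: "\x1F\x9D".b,
  bzip2:    "BZh".b          # 0x42 0x5A 0x68
}.freeze

# Returns :gzip, :compress, :bzip2, or nil when no signature matches.
# `header` is the first few bytes of the input (hypothetical helper).
def detect_compression(header)
  bytes = header.b           # compare as raw bytes, whatever the tagged encoding
  MAGIC.each { |kind, magic| return kind if bytes.start_with?(magic) }
  nil
end
```

In practice the header would be the first three bytes of the file, for example the result of `File.binread(path, 3)`, before deciding which decompressor to pipe the input through.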

Matching algorithms

Check <http://tony.abou-assaleh.net/greps.html>.  Take a look at these
and consider opportunities for merging or cloning:

   -- ja-grep's mlb2 patch (Japanese grep)
   -- lgrep (from lv, a Powerful Multilingual File Viewer / Grep)
   -- cgrep (Context grep) <http://plg.uwaterloo.ca/~ftp/mt/cgrep/>
      seems like nice work;
   -- sgrep (Struct grep) <http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html>;
   -- agrep (Approximate grep) <http://www.tgries.de/agrep/>,
      from glimpse;
   -- nr-grep (Nondeterministic reverse grep)
   -- ggrep (Grouse grep) <http://www.grouse.com.au/ggrep/>;
   -- grep.py (Python grep) <http://www.vdesmedt.com/~vds2212/grep.html>;
   -- freegrep <http://www.vocito.com/downloads/software/grep/>;

Check some new algorithms for matching; talk to Karl Berry and Nelson.
Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
claim that his algorithm is faster than Boyer-More. Worth checking.

Fix the DFA matcher to never use exponential space.  (Fortunately, these
cases are rare.)

Standards: POSIX and Unicode

For POSIX compliance, see p10003.x. Current support for the POSIX [= =]
and [. .] constructs is limited. This is difficult because it requires
locale-dependent details of the character set and collating sequence,
but POSIX does not standardize any method for accessing this information!

For Unicode, interesting things to check include the Unicode Standard
<http://www.unicode.org/standard/standard.html> and the Unicode Technical
Standard #18 (<http://www.unicode.org/reports/tr18/> “Unicode Regular
Expressions”).  Talk to Bruno Haible who's mantaining GNU libunistring.
See also Unicode Standard Annex #15 (<http://www.unicode.org/reports/tr15/>
“Unicode Normalization Forms”), already implemented by GNU libunistring.

In particular, --ignore-case needs to be evaluated against the standards.
We may want to deviate from POSIX if Unicode provides better or clearer

POSIX and --ignore-case

For this issue, interesting things to check in POSIX include the
Volume “Base Definitions (XBD)”, Chapter “Regular Expressions” and in
particular Section “Regular Expression General Requirements” and its
paragraph about caseless matching (note that this may not have been
fully thought through and that this text may be self-contradicting
[specifically: “of either data or patterns” versus all the rest]).


Ignoring case differences gets complicated too, depending on the locale.





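I went to Muji (無印良品) and browsed the stationery section, and there was some pretty interesting stuff. Some of it may have been put out earlier by other makers, though. Next time I'll buy some. For instance: Plantation-wood paper 4-panel sticky notes, approx. 44×98mm, 45 sheets | Muji Net Store, or Strip-type memo checklist, 40 sheets, 14 lines, approx. 82×185mm | Muji Net Store. Oh, so there is an A5-size slim notebook too (I'm fairly fond of the A6 one): Recycled paper mobile notebook, A5 slim, plain, 40 sheets | Muji Net Store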

■_ 5.14

Plenty of information is already up, such as The three significant features in Perl 5.14 - Islands in the byte stream, but anyway: 5.14 has been released. I get the feeling I still see 5.8 around a fair bit, though (^^;

Perl 5.14 - The Perl Foundation

A new version of Perl, 5.14, was officially released on 14th May following the successful
test period, including the testing of release candidates. This is the first release of
Perl 5 using the new annual schedule.

There are a number of enhancements and alterations in this version, a full list of 
changes can be found at ( http://perl5.git.perl.org/perl.git/blob/HEAD:/pod/perldelta.pod ),
a summary of some of the changes:

    * Unicode 6.0 support, along with many, many improvements to our Unicode-related features
      Unicode 6.0 support, plus a great many improvements to Unicode-related features.

    * Improved support for IPv6
      Improved IPv6 support

    * Significantly easier autoconfiguration of the CPAN client
      Much easier automatic configuration of the CPAN client

    * A new /r flag which makes s/// substitutions non-destructive
      A new /r flag that makes s/// perform a non-destructive substitution

    * New regular expression flags to control whether matched strings should be treated
      as ASCII or Unicode

    * New "package Foo { }" syntax
      The new "package Foo { }" syntax

    * Uses less memory and CPU than previous releases

    * A swathe of bug fixes, a large number associated with the work of Dave Mitchell 
      ( http://news.perlfoundation.org/2011/05/fixing-perl5-core-bugs-report-11.html )
      who has been fixing some deep bugs thanks to a TPF grant;

It is important to note that this version marks the official end of support for Perl 5.10.

Note that with the release of this version, Perl 5.10 has officially reached end of support

This work is just one year of development since the release of Perl 5.12.0. It contains
nearly 550,000 lines of changes from close to 3,000 files, this work was done by 150
authors and committers. The documentation, as always, pays tribute to those 
people who worked hard on this new version, "Many of the changes included in this 
version originated in the CPAN modules included in Perl's core. We're grateful to the 
entire CPAN community for helping Perl to flourish." The success of this version 
is dependent on the great work of the whole community, a particular note of thanks 
should go to Jesse Vincent for his coordination skills as release manager for 5.14.


Perl 5.14 is out : programming

I wonder if they had timed the release for today - 5/14?

They'll probably release a new version every 367 days, to keep up with the calendar, too.

You mean today 15/5?

I think the deprecations are important. They are starting to realise that the language 
has a fair bit of old cruft in it that needs removed. This is a good thing, even if it 
breaks a bit of code, it's no Python 3.

Yeah. We don't Perl to be the next Java."Just pile it over the crap boys."

The entire language should be deprecated. The code base should consist of a readme 
that says "I'm sorry."



Programmers Aged 35 and Over, Part 11

202 仕様書無しさん [sage] 2011/05/08(日) 20:01:32.54 ID: Be:

203 仕様書無しさん [] 2011/05/08(日) 20:37:14.54 ID: Be:

204 仕様書無しさん [sage] 2011/05/08(日) 20:41:31.54 ID: Be:

205 仕様書無しさん [sage] 2011/05/08(日) 22:11:19.84 ID: Be:
    Ahh, I wonder how Phinloda-san is doing.


206 仕様書無しさん [sage] 2011/05/08(日) 22:55:13.32 ID: Be:


207 仕様書無しさん [sage] 2011/05/09(月) 00:33:03.72 ID: Be:

208 仕様書無しさん [sage] 2011/05/09(月) 03:39:20.43 ID: Be:
    I think 「Windows の初歩の初歩」 was a great book for its time.

209 仕様書無しさん [sage] 2011/05/09(月) 09:00:46.61 ID: Be:

210 仕様書無しさん [sage] 2011/05/09(月) 12:27:51.84 ID: Be:

211 仕様書無しさん [sage] 2011/05/09(月) 12:30:52.39 ID: Be:
    If it were revised to cover win32, it would hold up reasonably well.

212 仕様書無しさん [sage] 2011/05/10(火) 01:51:54.60 ID: Be:

213 仕様書無しさん [sage] 2011/05/10(火) 13:02:18.56 ID: Be:

214 仕様書無しさん [sage] 2011/05/10(火) 19:14:40.82 ID: Be:
    >>212 You're not wrong

    >>210 The book edition came out from ASCII, so it was probably one of ASCII's magazines?

215 仕様書無しさん [sage] 2011/05/11(水) 21:36:13.23 ID: Be:

216 仕様書無しさん [sage] 2011/05/11(水) 22:35:20.35 ID: Be:


I do wish they would publish 「千言万語」. I own the everything-included DVD (that was expensive), so I can reread it if I want to, but still.





Impress (the OOo one) is so hard to use. Well, part of it is probably that I'm not used to it. I've gotten used to PowerPoint 2007, after all.

It was recommended like this: Twitter / @Kenji Rikitake: Prof. Norihisa Doi's new book 「相互排除問題」 (The Mutual Exclusion Problem) (Iwanami Shoten). #E ...

織田信長物語~桶狭間合戦の真実 (SP Comics)
This one I skimmed through. Impressions and such later (probably...).

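■_ Today's nitpicking

InfoQ: Telerik promised a .NET decompiler that is free forever

.NET Reflector, well known as a decompiling tool for .NET assemblies, was once free,
but early this year Simon Galbraith, a founder of RedGate, "owning Reflector
versions, and the cheapest, Standard ($35), is a standalone Windows application

decompiler." is presumably what it promised. As for Telerik, the code-analyzing Visual Studio productivity add-in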
InfoQ: Telerik Promises a Free .NET Decompiler Forever

.NET Reflector, a well-known decompiling tool for .NET assemblies, was once free but 
RedGate decided to charge money for it earlier this year because “owning Reflector 
doesn't make commercial sense”, according Simon Galbraith, co-founder of RedGate. 
The tool comes now in three versions, the cheapest one being Standard ($35), a 
standalone Windows application offering browsing, analyzing and decompiling .NET code 
to C#, VB.NET or IL. The free version has a time bomb set to go off on May 31, 2011.

Perhaps as a reaction to RedGate's decision, Telerik promises “Free Decompiling. 
Forever.” Telerik sells JustCode, a code analysis Visual Studio productivity add-in, 
and one of its features is to decompile .NET assemblies. Now, the company has decided 
to make that feature available in a free stand-alone Windows tool, called 
JustDecompile, for those interested in decompiling code. JustDecompile can be used to 
analyze entire types of the assemblies loaded or external assemblies referenced. The 
tool can decompile lambda expressions, generics, yield statements, and auto-generated 

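Shouldn't "lambda expressions" be 「ラムダ式」? Granted, "expression" can also be translated as 表現. Rendering "auto-generated properties" as 自動実装プロパティ ("auto-implemented properties") also feels slightly off to me. Maybe the translator just didn't like using 生成 ("generated").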

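The other point is one where the original itself is under-explained. "There is no commercial meaning in owning Reflector" (owning Reflector doesn't make commercial sense) means that they had been distributing it for free with various plans in mind, and because things did not go according to those plans it "doesn't make commercial sense". When I read the passage 「所持していることに商業的な意味がない」 I couldn't work out what it meant for a while.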

■_ Today's nitpicking (part 2)


InfoQ: Mono の現状と今後 (the Japanese edition of "On the Current State and Future of Mono")

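However, since these seem to be the same in the free version of Mono, it is somewhat unclear
what makes them an advantage of this product. (We are in contact with Novell, and if interesting facts
turn up InfoQ intends to publish a follow-up article.) Another product that falls into this
category is Mono Tools for Visual Studio. This product, cross-platform development

    * Run the Mono Migration Analyzer directly inside Visual Studio
    * Remote debugging of Mono applications, also when running on OS X or Linux
    * Bundling applications into RPM packages
    * Generation of a "Linux appliance". This is SUSE Linux, Mono, and the application

For .NET developers, a Microsoft version of the last feature does not exist. With that in mind,
selling users not only the application you developed but also a Windows Server license for the VM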

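In this passage, can you tell at a glance what the part "For .NET developers, a Microsoft version of the last feature does not exist." means (that is, what "the last feature" refers to)? If you tell me that's obvious, this discussion goes nowhere ○| ̄|_, but anyway: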

InfoQ: On the Current State and Future of Mono

Since this sounds like what you get from the free version of Mono, we are a little 
unsure about the benefits of this product. (Novell has been contacted and if we find 
anything interesting InfoQ will be running a follow-up article.) Another product that 
falls into this category is Mono Tools for Visual Studio. This product offers some 
useful features for cross-platform developers including

    * Running the Mono Migration Analyzer directly in Visual Studio
    * Remote debugging for Mono applications, even when running on OS X or Linux
    * Bundle applications into RPM packages
    * Create a “Linux Appliance”, which is a virtual machine preconfigured with SUSE 
      Linux, Mono, and your application.

For .NET developers there is no Microsoft alternative for this last feature. To even 
consider it you would have to contemplate the logistics of selling your customer not 
only your application, but also a Windows Server license for the VM.

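In the original there is the sentence "This product offers some useful features for cross-platform developers including", followed by the list, and it closes with the sentence "For .NET developers there is no Microsoft alternative for this last feature." The translation, by contrast, reads "This product offers a number of useful features for cross-platform developers. For example, (list omitted) For .NET developers, a Microsoft version of the last feature does not exist." I suspect this is what makes it hard to see that "last" means the last item of the list just before it.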

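Word order often changes between English and Japanese, so I realize that translating something with a list sandwiched in the middle like this is quite a chore. Still, it might have been better to write something like "... offers a number of useful features such as the following." and then "Of these, for .NET developers, the feature listed last ...".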

I threw a spectacular boomerang again today. I'm scared of it coming back ○| ̄|_

■_ Mathematica

I looked at several Mathematica reseller pages but couldn't find any information about a Japanese Home Edition. Maybe it came from a direct mailing or something.

〓 Mathematica 5 〓 

460 132人目の素数さん [] 2011/05/14(土) 08:47:12.97 ID: Be:

461 132人目の素数さん [] 2011/05/14(土) 10:44:39.31 ID: Be:

463 132人目の素数さん [] 2011/05/14(土) 15:12:02.75 ID: Be:

464 ◆4gEg9/nYKE [sage] 2011/05/14(土) 15:13:53.76 ID: Be:

465 132人目の素数さん [sage] 2011/05/14(土) 18:34:39.21 ID: Be:

466 132人目の素数さん [sage] 2011/05/14(土) 19:02:57.04 ID: Be:

467 132人目の素数さん [sage] 2011/05/14(土) 19:10:27.17 ID: Be:

468 132人目の素数さん [sage] 2011/05/14(土) 19:25:40.51 ID: Be:

469 132人目の素数さん [] 2011/05/14(土) 19:35:37.04 ID: Be:

470 132人目の素数さん [sage] 2011/05/14(土) 19:38:00.02 ID: Be:

471 132人目の素数さん [] 2011/05/14(土) 19:39:48.81 ID: Be:

472 132人目の素数さん [sage] 2011/05/14(土) 20:09:37.19 ID: Be:

473 132人目の素数さん [] 2011/05/14(土) 21:35:07.56 ID: Be:

474 132人目の素数さん [sage] 2011/05/14(土) 21:46:13.92 ID: Be:

475 132人目の素数さん [] 2011/05/14(土) 21:47:45.39 ID: Be:
    Even when I try to download from wolfram, I get stuck at the wolfram id registration.


476 132人目の素数さん [sage] 2011/05/14(土) 21:54:41.87 ID: Be:

67,000 yen, huh. If my bonus were what it was back when times were good I would have bought it, but as things stand it's tough ○| ̄|_


■_ The worst algorithm in the world? : programming


The worst algorithm in the world? : programming

I think the worst algorithm in the world is

while(1){ fork(); }

This one also sucks at counting fibonacci numbers: echo Hello World!

fork while fork

:(){ :|:; }

Don't forget you have to call it.

:(){ :|:; };:



bash: syntax error near unexpected token `)'

While you figure that out, my evil script has forkbombed everywhere!

It's made quite a mess.

Somehow "fork while fork" cracked me up.







Mixing it up: when F# meets C#
By ian | Published: May 9, 2011


If it were a perfect world, we'd all exist in a happy little bubble of our favourite
programming language and you'd never have to worry about the nasty details of interacting
with something written by ― gasp ― someone else in a ― double-gasp ― different language.
But unfortunately that's precisely what we have to do all the time. And that means that one
day all of your fancy-pants algorithmic, highly parallel, functionally pure F# code is going
to meet the world of “enterprise” C# development head-on.

living inside a happy little bubble, of interacting with something that someone else wrote in a different language
Unfortunately, though, that is precisely what we have to do all the time.
And that means that one day all of your fancy-pants algorithmic, highly parallel,
functionally pure F# code is going to meet the world of “enterprise” C# development head-on.

Of course the idiomatic way to avoid problems at the boundary between your F# code and the
outside world is to ensure that you only expose a small set of compatible types. This works
pretty well if your clients are also .NET languages. For instance you can do things like
exposing your collections as seq, rather than say, a native F# list, and this will mean your
collections can be consumed as IEnumerable. The only problem is it means you've got the
added burden of maintaining this mapping layer, because you'll no doubt want to use the F#
“native” types internally.

So, what options do we have if some of our F# types happen to leak into our public API?
Luckily, lots. Let's take a look at how some of the common F# constructs can be called from C#.

So, what options do we have if some of our F# types have leaked into the public API?
Fortunately, plenty.
Let's look at some of the common F# constructs that can be called from C#.


I talked about the power of discriminated unions a while back, so how do they map to C#?
Interestingly the F# compiler generates a very simple OO type hierarchy, with an abstract
base class that each of the cases derive from. This will mean that your calling code will
need to use some casting, but don't worry, you're not back in the hell of runtime typing;
there are a set of methods on the type that you can use to identify the case before you do
the cast, so at least you get some compile time sanity checking.

I talked about the power of discriminated unions a while back,
Interestingly the F# compiler generates a very simple OO type hierarchy,
with an abstract base class that each of the cases derive from.

that you can use to identify the case before doing the cast
You are not back in the hell of runtime typing.
And at least you get some compile-time sanity checking.

Here's a simple discriminated union that we'll be using as an example:
Here is an example of a simple discriminated union:

   type TestDU =
       | One of int * int
       | Two of string
       | Three

If you wanted to use this type from F#, you would do something like this:

   // A function 'f' that takes an instance 't' of our union type
   let f t = match t with One (a,b) -> printf "Got One" | _ -> ()

   // Apply our function with a particular TestDU union case
   let _ = f <| Two "foo"

That's pretty natural, as you'd expect. If we exposed the function f to C#, it looks like this:

If we exposed this function f to C#, it would look something like this:

void Module1.f(Module1.TestDU t)

This means we have to create an instance of the TestDU type, but we'll quickly discover it's
abstract. As I mentioned before, the F# compiler has generated a very simple class hierarchy,
with the union cases being subtypes of TestDU, implemented as nested types (which are used
quite extensively by the F# compiler). But if we try and instantiate one of these subtypes…?

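This means we have to create an instance of the TestDU type. However, that class is
abstract. As mentioned earlier, the F# compiler generates a very simple class hierarchy
in which the union cases are subtypes of TestDU, implemented as nested types (which the
F# compiler uses quite extensively). But what if we try to instantiate one of these
subtypes?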

The union case types have no constructors.
Union case types do not have constructors

Ouch: ‘No constructors defined'. Luckily, instead there are helper functions on the base
class that will create instances for us, at least for those cases with arguments. The
functions are prefixed with ‘New', e.g.:

The result is Ouch: ‘No constructors defined'. Fortunately,
the helper functions are the ones prefixed with 'New'.

   var one = Module1.TestDU.NewOne(1, 2);
   var two = Module1.TestDU.NewTwo("foo");

And slightly oddly, the parameterless cases (Three in our example) use a property on the
base class instead:

And slightly oddly,
the parameterless cases (Three in our example) use a property on the base class:

   var three = Module1.TestDU.Three;

So now we can create instances of the union type to pass to the F# function.

We can now create instances of the union type to pass to a function written in F#.

But what if we're returned one instead? It will be of type TestDU, so how do we know what
case it actually is? And how do we get to the parameters? Well, as we know it's going to be
one of the subtypes, we could use the C# ‘is' operator to determine its identity, but the
base class also provides helper methods to do it for us, prefixed with Is, e.g.:

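But what if one is returned to us instead? It will be of type TestDU, so
we know it will be one of the subtypes, and the C# 'is' operator can be used to determine its identity.
However, the base class also provides helper methods prefixed with Is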

   if (two.IsTwo)
   {
       var x = (two as Module1.TestDU.Two).Item;
       // Do something with x
   }

As you can see, Item contains the case value. For tupled types this becomes Item1, Item2 etc.

As you can see, Item contains the case value (is there a difference in letter case?).
For tupled types, this becomes Item1, Item2, and so on.
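
To keep the shape straight in my own head, here is a rough Ruby analogy of the class layout described above. It is only a sketch under assumptions: the real types are generated by the F# compiler and consumed from C#, and the snake_case names (`new_one`, `two?` and so on) are my own stand-ins for `NewOne`, `IsTwo` and friends.

```ruby
# Abstract-ish base class with one nested subclass per union case.
class TestDU
  class One < TestDU
    attr_reader :item1, :item2     # tupled case: Item1, Item2
    def initialize(a, b)
      @item1 = a
      @item2 = b
    end
  end

  class Two < TestDU
    attr_reader :item              # single-value case: Item
    def initialize(s)
      @item = s
    end
  end

  class Three < TestDU; end

  # Factory methods on the base class stand in for NewOne / NewTwo.
  def self.new_one(a, b)
    One.new(a, b)
  end

  def self.new_two(s)
    Two.new(s)
  end

  # The parameterless case is exposed as a single shared value.
  def self.three
    @three ||= Three.new
  end

  # Predicates stand in for IsOne / IsTwo / IsThree.
  def one?;   is_a?(One);   end
  def two?;   is_a?(Two);   end
  def three?; is_a?(Three); end
end

value = TestDU.new_two("foo")
payload = value.item if value.two?  # check the case before reading the payload
```

Ruby needs no cast, so the analogy only covers the "factory on the base class, predicate before access" part of the pattern.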


Records are immutable types that contain fields. They tend to be used quite a lot in F# code,
and luckily using them from C# is very straightforward. The only thing to remember is that
their only constructor is the one taking all of the field values (because they're immutable), e.g.:

Records are immutable types with fields, and they tend to be used heavily in F# code.
Fortunately, using them from C# is also very straightforward. The one thing to remember is
that a record's only constructor is the one that takes all of the field values.

   type TestRecord =
       { field1 : int
         field2 : string }

Can be instantiated like this in C#:

   var r = new Module1.TestRecord(1, "foo");


It's usual in F# to use curried functions. That is, functions that can be applied one
argument at a time, returning a partially applied function. It turns out that the compiler
maps this type of function (when directly exposed, as a public function on a module, say)
into its tupled equivalent, which makes them easily callable directly from C#. For instance:

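In F# it is common to use curried functions. A curried function is one that can be applied
to one argument at a time, returning a partially applied function. So that they are easy to
call directly from C#, the F# compiler maps this kind of function (when directly exposed,
as a public function on a module, say) into its tupled equivalent. For instance: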

let public adder a b = a + b

Has a native F# signature of int -> int -> int, but results in a generated CIL function
taking a tuple of two ints, with signature more like int * int -> int.

This function has a native F# signature of int -> int -> int, but the generated CIL
function takes a tuple of two ints, with a signature more like int * int -> int.
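
To make the two shapes concrete, here is a small Python sketch of the same adder in curried and tupled form. The names `adder_curried`, `adder_tupled` and `add_one` are made up for this illustration, and Python closures only approximate what F# and the CIL do.

```python
from functools import partial


def adder_curried(a):
    """Curried shape (int -> int -> int): takes one argument, returns a function."""
    def with_b(b):
        return a + b
    return with_b


def adder_tupled(a, b):
    """Tupled shape (int * int -> int): both arguments arrive in one call."""
    return a + b


# Partial application falls out naturally from the curried form...
add_one = adder_curried(1)

# ...while the tupled form needs a helper such as functools.partial.
add_one_via_partial = partial(adder_tupled, 1)

print(adder_curried(1)(2), adder_tupled(1, 2), add_one(100), add_one_via_partial(100))
```

A caller coming from C# sees the second shape, which is why a plain two-argument call works there.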

But there are more complicated cases; F# functions can take functions and return them. When
we're dealing with functions in a first-class way like this, we can use the generic
Microsoft.FSharp.Core.FSharpFunc type. For example:

But there are more complicated cases. F# functions can take functions as arguments and
return them, and when we deal with functions in a first-class way like this, we can use the
generic Microsoft.FSharp.Core.FSharpFunc type.

   let public adder a b = a + b
   let public addOne = adder 1

Where addOne is a function that returns a partially applied function. Calling it from 
F# is a breeze:

Here addOne is a function that returns a partially applied function, and calling it from F# is a breeze:

   let x = addOne 100

But from C# it's a bit trickier. The function returns an FSharpFunc, that we then need 
to apply with Invoke:

Calling it from C#, however, is a bit trickier. The function returns an FSharpFunc, which
we then need to apply with Invoke.

   Microsoft.FSharp.Core.FSharpFunc<int,int> f = Module1.addOne;
   var x = f.Invoke(100);

Although obviously you can combine this into a single line, and remove some of the type
signatures (which I tend to do as much as possible using var when writing C# code):

   var x = Module1.addOne.Invoke(100);

Things get even nastier if you happen to expose an F# function that takes a function. For
example, a function that takes another function and applies it to two arguments:

   let apply op a b = op a b

In F# interactive we can see that this has the following, nicely generalised type signature:

In F# Interactive we can confirm that this has the following nicely generalised signature:

   val apply : ('a -> 'b -> 'c) -> 'a -> 'b -> 'c

i.e. it's a function that takes as its first argument a function that takes an 'a and a 'b
and returns a 'c, it also accepts the 'a and 'b to pass to this function, and returns
the 'c result. Clear as. We may well want to create a version of this that uses the integer
operator +, which is trivial from F#:

   let add a b = apply (+) a b

So how does it look if we try and do that from C#? In a word: messy. We have to use the
Microsoft.FSharp.Core.FuncConvert.ToFSharpFunc helper functions, along with the .NET
framework Converter as a way of specifying a generic, value-returning function.

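So how does it look if we try to do that from C#? In a word: messy. We have to use the
Microsoft.FSharp.Core.FuncConvert.ToFSharpFunc helper functions, together with the
.NET Framework Converter delegate as a way of specifying a generic, value-returning function.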

   using System;
   using Microsoft.FSharp.Core;

   var op = FuncConvert.ToFSharpFunc(new Converter<int, FSharpFunc<int, int>>((aa) =>
   {
       return FuncConvert.ToFSharpFunc(new Converter<int, int>((bb) =>
       {
           // The 'meat' of the function. Hmmm...
           return aa + bb;
       }));
   }));
   var zz = Module1.apply(op, 1, 2);
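
The nesting in that C# call may be easier to see in a language with lightweight closures. A rough Python sketch follows; `apply` and `op` here are stand-ins written for this illustration, not F# or .NET APIs. Because `apply` expects a curried function, `op` has to be a function that returns a function, and that is what the two nested Converter delegates build.

```python
def apply(op, a, b):
    """Mirror of `let apply op a b = op a b`: op is applied one argument at a time."""
    return op(a)(b)


def op(aa):
    # Outer function: takes the first argument and returns a function
    # that is still waiting for the second one.
    def inner(bb):
        # The 'meat' of the function.
        return aa + bb
    return inner


zz = apply(op, 1, 2)
print(zz)
```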

Whoa. That's enough for now. As you can see, the various different ways of interoperating
between F# and C# range from the neat to the nasty. The trick is definitely to pick your
battles. Think carefully about what you need to expose and remember it's probably best not
to cross the beams if you can avoid it.

That's enough for now. As you can see, the ways of interoperating between F# and C#
range from the neat to the nasty. The trick is definitely to pick your battles. Think
carefully about what you need to expose, and remember it's probably best not to cross the
beams if you can avoid it.

Random Numbers & Random Frames – Flash Basics



Python Worst Practice is just too awful | TRIVIAL TECHNOLOGIES on CLOUD
Python Worst Practices | Python Worst Practices Online | Download Python Worst Practices « Programming « Study Aids « Examville



Will Erlang catch on, I wonder? Introduction to programming in Erlang, Part 1: The basics



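This week's Morning carries Karechi. Corny or not, I like this kind of story.

Potemayo (5) (Action Comics)

■_ Today's dump-it-on-someone-else question

Well, granted, it would be unreasonable to make the asker write it, but it feels a bit off that nobody brings up something like Aho-Corasick. grep -F (fgrep) probably uses it, though.

What is an efficient Perl program for extracting from a large amount of data? (1/2) | OKWave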

Large data set: file A, 3 columns of variable values (number, URL, number), tab-separated, with duplicate values
123 http://www.XX.co.jp/XX 4567
1111 http://www.XX.co.jp/XX 3333
3 http://www.XX.co.jp/YZ 4567
1111 http://www.YYY… 116
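Data to extract: file B, 1 column (URL), no duplicates

When file A contains a URL that begins with a URL in file B, I want to extract that line of file A.
Doing this with grep takes an enormous amount of time, so please tell me an efficient way to extract the lines.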
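
One possible approach to the question above, sketched in Python rather than Perl: load the file B URLs into a trie and walk each file A URL through it, so the cost per line depends on the URL length and not on the number of B entries. The function names and sample data are invented for this sketch. It handles only the anchored "starts with" case; Aho-Corasick generalizes the same structure to matches anywhere in the text.

```python
_END = ""  # marker key; never collides with a single-character key


def build_trie(prefixes):
    """Build a character trie from the file B URLs."""
    root = {}
    for prefix in prefixes:
        if not prefix:
            continue  # an empty prefix would match every line
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def starts_with_any(trie, text):
    """Return True if text begins with at least one stored prefix."""
    node = trie
    for ch in text:
        if _END in node:
            return True
        node = node.get(ch)
        if node is None:
            return False
    return _END in node


def filter_lines(a_lines, b_urls):
    """Yield the file A lines whose second tab-separated field matches."""
    trie = build_trie(b_urls)
    for line in a_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and starts_with_any(trie, fields[1]):
            yield line


sample_a = [
    "123\thttp://www.XX.co.jp/XX\t4567\n",
    "3\thttp://www.XX.co.jp/YZ\t4567\n",
]
sample_b = ["http://www.XX.co.jp/X"]
matched = list(filter_lines(sample_a, sample_b))
```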




# create dummy data
# create the filter source list

# string match

# regex match

# grep -F -f 元リスト.txt dummy.txt






>Doesn't this method end up looking only for exact matches with the URLs in file B?



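open B,"B.txt" or die $!; # changed the extension because the data is tab-separated
while(<B>) {
 chomp; # strip the newline
 $b_url{$_} = 1; # store as a hash key; the value is arbitrary
}
close B;

@b_url = sort keys %b_url;

open A,"A.txt" or die $!;
open C,">X.txt" or die $!;
while(<A>) {
 (undef,$url)=split /\t/; # only the second field matters

 for $b_url(@b_url) {
   if (index($url,$b_url) == 0){ # index returns 0 when $url starts with $b_url
     print C;
     last; # one match is enough; avoid printing the line twice
   }
 }
}
close A; # the file handle was wrong before
close C;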


According to Larry Wall, Perl is supposedly faster than the built-in grep function; I wonder how true that is.

Posted: 2011-05-12 10:17:07

I'd like to try this and see how fast it runs.


sub create_index {
  my $sumpling_interval = shift;
  my @splited_lines   = @_;
  my @index       = ();

  for ( my $i = 0; ( $i * $sumpling_interval ) <= $#splited_lines; $i++ ) {
    my $pos = $i * $sumpling_interval;
    push @index, { pos => $pos, url => $splited_lines[$pos]->{url} };
  }

  # place a sentinel at the end of the index
  push @index, { pos => $#splited_lines, url => $splited_lines[$#splited_lines]->{url} };
  return @index;
}


use strict;
use warnings;
use Data::Dumper;

my $words_file = shift || '/usr/share/dict/words';

my @lines = create_dummy_data( $words_file,
  [ 'google.co.jp', 'yahoo.com', 'bing.jp' ] );
print $#lines, $/;

my @splited_lines = split_lines(@lines);
@splited_lines = sort { $a->{url} cmp $b->{url} } @splited_lines;
my @index    = create_index( 1000, @splited_lines );
my @target_urls = qw(http://google.co.jp/picture http://bing.jp/illust);
my @finded   = find_lines( \@splited_lines, \@index, \@target_urls );

print Dumper($_), $/ for @finded;

sub find_lines {
  my $splited_line_ref = shift;
  my $index_ref    = shift;
  my $target_url_ref  = shift;
  my @finded      = ();

  URL_LIST:
  for my $target_url ( @{$target_url_ref} ) {
    my $previous_pos = -1;
    for my $index ( @{$index_ref} ) {
      if ( ( $target_url cmp $index->{url} ) <= 0 ) {
        if (  ( $previous_pos == -1 )
          && ( $target_url cmp $index->{url} ) != 0 )
        {
          # Not found
          next URL_LIST;
        }

        # search first match pos (start at 0 when there is no previous sample)
        my $pos = $previous_pos < 0 ? 0 : $previous_pos;
        while ( $splited_line_ref->[$pos]->{url} !~ m/^\Q$target_url\E/ ) {
          $pos++;
          if ( $pos > $index->{pos} ) {

            # Not found
            next URL_LIST;
          }
        }

        # found. push data, without running past the end of the array
        while ( $pos <= $#{$splited_line_ref}
          && $splited_line_ref->[$pos]->{url} =~ m/^\Q$target_url\E/ )
        {
          push @finded, $splited_line_ref->[$pos];
          $pos++;
        }
        next URL_LIST;
      }
      $previous_pos = $index->{pos};
    }
  }
  return @finded;
}

sub create_index {
  my $sumpling_interval = shift;
  my @splited_lines   = @_;
  my @index       = ();

  for ( my $i = 0; ( $i * $sumpling_interval ) <= $#splited_lines; $i++ ) {
    my $pos = $i * $sumpling_interval;
    push @index, { pos => $pos, url => $splited_lines[$pos]->{url} };
  }
  return @index;
}

sub split_lines {
  my @lines     = @_;
  my @splited_lines = ();

  for my $line (@lines) {
    if ( $line =~ m/(\d+)\s(.+)\s(\d+)/ ) {
      push @splited_lines, { num1 => $1, url => $2, num2 => $3 };
    }
  }
  return @splited_lines;
}

sub create_dummy_data {
  my $file     = shift;
  my $base_url_ref = shift;
  my @lines    = ();

  open my $fh, '<', $file or die "$!:$file";
  while ( my $word = <$fh> ) {
    $word =~ s/\x0D?\x0A?$//;
    for my $base_url ( @{$base_url_ref} ) {
      my $url = 'http://' . $base_url . '/' . $word;
      my $line = '1234' . "\t" . $url . "\t" . '56789';
      push @lines, $line;
    }
  }
  close $fh or die "$!:$file";
  return @lines;
}

■_ The difference between Java and Dalvik

So Dalvik is a register-type VM (not that I know whether that is the proper classification).

Daneel: The difference between Java and Dalvik | antforge.org

Those of you who kept following IcedRobot might have seen that quite some work went 
into Daneel over the past months. He is in charge of parsing Android applications 
containing code intended to run on a Dalvik VM and transforming this code into 
something which can run on any underlying Java VM. So he is a VM compatible with 
Dalvik on top of a Java VM, or at least that's what he wants to become.

So Daneel is multilingual in a strange way, he can read and understand Dalvik bytecode, 
but he only speaks and writes Java bytecode. To understand how he can do that we have 
to look at the differences between those two dialects.

Registers vs. Stack: We know Dalvik bytecode uses a register-machine, and Java 
bytecode uses a stack-machine. But each method frame on that stack-machine not only 
has an operand stack, it also has an array of local variables. Unfortunately this 
distinction is lost in our register-machine. To understand what this means, let us 
look at a full Java-Dalvik-Daneel round-trip for a simple method like the following.

We know that Dalvik bytecode uses a register machine and that Java bytecode uses a stack
machine. To see what that means, let us look at a full Java-Dalvik-Daneel round-trip.

public static int addConst(int val) {
   return val + 123456;
}

The first stop on our round-trip is the Java bytecode. So after we push this snippet 
through javac we get the following code which makes use of both, an operand stack and 
local variables.

The first stop on the round-trip is the Java bytecode.

public static int addConst(int);
  [max_stack=2, max_locals=1, args_size=1]
   0: iload_0
   1: ldc #int 123456
   3: iadd
   4: ireturn

The second stop takes us to the Dalvik bytecode. We push the above code through the dx 
tool and are left with the following code. Note that the distinction between the 
operand stack and local variables is lost completely, everything is stored in registers.

Next we take up the Dalvik bytecode.
Using the dx tool, we get the following code from the code above.

public static int addConst(int);
  [regs=2, ins=1, outs=0]
   0: const v0, #0x1E240
   1: add-int/2addr v0, v1
   2: return v0

The third and last step is Daneel reading the Dalvik bytecode and trying to reproduce 
sane Java bytecode again. The following is what he spits out after chewing on the 
input for a bit.

In the third and final step, Daneel reads the Dalvik bytecode and
tries to regenerate Java bytecode that does the same thing.

public static int addConst(int);
  [max_stack=2, max_locals=2, args_size=1]
   0: ldc #int 123456
   1: istore_1
   2: iload_1
   3: iload_0
   4: iadd
   5: istore_1
   6: iload_1
   7: ireturn
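
To check that the three listings really compute the same thing, here is a toy interpreter sketch in Python. It is an illustration only: it covers just the handful of instructions that appear above, ignores 32-bit overflow, and the tuple encoding of instructions is invented for this sketch. The one Dalvik convention it relies on is that incoming arguments occupy the highest-numbered registers (v1 in the listing with regs=2, ins=1).

```python
def run_stack(code, args, max_locals):
    """Run JVM-style code: an operand stack plus an array of local variables."""
    local_vars = list(args) + [0] * (max_locals - len(args))
    stack = []
    for op, *operands in code:
        if op == "iload":
            stack.append(local_vars[operands[0]])
        elif op == "istore":
            local_vars[operands[0]] = stack.pop()
        elif op == "ldc":
            stack.append(operands[0])
        elif op == "iadd":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "ireturn":
            return stack.pop()
        else:
            raise ValueError(f"unsupported instruction: {op}")
    raise ValueError("method ended without ireturn")


def run_registers(code, args, regs):
    """Run Dalvik-style code: everything lives in registers."""
    registers = [0] * regs
    registers[regs - len(args):] = list(args)  # arguments go in the last registers
    for op, *operands in code:
        if op == "const":
            registers[operands[0]] = operands[1]
        elif op == "add-int/2addr":
            dst, src = operands
            registers[dst] += registers[src]
        elif op == "return":
            return registers[operands[0]]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    raise ValueError("method ended without return")


JAVAC = [("iload", 0), ("ldc", 123456), ("iadd",), ("ireturn",)]
DALVIK = [("const", 0, 0x1E240), ("add-int/2addr", 0, 1), ("return", 0)]
DANEEL = [("ldc", 123456), ("istore", 1), ("iload", 1), ("iload", 0),
          ("iadd",), ("istore", 1), ("iload", 1), ("ireturn",)]

results = (run_stack(JAVAC, [1], 1),
           run_registers(DALVIK, [1], 2),
           run_stack(DANEEL, [1], 2))
print(results)
```

Daneel's version treats each Dalvik register as a local variable, which is why every register write becomes an istore followed by an iload.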





What's so special about Ruby on Rails? - AstonJ's Blog
What's so special about Ruby on Rails?
Posted on: April 14th, 2011 by AstonJ

There are many reasons why Rails is special, but in order to keep things fairly short (and
relevant to those new to programming) I won't delve too deeply into the mechanics. Instead
I'll cover the things that first drew me to the framework… and subsequently got me hooked!

Here's Why Ruby On Rails Is Hot

Many companies have asked the questions “What's the fastest way to develop my web apps?” 
and “Should I use Java, Ruby, Python or something else?”

■_ Java.next()

For later, part 2. This time it's Scala.

(think) - Java.next() - Scala: The Revenge of the Static Typing
Java.next() - Scala: The Revenge of the Static Typing


This is the second post from my series dedicated to modern programming languages for 
the Java platform. Last time we've discussed the Groovy programming language, which 
is a member of the ever expanding family of dynamic programming languages. The Scala 
programming language, which is the subject of today's discussion, is a different beast 
entirely - not only does it use static typing (like Java & C# amongst others), but it 
also puts a heavy emphasis on the type system, functional and parallel programming.

In theory Scala runs both on the JVM and on the CLR (the .NET VM). The Java port, 
however, receives a lot more attention from Scala's developers and it probably accounts 
for close to all of Scala's deployments (especially in production).

This article is extremely hard for me to write. Unlike with Groovy, I'm deeply familiar 
with the language and would like to share quite a lot with you. For obvious reasons I 
cannot go into much detail (otherwise I'd have written an on-line book). You're 
encouraged to follow up this article by reading some of the excellent resources 
mentioned near its end.


Apparently Clojure is next.


LLVM Project Blog: What Every C Programmer Should Know About Undefined Behavior #1/3
What Every C Programmer Should Know About Undefined Behavior #1/3 : programming




■_ Ichihime Nitaro (a girl first, then a boy)


Internet Radio Station <Onsen>, thread 22 [chaika]

463 Voice cast: Anonymous [sage] 2011/05/11 (Wed) 15:09:31.38 ID:Kzxm9RGh0 Be:

465 Voice cast: Anonymous [sage] 2011/05/11 (Wed) 16:25:25.14 ID:F+RlTwOY0 Be:

466 Voice cast: Anonymous [sage] 2011/05/11 (Wed) 16:37:55.72 ID:M9RvMcHw0 Be:

■_ Decay

If it's C, ask me (beginners' edition) Part 83

834 Default Anonymous [sage] 2011/05/11 (Wed) 20:18:10.99 ID: Be:

    int hika(char str[],char str2[])

     return 0;

     return 1;


835 Default Anonymous [sage] 2011/05/11 (Wed) 20:20:38.93 ID: Be:
    I wonder why if(str==str2) doesn't work

837 Default Anonymous [sage] 2011/05/11 (Wed) 20:22:06.27 ID: Be:

838 Default Anonymous [sage] 2011/05/11 (Wed) 20:23:27.23 ID: Be:

839 Default Anonymous [sage] 2011/05/11 (Wed) 20:24:17.53 ID: Be:

840 Default Anonymous [sage] 2011/05/11 (Wed) 20:24:50.30 ID: Be:

843 Default Anonymous [sage] 2011/05/11 (Wed) 21:58:03.80 ID: Be:


844 Default Anonymous [sage] 2011/05/11 (Wed) 22:19:23.71 ID: Be:

845 Default Anonymous [sage] 2011/05/11 (Wed) 22:23:00.77 ID: Be:
    Um, str1 and str2 are both pointers (since they come from function parameters)...
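
The point of post 845 is that inside the function both parameters are pointers, so == compares addresses rather than characters. A rough illustration in Python, using ctypes buffers as stand-ins for two C char arrays (the variable names are hypothetical; in C itself the content comparison would be done with strcmp):

```python
import ctypes

# Two separate buffers holding the same characters, like two char arrays in C.
str1 = ctypes.create_string_buffer(b"hello")
str2 = ctypes.create_string_buffer(b"hello")

# What str1 == str2 tests in C once both are pointers: are the addresses equal?
same_address = ctypes.addressof(str1) == ctypes.addressof(str2)

# What strcmp(str1, str2) == 0 tests: are the characters equal?
same_contents = str1.value == str2.value

print(same_address, same_contents)
```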


■_ Trends

At Google Code Jam.


languages at google code jam : programming

They split the data, but didn't derive some useful stats. Like so:

Percent of participants who finished with perfect 100 score

C#           9.9%    
Java        10.5%   
C           11.2%
Ruby        13.5%
Python      14.5%
C++         20.4% 

For languages with smaller numbers, they are too small to have any meaningful statistics.

Number of participants and percent of those who passed qualification round

India       1679    83%
US          1315    83%
France      225     87%
Indonesia   146     89%
Poland      314     89%     
Japan       579     90%
Germany     197     91%
China       1720    92%
Russia      698     94%
Ukraine     269     96%

Regional language popularity

Perl        US, Japan
OCaml       France
PHP         US
Javascript  US
Python      US, Canada, Australia, Israel, UK 
Ruby        US, Japan
Haskell     Japan, US
Java        India, US
VB          India
C#          US, India
Pascal      Russia
C           India
C++         China, Russia, Ukraine

Shown only extremes, C++ and Java are popular everywhere.
EDIT: I've removed India from C++ after looking at numbers one more time.

Good job. One more.

For the languages with more than 100 participants, average amount of perfect scores:


Deviation from that average for each of those languages:

Haskell 156%
C++     128%
Pascal  103%
Python   92%
Ruby     86%
C        71%
Java     67%
C#       63%
Perl     61%
PHP      34%

In other news, it's painful but sometimes convenient to think in Haskell, C++ is still 
where it's at, Pascal users are a die-hard bunch, and web programmers don't care much 
for numerical analysis, except perhaps some of the django/ror folk.
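
The "deviation" figures in that comment look like each language's perfect-score rate divided by the overall average, so that 100% would mean exactly average. That reading is an assumption, since the average itself is not shown above. As a consistency check only, this Python sketch divides the rates from the first table by the ratios from the later one for the six languages that appear in both; if the reading is right, the implied averages should roughly agree with each other.

```python
# Percent of participants with a perfect score (first table).
perfect_rate = {"C#": 9.9, "Java": 10.5, "C": 11.2,
                "Ruby": 13.5, "Python": 14.5, "C++": 20.4}

# "Deviation from that average", in percent (later table).
ratio_to_average = {"C#": 63, "Java": 67, "C": 71,
                    "Ruby": 86, "Python": 92, "C++": 128}

# If ratio = rate / average, then average = rate / ratio.
implied_average = {
    lang: perfect_rate[lang] / (ratio_to_average[lang] / 100)
    for lang in perfect_rate
}

for lang, avg in implied_average.items():
    print(f"{lang:7s} {avg:5.2f}%")
```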

Your conclusion mixes cause and effect. It's just as likely that people who can 
understand and program in Haskell are generally smarter than the average programmer. 
Not because smart people choose Haskell but because the barrier to entry is much 

Turns out he's a redditor.







Send e-mail to: kbk AT kt DOT rim DOT or DOT jp