ときどきの雑記帖 混迷編


April 30, 2011

■_

Finished reading:
ユニケージ原論

For the first time in several weeks I went to the Meirinkan used bookstore in Jinbōchō (明倫館書店 店舗のご案内). There were several books I wanted, but I couldn't quite commit. They'll probably be gone by the time I go next.

・This month's His Majesty the Emperor: the crossing of the Alps. It feels like they're just skimming through the episodes around here, though.

It really bothers me when I see 保証 and 保障 (two Japanese words for "guarantee") mixed up. Not that I'm all that confident about the distinction myself.

■_

■_ How to ask questions about Perl

Apparently you have to follow this flowchart → http://i.imgur.com/cooZ8.png

A talk by Damian Conway. To explain the difference between Perl 5 and Perl 6, he brings out the TOS Enterprise and the TNG Enterprise, which is fun. Perl 6 Update

■_

How do you design programs in Haskell or other functional programming languages? - Programmers - Stack Exchange

I have some experience in object oriented programming languages like C# or Ruby. I know
how to design a program in object oriented style, how to create classes and objects, and
how to define relations between them. I also know some design patterns.

How do people write functional programs? How do they start? Are there design patterns for
functional languages? Are methodologies like extreme programming or agile development
applicable for functional languages?

There are some people stating that the OO design patterns are in fact solutions to 
problems in the OO paradigm which the functional programming paradigm doesn't exhibit.

I would say: the way to design functional programs is a very natural way to think 
about a problem: functional decomposition. You first think about the 
input and output types of the problem, and you have (stated in Haskell):

problem :: Input -> Output

... and then you decompose this into "smaller" functions which can be glued 
together by the excellent glue that functional programming languages provide (like 
higher-order functions, (.), currying, ...).

When designing a program in Haskell, it is usually good to start by thinking about the 
types. In particular, pay attention to what side-effects you want your program to 
perform and how you can separate the pure pieces from the side effecting pieces. This 
is also a good time to think about what constraints you want on your operations, and 
how to use the type system to enforce these constraints.

Once you have nailed down the types, it is usually just a matter of filling in the 
right code to make it compile.

When it comes to extreme programming/agile development, I think pure functional 
programming is very well suited to techniques like test driven development. Testing 
pure code is simply a matter of checking that you get the expected output given some 
input, while testing imperative code requires you to set up the state of the world 
before calling your code, and then carefully inspecting the world afterwards to make 
sure it changed in the way you expected.

After doing professional software development in OO languages for a number of years, I came to believe that the mother of all design patterns is Don't Repeat Yourself (DRY). As code changes over time, programs with lots of repeated code tend to be more susceptible to bugs, because inevitably the developer will change something and forget about another place where that same pattern occurs and needs to be changed.

This pattern stayed the same when I started programming in Haskell professionally. I break my programs up into small functions in ways that reduce repetition and seem to naturally fit the domain. Often I get it wrong. Just yesterday I discovered one function that sticks out in my mind that was doing two things that I discovered would be better split into separate functions. If I had not split them, I would have had to create a new function that duplicated some of the existing functionality. Functional programming languages (most obviously because of first-class functions) give you more tools to do this easily than procedural/OO languages.

Minimizing the scope of things a function can affect is another design principle that comes to mind. In a pure language like Haskell one way of doing this is to separate pure and impure code. But in my mind this is still just a specific instance of the DRY principle. Making a function pure instead of impure (say a part of a larger application monad) means that to test it you won't have to initialize as much of the environment--one less thing you have to repeat.
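The point about pure code being easier to test can be made concrete. A minimal sketch in Python rather than Haskell; the function and the numbers are invented for illustration. A pure function needs no environment prepared before testing, so the test is just input in, expected output out.

```python
# A pure function: the result depends only on the arguments, so a test
# is just "given this input, expect this output" -- no setup, no teardown.
def total_price(prices, tax_rate):
    """Pure: no I/O, no global state read or written."""
    return round(sum(prices) * (1 + tax_rate), 2)

# An impure variant would read the tax rate from a config file or a
# database; testing it means preparing that environment first and
# inspecting it afterwards. Keeping the pure core separate confines
# that cost to a thin wrapper.

assert total_price([10.0, 20.0], 0.1) == 33.0
assert total_price([], 0.25) == 0.0
```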
answered 23 hours ago by mightybyte

Well, there is How to Design Programs, which is a full course on how to design programs with Scheme, itself a functional programming language. It is a good text.
answered yesterday by Arthur van Leeuwen

I write my answer mainly with Haskell in mind, though many concepts apply equally well to other functional languages like Erlang, Lisp(s), and ML. Some even apply (to some extent) to Ruby, Python, Perl and Javascript.
How do people write functional programs? How do they start?

By writing functions. When you are doing functional programming, you either are writing main, or you are writing a helper function. Sometimes your main goal might be to write a data type with various relevant functions that operate on it.

Functional programming is very well suited to both top-down and bottom-up approaches. Haskell strongly encourages writing your programs in high-level language, and then simply defining the details of your high-level design. See minimum, for example:

minimum    :: (Ord a) => [a] -> a
minimum xs =  foldl1 min xs

The function to find the smallest element in a list is written simply as a traversal over the list, using the min function to compare each element with the "accumulator", or current minimum value.
Are there design patterns for functional languages?

There are two things that could be equated to "design patterns", imho: higher-order functions and monads. Let's talk about the former. Higher-order functions are functions that either take other functions as input, or produce functions as output. Any functional language generally makes heavy use of map, filter, and fold (fold is often also called "reduce"): three very basic higher-order functions that apply a function to a list in different ways. These replace boilerplate for loops in a beautiful way. Passing functions around as parameters is an extremely powerful boon to programming; lots of "design patterns" can be accomplished more simply by using higher-order functions, by being able to create your own, and by being able to leverage the powerful standard library, which is full of useful functions.
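The three workhorse higher-order functions named here have direct counterparts in most languages. A small sketch in Python (my example, not from the answer) of how they replace the usual boilerplate loops:

```python
from functools import reduce

xs = [1, 2, 3, 4, 5]

# map: apply a function to every element
doubled = list(map(lambda x: x * 2, xs))          # [2, 4, 6, 8, 10]

# filter: keep only the elements satisfying a predicate
evens = list(filter(lambda x: x % 2 == 0, xs))    # [2, 4]

# fold (called reduce here): combine elements with a binary function
# and an accumulator, exactly like foldl in Haskell
total = reduce(lambda acc, x: acc + x, xs, 0)     # 15

# Each one stands in for a for-loop pattern that imperative code
# would otherwise write out by hand.
```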

Monads are the "scarier" topic. But they're not really that scary. My favorite way to think of monads is to think of them as enveloping a function in a bubble and giving that function superpowers (that only work inside the bubble). I could elaborate, but the world doesn't really need yet another monad analogy. So I'll move to quick examples. Suppose I want to use a nondeterministic "design pattern". I want to run the same computation for various different inputs at the same time. I don't want to choose just one input, I want to choose them all. That would be the list monad:

allPlus2 :: [Int] -> [Int]
allPlus2 xs = do x <- xs
                 return (x + 2)

Now, the idiomatic way to perform this is actually map, but for illustration's sake, can you see how the list monad allowed me to write a function that looks like it operates on one value, but endowed it with the superpower to work on every element in a list? Other superpowers include failure, state, interacting with the "outside world", and parallel execution. These superpowers are very potent, and most programming languages allow functions with superpowers to rampage all around. Most people say Haskell doesn't allow these superpowers at all, but really, Haskell just contains them in monads so their effect can be limited and observed.
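For readers who don't know Haskell, the allPlus2 example above has a close analogue in Python: the list comprehension, and more generally a flatten-after-map "bind". A sketch (the function names are mine, invented for illustration):

```python
def all_plus_2(xs):
    # The Haskell do-block "x <- xs; return (x + 2)" runs the body for
    # every element of the list and collects the results -- which is
    # exactly what a comprehension does.
    return [x + 2 for x in xs]

# The general monadic "bind" for lists is map-then-flatten: each input
# may lead to several outputs, which models nondeterminism.
def bind(xs, f):
    return [y for x in xs for y in f(x)]

assert all_plus_2([1, 2, 3]) == [3, 4, 5]
assert bind([1, 2], lambda x: [x, -x]) == [1, -1, 2, -2]
```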

tl;dr Grokking higher-order functions and monads is the Haskell equivalent to grokking design patterns. Once you learn these Haskell concepts, you start thinking "design patterns" are mostly cheap workarounds to simulate the power of Haskell.
Are methodologies like extreme programming or agile development applicable for functional languages?

I don't see anything that ties these management strategies down to any one programming paradigm. As phynfo stated, functional programming practically forces you to do function decomposition, breaking a large problem into subproblems, so mini-milestones should be a piece of cake. There are tools like QuickCheck and Zeno to test or even prove properties about the functions you write.
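The QuickCheck idea mentioned at the end can be imitated in a few lines. A toy sketch in Python (`quickcheck` and `gen_list` are invented names; the real QuickCheck also shrinks failing cases and generates inputs far more cleverly):

```python
import random

def quickcheck(prop, gen, runs=100):
    """Check a property against many randomly generated inputs."""
    for _ in range(runs):
        case = gen()
        assert prop(case), f"property falsified by {case!r}"

random.seed(0)  # deterministic for this demonstration
gen_list = lambda: [random.randint(-50, 50) for _ in range(random.randint(0, 10))]

# Example property: reversing a list twice gives back the original.
quickcheck(lambda xs: list(reversed(list(reversed(xs)))) == xs, gen_list)
```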
site design / logo © 2011 stack exchange inc; user contributions licensed under cc-wiki with attribution required

■_ yacc/lex

Is the "school" here a university, I wonder. They do say "taking a course", after all.

I'm currently taking a class on compilers at school. I'd like to know the difference between yacc and lex... - Yahoo! Chiebukuro

I'm currently taking a class on compilers at school. I'd like to know the difference between yacc and lex.

%{
#include <math.h>
#define YYSTYPE double
%}

%start line
%token NUM UMINUS

%left '+' '-'
%left '*' '/'
%right '^'
%right UMINUS

%%
line :
| line expr '\n' { printf("%f\n", $2);}
| line error '\n' {yyerrok;}
;

expr : expr '+' expr {$$ = $1 + $3;}
|expr '-' expr {$$ = $1 - $3;}
|expr '*' expr {$$ = $1 * $3;}
|expr '/' expr {$$ = $1 / $3;}
|expr '^' expr {$$ = pow($1,$3);}
|'-' expr %prec UMINUS {$$ = -$2;}
|'(' expr ')' { $$ = $2;}
|NUM
;

%%

yylex()
{
    int c;

    while((c = getchar()) == ' ');
    if(((c >= '0') && (c <= '9')) || c == '.'){
        ungetc(c, stdin);
        scanf("%lf", &yylval);
        return NUM; }
    else
        return c;
}

This program was handed out at school. I'm using Cygwin, but no matter what I do I get errors.
As for how to compile it, I was told to use
bison -dv -y
for the yacc part and
flex -l
for the lex part, but errors come out.
Do I need to split the yacc part and the lex part into separate text files?
If so, I would appreciate it if you could tell me the difference between a yacc program and a lex program.


The reply chosen as Best Answer:

http://ja.wikipedia.org/wiki/Yacc
http://ja.wikipedia.org/wiki/Lex
Yacc does syntax analysis and Lex does lexical analysis, so their purposes are different.


Thank you very much, that solved it.

Did this answer really solve it, though? The code pasted in the question doesn't even need lex (flex).

Hm, there's no main() either. Maybe it's in a separate file.

■_ compile to JavaScript

Hacker News | I've been working on Ralph for a while now: https://github.com/turbolent/ralph ...

I've been working on Ralph for a while now: https://github.com/turbolent/ralph

It compiles a major subset of Apple's Dylan 
(http://lispm.dyndns.org/documentation/prefix-dylan/book.anno...) to JavaScript, both 
for use on a CommonJS implementation and in the browser. A bootstrapping compiler is 
implemented in JS, but the same compiler is also available in Ralph itself and 
features define-macro (Cl-like). The whole runtime is defined in Ralph as well and 
provides a single-inheritance object system (including next-method):

https://github.com/turbolent/ralph/blob/master/src/runtime/core.ralph

Almost all of the features are shown in
https://github.com/turbolent/ralph/blob/master/src/tests/runtime-tests.ralph

and I'm using it in a project now. To build HTML5 apps, there's a small toolbox:
https://github.com/turbolent/toolbox

Maybe it's useful to someone else. Cheers

I wonder what the name Ralph is a reference to.

Hacker News is hard to keep up with, though. I can't exactly check it all that often. Then again, it does have an RSS feed: http://news.ycombinator.com/rss. Same as with Stack Overflow, though: if you don't check frequently enough, you'll probably end up missing things from the feed itself.

■_

Tech Notes: Complexity is the enemy
Complexity is the enemy

I'm almost through my seventh year working at Google(!). I have learned many things 
there, more than I could ever write down. I thought I would at least share with you 
something that's only come to me with more experience.

Complexity is the death of software. Its cost is hard to quantify, and it tends 
to creep in slowly, so it's a slow boil of getting worse that's hard to see until it's 
too late. On the other side, frequently it's easy to see a benefit of increasing 
complexity: a new layer of indirection allows new feature X, or splitting a process 
that ran on one machine into two allows you to surmount your current scaling hurdle. 
But now you must keep another layer of indirection in your head, or implement an RPC 
layer and manage two machines.

The above is hopefully just as obvious to a new programmer as it is to a veteran. What 
I think I've learned through my few years in the industry is a better understanding of 
how the balance works out; when complexity is warranted and when it should be rejected. 
I frequently think back to a friend's comment on the Go compiler written by Ken 
Thompson: it's fast because it just doesn't do much, the code is very straightforward.

It turns out that, much like it's easier to write a long blog post than it is to make 
the same point succinctly, it's difficult to write software that is straightforward. 
This is easiest to see in programming language design; new languages by novices tend 
to have lots of features, while few have the crisp clarity of C. In today's programs 
it's frequently related to how many objects are involved; in distributed systems it's 
about how many moving parts there are.

Another word for this problem is cleverness: to quote another one of the C hackers, 
"Debugging is twice as hard as writing the code in the first place. Therefore, if 
you write the code as cleverly as possible, you are, by definition, not smart enough 
to debug it."

What helps? I wonder if it maybe just comes down to experience -- getting bitten by 
one too many projects where someone thought metaprogramming was cool. But I've found 
having specific design goals to evaluate new code by can help. It's easier to reject 
new code if you can say "this does not help solve the initial goals of the 
project". Within Google the template document for describing the design of a new 
project has a section right at the top to list non-goals: reasonable extensions of the 
project that you intend to reject.

Ironically, I've found that using weaker tools can help with complexity. It's hard to 
write a complicated C program because it can't do very much. C programs tend to use 
lots of arrays because that's all you get, but it turns out that arrays are great -- 
compact memory representation, O(1) access, good data locality. I'd never advocate 
intentionally using a weak tool, though. Instead, my lesson has been: write Python 
code like it was C.

■_ Why Haskell?

Why learn Haskell? Not exactly a novel topic, but still.

Mired in code: Why Haskell?

Why Haskell?

This should not be considered an expert overview of the language. It's a deep language, 
and I'm still learning it. This is a discussion of what I've seen so far to explain 
why I chose it.

As I mentioned, I'm not learning Haskell because I expect it to be a marketable skill; 
I'm learning Haskell because I expect it to improve my programming skills in general. 
That should happen because it's radically different from other languages I know, so 
its idioms should be new to me, and will thus provide more options when working in 
other languages. At least one person regrets this - because they wind up thinking 
about how much simpler things would be in Haskell.



As a final bonus, there appears to be more higher mathematics in use when writing 
Haskell than in most other languages. So I'm hoping to get a little more use from my 
mathematics degree than I've gotten so far.

Things I look for in a language

Being well-designed

(omitted)
A note about homoiconicity

Haskell isn't homoiconic. That means it can't have real (i.e. LISP) macros. Which 
means LISP is more powerful. Right?


Maybe. LISP macros are generally used for one of three things:

   1. Controlling evaluation to create new constructs. That's built into Haskell.

   2. Creating domain specific languages. The ability to make operators into functions 
      and vice versa pretty much covers this.

   3. Automated code creation. Um...
Ok, Haskell doesn't have that one. And that is one of the more powerful uses of macros. 
There is an extension to Haskell (Template Haskell) that covers this. It's not portable.
It's poorly documented. It works on ASTs instead of source code. But it is there.

I'll be blogging about my experiments with Haskell as well, and possibly revisiting 
some of the things I did in Clojure. Stay tuned...
Copyright Mike W. Meyer. Distributed under the Attribution-ShareAlike License. Simple template. Powered by Blogger.

■_

Reached my operational limit at a thoroughly half-finished point.

Pasting this here for no particular reason: Amazon.co.jp: 木村浩一: Wishlist

April 29, 2011

■_

This makes how many Haskell books I've bought now?
It was supposed to arrive in mid-May, so why now? Well, I won't complain about it coming early.

■_ locale

There seems to have been various locale-related trouble at a certain other place as well.

Case insensitivity seems to ignore lower bound of interval

From: 	Eric Bischoff
Subject: 	Case insensitivity seems to ignore lower bound of interval
Date: 	Tue, 26 Apr 2011 17:27:49 +0200

Hi all,

$ echo "ijklmnopqrstuvwxyz" | awk '{ gsub(/[R-Z]/, "X"); print }'
ijklmnopqrXXXXXXXX

please notice that "r" is not matched, i.e. case insensitivity is applied only 
to [S-Z] interval.

$ awk --version
GNU Awk 3.1.7
(...)

$ echo $LANG
fr_FR.UTF-8

The problem does not appear when locale is C.

The problem does not appear when interval is specified as [r-z] (lower case)..

This contradicts http://www.gnu.org/software/gawk/manual/gawk.html#Locales
which documents 
     $ echo something1234abc | gawk '{ sub("[A-Z]*$", ""); print }'
as returning
     something1234
while it returns
     something1234a

Bug reproduced both on Ubuntu Natty beta 2 and on Fedora 15.


I hope that helps,

So: the range is specified as [R-Z], yet somehow lowercase s through z get caught. For anyone who knows the background, this is a "this again" topic.

Re: Case insensitivity seems to ignore lower bound of interval
From: 	Davide Brini
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Tue, 26 Apr 2011 18:49:27 +0100

(quoted portion omitted)

This is not a bug but expected behavior (not that I agree, but that's the way it is).

The executive summary is that many non-C locales have different collation
orders (mostly dictionary order, regardless of case). In those locales,
an expression like [R-Z] may match (at least) "RsStTuUvVwWxXyYzZ", plus
perhaps any other character that sorts between them (note that the above
does not include "r"). Similarly for other range expressions.

To work around, either use LC_ALL=C to get plain ASCII ordering, or use
[[:upper:]] or [[:lower:]] etc. as appropriate, or if using partial ranges,
make it explicit eg [RSTUVWXYZ].
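For contrast: regex engines whose ranges are defined by code point rather than by collation order always behave like the C locale. Python's re module is one such engine, so it reproduces the behaviour the bug reporter expected:

```python
import re

s = "ijklmnopqrstuvwxyz"

# Ranges in Python's re are defined by code point, like the C locale:
# [R-Z] covers exactly U+0052..U+005A, so no lowercase letter matches.
assert re.sub(r"[R-Z]", "X", s) == "ijklmnopqrstuvwxyz"

# [r-z] covers U+0072..U+007A: the last nine letters are replaced.
assert re.sub(r"[r-z]", "X", s) == "ijklmnopqXXXXXXXXX"
```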

Re: Case insensitivity seems to ignore lower bound of interval
From: 	Aharon Robbins
Date: 	Wed, 27 Apr 2011 21:48:41 +0300

Greetings. Re the below.

First, thank you for the bug report.

Second, it's not a bug, but rather the consequence of how locales behave.
This is documented somewhat in the released gawk manual and documented better
in the upcoming one.

I do agree that the behavior is surprising, disconcerting, undesirable,
and so on.  For this reason, the upcoming version of gawk translates
ranges of the form [d-h] into '[defgh]' before compiling the regular
expression.

You can check out the development version from the git repository
on savannah.gnu.org, if you like, to try it.

Thanks,

Arnold
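The range-to-list translation Arnold describes is easy to picture. A toy sketch of it in Python (my model, not gawk's code; the real implementation is in C and also handles escapes, negation, and multibyte characters):

```python
import re

def expand_ranges(bracket_body):
    """Expand simple a-b ranges inside a bracket expression into
    explicit character lists, before the regex is ever compiled.
    Toy version: assumes plain, non-overlapping ASCII ranges."""
    def expand(m):
        lo, hi = m.group(1), m.group(2)
        return "".join(chr(c) for c in range(ord(lo), ord(hi) + 1))
    return re.sub(r"(.)-(.)", expand, bracket_body)

assert expand_ranges("d-h") == "defgh"
assert expand_ranges("R-Z") == "RSTUVWXYZ"
# With the endpoints spelled out, the locale's collation order can no
# longer pull unexpected characters into the set.
```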

Re: Case insensitivity seems to ignore lower bound of interval
From: 	John Cowan
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Wed, 27 Apr 2011 16:40:21 -0400

Aharon Robbins scripsit:

> I do agree that the behavior is surprising, disconcerting, undesirable,
> and so on.  For this reason, the upcoming version of gawk translates
> ranges of the form [d-h] into '[defgh]' before compiling the regular
> expression.

Alas, that means that in a locale where e-acute sorts after e, the regex
[d-h] will not match it.  You can't have everything at once, but it
would be good to have a switch to turn this behavior on and off.

Re: Case insensitivity seems to ignore lower bound of interval

From: 	Eric Blake
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Wed, 27 Apr 2011 14:55:49 -0600

(quoted portion omitted)


POSIX already states that the regex [d-h] is unspecified in all but the
C locale, because there is no one-size-fits-all interpretation of what it
_should_ represent.  If you want e-acute in the set, it is always better
to ask for it explicitly.  Meanwhile, I welcome this change, as it is
easier to document that the expansion always mirrors the C locale rather
than the expansion depends on the collation order of the current locale.

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Re: Case insensitivity seems to ignore lower bound of interval

From: 	Eric Bischoff
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Thu, 28 Apr 2011 06:06:37 +0200

(quoted portion omitted)

Hi Aharon,

First, thank you for your kind answer.

I am really sorry, but I think you did not read my bug report accurately. It 
looks like you gave the answer to another question than the one I asked.

What you are answering here is a Frequently Asked Question about an undesired 
behaviour of the locales, but this is not what I am speaking about.

What I am speaking about is not documented. On the contrary, I showed you 
where the behaviour contradicts the documentation.

Let me restate my bug report, perhaps it was not clear enough.


1) Contradiction with the documentation :

http://www.gnu.org/software/gawk/manual/gawk.html#Locales says that

     $ echo something1234abc | gawk '{ sub("[A-Z]*$", ""); print }'

returns

      something1234

while it returns

     something1234a

Please notice the *a* at the end. That's what this bug report is about. This 
"a" has no reason to appear.


2) Another way to reproduce :

$ echo "ijklmnopqrstuvwxyz" | awk '{ gsub(/[R-Z]/, "X"); print }'

ijklmnopqrXXXXXXXX

Please notice the "r".

Why is the "r" not changed into "X", while "s", "t", "u", "v", "w", "x", 
"y", "z" are?


I think this is a real bug.

The fact that the locales are taken into account in the intervals is indeed 
questionable, but is another topic.

Sorry to be insisting. I still hope that helps.

-- 
Éric Bischoff - Bureau Cornavin
Technical writing and translations
http://www.bureau-cornavin.com

Well, being told only "it's a locale issue" is hardly satisfying, heh.

Re: Case insensitivity seems to ignore lower bound of interval
From: 	Paul Jarc
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Thu, 28 Apr 2011 00:17:07 -0400


That example behaves as described in the documentation for some
locales, but not in others (such as yours, apparently).  That's the
whole point of that section of the documentation--different locales
have different behavior for character ranges.

Note that case-insensitivity is not an intended feature at all.  It's
just an accidental result of the character collation of some locales.
Some locales arrange characters in the order aAbBcC...zZ, so a range
like [A-Z] includes all upper- and lowercase letters except lowercase
a.  Other locales may arrange them as AaBbCc...Zz, so [A-Z] excludes
lowercase z instead.  But the usual expectation, and the actual
behavior in the C locale, is that [A-Z] includes only uppercase
letters, and [a-z] includes only lowercase letters.


paul
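Paul's explanation can be made concrete with a toy model: define the dictionary-style collation aAbBcC...zZ and treat a range as "everything that sorts between the endpoints, inclusive". A sketch in Python (my construction, not from the thread):

```python
import string

# Hypothetical dictionary collation: aAbBcC...zZ
order = "".join(lo + up for lo, up in zip(string.ascii_lowercase,
                                          string.ascii_uppercase))

def in_range(ch, lo, hi):
    """A char is in [lo-hi] if it sorts between the endpoints, inclusive."""
    return order.index(lo) <= order.index(ch) <= order.index(hi)

# Under this collation, [A-Z] matches every letter except lowercase "a",
# exactly as described above.
matched = "".join(c for c in order if in_range(c, "A", "Z"))
assert matched == order[1:]
assert not in_range("a", "A", "Z")
```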

Re: Case insensitivity seems to ignore lower bound of interval
From: 	Eric Bischoff
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Thu, 28 Apr 2011 07:32:18 +0200

(quoted portion omitted)

Oh, ok, now I understand, then the problem is that
         [R-Z]
evaluates either as
      RrSsTtUuVvWwXxYyZz
or as
      rRsStTuUvVwWxXyYzZ
 

(quoted portion omitted)

OK. Thanks for explanation, now I get it.

But there is still something deeply wrong here.

My understanding of the word "collation" is :
   "A", "a", and "à" are "equivalent" with respect to alphabetical order
It is not :
   "A", "a" and "à" are "next to each other" in alphabetical order.

Current situation is :
         a < à < A < b < B < c < C ...
while it should be :
       a = à = A   <   b = B   <   c = C ...


My two cents.

-- 
Éric Bischoff - Bureau Cornavin
Technical writing and translations
http://www.bureau-cornavin.com

Re: Case insensitivity seems to ignore lower bound of interval
From: 	John Cowan
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Thu, 28 Apr 2011 01:59:02 -0400

Eric Bischoff scripsit:

> My understanding of the word "collation" is :
>    "A", "a",  and "à" are "equivalent"with respect to alphabetical order
> It is not :
>    "A", "a" and "à" are "next to each other" in alphabetical order.

Proper collation algorithms do work this way: differences in letters
are most important, but if they are equal, accents are looked at, and
if *they* are equal, case is looked at.  But that's not the way
strcmp() works in C, unfortunately; the best approximation available
to case-blindness is to sort b either before or after B.
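The proper multi-level collation John Cowan describes (letters first, then accents, then case) can be sketched as a three-level sort key. A toy approximation in Python of what algorithms like the Unicode Collation Algorithm do properly; the key construction is mine:

```python
import unicodedata

def collation_key(s):
    """Three-level key: base letters first, then accents, then the raw
    string as a last-resort tie-break (which separates case)."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed
                   if not unicodedata.combining(c)).lower()
    accents = "".join(c for c in decomposed if unicodedata.combining(c))
    return (base, accents, s)

words = ["b", "B", "à", "A", "a"]
# Letters dominate; accents break ties; case is looked at last
# (here via raw code point, so uppercase sorts before lowercase).
assert sorted(words, key=collation_key) == ["A", "a", "à", "B", "b"]
```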

Re: Case insensitivity seems to ignore lower bound of interval

From: 	Eric Bischoff
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Thu, 28 Apr 2011 09:38:55 +0200

On Thursday 28 April 2011 at 07:59:02, John Cowan wrote:
(snip)
> Proper collation algorithms do work this way: differences in letters
> are most important, but if they are equal, accents are looked at, and
> if *they* are equal, case is looked at.

Yes, that's more or less what I described as the correct algorithm.

But you are right, there are three stages of comparison, not two.

> But that's not the way
> strcmp() works in C, unfortunately; the best approximation available
> to case-blindness is to sort b either before or after B.

Then it's using strcmp() that is plain wrong :-(.


-- 
Éric Bischoff - Bureau Cornavin
Technical writing and translations
http://www.bureau-cornavin.com

Re: Case insensitivity seems to ignore lower bound of interval
From: 	arnold
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Thu, 28 Apr 2011 08:28:01 GMT

Hi.

> > But that's not the way
> > strcmp() works in C, unfortunately; the best approximation available
> > to case-blindness is to sort b either before or after B.
>
> Then it's using strcmp() that is plain wrong :-(.

Gawk does not use strcmp() for regex matching. (You may not have been
saying that it did, I admit.)

The issue is indeed as described in the previous mails, and the
development version of the gawk doc explains these issues considerably
better.

I recommend checking out a copy from the git repo on savannah.gnu.org
and reviewing the doc; I will welcome feedback on it!

Thanks,

Arnold

Re: Case insensitivity seems to ignore lower bound of interval
From: 	Eric Bischoff
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Thu, 28 Apr 2011 12:21:37 +0200

On Thursday 28 April 2011 at 12:00:33, you wrote:
> You seem to think this is gawk-specific, but in fact any locale-aware tool
> that uses regular expressions behaves the same (try eg with sed or grep).

Not here:

$ echo 'ijklmnopqrstuvwxyz'| sed 's/[r-z]/X/g'
ijklmnopqXXXXXXXXX
$ echo 'ijklmnopqrstuvwxyz'| sed 's/[R-Z]/X/g'
ijklmnopqrstuvwxyz

$ echo 'ijklmnopqrstuvwxyz'| awk '{gsub("[r-z]", "X"); print}'
ijklmnopqXXXXXXXXX
$ echo 'ijklmnopqrstuvwxyz'| awk '{gsub("[R-Z]", "X"); print}'
ijklmnopqrXXXXXXXX

$ echo 'ijklmnopqr'| grep "[r-z]"
ijklmnopqr
$ echo 'ijklmnopqr'| grep "[R-Z]"

$ awk --version | head -n 1
GNU Awk 3.1.7
$ sed --version | head -n 1
GNU sed version 4.2.1
$ grep --version | head -n 1
GNU grep 2.6.3

Those results were predictable for sed and grep, not for awk. Furthermore, 
they are inconsistent between awk and sed.

-- 
Éric Bischoff - Bureau Cornavin
Technical writing and translations
http://www.bureau-cornavin.com
Re: Case insensitivity seems to ignore lower bound of interval
From: 	Davide Brini
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Thu, 28 Apr 2011 11:43:52 +0100

On Thu, 28 Apr 2011 12:21:37 +0200
Eric Bischoff <address@hidden> wrote:

> On Thursday 28 April 2011 at 12:00:33, you wrote:
> > You seem to think this is gawk-specific, but in fact any locale-aware
> > tool that uses regular expressions behaves the same (try eg with sed or
> > grep).
> 
> Not here:
> 
> $ echo 'ijklmnopqrstuvwxyz'| sed 's/[r-z]/X/g'
> ijklmnopqXXXXXXXXX
> $ echo 'ijklmnopqrstuvwxyz'| sed 's/[R-Z]/X/g'
> ijklmnopqrstuvwxyz

This is strange, since with GNU sed 4.2.1 I get

$ echo 'ijklmnopqrstuvwxyz'| sed 's/[R-Z]/X/g'
ijklmnopqrXXXXXXXX

And this shows the converse:

$ echo 'IJKLMNOPQRSTUVWXYZ' | sed 's/[r-z]/_/g'
IJKLMNOPQ________Z

("[r-z]" does not include "Z", because of ... vVwWxXyYzZ)

I get the above results with all the UTF-8 locales I can try on this box
(admittedly, not too many).

> $ echo 'ijklmnopqrstuvwxyz'| awk '{gsub("[r-z]", "X"); print}'
> ijklmnopqXXXXXXXXX
> $ echo 'ijklmnopqrstuvwxyz'| awk '{gsub("[R-Z]", "X"); print}'
> ijklmnopqrXXXXXXXX

Same here.
 
> $ echo 'ijklmnopqr'| grep "[r-z]"
> ijklmnopqr
> $ echo 'ijklmnopqr'| grep "[R-Z]"

This is expected, since "[R-Z]" does NOT match "r" in the locales under
discussion. But nonetheless, you are right that grep seems to behave
differently, although I was almost sure I had seen it show the same
behavior at some point; I may be misremembering.

Furthermore, the above is in direct contradiction to grep's documentation,
which states

"Within a bracket expression, a "range expression" consists of two
characters separated by a hyphen.  It matches any single character that
sorts between the two characters, inclusive, using the locale's
collating sequence and character set.  For example, in the default C
locale, `[a-d]' is equivalent to `[abcd]'.  Many locales sort
characters in dictionary order, and in these locales `[a-d]' is
typically not equivalent to `[abcd]'; it might be equivalent to
`[aBbCcDd]', for example.  To obtain the traditional interpretation of
bracket expressions, you can use the `C' locale by setting the `LC_ALL'
environment variable to the value `C'".

So I would definitely expect grep to follow awk's and sed's behavior.


For further thought, "sort" is another command whose collating order is
affected by the locale ("consistently", so to speak, with awk and sed).

$ printf '%s\n' A b Z | sort
A
b
Z

$ printf '%s\n' A b Z | LC_ALL=C sort
A
Z
b

-- 
D.

Re: Case insensitivity seems to ignore lower bound of interval
From: 	Eric Bischoff
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Thu, 28 Apr 2011 15:12:14 +0200

On Thursday 28 April 2011 at 14:04:28, Davide Brini wrote:
> But you got me curious. My original test system used mostly vanilla tools
> built from source, so I thought I would try on stock distros instead.
> And guess what, both on a standard RHEL 6 and Debian squeeze, I see your
> results (ie gawk behaves differently).

OK, that's interesting information. That means that either those distributions 
patch the tools, or there's some compilation option that differs.

I have tested 6 stock distros too, all with the same behaviour for gawk (I have 
not tested sed or grep yet).

> Maybe, but my point was that it was no gawk-only bug. But now, having been
> able to reproduce your results, it may well be that gawk does something
> different. Arnold is the authoritative source here.

Yes, and let me take the occasion to thank him for all the great work on gawk 
and the nice documentation.


(snip)
> The only way you would get
> output is if [R-Z] was implemented as "RrSs..." etc., which seems not to
> be the case; rather, it seems to be the other way round ("rRsS..." etc.).

Correct.

Ah, you were right, grep behaves as awk on this, I did not test thoroughly 
enough:

$ echo 'ijklmnopqs' | grep '[R-Z]'
ijklmnopqs
$ echo 'ijklmnopqr' | grep '[R-Z]'
$

$ echo 'ijklmnopqs' | awk '/[R-Z]/ { print }'
ijklmnopqs
$ echo 'ijklmnopqr' | awk '/[R-Z]/ { print }'
$

That does not make it more logical :-), and sed is definitely different:
$ echo 'ijklmnopqs' | sed '/[R-Z]/p'
ijklmnopqs
$ echo 'ijklmnopqr' | sed '/[R-Z]/p'
ijklmnopqr
$ echo 'ijklmnopqR' | sed '/[R-Z]/p'
ijklmnopqR
ijklmnopqR
$ echo 'ijklmnopqS' | sed '/[R-Z]/p'
ijklmnopqS
ijklmnopqS


-- 
Éric Bischoff - Bureau Cornavin
Technical writing and translations
http://www.bureau-cornavin.com

Re: Case insensitivity seems to ignore lower bound of interval
From: 	Eric Bischoff
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Fri, 29 Apr 2011 09:03:52 +0200

(引用部略)

OK, at that time I did not understand how that would solve the problem (nor 
even why that was relevant).

That's really good news for all awk developers. Thanks for pointing me to 
that comment again, Paul.


-- 
Éric Bischoff - Bureau Cornavin
Technical writing and translations
http://www.bureau-cornavin.com

Re: Case insensitivity seems to ignore lower bound of interval
From: 	Aharon Robbins
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Fri, 29 Apr 2011 10:55:23 +0300

Eric,

Hi.

> At this point, my personal opinion is that all intervals are simply
> unusable with gawk, as they give results that are both unpredictable and
> counter-intuitive with any locale other than "C".

Indeed - this is why the (development) documentation clearly explains
not to use them, and why the development code converts [R-Z] into
[RSTUVWXYZ].  I have been fighting this issue for years now.

Davide Brini states:

> You seem to think this is gawk-specific, but in fact any locale-aware tool
> that uses regular expressions behaves the same (try eg with sed or grep).

And this too is correct.  POSIX locales (in my not-so-humble opinion) are
a total and utter botch.

(I'll point out also that all of this happens down in the library routines
that gawk uses, and which are (complicated, messy) black boxes as far as
I'm concerned.)

> Asking non-English users to write explicit [abcdefghijklmnopqrstuvwxyz] 
> choices is not a solution either.

[[:lower:]], [[:upper:]] and so on exist to mitigate this issue. They are
not perfect solutions.

> Collation [...]

Collation has to do with sorting order, and less so with regular expression
matching.  Gawk doesn't support [[=e=]] which is supposed to match all
versions of the letter 'e'.

> My point is that [R-Z] should either be defined to [rRsStTuUvVwWxXyYzZ]
> or to [RSTUVWXYZ], but not to a surprising thing like [RsStTuUvVwWxXyYzZ]
> (no "r").  The current situation where [R-Z] catches "t" but does not
> catch "r" is really weird, even if it's compatible with the freedom
> offered by the POSIX standard.

I agree, which is why I've clarified the doc and changed the code, but again,
this is not a gawk-specific issue but a general locale issue.

> One technical possibility would be to simply use Unicode code positions.

Unfortunately, no.  Gawk is used in many parts of the world where Unicode
is not the standard character set (Japan, China, etc.) and restricting
gawk to just Unicode would not be a good idea.  You can today use
octal escapes inside [...] if you want. (It's even documented! :-).
But that's only good for single bytes.

Maybe in another 10 years it'll be safe to move exclusively to Unicode.

To sum up, it's a thorny issue, of which I'm well aware, but there is
no simple easy solution.

If you still disagree, then I'm sorry, there's nothing else I can do
to help.

Thanks,

Arnold

Re: Case insensitivity seems to ignore lower bound of interval
From: 	Aharon Robbins
Subject: 	Re: Case insensitivity seems to ignore lower bound of interval
Date: 	Fri, 29 Apr 2011 10:57:12 +0300

Hi.

(引用部略)

I have been fighting this issue for at least 10 years (I think). To turn
it off, just use --posix.  Gawk already has too many command-line options;
I don't want to add another one.

Thanks,

Arnold

まあそうだろうねえ>10年以上
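スレッドに出てきた挙動は手元でもまとめて確認できる。以下は GNU grep がある環境を前提にした確認用のスケッチ。C locale に切り替えれば [R-Z] は文字コード順どおり R〜Z そのものになるし、locale に依存させたくなければ範囲の代わりに Arnold の言う文字クラスを使う手もある。

```shell
# C locale では [R-Z] は R,S,...,Z だけにマッチする
echo 'ijklmnopqs' | LC_ALL=C grep '[R-Z]'        # 小文字の s はマッチしない
echo 'ijklmnopqS' | LC_ALL=C grep '[R-Z]'        # 大文字の S はマッチする

# locale 非依存にしたいなら [[:upper:]] などの文字クラスを使う
echo 'ijklmnopqS' | LC_ALL=C grep '[[:upper:]]'
```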

■_ 50 years ago

Abstract Heresies: 50 years ago — April 28, 1961

Thursday, April 28, 2011

50 years ago — April 28, 1961

    Compatability of LISP 1 and LISP 1.5.

       1. EVALQUOTE has two arguments while APPLY has three. To change a LISP 1 program for
          LISP 1.5 simply eliminate the p-list if it is null. If the p-list is needed, then
          the function apply is available in LISP 1.5,

       2. Arithmetic in LISP 1.5 is new, improved, and generally incompatible with LISP 1.

       3. LISP 1.5 has many extra features. There [sic, these] are being written up for the
          new LISP 1 Programmer's Manual. Until it comes, check in Room 26-265.

そういや、1.5 はわりと取り上げられるのをみるけど 1 はそうでもないような?

■_

Stand up for your communities and projects - Perlbuzz Slipping away from the Perl community - Perlbuzz

■_

2011年04月28日

■_

自転車放置禁止区域に立ってたら、 「自転車を停めたいのでどいてもらえませんか」と言われた。

闇金ウシジマくんの単行本、限定版を買ってみた
闇金ウシジマくん /21 ウシジマくん特製・地獄の取り立て帳付限定版 (ビッグコミックス)

■_ ruby-core

reject で終わったけど IO#split の機能はあったらうれしいかな。 ただまあ効率よく実装するのはかなり面倒だろうなあ。

Feature #4615: Add IO#split and iterative form of String#split
http://redmine.ruby-lang.org/issues/4615

Subject: [ruby-core:35896] [Ruby 1.9 - Feature #4615][Open] Add IO#split and iterative fo...
Subject: [ruby-core:35898] [Ruby 1.9 - Feature #4615][Assigned] Add IO#split and iterativ...
Subject: [ruby-core:35901] [Ruby 1.9 - Feature #4615] Add IO#split and iterative form of ...
Subject: [ruby-core:35918] [Ruby 1.9 - Feature #4615] Add IO#split and iterative form of ...
Subject: [ruby-core:35919] [Ruby 1.9 - Feature #4615] Add IO#split and iterative form of ...

■_ あとでよむ

Mired in code: Why Haskell?
Why Haskell?

This should not be considered an expert overview of the language. It's a deep language, 
and I'm still learning it. This is a discussion of what I've seen so far to explain 
why I chose it.

■_

■_

中途半端なものしかねーーっ

2011年04月27日

■_

コンビニで、二つレジがあるうちの一つだけが使われていて順番待ちをしているときに 婆様が使っていない方のレジに向かって行くのと もう一つのレジを開けに行くタイミングが微妙にはまって 順番待ちを無視されたときの複雑な気分と来たら。

■_ 誤訳?

順序が逆になってんじゃなかろか

perldelta - perl v5.14.0 での変更点 【perldoc.jp】

正規表現¶

(?^...) construct signifies default modifiers¶

((?^...) 構造はデフォルト修飾子を示します)

An ASCII caret "^" immediately following a "(?" in a regular expression
now means that the subexpression does not inherit surrounding modifiers such as /i, but
reverts to the Perl defaults. Any modifiers following the caret override the defaults.

正規表現中、ASCII キャレット "^" の直後に "(?" があると、 (/i のよ
うな)それを囲む修飾子を継承せず、 Perl のデフォルトに戻ることを意味するようになりまし
た。 キャレットに引き続く任意の修飾子はデフォルトを上書きします。

Stringification of regular expressions now uses this notation. For example, qr/hlagh/i would
previously be stringified as (?i-xsm:hlagh), but now it's stringified as (?^i:hlagh).

正規表現の文字列化はこの記法を使うようになりました。 例えば、以前は qr/hlagh/i は 
(?i-xsm:hlagh) に 文字列化されていましたが、(?^i:hlagh) に文字列化されるようになります。

(?^i:hlagh) のように記述するんだから、「ASCII キャレット "^" の直後に "(?" があると」ではなく、「"(?" の直後にキャレット "^" があると」でないといけないような。 ^ が “following”であって“followed”じゃないし。 これってどこに報告すればいいんだろ。 ML?

■_ JVMの上の

本当にいろいろなものが乗っかりますねえ。 トップページで謳われている文言。

Home - Redline Smalltalk - Smalltalk for the Java Virtual Machine.

... because nothing is as productive as Smalltalk, and the App has to run on the Java Virtual Machine.

Smalltalk ほど生産的 (productive) なものはないのでこのアプリケーションも Java Virtual Machine 上で動かさねばならない。かな?

twitter アカウントもあるのねw Redline Smalltalk (redline_st) は Twitter を利用しています

Getting Smalltalk on the JVM (redline.st)
Getting Smalltalk on the JVM : programming
This is really neat, but there isn't a lot of information on the site (no FAQ).

   * How is the user experience? Do you get a full Smalltalk environment image thingy?
   * How feature complete is it compared to other Smalltalk implementations?
   * How well does it integrate with Java?
   * Where does it come from? Is it a brand new implementation, or maybe a freed,
     previously proprietary compiler?

■_ The five most important algorithms

5つ。といわれたら自分は何をあげるかなあ。

The five most important algorithms?

The five most important algorithms?

Bernhard Koutschan posted a compilation of the most important algorithms. The goal is to
determine the 5 most important algorithms. Out of his list, I would select the following
five algorithms:

    * Binary search is the first non-trivial algorithm I remember learning.
      二分探索

    * The Fast Fourier transform (FFT) is an amazing algorithm. Combined with the Convolution
      theorem, it lets you do magic.
      高速フーリエ変換

    * While hashing is not an algorithm, it is one of the most powerful and useful idea in
      Computer Science. It takes minutes to explain it, but years to master.
      (「アルゴリズム」ではないが) ハッシュ技法

    * Merge sort is the most elegant sorting algorithm. You can explain it in three sentences
      to anyone.
      マージソート

    * While not an algorithm per se, the Singular Value Decomposition (SVD) is the most important
      Linear Algebra concept I don't remember learning as an undergraduate. (And yes, I went
      to a good school. And yes, I was an A student.) It can help you invert singular matrices
      and do other similar magic.

最後のはなんだっけ Singular value decomposition - Wikipedia, the free encyclopedia とリンク先をみて理解。
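ちなみに一つ目の二分探索なら awk でも数行で書ける。以下はソート済みの数値列(こちらで適当に用意したもの)から探す素朴なスケッチ。

```shell
# ソート済み配列の中から値を探す。見つかれば添字を、なければ not found を出す
bsearch() {
  awk -v target="$1" 'BEGIN {
    target += 0                       # 数値として比較する
    n = split("2 5 8 12 16 23 38 56 72 91", a, " ")
    lo = 1; hi = n
    while (lo <= hi) {
      mid = int((lo + hi) / 2)
      if      (a[mid] == target) { print "found at " mid; exit }
      else if (a[mid] <  target)   lo = mid + 1
      else                         hi = mid - 1
    }
    print "not found"
  }'
}

bsearch 23   # => found at 6
bsearch 4    # => not found
```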

そのほかのご意見

   1.

      My choices are:

      1. Quicksort (though I find it ugly and difficult to explain to someone :( )

      2. Dijkstra's algorithm (Shortest path)

      3. Euclidean algorithm for GCD

      4. Strassen's algorithm for Matrix multiplication

      Comment by Ragib Hasan — 5/7/2010 @ 21:56

   4.

      Good list but I think the RSA Encryption Algo and the Diffie Hellman Key exchange do need a mention :)

      I would add two more to my list, the data compression and the viterbi algo :)

      Comment by Richie — 5/7/2010 @ 22:15

   8.

      Great list.
      I would add Quicksort, Public-Key Cryptography and LZ Compression algorithms to the list.

      Comment by Guru Kini — 6/7/2010 @ 0:47

  17.

      1. Matrix Decompostion Algo e.g. LU-Decomposition
      2. Krylov subspace iteration methods.
      3. Monte Carlo method.
      4. Fast Multipole Methods
      5. Quicksort

      I have picked out of this list:
      http://amath.colorado.edu/resources/archive/topten.pdf

      Everything on the list is an algo. when you try to use it…

      Comment by Felix Gremmer — 8/7/2010 @ 9:26
  19.

      Edmonds matching algorithm

      Comment by Craig — 23/4/2011 @ 23:37



■_

■_

あとでよむ

Why Open Source Company Culture is Important | Intridea Blog
Why Open Source Company Culture is Important

Companies have many ways to benefit from an open source culture. While many arguments 
can be made about the philosophical implications of choosing to contribute to the open 
source community, at the end of the day philosophy isn't going to persuade any CEO to 
adopt open source. The real reason that open source culture is important to a business 
is because it's a business decision that can bring many real-world advantages over the 
proprietary-focused alternatives.
Burrows-Wheeler transform - Wikipedia, the free encyclopedia

The Burrows-Wheeler transform (BWT, also called block-sorting compression), is an 
algorithm used in data compression techniques such as bzip2. It was invented by 
Michael Burrows and David Wheeler in 1994 while working at DEC Systems Research Center 
in Palo Alto, California.[1] It is based on a previously unpublished transformation 
discovered by Wheeler in 1983.

2011年04月26日

■_

○| ̄|_

だむえー
オリジンはそのなんというか。

■_ 完結編

金曜日に投稿されてたのねw

バグとテストと残業中 

68 仕様書無しさん [sage] 2011/04/22(金) 12:54:32.03 ID: Be:
    IT土方まさか☆デスマ

    第1話 仕様書の中にあった、ような…
    第2話 それはとってもバグ臭いなって
    第3話 もう徹夜はしたくない
    第4話 妖精さんも、小人さんも、いないんだよ
    第5話 デバッグなんて、時間がない
    第6話 こんなの仕様がおかしいよ
    第7話 営業の狂気と向き合えますか
    第8話 顧客って、ほんとバカ
    第9話 そんな、わたしが帰れない
    第10話 もう誰にもわからない
    第11話 最後に散った道しるべ
    第12話 わたしの、最低の上司 

■_ The myth of the Lisp genius

Lisp の天才の神話。といったところ?

The myth of the Lisp genius — The Endeavour

The myth of the Lisp genius

by John on April 26, 2011

I'm fascinated by the myth of the Lisp genius, the eccentric programmer who 
accomplishes super-human feats writing Lisp. I'm not saying that such geniuses don't 
exist; they do. Here I'm using “myth” in the sense of a story with archetypical 
characters that fuels the imagination.  I'm thinking myth in the sense of Joseph 
Campbell, not Mythbusters.

Richard Stallman is a good example of the Lisp genius. He's a very strange man, 
amazingly talented, and a sort of tragic hero. Plus he has the hair and beard to fit 
the wizard archetype.

Let's assume that Lisp geniuses are rare enough to inspire awe but not so rare that we 
can't talk about them collectively. Maybe in the one-in-a-million range. What lessons 
can we draw from Lisp geniuses?

以下略

■_ Perl 6 (Rakudo)

Haskell と張り合う? Perl 6

More Pi « Just Rakudo It

1	convert :: (Integer,Integer) -> [Integer] -> [Integer]
2	convert (m,n) xs = stream next safe prod cons init xs
3	  where
4	    init = (0%1, 1%1)
5	    next (u,v) = floor (u*v*n')
6	    safe (u,v) y = (y == floor ((u+1)*v*n'))
7	    prod (u,v) y = (u - fromInteger y/(v*n'), v*n')
8	    cons (u,v) x = (fromInteger x + u*m', v/m')
9	    (m',n') = (fromInteger m, fromInteger n)

The difference comes from Haskell's extremely elegant on-the-fly pair notation. When I 
translate that to p6, I get

1	sub convert($m, $n, @x) {
2	    stream(-> $u { floor($u.key * $u.value * $n); },
3	           -> $u, $y { $y == floor(($u.key + 1) * $u.value * $n); },
4	           -> $u, $y { $u.key - $y / ($u.value * $n) => $u.value * $n; },
5	           -> $u, $x { $x + $u.key * $m => $u.value / $m; },
6	           0/1 => 1/1,
7	           @x);
8	}

えーと -> って Ruby と一緒だっけか。

■_

もう一個 Perl 6ネタ

Separate compilation, package refactors and gradual typing: oh boy, what a mix! | 6guts

So, I've spent much of the time since I last wrote here working on these issues. Since 
this involved re-working packages, I've also dealt with a lot of the issues there. We 
can now have support for lexically scoped packages, for example.

my module Secret {
    our class Beer {
        method drink() { say("mmm...pivo!") }
    }
}
Secret::Beer.drink();

Here, the class Beer is decidedly installed in the package Secret, but the Secret 
package it is installed into is lexical. This means that while we can see that package 
inside the current lexical scope (say, the main lexical scope of our module), that 
package will not be visible outside of the module at all – unless you explicitly 
export it.

ここで、class Beer は明らかに Secret パッケージにインストールされます。しかし
この Secret パッケージは lexical にインストールされるのです。これは、
カレントの lexical スコープ (つまりわたしたちのモジュールの main lexical スコープ)
の内側でこのパッケージを見ることができるけれども、
はっきりとした形で export をしていない限り
モジュールの外側では見られなくなってしまうということです。


■_ ハッシュ

先に調べてた人がいたり。 文字列のハッシュ値の計算 - imHo

ハッシュの概念はわかる、けど実際のところどんなもんかよくわかってない…。特に文字列のハッ
シュ値をどうやって計算してるのかと、ハッシュ値が衝突したときにどうするのかと、バッファの
サイズをどうするのか。なのでいくつかの処理系を調べてみた。


■_ JVM上に

Research or implementation for GHC to Java backend compiler : haskell

I saw two research documents (I think one from Dons?) for a Java back-end but not much 
in the way of implementation. I have seen bits and pieces of the LLVM backend. Does 
anyone have any information on what it would take to generate java bytecode? Is this 
an impossible task?

Edit-1: This looks promising: http://wiki.brianweb.net/LambdaVM/LambdaVM

なんかいろいろつっこまれてます。

■_

2011年04月25日

■_

・アフタヌーン
「天地明察」が新連載。

今日が初月給というのをついった上で結構見かけました。

■_ Mozilla JavaScript 2011

Mozilla で使っているJavaScript処理系のGCがどうとか

David Mandelin's blog » Mozilla JavaScript 2011

Mozilla JavaScript 2011

So, JägerMonkey is done (as of last fall, really), and Firefox 4 is out (yay! and 
whew!), so that means we can get back to the fun stuff: new coding projects. At the 
Platform Work Week, we firmed up plans for JavaScript for the next year or so. The 
main themes are debugging and performance:

(略)

The reason for the pauses is that SpiderMonkey uses an old-school stop-the-world 
mark-and-sweep collector. Briefly, it works like this:

   1. Based on some heuristics, the JS engine decides it is time to collect some garbage.
   2. The GC finds all the GC roots, which are the immediately accessible objects: JS 
      local variables, the JS global object, JS objects stored on the C++ stack, and a few 
      other things.
   3. The GC marks all objects that can be reached from the roots, by following all the
      pointers stored in the roots, then all the pointers stored in the objects reached 
      from the roots, and so on.
   4. The GC sweeps over all allocated objects. If an object is not marked, there is no
      way for the program to access it, so it can never be used again, and the GC frees it.

The main problem with stop-the-world mark-and-sweep GC is that if there are a lot of 
live objects, it can take a long time to mark all those objects. “A long time” 
typically means 100 milliseconds, which is not that long, but is disruptive to 
animation and is noticeably jerky.

Our first step in fixing GC pauses will be incremental GC. Incremental GC means that 
instead of stopping the program to mark everything, the GC periodically pauses the 
program to do a little bit of marking, say 3 milliseconds worth. There is an overhead 
to starting and stopping a mark phase, so the shorter the pause time, the slower the 
actual program runs. But we think we can make the pause time unnoticeable without 
having too much impact on throughput.

(略)

Posted: April 22nd, 2011 under Uncategorized.

インクリメンタルGC か。

■_

本文はおいといて。

エミュレーション、そしてコンピューティングの歴史

学ぶために

    * Technical University of Berlin では、Konrad Zuse の「Z」マシンについて詳しく説明して
      います。Zuse の Z3 と Z4、そして最初の言語、プランカルキュールの詳細を調べてください。

      (略)

    * 2010年の時点で、計算を自動化した初のデジタル・コンピューターの発明者については論争が
      続いていました。大抵は ENIAC を開発したペンシルベニア大学ということになっています。
      けれども、ほとんど話題にされていませんが、それ以前の業績もあります。1930年代後半には、
      John Atanasoff と Clifford Berry が、現在 ABC と呼ばれる最初のコンピューターの作成にす
      でに着手しました。この初期の歴史に目を向けた、新しい本が最近出版されています。この本の
      著者、Jane Smileyについては、Wired の記事「Pulitzer Prize-Winning Novelist Tells the 
      Tale of the World's First Computer」で取り上げられています。Smiley はコンピューターの
      歴史だけでなく、この発明の背後にある政治的、心理的背景、そして企業のドラマについても明
      らかにしています。

      (略)


なんか面白そうな話が。

■_ ハッシュ関数

ちと別件で調べごとをしていたときに気がついた。 UNIX v7 の awk。

V7/usr/src/cmd/awk/tran.c

hash(s)	/* form hash value for string s */
register char *s;
{
	register int hashval;

	for (hashval = 0; *s != '\0'; )
		hashval += *s++;
	return(hashval % MAXSYM);
}

なんだこの投げやりなハッシュ関数(^^;
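どのくらい投げやりかは簡単に試せる。文字コードの総和しか見ていないので、文字を並べ替えただけの文字列(アナグラム)は必ず衝突する(MAXSYM の実際の値は未確認なので、ここでは仮に 50 としている)。

```shell
# V7 awk の hash() と同じ「バイト値の総和 mod MAXSYM」を再現してみる
h() { printf '%s' "$1" | od -An -tu1 | awk '{ for (i = 1; i <= NF; i++) s += $i } END { print s % 50 }'; }

h abc   # => 44
h cba   # => 44 (総和なので順序は無関係。必ず衝突する)
```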

■_

(プログラマーでなくても)誰もが知っておくべき五つのプログラミング言語とかいうので Larry Wallが話している(これって何ヶ月か前のだけど何で今頃話題に上ったんだろ)。

動画なんだけど、ありがたいことに文字に起こされているので内容把握は楽。

5 Programming Languages Everyone Should Know | Larry Wall | Big Think

Question: What are the five programming languages everyone, even non-programmers, 
should know about and why?

Larry Wall:  Oh, boy, that's a really tough question.  It's kind of like asking what 
are the five countries you should know about if you're not interested in geology, or 
geography, or politics, and the answer varies depending on what your actual interests 
are, or what are the five companies you should know.  And the answer changes over time, 
too.  Back when I was getting started, lo these many decades ago, the answers would've 
been Fortran, Cobalt, Basic, Lisp, and maybe APL, and those were very formative 
languages back then and people learned a lot from those, but these days, it might be 
more important for you to know JavaScript, even if the only reason you know that is 
that you know whether or not to click the "enable JavaScript" button in your 
browser.  But JavaScript is a nice, lightweight, object-oriented language and that's 
why it can fit in a browser and do these things such as run little programs that help 
you input your data and then send it off to a web server somewhere.

以下略

Cobaltってのは聴いてると「COBOL」の間違い(LarryはCOBOLといってるのだけど それが正しく伝わってない)のような気がする。

■_

■_ ようやく

日本語翻訳版の出版予定が! Coders at Work プログラミングの技をめぐる探求|Ohmsha

オーム社の本はAmazonさんに現れるのは遅めの印象があるのだけど 今回はすでにある。 Amazon.co.jp: Coders at Work プログラミングの技をめぐる探求: Peter Seibel, 青木 靖: 本 発売日: 2011/5/25 わくわくが止まりません(原書はどうした)

2011年04月24日

■_

日経ソフトウエア 2011年 06月号 [雑誌]
Yさんの連載(今月が最終回でしたが)。 プロセス間通信を題材にしていてそのサンプルで使っているのが 「クリップボード」、「名前つきパイプ (named pipe)」、「メールスロット」 ってのはどうなんだろうか。特に三つ目。

数学ガール ゲーデルの不完全性定理① (MFコミックス アライブシリーズ)
もう一冊別の漫画が同時に出ていますが、とりあえずこっちを買った。

テルマエ・ロマエ III (ビームコミックス)
勢いがなくなってきているという見方もまああるんだろうけど (二巻のときに向かいさんがきつめのコメントをしていたような覚えがあるんだけど見当たらないな。 別の人だったかしらん)、自分はこれはこれでありじゃないかなあという感じが。 この先どういう「発見」を主人公がしていくのかが気になるし、 あと、次期皇帝候補のお話は、ハドリアヌスの次の皇帝はあの人なので以下略 ってことですよねえ。その辺と一緒に物語の決着もつくんだろうか。

本来ならもう来ているはずなのにAmazonさんから遅延のお知らせがががががが Review: Learn You a Haskell for Great Good! - iRi Kindle やら iPad は持っておらんので(^^;

■_ あらいぐま

昨日今日とこのスレとしてはかなりの勢い。

関数型プログラミング言語Haskell Part14

162 デフォルトの名無しさん [sage] 2011/04/23(土) 16:55:42.08 ID: Be:
    ガウスの消去法でさえも、どうやればいいのかちょっと悩んでしまう。
    Haskellってコードの作成コストと保守コストは低いかもしれんが、プログラマを養成するコストがとんでもないことにならないか? 

163 デフォルトの名無しさん [sage] 2011/04/23(土) 17:05:00.33 ID: Be:
    エントロピーって言葉を知っているかい?
    Haskellで得られる生産性の高さはHaskellを学ぶ労力と釣り合わないってことさ 

164 デフォルトの名無しさん [sage] 2011/04/23(土) 17:14:18.16 ID: Be:
    新卒プログラマをデスマーチに投入することで生まれる
    希望から絶望への相転移エネルギーを回収することで宇宙のエントロピーは減少するのさ。 

165 デフォルトの名無しさん [sage] 2011/04/23(土) 17:27:26.61 ID: Be:
    何言ってるのかよくわからないからりりかSOSで例えてくれ 

166 デフォルトの名無しさん [sage] 2011/04/23(土) 21:59:09.73 ID: Be:
    死屍累々たる新卒を糧にして命の花は咲くということさ。 

167 デフォルトの名無しさん [sage] 2011/04/23(土) 22:12:59.71 ID: Be:
    おまいら・・・ 

168 デフォルトの名無しさん [sage] 2011/04/23(土) 22:40:36.42 ID: Be:
    >>162
    この手の問題は、IOモナドやSTモナドで配列の破壊的更新をしてしまうという解は常にあるわけで、「Haskellらしさ」にこだわらなければ他の言語よりむずかしいということはないと思う。
    モナドって何さ? というのはたしかに説明するのが難しいけど、そこはまぁ、LISPとかと同じように「変数に値をバインドしている」と考えることにすればいいわけで。
    「変数に値をバインド」というのは何さ? 変数って値を入れておく箱じゃないの? といわれたら、これまた面倒だが、そこまでBASIC時代の人は無視しても良いと…

    破壊的更新はHaskellらしくないと嫌う風潮があるけど、状態モナドを使ったほうが自然なアルゴリズム、破壊的更新のほうが理解しやすいプログラムというのはありえ、
    そういうときに破壊的更新を躊躇するのはむしろ良くないことだと思う。 

176 デフォルトの名無しさん [sage] 2011/04/24(日) 07:11:17.72 ID: Be:
    > エントロピーって言葉を知っているかい?
    エントロピーって単語を使ってみたかっただけだよな?
    頼むからそうだと言ってくれ

177 デフォルトの名無しさん [sage] 2011/04/24(日) 10:11:39.70 ID: Be:
    >>176
    魔法少女まどか☆マギカって知ってるかい? 

178 デフォルトの名無しさん [sage] 2011/04/24(日) 10:52:49.22 ID: Be:
    最近のアニメって浅い知識で専門用語使いたがるね。正直萎えるわ 

179 デフォルトの名無しさん [sage] 2011/04/24(日) 10:55:28.58 ID: Be:
    そんな事言ったらこの世からSFがすべて無くなるぞw 

181 デフォルトの名無しさん [sage] 2011/04/24(日) 11:10:04.05 ID: Be:
    アニメを含む娯楽作品は、作者の思いが込められたりはするが、
    それを深読みしたり現実の世界と繋げたりすることは二の次で、
    まずは純粋に楽しめよ

    Haskellも関数型とかモナドとかはとりあえず二の次で、
    まずは純粋に楽しめよ 

203 デフォルトの名無しさん [sage] 2011/04/24(日) 15:12:15.61 ID: Be:
    >>200
    Simon Thompsonが書いたErlang本を読んだ

    filterやreverseを自分で実装させるあたり
    教育的配慮に富んだ本だなと思って読み進めていたところ
    3章の練習問題がコンパイラ作成で悶絶した

    Haskellerというのは社会性がないというか
    手加減を知らない人たちなんだろうな。そう思った 

235 デフォルトの名無しさん [sage] 2011/04/24(日) 22:01:03.84 ID: Be:
    あらいぐま haskell 

237 デフォルトの名無しさん [sage] 2011/04/24(日) 22:52:14.20 ID: Be:
    >>235
    そういうの、あたしは嫌いじゃないな
    誰か あらいぐまハスケルたん 描いて 

あらいぐまHaskellっていつぐらいが初出なんだろう。

■_

■_ 読む

Slipping away from the Perl community : programming The Lambda Calculus - Code Monkey Have Fun - Site Home - MSDN Blogs Programming is “Pointless” - Code Monkey Have Fun - Site Home - MSDN Blogs

ちょっと時間の経ってるのもまじってるけど。

■_

限界。

2011年04月23日

■_

jaegermonkey 使ってjsshell のビルドをしたいんだけど手順がわかんねえ。 autotools ないとだめなんかな。

■_ コンパイル時に符号つき整数の最大値を求める

C でのお話。

c - Programmatically determining max value of a signed integer type - Stack Overflow

This related question is about determining the max value of a signed type at compile-time:

C question: off_t (and other signed integer types) minimum and maximum values

However, I've since realized that determining the max value of a signed type (e.g. 
time_t or off_t) at runtime seems to be a very difficult task.

The closest thing to a solution I can think of is:

uintmax_t x = (uintmax_t)1<<CHAR_BIT*sizeof(type)-2;
while ((type)x<=0) x>>=1;

This avoids any looping as long as type has no padding bits, but if type does have 
padding bits, the cast invokes implementation-defined behavior, which could be a 
signal or a nonsensical implementation-defined conversion (e.g. stripping the sign bit).

I'm beginning to think the problem is unsolvable, which is a bit unsettling and would 
be a defect in the C standard, in my opinion. Any ideas for proving me wrong?

■_ 訳

InfoQの訳出は元記事からあまり時間を置かないでされるのでとても良いと思うのだけど、 首を傾げてしまうような訳がちょっと目立つ感があるのが残念。

InfoQ: 新しいセルフホスティングIDE: RedCarとJRuby、Cloud9 IDEとJavascript

Smalltalkerを除いて動的言語のプログラマが使っているツールとIDEは低いレベル言語で書かれている。
ある言語のための開発ツールをその言語で書くことには、とりわけユーザ(つまり開発者)大きな
利点がある。エディタやIDEの場合、開発者は他の言語やプラットフォームを扱わずに拡張でき
る。Java開発者はJavaベースのIDEを一覧から選択できる。どれもJavaのコードを少し書けば拡
張できるIDEだ。しかし、JavascriptやRubyの場合、こうはいかない。

「低いレベル言語」って…(^^;

InfoQ: A New Crop of Self Hosting IDEs: RedCar and JRuby, Cloud9 IDE and Javascript

Except for Smalltalkers, dynamic language programmers are used to their tools and IDEs 
being written in lower level languages. Writing developer tools for a language in the 
language has big advantages, particularly for users (ie. developers). In the case of 
editors and IDEs, it means developers can extend their tools without having to deal 
with another language and platform. Java developers today can choose from a list of 
Java-based IDEs, all extenѕible by writing a bit of Java code. That's not quite the 
case for Javascript or Ruby. 

元記事では lower level のところが太字になってたのでそれに引きずられたのかもしれませんが、 これは IDE で開発する言語(Ruby やらJavaScriptやら)よりもシステム寄りの 低水準 (low-level) の言語で記述されているよ。ということですよね。 そのあとの文章もちと収まりが悪いと思うけどそこはスルー。

■_ 警告

気になったので(ちょっとだけ)調べてみた。

ActivePerl/Tk で dumpが勝手に | OKWave

perl, v5.8.6で、Perl/Tkを利用した環境です。

従来は、Windows-XP(x32)で利用していて、特に問題なかったのですが、
同じ環境をWindows7(x32)に導入して利用しようとしたところ、
Menu メソッド呼び出しのところで、ちょうど Devel::peekのdumpで表示させた
ような内容のものが、DOSプロンプト上に出てしまいます。

↓こちらの記事(英文)に書かれていることも試してみましたが、解決できませんでした。
http://community.activestate.com/forum-topic/perl-tk-cause-warning

Windows7だからということではなさそうですが、
この表示が出てしまう原因、出さなくする方法について、教えてください。
何かご存知の方、よろしくお願い致します。

サンプルスクリプト: test.pl
-----------------------------
use strict;
use Tk;
my $mw = MainWindow->new();
my $menu = $mw->Menu();
MainLoop;
-----------------------------

このスクリプトをWindows7環境上で実行すると、
MainWindowのダイアログは画面に正しく表示されていますが、
DOSプロンプトで、下記のメッセージも出てしまいます。
⇒4行目(my $menu = $mw->Menu();)をコメントアウトするとこの表示は出ません。

-----------------------------
C:\Perl\Tk\Win7>test.pl
@ 535 not utf8
SV = PV(0x3d4f9d0) at 0x3d53870
REFCNT = 2
FLAGS = (POK,pPOK,UTF8)
PV = 0x3d0d9e4 "\203\201\203C\203\212\203I 9"\0 [UTF8 "\x{c1}C\x{c3}C\x{ca}\x{283}\x{c9}I 9"]
CUR = 10
LEN = 11
SV = PVMG(0x214bf74) at 0x3d53870
REFCNT = 2
FLAGS = (SMG,POK,pPOK,UTF8)
IV = 0
NV = 0
PV = 0x3d0d9e4 "\203\201\203C\203\212\203I 9"\0 [UTF8 "\x{c1}C\x{c3}C\x{ca}\x{283}\x{c9}I 9"]
CUR = 10
LEN = 11
MAGIC = 0x3d5d5a4
MG_VIRTUAL = &PL_vtbl_utf8
MG_TYPE = PERL_MAGIC_utf8(w)
MG_LEN = 10

"\203\201\203C\203\212\203I 9" に引っかかるものを感じたのでやってみた。

>perl -e "print qq{\203\201\203C\203\212\203I 9}"
メイリオ 9

ふむ。 フォント指定のところで警告が出ているっぽい? xp はメイリオではないと思うけど、この辺のデフォルト設定はどうなってるんだっけか。

この質問にあるリンク先を見ると、中国語版のxp で同様の問題があったらしい。

Perl/Tk cause warning | ActiveState Community Site

Perl/Tk cause warning
Posted by lilin on 2008-06-01 22:14
Forums: ActivePerl Support | OS: Windows

Windows XP Home Edition, Perl ActivePerl 5.10.0.1003 Tk 804.028
A script to test Perl/Tk:

#! perl -w

use Tk;

$main = MainWindow->new();

$menubar = $main->Frame(-relief => "raised", -borderwidth => 2);

$filebutton = $menubar->Menubutton(-text => "File", -underline => 0);
$filemenu = $filebutton->Menu();
$filebutton->configure(-menu => $filemenu);

$filemenu->command(-label => "Exit", -command => \&exit_choise, -underline => 1);
$filebutton->pack(-side => "left");
$menubar->pack(-side => "top", -fill => "x");

MainLoop();

sub exit_choise {
print "You chose the Exit choise!\n";
exit;
}

When running, it caused the following warning:

@ 535 not utf8
SV = PV(0x1d3e094) at 0x1d88794
REFCNT = 2
FLAGS = (POK,pPOK,UTF8)
PV = 0x1d2c654 "\313\316\314\345 9"\0Malformed UTF-8 character (unexpected non-continuation byte 0xce, immediately after start byte 0xcb) in subroutine entry at C:/Perl/site/lib/Tk/Widget.pm line 190.
Malformed UTF-8 character (unexpected non-continuation byte 0xe5, immediately after start byte 0xcc) in subroutine entry at C:/Perl/site/lib/Tk/Widget.pm line 190.
[UTF8 "\x{0}\x{0} 9"]
CUR = 6
LEN = 8
SV = PV(0x1d3e094) at 0x1d88794
REFCNT = 2
FLAGS = (POK,pPOK,UTF8)
PV = 0x1d2c654 "\313\316\314\345 9"\0Malformed UTF-8 character (unexpected non-continuation byte 0xce, immediately after start byte 0xcb) in subroutine entry at C:/Perl/site/lib/Tk/Widget.pm line 190.
Malformed UTF-8 character (unexpected non-continuation byte 0xe5, immediately after start byte 0xcc) in subroutine entry at C:/Perl/site/lib/Tk/Widget.pm line 190.
[UTF8 "\x{0}\x{0} 9"]
CUR = 6
LEN = 8

Any idea!

(略)


たぶんこれ \313\316\314\345 9 もフォント指定だろう。 中国語(簡体字)版Windowsのデフォルトのフォントって宋体とかいうのじゃなかったかな。 スペースと9の前には4バイト分あるから、それっぽい。
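この推測は手元でも確認できる。iconv が GBK を扱える環境という前提で:

```shell
# \313\316\314\345 を GBK (簡体字中国語) として解釈してみる
printf '\313\316\314\345' | iconv -f GBK -t UTF-8; echo
# → 宋体
```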

が、面倒くさいのでこれ以上は調べない(えー

■_ lock

なんかシリーズものになってる。 最初から読み直してみるかなあ (Windows以外の人は多分無縁)。

■_

■_

入力の調整をせんとなあ(謎

2011年04月22日

■_

O'Reilly Village/オラの村 - 『Mathematicaクックブック』刊行記念セミナー 書泉ブックタワーにて開催 に行ってきました。 なんか講演した方の関係者が多かったぽいです。 いろいろ話を聴いてやっぱり欲しくなったのですが、 いくらなんでも一月の給料丸ごとと大差ない金額ってのはねえ。 会場で使ってますって人を見ると(削除)

■_

Lisp の呪いネタ。Hacker news でも取り上げられたようなんですが

Hacker News | The Lisp Curse
Occam's Razor suggests that the reason there is no dialect of Lisp as popular as less 
expressive languages is that no one happens to have created one yet. What languages 
are has changed. Languages used to be specs. Now they're open source projects. Open 
source projects seem to succeed based more on the energy of their initial organizers 
than anything else. There are only a few big language-project organizers, and none of 
them happens to have chosen to implement a language that is a dialect of Lisp. That's 
the sort of thing that's true until it's false. E.g. Clojure could be the 
counterexample.

Maybe there's more to it than a small sample size, but that seems the likeliest 
explanation. The second most likely explanation is almost as mundane: that the reason 
new dialects of Lisp have trouble attracting adherents is that mainstream programmers 
are put off by s-expressions.

ざっくり略

My understanding is that prior to creating Perl, Larry Wall had studied (natural 
language) linguistics, with an aim towards perhaps designing a writing system for some 
natural language.

I believe he's explicitly rejected the ideas (common in the Lisp world) that extreme 
conciseness and conceptual orthogonality should be the ultimate goals of programming 
language design.

Calling him "relatively clueless about language design" seems a little odd.


Matz was a researcher working on compilers and programming languages when he created 
Ruby, so it doesn't seem obvious to me that he was "relatively clueless about 
language design" at the start.


He was definitely clueless about the implementation of Ruby. MRI started as a 
half-assed Scheme implementation, which is easy to see when you read the MRI source 
code. Scheme has lexical scoping, while Ruby has...python/perl-style scoping.

Scheme has first class anonymous functions, while Ruby has blocks, which are a hack 
around MRI's slow function calls.


while Ruby has blocks, which are a hack around MRI's slow function calls

Is this really the reason? I thought Ruby has blocks because Smalltalk has blocks.

以下略

He was definitely clueless about the implementation of Ruby. とかまつもとさんもえらい言われ様だな(^^;

■_

PHPプロジェクトの80-90%は巨大なクソの山であるという事実 : candycane development blog

PHPプロジェクトの80-90%は巨大なクソの山であるという事実 : candycane development blog

なぜPHPはゲットーだったのか

ダンボ地区のかなりクールなスタートアップの創始者と私は世の中の多くのPHPの開発者でない
人たちがPHPとその周囲のコミュニティを軽蔑するのかについて話す機会があった。彼はとても
興味深い点に言及した事が私の印象に残った。なぜなら私はこれまで聞いた事がない指摘だった
からだ。


誤解しないで頂きたい。いくつかの素晴らしいPHP開発者は存在していた。しかし周囲は濾過さ
れていない初心者のソースだらけだった。カウボーイPHP開発者達が何の規約も無しにプロジ
ェクトを行えば、それはphpBBや、PHPNuke、またはPHP3のファイルの節くれだったマッシュアッ
プのようになった。しかし、あなたはPHP開発者だけを非難することができるだろうか?いいえ! 
他のWeb言語の巨人達、ASPやPerlも地獄のようなスパゲッティコードの促進に加わっていた。

Things like rendering "Co-Founder" as 「創始者」 ("founder"), or 「いくつかの」すばらしいPHP開発者 ("'several' great PHP developers"), bother me. いくつか is a counter for things, and we're not counting objects here... (^^;

There are other places where the particles and phrasing bother me too, but oh well (a boomerang would come right back at me).

Ah, I humbly accept said boomerang ○| ̄|_

■_ Magic numbers

'This kind of "I broke things, so now I will jiggle things randomly until they unbreak" is not acceptable.' - Linus Torvalds : programming
Re: Linux 2.6.39 rc3

What are all the magic numbers, and why would 0x80000000 be special?

(snip)

Guys, we've had this discussion before, in PCI allocation. We don't do
this. We tried switching the PCI region allocations to top-down, and
IT WAS A FAILURE. We reverted it to what we had years of testing with.

Don't just make random changes. There really are only two acceptable
models of development: "think and analyze" or "years and years of
testing on thousands of machines". Those two really do work.

                  Linus

Whoa. Harsh. Though I think what Linus writes here is absolutely right.

■_

■_

This one is probably just homework dumping.

A Unix question... - Linux Square forum
I can't manage to write a single command using Sort, Join, and Grep all together. Or maybe I don't have to. Please answer the question below for me.

2011年04月21日

■_

Health checkup.

■_ Well, then

Let's see how this goes (lol)

Bugs, Tests, and Working Overtime

64 仕様書無しさん [sage] 2011/04/06(水) 01:53:55.53 ID: Be:
    IT土方まさか☆デスマ ("IT Grunt: No Way☆Death March", a Madoka Magica parody)

    Episode 1: I Think It Was in the Spec...
    Episode 2: That Smells Really Buggy
    Episode 3: I Don't Want to Pull Any More All-Nighters
    Episode 4: There Are No Fairies, and No Elves Either
    Episode 5: No Time for Debugging
    Episode 6: This Spec Makes No Sense
    Episode 7: Can You Face the Madness of Sales?
    Episode 8: Customers Really Are Idiots
    Episode 9: No Way, I Can't Go Home

65 仕様書無しさん [sage] 2011/04/06(水) 01:56:55.03 ID: Be:
    Cut corners on a short deadline, and where you end up is...

66 仕様書無しさん [sage] 2011/04/06(水) 16:08:52.52 ID: Be:
    >>64
    LOL

    Episode 10: Nobody Understands It Anymore

■_ I'm not so sure about that

To begin with, it's not clear *which* regex flavor is being used, so there's no real way to answer.

Please teach me a regular expression | OKWave

Please teach me a regular expression

For example,
how should I write one to search for English text containing the words secure and energy?

As a slightly finer condition,
at most four words (0 to 4) come between secure and energy, with a single space between words.

That is, setting aside whether they are correct as English,
secure a a lot of energy
secure a lot of energy
secure lot of energy
secure a energy
secure energy
and the like should all be found.

Thank you in advance.


Best answer chosen by the asker

I think it can be done with quantifiers.

I haven't verified it all that rigorously, but how about this?
The engine I checked it on is Oniguruma.

# regex
\bsecure( ?\w* ?){0,4}energy\b

# matches
secure a a lot of energy
secure a lot of energy
secure lot of energy
secure a energy
secure energy

# does not match
secure so many a lot of energy


Reply from the asker

Thank you very much. Works perfectly.
On my own I had been using
secure.+energy
and weeding out the matches with too much in between by hand, but with what you taught me it worked nicely.
This was a great help.

ANo.1


Writing it simply: "\<secure\>.*\<energy\>", perhaps.
\< = start of a word (immediately preceded by whitespace)
\> = end of a word (immediately followed by whitespace)

Since the middle is ".*", it will match gaps of any length, not just four words or fewer.

If you want to match 0 to 4 words, then, since
\s = whitespace (space, tab, newline)
\w = an English word (an alphanumeric string containing no spaces)
you combine the alternatives with alternation ( | ):

(\s|\s\w\s|\s\w\s\w\s|\s\w\s\w\s\w\s|\s\w\s\w\s\w\s\w\s)

That gives the notation above.

Writing this in place of the .* yields

\<secure\>(\s|\s\w\s|\s\w\s\w\s|\s\w\s\w\s\w\s|\s\w\s\w\s\w\s\w\s)<energy\>

and there you are.

ANo.2


I got the start of the final "energy" wrong.

\<secure\>(\s|\s\w\s|\s\w\s\w\s|\s\w\s\w\s\w\s|\s\w\s\w\s\w\s\w\s)\<energy\>

Use this one instead.

I don't think there's an engine out there today that accepts \s and \w at the same time as \< and \> (not confident about that, though). Besides, the \s and \w carry no repetition, so there's no way that pattern can match a multi-character "word".

The best answer, for its part, would match "secureenergy" too, wouldn't it? And I don't quite follow the bit about the "engine" being Oniguruma, either.
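A quick check of that critique, as a sketch using Python's re module rather than Oniguruma: every repetition of `( ?\w* ?)` in the accepted pattern can match the empty string, so nothing forces any separation between the two words. A stricter variant such as `\bsecure(?: \w+){0,4} energy\b` (my own pattern, not from the thread) makes each intervening word cost exactly one space and requires a space before "energy".

```python
import re

# The accepted answer's pattern: every repetition of ( ?\w* ?) can match
# the empty string, so the two words may end up glued together.
accepted = re.compile(r"\bsecure( ?\w* ?){0,4}energy\b")

# Stricter sketch: each intervening word is preceded by exactly one space,
# and one space must come right before "energy".
strict = re.compile(r"\bsecure(?: \w+){0,4} energy\b")

should_match = [
    "secure a a lot of energy",
    "secure a lot of energy",
    "secure lot of energy",
    "secure a energy",
    "secure energy",
]
should_not_match = [
    "secure so many a lot of energy",  # five words in between
    "secureenergy",                    # the glued-together case
]

for s in should_match:
    assert strict.search(s), s
for s in should_not_match:
    assert not strict.search(s), s

# The accepted pattern does accept the glued-together form:
assert accepted.search("secureenergy")
```

Whether Oniguruma behaves identically here I haven't checked, but the empty-match problem is the same in any Perl-style engine.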

■_

■_




Feel free to link here.

Email goes to: kbk AT kt DOT rim DOT or DOT jp