MapReduce Views in CouchDB

Facebook にシェア
このエントリーをはてなブックマークに追加
[`livedoor` not found]
Delicious にシェア

前回に続いて、MapReduceによるView作成、View実行を確認してみました。
Javascript以外の言語でMapReduceを行う場合は、couchDBのiniファイルに以下のようなイメージで追記します。
正常反映されれば、FutonのView Codeセグメントでlanguageとして使用可能となります。

[query_servers]
python = /usr/bin/couchpy

下記はサンプルとして、tweetのユーザ情報でlocation = ‘Tokyo’であるtweetを出力しています。
また、本サンプルではReduce処理を使用していません。
https://github.com/shmachid/twitter_mining/blob/master/define_view_in_couch.py

# -*- coding: utf-8 -*-

import couchdb
from couchdb.design import ViewDefinition


SERVER_URL = 'YOUR COUCHDB URL'    #ex: http://localhost:5984
DB_USER = 'YOUR USER'
DB_PASSWD = 'YOUR PASSWORD'
DB = 'YOUR DB NAME'

server = couchdb.Server(SERVER_URL)
server.resource.credentials = (DB_USER, DB_PASSWD)


try:
    db = server.create(DB)

except couchdb.http.PreconditionFailed, e:
    db = server[DB]

    def mapper(doc):
        if doc['author']['location'] == "Tokyo":
            yield (doc['id'], doc['text'])

    ## if you need to use reduce function, please remove bellow the comment-tag.
    #def reducer(keys, values, rereduce):
    #    return max(values)

    view = ViewDefinition('index', 'map_location', mapper, language='python')
    #view = ViewDefinition('index', 'map_location', mapper, reducer, language='python')
    view.sync(db)


    records = [(row.key, row.value) for row in db.view('index/map_location')]

    for record in records:
        print record[1]

Viewを作成・更新後、初回実行時にViewの更新処理が走る為、View実行完了まで時間が掛かりました。(対象件数は25万件であった)
データ更新が絶えず入っている状況でViewへアクセスがあった場合は、どのよう挙動となるのかが気になります。
ログは以下のように出力されます。何か良い方法あるのかな、探したけど分からなかった。

[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244494 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244520 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244548 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244579 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244606 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244634 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244664 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244693 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244721 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244747 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244773 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244797 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244821 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244848 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244877 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244903 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244934 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244961 for tweets _design/index
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244991 for tweets _design/index