前回に続いて、MapReduceによるView作成、View実行を確認してみました。
Javascript以外の言語でMapReduceを行う場合は、couchDBのiniファイルに以下のようなイメージで追記します。
正常反映されれば、FutonのView Codeセグメントでlanguageとして使用可能となります。
[query_servers] python = /usr/bin/couchpy
下記はサンプルとして、tweetのユーザ情報でlocation = ‘Tokyo’であるtweetを出力しています。
また、本サンプルではReduce処理を使用していません。
https://github.com/shmachid/twitter_mining/blob/master/define_view_in_couch.py
# -*- coding: utf-8 -*-
import couchdb
from couchdb.design import ViewDefinition
SERVER_URL = 'YOUR COUCHDB URL' #ex: http://localhost:5984
DB_USER = 'YOUR USER'
DB_PASSWD = 'YOUR PASSWORD'
DB = 'YOUR DB NAME'
server = couchdb.Server(SERVER_URL)
server.resource.credentials = (DB_USER, DB_PASSWD)
try:
db = server.create(DB)
except couchdb.http.PreconditionFailed, e:
db = server[DB]
def mapper(doc):
if doc['author']['location'] == "Tokyo":
yield (doc['id'], doc['text'])
## if you need to use reduce function, please remove bellow the comment-tag.
#def reducer(keys, values, rereduce):
# return max(values)
view = ViewDefinition('index', 'map_location', mapper, language='python')
#view = ViewDefinition('index', 'map_location', mapper, reducer, language='python')
view.sync(db)
records = [(row.key, row.value) for row in db.view('index/map_location')]
for record in records:
print record[1]
Viewを作成・更新後、初回実行時にViewの更新処理が走る為、View実行完了まで時間が掛かりました。(対象件数は25万件であった)
データ更新が絶えず入っている状況でViewへアクセスがあった場合は、どのよう挙動となるのかが気になります。
ログは以下のように出力されます。何か良い方法あるのかな、探したけど分からなかった。
[Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244494 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244520 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244548 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244579 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244606 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244634 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244664 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244693 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244721 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244747 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244773 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244797 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244821 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244848 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244877 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244903 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244934 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244961 for tweets _design/index [Sat, 05 May 2012 17:01:30 GMT] [info] [<0.6123.0>] checkpointing view update at seq 244991 for tweets _design/index


