Checking in wikipedia-logger, improved version of wikipedia-hdfs
The functionality is not really specific to HDFS, so the project is renamed to
wikipedia-logger.
Switched to cronolog, which is presumably more reliable and cleaner.
HDFS archiving is not re-implemented yet.
kngenie committed Dec 22, 2018
1 parent d8c6dec commit faa8bdc
Showing 2 changed files with 30 additions and 0 deletions.
23 changes: 23 additions & 0 deletions wikipedia/logger/start-logger.sh
@@ -0,0 +1,23 @@
#!/bin/bash
# Consume a Kafka topic and write it to daily log files via cronolog.
HERE=$(cd "$(dirname "$0")" && pwd)

: ${KAFKA_SERVERS:=crawl-db07:9092,crawl-db02:9092}
: ${KAFKA_BIN:=$HERE/kafka/bin}
: ${DATADIR:=$HERE/data}
: ${CONSUMER_GROUP:=wikipedia-logger}

topic=$1
case "$topic" in
wiki-irc|wiki-links) ;;
'') echo "Usage: $0 {wiki-irc|wiki-links}" >&2; exit 1;;
*) echo "invalid topic: $topic" >&2; exit 1;;
esac

CONSUMER_OPTS=(
--bootstrap-server "$KAFKA_SERVERS"
--topic "$topic"
--consumer-property "group.id=$CONSUMER_GROUP"
)
"$KAFKA_BIN/kafka-console-consumer.sh" "${CONSUMER_OPTS[@]}" | \
cronolog --symlink "$DATADIR/$topic.log" "$DATADIR/$topic-%Y-%m-%d.log"
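Note on the cronolog template: it uses strftime-style specifiers, so each topic gets one log file per day, and the symlink follows the current one. The sketch below previews the name that the template would produce for a given day. It assumes GNU `date` (for `-d`), and `preview_log_name` is a hypothetical helper, not part of this commit.

```shell
#!/bin/bash
# Preview the file name a strftime-style template yields for a given day.
# Assumptions: GNU date is available; preview_log_name is a hypothetical
# helper used only for illustration.
preview_log_name() {
  local datadir=$1 topic=$2 day=$3
  # Same template shape as in start-logger.sh: $DATADIR/$topic-%Y-%m-%d.log
  date -u -d "$day" +"$datadir/$topic-%Y-%m-%d.log"
}

log_name=$(preview_log_name ./data wiki-irc 2018-12-22)
echo "$log_name"
```

With these inputs the sketch prints `./data/wiki-irc-2018-12-22.log`.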
7 changes: 7 additions & 0 deletions wikipedia/logger/wikipedia-logger.ini
@@ -0,0 +1,7 @@
[program:wikipedia-logger-irc]
directory=/1/crawling/wikipedia-logger
command=/1/crawling/wikipedia-logger/start-logger.sh wiki-irc

[program:wikipedia-logger-links]
directory=/1/crawling/wikipedia-logger
command=/1/crawling/wikipedia-logger/start-logger.sh wiki-links
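
Note on configuration: these program entries invoke start-logger.sh without setting any environment, so the script falls back on its `: ${VAR:=default}` assignments. That idiom assigns a value only when the variable is unset or empty, so a setting can be overridden from the environment. A minimal sketch of the behaviour, where `show_servers` is a hypothetical helper and `localhost:9092` is a placeholder address:

```shell
#!/bin/bash
# Sketch of the ": ${VAR:=default}" idiom used in start-logger.sh.
# show_servers is a hypothetical helper; localhost:9092 is a placeholder.
show_servers() {
  (
    # Assign the default only if KAFKA_SERVERS is unset or empty.
    : "${KAFKA_SERVERS:=crawl-db07:9092,crawl-db02:9092}"
    echo "$KAFKA_SERVERS"
  )
}

unset KAFKA_SERVERS
default_value=$(show_servers)
override_value=$(KAFKA_SERVERS=localhost:9092 show_servers)

echo "default:  $default_value"
echo "override: $override_value"
```

The same override works for `KAFKA_BIN`, `DATADIR`, and `CONSUMER_GROUP`, since the script sets all four with the same idiom.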