A tiny Kafka-driven text publishing social network service.
To run Kafker on your local machine, you will need
- A Kafka server listening on localhost:9092
- A reasonably new Python 3 environment with poetry installed
For local usage, the repository contains a docker-compose configuration which is an almost identical copy of an example configuration provided by the Kowl project.
Kowl is an easy-to-use web interface for Kafka and is very helpful for understanding what's going on under the hood.
To run Kafka on your local machine, ensure that you have a working Docker configuration and run
$ cd docker
$ docker-compose up
The compose file will expose Kafka as well as the Kowl web interface, which is reachable at http://localhost:8080.
You can easily delete the entire Kafka state (that is, all messages and topics) by stopping the Docker containers and deleting the zk-single-kafka-single folder.
This repository uses Python Poetry for dependency management. After cloning the repository, run
$ poetry install
to set up a development environment.
You can now run commands in the environment via poetry run <stuff> (such as poetry run python -m kafker) or spawn a shell which configures the environment automatically by running
$ poetry shell
The commands below assume that you are inside such a shell.
Kafker is a Faust application. To initialize and run Kafker, start at least one worker after ensuring that your Kafka server is reachable.
$ python -m kafker worker -l info
See the Faust documentation for a few more details on how to run a Faust application. Note that using the faust executable is also possible: faust -A kafker.app worker -l info.
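For orientation, the faust -A kafker.app invocation above points at the Faust application instance defined in the kafker package. A minimal sketch of such an app is shown below; the record, topic, and agent names are assumptions for illustration, not the actual kafker code:

    # Minimal sketch of a Faust application; record, topic and agent
    # names are assumptions, not the actual kafker code.
    import faust

    class Post(faust.Record):
        author: str
        text: str

    app = faust.App("kafker", broker="kafka://localhost:9092")

    posts_topic = app.topic("posts", value_type=Post)

    @app.agent(posts_topic)
    async def handle_posts(posts):
        # Each worker consumes posts from Kafka and reacts to them.
        async for post in posts:
            print(f"{post.author}: {post.text}")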
Kafker exposes a number of REST APIs and CLI commands.
They can be found in the views.py and commands.py modules respectively, so take a look at the code for all the details.
The commands can be discovered through the Faust CLI:
$ python -m kafker --help
For example, you can use
$ python -m kafker register -n markus
to register a new user or
$ python -m kafker post -a markus -t "Why not Zoidberg?"
to create a new post.
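Commands like these are ordinary Faust CLI commands registered on the app. A rough sketch of how register might look in commands.py, assuming a kafker.app module and with a made-up body:

    # Sketch of a Faust CLI command; the body and import path are
    # assumptions, not the actual kafker implementation.
    from faust.cli import option

    from kafker.app import app  # assumed import path

    @app.command(
        option("--name", "-n", type=str, help="Name of the user to register."),
    )
    async def register(self, name: str) -> None:
        """Register a new user."""
        # Presumably publishes a registration event to a Kafka topic.
        print(f"registered user {name}")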
Timelines, follow relationships and a few other things are exposed via REST APIs.
They can be reached at http://localhost:6066/.
For example, you can inspect a user's timeline at http://localhost:6066/timeline/markus/.
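These endpoints are Faust web views. A rough sketch of what the timeline route in views.py might look like, where the table name and response shape are assumptions:

    # Sketch of a Faust web view; the table name and response shape
    # are assumptions, not the actual kafker implementation.
    from faust.web import Request, Response, View

    from kafker.app import app  # assumed import path

    timelines = app.Table("timelines", default=list)

    @app.page("/timeline/{username}/")
    class TimelineView(View):
        async def get(self, request: Request, username: str) -> Response:
            # Path parameters are passed to the handler as keyword arguments.
            return self.json({"user": username, "timeline": timelines[username]})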
There is a small script called generate_data.sh that can be used to add some trivial data to the system.
To add a corpus of data to the Markov chain, you can use the ingest-data command:
$ python -m kafker ingest-data my-text.txt
The format is quite lenient; I recommend using files that contain short-ish lines of text and not too many funky special characters.
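Conceptually, ingesting a corpus into a word-level Markov chain just means recording which words follow which. A toy sketch of the idea; kafker's actual model and storage will differ:

    # Toy sketch of word-level Markov chain ingestion and generation;
    # kafker's actual model and storage will differ.
    import random
    from collections import defaultdict

    chain = defaultdict(list)  # word -> list of observed successors

    def ingest(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                words = line.split()
                for current, successor in zip(words, words[1:]):
                    chain[current].append(successor)

    def generate(start, length=10):
        words = [start]
        while len(words) < length and chain.get(words[-1]):
            words.append(random.choice(chain[words[-1]]))
        return " ".join(words)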
You could, for example, use data from the Trump tweet archive (see the FAQ) - or some speech corpus - to initialize the chain. Something like this for preprocessing:
$ jq '.[].text' trump_tweets.json | sed "s/\\n/ /g" | sed "s/ +/ /g" | tr -d "\"" > trump_tweets.txt
You could now either ingest the data:
$ python -m kafker ingest-data trump_tweets.txt
Or add some posts to the system:
$ python -m kafker register -n trump
$ shuf -n 23 trump_tweets.txt | tr "\n" "\0" | xargs -0 -P 10 -IX python -m kafker post -a trump -t X

This repo uses black and isort for code formatting and pylint for linting.
To easily ensure that code changes conform to the style requirements, you can use pre-commit to automatically run checks before every commit.
You need to install the hooks from within a poetry shell and commit from within a poetry shell as well:
$ pre-commit install
You can manually run the checks on all changed files
$ pre-commit run
or on all files
$ pre-commit run -a
Your editor can probably be configured to at least run isort and black at regular intervals.