Installing Apache Kafka and ZooKeeper on Ubuntu 20.04 or a Raspberry Pi (Ubuntu) cluster
Recently I installed a three-broker Kafka cluster on Raspberry Pi, using three Raspberry Pi 4B boards (4 GB memory) with 32 GB SD cards. The same process works for a Kafka cluster on regular Ubuntu machines (or any Linux variant); just skip the parts about copying images to SD cards. The steps are the same for any number of Linux machines.
I could have installed all the required components on the three machines separately, but I did it on one and then copied the card. It did not save a whole lot of time because imaging is slow, but it validated the backup and restore process for brokers. My desktop was a Windows machine, so the steps to copy images would be different on Linux.
Let us start by preparing the first machine. What do we need?
- Linux (I picked Ubuntu for the Pi)
- Java
- ZooKeeper
- Kafka
- Optionally, dedicated users for Kafka and ZooKeeper
- Systemd services for both
NOTE: I am in no way a Linux expert; follow your own Linux best practices.
I started by installing the Raspberry Pi Imager utility on the desktop:
https://www.raspberrypi.org/blog/raspberry-pi-imager-imaging-utility/
Put the first SD card in the desktop and launch the Raspberry Pi Imager. Click Choose OS and pick Ubuntu.
Select the SD card and click Write. It will take some time.
Once done, put the card in the first Raspberry Pi. If you have a monitor connected to your Pi, log on directly; otherwise SSH from your desktop. I went with SSH, which first requires the address of the new Pi. I ran the following command in PowerShell to find it:
arp -a | findstr "dc"
(dc is the start of the MAC address on my boards; it could be b8 on some older models)
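On a Linux desktop the same lookup could be sketched as a small shell filter. This is only a sketch: the function name find_pi_hosts is made up for illustration, and the three MAC prefixes below are ones registered to Raspberry Pi that I know of, not an exhaustive list.

```shell
# Keep only ARP entries whose MAC address starts with a known Raspberry Pi
# prefix. Reads `arp -a` style text on stdin and accepts both ':' (Linux)
# and '-' (Windows) as the separator. The prefix list is an assumption and
# may not cover every board.
find_pi_hosts() {
  grep -iE '(dc[:-]a6[:-]32|b8[:-]27[:-]eb|e4[:-]5f[:-]01)'
}

# Usage (on the desktop):
#   arp -a | find_pi_hosts
```

Matching on the full three-byte prefix is a bit stricter than searching for "dc" alone, which can also match unrelated lines.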
After getting the IP address, SSH into the first machine. I started by setting the hostname because we are setting up a cluster. You can also set the hostname while the card is still in your desktop by editing a config file, but I prefer this way.
sudo hostnamectl set-hostname <name>
Because this is a cluster, pick a naming convention like kafka-broker-1 or broker-1 and so on.
Start by updating packages:
sudo apt update && sudo apt upgrade
Install Java:
sudo apt install default-jdk -y
Let us make some directories and set up some users/permissions.
sudo mkdir -p /opt/zookeeper
sudo mkdir -p /data/zookeeper
sudo mkdir -p /opt/kafka
sudo mkdir -p /data/kafka
Zookeeper:
sudo useradd zookeeper -m
sudo usermod --shell /bin/bash zookeeper
sudo passwd zookeeper
sudo usermod -aG sudo zookeeper
sudo chown -R zookeeper:zookeeper /opt/zookeeper
sudo chown -R zookeeper:zookeeper /data/zookeeper
Kafka:
sudo useradd kafka -m
sudo usermod --shell /bin/bash kafka
sudo passwd kafka
sudo usermod -aG sudo kafka
sudo chown -R kafka:kafka /opt/kafka
sudo chown -R kafka:kafka /data/kafka
Install Zookeeper:
cd /tmp
sudo wget https://apache.claz.org/zookeeper/zookeeper-3.6.2/apache-zookeeper-3.6.2-bin.tar.gz
(change to the version that you need)
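Before extracting, it may be worth checking the download against the SHA-512 digest that Apache publishes next to each release. A minimal sketch, assuming sha512sum is available (it is on Ubuntu) and that you paste the expected hex digest yourself; verify_sha512 is a name I made up:

```shell
# Compare a file's SHA-512 digest with an expected hex string.
# Returns 0 when they match and 1 otherwise. The expected value is
# lower-cased first, so either letter case works.
verify_sha512() (
  file=$1
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(sha512sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
)

# Usage (the digest is a placeholder, take the real one from the release page):
#   verify_sha512 /tmp/apache-zookeeper-3.6.2-bin.tar.gz "<expected digest>" && echo OK
```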
cd /opt/zookeeper
sudo tar -xvzf /tmp/apache-zookeeper-3.6.2-bin.tar.gz --strip-components=1
(this extracts the archive into the current folder, dropping the top-level directory)
Set server/broker id:
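sudo bash -c 'echo 1 > /data/zookeeper/myid'
Copy the sample config file for now; we will update it after imaging the SD card. The sample sets dataDir=/tmp/zookeeper, so after copying, change dataDir in zoo.cfg to /data/zookeeper, which is where myid was written above.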
sudo cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg
Create service file:
sudo vi /etc/systemd/system/zookeeper.service
[Unit]
Description=Zookeeper Daemon
Documentation=http://zookeeper.apache.org
Requires=network.target
After=network.target

[Service]
Type=forking
WorkingDirectory=/opt/zookeeper
User=zookeeper
Group=zookeeper
ExecStart=/opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg
ExecStop=/opt/zookeeper/bin/zkServer.sh stop /opt/zookeeper/conf/zoo.cfg
ExecReload=/opt/zookeeper/bin/zkServer.sh restart /opt/zookeeper/conf/zoo.cfg
TimeoutSec=30
Restart=on-failure

[Install]
WantedBy=default.target
Set the Restart policy as you need. Until the cluster is fully set up, I like to set it to no.
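When a unit file is pasted from a web page, a section header such as [Service] can end up glued to the end of the previous line, and systemd then does not see that section. A rough sanity check could look like the following; check_unit_file is a hypothetical helper, not a systemd tool, and it only looks at the three headers and ExecStart.

```shell
# Check that a systemd unit file has each section header on its own line
# and defines ExecStart. Returns 0 if the file looks sane, 1 otherwise.
check_unit_file() (
  file=$1
  for section in Unit Service Install; do
    # -F: literal string, -x: must match the whole line
    if ! grep -Fxq "[$section]" "$file"; then
      echo "missing or fused [$section] header in $file" >&2
      exit 1
    fi
  done
  if ! grep -q '^ExecStart=' "$file"; then
    echo "no ExecStart= line in $file" >&2
    exit 1
  fi
)

# Usage:
#   check_unit_file /etc/systemd/system/zookeeper.service && echo "looks fine"
```

For a fuller check, systemd ships its own verifier: systemd-analyze verify /etc/systemd/system/zookeeper.service.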
sudo systemctl daemon-reload
sudo systemctl enable zookeeper
sudo systemctl start zookeeper
Then check its status:
sudo systemctl status zookeeper
The status should be active (running). The next part is to set up Kafka.
Installing Kafka:
cd /tmp
sudo wget https://www.apache.org/dist/kafka/2.6.0/kafka_2.13-2.6.0.tgz
(change to the version that you need)
cd /opt/kafka
sudo tar -xvzf /tmp/kafka_2.13-2.6.0.tgz --strip-components=1
(this extracts the archive into the current folder, dropping the top-level directory)
Open server.properties to set a couple of properties. We will update it again once all brokers are set up.
sudo vi /opt/kafka/config/server.properties
Set broker.id=1 in this file. You can leave zookeeper.connect as localhost:2181; we will update it later. Also point log.dirs at /data/kafka (the shipped default is /tmp/kafka-logs) so the broker uses the data directory created earlier.
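If you would rather script these edits than open vi on every broker, a small helper along these lines could do it. It is a sketch under a few assumptions: set_property is my own name, it only touches uncommented key=value lines, and the value must not contain the characters | or &, which sed would treat specially.

```shell
# Set key=value in a Java-style properties file. Every uncommented
# "key=" line is rewritten; if there is none, the pair is appended.
set_property() (
  file=$1 key=$2 value=$3
  # Escape dots so that "broker.id" is matched literally
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^${pattern}=" "$file"; then
    tmp=$(mktemp)
    sed "s|^${pattern}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
)

# Usage (as a user with write access to the file):
#   set_property /opt/kafka/config/server.properties broker.id 1
```

The same helper works on zoo.cfg, since it uses the same key=value layout.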
Create service file:
sudo vi /etc/systemd/system/kafka.service
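(You can relax the Requires/After lines in this service file: in cluster mode it is not an absolute requirement to have the local ZooKeeper up and running. On a non-ARM machine, adjust the JAVA_HOME path to match your JDK, for example java-11-openjdk-amd64.)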
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
WorkingDirectory=/opt/kafka
User=kafka
Group=kafka
Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-arm64"
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
Set the Restart policy as you need. Until the cluster is fully set up, I like to set it to no.
sudo systemctl daemon-reload
sudo systemctl enable kafka
sudo systemctl start kafka
Then check its status:
sudo systemctl status kafka
Status should be active/running.
At this stage, we have ZooKeeper and Kafka running on one machine in standalone mode. We can either follow the same steps on brokers 2 and 3 (changing the ids to 2 and 3 where needed) or image the card and start from that. I went with the card-copy approach.
Shut down the Pi and take out the SD card. Put it back in the desktop and read it with the Win32 Disk Imager utility.
This reads from the card and writes an image to the selected path. Once it is done, take the card out and put it back in the Pi. Then take the card for the second Pi and put it in the desktop. I could have used the same utility to write the card, but I stuck with the Raspberry Pi Imager used earlier, this time selecting the custom image option.
Select the path of the image you just created.
After choosing the SD card, click Write. This is going to take some time. Once done, put this card in the second Pi and turn it on. Then do the following steps:
If the services are running, stop the zookeeper and kafka services
Change hostname:
sudo hostnamectl set-hostname <name>
(if first one was broker-1, this should be broker-2)
Set zookeeper server id:
sudo bash -c 'echo 2 > /data/zookeeper/myid'
Change id for kafka too:
sudo vi /opt/kafka/config/server.properties
Set broker.id=2 in this file. You can leave zookeeper.connect as localhost:2181; we will update it later.
You can test it by bringing up both services. This will still be in standalone mode.
Now take the third SD card and put it in the desktop. Follow all the steps for broker-2 and just change 2 to 3 for the hostname, the ZooKeeper id and the Kafka broker.id.
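Since the hostname, myid and broker.id all carry the same number in this setup, the number could be derived from the hostname once it is set, which leaves one less thing to mistype per card. A sketch, assuming hostnames that end in a dash followed by digits (broker-2, kafka-broker-3); id_from_hostname is a made-up name:

```shell
# Print the numeric suffix of a hostname that ends in "-<digits>",
# for example broker-2 -> 2. Prints nothing and returns 1 when the
# part after the last dash is empty or not purely numeric.
id_from_hostname() (
  name=$1
  id=${name##*-}          # strip everything up to the last dash
  case $id in
    ''|*[!0-9]*) exit 1 ;;
  esac
  printf '%s\n' "$id"
)

# Usage on a freshly renamed broker:
#   id=$(id_from_hostname "$(hostname)") && echo "$id" | sudo tee /data/zookeeper/myid
```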
At this stage we have all three machines set up with Kafka and ZooKeeper, but in standalone mode. Now let us put them in a cluster.
Set services in cluster mode:
Open SSH sessions to all three machines in separate windows and repeat the following steps on all three:
- Stop zookeeper and kafka services
- Edit /opt/zookeeper/conf/zoo.cfg and add the following three lines at the end of the file:
server.1=broker-1:2888:3888
server.2=broker-2:2888:3888
server.3=broker-3:2888:3888
- Edit /opt/kafka/config/server.properties and change the ZooKeeper addresses:
zookeeper.connect=broker-1:2181,broker-2:2181,broker-3:2181
- Delete all files from the Kafka data directory (/data/kafka, or wherever log.dirs points), especially meta.properties. This removes any information left over from standalone mode.
- The following step is not required, but I had messed up some file permissions in my debugging sessions, so it does not hurt to do it again:
sudo chown -R zookeeper:zookeeper /opt/zookeeper
sudo chown -R zookeeper:zookeeper /data/zookeeper
sudo chown -R kafka:kafka /opt/kafka
sudo chown -R kafka:kafka /data/kafka
- Enable/start both services and check their status
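The server list and the connect string are both derived from the same list of hostnames, so they can be generated instead of typed three times. A sketch with two hypothetical helpers; it assumes the hosts are passed in id order and that the default ports 2888, 3888 and 2181 are in use, as they are above.

```shell
# Print one "server.N=host:2888:3888" line per host, numbering from 1.
zk_server_lines() (
  n=1
  for host in "$@"; do
    printf 'server.%d=%s:2888:3888\n' "$n" "$host"
    n=$((n + 1))
  done
)

# Print "host1:2181,host2:2181,..." for the zookeeper.connect property.
zk_connect_string() (
  out=
  for host in "$@"; do
    out="${out:+$out,}$host:2181"   # add a comma only between entries
  done
  printf '%s\n' "$out"
)

# Usage:
#   zk_server_lines broker-1 broker-2 broker-3 | sudo tee -a /opt/zookeeper/conf/zoo.cfg
#   echo "zookeeper.connect=$(zk_connect_string broker-1 broker-2 broker-3)"
```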
Testing:
We can do a quick test to validate the installation. From the first broker's SSH window, create a test topic:
/opt/kafka/bin/kafka-topics.sh --create --zookeeper broker-2:2181 --replication-factor 1 --partitions 1 --topic test-topic-1
(we point at broker-2 to use ZooKeeper on the second broker just for validation; it could be any of the three. The --zookeeper option works with the Kafka 2.6 used here but was removed in Kafka 3.0, where --bootstrap-server with a broker address is used instead.)
We should get confirmation that the topic is created.
Now, from all three SSH windows, run the following command to get the topic list:
/opt/kafka/bin/kafka-topics.sh --list --zookeeper broker-3:2181
(you can change broker-3 to any other broker)
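Instead of eyeballing the list in three windows, the output can be piped through a one-line check. A sketch; topic_in_list is a name I made up and it simply looks for an exact line match in whatever is fed to it.

```shell
# Succeed if the topic name given as $1 appears as a whole line on stdin.
# -F: literal string, -x: whole line, -q: no output, just the exit status.
topic_in_list() {
  grep -Fxq -- "$1"
}

# Usage, asking ZooKeeper on each broker in turn:
#   for b in broker-1 broker-2 broker-3; do
#     /opt/kafka/bin/kafka-topics.sh --list --zookeeper "$b:2181" \
#       | topic_in_list test-topic-1 && echo "$b sees test-topic-1"
#   done
```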
Debugging:
If you run into problems starting either service on any broker, try some of the following steps:
- Make sure permissions on the directories are set correctly
- Make sure that broker.id for Kafka and myid for ZooKeeper are correct
- Make sure the ZooKeeper connect addresses are correct in Kafka's server.properties file
- Make sure the server list at the bottom of zoo.cfg is correct
If you still have issues, try running the ExecStart command from the Kafka or ZooKeeper service file directly at the command prompt, like:
/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
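The id check from the list above is easy to script. Note the assumption baked in: Kafka does not require broker.id to equal the ZooKeeper myid; keeping them equal is just the convention followed in this post, and check_ids (a made-up helper) flags any difference.

```shell
# Print the ZooKeeper myid and the Kafka broker.id side by side.
# Returns non-zero if myid is missing/empty or the two values differ
# (equal ids are this post's convention, not a Kafka requirement).
check_ids() (
  myid_file=$1 props_file=$2
  myid=$(tr -d '[:space:]' < "$myid_file")
  broker_id=$(sed -n 's/^broker\.id=//p' "$props_file" | tr -d '[:space:]')
  echo "myid=$myid broker.id=$broker_id"
  [ -n "$myid" ] && [ "$myid" = "$broker_id" ]
)

# Usage:
#   check_ids /data/zookeeper/myid /opt/kafka/config/server.properties
```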
Backup:
Initially we created an image after installing ZooKeeper and Kafka, then made some manual updates on each machine. Once the whole cluster is set up, we can create images from the cards again, this time for every broker with its full configuration already in place. This helps in restoring individual brokers. Once you know the process, restoring from the common image does not take too much time either.
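I did the imaging on Windows, but on a Linux desktop the same read-card-to-image and write-image-to-card cycle could be sketched with dd. Treat this as a sketch only: copy_image is my own wrapper name, /dev/sdX is a placeholder for the real card device, and dd will overwrite whatever destination it is given, so check the device name with lsblk first.

```shell
# Byte-for-byte copy from src ($1) to dest ($2). Either side can be a block
# device (for example /dev/sdX, a placeholder) or an image file.
# conv=fsync flushes the data to the destination before dd exits.
copy_image() {
  dd if="$1" of="$2" bs=4M conv=fsync
}

# Usage (device name is hypothetical; writing to the wrong one destroys data):
#   sudo sh -c 'dd if=/dev/sdX of=broker-1.img bs=4M conv=fsync'   # backup
#   sudo sh -c 'dd if=broker-1.img of=/dev/sdX bs=4M conv=fsync'   # restore
```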
Configuration Setup:
We took the out-of-the-box config files and changed only the ids and addresses. There is a lot more that can be configured for ZooKeeper and especially Kafka, but that is a separate topic and is well documented on the Apache website. One option always worth checking is how long you want to keep messages/logs in Kafka (log.retention.hours). The default is 168 hours (1 week).
Thanks!