I have been running Home Assistant for several years now. I migrated from a lowly Raspberry Pi 1 B+ to a laptop running a Python virtual environment, then upgraded that laptop to an SSD and switched to the Docker container, before settling on my docker-compose.yaml based deployment. That compose file has served me well, and running the Docker container has made upgrades very easy, something I do once at the end of each month.

As my network has grown, two things have caught my eye. The first is splitting my Home Assistant instance into a few more independent services. The second is running everything on a local Kubernetes stack instead of via docker compose.

Why Fix What Ain't Broke?

While my system does work very well, I have read enough anecdotal reports to suggest that it could work even better. One of the reasons is my reliance on Home Assistant's native Zigbee integration, ZHA. So far I have not run into any serious issues beyond some general odd behavior from time to time; overall the ZHA integration is fast, reliable, and easy to use, and I would still recommend it to those just getting started who aren't as confident administering multiple services. However, several community members mention the improved hardware support that Zigbee2MQTT offers.

I also used Home Assistant's native Z-Wave integration before migrating to Z-Wave JS. My experience with Z-Wave JS has been excellent so far, and the migration was quite trivial, but I did read from one of the Z-Wave device manufacturers that devices are better supported when Z-Wave JS is used in conjunction with an MQTT server rather than integrated directly into Home Assistant.

Since both of these services can use an MQTT broker to pass messages, it seems like the time has come to finally add MQTT to my stack. This will also be useful for future ESP32-based projects, but more on that later!

I could easily have just extended my docker-compose.yaml file, since it was already working, and updated my integrations and configurations as needed. However, I have been wanting an excuse to tinker with a local Kubernetes (k8s) stack to gain a better understanding of the tools and workflows available for managing modern deployments.

Is it overkill? Yes, easily. Will it be frustrating? Likely. Will I learn something along the way? Most definitely!

Host OS

Currently I have mostly standardized on Debian for appliance-type tasks. Debian, however, can be a bit much when all you want is a container runtime. This is why the debian-slim Docker images are popular base images, as they try to strip out a lot of the bloat. Even so, they are hardly small when compared to images based on Alpine Linux.

Alpine is quite small in comparison: the install ISO is only about 150 MB, and that's not a net-install image like Debian's, that's the full OS! Alpine is focused on containers, servers, and other security-sensitive systems where you want to minimize risk by reducing your attack surface and trimming out cruft.

This philosophy comes with a few differences for those used to working on a run-of-the-mill Linux server. Some of the gotchas include:

  • musl libc instead of glibc (especially an issue for Python programs)
  • Very limited package repository (e.g. vi but no vim)
  • Stripped-down installer
  • No GUI by default (multiple available in the repos)
  • mdev instead of udev
  • OpenRC instead of systemd

All of these add a little extra 🌶 to the setup, but none are a reason to discount Alpine, just things to be aware of. Well, except maybe mdev, which lacks an important feature compared to udev, and I am a little surprised it doesn't exist (more on that later).

Alpine provides this Comparison with other Distros page if you are curious as to some of the other differences.

Install Process

Install Media

Nothing really special here, I just flashed a USB stick I had laying around. I will note that dd performance differs noticeably if you don't use the right set of flags.

I grabbed the standard image for x86_64 because I am running my primary node on an Intel 8th gen i7. Alpine has other images available for download depending on your hardware or operating context.

I used fdisk -l to identify my USB drive and its associated device path under /dev/sdXX.

dd status=progress if=alpine-standard-3.16.2-x86_64.iso of=/dev/sdb bs=4M oflag=sync 

Alpine provides some basic installation documentation on their wiki. I also found the additional docs on the Alpine setup scripts very helpful.

setup-alpine

The first thing we are greeted with after boot is a login prompt; the user root will let you log in without a password.

Next you can run the setup-alpine command to start the installer script, and it will prompt you with a series of questions to guide the install.

Keyboard Layout

I picked us, and when prompted for the Available variants: I again chose us.

Hostname

Give your machine a unique name. I don't currently run an independent DNS server, and my router's local DNS options are limited to DHCP clients. Plus it helps identify the node in k8s.

Network

If your machine, like mine, has both a wireless and a wired interface, both should appear in the list. When I tried to initialize just the Wi-Fi, or initialize it before the ethernet interface, I had some issues with DNS not getting configured in resolv.conf. Letting both interfaces get initialized and set with DHCP allowed the DNS settings to sort themselves out.

Once I had configured the wireless interface, I said no to manual configuration, as that just opens the file /etc/network/interfaces in an editor.
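For reference, a basic DHCP setup in that file looks roughly like this; the interface names are just what my hardware shows up as and will vary on yours:

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

auto wlan0
iface wlan0 inet dhcp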

Root password

Set a new password for the root user, as it currently has no password set.

Timezone

I would recommend using the ? here to confirm your options in time zone selection. I used EST5EDT, which should keep up to date with DST changes.

Proxy

I don't use one, but if you do you should set it here.

NTP

You will be prompted to select between three different NTP daemons.

When I was first presented with this list I was not sure what to choose or why chrony might be a good default. So I did a little investigation; it didn't turn up much, but I did find two useful sources of information.

Chrony also being a preferred solution at Red Hat suggests it will continue to be a secure and stable NTP client. It is also the default option here, and unless you have a good reason to change it, I would stick with it.

Mirror

This step is where I ran into some odd issues. When you are prompted for a mirror, one of the options is f to find the fastest mirror from the list. Unfortunately for me, the mirror it picked was having an issue and didn't have certain package versions available, so my install failed. I did not realize this until several steps and retries later. I was able to proceed by manually selecting a different mirror number on the next install attempt.

The error I eventually see once I get to the end of the install is this:

ERROR: linux-lts-5.15.68-r0: package mentioned in index not found (try 'apk update')

🤦

Unfortunately this is an issue with the ette.biz mirror that I ran into again a few days later. I found some contact info for the company and reached out to see if they can get it fixed. Looking at the mirror health page I can see that this mirror is having some issues as of this writing.

To get around this, I just reran setup-apkrepos and selected another appropriate repo based on location and bandwidth.

I noticed that rerunning the script just appends the new mirror, so I needed to remove the old one; while I was in there I also upgraded the protocol from HTTP to HTTPS 🔒
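After the cleanup, my /etc/apk/repositories ended up looking roughly like this; the CDN mirror shown here is just an example, substitute whichever healthy mirror you picked (and note the community repo may be commented out depending on your setup answers):

https://dl-cdn.alpinelinux.org/alpine/v3.16/main
https://dl-cdn.alpinelinux.org/alpine/v3.16/community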

Setup User

Is this system going to have actual users on it? For me it is not, and I can do authentication with SSH keys, so I chose not to add a user at this time. I can always add one later if I need to.

SSH Server

You will be asked to choose between two different SSH server implementations: OpenSSH and Dropbear.

I have used Dropbear in the past; it's fine for embedded projects, especially if you are restricted on resources like space, but here I kept the default of OpenSSH, as I am more comfortable with that offering.

Root Login

You can set yes, no, or leave the default of prohibit-password. It's a good idea to set up SSH keys for access and leave the default option. If you can't transfer your *.pub key over with physical media, select yes, which will let you log in with a password as root so you can copy over your SSH keys, and then reconfigure sshd afterwards.
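If you do end up taking the password route first, the cleanup afterwards is just copying your public key over and flipping sshd back to key-only root logins, roughly like this (the node IP is a placeholder for your machine):

# From your workstation, copy your public key to the node
ssh-copy-id root@192.168.1.50

# On the node, lock root back down to key-based logins and restart sshd
sed -i 's/^PermitRootLogin yes/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
rc-service sshd restart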

SSH Key

If you have your SSH 🔑 hosted on the network somewhere, it can be pulled in automatically so you don't have to add it manually later. Maybe this is something I should set up at some point in the future. For now I just left none.

Disk Selection and Setup

You will be asked which of the detected storage media you would like to use. For my use case, I want to write to the internal hard drive and then boot from that. Alpine supports other boot workflows depending on your needs.

I entered sda and continued.

Then it asks what type of disk you would like to use it as. For an install that boots from the disk, we want to choose sys, which lets the installer repartition the entire disk for the Alpine installation.

You will be prompted one final time to confirm your selection before any data is wiped and Alpine is installed to disk. It defaults to n here, so you have to actively choose y before you can continue.

If the script detects previous partitions, it adds an additional confirmation step where you must also explicitly enter y in order to continue.

Since my initial run-through chose a repo that was in a bad health state, my first install failed. After fixing the repo issue as mentioned in Mirror, I had to unmount the partitions in order to re-run setup-disk.

umount /dev/sda1
umount /dev/sda3
swapoff -a

Then I can run setup-disk and it will correctly populate the list for me. Unfortunately, however, I ran into what seems like a bug in the installer when I tried to confirm that I wished to proceed with the wipe.

The file /dev/sda3 does not exist and no size was specified. 
mount: mounting /dev/sda3 on /mnt failed: No such file or directory


The error message is no lie: there is no file /dev/sda3, so what to do about it? Maybe try wiping the drive with dd?

dd if=/dev/zero of=/dev/sda bs=4M 

NOTE: status=progress was not an accepted parameter for dd inside the Alpine live environment.

It still failed to create /dev/sda3, with the same error message as before.

🤔

This has to be an issue with the hardware hotplugging, because the device partitions are not showing up. So I restarted mdev:

rc-service mdev restart

Re-run setup-disk and...

Installation is complete. Please reboot


Kubernetes

Which one?

There were several viable k8s distributions I could have gone with for this project, but I limited my scope to ones that support single-node installations. The k8s distributions I ended up evaluating were:

  • k0s
  • k3s
  • minikube
  • MicroK8s

I don't intend this section to serve as a comprehensive comparison, just some notes on what I noticed about each project and how that helped inform my choice.

k0s

A production-ready k8s distribution with a focus on being small and secure. The project has commercial backing and a seemingly great pace of updates on their GitHub repo. A single binary provides just enough to get k8s up and running.

The major thing that ended up turning me off was how you do a multi-node install. The default way is to have all the nodes up on your network, ready to accept connections for management. The manual process seemed like a bit more work compared to k3s or MicroK8s when it comes to adding nodes after the initial install.

k3s

This project has been under the Rancher group for some time, which is now a SUSE product. It targets small installs and edge computing, but can be scaled up or down as needed and integrated with larger projects. Adding new nodes seemed like an easy task, the documentation looks good, and the GitHub repo is not only active but has a lot of stars and watchers.

K3s seems to aim at providing more than "just enough" to get started, so that you have an easier onramp to actual production use. The single-binary install method that k3s uses is similar to k0s, and makes it easy to upgrade the k8s distribution regardless of the underlying OS.

This is also the distribution that is included with Rancher Desktop.

I ultimately chose k3s because I felt it offered a good balance of simple, documented, and popular with the community.

minikube

Minikube is popular and well supported across multiple OSes for single-node k8s clusters. The main downside is that minikube is not production-ready according to the docs, as that isn't its focus. Minikube is meant to run a local instance of k8s for local machine traffic.

Minikube can be used as a Docker Desktop replacement for those looking for a flexible solution for local development.

MicroK8s

A production-ready k8s distribution with the backing of Canonical, the makers of Ubuntu. This version is focused on simple single-node deployment and installs via a Snap package, which makes the install a bit more obtuse if you are not already using Snap (Ubuntu and derivatives come with Snap preinstalled).

MicroK8s has a neat plugin system to easily install some common k8s components like istio, knative, and argocd. This makes MicroK8s a compelling option, but the limited documentation is a turn-off for me. The project repo has been picking up steam and has been keeping up to date with upstream k8s.

When I last tinkered with MicroK8s it was on a Raspberry Pi 4B with 8 GB of RAM. The performance wasn't great, and there were several bugs with some of the addons not running on ARM at all. That left a bad taste in my mouth and kept me from looking at it too closely for this project.

Installing k3s

There is a decent set of docs on installing k3s onto various Linux distros.

I used the install script:

curl -sfL https://get.k3s.io | sh -

This set up my Kubernetes cluster and wrote out a kubectl config file that can be used from a remote machine to authenticate against the cluster's API server for management.

I rebooted and confirmed that the cluster came back up.
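k3s bundles kubectl, and the admin kubeconfig it writes lives at /etc/rancher/k3s/k3s.yaml. To manage the cluster from another machine, copy that file over and point it at the node's address instead of localhost, along these lines (the IP here is a placeholder for your node):

# On the node: check that it reports Ready
k3s kubectl get nodes

# From a remote machine: pull the kubeconfig and swap in the node's address
scp root@192.168.1.50:/etc/rancher/k3s/k3s.yaml ~/.kube/config
sed -i 's/127.0.0.1/192.168.1.50/' ~/.kube/config
kubectl get nodes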

Static /dev/ttyACM Paths

mdev vs. udev

Most modern Linux systems that use systemd have a hotplugging system called udev that can handle events when devices get plugged in. The kernel populates /dev/ with your device files, and the hotplugging system can trigger events based on this, like auto-mounting. When certain serial devices get plugged in they populate a /dev/ttyACM device slot, starting at /dev/ttyACM0 and increasing by 1 for each device of the same class, so the next one would be /dev/ttyACM1 and so on. If you only ever have one device, it will reliably show up at /dev/ttyACM0 for the most part. But once you have multiple devices plugged in during boot, or if a device gets unplugged and re-plugged (like when accidentally bumped by a 🐱), there is no guarantee it will end up at /dev/ttyACM0 again.

This is where static device mapping comes in. I have a simple rule set that detects my USB Zigbee and Z-Wave devices and creates a symlink that is always at the same location, regardless of which /dev/ttyACM* slot gets assigned. That way I can always refer to those devices by the paths /dev/zigbee and /dev/zwave and not have to worry about the order of USB population. My custom udev rule set:

KERNEL=="ttyACM*", ATTRS{idVendor}=="1cf1", ATTRS{idProduct}=="0030", SYMLINK+="zigbee"
KERNEL=="ttyACM*", ATTRS{idVendor}=="0658", ATTRS{idProduct}=="0200", SYMLINK+="zwave"

This file lives in /etc/udev/rules.d/99-usb-serial.rules and is read automatically on boot. Notice how we provide a vendor ID and a product ID to make it easy to identify specific hardware. We can get that information from lsusb or other commands:

$ lsusb
Bus 002 Device 004: ID 0658:0200 Sigma Designs, Inc. Aeotec Z-Stick Gen5 (ZW090) - UZB
Bus 002 Device 010: ID 1cf1:0030 Dresden Elektronik
Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 147e:2016 Upek Biometric Touchchip/Touchstrip Fingerprint Sensor
Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

These vendor and device IDs are the same regardless of what system you are on, or what USB port you plug into, making them an excellent way to identify your particular hardware devices.
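As a side note, if lsusb isn't available (or only prints bare IDs without names), the same information can be read straight out of sysfs with something like this:

# List vendor:product ID pairs (and product names where present) for all USB devices
for d in /sys/bus/usb/devices/*/; do
    [ -f "${d}idVendor" ] || continue
    printf '%s:%s %s\n' "$(cat "${d}idVendor")" "$(cat "${d}idProduct")" "$(cat "${d}product" 2>/dev/null)"
done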

As I mentioned earlier, Alpine uses BusyBox's mdev for hotplug event handling. When I began to look into it, there were some ideas on how to do this, as mdev does support a similar concept.

The docs for mdev give you a decent idea of what the file format is and how we might go about editing it. They even support easily creating a symlink for your /dev/ population events:

Similarly, ">path" renames/moves the device but it also creates
a direct symlink /dev/DEVNAME to the renamed/moved device.

One important thing to note:

The command is executed via the system() function (which means you're giving a
command to the shell), so make sure you have a shell installed at /bin/sh.  You
should also keep in mind that the kernel executes hotplug helpers with stdin,
stdout, and stderr connected to /dev/null.

So if we want to log, we will have to persist it to disk ourselves, which is useful for debugging custom mdev scripts! mdev supports running a custom script as the command to be executed when an event rule is matched.

The Goal

Based on a hotplug event, create a symlink to the matching /dev/DEVNAME such that the same device and vendor ID pair are always available under the same device path.


What do we Know and What can we Learn?

I found several good sources of information on how to work with mdev.

The first thing is to confirm which environment variables are populated during an add or remove event.

For this we can add a very simple line to the /etc/mdev.conf file that prints everything to a file. Most rules cause mdev to stop evaluating additional rules, so be sure to put it above any other lines looking for ttyACM[0-9] as a match.

ttyACM[0-9] root:tty 660 *printenv > /opt/dev-map/env.log

We need to restart the mdev service before it will pick up our new rule:

rc-service mdev restart

Next we can test plugging in some hardware and see what variables are being populated:

DEVNAME=ttyACM0
ACTION=remove
SHLVL=1
HOME=/
SEQNUM=3134
MAJOR=166
MDEV=ttyACM0
DEVPATH=/devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.1/tty/ttyACM0
SUBSYSTEM=tty
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MINOR=0
PWD=/dev

This is consistent with both the How to set up mdev rules for Busybox blog post and the comments in mdev.c. Unfortunately we don't get PRODUCT or any values that take us directly from our known device and vendor ID pair to this device.

We do have a few useful bits of data here though:

  • DEVPATH
  • MDEV
  • ACTION

I just need a way to link the data I know (the USB device and vendor ID pair) to something I can learn (the environment variables that get populated) so I can create the right symlink 🤝

The first thing that came to mind was dmesg, as it likely has enough information from the device plug-in event to link the two together.

For my testing I am using a BetaFlight-based flight controller for drones, running in HID mode to pretend it's a joystick. It populates a /dev/ttyACM slot, so it served as my test hardware while I developed this solution. The device and vendor ID pair for this board is 0483:3256.

I plugged in the device and then checked dmesg for events:

# dmesg | tail | grep 0483:3256
[426648.450076] input: Betaflight Betaflight STM32F411 as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.0/0003:0483:3256.0009/input/input22
[426648.450404] hid-generic 0003:0483:3256.0009: input,hidraw0: USB HID v1.11 Gamepad [Betaflight Betaflight STM32F411] on usb-0000:00:14.0-1/input0

I played around with the output a little and was eventually able to grab just the USB PCI ID, which for this port is 0000:00:14.0
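The pipeline that got me there (the same one the script below uses, with a short sleep added there so dmesg has time to catch up) looks like this:

# Grab the USB controller PCI ID from the newest dmesg line mentioning our device
dmesg | grep 0483:3256 | grep 'usb-' | tail -n 1 | cut -d - -f 3
# prints: 0000:00:14.0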

Now I can check the action type to see if it is add or remove, and then either create or delete a symlink from /dev/$DEV_TYPE to /dev/$MDEV, matching on the USB PCI ID in $DEVPATH.

Scripting It

Pulling it together in a reusable way, I came up with a base script plus some configuration environment variables. Each device match uses the same script and just passes in the configuration parameters for any /dev/ttyACM[0-9] event. I named the base script static_ttyacm.sh and put it in /opt/dev-map/, an arbitrary location.

#!/bin/ash
# Dynamically grab the device ID
# Give dmesg enough time to have the logs we need, by calling sleep 1
PCI_ID=`sleep 1 && dmesg | grep $DEV_ID | grep 'usb-' | tail -n 1 | cut -d - -f 3` 
LINK_PATH=/dev/$DEV_TYPE
TTY_DEV=/dev/$MDEV

# Test if we found a match for our device in dmesg
if [[ "$DEVPATH" == *"$PCI_ID"* ]];
then
    logger "Matched $DEV_TYPE device at $DEVPATH to $DEV_ID using $PCI_ID"
    # Test if the ttyACM device was added or removed
    if [ "$ACTION" = "add" ];
    then
        ln -sf $TTY_DEV $LINK_PATH
        # Confirm success or failure on creation of symlink
        if [ $? == 0 ];
        then
            logger "Created soft-link from $LINK_PATH to $TTY_DEV"
        else
            logger "Soft-link creation between $TTY_DEV and $LINK_PATH failed"
        fi
    fi
    if [ "$ACTION" = "remove" ];
    then
        rm $LINK_PATH
        # Confirm success or failure on removal of symlink
        if [ $? == 0 ];
        then
            logger "Removed soft link from $LINK_PATH to $TTY_DEV"
        else
            logger "Soft-link removal between $TTY_DEV and $LINK_PATH failed"
        fi
    fi
fi

Then I can add an entry to /etc/mdev.conf to call /opt/dev-map/static_ttyacm.sh with the right environment variables set, and we should see a log entry and a symlink 🤞

So my updated /etc/mdev.conf entries look like this:

#Custom
#-ttyACM[0-9]   root:root    0660 *printenv > /opt/dev-map/env.log
-ttyACM[0-9]    root:root    0660 *DEV_TYPE=bfhid  DEV_ID=0483:3256  /opt/dev-map/static_ttyacm.sh 
-ttyACM[0-9]    root:root    0660 *DEV_TYPE=zigbee  DEV_ID=1cf1:0030  /opt/dev-map/static_ttyacm.sh
ttyACM[0-9]     root:root    0660 *DEV_TYPE=zwave  DEV_ID=0658:0200  /opt/dev-map/static_ttyacm.sh

The - at the front of a line indicates that mdev should not stop on this match, but continue on until it hits a matching rule that is not prefixed with a -.

A quick restart of the service and we can test plugging in and unplugging the device, to see if we get any messages in /var/log/messages:

rc-service mdev restart

This will force mdev to reload the config file. Now to plug in 🔌

Sep 27 12:07:08 poseidon user.notice root: Matched bfhid device at /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.1/tty/ttyACM0 to 0483:3256 using 0000:00:14.0
Sep 27 12:07:08 poseidon user.notice root: Created soft-link from /dev/bfhid to /dev/ttyACM0

And confirm that we have a symlink in /dev/

# ls -lah /dev/bfhid
lrwxrwxrwx    1 root     root          12 Sep 27 12:07 /dev/bfhid -> /dev/ttyACM0

Now just to reboot and confirm it's all still working. Post reboot, I check whether we have anything in /var/log/messages and whether we have our symlink in /dev/... and we have neither!

After some investigation, it turned out I needed to rebuild the initramfs to include the new file. The Initramfs init page gives us the command, which we can modify slightly to this:

mkinitfs -c /etc/mkinitfs/mkinitfs.conf -b / $(ls /lib/modules)

Excellent, now we reboot. And still no logs, what gives?

Thanks to some help on the OFTC #alpine-linux IRC channel, I learned I also needed to update my /etc/mkinitfs/features.d/base.files file to include an entry for my script. I added it just before the /etc/mdev.conf line; I am not sure order matters here, but I have not tested it.

...
/opt/dev-map/static_ttyacm.sh
/etc/mdev.conf
...

Next we test whether mkinitfs will pick up our script for inclusion in the boot image:

# mkinitfs -l | grep static_ttyacm.sh
./opt/dev-map/static_ttyacm.sh

Now we can rebuild the image again and reboot. Unfortunately, though, we won't have logging during boot because of the runlevel this is executed in. I believe it can be configured, but I am not going to look into that at this time.


Additional Work

At this point I have k3s installed, my devices are available under /dev/zigbee and /dev/zwave, and I am ready to start migrating my deployment. There is more that I need to follow up on; I will touch on those topics in a future post, as there is already so much covered here and so much more to cover as I continue down this path of discovery.

  • Device Passthrough
  • Helm Charts
  • Migrating to zigbee2mqtt
  • Mosquitto MQTT Broker
  • Additional Worker Nodes

So far I am happy that I stuck it out and worked through hotplugging on Alpine. It was a real struggle at times, but I was able to meet the challenge I set for myself, and the sense of accomplishment is excellent. I look forward to working out how I should set up device passthrough. I have two possible solutions in mind, but I am not sure which I will end up going with. Stay tuned for updates as I continue the migration.

Share and Enjoy!