The two extra Turing RK1 boards arrived after I got back from my week away, so I’ve been spending some time finishing building the hardware. The 32-core beast is now sitting next to me with all its 15 LEDs shining/flashing and the 5 fans purring gently.
I’ve flashed Ubuntu 22.04.4 onto each of the nodes and configured them all to play nicely on my network, e.g. static DHCP reservations and SSH keys. I’ve also installed K3S, a lightweight version of Kubernetes (aka K8S), which will allow me to deploy pods (a bundle of resources like containers and storage) that can auto-failover to another node if I need to shut a node down or it crashes. I already have 48 system-type pods up and running, and that’s only using 3% of the CPU and 8% of the memory of one of the 4 nodes.
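For anyone who hasn’t seen one, a pod is really just a small spec file. A minimal sketch of one below — the names and image are placeholders I’ve made up for illustration, not anything from my cluster:

```yaml
# Minimal illustrative pod: one container plus a bundled storage volume.
# The names and image here are placeholders, not from my actual cluster.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: app
      image: nginx:alpine          # any container image would do
      volumeMounts:
        - name: scratch
          mountPath: /data
  volumes:
    - name: scratch
      emptyDir: {}                 # simple node-local storage for this pod
```

In practice you rarely create bare pods like this; higher-level objects (Deployments, etc.) create them for you, which is what enables the auto-failover.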
Also installed is Longhorn, which is a persistent storage system. Think of it like RAID, but with the storage replicated across different nodes. This is what allows K3S to fail pods over to other nodes, because the disk storage exists in multiple locations.
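Once Longhorn is installed it just appears as a storage class, and a pod asks for replicated storage with a claim roughly like this (names and size are made up for the example):

```yaml
# Hypothetical claim against the Longhorn storage class.
# Longhorn replicates the underlying volume across nodes,
# so the data survives a node going down.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn       # provided by the Longhorn install
  resources:
    requests:
      storage: 1Gi
```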
Next job is to finalise the clustering and test fail-over with a test pod. After that I’m going to try installing some ‘real world’ software. Unfortunately I can’t try WD because consolewd is built for the Pi OS so I don’t think it will work on the configuration I have. I think CumulusMX should work though. It will be interesting to see if I can get it to run, updating MQTT and MySQL and have all of that magically fail over if I shut a node down.
There are still points of failure… the cluster motherboard may fail, the power may go off, or the network may go offline, but I’m having fun learning this stuff and applying it to my environment.
I know nowt about Linux but there are 7 Linux versions of WD here?
It’s got to be 64-bit so less than 7 and the 64-bit Linux ones say desktop or GUI and I’m not running either of those.
Now I understand even less about Linux 
Desktop or GUI means like Windows. Console means command line, e.g. CMD in Windows or MS-DOS.
With Linux you start with command line, and can optionally install the GUI/desktop on top of that. Linux servers are often used only with the command line available.
I’m making progress 
I now have 3 cluster server nodes which can each ‘run’ the complete environment. So if one dies for any reason one of the other two will take up the cluster management duties. All three server nodes also act as ‘worker nodes’ and the fourth node is just a worker node. Worker nodes are where the pods that do the real work run.
The network is also configured with a load balancer. If I have a service running on one (or more) nodes, each with its own internal IP address, the load balancer provides a fixed ‘external’ IP address (one from my network) that points to the current live service on whichever node it’s currently running on.
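In K3S terms that’s a Service of type LoadBalancer (K3S ships with its own built-in ServiceLB for this). A rough sketch, with made-up names and ports — substitute your own values:

```yaml
# Sketch of a load-balanced service. The name, selector and ports
# are examples only, not my actual configuration.
apiVersion: v1
kind: Service
metadata:
  name: hello-service
spec:
  type: LoadBalancer
  selector:
    app: hello                     # matches the pods doing the work
  ports:
    - port: 80                     # the fixed 'external' port
      targetPort: 8080             # the port the container listens on
```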
I’ve added a simple ‘Hello World’ service which basically displays a web page showing the internal IP address of the node the service is running on. The service is configured to run on 3 worker nodes and to re-create itself if it dies. When it’s running I can kill the service on the node the web page shows it’s running on. Pretty much instantaneously the node IP address changes on the web page, showing that a new node has taken the work on. The system then restarts the service I just killed, maybe on the original node, or perhaps on a new one.
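The ‘run 3 copies and re-create on death’ behaviour comes from a Deployment. Roughly like this — the image name is a placeholder, not the actual hello-world image I used:

```yaml
# Sketch: keep 3 replicas running, and Kubernetes re-creates
# any pod that dies. Names and image are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 3                      # keep three copies running
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: hello-world:latest   # placeholder image name
```

Kill a pod and the Deployment controller notices the count dropped below 3 and schedules a replacement, which is exactly what I saw in the test.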
Being honest, this isn’t actually something that’s new to me. I’ve specified and bought systems that did this before I retired, but the price of those systems had a lot of zeros in it. They also needed some clever people to set them up (way beyond my skills). By contrast this is all being done in a small box sitting next to my laptop that probably cost less than 1% of the big systems I previously got installed.
My next plan is to get something non-trivial working. That’s going to push my knowledge quite a bit so it’s likely to take longer to achieve!