Being eerily quiet!

I know I’ve been quiet. I’m sure you all missed me :wink: I’m trying to slowly catch up on what I’ve missed.

It’s been a busy time with Dad’s Taxi runs to airports, train stations and other places. International visitors going home, international visitors arriving. Entertaining international visitors. Readying a house for sale. Watching the aurora. Not a lot going on really!

I’ve also been continuing to work on my master plan for ruling the weather-watch.com (and associated domains) universe in a cloud environment when I’ve had a few moments to spare! It’s taken a lot of Google-fu and experiments to achieve but I think I now have a working environment that I can start to move things across to. What I’ve ended up with is:

  • 3 cloud servers (4-core CPU, 16GB RAM, 160GB disk)
  • Each server runs Debian 12.5
  • Each server has the latest version of Docker Swarm installed. Docker Swarm allows multiple containers (think ‘little virtual servers’) to be run on each cloud server and allows the containers to be switched between servers should a server crash or have to be rebooted. (There’s a minimal sketch of the Swarm commands just after this list.)
  • Each server has a replicated disk storage area which is accessible to all other servers. Think of this like a shared directory on Windows. Putting files here, e.g. a web site, is part of what allows containers to jump from server to server. They always have the files they need in what looks like a local server directory.
  • Each server is also running a clustered MariaDB database (using Galera). This means that any database can be accessed from any server. This is another feature that allows containers to move from server to server, because there’s always a live copy of the database available to access. The database also has a load balancer in front of it, so a container just talks to the load balancer, which routes the database access to the most lightly loaded server. (A sample Galera configuration follows the list.)
  • The servers are accessed from the Internet through a Cloudflare tunnel. So when you connect to one of my web sites you actually connect to the Cloudflare end of the tunnel. The tunnel is encrypted from Cloudflare to my cloud servers, and at my end your connection pops out into the appropriate server. This mechanism also provides a private network that I can use. So from my laptop I have an encrypted connection to all of my servers, with the servers effectively having local IP addresses on my home LAN. The tunnel and private network mean that the firewall protecting my cloud servers only allows a single port through, the tunnel port, so all the dangerous ports that hackers like to attack are inaccessible. (There’s an example tunnel config after the list.)
  • Pretty much all of this environment can be re-created by running a script. The script can create and configure networks, servers, firewalls, etc, and it installs and configures additional software as required. So I can rebuild pretty much all of the environment in about 30 minutes. This will make it easier for me to do major upgrades in future. I can build a new parallel environment using the latest software and then (hopefully) use Docker Swarm to migrate the containers from the ‘old’ environment to the new one when I’ve finished testing. I suspect I’ll have some more work to do the first time I attempt this but I’ve got the building blocks in place.
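
For anyone curious what the Swarm side looks like in practice, here’s a minimal sketch of the commands involved. The IP addresses, file names and stack/service names are placeholders, not my real configuration:

```
# On the first server: create the swarm
docker swarm init --advertise-addr 10.0.0.1

# On the other two servers: join using the token that 'init' prints out
docker swarm join --token <token-from-init> 10.0.0.1:2377

# Deploy a stack from a compose file; Swarm spreads the replicas across the nodes
docker stack deploy -c web-stack.yml web
```

And a bare-bones web-stack.yml to go with it, mounting a site out of the replicated storage area so any node can run it:

```
# web-stack.yml (illustrative only)
version: "3.8"
services:
  web:
    image: nginx:stable
    volumes:
      - /mnt/replicated/web:/usr/share/nginx/html:ro   # replicated storage, present on every node
    deploy:
      replicas: 2
      restart_policy:
        condition: on-failure   # restart/reschedule the container if it dies
```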
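
The Galera side on each node boils down to a handful of wsrep settings. This is only a hedged sketch (the cluster name and addresses are placeholders, the library path varies by distribution, and the load balancer in front of the cluster, something like HAProxy or MaxScale, is a separate component not shown here):

```
# /etc/mysql/mariadb.conf.d/60-galera.cnf (illustrative values)
[galera]
wsrep_on                 = ON
wsrep_provider           = /usr/lib/galera/libgalera_smm.so   # path varies by distro/package
wsrep_cluster_name       = "example-cluster"
wsrep_cluster_address    = "gcomm://10.0.0.1,10.0.0.2,10.0.0.3"
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2
bind-address             = 0.0.0.0
```

The first node is bootstrapped with galera_new_cluster; the other nodes then just start MariaDB normally and sync themselves from it.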
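
And the tunnel end of things is driven by a cloudflared ingress file along these lines (the tunnel ID, hostname and internal address are placeholders):

```
# /etc/cloudflared/config.yml (placeholder values)
tunnel: <tunnel-uuid>
credentials-file: /etc/cloudflared/<tunnel-uuid>.json
ingress:
  - hostname: forum.example.com      # public hostname -> internal service
    service: http://10.0.0.1:8080
  - service: http_status:404         # required catch-all rule
```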

If I’ve missed anything important on the forum, please let me know.

8 Likes

Again, I say WOW! You have indeed been more than busy creating the computing environment for the sites. Congratulations on your journey of discovery and thanks for all your efforts!

Best regards,
Ken

My new Docker Swarm server environment is now live. I’ve migrated my first 4 ‘stacks’ across to it and I’m pleased to see that Swarm does what it says on the tin…it’s deployed them across the three servers to balance the load and the network even knows where they are so they remain accessible from the outside world.
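
For anyone wanting to see the same thing on their own Swarm, the placement is easy to check from any manager node (the stack name here is just a placeholder):

```
docker service ls     # one line per service, with image and replica counts
docker stack ps web   # shows which node each task (container) is running on
```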

I’ve got a fair bit more to migrate to it and I don’t yet know how some of the migrations will work. I’m enjoying sitting in the garden listening to some smooth jazz whilst figuring it out. I just wish the clouds would go away because it’s got a bit chilly in the last 30 minutes!

I’m working on things that aren’t visible to you all first, but there’s going to come a time when I’ll need outages to complete some migrations, e.g. this forum. I’ll let you know in advance of that happening.

My descent into the world of containers continues. I’m working on what I always thought would be the most difficult containers to implement…the complex scripts that download and process the GFS and ECMWF data that WxSim uses.

These are complicated because they are a combination of Perl plus various third-party Perl modules, Python plus some third-party Python modules, Bash scripts, and third-party apps that are only available in source code form (C and Fortran), plus my own scripts.

My first task was to figure out how to get all the extra bits of code into a container. Containers run ‘images’ (think of them a bit like a specialised Operating System). Each time you start a container it’s effectively like doing a factory reset on it with the base image being reloaded…no post-build installed software is carried forward when the container restarts. There’s obviously no standard image that includes all the bits I need so I’ve had to work out how to build a customised image of my own, including figuring out how to get my image into a container registry (GitLab) to allow me to use the image to build my own containers wherever they are. That proved to be ‘interesting’, but I think I’ve won that battle now.
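
In case it helps anyone heading down the same path, the general shape of it is a Dockerfile that bakes the extra packages into a Debian base image, followed by a push to the GitLab registry. This is only a hedged sketch; the package list, image name and project path are placeholders rather than my actual build:

```
# Dockerfile (illustrative - the real package list is much longer)
FROM debian:12-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
        perl libdbd-mariadb-perl \
        python3 python3-pip \
        gcc gfortran make \
    && rm -rf /var/lib/apt/lists/*
COPY scripts/ /opt/scripts/
CMD ["/opt/scripts/run.sh"]
```

```
# Build and push to the GitLab container registry (project path is a placeholder)
docker login registry.gitlab.com
docker build -t registry.gitlab.com/<group>/<project>/wxsim-tools:latest .
docker push registry.gitlab.com/<group>/<project>/wxsim-tools:latest
```

Once the image is in the registry, the stack files just reference it by that name and each node pulls it when a container is scheduled there.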

The scripts in my first container (the ECMWF download/processing development system) are now running without errors and are generating good results, apart from one small detail. The data is being processed about 5 times slower than in its current environment :frowning:

I’m not too surprised about this because it’s running in a very different environment compared to the current one, particularly for disk access. I have two different types of disk available…local disk and replicated disk storage. The replicated storage has to be managed by the server (it’s got to keep the contents in sync across three servers), and with a lot of disk access happening in the scripts that means the CPU is busy updating disks alongside running my scripts. It’s not just CPU either; the replication generates a lot of extra disk I/O, which makes the disk slow for the scripts to access.

This is obviously something I don’t need to deal with in the current environment, but I’ve got a couple of ideas for how to improve things: not using the replicated disk storage for some activities, and bulk-loading data into memory to process it there rather than reading it from disk. I already do the second of these when processing the GFS data so I can re-use the same technique.
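
In Swarm terms the first idea looks something like this. It’s only a sketch; the paths, label, size and image name are placeholders:

```
# Processing service using node-local scratch space instead of the replicated volume
version: "3.8"
services:
  ecmwf:
    image: registry.gitlab.com/<group>/<project>/wxsim-tools:latest
    volumes:
      - /var/local/wxsim-work:/work    # node-local disk, not replicated
      - type: tmpfs                    # or hold the working set in RAM instead
        target: /ramwork
        tmpfs:
          size: 4294967296             # 4 GB
    deploy:
      placement:
        constraints:
          - node.labels.wxsim == true  # pin to a node that has the local scratch dir
```

The node label would be added to the chosen server(s) with docker node update --label-add, so the task always lands somewhere the local scratch directory exists.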

So I’m going back to hiding again whilst I wrap my head around some more new concepts. I’ll be watching though so be good :wink:

This is fantastic Chris, much more than I would delve into. I’m more of a Windows server person, as that’s what I do most days, so good luck with all of it.
Great to see that you are trying to improve the WXSIM outputs for all :+1:
Keep up the great work. :+1:

My hunch was correct. Switching from replicated disk to local disk has made a huge difference. My previous test run took over 100 minutes to complete. With local disk that’s down to 18 minutes! That’s even faster than the current live processing, although that’s sharing a server with the forum and other stuff so it’s not really a fair comparison. However, I’m feeling happy that I have a solution, without having to make significant code changes.

That was my professional world prior to retirement. Part of my role was IT Architect for a Windows server/desktop environment, and my (professional) Windows experience started way back in the 1980s, although I arrived at it from CP/M and DOS so it was a natural progression.

I supplemented that with my ‘amateur’ Linux activities from about 1993/4…my first dabblings were with Yggdrasil and Slackware Linux. I think I might still have a set of Slackware installation floppy disks around somewhere. Containers weren’t something I got involved with professionally though…I knew they existed but they didn’t suit our high-security environment at the time. That may be different these days, but it’s someone else’s problem now :slight_smile: Learning about them is good for keeping my brain active.

You sound a bit like me, having been doing computers etc. since the late ’80s. Most people nowadays don’t know what a floppy disk is, let alone a 5 1/4 inch disk, or even the reel-to-reel tape drives used for payroll etc.
I did try Linux quite a while ago but never got into it. Might give it a go once I retire fully and see what I can make of it.

It started to go downhill with those. 8 inch floppies were much better :stuck_out_tongue_closed_eyes:

2 Likes

[Illustration from a Linoterm phototypesetter manual, late ’70s, showing an 8" floppy]

I think the first computer I used 8" floppies on was a Cromemco System 3. That would have been in 1978 or 1979. I was also using paper tape for program loading/saving and punched cards for code/data input too.

It’s just taken me 36 hours to figure out how to fix a problem with my scripts. I have two scripts, A and B, where A downloads some data, then calls B to process the data, before returning to A to complete the updates. I split the scripts up because sometimes I need to use B on its own, e.g. when re-processing previously downloaded data during testing. This has worked fine for a long time, probably 10+ years.

After containerising things, all of a sudden it doesn’t work. If A calls B then B doesn’t complete, but if I run B on its own then it works fine. Lots of trials and tests later, I’ve worked out why this is happening…or at least I know the cause without knowing why some third-party code I’m using is behaving differently.

Some of you will know of MySQL. It’s a widely used database on Linux and one I’ve used for many things over the years. There are nice interface modules for Perl, PHP and other languages, which make it easy to use.

About 15 years ago MySQL was bought out by Oracle, and that kind of thing can upset users of the bought-out product, because Oracle sometimes places licensing demands on you that you don’t want, or starts to take the product in directions that suit them but not everyone else. So when the buy-out happened a fork (a copy of the code) was taken of MySQL. That new version was called MariaDB. In reality these days, if you think you’re using MySQL you are probably using MariaDB.

MariaDB has been kept pretty much in line with MySQL so far, so what works with one usually works with the other. That was what I found…I’ve actually been using MySQL modules to access my MariaDB databases and that’s worked well…until now!

My containerised scripts (copies of the un-containerised scripts) gave a strange error when they were first run. On investigation I discovered that the latest MySQL modules aren’t correctly recognising some versions of MariaDB databases. I don’t know if that’s an accident or deliberate, but if it’s an accident then it’s been a known issue for many months and you’d think someone would have fixed it by now.

Anyway, no matter, there’s a MariaDB module available that is functionally identical to the MySQL module…but it does recognise MariaDB databases. I’ve just found out that ‘functionally identical’ isn’t quite true though.

Going back to A and B…script A accesses the database, then when called script B also accesses the database. The two scripts are pretty much separate processes and each makes its own connection to the database, because it can’t use the connection set up by the other script. That’s worked for years…two connections existing side-by-side. Unfortunately, it appears that the MariaDB module doesn’t like two connections co-existing. So A runs OK, then B runs OK until it tries to access the database…when it gets told that there’s no data matching the query it’s using…which is odd because if I run B on its own the data is there.

It took ages to figure this out, mainly because I tend to look for problems in my own code first. The fix is relatively simple…I just close the database connection in ‘A’ before calling ‘B’, which opens and closes the database connection for itself, and then re-open the connection when processing returns to ‘A’.
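
In case it helps anyone hitting the same thing, the shape of the fix looks roughly like this. It’s only a sketch, assuming Perl DBI with the DBD::MariaDB driver; the DSN, credentials and script names are placeholders, not my actual code:

```
use strict;
use warnings;
use DBI;

my $dsn = "dbi:MariaDB:database=wxdata;host=127.0.0.1";   # placeholder DSN
my $dbh = DBI->connect($dsn, $ENV{DB_USER}, $ENV{DB_PASS}, { RaiseError => 1 });

# ... script A downloads the data and records it via $dbh ...

# Close A's connection before calling B, so B's own connection is the only one open
$dbh->disconnect;

system("/opt/scripts/script_b.pl") == 0
    or die "script B failed: exit code $?";

# Re-open A's connection once B has returned, and finish the updates
$dbh = DBI->connect($dsn, $ENV{DB_USER}, $ENV{DB_PASS}, { RaiseError => 1 });
# ... complete the updates ...
$dbh->disconnect;
```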

I don’t know if any readers use MySQL/MariaDB in the same way as I’m using it, but I thought it was worth mentioning this in case anyone experiences the same, seemingly inexplicable, issue!

2 Likes

Interesting…I haven’t completed any data checks yet, but it seems that the containerised ECMWF and GFS data processing scripts are much faster than the non-containerised versions! As an example, I’ve just done a test GFS run. The current production scripts take about 1 minute to download the data and 35 minutes to process it. The test script took 3 minutes to download the data (a bit slower) but only 6 minutes to process it!

At the moment I’m working out how to get the scripts running, so I’ve not done more than a cursory check that the data being produced is correct. That will come soon!