Early outage warning

As you may have seen elsewhere on the forum, I’m part-way through a significant server migration project: I’m moving to a containerised environment using Docker Swarm. This has been a major learning exercise for me, and the project has already been underway for a couple of months.

I’m migrating individual ‘sub-systems’ across to the new environment when I’m happy that they’re tested and will be stable. I’m currently working on the scripts and websites that provide GFS/ECMWF data for WxSim and I’m getting reasonably confident about the new system’s operation.

I have both the GFS and ECMWF download/processing scripts running full-time now, and they have run for about 4 days without any problems. I’ve done a number of tests to check that the old and new systems generate the same data, and I’m pleased to say that the output from both is identical. At this stage it’s also looking like the new scripts are a fair bit faster, although I don’t really understand why that would be. Apart from a few changes like database names and the connection method to the database, the scripts are identical. Perhaps it’s because the servers they’re running on are less loaded, as I’ve still got other migrations to complete? We’ll see what performance is like once I’ve completed all my migration work.
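There are several ways to do an old-versus-new output comparison like this; one simple approach is to hash every output file from both systems and compare the digests. The sketch below is a minimal illustration, assuming each system writes its output files into a directory tree. The directory layout and the function names `file_digest` and `compare_outputs` are hypothetical, not the structure of the actual scripts.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_outputs(old_dir: Path, new_dir: Path) -> list[str]:
    """List relative paths that exist on only one side or whose contents differ.

    An empty list means both directory trees hold byte-identical files.
    (Hypothetical helper; assumes outputs are plain files under each directory.)
    """
    old_files = {p.relative_to(old_dir) for p in old_dir.rglob("*") if p.is_file()}
    new_files = {p.relative_to(new_dir) for p in new_dir.rglob("*") if p.is_file()}
    problems = []
    for rel in sorted(old_files | new_files):
        if rel not in old_files:
            problems.append(f"only in new: {rel}")
        elif rel not in new_files:
            problems.append(f"only in old: {rel}")
        elif file_digest(old_dir / rel) != file_digest(new_dir / rel):
            problems.append(f"differs: {rel}")
    return problems
```

Calling `compare_outputs(Path(old_run_dir), Path(new_run_dir))` after both systems have processed the same GFS or ECMWF run would return an empty list when the outputs match, and otherwise name each file that is missing or different.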

The dashboard that shows current status/runtime averages is also working. I’ve been working on the access/data management scripts today, and I’m as happy as I can be that they are working correctly.

I say “as happy as I can be” because I’m not able to fully test them until it’s time to go live, and that’s the reason for this message. WxSimate can only point at one URL for data from my system, so I’ve no way to test it against the new system without putting the new system behind that URL…and if I did that, you’d all be running with test data!

So what I’m proposing is that, once I’m ready to migrate the WxSim data to the new servers, I’ll post an update here saying that there will be a period of ‘uncertainty’ about whether the system is working and whether the data is valid. Unfortunately, there’s no way to lock you out from using the system…WxSimate doesn’t have that ability, and in any case I need WxSimate to work so that I can complete my final testing. So once I start the migration, you might see errors in WxSimate about missing files, etc. Also, for a period, any data you download will be test data…by the point of migration I will be 99% sure that it’s good, but until I complete the final end-to-end tests I can’t rule out problems.

I don’t have a date/time in mind for the migration yet, but if things go well it will probably be in the next week. I just need to check that there won’t be any unintended issues when I switch the host.domain across to the new servers. I’ll post again when I know the date/time the move will take place. Of course, if post-migration testing turns up issues, I will switch back to the current system whilst I resolve them.

Nice to hear. You can test it without going live by having participants use their hosts file to point to the server IP. Just a thought.

Rob

Thanks for the suggestion but unfortunately it won’t work.

The connection to the container that runs the WxSim data web server uses a private network 10.x.x.x address. The connection from the Internet uses a Cloudflare secure tunnel, which only works using host+domain names. Essentially, when you ask for data from https://wxd.weather-watch.com you’ll be connecting to Cloudflare, and they then route the connection on to my servers using the secure tunnel. Secure tunnels are pretty good…the server firewall will eventually have no open ports to the outside world, so attacks from outside will be pretty limited. I also have a fully encrypted VPN-like connection from my laptop to the internal network that the servers are connected to (that network is also private, accessible just to me and my servers). It’s even possible for my servers to not have public IP addresses at all, although I’ve not tried that yet.

That also wouldn’t work in the current environment…the current server supports multiple domains, and there’s a reverse proxy installed on it that routes each incoming web request for a host+domain to the correct internal destination.
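The routing described above, where the proxy (or the tunnel) picks an internal destination from the requested host+domain name rather than from the IP address the client connected to, can be pictured as a lookup keyed on the HTTP `Host` header. This is a minimal sketch only: the routing table and the backend names `wxsim-data` and `other-site` are hypothetical, and real reverse proxies and Cloudflare tunnels do this in their own configuration rather than in Python.

```python
from typing import Optional

# Hypothetical routing table: requested hostname -> internal backend.
# Backend names are illustrative placeholders, not the real services.
ROUTES = {
    "wxd.weather-watch.com": "http://wxsim-data:80",
    "other.example.com": "http://other-site:80",
}


def pick_backend(host_header: Optional[str]) -> Optional[str]:
    """Return the internal backend for a request's Host header, or None if unrouted."""
    if not host_header:
        return None
    # The Host header may carry a port ("name:443"); match on the hostname only,
    # case-insensitively and ignoring a trailing dot.
    hostname = host_header.split(":", 1)[0].strip().lower().rstrip(".")
    return ROUTES.get(hostname)
```

With this kind of lookup, a request that names the expected host is forwarded to its backend, while a request that arrives with an unknown name or a bare IP address matches no rule and goes nowhere, which is why the destination is tied to the host+domain name rather than to an address.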

Testing is going well, with each GFS and ECMWF run being processed successfully by the new system.

I’ve also completed migrating another sub-system which is linked to the WxSim data processing, so that removes the last remaining dependency and fully enables the move of the WxSim processing to proceed.

I’m currently in discussions with Tom about the timing of the migration, but I’m hoping it will be later in the week.

Sorry for the last-minute warning…I’m going to flip the live downloads to the new server for the next 15 minutes so that I can do a test that is impossible to do otherwise. I will switch back to the current server as quickly as possible whilst I analyse the results of the test. The data on the new server is as up-to-date as on the current server, so if you download without errors, I’m confident that you will have good data.

The test went very well. WxSimate was able to download from the new server without error and WxSim ran and generated a forecast that looks reasonable.

I’m now going to get ready for the full switch-over, which will probably happen around 3pm (UK) this afternoon.

The WxSim data transfer is about to start. The system will be unstable for up to 15 minutes as DNS sorts itself out but after that things should be back to normal.

The new servers are live for WxSim data. Downloading is working for me, and WxSim has successfully completed its first run.

If you’re having post-upgrade problems, please post in this thread.

The only potential problem could be for people who are using very old versions of WxSim/WxSimate, e.g. running on XP. These might not be compatible with the new server; we’ve not been able to test this. If you can upgrade to the latest version, please do when you get a chance. If you’re unable to upgrade, then please switch to using Bohler data for GFS, which hasn’t changed.

My 3pm run seems to have run OK and got the 06z data for both GFS and ECMWF.
Will keep an eye on it.