Forum problems - 5th July

I think an upgrade has broken the forum. I’m working on recovering things and I think I’m getting there. If the forum is behaving weirdly for you, try emptying your browser cache, closing your browser forum tab and then opening the forum in a new tab.

A lot of bits of the forum are currently disabled so it might look a bit different to normal. I have to go on a Dad’s Taxi run now, but I’ll be back to look further into this later on.

I think I hit a perfect storm of confusion!

A couple of days ago there was a problem after migrating the forum. I assumed it was related to the migration process, although I’d used it successfully before. So we moved back to the ‘old’ Discourse forum (the one we’re using now) whilst I investigated what had gone wrong.

Then today, the ‘old’ forum had exactly the same problem as the migrated forum. From what I know now…

  1. When you migrate a forum, you build a new empty forum first and then restore the last backup from the old forum into it. The process of creating a new forum automatically grabs the very latest released code, so at this point the new forum was running a slightly newer version than the old forum server.
  2. At around the same time Cloudflare added an option to easily enable various performance enhancing settings. I looked at the options and they seemed reasonable so I enabled them.
  3. Something - forum software, forum theme, forum plugin or forum theme component - decided it didn’t like to play ball and started producing weird output.
  4. We moved back to the old forum and the problems went away…
  5. Until today when the old forum prompted me saying that there was a critical update to install. I installed it and things seemed OK…at first
  6. After a while things started to go wrong, just like (3) above.

So my current best guess is that the migration was actually successful and the fault lies with Cloudflare settings and/or forum base code or ‘addons’ and/or updated forum base code/‘addons’.

I’m starting to slowly re-enable things, a few at a time, to see if/when things break. Hopefully I’ll find a miscreant and allow the developers to figure out what’s wrong.

For now, if forum output goes weird for you, wait a little while, clear your browser cache and re-access the forum in a new browser tab. Hopefully that will get you back to a working version.

Having thought about this some more I think I understand what’s happened.

When I turned on some additional Cloudflare performance enhancing settings, nothing happened. Why’s that? Well the key option was auto-minify, which shrinks JS, CSS and HTML by removing unnecessary stuff from the files, kind of like compressing them. That sometimes breaks websites and there’s a warning about this, although not on the page where I changed the settings.

I have Cloudflare configured to cache files for 4 hours. So nothing changed for that period…and after 4 hours when the cache expired everyone started to get the minified (broken) files. That’s why it seemed fine when I tested the change, but broke at a seemingly random time later.

With the current (old) forum server, the files didn’t change so they were held in the cache for days. Then when I applied the updates today I set the 4 hour clock ticking and then things fell apart when the clock expired.
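As a rough illustration of that delayed breakage, here is a toy Python sketch of a single-file cache with a 4 hour TTL. It is not how Cloudflare is actually implemented; the `EdgeCache` class, the `timeline` function and the "good"/"minified" labels are all made up for the example. The cache is filled with a good file just before the setting is changed, and every later fetch would produce the minified file.

```python
from dataclasses import dataclass
from typing import List, Optional

HOUR = 3600
TTL_SECONDS = 4 * HOUR  # the 4 hour cache period described above


@dataclass
class EdgeCache:
    """Toy single-file cache with a fixed TTL (hypothetical, for illustration)."""
    ttl: int = TTL_SECONDS
    body: Optional[str] = None
    fetched_at: int = 0

    def get(self, now: int, fresh_body: str) -> str:
        # Refetch only when nothing is cached or the cached copy has expired.
        if self.body is None or now - self.fetched_at >= self.ttl:
            self.body = fresh_body
            self.fetched_at = now
        return self.body


def timeline() -> List[str]:
    cache = EdgeCache()
    # t = 0: the cache is filled with a good file just before minify is enabled.
    seen = [cache.get(now=0, fresh_body="good")]
    # From now on a fresh fetch would give the minified (broken) file,
    # but the cached good copy keeps being served until the TTL runs out.
    for hours in (1, 2, 3, 4):
        seen.append(cache.get(now=hours * HOUR, fresh_body="minified"))
    return seen


print(timeline())
```

In this sketch the first four requests still get the good file and only the request at the 4 hour mark gets the minified one, which matches the "worked when tested, broke later" pattern.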

Auto-minify is disabled and the Cloudflare cache has also been emptied, so what you’re getting now is the un-minified files which work fine.

I have (I think) all the plugins, themes and theme components enabled now and things seem to be OK. I’ll only know for sure after 4 hours, but as I think I understand the mechanism that broke things I feel more confident that we’ll pass 4 hours without a problem. Famous last words!

Four hours have passed and we’re still here so I think we’re back to normal.

I take it at some point you will try moving it to the new server

The current visitors were only visible in FF.
For Chrome I had to use right click inspect and then “clear cache and force reload”
For Safari one needs to install the develop extension, but version 13 of Safari is needed for that. I can not test that with my gear.

The 4 hours seems to be too short and for now after 24 hours Safari does not show current visitors.
Regards,
Wim

Yes. Now that the current server has been returned to normal I will get back to testing out the migration again and when I’m happy it’s going to work well I’ll let you all know when the outage will be.

I’ve tested “Who’s Online” with:

  • Chrome - working OK after clearing the browser cache
  • Firefox - working OK after clearing the browser cache
  • Opera - working OK after clearing the browser cache
  • Edge - worked first time

I’d recently accessed the forum using Chrome, Firefox and Opera which is why I had to clear their caches. I hadn’t used Edge to access the forum for some time so the cache would have been stale and therefore refreshed automatically.

4 hours is what Cloudflare suggests to browsers as a TTL (time-to-live) for the browser cache. It is only a suggestion. Browsers are free to choose how long they cache pages/page components for. Unfortunately, Safari is no longer available for Windows so I’ve no way to test it on my equipment and I can’t explain why it is retaining cached files for longer than is suggested, nor why Apple don’t provide a way to clear the cache. The Cloudflare cache was cleared of incorrectly minified files yesterday, so from that point it’s up to browsers to pick up the latest files, which should be at most after 4 hours. I don’t know of anything I can do to force browsers to refresh their caches. It seems odd that Safari has a >24 hour cache time though.

For general info for everyone…

There are (at least) two levels of caching taking place when you access the forum.

Cloudflare cache - The Cloudflare system grabs copies of files and for slow changing files, e.g. JavaScript and CSS, it keeps a copy in its ‘global’ cache to be provided when browsers access the forum. This speeds things up a little because you’re not relying on the network link directly to my server. For example, if a lot of people are accessing the forum much of the data will be sent from the very fast Cloudflare cache/network rather than relying on the direct link across the Internet to my forum (which does have a 1Gbps network connection and a very quick link to the Internet).

Browser cache - Your browser almost certainly (I think pretty much all browsers do caching) caches files downloaded from the server (actually what Cloudflare sends to you). It’s probably caching the same kind of files as Cloudflare do, e.g. JavaScript and CSS. If a file is present in your local cache then the browser can display it from disk rather than having to download it.

Obviously caching can save bandwidth and improve page display times, but there’s an obvious downside: if files do change then it will take some time for your display to change, because you may be displaying from your local browser cache or the Cloudflare cache. There are various techniques used to avoid this. For example, fast changing files often have some form of ‘cache-busting’ mechanism built in, e.g. they may be given a subtly different name each time you access them, which makes both Cloudflare and your browser believe it’s a new file and always download it. Or the server can issue a “don’t cache this page” flag. Caches are free to ignore these flags, but that would make fast changing sites not work very well. The forum index page uses one of these techniques to make sure you always see the latest posts when you access it.
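To make ‘cache-busting’ a bit more concrete, here is a minimal Python sketch of one common variant: putting a short hash of the file contents into the file name, so that any change to the file produces a new name that no cache has seen before. This is only an illustration. The `busted_name` function is hypothetical and I’m not claiming it is the mechanism the forum software itself uses.

```python
import hashlib
from pathlib import PurePosixPath


def busted_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Return a name like 'app-<hash>.js' derived from the file contents.

    Hypothetical helper: same content gives the same name (so it can be
    cached for a long time), changed content gives a different name (so
    caches treat it as a brand new file).
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return f"{path.stem}-{digest}{path.suffix}"


old = busted_name("app.js", b"console.log('v1');")
new = busted_name("app.js", b"console.log('v2');")
print(old, new, old != new)
```

The page that references the file is regenerated with the new name, so browsers and Cloudflare request it as if it were a file they had never cached.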

The Cloudflare and browser caches work automatically, each deciding when to try to get new data. However…

Your browser might decide it wants a new copy of a file, e.g. because its local cache timeout has triggered, or you’ve cleared the cache. If it requests the file again then it will just get what Cloudflare has in its cache. “You” can’t override Cloudflare. So if your browser cache is 1 hour and Cloudflare is 4 hours, then 3 out of 4 times your browser is going to download the same file until Cloudflare updates its own cache and sends you a new file (if there is one).
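The “3 out of 4” figure can be checked with a small Python sketch. It assumes a browser that re-requests the file every hour, a Cloudflare-style cache that refreshes every 4 hours, and (purely for the example) a file on the server that changes every hour. The function name and parameters are made up for illustration.

```python
from typing import Optional, Tuple


def count_new_files(hours: int = 8, browser_ttl: int = 1, edge_ttl: int = 4) -> Tuple[int, int]:
    """Return (downloads that were actually new, total browser downloads)."""
    edge_version: Optional[int] = None
    edge_fetched_at: Optional[int] = None
    last_seen: Optional[int] = None
    new_downloads = 0
    total_downloads = 0

    for hour in range(0, hours, browser_ttl):
        server_version = hour  # assumption: the file changes every hour
        # The edge cache only goes back to the server when its own TTL expires.
        if edge_fetched_at is None or hour - edge_fetched_at >= edge_ttl:
            edge_version = server_version
            edge_fetched_at = hour
        # The browser always gets whatever the edge cache currently holds.
        total_downloads += 1
        if edge_version != last_seen:
            new_downloads += 1
            last_seen = edge_version

    return new_downloads, total_downloads


print(count_new_files())
```

Over 8 hours this gives 2 genuinely new files out of 8 downloads, i.e. 6 of the 8 downloads (3 out of 4) were the same file again.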

Cloudflare cache updates files after 4 hours (as I currently have it configured). However, I have the option to do a manual purge of the cache. After a purge, Cloudflare downloads a fresh copy of each file directly from my server the next time it is requested, to refill its own cache, and then sends that file to the browser requesting it. If your browser doesn’t request the file then it won’t get the latest file. This is why, to solve this recent problem, I had to first purge the Cloudflare cache so that good versions of the files were available to be downloaded AND then you had to clear your browser caches so that your browser would request new copies of all files and therefore download the good files from Cloudflare.
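Here is a toy Python sketch of why both steps were needed. It models two cache tiers that never expire on their own (to keep it short), with the browser tier sitting in front of a Cloudflare-style tier. The `Tier` class, the `demo` function and the "broken"/"good" labels are all hypothetical, and `fresh` simply stands for whatever a brand new fetch would return.

```python
from typing import Callable, Dict, List


class Tier:
    """Toy cache tier: keeps a file until it is explicitly cleared."""

    def __init__(self, upstream: Callable[[str], str]) -> None:
        self.upstream = upstream  # where to fetch from on a cache miss
        self.store: Dict[str, str] = {}

    def get(self, name: str) -> str:
        if name not in self.store:
            self.store[name] = self.upstream(name)
        return self.store[name]

    def clear(self) -> None:
        self.store.clear()


def demo() -> List[str]:
    fresh = {"app.js": "broken"}
    edge = Tier(lambda name: fresh[name])  # stands in for the Cloudflare cache
    browser = Tier(edge.get)               # stands in for a browser cache

    browser.get("app.js")        # both tiers now hold the broken copy
    fresh["app.js"] = "good"     # the fix: a new fetch would now be good

    seen = [browser.get("app.js")]       # still served from the browser cache
    browser.clear()
    seen.append(browser.get("app.js"))   # the edge tier still has the old copy
    edge.clear()
    seen.append(browser.get("app.js"))   # the browser re-cached the old copy above
    browser.clear()
    seen.append(browser.get("app.js"))   # both tiers refreshed
    return seen


print(demo())
```

Only the last request returns the good file: clearing the browser cache alone just re-downloads the stale copy from the upstream cache, and purging the upstream cache alone does nothing until the browser asks again.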

I said there are ‘at least’ two levels of caching. Some ISPs also provide (or used to anyway) caches within their networks. If an ISP is serving a few million users in a country, an ISP cache could save them bandwidth, e.g. if a million people all access news sites trying to find out what’s happening in a general election in the country, then if the ISP has the files locally they don’t need to download them from the news site at every access. This can make their own Internet connections faster. I had experience of one of these in the past and it wasn’t good. You can clear your browser cache, but if the ISP cache is doggedly holding onto a file that has been updated there’s nothing you can do to force a refresh.

I’ve got a migrated forum test site up and running. It’s working OK at the moment but given previous experience I want to wait a few hours before I’m completely confident that it’s OK. If things still look OK this evening then I’ll probably do the live forum migration sometime tomorrow (Sunday).

The test migrated forum has survived intact past the 4 hour mark so I’m happy that the live migration will succeed this time (as it would have last time if it weren’t for external factors). I’m not sure of our plans for tomorrow because they will depend on the weather, but I’ll probably be migrating at some point during the day. It’s a lot faster now, so the forum should only be down for about 45 minutes.

2nd time I have seen this screen. A minute later I was able to login to the forum.

2024-07-17T04:00:00Z

Apologies for that. I had to restart docker to apply a configuration change. That should only take a matter of seconds. Should…if you don’t make a typing mistake in the settings. It took a moment or two to find and fix the problem so the forum was offline for that time.

Unfortunately, the forum software is the only docker software I’m running that can’t switch servers if the one it’s running on isn’t available for some reason. There’s no way around that without moving to an unofficial build mechanism. That means no support and it can also be tricky applying updates.

No worries Chris, just wanted to pass along in case it might help. Thanks again for hosting this site for all of us.
