McMahon GFS 2024011200 z Missing - Last was 2024011118 z

McMahon - Last Good GFS was ( 2024011118 z ) and Last Should Be ( 2024011200 z )

Note: this was Checked at 2024-01-12 06:06 UTC.

It is possible that the Data was running Late and may now be available.

Note: this is an automated Post, check data to ensure it is correct!

Kindest Regards,

Tony
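For anyone wondering how a "Last Should Be" value like the one above can be worked out, here is a minimal sketch. It is not the script behind the automated post. It assumes GFS cycles start every 6 hours (00, 06, 12, 18 UTC) and that a run should be downloadable a fixed time after its cycle hour; the 5 hour `publish_delay` and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CYCLE_HOURS = 6  # GFS cycles start at 00, 06, 12 and 18 UTC


def expected_cycle(checked_at, publish_delay=timedelta(hours=5)):
    """Return the newest cycle (YYYYMMDDHH) that should exist at checked_at.

    publish_delay is an assumed allowance for the run to be produced
    and downloaded; it is not taken from the original post.
    """
    ready = checked_at - publish_delay
    # Round down to the most recent cycle hour.
    cycle_hour = ready.hour - ready.hour % CYCLE_HOURS
    cycle = ready.replace(hour=cycle_hour, minute=0, second=0, microsecond=0)
    return cycle.strftime("%Y%m%d%H")


def previous_cycle(cycle_id):
    """Return the cycle that precedes cycle_id, in the same format."""
    t = datetime.strptime(cycle_id, "%Y%m%d%H") - timedelta(hours=CYCLE_HOURS)
    return t.strftime("%Y%m%d%H")


checked = datetime(2024, 1, 12, 6, 6, tzinfo=timezone.utc)
should_be = expected_cycle(checked)
print(f"Last Should Be ( {should_be} z ), one before is ( {previous_cycle(should_be)} z )")
```

With the check time from the post (2024-01-12 06:06 UTC) this gives 2024011200 as the expected run and 2024011118 as the one before it, which matches the alert.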

00z GFS data is now available.

I think there’s a network issue because I’ve had a notification from a network monitor that I’ve downloaded a lot more data than usual in the last few hours. That usually triggers when there have been multiple failed downloads of GFS and/or ECMWF data.

I have a very busy day ahead today, so I’m not sure when I’ll get a chance to look at this.

It’s not a network issue. It’s another RAID disk check. I’ll have to find out why these are triggering. It could be a disk issue, although the SMART data isn’t suggesting any problems at the moment.

Off to start my busy day now.

Looks like the monthly array check has kicked in and triggered it, same as last month.

My theory about the reason for the check last time doesn’t appear to have been correct. Finding the cause will be tricky though, because it will be (might be!) hidden in a log from somewhere in the last month. That’s a lot of log entries to look at!

I’ve managed to find a little time to do some debugging. So far what I know is…

  1. This is only a check (so far). Unlike last time, no corruptions have been found. That’s what I expected because I’ve not had any unexpected outages since the last RAID fix.
  2. I’m trying to tune the checking so it has less impact on the system. I think last time I tuned the checking performance up too high to try to get it to finish sooner. As there’s nothing to fix, a slower check rate works, although it will take longer to finish. The changes I’ve just made have dropped the system’s 1-minute load average from over 3 to about 1.3. This should make the system more responsive and allow the GFS/ECMWF processes to take less time to run.
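The thread doesn’t say how the check rate was changed. As a rough sketch only, assuming the array is Linux md software RAID, the ceiling for check/resync speed lives in `/proc/sys/dev/raid/speed_limit_max` (in KiB/s), and lowering it slows the check down. The helper names below are hypothetical, and writing to the real file needs root.

```python
from pathlib import Path

# Assumption: Linux md software RAID. This file holds the upper bound
# for check/resync speed in KiB/s.
SPEED_LIMIT_MAX = Path("/proc/sys/dev/raid/speed_limit_max")


def read_limit(path=SPEED_LIMIT_MAX):
    """Return the current speed ceiling in KiB/s."""
    return int(Path(path).read_text().strip())


def write_limit(kib_per_sec, path=SPEED_LIMIT_MAX):
    """Set a new speed ceiling in KiB/s (needs root for the real file)."""
    if kib_per_sec <= 0:
        raise ValueError("speed limit must be a positive number of KiB/s")
    Path(path).write_text(f"{kib_per_sec}\n")


if __name__ == "__main__" and SPEED_LIMIT_MAX.exists():
    # Only reads the value; call write_limit() deliberately, as root.
    print(f"current ceiling: {read_limit()} KiB/s")
```

A lower ceiling leaves more disk bandwidth for the GFS/ECMWF jobs at the cost of a longer check, which is the trade-off described in point 2.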

I’m off on my travels again so won’t be able to check again until later this evening, or maybe tomorrow morning.

It looks like performance is better now. The latest GFS run was 15 minutes later than the average and the ECMWF run was only 10 minutes later than the average. I might try slowing the check down a little more, but it’s already going to take about 36 hours to complete (it’s checking 4 partitions adding up to 8TB of RAID1 so there’s a lot of data to check).
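To put the 36 hour figure in context, here is a quick back-of-envelope calculation. It assumes "8TB" means 8 × 10^12 bytes in total and that the check runs at a steady rate across all four partitions; both are assumptions, and the function names are just for illustration.

```python
def check_rate_mb_per_s(total_bytes, hours):
    """Average rate in MB/s needed to cover total_bytes in the given hours."""
    return total_bytes / (hours * 3600) / 1e6


def check_hours(total_bytes, mb_per_s):
    """Hours needed to cover total_bytes at a steady rate in MB/s."""
    return total_bytes / (mb_per_s * 1e6) / 3600


TOTAL_BYTES = 8e12  # assumption: 8 TB, decimal units

rate = check_rate_mb_per_s(TOTAL_BYTES, 36)
print(f"36 h for 8 TB is about {rate:.1f} MB/s")
print(f"halving the rate would take about {check_hours(TOTAL_BYTES, rate / 2):.0f} h")
```

That works out to roughly 62 MB/s on average, so slowing the check to half that rate would push completion out to about 72 hours.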

I’ve not had a chance to check whether the RAID check has completed, but the GFS and ECMWF runs seem to be completing in line with the long-term average completion times again.