"Weather-Display" has stopped responding - Do you want to close or wait

Version 10.37S143 on Win 10 Pro, running on data.csv from Virtual Weather Station on a second weather PC. The VWS station runs continuously, and the VWS data.csv file is linked to the Weather Display directory on this PC for instant updates and access. The system has worked flawlessly for more than a year, BUT…

Beginning about a month ago, I noticed that the Weather Display PC shows a dialog: [Weather Display has stopped responding. Do you want to close the program or wait for it to continue?] WD Watchdog 1.2 is running. Even if I click to close the program, nothing happens; WD is still running. What I do see is that for a variable period of 4-10 hours, usually beginning after 9pm, WD freezes the graphs of wind data (speed, direction), but all other parameters continue to be graphed as though nothing were wrong (see image).

Weather Display is pretty much the only program running on this PC, which has an i5-6400T CPU, 16 GB RAM and a 500 GB SSD that is less than 30% full. The duration of this "pause" or freeze is variable, as I mentioned, and it does not occur every night. I don’t even know where to begin troubleshooting this one!


It looks to me like all the data stops, not just the wind data. The only data still registering on the graph is whatever sensor the dark red line is for and everything else is flat-lined.
Is this one sensor on the WD PC or part of the VWS PC?

If this occurs at roughly the same time each day, what else is happening at that time? Look at both the WD PC and your VWS PC.
How are the two PCs connected? If it’s via a LAN, is there anything happening on the router or elsewhere on the LAN that may use all the bandwidth or memory?

The fact that WD has continued to record the data on the graphs says that it was running throughout, it just wasn’t getting any data, so it just graphs the last figure until new data is received.

Have you had a look at the logfile at the same time to see what has been logged? If all the log entries are the same then WD was running, it just didn’t get new data.

I would also check the power save settings, in case the WD PC is shutting down some ports. This used to happen with USB dataloggers, where the default power save function would close the USB ports, thinking there was no activity, when in fact there was, and data flow would stop.


Budgie - I see you are correct - all the data in the graphs stops. Apologies for the delay in replying - my house is at last crawling with contractors replacing fascia damaged by the old leaky roof, now that the new roof is finally finished.

To answer some of your questions: the LAN is gigabit shielded Cat 7 with clients all connected through a TP-Link SG2424 switch. The VWS PC is Win 7 (because there are programs on it I cannot upgrade; the owners quit issuing licenses / went oob). Its CPU only ever gets above 50% when rebooting, which is gross overkill for running VWS and GRL3. I double-checked the backup schedule (Reflect): backups are set to run in volume shadow copy mode at 2:00 am, so I doubt that is locking anything, and the whole backup takes only 20 min on avg. anyway. I double-checked the ethernet card settings just because it has been so long and did find a primary DNS server that is no longer in service, but the current DNS server for the LAN was #2 in the list, so it has been doing its job. Anyway, I cleaned all that up just because.

My LAN is connected to Internet via Asus GT-AX11000 and a Motorola MB8600 - both of which are frighteningly stable. But even when the cable modem goes offline for a firmware update, all LAN clients still see and remain connected with each other. The Asus logs show nothing like conflicting IP addresses or disconnects - all clients use DHCP - with the two wx pcs having reserved IPs in the router.

Turning to the logfile for this month: I did find the “dropout period” in the data, so my next task is to pull the VWS DATA.CSV (big file) from the VWS PC and examine whether this might just be a very old known glitch in VWS. My station is an old Texas Weather Inst WR-25 from the mid 1990’s, connected via RS-232 serial to VWS. VWS never had a method to sync the WR-25 clock with the PC clock, and Tx Wx never had an automated way to do this either. So the clock in the WR-25 tends to gain about 1 min 30 sec per month, and whenever its time and the PC/VWS time get more than 3 min apart, VWS used to freak out and write files with dates like 2022-13-59-0000.csv, completely intermixing the days, hours and minutes with the month field. Whenever that happened, proper writing to the master dbase.csv file would cease. It was always easy enough to fix, and has become pretty rare, as I now tend to watch for the time difference and reset the WR-25 to the PC time a couple of times each month, then restart VWS so it can compact its database.

What all this long explanation leads up to is that it will take me a day or so to go through the VWS database files to see if they contain any “killer datetime entries” and, if so, purge those.

And, finally, everything on my lan is set to never sleep never hibernate - always heat the house, so I doubt ports shutting down is an issue in play here.

Thanks so much for pointing me to the right path!

Steve

That’s my graph, Dan! #-o :lol:

Steve,
It’ll be interesting to see if you find anything in VWS DATA.CSV file which matches the dropout period in WD, keep us posted. :wink:

Whoops! :oops: :oops:

UPDATE - I thoroughly scoured the c:\VWS directory and its children on the VWS Win 7 machine and found there were indeed two “killer BIN files” in VWS\setup\summary and \summary2 (which seems to be just a second copy of the working summary directory). These are always pretty easy to spot by their filenames, which are future dates, sometimes a century into the future.
In this case the files were 2022_12.bin and 2022_12_31.bin - neither of which should have been generated until 23:59:59 on December 31, 2022. In previous versions of VWS the presence of these files not only indicated that VWS had suffered a file writing seizure, but their mere existence halted data recording until real time caught up with the future. I discovered this behavior more than ten years ago and it never was fully purged from VWS. The only solution was to delete the bad BIN files and edit the master dbase.csv file to remove the single wrong row containing the corrupted RECDATE; once that was done, VWS immediately resumed recording data correctly until the next glitch.
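For anyone wanting to automate the hunt for these files, here is a minimal Python sketch. It assumes the summary files are named YYYY_MM.bin (monthly) or YYYY_MM_DD.bin (daily), which is inferred only from the two filenames mentioned above; the function names are made up for illustration and are not part of VWS or WD.

```python
import re
from datetime import date
from pathlib import Path

# ASSUMPTION: naming pattern inferred from the two examples in the post,
# YYYY_MM.bin (monthly) and YYYY_MM_DD.bin (daily).
BIN_NAME = re.compile(r"^(\d{4})_(\d{1,2})(?:_(\d{1,2}))?\.bin$", re.IGNORECASE)


def is_future_bin(name, today):
    """Return True if the file name encodes a month or day after `today`,
    or a date that cannot exist (for example month 13)."""
    match = BIN_NAME.match(name)
    if not match:
        return False  # not a summary file this sketch recognises
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        return True
    if match.group(3) is None:
        # Monthly file: only flag months that have not started yet.
        return (year, month) > (today.year, today.month)
    try:
        return date(year, month, int(match.group(3))) > today
    except ValueError:
        return True  # impossible day such as 2022_02_31


def find_future_bins(root, today=None):
    """Walk `root` recursively and list the suspicious .bin files."""
    today = today or date.today()
    return sorted(
        path for path in Path(root).rglob("*.bin")
        if is_future_bin(path.name, today)
    )
```

Something like `find_future_bins(r"C:\VWS\setup")` would then list candidates to inspect by hand before deleting anything. The monthly rule is deliberately conservative: it does not flag a file for the current month, even though such a file arguably should not exist until month end.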

BUT in the very last version of VWS, v15.00 p05, the behavior changed a little. The existence of the killer BIN files does not stop data being recorded to the dbase.csv file, nor is the bad RECDATE row written to that master file. VWS continues to generate good BIN files in the setup\summary and setup\summary2 directories, but it behaves unstably. The program may freeze randomly (and gets killed and restarted by StartWatch).

In my setup of WD, I have WD pulling data from a different CSV than the main DBASE.CSV (which is updated only each minute). Instead I use a DATA.CSV generated every 2 sec by VWS and containing only one row of complete observations, which is overwritten at the next 2-sec update. This has the effect of mimicking near real-time displays on the WD main page, and also doesn’t allow a second program (WD) to muck about with the main VWS data file, DBASE.CSV. My ultimate plan is to move the weather station over to the WD machine and let WD collect the raw data, and then feed that backwards to VWS to generate its graphics and VWSql updates, but not until I have proven the WD PC reliable over a year’s time.
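Since DATA.CSV is rewritten every 2 seconds, one way to tell "VWS stopped writing" apart from "WD stopped reading" is to watch the file's modification time from the WD side. The following is only a sketch in Python: the 10-second threshold and the function names are my own assumptions, and modification times seen across a network share may lag a little.

```python
import os
import time


def seconds_since_update(path, now=None):
    """Age of the file's last modification, in seconds."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_stale(path, max_age_s=10.0, now=None):
    """True if the file is missing or has not been rewritten within
    `max_age_s` seconds (ASSUMED threshold: five missed 2-sec updates)."""
    try:
        return seconds_since_update(path, now) > max_age_s
    except OSError:
        return True  # file missing or share unreachable
```

Calling `is_stale()` on the linked DATA.CSV once a minute and logging the result with a timestamp would show whether the file itself went quiet during a dropout, or whether it kept updating while WD sat idle.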

So, to the problem of WD randomly freezing - thus far it has not happened for 72 hrs following my discovery of a bad DNS Server entry on the VWS Machine and the removal of the two killer bin files from that machine.

Apologies for the very long-winded and detailed explanation here, but there are more than a few of us who are slowly evolving from VWS into WD, so I hope my experience might help someone else down the road - and thanks again to Budgie for helping to map the way to ferret this out!

-Steve

UPDATE - WD froze again last night or afternoon, from approx. 1530-2030.

The only thing I have found is in the Windows Application Log: Event ID 1000, with the details below.

Faulting application name: WeatherDisplay.exe, version: 1.40.0.0, time stamp: 0x62aa1ce6
Faulting module name: gdi32full.dll, version: 10.0.19041.1826, time stamp: 0x8296c8df
Exception code: 0xc0000005
Fault offset: 0x00053bac
Faulting process id: 0x3ca8
Faulting application start time: 0x01d8a374e59e140b
Faulting application path: C:\wdtwi\wdisplay\WeatherDisplay.exe
Faulting module path: C:\WINDOWS\System32\gdi32full.dll
Report Id: 1807db2b-dde4-4785-b377-970b29f9a4f7
Faulting package full name:
Faulting package-relative application ID:
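
A note for anyone decoding an entry like this: exception code 0xc0000005 is an access violation, and the "Faulting application start time" looks like a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC) written in hex. Assuming that reading is right, a short Python sketch can turn it into a readable date showing when this WD instance was started; the function name is just for illustration.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100 ns ticks from this instant.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(value):
    """Convert a FILETIME, given as an int or a hex string, to a UTC datetime."""
    ticks = int(value, 16) if isinstance(value, str) else value
    # 10 ticks per microsecond; sub-microsecond precision is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# Value copied from the event details above.
started = filetime_to_utc("0x01d8a374e59e140b")
print(started.isoformat())
```

Comparing that start time with the time of the event gives the uptime of WD before the fault, which may help spot whether crashes follow long runs.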

Checking the VWS machine logs - no errors, no “killer bin files” and the master dbase.csv is not missing any of its one minute interval entries. (Recall that WD is actually using a different CSV generated by VWS every 2 sec called DATA.CSV that contains only a single line csv of the last readings at the 2 sec sample interval.)

The graphics “card” in the WD PC is built into the CPU (i5-6400T), running the latest Intel driver for the Intel HD Graphics 530 (31.0.101.2111, 7/19/2022).

Additionally, a second instance of cronweatherflow was spawned during/after this freeze so there are now two running. I use the weather flow for lightning and solar.

So, sadly the gremlin still lurks doing mischief here #-o

If you need me to upload log files or the like - please refresh my 8-bit mind how to do that from WD - Thanks for any help – Steve

EDIT - I just completed SFC /SCANNOW on the Win 10 WD PC and it found no errors, and DISM /Online /Cleanup-Image /ScanHealth found no component store corruption either, so it looks like the Win 10 Pro installation is all nice and clean.

Just to be clear, was this actually WD freezing (program showing “Not Responding” or not being able to use the menus and required a restart of WD or the PC to correct it) or was WD running still but not receiving data from the VWS PC again?
Does the logfile for August show a gap in the data during the freeze, or is it filled with repeated data?

It shows as a Windows dialog box saying the program Weather Display has stopped responding, and the button choices are to cancel or close the program. However, the dials (windspeed and direction) are still updating, and it is possible to click in the WD main screen or menu to close and then restart the program. BUT I have yet to be in front of the machine when it first throws the error, so I cannot say whether WD is frozen or hung during that time. Usually it stops getting data for a few hours and then just as spontaneously resumes, so the graphs plot all straight horizontal lines from the last acquired data point to the first new data point.

VWS is generating the DATA.csv file each 2 seconds, BUT the WD log file shows the last line of data, and then the next line is recorded into the log at the time the graphs resume - So it is as though WD is hung trying to read the DATA.csv file during this time and then finally (after several hours) is able once again to pull data from it. So it is not filled with repeated data, it just has a time gap.
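To pin down exactly when each dropout starts and ends, the log can be scanned for jumps in time between consecutive rows. Below is a rough Python sketch. It ASSUMES each data row begins with five whitespace-separated integers (day, month, year, hour, minute) and that rows are normally one minute apart; check that against your own logfile before trusting it. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_log_time(line):
    """Parse the timestamp from the start of a log row.
    ASSUMPTION: the first five fields are day month year hour minute."""
    day, month, year, hour, minute = (int(x) for x in line.split()[:5])
    return datetime(year, month, day, hour, minute)


def find_gaps(lines, expected=timedelta(minutes=1), tolerance=timedelta(minutes=1)):
    """Return (last_seen, resumed) pairs where two consecutive rows are
    further apart than `expected + tolerance`."""
    gaps = []
    previous = None
    for line in lines:
        try:
            current = parse_log_time(line)
        except ValueError:
            continue  # header, blank or malformed row
        if previous is not None and current - previous > expected + tolerance:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Feeding it the lines of the monthly logfile (for example `find_gaps(open(path))`, where `path` is wherever your log lives) yields the start and end of every time gap, which can then be lined up against the Windows event log and the VWS files.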

I had VWS generate this separate DATA.CSV file because it can do this every 2 seconds (mimicking realtime display) and because no other process (VWSql, VWSAPRS, for example) uses this DATA.CSV file. Only WD.

I will try to look for any other clues next time it happens, but I am stumped for now. And of course as I reply to you everything is humming along like “Nothing to see here - please move along!”

UPDATE 8-21-2022 - Additional info - The win 7 VWS pc windows logs show no error or warning ID events during this episode. The win 10 WD pc however shows LOTS of Event ID 36871 “A fatal error occurred while creating a TLS Client credential. The internal error state is 1003.”
These events have no real rhyme or reason to them except that once they start, they accumulate rapidly, at intervals of anywhere from 2 sec to 10 minutes, then mysteriously cease for a few hours before happening again. I have applied all the Microsoft-suggested fixes for this problem, which included using only TLS 1.2 and setting Registry entries to force .NET to use TLS 1.2. This problem has been widely reported following system updates from Microsoft dating back many months, so I am not at all sure it has anything to do with the “WD has stopped working” message.

Attached is the full screen snip of the event. Notice that this time, exactly when the event occurred is difficult to determine from the graphs, as they appear to have continued recording despite the dialog popping up - i.e. I don’t see any areas where ALL channels are flatlined. The only real evidence that WD suffered a crash is the presence of TWO ChronWF instances now running in the notification area of the taskbar. Hitting the Close button on the Windows dialog box has no perceptible effect. Force closing either of the ChronWF instances also does not seem to affect WD receiving lightning data and solar data.
The WD Watch Dog app is running, which likely accounts for WD not closing when the dialog box Close button is pressed.

Any Windows sleuths out there? Recall I have run both sfc /scannow and the DISM health check, and neither reported any Windows file corruption errors.

Thanks all you mystery lovers for your thoughts !
#-o


The TLS errors might relate to the Win 10 PC trying to talk to the Win7 PC. By default Windows 7 doesn’t support TLS 1.2 but there is a patch available to enable it. Maybe WD is timing out trying to load data from Win7 and restarting to try to recover the situation?

Thanks, Chris - I did consider that Win 7 might be “behind” the security protocols of Win 10, and when I investigated thoroughly last night, I found I had indeed applied the patch to the Win 7 machine way back when Microsoft still supported it. So the Win 7 machine was properly configured to use any of TLS 1.0, 1.1 or 1.2. The only part “missing” was to enable .NET 4 to use SecureCrypto in the registry, so I went through the addition of the required keys there. However, the Win 7 machine log contains no SChannel errors at all (perhaps because in this context it is acting as the “server” of the data?).

Frankly, I am not sure this SChannel error thing is the culprit - the web is filled with SChannel errors going back to 2015, and each year a new MS update seems to spawn a new storm of them after a lull. As a test (admittedly risky, perhaps) I have disabled all TLS on the Win 10 machine (the WD machine) and will watch what happens over the next few hours in the System Log and with WD, which has been running as happily as it can for 30 hours now.

I will concede that perhaps my OCD needs better management here! :oops:

Steve

Update - 7 whole days running - SHHHHh - not too loud! The Win 10 WD machine even downloaded and installed KB5016592 (Cumulative Update Preview for .NET Framework 3.5 and 4.8 for Windows 10 version 21H2 for x64), and two hours later nobody seems to have been upset! :wink:

So it seems that Chris’s suggestion of a communications error between the older Win 7 and newer Win 10 machines was the key to solving this. Once I made sure all the correct Registry entries matched on both machines re: .NET and SecureCrypto, they seem to be playing politely together again.

Thanks to all who chimed in - and I hope this might be of use for another of us “Rube Goldberg” fanatics!

-Steve