Troubleshooting and other notes
This section collects observations made while installing or modifying a UD/WCG installation.
What does it mean when the UD/WCG screen saver is active, but only showing the generic World Community Grid text & logo? This depends on where and when you see it come up. The scenarios below run from the most common cause to the least common, along with what to do about each.
- The UD.EXE task is not running at all (most common). If the generic saver stays up on a machine for an extended period of time (e.g. 10 minutes), there’s a good possibility UD.EXE is not running at all. Checking the Processes list in Task Manager can confirm this.
- The work unit is still initializing before it can display any graphics. Typically this happens upon reboot/login, and the work unit requires some setup time to display anything. A few minutes of patience will usually cure this.
- The work unit has finished and is uploading the results. You’re watching the results approach 100% when the screen saver suddenly changes to the generic text. This is normal, as there are no more graphics to show. It can take several minutes to compile and upload the results. Once they are uploaded, a new work unit is downloaded and starts processing.
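The first scenario can also be checked from a script rather than by eye. Below is a minimal Python sketch, assuming a Windows machine where the built-in `tasklist` command is available. The helper names are invented for illustration; only the image name UD.EXE comes from these notes.

```python
import csv
import io
import subprocess


def running_image_names(tasklist_csv):
    """Return the upper-cased image names found in `tasklist /FO CSV /NH` output."""
    names = set()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if row:  # skip blank lines
            names.add(row[0].upper())
    return names


def is_ud_running(tasklist_csv):
    """True if the UD.EXE agent appears in the given process list."""
    return "UD.EXE" in running_image_names(tasklist_csv)


def query_tasklist():
    """Run tasklist (Windows only) and return its CSV output as text."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a Windows machine, `is_ud_running(query_tasklist())` should report whether the agent is present. It cannot tell the other two scenarios apart, since UD.EXE is running in both.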
Some work units appear to be somewhat unstable. Primarily Genome Comparison (GC) v1.0.0.8, but also Human Proteome Folding 2 (HPF2), has been found idle with the CPU throttle at 100% (sometimes 0). In this state the Genome Comparison window also shows 0s for the Sequence IDs. It's hard to confirm, but these two work units may have a problem initializing their task when they are newly downloaded (after the PC has finished a previous task and uploaded the results). Since they run as a Windows service in our environment, a reboot is needed to fix these issues. However, when these tasks were seen in this state, the machines had been auto-rebooted by Deep Freeze only an hour earlier.
It’s interesting what you can make this software do with the right combination of configuration options and after-install modifications. Converting WCG to a service while also enabling the screen saver option makes the software run in the background, but only while the WCG screen saver is active. It’s all a matter of choosing and applying the right combination of changes.
Sometimes a WCG task can morph into a runaway process. When this happens and WCG is not installed as a service, you can kill the runaway UD_???.EXE task from Task Manager, which forces WCG to get a new work unit. If you are running WCG as a system service, you can do one of two things: 1) convert WCG from a service back to an app, exit WCG, and convert it back to a service (this works), or 2) shut down and restart the service (not tested yet).
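The two recovery paths above could be scripted roughly as follows. This is only a sketch: it builds command lines for the standard Windows `taskkill` and `net` tools without running them, the function names are invented for illustration, and the service name is a placeholder because these notes do not record what the service is called. Restarting the service is the option the notes mark as not yet tested.

```python
import fnmatch


def find_work_unit_tasks(image_names):
    """Pick out image names that match the UD_???.EXE work-unit pattern."""
    return [
        name for name in image_names
        if fnmatch.fnmatchcase(name.upper(), "UD_???.EXE")
    ]


def kill_command(image_name):
    """Build a taskkill command line that force-ends one image name."""
    return ["taskkill", "/F", "/IM", image_name]


def restart_service_commands(service_name):
    """Build the stop/start pair for option 2 (service_name is a placeholder)."""
    return [["net", "stop", service_name], ["net", "start", service_name]]
```

The pattern match finds every work-unit task, not just a runaway one, so deciding which task is actually stuck still needs a look at its CPU usage. Each returned list can be passed to `subprocess.run` on the affected machine.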
The work files (*.UD) appear to be encrypted.
The CS.UD file contains much of the WCG configuration, such as user name, password, machine ID, proxy, screen saver flag, etc. Anything that is reported by or changeable through the WCG preferences pane is stored here. Removing this file makes UD.EXE go through the Grid Agent Registration process again. Because the settings live in this one file rather than in per-user storage, this also explains why configuration changes made through the WCG preferences pane are user-independent.
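If you want to force re-registration deliberately, renaming CS.UD is safer than deleting it, since the old configuration can be put back. Here is a minimal Python sketch; the function name and the `.bak` suffix are illustrative choices, the install directory has to be supplied by the caller, and it assumes the agent has been stopped first.

```python
from pathlib import Path


def set_aside_config(install_dir, backup_suffix=".bak"):
    """Rename CS.UD so the agent no longer finds it.

    Returns the backup path, or None if there was no CS.UD to move.
    """
    config = Path(install_dir) / "CS.UD"
    if not config.exists():
        return None
    backup = config.with_name(config.name + backup_suffix)
    if backup.exists():
        # Refuse to clobber an earlier backup.
        raise FileExistsError(f"{backup} already exists; not overwriting")
    config.rename(backup)
    return backup
```

Per the note above, the next start of UD.EXE should then bring up the Grid Agent Registration window.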
You can’t deploy WCG easily in a software image, and especially not as a service. The software needs to be configured for each machine individually before it can be converted to a service. However, you can install it up to the point of the Grid Agent Registration window and stop there. You can put the shortcut into All Users and then make your image as normal. Once the image is deployed, the first user to log on (typically the deployer) will see the registration window and can finalize the install.
The RAM required by each work unit varies, ranging from ~75 MB to as high as 370 MB. This means that machines with less memory (e.g. 512 MB) won’t see some work units if their requirements are too high, but those with 1 GB will see more or all of them. RAM is typically the limiting factor for what work your machine gets.
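As a rough illustration of that arithmetic, the sketch below filters work units by memory requirement. The actual rule the grid uses to match work to machines is not documented in these notes, so the comparison, the `reserved_mb` headroom, and the unit labels are all assumptions. Only the 75 MB and 370 MB figures come from the text above.

```python
def eligible_work_units(machine_ram_mb, requirements_mb, reserved_mb=0):
    """Return the work units whose RAM need fits in what the machine can spare.

    reserved_mb is a hypothetical allowance for the OS and other software.
    """
    available = machine_ram_mb - reserved_mb
    return [name for name, need in requirements_mb.items() if need <= available]


# Smallest and largest requirements quoted in these notes; labels are made up.
requirements = {"smallest unit": 75, "largest unit": 370}

low_memory = eligible_work_units(512, requirements, reserved_mb=256)
high_memory = eligible_work_units(1024, requirements, reserved_mb=256)
```

With a hypothetical 256 MB held back, the 512 MB machine qualifies only for the smallest unit while the 1 GB machine qualifies for both, which matches the pattern described above.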
When a new work unit is started, all the required files (EXE, BMP, DLL) are downloaded regardless of whether they already exist from a previous work unit. This means that if a work unit EXE is flawed (as some have been) and causes instability or runaway processes, any new instance of that work unit will pick up the corrected files as soon as a fix is available.
Email the author: Peter Schepers | Last updated: Dec 6, 2006