Posts by Steven Meyer
11)
Message boards : Number crunching : Bug Report - Random Reboots ( Message 5300 )Posted 2923 days ago by Steven Meyer There seem to be some bug(s) that may be causing my computer to randomly reboot. This is the situation: I have been running S@H on my Q6600 for about a year. During this time I have not seen any random reboots. Recently, I upgraded the NVidia video driver in order to support the optimized CUDA app from S@H. Subsequently, I started to notice occasional random reboots, about 3-4 times per week. I decided that it was probably some issue with the new NVidia video driver, but was not bothered enough by it to do something about it yet. Then, to improve the throughput of this computer by letting it use mostly optimized apps, I set D@H to "No new Tasks" on this computer. Eventually, it ran out of D@H tasks to do and was running only S@H for several days. Today I temporarily set D@H to allow new tasks, so that there would be a few in the queue for the times when S@H is having problems servicing their client computers. I got 6 D@H tasks, and 3 of them started running immediately. The computer immediately had another random reboot, which I had not seen while there were no D@H tasks to be run. Now I have set D@H to "No new Tasks" once again, and suspended those tasks that were downloaded. We shall see if there are any more random reboots while D@H is not running. I wonder if there are any switches that I can set that will cause the D@H apps to write more debugging info to the log so that the cause of the random reboots can be determined. |
12)
Message boards : Web site : New Docking@Home Website ( Message 5164 )Posted 2946 days ago by Steven Meyer The links from the Notification section of the "Your Docking@Home account" page are returning . . . This is still not working. Click HERE to see an example of the error messages. |
13)
Message boards : Number crunching : Wrong numbers in "max # of error/total/success tasks" ( Message 5090 )Posted 2967 days ago by Steven Meyer As you haven't changed the numbers, am I correct that you will never resend non-valid completed WUs ever? Michela, some time ago, when I was first starting to crunch D@H WUs, I was sent a large number of WUs with a short deadline. Since D@H is not my only project, the work overload caused D@H to run in "High Priority", thus shutting down the other project. In order to reduce the work overload, I aborted about half of the D@H WUs. Then I checked one of the aborted WUs on the web site and saw the line: max # of error/total/success tasks 0, 1, 1. Since the abort was counted as an error, none of the aborted work units will ever be sent out again. This may or may not be an issue, given your post-processing. Now, however, I see that the number of success tasks has been set to 2, but the error and total numbers are unchanged: max # of error/total/success tasks 0, 1, 2. It might make sense to change the counts to be 1 for errors, 1 or 2 for total, and 2 for success so that an abort will not prevent the WU from being reissued. D@H settings of errors 0, total 1, success 2 will cause the WU to be abandoned with one error or any 2 results. Again, maybe this is OK with your post-processing... |
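The effect of those three limits can be sketched in a few lines of Python. This is only a simplified model and not the project's server code: the rule that a WU is abandoned as soon as any count exceeds its limit is an assumption inferred from the statement above ("abandoned with one error or any 2 results"), and the names `Limits` and `is_abandoned` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """The 'max # of error/total/success tasks' triple shown on the WU page."""
    max_error: int
    max_total: int
    max_success: int


def is_abandoned(limits: Limits, errors: int, successes: int) -> bool:
    """Return True once any count exceeds its limit (assumed rule)."""
    total = errors + successes
    return (
        errors > limits.max_error
        or total > limits.max_total
        or successes > limits.max_success
    )


current = Limits(max_error=0, max_total=1, max_success=2)    # settings seen on the site
suggested = Limits(max_error=1, max_total=2, max_success=2)  # settings proposed above

# A single abort (counted as an error) under each set of limits.
print(is_abandoned(current, errors=1, successes=0))
print(is_abandoned(suggested, errors=1, successes=0))
```

Under this model a single abort exceeds the error limit of 0, so the WU is dropped, while the suggested limits leave room for it to be reissued.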
14)
Message boards : Number crunching : When a work unti fails, the computer doesn't keep going. . . ( Message 5089 )Posted 2967 days ago by Steven Meyer I came in this morning to find all my computers were giving this error: This message, "such-and-so-program has encountered a problem and needs to close", is from the Windows operating system, and thus is outside the control of BOINC and of the client programs from D@H. The operating system puts the dialog box on the screen in order to inform you of the failure of the program and then waits for you to respond by clicking on the "OK" button. (IMO this should really be a "Bummer" button!) In any case, the operating system is really patient and will wait forever for you to respond. Although the message box is outside the control of D@H, I would think that the program developers at D@H will be interested in why their program raised such an error on so many computers at about the same time, since it is likely that there is a bug in their code. The other possibility is that some other program that is running on all of your computers is the cause of the failure by stepping on something needed by the D@H program. Note: That other program could be a computer virus or worm. Do be sure to check your computers for infections. Can you think of something else that was running at the same time on all of those computers? Note too: That other program could be your virus scanner! It could be, for example, that the virus scanner opens some file with an exclusive lock in order to scan it for viruses, which prevents other programs from opening the file until the virus scanner is done with it. If the D@H code tries to open the file and does not handle the failure to open it, then that can be an error that will be raised to the operating system and may result in the message that you saw. |
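To illustrate what "handling the failure to open the file" could look like, here is a minimal Python sketch. It is not the D@H application's code; the function name `read_with_retry` and its parameters are hypothetical. It retries a few times when the open fails, which gives another program such as a virus scanner time to release the file, and reports failure to the caller instead of crashing.

```python
import time
from typing import Optional


def read_with_retry(path: str, attempts: int = 3, delay_s: float = 0.1) -> Optional[bytes]:
    """Read a file, retrying if it cannot be opened.

    Returns the file contents, or None if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError:
            # OSError covers a missing file as well as a file that another
            # process is holding open exclusively.
            if attempt < attempts - 1:
                time.sleep(delay_s)  # wait before trying again
    return None


data = read_with_retry("no_such_input_file.dat", attempts=2, delay_s=0.0)
if data is None:
    print("could not open input file; reporting the task as failed")
```

The point of the sketch is only that the failure is caught and turned into a normal error path, so the operating system never has to put up its dialog box.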
15)
Message boards : Web site : Web Site Mix-up? ( Message 5077 )Posted 2982 days ago by Steven Meyer Now it appears that my own account is the "Default" account on the Server Status page. I logged out, then clicked on the Server Status link, and it told me that I was logged on. However, a click on the "My Account" link showed the login page, indicating that I was not logged on. |
16)
Message boards : Number crunching : "Too many error results" ( Message 5076 )Posted 2982 days ago by Steven Meyer Timing and scheduling are part of the problem, but the biggest part is that a single user can cause work units to never be calculated by simply aborting them. There are two solutions that I can think of . . .
|
17)
Message boards : Web site : New Docking@Home Website ( Message 5075 )Posted 2982 days ago by Steven Meyer The links from the Notification section of the "Your Docking@Home account" page are returning . . . Not Found. For example: "High Priority" Strikes Again. |
18)
Message boards : Number crunching : "Too many error results" ( Message 5061 )Posted 2988 days ago by Steven Meyer This WU, and many more like it, were sent to my computer with a very short deadline, resulting in everything running at "High Priority" in order to try to get them all done before the deadline. In order to reduce the work overload, I aborted about half of the tasks. There are two problems here.
|
19)
Message boards : Number crunching : "High Priority" Strikes Again ( Message 5057 )Posted 2990 days ago by Steven Meyer I recently started crunching for Docking@Home as a second project, S@H being my first. Docking@Home has repeatedly fetched a large stack of work units, all of which are due in such a short amount of time that all of them are required to run "High Priority" in order to get all of them done in time. This cheats other projects out of their share of CPU time, and puts D@H into a large amount of Debt in relationship to other projects. In order to remedy the problem by reducing the work load from D@H, I have had to abort dozens of D@H work units. Something needs to be done to reduce the number of D@H work units fetched or else increase the time allowed to complete them so that the work load is not so heavy that "High Priority" is required to get the job done in time. |
20)
Message boards : Web site : Web Site Mix-up? ( Message 5048 )Posted 2993 days ago by Steven Meyer That shouldn't happen, let me ask Brian. Happened again this morning with a different name: Hello! You are logged in as: |