Posts by Memo

11)

Message boards : Number crunching : Overclocking a Core 2 Duo

( Message 3289 )
Posted 3741 days ago by Memo
I have a Core 2 Duo 6700; it is also watercooled.

At a full load, the CPU never goes above 80°F.

Do you think it would be OK to overclock the CPU, or should I leave it at stock speed just to make sure that I don't get errors with the application on this project?


Well, go ahead. That is one of the reasons we validate the results sent to us, so you can benefit from that too :)
12)

Message boards : Number crunching : Homogenous Redundancy?

( Message 3288 )
Posted 3741 days ago by Memo
I have some C2D machines and they have been validating correctly. Now I wonder if there is a difference among the three C2D cores: Allendale, Conroe, and Merom.

[edit]

Not to mention Kentsfield, the quad-core.
13)

Message boards : Number crunching : Valid result = No Credit Granted

( Message 3287 )
Posted 3741 days ago by Memo
I have similar problems. One invalid result that computed OK until the end: http://docking.utep.edu/result.php?resultid=182663, and two with no consensus yet: http://docking.utep.edu/result.php?resultid=187665, http://docking.utep.edu/result.php?resultid=181876.


These results require more analysis. I found that the checksums of the summary.txt file differ, while those of charmm.out are the same. I would like to look at these more closely, but I will do it next week while at the lab. Thanks for reporting this issue.
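A minimal sketch of the per-file comparison described above, assuming the output files of two replicas have been copied into two local directories. The directory layout and the `md5_of` and `compare_outputs` helpers are hypothetical, not part of the Docking@Home tools; MD5 is used because the checksums quoted elsewhere in this thread are 32-hex-digit MD5-style values.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Output files mentioned in this thread; any other files are ignored.
RESULT_FILES = ("summary.txt", "charmm.out")


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(dir_a: Path, dir_b: Path) -> dict[str, bool]:
    """Map each result file to True if both replicas wrote byte-identical copies."""
    return {name: md5_of(dir_a / name) == md5_of(dir_b / name) for name in RESULT_FILES}
```

For the results above, `compare_outputs(Path("replica_a"), Path("replica_b"))` would be expected to mark charmm.out as identical and summary.txt as different. A byte-level checksum flags any difference at all, so it tells you *that* the files differ, not *where*.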
14)

Message boards : Number crunching : Valid result = No Credit Granted

( Message 3286 )
Posted 3741 days ago by Memo
Aaron, I did a quick check on your results for those two workunits, and these are the results:

WU = 48526
61d42174fba26a453b1235f8952f388b
146cbb9bbcc0af02af0c0caa3002653e

WU = 48525
61d42174fba26a453b1235f8952f388b
4b714d1a48705b59ddd0fc9c66bd36ef


These are the checksums of the summary.txt file, which contains the results. Yours are in the bottom row and, as you can see, they are different. The top row is the same for both WUs, but remember that we are sending the same WU at the moment.

As for why the results are different, I cannot say, since CHARMM completed all computations successfully. Maybe a random error in one of the calculations? If you think you know why this might be happening (overclocking, hardware malfunctions), please let us know. Thanks for the observation.
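The comparison above amounts to a checksum-based quorum: a WU only gets a canonical result once enough replicas agree byte-for-byte. A minimal sketch, assuming a quorum of 2 and one summary.txt checksum per replica; `find_consensus` is a hypothetical name and this is not the project's actual validator.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def find_consensus(checksums: list[str], min_quorum: int = 2) -> Optional[str]:
    """Return the checksum reported by at least min_quorum replicas, or None."""
    if not checksums:
        return None
    # Most frequent checksum and how many replicas reported it.
    checksum, count = Counter(checksums).most_common(1)[0]
    return checksum if count >= min_quorum else None


# Checksums quoted above for WU 48526: the two replicas disagree.
wu_48526 = ["61d42174fba26a453b1235f8952f388b", "146cbb9bbcc0af02af0c0caa3002653e"]
print(find_consensus(wu_48526))  # None
```

With only two disagreeing replicas there is no majority, which matches the "no consensus yet" state; a third replica matching either checksum would settle it.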
15)

Message boards : Number crunching : Invalid results reported thread [Use Here]

( Message 3249 )
Posted 3749 days ago by Memo
By looking at the time a replica was sent and the time it was reported, you can conclude that it ran for more than a few seconds.

I can say it's only a problem with the applications (BOINC or CHARMM). I will try to look at that next week, but I make no promises, as it is finals week here at UTEP and I am loaded with work. For sure, though, I can throw it in the queue :)

Thanks for the observation.
16)

Message boards : Web site : Newsletter broken

( Message 3155 )
Posted 3757 days ago by Memo
Thanks for the observation.

Can you check again, please? Either somebody (Andre) fixed the problem or it was just something temporary, because I didn't find any problems with the newsletter.
17)

Message boards : Number crunching : New WU mentioned on front page

( Message 2830 )
Posted 3789 days ago by Memo
This is related to something else that I am still investigating, but to give you an idea, today (Mountain time) there have been around 447 requests, so this seems to be related to something else. I will keep you updated.

Thank you

Memo
18)

Message boards : Number crunching : 6 % unassigned but no work ?

( Message 2669 )
Posted 3811 days ago by Memo
My WinXP AMD machine is getting the "committed to other platforms" message. The shared memory page says 6 % (56 WU) are unassigned. I've tried updating the shared memory page and the timestamp at the bottom goes up by a minute, but it keeps saying unassigned 6 % and my WinXP AMD machine keeps getting the "committed to other platforms" message. Is there some other platform limit in action here?

The WIN / AMD machine already has 8 WU. It's on a modem and is set for 1.7 days of work so it doesn't run out. It does a WU in about 3 hours and 1 or 2 minutes.

It's machine #1096.

From the machines stats page:

Average turnaround time 1.57 days
Maximum daily WU quota per CPU 50/day
Results 267
Number of times client has contacted server 1737

-- David

EDIT: Fix BBCode


That is interesting, and I did not expect this to happen. I will investigate this issue.

Is the "My WinXP AMD machine" the same machine as "The WIN / AMD Machine"?

If it is, did you notice how long it took to get the work?


It's the same machine. Sorry for the confusion. On D@H I have 3 machines:

WinXP Pro - Intel Celeron 2.3 GHz - 1.5 GB RAM
WinXP Home - AMD Sempron 3100+ (Socket 754) - 1.25 GB RAM
Linux Ubuntu 6.10 - AMD Sempron 2500+ (Socket A) - 1 GB RAM

The WinXP Pro machine has the 56k modem and is set up for Internet Connection Sharing. The WinXP Home and Ubuntu Linux machines use DHCP, and the WinXP Pro machine acts as a NAT router.

The problem was on the "WinXP Home - AMD Sempron 3100+" machine. While it was asking for work, I reloaded the shared memory status page a few times and noticed that the timestamp at the bottom of the page was incrementing by exactly 1 minute (hh:mm:02). I let it ask for work 4 or 5 times and then suspended network activity. I'm on a modem, so I don't have an always-on connection. It was probably a few hours before I connected it again.

There used to be a fourth machine, but it ran RHEL3 and would never work, even with the ulimit fix. RHEL3 uses a 2.4.x Linux kernel with a bunch of stuff backported to it. The RHEL3 machine is going away this month anyway.

HTH,

-- David



OK, thanks for clarifying that. We have a meeting tomorrow, and I will make sure we discuss this issue. As far as I know (and I am far from being a BOINC expert), when a machine requests work, the scheduler starts checking the replicas available in shared memory. If a replica is already taken by another HR class, it increments the infeasible count and moves on to the next one. If that happens for every replica in shared memory, the "there is work but it is com..." message is sent.
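The scan described above can be sketched as a simplified model. This is not BOINC's scheduler code: the `Slot` type, the `scan_shared_memory` function, the HR class labels, and the returned message strings are hypothetical, and a slot with no HR class stands for a replica of a WU that no host has claimed yet.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """One replica waiting in the shared-memory array (hypothetical model)."""
    wu_id: int
    hr_class: Optional[str]  # None: the WU is not yet committed to any HR class


def scan_shared_memory(slots: list[Slot], host_class: str, wanted: int):
    """Pick up to `wanted` replicas for a host; return (chosen, infeasible, message)."""
    chosen: list[Slot] = []
    infeasible = 0
    for slot in slots:
        if len(chosen) == wanted:
            break
        if slot.hr_class is not None and slot.hr_class != host_class:
            infeasible += 1  # committed to another HR class: skip to the next slot
            continue
        chosen.append(slot)
    if chosen:
        return chosen, infeasible, "work sent"
    if infeasible:
        return chosen, infeasible, "committed to other platforms"
    return chosen, infeasible, "no work available"
```

In this simplified model a host only gets the "committed" reply when every slot it scans is committed to a different class, which is why seeing unassigned replicas on the status page at the same time is unexpected.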

I will check the CGI log to see if I can find anything suspicious. Do you remember, by any chance, when this happened?
19)

Message boards : Number crunching : 6 % unassigned but no work ?

( Message 2649 )
Posted 3813 days ago by Memo
My WinXP AMD machine is getting the "committed to other platforms" message. The shared memory page says 6 % (56 WU) are unassigned. I've tried updating the shared memory page and the timestamp at the bottom goes up by a minute, but it keeps saying unassigned 6 % and my WinXP AMD machine keeps getting the "committed to other platforms" message. Is there some other platform limit in action here?

The WIN / AMD machine already has 8 WU. It's on a modem and is set for 1.7 days of work so it doesn't run out. It does a WU in about 3 hours and 1 or 2 minutes.

It's machine #1096.

From the machines stats page:

Average turnaround time 1.57 days
Maximum daily WU quota per CPU 50/day
Results 267
Number of times client has contacted server 1737

-- David

EDIT: Fix BBCode


That is interesting, and I did not expect this to happen. I will investigate this issue.

Is the "My WinXP AMD machine" the same machine as "The WIN / AMD Machine"?

If it is, did you notice how long it took to get the work?
20)

Message boards : Number crunching : Shared memory overview page available

( Message 2635 )
Posted 3814 days ago by Memo
If you can swap the replicas in and out of shared memory back to the database, is it not possible to redistribute them in shared memory at that point?


You make it sound easy :-) With the current BOINC code it is not that easy to implement, though, because it requires knowing the HR classes of the hosts that will come and ask for work at a given point in time. And that's something you never know with volunteer computing projects: hosts disappear and attach all the time, so the best you can do is make as good an estimate as possible.

But that's what all the computer science in this project is for: to improve BOINC's default scheduling algorithms. And we plan to do exactly that :-)

Thanks
Andre



Also, once one unassigned replica is downloaded by a host, that WU belongs to the host's HR class, and the other two replicas must go to the same HR class. So even if you send replicas back to the DB, once they come back to shared memory they will still have the same HR class; otherwise, homogeneous redundancy would not work.
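A minimal sketch of the rule above, assuming the HR class is stored on the WU record itself. `WorkUnit`, `try_send`, and the class labels are hypothetical names for illustration, not BOINC's database schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkUnit:
    wu_id: int
    replicas_left: int  # replicas not yet sent to any host
    hr_class: Optional[str] = None  # fixed when the first replica is sent


def try_send(wu: WorkUnit, host_class: str) -> bool:
    """Send one replica to a host, enforcing homogeneous redundancy."""
    if wu.replicas_left == 0:
        return False
    if wu.hr_class is None:
        wu.hr_class = host_class  # first download commits the whole WU
    elif wu.hr_class != host_class:
        return False  # remaining replicas must stay in the committed class
    wu.replicas_left -= 1
    return True


wu = WorkUnit(wu_id=1, replicas_left=3)
print(try_send(wu, "win-amd"))      # True
print(try_send(wu, "linux-intel"))  # False
```

Because `hr_class` lives on the WU record rather than on a shared-memory slot, swapping a replica out to the database and back leaves its class unchanged, which is the point made above.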


