Work Units running 'forever' on Linux


Message boards : Unix/Linux : Work Units running 'forever' on Linux

ZoSo

Joined: Oct 14 10
Posts: 16
ID: 33872
Credit: 3,742,738
RAC: 0
Message 6171 - Posted 31 Jan 2011 17:52:48 UTC

I have 3 machines running Fedora 12 (BOINC 6.10.45, 2 of them with the World Community Grid skin, which shouldn't make a difference, but I thought I'd mention it) that have all been running their Docking work units for more than 20 hours but still show 0.000% complete in the local BOINC managers.

The BOINCTasks monitor tipped me off: it was showing them at 180% to 420% of their estimated run times, with the 'time left' column blank. My Windows machines that run Docking are not showing this problem (so far).
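
For anyone who wants to spot these without watching the manager, here is a minimal Python sketch of the check described above: flag any task that has run for many hours while still at 0% done. It assumes text shaped like the output of `boinccmd --get_tasks`, with one `name:`, `fraction done:` and `elapsed task time:` line per task; those field names are an assumption and may differ between client versions, and the sample task names and the 20-hour threshold are made up for illustration.

```python
def parse_tasks(text):
    """Split boinccmd-style 'key: value' output into one dict per task.

    Assumption: each task block starts with a 'name:' line.
    """
    tasks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip separators such as '1) -----------'
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "name":
            current = {}
            tasks.append(current)
        if current is not None:
            current[key] = value
    return tasks


def stalled_tasks(tasks, min_hours=20.0):
    """Return names of tasks at 0% done after at least min_hours of run time."""
    names = []
    for task in tasks:
        try:
            fraction = float(task.get("fraction done", "nan"))
            elapsed = float(task.get("elapsed task time", "nan"))
        except ValueError:
            continue  # unparseable numbers: leave the task alone
        if fraction == 0.0 and elapsed >= min_hours * 3600:
            names.append(task["name"])
    return names


# Hypothetical sample in the assumed format (not real client output).
SAMPLE = """
1) -----------
   name: example_task_a_0
   fraction done: 0.000000
   elapsed task time: 75600.0
2) -----------
   name: example_task_b_0
   fraction done: 0.420000
   elapsed task time: 75600.0
"""

stalled = stalled_tasks(parse_tasks(SAMPLE))
print(stalled)
```

On a real machine the text would come from the client itself instead of `SAMPLE`; the threshold is a judgment call and should sit well above the longest run time you normally see.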

Here are some of the problem WU names:
28-Jan-2011 03:02:32 [Docking] Starting task 1hvi1hbv_mod0014crossdockinghiv1_11675_236234_0 using charmm34 version 623

28-Jan-2011 04:13:04 [Docking] Starting task 1hvi1hbv_mod0014crossdockinghiv1_11629_421847_0 using charmm34 version 623

28-Jan-2011 12:03:47 [Docking] Starting task 1hvi1hbv_mod0014crossdockinghiv1_17100_203391_0 using charmm34 version 623

Let me know what other info you need... I'm not sure how to troubleshoot this. If it had been only 1 machine I would have just rebooted it, or at least restarted its BOINC client service, and forgotten about it... but all 3 Linux machines doing it, and only on the Docking project, raised a red flag (they also run WCG, Einstein and Orbit, though Orbit hasn't had any work for about 4 months)... plus the Windows machines running Docking are not exhibiting the problem. So...

Thanks!

ZoSo

Joined: Oct 14 10
Posts: 16
ID: 33872
Credit: 3,742,738
RAC: 0
Message 6173 - Posted 31 Jan 2011 18:09:54 UTC

Oops. See
http://docking.cis.udel.edu/community/forum/thread.php?id=460

It appears to have been going on for about a year and a half without a fix, so apparently the solution is just to abort them when you notice it happening. Sounds like this is not a good project for the "set it and forget it" crowd.
____________

hanty

Joined: Aug 17 10
Posts: 1
ID: 31901
Credit: 140,929
RAC: 0
Message 6203 - Posted 3 Feb 2011 6:09:50 UTC

I have the same problem. I decided to delete the job.
____________
Entia non sunt multiplicanda praeter necessitatem.

Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6219 - Posted 4 Feb 2011 16:46:24 UTC - in response to Message ID 6203 .

Hi All,

We have a disk issue. We have stopped generating new jobs and are looking into it.

Sorry for the problem and thank you for the notes!

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!

Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6232 - Posted 5 Feb 2011 14:38:04 UTC - in response to Message ID 6219 .

Dear All, a new update from D@H:

1) We are in recovery mode. In other words, we are collecting and validating results but are not generating or distributing new tasks for the moment, while we investigate what caused the problem yesterday.

2) Please bear with us. We do not have a full-time system administrator taking care of D@H; the work is done by students. They are doing their very best, but they also have classes and homework. We are dedicating the weekend to understanding the problem and fixing it.

Thanks for your many notes and your support.

Michela

____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!

Michela
Forum moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: Sep 13 06
Posts: 163
ID: 10
Credit: 97,083
RAC: 0
Message 6242 - Posted 7 Feb 2011 17:42:12 UTC - in response to Message ID 6232 .

We are back to distributing jobs. We removed all the jobs with potential 0% progress that were in our database. Unfortunately, some jobs had already been distributed by the time we worked on the database. Can you please abort those jobs with 0% progress and get new jobs?

Thanks,

Michela
____________
If you are interested in working on Docking@Home in a great group at UDel, contact me at 'taufer at acm dot org'!

ZoSo

Joined: Oct 14 10
Posts: 16
ID: 33872
Credit: 3,742,738
RAC: 0
Message 6902 - Posted 11 Oct 2012 3:19:36 UTC

Hi all,

As I reported in a Windows forum message, I'm seeing this problem again... Charmm 34a2 6.23, on both Linux and Windows.

I've started aborting all Docking work units on the Tasks tab and setting 'no new tasks' for Docking on the Projects tab, until someone here can tell us what's going on.

Thanks. :-|
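
The same two steps can also be done from the command line with `boinccmd`, which supports `--project URL nomorework` and `--task URL task_name abort`. Below is a small Python sketch that only builds and prints those commands so they can be reviewed before anything is run. The project URL shown is an assumption (use whatever URL your client lists for Docking), and the task name is one of the examples from the first post.

```python
import shlex


def abort_commands(project_url, task_names):
    """Build boinccmd command lines: stop new work first, then abort each task."""
    commands = [["boinccmd", "--project", project_url, "nomorework"]]
    for name in task_names:
        commands.append(["boinccmd", "--task", project_url, name, "abort"])
    return commands


# Assumed project URL; check the Projects tab for the exact one.
PROJECT_URL = "http://docking.cis.udel.edu/"
TASKS = ["1hvi1hbv_mod0014crossdockinghiv1_11675_236234_0"]

commands = abort_commands(PROJECT_URL, TASKS)
for cmd in commands:
    # Printed for review only; nothing is executed here.
    print(shlex.join(cmd))
```

Printing rather than executing is deliberate: an abort cannot be undone, so it is worth reading the list before pasting it into a shell. Once the project is healthy again, `allowmorework` in place of `nomorework` re-enables new tasks.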

ZoSo

Joined: Oct 14 10
Posts: 16
ID: 33872
Credit: 3,742,738
RAC: 0
Message 6922 - Posted 16 Oct 2012 3:20:19 UTC - in response to Message ID 6902 .

Thanks for the reply.

____________

rbm73

Joined: Apr 5 11
Posts: 1
ID: 39512
Credit: 100,276
RAC: 0
Message 7000 - Posted 12 Dec 2012 2:08:52 UTC - in response to Message ID 6902 .

I had the problem of a task running forever on 21 Oct 2012 with the following task:

1m0b1htf_mod0014crossdockinghiv1_29652_457260_0

It should have run for 3 hours, I think, and certainly no more than 6. At 65 hours, with no switching to my other projects' tasks (E@H and WCG), I was forced to kill it. It was reporting 100% complete, but I don't know how long it had been doing that. I then suspended all processing on D@H.

I have been hoping to see a post about the solution.

I run an Intel Centrino 1.86 GHz (an old Toshiba Satellite), so I only get one task at a time. The OS is Xubuntu 12.04.

Any update would be appreciated.
Thanks.
____________

Boyu Zhang
Forum moderator
Project administrator
Project developer
Project tester

Joined: May 5 10
Posts: 88
ID: 28821
Credit: 2,013,795
RAC: 0
Message 7001 - Posted 14 Dec 2012 18:02:14 UTC - in response to Message ID 7000 .

We had a storage space issue on the server in October, and that caused some incomplete workunits to be distributed to the volunteers. "1m0b1htf" is one of the batches of workunits affected.

The problem has been solved and we are back to normal operation now. Could you please abort the old 0% progress workunits? Sorry about the inconvenience!

Thanks!
Boyu

