Pirates@Home

Long Run Times & Then Error's ... ???

STE\/E
Joined: 13 Aug 04
United States
BlackOps
Credit: 42,141.2
RAC: 0.00
Joined: Aug 13, 2004
Verified: Oct 4, 2011
Punishment: Mess Duty
Message 1221 - Posted: 6 Feb 2005 | 10:57:11 UTC
Last modified: 6 Feb 2005 | 11:00:10 UTC

I've had several WU's take 90 minutes to run & then they just show up as Computation Errors in my Account. I've had 2 like that & have 2 going right now that are at 1:10:00 & still showing 35-40 minutes left to run ... ???

I also have 2 more at 40 minutes on another Computer ...

Steven Purvis
Joined: 13 Jan 05
United Kingdom
BOINC Synergy
Credit: 1,828.0
RAC: 0.00
Joined: Jan 13, 2005
Verified: NEVER
Pieces of Eight: 3
Message 1222 - Posted: 6 Feb 2005 | 11:12:12 UTC - in response to Message 1221.

> I've had several WU take 90 minutes to run & then they just show up as
> Computation Errors in my Account. I've had 2 like that & have 2 going
> right now that are at 1:10:00 & showing 35-40 minutes left to run yet ...
> ???
>
> I also have 2 more at 40 min's on another Computer ...
>

I've got 5 now which have failed with error -185 (Maximum CPU time exceeded), all on the same computer though. It's not a powerful computer, and I didn't know there was a limit on CPU time, not just a return deadline. Maybe I should just detach that computer.
____________

Profile Wormholio
Captain
Joined: 6 Jun 04
United States
Away
Credit: 4,065.6
RAC: 0.00
Joined: Jun 6, 2004
Verified: Mar 13, 2008
Dubloons: 3
Pieces of Eight: 10
Punishment: Aztec curse
Message 1223 - Posted: 6 Feb 2005 | 13:41:24 UTC - in response to Message 1221.
Last modified: 6 Feb 2005 | 14:52:16 UTC

> I've had several WU take 90 minutes to run & then they just show up as
> Computation Errors in my Account. I've had 2 like that & have 2 going
> right now that are at 1:10:00 & showing 35-40 minutes left to run yet ...
> ???
>
> I also have 2 more at 40 min's on another Computer ...

The amount of work done by the sextant WU's should be about the same for all computers. Slower computers should just take longer to do the work. The CPU estimate was computed for the slowest computer in the fleet, but we can only roughly account for the time devoted to other projects, though that has more influence on the delay_bound than on the CPU time estimate.

Sextant cycles the search position through all the pulsars in our list. There are almost 1300 of them. The fraction done should be updated correctly and the number of positions searched should also be increasing. (Note that there is no real "search", as we don't have any Einstein@Home analysis code in sextant, only graphics.)

Sextant has a simple checkpointing system, so that if the WU is interrupted (to work on SETI or Einstein WU's) then it should start up again at the last position recorded. This should be noted in the stderr output, and there will be a checkpoint file containing the iteration count and the number of positions searched so far (these are often the same). A symptom of a problem here would be that the fraction done and the number of search positions don't update correctly. I will take another look at this code to verify that it is working properly.
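For anyone curious what this kind of checkpointing looks like, here is a minimal sketch in Python. It is not sextant's actual code: the JSON file format, the function names, and the exact position count of 1300 are all assumptions made up for illustration. It records the iteration count and positions searched, and resumes from the last record after an interruption.

```python
import json
import os

N_POSITIONS = 1300  # "almost 1300" pulsars; the exact count is assumed here


def load_checkpoint(path):
    """Return (iteration, positions_searched), or (0, 0) if no checkpoint exists."""
    if not os.path.exists(path):
        return 0, 0
    with open(path) as f:
        state = json.load(f)
    return state["iteration"], state["positions_searched"]


def save_checkpoint(path, iteration, positions_searched):
    """Write to a temp file and rename it, so an interruption cannot leave a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"iteration": iteration, "positions_searched": positions_searched}, f)
    os.replace(tmp, path)


def run(path, max_steps=None, checkpoint_every=10):
    """Cycle through the positions, resuming from the checkpoint file.

    max_steps mimics an interruption (hypothetical parameter for this sketch).
    Returns the fraction done when the run stops.
    """
    iteration, searched = load_checkpoint(path)
    steps = 0
    while iteration < N_POSITIONS:
        if max_steps is not None and steps >= max_steps:
            break  # simulated preemption by another project
        iteration += 1
        searched += 1
        steps += 1
        if iteration % checkpoint_every == 0:
            save_checkpoint(path, iteration, searched)
    return iteration / N_POSITIONS
```

If a run is interrupted after 25 steps with a checkpoint every 10, the next run starts again from position 20, so a little work is repeated but the fraction done keeps moving forward.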

____________
-- Eric Myers

"Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats

Carlos G P
Joined: 22 Dec 04
United States
BOINC Synergy
Credit: 2,600.8
RAC: 0.00
Joined: Dec 22, 2004
Verified: Nov 12, 2009
Pieces of Eight: 8
Punishment: Keel Haul
Message 1226 - Posted: 6 Feb 2005 | 15:27:54 UTC
Last modified: 6 Feb 2005 | 15:28:30 UTC

Could the problem seen by Poor Boy, Steven Purvis, and Ricardo (http://pirates.vassar.edu/forum_thread.php?id=278#1220) be that they are not running CC 4.19, as is now required (http://pirates.vassar.edu/forum_thread.php?id=279#1213)?

STE\/E
Joined: 13 Aug 04
United States
BlackOps
Credit: 42,141.2
RAC: 0.00
Joined: Aug 13, 2004
Verified: Oct 4, 2011
Punishment: Mess Duty
Message 1228 - Posted: 6 Feb 2005 | 19:46:52 UTC
Last modified: 6 Feb 2005 | 19:50:50 UTC

I have been and still am running v4.19 on all my Computers, Carlos, so that's not the problem for me ...

I've had 21 long-running WU's since last night; 15 of them have been returned okay & granted credit, 6 showed a computation error ...

Profile Wormholio
Captain
Joined: 6 Jun 04
United States
Away
Credit: 4,065.6
RAC: 0.00
Joined: Jun 6, 2004
Verified: Mar 13, 2008
Dubloons: 3
Pieces of Eight: 10
Punishment: Aztec curse
Message 1229 - Posted: 6 Feb 2005 | 22:05:37 UTC - in response to Message 1228.

> I have been and now am running the v4.19 on all my Computers Carlos, so that's
> not the problem for me ...
>
> I've had 21 long running Wu's since last night, 15 of them have been returned
> okay & granted credit, 6 showed a computation error ...

How often do you have BOINC switch between applications/projects? If the interval is rather short on a slow machine, it could be that progress is not getting checkpointed, so the app falls back to where it was last time, over and over again, until the time limit is reached.

Diagnostic output to stderr is limited right now because we turned a lot off for Einstein@Home, but in the next release I'll increase the output just for Pirates.
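To make that suspected failure mode concrete, here is a small Python simulation. It is purely illustrative: the function name and all the numbers are made up, and it assumes that CPU time keeps accumulating across restarts while progress falls back to the last checkpoint.

```python
def simulate_restarts(work_needed, run_slice, checkpoint_interval, cpu_limit):
    """Simulate an app that is preempted after every run_slice seconds of CPU time.

    Progress is saved only when it reaches a multiple of checkpoint_interval.
    On each restart the app falls back to the last saved progress, but the
    CPU time already spent still counts against cpu_limit (an assumption).
    Returns (outcome, cpu_seconds_used).
    """
    cpu_used = 0
    saved = 0  # progress recorded in the last checkpoint
    while True:
        progress = saved  # restart from the last checkpoint
        for _ in range(run_slice):
            progress += 1
            cpu_used += 1
            if progress % checkpoint_interval == 0:
                saved = progress
            if progress >= work_needed:
                return "finished", cpu_used
            if cpu_used >= cpu_limit:
                return "cpu_limit_exceeded", cpu_used


# Slice shorter than the checkpoint interval: nothing is ever saved.
print(simulate_restarts(600, 100, 150, 5000))
# Checkpoints land inside every slice: the work completes normally.
print(simulate_restarts(600, 100, 50, 5000))
```

In the first call the app never reaches a checkpoint before it is preempted, so it redoes the same 100 seconds of work until the limit is hit. In the second call it finishes after 600 seconds of CPU time.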

____________
-- Eric Myers

"Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats

STE\/E
Joined: 13 Aug 04
United States
BlackOps
Credit: 42,141.2
RAC: 0.00
Joined: Aug 13, 2004
Verified: Oct 4, 2011
Punishment: Mess Duty
Message 1239 - Posted: 7 Feb 2005 | 3:40:23 UTC
Last modified: 7 Feb 2005 | 4:01:06 UTC

How often do you have BOINC switch between applications/projects?
==========
Actually, when I got all the Pirates WU's yesterday I changed my Resource Preferences to 10000 to 1 in favor of Pirates to be sure I was able to run them all in time. So I wasn't switching at all between the Projects with that setting.

But even so, out of the 760 WU's I downloaded last night I will still miss the deadline on 3 WU's by less than 20 minutes. But I checked those WU's and nobody else has turned them in yet either, so they may still be all right.

My slowest PC is a P4 3.06 running in HT mode, so I wouldn't exactly call that slow ... ??? Out of 760 WU's it looks like I had 24 of the 1/2 hour or longer WU's; 18 of them were Valid & received credit, 6 showed the Computation Error and were Invalid.

[B^S] RicketyCat
Joined: 21 Jul 04
United States
BOINC Synergy
Credit: 410.1
RAC: 0.00
Joined: Jul 21, 2004
Verified: Oct 11, 2010
Pieces of Eight: 4
Message 1241 - Posted: 7 Feb 2005 | 6:33:01 UTC
Last modified: 7 Feb 2005 | 9:09:46 UTC

I have not had any long run-time errors, but I have noticed several "upload" errors. It seems sextant may not like dial-up.

I have my prefs set to switch at 12 hr. intervals to allow longer WUs (SETI, Einstein) time to finish before going on. With the "debt" checker and the high priority assigned to Pirates there has been no problem getting the client to move through the WUs, and they should be finishing on time.

The following is an example of what is in the stderr file:

2005-02-06 17:44:45 [Pirates@Home] Unrecoverable error for result wu_1107619403_2545_1 (Maximum CPU time exceeded)

and in the stdout file for the same:

2005-02-06 17:44:45 [Pirates@Home] Aborting result wu_1107619403_2545_1: exceeded CPU time limit 762.619719
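If anyone wants to pull these aborts out of a longer log, something like the following Python sketch might do. The patterns are based only on the two lines quoted above, so other BOINC messages may well be formatted differently. The function name is made up, and the limit is kept as a plain number (that it is in seconds is an assumption).

```python
import re

# Patterns derived only from the two messages quoted in this post;
# they are assumptions, not a description of BOINC's log format in general.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] "
    r"(?P<message>.*)$"
)
ABORT_RE = re.compile(
    r"Aborting result (?P<result>\S+): exceeded CPU time limit (?P<limit>[\d.]+)"
)


def parse_cpu_limit_abort(line):
    """Return a dict for an 'exceeded CPU time limit' line, or None for any other line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    abort = ABORT_RE.search(m.group("message"))
    if abort is None:
        return None
    return {
        "timestamp": m.group("timestamp"),
        "project": m.group("project"),
        "result": abort.group("result"),
        "cpu_limit": float(abort.group("limit")),  # presumably seconds
    }


line = ("2005-02-06 17:44:45 [Pirates@Home] Aborting result "
        "wu_1107619403_2545_1: exceeded CPU time limit 762.619719")
print(parse_cpu_limit_abort(line))
```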

[edit] just looked at that particular WU in the results. Saw that the unit was compiled using BOINC 4.56. Is that a server-side compiler or the version I should be using to avoid errors? [/edit]
____________

Profile Wormholio
Captain
Joined: 6 Jun 04
United States
Away
Credit: 4,065.6
RAC: 0.00
Joined: Jun 6, 2004
Verified: Mar 13, 2008
Dubloons: 3
Pieces of Eight: 10
Punishment: Aztec curse
Message 1247 - Posted: 7 Feb 2005 | 14:35:18 UTC - in response to Message 1239.
Last modified: 7 Feb 2005 | 14:36:36 UTC

> Actually when I got all the Pirate WU's yesterday I changed my Resource
> Preferences to 10000 to 1 in favor of Pirates to be sure I was able to run
> them all out in time.

Hmm... I wonder if such a high resource share could have caused the problem.
It should not, but I wonder...

Not that having such a high resource share is bad. In fact it's a good thing that the Pirates are stressing the system in ways more casual users might not think of.

A new version of server and client software is on the way, so we will see if these problems clear up once we update. Stand by...


____________
-- Eric Myers

"Education is not the filling of a pail, but the lighting of a fire." -- William Butler Yeats

STE\/E
Joined: 13 Aug 04
United States
BlackOps
Credit: 42,141.2
RAC: 0.00
Joined: Aug 13, 2004
Verified: Oct 4, 2011
Punishment: Mess Duty
Message 1248 - Posted: 7 Feb 2005 | 15:44:29 UTC

Hmm... I wonder if such a high resource share could have caused the problem.
It should not, but I wonder...
==========

It's never caused a problem at any of the other Projects. It's my way of assuring that only that project runs: once BOINC switches to the Project that is set to 10000, it will only run that Project as long as I have WU's for it. Then BOINC will switch back to the other Projects if I run out, but immediately switch back to the 10000 one if I get any more WU's for it ...

Profile Liberto [Valencia]
Joined: 23 Jul 04
Spain
Astroseti
Credit: 245.3
RAC: 0.00
Joined: Jul 23, 2004
Verified: NEVER
Dubloons: 1
Punishment: Walk Plank
Message 1251 - Posted: 7 Feb 2005 | 16:30:37 UTC - in response to Message 1248.

> Hmm... I wonder if such a high resource share could have caused the problem.
> It should not, but I wonder...
> ==========
>
> It's never caused a problem at any of the other Projects, it's my way of
> assuring that only that project runs, once BOINC switches to the Project that
> is set to 10000 it will only run that Project as long as I have WU's for it.
> Then BOINC will switch back to the other Projects if I run out but immediately
> switch back to the 10000 if I get any more WU's for it ...
>
>
That might be your way of assuring that the program runs, but do not forget that the BOINC CC has an internal clock that determines when it must switch to another project.
If you have 12 units from Pirates and they get crunched in 8 minutes each, your BOINC program will continue until it reaches the 60-minute mark and then switch to another project.
When its turn comes around again, Pirates will take the rest.

I have not had any problems running any of the projects. Even after changing my preferences to 800 for Pirates, I only got the units that the server wanted to send, so I switched back to the even figure of 200 for each project.

It also happens that many people get desperate and continuously try to contact the project by updating... That does not help the CC, which follows its own internal sequence clock, so I would encourage people to be more patient and things will go much better. I am using CC 4.19 and I am involved in all available projects (including continuous attempts by my machine to reach LHC, for when they get ready) and it works OK.


____________

John McLeod VII
Joined: 20 Jul 04
United States
BOINC Synergy
Credit: 3,324.6
RAC: 0.00
Joined: Jul 20, 2004
Verified: NEVER
Message 1261 - Posted: 8 Feb 2005 | 2:57:20 UTC - in response to Message 1251.

> > Hmm... I wonder if such a high resource share could have caused the problem.
> > It should not, but I wonder...
> > ==========
> >
> > It's never caused a problem at any of the other Projects, it's my way of
> > assuring that only that project runs, once BOINC switches to the Project that
> > is set to 10000 it will only run that Project as long as I have WU's for it.
> > Then BOINC will switch back to the other Projects if I run out but immediately
> > switch back to the 10000 if I get any more WU's for it ...
>
> That might be your way of assuring that the program runs, but do not
> forget that the BOINC CC has an internal clock that determines when a unit
> must be switched to another project.
> If you have 12 units from pirates and they get crunched in 8 minutes each,
> your BOINC program will continue until it reaches the 60 minutes mark and then
> switch to another project.
> When its turn comes back again - pirates will take the rest.
>
> I have not had any problems in the runs of all projects and in spite of having
> modified my preferences to 800 in pirates, I got the units that the server
> wanted to send, so I switched back to the even figure of 200 for each
> project.
>
> It also happens that many people get desperate and try continuously to contact
> the project by updating... that does not help the CC process which has an
> internal sequence clock so I would encourage people to be more patient and
> things will go much better. I am using the CC 4.19 and I am involved in all
> available projects - including a continuous machine trial to reach LHC, when
> they get ready - and it works OK.
>
BOINC decides which project to crunch for at several points: when a result is complete, when a WU is downloaded, and 60 minutes (changeable by a setting) after the last decision event. It decides based on the resource debt for each project, and picks the project with the highest debt. The resource debt is calculated from the resource share, the CPU time for each project during the last segment, and the wall time of the last segment. So if Pirates has a resource share 1000 times higher than any other project, then after an hour of crunching for a different project, Pirates should get about 1000 hours of CPU time to finish its work. (I personally use a factor of 20, which seems to be enough.)
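Here is a rough Python sketch of that debt bookkeeping, using the factor of 20. It is not the actual BOINC client code. The update rule is an assumption that fits the description: each project earns its share of the wall time and pays back the CPU time it used. The function names and the second project are just for illustration.

```python
def update_debts(debts, shares, cpu_time, wall_time):
    """Each project earns wall_time times its share fraction and pays the CPU time it used."""
    total = sum(shares.values())
    return {
        name: debts[name] + wall_time * shares[name] / total - cpu_time.get(name, 0.0)
        for name in shares
    }


def pick_project(debts):
    """Choose the project with the highest debt (ties go to the first one listed)."""
    return max(debts, key=debts.get)


def simulate_schedule(shares, segments, segment_minutes=60.0):
    """Run one project per segment and return the CPU minutes each project received."""
    debts = {name: 0.0 for name in shares}
    minutes = {name: 0.0 for name in shares}
    for _ in range(segments):
        running = pick_project(debts)
        minutes[running] += segment_minutes
        debts = update_debts(debts, shares, {running: segment_minutes}, segment_minutes)
    return minutes


# Resource shares of 20 to 1, simulated over 21 one-hour segments.
print(simulate_schedule({"Pirates": 20, "SETI": 1}, segments=21))
```

With these shares, the other project gets one segment out of every 21 and Pirates gets the remaining 20, which matches the ratio of the resource shares.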
____________


BOINC WIKI



Copyright © 2020 Capt. Jack Sparrow