niceness and Linux
I just started running F@H on some dual-Xeon Linux boxes at the office. The thing that troubles me is that even though they are running at nice 19, they are still taking CPU away from other jobs. For example, if I run burnP6 (x2) and F@H (x2), I see the burnP6 jobs only getting about 90ish % of the CPU and F@H about 10ish %. If I stop F@H, the burnP6 jobs get 98ish %. The remainder is X, top, etc.
Is there a way to run F@H in Linux so it truly doesn't take CPU away from other running processes? If not, I might not be able to do this at the office.
In XP at home, I don't see this problem.
However, burnP6 doesn't matter. Even doing an empty while loop in the shell should attempt to grab 100% CPU. If these types of jobs are running at the default niceness of 0, they are quite a bit higher-priority than a nice 19 job such as the F@H client.
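For reference, a minimal sketch of starting a job at nice 19 from a script rather than from the shell. The helper name `run_niced` is made up for illustration, and the child here is a stand-in that only reports its own niceness, not the real F@H client.

```python
import os
import subprocess
import sys


def run_niced(argv, niceness=19, **popen_kwargs):
    """Start argv with its niceness raised by `niceness` (19 is the nicest level)."""
    # preexec_fn runs in the child just before exec, so only the child is reniced.
    return subprocess.Popen(
        argv, preexec_fn=lambda: os.nice(niceness), **popen_kwargs
    )


if __name__ == "__main__":
    # Stand-in child (an assumption for the demo): prints its own niceness.
    child = run_niced(
        [sys.executable, "-c", "import os; print(os.nice(0))"],
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = child.communicate()
    print("child niceness:", out.strip())
```

As the thread goes on to show, this only lowers the job's share of the CPU; it does not take the share to zero.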
Upon more googling, I now see this appears to be a limitation of Linux. Nice 19 jobs will still get some CPU. I'm seeing them get about 6-8% even if there is another normal job wanting 100%. There appear to be discussions about a "SCHED_IDLE" kernel patch, which would allow a certain class of jobs to run only if nothing else wants the CPU, but it appears to have problems (e.g., deadlocks and priority inversion).
My conclusion is that running the Linux client is not as well-behaved as running the XP client in that it cannot completely get out of the way if something else wants the CPU. The upshot of this is I will not be able to run this on CPUs at work. When we run something, we want it to get 100% CPU. I was just hoping to soak up idle cycles.
Bummer. Because I likely had 50 2+ GHz CPUs I could have utilized.
Perhaps the 2.6 kernels support the SCHED_BATCH functionality, but I'm stuck with 2.4 for now for other reasons.
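For machines whose kernel does offer an idle scheduling class (the 2.4 boxes discussed here do not), requesting it might look roughly like the sketch below. It assumes a Linux build of Python whose `os` module exposes `SCHED_IDLE`; the function name `try_idle_policy` is made up, and it simply reports failure where the policy is missing or refused.

```python
import os


def try_idle_policy(pid=0):
    """Ask the kernel to schedule `pid` (0 = this process) under SCHED_IDLE.

    Returns True on success, False if the policy is unavailable or refused.
    """
    policy = getattr(os, "SCHED_IDLE", None)  # absent where the platform lacks it
    if policy is None:
        return False
    try:
        # SCHED_IDLE uses a static priority of 0.
        os.sched_setscheduler(pid, policy, os.sched_param(0))
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("idle policy applied:", try_idle_policy())
```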
The correct interpretation of 'nice' in Linux and Unix-like operating systems is that a process at the nicest level uses less CPU but will always receive some. This 'everything moves forward' rule is required to prevent priority inversion scenarios that could deadlock your system.
I maintain a 2.6 patchset that includes SCHED_BATCH functionality, and I used to maintain a 2.4 patchset that had it too. My current 2.6 patchset is at http://kernel.kolivas.org. I handed the 2.4 patchset over to Eric Hustvedt, who maintains it at http://www.plumlocosoft.com/kernel/. SCHED_BATCH functionality in mainline Linux kernels is still unlikely unless I push much harder for it, which at the moment I have no intention of doing.
Note that true idle scheduling is not common for any operating system, and even the BSDs that support it only make it available to root because of the priority inversion risk it poses. My patch, however, makes it safe for ordinary users to set batch scheduling.
Thanks.
Unfortunately, I doubt I can play with kernel patches for these machines now. (I used to, but things have changed now that our machines are more centrally managed.)
It is too bad that nice 19 isn't given significantly less CPU (say 1% or less). I don't need true idle scheduling. 8% is just unacceptable.
Well in that case you could try an ancient script I created a while ago called idlerun which just watches interrupts and decides if the machine is idle or not and starts/stops applications. http://idlerun.kolivas.org
Are those machines doing anything at night? I've seen people at http://forum.folding-community.org/ report using schedulers or crontab to run F@H outside of office hours.
ckolivas,
I know very little about linux, so bear with me. F@H consists of two processes running at once and interacting with each other, the client and the core. Can idlerun handle this?
Tibors wrote: I know very little about linux, so bear with me. F@H consists of two processes running at once and interacting with each other, the client and the core. Can idlerun handle this?
From memory it pauses the child group so yes it should (I haven't looked at the code in about 4 years though...)
I've written the equivalent of idlerun in the past (called "polite") and may try to resurrect it. The thing is, we have our own custom distributed job system that grabs free cycles on machines (checking for no non-nice processes running, no X input activity, no shell input, ignoring browsers eating CPU due to flash ads, etc.). What I might try to do is integrate F@H SIGSTOP/SIGCONT process group control into the daemon that runs on each machine, since it already has a good idea about whether or not the machine is idle according to our own criteria.
The fact is jobs can pop up 24/7 since we have nocturnal graduate students.
If SCHED_IDLE/SCHED_BATCH were available, I wouldn't have had to do a thing. Oh well, nothing wrong with a little challenge.
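The process group control mentioned above could be sketched as follows. This is only an illustration, not the daemon's code: the function names are made up, and it assumes the job's helper processes (for F@H, the core started by the client) stay in the process group they inherit.

```python
import os
import signal
import subprocess


def start_in_own_group(argv):
    """Launch argv as leader of a new session, so it and its children share one process group."""
    return subprocess.Popen(argv, start_new_session=True)


def pause_group(proc):
    """Suspend every process in the child's group with SIGSTOP."""
    os.killpg(os.getpgid(proc.pid), signal.SIGSTOP)


def resume_group(proc):
    """Let the whole group continue with SIGCONT."""
    os.killpg(os.getpgid(proc.pid), signal.SIGCONT)
```

A daemon that already knows when the machine is busy would call `pause_group` when a real job appears and `resume_group` once the machine is idle again.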
ckolivas wrote: Well in that case you could try an ancient script I created a while ago called idlerun which just watches interrupts and decides if the machine is idle or not and starts/stops applications. http://idlerun.kolivas.org
That sounds interesting. I had also noticed F@H using ~10% CPU even under load. But I had trouble running F@H with the script:
[root@xxxxxxx CPU1]# /home/jan/progs/idlerun-0.21/idlerun -i 1 -c 30 -w -- /home/jan/foldingathome/CPU1/FAH502-Linux.exe -forceasm -advmethods -verbosity 9
Note: Please read the license agreement (FAH502-Linux.exe -license). Further
use of this software requires that you have read and accepted this agreement.
--- Opening Log file [March 22 01:06:00]
# Linux Console Edition #######################################################
###############################################################################
Folding@Home Client Version 5.02
http://folding.stanford.edu
###############################################################################
###############################################################################
Launch directory: /home/jan/foldingathome/CPU1
Executable: /home/jan/foldingathome/CPU1/FAH502-Linux.exe
Arguments: -forceasm -advmethods -verbosity 9
Warning:
By using the -forceasm flag, you are overriding
safeguards in the program. If you did not intend to
do this, please restart the program without -forceasm.
If work units are not completing fully (and particularly
if your machine is overclocked), then please discontinue
use of the flag.
[01:06:00] - Ask before connecting: No
[01:06:00] - User name: JanW (Team 31574)
[01:06:00] - User ID: xxxxxxxxxxxxxxxx
[01:06:00] - Machine ID: 1
[01:06:00]
[01:06:00] Loaded queue successfully.
[01:06:00] + Benchmarking ...
Child not running
Don't mean to turn this into a "idlerun support thread". Just thought I'd give some feedback.
JanW wrote: Don't mean to turn this into a "idlerun support thread". Just thought I'd give some feedback.
Well, I don't mind if the moderators don't, since it is fairly on topic for FAH, and I haven't had any interest in this program for quite a while. I tried it at home with the FAH client and it works fine, but you need to recompile it so that it understands the threading model of your current installation. As a bonus I've updated idlerun with some minor cleanups and changed the default mode to start/stop according to cpu load, so for the most common usage you need only specify
idlerun ./FAH502-Linux.exe
http://members.optusnet.com.au/ckolivas ... .22.tar.gz
I highly recommend you compile it yourself with 'make' and then install it where you want it.
I'll definitely try it, but I've hardly had time to stop by SPCR lately, let alone tinker with F@H. Thanks a lot for your efforts, though. This could be very helpful for me, as I use my CPU mainly in short bursts of data processing, during which I idle myself, waiting for the results.
Re: compiling the source. I know it's preferable to compile, but the previous version had failed to compile on my box. Whatever the reasons, you seem to have fixed them, as v0.22 compiles just fine.
FC3 rpms
There is a "desktop kernel" set of rpms for FC3 which include my cpu scheduler and SCHED_BATCH (idle scheduling) support for those that wish to only use rpm based kernels.
http://apt.bea.ki.se/kernel-desktop/
Ok, here are my first impressions (only tried just now):
First off, it runs! I just used, as you suggested, "idlerun ./FAH502-Linux.exe". But I likely need to fine-tune some parameters. Currently, FAH will regularly produce enough CPU activity to shut itself down (at least that's what I assume): I was observing the output of "top", with FAH sitting at 97-98% and no other CPU-intensive app running. Then, without any apparent reason, FAH stopped, only to reappear some time (1-2 min) later. This seems to happen regularly, and even when the computer is otherwise idle, the time needed to complete a frame is up by about 70% on average compared to FAH without idlerun.
Incidentally, what does the corresponding parameter in idlerun mean?
the idlerun documentation wrote: -h
The average (h)ighmark of CPU usage at which the command will be paused. The default is 1.05 (105%).
What exactly reaches 105% of what?
Any idea on how I can make FAH use idlerun when I start it as a service?
105% CPU demand. If only one application is running and it is fully CPU bound (like FAH), then the CPU demand is 100%. If two are fully CPU bound, the demand slowly rises to 200%. Because the demand is averaged slowly over time, you can't just set the threshold to 200%: the average takes a while to get there. Setting it to 105% means that if only FAH is running, the load should be 100% and it won't be paused, but if _anything_ else tries to run, the load will quickly rise above 105% and FAH will be stopped. If it is being stopped too easily, you can raise this value to something more reasonable, say 135%. At the moment the load has to be below the threshold for at least 1 min before FAH starts again. I can easily change this to seconds instead of minutes. I erred on the side of stopping the task rather than allowing it to run, but code modifications are easy to do.
Probably ideally I should monitor the rate of rise and determine from that instead of the average cpu.
I'm not sure how FAH is set up to run as a service, but wherever the script is that launches it, you can edit it to read 'idlerun -h 1.35 FAHblah' instead of just FAHblah.
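The pause/resume rule described in this post can be sketched as a small state machine. This is an illustration under assumptions, not idlerun's actual code: the class name `LoadGovernor` and its parameters are made up, with the 1.05 highmark and the one-minute quiet period taken from the description above.

```python
class LoadGovernor:
    """Pause when the averaged load exceeds `high`; resume after `quiet` seconds below it."""

    def __init__(self, high=1.05, quiet=60.0):
        self.high = high
        self.quiet = quiet
        self.running = True
        self._below_since = None  # time of the first sample below `high` while paused

    def update(self, load, now):
        """Feed one load sample taken at time `now` (seconds); return True if the job should run."""
        if load > self.high:
            # Something else wants the CPU: stop and restart the quiet timer.
            self.running = False
            self._below_since = None
        elif not self.running:
            if self._below_since is None:
                self._below_since = now
            if now - self._below_since >= self.quiet:
                self.running = True
        return self.running
```

In a real loop, `load` could come from `os.getloadavg()[0]`, and the returned flag could decide whether the job is stopped or continued.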
Great, thank you for that answer, ckolivas. Seems a lot clearer now. I suppose the best timing for restarting the process depends on typical machine usage. For me it would be seconds, but others might prefer your current default. Any chance of making it a command line parameter?
I'll try again with -h 1.5 as I really only care about processes that want lots of CPU for several minutes. It doesn't matter if it takes a while before FAH shuts down, as it's gonna get only 10% CPU during that time anyway. 10% of, say, 2 minutes = 12s, that's no big deal. I'm in a hurry, but not that desperate.
0.23
Ok since it is so easy to modify, here is a slightly newer one:
http://members.optusnet.com.au/ckolivas ... .23.tar.gz
This one has the idle polling interval the same as the busy polling interval (default 10 seconds) and has the high and low watermarks set to 1.37 and 0.37 (the .37 comes from the exponential function related to rate of rise by the way).
It may be worth mentioning that if you are running hyperthreading on Linux, you will want kernel 2.6.7 or later, as this is the first kernel that has 'nice' awareness between hyperthread siblings (which no other operating system has yet).
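One possible reading of where the .37 in those watermarks comes from: if the load figure is an exponential average with a 60-second time constant, then one minute after demand changes, 1/e (about 0.37) of the old value still remains. The sketch below assumes that model; the 5-second update step, the 60-second constant and the helper name `load_after` are illustrative assumptions, not taken from the idlerun source.

```python
import math

TICK = 5.0     # assumed seconds between load updates
WINDOW = 60.0  # assumed time constant of the one-minute average


def load_after(seconds, start, demand):
    """Exponentially averaged load `seconds` after demand steps from `start` to `demand`."""
    decay = math.exp(-TICK / WINDOW)
    load = start
    for _ in range(int(seconds // TICK)):
        load = load * decay + demand * (1.0 - decay)
    return load


if __name__ == "__main__":
    # A lone CPU-bound job stops: the load falls from 1.0 toward 0.
    print(f"{load_after(60, 1.0, 0.0):.2f}")
    # A second CPU-bound job starts: the load climbs from 1.0 toward 2.0.
    print(f"{load_after(60, 1.0, 2.0):.2f}")
```

Under these assumptions the first figure is about 0.37 and the second about 1.63, one minute after the change.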
v0.24
OK, now this version should do what you want without extra parameters. It watches the rate of rise of CPU load and determines what the dynamic CPU load is. It will therefore stop the job faster under heavy load and restart it faster under light load than previously. The default load check interval is 10 seconds, and since the kernel only calculates load every 5 seconds there is not much point going lower than 10s. The watermarks are set to 163% high and 37% low load. I took the opportunity to clean it up a bit more too.
Get it here:
http://members.optusnet.com.au/ckolivas ... .24.tar.gz
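To illustrate what judging by rate of rise could mean, here is a sketch that infers the load the average is heading toward from two consecutive samples. It is not taken from the idlerun source: it assumes an exponential average with a 60-second time constant, and the function name `estimate_demand` and its defaults are made up.

```python
import math


def estimate_demand(prev_load, curr_load, interval=10.0, window=60.0):
    """Infer the steady-state load the average is heading toward from two samples."""
    decay = math.exp(-interval / window)
    # Model: curr = prev * decay + demand * (1 - decay), solved for demand.
    return (curr_load - prev_load * decay) / (1.0 - decay)


if __name__ == "__main__":
    # The average moved from 1.00 to 1.15 in one 10-second check interval.
    print(f"estimated demand: {estimate_demand(1.00, 1.15):.2f}")
```

Comparing this estimate to the watermarks, rather than the slow-moving average itself, is one way a job could be stopped sooner under heavy load and restarted sooner under light load.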