niceness and Linux

ilh · Post by **ilh** » Mon Mar 21, 2005 10:07 am

I just started running F@H on some dual-Xeon Linux boxes at the office. The thing that troubles me is that even though the F@H clients are running at nice 19, they are still taking CPU away from other jobs. For example, if I run burnP6 (x2) and F@H (x2), the burnP6 jobs get only about 90% of the CPU and F@H about 10%. If I stop F@H, the burnP6 jobs get about 98%. The remainder goes to X, top, etc.

Is there a way to run F@H in Linux so it truly doesn't take CPU away from other running processes? If not, I might not be able to do this at the office.

In XP at home, I don't see this problem.

Qwertyiopisme · Post by **Qwertyiopisme** » Mon Mar 21, 2005 10:18 am

Semi-n00b here: are you sure that burnP6 had a higher priority than F@H?

tay · Post by **tay** » Mon Mar 21, 2005 10:26 am

If you post a link to burnP6 I will try it out on my single-CPU box, although I don't know how useful the result will be. The dual-CPU board I have is sitting idle because of extreme laziness.

ilh · Post by **ilh** » Mon Mar 21, 2005 11:48 am

However, burnP6 doesn't matter. Even doing an empty while loop in the shell should attempt to grab 100% CPU. If these types of jobs are running at the default niceness of 0, they are quite a bit higher-priority than a nice 19 job such as the F@H client.

Upon more googling, I now see this appears to be a limitation of Linux. Nice 19 jobs will still get some CPU. I'm seeing them get about 6-8% even if there is another normal job wanting 100%. There appear to be discussions about a "SCHED_IDLE" kernel patch, which would allow a certain class of jobs to run only if nothing else wants the CPU, but it appears to have problems (e.g., deadlocks and priority inversion).

My conclusion is that running the Linux client is not as well-behaved as running the XP client in that it cannot completely get out of the way if something else wants the CPU. The upshot of this is I will not be able to run this on CPUs at work. When we run something, we want it to get 100% CPU. I was just hoping to soak up idle cycles.

Bummer, because I likely had 50 2+ GHz CPUs I could have used.

Perhaps the 2.6 kernels support the SCHED_BATCH functionality, but I'm stuck with 2.4 for now for other reasons.

ckolivas · Post by **ckolivas** » Mon Mar 21, 2005 2:08 pm

The correct interpretation of 'nice' in Linux and unixy operating systems is that a process at the nicest level gets less CPU, but will always receive some CPU. This 'everything moves forward' behaviour is a requirement to prevent weird priority inversion scenarios that could cause deadlocks in your system.

I maintain a 2.6 patchset that includes SCHED_BATCH functionality, and I used to maintain a 2.4 patchset that had it too. My current 2.6 patchset is found here: http://kernel.kolivas.org and my 2.4 patchset I handed over to Eric Hustvedt, who maintains it here: http://www.plumlocosoft.com/kernel/. SCHED_BATCH functionality for mainline Linux kernels is still unlikely unless I push much harder for it, which at the moment I have no intention of doing.

Note that true idle scheduling is not common for any operating system, and even the BSDs that support it only make it available to root because of the priority inversion risk it poses. My patch, however, makes it safe for ordinary users to set batch scheduling.

dukla2000 · Post by **dukla2000** » Mon Mar 21, 2005 2:13 pm

I'm running 2.6.8 (SuSE 9.2) but wouldn't have a clue how to find anything about SCHED_BATCH unless you give me idiot-proof instructions. (And kernel compiles are beyond my courage threshold!)

ps - cancel that: ckolivas obviously does know the score.

ilh · Post by **ilh** » Mon Mar 21, 2005 2:22 pm

Thanks.

Unfortunately, I doubt I can play with kernel patches for these machines now. (I used to, but things have changed now that our machines are more centrally managed.)

It is too bad that nice 19 isn't given significantly less CPU (say 1% or less). I don't need true idle scheduling. 8% is just unacceptable.

ckolivas · Post by **ckolivas** » Mon Mar 21, 2005 2:32 pm

Well, in that case you could try an ancient script I created a while ago called idlerun, which just watches interrupts, decides whether the machine is idle, and starts/stops applications accordingly. http://idlerun.kolivas.org

Tibors · Post by **Tibors** » Mon Mar 21, 2005 3:22 pm

Are those machines doing anything at night? I've seen people at http://forum.folding-community.org/ reporting that they use schedulers or crontab to run F@H outside of office hours.

ckolivas,
I know very little about linux, so bear with me. F@H consists of two processes running at once and interacting with each other, the client and the core. Can idlerun handle this?

ckolivas · Post by **ckolivas** » Mon Mar 21, 2005 3:36 pm

Tibors wrote: I know very little about linux, so bear with me. F@H consists of two processes running at once and interacting with each other, the client and the core. Can idlerun handle this?

From memory it pauses the child group so yes it should (I haven't looked at the code in about 4 years though...)

ilh · Post by **ilh** » Mon Mar 21, 2005 5:06 pm

I've written the equivalent of idlerun in the past (called "polite") and may try to resurrect it. The thing is, we have our own custom distributed job system that grabs free cycles on machines (its idle criteria: no non-nice processes running, no X input activity, no shell input, ignoring browsers eating CPU due to Flash ads, etc.). What I might try to do is integrate F@H SIGSTOP/SIGCONT process group control into the daemon that runs on each machine, since it already has a good idea about whether or not the machine is idle according to our own criteria.
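For readers who want to try the SIGSTOP/SIGCONT approach described above, here is a minimal Python sketch. It is not ilh's "polite" program or idlerun; the function names and the `machine_is_idle` callback are hypothetical placeholders for whatever idle criteria a site uses. The only real calls are `subprocess.Popen`, `os.getpgid` and `os.killpg`. Starting the client in its own session makes it a process group leader, so a core process spawned by the client is paused and resumed together with it.

```python
import os
import signal
import subprocess
import time


def start_in_own_group(argv):
    """Launch argv in a new session so it leads its own process group."""
    return subprocess.Popen(argv, start_new_session=True)


def pause_group(proc):
    """Send SIGSTOP to every process in proc's process group."""
    os.killpg(os.getpgid(proc.pid), signal.SIGSTOP)


def resume_group(proc):
    """Send SIGCONT to every process in proc's process group."""
    os.killpg(os.getpgid(proc.pid), signal.SIGCONT)


def supervise(proc, machine_is_idle, poll_seconds=10.0):
    """Pause the group whenever the machine is busy, resume it when idle.

    machine_is_idle is a hypothetical callable returning True or False;
    it stands in for site-specific checks (other jobs, X input, ...).
    The loop ends when the supervised process exits.
    """
    paused = False
    while proc.poll() is None:
        idle = machine_is_idle()
        if idle and paused:
            resume_group(proc)
            paused = False
        elif not idle and not paused:
            pause_group(proc)
            paused = True
        time.sleep(poll_seconds)
```

A stopped process uses no CPU at all, which is what nice 19 cannot guarantee, but it still holds its memory, and a client that is stopped for a long time may miss work-unit deadlines.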

The fact is jobs can pop up 24/7 since we have nocturnal graduate students.

If SCHED_IDLE/SCHED_BATCH were available, I wouldn't have had to do a thing. Oh well, nothing wrong with a little challenge.

JanW · Post by **JanW** » Mon Mar 21, 2005 5:31 pm

ckolivas wrote: Well in that case you could try an ancient script I created a while ago called idlerun which just watches interrupts and decides if the machine is idle or not and starts/stops applications. http://idlerun.kolivas.org

That sounds interesting. I had also noticed F@H's CPU usage of ~10% even under load. But I had trouble running F@H with the script:

[root@xxxxxxx CPU1]# /home/jan/progs/idlerun-0.21/idlerun -i 1 -c 30 -w -- /home/jan/foldingathome/CPU1/FAH502-Linux.exe -forceasm -advmethods -verbosity 9

Note: Please read the license agreement (FAH502-Linux.exe -license). Further
use of this software requires that you have read and accepted this agreement.



--- Opening Log file [March 22 01:06:00]


# Linux Console Edition #######################################################
###############################################################################

                       Folding@Home Client Version 5.02

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/jan/foldingathome/CPU1
Executable: /home/jan/foldingathome/CPU1/FAH502-Linux.exe
Arguments: -forceasm -advmethods -verbosity 9

Warning:
 By using the -forceasm flag, you are overriding
 safeguards in the program. If you did not intend to
 do this, please restart the program without -forceasm.
 If work units are not completing fully (and particularly
 if your machine is overclocked), then please discontinue
 use of the flag.

[01:06:00] - Ask before connecting: No
[01:06:00] - User name: JanW (Team 31574)
[01:06:00] - User ID: xxxxxxxxxxxxxxxx
[01:06:00] - Machine ID: 1
[01:06:00]
[01:06:00] Loaded queue successfully.
[01:06:00] + Benchmarking ...

Child not running

I tried various options, quotes around the command, backgrounding the process... the result is always the same. The "Child not running" is output exactly 1min after I launched the command. This is on a (custom) 2.6.5 Kernel, Mandrake 10.1.

Don't mean to turn this into an "idlerun support thread". Just thought I'd give some feedback.

ckolivas · Post by **ckolivas** » Tue Mar 22, 2005 2:51 am

JanW wrote: Don't mean to turn this into a "idlerun support thread". Just thought I'd give some feedback.

Well, I don't mind if the moderators don't, since it is fairly on topic for FAH, and I haven't had any interest in this program for quite a while. I tried it at home with the FAH client and it works fine, but you need to recompile it so that it understands the threading model of your current installation. As a bonus, I've updated idlerun with some minor cleanups and changed the default mode to start/stop according to CPU load, so for the most common usage you need only specify:

idlerun ./FAH502-Linux.exe

Here is the url of the updated special SPCR version:
http://members.optusnet.com.au/ckolivas ... .22.tar.gz

I highly recommend you compile it yourself with 'make' and then install it where you want it.

ckolivas · Post by **ckolivas** » Wed Mar 30, 2005 3:01 am

Did anyone try this? I coded it up just for spcr...

Tibors · Post by **Tibors** » Wed Mar 30, 2005 3:55 am

I didn't try it yet. For some reason or other I keep postponing setting up a linux server. As soon as I have that server I will certainly try.

ilh · Post by **ilh** » Wed Mar 30, 2005 6:42 am

I'm sorry, but I got my own similar program working that interacts properly with our distributed job system. It's been working great for several days on 12 CPUs.

StarfishChris · Post by **StarfishChris** » Wed Mar 30, 2005 7:24 am

I'll try it later tonight when I reboot from Windows.

JanW · Post by **JanW** » Wed Mar 30, 2005 8:38 am

I'll definitely try it, but I've hardly had time to stop by SPCR lately, let alone tinker with F@H. Thanks a lot for your efforts, though. This could be very helpful for me, as I use my CPU mainly in short bursts of data processing, during which I idle myself, waiting for the results.

Re: compiling the source. I know it's preferable to compile, but the previous version had failed to compile on my box. Whatever the reasons, you seem to have fixed them, as v0.22 compiles just fine.

ckolivas · Post by **ckolivas** » Thu Apr 07, 2005 8:09 pm

There is a "desktop kernel" set of rpms for FC3 which include my cpu scheduler and SCHED_BATCH (idle scheduling) support for those that wish to only use rpm based kernels.
http://apt.bea.ki.se/kernel-desktop/

JanW · Post by **JanW** » Fri Apr 08, 2005 8:28 am

Ok, here are my first impressions (only tried just now):
First off, it runs! I just used, as you suggested, "idlerun ./FAH502-Linux.exe". But I likely need to fine-tune some parameters. Currently, FAH regularly produces enough CPU activity to shut itself down (at least that's what I assume): I was observing the output of "top", with FAH sitting at 97-98% and no other CPU-intensive app running. And then, without any apparent reason, FAH stopped, only to reappear some time (1-2 min) later. This seems to happen regularly, and even when the computer is otherwise idle, the time needed to complete a frame is up by about 70% on average compared to FAH without idlerun.

Incidentally, what does the corresponding parameter in idlerun mean?

the idlerun documentation wrote:-h
The average (h)ighmark of CPU usage at which the command will be paused. The default is 1.05 (105%).

What exactly reaches 105% of what?

Any idea on how I can make FAH use idlerun when I start it as a service?

ckolivas · Post by **ckolivas** » Fri Apr 08, 2005 8:46 am

105% CPU demand. If only one application is running and it is fully CPU bound (like FAH), then the CPU demand is 100%. If two are fully CPU bound, then the demand slowly rises to 200%. Because the CPU demand is averaged over time, you can't just set the highmark to 200%: the average only slowly rises to 200%. Setting it to 105% means that if only FAH is running, the load should be 100% and it won't be paused, but if _anything_ else tries to run, the load will quickly rise above 105% and FAH will be shut down. If it is being stopped too easily, you can raise this value to something more reasonable, say 135%. At the moment the load has to be below the threshold for at least 1 min before FAH starts again. I can easily change this to seconds instead of minutes. I erred on the side of stopping the task rather than allowing it to run, but code modifications are easy to do.
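To make the "slowly rises to 200%" behaviour concrete, here is a small Python sketch. It models the Linux 1-minute load average, which the kernel updates every 5 seconds as an exponentially decaying average. Whether idlerun reads exactly this value is an assumption, and the function names are illustrative rather than taken from idlerun.

```python
import math

TICK_SECONDS = 5.0     # the kernel recomputes the load average every 5 s
WINDOW_SECONDS = 60.0  # time constant of the 1-minute load average


def step(load, runnable, dt=TICK_SECONDS):
    """Advance the averaged load by one tick toward `runnable` tasks."""
    decay = math.exp(-dt / WINDOW_SECONDS)
    return load * decay + runnable * (1.0 - decay)


def seconds_until(threshold, start_load, runnable, max_seconds=3600.0):
    """Seconds of 5 s ticks until the average exceeds threshold, else None."""
    if max(start_load, runnable) <= threshold:
        return None  # the average can never climb past the threshold
    load, elapsed = start_load, 0.0
    while load <= threshold and elapsed < max_seconds:
        load = step(load, runnable)
        elapsed += TICK_SECONDS
    return elapsed if load > threshold else None


if __name__ == "__main__":
    # FAH alone gives a load of 1.0; a second CPU-bound job makes 2 runnable.
    for highmark in (1.05, 1.35, 2.0):
        print(highmark, seconds_until(highmark, 1.0, 2))
```

In this model a highmark of exactly 2.0 is never exceeded by two CPU-bound tasks, which is the reason given above for not setting the threshold to 200%.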

Probably ideally I should monitor the rate of rise and determine from that instead of the average cpu.

I'm not sure how FAH is set up to run as a service, but wherever the script is that launches it, you can edit it to read 'idlerun -h 1.35 FAHblah' instead of just FAHblah.

JanW · Post by **JanW** » Fri Apr 08, 2005 12:17 pm

Great, thank you for that answer, ckolivas. It seems a lot clearer now. I suppose the best timing for restarting the process depends on typical machine usage. For me it would be seconds, but others might prefer your current default. Any chance of making it a command line parameter?

I'll try again with -h 1.5 as I really only care about processes that want lots of CPU for several minutes. It doesn't matter if it takes a while before FAH shuts down, as it's gonna get only 10% CPU during that time anyway. 10% of, say, 2 minutes = 12s, that's no big deal. I'm in a hurry, but not that desperate.

ckolivas · Post by **ckolivas** » Fri Apr 08, 2005 4:11 pm

Ok since it is so easy to modify, here is a slightly newer one:
http://members.optusnet.com.au/ckolivas ... .23.tar.gz.

This one has the idle polling interval the same as the busy polling interval (default 10 seconds) and has the high and low watermarks set to 1.37 and 0.37 (the .37 comes from the exponential function related to rate of rise by the way).

It may be worth mentioning that if you are running hyperthreading on Linux, you will want kernel 2.6.7 or later, as this is the first kernel that has 'nice' awareness between hyperthread siblings (which no other operating system has yet).

ckolivas · Post by **ckolivas** » Fri Apr 08, 2005 5:20 pm

What the heck I may as well do it properly.

Hang in there and I'll make one that watches the rate of rise or fall and decides what the load will plateau at.

ckolivas · Post by **ckolivas** » Fri Apr 08, 2005 7:02 pm

Ok, now this version should do what you want without extra parameters. It watches the rate of rise of CPU load and determines what the dynamic CPU load is. It will therefore stop FAH faster under heavy load and restart it faster under light load than previously. The default load check interval is 10 seconds, and since the kernel only calculates load every 5 seconds there is not much point going lower than 10s. The watermarks are set to 163% high and 37% low load. I took the opportunity to clean it up a bit more too.

Get it here:
http://members.optusnet.com.au/ckolivas ... .24.tar.gz
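The idea of predicting where the load will plateau from its rate of rise can be sketched as follows. This is an illustration, not the code in idlerun 0.24: it assumes the load follows the kernel's 1-minute exponential average, and the names `estimate_plateau` and `decide` are made up for the example. Two samples taken 10 seconds apart are enough to solve the averaging formula for the load being approached. The defaults 1.63 and 0.37 are the watermarks quoted in the post; they match 1 + (1 - e^-1) and e^-1.

```python
import math


def estimate_plateau(prev_load, curr_load,
                     interval_seconds=10.0, window_seconds=60.0):
    """Infer the steady-state load that two consecutive samples approach.

    Inverts curr = prev * d + target * (1 - d), with d = exp(-interval/window).
    """
    decay = math.exp(-interval_seconds / window_seconds)
    return (curr_load - prev_load * decay) / (1.0 - decay)


def decide(plateau, running, high=1.63, low=0.37):
    """Hysteresis: pause above the high mark, resume below the low mark."""
    if running and plateau > high:
        return "pause"
    if not running and plateau < low:
        return "resume"
    return "keep"


if __name__ == "__main__":
    # FAH alone (load 1.0), then a second CPU-bound job starts.
    d = math.exp(-10.0 / 60.0)
    next_sample = 1.0 * d + 2.0 * (1.0 - d)
    plateau = estimate_plateau(1.0, next_sample)
    print(round(plateau, 3), decide(plateau, running=True))
```

With FAH paused and the other job still running, the estimate settles near 1.0, which is above the low mark, so FAH stays paused until the other job finishes and the estimate drops toward 0.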

JanW · Post by **JanW** » Fri Apr 08, 2005 11:29 pm

I'm trying it out as we speak and will report back as soon as I have some info on how it performs. Thanks a lot for coding this up for us!!!!!
