-advmethods on AMD now a mixed bag

haysdb · Post by **haysdb** » Fri Apr 23, 2004 11:36 am

My experience with my Athlon systems since the new points system went into effect has been decidedly mixed.

They like Tinkers (>1K PPW), so remove -advmethods.

However, they seem to like DGromacs (Double Gromacs) even more (>1300 PPW!), so put -advmethods back on.

Flip a coin.

The irony is, regular Gromacs used to be the most desirable work units, and now they are perhaps the LEAST desirable, at least on fast Athlon XPs.

David

isp · Post by **isp** » Fri Apr 23, 2004 11:57 am

So things are still not balanced then?

haysdb · Post by **haysdb** » Fri Apr 23, 2004 12:25 pm

isp wrote: So things are still not balanced then?

Not necessarily. Keep in mind that the benchmark machine is an Intel Pentium P4 with SSE2 disabled. Other architectures (P3, P4M, AXP, A64, G4, G5) will perform better or worse on Tinkers vs Gromacs vs Double Gromacs, because of the relative strengths of those architectures.

It looks to me like the strength of AMD's FPU (Floating Point Unit) is strutting its stuff on the Tinkers and DGromacs.

One irony of the benchmark machine is that it is a P4 with SSE2 disabled, but all P4s HAVE SSE2, so NOBODY has a machine that will exactly mirror the performance of the benchmark machine. In particular, SSE2 puts the afterburners to the Double Gromacs. One of my P4 machines is showing 1870 PPW on a DGro.

Keep -advmethods on all of your P4s.

It's less clear-cut on Athlon XP. Without -advmethods, you will tend to get more Tinkers, which is good. With -advmethods you will get some DGromacs, which are ALSO good, even better in fact, than Tinkers, but it remains to be seen whether there will be as many DGros as there are Tinkers. At this point, it doesn't seem to matter. You will get different work units, but the end result (as measured in PPD/PPW) can be expected to be about the same.
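To put rough numbers on that trade-off, here is a minimal sketch in Python. The Tinker and DGromacs PPW values are only the ballpark figures quoted earlier in this thread (about 1000 and about 1300); the Gromacs figure and both time splits are made-up placeholders, not measurements.

```python
def blended_ppw(time_share, ppw_by_type):
    """Average PPW when a machine splits its time across WU types.

    time_share:  fraction of wall-clock time spent on each WU type (sums to 1).
    ppw_by_type: points per week earned while working on that WU type.
    """
    total = sum(time_share.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("time shares must sum to 1")
    return sum(share * ppw_by_type[wu] for wu, share in time_share.items())


# Ballpark Athlon XP figures from this thread; the Gromacs value is a placeholder.
ATHLON_PPW = {"Tinker": 1000.0, "DGromacs": 1300.0, "Gromacs": 700.0}

# Hypothetical time splits: more Tinkers without the flag, some DGromacs with it.
without_flag = {"Tinker": 0.6, "Gromacs": 0.4}
with_flag = {"Tinker": 0.2, "DGromacs": 0.3, "Gromacs": 0.5}

print(round(blended_ppw(without_flag, ATHLON_PPW)))
print(round(blended_ppw(with_flag, ATHLON_PPW)))
```

With these invented splits the two results land within about 10% of each other, which is the "about the same" situation described above. Different splits would of course tip it either way.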

Do keep in mind though, it's still early in the game, and the number of data points is still small. For example, I have had just TWO (Edit: now 3) DGromacs on my Athlon systems.

David

ColdFlame · Post by **ColdFlame** » Fri Apr 23, 2004 1:59 pm

The controversy is that a particular CPU can be vastly superior on a particular WU if it supports a particular instruction set (like SSE). If it does not then it runs in "compatibility mode" which is much slower. Like running Windows under Linux or something. Can't be good. And since I'm forced to do that (because of the assignment server) I'm only getting frustrated when my CPU isn't completely utilized.

This is more a philosophy but it seems that Stanford has 2 choices for assigning point values:
1) based on how much science was done - favors Stanford
2) based on CPU time * "CPU speed" spent - favors users

It seems that prior to the point change they were using option 1 which was favoring Gromacs that were using all CPU resources (SSE, etc.) as opposed to Tinkers that weren't. So Gromacs gave you more PPW.

Now it seems they changed it to option 2 but because "CPU speed" is a very subjective thing it does not seem like it is very successful. For example, my 2200+ vastly outperforms my 3600+ if it gets a nice Tinker. Looks like a lottery.
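The lottery effect can be sketched in Python. This is only an illustration under stated assumptions: a WU's points are taken to be proportional to how long it runs on the benchmark machine, and the constant `POINTS_PER_BENCHMARK_DAY` and every timing below are hypothetical, not Stanford's actual figures.

```python
POINTS_PER_BENCHMARK_DAY = 100.0  # hypothetical constant, not the real value


def wu_points(benchmark_days):
    """Points for a WU, assumed proportional to its benchmark run time."""
    return POINTS_PER_BENCHMARK_DAY * benchmark_days


def ppw(benchmark_days, my_days):
    """Points per week if this machine ran nothing but WUs like this one."""
    if my_days <= 0:
        raise ValueError("my_days must be positive")
    return wu_points(benchmark_days) / my_days * 7.0


# Two hypothetical WUs that both take 2 days on the benchmark machine.
# On "my" CPU one finishes in 1 day, the other in 2 days.
favoured_ppw = ppw(benchmark_days=2.0, my_days=1.0)
unfavoured_ppw = ppw(benchmark_days=2.0, my_days=2.0)

print(favoured_ppw, unfavoured_ppw)
```

Both WUs are worth the same points, but the one my architecture happens to be good at pays twice the PPW. That is why the same box can look great or poor depending on what it is handed.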

IMHO they might define classes of machines like:
1) basic CPU machines
2) CPUs with SSE
3) CPUs with SSE2

And assign WUs that utilize specific instruction sets to machines that support them, for example DGromacs would go only to P4s. Then, they can measure performance of a specific WU on a specific architecture and assign a point value. (They'd still need to define tying references between different architectures).

So Tinkers will only go to non-SSE machines and will be benchmarked on such machines and get points assigned.
Regular Gromacs will only go to Athlons and DGromacs will go to P4s. Nothing will go to Macs because we need to pass them.

Otherwise it will be exactly like what we had before, except that Tinkers will become the desirable WUs.
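The class-based mapping proposed in this post can be written down as a small Python sketch. The feature names, `cpu_class`, `WU_FOR_CLASS` and `assign` are all hypothetical, and nothing here reflects how the real assignment server works.

```python
def cpu_class(features):
    """Classify a CPU by the best instruction set it reports (hypothetical)."""
    if "sse2" in features:
        return "sse2"
    if "sse" in features:
        return "sse"
    return "basic"


# Strict one-to-one mapping from the proposal: each class gets one WU type.
WU_FOR_CLASS = {
    "basic": "Tinker",    # non-SSE machines
    "sse": "Gromacs",     # e.g. Athlon XP (SSE but no SSE2)
    "sse2": "DGromacs",   # e.g. P4
}


def assign(features):
    """Return the single WU type this CPU would be allowed to receive."""
    return WU_FOR_CLASS[cpu_class(features)]


print(assign({"mmx"}))
print(assign({"mmx", "sse"}))
print(assign({"mmx", "sse", "sse2"}))
```

Each WU type would then be benchmarked on a machine of its own class, which is where the need for a tying reference between classes comes from.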

haysdb · Post by **haysdb** » Fri Apr 23, 2004 2:15 pm

ColdFlame,

If everyone would remove -advmethods and let the assignment servers determine who gets what, it would work a little more like this, although not completely in the way you suggest since, if I am understanding you correctly, your way would require multiple benchmark machines, one for each "class" of computer.

I more-or-less suggested something like this when I asked why DGromacs were benchmarked on a P4 with SSE2 disabled, IF these WUs were only being assigned to processors WITH SSE2. The answer I got is these WUs are NOT being assigned ONLY to processors with SSE2, but it begs the question: "Why not?" Why not benchmark these projects on an SSE2-capable CPU, and then assign them ONLY to SSE2-capable CPUs?

I think the problem is, the assignment server logic is already devilishly complicated, from what I read, and it would only get worse if a whole bunch of new rules were added, limiting the AS to what kinds of clients it could assign each kind of work to.

David

bcassell · Post by **bcassell** » Fri Apr 23, 2004 2:36 pm

Another thing to think about is that there just might not be enough SSE2 capable machines to handle all the double gromacs work units. I'm not sure what the schedules are for any given protein, but maybe they would like to be able to just hand out a bunch of double gromacs to everyone for a few days to get them all done sooner. In that case, there will be a huge discrepancy between machines with SSE2 and those without. So what do they do? Hand out differing points for the same protein based on the capabilities of the machine that computed it? Most people wouldn't consider that "fair". So, it would seem that rather than drastically lower the point values of these proteins for everyone who does not have SSE2, they decided to drastically increase the point values for those of us who are lucky enough to have SSE2.

With that said, it does seem that the assigning of WUs leaves something to be desired. I understand if they need to assign double gromacs to machines without SSE2 in order to get them done, but why then does my P4 not have ONLY double gromacs? I understand assignment is a very complicated process, and what proteins get handed out are based on what proteins have been recently submitted, but it seems to me that a refactoring of the assignment process could lead to a much more efficient processing of WUs all around. I mean, the point of FAH (from Stanford's perspective) is not to gain the most points, but to get the most work done. And in that respect, gromacs should always go to machines with SSE, and double gromacs should always go to machines with SSE2. This is unless, of course, there aren't enough of those machines to process the work at hand. Even though I'm sure it's a large undertaking, it seems to me that a refactoring of the way work units are assigned could bring large payoffs to Stanford in terms of the total amount of work that gets done.
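A rough Python sketch of that idea: prefer the WU type that best matches the machine, and fall back to whatever is left when the preferred type has run out. The preference orders, the queue counts and the function name are assumptions made up for illustration. The real assignment logic is not public in this thread.

```python
# Hypothetical preference order per CPU class, best match first.
PREFERENCE = {
    "sse2": ["DGromacs", "Gromacs", "Tinker"],
    "sse": ["Gromacs", "Tinker", "DGromacs"],
    "basic": ["Tinker", "Gromacs", "DGromacs"],
}


def assign_with_fallback(cpu_class, available):
    """Pick the best-matching WU type that still has work queued.

    available maps WU type -> number of unassigned WUs; it is updated
    in place. Returns None when no work of any type is left.
    """
    for wu in PREFERENCE[cpu_class]:
        if available.get(wu, 0) > 0:
            available[wu] -= 1
            return wu
    return None


# Made-up queue: only one DGromacs left, no regular Gromacs, two Tinkers.
queue = {"DGromacs": 1, "Gromacs": 0, "Tinker": 2}

first = assign_with_fallback("sse2", queue)   # best match is still available
second = assign_with_fallback("sse2", queue)  # DGromacs gone, falls back
print(first, second, queue)
```

Even with a "correct" preference order, the second SSE2 machine ends up on a Tinker simply because of what was in the queue when it asked.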

Ok, I realize I kind of went off on a tangent there, oh well, sorry =P

Bryan

ColdFlame · Post by **ColdFlame** » Fri Apr 23, 2004 3:13 pm

You are both right that there is a "real life" complication that there aren't always enough WUs of a given type and not enough PCs of a given type to do a "proper" mapping. Hence we are stuck with Tinkers running on P4s with SSE2 producing 3x less "science" per time interval.

mas92264 · Post by **mas92264** » Fri Apr 23, 2004 4:56 pm

I don't have any dgromacs on my amd boxes and some of my sse2 boxes are working on single gromacs. I've got more singles on my sse2 boxes than doubles right now.

Rarely do I get a Tinker on my Intel folders. Likely it's all down to timing, when your box asks for a new wu, there may be nothing available on that server but the "wrong wu" for the requesting processor.

Or, it could be part of a vast left-wing conspiracy, funded by the tri-laterals.

M