Rybka 2

Rybka 2.0 Beta (multi-processor only)

June 10, 2006

Rybka 2.0 Beta is the first release from the Rybka 2 series. Customers who purchase this Rybka version will receive free upgrades until November 1, 2006.

The full Rybka 2 schedule is as follows:

June 10: Rybka 2.0 Beta (multi-processor only)

July 1: Rybka 2.1 (multi-processor and single-processor)

Sep 1: Rybka 2.2 (multi-processor and single-processor)

Nov 1: Rybka 2.3 (multi-processor and single-processor)

Owners of single-processor machines should wait until July 1 to make their purchase.

FAQ

I am using Rybka 1.2. Should I upgrade to Rybka 2.0 Beta?

Only if you have a system with multiple processors and are comfortable using our software in the early stages of its life cycle. Otherwise, please wait until the Rybka 2.1 release on July 1.

How much does Rybka 2 cost?

The multi-processor version costs 59 Euro. The single-processor version will cost 34 Euro.

Why is this version labeled as a Beta?

This is the first public release of the multi-processor version of Rybka. It’s inevitable that some issues will pop up. If you prefer to work only with mature, 100% reliable software, we suggest that you wait until July 1 for Rybka 2.1.

Aside from multi-processor support, what else is improved since Rybka 1.2?

Very little. Some smaller bugs were fixed, and a few minor improvements were made. The playing strength increase on one-processor machines is probably in the range of 5 to 10 rating points.

I have only a single-processor system. What should I do?

You should wait until July 1, when Rybka 2.1 is released.

What can I expect from Rybka 2.1?

I plan to continue the work on the multi-processor implementation, and to also finally return to work on the evaluation and search.

I am a tester. Should I include Rybka 2.0 Beta on my rating list?

I will be honored if you do so, but please be warned that the multi-processor implementation is still undergoing changes. It is very likely that there will be a few additional releases in June which impact the multi-processing performance. If you prefer to test only major versions, it may make sense to wait until July 1 – by this time, the dust should be settled.

I have an AMD dual-core system. Will this benefit from the multi-processor support?

Yes. From a software point of view, multi-core systems behave (roughly) the same as multi-processor systems, and benefit (roughly) the same amount from the additional cores. I use the terms “core” and “processor” interchangeably throughout this document.

How much stronger is Rybka on multi-processor systems?

As a first approximation, every time you double the number of processors, the engine level increases by approximately 50 rating points. For an elaboration, see the section below.
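As a rough illustration of this rule of thumb, the Python sketch below turns a processor count into an estimated rating gain. Only the figure of 50 points per doubling comes from the text. The function name and the use of a base-2 logarithm for counts that are not powers of two are assumptions made for the example.

```python
import math

# Rule of thumb from the text: each doubling of the processor count
# is worth approximately 50 rating points.
ELO_PER_DOUBLING = 50.0


def estimated_elo_gain(processors: int) -> float:
    """Approximate rating gain over a single processor (hypothetical helper)."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    # log2 counts the number of doublings; applying it between powers
    # of two is an assumption, not something the text states.
    return ELO_PER_DOUBLING * math.log2(processors)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "processors:", round(estimated_elo_gain(n)), "points")
```

This is a first approximation only, so the output should be read as a ballpark figure rather than a measured result.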

When exactly can I buy this version?

The version itself will be released at midnight on June 10. The various ordering options are still being set up (on this day) and some of them may not be functional until a day or two after June 10. If you were a Rybka 1 customer, you can send me an email with your intention to pay and I will send you the version. Non-Rybka 1 customers may need to be a bit patient.

How will I receive my copy of Rybka 2.0 Beta, as well as the free upgrades to Rybka 2.1, Rybka 2.2 and Rybka 2.3?

You will receive download links via email. Please note that we will not be using a Yahoo group this time, as we did with Rybka 1.

Is there any copy protection?

Not for the Rybka 2.0 Beta version. Personally, I think it’s a waste of everybody’s time, but I reserve the right to change this policy and add copy protection to later releases from the Rybka 2 series. (The reasons for this will be clear later.)

Multi-processing Implementation

If you are a simple user, then take the Rybka 2.0 Beta executable, install it as you would install any other UCI engine, and use it as you would use any other UCI engine. The utilization of the additional processors is seamless.

In my experience, owners of multi-core systems can be quite serious about hardware and performance. If this describes you, and you want to understand a bit better what happens under the hood and what the performance implications are, then read the following FAQ.

I’ll start with a few softballs and go from there.

Is multi-processing easy or hard?

Getting a multi-processor version of a chess engine running with a decent speedup is not that hard. Getting past various impediments to performance is potentially an endless job.

Is multi-processing performance relevant?

There are two points to make here.

One is that experts project that, by the end of 2007, as many as 70% of new computer systems will have more than one core. This is the obvious point.

The less obvious point is that flaws in a parallel implementation magnify exponentially as the number of cores in the system increases. When running on a two-core system, even a pretty weak implementation is not that far from the optimal. As the number of cores increases, a good parallel implementation diverges further from a bad one. More on this below.

What speedup do you get from your multi-processor implementation?

Ok, this is where the fun begins. The implementation which is released as the default in Rybka 2.0 Beta gives the following speedups:

2 Processors: 1.7

4 Processors: 2.8

8 Processors: 4.4

Here, speedup is defined as the effective increase in speed. In other words, if you play a match between Rybka on two processors and Rybka on one processor, and give the one-processor version 1.7 times as much time (with ponder off), the match will be equal.

Since the released implementation is fairly safe, there is not really much doubt about the above values. They are certainly not inaccurate by more than 5% or so. In the case of more ambitious implementations, where the shape of the search tree of the multi-processor version starts to differ more significantly from the shape of the search tree of the single-processor version, giving true effective speedups is trickier. In such cases, it is probably necessary to measure actual playing strength to be really sure.

How much better would a perfect multi-processing implementation perform compared to the above values?

Here, “perfect” is defined as an implementation which gives a speedup of 2.0 on two cores, 4.0 on four cores, etc. It is (probably) not quite possible to achieve this. The question is relevant for us because this represents an upper bound on potential multi-processing performance.

The following table shows how actual Rybka performance diverges from this theoretical upper bound as the number of cores increases.

# of cores	Actual speedup	Speedup upper bound	Upper bound / Actual	Elo difference between upper bound and actual
2	1.7	2.0	1.17	~11.9
4	2.8	4.0	1.42	~29.4
8	4.4	8.0	1.81	~56.7

The values in the last column were computed using the formula ((upper bound speedup) / (actual speedup) - 1.0) * 70, with the ratio taken as printed in the fourth column (cut off after two decimal places). In other words, they are based on the assumption that a doubling in speed gives 70 rating points.
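The table can be reproduced with a few lines of Python. This is a minimal sketch: the speedups and the 70-point constant are taken from the text, while the names `ACTUAL_SPEEDUP` and `elo_gap` are made up for the example. Cutting the ratio off after two decimal places is what makes the results match the printed values (for example, 0.17 * 70 = 11.9).

```python
import math

# Assumption stated in the text: a doubling in speed is worth 70 rating points.
ELO_PER_SPEED_DOUBLING = 70.0

# Speedups quoted in the text for the default implementation.
ACTUAL_SPEEDUP = {2: 1.7, 4: 2.8, 8: 4.4}


def elo_gap(cores: int, actual_speedup: float) -> float:
    """Rating difference between a perfect speedup and the actual one."""
    upper_bound = float(cores)  # "perfect" means a speedup equal to the core count
    ratio = upper_bound / actual_speedup
    ratio = math.floor(ratio * 100) / 100  # cut off after two decimals, as in the table
    return (ratio - 1.0) * ELO_PER_SPEED_DOUBLING


if __name__ == "__main__":
    for cores, speedup in ACTUAL_SPEEDUP.items():
        print(cores, speedup, float(cores), round(elo_gap(cores, speedup), 1))
```

Without the cut-off step, the same formula gives slightly higher values (about 12.4, 30.0 and 57.3), so the conclusion does not depend on it.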

The point is that for the two-core systems which are common today, the multi-processing implementation in Rybka 2.0 Beta is already close enough to perfect that more work would not be productive. For bigger systems, however, as we are likely to see in the future, further improvement will pay off.

So, are you planning to improve the implementation?

Yes. In fact, the work is already continuing - the released implementation is simply the safest one I have at the moment. If you have a system with 4 or more cores, you can still expect improved performance.

Why are the nodes per second rates of the multi-processor version so low?

The nodes searched by the child processes are not counted. Anyway, node counts do not reflect true multi-processing efficiency and are not relevant to measuring multi-processor performance.

Why are you using processes rather than threads?

I am still experimenting with this, but for now, it is safer, especially when using NUMA (non-uniform memory access) systems. Processes do their own memory allocation, and we are guaranteed that the system allocates this memory in the correct (local) segment.

One nice consequence of this is that you can observe Rybka using various system measurement tools such as the task manager.

When I allocate a hash size of 512 MB, the task manager shows that each of the Rybka processes is using 512 MB. Is this correct?

No – this is a bug (or odd feature) in the task manager. There is only one hash table, which is shared between the Rybka processes.

In other words, you can allocate roughly half of your RAM for the hash table without overflowing your available RAM, despite the task manager’s indications to the contrary.
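The idea of one table shared by several processes can be sketched in Python with the standard `multiprocessing.shared_memory` module. This is only an illustration of the concept and says nothing about how Rybka itself sets up its hash table. The function name is hypothetical, and for brevity both handles are opened in a single process rather than in separate ones.

```python
from multiprocessing import shared_memory


def write_and_read_shared(size_bytes: int = 1024) -> int:
    """Write through one handle to a shared block and read through another."""
    # The "owner" creates the block; think of it as the single hash table.
    owner = shared_memory.SharedMemory(create=True, size=size_bytes)
    try:
        # A second handle attaches to the same block by name, the way
        # another process would. No second copy of the data is made.
        other = shared_memory.SharedMemory(name=owner.name)
        try:
            owner.buf[0] = 42
            value = other.buf[0]  # sees the byte written via the owner
        finally:
            other.close()
    finally:
        owner.close()
        owner.unlink()  # release the block once nobody needs it
    return value


if __name__ == "__main__":
    print(write_and_read_shared())
```

Each handle maps the full block, which is why a tool that reports per-process figures can show the same memory once for every process attached to it.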

What is the purpose of the “Max CPUs” engine option?

This puts an upper bound on the number of cores used by Rybka. If it is left at the default value, Rybka will start exactly one process for each core in your system.

Users can use this option to keep CPU time available for other, non-Rybka processes.

Does the “Max CPUs” value need to be set to a power of 2?

No. Any integral value of at least 1 is fine.
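Most users will set this option through their chess GUI's engine dialog. For anyone scripting the engine directly, the sketch below builds the corresponding UCI `setoption` command. The option name "Max CPUs" is taken from the text. The helper names and the idea of reserving cores for other programs are assumptions made for the example.

```python
def cpus_for_engine(total_cores: int, reserved_cores: int = 0) -> int:
    """Cores to give the engine, never fewer than 1 (hypothetical helper)."""
    if total_cores < 1 or reserved_cores < 0:
        raise ValueError("total_cores must be >= 1 and reserved_cores >= 0")
    return max(1, total_cores - reserved_cores)


def max_cpus_command(value: int) -> str:
    """Build the UCI command that sets the "Max CPUs" option."""
    # Any whole number of at least 1 is accepted; powers of 2 are not required.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError("Max CPUs must be an integer of at least 1")
    # Standard UCI syntax: setoption name <id> value <x>
    return f"setoption name Max CPUs value {value}"


if __name__ == "__main__":
    # Example: a four-core machine with one core kept free for other work.
    print(max_cpus_command(cpus_for_engine(total_cores=4, reserved_cores=1)))
```

The resulting line would be sent to the engine's standard input after the usual UCI handshake.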

Is there any limit to how many processors Rybka 2.0 Beta can use?

In theory, no. We have tested on up to 8 processors. If you have a bigger machine, I’d be very interested in running my benchmark on it.

Are there any special interactions between the 32 bit vs 64 bit compiles and the multi-processor support?

No. The 64 bit version continues to run about 60% faster than the 32 bit version, regardless of how many processes are being used.