Rybka 2.0 Beta (multi-processor only)
June 10, 2006
Rybka 2.0 Beta is the first release from the
Rybka 2 series. Customers who purchase this Rybka version will receive free
upgrades until November 1, 2006.
The full Rybka 2 schedule is as follows:
June 10: Rybka 2.0 Beta (multi-processor only)
July 1: Rybka 2.1 (multi-processor and single-processor)
Sep 1: Rybka 2.2 (multi-processor and single-processor)
Nov 1: Rybka 2.3 (multi-processor and single-processor)
Owners of single-processor machines should wait
until July 1 to make their purchase.
FAQ
I am using Rybka 1.2. Should I upgrade to Rybka 2.0 Beta?
Only if you have a system with multiple
processors and are comfortable using our software in the early stages of its
life cycle. Otherwise, please wait until the Rybka 2.1 release on July 1.
How much does Rybka 2 cost?
The multi-processor version costs 59 Euro. The
single-processor version will cost 34 Euro.
Why is this version labeled as a Beta?
This is the first public release of the
multi-processor version of Rybka. It’s inevitable that some issues will pop up.
If you prefer to work only with mature, 100% reliable software, we suggest that
you wait until July 1 for Rybka 2.1.
Aside from multi-processor support, what else has improved since Rybka 1.2?
Very little. Some smaller bugs were fixed, and
a few minor improvements were made. The playing strength increase on
one-processor machines is probably in the range of 5 to 10 rating points.
I have only a single-processor system. What should I do?
You should wait until July 1, when
Rybka 2.1 is released.
What can I expect from Rybka 2.1?
I plan to continue working on the multi-processor implementation and also, finally, to return to work on the evaluation and search.
I am a tester. Should I include Rybka 2.0 Beta on my rating list?
I will be honored if you do, but please be warned that the multi-processor implementation is still undergoing changes. There will very likely be a few additional releases in June that affect multi-processing performance. If you prefer to test only major versions, it may make sense to wait until July 1; by then, the dust should have settled.
I have an AMD dual-core system. Will this benefit from the multi-processor support?
Yes. From a software point of view,
multi-core systems behave (roughly) the same as multi-processor systems, and
benefit (roughly) the same amount from the additional cores. I use the terms
“core” and “processor” interchangeably throughout this document.
How much stronger is Rybka on multi-processor systems?
As a first approximation, every time you double the number of processors, playing strength increases by about 50 rating points. For details, see the section below.
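The rule of thumb above amounts to gain ≈ 50 × log2(cores). The Python sketch below only encodes that approximation; the function name is hypothetical, and treating non-power-of-two core counts (such as 3 or 6) as falling on the same log2 curve is an assumption, not a measurement.

```python
import math


def estimated_gain(cores: int, elo_per_doubling: float = 50.0) -> float:
    """Rough rating gain over one core, from the ~50 points per doubling rule of thumb.

    Hypothetical helper: real gains depend on hardware, time control and engine version.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Each doubling of the core count adds about elo_per_doubling points,
    # so the gain grows with the base-2 logarithm of the core count.
    return elo_per_doubling * math.log2(cores)


for n in (1, 2, 4, 8):
    print(n, round(estimated_gain(n)))
```

Under this approximation, going from 2 to 8 cores is worth about as much as going from 1 to 4: two doublings in both cases.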
When exactly can I buy this version?
The version itself will be released at midnight on June 10. The various ordering options are still being set up on that day, and some of them may not be working until a day or two after June 10. If you were a Rybka 1 customer, you can email me with your intention to pay and I will send you the version. Customers who did not buy Rybka 1 may need to be a bit patient.
How will I receive my copy of Rybka 2.0 Beta, as well as the free upgrades to Rybka 2.1, Rybka 2.2 and Rybka 2.3?
You will receive download links via email.
Please note that we will not be using a Yahoo group this time, as we did with Rybka 1.
Is there any copy protection?
Not for the Rybka 2.0 Beta version. Personally, I think it’s a waste of everybody’s time, but I reserve the right to change this policy and add copy protection to later releases in the Rybka 2 series. (The reasons for this will become clear later.)
Multi-processing Implementation
If you are a simple user, then take the Rybka
2.0 Beta executable, install it as you would install any other UCI engine, and
use it as you would use any other UCI engine. The utilization of the additional
processors is seamless.
In my experience, owners of multi-core systems
can be quite serious about hardware and performance. If this describes you, and
you want to understand a bit better what happens under the hood and what the
performance implications are, then read the following FAQ.
I’ll start with a few softballs and go from there.
Is multi-processing easy or hard?
Getting a multi-processor version of a chess
engine running with a decent speedup is not that hard. Getting past various
impediments to performance is potentially an endless job.
Is multi-processing performance relevant?
There are two points to make here.
One is that by the end of 2007, experts project
that as many as 70% of new computer systems will have more than one core. This
is the obvious point.
The less obvious point is that the cost of flaws in a parallel implementation grows rapidly as the number of cores in the system increases. On a two-core system, even a fairly weak implementation is not far from optimal. As the number of cores increases, a good parallel implementation pulls further and further ahead of a bad one. More on this below.
What speedup do you get from your multi-processor implementation?
OK, this is where the fun begins. The implementation released as the default in Rybka 2.0 Beta gives the following speedups:
2 Processors: 1.7
4 Processors: 2.8
8 Processors: 4.4
Here, speedup is defined as the effective increase in speed. In other words, if you play a match between Rybka on two processors and Rybka on one processor, and give the one-processor version 1.7 times as much time (with pondering off), the match will be even.
Since the released implementation is fairly safe, there is not much doubt about the above values; they are certainly accurate to within 5% or so. With more ambitious implementations, where the shape of the multi-processor search tree differs more significantly from the shape of the single-processor search tree, giving true effective speedups is trickier. In such cases, it is probably necessary to measure actual playing strength to be really sure.
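A minimal sketch of the match-based measurement described above, assuming you already have a way to run engine matches: give the one-processor side its time control multiplied by the candidate speedup, play the match with pondering off, and convert the score into an Elo difference with the standard logistic Elo formula. A result near zero means the candidate speedup is about right. The helper names and the `TimeControl` type are hypothetical, and a small difference is only meaningful over many games.

```python
import math
from dataclasses import dataclass


@dataclass
class TimeControl:
    base_seconds: float
    increment_seconds: float = 0.0


def time_odds(mp_control: TimeControl, speedup: float) -> TimeControl:
    """Time control for the one-processor side, scaled by the candidate speedup."""
    return TimeControl(mp_control.base_seconds * speedup,
                       mp_control.increment_seconds * speedup)


def elo_difference(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match score (standard logistic model).

    Counts are from the multi-processor side's point of view: a positive
    result means it is still stronger, so the candidate speedup is too low.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

For example, if the two-processor side plays at 60 seconds plus 1 second per move, `time_odds(TimeControl(60, 1), 1.7)` gives the one-processor side 102 seconds plus 1.7 seconds per move.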
How much better would a perfect multi-processing implementation perform compared to the above values?
Here, “perfect” is defined as an
implementation which gives a speedup of 2.0 on two cores, 4.0 on four cores,
etc. It is (probably) not quite possible to achieve this. The question is
relevant for us because this represents an upper bound on potential multi-processing
performance.
The following table shows how actual Rybka performance diverges from this theoretical upper bound as the number of cores increases.
# of cores | Actual speedup | Speedup upper bound | Upper bound / Actual | Elo difference between upper bound and actual
2 | 1.7 | 2.0 | 1.17 | ~11.9
4 | 2.8 | 4.0 | 1.42 | ~29.4
8 | 4.4 | 8.0 | 1.81 | ~56.7
The values in the last column were computed using the formula ((upper bound speedup) / (actual speedup) - 1.0) * 70. In other words, they assume that doubling the speed gives 70 rating points, interpolating linearly between a ratio of 1.0 (0 points) and a ratio of 2.0 (70 points).
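The Python sketch below recomputes the last column. One assumption is needed to match the printed figures: the ratios appear to be cut, not rounded, to two decimals (2 / 1.7 = 1.176 is shown as 1.17), so the sketch truncates them the same way. For comparison it also evaluates 70 × log2(ratio), which is the exact form of "a doubling in speed gives 70 rating points"; the two agree at ratios 1.0 and 2.0, and the log form is slightly larger in between. Function names are illustrative only.

```python
import math

ELO_PER_DOUBLING = 70.0
ACTUAL_SPEEDUP = {2: 1.7, 4: 2.8, 8: 4.4}  # cores -> measured speedup, from the table


def truncate2(x: float) -> float:
    """Cut a ratio to two decimals (assumed to match the printed ratios)."""
    return math.floor(x * 100) / 100


def linear_gap(ratio: float) -> float:
    """The formula used for the table: (ratio - 1.0) * 70."""
    return (ratio - 1.0) * ELO_PER_DOUBLING


def log_gap(ratio: float) -> float:
    """Alternative: 70 points per doubling applied logarithmically."""
    return math.log2(ratio) * ELO_PER_DOUBLING


for cores, actual in ACTUAL_SPEEDUP.items():
    ratio = cores / actual  # perfect scaling gives a speedup equal to `cores`
    print(cores, truncate2(ratio),
          round(linear_gap(truncate2(ratio)), 1),
          round(log_gap(ratio), 1))
```

With the logarithmic form the gaps come out at roughly 16, 36 and 60 points, so the conclusion is unchanged: the shortfall is small on two cores and grows with the core count.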
The point is that for the two-core
systems which are common today, the multi-processing implementation in Rybka
2.0 Beta is already close enough to perfect that more work would not be
productive. For bigger systems, however, as we are likely to see in the future,
further improvement will pay off.
So, are you planning to improve the implementation?
Yes. In fact, the work is already under way: the released implementation is simply the safest one I have at the moment. If you have a system with 4 or more cores, you can expect improved performance in later releases.
Why are the nodes per second rates of the multi-processor version so low?
The nodes searched by the child processes are
not counted. Anyway, node counts do not reflect true multi-processing
efficiency and are not relevant to measuring multi-processor performance.
Why are you using processes rather than threads?
I am still experimenting with this, but for now, processes are safer, especially on NUMA (non-uniform memory access) systems. Each process does its own memory allocation, and we are guaranteed that the system allocates this memory in the correct (local) segment.
One nice consequence of this is that you can observe Rybka using various system measurement tools such as the task manager.
When I allocate a hash size of 512 MB, the task manager shows that each of the Rybka processes is using 512 MB. Is this correct?
No, this is a bug (or odd feature) in the task manager, which counts the shared memory in the figure for every process that uses it. There is only one hash table, and it is shared between the Rybka processes.
In other words, you can allocate roughly half
of your RAM for the hash table without overflowing your available RAM, despite
the task manager’s indications to the contrary.
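Rybka’s own sharing mechanism is not described here. As a hedged illustration of the general idea only, the Python sketch below uses the standard library’s `multiprocessing.shared_memory`: one process creates the single table, and another process attaches to it by name, which maps the same memory instead of allocating a second copy. For simplicity, both handles are opened in one process, standing in for two engine processes; the 1 MB size and the function name are hypothetical.

```python
import struct
from multiprocessing import shared_memory
from typing import Tuple

TABLE_BYTES = 1 << 20  # hypothetical 1 MB stand-in for the user's hash size


def demo_shared_table() -> Tuple[int, int, int]:
    # The first "engine process" creates the one and only table.
    table = shared_memory.SharedMemory(create=True, size=TABLE_BYTES)
    try:
        # A second "engine process" attaches by name: this maps the same
        # memory rather than allocating another table of the same size.
        view = shared_memory.SharedMemory(name=table.name)
        try:
            struct.pack_into("<Q", view.buf, 0, 0x1234)       # write through one mapping
            seen = struct.unpack_from("<Q", table.buf, 0)[0]  # read through the other
            # Each handle reports the full size, much as a per-process memory
            # figure does, even though only one table exists.
            return seen, table.size, view.size
        finally:
            view.close()
    finally:
        table.close()
        table.unlink()  # release the single table once it is no longer needed
```

Adding up per-process figures therefore counts the same table once per process, which is why the totals shown can exceed the memory actually in use.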
What is the purpose of the “Max CPUs” engine option?
This puts an upper bound on the number of cores
used by Rybka. If it is left at the default value, Rybka will start exactly one
process for each core in your system.
Users can use this option to keep CPU time
available for other, non-Rybka processes.
Does the “Max CPUs” value need to be set to a power of 2?
No. Any integer value of 1 or more is fine.
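The behavior described for “Max CPUs” (one process per core by default, otherwise capped by the option, any integer of at least 1) can be sketched as below. This is not Rybka’s code: representing the default as `None` and using Python’s `os.cpu_count()` for core detection are assumptions for illustration, and the function name is hypothetical.

```python
import os
from typing import Optional


def processes_to_start(max_cpus: Optional[int],
                       detected_cores: Optional[int] = None) -> int:
    """Hypothetical sketch: one search process per core, capped by Max CPUs."""
    # Fall back to the operating system's core count (os.cpu_count() may return None).
    cores = detected_cores if detected_cores is not None else (os.cpu_count() or 1)
    if max_cpus is None:
        # Option left at its default: one process for each core.
        return cores
    if max_cpus < 1:
        raise ValueError("Max CPUs must be an integer of at least 1")
    # Max CPUs is only an upper bound; powers of 2 are not required.
    return min(max_cpus, cores)
```

For example, on a four-core machine `processes_to_start(3, 4)` returns 3, leaving one core free for other programs, while `processes_to_start(16, 4)` still returns 4.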
Is there any limit to how many processors Rybka 2.0 Beta can use?
In theory, no. We have tested on up to 8 processors. If you have a bigger machine, I’d be very interested in running my benchmark on it.
Are there any special interactions between the multi-processor support and the 32-bit or 64-bit compiles?
No. The 64-bit version continues to run about 60% faster than the 32-bit version, regardless of how many processes are being used.