Leapfrog: K-slice Streaming

K-slice Streaming

6-Men Slices and 5-Men sub-slices

If we keep only one (white) piece of a 7-men TB static, a 6-men slice results. Such a slice contains 64^6 = 64G positions, as the static piece in general breaks all symmetry. Even at 1 bit/position this takes 8GB, and several such bitmaps would be needed. This exceeds our target for memory usage. If one of the dynamic white pieces is a King, however, the slice can be treated in a streaming way, as King moves have a high degree of locality. If we treat the slice (white) King square by King square, raster-scanning the board, the sub-slice of the slice where the King is somewhere around b2 are no longer needed by the time we have the King moving around g7. In fact, they have not been needed for a long time. We can thus start writing back the chunks which with we started long before we start loading the final chunks of the slice. This way the slice never needs to be in memory at once, it just streams through it.


Fig. 2 - One set of very local King moves

Fig. 3 - The complementary set of King moves

Enhancing locality

If we want to cash in maximally on this idea, we can make use of the fact that the white King is a dynamic piece in both slicings we apply. If the three white pieces used to indicate the chunks are called WK (King), W1 and W2, we alternate processing the TB in W1- and W2-slices. During both these steps we have the opportunity to process King moves,and we can use this freedom to minimize the time that WK-sub-slices of a W1- or W2-slice have to be retained in memory. Fig. 2 and 3 show how the total set of King moves can be split into two groups, such that the squares can be ordered in a sequence connecting only squares that are neighbors, or next-to-neighbors. This means that only the chunk corresponding to 3 King locations have to be kept in memory simueltneously (for all posible locations of the other dynamic pieces, of course.) This is ony 5% of a slice, reducing the memory requirement to 3G positions. Even with a 2GB memory limit, this would allow us to use 5 bits per position, which is more than enough.

The required order is b1-a1-a2-b2-b3-a3-a4-b4- ... -b8-d1-c1-c2-d2-d3- ... for one pass, and a1-a2- ... -a8-c1-b1-b2-c2-c3-b3- ... at the other pass. In both cases most squares are needed in pairs neighboring along a rank. This is why we store the won info for the chunks with the same W1 and W2 location, but different WK locations, contiguous, in the order WK = a1, b1, c1, d1, ... We can then read the won bitmaps two chunks at a time, with only a single seek. doing this, we are forced to read the 200KB spacer between the two as well, which is why we don't want it to be any bigger than 200KB.