The tablebase is subdivided into 4-men chunks. These are the smallest units of disk traffic. Each chunk has a won-bitmap section and a lost(N) info section (one for each winning distance N), and these can be read separately and independently (even for only a single N). The chunks will contain all constellations of the black pieces for a fixed constellation of the white pieces. (From this it follows that we won't be able to handle more than 4 black pieces.)
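As a concrete illustration of this layout, a chunk header on disk could look like the sketch below. This is purely a hypothetical format for the sake of the description; field names and sizes are assumptions, not those of the actual generator:

    /* Hypothetical header of one 4-men chunk (illustrative only).
       A chunk covers all black constellations for one fixed white
       constellation, so with b black pieces it spans 64^b positions. */
    #include <stdint.h>

    #define MAX_DTW 127                 /* assumed bound on winning distance N */

    typedef struct {
        uint32_t nBlack;                /* number of black pieces (at most 4)  */
        uint32_t nLost;                 /* number of lost(N) sections present  */
        uint64_t wonOffset, wonBytes;   /* won bitmap: 1 bit per position      */
        uint64_t lostOffset[MAX_DTW];   /* per-N offset and size, so that any  */
        uint64_t lostBytes[MAX_DTW];    /* single lost(N) can be read alone    */
    } ChunkHeader;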
Logically, the tablebase is a multi-dimensional array of chunks. A 6-men TB is a 64x64 array, indexed by two white pieces. A 7-men TB is a 64x64x64 array, indexed by three white pieces, among which the white King. To profit from symmetry, in practice some of the chunks will be missing; in the process described below, the symmetry-equivalent chunk is substituted for each missing one.
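For the addressing itself, a minimal sketch is given below. It assumes the 7-men case, with the chunk index formed from the squares of WK, W1 and W2, and it assumes (purely for illustration) that only chunks with the white King on files a-d are stored, the others being obtained by left-right mirroring:

    /* Illustrative chunk addressing for a 7-men TB (64x64x64 array of chunks).
       The symmetry convention chosen here is an assumption for the sketch.    */
    #include <stdint.h>

    #define FILE_OF(sq)   ((sq) & 7)
    #define MIRROR_LR(sq) ((sq) ^ 7)       /* flip a 0..63 square left<->right */

    /* Returns the index of the stored chunk; *mirrored tells the caller that
       the black squares inside the chunk must be mirrored as well.            */
    static uint32_t chunkIndex(int wk, int w1, int w2, int *mirrored)
    {
        *mirrored = (FILE_OF(wk) >= 4);    /* missing chunk: use its mirror    */
        if (*mirrored) {
            wk = MIRROR_LR(wk); w1 = MIRROR_LR(w1); w2 = MIRROR_LR(w2);
        }
        return ((uint32_t)wk << 12) | ((uint32_t)w1 << 6) | (uint32_t)w2;
    }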
From the viewpoint of disk traffic, the EGT will be generated through the leapfrog algorithm. Alternately the moves of one white piece (W1) and the other (W2) are treated, by running through the EGT W1-slice by W1-slice when the moves of W2 are treated, and vice versa. For a 6-men EGT, the entire Wx-slice (x=1,2) fits in memory. (A 5-men slice contains 1G positions, so we can easily afford even 4 bits per position.) As leapfrog does two (partial) cycles at a time, we need to gather both the lost(N) and lost(N+1) info for the slice (from 64 chunks). The lost(N+1) info will be the partially generated list that resulted from the previous leapfrog cycle, and thus only includes moves of the other white piece.
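The sizing claim can be checked with simple arithmetic: fixing one white piece of a 6-men EGT leaves 5 men free, so a Wx-slice holds 64^5 = 2^30, about 1G positions, and at 4 bits per position that is 512 MiB. A small sketch (constant names are illustrative):

    /* Memory footprint of one Wx-slice of a 6-men EGT (illustrative).   */
    #include <stdint.h>
    #include <stdlib.h>

    #define SLICE_POSITIONS (1ULL << 30)       /* 64^5 = 2^30 ~ 1G positions   */
    #define BITS_PER_POS    4ULL               /* won + lost(N)/lost(N+1) info */
    #define SLICE_BYTES     (SLICE_POSITIONS * BITS_PER_POS / 8)  /* 512 MiB   */

    static uint8_t *allocSlice(void)
    {
        return malloc(SLICE_BYTES);            /* one Wx-slice, x = 1 or 2     */
    }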
We will also need the (cumulative) won info, which is updated up to the partial lost(N+1) cycle. After all this info is gathered from disk, the lost(N+1) info will be completed by adding the positions reached from lost(N) through (retro-)moves of the currently dynamic white piece (plus black moves). Once the lost(N+1) info is completed, the next cycle will commence by generating a partially complete lost(N+2) list, including only moves of the currently dynamic piece. These moves stay within the current memory slice, and can thus be treated before lost(N+1) of all slices is completed, without perturbing the lost(N) -> lost(N+1) cycle of those other slices.
In the process of generating the lost(N+1) info, the won info will be updated as well. So after this, we write this won info, the partially complete lost(N+2) list and the now-completed lost(N+1) list back to disk, and have advanced the state by one full cycle. (Note that the lost(N) info was already complete when we read it, and can be discarded from memory.) We are then ready to read this information back in another order (another slicing), to treat the moves of the other white piece.
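Putting the above together, one leapfrog cycle over a given slicing could be outlined as below. Everything here is a hedged sketch: gatherChunks(), completeLost(), startLost() and writeChunks() are placeholders for the disk I/O and in-memory work described in the preceding paragraphs, not actual routines of the generator:

    /* One leapfrog cycle: treat the moves of the currently dynamic white
       piece, running through the EGT slice by slice over the other piece.
       All identifiers are illustrative placeholders.                       */
    typedef struct Slice Slice;                /* an in-memory Wx-slice        */
    Slice *gatherChunks(int slicedPiece, int sq, int N);  /* read won, lost(N)
                                                  and partial lost(N+1) info   */
    void   completeLost(Slice *s, int dynamicPiece, int N);
    void   startLost(Slice *s, int dynamicPiece, int N);
    void   writeChunks(Slice *s);              /* write won, lost(N+1) and the
                                                  partial lost(N+2) back       */

    void leapfrogCycle(int dynamicPiece, int slicedPiece, int N)
    {
        for (int sq = 0; sq < 64; sq++) {      /* one slice per square of the
                                                  sliced (static) white piece  */
            Slice *s = gatherChunks(slicedPiece, sq, N);
            completeLost(s, dynamicPiece, N);  /* lost(N) -> lost(N+1), moves
                                                  of dynamicPiece plus black   */
            startLost(s, dynamicPiece, N + 1); /* partial lost(N+2), moves of
                                                  dynamicPiece only            */
            writeChunks(s);                    /* lost(N) was already complete
                                                  and is simply discarded      */
        }
    }

    /* Successive cycles alternate which white piece is dynamic. */
    void generate(int maxN)
    {
        for (int N = 0; N < maxN; N++)
            leapfrogCycle(1 + (N & 1), 2 - (N & 1), N);
    }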
Once a slice is in memory, each sub-cycle is treated in two passes. Each pass is designed so that the moves treated by it map into the cached part of the slice. In the first pass we treat the white moves. This generates a list of newly-won positions from the lost(N) positions, by doing white retrograde moves and screening the resulting positions against the won info to see whether they are new (updating the won info accordingly). In the second pass we treat the black retrograde moves starting from these newly-won positions (reading the latter in another order, so that black moves produce cache hits), and for every potentially lost position this produces, we do an immediate verification with normal black moves, probing the won data. This produces the lost(N+1) positions. These two passes are then repeated for the next sub-cycle on the memory-loaded data, to produce the (partially complete) lost(N+2) info from lost(N+1).
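In code, one such sub-cycle might be sketched as follows, under the assumption of hypothetical helpers for move generation and for the won/lost info of the loaded slice; this only mirrors the logic described above, not any actual implementation:

    /* One sub-cycle on an in-memory slice: lost(N) -> lost(N+1) in two
       passes.  All types and helpers below are hypothetical placeholders. */
    #include <stdint.h>

    typedef uint32_t Pos;                            /* index within the slice */
    int  retroMovesWhite(Pos p, int piece, Pos *out);/* white retrograde moves */
    int  retroMovesBlack(Pos p, Pos *out);           /* black retrograde moves */
    int  movesBlack(Pos p, Pos *out);                /* normal black moves     */
    int  testWon(Pos p);  void setWon(Pos p);        /* cumulative won info    */
    void markLost(Pos p, int N);                     /* add to lost(N) list    */
    int  nextLost(int N, Pos *p);                    /* iterate lost(N) list   */
    int  nextNewlyWon(Pos *p);  void addNewlyWon(Pos p);

    void subCycle(int dynamicPiece, int N)
    {
        Pos p, list[256], moves[256];
        int i, j, n, m;

        /* Pass 1 (white moves): predecessors of lost(N) positions through
           retro moves of the dynamic white piece are won; screen against
           the won info to keep only the new ones, and update it.          */
        while (nextLost(N, &p)) {
            n = retroMovesWhite(p, dynamicPiece, list);
            for (i = 0; i < n; i++)
                if (!testWon(list[i])) { setWon(list[i]); addNewlyWon(list[i]); }
        }

        /* Pass 2 (black moves): black retro moves from the newly-won
           positions give potentially lost positions; verify each one at
           once by probing the won data with all normal black moves.       */
        while (nextNewlyWon(&p)) {
            n = retroMovesBlack(p, list);
            for (i = 0; i < n; i++) {
                m = movesBlack(list[i], moves);
                for (j = 0; j < m; j++)
                    if (!testWon(moves[j])) break;   /* black has an escape    */
                if (j == m) markLost(list[i], N + 1);
            }
        }
    }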
For a 7-men EGT, the Wx-slices are too large to fit in memory. To avoid having to use three different slicings that each have only one white piece dynamic and two pieces fixed, we use 'K-slice streaming': the moves of the three white pieces are divided into two groups, those of white piece nr 1 (W1) plus half the white King (WK) moves, and those of white piece nr 2 (W2) plus the other half of the WK moves. Each cycle treats the Wx-slices (x = 1,2) one after another. Each Wx-slice is streamed through memory, WK-sub-slice by WK-sub-slice, and only a small number (5-7) of (Wx,WK)-sub-slices (enough to capture the currently treated white King moves) are in memory at any time.
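A rough sketch of this streaming, assuming one (Wx,WK)-sub-slice per white-King square and a sliding window of them in memory (all routine names are placeholders, and the window size is taken from the 5-7 figure above):

    /* K-slice streaming of one Wx-slice of a 7-men EGT (illustrative).
       Only the half of the King moves that stays inside the window is
       treated here; the other half belongs to the other slicing.        */
    #define WINDOW 7                            /* 5-7 sub-slices suffice      */
    typedef struct SubSlice SubSlice;
    SubSlice *loadSubSlice(int wxSq, int wkSq); /* hypothetical stream-in      */
    void flushSubSlice(SubSlice *s);            /* hypothetical stream-out     */
    void treatMoves(SubSlice **win, int wk, int dynamicPiece);

    void streamWxSlice(int wxSq, int dynamicPiece)
    {
        SubSlice *win[WINDOW] = { 0 };
        for (int wk = 0; wk < 64; wk++) {
            if (wk >= WINDOW)                   /* oldest sub-slice can no
                                                   longer be reached by the
                                                   treated King moves          */
                flushSubSlice(win[wk % WINDOW]);
            win[wk % WINDOW] = loadSubSlice(wxSq, wk);
            treatMoves(win, wk, dynamicPiece);  /* moves of Wx plus the King
                                                   moves within the window     */
        }
        for (int wk = 64 - WINDOW; wk < 64; wk++)
            flushSubSlice(win[wk % WINDOW]);    /* drain the remaining window  */
    }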
The K-slice streaming allows the process to proceed as if the W1 and W2 slices did fit into memory. There is a skew of a few King locations between the streaming of the first and the second sub-cycle of a leapfrog cycle, as the second sub-cycle (producing the partially complete lost(N+2) info) can only start on K-sub-slices that were already completed by the first sub-cycle (and are ready to be streamed out). This effectively makes the memory buffer act as a pipeline: some of the K-sub-slices in it are involved in the first sub-cycle (lost(N) -> lost(N+1)), others in the second (lost(N+1) -> lost(N+2)), while they can also be involved in a different memory pass (white or black) of each sub-cycle.
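The staggering could be pictured as below; stageOne() and stageTwo() are hypothetical stand-ins for the two sub-cycles, and the skew of a few King squares is an assumed value:

    /* The sub-slice window as a pipeline (illustrative).  The newest
       K-sub-slices are still in the first sub-cycle; a few King squares
       behind them, completed sub-slices get the second sub-cycle before
       being streamed out.                                                */
    typedef struct SubSlice SubSlice;
    void stageOne(SubSlice *s);    /* lost(N)   -> lost(N+1)               */
    void stageTwo(SubSlice *s);    /* lost(N+1) -> partial lost(N+2)       */

    #define SKEW 3                 /* assumed lag, a few King locations    */

    void pipelineStep(SubSlice **win, int window, int wk)
    {
        stageOne(win[wk % window]);             /* freshly loaded sub-slice    */
        if (wk >= SKEW)
            stageTwo(win[(wk - SKEW) % window]);/* trailing, already completed
                                                   by the first sub-cycle      */
    }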