COMPARISON BETWEEN SORTING PROGRAMS

These tests were executed in July 1992.

SORTING PROGRAMS

The following programs have been tested and compared:

A  SORT     a Unix-compatible sort of the Free Software Foundation,
            GNUSORT version 0.3, ported to MS-DOS.
            We will call it GNUSORT, reserving the name SORT for the
            program supplied with MS-DOS.
B  BIGSORT  a versatile sort from an ASP associate,
            version 1.02 from Meadow Brook Industries
C  SSORT    the new sort from SP&AR&C The Netherlands,
            SuperSort version 0.1

In general SSORT appears to be the most robust one. BIGSORT as well as
GNUSORT can give wrong results without warning if they run out of disk
space. This may happen if the intermediate results are redirected to a
RAM disk to gain speed. SSORT has no such problems.

SSORT also appears to be the most capable one. It was the only one of
the three that could handle some real-life problems:
A. Sorting a help file with multiple-line records
B. Sorting a free-text database on zip codes
C. Sorting a dBASE delimited file on zip code, street name and lowest
   house number.

The other programs come nowhere near handling problems A and B.
Problem C is a more classical problem. Here GNUSORT fails because it
cannot be made to ignore a delimiter within a quoted string (see
example 10 in REGEXPR.TXT). BIGSORT fails because it cannot handle the
street numbers as numbers in a delimited context.

Furthermore, neither of the other programs can sort a 100K file with
one character per line. SSORT can do this, even on a 128 Kbyte machine
with a single floppy disk and no hard disk.

SYSTEMS

Tests have been run on two systems:

A  a Pentium 90 with a WD Caviar 1 Gbyte hard disk, 90% full,
   complete conventional memory
B  an Olivetti XT, 8086 10 MHz, one floppy disk,
   128 K conventional memory

FILES

Different tests have been run with the following files:

A  a delimited file (originating from dBASE) with data about the
   U.S. states (the SAMPLE.DEL file that comes with BIGSORT),
   about 2 Kbyte.
B  a fixed-record file with the same information as A, about 2 Kbyte.
C  a 4M8 file with addresses plus data, line length 133
D  a 1M5 file with addresses plus data in a more compact form,
   line length 41, similar to C
E  a 170K file with essentially random data
F  a 4M8 file with text data
G  a 100K file with 1-char lines.
H  a 37K file with 1-char lines.
I  file D four times, 6M with lines of 41 chars
J  file D eight times, 12M with lines of 41 chars

TESTS

Times are in the format [hh:]mm:ss. No fractions of seconds were
measured. These are real-life stopwatch times, subject to deviations
of the order of a second. A '+' means the test is practically
instantaneous (<1 second); a '-' means the test cannot be specified to
the sort program. If the program accepts the command and then cannot
handle it, a note number is filled in. The numbers of records and
characters are also tabulated, to aid in getting the overview.

GNUSORT  BIGSORT  SSORT    #char  #rec   Description of test

+        +        +        1K8    50     System A file A,
                                         straightforward sort of whole lines.

+        +        +        1K8    50     System A file A,
                                         sorting on first (delimited) field

+        9)       +        1K8    50     System A file A,
                                         sorting on number of inhabitants
                                         numeric sorting on delimited field

+        +        +        2K1    50     System A file B,
                                         straightforward sort of whole lines.

-        +        +        2K1    50     System A file B,
                                         sorting on first field (of fixed size)

-        +        +        2K1    50     System A file B,
                                         sorting on number of inhabitants
                                         (fixed offset)

-        1)       1:15     170K   170K   System A file E,
                                         fixed layout: one field of one byte

-        2)       0:15     170K   37K5   System A file E,
                                         fixed layout: one field of 4 bytes

-        0:12     0:14     170K   37K5   System A file E,
                                         fixed layout: field of 1 byte,
                                         field of 3 bytes, sorting on field 0

-        2)       18:27    4M8    1M2    System A file F,
                                         fixed layout: one field of 4 bytes

-        9:01     15:40    4M8    1M2    System A file F,
                                         fixed layout: field of 1 byte,
                                         field of 3 bytes, sorting on field 0

1:43     1:46     0:56     4M8    37K    System A file C,
                                         straightforward sort of whole lines.

0:31     1:37     0:50     4M8    37K    System A file C,
7)                                       straightforward sort of whole lines,
                                         scratch files on ram disk.

1:47     0:58     0:26     4M8    37K    System A, already sorted version
                                         of file C,
                                         straightforward sort of whole lines.

1:52     1:14     0:55     4M8    37K    System A, already sorted version
                                         of file C,
                                         reverse sorting, whole lines.

0:31     0:36     0:29     1M5    37K    System A, file D,
         (0:46)   (0:37)                 straightforward sort of whole lines.
         6)

2:59     2:45     3:07     6M0    150K   System A, file I,
                                         straightforward sort of whole lines.

0:53     8)       2:46     6M0    150K   System A, file I,
7)                                       straightforward sort of whole lines,
                                         scratch files on ram disk of 12M.

8) 7)    8)       10)      6M0    150K   System A, file I,
                                         straightforward sort of whole lines,
                                         scratch files on ram disk of 9M.

6:15     6:37     6:37     12M    300K   System A, file J,
                                         straightforward sort of whole lines.

5:57     4:24     2:11     12M    300K   System A, already sorted version
                                         of file J,
                                         straightforward sort of whole lines.

5:35     5:25     6:45     12M    300K   System A, already sorted version
                                         of file J,
                                         reverse sorting, whole lines.

3)       0:44     0:11     98K    32K    System A file G,
                                         straightforward sort of whole lines.

3)       0:12     0:06     37K    12K    System A file H,
                                         straightforward sort of whole lines.

4)       5)       0:12     2K1    50     System B file B,
                                         straightforward sort of whole lines.

4)       5)       1:12:46  98K    32K    System B file G,
                                         straightforward sort of whole lines.

1)  Hangs beeping
2)  Runtime error 103
3)  Comes with neat advice to reduce the buffer size using the -S
    option. However, experimenting with this leads to
    A. memory exhausted ( -S 16000/12000/9000/7500/4000/2500 )
    B. the same neat advice again ( -S 1000/100 )
4)  Error message : memory exhausted
5)  Runtime error 203
6)  Values between brackets are the figures obtained when the programs
    are not called with optimal options. The values without brackets
    were obtained with the option "41 1(41)" for BIGSORT and "-B" for
    SSORT; the values between brackets are for no option whatsoever.
7)  Considerable effort went into making GNUSORT run with a ram disk.
    Running GNUSORT with the ram disk as the current directory fails
    with the message
    " Cannot open ./sort0bb.040 (w) Permission denied. "
    A similar problem occurs if you try to use the -T option with a
    ram disk. The problems vanish if you create a subdirectory on the
    ram disk and make that the current directory.
8)  Gave a wrong result: a truncated file. No warning was given.
    Apparently the ram disk was not large enough.
9)  The documentation that comes with the shareware distribution says:
    "By not specifying a length in the delimited field, BIGSORT knows to"
    "treat the field as a number. "
    However, this simply does not work. The command was copied
    verbatim from "BIGSORT.DOC".
10) Gave a warning "write error, probably disk full". The ram disk was
    indeed full. Following the instructions on crash recovery, the
    intermediate file on the ram disk was successfully used to
    continue sorting.
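
Notes 8 and 10 suggest checking a sort's output independently whenever
scratch space is tight, since a truncated file may be produced without
any warning. The following is a minimal sketch in Python of such a
check; it is not part of any of the tested programs. The function name
check_sort_output is hypothetical, and the sketch assumes whole-line
ascending sorting in plain character order and files small enough to
hold in memory.

```python
from collections import Counter


def check_sort_output(input_lines, output_lines):
    """Return a list of problems; an empty list means no problem was found.

    Hypothetical helper: assumes whole-line keys compared in plain
    character order, and that both files fit in memory.
    """
    problems = []
    if len(output_lines) != len(input_lines):
        # A truncated output (note 8) shows up as a lower record count.
        problems.append(
            "record count differs: %d in, %d out"
            % (len(input_lines), len(output_lines))
        )
    elif Counter(input_lines) != Counter(output_lines):
        # Same count, but some records were lost, altered or duplicated.
        problems.append("output is not a permutation of the input")
    if any(a > b for a, b in zip(output_lines, output_lines[1:])):
        # Some adjacent pair of records is out of order.
        problems.append("output is not in ascending order")
    return problems
```

For real files the two lists could be obtained with
open(name).read().splitlines(). For inputs of the size of files I and
J, comparing only the record counts may be a more practical first
check than building the two counters.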