Notes on Benchmarks

What are “benchmarks”?

Benchmarks are small programs aimed at measuring the speed of a computer system and/or some of its parts. Usually, the benchmark has the measured part execute some task (the processor executes a certain number of instructions, the disk copies a certain amount of data, etc.), takes note of the elapsed time and converts it into an “absolute” value. Thus, if your hard disk copied, say, 10 MB of data in 20 seconds, the benchmark will credit it with a “transfer speed” of 500 KB/s.
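The conversion in this example is plain arithmetic: amount of work divided by elapsed time. A minimal Python sketch follows. The function name and the `kb_per_mb` parameter are illustrative assumptions, since benchmarks differ on whether a megabyte counts as 1000 or 1024 KB.

```python
def transfer_speed_kb_per_s(mb_copied: float, seconds: float,
                            kb_per_mb: int = 1000) -> float:
    """Convert 'amount copied in elapsed time' into a transfer speed in KB/s.

    kb_per_mb=1000 reproduces the round figure in the text; pass 1024 if the
    benchmark counts in binary kilobytes and megabytes.
    """
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return mb_copied * kb_per_mb / seconds


print(transfer_speed_kb_per_s(10, 20))        # 500.0
print(transfer_speed_kb_per_s(10, 20, 1024))  # 512.0
```

The same 10 MB copied in 20 seconds can therefore be reported as 500 or 512 KB/s depending only on the unit convention, one more reason to compare figures from the same program.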

Should we trust benchmarks?

If we want benchmarks to have some value, we must compare systems under exactly the same conditions, that is, with the same settings. Performance may change dramatically if we modify a configuration switch, or simply change the video driver. This is especially true with a graphical operating system such as Windows. In such a case, we may notice that, for instance, video speed depends more on the quality of the software driver than on the “real” (i.e., hardware) speed of the video card.
But even when we try to use “the very same conditions” to compare systems, benchmarks must still be viewed with some suspicion. For instance, many low-level CPU benchmarks (such as those I used) run a very small piece of code that fits entirely in the internal cache of a 486. This gives a 486 a very good score compared with a 386, which has no internal cache. But it also makes them overestimate the speed of, say, a 486DX2-50 over that of a 486-25, rating it about twice as fast. In real practice, a 486DX2-50, whose external bus runs at only 25 MHz, is about 60-80% faster than its “smaller” half-speed brother.
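One simple way to see why doubling the internal clock does not double real-world speed is an Amdahl's-law style model. This is an assumption for illustration, not a measurement: suppose a fraction f of the run time on the 486-25 is spent waiting on the 25 MHz external bus (cache misses, main memory, I/O), which the DX2 cannot speed up, while the rest runs twice as fast. The overall speedup is then S = 1 / (f + (1 - f) / 2). A cache-resident benchmark has f close to 0, hence its 2x rating.

```python
def clock_doubled_speedup(bus_bound_fraction: float,
                          clock_multiplier: float = 2.0) -> float:
    """Amdahl's-law estimate: only the part of the run time that is NOT
    bound by the external bus scales with the internal clock multiplier."""
    f = bus_bound_fraction
    if not 0.0 <= f <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return 1.0 / (f + (1.0 - f) / clock_multiplier)


def bus_bound_fraction_for(speedup: float,
                           clock_multiplier: float = 2.0) -> float:
    """Invert the model: 1/S = f + (1 - f)/m  =>  f = (1/S - 1/m) / (1 - 1/m)."""
    return (1.0 / speedup - 1.0 / clock_multiplier) / (1.0 - 1.0 / clock_multiplier)


print(clock_doubled_speedup(0.0))              # 2.0  (code never leaves the cache)
print(round(bus_bound_fraction_for(1.8), 3))   # 0.111
print(round(bus_bound_fraction_for(1.6), 3))   # 0.25
```

Under this simplified model, the 60-80% real-world gain corresponds to the CPU spending roughly 11-25% of its time waiting on the external bus.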

Most benchmarks I used are 16-bit. This was necessary because I wanted to compare older 16-bit processors (8086, 8088, 80286) with newer ones. If we recompiled all of our programs as 32-bit code, 32-bit CPUs (386, 486, etc.) would perform better. It is known that, when running 16-bit code, a 386 CPU offers little advantage over a 286 at the same clock: if we compare the instruction timings of 286 and 386 CPUs, we notice that most instructions take the same number of cycles.
The 486, instead, was the first x86 chip to hard-wire its most frequently used instructions, so it could run many of them (including 16-bit ones) in a single clock cycle. It also features an on-chip cache of 8 KB (16 KB in the later DX4 versions).

Furthermore, most of these benchmarks are clearly unreliable when 686-class CPUs (Pentium Pro, Pentium II) are involved, probably also because these chips run 16-bit code comparatively poorly. For instance, CheckIt rates a Pentium Pro 200 as only as fast as a Pentium 100, and a Pentium II 266 as no faster than a Pentium 166.

Disk benchmarks are even more questionable: just browse my small database to see how some systems have fast disks according to one benchmark but much slower ones according to another. A given hard disk may be very fast when copying large files, yet much slower when dealing with very small files or with empty directories. Note also that most DOS benchmarks underestimate the performance of SCSI devices, as DOS is a single-tasking OS, while SCSI is designed to work in a multitasking environment.
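The large-file versus small-file effect is easy to reproduce. The Python sketch below is a modern-OS illustration, not how any of the DOS benchmarks discussed here work: it writes the same total amount of data once as a single file and once split into many small files, and reports bytes per second. The function name and sizes are hypothetical choices, and results will depend heavily on the filesystem and on OS and drive caching.

```python
import os
import tempfile
import time


def write_throughput(total_bytes: int, file_count: int) -> float:
    """Write total_bytes split evenly across file_count files in a fresh
    temporary directory; return bytes per second, including the cost of
    creating and opening each file."""
    if not 1 <= file_count <= total_bytes:
        raise ValueError("need between 1 and total_bytes files")
    chunk = b"\0" * (total_bytes // file_count)
    with tempfile.TemporaryDirectory() as tmp:
        start = time.perf_counter()
        for i in range(file_count):
            path = os.path.join(tmp, f"f{i:05d}.bin")
            with open(path, "wb") as fh:
                fh.write(chunk)
                fh.flush()
                os.fsync(fh.fileno())  # ask the OS to push the data to the device
        elapsed = time.perf_counter() - start
    return len(chunk) * file_count / elapsed


if __name__ == "__main__":
    total = 4 * 1024 * 1024  # 4 MB in both runs
    big = write_throughput(total, 1)
    small = write_throughput(total, 256)  # 256 files of 16 KB each
    print(f"one 4 MB file:     {big / 1024:.0f} KB/s")
    print(f"256 x 16 KB files: {small / 1024:.0f} KB/s")
```

On most systems the many-small-files run reports a noticeably lower figure for the very same disk, because per-file overhead (directory updates, metadata, seeks) is charged against the same amount of data.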

Video benchmarks are in some cases completely unreliable: just have a look at my WinTune results and you will find absurd values and gross inconsistencies.

So, do we need these benchmarks?

Probably not. My benchmarks are definitely non-professional, or non-scientific if you prefer, and you should not use them to draw “absolute” rankings of systems. I tested all the machines at my disposal with five different DOS-based benchmarking programs, so as to get a broader and more reliable picture.
I tried to use exactly the same conditions every time (a plain MS-DOS 6.22 bootable floppy with neither CONFIG.SYS nor AUTOEXEC.BAT, so that no program was loaded at startup). However, many limitations remain.

The choice of benchmarks is rather haphazard, as I used either freeware programs or commercial programs for which I hold a regular license. This may help anyone who wants to compare their own system.

Cache Check v. 4 (CACHECHK.EXE dated 8/2/96)

Freeware (postcard-ware) by Ray Van Tassle.
Testing areas: memory.
Cachechk performs memory access timing tests, which reveal how large your memory cache is and how many caches you have, and measure their access speed. It also measures main memory speed and calculates the effective RAM access time. It runs on 386 and later systems only.
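Cachechk's exact method is not described here. The usual idea behind such tests is to time repeated accesses over blocks of increasing size: while the block fits in a cache, the time per access stays low, and it jumps once the block outgrows that cache. The Python sketch below covers only the analysis step, assuming you already have (block size, time per access) pairs; the function name, the 1.5x jump threshold and the sample numbers are hypothetical, made up for illustration and not Cachechk output.

```python
def cache_boundaries(samples, jump_ratio=1.5):
    """samples: (block_size_kb, ns_per_access) pairs sorted by block size.

    Returns the last block size measured before each sharp rise in access
    time. Each cache level's capacity lies between that size and the next
    size tested; the number of entries is the number of caches detected."""
    boundaries = []
    for (size_prev, t_prev), (_, t_next) in zip(samples, samples[1:]):
        if t_next >= t_prev * jump_ratio:
            boundaries.append(size_prev)
    return boundaries


# Made-up timings shaped like a machine with an 8 KB first-level cache
# and a 256 KB second-level cache.
made_up = [(2, 20), (4, 20), (8, 21), (16, 45), (32, 46), (64, 46),
           (128, 47), (256, 48), (512, 110), (1024, 112)]
print(cache_boundaries(made_up))  # [8, 256]
```

The plateau heights between the jumps give the access speed of each level, and the final plateau gives main memory speed.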

CheckIt! v. 2.01 (CHECKIT.EXE dated 17/11/89)

PC Diagnostic Tool by Touchstone Software.
Testing areas: CPU, FPU, video, hard disk.
This version is rather old, but its benchmarks run on almost any PC, including Pentiums (Pentium, Pentium II, III, 4, etc.). It also runs on old 8088s, but it won't run on an AMD Athlon. It needs DOS 2.0 or later (Windows 9x must be in “pure DOS” mode). I would not recommend running any of its diagnostic routines on newer systems (though the usual result is just a hang, without any damage to the system).
The CPU and FPU benchmarks are implementations of Dhrystone (version 1) and Whetstone, respectively.
Dhrystone is a very well-known benchmark, written by Reinhold Weicker of Siemens Nixdorf in 1984. It should be easy to find other results, expressed either as raw Dhrystones per second or as MIPS (millions of instructions per second; by convention, 1 MIPS equals 1757 Dhrystones per second, the score of a VAX 11/780).
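Converting a raw Dhrystone score to MIPS is a single division by the VAX 11/780 reference score. A minimal Python sketch; the function name and the 17570 Dhrystones/s example score are hypothetical.

```python
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # reference machine, defined as 1 MIPS


def dhrystone_mips(dhrystones_per_second: float) -> float:
    """Convert a raw Dhrystone score into (VAX-relative) Dhrystone MIPS."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SEC


print(dhrystone_mips(17570))  # 10.0
```

Because these are relative to the VAX rather than a count of actual instructions executed, such MIPS figures are only comparable with other Dhrystone MIPS figures.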
Whetstone was originally published in 1976 by Curnow and Wichmann, and was intended to be representative of numerical (floating-point-intensive) programming. Whetstone performance, however, depends on the speed of the CPU as well as that of the coprocessor.
CheckIt's implementations of these benchmarks are 16-bit (otherwise they wouldn't run on 8088s and 286s), which puts 32-bit processors (386 and later) at a disadvantage.
CheckIt's disk benchmarks, on the other hand, are quite reliable. They are low-level, so they run on almost any disk drive regardless of the partition type (e.g. HPFS, NTFS, ext2, or even unpartitioned drives). The disk diagnostics are also very trustworthy, and run on almost every EIDE disk except newer Ultra ATA drives.

Landmark v. 2.0 (SPEED.EXE dated 30/5/90)

System speed test by Landmark Research Intl. Corp.
Testing areas: CPU, FPU, video.
A very well-known benchmark, widely used in the early 1990s. It runs on almost every system, including 8088s. Landmark rates every processor against a 286: it will tell you, for example, that a 386DX-25 runs as fast as a 39 MHz 286. The CPU test is very small, so it overestimates the performance of 486s, and especially of the faster 486DX2s and DX4s. Because its code is so small, this benchmark does not exercise the memory system outside the cache; that is, it is not influenced by the performance of the motherboard, main memory, etc. The FPU benchmark is more reliable. There is no disk benchmark, and the video benchmark is of little practical value.
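Landmark's exact formula is not given here. A rating of the form “as fast as an N MHz 286” can be sketched by timing a fixed workload and scaling a reference 286's clock by the ratio of run times. In the Python sketch below, the function name, the 8 MHz reference clock and both timings are hypothetical values chosen for illustration.

```python
def equivalent_286_mhz(measured_seconds: float,
                       ref_seconds: float,
                       ref_mhz: float = 8.0) -> float:
    """Express a machine's speed as the clock of a 286 that would finish the
    same workload in the same time, assuming 286 performance scales
    linearly with clock speed."""
    if measured_seconds <= 0 or ref_seconds <= 0:
        raise ValueError("times must be positive")
    return ref_mhz * ref_seconds / measured_seconds


# Hypothetical: the reference 8 MHz 286 needs 9.75 s, the tested machine 2.0 s.
print(equivalent_286_mhz(2.0, 9.75))  # 39.0
```

The linear-scaling assumption is exactly what a tiny, cache-resident test loop exaggerates: it removes the memory bottleneck that a real 286 at 39 MHz (or a real program on a 486) would hit.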

Snooper v. 3.44 (SNOOPER.EXE dated 19/7/96)

System information utility by Vias and Associates.
Testing areas: CPU, video, hard disk.
It runs on almost any machine with an 80x86 CPU (from the 8088 to the Pentium 4 and the Athlon), 256 KB of RAM and MS-DOS 3.1 or later.
The CPU benchmark uses a small 16-bit program, which overestimates 286s relative to both 8088s and 386s. 486DX4s and Pentiums also seem excessively fast compared with other 486s. Apart from this, the CPU test seems quite trustworthy. As in Landmark (SPEED.EXE), processor throughput is expressed relative to that of a 286. The video benchmark seems quite reliable but, as with every DOS-based video benchmark, it is of little practical value.

Norton Sysinfo ver. 7.0 (SYSINFO.EXE dated 8/6/93)

Part of the Norton Utilities by Symantec Corp. (but you can find it in other packages as well).
Testing areas: CPU, hard disk.
It needs a 286+ CPU and MS-DOS 3.1 or later. This version won't run on modern AMDs (Athlons, etc.).
The CPU test seems rather unreliable: it underestimates the speed of 286s and of the faster 486DX2 and, especially, DX4 CPUs (after upgrading a 486 to a DX2 or DX4, it would sometimes even report that the system had become slower). The disk benchmark, which runs only on FAT12/FAT16 partitions, is also rather questionable.

Where can I read more about benchmarks?

Follow the comp.alt.benchmark newsgroup.

Read the comp.benchmarks Frequently Asked Questions.

A very large database can be found at PDS (Performance Database Server).

I will add some links here, as soon as I have a bit of time...