The legacy of Konrad Zuse: Architecture Z1 and Z3 [Translation]

Original: Raúl Rojas - “Konrad Zuse's Legacy: The Architecture of the Z1 and Z3”, IEEE Annals of the History of Computing, Vol. 19, No. 2, 1997

This article presents a detailed description of the architecture of computers Z1 and Z3, developed by Konrad Zuse in Berlin from 1936 to 1941. The information was mainly obtained from a detailed study of the patent filed by Zuse in 1941. A deeper assessment was obtained from a software simulation of computer logic. Z1 was based solely on mechanical components, while Z3 used electromechanical relays, but both machines had a common logical structure and software model. Z1 and Z3 possessed such properties of modern computers as: separation of memory and processor, the ability to operate with floating-point numbers, calculate 4 basic arithmetic operations and a square root. Programs were stored on punched tape and read sequentially. In the next article I will look at the architecture of the Z1 and Z3 in a historical perspective, and make a comparison with computers built in other countries.

Introduction

Konrad Zuse is recognized in Germany as the father of a computer, and his Z1 , a programmable machine built from 1936 to 1938, is considered the first computer in the world. In other countries, this value is attached to other scientists, and there are long and sharp disputes over the question of who is the true inventor of the computer. Sometimes disputes turned into a deep and detailed description of the technological features of specific machines. For example, ENIAC (Electronic Numerical Integrator and Computer) is considered to be the “First large-scale, fully electronic computing machine in the world.” ENIAC was built at the Moore School of Electrical Engineering at the University of Pennsylvania (Moore School of Electrical Engineering of the University of Pennsylvania) from 1943 to 1945. He solved the first task in December 1945 and was introduced in February 1946. Mark I , built by Howard Aiken at Harvard University from 1939 to 1944, is considered another candidate for the title of the first computer. Mark I was an electromechanical machine, i.e. built not on purely mechanical elements, as in earlier computing devices, and not on electronics already available at that time. John Atanasov’s car (later called ABC ) from Iowa State College , under construction from 1939 to 1944, used vacuum tubes, but the machine had limitations on vector addition and subtraction, and the current structure was not suitable for universal computing. Z1 is the exact opposite of these machines, more flexible, built to execute long and modifiable sequences of instructions read from punched tape. Machines Z3 and Z4 were not electronic and had smaller dimensions. The Z3 was built and successfully tested earlier than the Mark I, and therefore was named the first programmable computing machine in the world. Of course, long-standing discussions will not be completed on this article, but I want to show how advanced Zuse's machines were in terms of modern computer architecture and in comparison with the architecture of that time.

Zuse, a student at the Berlin Polytechnic, began working on a computer in the 1930s. He realized that he could build an automaton capable of performing a sequence of mathematical operations that are required for constructing mathematical tables. Coming from civil engineering, he had experience in the field of electronics, but was not familiar with the technologies used in conventional mechanical calculators. However, this lack of knowledge gave him an advantage, because He was able to rethink all the problems of machine arithmetic and find new original solutions.
')
Zuse decided to build his first experimental computing machine based on two main ideas:

The machine must operate in binary notation.
Computed and Control Data must be shared in memory.

A year earlier, John von Neumann explained the advantage of computer architecture with a processor sharing memory. Zuse came to the same conclusion. It should be noted that Charles Babbage came to the same conclusions when designing his Analytical engine in the last century. In 1936, Zuse completed work on the memory for his car, and called it Speicherwerk (it is Memory Storage) - the term Speicher was used in Germany instead of the anthropomorphic term Memory (eng. Memory), which was introduced by von Neumann, while Babbage used term Store (eng. - Storage). Memory was a mechanical device, but not the usual type. Instead of using gears (Babbage used them in the last century), Zuse implemented logical and arithmetic operations on sliding metal slats. Reiki could move in one of two directions (direct and reverse) and therefore were a binary mechanism. The Z1 processor was built 5 months after memory construction. When working with memory, he was extremely unreliable. The main problem was the accuracy of synchronization, which is required to prevent excessive mechanical stress on moving parts. It is interesting to note that in the same year, when Zuse finished working on memory, Alan Turing wrote his pioneering work On computable numbers (English - On computable numbers), in which he formalized intuitive calculations.

The Z1 machine, despite its unreliability, showed that Architectural design was expedient and prompted Zuse to explore other types of technology. Following the advice of his friend Helmut Schreyer, he considered electromechanical relays. Zuse built an “intermediate” model (Z2) on a mixed technology (the processor was built on a relay, but the memory remained mechanical). In 1938, Zuse began developing the Z3, a machine entirely built on a relay, but with a logical structure from Z1. It was ready for operation in 1941, 4 years before ENIAC.

This work involves a detailed review of the architecture of machines Z1 and Z3. Zuse reconstructed his Z1 car in Berlin in the 80s, and now it is on display at the Berlin Museum of Transport and Technology . However, the available information describes the mechanical memory device [12]. Zuse documented Z3 in patent application Z-391 1941, but its analysis is difficult due to non-standard notation and terminology [14]. The book [4] about Z3 is a good source for understanding the historical environment surrounding Zuse's research, but does not describe the Z3 in detail. Since Z1 and Z3 are equivalent from a logical and functional point of view, in the future, I will only refer to Z3. The main architectural difference is that Z1 does not support the square root operation. The remaining minimal differences are reduced to the difference in the number of bits in the arithmetic operators of the processor (Z1 uses a lower bit of the mantissa in floating-point numbers) and in the number of cycles required for each instruction. With minimal caveat, taking into account only the architecture, it is possible to consider Z1 and Z3 equivalent machines. There have been controversies whether the reconstructed Z1 corresponds to the original Z1 destroyed during the Second World War. Zuse reconstructed the Z1 car based solely on his memories, and it is possible that the reconstructed car has more in common with the Z3 than with the original Z1. Also, Zuse points out in his memoirs the basic similarities of Z1 and Z3 [15], and confirms this aspect of his work in a private interview.

Z1 and Z3 architecture overview

In this section, most of the architectural features are given for Z3. First, I will provide an overview of the architecture, then move on to the details.

Structural scheme

Z3 worked with floating-point numbers, unlike other early computing machines, such as Mark I, ABC or ENIAC, working with fixed-point numbers. Zuse developed what he later called a “semi-logarithmic record”, which corresponds to the modern representation of floating-point numbers.

In fig. Figure 1 shows the main composite blocks of the Z3. The first feature of the architecture is the separation of the processor and memory. Z3 includes a binary memory (capable of storing 64 floating point numbers), a binary processor for floating point numbers, a control device, and input / output devices. The memory and the arithmetic unit are connected by a data bus transferring the exponent and the mantissa of the floating point number. The control unit contains firmware for each instruction. Control signals go to the processor, memory, and I / O device to synchronize all blocks. The punched tape reader provides an operation code for each instruction, as well as an address for accessing the memory. An I / O device is connected to the processor by a data bus.

Floating point representation

In fig. 2 shows the representation used in Z3. The first digit is used to store the mark, the next 7 digits store the exponent, and the last 14 digits store the mantissa. The digits of the exponent are called Part A and are denoted as a ₆ , ..., a ₀ . The bits of the mantissa are called Part B and are denoted as b ₀ , b _-1 , ..., b _-14 . The exponent is encoded in an additional code and may contain values in the range from -64 to 64. The mantissa is stored in a normalized form, in which the first decimal place (b ₀ ) must always be set to one [5]. This bit is not stored in memory (therefore, it is not shown in Figure 2), so the effective range of mantissa values in memory is 15 bits. However, there is a problem with the number 0, since normalized mantissa can not express it. In Z3, there is an agreement according to which, any mantissa with the exponent -64 is considered to be 0. Any number with an exponent of 63 is considered to be infinitely large. Operations on 0 or infinitely large cause exceptions, and special hardware controllers set exceptions flags in the processor (discussed below).
Given this convention, the minimum number in memory Z3 is 2 ^-63 = 1.08x10 ^-19 , and the maximum 1.999x2 ⁶² = 9.2x10 ¹⁸ . Arguments for calculation can be entered as decimal numbers from the keyboard (four-digit). The exponent of the decimal representation is entered by pressing the corresponding button in the row of buttons marked -8, -7, ..., 7.8. The original Z3 allowed you to enter only numbers in the range from 1x10 ^-8 to 9,999x10 ⁸ . The reconstructed Z3 from the German Museum in Munich has additional buttons for large exhibitors. With this improvement, it became possible to enter the full numerical range of the machine from the keyboard. Let's say a little about the conclusion. Z3 did not print the results of the software procedures. A single number was displayed on the array of lamps representing numbers from 0 to 9. The maximum number that Z3 was capable of displaying was 19,999, and the minimum number was 00001. The maximum displayed exponent was +8, and the minimum was -8.

Instruction set

Z3 program is stored on punched tape. One instruction uses 8 bits from each row on the tape. The Z3 instruction set consists of 9 instructions presented in Table 1. They are divided into 3 groups: input / output, memory, and arithmetic operators. The operation code has a variable length from 2 to 4 bits. Memory operators encode the address of the word in the lower 6 bits, which are capable of addressing 64 words, as mentioned earlier.

Punched tape instructions can be placed in any order. The Lu and Ld instructions (Reading from the keyboard and Displaying on the display) stop the machine so that the operator has enough time to enter the number and record the result.

Z3 has no conditional jump instructions. The cycles are implemented by simply fastening the two ends of the punched tape, but this does not allow for the realization of conditional sequences of instructions. Consequently, Z3 is not a universal computer in Turing's description.

Table 1. Many instructions and their Z3 machine codes

Type of	Instruction	Description	KOP
In / in	Lu	Keyboard input	01 110000
In / in	Ld	Output result	01 111000
Memory	Pr z	Download from z	11 z ₆ z ₅ z ₄ z ₃ z ₂ z ₁
Memory	Ps z	Write to z	10 z ₆ z ₅ z ₄ z ₃ z ₂ z ₁
Arithmetic	Lm	Multiplication	01 001000
	Li	Division	01 010000
	Lw	Square root	01 011000
	Ls ₁	Addition	01 100,000
	Ls ₂	Subtraction	01 101000

Number of cycles

Z3 is a synchronous machine. Each cycle is divided into 5 stages, called I, II, III, IV, and V. At stage I, the instruction is decoded from a punched tape. The main arithmetic operations are the addition and subtraction of exponents and mantissas. Operations can be performed in the first three stages of each cycle. Stages IV and V are used to prepare arguments for the following operations or output the result.

Table 2. The number of cycles required to execute instructions.

Operation	Number of cycles
Multiplication	sixteen
Division	18
Square root	20
Addition	3
Subtraction	4 or 5, depending on the result
Keyboard input	from 9 to 41, depends on the exponent
Output	from 9 to 41, depends on the exponent
Load from memory	one
Memory entry	0 or 1

According to Zuse, multiplication is done in 3 seconds. Given that the multiplication operation requires 16 cycles, it can be estimated that the operating frequency of Z3 is 3/16 = 5.33 Hz. Curiously, the coincidence that the Z3 simulator, which my students implemented on a personal computer, also requires about 3 seconds to multiply.

The number of cycles required for the Keyboard and Display instructions is variable, because it depends on the exponent of the arguments. Since the input must convert the decimal representation to binary, the number of multiplications by 10 or 0.1 is determined by the value of the decimal exponent (consider below).

Addition and subtraction require more than one cycle because in the case of floating-point numbers, we need to bring the exponents of both arguments to the same value. This requires additional comparisons and shifts.

A number can be written to the memory in 0 cycles in the case when the result of the last arithmetic operation can be redirected to the desired memory address. In this case, the cycle required for the write instruction coincides with the last cycle of the arithmetic operation.

Software model

It is very important to describe the machine Z3 from the point of view of the programmer. From a software point of view, Z3 consists of 64 words in memory, capable of being loaded into 2 floating point registers, which I will call R1 and R2 for simplicity. Two registers hold arguments for an arithmetic operation. The programmer can write any sequence of instructions, but must remember about the state of the registers.

It is important to remember the sequence: The first load operation in the program (Pr z) loads the contents of the address z into R1. All subsequent load operations load the word from memory into R2. The Keyboard Input instruction loads a number into R1 and CLEARS R2, which is used as a temporary value when converting a decimal input to a binary representation.

Arithmetic operations do not contain arguments in the opcode. The following arguments are always used:

Multiplication	R1: = R1 * R2
Division	R1: = R1 / R2
Addition	R1: = R1 + R2
Subtraction	R1: = R1 - R2
Square root	R1: = sqrt (R1)

R2 is reset to 0 after arithmetic instructions that write the result to R1. Subsequent load operations from memory send the value to R2. The instructions for writing to memory and displaying always take the value from R1, which contains the result of the last arithmetic operation. After the save operation or output, R1 is reset to 0 (when the relay is turned off, which will then be ready to accept the new value). The next load from memory occurs in R1.

We give an example for a better understanding of the software model Z3. Suppose we need to calculate a polynomial by the Horner method .

x (a ₂ + x (a ₃ + xa ₄ )) + x ₅

Further, we assume that the arguments a ₄ , a ₃ , a ₂ and a ₁ are stored in the fourth, third, second and first memory cells, respectively. The value of z is stored in the fifth. The program is shown below:

 Pr 4 download a4 to R1 
 Pr 5 download x to R2 
 Lm multiply R1 and R2, the result is in R1 
 Pr 3 load a3 into R2 
 Lsl add R1 and R2, result in R1 
 Pr 5 download x to R2 
 Lm multiply R1 and R2, the result is in R1 
 Pr 2 load a2 into R2 
 Lsl add R1 and R2, result in R1 
 Pr 5 download x to R2 
 Lm multiply R1 and R2, the result is in R1 
 Pr 1 download a1 to R2 
 Ls1 add R1 and R2, result in R1 
 Ld display the result

After the last instruction is executed, the processor is reset to its original state.
state.

Z3 block diagram

In this section, I will look at the structure of the Z3 and describe the main building blocks in
details. It is important to show how synchronization between
components.

CPU

In fig. 3 shows a simplified diagram of an arithmetic unit. It consists of two parts: the left handles operations on the exponent of a floating point number, and the right handles operations on the mantissa. The registers Af and Bf store the exponent and the mantissa of what is programmatically called R1. I will represent R1 as a pair of registers <Af, Bf>. A pair of registers <Ab, Bb> stores the exponent and the mantissa R2. The pair <Aa, Ba> stores the exponent and mantissa of the third temporary floating point register, invisible to the programmer. Two Arithmetic Logic Devices (ALU) A and B are used to add and subtract exponents and mantis, respectively. The exponent and the mantissa of the result are placed in Ae and Be respectively. The pair <Ae, Be> is treated as an internal register, inaccessible to the programmer. In Part B, the multiplexer allows you to select the result of the operation between Ba and the output of the ALU. The multiplexer is controlled by the switch Bt (if Bt = 0, then Be gets the value of Ba).

The small blocks designated Ea, Eb, Ec, Ed, Ef, Fa, Fb, Fc, Fd, and Ff are relay blocks (hereinafter RB), opening and closing tires. For example, to transfer the value of the contents of register Af to register Aa, you need to set RB Ea to one. As we can see from fig. 3, the contents of Af can be transferred to Aa or Ab, and the contents of Ae can be transferred to Aa, Ab or Af, depending on the state of the RB. Part B of the arithmetic unit is arranged similarly, but with the addition of a multiplexer controlled by a switch Bt and shifters between Ba and Ba, and between Bf and Bb. The first shifter can shift the mantissa within 2 digits to the left or 1 digit to the right. It divides Bf by 4 or multiplying by 2. The second shifter can shift Af by one of 16 positions to the left or one of 15 positions to the right. These shifting devices are required to add and subtract floating point numbers. Multiplication and division by powers of two can be performed with the subsequent arithmetic operation without the cost of additional time.

Table 3. Digit registers

Af	7 digits	Bf	17 ranks
Aa	eight	Ba	nineteen
Ab	eight	Bb	18
Ae	eight	Be	18

As we can see from the list, Ae has an extra bit to save the result of the addition of exponential arguments. Part B has 2 additional digits (b _-15 and b _-16 ) and explicit b ₀ , not stored in memory. -15 and -16 digits are entered to increase the accuracy of calculations. Consequently, the total bit needed to store the result of arithmetic operations in Bf is 17 bits. The Ba and Bb registers require additional bits (ba2, ba1 and bb1) to store intermediate results of some calculations. For example, the square root algorithm may require 3 digits to the left of the decimal point.

The simplest operation on a data channel is the addition and subtraction of the mantissa and the exponent. When the As or Bs switches are set to 1, the ALU translates the second argument to an additional code, i.e. changes its sign (Ab or Bb). Therefore, if the As switch is set to 1, then the AUL of Part A subtracts, otherwise the addition. Part B and switch Bs work in a similar way.

Suppose there are two numbers with equal exponents that need to be added. The first exhibitor is stored in Af, and the second in Ab. Since they are equal, this part of the machine should not perform any operations. In part B, the mantissa of the first number is stored in Bf, and the mantissa of the second in Bb. The first stage consists in transferring the number from Bf to Bf, for which it is necessary to set the relay block Fa to 1. Next, the addition occurs. In order for the result of Ba + Bb to be registered in Be, you must set the Bt switch to 1. RB Ff is currently set to 1, and the result is placed in Bf. The architecture must provide the correct sequence of actions on the switches to perform the desired operation. The Z3 does this thanks to a technique that is very similar to firmware.

Control device

In fig. 4 shows a detailed diagram of the control device and I / O panels. The circuit Pa decodes the opcode read from punched tape. If this is a memory instruction, the Pb circuit sends the lower 6 bits of the opcode to the address bus. The control device determines the microsequence instructions. For each operator from the instruction set, there are special schemes.

Diagram Z represents the keyboard used to enter decimal numbers. Only one button in each of the four columns can be pressed. The exponent is set by pressing one of the buttons marked from -8 to 8 in the K diagram. The display is very similar to the input panel, its lamps illuminate the decimal number received, and its exponent (Q scheme). Please note that the display shows the fifth digit (it can take values only 0 or 1).

After entering a decimal number, it is transmitted via the data bus to the Ba register, and a special sequence of operations begins. The decimal number entered from the keyboard must be converted to binary representation. This requires several multiplications, the number of which is determined by the value of the exponent. For the exponent equal to zero, 9 cycles are spent for the complete conversion, and for the exponent equal to -8, 9 + 4 * 8 = 41 cycles are spent.

Z3 micromanagement

At the heart of the Control Device is the microsequence controller. Before considering the principle of its operation, we need to consider in detail the algorithms of arithmetic instructions Z3. In fig.5 shows the main idea. Each cycle consists of 5 stages. At stages IV and V, information is entered into the machine. In stages I, II and III, addition / subtraction occurs, in parts A and B. I called this phase of the instruction - “execution”. A typical instruction selects arguments, executes and records the result. Tsuze was able to reduce the execution time by overlapping the stages of selecting the arguments for the next instruction and recording the result of the current one. As a result, we can assume that the executive cycle consists of two stages (in Fig. 5, the first two cycles are indicated).

The microsequence controller is implemented using a special control wheel. Implemented separate wheels for the multiplication algorithm, division and taking the square root. The movable lever shown in Fig. 6, begins to rotate clockwise as soon as the control device decodes the corresponding instruction. For each cycle, the lever moves one position. The lever conducts electricity and activates the circuitry with which it comes in contact. In the example shown in fig. 6, the movable lever sets the switch Ea to 1 on the first cycle. This leads to the transfer of numbers from the register Af to Aa. On the next cycle, switches Ec and Fc are activated. In this case, the results of the operation in parts A and B are recorded in the registers Aa and Ba, respectively. As we can seeThe use of such control wheels provides a platform for easy modification of microsequences of certain operators. Microprocessor controllers in modern microprocessors are similarly arranged. I did not call this method microprogramming, because in our case the microsequence controller is hardware, but it is obvious that microsequences and microprograms are closely related.

The use of microsequence controllers allowed Zuse to greatly simplify the machine. After the basic schemes were prepared on their basis, the control device was improved until it was possible to achieve an optimal sequence of micro-instructions. The engineer, when designing the “firmware”, must take into account many details, otherwise a short circuit could destroy the machine. Mechanical Z1 was more sensitive in this respect than Z3. At the end of this stage there was a list of sequences that the programmer should avoid in order to prevent damage to the equipment. One of these sequences was negligently launched at the Berlin Museum of Technology and Transport, and resulted in minor damage to the Z1 reconstruction in 1994.

Adders

An important feature of the adder Z3, is the calculation of the sum and difference method, called the scheme of accelerated transfer . If binary addition is performed directly, the transfer is transferred from each digit one bit further. In the case of a mantissa, 16 cycles are required to transfer the transfer of one bit. Adding to the implementation of Zuse is much faster, where addition and subtraction are carried out during phases I, II and III of one cycle. For subtraction, the second argument is translated into reverse code with the addition of one to the least significant bit (additional code).

Consider the addition of the registers Ba and Bb. I will denote the i-th digit of the register Bb as bbi or Bb [i]. In this notation, all registers will be denoted. The first intermediate result is obtained bitwise Exclusive OR between two registers, i.e. bc _i = ba _i XOR bb _i . The second intermediate result is obtained by a bitwise conjunction between two registers, i.e. ba _i AND bb _i . The next operation finds digits for which transfer is required. Intermediate result bd _icalculated by the scheme shown in Fig. 7. Please note that if the discharge is set to one, then a current flows through the corresponding transfer line, otherwise the power supply is disconnected (so that a short circuit does not occur). The wiring diagram of the switches bc ₁ , ..., bc _{-16 is} shown in fig. 7. If the digit bc _{i is} set to one, the corresponding switch is open. The final result is be _i = bd _i XOR bc _i . This scheme allows you to move the transfer to the last digit much faster. In case all switches are active, the transfer moves from the first digit to the last without wasting time.

Numerical Algorithms

In this section, I will describe the algorithms above floating-point numbers used in Z3. All of them, without exception, are used in ordinary small sequential floating point processors.

Floating point exceptions

The main problem with a floating-point entry is the convention regarding the representation of the number 0. Z3 solves this problem with exceptions (overflow and loss of digits) for tracking the exponent value after arithmetic or load from memory. Special schemes monitor the state of the Ae bus and catch exceptions. Any number with an exponent of -64 is 0: A switch denoted by Nn1 is set to 1 if such a number is stored in a pair of registers <Af, Bf>. If such a number is stored in a pair of registers <Ab, Bb>, the switch Nn2 is set to 1. Therefore, we always know whether one of the arguments of the arithmetic operation is zero. Something similar happens for any number with exponent 63 (infinitely large). In this case, the switches Ni ₁ or Ni ₂are reset to 0 if the corresponding register pairs store such numbers.

Operations involving “exceptional” numbers (zero and infinity) are performed as usual, but the result is replaced with a special scheme. Consider, for example, multiplication with the first argument equal to zero (Nn ₁set to 1). Calculation takes place as usual, but in each cycle a special scheme gives the result -64 from the adder of Part A. It does not matter at all what operations were performed with the mantissa, since the exponent is -64, and therefore the final result is zero. The division by zero is similar. Z3 can detect uncertainties such as 0/0, ∞-∞, ∞ / ∞, and 0 * ∞. In all these cases, the corresponding exception indicator on the output panel lights up and the machine stops. Z3 always calculates the correct result when one of the arguments is zero or infinity, and the second is a finite non-zero number. But this does not apply to Z1. Zuse was thinking about exception handling in Z1, but it remained unrealized. The machine produces an incorrect result in some calculations with zero.

The additional circuit looks at the exponent of the result, at the output of the adder of Part A. If the exponent is greater than or equal to 63, an overflow has occurred and the result must be set to infinity. If the exponent fell to -64, then a loss of digits occurred and the result should be set to zero. To do this, the corresponding switches (Nn ₁ or Ni ₂ ) are set to 1.

Zuse implemented exception handling using a total of 5 relays. This feature of the Z3 is one of the most elegant in the whole design. Most early microprocessors in the 1970s did not have exception handling and implemented them programmatically. Zuse's approach is more advanced, since it frees programmers from having to constantly check the boundaries of numbers before each operation.

Addition and Subtraction

To add or subtract two floating-point numbers x and y, they must be represented with the same exponent. After that, addition and subtraction is carried out only over the mantissa. If the exponentials are different, the mantissa of a smaller number is shifted to the right by the required number of digits (and the exponent is correspondingly incremented so that the operand retains its value) so that the exponents become equal. Of course, it may happen that a smaller number becomes zero after 17 shifts to the right.

The characters of the operands are compared before deciding on the type of operation being performed. If addition is requested, the addition is performed in the case of the same characters and subtraction in the case of different characters. If subtraction is requested and the characters are the same, then subtraction is performed, and if different, addition is performed.

A special scheme establishes the sign of the result based on the signs of the operands and the sign of the intermediate result.
Addition and subtraction are controlled by a switch chain (and not by the Micro Sequence Control Wheel), since while the maximum number of required steps is not large. Table 4 shows the signals needed to add two numbers. Initially, the arguments of the addition operation are stored in pairs of registers <Af, Bf> and <Ab, Bb>. In the first cycle, the exponents are compared. In the second cycle, the mantissa of the number with the larger exponent is loaded into the Ba register, and the mantissa of the number with the smaller exponent is loaded into the Bb register. The mantissa in register Bb is shifted to the right by a number equal to the absolute value of the exponent difference (the exception handling system takes care of the case when a smaller number becomes zero after the shift). At stages I, II and III of the second cycle, the mantissas are added up and at the end the processor will check whether the result is greater than two. If isthe mantissa is shifted one position to the right, and the exhibitor is incremented. It is important to note that the “if (Be> 0)” check is performed in Part A of the arithmetic unit after Be is calculated in Part B in stages I, II and III of cycle 2.

Table 4. Addition Cycles

Cycle	Stage	Exhibitor	Mantissa
0	I, II, III
one	IV, V	Aa: = Af
one	I, II, III	Ae: = Aa - Ab	Be: = 0 + Bb
2	IV, V	If (Ae> = 0) then Ab: = 0, Aa: = Af else Aa: = 0	If (Ae> = 0) then Ba: = Bf, Bb: = Be (shifted) else Ba: = Be, Bb: = Bf (shifted)
2	I, II, III	If (Be> = 2) then Ae: = Aa + Ab + 1 else Ae: = Aa + Ab	Be: = Ba + Bb
3	IV, V	Af: = Ae	if (Be> = 2) then Bf: = Be / 2 else Bf: = Be

In the case of subtraction, 4 or 5 cycles are required. Table 5 presents the signals required for the subtraction operation. The first two cycles are completely identical to the first two cycles of the addition algorithm, except that the mantissas are subtracted. Cycle 3 is performed only if the mantissa of the result is positive. Cycle 4 is very important: the difference between the normalized mantissas can have many zeros in the first bits to the left. The result is normalized by shifting Be to the left by the required number of digits (this is done by a shifter between the switch Fd and the register Bb). The number of shifts is subtracted from the exponent in Part A of the processor. In loop 5, the result is written into a pair of registers <Af, Bf>.

Table 5. Cycles subtraction algorithm

Cycle	Stage	Exhibitor	Mantissa
0	I, II, III
one	IV, V	Aa: = Af
one	I, II, III	Ae: = Aa - Ab	Be: = 0 + Bb
2	IV, V	If( Ae >= 0 ) then Ab:=0, Aa:=Af else Aa:=0	If( Ae >= 0 ) then Ba:=Bf, Bb:=Be(shifted) else Ba:=Be, Bb:=Bf(shifted)
2	I, II, III	Ae:=Aa+Ab	Be:=Ba-Bb
3	IV, V	Af:=Ae, Ab:=0	Ba:=0, Bb:=Be
3	I, II, III	Ae:=Aa+Ab	Be:=Ba-Bb
four	IV, V	Aa:=Ae Ab:=	Bb:=Be(shifted) (Be )
four	I, II, III	Ae:=Aa-Ab	Be:=0+Bb
five	IV, V	Af:=Ae	Bf:=Be

Multiplication

The multiplication algorithm Z3 is similar to one of the methods of decimal multiplication on the fingers. The method is based on repeated additions of a multiplier with individual digits of a multiplicand. At the beginning of the algorithm, the first operand is stored in a pair of registers <Af, Bf>, and the second operand is stored in a pair of registers <Ab, Bb>. The intermediate pair of registers <Aa, Ba> is reset to 0. Table 6 shows the micro-sequence executed by the Micro-Sequence Control Wheel to perform multiplication. The algorithm requires 16 cycles. Only bits from -14 to 0 are involved in multiplication. Exponents are added in the first cycle, and the result of the addition is cyclically stored in Part A of the Arithmetic unit. Mantissas are processed in Part B. The Ba register stores the intermediate result of the calculations.

The main multiplication cycle is as follows:
Ba:= Be / 2
Be: = Ba + Bb * (i-th bit of Bf)

For i = -14, .., 0. The intermediate result in Be shifts one digit to the right to execute the expression Ba: = Be / 2. This operation is performed on a shifter connected to the Fc switch.

The result of the mantissa multiplication is the number 1 <= r <4 (for operands within the specified limits). On the last cycle, the result is checked for the condition r> = 2. If the condition is satisfied, the result is shifted by one digit to the right, and the exponent is incriminated.

Table 6. Multiplication Algorithm Cycles

Cycle	Stage	Exhibitor	Mantissa
0	I, II, III
one	IV, V	Aa: = Af
one	I, II, III	Ae: = Aa + Ab	If (Bf [-14] = 1) then Be: = Ba + Bb else Be: = Ba
2	IV, V	Aa: = Ae, Af: = 0, Ab: = 0	Ba: = Be / 2
2	I, II, III	Ae: = Aa + Ab	If( Bf[-13]=1 ) then Be:=Ba+Bb else Be:=Ba
3	IV, V	Aa:=Ae	Ba:=Be/2
3	I, II, III	Ae:=Aa+Ab	If( Bf[-12]=1 ) then Be:=Ba+Bb else Be:=Ba
...	...	...	...
i	IV, V	Ae := Aa + Ab	Ba:=Be/2
i	I, II, III	Aa:=Ae	If( Bf[i-15]=1 ) then Be:=Ba+Bb else Be:=Ba
...	...	...	...
15	IV, V	Af:=Ae	Ba:=Be/2
15	I, II, III	if(Be>=2) then Ae:=Aa+1	If( Bf[0]=1 ) then Be:=Ba+Bb else Be:=Ba
sixteen	IV, V	Af:=Ae	If( Be >= 2 ) then Bf:=Be/2 else Bf:=Be Bb:=0

Division

The division algorithm is very similar to the multiplication algorithm, with the exception that instead of cyclic addition, cyclic subtraction is used. At the beginning of the algorithm, the dividend is stored in a pair of registers <Af, Bf>, and the divisor in a pair of registers <Ab, Bb>. The intermediate pair of registers <Aa, Ba> is reset to 0. Table 7 shows the micro-sequence performed by the Micro-Sequence Control Wheel. The algorithm requires 18 cycles.

The basic idea of the algorithm is very simple. The result exponent is obtained by subtracting the exponent of the dividend and the divisor. Now about mantissas: Let us need to calculate x / y for mantisses x and y. Sincewe deal with normalized numbers, then the first digit of the result is equal to one if x> = y, and zero if x <y. In the first case, we assign the result 1 to the first digit and calculate the remainder, which is x - y. The remainder is recursively divided by y. To do this, it shifts one digit to the left and the new effective bit is stored in the Bf register at position -1 (thus compensating for the effect of the shift). If the resulting bit is zero, then the remainder is simply x and the recursive division continues as in the first case.

The main division cycle has the following form:
Ba: = 2 × Be
if (Ba – Bb> 0) then Be: = Ba – Bb, Bf [i]: = 1
else Be: = Ba
Bf [i]: = 0

For i = 0, .., - 14. The intermediate result in Be is shifted one digit to the left to execute the expression Ba: = 2 x Be. This operation is performed in the shifter connected to the switch Fc.

The result of mantissa division is the number 1/2 <r <2. This condition is checked in cycles 17 and 18. If r <1, then the unit is subtracted from the exponent and the result is shifted one digit to the left to get the normalized number.

Table 7. Multiplication Algorithm Cycles

Cycle	Stage	Exhibitor	Mantissa
0	I, II, III
one	IV, V	Aa: = Af	Ba: = Bf
one	I, II, III	Ae: = Aa - Ab	If (Ba-Bb> = 0) then Be: = Ba-Bb; bt: = 1 else Be: = Ba; bt: = 0
2	IV, V	Aa: = Ae, Ab: = 0	Bf: = 0 if (bt = 1) then Bf [0]: = 1 Ba: = 2 x Be
2	I, II, III	Ae: = Aa + Ab	If (Ba-Bb> = 0) then Be: = Ba-Bb; bt: = 1 else Be: = Ba; bt: = 0
3	IV, V	Aa: = Ae	if( bt = 1 ) then Bf[1] := 1 Ba:=2 x Be
3	I, II, III	Ae:=Aa+Ab	If( Ba-Bb >= 0 ) then Be:=Ba-Bb; bt:=1 else Be:=Ba; bt:=0
...	...	...	...
i	IV, V	Aa:=Ae	if( bt = 1 ) then Bf[2-i] := 1 Ba:=2 x Be
i	I, II, III	Ae:=Aa+Ab	If( Ba-Bb >= 0 ) then Be:=Ba-Bb; bt:=1 else Be:=Ba; bt:=0
...	...	...	...
sixteen	IV, V	Af:=Ae	if( bt = 1 ) then Bf[-14] := 1 Ba:=2 x Be
sixteen	I, II, III	Ae:=Aa+Ab	If( Ba-Bb >= 0 ) then Be:=Ba-Bb; bt:=1 else Be:=Ba; bt:=0
17	IV, V	If ( Bf[0] = 0 ) then Ab := -1	Ba:=Bf, Bb:=0
17	I, II, III	Ae:=Aa+Ab	Be:=Ba+Bb
18	IV, V	Af:=Ae	If( Bf[0] = 0 ) then Bf:=2xBe else Bf:=Be

The square root extraction algorithm is the main highlight of the Z3. Table 8 presents the microsequence, consisting of 20 cycles, necessary to calculate the square root. Initially, the operation argument is stored in a pair of registers <Af, Bf>. The pair of registers <Aa, Ba> is reset to 0. The algorithm calculates the square root of a number with an even exponent. If the exponent of a number is not even, then the mantissa is shifted 1 digit to the left and the exponent is decremented. The total exponent (calculated in cycle 19) is equal to half the original exponent.

The main idea of this classic algorithm is to reduce the square root operation to the division operation. To extract the square root of x, we need a series of Q such that x / Q = Q. The final Q is obtained by sequentially setting the i-th digit to 1, followed by checking the condition x> Q ² . If the condition has ceased to be fulfilled, then the i-th digit should be reset to 0.

Let we have already calculated the result from zero to -i + 1 digit. Denote by Q-i + 1 the mantissa:

Q _{-i + 1} = Bf [0] * 2 ⁰ + Bf [-1] * 2 ^-1 + ... + Bf [-i + 1] * 2 ^{-i + 1}

Next discharge -i is set to q _-i , which must preserve the condition:

x> = Q _-i ²= (Q _{-i + 1} + q _-i * 2 ^-i ) ²

This condition is true if:

x - Q _-i ² = (x - Q _{-i + 1} ² ) - 2 ^-i q _-i (2 Q _{-i + 1} + 2 ^-i q _-i )> = 0

Define t _-i using the expression:

2 ^-i * t _-i = x Q _-i ² = (x - Q _{-i + 1} ² ) - 2 ^{- i} q _-i (2 Q _{-i + 1} + 2 ^-i q_-i )

It can be written as:

2 ^-i t _-i = t _{-i + 1} 2 ^{-i + 1} - 2 ^-i q _-i (2 Q _{-i + 1} + 2 ^-i q _-i )

in which we use the recursive definition:

2 ^{-i + 1} t _{-i + 1} = (x - Q ² _{-i + 1} )

Simplifying the previous expression, we get:

t _-i = 2 t _{-i + 1} - q _-i (2 Q _{- i + 1} + 2 ^-i q _-i )

If t _{-i is} positive for q _-i = 1, then we set the digit -i of the final result to 1, i.e. Bf [-i]: = 1. If t _{-i is} negative, then we set Bf [-i]: = 0. Recursive calculations start with t0 = x. Q _{-i + 1} at each step represents a partial result stored in Bf. -I pre-discharge is set to 1 and the sign verified t _-i . The basic cycle of the square root extraction algorithm for the discharge -i has the following form:

Ba: = 2 x Be
Bb: = 2 x Bf
Bb [-i]: = 1
if (Ba - Bb> = 0) then Be: = Ba - Bb , Bf [-i]: = 1
else Be: = Ba, Bf [-i]: = 0

To calculate the square root, all bits of the Bf register are used. If the original number is in the specified range, the result is also in the specified range.

Table 8. Cycles of square root extraction algorithm

Cycle	Stage	Exhibitor	Mantissa
0	I, II, III
one	IV, V		If (Af [0] = 1) then Ba: = 2 * Bf else Ba: = Bf Bb [0]: = 1
one	I, II, III		If (Ba-Bb> = 0) then Be: = Ba-Bb, bt: = 1 else Be: = Ba, bt: = 0
2	IV, V		Bf: = 0 if (bt = 1) then Bb [0]: = 1, Ba: = 2 * Be Bb: = 2 * Bf, Bb [-1]: = 1
2	I, II, III		If (Ba-Bb> = 0) then Be: = Ba-Bb, bt: = 1 else Be: = Ba, bt: = 0
3	IV, V		if (bt = 1) then Bb [-1]: = 1, Ba: = 2 * Be Bb: = 2 * Bf, Bb [-2]: = 1
3	I, II, III		If (Ba-Bb> = 0) then Be: = Ba-Bb, bt: = 1 else Be: = Ba, bt: = 0
...	...	...	...
i	IV, V		if (bt = 1) then Bb [2-i]: = 1, Ba: = 2 * Be Bb: = 2 * Bf, Bb [1-i]: = 1
i	I, II, III		If (Ba-Bb> = 0) then Be: = Ba-Bb, bt: = 1 else Be: = Ba, bt: = 0
...	...	...	...
18	IV, V		if(bt=1) then Bb[-16]:=1, Ba:=2Be Bb:=2Bf
18	I, II, III		If( Ba-Bb >= 0 ) then Be:=Ba-Bb, bt:=1 else Be:=Ba, bt:=0
nineteen	IV, V	Aa:=Af / 2	Ba:=Bf, Bb:=0
nineteen	I, II, III	Ae:=Aa+0	Be:=Ba+Bb
20	IV, V	Af:=Ae	Bf:=Be

The two most complex instructions Z3 c input and output in the decimal number system. Decimal numbers of 4 digits are entered through the keyboard and converted into a binary representation. This is done by reading each digit sequentially, converting it into a binary representation and storing the Ba [-10], Ba [-11], Ba [-12] and Ba [-13] Ba registers into bits. The number in the register is multiplied by 10 and the procedure is repeated with the remaining digits. After 4 iterations, the decimal input is completely converted into a binary representation (the binary representation exponent is formed indirectly through shifts when multiplied by 10). The most difficult part is handling exponents. If the exponent e is positive, the mantissa is multiplied e times by 10. If negative, then | e | times by 0.1. Multiplication by 10 is relatively simple:the mantissa in Be can be shifted 1 digit to the left and then saved in Ba (ie, Ba: = 2 x Be). At the same time, Be can be shifted 3 bits to the left and then saved in Bb (ie, Bb: = 8 x Be). Adding Ba and Bb gives the desired result: multiplying a number from Be by 10. The process takes 4 cycles for each multiplication, for a total of 32 cycles for the decimal exponent +8. Since read operations require at least nine cycles, this means that the decimal number with the exponent +8 is read in 41 cycles.Since read operations require at least nine cycles, this means that the decimal number with the exponent +8 is read in 41 cycles.Since read operations require at least nine cycles, this means that the decimal number with the exponent +8 is read in 41 cycles.

In the case of a negative exponential, the multiplication by the constant 0.1 is also performed using shifts and addition. This multiplication is a bit more complicated, since the number 0.1 in the binary representation is periodic. The description of the microsequence would lead us too far away from the main topic, so we omit it. (It can be found in the patent application Zuse. [14]).

Display instructions are performed using iterative multiplications or divisions by 10. If the binary exponent in the R1 register is positive, then the number is multiplied by 0.1 as many times as required to make the binary exponent equal to two and while the 4 left digits of the Bf register contain the number from 0 to 9 (0000 and 1001). This is a decimal number that can be displayed in the next column of the output panel. The number is subtracted from the mantissa in Bf and the process continues with the next digit. If the binary exponent in R1 is negative, then the process is similar, except that the multiplication by 10 is performed.

Z3 common architecture

Now consider the detailed diagram of Z3, shown in Fig. 8. Control devices and I / O panels have been previously discussed. Note that the 4 decimal digits of the input keyboard are moved to the Ba register through the Za, Zb, Zc, and Zd switch blocks, which are activated one after the other.

The switch blocks Eg and Ei are used to load some of the constants used in the exponent registers (+13 and -4, which are used to convert the number system). The shifter Ee between the register Af and the register Aa is used in the square root algorithm. The exponent of the result (Aa) becomes equal to half the exponent of the original number (Af).

Ah1 is a two-state switch. When it is at zero, a pair of registers <Af, Bf> is available for loading operands. This switch is reset to zero by the control line ai. The control lines al, aj, bl and bj are used to clear the register Af, Ab, Bf and Bb if necessary.

The block called "zero, infinity" under Ae is an exception handling scheme. They continuously monitor the data bus (the results of operations and data from memory) and, if necessary, set the appropriate exception flags. The shifter under Be is used to shift the mantissa one rank to the right. This is necessary for normalization, in cases when Be> = 2.

The switches Fp and Fq control the amount and direction of the shift in the shifter under the switch blocks Fc and Fa. The switches Fh, Fi, Fk, Fl and Fm perform a similar function on another shifter. These 5 digits represent a number from -16 to 15, which corresponds to the range of the second shifter. After the shift, the number represented by the switches Fh through Fm is moved through the switch block Bn to the register Ab in accordance with the algorithm for changing the exponent of the result. If the number is shifted 10 bits to the left, then +10 is subtracted from the exponent of the result. Such large shifts are needed mainly after calculations.
Take another look at the Z3 machine diagram. Everything in this car looks like a modern floating point microprocessor. It is amazing how Zuse could find such a good architectural solution at the very dawn of computers. The Z3 processor includes a total of 600 relays; the memory goes three times more. In order to optimize the structure without the ability to change the hardware, Zuse was forced to rethink the logical structure of the machine all the time. He did not have the luxury of the unlimited funding allocated by the US Department of Defense to develop ENIAC or IBM at Mark I. He was alone. Despite the fact that it gave him an advantage on the conceptual side, it also brought its drawbacks, given the low impact the Z1 and Z3 machines had on the emerging computer industry in the United States after World War II. [13]

Computer invention

The main disadvantage of Z3 is the complete absence of conditional transitions in a variety of instructions. When a program is stored on a punched tape, a possible way to solve this problem is to introduce several tapes and a mechanism to switch between tapes (as was done in Harvard Mark I). Another way is to use a software counter so that the punched tape can be rewound back and forth on demand.
Sometimes the distinction between Computers and Universal Computers is drawn based on the way programs are stored (external and internal). I believe that this criterion is not appropriate. An external program can work as an interpreter of numeric data. The external program becomes a permanent part of the processor, and the data becomes the program, in the same way that the Universal Turing machine works as an interpreter. I believe that a prerequisite for a Universal Computer is a minimum set of instructions and indirect addressing. [11] Indirect addressing can be simulated by writing a self-modifying program, so the set of instructions becomes the decisive criterion. A machine with enough addressable memory, battery and capable of executing CLR (clear), INC (increment), LOAD, STORE and BZ (branching if zero) instructions is a Universal Computer. In this case, the Z1 was not a full-fledged computer, but so were many other early computers. The ABC machine was a specialized machine for solving systems of linear equations by the Gauss method. Harvard Mark I did not have a conditional transition, although he implemented cycles. ENIAC was not programmable - the data flows between the building blocks were set by hardware. Conditional transitions were available with restrictions, and of course, there could be no talk of self-modifiable programs.

Table 9. Comparison of architectural features

A machine	Memory and CPU are separated.	Conditional transitions	Software or hardware programmable	Self-modifying programs	Indirect addressing
Z1	Yes	not	soft	not	not
Atanasov's car	Yes	not	hard	not	not
Harvard Mark I	not	not	soft	not	not
ENIAC	not	partially	hard	not	not
Machester Mark I	Yes	Yes	soft	not	not

Table 10. Some additional architectural features

A machine	Internal number system	With constant or floating point	Bitwise arithmetic	Architecture	Technology
Z1	binary	floating	not	consistent	mechanical
Atanasov's car	binary	constant	Yes	vectorial	electronic
Harvard Mark I	decimal	constant	not	parallel	electromechanical
ENIAC	decimal	constant	not	stream given	electronic
Machester Mark I	binary	constant	Yes	consistent	electronic

Tables 9 and 10 show the most relevant information about the early computers mentioned above. As it should be clear from the tables, none of the early computers complied with all the conditions of the Universal Computer. I also mentioned the Manchester Mark I, which was built from 1946 to 1948 because, as far as I know, this is the first machine that fully complies with my Universal Computer conditions. The Mark I car was built under the guidance of Frederick Williams (FC Williams) and Tom Kilburn (T. Kilburn). She keeps her program in a random access memory implemented on a cathode ray tube. All the necessary elementary instructions are available (in a modified form), and despite the fact that there is no indirect addressing in it, self-modifiable programs can be implemented in it. The first program was launched in June 1948, which counted on the largest of its own divisors of a large number. In September 1948, Turing received the title of Reader in the mathematics department of the University of Manchester and wrote several programs for the first universal computer in the world. His vision of a universal computer was published in 1936, the same year when the memory device Z1 was completed. Tables 9 and 10 express that the invention of the computer was a collective achievement and covers 12 years.

Thanks

Deciphering schematic documentation was possible only in collaboration with some of my students at universities in Halle and Berlin. I thank Alexander Thurm and Axel Bauer for implementing the gata-level Z3 processor simulator. We realized what a synchronization problem is when the simulator refused to start. I also thank Franz Konieczny, Reimund Spitzer and Roland Schultes, who wrote part of the stand-alone processor simulator. We started working on the Z3 with the help of Konrad Zuz, who willingly answers our questions. We were amazed how, after more than 60 years, the whole Z3 design remains in his head. Unfortunately, Zuse died in December 1995, before the description of his work was completed. This work is dedicated to his memory.

Literature

H. Aiken and G. Hopper, “The Automatic Sequence Controlled Calculator,” reprinted in B. Randell, ed., The Origins of Digital Computers. Berlin: Springer Verlag, 1982, pp. 203-222.
AW Burks and AR Burks, “The ENIAC: First General Purpose Electronic Computer,” Annals of the History of Computing, vol. 3, no. 4, pp. 310-399, 1981.
AW Burks and The First Electronic Computer: The Atanasoff Story. Ann Arbor: Univ. of Michigan Press, 1988.
K.-H. Czauderna, Konrad Zuse, der Weg zu seinem Computer Z3. Munich: Oldenbourg Verlag, 1979.
D. Knuth, The Art of Computer Programming-Seminumerical Algorithms, vol. 2. Reading, Mass .: Addison Wesley, 1981.
I. Koren, Computer Arithmetic Algorithms. Englewood Cliffs, NJ: Prentice Hall, 1993.
SH Lavington, A History of Manchester Computers. Manchester, England: NCC Publications, 1975.
SH Lavington, Early British Computers. Manchester, England: Digital Press, 1980.
B. Randell, ed., The Origins of Digital Computers. Berlin: Springer Verlag, 1982.
R. Rojas, “Who Invented the Computer? The Futty Years Mathematics of Computation, Proceedings of Symposia in Applied Mathematics, AMS, pp. 361-366, 1993.
R. Rojas, “On Computer Systems,” Proc. 13th World Computer Congress, Hamburg, pp. 324-331, 1994.
U. Schweier and D. Saupe, “Funktions und Konstruktions prinzipien der programmgesteuerten mechanischen Rechen maschine Z1,” Arbeitspapiere der GMD 321, Bonn, 1988.
N. Stern, From ENIAC to UNIVAC. Bedford: Digital Press, 1981.
K. Zuse, Patentanmeldung Z-2391, German Patent Office, Berlin, 1941.
K. Zuse, Der Computer mein Lebenswerk. Berlin: Springer-Verlag, 1970.
K. Zuse, personal communication, Mar. 18, 1995.

Source: https://habr.com/ru/post/210412/

All Articles