
Recently on Habr there was an article about designing your own computer, where the author first wanted to build a computer from transistors but then decided to continue with 7400-series chips, because a transistor design seemed too complicated and expensive.
A similar task has interested me for the last 3 years, but I did not give up the initial idea of building from transistors. Now I can share my thoughts and show the current developments, and I also want to ask your opinion about what a decorative transistor computer should be. But I should note right away that a couple of years of work still lie ahead :-)
The main question is: why is all this necessary when there are FPGAs and all sorts of Raspberry Pi boards?
The answer is simple:
1) I am interested in doing this in my free time and
2) A decorative computer ("decorative" describes the attitude to the computer, not its appearance) is like a decorative pet: a pug will not bite off a robber's leg, and a Persian cat will not win the battle for the Metercats. But it is fun to play with them and show them to guests, even if in computing, guarding and hunting they are far inferior to their "combat" counterparts.
Problem Statement and Architecture
What are our requirements for a decorative computer?
- Designed for automated assembly. Soldering 5000 parts by hand is enough to turn your hair gray.
- Performance is not critical; the main thing is to reach 100 thousand or more operations per second (the level of the Radio-86RK). Even 100 thousand op/s is enough for many tasks.
- Practice shows that 64 KiB is a realistic minimum amount of memory. Not much fits in 1-4 KiB, so it is not worth making the address bus narrower than 16 bits.
- Programming in C. You cannot force people to spend their free time on assembly language (though anyone who wants to is welcome to).
- In the base model, only the processor itself is built from transistors. Memory and support circuitry may use chips. A fully transistor version is possible in the future, but that is a further development with its own limitations: more than 1-2 KiB of memory will be difficult.
- In the processor, LEDs (which can be switched off) show the state of the internal registers (IP, accumulator, memory read/write ...)
- A moderate price. One transistor with its supporting parts and automated assembly costs about 2 rubles, so a production design should not use more than 1000 transistors. This means the circuit must be simpler than the i8080 (4500 transistors), the i4004 (2300 transistors) and the MT15 (~3000 transistors), despite the requirement of a 16-bit address bus.
- uClinux? Even without virtual memory, it would be highly desirable to have 24-32 bits. Of course, this would greatly increase interest, but it would require at least doubling the number of transistors (= doubling the price). Also, as the word width grows, the speed of a serial computer drops proportionally.
- Hardware interrupt support is not required: it drags too much along with it (microcode, a hardware stack, etc.), and we can do without it.
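The transistor budget in the list above can be checked with simple arithmetic. The sketch below uses only the figures quoted in the list (about 2 rubles per mounted transistor and the transistor counts of the reference designs); the names `designs` and `build_cost` are illustrative, and the MT15 count is the approximate value given above.

```python
# Cost of one transistor with supporting parts and automated assembly, in rubles
# (figure quoted in the list above).
RUBLES_PER_TRANSISTOR = 2

# Transistor counts quoted in the list above; "target" is the 1000-transistor budget.
designs = {"target": 1000, "i4004": 2300, "MT15": 3000, "i8080": 4500}

def build_cost(transistors, price=RUBLES_PER_TRANSISTOR):
    """Rough cost of the transistor part of a build, in rubles."""
    return transistors * price

costs = {name: build_cost(count) for name, count in designs.items()}
for name, cost in costs.items():
    print(f"{name:>6}: {designs[name]:>5} transistors, about {cost} rubles")
```

Under these numbers the 1000-transistor target costs about 2000 rubles in transistors, while a design the size of the i8080 would cost more than four times as much.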
User interaction: The classic implementation is a keyboard plus output to a TV or VGA monitor. Making a comfortable custom keyboard is too difficult, so a standard PS/2 or USB keyboard is needed. PS/2 keyboards are becoming rare, and USB support in a transistor computer will be difficult without dirty hacks (like a microcontroller).
Probably the optimal and simplest solution is a terminal interface: the computer communicates with the outside world via a serial port (RS-232), and programs can be loaded the same way. That is, in the simplest case the transistor computer is connected to a desktop computer (or a dedicated terminal) through a USB <> COM adapter, and you can work with it in any terminal program (for example, PuTTY).
Also, we need the ability to connect external devices via GPIO pins.
Serial or parallel ALU? 8 or 16 bits?: Since the number of transistors is very limited (<1000), we have to sacrifice performance and carry out all operations serially. This greatly reduces the required number of transistors: in essence, we need 1-bit logic and 16-bit shift registers. But at a clock frequency of 1 MHz (one bit per clock, 16 clocks per word) we get only about 62 thousand operations per second, so a higher frequency is certainly desirable.
Since we need a 16-bit address bus in any case, supporting 8-bit operands would significantly complicate the instruction set and increase the number of registers required. So everything will probably be simple: a 16-bit address bus and a 16-bit machine word. There will be no direct way to work with 8-bit data; anyone who needs only 8 bits will have to use shifts.
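The throughput estimate above is easy to reproduce. This is a minimal sketch that assumes one bit is processed per clock and ignores instruction fetch and memory access, so it gives an upper bound; the function names are illustrative.

```python
def serial_ops_per_second(clock_hz, word_bits, cycles_per_bit=1):
    """Upper bound on word operations per second for a bit-serial ALU.

    Assumes cycles_per_bit clocks per processed bit and no fetch or
    memory overhead, so real throughput will be lower.
    """
    return clock_hz / (word_bits * cycles_per_bit)

def clock_needed(target_ops, word_bits, cycles_per_bit=1):
    """Clock frequency (Hz) needed to reach a target operation rate."""
    return target_ops * word_bits * cycles_per_bit

print(serial_ops_per_second(1_000_000, 16))  # 62500.0
print(clock_needed(100_000, 16))             # 1600000
```

With a 16-bit word, reaching the 100 thousand op/s goal needs a clock of at least 1.6 MHz under these assumptions, and doubling the word width to 32 bits halves the rate at the same clock.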
CPU architecture: Of course, a full-fledged computer needs to be able to write code into its own memory. If that job can be shifted onto the processor's external support circuitry, then a Harvard architecture (with separate code and data memory) will be easier to implement and faster in operation. But if the support circuitry is also built from transistors, code and data will have to share memory. So the choice here depends on the vote at the end of the article :-)
The processor itself will be hard-wired (microcode logic and the microcode itself would take a lot of transistors), with an inevitably very simple instruction set (binary logic plus addition, shifts by 8 and by 1 bit) and a minimal number of registers (1-2), with the ability to use memory as pseudo-registers (as in the 6502). Perhaps, in the simplest case, every instruction will execute according to a single rigid scheme, ax = mem[imm] = mem[imm] op ax, plus a conditional jump bit. This reduces the processor logic to a minimum and puts as much work as possible on the relatively fast RAM.
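To make the single rigid scheme concrete, here is a small behavioural sketch. It is not the author's design: the operation set, the instruction tuple `(op, imm, cond)` and especially the meaning of the conditional bit (skip the next instruction when the result is zero) are assumptions chosen only for illustration.

```python
MASK = 0xFFFF  # 16-bit machine word

# Hypothetical operation set; the article only mentions binary logic,
# addition and shifts.
OPS = {
    "add": lambda m, a: m + a,
    "and": lambda m, a: m & a,
    "or":  lambda m, a: m | a,
    "xor": lambda m, a: m ^ a,
}

def step(mem, ax, pc, program):
    """Execute one instruction of the form ax = mem[imm] = mem[imm] op ax."""
    op, imm, cond = program[pc]
    result = OPS[op](mem[imm], ax) & MASK
    mem[imm] = result   # the result is written back to memory...
    ax = result         # ...and also lands in the accumulator
    # Assumed semantics of the conditional bit: skip next instruction on zero.
    pc += 2 if (cond and result == 0) else 1
    return ax, pc

def run(program, mem, ax=0):
    """Run until the program counter leaves the program; return ax."""
    pc = 0
    while pc < len(program):
        ax, pc = step(mem, ax, pc, program)
    return ax

# mem[0] acts as a zero pseudo-register; mem[10] and mem[11] hold operands.
mem = [0] * 16
mem[10], mem[11] = 5, 7
program = [("and", 0, False),   # ax = mem[0] = 0 & ax  -> clears ax
           ("add", 10, False),  # ax = mem[10] = 5 + 0
           ("add", 11, False)]  # ax = mem[11] = 7 + 5
print(run(program, mem, ax=0x1234))  # 12
```

Even clearing the accumulator is done through a memory cell here, which shows how memory can stand in for registers when every instruction has the same shape.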
There will be no hardware stack and no hardware interrupt handling. Both can be implemented in software: we have more memory than transistors.
Power supply: 3.3 or 5 V? Most older computers use 5 V, while modern electronics has long used 3.3 V for external connections. This computer will also use 3.3 V, but mainly because low-resistance pull-up resistors will then consume about 2.5 times less power. Accordingly, their resistance can be reduced further to increase speed.
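The power saving can be estimated from P = V²/R. The sketch below is an idealized calculation: it assumes the output is pulled all the way to ground (`v_low = 0`), and the 1 kΩ pull-up value is hypothetical, chosen only for illustration.

```python
def pullup_power_mw(vcc, r_ohm, v_low=0.0):
    """Power dissipated in a pull-up resistor while the output is held low, in mW."""
    return (vcc - v_low) ** 2 / r_ohm * 1000

R = 1000.0  # ohms; hypothetical pull-up value
p5 = pullup_power_mw(5.0, R)
p33 = pullup_power_mw(3.3, R)
ratio = p5 / p33

# Pull-up resistance at 3.3 V that burns the same power as R does at 5 V.
r_same_power = R / ratio

print(f"5.0 V: {p5:.2f} mW, 3.3 V: {p33:.2f} mW, ratio {ratio:.2f}")
print(f"equal-power pull-up at 3.3 V: {r_same_power:.0f} ohm")
```

The idealized ratio is (5/3.3)², roughly 2.3, in the same ballpark as the figure quoted above; the exact value depends on the transistor's saturation voltage and the rest of the circuit. For the same power budget the pull-up can be made lower by that same factor, which is what shortens the rise time.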
Transistor processor building blocks
Of course, a processor with fewer than 1000 transistors cannot be built with standard approaches to logic design (even with a serial ALU). Various circuit-level compromises and tricks are needed to reduce the transistor count.
Speed is also an important issue: both in the previous article on Habr and in the MT15, the clock frequency at which the logic blocks could run turned out to be very low. For a serial computer this becomes critical.
Simple logic speed
As it turned out, a few simple tricks greatly speed up logic on bipolar transistors: adding a Schottky diode to keep the transistor out of deep saturation (recovering from which is very slow, up to 200-500 ns), and optionally adding a 25-50 pF capacitor in parallel with the base resistor to recharge the parasitic capacitance of the circuit quickly. And of course, as with any high-speed digital circuit, ceramic decoupling capacitors are needed on the power rails close to the loads, and long digital traces will in some cases require termination.
After applying these tricks we get the following (both optimizations are in the rightmost part of the schematic):

And it works very fast. On the oscilloscope trace (100 ns/division), edges and delays are on the order of 10 ns:

Also, by choosing the resistance of the resistor from the base to ground, it is possible to control the transfer characteristic of the logic so that the threshold voltage is close to that of CMOS, about 3.3 / 2 = 1.65 V. Besides better noise immunity, this gives another important advantage: slow edges at the input become sharper after passing through the logic. Also, if we do not need to save power, we can drop the "upper" transistor and replace it with a resistor. The resulting circuit:

And the transfer characteristic:

The three colored lines are simulations at different temperatures (20, 40 and 60 degrees). The parameters of bipolar transistors drift noticeably with temperature, and this must be taken into account in more complex circuits.
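A rough feel for how the base-to-ground resistor sets the threshold, and how the threshold moves with temperature, can be had from a first-order divider model. Everything in this sketch is an assumption rather than the author's circuit: a series base resistor and a base-to-ground resistor form a divider, the transistor switches when the base reaches about 0.65 V at 20 degrees, that voltage falls by about 2 mV per degree (a common textbook figure for silicon), and base current and the Schottky clamp are ignored. The 1 kΩ series value is hypothetical.

```python
VBE_20C = 0.65       # V, assumed base-emitter turn-on voltage at 20 C
VBE_TEMPCO = -0.002  # V per degree C, typical textbook figure for silicon

def vbe(temp_c):
    """Assumed turn-on voltage of the base-emitter junction at temp_c."""
    return VBE_20C + VBE_TEMPCO * (temp_c - 20)

def threshold(r_series, r_ground, temp_c=20):
    """Input voltage at which the divider lifts the base to vbe(temp_c)."""
    return vbe(temp_c) * (1 + r_series / r_ground)

def ground_resistor_for(v_th, r_series, temp_c=20):
    """Base-to-ground resistance that places the threshold at v_th."""
    return r_series / (v_th / vbe(temp_c) - 1)

R_SERIES = 1000.0  # ohms, hypothetical series base resistor
r_gnd = ground_resistor_for(1.65, R_SERIES)
print(f"base-to-ground resistor: {r_gnd:.0f} ohm")
for t in (20, 40, 60):
    print(f"{t} C: threshold {threshold(R_SERIES, r_gnd, t):.3f} V")
```

In this simplified model the threshold drops by roughly 0.2 V between 20 and 60 degrees, which illustrates why the simulation is repeated across temperature; a SPICE simulation of the real circuit remains the reference.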
More complex logic
T flip-flop: on each clock pulse a T flip-flop toggles its state. It can be used to build a parallel instruction counter, but it probably will not be needed because everything will work serially. Principle of operation: it is a bistable multivibrator that a short negative clock pulse switches to the opposite state, thanks to the capacitors in parallel with resistors R8 and R9.

The circuit was built "in hardware", together with inverters. The inverters showed the expected speed (i.e. ~10-20 ns edges). Do not be put off by the quality of the soldering: the board has gone through many experiments with different transistors and component values:
A full adder is one of the most important and complex digital blocks. The canonical CMOS implementation of a full adder requires 28 transistors:

Modern implementations using transmission gates and various tricks require 8-11 transistors, with stricter requirements on transistor selection. But these circuits cannot be built directly from discrete transistors: they need 4-terminal transistors (which are rare), and because of the degradation of the logic 1 level they require a high supply voltage (the threshold voltage of available discrete field-effect transistors is 1.5-2 V versus 0.5 V for integrated ones).
The smallest design I have come across uses 6 transistors plus capacitors (but its reliability is questionable). Known implementations on bipolar transistors also require 22 transistors.
But can it be done with only 4 transistors? I thought about it for a while, and here is what came out:

Waveforms:

The LTspice IV simulation file can be downloaded here.
The principle of operation is as follows. The order of the addends does not matter, so we simply mix them in analog form, and by choosing the threshold voltage of the double inverter we immediately get the carry. Then, by subtracting the carry from the analog sum with transistor Q3, we get the sum bit. Of course, all this requires precise selection of the switching thresholds and simulation across temperature. The Schottky diodes keep the transistors out of deep saturation, which would drastically reduce speed.
Field-effect transistors can also be used and give better temperature stability; the main requirement is a sufficiently low threshold voltage.
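The logic of this adder can be checked with a purely behavioural model. This sketch says nothing about the real voltages or transistor behaviour: the analog node is modelled as the plain sum of the three inputs in units of one logic-high contribution, and the threshold values 1.5 and 0.5 are illustrative assumptions.

```python
from itertools import product

def analog_full_adder(a, b, cin, carry_threshold=1.5, sum_threshold=0.5):
    """Behavioural model of a threshold-based full adder.

    Returns (sum_bit, carry_bit) for logic inputs 0/1.
    """
    mixed = a + b + cin                       # analog mix; input order is irrelevant
    carry = 1 if mixed > carry_threshold else 0
    remainder = mixed - 2 * carry             # subtract the carry (weight 2) from the mix
    sum_bit = 1 if remainder > sum_threshold else 0
    return sum_bit, carry

def mismatches(**thresholds):
    """Input combinations where the model disagrees with a full adder."""
    bad = []
    for a, b, cin in product((0, 1), repeat=3):
        total = a + b + cin
        if analog_full_adder(a, b, cin, **thresholds) != (total % 2, total // 2):
            bad.append((a, b, cin))
    return bad

print(mismatches())                      # []
print(mismatches(carry_threshold=2.2))   # non-empty: threshold drifted too far
```

The model matches the full-adder truth table as long as the carry threshold stays between 1 and 2 units and the sum threshold between 0 and 1, which is why threshold accuracy and temperature drift matter so much in the real circuit.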
The shift register is the most important part of this transistor computer. The classic implementation on synchronous D flip-flops requires a monstrous number of transistors per bit.
I managed to fit it into 2 transistors per bit, with the following caveats:
1) The registers are based on capacitors, and if the data is not shifted for a while, it will eventually disappear. But with a field-effect transistor the retention time is quite long.
2) Data is passed to the next stage by a bipolar transistor. In half of the cases it operates in inverse (reverse-connected) mode, where the breakdown voltage is much lower (but it should withstand 3.3 V) and the gain is much lower than in the normal connection (but I hope it will be enough).
3) Each successive stage inverts the signal. This is not a problem when only serial access is needed (for example, for processor registers). If a non-inverted parallel output is needed, 8 inverters must be added (so a 16-bit shift register will require 40 transistors, not 32).
4) There is a problem with saturation of the bipolar transistor.
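The inverting behaviour from point 3 and the resulting transistor count can be illustrated with a small behavioural model. It is only a sketch of the bookkeeping: it ignores capacitor leakage and all electrical effects, assumes one transistor per added inverter (consistent with the 40 versus 32 figure above), and the class and method names are illustrative.

```python
class InvertingShiftRegister:
    """Shift register in which every stage stores the inverse of its input."""

    def __init__(self, bits=16):
        self.bits = bits
        self.cells = [0] * bits  # raw stored levels, stage 0 first

    def shift(self, data_in):
        """One clock: each stage takes the inverted value of the previous one."""
        sources = [data_in] + self.cells[:-1]
        self.cells = [1 - s for s in sources]

    def parallel_out(self):
        """True-polarity word, undoing the inversion where it is present."""
        # A bit held in stage i has been inverted i + 1 times;
        # an odd count means that stage needs an output inverter.
        return [c if (i + 1) % 2 == 0 else 1 - c
                for i, c in enumerate(self.cells)]

    def transistor_count(self, parallel_output=False):
        """2 transistors per bit, plus one inverter per odd-inversion stage."""
        inverters = (self.bits + 1) // 2 if parallel_output else 0
        return 2 * self.bits + inverters

reg = InvertingShiftRegister(16)
pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
for bit in pattern:
    reg.shift(bit)

print(reg.cells[-1] == pattern[0])           # serial output has true polarity
print(reg.parallel_out() == pattern[::-1])   # corrected parallel word
print(reg.transistor_count(), reg.transistor_count(parallel_output=True))
```

With an even number of stages the bit leaving the last stage has been inverted an even number of times, so serial use needs no correction; only a parallel read needs the extra inverters on every other stage.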

Waveforms:

With these compact implementations of digital circuits, I think it will be quite possible to stay within 1000 transistors.
That's all for now. A monstrous amount of work still awaits me.
And now a few questions for the readers: which options seem acceptable to you?