Flip-flop metastability and clock domain crossing

Surprisingly, many students who successfully complete a digital electronics course remain unaware of metastability (or consider it so unimportant that they forget it two days after the exam). Meanwhile, device malfunctions caused by metastability are extremely difficult to diagnose. If you recognize yourself in this description, and your work involves digital circuits in any way, I highly recommend reading this text. Spending 10 minutes now may save you many days of debugging in the future.

Recall what a flip-flop is

The classic D flip-flop is the basic unit of information storage in modern digital circuits. At a high level of abstraction, it is a memory element with a data input D, a data output Q, and a clock input (clk). All signals here are logical: each is either logical 0 (hereinafter “0”) or logical 1 (hereinafter “1”). The absolute voltage levels do not interest us yet. On the rising edge of the clock, i.e. the transition from “0” to “1”, the flip-flop captures the current value at data input D and drives it onto output Q. At all other times, the flip-flop keeps its old value. The flip-flop may also have other control inputs, in particular an asynchronous reset aclr, which sets the flip-flop to “0” regardless of the other inputs.
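The behavior described above can be sketched as a small behavioral model. This is only an illustration in Python, not HDL; the class name `DFlipFlop` and the method `update` are made up for this example.

```python
class DFlipFlop:
    """Behavioral model of a rising-edge D flip-flop with asynchronous clear."""

    def __init__(self):
        self.q = 0          # stored value, visible on output Q
        self._prev_clk = 0  # previous clock level, used to detect a rising edge

    def update(self, clk, d, aclr=0):
        """Apply the current input levels and return the resulting Q."""
        if aclr:
            # Asynchronous clear overrides the clock and data inputs.
            self.q = 0
        elif self._prev_clk == 0 and clk == 1:
            # Rising edge of the clock: capture D.
            self.q = d
        # In every other case the old value is kept.
        self._prev_clk = clk
        return self.q


ff = DFlipFlop()
trace = [
    ff.update(clk=0, d=1),          # no edge yet, Q stays 0
    ff.update(clk=1, d=1),          # rising edge, Q captures 1
    ff.update(clk=1, d=0),          # clock is high but there is no edge, Q holds 1
    ff.update(clk=0, d=0),          # falling edge, Q holds 1
    ff.update(clk=0, d=0, aclr=1),  # asynchronous clear, Q goes to 0
]
print(trace)  # [0, 1, 1, 1, 0]
```

Note that this ideal model has no notion of timing at all, which is exactly why it cannot misbehave the way a real flip-flop can.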

Any flip-flop has two timing parameters (requirements) that must be met for it to work correctly:

Ts (setup time): the time during which the signal at input D must remain stable before the clock edge arrives;
Th (hold time): the time during which the signal at input D must remain stable after the clock edge arrives.

The sum of these two parameters gives a time window around the clock edge during which the input data must be stable. Other inputs have similar parameters; for the asynchronous reset input they are called recovery and removal time. Violating these requirements leaves the flip-flop in an undefined state. In the simplest case, the output will be either the “old” or the “new” value that arrived at data input D in the immediate vicinity of the clock edge, but which one is unknown. In many situations you can live with that. In the next picture, the outputs Q (1) and Q (2) are two possible outcomes of the flip-flop switching when the input timing parameters are violated.
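To make the window concrete, here is a minimal sketch of a check for whether a change on D falls inside the forbidden interval around a clock edge. The function name and the numbers (Ts = 100 ps, Th = 50 ps) are assumptions chosen only for illustration; real values come from the device datasheet or the timing library.

```python
def violates_window(t_edge, t_change, t_setup, t_hold):
    """Return True if D changes inside the window that starts t_setup
    before the clock edge and ends t_hold after it (all times in ps)."""
    return (t_edge - t_setup) < t_change < (t_edge + t_hold)


T_SETUP_PS = 100   # assumed setup time
T_HOLD_PS = 50     # assumed hold time
EDGE_PS = 10_000   # moment of the clock edge

# Moments at which D changes, relative to the same time origin.
changes_ps = [9_800, 9_950, 10_020, 10_100]
results = [violates_window(EDGE_PS, t, T_SETUP_PS, T_HOLD_PS) for t in changes_ps]
print(results)  # [False, True, True, False]
```

The second change violates the setup requirement and the third violates the hold requirement; the total width of the window is Ts + Th = 150 ps.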

Coin stuck in the air

However, sometimes when the timing parameters are violated, the circuit leaves the boundaries of the digital world: instead of settling into a state that clearly corresponds to “0” or “1”, the flip-flop output hangs in an intermediate state, at a voltage roughly halfway between “0” and “1”. A rough analogy from the "mechanical world" is in the header image. In a stable position, the ball rests on the left or on the right. To move it to the other state, it needs to be pushed. If you push hard enough, it will pass over the top without any problems and end up on the other side (normal flip-flop switching). If you push too weakly, it will immediately roll back. If you are a lucky person, there is a chance that the push will leave the ball balanced at the top.

This effect is called metastability. In the picture above, this case is labeled Q (3). Eventually the flip-flop will “fall” into one of the defined states, but it is impossible to say for sure when this will happen. The time the flip-flop spends in a metastable state is a random quantity that depends on the process technology, temperature, etc. On an oscilloscope it looks like this (the blue trace is the input signal that violates the flip-flop's timing parameters, the pink trace is the flip-flop output):

Why is metastability bad? Firstly, the state itself is undefined and produces an unknown result in downstream elements. Moreover, the output of a single flip-flop often fans out to several other elements at once, which, because of manufacturing variation, may have slightly different switching thresholds. So the same signal can be interpreted as "1" in one place and as "0" in another. This can put different parts of the device out of sync and cause a general failure (i.e. a situation from which the device cannot recover). Furthermore, if the metastable state lasts until the next clock edge arrives, it can push the next flip-flop in the chain into a metastable state as well. The probability of this is low, because the input voltage range at which a flip-flop becomes metastable is extremely narrow. However, since the situation recurs at a high rate (modern circuits can operate at frequencies up to 1 GHz or even higher), such an event is quite real. Its likelihood can be estimated from the device parameters, the clock frequency, and the data rate. The general formula for estimating the mean time between failures is as follows:

$$ MTBF = \frac{e^{t_R / \tau}}{T_0 \cdot f_c \cdot f_D} $$

where
t_R is the time interval available for "protection" against metastability, i.e. for the metastable state to resolve. For a rough estimate, it can be taken as 0 for a single flip-flop, as the clock period minus the setup time and the propagation delay for the second flip-flop, and as one more clock period for each subsequent flip-flop;
\tau is a technology-dependent parameter; for modern devices it is units to tens of picoseconds;
T_0 is the time window in which metastability can be triggered; tens to hundreds of picoseconds;
f_c is the clock frequency; hundreds of megahertz;
f_D is the frequency of data changes, in the range from 0 to f_c.
For modern devices, the mean time between entries into a metastable state is measured in microseconds to milliseconds for the first flip-flop in the chain, in hours for the second, and in billions to trillions of years for the third.
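A quick numeric sketch shows how strongly each additional flip-flop helps. It assumes the commonly used form MTBF = e^(t_R / tau) / (T_0 * f_c * f_D) and the rough rule for t_R given above. All parameter values below are invented for illustration and do not describe any particular device, so the absolute numbers should not be taken seriously; only the growth from stage to stage matters.

```python
import math


def mtbf_seconds(t_r, tau, t0, f_c, f_d):
    """Mean time between metastability failures, in seconds."""
    return math.exp(t_r / tau) / (t0 * f_c * f_d)


# Illustrative assumptions, not datasheet values.
TAU = 50e-12        # resolution time constant, s
T0 = 100e-12        # metastability window, s
F_C = 400e6         # clock frequency, Hz
F_D = 40e6          # data toggle rate, Hz
OVERHEAD = 1.3e-9   # setup time plus propagation delay, s
PERIOD = 1.0 / F_C  # clock period, s


def resolution_time(stage):
    """t_R for the given flip-flop in the chain (1-based), per the rough rule."""
    if stage == 1:
        return 0.0
    return (PERIOD - OVERHEAD) + (stage - 2) * PERIOD


stages = {n: mtbf_seconds(resolution_time(n), TAU, T0, F_C, F_D) for n in (1, 2, 3)}
for n, value in stages.items():
    print(f"flip-flop {n}: MTBF ~ {value:.3g} s")
```

Because t_R sits in the exponent, every extra clock period of settling time multiplies the MTBF by e^(T / tau), where T is the clock period. That is why going from one flip-flop to two or three changes the result by many orders of magnitude.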

The effect of metastability can occur in several classic situations:

A direct violation of the flip-flop's Ts and Th parameters. It usually occurs when you try to run the device at a frequency at which it physically cannot work because the signal propagation paths are too long. This situation is checked by the timing analyzer and should not occur in normal operation (when the timing requirements are met).
Violation of the timing parameters on asynchronous reset inputs. Surprisingly, even though the reset is asynchronous, it must be synchronized to the clock of the flip-flop it resets, i.e. its timing parameters are specified relative to the clock edge. This is often forgotten, and the reset is driven from anywhere. Again, if the reset signal is generated from the same clock as the flip-flop being reset, the timing analyzer will take care of it by itself.
Signals sent synchronously (at the same frequency with a known delay) from other devices. Here the simplest timing analysis (i.e. declaring only the clock frequency) does not work, and you must explicitly specify the timing characteristics of your input/output signals. This is a separate, challenging (but quite solvable) timing analysis task.
And finally, the worst case: signals transmitted asynchronously, possibly at a completely different clock frequency (and with unknown phase shifts) relative to the receiving flip-flops. Their source can be either external equipment (even a button on the board or a serial port) or blocks of the same chip running at a different clock frequency. This is the situation discussed in the rest of the article.

But first, a few words about where multiple clock signals come from. Of course, the easiest and most pleasant option is a single clock signal; then the project is easy and convenient to work with. Unfortunately, this is only possible for very simple devices. Any reasonably complex device usually communicates with several external devices or interfaces, each of which can have its own clock. For example, suppose we need to receive data from an ADC, buffer it in external dynamic memory, and then send it over Ethernet. In this case, we will have at least 3 unrelated clock domains: the ADC interface, the system bus, and the DDR memory, as well as the clocks of the Ethernet interface. So we face the task of correctly passing data (as well as parameters, control signals, status flags, etc.) between these clock domains.

How to fight

To begin with, the bad news: the effect of metastability cannot be eliminated completely, so there is always a chance that the nearest nuclear power plant will suddenly explode or your plane will suddenly fall. However, a few simple measures can reduce this probability dramatically, to about the likelihood that Archangel Raphael will sound his trumpet tomorrow and announce Judgment Day. The general tactic is clear: place several flip-flops in a row (as the formula above shows, two synchronizing flip-flops are enough for realists), after which the data can be used. However, there are a number of technical nuances that depend on the type of data being transferred.

The simplest case is a pseudo-static scalar (single-bit) signal, for example a mode selection signal that switches extremely rarely. Here our only concern is protection against metastability: we place two flip-flops clocked by the receiving clock, and the job is done. The signal reaches the receiver with an uncertainty of one clock cycle, which does not bother us.
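The structure of such a synchronizer can be sketched as follows. This is a purely logical Python model with made-up names (`TwoFlopSynchronizer`, `clock_edge`); it cannot reproduce metastability itself and only shows the two-stage pipeline and the latency it adds. In a real design this would be two registers in HDL.

```python
class TwoFlopSynchronizer:
    """Two flip-flops in series, both clocked by the receiving clock."""

    def __init__(self):
        self.ff1 = 0  # first stage: the only one that may go metastable
        self.ff2 = 0  # second stage: the output the rest of the logic uses

    def clock_edge(self, async_in):
        """Model one rising edge of the receiving clock."""
        # Both stages sample at the same instant, so ff2 takes the OLD ff1.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


sync = TwoFlopSynchronizer()
samples = [0, 0, 1, 1, 1, 1]  # input level seen at consecutive clock edges
out = [sync.clock_edge(s) for s in samples]
print(out)  # [0, 0, 0, 1, 1, 1]
```

The first stage gets a full clock period to settle before the second stage samples it. If the first stage resolves to the "old" value on the edge where the input changed, the new value simply appears one cycle later, which is the one-cycle uncertainty mentioned above.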

A signal with 1-cycle resolution, i.e. a pulse one clock cycle long. The difficulty is that if the frequencies do not match, there is a high probability of either losing the pulse completely (if the receiving frequency is lower than the source frequency) or stretching it over several cycles, which is also undesirable.

The general approach in this case is to “lengthen” the signal to several cycles, transfer it to the new clock domain in the known manner, and detect its edge there, regenerating a pulse exactly 1 clock cycle long.
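Here is a rough model of that scheme, using integer time units and invented numbers: the source clock period is 2 units and the receiving clock period is 5 units. The helper names `make_pulse` and `receive` are hypothetical. The receiver is a two-flop synchronizer followed by a third flip-flop for edge detection.

```python
def make_pulse(start, width):
    """Return a function giving the signal level at time t:
    1 from start (inclusive) to start + width (exclusive), else 0."""
    return lambda t: 1 if start <= t < start + width else 0


def receive(level_at, rx_period, n_edges):
    """Sample level_at(t) on receiving clock edges and return the
    regenerated pulse train, one value per receiving clock cycle."""
    ff1 = ff2 = ff3 = 0
    pulses = []
    for k in range(n_edges):
        t = k * rx_period
        # All three flip-flops update on the same edge from the old values.
        ff3, ff2, ff1 = ff2, ff1, level_at(t)
        # Rising edge of the synchronized signal: high now, low one cycle ago.
        pulses.append(int(ff2 == 1 and ff3 == 0))
    return pulses


RX_PERIOD = 5                            # receiving clock period (assumed)
short = make_pulse(start=6, width=2)     # one source cycle long
stretched = make_pulse(start=6, width=8) # stretched to four source cycles

print(receive(short, RX_PERIOD, 8))      # [0, 0, 0, 0, 0, 0, 0, 0]
print(receive(stretched, RX_PERIOD, 8))  # [0, 0, 0, 1, 0, 0, 0, 0]
```

The short pulse falls between two receiving clock edges and is lost. The stretched one is longer than the receiving clock period, so at least one edge is guaranteed to see it, and the edge detector turns it back into a single 1-cycle pulse.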

A multi-bit bus that carries a single pseudo-static value (for example, a configuration parameter). It would seem that everything is simple: put a pair of flip-flops on each bit of the bus, and what else do you need? However, we know that after leaving the metastable state, a flip-flop can take either the old or the new value. Moreover, even without entering metastability, simply because of slightly different parameters, the flip-flop of one bit can catch the new value while the flip-flop of another catches the old one.

In this case, for at least 1 clock cycle, an incorrect value will be present at the bus output, equal to neither the old nor the new input value. For some circuits this may not matter, but for others it may be important. Therefore, it is worth building a more reliable scheme with a confirmation (handshake) signal that indicates the moment at which the input data can safely be copied to the output.
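How bad can the intermediate value be? A small enumeration makes this visible. It assumes that every bit independently ends up with either its old or its new value for one cycle, which is exactly the situation described above; the function name `possible_captures` is made up for this sketch.

```python
from itertools import product


def possible_captures(old, new, width):
    """All values a per-bit synchronized bus may show for one cycle
    when each bit independently resolves to its old or its new value."""
    results = set()
    for choice in product((0, 1), repeat=width):
        value = 0
        for bit, take_new in enumerate(choice):
            source = new if take_new else old
            value |= source & (1 << bit)
        results.add(value)
    return results


# Changing a 4-bit parameter from 7 (0111) to 8 (1000) flips every bit.
seen = possible_captures(old=0b0111, new=0b1000, width=4)
print(sorted(seen))  # every value from 0 to 15

# Changing from 4 (100) to 5 (101) flips a single bit.
print(sorted(possible_captures(old=0b100, new=0b101, width=3)))  # [4, 5]
```

When all bits change at once, the receiver may briefly see any value at all. When only one bit changes, the result is always either the old or the new value, so the problem exists only for transitions that flip several bits.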

And finally, the most difficult case is a stream of data passed from one clock domain to another. If the data is relatively sparse (1 data word per 10 or more clock cycles), you can implement a handshake scheme similar to the one in the previous paragraph. But what if data arrives on every clock cycle (or close to that)? Here a special hardware block comes to the rescue: a dual-port memory with independent clocks. Such memory is available both in modern FPGAs and in technology libraries for custom chips. It can be written and read completely independently through two separate ports. The only restriction is simultaneous write and read access to the same memory address, which can lead to an undefined result. A FIFO module is often built on top of such a memory block: it is written from one clock domain and read from another. The FIFO logic ensures that the same memory cell is never accessed from both sides at once.
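The bookkeeping that keeps the two ports apart can be sketched in a few lines. This is a simplified single-threaded Python model with invented names. It deliberately leaves out the hard part of a real dual-clock FIFO, namely passing the read and write pointers safely between the two clock domains, and shows only how the full and empty flags prevent both sides from touching the same cell.

```python
class Fifo:
    """Ring buffer with separate write and read counters."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth
        self.wr = 0  # total number of words written (write side)
        self.rd = 0  # total number of words read (read side)

    def full(self):
        return self.wr - self.rd == self.depth

    def empty(self):
        return self.wr == self.rd

    def write(self, word):
        """Store a word; return False if the FIFO is full."""
        if self.full():
            return False
        self.mem[self.wr % self.depth] = word
        self.wr += 1
        return True

    def read(self):
        """Return the oldest word, or None if the FIFO is empty."""
        if self.empty():
            return None
        word = self.mem[self.rd % self.depth]
        self.rd += 1
        return word


fifo = Fifo(depth=4)
accepted = [fifo.write(w) for w in range(6)]  # the last two writes are refused
first = [fifo.read(), fifo.read()]
fifo.write(9)
rest = [fifo.read() for _ in range(4)]        # the last read finds it empty
print(accepted, first, rest)
```

The write address and the read address coincide only when the FIFO is empty or full. In the first case reads are blocked, in the second writes are blocked, so the memory never sees a write and a read to the same cell at the same time.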

Conclusion

Finally, I want to summarize and give some useful tips:

The fewer clock signals, the easier it is to work with your project.
The basic synchronizer is two (or more) flip-flops in series.
A data transfer mechanism should be thought out for every pair of asynchronous clock domains. It is best to draw the clock structure of the project at the system design stage.
All external signals asynchronous to your device, including asynchronous reset signals, must be synchronized.
Inside one module it is better to use only one clock.
If there is more than one clock within a module, then for each internal signal you must clearly understand how it is generated and where it is used. It is safest to encode the clock domain right in the signal name.
Be especially careful at the top level of the project: this is where a signal from one clock domain gets connected to another by mistake, without resynchronization.
Clock signals generated by internal frequency synthesis blocks (PLL / DCM / MMCM, etc.) are usually synchronous to each other; the timing analyzer takes them into account, and data passing between them most often does not need resynchronization. Although it does not hurt.
Modern design tools try to detect transfers between clock domains on their own, for example Design Assistant from Altera (built into Quartus) and ALINT from Aldec. It is worth heeding the warnings and advice they issue.

And finally, a few notes on technical matters.
The timing diagrams were drawn with the simple free program TimingEditor. The circuits were built in Quartus, solely because of the convenience of its schematic editor. All of this can (and should!) be implemented in pure HDL (VHDL / Verilog). If you want a deeper understanding of the physics of metastability, you can explore the topic using the keywords "metastability" and "clock domain crossing". For example, the fairly detailed study "Synchronization and Metastability" by Steve Golson and the shorter "Metastability and Synchronizers" by Ran Ginosar, as well as many others, are available on the Internet.

Source: https://habr.com/ru/post/254869/
