Once upon a time we lived without a care... One day, early in the morning, I came to work and learned that the server room had only one power supply left, and it could go off at any moment. With nothing to do all day, I decided to write an article for Habr. The article is aimed at beginners and the idly curious.
CMOS technology has reached the point where modern chips are huge, very complex structures: systems assembled from systems. At the same time, the cost of putting a chip into production grows exponentially as process nodes shrink. Therefore, everything has to be modeled and verified as thoroughly as possible during development. The ideal case, which is sometimes even achieved in practice, is a chip that works from the very first run.
Since we live in an analog world, even a digital chip must be able to communicate with it. Digital chips often contain dozens of large analog blocks, such as ADCs, DACs, PLLs, secondary power supplies, etc. Probably the only exceptions are large processors such as Core i, where all of this circuitry lives in the chipset.
Traditionally, analog blocks are simulated with SPICE simulators such as PSpice, MMSIM, HSPICE, etc. In such simulators the circuit is described by a system of differential equations of enormous dimension (or by the matrix representing it). At every simulation step, the SPICE simulator solves this system numerically. Of course, various methods are used to speed these calculations up: partitioning the matrix into submatrices, parallelizing across threads and processor cores, variable time steps, etc.
Unfortunately, numerical methods are fundamentally iterative and parallelize poorly, so this kind of simulation remains too slow to simulate the system as a whole. Nevertheless, it is widely used to design the analog blocks and circuits themselves. This article, however, is about digital chips (as a whole) that contain analog blocks, and about mixed-signal systems, where we would like to describe our blocks with formulas and equations and solve these Navier-Stokes equations (joke) analytically. This technique does not replace the much more accurate SPICE simulation; it complements it, speeding up development and modeling.
A floating-point type is well suited to representing analog signals. In SystemVerilog these are the types shortreal (equivalent to float in C) and real (equivalent to double). Note that these are variable types with memory: their value changes only at the moment of assignment. In this sense they are similar to reg, except that the stored value is not 0 or 1 but a voltage or current, represented in turn as a floating-point number.
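To make the difference between shortreal and real concrete: in DPI-C, shortreal corresponds to C float and real to C double. The small C sketch below (the function names are hypothetical) checks whether a 10 nV offset added to a 1 V level survives in each format. With float's roughly 7 significant digits it is rounded away, which can be a reason to prefer real for small analog signals.

```c
/* shortreal ~ C float, real ~ C double (DPI-C type mapping).
   Each function returns nonzero if adding `offset` to `level`
   changes the stored value. */
int offset_survives_float(float level, float offset) {
    volatile float sum = level + offset;   /* volatile forces rounding to float */
    return sum != level;
}

int offset_survives_double(double level, double offset) {
    volatile double sum = level + offset;  /* rounded to double */
    return sum != level;
}
```

For example, `offset_survives_float(1.0f, 1e-8f)` returns 0, while `offset_survives_double(1.0, 1e-8)` returns 1.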
What we really need, though, is a type similar to wire, one that is updated continuously rather than only when it is written. I have to say that SystemVerilog has no such type. Rumor has it that adding this functionality was discussed during standardization, but nothing concrete came of it. However, the ncsim simulator supports the var modifier, which turns real (and other types) into an analog of wire. Example:
real a;
var real b;
assign a = in1+in2; // will not work: a is not declared with var
assign b = in1+in2; // works: b always equals in1+in2
A Verilog program is a parallel program. Its statements are, in principle, independent and execute either sequentially or concurrently, depending on the context. Here the assign takes effect as soon as the simulation starts and stays in effect until the very end, computing the sum continuously.
If your simulator does not support var, you can do this:
real b;
always @( * ) // triggers whenever in1 or in2 changes
  b <= in1+in2;
This notation is less convenient, but it works.
Verilog has the following built-in system functions for data conversion:
$itor()       // integer to real
$rtoi()       // real to integer
$bitstoreal() // reg [63:0] to real
$realtobits() // real to reg [63:0]
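To show what these conversions do at the bit level, here is a C sketch with hypothetical helper names. $realtobits and $bitstoreal reinterpret the 64 bits of an IEEE 754 double without changing them. $rtoi converts by truncating, for example 123.45 becomes 123, and C's double-to-int conversion likewise truncates toward zero.

```c
#include <stdint.h>
#include <string.h>

/* Like $realtobits: reinterpret the 64-bit IEEE 754 double as an integer. */
uint64_t realtobits(double r) {
    uint64_t bits;
    memcpy(&bits, &r, sizeof bits);   /* well-defined type punning in C */
    return bits;
}

/* Like $bitstoreal: the inverse reinterpretation. */
double bitstoreal(uint64_t bits) {
    double r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Like $rtoi: C's conversion truncates toward zero. */
int rtoi(double r) {
    return (int)r;
}

/* Like $itor. */
double itor(int i) {
    return (double)i;
}
```

A real-valued port can therefore travel through a 64-bit vector and back unchanged: `bitstoreal(realtobits(x)) == x`.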
If the code you want to convert to real is signed and represented in two's complement, be careful with these functions: you may need to convert it or sign-extend it first. If for some reason you do not want to use these functions, you can use the following technique:
reg [7:0] code;
int a;
real voltage;
always @( * ) begin
  a = {{24{code[7]}}, code[7:0]}; // sign-extend the 8-bit code to a 32-bit int
  voltage = a;                    // implicit int-to-real conversion
end
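The same sign extension can be written in C, for example in a DPI helper. This is a sketch: the function names and the LSB scaling in `code_to_voltage` are hypothetical additions, since the Verilog above leaves the value in LSB units.

```c
#include <stdint.h>

/* Mirrors a = {{24{code[7]}}, code[7:0]}: if bit 7 is set, the value is negative. */
int32_t sign_extend8(uint8_t code) {
    return (code & 0x80u) ? (int32_t)code - 256 : (int32_t)code;
}

/* Hypothetical scaling: convert the signed code to volts for a given LSB size. */
double code_to_voltage(uint8_t code, double lsb_volts) {
    return sign_extend8(code) * lsb_volts;
}
```

For instance, 0xFF maps to -1 and 0x80 to -128, just as the replication of code[7] does.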
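Here are a few examples of such models. An amplifier with gain k and additive Gaussian noise:
module amp(input var real in, output real out);
  parameter k = 10;             // gain
  parameter noise_power = -20;  // noise power, dB
  integer seed = 60;            // $dist_normal needs a variable seed, not a parameter
  real noise;
  always @(*) begin
    // 10.0 and /10.0 force real arithmetic: integer 10**(-2) would evaluate to 0
    noise = $sqrt(10.0**(noise_power/10.0)) * $itor($dist_normal(seed, 0, 100_000))/100_000;
    out = in * k + noise;
  end
endmodule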
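A DAC with a first-order low-pass filter on its output:
`timescale 1ns / 1ps
module DAC(input signed [7:0] DAC_code, output real out);
  parameter fs = 10e-9;         // DAC update period, s
  parameter ffilt = fs/64;      // filter time step, s
  parameter CUTOFF = 100e6;     // filter cutoff frequency, Hz
  parameter a = ffilt/(ffilt+(1/(2*3.141592*CUTOFF))); // low-pass coefficient dt/(dt+RC)
  real DAC_out;                 // unfiltered DAC output
  always @( * ) DAC_out <= DAC_code; // signed code -> real, in LSB units
  // the step must equal the dt used in a; s -> ns because of the 1ns time unit
  always #(ffilt*1e9) out <= a*DAC_out + (1-a)*out;
endmodule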
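The output filter of the DAC module is a discrete first-order low-pass (exponential smoothing): a = dt/(dt + RC) with RC = 1/(2π·CUTOFF), followed by y = a·x + (1 - a)·y at every step. The C sketch below uses hypothetical names and assumes, as the numbers suggest, that fs = 10e-9 is an update period in seconds, so dt = 10e-9/64.

```c
/* First-order low-pass coefficient: a = dt / (dt + RC), RC = 1 / (2*pi*fc). */
double lowpass_alpha(double dt, double cutoff_hz) {
    const double pi = 3.141592653589793;
    double rc = 1.0 / (2.0 * pi * cutoff_hz);
    return dt / (dt + rc);
}

/* One filter step: y_new = a*x + (1 - a)*y. */
double lowpass_step(double y, double x, double a) {
    return a * x + (1.0 - a) * y;
}
```

With dt = 10e-9/64 and a 100 MHz cutoff, a is about 0.089. A unit step at the input is within 1e-6 of its final value after 200 filter steps, that is, after about 31 ns.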
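An ADC whose transfer characteristic (including DNL) is read from a file; on every clock edge it outputs the code of the nearest level:
module ADC (input real in, input clk, output reg [7:0] ADC_code);
  real adc_tf[0:255];   // 256 transfer-function levels, DNL included
  real min_dist, delta;
  int i, j, best;
  int dnl_file;
  initial begin
    dnl_file = $fopen("DNL_file", "r");
    if (dnl_file == 0) $stop;
    for (i = 0; i < 256; i = i + 1)
      $fscanf(dnl_file, "%f;", adc_tf[i]); // read the levels from the file
    $fclose(dnl_file);
  end
  always @(posedge clk) begin
    min_dist = 10;
    best = 0;
    for (j = 0; j < 256; j = j + 1) begin // find the nearest level
      delta = in - adc_tf[j];
      if (delta < 0) delta = -delta;      // absolute value
      if (delta < min_dist) begin
        min_dist = delta;
        best = j;
      end
    end
    ADC_code <= best;
  end
endmodule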
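The core of the ADC module is a nearest-level search over the transfer table. Here it is as a C sketch with hypothetical names. Starting from the distance to the first level, instead of a fixed 10, removes any assumption about the input range. `adc_fill_ideal` is a hypothetical helper that builds an ideal, DNL-free table for testing.

```c
/* Returns the index of the table level closest to `in` (ties keep the lower code). */
int adc_convert(const double *levels, int n, double in) {
    int best = 0;
    double best_dist = in - levels[0];
    if (best_dist < 0) best_dist = -best_dist;
    for (int j = 1; j < n; j++) {
        double d = in - levels[j];
        if (d < 0) d = -d;             /* absolute distance */
        if (d < best_dist) {
            best_dist = d;
            best = j;
        }
    }
    return best;
}

/* Hypothetical helper: an ideal table with a uniform LSB. */
void adc_fill_ideal(double *levels, int n, double lsb) {
    for (int i = 0; i < n; i++) levels[i] = i * lsb;
}
```

With a 10 mV LSB, an input of 0.123 V maps to code 12, and out-of-range inputs clamp to codes 0 and 255.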
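A multiphase PLL that outputs one of 64 clock phases, selected by the phase input:
`timescale 1ns / 1ps
module MPLL (input en, input [5:0] phase, output clk_out);
  parameter REFERENCE_CLOCK_PERIOD = 10e-6; // clock period, s
  parameter PHASES_NUMBER = 64;
  // half of the taps high, half low: one clock period spread over 64 taps
  reg [PHASES_NUMBER-1:0] PLL_phase = 64'h00000000_FFFFFFFF;
  // rotate by one tap every period/64 (delay converted from s to ns)
  always #(REFERENCE_CLOCK_PERIOD*1e9/PHASES_NUMBER)
    if (en === 1)
      PLL_phase[PHASES_NUMBER-1:0] <= {PLL_phase[PHASES_NUMBER-2:0], PLL_phase[PHASES_NUMBER-1]};
  // every tap carries the same clock, shifted by 1/64 of the period
  assign clk_out = PLL_phase[phase];
endmodule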
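The MPLL module is a 64-bit ring of 32 ones and 32 zeros rotated by one position per time step. The C sketch below, with hypothetical names, reproduces the rotation {x[62:0], x[63]} and the tap selection. After p steps, tap p holds what tap 0 held at the start, so tap p carries the same 50% duty-cycle clock delayed by p/64 of the period.

```c
#include <stdint.h>

/* Rotate the 64-tap phase register left by one: {x[62:0], x[63]}. */
uint64_t pll_rotate(uint64_t x) {
    return (x << 1) | (x >> 63);
}

/* clk_out = PLL_phase[phase]; the phase index is limited to 0..63. */
int pll_tap(uint64_t x, unsigned phase) {
    return (int)((x >> (phase & 63u)) & 1u);
}
```

Observing any single tap over 64 steps gives 32 high and 32 low samples, and after 64 steps the register returns to its initial value.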
Using such models, and similar but more complex analytical ones, speeds up simulation by orders of magnitude compared with SPICE and makes it possible to actually simulate and verify the complete system in SystemVerilog.
Unfortunately, modern systems are already so complex that even this acceleration is not enough, and you have to resort to parallelization. As far as I know, multi-threaded Verilog simulators have not been invented yet, so you have to do it by hand.
SystemVerilog introduced a new mechanism for calling external program code: the Direct Programming Interface (DPI). Since it is simpler than the other two (PLI and VPI), we will use it.
At the beginning of the module where we want to call an external function, we insert an import declaration:
import "DPI-C" function int some_funct (input string file_name, input int in, output real out);
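On the C side, the import above corresponds to a plain C function. The DPI-C type mapping turns `input string` into `const char *`, `int` into `int`, and `output real` into `double *`. Because only these basic types are used, the sketch does not need the simulator's svdpi.h. The function body is hypothetical: it scales the integer input to volts at 1 mV per LSB and does not actually open `file_name`.

```c
#include <stddef.h>

/* C side of:
   import "DPI-C" function int some_funct(input string file_name,
                                          input int in, output real out);
   Hypothetical body: 1 LSB = 1 mV; returns 0 on success, -1 on bad arguments. */
int some_funct(const char *file_name, int in, double *out) {
    if (file_name == NULL || out == NULL)
        return -1;
    *out = in / 1000.0;   /* assumed scaling, for illustration only */
    return 0;
}
```

Called from Verilog with in1 = 250, this sketch would write 0.25 to out1 and return 0.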
Then you can use it in Verilog in the usual way, for example, like this:
always @(posedge clk) res1 <= some_funct("file.name", in1, out1);
How to compile the C code and where to put the resulting libraries is described in your simulator's documentation.
Below is an example of a program running in several threads.
#include <pthread.h>

#define POOL_SIZE 12 // number of worker threads

typedef struct {
  // work specific
  double in;   // input data
  double out;  // result
  // …
  // thread specific
  char processing; // 1 while the worker has a job
  pthread_mutex_t mutex;
  pthread_cond_t cond_start;
  pthread_cond_t cond_finish;
  void *next_th_params;
  pthread_t tid;
} th_params;

static th_params th_pool[POOL_SIZE];
Calculation function:
void* worker_thread(void *x_void_ptr)
{
  th_params *x_ptr = (th_params *)x_void_ptr;
  while(1) // the thread runs forever
  {
    pthread_mutex_lock(&x_ptr->mutex);
    x_ptr->processing = 0;                    // previous job done (or thread just started)
    pthread_cond_signal(&x_ptr->cond_finish); // tell the dispatcher the result is ready
    while(x_ptr->processing == 0)             // wait for the next job
      pthread_cond_wait(&x_ptr->cond_start, &x_ptr->mutex);
    x_ptr->processing = 1;                    // job accepted
    pthread_mutex_unlock(&x_ptr->mutex);
    // the calculation itself goes here, e.g. vectorized with SSE2 …
  }
}
The function that starts the worker threads:
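void init(th_params *tp) // call as init(th_pool)
{
  int i;
  for(i=0;i<POOL_SIZE;i++)
  {
    pthread_attr_t attr;
    pthread_mutex_init(&tp[i].mutex, NULL);
    pthread_cond_init(&tp[i].cond_start, NULL);
    pthread_cond_init(&tp[i].cond_finish, NULL);
    tp[i].processing = 1; // busy until the worker reports that it is ready
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create(&tp[i].tid, &attr, &worker_thread, &tp[i]); // own parameter block per thread
    pthread_attr_destroy(&attr);
  }
}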
The function that hands out work to the worker threads (this is the one we call from Verilog over and over):
int ch(double in, double *out)
{
  int i;
  for(i=0;i<POOL_SIZE;i+=1)
  {
    pthread_mutex_lock(&th_pool[i].mutex);
    while(th_pool[i].processing == 1) // wait until worker i has finished
      pthread_cond_wait(&th_pool[i].cond_finish, &th_pool[i].mutex);
    pthread_mutex_unlock(&th_pool[i].mutex);
  }
  // hand the results back to Verilog (they belong to the previous call)
  for(i=0;i<POOL_SIZE;i+=1)
    out[i] = th_pool[i].out;
  for(i=0;i<POOL_SIZE;i+=1)
  {
    pthread_mutex_lock(&th_pool[i].mutex);
    th_pool[i].in = in;                          // new input data
    th_pool[i].processing = 1;                   // mark the job as pending
    pthread_cond_signal(&th_pool[i].cond_start); // wake the worker
    pthread_mutex_unlock(&th_pool[i].mutex);
  }
  return 0;
}
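For experimenting outside the simulator, here is a self-contained sketch of the same start/finish handshake that can be compiled and run on its own. It is not the author's code: the names demo_slot, demo_pool_init and demo_dispatch, the pool size of 4 and the toy work (multiplying the input by the thread number) are all hypothetical. Each slot starts out marked busy, so no job can be handed out before its worker is waiting for it. As with ch, each call returns the results of the previous call, so the outputs lag the inputs by one call.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_POOL 4   /* hypothetical pool size for this sketch */

typedef struct {
    double in, out;
    int busy;                         /* 1 while a job is pending or running */
    pthread_mutex_t mutex;
    pthread_cond_t cond_start, cond_finish;
} demo_slot;

static demo_slot demo_pool[DEMO_POOL];

/* Worker: report idle, wait for a job, compute outside the lock, store the result. */
static void *demo_worker(void *arg) {
    demo_slot *s = arg;
    int k = (int)(s - demo_pool) + 1;  /* toy work: scale by the thread number */
    pthread_mutex_lock(&s->mutex);
    for (;;) {
        s->busy = 0;
        pthread_cond_signal(&s->cond_finish);
        while (!s->busy)
            pthread_cond_wait(&s->cond_start, &s->mutex);
        double x = s->in;
        pthread_mutex_unlock(&s->mutex);
        double y = x * k;              /* the "heavy" calculation */
        pthread_mutex_lock(&s->mutex);
        s->out = y;
    }
}

int demo_pool_init(void) {
    for (int i = 0; i < DEMO_POOL; i++) {
        demo_slot *s = &demo_pool[i];
        pthread_t tid;
        s->in = s->out = 0.0;
        s->busy = 1;                   /* busy until the worker reports ready */
        pthread_mutex_init(&s->mutex, NULL);
        pthread_cond_init(&s->cond_start, NULL);
        pthread_cond_init(&s->cond_finish, NULL);
        if (pthread_create(&tid, NULL, demo_worker, s) != 0)
            return -1;
        pthread_detach(tid);
    }
    return 0;
}

/* Wait for each worker, collect its previous result, then start its next job. */
void demo_dispatch(double in, double *out) {
    for (int i = 0; i < DEMO_POOL; i++) {
        demo_slot *s = &demo_pool[i];
        pthread_mutex_lock(&s->mutex);
        while (s->busy)
            pthread_cond_wait(&s->cond_finish, &s->mutex);
        out[i] = s->out;               /* result of the previous call */
        s->in = in;
        s->busy = 1;
        pthread_cond_signal(&s->cond_start);
        pthread_mutex_unlock(&s->mutex);
    }
}
```

The one-call latency is the price of letting the simulator advance while the workers compute. A caller that needs the result of the current input has to call the dispatcher once more.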
Unfortunately, sometimes even this is not enough. Then you have to resort to OpenCL to compute on a graphics card (no more complicated than DPI), or to computing on a cluster or in the cloud. In all these cases the transport component is a serious limitation, that is, the time it takes to move data to and from the computing device. The ideal task here is one that requires a lot of computation on relatively little data, both input and output. The same applies to the program presented above, though to a lesser extent. If this condition is not met, it is often faster to compute on the CPU alone.
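The transport argument can be put into numbers with a deliberately crude model. It is an assumption for illustration, not a measurement: offloaded time = bytes moved / link bandwidth + CPU time / device speedup, ignoring fixed latencies. Offloading pays off only if this beats the CPU time. The function names are hypothetical.

```c
/* Crude offload model: transfer time plus accelerated compute time. */
double offload_time(double t_cpu, double speedup, double bytes, double bytes_per_s) {
    return bytes / bytes_per_s + t_cpu / speedup;
}

/* Nonzero if offloading is expected to beat running on the CPU alone. */
int offload_pays_off(double t_cpu, double speedup, double bytes, double bytes_per_s) {
    return offload_time(t_cpu, speedup, bytes, bytes_per_s) < t_cpu;
}
```

With 1 ms of CPU work, a device 10 times faster and a 1 GB/s link, moving 1 MB already makes offloading slower (1.1 ms). Moving 10 kB takes only 0.11 ms, which leaves a wide margin.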
It should be noted that none of the methods presented here works when the server room has no power. However, the power has just come back and YouTube is working again. On this joyful note, I hasten to finish my story: work is waiting.
Source: https://habr.com/ru/post/338922/