
Verilog. Digital filter in RAM

What if you need to fit a large digital filter into an FPGA, and the board has already been routed? The hardware is old? Not much room left in the design? This article looks at one possible implementation of a digital FIR filter on an Altera Cyclone II EP2C15 FPGA. It is in fact a continuation of this topic from the sandbox.
I will describe how to build a shift register in RAM, reducing the LE (logic element) count, and how to turn it into a digital filter.


How does the filter work? The basic operation is multiply-accumulate: the filter coefficients are multiplied by the values in the shift register and the products are summed. That is all, if you skip the details. Now that the ingredients are listed, let's get down to business.
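Written out as a formula, that multiply-accumulate is the usual FIR convolution sum. The notation is mine, not the article's: h[k] is the k-th coefficient, x[n-k] is the k-th value in the shift register (the sample from k periods ago), and N is the number of taps, which in this design is 2^ADDR_WIDTH = 512.

```latex
y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k], \qquad N = 2^{\mathrm{ADDR\_WIDTH}} = 512
```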

Multiply-accumulate

Let's assume we have already settled on the desired frequency response and the filter order, obtained the coefficients, and know the input data rate. Better still if these values are parameterized, so let's try to do that. Here is the multiply-accumulate implementation:
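    module mult #(
        parameter COEF_WIDTH = 24,
        parameter DATA_WIDTH = 16,
        parameter ADDR_WIDTH = 9,
        parameter MULT_WIDTH = COEF_WIDTH + DATA_WIDTH
    ) (
        input  wire                             clk,
        input  wire                             en,
        input  wire        [(ADDR_WIDTH-1) : 0] ad,    // current tap address
        input  wire signed [(COEF_WIDTH-1) : 0] coe,   // coefficient from the coefficient RAM
        input  wire signed [(DATA_WIDTH-1) : 0] pip,   // sample from the shift register
        output wire signed [(DATA_WIDTH-1) : 0] dout
    );
        wire signed [(MULT_WIDTH-1) : 0] mu  = coe * pip;
        reg  signed [(MULT_WIDTH-1) : 0] rac = {(MULT_WIDTH){1'b0}};  // accumulator
        reg  signed [(DATA_WIDTH-1) : 0] ro  = {DATA_WIDTH{1'b0}};    // output register

        assign dout = ro;

        always @(posedge clk)
            if (en)
                if (ad == {ADDR_WIDTH{1'b0}}) begin
                    // New pass: restart the sum with this product and latch
                    // the finished sum of the previous pass.
                    rac <= mu;
                    // Bits [MULT_WIDTH-2 -: DATA_WIDTH] (= [38:23] here): drop the top bit,
                    // assumed to be only a sign extension, and scale by 2^-(COEF_WIDTH-1).
                    ro  <= rac[(MULT_WIDTH-2) -: DATA_WIDTH];
                end else
                    rac <= rac + mu;
    endmodule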

Why ADDR_WIDTH = 9? Because the filter order was chosen as 2^9 = 512 taps. First, this makes it easy to derive the clock from a divider or PLL. Second, I could afford to run the filter 512 times faster than the input, because the sample rate was 16 kHz. More on that below. Admittedly, the parameterization does not help readability, but the code can be figured out.
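As a worked example of that choice (f_s and f_clk are my labels): the filter accumulates one product per clock, so a fully serial 512-tap filter needs 512 clocks per input sample. The second line is a worst-case bound on accumulator growth, which is my addition rather than something from the article; whether the MULT_WIDTH = 40-bit rac in mult is enough in practice depends on how the coefficients are scaled.

```latex
\begin{aligned}
f_{\mathrm{clk}} &= 2^{\mathrm{ADDR\_WIDTH}} \cdot f_s = 512 \cdot 16\ \mathrm{kHz} = 8.192\ \mathrm{MHz},\\
W_{\mathrm{acc}} &= \mathrm{COEF\_WIDTH} + \mathrm{DATA\_WIDTH} + \mathrm{ADDR\_WIDTH} = 24 + 16 + 9 = 49\ \text{bits}.
\end{aligned}
```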

Filter coefficients

Did you read the sandbox article linked at the top? It had a RAM template. That template no longer suits us: I could not get that RAM to read and write in one clock cycle. Maybe that is down to my own lack of knowledge, but in any case the filter coefficients are now stored in this module:

    module coef #(parameter DATA_WIDTH = 24, parameter ADDR_WIDTH = 9) (
        input  wire [(DATA_WIDTH-1):0] data,
        input  wire [(ADDR_WIDTH-1):0] addr,
        input  wire                    we,
        input  wire                    clk,
        output wire [(DATA_WIDTH-1):0] coef_rom
    );
        reg [DATA_WIDTH-1:0]   rom [2**ADDR_WIDTH-1:0];
        reg [(DATA_WIDTH-1):0] data_out;

        assign coef_rom = data_out;

        initial begin
            rom[0  ] = 24'b000000000000000000000000;
            rom[1  ] = 24'b000000000000000000000001;
            // "Christmas tree": rom[2] .. rom[509] omitted
            rom[510] = 24'b000000000000000000000001;
            rom[511] = 24'b000000000000000000000000;
        end

        // Registered read; write when we is high
        always @(posedge clk) begin
            data_out <= rom[addr];
            if (we) rom[addr] <= data;
        end
    endmodule


About 508 coefficients were left out so as not to bore the reader. Why 24 bits rather than 16? I liked the resulting spectrum better, but it does not matter much. Changing the coefficients does not take long. You can also load a memory initialization file with the $readmemb or $readmemh system task inside the initial begin ... end block.
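A minimal sketch of the file-based variant, assuming a hypothetical module name coef_file and a hypothetical file coef.hex with one hex coefficient per line, lowest address first. It clears the whole array before calling $readmemh, so a file with fewer than 2^ADDR_WIDTH coefficients leaves the remaining taps at zero; an empty INIT_FILE skips the load. Whether your Quartus version accepts a for loop plus $readmemh in an initial block for RAM inference is an assumption worth checking.

```verilog
// Hypothetical variant of the coef module: coefficients come from a text
// file instead of being typed into the initial block.
module coef_file #(
    parameter DATA_WIDTH = 24,
    parameter ADDR_WIDTH = 9,
    parameter INIT_FILE  = "coef.hex"   // assumed name: one hex word per line
) (
    input  wire [DATA_WIDTH-1:0] data,
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire                  we,
    input  wire                  clk,
    output reg  [DATA_WIDTH-1:0] coef_rom
);
    // Ascending range, so $readmemh fills from address 0 upward
    reg [DATA_WIDTH-1:0] rom [0:2**ADDR_WIDTH-1];
    integer i;

    initial begin
        // Zero-fill first: taps not present in the file stay at zero
        for (i = 0; i < 2**ADDR_WIDTH; i = i + 1)
            rom[i] = {DATA_WIDTH{1'b0}};
        if (INIT_FILE != "")
            $readmemh(INIT_FILE, rom);
    end

    // Same behaviour as coef: registered read, write when we is high
    always @(posedge clk) begin
        coef_rom <= rom[addr];
        if (we) rom[addr] <= data;
    end
endmodule
```

Keeping the zero-fill in the same initial block as the file load is a design choice: the file then only has to list the nonzero part of the impulse response.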

Shift register

This is actually the main reason I am writing this. Perhaps someone will think they already knew all this. Perhaps someone else will say something unkind about the author and reinventing the wheel.
Here I describe how to build a shift register in RAM using a wrapper. Probably everyone has read in the FPGA handbook that RAM can work as a shift register. How? I did it, and there is nothing complicated about it. But why? The Cyclone family is positioned as memory-oriented devices, so you need to be able to use that memory. The task splits into two parts: the RAM and the wrapper. The RAM is similar to the one that stores the filter coefficients:

    module pip #(parameter DATA_WIDTH = 16, parameter ADDR_WIDTH = 9) (
        input  wire [(DATA_WIDTH-1):0] data,
        input  wire [(ADDR_WIDTH-1):0] read_addr, write_addr,
        input  wire                    we,
        input  wire                    clk,
        output wire [(DATA_WIDTH-1):0] pip_ram
    );
        reg [DATA_WIDTH-1:0]   ram [2**ADDR_WIDTH-1:0];
        reg [(DATA_WIDTH-1):0] data_out;

        assign pip_ram = data_out;

        // Registered read: with read_addr == write_addr it returns the value
        // stored before this clock's write (old data)
        always @(posedge clk) begin
            data_out <= ram[read_addr];
            if (we) ram[write_addr] <= data;
        end
    endmodule


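The only difference is that this RAM has no initialization, and uninitialized RAM is automatically filled with zeros in the FPGA (a Verilog simulator, by contrast, shows uninitialized memory as X). By the way, the same property can be used when loading filter coefficients if there are fewer than 2^N of them: the unused addresses simply stay zero.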
Now the wrapper itself:

    module upr #(parameter COEF_WIDTH = 24, parameter DATA_WIDTH = 16, parameter ADDR_WIDTH = 9) (
        input  wire                      clk,
        input  wire                      en,
        input  wire [(DATA_WIDTH-1) : 0] ram_upr,   // value read back from the shift-register RAM
        input  wire [(DATA_WIDTH-1) : 0] data_in,   // new input sample
        output wire [(DATA_WIDTH-1) : 0] upr_ram,   // value to write into the RAM
        output wire                      we_ram,
        output wire [(ADDR_WIDTH-1) : 0] adr_out
    );
        // One-hot state encoding, declared before use
        localparam state0 = 3'b001,
                   state1 = 3'b010,
                   state2 = 3'b100;

        reg [2 : 0]              r_state = state0;
        reg [(ADDR_WIDTH-1) : 0] r_adr   = {ADDR_WIDTH{1'b0}};

        // At address 0 write the new sample; elsewhere write back the value
        // read on the previous clock, i.e. the old contents of address r_adr-1
        assign upr_ram = (r_adr == {ADDR_WIDTH{1'b0}}) ? data_in : ram_upr;
        assign we_ram  = (r_state == state1) ? 1'b1 : 1'b0;
        assign adr_out = r_adr;

        always @(posedge clk)
            if (en)
                case (r_state)
                    state0: r_state <= state1;
                    state1: r_state <= state1;
                    state2: begin end              // unused, reserved
                endcase

        always @(posedge clk)
            case (r_state)
                state0: r_adr <= {ADDR_WIDTH{1'b0}};
                state1: r_adr <= r_adr + 1'b1;     // sweep all addresses continuously
                state2: begin end
            endcase
    endmodule

The same address is fed to the coefficient RAM and to the shift-register RAM. Through the feedback path from the shift-register RAM, the wrapper receives the value read on the previous clock, that is, the old contents of the previous address, and writes it at the current address. So the shift is not done in a single clock cycle: one value moves per clock, and a full pass over all addresses shifts the whole line by one position. The new input word is written every time the address is zero.
Why do I keep using a state machine even though some of its states are unused? Recall what was written in the article linked at the very beginning. This module now runs twice as fast, which means that, all else being equal, it also sits idle half the time. In principle, that half could be used for something else: recalculating the filter coefficients for adaptive filtering, or running a second filter (something like a time slot). Nothing of the kind is done here and the FSM is not needed, but I left this vestige in. Removing an FSM is always easier than adding one.
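To see the read-then-write-back trick in isolation, here is a condensed, hypothetical sketch that merges pip and upr into one module. The name ram_shift and its ports are mine, not the author's; the FSM is dropped and the address free-runs. The RAM is cleared in an initial loop only because a simulator would otherwise start from X. The tap output is the registered read, i.e. the old contents of address addr-1, which is what would feed the multiplier.

```verilog
// Hypothetical single-module sketch of the RAM shift register (not the
// author's code): pip + upr merged, FSM dropped, address free-running.
module ram_shift #(
    parameter DATA_WIDTH = 16,
    parameter ADDR_WIDTH = 9
) (
    input  wire                  clk,
    input  wire [DATA_WIDTH-1:0] data_in, // new sample, taken when addr == 0
    output wire [ADDR_WIDTH-1:0] addr,    // address visited on the next clock edge
    output wire [DATA_WIDTH-1:0] tap      // old contents of address addr-1
);
    reg [DATA_WIDTH-1:0] ram [0:2**ADDR_WIDTH-1];
    reg [DATA_WIDTH-1:0] rd = {DATA_WIDTH{1'b0}};  // registered read data
    reg [ADDR_WIDTH-1:0] a  = {ADDR_WIDTH{1'b0}};  // sweeping address
    integer i;

    // Simulators start memories at X; clear them (FPGA RAM is assumed to
    // power up at zero, as the article states)
    initial
        for (i = 0; i < 2**ADDR_WIDTH; i = i + 1)
            ram[i] = {DATA_WIDTH{1'b0}};

    assign addr = a;
    assign tap  = rd;

    always @(posedge clk) begin
        rd     <= ram[a];                   // read before this clock's write
        ram[a] <= (a == 0) ? data_in : rd;  // new sample, or old ram[a-1]
        a      <= a + 1'b1;                 // wraps after 2**ADDR_WIDTH clocks
    end
endmodule
```

With ADDR_WIDTH = 3 and one new sample per eight-clock sweep, feeding 10, 20, 30, 40 leaves 40, 30, 20, 10 at addresses 0..3 (newest first) and zeros above.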

Summary

Here is the top-level file generated from the schematic:

    module filtr_ram(
        CLK,
        D_IN,
        MULT
    );

    input         CLK;
    input  [15:0] D_IN;
    output [15:0] MULT;

    wire        SYNTHESIZED_WIRE_13;  // we_ram: RAM write enable, also the MAC enable
    wire [15:0] SYNTHESIZED_WIRE_1;   // upr_ram: word written into the shift-register RAM
    wire [8:0]  SYNTHESIZED_WIRE_14;  // adr_out: common address for both RAMs and mult
    wire        SYNTHESIZED_WIRE_4;   // upr enable, tied high
    wire [15:0] SYNTHESIZED_WIRE_15;  // pip_ram: sample read from the shift register
    wire        SYNTHESIZED_WIRE_6;   // coef write enable, tied low
    wire [0:23] SYNTHESIZED_WIRE_8;   // coef write data, tied to zero
    wire [23:0] SYNTHESIZED_WIRE_11;  // coef_rom: current coefficient

    assign SYNTHESIZED_WIRE_4 = 1;
    assign SYNTHESIZED_WIRE_6 = 0;
    assign SYNTHESIZED_WIRE_8 = 0;

    pip b2v_inst(
        .we(SYNTHESIZED_WIRE_13),
        .clk(CLK),
        .data(SYNTHESIZED_WIRE_1),
        .read_addr(SYNTHESIZED_WIRE_14),
        .write_addr(SYNTHESIZED_WIRE_14),
        .pip_ram(SYNTHESIZED_WIRE_15));
    defparam b2v_inst.ADDR_WIDTH = 9;
    defparam b2v_inst.DATA_WIDTH = 16;

    upr b2v_inst1(
        .clk(CLK),
        .en(SYNTHESIZED_WIRE_4),
        .data_in(D_IN),
        .ram_upr(SYNTHESIZED_WIRE_15),
        .we_ram(SYNTHESIZED_WIRE_13),
        .adr_out(SYNTHESIZED_WIRE_14),
        .upr_ram(SYNTHESIZED_WIRE_1));
    defparam b2v_inst1.ADDR_WIDTH = 9;
    defparam b2v_inst1.COEF_WIDTH = 24;
    defparam b2v_inst1.DATA_WIDTH = 16;

    coef b2v_inst3(
        .we(SYNTHESIZED_WIRE_6),
        .clk(CLK),
        .addr(SYNTHESIZED_WIRE_14),
        .data(SYNTHESIZED_WIRE_8),
        .coef_rom(SYNTHESIZED_WIRE_11));
    defparam b2v_inst3.ADDR_WIDTH = 9;
    defparam b2v_inst3.DATA_WIDTH = 24;

    mult b2v_inst5(
        .clk(CLK),
        .en(SYNTHESIZED_WIRE_13),
        .ad(SYNTHESIZED_WIRE_14),
        .coe(SYNTHESIZED_WIRE_11),
        .pip(SYNTHESIZED_WIRE_15),
        .dout(MULT));
    defparam b2v_inst5.ADDR_WIDTH = 9;
    defparam b2v_inst5.COEF_WIDTH = 24;
    defparam b2v_inst5.DATA_WIDTH = 16;

    endmodule


It is obvious at a glance that this could be tidied up.
Now, once more, about the result. The main drawback is that the filter is fully serial: the filter clock has to be 2^ADDR_WIDTH times the input data rate. This can be eased if the filter's impulse response is symmetric. The shift-register RAM then has to be split into two modules, each fed its own address; the two values read from RAM are added and then multiplied in the mult module, which will need one more input. The clock then only has to be 2^(ADDR_WIDTH-1) times the input data rate.

Sources and the Quartus 9.0 project:
ifolder.ru/27556340

Source: https://habr.com/ru/post/134485/

