

# DA-BASED RECONFIGURABLE FIR DIGITAL FILTER USING PROFICIENT FPGA AND ASIC REALIZATIONS

### DASARI VASANTHI

M.Tech (VLSI), Department of ECE, Priyadarshini Institute of Technology and Management, Pulladigunta, Guntur, A.P.

### BELLAM VARALAKSHMI

Assistant Professor, Department of ECE, Priyadarshini Institute of Technology and Management, Pulladigunta, Guntur, A.P.

### ABSTRACT:

In various telecommunication applications, digital signal processors are the key components in transferring data between devices. Implementing FIR filters on FPGAs with conventional methods requires considerable hardware resources, which in turn raises the circuit size and lowers the system speed. The most important operation performed in digital signal processing is multiply and accumulate (MAC). Usually this operation is realized using hardware multipliers. The computation of a sum of products can be performed more effectively using distributed arithmetic (DA). This paper presents a modified DA-based technique to compute the sum of products that saves an appreciable number of MAC blocks, which in turn reduces the circuit size. In this technique, a multiplexer-based structure is used to reuse the blocks so as to reduce the required memory locations.

**Keywords:** Finite impulse response (FIR) filter, linear convolution, systolic array, field programmable gate arrays (FPGA), distributed arithmetic

### I. INTRODUCTION

Traditional FIR filters require multiply and accumulate (MAC) blocks, which are expensive to implement. To avoid this problem, we present the distributed arithmetic technique, which serves as a multiplier-less architecture for digital signal processing applications. In this paper we optimize this technique in terms of area-delay product [3]. To improve overall performance and to minimize access delay and power dissipation, either the processor has been moved to the memory or the memory has been moved to the processor. Since semiconductor devices have been scaled down considerably, memory-based architectures are well suited for many DSP applications [3]. Memory components like ROM and RAM are utilized as part or the whole of the arithmetic structure. Memory-based architectures are highly regular and offer high potential for throughput, reduced circuit size and lower dynamic power consumption compared to conventional multipliers. One memory-based technique is distributed arithmetic, in which the computation is transformed to another numeric representation where the arithmetic manipulation is more efficient than in a traditional implementation [5].

Finite impulse response (FIR) digital filters are extensively used due to their key role in various digital signal processing (DSP) applications [1], [2]. As DSP has become increasingly popular over the years, along with the advancement in very large scale integration (VLSI) technology, the high-speed realization of FIR filters with low power consumption has become much more demanding. Since the complexity of implementation grows with the filter order and the precision of computation, real-time realization of these filters with the desired level of accuracy is a challenging task. Several attempts have, therefore, been made to develop dedicated and reconfigurable architectures for realization of FIR filters in application-specific integrated circuit (ASIC) and field programmable gate array (FPGA) platforms.

Systolic designs represent an attractive architectural paradigm for efficient hardware implementation of computation-intensive DSP applications, being supported by features like simplicity, regularity and modularity of structure. Additionally, they possess significant potential to yield a high throughput rate by exploiting a high level of concurrency using pipelining or parallel processing or both. To utilize the advantages of systolic processing, several algorithms and architectures have been suggested for systolization of FIR filters. However, the multipliers in these structures require a large portion of the chip area, and consequently limit the maximum number of processing elements (PEs) that can be accommodated and the highest order of the filter that can be realized.

Multiplier-less distributed arithmetic (DA)-based techniques have gained substantial popularity in recent years for their high-throughput processing capability and increased regularity, which result in cost-effective and area-time efficient computing structures. The main operations required for DA-based computation of an inner product are a sequence of look-up table (LUT) accesses followed by shift-accumulation operations on the LUT output. DA-based computation is well suited for FPGA realization, because the LUT as well as the shift-add operations can be efficiently mapped to the LUT-based FPGA logic structures.
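The LUT-access and shift-accumulate structure follows from the standard DA reformulation of the inner product. As a sketch only, assume $K$ constant coefficients $h_k$ and $B$-bit unsigned inputs $x_k$ with bits $x_{k,b}$ (these symbols are chosen here for illustration and do not appear in the original; two's complement inputs would need a sign correction on the most significant bit term):

$$
y = \sum_{k=0}^{K-1} h_k\, x_k, \qquad x_k = \sum_{b=0}^{B-1} x_{k,b}\, 2^{b}
$$

$$
y = \sum_{b=0}^{B-1} 2^{b} \left( \sum_{k=0}^{K-1} h_k\, x_{k,b} \right)
$$

Because each $x_{k,b}$ is 0 or 1, the inner sum can take only $2^{K}$ distinct values. These can be precomputed and stored in a LUT addressed by the bit vector $(x_{0,b}, \ldots, x_{K-1,b})$, leaving one LUT access and one shift-add per bit position $b$.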

### II. PROBLEM STATEMENT:

Due to the high performance requirements and increasing complexity of DSP and multimedia communication applications, filters with a large number of taps are required to increase performance in terms of high sampling rate. As a result, the filtering operations are computationally intensive and more complex in terms of hardware requirements. FIR filters perform weighted summations of input sequences with constant coefficients in most signal processing and multimedia applications. These filters are widely used in convolution functions, signal preconditioning, and other communication applications. A decrease in computational complexity increases performance in terms of speed, area and power. High-speed, low-area and power-conscious design techniques for SoCs involve efforts at all levels of abstraction. One way to efficiently incorporate high-performance design techniques is to implement IP cores [4].

These cores have the following major advantages.

1. Reusability across designs
2. Reduction of the design effort
3. Shorter time to market

The disadvantage of FIR filters is that they require a high order. The high order demands more hardware, area and power consumption. To minimize these parameters, our goal is to implement an efficient high-order filter in digital systems. By reducing the arithmetic terms of the multipliers, we aim to reduce hardware, area and power. This is the ultimate goal of implementing an efficient FIR filter, and the DA algorithm is used to implement the high-order FIR filter.

An FIR filter is normally built around a MAC unit. The purpose of the MAC unit is to multiply the input by constant coefficients, to shift, and then to add the results. This process is repeated until all partial products have been accumulated into the output. It increases the hardware complexity because a simple multiplier circuit is used. The idea is to bypass or replace the multiply and shift operations with less complex operations. The distributed arithmetic (DA) algorithm can be used to replace the MAC unit. The DA algorithm uses look-up tables for storing the constant coefficients. The use of look-up tables reduces the hardware complexity, and hence the new design is more efficient in terms of smaller area, higher speed and lower power consumption. The FIR filter reference core uses a simple MAC unit. We have replaced the MAC unit in the FIR filter reference core with the DA algorithm. In this study, the performance of the reference core with a simple MAC and the reference core with DA is compared.
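The replacement of the MAC unit by look-up tables can be illustrated with a short Python model. This is a minimal behavioral sketch, not the authors' Verilog core: the function names, the coefficient values and the assumption of unsigned input samples are all illustrative. It shows that a bit-serial DA loop (one LUT read and one shift-add per input bit) produces the same inner product as a conventional MAC loop.

```python
def fir_mac(coeffs, samples):
    """Reference inner product: one multiply-accumulate per tap."""
    acc = 0
    for h, x in zip(coeffs, samples):
        acc += h * x
    return acc


def build_lut(coeffs):
    """Precompute, for every address, the sum of the coefficients whose bit is set."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(1 << n)]


def fir_da_serial(lut, samples, width):
    """Bit-serial DA, LSB first: one LUT read and one shift-add per input bit.

    Assumption: samples are unsigned integers below 2**width.
    """
    acc = 0
    for b in range(width):
        # Bit b of every sample forms the LUT address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        acc += lut[addr] << b
    return acc


coeffs = [3, -1, 4, 2]        # illustrative values, not taken from the paper
samples = [10, 7, 255, 128]   # unsigned 8-bit samples
lut = build_lut(coeffs)
assert fir_da_serial(lut, samples, 8) == fir_mac(coeffs, samples)
```

No multiplication appears in `fir_da_serial`; the only arithmetic in the loop is a table read, a shift and an addition, which is the property that makes DA attractive on LUT-based FPGA fabric.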

### III. IMPLEMENTING THE FIR FILTER USING DA

The FIR filter has 16 taps, with a 16-bit input data width and 16 filter coefficients. In designing the FIR filter using DA (distributed arithmetic), these coefficients are placed in a look-up table, because they are constants. The look-up table grows exponentially as the number of filter coefficients is increased, so the design must be partitioned, with four coefficients placed in each look-up table. The width of each coefficient may be 8 bits or 16 bits depending upon the design. The width of the input data may likewise be 8 bits or 16 bits. Starting from the LSB, the bits of the input data at the same bit position are combined in parallel to form the address of the look-up table. Compared with a MAC unit, the distributed arithmetic algorithm replaces the multiplication by an "AND" operation between each input bit and the coefficient. Four look-up tables store the 16 coefficients of the FIR filter. More than four look-up tables are used to store more coefficients for a better response of the FIR filter.

The LUTs in the DA algorithm implement the multiplier-less technique. The LUTs use fewer CLBs (configurable logic blocks) in the FPGA, which increases the throughput and data rates. An FPGA that has no multiplier can still be used with the SRAM-based DA algorithm. A single FPGA chip can be used instead of multiple DSP devices for better performance in terms of speed, area and power. Because of the SRAM present in the FPGA, the FPGA is more efficient for the implementation of signal processing applications. DA is well matched to filtering operations, because an SRAM-based FPGA stores the pre-computed look-up table values and also provides the surrounding logic on a single chip. The distributed arithmetic algorithm performs well in filtering operations because its hardware complexity is lower than that of a conventional MAC.
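The effect of partitioning the coefficients into groups of four can be sketched as follows. This is an illustrative Python model under stated assumptions (unsigned 16-bit samples, arbitrary coefficient values, hypothetical function names), not the authors' hardware design. It compares the number of LUT words for a single 16-input table against four 4-input tables and checks that the partitioned result still equals the direct sum of products.

```python
def build_lut(coeffs):
    """Sum of the coefficients selected by each address bit pattern."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(1 << n)]


def build_partitioned_luts(coeffs, group=4):
    """One small LUT per group of `group` consecutive coefficients."""
    return [build_lut(coeffs[i:i + group])
            for i in range(0, len(coeffs), group)]


def da_partitioned(luts, samples, width, group=4):
    """Bit-serial DA in which the partial LUT outputs are added before the shift."""
    acc = 0
    for b in range(width):
        partial = 0
        for i, lut in enumerate(luts):
            chunk = samples[i * group:(i + 1) * group]
            addr = 0
            for k, x in enumerate(chunk):
                addr |= ((x >> b) & 1) << k
            partial += lut[addr]
        acc += partial << b
    return acc


coeffs = list(range(1, 17))                            # 16 illustrative coefficients
samples = [(37 * k + 11) % 65536 for k in range(16)]   # unsigned 16-bit samples
luts = build_partitioned_luts(coeffs)

words_single = 1 << len(coeffs)                   # one 16-input LUT
words_partitioned = sum(len(l) for l in luts)     # four 4-input LUTs

assert da_partitioned(luts, samples, 16) == sum(h * x for h, x in zip(coeffs, samples))
```

With these assumptions a single table would need 2^16 = 65536 words, whereas four 4-input tables need 4 x 2^4 = 64 words, at the cost of a small adder that combines the four partial outputs.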





Fig. 4. Implementation of FIR filter using DA

### IV. SIMULATION AND RESULTS

The FIR filter cores have been implemented with both the reference (MAC) structure and the DA structure. Results have been taken in terms of area utilized, power dissipated and speed performance for 16bits-20 taps and 8bits-20 taps. The FIR filter cores have been designed in Verilog HDL and implemented using the Xilinx 10.1i tool. Simulations were performed using ModelSim 6.4b.

### Area comparison (16 bit 16-taps)

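Table 1 shows that the area of the Conv.UDF FIR Filter Core is slightly smaller than that of the same core implemented with the DA algorithm in terms of slices (70% versus 74%) and slice flip-flops (32% versus 37%), while the total equivalent gate count is almost the same (13610 versus 13573).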

Table 1. Area Comparison of 16 bit 16-taps

| Filter Cores           | Conv.UDF FIR<br>Filter Core | Conv.UDF FIR<br>Filter Core with<br>DA Algorithm |
|------------------------|-----------------------------|--------------------------------------------------|
| No. of Slices          | 70%                         | 74%                                              |
| Slice Flip<br>Flops    | 32%                         | 37%                                              |
| Input LUTs             | 38%                         | 37%                                              |
| Bonded IOBs            | 60%                         | 60%                                              |
| Total Eq Gate<br>Count | 13610                       | 13573                                            |

### Speed comparison (16 bit 16-taps)

Table 2 and Fig. 7 show a comparison of the Conv.UDF FIR Filter Core and the Conv.UDF FIR Filter Core with DA Algorithm for speed. An increase in speed of about 47% has been found when using the DA algorithm.

Table 2. Speed Comparison (16 bit 16-taps)

| Filter Cores       | Conv.UDF FIR<br>Filter Core | Conv.UDF FIR<br>Filter Core with<br>DA Algorithm |
|--------------------|-----------------------------|--------------------------------------------------|
| Min Period         | 26.798 ns                   | 18.161 ns                                        |
| Input Arrival time | 9.574 ns                    | 9.574 ns                                         |
| Output Req Time    | 15.842 ns                   | 16.526 ns                                        |
| Max Freq           | 37.316 MHz                  | 55.063 MHz                                       |
| Speed Improvement  |                             | 47.56 %                                          |
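The frequency and improvement figures in Table 2 can be cross-checked from the reported minimum periods. The short Python calculation below is only an arithmetic check (variable and function names are illustrative): the reciprocal of a period in nanoseconds, scaled by 1000, gives a frequency in megahertz, and the speed improvement is the relative increase in maximum frequency.

```python
def max_freq_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a minimum period given in ns."""
    return 1000.0 / min_period_ns


ref_period_ns = 26.798   # Conv.UDF FIR Filter Core (Table 2)
da_period_ns = 18.161    # Conv.UDF FIR Filter Core with DA Algorithm (Table 2)

ref_f = max_freq_mhz(ref_period_ns)
da_f = max_freq_mhz(da_period_ns)
improvement = (da_f / ref_f - 1.0) * 100.0

print(f"{ref_f:.3f} MHz, {da_f:.3f} MHz, {improvement:.2f} %")
```

The computed values agree with the 37.316 and 55.063 maximum-frequency entries and the 47.56 % speed improvement in the table when the frequencies are read in megahertz.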

### V. CONCLUSION

The results show that the distributed arithmetic algorithm is better suited for FIR filter implementation on FPGAs. The efficiency in terms of area, speed and power has been analyzed. Comparison of the results clearly shows that efficiency in terms of power dissipation and speed has increased, with almost the same area consumption. DA has two variants: serial DA and parallel DA. In this work, serial distributed arithmetic is used to make the FIR filter more efficient. In future, parallel DA can be used to increase the efficiency of the FIR filter in terms of data rates. Both the serial and the parallel distributed arithmetic algorithms use look-up tables. The size of the look-up table increases when the number of filter taps is increased.
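The trade-off between serial and parallel DA can be sketched with a small Python model. The paper does not detail the parallel variant, so this sketch assumes the common textbook formulation in which several bit positions are processed in the same clock cycle, each needing its own LUT copy in hardware. The function names, coefficient values and unsigned samples are illustrative assumptions.

```python
def build_lut(coeffs):
    """Sum of the coefficients selected by each address bit pattern."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_digit_serial(lut, samples, width, bits_per_cycle):
    """Process `bits_per_cycle` bit positions per modeled clock cycle.

    bits_per_cycle=1 models serial DA; bits_per_cycle=width models fully
    parallel DA. Returns (result, cycles). In hardware, each bit position
    handled in the same cycle would need its own LUT copy.
    """
    acc = 0
    cycles = 0
    for base in range(0, width, bits_per_cycle):
        for b in range(base, min(base + bits_per_cycle, width)):
            addr = 0
            for k, x in enumerate(samples):
                addr |= ((x >> b) & 1) << k
            acc += lut[addr] << b
        cycles += 1
    return acc, cycles


coeffs = [3, -1, 4, 2]        # illustrative values, not taken from the paper
samples = [10, 7, 255, 128]   # unsigned 8-bit samples
lut = build_lut(coeffs)

serial = da_digit_serial(lut, samples, 8, 1)     # 8 cycles, 1 LUT copy
parallel = da_digit_serial(lut, samples, 8, 8)   # 1 cycle, 8 LUT copies
assert serial[0] == parallel[0]
```

Under this model the serial form needs as many cycles per output as there are input bits, while the parallel form produces an output every cycle at the cost of replicated LUTs and a wider adder tree.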

### REFERENCES

- [1] S. Y. Park and P. K. Meher, "Efficient FPGA and ASIC realizations of a DA-based reconfigurable FIR digital filter," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 61, no. 7, Jul. 2014.
- [2] T. Hentschel, M. Henker, and G. Fettweis, "The digital front-end of software radio terminals," IEEE Pers. Commun. Mag., vol. 6, no. 4, pp. 40-46, Aug. 1999.
- [3] K.-H. Chen and T.-D. Chiueh, "A low-power digit-based reconfigurable FIR filter," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 617–621, Aug.
- [4] L. Ming and Y. Chao, "The multiplexed structure of multichannel FIR filter and its resources evaluation," in Proc. Int. Conf. CDCIEM, Mar. 2012, pp. 764-768.
- [5] P. K. Meher, "Hardware-efficient systolization of DA-based calculation of finite digital convolution," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 707–711, Aug. 2006.
- [6] P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009-3017, Jul. 2008.
- [7] E. Ozalevli, W. Huang, P. E. Hasler, and D. V. Anderson, "A reconfigurable mixed-signal VLSI implementation of distributed arithmetic used for finite-impulse response filtering," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 2, pp. 510-521, Mar. 2008.
- [8] M. A. Khan and A. T. Erdogan, "Parameterized and programmable low power soft FIR filtering IP cores," Proceedings of the 4th WSEAS International Conference on Signal Processing, Computational Geometry & Artificial Vision, 2004.
- [9] P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic," IEEE Transactions on Signal Processing, vol. 56, pp. 3009-3017, 2008.
- [10] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, "LMS adaptive filters using distributed arithmetic for high throughput," IEEE Transactions on Circuits and Systems, vol. 52, pp. 1327-1337, 2005.
