## DESIGN AND FPGA IMPLEMENTATION OF AN XOR BASED 16-BIT CARRY SELECT ADDER FOR AREA DELAY AND POWER MINIMIZATION

Anusha Banothu

M.Tech Student: VLSI & ES, Department of Electronics and Communications, Sana Engineering College, Kodad, Telangana, India. anusha.sharapova@gmail.com

Banothu Balasubramanyam

Research Scholar, Department of Electrical and Electronics Engineering, College of Engineering, Osmania University, Hyderabad, Telangana. balumahendrabanothu@gmail.com

#### Abstract:

In VLSI technology today, low power consumption is closely associated with reversible logic design. Reversible logic has attracted growing attention from researchers over the past two decades, mainly because of its low power dissipation and high reliability. Its importance stems from the fact that no information is lost while data is processed from input to output. Designing fast circuits that use less power and occupy less area is one of the most important aspects of VLSI design. To implement the carry select adder (CSLA) on an FPGA, Verilog code for 4-bit, 8-bit and 16-bit CSLAs has been written. The code is simulated in ModelSim to check the functionality of the design. Once the simulation results are correct, synthesis and power analysis are carried out in Quartus II. Finally, the 4-bit and 8-bit CSLAs are implemented on an Altera DE2-115 FPGA board. Owing to pin constraints, only designs up to the 8-bit CSLA have been implemented in hardware.

Keywords: low power, area efficient, XOR based adder, carry select adder

#### **1.0 INTRODUCTION**

In digital integrated circuit design, adders are among the most common building blocks of microprocessors, and DSP applications cannot do without them. As technology progresses, researchers continue to look for adders that provide high speed, low power consumption, reduced area, or some combination of these benefits. In a chain of adders, each stage produces a carry that must be passed on to the next stage, which significantly lengthens the critical path delay of the circuit. The delay is lower if the carry propagates across fewer stages. In a carry select adder, the required sum is selected using a multiplexer.

Adders can be treated as building blocks of the arithmetic unit, and they are also used in operations such as complementing, encoding and decoding. Addition of two numbers generates a sum and a carry. All adder architectures, simple or complex, are constructed from two fundamental blocks: the half adder and the full adder. For a small number of bits, simple adders such as the ripple carry adder and the carry look-ahead adder are sufficient. However, delay increases with the number of bits because the carry must be passed from each stage to the next. Parallel prefix adders are therefore used to perform arithmetic on large numbers of bits. The primary concern in adder design is speed, followed by chip area and power consumption.
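The effect of carry propagation on delay can be illustrated by counting stages on the critical path. The sketch below is a rough model rather than a timing analysis: it assumes equal 4-bit groups, treats every full adder (FA) and every multiplexer (MUX) as one stage, and uses the hypothetical helper names `rca_stages` and `csla_stages`.

```python
from typing import Tuple


def rca_stages(n_bits: int) -> int:
    """Full-adder stages on the carry chain of an n-bit ripple carry adder."""
    return n_bits


def csla_stages(n_bits: int, group: int = 4) -> Tuple[int, int]:
    """Return (FA stages, MUX stages) on the critical path of a carry select
    adder built from equal groups of `group` bits (an assumed partition).

    All groups ripple in parallel, so only one group's carry chain counts;
    after that the carry passes through one MUX per remaining group.
    """
    if n_bits % group != 0:
        raise ValueError("n_bits must be a multiple of group")
    n_groups = n_bits // group
    return group, n_groups - 1


for n in (4, 8, 16):
    fa, mux = csla_stages(n)
    print(f"{n:2d}-bit: RCA = {rca_stages(n)} FA stages, "
          f"CSLA = {fa} FA + {mux} MUX stages")
```

Under these assumptions the carry of a 16-bit ripple carry adder crosses sixteen full adders, whereas in the carry select arrangement it crosses four full adders and three multiplexers.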



## **Related Work:**

FPGAs are susceptible to soft errors (SEs), although there are ways to protect against them or detect them. Data scrubbing (DS) is one example: the entire configuration memory of the device is scanned at regular intervals and any errors found are corrected. To replace corrupted frames, the original configuration data, or at least a portion of it, must be stored. Including ECC bits in every frame of the FPGA bitstream may not be enough, however, because ECC codes can only correct a single bit or two neighbouring bits in a frame. Bitstream protection is provided by both ECC and cyclic redundancy check (CRC) codes, but the latter are only effective for integrity checks and not for error correction. Another popular approach to improving dependability and detecting SEs is triple modular redundancy (TMR) in conjunction with a voter mechanism. Although this technique greatly extends the mean time to failure (MTTF), it also triples the area and power requirements compared with the original design.

## 2.0 LITERATURE REVIEW

Ananthakrishnan et al. [1] note that the adder is an essential element in every modern design. In this digital age the focus is on miniaturization, and the three main aspects of design, namely area, power consumption and delay, need to be balanced optimally. Because adders are a key component of complex digital circuits, improving adder performance speeds up binary operations in such circuits.

**Apoorva Raghunandan et al. [2]** state that a good VLSI design is one with a small footprint and fast operation. According to Moore's law, as the number of transistors in a chip increases, so does the overall chip area. In VLSI design it is therefore important to improve the area and delay parameters.

**Basavoju Harish et al. [3]** observe that, in Very Large-Scale Integration (VLSI) design, the adder circuit is one of the most widely used datapath architectures. With the advancement of VLSI technology, research continues into architectures that offer low power, high speed, small area, or a combination of these.

Nagaraja Revanna et al. [4] discuss the design of adders implemented with memristors. They present memristor-based designs for standard adder architectures (ripple carry adder, carry look-ahead adder and parallel prefix adder) and compare them in terms of area and delay. Surprisingly, the radix-2 CLA has the same complexity as the parallel prefix adder. The results show that, among the parallel prefix adders, the Kogge-Stone design has the best metrics for delay and area.

Krishna Vamsi et al. [5] propose an efficient carry save adder design that uses a multiplexer-based structure in place of the ripple carry adder. Because ripple carry adders are among the most common adders in use, yet suffer from long propagation delay and consume more area and energy, replacing them with the improved adder reduces power consumption and gate delay. The proposed carry save adder is designed for widths from 8-bit to 64-bit, 64-bit being the most widely used format in today's digital systems.

Shilpa K.C. et al. [6] point out that all modern processors, including microprocessors and digital signal processors, have an arithmetic logic unit (ALU), and that the computational performance of these processors depends on the performance of the ALU. The adder is the cornerstone of the ALU, which performs arithmetic and logical operations. Existing adders (such as half adders, full adders, ripple carry adders, carry skip adders and carry look-ahead adders) cannot meet the required performance goals, so the paper presents four types of adder.

C. Selsi Aulvina et al. [7] evaluate the reduction in power consumption of conventional adders at low supply voltage and analyse the difference in performance between the ripple carry adder (RCA) and the borrow save adder (BSA). The paper also considers the associated delay and the techniques available for improving the BSA. In addition to discussing power reduction in adders, it covers the concept of pipelining and the typical methods used to implement it.

## **3.0 RESEARCH METHODOLOGY**

Designing logic networks with the highest performance requires careful design of the transistor-level logic networks and circuits, the most compact possible layout of these circuits, and their manufacture. Such logic networks are realized by full-custom design. In contrast to full-custom design, semi-custom design simplifies the design and layout of transistor circuits to save expense and design time. Semi-custom approaches differ in how the design and layout of transistor circuits are simplified (e.g., repetition of small transistor subcircuits, or a less compact layout) and in how the logic design itself is simplified. In semi-custom design the designer has little control over the specification and functionality of an individual cell, but the required design time is shorter. The approach uses pre-designed, pre-tested and pre-characterized logic cells (AND gates, OR gates, multiplexers) known as standard cells. This chapter discusses the semi-custom design of the 16-bit CSLA. Verilog code is generated from schematic circuits designed in DSCH software, and the layout is then constructed from this Verilog code in Microwind software.

#### **Conventional 2×1 Multiplexer:**

Multiplexing is the generic term for sending one or more analogue or digital signals over a common transmission line at different times or speeds. The multiplexer, shortened to "MUX", is a combinational logic circuit that switches one of several input lines through to a single common output line under the control of a select signal.



Fig 1: Schematic circuit of 2×1 Multiplexer
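The selection behaviour of the 2×1 multiplexer can be captured in a few lines. The following is a minimal bit-level sketch using the standard sum-of-products form; the function name `mux2` and its argument names are illustrative and are not taken from the schematic.

```python
def mux2(d0: int, d1: int, sel: int) -> int:
    """2x1 multiplexer on single bits: returns d0 when sel = 0, d1 when sel = 1.

    Implements out = (d0 AND NOT sel) OR (d1 AND sel).
    """
    not_sel = sel ^ 1
    return (d0 & not_sel) | (d1 & sel)


# Print the full truth table.
print("sel d1 d0 | out")
for sel in (0, 1):
    for d1 in (0, 1):
        for d0 in (0, 1):
            print(f"  {sel}  {d1}  {d0} |   {mux2(d0, d1, sel)}")
```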





Figure: Layout of Conventional 2×1 Multiplexer

Table: Area, Delay and Power Dissipation of the Conventional 2×1 MUX

| Parameter         | Value          | Parameter   | Value    |
|-------------------|----------------|-------------|----------|
| Area              | $47.7 \mu m^2$ | IDD (Max)   | 0.690 mA |
| Delay             | 15 ps          | No. of NMOS | 10       |
| Power Dissipation | 0.793 μW       | No. of PMOS | 10       |

## VLSI DESIGN:

Owing to exponential advances in integration technology and large-scale systems design, the electronics industry has seen phenomenal growth over the last two decades, chiefly because of the introduction of VLSI. There has been a steady and very rapid increase in the number and applications of integrated circuits in high-performance computing, telecommunications and consumer electronics. The processing capacity required by these applications is usually the driving force behind the rapid growth of this field. Current leading-edge technologies (such as low bit-rate video and wireless communication) already provide end users with a certain amount of processing power and portability.

FPGA: A fully fabricated FPGA chip, containing thousands of logic gates or more together with programmable interconnections, is available to users for custom hardware programming to realize the desired functionality. This design style offers a means of rapid prototyping and cost-effective chip design, particularly for low-volume applications. A typical field programmable gate array (FPGA) chip contains I/O buffers, an array of configurable logic blocks (CLBs) and programmable interconnect structures. The interconnections are programmed by programming RAM cells whose output terminals are connected to the gates of MOS pass transistors.

## Gate array design

In terms of rapid prototyping capability, the gate array (GA) comes next after the FPGA. Whereas the FPGA chip is configured by user programming, the gate array is implemented by metal mask design and processing. Gate array implementation requires a two-step manufacturing process. The first phase, based on generic (standard) masks, results in an array of uncommitted transistors on each GA chip. These uncommitted chips can be stored for later customization, which is completed by defining the metal interconnections between the transistors of the array. Because the metal interconnections are patterned only at the end of chip fabrication, the turnaround time can still be short, from a few days to a couple of weeks.

#### **FPGA Design Flow Overview**

The ISE™ design flow comprises the following steps: design entry, design synthesis, design implementation and Xilinx® device programming. Design verification, which includes both functional verification and timing verification, takes place at several stages of the design flow. This section explains what has to be done in each phase.

#### **Carry Select Adder:**

The CSLA is constructed from two RCAs and a multiplexer. To add two n-bit numbers, the CSLA performs the addition twice: once with the input carry taken as zero and once, in a second adder, with the input carry taken as one. Once both results have been calculated, the multiplexer at the output selects the correct sum and the correct carry-out according to the actual carry-in.
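The select mechanism described above can be sketched as a bit-level model. The full adder below uses a common XOR-based formulation (sum = a XOR b XOR cin), which is an assumption here since the exact gate-level circuit is given only in the schematics; all function names are illustrative.

```python
def full_adder(a, b, cin):
    """XOR-based 1-bit full adder (assumed formulation)."""
    p = a ^ b                      # propagate term
    s = p ^ cin                    # sum = a XOR b XOR cin
    cout = (a & b) | (p & cin)     # carry-out
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin):
    """Ripple carry adder over LSB-first bit lists."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def mux2(d0, d1, sel):
    """2x1 multiplexer: d0 when sel = 0, d1 when sel = 1."""
    return d1 if sel else d0


def csla_block(a_bits, b_bits, cin):
    """Carry select block: two RCAs (Cin = 0 and Cin = 1) plus output MUXes."""
    sum0, c0 = ripple_carry_add(a_bits, b_bits, 0)
    sum1, c1 = ripple_carry_add(a_bits, b_bits, 1)
    sums = [mux2(s0, s1, cin) for s0, s1 in zip(sum0, sum1)]
    return sums, mux2(c0, c1, cin)


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def add4(a, b, cin):
    """Convenience wrapper: 4-bit carry select addition on integers."""
    sums, cout = csla_block(to_bits(a, 4), to_bits(b, 4), cin)
    return from_bits(sums), cout


print(add4(9, 7, 1))   # (4-bit sum, carry-out)
```

Both candidate results exist before the real carry-in arrives, so the carry-in only has to pass through the multiplexers rather than through the full-adder chain.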



Fig: FPGA design flow overview

#### **XOR Based 4-bit CSLA:**

The 4-bit CSLA consists of two 4-bit RCAs. In one RCA the Cin bit is taken as zero and in the other it is taken as one. When the addition is complete, the MUX selects the correct sum and Cout from one of the two RCAs according to the actual Cin. The schematic circuit, semi-custom layout and input-output waveforms of the 4-bit CSLA are shown in the accompanying figures.



Fig. Layout of the XOR based 4-bit CSLA

#### **XOR Based 8-bit CSLA:**

The 8-bit CSLA consists of three 4-bit RCAs and five 2×1 MUXes. As shown in the figure, the 8-bit CSLA is divided into two groups: the first group is a 4-bit RCA and the second group is a 4-bit CSLA. The semi-custom layout and the input-output waveforms of the XOR based 8-bit CSLA are shown in the accompanying figures.
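The grouping can be generalized to wider adders. The sketch below is a behavioural model, not the authors' Verilog: it assumes that the first 4-bit group is the least significant one and receives the external carry-in, that every later group is a 4-bit CSLA with four sum MUXes and one carry MUX, and it counts the RCAs and MUXes this arrangement needs. The name `csla_add` is hypothetical.

```python
GROUP = 4  # bits per group, as in the 4-bit RCA building block


def csla_add(a: int, b: int, width: int, cin: int = 0):
    """Behavioural carry select adder built from 4-bit groups.

    Returns (sum, carry_out, rca_count, mux_count).
    """
    mask = (1 << GROUP) - 1
    carry, total = cin, 0
    rca_count = mux_count = 0
    for shift in range(0, width, GROUP):
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        if shift == 0:
            # First group: a single RCA driven by the real carry-in.
            result = x + y + carry
            rca_count += 1
        else:
            # Later groups: two RCAs (Cin = 0 and Cin = 1), then select.
            result0 = x + y
            result1 = x + y + 1
            result = result1 if carry else result0
            rca_count += 2
            mux_count += GROUP + 1   # four sum MUXes + one carry MUX
        total |= (result & mask) << shift
        carry = result >> GROUP
    return total, carry, rca_count, mux_count


for width in (8, 16):
    _, _, rcas, muxes = csla_add(0, 0, width)
    print(f"{width}-bit CSLA: {rcas} RCAs, {muxes} MUXes")
```

With these assumptions the model reproduces the component counts quoted in the text: three RCAs and five MUXes for 8 bits, and seven RCAs and fifteen MUXes for 16 bits.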



Figure: Layout of XOR based 8-bit CSLA



Figure: Layout of XOR based 32-bit CSLA

#### **XOR Based 16-bit CSLA:**

The 16-bit CSLA consists of seven 4-bit RCAs and fifteen 2×1 MUXes. As shown in Fig. 3.19, the 16-bit CSLA has one 4-bit RCA and three 4-bit CSLAs. The semi-custom layout and the input-output waveforms of the XOR based 16-bit CSLA are shown in the accompanying figures.

## **4.0 RESULTS**

**Comparison of 16-bit CSLA:** The performance analysis of the conventional, semi-custom and full custom 16-bit CSLA is given in the table below, and the graphical representation of area, delay and power is given in the figure that follows.

Table: Performance analysis of Conventional, Semi-Custom and Full Custom 16-bit CSLA

| Parameters       | Conv.  | Semi-Custom | Full Custom |
|------------------|--------|-------------|-------------|
| Area $(\mu m^2)$ | 8046.8 | 9732.4      | 1263.5      |
| Power (mW)       | 2.803  | 1.029       | 0.823       |
| Delay (ps)       | 140    | 136         | 61          |
| NMOS             | 465    | 297         | 285         |
| PMOS             | 465    | 297         | 285         |
| Total Gate Count | 930    | 594         | 570         |








Fig: Area, Power and Delay comparison of 16-bit CSLA

For the XOR based 16-bit CSLA, the full custom design achieves an 84.3% reduction in area, 70.6% in power and 56.5% in delay over the conventional design, and an 87.0% reduction in area, 20.1% in power and 55.2% in delay over the semi-custom design.
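These percentages follow from the relation reduction = (reference − full custom) / reference × 100. The short sketch below recomputes them from the table values, taking the third value column of the table as the full custom results (an assumption consistent with the percentages quoted above); the dictionary keys and the helper `reduction_pct` are illustrative names.

```python
# Values copied from the 16-bit CSLA performance table.
TABLE = {
    "area_um2": {"conv": 8046.8, "semi": 9732.4, "full": 1263.5},
    "power_mW": {"conv": 2.803, "semi": 1.029, "full": 0.823},
    "delay_ps": {"conv": 140.0, "semi": 136.0, "full": 61.0},
}


def reduction_pct(reference: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `reference`."""
    return 100.0 * (reference - improved) / reference


# For each metric: (reduction vs conventional, reduction vs semi-custom).
results = {
    name: (reduction_pct(v["conv"], v["full"]),
           reduction_pct(v["semi"], v["full"]))
    for name, v in TABLE.items()
}

for name, (vs_conv, vs_semi) in results.items():
    print(f"{name}: {vs_conv:.2f}% vs conventional, "
          f"{vs_semi:.2f}% vs semi-custom")
```

Recomputed from the rounded table entries, each figure agrees with the quoted percentage to within about 0.1 percentage point.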

Anveshana's International Journal of Research in Engineering and Applied Sciences. Email: anveshanaindia@gmail.com, Website: www.anveshanaindia.com

#### FPGA Implementation of 4-bit CSLA:



Fig. Simulation waveform results of 4bit CSLA (Binary)



Fig. Power analysis of 4-bit CSLA

#### FPGA Implementation of 8-bit CSLA:



Fig. Simulation waveform results of 8bit CSLA (Binary)



Fig. Power analysis of 8-bit CSLA

#### FPGA Implementation of 16-bit CSLA:



Fig: Simulation waveform results of 16-bit CSLA (Binary)



Fig: Synthesis summary of 16-bit CSLA

Table: Summary of all the results obtained from the synthesis and power analysis using Quartus II software

| Parameter                                  | Full Adder | 4-bit CSLA | 8-bit CSLA | 16-bit CSLA |
|--------------------------------------------|------------|------------|------------|-------------|
| Total pins                                 | 5          | 14         | 28         | 56          |
| Total Logic Elements                       | 2          | 8          | 24         | 56          |
| Total Thermal Power Dissipation (mW)       | 115.88     | 116.721    | 118.04     | 120.67      |
| Core Static Thermal Power Dissipation (mW) | 99.09      | 99.09      | 99.10      | 99.10       |
| I/O Thermal Power Dissipation (mW)         | 16.79      | 17.63      | 18.94      | 21.57       |
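The power figures in the table can be cross-checked with simple arithmetic. The sketch below assumes the four value columns correspond to the full adder and the 4-bit, 8-bit and 16-bit CSLAs, with wrapped cells joined (for example 115.8 and 8 read as 115.88); it computes what remains of the total after subtracting the core static and I/O components, and the share of the total that is static. The names `REPORT` and `breakdown` are illustrative.

```python
# (total, core static, I/O) thermal power in mW, read from the table.
REPORT = {
    "full adder":  (115.88, 99.09, 16.79),
    "4-bit CSLA":  (116.721, 99.09, 17.63),
    "8-bit CSLA":  (118.04, 99.10, 18.94),
    "16-bit CSLA": (120.67, 99.10, 21.57),
}


def breakdown(total: float, static: float, io: float):
    """Return (residual power in mW, static share of total in percent)."""
    residual = total - static - io
    static_share = 100.0 * static / total
    return residual, static_share


for design, values in REPORT.items():
    residual, share = breakdown(*values)
    print(f"{design}: residual = {residual:.3f} mW, "
          f"static share = {share:.1f}%")
```

On these readings the static and I/O components account for essentially the whole total, and the growth in total power from the full adder to the 16-bit CSLA comes almost entirely from the I/O term.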

## CONCLUSION

General-purpose DSP implementations often lack the performance necessary for moderate sampling rates, and ASIC approaches are limited in flexibility and may not be cost effective. A Verilog implementation of FPGA-based digital filters produces appreciable results because of benefits such as low power consumption, higher efficiency and higher speed. A 16-bit CSLA is implemented here using an XOR-based 1-bit full adder as the building block. The schematic has been designed in DSCH software and synthesized using 90 nm CMOS technology. The layout has been created and simulated in Microwind software. The designs have been compared in terms of area, delay and power dissipation, and the performance analysis, simulation results and comparison are reported. From the simulation results of the 2×1 MUX, the full custom design achieves a 94.07% reduction in power consumption over the conventional design and 84.89% over the semi-custom design. The area of the full custom design is 85.12% less than that of the conventional design and 92% less than that of the semi-custom design. For the FPGA implementation, the Verilog code was first simulated in ModelSim; once the simulation results were correct, synthesis and power analysis were carried out in Quartus II and the design was implemented on an Altera DE2-115 FPGA board. By applying arbitrary inputs, we verified that the implemented hardware operated correctly.

### **FUTURE WORK:**

The designs can be further developed for higher bit widths and can be implemented on both FPGA and ASIC platforms. Different tree adders, and the technologies used to implement them, can also be combined. In future work, a CSLA that offers both low area and low delay is needed to meet the requirements of the current VLSI industry. Further, this work can be extended by designing and simulating adders with larger bit widths such as 32-bit, 64-bit and 128-bit.

#### REFERENCES

1. Ananthakrishnan; Anaswar Ajit S., "FPGA Based Performance Comparison of Different Basic Adder Topologies with Parallel Processing Adder," 2019 3rd International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 2019. IEEE.

2. Apoorva Raghunandan; H V Ravish Aradhya, "Area and Timing Analysis of Advanced Adders under changing Technologies," 2019 4th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), Bangalore, India, 2019. IEEE.

3. Basavoju Harish; K. Sivani; M.S.S. Rukmini, "Design and Performance Comparison among Various types of Adder Topologies," 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 2019. IEEE.

4. Nagaraja Revanna; Earl E. Swartzlander, "Memristor Adder Design," 2018 IEEE 61st International Midwest Symposium on Circuits and Systems (MWSCAS), Windsor, ON, Canada, 2018. IEEE.

5. A Krishna Vamsi; N Udaya Kumar; K Bala Sindhuri; G Sai Chandra Teja, "A Systematic Delay and Power Dominant Carry Save Adder Design," 2018 International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 2018. IEEE.

6. Shilpa K.C.; Shwetha M.; Geetha B.C.; Lohitha D.M.; Navya; Pramod N.V., "Performance Analysis of Parallel Prefix Adder for Datapath VLSI Design," 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), Coimbatore, India, 2018. IEEE.

7. C. Selsi Aulvina; R. Kabilan, "Low Power and Area Efficient Borrow Save Adder Design," 2018 International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 2018. IEEE.