# Low Power Dissipation in BIST Schemes for Modified Booth Multipliers

D. Bakalis<sup>1,2</sup>, H. T. Vergos<sup>1,2</sup>, D. Nikolos<sup>1,2</sup>, X. Kavousianos<sup>1</sup> & G. Ph. Alexiou<sup>1,2</sup>

<sup>1</sup>Dept. of Comp. Engineering & Informatics, University of Patras, 26 500, Rio, Patras, Greece <sup>2</sup>Computer Technology Institute, 3, Kolokotroni Str., 262 21 Patras, Greece

e-mail: {bakalis, vergos, nikolosd, alexiou}@cti.gr, kabousia@ceid.upatras.gr

### Abstract

Aiming at low power dissipation during testing, we present in this paper a methodology for deriving a novel BIST scheme for Modified Booth Multipliers. Reduction of the power dissipation is achieved by: (a) introducing a suitable Test Pattern Generator (TPG) built of a 4-bit binary and a 4-bit Gray counter, (b) properly assigning the TPG outputs to the multiplier inputs and (c) significantly reducing the test set length. The achieved reduction of the total power dissipation ranges from 44.1% to 54.9%, the average reduction per test vector from 21.4% to 36.5% and the reduction of the peaks from 15.8% to 34.3%, depending on the implementation of the basic cells and the size of the MBM. The test application time is also reduced by 28.9%, while the implementation overhead of the introduced BIST scheme is very small.

### 1. Introduction

The ever-increasing trend towards denser and faster ICs has resulted in embedded logic blocks with low controllability and observability that need to be tested at speed in order for the whole chip to become a viable product. BIST structures are well suited for testing such blocks, since they can cut down the cost of testing by eliminating the need of external testing for every embedded logic block as well as apply the test vectors at speed.

The main objectives of BIST designers have traditionally been high fault coverage, small area overhead and small application time. While these objectives still remain important, a new BIST design objective, namely low power dissipation during test application, has recently emerged [1-5], and is expected to become one of the major objectives in the near future [6].

The power dissipated during test application is an important factor because of:

- a) Cost issues. Consumer electronic products typically require a plastic package which imposes a strong limit on the energy dissipated. Excessive dissipation during testing may also prevent periodic testing of battery operated systems that use an on-line testing strategy.
- b) Reliability issues. Although there is a significant correlation between consecutive vectors applied to a circuit during its normal operation, the correlation between consecutive test vectors is significantly lower. Therefore the switching activity in the circuit can be significantly higher during testing than that during its normal operation [2]. The latter may cause a circuit under test to be permanently damaged due to excessive heat dissipation or give rise to metal migration (electromigration) that causes the erosion of conductors and leads to subsequent failure of circuits [7].
- c) Technology related issues. The multi-chip module (MCM) technology, which is becoming highly popular, requires sophisticated probing of bare dies for fully testing them [8]. Absence of packaging of these bare dies precludes the traditional heat removal techniques. In such cases, power dissipated during testing can adversely affect the overall yield, increasing the production cost.

A more detailed presentation of the motivations for low power dissipation during test application can be found in [9].

In [9] a modified PODEM was presented which derives a test set with reduced switching activity between consecutive test vectors, aiming at the reduction of power dissipation during testing. A BIST technique for reducing switching activity, based on the use of two LFSR TPGs operating at different speeds, has been presented in [2]. A method for synthesizing a counter that reproduces on chip a set of pre-computed test patterns, derived for hard-to-detect faults, so that the total heat dissipation is minimized, is described in [3]. However, a test set targeting the hard-to-detect faults of a circuit C has some characteristics not available to a test set targeting all faults of C. In a BIST scheme some vectors generated by the TPG circuit are not useful for testing purposes. A technique for LFSR TPGs that inhibits such consecutive test vectors, by the use of a three-state buffer and the associated control logic, was proposed in [5]. The drawbacks of this method are that it fails to reduce test application time and suffers from high implementation cost.

The above mentioned techniques try to solve the general problem. However, there are cases where a more efficient low power BIST scheme can be obtained by exploiting the inherent properties of a class of circuits. One such circuit is the multiplier. Multipliers are found in almost all contemporary general and special purpose processors. An effective low power BIST scheme for Carry Save Array Multipliers has been proposed in [4].

To the best of our knowledge, no BIST scheme for Modified Booth Multipliers (MBMs) that also targets low power dissipation during test application has been proposed in the open literature. In this paper we address this problem by introducing a novel BIST scheme for MBMs with sign generate. We consider MBMs with the final stage implemented both as: (a) a ripple carry adder and (b) a group carry look ahead adder with ripple carry between groups. The notation RC-MBMs and CL-MBMs will be used for cases (a) and (b) respectively. For the RC-MBMs the cell fault model [10] is used. The cell fault model is also used for all other modules of the CL-MBMs except the carry look ahead adder, where single stuck-at faults are considered.

The rest of the paper is organized as follows: Preliminaries with respect to MBM and low power are given respectively in Sections 2.1 and 2.2. The assignment of the TPG outputs to the multiplier inputs is addressed in Section 3. In Section 4 we introduce a new TPG. In the same Section, we also discuss the power dissipation characteristics of the proposed BIST scheme.

### 2. Preliminaries

### 2.1. MBM and Built-In Self Testing

Array multipliers implementing the modified Booth algorithm with 2-bit recoding feature regularity, short execution time and small area compared to other implementations of multipliers for signed multiplication [11]. We consider nxn MBMs (n=2<sup>k</sup>), with sign generate. An nxn MBM is a combinational circuit with inputs  $a_0a_1...a_{n-1}$ ,  $b_0b_1...b_{n-1}$  and outputs  $p_0p_1...p_{2n-1}$ . Figure 1 presents the 8 x 8 MBM. An nxn MBM is composed of: i) r-cells, ii) ps-cells, iii) l\_ps-cells (the leftmost cell in a ps-cell row), iv) r\_ps-cells (the rightmost cell in a ps-cell row), v) full adders, vi) half adders, vii) 2-input OR gates and viii) the final 2n-bit result forming adder.



Figure 2. BIST circuit

C-testable MBM designs have been proposed in the past for the cell fault model [12] as well as for stuck-at, transistor stuck-open and stuck-closed faults [13]. A BIST scheme, under the cell fault model, for RC-MBMs was proposed in [14]. Unfortunately in [14] neither CL-MBMs nor the low power dissipation objective were considered. The Test Pattern Generator (TPG) circuit of [14] is an 8-bit counter that goes through all of its 256 states (see Figure 2). During testing, the low nibble of the TPG outputs is used repeatedly to form the multiplier input A while the high nibble is used repeatedly to form the multiplier input B. During application of the 256 vectors, all cells of the MBM are exhaustively tested with all their input combinations, except for a few that do not receive all possible input combinations. Multiplexers are used to select between normal inputs and BIST inputs. An accumulator with rotate carry [15] or multiple rotate carry adders [14] is used for Output Data Compaction (ODC). The test length was later reduced to 225 vectors by avoiding the all 0's patterns in any nibble of the counter TPG [16].

In this paper, aiming at low power dissipation during testing and starting from the TPG given in [14, 16], we present the methodology for introducing a new TPG. The latter achieves both lower power dissipation and shorter test application time, without affecting the fault coverage.

### 2.2. Low Power

Charging and discharging of capacitance is the dominant factor of power dissipation (denoted by P) in full static CMOS circuits [17], today's dominant technology. It has been reported (p.8 of [17]) that in high frequency CMOS circuits this accounts for at least 90% of the total power dissipation. Denoting the power supply voltage by V<sub>dd</sub>, the load capacitance at line l by C<sub>l</sub>, and the total number of transitions at line l by T(l), P can be formulated as:  $P = (1/2)V_{dd}^2 \sum_{l} C_l T(l)$  (1)
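As a rough illustration of relation (1), the following Python sketch evaluates the expression from per-line capacitances and transition counts. The function name, the line names and all numeric values are hypothetical and serve only to show how the sum is formed.

```python
def switching_power(vdd, capacitance, transitions):
    """Relation (1): P = 1/2 * Vdd^2 * sum over lines l of C_l * T(l)."""
    return 0.5 * vdd ** 2 * sum(
        capacitance[line] * transitions[line] for line in capacitance
    )

# Hypothetical circuit with three lines; capacitances in arbitrary units.
caps = {"l1": 1.0, "l2": 2.0, "l3": 1.0}
toggles = {"l1": 4, "l2": 1, "l3": 3}
power = switching_power(2.0, caps, toggles)
```

Halving the transition count on any line lowers the result in proportion to that line's capacitance, which is the lever the rest of the paper relies on.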

It is evident that the power dissipation can be reduced by reducing T(l). By reducing the number of transitions at the primary inputs of the circuit it is expected that the total number of transitions at the lines of the circuit will also be reduced, leading to lower power dissipation. However, depending on the circuit structure, the transitions at some primary inputs cause more transitions at internal lines than those at other primary inputs. A procedure has been presented in [2, 3] for identifying those primary inputs that cause more transitions at internal lines. Let f(l) denote the function of line l, and  $\frac{\partial f(l)}{\partial in_i}$  the Boolean difference of f(l) with respect to input in<sub>i</sub>. Let  $f(l)_{in_i}$  (respectively  $f(l)_{in_i'}$ ) denote the cofactor of f(l) with respect to input variable  $in_i$  (respectively  $in_i'$ ) and  $\oplus$  be the XOR operator. The Boolean difference is precisely:  $\frac{\partial f(l)}{\partial in_i} = f(l)_{in_i} \oplus f(l)_{in_i'}$  (2)
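As a small worked instance of relation (2), consider the carry output of a full adder, $c = ab + a c_{in} + b c_{in}$. This function is chosen here only for illustration and is not one of the MBM cells analysed in the paper. Taking the cofactors with respect to input $a$ gives:

$$c_{a} = b + c_{in}, \qquad c_{a'} = b\,c_{in}, \qquad \frac{\partial c}{\partial a} = (b + c_{in}) \oplus (b\,c_{in}) = b \oplus c_{in}$$

A transition on $a$ therefore propagates to the carry line exactly when $b$ and $c_{in}$ differ.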

Let  $P(\frac{\partial f(l)}{\partial in_i})$  denote the probability that function  $\frac{\partial f(l)}{\partial in_i}$  evaluates to 1. The power dissipation is then estimated as:  $P = (1/2)V_{dd}^2 \sum_{l} C_l \sum_{i} P(\frac{\partial f(l)}{\partial in_i})T(in_i)$  (3)

Equation (3) shows that the total power dissipation of a circuit can be reduced by reducing the total number of transitions on inputs. Once the probability  $P(\frac{\partial f(l)}{\partial in_i})$  is computed, a weight is assigned to every input  $in_i$ :  $w(in_i) = \sum_{l} C_l P(\frac{\partial f(l)}{\partial in_i})$  (4)

Weights  $w(in_i)$  are a good metric of how many lines of the circuit, weighted by the associated capacitance, are affected by input  $in_i$ .
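The weight computation of relation (4) can be sketched in Python for a toy circuit. The sketch below makes several assumptions that are not taken from the paper: the circuit is a single full adder with only its two output lines modelled, both lines have unit capacitance, and the probability is evaluated by exhaustive enumeration under uniformly random, independent input values. All function and variable names are hypothetical.

```python
from itertools import product

def diff_probability(f, n_inputs, i):
    """Probability that the Boolean difference of f w.r.t. input i is 1,
    assuming uniform, independent values on the remaining inputs."""
    hits = 0
    for rest in product((0, 1), repeat=n_inputs - 1):
        hi = rest[:i] + (1,) + rest[i:]   # cofactor with in_i = 1
        lo = rest[:i] + (0,) + rest[i:]   # cofactor with in_i = 0
        hits += f(*hi) ^ f(*lo)
    return hits / 2 ** (n_inputs - 1)

def input_weights(lines, n_inputs):
    """Relation (4): w(in_i) = sum over lines l of C_l * P(df(l)/d in_i).

    `lines` maps a line name to (capacitance, function of the inputs)."""
    return [
        sum(c * diff_probability(f, n_inputs, i) for c, f in lines.values())
        for i in range(n_inputs)
    ]

# Toy circuit: full adder outputs with unit capacitance (assumed values).
full_adder = {
    "sum": (1.0, lambda a, b, cin: a ^ b ^ cin),
    "carry": (1.0, lambda a, b, cin: (a & b) | (a & cin) | (b & cin)),
}
weights = input_weights(full_adder, 3)
```

For this symmetric toy circuit all three inputs receive the same weight. In an MBM the weights differ from input to input, as Table 1 shows, which is what makes the assignment of TPG outputs matter.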

Relation (3) implies that power dissipation can be reduced by cutting down the number of transitions at the inputs of the circuit. The reduction is larger when the number of transitions at the inputs with greater weights is reduced. Therefore, the assignment of the TPG outputs to the circuit inputs is very significant. Also since the vectors of a test set are distinct, the reduction of the cardinality of the test set will reduce the number of transitions and thus the power dissipation.

### 3. Assignment of the TPG outputs to the multiplier inputs

In this section, we address the problem of properly assigning the TPG outputs to the multiplier inputs for achieving low power dissipation. Although we consider the cell fault model, two reasons force us to take into account specific implementations of the cells: a) the error aliasing calculation of the ODC circuit and b) the estimation of the power dissipation during testing.

Our aim is for the proposed BIST scheme to be effective regardless of the specific cell implementation; therefore the cell fault model was chosen. To this end, the analysis of the MBM that will lead us to the new BIST scheme, as well as its evaluation, must be based on more than one implementation of the adder cells. Hence, we consider three distinct implementations of the half and full adder cells, presented respectively in [18, 7, 19]. We will refer to these implementations as Cell 1, Cell 2 and Cell 3 respectively. The same implementations were used for the adders of the ripple carry adder at the last stage of the MBM. The implementations considered for the r-cells, the ps-cells, the r\_ps-cell and the l\_ps-cell were taken from [20]. The group carry look ahead circuit considered in the case of CL-MBMs was the one presented in [21].

|        | B input weights                                                                  |                    |                    |                    |                    |                    |                    |                    | A input weights    |                    |                    |                    |                    |                    |                    |                    |
|--------|----------------------------------------------------------------------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
|        | w(b <sub>7</sub> )                                                               | w(b <sub>6</sub> ) | w(b <sub>5</sub> ) | w(b <sub>4</sub> ) | w(b <sub>3</sub> ) | w(b <sub>2</sub> ) | w(b <sub>1</sub> ) | w(b <sub>0</sub> ) | w(a <sub>7</sub> ) | w(a <sub>6</sub> ) | w(a <sub>5</sub> ) | w(a <sub>4</sub> ) | w(a <sub>3</sub> ) | w(a <sub>2</sub> ) | w(a <sub>1</sub> ) | w(a <sub>0</sub> ) |
|        | Modified Booth Multiplier – Ripple Carry Adder as the Final Result Forming Adder |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |
| Cell 1 | 57                                                                               | 56                 | 94                 | 78                 | 105                | 86                 | 102                | 81                 | 73                 | 74                 | 78                 | 76                 | 74                 | 69                 | 63                 | 57                 |
| Cell 2 | 97                                                                               | 96                 | 156                | 140                | 182                | 160                | 183                | 154                | 128                | 130                | 140                | 140                | 134                | 128                | 117                | 106                |
| Cell 3 | 89                                                                               | 88                 | 144                | 127                | 167                | 144                | 163                | 136                | 117                | 119                | 127                | 126                | 120                | 113                | 102                | 91                 |
|        | Modified Booth Multiplier – Group Carry Look Ahead as the Final Result Forming Adder |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |                    |
| Cell 1 | 68                                                                               | 68                 | 107                | 90                 | 119                | 99                 | 115                | 92                 | 84                 | 86                 | 90                 | 88                 | 84                 | 80                 | 73                 | 66                 |
| Cell 2 | 86                                                                               | 85                 | 142                | 125                | 163                | 140                | 160                | 129                | 113                | 119                | 127                | 124                | 115                | 106                | 93                 | 79                 |
| Cell 3 | 83                                                                               | 83                 | 135                | 118                | 156                | 133                | 149                | 121                | 109                | 113                | 120                | 118                | 110                | 102                | 90                 | 77                 |

Table 1: Weights of the 8x8 multiplier inputs

The primary inputs weights for MBMs of various sizes for each of the possible cells were computed using relation (4). Table 1 lists the weights for the 8x8 MBM inputs for all the cells considered and indicates that the distribution of weights is independent of the specific full and half adder cells. Comparing any possible pair of inputs, the one with the larger weight contributes more than the other to the power dissipation. Similar distribution of weights has also been observed in the larger MBMs. Hence, the same conclusions are also valid for the larger MBMs.

From Table 1 we can easily see that the sum of weights of B inputs is greater than the sum of weights of A inputs. Therefore, the 4 most significant outputs of the TPG should drive the B inputs while its 4 least significant outputs should drive the A inputs.

|        | Sum of weights for input B |  |  |  | Sum of weights for input A |  |  |  |
|--------|------|------|------|------|------|------|------|------|
|        | $\sum_{i=0}^{1} w(b_{4i+3})$ | $\sum_{i=0}^{1} w(b_{4i+2})$ | $\sum_{i=0}^{1} w(b_{4i+1})$ | $\sum_{i=0}^{1} w(b_{4i})$ | $\sum_{i=0}^{1} w(a_{4i+3})$ | $\sum_{i=0}^{1} w(a_{4i+2})$ | $\sum_{i=0}^{1} w(a_{4i+1})$ | $\sum_{i=0}^{1} w(a_{4i})$ |
|        | Modified Booth Multiplier - Ripple Carry Adder as the Final Result Forming Adder |  |  |  |  |  |  |  |
| Cell 1 | 162 | 142 | 196 | 158 | 147 | 143 | 141 | 133 |
| Cell 2 | 278 | 256 | 339 | 294 | 262 | 257 | 256 | 246 |
| Cell 3 | 256 | 232 | 307 | 262 | 237 | 232 | 229 | 217 |
|        | Modified Booth Multiplier - Group Carry Look Ahead as the Final Result Forming Adder |  |  |  |  |  |  |  |
| Cell 1 | 187 | 166 | 222 | 182 | 168 | 166 | 163 | 153 |
| Cell 2 | 249 | 225 | 301 | 254 | 228 | 225 | 219 | 202 |
| Cell 3 | 238 | 214 | 284 | 239 | 219 | 215 | 210 | 194 |

 Table 2: Sum of weights of the 8x8 MBM inputs

The next step is to assign the low nibble of the TPG  $(c_3c_2c_1c_0)$  to specific inputs  $a_i$ , with i = 0, 1, ..., n-1. Since this nibble is repeatedly assigned to the A multiplier inputs, we sum the weights of the inputs that receive the same TPG output bit. The results for the 8x8 MBM are listed in Table 2. Larger multipliers present similar behavior. For maximum reduction of the number of transitions, the signals with the fewest transitions should be assigned to the inputs with the largest sum of weights. Therefore we assign the TPG output bit having the most transitions (that is,  $c_0$ ) to the inputs with the smallest sum of weights (that is,  $a_{4i}$ , with i=0, 1, 2, ...). The remaining bits of the low nibble of the counter are assigned in the same way. The number of transitions at the primary inputs of the MBM can be reduced further by using a Gray counter instead of a binary counter as the TPG. We therefore decided to use a Gray counter.
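The assignment rule and the benefit of Gray counting can be checked with a short Python sketch. It counts per-bit transitions over a single pass through the 16 states of a 4-bit counter (ignoring the wrap-around, which is a simplification) and then pairs the most active bit with the lightest input group, using the A-input sums of Table 2 for the RC-MBM with Cell 1. The helper names are hypothetical.

```python
def transitions_per_bit(codes, width):
    """Count the value changes on each bit position over the sequence."""
    counts = [0] * width
    for prev, cur in zip(codes, codes[1:]):
        changed = prev ^ cur
        for bit in range(width):
            counts[bit] += (changed >> bit) & 1
    return counts

binary = list(range(16))
gray = [v ^ (v >> 1) for v in binary]      # reflected binary Gray code

binary_toggles = transitions_per_bit(binary, 4)   # index j stands for c_j
gray_toggles = transitions_per_bit(gray, 4)

# Table 2, RC-MBM, Cell 1: summed weights of a_{4i}, a_{4i+1}, a_{4i+2}, a_{4i+3}.
a_weight_sums = [133, 141, 143, 147]

# Pair the busiest TPG bit with the lightest input group, and so on.
bits_by_activity = sorted(range(4), key=lambda b: -gray_toggles[b])
groups_by_weight = sorted(range(4), key=lambda g: a_weight_sums[g])
assignment = dict(zip(bits_by_activity, groups_by_weight))
```

Under these assumptions the binary counter makes 26 bit transitions per pass against 15 for the Gray counter, and the resulting mapping sends $c_j$ to $a_{4i+j}$.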

To verify the above analysis, we used the gate level power simulator developed in [4]. The power simulator estimates the power dissipation of the whole circuit consisting of the MBM and the BIST circuitry. Table 3 presents the simulation results. The first and second columns list the MBM size and the cell implementation used, respectively. We assume a reference architecture [16] in which the test set consists of 225 vectors, the bits  $c_3c_2c_1c_0$  are generated by a binary counter and the assignment of its output lines to the MBM inputs A is given by the relations  $c_3=a_{4i}$ ,  $c_2=a_{4i+1}$ ,  $c_1=a_{4i+2}$ ,  $c_0=a_{4i+3}$ , with i = 0,1,2,... The following columns of Table 3 present the power reduction percentage achieved over the reference architecture, when a binary or a Gray counter is used for the production of the bits  $c_3$ ,  $c_2$ ,  $c_1$  and  $c_0$ , and for three different assignments of  $c_3c_2c_1c_0$  to  $a_{n-1}a_{n-2}...a_0$ . In all cases the B inputs of the MBM are driven by a binary counter with output bits  $c_7, c_6, c_5$  and  $c_4$  according to the assignment:  $c_7=b_{4i+3}$ ,  $c_6=b_{4i+2}$ ,  $c_5=b_{4i+1}$ ,  $c_4=b_{4i}$ , with i = 0, 1, ... From Table 3 we can easily see that the maximum power reduction is achieved by using a Gray counter and by assigning its outputs with the most transitions to the inputs that have the smallest sum of weights (columns five and eight).

| Multiplier | Cell   | MBM with RCA   |                |              | MBM with CLA   |              |              |  |  |
|------------|--------|----------------|----------------|--------------|----------------|--------------|--------------|--|--|
|            |        | Binary counter | Gray counter   | Gray counter | Binary counter | Gray counter | Gray counter |  |  |
|            |        | Assignment A*  | Assignment B** | Assignment A | Assignment A   | Assignment B | Assignment A |  |  |
|            | Cell 1 | 11.5           | 28.5           | 35.3         | 10.6           | 27.6         | 33.9         |  |  |
| 8x8        | Cell 2 | 11.1           | 27.8           | 34.3         | 10.4           | 27.6         | 33.6         |  |  |
|            | Cell 3 | 11.1           | 26.6           | 33.2         | 10.8           | 27.0         | 33.4         |  |  |
|            | Cell 1 | 4.3            | 27.2           | 30.2         | 3.6            | 26.2         | 29.1         |  |  |
| 16x16      | Cell 2 | 2.6            | 25.3           | 28.0         | 2.2            | 25.3         | 27.8         |  |  |
|            | Cell 3 | 4.0            | 24.6           | 27.5         | 3.6            | 24.7         | 27.5         |  |  |
|            | Cell 1 | 1.5            | 24.9           | 26.5         | 1.3            | 24.4         | 25.9         |  |  |
| 32x32      | Cell 2 | 0.2            | 22.9           | 24.3         | 0.1            | 22.9         | 24.2         |  |  |
|            | Cell 3 | 1.4            | 21.9           | 23.4         | 1.3            | 22.0         | 23.5         |  |  |
|            | Cell 1 | 1.3            | 24.0           | 25.0         | 1.2            | 23.7         | 24.7         |  |  |
| 64x64      | Cell 2 | 0.7            | 22.0           | 23.1         | 0.6            | 22.0         | 23.0         |  |  |
|            | Cell 3 | 1.4            | 20.9           | 21.9         | 1.4            | 20.9         | 21.9         |  |  |

 Table 3: Power reduction percentage for 3 different assignments of the A inputs

<sup>\*</sup>Assignment A:  $c_3=a_{4i+3}$ ,  $c_2=a_{4i+2}$ ,  $c_1=a_{4i+1}$ ,  $c_0=a_{4i}$ ,  $i=0,1,2, \dots$

<sup>\*\*</sup>Assignment B:  $c_3=a_{4i}$ ,  $c_2=a_{4i+1}$ ,  $c_1=a_{4i+2}$ ,  $c_0=a_{4i+3}$ ,  $i=0,1,2, \dots$

The above procedure can also be performed for the high nibble of the counter ( $c_7c_6c_5c_4$ ) and the inputs B of the MBM. The weights for the  $b_i$  inputs of the 8x8 MBM are listed in Table 1. In this case though, the power reduction achieved would be far smaller since: a) the bits of the high nibble have far fewer transitions than those of the low nibble and b) three of the four bits of the high nibble are applied to each row of the MBM, while the fourth is applied to the subsequent row; therefore choosing the three with the smallest number of transitions for a certain row will lead to the subsequent row getting the one with the largest number of transitions. Our power simulator confirmed this intuition, producing negligible power dissipation differences for distinct assignments. Hence, taking into account that the hardware overhead of a binary counter is slightly smaller than that of a Gray counter, we decided to use a binary counter for producing  $c_7$ ,  $c_6$ ,  $c_5$  and  $c_4$ . The assignment chosen was  $c_7=b_{4i+3}$ ,  $c_6=b_{4i+2}$ ,  $c_5=b_{4i+1}$ ,  $c_4=b_{4i}$ , for i=0,1,2,...

### 4. Test Length Reduction

Another way of reducing the power dissipated is to reduce the number of vectors applied to the circuit under test. To determine whether all 256 vectors produced by the TPG proposed in [14] are necessary for providing all the possible input combinations to the inputs of the MBM cells, we have developed a cell fault simulator. Recall that in [16] the all 0's vectors in any of the nibbles of the TPG were removed, leading to a TPG producing only 225 vectors. Using this simulator, and starting from the 256 vector TPG, we verified that the values  $c_7c_6c_5c_4 = 0000$ , 0010, 0101, 0111, 1001 and 1111 are redundant. The remaining values of  $c_7c_6c_5c_4$  are capable of applying to every cell of the MBM the same input combinations as those applied when  $c_7c_6c_5c_4$  take all their possible values. The above was verified for all realistic MBM sizes (with operand lengths of 8, 16, 32, 64 and 128 bits).

Therefore, 96 out of the 256 vectors that the TPG of [14] applies to the MBM, can be removed. A circuit that only produces the 160 necessary counter states can be easily synthesized. The circuit is initialized to state 0001 0000 and at every cycle, its low nibble counts in Gray code whereas its high nibble in straight binary omitting unnecessary values.
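A behavioural Python sketch of such a TPG is given below. It assumes that the high nibble advances once per full 16-step Gray cycle of the low nibble; the text does not spell out the exact sequencing, so this ordering is an assumption. The helper `expand`, which repeats a nibble across an n-bit operand as described in Section 2.1, and all other names are hypothetical.

```python
REDUNDANT_HIGH = {0b0000, 0b0010, 0b0101, 0b0111, 0b1001, 0b1111}

def tpg_states():
    """Yield TPG states as (high nibble c7..c4, low nibble c3..c0) pairs.

    Assumption: the high nibble steps in binary, skipping redundant
    values, each time the low nibble completes its Gray cycle."""
    for high in range(16):
        if high in REDUNDANT_HIGH:
            continue
        for step in range(16):
            low = step ^ (step >> 1)       # 4-bit reflected Gray code
            yield high, low

def expand(nibble, n):
    """Repeat a 4-bit nibble to drive an n-bit multiplier operand."""
    value = 0
    for k in range(n // 4):
        value |= nibble << (4 * k)
    return value

states = list(tpg_states())
first_b = expand(states[0][0], 8)   # operand B for the initial state
first_a = expand(states[0][1], 8)   # operand A for the initial state
```

The sequence starts at high nibble 0001 and low nibble 0000, matching the initial state given above, and contains 160 distinct states.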

The total power dissipation reduction of the proposed BIST scheme over the reference BIST scheme [16] defined in Section 3 is presented in Table 4. The power reduction achieved varies from 44.1% to 54.9%. The average power dissipation reduction per vector applied is presented in Table 5. Reduction varies from 21.4% to 36.5%. Table 6 lists the reduction of the peak power dissipation. This varies from 15.8% to 34.3%. The test application time is also reduced by 28.9%.
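The reported figures can be cross-checked with a short calculation. The symbols $R_{tot}$ and $R_{avg}$, denoting the fractional total and per-vector reductions, are notation introduced here, and the relation assumes that the total dissipation equals the number of vectors times the average dissipation per vector:

$$1 - \frac{160}{225} \approx 0.289, \qquad 1 - R_{tot} = \frac{160}{225}\,(1 - R_{avg})$$

For example, $R_{avg} = 0.365$ gives $1 - R_{tot} \approx 0.711 \times 0.635 \approx 0.452$, that is $R_{tot} \approx 0.548$, which agrees with the 54.9% entry of Table 4 up to rounding.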

The above comparison results were obtained with our gate-level simulator assuming zero gate delay. We believe that the reductions in the total power dissipated would be even greater if glitches were also taken into account, since the proposed BIST reduces the switching activity at the nodes of the multiplier during test application.

Although the proposed BIST scheme can significantly cut down the power dissipated during test, the fault coverage may drop due to increased error aliasing, since every change of the test set implies new values for the error aliasing. Therefore, we need to verify that the fault coverage attained by the reduced test set, with respect to single stuck-at faults, remains at high levels. Table 7 lists the error aliasing and the fault coverage achieved, assuming a rotate carry adder as the ODC, for 8x8 and 16x16 MBMs and the three cell implementations. We can observe that, due to increased error aliasing in the ODC, the fault coverage may drop below the acceptable level of 99%, especially in the case of the CL-MBMs. In this case, there is also a slight increase in the faults (located at the group carry look ahead adder) left undetected by the proposed test set compared to the application of the 225 vectors, but this is not the dominant factor for the fault coverage drop. To reduce the error aliasing, the ODC can be implemented as a multiple rotate carry accumulator [14]. The 7<sup>th</sup> and the 12<sup>th</sup> columns of Table 7 present the attained fault coverage if a multiple rotate carry accumulator is used as the ODC. Then, the aliasing is either significantly reduced or totally eliminated, leading to a fault coverage always larger than 99%.

|        | MBM with RCA |       |         |       | MBM with CLA |       |       |       |  |  |
|--------|------|-------|---------|-------|--------------|-------|-------|-------|--|--|
|        | 8x8  | 16x16 | 32x32   | 64x64 | 8x8          | 16x16 | 32x32 | 64x64 |  |  |
| Cell 1 | 54.9 | 50.4  | 47.6    | 46.4  | 53.6         | 49.5  | 47.2  | 46.2  |  |  |
| Cell 2 | 54.0 | 48.4  | 45.5    | 44.5  | 53.4         | 48.2  | 45.5  | 44.5  |  |  |
| Cell 3 | 53.3 | 48.3  | 45.3    | 44.1  | 53.3         | 48.3  | 45.4  | 44.1  |  |  |

Table 4: Total power dissipation reduction % of the proposed 160 cycles TPG

|        | MBM with RCA |       |       |       | MBM with CLA |       |       |       |
|--------|------|-------|-------|-------|------|-------|-------|-------|
|        | 8x8  | 16x16 | 32x32 | 64x64 | 8x8  | 16x16 | 32x32 | 64x64 |
| Cell 1 | 36.5 | 30.3  | 26.4  | 24.7  | 34.8 | 28.9  | 25.7  | 24.4  |
| Cell 2 | 35.2 | 27.4  | 23.4  | 21.9  | 34.4 | 27.2  | 23.3  | 21.9  |
| Cell 3 | 34.3 | 27.3  | 23.1  | 21.4  | 34.4 | 27.3  | 23.2  | 21.4  |

 Table 5: Average power dissipation reduction % per vector of the proposed 160 cycles TPG

Table 6: Peak power dissipation reduction % of the proposed 160 cycles TPG

|        | MBM with RCA |       |       |       | MBM with CLA |       |       |       |
|--------|------|-------|-------|-------|------|-------|-------|-------|
|        | 8x8  | 16x16 | 32x32 | 64x64 | 8x8  | 16x16 | 32x32 | 64x64 |
| Cell 1 | 19.3 | 19.1  | 24.0  | 26.6  | 19.2 | 20.2  | 24.9  | 26.9  |
| Cell 2 | 22.4 | 27.4  | 31.8  | 34.3  | 18.3 | 26.0  | 31.4  | 34.1  |
| Cell 3 | 20.0 | 15.8  | 21.5  | 23.9  | 19.6 | 17.6  | 22.6  | 24.5  |

 Table 7: Fault Coverage

| Multiplier | Cell   | Original TPG (225 cycles) [16] |      |       |      |       | Proposed TPG (160 cycles) |      |       |      |       |
|------------|--------|------|------|-------|------|-------|------|------|-------|------|-------|
|            |        |      | Rotate Carry Adder |       | Multiple Rotate Carry |       |      | Rotate Carry Adder |       | Multiple Rotate Carry |       |
|            |        | UF   | AL   | FC    | AL   | FC    | UF   | AL   | FC    | AL   | FC    |
| RC-MBM     |        |      |      |       |      |       |      |      |       |      |       |
| 8x8        | Cell 1 | 0.00 | 0.59 | 99.41 | 0.25 | 99.75 | 0.00 | 1.34 | 98.66 | 0.20 | 99.80 |
| 8x8        | Cell 2 | 0.45 | 0.30 | 99.25 | 0.45 | 99.10 | 0.30 | 0.71 | 98.99 | 0.23 | 99.47 |
| 8x8        | Cell 3 | 0.00 | 0.30 | 99.70 | 0.19 | 99.81 | 0.00 | 0.87 | 99.13 | 0.11 | 99.89 |
| 16x16      | Cell 1 | 0.03 | 0.31 | 99.66 | 0.00 | 99.97 | 0.03 | 0.37 | 99.60 | 0.00 | 99.97 |
| 16x16      | Cell 2 | 0.16 | 0.11 | 99.73 | 0.00 | 99.84 | 0.12 | 0.17 | 99.71 | 0.00 | 99.88 |
| 16x16      | Cell 3 | 0.00 | 0.12 | 99.88 | 0.00 | 100.0 | 0.00 | 0.29 | 99.71 | 0.00 | 100.0 |
| CL-MBM     |        |      |      |       |      |       |      |      |       |      |       |
| 8x8        | Cell 1 | 0.23 | 1.39 | 98.38 | 0.22 | 99.55 | 0.45 | 2.52 | 97.03 | 0.14 | 99.41 |
| 8x8        | Cell 2 | 0.19 | 1.17 | 98.64 | 0.47 | 99.34 | 0.39 | 2.10 | 97.51 | 0.23 | 99.38 |
| 8x8        | Cell 3 | 0.19 | 1.17 | 98.64 | 0.20 | 99.61 | 0.39 | 2.25 | 97.36 | 0.12 | 99.49 |
| 16x16      | Cell 1 | 0.34 | 0.73 | 98.93 | 0.00 | 99.66 | 0.25 | 0.85 | 98.90 | 0.00 | 99.75 |
| 16x16      | Cell 2 | 0.30 | 0.50 | 99.20 | 0.00 | 99.70 | 0.23 | 0.64 | 99.13 | 0.00 | 99.77 |
| 16x16      | Cell 3 | 0.26 | 0.50 | 99.24 | 0.00 | 99.74 | 0.19 | 0.76 | 99.05 | 0.00 | 99.81 |

UF : Percentage of Undetected Faults, AL : Percentage of Aliasing, FC : Fault Coverage Percentage
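The three quantities reported in Table 7 appear to be related by FC = 100 - UF - AL, that is, a fault counts as covered unless it is either undetected by the test set or masked by aliasing in the ODC. The minimal sketch below checks this relation on a few rows copied from the table; the relation is inferred from the reported figures rather than stated by the authors, and the row labels and the helper `fault_coverage` are illustrative only.

```python
def fault_coverage(uf: float, al: float) -> float:
    """Fault coverage (%) implied by undetected-fault and aliasing percentages."""
    return 100.0 - uf - al


# (UF, AL, reported FC) triples copied from Table 7.
rows = {
    "RC-MBM 8x8 Cell 1, 225 cycles, rotate carry adder": (0.00, 0.59, 99.41),
    "RC-MBM 8x8 Cell 2, 160 cycles, rotate carry adder": (0.30, 0.71, 98.99),
    "CL-MBM 8x8 Cell 1, 160 cycles, rotate carry adder": (0.45, 2.52, 97.03),
    "CL-MBM 8x8 Cell 1, 160 cycles, multiple rotate carry": (0.45, 0.14, 99.41),
    "CL-MBM 16x16 Cell 3, 160 cycles, multiple rotate carry": (0.19, 0.00, 99.81),
}

# Compare with a small tolerance because the table is rounded to two decimals.
consistent = {
    label: abs(fault_coverage(uf, al) - fc) < 0.005
    for label, (uf, al, fc) in rows.items()
}

for label, ok in consistent.items():
    print(f"{label}: {'consistent' if ok else 'mismatch'}")
```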

The hardware overhead imposed by the proposed BIST scheme can be estimated approximately in gate equivalents, where one gate equivalent corresponds to one 2-input NAND gate. We assume the following costs: a full adder and a half adder require 10 and 5 gate equivalents respectively; each r-cell, ps-cell, l\_ps-cell and r\_ps-cell requires 15, 13, 13 and 8 gate equivalents respectively; one 4-bit CLA cell requires 80 gate equivalents; one flip-flop requires 10; one multiplexer requires 3; one 2- or 3-input AND or OR gate requires 2; and one 2-input XOR or XNOR gate requires 4.

The MBM design consists of n/2 r-cells, (n-1)(n/2) ps-cells, n/2 l\_ps-cells, n/2 r\_ps-cells, (n-1)[(n/2)-2]+1 full adders, n+(n/2)-3 half adders and n/2 2-input OR gates. We add 2n full adders for the RC-MBM or n/2 carry look ahead adder cells for the CL-MBM. We assume that an accumulator is already part of the circuit, so we add 2n full adders and an equal number of flip-flops for the rotate carry adder (when a multiple rotate carry accumulator is used, n/4 2-input XOR gates are also needed). Thus the total number of gate equivalents is $(23n^2+110n+30)/2$ in the case of RC-MBMs and $(23n^2+150n+30)/2$ in the case of CL-MBMs.

The hardware required for the implementation of the proposed BIST scheme consists of 2n multiplexers, 8 flip-flops and the combinational logic of the TPG, giving a total of 6n+136 gate equivalents. Hence the hardware overhead (HO) for the RC-MBM and the CL-MBM is given by the relations:

$$HO_{RC-MBM} = \frac{2(6n+136)}{23n^2+110n+30}, \qquad HO_{CL-MBM} = \frac{2(6n+136)}{23n^2+150n+30} \tag{5}$$

For n = 16, 32 and 64, both relations give a hardware overhead of less than 6.1%, 2.5% and 1.1% respectively, which is very small.
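The gate-equivalent bookkeeping above can be reproduced with a short script. The sketch below sums the cell counts and per-cell costs quoted in the text, compares the result against the closed-form totals, and evaluates relation (5). The function and dictionary names are illustrative, n is assumed to be a multiple of 4, and the optional XOR gates of the multiple rotate carry accumulator are left out, as they are in the closed-form totals.

```python
# Cost of each building block in gate equivalents, as quoted in the text.
GE = {
    "full_adder": 10, "half_adder": 5,
    "r_cell": 15, "ps_cell": 13, "l_ps_cell": 13, "r_ps_cell": 8,
    "cla4": 80, "flip_flop": 10, "or2": 2,
}


def mbm_gate_equivalents(n: int, adder: str) -> int:
    """Gate equivalents of an n x n MBM plus its accumulator (adder: 'RC' or 'CL')."""
    h = n // 2
    array = (
        h * GE["r_cell"]
        + (n - 1) * h * GE["ps_cell"]
        + h * GE["l_ps_cell"]
        + h * GE["r_ps_cell"]
        + ((n - 1) * (h - 2) + 1) * GE["full_adder"]
        + (n + h - 3) * GE["half_adder"]
        + h * GE["or2"]
    )
    # Final adder: 2n full adders (RC-MBM) or n/2 4-bit CLA cells (CL-MBM).
    final_adder = 2 * n * GE["full_adder"] if adder == "RC" else h * GE["cla4"]
    # Accumulator used as the ODC: 2n full adders and 2n flip-flops.
    accumulator = 2 * n * (GE["full_adder"] + GE["flip_flop"])
    return array + final_adder + accumulator


def bist_gate_equivalents(n: int) -> int:
    """Total quoted in the text: 2n multiplexers, 8 flip-flops and the TPG logic."""
    return 6 * n + 136


def hardware_overhead(n: int, adder: str) -> float:
    """Relation (5): BIST gate equivalents divided by circuit gate equivalents."""
    return bist_gate_equivalents(n) / mbm_gate_equivalents(n, adder)


for n in (16, 32, 64):
    rc = 100 * hardware_overhead(n, "RC")
    cl = 100 * hardware_overhead(n, "CL")
    print(f"n = {n}: RC-MBM {rc:.2f}%, CL-MBM {cl:.2f}%")
```

Summing the individual cells rather than hard-coding the polynomial makes it easy to see which term dominates: the quadratic growth comes from the ps-cells and full adders of the array, while the BIST hardware grows only linearly in n, which is why the overhead shrinks as the multiplier gets larger.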

### 5. Conclusions

Aiming at low power dissipation during testing, in this paper we have presented a methodology for deriving a novel BIST scheme for Modified Booth Multipliers. Starting from the already proposed BIST schemes, we showed how the low power objective can be achieved by: a) properly assigning the TPG outputs to the multiplier inputs, b) using Gray code and c) significantly reducing the test set without affecting the fault coverage. Combining these techniques, we introduced a novel BIST scheme capable of reducing the total, the average per vector and the peak power dissipation by up to 54.9%, 36.5% and 34.3% respectively, relative to the scheme of [16]. The introduced BIST scheme has a very small hardware overhead and also reduces the test application time by 37.5% and 28.9% compared with the BIST scheme initially proposed in [14] and its later improvement in [16], respectively.

### References

- [1] Y. Zorian, "A Distributed BIST Control Scheme for Complex VLSI Devices", Proc. of VLSI Test Symposium, pp. 4-9, 1993.
- [2] S. Wang & S. K. Gupta, "DS-LFSR: A New BIST TPG for Low Heat Dissipation", Proc. of Int. Test Conference, pp. 848-857, 1997.
- [3] X. Kavousianos, D. Nikolos & S. Tragoudas, "On-Chip Deterministic Counter-Based TPG with Low Heat Dissipation", Proc. of Southwest Symposium on Mixed-Signal Design, pp. 87-92, 1999.
- [4] D. Bakalis & D. Nikolos, "On Low Power BIST for Carry Save Array Multipliers", Proc. of 5<sup>th</sup> IEEE Int. On-Line Testing Workshop, July 5-7, 1999, Rhodes, Greece, pp. 86-90.
- [5] P. Girard et al., "A Test Vector Inhibiting Technique for Low Energy BIST Design", Proc. of VLSI Test Symposium, pp. 407-412, 1999.
- [6] R. Roy, "Scaling Towards Nanometer Technologies: Design for Test Challenges", Panel, Design Automation and Test in Europe, 1999.
- [7] N. H. E. Weste & K. Eshraghian, *Principles of CMOS VLSI Design: A Systems Perspective*, Addison-Wesley, 2<sup>nd</sup> edition, 1992.
- [8] R. Parkar, "Bare Die Test", Proc. of IEEE Multi-Chip Module Conference, pp. 24-27, 1992.
- [9] S. Wang & S. K. Gupta, "ATPG for Heat Dissipation Minimization During Test Application", Proc. of Int. Test Conference, pp. 250-258, 1994.
- [10] W. Kautz, "Testing for Faults in Cellular Logic Arrays", Proc. of the 8<sup>th</sup> Annual Symposium on Switching and Automata Theory, pp. 161-174, 1967.
- [11] M. Annaratone, *Digital CMOS Circuit Design*, Kluwer Academic Publishers, 1986.
- [12] D. Gizopoulos, D. Nikolos, A. Paschalis & C. Halatsis, "C-Testable Modified Booth Multipliers", Journal of Electronic Testing: Theory & Applications, Vol. 8, No. 3, pp. 241-259, June 1996.
- [13] J. Van Sas et al., "Design of a C-Testable Booth Multiplier using a Realistic Fault Model", Journal of Electronic Testing: Theory & Applications, Vol. 5, No. 1, pp. 29-41, February 1994.
- [14] D. Gizopoulos, A. Paschalis & Y. Zorian, "Effective Built-In Self-Test for Booth Multipliers", IEEE Design and Test of Computers, pp. 105-111, July-September 1998.
- [15] J. Rajski & J. Tyszer, "Test Responses Compaction in Accumulators with Rotate Carry Adders", IEEE Trans. on CAD, Vol. 12, No. 4, pp. 531-539, April 1993.
- [16] D. Gizopoulos et al., "An Effective BIST Scheme for Datapaths", Proc. of Int. Test Conference, pp. 76-85, 1996.
- [17] J. Rabaey & M. Pedram, *Low Power Design Methodologies*, Kluwer Academic Publishers, 1996.
- [18] M. Morris Mano, *Digital Design*, Prentice Hall, 2<sup>nd</sup> edition, 1991.
- [19] M. Abramovici, M. Breuer & A. Friedman, *Digital Systems Testing and Testable Design*, Computer Science Press, 1990.
- [20] E. Kalligeros et al., "Path Delay Fault Testable Modified Booth Multipliers", to be presented at XIV Design of Circuits and Integrated Systems Conference (DCIS '99), Palma de Mallorca, November 16-19, 1999.
- [21] T. Haniotakis et al., "C-Testable One-Dimensional ILAs with Respect to Path Delay Faults: Theory and Applications", Proc. of IEEE Int. Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 155-163, 1998.