# Defect Aware X-Filling for Low-Power Scan Testing

S. Balatsouka, V. Tenentes, X. Kavousianos Dept. of Computer Science, University of Ioannina 45110 Ioannina, Greece sbalats@cs.uoi.gr, tenentes@cs.uoi.gr, kabousia@cs.uoi.gr

*Abstract*—Various X-filling methods have been proposed for reducing the shift and/or capture power in scan testing. The main drawback of these methods is that X-filling for low power leads to lower defect coverage than random-fill. We propose a unified low-power and defect-aware X-filling method for scan testing. The proposed method reduces shift power under constraints on the peak power during response capture, and the power reduction is comparable to that for the Fill-Adjacent X-filling method. At the same time, this approach provides high defect coverage, which approaches and in many cases is higher than that for random-fill, without increasing the pattern count. The advantages of the proposed method are demonstrated with simulation results for the largest ISCAS and the IWLS benchmark circuits.

## I. INTRODUCTION

Scan testing of integrated circuits is widely used today for defect screening and quality assurance. However, for very-deep submicron (VDSM) technologies, the increased complexity and the new types of defects dramatically increase the cost of testing. Traditional testing techniques decrease test costs by concurrently targeting as many defects as possible, leading thus to elevated test power consumption, which can be several times higher than that in functional mode [8].

Power consumption during scan testing consists of two switching activity components, namely shift and capture power. Numerous methods have been proposed in the literature for limiting power consumption during test application, targeting scan shifting [1, 2, 5-7, 10, 12, 15] or response capture [4, 13, 17, 22-26]. In addition, some methods simultaneously target the reduction of both shift and capture switching activity [3, 11, 14, 16, 18]. These methods can be further categorized as being either structural [2, 4-7, 11, 12, 15] or algorithmic [18, 22, 23]. Algorithmic methods also include techniques that manipulate the test cubes [1, 3, 10, 13, 14, 16, 17, 24-26]. The latter category, also known as X-filling, aims at a power-aware logic assignment of the unspecified X-bits. X-filling has negligible impact on the ATPG process and affects neither the scan chain structure nor the circuit under test (CUT). Moreover, it can be combined with other techniques to further reduce test power.

A popular method for reducing shift power is the Fill-Adjacent technique [1]. This technique targets only the scan-in portion of the shift power, but it also reduces the scan-out power because, as shown in [3], the scan-in power is highly correlated with the scan-out power. In addition, it can be easily combined with capture-power reduction techniques such as Preferred Fill [16, 17], to provide an efficient unified power-reduction solution.

K. Chakrabarty, Dept. of Electrical & Computer Engineering, Duke University, 27708 Durham, NC, USA, krish@ee.duke.edu

A major drawback of power-aware X-filling techniques is that they are often accompanied by a reduction in defect coverage, since the impact on unmodeled fault coverage is not considered during X-filling. ATPG engines, on the other hand, increase the fortuitous detection of modeled as well as unmodeled faults by randomly filling the Xs. However, this step elevates the test power. Consequently, a unified X-filling method that simultaneously targets power reduction and high defect coverage is needed for VDSM technologies.

In this paper, we present a new X-filling technique that achieves the following goals:

1. It provides a substantial reduction in shift power during scan testing, close to that obtained using Fill-Adjacent X-filling.
2. It ensures that the capture switching activity is less than a pre-determined limit.
3. It provides increased defect coverage, which approaches, and in many cases even exceeds, that of random filling of the Xs, without increasing the test pattern count.
4. It offers a tradeoff between power efficiency and defect coverage, and thus it can be adjusted to the specific requirements of a design.

The proposed method exploits different ways of filling the Xs, and selects the most effective one with respect to defect coverage and shift power, under constraints on the capture power. High defect coverage is ensured by the use of a surrogate metric based on output deviations [20], for evaluating the quality of test vectors. Output deviations provide an efficient probabilistic means to evaluate test vectors based on their potential for detecting arbitrary defects and, most importantly, without being biased towards any particular fault model. As shown in [21], unbiased testing provides higher test quality than a test method that is biased by a particular fault model. The efficiency of the proposed method is demonstrated through experiments with the ISCAS and IWLS [27] benchmark circuits. To the best of our knowledge this is the first X-filling method that achieves power reduction and high defect coverage in a unified manner.

## II. BACKGROUND

We use the term *test cube* to refer to a test pattern consisting of specified '0', '1' and unspecified 'X' bits, and the term *test vector* to refer to a pattern consisting only of specified bits.

The work of K. Chakrabarty and X. Kavousianos was supported in part by the National Science Foundation (NSF) under grant no. CCF-0903392 and the Semiconductor Research Corporation (SRC) under contract no.1992. The work of K. Chakrabarty was also supported in part by SRC under contract no. 1588.

TABLE I. FA AND PROPOSED X-FILLING<sup>\*</sup>

|     | Test Cube Block | FA    | MFA                      |
|-----|-----------------|-------|--------------------------|
| i   | 0xx0, 0xx, xx0  | 0000  | 0000                     |
| ii  | 1xx1, 1xx, xx1  | 1111  | 1111                     |
| iii | 0xxx1           | 01111 | 01111, 00111, ..., 00001 |
| iv  | 1xxx0           | 10000 | 10000, 11000, ..., 11110 |

\*The rightmost bit is loaded first into the scan chain.

### A. Overview of Fill-Adjacent and Preferred-Fill Techniques

Every two complementary consecutive test bits loaded into a scan chain generate switching activity as they travel along the scan chain. The Fill-Adjacent technique (denoted hereafter as FA) minimizes the shift power by exploiting the X-bits of the test cubes to minimize both the number of consecutive complementary test bits loaded into the scan chains and the distance they travel along the scan chains. For instance, consider a CUT with $c$ scan chains, and assume that the test cube segment $S_i$=XXX1XXX01XX0XXX1 has to be loaded into scan chain $j$ $(1 \le j \le c)$ from right to left. By applying FA to fill the Xs, we get the test vector segment $T_i$=1111000010001111. Table I shows all possible X-fillings produced by the FA technique. The first column shows all possible blocks of test bits comprising any test cube segment that consists of $n$ ($n \ge 1$) unspecified logic values bounded at the left and/or right by specified logic values. The second column shows the X-filling produced by FA for all these blocks.
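The FA rule of Table I can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `fa_fill` is hypothetical, and filling an all-X segment with 0s is an assumption, since that case does not appear in Table I.

```python
def fa_fill(cube: str) -> str:
    """Fill-Adjacent sketch: the rightmost bit is the first one shifted in."""
    bits = list(cube.upper())

    # Pass 1 (right to left): every X copies the nearest specified bit on
    # its right, so the single 0/1 boundary of a 0xxx1 or 1xxx0 block ends
    # up at the leftmost position of the block (loaded last).
    last = None
    for i in range(len(bits) - 1, -1, -1):
        if bits[i] == "X":
            if last is not None:
                bits[i] = last
        else:
            last = bits[i]

    # Pass 2 (left to right): Xs at the right end have no specified bit on
    # their right, so they copy the nearest specified bit on their left.
    # An all-X segment is filled with 0s (assumption).
    last = None
    for i in range(len(bits)):
        if bits[i] == "X":
            bits[i] = last if last is not None else "0"
        else:
            last = bits[i]
    return "".join(bits)


# The segment used in the text above.
print(fa_fill("XXX1XXX01XX0XXX1"))  # 1111000010001111
```

The two passes mirror the two kinds of blocks in Table I: blocks bounded on the right (filled in pass 1) and blocks bounded only on the left (filled in pass 2).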

The Preferred Fill technique (denoted hereafter as PF) is an X-filling technique for reducing the switching activity during capture [16, 17]. Consider a two-pattern Launch-On-Capture (LOC) test $\langle V_1, V_2 \rangle$ where $V_1 = (v_{11}, v_{12}, v_{13}, ..., v_{1n})$ is the first $n$-bit vector applied to the CUT and $V_2 = (v_{21}, v_{22}, v_{23}, ..., v_{2n})$ is the response to $V_1$, which is applied as the second test vector to the CUT. If the logic value of $V_1$ corresponding to cell $i$ (i.e., $v_{1i}$) is unspecified, then it should be filled with value 1(0) provided that the probability of $v_{2i}$ (i.e., the logic value of $V_2$ corresponding to cell $i$) taking the value 1(0) is higher than the probability of it taking the value 0(1). In other words, the $v_{1i}$ bit is filled with the value that is more likely to be held after the capture in the $i$<sup>th</sup> scan cell.
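The PF decision rule can be illustrated with a short Python sketch. It assumes that the probability of each response bit $v_{2i}$ being 1 has already been estimated by some other means (the estimation itself is not shown); the function name and the tie-breaking choice at probability 0.5 are hypothetical.

```python
from typing import Sequence


def preferred_fill(v1: str, prob_v2_is_one: Sequence[float]) -> str:
    """Fill each X of v1 with the value its scan cell most likely captures."""
    filled = []
    for bit, p_one in zip(v1.upper(), prob_v2_is_one):
        if bit == "X":
            # Ties (p_one == 0.5) are resolved to 0 here; this is an assumption.
            filled.append("1" if p_one > 0.5 else "0")
        else:
            filled.append(bit)  # specified bits are never changed
    return "".join(filled)


print(preferred_fill("1X0X", [0.2, 0.9, 0.5, 0.1]))  # 1100
```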

### B. Overview of Output Deviations

Output deviations [20] are probability measures at primary outputs and pseudo-outputs (all referred to as outputs) that reflect the likelihood of error detection at these outputs. As shown in [20], test patterns with high deviations tend to be more effective for fault detection. Output deviations are based on a probabilistic fault model, in which a probability map (referred to as the confidence-level vector) is assigned to every gate in the circuit. Signal probabilities $p_{i,0}$ and $p_{i,1}$ are associated with each line $i$ for every input pattern, where $p_{i,0}$ and $p_{i,1}$ are the probabilities for line $i$ to be at logic 0 and 1, respectively. The confidence level $R_i$ of a gate $G_i$ with $m$ inputs and a single output is a vector with $2^m$ components, $R_i = (r_i^{0\ldots00}, r_i^{0\ldots01}, \ldots, r_i^{1\ldots11})$, where each component denotes the probability that the gate output is correct for the corresponding input combination. For example, let $y$ be the output of a NAND gate $G_i$ with inputs $a$, $b$. We have

$$p_{y,0} = p_{a,1}p_{b,1}r_i^{11} + p_{a,0}p_{b,0}(1-r_i^{00}) + p_{a,0}p_{b,1}(1-r_i^{01}) + p_{a,1}p_{b,0}(1-r_i^{10}),$$

$$p_{y,1} = p_{a,0}p_{b,0}r_i^{00} + p_{a,0}p_{b,1}r_i^{01} + p_{a,1}p_{b,0}r_i^{10} + p_{a,1}p_{b,1}(1-r_i^{11}).$$

Likewise, the signal probabilities can be computed for other gates. For any gate $G_i$ in a circuit, let its fault-free output value for any given input pattern $t_j$ be $d$, with $d \in \{0,1\}$. The *output deviation* $\Delta_{G_i,j}$ of $G_i$ for $t_j$ is defined as $p_{G_i,\overline{d}}$, where $\overline{d}$ is the complement of $d$. Intuitively, the deviation for an input pattern is a measure of the likelihood that the gate output is incorrect for that pattern. Output deviations can be determined without explicit fault grading; hence the computation (linear in the number of gates) is feasible for large circuits and large test sets.
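As a worked illustration of the NAND example, the following Python sketch evaluates the two signal-probability expressions and the resulting deviation for a single gate. The function names and the confidence-level value 0.9 are hypothetical and chosen only for illustration.

```python
def nand_output_probs(pa1, pb1, r):
    """Signal probabilities (p_y0, p_y1) at the output of a 2-input NAND.

    pa1, pb1: probabilities that inputs a and b are at logic 1.
    r: confidence levels, keyed by the input combination "ab".
    """
    pa0, pb0 = 1.0 - pa1, 1.0 - pb1
    p_y0 = (pa1 * pb1 * r["11"]
            + pa0 * pb0 * (1 - r["00"])
            + pa0 * pb1 * (1 - r["01"])
            + pa1 * pb0 * (1 - r["10"]))
    p_y1 = (pa0 * pb0 * r["00"]
            + pa0 * pb1 * r["01"]
            + pa1 * pb0 * r["10"]
            + pa1 * pb1 * (1 - r["11"]))
    return p_y0, p_y1


def deviation(p_y0, p_y1, fault_free_value):
    """Probability that the line holds the complement of its fault-free value."""
    return p_y1 if fault_free_value == 0 else p_y0


# Hypothetical confidence levels: the gate is correct with probability 0.9.
r = {"00": 0.9, "01": 0.9, "10": 0.9, "11": 0.9}
p_y0, p_y1 = nand_output_probs(1.0, 1.0, r)  # inputs a = b = 1, so d = 0
print(deviation(p_y0, p_y1, fault_free_value=0))
```

For a full circuit, the same computation would be applied gate by gate in topological order, which is why the cost grows linearly with the number of gates.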

## III. PROPOSED METHOD

The proposed method generates multiple power-efficient candidate test vectors by filling the Xs of each test cube in multiple power-efficient ways. The candidate test vectors for each test cube are evaluated with respect to their potential for detecting defects, using an output-deviation based quality metric, and the most efficient one is selected. In this section, we describe first the process of generating the candidate test vectors and then the selection of the most efficient ones.

### A. Generation of Power Efficient Candidate Test Vectors

In order to reduce the average shift power for the candidate test vectors, a modified version of the FA technique, hereafter called MFA, is proposed. MFA fills the Xs of a test cube in multiple power-efficient ways, sacrificing only a very small portion of the shift power efficiency offered by the FA technique. Specifically, as shown in column 2 of Table I, only the blocks of types iii and iv cause scan-in switching activity when they are filled according to the FA technique, because they contain one pair of consecutive complementary test bits. FA fills the Xs in such a way as to locate every such pair at the leftmost position of each block in order to minimize the distance that this pair has to travel during scan-in (note that the leftmost position is loaded last). To retain a low power profile for the candidate test vectors, MFA fills blocks of types i and ii in the same way as FA, while for blocks of types iii and iv, MFA allows the pair of consecutive complementary test bits to be located at any point relative to the scan output; see column 3 of Table I. Consequently, FA can be considered a special case of MFA. For any block of either type iii or iv consisting of $n$ unspecified bits, $n+1$ different fillings exist according to MFA, and for any test cube consisting of $m$ such blocks with $n_1, n_2, ..., n_m$ unspecified bits each, $(n_1+1)\cdot(n_2+1)\cdot\ldots\cdot(n_m+1)$ different candidate vectors can be generated. Note that as we move from the first to the last filling of MFA shown in Table I for blocks of types iii and iv, the scan-in switching activity increases because the pair of complementary test bits travels a longer distance in the scan chain.
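The MFA fillings of a single block and the resulting number of candidate vectors can be enumerated as in the following Python sketch. The function names are hypothetical; the order of the returned fillings follows column 3 of Table I, with the FA filling first.

```python
from math import prod
from typing import Iterable, List


def mfa_block_fillings(block: str) -> List[str]:
    """All MFA fillings of a type iii (0x..x1) or type iv (1x..x0) block."""
    left, right = block[0], block[-1]
    inner = block[1:-1].upper()
    if left not in "01" or right not in "01" or left == right:
        raise ValueError("expected a block bounded by complementary bits")
    if set(inner) - {"X"}:
        raise ValueError("inner bits must all be unspecified")
    n = len(inner)
    # k inner bits copy the left value and n - k copy the right value, so the
    # single complementary pair moves one position to the right with each k.
    return [left + left * k + right * (n - k) + right for k in range(n + 1)]


def mfa_candidate_count(x_counts: Iterable[int]) -> int:
    """(n_1 + 1)(n_2 + 1)...(n_m + 1) for blocks with n_i unspecified bits."""
    return prod(n + 1 for n in x_counts)


print(mfa_block_fillings("0XXX1"))  # ['01111', '00111', '00011', '00001']
print(mfa_candidate_count([3, 4]))  # 20
```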

*Example* 1. Table II presents a hypothetical test cube that is filled a) randomly, b) using FA, and c) using MFA. In order to evaluate the scan-in switching activity, $P_{SI}(T)$, of every test cube $T$ generated using each of these fillings, we use the *normalized* weighted switching activity [15]. This metric counts the number of transitions in successive scan cells, taking also into account their relative positions, and normalizes this value by dividing it by the upper bound of the volume of switching flip-flops. The values of this metric are in the range 0% (no switching activity) to 100% (all flip-flops are switching at every cycle). In the first column of Table II, we show a potential random filling of the cube, the filling provided by the FA technique, and their respective $P_{SI}(T)$ values. It is obvious that FA causes less switching activity than random filling. In the second column, we present two different X-fillings using MFA: one filling with moderate scan-in switching activity and one filling with the highest scan-in switching activity that can possibly be generated by MFA (i.e., for every block of either type iii or iv, the last combination shown in the third column of Table I is used). We can see that even in the worst case, the scan-in switching activity of the proposed method is only slightly higher than that of FA, while it is still much lower than that of random filling.

TABLE II. X-FILLING FOR TEST CUBE T=XXX1XXX0XXX0XXXX1

| Random Fill / FA     | MFA                  | MFA+20               |
|----------------------|----------------------|----------------------|
| Random Fill          | Moderate SA          | Moderate SA          |
| 010110100110101001   | 111111000000001111   | 101111000000001111   |
| $P_{SI}(T)$: 75.16%  | $P_{SI}(T)$: 13.1%   | $P_{SI}(T)$: 15%     |
| FA                   | Worst SA             | Worst SA             |
| 111100000000111111   | 111111100000000001   | 010111100000000001   |
| $P_{SI}(T)$: 10.5%   | $P_{SI}(T)$: 15.7%   | $P_{SI}(T)$: 19.6%   |
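The scan-in metric can be sketched in Python as follows. The exact weighting of [15] is not spelled out in the text, so the sketch assumes that a transition between the $j$-th and $(j+1)$-th loaded bits of an $n$-bit vector is weighted by $n-j$ and that the sum is normalized by $n(n-1)/2$. Under this assumption the function reproduces the $P_{SI}(T)$ values listed in Table II. The function name is hypothetical.

```python
def scan_in_switching(vector: str) -> float:
    """Normalized weighted scan-in switching activity, in percent."""
    loaded = vector[::-1]  # the rightmost bit is loaded first
    n = len(loaded)
    if n < 2:
        return 0.0
    # A transition between the j-th and (j+1)-th loaded bits (j = 1..n-1)
    # is assumed to contribute a weight of n - j.
    weighted = sum(n - j for j in range(1, n) if loaded[j - 1] != loaded[j])
    upper_bound = n * (n - 1) // 2  # every adjacent pair differs
    return 100.0 * weighted / upper_bound


for name, vec in [("random", "010110100110101001"),
                  ("FA", "111100000000111111"),
                  ("MFA worst", "111111100000000001")]:
    print(name, round(scan_in_switching(vec), 2))
```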

Even though the shift power of MFA is only slightly higher compared to FA, the test vectors generated using MFA exhibit significant differences with respect to their potential to detect un-modeled defects. The magnitude of these differences depends mainly on the diversity of these vectors, which is greatly affected by the way the Xs are filled. In order to increase the diversity of the candidate test vectors, a step that slightly increases the switching activity is required. This step is used sparingly in our X-filling technique. We randomly fill a small and carefully selected portion of the Xs of each test cube. Specifically, for every test cube segment loaded into any scan chain, we fill randomly the Xs corresponding to the leftmost scan cells (i.e. the scan cells that are closer to the input of the scan chain) that make a small contribution to the scan-in switching activity (they travel the shortest distance in the scan chains during scan-in). Thus, depending on a user-defined parameter P, all Xs corresponding to the P% leftmost scan cells of every scan chain are filled randomly. As P increases, the defect coverage of the test vectors increases but they consume more shift power. Thus, P offers a tradeoff between scan-in switching activity and defect coverage. This enhanced version of MFA is called MFA+P. Note that MFA is a special case of MFA+P with P=0%.
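The random-fill step of MFA+P can be sketched as below. The rounding of the $P$% boundary to a whole number of scan cells is not specified in the text, so rounding down is an assumption, and the function name is hypothetical. The Xs that remain after this step would then be filled according to MFA.

```python
import random


def randomize_leftmost(cube: str, p_percent: float, rng: random.Random) -> str:
    """Randomly fill the Xs in the leftmost p_percent of the scan cells."""
    cube = cube.upper()
    k = int(len(cube) * p_percent / 100)  # rounding down is an assumption
    head = "".join(rng.choice("01") if b == "X" else b for b in cube[:k])
    return head + cube[k:]  # the rest is left for MFA


rng = random.Random(0)  # fixed seed only to make the example repeatable
print(randomize_leftmost("XXX1XXX0XXX0XXXX1", 20, rng))
```

Because only the cells closest to the scan input are randomized, the extra transitions they introduce travel a short distance during scan-in, which keeps the shift-power penalty small.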

*Example* 2. In Column 3 of Table II, we present two fillings, one with moderate and one with the highest possible scan-in switching activity, using MFA+20 (P=20%). It is obvious that the scan-in switching activity is increased compared to MFA but it is still much lower than that for random fill.

It has been observed that the FA technique adversely affects the peak capture power, which may even be higher than the peak capture power for random filling [3]. To eliminate this problem in the proposed method, we invoke the Preferred Fill (PF) technique [16, 17] to specify as many Xs as necessary to keep the peak capture power within the power budget. This is done in a stepwise fashion and concurrently with the application of the MFA/MFA+P technique in order to minimize the number of Xs specified according to PF. The capture power is measured as the Hamming distance between the test vector and the first response (this pair always exhibits the peak power, as noted in [17]). Other, more sophisticated metrics can also be used. The functional limit on peak capture power is expressed as a maximum number $L$ of scan cells switching during capture.

Fig. 1. Generation of Candidate Test Vectors

The complete flow is shown in Fig. 1. The goal of this process is to generate a set CS(t) of at most N candidate test vectors (N is a constant pre-determined by the designer) for every test cube t. First, one test cube t of test set TS is selected, and it is filled using solely MFA or MFA+P (i.e., PF is not applied yet) in order to generate $N \cdot C$ candidate test vectors (C is also a constant pre-determined by the designer). All these $N \cdot C$ candidate test vectors are checked for violation of the peak capture power limit, and the test vectors that violate this limit are discarded. The remaining test vectors are inserted into the set of candidate test vectors CS(t). If there are more than N such vectors, then N of them are randomly selected; otherwise, the PF technique is invoked to provide additional test vectors as follows. First, the 10% of the Xs of test cube t with the highest potential to reduce the peak capture power according to PF are specified. Then $N \cdot C$ candidate test vectors are again generated using MFA or MFA+P for the modified test cube t, and these test vectors are checked for violation of the peak capture power limit. Again, the test vectors that do not violate this limit are appended to set CS(t). If CS(t) at this step still consists of fewer than N candidate test vectors, then the same flow is repeated by specifying another 10% of the Xs of test cube t with the highest potential to reduce the peak capture power. When either CS(t) contains N test vectors or test cube t is fully specified by PF, the generation of CS(t) stops and the process continues with the next test cube.
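The candidate-generation loop for one test cube can be sketched in Python as follows. The sketch is written against three caller-supplied functions (`fill`, `capture_ok`, `pf_specify`) that stand in for MFA/MFA+P, the capture-limit check, and PF; these names and signatures are hypothetical. Dropping duplicate vectors and taking each PF step as about 10% of the Xs of the original cube are assumptions that the text does not settle.

```python
import math
import random


def generate_candidates(cube, n_max, c, fill, capture_ok, pf_specify, rng):
    """Collect up to n_max capture-safe candidate vectors for one test cube.

    fill(cube, rng)     -> fully specified vector (stand-in for MFA or MFA+P)
    capture_ok(vector)  -> True if the peak capture limit L is respected
    pf_specify(cube, k) -> cube with k more Xs specified (stand-in for PF)
    """
    # Each PF round specifies about 10% of the Xs of the original cube.
    step = max(1, math.ceil(0.10 * cube.count("X")))
    found = {}  # dict used as an ordered set; dropping duplicates is an assumption
    while True:
        for _ in range(n_max * c):
            vector = fill(cube, rng)
            if capture_ok(vector):
                found.setdefault(vector, None)
        if len(found) >= n_max:
            return rng.sample(list(found), n_max)  # keep N at random
        if "X" not in cube:
            return list(found)  # cube fully specified by PF: stop
        narrower = pf_specify(cube, step)
        if narrower.count("X") >= cube.count("X"):
            raise ValueError("pf_specify must specify at least one more X")
        cube = narrower
```

With the values used in the experiments (N=30, C=3), each round would generate 90 vectors before the capture check.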

### B. Evaluation and Selection of Test Vectors

The candidate test vectors CS(t) for every test cube $t \in TS$ are evaluated using an output-deviation-based quality metric and the best test vector is selected for every test cube. This metric is an advanced version of the quality metric proposed in [9], as it evaluates the defect coverage potential of a test vector using concurrently both its first and its second test response (we consider the LOC scheme). Thus the proposed technique targets both timing-dependent and timing-independent defects at the same time. The quality metric exploits the following properties:

1. For every candidate test vector $v$, the output deviation values for both responses are calculated. Then, the outputs where the deviations reach their highest values among all candidate vectors are the most promising for detecting defects (the rest are not further considered). These outputs are partitioned into four sets for each vector $v$ as follows. For the first response of vector $v$, the outputs with maximum deviation values and fault-free logic value 0 (1) form set $MS_0(v,0)$ ($MS_0(v,1)$). The respective outputs for the second response form sets $MS_1(v,0)$ and $MS_1(v,1)$.

2. Every circuit output is weighted according to its potential to detect defects. This weight depends: a) on the amount of logic in the fan-in logic cone of the respective output (more defects can potentially be observed at the outputs of large cones than at the outputs of small ones), b) on the fault-free response at each output (different defects may be observable at every output for different fault-free logic values), and c) on the volume of potential defects in each cone that are not yet detected by previously selected test vectors at the respective output. This volume is estimated by considering the number of previously selected test vectors which maximize the deviation at this output. The higher this number, the higher is the expected volume of defects already detected at this output, and thus the lower is the volume of defects remaining to be detected (at this output).
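Property 1 can be illustrated with the following Python sketch, which builds the two sets of one response from per-output deviation values. It assumes that the maximum is taken separately for every output over all supplied candidate vectors, which is one possible reading of the text; the data layout and the function name are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def max_deviation_sets(
    response: Dict[str, List[Tuple[float, int]]]
) -> Dict[str, Dict[int, Set[int]]]:
    """Return, for one response, the sets MS(v, 0) and MS(v, 1) of every vector.

    response[v][k] = (deviation, fault_free_value) of output k under vector v.
    """
    n_outputs = len(next(iter(response.values())))
    # Highest deviation seen at each output over all candidates (assumption).
    best = [max(r[k][0] for r in response.values()) for k in range(n_outputs)]
    sets = {}
    for v, r in response.items():
        ms = {0: set(), 1: set()}
        for k, (dev, fault_free) in enumerate(r):
            if math.isclose(dev, best[k]):
                ms[fault_free].add(k)
        sets[v] = ms
    return sets


# Hypothetical deviations of two candidate vectors at two outputs.
first_response = {
    "v1": [(0.10, 0), (0.02, 1)],
    "v2": [(0.05, 1), (0.02, 1)],
}
print(max_deviation_sets(first_response))
```

Calling the function once for the first response and once for the second yields the four sets $MS_0(v,0)$, $MS_0(v,1)$, $MS_1(v,0)$ and $MS_1(v,1)$ of each vector.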

Based on the above properties, the evaluation process is conducted as follows. Initially, a set $CS$ of all candidate vectors is generated as the union of sets $CS(t)$ for all $t \in TS$. Then, one pair of weights is assigned to each circuit output $i$ corresponding to the first response assuming both fault-free logic values 0, 1, that is $wo_0(i,0)$ and $wo_0(i,1)$ respectively, and another pair of weights is assigned to each circuit output $i$ corresponding to the second response and fault-free logic values 0, 1, that is $wo_1(i,0)$ and $wo_1(i,1)$ respectively. All these weights are initially set equal to the number of lines in the fan-in logic cone of the respective output. Next, for every test vector $v \in CS(t)$ the output deviation values are computed and the sets $MS_0(v,0)$, $MS_0(v,1)$, $MS_1(v,0)$, $MS_1(v,1)$ are generated. Then the following process is repeated, and at each repetition one test vector is selected. First, the following formula is used to compute the quality metric of each vector:

$$WT(v) = \sum_{i=0,1} \sum_{j=0,1} \sum_{k \in MS_i(v,j)} wo_i(k,j).$$

Intuitively, $WT(v)$ is the sum of the weights of all outputs which have maximum deviation value at the first and/or second response when test vector $v$ is applied. Among the evaluated test vectors, the one with the highest value of this metric is selected since it is the most promising one for defect detection. After the selection of vector $v$, the remaining vectors of set $CS(t)$ are discarded from set $CS$ and the weights $wo_0(k,0)$ for all $k \in MS_0(v,0)$, $wo_0(k,1)$ for all $k \in MS_0(v,1)$, $wo_1(k,0)$ for all $k \in MS_1(v,0)$ and $wo_1(k,1)$ for all $k \in MS_1(v,1)$ are divided by a constant factor $F_2$ (these outputs are expected to detect many defects after the application of test vector $v$, and thus they are considered less effective for the selection of the next vectors). As proposed in [9], the value of $F_2$ was set equal to 8. Then, the new weights $WT(v)$ are calculated for all remaining vectors $v$, and the next vector is selected.
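The greedy selection loop can be sketched in Python as follows. Each candidate is represented by the union of its four $MS$ sets, stored as tuples `(i, k, j)` for response $i$, output $k$ and fault-free value $j$, so that $WT(v)$ becomes a plain sum over these tuples. The data layout, the function name and the tie-breaking rule (first candidate encountered) are hypothetical; the weights in the example are arbitrary numbers rather than real cone sizes.

```python
def select_vectors(candidates, weights, f2=8.0):
    """Pick one vector per test cube, greedily by the quality metric WT(v).

    candidates: {cube_id: {vector: set of (i, k, j) tuples}}
    weights:    {(i, k, j): weight wo_i(k, j)}
    """
    weights = dict(weights)  # work on a copy
    remaining = {t: dict(vs) for t, vs in candidates.items() if vs}
    selected = {}

    def wt(ms):
        return sum(weights[key] for key in ms)

    while remaining:
        # Candidate with the highest WT(v) over all cubes not yet served.
        t_best, v_best = max(
            ((t, v) for t, vs in remaining.items() for v in vs),
            key=lambda tv: wt(remaining[tv[0]][tv[1]]),
        )
        selected[t_best] = v_best
        # Outputs maximized by the selected vector count less from now on.
        for key in remaining[t_best][v_best]:
            weights[key] /= f2
        del remaining[t_best]  # discard the other candidates of this cube
    return selected


weights = {(0, "o1", 0): 16.0, (0, "o2", 1): 8.0, (1, "o1", 1): 4.0}
candidates = {
    "t1": {"a": {(0, "o1", 0)}, "b": {(0, "o2", 1)}},
    "t2": {"c": {(0, "o1", 0)}, "d": {(0, "o2", 1), (1, "o1", 1)}},
}
print(select_vectors(candidates, weights))  # {'t1': 'a', 't2': 'd'}
```

In the example, vector `c` initially scores as high as `a`, but once `a` is selected the weight of its output drops from 16 to 2, so `d` is preferred for the second cube. This is the mechanism that steers later selections toward outputs that have not yet been exercised.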

## IV. EXPERIMENTS

The simulation platform was developed using the C language and the power simulations were done using commercial tools. We report the total power consumed in the scan chains and the combinational logic during the whole testing process. We conducted experiments on the largest ISCAS'89 circuits and a subset of the IWLS'05 circuits [27] for multiple scan chains. All methods were applied on dynamically compacted test sets generated using a commercial ATPG tool for complete stuck-at fault coverage. N=30 candidate test vectors were generated for every test cube and the value of C was set equal to 3.

Due to the dynamic compaction performed by the ATPG, the first few test cubes generated are usually densely specified and thus decrease the potential of the PF technique to reduce capture power below the pre-determined limit. In order to avoid time-consuming bit-relaxation techniques, we replace these few but very densely specified test cubes with a small number of less specified test cubes generated in a second ATPG pass. We note that in the absence of the power profile of the benchmark circuits during normal operation, we set (unless otherwise noted) the capture power limit L equal to 30% of the scan cells switching during capture. The reason for this selection is twofold: a) the compacted nature (large volume of specified test bits) of the test sets prevents the reduction of the peak capture power below certain values of L, and b) as will become apparent below, the value of L does not affect the effectiveness of the proposed method in providing X-filling with high defect coverage. For even further reduction of the capture power, bit-relaxation techniques can be utilized, and/or less compacted test sets can be used. Nevertheless, this study is beyond the scope of this paper.

We note that, as was expected, the FA method in most cases violated the capture power limit. Thus for providing a fair comparison with the proposed methods that do not violate this limit, we also implemented a slightly modified version of FA, denoted as FA<sup>\*</sup>: for every cube violating the capture power limit when filled according to FA, 10% of the Xs (the most efficient ones) of the test cube are filled according to PF and the rest according to FA. If the test vector still violates the capture power, then the percentage of the test cube's Xs specified using PF is increased by 10%. This is repeated until a test vector is generated which does not violate the capture power limit.
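The FA<sup>\*</sup> loop described above can be sketched in Python as follows. The three callables stand in for FA, PF and the capture-limit check; their names and signatures are hypothetical. Returning the last vector when even 100% PF filling violates the limit is an assumption, since the text does not discuss that case.

```python
def fa_star(cube, fa_fill, pf_specify_percent, capture_ok):
    """Raise the share of PF-filled Xs in 10% steps until the limit is met.

    fa_fill(cube)                     -> vector with all remaining Xs FA-filled
    pf_specify_percent(cube, percent) -> cube with that share of Xs PF-filled
    capture_ok(vector)                -> True if the capture limit is respected
    """
    vector, percent = None, 0
    for percent in range(0, 101, 10):  # 0% corresponds to plain FA
        vector = fa_fill(pf_specify_percent(cube, percent))
        if capture_ok(vector):
            break
    return vector, percent
```

Unlike the candidate-based flow of the proposed method, this loop evaluates a single vector per step, so every violation immediately raises the number of PF-specified bits.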

Table III presents the average power comparisons between the FA, FA<sup>\*</sup>, MFA, MFA+10 and MFA+20 methods. The first two columns present the circuit name and the number of test cubes for each circuit, which is the same for all methods. The remaining columns present the percentage reduction in average power consumption achieved by each method compared to random fill (RF). FA generally offers the highest average power reduction; however, FA is a capture power-unaware method. In the case of FA<sup>\*</sup>, the filling of a portion of the Xs for reducing the capture power increases, in most cases, the average power consumed compared to FA. The reductions in average power offered by FA<sup>\*</sup> and MFA compared to RF are almost the same. Methods MFA+10 and MFA+20 provide a smaller reduction, which still remains significant. Note that there is an unexpected result in a few cases where the FA<sup>\*</sup> technique is inferior to MFA. This is caused by the PF technique, which tends to specify more Xs in the FA case than in the MFA case. This is explained by the fact that MFA generates many candidate test vectors for each test cube, and thus the possibility that some of them comply with the capture power limit at early stages of the generation process increases (at early stages PF has only a limited effect on the filling of Xs). This does not happen in the case of FA<sup>\*</sup>, where any violation of the capture power limit causes an immediate increase in the number of bits that have to be specified according to the PF technique.

TABLE III. TOTAL AVERAGE POWER REDUCTION COMPARED TO RF

| Circuit      | # Cubes | FA     | FA<sup>\*</sup> | MFA    | MFA+10 | MFA+20 |
|--------------|---------|--------|-----------------|--------|--------|--------|
| s5378        | 134     | 51.39% | 37.68%          | 45.93% | 30.30% | 25.74% |
| s9234        | 166     | 36.96% | 33.91%          | 34.64% | 31.36% | 27.11% |
| s13207       | 269     | 43.63% | 41.11%          | 42.20% | 39.99% | 37.41% |
| s15850       | 162     | 49.97% | 49.86%          | 49.93% | 45.55% | 41.90% |
| s38417       | 143     | 55.31% | 55.47%          | 54.94% | 51.21% | 48.02% |
| s38584       | 185     | 49.50% | 49.08%          | 49.29% | 45.21% | 40.94% |
| ac97_ctrl    | 66      | 46.32% | 46.32%          | 45.56% | 42.94% | 39.04% |
| mem_ctrl     | 603     | 59.65% | 59.50%          | 58.37% | 53.31% | 47.59% |
| pci_bridge32 | 298     | 55.93% | 56.14%          | 55.61% | 51.35% | 46.73% |
| tv80         | 757     | 59.84% | 59.87%          | 58.88% | 53.94% | 50.45% |
| usb_funct    | 136     | 38.84% | 36.28%          | 36.74% | 34.73% | 32.47% |
| ethernet     | 1113    | 73.40% | 73.47%          | 73.17% | 66.01% | 58.68% |

For evaluating the effectiveness of the proposed methods for defect screening, we consider the coverage of un-modeled faults, namely transition and bridging faults, obtained by applying to the circuit under test the stuck-at test vectors generated by the proposed methods. As is common in industry, we use the launch-on-capture (LOC) scheme, also referred to as broadside scan, to apply test-vector pairs. Note that neither of these two fault models was targeted by the stuck-at test sets (transition and bridging faults are used as surrogate fault models). For evaluating the defect-screening potential of the proposed methods with respect to bridging faults, we first used the $BCE^+$ metric [19], which is useful for comparing different methods (the method with the highest value of $BCE^+$ is deemed to be more effective for defect screening). Since $BCE^+$ is not accurate for estimating the real bridging fault coverage, we additionally simulated 400K bridging faults as follows: 100K pairs of lines were selected randomly for each circuit, and four bridging faults were simulated for each pair by considering each line in turn as aggressor and as victim, and by considering both logic values 0 and 1 at the aggressor.
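The construction of the bridging-fault list can be sketched in Python as follows. The tuple layout `(aggressor, victim, aggressor_value)` and the function name are hypothetical, and the sketch does not prevent the same pair of lines from being drawn twice, which is an assumption about the sampling.

```python
import random


def sample_bridging_faults(lines, n_pairs, rng):
    """Four bridging faults for each randomly selected pair of lines."""
    faults = []
    for _ in range(n_pairs):
        a, b = rng.sample(lines, 2)  # two distinct lines
        for aggressor, victim in ((a, b), (b, a)):
            for value in (0, 1):  # logic value at the aggressor
                faults.append((aggressor, victim, value))
    return faults


# Hypothetical line names; 100K pairs would give the 400K faults of the text.
faults = sample_bridging_faults(["n1", "n2", "n3", "n4"], 5, random.Random(0))
print(len(faults))  # 20
```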



Fig. 2. Transition delay fault coverage for various values of L (for s9234)

At first we present the transition delay fault coverage of the FA<sup>\*</sup>, MFA and MFA+20 methods for various values of the power limit *L* in the range [15%, 45%]. Due to space limitation we present results only for s9234 in Fig. 2 (the remaining circuits exhibit similar behaviour). We can see that, for every value of *L* the transition fault coverage improvements are significant. However, we have to note that due to the compacted nature of the test sets, in the case of L=15%, the PF technique fails to provide test vectors with capture power below the limit *L* for many test cubes. This problem can be overcome with the use of bit-relaxation techniques and/or less specified test sets.

Fig. 3 presents the trade-off between power reduction and defect coverage based on the value of *P*. Specifically the power reduction of MFA+P techniques for P=0% (i.e., MFA), 10%, 20%,..., 100% as well as the respective transition fault coverage achieved for circuit s38417 are reported (the other circuits exhibit similar behaviour). It is obvious that the defect coverage increases as *P* increases. On the other hand, the power reduction achieved compared to RF decreases linearly and tends to zero as *P* approaches 100%, where all Xs are filled randomly.

Table IV presents the defect coverage results. The first column presents the circuit names and the next six columns present the transition fault coverage for RF, FA, FA<sup>\*</sup>, MFA, MFA+10, and MFA+20, respectively. It is obvious that the FA and FA<sup>\*</sup> techniques provide the lowest coverage, while all the proposed methods provide higher coverage, which is even higher than that for the RF method in the majority of cases. We note that the proposed techniques exhibit a faster coverage ramp-up than RF and FA<sup>\*</sup>. This is a significant advantage as it decreases the test time in an abort-at-first-fail environment. Due to lack of space, only the graph for s9234 is presented in Fig. 4.



Fig. 3. Power reduction - Defect coverage tradeoff for s38417

| Circuit | Transition<br>RF | Transition<br>FA | Transition<br>FA<sup>\*</sup> | Transition<br>MFA | Transition<br>MFA+10 | Transition<br>MFA+20 | Bridging BCE<sup>+</sup><br>RF | Bridging BCE<sup>+</sup><br>FA | Bridging BCE<sup>+</sup><br>FA<sup>\*</sup> | Bridging BCE<sup>+</sup><br>MFA | Bridging BCE<sup>+</sup><br>MFA+10 | Bridging BCE<sup>+</sup><br>MFA+20 | Bridging 400K random<br>RF | Bridging 400K random<br>FA | Bridging 400K random<br>FA<sup>\*</sup> | Bridging 400K random<br>MFA | Bridging 400K random<br>MFA+10 | Bridging 400K random<br>MFA+20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| s5378        | 61.47                     | 55.48 | 55.18 | 56.95 | 61.34      | 61.81      | 95.20                   | 93.81 | 93.55 | 94.00 | 94.14      | 94.26      | 94.27                       | 92.19 | 92.08 | 92.54 | 92.79      | 92.91      |
| s9234        | 41.47                     | 40.82 | 41.01 | 43.26 | 44.04      | 48.32      | 87.51                   | 87.12 | 87.28 | 87.28 | 87.44      | 87.42      | 86.38                       | 85.49 | 85.77 | 86.00 | 86.24      | 86.34      |
| s13207       | 62.29                     | 60.31 | 61.00 | 64.05 | 65.43      | 65.90      | 92.77                   | 92.71 | 92.16 | 92.93 | 93.02      | 93.11      | 91.97                       | 91.30 | 91.14 | 91.73 | 91.80      | 92.06      |
| s15850       | 51.53                     | 51.05 | 50.33 | 52.51 | 52.56      | 54.03      | 94.24                   | 94.09 | 93.84 | 93.82 | 93.91      | 93.98      | 93.52                       | 93.11 | 93.00 | 93.03 | 93.15      | 93.29      |
| s38417       | 79.53                     | 76.20 | 76.22 | 78.48 | 79.09      | 80.38      | 98.20                   | 97.56 | 97.62 | 97.75 | 97.83      | 97.85      | 97.16                       | 96.28 | 96.32 | 96.60 | 96.68      | 96.73      |
| s38584       | 61.80                     | 61.07 | 60.83 | 61.67 | 62.17      | 62.08      | 90.31                   | 90.10 | 89.53 | 89.80 | 89.85      | 89.91      | 89.86                       | 89.58 | 89.29 | 89.50 | 89.55      | 89.56      |
| ac97_ctrl    | 42.62                     | 42.53 | 42.48 | 44.31 | 44.39      | 44.96      | 94.54                   | 94.11 | 94.09 | 94.24 | 94.29      | 94.38      | 96.94                       | 96.66 | 96.65 | 96.72 | 96.76      | 96.83      |
| mem_ctrl     | 40.96                     | 36.97 | 36.76 | 38.18 | 38.92      | 40.26      | 62.32                   | 59.72 | 59.59 | 60.09 | 60.34      | 60.70      | 74.56                       | 72.54 | 72.43 | 72.84 | 72.99      | 73.24      |
| pci_bridge32 | 64.40                     | 61.74 | 61.82 | 65.38 | 66.68      | 67.50      | 95.75                   | 95.41 | 95.46 | 95.62 | 95.67      | 95.71      | 96.61                       | 96.33 | 96.36 | 96.48 | 96.52      | 96.54      |
| tv80         | 53.47                     | 51.10 | 51.10 | 53.26 | 57.48      | 58.14      | 91.49                   | 91.00 | 90.99 | 91.11 | 91.15      | 91.12      | 89.30                       | 88.38 | 88.38 | 88.87 | 88.87      | 88.90      |
| usb_funct    | 63.94                     | 63.14 | 62.46 | 64.41 | 63.98      | 64.27      | 93.74                   | 93.21 | 93.34 | 93.39 | 93.44      | 93.49      | 95.14                       | 94.77 | 94.86 | 94.96 | 95.03      | 95.04      |
| ethernet     | 47.58                     | 46.74 | 46.79 | 48.24 | 48.58      | 49.07      | 88.81                   | 88.57 | 88.54 | 88.68 | 88.74      | 88.79      | 90.70                       | 90.35 | 90.36 | 90.53 | 90.52      | 90.56      |

TABLE IV. DEFECT COVERAGE (%)
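The "majority of cases" statement about transition fault coverage can be checked directly against Table IV. The short script below transcribes the RF, MFA, MFA+10 and MFA+20 transition-fault columns of the table and counts, per method, the circuits on which the proposed method is strictly higher than RF; the variable and function names are illustrative.

```python
# Transition-fault coverage (%) from Table IV: (RF, MFA, MFA+10, MFA+20).
TABLE_IV_TF = {
    "s5378": (61.47, 56.95, 61.34, 61.81),
    "s9234": (41.47, 43.26, 44.04, 48.32),
    "s13207": (62.29, 64.05, 65.43, 65.90),
    "s15850": (51.53, 52.51, 52.56, 54.03),
    "s38417": (79.53, 78.48, 79.09, 80.38),
    "s38584": (61.80, 61.67, 62.17, 62.08),
    "ac97_ctrl": (42.62, 44.31, 44.39, 44.96),
    "mem_ctrl": (40.96, 38.18, 38.92, 40.26),
    "pci_bridge32": (64.40, 65.38, 66.68, 67.50),
    "tv80": (53.47, 53.26, 57.48, 58.14),
    "usb_funct": (63.94, 64.41, 63.98, 64.27),
    "ethernet": (47.58, 48.24, 48.58, 49.07),
}


def count_wins_over_rf(table):
    """Count circuits where each proposed method strictly beats RF."""
    names = ("MFA", "MFA+10", "MFA+20")
    wins = {name: 0 for name in names}
    for rf, *proposed in table.values():
        for name, cov in zip(names, proposed):
            if cov > rf:
                wins[name] += 1
    return wins


if __name__ == "__main__":
    for method, wins in count_wins_over_rf(TABLE_IV_TF).items():
        print(f"{method}: higher than RF on {wins} of {len(TABLE_IV_TF)} circuits")
```

With the values above, MFA, MFA+10 and MFA+20 exceed RF on 7, 9 and 11 of the 12 circuits, respectively.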



Fig. 4. Transition delay fault coverage ramp-up for s9234

The next twelve columns in Table IV present the bridging fault coverage of the above-mentioned methods (the first six present the BCE<sup>+</sup> comparisons and the next six present the coverage of 400K random bridging faults). The results indicate that the proposed methods achieve higher coverage than FA and FA<sup>\*</sup>, approaching that of the RF method.

# V. CONCLUSION

We presented two novel X-filling methods, MFA and MFA+P, for reducing the power consumption during testing and for enhancing defect coverage. MFA considerably increases the defect coverage of the resulting (filled) test vectors compared to the power-efficient FA technique, with comparable average power consumption. MFA also ensures that peak power limits during response capture are not violated. Further improvements in defect coverage are achieved by the MFA+P technique, at the cost of a small increase in the average power consumption.

#### REFERENCES

- [1] K. Butler, et al., "Minimizing Power Consumption in Scan Testing: Pattern Generation and DFT Techniques", ITC 2004, pp. 355-364.
- [2] A. Chandra and K. Chakrabarty, "Low-power scan testing and test data compression for system-on-a-chip", IEEE Trans. on CAD, vol. 21, no. 5, May 2002, pp. 597-604.
- [3] A. Chandra and R. Kapur, "Bounded Adjacent Fill for Low Capture Power Scan Testing", 26<sup>th</sup> IEEE VTS, April 27-May 1 2008, pp. 131-138.
- [4] B. Chen, et al., "Response Inversion Scan Cell (RISC): A Peak Capture Power Reduction Technique", 16<sup>th</sup> IEEE ATS, 2007, pp. 425-432.
- [5] D. Czysz, et al., "Low Power Embedded Deterministic Test", 25<sup>th</sup> IEEE VTS, 2007, pp. 75-83.
- [6] D. Czysz, et al., "Low-Power Test Data Application in EDT Environment Through Decompressor Freeze", IEEE Trans. on CAD, vol. 27, no. 7, 2008, pp. 1278-1290.
- [7] P. Girard, et al., "A test vector inhibiting technique for low energy BIST design", 17<sup>th</sup> IEEE VTS, 1999, pp. 407-412.
- [8] P. Girard, "Low Power Testing of VLSI Circuits: Problems and Solutions", 1<sup>st</sup> ISQED, 2000, pp. 173-179.
- [9] X. Kavousianos and K. Chakrabarty, "Generation of Compact Test Sets with High Defect Coverage", DATE 2009, pp. 1130-1135.
- [10] S. Kajihara, K. Ishida and K. Miyase, "Test Vector Modification for Power Reduction during Scan Testing", 20<sup>th</sup> IEEE VTS, 2002, pp. 160-165.
- [11] H. Ko and N. Nicolici, "Automated Scan Chain Division for Reducing Shift and Capture Power During Broadside At-Speed Test", IEEE Trans. on CAD, vol. 27, no. 6, pp. 2092-2097.
- [12] J. Lee and N. Touba, "Low Power Test Data Compression Based on LFSR Reseeding", ICCD, 2004, pp. 180-185.
- [13] W. Li, S. Reddy and I. Pomeranz, "On Reducing Peak Current and Power During Test", IEEE Annual Symposium on VLSI, 2005, pp. 156-161.
- [14] J. Li, et al., "iFill: An Impact-Oriented X-Filling Method for Shift- and Capture-Power Reduction in At-Speed Scan-Based Testing", DATE 2008, p. 1184.
- [15] G. Mrugalski, et al., "New Test Data Decompressor for Low Power Applications", DAC 2007, pp. 539-544.
- [16] S. Remersaro, et al., "Scan Based Tests with Low Switching Activity", IEEE Design & Test of Computers, May-June 2007, pp. 268-275.
- [17] S. Remersaro, et al., "Preferred Fill: A Scalable Method to Reduce Capture Power for Scan Based Designs", ITC, 2006, pp. 1-10.
- [18] R. Sankaralingam, R. Oruganti and N. Touba, "Static compaction techniques to control scan vector power consumption", 18<sup>th</sup> IEEE VTS, 2000, pp. 35-40.
- [19] H. Tang, et al., "Defect Aware Test Patterns", DATE 2005, pp. 450-455.
- [20] Z. Wang and K. Chakrabarty, "Test-Quality/Cost Optimization Using Output-Deviation-Based Reordering of Test Patterns", IEEE Trans. on CAD, vol. 27, no. 2, 2008, pp. 352-365.
- [21] L. Wang, et al., "On the Decline of Testing Efficiency as Fault Coverage Approaches 100%", 13<sup>th</sup> IEEE VTS, 1995, pp. 74-83.
- [22] X. Wen, et al., "A Capture-Safe Test Generation Scheme for At-Speed Scan Testing", 13<sup>th</sup> IEEE ETS, 25-29 May 2008, pp. 55-60.
- [23] X. Wen, et al., "A New ATPG Method for Efficient Capture Power Reduction During Scan Testing", 24<sup>th</sup> IEEE VTS, 2006.
- [24] X. Wen, et al., "Low-capture-power test generation for scan-based at-speed testing", ITC, 2005, 10 pp.-1028.
- [25] X. Wen, et al., "A Highly-Guided X-Filling Method for Effective Low-Capture-Power Scan Test Generation", ICCD 2006, pp. 251-258.
- [26] X. Wen, et al., "Low Capture Switching Activity Test Generation for Reducing IR-Drop in At-Speed Scan Testing", JETTA, vol. 24, no. 4, 2008, pp. 379-391.
- [27] IWLS'05 bench. Circ., http://www.iwls.org/iwls2005/benchmarks.html.