# Mixed-signal VLSI Independent Component Analyzer for Hearing Aid Applications

Shuo Li and Milutin Stanaćević

*Abstract*— We present a mixed-signal architecture that implements independent component analysis (ICA) for blind separation of acoustic sources captured by a miniature microphone array. Matrix-vector multiplication is implemented by integrating switched current sources controlled by pulse-width modulated signals. Implemented in $0.5\mu$m CMOS technology, the proposed $3\times 3$ static ICA architecture occupies a chip area of $0.49$ mm$^2$ and consumes $80\mu$W from a 5 V supply.

## I. INTRODUCTION

A person can seamlessly focus on and understand a specific speaker under various levels of background noise. However, the performance of current state-of-the-art hearing aids, as well as of speech recognition software, deteriorates significantly when other speakers are present in the background. Smart sensing hearing aids could greatly benefit from robust speech separation in adverse acoustic environments.

We have proposed an algorithm that combines spatial sampling, sub-band processing and independent component analysis in a single framework to improve separation performance in moderately reverberant acoustic environments [1], [2]. The hardware implementation of this algorithm requires a 16-channel $3\times 3$ linear static ICA architecture, which places stringent constraints on the chip area and power consumption of each ICA channel. Previously proposed implementations of static ICA, in either the analog [3], [4] or the digital domain [5], [6], do not meet these constraints. In the proposed implementation, we exploit pulse-width modulation to implement matrix-vector multiplication [7], [8] in order to meet the area and power constraints.

## II. INDEPENDENT COMPONENT ANALYZER ARCHITECTURE

Independent component analysis (ICA) is a signal processing technique for finding statistically independent directions in multivariate data. It is often applied to blind source separation (BSS), where the task is to recover unknown sources $\mathbf{s}$ from their mixtures $\mathbf{x}$ without any prior information about them other than their mutual independence. We assume that the mixing is linear, $\mathbf{x} = \mathbf{A}\mathbf{s}$, and that there exists an unmixing matrix $\mathbf{W}$ that recovers the original sources, $\mathbf{y} = \mathbf{W}\mathbf{x}$, up to a scaling and permutation ambiguity.
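The scaling and permutation ambiguity can be illustrated numerically. The sketch below is a hypothetical $2\times 2$ example (the architecture in this paper is $3\times 3$; the matrices, sample values and the helper names `matmul` and `inv2` are assumptions chosen for illustration): any unmixing matrix of the form $\mathbf{W} = \mathbf{P}\mathbf{D}\mathbf{A}^{-1}$, with $\mathbf{P}$ a permutation and $\mathbf{D}$ a nonzero diagonal scaling, returns the sources reordered and rescaled.

```python
# Hypothetical 2x2 illustration of the scale/permutation ambiguity of BSS.
# A, s, P and D are made-up values, not taken from the paper.

def matmul(M, N):
    """Product of two matrices stored as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))]
            for i in range(len(M))]

def inv2(M):
    """Inverse of a 2x2 matrix (assumed non-singular)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 0.6], [0.4, 1.0]]                # mixing matrix
s = [[1.0, -2.0, 0.5], [0.3, 0.8, -1.0]]    # rows = sources, columns = samples
x = matmul(A, s)                            # observed mixtures x = A s

P = [[0.0, 1.0], [1.0, 0.0]]                # permutation: swap the outputs
D = [[2.0, 0.0], [0.0, -0.5]]               # arbitrary nonzero scaling
W = matmul(matmul(P, D), inv2(A))           # one valid unmixing matrix
y = matmul(W, x)                            # y = P D s

# Output 0 is source 1 scaled by -0.5; output 1 is source 0 scaled by 2.
print(y)
```

Any of these matrices is an equally valid ICA solution, which is why the separated outputs are only defined up to order and gain.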

The independent component analysis implementation comprises the vector-matrix multiplication  $y = Wx$  and adaptation of the unmixing matrix coefficients according to an ICA learning rule. A wide variety of ICA learning rules have been proposed in the literature [9]. We implemented the natural gradient learning rule [10]

$$
\Delta \mathbf{W} = \mu \left( \mathbf{I} - f(\mathbf{y}) \mathbf{y}^T \right) \mathbf{W} = \mu (\mathbf{W} - f(\mathbf{y}) \mathbf{z}^T), \quad (1)
$$

where $f(\cdot)$ is a nonlinear scalar function applied element-wise, which in the information-theoretic framework can be related to the cumulative distribution function of the unknown source signals. Since the proposed implementation targets acoustic separation of speech signals, which are approximately Laplacian distributed, the optimal choice of the nonlinearity is $f(\mathbf{y}) = \mathrm{sign}(\mathbf{y})$. For an efficient implementation, the feedback signal $\mathbf{z} = \mathbf{W}^T\mathbf{y}$ in the learning rule can be approximated by a 3-level staircase function with values $(-1, 0, +1)$, implemented through 2-bit quantization and denoted $q(\mathbf{z})$. Quantizing $\mathbf{z}$ in update rule (1) reduces the implementation of the update to a single-bit outer product.
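A behavioral sketch of one adaptation step of (1), with $f(\mathbf{y}) = \mathrm{sign}(\mathbf{y})$ and $\mathbf{z}$ replaced by its 3-level quantization $q(\mathbf{z})$. The threshold `th`, the step size `mu` and all function names are hypothetical; the hardware realizes the same outer product with time-encoded pulses and charge sharing rather than with arithmetic.

```python
def sign(v):
    """Sign nonlinearity f(y), suited to approximately Laplacian speech sources."""
    return 1 if v > 0 else (-1 if v < 0 else 0)

def quantize3(v, th):
    """3-level staircase q(z) in {-1, 0, +1}; th is an assumed threshold."""
    if v > th:
        return 1
    if v < -th:
        return -1
    return 0

def ica_update(W, x, mu, th):
    """One step of dW = mu * (W - f(y) q(z)^T), with y = W x and z = W^T y."""
    n = len(W)
    y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]   # y = W x
    z = [sum(W[i][j] * y[i] for i in range(n)) for j in range(n)]   # z = W^T y
    fy = [sign(v) for v in y]
    qz = [quantize3(v, th) for v in z]
    # Each element of the outer product is a sign times a 3-level value.
    return [[W[i][j] + mu * (W[i][j] - fy[i] * qz[j]) for j in range(n)]
            for i in range(n)]

W = [[1.0, 0.0], [0.0, 1.0]]
W = ica_update(W, x=[1.0, -0.2], mu=0.1, th=0.5)
print(W)
```

Because $f(y_i)\,q(z_j)$ can only be $-1$, $0$ or $+1$, the per-coefficient update needs no multiplier, which is what makes the compact adaptation cell possible.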

The block diagram of the proposed ICA architecture is shown in Figure 1, where $\langle x_i \rangle$ and $\langle y_i \rangle$ denote the pulse-width modulated signals controlled by the input signal $x_i$ and the output signal $y_i$, respectively.

## III. CIRCUIT IMPLEMENTATION

The ICA implementation comprises the vector-matrix multiplications $\mathbf{y} = \mathbf{W}\mathbf{x}$ and $\mathbf{z} = \mathbf{W}^T\mathbf{y}$, along with the adaptation of the unmixing matrix coefficients according to learning rule (1). Two main circuit blocks are described below: the adaptation cell, and the current integration followed by voltage-to-time conversion.

### *A. Learning Rule Implementation*

In the proposed implementation, the unmixing coefficients $W_{ij}$ are stored differentially as voltages $V_{ij}^+$ and $V_{ij}^-$ on two complementary switched current sources [11], as shown in Figure 2(a). For clarity, the figure omits the replica of current source $M_0$ that contributes to the currents $i_i^-$ and $i_j^-$ through the switches controlled by $\langle x_j^- \rangle$ and $\langle y_i^- \rangle$, respectively. The outer-product update rule (1) is implemented using two transistors, with the functions $f(\mathbf{y})$ and $q(\mathbf{z})$ time-encoded as illustrated in Figure 2(b).

\*This work was supported by NSF CAREER Award 0846265.

S. Li and M. Stanaćević are with the Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, NY 11794–2350, USA (e-mail: shuo.li, milutin.stanacevic at stonybrook.edu).



Fig. 1. Block diagram of the architecture of the Independent Component Analyzer with a $3\times 3$ unmixing matrix.

The proposed implementation enables fine updates of the unmixing coefficients with both positive and negative increments. The 3-level staircase function $q(\mathbf{z})$ is encoded by the presence or absence of a voltage pulse and by the relative position of that pulse. The function $f(\mathbf{y})$ is coded as a two-level signal, with $\mathrm{sign}(y)$ determining the order of the levels $V_{lo}$ and $V_{hi}$. These voltage levels are applied externally and set the adaptation rate $\mu$. To reduce silicon area, $C_w$ is implemented as a MOS capacitor with a total capacitance of 2 pF. When the *update* signal goes high, charge is shared between $C_w$ and the small parasitic capacitance $C_p$ of the drain/source diffusion between transistors $M_1$ and $M_2$. The resulting voltage on capacitor $C_w$ after the $n$-th update is given by

$$
V_{ij}^{+}[n+1] = V_{ij}^{+}[n] + \frac{C_p}{C_w + C_p}(V_{Aij}^{+}[n] - V_{ij}^{+}[n]) \quad (2)
$$

The common-mode component $\frac{1}{2}(W_{ij}^+ + W_{ij}^-)$ is regulated by the weight decay term $-\frac{C_p}{C_w + C_p}V_{ij}^{+}[n]$ on the right-hand side of (2), which pulls the values towards the center of the range.
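Equation (2) expresses charge conservation between $C_w$ and $C_p$: each update pulse moves the stored voltage a fraction $\alpha = C_p/(C_w + C_p)$ of the way toward $V_{Aij}^+$. The sketch below iterates this step. Only $C_w = 2$ pF comes from the text; the parasitic $C_p = 5$ fF and the voltages are assumed values, and charge injection and clock feedthrough are ignored.

```python
def charge_share(v_w, v_a, c_w, c_p):
    """Voltage on C_w after sharing charge with C_p precharged to v_a (eq. 2)."""
    alpha = c_p / (c_w + c_p)
    return v_w + alpha * (v_a - v_w)

C_W = 2e-12      # storage capacitor, from the text
C_P = 5e-15      # assumed parasitic diffusion capacitance
V_A = 3.0        # assumed voltage of node A during the update (V)
v = 2.5          # assumed initial stored voltage (V)

steps = [v]
for _ in range(100):
    v = charge_share(v, V_A, C_W, C_P)
    steps.append(v)

alpha = C_P / (C_W + C_P)
print(f"first increment: {(steps[1] - steps[0]) * 1e3:.3f} mV")
# The remaining gap shrinks geometrically: (V_A - v_n) = (1 - alpha)**n * (V_A - v_0).
```

With a small $C_p/C_w$ ratio each pulse changes the stored voltage by only about a millivolt, which is what allows fine coefficient updates.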

The effect of charge injection and clock feedthrough on the adaptation can be modeled as a constant offset plus a contribution that depends on the voltage $V_{ij}$, which scales the decay term in the learning rule. The current-source transistor $M_0$ is sized to operate in the subthreshold region, with the currents representing $W_{ij}$ ranging from 100 pA to 100 nA. The advantage of this nonlinear (exponential, in subthreshold) transformation of the stored voltages $V_{ij}$ into the currents representing the unmixing coefficients $W_{ij}$ is a wide dynamic range of coefficients over a limited linear range of stored voltages.
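The wide dynamic range follows from the exponential subthreshold characteristic $I \approx I_0\, e^{V/(n U_T)}$. This sketch estimates the stored-voltage swing needed to cover the quoted 100 pA to 100 nA range. The slope factor $n = 1.5$ and the 300 K temperature are assumptions, and the simple exponential model ignores drain-voltage effects; under these assumptions, three decades of current require only about 0.27 V of swing.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant (J/K)
Q_E = 1.602176634e-19    # elementary charge (C)

def thermal_voltage(temp_k=300.0):
    """U_T = kT/q in volts."""
    return K_B * temp_k / Q_E

def voltage_swing(i_min, i_max, n=1.5, temp_k=300.0):
    """Gate-voltage swing to span [i_min, i_max] in subthreshold, assuming
    I = I0 * exp(V / (n * U_T)); n is an assumed slope factor."""
    return n * thermal_voltage(temp_k) * math.log(i_max / i_min)

dv = voltage_swing(100e-12, 100e-9)   # 100 pA to 100 nA, the range quoted above
print(f"required swing: {dv * 1e3:.0f} mV")
```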

### *B. Matrix-Vector Multiplication*

The vector-matrix multiplications $\mathbf{y} = \mathbf{W}\mathbf{x}$ and $\mathbf{z} = \mathbf{W}^T\mathbf{y}$ are implemented by integrating switched currents controlled by pulse-width modulated signals. To minimize chip area, the two multiplications and the quantization of $\mathbf{z}$ are performed in three phases using the same integration and voltage-to-time conversion circuitry. In the first phase, $\mathbf{y}$ is computed along with the voltage-to-time conversion of $\mathbf{x}$; in the second phase, $\mathbf{z}$ is computed along with the voltage-to-time conversion of $\mathbf{y}$; and in the third phase, $\mathbf{z}$ is quantized through voltage-to-time conversion.

Fig. 2. (a) Circuit implementation of the ICA learning rule. (b) Time encoding of the functions $f(\mathbf{y})$ and $q(\mathbf{z})$ along with the timing of the update pulse.
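A behavioral sketch of the three-phase time multiplexing, in which one idealized integrate-and-convert routine (`shared_path`, a hypothetical name) serves both multiplications. It abstracts away the pulse-width encoding, assumes ideal linear integration, and uses an illustrative quantization threshold.

```python
def shared_path(rows, inputs):
    """Idealized shared integrator: weighted sum of the inputs for each row."""
    return [sum(w * v for w, v in zip(row, inputs)) for row in rows]

def three_phase_cycle(W, x, th):
    n = len(W)
    # Phase 1: y = W x, while x is converted to pulse widths.
    y = shared_path(W, x)
    # Phase 2: z = W^T y on the same circuitry, while y is converted to pulse widths.
    W_t = [[W[i][j] for i in range(n)] for j in range(n)]
    z = shared_path(W_t, y)
    # Phase 3: voltage-to-time conversion of z, followed by 3-level quantization.
    q = [1 if v > th else (-1 if v < -th else 0) for v in z]
    return y, z, q

y, z, q = three_phase_cycle([[1.0, 0.5], [0.0, 1.0]], [2.0, -2.0], th=0.5)
print(y, z, q)
```

In hardware, the transpose in phase 2 comes for free: the same current sources are simply routed to the integrators through the switches controlled by $\langle y_i \rangle$ instead of $\langle x_j \rangle$.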

Fig. 3. Integration of switched currents and voltage-to-time conversion circuitry.

The integration and voltage-to-time conversion circuitry is illustrated in Figure 3(a), with the corresponding clock timing of each switch shown in Figure 3(b). Clocks $\phi_1$ and $\phi_2$, as well as $\phi_5$ and $\phi_6$, are non-overlapping. Both the input and output signals are differential, as are the coefficients of the unmixing matrix:

$$
y_i^+ = \sum_{j=1}^3 \left(W_{ij}^+ x_j^+ + W_{ij}^- x_j^-\right), \quad (3)
$$

$$
y_i^- = \sum_{j=1}^3 \left(W_{ij}^- x_j^+ + W_{ij}^+ x_j^-\right). \quad (4)
$$
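Equations (3) and (4) can be checked with an idealized charge model: each switched current representing $W_{ij}^\pm$ is gated by a pulse whose width encodes $x_j^\pm$ and deposits a charge $I\,t$ on $C_{int}$. The 2 pF value of $C_{int}$ is taken from the circuit description; the currents and pulse widths are assumed values. The sketch confirms that the differential output $y_i^+ - y_i^-$ equals $\sum_j (W_{ij}^+ - W_{ij}^-)(x_j^+ - x_j^-)$, i.e. a four-quadrant product.

```python
def integrate_row(w_pos, w_neg, t_pos, t_neg, c_int=2e-12):
    """Differential integrator outputs (y_i^+, y_i^-) for one output row i.

    w_pos, w_neg: currents (A) representing W_ij^+ and W_ij^-.
    t_pos, t_neg: pulse widths (s) encoding x_j^+ and x_j^-.
    Each switched current deposits charge I * t on C_int (ideal integration).
    """
    terms = list(zip(w_pos, w_neg, t_pos, t_neg))
    q_pos = sum(wp * tp + wn * tn for wp, wn, tp, tn in terms)   # eq. (3)
    q_neg = sum(wn * tp + wp * tn for wp, wn, tp, tn in terms)   # eq. (4)
    return q_pos / c_int, q_neg / c_int

w_pos, w_neg = [10e-9, 2e-9, 5e-9], [4e-9, 6e-9, 5e-9]
t_pos, t_neg = [1.0e-6, 0.5e-6, 2.0e-6], [0.2e-6, 1.5e-6, 1.0e-6]
y_pos, y_neg = integrate_row(w_pos, w_neg, t_pos, t_neg)

# The differential output equals sum_j (W+ - W-)(x+ - x-) / C_int.
expected = sum((wp - wn) * (tp - tn)
               for wp, wn, tp, tn in zip(w_pos, w_neg, t_pos, t_neg)) / 2e-12
print(f"y+ - y- = {(y_pos - y_neg) * 1e3:.2f} mV")
```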

Current pulses are integrated on the 2 pF capacitor $C_{int}$. In the voltage-to-time converter, the input voltage precharges the integration capacitor $C_t$, and a current fed into the input node of the inverting high-gain amplifier then discharges it. Comparing the resulting decreasing voltage ramp at the amplifier output with a reference voltage $V_{comp}$ generates a pulse whose width is proportional to the input voltage. The high-gain amplifiers are implemented as cascoded amplifiers with PMOS input transistors operating in the subthreshold region. The integrator is also followed by a sample-and-hold circuit that holds the output signal $y_i$.
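An idealized model of the voltage-to-time conversion treats the ramp as a linear discharge of $C_t$ by a constant current and ignores the amplifier dynamics. The values of $C_t$ and the discharge current are assumptions. In this model the pulse width is proportional to the input voltage measured relative to $V_{comp}$.

```python
def pulse_width(v_in, v_comp, c_t, i_dis):
    """Idealized voltage-to-time conversion.

    The ramp starts at v_in and falls with slope i_dis / c_t; the pulse ends
    when the ramp crosses v_comp. Returns 0 if v_in <= v_comp.
    """
    if v_in <= v_comp:
        return 0.0
    return c_t * (v_in - v_comp) / i_dis

C_T = 1e-12      # assumed ramp capacitance (F)
I_DIS = 1e-6     # assumed discharge current (A), i.e. a slope of 1 V/us
widths = [pulse_width(v, 1.0, C_T, I_DIS) for v in (2.0, 3.0, 4.0)]
print([f"{w * 1e6:.1f} us" for w in widths])
```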

Since both pulse-width modulated outputs $\langle y_i^+ \rangle$ and $\langle y_i^- \rangle$ are available, the sign of $y_i$ is determined with a single D-latch. Generating the quantized signal $q(z_i)$ requires comparisons with a positive and a negative threshold voltage $V_{th}$. As with the output signal $y_i$, both pulse-width modulated signals $\langle z_i^+ \rangle$ and $\langle z_i^- \rangle$ are available. The comparison with the threshold voltage is performed by delaying one of these pulses before it reaches the input of the D-latch. Voltage $V_b$ sets the threshold voltage by controlling the delay time. A single comparison of $z_i$ with a threshold voltage is shown in Figure 4.
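A behavioral sketch of the pulse comparisons, assuming both pulses start at the same instant so that comparing values reduces to comparing widths. The function names and the tie-breaking rule are hypothetical; `delay` stands in for the delay set by $V_b$, and two such comparisons (one per pulse delayed) give the two thresholds $\pm V_{th}$.

```python
def sign_from_pulses(t_pos, t_neg):
    """Behavioral D-latch: the longer pulse is still high when the shorter one
    falls. Ties resolve to +1 (an arbitrary choice for this model)."""
    return 1 if t_pos >= t_neg else -1

def q_from_pulses(t_pos, t_neg, delay):
    """3-level q(z) from the <z+> and <z-> pulse widths.

    Delaying one pulse by `delay` before the latch is equivalent to comparing
    z against a threshold of delay times the ramp slope.
    """
    if t_pos > t_neg + delay:
        return 1
    if t_neg > t_pos + delay:
        return -1
    return 0

print(q_from_pulses(3e-6, 1e-6, 0.5e-6),
      q_from_pulses(1e-6, 1.2e-6, 0.5e-6),
      q_from_pulses(1e-6, 2e-6, 0.5e-6))
```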



Fig. 4. Implementation of the comparison of signal $z_i$ with a threshold voltage for generation of the quantized signal $q(z_i)$.



Fig. 5. Layout of the proposed implementation in  $0.5 \mu$ m CMOS technology.

## IV. SIMULATION RESULTS

The proposed architecture was implemented in $0.5\mu$m 3M2P CMOS technology; the layout is shown in Figure 5. The total area of the $3\times 3$ static ICA implementation is 0.49 mm$^2$. Circuit simulations were performed on the extracted layout.

To demonstrate the adaptation process, we simulated the adaptation cell shown in Figure 2 with a constant update sign. The resulting incremental values of the unmixing coefficient, represented by the current of transistor $M_0$, are shown in Figure 6.

Figure 7 shows the integrator output voltage $y_1$ for three different values of the unmixing coefficient $W_{11}$, with the current sources representing all other unmixing coefficients switched off. The input voltage $x_1$ is swept from 1 V to 4 V. The measured linearity of the matrix-vector multiplication is 0.05%.

The proposed ICA implementation for acoustic source separation in hearing aid applications was modeled in Matlab. To demonstrate the separation performance, speech signals from two sources were artificially generated as received by a four-microphone array, with a distance of 1 cm between opposing microphones and a sampling frequency of 16 kHz. The incidence angles of the two speech sources were $30^{\circ}$ and $70^{\circ}$.
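To put the 1 cm aperture in perspective, the sketch below computes the far-field time difference $\tau = d\cos\theta / c$ between two opposing microphones for the two incidence angles. The speed of sound $c = 343$ m/s and measuring $\theta$ from the axis of the microphone pair are assumptions not stated in the text. Under them, the delays are roughly 25 µs and 10 µs, well below one 62.5 µs sampling period at 16 kHz, so the inter-microphone delays are sub-sample.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed (air at about 20 C)
D_PAIR = 0.01            # distance between opposing microphones (m), from the text
FS = 16e3                # sampling frequency (Hz), from the text

def pair_delay(d, theta_deg, c=SPEED_OF_SOUND):
    """Far-field delay between two microphones spaced d apart, for a plane wave
    arriving at theta_deg measured from the axis of the pair (assumed convention)."""
    return d * math.cos(math.radians(theta_deg)) / c

for theta in (30, 70):
    tau = pair_delay(D_PAIR, theta)
    print(f"{theta} deg: {tau * 1e6:.1f} us = {tau * FS:.2f} samples")
```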



Fig. 6. Successive incremental updates of the unmixing coefficient in a single direction.



Fig. 7. Linearity of the matrix-vector multiplication for three different values of the unmixing coefficient.

Two first-order spatial gradient signals were obtained [2] and used as inputs to the model of the proposed ICA implementation. White, spatially uncorrelated Gaussian noise was added to each sensor signal. The separation performance is quantified as the signal-to-interference ratio (SIR) of the output signals, computed as

$$
SIR = -10 \log_{10} \min_{i} \frac{\sum_{j} \langle y_{ij}^{2} \rangle - \max_{j} \langle y_{ij}^{2} \rangle}{\max_{j} \langle y_{ij}^{2} \rangle}, \quad (5)
$$

where $y_{ij}$ is the contribution of source $j$ to output signal $i$, and $\langle \cdot \rangle$ denotes time averaging. The SIR for different signal-to-noise ratios (SNR) of the sensor signals is shown in Figure 8.
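Equation (5) maps directly to code. In this sketch, the per-source output powers $\langle y_{ij}^2 \rangle$ are assumed to be already available (in a model they can be obtained by passing each source through the separation system on its own); the example power matrix is made up. As written in (5), the minimum over $i$ selects the output with the smallest interference-to-target ratio.

```python
import math

def sir_db(power):
    """SIR of eq. (5).

    power[i][j] is <y_ij^2>, the average power of source j's contribution to
    output i. For each output, interference is the total power minus the
    largest (target) component; eq. (5) takes the minimum ratio over outputs.
    """
    ratios = []
    for row in power:
        target = max(row)
        ratios.append((sum(row) - target) / target)
    min_ratio = min(ratios)
    if min_ratio <= 0.0:
        return math.inf       # no residual interference in the selected output
    return -10.0 * math.log10(min_ratio)

print(f"{sir_db([[1.0, 0.01], [0.02, 1.0]]):.1f} dB")
```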

## V. CONCLUSION

We have presented the architecture and circuit implementation of an independent component analyzer for use in a blind acoustic source separation microsystem with a microphone array for hearing aid applications. The proposed pulse-width modulation implementation is power- and silicon-area efficient; it can be used to realize multi-channel subband blind source separation and can be extended to other neural network applications.

Fig. 8. The separation performance expressed as SIR in the output signals for two incident speech signals on a miniature microphone array.

## **REFERENCES**

- [1] S. Li and M. Stanaćević, "Subband Gradient Flow Acoustic Source Separation for Moderate Reverberation Environment," *Conf. Rec. of the 46th Asilomar Conference on Signals, Systems and Computers*, Pacific Grove, CA, Nov. 2012.
- [2] S. Li, Y. Lin and M. Stanaćević, "Mixed-signal VLSI Microsystem for Acoustic Source Separation," *Proc. 56th IEEE Midwest Symp. on Circuits and Systems (MWSCAS'2013)*, Columbus, OH, 2013.
- [3] M. H. Cohen and A. G. Andreou, "Analog CMOS Integration and Experimentation with an Autoadaptive Independent Component Analyzer," *IEEE Trans. Circuits and Systems II*, vol. 42 (2), pp. 65-77, Feb. 1995.
- [4] A. Celik, M. Stanaćević and G. Cauwenberghs, "Mixed-signal real-time adaptive blind source separation," *Proc. IEEE Int. Symp. Circuits Syst.*, pp. 760-763, 2004.
- [5] K. K. Shyu, M. H. Lee, Y. T. Wu and P. L. Lee, "Implementation of pipelined FastICA on FPGA for real-time blind source separation," *IEEE Trans. on Neural Networks*, vol. 19 (6), pp. 958-970, 2008.
- [6] L.-D. Van, D.-Y. Wu and C.-S. Chen, "Energy-Efficient FastICA Implementation for Biomedical Signal Separation," *IEEE Trans. on Neural Networks*, vol. 22 (11), pp. 1809-1822, 2011.
- [7] K. Papathanasiou, T. Brandtner and A. Hamilton, "Palmo: pulse-based signal processing for programmable analog VLSI," *IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 49 (6), pp. 379-389, 2002.
- [8] J.-C. Bor and C.-Y. Wu, "Realization of the CMOS pulsewidth modulation (PWM) neural network with on-chip learning," *IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 45 (1), pp. 96-107, 1998.
- [9] A. Cichocki and S. Amari, *Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications*, New York: John Wiley, 2002.
- [10] S. Amari, A. Cichocki and H. Yang, "A new learning algorithm for blind signal separation," *Adv. Neural Information Processing Systems*, MIT Press, Cambridge, MA, vol. 8, pp. 757-763, 1996.
- [11] M. Stanaćević and G. Cauwenberghs, "Charge-Based CMOS FIR Adaptive Filter," *Proc. 43rd IEEE Midwest Symp. Circuits and Systems (MWSCAS'2000)*, Lansing MI, August 8-11, 2000.