A 0.44-μJ/dec, 39.9-μs/dec, Recurrent Attention In-Memory Processor for Keyword Spotting

Contributions:

A recurrent attention model (RAM) algorithm [11] for Keyword Spotting (KWS).
The KeyRAM algorithm allows accuracy-vs-energy scalability via a confidence-based computation scheme.
Multi-bit, multi-bank – 2 banks – IMC architecture with 4-bit matrix-vector multiplies, alongside a digital co-processor.
Sparsity-aware summation scheme – what are the challenges for IMC when doing sparse summations?
Digital co-processor employs diagonal-major weight storage to compute without any stalls – what is that?
Metrics: energy delay product?
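
A minimal sketch of what a confidence-based computation scheme could look like, assuming the model emits class scores after every recurrent step and stops once the top softmax probability crosses a threshold. The function names, the threshold value, and the stopping rule are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_early_exit(step_logits, threshold=0.9):
    """Return (predicted_class, steps_used).

    step_logits: one list of class scores per recurrent step
    (hypothetical stand-in for the model output at each step).
    threshold: assumed confidence level at which computation stops.
    """
    if not step_logits:
        raise ValueError("need at least one step of logits")
    for step, logits in enumerate(step_logits, start=1):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            break  # confident enough: skip the remaining steps
    # If no step crossed the threshold, the last step's result is used.
    return probs.index(confidence), step


if __name__ == "__main__":
    logits = [[0.1, 0.2, 0.0], [0.5, 3.0, 0.2], [0.1, 9.0, 0.0]]
    print(classify_with_early_exit(logits, threshold=0.8))
    print(classify_with_early_exit(logits, threshold=0.9))
```

A lower threshold exits after fewer steps (less energy per decision, possibly lower accuracy); a higher threshold runs more steps. That is the accuracy-vs-energy knob in this sketch.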

Note: [12] – conference paper version of this paper.

Keyword Spotting flow: feature extraction → classification.

IMC papers: [13]–[22]; [23] – where IMC was first proposed.

Digital Low-Power Techniques:

a. Voltage over-scaling [4], [5], [7] and [2].

b. [6] RNN-IMC using a 65 nm SRAM macro – Google speech dataset [8].

Depth-wise separable CNN [9]; RNN for KWS implemented as an IC with the lowest power consumption [7].

c. [10] - signal processing, can be interesting to read. Voice activity detector (VAD), $P_{VAD} = 200$ nW. Trick: add a VAD to power-gate the KWS engine.
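
A rough back-of-the-envelope sketch of why the VAD trick helps, assuming the VAD is always on at 200 nW, the KWS engine costs 0.44 μJ per decision (the title figure), and a gated engine draws no power. The speech fraction and decision rate are hypothetical inputs, and whether the 0.44 μJ figure covers the whole KWS pipeline is not something these notes establish.

```python
E_DEC_J = 0.44e-6   # energy per decision, from the paper title
P_VAD_W = 200e-9    # VAD power, from [10] as noted above


def average_power_w(speech_fraction, decisions_per_second):
    """Average power with the KWS engine power-gated by the VAD.

    speech_fraction: assumed fraction of time the VAD reports speech (0..1).
    decisions_per_second: assumed decision rate while speech is present.
    Assumes zero leakage in the gated engine and no wake-up overhead.
    """
    if not 0.0 <= speech_fraction <= 1.0:
        raise ValueError("speech_fraction must be in [0, 1]")
    if decisions_per_second < 0:
        raise ValueError("decisions_per_second must be non-negative")
    return P_VAD_W + speech_fraction * decisions_per_second * E_DEC_J


if __name__ == "__main__":
    gated = average_power_w(speech_fraction=0.1, decisions_per_second=10)
    ungated = 10 * E_DEC_J  # engine always running, no VAD
    print(f"gated: {gated * 1e9:.0f} nW, always-on: {ungated * 1e9:.0f} nW")
```

With these made-up inputs the gated system averages about 640 nW against 4400 nW for an always-on engine; the gain shrinks as the speech fraction approaches 1.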

CIM Macro Specs: 96 × 512 6T SRAM cells, based on [13] and [14], mixed-signal multi-bit dot product processor.
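
To get a feel for a multi-bit dot product on 1-bit cells, here is a generic bit-plane sketch: each 4-bit weight is split across four 1-bit planes, a partial dot product is computed per plane, and the partial sums are recombined with binary weighting. It assumes unsigned weights and integer inputs and is only an arithmetic illustration, not the mixed-signal circuit of the paper or of [13], [14]; all names are hypothetical.

```python
def bit_planes(weights, n_bits=4):
    """Split unsigned n_bits-wide weights into 1-bit planes, LSB first."""
    for w in weights:
        if not 0 <= w < (1 << n_bits):
            raise ValueError(f"weight {w} does not fit in {n_bits} bits")
    return [[(w >> b) & 1 for w in weights] for b in range(n_bits)]


def multibit_dot(weights, inputs, n_bits=4):
    """Dot product computed one weight bit-plane at a time."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    total = 0
    for b, plane in enumerate(bit_planes(weights, n_bits)):
        # Per-plane sum: what a column of 1-bit cells would accumulate.
        partial = sum(bit * x for bit, x in zip(plane, inputs))
        # Binary weighting of the plane's contribution.
        total += partial * (1 << b)
    return total


if __name__ == "__main__":
    w = [3, 15, 0, 8]
    x = [1, 2, 3, 4]
    print(multibit_dot(w, x), sum(wi * xi for wi, xi in zip(w, x)))
```

Under the same assumption of 4-bit weights at one bit per cell, a 96 × 512 array (49,152 cells) would hold 12,288 weights; whether the macro is organised that way is not stated in these notes.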