Amitabh Yadav

Recently reviewed

LogFlex: Flexible-bit Log Arithmetic Accelerator for Language Models on Edge

Yujin Kim, Faraz Tahmasebi, Gunjae Koo, Hyoukjun Kwon — Korea University, UC Irvine

The authors apply a low-precision (8-bit) logarithmic number system (LNS) and adaptively split the bits between the integer and fraction parts according to the data distribution, which yields near-FP16 accuracy and perplexity. They also co-design the LNS arithmetic and the accelerator architecture, which results in 33% less energy than an FP8 (E4M3) accelerator at an area similar to an INT8 accelerator, while delivering 30% lower perplexity than FP8 (E4M3).

IEEE Micro Special Issue · 2026
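
Note on the LNS idea above: in a log number system a value is stored as a sign plus a fixed-point log2 magnitude, so a multiply becomes an add of the stored logs, and the integer/fraction split trades range against precision. The sketch below only illustrates that general idea and is not LogFlex's design; the names `lns_encode`, `lns_decode`, `lns_mul` and the default of 4 fraction bits are assumptions made for this example.

```python
import math

def lns_encode(x, frac_bits=4):
    """Encode nonzero x as (sign, fixed-point log2|x|).

    frac_bits is a hypothetical parameter: the number of fractional bits
    kept in the stored logarithm.
    """
    if x == 0:
        # Real LNS formats reserve a special code for zero; omitted here.
        raise ValueError("zero is not representable in this sketch")
    sign = -1 if x < 0 else 1
    log_fixed = round(math.log2(abs(x)) * (1 << frac_bits))
    return sign, log_fixed

def lns_decode(code, frac_bits=4):
    """Convert (sign, fixed-point log) back to a float."""
    sign, log_fixed = code
    return sign * 2.0 ** (log_fixed / (1 << frac_bits))

def lns_mul(a, b):
    """Multiply two LNS values: signs multiply, logs add."""
    return a[0] * b[0], a[1] + b[1]

product = lns_decode(lns_mul(lns_encode(4.0), lns_encode(-0.5)))

# More fraction bits give a finer grid in the log domain, so values that are
# not powers of two are represented with a smaller rounding error.
err_2_bits = abs(lns_decode(lns_encode(3.0, 2), 2) - 3.0)
err_5_bits = abs(lns_decode(lns_encode(3.0, 5), 5) - 3.0)
```

With a fixed total width, every bit moved to the fraction is a bit taken from the integer part (range), which is the trade-off an adaptive split would tune per data distribution.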

An 8T-SRAM for Variability Tolerance and Low-Voltage Operation in High-Performance Caches

Leland Chang et al. — IBM T. J. Watson Research Center

An eight-transistor (8T) SRAM cell is proposed to improve variability tolerance and low-voltage operation in high-speed SRAM caches. The design needs no secondary or dynamic power supplies. The proposed 8T solution is demonstrated in a high-performance 32 kb subarray designed in 65 nm PD-SOI (Partially-Depleted Silicon-On-Insulator) CMOS that operates at 5.3 GHz at 1.2 V and at 295 MHz at 0.41 V.

Journal of Solid-State Circuits (JSSC) · 2008

Must Read

A 16nm 72kb 120.5TFLOPS/W Versatile-Format Dual-Representation Gain-Cell CIM Macro for General Purpose AI Tasks

J-C. Tien et al. — National Tsing Hua University and TSMC in Hsinchu, Taiwan

The paper integrates a reconfigurable 2's-complement and sign-magnitude scheme into a versatile-format CIM macro that supports MX, LNS, FP, and INT formats for MAC operations.
Stats: 16 nm, 72 kb gain-cell array; energy efficiency of 120.5 TFLOPS/W and throughput density of 3.18 TOPS/mm² in MXINT8 mode.

IEEE International Solid-State Circuits Conference (ISSCC) · 2026

Good

Amdahl's law for tail latency

Christina Delimitrou and Christos Kozyrakis — Stanford University

Queueing-theoretic models can guide design trade-offs in systems targeting tail latency, not just average performance.

Communications of the ACM · 2018
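
Note on the summary above: a very small example of a queueing model that separates tail from average is the textbook M/M/1 queue, where response time is exponentially distributed with rate (service rate minus arrival rate). This is a generic illustration under that assumption, not the model used in the paper; the function names are made up for this sketch.

```python
import math

def mm1_mean_latency(arrival_rate, service_rate):
    """Mean response time of a stable M/M/1 queue (FCFS)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival_rate must be < service_rate")
    return 1.0 / (service_rate - arrival_rate)

def mm1_latency_percentile(arrival_rate, service_rate, p):
    """p-th quantile (0 < p < 1) of M/M/1 response time.

    Response time is exponential with rate (service_rate - arrival_rate),
    so the quantile is -ln(1 - p) divided by that rate.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival_rate must be < service_rate")
    return -math.log(1.0 - p) / (service_rate - arrival_rate)

# Example: 80 requests/s arriving at a server that completes 100 requests/s.
mean = mm1_mean_latency(80.0, 100.0)
p99 = mm1_latency_percentile(80.0, 100.0, 0.99)
```

In this model the 99th percentile is always ln(100), roughly 4.6, times the mean, so a design that looks fine on average latency can still miss a tail target.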

The tail at scale

Jeffrey Dean and Luiz André Barroso — Google

Software techniques that tolerate latency variability are vital to building responsive large-scale web services.

Communications of the ACM · 2013

Must Read
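
Note on why latency variability matters at scale: when a request fans out to many servers and must wait for all of them, the chance that at least one is slow grows quickly with the fan-out. The sketch below assumes servers are slow independently of each other; `prob_slow_request` is a name made up for this example.

```python
def prob_slow_request(fanout, p_slow_per_server):
    """Probability that at least one of `fanout` servers responds slowly.

    Assumes each server is slow independently with probability
    p_slow_per_server, and that the request waits for every server.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if not 0.0 <= p_slow_per_server <= 1.0:
        raise ValueError("p_slow_per_server must be a probability")
    return 1.0 - (1.0 - p_slow_per_server) ** fanout

# A 1-in-100 slow response per server becomes the common case at fan-out 100.
single = prob_slow_request(1, 0.01)
fanned_out = prob_slow_request(100, 0.01)
```

With a fan-out of 100 and a 1% chance of a slow response per server, about 63% of requests see at least one slow response.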

Literature Map

Categories

Recently reviewed

LogFlex: Flexible-bit Log Arithmetic Accelerator for Language Models on Edge

An 8T-SRAM for Variability Tolerance and Low-Voltage Operation in High-Performance Caches

A 16nm 72kb 120.5TFLOPS/W Versatile-Format Dual-Representation Gain-Cell CIM Macro for General Purpose AI Tasks

Amdahl's law for tail latency

The tail at scale
