Literature Map

An evolving index of research papers and books I've read, with my notes on what matters and why.

Categories

Compute-in-Memory (1) · Computer Arithmetic (1) · Datacenter Computing (3) · Digital CIM (1) · Logarithmic Number System (LNS) (1) · Processor (1) · Quantization (1) · SRAM (1)

Recently reviewed

  1. LogFlex: Flexible-bit Log Arithmetic Accelerator for Language Models on Edge

    Yujin Kim, Faraz Tahmasebi, Gunjae Koo, Hyoukjun Kwon — Korea University, UC Irvine

    The authors apply low-precision (8-bit) LNS and adaptively assign bits to the integer and fraction parts depending on the data distribution, which enables near-FP16 accuracy and perplexity. They also co-design the LNS arithmetic and the accelerator architecture, which yields 33% less energy than an FP8 (E4M3) accelerator at an area similar to an INT8 accelerator, while delivering 30% lower perplexity than FP8 (E4M3).

    IEEE Micro Special Issue · 2026

  2. An 8T-SRAM for Variability Tolerance and Low-Voltage Operation in High-Performance Caches

    Leland Chang et al. — IBM T. J. Watson Research Center

    An eight-transistor (8T) SRAM cell is proposed to improve variability tolerance and low-voltage operation in high-speed SRAM caches. The design needs no secondary or dynamic power supplies. The 8T cell is demonstrated in a high-performance 32 kb subarray designed in 65 nm PD-SOI (Partially-Depleted Silicon-On-Insulator) CMOS that operates at 5.3 GHz at 1.2 V and at 295 MHz at 0.41 V.

    Journal of Solid-State Circuits (JSSC) · 2008

    Must Read

  3. A 16nm 72kb 120.5TFLOPS/W Versatile-Format Dual-Representation Gain-Cell CIM Macro for General Purpose AI Tasks

    J.-C. Tien et al. — National Tsing Hua University and TSMC, Hsinchu, Taiwan

    Integrates a reconfigurable 2’s-complement and sign-magnitude scheme into a versatile-format CIM macro that supports MX, LNS, FP, and INT formats for MAC operations.
    Stats: 16 nm, 72 kb gain-cell array; energy efficiency of 120.5 TFLOPS/W and throughput density of 3.18 TOPS/mm² in MXINT8 mode.

    IEEE International Solid-State Circuits Conference (ISSCC) · 2026

    Good

  4. Amdahl's Law for Tail Latency

    Christina Delimitrou and Christos Kozyrakis — Stanford University

    Queueing-theoretic models can guide design trade-offs in systems that target tail latency rather than only average performance.

    Communications of the ACM · 2018

  5. The Tail at Scale

    Jeffrey Dean and Luiz André Barroso — Google

    Software techniques that tolerate latency variability are vital to building responsive large-scale web services.

    Communications of the ACM · 2013

    Must Read

© 2026 Amitabh Yadav. All rights reserved.