Open Access

ITM Web Conf., Volume 74, 2025
International Conference on Contemporary Pervasive Computational Intelligence (ICCPCI-2024)

Article Number: 02007
Number of page(s): 8
Section: Cybersecurity, Networks, and Computing Technologies
DOI: https://doi.org/10.1051/itmconf/20257402007
Published online: 20 February 2025