Email: yinlongxiang@ict.ac.cn
Biography:
Longxiang Yin has long been engaged in the design and optimization of intelligent computing systems. He has led teams to develop several prototypes of intelligent computing systems based on key domestic computing devices, including Phytium and Hygon CPUs, Cambricon NPUs, and Fudan Microelectronics FPGAs, tackling key technologies for high-compute-density integration and serving applications in industrial intelligence and spaceborne intelligence. Based on compute-in-memory chips developed in-house at the Chinese Academy of Sciences, he completed a prototype of an OAM-compliant compute-in-memory intelligent computing system, tackling key technologies for system-level integration of compute-in-memory chips. He has published more than 20 papers in high-level journals and conferences in computer science and electronic information, including DAC, EDL, and IEDM.
For more on my work, please visit https://yinlongsan.github.io
Research Directions:
(1) Computing system design, hardware design, and device reliability design for space-based storage;
(2) Design and optimization of intelligent computing systems in distributed scenarios;
(3) Design and optimization of disaggregated memory systems.
We are now recruiting outstanding candidates (Master's students and interns) with a strong passion for scientific research!
Send your CV to: yinlongxiang@ict.ac.cn
I graduated from Northwestern Polytechnical University (Xi’an) with a bachelor's degree in Electrical Engineering in 2012 and from Peking University (Beijing) with a doctoral degree in Microelectronics in 2018. I then joined the Institute of Computing Technology, Chinese Academy of Sciences, and was promoted to associate professor in 2021. With more than ten years of experience in Electrical Engineering, I now do research in computer architecture, a thriving field that aims to make better use of electronic devices and circuits. I have experience in building various prototypes, including a Monte Carlo device simulator, an Open-Accelerator-Module-based compute-in-memory server system, and an edge AI computing system. My research interests include disaggregated memory systems, binary translation for heterogeneous computing, and LLM edge computing systems.
My homepage: https://yinlongsan.github.io
Research Interests:
- LLM edge computing system
LLM edge computing system design addresses the problem of deploying and running LLMs in resource-constrained edge environments. It aims to achieve efficient, low-latency, secure, and reliable model inference (and sometimes lightweight training) under tight constraints on computing power, memory, power consumption, and network bandwidth, allowing intelligence to be truly decentralized to the source of data while meeting critical requirements for real-time response, bandwidth conservation, and privacy protection. The central challenge in this field is the conflict between model size and resource constraints. Models with billions or even tens of billions of parameters require several to tens of GB of memory for their weights alone, far exceeding the capacity of most edge devices. The computational cost of generative inference is enormous, and the limited processing power of edge devices makes it difficult to meet real-time requirements. Edge devices are typically battery-powered, and the intensive computation of large models can dramatically shorten battery life, placing extremely high demands on computational energy efficiency (TOPS/W). Finally, unstable network connections and the diversity of hardware platforms and input data place high demands on system adaptability and robustness.
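The size-versus-capacity gap described above can be illustrated with back-of-the-envelope arithmetic. The parameter count, device memory budget, and precision choices below are generic illustrative assumptions, not measurements of any particular model or device:

```python
# Weight-memory footprint of an LLM at different numeric precisions,
# compared against an assumed edge device's memory budget.

def weight_footprint_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the model weights alone, in GB."""
    return n_params * bytes_per_param / 1e9

N_PARAMS = 7e9          # a 7-billion-parameter model (illustrative)
EDGE_MEMORY_GB = 8.0    # assumed edge-device RAM budget (illustrative)

for label, bytes_pp in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    gb = weight_footprint_gb(N_PARAMS, bytes_pp)
    verdict = "fits" if gb <= EDGE_MEMORY_GB else "does NOT fit"
    print(f"{label}: {gb:4.1f} GB of weights -> {verdict} in {EDGE_MEMORY_GB:.0f} GB")
```

Under these assumptions, even a modest 7B model needs 14 GB at FP16, which already exceeds an 8 GB device; this is why quantization (INT8/INT4) is a standard lever in edge LLM deployment.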
- Disaggregated memory system
Disaggregated memory is considered a cornerstone technology for next-generation cloud data centers and is expected to fundamentally address the "memory wall" problem. In this approach, memory resources that are traditionally tightly coupled to computing units such as CPUs are decoupled to form an independent, scalable memory resource pool. This pool provides on-demand, transparent memory access to all compute units over high-speed interconnects such as Remote Direct Memory Access (RDMA), Peripheral Component Interconnect Express (PCIe), and Compute Express Link (CXL). In this scenario, compute nodes no longer monopolize their local memory but can access arbitrary memory resources through the network/interconnect with low latency, just as if it were local memory. This enables global sharing and flexible scheduling of memory resources and makes computing system upgrades more cost-effective. However, significant challenges remain. Network/interconnect transmission latency is substantially higher than that of local memory access, and hiding this gap requires complex system software and runtime support. Moreover, resource management and scheduling are more complex in disaggregated memory systems than in traditional computing systems, requiring efficient handling of global memory allocation, data consistency, fault tolerance, and fault isolation.
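The latency gap described above is often reasoned about with a simple weighted-average model: if a fraction of accesses hit a local DRAM tier and the rest go to the remote pool, the average access latency sits between the two extremes. The latency numbers below are illustrative assumptions, not measurements of any specific RDMA or CXL fabric:

```python
# Minimal average-access-latency model for a compute node that keeps a
# local DRAM tier in front of a disaggregated (remote) memory pool.

LOCAL_NS = 100.0     # assumed local DRAM access latency (ns)
REMOTE_NS = 2000.0   # assumed remote-pool access latency over RDMA/CXL (ns)

def effective_latency_ns(local_hit_ratio: float) -> float:
    """Average latency when local_hit_ratio of accesses hit local memory."""
    return local_hit_ratio * LOCAL_NS + (1.0 - local_hit_ratio) * REMOTE_NS

for hit in (0.5, 0.9, 0.99):
    print(f"local hit ratio {hit:.2f}: {effective_latency_ns(hit):.0f} ns average")
```

The model makes the system-software challenge concrete: with a 20x local/remote gap, even a 90% local hit ratio leaves average latency nearly 3x that of local DRAM, which is why caching, prefetching, and hot-page placement dominate disaggregated-memory runtime design.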
Group openings: We are currently recruiting outstanding candidates (Master's students and interns) with a strong passion for scientific research!
Selected Publications:
[1] Zhenyu Quan, Longxiang Yin, Xiaoming Chen, Yinhe Han, “OODAFlow: A Streaming Data Processing Framework for Intelligent Unmanned Systems,” High Technology Letters (高技术通讯), vol. 34, no. 9, 2024.
[2] Mengjun Shang, Longxiang Yin, Ning Xu, “Degradation analysis and optimization of temperature effect on memristor-based Neural Network Accelerators by electro-thermal simulation,” Journal of Physics: Conference Series, vol. 1812, pp. 012025, 2021.
[3] Yuan Liang, Longxiang Yin, Ning Xu, “A Field Programmable Process-In-Memory Architecture Based on RRAM Technology,” in Proc. of 2020 5th IEEE International Conference on Mechanical, Control and Computer Engineering (ICMCCE), pp. 2323–2326, 2020.
[4] Xiaoming Chen, Longxiang Yin, Bosheng Liu and Yinhe Han, “Merging Everything (ME): A Unified FPGA Architecture Based on Logic-in-Memory Techniques,” in Proc. of 2019 ACM/IEEE Design Automation Conference (DAC), pp. 1-2, 2019. (CCF-A)
[5] Longxiang Yin, Gang Du and Xiaoyan Liu, “Impact of Ambient Temperature on the Self-heating Effects in FinFETs,” Journal of Semiconductors, vol. 39, no. 9, pp. 094011, 2018.
[6] Longxiang Yin, Lei Shen, Shaoyan Di, Gang Du and Xiaoyan Liu, “Investigation of thermal effects on FinFETs in the quasi-ballistic regime,” Japanese Journal of Applied Physics, vol. 57, no. 4S, pp. 04FD14, 2018.