What are the unique features of OpenAI's self-developed chips? II

Release time: 2023-10-10 | Author source: Slkor | Browse: 919

What are the unique features of OpenAI's self-developed chips?

Many technology companies now develop their own chips. So what sets OpenAI's self-developed chips apart from those of companies such as Google and Amazon?

First, OpenAI's self-developed chips are purely for training its own models, unlike the commercial model of companies such as Google and Amazon, which design chips for customers to use on their cloud servers. For a cloud provider, the customers' usage scenarios, software stacks, and the specific models to be trained are all uncertain, so the chip design must prioritize compatibility, often sacrificing efficiency and performance on any given training task. OpenAI's chips, by contrast, are for its own use only, and the workload is very specific: large language models based on the Transformer. OpenAI also fully controls the software stack, so the design can be highly targeted.

The second difference lies in OpenAI's deep understanding of the models. OpenAI is a leading company in the field of generative models, and the GPT series remains the best-performing family of generative models today. Years of accumulated research mean OpenAI deeply understands the design space of generative models, giving it the capability and knowledge to do chip-model co-design: shaping models around the characteristics of the chip, and setting the chip's design targets according to the needs of the models, including the trade-offs between compute units, memory, and chip interconnect. Moreover, OpenAI has the industry's clearest roadmap for future generative large models, so even if the self-developed chips take several years to be ready, there is little risk of them being outdated by the time they reach mass production. In this respect OpenAI's self-developed chips differ markedly from Google's and Amazon's, and are closer to Tesla's Dojo series of training chips; the difference is that OpenAI's demand for model training is evidently far greater than Tesla's, which makes self-developed chips correspondingly more important to OpenAI.
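
To make the compute/memory/interconnect trade-off concrete, here is a back-of-envelope roofline-style sketch in Python. The chip specifications (400 TFLOP/s of FP16 compute, 3 TB/s of HBM bandwidth) and the matrix shapes are illustrative assumptions, not figures from OpenAI or any vendor.

```python
# Back-of-envelope roofline check: is a Transformer matmul compute- or
# memory-bound on a hypothetical accelerator? All numbers are illustrative
# assumptions, not OpenAI or vendor specifications.

def matmul_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte moved for an (m x k) @ (k x n) matmul in FP16."""
    flops = 2 * m * n * k                       # one multiply-accumulate = 2 FLOPs
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

# Hypothetical chip: 400 TFLOP/s of FP16 compute, 3 TB/s of HBM bandwidth.
PEAK_FLOPS = 400e12
HBM_BW = 3e12
ridge_point = PEAK_FLOPS / HBM_BW               # FLOPs/byte needed to stay compute-bound

# GPT-style feed-forward matmul (d_model = 12288) at two batch regimes.
for label, m in [("large-batch training", 8192), ("small-batch decoding", 16)]:
    ai = matmul_arithmetic_intensity(m, 4 * 12288, 12288)
    bound = "compute-bound" if ai > ridge_point else "memory-bound (needs more bandwidth)"
    print(f"{label}: {ai:.0f} FLOPs/byte vs ridge {ridge_point:.0f} -> {bound}")
```

Running this shows the same layer flipping from compute-bound to memory-bound as the batch shrinks, which is exactly the kind of trade-off a co-designed chip and model can balance deliberately.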


These unique features give OpenAI the opportunity to build purpose-designed chips whose performance goes well beyond conventional designs. Nvidia recently analyzed the sources of its GPUs' performance gains in an official blog post: GPU computing power has increased 1000-fold in less than a decade. According to the analysis, within that 1000-fold gain, optimizing computational precision (for example, using 16-bit or even 8-bit floating-point numbers in place of the original 32-bit floating point) combined with dedicated compute modules contributed a 16-fold improvement, and co-optimization of chip architecture and compilers contributed a further 12.5-fold, while semiconductor process technology contributed only about two-fold (these factors compound multiplicatively, with the remainder of the 1000-fold figure coming from further techniques such as sparsity support). It is clear, then, that in high-performance computing chips, the co-design of algorithms and chip architecture (covering both model algorithms and compiler algorithms) is the main driver of performance, a pattern also known as Huang's Law. From this perspective OpenAI is in a very advantageous position: with its deep understanding of algorithms, it can be expected to fully leverage Huang's Law and deliver high-performance chip designs within the next few years.
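
As an illustration of the precision optimization described above, here is a minimal sketch of mixed-precision training using PyTorch's AMP utilities; the toy model, sizes, and hyperparameters are placeholders, and the sketch assumes a CUDA device.

```python
# Minimal sketch of mixed-precision training with PyTorch AMP: matrix math
# runs in FP16 while master weights and loss scaling stay in FP32. The toy
# model and all hyperparameters here are illustrative placeholders.
import torch
import torch.nn as nn

assert torch.cuda.is_available(), "this sketch assumes a CUDA device"

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()            # rescales loss to avoid FP16 underflow

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

for step in range(10):
    opt.zero_grad(set_to_none=True)
    # Ops inside autocast execute in FP16 where numerically safe, cutting
    # memory traffic and engaging dedicated low-precision compute units.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
```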


The challenges of OpenAI's self-developed chips

Alongside these advantages, OpenAI's self-developed chips also face real challenges.

The primary challenge of a high-performance chip designed for large models is its sheer complexity. From a design perspective, the compute units, memory access, and interconnect all require careful consideration. To meet the needs of large models, the chip will very likely use HBM (High Bandwidth Memory); to reach high energy efficiency and scalability at acceptable yield, advanced process nodes and die-stacking techniques are likely to be employed. Large models are usually trained with distributed computing, which makes chip-to-chip interconnect crucial: OpenAI would need technologies comparable to Nvidia's NVLink and InfiniBand. Each of these components demands an experienced team to implement, and integrating them demands excellent architecture design to guarantee overall performance. Assembling such a team in a short time to tackle these difficult designs will be an important challenge for OpenAI.
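
A rough estimate shows why the interconnect matters so much. The sketch below computes the per-step gradient traffic of a ring all-reduce in data-parallel training; the model size, precision, worker count, and link bandwidth are all illustrative assumptions.

```python
# Back-of-envelope estimate of per-step all-reduce traffic in data-parallel
# training, showing why NVLink/InfiniBand-class links are essential.
# Model size, precision, and link speed are illustrative assumptions only.

params = 175e9           # GPT-3-scale parameter count
bytes_per_grad = 2       # FP16 gradients
n_workers = 64           # data-parallel accelerators

grad_bytes = params * bytes_per_grad

# A ring all-reduce moves roughly 2*(N-1)/N times the gradient size
# in and out of every worker on every step.
per_worker_traffic = 2 * (n_workers - 1) / n_workers * grad_bytes

link_bw = 100e9          # 100 GB/s effective per-worker interconnect bandwidth
comm_time = per_worker_traffic / link_bw

print(f"gradient size:         {grad_bytes / 1e9:.0f} GB")
print(f"traffic per worker:    {per_worker_traffic / 1e9:.0f} GB per step")
print(f"comm time at 100 GB/s: {comm_time:.1f} s per step (unless overlapped)")
```

With these assumed numbers each worker exchanges hundreds of gigabytes per step, so any shortfall in link bandwidth directly stalls every accelerator in the cluster.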

Beyond chip design, another major challenge for OpenAI is the synergy between software and hardware; in other words, building a high-performance compiler and the surrounding software ecosystem. A key advantage of Nvidia GPUs today is the CUDA software system, which has accumulated performance and compatibility over more than a decade. The compiler system for OpenAI's self-developed chip likewise needs to reach CUDA-level performance to fully exploit the chip's computing power. Unlike other tech companies that build chips for cloud services, OpenAI's chips are mainly for internal use, so broad ecosystem and third-party model support matter less, but compiler performance still needs to approach that of Nvidia's CUDA. In fact, OpenAI has already invested in this area: in July of this year, it announced an AI model compilation solution based on its open-source Triton language, in which Python code is lowered through the Triton compiler and the LLVM compiler into intermediate representation (IR) and then into PTX, allowing it to run directly on GPUs and AI accelerators that support PTX. Seen this way, OpenAI's investment in compilers may be a precursor to its self-developed chips.
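
For a sense of what this looks like in practice, below is a vector-add kernel in the style of Triton's introductory tutorial: Python-embedded code that the Triton/LLVM pipeline lowers to PTX for the GPU.

```python
# A minimal Triton kernel (the canonical vector-add example from Triton's
# public tutorials). The @triton.jit decorator routes this Python function
# through Triton's compiler pipeline down to PTX.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                  # which block this instance handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                  # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)               # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```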

Lastly, actually producing the chip poses its own challenge. As noted above, OpenAI is likely to use advanced process nodes and advanced packaging. Ensuring high yield in production, and, more importantly, securing enough capacity for mass production amid the potentially tight supply of advanced packaging and leading-edge process nodes over the coming years, will be a problem it has to solve.


Considering these three challenges, we believe OpenAI's self-developed chip plan may proceed in several steps. First, before the technical and production issues are fully resolved, OpenAI can collaborate with Microsoft (its largest shareholder, which has its own chip program, Athena) and with Nvidia (or AMD), opting for semi-customized chips: OpenAI supplies key specifications, and even some of the intellectual property (IP) the chip must support, while the partners handle chip design and production. Once those issues are resolved, OpenAI can invest heavily in fully customized chips to achieve optimal performance and controllability.
