Starting out at Google and ending up at OpenAI: that is the career trajectory of many GPT-4 contributors.
This week, the release of OpenAI's large model GPT-4 brought technical competition in the global tech circle to a fever pitch. Within a few days, ChatGPT, Bing Search, and Microsoft 365 were connected to GPT-4, Microsoft's AI applications instantly opened up a lead over competitors, and some even said that a new industrial revolution had begun.
On the one hand, we are amazed by what GPT-4 can do; on the other, we are eager to understand the technology behind it, curious about its training methods, the computing power used, and so on.
Unfortunately, OpenAI is not Open. In its published paper (which reads more like a technical report), OpenAI states only that GPT-4 was fine-tuned with RLHF and discloses almost no other technical details:
"Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
However, in this report OpenAI does detail the contributors and what each was responsible for, and this deserves a closer look. The list and classification of these hundreds of contributors gives a sense of the departments and technical branches behind GPT-4's success.
In this article, we take stock of representative contributors and hope to inspire our readers.
R&D personnel account for the vast majority
From the perspective of organizational structure, the R&D team behind GPT-4 can be roughly divided into seven parts: Pretraining, Long context, Vision, RL & alignment, Evaluation & analysis, Deployment, and Other contributors.
The work of the pre-training section is broken down into:
Compute cluster scaling
Data
Distributed training infrastructure
Hardware correctness
Optimization & architecture
Training run babysitting
The work of the long context section is broken down into:
Long context research
Long context kernels
The work of the vision section is broken down into:
Architecture research
Compute cluster scaling
Distributed training infrastructure
Hardware correctness
Data
Alignment Data
Training run babysitting
Deployment & post-training
The work of the reinforcement learning & alignment section is broken down into:
Dataset contributions
Data infrastructure
ChatML format
Model safety
Refusals
Foundational RLHF and InstructGPT work
Flagship training runs
Code capability
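Among the work items above, ChatML is the structured chat markup format OpenAI introduced alongside its chat models, in which a conversation is a sequence of role-tagged messages. As a rough illustration, here is a minimal sketch of how such a message list can be rendered into raw prompt text; the `render_chatml` helper is hypothetical, the special token strings follow OpenAI's public ChatML documentation, and none of this reflects GPT-4's undisclosed internals.

```python
def render_chatml(messages):
    """Render a list of {role, content} dicts into ChatML-style prompt text.

    Each message becomes a block delimited by the special tokens
    <|im_start|> and <|im_end|>, per OpenAI's public ChatML description.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave an open assistant block to prompt the model for its reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = render_chatml(messages)
```

The point of the format is that role boundaries are marked by tokens the user cannot type, which is one reason it appears under both the alignment and deployment work items.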
The work in the Evaluation & Analysis section is broken down into:
OpenAI Evals library
Model-graded evaluation infrastructure
Acceleration forecasting
ChatGPT evaluations
Capability evaluations
Coding evaluations
Real-world use case evaluations
Contamination investigations
Instruction following and API evals
Novel capability discovery
…
A perusal of the contributor list reveals that members of the GPT-4 project team usually "wear many hats." For tech companies looking to catch up with ChatGPT, the departmental structure OpenAI lays out offers a template worth learning from, and it may also hint at where AI talent development is headed.
After the launch of ChatGPT, OpenAI also made some adjustments in talent acquisition, recruiting dozens of former Google and Meta employees to create AI chatbots.
At OpenAI, Google's reputation as the "Whampoa Military Academy of Silicon Valley" holds true: according to data from LeadGenius and Punks & Pinstripes, many of the company's more than 300 employees (as of January 2023) came from Alphabet, the parent company of Google and DeepMind. OpenAI currently employs about 59 former Google employees and about 34 former Meta employees, as well as several former Apple and Amazon employees.
Since OpenAI disclosed its full list of contributors upon GPT-4's release, Machine Heart has compiled profiles of some of the Chinese scholars who took part in the work. If anyone has been missed, additions are welcome.
Pre-training group
Trevor Cai
Trevor Cai is the Leader of the Throughput Team in the GPT-4 project. Trevor Cai graduated from the University of Southern California with a master's degree and joined OpenAI in March 2022. Prior to joining OpenAI, Trevor Cai worked at DeepMind for nearly 5 years as a software engineer.
Yuan Qiming
Qiming Yuan is the leader of the GPT-4 project's dataset sourcing and processing team. He received his bachelor's degree from Tsinghua University and his master's degree from the University of Texas at Austin, and joined OpenAI in 2018. Previously, he worked at Microsoft for nearly three years.
Che Chang
Che Chang, who participated in GPT-4 development as Deputy General Counsel of OpenAI, graduated from Northwestern University in the United States with a Ph.D. and joined OpenAI in 2021 after leading the legal team in the AI/Machine Learning and Marketplaces practice at AWS. Recently, OpenAI's legal team has also been hiring AI product consultants.
Ouyang Long
Long Ouyang joined OpenAI in 2019 as a research scientist. He received his bachelor's degree from Harvard University, received his Ph.D. from Stanford University, and worked as a postdoctoral researcher at Stanford. He also participated in the development of ChatGPT-related technical projects and is the first author of the InstructGPT paper.
Weng Lilian
Lilian Weng, the head of OpenAI's AI application research, joined OpenAI in 2018 and is mainly involved in pre-training, reinforcement learning & alignment, model security and other aspects of the GPT-4 project.
Tao Xu
Tao Xu joined OpenAI in 2019 and graduated from Peking University and Cornell University. Tao Xu worked for four years in Microsoft's Bing Machine Learning Research Group.
Jie Tang
Jie Tang received his Ph.D. in Computer Science from the University of California, Berkeley, under the supervision of Pieter Abbeel, and received his bachelor's degree in computer science and economics from Harvard University in 2008. Prior to joining OpenAI, he spent about four years at startups and Dropbox.
Ben Wang
Ben Wang is currently an undergraduate at the University of Pennsylvania who joined OpenAI in 2021. Ben Wang was involved in the pre-training and long context aspects of the GPT-4 project.
Vision group
Mark Chen
Mark Chen joined OpenAI in 2018 as a research scientist and graduated from the Massachusetts Institute of Technology (MIT). He worked on the visual side of the GPT-4 project.
Casey Chu
Casey Chu joined OpenAI in 2020 and graduated from Stanford University with a degree in computational mathematics. Casey Chu's main research interests are multimodal AI systems, and he is mainly involved in the vision aspect of the GPT-4 project.
Shengli Hu
Shengli Hu joined OpenAI in 2022 and received her master's degree from Fudan University and her Ph.D. from Cornell University. Her research interests lie in the interdisciplinary study of social sciences, computational linguistics, computer vision, and speech. Hu has published many papers in top conferences and journals in natural language processing, computer vision, speech and applied statistics, including CVPR, ACL, EMNLP, ECCV, etc., and has been nominated for the Best Paper Award.
Tianhao Zheng
Tianhao Zheng joined OpenAI in 2022. He received his bachelor's degree from Tsinghua University and his Ph.D. from the University of Texas at Austin. Before joining OpenAI, he worked at NVIDIA, Google, and Twitter. Tianhao Zheng was mainly involved in the vision aspect of the GPT-4 project.
Weng Jiayi
Jiayi Weng received his undergraduate degree from the Department of Computer Science and Technology at Tsinghua University in 2020. During his undergraduate studies in Professor Jun Zhu's group, he mainly participated in the development of the reinforcement learning algorithm library Tianshou, which has earned 5.9K stars on GitHub. After graduating from CMU with a master's degree, Weng joined OpenAI as a research engineer.
Reinforcement learning & alignment group
Chong Zhang
Chong Zhang began studying computer science at Zhejiang University in 2010 and received his bachelor's degree from Simon Fraser University in Canada in 2014, then worked as an engineer at Google and Apple. He entered UCLA in 2019 and has been working at OpenAI since receiving his master's degree in computer science in 2021.
Shengjia Zhao
Shengjia Zhao received his bachelor's degree from Tsinghua University in 2016 and his Ph.D. in Computer Science from Stanford University in 2022, advised by Stefano Ermon, and then joined OpenAI.
Stephanie Lin
Stephanie Lin attended MIT and Georgia Tech during her undergraduate and master's degrees, respectively. Prior to joining OpenAI, she was a research scholar at the University of Oxford.
Tong Mu
Tong Mu received his undergraduate degree from UCLA and his Ph.D. from Stanford University. He joined OpenAI in 2022.
Jeff Wu
Jeff Wu received both his bachelor's and master's degrees from MIT. He was the second employee of the startup Terminal.com and worked at Google for about two years after the company was acquired. In 2018, Jeff Wu joined OpenAI.
Xiao Kai
Kai Xiao received his bachelor's and doctoral degrees from MIT and has interned at Microsoft and DeepMind, among others. He joined OpenAI in September 2022.
Kevin Yu
Kevin Yu received his B.S. in Physics and Ph.D. in Neuroscience from the University of California, Berkeley. He joined OpenAI in 2022.
Haozhun Jin
Haozhun Jin received his bachelor's degree in computer science from Tsinghua University in 2013 and his master's degree from Stanford University in 2015. He worked as a software engineer at Meta from 2015 to 2018 before joining OpenAI in January 2023.
Gu Shixiang
A Japanese-born Chinese-Canadian, Gu is a former research scientist at Google Research, where his research interests include deep learning, reinforcement learning, probabilistic machine learning and robotics. He holds a PhD in Machine Learning from the University of Cambridge and the Max Planck Institute for Intelligent Systems, and a Bachelor of Science in Engineering from the University of Toronto, where he was supervised by Geoffrey Hinton.
Evaluation & analysis group
Alvin Wang
Alvin Wang joined OpenAI in August 2022 as one of the core contributors to the Evaluation & Analytics team. Previously, he worked for several years at VMware, Tesla, and others. He received his bachelor's degree from the University of Southern California in 2013.
Angela Jiang
Angela Jiang joined OpenAI in November 2021 after brief stints at Microsoft and Google. She received her Ph.D. from Northwestern University.
Jason Wei
Jason Wei joined OpenAI in February this year and focuses on ChatGPT. Previously, he was a senior research scientist at Google Brain, where he popularized chain-of-thought prompting and co-led the instruction tuning effort. At Google, he co-authored a paper with Jeff Dean and others on the emergent abilities of large models.
Juntang Zhuang
Juntang Zhuang joined OpenAI in April 2022 after a four-month internship at Google. He received his bachelor's degree from Tsinghua University and his master's and Ph.D. degrees from Yale University. His research focuses on developing new machine learning techniques for biomedical applications.
Derek Chen
Derek Chen joined OpenAI in 2021 as a technical security analyst. He graduated from Northeastern University and previously worked at Google for less than a year.
Yang Song
Yang Song is currently a researcher at OpenAI and will join Caltech's Department of Computing and Mathematical Sciences as an Assistant Professor in January 2024. Song graduated from Tsinghua University's fundamental science class in mathematics and physics and received his Ph.D. in computer science from Stanford University in 2022, advised by Stefano Ermon. His main research interests are in machine learning, including deep generative models, probabilistic inference, AI safety, and the intersection of AI methods with other scientific fields. He is one of the main founders of diffusion models and score-based generative models: his work published at NeurIPS 2019 surpassed generative adversarial networks (GANs) in image generation quality for the first time. During his Ph.D., he won the ICLR 2021 Outstanding Paper Award, and his related research earned the Apple Fellowship, the JPMorgan Chase Fellowship, and the WAIC Yunfan Award.
Model deployment
Michael Wu
Michael Wu joined OpenAI in 2021 and focuses on AI application research. Michael Wu graduated from MIT and is the inference research leader for the GPT-4 project.
Andrew Peng
Andrew Peng joined OpenAI at the end of 2022 after two years at Microsoft. A graduate of the University of California, Berkeley, he worked on the GPT-4 API and ChatML deployment.
Wu Xuefeng
Sherwin (Xuefeng) Wu joined OpenAI in 2022 and focuses on AI applications and API development. He graduated from MIT and was mainly involved in API development and ChatML deployment in the GPT-4 project.
Jason Chen
Jason Chen completed his undergraduate studies at MIT. He worked as a software engineer at Google from 2007 to 2014, at the startup Apptimize from 2014 to 2019, and at Argo AI from 2019 to February 2023, before joining OpenAI in February 2023.
Other contributors
Xin Hu
Xin Hu joined OpenAI in June 2022 and is primarily responsible for developing security services and platforms for cloud security, k8s security, authentication/authorization, and access control.
In addition, OpenAI thanked Microsoft for its contribution to the development of GPT-4, particularly Microsoft Azure, whose services supported the infrastructure design and management for model training; the Microsoft Bing team and Microsoft's safety teams also contributed to GPT-4's deployment.
Reference Links:
https://openai.com/contributions/gpt-4?continueFlag=ee0eebd278339fc5ba428add63b4b4fd
https://cdn.openai.com/papers/gpt-4-system-card.pdf