
Collection: an inventory of large language models (LLMs), with source code links (continuously updated)

Author: Intelligent Drive AI

The popularity of ChatGPT has attracted widespread attention, and many universities, research institutions, and enterprises in China have announced plans to release ChatGPT-like systems. However, because ChatGPT is not open source, replicating it is very difficult; so far, no organization has managed to fully reproduce even GPT-3's capabilities. Recently, OpenAI released GPT-4, a model that accepts both images and text and represents a large step up from ChatGPT; one can already sense the approach of a fourth industrial revolution led by general artificial intelligence.


At home and abroad, the gap between other players and OpenAI keeps widening, and everyone is racing to catch up in order to secure a position in this wave of technological innovation. At present, most large companies have chosen a closed-source route for their R&D: the official disclosures about ChatGPT and GPT-4 are very limited, nothing like the dozens-of-pages papers OpenAI used to publish. OpenAI's commercialization era has arrived, and now that it no longer open-sources its models, we must turn to open-source models to break through these limitations. To that end, I have compiled a list of open-source models from various organizations and individuals, as follows:


ChatYuan: Developed and released by the YuanYu Intelligence (元语智能) team, it claims to be the earliest functional dialogue model released in China. It can be used to write articles, do homework, compose poetry, translate between Chinese and English, and so on, and can also provide relevant information for questions in certain specific domains. The model currently supports Chinese only.

❤️ [GitHub link] (https://github.com/clue-ai/ChatYuan)

Colossal AI: Colossal-AI recently open-sourced its ChatGPT implementation and shared the three-step strategy behind the core ChatGPT training pipeline.

Based on this project, I worked through the three-step strategy and share it here:

  • Stage 1 (stage1_sft.py): the supervised fine-tuning (SFT) stage. The open-source project does not implement this part, but it is relatively simple: because ColossalAI integrates seamlessly with Hugging Face, I implemented it in a few lines with Hugging Face's Trainer (a sketch is given below). I used a GPT-2 model here; judging from the implementation, GPT-2, OPT, and BLOOM models are supported;
  • Stage 2 (stage2_rm.py): the reward model (RM) training stage, i.e. the train_reward_model.py part of the project's examples;
  • Stage 3 (stage3_ppo.py): the reinforcement learning (RLHF) stage, i.e. the project's train_prompts.py.

All three files need to be placed inside the ColossalAI project to run; the `cores` package referenced in my code corresponds to the `chatgpt` package in the original project, and `cores.nn` maps to `chatgpt.models` there.
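For context, here is a minimal sketch of what the stage-1 SFT step can look like with Hugging Face's Trainer, as described above. It assumes a plain text file of instruction-response examples (the file name sft_data.txt is a placeholder) and is only an illustration of the approach, not the ColossalAI project's own code:

```python
# Minimal SFT sketch with Hugging Face Trainer (illustrative only).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical local file with one prompt+response example per line.
dataset = load_dataset("text", data_files={"train": "sft_data.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft_gpt2",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    # The causal-LM collator builds the labels from the inputs for us.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```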

❤️ [GitHub link] (https://github.com/hpcaitech/ColossalAI)

ChatGLM: ChatGLM is an open-source dialogue model in the GLM series from Zhipu AI, a company commercializing technology from Tsinghua University. It supports both Chinese and English, and a 6.2-billion-parameter version has been open-sourced. It inherits the strengths of GLM and optimizes the model architecture, lowering the bar for deployment and making inference with a large model feasible on consumer-grade graphics cards.

In terms of its technical route, it applies ChatGPT-style reinforcement learning from human feedback so that its outputs align more closely with human values. Its current capabilities include self-awareness, outline writing, copywriting, email drafting, information extraction, role play, review comparison, travel advice, and more. A 130-billion-parameter super-large model is under internal testing, which would make this series one of the largest-parameter dialogue models among current open-source alternatives.

❤️ [GitHub link] (https://github.com/THUDM/ChatGLM-6B)

LLaMA: LLaMA is a large language model released by Meta (Facebook's parent company) that performs well at tasks such as text generation, dialogue, summarizing written material, proving mathematical theorems, and predicting protein structures. LLaMA supports 20 languages, including languages written in Latin and Cyrillic scripts; the original model does not support Chinese.

Two of the most popular open-source projects built on LLaMA are ChatLLaMA and stanford_alpaca.

ChatLLaMA, launched by Nebuly AI, is an open-source implementation of a LLaMA-based AI chatbot trained with reinforcement learning from human feedback. Its technical route is similar to ChatGPT's, and it gained 5.2K stars within just two days of going online.

ChatLLaMA's training process is claimed to be faster and cheaper than ChatGPT's training, reportedly by nearly 15 times. Its main features are:

  • A complete open-source implementation that lets users build ChatGPT-style services on top of pre-trained LLaMA models;
  • The LLaMA architecture is smaller, making training and inference faster and cheaper;
  • Built-in support for DeepSpeed ZeRO to speed up fine-tuning;
  • LLaMA models of various sizes are supported, so users can fine-tune whichever size they prefer.

❤️ [GitHub link] (https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama)

stanford_alpaca: Another popular project is the recently released Alpaca, a model fine-tuned from Stanford's LLaMA-7B checkpoint. It uses OpenAI's text-davinci-003 model to generate 52K instruction-following samples and then fine-tunes LLaMA on them. The project has open-sourced the training data, the code that generates that data, and the training hyperparameters; the model weights have not yet been released, but the project has already attracted 5.6K stars.
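For illustration, the generated instruction data is typically turned into training text with a fixed prompt template along the lines below; the exact wording here is a paraphrase rather than a verbatim copy of the project's template:

```python
# Sketch of an Alpaca-style instruction prompt (wording is illustrative).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_training_example(instruction: str, output: str) -> str:
    """Combine one generated (instruction, output) pair into a training string."""
    return PROMPT_TEMPLATE.format(instruction=instruction) + output

print(build_training_example(
    "Explain what instruction fine-tuning is.",
    "Instruction fine-tuning trains a language model on instruction-response pairs..."))
```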

❤️ [GitHub link] (https://github.com/tatsu-lab/stanford_alpaca)

[DEMO Address] (https://alpaca-ai-custom6.ngrok.io)


In addition to this, there are some other open source models to refer to. For example:

OpenChatKit: Created by teams from Together, LAION, and Ontocord.ai, including former OpenAI researchers. It contains 20 billion parameters, is fine-tuned from GPT-NeoX-20B (an open-source GPT-3-style model), and uses a 6-billion-parameter moderation model to filter content and ensure the safety and quality of what is generated.

❤️ [GitHub link] (https://github.com/togethercomputer/OpenChatKit)

BELLE: A dialogue model built along the lines of Stanford Alpaca, supporting both Chinese and English. The team has open-sourced a 6.2-billion-parameter model, with a 130-billion-parameter model in development.

❤️ [GitHub link] (https://github.com/LianjiaTech/BELLE)

PaLM-rlhf-pytorch: Claimed to be the first open-source ChatGPT-style replacement project, its basic idea is to combine Google's PaLM architecture with reinforcement learning from human feedback (RLHF). PaLM is the 540-billion-parameter general-purpose model Google released in April 2022, trained on the Pathways system; it can write code, chat, and handle language-understanding tasks, and shows strong few-shot performance on most tasks. Adopting a ChatGPT-like reinforcement learning mechanism makes the AI's answers better fit the context and reduces the model's toxicity.

❤️ [GitHub link] (https://github.com/lucidrains/PaLM-rlhf-pytorch)

alpaca-lora: alpaca-lora reproduces Alpaca's results with LoRA (low-rank adaptation), a much lower-cost approach: training for about 5 hours on a single RTX 4090 yields an Alpaca-level model, and the resulting model can even run on a Raspberry Pi. The project uses Hugging Face's PEFT library for cheap and efficient fine-tuning. PEFT is a library that supports parameter-efficient fine-tuning techniques (LoRA among them) for a variety of Transformer-based language models, making fine-tuning on ordinary hardware cheaper and more efficient.
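To make the idea concrete, here is a minimal sketch of how a LoRA fine-tune is typically set up with PEFT; the checkpoint name and hyperparameters are illustrative assumptions, not the project's exact configuration:

```python
# LoRA setup sketch with Hugging Face PEFT (hyperparameters are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "huggyllama/llama-7b"   # placeholder LLaMA checkpoint name
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights are trainable
# Training then proceeds with a normal Trainer loop over instruction data.
```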

❤️ [GitHub link] (https://github.com/tloen/alpaca-lora)

Although Alpaca and alpaca-lora brought big improvements, their seed tasks are all in English and they lack support for Chinese. Besides BELLE mentioned above, which collected a large Chinese corpus, three individual developers from Central China Normal University and other institutions built on alpaca-lora and earlier work to open-source the Chinese language model Luotuo, which can be trained and deployed on a single graphics card. The project has so far released two models, Luotuo-Lora-7B-0.1 and Luotuo-Lora-7B-0.3, with one more planned.

❤️ [GitHub link] (https://github.com/LC1332/Luotuo-Chinese-LLM)

Dolly: Dolly takes inspiration from Alpaca and uses the Alpaca dataset to fine-tune GPT-J-6B; since the model is itself a "clone", the team named it Dolly. More and more of these Alpaca-inspired clones follow the same recipe: reuse Alpaca's open-source data collection method and instruction-fine-tune an older 6B- or 7B-scale model to obtain a ChatGPT-like effect. The approach is very economical and quickly captures some of ChatGPT's charm, which is why it proved widely popular and took off as soon as it launched.

❤️ [GitHub link] (https://github.com/databrickslabs/dolly)

Vicuna and Chinese-Vicuna: Following Alpaca, scholars from Stanford, CMU, UC Berkeley, and elsewhere launched a new model, the 13-billion-parameter Vicuna (named after the vicuña, a relative of the alpaca). For as little as about $300 in training cost, it reaches roughly 90% of ChatGPT's quality. Vicuna was fine-tuned from LLaMA on user-shared conversations collected from ShareGPT, and the evaluation used GPT-4 as the judge; the results showed that Vicuna-13B matched ChatGPT and Bard in more than 90% of cases. UC Berkeley's LMSYS org has also released a 7-billion-parameter Vicuna that is small, efficient, and capable; it takes only two commands to run on Macs with M1/M2 chips, with GPU acceleration available as well.
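As a rough illustration of the GPT-4-as-judge idea, an evaluation script might build a pairwise comparison prompt like the one below and send it to the judge model; the wording and scoring scale are assumptions, not Vicuna's exact evaluation protocol:

```python
# Sketch of an LLM-as-judge pairwise comparison prompt (wording is illustrative).
def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    return (
        "You are an impartial judge. Compare the two answers to the question "
        "below and rate each on a scale of 1-10 for helpfulness, relevance, "
        "and level of detail. Then state which answer is better overall.\n\n"
        f"Question: {question}\n\n"
        f"Answer A: {answer_a}\n\n"
        f"Answer B: {answer_b}\n"
    )

# The resulting prompt would then be sent to GPT-4 (or another strong judge
# model) through whatever chat-completion client is available.
```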

❤️ [GitHub link] (https://github.com/lm-sys/FastChat/)

❤️ [Chinese-Vicuna GitHub link] (https://github.com/Facico/Chinese-Vicuna)

LMFlow: After ChatGPT took off, everyone went looking for a shortcut to reproduce it, and ChatGPT-like projects began to appear, with low-cost imitation becoming especially popular. LMFlow was born out of this demand: it makes it possible to fine-tune models on ordinary graphics cards such as the RTX 3090. Initiated by the Statistics and Machine Learning Lab at the Hong Kong University of Science and Technology, the project aims to build a fully open platform for large-model research that supports various experiments under limited machine resources, and to improve data utilization and optimization-algorithm efficiency on that platform so it can grow into a more efficient large-model training system than previous approaches. With this project, even users with limited computing resources can run personalized training for their own domains.

For example, fine-tuning LLaMA-7B takes about 5 hours on a single RTX 3090, greatly reducing cost. The project also opened a web-based instant Q&A service (lmflow.com). The arrival and open-sourcing of LMFlow makes it feasible to train models for Q&A, companionship, writing, translation, expert-domain consulting, and more with ordinary resources. Many researchers are already trying to use the project to train models with 65 billion or more parameters.

❤️ [GitHub link] (https://github.com/OptimalScale/LMFlow)

GPTrillion: The project claims to be the largest open-source model, at 1.5 trillion parameters, and to be multimodal. Its claimed capability areas include natural language understanding, machine translation, question answering, sentiment analysis, and image-text matching.

❤️ [Hugging Face link] (https://huggingface.co/banana-dev/GPTrillion)

OpenFlamingo: OpenFlamingo, released by the non-profit organization LAION, is a framework for training and evaluating large multimodal models in the spirit of GPT-4; it is a reproduction of DeepMind's Flamingo model. What is currently open-sourced is its LLaMA-based OpenFlamingo-9B model. The Flamingo family is trained on large-scale web corpora of interleaved text and images and has in-context few-shot learning abilities. OpenFlamingo implements the same architecture proposed in the original Flamingo paper and is trained on 5M samples from a new Multimodal C4 dataset and 10M samples from LAION-2B.

❤️ [GitHub link] (https://github.com/mlfoundations/open_flamingo)

Baize (白泽): The project proposes a method for automatically collecting ChatGPT conversations by having ChatGPT chat with itself, generating high-quality multi-turn dialogue data in batches. It collects roughly 50,000 high-quality question seeds from Quora, StackOverflow, and MedQA, and all of the resulting data has been open-sourced. The project then fine-tunes LLaMA on this data using the now-common low-cost LoRA scheme, yielding Baize-7B, 13B, and 30B models plus a medical vertical model, with decent results. Despite the Chinese name, the model does not yet support Chinese; a Chinese Baize model is reportedly planned for a future release.
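A minimal sketch of the self-chat collection idea is given below; ask_chat_model is a hypothetical helper standing in for whatever chat API is used, and the prompts are illustrative rather than the project's actual ones:

```python
# Self-chat data collection sketch (helper and prompts are illustrative).
def ask_chat_model(prompt: str) -> str:
    """Placeholder: send `prompt` to a chat model and return its reply."""
    raise NotImplementedError

def self_chat(seed_question: str, num_turns: int = 4) -> list:
    """Seed with a real question (e.g. from Quora/StackOverflow) and let the
    model play both sides, producing one multi-turn dialogue transcript."""
    transcript = [{"role": "user", "content": seed_question}]
    for _ in range(num_turns):
        # The model answers the latest user message.
        answer = ask_chat_model("Answer the user:\n" + transcript[-1]["content"])
        transcript.append({"role": "assistant", "content": answer})
        # The same model then invents a plausible follow-up question.
        follow_up = ask_chat_model(
            "Given this answer, ask a natural follow-up question:\n" + answer)
        transcript.append({"role": "user", "content": follow_up})
    return transcript
```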

❤️ [GitHub link] (https://github.com/project-baize/baize)

Koala: The wave of LLaMA-based ChatGPT alternatives continues to grow. UC Berkeley released Koala, a dialogue model with 13B parameters that can run on a consumer GPU. Koala's training data includes ChatGPT-distilled data and open-source data: the Open Instruction Generalist (OIG) dataset, the dataset used by Stanford Alpaca, Anthropic HH, OpenAI WebGPT, and OpenAI Summarization. The Koala model was implemented in EasyLM using JAX/Flax and trained on 8 A100 GPUs, taking 6 hours to complete 2 epochs. In evaluations it did better than Alpaca, reaching about 50% of ChatGPT's performance.

❤️ [GitHub link] (https://github.com/young-geng/EasyLM)

StackLLaMA: With the advent of Stanford Alpaca, a whole family of LLaMA-based "alpacas" and other animals began to appear, culminating recently in Hugging Face researchers publishing the blog post "StackLLaMA: A Practical Guide to Training LLaMA with RLHF" and releasing a 7-billion-parameter model, StackLLaMA, fine-tuned from LLaMA-7B with reinforcement learning from human feedback on Stack Exchange data.

We hope that these open source projects can provide you with more choices and promote the development and application of artificial intelligence technology.

Edited by freeze
