I. Preface

AI giants like OpenAI, Google, Anthropic, Microsoft, Cohere, Meta, Stability.AI, AI21 Labs, and others have taken the lead in the development and integration of LLMs. However, none of the above companies provide truly open-source commercial-applicable large-language models, and some do not even provide an entry point for model training fine-tuning.

At present, everyone's popularity of GPT models seems to have begun to enter a relatively stable and calm state, but for some open source LLM large models, almost a week or two can burst out an eye-catching news hot spot, are competing for LLMs rankings, domestic large language models are not to be outdone:

ChatGLM2 and ChatGLM2-6B weights jointly launched by Tsinghua and Zhipu AI are completely open to academic research, and commercial use is also allowed after obtaining official written permission.
TigerBot Technology launched the multimodal large-language model TigerBot, which contains two versions, 7 billion parameters and 180 billion parameters.

About April for the first time local deployment and tested the first generation of ChatGLM 6B model released by Tsinghua, at that time, some open source LLM models led by LLaMa emerged, mixed, in fact, these models are still far from fully achieving the effect of production applications, and most of the model deployment and training requirements for GPU are very high, the average person may be a little out of reach and give up, However, many individuals or enterprises may want to have an open source model that supports Chinese well, and can build enterprise-level lightweight AI applications by local fine-tuning or building vector libraries.

After the release of ChatGLM, it is indeed a good choice for local deployment lightweight models, because of its licensing reasons, although the code is open source, but the model weights are only completely open to academic research, and personal or enterprise commercial use needs to be licensed. It may not be easy to get this license either.

Fortunately, H2O.ai released a truly free, open-source, commercially available model h2oGPT, making it possible to train private LLM models locally and build commercial AI applications.

h2oGPT: An open source commercially available big language model based on the H2O.ai ecosystem

II. H2O.ai

H2O.ai has built several world-class machine learning, deep learning, and AI platforms over the past decade, most of which are open source software (built on top of existing open source software) and earned the trust of customers around the world. We are ideally located to provide an open source GPT ecosystem to businesses, organizations, and individuals around the world.

H2O.ai launched h2oGPT, an open-source codebase suite of LLMs based on generative pre-trained transformers (GPTs) to create the world's best truly open-source alternative closed-source approach. While working with the incredible and unstoppable open source community, we open source several finely tuned h2oGPT models with parameters ranging from 7 billion to 40 billion for commercial use under the fully permissive Apache 2.0 license. Our release also includes 100% private document search using natural language.

All content published by H2O.ai is based on fully licensed data and models (with the exception of LLaMa-based models explicitly marked as research-only only), and all code is open source. This allows more enterprise and commercial products to gain wider access without legal issues, expand the use of cutting-edge AI, and comply with licensing requirements.

Open-source language models help drive the development of AI and make it more accessible and trustworthy. They lower barriers to entry, enabling people and groups to tailor these models to their needs. This openness increases innovation, transparency and fairness. An open source strategy is needed to fairly share the benefits of AI, H2O.ai will continue to drive the democratization of AI and LLMs.

3. H2O.ai LLM open source ecosystem

H2O.ai's open source LLM ecosystem currently includes the following components:

Code, Data, and Models: Code is fully commercially permissible, as well as curated fine-tuning models ranging from 7 to 20 billion fine-tuning data and parameters.
State-of-the-art fine-tuning technology: Provides efficient fine-tuning code, including targeted data preparation, prompt engineering, and computational optimization, to fine-tune LLM with up to 20 billion parameters (or even larger models coming soon) in a matter of hours on common hardware or enterprise servers. Using techniques such as low-rank approximation (LoRA) and data compression can save orders of magnitude of computing resources.
h2oGPT chatbot: Provides code to run a multi-tenant chatbot on a GPU server, with easily shared endpoints and Python client APIs that can be used to evaluate and compare fine-tuned LLM performance.
Document chat with VectorDB: Provides a full-featured natural language-based document search system using vector databases and prompt engineering. This system is completely offline and does not require an internet connection.
H2O LLM Studio: A no-code LLM fine-tuning framework created by the world's top Kaggle masters that makes fine-tuning and evaluating LLM easier. H2O LLM Studio enables everyone to fine-tune LLM, including large open source LLMs on private data and servers (such as h2oGPT, etc.).

3.1 h2oGPT model on Hugging Face

H2O.ai put the open source model on Hugging Face's repository. Some of these important models include:

h2oai/h2ogpt-oasst1-falcon-40b
h2oai/h2ogpt-oig-oasst1-falcon-40b
h2oai/h2ogpt-oasst1-512-20b
h2oai/h2ogpt-oasst1-512-12b
h2oai/h2ogpt-oig-oasst1-512-6_9b
H2OAI/H2OGPT-GM-OASST1-EN-2048-FALCON-40B-V1
H2OAI/H2OGPT-GM-OASST1-EN-1024-20B
H2OAI/H2OGPT-GM-OASST1-EN-2048-FALCON-7B-V2
H2OAI/H2OGPT-Research-OAST1-512-30B (non-commercial use)
H2OAI/H2OGPT-Research-OAST1-512-65B (non-commercial use)

To use these models in Python is very simple:

!pip install transformers==4.29.2
!pip install accelerate==0.19.0
!pip install torch==2.0.1
!pip install einops==0.6.1

import torch
from transformers import pipeline, AutoTokenizer
    
tokenizer = AutoTokenizer.from_pretrained("h2oai/h2ogpt-oasst1-falcon-40b", padding_side="left")
generate_text = pipeline(model="h2oai/h2ogpt-oasst1-falcon-40b", 
tokenizer=tokenizer, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", prompt_type="human_bot") 

res = generate_text("Why is drinking water so healthy?", max_new_tokens=100) 
print(res[0]["generated_text"])

Output result:

>>> Drinking water is healthy because it helps to keep your body hydrated and functioning 
>>> properly. It also helps to flush out toxins and waste from the body, which can help >>> to improve your overall health. Additionally, drinking water can help to regulate
>>> your body temperature, which can help to prevent dehydration and heat exhaustion.

3.2 ChatBot chatbot

h2oGPT includes a simple Gradio-based chatbot GUI and client/server API.

python generate.py --base_model=h2oai/h2ogpt-oasst1-512-12b

Features of the chatbot include:

Support for any open source LLM from Hugging Face
Offline mode without internet access
Compare any two models
LoRA adapter weights are supported on top of any LLM
Multi-GPU sharding
Automatically score responses using a reward model trained on human feedback
4-bit quantization option
Automatically extend context from multiple back-and-forth conversations

3.3. Private document chat

LLM (Large Language Model) is known to produce hallucinatory or fictional answers, see On the Dangers of Stochastic Parrots. Currently, researchers are actively exploring under what conditions this will occur and how to control it. One way to tie LLM to reality is to provide the source content as context for any query. The query and source content are embedded, and similarity is estimated using a vector database. h2oGPT, which includes FAISS memory and a Chroma persistent vector database, relies on a guidedly tuned LLM to answer questions based on the context of the first k blocks of source content.

python generate.py --base_model=h2oai/h2ogpt-research-oasst1-512-30b --langchain_mode=wiki_full

Features of document chat include:

Fact-based document Q&A
Preloaded with 20GB of Wikipedia status
Offline mode, no internet access required
Persistent databases and vector embeddings
Ability to handle a variety of document types

3.4. No-code fine-tuning and H2O LLM Studio

H2O LLM Studio is an open-source framework that provides a no-code graphical user interface (GUI) and a command-line interface (CLI) for fine-tuning LLMs. It allows users to train and tune state-of-the-art LLMs using a variety of hyperparameters without any coding experience. It supports various advanced fine-tuning techniques such as low-rank adaptation (LoRA) and 8-bit model training with a low memory footprint. The software allows users to visually track and compare model performance and provides an option to chat with the model for instant performance feedback. It also makes it easy to export models to Hugging Face Hub to share with the community.

The latest updates to H2O LLM Studio include storing experiment configurations in a YAML format and adding the ability to support nested dialogs in data. System requirements include Ubuntu 16.04+ and NVIDIA GPUs with driver version >= 470.57.02. The software also supports Docker, which is easy to deploy, and expects CSV input, with at least two columns - one for the instruction column and one for the model's desired answer.

Starting H2O LLM Studio is easy:

Features of H2O LLM Studio include:

Without any coding experience, LLM can be fine-tuned easily and efficiently
Fine-tune any LLM using a variety of hyperparameters using a graphical user interface (GUI) designed specifically for large language models
Use the latest fine-tuning techniques such as low-rank adaptation (LoRA) and 8-bit model training with a low memory footprint
Use advanced evaluation metrics to judge the answers generated by the model
Track and compare your model performance visually. In addition, Neptune integration is also available.
Chat with your models and get instant feedback on your model performance
Easily export your models to Hugging Face Hub and share them with the community

4. Project address

• h2oGPT https://github.com/h2oai/h2ogpt

• H2O LLM Studio https://github.com/h2oai/h2o-llmstudio

• H2O.ai on Hugging Face https://huggingface.co/h2oai

h2oGPT: An open source commercially available big language model based on the H2O.ai ecosystem