OpenAI releases Assistants API update: up to 10,000 files can be retrieved

Tencent Technology News reported on April 18 that, according to foreign media, on Wednesday local time in the United States, artificial intelligence startup OpenAI released an update to its Assistants API, adding a number of features to this tool for building AI assistants, including a faster and more accurate file search tool, vector stores, and a tool-choice parameter.

The beta version of the Assistants API launched in November 2023; the new version is identified as OpenAI-Beta: assistants=v2. The tool is designed to help developers build assistants for a variety of software tasks: it can invoke OpenAI models with specific instructions, access multiple tools in parallel, support persistent threads, and work with files in multiple formats.

Each assistant can retrieve up to 10,000 files

In the latest update, OpenAI has introduced an improved retrieval tool called file_search. Each assistant can now retrieve up to 10,000 files, a 500-fold increase over the previous limit. Retrieval is also significantly faster, with support for multithreaded and parallel queries. In addition, file_search gains improved reranking and query rewriting, making search results more accurate and useful.
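As a rough sketch, enabling the new tool comes down to listing file_search among an assistant's tools. The request body below mirrors the Assistants API v2 REST fields; the model name and instructions are illustrative, and the OpenAI-Beta header is what selects the v2 beta.

```python
# Sketch of a request for creating an assistant with the improved
# file_search tool (Assistants API v2). Field names follow the OpenAI
# REST API; the header below opts in to the v2 beta.
headers = {
    "Authorization": "Bearer $OPENAI_API_KEY",  # placeholder, not a real key
    "OpenAI-Beta": "assistants=v2",             # select the v2 Assistants API
}

create_assistant_body = {
    "model": "gpt-4-turbo",                     # illustrative model choice
    "name": "Docs Helper",
    "instructions": "Answer questions using the attached files.",
    # Enabling file_search lets the assistant query up to 10,000 files.
    "tools": [{"type": "file_search"}],
}
```

The same structure is accepted by the official SDKs' assistant-creation helpers, which set the beta header for you.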

In addition to file_search, OpenAI has introduced a vector_store object in the Assistants API. When files are added to a vector store, they are automatically parsed, chunked, and embedded, making them ready for efficient searching. Vector stores simplify file management and billing, and can be shared across assistants and threads, giving users a more flexible and convenient experience.
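A minimal sketch of how this fits together, using illustrative request bodies (the file and store IDs are hypothetical): a vector store is created from previously uploaded files, and its ID is then handed to an assistant or thread via tool_resources.

```python
# Sketch: create a vector store from uploaded files, then attach it to an
# assistant via tool_resources (field names per the Assistants API v2).
create_vector_store_body = {
    "name": "Product Documentation",
    # Hypothetical IDs of files previously uploaded via the Files API;
    # each file is parsed, chunked, and embedded automatically.
    "file_ids": ["file-abc123", "file-def456"],
}

# The resulting store (id assumed here) can be shared across assistants
# and threads by referencing it in tool_resources.
attach_body = {
    "tool_resources": {
        "file_search": {"vector_store_ids": ["vs_abc123"]}
    }
}
```

Because the store is referenced by ID rather than copied, several assistants and threads can search the same embedded files without re-uploading them.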

Managing threads and messages

Together, threads and messages form the flow of conversation between the assistant and the user. The Assistants API places no limit on the number of messages a thread can store, so users can hold long, in-depth conversations. When the total size of the messages exceeds the model's context window, however, the thread truncates intelligently, dropping the messages it judges least important first.

When you create a thread, you can include an initial list of messages in many forms, such as text, images, or file attachments. To make it easier to add files to threads, OpenAI provides tool_resources as a helper; users can also attach files directly through the thread's tool_resources. Note that user-created messages do not yet support image files, though OpenAI plans to add this capability in the future.

The Assistants API automatically manages the truncation of messages, ensuring that they stay within the maximum context length that the model can handle.
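The options above can be sketched as a single thread-creation body: an initial user message carries a file attachment routed to file_search, while thread-level files arrive through tool_resources. All IDs are hypothetical placeholders.

```python
# Sketch of a thread-creation body (Assistants API v2). The message-level
# attachment and the thread-level tool_resources are the two ways to make
# files searchable in the thread; IDs here are hypothetical.
create_thread_body = {
    "messages": [
        {
            "role": "user",
            "content": "Summarize the attached report.",
            # Attach a file to this message and route it to file_search.
            "attachments": [
                {"file_id": "file-abc123", "tools": [{"type": "file_search"}]}
            ],
        }
    ],
    # Thread-level files can also be supplied via an existing vector store.
    "tool_resources": {"file_search": {"vector_store_ids": ["vs_abc123"]}},
}
```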

Controlling the maximum number of tokens used

OpenAI gives users more control by letting them cap the number of tokens used per run in the Assistants API. This helps manage token costs and makes better use of resources. Users can also limit how many previous or recent messages are included in each run, to suit different scenarios and needs.

To control token usage precisely in a single run (Run), users can set two parameters, max_prompt_tokens and max_completion_tokens, when creating the run. These limits apply to the total number of tokens used across the entire run lifecycle, keeping each run within a predetermined resource budget.

As a practical example, suppose we initialize a run with max_prompt_tokens set to 500 and max_completion_tokens set to 1000. The first completion will truncate the thread to 500 tokens and cap the output at 1000 tokens. If the first completion uses only 200 prompt tokens and 300 completion tokens, the second completion has the remaining 300 prompt tokens and 700 completion tokens available.
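The arithmetic in that example is just run-level budgeting, which a small helper makes explicit (the function name is ours, not part of the API):

```python
def remaining_budget(max_prompt, max_completion, used_prompt, used_completion):
    """Tokens still available for later completions within one run.

    max_prompt_tokens / max_completion_tokens apply to the total usage
    across the entire run lifecycle, not to each completion separately.
    """
    return max_prompt - used_prompt, max_completion - used_completion

# The example from the text: max_prompt_tokens=500, max_completion_tokens=1000,
# with the first completion using 200 prompt and 300 completion tokens.
print(remaining_budget(500, 1000, 200, 300))  # → (300, 700)
```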

If the completion process reaches the max_completion_tokens limit, the run terminates with an incomplete status.

Polling helpers

To keep the run's status up to date, the user needs to retrieve the Run object periodically. On each retrieval, the user can check the run's status and decide what the application should do next. To simplify this process, OpenAI provides polling helpers in the Node and Python SDKs.

These helpers automatically poll the Run object and return it once it reaches a terminal state. This greatly reduces the user's workload, letting them focus on the application's core logic.

While a run is in the in_progress state (i.e., a non-terminal state), the thread is locked: users cannot add new messages to the thread or create new runs on it. This mechanism keeps threads consistent while a run is being processed, avoiding conflicts and errors from concurrent operations.
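The loop the SDK helpers wrap can be sketched generically. The terminal-state list below reflects the v2 run statuses (including the new incomplete status from token limits); the retrieve callable is an assumption standing in for an SDK call such as a closure over the Python SDK's run-retrieval method.

```python
import time

# Run statuses after which polling can stop (Assistants API v2).
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired", "incomplete"}

def poll_run(retrieve, interval=1.0):
    """Poll until the run reaches a terminal state.

    `retrieve` is any zero-argument callable returning an object with a
    `.status` attribute; in practice it would wrap an SDK retrieval call.
    The SDKs' built-in polling helpers implement this same loop for you.
    """
    while True:
        run = retrieve()
        if run.status in TERMINAL_STATES:
            return run
        time.sleep(interval)  # avoid hammering the API between checks
```

While this loop runs and the status is non-terminal, the thread stays locked, so the application should not attempt to add messages or start another run on it.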

In addition, OpenAI has made the following updates:

OpenAI has added a tool_choice parameter that can force the use of a specific tool, such as file_search, code_interpreter, or a function, in a particular run.

Users can now also create messages with the assistant role to build custom conversation histories in threads.

Assistants and Runs now support popular model configuration parameters such as temperature, response_format (JSON mode), and top_p.
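These new knobs can all be combined in a single run-creation body. The sketch below uses illustrative values and a hypothetical assistant ID; field names follow the Assistants API v2.

```python
# Sketch of a run-creation body combining the new options: forcing the
# file_search tool, sampling/format parameters, and token caps.
create_run_body = {
    "assistant_id": "asst_abc123",               # hypothetical ID
    "tool_choice": {"type": "file_search"},      # force this tool for the run
    "temperature": 0.2,                          # lower = more deterministic
    "top_p": 0.9,
    "response_format": {"type": "json_object"},  # JSON mode
    "max_prompt_tokens": 500,                    # caps from the earlier section
    "max_completion_tokens": 1000,
}
```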

In addition, users can use fine-tuned models in the Assistants API. Currently, only fine-tuned versions of gpt-3.5-turbo-0125 are supported.
