
GPT-4.5 Preview: Code Interpreter, A New Era of Programming

author:CSDN

Following the explosion of the ChatGPT and GPT-4 language models, the recently released Code Interpreter has pushed the field to a new high.

The code interpreter is part of the ChatGPT plugins update: it can read uploaded files, execute code, generate charts, and perform statistical analysis, among other things. Between July 6 and 8, it was rolled out to roughly 2 million ChatGPT Plus users.

Andrej Karpathy, an OpenAI scientist, describes the code interpreter as a personal data analyst, though he expects it will take some time for the community to realize its full potential.

Given the power of the code interpreter, the author of this article, Swyx, treats it as GPT-4.5 and analyzes it in depth. The second half of the article is a discussion between Swyx and Alessio Fanelli, hosts of Latent Space, and AI practitioners including Simon Willison (co-creator of the open-source web framework Django), Alex Volkov, and Shyamal Anandkat (OpenAI go-to-market lead), who share their user experiences and core insights.

Source: https://www.latent.space/p/code-interpreter#details

This article was translated by Oneflow and published by CSDN with permission!

Source | Latent Space

Translate | Wan Zilin, Yang Ting, Jia Chuan

Windows 3.0 jumped to Windows 95 to signal their (now iconic) new system. Microsoft Excel jumped from 5 to 7 to keep its version in sync with the rest of Microsoft Office. Mac OS and Windows both skipped version 9 to appeal to Gen X. React jumped straight from 0.14 to v15, while Kubernetes and Go show the restraint of systems developers who refuse to break anything established, content to let version numbers tick up uneventfully.

How should we version foundation models? This may be an unfamiliar concept to researchers, who might casually train 400 nameless large language models (LLMs) just to prove a point, but as AI engineers build products and businesses on top of these foundation models, version management is becoming increasingly important.

In the short history of generative AI, some interesting case studies have emerged. Each step from GPT-1→2→3 marked a clear advance, and the Midjourney 4→5 upgrade gave birth to the "Balenciaga Pope" (note: a photorealistic image of Pope Francis in a Balenciaga puffer coat generated by Midjourney), but other upgrades, such as Stable Diffusion 1→2, were more controversial. Bumping the minor version number should be uncontroversial: it can simply mean more training from the same checkpoint, as when SD went from v1.3→1.4→1.5.

This brings us to today's topic of using the GPT version with ".5" as the framework.

As you may recall, GPT-3.5 was announced at the same time as ChatGPT, retroactively sweeping text-davinci-003 and code-davinci-002 into its scope. The naming achieved two goals:

  1. Raise awareness that the GPT-3.5 models are significantly better than the GPT-3 (2020) models, thanks to: 1) added code capabilities, 2) instruction fine-tuning, and 3) RLHF/PPO optimization.
  2. Signal that the new dialogue paradigm is the future direction of artificial general intelligence (AGI).

My comment on the code interpreter model will revolve around the following two points:

  1. Raise awareness that it is a significant upgrade over GPT-4.
  2. Point out that this new paradigm is the direction in which AGI is heading.

Based on the above two points, I conclude that the code interpreter should be considered GPT-4.5, and if the relevant API is released one day in the future, I am willing to bet that it will also be retroactively given that designation.

This is followed by a review of previous discussions on ChatGPT, GPT-4 and Auto-GPT.


Code interpreter: executive summary

The code interpreter is an "experimental ChatGPT model" that writes Python code to a Jupyter notebook and executes it in a sandbox environment. The model has the following characteristics:

1. Firewall-isolated from other users and from the Internet;

2. Supports uploads/downloads of up to 100MB (including entire Git repositories as .zip; file types such as .csv, .xls, .png, .jpeg, .mov, .mp3, .epub, .pdf, and .zip);

3. Over 330 pre-installed libraries, such as Pandas (data analysis), matplotlib, seaborn, folium (charts and maps), pytesseract (OCR), Pillow (image processing), Pymovie (FFmpeg), Scikit-Learn, PyTorch, and TensorFlow (machine learning). Thanks to (2), you can also upload additional dependencies such as GGML.
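Under the hood, the "statistical analysis" is ordinary generated Python run against the uploaded file. Here is a minimal, stdlib-only sketch of that flavor of code; in the real sandbox the model would typically reach for pandas and matplotlib, and the CSV content below is invented for the example:

```python
import csv
import io
import statistics

# Stand-in for an uploaded file in the sandbox (e.g. /mnt/data/sales.csv).
uploaded = io.StringIO("region,revenue\nnorth,120\nsouth,95\nnorth,140\nsouth,80\n")

rows = list(csv.DictReader(uploaded))

# Group revenue by region, the way the interpreter's generated code would.
by_region = {}
for row in rows:
    by_region.setdefault(row["region"], []).append(float(row["revenue"]))

summary = {region: statistics.mean(values) for region, values in by_region.items()}
print(summary)  # {'north': 130.0, 'south': 87.5}
```

With pandas pre-installed, the model usually writes the equivalent `df.groupby("region")["revenue"].mean()` and follows up with a chart.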

The code interpreter, part of the ChatGPT plugin update, was released on March 23 with a compelling demo by Andrew Mayne and Greg Brockman. Alpha users gained access in April, May and June. Finally, on July 6-8, the code interpreter was officially released as an optional beta feature to approximately 2 million ChatGPT Plus users.

Because these capabilities can be combined in code in endless ways, it is hard to enumerate them all, but learning by example is very effective (e.g. p5.js game creation, emoji painting, interactive dashboards, data preprocessing, complex AST-manipulation code, large-scale face detection), as is browsing the list of available libraries:


This is an example generated by Ethan Mollick, who doesn't know Python himself but is very good at using the code interpreter. Ethan also distilled his experience into a long system prompt that sets good code interpreter defaults. Additional examples and related material are available.

It's worth noting that the code interpreter actually introduces two new features, not just a sandbox and a model:

  • Most alpha testing prior to July focused on the Python sandbox and what could be done inside it, with only brief mentions of the model's autonomous coding ability.
  • After the official release, however, attention shifted to the quality of the model served through the code interpreter, which, anecdotally, appears superior to the current GPT-4 (at writing code, autonomously executing multiple steps, deciding whether to continue, and asking the user to choose between a set of options).

The autonomy of this model must be seen to be believed. Here are examples of coding and debugging without human intervention at all:


After the March demo, most attempts to mimic the code interpreter failed. Like ChatGPT before it, the code interpreter feels like a major improvement because it combines a model with a modality.

Code interpreter limitations (beyond the hardware specs)

  • The environment frequently resets its code execution state, losing uploaded files, and its ability to recover from failures is limited.
  • Its OCR (optical character recognition) is nowhere near as good as GPT-4's vision capabilities.
  • It refuses tasks it could in fact perform, and you must insist that it try.
  • It cannot call GPT-3/4 from code and has no network access, so it cannot do tasks like data augmentation; instead it tries to solve such problems by writing code.

But overall, the code interpreter impresses:

"The code interpreter beta is very powerful. It is your personal data analyst: it can read uploaded files, execute code, generate charts, perform statistical analysis, and much more. I expect it will take the community some time to realize its full potential." ——Karpathy

"If this is not a world-changing, GDP-shifting product, I'm not sure what else could be. Everyone can hire a 'script kid' for $20/month to work for them." ——roon

"I started experimenting with the code interpreter, and it did everything I had planned to build over the next two years." ——Simon Willison

Reasoning: The next big frontier

If GPT-4 is just a mixture of "8 experts of 220 billion parameters each", does OpenAI "lack novelty"? Leaving aside the substantial advances of trillion-parameter models like the routed language models and the Switch Transformer, the code interpreter shows that there is still plenty of room for improvement so long as progress is not defined purely as raw LLM inference capability, and OpenAI is already leading the way.

In 2017, Noam Brown (now an OpenAI research scientist) built an AI called Libratus that defeated four top professionals over 120,000 hands of no-limit Texas Hold'em. What important conclusion can we draw from this?

"Usually neural networks give a response in about 100 milliseconds... We found that adding just a little search was equivalent to scaling up the precomputed strategy by a factor of 1,000. That finding surpassed all prior related research." (Link: https://youtu.be/2oHH4aClJQs)

In hindsight, the results are obvious:

  • In real life, humans spend more time thinking about hard problems than easy ones. But GPT-3 spends roughly the same amount of time answering "Is a ball round?" as it does "Does P = NP?". What if we let it think for a year?
  • We have seen how Kojima et al.'s "Let's think step by step" dramatically improves LLM performance, mainly by letting the LLM externalize its thought process in context, which also consumes more inference time. Beam search and Tree-of-Thought-style searches use that inference time more efficiently.
  • Every great leap in AI has come from some form of scaling. Transformers unlocked parallelized pretraining compute. Masked language modeling let us use vast amounts of unlabeled data. Scaling laws gave us a playbook for growing models. Clearly, inference-time compute / "real-time search" is the next frontier, letting us "just invest time".
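One simple way to picture "just invest time" is best-of-n sampling: spend more inference-time compute by drawing several candidate answers and keeping the best-scoring one. Everything below (the toy `generate` sampler and `score` verifier) is a stand-in for illustration, not any lab's actual method:

```python
import random

def generate(prompt: str, rng: random.Random) -> str:
    # Toy stand-in for sampling one LLM completion.
    return f"answer-{rng.randint(0, 9)}"

def score(candidate: str) -> int:
    # Toy stand-in for a verifier/reward model; here, higher digits are "better".
    return int(candidate.rsplit("-", 1)[1])

def best_of_n(prompt: str, n: int, seed: int = 0) -> str:
    # More samples = more inference-time compute = better expected answer.
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)

one = best_of_n("Does P = NP?", n=1)
many = best_of_n("Does P = NP?", n=32)
print(one, many)
assert score(many) >= score(one)  # extra search never hurts the best score
```

The real systems discussed here (Libratus, Tree of Thought) use far smarter search than uniform sampling, but the scaling knob is the same: answer quality as a function of inference-time compute.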

In 2019, Noam Brown used this insight to crack 6-player Texas Hold'em with the Pluribus model. In 2022, he leveraged the idea again with the Cicero model, reaching human-level play in the strategy game Diplomacy (drawing on the search techniques of AlphaGo and AlphaZero). Last month, he was still pondering it:


Two weeks later, he joined OpenAI.


Code generation, sandboxing, and Agent Cloud

I have long emphasized the special status of LLMs that can program; they are an important driver of the rise of the AI engineer. This goes beyond the simple observation that "Copilot is great for developers, but not for anyone else": an LLM that can write code is often useful even to people who cannot program, because the LLM serves as a perfect abstraction over the code.

The earliest "Code Core" experiment I know of came from Riley Goodside, with last year's "You are GPT-3, and you can't do math" experiment.


This inspired the likes of Amjad Masad of Replit and Sharif Shameem of Lexica to implement it.

This was the first sign that the best way to patch LLM deficiencies (mathematical calculation, interacting with the external environment, interpretability, speed/cost, and so on) is to exploit the LLM's ability to write code that accomplishes what the LLM alone cannot.
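Goodside's pattern reduces to a few lines: ask the model for code rather than for an answer, then execute what it wrote. In this sketch, `fake_llm` is a hard-coded stand-in for a real completion call:

```python
def fake_llm(question: str) -> str:
    # Stand-in for an LLM prompted to answer *with Python code*, not with text.
    # A real implementation would call a completion API here.
    return "result = 123456789 * 987654321"

def answer_via_code(question: str) -> int:
    code = fake_llm(question)
    namespace: dict = {}
    exec(code, namespace)  # the sandbox executes what the model wrote
    return namespace["result"]

print(answer_via_code("What is 123456789 * 987654321?"))
# The code path gives the exact product; asking the model for the digits
# directly is where hallucinated arithmetic creeps in.
```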

NVIDIA's Voyager offers a logical roadmap for this line of thinking:


Probably the most important chart in the field of AI agents in 2023

There is an obvious problem with generalizing from Voyager: the real world is far more stochastic than Minecraft, its documentation is far less polished, and its feedback loops are much longer. Current agent implementations such as Minion AI, Multion, and AutoGPT all run on your live browser/desktop, turning potential hallucinations and mistakes into disasters, the equivalent of a self-driving car with no one's hands anywhere near the wheel.

If you support "Code Core", you know where this is headed. Ever since Ada Lovelace wrote programs for Babbage machines that did not yet exist, developers have needed to test their code. You can improve code generation by adding a semantic layer, as Sarah Nagy of Seek AI has done. But ultimately, the only way to know whether code runs and does what it should is to create a sandbox for it, as Shreya Rajpal of Guardrails has done, and to generate tests for it, as Itamar Friedman of Codium AI has done.
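The sandbox-plus-generated-tests idea can be sketched with a child interpreter and a timeout. Real products like Guardrails or Codium AI do far more; the `run_sandboxed` helper below is only an illustration of the feedback signal an agent needs:

```python
import os
import subprocess
import sys
import tempfile

def run_sandboxed(code, timeout=5.0):
    """Execute untrusted generated code in a separate interpreter process."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, ignores env/site
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode == 0, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return False, "timed out"
    finally:
        os.unlink(path)

# Generated code plus a generated test for it, run together.
generated = "def add(a, b):\n    return a + b\n\nassert add(2, 2) == 4\nprint('ok')\n"
ok, output = run_sandboxed(generated)
print(ok, output.strip())
```

A pass/fail bit plus the captured traceback is exactly the feedback a non-human operator can act on; a true agent cloud adds network isolation, resource limits, and persistence on top.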

Most code generation/sandboxing can and should be run locally, but as the limits of running on localhost become clear, more and more agent builders and users are realizing that cloud infrastructure is needed to build and run the code part of the LLM inference loop, and the rise of the Agent Cloud is poised to meet this demand. In effect, this is a new kind of serverless infrastructure requirement, one that provides the necessary feedback to a non-human operator. Naturally, a crowd of candidates is emerging to compete in this nascent agent-cloud sub-industry:

  • Replit's Amjad has discussed it openly (https://twitter.com/amasad/status/1669142526505394177).
  • E2B's Vasek has an open-source Firecracker microVM implementation.
  • Codesandbox's Ives has one as well.
  • Fly's Kurt launched Fly Machines in May.

You'll notice that all of these implementations use Firecracker, the microVM alternative to QEMU that Amazon open-sourced in 2018 (a notable feat, given that Amazon does not usually open-source its software). A contrasting approach, however, may come from Deno (in the JavaScript world) and Modal (in the Python world), whose self-provisioning runtimes offer agent developers and infrastructure providers a lighter-weight protocol, at the cost of much less familiarity.

Of course, OpenAI had to build its own agent cloud in order to host and scale the code interpreter for 2 million customers over a single weekend. They have been using this technology in their own work for years; the rest of us are only now realizing its importance.


The Road to GPT-5: Code-Augmented Reasoning

Putting everything together, we can compare the code interpreter with the previous method:


Think about which developments drove GPT's major and minor version upgrades and, given the capabilities the code interpreter unlocks, you can see why I consider it "GPT-4.5".

In the podcast conversation later in this article, loyal GPT-4 users insist that the quality of the GPT-4 base model has deteriorated (Logan has stated that the served model has not changed), and they also claim that the output of the code interpreter, even when no code is written, is on par with the original GPT-4 before it was "weakened".

Assuming this is true (which is hard to falsify without an explicit code interpreter API that could be checked with lm-eval-harness), the extra fine-tuning the code interpreter received for writing code may also have improved overall output quality (this matches what we have learned from research, from Replit, and from GPT-3.5's code-davinci-002 base model). That would make the code interpreter's base model, even without the sandbox, effectively "GPT-4.5" on model quality alone.

Notes that don't fit elsewhere

  • OpenAI's lead: Sundar Pichai announced Google Bard's "implicit code execution" feature in June, an implementation that is simplistic and doesn't rely on Python. Interestingly, a month later I reran the same prompt used in Google's announcement, and it failed. Meanwhile, OpenAI was introducing an entirely new LLM coding paradigm. OpenAI's lead is simply incredible.
  • OpenAI as a cloud distro: I have written a lot about "layer 2 clouds" (also known as cloud distros) and can't help noticing that OpenAI is now one. In the near future, will it bill for compute time and storage capacity, introduce IAM policies, and fill in the rest of the cloud service surface? How soon before OpenAI drops the "Open" from its name and becomes a pure AI cloud platform?

(Latent Space hosts Swyx and Alessio Fanelli, together with AI technologists including Simon Willison, Alex Volkov, and Shyamal Anandkat (OpenAI go-to-market lead), held an in-depth discussion of the code interpreter; the following is an excerpt from the conversation.)


Highlights of the code interpreter

Alex Volkov: If you're a paying ChatGPT user, you can now use the code interpreter. Its first highlight is that it can receive files uploaded by users; the second is that it can run code in a secure environment; the third is that it supports file downloads, which is also new to ChatGPT.

Simon Willison: I use it almost every day, and it's no exaggeration to say this is the most exciting AI tool out there, offering many capabilities that even ChatGPT with plugins can't match. If you're an experienced developer, all the better; if not, it is still astonishing to use.

Just a few weeks ago, few people knew about the code interpreter; now I think everyone knows how powerful it is. It can write code, which ChatGPT has long been able to do, but the code interpreter can also run the code and show the results. The most interesting part is that it can run the code over and over, spot errors, and announce corrections: "I can fix it and try again." By writing code, catching errors, thinking it over, and writing code again, it may take four or five attempts to reach the right solution.
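Mechanically, the loop Simon describes is: execute, capture the traceback, feed it back, retry. In this sketch, `revise` is a stand-in for re-prompting the model with the error; it "fixes" one planted typo the way the interpreter announces "I can fix it and try again":

```python
import traceback

def revise(code: str, error: str) -> str:
    # Stand-in for re-prompting the model with the traceback.
    return code.replace("numbrs", "numbers")

def run_with_retries(code: str, max_attempts: int = 5) -> dict:
    for attempt in range(1, max_attempts + 1):
        namespace: dict = {}
        try:
            exec(code, namespace)
            return {"attempt": attempt, "result": namespace["result"]}
        except Exception:
            code = revise(code, traceback.format_exc())
    raise RuntimeError("gave up after retries")

buggy = "numbers = [1, 2, 3]\nresult = sum(numbrs)"  # NameError on first run
outcome = run_with_retries(buggy)
print(outcome)  # {'attempt': 2, 'result': 6}
```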

It's also fun to watch it attempt various tasks. Besides running code, it supports uploading and downloading files, and the range of file types it can handle is amazing. Beyond simple files like CSVs, it can handle any format the Python standard library can read, including SQLite.

In fact, you now have a versatile tool that can handle all kinds of files in different formats. What's really interesting is that if the code interpreter knows the layout of a file format, it can work with that file format even if it doesn't have the appropriate libraries.

You can tell it, "I'm uploading the file," and it might answer, "I don't have a library for this file type." Then you reply, "Okay, please read the binary bytes and interpret the file based on what you know about the format." And it does. This capability is fun and creative, and you should try it.
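That trick works because many formats have a documented byte layout. For example, PNG stores image width and height as big-endian 32-bit integers at fixed offsets inside the IHDR chunk, so they can be read with no imaging library at all. The header bytes below are built inline to keep the sketch self-contained:

```python
import struct

# Minimal start of a PNG file: 8-byte signature, then the IHDR chunk
# (length, type, width, height, ...). Constructed by hand for the example.
png_header = (
    b"\x89PNG\r\n\x1a\n"             # PNG signature
    + struct.pack(">I", 13)          # IHDR chunk length
    + b"IHDR"                        # chunk type
    + struct.pack(">II", 640, 480)   # width, height (big-endian uint32)
    + b"\x08\x06\x00\x00\x00"        # bit depth, color type, etc.
)

def png_dimensions(data: bytes):
    # Check the signature, then read width/height straight out of IHDR.
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    assert data[12:16] == b"IHDR", "IHDR chunk not where the spec puts it"
    width, height = struct.unpack(">II", data[16:24])
    return width, height

print(png_dimensions(png_header))  # (640, 480)
```

This is the same kind of "I know the format, I don't need the library" code the interpreter writes when you push it.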

Alex Volkov: I've noticed that sometimes it doesn't know what it's capable of, but it can encourage the code interpreter to try.

Simon Willison: Yes, you can encourage it: "You can do it." It will say, "Okay, let me try," and then it succeeds. Think of it basically as a programming intern, sometimes very smart and capable, sometimes dumb and unaware of its own abilities. Its biggest advantage over a human intern is that it never gets frustrated and gives up. It works fast, it can instantly set aside its previous work and start a new task, and it just keeps going.

The difference between a code interpreter and ChatGPT is that you can have the former write code, test it to make sure it works, and then iterate to fix bugs. Writing some features manually would have been tedious before, but using the code interpreter helped me simplify the process.

Some tasks can be tedious and contain edge cases that need to be solved step by step, so I'll leave it straight to the code interpreter. It runs the code, finds bugs, tries to fix them, and then runs other parts so I can find problems and debug them faster.

For a human, this process usually takes an hour; the code interpreter finishes in a few minutes. This approach works great: when you write code with regular ChatGPT, there's a good chance it will hallucinate a non-existent API and make silly mistakes. The code interpreter can also make errors when generating code, but it fixes them automatically before handing you the final result.

Daniel Wilson: So that's why I call it the most advanced agent in the world, and that's not to be overlooked.


Code interpreter + plugin

Daniel Wilson: Shyamal Anandkat is the go-to-market lead at OpenAI and is very interested in commercial use cases for the code interpreter. He wondered what value there might be in combining the code interpreter with plugins, and what could be gained from it.

Surya Danturi: Calling other plugins within your own plugin is an area worth exploring, but it involves security-related issues. First of all, we have to install the plugin first, and it would be great if inside the code interpreter there was a plugin that acts as its own small vector database.

If we can have the code interpreter interact with the plugin and call external APIs, we can add any external API inside the code interpreter. This allows the code interpreter to talk to the plugin, let the plugin do certain tasks, and add external APIs to the code interpreter.

OpenAI doesn't allow that yet, because of the security issues involved.

Alex Volkov: I think it's great, because currently when plugins are used in ChatGPT, plugins can only access external services through APIs because OpenAI restricts network access. If this plugin functionality is combined with the code interpreter, OpenAI has the potential to control external access and limit the use of APIs to only approved plugins, which would be a remarkable development.

Simon Willison: Plugins do have inherent security implications when it comes to prompt injection. I believe OpenAI restricts the code interpreter's access to these features to prevent the model from being tricked into running Python code that accesses private data, leading to data breaches.


Limitations of the code interpreter

Daniel Wilson: As developers, we know how to use these existing libraries. However, we should probably discuss the limitations of the code interpreter: firstly, it has no network access, and secondly, you can only upload up to a hundred megabytes of files.

Simon Willison: Previously it could run child processes and therefore be able to call other programs, but now it seems to have disabled some features.

My biggest breakthrough was getting the code interpreter to support other programming languages, like Deno, via a single binary. I uploaded the binary and told it, "Now you have Deno; you can run JavaScript." And the code interpreter really did run Deno smoothly. But OpenAI may have since restricted this, which is a shame, because at one point I was running and executing JavaScript inside the code interpreter.

I also uploaded a Lua interpreter and the code interpreter started running and executing Lua, which was pretty cool. However, I don't think they support this feature anymore.

Alex Volkov: Sometimes the code interpreter disconnects with an orange notification at the top, in which case the previously generated download link will fail.

Simon Willison: Although there was a backup to save the conversation history, all the data uploaded before was lost. This situation is really frustrating, but at least you can easily redo everything you did in a new session because you have a detailed record of the last conversation.

Alex Volkov: Yes, after uploading zipped files I asked it to unzip them and do something, but at some point the code interpreter actually lost those files. I'm not sure how, but note that it sometimes gets stuck in a loop: it doesn't know whether the file exists or whether its code is at fault, so it keeps trying different code paths to read the file. If you get stuck in a loop, stop it promptly, start a new session, and begin from scratch.

swyx: In some cases, the limitations are actually a good thing. For example, I once fed it a large data table and asked for exploratory data analysis, that is, some interesting statistics. Because it was taking too long, it proactively aborted and generated a shorter piece of code that processed only part of the dataset, which was unexpected. Sometimes you want it to time out, which amounts to a better user experience; in other cases you obviously want it to run to completion. So it may need to offer different execution modes, because this eager abort-or-timeout behavior is not always desirable.

Daniel Wilson: I want to talk about another limitation. I try to use it for data augmentation, for example I have a list of superhero names and want to enhance this list by adding other information that the model knows, I know that the model already has this knowledge, but the problem is that the model tends to generate code rather than fill in the gaps in the table with its existing knowledge of the world. And because there is no network access, the code interpreter cannot call itself.

I was expecting it to embed the provided text in it, but it couldn't finish. So I've observed some limitations in some areas, and if you're using regular GPT-4 before, switching to the code interpreter may be a regression in those areas.

Simon Willison: That's really interesting. I'll admit I've never tried it for data augmentation, because I usually do that kind of work directly in regular GPT-4, for example outputting a Python dictionary with a name and bio for each superhero, and then copy-pasting it back. Actually, rather than copy-pasting, you should upload that JSON as a file to the code interpreter, because an uploaded file doesn't consume tokens while pasted code does.

swyx: Exactly. It's also an interesting point about when we use file uploads, when to use code interpreters, and when it's better to use the original GPT-4.


Code interpreter without code = GPT-4.5?

Gabriel: I've been using the code interpreter for regular ChatGPT tasks with no coding involved, because this model feels more capable than the current default model. ChatGPT and GPT-4 have gotten worse over the last month or two; I'm not sure whether that's controversial in this space, but I did observe it.

The code interpreter model feels like the original ChatGPT model and performs very well when performing tasks such as answering questions, writing articles, and so on. My guess is that despite the current good performance of the code interpreter, this situation may not last long, because once the technology stabilizes, OpenAI will definitely make performance improvements, and by then, everything may not be so smooth.

Simon Willison: I'm skeptical of that claim. The argument that the model is deteriorating is hard to measure quantitatively; we can easily accumulate anecdotal evidence of something, but it's hard to be completely sure what actually happened.

Gabriel: I ran some hints on ChatGPT-4 before and compared them to the results of today's default model. Apparently now the result is worse.

Simon Willison: These are personal experiences, but it might help if you could publish really detailed comparison data, because model output is nondeterministic: run the same prompt five times and two runs may be poor while three are good. So even with comparison data from months ago, it's hard to know whether you simply got lucky on the first run.

Gabriel: I ran the same prompt on the April model and then on the current default model, and the comparison showed the current model is much worse. However, when I tested the same prompt in the code interpreter, the result was just a lot of generated text, no code involved, and it was similar to the April model.

Simon Willison: Are you using the GPT-4 model with a long window of 8,000 tokens?

Gabriel: I tested and found the code interpreter's context window is 8K, the same as the plugin model and ChatGPT-3.5. I used OpenAI's tokenizer to measure the number of tokens in my text and tried a few different lengths, and only hit problems approaching 8K. When the text length exceeds 8K, the code interpreter complains that the prompt is too long to answer. So the plugin model, the code interpreter, and ChatGPT-3.5 all have 8K context windows, while the default context window for ChatGPT-4 is 4K.


Future user needs

Host: We have long been waiting for GPT-4's vision capabilities. OpenAI announced that GPT-4 would get vision soon but didn't disclose when. Later, rumors spread that OpenAI's roadmap put vision capabilities sometime next year. We also know Bing has begun using some vision features in Bing Chat, though nothing like the ecosystem of plugins we have now. And they don't provide an API, which is what you want as a developer.

Simon Willison: I have a very simple wish. If the code interpreter runs on a fine-tuned model, I want to use that model directly through an API, but with our own functions to evaluate the code, so we can each build the ideal version of the code interpreter we want, with every feature we want. That costs OpenAI nothing; they can bill per use of the model. But we would be free to run it in our own Kubernetes containers, give it network access, and so on.

Host: Simon just suggested the code interpreter is a fine-tuned model. Because I was developing against the function-calling model before they released it, I did some research and found that the new function-calling model understood code-interpreter-style tasks better, so I think they did fine-tune it.

Kyle: I'd like to see multimodal capabilities brought together. When you use the model in the code interpreter, it sometimes hallucinates while producing charts.

Audience: OpenAI should seriously consider giving ChatGPT social features that let everyone collaborate with people prompting toward similar goals. In other words, make the experience social. Right now each of us operates in our own individual container, literally and figuratively; a social version of ChatGPT would be a quantum leap, and I look forward to those social experiences.

Host: OpenAI notes the "Shared GPT" phenomenon, where many people share their interactions with GPT by sharing the content of their sessions, thus driving communication and collaboration in the community. Now it seems that there is a way to share sessions to some extent, but not sure if it works for the code interpreter.

Simon Willison: Yes, the chart won't be displayed. So there will be a blank space where the chart output is displayed.

R5: The focus is not only on sharing, but on being able to find like-minded analysts and prompters and connect with them. Right now only the model and OpenAI know the content of our prompts; no one else does unless we share, and third-party plugins can't fill this gap.

Simon Willison: The reason Midjourney is the best image-generation tool is that people have to use it in public and learn from each other.

Lantos: I hope OpenAI will release a Docker tool that allows users to run it on their own computers, connect GPT models directly to it, and then evaluate their code locally. I think this project may be realized soon, even by OpenAI interns.

Gabriel: I think the best use case for a code interpreter is for business analysts. Business analysts need a deep understanding of the business, users, and markets, but only basic data analysis skills. For junior business analysts who are new to the company, it takes a year or two to really understand the business and everything related to it; Executives, on the other hand, already have a deep understanding of the business, but may need some help with data analysis. So code interpreters have great potential in this area.

To realize the full potential of the code interpreter for business analysts, OpenAI needs better ways to feed data into the model than file uploads alone. For example, you could provide an API key that lets the model query and analyze the data directly.

Simon Willison: I've built a plugin version of my Datasette software, and plugins do give us a way to do that. If you can upload files up to 100 megabytes, then for most business problems you can compress the data to under 100 megabytes (for example, query the data warehouse, extract the important information from the last 30 days of log files, and store it as a SQLite or CSV file under 100 megabytes), then upload it to the code interpreter for the final analysis. If you're willing to put in the effort to extract your data into 100 megabytes, you can get quite far with the existing tools.
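The extract-then-upload workflow Simon describes takes only a few lines of stdlib sqlite3. The table, columns, and rows below are invented for the sketch, and an in-memory database stands in for the `extract.db` file you would actually upload:

```python
import sqlite3

# Hypothetical pre-aggregated rows from a warehouse/log query; in practice
# these would come from your warehouse client, not a literal list.
daily_errors = [
    ("2023-07-01", "checkout", 42),
    ("2023-07-01", "search", 7),
    ("2023-07-02", "checkout", 39),
]

# Use sqlite3.connect("extract.db") to produce the uploadable file;
# ":memory:" keeps this sketch side-effect-free.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE error_counts (day TEXT, service TEXT, n INTEGER)")
conn.executemany("INSERT INTO error_counts VALUES (?, ?, ?)", daily_errors)
conn.commit()

total = conn.execute("SELECT SUM(n) FROM error_counts").fetchone()[0]
print(total)  # 88
conn.close()
```

Because the interpreter's Python standard library already includes sqlite3, the uploaded file can be queried directly in the sandbox with no extra dependencies.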

Gabriel: Yes. The code interpreter is useful to some extent, but it doesn't completely solve the problem, because ultimately it's you who decides what data to upload. But when it comes to solving problems, you don't know what data you actually need at first, it's a trial-and-error process trying to figure out which columns, which rows, which table of data you need. If I have to figure all of this out before I start solving the problem, then I'm already limited to specific content and can't follow the data.

Kyle: By specifying different code-style requirements, such as asking the interpreter to write code like a data engineer or a statistician, you can get different styles of code output. This lets you get code explained through different "personas". Even more interesting, you can have it do ETL (extract, transform, load) work or EDA (exploratory data analysis) tasks as if interacting with different "roles".

Swyx: OpenAI is developing vision models and is already alpha-testing them. It is also working on function fine-tuning, gradually deprecating some old models and features, and plans to launch new Instruct models. What will OpenAI do after finishing all that? I think it could be GPT-5. GPT-4 may well be the finale of OpenAI's Phase Four, as successful as Phase Four of the Marvel Cinematic Universe. I hope fine-tuning lands within this fourth phase, like Spider-Man: No Way Home as the climax and finale of that phase, before new innovations arrive in the fifth.
