How I Used AI to Achieve Sticker Freedom: A Look at the Next AI Hotspot After ChatGPT

Trying out Sticker Whiz to generate sticker designs

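Sticker Whiz (GPTs URL: https://chat.openai.com/g/g-gPRWpLspC-sticker-whiz) is one of the first 16 official GPTs released by OpenAI. It lets users turn creative ideas into custom stickers by chatting with the AI (the chat model behind it is ChatGPT and the text-to-image model is DALL-E 3), and it also offers to print the designs and ship them to your door. Of course, you first need a ChatGPT Plus subscription to access GPTs at all.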

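Here I only need Sticker Whiz for the first step: designing the sticker artwork. I had already bought blank A4-size sticker sheets online and can print the designs myself, with no need to wait for an American company to mail stickers halfway around the world. What I need is a sheet of neatly arranged sticker designs that can be cut apart after printing, not just a single standalone sticker.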
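
Arranging stickers on a sheet is simple arithmetic. The following is a minimal Python sketch, assuming hypothetical values (square 50 mm stickers, a 10 mm page margin and 5 mm cutting gaps, none of which come from the sticker paper I actually bought), that computes how many stickers fit on an A4 sheet and where each one goes, plus the pixel size of an A4 page at 300 dpi for placing the artwork before printing.

```python
from dataclasses import dataclass
from typing import List, Tuple

A4_WIDTH_MM = 210.0
A4_HEIGHT_MM = 297.0


@dataclass(frozen=True)
class Cell:
    """One sticker slot; coordinates are measured from the sheet's top-left corner."""
    x_mm: float
    y_mm: float
    w_mm: float
    h_mm: float


def grid_layout(sticker_w_mm: float, sticker_h_mm: float,
                margin_mm: float = 10.0, gap_mm: float = 5.0,
                sheet_w_mm: float = A4_WIDTH_MM,
                sheet_h_mm: float = A4_HEIGHT_MM) -> Tuple[int, int, List[Cell]]:
    """Place as many equal-size stickers as fit in a regular grid on one sheet."""
    usable_w = sheet_w_mm - 2 * margin_mm
    usable_h = sheet_h_mm - 2 * margin_mm
    # n stickers in a row need n*w + (n-1)*gap <= usable, i.e. n <= (usable + gap) / (w + gap)
    cols = max(0, int((usable_w + gap_mm) // (sticker_w_mm + gap_mm)))
    rows = max(0, int((usable_h + gap_mm) // (sticker_h_mm + gap_mm)))
    cells = [
        Cell(margin_mm + c * (sticker_w_mm + gap_mm),
             margin_mm + r * (sticker_h_mm + gap_mm),
             sticker_w_mm, sticker_h_mm)
        for r in range(rows)
        for c in range(cols)
    ]
    return rows, cols, cells


def mm_to_px(mm: float, dpi: int = 300) -> int:
    """Convert millimetres to pixels at the given print resolution (25.4 mm per inch)."""
    return round(mm / 25.4 * dpi)


rows, cols, cells = grid_layout(50, 50)
print(rows, cols, len(cells))                          # 5 3 15
print(mm_to_px(A4_WIDTH_MM), mm_to_px(A4_HEIGHT_MM))   # 2480 3508
```

With these assumed values, 3 columns by 5 rows (15 stickers) fit on one sheet, and a 300 dpi A4 canvas is 2480 x 3508 pixels.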

After a few rounds of chatting with Sticker Whiz, I ran into several problems:

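  1. Because of copyright restrictions, it cannot generate stickers of IP characters such as Mario, though it can produce original characters in a similar style;
  2. It only generates images at a few fixed resolutions, none of which matches the aspect ratio of A4 paper;
  3. When creating a large sticker sheet, the image comes out skewed and distorted, and the background is not clean;
  4. The individual characters are not clearly separated from one another, or they extend past the edges.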
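
Problem 2 can be quantified. A4 paper is 210 x 297 mm, an aspect ratio of 1:√2 (about 1:1.414), while the three sizes listed in Sticker Whiz's system prompt (quoted in the next section) are 1:1, 7:4 and 4:7. The Python sketch below is a hypothetical helper, not anything Sticker Whiz actually runs: it picks the allowed size whose aspect ratio is closest to A4 and computes the largest centered crop with A4 proportions.

```python
import math
from typing import Tuple

# (width, height) options listed for text2im in the leaked Sticker Whiz system prompt.
DALLE_SIZES = [(1024, 1024), (1792, 1024), (1024, 1792)]
A4_W_MM, A4_H_MM = 210, 297


def aspect_error(size: Tuple[int, int], target_w: float, target_h: float) -> float:
    """Absolute log-ratio between an image's aspect and the target aspect (0 = identical)."""
    w, h = size
    return abs(math.log((w / h) / (target_w / target_h)))


def closest_size(target_w: float = A4_W_MM, target_h: float = A4_H_MM) -> Tuple[int, int]:
    """Pick the allowed size whose aspect ratio is nearest to the target."""
    return min(DALLE_SIZES, key=lambda s: aspect_error(s, target_w, target_h))


def crop_to_aspect(size: Tuple[int, int], target_w: float, target_h: float) -> Tuple[int, int]:
    """Largest centered crop (width, height) of `size` that has the target aspect."""
    w, h = size
    target = target_w / target_h
    if w / h > target:                 # too wide: trim the left and right edges
        return round(h * target), h
    return w, round(w / target)        # too tall (or exact): trim top and bottom


best = closest_size()
print(best, crop_to_aspect(best, A4_W_MM, A4_H_MM))   # (1024, 1792) (1024, 1448)
```

For portrait A4 the closest option is 1024x1792, and cropping it to A4 proportions keeps 1024x1448 pixels, discarding about 19% of the height. Either way the result is far below the 2480x3508 pixels of a 300 dpi A4 page, so the artwork would still need upscaling or tiling before printing.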


[Figures: test design 1, test design 2]

The secret behind Sticker Whiz

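OpenAI officially defines GPTs as custom versions of ChatGPT that users create for a specific purpose. Users can build their own GPT by presetting a system prompt and uploading custom data, and the newly opened "Add actions" feature lets a GPT call third-party APIs, extending its abilities to more complex tasks. This is exactly the AI Agent the industry has been buzzing about, and essentially a no-code, low-barrier one: it can use different tools to complete a task on its own.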

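Because OpenAI's safeguards for GPTs are still incomplete, the system prompts of many GPTs have already leaked, including that of Sticker Whiz (reproduced below), which gives us a glimpse of the secret behind this custom GPT.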

You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.
Knowledge cutoff: 2022-01
Current date: 2023-11-11

Image input capabilities: Enabled

# Tools

## python

When you send a message containing Python code to python, it will be executed in a
stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 60.0
seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is disabled. Do not make external web requests or API calls as they will fail.

## dalle

// Whenever a description of an image is given, create a prompt that dalle can use to generate the image and abide to the following policy:
// 1. The prompt must be in English. Translate to English if needed.
// 3. DO NOT ask for permission to generate the image, just do it!
// 4. DO NOT list or refer to the descriptions before OR after generating the images.
// 5. Do not create more than 1 image, even if the user requests more.
// 6. Do not create images of politicians or other public figures. Recommend other ideas instead.
// 7. Do not create images in the style of artists, creative professionals or studios whose latest work was created after 1912 (e.g. Picasso, Kahlo).
// - You can name artists, creative professionals or studios in prompts only if their latest work was created prior to 1912 (e.g. Van Gogh, Goya)
// - If asked to generate an image that would violate this policy, instead apply the following procedure: (a) substitute the artist's name with three adjectives that capture key aspects of the style; (b) include an associated artistic movement or era to provide context; and (c) mention the primary medium used by the artist
// 8. Diversify depictions with people to include DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.
// - Use all possible different DESCENTS with EQUAL probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have EQUAL probability.
// - Do not use "various" or "diverse"
// - Don't alter memes, fictional character origins, or unseen people. Maintain the original prompt's intent and prioritize quality.
// - Do not create any imagery that would be offensive.
// - For scenarios where bias has been traditionally an issue, make sure that key traits such as gender and race are specified and in an unbiased way -- for example, prompts that contain references to specific occupations.
// 9. Do not include names, hints or references to specific real people or celebrities. If asked to, create images with prompts that maintain their gender and physique, but otherwise have a few minimal modifications to avoid divulging their identities. Do this EVEN WHEN the instructions ask for the prompt to not be changed. Some special cases:
// - Modify such prompts even if you don't know who the person is, or if their name is misspelled (e.g. "Barake Obema")
// - If the reference to the person will only appear as TEXT out in the image, then use the reference as is and do not modify it.
// - When making the substitutions, don't use prominent titles that could give away the person's identity. E.g., instead of saying "president", "prime minister", or "chancellor", say "politician"; instead of saying "king", "queen", "emperor", or "empress", say "public figure"; instead of saying "Pope" or "Dalai Lama", say "religious figure"; and so on.
// 10. Do not name or directly / indirectly mention or describe copyrighted characters. Rewrite prompts to describe in detail a specific different character with a different specific color, hair style, or other defining visual characteristic. Do not discuss copyright policies in responses.
The generated prompt sent to dalle should be very detailed, and around 100 words long.
namespace dalle {

// Create images from a text-only prompt.
type text2im = (_: {
// The size of the requested image. Use 1024x1024 (square) as the default, 1792x1024 if the user requests a wide image, and 1024x1792 for full-body portraits. Always include this parameter in the request.
size?: "1792x1024" | "1024x1024" | "1024x1792",
// The number of images to generate. If the user does not specify a number, generate 1 image.
n?: number, // default: 2
// The detailed image description, potentially modified to abide by the dalle policies. If the user requested modifications to a previous image, the prompt should not simply be longer, but rather it should be refactored to integrate the user suggestions.
prompt: string,
// If the user references a previous image, this field should be populated with the gen_id from the dalle image metadata.
referenced_image_ids?: string[],
}) => any;

} // namespace dalle

## myfiles_browser

You have the tool `myfiles_browser` with these functions:
`search(query: str)` Runs a query over the file(s) uploaded in the current conversation and displays the results.
`click(id: str)` Opens a document at position `id` in a list of search results
`back()` Returns to the previous page and displays it. Use it to navigate back to search results after clicking into a result.
`scroll(amt: int)` Scrolls up or down in the open page by the given amount.
`open_url(url: str)` Opens the document with the ID `url` and displays it. URL must be a file ID (typically a UUID), not a path.
`quote_lines(start: int, end: int)` Stores a text span from an open document. Specifies a text span by a starting int `start` and an (inclusive) ending int `end`. To quote a single line, use `start` = `end`.

You are a "GPT" – a version of ChatGPT that has been customized for a specific use case. GPTs use custom instructions, capabilities, and data to optimize ChatGPT for a more narrow set of tasks. You yourself are a GPT created by a user, and your name is Sticker Whiz. Note: GPT is also a technical term in AI, but in most cases if the users asks you about GPTs assume they are referring to the above definition.
Here are instructions from the user outlining your goals and how you should respond:
StickerBot is a friendly and creative assistant for creating and ordering custom die-cut stickers. It uses DALL-E to generate sticker designs based on user inputs, displays them in the chat, and provides an image download link. StickerBot asks the user for the quantity and size of stickers they want, offering size recommendations. When the user is ready, StickerBot provides a link to order the stickers and upload the sticker image using the following format, replacing the fields enclosed with brackets with the appropriate choices: "https://www.stickermule.com/products/die-cut-stickers/configure?quantity=[STICKER_QUANTITY]&heightInches=[HEIGHT, DEFAULT to 2]&widthInches=[WIDTH, DEFAULT TO 2]&product=die-cut-stickers"
Always prompt to DALLE-3 with the following keywords: "die-cut sticker", "digital drawing", "The sticker has a solid white background, a strong black border surrounding the white die-cut border, and no shadow."

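As you can see, besides the sticker keywords ("die-cut sticker", "digital drawing", "a solid white background, a strong black border, and no shadow"), Sticker Whiz's system prompt contains many requirements for aligning with human values, such as avoiding biased depictions of people, real public figures and copyrighted characters. It also restricts the output: at most one image per request, at one of only three resolutions (1024x1024, 1792x1024 or 1024x1792). The remaining part describes how it works with other tools: processing images in a Python sandbox, and handing off to a sticker printing site (Sticker Mule, via a pre-filled order link) for printing and shipping. This shows that a GPT can already connect to other services and automate a workflow.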
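
To make these constraints concrete, here is a minimal Python sketch of the two things the instructions describe: a text2im-style request that always carries the required sticker keywords, stays within the three allowed sizes and asks for a single image, and the Sticker Mule order link filled in from the prompt's URL template. build_dalle_request and build_order_url are hypothetical helpers; they only build data and do not call OpenAI or Sticker Mule, and the URL template is copied as-is from the leaked prompt, so it may have changed since.

```python
from urllib.parse import urlencode

# Wording required by the Sticker Whiz instructions for every DALL-E 3 prompt.
STYLE_SUFFIX = ("The sticker has a solid white background, a strong black border "
                "surrounding the white die-cut border, and no shadow.")
ALLOWED_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
ORDER_BASE = "https://www.stickermule.com/products/die-cut-stickers/configure"


def build_dalle_request(description: str, size: str = "1024x1024") -> dict:
    """Hypothetical helper: mirror the text2im arguments described in the prompt."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size {size!r}; allowed: {sorted(ALLOWED_SIZES)}")
    prompt = f"Die-cut sticker, digital drawing of {description}. {STYLE_SUFFIX}"
    return {"size": size, "n": 1, "prompt": prompt}  # policy: one image per request


def build_order_url(quantity: int, height_in: float = 2, width_in: float = 2) -> str:
    """Fill the order-link template (height and width default to 2 inches, as in the prompt)."""
    query = urlencode({
        "quantity": quantity,
        "heightInches": height_in,
        "widthInches": width_in,
        "product": "die-cut-stickers",
    })
    return f"{ORDER_BASE}?{query}"


print(build_dalle_request("a cheerful cartoon cat holding a paintbrush")["prompt"])
print(build_order_url(50))
# https://www.stickermule.com/products/die-cut-stickers/configure?quantity=50&heightInches=2&widthInches=2&product=die-cut-stickers
```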

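In fact, months before OpenAI released GPTs, many websites offering AI-enhanced services built on ChatGPT already let users create custom personas with a system prompt; they shipped many preset assistants for different roles and hosted communities for sharing custom ones. Compared with those, the official GPTs (that is, AI Agents) have at least two advantages: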

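  1. First, user experience: you refine the system prompt through an interactive chat, and the app icon is generated from the conversation. These two improvements greatly lower the barrier to building a custom AI Agent and raise the experience to the level of a commercial product.
  2. Second, and most importantly, a GPT can interact with other systems' APIs through the actions feature and complete a complex task end to end on its own. This greatly expands ChatGPT's use cases beyond chat and opens up far more room for imagination.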
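
Conceptually, an action turns the model's output into a real function or API call. In an actual GPT, actions are described with an OpenAPI schema and ChatGPT calls the HTTP endpoint itself; the Python sketch below replaces that with a local registry so the idea fits in a few lines. It assumes the model emits a JSON tool call, and get_sticker_price with its pricing rule is entirely made up for illustration.

```python
import json
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_sticker_price(quantity: int, width_in: float, height_in: float) -> float:
    # Hypothetical pricing rule, for illustration only.
    return round(quantity * width_in * height_in * 0.25, 2)


def dispatch(tool_call_json: str) -> str:
    """Execute a model-emitted call like {"name": ..., "arguments": {...}} and return JSON."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    return json.dumps({"result": fn(**call["arguments"])})


model_output = '{"name": "get_sticker_price", "arguments": {"quantity": 50, "width_in": 2, "height_in": 2}}'
print(dispatch(model_output))  # {"result": 50.0}
```

The JSON result would then be handed back to the model as context for its next reply; that cycle of model output, tool execution and result is what lets a GPT carry a task through instead of only chatting.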

AI Agents: the next AI hotspot after ChatGPT

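An AI Agent is an intelligent entity that can perceive its environment, make decisions and take actions. The difference from a plain large model is that humans interact with a large model through prompts, so how clear and specific the user's prompt is directly affects the quality of the answer. An AI Agent, by contrast, only needs to be given a goal; it can then reason about that goal independently and act on it.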

The core driver of an AI Agent is a large model, on top of which three key components are added (Planning, Memory and Tool Use) so that it can carry out more complex tasks.
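
A toy skeleton shows how the three components sit around the model. In this hedged Python sketch the planner returns a fixed list of steps and the tools are stand-in lambdas (both hypothetical; a real agent would ask an LLM to plan and to choose tools), while memory is simply a list of (step, result) records.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MiniAgent:
    """Toy agent with the three components: planning, memory and tool use."""
    tools: Dict[str, Callable[[str], str]]
    memory: List[Tuple[str, str]] = field(default_factory=list)

    def plan(self, goal: str) -> List[str]:
        # Planning: decompose the goal into sub-tasks. Hard-coded here;
        # an LLM would normally produce this list from the goal.
        return ["design", "layout", "export"]

    def run(self, goal: str) -> str:
        data = goal
        for step in self.plan(goal):
            data = self.tools[step](data)      # Tool use: each step calls one tool
            self.memory.append((step, data))   # Memory: keep a record of every result
        return data


agent = MiniAgent(tools={
    "design": lambda goal: f"sketch({goal})",
    "layout": lambda art: f"a4_grid({art})",
    "export": lambda sheet: f"png({sheet})",
})
print(agent.run("cat stickers"))  # png(a4_grid(sketch(cat stickers)))
print(len(agent.memory))          # 3
```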


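First, complex tasks can rarely be completed in one step, so a "planning" component handles task decomposition, breaking the overall task into subtasks. While executing, the agent also relies on reasoning frameworks to critique and reflect on the actions it has already taken, learn from its mistakes and refine its next steps, which improves the quality of the final result. One common framework is ReAct, which iterates a "Thought, Action, Observation" loop: the LLM speaks its "inner monologue" out loud and then acts on it. Making the reasoning explicit in this way improves the accuracy of the LLM's answers.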
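
The Thought, Action, Observation cycle can be sketched in a few lines of Python. To keep the example self-contained, the "LLM" is a scripted stand-in (scripted_llm is hypothetical) and the single tool a4_pixels is made up; the Action: tool[input] and Finish[answer] text format follows the style of the ReAct paper's examples, but the parsing here is a simplification.

```python
import re
from typing import Callable, Dict

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def react(question: str, llm: Callable[[str], str],
          tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Alternate model steps (Thought + Action) with tool Observations until Finish."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                  # the model says its reasoning out loud
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            raise ValueError(f"no Action in model output: {step!r}")
        name, arg = match.groups()
        if name == "Finish":
            return arg
        observation = tools[name](arg) if name in tools else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"   # fed back on the next step
    raise RuntimeError("step limit reached without Finish")


def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model: acts first, then answers from the last Observation."""
    if "Observation:" not in transcript:
        return "Thought: I need the pixel size of A4 at 300 dpi.\nAction: a4_pixels[300]"
    last_obs = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: The tool gave me the size.\nAction: Finish[{last_obs}]"


tools = {"a4_pixels": lambda dpi: f"{round(210 / 25.4 * int(dpi))}x{round(297 / 25.4 * int(dpi))}"}
print(react("How many pixels is an A4 page at 300 dpi?", scripted_llm, tools))  # 2480x3508
```

The max_steps limit matters in practice: a model that never emits Finish would otherwise loop forever.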



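We can go a step further. Once custom AI Agents have been built for specific business scenarios, they can be combined to work together, just as human society organizes people into companies and teams. Like employees playing different roles in a company, agents interact and complement each other's strengths, progressing from completing isolated single tasks to handling all kinds of comprehensive, complex work.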

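A typical example of agents dividing up work is the ChatDev project, which builds an end-to-end, LLM-driven automated software development framework. It splits software development into four main phases (Designing, Coding, Testing and Documenting) and further breaks them down into a Chat Chain of atomic tasks. The whole chain can be viewed as a "software production line" of atomic tasks: in each subtask, agents in specialized roles (such as Chief Product Officer, Python programmer and test engineer) exchange information and make decisions through dialogue, driving the full software engineering process, including requirements analysis, brainstorming, system development, integration testing, GUI creation and documentation. In tests on 70 software development tasks, ChatDev took on average less than 7 minutes and less than ¥3 per piece of software. That is like paying for a glass of cola and having the software finished by the time you have drunk it!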
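
The chat chain idea, where each phase's output becomes the next phase's input, can be illustrated with a small Python sketch. The role pairings per phase and the make_role stub agents are assumptions for illustration only (in ChatDev each role is an LLM-driven agent holding a multi-turn conversation); here each phase is reduced to a single instruction and reply.

```python
from typing import Callable, List, Tuple

Agent = Callable[[str], str]
Phase = Tuple[str, Agent, Agent]  # (phase name, instructor, assistant)


def chat_chain(task: str, phases: List[Phase]) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Run the phases in order, passing each phase's artifact to the next one."""
    artifact = task
    log = []
    for name, instructor, assistant in phases:
        instruction = instructor(artifact)    # e.g. a product officer states requirements
        artifact = assistant(instruction)     # e.g. a programmer returns the next artifact
        log.append((name, instruction, artifact))
    return artifact, log


def make_role(role: str) -> Agent:
    """Hypothetical stub agent: wraps its input so the data flow is visible."""
    return lambda text: f"{role}({text})"


phases = [
    ("Designing", make_role("CEO"), make_role("CPO")),
    ("Coding", make_role("CTO"), make_role("Programmer")),
    ("Testing", make_role("Tester"), make_role("Programmer")),
    ("Documenting", make_role("CEO"), make_role("CPO")),
]
final, log = chat_chain("sticker app", phases)
for name, _, artifact in log:
    print(f"{name}: {artifact}")
```

Replacing the stubs with LLM-backed agents and letting each pair keep talking until they agree would bring this closer to ChatDev's multi-turn design.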





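In my view, going from ChatGPT to GPT-4 was an improvement in depth: the increase in parameter count brought a breakthrough in model capability. As a text generation model, GPT gives people a brand-new way to interact with machines, and as the parameter count grows, the model captures more knowledge and semantic associations and generates richer, more creative text. Going from GPT-4 to GPT-4V was an improvement in breadth: AI is no longer limited to conversation and single tasks; it can understand and create images, reaching wider boundaries much as humans do when learning new things or solving varied problems. Next, GPTs and AI Agents improve along depth and breadth at the same time. With the abilities to plan, remember, reflect and use tools, they take another step toward artificial general intelligence, and Sticker Whiz is one small early prototype of that.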

