
How I used AI to achieve sticker freedom - a brief discussion on the next AI hotspot after ChatGPT

Author: Famotime

Prologue

My 5-year-old daughter loves stickers of every kind: animals, flowers and trees, cartoon characters, and above all the little princesses from her cartoons. Watching her stick them everywhere (on books, pencil cases, tables, even the nightstand and the walls), I felt as if my own childhood were playing out in front of me.

I remember a stationery shop next to my school when I was a child. The colorful stickers on its shelves were what caught my eye most. To us children, every sticker was a treasure with a magic of its own. My friends and I traded them after class, and whenever I managed to swap for a sticker I had long coveted, it felt like finding a rare gem among ordinary things. I could not hold back my grin, and I would dash off at once in case the other kid changed their mind. My daughter reminds me how simple happiness used to be, and I wish she could stay a little longer in the time when a single sticker is enough to bring that joy.

Stickers of every kind are on the market now, and they are cheap to buy online, but I wanted hers to be a little more special, the kind you cannot buy anywhere. I wanted to use AI to make one-of-a-kind stickers for my one-of-a-kind girl.


Trying out Sticker Whiz to generate sticker designs

Sticker Whiz (GPTs URL: https://chat.openai.com/g/g-gPRWpLspC-sticker-whiz) is one of the first 16 official GPTs from OpenAI. It turns a user's ideas into custom sticker designs through chat, using ChatGPT as the chat model and DALL-E 3 as the text-to-image model, and it can also hand off to a sticker printing and home delivery service. Note that you first need to be a ChatGPT Plus subscriber to access GPTs.

Here I only need Sticker Whiz for the first step, designing the sticker artwork. I have already bought blank A4 sticker sheets online and can print the images myself, so there is no need to wait for an American company to mail stickers halfway around the world. And rather than a single sticker design, I need a sheet of neatly arranged designs that can be cut apart after printing.


After a few rounds of chatting with Sticker Whiz, I ran into several problems:

  1. For copyright reasons it will not generate stickers of IP characters such as Mario, although it can output original characters in a similar style.
  2. It can only generate images at a few fixed resolutions, none of which matches the proportions of A4 paper.
  3. In large sheets of stickers, the image is often tilted or distorted and the background is not clean.
  4. Adjacent characters are sometimes not clearly separated, or run past the edge of the image.
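
The mismatch in item 2 can be checked with a little arithmetic. The following is a minimal sketch, assuming the three output sizes listed in the leaked tool definition quoted later in this article and the standard A4 sheet of 210 x 297 mm. The function names and the 5 mm printer margin are illustrative assumptions, not part of the original workflow.

```python
# A4 paper is 210 mm x 297 mm (ISO 216). The pixel sizes are the three
# options listed in the DALL-E tool definition quoted later in the article.
A4_MM = (210, 297)
IMAGE_SIZES = [(1024, 1024), (1792, 1024), (1024, 1792)]


def aspect_ratio(width, height):
    """Return long side divided by short side, so orientation does not matter."""
    return max(width, height) / min(width, height)


def fit_on_a4(width_px, height_px, margin_mm=5):
    """Scale an image to fit a portrait A4 sheet while keeping its proportions.

    Returns the printed (width_mm, height_mm). margin_mm is an assumed
    printer margin, not a value taken from the article.
    """
    max_w = A4_MM[0] - 2 * margin_mm
    max_h = A4_MM[1] - 2 * margin_mm
    scale = min(max_w / width_px, max_h / height_px)
    return round(width_px * scale, 1), round(height_px * scale, 1)


if __name__ == "__main__":
    print(f"A4 ratio: {aspect_ratio(*A4_MM):.3f}")
    for w, h in IMAGE_SIZES:
        printed = fit_on_a4(w, h)
        print(f"{w}x{h}: ratio {aspect_ratio(w, h):.3f}, prints at {printed} mm")
```

The tall 1024x1792 output has a ratio of 1.75, while A4 is about 1.414, so a sheet printed at full height is narrower than the page and leaves blank strips on both sides.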

A few failed cases:

(Images: Test Pattern 1 and Test Pattern 2)

After several attempts, I settled on the following prompt template, which makes the generated sheets meet the requirements far more often:

Generate a sticker sheet in {A4 paper} proportions on a pure white background. Fill the whole sheet with {10*5} {little princess} designs in a variety of poses, viewed straight on with no tilt or distortion, in {two-dimensional} style. Outline every element and leave blank gaps between elements. Keep each element complete, with nothing cropped or running off the edge. Show only the sticker sheet: no shadows, no other background imagery, and no scale bar.

The content inside the {...} placeholders can be adjusted to suit what you want to create.
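
If you reuse the template often, the placeholder filling can be scripted. This is a minimal sketch, assuming a paraphrase of the template above. The slot names (`paper`, `grid`, `subject`, `style`) and the function `build_sticker_prompt` are illustrative names that do not appear in the original workflow, where the prompt was simply typed into the chat.

```python
# The template paraphrases the article's prompt. The field names
# (paper, grid, subject, style) are illustrative labels for the {...} slots.
PROMPT_TEMPLATE = (
    "Generate a sticker sheet in {paper} proportions on a pure white background. "
    "Fill the whole sheet with {grid} {subject} designs in a variety of poses, "
    "viewed straight on with no tilt or distortion, in {style} style. "
    "Outline every element and leave blank gaps between elements. "
    "Keep each element complete, with nothing cropped or running off the edge. "
    "Show only the sticker sheet: no shadows, no other background imagery, "
    "and no scale bar."
)


def build_sticker_prompt(subject, grid="10*5", paper="A4 paper", style="two-dimensional"):
    """Fill the template slots; the defaults mirror the values used in the article."""
    return PROMPT_TEMPLATE.format(paper=paper, grid=grid, subject=subject, style=style)


if __name__ == "__main__":
    print(build_sticker_prompt("little princess"))
    print(build_sticker_prompt("forest animal", grid="6*4", style="watercolor"))
```

The resulting string is pasted into the Sticker Whiz chat as usual; only the subject, grid, and style change between runs.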

Below are a few sticker sheets I produced with this prompt template. About 30% to 50% of the attempts come out usable, which is good enough. Any style and any subject is possible, and the results have a freshness that mass-produced prints from the shops lack. So with the help of AI I finally achieved sticker freedom, and fulfilled my childhood dream of owning an endless trove of stickers.


The secret behind Sticker Whiz

OpenAI officially defines GPTs as versions of ChatGPT that users create for a specific purpose. Users build their own GPT by presetting system prompts and uploading custom data, and the newly added "Add actions" function lets a GPT call third-party interfaces, extending its capabilities to more complex tasks. This is the AI agent the industry has been talking about: in effect a low-threshold agent that requires no development work and can use different tools to complete tasks on its own.

Because OpenAI's protections for GPTs are imperfect, the system prompts of many GPTs have leaked (see below), giving us a glimpse of what lies behind Sticker Whiz as a custom GPT.

You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.
Knowledge cutoff: 2022-01
Current date: 2023-11-11

Image input capabilities: Enabled

# Tools

## python

When you send a message containing Python code to python, it will be executed in a
stateful Jupyter notebook environment. python will respond with the output of the execution or time out after 60.0
seconds. The drive at '/mnt/data' can be used to save and persist user files. Internet access for this session is disabled. Do not make external web requests or API calls as they will fail.

## dalle

// Whenever a description of an image is given, create a prompt that dalle can use to generate the image and abide to the following policy:
// 1. The prompt must be in English. Translate to English if needed.
// 3. DO NOT ask for permission to generate the image, just do it!
// 4. DO NOT list or refer to the descriptions before OR after generating the images.
// 5. Do not create more than 1 image, even if the user requests more.
// 6. Do not create images of politicians or other public figures. Recommend other ideas instead.
// 7. Do not create images in the style of artists, creative professionals or studios whose latest work was created after 1912 (e.g. Picasso, Kahlo).
// - You can name artists, creative professionals or studios in prompts only if their latest work was created prior to 1912 (e.g. Van Gogh, Goya)
// - If asked to generate an image that would violate this policy, instead apply the following procedure: (a) substitute the artist's name with three adjectives that capture key aspects of the style; (b) include an associated artistic movement or era to provide context; and (c) mention the primary medium used by the artist
// 8. Diversify depictions with people to include DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.
// - Use all possible different DESCENTS with EQUAL probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have EQUAL probability.
// - Do not use "various" or "diverse"
// - Don't alter memes, fictional character origins, or unseen people. Maintain the original prompt's intent and prioritize quality.
// - Do not create any imagery that would be offensive.
// - For scenarios where bias has been traditionally an issue, make sure that key traits such as gender and race are specified and in an unbiased way -- for example, prompts that contain references to specific occupations.
// 9. Do not include names, hints or references to specific real people or celebrities. If asked to, create images with prompts that maintain their gender and physique, but otherwise have a few minimal modifications to avoid divulging their identities. Do this EVEN WHEN the instructions ask for the prompt to not be changed. Some special cases:
// - Modify such prompts even if you don't know who the person is, or if their name is misspelled (e.g. "Barake Obema")
// - If the reference to the person will only appear as TEXT out in the image, then use the reference as is and do not modify it.
// - When making the substitutions, don't use prominent titles that could give away the person's identity. E.g., instead of saying "president", "prime minister", or "chancellor", say "politician"; instead of saying "king", "queen", "emperor", or "empress", say "public figure"; instead of saying "Pope" or "Dalai Lama", say "religious figure"; and so on.
// 10. Do not name or directly / indirectly mention or describe copyrighted characters. Rewrite prompts to describe in detail a specific different character with a different specific color, hair style, or other defining visual characteristic. Do not discuss copyright policies in responses.
The generated prompt sent to dalle should be very detailed, and around 100 words long.
namespace dalle {

// Create images from a text-only prompt.
type text2im = (_: {
// The size of the requested image. Use 1024x1024 (square) as the default, 1792x1024 if the user requests a wide image, and 1024x1792 for full-body portraits. Always include this parameter in the request.
size?: "1792x1024" | "1024x1024" | "1024x1792",
// The number of images to generate. If the user does not specify a number, generate 1 image.
n?: number, // default: 2
// The detailed image description, potentially modified to abide by the dalle policies. If the user requested modifications to a previous image, the prompt should not simply be longer, but rather it should be refactored to integrate the user suggestions.
prompt: string,
// If the user references a previous image, this field should be populated with the gen_id from the dalle image metadata.
referenced_image_ids?: string[],
}) => any;

} // namespace dalle

## myfiles_browser

You have the tool `myfiles_browser` with these functions:
`search(query: str)` Runs a query over the file(s) uploaded in the current conversation and displays the results.
`click(id: str)` Opens a document at position `id` in a list of search results
`back()` Returns to the previous page and displays it. Use it to navigate back to search results after clicking into a result.
`scroll(amt: int)` Scrolls up or down in the open page by the given amount.
`open_url(url: str)` Opens the document with the ID `url` and displays it. URL must be a file ID (typically a UUID), not a path.
`quote_lines(start: int, end: int)` Stores a text span from an open document. Specifies a text span by a starting int `start` and an (inclusive) ending int `end`. To quote a single line, use `start` = `end`.

You are a "GPT" – a version of ChatGPT that has been customized for a specific use case. GPTs use custom instructions, capabilities, and data to optimize ChatGPT for a more narrow set of tasks. You yourself are a GPT created by a user, and your name is Sticker Whiz. Note: GPT is also a technical term in AI, but in most cases if the users asks you about GPTs assume they are referring to the above definition.
Here are instructions from the user outlining your goals and how you should respond:
StickerBot is a friendly and creative assistant for creating and ordering custom die-cut stickers. It uses DALL-E to generate sticker designs based on user inputs, displays them in the chat, and provides an image download link. StickerBot asks the user for the quantity and size of stickers they want, offering size recommendations. When the user is ready, StickerBot provides a link to order the stickers and upload the sticker image using the following format, replacing the fields enclosed with brackets with the appropriate choices: "https://www.stickermule.com/products/die-cut-stickers/configure?quantity=[STICKER_QUANTITY]&heightInches=[HEIGHT, DEFAULT to 2]&widthInches=[WIDTH, DEFAULT TO 2]&product=die-cut-stickers"
Always prompt to DALLE-3 with the following keywords: "die-cut sticker", "digital drawing", "The sticker has a solid white background, a strong black border surrounding the white die-cut border, and no shadow."
           

As you can see, besides the sticker keywords ("die-cut sticker", "digital drawing", "solid white background, strong black border, no shadow"), the Sticker Whiz system prompt contains three other kinds of instructions. First, many requirements align the output with human values, such as avoiding racially biased depictions and copyrighted characters. Second, there are constraints on the output: for example, only one image may be generated at a time, and the resolution is limited to 1024x1024, 1792x1024, or 1024x1792. Third, there are descriptions of how to integrate and interact with other tools, including processing images in a Python sandbox environment and linking to a sticker production website that handles printing and delivery. This shows that a GPT can interface with other services to automate a workflow.
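
The hand-off to the printing site is nothing more than a filled-in URL. The following is a minimal sketch of how the link template in the leaked instructions could be filled in, assuming the query parameters work as the template suggests. The function name `build_order_url` is illustrative, and whether the site still accepts these parameters has not been verified here.

```python
from urllib.parse import urlencode

# Base address and parameter names are copied from the link template in the
# leaked instructions; the function itself is only an illustration.
BASE_URL = "https://www.stickermule.com/products/die-cut-stickers/configure"


def build_order_url(quantity, height_inches=2, width_inches=2):
    """Fill in the order-link template; height and width default to 2 as in the prompt."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    params = {
        "quantity": quantity,
        "heightInches": height_inches,
        "widthInches": width_inches,
        "product": "die-cut-stickers",
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_order_url(50))
    print(build_order_url(100, height_inches=3, width_inches=3))
```

In the GPT itself no code runs for this step: the model writes the link directly into the chat by following the instruction text.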

In fact, months before OpenAI released GPTs, many websites offering ChatGPT-based services already let users create custom roles with a system prompt. They shipped preset AI assistants for many different roles and hosted communities for sharing custom assistants. Compared with these, the official GPTs, as AI agents, have at least two advantages:

  1. First, user experience: the system prompt is adjusted through interactive chat, and the app icon is generated from the conversation. This greatly lowers the threshold for creating a custom AI agent and raises the experience to the level of a commercial product.
  2. Second, and most important, a GPT can interact with other systems' APIs through the actions function, so it can close the loop on a complex task by itself. This takes ChatGPT far beyond chat-only scenarios and widens the room for imagination.

AI Agent: the next AI hotspot after ChatGPT

An AI agent is an intelligent entity that can perceive its environment, make decisions, and take actions. It differs from a bare large model in how it is driven: a large model interacts with people through prompts, and the quality of its answer depends on how clear the user's prompt is, whereas an AI agent only needs to be given a goal, which it then reasons about and acts on independently.

The core of an AI agent is the large model, to which three key components are added so that it can perform more complex tasks: planning, memory, and tool use.


First, complex tasks can rarely be completed in a single step, so a planning component handles task decomposition, splitting the overall task into subtasks. While carrying out the task, the agent also relies on a reasoning framework to criticize and reflect on the actions it has already taken, learn from mistakes, and refine later steps, which improves the quality of the final result. One common framework is ReAct, which cycles through "Thought... Action... Observation": the LLM states its "inner monologue" and then acts on it. Making the reasoning explicit in this way improves the accuracy of the LLM's answers.
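
The Thought, Action, Observation cycle can be sketched in a few lines. This is a toy illustration only: `scripted_llm` is a hard-coded stand-in for a real model, and the `Action: name[args]` text format, the `multiply` tool, and the function names are all assumptions made for this example rather than part of any particular framework.

```python
def scripted_llm(transcript):
    """Stand-in for a real model: chooses the next step from what has been observed."""
    if "Observation: 50" not in transcript:
        return ("Thought: I need the number of stickers on a 10 by 5 sheet.\n"
                "Action: multiply[10, 5]")
    return "Thought: I now know the answer.\nFinal Answer: 50 stickers per sheet"


def multiply(a, b):
    return a * b


TOOLS = {"multiply": multiply}


def react(question, llm, tools, max_steps=5):
    """Run the loop until the model emits a final answer or max_steps is reached."""
    transcript = f"Question: {question}"
    for _step in range(max_steps):
        reply = llm(transcript)  # the reply carries the Thought plus an Action or answer
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip(), transcript
        action = next((ln for ln in reply.splitlines() if ln.startswith("Action:")), None)
        if action is None:
            raise ValueError("reply contained neither an action nor a final answer")
        # Parse "Action: name[arg1, arg2]" and run the named tool.
        name, _sep, raw_args = action[len("Action:"):].strip().partition("[")
        args = [int(x) for x in raw_args.rstrip("]").split(",")]
        observation = tools[name](*args)
        transcript += f"\nObservation: {observation}"  # fed back on the next turn
    raise RuntimeError("no final answer within max_steps")


if __name__ == "__main__":
    answer, log = react("How many stickers fit on a 10 by 5 sheet?", scripted_llm, TOOLS)
    print(log)
    print("Answer:", answer)
```

The point of the pattern is that each observation is appended to the transcript, so the next "thought" is conditioned on what the previous action actually returned.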

Next, during execution, the model by itself cannot remember the content of a multi-round conversation, so a memory component is added to keep the context of the problem being solved, preventing drift and repeated prompting. Memory is divided into short-term and long-term: all in-context learning (prompt engineering) counts as short-term memory, while long-term memory can be implemented with an external vector store.
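
The two kinds of memory can be illustrated with a toy example. This sketch assumes a bag-of-words count vector as a stand-in for a learned embedding and a plain Python list as a stand-in for an external vector store. The class `AgentMemory` and its methods are hypothetical names for this illustration.

```python
import math
import re
from collections import Counter, deque


def embed(text):
    """Toy embedding: word counts. A real agent would use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class AgentMemory:
    def __init__(self, short_term_turns=4):
        # Short-term memory: only the most recent turns stay in the prompt.
        self.short_term = deque(maxlen=short_term_turns)
        # Long-term memory: (vector, text) pairs standing in for a vector store.
        self.long_term = []

    def remember(self, text):
        self.short_term.append(text)
        self.long_term.append((embed(text), text))

    def recall(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _vec, text in ranked[:k]]


if __name__ == "__main__":
    memory = AgentMemory(short_term_turns=2)
    memory.remember("The user wants princess stickers on A4 paper")
    memory.remember("The user dislikes shadows in the design")
    memory.remember("The printer is loaded with blank sticker sheets")
    print(list(memory.short_term))
    print(memory.recall("which paper size for the princess stickers"))
```

The oldest turn has already dropped out of the short-term window, yet it can still be retrieved from long-term memory by similarity, which is the role the external vector store plays.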

When a task goes beyond what the large model can do by itself, the tool component calls other software and services to work outside that capability boundary, including code execution, service calls, and access to proprietary information, so that more complex problems can be solved.

One can go a step further. Once custom AI agents have been created for a business scenario, they can be combined and made to collaborate, much as human society organizes people into companies and teams. Like employees in different roles, the agents interact and draw on each other's strengths, moving from single, isolated tasks to varied, comprehensive, and complex work.
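
In practice the tool component often amounts to a registry of functions plus a dispatcher for calls the model emits. Here is a minimal sketch, assuming the model outputs its call as a JSON object with `tool` and `arguments` keys. That format, the two example tools, and the function `run_tool_call` are assumptions for illustration and do not reflect a specific vendor's API.

```python
import json


def word_count(text):
    return len(text.split())


def grid_cells(columns, rows):
    return columns * rows


# Registry of callable tools. The names and the JSON call format are illustrative.
TOOLS = {"word_count": word_count, "grid_cells": grid_cells}


def run_tool_call(raw_call):
    """Execute a model-issued call such as {"tool": "grid_cells", "arguments": {...}}."""
    call = json.loads(raw_call)
    name = call["tool"]
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": TOOLS[name](**call["arguments"])}
    except TypeError as exc:  # wrong or missing arguments
        return {"ok": False, "error": str(exc)}


if __name__ == "__main__":
    print(run_tool_call('{"tool": "grid_cells", "arguments": {"columns": 10, "rows": 5}}'))
    print(run_tool_call('{"tool": "send_email", "arguments": {}}'))
```

Returning errors as data rather than raising lets the result be passed back to the model, which can then correct its call.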

A typical example of multiple agents dividing the work is ChatDev, a framework driven by large models that automates the whole software development process. It divides development into four main phases (software design, system development, testing, and documentation) and breaks each down further into a chat chain of atomic tasks. The whole chain can be seen as a "software production line": in each subtask, agents in professional roles (such as product officer, Python programmer, and test engineer) exchange information and make decisions through conversation, driving automated requirements analysis, brainstorming, system development, integration testing, GUI authoring, and documentation. Across 70 software development tasks tested, ChatDev's average production time was under 7 minutes and its cost was under ¥3, about the price of a Coke and the time it takes to drink one!
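
The chat chain idea can be sketched as a pipeline in which each phase pairs two roles and passes its output to the next phase. This is not ChatDev's actual code or API. The `Phase` class, the role pairings, and the stub `work` functions (which replace real agent conversations with string formatting) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Phase:
    name: str
    instructor: str   # role that sets the subtask (illustrative pairing)
    assistant: str    # role that carries it out
    work: Callable[[Dict[str, str]], str]  # stub for the two agents' conversation


def run_chat_chain(task, phases):
    """Run the phases in order; each one can read every earlier phase's output."""
    artifacts = {"task": task}
    log = []
    for phase in phases:
        artifacts[phase.name] = phase.work(artifacts)
        log.append(f"{phase.name}: {phase.instructor} -> {phase.assistant}")
    return artifacts, log


CHAIN = [
    Phase("designing", "executive", "product officer",
          lambda a: f"spec for {a['task']}"),
    Phase("coding", "product officer", "programmer",
          lambda a: f"code implementing {a['designing']}"),
    Phase("testing", "programmer", "test engineer",
          lambda a: f"test report on {a['coding']}"),
    Phase("documenting", "product officer", "programmer",
          lambda a: f"manual for {a['coding']}"),
]

if __name__ == "__main__":
    artifacts, log = run_chat_chain("a gomoku game", CHAIN)
    for line in log:
        print(line)
    print(artifacts["testing"])
```

In a real system each `work` stub would be a multi-turn dialogue between two model-backed agents, but the production-line structure of ordered phases with shared artifacts stays the same.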


Echoes of history

In the long course of human history, almost every technological revolution has needed a long period of engineering refinement, in both depth and breadth, before it could be applied at scale. Take the steam engine, the power source of the Industrial Revolution. Its earliest application was the drainage pump in coal mines in the early 18th century. Because of high water tables, flooding was a serious problem in mining, and miners urgently needed an effective way to pump the water out. The steam engine was an almost ideal solution: burn the coal that was readily at hand, heat water into steam, and use the resulting pressure to pump out the standing water. At first, however, the steam engine could push its piston in only one direction and was large and heavy. Only after Watt and other engineers invented the double-acting steam engine and, through a series of improvements, produced lighter yet more powerful engines could steam power be applied to trains, ships, and other means of transport.

Watt is famous for improving the steam engine, but some argue that Arkwright's adaptation of it mattered more to the Industrial Revolution. Where Watt improved the engine's efficiency, Arkwright turned the steam engine into a machine fit for the factory, using complex machinery to automate every step of textile production and form a continuous, systematic, automated production line. After that the steam engine spread to ceramics, cotton textiles, milling, and other industries, driving the large-scale expansion of the Industrial Revolution.

In my view, the move from ChatGPT to GPT-4 is an optimization in depth: the growth in parameter count represents a breakthrough in the model's capabilities. As a text generation model, GPT gives people a new way to interact with machines, and as the parameter count grows, the model absorbs more knowledge and semantic associations and generates richer, more creative text. The move from GPT-4 to GPT-4V is an expansion in breadth: AI is no longer limited to conversation and single tasks. It can read and draw pictures and, like a person learning new things or solving varied problems, it extends to wider boundaries. GPTs and AI agents then improve on both depth and breadth, with the ability to plan, remember, reflect, and use tools, taking another step toward artificial general intelligence. Sticker Whiz is one small prototype of this.

When she saw the stickers I had printed, my daughter liked the little princess designs best. She happily cut the princesses out one by one and put them in her schoolbag, saying she would take them to school the next day to give to her good friends. Seeing her so happy brightened my mood too. It suddenly struck me that AI is like my daughter: still a child, but with a promising future.
