Midjourney's New Model V6 Beta Hands-On Review - The Semantic Revolution Has Finally Arrived

Somewhat unexpectedly, Midjourney caught everyone off guard: at 13:56 Beijing time, around 10 o'clock at night in the US, it released the new V6 Beta model.

Presumably they wanted to ship it before heading home for Christmas, and then let everyone spend the holiday testing the model over and over for them as unpaid labor.

This release really did catch me a little off guard.

You can now run the /settings command and find the V6 Beta version in the model selection box.

In Midjourney's own words: "This is an alpha test. Things will change frequently and without notice."

The announcement also says that speed, image quality, coherence, prompt following, and text accuracy will improve in the coming weeks.

My guess is that the model is around 85% complete, and there is a lot of alignment left to be done.

According to the official announcement, the current V6 Beta update includes:

1. More accurate prompt following, and support for longer prompts.

2. Improved coherence and model knowledge.

3. Improved image prompting and remix.

4. Minor text-drawing ability (you must write the text in "quotes", and --style raw or a lower --stylize value may help).

5. Improved upscalers with 'subtle' and 'creative' modes (2x higher resolution).

That is a bit too official, so let me translate it into plain language: it accepts longer prompts, its semantic understanding is stronger, the image quality is better, it can embed a little English text in the image, and images can be upscaled further.

I tested it together with hundreds of members of my group chat for more than three hours, and overall the biggest changes come down to two things:

Improved image quality and enhanced semantic understanding.

I'll go through them one at a time.

1. Improvement of image quality

For image quality, I focus on the following dimensions: detail density, material texture, color performance, light and shadow performance, composition performance, and structural authenticity.

Midjourney V6 Beta has been greatly improved in terms of detail density, material texture, light and shadow performance, and structural authenticity. There is a slight improvement in composition and color expression.

Here are the comparison images, which speak for themselves. Some of them come from my friends @ fresh shrimp bun, @猫宅V酱, and @大峰AI绘画.

I won't post too many. The core point is just one: image quality is up, way up.

Just know that there is a huge improvement in detail density, material texture, light and shadow performance, and structural authenticity.

The semantic understanding covered next is the real focus.

2. Strengthening of semantic understanding

I'm going to focus on the changes in semantic understanding.

I have never held back from criticizing Midjourney's semantic understanding, which used to be awful. When making AI artwork in the past, I grimaced in pain countless times and wanted to smash my computer.

But this time, the semantic understanding in Midjourney V6 Beta has finally reached an upper-middle level where it is usable. Note that I chose the word "usable": it still cannot compare with DALL-E 3, which is on another level entirely.

First of all, please forget the Stable Diffusion style of prompt writing. Don't write tags, and don't write junk words like "photorealistic, 4k, 8k"; they do nothing for the new V6 Beta.

Don't write tags, don't write tags, don't write tags. Important things are worth saying three times.

Don't copy and paste your v5.2 prompts directly; please relearn how to write prompts for the new V6.
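As a rough illustration of cleaning up an old tag-style prompt before rewriting it, here is a minimal Python sketch. The function name and the junk-word list are hypothetical; the list is seeded only with the three examples named above, and this is not anything Midjourney itself provides.

```python
# Hypothetical junk-word list, seeded with the examples from the article.
JUNK_WORDS = {"photorealistic", "4k", "8k"}


def strip_junk_words(prompt: str) -> str:
    """Drop comma-separated fragments that are known junk words."""
    kept = []
    for part in prompt.split(","):
        cleaned = part.strip()
        # Skip empty fragments and anything on the junk list (case-insensitive).
        if cleaned and cleaned.lower() not in JUNK_WORDS:
            kept.append(cleaned)
    return ", ".join(kept)


if __name__ == "__main__":
    old_prompt = "a cat on a sofa, photorealistic, 4K, 8k"
    print(strip_junk_words(old_prompt))
```

Removing the junk words is only the first step; what remains still needs to be rewritten as full descriptive sentences.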

Overall, I think V6 rewards describing the details clearly. The structure I recommend is:

Scene + subject + detail + aesthetic style.

For example, my Prompt:

Three different beautiful girlfriends sit on the beach and smile at the camera. In the center is a cheerful blonde Caucasian woman wearing shorts and a red tank top. The friend on the left is a brunette American girl in a bikini and a sheer skirt. The friend on the right is a red-haired British girl in a bikini. The sea can be seen in the background with boats and seagulls flying on the sea. Medium shot shot with the Agfa Vista 200.

It looks complicated, but I basically just write in full sentences, clearly describing the scene and the subject, plus some details of the clothing.
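The "scene + subject + detail + aesthetic style" structure can be sketched as a small Python helper. This is only an illustration of the ordering; the function and parameter names are hypothetical, and the result is just a string to paste into Midjourney.

```python
def build_prompt(scene: str, subject: str, details: list, style: str) -> str:
    """Join the four parts into full sentences, in the recommended order."""
    sentences = []
    for part in [scene, subject, *details, style]:
        text = part.strip()
        if not text:
            continue  # skip parts that were left empty
        if not text.endswith("."):
            text += "."  # make every part a complete sentence
        sentences.append(text)
    return " ".join(sentences)


if __name__ == "__main__":
    print(build_prompt(
        scene="Three friends sit on the beach and smile at the camera",
        subject="In the center is a cheerful blonde woman",
        details=["She wears shorts and a red tank top"],
        style="Medium shot, shot with the Agfa Vista 200",
    ))
```

Keeping the details as a list makes it easy to add one sentence per person or object without losing the overall order.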

Compare this with the v5.2 result, and the difference in semantic understanding is obvious.

As for how to write prompts for the new version, here are a few tips that I find interesting:

1. You can specify any detail.

Don't be stingy with your imagination, let alone your words. Use logical, structured language to specify the details you want: what clothes she is wearing, what material they are made of, what color they are. Write it clearly; clarity matters more than anything else.

For example:

Close-up of Marvel's Black Widow. Black Widow is dressed in Spider-Man's black bodysuit, she holds Captain America's shield in her left hand and Thor's hammer in her right hand, and she is sad. --ar 16:9

2. You can use language to control the composition.

In V6, it is now easy to use plain language to position things relative to each other and to the camera. This was almost unimaginable in V5.

Now we can describe the overall picture in a single phrase, and then fill in the details around it.

For example:

On the table in the living room there are three baskets full of fruit. The middle basket contains strawberries. The basket on the left is filled with oranges. The basket on the right is filled with mangoes. In the background is a blank blue-green wall with round windows.

3. You can add text to your images.

This has long been a staple of DALL-E 3. To add text to your image, put the text you want embedded in "quotation marks", and make sure the scene you describe is one where text naturally fits. For example, if you ask for "shabi" to be written on her nose, you can't blame MJ for failing to write it, right?

For example:

Close-up of Trump at the Apple conference. He held a sign in the crowd that read "SHABI."
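The quoting rule can be sketched in Python as follows. The helper name and its parameters are hypothetical; the only things taken from the official notes quoted earlier are that the text must be in quotes and that --style raw may help, which is not guaranteed.

```python
def text_prompt(description: str, text: str, raw_style: bool = True) -> str:
    """Append quoted text to a description, optionally adding --style raw."""
    if '"' in text:
        # A quote inside the text would break the "quoted text" convention.
        raise ValueError("text must not contain double quotes")
    prompt = f'{description.strip()} "{text}"'
    if raw_style:
        prompt += " --style raw"  # per the release notes, this may help
    return prompt


if __name__ == "__main__":
    print(text_prompt("A man in a crowd holds a sign that reads", "HELLO"))
```

The description should still name a surface where text belongs, such as a sign or a poster, as in the example above.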

Final thoughts

There is no doubt that Midjourney V6 Beta is fun. Its realism and image quality are basically at the peak of current AI image generation, and on top of that come its distinctive aesthetic and an epic improvement in semantic understanding.

It is clear that once the official version of Midjourney V6 launches, it will once again be "far ahead".

Admittedly, there are still plenty of problems. For example, the improvement in styles other than realism does not seem very noticeable.

Many features are also not yet supported.

But the flaws do not outweigh the strengths.

Midjourney V6 is not the last step in AI drawing.

It's another monument in the long history of AI.
