At the recent WAIC 2023 Industry Development Forum, Midjourney founder and CEO David Holz delivered a video presentation. In his speech, Holz expressed his affection for China, revealing that Midjourney's name comes from a translation of the Taoist classic Zhuangzi. He also previewed the capabilities of the next version of Midjourney and discussed his approach to building artificial intelligence products.
Here is a transcript of David Holz's speech: I'm David Holz, founder and CEO of Midjourney. Thank you, Mr. Chen, and the Shanghai Municipal Government for today's invitation. I am honored to participate in this WAIC, and I look forward to attending in person one day.
Q: What is Midjourney's contribution to the AI industry, and what does it mean for artists, designers, and media producers? A: I think one of the most important technologies in the world is the engine. Engines are machines that generate, transmit, and amplify motion. We use engines to build cars, planes, and boats. It is important to think of AI as a new kind of engine. At Midjourney, we're trying to use this engine to build a new kind of vehicle: not a physical vehicle, but a vehicle for our minds and imaginations. Just as a car lets you move through the world, I hope we can create a means of transportation, not for movement, but for imagination. I think before we can create, we first have to imagine: what can we become? Where can we go? What is everything we could be? The tools we are making focus, above all, on amplifying the raw power of the imagination. Broadly speaking, I think this is an opportunity for humanity as a whole to think more effectively.
Q: You mentioned that you visited China when building hardware at Leap Motion. Can you tell us about your connection to China and Shanghai? A: I visited China many times in the past with Leap Motion. Leap Motion's first international office was in Shanghai, and I really liked the environment and style of the city, feeling that the classic and the modern coexist there. It is like enjoying San Francisco, New York, the ancient cities of Europe, and Chinese style all in one place at the same time. It has that ancient historical power alongside a real excitement about the future, and that's really, really cool. My two favorite kinds of books are science fiction and ancient Chinese literature. I think ancient Chinese literature contains some of the most beautiful and profound reflections in human history. The name Midjourney actually comes from a translation of one of my favorite Taoist books, the Zhuangzi. I love the name. I like the "Middle Way" translation because I think it's sometimes easy to forget the past, and easy to feel lost and uncertain about the future. But more than that, I feel like we're actually on a journey: we're coming from this rich and beautiful past, and before us is this crazy, unimaginable, precious future.
Q: Congratulations to Midjourney on launching V5.2! Can you tell us more about the latest features of Midjourney and plans for future releases? A: We recently released version 5.2 of Midjourney, and we are working on 5.3 ahead of the next major release. The latest feature we've introduced is image expansion, guided by text prompts: when you zoom out, you can create different stories around the central theme. This week we released a similar feature, Pan, which lets users pan the camera. You can keep changing the prompt as you move the camera sideways, and tell a story that way. We've also released /weird, a feature that gives you more control over your images; you can combine it with the /style feature. The name is a bit confusing, but the idea is that you need to be able to tell the AI how beautiful you want something to be, and how much risk you're willing to take to make that beauty unconventional, messy, and weird. This lets people control the balance between risk and randomness, as well as how much attention is paid to the traditional aesthetics of the image. We also introduced something we call Turbo Mode. Turbo Mode means we push the GPUs as hard as possible to make image generation very fast; I think it's four or five times faster. You're actually using 64 A100 graphics cards at the same time, which is far more compute than an ordinary computer. It's a bit crazy, but we're also working on even crazier techniques. While some features will take a long time to implement, we think that over time Midjourney will evolve to create not only 2D images, but also 3D images, moving images, and direct interaction with pixels, with everything constantly flowing and changing depending on the graphic style. Maybe in the future, people could have a huge AI processor, and all these different worlds and dreams could interact with our minds.
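For readers unfamiliar with the product, the weirdness and stylization controls Holz describes are exposed in Midjourney's Discord interface as prompt parameters appended to an /imagine command; the prompt text and values below are purely illustrative:

```text
/imagine prompt: a lighthouse at dusk --weird 500 --stylize 250
```

Here --weird (accepted range 0-3000) raises the risk and unconventionality of the result, while --stylize (accepted range 0-1000) controls how strongly Midjourney's default aesthetic is applied, matching the risk-versus-traditional-beauty trade-off described in the answer above.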
Q: Thanks to the advent of generative and diffusion models, there seems to have been a significant leap in the capabilities of AI. How would you assess progress in these areas so far? And in other areas of artificial intelligence?
A: The discovery of diffusion models, transformer models, and CLIP models is really what got me into the image space. That was about two years ago, and we were discussing it in San Francisco before any service had come out. The diffusion model in particular felt very different when it came out, especially compared to the most advanced GAN models of the past, which were what everyone had used to generate images. I just remember everyone immediately nodding in an unusual way and saying that the diffusion model was really different. It felt real; it felt like something I had to be involved in, to try to give it a more user-friendly interface. As for the future, it's hard to know exactly what the technology will look like. We now sometimes talk about how language models might evolve toward diffusion models, that is, maybe we will use diffusion models to generate text. Or image models may become more like language models, or they may become hybrid models. It's really hard to say. I think we're just getting started in this area, but I'm 100 percent sure there's a lot of progress still to come; 10x or even 100x progress is quite possible. Progress at this level is not just raw performance, but also the user interfaces and products that let us use these raw technologies, individually or together, to make really cool things, get better, and solve problems.
Q: How can we use AI in a more human way? What does Midjourney say about this? A: Douglas Engelbart was actually the first to create a text editor. At that time, punched cards were used: you punched holes in cards to program the computer. But then Engelbart thought about it and asked, what if we programmed on the computer itself? It sounded crazy at the time. The idea was that by programming on a computer, you could speed up the feedback loop, letting us operate more efficiently, making the computer better, and amplifying everything. That idea worked. And while we have these different cultures, like artificial intelligence, HCI (human-computer interaction), and intelligent applications, I think most of the progress in technology so far has come from trying to make people more effective and to empower people. In fact, we have not really seen the so-called AGI era arrive, for example, independent AI that operates without user interaction to solve problems. I think if we focus too much on that direction, we may miss a lot of opportunities in technology. I think a lot, not just about what AI can do, but about how to create flow and bonds between different things, because a tool shouldn't feel like a person. It should feel like an extension of yourself, your body, your mind. I've thought a lot about how to build these technologies: the feeling should not be that you're working with another artist, but that you're almost just imagining something and then it appears on the screen. Many people describe the feeling Midjourney gives them this way, saying it feels almost like part of their thoughts. I think that's how a lot of AI should be; it should feel like an extension of us. So I would like to thank Mr. Chen and all the audience again. It is a pleasure to participate in this event, and I hope to participate again next time.
I look forward to more cooperation with China. I remember all my good experiences in China, and I hope you enjoyed this exchange as well. Thank you.