This summer, text-to-image AIs have captured the imagination of architects. The software is a powerful tool, but one that should be integrated into ongoing discussions of architectural image making, technology, representation, bias, education, and labor. AN gathered Kory Bieg, Shelby Doyle, and Andrew Kudless to discuss these issues.
The Architect’s Newspaper: To start, could you share how you’ve been using Midjourney and related AI platforms so far? What kinds of explorations have you done? What types of images have you been making?
Andrew Kudless (AK): I’ve primarily been using Midjourney. So far, it’s been for open exploration. I’m trying to understand how to communicate with AI. You can write a text prompt and hope to get something related to it, but the AI doesn’t interpret language the way a human would; it’s almost like a dialect. So I’m trying to understand: What are the parameters? How might this be useful to me as an instructor or a designer?
It might be useful for exploring an early design concept without having to spend the time modeling and drawing everything before your ideas are fully formed. For the most part, I think a lot of clients might prefer to see these exploratory AI sketches, since a drawing is harder for them to understand. The AI is useful in that you can communicate a project’s mood and aspiration very quickly.
Kory Bieg (KB): I’ve been using an approach similar to Andrew’s with a number of the AI engines. I’ve been acting a little bit like an archaeologist, finding things in the images and then trying to figure out how they came to be. You ultimately uncover much more during the process.
I’m currently working on a camouflage series which uses a few terms based on patterns to see what happens with geometry. The output begins with what you’d imagine a camouflage building to look like—colors from army fatigues and even parts of buildings that resemble tanks. It’s clearly taking imagery from other objects that are not architecture, even though I’m asking for a building made of camouflage. But then as you go deeper and pursue the prompt further through iterations, variations, and upscaling, the images that began as camouflage start to replace pattern with form. The greens and tans become blob-shaped objects which then, after more iterations, become natural materials, like plants and stone.
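Midjourney itself runs through Discord and has no public API, so the iterate-and-vary loop Bieg describes can only be approximated here; the sketch below uses the open-source Stable Diffusion weights and Hugging Face’s diffusers library as a stand-in, and the prompt wording, model checkpoint, and strength value are illustrative assumptions, not his actual settings.

```python
# A minimal sketch of a text-to-image pass followed by an image-to-image
# "variation" pass, assuming the diffusers library and a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

prompt = "a building made of camouflage, architectural photograph"  # illustrative

# First pass: generate a starting image from the text prompt.
txt2img = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
seed_image = txt2img(prompt, num_inference_steps=30).images[0]

# Second pass: feed the chosen image back in and let the model drift from it.
# Lower strength stays closer to the source; higher strength departs further.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
variation = img2img(prompt=prompt, image=seed_image, strength=0.6).images[0]
variation.save("camouflage_variation.png")
```

Repeating the second pass, and editing the prompt between passes, is roughly analogous to the iterations, variations, and upscaling Bieg describes inside Midjourney.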
One of my earlier series used the shape of letters to influence form. Letters have clear formal features that give them definition. I did a whole alphabet, A through Z, using basic text and tried to generate buildings with the different shape features of each letter. I finally strung a few letters together, looking for angular versus curvilinear features that combine into more complex forms.
Shelby Doyle (SD): I came to AI engines critical of, rather than excited about, what was showing up on my Instagram feed. I wanted to see what would happen using Midjourney and if it would justify my concerns about these tools. I started by submitting prompts like “imagine / feminist architecture,” and it gave me back images that were pink and swoopy and curvy. Or I prompted “imagine / feminist architecture interior nighttime,” and it gave me images of a bed. My concern is about repeating the biases of existing imagery and architectures. If we’re building new architecture or sketching ideas from only historic imagery, then what new methods do we need to avoid these biases? If images that are tagged as “feminist” are pink, then how would future feminist AI architecture escape the trap of being intractably pink? How can we be critical of the labeling and tags in these massive image data sets if we can’t access them?
I’m hoping that we can calibrate the data sets we use in the future. In doing so, I’d be more excited about the work. How could you change those inputs to imagine a proactively feminist or antiracist architecture? And what’s the imagery needed to create the data set that could produce or imagine a more equitable future?
For this to be a more equitable future, the space where this work occurs needs to change. When I was on the Midjourney Discord channels I saw hypersexualized images of women that don’t necessarily break the “code of conduct,” but if that’s the space I need to be in to use these tools, then I choose not to be, and as an educator I can’t in good faith ask my students to be in these spaces either.
AK: I’d say the larger problem is that our data is fed back into the AI’s training model. Previously the model was trained on millions of images of real things. But now with Midjourney, the content is yours, but they have unlimited license to use the content—both the prompt and the images—to further train the model. So if the model is being trained on the visual garbage that constitutes a lot of internet culture, then the model is going to get really good at producing that, but it still won’t know the difference between perspectival and orthographic imagery. It will likely get good at producing imagery that is sexualized, racist, or violent. Architects are likely a small subset of the people training the model, so we don’t have the power to direct the model to go where we want it to go.
KB: It will be interesting to see what happens when there are more AI models to use. Now there are only a handful, but within a few months there are likely to be dozens, if not hundreds. I hope one of them will allow you to train the model with your own data set and image tags. Using your own terms to tag images would open a whole new way to collaborate and to control the output. A group of people with expertise in a specific area or with a shared agenda could agree on terms that go beyond generic labels. The generic tag “windows” might not be the best classification for windows, for example; the potential to add specificity would be incredibly productive.
AK: There’s the image generation, but there’s also the text. Platforms like Midjourney and DALL-E focus on text-to-image generation, but they rely on an underlying translation model that can also work in reverse: image to text. You can take a text and generate an image or take an image and generate a text; these models work between the two formats. Recently, Kyle Steinfeld at UC Berkeley fed images to the AI to see what it recognizes in them, which helps you understand the biases. Steinfeld uploaded an image of Louis Kahn’s Salk Institute, and the result came back as “concrete bench”! He also uploaded Herzog & de Meuron’s bird’s-nest stadium in Beijing, and it said, in response, “Zaha Hadid.” There are some strange relationships built in; it looked at something vaguely organic and immediately associated it with Zaha Hadid. You begin to see the limits of the AI’s understanding of the world.
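This kind of probing does not require Midjourney: an open image-text model such as CLIP can score how strongly a photograph associates with candidate text labels. The sketch below, using the Hugging Face transformers library, is only an approximation of the exercise Kudless attributes to Steinfeld; the file name and label list are illustrative assumptions.

```python
# A minimal sketch of probing an image-text model's associations with CLIP,
# assuming the transformers library; the file name and labels are placeholders.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("salk_institute.jpg")  # any architectural photograph
labels = [
    "concrete bench",
    "a Louis Kahn building",
    "a brutalist courtyard",
    "a Zaha Hadid building",
]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)[0]

# Print labels from strongest to weakest association.
for label, p in sorted(zip(labels, probs.tolist()), key=lambda x: -x[1]):
    print(f"{p:.2f}  {label}")
```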
KB: I try to avoid working with names or styles. I’ve found that if you want a building that looks like it was designed by Zaha Hadid, you should describe the architecture and you’ll get more interesting results.
AK: On the other hand, at times it feels a bit like sorcery. I was working on something where I wanted a flatter, more elevational view, so I thought, “How do I produce this?” How do I describe a photograph of something that is more of an elevation and not a perspective, something like an Andreas Gursky photograph? I put his name in, and all of a sudden the quality of the image skyrocketed as the AI understood what I meant. This happens all the time when speaking to other architects: We use shorthand terms and references to other architects’ or artists’ works to quickly communicate an idea. It’s amazing and a bit scary that this also works with these diffusion models.
SD: One of the challenges is that the images going into the models seem to be pulled mostly from renderings or photographs, which privileges a perspectival view. You’re not getting a lot of windowsill details or plans, other than for the most famous or well-documented projects, so much of everyday architecture is excluded from the model.
I wonder what that does to how we understand what constitutes architecture and architectural knowledge? It’s partly a machine learning [ML] issue: There isn’t a well-labeled global data set of every architectural floorplan for the machine to “learn from,” meaning that there are entire bodies of space-making practices that don’t lend themselves to being documented for use in ML or AI. What about images that are embedded with information about material extraction or labor abuses or supply chains? Or building practices that rely on oral traditions or teaching through construction? Architecture has a lot of strains of knowledge to offer, but if it isn’t cataloged in a specific format, then it doesn’t become part of these AI models.
AK: I think we might be too focused on the immediate explosion of these text-to-image AI uses, because it will become a small part of how AI is used. AI is already in use in architecture, but we don’t talk about it because it was previously seen as unimportant. In rendering, for example, you could spend an extra ten hours getting the light simulation to be perfect, or you could stop it after an hour and let the AI blur things out. This is at the tail end of the architectural process, and it’s the default in a lot of software.
Then there’s this other middle ground which isn’t captured in text-to-image AI but that people are working on. Like testfit.io, which does development-driven explorations of zoning codes and office layouts. It’s not flashy, but, like what Shelby mentioned, they’re trying to build on the wealth of knowledge that the industry has produced.
SD: I just spent two days trying to figure out a window detail for a 3D-printed house project and I’m wondering if there are ways to harness AI more effectively in producing technical drawings: to call upon the collective knowledge of every waterproofing detail that has ever existed and say, “Here are six ways to solve this based on all of the knowledge that has come before you.” Maybe that’s the latent potential of computational design—the possibility of navigating competing outcomes across massive data sets: affordability, buildability, sustainability, etc. Producing representations beyond perspectival image making is an exciting possibility for AI in design.
AK: Another part of the educational aspect is that it’s hard to develop a design sensibility as a student, because it requires a lot of failing. I would love to find ways that we could use AI to help designers develop a design sensibility faster. Kory, you mentioned in your prior article for AN that you had created over 11,500 images using Midjourney. I’ve made around the same number of images. While our data is training the AI model, it’s also training our brains and hopefully in a positive way. You’re constantly asking, “Is this good?” You’re presented with four images, and your brain has to make a quick decision about why one is better than the others. Sometimes it isn’t good, and I need to go back to one of the earlier decisions. It might be helpful for students to look at something and make a decision. That’s not risky, right? When you’re asked to make ten study models, there’s a certain amount of risk involved. But if you’re constantly looking and judging, it could develop interesting pathways in your brain regarding what you value aesthetically in images. That might help you in the real world, where you can look at something and decide which is the best direction to go from here.
SD: As a teacher, it’s useful to encourage students to consider that we are all part of a lineage—or dozens of lineages. How can each of us work within, or against, these lineages of knowledge? How can we better recognize the enormous collective knowledge and labor of architecture as a way to keep challenging ideas of solo authorship and reconsider ways of making, building, and thinking about architecture and technology? AI in a way makes the very idea of working “alone” impossible, and that’s a refreshing idea.
KB: It would be useful to be able to converse with the AI so that you’d be able to edit the information while also feeding it information. One of the problems that I have as an educator is that I was trained in a specific way, so my knowledge is limited to what I’ve learned. But if I could start to connect what I know to other data sets with other histories and references, then cross-pollination can occur. Also, I might have students who gravitate towards specific interests that I don’t know enough about, so it would be amazing to direct them in a productive way toward this other information and knowledge, which could be accessed in a conversational manner with AI that is open and not opaque.
AK: There’s something there about the ambiguity of the images that Midjourney produces that is positive. Some of the concern previously was about deepfakes and making super photorealistic imagery. But now it reminds me of Piranesi’s prisons, where there are things that don’t make sense. Those are the images that I find most interesting, the ones that look a bit real but are actually ambiguous and vague. That’s a positive thing for students, especially early in a design or early in one’s career, because it leaves so much more for you to think about.
SD: Maybe one of the challenges of these AI images is that the depth and complexity of the imagery appears simultaneously finished and fantastical. Perhaps there needs to be some distance between the representation and the “thing” being represented—which I think is architecture. Maybe these are not renderings of architecture?
AK: A lot of people interpret these AI-generated images as renderings. Normally, renderings come at the end of the process when things have been resolved to a certain extent. I’ve been calling them sketches.
Everything that rendering engines are good at, the AI is bad at. Renderings are good at taking the geometry of a model and precisely rendering that in 2D space. They’re also very good at accurate shadows. Unless you’re an expert, it’s incredibly hard to capture the mood or atmosphere of a space. To Shelby’s point about photorealism, it’s incredibly valuable to have the ability to see an image that captures the mood of a space early on in a project. You don’t have to spend 20 hours texturing and lighting and processing the model just to realize, “Oh wait, there’s not enough light in my design.” I’ve always resisted mood boards because they felt like a collage of disparate elements, but with these AI images we can create a much more synthetic and cohesive image that evokes a quick ambient or atmospheric sense.
SD: Like an AI Pinterest!
AK: When you start a project, you might have a precedent that’s inspiring or a set of materials. This is a tool that allows you to combine these elements synthetically without worrying about geometry or sizing or texture. It’s incredibly hard to get the weathering of materials right in renderings, for example, but the AI, while it might get geometry or shadows wrong, can evoke time or weather. That’s hard to do in a normal rendering.
KB: I love this idea of collage, because with a diffusion-based AI, you start with a cloud of pixels that come together to form an image of a supposedly 3D thing, but in reality, it’s not so clear. As an exercise, I took one of my favorite images from Midjourney and tried to “dimensionalize” it as a 3D model—it doesn’t work. Things just don’t come together cleanly in 3D. Gaps start to form, parts have to stretch to meet up with other parts, and it’s impossible to find a view of the 3D model that matches the 2D image. For that reason, it’s better to think of these images as sketches. You have to rip these images apart—like a collage—and then combine them in new ways.
SD: I think collage is an apt metaphor. If you imagine these not as renderings, but as a collapse of image and movement and time, then they become something else. It would be interesting to avoid trying to slice a plan through one of these images, because it’s not really an object—it’s not representing a static moment or a thing.
AK: In the same way that it’s hard to model a sketch, right? Sketching is about imprecision. It’s about gesture, trying to capture a moment and a feeling of a project or to work out an idea. The value isn’t in the precision.
KB: As these AIs proliferate, each will have their own advantages. Midjourney might be used for sketching, and DALL-E for creating iterations of a project that’s already designed. You might use Stable Diffusion to change your prompt midstream. If you think, “I’m on a bad path; I want to go a different way,” you can change the text and the direction of the output.
AN: It’s good to get over the immediate excitement of this image generation and to think more deeply about how it becomes another resource that enters the tool kit of image fabrication. You might see a lot of these pinned up for a first critique, but maybe that dissipates over a semester.
AK: Shelby mentioned embedded labor, in the sense of how much time it took to produce a certain building and who is doing the work, but I also want to talk about how this relates to the labor of our own discipline. When I first started in architecture, I would spend countless hours getting rid of the background of a tree in Photoshop, then scaling it and pasting it and changing its hue and saturation to match the background. It was incredibly tedious and mind-numbing, and I was barely being paid to do any of it. Just making images takes a long time. It has gotten better, with 3D trees and 3D people, but there’s always that moment when you’re done with the rendering and someone comes over and says, “Can you put a different person in there?”
Photoshop already has some AI tools, and their improvement will reduce the labor of being an architect. With the amount of training we have, we shouldn’t be spending our time Photoshopping a tree into an image. That is something that AI is much better at doing. It would make the lives of many architects better if we could get rid of the tediousness of making decisions about these kinds of things that ultimately don’t matter, but that we obsess over. We shouldn’t be spending as much time as we do on design; we should be better at making decisions faster, and our tools should help us make those decisions.
AN: Thinking about larger trajectories of computation and the use of technology in architecture, what technology should architects be experts in? How would you place text-to-image AIs in the larger ecosystem of architectural technologies?
AK: These AI platforms are more accessible than a lot of the software that we use. I think that’s a good thing. The sooner a student or designer can produce an image that helps them produce the next image, the better. Typically, it takes years to learn Rhino or Revit, so there’s a slowness to architecture, and a struggle to learn these tools because they’re quite technical. If we can reduce the difficulty of that act, that’s a positive.
I also think that text-to-image AIs help us think about the role of language more than we normally do in architecture. If you ask students to write a thesis statement in a studio, it’s like pulling teeth; they don’t want to write about their work. But now, through describing their work, as a bonus they get hundreds of images of it. So there’s an advantage to thinking clearly about the words you use to describe your project. That’s interesting, but it isn’t exactly about technology.
SD: We just did a software workflow diagram for this project I’m working on and found it will take a dozen different software packages and file conversions. But after all that work, our contractor in rural Iowa really only needs a dimensioned PDF that will open on their phone when there’s cell service and a PDF they can print.
To Andrew’s point, we’ve created these enormously complex computational systems that, much like sitting at a chair staring at a computer for 80 hours a week, don’t create healthy work environments. The ecosystem of architectural technologies demands enormous time and expertise to engage with; it seems that software begets more software, complexity begets more complexity and more exclusion, and more exclusion means fewer people contributing to the design discourse. If I’m being optimistic, maybe AI imagery can open up that technical space a bit more? Maybe it means returning to a verbal and pictorial tradition so we’re not creating giant BIM models that need to be chopped into PDFs?
KB: Over the last 20 years, architects started to specialize in certain areas of design. But in the last few years, we’ve started to see a rejection of that. People don’t want to specialize; they want to use every tool and expect the tools to be easier to use. Renderings are becoming more ordinary, even in their styling. I think it’s because people want to get back to the core of the discipline and don’t want to drift further apart.
Like Andrew said, these tools are accessible. They are going to become more valuable because more people will be able to use them, which then allows for more collaboration. Maybe you’ll no longer need to go to different people for different kinds of expertise. Instead, you’ll all be working in the same design space. And this isn’t just about architects, but also the people we design for. These tools might allow for more coherent conversations with people adjacent to our discipline and other communities. To me, that’s exciting.
Kory Bieg is the program director for architecture at The University of Texas at Austin and principal of OTA+.
Shelby Doyle is a registered architect, associate professor of architecture, and Stan G. Thurston Professor of Design-Build at Iowa State University College of Design, where she is codirector of the ISU Computation & Construction Lab and ISU Architectural Robotics Lab.
Andrew Kudless is an artist, designer, and educator based in Houston. He is the principal of Matsys and the Kendall Professor at the University of Houston Gerald D. Hines College of Architecture and Design, where he is also the director of the Construction Robotics and Fabrication Technology (CRAFT) Lab.