Applying GenAI to the Challenges of XR Content Creation

Written BY

Emily Friedman

February 20, 2024

Although talk of the metaverse quieted down when generative AI burst on the scene in 2022, proponents of the metaverse view GenAI as the answer to one of their biggest challenges: Developing content to “populate” the metaverse.

What is GenAI?

Generative AI is a type of artificial intelligence capable of producing or generating new (original) content based on a variety of inputs including natural language prompts. ChatGPT, for instance, can generate an essay on any topic from a simple text prompt. The technology can also be used to generate images, audio, video, and even code.

For our purposes, the “goal” is to use GenAI to democratize the development of high-quality XR content such as hyper realistic 3D objects, avatars, and environments using only carefully phrased language–no coding required.

In addition to improving the accessibility and scalability of XR development, GenAI could help enterprises get even more value from VR.

XR Content Challenges

A major bottleneck in XR adoption is content: Until now, generating 3D content for immersive experiences has been time-consuming and expensive, typically requiring specialized skills and complex modeling software.

Besides the small developer talent pool, the scale of most projects is a challenge. Think about what’s involved in building a virtual training simulation: You have to fill the scene with animated, responsive 3D objects and people from scratch. That means populating the room (everything from the furniture to the fixtures), creating characters, recording dialogue and other sounds, animating those characters and objects to make it interactive, etc.--all customized to a specific organization.

GenAI for XR Content Generation

Generative AI is already being used to simplify, accelerate, and scale XR content creation. Developers are using it to build basic elements of 3D environments such as backgrounds and placeholder assets. GenAI can be used to quickly generate 3D assets including digital twins as well as help add textures, materials, and even nuances like reflections to virtual objects, thereby improving the realism of XR experiences.

Behind these capabilities are LDMs (latent diffusion models) and NeRFs (neural radiance fields): LDMs can generate photorealistic images based on natural language, while NeRFs can generate or reconstruct 3D objects and scenes from 2D images. Both are machine learning models.

But what about non-developers? Something like text-to-VR would empower non-professionals to essentially speak or type 3D models, avatars, and entire virtual worlds into existence. The good news is that text-to-VR is fast advancing: OpenAI’s Point-E is a GenAI model that generates 3D point clouds from reference images or text prompts, enabling the creation of complex 3D shapes and structures. Midjourney is another tool that generates 3D images from text prompts, while Skybox can create 3D worlds from text prompts.

RT3D engines like Unity and Unreal are increasingly integrating generative AI tools to assist “budding” XR creators, as is NVIDIA’s Omniverse to assist with building complex industrial simulations. Apps, too: Engage VR, for instance, is hoping to become “the WordPress of spatial computing” by integrating AI into its remote collaboration platform.

Athena is Engage VR’s AI assistant or “virtual employee” powered by various Open AI tools like ChatGPT and DALL-E. She can answer your questions, play specific roles, “fetch” 3D models, and generate images including skyboxes (like a wrapper around a VR scene) to cut down on the time and cost of creating training environments.

Recently in November, Atlas emerged from stealth. The Vienna-based startup is working on a “self-service” 3D AI creator platform for enterprise users to build 3D assets and immersive worlds “on the fly.”

GenAI for Greater Realism

By helping to generate 3D models, GenAI significantly expands the range of digital assets available to an organization. Moreover, it has a role to play in enhancing the realism and visual quality of virtual content.

As mentioned, GenAI can help creators add realistic textures, lighting, and other details to virtual content. It can fill in missing areas or gaps in virtual environments, even simulate physics to make virtual objects respond realistically to the user–all of which would otherwise be painstaking to code or model.

Human-like interactions also enhance realism. GenAI can help create scenarios and write scripts for virtual learning experiences, while tools like Lovo can convert text to spoken dialogue.

GenAI to Bring Avatars to Life

ChatGPT is already being used in conversational AI systems to drive natural language interactions with avatars, create a sense of presence, and significantly speed up deployment of VR training.

Take one of Engage VR’s pharma clients who used Athena to role play as a patient with a specific disease. The physician-in-training conversed with Athena for several minutes in order to determine a diagnosis. Athena was able to generate natural responses to the user’s questions, making the simulation more interactive and lifelike.

Likewise, GenAI can assume the role of team members or customers in soft skills training sims. Scoot Airlines employs GenAI and XR to train flight attendants: Users go through a variety of unscripted scenarios with responsive virtual passengers (think crying babies and seat switchers). A virtual coach “trained on” the airline’s customer service manuals, support library, SOPs, etc. then assesses their performance and provides feedback.

You might use something like Audio2Face, too, to give avatars realistic facial animations and gestures from audio files.

Challenges

For the most part, the large datasets to train AI models for the above use cases don’t exist. Moreover, GenAI tools require a brand new skill: Prompt engineering. Generative AI is based on prompts; the better your prompts the better the output, so choose your words carefully. Text- or voice-to-VR may eliminate coding but it will require clear and precise, carefully engineered and even artistic language, which could pose a challenge to some. “3D living room” isn’t going to cut it.

Conclusion

We don’t know all the ways GenAI and spatial computing may converge in the coming years, but it does seem that generative XR is within reach, at the tip of our tongues. Beyond fast and easy creation of 3D content, the possibilities extend to planning, personalizing, and scaling immersive experiences in enterprise. The combo of XR and generative AI may even unlock new types of immersive experiences we have yet to dream up.

‍