UT Austin professor uses AI to turn soundscapes into landscapes

The AI-created image at right was generated using sound from a video posted on YouTube. The image at left shows the original scene where the audio was recorded.
Courtesy UT Austin

I know this is a funny thing to ask a reader right out of the gate, but close your eyes for a minute and picture a bustling city with crowded sidewalks and congested streets. What sounds come to mind? Maybe honking horns, whooshing vehicles and even some yelling?

Yuhao Kang, an assistant professor of geography and the environment at UT Austin, says we often underestimate how much sound contributes to our sense of place. When Kang and I spoke via Zoom, he played two audio clips before showing me three images. He then asked me to match the sounds I heard with the correct images.

“I’m sure that when you were hearing these sounds you could imagine the scenes,” he said.

I could. The first serene audio clip he played was associated with an image of a park, while the second, much noisier audio clip was linked to an image of a city’s downtown.

The AI-created image at right was generated using sound from a video posted on YouTube. The image at left shows the original scene where the audio was recorded.
Courtesy UT Austin

Kang and his fellow researchers wanted to see if they could train artificial intelligence to do what I had just done – use sound to conjure an accurate image of a specific place.

“We are curious whether we can integrate AI, or use AI, to understand our human experiences, especially those subjective experiences,” he said.

To figure this out, the researchers paired still images from videos with audio clips taken from the same videos and used those pairs to train an AI model to associate certain sounds with certain places. And it worked. The AI model translated the audio data into images that resembled the originals.
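
The article doesn't detail the model the researchers built, but the basic training idea, learning to reproduce a video's still frame from the audio recorded alongside it, can be sketched in a few lines. The toy PyTorch example below is only a rough illustration of that pairing, not the study's actual code; the network architecture, spectrogram shapes, and random stand-in data are all assumptions.

```python
# Toy sketch: train a model to reproduce a video's still frame from its audio.
# Shapes and data here are illustrative placeholders, not the study's setup.
import torch
import torch.nn as nn

class AudioToImage(nn.Module):
    def __init__(self, n_mels=64, n_frames=128, img_size=64):
        super().__init__()
        self.img_size = img_size
        # Audio encoder: collapse a mel spectrogram into a fixed-length vector.
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_mels * n_frames, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
        )
        # Image decoder: expand that vector back into RGB pixel values.
        self.decoder = nn.Sequential(
            nn.Linear(256, img_size * img_size * 3),
            nn.Sigmoid(),
        )

    def forward(self, spectrogram):
        z = self.encoder(spectrogram)
        return self.decoder(z).view(-1, 3, self.img_size, self.img_size)

model = AudioToImage()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-ins for (audio, frame) pairs drawn from the same videos.
spectrograms = torch.rand(8, 64, 128)   # mel spectrograms of the audio clips
frames = torch.rand(8, 3, 64, 64)       # paired still frames from those videos

for epoch in range(5):
    optimizer.zero_grad()
    predicted = model(spectrograms)
    loss = loss_fn(predicted, frames)   # push generated images toward the paired frames
    loss.backward()
    optimizer.step()
```

The real study's images are far richer than this toy decoder could produce; the sketch is only meant to show how audio-frame pairs drive the training.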

To further check the accuracy of the results, the researchers asked people to match audio clips to their corresponding AI-generated images. The participants correctly matched the sounds and images 80% of the time, which Kang said is pretty high. But, he added, there are still some inaccuracies in the AI-generated images.

“If you look at this carefully, there are still some blurred regions, or blurred pixels,” he said.

The AI-created image at right was generated using sound from a video posted on YouTube. The image at left shows the original scene where the audio was recorded.
Courtesy UT Austin

The results of Kang’s research are published in the journal Computers, Environment and Urban Systems. He and his fellow researchers said the work is significant for a few reasons. First, it adds to existing research on how we develop a sense of place through both auditory and visual perception, which he said are often studied separately. Second, it offers a new way to turn soundscapes into images.

Kang said this research also has practical applications.

“For instance, one application could be to understand our soundscape environment to reduce noise,” he said. “Noise is one key factor in urban planning.”

The AI-created image at right was generated using sound from a video posted on YouTube. The image at left is the original scene from the video.
Courtesy UT Austin

Kang said the research could also be used to develop more multisensory experiences, so that when you go to a museum, you're not just looking at art but also hearing audio associated with it. In fact, his team is working on another algorithm that does the reverse: it uses images to create sounds.

While the results of the study have practical applications, there is also a more existential one.

“We hope that AI can advance our understanding of ourselves,” Kang said.

Copyright 2024 KUT 90.5

Becky Fogel is the editor and producer of statewide newscasts for the Texas Newsroom. She previously worked for the shows Texas Standard and Science Friday. She got her start in radio at KWBU-FM in Waco.