Noise-canceling headphones have become remarkably good at creating a barrier that shuts out external sound. They do this with microphones that capture ambient noise and play back an inverted, anti-phase version of it that cancels it out. But this cancels every sound indiscriminately; it cannot let through the ones we actually want to hear. At least, that was the case until now.
A team from the University of Washington has developed an artificial intelligence system that lets a headphone wearer “enroll” a person simply by looking at them while they speak. The system, called Target Speech Hearing, cancels all other sounds in the environment and plays back only the enrolled speaker's voice in real time, even as the listener moves through noisy places and is no longer facing the speaker. The team, led by Shyam Gollakota, presented its findings at the ACM CHI Conference on Human Factors in Computing Systems. The code for the proof-of-concept device is available for others to build on, but it is not commercially available for now.
“We tend to think of AI as web-based chatbots that answer questions,” explains Gollakota in a statement. “But in this project, we develop AI to modify the auditory perception of anyone wearing headphones, given their preferences. With our devices you can now hear a single speaker clearly even if you are in a noisy environment with lots of other people talking.”
To use the system, a person wearing standard headphones fitted with microphones presses a button while pointing their head at someone speaking for three to five seconds. Sound waves from that speaker's voice should reach the microphones on both sides of the headphones simultaneously; there is a margin of error of 16 degrees. The headphones send that signal to an onboard computer, where the device's machine-learning software learns the vocal patterns of the intended speaker.
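The enrollment step works because a source the wearer is facing reaches both microphones at nearly the same instant. A minimal sketch of that idea, assuming two synchronized mono channels and illustrative values for sample rate and microphone spacing (the 16-degree tolerance is the figure the team reports; everything else here is a hypothetical simplification, not the team's code):

```python
import numpy as np

SAMPLE_RATE = 16_000     # Hz, assumed
MIC_SPACING = 0.18       # metres between left/right microphones, assumed
SPEED_OF_SOUND = 343.0   # m/s at room temperature
MAX_ANGLE_DEG = 16.0     # enrollment tolerance reported by the team

def estimate_tdoa(left: np.ndarray, right: np.ndarray) -> float:
    """Estimate the time difference of arrival (seconds) between the two
    channels from the peak of their cross-correlation."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)  # lag in samples
    return lag / SAMPLE_RATE

def within_enrollment_cone(left: np.ndarray, right: np.ndarray) -> bool:
    """True if the dominant source arrives at both ears nearly
    simultaneously, i.e. the wearer is facing it to within ~16 degrees."""
    max_tdoa = (MIC_SPACING / SPEED_OF_SOUND) * np.sin(np.radians(MAX_ANGLE_DEG))
    return abs(estimate_tdoa(left, right)) <= max_tdoa
```

A source dead ahead produces near-identical signals in both channels and passes the check, while one off to the side produces a measurable lag and fails it.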
The system then captures that person's voice and keeps playing it back to the listener, even as both of them move. Its ability to focus on the enrolled voice improves as the speaker continues talking, since that gives the system more training data. Since a voice can serve as a kind of fingerprint, what the system essentially does is link the voice to a specific pattern of sound waves and search for it continuously.
The team tested the system on 21 subjects, who on average rated the clarity of the enrolled speaker's voice nearly twice as high as that of unfiltered audio. Currently, the TSH system can enroll only one speaker at a time, and only when no other loud voice is coming from the same direction as the target speaker. If users are not satisfied with the sound quality, they can enroll the speaker again to improve clarity. The team is working to extend the system to earbuds and hearing aids in the future.