Apple's latest entry in its online Machine Learning Journal focuses on the personalization process that users go through when enabling the "Hey Siri" feature on iOS devices. Across all Apple products, "Hey Siri" invokes the company's AI assistant, and can be followed by requests like "How is the weather?" or "Message Dad I'm on my way."
"Hey Siri" was introduced in iOS 8 on the iPhone 6, and at that time it could only be used while the iPhone was charging. Later, the trigger phrase could be used at all times, thanks to a low-power, always-on processor that lets the iPhone and iPad continuously listen for "Hey Siri."
In the new Machine Learning Journal entry, Apple's Siri team breaks down its technical approach to building a "speaker recognition system." The team created deep neural networks and "set the stage for improvements" in future iterations of Siri, all motivated by the goal of enabling "on-device personalization" for users.
Apple's team says that "Hey Siri" was selected as the phrase because of its "natural" wording, and describes three scenarios in which unintentional activations cause problems for "Hey Siri" functionality: when the primary user says a similar phrase, when other users say "Hey Siri," and when other users say a similar phrase. According to the team, the last scenario is the most annoying false activation of all.
To reduce these unintentional activations of Siri, Apple adopts techniques from the field of speaker recognition. Notably, the Siri team says that it is focused on "who is speaking" and less on "what was spoken."
The overall goal of speaker recognition (SR) is to ascertain the identity of a person using his or her voice. We are interested in "who is speaking," as opposed to the problem of speech recognition, which aims to determine "what was spoken." SR performed using a phrase known a priori, such as "Hey Siri," is often referred to as text-dependent SR; otherwise, the problem is known as text-independent SR.
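Apple does not publish its implementation, but the core idea of text-dependent speaker recognition can be sketched simply: a neural network maps each "Hey Siri" utterance to a fixed-length speaker embedding, and the trigger is accepted only when the new embedding is close enough to the enrolled user's profile. The embedding values, dimensionality, and threshold below are all illustrative assumptions, not Apple's actual parameters:

```python
import numpy as np

def cosine_similarity(a, b):
    # Standard cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_primary_user(profile, utterance_embedding, threshold=0.7):
    # Accept the trigger only if the utterance embedding is close
    # to the stored speaker profile (a hypothetical threshold of 0.7)
    return cosine_similarity(profile, utterance_embedding) >= threshold

# Toy 4-dimensional "speaker embeddings"; a real system would produce
# these with a deep neural network from the audio of the trigger phrase
profile = np.array([0.9, 0.1, 0.3, 0.2])
same_speaker = np.array([0.85, 0.15, 0.25, 0.2])
other_speaker = np.array([0.1, 0.9, 0.6, 0.1])

print(is_primary_user(profile, same_speaker))   # True  (high similarity)
print(is_primary_user(profile, other_speaker))  # False (low similarity)
```

This thresholded comparison is what lets the device reject the two annoying cases Apple describes: another person saying "Hey Siri," or anyone saying a similar-sounding phrase.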
The journal entry then goes into how users enroll in a personalized "Hey Siri" process using explicit and implicit enrollment. Explicit enrollment begins when users speak the trigger phrase a few times during setup, while an implicit profile is "created over a period of time" during "real-world situations."
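A minimal sketch of what a speaker profile could look like under both enrollment modes, assuming (as above, hypothetically) that utterances are reduced to fixed-length embeddings: explicit enrollment averages the few guided setup utterances, and implicit enrollment keeps folding in embeddings from accepted real-world triggers over time. The class and method names are illustrative, not Apple's API:

```python
import numpy as np

class SpeakerProfile:
    """Running average of accepted utterance embeddings for one user."""

    def __init__(self, dim):
        self.total = np.zeros(dim)
        self.count = 0

    def enroll(self, embedding):
        # Explicit enrollment calls this for each guided setup utterance;
        # implicit enrollment would call it for accepted real-world triggers,
        # gradually adapting the profile to everyday speaking conditions.
        self.total += embedding
        self.count += 1

    @property
    def vector(self):
        # Current profile: mean of everything enrolled so far
        return self.total / max(self.count, 1)

profile = SpeakerProfile(dim=3)
for e in [np.array([0.9, 0.1, 0.2]),
          np.array([0.8, 0.2, 0.1]),
          np.array([1.0, 0.0, 0.3])]:
    profile.enroll(e)

print(profile.vector)  # mean of the three enrollment embeddings
```

The design choice of averaging over time is one plausible reading of a profile that is "created over a period of time": each accepted utterance nudges the stored vector toward how the user actually sounds day to day.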
The Siri team says that the remaining challenges facing speaker recognition include achieving strong performance in reverberant (large room) and noisy (car) environments. You can check out the full Machine Learning Journal entry on "Hey Siri" right here.
Since it began last summer, Apple has shared numerous entries in its Machine Learning Journal on complex topics, which have previously included "Hey Siri", face detection, and more. All past entries can be seen on Apple.com.