
There are no secrets at the World Cup: someone is crouching in the corner, quietly reading the players' lips

Halfway through the World Cup, the brutal knockout stage is about to reach the quarterfinals. Alongside the passion of the players on the pitch and the enthusiasm of the fans, the off-pitch reporting on each team is just as dizzying. Among the most intriguing items are the conversations between coaches and players.

Take the penalty shootout between Spain and Russia. When Spain's coach Hierro decided to let Koke take a penalty, Costa objected and argued the point repeatedly with Hierro, Koke and captain Ramos. It made no difference. Koke took the third penalty and missed, and Spain's World Cup ended there.

Something similar happened in the group match between Argentina and Nigeria. With the score level at 1:1, Sampaoli went over to ask Messi whether to bring on Aguero. Messi nodded, and Aguero came on.

The puzzle is this: when we watch a match, apart from the roar of the crowd, about all we hear is the sound of the ball being kicked. The exchanges between coaches and players can be seen but not heard. So how do the TV stations know what was said? Do they interview people behind the scenes?

Of course not. The way to understand what they are saying is actually simple: watch the shape of the mouth. The formal name for it is lip reading.

Sounds impressive, doesn't it?

From the "manual era" to AI

Lip reading was originally meant to help people with hearing impairments take in spoken information and lead normal lives. In that sense it serves the same purpose as sign language. But unlike sign language, lip reading is very hard to learn, and even dedicated practice does not guarantee results.

Learning to read lips requires good eyesight to begin with. Jessica, a British professional known as the queen of lip reading, has reportedly said that she can work out what people are saying by reading their lips from 40 meters away.

Precisely because so few people have mastered it, and it is rarely encountered in everyday life, lip reading has come to seem mysterious and impressive. We can call this period the "manual era" of lip reading.

In the past two years, lip reading, which had long depended on individual effort to survive, suddenly became a hot topic, and it seems to have turned into a simple matter overnight. The root cause is the development of AI visual recognition technology. Perhaps we can call this the AI era of lip reading. In the film Ex Machina, for example, the robot Ava reads the mouth movements of Nathan and Caleb to work out what they are saying. So how does AI lip reading perform in reality?

From a technical point of view, lip reading is well suited to AI. The system captures the continuous movement of the speaker's mouth, matches those movements to the pronunciation of words, and then applies a correction step to find the sentence in which movement, pronunciation and meaning fit together best. That is how the lips are decoded.
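The three steps described above (capture, match to pronunciation, correct by meaning) can be illustrated with a toy sketch. It is not how any production system works: the viseme classes, the nine-word lexicon and the bigram scores are all invented for illustration, and letters stand in for real phonemes. The point is only that several words look identical on the lips, so a language-level score has to pick among them.

```python
from itertools import product

# Hypothetical viseme classes: letters that look alike on the lips share a label.
VISEME_CLASS = {
    "p": "P", "b": "P", "m": "P",
    "t": "T", "d": "T", "n": "T",
    "a": "A", "e": "E",
}

# Toy lexicon and toy bigram scores (both invented for this example).
LEXICON = ["ten", "den", "net", "men", "pet", "bed", "bat", "mat", "pat"]
BIGRAM_SCORE = {("ten", "men"): 3.0, ("net", "pet"): 1.0, ("den", "bed"): 0.5}


def to_visemes(word):
    """Map a word to the sequence of lip shapes a camera would see."""
    return "".join(VISEME_CLASS.get(ch, "?") for ch in word)


def candidates(viseme_word):
    """All lexicon words that produce exactly this lip-shape sequence."""
    return [w for w in LEXICON if to_visemes(w) == viseme_word]


def decode(viseme_words):
    """Pick the word sequence with the highest total bigram score."""
    options = [candidates(v) for v in viseme_words]
    if any(not opts for opts in options):
        return []  # some lip-shape sequence matched no known word

    def score(sentence):
        return sum(BIGRAM_SCORE.get(pair, 0.0)
                   for pair in zip(sentence, sentence[1:]))

    return list(max(product(*options), key=score))


if __name__ == "__main__":
    seen = [to_visemes("ten"), to_visemes("men")]
    print(candidates(seen[0]))  # several words share the same lip shapes
    print(decode(seen))
```

In this sketch "ten", "den" and "net" are indistinguishable by lip shape alone; only the sentence-level score separates them, which is the role the correction step plays in the description above.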

For example, in 2016 DeepMind and Oxford University presented an AI lip-reading system. After training on more than 5,000 hours of footage and 11,800 news videos, it reached a recognition accuracy of 46.8% in the final video test, compared with 12.4% for humans. That is nearly four times as high.

What is certain is that, as visual recognition technology keeps improving, lip reading will soon turn from a legendary "secret skill" into an everyday tool that anyone can use. But is it really that easy to make lip reading a universal language aid?

Lip reading is not easy for people, nor for AI

We know that deep learning depends on large amounts of relevant material. DeepMind trained its lip-reading AI on 5,000 hours of video in total, and the recognition rate is still below 50%. By current standards, though, that is already a very good figure. After all, the top human lip-reading experts manage only a little over 10%. Given that lip reading is this hard, what problems does AI have to deal with in order to master it?

First, the problem of inconsistent mouth shapes has to be solved. It shows up in two ways.

On the one hand, because people articulate differently, not everyone makes the same movement when producing the same syllable. The difference may be hard for a person to notice, but for an AI that is good at picking up micro-expressions, a slight difference in movement can lead to a misjudgment. Someone who mumbles, for example, is hard to understand even by ear, let alone by lip reading.

On the other hand, the same language can be pronounced differently from region to region. Dialects therefore cause AI no small amount of trouble.

Second, there is the matter of tone. Since the goal is to recognize what is being said, the speaker's tone inevitably comes into play. The same word or sentence can mean different things depending on the tone in which it is spoken. Recognizing only the literal words limits what lip reading can do. How to bring the speaker's facial expression, gestures, setting and other expression-related factors into lip reading is another problem to consider.

Third, in many cases we do not need a human-like generalist. Lip reading does not have to know everything. Training material can be collected specifically for the scenario in which the system will be used. A ticket machine in a subway station, for example, only needs to be trained on station names. This avoids wasted effort on material that is irrelevant to that scenario.
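The ticket-machine idea can be sketched as a closed-vocabulary lookup. This is a minimal illustration, assuming some upstream recognizer has already produced a rough text guess; the station list and the function name `snap_to_station` are made up for the example, and a real system would more likely constrain the recognizer itself than post-process its output. The sketch uses `difflib.get_close_matches` from the Python standard library to snap a noisy guess onto the nearest known station.

```python
import difflib

# Example station names, chosen only for illustration.
STATIONS = ["Xizhimen", "Dongzhimen", "Guomao", "Wangfujing"]


def snap_to_station(raw_guess, stations=STATIONS, cutoff=0.6):
    """Return the known station closest to a noisy guess, or None.

    raw_guess is whatever text a (hypothetical) lip reader produced.
    cutoff is the minimum similarity, between 0 and 1, to accept a match.
    """
    by_lower = {name.lower(): name for name in stations}
    matches = difflib.get_close_matches(
        raw_guess.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


if __name__ == "__main__":
    print(snap_to_station("sizhimen"))  # first sound misread by the recognizer
    print(snap_to_station("qqqq"))      # nothing plausible: rejected
```

Because the machine only has to choose among a handful of names, even a fairly inaccurate guess can be resolved, and anything that resembles no station is rejected rather than misread.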

More importantly, language is an enormous body of knowledge. Reading lips accurately in every situation would require massive training on speech and mouth movements, which is a time-consuming task.

Lip reading is a small skill with large potential

There is no doubt that once lip-reading technology matures, its application prospects are bright, for example in the following areas.

1. Security. Home cameras serve simple functions and scenes, and they generally include a microphone to pick up sound. But most outdoor surveillance systems record pictures only, with no sound, which is a major weakness of electronic surveillance. Introducing lip reading is technically equivalent to giving the footage a soundtrack. What offenders say in front of the camera can be captured, and it may well become an important clue in solving a case. British police have used Jessica's lip-reading ability to crack an airport robbery.

2. Medical care and health. The original role of lip reading is to help people with hearing impairments communicate like everyone else, but the learning cost is undoubtedly huge and can take years or even decades. Those whose eyesight or perseverance falls short can only fall back on sign language, which is cumbersome. With the help of lip-reading technology, hearing-impaired people would be spared that learning cost and could communicate directly with hearing people.

3. More accurate real-time speech conversion. Today, whether for real-time subtitles in smart courts or simultaneous interpretation at international conferences, systems rely almost entirely on speech recognition alone. Adding lip reading turns pure "listening" into "watching" plus "listening", which is closer to how humans actually take in spoken information. AI applications in real-time subtitling, simultaneous interpretation and similar scenarios would become more mature as a result.

In addition, using lip reading alongside speech recognition should help with speech separation. Teams have already separated voices by comparing video with audio, and lip movements are a cue worth attention there.

4. Promoting a multimodal Internet of Things. For example, the voice ticketing introduced in subways identifies the destination station and the ticket buyer through dual recognition of the passenger's voice and face. Building on this, combining lip reading with speech recognition opens up more room for imagination in building the Internet of Things. In the home, entry points such as the smart speaker would no longer rely on voice commands alone, but could add lip reading to improve recognition accuracy. After all, for the person receiving a message, watching someone speak is very different from listening with eyes closed.
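One simple way to combine "watching" and "listening" is late fusion: each recognizer scores the possible commands separately, and the scores are merged afterwards. The sketch below is an assumption about how that could look, not a description of any shipping product. The command names, the probabilities and the `audio_weight` parameter are invented for illustration, and the merge is a weighted product of the two probability estimates, renormalized to sum to 1.

```python
import math


def fuse(audio_probs, visual_probs, audio_weight=0.5, floor=1e-9):
    """Late fusion of two recognizers by a weighted product of probabilities.

    audio_probs / visual_probs map each candidate command to a probability.
    audio_weight in [0, 1] says how much to trust the microphone; the camera
    gets the remainder. floor avoids log(0) for impossible candidates.
    """
    commands = audio_probs.keys() & visual_probs.keys()
    log_scores = {
        c: audio_weight * math.log(max(audio_probs[c], floor))
        + (1.0 - audio_weight) * math.log(max(visual_probs[c], floor))
        for c in commands
    }
    total = sum(math.exp(s) for s in log_scores.values())
    return {c: math.exp(s) / total for c, s in log_scores.items()}


if __name__ == "__main__":
    # Noisy room: the microphone can barely tell the two commands apart...
    audio = {"lights on": 0.48, "lights off": 0.52}
    # ...but the lip shapes are clearly different on camera.
    visual = {"lights on": 0.80, "lights off": 0.20}

    fused = fuse(audio, visual)
    print(max(fused, key=fused.get))
```

In a quiet room the weight could lean toward audio; in a noisy one, or when the speaker is facing the camera, it could lean toward the visual score. How to set that weight is a design choice this sketch leaves open.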

But lip reading is also a technology that calls for vigilance. After all, nature limits how far the voice carries, and that limit protects personal privacy. Once lip reading becomes widespread, nobody may have secrets any more. People would cover their mouths to talk, draw the curtains the moment they got home, and fit opaque film to their car windows... A world like that is probably not one anyone looks forward to.

In any case, lip reading is a technology worth pursuing. It is still immature, and many real problems that could have adverse effects in the future need to be considered. But where it shows positive value for human society, we should not refuse to embrace it.
