At the end of April, SberDevices opened beta testing of Visper, a platform for creating a virtual presenter that can read text like a live speaker. This is not the first year that Sberbank has worked with digital avatars: Nika, a presence robot, was launched in 2018; Elena, a virtual news anchor, was launched in 2019; and in 2020 the company obtained a patent for its technology for generating human facial expressions from text. Last year Mail.ru Group introduced its own platform with digital presenters.
As smart assistants develop, there is a trend toward moving voice assistants beyond the voice only approach, that is, toward assistants that are not limited to a voice interface.
ICT.Moscow spoke with key players of this market in Russia and foreign industry representatives to understand what is happening with the digital assistant industry now and what the main trends will be in the near future. The opinions of 17 experts have formed a complex picture of the industry.
The main trends are: the development of multimodality of smart assistants; experimenting with device formats and user interaction mechanics; the growing expectation of secure and convenient voice commerce; hopes and concerns related to voice identification of users; the increased use of smart assistants in business. Here are some aspects of smart assistants that ICT.Moscow discussed with the representatives of the industry:
According to Strategy Analytics, in 2020, the global market for smart speakers surpassed the mark of 150 million units sold. At the same time, the share of smart screens reached 26%. According to Just AI, by the end of 2023 there will be 640 million smart speakers in the world. Juniper Research experts expect there will be 8.4 billion voice assistant devices in use by 2024.
According to Just AI’s estimates, in 2020 the number of users of voice assistants in Russia reached 52 million. The most popular assistants in the country are Alice (45 million users), Google Assistant (11 million) and Siri (6 million). Part of the audience uses several solutions at once, so these figures overlap. The Just AI survey among smartphone users showed that more and more people use smart assistants: in 2019, 71% of respondents had interacted with such services, and in 2020 this figure reached 77%. In 2020, 32% of respondents in Russia used voice assistants every day, against 29% in 2019.
Kirill Petrov, managing director of Just AI, explained that 2020 had become a turning point for smart assistants, and in 2021 their popularity will continue to grow.
The demand for smart speakers in Russia is also increasing. Sales of speakers with a voice assistant increased sevenfold in a year. According to M.Video-Eldorado’s estimates, in January-July 2020, the vast majority of sales went to devices with Alice. In March Yandex announced that it had sold over 1.3 million speakers with its voice assistant in the three years since launch. Nevertheless, smart speakers have not yet become the main channel of interaction between a person and a smart assistant. Mail.ru Group claims that smartphones are the leading category of devices with voice assistants.
Pavel Gvay, CEO and co-founder of Fabble.io, a tool for designing dialogues, also mentions the limitations of the voice only format.
Natural language processing (NLP) is the fourth largest area of work in Russia in the field of artificial intelligence (AI): according to the creators of the “Artificial Intelligence map of Russia” (as of April 29, 2021), 52 companies out of approximately 480 work in this area. The top 15 Russian companies who are developing NLP are Yandex, Speech Technology Center, ABBYY, Mail.ru Group, Just AI, Tinkoff, Sberbank, etc. (the list was compiled by the authors of “AI Almanac No. 2. AI Report — NLP” based on an expert survey).
As Anatoliy Kulbatskiy, Marusya Product Director at Mail.ru Group, notes, “ecosystems are the key players in the market of general-purpose assistants”. These players are, first of all, Yandex with its Alice, Mail.ru Group with Marusya and Sber with the Salut virtual assistants. Together with the development of voice assistants, these companies have created their own devices — Station (Yandex), Capsule (Mail.ru Group) and Portal (Sberbank). The latter is currently the only Russian smart screen similar to Google’s Nest Hub or Amazon’s Echo Show.
MTS is also working on its “speaker-assistant” pair. Last summer, the device was handed out to users for testing, but in early 2021 the media reported that the project had “come to a halt”. Tinkoff also has its own voice assistant “Oleg”; its core functionality is financial management, but it is also capable, for example, of answering incoming phone calls (when using Tinkoff Mobile).
Stepan Mitaki, head of the My Moscow mobile application, agrees that one of the current trends is the emergence of narrowly specialized voice assistants, each of which is aimed at solving specific user problems. An example of such an assistant is “Oleg”, which was described as “a voice assistant for financial and lifestyle services”. Experts recently discussed in Clubhouse that over time, companies will be less likely to create their own independent smart assistants and focus more on specialized skills within open platforms. For example, Pavel Kaplya, the head of the Yandex.Dialogues service, noted that “businesses should not set the task of making their own assistant — they need to think about how to effectively and concisely enter other general-purpose assistants”.
Another trend of the industry (which, however, the participants of the discussion called controversial) is the opening of platforms for third-party developers to create new skills for smart assistants, in other words, a focus on a model somewhat similar to the open source principle. Alice’s skills, Sber’s smartapps (applications that make it possible to promote goods and services on smart devices with the built-in Salut voice assistants) and Marusya’s skills are created on this model. Experts see this model as similar to writing apps for the App Store and Google Play, and predict that over time this area will gather pace and the mechanisms for creating skills will become simpler. But at the same time, they do not unequivocally claim that the industry will develop exactly according to this scenario.
The experts with whom ICT.Moscow discussed the trends in the development of digital assistants do not expect drastic changes in 2021, but they expect the emergence of new mechanics of user interaction with smart assistants and foresee experiments with digital avatars and various devices.
Mikhail Burtsev, head of the Neural Networks and Deep Learning Laboratory at MIPT, notes that assistants will become cross-platform, and points out that Alice is already available in speakers, TVs and cars. The Speech Technology Center CEO Dmitry Dyrmovsky also speaks about the experiments. He notes that “banks and financial institutions traditionally give preference to modern AI solutions to improve user experience, they have already realized their effectiveness and will continue to conduct experiments”.
Co-founder and COO of Neuro.net Alexander Kuznetsov believes that “the potential of voice assistants has not yet been exhausted and there is definitely space for growth”. “It is possible that new formats will emerge, and the prerequisites for this are already appearing in the market”, he adds.
Pavel Gvay, co-founder of Fabble.io, says that the potential of the voice only format will always be limited to tasks that do not require visual contact. “The voice first format has almost limitless potential in this regard, inheriting the strengths of both the graphical and the voice interface,” the expert claims. Holger G. Weiss, head of German Autolabs, also highlights the limitations of voice only assistants, especially when it comes to interacting with lists. “That is why we are convinced,” he says, “that a combination [of formats] will win — at least for more complex use cases. Smart speakers will still be great for playing music and turning the lights on”.
Kirill Petrov, Managing Director of Just AI, recalls that at the end of last year, sales of smart displays started in Russia. According to the expert, “smart screens give more expressiveness and open up new opportunities, for example, video shopping”. At the same time, Roman Doronin from EORA does not expect a large demand for such devices and believes that smart speakers with a screen in 2021 will remain “devices for experts”. Denis Filippov, CTO of SberDevices, believes that the range of devices with virtual assistants will be actively increasing in the near future: any home appliance from a refrigerator to a TV is a place where an assistant can be installed.
Igor Kalinin, founder of TWIN (which creates an automated communications platform), is convinced that in terms of technology, a turning point in the field of voice systems has already come; the next step is scaling, including in the Russian market.
SberDevices, the same division that released the first smart screen in Russia, is involved in the creation of digital avatars at Sberbank, together with other structures. The division notes that an avatar helps businesses deliver content to an audience without having to find and hire live speakers, which makes the process faster and cheaper. Mail.ru Group described its virtual presenter in the same way. At the time of the presentation of the service, the company predicted that by 2022, 79% of Internet traffic in Russia would be online video.
During a conversation in Clubhouse with profile experts, Fabble CEO Pavel Gvay spoke about the possibilities of multimodality and noted that, probably, “in the future we will be able not only to hear the assistant, but also to see his avatar with facial expressions”.
Another division of Sber, AR/VR Lab, is also engaged in developing digital avatars: in February, it opened free alpha access to a service that creates facial animation of a 3D character from a sound file with a recording of a person’s speech. Holger Weiss, founder and CEO of the German Autolabs company, which develops voice assistants for the logistics sector, also points to the prospects for the interpenetration of augmented and virtual reality technologies with smart assistants.
There are already examples of digital avatars being used instead of TV presenters. For example, in the fall of last year, this technology began to be used on MBN, a Korean TV channel. Journalists believe that a virtual presenter can be especially useful in covering emergencies when no suitable specialist is available. But the replacement of presenters or announcers with smart assistants is not always perceived positively: recently, the Moscow Department of Transport received applications submitted on behalf of the Alice and Salut assistants in its competition for a Metro announcer, but still chose living people.
Alexander Kuznetsov from Neuro.net notes the increasing availability of technologies, including for medium and small businesses, and also speaks of the trend of introducing smart assistants into user interfaces. Denis Filippov from SberDevices emphasizes that digital avatar technologies can significantly diversify the video content market while reducing production costs. But the question of successful business models for such solutions remains open, and the search for new options for their application continues.
Voice tech developers claim that smart assistants, performing part of the functions of people, will not replace human workers.
MegaFon representatives believe that new professions emerge with the development of technology. For example, the development team of Elena, a virtual assistant, includes configurators and dialogue designers, but five years ago there were no such jobs on the Russian market.
Even if voice assistants do not replace humans, they will have a great influence on human labor. Analysts at Gartner at the end of last year included the increase in labor productivity due to the use of speech technology in the top 10 strategic forecasts. They estimate that by 2025, 75% of all conversations at work will be recorded and analyzed, including through smart speakers. Gartner also sees hyper-automation as one of the global technology trends, which includes the use of AI and virtual assistants.
The experts agree that digital assistants are most actively introduced in the banking sector. At the same time, cases of using voice assistants, chatbots and smart avatars can be found not only in banking, but also in medicine, customer support, transport, city services, education, culture and media.
Alexander Kuznetsov, co-founder and COO of Neuro.net, calls the banking and financial industries and telecom the most active in the implementation of voice assistants. He expects that these industries will be joined by major players in retail, e-commerce and services.
The co-founder of Fabble.io Pavel Gvay says that the banking sector, medicine and vehicles are the most promising areas. According to him, in medicine and the banking sector, it is necessary to collect a lot of information and answer the same type of questions: how to make an appointment with a doctor, what test needs to be done before an appointment. But when it comes to highly qualified specialists, for example, doctors and consultants, digital assistants are unlikely to replace them in the near future, the expert adds.
Just AI experts also mention that smart assistants were originally created by IT companies, and they say that Internet companies (Yandex, Mail.ru) and large banks and financial institutions will be the drivers of voice assistant development in 2021.
Igor Kalinin from TWIN company says that over time, bots will emerge in all B2C industries. The only problem is that the Russian consumer is not yet used to communicating with bots.
Managing director of Just AI Kirill Petrov says that the voice search for goods in an electronic catalog is among the new scenarios that are gaining popularity. “This trend partly explains the fact that in the US more than 45% of users would like to be able to interact with mobile applications by voice”, he explains. “In addition, we will see more smart devices in commercial organizations, for example, in hotel rooms”.
Stepan Mitaki, head of the “My Moscow” mobile application, also mentions the fact that smart devices go beyond apartments. According to him, “in the West, you can now find voice assistants in hypermarkets or in various service institutions. And people are not afraid to talk to them”.
CEO of EORA Roman Doronin also pays attention to the efficiency shown by projects where different technologies are combined, for example, natural language processing and computer vision.
Another example of combining technologies is digital avatars, which pair speech technologies with the generation of realistic video images. They primarily target industries that use audiovisual content, such as media.
Making purchases using a voice assistant is one of the basic functionalities that were announced during the presentation of both Alice and Salut. So far, however, commerce is not on the list of the main user scenarios for interacting with virtual assistants. Just AI polls show that in Russia, voice assistants are most frequently used to search the Internet, to navigate, to find out the weather forecast, to call, set an alarm or turn on the music. Dmitry Dyrmovsky, CEO of Speech Technology Center, states that so far most of the skills of voice assistants have a clear entertainment priority, and business orientation is only gaining momentum.
In 2018, experts from OC&C Strategy Consultants company optimistically predicted that by 2022 the volume of the voice commerce market in the United States will reach $40 billion and this sales channel will change retail. According to them, 36% of owners of smart speakers have already used these devices for shopping (for other analysts this figure was lower — 22% according to Edison Research and 23% according to Voicebot). Experts from Juniper Research in November last year predicted that in the next five years, the number of purchases using voice in smart home devices will grow by 630%, and about 20% of all purchases will be made using smart screens and smart TVs. By 2025, the value of transactions using voice on smart home devices will reach $164 billion.
Roman Doronin from EORA agrees that 2021 will be a breakthrough year for the commercialization of voice assistants. According to him, “the trend for this is set by Sber with the Salut assistants ecosystem and the ability to integrate payments into different types of applications”.
At the same time, Anatoliy Kulbatskiy from Mail.ru Group draws attention to the existing restrictions on the commercialization of both digital content and non-digital goods in Russia. Kulbatskiy points to a relatively small market of devices for digital goods (about 1.5 million devices in the Russian Federation) compared to the market of smartphones, PCs and TVs. Since “the dominant category of use of voice assistants are smartphones, the sale of digital goods via assistants falls under the regulation of sales on Apple and Google platforms”, he emphasizes. On the other hand, payment with voice confirmation is at an early stage, and users do not have a “buy using voice” pattern. But the expert expects a number of new and interesting solutions for purchasing goods, paying for services and payments to appear on the market this year.
The expectations of other experts are more restrained. For example, Arkady Sandler, an expert in the field of conversational interfaces and voice technologies (he was the CEO of “Nanosemantics”, a chatbot development company, and supervised the creation of “Marvin” smart speaker and voice assistant at MTS), believes that we will not see a boom in voice commerce this year, although he expects that experiments will be conducted in this area.
But assistants for special purposes are created and will be created in order to provide some kind of business model, optimize the business process, etc. Actually, such assistants started to be created long before general-purpose assistants. The very existence of special purpose assistants is the proof of economic feasibility.
Alexander Kuznetsov, co-founder of Neuro.net, is convinced that data confidentiality is the main limitation for the commercialization of smart assistants. He says that the participants in this fast-growing market need to pay a lot of attention to this issue.
Denis Filippov, CTO of SberDevices, points out that at present smart assistants practically do not bring profit.
Nikita Murenky, VUI Team Lead of the TORTU conversational product design and development team, discusses the difficulties of another type of commercialization: payment for individual skills of assistants, rather than making purchases through them. In his opinion, in Russia the problems with commercialization are the same as in the rest of the world: “firstly, it is difficult to find the right skills in assistants, although Amazon and Google platforms are doing a lot to change this; secondly, the use cases are either of little value, or the user is simply not ready to pay for them yet”. Today, the culture of using smart devices in Russia and the world is only beginning to form, the expert emphasizes.
Another factor holding back the growth of the segment of smart speakers and other devices with smart assistants is the availability of electronic components to manufacturers. A representative of MTS draws attention to this. “There is an acute shortage of AI chips all over the world, and there are very few companies that already have ready-made chips and products based on them”, he says. “We estimate that the AI chip market will grow by an average of 25% annually”. Also, the expert added that to solve this problem the company had invested $10 million in a startup — the manufacturer of AI chips Kneron.
Over the past few years, smart assistants have begun to be used to simplify access to various social and other services. For example, there is a beta version of the digital assistant on the federal portal of public services, and smart chatbots are used in various services of Moscow.
Dmitry Dyrmovsky says that the Speech Technology Center receives more and more requests for intelligent dialogue systems, which become a convenient communicator, mediator between the city and its residents. As an example, he mentioned the “Alexandra” chatbot, created jointly with the Moscow Metro team, which answers 88% of passengers’ questions without transferring to an operator. And the head of the laboratory of neural systems and deep learning at MIPT, Mikhail Burtsev, says that in Tatarstan, on the basis of the open library DeepPavlov, they have developed and implemented “Lilia” — an intelligent assistant for public services. She can answer questions about COVID-19, register for vaccinations and take meter readings.
Arkady Sandler says that one of the most frequent options for the implementation of cognitive automation technologies is the creation of chatbots according to their subject areas, which, in fact, is the development of specialized virtual assistants. The main direction of work of states in voice tech is the implementation of AI into hotlines, summarizes Nikita Murenky from TORTU, adding that at the level of regional MFCs this is already happening in Russia right now.
Boris Mayatsky, a representative of the “Citywide contact center” product of the IT Department of Moscow Government, considers it more promising for urban tasks to develop individual solutions taking into account information security measures, although some services will be implemented using the skills of voice assistants, for example, Alice or Salut. Stepan Mitaki, head of the “My Moscow” mobile app, speaks in favor of the combined approach. There are situations in which a particular solution can better cope with the user’s task and people have a greater level of trust in it. In some situations, it is possible to help a person through integration. The latter is most relevant for obtaining reference information.
Experts from MegaFon see a high interest in voice assistants from the state and say that it has especially increased during the pandemic. The press service of the telecom operator adds that in government agencies, voice assistants are most often used to optimize the costs of routine processes: providing reference information, collecting data on metering devices, etc.
But there is also an opposite point of view: Oleg Kovpak, product director of ID R&D, does not yet see much interest from government agencies. “Despite the fact that such services would make it possible to automate the titanic volumes of requests from citizens, such implementations are still rare in Russia”, he explains.
The use of digital assistants is impossible without reliable protection systems. The industry is now exploring the possibilities and weaknesses of one of the options for such protection: voice biometrics (identification and authentication of users by voice). In mid-April, it emerged that the government intends to restart the collection of biometric data of citizens, including voice samples, for the Unified Biometric System (UBS). Experts see voice biometrics as a key to new business models for smart assistants, but they are cautious in assessing the timing of widespread adoption of the technology. Security remains the central issue, and the prospects and possibilities of interaction between business and the UBS are not yet clear.
Arkady Sandler emphasizes that for the use of voice biometrics in sensitive operations, sufficient legal security is required: either regulation, or a clear explanation to the user that he is acting at his own peril and risk.
Product Director of ID R&D Oleg Kovpak lists the factors necessary for accurate voice authentication: it should work on sufficiently short phrases, should not depend on the text of the phrase, and should be protected from possible attacks (for example, playing a command recorded on a dictaphone or a synthesized voice).
According to the expert, such technologies already exist. The UBS does not yet support such scenarios, although the legislative obstacles were removed at the end of last year, Oleg Kovpak says. In addition, some of these scenarios may be tied to voice processing on the device, rather than in the cloud. “I believe that the widespread use of biometrics depends not on the number of samples in the UBS or Sberbank database, but on the availability of services demanded by end customers,” the expert says. “The UBS and Sberbank have an excellent base for providing biometrics as a service to other companies, but it is not yet clear whether they will develop this potential”.
Nikita Murenky believes that it is better to combine voice biometrics with more familiar authentication methods. He explains this by the fact that “the biometric accuracy of the voice is in a fairly wide range of 90-99%”. In addition, using voice is inconvenient in crowded and noisy places, especially when it comes to confidential data, not to mention the fact that a voice sample can be stolen, and this is practiced by telephone fraudsters now.
Mail.ru Group told ICT.Moscow that they will consider the option of integration with the UBS, if it is useful for users, but they also focus on the development of their own technologies and solutions. Neuro.net co-founder Alexander Kuznetsov believes that the participation of the state and large players can accelerate the implementation of the technology, but expects that it will be actively used no earlier than next year.
The citywide contact center does not plan to implement voice identification in city services and make payments by voice. “Within the framework of the city contact center, the applicants as well as the legal and regulatory framework are not yet ready for this”, explains Boris Mayatsky, a representative of the “Citywide contact center” product of the IT Department of Moscow Government. “Calling the payment service by voice within the mobile app is, of course, a simple function, but the identification and acceptance of the payment will still be carried out using the usual methods”.
Roman Doronin from EORA emphasizes that voice biometrics systems must be resistant to different types of attacks. “And this complexity lies not in the amount of data for training models, but in the logic of the security system and the mechanic of human validation. Attackers do not even use a deepfake now, but simply pre-record phrases while they are talking to you, and can send them to the model’s input”, he explains. Dmitry Dyrmovsky, CEO of the Speech Technology Center group of companies, also sees prospects in the combination of voice and facial biometrics. In his opinion, it will be not only convenient, but also safe.
Alexander Kuznetsov from Neuro.net, on the other hand, says that using the so-called “voice fingerprint” can effectively combat fraud and spoofing (voice substitution or synthesis) and help collect a database of fraudsters’ voices.
Voice identification is not only a way to new services, but also a way to improve existing ones. For example, Anatoliy Kulbatskiy, product director of Marusya at Mail.ru Group, believes that there are a number of scenarios in which it is important to determine whether a child or an adult is talking to an assistant in order to form the correct set of content.
Biometrics will be developed and it will help distinguish users for accessing sensitive data: payments, mail, correspondence on social networks, adds Kulbatskiy. This is a normal evolutionary development of the assistant's functionality. Dmitry Dyrmovsky, CEO of the Speech Technology Center, also speaks about the ability of smart assistants to distinguish family members and differentiate access rights, forming relevant proposals. But he emphasizes that the main thing is to restrict the ability to perform financially significant transactions to a strictly defined circle of people.
Experts from one of the Russian IT companies, during a discussion about voice tech in Clubhouse in February, argued that domestic voice systems are in many ways more developed than foreign ones due to the limitations faced by developers in other countries. Experts with whom ICT.Moscow discussed this issue partly agree with this statement, although there is no complete unanimity on this point.
Arkady Sandler notes that his colleagues in other countries do not feel constrained when they comply with very predictable laws. “Where there is no clear regulation (not necessarily prohibitive, by the way), there is freedom of interpretation, and the tradition of interpretations by law enforcement agencies in the Russian Federation, to put it mildly, is opportunistically motivated and prone to bias”, the expert adds.
In April, the European Commission prepared rules for regulating artificial intelligence systems. In particular, the rules classify chatbots as “moderate risk” and require that users be clearly informed that they are not interacting with a person. Remote biometric identification systems are classified as “high risk”, which imposes even more restrictions and requirements on them.
Oleg Kovpak from ID R&D is convinced that in Russia there are rather tight restrictions, especially in terms of biometric personal data, and the latest changes signed by the president at the end of last year tighten them even more.
A representative of MTS speaks about the need to refine the existing standards. The company considers it important “to make point adjustments to the legislation on personal data so that companies have the opportunity to process pre-anonymised data, including those accumulated by the state, regulated by law”, and “at the legislative level, simplify the procedure for converting personal data into depersonalized information and allow the use of such information”.
Igor Kalinin from TWIN has the opposite point of view. He believes that in Russia bots are still minimally limited by regulators — and this gives developers more freedom. But the lack of legislation also indicates a lack of recognition. In his opinion, voice technologies do not yet seem to be a priority area for the government. Moreover, in order to build cooperation with state-owned companies, it is necessary to overcome many restrictions. But at the same time, he recalled that the Ministry of Digital Development intends to provide public services in a dialogue mode with a smart assistant, and, according to the expert, this plan can be implemented in the next few years.