Interview with Michael Khitrov, Speech Technology Center

With over 100 specialists in its R&D team, speech database access a possible growth area

Michael Khitrov, president of Speech Technology Center, was interviewed by Bill Meisel in late May.

Michael Khitrov, PhD (Engineering), graduated from the Leningrad Institute of Aviation Instrument Engineering and in 1980 earned a second degree from the Leningrad Electrotechnical Institute. In 1978, he became the head of the speech recognition department at the Dalnayai svayz Institute. In 1987, he was recognized as the best branch engineer in the country. In 1990, Dr. Khitrov started Speech Technology Center and saw the business grow from 3 people to over 200.

Meisel: Speech Technology Center (STC) has been in business since 1990, operating from your headquarters in Russia, so obviously the company has developed products the market accepts, including Russian speech recognition technology (SSN, March 2008, p. 20). Please outline your product line.

Khitrov: STC is a St. Petersburg, Russia-based developer and producer of a comprehensive speech-related product line that covers every point of clients’ interaction with speech, including quality digital recording; forensic labs; noise cancellation software and hardware; and biometric voice verification and identification. We are a leading provider of speech solutions in Russia, with Russian speech recognition and TTS solutions.

STC’s business model is built around the concept of a complete business cycle. New technologies are developed by the company’s R&D team of 100 specialists, based on the company’s assessment of market demand. These technologies are then taken to the market as ready-to-use solutions for clients. This unique and holistic model sees STC spearhead product development and production at all levels of industrial engineering, from development to manufacturing.

STC boasts a number of signature speech technologies designed specifically for the Russian language. We are a market leader in the dynamic Russian speech technology marketplace, including IVR solutions (Voice Navigator), speech-recognition-based products (Russograph), TTS solutions (automatic announcement system Rupor), and many others.

Globally, STC markets a number of language-independent speech solutions. A few products have proved extremely successful, such as the forensic audio lab IKAR LAB, the voice verification technology VoiceKey, the voice database technology VoiceNet (which includes a voice ID search option), and finally SoundCleaner, which in our opinion is technologically one of the best audio quality enhancement technologies in the world.

Please elaborate on your speech technology products and how you expect them to evolve.
STC has developed a comprehensive line of speech products, both language-independent ones for the global market and Russian-language-specific ones for the domestic market.
We are a knowledge-based company. We have built what I believe is one of the strongest R&D teams worldwide, with over 100 specialists in a variety of disciplines: mathematics, linguistics, phonetics, and programming. Over 20% of our R&D personnel hold PhDs. This R&D depth allows us to react to market demands promptly.

Currently we focus on a number of key segments.

First of all, voice biometrics. Recently voice has emerged as one of the leading biometric characteristics after fingerprints and face scanning. We believe that our voice verification solution Voice Key-Service will be successful both as a server-based and as a standalone product. VoiceKey works well in noisy environments and over the telephone channel. We are also looking into using VoiceKey as an embedded function in personal and network computers and other electronic devices.

I think that the key trend in the biometric market is the move toward multi-biometric systems that use several biometric characteristics in order to provide the most accurate results. The future belongs to systems that combine fingerprint, face scan, and voice. We are ready for this and look forward to participating in such programs.

There’s another segment of the biometric market that we are very active in. With the growing number of government and corporate entities that collect personal voice data, there is a growing market for storing and managing these databases. Our solution, which we call VoiceNet, is a voice ID database with a speaker identification option based on matching an incoming voice sample against an existing database. Key customers for such solutions are law enforcement agencies and call centers. In the near future, VoiceNet could be integrated into a country- or state-wide voice identification system. Due to the rapid growth of spending on law enforcement and the ever-increasing role of audio evidence, more and more countries want to build their own national voice databases, thereby creating a new market segment. We plan to concentrate on emerging markets, as we can understand them better than our colleagues from Western countries.

Another niche segment that we are working in is forensic audio labs. We consider STC one of the world’s leading providers in this segment. IKAR Lab is a well-known forensic audio HW/SW complex; however, we are working hard to improve the existing product through further automation and hence increased efficiency. Today we have a good level of automation, but an expert nevertheless has to spend a fair amount of time on each case. We are looking into decreasing this time. In 2009-10, we plan to launch an upgraded version of IKAR Lab that will cut the decision-making time by a factor of two to three.

There is one more trend that we are excited about. Steve Ballmer and Bill Gates of Microsoft keep talking about the concept of the natural interface and speech as a key communication tool for human interaction with electronic devices. I believe speech will never replace the existing keyboards and switches, but every person will have freedom of choice in how they communicate with machines. Some users will want voice control for a TV or heavy machinery. We want to be one of the providers of such mass solutions. We recently launched our first product that gives operators a choice between manual and speech command options (the video surveillance system AVIDIUS). We are going to expand this to the rest of our hardware.

Finally, I can’t neglect to mention the speech recognition and TTS technologies that we developed for the Russian language based on our proprietary engine. We are very proud of these technologies and look forward to providing our clients with the best Russian-language speech recognition and TTS products. We see IVR and dictation systems as the most attractive segments.

Where are most of your customers located?
Our customer base is spread over the 60 countries that we’ve been working with. Of course, our key clients are in Russia and the former USSR, where we hold a leadership position in a number of speech technology markets.

We have very strong relationships with our clients in Germany, Italy, Arab countries, the United States, and Canada. In the vast majority of foreign markets we work through a network of carefully selected dealers that can provide our clients support on the ground. Recently, we decided to pay special attention to the Latin American market and now have a sales force dedicated specifically to this promising region. We believe that, coming from Russia, we can understand these emerging markets better and provide them with better value for their money than our Western competitors.

Is Russia a significant market for speech technology today?
Russia is an extremely promising speech market. Currently the level of penetration of speech technologies is far behind the US and even Western and Eastern Europe. However, as we have already seen in other high-tech segments, Russia catches up with the rest of the world very quickly as far as technology usage is concerned. For instance, Russia is already Europe’s third biggest computer market. We expect the speech segment to grow rapidly. According to our projections, the Russian speech technology market will reach $250-300 million in 2-3 years’ time. Key growth segments are Russian-language IVR solutions, “in-car” mobile “offices” with embedded Russian speech recognition and TTS options, and voice biometrics. Finally, we foresee that the Russian dictation market will emerge within 1-2 years. We are very bullish on the Russian market and, as a market leader, expect significant growth of our business here.

Do you operate with partners? If so, can you describe typical relationships?
As I’ve mentioned before, we have sales in more than 60 countries worldwide, with dealer networks in more than 30 of them.

Working with local partners is our principal strategy. We believe it is extremely important for the end users of our products to be able to speak with someone in their local language who understands local business customs. This is particularly important in emerging markets such as Latin America and the Persian Gulf. I’m confident that, at present, we cannot be as effective in marketing our products as properly motivated local dealers.

Normally we make first contact with future dealers at various international exhibitions, conferences, and trade shows all over the world. Lately we have also started using the web to look for possible partners. After signing a contract, all partners receive our product training courses, products for demonstration, and marketing support.

One of the most interesting directions of foreign partnership for us is the OEM model. It’s a rather difficult but very profitable approach for companies such as STC.

Any final comments?
In conclusion, I’d like to note that the key factor in STC’s success over the years has been the depth of our R&D department. The knowledge of speech and all related subjects that we have accumulated over the years is the most important asset of the Speech Technology Center. We take speech technologies from the algorithms developed by our scientists all the way to the final products that we sell to our clients. We are a knowledge-based company and believe that this is our key competitive advantage. All of the products that we offer to the market were developed by STC from start to finish. This technological independence allows us to be flexible and provide clients with maximum customization. It also gives us an optimistic outlook on the future, as we can quickly respond to any market challenges.

Bill Meisel
Speech Strategy News
ISSN 1932-8214, June 2008