The system, called Sinhala Text To Speech (STTS), is a research project. This documentation briefly details the features and advantages of the STTS project. The system allows the user to enter Sinhala text messages and internally converts them into a pronunciation form; this happens after the end user chooses the particular option (convert to words). The system is thus capable of accepting characters in Sinhala words (Sinhala fonts) and turning them into sound waves, which can be output through a technical device (audio speakers). The user will be able to select the voice type he/she likes: there are three options, a child voice, a female voice and an adult (male) voice. By selecting that function the user can listen to the speech he/she likes most. The system will provide several benefits to its users. Users who cannot read Sinhala but can understand it verbally will be encouraged to use this system, because with it they can overcome that problem very easily. If somebody needs documents with Sinhala text read aloud, they can use this system to get that done. At present there are no other such systems for the Sinhala language.
INTRODUCTION
We use speech as the primary communication medium to talk between ourselves in our day-to-day life. However, when it comes to interacting with computers, apart from watching and carrying out actions, the majority of communication is achieved nowadays through reading the screen. This involves searching the internet, reading messages, eBooks, research papers and much more, and it can be very frustrating. Moreover, the visually impaired community in Sri Lanka faces much trouble interacting with computers, since the right tool is not available for convenient use. As an appropriate solution to this problem, this project proposes an efficient tool for Text-To-Speech conversion accommodating speech in the native language.
What is text-to-speech?
Not every person can read text when it is shown on the display screen or on paper. This may be because the individual is partially sighted, or because they are not literate. These people can be helped by producing speech rather than by printing or displaying text, using a Text-to-Speech (TTS) system to produce the speech for the given words. A Text-To-Speech (TTS) system takes written text (from a web page, word editor, clipboard, etc.) as the input and converts it to an audible format so you can hear what is in the text. It identifies and reads aloud what is being displayed on the screen. With a TTS application, you can listen to computer text in place of reading it. That means you can listen to your messages and eBooks while you do something else, which results in saving your precious time. Aside from saving time and empowering the visually impaired society, TTS can also be used to overcome the literacy hurdle of the common masses, improve the possibilities of better man-machine interaction through online newspaper reading from the internet, and boost other information systems such as learning manuals for students, IVR (Interactive Voice Response) systems, robotic weather forecasting systems and so on [1][2].
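To make the idea concrete, the following is a minimal usage sketch using the freely available pyttsx3 Python library (an offline wrapper around the platform's speech engine; the library choice, the rate value and the spoken sentence are illustrative assumptions, not part of the STTS system itself):

```python
# Minimal TTS usage sketch with the pyttsx3 library. Which voices are
# available depends on the platform's installed speech engines.
import pyttsx3

engine = pyttsx3.init()                 # pick the default platform engine
engine.setProperty('rate', 150)         # speaking rate in words per minute
for voice in engine.getProperty('voices'):
    print(voice.id)                     # list the locally installed voices
engine.say("Hello, this text is spoken aloud instead of being read.")
engine.runAndWait()                     # block until the utterance finishes
```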
What is "Sinhala Wording To Speech"?
"Sinhala Text message To Talk" is the machine I picked as my last research project. As being a post graduate university student I selected a study project that will convert the Sinhala input wording into a verbal form.
Actually, the term "Text-To-Speech" (TTS) refers to the conversion of input text into a spoken utterance. The input is Sinhala text, which may comprise a number of words, sentences, paragraphs, numbers and abbreviations. The TTS engine should identify it without ambiguity and create the corresponding speech sound wave with satisfactory quality. The output should be understandable to the average listener without much effort. This means that the output should be made as close as possible to natural speech quality.
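For example, before any sound can be generated the engine must normalize numbers and abbreviations into plain words. The sketch below illustrates this step with English stand-ins for readability; the abbreviation table and digit words are illustrative assumptions, and a real Sinhala engine would hold Sinhala-script entries:

```python
import re

# Hypothetical abbreviation table; entries here are English stand-ins.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "No.": "Number"}
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def normalize(text: str) -> str:
    for abbr, expansion in ABBREVIATIONS.items():
        text = text.replace(abbr, expansion)
    # Spell each digit out as a word (naive; real systems group digits).
    return re.sub(r"\d", lambda m: DIGIT_WORDS[int(m.group())], text)

print(normalize("Dr. Silva lives at No. 7 Temple St."))
# -> Doctor Silva lives at Number seven Temple Street
```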
Speech is produced when air is forced from the lungs through the vocal cords (glottis) and along the vocal tract. Speech can be decomposed into a rapidly varying excitation signal and a slowly varying filter. The envelope of the power spectrum contains the vocal tract information.
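This source-filter view can be illustrated with linear predictive coding (LPC), which estimates the slowly varying filter from a frame of samples. The following is a textbook autocorrelation/Levinson-Durbin sketch, not code from this project; the recovered coefficients describe the spectral envelope that carries the vocal tract information:

```python
import numpy as np

def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """Estimate the prediction-error (vocal tract) filter via Levinson-Durbin."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]  # autocorrelation
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]                # update old coefficients
        a[i] = k
        err *= (1.0 - k * k)                               # residual (source) energy
    return a

# Demo: white-noise excitation (source) driven through a known 2-pole
# resonator (filter); LPC recovers the filter part from the waveform alone.
rng = np.random.default_rng(0)
e = rng.standard_normal(400)          # rapidly varying excitation
x = np.zeros(400)
for n in range(2, 400):
    x[n] = e[n] + 1.3 * x[n - 1] - 0.8 * x[n - 2]
print(lpc(x, 2))                      # approximately [1.0, -1.3, 0.8]
```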
The verbal form of the input should be understandable to the recipient. This means that the output should be produced as close as possible to the natural human voice. The system will carry out a few main features. For example, after entering the text the user will be capable of selecting one of the voice types: a female voice, a male voice or a child voice. The user can also vary the speed of the speech.
Actually, my project will provide a few main advantages to the users who plan to utilize it.
Below I have presented the basic architecture of the project.
Figure 1.2: Basic architecture of the project. Sinhala text input, together with the voice and speed selection, is fed into the conversion process, which produces the Sinhala voice output.
1.3 Why is "Sinhala Text To Speech" needed?
Since most commercial computer systems and applications are developed in English, the use and advantages of those systems are limited to people with English literacy. Due to that fact, the majority of the world cannot take advantage of such applications. This circumstance applies to Sri Lanka as well. Though Sri Lankans have high native-language literacy, computer and English-language literacy in suburban areas is rather low. Therefore the benefits and advantages that can be gained through computer and information systems are being kept from people in rural areas. One way to overcome this is localization. "Sinhala Text To Speech" will become a strong tool to boost software localization and to reduce the gap between computers and people.
AIMS AND OBJECTIVES
The main purpose of the project is to develop a fully featured Sinhala Text To Speech system that provides a speech output similar to a real human voice while preserving the native prosodic characteristics of the Sinhala language. The system will have a female voice, which is a major requirement in today's localization software industry. It will act as the primary platform for Sinhala Text To Speech, and developers will have the benefit of building end-user applications on top of it. This will benefit the visually impaired population and people with low IT literacy in Sri Lanka by permitting convenient access to information such as reading email messages, eBooks, website contents, documents and learning tutorials. An end-user Windows application will be developed, and it will act as a document reader as well as a screen reader.
To create a system that is able to read text in Sinhala format and convert it into verbal (Sinhala) form. The system will also be capable of changing the sound waves: the user will be able to select the voice type according to his/her preference. There will be three voice selections: a female voice, a male voice and a child's voice. The user can also change the speed of the voice; if somebody needs to hear low-speed or high-speed speech, he/she can adjust it according to their requirements.
SPECIFIC RESEARCH OBJECTIVES
Produce a verbal format for the input Sinhala text.
Input Sinhala text, which may be typed by the user or taken from a given text file, will be converted into sound waves, and the result is then output through speakers. Visually handicapped people will thus be among the most important beneficiaries of the Sinhala Text To Speech system. Also, undergraduates and researchers who need to consult many references can send the text to the system and simply listen to get what they need.
The output should be more like natural speech.
The human voice is a complex acoustic signal, generated by an air stream expelled through the oral cavity, the nose or both. Important characteristics of the speech sound are rate, silence, accentuation and the level of energy output. The air stream is precisely controlled by the tongue and lips with the aid of the other articulators in the vocal system. Many variations of the speech signal are produced by the human vocal system in order to convey meaning and sentiment to the receiver, who then interprets the meaning. Speech also includes many other characteristics which the receiver's hearing system uses to identify what is being said.
Identify an efficient way of translating Sinhala text into verbal form.
By developing this system we would be able to identify and propose a suitable algorithm that can be used to translate Sinhala text into verbal form in a fast and efficient manner.
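Since Sinhala orthography is largely phonemic, one natural candidate is a rule-based grapheme-to-phoneme mapping. The sketch below shows the idea for a handful of characters; the mapping table is an illustrative assumption, not the actual rule set proposed by this project:

```python
# Rule-based grapheme-to-phoneme sketch for a few Sinhala characters.
# A complete system needs the full alphabet plus rules for all vowel
# signs (pili) and consonant conjuncts.
G2P = {
    "\u0d85": "a",    # independent vowel a
    "\u0d9a": "ka",   # consonant ka with its inherent vowel
    "\u0db8": "ma",   # consonant ma with its inherent vowel
}
AELA_PILLA = "\u0dcf"  # vowel sign aa: lengthens the preceding inherent vowel

def to_phonemes(text: str) -> list[str]:
    phones = []
    for ch in text:
        if ch == AELA_PILLA and phones:
            phones[-1] = phones[-1][:-1] + "a:"  # replace inherent vowel with aa
        else:
            phones.append(G2P.get(ch, ch))       # pass unknown characters through
    return phones

print(to_phonemes("\u0d85\u0db8\u0dcf"))  # -> ['a', 'ma:']
```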
Control the speech speed and the type of the voice (e.g. male, female, child, etc.).
Users will be able to select the quality of the sound wave that they want, and they will also be allowed to reset the speed of the output as they need. People who wish to learn Sinhala as a second language can practice elocution properly by changing the speed (decreasing and increasing it), which will improve their listening capacities (a signal-level sketch of such speed variation is given after this objective).
Small kids can be encouraged to learn the language by varying the speed and voice types.
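At the signal level, a simple (if naive) way to vary playback speed is to resample the waveform, as sketched below. Note that naive resampling also shifts the pitch, so a production system would use proper time-scale modification instead; the factor values here are illustrative:

```python
import numpy as np

def change_speed(samples: np.ndarray, factor: float) -> np.ndarray:
    """Resample so the audio plays `factor` times faster (naive: pitch shifts too)."""
    n_out = int(len(samples) / factor)
    old_idx = np.linspace(0, len(samples) - 1, n_out)
    return np.interp(old_idx, np.arange(len(samples)), samples)

tone = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # 1 s of 220 Hz at 16 kHz
slow = change_speed(tone, 0.75)   # learner-friendly slower playback
fast = change_speed(tone, 1.5)    # quick skim-listening
print(len(slow), len(fast))       # 21333 and 10666 samples
```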
Propose ways in which the current system can be extended to meet future needs.
This system only provides the basic functions. It can be enhanced further in order to satisfy the changing requirements of the users. For instance, it could be embedded into toys and thereby used to improve children's listening and elocution abilities, broadening their speaking capacity.
RELEVANCE OF THE PROJECT
The idea of creating a Sinhala Text To Speech (STTS) engine began when I was considering the opportunities available for Sinhala-speaking users to grasp the benefits of Information and Communication Technology (ICT). In Sri Lanka more than 75% of society speaks Sinhala, but it is very rare to find Sinhala software or Sinhala material regarding ICT in the market. This directly affects the development of ICT in Sri Lanka.
At present a few Sinhala text-to-speech software packages are available, but they suffer from problems such as poor sound quality, font schemas, pronunciation, etc. Because of these problems developers are reluctant to use those STTS engines in their applications. My focus is on developing an engine that can convert Sinhala text in digitized form to Sinhala pronunciation in a problem-free manner. Several applications can then be developed on top of this engine.
Some applications where STTS can be used
Document reader. Reads an already digitized document (i.e. e-mails, e-books, newspapers, etc.) or a conventional document scanned and processed through an optical character recognizer (OCR).
Aid to handicapped persons. The vision- or voice-impaired community can use computer-aided devices to communicate with the world. A vision-impaired person can be informed by an STTS system, while a voice-impaired person can communicate with others given a keypad and an STTS system.
Talking books & toys. Producing talking books & toys will boost the toy market and education.
Help assistant. Develop a help assistant that talks in Sinhala, like the MS Office help assistant.
Automated news casting. An entirely new variety of television networks with programs hosted by computer-generated characters becomes possible.
Sinhala SMS reader. SMS messages contain several abbreviations. A system that reads those messages aloud can help the receivers.
Language education. A high-quality TTS system coupled with a computer-aided device can be used as a tool for learning a new language. Such tools can help learners improve very quickly since they have access to the correct pronunciation whenever needed.
Travelers' guide. A system located inside a vehicle or mobile device that announces the current location and other relevant information, coupled with GPRS.
Alert systems. Alarm systems can be coupled with a TTS system to attract the attention of the people being alerted, since humans are accustomed to attention being drawn through voice.
Especially countries like Sri Lanka, which are still struggling to harvest the benefits of ICT, can use a Sinhala TTS engine as a solution to convey information effectively. Users who can get the required information in their native language (i.e. by converting the text to native-language speech) will effortlessly see the achievable benefits and will be encouraged to use it more frequently.
Therefore the development of a TTS engine for Sinhala will bring personal benefits (e.g. help for the handicapped, language learning) from a social point of view and definitely financial gain in economic terms (e.g. virtual television networks, toy production) for the users.
RESEARCH METHODOLOGY
The system has been developed using the agile software development method. We aimed to develop the solution through short-term goals, which allows a sense of fulfillment; having short-term goals makes the work easier. Project reviews were a very useful and powerful way of adding a continuous improvement mechanism. The project supervisors were consulted frequently for reviews and feedback in order to make the right decisions, clear misunderstandings and carry out future developments effectively and efficiently. Good planning and meeting follow-up were crucial to making these reviews a success.
BACKGROUND AND LITERATURE REVIEW
"Text to speech "is very popular area in computer research field. There are many research held upon this area. Most of research bottom part on "how to build up more natural conversation for given text message ". You can find freely available text message to speech program available on the globe. But most of software develops for some common dialect like English, Japanese, Chinese dialects. Even some software companies deliver "text to speech development tools "for British words as well. "Microsoft Talk SDK tool equipment" is one of the cases for freely sent out tool kit developed by Microsoft for British language.
Nowadays, several universities and research labs conduct research on text to speech. Carnegie Mellon University has placed a research focus on text to speech (TTS); they offer open-source speech software, toolkits, related publications and important guidance to undergraduate students and software developers alike. The TCTS Lab is also researching this area; they introduced a simple but general functional diagram of a TTS system [39].
Figure: A simple but general functional diagram of a TTS system (Image credit: Thierry Dutoit).
Before the project initiation, basic research was done to become familiar with TTS systems and to gather information about existing systems of this kind. Later a thorough literature study was done in the fields of the Sinhala language and its characteristics, Festival and Festvox, generic TTS architecture, building new synthetic voices, Festival and Windows integration, and how to improve existing voices.
History of Speech Synthesizing
A historical review is useful to understand how the current systems work and how they have developed into their present form. The history of synthesized speech, from mechanical synthesis to today's high-quality synthesizers, along with some milestones in synthesis-related techniques, is reviewed under History of Speech Synthesizing.
Efforts to create synthetic speech were made over two hundred years ago. In 1779, the Russian professor Christian Kratzenstein described the physiological differences between five long vowels (/a/, /e/, /i/, /o/, and /u/) and produced apparatus to reproduce them: acoustic resonators similar to the human vocal tract, activated with vibrating reeds.
In 1791, the "Acoustic-Mechanical Speech Machine" was unveiled by Wolfgang von Kempelen; it produced single sounds and combinations of sounds. He described his studies on speech production and his experiments with the speech machine in his publications. A pressure chamber for the lungs, a vibrating reed acting as the vocal cords, and a leather tube for the vocal tract were the key components of his machine, and he was able to produce different vowel sounds by manipulating the shape of the leather tube. Consonants were created by four different constricted passages controlled by the fingers, and a model of the vocal tract including a hinged tongue and movable lips was used for plosive sounds.
In the mid-1800s, Charles Wheatstone implemented a version of Kempelen's speaking machine that was capable of producing vowels, consonant sounds, some sound combinations and even full words. Vowels were produced using a vibrating reed with all passages closed, and consonants, including nasals, were produced with turbulent flow through an appropriate passage with the reed off.
In the late 1800s, Alexander Graham Bell and his father built a similar kind of machine without any significant success. Bell also manipulated the vocal tract of his dog by hand, holding it between his legs and making it growl, to produce speech-like sounds.
No significant advancements in research and experiments with mechanical and semi-electrical analogs of the vocal system were made until the 1960s [38].
The first fully electrical synthesis device was introduced by Stewart in 1922 [17]. For the excitation it had a buzzer, followed by two resonant circuits to model the acoustic resonances of the vocal tract. This machine was able to produce single static vowel sounds with the two lowest formants, but it could not produce any consonants or connected utterances. A similar kind of synthesizer was made by Wagner [27]. This device consisted of four electrical resonators connected in parallel, also excited by a buzz-like source, and the four resonator outputs were combined in the proper amplitudes to produce vowel spectra. In 1932, the researchers Obata and Teshima discovered the third formant in vowels [28]. The first three formants are generally considered to be enough for intelligible synthetic speech.
The first device that may be regarded as a speech synthesizer was the VODER (Voice Operating DEmonstratoR) presented by Homer Dudley at the New York World's Fair in 1939 [17][27][29]. The VODER was inspired by the VOCODER (Voice CODER), developed at Bell Laboratories in the mid-thirties mainly for communication purposes. The VOCODER was built as a voice transmission device for low-band telephone lines: it analyzed wideband speech, turned it into slowly varying control signals, sent those over the low-band phone line, and finally transformed those signals back into the original speech [36]. The VODER consisted of touch-sensitive switches to control the tone and a pedal to control the fundamental frequency.
After the demonstration of the VODER showed the ability of a machine to generate human voice intelligibly, people became more interested in speech synthesis. In 1951, Franklin Cooper and his associates developed a Pattern Playback synthesizer at the Haskins Laboratories [17][29]. Its technique was to reconvert recorded spectrogram patterns into sounds, either in original or modified form. The spectrogram patterns were stored optically on transparent belts.
The formant synthesizer was presented by Walter Lawrence in 1953 [17] and was named PAT (Parametric Artificial Talker). It consisted of three electronic formant resonators connected in parallel. As an input signal, either a pulse or a noise was used, and it could control the three formant frequencies, voicing amplitude, fundamental frequency, and noise amplitude. At approximately the same time, Gunnar Fant released the first cascade formant synthesizer, named OVE I (Orator Verbis Electris). In 1962, Fant and Martony launched an improved synthesizer known as OVE II, which contained separate parts to model the transfer function of the vocal tract for vowels, nasals and obstruent consonants. The OVE projects were further improved, and as a result OVE III and GLOVE were created at the Kungliga Tekniska Högskolan (KTH), Sweden; the present commercial Infovox system is essentially descended from these [30][31][32].
There was a debate between PAT and OVE about whether the transfer function of the acoustic tube should be modeled in parallel or in cascade. John Holmes created his parallel formant synthesizer in 1972 after studying these synthesizers for a few years. The voice synthesis was so good that an average listener could not tell the difference between the synthesized voice and a natural one [17]. About a year later he released a parallel formant synthesizer developed with the JSRU (Joint Speech Research Unit) [33].
The first articulatory synthesizer was released in 1958 by George Rosen at the Massachusetts Institute of Technology (M.I.T.) [17]. The DAVO (Dynamic Analog of the VOcal tract) was controlled by tape recordings of control signals created by hand. The first experiments with Linear Predictive Coding (LPC) were made in the mid-1960s [28].
The first full text-to-speech system for English was developed at the Electrotechnical Laboratory, Japan, in 1968 by Noriko Umeda and his companions [17]. The synthesis was based on an articulatory model and included a syntactic analysis module with some sophisticated heuristics. Though the system was intelligible, it was still monotonic.
The MITalk laboratory text-to-speech system was developed at M.I.T. by Allen, Hunnicutt, and Klatt in 1979. The system was later also used in the Telesensory Systems Inc. (TSI) commercial TTS system with some modifications [17][34]. Dennis Klatt launched his famous Klattalk system two years later, which used a new sophisticated voicing source, described in more detail in [17]. The technology used in the MITalk and Klattalk systems forms the foundation for many synthesis systems today, such as DECtalk and Prose-2000.
In 1976, the first reading aid with an optical scanner was presented by Kurzweil. The system was very helpful for blind people and could read multi-font written text. Though it was useful, its price was too high for average customers, but it was used in libraries and service centers for visually impaired people [17].
A considerable number of commercial text-to-speech systems were launched in the late 1970s and early 1980s [17]. In 1978 Richard Gagnon presented the inexpensive Votrax-based Type-n-Talk system. Two years later, in 1980, Texas Instruments introduced the linear prediction coding (LPC) based Speak-n-Spell synthesizer, based on a low-cost linear prediction synthesis chip (TMS-5100). In 1982 Street Electronics unveiled the Echo low-cost diphone synthesizer, which was based on a newer version of the same chip as in Speak-n-Spell (TMS-5220). At the same time Speech Plus Inc. launched the Prose-2000 text-to-speech system. A year later, the first commercial versions of the famous DECtalk and the Infovox SA-101 synthesizer were created [17].
One of the modern techniques applied recently in speech synthesis is hidden Markov models (HMMs). They have been applied to speech recognition since the late 1970s, and for two decades they have been used in speech synthesis systems. A hidden Markov model is a collection of states connected by transitions, with two sets of probabilities in each: a transition probability, which gives the probability of taking the transition, and an output probability density function (pdf), which defines the conditional probability of emitting each output symbol from a finite alphabet, given that the transition is taken [35].
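To make these two probability sets concrete, the following minimal sketch samples a symbol sequence from a toy discrete HMM; the matrices are illustrative values, not taken from any system reviewed here:

```python
import numpy as np

# Two probability sets defining a toy 2-state, 2-symbol HMM.
A = np.array([[0.7, 0.3],   # A[i][j]: transition probability state i -> j
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],   # B[j][k]: probability state j emits symbol k
              [0.2, 0.8]])

rng = np.random.default_rng(0)
state, outputs = 0, []
for _ in range(10):
    outputs.append(rng.choice(len(B[state]), p=B[state]))  # emit via output pdf
    state = rng.choice(len(A[state]), p=A[state])          # take a transition
print(outputs)   # a sampled symbol sequence from the hidden state path
```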
Neural networks have also been used in speech synthesis for about ten years, yet the ways of using them are still not fully explored. Like HMMs, neural network technology can be applied to speech synthesis in a promising manner [28].
Fig. 6.1. Some milestones in speech synthesis [38]
6.1.1 History of Finnish Speech Synthesis
In the past, compared to English, the number of users was quite small and the development process time-consuming and expensive, even though the Finnish text processing scheme is simple and its correspondence to pronunciation is at a high level. The demand has increased with new multimedia and telecommunication applications.
In 1977, the first Finnish speech synthesizer, SYNTE2, was introduced at Tampere University of Technology. It was the first microprocessor-based synthesis system and the first portable TTS system in the world. Five years later an improved SYNTE3 synthesizer