Chapter 1,2,3
In partial fulfillment
Of the requirements for the degree
Bachelor of Science in Information Technology
By
November 2018
Chapter 1
Introduction
Have you ever thought of listening to your books, articles, and other documents instead
of reading them? Text Speaker reads your text documents aloud on your PC and converts them
to audio files in MP3 or WAV format. Listen to the audio files on your MP3 player, iPod,
iPhone, and mobile phone while you do other tasks at home or at work. Text Speaker offers a
The continuous growth of people’s music libraries requires more advanced ways of
computing playlists through algorithms that match tracks to the user’s preferences. Several
approaches have been proposed to enhance the user’s listening experience. Playing
background music while reading may open up new learning possibilities. For
centuries, educators have used music as a learning tool that connects the concept to be acquired
traditional print book (or other printed material such as, for example, a magazine, newspaper,
and so forth) that can be read by using a personal computer or by using an E-book reader. Unlike
traditional paper books, while adding powerful electronic features for note taking, fast
navigation, and keyword searches. However, such actions, irrespective of whether or not they
are performed on a PC, handheld computer, or E-book reader, generally require the user to read
the text from a display. Thus, the use of an E-book generally requires the user to focus his or her
visual attention on a display to read the text content (e.g., book, magazine, newspaper, and so
forth) of the E-book. Moreover, reading of an E-book is generally performed without any music
playing in the background, particularly without any music playing from the E-book itself. The
same is true for other types of hand-held devices such as personal digital assistants (PDAs) and
so forth.
Speech is the most used and natural way for people to communicate.
From the beginning of man-machine interface research, speech has been one of the most
desired mediums for interacting with computers. Therefore, speech recognition and text-to-speech
capability have been studied to make communication with machines more humanlike. In
order to increase the naturalness of oral communication between humans and machines, all
speech aspects must be involved. Speech does not only transmit ideas and concepts, but also
carries information about the attitude, emotion, and individuality of the speaker
(Y. Chen et al., 2003). Speaker identity,
Audiobooks have been in use since e-books were first released. Parents and their
children have used audiobooks as an aid to reading. This study focuses on converting e-books
in PDF, TXT, Docs, and ZIP files into audiobooks for users who want to listen rather than read. It
would be desirable and highly advantageous to have a hand-held device that allows a user to
Objectives
Our intention is to provide a new smartphone application that offers easier reading
and a pleasant listening experience at the same time, helping users to study while
doing other tasks at home or in school for their schoolwork, generally relates to hand held
The students will be the beneficiaries of this application; they will be able to learn proper
exercise. It can also increase the usability and productivity of Google Drive. The
application improves the listening experience. Users do not have to download a video, PDF, TXT,
Docs, or ZIP file in order to access it. With this application, they can listen to long articles with a soft
background track. When converting a document to MP3 format, the user can combine speech
with music. The file formats supported for the background music are MP3, WAV, AIFF, WMA,
The results of this application may help users by giving them an easy and
Lastly, the development of this study will also benefit future researchers.
They might think of making this system more complex, which may result in the development of
another system.
The study was conducted from October 2018 to December 2018 at My Value Max Inc.
read it while listening to the text voice that reads the text file. It will continue playing while in
Sleep Mode. The player can also modify the way a voice speaks by speeding up or slowing
down the speech, changing the pitch, and changing the volume.
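The voice controls described above can be sketched as a small settings holder. This is a minimal illustration only: the class name `VoiceSettings`, its methods, and the value ranges are assumptions made for this example, since the study does not state the actual limits or the speech engine used. On Android, values like these could be passed to `TextToSpeech.setSpeechRate(float)` and `TextToSpeech.setPitch(float)`, but that pairing is also an assumption.

```java
/** Hypothetical holder for the voice controls: speed, pitch, and volume. */
final class VoiceSettings {
    // Assumed limits for illustration; the source does not specify them.
    static final float MIN_FACTOR = 0.5f;
    static final float MAX_FACTOR = 2.0f;

    private float speechRate = 1.0f; // 1.0 = normal speed
    private float pitch = 1.0f;      // 1.0 = normal pitch
    private float volume = 1.0f;     // 0.0 = silent, 1.0 = full volume

    // Keeps a requested value inside the allowed range.
    private static float clamp(float value, float min, float max) {
        return Math.max(min, Math.min(max, value));
    }

    void setSpeechRate(float rate) { speechRate = clamp(rate, MIN_FACTOR, MAX_FACTOR); }
    void setPitch(float newPitch)  { pitch = clamp(newPitch, MIN_FACTOR, MAX_FACTOR); }
    void setVolume(float newVolume) { volume = clamp(newVolume, 0.0f, 1.0f); }

    float getSpeechRate() { return speechRate; }
    float getPitch()      { return pitch; }
    float getVolume()     { return volume; }

    public static void main(String[] args) {
        VoiceSettings settings = new VoiceSettings();
        settings.setSpeechRate(1.5f); // speak faster than normal
        settings.setPitch(0.8f);      // slightly lower voice
        settings.setVolume(0.6f);     // quieter playback
        System.out.println(settings.getSpeechRate() + " " + settings.getPitch()
                + " " + settings.getVolume());
    }
}
```

Clamping out-of-range requests, rather than rejecting them, keeps a slider-style user interface simple: any position the user picks maps to a valid setting.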
The user can also play background music while the application reads the document
aloud, including free classical music by composers such as Mozart, Beethoven, Bach, and Chopin. The
user can also enable the option "Add background music to the output file". With the Test button,
the user can listen to how the audio file sounds. The user can adjust the volume of the background
Definitions of Terms
Audio Files refers to a computer file that contains digitized audio either in the Compact Disc
E-Book Reader refers to a handheld computer device, such as Amazon's Kindle, Barnes and
Noble's NOOK, or Apple's iPad, that makes it possible for books in digital form to be viewed and
read by users.
Human Sounding Voices refers to voice (or vocalization), the sound produced by humans and
other vertebrates using the lungs and the vocal folds in the larynx, or voice box. Voice is not
always produced as speech, however. Infants babble and coo; animals bark, moo, whinny, growl,
PDA is short for personal digital assistant, a handheld device that combines computing,
telephone/fax, Internet, and networking features. A typical PDA can function as a cellular phone,
fax sender, Web browser and personal organizer. PDAs may also be referred to as
WAV refers to an audio file format created by Microsoft that has become a standard PC audio
file format for everything from system and game sounds to CD-quality audio. A Wave file is
Text Speaker refers to your own text and sample some of the languages and voices that we
offer for speech-enabling websites, giving a voice to your online documents and mobile apps,
Text to Speech, abbreviated as TTS, is a form of speech synthesis that converts text into spoken
voice output. Text to speech systems were first developed to aid the visually impaired by
offering a computer-generated spoken voice that would "read" text to the user. TTS should not
Chapter 2
According to Jianlei Xie et al. (2002), an E-book can be provided that
comprises a memory device, a text-to-speech (TTS) module, and a music module. The memory
device stores files. The files include text and music. The TTS module synthesizes speech
corresponding to the text. The music module plays back the music. The at least one speaker
defined mobile learning as the intersection of mobile computing (the application of small,
portable, and wireless computing and communications devices) and e-learning (learning
facilitated and supported through the use of information and communications technology). He
predicted that mobile learning would one day provide learning that was truly independent of time
and place and facilitated by portable computers capable of providing rich interactivity, total
connectivity, and powerful processing. In May 2005, Ellen Wagner, senior director of Global
Education Solutions at Macromedia, proclaimed that the mobile revolution had finally arrived.
MP3 players, portable game devices, handhelds, tablets, and laptops abound. No demographic is
immune from this phenomenon. From toddlers to seniors, people are increasingly connected and
are digitally communicating with each other in ways that would have been impossible only a few
years ago.
Music capabilities allow an E-book user to enjoy digital music output from the E-book.
TTS capabilities allow an E-book user to listen to synthesized text output from the E-book. The
combination of music and TTS allows an E-book user to listen to the text along with background
music.
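Combining synthesized speech with background music can be sketched as a sample-by-sample mix. The sketch below is an assumption-laden illustration, not the method of the cited E-book design: the class `BackgroundMixer`, the method `mix`, and the `musicGain` parameter are hypothetical names, and both inputs are assumed to be 16-bit PCM samples with the same sample rate and channel layout.

```java
/** Hypothetical mixer that lays a music track under a speech track. */
final class BackgroundMixer {
    /**
     * Mixes speech with music scaled by musicGain (0.0 = no music, 1.0 = full level).
     * The music loops if it is shorter than the speech; the output has the speech's length.
     */
    static short[] mix(short[] speech, short[] music, double musicGain) {
        short[] out = new short[speech.length];
        for (int i = 0; i < speech.length; i++) {
            int musicSample = music.length == 0 ? 0 : music[i % music.length];
            long sum = Math.round(speech[i] + musicGain * musicSample);
            // Clip to the 16-bit range so loud passages do not wrap around.
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
        }
        return out;
    }

    public static void main(String[] args) {
        short[] speech = {1000, 2000, 3000, 4000};
        short[] music = {200, -200};
        short[] mixed = mix(speech, music, 0.5);
        System.out.println(java.util.Arrays.toString(mixed));
    }
}
```

A gain below 1.0 keeps the music quieter than the voice, which matches the idea of a soft background track under the narration.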
The majority of the evidence tends to support background music because of its positive
effects. Cool, Yarbrough, Patton, Runde, and Keith (1994) conducted a study that found
radio noise was generally considered somewhat helpful to students while studying. It kept
them focused and on task. Howard Gardner, a Harvard graduate, wrote Frames of Mind in the
early 1980s. It has since become one of the most influential books in education. Gardner
believes that music creates a positive and relaxing environment that allows for sensory
integration to take place and improves concentration. Sensory integration is essential for
establishing long-term memory. He has also seen background music successfully used to mask
outside traffic sounds, release stress before an exam, and reinforce subject matter (Campbell,
1997). Jensen (1998) reported that music can deliver as much as sixty percent more content in
five percent of the time usually taken to deliver the same materials.
According to Bossard, L. (2008), several solutions already
use intelligent playlists embedded in music players installed on computers.
There are also online solutions, the most popular of which is last.fm,
which acts as a personalized radio station that plays preferred music. On the
other hand, it does not allow playback of a specific track. There are also
other solutions, like the Genius function of iTunes or the Music
Explorer; both use the user’s music collection to generate playlists. The
biggest disadvantage of the latter solution is that the user can use only
tracks that he/she already has on his/her PC to generate playlists. Of
course, this greatly limits the power of the algorithm.
space), where the closeness of tracks is approximately proportional to
their similarity. Seven million songs currently appear in the database, but only 500,000
of them have enough user statistics to be mapped in the graph. Using
this simplified and computationally efficient way of finding similar tracks,
several applications can explore new ways of computing playlists. Most of
them offer support in playlist generation, but none of them also provides the tracks
to be played. This could be seen as a disadvantage because not all
people possess all tracks that are suggested by the space.
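Finding similar tracks in such a space can be sketched as a nearest-neighbor lookup. The example below is only an illustration under stated assumptions: the class `TrackSpace`, the nested `Track` type, the coordinates, and the use of Euclidean distance are all hypothetical, since the cited work's actual mapping and distance measure are not given here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical similarity space in which closer tracks are more similar. */
final class TrackSpace {
    /** A track with assumed coordinates in the space. */
    static final class Track {
        final String title;
        final double[] position;

        Track(String title, double... position) {
            this.title = title;
            this.position = position;
        }
    }

    /** Euclidean distance between two tracks with the same number of coordinates. */
    static double distance(Track a, Track b) {
        double sum = 0.0;
        for (int i = 0; i < a.position.length; i++) {
            double diff = a.position[i] - b.position[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns up to k candidates closest to the seed track, nearest first. */
    static List<Track> playlist(Track seed, List<Track> candidates, int k) {
        List<Track> sorted = new ArrayList<>(candidates);
        sorted.remove(seed); // the seed itself is not part of the suggestions
        sorted.sort(Comparator.comparingDouble((Track t) -> distance(seed, t)));
        int size = Math.max(0, Math.min(k, sorted.size()));
        return new ArrayList<>(sorted.subList(0, size));
    }

    public static void main(String[] args) {
        Track seed = new Track("Seed", 0.0, 0.0);
        List<Track> all = Arrays.asList(seed,
                new Track("Near", 1.0, 0.0),
                new Track("Far", 5.0, 5.0),
                new Track("Middle", 0.0, 2.0));
        for (Track t : playlist(seed, all, 2)) {
            System.out.println(t.title);
        }
    }
}
```

Sorting every candidate is acceptable for a small library; a database with hundreds of thousands of mapped tracks would more likely need a spatial index, which is outside the scope of this sketch.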
aligned streams of phones and phonemes to model a speaker’s specific pronunciation. An
automatic speech recognition system is used to generate the phoneme stream and an open-loop phone
recognizer to generate a phone stream. The phoneme and phone streams are aligned at the frame
level, and conditional probabilities of a phone, given a phoneme, are estimated using co-
occurrence counts. A likelihood detector is then applied to these probabilities for the speaker
detection task. This approach achieves a relatively high accuracy in comparison with other
phonetic methods in the SuperSID project at the Johns Hopkins 2002 Workshop [114] [90].
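The estimation step described above, a conditional probability of a phone given a phoneme computed from co-occurrence counts, can be sketched as follows. This is a minimal illustration and not the cited system: the class `PronunciationModel`, its method names, and the label strings are hypothetical, and the likelihood detector applied afterwards is not covered.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-speaker model built from frame-aligned phoneme and phone labels. */
final class PronunciationModel {
    // counts.get(phoneme).get(phone) = number of frames where both labels co-occur
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // phonemeTotals.get(phoneme) = number of frames labeled with that phoneme
    private final Map<String, Integer> phonemeTotals = new HashMap<>();

    /** Accumulates co-occurrence counts from two streams aligned frame by frame. */
    void train(String[] phonemes, String[] phones) {
        if (phonemes.length != phones.length) {
            throw new IllegalArgumentException("Streams must have one label per frame");
        }
        for (int i = 0; i < phonemes.length; i++) {
            counts.computeIfAbsent(phonemes[i], key -> new HashMap<>())
                  .merge(phones[i], 1, Integer::sum);
            phonemeTotals.merge(phonemes[i], 1, Integer::sum);
        }
    }

    /** Estimates P(phone | phoneme) as count(phone, phoneme) / count(phoneme). */
    double probability(String phone, String phoneme) {
        int total = phonemeTotals.getOrDefault(phoneme, 0);
        if (total == 0) {
            return 0.0; // phoneme never observed for this speaker
        }
        int joint = counts.get(phoneme).getOrDefault(phone, 0);
        return (double) joint / total;
    }

    public static void main(String[] args) {
        PronunciationModel model = new PronunciationModel();
        model.train(new String[] {"AE", "AE", "AE", "T"},
                    new String[] {"ae", "eh", "ae", "t"});
        System.out.println(model.probability("ae", "AE"));
    }
}
```

In the toy data, the phoneme AE is realized as the phone "ae" in two of three frames, so the estimate captures how this speaker tends to pronounce that phoneme.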
According to H. Gish et al. (1986), a majority of speaker models, including
Gaussian mixture models, are based on modeling the underlying distribution of feature vectors
from a speaker. When the speech is corrupted, the spectral-based features are also corrupted, and
so their distributions are modified. Thus, a speaker model trained using speech from one type of
corrupted environment will generally perform poorly in recognizing the same speaker using speech
collected under different conditions, since the feature distributions are now different. Various
studies of speaker recognition systems using degraded or distorted speech have shown a dramatic
conditions such as Switchboard telephone speech, which is close-talking speech. A large amount
of effort is still needed in research about speaker recognition robustness under unlimited
RESEARCH METHODOLOGY
This chapter discusses the research design, the selection of the participants, as well as the
instrumentation and validation, data gathering procedures, and treatment and analysis of data.
Materials
Various hardware and software were used for the study. A Windows-based personal
computer, a printer, and an 8 GB flash drive were the hardware utilized for the development of the
study. For the software requirements, the following were used: Adobe Photoshop CC and Adobe
Illustrator CS6 for the graphical user interface of the application; Java for the programming
language; MySQL for the database; Sublime Text and Notepad++ for coding; Google Chrome,
Torch r20, and Mozilla Firefox as the browsers for the study; and Microsoft Office 2010 to create the
documentation.
Methods
The application design is about developing the NARATOR E-book to Audiobook Converter
Change the mode (Day/Night Mode) in which the page is being displayed.
The waterfall model is a popular version of the systems development life cycle model for
software engineering. Often considered the classic approach to the systems development life
cycle, the waterfall model describes a development method that is linear and sequential.
Waterfall development has distinct goals for each phase of development. Imagine a waterfall on
the cliff of a steep mountain. Once the water has flowed over the edge of the cliff, gravity is in
control, and water cannot run uphill. It is the same with waterfall development. Once a phase of
development is completed, the development proceeds to the next phase and there is little or no
Requirements
This is the first phase of the software development life cycle. Here we gather all the requirements
Figure 1. Definitions of different phases of the waterfall model. Source: CrackMBA.
Waterfall Model, 2011. http://crackmba.com/waterfall-model/, accessed Nov. 2018.
Design
After gathering the requirements, we design the system according to the requirements
gathered in the first phase. We use UML to document
Construction
In this phase, the actual system is implemented according to the design. This phase is also
called the coding phase [12].
Testing
After the coding is finished, we test the code using different testing methods. We execute
the code with a variety of tests until there are no errors. Once integration is done, we test
the system again for proper functionality [12].
Installation
After testing, we deploy or install the application in the real-time environment so that it
can be used. The customer is involved in this deployment process and reviews the coding,
testing, and execution. If the customer wants any changes, the application is modified
again [12].
Maintenance
If any issues arise while the application is in use, we handle them in the
maintenance phase. After deployment, if the customer is not satisfied with the
project, it is modified again. So the project team is maintaining all these phases, in