
Chapter 1,2,3

This document summarizes a research paper presented to fulfill the requirements for a Bachelor of Science degree in Information Technology. The paper explores developing an application for smartphones that converts documents like PDFs, texts, and e-books into audiobooks that can be listened to while performing other tasks. The objectives are to provide an easier and more pleasant reading and listening experience. The study was conducted from October to December 2018 at Cavite State University. It has the potential to benefit students by improving listening skills and allowing for multitasking.

Uploaded by

Normay Bartolo

A Research paper presented to the faculty of

In partial fulfillment
Of the requirements for the degree
Bachelor of Science in Information Technology

By

November 2018

Chapter 1

THE PROBLEM AND THE SETTING

Introduction
Have you ever thought of listening to your books, articles, and other documents instead of reading them? Text Speaker reads your text documents aloud on your PC and converts them to audio files in MP3 or WAV format. You can listen to the audio files on an MP3 player, iPod, iPhone, or other mobile phone while you do other tasks at home or at work. Text Speaker offers a wide selection of high-quality, human-sounding voices.

The continuous growth of people's music libraries calls for more advanced ways of computing playlists, using algorithms that match tracks to the user's preferences. Several approaches have been proposed to enhance the user's listening experience. Playing background music while reading may open up new learning possibilities. For centuries, educators have used music as a learning tool that connects the concept to be learned with a catchy song or rhythm (Beentjes et al., 1996).

An electronic book (also referred to as an "E-book") is an electronic version of a traditional print book (or other printed material such as a magazine or newspaper) that can be read on a personal computer or on an E-book reader. Unlike PCs or handheld computers, E-book readers deliver a reading experience comparable to traditional paper books, while adding powerful electronic features for note taking, fast navigation, and keyword searches. However, whether they are performed on a PC, handheld computer, or E-book reader, these actions generally require the user to read the text from a display. Thus, using an E-book generally requires the user to focus his or her visual attention on a display to read the text content. Moreover, an E-book is generally read without any music playing in the background, and in particular without any music playing from the E-book itself. The same is true for other types of hand-held devices such as personal digital assistants (PDAs).

Speech is the most widely used and natural way for people to communicate. Since the beginning of research on the man-machine interface, speech has been one of the most desired mediums for interacting with computers. Speech recognition and text-to-speech capability have therefore been studied to make communication with machines more human-like. In order to increase the naturalness of oral communication between humans and machines, all aspects of speech must be involved. Speech does not only transmit ideas and concepts, but also carries information about the attitude, emotion, and individuality of the speaker (Chen et al., 2003). Speaker identity, the sound of a person's voice, is a key factor in oral communication.

Background of the Study

Audiobooks have been in use since e-books were first released. They have been used by parents and their children as an aid in learning to read. This study focuses on converting e-books into audiobooks from PDF, TXT, Docs, and ZIP files that the user would rather listen to than read. It would be desirable and highly advantageous to have a hand-held device that allows a user to assimilate content without having to look at a display.

Objectives

Our aim is to provide a new smartphone application that offers easier reading and a pleasant listening experience at the same time, so that users can study while doing other tasks at home or in school. The study relates generally to hand-held devices and, more particularly, to mixing music and text-to-speech (TTS).

Significance of the Study

Students will be the main beneficiaries of this application. By listening to the converted audiobook they can learn the proper intonation of sentences, which is especially useful in pronunciation exercises. The application can also increase the usability and productivity of Google Drive, and it improves the listening experience: users do not have to download a video, PDF, TXT, Docs, or ZIP file in order to access it. With this application they can listen to long articles with a soft background track. When a document is converted to MP3 format, speech can be combined with music. The file formats supported for the background music are MP3, WAV, AIFF, WMA, MPA, ASF, MPEG, MPG, and M1V.
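As an illustration of the format list above, the following minimal sketch checks whether a chosen background-music file has one of the listed extensions. The function and constant names are hypothetical and are not taken from the application; only the extension list comes from the text.

```python
from pathlib import Path

# Extensions copied from the list of supported background-music formats above.
SUPPORTED_MUSIC_EXTENSIONS = {
    ".mp3", ".wav", ".aiff", ".wma", ".mpa", ".asf", ".mpeg", ".mpg", ".m1v",
}


def is_supported_music_file(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats.

    The comparison ignores letter case, so "Track.MP3" is accepted.
    """
    return Path(filename).suffix.lower() in SUPPORTED_MUSIC_EXTENSIONS
```

A check of this kind only inspects the file name; it does not verify that the file contents can actually be decoded.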

The application may also give users an easier and more convenient reading experience.

Lastly, this study will benefit future researchers, who may extend the system or use it as the basis for developing another system.

Time and Place

The study was conducted from October 2018 to December 2018 at My Value Max Inc., located at Cavite State University, Carmona Campus.

Scope and Limitations


The application lets the user view the document and read along while listening to the synthesized voice that reads the text file. Playback continues while the device is in sleep mode. The player also allows the user to modify the way the voice speaks by speeding up or slowing down the speech, changing the pitch, and changing the volume.

The user can also choose to play background music while the application reads the document aloud, including free classical music by composers such as Mozart, Beethoven, Bach, and Chopin. The user can enable the option "Add background music to the output file", listen to how the audio file sounds with the Test button, and adjust the volume of the background music with a slider.
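The combination of speech and background music described above can be sketched as a simple sample-by-sample mix. This is only an illustration under stated assumptions: both signals are lists of floating-point samples in the range -1.0 to 1.0 at the same sample rate, the slider position is a number from 0.0 to 1.0, and the function name is hypothetical. It is not the application's actual audio pipeline.

```python
def mix_with_background(speech, music, music_volume=0.3):
    """Mix speech samples with background music scaled by a volume slider.

    speech, music: lists of floats in [-1.0, 1.0] (assumed same sample rate).
    music_volume:  slider position in [0.0, 1.0]; it scales the music only.
    """
    if not 0.0 <= music_volume <= 1.0:
        raise ValueError("music_volume must be between 0.0 and 1.0")

    mixed = []
    for i, speech_sample in enumerate(speech):
        # Loop the music when it is shorter than the speech; use silence if empty.
        music_sample = music[i % len(music)] if music else 0.0
        value = speech_sample + music_volume * music_sample
        # Clip to the valid range so loud passages do not overflow.
        mixed.append(max(-1.0, min(1.0, value)))
    return mixed
```

Setting the slider to 0.0 returns the speech unchanged, which corresponds to leaving the background-music option off.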

Definitions of Terms

The following terms as used by the researchers are operationally defined:

Audio Files refers to computer files that contain digitized audio, either in the Compact Disc (CDDA) format or in MP3, AAC, or another compressed format.

E-Book Reader refers to handheld computer devices such as Amazon's Kindle, Barnes and Noble's NOOK, and Apple's iPad that make it possible for books in digital form to be viewed and read by users.

Human Sounding Voices refers to voice (or vocalization), the sound produced by humans and other vertebrates using the lungs and the vocal folds in the larynx, or voice box. Voice is not always produced as speech, however. Infants babble and coo; animals bark, moo, whinny, growl, and meow; and adult humans laugh, sing, and cry.


iPod refers to a portable music player developed by Apple Computer that supports a wide variety of audio formats, including MP3, AAC, WAV, and AIFF.

PDA, short for personal digital assistant, refers to a hand-held device that combines computing, telephone/fax, Internet, and networking features. A typical PDA can function as a cellular phone, fax sender, Web browser, and personal organizer. PDAs may also be referred to as palmtops, hand-held computers, or pocket computers.

WAV refers to an audio file format, created by Microsoft, that has become a standard PC audio file format for everything from system and game sounds to CD-quality audio. A Wave file is identified by the file name extension WAV (the format is also, rarely, called Audio for Windows).

Text Speaker refers to software that reads the user's own text aloud in a choice of languages and voices. It can be used for speech-enabling websites, giving a voice to online documents and mobile apps, or making online and offline content more accessible with text to speech.

Text to Speech, abbreviated as TTS, is a form of speech synthesis that converts text into spoken voice output. Text to speech systems were first developed to aid the visually impaired by offering a computer-generated spoken voice that would "read" text to the user. TTS should not be confused with voice response systems.

Chapter 2

REVIEW OF RELATED LITERATURE

Jianlei Xie et al. (2002) describe an E-book that comprises a memory device, a text-to-speech (TTS) module, a music module, and at least one speaker. The memory device stores files that include text and music. The TTS module synthesizes speech corresponding to the text, the music module plays back the music, and the speaker outputs both the speech and the music.

Clark Quinn, professor, author, and expert in computer-based education, defined mobile learning as the intersection of mobile computing (the application of small, portable, and wireless computing and communications devices) and e-learning (learning facilitated and supported through the use of information and communications technology). He predicted that mobile learning would one day provide learning that was truly independent of time and place and facilitated by portable computers capable of providing rich interactivity, total connectivity, and powerful processing. In May 2005, Ellen Wagner, senior director of Global Education Solutions at Macromedia, proclaimed that the mobile revolution had finally arrived.

Wherever one looks, evidence of mobile penetration is irrefutable: cellphones, PDAs, MP3 players, portable game devices, handhelds, tablets, and laptops abound. No demographic is immune from this phenomenon. From toddlers to seniors, people are increasingly connected and are communicating digitally with each other in ways that would have been impossible only a few years ago.

Music capabilities allow an E-book user to enjoy digital music output from the E-book. TTS capabilities allow an E-book user to listen to synthesized text output from the E-book. The combination of music and TTS allows an E-book user to listen to the text along with background music.

The majority of the evidence tends to support the use of background music because of its positive effects. Cool, Yarbrough, Patton, Runde, and Keith (1994) conducted a study which found that radio noise was generally considered somewhat helpful to students while studying: it kept them focused and on task. Howard Gardner, a Harvard graduate, wrote Frames of Mind in the early 1980s, and it has since become one of the most influential books in education. Gardner believes that music creates a positive and relaxing environment that allows sensory integration to take place and improves concentration. Sensory integration is essential for establishing long-term memory. He has also seen background music successfully used to mask outside traffic sounds, release stress before an exam, and reinforce subject matter (Campbell, 1997). Jensen (1998) reported that music can deliver as much as sixty percent more content in five percent of the time usually taken to deliver the same material.

According to Bossard, L. (2008), several solutions already use intelligent playlists embedded in music players installed on computers. There are also online solutions, the most popular of which is last.fm, which acts as a personalized radio station that plays preferred music. On the other hand, it does not allow playback of a certain track. There are also other solutions, like the Genius function of iTunes or the Music Explorer; both use the user's music collection to generate playlists. The biggest disadvantage of the latter solution is that the user can use only tracks that he/she already has on his/her PC to generate playlists. Of course, this limits the power of the algorithm very much.


Lorenzi (2007) proposes a way of representing the similarity between tracks in a 10-dimensional Euclidean space (further called music space), where the closeness of tracks is approximately proportional to their similarity. 7M songs currently appear in the database, but only 500K of them have enough user statistics to be mapped in the graph. Using this simplified and computationally efficient way of finding similar tracks, several applications can explore new ways of computing playlists. Most of them offer support in playlist generation, but none also provides the tracks to be played. This could be seen as a disadvantage because not all people possess all tracks that are suggested by the space.
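The idea of finding similar tracks by closeness in the music space can be sketched as a nearest-neighbour lookup. The sketch below assumes each track already has coordinates in the space; the function names, track names, and coordinates are invented for illustration and do not come from Lorenzi's data or implementation.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar_tracks(seed, tracks, k=3):
    """Return the names of the k tracks closest to the seed track, nearest first.

    tracks: dict mapping a track name to its coordinates in the music space.
    """
    seed_point = tracks[seed]
    others = [name for name in tracks if name != seed]
    others.sort(key=lambda name: euclidean_distance(seed_point, tracks[name]))
    return others[:k]
```

A playlist could then be built by repeatedly asking for the neighbours of the track that is currently playing. With hundreds of thousands of tracks, a spatial index would be needed in place of this full sort.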


Klusacek [59] proposed a conditional pronunciation modeling method. It uses time-

aligned streams of phones and phonemes to model a speaker’s specific pronunciation. The

system uses phonemes drawn from a lexicon of pronunciations of words recognized by an

automatic speech recognition system to generate the phoneme stream and an open-loop phone

recognizer to generate a phone stream. The phoneme and phone streams are aligned at the frame
level and conditional probabilities of a phone, given a phoneme, are estimated using co-

occurrence counts. A likelihood detector is then applied to these probabilities for the speaker

detection task. This approach achieves a relatively high accuracy in comparison with other

phonetic methods in the SuperSID project at the Johns Hopkins 2002 Workshop [114] [90].
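The estimation step described above, conditional probabilities of a phone given a phoneme from co-occurrence counts, can be sketched as follows. The sketch assumes the two streams are already aligned frame by frame and are given as equal-length lists of labels. The function name is hypothetical, and the likelihood detector and any smoothing used in the cited system are not reproduced.

```python
from collections import Counter, defaultdict


def conditional_phone_probabilities(phonemes, phones):
    """Estimate P(phone | phoneme) from two frame-aligned label streams.

    phonemes: lexicon-based label for each frame.
    phones:   open-loop recognizer label for the same frames.
    """
    if len(phonemes) != len(phones):
        raise ValueError("streams must be aligned frame by frame")

    # Count how often each phone co-occurs with each phoneme.
    pair_counts = defaultdict(Counter)
    for phoneme, phone in zip(phonemes, phones):
        pair_counts[phoneme][phone] += 1

    # Normalize the counts of each phoneme into a probability distribution.
    probabilities = {}
    for phoneme, counts in pair_counts.items():
        total = sum(counts.values())
        probabilities[phoneme] = {phone: n / total for phone, n in counts.items()}
    return probabilities
```

A speaker who often realizes one phoneme as a different phone will show a distinctive distribution for that phoneme, which is what makes these probabilities usable as a speaker-specific pronunciation model.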

According to Gish et al. (1986), a majority of speaker models, including Gaussian mixture models, are based on modeling the underlying distribution of feature vectors from a speaker. When the speech is corrupted, the spectral-based features are also corrupted and so their distributions are modified. Thus, a speaker model trained using speech from one type of corrupt environment will generally perform poorly in recognizing the same speaker using speech collected under different conditions, since the feature distributions are now different. Various studies of speaker recognition systems using degraded or distorted speech have shown a dramatic decrease in performance [47] [38].
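For reference, the standard form of a Gaussian mixture model for a feature vector can be written as below. The symbols are assumptions introduced here and do not appear in the cited work: x is a feature vector, M is the number of mixture components, and w_i, \mu_i, and \Sigma_i are the weight, mean, and covariance of component i.

```latex
p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(x;\, \mu_i, \Sigma_i),
\qquad \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0
```

Because the parameters are fitted to the training features, a shift in the feature distribution caused by a different recording environment lowers the likelihood that the model assigns to the same speaker.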

Current speaker recognition research mainly focuses on recognition under controlled conditions such as Switchboard telephone speech, which is close-talking speech. A large amount of effort is still needed in research on the robustness of speaker recognition under unconstrained conditions in open environments with distant microphones.


Chapter 3

RESEARCH METHODOLOGY

This chapter discusses the research design, the selection of the participants, the instrumentation and validation, the data gathering procedures, and the treatment and analysis of data.

Materials

Various hardware and software were used in the study. The hardware used for development comprised a Windows-based personal computer, a printer, and an 8 GB flash drive. The software comprised Adobe Photoshop CC and Adobe Illustrator CS6 for the graphical user interface of the application; Java as the programming language; MySQL for the database; Sublime Text and Notepad++ for coding; Google Chrome, Torch r20, and Mozilla Firefox as browsers; and Microsoft Office 2010 for the documentation.

Methods
The application design concerns the development of the NARATOR E-book to Audiobook Converter application, with which the user can do the following:

• Read documents by just listening.

• Convert e-book files to audiobook files.

• Change the GUI color scheme.

• Change the background music.

• Change the reader voice personality.

• Change the mode (Day/Night Mode) in which the page is displayed.

• Search for content in the document using keywords.

• Auto-flag document pages and sections.

• Read .PDF, .DOCX, and .TXT files from Google Drive.

• Share the content of a book on a Facebook wall.

• Set an alarm as a reminder to read a particular book in the future.
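One of the listed features, searching the document by keyword, can be sketched as a case-insensitive scan over the extracted text. The sketch assumes the document text has already been extracted into one string per page; the function name and the shape of the result are hypothetical and are not taken from the application.

```python
def search_document(pages, keyword):
    """Find every line that contains the keyword, ignoring letter case.

    pages: list of page texts, one string per page.
    Returns a list of (page_number, line_number, line) tuples, numbered from 1.
    """
    needle = keyword.strip().lower()
    if not needle:
        # An empty or blank keyword matches nothing.
        return []

    hits = []
    for page_number, page_text in enumerate(pages, start=1):
        for line_number, line in enumerate(page_text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((page_number, line_number, line.strip()))
    return hits
```

Returning page and line numbers would let the reader jump to a match, and the same positions could serve as starting points for playback.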

Software Development Model: Waterfall Model

The waterfall model is a popular version of the systems development life cycle model for

software engineering. Often considered the classic approach to the systems development life

cycle, the waterfall model describes a development method that is linear and sequential.

Waterfall development has distinct goals for each phase of development. Imagine a waterfall on

the cliff of a steep mountain. Once the water has flowed over the edge of the cliff, gravity is in

control, and water cannot run uphill. It is the same with waterfall development. Once a phase of development is completed, the development proceeds to the next phase and there is little or no interplay between phases [12, 24] (Figure 1).

Requirements

This is the first phase of the software development life cycle. Here we gather all the requirements that have to be fulfilled by the developed software application [12].

Figure 1. Definitions of different phases of the waterfall model. Source: CrackMBA, Waterfall Model, 2011, http://crackmba.com/waterfall-model/, accessed Nov. 2018.

Design

After gathering the requirements, we design the system according to the requirements gathered in the first phase. We use UML to document aspects of the design of the system [12].

Construction

Here the code is implemented. This is the phase where we implement the actual system according to the design. This phase is also called the coding phase [12].

Testing

Testing begins after the coding is finished. In this phase we test the code using different testing methods, executing it with a variety of tests until there are no errors. Once integration is done, we test the system again for proper functionality [12].

Installation

After testing, the application is deployed or installed in the real environment where it will be used. The customer is involved in this deployment process and reviews the coding, testing, and execution. If the customer wants any changes, the application is modified again [12].

Maintenance
Any issues that arise while the software or application is in use are handled in the maintenance phase. If the customers are not satisfied with the project after deployment, it is modified again. The project team thus maintains all these phases in consultation with the customers [12].
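The strictly sequential nature of the phases described above can be illustrated with a small sketch in which a phase can be completed only when every earlier phase is already done. The class and method names are hypothetical, and the sketch models only the ordering, not the activities inside each phase.

```python
# Phase names in the order in which they are described above.
WATERFALL_PHASES = [
    "Requirements", "Design", "Construction",
    "Testing", "Installation", "Maintenance",
]


class WaterfallProject:
    """Tracks progress through the phases in strict order (hypothetical helper)."""

    def __init__(self):
        self.completed = []

    @property
    def current_phase(self):
        """Name of the phase now in progress, or None when all are done."""
        if len(self.completed) == len(WATERFALL_PHASES):
            return None
        return WATERFALL_PHASES[len(self.completed)]

    def complete(self, phase):
        """Mark the current phase as finished; any other phase is rejected."""
        if phase != self.current_phase:
            raise ValueError(
                f"cannot complete {phase!r}; current phase is {self.current_phase!r}"
            )
        self.completed.append(phase)
```

Rejecting out-of-order completion mirrors the statement that there is little or no interplay between phases. A real project would still need a way to record the rework that the Installation and Maintenance phases allow.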
