Chapter 1,2,3

This document summarizes a research paper presented to fulfill the requirements for a Bachelor of Science degree in Information Technology. The paper explores developing an application for smartphones that converts documents like PDFs, texts, and e-books into audiobooks that can be listened to while performing other tasks. The objectives are to provide an easier and more pleasant reading and listening experience. The study was conducted from October to December 2018 at Cavite State University. It has the potential to benefit students by improving listening skills and allowing for multitasking.

Uploaded by

Normay Bartolo


A Research paper presented to the faculty of

In partial fulfillment
Of the requirements for the degree
Bachelor of Science in Information Technology

November 2018

Chapter 1

THE PROBLEM AND THE SETTING

Introduction
Have you ever thought of listening to your books, articles, and other documents instead of reading them? Text Speaker reads your text documents aloud on your PC and converts them to audio files in MP3 or WAV format. You can listen to the audio files on an MP3 player, iPod, iPhone, or mobile phone while you do other tasks at home or at work. Text Speaker offers a large selection of high-quality, human-sounding voices.

The continuous growth of people’s music libraries requires more advanced ways of computing playlists through algorithms that match tracks to the user’s preferences. Several approaches have been proposed to enhance the user’s listening experience. Applying background music to reading may open up new learning possibilities. For centuries, educators have used music as a learning tool that connects the concept to be acquired with a catchy song or rhythm (Beentjes et al., 1996).

An electronic book (also referred to as an “E-book”) is an electronic version of a traditional print book (or other printed material such as a magazine or newspaper) that can be read by using a personal computer or an E-book reader. Unlike PCs or handheld computers, E-book readers deliver a reading experience comparable to traditional paper books, while adding powerful electronic features for note taking, fast navigation, and keyword searches. However, such actions, whether performed on a PC, handheld computer, or E-book reader, generally require the user to read the text from a display. Thus, the use of an E-book generally requires the user to focus his or her visual attention on a display to read the text content. Moreover, an E-book is generally read without any music playing in the background, particularly without any music playing from the E-book itself. The same is true for other types of hand-held devices such as personal digital assistants (PDAs).

Speech is the most used and natural way for people to communicate. From the beginning of man-machine interface research, speech has been one of the most desired mediums for interacting with computers. Therefore, speech recognition and text-to-speech capability have been studied to make communication with machines more human-like. In order to increase the naturalness of oral communication between humans and machines, all aspects of speech must be involved. Speech not only transmits ideas and concepts, but also carries information about the attitude, emotion, and individuality of the speaker (Chen et al., 2003). Speaker identity, the sound of a person’s voice, is a key factor in oral communication.

Background of the Study

Audiobooks have been in use since e-books were first released, and parents and their children have used them as an aid to reading. This study focuses on converting e-books in PDF, TXT, Docs, and ZIP files into audiobooks for users who would rather listen than read. It would be desirable and highly advantageous to have a hand-held device that allows a user to assimilate content without having to look at a display.

Objectives

Our intention is to provide a new smartphone application that offers easier reading and a pleasant listening experience at the same time, so that users can study while doing other tasks at home or in school. The study relates generally to hand-held devices and, more particularly, to mixing music and text-to-speech (TTS).

Significance of the Study

Students will be the main beneficiaries of this application. They will be able to learn the proper intonation of sentences by listening to a converted audiobook, especially in pronunciation exercises. The application can also increase the usability and productivity of Google Drive, and it improves the listening experience: users do not have to download a video, PDF, TXT, Docs, or ZIP file in order to access it. With this application they can listen to long articles with a soft background track. When converting a document to MP3 format, speech can be combined with music. The file formats supported for the background music are MP3, WAV, AIFF, WMA, MPA, ASF, MPEG, MPG, and M1V.

The application may also give users an easy and convenient reading experience.

Lastly, this study will benefit future researchers, who might extend this system or use it as the basis for developing another system.

Time and place

The study was conducted from October 2018 to December 2018 at My Value Max Inc., located at Cavite State University, Carmona Campus.

Scope and Limitations

While listening to the voice that reads the text file, the user can also view the document and read along. Playback continues while the device is in Sleep Mode. The player can also modify the way a voice speaks by speeding up or slowing down the speech, changing the pitch, and changing the volume.

The user can also choose to play background music while the application reads the document, including free classical music by composers such as Mozart, Beethoven, Bach, and Chopin. The user can enable the option "Add background music to the output file", use the Test button to hear how the audio file will sound, and adjust the volume of the background music with a slider.
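The combination of speech and background music described above can be illustrated with a minimal sketch. It assumes that both tracks are already decoded into mono 16-bit PCM samples at the same sample rate; the class name BackgroundMixer, the method mix, and the parameter musicGain (standing in for the slider position) are hypothetical and are not taken from the application itself.

```java
/** Sketch only: mixes synthesized speech with quieter background music. */
final class BackgroundMixer {

    /**
     * Mixes two mono 16-bit PCM tracks sample by sample.
     * musicGain is the assumed slider position in [0.0, 1.0];
     * the music loops if it is shorter than the speech.
     */
    static short[] mix(short[] speech, short[] music, double musicGain) {
        if (musicGain < 0.0 || musicGain > 1.0) {
            throw new IllegalArgumentException("musicGain must be in [0, 1]");
        }
        short[] out = new short[speech.length];
        for (int i = 0; i < speech.length; i++) {
            // Scale the music by the slider value; silence if no music is given.
            double m = music.length == 0 ? 0.0 : music[i % music.length] * musicGain;
            long sum = Math.round(speech[i] + m);
            // Clamp to the 16-bit range so loud passages clip instead of wrapping.
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, sum));
        }
        return out;
    }

    public static void main(String[] args) {
        short[] speech = {1000, -2000, 32000};
        short[] music = {1000, 1000};
        // Music at half volume under the speech.
        System.out.println(java.util.Arrays.toString(mix(speech, music, 0.5)));
    }
}
```

Keeping the speech at full level and scaling only the music is one possible design choice; it keeps the narration intelligible even when the slider is at its maximum.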

Definitions of Terms

The following terms as used by the researchers are operationally defined:

Audio Files refers to computer files that contain digitized audio, either in the Compact Disc (CDDA) format or in MP3, AAC, or another compressed format.

E-Book Reader refers to handheld computer devices, such as Amazon's Kindle, Barnes and Noble's NOOK, and Apple's iPad, that make it possible for users to view and read books in digital form.

Human-Sounding Voices refers to synthesized voices that resemble the human voice. Voice (or vocalization) is the sound produced by humans and other vertebrates using the lungs and the vocal folds in the larynx, or voice box. Voice is not always produced as speech, however: infants babble and coo; animals bark, moo, whinny, growl, and meow; and adult humans laugh, sing, and cry.

iPod refers to a portable music player developed by Apple that supports a wide variety of audio formats, including MP3, AAC, WAV, and AIFF.

PDA, short for personal digital assistant, refers to a hand-held device that combines computing, telephone/fax, Internet, and networking features. A typical PDA can function as a cellular phone, fax sender, Web browser, and personal organizer. PDAs may also be referred to as palmtops, hand-held computers, or pocket computers.

WAV refers to an audio file format, created by Microsoft and IBM, that has become a standard PC audio file format for everything from system and game sounds to CD-quality audio. A Wave file is identified by the file name extension .wav; the format is also, rarely, called Audio for Windows.

Text Speaker refers to software that reads the user's own text aloud in a choice of languages and voices. It can be used to speech-enable websites, to give a voice to online documents and mobile apps, or to make online and offline content more accessible through text to speech.

Text to Speech, abbreviated as TTS, refers to a form of speech synthesis that converts text into spoken voice output. Text-to-speech systems were first developed to aid the visually impaired by offering a computer-generated spoken voice that would "read" text to the user. TTS should not be confused with voice response systems.

Chapter II

REVIEW OF RELATED LITERATURE

Jianlei Xie et al. (2002) describe an E-book that comprises a memory device, a text-to-speech (TTS) module, a music module, and at least one speaker. The memory device stores files that include text and music. The TTS module synthesizes speech corresponding to the text, the music module plays back the music, and the speaker outputs the speech and the music.

Clark Quinn, a professor, author, and expert in computer-based education, defined mobile learning as the intersection of mobile computing (the application of small, portable, and wireless computing and communications devices) and e-learning (learning facilitated and supported through the use of information and communications technology). He predicted that mobile learning would one day provide learning that was truly independent of time and place, facilitated by portable computers capable of providing rich interactivity, total connectivity, and powerful processing. In May 2005, Ellen Wagner, senior director of Global Education Solutions at Macromedia, proclaimed that the mobile revolution had finally arrived.

Wherever one looks, evidence of mobile penetration is irrefutable: cellphones, PDAs, MP3 players, portable game devices, handhelds, tablets, and laptops abound. No demographic is

immune from this phenomenon. From toddlers to seniors, people are increasingly connected and

are digitally communicating with each other in ways that would have been impossible only a few

years ago.

Music capabilities allow an E-book user to enjoy digital music output from the E-book. TTS capabilities allow an E-book user to listen to synthesized text output from the E-book. The combination of music and TTS allows an E-book user to listen to the text along with background music.

The majority of the evidence tends to support the use of background music because of its positive effects. Cool, Yarbrough, Patton, Runde, and Keith (1994) conducted a study which found that radio noise was generally considered somewhat helpful to students while studying; it kept them focused and on task. Howard Gardner, a Harvard graduate, wrote Frames of Mind in the early 1980s, and it has since become one of the most influential books in education. Gardner believes that music creates a positive and relaxing environment that allows sensory integration to take place and improves concentration. Sensory integration is essential for establishing long-term memory. He has also seen background music successfully used to mask outside traffic sounds, to release stress before an exam, and to reinforce subject matter (Campbell, 1997). Jensen (1998) reported that music can deliver as much as sixty percent more content in five percent of the time usually taken to deliver the same materials.

According to Bossard (2008), several solutions already use intelligent playlists embedded in music players installed on computers. There are also online solutions, the most popular of which is last.fm, which acts as a personalized radio station that plays preferred music. On the other hand, it does not allow playback of a specific track. Other solutions, such as the Genius function of iTunes or the Music Explorer, use the user’s music collection to generate playlists. The biggest disadvantage of these solutions is that the user can use only tracks that he or she already has on his or her PC to generate playlists, which greatly limits the power of the algorithm.

Lorenzi (2007) proposes a way of representing the similarity between tracks in a 10-dimensional Euclidean space (hereafter called the music space), where the closeness of tracks is approximately proportional to their similarity. Seven million songs currently appear in the database, but only 500,000 of them have enough user statistics to be mapped in the graph. Using this simplified and computationally efficient way of finding similar tracks, several applications can explore new ways of computing playlists. Most of them offer support in playlist generation, but none also provides the tracks to be played. This could be seen as a disadvantage because not all people possess all the tracks suggested by the space.
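As an illustration of how a playlist could be computed from such a music space, the following sketch returns the tracks closest to a seed track by Euclidean distance. It is not Lorenzi's implementation: the class MusicSpace, the nested class Track, and the method nearest are hypothetical names, the coordinates are made up, and the example uses two dimensions instead of ten for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch only: nearest-neighbour lookup in an assumed "music space". */
final class MusicSpace {

    /** A track and its (hypothetical) coordinates in the music space. */
    static final class Track {
        final String title;
        final double[] coords;

        Track(String title, double[] coords) {
            this.title = title;
            this.coords = coords;
        }
    }

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns up to k tracks from the library, closest to the seed first. */
    static List<Track> nearest(Track seed, List<Track> library, int k) {
        List<Track> candidates = new ArrayList<>(library);
        candidates.remove(seed); // the seed itself is not a suggestion
        candidates.sort(Comparator.comparingDouble(
                (Track t) -> distance(seed.coords, t.coords)));
        int n = Math.max(0, Math.min(k, candidates.size()));
        return new ArrayList<>(candidates.subList(0, n));
    }

    public static void main(String[] args) {
        Track a = new Track("A", new double[]{0, 0});
        List<Track> library = List.of(a,
                new Track("B", new double[]{1, 0}),
                new Track("C", new double[]{5, 5}),
                new Track("D", new double[]{0, 2}));
        for (Track t : nearest(a, library, 2)) {
            System.out.println(t.title);
        }
    }
}
```

Sorting the whole library is adequate for a small personal collection; a database of millions of songs would presumably need a spatial index instead.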

Klusacek [59] proposed a conditional pronunciation modeling method. It uses time-

aligned streams of phones and phonemes to model a speaker’s specific pronunciation. The

system uses phonemes drawn from a lexicon of pronunciations of words recognized by an

automatic speech recognition system to generate the phoneme stream and an open-loop phone

recognizer to generate a phone stream. The phoneme and phone streams are aligned at the frame
level and conditional probabilities of a phone, given a phoneme, are estimated using co-

occurrence counts. A likelihood detector is then applied to these probabilities for the speaker

detection task. This approach achieves a relatively high accuracy in comparison with other

phonetic methods in the SuperSID project at the Johns Hopkins 2002 Workshop [114] [90].
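The estimation step described above, P(phone | phoneme) = count(phone, phoneme) / count(phoneme), can be sketched as follows. This is only an illustration of co-occurrence counting over two frame-aligned streams, not the SuperSID system: the class PronunciationModel, the method estimate, and the toy symbol streams are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: conditional phone probabilities from co-occurrence counts. */
final class PronunciationModel {

    /**
     * Estimates P(phone | phoneme) from two streams aligned frame by frame.
     * The result maps each phoneme to a map from phone to probability.
     */
    static Map<String, Map<String, Double>> estimate(String[] phonemes, String[] phones) {
        if (phonemes.length != phones.length) {
            throw new IllegalArgumentException("streams must be frame-aligned");
        }
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        Map<String, Integer> totals = new HashMap<>();
        for (int i = 0; i < phonemes.length; i++) {
            // Count how often each phone co-occurs with each phoneme.
            counts.computeIfAbsent(phonemes[i], k -> new HashMap<>())
                  .merge(phones[i], 1, Integer::sum);
            totals.merge(phonemes[i], 1, Integer::sum);
        }
        Map<String, Map<String, Double>> probs = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            double total = totals.get(e.getKey());
            Map<String, Double> row = new HashMap<>();
            for (Map.Entry<String, Integer> c : e.getValue().entrySet()) {
                row.put(c.getKey(), c.getValue() / total);
            }
            probs.put(e.getKey(), row);
        }
        return probs;
    }

    public static void main(String[] args) {
        // Toy data: the speaker realizes the phoneme "t" as "d" in one of three frames.
        String[] phonemes = {"t", "t", "t", "ae"};
        String[] phones = {"t", "d", "t", "ae"};
        System.out.println(estimate(phonemes, phones));
    }
}
```

A speaker-specific table of this kind is what a likelihood detector would then score against; that scoring step is not shown here.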

According to Gish et al. (1986), a majority of the speaker models, including the

Gaussian mixture models, are based on modeling the underlying distribution of feature vectors

from a speaker. When the speech is corrupted, the spectral based features are also corrupted and

so their distributions are modified. Thus, a speaker model trained using speech from one type of

corrupt environment will generally perform poorly in recognizing the same speaker using speech

collected under different conditions since the feature distributions are now different. Various

studies of speaker recognition systems using degraded or distorted speech have shown a dramatic

decrease in performance [47] [38].

Current speaker recognition research mainly focuses on recognition under controlled conditions such as Switchboard telephone speech, which is close-talking speech. A large amount of effort is still needed in research on the robustness of speaker recognition under unconstrained conditions in open environments with distant microphones.

Chapter III

RESEARCH METHODOLOGY

This chapter discusses the research design, the selection of the participants, the instrumentation and validation, the data gathering procedures, and the treatment and analysis of data.

Materials

Various hardware and software were used for the study. The hardware used for development consisted of a Windows-based personal computer, a printer, and an 8 GB flash drive. The software consisted of Adobe Photoshop CC and Adobe Illustrator CS6 for the graphical user interface of the application; Java as the programming language; MySQL for the database; Sublime Text and Notepad++ for coding; Google Chrome, Torch r20, and Mozilla Firefox as browsers; and Microsoft Office 2010 for the documentation.

Methods
The design covers the development of the NARATOR E-book to Audiobook Converter, an application with which the user can do the following:

• Read documents by just listening.
• Convert e-book files to audiobook files.
• Change the GUI color scheme.
• Change the background music.
• Change the reader voice personality.
• Change the mode (Day/Night Mode) in which the page is displayed.
• Search for content in the document using keywords.
• Auto-flag document pages and sections.
• Read .PDF, .DOCX, and .TXT files from Google Drive.
• Share the content of a book on a Facebook wall.
• Set an alarm as a reminder to read a particular book in the future.
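As an illustration of one of the listed features, keyword search, the following minimal sketch returns the page numbers on which a keyword occurs. It assumes the document text has already been extracted into one string per page; the class KeywordSearch and the method pagesContaining are hypothetical names and do not describe the actual implementation of the application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch only: case-insensitive keyword search over extracted page text. */
final class KeywordSearch {

    /** Returns the 1-based numbers of the pages whose text contains the keyword. */
    static List<Integer> pagesContaining(List<String> pages, String keyword) {
        List<Integer> hits = new ArrayList<>();
        if (keyword == null || keyword.trim().isEmpty()) {
            return hits; // nothing to search for
        }
        String needle = keyword.trim().toLowerCase(Locale.ROOT);
        for (int i = 0; i < pages.size(); i++) {
            if (pages.get(i).toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
                "Text to speech converts text into spoken output.",
                "The waterfall model is linear and sequential.",
                "Background music can be mixed with SPEECH.");
        System.out.println(pagesContaining(pages, "speech"));
    }
}
```

The returned page numbers could also drive the reader, for example by starting playback at the first matching page.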

SOFTWARE DEVELOPMENT MODEL (WATERFALL MODEL)

The waterfall model is a popular version of the systems development life cycle model for

software engineering. Often considered the classic approach to the systems development life

cycle, the waterfall model describes a development method that is linear and sequential.

Waterfall development has distinct goals for each phase of development. Imagine a waterfall on

the cliff of a steep mountain. Once the water has flowed over the edge of the cliff, gravity is in

control, and water cannot run uphill. It is the same with waterfall development. Once a phase of
development is completed, the development proceeds to the next phase and there is little or no

interplay between phases [12, 24] (Figure 1).

Requirements

This is the first phase of the software development life cycle. Here we gather all the requirements

that have to be fulfilled by the developed software application [12].

Figure 1. Definitions of different phases of the waterfall model. Source: CrackMBA, Waterfall Model, 2011. http://crackmba.com/waterfall-model/, accessed Nov. 2018.

Design

After gathering the requirements we will design this particular project. Here we will design the

system according to the requirements we gathered in the first phase. We use UML to document

aspects of the design of the system [12].

Construction

In this phase, also called the coding phase, we implement the actual system according to the design [12].

Testing

Testing begins after the coding phase is finished. In this phase we test the code using different testing methods, executing it with a variety of tests until no errors remain. Once integration is done, we test the system again for proper functionality [12].

Installation

After testing, the application is deployed or installed in the real-time environment for use. The customer is involved in this deployment process and can review the coding, testing, and execution; if the customer requests any changes, the application is modified again [12].

Maintenance
Any issues that arise while the software is in use are handled in the maintenance phase. If the customers are not satisfied with the project after deployment, it is modified again. The project team therefore maintains all of these phases in consultation with the customers [12].
