Date Approved

4-2018

Graduate Degree Type

Thesis

Degree Name

Engineering (M.S.E.)

Degree Program

School of Engineering

First Advisor

Nicholas Baine

Second Advisor

Samhita Rhodes

Third Advisor

Robert Bossemeyer

Academic Year

2017/2018

Abstract

Speech recognition is a useful technology because it enables applications that can be tailored to the varied needs of users. This research attempts to improve the performance of a speech recognition system by combining visual features (lip movement) with audio features. Results were computed using utterances of numerals recorded from both male and female participants. Discrete Cosine Transform (DCT) coefficients were used to compute the visual features, and Mel Frequency Cepstral Coefficients (MFCC) were used to compute the audio features. Classification was then carried out with a Support Vector Machine (SVM). The recognition rates of the combined (fused) system were compared with those of two standalone systems: audio-only and visual-only.
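The feature pipeline described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the thesis's implementation. The following choices are all assumed rather than taken from the source: 16 kHz audio, 25 ms frames with a 10 ms hop, 26 mel filters, 13 cepstral coefficients, averaging the MFCCs over frames, keeping a 6 x 6 block of low-frequency DCT coefficients from a grayscale mouth region, and fusing the two modalities by plain concatenation (feature-level fusion). The helper names (`dct_matrix`, `visual_features`, `mfcc`, `fuse`) are hypothetical. The SVM stage is not shown. The fused vectors would be passed to a classifier such as scikit-learn's `sklearn.svm.SVC`.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis (row k holds the k-th cosine basis vector)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def visual_features(roi, keep=6):
    """2-D DCT of a grayscale lip region; keep the low-frequency keep x keep block (assumed size)."""
    h, w = roi.shape
    coeffs = dct_matrix(h) @ roi @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale between 0 Hz and sr/2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge, peak of 1 at the center bin
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    """Frame-wise MFCCs: pre-emphasis, Hamming window, power spectrum, mel energies, log, DCT."""
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis filter
    if len(signal) < frame_len:  # zero-pad very short inputs to one full frame
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)
    return log_energies @ dct_matrix(n_filters)[:n_ceps].T  # shape (n_frames, n_ceps)


def fuse(audio_vec, visual_vec):
    """Assumed feature-level fusion: concatenate the audio and visual vectors.

    In practice, per-dimension standardization fitted on training data would
    typically be applied before the SVM; that step is omitted here.
    """
    return np.concatenate([audio_vec, visual_vec])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(16000)  # 1 s of synthetic audio at 16 kHz (placeholder data)
    lip_roi = rng.random((32, 48))  # synthetic grayscale mouth region (placeholder data)
    a = mfcc(audio).mean(axis=0)  # average over frames -> 13 values
    v = visual_features(lip_roi)  # 6 x 6 DCT block -> 36 values
    x = fuse(a, v)
    print(x.shape)  # (49,)
```

Averaging MFCCs over frames gives a fixed-length vector that a standard SVM can consume. However, it discards temporal order, so a real system might pool differently or stack frames.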

ScholarWorks Citation

Acharya, Vikrant Satish, "Fusion of Audio and Visual Information for Implementing Improved Speech Recognition System" (2018). Masters Theses. 884.
https://scholarworks.gvsu.edu/theses/884

Included in

Electrical and Electronics Commons

ScholarWorks@GVSU

Masters Theses

Fusion of Audio and Visual Information for Implementing Improved Speech Recognition System