Trinity College Dublin, The University of Dublin


Dr. Joao Paulo Cabral

Research Fellow (Computer Science)


João Cabral is a research fellow in the School of Computer Science and Statistics at Trinity College Dublin, as part of the ADAPT Centre. He received B.Sc. and M.Sc. degrees in Electrical and Computer Engineering from Instituto Superior Técnico (IST), Lisbon, Portugal, in 2003 and 2006 respectively. He spent the final year of his B.Sc. at the Royal Institute of Technology (KTH), Sweden, under the Socrates-Erasmus programme, where he started working in speech signal processing, funded by the Department of Signals, Sensors and Systems. During his M.Sc. he developed the Pitch-Synchronous Time-Scaling (PSTS) algorithm, which transforms glottal parameters by manipulating the estimated source signal in the time domain. PSTS has been applied to high-quality voice conversion, for example to emotions in the EmoVoice system (J. P. Cabral and L. C. Oliveira, 2006), and to speech recognition of children's speech (Shweta Ghai and Rohit Sinha, 2011). He was awarded a Ph.D. degree in Computer Science and Informatics from the University of Edinburgh, U.K., in 2010, funded by a European Commission Marie Curie Fellowship under the Early Stage Research Training (EST) scheme. His Ph.D. thesis contributed a novel integration of an acoustic glottal source model into HMM-based speech synthesis, to improve speech quality and control over voice characteristics. Before joining Trinity College Dublin in 2013, he worked from 2010 as a postdoctoral research fellow at University College Dublin, as part of the CNGL research centre.
Audio signal processing; Computer Assisted Language Learning (CALL); Machine learning; Speech recognition; Speech synthesis; Statistical parametric speech synthesis; Voice quality; Voice source; Voice transformation
 CogSIS - Cognitive Effects of Speech Interface Synthesis
 Production, Perception and Cognition in the interception between speech and singing

Language     Skill Reading   Skill Writing   Skill Speaking
English      Fluent          Fluent          Fluent
French       Medium          Medium          Basic
Portuguese   Fluent          Fluent          Fluent
Spanish      Medium          Basic           Basic
Details                                                                Date From   Date To
Member of the International Speech Communication Association (ISCA)   2005
Member of the Marie Curie Fellows Association (MCFA)                   2006
João P. Cabral, Estimation of the Asymmetry Parameter of the Glottal Flow Waveform Using the Electroglottographic Signal, INTERSPEECH 2018, Hyderabad, India, 2-6 September, 2018, Conference Paper, ACCEPTED
Beatriz R. de Medeiros and João P. Cabral, Acoustic distinctions between speech and singing: Is singing acoustically more stable than speech?, Speech Prosody, Poznań, Poland, 13-16 June, 2018, pp. 542-546, Conference Paper, ACCEPTED
Leigh Clark, João Cabral, Benjamin Cowan, The CogSIS Project: Examining the Cognitive Effects of Speech Interface Synthesis, British Human Computer Interaction Conference, Belfast, 2-6 July, 2018, Conference Paper, PRESENTED
Leigh Clark, Philip Doyle, Diego Garaialde, Emer Gilmartin, Stephan Schögl, Jens Edlund, Matthew Aylett, Cosmin Munteanu, João P. Cabral, and Benjamin R. Cowan, The State of Speech in HCI: Trends, Themes and Challenges, Interacting with Computers, 2018, Journal Article, IN_PRESS
João P. Cabral, Benjamin R. Cowan, Katja Zibrek, Rachel McDonnell, The Influence of Synthetic Voice on the Evaluation of a Virtual Character, Interspeech 2017, Stockholm, Sweden, 20-24 August, ISCA, 2017, pp. 229-233, Conference Paper, PUBLISHED
Eva Vanmassenhove, João P. Cabral, Fasih Haider, Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis, 9th ISCA Workshop on Speech Synthesis, Sunnyvale, CA, USA, 13-15 September, 2016, pp. 22-27, Conference Paper, PUBLISHED
João P. Cabral, Christian Saam, Eva Vanmassenhove, Stephen Bradley, Fasih Haider, The ADAPT entry to the Blizzard Challenge 2016, Blizzard Challenge 2016 Workshop, Cupertino, CA, USA, 2016, Conference Paper, PUBLISHED
Séamus Lawless, Peter Lavin, Mostafa Bayomi, João P. Cabral and M. Rami Ghorab, Text Summarization and Speech Synthesis for the Automated Generation of Personalized Audio Presentations, 20th International Conference on Application of Natural Language to Information Systems (NLDB), Passau, Germany, 17-19 June, Springer, 2015, pp. 307-320, Conference Paper, PUBLISHED
Christy Elias, João P. Cabral and Nick Campbell, Audio features for the Classification of Engagement, Workshop on Engagement in Social Intelligent Virtual Agents, Delft, Netherlands, 25 August, 2015, pp. 8-12, Conference Paper, PUBLISHED
João P. Cabral, Yuyun Huang, Christy Elias, Ketong Su and Nick Campbell, Interface for Monitoring of Engagement from Audio-Visual Cues, The 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, Vienna, Austria, 11-13 September, ISCA, 2015, Poster, PUBLISHED



Award                                                                              Date
Awarded a Commercial Case Feasibility Support Grant from Enterprise Ireland        2014
My main research topic is speech processing for improving natural interaction with computer systems. It includes expressive Text-To-Speech (TTS) synthesis and the recognition of verbal and non-verbal signals, for example the analysis and generation of social signals and paralinguistic features such as laughter and tone of voice, which play an important role in people's engagement with interactive systems. I'm also interested in the analysis of emotion and affect in speech. I have expertise in the analysis and modelling of glottal source parameters, which are strongly correlated with voice quality, such as breathy or tense voice. In addition, I'm interested in using machine learning algorithms on data from several modalities (audio, video and biometric sensors) to improve the prediction of human cognitive states and physical context in real scenarios.