Day 1 (10 August)

8.30 am Registration

9.30 am KEYNOTE 1: Jennifer Green (45 min presentation + 15 min questions)

Multimodal diversity: Indigenous narrative practices from the Central Australian deserts

The verbal arts are often viewed as amongst the most treasured aspects of a culture’s linguistic and musical achievements. Yet understanding why particular forms emerge, and unravelling their multimodal complexities, remain challenges for understanding human language. The deserts of Central Australia are home to some unique Indigenous narrative practices. In sand stories, a traditional form perfected by women and girls, the soft sand provides a palette for dynamic inscriptions that are accompanied by speech, gesture, and sometimes sign and song. A rich appreciation of the aesthetic dimensions of such verbal art forms is achieved if the communicative signal is not reduced, and if the contributions that one modality makes to meaning are not favoured at the expense of others. In this presentation I argue for a broad approach to language. I bring examples from on-going research on multimodal narrative practices and alternate sign languages from Central and Northern Australia to illustrate the processes, tools, and techniques involved in their documentation and analysis.

10.30 am Coffee break (30 min)

11.00 am Presenter introductions (30 min)

11.30 am   Modality, stress and atypical processing
Do visual cues to interrogativity vary between language modalities? Evidence from spoken Portuguese and Portuguese Sign Language (20 min + 10 min)
Do gestures during training facilitate L2 lexical stress acquisition by Dutch learners of Spanish? (20 min + 10 min)
Auditory-visual speech perception in bipolar disorder: behavioural data and physiological predictions (20 min + 10 min)

1.00 pm Lunch and a short history of AVSP

2.00 pm   Emotion 1
Multi-Modal Speech Emotion Recognition Using Speech Embeddings And Audio Features (20 min + 10 min)
Learning Salient Features for Multimodal Emotion Recognition with Recurrent Neural Networks and Attention Based Fusion (20 min + 10 min)
The development of eye gaze patterns during audiovisual perception of affective and phonetic information (20 min + 10 min)

3.30 pm Coffee break (30 min)

4.00 pm   Emotion 2
Auditory and Visual Emotion Recognition: Investigating why some portrayals are better recognized than others (20 min + 10 min)
Unbalanced visuo-auditory interactions for gender and emotions processing (20 min + 10 min)

5.00 pm Speech reading contest – drinks
7.00 pm Dinner (University Cafe, 257 Lygon Street, Carlton, 3053)

Day 2 (11 August)

9.30 am KEYNOTE 2: Denis Burnham (45 min presentation + 15 min questions)

Auditory-Visual Speech Perception: Ubiquity and Utility

It is now well-established that speech perception is multimodal; perceivers use both auditory and visual speech information whenever it is available in noisy and even clear auditory contexts. Here, the case for such ubiquity of auditory-visual speech perception is strengthened by presentation of research on (i) the use of visual information in the head and face not only for the perception of segments (consonants and vowels), but also the perception of lexical tones; (ii) infant perception and perceptual reorganisation for consonants, vowels and tones, in both auditory-only and auditory-visual contexts; and (iii) the language-general phonetic basis for auditory-visual speech perception despite language-specific phonological differences. The case for the utility of auditory-visual speech information will be demonstrated by presentation of research on (i) the specific locus of differential auditory-visual speech perception development; and (ii) the function of auditory-visual speech perception in perceptual reorganisation and learning to read.

10.30 am Coffee break (30 min)

11.00 am Children/infants
Auditory-Visual Speech Segmentation in Infants (20 min + 10 min)
Audiovisual benefits for speech processing speed among children with hearing loss (20 min + 10 min)
Four-Year-Olds’ Cortical Tracking to Continuous Auditory-Visual Speech (20 min + 10 min)

12.30 pm Jonas Beskow demonstrates Furhat (30 min)

1.00 pm Lunch (60 min)

2.00 pm   Visual speech processing
Neural processing of degraded speech using speaker’s mouth movement (20 min + 10 min)
Auditory-Visual Integration During the Attentional Blink (20 min + 10 min)
Visual Correlates of Thai Lexical Tone Production: Motion of the Head, Eyebrows and Larynx? (20 min + 10 min)

3.30 pm Coffee break (30 min)

4.00 pm   Artificial agents/smart devices
Embodied Conversational Agents and Interactive Virtual Humans for Training Simulators (20 min + 10 min)
Audio-visual synthesized attitudes presented by the German speaking robot SMiRAE (20 min + 10 min)
LiP25w: Word-level Lip Reading Web Application for Smart Device (20 min + 10 min)