# 1. Prof. Tatsuya Kawahara

### Kyoto University

## Topic: Making a Robot Communicate with Social Signals

## Abstract

Chatbots and dialogue systems have improved impressively thanks to large language models, but it is not straightforward to deploy them in a communicative robot that realizes natural spoken dialogue. Major challenges include smooth turn-taking and real-time response without latency. Another important feature is social signals, which provide feedback in a non-lexical manner. It is desirable for a robot to detect them in human speech and also to generate them itself. We have explored this direction with the human-like social robot ERICA. Specifically, we have investigated the generation of backchannels with appropriate timing, form, and prosody, and demonstrated that it significantly improves the dialogue experience. We also explore the generation of shared laughter, which conveys empathy toward the dialogue partner. These studies confirm that social signals are fundamental to human communication.

## Biography

Tatsuya Kawahara received his B.E. in 1987, M.E. in 1989, and Ph.D. in 1995, all in information science, from Kyoto University, Kyoto, Japan. From 1995 to 1996, he was a Visiting Researcher at Bell Laboratories, Murray Hill, NJ, USA. He is currently a Professor and the Dean of the School of Informatics, Kyoto University. He has published more than 450 academic papers on speech recognition, spoken language processing, and spoken dialogue systems. He has led several projects, including the speech recognition software Julius, the automatic transcription system deployed in the Japanese Parliament (Diet), and the autonomous android ERICA. From 2003 to 2006, he was a member of the IEEE SPS Speech Technical Committee. He was General Chair of IEEE ASRU 2007, and also served as Tutorial Chair of INTERSPEECH 2010, Local Arrangement Chair of ICASSP 2012, and General Chair of APSIPA ASC 2020. He has been an editorial board member of Computer Speech and Language (Elsevier) and the IEEE/ACM Transactions on Audio, Speech, and Language Processing. From 2018 to 2021, he was the Editor-in-Chief of APSIPA Transactions on Signal and Information Processing. Dr. Kawahara is the President of APSIPA, a board member of ISCA, and a Fellow of IEEE.

# 2. Prof. Kosin Chamnongthai

### Department of Electronic and Telecommunication Engineering, Faculty of Engineering, King Mongkut's University of Technology Thonburi (KMUTT)

## Topic: Visual Sensors in the Digital Transformation Era

## Abstract

The talk begins by reviewing what is happening in the digital transformation era, and then focuses on case studies of visual sensor development. One such development is 3D point-of-interest (POI) estimation, which plays an important basic role in many applications such as intention reading for stroke patients, customer intention estimation, worker intention estimation for collaborative robots, game entertainment, and so on. This talk introduces three methods of 3D POI estimation: one using eye gaze alone, one using multimodal fusion of hand pointing and eye gaze, and one using multimodal fusion of hand pointing, eye gaze, and depth information for collaborative robots. Exploiting the assumption that the human head always moves, the first method finds the 3D POI as the crossing point between a pair of consecutive eye-gaze rays detected by an eye tracker, while the second method estimates the 3D POI as the crossing point, within the space of interest (SOI), between eye-gaze and hand-pointing rays, detected by an eye tracker and a Leap Motion sensor, respectively.
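Both of these methods reduce geometrically to finding where two 3D rays (nearly) cross. The following is a minimal sketch of that step, not taken from the talk: since detected gaze and pointing rays are noisy and rarely intersect exactly, a common estimate is the midpoint of the shortest segment connecting the two rays.

```python
import numpy as np

def ray_crossing_point(p1, d1, p2, d2):
    """Estimate a 3D point of interest as the midpoint of the shortest
    segment between two (possibly skew) rays, e.g. two consecutive
    eye-gaze rays, or an eye-gaze ray and a hand-pointing ray.

    p1, p2: ray origins (e.g., eye position, fingertip)
    d1, d2: ray direction vectors
    """
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    d1 = np.asarray(d1, dtype=float) / np.linalg.norm(d1)
    d2 = np.asarray(d2, dtype=float) / np.linalg.norm(d2)

    # Minimize |(p1 + t1*d1) - (p2 + t2*d2)| over the ray parameters t1, t2.
    w = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2  # a = c = 1 after normalization
    denom = a * c - b * b
    if np.isclose(denom, 0.0):
        raise ValueError("rays are parallel; no unique crossing point")
    t1 = (b * (d2 @ w) - c * (d1 @ w)) / denom
    t2 = (a * (d2 @ w) - b * (d1 @ w)) / denom
    closest1 = p1 + t1 * d1
    closest2 = p2 + t2 * d2
    return (closest1 + closest2) / 2.0
```

For exactly intersecting rays the two closest points coincide, so the midpoint is the intersection itself; for noisy rays the residual distance between them can also serve as a confidence measure.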
On the other hand, the last method assumes a working situation with human workers and collaborative robots at a working site, where a workpiece may become an obstacle and interfere with the 3D POI estimation. To solve this problem, a depth sensor system is mathematically designed and added to the 3D POI estimation system to sense all needed views, making it possible to reconstruct the 3D shape of all objects in the working site. The 3D POI is then determined from a volume of interest (VOI), the 3D crossing space created by eye gaze, hand pointing, and a 3D object reconstructed from depth information. The talk introduces these three methods of 3D POI estimation and discusses their pros and cons across many applications.

## Biography

Kosin Chamnongthai is currently a professor in the Department of Electronic and Telecommunication Engineering, Faculty of Engineering, King Mongkut's University of Technology Thonburi (KMUTT), and serves as Vice President (Conference) of the APSIPA Association. He served as President of the ECTI Association (2018-2019), editor of the ECTI e-magazine (2011-2015), associate editor of ECTI-CIT Transactions (2011-2016), associate editor of ECTI-EEC Transactions (2003-2010), associate editor of ELEX (IEICE Transactions) (2008-2010), and chairman of IEEE ComSoc Thailand (2004-2007). He received his B.Eng. in Applied Electronics from the University of Electro-Communications, Tokyo, Japan, in 1985, his M.Eng. in Electrical Engineering from Nippon Institute of Technology, Saitama, Japan, in 1987, and his Ph.D. in Electrical Engineering from Keio University, Tokyo, Japan, in 1991. His research interests include computer vision, image processing, robot vision, signal processing, and pattern recognition. He is a senior member of IEEE and a member of IEICE, TESA, ECTI, AIAT, APSIPA, TRS, and EEAAT.

# 3. Prof. Weisi Lin

### Nanyang Technological University (NTU), Singapore

## Title: Representation & Applications of 3D Point Clouds

## Abstract

3D point clouds (PCs) are now available for diversified applications in our work and life, with the creation of digital twins for almost everything in the physical world, enabled by the rapid development of depth sensing and photogrammetry. They offer unprecedented possibilities in digital transformation and smart cities, ranging from a single object (like a desk or statue) to an entire city, toward BIM/VR/AR/metaverse, smart manufacturing, urban surveillance/planning, cultural heritage preservation, crime investigation, robot navigation, autonomous driving, and medical/biological sciences. This talk will present recent research and development in PC representation, evaluation, and utilities. Related important topics include various kinds of filtering and processing, simplification, compression, shape/mesh construction, and image-based localization. More exciting possibilities are expected because PCs provide a bridge between computer vision and computer graphics, two domains that had long been largely separate, and facilitate comprehensive multimedia interaction. Possible future research directions will therefore be highlighted and discussed as well.

## Biography

Weisi Lin is an active researcher in image processing, perception-based signal modelling and assessment, video compression, and multimedia communication. He was the Lab Head, Visual Processing, Institute for Infocomm Research (I2R), Singapore. He is currently a Professor in the School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore, where he also serves as the Associate Chair (Research). He is a Fellow of IEEE and IET. He was named a Highly Cited Researcher in 2019, 2020, 2021, and 2022 by Clarivate Analytics, and elected for the Research Award 2023, College of Engineering, NTU.
He has been a Distinguished Lecturer of both the IEEE Circuits and Systems Society (2016-17) and the Asia-Pacific Signal and Information Processing Association (2012-13). He has been an Associate Editor for IEEE Trans. Neural Networks Learn. Syst., IEEE Trans. Image Process., IEEE Trans. Circuits Syst. Video Technol., IEEE Trans. Multim., IEEE Sig. Process. Lett., Quality and User Experience, and J. Visual Commun. Image Represent. He has been a Technical Program Chair for several international conferences and is a General Co-Chair for IEEE ICME 2025. He believes that good theory is practical, and has delivered 10+ major systems for industrial deployment with the technology developed.

# 4. Prof. Toshihisa Tanaka

### Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology

## Title: AI-based diagnostic aid for epileptic electroencephalography

## Abstract

This talk addresses an "AI-based diagnosis and treatment aid" system that learns from the electroencephalograms (EEG) of epilepsy patients and the diagnoses made by clinicians. Japan has about one million epilepsy patients, but only a very limited number of clinicians (about 800) can properly read and interpret patients' EEG. Epileptic seizures sometimes cause serious traffic accidents, so social measures are highly desirable. We established AI-based automated algorithms that can learn the diagnoses of medical specialists in epilepsy from EEG data measured in hospitals. I will report techniques for constructing an EEG dataset and for learning the interpretation of the data.

## Biography

CV: https://tanaka.sip.tuat.ac.jp/cv

# 5. Assoc. Prof. Sakriani Sakti

### Japan Advanced Institute of Science and Technology (JAIST), Japan

## Title: A Machine Speech Chain Approach for Self-Adaptive Lombard TTS in Noisy Environments

## Abstract

The development of text-to-speech synthesis (TTS) has enabled computers to learn how to speak, imitating the capability of human speech production. Recent neural TTS systems have successfully synthesized high-quality speech. However, TTS speech intelligibility degrades in noisy environments. Furthermore, TTS has no ability to grasp the situation, as it cannot hear its own voice. A common approach to handling noisy conditions in TTS is offline fine-tuning, which is generally applied to static noises and predefined conditions. Humans, on the other hand, never perform offline fine-tuning. Instead, they speak with the Lombard effect in noisy places, dynamically adjusting their vocal effort to improve the audibility of their speech. This ability is supported by the speech chain mechanism, which involves auditory feedback passing from speech perception to speech production. In this talk, I will present our alternative approach to TTS in noisy environments that is closer to the human Lombard effect. Specifically, we implement self-adaptive Lombard TTS in a machine speech chain framework that enables the TTS to control its voice statically and dynamically in noisy environments based on auditory feedback from ASR and the signal-to-noise ratio (SNR).

## Biography

Sakriani Sakti is currently an associate professor at the Japan Advanced Institute of Science and Technology (JAIST), Japan, an adjunct associate professor at the Nara Institute of Science and Technology (NAIST), Japan, a visiting research scientist at the RIKEN Center for Advanced Intelligence Project (RIKEN AIP), Japan, and an adjunct professor at the University of Indonesia. She received her B.E. degree in Informatics (cum laude) from ITB and her M.Sc. and Ph.D. in Communication Technology from the University of Ulm, Germany. She was actively involved in international collaboration activities such as the Asia-Pacific Telecommunity Project (2003-2007) and various speech-to-speech translation research projects, including A-STAR and U-STAR (2006-2011).
She also served as a visiting scientific researcher at INRIA Paris-Rocquencourt, France, in 2015-2016, under the JSPS Strategic Young Researcher Overseas Visits Program for Accelerating Brain Circulation. She is currently a committee member of the IEEE SLTC and an associate editor of IEEE/ACM TASLP, Frontiers in Language Sciences, and IEICE. She is also an ELRA Board member. She was involved in creating the joint ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL) and currently serves as SIGUL Chair.

# 6. Assist. Prof. Candy Olivia Mawalim

### School of Information Science, Japan Advanced Institute of Science and Technology (JAIST)

## Topic: Speech signal processing for privacy and security protection

## Abstract

Speech signal processing plays a vital role in various applications, ranging from communication systems to voice assistants and speech analytics. However, the increasing reliance on speech data gives rise to privacy concerns and security threats. The talk begins by providing an overview of speech signal processing for secure speech communication. Next, we will introduce various techniques and approaches used to address security and privacy concerns. Finally, some ongoing research projects and challenges relevant to security and privacy protection will be discussed.

## Biography

Candy Olivia Mawalim received her B.S. in Computer Science from Institut Teknologi Bandung (ITB), Bandung, Indonesia. She received her M.S. and Ph.D. from the School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), in 2019 and 2022, respectively. She was selected as a JSPS research fellow for young scientists (DC1) in FY2020-2022. Her main research interests are speech signal processing, hearing perception, voice privacy, and machine learning. Since April 2022, she has worked as an assistant professor in the Social Signal Interaction Group, School of Information Science, JAIST.
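As a toy illustration of the signal-level voice-privacy theme in the abstract above (a generic sketch, not one of the speaker's techniques): uniform resampling of a waveform shifts its pitch, perturbing one speaker-dependent cue, at the cost of also changing the duration. Practical anonymization systems use far more careful transforms that preserve intelligibility and timing.

```python
import numpy as np

def naive_pitch_shift(signal, factor):
    """Toy speaker 'anonymization': uniformly resample the waveform so
    that, at the original playback rate, all frequencies are scaled by
    `factor`. Note this also scales the duration by 1/factor.
    """
    signal = np.asarray(signal, dtype=float)
    n_out = int(len(signal) / factor)
    # Sample the original waveform on a compressed/stretched time grid.
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)

# Example: raise the pitch of a 100 Hz tone by 20% (yielding ~120 Hz).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 100 * t)
shifted = naive_pitch_shift(tone, 1.2)
```

Because such a crude shift is easily undone and degrades naturalness, it mainly serves to motivate the trade-off between privacy protection and speech utility discussed in the talk.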