Publications By Type

Gerasimos Potamianos has joined the Institute of Informatics and Telecommunications at the National Center for Scientific Research (NCSR), "Demokritos", in Athens, Greece, as a Research Director.

He can be contacted at gpotam@ieee.org.

Updated information can be found at http://www.iit.demokritos.gr/~gpotam

This web page is no longer being maintained.




Gerasimos (Makis) Potamianos

Manager, Multimodal Conversational Solutions Department
Human Language Technologies / Multilingual Analytics and User Technologies, IBM T.J. Watson Research Center


JOURNAL ARTICLES

  1. D. Mostefa, N. Moreau, K. Choukri, G. Potamianos, S.M. Chu, A. Tyagi, J.R. Casas, J. Turmo, L. Christoforetti, F. Tobia, A. Pnevmatikakis, V. Mylonakis, F. Talantzis, S. Burger, R. Stiefelhagen, K. Bernardin, and C. Rochet, The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms, To Appear: Journal of Language Resources and Evaluation, 2008.
  2. Z. Zhang, G. Potamianos, A.W. Senior, and T.S. Huang, Joint face and head tracking inside multi-camera smart rooms, Signal, Image and Video Processing, vol. 1, pp. 163-178, 2007.
  3. J. Huang, G. Potamianos, J. Connell, and C. Neti, Audio-visual speech recognition using an infrared headset, Speech Communication, vol. 44, no. 4, pp. 83-96, 2004.
  4. G. Potamianos, C. Neti, G. Gravier, A. Garg, and A.W. Senior, Recent advances in the automatic recognition of audio-visual speech, Invited, Proceedings of the IEEE, vol. 91, no. 9, pp. 1306-1326, 2003.
  5. G. Potamianos, C. Neti, G. Iyengar, A.W. Senior, and A. Verma, A cascade visual front end for speaker independent automatic speechreading, Int. J. Speech Technology, vol. 4, pp. 193-208, 2001.
  6. G. Potamianos and F. Jelinek, A study of n-gram and decision tree letter language modeling methods, Speech Communication, vol. 24, no. 3, pp. 171-192, 1998.
  7. G. Potamianos and J. Goutsias, Stochastic approximation algorithms for partition function estimation of Gibbs random fields, IEEE Transactions on Information Theory, vol. 43, no. 6, pp. 1948-1965, 1997.
  8. G. Potamianos and J. Goutsias, Partition function estimation of Gibbs random field images using Monte Carlo simulations, IEEE Transactions on Information Theory, vol. 39, no. 4, pp. 1322-1332, 1993.

BOOK CHAPTERS (APPEARED / IN PRESS)

  1. G. Potamianos, C. Neti, J. Luettin, and I. Matthews, Audio-Visual Automatic Speech Recognition: An Overview, To Appear: Audio-Visual Speech Processing, E. Vatikiotis-Bateson, G. Bailly, and P. Perrier (Eds.), MIT Press, ISBN: 0-26-222078-4, 2008.
  2. G. Potamianos, Audio-Visual Speech Recognition, Short Article, Encyclopedia of Language and Linguistics, Second Edition, (Speech Technology Section - Computer Understanding of Speech), K. Brown (Ed. In Chief), Elsevier, Oxford, United Kingdom, ISBN: 0-08-044299-4, 2006.
  3. P.S. Aleksic, G. Potamianos, and A.K. Katsaggelos, Exploiting Visual Information in Automatic Speech Processing, In: Handbook of Image and Video Processing, Second Edition, Al Bovik (Ed.), ch. 10.8, pp. 1263-1289, Elsevier Academic Press, Burlington, MA, ISBN: 0-12-119792-1, 2005.

BOOK CHAPTERS (SUBMITTED)

  1. P. Lucey, G. Potamianos, and S. Sridharan, Visual Speech Recognition Across Multiple Views, Submitted to: Visual Speech Recognition: Lip Segmentation and Mapping, A. Wee-Chung Liew and S. Wang (Eds.), Information Science Publishing Press, 2008.
  2. G. Potamianos, L. Lamel, M. Wolfel, J. Huang, E. Marcheret, C. Barras, J. McDonough, J. Hernando, D. Macho, and C. Nadeu, Automatic Speech Recognition in CHIL, Submitted to: Computers in the Human Interaction Loop, A. Waibel and R. Stiefelhagen (Eds.), Springer, 2008.
  3. K. Bernardin, R. Stiefelhagen, A. Pnevmatikakis, O. Lanz, A. Brutti, J. Casas, and G. Potamianos, Joint Person Tracking in CHIL, Submitted to: Computers in the Human Interaction Loop, A. Waibel and R. Stiefelhagen (Eds.), Springer, 2008.

CONFERENCE ARTICLES

  1. A. Tyagi, J.W. Davis, and G. Potamianos, Steepest descent for efficient covariance tracking, To Appear in: Proc. IEEE Work. Motion and Video Computing (WMVC), Copper Mountain, Colorado, 2008.
  2. V. Libal, J. Connell, G. Potamianos, and E. Marcheret, An embedded system for in-vehicle visual speech activity detection, Int. Work. Multimedia Signal Process. (MMSP), pp. 255-258, Chania, Greece, 2007.
  3. P. Lucey, G. Potamianos, and S. Sridharan, A unified approach to multi-pose audio-visual ASR, Proc. Conf. Int. Speech Comm. Assoc. (Interspeech), pp. 650-653, Antwerp, Belgium, 2007.
  4. J. Huang, E. Marcheret, K. Visweswariah, V. Libal, and G. Potamianos, Detection, diarization, and transcription of far-field lecture speech, Proc. Conf. Int. Speech Comm. Assoc. (Interspeech), pp. 2161-2164, Antwerp, Belgium, 2007.
  5. J. Huang, E. Marcheret, K. Visweswariah, and G. Potamianos, The IBM RT07 evaluation systems for speaker diarization on lecture meetings, (To Appear) Proc. Rich Transcription Evaluation Work. (RT), Baltimore, Maryland, 2007.
  6. J. Huang, E. Marcheret, K. Visweswariah, V. Libal, and G. Potamianos, The IBM Rich Transcription Spring 2007 speech-to-text systems for lecture meetings, (To Appear) Proc. Rich Transcription Evaluation Work. (RT), Baltimore, Maryland, 2007.
  7. P. Lucey, G. Potamianos, and S. Sridharan, An extended pose-invariant lipreading system, Proc. Work. Audio-Visual Speech Process. (AVSP), pp. 176-180, Hilvarenbeek, The Netherlands, 2007.
  8. A. Tyagi, M. Keck, J.W. Davis, and G. Potamianos, Kernel-based 3D tracking, Proc. IEEE Int. Work. Visual Surveillance (VS/CVPR), Minneapolis, Minnesota, 2007.
  9. A. Tyagi, G. Potamianos, J.W. Davis, and S.M. Chu, Fusion of multiple camera views for kernel-based 3D tracking, Proc. IEEE Works. Motion and Video Computing (WMVC), Austin, Texas, 2007.
  10. E. Marcheret, V. Libal, and G. Potamianos, Dynamic stream weight modeling for audio-visual speech recognition, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 4, pp. 945-948, Honolulu, HI, 2007.
  11. G. Potamianos and P. Lucey, Audio-visual ASR from multiple views inside smart rooms, Proc. Int. Conf. Multisensor Fusion and Integration for Intelligent Systems (MFI), pp. 35-40, Heidelberg, Germany, 2006.
  12. Z. Zhang, G. Potamianos, S.M. Chu, J. Tu, and T.S. Huang, Person tracking in smart rooms using dynamic programming and adaptive subspace learning, Proc. Int. Conf. Multimedia Expo. (ICME), pp. 2061-2064, Toronto, Canada, 2006.
  13. P. Lucey and G. Potamianos, Lipreading using profile versus frontal views, Proc. Works. Multimedia Signal Process. (MMSP), pp. 24-28, Victoria, Canada, 2006.
  14. A.W. Senior, G. Potamianos, S. Chu, Z. Zhang, and A. Hampapur, A comparison of multicamera person-tracking algorithms, Proc. IEEE Int. Works. Visual Surveillance (VS/ECCV), Graz, Austria, 2006.
  15. G. Potamianos and Z. Zhang, A joint system for single-person 2D-face and 3D-head tracking in CHIL seminars, Proc. CLEAR Evaluation Works., LNCS vol. 4122, Southampton, United Kingdom, 2006.
  16. Z. Zhang, G. Potamianos, M. Liu, and T. Huang, Robust multi-view multi-camera face detection inside smart rooms using spatio-temporal dynamic programming, Proc. Int. Conf. Automatic Face and Gesture Recog. (FGR), Southampton, United Kingdom, 2006.
  17. E. Marcheret, G. Potamianos, K. Visweswariah, and J. Huang, The IBM RT06s evaluation system for speech activity detection in CHIL seminars, Proc. RT06s Evaluation Works. - held with Joint Works. on Multimodal Interaction and Related Machine Learning Algorithms (MLMI), LNCS 4299, pp. 323-335, Washington DC, 2006.
  18. J. Huang, M. Westphal, S. Chen, O. Siohan, D. Povey, V. Libal, A. Soneiro, H. Schulz, T. Ross, and G. Potamianos, The IBM rich transcription spring 2006 speech-to-text system for lecture meetings, Proc. RT06s Evaluation Works. - held with Joint Works. on Multimodal Interaction and Related Machine Learning Algorithms (MLMI), LNCS 4299, pp. 432-443, Washington DC, 2006.
  19. G. Potamianos and P. Scanlon, Exploiting lower face symmetry in appearance-based automatic speechreading, Proc. Works. Audio-Visual Speech Process. (AVSP), pp. 79-84, Vancouver Island, Canada, 2005.
  20. S.M. Chu, E. Marcheret, and G. Potamianos, Automatic speech recognition and speech activity detection in the CHIL smart room, Proc. Joint Works. on Multimodal Interaction and Related Machine Learning Algorithms (MLMI), LNCS vol. 3869, pp. 332-343, Edinburgh, United Kingdom, 2005.
  21. Z. Zhang, G. Potamianos, A. Senior, S. Chu, and T. Huang, A joint system for person tracking and face detection, Proc. Int. Works. Human-Computer Interaction (ICCV 2005 Works. on HCI), pp. 47-59, Beijing, China, 2005.
  22. E. Marcheret, K. Visweswariah, and G. Potamianos, Speech activity detection fusing acoustic phonetic and energy features, Proc. Europ. Conf. Speech Comm. Technol. (Interspeech), pp. 241-244, Lisbon, Portugal, 2005.
  23. J. Jiang, G. Potamianos, and G. Iyengar, Improved face finding in visually challenging environments, Proc. Int. Conf. Multimedia Expo (ICME), Amsterdam, The Netherlands, 2005.
  24. D. Macho, J. Padrell, A. Abad, C. Nadeu, J. Hernando, J. McDonough, M. Wolfel, U. Klee, M. Omologo, A. Brutti, P. Svaizer, G. Potamianos, and S.M. Chu, Automatic speech activity detection, source localization, and speech recognition on the CHIL seminar corpus, Proc. Int. Conf. Multimedia Expo (ICME), Amsterdam, The Netherlands, 2005.
  25. P. Scanlon, G. Potamianos, V. Libal, and S.M. Chu, Mutual information based visual feature selection for lipreading, Proc. Int. Conf. Spoken Lang. Process. (ICSLP), Jeju Island, Korea, 2004.
  26. E. Marcheret, S.M. Chu, V. Goel, and G. Potamianos, Efficient likelihood computation in multi-stream HMM based audio-visual speech recognition, Proc. Int. Conf. Spoken Lang. Process. (ICSLP), Jeju Island, Korea, 2004.
  27. G. Potamianos, C. Neti, J. Huang, J.H. Connell, S. Chu, V. Libal, E. Marcheret, N. Haas, and J. Jiang, Towards practical deployment of audio-visual speech recognition, Invited, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 3, pp. 777-780, Montreal, Canada, 2004.
  28. J. Jiang, G. Potamianos, H. Nock, G. Iyengar, and C. Neti, Improved face and feature finding for audio-visual speech recognition in visually challenging environments, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 5, pp. 873-876, Montreal, Canada, 2004.
  29. S.M. Chu, V. Libal, E. Marcheret, C. Neti, and G. Potamianos, Multistage information fusion for audio-visual speech recognition, Proc. Int. Conf. Multimedia Expo (ICME), Taipei, Taiwan, 2004.
  30. G. Potamianos, C. Neti, and S. Deligne, Joint audio-visual speech processing for recognition and enhancement, Proc. Works. Audio-Visual Speech Process., pp. 95-104, St. Jorioz, France, 2003.
  31. J. Huang, G. Potamianos, and C. Neti, Improving audio-visual speech recognition with an infrared headset, Proc. Works. Audio-Visual Speech Process. (AVSP), pp. 175-178, St. Jorioz, France, 2003.
  32. G. Potamianos and C. Neti, Audio-visual speech recognition in challenging environments, Proc. Eur. Conf. Speech Comm. Tech. (Eurospeech), pp. 1293-1296, Geneva, Switzerland, 2003.
  33. J.H. Connell, N. Haas, E. Marcheret, C. Neti, G. Potamianos, and S. Velipasalar, A real-time prototype for small-vocabulary audio-visual ASR, Proc. Int. Conf. Multimedia Expo (ICME), vol. II, pp. 469-472, Baltimore, MD, 2003.
  34. U.V. Chaudhari, G.N. Ramaswamy, G. Potamianos, and C. Neti, Information fusion and decision cascading for audio-visual speaker recognition based on time varying stream reliability prediction, Proc. Int. Conf. Multimedia Expo (ICME), vol. III, pp. 9-12, Baltimore, MD, July 2003.
  35. A. Garg, G. Potamianos, C. Neti, and T.S. Huang, Frame-dependent multi-stream reliability indicators for audio-visual speech recognition, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. I, pp. 24-27, Hong Kong, China, 2003.
  36. U.V. Chaudhari, G.N. Ramaswamy, G. Potamianos, and C. Neti, Audio-visual speaker recognition using time-varying stream reliability prediction, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. V, pp. 712-715, Hong Kong, China, 2003.
  37. S. Deligne, G. Potamianos, and C. Neti, Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization), Proc. Int. Conf. Spoken Lang. Process. (ICSLP), pp. 1449-1452, Denver, CO, 2002.
  38. R. Goecke, G. Potamianos, and C. Neti, Noisy audio feature enhancement using audio-visual speech data, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 2025-2028, Orlando, FL, 2002.
  39. G. Gravier, S. Axelrod, G. Potamianos, and C. Neti, Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 853-856, Orlando, FL, 2002.
  40. G. Gravier, G. Potamianos, and C. Neti, Asynchrony modeling for audio-visual speech recognition, Proc. Human Language Technology Conference (HLT), pp. 1-6, San Diego, CA, 2002.
  41. G. Potamianos, C. Neti, G. Iyengar, and E. Helmuth, Large-vocabulary audio-visual speech recognition by machines and humans, Proc. Europ. Conf. Speech Comm. Technol. (Eurospeech), pp. 1027-1030, Aalborg, Denmark, 2001.
  42. G. Potamianos and C. Neti, Automatic speechreading of impaired speech, Proc. Works. Audio-Visual Speech Process. (AVSP), pp. 177-182, Aalborg, Denmark, 2001.
  43. G. Potamianos and C. Neti, Improved ROI and within frame discriminant features for lipreading, Proc. Int. Conf. Image Process. (ICIP), vol. III, pp. 250-253, Thessaloniki, Greece, 2001.
  44. C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, and D. Vergyri, Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins Summer 2000 Workshop, Proc. Works. Multimedia Signal Process. (MMSP), pp. 619-624, Cannes, France, 2001.
  45. G. Iyengar, G. Potamianos, C. Neti, T. Faruquie, and A. Verma, Robust detection of visual ROI for automatic speechreading, Proc. Works. Multimedia Signal Process. (MMSP), pp. 79-84, Cannes, France, 2001.
  46. I. Matthews, G. Potamianos, C. Neti, and J. Luettin, A comparison of model and transform-based visual features for audio-visual LVCSR, Proc. Int. Conf. Multimedia Expo (ICME), Tokyo, Japan, 2001.
  47. G. Potamianos, J. Luettin, and C. Neti, Hierarchical discriminant features for audio-visual LVCSR, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 1, pp. 165-168, Salt Lake City, UT, 2001.
  48. J. Luettin, G. Potamianos, and C. Neti, Asynchronous stream modeling for large-vocabulary audio-visual speech recognition, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 1, pp. 169-172, Salt Lake City, UT, 2001.
  49. H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, Weighting schemes for audio-visual fusion in speech recognition, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 1, pp. 173-176, Salt Lake City, UT, 2001.
  50. G. Potamianos and C. Neti, Stream confidence estimation for audio-visual speech recognition, Proc. Int. Conf. Spoken Language Process. (ICSLP), vol. III, pp. 746-749, Beijing, China, 2000.
  51. C. Neti, G. Iyengar, G. Potamianos, A. Senior, and B. Maison, Perceptual interfaces for information interaction: Joint processing of audio and visual information for human-computer interaction, Proc. Int. Conf. Spoken Language Process. (ICSLP), vol. III, pp. 11-14, Beijing, China, 2000.
  52. G. Potamianos, A. Verma, C. Neti, G. Iyengar, and S. Basu, A cascade image transform for speaker independent automatic speechreading, Proc. Int. Conf. Multimedia Expo (ICME), vol. II, pp. 1097-1100, New York, NY, 2000.
  53. E. Cosatto, G. Potamianos, and H.P. Graf, Audio-visual unit selection for the synthesis of photo-realistic talking-heads, Proc. Int. Conf. Multimedia Expo (ICME), vol. II, pp. 619-622, New York, 2000.
  54. G. Potamianos and A. Potamianos, Speaker adaptation for audio-visual automatic speech recognition, Proc. Europ. Conf. Speech Comm. Technol. (Eurospeech), vol. 3, pp. 1291-1294, Budapest, Hungary, 1999.
  55. G. Potamianos and H.P. Graf, Linear discriminant analysis for speechreading, Proc. Works. Multimedia Signal Process., pp. 221-226, Los Angeles, CA, 1998.
  56. G. Potamianos, H.P. Graf, and E. Cosatto, An image transform approach for HMM based automatic lipreading, Proc. Int. Conf. Image Process. (ICIP), vol. III, pp. 173-177, Chicago, IL, 1998.
  57. G. Potamianos and H.P. Graf, Discriminative training of HMM stream exponents for audio-visual speech recognition, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 6, pp. 3733-3736, Seattle, WA, 1998.
  58. H.P. Graf, E. Cosatto, and G. Potamianos, Machine vision of faces and facial features, Proc. R.I.E.C. Int. Symp. Design Archit. Inform. Process. Systems Based Brain Inform. Princ., pp. 48-53, Sendai, Japan, 1998.
  59. G. Potamianos, E. Cosatto, H.P. Graf, and D.B. Roe, Speaker independent audio-visual database for bimodal ASR, Proc. Europ. Tutorial Research Work. Audio-Visual Speech Process. (AVSP), pp. 65-68, Rhodes, Greece, 1997.
  60. H.P. Graf, E. Cosatto, and G. Potamianos, Robust recognition of faces and facial features with a multi-modal system, Proc. Int. Conf. Systems Man Cybern. (SMC), pp. 2034-2039, Orlando, FL, 1997.
  61. G. Potamianos, Efficient Monte Carlo estimation of partition function ratios of Markov random field images, Proc. Conf. Inform. Sci. Systems (CISS), vol. II, pp. 1212-1215, Princeton, NJ, 1996.
  62. G. Potamianos and J. Goutsias, A unified approach to Monte Carlo likelihood estimation of Gibbs random field images, Proc. Conf. Inform. Sci. Systems (CISS), vol. I, pp. 84-90, Princeton, NJ, 1994.
  63. G. Potamianos and J. Goutsias, An analysis of Monte Carlo methods for likelihood estimation of Gibbsian images, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. V, pp. 519-522, Minneapolis, MN, 1993.
  64. G. Potamianos and J. Goutsias, On computing the likelihood function of partially observed Markov random field images using Monte Carlo simulations, Proc. Conf. Inform. Sci. Systems (CISS), vol. I, pp. 357-362, Princeton, NJ, 1992.
  65. G. Potamianos and J. Goutsias, A novel method for computing the partition function of Markov random field images using Monte Carlo simulations, Proc. Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 4, pp. 2325-2328, Toronto, Canada, 1991.
  66. G. Potamianos and J. Diamessis, Frequency sampling design of 2-D IIR filters using continued fractions, Proc. Int. Symp. Circuits Systems (ISCAS), vol. 3, pp. 2454-2457, New Orleans, LA, 1990.
  67. J. Diamessis and G. Potamianos, A novel method for designing IIR filters with nonuniform samples, Proc. Conf. Inform. Sci. Systems (CISS), vol. 1, pp. 192-195, Princeton, NJ, 1990.
  68. J. Diamessis and G. Potamianos, Modeling unequally spaced 2-D discrete signals by rational functions, Proc. Int. Symp. Circuits Systems (ISCAS), vol. 2, pp. 1508-1511, Portland, OR, 1989.

PATENTS

  1. J.H. Connell, N. Haas, E. Marcheret, C.V. Neti, and G. Potamianos, Audio-Only Backoff in Audio-Visual Speech Recognition System, Patent No.: US007251603B2, July 31, 2007.
  2. U.V. Chaudhari, C. Neti, G. Potamianos, and G.N. Ramaswamy, Automated Decision Making Using Time-Varying Stream Reliability Prediction, Patent No.: US007228279B2, June 5, 2007.
  3. P. de Cuetos, G.R. Iyengar, C.V. Neti, and G. Potamianos, System and Method for Microphone Activation Using Visual Speech Cues, Patent No.: US006754373B1, June 22, 2004.
  4. E. Cosatto, H.P. Graf, G. Potamianos, and J. Schroeter, Audio-Visual Selection Process for the Synthesis of Photo-Realistic Talking-Head Animations, Patent No.: US006654018B1, Nov. 25, 2003.
  5. E. Cosatto, H.P. Graf, and G. Potamianos, Robust multi-modal method for recognizing objects, Patent No.: US006118887A, Sep. 12, 2000.

Last Update: Jan. 30, 2008