Emotionally Evocative Environments for Training
Iyer, K., Valanejad, K., Sadek, R., Miraglia, D., Milam, D.
Army Science Conference 2002
(2002)
This paper describes a project currently in progress at the University of Southern California’s Institute for Creative Technologies (ICT). Much of the research at ICT involves developing better graphics, sound and artificial intelligence to be used in creating the next generation of training tools for the United States Army. Our project focuses on the use of emotional responses as an enhancement for training.
Applying Perceptually Driven Cognitive Mapping to Virtual Urban Environments
Han, C.
14th Innovative Applications of Artificial Intelligence Conference (IAAI 02)
(2002)
This paper describes a method for building a cognitive map of a virtual urban environment. Our routines enable virtual humans to map their environment using a realistic model of perception. We based our implementation on a computational framework proposed by Yeap and Jefferies (Yeap & Jefferies 1999) for representing a local environment as a structure called an Absolute Space Representation (ASR). Their algorithms compute and update ASRs from a 2-1/2D sketch of the local environment, and then connect the ASRs together to form a raw cognitive map. Our work extends the framework developed by Yeap and Jefferies in three important ways. First, we implemented the framework in a virtual training environment, the Mission Rehearsal Exercise (Swartout et al. 2001). Second, we describe a method for acquiring a 2-1/2D sketch in a virtual world, a step omitted from their framework, but which is essential for computing an ASR. Third, we extend the ASR algorithm to map regions that are partially visible through exits of the local space. Together, the implementation of the ASR algorithm along with our extensions will be useful in a wide variety of applications involving virtual humans and agents who need to perceive and reason about spatial concepts in urban environments.
Alternative Model for Sound Signals Encountered in Reverberant Environments
Georgiou, P., Kyriakakis, C.
Audio Engineering Society Convention Paper Presented at the 113th Convention 2002
(2002)
In this paper we investigate an alternative to the Gaussian density for modeling signals encountered in audio environments. The observation that sound signals are impulsive in nature, combined with the reverberation effects commonly encountered in audio, motivates the use of the Sub-Gaussian density. The new Sub-Gaussian statistical model and the separable solution of its Maximum Likelihood estimator are derived. These are used in an array scenario to demonstrate with both simulations and two different microphone arrays the achievable performance gains. The simulations exhibit the robustness of the sub-Gaussian based method while the real world experiments reveal a significant performance gain, supporting the claim that the sub-Gaussian model is better suited for sound signals.
Stable Modeling of Noise and Robust Time-Delay Estimation in the Presence of Impulsive Noise
Georgiou, P., Kyriakakis, C.
Audio Engineering Society Convention Paper Presented at the 113th Convention 2002
(2002)
A new representation of audio noise signals is proposed, based on symmetric α-stable (SαS) distributions in order to better model the outliers that exist in real signals. This representation addresses a shortcoming of the Gaussian model, namely, the fact that it is not well suited for describing signals with impulsive behavior. The α-stable and Gaussian methods are used to model measured noise signals. It is demonstrated that the α-stable distribution, which has heavier tails than the Gaussian distribution, gives a much better approximation to real-world audio signals. The significance of these results is shown by considering the time delay estimation (TDE) problem for source localization in teleimmersion applications. In order to achieve robust sound source localization, a novel time delay estimation approach is proposed. It is based on fractional lower order statistics (FLOS), which mitigate the effects of heavy-tailed noise. An improvement in TDE performance is demonstrated using FLOS that is up to a factor of four better than what can be achieved with second-order statistics.
Emotional Variation in Speech-Based Natural Language Generation
Fleischman, M.
International Natural Language Generation Conference 2002 (INLG'02)
(2002)
We present a framework for handling emotional variations in a speech-based natural language system for use in the MRE virtual training environment. The system is a first step toward addressing issues in emotion-based modeling of verbal communicative behavior. We cast the problem of emotional generation as a distance minimization task, in which the system chooses between multiple valid realizations for a given input based on the emotional distance of each realization from the speaker’s attitude toward that input.
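The distance-minimization idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how emotion is represented or which distance is used, so the two-dimensional emotion vectors, the Euclidean metric, the function names, and the candidate sentences below are all assumptions made for illustration only.

```python
import math

def emotional_distance(a, b):
    """Euclidean distance between two emotion vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def choose_realization(realizations, attitude):
    """Return the text whose emotion vector lies closest to the speaker's attitude.

    realizations: list of (text, emotion_vector) pairs, all valid for the same input.
    attitude: emotion vector for the speaker's attitude toward that input.
    """
    best = min(realizations, key=lambda r: emotional_distance(r[1], attitude))
    return best[0]

# Hypothetical candidates with invented (valence, arousal) scores.
candidates = [
    ("The driver hit the boy.", (-0.8, 0.7)),
    ("The boy was hurt in a collision.", (-0.4, 0.3)),
    ("There was an accident.", (-0.1, 0.1)),
]

chosen = choose_realization(candidates, attitude=(-0.3, 0.3))
```

With the attitude vector shown, the second candidate is selected because its score is nearest; changing the attitude changes the wording without changing the propositional content.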
Playing the Game: Autonomous Agents Go Hollywood
Douglas, J.
International Conference on Information Systems, Analysis and Synthesis SCI 2002/ISAS 2002
(2002)
The task of creating an interactive story in real-time is complicated by the lack of a mechanism for balancing the demands placed on the storyteller by a human interactor who is sometimes a character in a mini-drama and sometimes the target of a story told by other characters. Inspired by “story within a story” cinema, such as The Game and The Truman Show, I suggest that the tools of narrative critical theory open a space for using techniques of narrative analysis to construct interactive stories, and I describe an architecture for an interactive storytelling environment and an autonomous storytelling agent.
A Photometric Approach to Digitizing Cultural Artifacts
Cohen, J.
2nd International Symposium on Virtual Reality, Archaeology, and Cultural Heritage (VAST 2001)
(Glyfada, Greece, November 2001)
In this paper we present a photometry-based approach to the digital documentation of cultural artifacts. Rather than representing an artifact as a geometric model with spatially varying reflectance properties, we instead propose directly representing the artifact in terms of its reflectance field - the manner in which it transforms light into images. The principal device employed in our technique is a computer-controlled lighting apparatus which quickly illuminates an artifact from an exhaustive set of incident illumination directions and a set of digital video cameras which record the artifact’s appearance under these forms of illumination. From this database of recorded images, we compute linear combinations of the captured images to synthetically illuminate the object under arbitrary forms of complex incident illumination, correctly capturing the effects of specular reflection, subsurface scattering, self-shadowing, mutual illumination, and complex BRDF’s often present in cultural artifacts. We also describe a computer application that allows users to realistically and interactively relight digitized artifacts.
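The relighting step described above, a linear combination of the captured images, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the array layout, the function name `relight`, and the per-direction RGB weights are hypothetical, and the sketch assumes the images are already linear in radiance.

```python
import numpy as np

def relight(basis_images, light_weights):
    """Synthesize a relit image as a weighted sum of basis images.

    basis_images: array of shape (n_lights, height, width, 3), one image per
        incident illumination direction (assumed linear in radiance).
    light_weights: array of shape (n_lights, 3), the RGB intensity of the
        target illumination arriving from each of those directions.
    """
    basis = np.asarray(basis_images, dtype=np.float64)
    weights = np.asarray(light_weights, dtype=np.float64)
    # Sum over the lighting-direction axis, scaling each color channel separately.
    return np.einsum("nhwc,nc->hwc", basis, weights)

# Two tiny 1x1 "images" captured under two lighting directions (invented values).
basis = np.array([
    [[[1.0, 1.0, 1.0]]],
    [[[2.0, 2.0, 2.0]]],
])
weights = np.array([
    [1.0, 0.0, 0.5],
    [0.5, 1.0, 0.0],
])
relit = relight(basis, weights)
```

Because light transport is linear, the weighted sum reproduces effects already present in the captured images, such as self-shadowing, without an explicit geometric model.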
Olson, M., Traum, D., Van-ess Dykema, C., Weinberg, A.
Machine Translation Summit VIII
(Spain, September 2001)
Bharitkar, S., Kyriakakis, C.
Audio Engineering Society Convention Paper Presented at the 111th Convention 2001
(September 21-24)
Room acoustical modes, particularly in small rooms, cause a significant variation in the room responses measured at different locations. Responses measured only a few cm apart can vary by up to 15-20 dB at certain frequencies. This makes it difficult to equalize an audio system for multiple simultaneous listeners. Previous methods have utilized multiple microphones and spatial averaging with equal weighting. In this paper we present a different multiple point equalization method. We first determine representative prototypical room responses derived from several room responses that share similar characteristics, using the fuzzy unsupervised learning method. These prototypical responses can then be combined to form a general point response. When we use the inverse of the general point response as an equalizing filter, our results show a significant improvement in equalization performance over the spatial averaging methods. This simultaneous equalization is achieved by suppressing the peaks in the room magnitude spectrums. Applications of this method thus include equalization and multiple point sound control at home and in automobiles.
Intelligent Virtual Agents for Education and Training: Opportunities and Challenges
Rickel, J.
Intelligent Virtual Agents: Third International Workshop, IVA 2001
(Madrid, Spain, September 10-11, 2001)
Interactive virtual worlds provide a powerful medium for experiential learning. Intelligent virtual agents can cohabit virtual worlds with people and facilitate such learning as guides, mentors, and teammates. This paper reviews the main pedagogical advantages of animated agents in virtual worlds, discusses two key research challenges, and outlines an ambitious new project addressing those challenges.
Cohen, J., Tchou, C., Hawkins, T., Debevec, P.
Eurographics Rendering Workshop 2001
(June 2001)
This paper presents a technique for representing and displaying high dynamic-range texture maps (HDRTMs) using current graphics hardware. Dynamic range in real-world environments often far exceeds the range representable in 8-bit per-channel texture maps. The increased realism afforded by a high-dynamic range representation provides improved fidelity and expressiveness for interactive visualization of image-based models. Our technique allows for real-time rendering of scenes with arbitrary dynamic range, limited only by available texture memory. In our technique, high-dynamic range textures are decomposed into sets of 8-bit textures. These 8-bit textures are dynamically reassembled by the graphics hardware’s programmable multitexturing system or using multipass techniques and framebuffer image processing. These operations allow the exposure level of the texture to be adjusted continuously and arbitrarily at the time of rendering, correctly accounting for the gamma curve and dynamic range restrictions of the display device. Further, for any given exposure only two 8-bit textures must be resident in texture memory simultaneously. We present implementation details of this technique on various 3D graphics hardware architectures. We demonstrate several applications, including high-dynamic range panoramic viewing with simulated auto-exposure, real-time radiance environment mapping, and simulated Fresnel reflection.
Damiano, R., Traum, D.
NAACL 2001 workshop on Adaptation in Dialogue Systems
(2001)
Toward the Holodeck: Integrating Graphics, Sound, Character and Story
Johnson, L., Kyriakakis, C., LaBore, C., Lindheim, R., Miraglia, D., Moore, B., Rickel, J., Thiébaux, M., Tuch, L., Whitney, R.
Proceedings of 5th International Conference on Autonomous Agents
(Montreal, Canada, June 2001)
We describe an initial prototype of a holodeck-like environment that we have created for the Mission Rehearsal Exercise Project. The goal of the project is to create an experience learning system where the participants are immersed in an environment where they can encounter the sights, sounds, and circumstances of real-world scenarios. Virtual humans act as characters and coaches in an interactive story with pedagogical goals.
Adaptive Narrative
Douglas, J., Gratch, J.
Proceedings of the 5th International Conference on Autonomous Agents
(Montreal, Canada, June 2001)
Creating dramatic narratives for real-time virtual reality environments is complicated by the lack of temporal distance between the occurrence of an event and its telling in the narrative. This paper describes the application of a multiprocessing operating system architecture to the creation of adaptive narratives, narratives that use autonomous actors or agents to create real-time dramatic experiences for human interactors. We also introduce the notion of dramatic acts and dramatic functions and indicate their use in constructing this real-time drama.
Ligorio, B., Mininni, G., Traum, D.
1st European Conference on Computer-Supported Collaborative Learning (Euro-CSCL 2001)
(March 2001)
Steve Goes to Bosnia: Towards a New Generation of Virtual Humans for Interactive Experiences
Rickel, J.
AAAI Spring Symposium on Artificial Intelligence and Interactive Entertainment
(Stanford University, CA, March 2001)
Interactive virtual worlds provide a powerful medium for entertainment and experiential learning. Our goal is to enrich such virtual worlds with virtual humans: autonomous agents that support face-to-face interaction with people in these environments in a variety of roles. While supporting face-to-face interaction in virtual worlds is a daunting task, this paper argues that the key building blocks are already in place. We propose an ambitious integration of core technologies centered on a common representation of task knowledge, and we describe an implemented virtual world and set of characters for an Army peace-keeping scenario that illustrates our vision.
Bharitkar, S., Kyriakakis, C.
2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Selectively cancelling signals at specific locations within an acoustical environment with multiple listeners is of significant importance for home theater, automobile, teleconferencing, office, industrial and other applications. We have proposed the eigenfilter for selectively cancelling signals in one direction, while attempting to retain them at unintentional directions. In this paper we investigate the behaviour of the performance measure (i.e., the gain) for a vowel and an unvoiced fricative, when the listener moves his head, in an automobile type environment. We show that in such a situation, a large energy in the difference between the impulse responses at a listener’s location may affect the gain substantially.
Bharitkar, S., Kyriakakis, C.
Conference Proceeding
(Mohonk, NY)
In this paper we address the problem of simultaneous room response equalization for multiple listeners. Traditional approaches to this problem have used a single microphone at the listening position to measure impulse responses from a loudspeaker and then use an inverse filter to correct the frequency response. The problem with that approach is that it only works well for that one point and in most cases is not practical even for one listener with a typical ear spacing of 18 cm. It does not work at all for other listeners in the room, or if the listener changes positions even slightly. We propose a new approach that is based on the Fuzzy c-means clustering technique. We use this method to design equalization filters and demonstrate that we can achieve better equalization performance for several locations in the room simultaneously as compared to single point or simple averaging methods.
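For readers unfamiliar with the clustering step named in this abstract, the following is a minimal sketch of standard fuzzy c-means. It is not the authors' equalization pipeline: the function name, the fuzzifier `m=2.0`, the iteration count, and the one-dimensional toy data are assumptions. In the setting described above, each data row would presumably be a measured room response and each resulting center a prototype response.

```python
import numpy as np

def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means clustering.

    data: array of shape (n_points, n_features).
    Returns (centers, memberships), where memberships has shape
    (n_points, n_clusters) and every row sums to 1.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=np.float64)
    # Random initial memberships, normalized so each row sums to 1.
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are means of the data weighted by memberships raised to m.
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        # Distance from every point to every center, floored to avoid division by zero.
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)
        # Membership update: inversely related to distance, normalized per point.
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u

# Toy data: two well-separated groups of one-dimensional "responses".
points = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
centers, memberships = fuzzy_c_means(points, n_clusters=2)
```

Unlike hard clustering, each point keeps a graded membership in every cluster, which is what allows prototypes to be formed from responses that only partly resemble one another.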
A Real Time High Dynamic Range Light Probe
Waese, J.
Siggraph Technical Sketches
(2001)
HDR Shop
Tchou, C.
Siggraph Technical Sketches
(2001)
Use of Model Transformations for Distributed Speech Recognition
Srinivasamurthy, N., Ortega, A.
4th ISCA Tutorial and Research Workshop on Speech Synthesis
(2001)
Due to bandwidth limitations, the speech recognizer in distributed speech recognition (DSR) applications has to use encoded speech - either traditional speech encoding or speech encoding optimized for recognition. The penalty incurred in reducing the bitrate is degradation in speech recognition performance. The diversity of the applications using DSR implies that a variety of speech encoders can be used to compress speech. By treating the encoder variability as a mismatch we propose using model transformation to reduce the speech recognition performance degradation. The advantage of using model transformation is that only a single model set needs to be trained at the server, which can be adapted on the fly to the input speech data. We were able to reduce the word error rate by 61.9 %, 63.3 % and 56.3 % for MELP, GSM and MFCC-encoded data, respectively, by using MAP adaptation, which shows the generality of our proposed scheme.
Recognition of Negative Emotions from the Speech Signal
Lee, C., Holman, T.
Automatic Speech Recognition and Understanding Workshop (ASRU 2001)
(2001)
This paper reports on methods for automatic classification of spoken utterances based on the emotional state of the speaker. The data set used for the analysis comes from a corpus of human-machine dialogs recorded from a commercial application deployed by SpeechWorks. Linear discriminant classification with Gaussian class-conditional probability distribution and k-nearest neighborhood methods are used to classify utterances into two basic emotion states, negative and non-negative. The features used by the classifiers are utterance-level statistics of the fundamental frequency and energy of the speech signal. To improve classification performance, two specific feature selection methods are used; namely, promising first selection and forward feature selection. Principal component analysis is used to reduce the dimensionality of the features while maximizing classification accuracy. Improvements obtained by feature selection and PCA are reported in this paper.
Light Stage 2.0
Cohen, J., Tchou, C.
Siggraph Technical Sketches
(2001)
Adaptive Karhunen-Loeve Transform for Enhanced Multichannel Audio Coding
Yang, D., Ai, H., Kyriakakis, C., Kuo, C.
SPIE
(San Diego, CA, 2001)
A modified MPEG Advanced Audio Coding (AAC) scheme that uses the Karhunen-Loeve transform (KLT) to remove inter-channel redundancy, called the MAACKL method, was proposed in our previous work. However, straightforward coding of the elements of the KLT matrix generates about 240 bits per matrix for typical 5-channel audio content. This overhead is too expensive to allow MAACKL to update the KLT dynamically over short periods of time. In this research, we study the de-correlation efficiency of adaptive KLT as well as an efficient way to encode elements of the KLT matrix via vector quantization. The effects of different quantization accuracies and adaptation periods are examined carefully. It is demonstrated that with the smallest possible number of bits per matrix and a moderately long KLT adaptation time, the MAACKL algorithm can still achieve very good coding performance.
Embedded High-Quality Multichannel Audio Coding
Yang, D., Ai, H., Kyriakakis, C., Kuo, C.
Conference on Media Processors, Symposium on Electronic Imaging 2001
(San Jose, CA, January 21-26, 2001)
An embedded high-quality multichannel audio coding algorithm is proposed in this research. The Karhunen-Loeve Transform (KLT) is applied to multichannel audio signals in the pre-processing stage to remove inter-channel redundancy. Then, after processing of several audio coding blocks, transformed coefficients are layered quantized and the bit stream is ordered according to their importance. The multichannel audio bit stream generated by the proposed algorithm has a fully progressive property, which is highly desirable for audio multicast applications in heterogeneous networks. Experimental results show that, compared with the MPEG Advanced Audio Coding (AAC) algorithm, the proposed algorithm achieves a better performance with both the objective MNR (Mask-to-Noise-Ratio) measurement and the subjective listening test at several different bit rates.
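The KLT pre-processing step mentioned in this abstract can be sketched in a few lines. This is a generic illustration of inter-channel decorrelation, not the coder described in the paper: the function names, the block-wise covariance estimate, and the synthetic five-channel signal are assumptions, and quantization and bit-stream ordering are omitted entirely.

```python
import numpy as np

def klt(channels):
    """Decorrelate audio channels with a Karhunen-Loeve transform.

    channels: array of shape (n_channels, n_samples).
    Returns (transformed, basis, mean); the rows of basis are eigenvectors of
    the inter-channel covariance matrix, ordered by decreasing eigenvalue.
    """
    x = np.asarray(channels, dtype=np.float64)
    mean = x.mean(axis=1, keepdims=True)
    centered = x - mean
    # Inter-channel covariance estimated over the block.
    cov = centered @ centered.T / x.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    basis = eigvecs[:, order].T
    return basis @ centered, basis, mean

def inverse_klt(transformed, basis, mean):
    """Invert the transform; the basis is orthonormal, so its transpose is its inverse."""
    return basis.T @ transformed + mean

# Synthetic five-channel block in which every channel shares a common component.
rng = np.random.default_rng(0)
common = rng.standard_normal(1000)
signal = np.stack([common + 0.1 * rng.standard_normal(1000) for _ in range(5)])

transformed, basis, mean = klt(signal)
restored = inverse_klt(transformed, basis, mean)
```

After the transform, most of the energy sits in the first output channel and the channels are mutually uncorrelated, which is the redundancy removal the abstract refers to.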
Marsella, S., Gratch, J., Rickel, J.
Proceedings of the Agents2001 Workshop on Representing, Annotating, and Evaluating Non-Verbal and Verbal Communicative Acts to Achieve Contextual Embodied Agents
(Montreal, Canada, June 2001)
A person’s behavior provides significant information about their emotional state, attitudes, and attention. Our goal is to create virtual humans that convey such information to people while interacting with them in virtual worlds. The virtual humans must respond dynamically to the events surrounding them, which are fundamentally influenced by users’ actions, while providing an illusion of human-like behavior. A user must be able to interpret the dynamic cognitive and emotional state of the virtual humans using the same nonverbal cues that people use to understand one another. Towards these goals, we are integrating and extending components from three prior systems: a virtual human architecture with a range of cognitive and motor capabilities, a model of emotional appraisal, and a model of the impact of emotional state on physical behavior. We describe the key research issues, our approach, and an initial implementation in an Army peacekeeping scenario.
Mouchtaris, A., Reveliotis, P., Kyriakakis, C.
IEEE Transactions on Multimedia
(2000)
Immersive audio systems can be used to render virtual sound sources in three-dimensional (3-D) space around a listener. This is achieved by simulating the head-related transfer function (HRTF) amplitude and phase characteristics using digital filters. In this paper, we examine certain key signal processing considerations in spatial sound rendering over headphones and loudspeakers. We address the problem of crosstalk inherent in loudspeaker rendering and examine two methods for implementing crosstalk cancellation and loudspeaker frequency response inversion in real time. We demonstrate that it is possible to achieve crosstalk cancellation of 30 dB using both methods, but one of the two (the Fast RLS Transversal Filter Method) offers a significant advantage in terms of computational efficiency. Our analysis is easily extendable to nonsymmetric listening positions and moving listeners.
Bharitkar, S., Kyriakakis, C.
ICME 2000
(New York, July 2000)
Selectively canceling signals at specific locations within an acoustical environment with multiple listeners is of significant importance for home theater, teleconferencing, office, industrial and other applications. The traditional noise cancellation approach is impractical for such applications because it requires sensors that must be placed on the listeners. In this paper we propose an alternative method to minimize signal power in a given location and maximize signal power in another location of interest. A key advantage of this approach is that it eliminates the need for sensors. We investigate the use of an information theoretic criterion known as mutual information to design filter coefficients that selectively cancel a signal in one audio channel, and transmit it in another (complementary) channel. Our results show an improvement in power gain at one location in the room relative to the other.
A Multiple Input Single Output Model for Immersive Audio Rendering in Real Time
Georgiou, P., Kyriakakis, C.
ICME 2000
(New York, July 2000)
Accurate localization of sound in 3-D space is based on variations in the spectrum of sound sources. These variations arise mainly from reflection and diffraction effects caused by the pinnae and are described through a set of Head-Related Transfer Functions (HRTF’s) that are unique for each azimuth and elevation angle. A virtual sound source can be rendered in the desired location by filtering with the corresponding HRTF for each ear. Previous work on HRTF modeling has mainly focused on methods that attempt to model each transfer function individually. These methods are generally computationally complex and cannot be used for real-time spatial rendering of multiple moving sources. In this work we provide an alternative approach, which uses a multiple input single output state space system to create a combined model of the HRTF’s for all directions. This method exploits the similarities among the different HRTF’s to achieve a significant reduction in the model size with a minimum loss of accuracy.
Bharitkar, S., Kyriakakis, C.
ISPACS 2000
(Hawaii, 2000)
Selectively canceling signals at specific locations within an acoustical environment with multiple listeners is of significant importance for home theater, automobile, teleconferencing, office, industrial and other applications. The traditional noise cancellation approach is impractical for such applications because it requires sensors that must be placed on the listeners. In this paper we investigate the theoretical properties of eigenfilters for signal cancellation proposed in [1]. We also investigate the sensitivity of the eigenfilter as a function of the room impulse response duration. Our results show that with the minimum phase model for the room impulse response, we obtain a better behaviour in the sensitivity of the filter to the duration of the room response.
Efficient Scalable Speech Compression for Scalable Speech Recognition
Srinivasamurthy, N., Ortega, A.
IEEE International Conference on Multimedia & Expo
(2000)
We propose a scalable recognition system for reducing recognition complexity. Scalable recognition can be combined with scalable compression in a distributed speech recognition (DSR) application to reduce both the computational load and the bandwidth requirement at the server. A low complexity preprocessor is used to eliminate the unlikely classes so that the complex recognizer can use the reduced subset of classes to recognize the unknown utterance. It is shown that by using our system it is fairly straightforward to trade off reductions in complexity against performance degradation. Results of preliminary experiments using the TI-46 word digit database show that the proposed scalable approach can provide a 40% speed up, while operating under 1.05 kbps, compared to the baseline recognition using uncompressed speech.
Learning Domain Knowledge for Teaching Procedural Skills
Scholer, A., Rickel, J., Angros, R., Johnson, L.
AAAI-2000 Fall Symposium on Learning How to Do Things
(2000)
Traum, D., Andersen, C., Chong, Y., Josyula, D., Okamoto, Y., Purang, K., O'Donovan-Anderson, M., Perlis, D.
Electronic Transactions on Artificial Intelligence 3(D)
(1999)