Kang, S., Gratch, J., & Wang, N.
Intelligent Virtual Agent 2008
(Tokyo, Japan, 09/02/2008)
This study explored associations between the five-factor personality traits of human subjects and their feelings of rapport when they interacted with a virtual agent or real humans. The agent, the Rapport Agent, responded to real human speakers’ storytelling behavior using contingent, but only nonverbal, feedback. We further investigated how interactants’ personalities were related to the three components of rapport: positivity, attentiveness, and coordination. The results revealed that more agreeable people showed strong self-reported rapport and weak behavioral-measured rapport in the disfluency dimension with the Rapport Agent, whereas with real humans agreeableness showed no significant association with either self-reported rapport or the disfluency dimension. The conclusions provide fundamental data to further develop the rapport theory that would contribute to evaluating and enhancing the interactional fidelity of an agent in the design of virtual humans for social skills training and therapy.
Gandhe, S., DeVault, D., Roque, A., Artstein, R., Leuski, A., Gerten, J., Traum, D., & Martinovski, B.
Interspeech 2008
(Brisbane, Australia, 9/26/2008)
We present a new approach for rapidly developing dialogue capabilities for virtual humans. Starting from a domain specification, an integrated authoring interface automatically generates dialogue acts with all possible contents. These dialogue acts are linked to example utterances in order to provide training data for natural language understanding and generation. The virtual human dialogue system contains a dialogue manager following the information-state approach, using finite-state machines and SCXML to manage local coherence, as well as explicit modeling of emotions and compliance level and a grounding component based on evidence of understanding. Using the authoring tools, we design and implement a version of the virtual human Hassan and compare it to previous architectures for the character.
Carre, D. & Levasseur, M.
Institute for Creative Technologies, ICT-TR-03-2008,
(Marina del Rey, CA, September 17, 2007 - December 6, 2007)
Rapport between people and virtual human agents is not limited to just speech. There are many non-verbal behaviors such as gestures or facial expressions that can express feelings or convey a message. One of the challenges in making an agent appear more realistic is to make his non-verbal behaviors appear more natural. To accomplish this, it is essential to find out how and when gestures are performed.
In order to determine how gestures are performed, it is necessary to assess different appearances of the same gesture and the mapping between their respective functions.
To determine when gestures are performed, the key is to find relevant contextual features and their links with gestures, which will lead to the prediction of the moment they should be performed.
Finally, both of these issues can now be tackled with the provided toolbox. Preliminary results reveal some gesture patterns. In addition, we were able, based on contextual features, to predict when the agent should nod his head. Early results appear to show that the agent nods at an opportune time. Moreover, this toolbox generalizes the results to kinds of gestures other than head nods, which is the goal of this study.
de Kok, I.
Institute for Creative Technologies, ICT-TR-02-2008,
(Marina del Rey, CA, June 20, 2008)
In this report I will document the work I have done during my internship at the Institute for Creative Technologies from 22 January to 25 April under the supervision of Louis-Philippe Morency. During this time I have done research in the field of virtual humans, more specifically in the field of predicting and producing listener backchannels. But more on that later. I will start this report with a little background about the Institute for Creative Technologies and the project group of which I was part. After this, the goal of my internship will be explained in Section 2. A general overview of our approach to achieving the goals set in Section 2 will be given in Section 3. A more detailed description of the different steps taken will be given in Section 4. Following that, the results of the conducted research will be presented in Section 5. Finally, a discussion of the work done, recommendations for improvement, and future work will be given in Section 6.
Roque, A., & Traum, D.
9th SIGdial Workshop on Discourse and Dialogue
(Columbus, OH, June 19-20, 2008)
We introduce the Degrees of Grounding model, which defines the extent to which material being discussed in a dialogue has been grounded. This model has been developed and evaluated by a corpus analysis, and includes a set of types of evidence of understanding, a set of degrees of groundedness, a set of grounding criteria, and methods for identifying each of these. We describe how this model can be used for dialogue management.
Morency, L.P., de Kok, I., & Gratch, J.
Conference on Intelligent Virtual Agents (IVA 2008)
(Tokyo, Japan, September 1-3, 2008)
During face-to-face interactions, listeners use backchannel feedback such as head nods as a signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Model or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.
Morency, L.P., Whitehill, J., & Movellan, J.
8th International Conference on Automatic Face and Gesture Recognition (FG 2008)
(Amsterdam, The Netherlands, September 17-19, 2008)
Accurately estimating a person’s head position and orientation is an important task for a wide range of applications such as driver awareness and human-robot interaction. Over the past two decades, many approaches have been suggested to solve this problem, each with its own advantages and disadvantages. In this paper, we present a probabilistic framework called Generalized Adaptive View-based Appearance Model (GAVAM) which integrates the advantages from three of these approaches: (1) the automatic initialization and stability of static head pose estimation, (2) the relative precision and user-independence of differential registration, and (3) the robustness and bounded drift of keyframe tracking. In our experiments, we show how the GAVAM model can be used to estimate head position and orientation in real time using a simple monocular camera. Our experiments on two previously published datasets show that the GAVAM framework can accurately track for a long period of time (>2 minutes) with an average accuracy of 3.5° and 0.75 in with an inertial sensor and a 3D magnetic sensor.
Sun, X., Morency, L.P., Okanohara, D., & Tsujii, J.
The 22nd International Conference on Computational Linguistics (COLING 2008)
(Manchester, United Kingdom, August 18, 2008)
Shallow parsing is one of many NLP tasks that can be reduced to a sequence labeling problem. In this paper we show that the latent dynamics (i.e., hidden substructure of shallow phrases) constitute a problem in shallow parsing, and we show that modeling this intermediate structure is useful. By analyzing the automatically learned hidden states, we show how the latent conditional model explicitly learns latent dynamics. We propose in this paper the Best Label Path (BLP) inference algorithm, which is able to produce the most probable label sequence on latent conditional models. It outperforms two existing inference algorithms. With BLP inference, the LDCRF model significantly outperforms CRF models on word features, and achieves performance comparable to that of the most successful shallow parsers on the CoNLL data when further using part-of-speech features.
Bulitko, V., Solomon, S., Gratch, J., & van Lent, M.
The 10th International Conference on the Simulation of Adaptive Behavior (SAB); Workshop on the role of emotion in adaptive behavior and cognitive robotics.
(Osaka, Japan, July 11, 2008)
Culture and emotions have a profound impact on human behavior. Consequently, high-fidelity simulated interactive environments (e.g., trainers and computer games) that involve virtual humans must model socio-cultural and emotional effects on agent behavior. In this paper we discuss two recently fielded systems that do so independently: Culturally Affected Behavior (CAB) and EMotion and Adaptation (EMA). We then propose a simple language that combines the two systems in a natural way, thereby enabling simultaneous simulation of culturally and emotionally affected behavior. The proposed language is based on matrix algebra and can be easily implemented on single- or multi-core hardware with a standard matrix package (e.g., MATLAB or a C++ library). We then show how to extend the combined culture and emotion model with an explicit representation of religion and personality profiles.
Parsons, T.D., & Rizzo, A.A.
CyberPsychology & Behavior, 11, 1, 16-24
(2008)
The current project is an initial attempt at validating the Virtual Reality Cognitive Performance Assessment Test (VRCPAT), a virtual environment–based measure of learning and memory. To examine convergent and discriminant validity, a multitrait–multimethod matrix was used in which we hypothesized that the VRCPAT’s total learning and memory scores would correlate with other neuropsychological measures involving learning and memory but not with measures involving potential confounds (i.e., executive functions; attention; processing speed; and verbal fluency). Using a sequential hierarchical strategy, each stage of test development did not proceed until specified criteria were met. The 15-minute VRCPAT battery and a 1.5-hour in-person neuropsychological assessment were conducted with a sample of 30 healthy adults, between the ages of 21 and 36, that included equivalent distributions of men and women from ethnically diverse populations. Results supported both convergent and discriminant validity. That is, findings suggest that the VRCPAT measures a capacity that is (a) consistent with that assessed by traditional paper-and-pencil measures involving learning and memory and (b) inconsistent with that assessed by traditional paper-and-pencil measures assessing neurocognitive domains traditionally assumed to be other than learning and memory. We conclude that the VRCPAT is a valid test that provides a unique opportunity to reliably and efficiently study memory function within an ecologically valid environment.
Parsons, T.D., & Rizzo, A.A.
Journal of Behavior Therapy and Experimental Psychiatry, 39, 250-261
(2008)
Virtual reality exposure therapy (VRET) is an increasingly common treatment for anxiety and specific phobias. Lacking is a quantitative meta-analysis that enhances understanding of the variability and clinical significance of anxiety reduction outcomes after VRET. Searches of electronic databases yielded 52 studies, and of these, 21 studies (300 subjects) met inclusion criteria. Although meta-analysis revealed large declines in anxiety symptoms following VRET, moderator analyses were limited due to inconsistent reporting in the VRET literature. This highlights the need for future research studies that report uniform and detailed information regarding presence, immersion, anxiety and/or phobia duration, and demographics.
Gordon, A.
International Conference on Knowledge Management, Special Track on Intelligent Assistance for Self-Directed and Organizational Learning.
(Graz, Austria, September 3-5, 2008)
The stories told among members of an organization are an effective instrument for knowledge socialization, the sharing of experiences through social mechanisms. However, the utility of stories for organizational learning is limited due to the difficulties in acquiring stories that are relevant to the practices of an organization, identifying the learning goals that these stories serve, and delivering these stories to the right people at the right time in a manner that best facilitates learning. In this paper we outline a vision for story-based organizational learning in the future, and describe three areas where intelligent technologies can be applied to automate story management practices in support of organizational learning. First, we describe automated story capture technologies that identify narratives of people’s experiences within the context of a larger discourse. Second, we describe automated retrieval technologies that identify stories that are relevant to specific educational needs. Third, we describe how stories can be transformed into effective story-based learning environments with minimal development costs.
Gordon, A., & Swanson, R.
International Conference on New Media Technology, Special Track on Knowledge Acquisition From the Social Web.
(Graz, Austria, September 3-5, 2008)
In this position paper we present a vision of how the stories that people tell in Internet weblogs can be used directly for automated commonsense reasoning, specifically to support the core envisionment functions of event prediction, explanation, and imagination.
Gordon, A., Havasi, C., Lux, M., & Strohmaier, M.
2008 International Conference on Intelligent User Interfaces
(Canary Islands, Spain, January 13-16, 2008)
We present an overview of the workshop on Common Sense Knowledge and Goal-Oriented Interfaces held at the 2008 Intelligent User Interfaces conference. Six papers were accepted from diverse research groups, each offering innovative new research on interfaces that incorporate common sense knowledge and that are oriented around the goals of their users.
Artstein, R., Gandhe, S., Leuski, A., & Traum, D.
ELRA Workshop on Evaluation
(Marrakech, Morocco, 5/27/08)
We tested a life-size embodied question-answering character at a convention where he responded to questions from the audience. The character’s responses were then rated for coherence. The ratings, combined with speech transcripts, speech recognition results and the character’s responses, allowed us to identify where the character needs to improve, namely in speech recognition and providing off-topic responses.
Morie, J.
SPIE Electronic Imaging: The Engineering Reality of Virtual Reality 2008
(01/31/2008)
The idea of Virtual Reality once conjured up visions of new territories to explore, and expectations of awaiting worlds of wonder. VR has matured to become a practical tool for therapy, medicine and commercial interests, yet artists, in particular, continue to expand the possibilities for the medium. Artistic virtual environments created over the past two decades probe the phenomenological nature of these virtual environments. When we inhabit a fully immersive virtual environment, we have entered into a new form of Being. Not only does our body continue to exist in the real, physical world, we are also embodied within the virtual by means of technology that translates our bodied actions into interactions with the virtual environment. Very few states in human existence allow this bifurcation of our Being, where we can exist simultaneously in two spaces at once, with the possible exception of meta-physical states such as shamanistic trance and out-of-body experiences. This paper discusses the nature of this simultaneous Being, how we enter the virtual space, what forms of persona we can don there, what forms of spaces we can inhabit, and what type of wondrous experiences we can both hope for and expect.
Parsons, T., Kenny, P., Ntuen, C., Pataki, C., Pato, M., Rizzo, A., St-George, C., & Sugar, J.
Medicine Meets Virtual Reality Conference
(Long Beach, CA, February 2008)
Effective interview skills are a core competency for psychiatry residents and developing psychotherapists. Although schools commonly make use of standardized patients to teach interview skills, the diversity of the scenarios standardized patients can characterize is limited by the availability of human actors. Further, there is the economic concern related to the time and money needed to train standardized patients. Perhaps most damaging is the question of the “standardization” of standardized patients: will they in fact consistently proffer psychometrically reliable and valid interactions with the training clinicians? Virtual Human Agent (VHA) technology has evolved to a point where researchers may begin developing mental health applications that make use of virtual reality patients. The work presented here is a preliminary attempt at what we believe to be a large application area. Herein we describe an ongoing study of our virtual patients (VP). We present an approach that allows novice mental health clinicians to conduct an interview with a virtual character that emulates an adolescent male with conduct disorder. This study illustrates the ways in which a variety of core research components developed at the University of Southern California facilitate the rapid development of mental health applications.
van Velsen, Martin.
The Florida Artificial Intelligence Research Society (FLAIRS)
(Key West, Florida, 5/15/2008)
In this paper we present an authoring tool called Narratoria that allows non-technical experts in the field of digital entertainment to create interactive narratives with 3D graphics and multimedia. Narratoria allows experts in digital entertainment to participate in the generation of story-based military training applications. Users of the tools can create story-arcs, screenplays, pedagogical goals and AI models using a single software application. Using game engines, which provide direct visual output in a real-time feedback-loop, users can view the final product as they edit.
Gordon, A., Hobbs, J. & Cox, M.
AAAI Workshop on Metareasoning: Thinking about thinking.
(Chicago, IL, July 13-14, 2008)
Representations of an AI agent’s mental states and processes are necessary to enable metareasoning, i.e., thinking about thinking. However, the formulation of suitable representations remains an outstanding AI research challenge, with no clear consensus on how to proceed. This paper outlines an approach involving the formulation of anthropomorphic self-models, where the representations that are used for metareasoning are based on formalizations of commonsense psychology. We describe two research activities that support this approach: the formalization of broad-coverage commonsense psychology theories and the use of representations in the monitoring and control of object-level reasoning. We focus specifically on metareasoning about memory, but argue that anthropomorphic self-models support the development of integrated, reusable, broad-coverage representations for use in metareasoning systems.
Hobbs, J. & Gordon, A.
Workshop on Sentiment Analysis: Emotion, Metaphor, Ontology and Terminology (EMOT-08), 6th International Conference on Language Resources and Evaluation (LREC-08)
(Marrakech, Morocco, May 27, 2008)
We understand discourse so well because we know so much. If we are to have natural language understanding systems that are able to deal with texts with emotional content, we must encode knowledge of human emotions for use in the systems. In particular, we must equip the system with a formal version of people’s implicit theory of how emotions mediate between what they experience and what they do, and rules that link the theory with words and phrases in the emotional lexicon. The effort we describe here is part of a larger project in knowledge-based natural language understanding to construct a collection of abstract and concrete core formal theories of fundamental phenomena, geared to language, and to define or at least characterize the most common words in English in terms of these theories (Hobbs, 2008). One collection of theories we have put a considerable amount of work into is a commonsense theory of human cognition, or how people think they think (Hobbs and Gordon, 2005). A formal theory of emotions is an important piece of this. In this paper we describe this theory and our efforts to define a number of the most common words about emotions in terms of this and other theories. Vocabulary related to emotions has been studied extensively within the field of linguistics, with particular attention to cross-cultural differences (Athanasiadou and Tabakowska, 1998; Harkins and Wierzbicka, 2001; Wierzbicka, 1999). Within computational linguistics, there has been recent interest in creating large-scale text corpora where expressions of emotion and other private states are annotated (Wiebe et al., 2005). In Section 2 we describe Core WordNet and our categorization of it to determine the most frequent words about cognition and emotion. In Section 3 we describe an effort to flesh out the emotional lexicon by searching a large corpus for emotional terms, so we can have some assurance of high coverage in both the core theory and the lexical items linked to it. 
In Section 4 we sketch the principal facets of some of the core theories. In Section 5 we describe the theory of Emotion with several examples of words characterized in terms of the theories.
Solomon, S., van Lent, M., Core, M., Carpenter, P., & Rosenberg, M.
Proceedings of the 17th Conference on Behavior Representation in Modeling and Simulation (BRIMS 2008),
(Providence, RI, April 2008)
Increasingly, the military has requirements for teaching cultural awareness, which demands flexible representations of cultural knowledge. The Culturally-Affected Behavior project seeks to define a language for encoding ethnographic data in order to capture cultural knowledge and use that knowledge to affect human behavior models. Having anthropologists encode ethnographic data will validate the language and will result in a library of culture models for immersive training.
Manshadi, M., Swanson, R., Gordon, A.
Twenty-first International Conference of the Florida AI Society, Applied Natural Language Processing track
(Coconut Grove, FL, May 15-17, 2008)
One of the central problems in building broad-coverage story understanding systems is generating expectations about event sequences, i.e. predicting what happens next given some arbitrary narrative context. In this paper, we describe how a large corpus of stories extracted from Internet weblogs was used to learn a probabilistic model of event sequences using statistical language modeling techniques. Our approach was to encode weblog stories as sequences of events, one per sentence in the story, where each event was represented as a pair of descriptive key words extracted from the sentence. We then applied statistical language modeling techniques to each of the event sequences in the corpus. We evaluated the utility of the resulting model for the tasks of narrative event ordering and event prediction.
Gordon, A., Swanson, R.
International Conference on Weblogs and Social Media
(3/31/2008)
The phenomenal rise of Internet weblogging has created new opportunities for people to tell personal stories of their life experience, and the potential to share these stories with those who can most benefit from reading them. One barrier to this new mode of storytelling is the lack of accessibility; existing Internet search tools are not tailored to the unique characteristics of this textual genre. In this paper we describe our efforts to develop a search engine specifically for the stories that appear in Internet weblogs, called StoryUpgrade. This application utilizes statistical text classification technologies to separate story content from other text in weblog entries, and facilitates searches for stories that are related to particular activities of interest.
Swanson, R., Chew, E., Gordon, A.
AAAI Spring Symposium Series
(Stanford University, March 26-28, 2008)
Music and language are two human activities that fit well with a traditional notion of creativity and are particularly suited to computational exploration. In this paper we will argue for the necessity of syntactic processing in musical applications. Unsupervised methods offer uniquely interesting approaches to supporting creativity. We will demonstrate using the Constituent Context Model that syntactic structure of musical melodies can be learned automatically without annotated training data. Using a corpus built from the Well Tempered Clavier by Bach we describe a simple classification experiment that shows the relative quality of the induced parse trees for musical melodies.
Lane, H., Core, M., Gomboc, D., Karnavat, A., Rosenberg, M.
I/ITSEC
(Orlando, Florida, November 2007)
We describe some key issues involved in building an intelligent tutoring system for the ill-defined domain of interpersonal and intercultural skill acquisition. We discuss the consideration of mixed-result actions (actions with pros and cons), categories of actions (e.g. required steps vs. rules of thumb), the role of narrative, and reflective tutoring, among other topics. We present these ideas in the context of our work on an intelligent tutor for ELECT BiLAT, a game-based system to teach cultural awareness and negotiation skills for bilateral engagements. The tutor provides guidance in two forms: (1) as a coach that gives hints and feedback during an engagement with a virtual character, and (2) during an after-action review to help the learner reflect on their choices. Learner activities are mapped to learning objectives, which include whether the actions represent positive or negative evidence of learning. These underlie an expert model, student model, and models of coaching and reflective tutoring that support the learner. We describe several other cultural and interpersonal training systems that situate learners in goal based social contexts that include interaction with virtual characters and automated guidance. Finally, our future work includes evaluations of learning, expansion of the coach and reflective tutoring strategies, and integration of deeper knowledge-based resources that capture more nuanced cultural aspects of interaction.
Gordon, A., Cao, Q., Swanson, R.
Proceedings of the Fourth International Conference on Knowledge Capture
(Whistler, BC, October 28-31, 2007)
Among the most interesting ways that people share knowledge is through the telling of stories, i.e. first-person narratives about real life experiences. Millions of these stories appear in Internet weblogs, offering a potentially valuable resource for future knowledge management and training applications. In this paper we describe efforts to automatically capture stories from Internet weblogs by extracting them using statistical text classification techniques. We evaluate the precision and recall performance of competing approaches. We describe the large-scale application of story extraction technology to Internet weblogs, producing a corpus of stories with over a billion words.
Peers, P., Tamura, N., Matusik, W., Debevec, P.
ACM Transactions on Graphics (Proceedings of SIGGRAPH 2007)
(San Diego, CA, August 2007)
We propose a novel post-production facial performance relighting system for human actors. Our system uses just a dataset of view-dependent facial appearances with a neutral expression, captured for a static subject using a Light Stage apparatus. For the actual performance, however, a potentially different actor is captured under known, but static, illumination. During post-production, the reflectance field of the reference dataset actor is transferred onto the dynamic performance, enabling image-based relighting of the entire sequence. Our approach makes post-production relighting more practical and could easily be incorporated in a traditional production pipeline since it does not require additional hardware during principal photography. Additionally, we show that our system is suitable for real-time post-production illumination editing.
Ma, W., Hawkins, T., Peers, P., Chabert, C., Weiss, M., Debevec, P.
Conference Proceeding
We estimate surface normal maps of an object from either its diffuse or specular reflectance using four spherical gradient illumination patterns. In contrast to traditional photometric stereo, the spherical patterns allow normals to be estimated simultaneously from any number of viewpoints. We present two polarized lighting techniques that allow the diffuse and specular normal maps of an object to be measured independently. For scattering materials, we show that the specular normal maps yield the best record of detailed surface shape while the diffuse normals deviate from the true surface normal due to subsurface scattering, and that this effect is dependent on wavelength. We show several applications of this acquisition technique. First, we capture normal maps of a facial performance simultaneously from several viewing positions using time-multiplexed illumination. Second, we show that high-resolution normal maps based on the specular component can be used with structured light 3D scanning to quickly acquire high-resolution facial surface geometry using off-the-shelf digital still cameras. Finally, we present a real-time shading model that uses independently estimated normal maps for the specular and diffuse color channels to reproduce some of the perceptually important effects of subsurface scattering.
Paek, T., Gandhe, S., Chickering, D., Ju, Y.
Proceedings of ACL 2007 Speechgram Workshop: Grammar-based approaches to Spoken Language Processing
(Prague, June 2007)
In command and control (C&C) speech interaction, users interact by speaking commands or asking questions typically specified in a context-free grammar (CFG). Unfortunately, users often produce out-of-grammar (OOG) commands, which can result in misunderstanding or non-understanding. We explore a simple approach to handling OOG commands that involves generating a backoff grammar from any CFG using filler models, and utilizing that grammar for recognition whenever the CFG fails. Working within the memory footprint requirements of a mobile C&C product, applying the approach yielded a 35% relative reduction in semantic error rate for OOG commands. It also improved partial recognitions for enabling clarification dialogue.
Jan, D., Traum, D.
Proceedings of ACL 2007 workshop on Embodied Language Processing
(Prague, Czech Republic, June 2007)
For embodied agents to engage in realistic multiparty conversation, they must stand in appropriate places with respect to other agents and the environment. When these factors change, for example when an agent joins a conversation, the agents must dynamically move to a new location and/or orientation to accommodate. This paper presents an algorithm for simulating the movement of agents based on observed human behavior using techniques developed for pedestrian movement in crowd simulations. We extend a previous group conversation simulation to include an agent motion algorithm. We examine several test cases and show how the simulation generates results that mirror real-life conversation settings.
Gordon, A., Swanson, R.
Proceedings of the 2007 meeting of the Association for Computational Linguistics (ACL-07)
(Prague, Czech Republic, June 2007)
Large corpora of parsed sentences with semantic role labels (e.g. PropBank) provide training data for use in the creation of high-performance automatic semantic role labeling systems. Despite the size of these corpora, individual verbs (or role-sets) often have only a handful of instances in these corpora, and only a fraction of English verbs have even a single annotation. In this paper, we describe an approach for dealing with this sparse data problem, enabling accurate semantic role labeling for novel verbs (rolesets) with only a single training example. Our approach involves the identification of syntactically similar verbs found in PropBank, the alignment of arguments in their corresponding rolesets, and the use of their corresponding annotations in PropBank as surrogate training data.
Lane, H.
Conference Proceeding
(Marina Del Rey, CA, July 2007)
We argue that metacognition is a critical component in the development of intercultural competence by highlighting the importance of supporting a learner’s self-assessment, self-monitoring, predictive, planning and reflection skills. We also survey several modern immersive cultural learning environments and discuss the role intelligent tutoring and experience management techniques can play to support these metacognitive demands. Techniques for adapting the behaviors of virtual humans to promote cultural learning are discussed, as well as explicit approaches to feedback. We conclude with several suggestions for future research, including the use of existing intercultural development metrics for evaluating learning in immersive environments and further studies of the use of implicit and explicit feedback to guide learning and establish optimal conditions for acquiring intercultural competence.
Tortell, R., Luigi, D., Dozois, A., Bouchard, S., Morie, J., Ilan, D.
Virtual Reality
(London, 2007)
Scent has been well documented as having significant effects on emotion (Alaoui-Ismaili in Physiol Behav 62(4):713–720, 1997; Herz et al. in Motiv Emot 28(4):363–383, 2004), learning (Smith et al. in Percept Mot Skills 74(2):339–343, 1992; Morgan in Percept Mot Skills 83(3)(2):1227–1234, 1996), memory (Herz in Am J Psychol 110(4):489–505, 1997) and task performance (Barker et al. in Percept Mot Skills 97(3)(1):1007–1010, 2003). This paper describes an experiment in which environmentally appropriate scent was presented as an additional sensory modality consistent with other aspects of a virtual environment called DarkCon. Subjects’ game play habits were recorded as an additional factor for analysis. Subjects were randomly assigned to receive scent during the VE, and/or afterward during a task of recall of the environment. It was hypothesized that scent presentation during the VE would significantly improve recall, and that subjects who were presented with scent during the recall task, in addition to experiencing the scented VE, would perform the best on the recall task. Skin-conductance was a significant predictor of recall, over and above experimental groups. Finally, it was hypothesized that subjects’ game play habits would affect both their behavior in and recall of the environment. Results are encouraging to the use of scent in virtual environments, and directions for future research are discussed.
Buckwalter, J.G., Geiger, A.M., Parson, T.D., Handler, J., Howes, M., & Lehmer, R.R.
The International Journal of Neuroscience, 117, 1579-1590
(2007)
Rendering for an Interactive 360 Degree Light Field Display
Jones, A., Bolas, M., McDowall, I., Yamada, H., & Debevec, P.
SIGGRAPH 2007
(San Diego, CA, August 2007)
We describe a set of rendering techniques for an autostereoscopic light field display able to present interactive 3D graphics to multiple simultaneous viewers 360 degrees around the display. The display consists of a high-speed video projector, a spinning mirror covered by a holographic diffuser, and FPGA circuitry to decode specially rendered DVI video signals. The display uses a standard programmable graphics card to render over 5,000 images per second of interactive 3D graphics, projecting 360-degree views with 1.25 degree separation up to 20 updates per second. We describe the system’s projection geometry and its calibration process, and we present a multiple-center-of-projection rendering technique for creating perspective-correct images from arbitrary viewpoints around the display. Our projection technique allows correct vertical perspective and parallax to be rendered for any height and distance when these parameters are known, and we demonstrate this effect with interactive raster graphics using a tracking system to measure the viewer’s height and distance. We further apply our projection technique to the display of photographed light fields with accurate horizontal and vertical parallax. We conclude with a discussion of the display’s visual accommodation performance and discuss techniques for displaying color imagery.
Traum, D., Roque, A., Leuski, A., Georgiou, P., Gerten, J., Martinovski, B., Narayanan, S., Robinson, S., & Vaswani, A.
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, pages 71-74
(Antwerp, Belgium, September 2007)
We present Hassan, a virtual human who engages in Tactical Questioning dialogues. We describe the tactical questioning domain, the motivation for this character, and the specific architecture, and we present brief examples and an evaluation.
Leuski, A., Pair, J., Traum, D., McNerney, P., Georgiou, P., & Patek, R.
Proceedings of the 11th International Conference on Intelligent User Interfaces (IUI'06), pages 360-362
(Sydney, Australia, January 2006)
There is a growing need for creating life-like virtual human simulations that can conduct a natural spoken dialog with a human student on a predefined subject. We present an overview of a spoken-dialog system that supports a person interacting with a full-size hologram-like virtual human character in an exhibition kiosk setting. We also give a brief summary of the natural language classification component of the system and describe the experiments we conducted with the system.
Lee, J., Marsella, S., Traum, D., Gratch, J., & Lance, B.
Proceedings of the 7th International Conference on Intelligent Virtual Agents, pp. 296-303
(Paris, France, September 2007)
Gaze plays a large number of cognitive, communicative and affective roles in face-to-face human interaction. To build a believable virtual human, it is imperative to construct a gaze model that generates realistic gaze behaviors. However, it is not enough to merely imitate a person’s eye movements. The gaze behaviors should reflect the internal states of the virtual human and users should be able to derive them by observing the behaviors. In this paper, we present a gaze model driven by cognitive operations; the model processes the virtual human’s reasoning, dialog management, and goals to generate behaviors that reflect the agent’s inner thoughts. It has been implemented in our virtual human system and operates in real-time. The gaze model introduced in this paper was originally designed and developed by Jeff Rickel but has since been extended by the authors.
Kenny, P., Hartholt, A., Gratch, J., Swartout, W., Traum, D., Marsella, S., & Piepol, D.
I/ITSEC 2007
(Orlando, FL, November 2007)
There is a great need in the Joint Forces to have human to human interpersonal training for skills such as negotiation, leadership, interviewing and cultural training. Virtual environments can be incredible training tools if used properly and used for the correct training application. Virtual environments have already been very successful in training Warfighters how to operate vehicles and weapons systems. At the Institute for Creative Technologies (ICT) we have been exploring a new question: can virtual environments be used to train Warfighters in interpersonal skills such as negotiation, tactical questioning and leadership that are so critical for success in the contemporary operating environment? Using embodied conversational agents to create this type of training system has been one of the goals of the Virtual Humans project at the institute. ICT has a great deal of experience building complex, integrated and immersive training systems that address the human factor needs for training experiences. This paper will address the research, technology and value of developing virtual humans for training environments. This research includes speech recognition, natural language understanding & generation, dialogue management, cognitive agents, emotion modeling, question response managers, speech generation and non-verbal behavior. Also addressed will be the diverse set of training environments we have developed for the system, from single computer laptops to multi-computer immersive displays to real and virtual integrated environments. This paper will also discuss the problems, issues and solutions we encountered while building these systems. The paper will recount subject testing we have performed in these environments and results we have obtained from users. Finally the future of this type of Virtual Humans technology and training applications will be discussed.
Kenny, P., Hartholt, A., Gratch, J., Traum, D., & Swartout, W.
AAAI 2007
(Vancouver, British Columbia, Canada, July 2007)
The goal of the Virtual Humans Project at the University of Southern California’s Institute for Creative Technologies is to enrich virtual training environments with virtual humans – autonomous agents that support face-to-face interaction with trainees in a variety of roles – through bringing together many different areas of research including speech recognition, natural language understanding, dialogue management, cognitive modeling, emotion modeling, nonverbal behavior and speech and knowledge management. The demo at AAAI will focus on our work using virtual humans to train negotiation skills. Conference attendees will negotiate with a virtual human doctor and elder to try to move a clinic out of harm’s way in single and multi-party negotiation scenarios using the latest iteration of our Virtual Humans framework. The user will use natural speech to talk to the embodied agents, who will respond in accordance with their internal task model and state. The characters will carry out a multi-party dialogue with verbal and non-verbal behavior. A video of a single-party version of the scenario was shown at AAAI-06. This new interactive demo introduces several new features, including multi-party negotiation, dynamically generated non-verbal behavior and a central ontology.
Jan, D., Herrera, Martinovski, B., Novich, D., & Traum, D.
Proceedings of Intelligent Virtual Agents Conference, pp. 45-56
(Paris, France, September 2007)
This paper presents a model for simulating cultural differences in the conversational behavior of virtual agents. The model provides parameters for differences in proxemics, gaze and overlap in turn taking. We present a review of literature on these factors and show results of a study where native speakers of North American English, Mexican Spanish and Arabic were asked to rate the realism of the simulations generated based on different cultural parameters with respect to their culture.
Kenny, P., Parsons, T., Gratch, J., Rizzo, A., & Leuski, A.
7th International Conference on Intelligent Virtual Agents
(Paris, France, September 2007)
Virtual humans offer an exciting and powerful potential for rich interactive experiences. Fully embodied virtual humans are growing in capability, ease, and utility. As a result, they present an opportunity for expanding research into burgeoning virtual patient medical applications. In this paper we consider the ways in which one may go about building and applying virtual human technology to the virtual patient domain. Specifically we aim to show that virtual human technology may be used to help develop the interviewing and diagnostics skills of developing clinicians. Herein we proffer a description of our iterative design process and preliminary results to show that virtual patients may be a useful adjunct to psychotherapy education.
Gratch, J., Wang, N., Okhmatovskaia, A., Lamothe, F., Marsella, S., Morales, M., & Morency, L.P.
12th International Conference on Human-Computer Interaction
(Beijing, China, 2007)
Emotional bonds don’t arise from a simple exchange of facial displays, but often emerge through the dynamic give and take of face-to-face interactions. This article explores the phenomenon of rapport, a feeling of connectedness that seems to arise from rapid and contingent positive feedback between partners and is often associated with socio-emotional processes. Rapport has been argued to lead to communicative efficiency, better learning outcomes, improved acceptance of medical advice and successful negotiations. We provide experimental evidence that a simple virtual character that provides positive listening feedback can induce stronger rapport-like effects than face-to-face communication between human partners. Specifically, this interaction can be more engaging to storytellers than speaking to a human audience, as measured by the length and content of their stories.
Chu, S., Narayanan, S., & Kuo-C.
AAAI 2006 Fall Symposium, "Aurally Informed Performance: Integrating Machine Listening and Auditory Presentation in Robotic Systems"
(Arlington, VA, October 2006)
We consider the task of recognizing and learning the environments for a mobile robot using audio information. Environments are mainly characterized by different types of specific sounds. Using audio enables the system to capture a semantically richer environment, as compared to using visual information alone. The goal of this paper is to investigate suitable features and the design feasibility of an acoustic environment recognition system. We performed statistical analysis of promising frequency- and time-domain based audio features. We show that even from unstructured environmental sounds, we can predict with fairly accurate results the type of environment in which the robot is positioned.
Busso, C., & Narayanan, S.
7th International Seminar on Speech Production
(Ubatuba, Brazil, December 2006)
Communicative goals are simultaneously expressed through gestures and speech to convey messages enriched with valuable verbal and non-verbal clues. This paper analyzes and quantifies how linguistic and affective goals are reflected in facial expressions. Using a database recorded from an actress with markers attached to her face, the facial features during emotional speech were compared with the ones expressed during neutral speech. The results show that the facial activeness is mainly driven by articulatory processes. However, clear spatial-temporal patterns are observed during emotional speech, which indicate that emotional goals enhance and modulate facial expressions. The results also show that the upper face region has more degrees of freedom to convey non-verbal information than the lower face region, which is highly constrained by the underlying articulatory processes. These results are important toward understanding how humans communicate and interact.
Ai, H., Roque, A., Leuski, A., & Traum, D.
In Proceedings of the 10th Interspeech Conference
(Antwerp, Belgium, August 2007)
In this paper we investigate how to improve the performance of a dialogue move and parameter tagger for a task-oriented dialogue system using the information-state approach. We use a corpus of utterances and information states from an implemented system to train and evaluate a tagger, and then evaluate the tagger in an on-line system. Use of information state context is shown to improve performance of the system. Index Terms: spoken dialogue systems, dialogue management, tagging
Roque, A.
AAAI-07 Twelfth AAAI/SIGART Doctoral Consortium
(Vancouver, British Columbia, Canada, July 2007)
Computational models of grounding are extended to include representations of degrees of groundedness. These representations are then used for decision-making in dialogue management for spoken dialogue systems. Several domains will be explored with this model, and an implementation will be tested and evaluated.
Gandhe, S., & Traum, D.
5th Workshop on Knowledge and Reasoning in Practical Dialogue Systems
(Hyderabad, India, Jan 8, 2007)
Virtual human characters equipped with natural language dialogue capability have proved useful in many fields like simulation training and interactive games. Generally behind such dialogue managers lies a complex knowledge-rich rule-based system. Building such a system involves meticulous annotation of data and hand authoring of rules. In this paper we build a statistical dialogue model from a roleplay and Wizard of Oz dialog corpus with virtually no annotation. We compare these methods with the traditional approaches. We have evaluated these systems for perceived appropriateness of response and the results are presented here.
Fullerton, T., Morie, J., & Pearce, C.
Published in Digital Arts and Culture Conference Proceedings
(Perth, Australia, Fall, 2007)
The techno-fetishism of computer game culture has led to a predominately male sensibility towards the construction of space in digital entertainment. Real-time strategy games conceive of space as a domain to be conquered; first-person shooters create labyrinthine battlefields in which space becomes a context for combat. Massively multiplayer games offer the opportunity for non-linear exploration, but emphasize linear achievement within a combat-based narrative. In this paper, we argue for a new gendered, regendered and perhaps degendered poetics of game space, rethinking ways in which space is conceptualized and represented as a domain for play. We argue for a more egalitarian virtual playground that acknowledges and embraces a wider range of spatial and cognitive models, referencing literature, philosophy, fine art and non-digital games for inspiration. Reflecting on a variety of sources, beginning with Virginia Woolf’s A Room of One’s Own and Bachelard’s Poetics of Space, feminist writings of Charlotte Gilman Perkins, Simone de Beauvoir, Hélène Cixous, Judith Butler, Janet Murray, and including contemporary game writers such as Lizbeth Klastrup, Mary Flanagan, Maia Engeli, and T.L. Taylor, we will argue for a new gendered poetics of game space, proposing an inclusionary approach that integrates feminine conceptions of space into the gaming landscape.
Fron, J., Fullerton, T., Morie, J., & Pearce, C.
Published in Proceedings of DIGRA: Situated Play
(Tokyo, September 24-27, 2007)
In this paper, we introduce the concept of a “Hegemony of Play,” to critique the way in which a complex layering of technological, commercial and cultural power structures has dominated the development of the digital game industry over the past 35 years, creating an entrenched status quo which ignores the needs and desires of “minority” players such as women and “non-gamers,” who in fact represent the majority of the population. Drawing from the history of pre-digital games, we demonstrate that these practices have “narrowed the playing field,” and contrary to conventional wisdom, have actually hindered, rather than boosted, its commercial success. We reject the inevitability of these power structures, and urge those in game studies to “step up to the plate” and take a more proactive stance in questioning and critiquing the status of the Hegemony of Play.