
 

1st International Workshop

Tagging, Mining and Retrieval of Human-Related Activity Information

Thursday 15th November 2007

Nagoya Marunouchi Tokyu Inn

 

 

Abstracts (with Schedule)

 

9:00-9:30am

 

Welcome

9:30am

Lucas Malta,

Chiyomi Miyajima,

Kazuya Takeda

(Nagoya University)

Multimodal Driving Data Integration for the Analysis of Driver's Responses to Hazardous Situations

Over the last decade, experts from academia and industry have been actively involved in road safety. Although encouraging improvements in transportation have been made, the number of road fatalities remains unacceptably high, suggesting that current efforts alone do not suffice. Since human factors account for almost three-quarters of traffic accidents, a better understanding of driver behavior is a decisive step towards safer and more efficient driving. In this study, to increase the understanding of driver behavior during potential threats, we explored the multimodality of driver reactions. Our method uses multimedia driving behavior signals, namely force on the brake pedal, speech, or both, to retrieve potentially hazardous situations from a large database.

10:00am

Rutger Rienks,

Anton Nijholt

(University of Twente)

Verbal Behavior of the More and the Less Influential Meeting Participant

We empirically test the strength of the relationship between the way people behave in a discussion and their level of influence. For the experiments, we use data sources collected from the AMI corpus in the areas of argumentation, dialogue act, and influence research. Statistical dependencies and (cor)relations between the tags are mined for possible relationships.

10:30am

Nadia Mana,

Bruno Lepri,

Paul Chippendale,

Alessandro Cappelletti,

Fabio Pianesi,

Piergiorgio Svaizer,

Massimo Zancanaro

(FBK-Irst)

Multimodal Corpus of Multi-Party Meetings for Automatic Social Behavior Analysis and Personality Traits Detection

This paper describes an automatically annotated multimodal corpus of multi-party meetings. The corpus provides for each subject involved in the experimental sessions information on her/his social behavior and personality traits, as well as audiovisual cues (speech rate, pitch and energy, head orientation, head, hand and body fidgeting). The corpus is based on the audio and video recordings of thirteen sessions, which took place in a lab setting equipped with cameras and microphones. Our main concern in collecting this corpus was to investigate the possibility of creating a system capable of automatically analyzing social behaviors and predicting personality traits using audio-visual cues.

11:00am

Elisa Rubegni,

Jevon Brunk,

Maurizio Caporali,

Antonio Rizzo

(University of Siena)

Wi-roni: A Gesture Tangible Interface for Experiencing Internet Content in Public Spaces

In this paper, we describe a case study in which we explore the opportunity of gesture tangible interfaces to allow the retrieval of web content. Wi-roni provides audio content (streaming radio and podcasts), and the interface is based on capturing physical gestures as command input and providing sound as a feedback response to these actions. Wi-roni is urban furniture that aims to allow everyone to listen to Web-distributed audio content in a public area, bridging the gap between expert and non-expert users and favoring the dissemination of Internet culture. The work is part of a project that attempts to complement distance communication with in-presence communication and everywhere/everytime web resource availability with here and now preferred resource exploitation.

11:30am

Martin Kurze

(Deutsche Telekom Labs)

Personalization in Multimodal Interfaces

This paper describes a novel view of the area of multimodal interfaces from the perspective of personalization. As a position paper, it analyses properties of multimodality in relation to features of personalization. As a short research paper, it describes the context of a newly developed personalization framework and its implementation with a focus on multimodality. As an industry paper, it presents insights on current research and considerations within the telco industry. The main purpose of this paper is to initiate a discussion on the problem field.

12–2:00pm

 

Lunch

2:15pm

Svenja Kahn,

Tobias Klug,

Felix Flentge

(Technische Universität Darmstadt)

Modeling Temporal Dependencies Between Observed Activities

The modeling of parallel activities requires a notation that can represent the temporal dependencies as well as variations in the execution order of the activities. This paper introduces ART (Activity Relation Trees), a notation to describe temporal dependencies between activities. ART is based on ConcurTaskTrees (CTT), extended with the means to describe temporal relationships. Furthermore, we present an algorithm that automatically generates ART models from observed examples. Because former approaches to automatic model acquisition were restricted to strictly sequential data and cannot be applied to parallel activities, we developed a method that reduces the problem of automatically modeling parallel activities to the simpler task of modeling sequential data. By grouping activities and distinguishing different phases, we are able to form general descriptions of a scenario that include variations in the execution order. The paper defines all necessary concepts and describes the algorithm in detail. The evaluation of the algorithm shows that precise models can be generated from only a few examples.

2:45pm

Koji Kamei,

Yutaka Yanagisawa,

Takuya Maekawa,

Yasue Kishino,

Yasushi Sakurai,

Takeshi Okadome

(NTT Corporation)

Tagging Strategies for Extracting Real-World Events with Networked Sensors

In this paper, we introduce our ‘s-room’ project as well as the tagging strategies and environment developed for the project. In the s-room, many small sensor nodes are attached to various objects. Our project aims to construct a system for comprehending real-world events and the properties or status information of physical objects by utilizing sensor nodes distributed throughout the room as well as general knowledge obtained from the web. The events extracted in the s-room are then published as web content. We defined a set of event descriptors as an intermediate language between the sensor data stream and natural language (NL) description. The descriptors are selected by a two-way method: 1) a top-down approach based on definitions in NL dictionaries and the laws of physics, and 2) a bottom-up approach based on manually tagged sensor data streams. We also developed a tagging environment that enables us to arrange the relationship between NL phrase expressions of human activities and multiple sensor events automatically extracted from the sensor signal streams.

3:15pm

Hisao Setoguchi,

Katsuya Takanashi,

Tatsuya Kawahara

(Kyoto University)

Multi-Modal Conversational Analysis of Poster Presentations Using Multiple Sensors

We are constructing a research environment called the "IMADE room", which can capture a variety of multi-modal human interactions. In this setting, we have designed and conducted recordings of poster sessions involving one presenter and two audience members. In addition to speech data of individual participants, gazing, nodding, and pointing behaviors are recorded through multiple sensors. This article describes the specifications of the data collection and preliminary analyses of the relationship between verbal and non-verbal behaviors.

3:45pm

Kharsim Yousef,

Eamonn O’Neill

(University of Bath)

Sunrise: Towards Location-Based Clustering For Assisted Photo Management

We present an end-to-end system, ‘Sunrise’, to assist users in using location-based data to tag, cluster, and find digital photos. We suggest that location tagging of photos is a common desire of users, but existing manual solutions are too time-consuming and complex. Our 3-tier system offers a solution that integrates automated location tagging with existing manual approaches. The mobile component of our system captures GPS-based data at user-defined intervals to create trail logs of the user’s movements. The desktop component automates location tagging of the photos and makes use of the GPS location data to provide assisted photo keywording through a process of reverse geo-coding. The server component uses automated location clustering techniques to produce hierarchical location representations, to simplify photo navigation, and to generate temporal/spatial data combinations enabling assisted photo management.

4:15pm

Edward C Kaiser

(Adapx)

Cross-Domain Matching for Automatic Tag Extraction across Redundant Handwriting and Speech Events

In many types of natural human-human interactions, people communicate important information redundantly across multiple communication modes, for example by saying what they handwrite during a presentation or discussion. To detect and benefit from such redundancies, a computational understanding system must align the recognition outputs from different perceptual modes such as handwriting and speech. Since the recognition domains of each mode differ, researchers refer to tasks like this as cross-domain matching. We describe how SHACER (our Speech and HAndwriting reCognizER) currently implements cross-domain matching, and compare that to an existing, formally optimal algorithm for this task. Successful alignment and recognition of such multimodal redundancies can be leveraged for automatic tagging of social interactions. These automatically generated tags can benefit retrieval techniques for non-textual documents recorded during computationally perceived social interactions.

4:45-5:00pm

 

Closing Discussion and Remarks

 

 

Presentation slots are all 30 minutes: 20 minutes for the presentation and 10 minutes for questions.