1st International Workshop
Tagging, Mining and Retrieval of Human-Related Activity Information
Thursday 15th November 2007
Nagoya Marunouchi Tokyu Inn
Abstracts (with Schedule)
|
9-9:30am |
|
Welcome |
|
9:30am |
Lucas Malta, Chiyomi Miyajima, Kazuya Takeda ( |
Multimodal Driving Data Integration for the Analysis of Driver's Responses to Hazardous Situations
Over the last decade, experts from academia and
industry have been actively involved in road safety. Although encouraging
improvements in transportation have been made, the number of road fatalities
remains unacceptably high, suggesting that current efforts alone do not
suffice. Since responsibility for almost three-quarters of traffic accidents
falls on human shoulders, a better understanding of driver behavior is a
decisive step towards safer and more efficient driving. In this study, to
increase the understanding of driver behavior during potential threats, we
explored the multimodality of driver reactions. Our method used multimedia
driving behavior signals, namely force on the brake pedal, speech, or both,
to retrieve potentially hazardous situations from a large database. |
|
10:00am |
Rutger Rienks, Anton Nijholt ( |
Verbal Behavior of the More and the Less Influential Meeting Participant
We test, on empirical grounds, the strength of the relationship between the
way people behave in a discussion and their level of influence. For the
experiments we use data from the AMI corpus in the areas of argumentation,
dialogue act and influence research. Statistical dependencies and
(cor)relations between the tags are mined for possible relationships. |
|
10:30am |
Nadia Mana, Bruno Lepri, Paul Chippendale, Alessandro Cappelletti,
Fabio Pianesi, Piergiorgio Svaizer, Massimo Zancanaro (FBK-Irst) |
Multimodal Corpus of Multi-Party Meetings for Automatic Social Behavior Analysis and Personality Traits Detection
This paper describes an automatically annotated multimodal corpus of
multi-party meetings. For each subject involved in the experimental sessions,
the corpus provides information on her/his social behavior and personality
traits, as well as audiovisual cues (speech rate, pitch and energy, head
orientation, and head, hand and body fidgeting). The corpus is based on the
audio and video recordings of thirteen sessions, which took place in a lab
setting equipped with cameras and microphones. Our main concern in collecting
this corpus was to investigate the possibility of creating a system capable of
automatically analyzing social behaviors and predicting personality traits
using audiovisual cues. |
|
11:00am |
Elisa Rubegni, Jevon Brunk, Maurizio Caporali, Antonio Rizzo ( |
Wi-roni: A Gesture Tangible Interface for Experiencing Internet Content in Public Spaces
In this paper, we describe a case study in which we explore the potential of
gesture-based tangible interfaces for retrieving web content. Wi-roni provides
audio content (streaming radio and podcasts); its interface captures physical
gestures as command input and provides sound as feedback to these actions.
Wi-roni is a piece of urban furniture that aims to let everyone listen to
Web-distributed audio content in a public area, bridging the gap between
expert and non-expert users and favoring the dissemination of Internet
culture. The work is part of a project that attempts to complement distance
communication with in-presence communication, and everywhere/everytime
availability of web resources with here-and-now exploitation of preferred
resources. |
|
11:30am |
Martin Kurze (Deutsche Telekom Labs) |
Personalization in Multimodal Interfaces
This paper offers a novel view of multimodal interfaces from the perspective
of personalization. As a position paper, it analyses properties of
multimodality in relation to features of personalization. As a short research
paper, it describes the context of a newly developed personalization
framework and its implementation, with a focus on multimodality. As an
industry paper, it presents insights into current research and considerations
within the telco industry. The main purpose of this paper is to initiate a
discussion of the problem field. |
|
12–2:00pm |
|
Lunch |
|
2:15pm |
Svenja Kahn, Tobias Klug, Felix Flentge (Technische Universität |
Modeling Temporal Dependencies Between Observed Activities
The modeling of parallel activities requires a notation that can represent
the temporal dependencies as well as variations in the execution order of the
activities. This paper introduces ART (Activity Relation Trees), a notation
for describing temporal dependencies between activities. ART is based on
ConcurTaskTrees (CTT), extended with the means to describe temporal
relationships. Furthermore, we present an algorithm that automatically
generates ART models from observed examples. Because earlier approaches to
automatic model acquisition were restricted to strictly sequential data and
cannot be applied to parallel activities, we developed a method that reduces
the problem of automatically modeling parallel activities to the simpler task
of modeling sequential data. By grouping activities and distinguishing
different phases, we are able to form general descriptions of a scenario that
include variations in the execution order. The paper defines all necessary
concepts and describes the algorithm in detail. The evaluation of the
algorithm shows that precise models can be generated from only a few
examples. |
|
2:45pm |
Koji Kamei, Yutaka Yanagisawa, Takuya Maekawa, Yasue Kishino,
Yasushi Sakurai, Takeshi Okadome (NTT Corporation) |
Tagging Strategies for Extracting Real-World Events with Networked Sensors
In this paper, we introduce our ‘s-room’ project as well as the tagging
strategies and environment developed for the project. In the s-room, many
small sensor nodes are attached to various objects. Our project aims to
construct a system for comprehending real-world events and the properties or
status information of physical objects by utilizing sensor nodes distributed
throughout the room as well as general knowledge obtained from the web. The
events extracted in the s-room are then published as web content. We defined
a set of event descriptors as an intermediate language between the sensor
data stream and natural-language (NL) description. The descriptors are
selected by a two-way method: 1) a top-down approach based on definitions in
NL dictionaries and the laws of physics, and 2) a bottom-up approach based on
manually tagged sensor data streams. We also developed a tagging environment
that enables us to arrange the relationship between NL phrase expressions of
human activities and multiple sensor events automatically extracted from the
sensor signal streams. |
|
3:15pm |
Hisao Setoguchi, Katsuya Takanashi, Tatsuya Kawahara ( |
Multi-Modal Conversational Analysis of Poster Presentations Using Multiple Sensors
We are constructing a research environment called the "IMADE room", which
can capture a variety of multi-modal human interactions. With this setting,
we have designed and conducted recordings of poster sessions with one
presenter and an audience of two. In addition to speech data of individual
participants, gazing, nodding, and pointing behaviors are recorded through
multiple sensors. This article describes the specifications of the data
collection and preliminary analyses of the relationship between verbal and
non-verbal behaviors. |
|
3:45pm |
Kharsim Yousef, Eamonn O’Neill ( |
We present an end to end system ‘ |
|
4:15pm |
Edward C Kaiser (Adapx) |
Cross-Domain Matching for Automatic Tag Extraction across Redundant Handwriting and Speech Events
In many types of natural human-human interaction, people communicate
important information redundantly across multiple communication modes, for
example by saying what they handwrite during a presentation or discussion. To
detect and benefit from such redundancies, a computational understanding
system must align the recognition outputs from different perceptual modes
such as handwriting and speech. Since the recognition domains of the modes
differ, researchers refer to tasks like this as cross-domain matching. We
describe how SHACER (our Speech and HAndwriting reCognizER) currently
implements cross-domain matching, and compare it to an existing, formally
optimal algorithm for this task. Successful alignment and recognition of such
multimodal redundancies can be leveraged for automatic tagging of social
interactions. These automatically generated tags can benefit retrieval
techniques for non-textual documents recorded during computationally
perceived social interactions. |
|
4:45-5:00pm |
|
Closing Discussion and Remarks |
Presentation slots are all 30 minutes: 20 minutes for
presentation, with 10 minutes for questions.