Fourth Dialog State Tracking Challenge
@ IWSDS2016

Main task: Dialog State Tracking at Sub-dialog Level

Task General Overview

The goal of the main task of the challenge is to track dialog states for sub-dialog segments. For each turn in a given sub-dialog, the tracker should fill out a frame of slot-value pairs considering all dialog history prior to the turn. The performance of a tracker will be evaluated by comparing its outputs with reference annotations.

In the development phase, participants will be provided with a training set of dialogs with manual annotations over frame structures. In the test phase, each tracker will be evaluated on the results it generates for a test set of unlabeled dialogs. A baseline system and an evaluation script will be provided along with the training data. Participation in the main track is mandatory for all teams and/or individuals registered for DSTC4.

Dataset General Description

In this challenge, participants will use the TourSG corpus to develop their trackers. TourSG consists of 35 dialog sessions on tourist information about Singapore, collected from Skype calls between three tour guides and 35 tourists. These 35 dialogs total 31,034 utterances and 273,580 words. All the recorded dialogs, with a total length of 21 hours, have been manually transcribed and annotated with speech act and semantic labels at the turn level.

Since each subject in these dialogs tends to be expressed not in a single turn but across a series of turns, dialog states in these conversations are defined at the sub-dialog level. A full dialog session is divided into sub-dialogs based on topical coherence, and each sub-dialog is then categorized by topic. Each sub-dialog assigned to one of the major topic categories has an additional frame structure of slot-value pairs that represents further details about the subject discussed within the sub-dialog (see an example of reference annotations).

Evaluation General Description

Although the fundamental goal of this tracking task is to analyze the state at the sub-dialog level, the tracker should be executed at the utterance level, regardless of the speaker, in sequence from the beginning to the end of a given session. This is intended to evaluate a tracker's ability not only to understand the contents mentioned in a given segment, but also to predict the segment's dialog state at an earlier turn.
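The utterance-by-utterance execution described above can be sketched as a simple loop. This is only an illustration, not the provided baseline system: the field names `segment_start` and `text`, the keyword-based `toy_extract` function, and the slot names and values are all hypothetical, and resetting the frame at each new sub-dialog is an assumption based on states being defined per sub-dialog.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# A frame maps a slot name to the set of values filled so far.
Frame = Dict[str, Set[str]]


def track_session(
    utterances: Iterable[dict],
    extract: Callable[[str], List[Tuple[str, str]]],
) -> List[Frame]:
    """Emit one frame per utterance, in session order.

    Assumption: the frame is reset when a new sub-dialog begins and
    otherwise accumulates everything seen so far in the segment.
    """
    outputs: List[Frame] = []
    frame: Frame = {}
    for utt in utterances:
        if utt["segment_start"]:  # hypothetical boundary flag
            frame = {}
        for slot, value in extract(utt["text"]):
            frame.setdefault(slot, set()).add(value)
        # Copy the frame so later updates do not alter earlier outputs.
        outputs.append({slot: set(values) for slot, values in frame.items()})
    return outputs


def toy_extract(text: str) -> List[Tuple[str, str]]:
    """Hypothetical keyword spotter standing in for a real model."""
    lowered = text.lower()
    pairs: List[Tuple[str, str]] = []
    if "zoo" in lowered:
        pairs.append(("PLACE", "Singapore Zoo"))
    if "ticket" in lowered:
        pairs.append(("INFO", "Pricerange"))
    return pairs


session = [
    {"segment_start": True, "text": "I would like to visit the zoo."},
    {"segment_start": False, "text": "How much is the ticket?"},
    {"segment_start": True, "text": "Where can I eat?"},
]
frames = track_session(session, toy_extract)
```

In this sketch the second frame still contains the place mentioned in the first turn, because the tracker carries the segment history forward, while the third frame starts empty because a new sub-dialog has begun.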

To examine both of these aspects of a given tracker, two different 'schedules' are considered for selecting the utterances to be evaluated:
* Schedule 1: all turns are included
* Schedule 2: only the turns at the end of segments are included

Under schedule 1, if some information is correctly predicted or recognized at an earlier turn in a given segment and kept until the end of the segment, it will receive a higher accumulated score than if the same information is filled in at a later turn. On the other hand, the results under schedule 2 indicate the correctness of the outputs after all the turns of the target segment have been provided.
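The difference between the two schedules can be illustrated with a short sketch. It is not the official evaluation script: the function names, the per-turn list of segment identifiers, and the per-turn correctness flags are hypothetical, and it assumes that each selected turn contributes one comparison to the score.

```python
from typing import List, Sequence


def select_turns(segment_ids: Sequence[int], schedule: int) -> List[int]:
    """Return the indices of the turns evaluated under a schedule.

    segment_ids[i] is a (hypothetical) identifier of the sub-dialog
    that turn i belongs to.
    """
    if schedule == 1:
        # Schedule 1: all turns are included.
        return list(range(len(segment_ids)))
    if schedule == 2:
        # Schedule 2: only the last turn of each segment is included.
        last = len(segment_ids) - 1
        return [
            i
            for i, seg in enumerate(segment_ids)
            if i == last or segment_ids[i + 1] != seg
        ]
    raise ValueError("schedule must be 1 or 2")


def turn_accuracy(correct: Sequence[bool], selected: Sequence[int]) -> float:
    """Fraction of selected turns whose frame matches the reference."""
    return sum(correct[i] for i in selected) / len(selected)


# One three-turn segment. Both trackers are right at the final turn,
# but the "early" tracker is already right at the second turn.
segment_ids = [0, 0, 0]
early = [False, True, True]
late = [False, False, True]

for schedule in (1, 2):
    selected = select_turns(segment_ids, schedule)
    print(schedule, turn_accuracy(early, selected), turn_accuracy(late, selected))
```

Under schedule 1 the early tracker scores higher because it is correct on two of the three turns rather than one, whereas under schedule 2 both trackers score the same because only the final turn is compared.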

In this challenge, the following two sets of evaluation metrics are used for the main task:
* Accuracy: Fraction of segments in which the tracker's output is equivalent to the gold standard
* Precision/Recall/F-measure:
-- Precision: Fraction of slot-value pairs in the tracker's outputs that are correctly filled
-- Recall: Fraction of slot-value pairs in the gold standard labels that are correctly filled
-- F-measure: The harmonic mean of precision and recall

While the first metric checks the equivalence between the outputs and the references at the whole-frame level, the others show partial correctness at the slot-value level.
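As a rough illustration of how these metrics relate to each other, the sketch below scores a list of output frames against reference frames. It is not the official evaluation script: the function names and the example slot names and values are hypothetical, and pooling the slot-value counts over all compared frames before computing precision and recall is an assumption.

```python
from typing import Dict, Iterable, List, Set, Tuple

Frame = Dict[str, Iterable[str]]


def frame_pairs(frame: Frame) -> Set[Tuple[str, str]]:
    """Flatten a frame into a set of (slot, value) pairs."""
    return {(slot, value) for slot, values in frame.items() for value in values}


def score(outputs: List[Frame], references: List[Frame]) -> Dict[str, float]:
    """Frame-level accuracy plus slot-value precision, recall and F-measure."""
    exact = true_pos = n_pred = n_ref = 0
    for out, ref in zip(outputs, references):
        pred_pairs, ref_pairs = frame_pairs(out), frame_pairs(ref)
        exact += pred_pairs == ref_pairs         # whole frame must match
        true_pos += len(pred_pairs & ref_pairs)  # correctly filled pairs
        n_pred += len(pred_pairs)
        n_ref += len(ref_pairs)
    accuracy = exact / len(references) if references else 0.0
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_ref if n_ref else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


gold = {"PLACE": ["Singapore Zoo"], "INFO": ["Pricerange"]}
outputs = [
    {"PLACE": ["Singapore Zoo"]},  # one pair still missing
    {"PLACE": ["Singapore Zoo"], "INFO": ["Pricerange"]},
]
references = [gold, gold]
result = score(outputs, references)
```

In this example only the second frame matches its reference exactly, so accuracy is 0.5. All three predicted pairs are correct, giving a precision of 1.0, while three of the four reference pairs are recovered, giving a recall of 0.75.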

Operationally, the main track evaluation will be run as a CodaLab competition. Every participant should first create a CodaLab account and then register on the DSTC4 competition page. Once the registration request is confirmed by the organizers, participants will be able to submit the outputs of their trackers.

DSTC4 Main Task Handbook and Resources

A more comprehensive description of the main task, the available datasets and the evaluation protocol can be found in the official challenge handbook: DSTC4 Main Task Handbook (V3).

Additional resources related to the DSTC4 main track can be found on the Resources Page of this website.

Please check this page frequently for possible updates. Handbook and resource updates will also be announced through the DSTC4 mailing list. For instructions on how to subscribe to the mailing list, please refer to the Contact Information section of this website.