Basic structure of SCEPA

The archive contains the file Excuses.xml, which is the corpus itself.

The corpus contains political apologies from the UK, the USA, and Canada. For each apology it records the date, the author, the name of the apology, the author's gender and country, link(s) to the source of the excuse, and the reason for the apology. An important part of the corpus is the set of communicative tactics used in the apologies.

At the beginning of the corpus, the <tacticIndices> tag defines the well-known communicative tactics that people use when apologizing.

Each excuse then contains references to the tactics used in it, together with the part of the original excuse text that expresses each tactic.

   <excuse>
      <id></id>
      <name></name>
      <author>Bill Clinton</author>
      <gender>male/female/group</gender>
      <date>date of apology</date>
      <country>country of author</country>
      <text>apology text itself</text>
      <communicativeTactics>
         <communicativeTactic tacticIndex="index of defined tactic">
            <text>the part of the excuse text which expresses this tactic</text>
         </communicativeTactic>
         ...
      </communicativeTactics>
      <sources>
         <source>link to source of excuse</source>
         ...
      </sources>
      <additionalInformation>Additional information related to excuse like reason, etc.</additionalInformation>
   </excuse>
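
The template above can be read with Python's standard library. The following is a minimal sketch, not the corpus authors' tooling: the wrapper element <excuses>, the date format, and all sample values are hypothetical placeholders, and the sketch assumes the file uses no XML namespace. Only the element and attribute names inside <excuse> come from the template.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record that follows the <excuse> template.
# The <excuses> wrapper and every value below are placeholders, not corpus data.
SAMPLE = """
<excuses>
   <excuse>
      <id>1</id>
      <name>Sample apology</name>
      <author>Sample Author</author>
      <gender>male</gender>
      <date>2000-01-01</date>
      <country>USA</country>
      <text>I am sorry. It will not happen again.</text>
      <communicativeTactics>
         <communicativeTactic tacticIndex="1">
            <text>I am sorry.</text>
         </communicativeTactic>
         <communicativeTactic tacticIndex="2">
            <text>It will not happen again.</text>
         </communicativeTactic>
      </communicativeTactics>
      <sources>
         <source>http://example.com/apology</source>
      </sources>
      <additionalInformation>Placeholder reason.</additionalInformation>
   </excuse>
</excuses>
"""


def read_excuses(root):
    """Return one dict per <excuse> element found anywhere under root."""
    records = []
    for excuse in root.iter("excuse"):
        # Pair each tactic index with the text fragment that expresses it.
        tactics = [
            (t.get("tacticIndex"), (t.findtext("text") or "").strip())
            for t in excuse.iterfind("communicativeTactics/communicativeTactic")
        ]
        records.append({
            "author": excuse.findtext("author"),
            "country": excuse.findtext("country"),
            "tactics": tactics,
            "sources": [s.text for s in excuse.iterfind("sources/source")],
        })
    return records


records = read_excuses(ET.fromstring(SAMPLE))
for record in records:
    print(record["author"], record["country"], record["tactics"])
```

To read the real corpus, ET.parse("Excuses.xml").getroot() could be passed to read_excuses instead of the sample string. Because root.iter("excuse") searches the whole tree, the sketch does not depend on the actual name of the root element.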
                  

The archive also includes a schema validation file. Anyone who wants to extend the list of excuses can use this schema to keep the data in a consistent state.
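
Full schema validation needs a schema-aware XML library and the schema file itself, neither of which is described here. As a lightweight pre-check before running the schema, a new record can be compared against the child elements listed in the <excuse> template. The sketch below uses only Python's standard library; treating every listed child as mandatory is an assumption, the <excuses> wrapper is hypothetical, and the check is not a substitute for the schema.

```python
import xml.etree.ElementTree as ET

# Child elements listed in the <excuse> template. Treating all of them as
# mandatory is an assumption; the schema file remains the authority.
EXPECTED_CHILDREN = (
    "id", "name", "author", "gender", "date", "country", "text",
    "communicativeTactics", "sources", "additionalInformation",
)


def missing_children(excuse):
    """Return the expected child element names that are absent from one <excuse>."""
    return [name for name in EXPECTED_CHILDREN if excuse.find(name) is None]


def check_excuses(root):
    """Map the 1-based position of each incomplete <excuse> to its missing children."""
    problems = {}
    for position, excuse in enumerate(root.iter("excuse"), start=1):
        missing = missing_children(excuse)
        if missing:
            problems[position] = missing
    return problems


# Hypothetical, deliberately incomplete fragment used only to exercise the check.
SAMPLE = "<excuses><excuse><id>1</id><author>Sample Author</author></excuse></excuses>"
problems = check_excuses(ET.fromstring(SAMPLE))
for position, missing in problems.items():
    print(f"excuse #{position} is missing: {', '.join(missing)}")
```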

Genre

Public apology.

Sociolinguistic coverage

Politicians from the USA, the UK, and Canada. Male and female. Different ages.