The Barack Obama Corpus
The Barack Obama Corpus (BOC) consists of 6,215,948 words (tokens) drawn from nearly 3,500 texts dated between January 2009 and January 2016. The texts, all taken from the White House Archives, comprise all speeches delivered by Barack Obama in his official capacity as the 44th President of the United States of America. The earliest speech in the BOC is President Obama’s inaugural address and the latest is his final State of the Union address (January 2016). In total, the corpus contains 34,967 word types, giving a type/token ratio of approximately 0.0056 (0.56%).
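The type/token ratio (TTR) is the number of distinct word types divided by the number of running tokens. The minimal Python sketch below reproduces that arithmetic from the counts reported for the BOC. It also adds a hypothetical helper, `ttr_from_text`, that computes the ratio for a raw string. Its tokenization (lowercased runs of letters and apostrophes) is an assumption for illustration only. It is not the procedure used to build the corpus.

```python
import re

# Counts as reported for the Barack Obama Corpus (BOC).
BOC_TOKENS = 6_215_948  # running words
BOC_TYPES = 34_967      # distinct word types


def type_token_ratio(types: int, tokens: int) -> float:
    """Distinct types divided by total tokens."""
    if tokens <= 0:
        raise ValueError("token count must be positive")
    return types / tokens


def ttr_from_text(text: str) -> float:
    """Hypothetical helper: TTR of a raw string.

    Assumption: tokens are lowercased runs of letters and apostrophes;
    this is not the tokenization used to build the BOC.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return type_token_ratio(len(set(tokens)), len(tokens))


ttr = type_token_ratio(BOC_TYPES, BOC_TOKENS)
print(f"BOC TTR = {ttr:.4f} ({ttr:.2%})")  # BOC TTR = 0.0056 (0.56%)
print(ttr_from_text("Yes we can. Yes we did."))  # 4 types / 6 tokens
```

With the reported counts, the ratio comes to about 0.0056. TTR falls as a corpus grows, because frequent words keep recurring while new types appear ever more slowly. A value this low is therefore expected for a corpus of over six million tokens. It should not be compared directly with TTRs computed on short texts.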
The files, which retain the original titles given to them by the White House, have been tagged for genre, audience type, date and location of delivery, and principal topics. The genres include remarks, addresses, statements, press conferences, debates and question-answer sessions, while audience types are divided into three categories: general public, specialized audience and press. Locations distinguish between the United States and abroad (Germany, UK, Indonesia, etc.). Topics follow a six-way distinction: political issues (e.g. public finances), social issues (e.g. health care), humanitarian issues, environment, representation (e.g. ceremonial duties) and campaign speeches.
How to cite this resource:
Riesner, Katherina (2017). The Barack Obama Corpus [Data set]. http://hdl.handle.net/10050/00-0000-0000-0003-C53B-4@view
Accompanying file: the_barack_obama_corpus_information.txt
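The tagging scheme described above (genre, audience type, date and location of delivery, principal topics) can be pictured as one metadata record per speech. The following Python sketch models it with enums and a dataclass. Several details are hypothetical and do not reflect the corpus's actual file format: the class names, the field layout, the use of a set so that a speech can carry several topics, and the `filter_records` helper. The two example records are invented and are not real BOC entries.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import FrozenSet, Iterable, List, Optional


class Genre(Enum):
    REMARKS = "remarks"
    ADDRESS = "address"
    STATEMENT = "statement"
    PRESS_CONFERENCE = "press conference"
    DEBATE = "debate"
    QA_SESSION = "question-answer session"


class Audience(Enum):
    GENERAL_PUBLIC = "general public"
    SPECIALIZED = "specialized audience"
    PRESS = "press"


class Topic(Enum):
    POLITICAL = "political issues"
    SOCIAL = "social issues"
    HUMANITARIAN = "humanitarian issues"
    ENVIRONMENT = "environment"
    REPRESENTATION = "representation"
    CAMPAIGN = "campaign speeches"


@dataclass(frozen=True)
class SpeechRecord:
    """Hypothetical per-file metadata record; not the corpus's real format."""
    title: str                 # original White House title
    genre: Genre
    audience: Audience
    delivered_on: date
    location: str              # "United States" or the foreign country
    topics: FrozenSet[Topic]   # assumption: several principal topics allowed

    @property
    def abroad(self) -> bool:
        # Locations are split into the United States versus abroad.
        return self.location != "United States"


def filter_records(records: Iterable[SpeechRecord], *,
                   audience: Optional[Audience] = None,
                   topic: Optional[Topic] = None) -> List[SpeechRecord]:
    """Keep records matching every criterion that is not None."""
    return [r for r in records
            if (audience is None or r.audience is audience)
            and (topic is None or topic in r.topics)]


# Invented example records, for illustration only.
records = [
    SpeechRecord("Hypothetical remarks", Genre.REMARKS,
                 Audience.GENERAL_PUBLIC, date(2010, 3, 1),
                 "United States", frozenset({Topic.SOCIAL})),
    SpeechRecord("Hypothetical press conference", Genre.PRESS_CONFERENCE,
                 Audience.PRESS, date(2013, 6, 19),
                 "Germany", frozenset({Topic.POLITICAL})),
]

print([r.title for r in records if r.abroad])
# ['Hypothetical press conference']
print([r.title for r in filter_records(records, topic=Topic.SOCIAL)])
# ['Hypothetical remarks']
```

Enums keep the closed category sets (six genres, three audience types, six topics) from drifting into free-text spellings. The free-form `location` string, combined with the `abroad` property, mirrors the United States versus abroad split while still recording the specific country.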