- Lund Corpora
-
-
The Barack Obama Corpus
-
The Barack Obama Corpus (BOC) consists of 6,215,948 words (tokens) drawn from nearly 3,500 different texts dating from January 2009 to January 2016. The texts, all taken from the White House Archives, comprise all speeches delivered by Barack Obama in his official capacity as 44th President of the United States of America. The earliest speech in the BOC is President Obama’s inaugural address and the last is his final State of the Union speech (January 2016). In total, the corpus includes 34,967 word types, which yields a type/token ratio of about 0.0056 (0.56%).
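The type/token ratio can be checked directly from the two counts quoted above. The following is a minimal sketch rather than part of the corpus documentation: the function name `type_token_ratio` and the constants are illustrative assumptions, and only the two counts come from the description. Dividing types by tokens gives roughly 0.0056, which is 0.56 only when expressed as a percentage.

```python
# Counts quoted in the corpus description; names are illustrative assumptions.
TOKENS = 6_215_948  # running words in the corpus
TYPES = 34_967      # distinct word forms in the corpus


def type_token_ratio(types: int, tokens: int) -> float:
    """Return the number of types divided by the number of tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return types / tokens


ttr = type_token_ratio(TYPES, TOKENS)
print(f"TTR = {ttr:.4f} ({ttr:.2%})")  # prints "TTR = 0.0056 (0.56%)"
```

Because the ratio falls as a corpus grows, a value this low is expected for a corpus of several million tokens and is not directly comparable with ratios computed on much smaller samples.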
The files, which retain the original titles given to them by the White House, have been tagged for genre, audience type, date and location of delivery, and principal topics. The genres include remarks, addresses, statements, press conferences, debates and question-answer sessions, while the audience types are divided into three: general public, specialized audience and press. The locations distinguish between the United States and abroad (Germany, UK, Indonesia etc.). Topics fall into six categories: political issues (e.g. fiscal policy), social issues (e.g. health care), humanitarian issues, environment, representation (e.g. ceremonial duties) and campaign speeches.
How to cite this resource:
Riesner, Katherina (2017). The Barack Obama Corpus [Data set]. http://hdl.handle.net/10050/00-0000-0000-0003-C53B-4@view
the_barack_obama_corpus_information.txt
-
-
Eline Visser
-
Corpora of Papuan and Austronesian languages in eastern Indonesia: Geser-Gorom (ISO639-3:ges), Kalamang (ISO639-3:kgv), Uruangnirin (ISO639-3:urn) and Yamdena (ISO639-3:jmd).
-
-
LACOLA
-
LACOLA: Language, Cognition, and Landscape
-
-
LANG-KEY
-
The LANG-KEY project opens up new horizons in the human sciences by providing scientific access to human diversity. It does so by situating field linguists at the center of an interdisciplinary research framework in which language expertise provides the crucial point of connection between researchers and lesser-known speech communities and knowledge systems. The project explores human perception, a field in which recent research hints at considerable but still poorly understood variation across human languages and cultures. To this end, the project brings together a unique and highly qualified team representing Linguistics, Cognitive Psychology, Geoscience and History of Religions to investigate language of perception in three diverse Language Observatories. Combining well-established methods with novel ones, the research focuses on two fundamental and interrelated arenas of perception: landscape and ritual. Acknowledging that language provides a window on both cognition and culture, the project bridges the gap between psychological and cultural approaches to the senses.
-
-
REaCHeS
-
References to Environs are Coordinated to be Heard and Seen. The REaCHeS project examines indicating strategies in speakers of Eastern Chatino (EC), a lesser-studied and typologically unusual language used in Oaxaca, Mexico.
-
-
SpaceH
-
The data were recorded on a computer with other students present.
-
-
Swedia2000
-
This is both a research corpus and a public corpus of Swedish dialects. It was recorded between 1998 and 2001 by the Swedia 2000 project and described in IMDI within the ECHO project.
-
-
Tactile Reading
-
Participants read a short text from Pippi Longstocking and feel a tactile image of a face. This is recorded by an automated finger-tracker developed by Björn Breidegård.