Course information

Course description

Contemporary language documentation is dedicated to producing a long-lasting, multipurpose record of a language. This course will give you the skills you need to produce such documentation, with special attention given to digital data collection, data sustainability, and the documentation of language-in-use. The skills you develop in this class can be extended to your future fieldwork or toward bringing an existing documentation project in line with current practice. Students will (1) gain an understanding of the current best practices in digital language documentation; (2) develop skills in a prosody-based transcription system that can be applied to any language; (3) become familiar with key software and hardware used in our field; and (4) develop skills to troubleshoot data management problems in a variety of fieldwork situations. By the end of the course, students will be able to plan a best-practice language documentation project of their own, from equipment purchase to data collection to data annotation to archiving and presentation.

Course readings

The course readings listed in the course schedule can be accessed through the Laulima Resources folder unless they are already available online.

Required materials

Students are required to have (or have access to) the following:

  1. Computer (preferably a laptop)
  2. Closed-ear/closed-back (‘circumaural’) monitoring headphones (recommended)
    • These headphones will be used with your laptop as well as audio and video recorders.
    • Sennheiser 201, 202 and Audio-Technica ATH-M20 are recommended (Price: $20–$45).
    • It may be possible to check out headphones from the lab or borrow them from a fellow student.
    • Note that earbuds or on-ear (‘supra-aural’) headphones are not recommended.

Instructions for installing FLEx on a virtual machine

Follow the steps provided on these websites in this order:

  1. Download Ubuntu 16.04.4 here. Don’t do anything else; just have it downloaded before step 2.
  2. Download VirtualBox for OS X here (see note 1). Follow all instructions closely here, all the way to the bottom of the page.
  3. I noticed that Ubuntu was running very slowly, so follow the instructions here to speed it up and make it usable.
  4. Follow the instructions here to install FLEx (aka FieldWorks). Note: Follow the instructions carefully and read all of the dialog boxes. It can be a bit tricky.


  1. Participation & reading responses 20%
  • Students are required to be at every class and participate in class discussions of the readings.
  • When readings are assigned, students are required to post two responses (approximately 3-5 sentences each) containing commentary, criticisms, or questions about the reading on the Laulima forum by 8pm on the Tuesday before the reading is to be discussed.
  • Each student is required to reply to two of their classmates’ responses (approximately 2-4 sentences each) by 8pm on Wednesday.
  2. Assignments 20%
  • There will be several short assignments on the topics being discussed in class.
    1. IRB assignment
    2. Metadata assignment
    3. ELAN assignment
    4. FLEx lexicon assignment
    5. FLEx glossing assignment
    6. ELAN-FLEx-ELAN import/export assignment
    7. Data management plan
  3. Presentations 20%
  • Each student will present once on a topic listed below:
    1. Digital data management in language documentation
    2. Hardware/software for language documentation
  4. Audiovisual recording & transcription 20%
  • Each student will make a recording of a conversation of at least half an hour.
    • The recording must use video with a separate audio recorder.
    • The audio and video will be synced, edited, and exported for transcription in ELAN.
  • Students will transcribe a small portion of the recording using Discourse Transcription (Du Bois et al. 1993).
  5. Documentation enrichment project 20%
  • Enrich a portion of the documentation of either Besemah or Nasal. In week 6 of the semester, students will begin meeting with me to discuss the areas of the documentation project on which they would like to work. The goal of this project is to give students hands-on experience working with documentary materials, using the technical skills that they have learned in this class. Possible projects include the following:

    • Prepare and possibly edit audio and video recordings to be archived in PARADISEC or ELAR.
    • Enrich and organize metadata and consent materials.
    • Edit ELAN files to improve the accuracy of alignment and English free translations.
    • Prepare ELAN files to be imported into FLEx.
    • Export ELAN files into FLEx and gloss different speech events.


Students are required to do two mini-presentations. These are short, 10-minute presentations that explain topics to the class in a concise, useful way. The presentation should provide an overview of the topic with illustrative examples that help other students understand how the topic applies to language documentation. Each presentation should be accompanied by a short set of slides and/or a 2-page (single-spaced) handout with relevant information and examples. The handout should serve as a kind of cheat sheet.

Presentation 1 topics

  • IMDI/CMDI metadata standards
  • OLAC metadata standards
  • XML & JSON
  • Unicode
  • Digital audio file formats
  • Digital video file formats
  • ISO 639-3 & Glottocodes
  • DOIs, PIDs, and permanent handles
  • Repositories, servers, and DSpace

Course Policy

  • Attendance in this course is crucial. In order to be successful in the course, you need to attend every class and be punctual. Excessive absences or tardiness may result in a grade reduction.
  • Please be attentive during class. This means that students should not work on other tasks (e.g., responding to emails) or browse the internet (e.g., Facebook) during class.

Needs (ADA Statement)

If you have a disability for which you need accommodations in this class or any other special need (e.g. religious holidays), please inform the instructor as soon as possible. The KOKUA Program (Office for Students with Disabilities) can be reached at (808) 956-7511 or (808) 956-7612 (voice/text) in room 013 of the Queen Lili‘uokalani Center for Student Services.

Course Schedule

These readings and due dates are subject to change.

Week 1 (8/27): Introduction

  • Class activities:
    • Review syllabus
    • Discuss class format
    • Discuss audio and video recording assignments
    • Discuss software

Week 2 (9/3): Overview of language documentation

Week 3 (9/10): Data management, archiving, metadata, and IRB

Week 4 (9/17): Audio Recording

Week 5 (9/24): Video Recording

Week 6 (10/1): Video conversion

Week 7 (10/8): ELAN I

Week 8 (10/15): ELAN II

  • Readings: None
  • Assignment: ELAN I exercises
  • Class activities: slides

Week 10 (10/29): FLEx II

Week 11 (11/5): FLEx-ELAN

Week 12 (11/12): Transcription (Level 1)

Week 13 (11/19): Transcription (Level 2)

  • Readings: Du Bois et al. (1992: ch. 11-12), Du Bois handouts (Vocal Noises, Quality)
  • Assignment:
    • Complete DT take-home task.
    • Transcribe three minutes of conversation up to level 1. Be ready to share/discuss excerpt!
  • Additional readings: Dingemanse & Floyd (2014)
  • Class activities: slides

Week 14 (11/26): NO CLASS

Week 15 (12/3): Transcription (Level 3)

  • Readings: Du Bois et al. (1992: ch. 17), Du Bois handouts (Spelling)
  • Assignment:
    • Transcribe three minutes of conversation up to level 2.5 (i.e., transitional continuity (, . ?), pause, laughter, overlap, unintelligible speech). Be ready to share/discuss excerpt!
  • Additional readings:
  • Class activities: slides

Week 16 (12/10): Transcription, annotation, searching, and final presentations

  • Readings: Schultze-Berndt (2006), McDonnell (to appear)

  • Assignment:

    • Enrich three minutes of conversation up to level 3 and upload to Laulima.
    • Documentation Enrichment Project final presentation


Arkhipov, Alexandre & Nick Thieberger. 2018. Reflections on software and technology for language documentation. In Bradley McDonnell, Andrea L. Berez-Kroeker, & Gary Holton (eds.), Reflections on Language Documentation 20 Years after Himmelmann 1998, 140–149. Honolulu: University of Hawai‘i Press.
Artis, Anthony Q. 2014. The shut up and shoot documentary guide: A Down & Dirty DV production 2nd edition. New York; London: Focal Press, Taylor & Francis Group.
Austin, Peter. 2016. Language documentation 20 years on. In Luna Filipović & Martin Pütz (eds.), Endangered Languages and Languages in Danger: Issues of documentation, policy, and language rights, 147–170. Amsterdam: John Benjamins Publishing Company.
Berez, Andrea L. 2007. Review of EUDICO linguistic annotator (ELAN). Language Documentation & Conservation 1(2). 283–289.
Bird, Steven & Gary Simons. 2003. Seven dimensions of portability for language documentation and description. Language 79(3). 557–582.
Bow, Catherine, Baden Hughes & Steven Bird. 2003. Towards a General Model of Interlinear Text. In Proceedings of EMELD Workshop 2003: Digitizing and Annotating Texts and Field Recordings, 1–47. Lansing MI, USA.
Bowern, Claire. 2010. Fieldwork and the IRB: A snapshot. Language 86(4). 897–905.
Bowern, Claire. 2015. Linguistic Fieldwork: A Practical Guide 2nd ed. New York: Palgrave MacMillan.
Burnard, Lou. 2005. Metadata for Corpus Work. In Martin Wynne (ed.), Developing Linguistic Corpora: A Guide to Good Practice, 30–46. Oxford: Oxbow Books.
Cox, Christopher. to appear. Managing data in a language documentation corpus. In Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, & Lauren Collister (eds.), Handbook of Linguistic Data Management. Cambridge: MIT Press Open.
Dimmendaal, Gerrit J. 2010. Language description and ‘the new paradigm’: What linguists may learn from ethnocinematographers. Language Documentation & Conservation 4. 152–158.
Dingemanse, Mark & Simeon Floyd. 2014. Conversation across cultures. In Nicholas J Enfield, Paul Kockelman, & Jack Sidnell (eds.), The Cambridge Handbook of Linguistic Anthropology, 447–480. Cambridge: Cambridge University Press.
DiPersio, Denise. 2014. Linguistic Fieldwork and IRB Human Subjects Protocols. Language and Linguistics Compass 8(11). 505–511.
Du Bois, John W., Susanna Cumming, Stephan Schuetze-Coburn & Danae Paolino. 1992. Discourse transcription. Santa Barbara: Department of Linguistics, University of California, Santa Barbara.
Du Bois, John W., Stephan Schuetze-Coburn, Susanna Cumming & Danae Paolino. 1993. Outline of discourse transcription. In Jane Anne Edwards & Martin D. Lampert (eds.), Talking data: Transcription and coding in discourse research, 45–89. Hillsdale, NJ: Lawrence Erlbaum Associates.
Good, Jeff. 2011. Data and Language Documentation. In Peter K. Austin & Julia Sallabank (eds.), The Cambridge Handbook of Endangered Languages, 212–234. Cambridge: Cambridge University Press.
Himmelmann, Nikolaus P. 2018. Meeting the Transcription Challenge. In Bradley McDonnell, Andrea L. Berez-Kroeker, & Gary Holton (eds.), Reflections on Language Documentation 20 Years after Himmelmann 1998, 33–40. Honolulu: University of Hawai‘i Press.
Himmelmann, Nikolaus P. 2006. Prosody in language documentation. In Jost Gippert, Nikolaus Himmelmann, & Ulrike Mosel (eds.), Essentials of language documentation, 163–182. Berlin, Boston: Mouton de Gruyter.
Margetts, Anna & Andrew Margetts. 2011. Audio and Video Recording Techniques for Linguistic Research. In Peter K. Austin & Julia Sallabank (eds.), The Cambridge Handbook of Endangered Languages, 13–53. Cambridge: Cambridge University Press.
McDonnell, Bradley, Andrea L. Berez-Kroeker & Gary Holton. 2018. Introduction. In Bradley McDonnell, Andrea L. Berez-Kroeker, & Gary Holton (eds.), Reflections on Language Documentation 20 Years after Himmelmann 1998, 1–11. Honolulu: University of Hawai‘i Press.
Nathan, David. 2008. Minding Our Words: Audio Responsibilities in Endangered Languages Documentation and Archiving. Taiwan Journal of Linguistics 6(2). 59–77.
Pentangelo, Joseph. 2020. 360º Video and Language Documentation: Towards a Corpus of Kanien’kéha (Mohawk). New York: City University of New York dissertation.
Robinson, Laura C. 2010. Informed consent among analog people in a digital world. Language & Communication 30(3). 186–191.
Salffner, Sophie. 2015. A Guide to the Ikaan Language and Culture Documentation. Language Documentation & Conservation 9. 237–267.
Schultze-Berndt, Eva. 2006. Linguistic annotation. In Jost Gippert, Nikolaus P. Himmelmann, & Ulrike Mosel (eds.), Essentials of language documentation, 213–252. Berlin, New York: Mouton de Gruyter.
Seifart, Frank, Nicholas Evans, Harald Hammarström & Stephen C. Levinson. 2018. Language documentation twenty-five years on. Language 94(4). e324–e345.
Seyfeddinipur, Mandana. 2011. Reasons for Documenting Gestures and Suggestions for How to Go About It. In Nicholas Thieberger (ed.), The Oxford Handbook of Linguistic Fieldwork, 147–165. Oxford: Oxford University Press.
Seyfeddinipur, Mandana & Felix Rau. 2020. Keeping it real: Video data in language documentation and language archiving. Language Documentation & Conservation 14. 503–519.
Thieberger, Nicholas & Andrea L. Berez. 2012. Linguistic Data Management. In Nicholas Thieberger (ed.), The Oxford Handbook of Linguistic Fieldwork, 90–118. Oxford: Oxford University Press.
Thieberger, Nick, Amanda Harris & Linda Barwick. 2015. PARADISEC: Its history and future. In Amanda Harris, Nick Thieberger, & Linda Barwick (eds.), Research, Records and Responsibility: Ten years of PARADISEC, 1–16. Sydney: Sydney University Press.
van Driem, George. 2016. Endangered Language Research and the Moral Depravity of Ethics Protocols. Language Documentation & Conservation 10. 243–252.
Woodbury, Anthony C. 2011. Language Documentation. In Peter K. Austin & Julia Sallabank (eds.), The Cambridge Handbook of Endangered Languages, 159–176. Cambridge: Cambridge University Press.

  1. Apparently, VirtualBox can be very slow at first but improves after you restart your computer several times.