Course information

  • Course number: 84462
  • Instructor: Bradley McDonnell
  • Time: T 1:30-4:30
  • Location: SAKAM A102
  • Email:
  • Office hours: W 2-3

Course description

Contemporary language documentation is dedicated to producing a long-lasting, multipurpose record of a language. This course will give you the skills you need to produce such documentation, with special attention given to digital data collection, data sustainability, and the documentation of language-in-use. The skills you develop in this class can be extended to your future fieldwork or applied to bringing an existing documentation project in line with current practice. Students will (1) gain an understanding of current best practices in digital language documentation; (2) develop skills in a prosody-based transcription system that can be applied to any language; (3) become familiar with key software and hardware used in our field; and (4) develop skills to troubleshoot data management problems in a variety of fieldwork situations. By the end of the course, students will be able to plan and conduct a best-practice language documentation project of their own, from equipment purchase to data collection to data annotation to archiving and presentation.

Course readings

The course readings listed in the course schedule can be accessed through the Laulima Resources folder unless they are already available online.

Required materials

Students are required to have (or have access to) the following:

  1. Computer (preferably a laptop)
  • You may use any operating system (Mac OS, Windows, Linux) except when we are working with FLEx, at which point you will need access to Windows or Linux, either on a PC or through an emulator of some kind.

  • Download the following software in the version listed below:

    For certain classes students will need to bring a laptop computer and headphones to class.

  2. Closed-ear/closed-back (‘circumaural’) monitoring headphones
  • These headphones will be used with your laptop as well as audio and video recorders.

    • Sennheiser HD 201, Sennheiser HD 202, and Audio-Technica ATH-M20 are recommended (Price: $20–$45).
    • It may be possible to check out headphones from the lab or borrow them from a fellow student.
    • Note that earbuds or on-ear (‘supra-aural’) headphones are not recommended.

    If you are not already an “Advanced User of the LAE Labs”, please become one before week 4. This will allow you to check out recording equipment. See information on becoming a LAE Labs Advanced User.

Instructions for downloading FLEx on a virtual machine

Follow the steps provided on these websites in this order:

  1. Download Ubuntu 16.04.4 here. Don’t do anything else; just have it downloaded before step 2.
  2. Download VirtualBox for OS X here (see note 2 at the end of this syllabus). Follow all instructions closely here, all the way to the bottom of the page.
  3. I noticed that Ubuntu was running very slowly, so follow the instructions here to speed it up and make it usable.
  4. Follow the instructions here to install FLEx (aka FieldWorks). Note: Follow the instructions carefully and read all of the dialogue boxes. It can be a bit tricky.

Grading

  1. Participation & reading responses 20%
  • Students are required to attend every class and participate in class discussions of the readings. When readings are assigned, students are required to post two short comments or questions about the reading on the Laulima forum by 8 pm the day before the reading is to be discussed.
  2. Assignments 20%
  • There will be several short assignments on the topics being discussed in class.
    1. IRB assignment
    2. Metadata assignment
    3. ELAN assignment
    4. FLEx lexicon assignment
    5. FLEx glossing assignment
    6. ELAN-FLEx-ELAN import/export assignment
    7. Data management plan
  3. Presentations 20%
  • Each student will present once on a topic listed below:
    1. Digital data management in language documentation
    2. Hardware/software for language documentation
  4. Audiovisual recording & transcription 20%
  • Each student will make a recording of a conversation of at least half an hour.
    • The recording must use video with a separate audio recorder.
    • The audio and video will be synced, edited, and exported for transcription in ELAN.
  • Students will transcribe a small portion of the recording using Discourse Transcription (Du Bois et al. 1993).
  5. Documentation enrichment project 20%
  • Enrich a portion of the documentation of either Besemah or Nasal. In week 8 of the semester, students will begin meeting with me to discuss the areas of the documentation project on which they would like to work. The goal of this project is to give students hands-on experience working with documentary materials, using the technical skills that they have learned in this class. Possible projects include the following:

    • Prepare and possibly edit audio and video recordings to be archived in PARADISEC or ELAR.
    • Enrich and organize metadata and consent materials.
    • Edit ELAN files to improve the accuracy of alignment and English free translations.
    • Prepare ELAN files to be imported into FLEx.
    • Export ELAN files into FLEx and gloss different speech events.

Presentations

Students are required to do two mini-presentations. These are short, 10-minute presentations that explain topics to the class in a concise, useful way. The presentation should provide an overview of the topic with illustrative examples that help other students understand how the topic applies to language documentation. Each presentation should be accompanied by a short slide presentation and/or a 2-page (single-spaced) handout with relevant information and examples. The handout should serve as a kind of cheat sheet.

Presentation 1 topics

  • IMDI/CMDI metadata standards
  • OLAC metadata standards
  • XML & JSON
  • Unicode
  • Digital audio file formats
  • Digital video file formats
  • ISO 639-3 & Glottocodes
  • DOIs, PIDs, and permanent handles
  • Repositories, servers, and DSpace

Course Policy

  • Attendance in this course is crucial. In order to be successful in the course, you need to attend class and be punctual. That said, if you are sick (and contagious), please do not come to class. Excessive absences or tardiness may result in a grade reduction.
  • During class, please do not text, check email or social media (Facebook, Instagram, Snapchat, etc.), or work on anything unrelated to class. It can be really distracting for everyone.

Needs (ADA Statement)

If you have a disability for which you need accommodations in this class or any other special need (e.g. religious holidays), please inform the instructor as soon as possible. The KOKUA Program (Office for Students with Disabilities) can be reached at (808) 956-7511 or (808) 956-7612 (voice/text) in room 013 of the Queen Lili‘uokalani Center for Student Services.

Course Schedule

The tentative course schedule roughly follows the order (i) preparation, (ii) data collection, (iii) data annotation, (iv) data analysis, (v) data archival, and (vi) data dissemination.

These readings and due dates are subject to change.

Week 1 (1/14): Introduction

Week 2 (1/21): Overview of data management & IRB

Week 3 (1/28): Archiving & metadata

Week 4 (2/4): Audio Recording

  • Readings: Margetts & Margetts (2011: p. 13-32), Artis (2014: p. 183-216)

  • Class activities: (Slides)

  • Assignment: Metadata assignment: Spreadsheet vs SayMore X

  • Additional readings: Nathan (2008: section 4), Bowern (2015: section 2.2)

Week 5 (2/11): Video Recording

I will be in Indonesia. Leah Pappas will be teaching.

  • Readings: Seyfeddinipur (2011: section 6.4), Artis (2014: p. 35-83, 113-182)
  • Class activities:
    • Share audio recordings and experiences
    • Slides
  • Assignment: Group audio recording assignment
  • Additional readings: Dimmendaal (2010: p. 33-53) and Margetts & Margetts (2011)

Week 6 (2/18): Video conversion

I will be in Indonesia. Leah Pappas will be teaching.

Week 8 (3/3): FLEx II

Week 9 (3/10): ELAN I

  • Readings: Berez (2007)
  • Assignment: FLEx Assignment
  • Additional readings:

SPRING BREAK (3/17): NO CLASS

Week 10 (3/24): ELAN II

  • Readings: no readings
  • Assignment:
  • Additional readings:

Week 11 (3/31): FLEx-ELAN

Week 12 (4/7): Transcription (Level 1)

  • Readings: Himmelmann (2006), Himmelmann (2018), Du Bois et al. (1992: ch. 1, 4-6, 10) (Introduction, Units, Speakers, Transitional Continuity, Pause)
  • Assignment: ELAN-FLEx-ELAN import/export assignment
  • Additional readings:
  • Class activities: slides

Week 13 (4/14): Transcription (Level 2)

  • Readings: Du Bois et al. (1992: ch. 11-12), Du Bois handouts (Vocal Noises, Quality)
  • Assignment:
    • Complete DT take-home task.
    • Transcribe three minutes of conversation up to level 1. Be ready to share/discuss excerpt!
  • Additional readings: Dingemanse & Floyd (2014)
  • Class activities: slides

Week 14 (4/21): Transcription (Level 3)

  • Readings: Du Bois et al. (1992: ch. 17), Du Bois handouts (Spelling)

  • Assignment:

    • Transcribe three minutes of conversation up to level 2.5 (i.e., transitional continuity (, . ?), pause, laughter, overlap, unintelligible speech). Be ready to share/discuss excerpt!
  • Additional readings:

Week 15 (4/28): Transcription, annotation, and searching

  • Readings: Schultze-Berndt (2006), McDonnell (in revision)
  • Assignment: Enrich three minutes of conversation up to level 3 and upload to Laulima.
  • Additional readings:

Week 16 (5/5): Course review & Final Presentations

  • Documentation Enrichment Project final presentation

Readings

Arkhipov, Alexandre & Nick Thieberger. 2018. Reflections on software and technology for language documentation. In Bradley McDonnell, Andrea L. Berez-Kroeker, & Gary Holton (eds.), Reflections on Language Documentation 20 Years after Himmelmann 1998, 140–149. Honolulu: University of Hawai‘i Press. Retrieved from http://hdl.handle.net/10125/24821

Artis, Anthony Q. 2014. The shut up and shoot documentary guide: A Down & Dirty DV production. 2nd edn. New York; London: Focal Press, Taylor & Francis Group.

Berez, Andrea L. 2007. Review of EUDICO linguistic annotator (ELAN). Language Documentation & Conservation 1(2). 283–289.

Bird, Steven & Gary Simons. 2003. Seven dimensions of portability for language documentation and description. Language 79(3). 557–582.

Bow, Catherine, Baden Hughes & Steven Bird. 2003. Towards a General Model of Interlinear Text. In Proceedings of EMELD Workshop 2003: Digitizing and Annotating Texts and Field Recordings, 1–47. Lansing MI, USA. Retrieved from http://emeld.org/workshop/2003/bowbadenbird-paper.html

Bowern, Claire. 2010. Fieldwork and the IRB: A snapshot. Language 86(4). 897–905.

Bowern, Claire. 2015. Linguistic Fieldwork: A Practical Guide. 2nd edn. New York: Palgrave Macmillan.

Burnard, Lou. 2005. Metadata for Corpus Work. In Martin Wynne (ed.), Developing Linguistic Corpora: A Guide to Good Practice, 30–46. Oxford: Oxbow Books. Retrieved from http://ota.ox.ac.uk/documents/creating/dlc/

Cox, Christopher. n.d. Managing data in a language documentation corpus. In Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, & Lauren Collister (eds.), Handbook of Linguistic Data Management. Cambridge: MIT Press Open.

Dimmendaal, Gerrit J. 2010. Language description and ‘the new paradigm’: What linguists may learn from ethnocinematographers. Language Documentation & Conservation 4. 152–158.

Dingemanse, Mark & Simeon Floyd. 2014. Conversation across cultures. In Nicholas J Enfield, Paul Kockelman, & Jack Sidnell (eds.), The Cambridge Handbook of Linguistic Anthropology, 447–480. Cambridge: Cambridge University Press.

DiPersio, Denise. 2014. Linguistic Fieldwork and IRB Human Subjects Protocols. Language and Linguistics Compass 8(11). 505–511. DOI: https://doi.org/10.1111/lnc3.12106

Du Bois, John W., Susanna Cumming, Stephan Schuetze-Coburn & Danae Paolino. 1992. Discourse transcription. Santa Barbara: Department of Linguistics, University of California, Santa Barbara.

Du Bois, John W., Stephan Schuetze-Coburn, Susanna Cumming & Danae Paolino. 1993. Outline of discourse transcription. In Jane Anne Edwards & Martin D. Lampert (eds.), Talking data: Transcription and coding in discourse research, 45–89. Hillsdale, NJ: Lawrence Erlbaum Associates.

Good, Jeff. 2011. Data and Language Documentation. In Peter K. Austin & Julia Sallabank (eds.), The Cambridge Handbook of Endangered Languages, 212–234. Cambridge: Cambridge University Press.

Himmelmann, Nikolaus P. 2018. Meeting the Transcription Challenge. In Bradley McDonnell, Andrea L. Berez-Kroeker, & Gary Holton (eds.), Reflections on Language Documentation 20 Years after Himmelmann 1998, 33–40. Honolulu: University of Hawai‘i Press. Retrieved from http://hdl.handle.net/10125/24806

Himmelmann, Nikolaus P. 2006. Prosody in language documentation. In Jost Gippert, Nikolaus Himmelmann, & Ulrike Mosel (eds.), Essentials of language documentation, 163–182. Berlin, Boston: Mouton de Gruyter.

Margetts, Anna & Andrew Margetts. 2011. Audio and Video Recording Techniques for Linguistic Research. In Nicholas Thieberger (ed.), The Oxford Handbook of Linguistic Fieldwork, 13–53. Oxford: Oxford University Press.

Nathan, David. 2008. Minding Our Words: Audio Responsibilities in Endangered Languages Documentation and Archiving. Taiwan Journal of Linguistics 6(2). 59–77.

Robinson, Laura C. 2010. Informed consent among analog people in a digital world. Language & Communication 30(3). 186–191. DOI: https://doi.org/10.1016/j.langcom.2009.11.002

Salffner, Sophie. 2015. A Guide to the Ikaan Language and Culture Documentation. Language Documentation & Conservation 9. 237–267.

Schultze-Berndt, Eva. 2006. Linguistic annotation. In Jost Gippert, Nikolaus P. Himmelmann, & Ulrike Mosel (eds.), Essentials of language documentation (Trends in Linguistics. Studies and Monographs [TiLSM]), 213–252. Berlin, New York: Mouton de Gruyter. DOI: https://doi.org/10.1515/9783110197730.213

Seyfeddinipur, Mandana. 2011. Reasons for Documenting Gestures and Suggestions for How to Go About It. In Nicholas Thieberger (ed.), The Oxford Handbook of Linguistic Fieldwork, 147–165. Oxford: Oxford University Press.

Thieberger, Nicholas & Andrea L. Berez. 2012. Linguistic Data Management. In Nicholas Thieberger (ed.), The Oxford Handbook of Linguistic Fieldwork, 90–118. Oxford: Oxford University Press.

Thieberger, Nick, Amanda Harris & Linda Barwick. 2015. PARADISEC: Its history and future. In Amanda Harris, Nick Thieberger, & Linda Barwick (eds.), Research, Records and Responsibility: Ten years of PARADISEC, 1–16. Sydney: Sydney University Press.

van Driem, George. 2016. Endangered Language Research and the Moral Depravity of Ethics Protocols. Language Documentation & Conservation 10. 243–252.


  1. Please do not download the latest version. For Linux users, please follow these instructions. For students who plan to run an emulator on Mac, please see me in the first two weeks of class to discuss the different options.

  2. Apparently, VirtualBox can be very slow at first but improves after you restart your computer several times.