Principles of Documenting Data

In this lesson you will learn

  • What “data documentation” is and what good documentation includes
  • At what point in your research you should document your data
  • What different types (or “levels”) of documentation exist

Initial questions

  • You have probably already documented some of your data – can you think of some ways you did so? Were you thinking of those tasks as “documenting” your data?
  • Why is it important to document your data?
  • Are there “right” and “wrong” ways to document data?

Documentation Overview

At the core of good data management is documentation. Documentation introduces your data, provides a detailed description of their key attributes, and contextualizes them. Your documentation should describe what you did and why you made the choices you made.

Documentation can be written at many “levels” and comes in many forms (as described below). Together, all of the documentation associated with a research project should answer a series of important questions (although any particular piece of documentation may only answer a subset of those questions).

  • What was / is the context of data collection (empirical, theoretical, and/or normative)?
  • How did you generate / collect the data?
  • In what form are the data (e.g., “interview transcript”)?
  • What are the data about (i.e., what is your research about)?
  • How are the data formatted, structured, and organized?
  • How did you transform or manipulate the data (e.g., modify format)?
  • How did you validate / assure the quality of the data?
  • What ethical or legal limits (e.g., confidentiality, copyright) are there on access to / use / re-use of the data?
  • (Perhaps) How did you analyze the data?

Clear, thorough documentation is helpful to you (and the “future you”), to other members of your research team (if you have research assistants or are otherwise working collaboratively), and to scholars who may re-use your data later. Good documentation helps you and others to get clarity on and remember details about your data and their provenance, to assess the quality and evidentiary value of the data, and to avoid misinterpreting or incorrectly using the data. If you share some or all of your data (as discussed in the next module), good documentation improves the processing and archiving of your data, and facilitates the creation of good metadata and an accurate catalog record for a published data collection.

The specific types of documentation that might best introduce, describe, and contextualize data differ from research project to research project. Below we offer some examples of types of data documentation that you might produce as you carry out a research project:

  • Questionnaires used for surveys or semi-structured interviews
  • Guidance materials used for team-based fieldwork
  • Instructions for focus group facilitation
  • Consent forms and information sheets
  • Approved IRB application
  • Permissions or licenses from copyright holders
  • Description of methods used to analyze the data
  • Description of fieldwork and project context
  • Description of how derived materials (individual files or variables) were created
  • Coding schemas

Documentation is a love letter to yourself

This famous quote, originally referring to computer code, applies just as much to your data. Be kind to your future self.

When to Document Data

[Figure: research lifecycle thumbnail, with the collection and analysis stages highlighted]

Documentation, and the work of creating it, plays an important role throughout the lifecycle of your research project and the lifecycle of your data. You engage in different documentation tasks at each stage of the research and data lifecycles.

  • When you are beginning to plan your research, think of the different types of documentation you will need and create a template for each type. For instance:
    • Generate a template for your archive log.
    • Create a template for “informal metadata” from interactive data collection (the next lesson talks more about metadata).
    • Create a field observations template (see the next lesson for more on this type of template).
  • You create the bulk of the documentation for a particular project as you are collecting / generating data, using the templates you created.

  • As you analyze your data and write, you pull together, organize, and finalize your documentation.
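
To make the idea of a template concrete, here is a minimal sketch of an archive log template built as a header-only CSV file, which any spreadsheet program can open. The column names are hypothetical examples, not a required standard; choose the ones that match the information you expect to record in the archive.

```python
import csv
import io

# Hypothetical column names; adapt them to your own archive visits.
ARCHIVE_LOG_COLUMNS = [
    "date_visited",
    "archive_name",
    "collection_or_box",
    "document_title",
    "photo_file_name",
    "notes",
]


def write_archive_log_template(stream):
    """Write a header-only CSV that serves as an empty archive log."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(ARCHIVE_LOG_COLUMNS)


# An in-memory buffer stands in for a real file here.
template = io.StringIO()
write_archive_log_template(template)
print(template.getvalue())
```

To save the template to disk, you could pass a file opened with `open("archive_log_template.csv", "w", newline="", encoding="utf-8")` instead of the in-memory buffer, and then copy that file at the start of each archive visit.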

Particularly when you are deep in the trenches generating data, you may feel that working on documentation is not the best use of your time. On the contrary, the best moment to document data is while you are collecting / generating them. Systematically capturing all relevant information about your data as you gather / create them (i.e., when that information is most available and clear to you) will allow you and others to make the best use of your data.

Levels of Documentation

It is helpful to think of documentation by the unit it documents. Some information refers to the whole project, while other information refers to an individual file. No matter at what level you are documenting data, documentation files should be clearly labeled using a consistent labeling schema.
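
One way to keep labels consistent is to generate every documentation file name from the same small function rather than typing names by hand. The sketch below assumes a schema of project, level, document type, and creation date, joined by underscores; that schema and the function name `documentation_file_name` are illustrative assumptions, and your own schema may well differ.

```python
from datetime import date


def documentation_file_name(project, level, doc_type, created, extension="txt"):
    """Build a file name such as 'courts_project_methods_2024-03-15.txt'.

    The parts and their order are illustrative assumptions, not a standard.
    """
    parts = [project, level, doc_type, created.isoformat()]
    # Lowercase each part and replace spaces so names sort and match reliably.
    cleaned = [str(p).strip().lower().replace(" ", "-") for p in parts]
    return "_".join(cleaned) + "." + extension


# Made-up project and document names, for illustration only.
print(documentation_file_name("courts", "project", "methods", date(2024, 3, 15)))
print(documentation_file_name("courts", "file", "interview 07", date(2024, 3, 18)))
```

Writing dates in year-month-day order means that files sharing the same prefix sort chronologically in an ordinary file browser.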

Project Level Documentation

Project-level documentation describes the general parameters of your research project. This type of documentation typically begins with a high-level overview of your project and its focus. Thereafter, you describe the “what,” “who,” “when,” “where” and “how” of data collection / generation.

  • Describe the “what” of your data
    • What types of data are you collecting?
    • In what form are they (e.g., focus group transcript)?
    • What is the substantive content of the data – what are they about?
  • Document the “who” of your project: anyone who has contributed or will contribute (research assistants, translators, transcribers, interviewers, coders, etc.) and what role each person played.
  • Document the “when” of your project: record the dates on which important project-related events occur.
  • Describe the “where” of your data collection – contextualize the collection process(es) in space and time.
  • Describe how you / your team collected / generated the data, detailing and justifying important methodological choices, for example:
    • How did you pick interview respondents (and why did you do so as you did)?
    • How did you recruit focus group participants (and why did you do so as you did)?
    • How did you pick research sites (and why did you do so as you did)?
    • How did you select archives and archival documents (and why did you do so as you did)?
    • If you deviated from your original data collection plans, in what ways and why?

As you generate data you should keep informal but complete and accurate lists to document these activities. Spreadsheets are a great way to keep track of, for example, which websites you consult, whom you interview, or what documents you review. Later, you can formalize these lists into clear documentation.
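
If you prefer to keep such a list as a plain CSV file rather than in a spreadsheet program, a minimal sketch might look like the following. The column names, the helper functions, and the two entries are all made up for illustration; note that respondents are recorded by code rather than by name.

```python
import csv
import io

# Hypothetical columns for an informal tracking list; rename them to fit your project.
FIELDS = ["date", "activity", "source", "notes"]


def start_list(stream):
    """Write the header row of a new tracking list."""
    csv.DictWriter(stream, fieldnames=FIELDS, lineterminator="\n").writeheader()


def add_entry(stream, date, activity, source, notes=""):
    """Append one activity to the tracking list."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, lineterminator="\n")
    writer.writerow({"date": date, "activity": activity, "source": source, "notes": notes})


def read_entries(text):
    """Read the list back as dictionaries, e.g. to formalize it later."""
    return list(csv.DictReader(io.StringIO(text)))


# An in-memory buffer stands in for a real .csv file; the entries are made up.
tracking_list = io.StringIO()
start_list(tracking_list)
add_entry(tracking_list, "2024-03-18", "interview", "respondent J-01", "audio-recorded")
add_entry(tracking_list, "2024-03-19", "document review", "court annual report")

for entry in read_entries(tracking_list.getvalue()):
    print(entry["date"], entry["activity"], entry["source"])
```

Because every row has the same fields, the list can later be sorted, filtered, or summarized when you turn it into formal documentation.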

File Level Documentation

File-level documentation reflects the contents of an individual data file. Often, you keep track of this information in the header of a file/document. Alternatively, you can record information using the file “properties”, which makes descriptions of a file and its content visible to your computer’s operating system. You can construct templates for either type of documentation, and different types of data files, before you start your research so the information is structured similarly across different files of the same type.
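
As an illustration of a header template, the sketch below produces the same plain-text header block for every interview file, leaving blank any field you have not yet filled in. The field names and the function `make_file_header` are assumptions chosen for this example; pick the fields that matter for your own type of data file.

```python
# Hypothetical header fields for an interview file; adjust them per file type.
HEADER_FIELDS = [
    "Interview code",
    "Date",
    "Location",
    "Interviewer",
    "Language",
    "Recording",
    "Main topics",
]


def make_file_header(values):
    """Return a plain-text header block; missing fields are left blank."""
    lines = ["=== FILE-LEVEL DOCUMENTATION ==="]
    for field in HEADER_FIELDS:
        # rstrip() avoids a trailing space when a field has no value yet.
        lines.append(f"{field}: {values.get(field, '')}".rstrip())
    lines.append("=== END OF HEADER ===")
    return "\n".join(lines)


# Made-up values for illustration; the respondent is identified by a code.
print(make_file_header({"Interview code": "J-01", "Date": "2024-03-18"}))
```

Pasting the resulting block at the top of each transcript or notes file keeps the information structured similarly across files of the same type.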

Software tools can help you create file-level metadata. You may already use a reference manager like Zotero, Mendeley, or EndNote to store academic work (e.g., scholarly articles) and metadata about it. You can also use these tools to store newspaper articles, archival documents, or even interview transcripts, along with descriptive information for each document. Such file-level metadata might include, for instance, date collected, location collected, people involved, main topics, etc.

There are also other tools that can help you to keep track of documents. For example, the Tropy software was developed by historians to facilitate keeping track of and organizing large numbers of pictures taken in archives.

Exercise

Documenting Your Data

  1. Make a list of three types of documentation at the project or file level that you will need to create for a research project you are carrying out. For each item, indicate in what form you’ll create the documentation (e.g., as a Word doc or an Excel spreadsheet) and offer a short description of the types of information you’ll include in that documentation.
  2. Think through three ways in which creating and having this documentation will help you.
  • Solution
    1. Documentation lists and insights will vary considerably from project to project. The goal of this exercise is simply for you to think about the ways in which data documentation is valuable – operationally and analytically – to you and your project, as the more valuable you perceive documentation to be, the more likely you are to create it. We provide a sample solution here, but don’t use it as a template. You should come up with your own strategies matching your data and your workflows.

      Sample solution

      • Project-level documentation – “who”
        • I will use a (continually evolving) Google doc for this (so I can share it with my dissertation advisors easily). I will not use a date-extension in the file name to version, but Google doc’s version history will allow anyone viewing the document to see what changed when.
        • The document will have two sections: (1) project personnel and (2) interview respondents (interviews are the only form of interactive data collection I’m using).
          • (1) I will include here a list (using pseudonyms just to be extra cautious) of the small group of people (research assistants, translators, transcribers) whom I’ve invited to work with me on the project, listing information relevant to each (e.g., the institution with which they’re affiliated, how I identified them, when I interviewed and hired them; by what logic, how, how much, and how often I plan to pay them, what exactly they are doing). (I should keep track of their birthdays too.)
          • (2) Here I’m thinking not of a list of my interview respondents (I have another plan for that) but rather of the methodology I used to identify people in their “category” – i.e., how did I identify/choose judges to interview, how did I identify/choose clerks to interview, how did I identify/choose constitutional scholars to interview, etc.
      • Project-level documentation – “when”
        • I will use a (continually evolving) Google doc for this (so I can share it with my dissertation advisors easily). I will not use a date-extension in the file name to version, but Google doc’s version history will allow anyone viewing the document to see what changed when.
        • This document is going to be kind of like a diary. Each day I’m going to make a short entry about what I did for my research project. I will be sure to highlight important milestones, key decisions made and problems that I resolved (or am stuck on), key interviews (using a code for respondents), etc.
      • File-level documentation – interview
        • I’m not sure how many of my interviews I’m going to audio-record, and of those, for how many I’ll make formal transcripts. Even without doing these things, though, I will want to keep track of a lot of aspects of my interviews. I already looked ahead to the next lesson and I really like what I saw there about the two types of “informal documentation” I can create for each interview – both “practical information” and “observations and reactions.”
        • I’m going to have a dedicated Google doc for each interview that contains this information.
        • Having this information at my fingertips will help me (along with my first type of documentation above) to make sure I’m carrying out my inquiry in a similar way in my second research context.
      • How this documentation will help me
        • Having to write all of this will help me think critically about my choices. I didn’t do a very good job justifying my choices above, but in my actual documentation I’ll do so. This will help me to make sure I’m making well-founded choices (and convince my dissertation advisors of the same).
        • Having this will help me remember what I did, which will be really important to have in mind when I’m interpreting my data and using them to support claims and conclusions in my written work.
        • I’m carrying out my project in two different countries and I want my research processes to be the same in both (in hopes of maximizing the comparability of my data). Keeping careful track of all of this will help me to operate the same way in both contexts.
          • For instance, it will be really important that I try to identify people in the same way – and have more or less the same mix of people – in both of my interview contexts. My first and third types of documentation above will help me to do that.