Research Transparency and Qualitative Data

In this lesson you will learn

  • Three general principles that underlie transparent social science research
  • Some approaches to making qualitative social science research more transparent

Initial questions

  • What would you need to know about a particular piece of research in order to fully evaluate its quality?
  • Can you think of some great examples of transparent research in your field?
  • What are some exemplary practices for making research transparent that are used in your field?

Transparency in Social Science Research

Norms across the social sciences are evolving to encourage greater access to the data underpinning research and more transparency with regard to research practices. The goal of this norm change is to make scholarly work easier to understand and evaluate. Research transparency comprises three general principles. (We draw here on the trio of imperatives envisioned by the American Political Science Association [2012, 9-10].)

  • Data access: achieved by referencing the data that underpin evidence-based knowledge claims and, if you generated or collected those data yourself, sharing those data or explaining why you cannot do so.
  • Production transparency: entails offering a full account of the procedures used to collect or generate your data (if you did so yourself).
  • Analytic transparency: involves providing a full account of how you drew inferences from the data, i.e., clearly explicating the links between your data and your empirical claims and conclusions.

While openness is relevant for all types of social science research, different research traditions of evidence-based inquiry have developed, and will continue to develop, different strategies to realize these principles. In particular, the strategies devised to achieve the general principles will differ between quantitative and qualitative research, in large part because data are deployed differently in these distinct types of inquiry (see the previous lesson in this module on Deploying Qualitative Data).

Nonetheless, all such strategies for achieving openness should:

  • Make relevant data and analytic information immediately available in tandem with the particular knowledge claim they were used to generate (proximity)
  • Make data and analytic information FAIR (findable, accessible, interoperable, and reusable; Wilkinson et al. 2016)
  • Address concerns about the ethical and legal complications that constrain openness (protection).

For qualitative research, optimizing proximity entails linking digital data sources (e.g., archival documents, audio recordings, interview transcripts, ethnographic field notes) and accompanying materials containing relevant analytic information directly to the relevant passage in a digital publication, so that they are accessible from the journal’s web page (i.e., on the publisher’s platform).

Rendering data and analytic information FAIR entails tasks such as describing them with the proper metadata and assuring their long-term preservation. Data repositories have the expertise and technology to help you to render, and keep, your data FAIR as standards evolve over time.
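Descriptive metadata can feel abstract, so here is a minimal sketch of what metadata for a single qualitative data source might look like. The field names loosely follow common repository conventions (e.g., Dublin Core) and all values are placeholders; this is an illustration, not a required schema.

```python
# Illustrative metadata record for one qualitative data source.
# Field names loosely follow common repository metadata (e.g., Dublin Core);
# they are our illustration, not a required schema. Values are placeholders.
metadata = {
    "title": "Interview with municipal planner, transcript",
    "creator": "Researcher name",
    "date_created": "2023-04-12",
    "format": "application/pdf",          # interoperable, widely readable format
    "language": "en",
    "rights": "Restricted: available to registered repository users",
    "identifier": "doi:10.0000/example",  # placeholder persistent identifier
    "description": "Semi-structured interview on zoning reform; 45 minutes.",
}

# Findability and reusability depend on a persistent identifier
# and a clear rights statement.
assert metadata["identifier"].startswith("doi:")
assert metadata["rights"]
```

A data repository will typically ask you for fields like these at deposit time and handle identifier assignment and long-term preservation for you.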

Finally, as lessons two and three of the Sharing Qualitative Data module suggest, a promising way to maximize transparency while simultaneously addressing the ethical and legal complications that sharing social science data can present – in particular protecting human participants and respecting copyright law – is by establishing differential access to the evidentiary base of published articles.

Annotation for Transparent Inquiry (ATI)

The Qualitative Data Repository (QDR), in partnership with the software non-profit Hypothesis, has developed a new approach to increasing the openness of qualitative research that achieves the objectives noted above: Annotation for Transparent Inquiry (ATI). ATI builds on “active citation” (an earlier technique for openness in qualitative research pioneered by Princeton Professor of Politics Andy Moravcsik).

ATI facilitates transparency by allowing you to add digital annotations to a book or article manuscript. Annotations allow you to provide additional content, offering information about the research context and/or the generation and analysis of data that you did not have room to include in the article text. Each annotation is linked to a particular passage in the text of the manuscript. Ultimately, the annotations appear right beside the text of the article or book on the publisher’s web page. You can see a visual representation of ATI here.

Each annotation includes one or more of the following:

  • Full citation to the underlying data source(s) and, when relevant, supplementary information about the source’s location;
  • Source excerpt(s): a quote (or redaction) from a textual source (including the transcription of handwritten or audiovisual material), typically 100 to 150 words;
  • Source excerpt translation(s): a translation (and its source) if the excerpt is not in the language in which the manuscript is written;
  • Analytic note: information contextualizing the source(s), and/or discussion of how the relevant source(s) were collected, how data were generated, how that source/data support conclusions or claims in the annotated passage; and potentially how ethical and legal complications inhibit the sharing of the underlying data source;
  • Link to the underlying data source(s) when these are digital and can be shared ethically and legally.
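To make the structure of an annotation concrete, the sketch below represents one hypothetical annotation’s elements as a simple record. The field names and all values are our own illustration, not an official ATI schema.

```python
# Illustrative sketch of one ATI-style annotation (field names and values
# are our own invention, not an official ATI schema).
annotation = {
    "anchor_passage": "…the ministry reversed its position in March…",
    "citation": "Interview with senior official, Ministry of Finance, 2019",
    "source_excerpt": "We had no choice but to change course once the "
                      "figures came in.",
    "excerpt_translation": None,   # not needed; excerpt already in English
    "analytic_note": "Corroborates the timing claimed in the annotated passage.",
    "data_source_link": None,      # interview cannot be shared for ethical reasons
}

# Per the list above, an annotation includes one or more of these elements.
elements = ["citation", "source_excerpt", "excerpt_translation",
            "analytic_note", "data_source_link"]
assert any(annotation.get(k) for k in elements)
```

Note that the `analytic_note` can also explain *why* a source cannot be shared, so an annotation remains useful even when the underlying data must stay restricted.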

ATI empowers authors to demonstrate the richness, rigor, nuance, and validity of their inferences and interpretations, amplifying their research products. Annotations make immediately available to readers information about how the underlying data were generated and/or analyzed, thus enabling research transparency. They can also facilitate data access by serving as a link between the data sources underlying a claim and the text of an article or book.

An “ATI Data Supplement” for a particular manuscript comprises the set of digital annotations that you created, as well as an “ATI Data Overview.” This overview, approximately 1,000 words in length, discusses the various data generation procedures that you employed, and how the analysis attends to the rules of inference or interpretation that underlie the qualitative methods that you employed.

Employing ATI benefits you, and your qualitative inquiry, in several ways. Employing ATI:

  • permits you to display critical evidence supporting your claims;
  • encourages you to be more careful and precise when making and supporting evidence-based arguments;
  • helps you to meet transparency standards;
  • facilitates evaluation of your work by reducing transaction costs for readers who seek more information about how you drew descriptive, causal, or interpretive inferences, or who wish to investigate whether the information contained in cited sources supports your evidence-based claims.

Some examples of scholarship that has been annotated using ATI can be found here, and more specific directions for using ATI can be found here.


Annotation for Transparent Inquiry

  1. Using a research product you recently completed, choose three contiguous pages that contain multiple evidence-based claims. Consider how well you were able to substantiate those claims given space limitations. Try to remember whether there was additional information or evidence that you cut as you were revising the piece. Then, seek to annotate a few passages, following the description above and the directions here. Ask yourself what you are gaining – and whether you are losing anything – through annotation.
  • show solution
    1. Working with another scholar who is familiar with your area of research, give them your original research product (without the annotations, and without mentioning the annotations!) to read. Then ask them to read the version with the annotations, and to answer the same questions you asked yourself – what is gained, and potentially lost, through annotation?

Data Appendix

No matter how you collect and generate data, production transparency requires you to communicate to readers as much about your data-gathering processes as you can. While your processes were likely varied and intricate, you need to convey them holistically and synthetically, while simultaneously offering sufficient detail for a reader to understand and evaluate what you did. ATI empowers you to give a “macro-representation” of your data collection through the Data Overview, and a “micro-representation” of your data collection through annotations.

A data appendix offers a “meso-level” representation: an “itemized overview” of respondents, documents, or other data sources, with each described via a structured set of attributes. Your data appendix might include a subset of the data included in your “data manifest,” described in the previous lesson, with each item described in more detail. If you didn’t use ATI, your appendix should also include a holistic overview of the type you would have written in your ATI Data Overview, detailing, e.g., how data sources were chosen.

We describe here, as an example, a data appendix for a particular type of interactive data collection – interview research – drawing on an excellent example developed by Erik Bleich and Robert Pekkanen. In projects involving interactive data collection, with whom you interacted (and why), and how you solicited information from them, are key drivers of the data that are produced, and thus of your analysis and findings. Being transparent in this kind of research, then, entails providing as much information as you can about those interactions.

In our lesson on Documenting Data and Creating Metadata, we suggested that you create “informal metadata” for each data-collection interaction. Creating a data appendix entails aggregating certain aspects of those informal metadata into a holistic depiction of your interactive data collection. You probably won’t include in your appendix all of the types of informal metadata that you created for each exchange; you should choose those that you think will allow your readers to evaluate the quality and evidentiary value of the data.

Bleich and Pekkanen provide a template for one potential element of your data appendix – what they term an “Interview Methods Table”. Such a table includes key information about each respondent and each exchange. (“Saturation” refers to whether an exchange revealed any new information, and/or whether a particular category of respondent has reached “saturation” such that no additional interviews are required.) You might include other key metadata, such as the date and location of the exchange.
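To make this concrete, the sketch below drafts two rows of an interview methods table and renders them as plain text for an appendix. The column choices loosely follow the elements described above; the respondent data are invented placeholders, not real interviews.

```python
# Placeholder rows for an "Interview Methods Table" (columns loosely follow
# the elements described above; all respondent data are invented).
rows = [
    {"respondent": "R1", "category": "ministry official",
     "date": "2023-04-12", "location": "Capital City",
     "recorded": True, "saturation": "new information"},
    {"respondent": "R2", "category": "ministry official",
     "date": "2023-04-15", "location": "Capital City",
     "recorded": False, "saturation": "saturation reached"},
]

# Render the rows as a simple aligned text table for an appendix draft.
cols = list(rows[0])
widths = {c: max(len(c), *(len(str(r[c])) for r in rows)) for c in cols}
header = " | ".join(c.ljust(widths[c]) for c in cols)
lines = [header, "-" * len(header)]
for r in rows:
    lines.append(" | ".join(str(r[c]).ljust(widths[c]) for c in cols))
table = "\n".join(lines)
print(table)
```

In practice you would add or drop columns (e.g., interview length, how the respondent was recruited) based on which informal metadata best help readers judge the evidentiary value of your exchanges.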

Interview Appendix

The exact content of your appendix depends on your research project and the types of data collection in which you engaged. The key is for it to help readers of your published work understand, to the greatest degree possible, how your data collection processes produced the data that underpin your work, thus helping them to assess the quality of the data and how well your claims are supported.


Creating a Data Appendix

  1. Draft the framework for a data appendix for a research product you are currently creating. What set of attributes would work across all your different forms of data (documents, interview transcripts / notes, etc.) and help your readers understand and evaluate your project’s evidentiary foundation?
  • show solution
    1. Give your draft framework, and the abstract for the research product you are writing, to someone unfamiliar with the piece. Solicit their input on whether the information you are proposing to provide is the type they would want to know when evaluating your work.

Qualitative Data Analysis Software and Transparency

The use of software to assist qualitative data analysis (sometimes referred to as CAQDAS, or “Computer Assisted Qualitative Data Analysis Software”) is becoming more and more common across the social sciences. Such software assists researchers with routine tasks such as coding, categorizing, and annotating documents. Unlike with statistical packages (and as the word “assist” in the name suggests), the analysis itself does not take place in the software: you, not an algorithm, make key analytic choices, such as which code to assign to a statement. As a result, simply sharing your coding or output does not satisfy requirements for analytic transparency. Nevertheless, the software can help you to make your work and your data transparent.

We offer here some suggestions on achieving transparency when working with CAQDAS software.

1. Follow General Advice on Data Management

Most advice for managing qualitative data in order to facilitate their subsequent sharing is applicable to CAQDAS data: have a clear organizational structure, document during data collection, etc.

2. Keep Track of Sensitive Information

As you collect your data, keep concerns about privacy and sensitivity in mind. When you see information in your data that may need redacting, use the software to highlight it so you can quickly identify it later on. Also consider tagging files that you cannot share at all (e.g., interviews given “off the record” or signed consent forms).

3. Keep Memos about Analytic Decisions

As you analyze your data, your CAQDAS tool will help you make your analytic process transparent. Making coding and analysis decisions explicit in memos will help readers to evaluate your conclusions, and will also help secondary users to better understand the application of given codes in your data.

4. Prepare Your Data for Sharing

Make a copy of your project and delete any information you do not want to share, such as private notes or sensitive information. If you have followed our advice above, you can now use the tags you created to redact potentially identifying information from transcripts, following the guidelines we provided previously.

5. Export Your Data for Sharing

One of the challenges of sharing CAQDAS-produced data is that every software product has its own, typically proprietary, export format. These formats do not travel between software packages, may change between software versions, and are thus problematic for sharing and archiving.

One solution is to share data in two different forms. The first form is the raw full export from your software: once your data are prepared for sharing, export the whole project into your software’s dedicated export format (e.g., .nvp for NVivo, “Export Data” for Dedoose, or “copy bundle” for ATLAS.ti). Then, create a second, “human-readable” export that anyone, regardless of software, can use: export all relevant files in widely used formats (such as RTF, PDF, or Excel, as well as widely used video, image, and audio formats), and export all relevant memos as RTF or PDF files.
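The two-form approach can be summarized as a simple folder layout for the shared package. The folder names below are our own suggestion for organizing a deposit, not a standard:

```python
# Suggested layout for a two-form CAQDAS sharing package.
# Folder names are our own illustration, not a standard.
layout = {
    "native_export/": "full project in the software's own format "
                      "(e.g., .nvp file or copy bundle)",
    "human_readable/documents/": "source documents exported as RTF or PDF",
    "human_readable/memos/": "analytic memos exported as RTF or PDF",
    "human_readable/media/": "audio, video, and images in widely used formats",
    "README.txt": "which software and version produced the native export",
}

for path, purpose in layout.items():
    print(f"{path:28s} {purpose}")
```

Recording the software name and version in a README matters because native export formats can change between versions, and a future reader may need that information to open the raw export at all.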

As of this writing, efforts are underway to provide a standardized format for exchange between different CAQDAS software products. As this exchange format matures and becomes more widely available, we expect it to replace some of these recommendations.

Further Reading