Research Data Management and the Research Data Lifecycle

In this lesson you will learn

  • The distinction between information and data
  • What data management is and why you should do it, and do it well
  • The importance of planning for successful data management
  • What the research data lifecycle is, how it relates to the research lifecycle, and how it relates to your research
  • How this course uses the concept of the research data lifecycle

Initial questions

  • Had you heard of data management before this course? In what context?
  • You have probably engaged in some standard data management practices many times – backing up, naming and organizing files, etc. When did you decide exactly how you would carry out those tasks?
  • When working on a research project, how and when do you tend to plan out how you will collect your data?
  • Have you ever tried to think about the different phases of a research project “through the eyes of” your data?

What Are Data?

We encourage you to conceive of data broadly. For the purpose of this course we define data as any representations of the social world relevant to a particular type of inquiry and rendered in a form suited to the analysis to be undertaken (adapted from Kapiszewski & Karcher, forthcoming). This definition implies that data can take many forms: numbers (e.g., statistics), interview transcripts, archival materials, photographs, movies, field notes, maps… you can think of many more. Frequently we distinguish between qualitative and quantitative data, although the distinction is not always clear-cut. Quantitative data are generally numeric and typically organized in tables. Qualitative data come in many forms (e.g., text, image, audio, and video) and structures.

You’ll note that we define data by their relationship to systematic inquiry. It is this relationship that distinguishes them from information. For instance, you may read a newspaper article, let’s say a review of a theater performance, to stay informed about contemporary culture; for you the contents of that article are information. That same review (together with many others) could be part of a systematic collection of data by a theater scholar.

How you generate data (i.e., how you produce the empirical building blocks of your research), how you manage those data, how you analyze them, and (potentially) how you share them all have profound implications for your work. This course is designed to help you carry out the second of these steps, managing your data, as effectively as possible. Doing so enables you to make the most of your data.

What Is Research Data Management?

This entire course focuses on data management. Parts of the course examine data sharing, but, as we will discuss, good data management is a prerequisite for meaningful data sharing.

What do we mean by data management? One definition we like is by the University of Edinburgh Research Data Services: “Research data management is caring for, facilitating access to, preserving and adding value to research data throughout its lifecycle.”

This definition highlights various features of data management.

  • “Caring for” emphasizes that your data will suffer if you do not manage them well. You may lose them entirely, they may become unusable, or you may need to spend a lot of time trying to remember why you collected them and what they mean.

  • “Facilitating access” refers to the eventual sharing of your research data with other scholars, and also applies during your research: you want to make sure that the right people (e.g., research assistants, co-authors, dissertation advisors) can access your data while preventing any unauthorized access.

  • “Preserving” your research data means keeping them usable for the long term, many years after you collected them.

  • “Adding value” refers to the various activities that increase the quality of your data: clarifying their structure, thoroughly documenting them, and making them easily findable by other researchers.

Why Put Effort into Managing Your Data?

Data management is at the core of good empirical research. You may already be doing some of the data management tasks that we cover in this course, without knowing you’re engaging in data management! Why should you care about managing your data well, though?

You should manage your data well because:

  • your data are valuable. You spend a lot of time and money to generate your data.

  • you want to make sure your data are continuously accessible to you, and organized in a way that allows you to quickly find what you’re looking for.

  • your data are critical to your research as they form the basis for your empirical claims. Better data management ultimately means better research.

  • strong data management can help you to effectively address any concerns with your data raised by reviewers or critics.

  • others may require that you manage your data effectively. If you apply for a National Science Foundation (NSF) grant, for example, you will have to create a data management plan (DMP).

  • your future self will thank you. Research projects can take many years to complete. By organizing and documenting your data systematically throughout, you will still be able to understand the data years after you collect them.

Why Plan for Data Management?

But why this emphasis on advance planning of data management? There are two main reasons:

  • Steps you take early in your research can have significant impacts on what you may be able to do at later stages. For example, whether and how you can share data that you collected through research with human participants will depend on the consent language you used, which you likely determined early on, in your application to your institutional review board (IRB).

  • Research, particularly fieldwork in a foreign country, is hectic, often like juggling many things at once. Some aspects of research projects are much harder or even impossible to set up once you’re away from your home base. Having a plan in place for carrying out the data management tasks that you can anticipate, and a firm enough grasp on the fundamentals of data management so that you can develop strategies for addressing unexpected tasks, will help you stay organized.

The Research Lifecycle and the Research Data Lifecycle

All researchers are familiar with the “research lifecycle”, pictured below. This lifecycle traces the arc of a research project — from identifying a research topic / question / opportunity, to (potentially) finding others to work with, to securing funding, to reviewing extant work on the topic, to designing the project, to data generation and analysis, to writing up, to publication.

Research Lifecycle

Research Lifecycle. Based on: Nicholas, David & Rowlands, Ian & Wamae, Deanna. (2012). Charleston Conference Observatory: Are Social Media Impacting on Research? https://doi.org/10.5703/1288284314807. p. 16

The “research data lifecycle” (RDL), pictured below, is closely related. On the one hand, the RDL tracks the research lifecycle, highlighting your continuous interaction with your data over the arc of a research project. At every point in a research project — from your initial ideas about what data you’ll need to support your claims, through collecting and analyzing those data, through sharing them when you publish based on them, through preserving them for yourself and others — you are working with your data.

On the other hand, the RDL extends beyond the research lifecycle (i.e., beyond the dissemination of research results). It also involves ensuring the independent existence, accessibility, discoverability and longevity of all of the data generated in association with a research project after its conclusion.

Research Data Lifecycle

Research Data Lifecycle. Based on: Green, Ann G., and Myron P. Gutmann. (2007) “Building Partnerships among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives.” OCLC Systems and Services: International Digital Library Perspectives. 23: 35-53. http://doi.org/10.1108/10650750710720757

As you think through and execute your project, keep the RDL in mind (or even physically in front of you: here is a PDF version to print out), and make sure that you are taking good care of your data at every step of the way!

Exercise

You and the Research Data Lifecycle

  1. Take another close look at the research data lifecycle. Does the cycle track how you have dealt with your data – or how you envision doing so? Do you think these steps actually form a cycle? Are there steps that you hadn’t fully considered? Do you think there are steps missing? If you now can see how your research data might have a “lifecycle,” will that change anything about your workflow going forward?
  Solution
    1. Answers to these questions may vary depending upon how scholars conceive of the research process and how they collect and analyze data. One aspect you might question is the arrow from “long-term management” to “discovery and planning”. Do you see one research project tightly linked to the next in the way this arrow suggests? One aspect you might not have thought about before is long-term management of your data; had you considered the possibility of (someone) taking care of your data for years or decades? A core concern with the RDL model, particularly in the context of qualitative research, is that it has gaps and glosses over details. You can probably think of multiple steps you will take between any two steps in the model. Also, there is not always a neat distinction between planning, initial data collection, and final data collection. You (will) likely also shift between planning your research (including data collection), carrying it out, and writing. And qualitative researchers often go back and forth between data and theory. The lifecycle doesn’t reflect any of these “back-and-forths”. When you’re planning your research, you should take them into account.

Managing Research Data Through Their Lifecycle

This course follows your research data through their lifecycle, discussing important data management tasks that you should carry out in every phase of the RDL. The rest of Module 1 focuses on the crucial planning stage: you learn how to write an effective data management plan (DMP), and how to craft a consent script for human participant research that will allow you to share your data. Module 2 focuses on managing your data while you are collecting them. You will learn how to document your data so that they are easy to use in the future. You will also be presented with some practical strategies for organizing your data and files, including naming conventions and back-up strategies. Finally, you will learn how to transform the raw data you collect, e.g., how to digitize documents or transcribe interviews. Module 3 addresses sharing your data safely and responsibly. Module 4 considers writing and publishing with qualitative data.

Throughout this course, we will use a small image of the data lifecycle to locate in what phase you would carry out the tasks being discussed.