Understanding the Data Lifecycle

In this lesson you will learn

  • What the data lifecycle is and how it relates to your research
  • How this course uses the concept of the lifecycle
  • Limitations of the data lifecycle model

Initial questions

  • As you were putting together your research proposal, did you ever try to access data from other researchers? Were you successful?
  • How did you decide on your plan for your own data collection?

The Research Data Lifecycle

The “research data lifecycle” has become a ubiquitous image among data professionals. Here is one example:

Data Lifecycle

Based on: Green, Ann G., and Myron P. Gutmann. (2007) “Building Partnerships among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives.” OCLC Systems and Services: International Digital Library Perspectives. 23: 35-53. http://doi.org/10.1108/10650750710720757

As you think through the organization and management of your research data, keep this cycle in mind (or even physically in front of you: here is a PDF version to print out). Let’s begin with some questions about the data lifecycle.

Exercise

Your Research and the Data Lifecycle

  1. Consider why the use of research data is depicted as a “cycle”. In particular, what does the dashed arrow from “Long-term Management” to “Discovery and Planning” represent?
  2. Which activities that are part of your research fall under the different headings of the lifecycle?
  Solution
    1. The basic idea behind the cyclical nature of research data is re-use. Often, you may be re-using your own data: something you found in your original data piqued your interest and becomes the basis for your next project. In other cases, other researchers may take advantage of data you shared and use them as the basis for their next project.
    2. This will depend on your research, but here are some possible activities in each category:
      • Discovery and Planning: design research; review literature; identify existing data; apply for funding; hire research team; plan data security and backups; plan informed consent; apply for IRB approval.
      • Initial Data Collection: identify participants, interviewees, archives, etc.; collect data; document data; store and back up data; identify additional data needed.
      • Final Data Preparation and Analysis: collect additional data; convert data for analysis (OCR, transcription, etc.); read, annotate, and analyze using software.
      • Publication and Sharing: write; finalize documentation; de-identify data if needed; deposit data with a repository; publicize through social media, e-mail, etc.
      • Long-term Management: handled by the data repository.

Data Lifecycle and Research Lifecycle

We introduce the concept of a research project lifecycle (from project design to funding application to research to publication) and the attendant research data lifecycle. This data lifecycle, we demonstrate, tracks the research project lifecycle and also extends beyond the publication of articles or books to ensure the independent existence, accessibility, discoverability, and longevity of all of the data generated in association with a research project. Finally, we consider the various benefits of effective data management: to the researcher personally, to other potential users, and to the broader social science community.

This Course and the Data Lifecycle

This course follows your data around its lifecycle. The next three lessons (module 1) focus on the crucial planning stage: you will learn the basics of data management, how to write an effective data management plan, and how to craft a consent script for human participant research that will later allow you to share your data. The two lessons after that (module 2) focus on managing your data while you are collecting them. You will learn how to document your data so that they are easy to use in the future. You will learn practical strategies for organizing your data and files, including naming conventions and back-up strategies. Finally, you will learn how to transform the raw data you collect effectively, e.g., how to digitize documents or transcribe interviews. The following three lessons (module 3) address sharing your data in a way that’s safe and responsible. The final three lessons (module 4) are about writing and publishing with qualitative data.

Exercise

Limitations of the Lifecycle Model

  1. What are some of the limitations of how well the data lifecycle model represents actual research, especially as you think about qualitative research?
  Solution
    1. Like every model, the data lifecycle glosses over details. For example, there often won’t be a neat distinction between planning, initial data collection, and final data collection. You may also find that you go back and forth between research and writing/publication. For example, scholars frequently conduct additional research between the dissertation and the book that results from it. Arguably, much of this applies particularly strongly to qualitative research. In many qualitative traditions, researchers go back and forth between data and theory. The lifecycle paints too neat a picture of that. As you use it as a planning device, make sure to account for such “back-and-forth.”