Data Management Planning Basics

In this lesson you will learn

  • What data management is and why you should do it, and do it well
  • The importance of planning for successful data management
  • How to use the data lifecycle to begin planning for data management

Initial questions

  • Did you hear about data management before this course? In what context?
  • As you engaged in some standard data management practices – backing-up, naming and organizing files – when did you decide on the details for them?

Defining Data Management

This entire course is about data management, so we should probably begin by defining it. One definition we like is by the University of Edinburgh Research Data Services:

Research data management is caring for, facilitating access to, preserving and adding value to research data throughout its lifecycle.

This definition highlights various features of data management.

  • Caring for emphasizes that your data will suffer if you do not manage them well. You may lose them entirely, they may become unusable, or you may need to spend a lot of time understanding them.
  • Facilitating access does refer to the eventual sharing of your research data with other researchers, but also applies during your research: you want to make sure that the right people (e.g., research assistants, co-authors, dissertation advisors) can access your data while preventing any unauthorized access.
  • Preserving your research data means keeping it usable for the long term, many years after you collect it.
  • Adding value refers to the various activities that make your data more valuable: clear structure, thorough documentation, and easy findability for other researchers.

Why Data Management?

Data management is at the core of good empirical work. You may already be doing some of the things we cover in this course, whether you think of them as data management or not. Why should you care about managing your data well, though?

You should manage your data well because

  • they are valuable. You spend a lot of time and money to generate your data and you want to make sure to not lose them and to organize them in a way that allows you to quickly find what you’re looking for.
  • they are the basis for your empirical claims. Better data management ultimately means better research. Additionally, it can help you to effectively address any concerns with your data that reviewers or critics bring up.
  • others may require it. If you ever apply for an NSF grant, for example, you will have to create a data management plan (DMP).
  • your future self will thank you. Research projects take many years to complete. By organizing and documenting your data systematically throughout, you will still be able to understand them years later.

Why Data Management Planning?

But why this emphasis on planning? There are two main reasons:

  • Steps you take early in your research can have significant impacts on what you may be able to do at later stages. For example, whether and how you can share your data will depend on the consent language you used, which you likely determined early on, in your IRB application.
  • Research, particularly fieldwork in a foreign country, is hectic, often like juggling many things at once. Some things may be much harder or even impossible to set up once you’re away from your home base. Having a plan in place for the things that you can plan (and for some eventualities) will help you stay organized.

How Do I Start?

A good way to think about data management is to ask yourself the following question:

“How can I demonstrate that I have carefully thought about what my data needs will be at each stage of my research and data lifecycles and that I have adopted specific and well-justified procedures to meet those needs?”

So let’s start with the data lifecycle that you encountered in the last lesson.

Exercise

Using the Data Lifecycle for Planning

  1. Using a depiction of the data lifecycle (you can draw one by hand or download and print a PDF version from here), mark at every point in the lifecycle which data management and/or data management planning steps you should take. Remember you already did some related work in an exercise in the last lesson.
  Solution: The details will of course vary from project to project, but the image below gives you some ideas of which steps may occur at which point in the data lifecycle.
  [Image: Lifecycle with interventions]