Data Management

Before Research Begins

There are several issues to consider before data collection even begins. There may be regulatory constraints regarding the privacy and confidentiality or collection and use of certain types of data. An investigator must also consider what data will best answer the research questions posed, how data will be collected, and what methods will be used for analysis.

Compliance Requirements

Expectations and regulations regarding the collection and management of data have been put in place by federal, state, and local governments, academic and private institutions, individual professions, and society. It is important to be aware of these standards with respect to the type of data that will be collected during a given study.

Special permission to collect and use data may be necessary depending on the nature of the research. For example, if research involves the use of human subjects, vertebrate animals, hazardous materials, or protected/controlled species/areas, special permits or consent will be required. Plans to acquire such permits and conform to regulations (including those set by the funding agency) should be included in grant proposals.

Federal and state laws protect the privacy and confidentiality of data that include personal information such as health (Health Insurance Portability and Accountability Act), education (Family Educational Rights and Privacy Act), or financial (Financial Services Modernization Act) details. At Florida State University, researchers collecting or managing personal information must do so in compliance with the policies of the FSU Human Subjects Committee and Florida law.

Data Acquisition and Analysis Plan

There are always limits on sampling a given population (i.e., the set of all possible data points), so before beginning a project, careful consideration must be given to which data should be collected to best answer the research question posed and how sampling should be done.

Factors such as available funding, time, or manpower may put constraints on what kind or how much sampling can be done. It is the responsibility of the researcher to ensure that an adequate amount of relevant data is collected to draw significant conclusions from the study, but it is also important not to waste resources by collecting extraneous, unnecessary, or excess data. Careful planning of data collection and analysis procedures can help ensure studies are conducted efficiently.
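One common way to judge what counts as an "adequate amount" of data is an a priori sample-size estimate. The sketch below uses the standard normal-approximation formula for comparing the means of two groups; it is only an illustration, not a method prescribed by this guidance. The function name and its parameters (expected difference, standard deviation, significance level, and power) are assumptions introduced for the example, and a statistician should be consulted for real study designs.

```python
import math
from statistics import NormalDist


def samples_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided comparison of two means.

    effect: smallest difference in means worth detecting (same units as sd)
    sd:     assumed standard deviation within each group
    alpha:  two-sided significance level
    power:  desired probability of detecting the effect if it exists
    """
    if effect == 0:
        raise ValueError("effect must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)  # round up: a fraction of a sample cannot be collected
```

Under these assumptions, halving the difference to be detected roughly quadruples the number of samples needed, which is why the expected effect size should be settled before funding and time are committed.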

Once an investigator knows what data they need to collect and what will be sampled, it is important to design a study that will yield quality data that are gathered and analyzed in a consistent and unbiased manner. Sampling strategies must be made clear before a study begins, and if multiple researchers will be involved in sample collection, group training before sampling and occasional auditing during sampling should be done. To avoid the influences of any potential bias concerning the outcome of a project, sample coding to blind the study may be necessary, and statistical tests that will be used to analyze the data should be determined before data collection begins.
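Sample coding for a blinded study can be sketched as follows. This is a minimal illustration, assuming samples are identified by unique text labels; the function name, the "S001"-style code format, and the example labels are hypothetical. The idea is that analysts see only the neutral codes, while the key linking codes to the original labels is held by someone not involved in the analysis until it is complete.

```python
import random


def assign_blind_codes(sample_ids, seed=None):
    """Return a key mapping neutral codes (e.g. 'S001') to original sample labels.

    The labels are shuffled before codes are assigned, so the code order
    reveals nothing about group membership or collection order.
    """
    sample_ids = list(sample_ids)
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("sample labels must be unique")
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = sample_ids[:]
    rng.shuffle(shuffled)
    return {f"S{n:03d}": original for n, original in enumerate(shuffled, start=1)}


# Hypothetical labels; only the codes (the dictionary keys) go to the analysts.
key = assign_blind_codes(["control-1", "control-2", "treated-1", "treated-2"], seed=7)
codes_for_analysts = sorted(key)
```

The key itself is confidential data and should be stored as carefully as the samples it describes.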


Managing Data

As data are obtained during a study, they need to be stored in a well-planned manner to ensure they are kept confidential if necessary, are available to those working with them, and are safe from tampering or loss. When data collection and analysis are complete, the results should usually be shared and/or published in a timely manner. Once enough time has passed after the completion of a study, it may be appropriate to dispose of old data, which should be done in a responsible and thorough manner.

Data Storage

If it is important that data from a particular study be kept confidential, researchers must secure them in a way appropriate to their form. Data may be recorded by hand, photographed, saved to a drive, etc. Keeping hard copies of data secure may be as simple as keeping them in a locked room or file cabinet, but it is increasingly common for data to be stored in an electronic format. Saving data in an electronic format is convenient because it makes them easy to access, share, copy, and edit, but those same properties make them harder to secure. Confidential data stored on a shared or network computer must be kept safe from unauthorized access, and it is also important to take precautions when storing data on mobile devices such as laptop computers or flash drives in case they are lost or stolen. Keeping anti-malware software up to date, using password protection, and encrypting files are examples of possible steps that can be taken to protect sensitive data. For more information on keeping your electronically stored data safe, see FSU's Information Technology Services website.

However, while it is important that data be kept safe from unauthorized access, they must still be accessible to authorized users. This means ensuring that authorized users can get past the precautions put in place to protect the data, but dangers that could make data inaccessible to everyone must also be seriously considered. Theft, fire, or computer failure could cause a complete loss of data that are stored in only one location. Backing up data regularly and keeping the copy at a secondary, secure location is essential to avoid such a loss.
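A backup is only useful if the copy matches the original. The sketch below copies a file to a second location and compares SHA-256 checksums before reporting success. It is an illustration rather than a complete backup system: the function names are hypothetical, and in practice the backup directory would sit on separate, secured storage and the procedure would run on a schedule.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dir):
    """Copy a file into backup_dir and confirm the copy is identical to the source."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} does not match the original")
    return target
```

Recording the checksum alongside the backup also allows later checks that neither copy has been altered or corrupted.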

During the course of a project, original data usually undergo a series of evolutionary steps to make them easier to interpret and publish. For example, raw data may be written in a notebook, transferred to a computer file, transformed, and finally graphed. Careful tracking of each of these steps in the lifecycle of the data set, known as “information lifecycle management” (ILM), is necessary to ensure the integrity of the original data set is preserved in the final output.
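One lightweight way to track these steps is a processing log that records what was done at each stage together with a checksum of the resulting data. The sketch below is an assumption-laden illustration, not a standard ILM tool: the function name, the log fields, and the example step descriptions are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def record_step(log, description, data_bytes):
    """Append one processing step to the log and return the new entry.

    The checksum identifies the exact state of the data after the step,
    so any later version can be compared against what was recorded.
    """
    entry = {
        "step": len(log) + 1,
        "description": description,
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    log.append(entry)
    return entry


# Hypothetical example: two stages in the life of a small data set.
log = []
record_step(log, "transcribed from lab notebook", b"3.1,2.9,3.4\n")
record_step(log, "converted to log10 values", b"0.491,0.462,0.531\n")
```

Kept with the data, such a log lets a reader trace any published figure back through each transformation to the original record.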


After the completion of a study, original data must be kept for a certain period of time. The amount of time may be determined by the type of study done, the common practices of a particular discipline, or policies set by an academic or funding organization, but if no guidelines for data retention time are available, three years is generally considered an appropriate amount of time to keep original data after a study is complete.

After the appropriate amount of time has passed and a researcher desires to dispose of data, care must be taken to ensure the data cannot be recovered after disposal. Hard copies of data should be shredded and electronic data should undergo the appropriate erasure procedures. Magnetically recorded data may require multiple-pass erasures. If proper expertise or tools to dispose of data are lacking, the appropriate aid should be sought from the research institution or from professional services.


Publication and Sharing of Data

Responsible data management extends to the publication and sharing of data. Sharing results by presenting at meetings and publishing in journals is essential to progress in a given field of research, so care must be taken to ensure all data, interpretations, and conclusions disseminated to the community are unbiased and of high integrity. Pressure to publish, unclear guidelines, and opportunity for personal gain are examples of some of the factors that can lead to integrity issues in data reporting. Examples of integrity issues that may arise are:

  • Misrepresentation of data
  • Changing analysis methods to achieve significance
  • Misleading discussion of results/unsupported conclusions
  • Fabrication, falsification, and plagiarism
  • Unjust attribution of authorship

Guidance regarding proper practices of data reporting can usually be sought from faculty advisors, departmental chairpersons, and published policies, codes, or rules.

Generally, an atmosphere of openness regarding the sharing of data is promoted in the scientific community. Occasionally, data may be withheld pending publication or patent acquisition. Data that have the potential to affect national security may also be protected from release. However, making data available allows for their review and verification and should most often be done promptly after the completion of a study.

Both the National Institutes of Health and the National Science Foundation have policies meant to encourage the timely sharing of data resulting from research the agencies fund (see NIH Grants Policy Statement Part II Subpart A and NSF Award and Administration Guide Chapter 6 Part D4).


Ownership and Responsibility Issues

Principal investigators and postdoctoral, graduate, and undergraduate students may all collect data and therefore share responsibility for the security and integrity of those data. However, collecting and being responsible for data does not necessarily translate to ownership. If a student is paid by an institution to work on a research project, any data that student collects are technically the property of the institution. Along the same lines, data collected by a principal investigator may be the property of the institution or the agency funding the project. Before data are transferred from one person or organization to another or patent applications are filed, ownership issues must be examined and understood.

The Bayh-Dole Act of 1980 allowed institutions to maintain control of intellectual property generated under federal grants. Institutions are therefore able to profit from patenting and licensing inventions resulting from federally funded research to the private sector. This facilitates the integration of new technology into public use, but in some cases may impose limitations on investigators concerning the disclosure of data. It is important for researchers working on projects that may generate marketable results to understand their institution’s policies regarding technology transfer.