Data Management FAQs
Why should I create a Data Management and Sharing Plan (DMSP)?
As a data steward, you should always be able to describe the complete operational workflow for your research data, from data capture to data analysis, archiving, and sharing. A DMSP helps you think this through. You are responsible for answering questions about the origin of your data, how they have been manipulated, where they are analyzed and archived, and with whom they are shared and under what conditions.
If your research data contain personal information, it is essential to ensure that the privacy of the persons involved is protected during all phases of your research. When you share or link data with a third party, you need to take additional measures such as drawing up a legal agreement that is approved by your institute.
Who can access contact information?
Studies usually register contact data for study subjects. Access rules should distinguish between people who have access to the research data and people who have access to these contact data. In principle, no single person should have access to both, unless the researcher is also the treating physician. An exception can be made only for smaller projects that have a limited period during which data are created, processed, and analyzed. In your Data Management and Sharing Plan, you must justify why this exception applies to your research project.
What does data stewardship cost?
The costs of data stewardship should be included in your grant application or research budget. Explicitly specify the costs for:
- Re-using another data collection
- Building or using a database
- Data management
- Long-term preservation of data
- Sharing data with others
As a rule of thumb, funders estimate that approximately 5-10% of the available budget is needed for data stewardship.
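As a quick illustration of the 5-10% rule of thumb, the sketch below computes the range for a given budget. The function name and the example budget of 200,000 are assumptions made for illustration only.

```python
def stewardship_budget_range(total_budget, low_percent=5, high_percent=10):
    """Return the (low, high) data stewardship estimate for a total budget."""
    return total_budget * low_percent / 100, total_budget * high_percent / 100


# Hypothetical total budget of 200,000 (any currency).
low, high = stewardship_budget_range(200_000)
print(f"Reserve roughly {low:,.0f} to {high:,.0f} for data stewardship")
```

Treat the result as a starting point and adjust it once you have itemized the cost categories listed above.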
How can I de-identify the data?
In general, you can protect the privacy of your study subjects by:
- Keeping identifying data separate from de-identified research data
- Using random unique research codes and separating the code list from the research data
- Encrypting vital identifying information
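The second measure, random research codes with a separate code list, can be sketched as follows. This is a minimal illustration, not a complete de-identification procedure: the field names, the "R-" prefix, and the sample records are all hypothetical, and real data may contain indirect identifiers that this sketch does not handle.

```python
import secrets


def pseudonymize(records, id_field="name"):
    """Split records into de-identified rows and a separate code list."""
    code_list = {}      # research code -> identifying value; store this apart from the data
    research_rows = []
    for record in records:
        code = "R-" + secrets.token_hex(4)   # random, not derived from the identity
        while code in code_list:             # regenerate in the rare case of a collision
            code = "R-" + secrets.token_hex(4)
        code_list[code] = record[id_field]
        row = {key: value for key, value in record.items() if key != id_field}
        row["research_code"] = code
        research_rows.append(row)
    return research_rows, code_list


# Hypothetical example records.
subjects = [
    {"name": "Subject One", "age": 54, "score": 12},
    {"name": "Subject Two", "age": 61, "score": 9},
]
research_rows, code_list = pseudonymize(subjects)
```

Here `research_rows` can go to the analysis environment, while `code_list` should be kept in a different, access-restricted location.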
What should I consider when reusing data?
Before reusing data, you should ask yourself questions like:
- Will these data help me answer my research question?
- Is the quality and integrity of the data sufficient?
- Are the data available under appropriate terms and conditions?
- What technical measures do I need to take to use the data?
- Is it wise to start a scientific collaboration with the data providers?
Which file formats are preferred?
You should use open, well-documented, flexible, frequently used file formats. Consult guides on preferred formats for long-term preservation and accessibility.
What safety requirements apply to my data?
You should implement state-of-the-art security measures to prevent unauthorized and unnecessary access to your research data, and so protect privacy and scientific integrity. You can do this by:
- Setting access policies
- Protecting data with passwords
- Using firewalls, encrypted data transport, backups, etc.
- Consulting your information security personnel
- Performing risk assessments
Databases connected to the internet require additional security measures. Report any data breaches to appropriate personnel.
How should I organize my files?
Once you start creating and processing data, files can easily become disorganized. Naming and organizing your files consistently from the start saves time and prevents errors. Decide on conventions and share them with all people involved.
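A naming convention is easier to keep when it can be checked automatically. The sketch below assumes a hypothetical convention of the form project_YYYYMMDD_description_vNN.ext; the pattern is only an example, so replace it with whatever your team agrees on.

```python
import re

# Hypothetical convention: <project>_<YYYYMMDD>_<description>_v<NN>.<ext>
FILENAME_PATTERN = re.compile(r"[a-z0-9]+_\d{8}_[a-z0-9-]+_v\d{2}\.[a-z0-9]+")


def follows_convention(filename):
    """Return True if the file name matches the agreed pattern."""
    return FILENAME_PATTERN.fullmatch(filename) is not None


print(follows_convention("trial_20240115_baseline-survey_v01.csv"))  # True
print(follows_convention("Final data (2).xlsx"))                     # False
```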
If you have many data files, keep a master list with critical information and links. Properly version this list so changes are tracked.
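One possible way to build such a master list is to generate it from the data folder itself. The sketch below records each file's relative path, size, and SHA-256 checksum in a CSV file; the column names and function name are assumptions. Keep the list outside the data folder and under version control so that changes are tracked.

```python
import csv
import hashlib
from pathlib import Path


def build_master_list(data_dir, list_path):
    """Write one row per data file: relative path, size in bytes, SHA-256 checksum."""
    data_dir = Path(data_dir)
    rows = []
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            rows.append([path.relative_to(data_dir).as_posix(), path.stat().st_size, digest])
    with open(list_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "bytes", "sha256"])
        writer.writerows(rows)
    return rows
```

A call such as `build_master_list("data", "master_list.csv")` (hypothetical paths) regenerates the list; a changed checksum then signals that a file was modified. For very large files, reading in chunks instead of `read_bytes()` would be more memory-friendly.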
What data should I store?
At minimum, store your raw data used for publications, including metadata describing how you obtained and processed the data. Ensure metadata clearly describe which data they document. Also store any scripts used with datasets.
What should I do with intermediate files?
You may need to keep intermediate data files for reproducibility. If not needed, consider deleting them to save space and reduce privacy risks. You can also exclude them from backups. However, keeping intermediate data can be useful for traceability.
What metadata should I document?
Collect metadata that will help you and others understand, interpret, find, use, reproduce, and properly cite your data in the future. Important metadata may include:
- Methodologies, protocols, instrument details, calibration data
- Data quality indications
- Descriptions of all data elements and files
- Standards followed
- Software and hardware used
- Data provenance
- People involved
- Funding sources
How do I store metadata and research data?
Metadata and data should be stored close together so their relationship is clear. Some data formats allow embedding metadata.
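When a format cannot embed metadata, one common option is a "sidecar" file stored next to the data file. The sketch below is one such arrangement, assuming a hypothetical naming scheme (data file name plus ".meta.json") and a "describes" field that states which file the metadata document.

```python
import json
from pathlib import Path


def write_sidecar(data_path, metadata):
    """Store metadata next to the data file as <data file name>.meta.json."""
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    # "describes" makes explicit which data file this metadata belongs to.
    record = {"describes": data_path.name, **metadata}
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

For example, `write_sidecar("measurements.csv", {"instrument": "..."})` would create `measurements.csv.meta.json` in the same folder; both file names are illustrative.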
How can I ensure my data is monitored and validated?
Monitor data entry consistently, documenting who enters or modifies data and when. Validate data after entry, for example by having a second person check them or by comparing them to the raw sources. Perform and document data quality checks during and after collection. Never let this process influence your analyses.
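One way to carry out a second-person check is double entry: two people enter the same records independently and the two versions are compared. The sketch below is a minimal illustration of that comparison; the record layout and the sample values are hypothetical.

```python
def compare_entries(first_entry, second_entry):
    """Return (record_id, field, first_value, second_value) for every disagreement."""
    discrepancies = []
    for record_id in sorted(set(first_entry) | set(second_entry)):
        a = first_entry.get(record_id, {})
        b = second_entry.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((record_id, field, a.get(field), b.get(field)))
    return discrepancies


# Hypothetical entries made independently by two people.
first = {"R-001": {"age": 54, "score": 12}, "R-002": {"age": 61, "score": 9}}
second = {"R-001": {"age": 54, "score": 21}, "R-002": {"age": 61, "score": 9}}
issues = compare_entries(first, second)
print(issues)  # [('R-001', 'score', 12, 21)]
```

Each reported discrepancy should be resolved against the raw source, and the resolution documented.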
How can I cite my data?
To enable citation tracking, provide your publicly available data with a persistent identifier (PID) like a DOI. Indicate in licenses or agreements that you want your data cited on reuse. Construct data citations similarly to article citations, including authors, titles, dates, PIDs, etc.
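As an illustration, a data citation can be assembled from the elements named above. The layout below is an assumption loosely modeled on common article citation styles, and every value in the example is a placeholder; follow your repository's or journal's guidance for the exact format.

```python
def format_data_citation(authors, year, title, repository, pid):
    """Build a citation string: Authors (Year). Title [Data set]. Repository. PID"""
    return f"{'; '.join(authors)} ({year}). {title} [Data set]. {repository}. {pid}"


# Placeholder values only; this is not a real data set or DOI.
citation = format_data_citation(
    authors=["Lastname, A.", "Lastname, B."],
    year=2024,
    title="Example survey responses",
    repository="Example Repository",
    pid="https://doi.org/10.xxxx/placeholder",
)
print(citation)
```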
Data Analysis FAQs
How can I prepare my research data for analysis?
To enable transparent, reproducible analyses:
- Create metadata documenting your raw dataset
- Make a working copy of data, archive originals
- Document all data cleaning/processing
- Preserve raw and intermediate datasets
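The working-copy step can be sketched as follows. This is one possible approach, assuming the raw file lives on a file system where you can change permissions: it copies the file, marks the original read-only, and returns a checksum you can record to verify later that the original is unchanged. The function name and folder layout are hypothetical.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def make_working_copy(raw_path, work_dir):
    """Copy a raw file into work_dir, then mark the original read-only."""
    raw_path = Path(raw_path)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    working_copy = work_dir / raw_path.name
    shutil.copy2(raw_path, working_copy)          # copy made before permissions change
    raw_path.chmod(stat.S_IRUSR | stat.S_IRGRP)   # original becomes read-only
    checksum = hashlib.sha256(raw_path.read_bytes()).hexdigest()
    return working_copy, checksum
```

All cleaning and processing then happens on the working copy, ideally through scripts, so that every step from raw to analysis-ready data is documented.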
What analysis tool should I use?
For anonymized data, you can likely use any reputable statistical software, provided that your processes are well documented and your data manipulations are scripted.
What versions of my data should I preserve?
Store raw data and processed versions representing meaningful, difficult-to-repeat steps. Keep what underlies your analyses and publications. Delete unnecessary intermediate files.
How can I create a data analysis plan?
For complex studies, create an analysis plan beforehand, addressing:
- Research questions and data needed
- Inclusion/exclusion criteria
- Merging datasets
- Missing data treatment
- Statistical methods
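Parts of an analysis plan can be written down as code before any analysis is run, which makes the decisions explicit and repeatable. The sketch below illustrates two of the items above, inclusion/exclusion criteria and a count of missing values; the criteria, field names, and records are hypothetical.

```python
def apply_inclusion_criteria(rows, criteria):
    """Split rows into included and excluded, recording the first failed criterion."""
    included, excluded = [], []
    for row in rows:
        failed = next((name for name, test in criteria.items() if not test(row)), None)
        if failed is None:
            included.append(row)
        else:
            excluded.append((row, failed))
    return included, excluded


def count_missing(rows, fields):
    """Count missing (None) values per field."""
    return {field: sum(1 for row in rows if row.get(field) is None) for field in fields}


# Hypothetical criteria, fixed before looking at the outcome data.
criteria = {
    "adult": lambda row: row["age"] is not None and row["age"] >= 18,
    "consented": lambda row: row["consent"] is True,
}
rows = [
    {"id": "R-001", "age": 34, "consent": True, "score": 12},
    {"id": "R-002", "age": 16, "consent": True, "score": 9},
    {"id": "R-003", "age": 52, "consent": True, "score": None},
]
included, excluded = apply_inclusion_criteria(rows, criteria)
missing = count_missing(included, ["age", "score"])
```

Recording why each record was excluded, and how many values are missing per field, gives you the numbers needed to report the selection and to choose a missing-data treatment.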
What statistical method should I use?
Think carefully about hypotheses and alternatives before running analyses. Consult experts on appropriate statistical methods. Document all analysis decisions, scripts, and workflows thoroughly.
Data Archival FAQs
What is data archiving?
Scientific data archiving refers to the long-term storage of scientific data and methods in a secure, trusted repository.
How should I archive my data?
Adhere to the FAIR principles (Findable, Accessible, Interoperable, Reusable) when archiving. Add datasets to field-specific repositories. Register data archived externally. List publicly available data in an open catalogue.
What is the difference between data storage and archiving?
Storage simply saves data temporarily, whereas archiving ensures long-term retention, preservation, and stewardship according to best practices.
What data should I archive?
Archive at minimum your data underlying publications and analyses. Archive all data that is impractical to reproduce, or that has significant research value or legal requirements for retention. Do not indefinitely retain transitory files.
How long should I archive my research data?
- Minimum: 10+ years, as stipulated by policies and regulations
- Maximum: Ideally, anonymized data is preserved indefinitely
- Base your decision on reproduction difficulty, value, legal issues, costs
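To keep track of the minimum retention period, you can compute the earliest date on which it ends. The sketch below assumes the 10-year minimum mentioned above and counts from the archiving date; both the starting point and the function name are assumptions, so check which policy or regulation applies to your data.

```python
from datetime import date


def minimum_retention_end(archived_on, min_years=10):
    """Return the date on which the minimum retention period ends."""
    try:
        return archived_on.replace(year=archived_on.year + min_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year; use 28 February.
        return archived_on.replace(year=archived_on.year + min_years, day=28)


print(minimum_retention_end(date(2024, 3, 15)))  # 2034-03-15
```

Reaching this date does not mean the data must be deleted; it only marks the point at which you can review the decision against the criteria above.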
Get in Touch!
Do you have questions for the DMAC team? Fill out this form to get in touch with us!