Data Documentation
Data documentation is a critical part of research data management. Describing your dataset well and documenting what you have done helps others discover it, and it also supports long-term preservation and usability. Good documentation ensures that your data can be understood and interpreted by any user, including your future self.
Data documentation explains how your data was created, the context of the data, its structure and contents, and any manipulations that have been applied to it. In other words, it answers questions such as what, how, where, and which about the data. A common type of documentation used when datasets are deposited in a repository is a README file.
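As a rough illustration, a README skeleton can be generated from the four topics named above. This is a minimal sketch in Python; the section titles paraphrase this guide, and the function name and plain-text layout are assumptions rather than a repository requirement.

```python
# Section titles paraphrase the four topics this guide says documentation
# should cover; the layout itself is only an illustrative assumption.
README_SECTIONS = (
    "How the data was created",
    "Context of the data",
    "Structure and contents",
    "Manipulations applied to the data",
)


def readme_skeleton(dataset_title):
    """Return a plain-text README outline with one empty section per topic."""
    lines = [dataset_title, "=" * len(dataset_title), ""]
    for section in README_SECTIONS:
        # "TODO" is a placeholder for the researcher to fill in.
        lines += [section, "-" * len(section), "TODO", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # "Example dataset" is a hypothetical title.
    print(readme_skeleton("Example dataset"))
```

The output can be saved as README.txt next to the data files and filled in section by section.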
What to include in data documentation?
What is Metadata?
Think about what information will be needed in years to come to understand and evaluate your data, and to reproduce your findings. That is what must be provided when creating metadata.
Metadata can be defined as data that provides information about one or more aspects of other data; it summarizes basic information about data, which makes tracking and working with specific data easier. It can be as simple as the author, date, and title.
There are three distinct types of metadata: descriptive metadata, structural metadata, and administrative metadata.
Descriptive metadata describes a resource for purposes such as discovery and identification. It includes elements such as title, author, abstract and keywords.
Structural metadata is metadata about containers of data and indicates how compound objects are put together, for example, how pages are ordered to form chapters.
Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it.
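To make the three types concrete, the sketch below groups the metadata for one hypothetical dataset into descriptive, structural, and administrative parts. The class names, field names, and values are assumptions chosen for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DescriptiveMetadata:
    # Supports discovery and identification.
    title: str
    author: str
    abstract: str
    keywords: list = field(default_factory=list)


@dataclass
class StructuralMetadata:
    # Records how the parts of a compound object are put together,
    # here simply the order of the files in the dataset.
    file_order: list = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    # Helps manage the resource: creation date, file type, access.
    created: str
    file_type: str
    access: str


@dataclass
class DatasetMetadata:
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata


# All values below are hypothetical.
record = DatasetMetadata(
    descriptive=DescriptiveMetadata(
        title="Example survey",
        author="A. Researcher",
        abstract="Responses to an example survey.",
        keywords=["survey", "example"],
    ),
    structural=StructuralMetadata(file_order=["part1.csv", "part2.csv"]),
    administrative=AdministrativeMetadata(
        created="2024-01-15", file_type="CSV", access="open"
    ),
)

# asdict() converts the nested dataclasses into plain dictionaries,
# which are easy to save as JSON or feed into a repository form.
metadata_dict = asdict(record)
```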
Research data should be organized into files so that you can locate them quickly at a later date. Clear file descriptions aid precise file identification and discovery. When organizing data files, consider file naming conventions, file versioning, and file formats.
File naming conventions:
Popular tips on file naming in general:
File versioning:
Apply version control to files stored in various formats or locations, to information that is cross-referenced between files, and to the various copies or versions of files.
The UK Data Service provides a good description of versioning.
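One simple way to keep versions distinguishable is to encode a date and a version number in the file name. The sketch below assumes a hypothetical pattern, project_description_YYYYMMDD_vNN.ext; neither the pattern nor the function names come from this guide or from the UK Data Service.

```python
import re
from datetime import date


def versioned_name(project, description, version, ext, on=None):
    """Build a name like 'soil-survey_ph-readings_20240517_v03.csv'."""
    on = on or date.today()
    # Replace spaces and special characters with hyphens and lowercase
    # each part, so the name is safe on most file systems.
    slug = "_".join(
        re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-").lower()
        for part in (project, description)
    )
    return f"{slug}_{on:%Y%m%d}_v{version:02d}.{ext}"


def next_version(existing_names):
    """Return one more than the highest '_vNN.' number found, or 1."""
    versions = [
        int(match.group(1))
        for name in existing_names
        if (match := re.search(r"_v(\d+)\.", name))
    ]
    return max(versions, default=0) + 1


if __name__ == "__main__":
    # Hypothetical project and description.
    print(versioned_name("Soil Survey", "pH readings", 3, "csv", date(2024, 5, 17)))
```

Zero-padding the version number keeps the files in order when a folder is sorted by name.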
File formats for preservation:
For the purpose of open data, choose file formats carefully so that they do not become obsolete and remain interoperable across systems. Some standards require specific file formats, but the following general formats are commonly used:
Textual data: XML, TXT, HTML, PDF/A (Archival PDF)
Audio: MP3, MP4, WAV, FLAC
Images: PNG, TIFF, JPEG (note that the JPEG format tends to lose some quality each time a file is re-saved)
Databases: CSV, XML
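The format list above can be turned into a quick check before deposit. The following is a minimal sketch: the mapping reproduces the formats listed in this guide, while the function name and the alternative extension spellings (.tif, .jpg) are assumptions.

```python
from pathlib import Path

# Categories and formats as listed in this guide. A ".pdf" extension
# alone cannot distinguish PDF/A from ordinary PDF, so treat a match
# on ".pdf" as a prompt to verify the file, not as confirmation.
PRESERVATION_FORMATS = {
    "textual": {".xml", ".txt", ".html", ".pdf"},
    "audio": {".mp3", ".mp4", ".wav", ".flac"},
    "images": {".png", ".tiff", ".tif", ".jpeg", ".jpg"},
    "databases": {".csv", ".xml"},
}


def preservation_categories(filename):
    """Return the sorted categories whose format list includes the extension."""
    ext = Path(filename).suffix.lower()
    return sorted(
        category
        for category, extensions in PRESERVATION_FORMATS.items()
        if ext in extensions
    )


if __name__ == "__main__":
    # Hypothetical file names.
    for name in ("interview.wav", "survey.XML", "notes.docx"):
        print(name, preservation_categories(name))
```

An empty result, as for notes.docx here, suggests converting the file to one of the listed formats before depositing it.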
A metadata standard or schema is a set of elements that has been standardized for a particular field of research and is used to describe data in a consistent manner. When these standards are in place, they ensure consistency across records, enable data sharing and reuse, support interoperability between different systems, and enhance discoverability through search engines and repositories.
Some disciplines have subject-specific metadata standards; however, the most widely used general metadata standards are:
The 'Dublin Core', also known as the Dublin Core Metadata Element Set, which is a set of fifteen "core" elements (properties) for describing resources.
MODS (Metadata Object Description Schema), which is a schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications.
Many data repositories currently use the Digital Curation Centre's Disciplinary Metadata Standards; the Digital Curation Centre's website gives more information on them.
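As an illustration of a general-purpose standard, the sketch below writes a small Dublin Core record as XML using Python's standard library. The fifteen element names and the namespace URI are those of the Dublin Core Metadata Element Set; the "metadata" wrapper element, the function name, and all field values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen core elements.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names and values to XML."""
    # The wrapper element name "metadata" is an assumption; repositories
    # typically define their own container element.
    root = ET.Element("metadata")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values for a hypothetical dataset.
    print(dublin_core_record({
        "title": "Example survey",
        "creator": "A. Researcher",
        "date": "2024-01-15",
        "format": "text/csv",
    }))
```

All fifteen elements are optional and repeatable in Dublin Core, which is why the function accepts any subset of them.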
Data should be as open as possible and as closed as necessary. If your research is funded, please familiarise yourself with the funder's data sharing policy, since not all research data can be made openly available.
Access might be restricted for legal or ethical reasons. Data may derive from human participants, be sensitive, or fall under copyright restrictions, and any of these can make restrictions on sharing unavoidable. For this reason, access may need to be managed to maintain confidentiality and to mitigate other security risks. Where data cannot be shared, consider sharing the metadata only, provided that this does not conflict with your funder's policy (if funded).
Why share your data?
Choose carefully where you share your data so that it is both safe and accessible. Best practice is to share your data in a trusted open access repository such as the Wits Research Data Repository. Such a repository promotes long-term preservation, keeping the data accessible and intact now and in years to come. Adhering to the FAIR Principles is good practice for data sharing, even for datasets that must be restricted; in that case, only the metadata is shared, with the access conditions clearly stated. When data is shared, it enables the following: