LibGuides: Research Data Management: Best Practices in RDM

Data Documentation and Metadata

Data Documentation

Data documentation is a critical part of research data management. Describing your dataset well and documenting what you have done helps others discover the data and is also important for long-term preservation and usability. It ensures that the data can be understood and interpreted by any user, including yourself.

Data documentation explains how your data was created, the context of the data, the structure of the data and its contents, and any manipulations that have been applied to it. In other words, it answers the what, how, where, and which questions about the data. A common type of documentation used when datasets are deposited in a repository is a README file.

What to include in data documentation?

Context of data collection
Data collection methods
Information about variables used
File organization and naming schemes
How data has been transformed or processed for analysis
Software used for data processing and analysis
Outside data sources used
The roles and responsibilities of project personnel
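
The checklist above can be turned into a README skeleton. The following is a minimal sketch in Python, not a required format: the function name build_readme, the plain-text layout, and the example title are illustrative assumptions. Sections that have not been filled in are marked TODO so that gaps in the documentation stay visible.

```python
# Section headings follow the checklist above; the layout of the
# generated file is an illustrative assumption, not a required format.
README_SECTIONS = [
    "Context of data collection",
    "Data collection methods",
    "Information about variables used",
    "File organization and naming schemes",
    "How data has been transformed or processed for analysis",
    "Software used for data processing and analysis",
    "Outside data sources used",
    "Roles and responsibilities of project personnel",
]


def build_readme(title, answers):
    """Return README text with one section per checklist item.

    `answers` maps a section heading to its text. Headings missing
    from `answers` are filled with "TODO".
    """
    lines = [title, "=" * len(title), ""]
    for heading in README_SECTIONS:
        lines.append(heading)
        lines.append("-" * len(heading))
        lines.append(answers.get(heading, "TODO"))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    text = build_readme(
        "Example survey dataset",  # hypothetical project title
        {"Software used for data processing and analysis": "Python 3"},
    )
    print(text)
```

The printed text can be saved as README.txt alongside the dataset and completed section by section.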

What is Metadata?

Think about what information will be needed in years to come to understand and evaluate your data and to reproduce your findings. That is what must be provided when creating metadata.

Metadata can be defined as data that provides information about one or more aspects of other data; it summarizes basic information about data, which makes tracking and working with specific data easier. It can be as simple as the author, date, and title.

There are three distinct types of metadata: descriptive metadata, structural metadata, and administrative metadata.

Descriptive metadata describes a resource for purposes such as discovery and identification. It includes elements such as title, author, abstract and keywords.

Structural metadata is metadata about containers of data and indicates how compound objects are put together, for example, how pages are ordered to form chapters.

Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it.

File Organisation

Research data should be organized into files in a way that saves time when you need to locate them later. Clear file descriptions aid precise file identification and discovery. Organizing data and files involves three considerations: file naming conventions, file versioning, and file formats.

File naming conventions:

Popular tips on file naming in general:

Format dates according to ISO 8601, using the basic form YYYYMMDD; a shortened YYMMDD form is sometimes used, but it is not part of the current ISO 8601 standard
Keep file names short, since long names are not compatible with all software; aim for a maximum of 32 characters
Avoid special characters in file names, such as ! @ $ % * ( ) ' ; < > , [ ] { } "
When numbering files sequentially, use leading zeros so that files sort properly; e.g. 0001, 0002 … 1001 rather than 1, 2 … 1001
Avoid using spaces in file names; instead, use underscores (e.g. file_name), no separation (e.g. filename), dashes (e.g. file-name), or camel case (e.g. FileName)
Create a README.txt file that explains your naming conventions, abbreviations, or codes
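
The naming tips above can be applied consistently with a small helper. The following Python sketch is one possible approach under stated assumptions: the function name make_file_name, the order of the name parts (project, description, date, sequence number), and the choice to count the file extension toward the 32-character limit are all illustrative and not prescribed by this guide.

```python
import re
from datetime import date


def make_file_name(project, description, when, sequence, extension,
                   max_length=32):
    """Build a file name such as 'rdm_interview_20240305_0007.csv'.

    The part order and the decision to include the extension in the
    length check are assumptions made for this sketch.
    """
    def clean(part):
        # Replace spaces with underscores, then drop every character
        # that is not a letter, digit, underscore, or dash.
        part = part.strip().replace(" ", "_")
        return re.sub(r"[^A-Za-z0-9_-]", "", part)

    stamp = when.strftime("%Y%m%d")   # date as YYYYMMDD
    number = f"{sequence:04d}"        # leading zeros, e.g. 0007
    stem = "_".join([clean(project), clean(description), stamp, number])
    name = f"{stem}.{extension.lstrip('.')}"
    if len(name) > max_length:
        raise ValueError(f"{name!r} is longer than {max_length} characters")
    return name


if __name__ == "__main__":
    print(make_file_name("rdm", "interview", date(2024, 3, 5), 7, "csv"))
```

Raising an error for over-long names, rather than silently truncating them, keeps names predictable; the abbreviations chosen to shorten a name can then be recorded in README.txt.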

File versioning:

Make sure that files stored in various formats or locations, information that is cross-referenced between files, and various copies or versions of files are all subject to version control.

The UK Data Service provides a good description of versioning.

File formats for preservation:

For open data, choose file formats that are unlikely to become obsolete and that are interoperable across systems. Some standards require specific file formats, but the following formats are in general use.

Textual data: XML, TXT, HTML, PDF/A (Archival PDF)

Audio: MP3, MP4, WAV, FLAC

Images: PNG, TIFF, JPEG (if you use the JPEG file format, note that it tends to lose some quality each time it is re-saved)

Databases: CSV, XML

Further information on RDM Best Practices

RDM Best Practices Evaluation Checklist

File Formats Recommended by the UK Data Service

Metadata Standards

A metadata standard or schema is a set of elements, standardized for a particular field of research, that is used to describe data in a consistent manner. Such standards ensure consistency across records, enable data sharing and reuse, support interoperability between different systems, and enhance discoverability through search engines and repositories.

Some disciplines have subject-specific metadata standards. The most widely used general metadata standards are:

The 'Dublin Core', also known as the Dublin Core Metadata Element Set, is a set of fifteen "core" elements (properties) for describing resources.
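
As an illustration of the fifteen elements, the Python sketch below writes a small Dublin Core record as XML using the standard library. The element names and the namespace URI are those of the Dublin Core Metadata Element Set; the function name dublin_core_xml, the wrapper element "metadata", and the sample values are assumptions made for this example, and individual repositories may expect a different wrapper or serialization.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen core elements.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_xml(record):
    """Serialize a dict of Dublin Core elements to an XML string.

    Values may be a single string or a list of strings, because every
    Dublin Core element is optional and repeatable.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, values in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(dublin_core_xml({
        "title": "Example dataset",
        "creator": ["A. Researcher", "B. Researcher"],
        "date": "2024-03-05",
        "format": "text/csv",
    }))
```

Rejecting names outside the fifteen elements catches common slips such as using "author" where Dublin Core expects "creator".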

MODS (Metadata Object Description Schema), a schema for a bibliographic element set that may be used for a variety of purposes, particularly for library applications.

Many data repositories currently use the Digital Curation Centre's Disciplinary Metadata Standards. The Digital Curation Centre provides more information on these metadata standards.

Data Sharing

Data should be as open as possible and as closed as necessary. If your research is funded, please familiarise yourself with the funder's data sharing policy, since not all research data can be made openly available, for various reasons.

Access might be restricted for legal or ethical reasons. Data may derive from human participants, may be sensitive, or may fall under copyright restrictions, any of which can make restricting access unavoidable. For this reason, access may need to be managed to maintain confidentiality and to mitigate other security risks. Where data cannot be shared, consider sharing the metadata only, provided this does not conflict with your funder's policy (if funded).

Why share your data?

Choose carefully where you will share your data so that it is both safe and accessible. Best practice is to share your data in a trusted open access repository such as the Wits Research Data Repository. A repository of this kind promotes long-term preservation, keeping the data intact and accessible now and in the years to come. Adhering to the FAIR Principles benefits data sharing, even for datasets that must be restricted: in that case, only the metadata can be shared, with the access conditions clearly stated. Sharing data enables the following:

Sharing data promotes academic integrity, which supports research reproducibility
It supports your findings and enables others to build on your research
It reduces duplication
Shared data can contribute to solving problems
Datasets published in the repository are assigned Digital Object Identifiers (DOIs), which enable citation and metrics. Metrics make it possible to track who is using the data and on which platform, which improves the visibility and impact of the researcher