
Data Management for Wits: Data Citation

The following is general advice; data varies hugely between types of research and projects.

Principles of citation for supervisors, examiners and PIs

For students, it is important to remember that a data citation needs to be very clear about the multiple authors. Datasets generally come from projects, so they have individual authors, corporate authors, editors, data managers and curators. Put them all in. Datasets also have versions and subsets; point clearly to which part of the data you used.

Data requires citations for the same reasons journal articles and other types of publications require citations: to acknowledge the original author/producer and to help other researchers find the resource. Citations also extend to the software that you use in data management of all kinds.

For supervisors assessing data citation, look to the following principles:

"Data citation: Refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to outputs such as journal articles, reports and conference papers. Citing data is now recognized as one of the key practices leading to recognition of data as a primary research output.

  1. Data citation: Presents several differences from normal citation, but the essence is the same. The aim of a citation is to ensure that the correct people get the credit and that people are accurately pointed to the data that supports the argument.
  2. Data citation: The practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to printed resources.
  3. Data citation: Is a key practice underpinning the recognition of data as a primary research output rather than as a by-product of research.
  4. Data citation is akin to work performed by an art or museum curator. Through the citation process, data are organized, described, cleaned, enhanced, and preserved for public use, much like the work done on paintings or rare books to make the works accessible to the public now and in the future. With the modern Web, it's increasingly easy to post and share data. Without citation, however, data can be difficult to find, use, and interpret. Through citation, ICPSR provides meaningful and enduring access to data." (ICPSR manual)

How to Cite Data

Citing a Data Set

It is important to cite data sets because data is the intellectual and moral property of its creators. If you use data and do not cite it, that is plagiarism. In some cases, depending on what you have represented about the data, it is academic fraud, so CITE. The basic rule is to cite the data according to the provider's standard. They are providing the data; they have the right to specify how it should be cited. Ask the data provider: the preferred citation will often be on their website or in their documentation. If not, ask them directly. You can also consult a style guide such as APA, but if you need to cite a dataset for a publication, ask the editor. Finally, if you are still unsure, send an email to your helpful data librarian: nina.lewin@wits.ac.za.

However, if you can't get the citation format from either the editor or the provider, remember that the rules for citing data are a bit looser than for other sorts of citations. So don't panic; there are no definitive standards yet. Remember that in a citation more information is better than less. Include as much information as you have about where the data was collected, by whom, with what funding, who organized and analyzed the dataset, and finally where and when you got it. If you used part of a larger dataset, it is important to say so, and especially to mention if you used a transformed part, such as variables created by another researcher.

How to Cite Data

  • Similar to citing a published article or book: provide the information necessary to identify and locate the work cited
  • Broadly applicable data citation standards have not yet been established; use the standards adopted by the relevant academic journal, data repository, or professional organization

Proper citation ensures that research data can be discovered, reused, replicated for verification, credited for recognition, and tracked to measure usage and impact.

Citing data is straightforward. Each citation must include the basic elements that allow a unique dataset to be identified over time:

  • Title
  • Author
  • Date
  • Version
  • Persistent identifier (such as a Digital Object Identifier (DOI), Uniform Resource Name (URN), or Handle)
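
As a rough illustration of the checklist above, the following Python sketch reports which of the five basic elements are missing from a citation record. The field names, the `missing_elements` helper and the sample record are all made up for this example; they are not part of any citation standard.

```python
# Field names are an assumption chosen to mirror the five basic elements above.
REQUIRED_ELEMENTS = ("title", "author", "date", "version", "identifier")


def missing_elements(record):
    """Return the basic citation elements that are absent or blank in `record`."""
    missing = []
    for key in REQUIRED_ELEMENTS:
        value = record.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


# Hypothetical dataset record with no version recorded.
record = {
    "title": "Example rainfall survey",
    "author": "Doe, J.",
    "date": "2020",
    "version": "",
    "identifier": "doi:10.5072/example",
}

print(missing_elements(record))  # prints ['version']
```

A check like this is only a prompt to go back to the data provider's documentation for the missing detail; it does not decide how the citation should be laid out.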

Here are some examples

Examples of Citation of Datasets

Data requires citations for the same reasons journal articles and other types of publications require citations: to acknowledge the original author/producer and to help other researchers find the resource. Citations also extend to the software that you use in data analysis.

Simple examples

  • DataCite: Creator (Publication Year): Title. Publisher. Identifier

  • Dryad: Author (Date of Article Publication) Data from Article name. Dryad Digital Repository. doi: DOI number

For example, in APA 6th style:

Data sets: Simmons Market Research Bureau. (2000). Simmons national consumer survey [Data file]. New York, NY: Author.
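
The DataCite pattern listed above (Creator (Publication Year): Title. Publisher. Identifier) can be sketched in a few lines of Python. This is only an illustration: the function names, the sample dataset and the placeholder DOI are invented here. The one outside fact relied on is that a DOI can be resolved by prefixing it with https://doi.org/.

```python
def doi_url(doi):
    """Turn a DOI written as 'doi:10.x/y', a bare '10.x/y', or a doi.org link into an https link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def datacite_citation(creator, year, title, publisher, identifier):
    """Lay out the elements as: Creator (Publication Year): Title. Publisher. Identifier"""
    return f"{creator} ({year}): {title}. {publisher}. {identifier}"


# Hypothetical dataset; the DOI is a placeholder, not a real record.
citation = datacite_citation(
    creator="Doe, J.",
    year=2020,
    title="Example rainfall survey",
    publisher="Example Data Repository",
    identifier=doi_url("doi:10.5072/example"),
)
print(citation)
# prints: Doe, J. (2020): Example rainfall survey. Example Data Repository. https://doi.org/10.5072/example
```

If the data provider or journal specifies a different layout, that layout takes precedence over this pattern.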

Complex examples

  • Author/Principal Investigator/Data Creator
  • Release Date/Year of Publication – the year of release, for a completed dataset
  • Title of Data Source – formal title of the dataset
  • Version/Edition Number – the version of the dataset used in the study
  • The format of the Data – the physical format of the data
  • 3rd Party Data Producer – refers to data accessed from a 3rd party repository
  • Archive and/or Distributor – the location that holds the dataset
  • Locator or Identifier – includes Digital Object Identifiers (DOI), Handles, Archival Resource Key (ARK), etc.
  • Access Date and Time – when data is accessed online
  • A subset of Data Used – description based on the organization of the larger dataset
  • Editor or Contributor – a reference to a person who compiled data, or performed value-added functions
  • Publication Place – city, state, and country of the distributor of the data
  • Data within a Larger Work – refers to the use of data in a compilation or a data supplement (such as published in a peer-reviewed paper)
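
One way to keep track of the fuller set of elements listed above is a small record type that leaves out whatever is unknown when the citation is written out. The Python sketch below is an assumption-laden illustration: the `DatasetCitation` class, its field names, the order of the parts and the sample values are all invented here and do not follow any particular style guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for some of the citation elements listed above."""
    author: str
    year: str
    title: str
    version: Optional[str] = None
    data_format: Optional[str] = None
    distributor: Optional[str] = None
    identifier: Optional[str] = None
    subset: Optional[str] = None
    accessed: Optional[str] = None

    def render(self) -> str:
        """Join the available elements into one line, skipping any that are missing."""
        title = self.title
        if self.version:
            title += f" (Version {self.version})"
        if self.data_format:
            title += f" [{self.data_format}]"
        parts = [f"{self.author} ({self.year})", title]
        if self.distributor:
            parts.append(self.distributor)
        if self.identifier:
            parts.append(self.identifier)
        if self.subset:
            parts.append(f"Subset used: {self.subset}")
        if self.accessed:
            parts.append(f"Accessed {self.accessed}")
        return ". ".join(parts) + "."


# Hypothetical example values throughout.
full = DatasetCitation(
    author="Doe, J.",
    year="2020",
    title="Example rainfall survey",
    version="2.1",
    data_format="Data file",
    distributor="Example Data Repository",
    identifier="https://doi.org/10.5072/example",
    subset="stations 1 to 40",
    accessed="2021-03-04",
)
print(full.render())
```

Recording the subset and access date alongside the identifier makes it easier to state exactly which part of a larger dataset was used, and when it was retrieved.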

General Principles of Data Citation

  • Importance - Data should be considered legitimate, citable products of research.
  • Credit and Attribution - Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data.
  • Evidence - In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
  • Unique Identification - A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
  • Access - Data citations should facilitate access to the data, metadata, code, and other materials, as necessary for both humans and machines.
  • Persistence - Unique identifiers, data, and metadata should persist beyond the lifespan of the data they describe.
  • Specificity and Verifiability - Data citations should facilitate identification of, access to, and verification of the specific data that support a claim.
  • Interoperability and Flexibility - Data citation methods should be flexible, but enable interoperability across communities.

Benefits of Data Citation

  1. Short term

  • Facilitates discovery of relationships between data and publications, making it easier to validate and build upon previous work.
  • Ensures that proper credit can be given when others use your work.
  • Facilitates impact assessments of datasets based on the number of publications that cite them.
  • Helps researchers re-using data to find other ways the data has been used.

  2. Long term

  • Promotes the availability of data into the future.
  • Facilitates discovery of existing data relevant to a particular question.
  • Enables recognition of scholarly effort within disciplines and organizations.
  • Increases transparency of scientific research.
