Data Management for Wits: Downloading, Accessing and Using Data Sets

The following is general advice; data varies hugely between types of research and projects.


Congratulations to the award-winning Dr Lucia Lotter and her team. They have steadily been producing the most diverse and, I would venture to say, cleanest data in South Africa. The metadata is local and very well done.

Please be aware of the existence of the Human Sciences Research Council’s research data repository (

The HSRC Research Data Service ( provides a digital repository facility for the HSRC’s research data in support of evidence-based human and social development in South Africa and the broader region. Our mission is to make research data accessible and to ensure its future survival and usability. We strive to develop a quantitative and qualitative social science research data collection of high re-use value that speaks to the priorities of a developing country, such as the South African Social Attitudes Survey (SASAS) and the South African National HIV Prevalence, HIV Incidence, Behaviour and Communication Survey (SABSSM) amongst others.


Access to our data is dependent on ethical requirements for protecting research participants, as well as on legal agreements with the owners, funders, or in the case of data owned by the HSRC, the requirements of the depositors of the data. All data sets are provided free of charge.


We provide metadata records that describe data sets, as well as related documentation including Read me files, user guides, code books, questionnaires, and other related information so that informed decisions can be made about whether the data will be useful for an intended purpose.


Data can be located by browsing or through the use of an advanced search or a keyword search facility. The attached document provides more information on how data sets can be accessed.


Why should you reuse data?

Reusing existing data is cheaper and faster, and the data is often better than what you can collect on your own.

Does this matter? Shouldn't you collect your own? Ask your supervisors and they will tell you the rules that apply.

However, there are only three years for a Ph.D. and two for a master's, so time matters a lot. Data collection is very unpredictable: it can take much longer than planned or fail to produce the results you need. This is a major problem if the original contribution you were looking to make lies not in the data but in the method of analysis you were going to use.

As a researcher or student, you may not have the funding to collect the kind of data you need, which could lead you to compromise your research plan. Finally, using data sets does not mean you can’t or should not collect your own. If you have a data set, you can be more focused and go out and collect just the information you need.

For all researchers, secondary data can be the much better research choice. You may not be able to collect the same amount of data, or data from specific people or areas, no matter how much time you spend. The data could be unavailable because those specific subjects can’t be found or interviewed by you (e.g. prisoners). You might also lack access to equipment or material, no matter how much money you spend, because they are restricted or involve proprietary data and software.

Finally, if your research focus is not on collecting data but on analysis techniques, then collecting your own data is a waste of time. Your findings will come from what you do with the data. If you get data, and better quality data, you can do a better job on the main topic of your research. For experienced researchers, we can organize access to larger and more complex data sets to help with a larger research agenda.

Accessing restricted data from repositories and data search engines

Data search engines can find a lot of data, but it is not always specific or of high quality. Sometimes the data you need is not in a standard repository with straightforward downloads. Data services act as a verifying agent for restricted repositories. We write data requests and ensure that the correct request processes are followed. In some cases, we help you gain access to data that is restricted to institutional or academic users by providing credentials or seeing if the data can be brokered for you.

Downloading with thanks to

"Downloading Data From The Web

Different websites provide data in several different formats: ASCII, SAS, SPSS, Stata and others. Not all data are available in all formats, though, so you need to choose which best suits your needs. Sometimes, you will find that the data you want is not available in the format you want. Don't worry about this, your data can always be converted. Here are some tips:

  1. Data and related files are often bundled together and compressed into what is often called a "zip" file. Common file extensions for these are ".zip" on Windows and ".gz" or ".tar" on Unix. You will need to "unzip" the files before you can do anything else with them. WinZip is a good Windows program, and the "gunzip" command can be used on Unix.
  2. If there is an ASCII (i.e., plain text) data set with a program file for the statistical package you intend to use, then select that option. Sometimes there is an option for data files already in the format you want ("system," "portable," "transport"), but these may have some "glitches" due to differences in the type of machine they were created on and the type you are using. It's rare, but it does happen.
  3. If there is not an ASCII data set and setup file in the package you want to use, but there is one for another package, then use the other package to create a system file and then convert it. For example, if you like to use SPSS, but there is only an option for SAS, use SAS to read and create a SAS data file, then convert the SAS data file to SPSS using Stat Transfer.
  4. If you are downloading data from a geospatial data site, the file may be in "Database" format and have an extension of ".dbf". This is the format used by ArcView (a "shape" file is actually a set of files, one or more of which is a .dbf file). These files can be read directly into SAS, Stata, SPSS, and Excel. We have more information on using the statistical packages with ArcView.
  5. The setup files are written to read the entire data set, which you may not need. Rather than editing the program to read only the variables/observations you want, let the program read the entire data set, then just add drop and/or keep statements in the appropriate place to retain what you want. Make absolutely sure that you select all identification and weighting variables. If you are not sure if you want a particular variable, keep it. It's easier to ignore or drop a variable later than it is to go back and add it to your data set.
  6. Sometimes the programs have large sections "commented out" so those statements are not executed. If you do want these statements to be executed, then be sure to un-comment them. Typically, these are statements to convert missing value codes (such as "999") to system-missing codes.
  7. If possible, run some descriptive statistics on your data and compare them to the codebook or some other source to make sure you have read the data correctly." From Paul Bern
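Tips 5 to 7 can also be tried outside SAS, SPSS or Stata. What follows is only a minimal sketch in Python using the standard library, not the method the quoted guide describes. The inline sample data, the column names (`id`, `weight`, `age`, `income`, `region`) and the assumption that the codebook marks "999" as missing for `age` and `income` are all hypothetical.

```python
import csv
import io
import statistics

# Hypothetical extract standing in for a downloaded plain-text file;
# a real file would be opened with open(path, newline="").
RAW = """id,weight,age,income,region
1,1.2,34,52000,GP
2,0.8,999,31000,WC
3,1.0,51,999,KZN
4,1.1,27,45000,GP
"""

# Tip 5: read everything, then keep what you need.
# Identification and weighting variables are always retained.
KEEP = ["id", "weight", "age", "income"]

# Tip 6: missing-value codes per variable (assumed here; check the codebook).
MISSING_CODES = {"age": {"999"}, "income": {"999"}}


def load_rows(handle, keep, missing_codes):
    """Read all columns, keep the wanted ones, and turn missing codes into None."""
    rows = []
    for record in csv.DictReader(handle):
        row = {}
        for name in keep:
            value = record[name]
            if name == "id":
                row[name] = value  # identifiers stay as text
            elif value in missing_codes.get(name, set()):
                row[name] = None  # equivalent of a system-missing value
            else:
                row[name] = float(value)
        rows.append(row)
    return rows


def describe(rows, name):
    """Tip 7: count, mean, minimum and maximum of the non-missing values."""
    values = [row[name] for row in rows if row[name] is not None]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


rows = load_rows(io.StringIO(RAW), KEEP, MISSING_CODES)
age_summary = describe(rows, "age")
income_summary = describe(rows, "income")
print(age_summary)
print(income_summary)
```

The printed counts and ranges are what you would compare against the codebook. If the maximum of a variable equals its missing-value code, the recoding step has probably been skipped.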

For more, follow the link: Syracuse University Libraries
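Tip 1 in the list above, unpacking a downloaded bundle, can be done without WinZip or gunzip. The sketch below uses Python's standard `zipfile` module. The archive name `survey_bundle.zip` and its contents are hypothetical and are created by the script itself so that it runs without a real download.

```python
import tempfile
import zipfile
from pathlib import Path


def unpack_zip(archive_path, target_dir):
    """Extract every member of a .zip archive and return the member names."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())


# Build a tiny stand-in archive (hypothetical data file plus setup file).
workdir = Path(tempfile.mkdtemp())
archive_path = workdir / "survey_bundle.zip"
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("data.txt", "1 34 52000\n2 27 45000\n")
    archive.writestr("setup.sps", "* hypothetical SPSS setup file.\n")

extracted = unpack_zip(archive_path, workdir / "unpacked")
print(extracted)
```

For ".gz" and ".tar" files, the standard `gzip` and `tarfile` modules play the same role.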