Data Management for Wits: Linguistics: African Languages, Literature

The following is general advice; data varies hugely between types of research and projects.

Data sources

This data is local. Use the search engines to find international data.

Human Sciences Research Council Poll: Communications between African Speaking Whites & People of Color

November 1986

Sample: White & Colored Adult
Sample Size: 1677
Survey Notes: There are two data files for this study: 1) White N=831 and 2) Colored N=846.
Variables: 299

Abstract:

Occupation (2); length of job (2); income (2); religion (4); home language (2); feelings about your population group (18); relationship with other population groups (8); coloureds of Cape Peninsula (4); whites of Cape Peninsula (4); employment status (4); contact with population groups (78); description of superior/supervisor (2); characteristics of superior/supervisor (60); status of people in each population group (8); feelings about social status (14); reasons for social status (4); financial status of different population groups (2); reasons for financial status (4); largest population group in work force (4); work relations (16); relations at work with members of a different population group (16); competition/co-operation in work place (12); behavior/habits that bother respondent (4).

Ukwabelana - An open-source morphological Zulu corpus

Labelled Zulu words (3000 types)

    * as a list of morphological analyses. text, 336 Kbytes.

    * as a word list with labelled analyses per word. text, 440 Kbytes.

    * as a word list with segmentations per word. text, 240 Kbytes.

Unlabelled Zulu words (100k types)

    * text, 1.3 Mbytes.

POS-tagged sentences (3000 sentences)
    * text, 233 Kbytes.

Untagged sentences (30k sentences)

    * text, 2.4 Mbytes.
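
The listing above counts words in "types" (distinct word forms) rather than running words. As a minimal sketch of that distinction, assuming whitespace-separated tokens and no punctuation handling, the following Python snippet counts types and tokens. The function name count_types and the sample lines are illustrative assumptions; the lines are not taken from the corpus, and the actual file layout of the Ukwabelana downloads is not described here.

```python
from collections import Counter
from typing import Iterable


def count_types(lines: Iterable[str]) -> Counter:
    """Count how often each distinct word form (type) occurs."""
    counts: Counter = Counter()
    for line in lines:
        # Assumption: tokens are separated by whitespace only.
        counts.update(line.lower().split())
    return counts


# Illustrative lines only, not taken from the corpus.
sample = ["abantu abantu umuntu", "ngiyabonga kakhulu"]
counts = count_types(sample)
n_types = len(counts)            # number of distinct word forms
n_tokens = sum(counts.values())  # number of running words
```

The same function could be applied to the lines of a downloaded text file once its format has been checked against the corpus documentation.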

The LDC Corpus Catalog

A list of corpora of language data, including some very interesting African language data.

Grassfields Bantu Fieldwork: Dschang Lexicon was produced by the Linguistic Data Consortium (LDC), catalog number LDC2003L01, ISBN 1-58563-255-4.

The data contains a lexicon of Yémba (Bamileke Dschang), a Bamileke (Grassfields Bantu) language spoken by more than 300,000 people in Southwestern Cameroon.

More Information On This Subject