Data Management for Wits: Linguistics: African Languages, Literature

The following is general advice; data varies hugely between types of research and projects.

Data sources

This data is local. Use search engines to find international data.

Human Sciences Research Council Poll: Communications between African Speaking Whites & People of Color

November 1986
Sample: White & Colored Adult
Sample Size: 1677
Survey Notes: There are two data files for this study: 1) White, N=831 and 2) Colored, N=846.
Variables: 299
Abstract:

Occupation (2); length of job (2); income (2); religion (4); home language (2); feelings about your population group (18); relationship with other population groups (8); coloureds of Cape Peninsula (4); whites of Cape Peninsula (4); employment status (4); contact with population groups (78); description of superior/supervisor (2); characteristics of superior/supervisor (60); status of people in each population group (8); feelings about social status (14); reasons for social status (4); financial status of different population groups (2); reasons for financial status (4); largest population group in work force (4); work relations (16); relations at work with members of a different population group (16); competition/co-operation in work place (12); behavior/habits that bother respondent (4).

Ukwabelana - An open-source morphological Zulu corpus

Labelled Zulu words (3000 types)

    * as list of morphological analyses. text, 336 Kbytes.

    * as word list with labelled analyses per word. text, 440 Kbytes.

    * as word list with segmentations per word. text, 240 Kbytes.

Unlabelled Zulu words (100k types)


    * text, 1.3 Mbytes.

POS-tagged sentences (3000 sentences)
    * text, 233 Kbytes.

Untagged sentences (30k sentences)

    * text, 2.4 Mbytes.

The LDC Corpus Catalog

A catalog of language data corpora, including some very interesting African language material.

Grassfields Bantu Fieldwork: Dschang Lexicon was produced by the Linguistic Data Consortium (LDC), catalog number LDC2003L01, ISBN 1-58563-255-4.

The data contains a lexicon of the language Yémba (Bamileke Dschang), a Bamileke (Grassfields Bantu) language spoken by 300,000+ people in Southwestern Cameroon.



More Information On This Subject