Within corpus linguistics, a 'collocation' is defined as a sequence of words or terms that co-occur more often than would be expected by chance. Collocation refers to the restrictions on how words can be used together, for example which prepositions are used with particular verbs, or which verbs and nouns are used together. Collocations should not be confused with idioms.
Common features
'Non-substitutability': We cannot substitute a word in a collocation with a related word. For example, we cannot say ''yellow wine'' instead of ''white wine'' although both ''yellow'' and ''white'' are the names of colors.
'Non-modifiability': We cannot modify a collocation or apply syntactic transformations to it. For example, ''John kicked the green bucket'' and ''the bucket was kicked'' lose the sense of dying that ''kick the bucket'' carries.
Expanded definition
If an expression is heard often, transmitting itself memetically, the words become 'glued' together in our minds. 'Crystal clear', 'middle management', 'nuclear family', and 'cosmetic surgery' are examples of collocated pairs of words. Some words are often found together because they make up a compound noun, for example 'riding boots' or 'motor cyclist'.
Collocations can be in a syntactic relation (such as verb-object: 'make' and 'decision'), in a lexical relation (such as antonymy), or in no linguistically defined relation at all. Knowledge of collocations is vital for the competent use of a language: a grammatically correct sentence will stand out as 'awkward' if collocational preferences are violated. This makes collocation an interesting area for language teaching.
Corpus linguists specify a Key Word in Context (KWIC) and identify the words immediately surrounding it. This gives an idea of the way the word is used.
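The KWIC idea can be illustrated with a minimal sketch. The function names `kwic` and `collocate_counts`, the `window` parameter, and the simple lowercase tokenizer are assumptions made for this example, not part of any standard tool; the sketch also ignores sentence boundaries, which a real concordancer would usually respect.

```python
import re
from collections import Counter


def kwic(text, keyword, window=3):
    """Return (left, keyword, right) triples for each occurrence of keyword.

    Tokenization is deliberately naive: lowercase runs of letters and
    apostrophes. Sentence boundaries are not taken into account.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, tok, right))
    return hits


def collocate_counts(text, keyword, window=3):
    """Count how often each word appears within `window` tokens of keyword."""
    counts = Counter()
    for left, _, right in kwic(text, keyword, window):
        counts.update(left + right)
    return counts


if __name__ == "__main__":
    sample = ("The central bank raised rates. She sat on the river bank. "
              "The bank manager called.")
    for left, key, right in kwic(sample, "bank", window=2):
        # Print each hit as one concordance line with the keyword in brackets.
        print(" ".join(left), f"[{key}]", " ".join(right))
```

Raw counts from such a window only show which words appear nearby; deciding whether a neighbour is a genuine collocate needs a measure of association.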
The processing of collocations involves a number of parameters, the most important of which is the ''measure of association'', which evaluates whether a co-occurrence is purely due to chance or is statistically significant. Because language is non-random, most collocations are classed as significant, and the association scores are simply used to rank the results. Commonly used measures of association include mutual information, t-scores, and log-likelihood.
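As a rough illustration, the three measures can be computed from four counts: the number of times the pair occurs together (`c12`), the frequency of each word on its own (`c1`, `c2`), and the corpus size (`n`). The sketch below uses commonly cited formulations (pointwise mutual information, the t-score approximation (O - E) / sqrt(O), and the log-likelihood ratio over a 2x2 contingency table); the function and parameter names are assumptions for this example, and the counts in the usage section are invented.

```python
import math


def expected(c1, c2, n):
    """Expected co-occurrence count if the two words were independent."""
    return c1 * c2 / n


def pmi(c12, c1, c2, n):
    """Pointwise mutual information in bits: log2(observed / expected)."""
    return math.log2(c12 / expected(c1, c2, n))


def t_score(c12, c1, c2, n):
    """t-score approximation: (observed - expected) / sqrt(observed)."""
    return (c12 - expected(c1, c2, n)) / math.sqrt(c12)


def log_likelihood(c12, c1, c2, n):
    """Log-likelihood ratio (G2) over the 2x2 contingency table of the pair."""
    observed = [
        [c12, c1 - c12],               # word 1 present: with / without word 2
        [c2 - c12, n - c1 - c2 + c12], # word 1 absent:  with / without word 2
    ]
    rows = [c1, n - c1]
    cols = [c2, n - c2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                g2 += o * math.log(o / (rows[i] * cols[j] / n))
    return 2 * g2


if __name__ == "__main__":
    # Hypothetical counts: a pair seen 10 times, each word 100 times,
    # in a corpus of 10,000 tokens (expected count under independence: 1).
    print(pmi(10, 100, 100, 10_000))
    print(t_score(10, 100, 100, 10_000))
    print(log_likelihood(10, 100, 100, 10_000))
```

The measures do not always agree on ranking: mutual information tends to favour rare pairs, while the t-score and log-likelihood give more weight to the amount of evidence, which is one reason the choice of measure is treated as a parameter.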
Examples
In English the verb ''perform'' is used with ''operation'', but not with ''discussion'': ''The doctor performed the operation''.
Collocates of 'bank' are: central, river, account, manager, merchant, money, deposits, lending, society. These examples reflect a number of common expressions, 'central bank', 'bank or building society', and so forth. It is easy to see how the meaning of 'bank' is partly expressed through the choice of collocates.
''High'' collocates with ''probability'', but not with ''chance'': ''a high probability'' but ''a good chance''.
References
Jack C. Richards, John Platt, Heidi Platt. 1992. Longman Dictionary of Language Teaching and Applied Linguistics (2nd ed.). Longman Group UK Limited.
Christopher D. Manning. 2003. Foundations of Statistical Natural Language Processing. Massachusetts Institute of Technology.
See also
★ Cliché
★ Collostructional analysis
★ Compound noun, adjective and verb
★ Idiom
★ Phrasal verb
★ Siamese twins (English language)
★ Stock phrase
★ Collocational restriction