| |
- cca(freq_table)
- Covarying Collexeme Analysis
Parameters
----------
freq_table : dict
A frequency table in the format of:
{
(Slot1, Slot2): freq,
(Slot1, Slot2): freq,
...
}
where Slot1 and Slot2 are lexical items occurring in the same construction.
Returns
-------
dict
A dictionary with pairs of lexical items as keys and, as values, a dictionary of
association measures (returned by measures()) indicating the
strength of attraction between the two lexical items.
Notes
-----
The contingency table used in Covarying Collexeme Analysis:
            L_slot1    ~L_slot1
 L_slot2    o11        o12
~L_slot2    o21        o22
References:
Desagulier, G. (2017). Corpus Linguistics and Statistics with R, pp. 213-221.
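To make the contingency table above concrete, here is a minimal sketch of how the four cells could be derived for one pair from a `freq_table` in the documented format. The helper name `cca_contingency` and the sample frequencies are hypothetical and only illustrate the layout; they are not taken from the library, whose internals may differ.

```python
def cca_contingency(freq_table, pair):
    """Return (o11, o12, o21, o22) for one (slot1, slot2) pair.

    Hypothetical helper: rows are L_slot2 / ~L_slot2 and columns are
    L_slot1 / ~L_slot1, following the table in the Notes section.
    """
    slot1, slot2 = pair
    total = sum(freq_table.values())
    slot1_total = sum(f for (s1, _), f in freq_table.items() if s1 == slot1)
    slot2_total = sum(f for (_, s2), f in freq_table.items() if s2 == slot2)

    o11 = freq_table.get(pair, 0)      # both items together
    o12 = slot2_total - o11            # slot2 item with other slot1 items
    o21 = slot1_total - o11            # slot1 item with other slot2 items
    o22 = total - o11 - o12 - o21      # neither item
    return o11, o12, o21, o22


# Made-up frequencies, for illustration only.
freq_table = {
    ("give", "book"): 10,
    ("give", "letter"): 5,
    ("send", "book"): 2,
    ("send", "letter"): 8,
}
table = cca_contingency(freq_table, ("give", "book"))
print(table)
```

The four cells always sum to the total number of tokens in `freq_table`, which is a quick sanity check on any such table.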
- dca(freq_table)
- Distinctive Collexeme Analysis
Parameters
----------
freq_table : dict
A frequency table in the format of:
{
C1: {L1: freq, L2: freq, ...},
C2: {L1: freq, L2: freq, ...}
}
where C1 & C2 are labels for construction types and
L1, L2, L3, ... are labels for word types.
Returns
-------
dict
A dictionary with lexical items as keys and, as values, a dictionary of
association measures (returned by measures()) indicating the
strength of attraction of the lexical item to each of the two constructions.
Notes
-----
The contingency table used in Distinctive Collexeme Analysis:
       Lj     ~Lj
C1     o11    o12
C2     o21    o22
References:
Desagulier, G. (2017). Corpus Linguistics and Statistics with R, pp. 213-221.
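As an illustration of the table above, the following sketch derives the four cells for a single word type Lj from a `freq_table` in the documented format. The helper name `dca_contingency`, the construction labels, and the frequencies are all hypothetical; the library may build its table differently.

```python
def dca_contingency(freq_table, word):
    """Return (o11, o12, o21, o22) for one word type.

    Hypothetical helper: assumes freq_table holds exactly two
    constructions, taken in insertion order as C1 and C2.
    """
    (_, c1_freqs), (_, c2_freqs) = freq_table.items()
    o11 = c1_freqs.get(word, 0)             # Lj in C1
    o12 = sum(c1_freqs.values()) - o11      # other words in C1
    o21 = c2_freqs.get(word, 0)             # Lj in C2
    o22 = sum(c2_freqs.values()) - o21      # other words in C2
    return o11, o12, o21, o22


# Made-up frequencies, for illustration only.
freq_table = {
    "ditransitive": {"give": 30, "send": 5, "tell": 15},
    "prepositional_dative": {"give": 10, "send": 20, "bring": 10},
}
dca_table = dca_contingency(freq_table, "give")
print(dca_table)
```

A word that is absent from one construction simply contributes a zero cell, as with "bring" in the first construction here.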
- measures(o11, o12, o21, o22)
- Compute a set of association measures from a 2x2 contingency table.
Parameters
----------
o11 : int
Cell(1, 1) in a 2x2 contingency table.
o12 : int
Cell(1, 2) in a 2x2 contingency table.
o21 : int
Cell(2, 1) in a 2x2 contingency table.
o22 : int
Cell(2, 2) in a 2x2 contingency table.
Returns
-------
dict
A dictionary of association strengths as measured by different statistics.
For Fisher's exact test, see scipy.stats.fisher_exact()
Notes
-----
G2 critical values (chi-squared, df = 1): 3.8415 (p < 0.05); 6.6349 (p < 0.01); 10.8276 (p < 0.001)
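For readers unfamiliar with the G2 statistic, the sketch below computes the log-likelihood ratio for a 2x2 table with the standard formula G2 = 2 * sum(O * ln(O / E)), where E is the expected frequency under independence. The function name `log_likelihood_g2` is hypothetical, and this is not necessarily how measures() computes the statistic internally.

```python
import math


def log_likelihood_g2(o11, o12, o21, o22):
    """Log-likelihood ratio (G2) for a 2x2 contingency table."""
    observed = [[o11, o12], [o21, o22]]
    n = o11 + o12 + o21 + o22
    row_totals = [o11 + o12, o21 + o22]
    col_totals = [o11 + o21, o12 + o22]

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                total += o * math.log(o / expected)
    return 2 * total


g2 = log_likelihood_g2(10, 2, 5, 8)
# Compare against the chi-squared critical value for p < 0.05 (df = 1).
is_significant = g2 > 3.8415
print(round(g2, 3), is_significant)
```

Note that G2 itself is non-negative and does not show the direction of the association; attraction versus repulsion has to be read off by comparing o11 with its expected value.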
- rank_collo(collo_measures, sort_by='G2', reverse=True, freq_cutoff=1)
- Helper function to sort the results of collostructional analyses as returned by dca() and cca().
Parameters
----------
collo_measures : dict
The analysis results returned by dca() or cca()
sort_by : str, optional
The association measure used to sort the result, by default 'G2'.
See measures() for the list of possible association measures to use for sorting.
reverse : bool, optional
Whether to sort from high values to low ones, by default True
freq_cutoff : int, optional
The minimal number of occurrences required to be included in the
results, by default 1
Returns
-------
list
A list of tuples of length three, with the second element being
the association statistic used for sorting and the third element being the frequency.
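The filtering and sorting described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the library's implementation: the function name `rank_sketch` is hypothetical, each per-item dictionary is assumed to store its frequency under a `"freq"` key, and the first tuple element is assumed to be the dictionary key.

```python
def rank_sketch(collo_measures, sort_by="G2", reverse=True, freq_cutoff=1):
    """Filter by frequency, then sort by one association measure.

    Assumption: every value in collo_measures has a "freq" entry
    alongside the association measures.
    """
    rows = [
        (key, stats[sort_by], stats["freq"])
        for key, stats in collo_measures.items()
        if stats["freq"] >= freq_cutoff
    ]
    return sorted(rows, key=lambda row: row[1], reverse=reverse)


# Made-up results, for illustration only.
results = {
    "give": {"G2": 12.3, "freq": 30},
    "send": {"G2": 20.1, "freq": 5},
    "bring": {"G2": 4.0, "freq": 1},
}
ranked = rank_sketch(results, freq_cutoff=2)
print(ranked)
```

Raising `freq_cutoff` is a common way to keep very rare items, whose association scores tend to be unstable, out of the ranked list.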
|