Python: module collo_measures

collo_measures

/home/liao/Desktop/hocor2020-GramColl/collo_measures.py

Modules

math

Functions


cca(freq_table)
Covarying Collexeme Analysis

Parameters
----------
freq_table : dict
    A frequency table in the format of:
        {
            (Slot1, Slot2): freq,
            (Slot1, Slot2): freq,
            ...
        }
    where Slot1 & Slot2 are lexical items in the same construction.

Returns
-------
dict
    A dictionary with a pair of lexical items as keys and a dictionary of
    association measures (returned by measures()) as values, indicating
    the strength of attraction between the two lexical items.

Notes
-----
The contingency table used in Covarying Collexeme Analysis:

                L_slot1     ~L_slot1
    L_slot2       o11          o12
    ~L_slot2      o21          o22

References:
    Desagulier, G. (2017). Corpus Linguistics and Statistics with R. pp. 213-221.
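
As an illustration of the contingency table above, the four cells for one (Slot1, Slot2) pair can be derived from the frequency table roughly as follows. This is a minimal sketch, not the module's actual code: the helper name `cca_contingency` and the toy frequencies are hypothetical.

```python
from collections import Counter


def cca_contingency(freq_table, pair):
    """Return (o11, o12, o21, o22) for one (slot1, slot2) pair.

    Hypothetical helper; the real cca() may build its table differently.
    """
    slot1, slot2 = pair
    total = sum(freq_table.values())

    # Marginal frequency of every item in each slot.
    slot1_totals = Counter()
    slot2_totals = Counter()
    for (w1, w2), freq in freq_table.items():
        slot1_totals[w1] += freq
        slot2_totals[w2] += freq

    o11 = freq_table.get(pair, 0)        # slot1 item with slot2 item
    o12 = slot2_totals[slot2] - o11      # slot2 item with other slot1 items
    o21 = slot1_totals[slot1] - o11      # slot1 item with other slot2 items
    o22 = total - o11 - o12 - o21        # neither item
    return o11, o12, o21, o22


# Toy data, invented for illustration only.
freq_table = {
    ("give", "book"): 10,
    ("give", "pen"): 5,
    ("send", "book"): 3,
    ("send", "letter"): 12,
}
print(cca_contingency(freq_table, ("give", "book")))  # (10, 3, 5, 12)
```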

dca(freq_table)
Distinctive Collexeme Analysis

Parameters
----------
freq_table : dict
    A frequency table in the format of:
        {
            C1: {L1: freq, L2: freq, ...},
            C2: {L1: freq, L2: freq, ...}
        }
    where C1 & C2 are labels for construction types and
    L1, L2, L3, ... are labels for word types.

Returns
-------
dict
    A dictionary with lexical items as keys and a dictionary of
    association measures (returned by measures()) as values, indicating
    the strength of attraction of the lexical item to the two constructions.

Notes
-----
The contingency table used in Distinctive Collexeme Analysis:

         Lj     ~Lj
    C1   o11    o12
    C2   o21    o22

References:
    Desagulier, G. (2017). Corpus Linguistics and Statistics with R. pp. 213-221.
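
The table above can be filled in for a single word type Lj along these lines. This is a sketch under assumptions: `dca_contingency`, the construction labels, and the counts are all hypothetical, and the module's own dca() may be organised differently.

```python
def dca_contingency(freq_table, word):
    """Return (o11, o12, o21, o22) for one word across two constructions.

    Hypothetical helper; assumes freq_table has exactly two constructions.
    """
    c1_freqs, c2_freqs = freq_table.values()

    o11 = c1_freqs.get(word, 0)             # word in construction C1
    o12 = sum(c1_freqs.values()) - o11      # other words in C1
    o21 = c2_freqs.get(word, 0)             # word in construction C2
    o22 = sum(c2_freqs.values()) - o21      # other words in C2
    return o11, o12, o21, o22


# Toy data, invented for illustration only.
freq_table = {
    "ditransitive": {"give": 20, "send": 5, "tell": 10},
    "prepositional": {"give": 8, "send": 15, "bring": 7},
}
print(dca_contingency(freq_table, "give"))   # (20, 15, 8, 22)
print(dca_contingency(freq_table, "bring"))  # (0, 35, 7, 23)
```

A word that is absent from one construction simply gets a zero in the corresponding cell, which is why `dict.get` is used with a default of 0.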

measures(o11, o12, o21, o22)
Compute a set of association measures from a 2x2 contingency table.

Parameters
----------
o11 : int
    Cell (1, 1) in a 2x2 contingency table.
o12 : int
    Cell (1, 2) in a 2x2 contingency table.
o21 : int
    Cell (2, 1) in a 2x2 contingency table.
o22 : int
    Cell (2, 2) in a 2x2 contingency table.

Returns
-------
dict
    A dictionary of association strengths as measured by different statistics.
    For Fisher's exact test, see scipy.stats.fisher_exact().

Notes
-----
G2 significance levels (df = 1): 3.8415 (p < 0.05); 6.6349 (p < 0.01); 10.8276 (p < 0.001)
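
For orientation, the G2 (log-likelihood ratio) statistic for a 2x2 table is conventionally computed from observed and expected cell counts as shown below. This is a minimal sketch using only the `math` module; the function name `g2_sketch` is hypothetical, and measures() itself may compute G2 (and its other statistics) differently.

```python
import math


def g2_sketch(o11, o12, o21, o22):
    """Log-likelihood ratio G2 = 2 * sum(O * ln(O / E)) over the four cells.

    Hypothetical helper; assumes the table total is greater than zero.
    """
    observed = [[o11, o12], [o21, o22]]
    row_totals = [o11 + o12, o21 + o22]
    col_totals = [o11 + o21, o12 + o22]
    n = sum(row_totals)

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            # Expected count under independence of rows and columns.
            e = row_totals[i] * col_totals[j] / n
            # A zero cell contributes nothing (limit of o * ln(o) as o -> 0).
            if o > 0:
                total += o * math.log(o / e)
    return 2 * total


g2 = g2_sketch(10, 3, 5, 12)
print(g2 > 3.8415)  # True: exceeds the p < 0.05 threshold for df = 1
```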

rank_collo(collo_measures, sort_by='G2', reverse=True, freq_cutoff=1)
Helper function to sort the results of collostructional analyses
as returned by dca() and cca().

Parameters
----------
collo_measures : dict
    The analysis results returned by dca() or cca().
sort_by : str, optional
    The association measure used to sort the results, by default 'G2'.
    See measures() for the list of possible association measures to use for sorting.
reverse : bool, optional
    Whether to sort from high values to low ones, by default True.
freq_cutoff : int, optional
    The minimal number of occurrences required to be included in the
    results, by default 1.

Returns
-------
list
    A list of tuples of length three, with the second element being
    the association statistic used for sorting and the third element the frequency.
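
The filtering and sorting described above might look roughly like this. The sketch rests on two assumptions that the documentation does not confirm: that each per-item dictionary carries its frequency under a `"freq"` key, and that the first element of each returned tuple is the dictionary key. The name `rank_sketch` and the sample values are hypothetical.

```python
def rank_sketch(results, sort_by="G2", reverse=True, freq_cutoff=1):
    """Filter by frequency, then sort by one association measure.

    Hypothetical stand-in for rank_collo(); assumes a "freq" entry
    in each per-item dictionary.
    """
    rows = [
        (key, stats[sort_by], stats["freq"])
        for key, stats in results.items()
        if stats["freq"] >= freq_cutoff
    ]
    # Sort on the second tuple element, the chosen statistic.
    return sorted(rows, key=lambda row: row[1], reverse=reverse)


# Invented values, shaped like the assumed output of dca() or cca().
results = {
    "give": {"G2": 12.4, "freq": 20},
    "send": {"G2": 3.1, "freq": 5},
    "bring": {"G2": 7.9, "freq": 1},
}
print(rank_sketch(results, freq_cutoff=2))
# [('give', 12.4, 20), ('send', 3.1, 5)]
```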