Week 5 Word Meaning (II)
- Lexical relations, lexical networks, and ontologies
- Event semantics and computational representation
CWN 2.0
CWN 2.0 is a manually curated Chinese Wordnet that contains 29,231+ Chinese words and their lexical semantic relations.
Information Extraction
Information extraction (IE) is a subfield of natural language processing (NLP).
The process of information extraction turns the unstructured or semi-structured information embedded in texts into structured data.
Main tasks of IE include:
- Named entity recognition
- Relation extraction
- Event extraction; Template filling
- Coreference resolution
- Temporal expression recognition
- Semantic role labeling
- etc.
Relation extraction is the task of identifying and classifying semantic relations between pairs of entities in text.
Types of linguistic relations:
- Lexical (semantic) relations (e.g., synonymy, antonymy, hyponymy, etc.)
- Syntactic relations (e.g., subject, object, etc.)
- Semantic relations (e.g., causality, entailment, etc.)
- Pragmatic relations (e.g., presupposition, etc.)
- Discourse relations (e.g., contrast, etc.)
Other (common-sense) relations
- Temporal relations (e.g., before, after, etc.)
- Spatial relations (e.g., above, below, etc.)
- Possession relations (e.g., has, has-part, etc.)
- Quantitative relations (e.g., more, less, etc.)
- Comparative relations (e.g., more, less, etc.)
- Logical relations (e.g., and, or, etc.)
- Attribution relations (e.g., author, publisher, etc.)
- ... (too many to list one by one)
Relations in Named Entity Tasks
- Named entity recognition (NER) is the task of identifying and classifying named entities in text into pre-defined categories such as person names, organization names, locations, etc.
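Related to Information Extraction, but more complex.
e.g., What is the part-whole relation for nouns? Do verbs have hypernym-hyponym relations? Why does antonymy hold only at the level of words rather than at the level of senses?
Regular Polysemy Detection
SemEval-2010 Task 8
- Earlier integration attempts (palmer2014semlink?, +)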
- ontology and ontologies
- lexicalized ontologies
Qualia structure
RDF (Resource Description Framework)
RDF is a standard metalanguage (a W3C Recommendation) for data interchange on the Web.
RDF triples are the basic unit of data in RDF. A triple consists of
< subject, predicate, object >
RDF triples are used to describe resources (e.g., people, places, things, etc.) and their properties (e.g., name, age, etc.).
are two popular knowledge bases that use RDF triples to represent knowledge.
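To make the < subject, predicate, object > shape concrete, here is a minimal sketch in plain Python. It does not use a real RDF library. The `ex:` prefix, the toy facts, and the `Triple` and `match` names are all assumptions made up for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: <subject, predicate, object>."""
    subject: str
    predicate: str
    obj: str


# Hypothetical toy data; the "ex:" prefix stands in for full IRIs.
triples = [
    Triple("ex:Taipei", "ex:locatedIn", "ex:Taiwan"),
    Triple("ex:Taipei", "ex:name", "Taipei"),
    Triple("ex:NTU", "ex:locatedIn", "ex:Taipei"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Everything that is located somewhere, as (resource, place) pairs.
located = match(triples, predicate="ex:locatedIn")
print([(t.subject, t.obj) for t in located])
```

Leaving a field as `None` plays the role of a variable in a triple pattern, which is roughly how queries over a triple store are phrased.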
Relation Extraction Algorithms
Five main classes of relation extraction algorithms:
- Pattern-based (Hearst patterns)
- Feature-based supervised relation classifiers
- Neural supervised relation classifiers
- Semi-supervised
- Unsupervised
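The pattern-based approach can be illustrated with a single Hearst pattern, "NP such as NP, NP and NP". The sketch below is a deliberate simplification: it treats every noun phrase as one word and uses a regular expression instead of a chunker. The names `SEP`, `HEARST_SUCH_AS` and `hearst_such_as` are made up for this example.

```python
import re

# Separators allowed inside the hyponym list: ", ", ", and ", " and ", " or ".
SEP = r"(?:, (?:and |or )?| and | or )"
SEP_RE = re.compile(SEP)

# Simplification: each noun phrase is a single word (\w+).
HEARST_SUCH_AS = re.compile(
    rf"\b(?P<hypernym>\w+),? such as (?P<hyponyms>\w+(?:{SEP}\w+)*)"
)


def hearst_such_as(text):
    """Return (hyponym, 'is-a', hypernym) triples found by the pattern."""
    triples = []
    for m in HEARST_SUCH_AS.finditer(text):
        for hyponym in SEP_RE.split(m.group("hyponyms")):
            triples.append((hyponym, "is-a", m.group("hypernym")))
    return triples


print(hearst_such_as("He studies instruments such as violins, cellos and flutes."))
```

A realistic system would match over chunked noun phrases and combine several patterns. A single regular expression like this one over-generates whenever "and" joins clauses rather than list items.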
Semisupervised Relation Extraction
Bootstrapping is a semi-supervised learning method that uses a small amount of labeled data to train a classifier, and then uses the classifier to label a large amount of unlabeled data.
Distant Supervision
Distant supervision is a semi-supervised learning method that uses an existing knowledge base of known relation instances, rather than hand-labeled sentences, to label a large unlabeled corpus automatically, and then trains a classifier on the resulting noisy labels.
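The labeling step of distant supervision can be sketched in a few lines. The knowledge base, the corpus, and the function name `distant_label` are hypothetical toy stand-ins. Entity matching is plain substring search, which a real system would replace with named entity recognition and entity linking.

```python
# Hypothetical seed knowledge base: (entity1, entity2) -> relation label.
KB = {
    ("Taipei", "Taiwan"): "located_in",
    ("Kyoto", "Japan"): "located_in",
}

# Hypothetical unlabeled corpus.
corpus = [
    "Taipei is a city in Taiwan.",
    "Kyoto is a city in Japan.",
    "Taipei and Kyoto both have metro systems.",
]


def distant_label(sentences, kb):
    """Label every sentence that mentions both entities of a KB pair."""
    labeled = []
    for sent in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in sent and e2 in sent:
                labeled.append((sent, e1, e2, relation))
    return labeled


for example in distant_label(corpus, KB):
    print(example)
```

The automatically labeled sentences then serve as training data for a relation classifier. The labels are noisy because a sentence can mention both entities without expressing the relation.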
Evaluation Metrics
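Assuming the metrics meant here are the usual precision, recall, and F1 computed over extracted relation triples, a minimal sketch looks like this. The function name and the toy triples are made up for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Compare two collections of (entity1, relation, entity2) triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold standard and system output.
gold = {("a", "r", "b"), ("c", "r", "d"), ("e", "r", "f")}
predicted = {("a", "r", "b"), ("x", "r", "y")}
print(precision_recall_f1(predicted, gold))
```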
Step 1: use spaCy’s named entity recognizer to extract money and currency values (entities labelled as MONEY).
Step 2: use spaCy’s dependency parser to find the noun phrase they refer to.
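The two steps can be sketched as follows. This is one possible implementation, not necessarily the one used in class. It assumes spaCy and the `en_core_web_sm` model are installed, and the function names `currency_relations` and `extract` are made up here. The dependency rules cover only two common configurations ("X was $N" and "X of $N").

```python
def currency_relations(tokens):
    """Step 2: pair each MONEY token with the noun phrase it refers to.

    `tokens` is assumed to be a spaCy Doc whose entities and noun chunks
    have already been merged into single tokens.
    """
    relations = []
    for money in tokens:
        if money.ent_type_ != "MONEY":
            continue
        if money.dep_ in ("attr", "dobj"):
            # e.g. "Net income was $9.4 million": use the verb's subject.
            subjects = [w for w in money.head.lefts if w.dep_ == "nsubj"]
            if subjects:
                relations.append((subjects[0].text, money.text))
        elif money.dep_ == "pobj" and money.head.dep_ == "prep":
            # e.g. "a provision of $12 billion": use the noun that the
            # preposition attaches to.
            relations.append((money.head.head.text, money.text))
    return relations


def extract(text, model="en_core_web_sm"):
    """Step 1 (NER) plus merging, then step 2. Requires spaCy."""
    import spacy  # imported lazily so the helper above has no dependency
    from spacy.util import filter_spans

    nlp = spacy.load(model)
    doc = nlp(text)
    # Merge entities and noun chunks so each phrase becomes one token.
    spans = filter_spans(list(doc.ents) + list(doc.noun_chunks))
    with doc.retokenize() as retokenizer:
        for span in spans:
            retokenizer.merge(span)
    return currency_relations(doc)
```

Calling `extract("Net income was $9.4 million compared to the prior year of $2.7 million.")` runs both steps on one sentence. The exact pairs returned depend on the model's parse.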
Event semantics and event representation
SK yelled ‘Yo’
Must SK yell ‘Yo’?
\(\rightarrow \{\Box\exists e[(R(\text{Yo})(\text{Jo}))(e)],\ \neg\Box\exists e[(R(\text{Yo})(\text{Jo}))(e)]\}\)
Event types (Aktionsart; Lexical Aspect)
- states, activities, accomplishments, achievements, semelfactives
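One common way to tell the five event types apart is by three binary features: dynamic, durative, and telic. The sketch below encodes that standard decomposition as a lookup table. The names `Aspect`, `EVENT_TYPES` and `classify`, and the example predicates in the comments, are illustrative assumptions rather than part of the slides.

```python
from typing import NamedTuple, Optional


class Aspect(NamedTuple):
    dynamic: bool   # involves change, as opposed to a static situation
    durative: bool  # extends over time, as opposed to being punctual
    telic: bool     # has an inherent endpoint


EVENT_TYPES = {
    "state": Aspect(False, True, False),          # e.g. "know"
    "activity": Aspect(True, True, False),        # e.g. "run"
    "accomplishment": Aspect(True, True, True),   # e.g. "build a house"
    "achievement": Aspect(True, False, True),     # e.g. "arrive"
    "semelfactive": Aspect(True, False, False),   # e.g. "knock"
}


def classify(dynamic: bool, durative: bool, telic: bool) -> Optional[str]:
    """Map a feature bundle to an event type, or None if no type matches."""
    wanted = Aspect(dynamic, durative, telic)
    for name, features in EVENT_TYPES.items():
        if features == wanted:
            return name
    return None


print(classify(dynamic=True, durative=True, telic=True))
```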
Event Representation
- Event representation is the task of representing events in a structured format.
Event Extraction
Event extraction is the task of identifying mentions of events and classifying them in text.
An event mention is a span of text (an expression) that refers to an event that can be assigned to a particular point or interval in time.
FrameNet is a large semantic lexicon that organizes English words into frames (based on Frame Semantics).
- Frame (e.g., Make a phone call)
- Frame element (e.g., Agent, Patient, Instrument, Theme, etc.)
- Lexical unit (e.g., call, phone, make, etc.)
- Annotation (e.g., Agent of Make a phone call is caller, etc.)
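The four notions above fit together as a small data model. The sketch below reuses the toy "Make a phone call" frame from the list. The class names, the `is_valid` check, and the example sentence are assumptions for illustration and do not mirror the actual FrameNet database schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    name: str
    frame_elements: List[str]  # roles that the frame defines
    lexical_units: List[str]   # words that can evoke the frame


@dataclass
class Annotation:
    sentence: str
    frame: Frame
    target: str            # the lexical unit that evokes the frame
    roles: Dict[str, str]  # frame element -> text that fills it

    def is_valid(self) -> bool:
        """Target must be a lexical unit; roles must be frame elements."""
        return (
            self.target in self.frame.lexical_units
            and all(fe in self.frame.frame_elements for fe in self.roles)
        )


phone_call = Frame(
    name="Make a phone call",
    frame_elements=["Agent", "Patient", "Instrument", "Theme"],
    lexical_units=["call", "phone", "make"],
)

# Hypothetical annotated sentence: the caller fills the Agent role.
example = Annotation(
    sentence="Mary called her brother.",
    frame=phone_call,
    target="call",
    roles={"Agent": "Mary", "Patient": "her brother"},
)
print(example.is_valid())
```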
Chinese FrameNet
(China) Chinese FrameNet project (CFN) by the State Key Laboratory of Intelligent Technology and Systems at Tsinghua University in Beijing (the project link no longer works).
(Taiwan) NTU semi-automatically generated version
(Hong Kong) Chinese VerbNet
Frames, constructions, and FrameNet
Semantic Role Labeling
The computationally closest NLP task.
Semantic role labeling (SRL) is the task of identifying and classifying semantic roles of arguments in a sentence.
LLMs can arguably be said to have partially solved the SRL problem.
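To show what SRL output typically looks like, the sketch below stores one predicate's arguments as labeled token spans and converts them to BIO tags, a common encoding for sequence labelers. The sentence, the PropBank-style labels (ARG0 seller, ARG1 thing sold, ARG2 buyer), and the name `to_bio` are illustrative assumptions, and the spans are written by hand rather than produced by a model.

```python
tokens = ["Mary", "sold", "the", "book", "to", "John"]

# Hand-written SRL analysis: (label, start, end) with end exclusive.
spans = [
    ("ARG0", 0, 1),  # Mary: the seller
    ("V", 1, 2),     # sold: the predicate
    ("ARG1", 2, 4),  # the book: the thing sold
    ("ARG2", 4, 6),  # to John: the buyer
]


def to_bio(n_tokens, labeled_spans):
    """Convert labeled spans to one BIO tag per token."""
    tags = ["O"] * n_tokens
    for label, start, end in labeled_spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


for token, tag in zip(tokens, to_bio(len(tokens), spans)):
    print(f"{token}\t{tag}")
```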
Perspective