Knowledge graphs are a key component of data-driven applications such as question answering (QA) and chatbots. Although various public knowledge graphs contain massive numbers of triples, they remain far from complete compared to the unbounded set of real-world facts. This lack of knowledge greatly degrades the performance of data-driven applications, so the problem must be addressed by continually extracting new triples.
This dissertation proposes two models for knowledge graph enrichment, one extracting explicit knowledge and the other implicit knowledge. For explicit knowledge, a pattern-based relation extraction approach is proposed. This model adopts a parse tree pattern representation and a semantic-similarity-based pattern filtering function. Parse tree patterns are superior to the lexical patterns commonly used in previous studies in that they can capture long-distance dependencies among words. In addition, the proposed semantic filter, a combination of WordNet-based similarity and word embedding similarity, removes patterns that are semantically irrelevant to the meaning of a target relation.
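The combined filter described above can be illustrated with a small sketch. A pattern is kept only if a weighted sum of a WordNet-style similarity and a word-embedding cosine similarity to the target relation exceeds a threshold. All names, weights, toy vectors, and the stand-in similarity functions here are illustrative assumptions, not the dissertation's actual implementation.

```python
import math

def cosine_similarity(u, v):
    # Standard cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def wordnet_similarity(word_a, word_b, toy_path_lengths):
    # Stand-in for a path-based WordNet measure: similarity decays with
    # the (pretended) synset path length between the two words.
    length = toy_path_lengths.get((word_a, word_b), 10)
    return 1.0 / (1.0 + length)

def semantic_filter(pattern_word, relation_word, embeddings,
                    toy_path_lengths, alpha=0.5, threshold=0.4):
    # Combined score: alpha * WordNet sim + (1 - alpha) * embedding sim.
    # The pattern survives filtering only if the score reaches the threshold.
    wn = wordnet_similarity(pattern_word, relation_word, toy_path_lengths)
    emb = cosine_similarity(embeddings[pattern_word], embeddings[relation_word])
    return alpha * wn + (1 - alpha) * emb >= threshold

# Toy data: "bear" (as in "born in") is close to the relation "birthplace",
# while "visit" is not.
embeddings = {
    "birthplace": [1.0, 0.1],
    "bear": [0.9, 0.2],
    "visit": [0.1, 1.0],
}
path_lengths = {("bear", "birthplace"): 1, ("visit", "birthplace"): 8}

print(semantic_filter("bear", "birthplace", embeddings, path_lengths))   # True: kept
print(semantic_filter("visit", "birthplace", embeddings, path_lengths))  # False: filtered out
```

The weight alpha balances the lexical-resource signal against the distributional one, so a pattern must look relevant under at least one of the two views to survive.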
A Logical Property Preserving (LPP) embedding method is proposed for extracting implicit knowledge. Previous translation-based embedding methods cannot handle two crucial logical properties of relations: transitivity and symmetry. The embedding spaces generated by these models cannot perfectly represent triples with transitive or symmetric relations, because they ignore the role an entity plays in a triple. This dissertation analyzes the effects of this limitation and introduces a solution named role-specific projection, which overcomes the limitation of previous methods by mapping an entity to distinct vectors according to its role in a triple. That is, a head entity is projected onto the embedding space by a head projection operator, and a tail entity by a tail projection operator. This idea can easily be applied to previous translation-based embedding models; lppTransE, lppTransR, and lppTransD, based on TransE, TransR, and TransD, respectively, are introduced.
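The role-specific projection can be sketched numerically in the lppTransE style: where plain TransE scores a triple as ||h + r - t||, the LPP variant first maps the head through a head projection matrix and the tail through a tail projection matrix, scoring ||M_head·h + r - M_tail·t||. The 2-dimensional vectors and matrices below are hand-picked toy values, not trained parameters; they only demonstrate that a symmetric relation can hold in both directions with a nonzero relation vector, which plain TransE cannot achieve.

```python
import math

def mat_vec(M, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def transe_score(h, r, t):
    # Plain TransE: ||h + r - t|| (0 means a perfect fit).
    return norm([hi + ri - ti for hi, ri, ti in zip(h, r, t)])

def lpp_transe_score(h, r, t, M_head, M_tail):
    # Role-specific projection: ||M_head h + r - M_tail t||.
    ph, pt = mat_vec(M_head, h), mat_vec(M_tail, t)
    return norm([pi + ri - ti for pi, ri, ti in zip(ph, r, pt)])

# A symmetric relation r holds in both directions: r(a, b) and r(b, a).
# Plain TransE cannot make both scores 0 unless r = 0, since
# h + r = t and t + r = h together force r = -r.
a, b = [1.0, 0.0], [0.0, 1.0]
r = [-1.0, -1.0]                      # nonzero relation vector
M_head = [[1.0, 2.0], [2.0, 1.0]]     # head projection (toy values)
M_tail = [[1.0, 0.0], [0.0, 1.0]]     # tail projection (identity here)

print(transe_score(a, r, b), transe_score(b, r, a))  # both 2.0: TransE fails
print(lpp_transe_score(a, r, b, M_head, M_tail),
      lpp_transe_score(b, r, a, M_head, M_tail))     # both 0.0: both directions fit
```

Because the same entity is projected differently as head and as tail, the symmetry constraint no longer collapses the relation vector to zero; the same mechanism generalizes to TransR and TransD by composing with their respective projections.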
In experiments, both proposed knowledge extraction models showed outstanding performance. In an English explicit knowledge extraction task, the proposed model achieved an average accuracy of 60.1% on the newly extracted triples, 28.9% higher than a lexical-sequence-pattern baseline. The proposed model also performs well in a multilingual environment and showed much more stable performance than neural network based approaches. These results show that the proposed model produces patterns more relevant to the relations of the seed knowledge, and thus more accurate triples are generated from those patterns. The performance of the implicit knowledge extraction models was measured on two tasks, link prediction and triple classification. The proposed lpp-models achieved state-of-the-art performance on both. In particular, there was significant improvement in the N-to-N relation category, which contains transitive and symmetric relations. These results demonstrate that preserving the logical properties of relations is critical when embedding knowledge graphs, and that the proposed method does so effectively.