ConceptNet and the Cognitive Grid: An Overview
【5007】by02013-09-09 2013-09-19 last edited 2018-05-06 23:00:05

Chinese ConceptNet

iAgents Lab

Introduction

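Common sense (knowledge that most people share and that is generally non-specialist) is a basic ingredient of human communication and problem solving. Unfortunately, although the computing power and storage capacity of modern computers have grown rapidly, computers' lack of common sense remains a well-known weakness. Converting millions of pieces of human knowledge into a machine-processable format is indeed time-consuming and expensive. After twenty-five years of effort, OpenCyc 2.0 was officially released in July 2009; its knowledge base contains 47,000 "concepts" and 306,000 "facts" carefully compiled by knowledge engineers.

In contrast, the Open Mind Common Sense project at the MIT Media Lab collected more than one million English sentences from fifteen thousand contributors within ten years. At present, the content of both knowledge bases is mainly in English and remains far from complete. This research project takes on the development of data collection, verification, and reasoning techniques for multilingual commonsense knowledge bases, in order to improve the coverage and accuracy of commonsense data and the ability to reason over it effectively. In particular, this research aims to combine machine learning techniques with productive community games to build a Chinese commonsense knowledge base. The former automatically extracts structured knowledge from unstructured and semi-structured online documents; the latter accumulates the common sense of online community game players. The resulting knowledge base may contain erroneous or contradictory statements.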

Our Knowledge Base

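Chinese is spoken by more people than any other language in the world. The Eastern world also has a great deal of common sense of its own that is well worth collecting and applying. Building on the English version of ConceptNet, we constructed a ConceptNet dedicated to Chinese. Through several games and the contributions of enthusiastic volunteers, we keep expanding our ConceptNet-Zn while maintaining the credibility and quality of its common sense.

To date, the Chinese ConceptNet holds more than 600,000 commonsense sentences, second only to the English version's 1,000,000. We believe the Chinese version can reach 1,000,000 in the near future. We continue to work on making the games more entertaining in order to attract more users.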

Example – Chinese commonsense about “紙”:

http://conceptnet5.media.mit.edu/web/concept/zh_TW/紙

AnalogySpace

AnalogySpace is the reasoning technique used in ConceptNet.

AnalogySpace Matrix

The AnalogySpace matrix represents each concept as a feature vector. A feature of a concept combines one of its neighbors with the relation that links them. For example, "has feathers" and "capable of flying" are features of bird.



The assertions in our knowledge base (e.g. Chinese ConceptNet) can be converted to the AnalogySpace matrix. The rows of the AnalogySpace matrix are concepts; the columns are their features. Each entry holds a real-number value, which is the number of collected sentences for the corresponding assertion. Figure 2 shows part of the AnalogySpace matrix.

Figure 2. Part of the AnalogySpace matrix.
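The conversion from assertions to matrix entries can be sketched as follows. This is a minimal illustration, not the lab's implementation: the toy sentences, the `build_matrix` helper, and the feature notation such as `PartOf(fur, _)` are assumptions made for the example, as is the choice to give each assertion one feature for each of its two concepts.

```python
from collections import Counter

# Hypothetical toy input: one (relation, left, right) tuple per collected sentence.
SENTENCES = [
    ("PartOf", "fur", "cat"),
    ("PartOf", "fur", "cat"),
    ("PartOf", "fur", "dog"),
    ("IsA", "cat", "pet"),
    ("IsA", "dog", "pet"),
]


def build_matrix(sentences):
    """Return (concepts, features, matrix) with sentence counts as entries."""
    counts = Counter()
    for relation, left, right in sentences:
        # Assumption: each assertion gives one feature to each of its two concepts.
        counts[(left, f"{relation}(_, {right})")] += 1
        counts[(right, f"{relation}({left}, _)")] += 1
    concepts = sorted({concept for concept, _ in counts})
    features = sorted({feature for _, feature in counts})
    # Dense rows for readability; a large knowledge base would need a sparse format.
    matrix = [[counts[(c, f)] for f in features] for c in concepts]
    return concepts, features, matrix


concepts, features, matrix = build_matrix(SENTENCES)
for concept, row in zip(concepts, matrix):
    print(concept, row)
```

Each printed row is the feature vector of one concept; the repeated sentence PartOf(fur, cat) shows up as a count of 2 in the rows for both cat and fur.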

Semantics of AnalogySpace Matrix

For any two rows in the AnalogySpace matrix, a sentence in an inference rule can be replaced by another sentence and still give plausible inference results if the two sentences have similar truth assignments for the same features. For example, the sentences PartOf(fur, cat) and IsA(cat, pet) in a modus ponens rule can be replaced by PartOf(fur, dog) and IsA(dog, pet). The similarity of any two concepts can then be defined as the number of features they share. We can use this similarity to identify the semantic meaning of concepts.
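As a small sketch of similarity defined as the number of shared features, the snippet below counts the overlap between feature sets. The `FEATURES` table and both helper functions are hypothetical and exist only for illustration; a feature is treated as present when its matrix entry is nonzero.

```python
# Hypothetical feature sets; a feature is present when its matrix entry is nonzero.
FEATURES = {
    "cat": {"PartOf(fur, _)", "IsA(_, pet)", "CapableOf(_, meow)"},
    "dog": {"PartOf(fur, _)", "IsA(_, pet)", "CapableOf(_, bark)"},
    "car": {"PartOf(wheel, _)", "UsedFor(_, travel)"},
}


def shared_features(a, b, table=FEATURES):
    """Similarity of two concepts as the number of features they share."""
    return len(table[a] & table[b])


def most_similar(concept, table=FEATURES):
    """Return the other concept that shares the most features with `concept`."""
    others = [c for c in table if c != concept]
    return max(others, key=lambda other: shared_features(concept, other, table))


print(shared_features("cat", "dog"))
print(most_similar("cat"))
```

In this toy table cat and dog share two features while cat and car share none, which matches the intuition that cat can stand in for dog in the inference example above.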

Build Chinese AnalogySpace

Because the knowledge base is very large, the AnalogySpace matrix is also large and sparse. We apply truncated singular value decomposition (truncated SVD) to the AnalogySpace matrix to smooth the noisy data in the knowledge base. The concepts are then transformed into a k-dimensional vector space spanned by eigen-features. In this space, the proximity of two concepts represents how much their features overlap, so the similarity of two concept vectors can be defined by their cosine similarity. Figure 3 is the projection of the 1st and 2nd dimensions of the Chinese AnalogySpace. The 1st dimension groups together the things people don't want; the 2nd dimension is mostly about the objects we can find in our daily
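The truncated SVD and cosine-similarity steps can be sketched with NumPy as below. The 4x4 matrix, the concept and feature names, and the helpers `embed` and `cosine` are assumptions for illustration. Scaling the left singular vectors by the singular values is one common way to obtain concept coordinates and is not necessarily the exact convention used for the Chinese AnalogySpace.

```python
import numpy as np

CONCEPTS = ["cat", "dog", "car", "bus"]
# Hypothetical counts; columns: has fur, is a pet, has wheels, carries people.
MATRIX = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])


def embed(matrix, k):
    """Project each row (concept) onto the top-k singular directions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    # Singular values come back in descending order, so slicing keeps the largest k.
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


vectors = dict(zip(CONCEPTS, embed(MATRIX, k=2)))
print(round(cosine(vectors["cat"], vectors["dog"]), 3))
print(round(cosine(vectors["cat"], vectors["car"]), 3))
```

With this toy data the two retained dimensions separate the animals from the vehicles, so cat and dog end up pointing in the same direction even though their original rows differ. For a matrix of realistic size, a sparse solver would replace the dense `np.linalg.svd` call.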


The cognitive grid is a semantic network of concepts. It links pairs of concepts using the following relations.



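33: {1}会让你想要{2}。 ({1} makes you want {2}.)
34: {1}会让你{2}。 ({1} makes you {2}.)
35: {1}之后可能会发生的事情是{2}。 (Something that may happen after {1} is {2}.)
36: 因为{1}所以{2}。 (Because {1}, {2}.)
37: {1}可能会带来{2}。 ({1} may bring about {2}.)
38: {1}可能会引起{2}。 ({1} may cause {2}.)
40: {1}的时候,首先要{2}。 (When {1}, you must first {2}.)
45: {1}是{2}的一部分。 ({1} is part of {2}.)
46: {1}可以用{2}制成。 ({1} can be made from {2}.)
47: {1}由{2}组成。 ({1} consists of {2}.)
50: {1}是一种{2}。 ({1} is a kind of {2}.)
51: {1}在{2}里。 ({1} is inside {2}.)
55: {1}在{2}外。 ({1} is outside {2}.)
57: 你可以在{2}找到{1}。 (You can find {1} at {2}.)
58: {2}有{1}。 ({2} has {1}.)
60: {2}的时候可能会用到{1}。 (When {2}, you may use {1}.)
63: {1}能做的事情有{2}。 (Something {1} can do is {2}.)
64: {1}会{2}。 ({1} can {2}.)
65: 你会{1}因为你{2}。 (You would {1} because you {2}.)
66: {1}是为了{2}。 ({1} is for {2}.)
67: 想要有{2}应该要{1}。 (To have {2}, you should {1}.)
68: 当你想要{2}的时候你可能会{1}。 (When you want {2}, you may {1}.)
69: {2}的时候会想要{1}。 (When {2}, you would want {1}.)
70: {1}喜欢{2}。 ({1} likes {2}.)
71: {1}想要{2}。 ({1} wants {2}.)
72: {1}不想要{2}。 ({1} does not want {2}.)
73: {1}害怕{2}。 ({1} fears {2}.)
75: {1}痛恨{2}。 ({1} hates {2}.)
79: {1}是{2}的。 ({1} is {2}.)
84: {2}可能代表{1}。 ({2} may represent {1}.)
89: {1}代表{2}。 ({1} represents {2}.)
92: {1}的时候,你会{2}。 (When {1}, you would {2}.)
95: 在{1},你会{2}。 (At {1}, you would {2}.)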
Organizing concepts according to these relations gives a machine a way to understand the concepts.
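To show how a numbered relation template turns a pair of concepts into a sentence, here is a minimal sketch. The `TEMPLATES` dictionary copies three of the templates listed above; the `render` helper and the sample concepts are hypothetical. Because the placeholders are numbered {1} and {2}, Python's `str.format` can fill them directly if an unused value is passed in position 0.

```python
# Three templates copied from the relation list; keys are the relation numbers.
TEMPLATES = {
    45: "{1}是{2}的一部分。",  # {1} is part of {2}.
    50: "{1}是一种{2}。",      # {1} is a kind of {2}.
    58: "{2}有{1}。",          # {2} has {1}.
}


def render(relation_id, concept1, concept2):
    """Fill the {1} and {2} slots of a relation template with two concepts."""
    template = TEMPLATES[relation_id]
    # Position 0 is unused by the templates, so a placeholder value is passed.
    return template.format(None, concept1, concept2)


print(render(50, "猫", "动物"))
print(render(58, "尾巴", "猫"))
```

Note that template 58 places {2} before {1}, so the order of the slots in the sentence does not have to match the order of the arguments.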

See the paper:

ConceptNet: A Practical Commonsense Reasoning Toolkit

Hugo Liu and Push Singh

Media Laboratory

Massachusetts Institute of Technology

{hugo, push}@media.mit.edu