Step 1: Extract a phrasal lexicon from reviews. The phrases extracted by rule are shown in the figure below:
[Figure: phrases extracted by rule]
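Step 2: Learn the polarity of each phrase. How should the polarity of a phrase be assessed? Intuitively: “Positive phrases co-occur more with ‘excellent’; negative phrases co-occur more with ‘poor’.” The problem thus becomes one of measuring how strongly two terms co-occur. For this, researchers introduced pointwise mutual information (PMI), which is commonly used to measure how strongly two specific events are associated. Its formula is:
$$\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$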
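The PMI of two terms is:
$$\mathrm{PMI}(\text{word}_1, \text{word}_2) = \log_2 \frac{P(\text{word}_1, \text{word}_2)}{P(\text{word}_1)\,P(\text{word}_2)}$$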
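A common way to compute PMI(word1, word2) is to issue “word1”, “word2”, and “word1 NEAR word2” as search-engine queries and estimate P(word) and P(word1, word2) from the hit counts:
$$P(\text{word}) = \frac{\operatorname{hits}(\text{word})}{N}$$
$$P(\text{word}_1, \text{word}_2) = \frac{\operatorname{hits}(\text{word}_1\ \text{NEAR}\ \text{word}_2)}{N^2}$$
The factors of N cancel in the PMI ratio, which gives:
$$\mathrm{PMI}(\text{word}_1, \text{word}_2) = \log_2 \frac{\operatorname{hits}(\text{word}_1\ \text{NEAR}\ \text{word}_2)}{\operatorname{hits}(\text{word}_1)\,\operatorname{hits}(\text{word}_2)}$$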
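The polarity of a phrase is then computed as follows (“excellent” and “poor” can be replaced by other words of known polarity):
$$\mathrm{Polarity}(\text{phrase}) = \mathrm{PMI}(\text{phrase}, \text{excellent}) - \mathrm{PMI}(\text{phrase}, \text{poor})$$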
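The hit-count form of PMI and the polarity score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `hits` dictionary, its key layout (a string for a single term, a `(phrase, seed)` tuple for a NEAR query), and all counts are hypothetical, and the small `smoothing` constant is an assumption added here to avoid taking the logarithm of zero when a phrase never appears near a seed word.

```python
import math

def pmi_from_hits(hits_near: float, hits_w1: float, hits_w2: float) -> float:
    """PMI(word1, word2) = log2(hits(word1 NEAR word2) / (hits(word1) * hits(word2)))."""
    return math.log2(hits_near / (hits_w1 * hits_w2))

def polarity(hits: dict, phrase: str, pos: str = "excellent",
             neg: str = "poor", smoothing: float = 0.01) -> float:
    """Polarity(phrase) = PMI(phrase, pos) - PMI(phrase, neg).

    `smoothing` is an assumption of this sketch: it keeps the NEAR counts
    strictly positive so that log2 is always defined.
    """
    near_pos = hits.get((phrase, pos), 0) + smoothing
    near_neg = hits.get((phrase, neg), 0) + smoothing
    return (pmi_from_hits(near_pos, hits[phrase], hits[pos])
            - pmi_from_hits(near_neg, hits[phrase], hits[neg]))

# Hypothetical hit counts, for illustration only.
example_hits = {
    "excellent": 1000,
    "poor": 1000,
    "low fees": 100,
    ("low fees", "excellent"): 40,
    ("low fees", "poor"): 10,
}

score = polarity(example_hits, "low fees")
print(f"polarity('low fees') = {score:.3f}")  # positive: the phrase leans towards 'excellent'
```

Because hits(phrase) appears in both PMI terms, it cancels in the difference, so the score depends only on the two NEAR counts and on the counts of the two seed words.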
On a dataset of 410 reviews from Epinions, of which 170 (41%) are negative and 240 (59%) are positive, the Turney algorithm achieved 74% accuracy (the baseline of labeling every review as positive gives 59%).
Step 3: Rate a review by the average polarity of its phrases
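A minimal sketch of this step, assuming the phrase polarities have already been computed. The function name `rate_review`, the example scores, the zero threshold for the label, and the handling of reviews with no scored phrases are all assumptions made for illustration.

```python
from statistics import mean

def rate_review(phrases, phrase_polarity):
    """Return (average polarity, label) for the phrases found in one review.

    Phrases without a known polarity are skipped. A review with no scored
    phrases gets an average of 0.0 (an assumption of this sketch).
    """
    scores = [phrase_polarity[p] for p in phrases if p in phrase_polarity]
    average = mean(scores) if scores else 0.0
    label = "positive" if average > 0 else "negative"
    return average, label

# Hypothetical polarity scores, for illustration only.
example_polarity = {
    "low fees": 0.3,
    "online service": 2.8,
    "inconveniently located": -1.5,
}

average, label = rate_review(
    ["low fees", "online service", "inconveniently located", "unknown phrase"],
    example_polarity,
)
print(f"{average:.3f} {label}")
```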
3. Using WordNet to learn polarity: see S.M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004, and M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD, 2004. The method proceeds as follows:
Create positive (“good”) and negative seed-words (“terrible”)
Find Synonyms and Antonyms
Positive Set: Add synonyms of positive words (“well”) and antonyms of negative words
Negative Set: Add synonyms of negative words (“awful”) and antonyms of positive words (“evil”)
Repeat, following chains of synonyms
Filter
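The steps above can be sketched as a small bootstrapping loop. To keep the example self-contained, it uses two hand-written dictionaries in place of WordNet; the `synonyms` and `antonyms` tables, the function name `expand_lexicon`, and the `max_rounds` limit are hypothetical. The source does not say what the final filter does, so this sketch assumes it simply drops words that end up in both sets.

```python
def expand_lexicon(pos_seeds, neg_seeds, synonyms, antonyms, max_rounds=3):
    """Grow positive and negative word sets from seeds via synonym/antonym links."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(max_rounds):
        new_pos, new_neg = set(), set()
        for word in pos:
            new_pos.update(synonyms.get(word, ()))  # synonyms of positive words
            new_neg.update(antonyms.get(word, ()))  # antonyms of positive words
        for word in neg:
            new_neg.update(synonyms.get(word, ()))  # synonyms of negative words
            new_pos.update(antonyms.get(word, ()))  # antonyms of negative words
        if new_pos <= pos and new_neg <= neg:
            break  # nothing new was found, so the chains are exhausted
        pos |= new_pos
        neg |= new_neg
    # Filter (assumed rule): discard words that landed in both sets.
    ambiguous = pos & neg
    return pos - ambiguous, neg - ambiguous

# Toy thesaurus standing in for WordNet, for illustration only.
synonyms = {"good": ["well"], "well": ["fine"], "terrible": ["awful"]}
antonyms = {"good": ["evil"], "terrible": ["wonderful"]}

positive, negative = expand_lexicon({"good"}, {"terrible"}, synonyms, antonyms)
print(sorted(positive))
print(sorted(negative))
```

With a real thesaurus, long synonym chains tend to drift in meaning, which is one reason to cap the number of rounds and to filter the result.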
The methods above all adapt well across domains and are robust. Their basic idea can be summarized as “Use seeds and semi-supervised learning to induce lexicons”, that is:
Start with a seed set of words (‘good’, ‘poor’)
Find other words that have similar polarity:
Using “and” and “but”
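As a rough illustration of the “and”/“but” idea, the sketch below treats words joined by “and” as sharing a polarity and words joined by “but” as having opposite polarity, then spreads labels outward from the seeds. It is a deliberate simplification and not the method of any particular paper: the function name `propagate_polarity`, the input format, and the example pairs are assumptions, and conflicting evidence is ignored (the first label to reach a word wins).

```python
from collections import deque

def propagate_polarity(seeds, conjunctions):
    """Spread +1/-1 labels from seed words across 'and'/'but' links.

    seeds: dict mapping word -> +1 or -1
    conjunctions: iterable of (word1, conjunction, word2) tuples
    """
    graph = {}
    for w1, conj, w2 in conjunctions:
        if conj not in ("and", "but"):
            continue  # other conjunctions carry no signal in this sketch
        sign = 1 if conj == "and" else -1  # 'but' flips the polarity
        graph.setdefault(w1, []).append((w2, sign))
        graph.setdefault(w2, []).append((w1, sign))

    labels = dict(seeds)
    queue = deque(seeds)
    while queue:  # breadth-first traversal from the seeds
        word = queue.popleft()
        for neighbour, sign in graph.get(word, []):
            if neighbour not in labels:
                labels[neighbour] = labels[word] * sign
                queue.append(neighbour)
    return labels

# Hypothetical adjective pairs observed in text, for illustration only.
pairs = [
    ("good", "and", "fair"),
    ("fair", "and", "legitimate"),
    ("fair", "but", "brutal"),
    ("corrupt", "and", "brutal"),
]

labels = propagate_polarity({"good": 1, "poor": -1}, pairs)
print(labels)
```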