A few hand-built patterns
To bootstrap a lexicon
Next, we walk through several related papers to explain in detail how a sentiment lexicon can be built:
1. Hatzivassiloglou & McKeown: see Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174–181. The method rests on the following linguistic observation: "Adjectives conjoined by 'and' have the same polarity; adjectives conjoined by 'but' do not." For example:
fair and legitimate, corrupt and brutal
*fair and brutal, *corrupt and legitimate
fair but brutal
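The conjunction cue above can be turned into a small extraction routine. The following is only a minimal sketch, not the paper's implementation: the `ADJECTIVES` set, the `CONJ_PATTERN` regex, and the `conjoined_adjectives` function are hypothetical names introduced here, and a real system would identify adjectives with a part-of-speech tagger rather than a fixed word list.

```python
import re

# Hypothetical mini adjective list (an assumption for illustration);
# a real system would rely on a POS tagger instead.
ADJECTIVES = {"fair", "legitimate", "corrupt", "brutal"}

# Matches "<word> and <word>" or "<word> but <word>" in lowercased text.
CONJ_PATTERN = re.compile(r"\b([a-z]+)\s+(and|but)\s+([a-z]+)\b")

def conjoined_adjectives(text):
    """Return (adj1, adj2, link) triples.

    link is "same" for adjectives joined by "and" and
    "different" for adjectives joined by "but".
    """
    pairs = []
    for left, conj, right in CONJ_PATTERN.findall(text.lower()):
        if left in ADJECTIVES and right in ADJECTIVES:
            link = "same" if conj == "and" else "different"
            pairs.append((left, right, link))
    return pairs

sample = ("The ruling was fair and legitimate. The regime was corrupt "
          "and brutal, while the judge was fair but brutal.")
print(conjoined_adjectives(sample))
```

Because `findall` returns non-overlapping matches, a chain such as "fair and legitimate but brutal" yields only its first pair; that is acceptable for a sketch but would need handling in practice.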
Hatzivassiloglou & McKeown (1997) propose a bootstrapping-based learning method that consists of four main steps:
Step 1: Label a seed set of 1336 adjectives (each occurring more than 20 times in a 21-million-word WSJ corpus)
The seed set contains 657 positive words (e.g. adequate central clever famous intelligent remarkable reputed sensitive slender thriving…) and 679 negative words (e.g. contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting…)
Step 2: Expand the seed set to conjoined adjectives, as shown in the figure below:
Step 3: A supervised classifier assigns a "polarity similarity" to each word pair, resulting in a graph, as shown in the figure below:
Step 4: Clustering partitions the graph into two sets
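Finally, the method outputs a new sentiment lexicon, shown below (the entries set in bold in the original, i.e. the automatically mined ones, are disturbing and strange in the positive list and cautious, outspoken, and pleasant in the negative list):
Positive: bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty…
Negative: ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful…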
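Steps 2 to 4 above can be approximated with a much simpler procedure than the one in the paper. The sketch below is an illustration under stated assumptions: instead of a supervised classifier and a clustering algorithm, it treats every "and" link as "same polarity" and every "but" link as "different polarity", then spreads the seed labels breadth-first through the resulting graph. The function name `propagate_polarity` and the toy seeds and links are hypothetical.

```python
from collections import defaultdict, deque

def propagate_polarity(seeds, links):
    """Spread seed polarities (+1 / -1) across conjunction links.

    seeds: dict mapping word -> +1 (positive) or -1 (negative)
    links: iterable of (word_a, word_b, relation),
           where relation is "same" or "different"
    """
    # Build an undirected signed graph: +1 keeps polarity, -1 flips it.
    graph = defaultdict(list)
    for a, b, relation in links:
        sign = 1 if relation == "same" else -1
        graph[a].append((b, sign))
        graph[b].append((a, sign))

    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbor, sign in graph[word]:
            # Simplification: the first label to reach a word wins,
            # and conflicting evidence is ignored.
            if neighbor not in polarity:
                polarity[neighbor] = polarity[word] * sign
                queue.append(neighbor)
    return polarity

# Toy input (assumed for illustration, not taken from the paper's data).
seeds = {"fair": 1, "corrupt": -1}
links = [
    ("fair", "legitimate", "same"),
    ("corrupt", "brutal", "same"),
    ("fair", "brutal", "different"),
]
lexicon = propagate_polarity(seeds, links)
print(lexicon)
```

With hard links, a single noisy conjunction can mislabel a word, which is one reason the original method scores word pairs with a classifier and partitions the graph by clustering rather than propagating labels directly.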
2. Turney Algorithm: see Turney (2002), Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. The steps are as follows: