Computer Science and Engineering
Feature study for improving Chinese overlapping ambiguity resolution based on SVM
Xiong Ying, Zhu Jie
Department of Electronic Engineering, Shanghai Jiaotong University, Shanghai 200240, China
support vector machine; Chinese overlapping ambiguity; Chinese word segmentation; word probability model
To improve Chinese overlapping ambiguity resolution based on a support vector machine (SVM), statistical features for representing the feature vectors are studied. First, four statistical parameters (mutual information, accessor variety, two-character word frequency and single-character word frequency) are each used on their own to describe the feature vectors. Then, other parameters are added as complementary features to the parameters that obtain the best results, in order to further improve the classification performance. Experimental results show that features represented by mutual information, single-character word frequency and accessor variety achieve an optimum accuracy of 94.39%. Compared with a commonly used word probability model, the accuracy is improved by 6.62%. These comparative results confirm that classification performance can be improved by feature selection and representation.
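As an illustration of how such statistical features might be assembled, the following is a minimal sketch and not the authors' implementation. It assumes an overlapping ambiguity string of three characters a, b, c (where both "ab" and "bc" are words), uses pointwise mutual information estimated from raw counts, and backs off to 0 for unseen events. The function names, the count tables and the feature layout are all hypothetical, and accessor variety is omitted for brevity.

```python
import math
from collections import Counter


def mutual_information(pair_count, left_count, right_count, total):
    """Pointwise mutual information log2(P(xy) / (P(x) * P(y))) from raw counts.

    Simplifying assumption: the same `total` normalizes both the pair and the
    single-character probabilities.
    """
    if min(pair_count, left_count, right_count) == 0:
        return 0.0  # assumption: back off to 0 when any event is unseen
    p_xy = pair_count / total
    p_x = left_count / total
    p_y = right_count / total
    return math.log2(p_xy / (p_x * p_y))


def feature_vector(a, b, c, unigrams, bigrams, single_word_freq):
    """Hypothetical feature layout for the ambiguity string a+b+c.

    [MI(a, b), MI(b, c), single-character word frequency of a, and of c]
    """
    total = sum(unigrams.values())
    return [
        mutual_information(bigrams[a + b], unigrams[a], unigrams[b], total),
        mutual_information(bigrams[b + c], unigrams[b], unigrams[c], total),
        single_word_freq.get(a, 0.0),
        single_word_freq.get(c, 0.0),
    ]


# Toy counts with placeholder symbols; these are not data from the paper.
unigrams = Counter({"A": 8, "B": 8, "C": 16})
bigrams = Counter({"AB": 8})
single_word_freq = {"A": 0.1, "C": 0.3}

vector = feature_vector("A", "B", "C", unigrams, bigrams, single_word_freq)
print(vector)
```

Vectors of this kind, one per ambiguity string and labeled with the correct segmentation, would then serve as the training input of an SVM classifier.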


Biographies: Xiong Ying (1977-), female, graduate; Zhu Jie (corresponding author), male, doctor, professor, zhujie@sjtu.edu.cn.
