Costantea, Ioana ; Bot, Radu Ioan ; Wanka, Gert : Patent Document Classification Based on Mutual Information Feature Selection
- Author(s):
-
Costantea, Ioana
Bot, Radu Ioan
Wanka, Gert
- Title:
- Patent Document Classification Based on Mutual Information Feature Selection
- Electronic source:
-
application/pdf
- Preprint series:
- Technische Universität Chemnitz, Fakultät für Mathematik (Germany). Preprint 11, 2004
- Mathematics Subject Classification:
-
62H30 [ Classification and discrimination; cluster analysis ]
68T50 [ Natural language processing ]
90C46 [ Optimality conditions, duality ]
- Abstract:
- We describe a supervised text classification approach that combines a greedy feature selection method with a support vector machine (SVM) classifier. Features are selected by mutual information, which measures how much information the words carry about the categories. To train and test the algorithm we used patent documents from the US Patent Classification System. As a result, we report the average break-even point (BEP) for some US classes.
- Keywords:
- Supervised Classification; Support Vector Machines; Mutual Information; Patent Classification
- Language:
-
English
- Publication time:
- 8 / 2004
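
The abstract describes mutual information as a measure of how much information a word carries about a category. The record does not give the exact formula used in the preprint, so the following is only a minimal sketch under an assumption: it uses the standard mutual information between two binary variables (word present or absent, document in or out of the category), estimated from document counts. The function name and its parameters are hypothetical and chosen for illustration.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between word presence and category membership.

    Assumed count layout (hypothetical names, not taken from the preprint):
      n11: documents in the category that contain the word
      n10: documents outside the category that contain the word
      n01: documents in the category that do not contain the word
      n00: documents outside the category that do not contain the word
    """
    n = n11 + n10 + n01 + n00
    if n == 0:
        return 0.0

    # Each cell: (joint count, word marginal count, category marginal count).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]

    mi = 0.0
    for joint, word_marginal, category_marginal in cells:
        # Empty cells contribute nothing (the limit of p * log p as p -> 0 is 0).
        if joint > 0:
            mi += (joint / n) * math.log2(
                joint * n / (word_marginal * category_marginal)
            )
    return mi
```

Under this assumption, a word whose occurrence is independent of the category scores 0, and a word that occurs in exactly the documents of a category covering half the collection scores 1 bit. A selection step could then rank words by this score and keep the highest-scoring ones as SVM features.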