5 Dec 2008

LibSVM integrated

I finally finished integrating LibSVM into my software. Trying to reproduce Joachims' results on Reuters and Ohsumed-23, I got the following micro-averaged breakeven points:
  • On Reuters: 84.2 (Joachims), 85.9 (me).
  • On Ohsumed-23: 60.7 (Joachims), 64.8 (me).
(the reference paper is this).
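For readers unfamiliar with the measure, here is a minimal sketch of one common way to compute a micro-averaged precision/recall breakeven point. It assumes one binary classifier per category that outputs a real-valued score for every test document. For each category it keeps as many top-ranked documents as there are true positives, so that precision equals recall, and then pools the counts over all categories. The function name `micro_breakeven` and the input layout are hypothetical, and this is not necessarily the exact procedure behind the numbers above.

```python
def micro_breakeven(scores, labels):
    """Micro-averaged precision/recall breakeven point.

    scores[c][i]: score of category c's binary classifier for document i.
    labels[c][i]: True if document i really belongs to category c.
    (Hypothetical layout, chosen only for this illustration.)
    """
    true_pos = 0
    total_pos = 0
    for cat_scores, cat_labels in zip(scores, labels):
        n_pos = sum(cat_labels)  # number of real positives in this category
        # Rank documents from highest to lowest score.
        ranked = sorted(range(len(cat_scores)),
                        key=lambda i: cat_scores[i], reverse=True)
        # Accept exactly n_pos documents: predicted positives == real
        # positives, hence precision == recall for this category.
        true_pos += sum(1 for i in ranked[:n_pos] if cat_labels[i])
        total_pos += n_pos
    # Pooling the counts over categories gives the micro-average.
    return true_pos / total_pos if total_pos else 0.0


if __name__ == "__main__":
    scores = [[0.9, 0.8, 0.1, -0.5], [0.2, -0.3, 0.7, 0.6]]
    labels = [[True, False, True, False], [False, False, True, True]]
    print(micro_breakeven(scores, labels))
```

Because the cutoff is chosen per category from the ranking, the sketch needs real-valued scores rather than hard class predictions.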

The differences may be due to the stopword list (I used the well-known 571-word list from the SMART system, which is almost a standard) and to my own preprocessing (I remove all punctuation marks). The results are really good, but the large difference on Ohsumed is mysterious...
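To make the kind of preprocessing described here concrete, below is a minimal sketch, not the actual code behind these experiments. The stopword set is a tiny illustrative subset rather than the real 571-word SMART list. Lowercasing and replacing punctuation with spaces (instead of deleting it) are assumptions, and the name `tokenize` is hypothetical.

```python
import string

# Tiny illustrative subset; the real SMART list has 571 entries.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is"}

# Map every ASCII punctuation character to a space (an assumption:
# deleting punctuation outright would join words such as "heart-rate").
_PUNCT_TO_SPACE = str.maketrans(string.punctuation,
                                " " * len(string.punctuation))


def tokenize(text, stopwords=STOPWORDS):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    cleaned = text.lower().translate(_PUNCT_TO_SPACE)
    return [tok for tok in cleaned.split() if tok not in stopwords]


if __name__ == "__main__":
    print(tokenize("The heart-rate of the patient, in 1998, is stable."))
```

Small choices like these change the vocabulary, which is one plausible way for two runs on the same corpus to end up with different scores.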

By the way, on my 2 GHz Core 2 Duo, LibSVM training takes 4m28s and classification 1m42s. This is the Java version, and it is still affordable. Who said SVMs were slow?

Over the next few days, I will try to improve my k-NN implementation (at the moment it has no inverted index, so it is terrifyingly slow) and to add another Bayesian network classifier (Sahami's "limited dependence Bayesian classifier"), which I think could be improved in some way to make it competitive with SVMs.
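As an illustration of why an inverted index helps k-NN on sparse text vectors, here is a minimal sketch under some stated assumptions: documents are `{term: weight}` dictionaries that are already L2-normalized (so the dot product is the cosine similarity), and each training document has a single label decided by majority vote. The names `build_index` and `knn_classify` are hypothetical, and a multi-label corpus such as Reuters would need a different voting rule.

```python
import heapq
from collections import Counter, defaultdict


def build_index(train_vectors):
    """Map each term to the list of (doc_id, weight) pairs containing it."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(train_vectors):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def knn_classify(index, train_labels, query, k=3):
    """Label a query by majority vote among its k most similar documents."""
    sims = defaultdict(float)
    # Only documents sharing at least one term with the query are touched,
    # instead of comparing the query against every training document.
    for term, q_weight in query.items():
        for doc_id, weight in index.get(term, ()):
            sims[doc_id] += q_weight * weight
    nearest = heapq.nlargest(k, sims.items(), key=lambda kv: kv[1])
    votes = Counter(train_labels[doc_id] for doc_id, _ in nearest)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    train = [{"heart": 1.0}, {"heart": 0.6, "attack": 0.8},
             {"grain": 1.0}, {"wheat": 0.8, "grain": 0.6}]
    labels = ["cardio", "cardio", "crops", "crops"]
    index = build_index(train)
    print(knn_classify(index, labels, {"heart": 0.8, "attack": 0.6}, k=2))
```

The cost per query then depends on the lengths of the posting lists for the query's terms rather than on the size of the whole training set.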

4 comments:

JJ Merelo said...

Pretty good for an SVM implementation. I still remember the times when it took a well-loaded SGI to do that kind of stuff.

Hieu Kieng said...

Hi, do you still maintain this post? The PRBEP on Ohsumed is pretty interesting. In Joachims98, BEP is calculated via a 1-vs-all strategy, while the built-in strategy in LibSVM is 1-vs-1. So can you explain in more detail how you calculate BEP using LibSVM? I am currently working with SVMs and Ohsumed.

Alfonso E. said...

Dear Hieu,

The problem is what you call "Ohsumed". Rather than "Ohsumed", I should say "Ohsumed-23", where the "23" refers to 23 categories related to heart diseases. See here for more details: http://disi.unitn.it/moschitti/corpora.htm

Hieu Kieng said...

I used the same corpus and I got 62.8 BEP using LibSVM with a linear kernel.
My question is: do you use
1) a binary classifier with 1-vs-all, as in Joachims98,
2) or the built-in multi-class 1-vs-1 supported by LibSVM?
In case 2), how do you calculate the BEP (by tuning which parameter)?