Professional blog of Alfonso E. Romero: Novice problems with LibSVM

4 Dec 2008

If you are working with LibSVM, you must remember the following:

When building sparse vectors with the svm_node datatype, take care to store the indices in ascending order. This is clearly stated in the documentation, but sometimes we are too lazy to read it beforehand.
By default, the output of LibSVM when doing classification is one of {-1,1}. So do not expect real-valued outputs (for instance, the distance to the hyperplane) unless you hack the code yourself. If you are doing text categorization, this is fine for measuring (macro/micro) F1, but not for getting a good accuracy.
You must preprocess your feature vectors first! Joachims proposes tf * idf weighting followed by L2 normalization (the classical Euclidean norm). This is valid for text classification, and it maps every coordinate value to the interval [0,1]. Other normalization schemes are suited to "classic" classification problems such as iris (in those cases, the different attributes are scaled independently to [0,1]).
There is a nasty bug (or missing feature?) in the Java version, in the method "svm_save_model", that makes saving a model very slow because the output is not buffered. To fix it, find this line:
DataOutputStream fp = new DataOutputStream(new FileOutputStream(model_file_name));
And replace it with the following:
DataOutputStream fp = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(model_file_name)));
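
The first and third points above (ascending indices and L2 normalization) can be sketched together. This is only a minimal illustration, not LibSVM code: the class SparseVectorSketch, the method toSortedUnitVector and the Node class are hypothetical names, and Node is a stand-in for svm_node (an index/value pair) so that the example compiles without the LibSVM jar. The input weights are assumed to be already computed (for text, for instance, tf * idf).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SparseVectorSketch {

    // Hypothetical stand-in for LibSVM's svm_node (an index/value pair),
    // used here so the sketch compiles without the LibSVM jar.
    static class Node {
        int index;
        double value;
    }

    // Takes a map from feature index to weight, L2-normalizes the weights
    // and returns the nodes ordered by ascending index.
    static Node[] toSortedUnitVector(Map<Integer, Double> weights) {
        // A TreeMap iterates over its keys in ascending order.
        TreeMap<Integer, Double> sorted = new TreeMap<>();
        double squaredSum = 0.0;
        for (Map.Entry<Integer, Double> e : weights.entrySet()) {
            double w = e.getValue();
            if (w != 0.0) { // zero entries are left out of a sparse vector
                sorted.put(e.getKey(), w);
                squaredSum += w * w;
            }
        }
        double norm = Math.sqrt(squaredSum); // Euclidean (L2) norm

        Node[] nodes = new Node[sorted.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> e : sorted.entrySet()) {
            Node n = new Node();
            n.index = e.getKey();
            n.value = e.getValue() / norm; // not reached when the vector is empty
            nodes[i++] = n;
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Features inserted out of order on purpose.
        Map<Integer, Double> weights = new HashMap<>();
        weights.put(7, 3.0);
        weights.put(2, 4.0);
        for (Node n : toSortedUnitVector(weights)) {
            System.out.println(n.index + ":" + n.value);
        }
    }
}
```

With non-negative weights, as in tf * idf, every normalized value falls in [0,1]. Sorting through a TreeMap is just one convenient choice; sorting an array of nodes by index would work equally well.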

Alejandro Bellogín said...: Just a note (a missing new!):
DataOutputStream fp = new DataOutputStream(new BufferedOutputStream(
new FileOutputStream(model_file_name)));

Bye!

PS: this blog is a very good idea! Don't give up!; 5 Dec 2008, 15:47:00
Alfonso E. said...: Yeah, you're right. I skipped a "new"; it should be correct now. Obviously the code was right ;).

PS: Thanks for your comments and for following me :); 5 Dec 2008, 16:00:00
