11 jun 2011

ReBayCT

[This is a post about a piece of software I released some time ago]

ReBayCT ('Redes Bayesianas para Clasificación en Tesauros', Spanish for "Bayesian networks for classification from a thesaurus") is a console-based tool for running experiments in thesaurus-based indexing, that is to say, text categorization over the set of descriptors of a thesaurus. For more information on this problem, see [1]. It is written in Java (JDK 5.0 or higher required). The code of the project is located here and it is free software (see below).

Several classifiers are implemented in this software: two baselines (VSM and hierarchical VSM) and one algorithm based on Bayesian networks, with both unsupervised and supervised versions. If you use them, please consider citing [2] and [3].

[1] L. M. de Campos, J. M. Fernández-Luna, J. F. Huete, A. E. Romero, Thesaurus Based Automatic Indexing, book chapter in Handbook of Research on Text and Web Mining Technologies. Ed. Idea Group, Inc. USA, 2009, ISBN: 978-1-59904-990-8. Available online at http://www.cs.rhul.ac.uk/~aeromero/pdf/thesaurus.pdf.

[2] L. M. de Campos, A. E. Romero, Bayesian Network Models for Hierarchical Text Classification from a Thesaurus, Int. J. Approx. Reasoning 50(7): 932-944 (2009). Available online at http://www.cs.rhul.ac.uk/~aeromero/pdf/ijar09-thesaurus.pdf.

[3] L. M. de Campos, J. M. Fernández-Luna, J. F. Huete, A. E. Romero, Automatic Indexing from a Thesaurus Using Bayesian Networks: Application to the Classification of Parliamentary Initiatives. ECSQARU 2007: 865-877. In: Lecture Notes in Computer Science 4724 Springer 2007, ISBN 978-3-540-75255-4. Available online at http://www.cs.rhul.ac.uk/~aeromero/pdf/lncs07-ecsqaru-thesaurus.pdf.
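
To give a rough idea of what a VSM (vector space model) baseline does, here is a minimal sketch. It is not ReBayCT's code: the class name `VsmSketch`, its methods, and the toy descriptors are hypothetical, and it assumes raw term frequencies with cosine similarity, whereas a real system would normally add stemming, stop-word removal and some term weighting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a VSM baseline; not taken from ReBayCT.
public class VsmSketch {

    /** Builds a raw term-frequency vector from lower-cased, whitespace-separated text. */
    public static Map<String, Double> termVector(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1.0, Double::sum);
            }
        }
        return tf;
    }

    /** Euclidean norm of a sparse vector. */
    public static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity between two sparse vectors (0 if either one is empty). */
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    /**
     * Ranks descriptors by decreasing cosine similarity to the document.
     * Keys are descriptor names; values are the text used to represent each descriptor.
     */
    public static List<String> rank(String document, Map<String, String> descriptors) {
        Map<String, Double> doc = termVector(document);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, String> d : descriptors.entrySet()) {
            scores.put(d.getKey(), cosine(doc, termVector(d.getValue())));
        }
        List<String> ranked = new ArrayList<>(descriptors.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ranked;
    }

    public static void main(String[] args) {
        // Toy descriptors, invented for illustration only.
        Map<String, String> descriptors = new LinkedHashMap<>();
        descriptors.put("agricultural policy", "agricultural policy farm subsidy");
        descriptors.put("public health", "public health hospital disease");
        System.out.println(rank("new subsidy for olive farm owners", descriptors));
    }
}
```

In this sketch each descriptor is represented only by a short piece of text; in thesaurus-based indexing that text would typically come from the descriptor's label and related entries, which is an assumption here rather than a description of the tool.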

The license of the software package is GNU GPL v3. Please check http://www.gnu.org/licenses/gpl.html for more details.

Note (a) to possible users: I am not maintaining this software (except for fixing small bugs) and I am no longer working on this research topic. So don't expect a new "major release", because it is never going to come out. If you have any questions about how to extend or use it, please ask by leaving a comment on this post or by writing to my gmail account (alfonsoeromero). I'll be glad to answer and to help with your project. I must remind you that derivative works should also be free software, as specified by the GPL (that is, they should carry a compatible license).

Note (b) to possible users: to run this software you need a document collection and the EUROVOC (or another) thesaurus in XML. I cannot distribute EUROVOC, so please try to get a copy yourself (in XML). The dataset I used for the experiments in [2] (parliamentary initiatives of the Parliament of Andalusia) is not entirely public, and I prefer to keep some "control" over it, because I do not really own the data (and it is not very clear whether I am allowed to distribute it), although it could be obtained by parsing public documents available on the Parliament of Andalusia webpage. Anyway, if you need the set of documents, please ask me for it.

18 mar 2011

New paper: "Image zooming based on sampling theorems"

Together with my good friend, Prof. José M. Almira, I have published in Materials Matemàtics a paper entitled "Image zooming based on sampling theorems", which reviews some classic zooming methods (specifically 'sinc interpolation') used in the field of digital image processing. It is a review paper in which we have tried to compile the literature on this topic precisely and to give a formal notation, from a mathematical point of view, for the process of zooming an image. As the abstract says:
In this paper we introduce two digital zoom methods based on sampling theory and we study their mathematical foundation. The first one (usually known by the names of ‘sinc interpolation’, ‘zero-padding’ and ‘Fourier zoom’) is commonly used by the image processing community.
The paper is online here; as the journal is electronic, it can be viewed and downloaded free of charge. I highly encourage you to read it if you are interested in how digital images are zoomed in programs like GIMP or Photoshop.
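
For readers who want to see the 'zero-padding' idea named in the abstract in action, here is a minimal one-dimensional sketch. It is not code from the paper: the class name `FourierZoomSketch` and the method `zoom` are hypothetical, it uses a naive O(n²) DFT instead of an FFT, and for an image one would apply the same step along rows and then along columns.

```java
// Hypothetical 1-D sketch of zero-padding ("Fourier zoom"); not taken from the paper.
public class FourierZoomSketch {

    /**
     * Resamples a real signal of length n to length m >= n by computing its DFT,
     * padding the spectrum with zeros at the high frequencies, and inverting.
     */
    public static double[] zoom(double[] x, int m) {
        int n = x.length;
        if (n == 0 || m < n) {
            throw new IllegalArgumentException("need a non-empty signal and m >= its length");
        }

        // Forward DFT of the input (naive O(n^2) version).
        double[] re = new double[n];
        double[] im = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double a = -2.0 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(a);
                im[k] += x[t] * Math.sin(a);
            }
        }

        // Copy the spectrum into a longer one; untouched bins stay at zero.
        double[] zr = new double[m];
        double[] zi = new double[m];
        for (int k = 0; k < n; k++) {
            if (2 * k < n) {            // non-negative frequencies keep their index
                zr[k] = re[k];
                zi[k] = im[k];
            } else if (2 * k > n) {     // negative frequencies move to the end
                zr[m - n + k] = re[k];
                zi[m - n + k] = im[k];
            } else {                    // Nyquist bin (even n): split to keep the output real
                zr[k] += re[k] / 2;
                zi[k] += im[k] / 2;
                zr[m - k] += re[k] / 2;
                zi[m - k] -= im[k] / 2;
            }
        }

        // Inverse DFT of length m; dividing by n (not m) preserves the amplitude.
        double[] y = new double[m];
        for (int t = 0; t < m; t++) {
            double sum = 0.0;
            for (int k = 0; k < m; k++) {
                double a = 2.0 * Math.PI * k * t / m;
                sum += zr[k] * Math.cos(a) - zi[k] * Math.sin(a);
            }
            y[t] = sum / n;
        }
        return y;
    }

    public static void main(String[] args) {
        double[] zoomed = zoom(new double[] {1, 3, 2, 5}, 8);
        for (double v : zoomed) {
            System.out.printf("%.4f%n", v);
        }
    }
}
```

When m is an integer multiple of n, every original sample reappears unchanged in the zoomed signal (up to rounding), and the new samples in between are the trigonometric interpolation of the original ones.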

I must thank the editors and add that the final version of the paper looks impressive, thanks to the nice LaTeX style used by the journal and the careful editing they have done, polishing its content and adding some descriptive images to those we had already provided.

1 jan 2011

Happy 2011!

2010 was a great year for me, in professional terms. Mainly, I achieved two important goals for my career:

  1. In April, I defended my thesis, and therefore got my PhD.
  2. In September, I started a new job as a postdoc at the Computer Science Department of Royal Holloway, University of London, under the supervision of Dr. Alberto Paccanaro.
On the other hand, I released my first contribution to the free software community: DauroLab, a Java library for large-scale machine learning (still not very mature). I plan to improve it monthly during this new year, with clear and concise objectives.

Besides, in my new research group, I started working on bioinformatics. This is a new research area for me, full of promising and exciting problems, and though I am still learning the basics of this field, I will surely publish a paper on it very soon.

Also, 2010 was a great year for many other reasons. But the most important one is the people I met during it. Thank you, everybody, for your support, for your trust, and for being there in the not-so-good moments.