Professional blog of Alfonso E. Romero

11 jun 2011

ReBayCT

[This post is about a piece of software I released some time ago]

ReBayCT ('Redes Bayesianas para Clasificación en Tesauros', Spanish for "Bayesian networks for classification from a thesaurus") is a console-based tool for running experiments in thesaurus-based indexing, that is to say, text categorization over the set of descriptors of a thesaurus. For more information on this problem, see [1]. It is written in Java (JDK 5.0 or higher required). The code of the project is located here and it is free software (see below).

Several classifiers are implemented in this software: two baselines (VSM, the vector space model, and hierarchical VSM) and one algorithm based on Bayesian networks, with both unsupervised and supervised versions. If you use them, please consider citing [2] and [3].
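
To give a rough idea of what a VSM baseline does in this setting, here is a minimal sketch in Python. It is not ReBayCT's actual Java code: the function names, the toy descriptors and the sample document are all made up for illustration, and it uses raw term counts with cosine similarity, with no term weighting and no use of the thesaurus hierarchy.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_descriptors(document, descriptors):
    """Rank thesaurus descriptors by similarity to the document, best first."""
    doc_vec = Counter(tokenize(document))
    scores = {
        name: cosine(doc_vec, Counter(tokenize(text)))
        for name, text in descriptors.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical descriptors: each one is described by a few associated terms.
DESCRIPTORS = {
    "fisheries policy": "fisheries policy fishing fleet quotas",
    "water management": "water management irrigation reservoirs",
}
DOC = "the fishing fleet asks for a new fisheries policy"

for name, score in rank_descriptors(DOC, DESCRIPTORS):
    print(name, round(score, 3))
```

The descriptors with the highest scores would be the ones proposed as index terms for the document.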

[1] L. M. de Campos, J. M. Fernández-Luna, J. F. Huete, A. E. Romero, Thesaurus Based Automatic Indexing, book chapter in Handbook of Research on Text and Web Mining Technologies. Ed. Idea Group, Inc. USA, 2009, ISBN: 978-1-59904-990-8. Available online at http://www.cs.rhul.ac.uk/~aeromero/pdf/thesaurus.pdf.

[2] L. M. de Campos, A. E. Romero, Bayesian Network Models for Hierarchical Text Classification from a Thesaurus, Int. J. Approx. Reasoning 50(7): 932-944 (2009). Available online at http://www.cs.rhul.ac.uk/~aeromero/pdf/ijar09-thesaurus.pdf.

[3] L. M. de Campos, J. M. Fernández-Luna, J. F. Huete, A. E. Romero, Automatic Indexing from a Thesaurus Using Bayesian Networks: Application to the Classification of Parliamentary Initiatives. ECSQARU 2007: 865-877. In: Lecture Notes in Computer Science 4724 Springer 2007, ISBN 978-3-540-75255-4. Available online at http://www.cs.rhul.ac.uk/~aeromero/pdf/lncs07-ecsqaru-thesaurus.pdf.

The license of the software package is GNU GPL v3. Please check http://www.gnu.org/licenses/gpl.html for more details.

Note (a) to possible users: I am not maintaining this software (except for small bugs) and I'm not working on this research topic now. So, don't expect a new "major release", because it's never going to come out. If you have any questions about how to extend or use it, please ask by leaving a comment on this post or by writing to my Gmail account (alfonsoeromero). I'll be glad to answer and to help with your project. I must remind you that derivative works must also be free software, as required by the GPL (that is, they must carry a compatible license).

Note (b) to possible users: to run this software you need a document collection and the EUROVOC (or another) thesaurus in XML. I cannot distribute EUROVOC, so please try to get a copy yourself (in XML). The dataset I used for the experiments in [2] (parliamentary initiatives of the Parliament of Andalusia) is not entirely public, and I prefer to keep some "control" over it, because I do not really own the data (and it's not very clear whether I am allowed to distribute it), although it could be obtained by parsing the public documents available on the Parliament of Andalusia webpage. Anyway, if you need the set of documents, please ask me for them.

18 mar 2011

New paper: "Image zooming based on sampling theorems"

Together with my good friend, Prof. José M. Almira, I have published in Materials Matemàtics a paper entitled "Image zooming based on sampling theorems", which reviews some classic zooming methods (specifically 'sinc interpolation') used in the field of digital image processing. It is a review paper, in which we have tried to compile the literature on this topic precisely and to give a formal description, from a mathematical point of view, of the process of zooming an image. As the abstract says:

In this paper we introduce two digital zoom methods based on sampling theory and we study their mathematical foundation. The first one (usually known by the names of ‘sinc interpolation’, ‘zero-padding’ and ‘Fourier zoom’) is commonly used by the image processing community.

The paper is online here, as the journal is electronic, and it can be read and downloaded at no cost. I highly encourage you to read it if you are interested in how digital images can be zoomed in programs like The GIMP or Photoshop.
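
As a rough illustration of the 'zero-padding' idea mentioned in the abstract, here is a minimal one-dimensional sketch in Python. It is not taken from the paper: the function names are hypothetical, the DFT is a naive O(n²) one that is only suitable for tiny inputs, and the way the Nyquist bin is split for even lengths is one common convention, assumed here so that the output stays real.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse includes the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def fourier_zoom(signal, factor):
    """Upsample a real 1-D signal by an integer factor by zero-padding its spectrum."""
    n = len(signal)
    m = n * factor
    spectrum = dft(signal)
    padded = [0j] * m
    for k in range(n):
        if 2 * k < n:        # non-negative frequencies stay at the start
            padded[k] += spectrum[k]
        elif 2 * k > n:      # negative frequencies move to the end
            padded[m - (n - k)] += spectrum[k]
        else:                # Nyquist bin (even n only): split it in two halves
            padded[k] += spectrum[k] / 2
            padded[m - k] += spectrum[k] / 2
    # Multiply by `factor` to compensate for the 1/m of the longer inverse DFT.
    return [factor * v.real for v in dft(padded, inverse=True)]


samples = [0.0, 1.0, 0.0, -1.0, 0.5, 2.0]
zoomed = fourier_zoom(samples, 3)
print([round(v, 3) for v in zoomed])
```

Every third value of the zoomed signal coincides with an original sample, and the values in between are the interpolated ones. For an image, the same operation would typically be applied along the rows and then along the columns, since the two-dimensional DFT is separable.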

I must thank the editors and add that the final version of the paper looks impressive, thanks to the nice LaTeX style used by the journal and to their careful editing, which polished its content and added some descriptive images to those we had already provided.

1 jan 2011

Happy 2011!

2010 was a great year for me, in professional terms. Mainly, I achieved two important goals for my career:

In April, I defended my thesis, and therefore I got my PhD.
In September, I started a new job as a postdoc at the Computer Science Department of Royal Holloway, University of London, under the supervision of Dr. Alberto Paccanaro.

In addition, I released my first contribution to the free software community: DauroLab, a Java library for large-scale machine learning (still not very mature). I plan to improve it monthly during this new year, with clear and concise objectives.

Besides, in my new research group, I started working in bioinformatics. This is a new research area for me, full of promising and exciting problems, and though I am still learning the basics of this field, I will surely publish a paper on it very soon.

2010 was also a great year for many other reasons. But the most important one is the people I met during it. Thank you, everybody, for your support, for your trust, and for being there in the not-so-good moments.

9 sept 2010

Got a Postdoc!

Sorry for not updating in so long. Anyway, I have some exciting news!

Since September 1 I have been a postdoctoral research assistant in the Computational Biology group of the Computer Science Department at Royal Holloway, University of London, under the supervision of Alberto Paccanaro. Although it might sound really "biological", the group searches for solutions to biological problems using models based on machine learning. So, in the end, it is machine learning applied to something (biology, in this case).

The position is for 1.5 years (until the end of February 2012), and it is not renewable. For me it is a great opportunity to get started in the world of Computational Biology, in a high-level group. I would also like to keep studying Machine Learning and to try to develop some work along a more "pure" and "formal" line (but this is not a real priority).

The University seems to have a really nice working atmosphere, and all my colleagues are fantastic. The department is small, but it has some great figures in Computer Science (Vladimir Vapnik since his retirement this year, Alexey Chervonenkis or Glenn Shafer, among others). Moreover, the fact that it is not located in the centre of London makes the campus a quiet and peaceful place (perfect for thinking and working).

I think it is going to be one of the most important periods of my research career (and probably my life).

6 may 2010

Manuscript of the thesis

I have uploaded the manuscript (and the slides) of my thesis (entitled Document Classification Models Based on Bayesian Networks) to the "publications" section of my webpage, so you can have a look at it if you want. All comments are welcome.

29 apr 2010

PhD Defended

[Photo: "Defending the Thesis", originally uploaded by AlfonsoERomero]

At last! And with a "cum laude" (the maximum mark) as the result. Now it is time to look for a postdoc.

5 apr 2010

Looking for a postdoc

On April 27* (originally April 20) I will defend my PhD thesis entitled "Document Classification Models based on Bayesian Networks". It has been a long road, and if everything goes well, I will be a doctor by the end of this month.

In order to broaden my scientific interests, I have decided to go for a postdoc in Europe. My current research interests are Document Classification, Information Retrieval and Bayesian networks. That said, I am interested in all approaches to and applications of Machine Learning (not necessarily on documents), and in fact I am open to any research topic with a strong theoretical foundation.

I have a degree (Ingeniería, roughly a BEng + MSc) in Computer Science and will soon have a PhD in Computer Science, and I think I have a decent list of publications (here are some of them).

So, if you hear of an interesting offer (or if you can perhaps make me one), I would be grateful to hear about it. My email: alfonsoeromero (AT) gmail (DOT) com.

_____________________

* In the end, I had some problems due to the famous volcano, and the defense took place on April 27.