Lucene for Information Access and Retrieval Research (LIARR)

SIGIR2017 Workshop: Less yaking, more hacking!

Important Dates


Less yaking, more hacking

Open Source Information Retrieval (IR) has attracted a lot of attention and, in turn, many toolkits. Over the years, Lucene and its expansions PyLucene, Solr and Elasticsearch have grown to be the dominant Open Source IR toolkits used in industry. However, unlike the Open Source IR toolkits developed by academics (e.g., Indri, Lemur, Terrier, Wumpus and Zettair), Lucene et al. have been less focused on evaluation and experimentation, and so are not as well developed for IR research and evaluation. For example, it is not particularly clear how to perform TREC-based evaluations using such toolkits, or how to modify the underlying code bases to experiment with new methods and retrieval models.

However, there have been two recent initiatives, Anserini and Lucene4IR, for developing add-ons that let IR researchers work with Lucene, along with a raft of other independent code bases. So it is timely to bring the community together and see how we can develop these resources collaboratively. By working together and with the Open Source community that supports Lucene, the IR community can have greater impact on industry, because we will be able to transfer knowledge more efficiently, increase the reproducibility of the methods we develop, and encourage greater collaboration between academia and industry.

The purpose of this proposed workshop is to bring together the community of researchers using Lucene and its derivatives like Solr and Elasticsearch (referred to as simply “Lucene” below), and to develop tools for IR research. Rather than being a “mini-conference”, the workshop will be more like a hackathon where participants will learn about Lucene and work on code. Presentations are meant only as a tool for structuring and guiding the efforts of attendees.

The goals of this workshop are:

Scope & Topics

The aim is to take state of the art methods and develop prototype implementations, where we will focus on:

Submission Guidelines

We seek submissions and contributions that describe and detail how to develop various components, algorithms, etc., using Lucene-based tools, e.g. how-to guides, overviews of code developed, etc. We also seek pitches that outline and describe components that participants would like to have in Lucene-based tools, e.g. different parsers, learning to rank, TREC indexers, etc.

Essentially, we would like to give participants the opportunity to showcase some of the tools that they have been developing using Lucene et al., to provide training on how to use the different functionality that Lucene et al. provide, and to suggest directions on what we should hack.

Submissions should be uploaded to EasyChair, where they will be reviewed by the committee, and a coherent set will be chosen for presentation. However, we will be as inclusive as possible, and include all acceptable works in the proceedings to showcase the work being undertaken.

All pitches and demos should be in ACM format.