Important Dates

Pitches and Demos submission: 20 June 2017
Notification of acceptance: 10 July 2017
Camera ready: 15 July 2017
Workshop: 11 August 2017

Overview

Less yakking, more hacking

Open Source Information Retrieval (IR) has attracted a lot of attention and, in turn, many toolkits. Over the years, Lucene and its expansions, PyLucene, Solr and Elasticsearch, have grown to be the dominant Open Source IR toolkits used in industry. However, unlike the Open Source IR toolkits developed by academics (e.g., Indri, Lemur, Terrier, Wumpus and Zettair), Lucene et al. have been less focused on evaluation and experimentation, and so are less well suited to IR research and evaluation. For example, it is not particularly clear how to perform TREC-based evaluations using such toolkits, or how to modify the underlying code bases to experiment with new methods and retrieval models.

However, there have been two recent initiatives, Anserini and Lucene4IR, for developing add-ons that let IR researchers work with Lucene, along with a raft of other independent code bases. So it is timely to bring the community together and look at how we can develop these resources collaboratively. By working together and with the Open Source community that supports Lucene, the IR community can have greater impact on industry, because we will be able to transfer knowledge more efficiently, increase the reproducibility of the methods we develop, and encourage greater collaboration between academia and industry.

The purpose of this proposed workshop is to bring together the community of researchers using Lucene and its derivatives, like Solr and Elasticsearch (referred to simply as “Lucene” below), to develop tools for IR research. Rather than being a “mini-conference”, the workshop will be more like a hackathon where participants will learn about Lucene and work on code. Presentations are meant only as a tool for structuring and guiding the efforts of attendees.

The goals of this workshop are:

to create a development plan and common codebase for IR research with Lucene,
to implement various information retrieval methods in Lucene/Solr/Elasticsearch and
to evaluate the quality of such methods and models.

Scope & Topics

The aim is to take state-of-the-art methods and develop prototype implementations, focusing on:

exposing the standard functions that we need to have access to when we want to code up a retrieval model;
getting some of the core retrieval functions into Lucene (those that are not there already);
providing an understanding of how some of the functions are implemented in Lucene and how they deviate from how they are commonly understood in IR (e.g., field search in Elasticsearch);
providing a roadmap and a set of guidelines for researchers and developers on which models/algorithms/techniques the community should include next in Lucene, and how this should be done.

Submission Guidelines

We seek submissions and contributions that describe and detail how to develop various components, algorithms, etc., using Lucene-based tools, e.g., how-to guides, overviews of code developed, etc. We also seek pitches that outline and describe components that participants would like to have in Lucene-based tools, e.g., different parsers, learning to rank, TREC indexers, etc.

Essentially, we would like to provide participants with the opportunity to showcase some of the tools that they have been developing using Lucene et al., to provide training on how to use the different functionality that Lucene et al. provide, and to suggest directions on what we should hack.

Pitches: 1–2 page outlines of the algorithms/components/features/etc. to be developed, a brief rationale for creating them, a sketch of how they might be implemented, links to relevant papers and existing code, along with other relevant information, e.g., how it might lead to reproducibility experiments, how it could be used, etc.
Demos: 1–2 page outlines of demos/tools/code developed, what it does, how it works, etc.

Submissions should be uploaded to EasyChair via https://easychair.org/conferences/?conf=liarr2017, where they will be reviewed by the committee, and a coherent set will be chosen for presentation. However, we will be as inclusive as possible, and include all acceptable works in the proceedings to showcase the work being undertaken.

All pitches and demos should be in ACM format.

Committee

Leif Azzopardi (University of Strathclyde)
Matt Crane (University of Waterloo)
Hui Fang (University of Delaware)
Grant Ingersoll (Lucidworks)
Jimmy Lin (University of Waterloo)
Yashar Moshfeghi (University of Glasgow)
Harrisen Scells (Queensland University of Technology)
Peilin Yang (University of Delaware)
Guido Zuccon (Queensland University of Technology)