How to get all documents in Lucene?

If you are new to Lucene, you may be wondering how to get all documents in a Lucene index. We can easily get documents matching a particular term or query, but how do we get all the documents in the index?

We can get all documents in Lucene either by using MatchAllDocsQuery or by passing *:* as the search string to QueryParser.

In this blog post, we will look at both ways with fully working code to get all documents from the Lucene index.

Get all documents in Lucene using MatchAllDocsQuery

Lucene provides a special query called MatchAllDocsQuery. As the name suggests, this query matches all the documents contained in the index. You can create this query as follows:

//This query will match all documents in the index
Query query = new MatchAllDocsQuery();
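To see why MatchAllDocsQuery is different from a term query, here is a minimal toy model in plain Java with no Lucene dependency. It is a simplified sketch, not Lucene's implementation, and the class ToyIndex and its methods are hypothetical. A term lookup reads the posting list stored for one term, while "match all" never touches postings at all: it simply walks every document id from 0 up to the number of documents, skipping deleted ones. Lucene's MatchAllDocsQuery works along similar lines, iterating over all document ids in each index segment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy index: a simplified model, not Lucene's real data structures.
class ToyIndex {
    // Stored "title" values, indexed by document id (0, 1, 2, ...).
    private final List<String> titles = new ArrayList<>();
    // Inverted index: term -> ids of the documents containing that term.
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // Ids of deleted documents; they keep their slot but must not be returned.
    private final Set<Integer> deleted = new HashSet<>();

    // Adds a document and returns its id.
    int addDocument(String title) {
        int docId = titles.size();
        titles.add(title);
        // Very rough "analysis": lowercase, split on whitespace, de-duplicate terms.
        for (String term : new HashSet<>(List.of(title.toLowerCase(Locale.ROOT).split("\\s+")))) {
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    void deleteDocument(int docId) {
        deleted.add(docId);
    }

    // Term query: only the posting list of one (already lowercased) term is consulted.
    List<Integer> termQuery(String term) {
        List<Integer> result = new ArrayList<>();
        for (int docId : postings.getOrDefault(term, Collections.emptyList())) {
            if (!deleted.contains(docId)) {
                result.add(docId);
            }
        }
        return result;
    }

    // Match-all query: no term lookup, just every live document id in order.
    List<Integer> matchAll() {
        List<Integer> result = new ArrayList<>();
        for (int docId = 0; docId < titles.size(); docId++) {
            if (!deleted.contains(docId)) {
                result.add(docId);
            }
        }
        return result;
    }

    String title(int docId) {
        return titles.get(docId);
    }
}
```

For example, after adding "This is document 1", "Another title" and "This is document 3" and then deleting document 2, termQuery("document") returns [0] while matchAll() returns [0, 1]: the second document contains no "document" term but is still matched by match-all.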

Get all documents in Lucene using QueryParser

The other way is to create the query with Lucene's QueryParser, passing *:* as the search string. Read literally, *:* means "any term in any field"; the classic QueryParser recognizes this special syntax and turns it into a MatchAllDocsQuery, so it matches all documents. Here is the code to get all documents using this method:

QueryParser queryParser = new QueryParser("title", analyzer);
//We pass in the special syntax *:*, which the classic QueryParser turns into a MatchAllDocsQuery.
Query query = queryParser.parse("*:*");
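The *:* special case can be modeled in a few lines of plain Java. This is a hypothetical sketch: the class ToyQueryParser and its nested query types are invented for illustration and are not the real QueryParser grammar. It only handles field:term, a bare term searched in a default field, and *:*, which it maps to a match-all query instead of trying to expand "every term in every field". The classic QueryParser performs a similar check when it builds a wildcard query whose field and term are both "*".

```java
import java.util.Locale;

// Hypothetical, heavily simplified parser; NOT the real Lucene QueryParser grammar.
class ToyQueryParser {
    // Parsed query shapes this toy parser can produce.
    interface ToyQuery {}

    static final class MatchAll implements ToyQuery {
        @Override public String toString() { return "*:*"; }
    }

    static final class Term implements ToyQuery {
        final String field;
        final String text;
        Term(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    private final String defaultField;

    ToyQueryParser(String defaultField) {
        this.defaultField = defaultField;
    }

    ToyQuery parse(String queryString) {
        String s = queryString.trim();
        // Special case: "any term in any field" becomes a match-all query.
        if (s.equals("*:*")) {
            return new MatchAll();
        }
        int colon = s.indexOf(':');
        if (colon > 0) {
            // Explicit field, e.g. "body:Index" -> field "body", term "index".
            return new Term(s.substring(0, colon), s.substring(colon + 1).toLowerCase(Locale.ROOT));
        }
        // No field given: search the default field, like QueryParser("title", analyzer).
        return new Term(defaultField, s.toLowerCase(Locale.ROOT));
    }
}
```

With new ToyQueryParser("title"), parsing "*:*" yields a MatchAll, while parsing "Lucene" yields the term query title:lucene.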

Working example to get all documents in Lucene

As promised, here is the fully functional code to get all documents present in the Lucene index.

In this example, we first index 100 documents into our index. Then we present the two ways to fetch all the documents.

package upmanyu.ishan.lucene.tutorial;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

/**
 * This class illustrates how to get all documents in a Lucene index.
 */
public class GetAllDocs {

    /**
     * Get all documents in a Lucene index using the query parser syntax *:*
     *
     * @param indexPath the path where the index is stored
     * @throws IOException    if the index cannot be read
     * @throws ParseException if the query string cannot be parsed
     */
    public void getAllDocsWithQueryParser(String indexPath) throws IOException, ParseException {

        //We open the index directory and an IndexReader on it. We will use the standard analyzer
        //while we parse our query. try-with-resources closes all three when we are done.
        try (Directory directory = FSDirectory.open(Paths.get(indexPath));
             IndexReader indexReader = DirectoryReader.open(directory);
             Analyzer analyzer = new StandardAnalyzer()) {

            //IndexSearcher will help us query the index
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);

            //This query parser searches the title field by default if no field is specified.
            //It also uses our StandardAnalyzer to create terms for the query.
            QueryParser queryParser = new QueryParser("title", analyzer);

            //We pass in the special syntax *:*, which the classic QueryParser turns into a MatchAllDocsQuery.
            Query query = queryParser.parse("*:*");

            searchAndPrintResults(indexSearcher, query);
        }
    }

    /**
     * Get all documents in a Lucene index using MatchAllDocsQuery.
     *
     * @param indexPath the path where the index is stored
     * @throws IOException if the index cannot be read
     */
    public void getAllDocsWithMatchAllDocsQuery(String indexPath) throws IOException {

        //We open the index directory and an IndexReader on it; try-with-resources closes both.
        try (Directory directory = FSDirectory.open(Paths.get(indexPath));
             IndexReader indexReader = DirectoryReader.open(directory)) {

            //IndexSearcher will help us query the index
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);

            //This query will match all documents in the index
            Query query = new MatchAllDocsQuery();

            searchAndPrintResults(indexSearcher, query);
        }
    }

    private void searchAndPrintResults(IndexSearcher indexSearcher, Query query) throws IOException {
        //We perform the search and get search results 10 at a time.
        TopDocs topDocs = indexSearcher.search(query, 10);

        //Print the count of matching documents.
        long totalHits = topDocs.totalHits.value;
        System.out.println(String.format("Found %d hits.", totalHits));

        //Print the title field of each matching document, one page at a time
        while (topDocs.scoreDocs.length != 0) {
            ScoreDoc[] results = topDocs.scoreDocs;
            for (ScoreDoc scoreDoc : results) {

                //The internal id of the document matching the query
                int docId = scoreDoc.doc;

                //We fetch the complete stored document from the index via its id
                Document document = indexSearcher.doc(docId);

                //Now we print the title of the document
                System.out.println(String.format("Found: %s", document.get("title")));
            }

            //We fetch the last doc of this page. We pass it to the IndexSearcher to get the next page.
            ScoreDoc lastDoc = results[results.length - 1];

            //Get the next 10 documents after lastDoc. This gets us the next page of search results.
            topDocs = indexSearcher.searchAfter(lastDoc, query, 10);
        }
    }

    /**
     * This method adds 100 documents to our index.
     *
     * @param indexPath the path where we want to store the index
     * @throws IOException if the index cannot be written
     */
    public void index(String indexPath) throws IOException {

        //We open a file system directory as we want to store the index on our local file system.
        //The analyzer performs analysis on the text of documents and creates the terms that will be
        //added to the index.
        try (Directory directory = FSDirectory.open(Paths.get(indexPath));
             Analyzer analyzer = new StandardAnalyzer()) {

            IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);

            //This always overwrites the existing index, so even if we run the program multiple times
            //we won't see duplicate documents.
            indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

            //Closing the IndexWriter (done here by try-with-resources) commits the added documents.
            try (IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig)) {

                System.out.println("Going to index 100 documents.");

                //Now we create 100 documents. Each document has only one field, called title.
                for (int i = 1; i <= 100; i++) {
                    Document doc = new Document();
                    doc.add(new TextField("title", "This is document " + i, Field.Store.YES));
                    indexWriter.addDocument(doc);
                }
            }

            System.out.println("Documents Indexed Successfully!");
        }
    }

    /**
     * The entry point of the application.
     *
     * @param args the input arguments
     * @throws IOException    if the index cannot be read or written
     * @throws ParseException if the query string cannot be parsed
     */
    public static void main(String[] args) throws IOException, ParseException {
        String indexPath = "index";
        GetAllDocs getAllDocsExample = new GetAllDocs();
        getAllDocsExample.index(indexPath);

        System.out.println("Here are all documents fetched with QueryParser syntax");
        getAllDocsExample.getAllDocsWithQueryParser(indexPath);

        System.out.println("\n");
        System.out.println("Here are all documents fetched with MatchAllDocsQuery");
        getAllDocsExample.getAllDocsWithMatchAllDocsQuery(indexPath);
    }
}
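The paging loop in searchAndPrintResults relies on searchAfter, which resumes right after the last hit of the previous page instead of skipping a fixed number of results. Below is a simplified, Lucene-free model of that idea (keyset pagination). The Hit class and the pageAfter and allDocIdsInPages methods are hypothetical, and the sketch assumes hits are ranked by descending score with ties broken by ascending document id, which is how Lucene orders hits that share a score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of "search after" paging; not Lucene's implementation.
class SearchAfterModel {
    // A hypothetical hit: internal doc id plus score.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Ranking order: higher score first, then lower doc id first on ties.
    static final Comparator<Hit> RANKING =
        Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparingInt(h -> h.doc);

    // Returns up to pageSize hits ranked strictly after 'after' (the first page if after is null).
    static List<Hit> pageAfter(List<Hit> allHits, Hit after, int pageSize) {
        List<Hit> sorted = new ArrayList<>(allHits);
        sorted.sort(RANKING);
        List<Hit> page = new ArrayList<>();
        for (Hit hit : sorted) {
            if (after != null && RANKING.compare(hit, after) <= 0) {
                continue; // already returned on an earlier page
            }
            page.add(hit);
            if (page.size() == pageSize) {
                break;
            }
        }
        return page;
    }

    // Mirrors the while loop in searchAndPrintResults: keep fetching pages until one comes back empty.
    static List<Integer> allDocIdsInPages(List<Hit> allHits, int pageSize) {
        List<Integer> docIds = new ArrayList<>();
        List<Hit> page = pageAfter(allHits, null, pageSize);
        while (!page.isEmpty()) {
            for (Hit hit : page) {
                docIds.add(hit.doc);
            }
            Hit lastHit = page.get(page.size() - 1);
            page = pageAfter(allHits, lastHit, pageSize);
        }
        return docIds;
    }
}
```

With 25 equally scored hits (MatchAllDocsQuery gives every document the same constant score) and a page size of 10, allDocIdsInPages returns doc ids 0 through 24 over three non-empty pages, with no duplicates and no gaps. The tie-break on doc id is what makes this work: without it, hits with equal scores would have no well-defined "after" position.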

Conclusion

In this post, we saw two ways to get all documents in Lucene. We first introduced both methods and then gave a fully working example that uses each of them.

What next?

Now that you are enjoying working with Lucene, have you ever wondered how Lucene returns search results so fast? It's because Lucene is powered by an awesome data structure known as the inverted index. Read all about it here.