If you are new to Lucene, you may be wondering how to get all documents in a Lucene index. Getting the documents that match a particular term or query is easy, but how do we get every document in the index?
We can get all documents in Lucene either by using MatchAllDocsQuery or by passing *:* as the search string to QueryParser.
In this blog post, we will look at both ways, with fully working code to get all documents from a Lucene index.
Get all documents in Lucene using MatchAllDocsQuery
Lucene provides a special query called MatchAllDocsQuery. As the name suggests, this query matches every document in the index. You can create it as follows:
// This query matches every document in the index
Query query = new MatchAllDocsQuery();
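To try the query without touching the file system, here is a minimal sketch that builds a tiny in-memory index and runs MatchAllDocsQuery against it. The class name MatchAllDocsSketch, the helper titlesOfAllDocs, and the use of ByteBuffersDirectory are illustrative assumptions rather than part of the main example, and the code assumes Lucene 8.x or 9.x on the classpath.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class MatchAllDocsSketch {

    // Indexes `count` small documents in memory and returns the stored
    // "title" of every document matched by MatchAllDocsQuery.
    static List<String> titlesOfAllDocs(int count) throws IOException {
        try (Directory directory = new ByteBuffersDirectory()) {
            try (StandardAnalyzer analyzer = new StandardAnalyzer();
                 IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
                for (int i = 1; i <= count; i++) {
                    Document doc = new Document();
                    doc.add(new TextField("title", "This is document " + i, Field.Store.YES));
                    writer.addDocument(doc);
                }
            }

            try (IndexReader reader = DirectoryReader.open(directory)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new MatchAllDocsQuery();

                // Ask for as many hits as there are documents
                // (at least 1, because search() rejects a limit of 0).
                TopDocs topDocs = searcher.search(query, Math.max(1, reader.numDocs()));

                List<String> titles = new ArrayList<>();
                for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                    titles.add(searcher.doc(scoreDoc.doc).get("title"));
                }
                return titles;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        for (String title : titlesOfAllDocs(3)) {
            System.out.println(title);
        }
    }
}
```

Requesting every hit in one call is fine for a small index like this one. For a large index, fetching results page by page is usually the safer choice because it keeps memory use bounded.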
Get all documents in Lucene using QueryParser
The other way is to build the query with Lucene's QueryParser, passing *:* as the search string. The classic query parser treats this string as a special case and turns it into a match-all query, so every document matches regardless of the default field. Here is the code to get all documents using this method:
QueryParser queryParser = new QueryParser("title", analyzer);
// *:* is special syntax that the parser turns into a match-all query.
Query query = queryParser.parse("*:*");
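A quick way to see what the parser produces for *:* is to inspect the parsed query. The sketch below assumes Lucene 8.x or 9.x with the lucene-queryparser module on the classpath; the class name StarQuerySketch and the helper parseMatchAll are hypothetical names used only for illustration. With the classic parser, the parsed query should be a MatchAllDocsQuery, which means both approaches in this post end up running the same query.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;

// Hypothetical helper class, for illustration only.
final class StarQuerySketch {

    // Parses the special *:* syntax with the classic QueryParser.
    static Query parseMatchAll() throws ParseException {
        try (StandardAnalyzer analyzer = new StandardAnalyzer()) {
            // "title" is only the default field; *:* does not use it.
            QueryParser parser = new QueryParser("title", analyzer);
            return parser.parse("*:*");
        }
    }

    public static void main(String[] args) throws ParseException {
        Query parsed = parseMatchAll();

        // Print the concrete query class chosen by the parser.
        System.out.println(parsed.getClass().getSimpleName());

        // Compare against a directly constructed match-all query.
        System.out.println(parsed.equals(new MatchAllDocsQuery()));
    }
}
```

If you never accept query strings from users, constructing MatchAllDocsQuery directly avoids the parser and the checked ParseException altogether.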
Working example to get all documents in Lucene
As promised, here is the fully functional code to get all documents present in the Lucene index.
In this example, we first add 100 documents to our index. Then we fetch all of them back using each of the two approaches.
package upmanyu.ishan.lucene.tutorial;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

/**
 * This class illustrates how to get all documents in a Lucene index.
 */
public class GetAllDocs {

    /**
     * Get all documents in the Lucene index using the query parser syntax *:*
     *
     * @param indexPath the path where the index is stored
     * @throws IOException    if the index cannot be read
     * @throws ParseException if the query string cannot be parsed
     */
    public void getAllDocsWithQueryParser(String indexPath) throws IOException, ParseException {

        // Open the directory and an IndexReader on the index stored at indexPath.
        // try-with-resources closes both once the search is done.
        try (Directory directory = FSDirectory.open(Paths.get(indexPath));
             IndexReader indexReader = DirectoryReader.open(directory)) {

            // IndexSearcher will help us query the index
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);

            // We will use the standard analyzer while we parse our query.
            Analyzer analyzer = new StandardAnalyzer();

            // This query parser searches the title field by default if no field is specified.
            // It uses our standard analyzer to create terms for the query.
            QueryParser queryParser = new QueryParser("title", analyzer);

            // *:* is special syntax that the parser turns into a match-all query.
            Query query = queryParser.parse("*:*");

            searchAndPrintResults(indexSearcher, query);
        }
    }

    /**
     * Get all documents in the Lucene index using MatchAllDocsQuery.
     *
     * @param indexPath the path where the index is stored
     * @throws IOException if the index cannot be read
     */
    public void getAllDocsWithMatchAllDocsQuery(String indexPath) throws IOException {

        // Open the directory and an IndexReader on the index stored at indexPath.
        // try-with-resources closes both once the search is done.
        try (Directory directory = FSDirectory.open(Paths.get(indexPath));
             IndexReader indexReader = DirectoryReader.open(directory)) {

            // IndexSearcher will help us query the index
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);

            // This query matches every document in the index
            Query query = new MatchAllDocsQuery();

            searchAndPrintResults(indexSearcher, query);
        }
    }

    private void searchAndPrintResults(IndexSearcher indexSearcher, Query query) throws IOException {
        // We perform the search and get the results 10 at a time.
        TopDocs topDocs = indexSearcher.search(query, 10);

        // Print the count of matching documents. For a small index like ours this is
        // the exact count; on large indexes, check topDocs.totalHits.relation as well.
        long totalHits = topDocs.totalHits.value;
        System.out.println(String.format("Found %d hits.", totalHits));

        // Print the title field of each matching document, one page at a time
        while (topDocs.scoreDocs.length != 0) {
            ScoreDoc[] results = topDocs.scoreDocs;
            for (ScoreDoc scoreDoc : results) {

                // The id of the document matching the query
                int docId = scoreDoc.doc;

                // We fetch the complete document from the index via its id
                Document movie = indexSearcher.doc(docId);

                // Now we print the title of the document
                System.out.println(String.format("Found: %s", movie.get("title")));
            }

            // Remember the last doc of this page. We pass it to the index searcher to get the next page.
            ScoreDoc lastDoc = results[results.length - 1];

            // Get the next 10 documents after lastDoc. This is the next page of search results.
            topDocs = indexSearcher.searchAfter(lastDoc, query, 10);
        }
    }

    /**
     * This method adds 100 documents to our index.
     *
     * @param indexPath the path where we want to store the index
     * @throws IOException if the index cannot be written
     */
    public void index(String indexPath) throws IOException {

        // The analyzer performs analysis on the text of documents and creates the terms
        // that will be added to the index.
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);

        // This will always overwrite the existing index. This way, even if we run the program
        // multiple times, we won't see duplicate documents.
        indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

        // We open a file system directory as we want to store the index on our local file system.
        // try-with-resources commits and closes the writer even if indexing fails midway.
        try (Directory directory = FSDirectory.open(Paths.get(indexPath));
             IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig)) {

            System.out.println("Going to index 100 documents.");

            // Now we create 100 documents. Each document has a single field called title.
            for (int i = 1; i <= 100; i++) {
                Document doc = new Document();
                doc.add(new TextField("title", "This is document " + i, Field.Store.YES));
                indexWriter.addDocument(doc);
            }
        }

        System.out.println("Documents Indexed Successfully!");
    }

    /**
     * The entry point of the application.
     *
     * @param args the input arguments
     * @throws IOException    if the index cannot be read or written
     * @throws ParseException if the query string cannot be parsed
     */
    public static void main(String[] args) throws IOException, ParseException {
        String indexPath = "index";
        GetAllDocs getAllDocsExample = new GetAllDocs();
        getAllDocsExample.index(indexPath);

        System.out.println("Here are all documents fetched with QueryParser Syntax");
        getAllDocsExample.getAllDocsWithQueryParser(indexPath);

        System.out.println("\n");
        System.out.println("Here are all documents fetched with MatchAllDocsQuery");
        getAllDocsExample.getAllDocsWithMatchAllDocsQuery(indexPath);
    }
}
Conclusion
In this post, we saw how to get all documents in Lucene using two methods. We first introduced the methods and then we gave a detailed example of how to use both methods.
What next?
You now seem to be enjoying working with Lucene, but have you ever wondered how Lucene is able to return search results so fast? That is because it is powered by an awesome data structure known as the inverted index. Read all about it here.