Interface IFieldExtractor

    public interface IFieldExtractor
    Provides methods for extracting fields from a document.

    The example demonstrates how to implement the interface.

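     import java.io.File;
     import java.io.IOException;
     import java.nio.charset.StandardCharsets;
     import java.nio.file.Files;
     import java.nio.file.Paths;
     import java.util.List;

     public class LogExtractor implements IFieldExtractor {
         private final String[] extensions = new String[] { ".log" };
         public final String[] getExtensions() { return extensions; }
         // getFields(InputStream) is also declared by the interface and must be implemented too (omitted here)
         public final DocumentField[] getFields(String filePath) {
             File file = new File(filePath);
             DocumentField[] fields = new DocumentField[] {
                 new DocumentField("FileName", file.getAbsolutePath()),
                 new DocumentField("Content", extractContent(filePath)),
             };
             return fields;
         }
         private String extractContent(String filePath) {
             StringBuilder result = new StringBuilder();
             try {
                 List<String> lines = Files.readAllLines(Paths.get(filePath), StandardCharsets.UTF_8);
                 for (int i = 0; i < lines.size(); i++) {
                     String line = lines.get(i);
                     // Skip the first 12 characters of each line; shorter lines contribute no text
                     String processedLine = line.length() > 12 ? line.substring(12) : "";
                     result.append(processedLine).append('\n');
                 }
             } catch (IOException ex) {
                 throw new RuntimeException(ex);
             }
             return result.toString();
         }
     }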
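
    The per-line processing in extractContent drops the first 12 characters of each line. String.substring(12) throws StringIndexOutOfBoundsException when a line is shorter than that, so a guard is worth having. The following is a minimal sketch of that step in isolation. LogLineSketch and its methods are hypothetical names used only for illustration, and the sample lines assume that the prefix is a fixed-width timestamp, which the source does not state.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the library.
final class LogLineSketch {
    // Number of leading characters dropped from every line (taken from the example above).
    static final int PREFIX_LENGTH = 12;

    // Returns the line without its prefix, or an empty string if the line is too short.
    static String stripPrefix(String line) {
        return line.length() > PREFIX_LENGTH ? line.substring(PREFIX_LENGTH) : "";
    }

    // Joins the processed lines, terminating each one with a newline.
    static String extractContent(List<String> lines) {
        StringBuilder result = new StringBuilder();
        for (String line : lines) {
            result.append(stripPrefix(line)).append('\n');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Assumed sample data: a 12-character timestamp followed by the message.
        List<String> lines = Arrays.asList(
            "10:15:30.123 Server started",
            "short",
            "10:15:31.456 Listening on port 8080");
        System.out.print(extractContent(lines));
    }
}
```

    Short lines produce an empty line in the output instead of an exception. Whether they should be kept, skipped, or copied unchanged depends on the log format.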

    The following example demonstrates how to use the custom extractor for indexing.

     String indexFolder = "c:\\MyIndex\\"; // Specify path to the index folder
     String documentsFolder = "c:\\MyDocuments\\"; // Specify path to a folder containing documents to search
     Index index = new Index(indexFolder); // Creating or loading an index
     index.getIndexSettings().getCustomExtractors().addItem(new LogExtractor()); // Adding custom text extractor to index settings
     index.add(documentsFolder); // Indexing documents from the specified folder
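
    Conceptually, registering an extractor associates the extensions it reports with that extractor, so that matching files can be routed to it during indexing. How the library performs this lookup internally is not documented here. The following is only a conceptual sketch with hypothetical stand-in types (ExtractorDispatchSketch, Extractor), and the case-insensitive suffix match is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Conceptual sketch only: these types are hypothetical stand-ins, not library classes.
final class ExtractorDispatchSketch {
    // Stand-in for the part of an extractor that reports its supported extensions.
    interface Extractor {
        String[] getExtensions();
    }

    private final List<Extractor> extractors = new ArrayList<>();

    // Registers an extractor; later lookups consider extractors in registration order.
    void addItem(Extractor extractor) {
        extractors.add(extractor);
    }

    // Returns the first registered extractor whose extension list matches the file name, or null.
    Extractor find(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (Extractor extractor : extractors) {
            for (String extension : extractor.getExtensions()) {
                // Assumption: extensions include the leading dot and are matched case-insensitively.
                if (lower.endsWith(extension.toLowerCase(Locale.ROOT))) {
                    return extractor;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ExtractorDispatchSketch registry = new ExtractorDispatchSketch();
        registry.addItem(() -> new String[] { ".log" });
        System.out.println(registry.find("server.LOG") != null);
        System.out.println(registry.find("notes.txt") != null);
    }
}
```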
    • Method Detail

      • getExtensions

        String[] getExtensions()

        Gets the file extensions supported by this extractor.

        Returns: The supported extensions.
      • getFields

        DocumentField[] getFields(String filePath)

        Extracts all fields from the specified document.

        Parameters: filePath - The document file path.
        Returns: The extracted fields.
      • getFields

        DocumentField[] getFields(InputStream stream)

        Extracts all fields from the specified document.

        Parameters: stream - The document stream.
        Returns: The extracted fields.
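
    The LogExtractor example shows only the path-based overload, but a concrete implementation also has to provide getFields(InputStream). The following is a minimal sketch of what such a body could look like. Field is a hypothetical stand-in for DocumentField so that the snippet compiles on its own. Reading the stream as UTF-8 mirrors the file-based example and is an assumption, as is leaving the stream open for the caller to close.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; not the library's implementation.
final class StreamExtractorSketch {
    // Hypothetical stand-in for the library's DocumentField (a name plus its text).
    static final class Field {
        final String name;
        final String text;

        Field(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    // Reads the whole stream as UTF-8 text and returns it as a single "Content" field.
    // The stream is not closed here; the caller is assumed to own it.
    static Field[] getFields(InputStream stream) {
        StringBuilder content = new StringBuilder();
        try {
            BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return new Field[] { new Field("Content", content.toString()) };
    }

    public static void main(String[] args) {
        byte[] data = "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        Field[] fields = getFields(new ByteArrayInputStream(data));
        System.out.println(fields[0].name + ": " + fields[0].text.length() + " characters");
    }
}
```

    A stream carries no file name, so this sketch returns only a Content field. The path-based overload could open the file and delegate to the stream-based one to keep the extraction logic in one place.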