com.groupdocs.search.common

Interface IFieldExtractor

  • All Known Subinterfaces:
    IContainerItemExtractor


    public interface IFieldExtractor

    Provides methods for extracting fields from a document.


    The example demonstrates how to implement the interface.

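     import com.groupdocs.search.common.DocumentField;
     import com.groupdocs.search.common.IFieldExtractor;
     import java.io.BufferedReader;
     import java.io.File;
     import java.io.IOException;
     import java.io.InputStream;
     import java.io.InputStreamReader;
     import java.io.UncheckedIOException;
     import java.nio.charset.StandardCharsets;
     import java.nio.file.Files;
     import java.nio.file.Paths;
     import java.util.List;
     import java.util.stream.Collectors;

     public class LogExtractor implements IFieldExtractor
     {
         // Length of the fixed-width prefix (e.g. a timestamp) stripped from each log line
         private static final int PREFIX_LENGTH = 12;
         private final String[] extensions = new String[] { ".log" };

         public final String[] getExtensions() { return extensions; }

         public final DocumentField[] getFields(String filePath)
         {
             File file = new File(filePath);
             try {
                 List<String> lines = Files.readAllLines(Paths.get(filePath), StandardCharsets.UTF_8);
                 return new DocumentField[]
                 {
                     new DocumentField("FileName", file.getAbsolutePath()),
                     new DocumentField("Content", extractContent(lines)),
                 };
             } catch (IOException ex) {
                 throw new UncheckedIOException(ex);
             }
         }

         // Both getFields overloads must be implemented for the class to compile
         public final DocumentField[] getFields(InputStream stream)
         {
             // A stream has no file name, so only the content field is returned
             BufferedReader reader = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
             List<String> lines = reader.lines().collect(Collectors.toList());
             return new DocumentField[] { new DocumentField("Content", extractContent(lines)) };
         }

         private String extractContent(List<String> lines)
         {
             StringBuilder result = new StringBuilder();
             for (String line : lines)
             {
                 // Skip lines too short to carry any text after the prefix
                 if (line.length() > PREFIX_LENGTH)
                 {
                     // Keep lines separated so words from adjacent lines do not merge
                     result.append(line.substring(PREFIX_LENGTH)).append('\n');
                 }
             }
             return result.toString();
         }
     }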
     

    The example demonstrates how to use the custom extractor for indexing.

     import com.groupdocs.search.Index;

     String indexFolder = "c:\\MyIndex\\"; // Specify path to the index folder
     String documentsFolder = "c:\\MyDocuments\\"; // Specify path to a folder containing documents to search
     Index index = new Index(indexFolder); // Creating or loading an index
     index.getIndexSettings().getCustomExtractors().addItem(new LogExtractor()); // Adding the custom field extractor to the index settings
     index.add(documentsFolder); // Indexing documents from the specified folder
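Presumably the index relies on the extensions reported by getExtensions() to decide which files a registered extractor handles, but that mechanism is not documented on this page. As a rough illustration only, the following sketch checks whether a file name ends with one of an extractor's extensions, compared case-insensitively. ExtensionProvider and ExtensionMatcher are hypothetical names: the interface is a trimmed stand-in that mirrors only the getExtensions() signature, and the matching rule is an assumption, not GroupDocs.Search's own selection logic.

```java
import java.util.Locale;

// Hypothetical, trimmed stand-in for IFieldExtractor:
// it mirrors only the documented getExtensions() signature.
interface ExtensionProvider
{
    String[] getExtensions();
}

final class ExtensionMatcher
{
    private ExtensionMatcher() { }

    // Hypothetical helper: true if the file name ends with one of the
    // extractor's extensions (e.g. ".log"), ignoring letter case.
    static boolean supports(ExtensionProvider extractor, String fileName)
    {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String extension : extractor.getExtensions())
        {
            if (lower.endsWith(extension.toLowerCase(Locale.ROOT)))
            {
                return true;
            }
        }
        return false;
    }
}
```

With an extractor that reports `{ ".log" }`, `supports(extractor, "server.LOG")` is true and `supports(extractor, "notes.txt")` is false. Because the extensions in the example include the leading dot, a name such as "catalog" does not match.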
     
    • Method Detail

      • getExtensions

        String[] getExtensions()

        Gets the file extensions supported by the extractor (for example, ".log").

        Returns:
        The supported extensions.
      • getFields

        DocumentField[] getFields(String filePath)

        Extracts all fields from the specified document.

        Parameters:
        filePath - The document file path.
        Returns:
        The extracted fields.
      • getFields

        DocumentField[] getFields(InputStream stream)

        Extracts all fields from the document read from the specified stream.

        Parameters:
        stream - The document stream.
        Returns:
        The extracted fields.
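An implementation has to provide both getFields overloads. One way to keep them consistent is to let the path overload open the file and delegate to the stream overload, so the parsing logic exists only once. The following is a minimal, self-contained sketch of that design under stated assumptions: it returns the extracted content as a plain String instead of DocumentField objects, and the class name DelegatingLogReader, the method name getContent, and the 12-character prefix (carried over from the LogExtractor example) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Collectors;

// Hypothetical sketch: values are plain Strings rather than GroupDocs
// DocumentField objects, so the block compiles on its own.
final class DelegatingLogReader
{
    // Assumed fixed-width prefix (e.g. a timestamp) stripped from each line
    private static final int PREFIX_LENGTH = 12;

    private DelegatingLogReader() { }

    // Path overload: opens the file, delegates to the stream overload,
    // and closes the file once the content has been read
    static String getContent(String filePath)
    {
        try (InputStream stream = Files.newInputStream(Paths.get(filePath)))
        {
            return getContent(stream);
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
    }

    // Stream overload: reads UTF-8 lines and strips the prefix; the stream
    // is not closed here, assuming the caller that supplied it owns it
    static String getContent(InputStream stream)
    {
        BufferedReader reader = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
        return reader.lines()
                .filter(line -> line.length() > PREFIX_LENGTH)
                .map(line -> line.substring(PREFIX_LENGTH))
                .collect(Collectors.joining("\n"));
    }
}
```

In a real IFieldExtractor, each getFields overload would wrap this content in a DocumentField; the path overload could additionally add the file name field, which a stream cannot supply.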