Text Search in PDFs with C# and .NET

Finding specific text within a PDF document is a common requirement in document management systems, data extraction tools, and automated workflows. The SautinSoft.PDF library lets you locate text in PDF files from C# and .NET. This article walks through finding a text string on a PDF page and counting its occurrences.

Searching text in PDFs is essential for:

  • Data Extraction: Extracting specific information from large documents.
  • Content Management: Indexing and organizing documents based on their content.
  • Automated Workflows: Automating tasks that depend on the presence of specific text within documents.

Step-by-step guide:

  1. Add SautinSoft.PDF from NuGet.
  2. Load the PDF document.
  3. Find the specific text in the PDF file.
  4. Print the number of occurrences found to the console.

Complete code

using System;
using System.IO;
using SautinSoft;
using SautinSoft.Pdf;
using SautinSoft.Pdf.Content;
using System.Linq;

namespace Sample
{
    class Sample
    {
        /// <summary>
        /// Find text in the PDF.
        /// </summary>
        /// <remarks>
        /// Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/find-text.php
        /// </remarks>
        static void Main(string[] args)
        {
            // Before starting this example, please get a free 100-day trial key:
            // https://sautinsoft.com/start-for-free/

            // Apply the key here:
            // PdfDocument.SetLicense("...");

            string pdfFile = Path.GetFullPath(@"..\..\..\simple text.pdf");

            using (var document = PdfDocument.Load(pdfFile))
            {
                // Find all occurrences of the given text on the first page of the PDF file.
                var text = document.Pages[0].Content.GetText().Find("the");

                Console.WriteLine("Found " + text.Count() + " occurrences of the search text.");
            }
        }
    }
}
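The sample above searches only the first page (Pages[0]). As a minimal sketch, the same calls can be repeated for every page to get a per-page count and a document-wide total. It assumes that document.Pages can be enumerated with foreach and that PdfDocument can be used in a using statement; the class name SearchAllPages and the variable names are illustrative and do not come from the library.

```csharp
using System;
using System.IO;
using System.Linq;
using SautinSoft.Pdf;
using SautinSoft.Pdf.Content;

namespace Sample
{
    class SearchAllPages
    {
        static void Main(string[] args)
        {
            // Same sample file and search string as in the example above.
            string pdfFile = Path.GetFullPath(@"..\..\..\simple text.pdf");
            string searchText = "the";

            // Assumption: PdfDocument is disposable, so it can be wrapped in a using statement.
            using (var document = PdfDocument.Load(pdfFile))
            {
                int total = 0;
                int pageNumber = 0;

                // Assumption: document.Pages supports foreach enumeration.
                foreach (var page in document.Pages)
                {
                    pageNumber++;

                    // Same call chain as the single-page example, applied to the current page.
                    int count = page.Content.GetText().Find(searchText).Count();
                    total += count;

                    Console.WriteLine("Page " + pageNumber + ": " + count + " occurrence(s).");
                }

                Console.WriteLine("Total: " + total + " occurrence(s) of \"" + searchText + "\".");
            }
        }
    }
}
```

Searching page by page keeps the match counts tied to a page number, which is usually what an indexing or extraction workflow needs. Whether Find matches case-sensitively is not stated here, so check the library documentation before relying on the counts.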


Option Infer On

Imports System
Imports System.IO
Imports SautinSoft
Imports SautinSoft.Pdf
Imports SautinSoft.Pdf.Content
Imports System.Linq

Namespace Sample
	Friend Class Sample
		''' <summary>
		''' Find text in the PDF.
		''' </summary>
		''' <remarks>
		''' Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/find-text.php
		''' </remarks>
		Shared Sub Main(ByVal args() As String)
			' Before starting this example, please get a free license:
			' https://sautinsoft.com/start-for-free/

			' Apply the key here:
			' PdfDocument.SetLicense("...")

			Dim pdfFile As String = Path.GetFullPath("..\..\..\simple text.pdf")

			Using document = PdfDocument.Load(pdfFile)
				' Find all occurrences of the given text on the first page of the PDF file.
				Dim text = document.Pages(0).Content.GetText().Find("the")

				Console.WriteLine("Found " & text.Count() & " occurrences of the search text.")
			End Using
		End Sub
	End Class
End Namespace

If you need a code example or have a question, write to us at support@sautinsoft.ru.

Questions and suggestions are always welcome!

We have been developing .NET components since 2002. We know the PDF, DOCX, RTF, HTML, XLSX, and image formats. If you need help creating, modifying, or converting documents in various formats, we can help. We will write any code example for you free of charge.