Retrieving Text Dimensions in PDF using C# and .NET

Reading additional information about text in a PDF document is a common task that can be accomplished with the SautinSoft.Pdf library. The library exposes PDF page content, including text, images, and other elements, as objects you can iterate over. The example below uses SautinSoft.Pdf to read the bounds, font properties, and fill color of each text element.

Step-by-Step Guide

  1. Create a New Project

    Open Visual Studio and create a new Console Application project.

  2. Add PDF.Net Reference

    Download the PDF.Net library (the code in this guide uses its SautinSoft.Pdf namespace) and add it to your project. To do this, right-click your project in Solution Explorer, select "Add Reference," and browse to the PDF.Net DLL.

  3. Write the code to get the text bounds and size.
  4. Full code

    using System;
    using System.IO;
    using SautinSoft;
    using SautinSoft.Pdf;
    using SautinSoft.Pdf.Content;
    
    class Program
    {
        /// <summary>
        /// Reading additional info.
        /// </summary>
        /// <remarks>
        /// Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/reading-additional-information.php
        /// </remarks>
        static void Main()
        {
            // Before starting this example, please get a free 100-day trial key:
            // https://sautinsoft.com/start-for-free/
    
            // Apply the key here:
            // PdfDocument.SetLicense("...");
    
            string pdfFile = Path.GetFullPath(@"..\..\..\table.pdf");
            // Iterate through all PDF pages and through each page's content elements,
            // and retrieve only the text content elements.
            using (var document = PdfDocument.Load(pdfFile))
            {
                foreach (var page in document.Pages)
                {
                    var contentEnumerator = page.Content.Elements.All(page.Transform).GetEnumerator();
                    while (contentEnumerator.MoveNext())
                    {
                        if (contentEnumerator.Current.ElementType == PdfContentElementType.Text)
                        {
                            var textElement = (PdfTextContent)contentEnumerator.Current;
                            var text = textElement.ToString();
                            var font = textElement.Format.Text.Font;
                            var color = textElement.Format.Fill.Color;
                            var bounds = textElement.Bounds;
    
                            contentEnumerator.Transform.Transform(bounds);
                            // Read the text content element's additional information.
                            Console.WriteLine($"Unicode text: {text}");
                            Console.WriteLine($"Font name: {font.Face.Family.Name}");
                            Console.WriteLine($"Font size: {font.Size}");
                            Console.WriteLine($"Font style: {font.Face.Style}");
                            Console.WriteLine($"Font weight: {font.Face.Weight}");
                            if (color.TryGetRgb(out double red, out double green, out double blue))
                                Console.WriteLine($"Color: Red={red}, Green={green}, Blue={blue}");
                            Console.WriteLine($"Bounds: Left={bounds.Left:0.00}, Bottom={bounds.Bottom:0.00}, Right={bounds.Right:0.00}, Top={bounds.Top:0.00}");
                            Console.WriteLine();
                        }
                    }
                }
            }
        }
    }

    The same example in VB.NET:

    Option Infer On
    
    Imports System
    Imports System.IO
    Imports SautinSoft
    Imports SautinSoft.Pdf
    Imports SautinSoft.Pdf.Content
    
    Friend Class Program
    	''' <summary>
    	''' Reading additional info.
    	''' </summary>
    	''' <remarks>
    	''' Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/reading-additional-information.php
    	''' </remarks>
    	Shared Sub Main()
    		' Before starting this example, please get a free license:
    		' https://sautinsoft.com/start-for-free/
    
    		' Apply the key here:
    		' PdfDocument.SetLicense("...")
    
    		Dim pdfFile As String = Path.GetFullPath("..\..\..\table.pdf")
    		' Iterate through all PDF pages and through each page's content elements,
    		' and retrieve only the text content elements.
    		Using document = PdfDocument.Load(pdfFile)
    			For Each page In document.Pages
    				Dim contentEnumerator = page.Content.Elements.All(page.Transform).GetEnumerator()
    				Do While contentEnumerator.MoveNext()
    					If contentEnumerator.Current.ElementType = PdfContentElementType.Text Then
    						Dim textElement = CType(contentEnumerator.Current, PdfTextContent)
    						Dim text = textElement.ToString()
    						Dim font = textElement.Format.Text.Font
    						Dim color = textElement.Format.Fill.Color
    						Dim bounds = textElement.Bounds
    
    						contentEnumerator.Transform.Transform(bounds)
    						' Read the text content element's additional information.
    						Console.WriteLine($"Unicode text: {text}")
    						Console.WriteLine($"Font name: {font.Face.Family.Name}")
    						Console.WriteLine($"Font size: {font.Size}")
    						Console.WriteLine($"Font style: {font.Face.Style}")
    						Console.WriteLine($"Font weight: {font.Face.Weight}")
    						Dim red As Double
    						Dim green As Double
    						Dim blue As Double
    						If color.TryGetRgb(red, green, blue) Then
    							Console.WriteLine($"Color: Red={red}, Green={green}, Blue={blue}")
    						End If
    						Console.WriteLine($"Bounds: Left={bounds.Left:0.00}, Bottom={bounds.Bottom:0.00}, Right={bounds.Right:0.00}, Top={bounds.Top:0.00}")
    						Console.WriteLine()
    					End If
    				Loop
    			Next page
    		End Using
    	End Sub
    End Class
    

  5. Run the Application

    Build and run your application. If everything is set up correctly, the console shows the text, font name, size, style, weight, fill color, and bounds of each text element in the specified PDF file.
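
The example prints each element's bounds as Left, Bottom, Right, and Top, but not its width and height. The sketch below shows one way to derive those dimensions from the four printed values and convert them to millimeters. It does not call SautinSoft.Pdf: the `TextSize` helper and the sample numbers are hypothetical, and it assumes the bounds are expressed in PDF points (1/72 inch, the default PDF user-space unit), which may not hold if the page applies a scaling transform.

```csharp
using System;

// Hypothetical helper, not part of SautinSoft.Pdf.
// Assumption: all inputs are in PDF points (1 point = 1/72 inch).
public static class TextSize
{
    private const double PointsPerInch = 72.0;
    private const double MillimetersPerInch = 25.4;

    // Math.Abs keeps the result positive even if a transform has swapped the edges.
    public static double Width(double left, double right) => Math.Abs(right - left);

    public static double Height(double bottom, double top) => Math.Abs(top - bottom);

    public static double PointsToMillimeters(double points) =>
        points / PointsPerInch * MillimetersPerInch;
}

public static class TextSizeDemo
{
    public static void Main()
    {
        // Sample values standing in for bounds.Left, bounds.Bottom, bounds.Right, bounds.Top.
        double left = 72.0, bottom = 700.0, right = 216.0, top = 712.0;

        double width = TextSize.Width(left, right);    // 144 points, i.e. 2 inches
        double height = TextSize.Height(bottom, top);  // 12 points

        Console.WriteLine($"Size: {width:0.00} x {height:0.00} pt");
        Console.WriteLine($"Size: {TextSize.PointsToMillimeters(width):0.00} x {TextSize.PointsToMillimeters(height):0.00} mm");
    }
}
```

In the full example, the same arithmetic could be applied to `bounds.Left`, `bounds.Bottom`, `bounds.Right`, and `bounds.Top` right after the bounds are printed.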

Additional Features

    PDF.Net offers various other features for handling PDF documents, such as:
  • Extracting images from PDF files.
  • Converting PDF to other formats like DOCX, HTML, and images.
  • Merging and splitting PDF files.
  • Adding and reading interactive forms.

Conclusion

With SautinSoft.Pdf, reading additional information about text in a PDF document is straightforward. The library provides a clear API to access text properties, which can be invaluable for tasks such as document analysis, content extraction, and automated processing.


We have been developing .NET components since 2002. We know the PDF, DOCX, RTF, HTML, XLSX, and image formats. If you need help creating, modifying, or converting documents in various formats, we can help. We will write any code example for you free of charge.