Reading additional information about text in a PDF document is a common task, and the SautinSoft.Pdf library supports it directly. The library provides a convenient way to work with PDF content, including text, images, and other elements. The example below uses SautinSoft.Pdf to read additional information about text elements: their bounds, font properties, and color.
Input file: Asset Recovery Evaluation.pdf
Open Visual Studio and create a new Console Application project.
Download the SautinSoft.Pdf library and add it to your project: right-click the project in Solution Explorer, select "Add Reference," and browse to the SautinSoft.Pdf DLL.
Complete code
using System;
using System.IO;
using SautinSoft;
using SautinSoft.Pdf;
using SautinSoft.Pdf.Content;

class Program
{
    /// <summary>
    /// Reading additional info.
    /// </summary>
    /// <remarks>
    /// Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/reading-additional-information.php
    /// </remarks>
    static void Main()
    {
        // Before starting this example, please get a free 100-day trial key:
        // https://sautinsoft.com/start-for-free/
        // Apply the key here:
        // PdfDocument.SetLicense("...");
        string pdfFile = Path.GetFullPath(@"..\..\..\table.pdf");

        // Iterate through all PDF pages and through each page's content elements,
        // and retrieve only the text content elements.
        using (var document = PdfDocument.Load(pdfFile))
        {
            foreach (var page in document.Pages)
            {
                var contentEnumerator = page.Content.Elements.All(page.Transform).GetEnumerator();
                while (contentEnumerator.MoveNext())
                {
                    if (contentEnumerator.Current.ElementType == PdfContentElementType.Text)
                    {
                        var textElement = (PdfTextContent)contentEnumerator.Current;
                        var text = textElement.ToString();
                        var font = textElement.Format.Text.Font;
                        var color = textElement.Format.Fill.Color;
                        var bounds = textElement.Bounds;
                        contentEnumerator.Transform.Transform(bounds);

                        // Read the text content element's additional information.
                        Console.WriteLine($"Unicode text: {text}");
                        Console.WriteLine($"Font name: {font.Face.Family.Name}");
                        Console.WriteLine($"Font size: {font.Size}");
                        Console.WriteLine($"Font style: {font.Face.Style}");
                        Console.WriteLine($"Font weight: {font.Face.Weight}");
                        if (color.TryGetRgb(out double red, out double green, out double blue))
                            Console.WriteLine($"Color: Red={red}, Green={green}, Blue={blue}");
                        Console.WriteLine($"Bounds: Left={bounds.Left:0.00}, Bottom={bounds.Bottom:0.00}, Right={bounds.Right:0.00}, Top={bounds.Top:0.00}");
                        Console.WriteLine();
                    }
                }
            }
        }
    }
}

Option Infer On
Imports System
Imports System.IO
Imports SautinSoft
Imports SautinSoft.Pdf
Imports SautinSoft.Pdf.Content

Friend Class Program
    ''' <summary>
    ''' Reading additional info.
    ''' </summary>
    ''' <remarks>
    ''' Details: https://sautinsoft.com/products/pdf/help/net/developer-guide/reading-additional-information.php
    ''' </remarks>
    Shared Sub Main()
        ' Before starting this example, please get a free license:
        ' https://sautinsoft.com/start-for-free/
        ' Apply the key here:
        ' PdfDocument.SetLicense("...")
        Dim pdfFile As String = Path.GetFullPath("..\..\..\table.pdf")

        ' Iterate through all PDF pages and through each page's content elements,
        ' and retrieve only the text content elements.
        Using document = PdfDocument.Load(pdfFile)
            For Each page In document.Pages
                Dim contentEnumerator = page.Content.Elements.All(page.Transform).GetEnumerator()
                Do While contentEnumerator.MoveNext()
                    If contentEnumerator.Current.ElementType = PdfContentElementType.Text Then
                        Dim textElement = CType(contentEnumerator.Current, PdfTextContent)
                        Dim text = textElement.ToString()
                        Dim font = textElement.Format.Text.Font
                        Dim color = textElement.Format.Fill.Color
                        Dim bounds = textElement.Bounds
                        contentEnumerator.Transform.Transform(bounds)

                        ' Read the text content element's additional information.
                        Console.WriteLine($"Unicode text: {text}")
                        Console.WriteLine($"Font name: {font.Face.Family.Name}")
                        Console.WriteLine($"Font size: {font.Size}")
                        Console.WriteLine($"Font style: {font.Face.Style}")
                        Console.WriteLine($"Font weight: {font.Face.Weight}")
                        Dim red As Double
                        Dim green As Double
                        Dim blue As Double
                        If color.TryGetRgb(red, green, blue) Then
                            Console.WriteLine($"Color: Red={red}, Green={green}, Blue={blue}")
                        End If
                        Console.WriteLine($"Bounds: Left={bounds.Left:0.00}, Bottom={bounds.Bottom:0.00}, Right={bounds.Right:0.00}, Top={bounds.Top:0.00}")
                        Console.WriteLine()
                    End If
                Loop
            Next page
        End Using
    End Sub
End Class
Build and run your application. If everything is set up correctly, the console lists each text element in the specified PDF file together with its font name, size, style, weight, color, and bounds.
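The sample prints the color as three raw doubles. If those components are fractions between 0 and 1, as is usual for RGB colors in PDF (an assumption here; this page does not state the range that TryGetRgb returns), a small helper can turn them into the more familiar hex notation. RgbFormat and ToHex are hypothetical names introduced for illustration and are not part of SautinSoft.Pdf.

```csharp
using System;

// Hypothetical helper, not part of SautinSoft.Pdf.
public static class RgbFormat
{
    // Assumption: each component is a fraction in the range 0..1.
    // Out-of-range values are clamped before scaling to 0..255.
    public static string ToHex(double red, double green, double blue)
    {
        return $"#{ToByte(red):X2}{ToByte(green):X2}{ToByte(blue):X2}";
    }

    private static int ToByte(double component)
    {
        double clamped = Math.Max(0.0, Math.Min(1.0, component));
        return (int)Math.Round(clamped * 255.0);
    }
}
```

Under that assumption, the color line in the sample could instead print RgbFormat.ToHex(red, green, blue) inside the TryGetRgb check.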
With SautinSoft.Pdf, reading additional information about text in a PDF document is straightforward. The library provides a clear API to access text properties, which can be invaluable for tasks such as document analysis, content extraction, and automated processing.
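As one possible illustration of the document-analysis use case, the values read in the sample can be fed into a small tally that reports how much text each font and size accounts for. This is a sketch only: FontUsage is a hypothetical class that does not depend on the library, and it assumes the font name and size are passed in as a string and a double.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of SautinSoft.Pdf:
// counts characters per (font name, font size) pair.
public sealed class FontUsage
{
    private readonly Dictionary<(string Name, double Size), int> _chars =
        new Dictionary<(string Name, double Size), int>();

    // Record one text element; intended to be called once per element read.
    public void Add(string fontName, double fontSize, string text)
    {
        var key = (fontName ?? "(unknown)", Math.Round(fontSize, 2));
        _chars.TryGetValue(key, out int count);   // count stays 0 for a new key
        _chars[key] = count + (text?.Length ?? 0);
    }

    // One line per font and size, ordered from most to least characters.
    public IEnumerable<string> Report() =>
        _chars.OrderByDescending(p => p.Value)
              .Select(p => $"{p.Key.Name} {p.Key.Size}pt: {p.Value} chars");
}
```

In the sample's loop, a call such as usage.Add(font.Face.Family.Name, font.Size, text) would record each element, and printing usage.Report() after the loop would give the summary.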