How to load a document

  1. Add SautinSoft.Document from NuGet.
  2. Load the document from a file or a stream.

SautinSoft.Document supports the following formats:

Format   Supported operations
PDF      Create / Read / Write
DOCX     Create / Read / Write
RTF      Create / Read / Write
HTML     Create / Read / Write
Text     Create / Read / Write
Images   Create / Read (OCR) / Write

To load a document from a file, a single line is enough:


            //It's easy to load any document.
            DocumentCore dc = DocumentCore.Load(@"d:\Book.pdf");

DocumentCore is the root class; it represents the document itself.

In this example, the Load() method detects from the ".pdf" file extension that the document being loaded is a PDF.

You can also specify the type of the document explicitly by passing load options as the second parameter, for example PdfLoadOptions or DocxLoadOptions:


DocumentCore dc = DocumentCore.Load(@"d:\Book.pdf", new PdfLoadOptions()
{
    // 'false' - load vector graphics as is; don't convert them to raster images.
    RasterizeVectorGraphics = false,

    // The PDF format has no real tables; a table is just a set of orthogonal graphic lines.
    // 'true' - the component detects and recreates tables from those lines.
    DetectTables = false,

    // 'Disabled' - never load fonts embedded in the PDF; use fonts with the same name
    //              installed on the system, or similar ones by font metrics.
    // 'Enabled'  - always load fonts embedded in the PDF.
    // 'Auto'     - load only the embedded fonts that are missing on the system;
    //              otherwise use the system fonts.
    PreserveEmbeddedFonts = PropertyState.Auto
});

All load options classes derive from the abstract base class LoadOptions.
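
Because every options class shares the LoadOptions base type, the choice of options can be separated from the Load() call itself. The following is only a minimal sketch under stated assumptions: LoadHelper, OptionsFor and LoadWithOptions are hypothetical names introduced for illustration and are not part of the library. It uses only the two options classes mentioned above and falls back to automatic detection for any other extension.

```csharp
using System;
using System.IO;
using SautinSoft.Document;

static class LoadHelper
{
    // Hypothetical helper (not part of SautinSoft.Document):
    // picks a LoadOptions subclass from the file extension.
    // Returns null when no explicit options are defined for the extension.
    static LoadOptions OptionsFor(string path)
    {
        string ext = Path.GetExtension(path).ToLowerInvariant();
        switch (ext)
        {
            case ".pdf":
                return new PdfLoadOptions { DetectTables = false };
            case ".docx":
                return new DocxLoadOptions();
            default:
                return null;
        }
    }

    // Loads with explicit options when we have them;
    // otherwise lets Load() detect the format from the extension.
    public static DocumentCore LoadWithOptions(string path)
    {
        LoadOptions options = OptionsFor(path);
        return options != null
            ? DocumentCore.Load(path, options)
            : DocumentCore.Load(path);
    }
}
```

A call such as LoadHelper.LoadWithOptions(@"d:\Book.pdf") would then take the PdfLoadOptions branch, assuming the file exists.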

After loading, you get the complete tree of objects and can do anything you need with it: find, replace, delete, insert, modify, or save to another format.


Loading from a stream is just as simple:


            // Suppose we already have a DOCX document as an array of bytes (docxBytes).
            DocumentCore dc = null;
            using (MemoryStream docxStream = new MemoryStream(docxBytes))
            {
                dc = DocumentCore.Load(docxStream, new DocxLoadOptions());
            }
            // From here on we can do anything we need with the document 'dc'.

Complete code (C# and VB.NET)

using System.IO;
using SautinSoft.Document;
using System;

namespace Example
{
    class Program
    {
        static void Main(string[] args)
        {
            // Get your free 100-day key here:   
            // https://sautinsoft.com/start-for-free/

            LoadFromFile();
            //LoadFromStream();
            //LoadFromBytes();
        }

        /// <summary>
        /// Loads a document into DocumentCore (dc) from a file.
        /// </summary>
        /// <remarks>
        /// Details: https://www.sautinsoft.com/products/document/help/net/developer-guide/load-document.php
        /// </remarks>
        static void LoadFromFile()
        {
            string filePath = @"..\..\..\example.docx";
            // The file format is detected automatically from the file extension: ".docx".
            // But as shown in the example below, we can specify DocxLoadOptions as 2nd parameter
            // to explicitly set that a loadable document has Docx format.
            DocumentCore dc = DocumentCore.Load(filePath);

            if (dc != null)
                Console.WriteLine("Loaded successfully!");

            Console.ReadKey();
        }

        /// <summary>
        /// Loads a document into DocumentCore (dc) from a Stream.
        /// </summary>
        /// <remarks>
        /// Details: https://www.sautinsoft.com/products/document/help/net/developer-guide/load-document.php
        /// </remarks>
        static void LoadFromStream()
        {
            // We declare 'dc' before the using block on purpose,
            // so that we can keep working with it after the stream is closed.
            DocumentCore dc = null;
            using (FileStream fs = new FileStream(@"..\..\..\example.docx", FileMode.Open))
            {

                // Here we explicitly set that a loadable document is Docx.
                dc = DocumentCore.Load(fs, new DocxLoadOptions());
            }
            if (dc != null)
                Console.WriteLine("Loaded successfully!");

            Console.ReadKey();
        }

        /// <summary>
        /// Loads a document into DocumentCore (dc) from an array of bytes.
        /// </summary>
        /// <remarks>
        /// Details: https://www.sautinsoft.com/products/document/help/net/developer-guide/load-document.php
        /// </remarks>
        static void LoadFromBytes()
        {
            // Get document bytes from a file.
            byte[] fileBytes = File.ReadAllBytes(@"..\..\..\example.pdf");

            DocumentCore dc = null;
            using (MemoryStream ms = new MemoryStream(fileBytes))
            {

                // With PdfLoadOptions we explicitly set that a loadable document is PDF.
                PdfLoadOptions pdfLO = new PdfLoadOptions()
                {

                    // 'false' - means to load vector graphics as is. Don't transform it to raster images.
                    RasterizeVectorGraphics = false,

                    // The PDF format doesn't have real tables, in fact it's a set of orthogonal graphic lines.
                    // In case of 'true' the component will detect and recreate tables from graphic lines.
                    DetectTables = false,

                    // 'Disabled' - Never load embedded fonts in PDF. Use the fonts with the same name installed on the system or similar by font metrics.
                    // 'Enabled' - Always load embedded fonts in PDF.
                    // 'Auto' - Load only embedded fonts missing on the system. Otherwise, use the system fonts.
                    PreserveEmbeddedFonts = PropertyState.Auto,

                    // Load only first 2 pages from the document.
                    PageIndex = 0,
                    PageCount = 2
                };
                dc = DocumentCore.Load(ms, pdfLO);
            }
            if (dc != null)
                Console.WriteLine("Loaded successfully!");

            Console.ReadKey();
        }
    }
}

Imports System
Imports System.IO
Imports SautinSoft.Document

Module Sample
    Sub Main()
        ' Get your free 100-day key here:
        ' https://sautinsoft.com/start-for-free/

        LoadFromFile()
        'LoadFromStream()
        'LoadFromBytes()
    End Sub

    ''' <summary>
    ''' Loads a document into DocumentCore (dc) from a file.
    ''' </summary>
    ''' <remarks>
    ''' Details: https://www.sautinsoft.com/products/document/help/net/developer-guide/load-document.php
    ''' </remarks>
    Sub LoadFromFile()
        Dim filePath As String = "..\..\..\example.docx"
        ' The file format is detected automatically from the file extension: ".docx".
        ' But as shown in the example below, we can specify DocxLoadOptions as 2nd parameter
        ' to explicitly set that a loadable document has Docx format.
        Dim dc As DocumentCore = DocumentCore.Load(filePath)

        If dc IsNot Nothing Then
            Console.WriteLine("Loaded successfully!")
        End If

        Console.ReadKey()
    End Sub

    ''' <summary>
    ''' Loads a document into DocumentCore (dc) from a Stream.
    ''' </summary>
    ''' <remarks>
    ''' Details: https://www.sautinsoft.com/products/document/help/net/developer-guide/load-document.php
    ''' </remarks>
    Sub LoadFromStream()
        ' We declare 'dc' before the Using block on purpose,
        ' so that we can keep working with it after the stream is closed.
        Dim dc As DocumentCore = Nothing
        Using fs As New FileStream("..\..\..\example.docx", FileMode.Open)

            ' Here we explicitly set that a loadable document is Docx.
            dc = DocumentCore.Load(fs, New DocxLoadOptions())
        End Using
        If dc IsNot Nothing Then
            Console.WriteLine("Loaded successfully!")
        End If

        Console.ReadKey()
    End Sub

    ''' <summary>
    ''' Loads a document into DocumentCore (dc) from an array of bytes.
    ''' </summary>
    ''' <remarks>
    ''' Details: https://www.sautinsoft.com/products/document/help/net/developer-guide/load-document.php
    ''' </remarks>
    Sub LoadFromBytes()
        ' Get document bytes from a file.
        Dim fileBytes() As Byte = File.ReadAllBytes("..\..\..\example.pdf")

        Dim dc As DocumentCore = Nothing
        Using ms As New MemoryStream(fileBytes)

            ' With PdfLoadOptions we explicitly set that a loadable document is PDF.
            Dim pdfLO As New PdfLoadOptions()
            With pdfLO
                ' 'False' - load vector graphics as is; don't convert them to raster images.
                .RasterizeVectorGraphics = False

                ' The PDF format has no real tables; a table is just a set of orthogonal graphic lines.
                ' 'True' - the component detects and recreates tables from those lines.
                .DetectTables = False

                ' 'Disabled' - Never load embedded fonts in PDF. Use the fonts with the same name installed on the system or similar by font metrics.
                ' 'Enabled' - Always load embedded fonts in PDF.
                ' 'Auto' - Load only embedded fonts missing on the system. Otherwise, use the system fonts.
                .PreserveEmbeddedFonts = PropertyState.Auto

                ' Load only the first 2 pages of the document.
                .PageIndex = 0
                .PageCount = 2
            End With

            dc = DocumentCore.Load(ms, pdfLO)
        End Using
        If dc IsNot Nothing Then
            Console.WriteLine("Loaded successfully!")
        End If

        Console.ReadKey()
    End Sub
End Module


If you need a code example or have a question, write to us at support@sautinsoft.com or ask in the online chat (bottom-right corner of the page).



Questions and suggestions are always welcome!

We have been developing .NET components since 2002. We know the PDF, DOCX, RTF, HTML, XLSX, and image formats. If you need help creating, modifying, or converting documents in various formats, we can help. We will write any code example for you absolutely free.