Just fast HTML -> Text.
Lightweight, hand-rolled, high-performance HTML-to-plain-text conversion for .NET.
This library focuses on extracting the text content of a page as quickly and predictably as possible. It makes no attempt to interpret layout, CSS, visibility, or rendering rules; the only formatting it applies is some basic layout of table headings and table data rows so that tables stay readable as plain text.
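To give a feel for what basic formatting of table headings and data rows can look like, here is a minimal, hypothetical C# sketch that pads cells into aligned columns and underlines the header. `FormatTable`, the two-space column gap and the dashed underline are illustrative assumptions, not Html2Text's API or its exact output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: Html2Text's real table formatting lives in its
// renderer and may differ. This sketch pads each cell to the width of its
// column's widest value and underlines the header row with dashes.
static string FormatTable(IReadOnlyList<string> header, IReadOnlyList<IReadOnlyList<string>> rows)
{
    var allRows = new List<IReadOnlyList<string>> { header };
    allRows.AddRange(rows);

    // Width of each column = length of its longest cell (header included).
    int columns = allRows.Max(r => r.Count);
    int[] widths = new int[columns];
    foreach (var row in allRows)
        for (int c = 0; c < row.Count; c++)
            widths[c] = Math.Max(widths[c], row[c].Length);

    // Pad every cell, separate columns with two spaces, drop trailing padding.
    string FormatRow(IReadOnlyList<string> row) =>
        string.Join("  ", Enumerable.Range(0, columns)
            .Select(c => (c < row.Count ? row[c] : "").PadRight(widths[c])))
            .TrimEnd();

    var lines = new List<string> { FormatRow(header) };
    lines.Add(string.Join("  ", widths.Select(w => new string('-', w))));
    lines.AddRange(rows.Select(FormatRow));
    return string.Join("\n", lines);
}

Console.WriteLine(FormatTable(
    new[] { "Name", "Qty" },
    new[] { new[] { "Apple", "3" }, new[] { "Fig", "12" } }));
```

Padding to the widest cell keeps columns aligned in any monospaced viewer without needing to know anything about the original CSS.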
- High performance: designed for low allocations and fast throughput.
- Text extraction only: get the words from the page/document.
- No dependencies: nothing beyond .NET itself. Lightweight, and not an embedded browser engine.
Out of scope (non-goals):
- Respecting CSS, computed styles, display:none, or visibility.
- Pixel-accurate layout, whitespace mirroring, or browser-equivalent rendering.
- Executing JavaScript or loading remote resources.
Requirements:
- .NET 8+
Once I've published it to NuGet (coming soon!), you will be able to install it with:
```shell
dotnet add package Html2Text
```
Until then, download the repository or add it as a git submodule, and reference the project directly.
Usage is as simple as possible:
```csharp
using Html2Text;

string html = "<h1>Hello</h1><p>World</p>";
string text = Html2Text.Convert(html);
// Hello
//
// World
```

Output behaviour:
- Text nodes are emitted in document order.
- Basic block separation is preserved (e.g., paragraphs/headings insert newlines).
- Whitespace is normalized to produce readable plain text.
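Whitespace normalization of this kind usually means collapsing each run of spaces, tabs and newlines into a single space. A minimal sketch of that idea follows; `NormalizeWhitespace` is a hypothetical helper, not part of the Html2Text API, and the library's actual rules (for example, how it preserves the newlines it inserts between blocks) may differ.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the Html2Text API: collapses every run of
// whitespace (spaces, tabs, newlines) into one space and trims both ends.
static string NormalizeWhitespace(string text)
{
    var sb = new StringBuilder(text.Length);
    bool pendingSpace = false;
    foreach (char c in text)
    {
        if (char.IsWhiteSpace(c))
        {
            // Note the gap, but only emit it if a word has already been written.
            pendingSpace = sb.Length > 0;
            continue;
        }
        if (pendingSpace)
        {
            sb.Append(' ');
            pendingSpace = false;
        }
        sb.Append(c);
    }
    return sb.ToString();
}

Console.WriteLine(NormalizeWhitespace("  Hello,\n\t   world!  ")); // Hello, world!
```

Deferring the space until the next word arrives is what trims trailing whitespace without a second pass over the string.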
Internally, conversion is a three-stage pipeline:
HTML document -> Lexer (tokens) -> Parser (AST nodes) -> Renderer (string text)
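As a rough illustration of the first stage, the following hypothetical lexer turns markup into start-tag, end-tag and text tokens. `Tokenize` and its string token kinds are assumptions made for this example; a real lexer such as Html2Text's has to deal with far more (this sketch ignores attributes, comments, entities and `<script>`/`<style>` contents).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lexer sketch (not Html2Text's Lexer): splits HTML into start-tag,
// end-tag and text tokens. Attributes are skipped; comments, entities and
// <script>/<style> contents are deliberately not handled.
static List<(string Kind, string Value)> Tokenize(string html)
{
    var tokens = new List<(string Kind, string Value)>();
    char[] nameTerminators = { ' ', '\t', '\n', '\r', '/' };
    int i = 0;
    while (i < html.Length)
    {
        int lt = html.IndexOf('<', i);
        int gt = lt < 0 ? -1 : html.IndexOf('>', lt + 1);
        if (gt < 0)
        {
            // No complete tag remains, so the rest of the input is text.
            tokens.Add(("Text", html.Substring(i)));
            break;
        }
        if (lt > i)
            tokens.Add(("Text", html.Substring(i, lt - i)));

        // Between the brackets: an optional '/', then the tag name.
        string inner = html.Substring(lt + 1, gt - lt - 1);
        bool isEnd = inner.StartsWith('/');
        string name = inner.TrimStart('/');
        int nameEnd = name.IndexOfAny(nameTerminators);
        if (nameEnd >= 0) name = name.Substring(0, nameEnd);

        tokens.Add((isEnd ? "EndTag" : "StartTag", name.ToLowerInvariant()));
        i = gt + 1;
    }
    return tokens;
}

foreach (var (kind, value) in Tokenize("<h1>Hello</h1><p>World</p>"))
    Console.WriteLine($"{kind}: {value}");
```

A parser would then pair these start and end tokens into a tree, and the renderer would walk that tree.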
Basic formatting behaviour is defined in Html2Text\Rendering.
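The following is a simplified, hypothetical illustration of block separation at render time, not the rules actually defined in Html2Text\Rendering. To stay short it consumes a hand-built token list instead of AST nodes; `Render` and the `blockTags` set are assumptions for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch, not the rules in Html2Text\Rendering. For brevity it renders
// a flat token stream rather than AST nodes: text is buffered per block, and
// opening or closing a block-level element flushes the buffer as one paragraph.
var blockTags = new HashSet<string>
{
    "p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "li", "table", "tr",
};

string Render(IEnumerable<(string Kind, string Value)> stream)
{
    var paragraphs = new List<string>();
    var block = new StringBuilder();

    // Collapse whitespace inside the finished block; keep it only if it has text.
    void Flush()
    {
        string text = string.Join(' ',
            block.ToString().Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries));
        if (text.Length > 0) paragraphs.Add(text);
        block.Clear();
    }

    foreach (var (kind, value) in stream)
    {
        if (kind == "Text") block.Append(value);
        else if (blockTags.Contains(value)) Flush();   // block boundary
        // Inline tags such as <b> or <a> are ignored; their text flows through.
    }
    Flush();

    // Blank line between blocks, like the blank line in the usage example.
    return string.Join("\n\n", paragraphs);
}

var sample = new List<(string Kind, string Value)>
{
    ("StartTag", "h1"), ("Text", "Hello"), ("EndTag", "h1"),
    ("StartTag", "p"), ("Text", "World"), ("EndTag", "p"),
};
Console.WriteLine(Render(sample));
```

Buffering each block before collapsing its whitespace means text split across inline tags (for example `Hello <b>big</b> world`) still comes out as one correctly spaced line.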
- Designed for converting many documents quickly (batch processing, indexing, search pipelines).
- Avoids DOM dependencies.
- Uses a lightweight, hand-rolled lexer/parser/renderer pipeline.
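One common way to keep allocations low in a hand-rolled scanner is to walk the input as a `ReadOnlySpan<char>` and copy text slices straight into a single `StringBuilder`, instead of allocating a substring per token. The sketch below shows that technique in its simplest form (it only strips tags and does no block separation); `StripTags` is hypothetical and is not taken from Html2Text's source.

```csharp
using System;
using System.Text;

// Hypothetical sketch of a low-allocation scan, not Html2Text's code: walks the
// input as a ReadOnlySpan<char> and appends only the text between tags to one
// StringBuilder, so no intermediate substring is created per token.
static string StripTags(string html)
{
    var sb = new StringBuilder(html.Length);
    ReadOnlySpan<char> rest = html.AsSpan();
    while (!rest.IsEmpty)
    {
        int lt = rest.IndexOf('<');
        if (lt < 0) { sb.Append(rest); break; }    // no more tags: copy the tail
        sb.Append(rest.Slice(0, lt));              // text before the tag
        int gt = rest.Slice(lt).IndexOf('>');
        if (gt < 0) { sb.Append(rest.Slice(lt)); break; } // unterminated '<' stays as text
        rest = rest.Slice(lt + gt + 1);            // skip past the tag
    }
    return sb.ToString();
}

Console.WriteLine(StripTags("<p>Hello <b>big</b> world</p>")); // Hello big world
```

Slicing a span only adjusts an offset and length, so the only heap allocations here are the builder's buffer and the final string.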
Benchmarks are in Html2Text.PerfTests.
Repository layout:
- Html2Text/: core library
- Html2Text.Example/: small example app
- Html2Text.Tests/: unit tests
- Html2Text.RegressionTests/: regression/acceptance tests
- Html2Text.PerfTests/: performance benchmarking console app
- Samples/: sample HTML files used during development and automated regression testing
Build with:
```shell
dotnet build
```
Run unit tests and regression tests:
```shell
dotnet test
```
Run performance benchmarks:
```shell
dotnet run -c Release --project Html2Text.PerfTests
```
License: MPL-2.0 (see LICENSE.txt).