Description
LtGt is a minimalistic library for working with HTML. It can parse any HTML5-compliant code into an object model that you can use to traverse nodes or locate specific elements. The library is meant as a foundation that you can build upon, and comes with many extension methods that make navigating the DOM easier. It also supports HTML rendering, so you can turn any HTML object tree back into code.
README
LtGt
Project status: maintenance mode (bug fixes only).
This project mostly serves an educational purpose. For real-life performance-critical applications I recommend using AngleSharp instead.
Download
- NuGet: dotnet add package LtGt
Features
- Parse any HTML5-compliant code
- Traverse the DOM using LINQ or Seq
- Use basic element selectors like GetElementById(), GetElementsByTagName(), etc.
- Use CSS selectors via QueryElements()
- Convert any HTML node to its equivalent Linq2Xml representation
- Render any HTML entity to code
- Targets .NET Framework 4.5+ and .NET Standard 1.6+
Screenshots
[dom](.screenshots/dom.png) [css selectors](.screenshots/css-selectors.png)
Usage
LtGt is written in F#, but it provides two separate idiomatic APIs: one for C# and one for F#.
Parse a document
C#
using LtGt;
const string html = @"<!doctype html>
<html>
<head>
<title>Document</title>
</head>
<body>
<div>Content</div>
</body>
</html>";
// This throws an exception on parse errors
var document = Html.ParseDocument(html);
// -or-
// This returns a wrapped result instead
var documentResult = Html.TryParseDocument(html);
if (documentResult.IsOk)
{
    // Handle result
    var document = documentResult.ResultValue;
}
else
{
    // Handle error
    var error = documentResult.ErrorValue;
}
F#
open LtGt
let html = "<!doctype html>
<html>
<head>
<title>Document</title>
</head>
<body>
<div>Content</div>
</body>
</html>"
// This throws an exception on parse errors
let document = Html.parseDocument html
// -or-
// This returns a wrapped result instead
match Html.tryParseDocument html with
| Result.Ok document -> () // handle result here
| Result.Error error -> () // handle error here
Parse a fragment
C#
const string html = "<div id=\"some-element\"><a href=\"https://example.com\">Link</a></div>";
// Parse an element node
var element = Html.ParseElement(html);
// Parse any node
var node = Html.ParseNode(html);
F#
let html = "<div id=\"some-element\"><a href=\"https://example.com\">Link</a></div>"
// Parse an element node
let element = Html.parseElement html
// Parse any node
let node = Html.parseNode html
Find a specific element
C#
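using System.Linq; // needed for FirstOrDefault()
var element1 = document.GetElementById("menu-bar");
var element2 = document.GetElementsByTagName("div").FirstOrDefault();
var element3 = document.GetElementsByClassName("floating-button floating-button--enabled").FirstOrDefault();
// Note: FirstOrDefault() returns null when no element matches,
// so null-check element2 and element3 before using them in real code
var element1Data = element1.GetAttributeValue("data");
var element2Id = element2.GetId();
var element3Text = element3.GetInnerText();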
F#
let element1 = document |> Html.tryElementById "menu-bar"
let element2 = document |> Html.elementsByTagName "div" |> Seq.tryHead
let element3 = document |> Html.elementsByClassName "floating-button floating-button--enabled" |> Seq.tryHead
let element1Data = element1 |> Option.bind (Html.tryAttributeValue "data")
let element2Id = element2 |> Option.bind Html.tryId
let element3Text = element3 |> Option.map Html.innerText
You can leverage the full power of CSS selectors as well.
C#
var element = document.QueryElements("div#main > span.container:empty").FirstOrDefault();
F#
let element = document |> CssSelector.queryElements "div#main > span.container:empty" |> Seq.tryHead
Check equality
You can compare two HTML entities by value, including their descendants.
C#
var element1 = new HtmlElement("span",
    new HtmlAttribute("id", "foo"),
    new HtmlText("bar"));
var element2 = new HtmlElement("span",
    new HtmlAttribute("id", "foo"),
    new HtmlText("bar"));
var element3 = new HtmlElement("span",
    new HtmlAttribute("id", "foo"),
    new HtmlText("oof"));
var firstTwoEqual = HtmlEntityEqualityComparer.Instance.Equals(element1, element2); // true
var lastTwoEqual = HtmlEntityEqualityComparer.Instance.Equals(element2, element3); // false
F#
let element1 = HtmlElement("span",
                           HtmlAttribute("id", "foo"),
                           HtmlText("bar"))
let element2 = HtmlElement("span",
                           HtmlAttribute("id", "foo"),
                           HtmlText("bar"))
let element3 = HtmlElement("span",
                           HtmlAttribute("id", "foo"),
                           HtmlText("oof"))
let firstTwoEqual = Html.equal element1 element2 // true
let lastTwoEqual = Html.equal element2 element3 // false
Convert to Linq2Xml
You can convert LtGt's objects to System.Xml.Linq objects (XNode, XElement, etc). This can be useful if you need to convert HTML to XML or if you want to use XPath to select nodes.
C#
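using System.Xml.Linq;  // XDocument
using System.Xml.XPath; // XPathSelectElements() extension method
var htmlDocument = Html.ParseDocument(html);
var xmlDocument = (XDocument) htmlDocument.ToXObject();
var elements = xmlDocument.XPathSelectElements("//input[@type=\"submit\"]");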
F#
open System.Xml.Linq  // XDocument
open System.Xml.XPath // XPathSelectElements() extension method
let htmlDocument = Html.parseDocument html
let xmlDocument = htmlDocument |> Html.toXObject :?> XDocument
let elements = xmlDocument.XPathSelectElements("//input[@type=\"submit\"]")
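To see what the XPath step does on its own, here is a minimal C# sketch that does not use LtGt at all. It builds an XDocument from hand-written, well-formed markup (standing in for a converted HTML document) and runs the same XPath expression against it. The XPathSketch class and FindSubmitInputs helper are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;  // XDocument, XElement
using System.Xml.XPath; // XPathSelectElements() extension method

public static class XPathSketch
{
    // Returns every <input type="submit"> element found anywhere in the tree.
    public static XElement[] FindSubmitInputs(XDocument xml) =>
        xml.XPathSelectElements("//input[@type=\"submit\"]").ToArray();

    public static void Main()
    {
        // Hand-written markup; with LtGt this tree would come from ToXObject() instead.
        var xml = XDocument.Parse(
            "<html><body><form>" +
            "<input type=\"text\" name=\"q\" />" +
            "<input type=\"submit\" value=\"Search\" />" +
            "</form></body></html>");

        foreach (var input in FindSubmitInputs(xml))
            Console.WriteLine(input.Attribute("value")?.Value); // prints "Search"
    }
}
```

Only the second input element matches, because the predicate [@type="submit"] filters on the attribute value.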
Render nodes
You can turn any entity into its equivalent HTML code.
C#
var element = new HtmlElement("div",
    new HtmlAttribute("id", "main"),
    new HtmlText("Hello world"));
var html = element.ToHtml(); // <div id="main">Hello world</div>
F#
let element = HtmlElement("div",
                          HtmlAttribute("id", "main"),
                          HtmlText("Hello world"))
let html = element |> Html.toHtml // <div id="main">Hello world</div>
Benchmarks
This is how LtGt compares to popular HTML libraries when parsing a document (in this case, a YouTube video watch page). The results are not in favor of LtGt: it is roughly 3.5 times slower than AngleSharp and about twice as slow as HtmlAgilityPack. If performance is important for your task, you should probably consider using a different parser. That said, these results are still pretty impressive for a parser built with parser combinators as opposed to a traditional manual approach.
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.14393.3384 (1607/AnniversaryUpdate/Redstone1)
Intel Core i5-4460 CPU 3.20GHz (Haswell), 1 CPU, 4 logical and 4 physical cores
Frequency=3125000 Hz, Resolution=320.0000 ns, Timer=TSC
.NET Core SDK=3.1.100
[Host] : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT DEBUG
DefaultJob : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
Method | Mean | Error | StdDev | Ratio | Rank |
---|---|---|---|---|---|
AngleSharp | 11.94 ms | 0.104 ms | 0.097 ms | 0.29 | 1 |
HtmlAgilityPack | 20.51 ms | 0.140 ms | 0.124 ms | 0.49 | 2 |
LtGt | 41.59 ms | 0.450 ms | 0.399 ms | 1.00 | 3 |
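A comparison like the one above could be set up roughly as follows. This is only a sketch, not the project's actual benchmark code. It assumes the BenchmarkDotNet, AngleSharp, HtmlAgilityPack and LtGt NuGet packages are referenced. The ParsingBenchmarks class name and the synthetic input document are assumptions made for illustration, since the original input (a YouTube watch page) is not included here.

```csharp
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[RankColumn] // adds the "Rank" column to the summary table
public class ParsingBenchmarks
{
    private string _html;

    [GlobalSetup]
    public void Setup()
    {
        // Synthetic stand-in document (assumption); swap in real page content as needed.
        var rows = Enumerable.Repeat(
            "<div class=\"item\"><a href=\"https://example.com\">Link</a></div>", 1000);
        _html = "<!doctype html><html><head><title>Test</title></head><body>"
                + string.Concat(rows) + "</body></html>";
    }

    [Benchmark(Description = "AngleSharp")]
    public object ParseWithAngleSharp() =>
        new AngleSharp.Html.Parser.HtmlParser().ParseDocument(_html);

    [Benchmark(Description = "HtmlAgilityPack")]
    public object ParseWithHtmlAgilityPack()
    {
        var document = new HtmlAgilityPack.HtmlDocument();
        document.LoadHtml(_html);
        return document;
    }

    // Baseline = true makes the "Ratio" column relative to this method.
    [Benchmark(Description = "LtGt", Baseline = true)]
    public object ParseWithLtGt() => LtGt.Html.ParseDocument(_html);
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<ParsingBenchmarks>();
}
```

Fully qualified type names are used because LtGt and HtmlAgilityPack both define a type called HtmlDocument. BenchmarkDotNet expects an optimized build, so run it with dotnet run -c Release. Timings on a synthetic document will differ from the table above.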