Robots Exclusion Tools alternatives and similar packages
Based on the "CLI" category.
- Command Line Parser: The best C# command line parser that brings standardized *nix getopt style, for .NET. Includes F# support.
- Fluent Command Line Parser: A simple, strongly typed .NET C# command line parser library using a fluent, easy-to-use interface.
- DotMake Command-Line: Declarative syntax for System.CommandLine via attributes for easy, fast, strongly-typed (no reflection) usage. Includes a source generator which automagically converts your classes to CLI commands and properties to CLI options or CLI arguments.
- NFlags: Simple yet powerful library to make parsing CLI arguments easy. The library can also print usage help out of the box.
- DarkXaHTeP.CommandLine: Allows creating command-line applications using Microsoft.Extensions.CommandLineUtils together with DI, Configuration and Logging, in a convenient way similar to ASP.NET Core Hosting.
- Tamar.ANSITerm: "ANSITerm" provides ANSI escape codes and true color formatting for .NET Core's Console on Linux terminals.
README
![Icon](images/icon.png)
Robots Exclusion Tools
A "robots.txt" parsing and querying library for .NET
Closely following the NoRobots RFC and other details on robotstxt.org.
Features
- Load Robots by string, by URI (Async) or by streams (Async)
- Supports multiple user-agents and "*"
- Supports `Allow` and `Disallow`
- Supports `Crawl-delay` entries
- Supports `Sitemap` entries
- Supports wildcard paths (*) as well as must-end-with declarations ($)
- Built-in "robots.txt" tokenization system (allowing extension to support other custom fields)
- Built-in "robots.txt" validator (allowing validation of a tokenized file)
- Dedicated parser for the data from the `<meta name="robots" />` tag and the `X-Robots-Tag` header
Licensing and Support
Robots Exclusion Tools is licensed under the MIT license. It is free to use in personal and commercial projects.
There are support plans available that cover all active Turner Software OSS projects. Support plans provide private email support, expert usage advice for our projects, priority bug fixes and more. These support plans help fund our OSS commitments to provide better software for everyone.
NoRobots RFC Compatibility
This library attempts to stick closely to the rules defined in the RFC document, including:
- Global/any user-agent when none is explicitly defined (Section 3.2.1 of RFC)
- Field names (e.g. "User-agent") are character restricted (Section 3.3)
- Allow/disallow rules are applied in order of occurrence (Section 3.2.2)
- Loading by URI applies default rules based on access to "robots.txt" (Section 3.1)
- Interoperability for varying line endings (Section 5.2)
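As a worked sketch of the order-of-occurrence and must-end-with behaviour: the robots.txt content below is a hypothetical example, while `FromUriAsync` and `IsAllowedAccess` are the same calls shown in the usage examples further down.

```csharp
using System;
using TurnerSoftware.RobotsExclusionTools;

// Suppose http://www.example.org/robots.txt returned:
//
//   User-agent: *
//   Disallow: /public/archive$
//   Allow: /public/
//   Disallow: /
//
var robotsFileParser = new RobotsFileParser();
RobotsFile robotsFile = await robotsFileParser.FromUriAsync(new Uri("http://www.example.org/robots.txt"));

// First matching rule wins: "/public/articles" matches "Allow: /public/" -> allowed (true).
var articlesAllowed = robotsFile.IsAllowedAccess(
    new Uri("http://www.example.org/public/articles"),
    "MyUserAgent");

// "/public/archive" hits the must-end-with rule "Disallow: /public/archive$" first -> not allowed (false).
var archiveAllowed = robotsFile.IsAllowedAccess(
    new Uri("http://www.example.org/public/archive"),
    "MyUserAgent");
```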
Tokenization & Validation
At the core of the library is a tokenization system used to parse the file format. It follows the formal syntax rules defined in Section 3.3 of the NoRobots RFC for which characters are valid. When used in conjunction with the token validator, it can enforce the correct token structure too.
The major benefit of designing the library around this system is that it allows for greater extensibility.
If you want to support custom fields that the core `RobotsFile` class doesn't use, you can parse the data with the tokenizer yourself, as sketched below.
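A rough sketch of that extension point follows. The tokenizer type and token member names used here (`RobotsFileTokenizer`, `Token`, `TokenType`, `Value`) are assumptions for illustration and may not match the library's actual tokenization API.

```csharp
using System;
using System.Linq;
using TurnerSoftware.RobotsExclusionTools.Tokenization; // assumed namespace

var robotsText = "User-agent: *\nDisallow: /private/\nHost: example.org";

// Hypothetical tokenizer usage: tokenize the raw text, then walk the tokens
// to find the value that follows a custom "Host" field.
var tokenizer = new RobotsFileTokenizer();
var tokens = tokenizer.Tokenize(robotsText).ToList();

string hostValue = null;
for (var i = 0; i < tokens.Count - 1; i++)
{
    if (tokens[i].TokenType == TokenType.Field && tokens[i].Value == "Host")
    {
        // Take the next value token after the field (delimiter tokens may sit between).
        hostValue = tokens.Skip(i + 1).FirstOrDefault(t => t.TokenType == TokenType.Value)?.Value;
        break;
    }
}

Console.WriteLine(hostValue); // "example.org" if the custom field is present
```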
Parsing in-request robots rules (metatags and header)
Similar to the rules from a "robots.txt" file, there can be in-request rules that decide whether a page allows indexing or following links. Extracting this data from a request isn't currently part of this library, which avoids taking a dependency on an HTML parser.
If you extract the raw rules from the metatags and the `X-Robots-Tag` header, you can pass them into the parser.
The parser takes an array of rules and returns a `RobotsPageDefinition`, which allows querying of the rules by user agent.
Like the `RobotsFileParser`, this parser is built around the tokenization and validation system and is similarly extendable.
There is no RFC defining the format of metatag or `X-Robots-Tag` data.
The parser follows the base formatting rules described in the NoRobots RFC regarding fields, combined with rules from Google's documentation on the robots metatag.
There are ambiguities in the rules described there (like whether rules are inherited from the global scope), so this implementation may behave differently to others.
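For example, a minimal sketch of gathering the `X-Robots-Tag` header values yourself before handing them to the parser; the header extraction uses standard `HttpClient` APIs and is not part of this library, and scraping the metatag rules from the HTML is left out.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using TurnerSoftware.RobotsExclusionTools;

using var httpClient = new HttpClient();
using var response = await httpClient.GetAsync(new Uri("http://www.example.org/some/page"));

// The X-Robots-Tag header can appear multiple times; collect every value.
var headerRules = response.Headers.TryGetValues("X-Robots-Tag", out var values)
    ? values.ToArray()
    : Array.Empty<string>();

// Rules scraped from <meta name="robots" /> tags would be added to this array as well.
var robotsPageParser = new RobotsPageParser();
RobotsPageDefinition robotsPageDefinition = robotsPageParser.FromRules(headerRules);

var canIndex = robotsPageDefinition.CanIndex("MyUserAgent");
```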
Example Usage
Parsing a "robots.txt" file from URI
```csharp
using TurnerSoftware.RobotsExclusionTools;

var robotsFileParser = new RobotsFileParser();
RobotsFile robotsFile = await robotsFileParser.FromUriAsync(new Uri("http://www.example.org/robots.txt"));

var allowedAccess = robotsFile.IsAllowedAccess(
    new Uri("http://www.example.org/some/url/i-want-to/check"),
    "MyUserAgent"
);
```
Parsing robots data from metatags or the `X-Robots-Tag` header
```csharp
using TurnerSoftware.RobotsExclusionTools;

//These rules are gathered by you from the Robots metatag and `X-Robots-Tag` header
var pageRules = new[] {
    "noindex, notranslate",
    "googlebot: none",
    "otherbot: nofollow",
    "superbot: all"
};

var robotsPageParser = new RobotsPageParser();
RobotsPageDefinition robotsPageDefinition = robotsPageParser.FromRules(pageRules);

robotsPageDefinition.CanIndex("SomeNotListedBot/1.0"); //False
robotsPageDefinition.CanFollowLinks("SomeNotListedBot/1.0"); //True
robotsPageDefinition.Can("translate", "SomeNotListedBot/1.0"); //False

robotsPageDefinition.CanIndex("GoogleBot/1.0"); //False
robotsPageDefinition.CanFollowLinks("GoogleBot/1.0"); //False
robotsPageDefinition.Can("translate", "GoogleBot/1.0"); //False

robotsPageDefinition.CanIndex("OtherBot/1.0"); //False
robotsPageDefinition.CanFollowLinks("OtherBot/1.0"); //False
robotsPageDefinition.Can("translate", "OtherBot/1.0"); //False

robotsPageDefinition.CanIndex("superbot/1.0"); //True
robotsPageDefinition.CanFollowLinks("superbot/1.0"); //True
robotsPageDefinition.Can("translate", "superbot/1.0"); //True
```
*Note that all licence references and agreements mentioned in the Robots Exclusion Tools README section above
are relevant to that project's source code only.