Robots Exclusion Tools alternatives and similar packages
Based on the "CLI" category.
- spectre.console: A .NET library that makes it easier to create beautiful console applications.
- Command Line Parser: The best C# command line parser that brings standardized *nix getopt style to .NET. Includes F# support.
- CommandLineUtils: Command line parsing and utilities for .NET.
- CliFx: Class-first framework for building command-line interfaces.
- Colorful.Console: Style your .NET console output!
- Sieve: ⚗️ Clean & extensible Sorting, Filtering, and Pagination for ASP.NET Core.
- ReadLine: A pure C# GNU-Readline-like library for .NET/.NET Core.
- Console Framework: Cross-platform toolkit for easy development of TUI applications.
- Fluent Command Line Parser: A simple, strongly typed .NET C# command line parser library using a fluent, easy-to-use interface.
- Power Args: The ultimate .NET Standard command line argument parser.
- CommandDotNet: A modern framework for building modern CLI apps.
- UnionArgParser: A declarative CLI argument parser for F#.
- CsConsoleFormat: .NET C# library for advanced formatting of console output [Apache].
- Typin: Declarative framework for interactive CLI applications.
- EntryPoint: Composable CLI argument parser for all modern .NET platforms.
- NFlags: Simple yet powerful library that makes parsing CLI arguments easy and prints usage help "out of the box".
- Sitemap Tools: A sitemap (sitemap.xml) querying and parsing library for .NET.
- Appccelerate - Command Line Parser: A simple command line parser with a fluent definition API.
- RunInfoBuilder: A unique command line parser for .NET that utilizes object trees for commands.
- Jarilo: Framework for building .NET command line applications.
- JustCli: Just a quick way to create your own command line tool.
- Tamar.ANSITerm: “ANSITerm” provides ANSI escape codes and true color formatting for .NET Core's Console on Linux terminals.
- DarkXaHTeP.CommandLine: Allows creating command line applications using Microsoft.Extensions.CommandLineUtils together with DI, Configuration and Logging, in a convenient way similar to ASP.NET Core Hosting.
README
![Icon](images/icon.png)
Robots Exclusion Tools
A "robots.txt" parsing and querying library for .NET
Closely following the NoRobots RFC and other details on robotstxt.org.
Features
- Load Robots by string, by URI (Async) or by streams (Async)
- Supports multiple user-agents and "*"
- Supports `Allow` and `Disallow`
- Supports `Crawl-delay` entries
- Supports `Sitemap` entries
- Supports wildcard paths (*) as well as must-end-with declarations ($); see the sample file after this list
- Built-in "robots.txt" tokenization system (allowing extension to support other custom fields)
- Built-in "robots.txt" validator (allowing validation of a tokenized file)
- Dedicated parser for the data from the `<meta name="robots" />` tag and the `X-Robots-Tag` header
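For reference, here is a small illustrative robots.txt exercising those directives (hostnames and paths are placeholders):

User-agent: *
Crawl-delay: 10
Allow: /admin/help.html
Disallow: /admin/

User-agent: SomeBot
Disallow: /search*        # wildcard path
Disallow: /*.json$        # must end with ".json"

Sitemap: http://www.example.org/sitemap.xml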
Licensing and Support
Robots Exclusion Tools is licensed under the MIT license. It is free to use in personal and commercial projects.
There are support plans available that cover all active Turner Software OSS projects. Support plans provide private email support, expert usage advice for our projects, priority bug fixes and more. These support plans help fund our OSS commitments to provide better software for everyone.
NoRobots RFC Compatibility
This library attempts to stick closely to the rules defined in the RFC document, including:
- Global/any user-agent when none is explicitly defined (Section 3.2.1 of RFC)
- Field names (e.g. "User-agent") are character restricted (Section 3.3)
- Allow/disallow rules are evaluated in order of occurrence (Section 3.2.2); see the example after this list
- Loading by URI applies default rules based on access to "robots.txt" (Section 3.1)
- Interoperability for varying line endings (Section 5.2)
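Because allow/disallow rules are matched in order of occurrence, an earlier rule wins over a later, more specific one. An illustrative snippet:

User-agent: *
Disallow: /shop/
Allow: /shop/public/index.html    # never reached: the Disallow above matches first

Under these rules, to permit that page the Allow line would need to appear before the Disallow line.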
Tokenization & Validation
At the core of the library is a tokenization system that parses the file format. It follows the formal syntax rules defined in Section 3.3 of the NoRobots RFC, down to which characters are valid. When used in conjunction with the token validator, it can also enforce the correct token structure.
The major benefit of designing the library around this system is that it allows for greater extensibility.
If you want to support custom fields that the core `RobotsFile` class doesn't use, you can parse the data with the tokenizer yourself.
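For example, robots.txt files in the wild sometimes carry non-standard fields such as Host (used here purely as an illustration):

User-agent: *
Disallow: /private/
Host: www.example.org    # non-standard field; not used by RobotsFile

The core classes ignore the extra field, but you can pick it up with the tokenizer and act on it yourself.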
Parsing in-request robots rules (metatags and header)
Similar to the rules from a "robots.txt" file, there can be in-request rules deciding whether a page allows indexing or following links. The process of extracting this data from a request isn't currently part of this library, avoiding a dependency to parse HTML.
If you extract the raw rules from the metatags and the `X-Robots-Tag` header, you can pass those into the parser.
The parser takes an array of rules and returns a `RobotsPageDefinition`, which allows the rules to be queried by user agent.
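A minimal sketch of that flow for the header side, using HttpClient (this extraction code is not part of the library, and gathering metatag rules would additionally need an HTML parser of your choice):

using System;
using System.Linq;
using System.Net.Http;
using TurnerSoftware.RobotsExclusionTools;

using var httpClient = new HttpClient();
using var response = await httpClient.GetAsync("http://www.example.org/some/page");

// Collect any X-Robots-Tag header values; rules extracted from metatags
// could be appended to the same array before parsing.
var pageRules = response.Headers.TryGetValues("X-Robots-Tag", out var headerValues)
    ? headerValues.ToArray()
    : Array.Empty<string>();

var robotsPageParser = new RobotsPageParser();
RobotsPageDefinition robotsPageDefinition = robotsPageParser.FromRules(pageRules);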
Like the `RobotsFileParser`, this parser is built around the tokenization and validation system and is similarly extendable.
There is no RFC available to define the formats of metatag or `X-Robots-Tag` data.
The parser follows the base formatting rules for fields described in the NoRobots RFC, combined with rules from Google's documentation on the robots metatag.
There are ambiguities in the rules described there (such as whether rules are inherited from the global scope), so this library's behaviour may differ from other implementations.
Example Usage
Parsing a "robots.txt" file from URI
using TurnerSoftware.RobotsExclusionTools;
var robotsFileParser = new RobotsFileParser();
RobotsFile robotsFile = await robotsFileParser.FromUriAsync(new Uri("http://www.example.org/robots.txt"));
var allowedAccess = robotsFile.IsAllowedAccess(
new Uri("http://www.example.org/some/url/i-want-to/check"),
"MyUserAgent"
);
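The feature list also mentions loading from a raw string or a stream. A minimal sketch of that, assuming method names along the lines of FromString and FromStreamAsync (these names are an assumption based on the feature list, not confirmed by the README; check the library's API for the exact signatures):

using System;
using System.IO;
using System.Text;
using TurnerSoftware.RobotsExclusionTools;

var robotsText = "User-agent: *\nDisallow: /admin/";
var robotsFileParser = new RobotsFileParser();

// Method names below are assumptions, not confirmed by the README above.
RobotsFile fromString = robotsFileParser.FromString(robotsText, new Uri("http://www.example.org"));

using var stream = new MemoryStream(Encoding.UTF8.GetBytes(robotsText));
RobotsFile fromStream = await robotsFileParser.FromStreamAsync(stream, new Uri("http://www.example.org"));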
Parsing robots data from metatags or the `X-Robots-Tag` header
using TurnerSoftware.RobotsExclusionTools;
//These rules are gathered by you from the Robots metatag and `X-Robots-Tag` header
var pageRules = new[] {
"noindex, notranslate",
"googlebot: none",
"otherbot: nofollow",
"superbot: all"
};
var robotsPageParser = new RobotsPageParser();
RobotsPageDefinition robotsPageDefinition = robotsPageParser.FromRules(pageRules);
robotsPageDefinition.CanIndex("SomeNotListedBot/1.0"); //False
robotsPageDefinition.CanFollowLinks("SomeNotListedBot/1.0"); //True
robotsPageDefinition.Can("translate", "SomeNotListedBot/1.0"); //False
robotsPageDefinition.CanIndex("GoogleBot/1.0"); //False
robotsPageDefinition.CanFollowLinks("GoogleBot/1.0"); //False
robotsPageDefinition.Can("translate", "GoogleBot/1.0"); //False
robotsPageDefinition.CanIndex("OtherBot/1.0"); //False
robotsPageDefinition.CanFollowLinks("OtherBot/1.0"); //False
robotsPageDefinition.Can("translate", "OtherBot/1.0"); //False
robotsPageDefinition.CanIndex("superbot/1.0"); //True
robotsPageDefinition.CanFollowLinks("superbot/1.0"); //True
robotsPageDefinition.Can("translate", "superbot/1.0"); //True