Popularity: 1.9 (Declining)
Activity: 0.0 (Stable)
Stars: 42
Watchers: 5
Forks: 7
Last Commit: about 10 years ago
Code Quality Rank: L4
Programming language: C#
License: MIT License
Tags: Misc

AzureCrawler alternatives and similar packages

Based on the "Misc" category.

Polly (popularity 9.7, activity 9.8, code quality L3)
Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner. From version 6.0.1, Polly targets .NET Standard 1.1 and 2.0+.

FluentValidation (popularity 9.6, activity 7.9, code quality L5)
A popular .NET validation library for building strongly-typed validation rules.

MediatR (popularity 9.6, activity 6.2, code quality L5)
Simple, unambitious mediator implementation in .NET

Humanizer (popularity 9.5, activity 9.6, code quality L3)
Humanizer meets all your .NET needs for manipulating and displaying strings, enums, dates, times, timespans, numbers and quantities

Edge.js (popularity 9.1, activity 0.0, code quality L1)
Run .NET and Node.js code in-process on Windows, MacOS, and Linux

CsvHelper (popularity 9.0, activity 8.6, code quality L3)
Library to help reading and writing CSV files

Jint (popularity 8.7, activity 9.1, code quality L3)
Javascript Interpreter for .NET

ReactJS.NET (popularity 8.4, activity 0.0, code quality L3)
.NET library for JSX compilation and server-side rendering of React components

Rant (popularity 8.3, activity 2.5, code quality L2)
The Rant Procedural Text Generation DSL http://berkin.me/rant/

Coravel (popularity 8.3, activity 6.3)
Near-zero config .NET library that makes advanced application features like Task Scheduling, Caching, Queuing, Event Broadcasting, and more a breeze!

YoutubeExplode (popularity 8.2, activity 9.0)
Abstraction layer over YouTube's internal API

ScriptCS (popularity 8.0, activity 0.0, code quality L3)
Write C# apps with a text editor, nuget and the power of Roslyn!

Hashids.net (popularity 8.0, activity 4.5)
A small .NET package to generate YouTube-like hashes from one or many numbers. Use hashids when you do not want to expose your database ids to the user.

Enums.NET (popularity 6.9, activity 3.6)
Enums.NET is a high-performance type-safe .NET enum utility library

Scientist.NET (popularity 6.7, activity 0.0)
A .NET library for carefully refactoring critical paths. It's a port of GitHub's Ruby Scientist library

WorkflowEngine (popularity 6.4, activity 5.8)
WorkflowEngine.NET - component that adds workflow in your application. It can be fully integrated into your application, or be in the form of a specific service (such as a web service).

Jurassic (popularity 6.1, activity 2.4, code quality L1)
A .NET library to parse and execute JavaScript code.

ENet-CSharp (popularity 5.9, activity 6.2)
Reliable UDP networking library

HidLibrary (popularity 5.9, activity 0.0)
This library enables you to enumerate and communicate with Hid compatible USB devices in .NET.

TinyMapper (popularity 5.7, activity 0.0, code quality L5)
A quick object-object mapper for .NET

DeviceId (popularity 5.5, activity 6.5)
A simple library providing functionality to generate a 'device ID' that can be used to uniquely identify a computer.

Warden (popularity 5.3, activity 0.0, code quality L4)
Define "health checks" for your applications, resources and infrastructure. Keep your Warden on the watch.

Guard (popularity 5.2, activity 0.0)
A high-performance, extensible argument validation library.

Aeron.NET (popularity 5.2, activity 1.6, code quality L5)
Efficient reliable UDP unicast, UDP multicast, and IPC message transport - .NET port of Aeron

Jot (popularity 5.1, activity 5.8, code quality L5)
Jot is a library for persisting and applying .NET application state.

ByteSize (popularity 4.9, activity 2.7, code quality L4)
ByteSize is a utility class that makes byte size representation in code easier by removing ambiguity of the value being represented. ByteSize is to bytes what System.TimeSpan is to time.

Jering.Javascript.NodeJS (popularity 4.6, activity 6.8)
Invoke Javascript in NodeJS, from C#

Streams (popularity 4.6, activity 0.0)
A lightweight F#/C# library for efficient functional-style pipelines on streams of data.

Mediator.Net (popularity 4.4, activity 3.9, code quality L4)
A simple mediator for .Net for sending command, publishing event and request response with pipelines supported

DeviceDetector.NET (popularity 4.3, activity 7.2)
The Universal Device Detection library will parse any User Agent and detect the browser, operating system, device used (desktop, tablet, mobile, tv, cars, console, etc.), brand and model.

LINQPad.QueryPlanVisualizer (popularity 4.3, activity 5.1)
SQL Server and PostgreSQL query execution plan visualizer for LINQPad

TypeShape (popularity 3.9, activity 2.7)
Practical generic programming for F#

Valit (popularity 3.9, activity 0.0)
Valit is dead simple validation for .NET Core. No more if-statements all around your code. Write nice and clean fluent validators instead!

https://github.com/minhhungit/ConsoleTableExt (popularity 3.9, activity 3.9)
A fluent library to print out a nicely formatted table in a console application C#

FormHelper (popularity 3.9, activity 0.0)
ASP.NET Core - Transform server-side validations to client-side without writing any javascript code. (Compatible with Fluent Validation)

SolidSoils4Arduino (popularity 3.8, activity 3.2, code quality L4)
C# .NET - Arduino library supporting simultaneous serial ASCII, Firmata and I2C communication

SystemWrapper (popularity 3.6, activity 0.0)
.NET library for easier testing of system APIs.

Shielded (popularity 3.6, activity 0.0, code quality L2)
A strict and mostly lock-free Software Transactional Memory (STM) for .NET

Validot (popularity 3.6, activity 2.7)
Validot is a performance-first, compact library for advanced model validation. Using a simple declarative fluent interface, it efficiently handles classes, structs, nested members, collections, nullables, plus any relation or combination of them. It also supports translations, custom logic extensions with tests, and DI containers.

RecordParser (popularity 3.5, activity 7.9)
Zero Allocation Writer/Reader Parser for .NET Core

LinkCrawler (popularity 3.4, activity 0.0, code quality L5)
Find broken links in webpage

BitSharp (popularity 3.2, activity 0.0, code quality L3)
C# Bitcoin Node

Outcome.NET (popularity 3.1, activity 3.4, code quality L5)
Never write a result wrapper again! Outcome.NET is a simple, powerful helper for methods that return a value, but sometimes also need to return validation messages, warnings, or a success bit.

NaturalSort.Extension (popularity 3.0, activity 7.9)
🔀 Extension method for StringComparison that adds support for natural sorting (e.g. "abc1", "abc2", "abc10" instead of "abc1", "abc10", "abc2").

FlatMapper (popularity 2.7, activity 0.0, code quality L1)
FlatMapper is a library to import and export data from and to plain text files.

NIdenticon (popularity 2.6, activity 0.0, code quality L2)
NIdenticon is a library for creating simple Identicons

SystemTextJson.JsonDiffPatch (popularity 2.5, activity 3.4)
High-performance, low-allocating JSON object diff and patch extension for System.Text.Json. Support generating patch document in RFC 6902 JSON Patch format.

BerTlv.NET (popularity 1.9, activity 0.0, code quality L5)
A library for parsing BER TLV data (like EMV credit cards).

dotnet-exec (popularity 1.8, activity 9.6)
dotnet execute with custom entry point, another dotnet run without project file

trybot (popularity 1.6, activity 3.6)
A transient fault handling framework including such resiliency solutions as Retry, Timeout, Fallback, Rate Limit and Circuit Breaker.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

What's this?

If you are developing applications with modern JavaScript frameworks such as Angular, Ember, or Durandal, you probably already know that these applications are not crawlable by search engine robots without a couple of extra steps.

SEO according to Google

If you want your JavaScript application to be crawlable, you need to implement some steps yourself. You can find information about the process in this Google document. Take a look at it to better understand what is required on both the client and the server side.

What is AzureCrawler about?

AzureCrawler helps with taking HTML Snapshots of your dynamically generated content.

This project is specific to Azure and is ready to be deployed as a Cloud Service. AzureCrawler is a Worker Role that uses OWIN to Self-Host a Web API.

That said, it's easy to bring the code into your own solution if you don't want to run it as a separate Cloud Service. Likewise, if you are not using .NET and Azure, it is not complicated to port it to another platform such as Amazon Web Services.

How does AzureCrawler work?

The self-hosted Web API contained in AzureCrawler exposes a resource with an endpoint at:

POST api/snapshot

If you make an API call to that endpoint, a PhantomJS process runs and takes the HTML snapshot of the provided URL.

You can pass the following parameters in the body of the POST call:

string ApiId (required). The application identifier
string Application (required). The application name
string Url (required). The URL to crawl
bool Store (optional). Whether to store the snapshot for future calls
DateTime ExpirationDate (optional). The expiration date of the stored snapshot
string UserAgent (optional). The user agent of the bot crawling your application
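
As a rough illustration of the call described above, here is a minimal client sketch. It assumes the body is sent as JSON with property names matching the field names in the list; the README does not state the wire format, so treat that as an assumption. SnapshotRequest, SnapshotClient, and the base address are hypothetical names introduced for this example, and the code targets a current .NET runtime rather than the one the project was written for.

```csharp
#nullable disable
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

// Hypothetical client-side DTO mirroring the fields listed above.
public class SnapshotRequest
{
    public string ApiId { get; set; }                 // required
    public string Application { get; set; }           // required
    public string Url { get; set; }                   // required
    public bool Store { get; set; }                   // optional
    public DateTime? ExpirationDate { get; set; }     // optional
    public string UserAgent { get; set; }             // optional
}

public static class SnapshotClient
{
    // Optional fields that were left null are omitted from the JSON body.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
    };

    // Assumption: the service accepts JSON with these exact property names.
    public static string ToJson(SnapshotRequest request) =>
        JsonSerializer.Serialize(request, Options);

    // Posts the request to "api/snapshot" relative to baseAddress
    // (which should end with a slash) and returns the response body.
    public static async Task<string> RequestSnapshotAsync(
        HttpClient http, Uri baseAddress, SnapshotRequest request)
    {
        var content = new StringContent(ToJson(request), Encoding.UTF8, "application/json");
        using var response = await http.PostAsync(new Uri(baseAddress, "api/snapshot"), content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

A caller would fill in the three required fields, pass its own HttpClient and the address where the Worker Role is listening, and await RequestSnapshotAsync to get the snapshot.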

The ApiId and Application fields are required and are validated together.

There is no special mechanism for this validation other than the following private method:

/// <summary>
/// Validate the API key and application name. In the real world you should check this against a custom store
/// </summary>
/// <param name="apiKey">The api key</param>
/// <param name="application">The application</param>
/// <returns>bool</returns>
private bool ValidateCredentials(string apiKey, string application)
{
    if (apiKey == "Any ApiId" && application == "Any Application name")
    {
        return true;
    }
    return false;
}

So you can supply a different mechanism, use your own keys, or use a database to store application credentials.
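
One way to make that replacement easier is to put the check behind a small interface. The sketch below is not part of AzureCrawler: ICredentialValidator and InMemoryCredentialValidator are hypothetical names, and the in-memory dictionary stands in for whatever store (configuration, database, table storage) you would actually use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over the credential check.
public interface ICredentialValidator
{
    bool Validate(string apiId, string application);
}

// Illustrative implementation backed by a dictionary that maps
// each ApiId to the application name it belongs to.
public sealed class InMemoryCredentialValidator : ICredentialValidator
{
    private readonly IReadOnlyDictionary<string, string> _applicationsByApiId;

    public InMemoryCredentialValidator(IReadOnlyDictionary<string, string> applicationsByApiId)
    {
        _applicationsByApiId = applicationsByApiId
            ?? throw new ArgumentNullException(nameof(applicationsByApiId));
    }

    // Both values must be present and must match the same stored entry,
    // so the two fields are validated together.
    public bool Validate(string apiId, string application)
    {
        if (string.IsNullOrWhiteSpace(apiId) || string.IsNullOrWhiteSpace(application))
        {
            return false;
        }

        return _applicationsByApiId.TryGetValue(apiId, out var expected)
            && string.Equals(expected, application, StringComparison.Ordinal);
    }
}
```

The existing ValidateCredentials method could then delegate to an injected ICredentialValidator, and a database-backed implementation could be swapped in without touching the controller.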

The Url is the resource you want to crawl. The PhantomJS process takes the snapshot and waits until all the dynamically generated content has loaded.
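
The README does not show how the PhantomJS process is launched, so the following is only a sketch of one plausible approach: start phantomjs with a script and the target URL, then read the rendered HTML from standard output. PhantomRunner, the executable path, and the script (which would be responsible for waiting until the page has finished rendering) are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; not taken from the AzureCrawler source.
public static class PhantomRunner
{
    // Builds the start info for "phantomjs <script> <url>" with stdout redirected.
    public static ProcessStartInfo BuildStartInfo(string phantomPath, string scriptPath, string url)
    {
        var info = new ProcessStartInfo
        {
            FileName = phantomPath,
            RedirectStandardOutput = true,
            UseShellExecute = false,   // required in order to redirect output
            CreateNoWindow = true
        };
        // ArgumentList takes care of quoting each argument.
        info.ArgumentList.Add(scriptPath);
        info.ArgumentList.Add(url);
        return info;
    }

    // Runs PhantomJS and returns whatever the script writes to stdout,
    // assumed here to be the final HTML of the page.
    public static string TakeSnapshot(string phantomPath, string scriptPath, string url)
    {
        using var process = Process.Start(BuildStartInfo(phantomPath, scriptPath, url))
            ?? throw new InvalidOperationException("PhantomJS could not be started.");

        // Read before waiting so a full output buffer cannot block the child process.
        string html = process.StandardOutput.ReadToEnd();
        process.WaitForExit();

        if (process.ExitCode != 0)
        {
            throw new InvalidOperationException("PhantomJS exited with code " + process.ExitCode + ".");
        }
        return html;
    }
}
```

In a real deployment you would probably also want a timeout, so that a page that never finishes loading cannot hold the worker indefinitely.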

The last fields provide the information needed to store the HTML snapshot in the store you prefer.

By default, AzureCrawler will store the snapshots in Azure Storage within a blob container with the name of the Application field.

If you do this, the next time a bot requests the same Url, the snapshot is served from storage.

When the stored snapshot expires, a new crawl is performed and a new snapshot is stored.
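
The store-then-expire behaviour described above can be sketched without any Azure dependency. The class below keeps snapshots in memory rather than in a blob container, so it only illustrates the lookup logic, not the actual storage code. SnapshotCache is a hypothetical name, and treating a missing ExpirationDate as "never expires" is an assumption the README does not confirm.

```csharp
using System;
using System.Collections.Generic;

// Illustrative in-memory stand-in for the snapshot store.
public sealed class SnapshotCache
{
    private readonly Dictionary<string, (string Html, DateTime ExpiresUtc)> _entries =
        new Dictionary<string, (string Html, DateTime ExpiresUtc)>();
    private readonly Func<string, string> _crawl;   // takes a URL, returns its HTML
    private readonly Func<DateTime> _utcNow;        // injectable clock, useful for tests

    public SnapshotCache(Func<string, string> crawl)
        : this(crawl, () => DateTime.UtcNow)
    {
    }

    public SnapshotCache(Func<string, string> crawl, Func<DateTime> utcNow)
    {
        _crawl = crawl ?? throw new ArgumentNullException(nameof(crawl));
        _utcNow = utcNow ?? throw new ArgumentNullException(nameof(utcNow));
    }

    public string GetOrCrawl(string application, string url, bool store, DateTime? expirationUtc)
    {
        // Snapshots are grouped per application, like the blob container per Application.
        string key = application + "\n" + url;

        // Serve the stored snapshot while it has not expired.
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresUtc > _utcNow())
        {
            return entry.Html;
        }

        // Missing or expired: crawl again and, if requested, store the new snapshot.
        string html = _crawl(url);
        if (store)
        {
            // Assumption: no expiration date means the snapshot never expires.
            _entries[key] = (html, expirationUtc ?? DateTime.MaxValue);
        }
        return html;
    }
}
```

The crawl delegate is where the PhantomJS call would go; swapping the dictionary for blob reads and writes keeps the same control flow.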

Known issues

There is an incompatibility between the Azure Compute Emulator included in SDK 2.2 and the latest 3.x Storage assemblies, so you should test with live containers until the next Azure toolkit is released.