Configuring the LinkSpider Background Component for one site in Single-server and Multi-server architectures

The LinkSpider Background Component is part of the Active Integrity Maintenance (AIM) system.
During AIM scanning, several types of references are detected.

All internal references are stored using the AIM ContentRelations.

Other references are stored in the ExternalReferences table.

There are four categories of external references:

  • LocalFile : reference to a file on the local server (like: /images/pixel.gif)
  • Resource  : reference to a resource file (like: /res/open.gif)
  • External  : reference to an external site (like: http://www.google.nl)
  • Invalid   : reference whose type could not be determined (usually a misspelled reference)
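To make the four categories concrete, here is a minimal Python sketch of how a reference string could be sorted into them. It is not the AIM implementation. The function `classify_reference`, the `ReferenceType` enum and the `/res/` prefix rule are hypothetical assumptions derived only from the examples in the list above.

```python
from enum import Enum
from urllib.parse import urlparse


class ReferenceType(Enum):
    """The four external-reference categories named in the text."""
    LOCAL_FILE = "LocalFile"
    RESOURCE = "Resource"
    EXTERNAL = "External"
    INVALID = "Invalid"


# Hypothetical rule: a path starting with /res/ is a resource file.
# This is inferred from the /res/open.gif example only; the real AIM rule may differ.
RESOURCE_PREFIX = "/res/"


def classify_reference(ref: str) -> ReferenceType:
    """Assign a reference to one of the four categories (illustrative rules only)."""
    ref = ref.strip()
    parsed = urlparse(ref)
    # A full http(s) URL with a host name points to another site.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ReferenceType.EXTERNAL
    # A server-relative path (no scheme, no host) points to this server.
    if not parsed.scheme and not parsed.netloc and ref.startswith("/"):
        if ref.lower().startswith(RESOURCE_PREFIX):
            return ReferenceType.RESOURCE
        return ReferenceType.LOCAL_FILE
    # Anything else, such as "htp:/www.google.nl", could not be typed.
    return ReferenceType.INVALID
```

For example, `classify_reference("/res/open.gif")` yields `ReferenceType.RESOURCE`, while the misspelled `"htp:/www.google.nl"` falls through to `ReferenceType.INVALID`.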

External references are not tested during the AIM scanning process itself, because checking them could cause unwanted delays.
Instead, they are spidered asynchronously by the LinkSpider Background Component, which runs in a separate thread.
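The benefit of this split is that the scanner only hands references off and never waits for a slow check. The following Python sketch shows that producer/consumer pattern with a queue and a worker thread. The class `LinkSpiderSketch` and its methods are hypothetical and only illustrate the idea; they are not the LinkSpider Background Component's actual interface.

```python
import queue
import threading
from typing import Callable, Dict


class LinkSpiderSketch:
    """Hypothetical stand-in for a background link checker.

    The scanner calls submit() and continues immediately; a worker thread
    performs the (possibly slow) checks and records the results.
    """

    def __init__(self, check: Callable[[str], bool]) -> None:
        self._check = check                 # slow check, e.g. an HTTP request
        self._queue = queue.Queue()         # items: reference strings, or None to stop
        self._lock = threading.Lock()
        self.results: Dict[str, bool] = {}
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, ref: str) -> None:
        """Called by the scanner; queues the reference without checking it."""
        self._queue.put(ref)

    def _run(self) -> None:
        while True:
            ref = self._queue.get()
            if ref is None:                 # sentinel: stop the worker
                break
            ok = self._check(ref)
            with self._lock:
                self.results[ref] = ok

    def stop(self) -> None:
        """Let already-queued references finish, then stop the worker thread."""
        self._queue.put(None)
        self._worker.join()
```

Because the queue is first-in, first-out, calling `stop()` processes every reference submitted before it. In a test, the `check` function can be replaced by a simple lambda, so no network access is needed.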

Single-Server Architecture

With a single server, the LinkSpider process runs on that server. The LinkSpider is enabled by default.

Multi-Server Architecture

In a multi-server architecture, there are a few considerations:

 

  1. Multiple servers using a single Database
    Because the external references are stored in the database, all servers share the same set of external references, even though the references are found by AIM scanning on every server.
    Enabling the LinkSpider on all servers that use the same database can lead to redundant scanning, because each server scans all references. In addition, the servers may not have the same file system or the same privileges to access the Internet or other resources, so their scanning results may differ.
     
    In this scenario, configure the LinkSpider on a single server:
    • that has access to all relevant sources (such as the Internet), and
    • that hosts the file system you want to monitor.


  2. Multiple servers using multiple Databases
    In this scenario, the AIM process on each server stores its references in a different database.

    Some servers may have different content and files, due to a scale-out configuration or front-end content contribution (such as a forum). Some servers may also have different host headers configured as Internal (recognized by Smartsite). The scanning results may therefore differ from server to server.

    If every server has exactly the same content and files, configure the LinkSpider on the CMS server. Link scanning on the publication servers can still be enabled, but it is not necessary.

    If the content on the servers differs, it is advised to configure the LinkSpider on all servers.
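The placement guidance for both scenarios can be summarized as a small decision function. This Python sketch is only a restatement of the rules above. The `Server` fields and the `linkspider_hosts` function are hypothetical, and Smartsite itself is not configured through code like this.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Server:
    """Hypothetical description of one server (illustration only)."""
    name: str
    is_cms: bool = False
    can_reach_sources: bool = False    # e.g. has Internet access
    has_monitored_files: bool = False  # hosts the file system you want checked


def linkspider_hosts(servers: List[Server], shared_database: bool,
                     identical_content: bool = False) -> Set[str]:
    """Return the names of the servers on which to enable the LinkSpider."""
    if len(servers) == 1:
        # Single-server architecture: run it on that server (the default).
        return {servers[0].name}

    if shared_database:
        # Scenario 1: one shared database, so one server is enough. Pick one that
        # can reach every relevant source and hosts the files to monitor.
        for s in servers:
            if s.can_reach_sources and s.has_monitored_files:
                return {s.name}
        raise ValueError("no server can reach all sources and monitor the files")

    # Scenario 2: each server has its own database.
    if identical_content:
        # Same content everywhere: the CMS server suffices.
        return {s.name for s in servers if s.is_cms}
    # Content differs: enable the LinkSpider on every server.
    return {s.name for s in servers}
```

The `identical_content` flag defaults to `False` so that, when in doubt, the sketch falls back to the safer advice of enabling the LinkSpider on all servers.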

NOTE: Disabling the LinkSpider Background Component does NOT disable AIM.
The AIM scanning process should never be disabled.