Tika

Smartsite 7.9 - ...

Purpose

Tika extracts text and metadata from documents. It runs as a Windows service and exposes a stateless web service, which the components of the Enterprise Search solution access over HTTP.
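As an illustration of the stateless HTTP interface, the following Python sketch builds a request that sends a document to the server for text extraction. It assumes the standard tika-server endpoint /tika (PUT, plain-text response) and reuses the example host and port from this page; the function name build_extract_request is hypothetical.

```python
import urllib.request


def build_extract_request(base_url: str, document: bytes) -> urllib.request.Request:
    """Build a PUT request asking the Tika server for the plain text of a document."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tika",   # assumed text-extraction endpoint
        data=document,                         # raw bytes of the document
        method="PUT",
        headers={"Accept": "text/plain"},      # ask for plain text back
    )
```

The request can then be sent with urllib.request.urlopen(request); every call is independent of earlier calls, which is what makes the service stateless from the client's point of view.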

Version

Tika:

  • Tika-server standard v2.6.0.

Tika is not 32-bit or 64-bit specific; this is determined by the Java runtime environment.

Service manager:

  • Nssm 2.24.

The service manager comes in a 32-bit version and a 64-bit version. Use the 64-bit version.

Media

The installation media includes folder Enterprise Search\R12.1\Tika, containing:

  • tika-server-standard-2.6.0.jar

and folder Enterprise Search\R12.1\Service Manager, containing:

  • nssm-2.24.zip.

Prerequisites

The following prerequisites apply to the Tika server.

To allow side-by-side installations that use different versions of the Java runtime environment, the path to java.exe is no longer added to the system PATH. Instead, java.exe is fully qualified both when running interactively and when specified for the service manager. Likewise, JAVA_HOME is no longer defined as a system environment variable; it is set as an environment variable within the service manager instead.
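The idea of pinning one Java installation per process, rather than relying on the system PATH or a system-wide JAVA_HOME, can be sketched in Python as follows. This is an illustration only: the function name java_launch_settings is hypothetical, and the Java folder used in the usage note is the example path from this page.

```python
import os
from pathlib import PureWindowsPath


def java_launch_settings(java_home, base_env=None):
    """Return (java_exe, env) for one specific Java installation.

    java_exe is the fully qualified path to java.exe, so the system PATH is
    not consulted. env is a copy of the base environment with JAVA_HOME set
    for this process only, leaving the system environment untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = java_home
    java_exe = str(PureWindowsPath(java_home) / "bin" / "java.exe")
    return java_exe, env
```

For example, java_exe, env = java_launch_settings(r"E:\Program Files\Java\18.36") followed by subprocess.run([java_exe, "-version"], env=env) would run that specific runtime, while another service on the same machine can use a different Java folder in the same way.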

First time installation

These are the steps for a first time installation.

  1. Create an installation folder, for example E:\Program Files\Tika\2.6.0.
  2. Copy tika-server-standard-2.6.0.jar to the installation folder.
  3. Unpack nssm-2.24.zip in a temporary folder. Go to subfolder win64. Copy nssm.exe to the Tika installation folder.

Decide which port to use for the Tika server.

  1. Tika-server will use port 9998 by default.
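Before settling on a port, it can help to check that nothing else is already listening on it. The sketch below is one possible check, not part of the Tika or NSSM tooling; the function name port_is_free is hypothetical, and the check only reflects the moment at which it runs.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
        return True
```

Calling port_is_free(9998) on the intended Tika server indicates whether the default port is available or whether a different value should be passed with --port.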

Test interactively.

  1. Start a command box, as administrator.
  2. Go to the installation folder, for example E:\Program Files\Tika\2.6.0.
  3. Run: "E:\Program Files\Java\18.36\bin\java" -jar tika-server-standard-2.6.0.jar --host=host-126.example.com --port=9998
  4. Inspect the feedback. Expect a final message like Started Apache Tika server at http://host-126.example.com:9998/.
  5. Perform a sanity check. Start a browser and issue http://host-126.example.com:9998/. Expect a list of Tika endpoint descriptions.
  6. Stop with Ctrl-C.
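The browser-based sanity check can also be scripted. The following Python sketch polls the server URL until it answers with HTTP 200 or a timeout expires. The function name wait_until_up and its parameters are hypothetical; the fetch parameter exists only so that the polling logic can be exercised without a running server.

```python
import time
import urllib.request


def wait_until_up(url, timeout_s=30.0, interval_s=1.0, fetch=None):
    """Poll url until it answers with HTTP 200; return False on timeout."""
    if fetch is None:
        def fetch(target):
            # Default: a real HTTP GET that returns the status code.
            with urllib.request.urlopen(target, timeout=5) as response:
                return response.status

    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # Connection refused, timeouts and HTTP errors are all OSError
            # subclasses; treat them as "not up yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With the example host of this page, wait_until_up("http://host-126.example.com:9998/") returns True once the server responds.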

Install and start Tika-server as a Windows service.

  1. Install the service, using the command line as tested in the steps above. Run: nssm install Tika-server "E:\Program Files\Java\18.36\bin\java" -jar tika-server-standard-2.6.0.jar --host=host-126.example.com --port=9998
  2. Specify the installation folder as the working directory. Run: nssm set Tika-server AppDirectory "E:\Program Files\Tika\2.6.0"
  3. Run: nssm set Tika-server Description "Tika document text and metadata extraction"
  4. Run: nssm set Tika-server AppEnvironmentExtra JAVA_HOME="E:\Program Files\Java\18.36"
  5. Run: nssm start Tika-server
  6. Configure failure recovery. Run: sc failure Tika-server actions= restart/60000/restart//reboot/ reset= 86400 (for information: this requests a restart of the service after 1 minute if the service fails a first or second time, a restart of the computer if the service fails subsequent times, and a reset of the failure count after 1 day).
  7. Perform a sanity check. Start a browser and issue http://host-126.example.com:9998/. Expect a list of Tika endpoint descriptions.
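When the same service has to be installed on several machines, the nssm steps above can be generated from a few parameters. The Python sketch below only builds the command lines and does not run them. The function names are hypothetical; subprocess.list2cmdline is used to show how Windows quoting applies to arguments that contain spaces.

```python
import subprocess


def nssm_commands(service, java_home, install_dir, jar, host, port, description):
    """Return the nssm invocations from the steps above as argument lists."""
    java_exe = java_home + r"\bin\java"
    return [
        ["nssm", "install", service, java_exe, "-jar", jar,
         f"--host={host}", f"--port={port}"],
        ["nssm", "set", service, "AppDirectory", install_dir],
        ["nssm", "set", service, "Description", description],
        ["nssm", "set", service, "AppEnvironmentExtra", f"JAVA_HOME={java_home}"],
        ["nssm", "start", service],
    ]


def as_command_line(args):
    """Render an argument list as a Windows command line, quoting where needed."""
    return subprocess.list2cmdline(args)
```

Printing as_command_line(cmd) for each entry yields lines that can be pasted into a command box; arguments such as the installation folder are wrapped in double quotes because they contain a space.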

Remote usage

For remote usage, the Tika port must be reachable through the firewall.

  • On the Tika server, add a firewall rule:
    • Inbound rule of type port
    • TCP, local port 9998
    • Allow the connection
    • Domain + Private + Public
    • Name: Tika-server-2.6.0
    • Description: Tika document text and metadata extraction
  • Consider using the firewall scope configuration to limit access to the Tika server to certain IP addresses or ranges of IP addresses.
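The same rule can presumably be created from the command line with netsh advfirewall instead of the firewall console. The Python sketch below only assembles the argument list for such a call and does not execute it; the function name firewall_rule_command is hypothetical, the mapping of the settings above to netsh parameters is an assumption that should be verified on the target Windows version, and the address range in the usage note is a made-up example.

```python
def firewall_rule_command(name, port, description, remote_ips=None):
    """Build a netsh argument list for an inbound TCP allow rule."""
    args = [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={name}",
        "dir=in",                           # inbound rule
        "action=allow",                     # allow the connection
        "protocol=TCP",
        f"localport={port}",
        "profile=domain,private,public",    # Domain + Private + Public
        f"description={description}",
    ]
    if remote_ips:
        # Optional scope restriction: only these addresses or ranges may connect.
        args.append("remoteip=" + ",".join(remote_ips))
    return args
```

For example, firewall_rule_command("Tika-server-2.6.0", 9998, "Tika document text and metadata extraction", ["10.0.0.0/24"]) restricts the rule to one hypothetical subnet; the resulting list could be passed to subprocess.run from an elevated session.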

Installation update

An installation update is needed when the Tika version changes, or when the parameters passed in the java command, including the host and port, have to change.

  1. Start a command box, as administrator.
  2. Go to the current installation folder, for example E:\Program Files\Tika\2.6.0.
  3. Stop the service: nssm stop Tika-server.
  4. Remove the service: nssm remove Tika-server.
  5. Repeat the applicable steps of the above first time installation.

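Alternatively, when only the service parameters change, edit the existing service in place:

  1. Stop the service: nssm stop Tika-server
  2. Edit the service settings in the NSSM service editor: nssm edit Tika-server
  3. Start the service: nssm start Tika-server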

Troubleshooting

Tika server start fails

The command nssm start Tika-server may fail with:

Tika-server: Unexpected status SERVICE_START_PENDING in response to START control.

Retry the start. If the retry reports "An instance of the service is already running", run nssm stop Tika-server first and then start the service again.
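The retry procedure can be sketched as a small Python helper. It assumes that nssm signals failure through a non-zero exit code, which should be verified for the installed nssm version; the function name start_with_retry and the run parameter are hypothetical, the latter existing only so that the sequence can be tried without a real service.

```python
import subprocess


def start_with_retry(service="Tika-server", run=None):
    """Start the service; on failure retry once, then stop and start again."""
    if run is None:
        def run(args):
            # Default: execute the command and return its exit code.
            return subprocess.run(args).returncode

    if run(["nssm", "start", service]) == 0:
        return True
    # First attempt failed (for example SERVICE_START_PENDING): retry once.
    if run(["nssm", "start", service]) == 0:
        return True
    # The retry failed too (for example "already running"): stop, then start.
    run(["nssm", "stop", service])
    return run(["nssm", "start", service]) == 0
```

The function returns True as soon as one of the start attempts succeeds and False if the final start after the stop also fails.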