LanguageTool
2016-10-12

Linux,Ubuntu,Apache,Java,LanguageTool,i18n,DE

LanguageTool (LT) is a free checker for grammar, style, and spelling. It can be run as a Java HTTP server that accepts requests via GET/POST and returns a detailed response as JSON.

Requirements

  • Linux OS, e.g. Ubuntu
  • Java 8

Installation

Java 8

Java 8 is explicitly required, so first check which Java version is installed. Example output:

$ java -version
java version "1.7.0_111"
OpenJDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-0ubuntu0.14.04.3)
OpenJDK 64-Bit Server VM (build 24.111-b01, mixed mode)

In this case an older version is installed (1.7.x, also known simply as "7"), so Java 8 has to be installed separately.
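
The version check can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the helper name java_major is made up for this example, and the script only interprets the version string that java -version prints.

```shell
#!/bin/sh
# Hypothetical helper (not part of Java or LanguageTool): reduce a Java
# version string to its major number, e.g. "1.7.0_111" -> 7, "1.8.0_101" -> 8.
java_major() {
  case "$1" in
    1.*) rest=${1#1.}; echo "${rest%%.*}" ;;   # legacy "1.x" scheme
    *)   echo "${1%%[._-]*}" ;;                # newer scheme, e.g. "11.0.2"
  esac
}

# `java -version` writes to stderr; pull the quoted version from line 1.
installed=$(java -version 2>&1 | sed -n '1s/.*version "\([^"]*\)".*/\1/p')

if [ -n "$installed" ] && [ "$(java_major "$installed")" = "8" ]; then
  echo "Java 8 found ($installed)"
else
  echo "Java 8 not found (detected: ${installed:-none}); install it separately"
fi
```

With the example output shown above, the script would report that Java 8 was not found, because "1.7.0_111" reduces to major version 7.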

LT Source

$ cd /opt
$ sudo wget https://languagetool.org/download/LanguageTool-3.5.zip
$ sudo unzip LanguageTool-3.5.zip
$ cd LanguageTool-3.5

LT Server

Operation

The LT server is intended to be reachable only locally (for security). A PHP script, for example, can act as the intermediary (proxy). The architecture then looks like this:

  • The LT server runs locally and is reachable only locally (so SSL is needed only to a limited extent, if at all)
  • Requests from outside (the Internet) are sent via POST to a local PHP script (e.g. served by Apache)
  • The PHP script processes the POST request and
    • forwards it to the LT server,
    • receives a JSON response back from the LT server,
    • can process that response in turn and return it to the requester.
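
Whatever language the proxy is written in, its forwarding step comes down to a single local POST. A minimal sketch of that step in shell, assuming curl is installed and the LT server listens on localhost port 8081; the function name lt_check and the LT_URL variable are hypothetical and exist only for this example:

```shell
#!/bin/sh
# Assumed local endpoint; override LT_URL if the server runs elsewhere.
LT_URL="${LT_URL:-http://localhost:8081/v2/check}"

# Hypothetical helper: forward one check request to the local LT server
# and print its JSON response on stdout.
#   $1 = language code (e.g. de-DE), $2 = text to check
# --data-urlencode makes curl percent-encode spaces, '&' and umlauts.
lt_check() {
  curl --silent --show-error --max-time 10 \
    --data-urlencode "language=$1" \
    --data-urlencode "text=$2" \
    "$LT_URL"
}
```

A call such as lt_check de-DE "Dies ist ein Test" then prints the JSON that the proxy would process and hand back to the requester.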

Run

Start LanguageTool as a server from the command line:

$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081

If the LT server should also accept requests from outside, it must be started with the "--public" switch:

$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081 --public

To also allow direct requests from JavaScript in the browser, the "--allow-origin" switch has to be added as well:

$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --allow-origin "*" --port 8081 --public

Config

You can create a config file that defines advanced options. The call then looks like this:

$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081 --config MYCONFIGFILE

The "--help" switch shows which options are available:

$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --help
Usage: HTTPServer [--config propertyFile] [--port|-p port] [--public]
--config FILE  
    a Java property file (one key=value entry per line) with values for:
    'mode' - 'LanguageTool' or 'AfterTheDeadline' for emulation of After the Deadline output (optional, experimental)
    'afterTheDeadlineLanguage' - language code like 'en' or 'en-GB' (required if mode is 'AfterTheDeadline')
    'maxTextLength' - maximum text length, longer texts will cause an error (optional)
    'maxCheckTimeMillis' - maximum time in milliseconds allowed per check (optional)
    'maxCheckThreads' - maximum number of threads working in parallel (optional)
    'requestLimit' - maximum number of requests (optional)
    'requestLimitPeriodInSeconds' - time period to which requestLimit applies (optional)
    'languageModel' - a directory with '1grams', '2grams', '3grams' sub directories which contain a Lucene index
    each with ngram occurrence counts; activates the confusion rule if supported (optional)
    'maxWorkQueueSize' - reject request if request queue gets larger than this (optional)
    'rulesFile' - a file containing rules configuration, such as .langugagetool.cfg (optional)
--port, -p PRT 
    port to bind to, defaults to 8081 if not specified
--public    
    allow this server process to be connected from anywhere; if not set,
    it can only be connected from the computer it was started on
--allow-origin ORIGIN  
    set the Access-Control-Allow-Origin header in the HTTP response,
    used for direct (non-proxy) JavaScript-based access from browsers;
    example: --allow-origin "*"
--verbose, -v  
    in case of exceptions, log the input text (up to 500 characters)
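
As an illustration of the property file format described in the help text, the following sketch writes a small config file. The file name and all values are hypothetical and chosen only for this example; only the keys come from the help output above.

```shell
#!/bin/sh
# Hypothetical file name; pick any path the server process can read.
CONFIG_FILE="${CONFIG_FILE:-./lt-server.properties}"

# One key=value entry per line, as the help text requires.
# Lines starting with '#' are comments in Java property files.
cat > "$CONFIG_FILE" <<'EOF'
# Texts longer than this cause an error
maxTextLength=20000
# Maximum time in milliseconds allowed per check
maxCheckTimeMillis=10000
# Maximum number of requests ...
requestLimit=20
# ... within this period in seconds
requestLimitPeriodInSeconds=60
EOF

# The server would then be started with:
#   java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081 --config "$CONFIG_FILE"
```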

Requests to the LT Server

For testing, use curl on the command line:

$ curl --data "language=en-US&text=a simple test" http://localhost:8081/v2/check

Example JSON response:

{"software":{"name":"LanguageTool","version":"3.5","buildDate":"2016-09-30 09:59","apiVersion":"1","status":""},"language":{"name":"English (US)","code":"en-US"},"matches":[{"message":"This sentence does not start with an uppercase letter","shortMessage":"","replacements":[{"value":"A"}],"offset":0,"length":1,"context":{"text":"a simple test","offset":0,"length":1},"rule":{"id":"UPPERCASE_SENTENCE_START","description":"Checks that a sentence starts with an uppercase letter","issueType":"typographical","category":{"id":"CASING","name":"Capitalization"}}}]}
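
The single-line response is hard to read by eye. The following sketch condenses it to one line per match; it assumes that jq is installed, which the setup above does not otherwise require, and the function name lt_summary is made up for this example. The sample input is a shortened copy of the response shown above.

```shell
#!/bin/sh
# Hypothetical helper: read a /v2/check response on stdin and print one
# line per match as "offset:length<TAB>rule id<TAB>message".
# Assumes jq is available on the PATH.
lt_summary() {
  jq -r '.matches[] | "\(.offset):\(.length)\t\(.rule.id)\t\(.message)"'
}

# Shortened sample response, so the helper can be tried without a server.
printf '%s' '{"matches":[{"message":"This sentence does not start with an uppercase letter","offset":0,"length":1,"rule":{"id":"UPPERCASE_SENTENCE_START"}}]}' \
  | lt_summary
```

Against a running server, the curl command from above can be piped into lt_summary in the same way.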

APIv2 JSON

According to APIv2 JSON, the following endpoints are available (paths are relative to /v2, as in the curl example above):

  • POST: /check
  • GET: /languages

/check

POST: /check runs the check for grammar, style, and spelling. Two parameters must be passed:

  1. language (e.g. language=de-DE)
  2. text (e.g. text="Dies ist ein Test")

/languages

GET: /languages returns a JSON list of the available languages, for example:

[{"name":"Asturian","code":"ast","longCode":"ast-ES"},{"name":"Belarusian","code":"be","longCode":"be-BY"},{"name":"Breton","code":"br","longCode":"br-FR"},{"name":"Catalan","code":"ca","longCode":"ca-ES"},{"name":"Catalan (Valencian)","code":"ca","longCode":"ca-ES-valencia"},{"name":"Chinese","code":"zh","longCode":"zh-CN"},{"name":"Danish","code":"da","longCode":"da-DK"},{"name":"Dutch","code":"nl","longCode":"nl"},{"name":"English","code":"en","longCode":"en"},{"name":"English (Australian)","code":"en","longCode":"en-AU"},{"name":"English (Canadian)","code":"en","longCode":"en-CA"},{"name":"English (GB)","code":"en","longCode":"en-GB"},{"name":"English (New Zealand)","code":"en","longCode":"en-NZ"},{"name":"English (South African)","code":"en","longCode":"en-ZA"},{"name":"English (US)","code":"en","longCode":"en-US"},{"name":"Esperanto","code":"eo","longCode":"eo"},{"name":"French","code":"fr","longCode":"fr"},{"name":"Galician","code":"gl","longCode":"gl-ES"},{"name":"German","code":"de","longCode":"de"},{"name":"German (Austria)","code":"de","longCode":"de-AT"},{"name":"German (Germany)","code":"de","longCode":"de-DE"},{"name":"German (Swiss)","code":"de","longCode":"de-CH"},{"name":"Greek","code":"el","longCode":"el-GR"},{"name":"Icelandic","code":"is","longCode":"is-IS"},{"name":"Italian","code":"it","longCode":"it"},{"name":"Japanese","code":"ja","longCode":"ja-JP"},{"name":"Khmer","code":"km","longCode":"km-KH"},{"name":"Lithuanian","code":"lt","longCode":"lt-LT"},{"name":"Malayalam","code":"ml","longCode":"ml-IN"},{"name":"Persian","code":"fa","longCode":"fa"},{"name":"Polish","code":"pl","longCode":"pl-PL"},{"name":"Portuguese","code":"pt","longCode":"pt"},{"name":"Portuguese (Brazil)","code":"pt","longCode":"pt-BR"},{"name":"Portuguese (Portugal)","code":"pt","longCode":"pt-PT"},{"name":"Romanian","code":"ro","longCode":"ro-RO"},{"name":"Russian","code":"ru","longCode":"ru-RU"},{"name":"Simple German","code":"de-DE-x-simple-language","longCode":"de-DE-x-simple-language"},{"name":"Slovak","code":"sk","longCode":"sk-SK"},{"name":"Slovenian","code":"sl","longCode":"sl-SI"},{"name":"Spanish","code":"es","longCode":"es"},{"name":"Swedish","code":"sv","longCode":"sv"},{"name":"Tagalog","code":"tl","longCode":"tl-PH"},{"name":"Tamil","code":"ta","longCode":"ta-IN"},{"name":"Ukrainian","code":"uk","longCode":"uk-UA"}]

