LanguageTool

2016-10-12 00:00:00 IN1 , 2023-04-10 14:16:29 IN1


LanguageTool (LT) is a free proofreading tool for grammar, style and spelling. It can be run as a Java server that accepts requests via HTTP GET/POST and returns a detailed response as JSON.

Requirements

  • Linux OS, e.g. Ubuntu
  • Java 8

Installation

Java 8

LanguageTool explicitly requires Java 8, so first check which Java version is installed. Example output:

$ java -version
java version "1.7.0_111"
OpenJDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-0ubuntu0.14.04.3)
OpenJDK 64-Bit Server VM (build 24.111-b01, mixed mode)

In this case an older version (1.7, often simply called "7") is installed, so Java 8 has to be installed separately.
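To check this requirement from a script, the version string can be parsed. The following is a minimal Python sketch; the function names are hypothetical, and it assumes `java` is on the PATH and prints a `version "..."` line as in the output above. Up to Java 8 the string has the form "1.<major>..." (so "1.7.0_111" means 7); from Java 9 on it starts with the major version itself.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Return the major Java version found in `java -version` output.

    Up to Java 8 the version string looks like "1.<major>...",
    e.g. "1.7.0_111" -> 7; from Java 9 on it starts with the major
    version itself, e.g. "11.0.2" -> 11.
    """
    quoted = re.search(r'version "([^"]+)"', version_output)
    if quoted is None:
        raise ValueError("no version string found")
    numbers = re.match(r"(\d+)(?:\.(\d+))?", quoted.group(1))
    if numbers is None:
        raise ValueError(f"unexpected version string: {quoted.group(1)}")
    major = int(numbers.group(1))
    if major == 1 and numbers.group(2) is not None:
        return int(numbers.group(2))  # old "1.x" numbering scheme
    return major


def installed_java_major_version() -> int:
    """Run `java -version` (assumes java is on the PATH) and parse the result."""
    # java -version writes its output to stderr, not stdout
    result = subprocess.run(["java", "-version"], capture_output=True,
                            text=True, check=True)
    return java_major_version(result.stderr)
```

A setup script could compare `installed_java_major_version()` against 8 and stop with a hint before trying to start the LT server.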

LT Source

$ cd /opt
$ sudo wget https://languagetool.org/download/LanguageTool-3.5.zip
$ sudo unzip LanguageTool-3.5.zip
$ cd LanguageTool-3.5

LT Server

Operation

The intended setup keeps the LT server reachable only locally (for security). A PHP script, for example, can act as an intermediary (proxy). The architecture then looks like this:

  • The LT server runs locally and is only reachable from the local machine (so SSL is only conditionally needed for this connection)
  • Requests from outside (the Internet) are sent via POST to a local PHP script (served e.g. by Apache)
  • The PHP script processes the POST request and
    • forwards it to the LT server,
    • receives a JSON response from the LT server,
    • can in turn process this response and return it to the requester.
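The proxy flow above can be illustrated as follows. This minimal sketch swaps in Python's standard library (`http.server`, `urllib`) for the PHP script the page describes; it is not part of LanguageTool. The listen port 8090, the 30 s timeout and the body size limit are arbitrary assumptions, and the sketch has no TLS, authentication or rate limiting.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Assumption: LT server started without --public on port 8081 (see "Run")
LT_CHECK_URL = "http://localhost:8081/v2/check"
MAX_BODY_BYTES = 100_000  # hypothetical limit for incoming request bodies


def build_upstream_request(body: bytes) -> urllib.request.Request:
    """Wrap the client's form-encoded POST body in a request to the LT server."""
    return urllib.request.Request(
        LT_CHECK_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


class CheckProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        if length < 0 or length > MAX_BODY_BYTES:
            self.send_error(413, "Request body too large")
            return
        body = self.rfile.read(length)
        try:
            # Forward the request to the local LT server
            with urllib.request.urlopen(build_upstream_request(body), timeout=30) as upstream:
                status = upstream.status
                content_type = upstream.headers.get("Content-Type", "application/json")
                payload = upstream.read()
        except urllib.error.HTTPError as err:
            # LT answered with an error status (e.g. bad parameters): pass it on
            status = err.code
            content_type = err.headers.get("Content-Type", "text/plain")
            payload = err.read()
        except urllib.error.URLError:
            self.send_error(502, "LanguageTool server not reachable")
            return
        # The JSON payload could be filtered or post-processed here
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def run_proxy(port: int = 8090) -> None:
    """Listen on localhost only; a public web server such as Apache would forward to it."""
    ThreadingHTTPServer(("127.0.0.1", port), CheckProxyHandler).serve_forever()
```

The proxy passes LT's status code and body through unchanged, so clients see the same JSON as with a direct request; any filtering belongs at the marked spot in `do_POST`.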

Run

Start LanguageTool as a server from the command line (inside the LanguageTool-3.5 directory, where languagetool-server.jar is located):

$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081

If the LT server should also accept requests from other machines, it has to be started with the "--public" switch:

$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081 --public

To also allow direct requests from JavaScript in the browser, the "--allow-origin" switch has to be added as well:

$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --allow-origin "*" --port 8081 --public
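When the server is started from a script rather than by hand, the start variants above can be assembled as an argument list. This is a minimal Python sketch with hypothetical function names; it only uses the switches shown above and assumes it is run against the unpacked LanguageTool directory.

```python
import subprocess
from typing import List, Optional


def lt_server_command(port: int = 8081, public: bool = False,
                      allow_origin: Optional[str] = None,
                      jar: str = "languagetool-server.jar") -> List[str]:
    """Build the HTTPServer command line from the switches shown above."""
    cmd = ["java", "-cp", jar, "org.languagetool.server.HTTPServer"]
    if allow_origin is not None:
        # Passed as its own argv entry, so "*" needs no shell quoting
        cmd += ["--allow-origin", allow_origin]
    cmd += ["--port", str(port)]
    if public:
        cmd.append("--public")
    return cmd


def start_lt_server(workdir: str, **options) -> subprocess.Popen:
    """Start the server in the background from the unpacked LT directory."""
    return subprocess.Popen(lt_server_command(**options), cwd=workdir)
```

For example, `start_lt_server("/opt/LanguageTool-3.5", port=8081)` corresponds to the first, local-only variant. Because no shell is involved, the `"*"` origin reaches Java exactly as written.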

Config

You can create a config file that defines advanced options. The call then looks like this:

$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081 --config MYCONFIGFILE

The "--help" switch lists the available options:

$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --help
Usage: HTTPServer [--config propertyFile] [--port|-p port] [--public]
--config FILE  
    a Java property file (one key=value entry per line) with values for:
    'mode' - 'LanguageTool' or 'AfterTheDeadline' for emulation of After the Deadline output (optional, experimental)
    'afterTheDeadlineLanguage' - language code like 'en' or 'en-GB' (required if mode is 'AfterTheDeadline')
    'maxTextLength' - maximum text length, longer texts will cause an error (optional)
    'maxCheckTimeMillis' - maximum time in milliseconds allowed per check (optional)
    'maxCheckThreads' - maximum number of threads working in parallel (optional)
    'requestLimit' - maximum number of requests (optional)
    'requestLimitPeriodInSeconds' - time period to which requestLimit applies (optional)
    'languageModel' - a directory with '1grams', '2grams', '3grams' sub directories which contain a Lucene index
    each with ngram occurrence counts; activates the confusion rule if supported (optional)
    'maxWorkQueueSize' - reject request if request queue gets larger than this (optional)
    'rulesFile' - a file containing rules configuration, such as .langugagetool.cfg (optional)
--port, -p PRT 
    port to bind to, defaults to 8081 if not specified
--public    
    allow this server process to be connected from anywhere; if not set,
    it can only be connected from the computer it was started on
--allow-origin ORIGIN  
    set the Access-Control-Allow-Origin header in the HTTP response,
    used for direct (non-proxy) JavaScript-based access from browsers;
    example: --allow-origin "*"
--verbose, -v  
    in case of exceptions, log the input text (up to 500 characters)
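Since the config file is a plain Java property file with one key=value entry per line, it can be generated from a script. The following Python sketch (hypothetical function names) checks the keys against the option names from the help output above; the encoding choice is an assumption based on `java.util.Properties.load(InputStream)` reading ISO-8859-1.

```python
from pathlib import Path
from typing import Dict, Union

# Option names taken from the --help output above
KNOWN_KEYS = {
    "mode", "afterTheDeadlineLanguage", "maxTextLength", "maxCheckTimeMillis",
    "maxCheckThreads", "requestLimit", "requestLimitPeriodInSeconds",
    "languageModel", "maxWorkQueueSize", "rulesFile",
}


def render_lt_config(options: Dict[str, Union[str, int]]) -> str:
    """Render options as a Java property file: one key=value entry per line."""
    unknown = set(options) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    return "".join(f"{key}={value}\n" for key, value in options.items())


def write_lt_config(path: Union[str, Path],
                    options: Dict[str, Union[str, int]]) -> None:
    # Assumption: plain ASCII values; java.util.Properties.load(InputStream)
    # reads ISO-8859-1 and treats backslashes as escape characters
    Path(path).write_text(render_lt_config(options), encoding="iso-8859-1")
```

For example, `write_lt_config("/opt/LanguageTool-3.5/server.properties", {"maxTextLength": 50000, "requestLimit": 100, "requestLimitPeriodInSeconds": 60})` writes a file that can then be passed via "--config"; the file name and the values are purely illustrative.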

Requests to the LT Server

For testing, use curl on the command line:

$ curl --data "language=en-US&text=a simple test" http://localhost:8081/v2/check

Example JSON response:

{
  "software": {"name": "LanguageTool", "version": "3.5", "buildDate": "2016-09-30 09:59", "apiVersion": "1", "status": ""},
  "language": {"name": "English (US)", "code": "en-US"},
  "matches": [
    {
      "message": "This sentence does not start with an uppercase letter",
      "shortMessage": "",
      "replacements": [{"value": "A"}],
      "offset": 0,
      "length": 1,
      "context": {"text": "a simple test", "offset": 0, "length": 1},
      "rule": {
        "id": "UPPERCASE_SENTENCE_START",
        "description": "Checks that a sentence starts with an uppercase letter",
        "issueType": "typographical",
        "category": {"id": "CASING", "name": "Capitalization"}
      }
    }
  ]
}
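Each entry in "matches" locates a problem via "offset" and "length" and may carry suggestions in "replacements". A minimal Python sketch (hypothetical function names) that lists the matches of such a response and applies the first suggestion of each; overlapping matches are not handled, and treating offsets as Python string indices is an assumption (LT runs on Java, so offsets may count UTF-16 code units and differ for characters outside the BMP such as emoji).

```python
from typing import List, Tuple


def summarize_matches(response: dict) -> List[Tuple[int, int, str, str]]:
    """List (offset, length, rule id, message) for each match of a check response."""
    return [(m["offset"], m["length"], m["rule"]["id"], m["message"])
            for m in response["matches"]]


def apply_first_replacements(text: str, response: dict) -> str:
    """Apply the first suggested replacement of every match to `text`.

    Matches are applied from the highest offset down, so the offsets of
    earlier matches stay valid when a replacement changes the text length.
    """
    result = text
    for match in sorted(response["matches"], key=lambda m: m["offset"], reverse=True):
        if match["replacements"]:
            start = match["offset"]
            end = start + match["length"]
            result = result[:start] + match["replacements"][0]["value"] + result[end:]
    return result
```

Applied to the example response and the text "a simple test", `apply_first_replacements` returns "A simple test".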

APIv2 JSON

The following endpoints are available in APIv2 (JSON), both below the /v2 path prefix (e.g. http://localhost:8081/v2/check):

  • POST: /check
  • GET: /languages

/check

POST: /check runs the grammar, style and spelling check. Two parameters have to be passed:

  1. language (e.g. language=de-DE)
  2. text (e.g. text=Dies ist ein Test)

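The two parameters are sent as a form-encoded POST body. The following Python sketch (hypothetical function names, assuming the local server on port 8081) builds that request with the standard library; `urlencode` takes care of spaces, "&" and umlauts, which the raw curl string above leaves unencoded.

```python
import json
import urllib.parse
import urllib.request

# Assumption: LT server running locally on port 8081, as started under "Run"
LT_BASE_URL = "http://localhost:8081"


def build_check_request(text: str, language: str,
                        base_url: str = LT_BASE_URL) -> urllib.request.Request:
    """Build a POST request to /v2/check with the two required parameters."""
    # urlencode percent-encodes the values as UTF-8 and turns spaces into '+'
    body = urllib.parse.urlencode({"language": language, "text": text}).encode("ascii")
    return urllib.request.Request(
        base_url + "/v2/check",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def check_text(text: str, language: str, base_url: str = LT_BASE_URL) -> dict:
    """Send the check request and return the decoded JSON response."""
    request = build_check_request(text, language, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running server, `check_text("Dies ist ein Test", "de-DE")["matches"]` returns the list of findings in the format of the example response.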
/languages

GET: /languages returns a JSON list of the available languages, for example:

[{"name":"Asturian","code":"ast","longCode":"ast-ES"},{"name":"Belarusian","code":"be","longCode":"be-BY"},{"name":"Breton","code":"br","longCode":"br-FR"},{"name":"Catalan","code":"ca","longCode":"ca-ES"},{"name":"Catalan (Valencian)","code":"ca","longCode":"ca-ES-valencia"},{"name":"Chinese","code":"zh","longCode":"zh-CN"},{"name":"Danish","code":"da","longCode":"da-DK"},{"name":"Dutch","code":"nl","longCode":"nl"},{"name":"English","code":"en","longCode":"en"},{"name":"English (Australian)","code":"en","longCode":"en-AU"},{"name":"English (Canadian)","code":"en","longCode":"en-CA"},{"name":"English (GB)","code":"en","longCode":"en-GB"},{"name":"English (New Zealand)","code":"en","longCode":"en-NZ"},{"name":"English (South African)","code":"en","longCode":"en-ZA"},{"name":"English (US)","code":"en","longCode":"en-US"},{"name":"Esperanto","code":"eo","longCode":"eo"},{"name":"French","code":"fr","longCode":"fr"},{"name":"Galician","code":"gl","longCode":"gl-ES"},{"name":"German","code":"de","longCode":"de"},{"name":"German (Austria)","code":"de","longCode":"de-AT"},{"name":"German (Germany)","code":"de","longCode":"de-DE"},{"name":"German (Swiss)","code":"de","longCode":"de-CH"},{"name":"Greek","code":"el","longCode":"el-GR"},{"name":"Icelandic","code":"is","longCode":"is-IS"},{"name":"Italian","code":"it","longCode":"it"},{"name":"Japanese","code":"ja","longCode":"ja-JP"},{"name":"Khmer","code":"km","longCode":"km-KH"},{"name":"Lithuanian","code":"lt","longCode":"lt-LT"},{"name":"Malayalam","code":"ml","longCode":"ml-IN"},{"name":"Persian","code":"fa","longCode":"fa"},{"name":"Polish","code":"pl","longCode":"pl-PL"},{"name":"Portuguese","code":"pt","longCode":"pt"},{"name":"Portuguese (Brazil)","code":"pt","longCode":"pt-BR"},{"name":"Portuguese (Portugal)","code":"pt","longCode":"pt-PT"},{"name":"Romanian","code":"ro","longCode":"ro-RO"},{"name":"Russian","code":"ru","longCode":"ru-RU"},{"name":"Simple German","code":"de-DE-x-simple-language","longCode":"de-DE-x-simple-language"},{"name":"Slovak","code":"sk","longCode":"sk-SK"},{"name":"Slovenian","code":"sl","longCode":"sl-SI"},{"name":"Spanish","code":"es","longCode":"es"},{"name":"Swedish","code":"sv","longCode":"sv"},{"name":"Tagalog","code":"tl","longCode":"tl-PH"},{"name":"Tamil","code":"ta","longCode":"ta-IN"},{"name":"Ukrainian","code":"uk","longCode":"uk-UA"}]
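A client can use this list to validate the language parameter before calling /check. A minimal Python sketch with hypothetical function names, assuming the local server on port 8081; the value to pass as "language" is taken to be a "longCode" entry such as de-DE.

```python
import json
import urllib.request
from typing import Dict, List

# Assumption: LT server running locally on port 8081
LT_BASE_URL = "http://localhost:8081"


def fetch_languages(base_url: str = LT_BASE_URL) -> List[Dict[str, str]]:
    """GET /v2/languages and return the decoded list."""
    with urllib.request.urlopen(base_url + "/v2/languages", timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def long_codes_for(languages: List[Dict[str, str]], code: str) -> List[str]:
    """All longCode values whose short code equals `code`, e.g. "de"."""
    return [lang["longCode"] for lang in languages if lang["code"] == code]


def is_supported(languages: List[Dict[str, str]], long_code: str) -> bool:
    """True if `long_code` (e.g. "de-DE") appears as a longCode in the list."""
    return any(lang["longCode"] == long_code for lang in languages)
```

Note that not every entry shares a short code with its variants: "Simple German" uses "de-DE-x-simple-language" as its code, so `long_codes_for(languages, "de")` does not include it.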

