LanguageTool
LanguageTool (LT) is a free checker for grammar, style, and spelling. It can be run as a Java HTTP server that accepts requests via GET/POST and returns a detailed response as JSON.
Requirements
- Linux OS, e.g. Ubuntu
- Java 8
Installation
Java 8
Java 8 is explicitly required, so first check which Java version is installed. Example output:
$ java -version
java version "1.7.0_111"
OpenJDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-0ubuntu0.14.04.3)
OpenJDK 64-Bit Server VM (build 24.111-b01, mixed mode)
In this case an older version (1.7.*, also simply called "7") is installed, so Java 8 has to be installed separately.
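The version check can also be automated. The following is a minimal sketch, assuming the output has the `version "..."` shape shown in the example; the function name `java_major_version` is made up for illustration. It handles the legacy "1.x" numbering (where 1.7 means Java 7) as well as the newer scheme in which the major version comes first.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in output")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.7.0_111" means Java 7, "1.8.0_x" means Java 8.
    if first == 1 and second is not None:
        return int(second)
    # Newer scheme: the first number is the major version.
    return first


def meets_requirement(version_output, required=8):
    """True if the detected major version is at least `required`."""
    return java_major_version(version_output) >= required
```

Note that `java -version` writes to stderr, so when calling it from a script the text has to be read from there, for instance with `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`.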
LT Source
$ cd /opt
$ sudo wget https://languagetool.org/download/LanguageTool-3.5.zip
$ sudo unzip LanguageTool-3.5.zip
$ cd LanguageTool-3.5
LT Server
Operation
The LT server is intended to be reachable only locally (for security). A PHP script, for example, can act as an intermediary (proxy). The architecture then looks like this:
- The LT server runs locally and is reachable only locally (so SSL is needed only to a limited extent, if at all)
- Requests from outside (the Internet) are sent via POST to a local PHP script (served e.g. by Apache)
- The PHP script processes the POST request and
  - forwards it to the LT server,
  - receives a JSON response from the LT server,
  - can in turn process that response and return it to the requester.
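The proxy idea above can be sketched in a few lines. This is only an illustration under assumptions: it uses Python's standard library instead of the PHP script mentioned in the text, the proxy port 8080 and the names `build_upstream_request`, `ProxyHandler`, and `main` are made up, and the upstream URL is taken from the curl example further down. It does no validation or rate limiting, which a real proxy would want.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Local LT server; host, port, and path follow the curl example in this document.
LT_CHECK_URL = "http://localhost:8081/v2/check"


def build_upstream_request(form_body, lt_url=LT_CHECK_URL):
    """Wrap an already form-encoded body in a POST request for the LT server."""
    return urllib.request.Request(
        lt_url,
        data=form_body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the incoming form body and forward it unchanged.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            with urllib.request.urlopen(build_upstream_request(body), timeout=10) as upstream:
                status = upstream.status
                payload = upstream.read()
        except OSError as exc:
            # LT server unreachable or returned an error: answer with 502.
            status = 502
            payload = json.dumps({"error": str(exc)}).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def main(port=8080):
    """Run the proxy until interrupted (port 8080 is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), ProxyHandler).serve_forever()
```

Calling `main()` starts the proxy. The point in `do_POST` between reading the upstream response and writing it back is where the response could be filtered or reshaped before it reaches the requester.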
Run
Start LanguageTool as a server from the command line:
$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081
If the LT server should also accept requests from outside, it must be started with the "--public" switch:
$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081 --public
To also allow direct requests from JavaScript in the browser, the "--allow-origin" switch has to be added as well:
$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --allow-origin "*" --port 8081 --public
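If the server is started from a script (for example a service wrapper), the three variants above can be assembled from a few options. A minimal sketch, assuming the command is run from the LanguageTool directory so that `languagetool-server.jar` resolves; the function name `lt_server_command` is hypothetical, and only switches shown in this document are used.

```python
def lt_server_command(port=8081, public=False, allow_origin=None, config=None):
    """Build the argument list for starting the LT HTTP server."""
    cmd = [
        "java", "-cp", "languagetool-server.jar",
        "org.languagetool.server.HTTPServer",
        "--port", str(port),
    ]
    if public:
        # Accept connections from other machines, not just localhost.
        cmd.append("--public")
    if allow_origin is not None:
        # Sets the Access-Control-Allow-Origin response header.
        cmd += ["--allow-origin", allow_origin]
    if config is not None:
        # Path to a Java properties file with extended options.
        cmd += ["--config", config]
    return cmd
```

The resulting list can be passed to `subprocess.Popen(...)`. Passing the arguments as a list avoids shell quoting problems with values such as `*`.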
Config
A config file can be created in which extended options are defined. The call would then look like this:
$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081 --config MYCONFIGFILE
The "--help" switch shows which options are available:
$ java -cp languagetool-server.jar org.languagetool.server.HTTPServer --help
Usage: HTTPServer [--config propertyFile] [--port|-p port] [--public]
--config FILE
a Java property file (one key=value entry per line) with values for:
'mode' - 'LanguageTool' or 'AfterTheDeadline' for emulation of After the Deadline output (optional, experimental)
'afterTheDeadlineLanguage' - language code like 'en' or 'en-GB' (required if mode is 'AfterTheDeadline')
'maxTextLength' - maximum text length, longer texts will cause an error (optional)
'maxCheckTimeMillis' - maximum time in milliseconds allowed per check (optional)
'maxCheckThreads' - maximum number of threads working in parallel (optional)
'requestLimit' - maximum number of requests (optional)
'requestLimitPeriodInSeconds' - time period to which requestLimit applies (optional)
'languageModel' - a directory with '1grams', '2grams', '3grams' sub directories which contain a Lucene index
each with ngram occurrence counts; activates the confusion rule if supported (optional)
'maxWorkQueueSize' - reject request if request queue gets larger than this (optional)
'rulesFile' - a file containing rules configuration, such as .langugagetool.cfg (optional)
--port, -p PRT
port to bind to, defaults to 8081 if not specified
--public
allow this server process to be connected from anywhere; if not set,
it can only be connected from the computer it was started on
--allow-origin ORIGIN
set the Access-Control-Allow-Origin header in the HTTP response,
used for direct (non-proxy) JavaScript-based access from browsers;
example: --allow-origin "*"
--verbose, -v
in case of exceptions, log the input text (up to 500 characters)
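Since the config file is a plain Java properties file with one key=value entry per line, it can be generated easily. The sketch below is an assumption-laden illustration: the helper name `render_properties` and all values in `EXAMPLE_OPTIONS` are made up, and only the key names come from the help output above.

```python
# Keys listed in the --help output; anything else is rejected as a likely typo.
KNOWN_KEYS = {
    "mode", "afterTheDeadlineLanguage", "maxTextLength", "maxCheckTimeMillis",
    "maxCheckThreads", "requestLimit", "requestLimitPeriodInSeconds",
    "languageModel", "maxWorkQueueSize", "rulesFile",
}


def render_properties(options):
    """Render a dict of options as properties text, one key=value per line."""
    unknown = sorted(set(options) - KNOWN_KEYS)
    if unknown:
        raise ValueError(f"unknown config keys: {unknown}")
    return "".join(f"{key}={value}\n" for key, value in options.items())


# Hypothetical limits for a small public-facing setup; tune to your needs.
EXAMPLE_OPTIONS = {
    "maxTextLength": 20000,
    "maxCheckTimeMillis": 10000,
    "requestLimit": 60,
    "requestLimitPeriodInSeconds": 60,
}
```

Writing `render_properties(EXAMPLE_OPTIONS)` to a file and passing that file via `--config` would apply these limits, assuming the server version in use supports the listed keys.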
Requests to the LT Server
For testing, use curl on the command line:
$ curl --data "language=en-US&text=a simple test" http://localhost:8081/v2/check
Example JSON response:
{"software":{"name":"LanguageTool","version":"3.5","buildDate":"2016-09-30 09:59","apiVersion":"1","status":""},"language":{"name":"English (US)","code":"en-US"},"matches":[{"message":"This sentence does not start with an uppercase letter","shortMessage":"","replacements":[{"value":"A"}],"offset":0,"length":1,"context":{"text":"a simple test","offset":0,"length":1},"rule":{"id":"UPPERCASE_SENTENCE_START","description":"Checks that a sentence starts with an uppercase letter","issueType":"typographical","category":{"id":"CASING","name":"Capitalization"}}}]}
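A client usually wants to do something with the "matches" array in this response. The following sketch applies the first suggested replacement of each match to the original text. It rests on assumptions: the function name `apply_first_suggestions` is made up, "offset" and "length" are treated as plain character indices (which may not hold for characters outside the Basic Multilingual Plane), and overlapping matches are not handled.

```python
import json


def apply_first_suggestions(text, response_json):
    """Return `text` with the first replacement of every match applied."""
    matches = json.loads(response_json)["matches"]
    # Work from the end of the text so earlier offsets stay valid.
    for match in sorted(matches, key=lambda m: m["offset"], reverse=True):
        if not match["replacements"]:
            continue  # the rule offers no suggestion, leave the text as is
        start = match["offset"]
        end = start + match["length"]
        text = text[:start] + match["replacements"][0]["value"] + text[end:]
    return text
```

With the example response above, `apply_first_suggestions("a simple test", response)` would return "A simple test", because the single match covers offset 0 with length 1 and suggests "A".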
APIv2 JSON
The JSON API v2 offers the following endpoints (relative to the /v2 prefix used in the curl example above):
- POST: /check
- GET: /languages, /check
POST: /check
performs the check for grammar, style, and spelling. Two parameters must be passed:
- language (e.g. language=de-DE)
- text (e.g. text="Dies ist ein Test")
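Both parameters are sent as a form-encoded body, so text containing characters such as "&", "=", or umlauts has to be URL-encoded first. curl's plain `--data` sends the string as given (`--data-urlencode` is the encoding variant). A minimal sketch of the encoding step, with a made-up helper name `check_form_body`:

```python
from urllib.parse import urlencode


def check_form_body(text, language):
    """Build the form-encoded POST body for /check from the two parameters."""
    # urlencode percent-encodes reserved characters and UTF-8 encodes non-ASCII.
    return urlencode({"language": language, "text": text}).encode("ascii")
```

Without this step, a text like "Tom & Jerry" would be split at the "&" and the server would see only "Tom " as the text parameter.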
/languages
GET: /languages
returns a JSON list of the available languages, for example:
[{"name":"Asturian","code":"ast","longCode":"ast-ES"},{"name":"Belarusian","code":"be","longCode":"be-BY"},{"name":"Breton","code":"br","longCode":"br-FR"},{"name":"Catalan","code":"ca","longCode":"ca-ES"},{"name":"Catalan (Valencian)","code":"ca","longCode":"ca-ES-valencia"},{"name":"Chinese","code":"zh","longCode":"zh-CN"},{"name":"Danish","code":"da","longCode":"da-DK"},{"name":"Dutch","code":"nl","longCode":"nl"},{"name":"English","code":"en","longCode":"en"},{"name":"English (Australian)","code":"en","longCode":"en-AU"},{"name":"English (Canadian)","code":"en","longCode":"en-CA"},{"name":"English (GB)","code":"en","longCode":"en-GB"},{"name":"English (New Zealand)","code":"en","longCode":"en-NZ"},{"name":"English (South African)","code":"en","longCode":"en-ZA"},{"name":"English (US)","code":"en","longCode":"en-US"},{"name":"Esperanto","code":"eo","longCode":"eo"},{"name":"French","code":"fr","longCode":"fr"},{"name":"Galician","code":"gl","longCode":"gl-ES"},{"name":"German","code":"de","longCode":"de"},{"name":"German (Austria)","code":"de","longCode":"de-AT"},{"name":"German (Germany)","code":"de","longCode":"de-DE"},{"name":"German (Swiss)","code":"de","longCode":"de-CH"},{"name":"Greek","code":"el","longCode":"el-GR"},{"name":"Icelandic","code":"is","longCode":"is-IS"},{"name":"Italian","code":"it","longCode":"it"},{"name":"Japanese","code":"ja","longCode":"ja-JP"},{"name":"Khmer","code":"km","longCode":"km-KH"},{"name":"Lithuanian","code":"lt","longCode":"lt-LT"},{"name":"Malayalam","code":"ml","longCode":"ml-IN"},{"name":"Persian","code":"fa","longCode":"fa"},{"name":"Polish","code":"pl","longCode":"pl-PL"},{"name":"Portuguese","code":"pt","longCode":"pt"},{"name":"Portuguese (Brazil)","code":"pt","longCode":"pt-BR"},{"name":"Portuguese (Portugal)","code":"pt","longCode":"pt-PT"},{"name":"Romanian","code":"ro","longCode":"ro-RO"},{"name":"Russian","code":"ru","longCode":"ru-RU"},{"name":"Simple German","code":"de-DE-x-simple-language","longCode":"de-DE-x-simple-language"},{"name":"Slovak","code":"sk","longCode":"sk-SK"},{"name":"Slovenian","code":"sl","longCode":"sl-SI"},{"name":"Spanish","code":"es","longCode":"es"},{"name":"Swedish","code":"sv","longCode":"sv"},{"name":"Tagalog","code":"tl","longCode":"tl-PH"},{"name":"Tamil","code":"ta","longCode":"ta-IN"},{"name":"Ukrainian","code":"uk","longCode":"uk-UA"}]
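Several entries in this list share the same "code" and differ only in "longCode" (for example the English and German variants). A small sketch of how a client might group the variants, for instance to fill a language selector; the function name `long_codes_by_language` is made up, and the field names are those in the response above.

```python
import json


def long_codes_by_language(languages_json):
    """Group the longCode values of a /languages response by their short code."""
    grouped = {}
    for entry in json.loads(languages_json):
        grouped.setdefault(entry["code"], []).append(entry["longCode"])
    return grouped
```

For the response shown above, the entry for "de" would contain "de", "de-AT", "de-DE", and "de-CH", in the order in which the server lists them.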
Links
- Project Website: https://languagetool.org/
- Forum: http://forum.languagetool.org/
- http-server HowTo: http://wiki.languagetool.org/http-server
- http-server Source: https://github.com/languagetool-org/languagetool
- APIv2 JSON: https://languagetool.org/http-api/swagger-ui/#/default
- Integration into a website: http://wiki.languagetool.org/integration-on-websites
- Ubuntu: separate installation of Java 8: https://wiki.ubuntuusers.de/Java/Installation/Oracle_Java/Java_8/