Grammar Error Correction in Morphologically-Rich Languages: The Case of Russian
Abstract
Until now, most research in grammar error correction has focused on English, and the problem has hardly been explored for other languages. We address the task of correcting writing mistakes in morphologically-rich languages, with a focus on Russian. We present a corrected and error-tagged corpus of Russian learner writing and develop models that build on state-of-the-art methods that have been well studied for English. Although impressive results have recently been achieved in grammar error correction of non-native English writing, these results are limited to domains where plentiful training data is available. Since annotation is extremely costly, such approaches are not suitable for the majority of domains and languages. We therefore focus on methods that use “minimal supervision”, i.e., methods that do not rely on large amounts of annotated training data, and show how existing minimal-supervision approaches extend to a highly inflectional language such as Russian. The results demonstrate that these methods are particularly useful for correcting mistakes in grammatical phenomena that involve rich morphology.