Japanese Text Analysis and Readability Tools

Studying with content at your difficulty level saves time and lets you actually learn instead of getting frustrated by material that is too hard. This measure of difficulty is called readability, and the readability of Japanese texts can be estimated in three ways: sentence features, kanji grade, and vocabulary lists. Let’s look at these methods and how to use them, from the academic to the practical.

Sentence Analysis

NagoyaObi Project

The NagoyaObi Project and jReadability analyze a text’s sentence features: the ratio of kana to kanji, punctuation and forms, number of sentences, and vocabulary. Each feature is weighted mathematically, based on Japanese school graded texts, to produce a single difficulty score (NagoyaObi abstract). Texts matching a low elementary grade will probably have short sentences, little kanji, and very common words. These tools may be useful for determining the school grade of content, or for estimating one’s comprehension ability.
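As a rough illustration of what a sentence feature is, here is a minimal Python sketch that counts sentences and measures the kana and kanji ratios of a text. It is not the NagoyaObi or jReadability formula: the function names, the choice of features, and the Unicode ranges (basic hiragana, katakana, and CJK Unified Ideographs blocks only) are my own assumptions, and no weighting into a final score is attempted.

```python
import re


def script_of(ch: str) -> str:
    """Classify one character by Unicode block (basic blocks only, an assumption)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:  # also contains the long-vowel mark and middle dot
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # CJK Unified Ideographs
        return "kanji"
    return "other"


def sentence_features(text: str) -> dict:
    """Return simple surface features of a Japanese text."""
    # Split on Japanese and ASCII sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"[。！？!?]+", text) if s.strip()]

    counts = {"hiragana": 0, "katakana": 0, "kanji": 0, "other": 0}
    for ch in text:
        if not ch.isspace():
            counts[script_of(ch)] += 1

    kana = counts["hiragana"] + counts["katakana"]
    written = kana + counts["kanji"]  # punctuation and Latin text are left out
    total_length = sum(len(s) for s in sentences)
    return {
        "sentences": len(sentences),
        "avg_sentence_length": total_length / len(sentences) if sentences else 0.0,
        "kanji_ratio": counts["kanji"] / written if written else 0.0,
        "kana_ratio": kana / written if written else 0.0,
    }


features = sentence_features("私は猫です。猫が好き。")
print(features)
```

A real readability tool would combine features like these with weights fitted to graded texts; this sketch only shows the raw measurements.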

Kanji Analysis

Tokyo International University

Kanji analysis tools grade texts by the order in which kanji are taught in courses, books, and flashcard programs. If you are learning from such a resource, this makes it possible to prioritize texts that contain only kanji you already know. These orderings include Japan’s national school curriculum (Jouyou), the Japanese Language Proficiency Test (JLPT), and WaniKani (a flashcard website). However, it may be difficult to avoid native Japanese texts with kanji above one’s current level. In order to be comfortable reading native texts, I recommend becoming familiar with the first 400 or 600 Jouyou kanji, which cover up to grades 3 and 4 respectively. Even Japanese children can understand a number of kanji above their school grade, and beginner Japanese content often includes kana readings (furigana) alongside kanji. Below are a number of kanji grading tools and their order system.
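The idea behind kanji grading can be sketched in a few lines of Python: look up each kanji in a grade table, report the highest grade found, and list any kanji the table does not cover. The table below is a tiny hand-picked sample of school grades 1 to 3 for illustration only, and the names SAMPLE_GRADES and kanji_grade_report are hypothetical. A real tool would load a complete list for whichever ordering you follow (Jouyou, JLPT, WaniKani).

```python
# Tiny illustrative sample of Japanese school grades; NOT a complete table.
SAMPLE_GRADES = {
    "日": 1, "本": 1, "山": 1, "川": 1, "人": 1, "字": 1,
    "語": 2, "読": 2, "書": 2, "海": 2,
    "漢": 3, "泳": 3,
}


def is_kanji(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


def kanji_grade_report(text: str, grades: dict) -> tuple:
    """Return (highest grade found, set of kanji missing from the table)."""
    found_grades = []
    ungraded = set()
    for ch in text:
        if not is_kanji(ch):
            continue  # skip kana, punctuation, and Latin text
        if ch in grades:
            found_grades.append(grades[ch])
        else:
            ungraded.add(ch)
    # 0 means the text contains no kanji that the table knows about.
    return max(found_grades, default=0), ungraded


print(kanji_grade_report("日本語を読む", SAMPLE_GRADES))
print(kanji_grade_report("猫が海で泳ぐ", SAMPLE_GRADES))
```

Reporting the ungraded kanji separately is a deliberate choice here: a text can look easy by its highest grade while still containing kanji outside the list entirely.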

Vocabulary Analysis (Recommended)

Vocabulary analysis tools grade texts against a list of words, such as the vocabulary of a course like the JLPT, the words in a book, or the words you already know (perhaps stored in a flashcard program like Anki SRS). Texts can then be prioritized by how many words you know or need to know (e.g. for a test), so that you are not overwhelmed by too many new words at once. This is more personal than sentence-feature analysis, and more reliable than kanji analysis. For those who wish to rely on native Japanese in the wild, it is definitely worth integrating this type of analysis into one’s studies.
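Here is a minimal Python sketch of that idea: measure what share of a text's words appear in your known-word list, then sort candidate texts by that share. It assumes the texts have already been split into dictionary-form words with punctuation removed, which in practice requires a Japanese morphological analyzer that is not shown here. The function names and the sample data are hypothetical, not taken from any of the tools in this article.

```python
def coverage(tokens: list, known_words: set) -> tuple:
    """Return (share of tokens that are known, unknown words in first-seen order)."""
    if not tokens:
        return 0.0, []
    known_count = sum(1 for token in tokens if token in known_words)
    # dict.fromkeys removes duplicates while keeping the original order.
    unknown = list(dict.fromkeys(t for t in tokens if t not in known_words))
    return known_count / len(tokens), unknown


def rank_by_coverage(texts: dict, known_words: set) -> list:
    """Return text names sorted from most to least readable for this learner."""
    return sorted(
        texts,
        key=lambda name: coverage(texts[name], known_words)[0],
        reverse=True,
    )


# Hypothetical known-word list, e.g. exported from a flashcard deck.
known = {"猫", "が", "を", "食べる", "好き"}

# Hypothetical pre-tokenized texts (tokenizer output is assumed, not computed).
texts = {
    "story_a": ["猫", "が", "魚", "を", "食べる"],
    "story_b": ["猫", "が", "好き"],
}

for name in rank_by_coverage(texts, known):
    share, unknown = coverage(texts[name], known)
    print(name, round(share, 2), unknown)
```

The list of unknown words doubles as a study list: it shows exactly what you would need to learn before a given text becomes comfortable.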

Tokyo International University provides a tool that analyzes the JLPT level of the words in a text, while the somewhat archaic Wareya’s analyzer and Brochtrup’s Japanese Text Analysis tool find the frequency of words in texts, among other forms of analysis and features. Unfortunately, in my testing, Brochtrup’s tool seemed to produce inaccurate results despite using a known vocabulary list.

However, there are more modern and practical tools. The Japanese Readifier PC app, Japanese.io and Japanese Known Word Checker websites, and Manabi and Mondo phone apps all allow you to import or create a list of words - particularly those that you know - and grade texts by how many listed words are found in a text. These apps and websites may also find articles or texts that fit your readability level, and act as dictionary lookup tools.

Graded Readers

An alternative to using tools to find readable texts in the wild is to use graded readers. Graded readers are texts and books that target second language learners, written for various difficulty levels. One service is SatoriReader.

Further Research

If the concept of learning within your level interests you, look further into the Input Hypothesis.

And, if you’re looking for a Japanese dictionary, how to study with one, or efficient goal making, then see here.