Anki deck

This deck was automatically generated from the 2000 most common words in a set of over 13000 hentai voice work scripts.

As a result, it contains a fair amount of hentai-related vocabulary, but most of the deck consists of words that are commonly used in casual settings.

Some errors in the deck, such as words appearing twice, were fixed manually, but others may remain. Please send an e-mail to contact@pyonpyon.moe if you find anything odd or wrong.

Most audio files come from http://nihongo.monash.edu/cgi-bin/wwwjdic?1C. About 250 entries in the deck had no audio file on that site and use Google Translate text-to-speech instead, which may sound robotic and mispronounce some words.

Download

Word list

This is the list of all the words in the scripts, each with its total number of occurrences across the 13000+ scripts, ordered by occurrence count.
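The counting step can be sketched roughly as follows. This is not the actual parser, only a minimal illustration that assumes each script has already been split into a list of dictionary-form words; the function name `count_words` is hypothetical.

```python
from collections import Counter


def count_words(tokenized_scripts):
    """Count every word across all scripts and return (word, count) pairs,
    most frequent first.

    Assumption: each script is already a list of dictionary-form words;
    the real tokenization is done elsewhere (the parser uses MeCab).
    """
    counts = Counter()
    for tokens in tokenized_scripts:
        counts.update(tokens)
    return counts.most_common()
```

For example, `count_words([["する", "いい"], ["する"]])` puts する, with two occurrences, ahead of いい.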

Kana-only entries with fewer than 4 characters and every entry containing a non-Japanese character were removed to filter out the garbage that the parsing library sometimes generates.
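A filter along these lines could look like the sketch below. It is only an approximation of the rule described above: the exact character ranges the parser treats as "Japanese" are not documented here, so the Unicode ranges and the name `keep_entry` are assumptions.

```python
import re

# Assumed ranges: hiragana and katakana blocks, plus common CJK ideographs
# and the iteration mark 々. The real parser may use different ranges.
KANA = r"\u3040-\u309F\u30A0-\u30FF"
KANJI = r"\u4E00-\u9FFF\u3005"

KANA_ONLY = re.compile(f"[{KANA}]+")
JAPANESE_ONLY = re.compile(f"[{KANA}{KANJI}]+")


def keep_entry(word: str) -> bool:
    """Return True if the entry survives both filtering rules."""
    # Rule 1: drop anything containing a non-Japanese character
    # (this also drops the empty string).
    if not JAPANESE_ONLY.fullmatch(word):
        return False
    # Rule 2: drop kana-only entries with fewer than 4 characters.
    if KANA_ONLY.fullmatch(word) and len(word) < 4:
        return False
    return True
```

Under these assumptions, short kana-only words such as これ are dropped, while a single kanji such as 気 is kept because it is not kana-only.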

There are still some oddities in the list, such as 気持ちいい, 気持ちよい and 気持ち良い all appearing separately even though the parsing library is supposed to return only the dictionary form.

Other downloads

scripts These are all 13347 scripts that were used. A small number of them are in English.

parser The Python program that was used to parse the scripts. Depends on the MeCab library. Only tested on Linux.

with sounds The same data as the Anki deck in CSV format, with pronunciation from Google Translate and some other additions. Made by another anon.

original deck The original Anki deck without audio.

History

2019-04-18: Added download for a CSV version.

2019-07-12: Added audio version of the Anki deck.