Releases: polm/cutlet
v0.5.0: Change to Preserve Existing Whitespace
This release includes one small change from the previous v0.4.0, though it gets a minor version because it changes valid output. Essentially, whitespace in ASCII strings passed to cutlet will be preserved, rather than being modified using the same rules as Japanese text. See #66 for details.
v0.3.0: Token-aligned romaji
This release adds the romaji_tokens function, which takes a list of input Node objects from fugashi and returns romaji for each individual token. This allows for romaji furigana or other applications.
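As a rough illustration of the "romaji furigana" use case, the sketch below renders per-token romaji as HTML ruby markup. The `to_ruby` helper and the `pairs` list are hypothetical: the pairs are hand-written stand-ins for token-aligned output, not values produced by cutlet, and the helper is not part of the library.

```python
from html import escape


def to_ruby(pairs):
    """Render (surface, romaji) pairs as HTML ruby markup.

    Hypothetical helper for illustration only; not part of cutlet.
    """
    parts = []
    for surface, romaji in pairs:
        if romaji and romaji != surface:
            # Annotate the original token with its romaji reading.
            parts.append(
                f"<ruby>{escape(surface)}<rt>{escape(romaji)}</rt></ruby>"
            )
        else:
            # Tokens that are already romaji (or have no reading) pass through.
            parts.append(escape(surface))
    return "".join(parts)


# Hand-written example data standing in for per-token romaji (an assumption).
pairs = [("猫", "neko"), ("が", "ga"), ("好き", "suki")]
html = to_ruby(pairs)
```

Keeping the surface form and the romaji paired per token is what makes this kind of annotation possible; a single romanized string for the whole sentence would lose the alignment.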
The next release will likely be 1.0. No major extra functionality is planned, but some methods may be made internal, and the API will otherwise be cleaned up.
Fix Odoriji
This release of cutlet adds basic support for odoriji, the iteration marks (such as 々) that repeat the preceding character. In some cases it's impossible to handle them correctly, but at a minimum this makes sure they won't blow up.
Many other improvements have been included since the last release notes, so upgrading is recommended.
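For readers unfamiliar with odoriji, here is a minimal sketch of the simplest possible treatment: replacing a mark with the character before it. This is not how cutlet handles them (the `expand_odoriji` function is hypothetical), and it deliberately ignores the hard cases, such as voiced marks like ゞ or a mark that repeats more than one character.

```python
# Iteration marks covered by this sketch: kanji, hiragana, katakana.
SIMPLE_ODORIJI = {"々", "ゝ", "ヽ"}


def expand_odoriji(text):
    """Replace each simple iteration mark with the preceding character.

    Hypothetical illustration only; a mark with nothing before it
    is left unchanged rather than raising an error.
    """
    out = []
    for ch in text:
        if ch in SIMPLE_ODORIJI and out:
            out.append(out[-1])  # repeat the previous output character
        else:
            out.append(ch)
    return "".join(out)


expanded = expand_odoriji("時々")
```

The case where the mark has nothing to repeat shows why some inputs cannot be handled correctly and can only be passed through safely.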
Fix Kana Unk Handling
This release fixes the issue (#8) where hiragana or katakana words not in the dictionary would not be converted to romaji, but reproduced as-is. Now they are romanized, though since they're not in the dictionary this will often fail to capture the original spelling.
A further consequence of this change is that unknown words in scripts that aren't kana or ASCII need to be handled. By default these characters will be converted to "?" for maximum technical compatibility, though by setting the ensure_ascii property on a Cutlet to False you can disable this behavior, which will cause unknown characters to pass through.
Example:
import cutlet
cut = cutlet.Cutlet()
cut.romaji('彁')
# -> ?
cut.ensure_ascii = False
cut.romaji('彁')
# -> 彁
Note that, besides unknown kanji, this also affects non-Latin scripts like Cyrillic and Hangul.
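To make the effect on other scripts concrete, here is a small standalone sketch of the fallback idea. The `ascii_fallback` function is hypothetical and is not cutlet's implementation: it assumes one "?" per character and, unlike cutlet, replaces every non-ASCII character rather than only those in words the dictionary does not know.

```python
def ascii_fallback(text, ensure_ascii=True):
    """Replace non-ASCII characters with "?" unless ensure_ascii is False.

    Hypothetical sketch of the fallback idea only; one "?" per
    character is an assumption, not confirmed library behavior.
    """
    if not ensure_ascii:
        return text  # pass unknown characters through untouched
    return "".join(ch if ord(ch) < 128 else "?" for ch in text)


masked_kanji = ascii_fallback("彁")
masked_cyrillic = ascii_fallback("abc Привет")
passed_through = ascii_fallback("한글", ensure_ascii=False)
```

Defaulting to ASCII-only output keeps downstream consumers that expect plain ASCII from breaking, at the cost of discarding information that only the pass-through mode preserves.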
Small improvements and bugfixes
Thanks to recent attention and PRs from the community, this release of cutlet has several nice improvements.
- fixed an issue with a few pronouns
- fixed behavior of the CLI script on Ctrl-C
- add support for Python 3.6
- add Kyoto to the list of exceptions
- don't blow up on empty strings