
googlei18n/myanmar-tools
========================

Zawgyi and Unicode Font Detector using Markov Model (Machine Learning)

Latest release: v1.2.1+py · License: MIT · Primary language: Java · Requires PHP >=7.0 · [9 issues](https://github.com/googlei18n/myanmar-tools/issues) · [2 PRs](https://github.com/googlei18n/myanmar-tools/pulls)

[Source](https://github.com/googlei18n/myanmar-tools) · [Packagist](https://packagist.org/packages/googlei18n/myanmar-tools) · [Docs](http://stevenay.wordpress.com)

Myanmar Tools (Zawgyi detection &amp; conversion)
=================================================


This project includes tools for processing font encodings used in Myanmar, currently with support for the widespread Zawgyi-One font encoding. For more information on font encodings in Myanmar, read [the Unicode Myanmar FAQ](http://www.unicode.org/faq/myanmar.html).

Unofficial demos:

- [Detector Demo Page](https://sffc.github.io/myanmar-tools-demos/detector_demo.html)
- [Converter Demo Page](http://zawgyi-unicode-test.appspot.com/convertui/)

Features:

- Detect whether a string is Zawgyi or Unicode.
    - Supported in C++, Go, Java, JavaScript (both Node.js and browser), PHP, Python, Ruby, Swift, Dart, and C#
- Convert a string from Zawgyi to Unicode or from Unicode to Zawgyi
    - Supported in Java and JavaScript

Conversion is also available via ICU in languages without support via Myanmar Tools; see "Zawgyi-to-Unicode Conversion" below.

This is not an official Google product, but we hope that you’ll find *Myanmar Tools* useful to better support the languages of Myanmar.

[![Build Status](https://github.com/google/myanmar-tools/actions/workflows/mynmr_tools_ci.yml/badge.svg)](https://github.com/google/myanmar-tools/actions)

Why Myanmar Tools?
------------------


*Myanmar Tools* uses a machine learning model to tell Zawgyi from Unicode with very high accuracy. Detectors that rely on hand-coded rules are prone to flagging Unicode content in other languages written in the Myanmar script, such as Shan and Mon, as Zawgyi.

*Myanmar Tools* and the CLDR Zawgyi conversion rules are used by Google, Facebook, and others to provide great experiences to users in Myanmar.

Using the Zawgyi Detector
-------------------------


See language-specific documentation:

- [C++](clients/cpp/README.md)
- [Go](clients/go/README.md)
- [Java](clients/java/README.md)
- [JavaScript](clients/js/README.md)
- [PHP](clients/php/README.md)
- [Python](clients/python/README.rst)
- [Ruby](clients/ruby/README.md)
- [Swift](clients/swift/README.md)
- [Dart](clients/dart/README.md)
- [C#](clients/c%23/README.md)

Depending on your programming language, a typical use case should look something like this:

```
if (zawgyiDetector.getZawgyiProbability(input) > THRESHOLD) {
    // Convert to Unicode, etc.
}
```

The method `getZawgyiProbability` returns a number between 0 and 1 to reflect the probability that a string is Zawgyi, given that it is either Zawgyi or Unicode. For strings that are sufficiently long, the detector should return a number very close to 0 or 1, but for strings with only a few characters, the number may be closer to the middle. With this in mind, use the following heuristics to set `THRESHOLD`:

- If *under*-predicting Zawgyi is bad (e.g., when a human gets to evaluate the result), set a low threshold like `0.05`. This threshold guarantees that fewer than 1% of Zawgyi strings will go undetected.
- If *over*-predicting Zawgyi is bad (e.g., when conversion will take place automatically), set a high threshold like `0.95`. This threshold guarantees that fewer than 1% of Unicode strings will be wrongly flagged.
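As a minimal sketch of the two heuristics above, the helper below picks one of the two example thresholds depending on which kind of error is more costly. The function and constant names are hypothetical and not part of any Myanmar Tools client; `probability` is assumed to be the value returned by `getZawgyiProbability` (or its equivalent in your language).

```python
# Hypothetical helper, not part of the Myanmar Tools API.
LOW_THRESHOLD = 0.05   # favors catching Zawgyi (fewer missed Zawgyi strings)
HIGH_THRESHOLD = 0.95  # favors leaving Unicode alone (fewer wrongly flagged strings)


def looks_like_zawgyi(probability: float, human_review: bool) -> bool:
    """Return True if the text should be treated as Zawgyi.

    human_review=True: a person evaluates the result, so missing Zawgyi is worse.
    human_review=False: conversion happens automatically, so flagging Unicode is worse.
    """
    threshold = LOW_THRESHOLD if human_review else HIGH_THRESHOLD
    return probability > threshold


# An uncertain score of 0.5 is flagged for review but not auto-converted.
print(looks_like_zawgyi(0.5, human_review=True), looks_like_zawgyi(0.5, human_review=False))
```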

Additionally, keep in mind that you may want to tune your thresholds to the distribution of strings in your input data. For example, if your input data is biased toward Unicode, in order to reduce false positives, you may want to set a higher Zawgyi threshold than if your input data is biased toward Zawgyi. Ultimately, the best way to pick thresholds is to obtain a set of labeled strings representative of the data the detector will be processing, compute their scores, and tune the thresholds to your desired ratio of precision and recall.
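The tuning procedure described above can be sketched as follows, assuming you already have `(score, is_zawgyi)` pairs from running the detector over hand-labeled strings. The helper names and the sample data are hypothetical. Because recall can only fall as the threshold rises, the lowest threshold that meets a target precision also gives the best recall among thresholds that meet it.

```python
from typing import List, Tuple

# (zawgyi_probability, is_actually_zawgyi) pairs; illustrative data only.
Scored = List[Tuple[float, bool]]


def precision_recall(scored: Scored, threshold: float) -> Tuple[float, float]:
    """Precision and recall of the rule `score > threshold` for Zawgyi."""
    tp = sum(1 for p, z in scored if p > threshold and z)
    fp = sum(1 for p, z in scored if p > threshold and not z)
    fn = sum(1 for p, z in scored if p <= threshold and z)
    # Convention: with no positive predictions (or no Zawgyi), report 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def lowest_threshold_for_precision(scored: Scored, min_precision: float) -> float:
    """Scan candidate thresholds from low to high; return the first that is precise enough."""
    candidates = sorted({p for p, _ in scored} | {0.0})
    for t in candidates:
        precision, _ = precision_recall(scored, t)
        if precision >= min_precision:
            return t
    return candidates[-1]  # not reached: the highest candidate predicts nothing


labeled = [(0.99, True), (0.97, True), (0.90, False), (0.80, True), (0.30, False), (0.02, False)]
t = lowest_threshold_for_precision(labeled, 0.75)
print(t, precision_recall(labeled, t))
```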

If a string contains non-Burmese text before or after its Burmese portion, it gets the same Zawgyi probability as if that text were removed. That is, `getZawgyiProbability("hello <burmese> world")` == `getZawgyiProbability("<burmese>")`, where `<burmese>` stands for any Burmese string.
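One way a character-level Markov model can have this property is to map every non-Burmese character to a single shared "foreign" state, treat the start and end of the string as that same state, and skip foreign-to-foreign transitions. The sketch below assumes that design and assumes that only the basic Myanmar block (U+1000 to U+109F) counts as Burmese; it is an illustration, not the library's code, and the names are hypothetical. It shows that the transitions a model would score are identical with and without the surrounding English text.

```python
from typing import List, Optional, Tuple

# Assumption: only the basic Myanmar block counts as Burmese.
MYANMAR_FIRST, MYANMAR_LAST = 0x1000, 0x109F
FOREIGN: Optional[str] = None  # one shared state for every non-Burmese character


def to_state(ch: str) -> Optional[str]:
    """Burmese characters keep their identity; everything else becomes FOREIGN."""
    return ch if MYANMAR_FIRST <= ord(ch) <= MYANMAR_LAST else FOREIGN


def transitions(text: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Transitions a Markov model would score; string boundaries count as FOREIGN."""
    states = [FOREIGN] + [to_state(ch) for ch in text] + [FOREIGN]
    return [
        (prev, cur)
        for prev, cur in zip(states, states[1:])
        if not (prev is FOREIGN and cur is FOREIGN)  # foreign-to-foreign adds no evidence
    ]


burmese = "\u1019\u103c\u1014\u103a\u1019\u102c"  # six Myanmar-block code points
print(transitions("hello " + burmese + " world") == transitions(burmese))
```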

Some strings are identical in Unicode and Zawgyi; this can happen when a string consists mostly of consonants with few diacritic vowels. The detector may return any value for such strings. If this case matters to you, run the string through a converter and check whether the converter's output equals its input.
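The converter check described above reduces to a single comparison. In this sketch the converter is passed in as a function, so it could wrap the Java or JavaScript converters, or ICU's "Zawgyi-my" transform. `toy_convert` is a hypothetical stand-in that rewrites only one code point (U+1039 to U+103A); it is not the CLDR rule set.

```python
from typing import Callable


def is_encoding_ambiguous(text: str, convert: Callable[[str], str]) -> bool:
    """True if conversion leaves the text unchanged, i.e. it reads the same in both encodings.

    For such strings the detector's score carries no useful information.
    """
    return convert(text) == text


def toy_convert(text: str) -> str:
    # Hypothetical single-rule stand-in for a real Zawgyi-to-Unicode converter.
    return text.replace("\u1039", "\u103a")


print(is_encoding_ambiguous("\u1000\u1001", toy_convert))
print(is_encoding_ambiguous("\u1000\u1039", toy_convert))
```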

Training the Model
------------------


The model used by the Zawgyi detector has been trained on several megabytes of data from web sites across the internet. The data was obtained using the [Corpus Crawler](https://github.com/googlei18n/corpuscrawler/) tool.

To re-train the model, first run Corpus Crawler locally. For example:

```
$ mkdir ~/corpuscrawler_output && cd ~/corpuscrawler_output
$ corpuscrawler --language my --output . &
$ corpuscrawler --language my-t-d0-zawgyi --output . &
$ corpuscrawler --language shn --output . &
$ corpuscrawler --language mnw --output . &
$ corpuscrawler --language kar --output . &
$ corpuscrawler --language pi-Mymr --output . &
```

This takes a long time, possibly several days. The longer you let it run, the better your model will be. At a minimum, make sure you have obtained data for both Unicode and Zawgyi; the directory should contain files for all six languages listed in the commands above.

Once you have data available, train the model by running the following command in this directory:

```
$ make train CORPUS=$HOME/corpuscrawler_output
```
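For intuition about what a two-class Markov model does with such corpora, here is a toy sketch: one character-bigram model is trained per encoding, and the Zawgyi probability is the posterior under equal priors, `P(Z | s) = 1 / (1 + exp(-(log P(s | Z) - log P(s | U))))`. This is an assumed, simplified design (add-one smoothing, `^`/`$` boundary markers, tiny abstract training strings); it does not reproduce `make train` or the format of `zawgyiUnicodeModel.dat`.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set

START, END = "^", "$"  # hypothetical boundary markers


class BigramModel:
    """Character bigram Markov model with add-alpha smoothing (illustrative only)."""

    def __init__(self, vocab: Set[str], alpha: float = 1.0) -> None:
        self.vocab_size = len(vocab) + 1  # every character plus the END marker
        self.alpha = alpha
        self.counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        self.totals: Dict[str, int] = defaultdict(int)

    def train(self, texts: Iterable[str]) -> None:
        for text in texts:
            seq = [START, *text, END]
            for prev, cur in zip(seq, seq[1:]):
                self.counts[prev][cur] += 1
                self.totals[prev] += 1

    def log_prob(self, text: str) -> float:
        seq = [START, *text, END]
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            num = self.counts[prev][cur] + self.alpha
            den = self.totals[prev] + self.alpha * self.vocab_size
            total += math.log(num / den)
        return total


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def zawgyi_probability(text: str, zawgyi: BigramModel, unicode: BigramModel) -> float:
    """Posterior probability of Zawgyi, given the text is Zawgyi or Unicode (equal priors)."""
    return sigmoid(zawgyi.log_prob(text) - unicode.log_prob(text))


# Toy "corpora" over an abstract two-letter alphabet, standing in for crawled text.
vocab = {"x", "y"}
z_model = BigramModel(vocab)
z_model.train(["xyxyxy"])
u_model = BigramModel(vocab)
u_model.train(["xxyyxx"])
print(zawgyi_probability("xyxy", z_model, u_model))
```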

Zawgyi-to-Unicode Conversion
----------------------------


Once you have determined whether a piece of text is Unicode or Zawgyi, it is often useful to convert it from one encoding to the other.

This package supports conversion in Java and JavaScript. Its rules are the same as the transliteration rules published in the Common Locale Data Repository (CLDR), so in other languages conversion is available through [ICU](http://site.icu-project.org/), which implements the CLDR transforms. ICU versions 58 and later include it under the transform ID "Zawgyi-my":

- Java: [ICU4J Transliterator](http://icu-project.org/apiref/icu4j/com/ibm/icu/text/Transliterator.html)
- C++: [ICU4C Transliterator](http://icu-project.org/apiref/icu4c/classicu_1_1Transliterator.html)

Many other languages, including Python, Ruby, and PHP, have wrapper libraries over ICU4C, which means you can use the Zawgyi converter in those languages, too. See the samples directory for examples on using the ICU Transliterator.

Contributing New Programming Language Support
---------------------------------------------


We will happily consider pull requests that add clients in other programming languages. To add support for a new programming language, here are some tips:

- Add a new directory underneath `clients`. This will be the root of your new package.
- Use a build system customary to your language.
- Add your language to the `copy-resources` and `test` rules in the top-level Makefile.
- At a minimum, your package should automatically consume `zawgyiUnicodeModel.dat` and test against `compatibility.tsv`.
- Converter implementations should run compatibility tests in both directions (Zawgyi to Unicode and Unicode to Zawgyi) using the provided resources.
- See the other clients for examples. Most clients are only a couple hundred lines of code.
- When finished, add your client to the *.travis.yml* file.
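A new client's compatibility test can be as small as the harness below. It assumes each line of `compatibility.tsv` holds an expected probability, a tab, and the input string; check the actual file before relying on that layout. `get_probability` stands for your client's detector call, and the harness name and tolerance are hypothetical choices. `float()` also parses `-Infinity`, and `math.isclose` treats equal infinities as matching.

```python
import math
from typing import Callable, List, Tuple


def compatibility_failures(
    tsv_text: str,
    get_probability: Callable[[str], float],
    tolerance: float = 1e-6,
) -> List[Tuple[str, float, float]]:
    """Return (input, expected, actual) for every row whose score differs beyond tolerance."""
    failures = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected_str, text = line.split("\t", 1)  # assumed layout: "<expected>\t<input>"
        expected = float(expected_str)
        actual = get_probability(text)
        if not math.isclose(actual, expected, abs_tol=tolerance):
            failures.append((text, expected, actual))
    return failures


# Usage with a fake detector backed by a lookup table.
fake_scores = {"a": 0.5, "b": 0.25}
print(compatibility_failures("0.5\ta\n0.25\tb\n", fake_scores.__getitem__))
```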


### Community

Maintainers: stevenay, google-myanmar-tools-user

Top contributors: [sffc](https://github.com/sffc) (196 commits), [stevenay](https://github.com/stevenay) (27), dependabot[bot] (25), [kirankarki](https://github.com/kirankarki) (18), [blackblitz](https://github.com/blackblitz) (13), [bhamiltoncx](https://github.com/bhamiltoncx) (8), [brawer](https://github.com/brawer) (6), [echeran](https://github.com/echeran) (5), [roubert](https://github.com/roubert) (5), [sven-oly](https://github.com/sven-oly) (5), [theinlinkyaw](https://github.com/theinlinkyaw) (4), [gnrunge](https://github.com/gnrunge) (4), [seithuhtun](https://github.com/seithuhtun) (2), [joycebrum](https://github.com/joycebrum) (1)

---

Tags: unicode, detector, zawgyi, myanmar-font, myanmar-tools


### Alternatives

- [symfony/string](/packages/symfony-string): Provides an object-oriented API to strings and deals with bytes, UTF-8 code points and grapheme clusters in a unified way
- [gettext/languages](/packages/gettext-languages): gettext languages with plural rules
- [punic/punic](/packages/punic-punic): PHP-Unicode CLDR
- [joedixon/laravel-translation](/packages/joedixon-laravel-translation): A tool for managing all of your Laravel translations
- [illuminate/translation](/packages/illuminate-translation): The Illuminate Translation package.
- [stevenay/myanfont](/packages/stevenay-myanfont): Zawgyi and Unicode Font Detector and Converter

