Releases: pemistahl/lingua-py
Lingua 1.3.3
Improvements
- Processing the language models is now a little faster because binary search is performed on the language model NumPy arrays.
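As a rough illustration of why binary search helps, the sketch below looks up n-gram probabilities in a sorted sequence instead of scanning it. It is a minimal sketch, assuming a model stored as two parallel sequences (sorted n-grams and their probabilities). The names NGRAMS, PROBABILITIES and lookup_probability are hypothetical, and Python's bisect module stands in for NumPy's searchsorted so that the example has no dependencies.

```python
from bisect import bisect_left
from typing import Optional, Sequence

# Hypothetical toy model: n-grams sorted lexicographically, with a parallel
# list of relative frequencies (not Lingua's real data or storage layout).
NGRAMS = ["ab", "an", "de", "en", "er", "in", "th"]
PROBABILITIES = [0.01, 0.03, 0.02, 0.05, 0.04, 0.03, 0.06]


def lookup_probability(
    ngrams: Sequence[str], probabilities: Sequence[float], ngram: str
) -> Optional[float]:
    """Return the probability of ngram, or None if the model lacks it.

    bisect_left needs only O(log n) comparisons because ngrams is sorted.
    """
    index = bisect_left(ngrams, ngram)
    if index < len(ngrams) and ngrams[index] == ngram:
        return probabilities[index]
    return None


print(lookup_probability(NGRAMS, PROBABILITIES, "en"))  # 0.05
print(lookup_probability(NGRAMS, PROBABILITIES, "zz"))  # None
```

A linear scan would compare the query against every entry; on a sorted array, each comparison halves the remaining search range.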
Bug Fixes
- Several bugs in multiple-language detection that caused incomplete results to be returned in some cases have been fixed. (#143, #154)
- A significant number of Kazakh texts were incorrectly classified as Mongolian. This has been fixed. (#160)
Miscellaneous
- A new section on performance tips has been added to the README.
- All dependencies have been updated to their latest versions.
Lingua 1.3.2
Improvements
- After some internal optimizations, language detection is now roughly 20% to 30% faster. The speed improvement is greater for long input texts than for short ones.
Lingua 1.3.1
Lingua 1.3.0
Improvements
- The min-max normalization method for the confidence values has been replaced with applying the softmax function. This gives more realistic probabilities. Big thanks to @Alex-Kopylov for proposing and implementing this change. (#99)
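The softmax function maps a set of raw scores to positive values that sum to 1, which is what makes its output read like probabilities. The sketch below applies it to made-up scores. It assumes, for illustration only, that each language's raw score is a summed log-probability; the function name softmax and the example numbers are hypothetical rather than taken from Lingua.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps math.exp from overflowing; it does not
    # change the result because softmax is invariant to a constant shift.
    highest = max(scores.values())
    exps = {name: math.exp(score - highest) for name, score in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Made-up summed log-probabilities for one input text.
raw_scores = {"ENGLISH": -42.0, "GERMAN": -43.0, "FRENCH": -46.0}
for name, probability in softmax(raw_scores).items():
    print(f"{name}: {probability:.3f}")
```

Because the exponential is applied before normalizing, a score difference of 1 corresponds to a fixed probability ratio of e, regardless of how the other languages scored.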
Lingua 1.2.1
Bug Fixes
- Under certain circumstances, calling the method LanguageDetector.detect_multiple_languages_of() raised an IndexError. This has been fixed. Thanks to @Saninsusanin for reporting this bug. (#98)
Lingua 1.2.0
Features
- The new method LanguageDetector.detect_multiple_languages_of() has been introduced. It allows detecting multiple languages in mixed-language text. (#4)
- The new method LanguageDetector.compute_language_confidence() has been introduced. It allows retrieving the confidence value for one specific language only, given the input text. (#86)
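To illustrate the kind of result that detecting multiple languages in mixed-language text produces, the toy sketch below groups consecutive words carrying the same language label into contiguous sections with character offsets. It is not Lingua's algorithm: the per-word labels are supplied by hand, and the Section class with its start_index, end_index and language fields is a hypothetical stand-in for the library's actual result type.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    """Hypothetical result type: a text slice assigned to one language."""
    start_index: int
    end_index: int
    language: str


def group_sections(text: str, labels: List[str]) -> List[Section]:
    """Merge consecutive words sharing a language label into sections.

    labels must contain one language label per whitespace-separated word.
    """
    words = list(re.finditer(r"\S+", text))
    if len(words) != len(labels):
        raise ValueError("expected one label per word")
    sections: List[Section] = []
    for match, label in zip(words, labels):
        if sections and sections[-1].language == label:
            # Same language as the previous word: extend the open section.
            sections[-1].end_index = match.end()
        else:
            sections.append(Section(match.start(), match.end(), label))
    return sections


text = "Parlez-vous français? Ich spreche nur ein bisschen."
labels = ["FRENCH", "FRENCH", "GERMAN", "GERMAN", "GERMAN", "GERMAN", "GERMAN"]
for section in group_sections(text, labels):
    print(section.language, repr(text[section.start_index:section.end_index]))
```

Returning character offsets rather than copied substrings lets the caller slice the original text directly.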
Improvements
- The computation of the confidence values has been revised, and the min-max normalization algorithm is now applied to the values. This makes them more comparable because they behave more like real probabilities. (#78)
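Min-max normalization rescales scores linearly so that the lowest becomes 0 and the highest becomes 1. The sketch below shows the generic technique on made-up scores. The function name min_max_normalize is hypothetical, and the way Lingua computed the raw values in this release is not reproduced here.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores linearly into the range [0, 1]."""
    lowest = min(scores.values())
    highest = max(scores.values())
    span = highest - lowest
    if span == 0:
        # All scores are equal, so no language is preferred over another.
        return {name: 1.0 for name in scores}
    return {name: (score - lowest) / span for name, score in scores.items()}


# Made-up summed log-probabilities for one input text.
raw_scores = {"ENGLISH": -42.0, "GERMAN": -43.0, "FRENCH": -46.0}
print(min_max_normalize(raw_scores))
# {'ENGLISH': 1.0, 'GERMAN': 0.75, 'FRENCH': 0.0}
```

Note that the rescaled values do not sum to 1, and the lowest-scoring language always receives 0 no matter how close its raw score was to the others.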
Miscellaneous
- The library now has a fresh and colorful new logo. Why? Well, why not? (-:
Lingua 1.1.3
Improvements
- An __all__ variable has been added indicating which types are exported by the library. This helps type checkers analyze programs that use Lingua. Big thanks to @bscan for the pull request. (#76)
- The rule-based language filter has been improved for German texts. (#71)
- A further bottleneck in the code has been removed, making language detection approximately 30% faster than in version 1.1.2.
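The __all__ variable mentioned above is a standard Python mechanism: it lists a module's public names, it determines what "from module import *" imports, and tools such as type checkers can read it as the module's declared exports. The self-contained sketch below builds a throwaway module named toy_lingua (a hypothetical name, unrelated to Lingua's real module layout) to show the effect.

```python
import sys
import types

# Source for a throwaway module; only the names in __all__ are public.
MODULE_SOURCE = '''
__all__ = ["Language", "detect"]

class Language:
    pass

def detect(text):
    return _normalize(text)

def _normalize(text):
    return text.lower()

helper_constant = 42
'''

module = types.ModuleType("toy_lingua")
exec(MODULE_SOURCE, module.__dict__)
sys.modules["toy_lingua"] = module  # make it importable by name

# A star import copies only the names listed in __all__.
namespace = {}
exec("from toy_lingua import *", namespace)
exported = sorted(name for name in namespace if name != "__builtins__")
print(exported)  # ['Language', 'detect']
```

Without __all__, the star import would also pick up helper_constant, because Python then exports every name that does not start with an underscore.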
Lingua 1.1.2
Improvements
- The language models are now stored on disk as serialized NumPy arrays instead of JSON. This reduces the preloading time of the language models significantly.
- A bottleneck in the language detection code has been removed, making language detection approximately 40% faster.
Bug Fixes
- The py.typed file that enables static type checking was missing. Big thanks to @Vasniktel for reporting this problem. (#63)
- The ISO 639-3 code for Urdu was wrong. Big thanks to @pluiez for reporting this bug. (#64)
Lingua 1.1.1
Lingua 1.1.0
Features
- The new method LanguageDetectorBuilder.with_low_accuracy_mode() has been introduced. Activating it reduces detection accuracy for short texts in favor of a smaller memory footprint and faster detection.
Improvements
- The memory footprint has been reduced significantly by storing the language models in structured NumPy arrays instead of dictionaries. This reduces memory consumption from approximately 2600 MB to approximately 800 MB.
- Several language model files have become obsolete and could be deleted without decreasing detection accuracy. This results in a smaller memory footprint.
Compatibility
- The lowest supported Python version is 3.8 now. Python 3.7 is no longer compatible with this library.