This repository aims to accelerate the process of finding Typo-Bridges, to foster interest and to encourage their intentional inclusion in texts.
Let's start with an example:
tet oyur safd lobely tyos
The sentence above contains five Typo-Bridges. It could be interpreted as an attempt to write (at least) one of the following:
test our safe lovely toys
get your sad lonely typos
That is, assuming English, the five alphabetic sequences ("words") above all appear to be typos, and assuming an ANSI keyboard with QWERTY layout was used, they all could be interpreted in at least two different ways. For example:
- If the author was a bit tired, they might have missed the "s", intending to write "test". However, since "t" is close to "g" on their keyboard, it's also possible they meant to write "get".
- It could be that they swapped "o" and "y", intending to write "your", or accidentally pressed "y" while typing "u", which is right next to it, intending to write "our".
Image by Denelson83 from Wikipedia, used under CC BY-SA 3.0 license.
We consider the following:
- Keys neighboring the ones intended to be pressed are somewhat likely to be accidentally pressed instead or in addition to the intended ones.
- Two consecutive elements in the word intended could have been entered in the wrong order.
- Some keys intended to be pressed may have been left unpressed.
Quite essy, isn't it? Some explanations (words) may be considered less likely than others by some observers, and some mistakes more likely to happen than others. Typo-Bridges are about plausibility, which is subjective and sensitive to the circumstances contemplated by the observer. They sit between words, between explanations, keeping them all worth considering and roughly equally likely.
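The three kinds of mistakes listed above can be sketched in a few lines of Python. This is a minimal illustration, not tyo's actual implementation: the names `ROWS`, `neighbors_of`, `typos` and `bridges` are hypothetical, and the key adjacency is a simplified guess at a staggered QWERTY layout (letters only), not tyo's device definition.

```python
# Hypothetical sketch, not tyo's code. Letter rows of a QWERTY keyboard;
# the adjacency below only approximates the usual row stagger.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def neighbors_of(key):
    """Return the letters assumed to be adjacent to `key`."""
    for r, row in enumerate(ROWS):
        c = row.find(key)
        if c < 0:
            continue
        near = set(row[max(c - 1, 0):c] + row[c + 1:c + 2])  # left and right
        if r > 0:
            near |= set(ROWS[r - 1][c:c + 2])  # up to two keys above
        if r + 1 < len(ROWS):
            near |= set(ROWS[r + 1][max(c - 1, 0):c + 1])  # up to two keys below
        return near
    return set()


def typos(word):
    """All strings exactly one assumed mistake away from `word`."""
    out = set()
    for i, ch in enumerate(word):
        out.add(word[:i] + word[i + 1:])  # key left unpressed
        if i + 1 < len(word):
            out.add(word[:i] + word[i + 1] + ch + word[i + 2:])  # swapped pair
        for n in neighbors_of(ch):
            out.add(word[:i] + n + word[i + 1:])  # neighbor pressed instead
            out.add(word[:i] + n + word[i:])  # neighbor pressed in addition, before
            out.add(word[:i + 1] + n + word[i + 1:])  # neighbor pressed in addition, after
    out.discard(word)
    return out


def bridges(a, b, language):
    """Strings one mistake away from both `a` and `b` that are not words."""
    return sorted((typos(a) & typos(b)) - set(language))


# Toy language containing only the words from the opening example.
language = {"test", "get", "your", "our", "safe", "sad",
            "lovely", "lonely", "toys", "typos"}
pairs = [("test", "get"), ("your", "our"), ("safe", "sad"),
         ("lovely", "lonely"), ("toys", "typos")]
for a, b in pairs:
    print(a, b, bridges(a, b, language))
```

Under these assumptions, each of the five strings from the opening sentence shows up among the bridges of its word pair. The sketch treats every mistake as equally plausible, which is exactly the part a real observer would weigh differently.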
You can get started by cloning and building the repository:
git clone https://github.com/mongsvo/tyo.git
cd tyo
cargo build --release
# or, if Faiss is installed
cargo build --release --features neighbors
To compile with the optional "--features neighbors" you first have to install Faiss.
The resulting command line tool tyo comes with a default device (QWERTY-ANSI) but no default language. tyo can work without any language installed (see usage below), but if you want to scan a language for all possible bridges to a word, you must first install one. For this purpose, after compilation, you can use ./target/release/install_lang, as well as the helper Python script at ./contrib/embeddings_filter.py to clean up your datasets before installing.
As an example, for English, you could use one of these datasets and clean it a bit by using this much smaller dataset:
wget https://nlp.stanford.edu/data/glove.42B.300d.zip
unzip glove.42B.300d.zip
wget https://raw.githubusercontent.com/dwyl/english-words/refs/heads/master/words_alpha.txt
Assuming you compiled with Faiss, you could then run:
./target/release/install_lang english glove.42B.300d.txt 300 words_alpha.txt
# 300 is the dimension of the embeddings
# Only words that appear in words_alpha.txt are installed
If you didn't compile with Faiss, no problem: the same command without "300 words_alpha.txt" should work, although you may prefer to pass it only "english words_alpha.txt", as you don't need the extra data (vector representations) from "glove.42B.300d.txt".
./target/release/install_lang english words_alpha.txt
# No Faiss
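For intuition, the word-list filtering described above (keeping only embedding rows whose word appears in the smaller dataset) could look roughly like the sketch below. It is not the contrib script and not what install_lang runs internally; the function name `filter_embeddings` is hypothetical. It assumes the GloVe text format of one word per line followed by its space-separated vector components, and the sample rows are made up for illustration.

```python
def filter_embeddings(lines, allowed, dim):
    """Yield only well-formed embedding rows whose word is in `allowed`.

    A row is expected to look like "word v1 v2 ... v_dim".
    """
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != dim + 1:
            continue  # skip rows with the wrong number of components
        if parts[0] in allowed:
            yield line


# Tiny made-up sample with 3 dimensions instead of 300.
sample = [
    "cat 0.1 0.2 0.3\n",
    "zzxq 0.4 0.5 0.6\n",  # not in the word list
    "dog 0.7 0.8\n",       # malformed: only 2 components
]
kept = list(filter_embeddings(sample, {"cat", "dog"}, dim=3))
print(kept)
```

With real files you would pass an open file handle for the embeddings as `lines` and a set built from the word list as `allowed`, so the large file is streamed rather than loaded into memory.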
Note: The datasets referenced above are external resources, and I am not responsible for the availability, accuracy, legality, or any other issues related to these external datasets. It is your responsibility to verify and review the terms of service, licenses, and any potential restrictions imposed by the original authors or providers of the datasets before downloading or using them.
Once compiled, you can find bridges between pairs of words like this:
./target/release/tyo
>>> toy typo
(typo, toy): tyo
# Loads the default language and device.
# Prints all Typo-Bridges found between them.
If you compiled with Faiss and installed a language, you could use it like this:
./target/release/tyo -l english -d qwerty_ansi
>>> 100: food sleep
(waking, baking): aking
(bed, bread): bred
...
# Loads English and the qwerty_ansi device (the default), finds the 100 semantically closest words to `food` and `sleep` and prints all Typo-Bridges between them.
Without Faiss you could still do this:
./target/release/tyo -l english spanish
>>> l: friend
...
(friend, fired): fried
...
(friend, riendo): riend
...
# Loads English and Spanish with the default device and prints all bridges from "friend" to any word in English and Spanish.
You could also go the other way around, giving it a typo and letting it search for any words bridged by this typo in the loaded languages:
./target/release/tyo
>>> t: lobely
lobely -> lovely, lonely
# Loads the default language (English in this case) and device.
# Prints all words in the default language that are bridged (to some other word) by "lobely".
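The reverse lookup can be pictured as checking, for every word in the language, whether a single mistake would turn it into the given typo. The sketch below is only an illustration under stated assumptions, not how tyo is implemented: `ROWS`, `neighbors_of`, `explains` and `bridged_words` are hypothetical names, the adjacency is a simplified staggered QWERTY layout, and the word list is a toy one.

```python
# Hypothetical sketch, not tyo's code. Simplified QWERTY adjacency.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def neighbors_of(key):
    """Return the letters assumed to be adjacent to `key`."""
    for r, row in enumerate(ROWS):
        c = row.find(key)
        if c < 0:
            continue
        near = set(row[max(c - 1, 0):c] + row[c + 1:c + 2])
        if r > 0:
            near |= set(ROWS[r - 1][c:c + 2])
        if r + 1 < len(ROWS):
            near |= set(ROWS[r + 1][max(c - 1, 0):c + 1])
        return near
    return set()


def explains(word, typo):
    """True if `typo` is exactly one assumed mistake away from `word`."""
    if word == typo:
        return False
    delta = len(typo) - len(word)
    if delta == 0:
        diff = [i for i in range(len(word)) if word[i] != typo[i]]
        if len(diff) == 1:  # neighboring key pressed instead
            return typo[diff[0]] in neighbors_of(word[diff[0]])
        if len(diff) == 2 and diff[1] == diff[0] + 1:  # adjacent pair swapped
            i, j = diff
            return word[i] == typo[j] and word[j] == typo[i]
        return False
    if delta == -1:  # one key left unpressed
        return any(word[:i] + word[i + 1:] == typo for i in range(len(word)))
    if delta == 1:  # neighboring key pressed in addition
        for i, extra in enumerate(typo):
            if typo[:i] + typo[i + 1:] == word:
                intended = word[max(i - 1, 0):i + 1]  # keys typed around the extra one
                if any(extra in neighbors_of(ch) for ch in intended):
                    return True
    return False


def bridged_words(typo, language):
    """Words the typo could stand for, if there are at least two of them."""
    if typo in language:
        return []  # a real word is not a Typo-Bridge
    matches = sorted(w for w in language if explains(w, typo))
    return matches if len(matches) >= 2 else []


words = {"lovely", "lonely", "lively", "lobby"}
print("lobely ->", bridged_words("lobely", words))
```

Scanning the whole language for every query is the simplest possible approach; an index from typos to words would avoid the repeated scan at the cost of memory.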
tyo tries to be inclusive, to inform you about typos people may consider imaginable. It's up to you to decide which to use.
For more information enter:
./target/release/tyo --help
For a device D and a language L, a Typo-Bridge is a sequence T of elements, all of which could be produced by a performer using D, such that T is not in L and there are two other sequences A and B, both in L, for which T could be explained as an unsuccessful attempt to produce A or B using D, with both A and B estimated (by the observer) to have roughly the same likelihood of being the sequence originally intended by the performer. That is, assuming D, the observer would perceive T as unintentional and would not be able to tell which of A or B was intended, yet would recognize both as likely.
By the above definition, if T is in the language, it is not a Typo-Bridge. We may consider it a Pseudo-Bridge and in some contexts, where T, despite being in the language, would be considered unexpected by the observer, T may play a similar role to a real Typo-Bridge.
The name Typo-Bridge suggests typographical sequences, but Typo-Bridges may appear as other sequences as well, e.g. a phonological sequence with an assumed speech production system and a given language.
Note that in Typo-Bridges, errors are assumed to happen during production, not during reproduction or during observation (for example, due to noise in the environment).
The typos considered by tyo are one mistake away from A and one from B. Typo-Bridges may also be sequences where the path to A and B is longer. However, supporting them would make tyo more complicated to code, and the results may be a bit too confusing for the typical human observer.
Interestingly, beyond the artistic value of Typo-Bridges, they can be used to say potentially controversial things. A bit like dog whistling, they create a shield of plausible deniability, leaving the door open for you to escape. Speaking without spaking!
Using autocorrection may eliminate Typo-Bridges, as it converts all strings into words. Sometimes, however, it converts typos into the wrong words, potentially leading to Pseudo-Bridges. With the emergence of powerful language models, it might seem like we are losing the room for error. It could be, however, that we are simply shifting to another one. Certainly we are losing typos, the more traditional ones, but autocorrection and language models are still instruments, and as with all instruments used by an imperfect performer, room for error is thankfully inevitable.
Yes! The most straightforward way to contribute would be to enrich the repository with more devices. Also, if you find bugs, or typos you believe tyo should have shown in response to your query but didn't, please open an issue.
Aside from optimization and adding devices, it would be nice to enhance tyo with a language model, so that tyo forks your sentences while you write them, suggesting bridges based on the earlier words written in all branches.
Also, if there is interest, I may provide a SaaS to save you from downloading and installing dictionaries and models.
Copyright (C) Mongke Svoboda
Licensed under the GNU AGPLv3
You can reach me via email:
[a Typo-Bridge between "permission" and "emission" assuming ANSI keyboard with QWERTY layout] @ proton.me
I also have a PGP key: