How many decades did it take to decipher the Rosetta Stone? An MIT PhD student of Chinese descent has developed a new system for rapid decipherment

Author: New Zhiyuan

The loss of an ancient language is a loss not only for scholars but for all of human civilization. MIT's newly developed system is designed to help linguists interpret forgotten "dead languages."

Among the British Museum's most famous artifacts, the Rosetta Stone surely ranks in the top three. It stands in its display case, ancient, mysterious, and silent, yet the dense text on its surface records the history of ancient Egypt.

During Napoleon's expedition to Egypt, his troops found this stone tablet near the Nile. It is inscribed in three scripts, one of them Ancient Greek, and records a decree marking the first anniversary of the young Ptolemy V's coronation as pharaoh.

The other two scripts could not be read, and even Napoleon tried to decipher the text.

Later, a "linguistic genius" named Champollion spent about 20 years on the problem and finally deciphered the other two scripts. It turned out that all three texts recorded the same content.

Had Champollion been born in modern times, the problem that cost him 20 years might have been solved far sooner.

MIT's new research: deciphering a language without relying on its "relatives"

Today, at least 12 languages in the world remain undeciphered. Deciphering a lost language often depends on its relationship with other languages.

Champollion was able to decipher the hieroglyphs in part because of his command of multiple languages.

Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) say they have developed a system that can decipher a lost language without knowing in advance how it relates to other languages.

They also showed that the system can itself assess the relationship between languages, and they used it to corroborate recent scholarship suggesting that Iberian is not in fact related to Basque.

Basque

Basque is a language isolate spoken in the Basque Country (the autonomous communities of the Basque Country and Navarre in northern Spain, and the adjacent part of southwestern France).

As the only language isolate in Western Europe, Basque has long been at the center of debate over whether it is related to any other known language.

Iberian

Iberian was the language of an indigenous Western European people who, according to Greek and Roman sources, lived in the eastern and southeastern regions of the Iberian Peninsula in the pre-Migration era (before about 375 AD).

Two difficulties in deciphering "dead languages"

Most undeciphered and lost languages have two characteristics, which pose a significant challenge to deciphering:

(1) The scripts are not fully segmented into words

(2) The closest known language has not been determined, so it is unclear which language is the "close relative"

To address this, the MIT researchers built a decipherment model that learns character embeddings based on the International Phonetic Alphabet (IPA).

The project builds on a paper the authors wrote last year that deciphered Ugaritic and Linear B, scripts that took humans decades to decode.

In that earlier project, both languages were already known to be related to early forms of Hebrew and Greek, respectively.

This time the authors take on the case where the relationship between languages is unknown.

The algorithm learns to embed speech sounds in a multidimensional space, where differences in pronunciation are reflected in the distance between the corresponding vectors. This design lets the model capture the relevant patterns of language change and express them as computational constraints.

The resulting model can segment words in an ancient language and map them to their counterparts in a related language.
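The segmentation step can be illustrated with a much simpler stand-in. The sketch below assumes the known vocabulary has already been mapped into the lost script and uses exact string matching, whereas the system described here scores candidate matches probabilistically. The function name `segment`, the `max_len` parameter, and the toy strings are all hypothetical.

```python
def segment(inscription, vocabulary, max_len=10):
    """Split `inscription` into spans, preferring spans found in `vocabulary`.

    Returns a list of (span, matched) pairs. The score being maximized is
    the number of characters covered by vocabulary words. Unmatched
    characters are returned as single-character spans.
    """
    n = len(inscription)
    best = [0] * (n + 1)     # best[i]: most matched characters in inscription[:i]
    back = [None] * (n + 1)  # back[i]: (start, matched) of the span ending at i
    for i in range(1, n + 1):
        # Option 1: leave character i-1 unmatched.
        best[i] = best[i - 1]
        back[i] = (i - 1, False)
        # Option 2: end a vocabulary word at position i.
        for j in range(max(0, i - max_len), i):
            if inscription[j:i] in vocabulary and best[j] + (i - j) > best[i]:
                best[i] = best[j] + (i - j)
                back[i] = (j, True)
    # Follow the back-pointers from the end to recover the spans.
    spans = []
    i = n
    while i > 0:
        j, matched = back[i]
        spans.append((inscription[j:i], matched))
        i = j
    spans.reverse()
    return spans


# Toy example: an unsegmented string with two known words embedded in it.
toy_spans = segment("xxkingyygod", {"king", "god"})
matched_words = [span for span, matched in toy_spans if matched]
```

Here `matched_words` holds the spans that were aligned to the vocabulary, and the remaining single characters play the role of unmatched spans.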

Model Overview:

The lost text is generated from smaller units: from characters to tokens, and from tokens to inscriptions. Character mappings are first performed on the phonetic transcription of the known language. Based on these mappings, a token y in the known vocabulary Y is converted to a token x in the lost language according to the latent alignment variable a. Finally, all generated tokens, together with the characters in unmatched spans, are concatenated to form the lost inscription.

The blue boxes show the linguistic properties associated with each level of the model.

Graphical model for generating a span x:

Characters in unmatched spans are generated independently and identically distributed (i.i.d.), while matched spans are conditioned on two latent variables: y, the known cognate, and a, the character-level alignment between x and y.
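As a rough illustration of this factorization, the sketch below scores a segmented inscription. It makes strong simplifying assumptions that are not taken from the paper: the alignment a is fixed to be one-to-one and left-to-right, the prior over cognates and segmentations is ignored, and all probabilities are made-up toy values. The names `span_log_prob`, `inscription_log_prob`, `char_map`, and `unigram` are hypothetical.

```python
import math


def span_log_prob(span, cognate, char_map):
    """log p(x | y, a) for a matched span x and known cognate y.

    Assumes position i of x aligns to position i of y (a simplification).
    `char_map[y_char][x_char]` is the probability that y_char maps to x_char.
    """
    if len(span) != len(cognate):
        return float("-inf")
    total = 0.0
    for x_char, y_char in zip(span, cognate):
        prob = char_map.get(y_char, {}).get(x_char, 0.0)
        if prob == 0.0:
            return float("-inf")
        total += math.log(prob)
    return total


def inscription_log_prob(spans, unigram, char_map):
    """Sum the log-probabilities of all spans.

    `spans` is a list of (text, cognate) pairs; cognate is None for an
    unmatched span, whose characters are scored i.i.d. under `unigram`.
    """
    total = 0.0
    for text, cognate in spans:
        if cognate is None:
            total += sum(math.log(unigram[c]) for c in text)
        else:
            total += span_log_prob(text, cognate, char_map)
    return total


# Toy values, chosen only to make the arithmetic easy to follow.
toy_unigram = {"x": 0.25}
toy_char_map = {"p": {"p": 0.5, "b": 0.5}, "a": {"a": 1.0}}
toy_spans = [("x", None), ("ba", "pa")]
toy_score = inscription_log_prob(toy_spans, toy_unigram, toy_char_map)
```

With these toy values the score is log(0.25) + log(0.5) + log(1.0), that is, log(0.125).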

IPA embedding diagram:

Each phoneme is first represented by a vector of phonological features. The model embeds each feature, then concatenates all the relevant feature embeddings to obtain the IPA embedding. For example, the phone [b] can be represented as the concatenation of the voiced, stop, and labial embeddings.
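The concatenation idea can be sketched in a few lines. This is a toy under stated assumptions: the feature inventory, the per-feature dimension `DIM`, and the phone table are hypothetical, and the vectors are random rather than learned as they would be in a trained model.

```python
import random

# Hypothetical feature inventory: three feature groups with a few values each.
FEATURES = {
    "voicing": ["voiced", "voiceless"],
    "manner": ["stop", "fricative", "nasal"],
    "place": ["labial", "alveolar", "velar"],
}
DIM = 4  # embedding size per feature (arbitrary choice for this sketch)

rng = random.Random(0)
# One vector per feature value. A real model would learn these; here they are random.
feature_embedding = {
    value: [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    for values in FEATURES.values()
    for value in values
}

# Illustrative phone table: (voicing, manner, place) for each phone.
PHONES = {
    "b": ("voiced", "stop", "labial"),
    "p": ("voiceless", "stop", "labial"),
    "k": ("voiceless", "stop", "velar"),
}


def ipa_embedding(phone):
    """Concatenate the embeddings of the phone's features into one vector."""
    vector = []
    for feature in PHONES[phone]:
        vector.extend(feature_embedding[feature])
    return vector


b_vec = ipa_embedding("b")
p_vec = ipa_embedding("p")
```

Because [b] and [p] share the stop and labial features, their vectors agree on the last two blocks and differ only in the voicing block, which is what lets related sounds share parameters.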

Although sounds are rarely added or deleted as a language evolves, certain sound substitutions do occur. A word with a "p" in the parent language may change to a "b" in a descendant language, but a change to "k" is less likely because of the larger gap in pronunciation.
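One way to picture these constraints is a weighted edit distance in which insertions and deletions are expensive and substitutions are cheap when the two sounds are close. The sketch below is only an illustration: the feature coordinates and cost constants are hand-set assumptions chosen to mirror the "p", "b", "k" example, whereas the system described here learns such distances from data. The words are toy strings.

```python
# Hypothetical hand-set coordinates: (voicing, place of articulation).
# Place is coded front to back, so labial (0) and velar (2) are far apart.
PHONE_FEATURES = {
    "p": (0, 0),  # voiceless labial
    "b": (1, 0),  # voiced labial
    "t": (0, 1),  # voiceless alveolar
    "d": (1, 1),  # voiced alveolar
    "k": (0, 2),  # voiceless velar
    "g": (1, 2),  # voiced velar
}
INDEL_COST = 1.0    # sounds are rarely added or removed, so this is high
FEATURE_STEP = 0.25  # cost per unit of feature difference (assumed value)


def substitution_cost(a, b):
    """Cost of replacing sound a with sound b."""
    if a == b:
        return 0.0
    if a in PHONE_FEATURES and b in PHONE_FEATURES:
        fa, fb = PHONE_FEATURES[a], PHONE_FEATURES[b]
        return FEATURE_STEP * (abs(fa[0] - fb[0]) + abs(fa[1] - fb[1]))
    return 1.0  # fallback for sounds outside the toy table


def weighted_edit_distance(source, target):
    """Standard edit-distance dynamic program with feature-based substitution costs."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * INDEL_COST
    for j in range(1, cols):
        dist[0][j] = j * INDEL_COST
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + INDEL_COST,  # delete a sound
                dist[i][j - 1] + INDEL_COST,  # insert a sound
                dist[i - 1][j - 1] + substitution_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]


p_to_b = weighted_edit_distance("pater", "bater")
p_to_k = weighted_edit_distance("pater", "kater")
```

Under these assumed costs, `p_to_b` is smaller than `p_to_k`, so a candidate cognate that differs only by "p" becoming "b" is preferred over one where "p" becomes "k".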

The proposed algorithm can also assess how close two languages are. In fact, when tested on known languages, it can even identify language families accurately.
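A crude proxy for this kind of proximity test is to ask which candidate vocabulary matches the lost words best on average. The sketch below uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. This is not the metric used by the MIT system, and the function names, language labels, and word lists are toy assumptions.

```python
from difflib import SequenceMatcher


def best_match_score(word, vocabulary):
    """Similarity (0 to 1) between `word` and its closest entry in `vocabulary`."""
    return max(SequenceMatcher(None, word, known).ratio() for known in vocabulary)


def proximity(lost_words, vocabulary):
    """Average best-match similarity of the lost words against one vocabulary."""
    return sum(best_match_score(w, vocabulary) for w in lost_words) / len(lost_words)


def closest_language(lost_words, candidates):
    """Return the name of the candidate vocabulary with the highest proximity."""
    return max(candidates, key=lambda name: proximity(lost_words, candidates[name]))


# Toy data: two "lost" words and two candidate vocabularies.
toy_lost_words = ["bater", "mader"]
toy_candidates = {
    "language_a": ["pater", "mater"],
    "language_b": ["otec", "matka"],
}
toy_closest = closest_language(toy_lost_words, toy_candidates)
```

In this toy setup `toy_closest` is "language_a", since both lost words differ from a word in that vocabulary by a single sound.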

DeepMind had already developed a system for restoring Greek inscriptions on stone

This isn't the only application of AI to the field of lost languages.

DeepMind has developed a system called Pythia that learned to recognize patterns in 35,000 artifacts containing more than 3 million words.

It managed to guess missing words or characters in Greek inscriptions, between 1,500 and 2,600 years old, on surfaces including stone, pottery, and metal.

Damaged inscription: Decree of the Council of Athens governing the Acropolis

About 5,615 human languages survive today. As with Egyptian hieroglyphs, most of the languages that once existed are no longer in use, and dozens of them are considered lost or remain undeciphered.

Without them, we risk losing a great deal of knowledge about the people who once used them. The team's ambitions go further: they hope eventually to decipher a lost language from just a few thousand words.

About the author

Jiaming Luo

He is a PhD student at CSAIL and a member of the MIT NLP group. Before coming to MIT, he worked on sentiment analysis and summarization at Peking University.

Reference Links:

https://venturebeat.com/2020/10/20/mit-csails-ai-revives-dead-languages-it-hasnt-seen-before/

https://news.mit.edu/2020/translating-lost-languages-using-machine-learning-1021

http://people.csail.mit.edu/j_luo/assets/publications/decipherunsegmented.pdf
