AI Cracks Ancient Text on Nature Cover: Fix Missing Text, Precise Geographic Location and Writing Time

By Bowen, reporting from Aofeisi

QbitAI | Official account: QbitAI

AI is back at center stage on the cover of the latest issue of Nature, this time for deciphering ancient texts.

The work is a Transformer-based approach developed jointly by DeepMind, Google, Oxford University, and several other research institutions.

When restoring damaged text on its own, the method achieves 62% accuracy.

In practice, historians working alone restore ancient Greek inscriptions with only 25% accuracy; with the method's help, that figure rises nearly threefold, to 72%.

Beyond restoring text, the method reaches 71% accuracy on geographical attribution, and it can date ancient inscriptions to within 30 years.

The method has already prompted a lot of discussion.

A web version that can analyze ancient Greek text online is now available, and the code has been open-sourced.

Transformer deciphers ancient scripts

The architecture is called Ithaca, after the Greek island in Homer's epic poem, the Odyssey.

Ithaca's input concatenates three things for every position: a representation of the single character, a representation of the complete word it belongs to, and its sequential position. This lets the attention mechanism know where each part of the input text sits and weigh how much each part of the input influences the model's decisions.

The complete architecture consists of multiple stacked Transformer blocks, each of which outputs a sequence of processed representations whose length equals the number of input characters.

The final output is then passed to three task heads, responsible for text restoration, geographic attribution, and chronological attribution. Each head is a feed-forward neural network trained specifically for its own task.
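The concatenation described above can be illustrated with a toy sketch. It is not Ithaca's implementation: the dimensions, the `embed` helper, and the seeded pseudo-random vectors are all assumptions made for illustration, standing in for learned embedding tables.

```python
import random

# Toy sizes chosen for readability; they are not Ithaca's real dimensions.
CHAR_DIM, WORD_DIM, POS_DIM = 4, 4, 2

def embed(key, dim):
    """Hypothetical embedding lookup: a pseudo-random vector seeded by the key."""
    rng = random.Random(str(key))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def word_at_each_position(text):
    """Map every character position to the space-delimited word containing it."""
    result = []
    for word in text.split(" "):
        result.extend([word] * len(word))
        result.append(" ")   # the separator is treated as its own token
    return result[:-1]       # drop the separator added after the last word

def input_representation(text):
    """Return one vector per character: char + word + position, concatenated."""
    rows = []
    for i, (ch, word) in enumerate(zip(text, word_at_each_position(text))):
        rows.append(
            embed(("char", ch), CHAR_DIM)
            + embed(("word", word), WORD_DIM)
            + embed(("pos", i), POS_DIM)
        )
    return rows

rows = input_representation("θεοι -?")
print(len(rows), len(rows[0]))  # one row per character, 10 values per row
```

Because the three parts are concatenated rather than summed, each row keeps the character, word, and position information in separate slices of the vector.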
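As a rough sketch of how one shared sequence of representations can feed three separate heads, consider the toy code below. Everything in it is an assumption for illustration: the heads are single random linear layers followed by a softmax, the hidden size, alphabet size, and number of date bins are made up, and feeding only the first position's vector to the region and date heads is a choice of this sketch. Only the count of 84 regions comes from the article.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_head(in_dim, out_dim, seed):
    """Hypothetical task head: one random linear layer plus softmax."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    def head(vector):
        return softmax([sum(w * x for w, x in zip(row, vector)) for row in weights])
    return head

HIDDEN = 8          # toy hidden size
N_CHARS = 30        # toy alphabet size
N_REGIONS = 84      # number of regions given in the article
N_DATE_BINS = 160   # assumed number of decade bins

restore_head = make_head(HIDDEN, N_CHARS, seed=0)
region_head = make_head(HIDDEN, N_REGIONS, seed=1)
date_head = make_head(HIDDEN, N_DATE_BINS, seed=2)

def run_heads(torso_output):
    """torso_output: one HIDDEN-sized vector per input character."""
    restoration = [restore_head(v) for v in torso_output]  # per-character distribution
    region = region_head(torso_output[0])                  # one distribution per text
    date = date_head(torso_output[0])                      # one distribution per text
    return restoration, region, date

rng = random.Random(42)
fake_torso_output = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(5)]
restoration, region, date = run_heads(fake_torso_output)
print(len(restoration), len(region), len(date))
```

The point of the design is that the expensive Transformer stack runs once, and each comparatively small head reads the same output.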

For the text restoration task, Ithaca offers its top 20 candidate restorations, ranked by probability.
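One way to produce such a ranked list is a small beam search over per-character probabilities. The sketch below is an assumption about how ranking could work, not a description of Ithaca's decoder: it treats each missing character as independent, and the probabilities in the example are invented.

```python
def top_restorations(gap_distributions, k=20):
    """Rank candidate fillings for a run of gaps by joint probability.

    gap_distributions: one {character: probability} dict per missing character.
    Gaps are treated as independent, which is a simplification of this sketch.
    """
    beam = [("", 1.0)]
    for distribution in gap_distributions:
        expanded = [
            (prefix + ch, p * q)
            for prefix, p in beam
            for ch, q in distribution.items()
        ]
        expanded.sort(key=lambda item: item[1], reverse=True)
        beam = expanded[:k]  # keep only the k most probable prefixes
    return beam

# Invented probabilities for two missing characters.
gaps = [
    {"α": 0.6, "ε": 0.3, "ο": 0.1},
    {"ς": 0.7, "ν": 0.3},
]
for candidate, probability in top_restorations(gaps, k=3):
    print(candidate, round(probability, 2))
```

Showing a ranked list rather than a single answer lets a historian weigh the alternatives against what they already know about the inscription.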

For geographic attribution, the input text is classified into one of 84 regions of the ancient world, and the ranked candidate regions are presented on a map and as a bar chart.

The date attribution result is likewise shown as a histogram of the predicted distribution.

Dates are grouped into 10-year bins. For example, an inscription dated to 300-250 BC is represented as five decade bins with equal probability, while an inscription dated to 305 BC is assigned to the single decade bin 310-300 BC with 100% probability.
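The binning in this example can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: years BC are written as negative integers, each bin includes its older edge and excludes its younger edge, and the function name `date_target` is hypothetical.

```python
def date_target(earliest, latest=None, bin_size=10):
    """Spread probability evenly over the decade bins a date range covers.

    Years are signed, so 305 BC is -305. A range runs from `earliest`
    (inclusive) to `latest` (exclusive). With `latest` omitted, the
    inscription is treated as dated to exactly one year.
    """
    first = (earliest // bin_size) * bin_size  # floor to the start of the bin
    if latest is None:
        return {(first, first + bin_size): 1.0}
    if latest <= earliest:
        raise ValueError("latest must be later than earliest")
    starts = list(range(first, latest, bin_size))
    return {(s, s + bin_size): 1.0 / len(starts) for s in starts}

# An inscription dated to 300-250 BC covers five decade bins.
for bin_range, probability in date_target(-300, -250).items():
    print(bin_range, probability)

# An inscription dated to 305 BC falls in one bin only.
print(date_target(-305))
```

Representing a date as a distribution over bins lets one scheme handle both precisely dated inscriptions and ones known only to a range.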

AI assistant for historians

The researchers compared several ways of restoring ancient texts: historians working alone, Pythia (an earlier AI method of the same kind), Ithaca on its own, and historians working together with Ithaca.

The lower the character error rate (CER), the better. On the text restoration task, Ithaca has the best character error rate and accuracy among the methods, and the results improve further when historians work with it.
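For readers unfamiliar with the metric, a common definition of character error rate is the edit distance between the predicted and reference strings divided by the reference length. The sketch below uses that common definition; whether the paper normalizes in exactly this way is an assumption, and the example strings are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]

def character_error_rate(prediction, reference):
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(prediction, reference) / len(reference)

# One wrong character out of five.
print(character_error_rate("δημον", "δημος"))
```

A CER of 0 means the restoration matches the reference exactly; higher values mean more characters had to be changed.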

At the end of the paper, the researchers say that the approach applies to any discipline that deals with ancient texts, such as manuscript studies, numismatics, and papyrology, and to any language, ancient or modern.

The method is already in practical use. One example is the dating of an inscription recording an important decree from classical Athens, which historians had previously believed to have been written before 446/5 BC.

Ithaca, together with historians, revised this date to 424/3 BC.

Ithaca can now be tried online. Open the official website, type an ancient Greek inscription into the box, mark missing characters with dashes (-), and mark the characters you want predicted with question marks (?).

Each query can predict up to 10 question marks, which may be consecutive or scattered. After you submit the query, the predicted characters are shown in the text below, along with the inscription's attributed place and date of origin.
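The input convention can be checked before submitting a query. The helper below is a hypothetical client-side check, not part of the Ithaca website or its code; the sample string only illustrates the markers and is not a real inscription.

```python
MAX_PREDICTED = 10  # per-query limit on '?' markers stated in the article

def check_query(text):
    """Count the '-' and '?' markers and enforce the question-mark limit."""
    missing = text.count("-")
    predicted = text.count("?")
    if predicted > MAX_PREDICTED:
        raise ValueError(
            f"{predicted} characters marked '?', but the limit is {MAX_PREDICTED}"
        )
    return {"missing": missing, "predicted": predicted}

# '-' marks a missing character; '?' marks a character to be predicted.
query = "θεο? ε-οξεν τη? βουλη?"
print(check_query(query))
```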

About the authors

The research was carried out jointly by DeepMind, Ca' Foscari University of Venice, Harvard University, Athens University of Economics and Business, and several AI teams at Google.

The paper has two co-first authors. One of them, Yannis Assael, is a research scientist at DeepMind. He received both his master's degree and his Ph.D. from Oxford University and was named to the Forbes "30 Under 30" list of European scientists.

The other, Thea Sommerschield, is a historian. She is currently a fellow in the Department of Humanities at Ca' Foscari University of Venice and a fellow at Harvard University's Center for Hellenic Studies, and her research focuses on applying machine learning to the study of written culture in the ancient Mediterranean.

Paper: https://www.nature.com/articles/s41586-022-04448-z

Open Source Links: https://github.com/DeepMind/ithaca

Online trial: https://ithaca.DeepMind.com/?job=eyJyZXF1ZXN0SUQiOiJmYzUwNGY0NWNhZjJjZWMxZjIxZDA4YWVjNTdkMjEzMSIsImF0dHJpYnV0aW9uIjp0cnVlLCJyZXN0b3JhdGlvbiI6dHJ1ZX0%3D
