
Google Docs can now automatically generate text summaries!

Reports from the Heart of the Machine

Editors: Chen Ping, Du Wei

Convenient as it is, Google Docs' automatic summary generation is unfortunately only available to enterprise customers for now. We hope the feature reaches individual users soon.

Many of us deal with a large number of documents every day. When we receive a new one, we usually want a brief summary of its main points so we can grasp the contents as quickly as possible. Writing such a summary, however, is a challenging and time-consuming task.

To address this, Google announced that Google Docs can now automatically generate suggestions to help document writers create content summaries. This is achieved with a machine learning model that understands the text and produces a one-to-two-sentence natural-language description of it. The writer retains full control over the document: they can accept the model's suggestion as-is, edit it to better capture the document, or ignore it altogether.

Users can also use these summaries to understand and navigate documents at a higher level. While all users can add summaries, automatically generated suggestions are currently only available to Google Workspace business customers (Google Workspace is the suite of cloud productivity and collaboration tools that Google offers on a subscription basis). Building on grammar suggestions, Smart Compose, and autocorrect, Google sees this as another valuable application of research to improving written communication in the workplace.

As shown in the following illustration: when a summary suggestion is available, a blue summary icon appears in the upper-left corner. Document writers can then view, edit, or ignore the suggested summary.

Model details

Over the past five years, particularly with the introduction of the Transformer and Pegasus, machine learning has had a huge impact on natural language understanding (NLU) and natural language generation (NLG).

However, generating abstractive summaries requires solving both language understanding and language generation for long documents. The common approach combines NLU and NLG via sequence-to-sequence learning: an ML model is trained whose input is the document's tokens and whose output is the summary's tokens, and the neural network learns to map input tokens to output tokens. Early applications of the sequence-to-sequence paradigm used RNNs for both the encoder and the decoder.
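The token-to-token mapping described above can be illustrated with a toy greedy decoder. Everything here — the tiny vocabulary, the bag-of-words "encoder", and the scoring rule — is a made-up stand-in for a real neural model, kept only to show the shape of sequence-to-sequence decoding.

```python
# Toy sketch of sequence-to-sequence summarization: an "encoder" reads the
# input tokens and a "decoder" emits summary tokens one at a time.
# Vocabulary and scoring rule are illustrative stand-ins, not a real model.

VOCAB = ["<s>", "</s>", "docs", "adds", "auto", "summaries"]

def encode(document_tokens):
    # Stand-in encoder: a real model would produce contextual embeddings;
    # here we just return a count per vocabulary entry.
    return {tok: document_tokens.count(tok) for tok in VOCAB}

def score_next_token(encoding, generated):
    # Stand-in decoder step: prefer frequent input tokens not yet emitted.
    scores = {}
    for tok in VOCAB:
        if tok == "<s>" or tok in generated:
            scores[tok] = -1.0
        else:
            scores[tok] = float(encoding.get(tok, 0))
    return scores

def greedy_summarize(document_tokens, max_len=4):
    encoding = encode(document_tokens)
    generated = ["<s>"]
    for _ in range(max_len):
        scores = score_next_token(encoding, generated)
        best = max(scores, key=scores.get)
        if best == "</s>" or scores[best] <= 0:
            break  # end-of-summary token or nothing useful left to emit
        generated.append(best)
    return generated[1:]

doc = "docs docs auto auto summaries".split()
print(greedy_summarize(doc))  # → ['docs', 'auto', 'summaries']
```

A real decoder would score tokens with a neural network conditioned on the encoder output and the tokens generated so far, but the step-by-step emission loop is the same.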

The introduction of the Transformer provided a promising alternative to RNNs, because its self-attention mechanism better models long-range dependencies between input and output, which is critical for long documents. Still, these models require large amounts of manually labeled data to train adequately, so the Transformer alone was not enough to significantly advance the state of the art in document summarization.

Pegasus took this idea a step further: the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization introduced a pre-training objective customized for abstractive summarization. In Pegasus pre-training, also known as Gap Sentence Prediction (GSP), complete sentences in unlabeled news articles and web documents are masked out of the input, and the model must reconstruct them from the unmasked sentences. In particular, GSP uses different heuristics to try to mask the sentences most essential to the document, with the goal of making pre-training as close as possible to the summarization task. Pegasus achieved state-of-the-art results on a varied set of summarization datasets. However, many challenges remained in turning this research advance into a product.
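The GSP idea can be sketched as a data-construction step: pick the "important" sentences, replace them with a mask token, and use them as the reconstruction target. The importance heuristic below (pick the longest sentences) is a deliberately simplified stand-in for the heuristics described in the Pegasus paper, and the mask token name is made up.

```python
# Minimal sketch of Gap Sentence Prediction (GSP) pre-training data creation:
# mask out "important" sentences and use them as the reconstruction target.
# The importance heuristic (longest sentences) is a simplified stand-in.

MASK = "<mask_sent>"

def make_gsp_example(sentences, num_gaps=1):
    # Rank sentences by a toy importance score: length in characters.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: len(sentences[i]), reverse=True)
    gap_ids = set(ranked[:num_gaps])
    # Input: the document with gap sentences masked out.
    masked_input = [MASK if i in gap_ids else s
                    for i, s in enumerate(sentences)]
    # Target: the masked sentences, in document order.
    target = [sentences[i] for i in sorted(gap_ids)]
    return " ".join(masked_input), " ".join(target)

doc = [
    "Google Docs now suggests summaries.",
    "A machine learning model drafts one or two sentences describing the document.",
    "Writers can accept, edit, or ignore the suggestion.",
]
inp, tgt = make_gsp_example(doc, num_gaps=1)
```

Because the target is a full sentence central to the document, reconstructing it resembles writing a one-sentence summary — which is why this pre-training objective transfers well to the downstream task.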


The Pegasus architecture is a standard Transformer encoder-decoder.

Apply recent research advances to Google Docs

Data

A self-supervised pre-trained model has general language understanding and generation capabilities, but the subsequent fine-tuning phase is critical for adapting it to the application domain. Google fine-tuned early versions of the model on a corpus of documents whose manually written summaries matched typical use cases. However, early versions of this corpus were inconsistent and highly variable: they contained many types of documents and many styles of summary — academic abstracts, for example, are typically long and detailed, while executive summaries are short and punchy. This easily confused the model, because training on such a mix of documents and summaries made it hard to learn how the two relate.

Fortunately, one of the key findings of the Pegasus work (whose library for automatic summary generation Google has open-sourced) is that an effective pre-training phase reduces the amount of supervised data needed for fine-tuning. On some summarization benchmarks, a mere 1,000 fine-tuning examples were enough for Pegasus to rival Transformer baselines trained on 10,000+ supervised examples — suggesting one can focus on the quality of the data rather than its quantity.

Google therefore carefully cleaned and filtered the fine-tuning data to keep training examples that were more consistent and more representative of a coherent summary. Although this reduced the amount of training data, it produced a higher-quality model. As with recent work in areas such as dataset distillation, the lesson is that a smaller, high-quality dataset beats a larger, high-variance one.
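A filtering pass of this kind can be sketched as a simple predicate over (document, summary) pairs. The thresholds and heuristics below are hypothetical — Google has not published its actual filters — but they illustrate the idea of keeping only short, consistent, abstract-style summaries.

```python
# Illustrative sketch of filtering fine-tuning pairs for consistency.
# The thresholds and heuristics are hypothetical, not Google's actual
# filters: keep (document, summary) pairs whose summaries look like short
# one-to-two-sentence descriptions rather than long academic abstracts.

def is_consistent_example(document, summary,
                          max_summary_sentences=2,
                          max_compression=0.5):
    n_sentences = summary.count(".")  # crude sentence count
    short_enough = 1 <= n_sentences <= max_summary_sentences
    # Require the summary to be substantially shorter than the document.
    compressed = len(summary) <= max_compression * len(document)
    return short_enough and compressed

def filter_corpus(pairs):
    return [p for p in pairs if is_consistent_example(*p)]

pairs = [
    ("long document " * 20, "A short summary."),
    ("tiny doc.", "A summary nearly as long as the document itself."),
]
kept = filter_corpus(pairs)  # only the first pair survives
```

Real filters would likely also check fluency, topical overlap, and formatting, but the pattern — drop examples that do not match the target summary style, even at the cost of dataset size — is the same.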

Serving

Once a high-quality model had been trained, Google turned to the challenge of serving it in production. Although the Transformer encoder-decoder architecture is the mainstream way to train models for sequence-to-sequence tasks such as summarization, it can be inefficient and impractical to serve in real-world applications. The inefficiency comes mainly from the Transformer decoder, which generates the output summary token by token through autoregressive decoding. Decoding becomes noticeably slow for longer summaries, because at each step the decoder attends to all previously generated tokens. RNNs are a more efficient architecture for decoding, since they do not apply self-attention over previous tokens the way a Transformer does.
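The cost difference can be made concrete by counting per-step work: a Transformer decoder's step t attends over the t tokens generated so far (quadratic total work in output length), while an RNN decoder updates a fixed-size hidden state (linear total work). This is an abstract operation count, not a benchmark of either architecture.

```python
# Sketch of why autoregressive Transformer decoding slows down on long
# summaries: each step attends over every previously generated token, so
# total work grows quadratically in output length, while an RNN decoder
# does constant work per step. Counts are abstract "attention ops".

def transformer_decode_ops(summary_len):
    # Step t attends over the t tokens generated so far: 1 + 2 + ... + n.
    return sum(t for t in range(1, summary_len + 1))

def rnn_decode_ops(summary_len):
    # Each step updates a fixed-size hidden state: constant cost per token.
    return summary_len

for n in (10, 100):
    print(n, transformer_decode_ops(n), rnn_decode_ops(n))
# → 10 55 10
# → 100 5050 100
```

Going from a 10-token to a 100-token summary multiplies the RNN's decoding work by 10 but the Transformer decoder's by nearly 100, which is exactly the latency problem the hybrid architecture below addresses.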

Google used knowledge distillation (the process of transferring knowledge from a large model to a smaller, more efficient one) to distill the Pegasus model into a hybrid architecture with a Transformer encoder and an RNN decoder. To improve efficiency further, Google also reduced the number of RNN decoder layers. The resulting model has significantly better latency and memory footprint while remaining comparable in quality to the original. To further improve latency and the user experience, Google serves the summarization model on TPUs, which provide a significant speedup and allow a single machine to handle more requests.
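The core of knowledge distillation is training the small student to match the large teacher's softened output distribution rather than hard labels. Below is a minimal sketch of that objective with made-up logits; a real setup would compute this per decoding step over the vocabulary and combine it with the ordinary training loss.

```python
# Sketch of the knowledge-distillation objective: train a small student to
# match the softened (temperature > 1) output distribution of a large
# teacher. Pure-Python softmax/cross-entropy; the logits are made up.
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Cross-entropy between the softened teacher and student distributions;
    # lower means the student imitates the teacher more closely.
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

teacher = [3.0, 1.0, 0.2]          # teacher favors token 0
aligned_student = [2.9, 1.1, 0.1]  # similar preferences -> low loss
poor_student = [0.1, 0.2, 3.0]     # reversed preferences -> high loss
```

Raising the temperature exposes the teacher's relative preferences among non-top tokens ("dark knowledge"), which is what lets a much smaller decoder recover most of the teacher's quality.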

Ongoing challenges

While Google is excited about the progress so far, it continues to tackle the following challenges:

Document coverage: Because documents vary so widely, it is difficult to assemble a fine-tuning set that covers them all, and the same challenge exists at inference time. In addition, some documents that Google's users create — such as meeting notes, recipes, lesson plans, and resumes — are unsuitable for summarization or difficult to summarize.

Evaluation: An abstractive summary needs to capture the essence of the document while being fluent and grammatically correct. A given document may have many summaries that could all be considered correct, and different users may prefer different ones. This makes it hard to evaluate summaries with automatic metrics alone; user feedback and usage statistics are critical for Google to understand and continuously improve model quality.

Long documents: Long documents are the hardest for the model to summarize, because it is more difficult to capture all the main points in a single summary, and the memory footprint grows significantly during training and serving. Yet long documents may be where automatic summary generation is most useful, sparing document writers the most tedious work. Google hopes to apply the latest ML advances to better address this challenge.
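On the evaluation challenge above: the standard automatic metrics, such as ROUGE, score n-gram overlap against a fixed reference. The toy ROUGE-1 recall below (real ROUGE implementations add stemming, multiple references, and F-measures) shows why a perfectly valid summary phrased differently can score poorly, motivating the reliance on user feedback.

```python
# Minimal ROUGE-1-recall sketch: fraction of reference unigrams that also
# appear in the candidate summary. Real ROUGE adds stemming and
# multi-reference handling; this toy version shows why a single fixed
# reference can penalize an equally valid summary phrased differently.
from collections import Counter

def rouge1_recall(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0

reference = "google docs suggests document summaries"
print(rouge1_recall("google docs suggests document summaries", reference))  # → 1.0
# A valid paraphrase with little word overlap scores far lower:
print(rouge1_recall("docs in google now propose short abstracts", reference))  # → 0.4
```

Two summaries a human would rate equally good can thus differ wildly under the metric — one reason usage statistics matter more than leaderboard numbers in production.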

https://ai.googleblog.com/2022/03/auto-generated-summaries-in-google-docs.html
