
Google Docs can now automatically generate text summaries!

Reports from the Heart of the Machine

Editors: Chen Ping, Du Wei

Convenient as it is, Google Docs' automatic summary generation is unfortunately only available to enterprise customers for now. We hope the feature reaches individual users soon.

Many of us deal with a large number of documents every day. When we receive a new one, we usually want a brief summary of its main points so we can grasp the contents as quickly as possible. Writing such a summary, however, is a challenging and time-consuming task.

To address this, Google announced that Google Docs can now automatically generate suggestions to help document writers create content summaries. This is achieved with a machine learning model that understands the text and produces a one-to-two-sentence natural-language description of it. The writer retains full control over the document: they can accept the model's suggestion as-is, edit it to better capture the document, or ignore it altogether.

Users can also use these summaries to understand and navigate documents at a higher level. While all users can add summaries, automatically generated suggestions are currently only available to Google Workspace business customers (Google Workspace is the suite of cloud productivity and collaboration tools that Google offers on a subscription basis). Building on grammar suggestions, Smart Compose, and autocorrect, Google sees this as another valuable application of research to improving written communication in the workplace.

As shown in the following illustration: when a summary suggestion is available, a blue summary icon appears in the upper-left corner. Document writers can then view, edit, or ignore the suggested summary.

Model details

Over the past five years, particularly with the introduction of the Transformer and Pegasus, machine learning has had a huge impact on natural language understanding (NLU) and natural language generation (NLG).

However, generating abstractive summaries requires solving both language understanding and language generation for long documents. The common approach combines NLU and NLG via sequence-to-sequence learning: an ML model is trained whose input is the document's tokens and whose output is the summary's tokens, and the neural network learns to map input tokens to output tokens. Early applications of the sequence-to-sequence paradigm used RNNs for both the encoder and the decoder.
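The token-to-token mapping described above can be illustrated with a toy greedy decoder. Everything here — the tiny vocabulary, the bag-of-words "encoder", and the scoring rule — is a made-up stand-in for a real neural model, kept only to show the shape of sequence-to-sequence decoding.

```python
# Toy sketch of sequence-to-sequence summarization: an "encoder" reads the
# input tokens and a "decoder" emits summary tokens one at a time.
# Vocabulary and scoring rule are illustrative stand-ins, not a real model.

VOCAB = ["<s>", "</s>", "docs", "adds", "auto", "summaries"]

def encode(document_tokens):
    # Stand-in encoder: a real model would produce contextual embeddings;
    # here we just return a count per vocabulary entry.
    return {tok: document_tokens.count(tok) for tok in VOCAB}

def score_next_token(encoding, generated):
    # Stand-in decoder step: prefer frequent input tokens not yet emitted.
    scores = {}
    for tok in VOCAB:
        if tok == "<s>" or tok in generated:
            scores[tok] = -1.0
        else:
            scores[tok] = float(encoding.get(tok, 0))
    return scores

def greedy_summarize(document_tokens, max_len=4):
    encoding = encode(document_tokens)
    generated = ["<s>"]
    for _ in range(max_len):
        scores = score_next_token(encoding, generated)
        best = max(scores, key=scores.get)
        if best == "</s>" or scores[best] <= 0:
            break  # end-of-summary token or nothing useful left to emit
        generated.append(best)
    return generated[1:]

doc = "docs docs auto auto summaries".split()
print(greedy_summarize(doc))  # → ['docs', 'auto', 'summaries']
```

A real decoder would score tokens with a neural network conditioned on the encoder output and the tokens generated so far, but the step-by-step emission loop is the same.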

The introduction of the Transformer provided a promising alternative to RNNs, because its self-attention mechanism better models long-range dependencies between input and output, which is critical for long documents. Still, these models require large amounts of manually labeled data to train adequately, so the Transformer alone was not enough to significantly advance the state of the art in document summarization.

Pegasus took this idea a step further: the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization introduced a pre-training objective customized for abstractive summarization. In Pegasus pre-training, also known as Gap Sentence Prediction (GSP), complete sentences in unlabeled news articles and web documents are masked out of the input, and the model must reconstruct them from the unmasked sentences. In particular, GSP uses different heuristics to try to mask the sentences most essential to the document, with the goal of making pre-training as close as possible to the summarization task. Pegasus achieved state-of-the-art results on a varied set of summarization datasets. However, many challenges remained in turning this research advance into a product.
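The GSP idea can be sketched as a data-construction step: pick the "important" sentences, replace them with a mask token, and use them as the reconstruction target. The importance heuristic below (pick the longest sentences) is a deliberately simplified stand-in for the heuristics described in the Pegasus paper, and the mask token name is made up.

```python
# Minimal sketch of Gap Sentence Prediction (GSP) pre-training data creation:
# mask out "important" sentences and use them as the reconstruction target.
# The importance heuristic (longest sentences) is a simplified stand-in.

MASK = "<mask_sent>"

def make_gsp_example(sentences, num_gaps=1):
    # Rank sentences by a toy importance score: length in characters.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: len(sentences[i]), reverse=True)
    gap_ids = set(ranked[:num_gaps])
    # Input: the document with gap sentences masked out.
    masked_input = [MASK if i in gap_ids else s
                    for i, s in enumerate(sentences)]
    # Target: the masked sentences, in document order.
    target = [sentences[i] for i in sorted(gap_ids)]
    return " ".join(masked_input), " ".join(target)

doc = [
    "Google Docs now suggests summaries.",
    "A machine learning model drafts one or two sentences describing the document.",
    "Writers can accept, edit, or ignore the suggestion.",
]
inp, tgt = make_gsp_example(doc, num_gaps=1)
```

Because the target is a full sentence central to the document, reconstructing it resembles writing a one-sentence summary — which is why this pre-training objective transfers well to the downstream task.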


The Pegasus architecture is a standard Transformer encoder-decoder.

Apply recent research advances to Google Docs

Data

A self-supervised pre-trained model has general language understanding and generation capabilities, but the subsequent fine-tuning phase is critical for adapting it to the application domain. Google fine-tuned early versions of the model on a corpus of documents whose manually written summaries matched typical use cases. However, early versions of this corpus were inconsistent and highly variable: they contained many types of documents and many styles of summary — academic abstracts, for example, are typically long and detailed, while executive summaries are short and punchy. This easily confused the model, because training on such a mix of documents and summaries made it hard to learn how the two relate.

Fortunately, one of the key findings of the Pegasus work (whose library for automatic summary generation Google has open-sourced) is that an effective pre-training phase reduces the amount of supervised data needed for fine-tuning. On some summarization benchmarks, a mere 1,000 fine-tuning examples were enough for Pegasus to rival Transformer baselines trained on 10,000+ supervised examples — suggesting one can focus on the quality of the data rather than its quantity.

Google therefore carefully cleaned and filtered the fine-tuning data to keep training examples that were more consistent and more representative of a coherent summary. Although this reduced the amount of training data, it produced a higher-quality model. As with recent work in areas such as dataset distillation, the lesson is that a smaller, high-quality dataset beats a larger, high-variance one.
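A filtering pass of this kind can be sketched as a simple predicate over (document, summary) pairs. The thresholds and heuristics below are hypothetical — Google has not published its actual filters — but they illustrate the idea of keeping only short, consistent, abstract-style summaries.

```python
# Illustrative sketch of filtering fine-tuning pairs for consistency.
# The thresholds and heuristics are hypothetical, not Google's actual
# filters: keep (document, summary) pairs whose summaries look like short
# one-to-two-sentence descriptions rather than long academic abstracts.

def is_consistent_example(document, summary,
                          max_summary_sentences=2,
                          max_compression=0.5):
    n_sentences = summary.count(".")  # crude sentence count
    short_enough = 1 <= n_sentences <= max_summary_sentences
    # Require the summary to be substantially shorter than the document.
    compressed = len(summary) <= max_compression * len(document)
    return short_enough and compressed

def filter_corpus(pairs):
    return [p for p in pairs if is_consistent_example(*p)]

pairs = [
    ("long document " * 20, "A short summary."),
    ("tiny doc.", "A summary nearly as long as the document itself."),
]
kept = filter_corpus(pairs)  # only the first pair survives
```

Real filters would likely also check fluency, topical overlap, and formatting, but the pattern — drop examples that do not match the target summary style, even at the cost of dataset size — is the same.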

Serving

Once a high-quality model had been trained, Google turned to the challenge of serving it in production. Although the Transformer encoder-decoder architecture is the mainstream way to train models for sequence-to-sequence tasks such as summarization, it can be inefficient and impractical to serve in real-world applications. The inefficiency comes mainly from the Transformer decoder, which generates the output summary token by token through autoregressive decoding. Decoding becomes noticeably slow for longer summaries, because at each step the decoder attends to all previously generated tokens. RNNs are a more efficient architecture for decoding, since they do not apply self-attention over previous tokens the way a Transformer does.
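The cost difference can be made concrete by counting per-step work: a Transformer decoder's step t attends over the t tokens generated so far (quadratic total work in output length), while an RNN decoder updates a fixed-size hidden state (linear total work). This is an abstract operation count, not a benchmark of either architecture.

```python
# Sketch of why autoregressive Transformer decoding slows down on long
# summaries: each step attends over every previously generated token, so
# total work grows quadratically in output length, while an RNN decoder
# does constant work per step. Counts are abstract "attention ops".

def transformer_decode_ops(summary_len):
    # Step t attends over the t tokens generated so far: 1 + 2 + ... + n.
    return sum(t for t in range(1, summary_len + 1))

def rnn_decode_ops(summary_len):
    # Each step updates a fixed-size hidden state: constant cost per token.
    return summary_len

for n in (10, 100):
    print(n, transformer_decode_ops(n), rnn_decode_ops(n))
# → 10 55 10
# → 100 5050 100
```

Going from a 10-token to a 100-token summary multiplies the RNN's decoding work by 10 but the Transformer decoder's by nearly 100, which is exactly the latency problem the hybrid architecture below addresses.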

Google used knowledge distillation (the process of transferring knowledge from a large model to a smaller, more efficient one) to distill the Pegasus model into a hybrid architecture with a Transformer encoder and an RNN decoder. To improve efficiency further, Google also reduced the number of RNN decoder layers. The resulting model has significantly better latency and memory footprint while remaining comparable in quality to the original. To further improve latency and the user experience, Google serves the summarization model on TPUs, which provide a significant speedup and allow a single machine to handle more requests.
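The core of knowledge distillation is training the small student to match the large teacher's softened output distribution rather than hard labels. Below is a minimal sketch of that objective with made-up logits; a real setup would compute this per decoding step over the vocabulary and combine it with the ordinary training loss.

```python
# Sketch of the knowledge-distillation objective: train a small student to
# match the softened (temperature > 1) output distribution of a large
# teacher. Pure-Python softmax/cross-entropy; the logits are made up.
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Cross-entropy between the softened teacher and student distributions;
    # lower means the student imitates the teacher more closely.
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

teacher = [3.0, 1.0, 0.2]          # teacher favors token 0
aligned_student = [2.9, 1.1, 0.1]  # similar preferences -> low loss
poor_student = [0.1, 0.2, 3.0]     # reversed preferences -> high loss
```

Raising the temperature exposes the teacher's relative preferences among non-top tokens ("dark knowledge"), which is what lets a much smaller decoder recover most of the teacher's quality.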

Ongoing challenges

While Google is excited about the progress so far, it continues to tackle the following challenges:

Document coverage: Because documents vary so widely, it is difficult to assemble a fine-tuning set that covers them all, and the same challenge exists at inference time. In addition, some documents that Google's users create — such as meeting notes, recipes, lesson plans, and resumes — are unsuitable for summarization or difficult to summarize.

Evaluation: An abstractive summary needs to capture the essence of the document while being fluent and grammatically correct. A given document may have many summaries that could all be considered correct, and different users may prefer different ones. This makes it hard to evaluate summaries with automatic metrics alone; user feedback and usage statistics are critical for Google to understand and continuously improve model quality.

Long documents: Long documents are the hardest for the model to summarize, because it is more difficult to capture all the main points in a single summary, and the memory footprint grows significantly during training and serving. Yet long documents may be where automatic summary generation is most useful, sparing document writers the most tedious work. Google hopes to apply the latest ML advances to better address this challenge.
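On the evaluation challenge above: the standard automatic metrics, such as ROUGE, score n-gram overlap against a fixed reference. The toy ROUGE-1 recall below (real ROUGE implementations add stemming, multiple references, and F-measures) shows why a perfectly valid summary phrased differently can score poorly, motivating the reliance on user feedback.

```python
# Minimal ROUGE-1-recall sketch: fraction of reference unigrams that also
# appear in the candidate summary. Real ROUGE adds stemming and
# multi-reference handling; this toy version shows why a single fixed
# reference can penalize an equally valid summary phrased differently.
from collections import Counter

def rouge1_recall(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0

reference = "google docs suggests document summaries"
print(rouge1_recall("google docs suggests document summaries", reference))  # → 1.0
# A valid paraphrase with little word overlap scores far lower:
print(rouge1_recall("docs in google now propose short abstracts", reference))  # → 0.4
```

Two summaries a human would rate equally good can thus differ wildly under the metric — one reason usage statistics matter more than leaderboard numbers in production.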

https://ai.googleblog.com/2022/03/auto-generated-summaries-in-google-docs.html
