Retrieval Augmentation Technology of Knowledge Graph | TF131 review

author:CCFvoice

On May 7, CCF TF held its 131st session, "Knowledge Graph Retrieval and Augmentation Technology," as an online conference. The event was planned and presented by the CCF TF Knowledge Graph SIG. Research leaders from the 360 Artificial Intelligence Research Institute, Tencent AI Lab, Alibaba Tongyi Lab, NetEase Youdao (QAnything) and other Internet companies were invited to share frontier developments in retrieval augmentation, the opportunities and challenges of combining knowledge graphs with retrieval augmentation, and typical cases and best practices.

Expert talks from CCF TF events are collected in the [TF Album] of the CCF Digital Library, where the recordings can be watched. The talks from this session will be added in the near future.

The session was chaired by Wang Haofen, Chairman of the CCF TF Knowledge Graph SIG. In the opening remarks, Wang Haofen introduced the organizational structure, purpose and previous activities of CCF TF.

"Document Understanding and Knowledge Base Construction Practice in RAG Implementation"

Liu Huanyong, a senior algorithm expert at the 360 Artificial Intelligence Research Institute, described how RAG has been deployed at 360 and how an enterprise-level knowledge base is built. The talk first introduced the characteristics of knowledge Q&A tasks and the application scenarios of document intelligence, and analyzed the standard RAG Q&A pipeline. It then examined the difficulties and bottlenecks at each stage of that pipeline: complex and varied page layouts, complex content, diverse document organization, the many factors that affect recall quality, and the difficulty of constructing supervised training samples. Finally, it presented how the knowledge base system is built, covering extraction of the document hierarchy, databases built per hierarchy level, tags, the KOSMOS multimodal document model, document-specific layout formats, and extraction of tables, formulas and charts.
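
The idea of extracting document levels and indexing by level can be illustrated with a small sketch. It is not the method presented in the talk: the numbered-heading convention ("1 Title", "1.1 Title") and the names split_by_level and index_by_level are assumptions made for illustration, and a production parser would rely on layout analysis rather than a regular expression.

```python
import re
from collections import defaultdict

# Assumed heading convention: numbered headings such as "1 Title" or "2.1 Title".
# Any line that starts with such a number is treated as a heading.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")

def split_by_level(text):
    """Split text into chunks, each tagged with its heading path and depth."""
    chunks = []
    path = []   # stack of (depth, title) for the enclosing headings
    body = []   # body lines collected under the current heading

    def flush():
        # Emit the collected body as one chunk under the current heading.
        if path and body:
            chunks.append({
                "level": path[-1][0],
                "path": " > ".join(title for _, title in path),
                "text": " ".join(body),
            })
        body.clear()

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            flush()
            depth = match.group(1).count(".") + 1
            # Pop headings at the same or a deeper level before pushing this one.
            while path and path[-1][0] >= depth:
                path.pop()
            path.append((depth, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks

def index_by_level(chunks):
    """Group chunks by heading depth, one bucket per level."""
    by_level = defaultdict(list)
    for chunk in chunks:
        by_level[chunk["level"]].append(chunk)
    return dict(by_level)

doc = """1 Overview
RAG answers questions from retrieved passages.
1.1 Parsing
Layout analysis recovers titles, tables and formulas.
2 Evaluation
Recall is measured per chunk."""

chunks = split_by_level(doc)
levels = index_by_level(chunks)
for chunk in chunks:
    print(chunk["level"], chunk["path"])
```

Keeping the heading path with each chunk lets a retriever return a passage together with the section it came from, which is one plausible reason to index by level.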

"Retrieval-Augmented Generation? Retrieval Is Generation!"

Cai Deng, a senior researcher at Tencent AI Lab, shared his research on retrieval-augmented models and introduced a paradigm that fuses retrieval and generation. He first reviewed the mechanism of current generative models and then explained the principle of the CoG method, which merges retrieval and generation into one process: while generating from left to right, the model retrieves relevant phrases from a memory database instead of predicting the next token as mainstream generative models do. CoG was validated on multiple downstream tasks, where it showed better accuracy, interpretability and scalability than the baselines.
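
To make the contrast with token-by-token prediction concrete, here is a deliberately simplified toy. It is not the CoG model: the phrase memory is a plain dictionary built from two example sentences, and the scoring rule (prefer the longest stored phrase) is a stand-in assumption for the learned retrieval the talk describes. The names build_phrase_index and generate are hypothetical.

```python
from collections import defaultdict

def build_phrase_index(corpus, max_len=3):
    """Map each word to the phrases (1..max_len words) that follow it in the corpus."""
    index = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - 1):
            for n in range(1, max_len + 1):
                phrase = words[i + 1 : i + 1 + n]
                if len(phrase) == n:  # skip phrases cut short by the sentence end
                    index[words[i]].append(tuple(phrase))
    return index

def generate(prefix, index, max_steps=4):
    """Extend the prefix left to right, appending one retrieved phrase per step."""
    out = prefix.split()
    for _ in range(max_steps):
        candidates = index.get(out[-1])
        if not candidates:
            break
        # Stand-in scoring: longest phrase wins; ties go to the
        # lexicographically largest phrase.
        best = max(candidates, key=lambda p: (len(p), p))
        out.extend(best)
    return " ".join(out)

memory = [
    "retrieval augments generation with external knowledge",
    "generation copies phrases from memory",
]
index = build_phrase_index(memory)
result = generate("retrieval", index)
print(result)
```

In this toy the output is assembled in two retrieval steps rather than five single-token steps, and each appended phrase can be traced back to a sentence in the memory, which hints at where the interpretability claim comes from.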

"GTE-Embedding/Ranking: Unified Text Representation and Ranking Models"

Yanzhao Zhang, an algorithm engineer at Alibaba Tongyi Lab, shared his work on unified text representation and ranking models. The talk traced the development of embedding models and focused on the training process of the GTE-Embedding model. First, the pre-training stage reuses LLM training and optimization techniques to build an encoder-only base model that supports multiple languages and long text. Second, weakly supervised pre-training improves the base model's text representation ability. Third, the model is trained further on high-quality supervised data. He then introduced the technical details of the GTE-Rerank model, including its training process and loss function design, and closed with a comparison of RAG and long-context LLMs.
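
The summary does not spell out the loss used in these training stages. One common choice for training text-embedding models on (query, document) pairs is an in-batch contrastive (InfoNCE) loss, and the sketch below assumes that objective purely for illustration. The function names, the temperature value and the two-dimensional vectors are all made up for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(queries, docs, temperature=0.05):
    """Mean in-batch contrastive loss; docs[i] is the positive for queries[i].

    Every other document in the batch acts as a negative for that query.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [cosine(query, doc) / temperature for doc in docs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - logits[i]
    return total / len(queries)

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(queries, [[1.0, 0.0], [0.0, 1.0]])   # positives match their queries
shuffled = info_nce(queries, [[0.0, 1.0], [1.0, 0.0]])  # positives are swapped
print(aligned < shuffled)
```

The loss is low when each query is closest to its own document and high when the pairing is wrong, which is the signal that pulls matching texts together in embedding space.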

"Youdao QAnything: Sharing Deployment Experience"

Lin Hui, technical director at NetEase Youdao, presented QAnything, NetEase Youdao's open-source RAG engine, and the company's experience deploying RAG. The talk first reviewed NetEase Youdao's accumulated work in OCR and NMT and the evolution of QAnything through rapid iteration of technologies and products: from image translation and document translation to large-model-based input and understanding tasks, and from document Q&A to a speech assistant, Youdao Speed Reading, an AI college planner and Xiao P Teacher. Lin Hui then covered QAnything's key modules (document parsing, Embedding/Rerank, LLM and VectorDB) and its main pipeline stages (query understanding, retrieval, relevance ranking and LLM generation). Finally, the talk turned to deployment scenarios for RAG and analyzed several key questions, such as RAG versus fine-tuning and RAG versus long-context language models.
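
The pipeline stages named above (query understanding, retrieval, relevance ranking, LLM generation) can be sketched end to end with toy components. None of this is QAnything's code or API: the bag-of-words "embedding", the term-coverage reranker and all function names are assumptions chosen so the example runs with the standard library only, and the final step stops at building the prompt rather than calling a model.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Minimal query/passage normalization: lowercase and keep word characters."""
    return re.findall(r"\w+", text.lower())

def embed(text):
    """Toy bag-of-words vector; a real system would use a trained embedding model."""
    return Counter(tokenize(text))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, passages, k=2):
    """First stage: rank all passages by vector similarity and keep the top k."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def rerank(query, candidates, k=1):
    """Second stage (toy): score by the fraction of query terms a passage covers."""
    terms = set(tokenize(query))
    def score(passage):
        return len(terms & set(tokenize(passage))) / len(terms)
    return sorted(candidates, key=score, reverse=True)[:k]

def build_prompt(query, context):
    """Assemble the text that would be sent to the LLM in the generation step."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context.\nContext:\n{joined}\nQuestion: {query}"

passages = [
    "The parser converts PDF and Word files into text chunks.",
    "The reranker reorders retrieved chunks by relevance to the query.",
    "The vector database stores one embedding per chunk.",
]
query = "How does the reranker order chunks?"
top = rerank(query, retrieve(query, passages))
prompt = build_prompt(query, top)
print(prompt)
```

Splitting retrieval into a cheap first stage and a more careful second stage is the usual motivation for pairing an embedding model with a reranker: the first narrows many passages to a few, the second spends more effort ordering those few.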

In the interactive session, Liu Huanyong and Lin Hui analyzed and summarized current open-source RAG frameworks. They pointed out that these frameworks currently share more than they differ: they look alike in the early stage of development and will differentiate more later. Yanzhao Zhang answered a question on whether a large language model can serve as an embedding model while retaining its generative ability. Finally, the experts discussed the balance between attaching knowledge to large models externally and internalizing it in the model.

In his closing remarks, Wang Haofen said that in the era of large models the knowledge graph has entered a more generalized research stage that is no longer limited to traditional triples, and that the management and use of knowledge remains an important research topic.

Upcoming Events

Session  Date      SIG                    Topic                                                                               Format
TF132    May 16th  Architecture           Cloud-native architecture in the AI era                                             Online
TF133    May 23rd  Intelligent front-end  The front end of the intelligent era: new productivity and new experience           Online
TF134    June 2nd  Smart manufacturing    Discussion on the application scenarios of large models in industrial intelligence  Online

About CCF TF

Founded in June 2017, CCF TF (Tech Frontier) aims to provide a top-level communication platform for engineers, to better serve computer professionals in industry and support their career development, and, by building this platform, to establish regular cooperation and promote technical exchange among enterprises and between academia and industry. Twelve SIGs (Special Interest Groups) have been established so far: knowledge graph, data science, intelligent manufacturing, architecture, security, intelligent equipment and interaction, digital transformation and enterprise architecture, algorithm and AI, intelligent front-end, engineer culture, R&D efficiency, and quality engineering. Together they share a wide range of frontier technical content.

