Real-Time Tracking of Research Trends丨Microsoft and Peking University Propose MusicAgent: Selected New Papers for October 19

Author: AMiner scientific and technological intelligence mining

As a researcher, you need to search and browse a large volume of academic literature every day to keep up with the latest scientific and technological progress and research results. Traditional search and reading methods can no longer keep pace with that need.

AMiner AI is a literature tool that integrates search, reading, and knowledge Q&A. It helps you search and read papers more efficiently and follow the latest research trends in your field, making research work easier.

Using its frontier-trends subscription feature, AMiner selects the day's hot new arXiv papers and compiles them into a digest, so that readers can quickly catch up on the latest developments.

If you want an in-depth conversation about a paper, copy the paper's link into your browser, or go directly to the AMiner AI page: https://www.aminer.cn/chat/g/explain

List of Featured New Papers for October 19, 2023:

1. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

The paper introduces Self-RAG, a framework that improves the quality and factuality of language models through retrieval and self-reflection. Large language models (LLMs) often produce responses containing factual errors because they rely solely on their parametric knowledge. Retrieval-Augmented Generation (RAG) is an ad hoc approach that mitigates this problem by augmenting LLMs with retrieved knowledge. However, indiscriminately retrieving and incorporating a fixed number of passages, regardless of whether retrieval is necessary, reduces the flexibility of the LLM or leads to unhelpful responses. Self-RAG trains a single arbitrary LLM to retrieve passages adaptively, on demand, and to generate and reflect on both the retrieved passages and its own generations using special tokens called reflection tokens. Generating reflection tokens makes the LLM controllable at inference time, so its behavior can be adapted to diverse task requirements. Experimental results show that Self-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on open-domain question answering, reasoning, and fact-verification tasks, and it shows significant gains in factuality and citation accuracy for long-form generation relative to these models.

https://www.aminer.cn/pub/65309159939a5f4082843d1b?f=toutiao
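
To make the adaptive-retrieval idea in the Self-RAG summary more concrete, here is a minimal sketch of the control flow only. It is not the authors' implementation: Self-RAG trains the LLM itself to emit reflection tokens, whereas this toy replaces those tokens with hand-written rules. The corpus, the function names (`wants_retrieval`, `is_relevant`, `respond`) and the overlap threshold are all assumptions made for illustration.

```python
import re
from typing import List, Optional, Set, Tuple

# Toy passage store standing in for a real retrieval index (hypothetical data).
CORPUS: List[str] = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
]


def tokens(text: str) -> Set[str]:
    """Lowercase word set used for crude overlap scoring."""
    return set(re.findall(r"[a-z]+", text.lower()))


def wants_retrieval(query: str) -> bool:
    """Stand-in for the model emitting a 'retrieve' reflection token.
    Assumption: anything phrased as a question needs evidence."""
    return query.rstrip().endswith("?")


def retrieve(query: str) -> str:
    """Return the passage with the largest word overlap with the query."""
    return max(CORPUS, key=lambda p: len(tokens(query) & tokens(p)))


def is_relevant(query: str, passage: str, min_overlap: int = 3) -> bool:
    """Stand-in for a critique token that judges the retrieved passage."""
    return len(tokens(query) & tokens(passage)) >= min_overlap


def respond(query: str) -> Tuple[str, Optional[str]]:
    """Return (decision label, evidence passage or None)."""
    if not wants_retrieval(query):
        return "no-retrieval", None          # skip retrieval entirely
    passage = retrieve(query)
    if not is_relevant(query, passage):
        return "retrieved-but-rejected", None  # critique step discards it
    return "retrieved-and-used", passage
```

The point of the sketch is the branching: retrieval happens only when requested, and a retrieved passage is used only if it survives a critique step, in contrast to always prepending a fixed number of passages.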

2. Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

The paper introduces Progressive3D, a general framework for generating 3D content from text prompts with complex semantics. Thanks to advances in image diffusion models and optimization strategies, existing text-to-3D methods already achieve impressive results. However, they often fail to produce correct 3D content when a prompt has complex semantics, for example when it describes multiple interacting objects with different attributes. Progressive3D decomposes the whole generation process into a series of local, progressive editing steps to create accurate 3D content, and it restricts content changes at each step to the regions specified by user-defined region prompts. In addition, the authors propose an overlapped semantic component suppression technique that encourages the optimization process to focus on the semantic differences between prompts. Experimental results show that Progressive3D generates accurate 3D content for prompts with complex semantics and applies to a variety of text-to-3D methods driven by different 3D representations.

https://www.aminer.cn/pub/65309159939a5f4082843e31?f=toutiao
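
The region-constrained, step-by-step editing described in the Progressive3D summary can be illustrated with a deliberately simplified sketch. The assumptions here are substantial: the scene is a list of labelled 3D points, each region prompt is an axis-aligned box, and the constraint is a hard mask. The summary does not say how Progressive3D enforces its constraint, so this shows only the idea of "change content inside the region, leave the rest untouched". All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]            # (min corner, max corner), axis-aligned
Scene = List[Tuple[Point, str]]      # (position, attribute label) pairs
Step = Tuple[Box, Callable[[str], str]]


def inside(p: Point, box: Box) -> bool:
    """True if point p lies within the box on every axis."""
    lo, hi = box
    return all(l <= c <= h for c, l, h in zip(p, lo, hi))


def edit_in_region(scene: Scene, box: Box, edit: Callable[[str], str]) -> Scene:
    """Apply `edit` only to points inside `box`; others are copied unchanged."""
    return [(p, edit(a) if inside(p, box) else a) for p, a in scene]


def progressive_edit(scene: Scene, steps: Sequence[Step]) -> Scene:
    """Run a series of local edits, one region at a time."""
    for box, edit in steps:
        scene = edit_in_region(scene, box, edit)
    return scene
```

For example, a prompt about two objects could be handled as two steps, each with its own box, so that the second edit cannot disturb what the first one produced.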

3. MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

The paper introduces MusicAgent, an AI agent for music understanding and generation. Music processing covers many tasks, including generation tasks (such as timbre synthesis) and understanding tasks (such as music classification). Because music data representations and model applicability vary greatly across tasks, it is very difficult for developers and enthusiasts to master all of them. A system is therefore needed to organize and integrate these tasks, helping practitioners automatically analyze their needs and call the right tools to meet them. Inspired by the recent success of large language models (LLMs) in task automation, the authors developed MusicAgent, which integrates many music-related tools with an autonomous workflow to address user needs. Specifically, it consists of 1) a toolset that collects tools from a variety of sources, including Hugging Face, GitHub, and Web APIs; and 2) an autonomous workflow powered by an LLM (e.g. ChatGPT) that organizes these tools, automatically breaks user requests down into multiple subtasks, and invokes the corresponding music tools. The main goal of the system is to free users from the complexity of AI music tools so that they can focus on the creative side. By letting users combine tools freely and easily, the system provides a seamless and rich music experience.

https://www.aminer.cn/pub/65309159939a5f4082843ede?f=toutiao
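
The two parts of MusicAgent named in the summary, a toolset and a workflow that splits a request into subtasks and calls the matching tool, can be sketched as a registry plus a planner. This is an assumption-laden toy, not MusicAgent's code: the planner here is plain keyword matching where the real system uses an LLM, and the tool names, functions and return strings are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple


def classify_music(audio: str) -> str:
    """Placeholder for an understanding tool (e.g. a genre classifier)."""
    return f"music-classification({audio})"


def synthesize_timbre(audio: str) -> str:
    """Placeholder for a generation tool (e.g. timbre synthesis)."""
    return f"timbre-synthesis({audio})"


# Toolset: task name -> callable. A real toolset would wrap models or APIs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "music-classification": classify_music,
    "timbre-synthesis": synthesize_timbre,
}

# Keyword rules standing in for the LLM planner (assumption for illustration).
RULES: List[Tuple[str, str]] = [
    ("classify", "music-classification"),
    ("synthesize", "timbre-synthesis"),
]


def plan(request: str) -> List[str]:
    """Break a free-text request into an ordered list of known subtasks."""
    text = request.lower()
    return [task for keyword, task in RULES if keyword in text]


def run(request: str, audio: str) -> List[str]:
    """Plan the request, then invoke the tool registered for each subtask."""
    return [TOOLS[task](audio) for task in plan(request)]
```

Keeping the registry separate from the planner is what lets new tools be added without touching the planning logic, which matches the summary's description of collecting tools from several sources.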

4. Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs

This paper introduces Multi-view Contrastive Learning for knowledge graph Entity Typing (MCLET), a method for inferring the plausible types of entities in a knowledge graph. Existing knowledge graph entity typing methods focus mainly on how the neighbors and types around an entity are encoded into its representation, but ignore the semantic knowledge provided by the way types can be clustered together. MCLET consists of three modules: i) a multi-view generation and encoder module that encodes structured information from the entity-type, entity-cluster, and cluster-type views; ii) a cross-view contrastive learning module that encourages the different views to collaborate in improving view-specific representations of entities and types; and iii) an entity typing prediction module that combines multi-head attention with a mixture-of-experts strategy to infer missing entity types. Experimental results show that MCLET performs strongly compared with state-of-the-art methods.

https://www.aminer.cn/pub/65309159939a5f4082843f13?f=toutiao
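
As a rough illustration of what cross-view contrastive learning optimizes, the sketch below computes an InfoNCE-style loss between two views of the same entities. Whether MCLET uses exactly this loss is an assumption; the summary only says that a contrastive module makes the views collaborate. The function name `info_nce`, the temperature value and the toy embeddings are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product used as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(view_a: List[Vector], view_b: List[Vector],
             temperature: float = 0.5) -> float:
    """Average InfoNCE loss where row i of view_a should match row i of view_b.

    For each anchor, the same entity in the other view is the positive and
    every other entity in that view is a negative.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [dot(anchor, other) / temperature for other in view_b]
        log_denominator = math.log(sum(math.exp(z) for z in logits))
        total += log_denominator - logits[i]   # equals -log softmax at index i
    return total / len(view_a)
```

The loss is small when each entity's embedding in one view is closest to its own embedding in the other view, and it grows when the two views disagree about which rows belong together.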

5. A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge

This paper provides a comprehensive survey of vector databases, covering storage and retrieval techniques as well as open challenges. Vector databases store high-dimensional data that traditional database management systems cannot characterize. Although few articles describe existing or new vector database architectures, the approximate nearest neighbor search (ANNS) problem underlying vector databases has been studied for a long time, and a considerable body of related algorithmic work exists in the literature. The survey reviews the relevant algorithms to give readers a general understanding of this thriving research area. Its framework categorizes the studies by their approach to the ANNS problem: hash-based, tree-based, graph-based, and quantization-based. The authors then outline the current challenges facing vector databases and, finally, sketch how vector databases can be combined with large language models to open up new possibilities.

https://www.aminer.cn/pub/65309159939a5f4082843ddf?f=toutiao
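
To show what the hash-based family of ANNS methods looks like next to exact search, here is a small sketch that pairs a brute-force scan with random-hyperplane locality-sensitive hashing. It is one textbook technique chosen for illustration, not a method taken from the survey, and the class and parameter names (`LSHIndex`, `n_bits`, `seed`) are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_nn(query: Vector, vectors: List[Vector]) -> int:
    """Exact search: scan every vector and return the closest index."""
    return min(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))


class LSHIndex:
    """Random-hyperplane LSH: vectors with the same sign pattern share a bucket."""

    def __init__(self, dim: int, n_bits: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.vectors: List[Vector] = []
        self.buckets: Dict[Tuple[bool, ...], List[int]] = defaultdict(list)

    def signature(self, vec: Vector) -> Tuple[bool, ...]:
        """One bit per hyperplane: which side of the plane the vector is on."""
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0
                     for plane in self.planes)

    def add(self, vec: Vector) -> None:
        self.vectors.append(vec)
        self.buckets[self.signature(vec)].append(len(self.vectors) - 1)

    def query(self, vec: Vector) -> int:
        """Search only the matching bucket; fall back to a full scan if empty."""
        candidates = self.buckets.get(self.signature(vec),
                                      list(range(len(self.vectors))))
        return min(candidates, key=lambda i: sq_dist(vec, self.vectors[i]))
```

The trade-off is the usual one for approximate search: `query` inspects far fewer vectors than `exact_nn`, but the true nearest neighbor can be missed when it hashes to a different bucket.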

6. Masked Pretraining for Multi-Agent Decision Making

This paper explores masked pretraining for multi-agent decision making. Significant progress has recently been made in building a single generalist agent with zero-shot capability in decision-making. However, extending this capability to multi-agent scenarios is challenging. Most current work struggles with zero-shot transfer because of two challenges specific to multi-agent settings: the mismatch between centralized pretraining and decentralized execution, and variation in the number of agents and in the action space, which makes it difficult to learn representations that generalize across downstream tasks. To overcome these challenges, the authors propose MaskMA, a masked pretraining framework for multi-agent decision making. The model is based on the transformer architecture and adopts a mask-based collaborative learning strategy suited to decentralized execution with partial observations. In addition, MaskMA integrates a generalizable action representation by dividing the action space into actions related to the agent's own information and actions related to other entities. This flexibility allows MaskMA to handle tasks with different numbers of agents and therefore different action spaces. Extensive experiments on SMAC show that a single MaskMA model pretrained on 11 training maps achieves an impressive 77.8% zero-shot win rate on 60 unseen test maps with decentralized execution, while also performing well on other kinds of downstream tasks such as collaboration with varied policies and ad hoc team play.

https://www.aminer.cn/pub/65309159939a5f4082843e70?f=toutiao
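
The action-space split mentioned in the MaskMA summary can be pictured with a short sketch: a fixed block of self-related actions plus one action per other entity, so the total size follows the number of entities on a map. The action names and the function `build_action_space` are illustrative assumptions, not identifiers from the paper or from SMAC.

```python
from typing import List

# Actions that depend only on the agent itself (fixed across all maps).
SELF_ACTIONS: List[str] = [
    "no_op", "stop", "move_north", "move_south", "move_east", "move_west",
]


def build_action_space(n_entities: int) -> List[str]:
    """Self-related actions followed by one action per other entity."""
    entity_actions = [f"interact_with_entity_{i}" for i in range(n_entities)]
    return SELF_ACTIONS + entity_actions
```

Because only the entity-related part grows or shrinks, the same layout covers maps with, say, 3 or 10 other entities, which is the property the summary credits for handling different numbers of agents.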

END

We have added a "Daily Selection of New Papers" feature to the homepage of the AMiner website. Click "Subscribe" and "Join Knowledge Base" to get information on all of the papers!
