
Chen Baoya and Chen Yue: The Acquaintance-Based Reduction Model of Human Language Acquisition: Starting from ChatGPT's Verbal Reduction Model


Abstract: Although ChatGPT, a large language model for linguistic artificial intelligence, has made great progress, the philosophical controversy between Turing and Searle continues. Nevertheless, ChatGPT's ability to generate brand-new, grammatical sentences shows that it must have reduced language to units (tokens) and rules, thereby solving the long-standing problem of natural language understanding in artificial intelligence; this is an important turning point. ChatGPT's learning model relies on enormous computing power and massive computer storage, which may be referred to together as strong computing power. The human brain, by contrast, has only weak storage and computing capacity. Precisely because of this limitation, the human brain cannot learn language the way ChatGPT does: it reduces language to a finite set of units and rules through experience-based acquaintance activities, and from these generates new sentences. ChatGPT currently adopts a verbal learning model rather than an experience-based acquaintance learning model; future large language models may extend toward acquaintance learning and truly simulate the human acquaintance-based reduction model. Only then might it be said that robots truly understand natural language, and the philosophical dispute between Turing and Searle might be resolved.

1. Origin: The Dispute between Turing and Searle

ChatGPT's powerful performance in natural language understanding has attracted widespread attention. One important question it raises is whether ChatGPT can think like a human. Turing proposed an imitation game in "Computing Machinery and Intelligence", later called the "Turing test". The basic idea is this: with interrogator C, machine A, and human B all isolated from one another, C asks various questions and A and B answer them; if C cannot tell A from B, Turing holds that machine A can think. The Turing test itself is not formulated precisely enough, because there are many questions that humans cannot answer but machines can. If a question demands, say, instantaneous complex arithmetic, only a machine could produce the answer at once, so in practice it is easy to tell human from machine. Turing's basic idea is nonetheless clear: as long as a machine can do most of what a human can do, the machine can be said to think. As far as ordinary questions are concerned, ChatGPT, based on GPT-3.5, can essentially pass the Turing test, and can be said to meet Turing's conditions for thinking. Can we therefore say that ChatGPT thinks? In the 1980s, Searle challenged the Turing test with the Chinese room argument in "Minds, Brains, and Programs". In brief: replace the machine in the Turing test with an English speaker who knows no Chinese. With the help of tools such as an English-language manual about Chinese, the English speaker can produce answers in Chinese to questions posed in Chinese, but this does not mean that the person, or a machine doing the same, understands Chinese. Searle's Chinese room is known as a thought experiment.
In the Chinese room experiment, the English speaker does not understand Chinese yet can output Chinese sentences. Searle's point is that a machine's ability to output a sentence does not show that the machine understands the sentence. In Searle's thought experiment, the English speaker understands English, and the output of Chinese sentences depends on a manual compiled by linguists, even though no Chinese speaker is involved. The question now is: ChatGPT completes Chinese-English translations and generates new sentences and texts without any linguist's intervention. Does this count as ChatGPT understanding language?

2. ChatGPT's Linguistic Method: A Distribution Theory Based on Verbal Knowledge

ChatGPT currently has many abilities that fall short of humans, such as mathematical proof, and some of its conversational output can seem baffling. What is certain, however, is that every new sentence ChatGPT produces conforms to grammatical and semantic rules: it will not produce an ungrammatical expression such as "meet the teacher", nor a violation of semantic combination rules such as "drink steak". This proves that ChatGPT must have reduced language to units and rules; otherwise it could not generate new sentences that conform to semantic rules. This is an important turning point in natural language processing: until then, getting machines to produce correct sentences the way humans do had been a hard problem in computational linguistics. ChatGPT is a large language model with two important foundations: the mathematical model of artificial neural networks, and big data. Artificial neural networks, or simply neural networks, imitate the principles of the human brain's neural networks to perform nonlinear regression and build automatic predictive models.
Such a neural network has many layers, and because these layers are deeply hidden, the process of automatic modeling is also called deep learning. ChatGPT's neural network is made up of dozens of layers, each of which is a Transformer. The Transformer is crucial; it was proposed by Vaswani et al. in the foundational paper "Attention Is All You Need", and it effectively solves the problem of extracting features from natural language. The T in ChatGPT is the initial of Transformer. Compared with earlier systems, artificial neural networks mimic mechanisms of the human brain and come closer to the brain's mode of language learning. The "attention" of Vaswani et al. is a mathematical description of how closely one word is related to another. ChatGPT also incorporates RLHF (Reinforcement Learning from Human Feedback), which lets it continually adjust itself and conform more closely to human behavior patterns. Built on artificial neural networks, ChatGPT can read massive amounts of text from the Internet, including Wikipedia, and model language generation from them. ChatGPT far surpasses the human brain in mathematical calculation and data storage, and it is precisely these two super-capacities that allow the words of natural language to be annotated automatically, vectorized in high dimensions, and formed into complex network associations, with the neural network performing large-scale computation to obtain the best output. In 2013, Google released the Word2vec model, whose core is the vectorization of words. In that model each word vector has more than 600 dimensions; that is, each word is represented by more than 600 parameters, and each parameter expresses the similarity between that word and other words. This model can be seen as a paradigm for large language models.
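The "attention" idea just described, a mathematical measure of how closely one word relates to another, can be sketched as the scaled dot-product attention of Vaswani et al. in a few lines of NumPy. The 4×8 matrices of random numbers below are stand-ins for learned word vectors, not real model parameters.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch of scaled dot-product attention (Vaswani et al.)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise "closeness" of tokens
    # softmax over each row: turn closeness scores into probabilities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights      # each output row mixes all value vectors

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))  # 4 tokens, each an 8-dimensional vector
out, w = scaled_dot_product_attention(Q, K, V)
# Each row of w is a probability distribution over the 4 tokens:
# it says how much attention one token pays to every other token.
```

In a real Transformer, Q, K, and V are produced from the token vectors by learned linear projections, and dozens of such layers are stacked; the core computation, however, is exactly this weighted mixing.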
For example, "elder brother" and "elder sister" can both be labeled with the features "same-generation kin, senior"; that is, as two vectors, "elder brother" and "elder sister" are correlated, and in contexts of "same-generation kin, senior" these words may co-occur. The difference is that feature annotation in linguistics and computational linguistics is manual. "Same-generation kin, seniority" are only two features; the linguistic features relevant to combination rules in natural language far exceed two, and the paradigmatic and syntagmatic relations among words cannot be described with just a few features. As feature annotations multiply, the difficulty increases dramatically. Moreover, it is not obvious which annotated features actually reflect the rules of sentence generation; the annotation must be debugged manually again and again, and each round of debugging is an enormous workload. ChatGPT, based on GPT-3.5, is the best simulator yet of automatic identification, automatic debugging, automatic feedback, and automatic output, and it solves the problem of massive computation. In the paper in which Vaswani et al. proposed the Transformer, the vector dimension reflecting words' distributional features was 512; by GPT-3, the dimension had reached 12,288 and the network had 96 layers, so the information carried was already very large. Although the vector dimension of GPT-4 has not been published, it is certainly not smaller than GPT-3's. More importantly, the large-scale expansion, storage, and computation of ChatGPT's vectorization are realized automatically through neural networks, which makes it possible for machines to establish feature annotation, that is, word vectorization, automatically from massive text.
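The vectorization idea in this paragraph, each word as a list of parameters, with similarity read off from the vectors, can be illustrated with a toy example. The three-dimensional vectors below are invented for illustration; real Word2vec-style vectors have hundreds of dimensions learned from distribution, not hand-set values.

```python
import numpy as np

# Invented toy vectors: the first two dimensions loosely stand for
# "same-generation kin" and "senior"; the third for "furniture-like".
vectors = {
    "elder_brother": np.array([0.90, 0.80, 0.10]),
    "elder_sister":  np.array([0.85, 0.75, 0.15]),
    "table":         np.array([0.10, 0.20, 0.90]),
}

def cosine(a, b):
    """Similarity of two word vectors: cosine of the angle between them."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_kin = cosine(vectors["elder_brother"], vectors["elder_sister"])
sim_far = cosine(vectors["elder_brother"], vectors["table"])
# sim_kin > sim_far: "elder brother" sits much closer to "elder sister"
# than to "table" in the vector space.
```

In a large language model the analogous comparison happens over thousands of automatically learned dimensions, which is exactly what manual feature annotation could not scale to.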
It is also through word vectorization that ChatGPT can obtain the distributional probabilities of words, build a combinatorial model of words, and complete the generation of new sentences. Although the content of ChatGPT's human-computer dialogue is still deficient, from the standpoint of linguistic theory, ChatGPT's ability to generate new, grammatical sentences and texts is a very important advance in natural language understanding; we can therefore be certain that ChatGPT reduces the existing text corpus to units and rules and generates new sentences from them. If ChatGPT relied only on its capacity for massive data processing to memorize sentences, it could not generate entirely new sentences and texts. Since the interior of an artificial neural network is a black box, it is not clear how ChatGPT automatically builds its vector space or its language generation model. But one thing is clear: ChatGPT does not deal with experience; it deals directly with massive text. Massive text provides sufficiently detailed distributional information for every word, and ChatGPT can rely only on the distribution of words, via artificial neural networks, to reduce language to units and rules and build a generation model. From the standpoint of mathematical method, an artificial neural network is essentially a nonlinear regression algorithm: as long as the input material is rich enough, the algorithm can simulate the laws behind the material and model them automatically. ChatGPT converts the distribution of words into mathematical vectors and, with the help of massive linguistic text, finally simulates the rules of language and uses them to generate new, grammatical sentences and texts.
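The step from distribution to generation can be sketched with a deliberately tiny model: estimate next-word counts from an invented three-sentence corpus, then generate a sentence by following the most frequent continuations. A large language model performs the analogous estimation over massive text with a neural network rather than a count table.

```python
from collections import defaultdict, Counter

# Invented toy corpus (not real training data).
corpus = [
    "the child wears cloth shoes",
    "the child wears leather shoes",
    "the child wears straw shoes",
]

# Count which word follows which: the distributional information.
counts = defaultdict(Counter)
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def next_word(prev):
    # Pick the most frequent continuation of `prev`
    # (ties resolve to the earliest-seen word).
    return counts[prev].most_common(1)[0][0]

word, generated = "<s>", []
while True:
    word = next_word(word)
    if word == "</s>":
        break
    generated.append(word)
# generated == ["the", "child", "wears", "cloth", "shoes"]
```

Even this crude bigram table already produces a well-formed sentence it has "seen"; the leap in large models is that high-dimensional vectors let the same distributional logic produce sentences that occur nowhere in the corpus.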
The methodological principle of artificial neural networks is the regression theory of Legendre and Gauss in mathematics. Their regression was linear; later mathematicians developed nonlinear regression, but the basic principle is the same: to simulate a mathematical model from the complex distribution of elements and then predict the distribution of unknown elements. This is a theory of distribution. From the standpoint of linguistic theory, ChatGPT's linguistic method is likewise a distribution theory, and it is precisely the idea of Harris's distributionalism. Earlier, Bloomfield's behaviorist theory of language had likewise treated the meaning of a word as its usage (distribution). According to Harris, the rules governing each morpheme can be obtained by fully describing that morpheme's rules of distribution. Because the distribution of morphemes in a language is extremely complex, with almost every morpheme distributed differently, Harris could not at that time fully describe the detailed distribution of all the morphemes of a language; he could only expound distribution theory through examples. It remains difficult for linguists to exhaust the distribution of morphemes by hand. ChatGPT makes such massively distributional computation possible, whether for morphemes, for words composed of morphemes, or for other language units (tokens). Harris's distribution theory is a purely formal analysis independent of experience. Since ChatGPT can obtain distributional rules independently of experience, this also validates the formalist view of grammar: grammatical rules, as opposed to semantics, can be independent of experience, which is the basic idea of Chomsky, a student of Harris. From this we can form a further understanding of symbol systems.
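The Legendre-Gauss idea invoked here, simulating a model from observed distributions and then predicting unseen points, can be shown in its simplest linear form. The data and the underlying law y = 2x + 1 are synthetic, chosen only to make the fit exact.

```python
import numpy as np

# Synthetic observations generated by the "law" y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least squares (Legendre, Gauss): solve for the line that best
# fits the observed points.
A = np.vstack([x, np.ones_like(x)]).T      # design matrix [x, 1]
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# Having simulated the law, predict an unseen point.
prediction = slope * 5.0 + intercept
# slope ≈ 2, intercept ≈ 1, prediction ≈ 11
```

A neural network does the same thing with a nonlinear, many-parameter function in place of the straight line, but the principle, fit the distribution, then predict, is unchanged.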
Since the formation of the axiomatic systems of mathematics and logic, people have recognized that mathematics is a purely formal axiomatic system that needs no semantic interpretation: in plain terms, it needs no support from experience, so long as the axioms do not contradict one another. Thus the idea was established that mathematics is in essence a purely formal symbol system. As for the relation between mathematics and practical application, that depends on practical need: Euclidean geometry for Euclidean space, non-Euclidean geometry for relativity, and so on. ChatGPT's success in language generation is independent of experience, which likewise proves that natural language contains formal systems independent of experience. How such a formal system is applied must, however, be linked to experience. The manual training and supervision applied to ChatGPT belong to alignment with human values and morality; this is like children learning a language while still needing education in morality, values, and law. Such training and supervision are not part of the training of language ability itself.

3. Human Language Acquisition: The Capacity for Acquaintance-Based Reduction

Return now to the Turing-Searle controversy: can ChatGPT think, and does it understand natural language? The answer depends on how we define thinking and language. What deserves attention is that although ChatGPT obtains a language generation model from massive text, its automatic learning method is not the same as the human method of language learning. As noted above, the human brain is far inferior to ChatGPT in storage and computing power; it has only weak storage and computing capacity. Precisely because of this limitation, the human brain cannot learn language the way ChatGPT does.
The human brain can also generate new sentences and texts, as ChatGPT does, but the brain's generative capacity works from a limited, small set of rules and units. To complete the passage from finite to infinite, the human brain must reduce language to finite units and rules through experience-based acquaintance activities, and from these generate new sentences and texts. The amount of text the human brain uses to reduce units and rules is far smaller than the amount ChatGPT uses. Units and rules reduced on the basis of acquaintance constitute an acquaintance language rooted in experience. By contrast, ChatGPT currently uses a verbal learning model, not an experience-based acquaintance learning model. ChatGPT can perform large-scale computation over high-dimensional vectors and can distill a great deal of information about the external world from its vast corpus, but this information is obtained through words, not through acquaintance. Given its storage and computing power, ChatGPT can reduce language to formal units (tokens), but whether it can reduce it to meaningful units grounded in experience remains unknown. In the future, ChatGPT may expand toward acquaintance learning, for instance by making progress in smell, touch, pain, sadness, pleasure, and synaesthesia, so as to simulate the human acquaintance-based reduction model. Why does ChatGPT need extremely large data to learn to generate new sentences, while humans learn to understand and generate new sentences from only limited data? Three-year-olds have basically mastered their native language, and the number of sentences they have been exposed to is quite limited. Evidently, because of weak computing capacity, humans can only reduce a certain number of sentences (including one-word sentences) to rules and units, and then generate new sentences from these finite units and finite rules.
Specifically, children first acquire the usage of some words and sentences through daily life and play; this is only the first step of language acquisition. The second step is the process of reducing units and rules, and this reduction proceeds by analogy. For example, a child who has learned the phrases "cloth shoes, straw shoes, leather shoes, gold watch, copper watch, silver watch" will, by analogy, reduce the units involved and generate new combinations: "gold shoes, copper shoes, silver shoes". The result of the analogy is a model built on the shared element, "X shoes", where X stands for a material. The essence of analogy is to reach new knowledge through existing knowledge. The knowledge here is tied to the empirical world: the concept "material" is acquaintance knowledge formed as humans deal with the world of experience. Without experience-based analogy, neither the reduction process nor the generation process of natural language could be realized. In a word, human knowledge is not only verbal knowledge but also acquaintance knowledge, and verbal knowledge rests on acquaintance knowledge. The core question about an analogy is whether it can be extended exhaustively. Some analogical patterns are exhaustively extensible; others are not (# marks combinations whose meanings have drifted away from the pattern, $ marks combinations whose empirical referents do not yet exist, and * marks combinations that cannot be said):

Parallel, exhaustively extensible pattern "white X": white paper, white wall, white shoes, white hair, white car ... #白菜 'Chinese cabbage', #白金 'platinum', #白铁 'galvanized iron' ... $white coal ...

Parallel, non-exhaustive pattern "X + 儿": 腕儿 'wrist', 腿儿 'leg', 桌儿 'table', 门儿 'door', 本儿 'book', 嘴儿 'mouth', 肝儿 'liver', 肠儿 'intestine', #心儿, #眼儿 ... *笔儿 'pen', *墨儿 'ink', *脚儿 'foot', *手儿 'hand', *掌儿 'palm', *指儿 'finger', *鼻儿 'nose', *牙儿 'tooth', *胃儿 'stomach', *肾儿 'kidney' ...

Counterexamples encountered in the exhaustively extensible pattern can usually be explained. One case is semantic drift, as in "白菜, 白金, 白铁", marked with #. The other case is a combination whose corresponding empirical knowledge does not yet exist, such as "white coal", marked with $.
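The analogical reduction described above can be sketched in code: from the attested combinations, extract the schema MATERIAL + HEAD, then fill it with every material, yielding combinations the child has never heard. The word lists are the article's own examples; the schema extraction is, of course, a drastic simplification of what a child actually does.

```python
# Attested phrases the child has already learned (from the article).
attested = ["cloth shoes", "straw shoes", "leather shoes",
            "gold watch", "copper watch", "silver watch"]

# Reduction step: recover the units behind the phrases.
materials = {phrase.split()[0] for phrase in attested}  # X in "X shoes"
heads = {phrase.split()[1] for phrase in attested}

# Generation step: analogy fills the schema "MATERIAL + HEAD" with
# every material, producing combinations never heard before.
generated = {f"{m} {h}" for m in materials for h in heads} - set(attested)
# e.g. "gold shoes", "copper shoes", "silver shoes", "cloth watch", ...
```

This is the exhaustively extensible case; the non-exhaustive pattern is precisely the one for which such wholesale cross-multiplication fails, so its instances must be memorized item by item.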
Apart from these counterexamples, "white X" is a pattern that can be extended exhaustively, and such a pattern can be used for innovation: the generation of new sentences rests on this kind of exhaustive extensibility. The parallel, non-exhaustive pattern above cannot be used as a rule to generate new instances: 腿儿 'leg' and 脚儿 'foot' are parallel in compositional relation, yet 腿儿 can be said while *脚儿 cannot. Only the instances of the non-exhaustive pattern need to be memorized; the instances of the exhaustive pattern need not be, and this is the economy of reduction-based learning. For a computer, the instances of both patterns can simply be stored wholesale in a database, with no need to reduce them to smaller units, because the computer has sufficiently strong storage and computing power. This is an important difference between computer natural language processing and the human brain's processing of language. Can ChatGPT reduce units and rules the way the human brain does? The brain's ability to reduce rules and units exhaustively relies on acquaintance knowledge, and this may be the key to how humans, with only weak computing capacity, can nevertheless achieve exhaustive reduction. Whether the artificial neural networks that ChatGPT and similar systems rely on can do the same is not yet known; so far they have only verbal knowledge, not acquaintance knowledge. For example, Wenxin Yiyan (ERNIE Bot), the large language model released by Baidu, is based on artificial neural networks just as ChatGPT is. Its definition of "material" uses "substance", its definition of "substance" uses "entity", its definition of "entity" uses "object", its definition of "object" uses "thing", and its definition of "thing" uses "entity" again, finally forming a circle of interpretation. The reason these words are used to define one another is that the neural network finds that words such as "material, substance, entity, object, thing" often occupy the same positions in sentences across large-scale text, and they are therefore placed close together, as similar, in the same vector space.
Similarity can be used to extract shared features. Extracting the similarity of words from their distribution is the basic working principle of large language models. In daily life, people's understanding of "material" and of "substance, entity, object, thing" is the result of perception obtained through experience, such as touch and vision. By analogy, a computer can automatically establish the similarity among "mango", "banana", "jackfruit", and "durian" from their distribution and extract the feature "fruit", yet still not know what these fruits actually taste like. The explanations in the Modern Chinese Dictionary (7th edition) are likewise circular:

know (知道): to have an understanding of things and reasons; to understand (p. 1678)
understand (懂): to know (a meaning, a method, etc.) (p. 312)
recognize/know (认识): 1. to be able to determine that a person or thing is this person or thing and no other; 2. to understand and master objective things through practice (p. 1102)
understand clearly (了解): to know clearly (p. 820)

The above "know, understand, recognize, understand clearly" are all circular definitions and circular explanations. Evidently, humans do not acquire the meaning of "know" through such verbal learning by definition and explanation, but through the use of language in the course of coming to know. Use precedes definition, and acquaintance precedes words. Natural language is the primary metalanguage, grounded in intuition, and defining words is ultimately a circular process. Propositional logic contains the definition of ¬ (the negation symbol: ¬p means "not p"), which uses the natural-language words "no, not" to define the symbol. Yet the natural-language words of negation are themselves defined circularly in the Modern Chinese Dictionary:

no (否): expresses disagreement
not (非): no; not being
not (不): used before verbs, adjectives, and other adverbs to express negation

The most important judgment word in logic, "is/yes" (是), is also defined circularly in the Modern Chinese Dictionary:

is/yes (是): right; correct
right (对): correct; normal
correct (正确): conforming to fact, reason, or a standard
conform (符合): to accord with

Human natural language, and the symbol systems built upon it, thus contain circular interpretation at the level of the language hierarchy. When Wittgenstein asked, in effect, "If I cannot define 'plant', does that mean I do not know what I am talking about?", he too was acknowledging that use precedes definition and that knowing precedes words. Let us return to Turing's and Searle's questions about "thinking" and "understanding". In fact, neither Turing's functionalism nor Searle's opposing position gives a strict definition of "thinking" and "understanding", and we need to distinguish two senses of these words strictly. The thinking and understanding attributed to robots by Turing and by artificial intelligence refer to the network relations among words; Searle's thinking and understanding refer to experience-based thought, to understanding the empirical world behind the words. We may call the language humans learn through acquaintance an acquaintance language, and the language robots learn through verbal knowledge a verbal language. Natural language is a symbol system acquired through initial terms and their continual expansion, which includes metaphorical or analogical usage as well as definition and explanation; but the initial language system itself cannot be obtained through definition and interpretation, only through acquaintance.
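The circularity discussed above can be made concrete by treating a dictionary as a directed graph from each headword to the words in its definition and searching for cycles. The three-entry mini-dictionary below is an invented simplification of the "know/understand" chain, not the actual dictionary text.

```python
# Toy dictionary: each headword maps to the words used to define it.
definitions = {
    "know":       ["understand"],
    "understand": ["grasp", "know"],
    "grasp":      ["understand"],
}

def find_cycle(start, graph):
    """Depth-first search for a definitional chain leading back to `start`."""
    path = [start]
    def dfs(word):
        for nxt in graph.get(word, []):
            if nxt == start:
                return path + [start]       # closed the circle
            if nxt not in path:
                path.append(nxt)
                found = dfs(nxt)
                if found:
                    return found
                path.pop()
        return None
    return dfs(start)

cycle = find_cycle("know", definitions)
# cycle == ["know", "understand", "know"]:
# "know" is defined, ultimately, in terms of itself.
```

For the initial vocabulary of a language, every definitional path closes into such a circle somewhere, which is exactly why the first meanings must come from acquaintance rather than from definition.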
Whether robots can in the future realize an acquaintance learning model and acquire an acquaintance language is the key to natural language understanding. At present, robots' image recognition and speech recognition of real people is the beginning of acquaintance. But for now it remains quite difficult to reach the human level of acquaintance with the world, because computers and human brains are not structured alike. The human brain is an organic structure backed by complex biological structures, as are the various parts of the body, and it is through these that humans acquire acquaintance language by interacting with the world of experience. The complex biological structure of human beings formed gradually over long evolution, yielding a complex system for perceiving the world that current robots do not yet possess. One key to ChatGPT's leap forward is big-data computation, drawing on data collected by research institutions, online data, and so forth, including data from Wikipedia. Data on the Internet is of mixed quality, which inevitably affects the quality of GPT models. Microsoft researcher Gunasekar et al. published "Textbooks Are All You Need" on the preprint server arXiv, emphasizing the need to improve data quality; in the phi-1 model reported there, performance improved significantly after data quality was improved. GPT-5 will presumably also follow the idea of improving data quality. But however high the quality of the text, it is still verbal-knowledge data, and the model remains, in the end, a verbal-knowledge model, different from the way humans learn language. One possible trend of development is from physical robots toward biological robots, with robots gradually developing the capacity for acquaintance.
Today's computer recognition of sights and sounds in the external world can be seen as a precursor of the capacity for acquaintance.

4. Conclusion: Language Behavior and Language Cognition

Robots based on large language models can, without acquaintance, obtain sentence-generation ability from existing texts, generate new sentences, and on the basis of this language ability complete a great deal of information processing, reasoning, and creative activity. Our theories of language ability, language knowledge, and thinking ability therefore need adjustment. As for Turing's and Searle's dispute over whether machines can think and understand, the answer depends on how we define "thinking" and "understanding". Chomsky argues that ChatGPT tells us nothing about language; how to assess this depends on how one defines "knowledge of language". Setting these disputes aside, one basic point is clear: ChatGPT's verbal-knowledge model of storage and computation must rely on strong storage and computing power to reduce language to units and rules, and humans cannot learn language by ChatGPT's method. The acquaintance-based model of storage and computation needs only weak storage and computing power to reduce language to units and rules; this is the characteristic of human language learning, and the real mechanism behind it has not yet been clarified by linguists or artificial intelligence researchers. Perhaps it is precisely this acquaintance-based model of storage and computation that gives human beings their other abilities, above all the ability, grounded in doubt, reflection, and comprehension, to prove mathematical theorems and to construct theories such as relativity. Artificial neural networks have not yet achieved this, and whether they can in the future requires further research.
Even if robots someday develop the capacity for acquaintance and an acquaintance language through deep learning, this will not mean that humans fully understand the mechanism of language; it will only be the realization of language behavior. ChatGPT has already exhibited some highly complex behaviors whose mechanisms AI researchers do not yet fully understand. Understanding the operating mechanism of human language remains the goal of language science; such understanding may provide theoretical support for advancing robots' acquaintance-based language learning, and also a basis for limiting robots' destructive behavior. Once a robot can learn language in a fully human way, then combined with its strong computing power, its language ability and thinking ability will be astonishing. The process by which robots learn language also opens more windows onto the operating mechanism of language for humans. For example, ChatGPT learns language without using large numbers of grammatical terms or concepts tied to a grammatical system, which suggests that we should attend more to the study of synchronic and diachronic rules and of the operating mechanism of language, rather than blindly constructing abstract and complicated grammatical systems. Linguistics needs to study not only the mechanism by which humans learn language but also the mechanism by which robots do. Our current linguistics is based on human language learning; the mechanism of robot language learning should be included as well, so that we arrive at a generalized linguistics grounded in the study of both human and robot language, and our account of the mechanism of language learning becomes more adequate.
ChatGPT still has many shortcomings, but its successes in many respects cannot be ignored, and these successes show the importance of artificial neural networks for natural language understanding. In its early days, natural language understanding passed through both rule models and probabilistic models, and met many difficulties. The difficulty of the rule model is not that rules are unimportant, but that they are hard to find and establish by hand. The lesson is that it is not easy for people who can speak a language to find the rules behind it, just as it is not easy for people who can digest food to discover the laws of digestion. In fact, ChatGPT is also searching for rules, but it searches within massive computation and massive data, which from another angle reflects how difficult rules are to find. Probabilistic models are important too: the computation of artificial neural networks is based on probabilistic models, but early probabilistic models likewise met difficulties in large-scale computation over massive text. Both the rule model and the probabilistic model are valuable; the question is how to realize them automatically, and artificial neural networks accomplish that automatic realization. Paleoanthropologists, archaeologists, paleogeneticists, and others all regard the emergence of symbol systems as an important feature of human evolution, yet they do not regard it as the most critical feature and do not use symbols as the distinguishing mark of humanity, which is regrettable. Existing experiments show that animals surpass humans in many abilities, such as dolphins' echolocation and chimpanzees' memory, yet these animals have not developed a highly developed society like the human one. We believe that the fundamental turning point in human evolution was the mastery of language.
With language as a symbol system, experience can be ordered, humans can plan for the future, innovations can be shared, and knowledge can accumulate. The most critical link in human evolution is thus the emergence of the symbol system, that is, the emergence of natural language. Cassirer's definition of the human being as a symbolic animal now appears well founded. OpenAI has concentrated on natural language understanding in artificial intelligence, and its large language model ChatGPT realizes natural language generation by machines, which is an important turn. The emergence of ChatGPT is as important to the development of robots as the emergence of the natural language symbol system was to human development: from it, robots can converse with humans in natural language and directly read, inherit, and use the vast textual knowledge that humans have recorded in natural language. Language is the most important tool of thought and communication, and trying to bypass language to achieve artificial intelligence is like bypassing language to discuss the origin of humanity: a mistake of direction. It may be said that the emergence of language-capable humans was the turning point of human evolution, and the emergence of language-capable robots is a turning point in the history of robots, one that prompts further reflection on linguistic theory.

This article was published in the Journal of Peking University (Philosophy and Social Sciences), No. 2, 2024.
