
Feature丨When Stenographers Meet AIGC

Author: 21st Century Business Herald

By 21st Century Business Herald reporter Guo Meiting and interns Lin Wanna and Mai Zihao, reporting from Guangzhou

When ChatGPT set the whole internet abuzz, Yan Hong was unmoved.

The cry of "wolf" has gone up countless times in the shorthand trade, and "by now we are numb to it."

Shorthand is the profession of turning fleeting spoken words into written text. Stenographers (formally known in the industry as "speed recorders") are a common sight at large conferences across the country. Today a first-tier city such as Beijing has more than 200 stenographers, while a second- or third-tier province such as Guizhou has only four.

ChatGPT has set off a boom in large AI models, which are gradually being applied to specific fields within specific industries. Speech recognition has been among the first sectors to move.

On June 1, Alibaba Cloud announced the official launch of a new AI product, "Tongyi Tingwu," which focuses on audio and video content, making it the first large-model application product in China to enter public beta. Tongyi Tingwu draws on the comprehension and summarization abilities of the Tongyi Qianwen model and can transcribe, search, summarize and organize audio and video content.

On June 9, iFLYTEK's software product "iFLYTEK Hearing and Writing" received its first major version update since the launch of the iFLYTEK Spark model. Reportedly, the product not only converts speech to text but can also extract meeting minutes, generate to-do lists and draft manuscripts with AI.

Many predict that the emergence of ChatGPT marks the start of a new industrial revolution. This time, is the "wolf" really coming?

"Reverse Publicity"

Will stenographers be replaced by AI? In a career of more than ten years, Li Jun has faced this question every few days.

On the mainland, modern shorthand dates back to the late 1890s, but the industry only really took off after 2000. In 2003, the stenographer was officially added to the "People's Republic of China Occupational Classification Dictionary," and the media of the day described shorthand as an "emerging sunrise profession" with high salaries and a shortage of talent. In 2013, the Beijing Shorthand Association disclosed that the national shortfall of secretarial shorthand staff had reached about 250,000.

Li Jun entered the industry in those years. Born in a poor mountainous area of Guizhou, she enrolled, through a poverty-alleviation program, in the rural girls' school founded by Wu Qing, daughter of the writer Bing Xin.

At that time, she had no concept of artificial intelligence and had never heard of speech recognition. "The first mobile phone I bought when I started working happened to use the iFLYTEK input method, and that was the first time I came across the name 'iFLYTEK.'" What Li Jun did not know was that in the years that followed, discussion of her profession would always be accompanied by the rise of a group of intelligent speech recognition companies, with that name at their head.

From Lee Sedol's crushing defeat by AlphaGo in 2016 to the emergence of ChatGPT at the end of 2022, the intelligence of AI has upended people's assumptions again and again. Recently, talk of AI replacing human jobs has resurged and become the talk of the town.

But within stenographers' own circles, advances in AI do not yet seem to be keeping pace with the expansion of their business. Li Jun first worked in Beijing and later returned to Guizhou, where at first she could only transcribe recordings for peers in other regions and found it hard to get local jobs. As the economy grew rapidly, landmark large-scale events such as data expos, liquor fairs and ecological forums were increasingly held in Guizhou, a local conference market gradually emerged, and shorthand found more room to prove its worth. Li Jun feels that, at least until the pandemic arrived, the momentum of shorthand was growing stronger year by year.

As for pay, according to Yuan Yuan of the Beijing Shorthand Association, rates vary with objective factors such as the stenographer's qualification level, the clients served and the intended use of the transcript, and generally range from 600 to 3,000 yuan per half day.

Several stenographers told 21st Century Business Herald reporters that, compared with AI, it was the pandemic that truly hurt the industry. Before the pandemic, Li Jun's average monthly income was about 15,000 yuan. The sudden outbreak made it impossible to hold large numbers of meetings in person, and her income was nearly halved over the past few years. The market may gradually recover this year.

AI has certainly had an impact. Yan Hong, a stenographer from Beijing, occasionally hears peers complain that speech recognition has cost them some big customers. Some less demanding, less difficult transcription work can now be done with AI.

But there are also some interesting side effects. Li Jun and Yan Hong both noted that outside first-tier cities such as Beijing, Shanghai, Guangzhou and Shenzhen, shorthand used to be little known, and many places did not even know the trade existed. The blanket promotion by AI and speech recognition companies has, in reverse, publicized and popularized the shorthand industry. Stenographers lost some old customers but gained many new ones, and some customers tried speech recognition for a while and then came back to shorthand.

Shenzhen Black Box Shorthand Co., Ltd. (hereinafter "Black Box Shorthand") has also observed that some organizations make little use of AI speech recognition services after purchasing them, leaving them idle.

In fact, another factor troubles Black Box Shorthand more. Over the past two years, confidentiality rules have tightened, restricting which third-party stenography providers are qualified to work at confidential internal meetings. The threshold for confidentiality qualification is too high for small and micro shorthand firms as they now stand, and this has hit the company's business hard.

Large model rework

On the big screen, AI speech recognition transcribes the speech into subtitles and presents them to the audience in real time. In a corner of the stage, a stenographer's fingers fly across the keyboard, recording in parallel and producing a detailed transcript after the meeting. On stage and off, each does its own job, and the two do not clash. Such scenes have been common at large forums over the past two years.

Years of shorthand experience have given Li Jun a keen sense of how speech recognition has changed. "From being unable even to break up sentences or paragraphs, to increasingly standard punctuation and the ability to distinguish speakers and filter out filler words: with the technology advancing and the publicity building momentum, the anxiety is still there." But it is also through this process of "discussion" that she has come to firmly believe humans remain irreplaceable.

Many stenographers believe that AI speech recognition is demanding about environment and accent, that it "records whatever it hears," and that it is weak at filtering and discernment. Sometimes a client's recording arrives with a speech recognition transcript attached. They have tried editing such drafts, but it often takes longer than transcribing the recording from scratch.

Rather than recording every sentence verbatim, Li Jun now often asks clients before a meeting what the transcript is for, whether publicity, archiving or meeting minutes, and then trims and summarizes accordingly.

"In some cases, the client requires the manuscript to be not only accurate but also to achieve 'faithfulness, expressiveness and elegance,' for example when it is for a published book. That calls for the most rigorous manual shorthand," Yuan Yuan added.

Hong Qingyang, associate professor at the School of Information Science and head of the Intelligent Speech Laboratory at Xiamen University, pointed out that in relatively ideal conditions (near-field pickup in a fairly quiet setting, for example), general-domain speech recognition can usually reach an accuracy of more than 95%. At present, he said, speech recognition still faces difficulties in three areas.

First, specialist and unfamiliar terms in particular fields (such as electric power or medicine) are hard to recognize, so dedicated corpora must be collected to fine-tune the original model and build targeted industry models. Second, overlapping speech and complex scenes, such as interruptions and back-and-forth exchanges in meetings, remain a research challenge and need to be tackled in both software and hardware, for example with new algorithms and additional pickup microphones that better locate and focus on sound sources. Third, multilingual, mixed-language and dialect scenarios are also areas of pressing demand.
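
The accuracy figure Hong Qingyang cites can be made concrete with a small calculation. The sketch below is an illustration only: it assumes accuracy is reported as one minus the character error rate (edit distance divided by the length of the reference transcript), a common convention for Chinese speech recognition, although the article does not say how the 95% figure was measured. The function names and sample strings are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # character missing from hyp
                            curr[j - 1] + 1,      # extra character in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """Assumed metric: 1 - (edit distance / reference length)."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a human reference transcript versus machine output
# that gets one character wrong.
reference = "meeting starts at nine"
hypothesis = "meeting starts at mine"
print(f"character accuracy: {character_accuracy(reference, hypothesis):.3f}")
```

On a 20-character reference, one wrong character is enough to bring this measure down to 95%.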

Connecting to large models of the kind represented by ChatGPT can solve some of these problems. Speech products on the domestic market that have announced large-model integration currently include Alibaba Cloud's Tongyi Tingwu and iFLYTEK Hearing and Writing.

Yan Zhijie, technical lead for Alibaba Cloud's Tongyi Tingwu, told 21st Century Business Herald reporters that a large model can effectively improve the readability of text by filtering filler words, dividing paragraphs, condensing and rewriting, while also correcting some typical speech recognition errors, such as homophones and proper nouns. Adding a large model also opens the way to further applications, such as chaptering and summaries.

But this optimization comes after the speech recognition step. Jiang Jiahui, a product manager at iFLYTEK, explained that the information large models can take in and understand is still mainly text, which means speech recognition remains a separate task. Once a large model is connected, the speech recognition module can support live meeting notes, automatically organize and output meeting content, and generate various kinds of meeting documents with one click, helping users work efficiently.

"As large models develop further in the direction of multimodality, we expect the next phase of large models to understand sound," Jiang Jiahui said. Only then will it be possible to see more directly how much large models improve the accuracy of speech recognition.

At this stage, however, vigilance about the reliability of generated content is still needed. Hong Qingyang cautioned that when a large model generates new text from a speech recognition transcript, it may make common-sense errors, such as attributing something to the wrong person or misinterpreting the meaning. If the preceding speech recognition step contains many errors, these deviations are amplified further.

Human-machine coupling

Working in shorthand has given Li Jun a great sense of pride and achievement. With each conference she covers, she learns something new, a feeling she describes as "seeing the world from the shoulders of giants."

At one point she took on a dense run of law-related conferences. Because the content was so specialized, she had to look up the relevant terms online again and again, and gradually became familiar with the field. Using her time at home during the pandemic, she studied on her own and passed the law exam.

Stenographers of Li Jun's generation are still passionate about the industry, but they can also sense that fewer and fewer people are learning shorthand.

On the one hand, with the rise of artificial intelligence, many people worry that the job will be replaced and dare not enter the industry rashly. On the other hand, shorthand training itself has a low completion rate and a long training cycle.

"It is nothing like the boom in the speed-recording training industry more than a decade ago." Yuan Yuan noted that dozens of training institutions and colleges across the country once offered speed-recording courses, and few are left now. Stenographer training in the south is holding up reasonably well, but the main employment outlet there is the courts. Within the industry, the courts are not the most desirable employer, since the technical level required is not very high, and most stenographers still aim to become senior stenographers who can cover conferences on their own.

Li Jun occasionally sighs, "When our generation gets old, there will really be no stenographers left in Guizhou." She has taken on two young apprentices, one of whom is already interning and may be able to cover meetings independently next year. "The two of them are the hope of Guizhou's future."

In the interviews, this unease did not linger. Many stenographers welcomed and accepted artificial intelligence technology, and hoped AI could assist them in their work.

Black Box Shorthand believes that shorthand plus AI may be the way forward. If AI could review and correct typos, grammar, proper nouns and the like while the stenographer records the text, the work would be easier.

Yan Hong recalled that stenographers used to work in pairs: either one did the main typing while the other made corrections, or the two took turns typing. Later it became the norm for one stenographer to cover an entire meeting, which demands unbroken concentration. Whether AI can one day fill the other seat in this "doubles" pairing is something to look forward to.

On human-machine coupling in the shorthand industry, most technical experts propose that AI transcribes first and a person then revises. Yan Zhijie, for example, believes that in the future AI can take over verbatim recording, freeing stenographers from intense mental strain so they can focus on the summarizing, condensing and organizing that follows. When a person corrects and summarizes machine-generated content, the result is, from the algorithm's point of view, a form of data. If that data can be fed back into further model training, it will help the AI improve and, in turn, better help the stenographer.

"This means that stenographers may need to pay more attention to some difficult points, such as professional terminology, expression habits, language mixing, etc., and review and correct difficult problems that AI cannot solve." Hong Qingyang said.

According to Jiang Jiahui, some stenographers are already using iFLYTEK's real-time recording function to improve efficiency. Bringing in large AI models, he laid out a collaborative chain of "AI speech recognition, large-model correction, human refinement, large-model correction." "Machines have an edge in checking for typos, while humans are better at understanding speech. If more efficient content processing is needed in the future, such as generating summaries, articles, meeting minutes or presentations, large models can assist with that too."

However, it may be some time before technology truly reshapes the industry. Since the beginning of this year, stenographers have been busy again. The day before the interview, Yan Hong worked a full day of meetings at a university, was busy until 10 p.m., and got up at 7 a.m. the next day to continue working. Even when she is not covering a meeting, she still has to work through batches of audio files sent by customers.

"When speech recognition technology first came out, we worried about stenographers being replaced. Now that ChatGPT is here, it seems every line of work can be replaced. When every industry is on the same starting line, I'm in no hurry." Talking about the future, Li Jun said with a smile: "When (the technology) comes, I'll just think about how to combine my trade with it and carry on living. In the end it's all work. Only the way of doing it is different."

(Li Jun, Yan Hong, and Yuan Yuan are pseudonyms in the text)
