User-Level Psychological Stress Detection from Social Media Using Deep Neural Network

ABSTRACT

Detecting and managing stress before it turns into severe problems is of significant importance. However, existing stress detection methods usually rely on psychological scales or physiological devices, which makes detection complicated and costly. In this paper, we explore the automatic detection of individuals’ psychological stress via social media. Employing real online micro-blog data, we first investigate the correlations between users’ stress and their tweeting content, social engagement and behavior patterns. Then we define two types of stress-related attributes: 1) low-level content attributes from a single tweet, including text, images and social interactions; 2) user-scope statistical attributes derived from users’ weekly micro-blog postings, leveraging information on tweeting time, tweeting types and linguistic styles. To combine content attributes with statistical attributes, we further design a convolutional neural network (CNN) with cross autoencoders to generate user-scope content attributes from low-level content attributes. Finally, we propose a deep neural network (DNN) model that incorporates the two types of user-scope attributes to detect users’ psychological stress. We test the trained model on four different datasets from major micro-blog platforms including Sina Weibo, Tencent Weibo and Twitter. Experimental results show that the proposed model is effective and efficient in detecting psychological stress from micro-blog data. We believe our model would be useful in developing stress detection tools for mental health agencies and individuals.

Keywords

Stress detection; convolutional neural network; cross autoencoders; deep learning; micro-blog; social media

  1. INTRODUCTION

    1.1 Motivation

    Psychological stress is the root cause of many health problems and mental diseases. Chronic stress increases the risk of developing health problems such as insomnia, obesity, heart disease and cancer [1]. Many studies have revealed a link between stress and mental diseases such as anxiety disorders and depression [2]. Stress has been a threat to human health for a long time. Time magazine’s June 6, 1983 cover story called stress “The Epidemic of the Eighties” and referred to it as our leading health problem (http://www.stress.org/americas-1-healthproblem/). Meanwhile, stress has progressively worsened and spread in recent years. With the rapid development of modern society, many people feel increasingly stressed under the rapid pace of life. Numerous surveys have confirmed that adult Americans feel under much more stress than a decade or two ago. A 1996 Prevention magazine survey found that almost 75% feel they have “great stress” one day a week, with more than 30% indicating they feel this way more than twice a week, whereas the corresponding figure in the same survey conducted in 1983 was 55% (http://www.anxietycentre.com/stress.shtml). In short, the rapid increase of stress has become a great challenge to human health and quality of life.

Psychological stress detection remains a major problem at the present stage. Detecting and managing stress before it turns into severe problems is of significant importance. In recent decades, researchers from diverse areas have devoted considerable effort to stress detection. They have developed many methods to measure psychological stress, including interviews based on psychological questionnaires [3, 4] and measures based on physiological signals [5, 6]. However, these methods have limitations. Psychological questionnaires often contain a range of questions designed by psychologists, and people are usually unwilling to complete them unless they have to. Physiological methods usually require professional devices to measure users’ physiological and biochemical properties and need specialists to analyze the acquired data. Thus, it is important and useful to find a way to detect a user’s stress state reliably, automatically and non-invasively.

With the fast development of social networks, people widely use social media platforms to share their thoughts and feelings. A statistical report from statisticbrain.com (http://www.statisticbrain.com/twitter-statistics/) shows that as of January 1, 2014, the total number of active registered users on Twitter had reached more than 645 million, with an average of 58 million tweets posted per day. As for Sina Weibo (the largest micro-blog platform in China), the number of users has reached more than 600 million (http://www.comsoc.org/blog?page=3). People post tweets containing text and images on micro-blog platforms to share opinions, express emotions, record daily routines and communicate with friends. From these tweets we can obtain linguistic and visual content that may indicate stress-related symptoms. This makes it feasible to detect users’ psychological stress through their tweets and posting patterns on micro-blogs.

1.2 Related Work

Existing methods for stress detection. Many efforts have been devoted to developing convenient tools for individual stress detection in recent years. Researchers are trying to leverage pervasive devices such as personal computers and mobile phones for routine stress detection. Hong L. et al. [7] proposed StressSense to unobtrusively recognize stress from the human voice using smartphones. Paredes, P. et al. [8] presented initial lab evidence on the use of a computer mouse in the detection of stress. However, such applications rely on collecting one’s real-life data, which easily triggers antipathy. This makes stress detection invasive to normal life and prevents it from being widely adopted.

Research on using social media for healthcare. With the rapid spread of social networks, research on using social media data for physical and mental healthcare is also growing. Sadilek et al. [9] leverage Twitter postings to identify the spread of flu symptoms. Paul M.J. et al. [10] apply the Ailment Topic Aspect Model to over 1.5 million health-related tweets and discover correlations between behavioral risk factors and ailments. Munmun et al. [11] leverage behavioral cues from Twitter postings to predict depression before it is reported. These studies show the feasibility of harnessing social media data for developing healthcare tools. However, they mainly leverage the textual content of social media data, while other equally important content, such as images and social behavior, is ignored.

Deep learning approaches for cross-media data modeling. Micro-blog data is typical cross-media data: items may come from diverse sources and modalities, which makes such heterogeneous data difficult to handle. In recent years, extensive research on deep learning has shown the superior ability of deep neural networks (DNN) to learn features from large-scale unlabeled data [12-14]. The deep models are further extended for multimodal learning in [15, 16]. The authors of [17] design a cross-media learning method based on DNN and use the model to detect psychological states and corresponding categories from a single tweet. However, stress is a continuous state compared with instant emotions; in psychology, a stressed state can last for several days [3]. It remains a challenge to make use of aggregated cross-media data for user-level modeling.

1.3 Our Work

In this paper, we explore the potential of using social media to detect psychological stress in individuals. Micro-blogs are among the most popular publicly accessible social media. People can post text of no more than 140 characters, upload images or interact socially with others. Employing real online micro-blog data, we first investigate the correlations between users’ stress and their tweeting content, behavior patterns and social engagement. Then we define two types of stress-related attributes: 1) low-level content attributes from a single tweet, including text, images and social interactions such as comments, retweets and favorites; 2) user-scope statistical attributes derived from users’ weekly micro-blog postings, leveraging information on tweeting time, tweeting types, linguistic styles, and social engagement with friends indicated by @-mentions and @-replies. To combine low-level content attributes with user-scope statistical attributes, we further design a convolutional neural network (CNN) with cross autoencoders to learn the latent high-level attributes on cross-modal units [17][18]. Finally, we propose a deep neural network (DNN) model that incorporates the two types of user-scope attributes to detect users’ psychological stress. The experimental results on four datasets from different micro-blog platforms indicate the effectiveness and efficiency of the proposed method.

This work faces several challenges. The challenges and our corresponding contributions are:

1) Challenge 1: Micro-blog platforms contain massive data, which is infeasible to label manually. Finding effective methods to automatically label the ground truth remains a challenge.

Our solution: Inspired by previous research [19], we have built a database of stressed tweet postings, using the “I feel stressed” sentence pattern as the ground-truth label for detecting stress from micro-blog data. Testing on a small dataset labeled with psychological stress scale scores shows that our ground-truth labeling method is reliable;

2) Challenge 2: Attributes in a tweet come in multiple modalities and the components are often incomplete, which is a typical problem in cross-media data. The number of tweets in a given period also differs from person to person and from week to week. Traditional models have limited ability to extract modality-invariant attributes from such data.

Our solution: We design a convolutional neural network with cross autoencoders to aggregate low-level content attributes and generate modality-invariant user-scope attributes which support user-level stress detection;

3) Challenge 3: Modeling stress at the user level is more difficult than at the discrete tweet level, since both overview and detailed attributes must be taken into account.

Our solution: We propose a stress detection model based on DNN to incorporate content attributes and statistical attributes together. The DNN model along with the CNN forms a unified deep network which can extract attributes from single tweets and detect user-level continuous psychological stress.

  2. DATA OBSERVATION

    2.1 Observation dataset

    We first crawl 350 million tweets via Sina Weibo’s streaming APIs from October 2009 to October 2012. Then we collect tweets containing sentence patterns like “I feel stressed this week” and “I feel stressed so much this week” as the weekly stressed state label, and tweets containing “I feel relaxed” and “I feel non-stressed” as the non-stressed state label. The “I feel” pattern has been shown to be effective as a ground-truth label in emotion analysis [19]. In this way, we collect over 19,000 weeks of users’ tweets that are labeled as stressed, and over 17,000 weeks of non-stressed users’ tweets. There are 492,676 tweets from 23,304 users in total. We use this dataset, denoted DB1 in this paper, for observation and further experiments. The details of the dataset are shown in Table 1.
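
The sentence-pattern labeling described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the regular expressions are hypothetical English stand-ins for the quoted patterns (the original data consists of Chinese Sina Weibo posts), and `label_tweet` is a name introduced here.

```python
import re

# Hypothetical English stand-ins for the sentence patterns quoted in the text;
# the original work matches equivalent patterns in Chinese posts.
STRESSED_PATTERNS = [
    re.compile(r"\bI feel stressed\b.*\bthis week\b", re.IGNORECASE),
]
NON_STRESSED_PATTERNS = [
    re.compile(r"\bI feel (relaxed|non-stressed)\b", re.IGNORECASE),
]


def label_tweet(text):
    """Return 'stressed', 'non-stressed', or None when no pattern matches."""
    if any(p.search(text) for p in STRESSED_PATTERNS):
        return "stressed"
    if any(p.search(text) for p in NON_STRESSED_PATTERNS):
        return "non-stressed"
    return None
```

A tweet that matches a pattern would then label the whole week of tweets from the same user; tweets that match neither pattern stay unlabeled.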

2.2 Observation and analysis

We first conduct a series of analyses on DB1 and present some patterns related to individuals’ psychological stress as reflected by tweets. In the analysis, we randomly pick 1000 weeks of stressed and non-stressed tweets from DB1 and focus on the following aspects:

Content correlation: the difference between stressed and non-stressed tweets in content, including text and images;

Social engagement correlation: the difference between stressed and non-stressed weekly tweets in users’ social interactions with friends via @-mentions, @-replies and tweets’ comments, retweets and likes;

Behavioral correlation: the difference between stressed and non-stressed tweeting behavior in tweeting frequency, tweeting types and tweeting time.

2.2.1 Observations on content correlation

Tweets on micro-blogs mainly consist of text and images. We leverage the widely used psychological dictionary LIWC [20] to measure the most frequently occurring words in the text content of stressed and non-stressed tweets. The results are shown in Figure 1. From the figure, we observe an evident difference in text content between the stressed and non-stressed tweets. The stressed tweets contain more words from categories such as negative emotions, social, friends and family, while the non-stressed tweets contain more words from categories such as positive emotions, work, health and anxiety.

As for the image content of tweets, we consider brightness and saturation as the observed visual features. The results are shown in Figure 2(a) and Figure 2(b). From Figure 2(a), we can observe that the proportion of images with low brightness (<0.3) is clearly higher in the stressed class than in the non-stressed class, indicating that stressed users are more likely to post images with lower brightness.

As for the saturation distribution in Figure 2(b), we observe that the saturation of non-stressed users’ images is more likely to be lower (<0.5), while that of the stressed class is more likely to be in the higher range (>0.5).

2.2.2 Observations on social engagement correlation

Micro-blogs are an important platform for users to share information and interact with friends. Social interactions on micro-blogs usually consist of @-mentions, @-replies, retweets, comments and likes. We analyze the correlation between social interactions and users’ stress states.

Figure 3 shows the social interaction patterns from tweets of users in stressed and non-stressed states. The patterns are measured as the proportions of comments, likes, retweets, @-mentions and @-replies in users’ weekly tweets.

From the figure, we observe that for the non-stressed class, users’ tweets get more comments, likes and retweets from friends, indicating that people are generally more likely to interact with the users they follow when those users are in a non-stressed state. Meanwhile, compared with non-stressed weeks, the stressed weeks have fewer @-mentions and @-replies of friends. This also suggests that stressed users are less socially active than non-stressed users.

2.2.3 Observations on Behavioral Correlation

As revealed by psychology theories [1], many common symptoms may be related to stress, including insomnia and social withdrawal. These symptoms can also be reflected by changes in tweeting behavior on micro-blogs. We observe tweeting time distributions to measure users’ tweeting behavior.

Figure 4 shows the tweeting time distribution of users from the two classes. The distribution is measured as the number of tweet postings in each hour of the day. From the result, we observe that there are more stressed postings between 0:00 and 6:00 in the morning, suggesting that stressed users are more likely to suffer from insomnia.

Summary. To summarize briefly, we have the following intuitions, which will be further leveraged and incorporated in our method design:

The different content of a single tweet, including text, images and social interactions, is related to one’s stress state to some extent.

One’s stress state can be related to social engagement with friends on a weekly basis.

One’s stress state can also be related to tweeting behavior on micro-blogs.

  3. ATTRIBUTES DEFINITION

    Micro-blog data is a typical type of cross-media data, containing text, emoticons, images and social interactions. In addition, the patterns of micro-blog usage behavior over a period such as one week also contain useful information for stress detection. To leverage both the content information contained in a single cross-media tweet and the micro-blog usage behavior in weekly tweets, and guided by psychological theories, we define two sets of attributes to measure the differences between stressed and non-stressed users on micro-blogs: 1) content attributes from the content of a single tweet; 2) statistical attributes from users’ weekly tweet posting behavior.

3.1 Content Attributes

The content of a tweet from micro-blog usually consists of text, image and social interaction. We define linguistic, visual and social attributes from each part of a tweet respectively as follows:

  1. Linguistic Attributes:

    As users usually express their emotions using tweets, we measure the emotions in a single tweet using linguistic attributes. To describe the linguistic attributes, we leverage a psychological dictionary named “Linguistic Inquiry and Word Count” (LIWC) [20]. The simplified Chinese LIWC dictionary [21] was developed by Chinese psychologists and linguists based on the psycholinguistic dictionary LIWC (http://www.liwc.net), which has been shown to be effective in determining affect on Twitter. It is composed of almost 4500 words categorized into over 60 categories [20].

Based on the dictionary, we define the text content related features as the tweet’s linguistic attributes:

Positive and Negative Emotion Words (2 dimensions). Measured by the number of positive and negative emotion words in the tweet’s text, indicating how strongly positive or negative emotions are expressed in the tweet.

Positive and Negative Emoticons (2 dimensions). Measured by the number of positive and negative emoticons. Emoticons are widely used on micro-blog platforms to express users’ emotional states. We manually categorize the 129 emoticons provided by the Sina Weibo platform into positive and negative categories.

Punctuation Marks and Associated Emotion Words (4 dimensions). We use this attribute to signify the intensity of emotion in a tweet, either positive or negative according to the associated emotion words. Four typical punctuation marks (exclamation mark, question mark, dot mark and the Chinese full stop mark “。”) are considered.

Degree Adverbs and Associated Emotion Words (2 dimensions). Degree adverbs are also used to express the degree of emotions. For example, “I feel a little bit sad” and “I feel terribly sad” express different levels of negative feeling. We use values of 1 to 3 to represent neutral, moderate and severe degrees of positive expression, and the corresponding negative values to represent the negative ones.

Thus, we get a 10-dimensional vector to denote the linguistic attributes of a tweet’s text content.

  2. Visual Attributes:

    Based on previous work on affective image classification [22] and color psychology theories [23], we combine the following features as the visual middle-level representation:

Five-color theme (15 dimensions): a combination of five dominant colors in the HSV color space, representing the main color distribution of an image. It has been shown to have an important impact on human emotions according to psychology and art theories [22].

Saturation (2 dimensions): the mean value of saturation and its contrast.

Brightness (2 dimensions): the mean value of brightness and its contrast.

Warm or cool color (1 dimension): ratio of cool colors with hue ([0-360]) in the HSV space between 30 and 110.

Clear or dull color (1 dimension): ratio of colors with brightness ([0-1]) and saturation less than 0.6.

Thus, based on psychological studies and color theories, we finally get a 21-dimensional attribute vector from the tweet’s image content.
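
As a rough illustration of the scalar visual features above, the sketch below computes six of the 21 dimensions (saturation, brightness, cool-color ratio, dull-color ratio) from a non-empty list of RGB pixels. Several choices are assumptions of this sketch rather than details given in the text: contrast is taken as the standard deviation, "dull" is read as both brightness and saturation below 0.6, and the five-color theme is omitted because it needs a dominant-color extraction step that is not specified here. The function name `visual_attributes` is hypothetical.

```python
import colorsys


def visual_attributes(pixels):
    """Compute six of the 21 visual dimensions from RGB pixels in [0, 1].

    Returns [mean_sat, sat_contrast, mean_bri, bri_contrast,
             cool_ratio, dull_ratio]. Contrast is taken here as the
    standard deviation, which is an assumption of this sketch.
    """
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels]
    n = len(hsv)
    sats = [s for _, s, _ in hsv]
    bris = [v for _, _, v in hsv]

    def mean(xs):
        return sum(xs) / len(xs)

    def std(xs):
        m = mean(xs)
        return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

    # colorsys returns hue in [0, 1]; scale to degrees before thresholding.
    cool = sum(1 for h, _, _ in hsv if 30 <= h * 360 <= 110)
    # Assumed reading of "clear or dull": both saturation and brightness < 0.6.
    dull = sum(1 for _, s, v in hsv if s < 0.6 and v < 0.6)
    return [mean(sats), std(sats), mean(bris), std(bris), cool / n, dull / n]
```

In practice the pixel list would come from an image decoder; only the standard library is used here to keep the sketch self-contained.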

  3. Social Attributes:

    Besides the text and image content of a tweet, additional features such as comments, retweets and likes indicate the social attention the tweet receives from one’s friends. They can also imply one’s stress state to some degree. We use the numbers of comments, retweets and likes of a tweet as its social attributes, measuring the degree of social attention it receives. Thus, we get a 3-dimensional vector to represent the social attributes of a tweet.

3.2 Statistical Attributes

Statistical attributes are summarized from users’ tweets in a specific sampling period. We use one week as the sampling period in this paper. On the one hand, psychological stress often results from cumulative events or mental states; on the other hand, users may express their chronic stress in a series of tweets rather than in a single one. Appropriately designed statistical attributes can provide a macroscopic view of a user’s stress states and mitigate the effect of noise or missing data. We define statistical attributes from three aspects to measure the differences between stressed and non-stressed states based on users’ weekly tweet postings. The details of the statistical attributes are described as follows:

  1. Social Engagement:

    We consider 3 measures to characterize social engagement in users’ weekly tweet postings: @-mentions, @-replies and retweets from a user’s friends. These three behaviors are the most commonly used ways to interact with friends on micro-blog platforms. Unlike the social attributes of a single tweet, the social engagement attributes are measured as the numbers of @-mentions and @-replies in weekly tweet postings, indicating how actively one interacts with friends.

  2. Behavioral Attributes:

    We define a set of behavioral measures for users, including tweeting time and tweeting types, based on the weekly tweet postings. These measures are described as follows:

    Tweeting time:

    Tweeting time can indicate users’ daily routines to some extent. We consider two measures derived from the tweeting time information of tweets: tweeting frequency and tweeting time distribution. Tweeting frequency is measured as the average number of tweets posted per day, while tweeting time distribution is measured as the number of tweets posted in each hour of the day, giving a 24-dimensional vector.

Tweeting Type:

Users usually post tweets on micro-blogs with diverse motivations, so tweets take different forms. We categorize users’ tweets into four main types: 1) image tweets (tweets containing images); 2) original tweets (tweets originally posted by the users themselves); 3) information query tweets; 4) information sharing tweets (tweets that contain outside hyperlinks). We use a 4-dimensional vector of the numbers of tweets of these 4 types to represent the tweeting type attribute.
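
The behavioral attributes above (tweeting frequency, the 24-dimensional time distribution and the 4-dimensional type vector) could be assembled along these lines. This is a sketch under stated assumptions: the tweet record layout, including the field names `time`, `has_image`, `is_original`, `is_query` and `has_link`, is hypothetical, and how a tweet is recognized as an information query is not specified in the text, so it is taken as a precomputed flag.

```python
from datetime import datetime


def behavioral_attributes(tweets, days=7):
    """Build the 1 + 24 + 4 behavioral dimensions for one user-week.

    Each tweet is a dict with a 'time' (datetime) and four boolean flags;
    the field names are hypothetical.
    """
    frequency = len(tweets) / days  # average number of tweets per day

    hourly = [0] * 24  # number of tweets posted in each hour of the day
    for t in tweets:
        hourly[t["time"].hour] += 1

    type_keys = ["has_image", "is_original", "is_query", "has_link"]
    types = [sum(1 for t in tweets if t[k]) for k in type_keys]
    return [frequency] + hourly + types
```

A single tweet can count toward several types at once (for example an original tweet that also contains an image), which matches the type definitions being non-exclusive.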

  3. Linguistic Style:

    We introduce measures to characterize linguistic styles in users’ weekly tweet postings using the psychological dictionary LIWC [20]. LIWC categorizes frequently used words into more than 60 categories. We adopt 10 categories from LIWC that are related to daily life and social events: personal pronouns, home, work, money, religion, death, health, ingestion, friends and family. We extract words from users’ weekly tweet postings and use a 10-dimensional vector of the numbers of words in the 10 categories to represent the linguistic style attribute. Unlike the linguistic attributes of a single tweet, which mainly measure emotions, the linguistic style measures one’s linguistic behavior across aggregated tweets.
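
The linguistic style vector amounts to per-category word counts over a week of tweets. The sketch below shows the idea with a tiny made-up lexicon; the real LIWC word lists are not reproduced here, the category keys are labels chosen for this example, and whitespace tokenization is an assumption that suits English text only (Chinese posts would need a word segmentation step first).

```python
CATEGORIES = ["pronoun", "home", "work", "money", "religion",
              "death", "health", "ingestion", "friends", "family"]

# Toy lexicon (hypothetical); the real LIWC word lists are far larger.
TOY_LEXICON = {
    "i": "pronoun", "we": "pronoun", "office": "work", "boss": "work",
    "salary": "money", "doctor": "health", "mom": "family",
}


def linguistic_style(weekly_tweets, lexicon=TOY_LEXICON):
    """Count words per category across all of a user's tweets in one week."""
    counts = dict.fromkeys(CATEGORIES, 0)
    for text in weekly_tweets:
        for word in text.lower().split():
            category = lexicon.get(word.strip(".,!?"))
            if category is not None:
                counts[category] += 1
    return [counts[c] for c in CATEGORIES]
```

The output order follows `CATEGORIES`, so the vector has a fixed layout regardless of which words appear in a given week.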

  4. MODEL AND LEARNING

    4.1 Architecture

    As described in Section 3, we define low-level content attributes from each single tweet in tweet-scope, and statistical attributes from aggregated tweets in user-scope. In tweet-scope, we are concerned with the low-level content attributes of a single tweet as defined in Section 3.1, while in user-scope, we are concerned with one’s state as reflected by several tweets over a period. These two sets of attributes cannot be combined directly since their mathematical descriptions are not in the same domain. We therefore first need to generate latent user-scope content attributes from the low-level content attributes. After that, the two user-scope attribute sets, namely the content attributes and the statistical attributes, can be fed into a classifier for user-level stress detection.

In the following sections, we present our solution through two key components: 1) we first design a convolutional neural network with cross autoencoders to generate user-scope content attributes from low-level content attributes, so that the tweet-scope content attributes can be combined with the user-scope statistical attributes; 2) we then propose a deep neural network model that incorporates the two types of user-scope attributes for user-level psychological stress detection.
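
To make the domain mismatch concrete: a user-week holds a variable number of per-tweet content vectors but only one statistical vector, so the tweet-level vectors must be reduced to a fixed length before the two can be joined. The sketch below uses plain mean pooling as a deliberately simple stand-in for that reduction. It is not the CNN with cross autoencoders proposed in the paper, and the function names are introduced here for illustration only.

```python
def mean_pool(tweet_vectors):
    """Average per-tweet content vectors into one fixed-length vector."""
    n = len(tweet_vectors)
    dim = len(tweet_vectors[0])
    return [sum(v[i] for v in tweet_vectors) / n for i in range(dim)]


def user_week_input(tweet_vectors, statistical_vector):
    """Concatenate pooled content attributes with statistical attributes."""
    return mean_pool(tweet_vectors) + list(statistical_vector)
```

Whatever the number of tweets in a week, the result has the same length, which is the property the classifier input needs; a learned aggregation replaces the averaging step in the proposed model.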

  5. EXPERIMENTS

    5.1 Experimental setup

    Dataset. We perform our experiments on four datasets DB1-DB4 collected from three different micro-blog platforms: Sina Weibo, Tencent Weibo1, and Twitter. DB1 from Sina Weibo has the most number of tweets and users which has been described in Section 2, Table 1. The details of the other 3 datasets are shown in Table 3. The Tencent Weibo (DB3) and Twitter (DB4) are labeled using the sentence pattern method described in Section 2. Especially, to avoid the noise in data ground truth, we establish a small scale dataset DB2 from Sina Weibo. DB2 is collected from the users that have shared the score of a psychological stress scale2 with 50 items via Sina Weibo. If the resulted score is over 80, then the test subject is claimed to be stressed. We crawl the shared scores and the corresponding users’ information and weeks’ tweets. In this way, for DB2 we finally get 98 weeks of stressed tweets (scale score > 80) and 112 weeks of non-stressed tweets (scale score < 80) as a small but reliable ground truth data to further validate the reliability of the sentence pattern based ground truth labeling method.

In the following experiments, we first train and test our model on the large-scale Sina Weibo dataset DB1. We then test our model on the other three datasets to show the effectiveness of the proposed model on different data sources and different ground truth labeling methods. For all of our analyses, we use 5-fold cross validation over 10 randomized experimental runs.
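The evaluation protocol (5-fold cross validation repeated over 10 randomized runs) can be sketched as below. This is a minimal sketch using only the standard library; `evaluate` is a hypothetical callback that trains on the `train` indices and returns a score on the `test` indices, and the fold construction is one reasonable choice rather than the authors' exact procedure.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Yield (train, test) index lists for one shuffled k-fold split."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def repeated_cross_validation(n_samples, evaluate, k=5, runs=10, seed=0):
    """Average a score over `runs` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        for train, test in k_fold_indices(n_samples, k, rng):
            scores.append(evaluate(train, test))
    return sum(scores) / len(scores)


# Toy usage: the "score" is simply the fraction of samples held out.
mean_score = repeated_cross_validation(20, lambda train, test: len(test) / 20)
```

Reshuffling before each run means the 10 runs use different fold boundaries, so the averaged score is less sensitive to one particular split.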

Comparison Methods. We compare the following classification methods for user-level psychological stress detection:

Naive Bayes (NB) is a simple probabilistic classifier based on Bayes’ theorem. It computes the posterior probability of each class from the class priors and the likelihoods of the attributes, and assigns each sample to the class with the largest posterior [26].

Support Vector Machine (SVM) is a popular binary classifier that has proved effective on a wide range of classification problems. It tries to find a hyperplane that separates the training samples into their classes with maximum margin [27]. In our problem we use an SVM with an RBF kernel, which handles most nonlinear binary classification problems better.

Random Forest (RF) is an ensemble learning method that builds a set of decision trees on random subsets of attributes and bags them to produce the classification result [28].

Deep Neural Network (DNN) is the model proposed in this paper. We use a 4-layer DNN with a softmax classifier for the detection task. We also evaluate the influence of using networks of different sizes.

Measures. For a full investigation of the proposed methods, we consider the following aspects: Performance. To evaluate the detection performance of our method, we evaluate the results with Accuracy and F1-score.

By dividing user samples into stressed (positive) and non-stressed (negative) ones, detection results on the testing data can be categorized into the following classes:

True Positive (TP): stressed user sample correctly detected (true) as stressed (positive).

False Negative (FN): stressed user sample incorrectly determined (false) as non-stressed (negative).

False Positive (FP): non-stressed user sample incorrectly detected (false) as stressed (positive).

True Negative (TN): non-stressed user sample correctly determined (true) as non-stressed (negative).

Accuracy is the proportion of correct predictions (true results) among the testing samples. More formally, it is given by

Accuracy = (TP + TN) / (TP + FN + FP + TN)
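As a small worked example, both measures can be computed directly from the four counts defined above. The sketch assumes labels are encoded as 1 (stressed) and 0 (non-stressed) and uses the standard F1 definition, 2TP / (2TP + FP + FN), which is an assumption here because the F1 formula itself is not reproduced in this section; the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels: 1 = stressed, 0 = not."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    """Proportion of correct predictions among all samples."""
    return (tp + tn) / (tp + fn + fp + tn)


def f1_score(tp, fn, fp):
    """Harmonic mean of precision and recall (standard definition)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fn, fp, tn)   # (3 + 3) / 8 = 0.75
f1 = f1_score(tp, fn, fp)        # 6 / (6 + 1 + 1) = 0.75
```

Unlike accuracy, F1 ignores true negatives, so the two measures can diverge noticeably when the stressed and non-stressed classes are unbalanced.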

5.2 Detection Performance

To evaluate the effectiveness of the proposed model, we first perform a full test on the large-scale DB1 dataset from Sina Weibo. We consider working with the statistical attributes and with the content attributes extracted by the proposed CNN with CAE from one week of cross-modal tweet data, each on their own, and then using both of them together. For the pooling method, we test all three options: max pooling, mean-over-instance (MOI) pooling and mean-over-time (MOT) pooling. For a comprehensive comparison, we test SVM, RF and NB as well as the proposed DNN as classifiers in this experiment. A 4-layer DNN is used.

Table 4 shows the results of these experiments. Among the baseline classifiers, SVM reaches an accuracy of 75.62% and an F1-score of 0.8341 using both attribute types together with max pooling. RF gets similar results, with an accuracy of 76.75% and an F1-score of 0.8341. NB does not work well with statistical attributes; it gets its best result with content attributes alone using MOI pooling. The proposed DNN classifier reaches the overall best performance, with an accuracy of 78.57% and an F1-score of 0.8443: classification using both attribute types together with MOT pooling outperforms all the baselines, improving accuracy by about 3 percentage points over SVM and about 2 percentage points over RF. With a single attribute type or with other pooling methods, it also gets competitive results.

As for comparison with previous work, our results are not comparable with [17] because the goals differ. The most closely related user-level prediction work is [11], whose best result is 74% for a binary choice. Our model achieves a more compelling result of 84%.

5.4 Factor Contribution Analysis

Impact of content and statistical attributes: Table 4 also reveals the impact of the two types of attributes. With statistical or content attributes alone, all classifiers get fair results, with accuracy around 70%. When both types of attributes are used, accuracy grows by about 5%. The trend in F1-score is similar: using both types of attributes gives a better result. These results show the effectiveness of combining both classes of attributes, and also indicate that the proposed model is reliable for user-level stress detection.

Impact of pooling methods: Comparison results using max pooling, MOI pooling and MOT pooling are also shown in Table 4. MOT pooling gets a clearly better result when working with the DNN. With SVM or RF, all three methods get similar results, and max pooling is fractionally ahead of the other two. In summary, MOT is the better choice for high-performance detection.

Impact of different modalities in content attributes: Tweet content comes with multiple modalities. To evaluate the contribution of each data modality, we conduct experiments with different combinations of attributes. Since text is the necessary part of a tweet, we test using text attributes alone, using text and visual attributes, using text and social attributes, and using all attributes.

As shown in Table 5, we report the prediction performance of using content attributes (composed of only the attributes named in Table 5) alone as well as in combination with statistical attributes. Using only text attributes already gives rather high performance. Simply adding visual or social attributes can even reduce the result, especially in the case of social attributes. This trend is even more obvious when both types of attributes (content and statistical) are used. Nevertheless, using all attributes together outperforms using only text attributes. The highest detection performance is observed when using all content attributes together with the statistical attributes.

Impact of scale of data. Learning the proposed CNN attribute extraction model with CAE is a key link in the whole framework. The model is trained in an unsupervised scheme and takes advantage of large-scale unlabeled data. The DNN classifier also uses large-scale training data. We investigate the impact of data scale on training the network.

We measure the overall quality by the final detection performance. To focus the discussion on the neural network model, we evaluate both with all attributes and with content attributes only. Figure 9 shows the trend of detection performance with different proportions of training data. In our case, the size of the time series sets is the number of weeks. We pretrain with all data in DB1 (Table 1), and each filter is trained with roughly 1M patches when 100% of the data is used. The results show the advantage of using a larger training set.

Impact of size of network. The size of the network is a critical issue in setting up a DNN model. Shallow networks result in a trivial model that cannot capture the underlying correlations in the data, whereas networks that are too deep lead to an over-complex model that is difficult to tune and may suffer from problems such as over-fitting. To choose an appropriate DNN model for classification, we test DNNs with different numbers of layers.

Table 6 summarizes the experimental results. Clearly, 2 layers are not enough for the model to get a satisfactory result. The 3-layer model improves significantly, and the 4-layer model reaches the peak. The 5-layer model does not get a better result, mainly because the network is too large to be tuned to a good local minimum with the available data and within a feasible training time.

5.5 Model Efficiency

For the classification models above, we also consider their efficiency. Although the model can be trained offline, efficiency is still an important factor in evaluating an algorithm. For the DNN model, we sum the pre-training phase and the fine-tuning phase. Table 7 lists the CPU time each model takes to train with all labeled data. The results show that training the DNN takes around 5 hours, which is still reasonable given that it gets the best detection performance.

5.6 Results on Other Datasets

We further evaluate our model on the other datasets, DB2-DB4, to show that it generalizes across data sources. For this part of the experiments, we use statistical attributes together with content attributes using MOT pooling, and the 4-layer DNN model.

DB2 from Sina Weibo with PSTR label. We use a mature model trained on the large-scale Sina Weibo dataset, and then test it against another set of subjects independently sampled from Sina Weibo. For the test set, we collect weekly tweets from users who have shared their score on a 50-item psychological stress scale via Sina Weibo. The test accuracy is 74.13% and the F1-score is 0.7778, which confirms that the overall model is consistent and that the sentence-pattern-based ground truth labeling method is reliable.

DB3 from Tencent Weibo. We test on data collected from another major Chinese micro-blog platform. For this test, we use the attribute extractor trained on the large-scale Sina Weibo dataset and only fine-tune the network on the Tencent Weibo dataset in 5-fold. The accuracy is 76.78% and the F1-score is 0.7915, which demonstrates the capability of the proposed model.

DB4 from Twitter. We also test against the Twitter dataset. We again use the attribute extractor trained on the large-scale Sina Weibo dataset and only fine-tune the network on the Twitter dataset in 5-fold. The accuracy is 67.43% and the F1-score is 0.7224. One reason for this modest result is that users in the Twitter dataset and the Sina Weibo dataset come from different language and cultural backgrounds. Another factor could be that this dataset is rather small: the number of subjects in the Twitter dataset is on the order of 10% of that in the large-scale Sina Weibo dataset. Looking into the collected data, we also find that, by coincidence, none of the tweets in this dataset has any social activity. We suggest this is another cause of the unsatisfactory result.

6. CONCLUSION

In this paper, we present a user-level psychological stress detection method based on users’ weekly micro-blog data. First, we use sentence patterns like “I feel stressed” to collect ground-truth-labeled micro-blog data in units of weeks. Then we define a set of low-level content attributes from a single tweet’s text, images and social interactions. We also present a variety of statistical attributes, such as behavioral, social engagement and linguistic style attributes, drawn from users’ weekly tweet postings. A convolutional neural network with cross autoencoders is designed to aggregate the weekly low-level content attributes and generate user-scope attributes. Finally, we propose a deep neural network model to further learn higher-level attributes in user-scope and predict users’ stress. In our proposed method, the user-scope attribute extractor and the classification model form a uniform deep architecture that bridges the gap between each single tweet and the user’s psychological stress state. We test the model on four datasets from major micro-blog platforms with different scales and ground truth labeling methods, and discuss in depth the influence of model parameters on the experimental results. The results show that the proposed model is effective and efficient at detecting psychological stress from micro-blog data.
