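This is the third model in a series on predicting whether the broad market index will rise or fall, following the earlier posts that used an SVM and a decision tree. It covers the parameter-tuning process and a robustness check of the model.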

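After tuning, prediction accuracy averages about 90%, with a fluctuation range of roughly 10%.
Since the accuracy looked decent, I extracted the feature weights to understand why the decision tree predicts reasonably well. I then compared strategies with and without the index prediction. For ordinary strategies (for example, buying purely at random, or buying stocks priced below 4), adding the index prediction as a factor does not necessarily improve returns noticeably, but it clearly reduces risk.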

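Comparing the extracted feature importances shows that, for predicting the direction 30 trading days ahead, the most important features are the minimum and the maximum over the previous 30 trading days and the daily range. Together these can be read as a measure of how much the market has been oscillating.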

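The decision tree can therefore be considered effective for predicting the index, and it is of some positive use for quantitative trading.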

import numpy as np
import pandas as pd
from CAL.PyCAL import Date
from CAL.PyCAL import Calendar
from CAL.PyCAL import BizDayConvention
from sklearn.tree import DecisionTreeClassifier
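start = '2014-01-01'                       # backtest start date
end = '2016-12-01'                         # backtest end date
benchmark = 'HS300'                        # strategy benchmark
universe = set_universe('HS300')           # security pool; supports stocks and funds
capital_base = 100000                      # starting capital
freq = 'd'                                 # strategy type: 'd' is a daily strategy backtested on daily bars, 'm' is an intraday strategy backtested on minute bars
refresh_rate = 1                           # rebalancing frequency: interval between handle_data calls, in trading days when freq = 'd' and in minutes when freq = 'm'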

Processing the data

fields = ['tradeDate','closeIndex', 'highestIndex','lowestIndex', 'turnoverVol','CHG','CHGPct']
stock = '000300'
# tradeDate is the trading day, closeIndex the closing level, highestIndex the day's high, lowestIndex the day's low, CHG the change
index_raw = DataAPI.MktIdxdGet(ticker=stock,beginDate=u"2006-03-01",endDate=u"2015-03-01",field=fields,pandas="1")
# Fetch the fields listed above for the index from 2006-03-01 to 2015-03-01.

index_date = index_raw.set_index('tradeDate')
index_date = index_date.dropna()
index_date['max_difference'] = index_date['highestIndex'] - index_date['lowestIndex']

index_date['max_of_30day'] = None
index_date['min_of_30day'] = None
index_date['max_difference_of_30day'] = None
index_date['closeIndex_after30days'] = None
# Default the derived columns to None so that dropna can remove the incomplete rows later

for i in xrange(len(index_date)-30):
    # Fill in the derived columns
    index_date['max_of_30day'][i+30] = max(index_date['highestIndex'][i:i+30])
    # highest level over the previous 30 trading days
    index_date['min_of_30day'][i+30] = min(index_date['lowestIndex'][i:i+30])
    # lowest level over the previous 30 trading days
    index_date['max_difference_of_30day'][i+30] = max(index_date['max_difference'][i:i+30])
    # largest daily high-low range over the previous 30 trading days
    index_date['closeIndex_after30days'][i]=index_date['closeIndex'][i+30]
    # closing level 30 trading days later

index_date = index_date.dropna()   # drop the first 30 and the last 30 rows, which are incomplete
lables_raw = index_date['closeIndex_after30days'] # the raw value to be predicted
lables = index_date['closeIndex_after30days'] > index_date['closeIndex'] # classification target: is the close 30 days later above today's close?
lables_ud = lables.replace({True:'up',False:'down'}) # for readability, map True/False to 'up'/'down' (did the close rise or fall after 30 days)
features = index_date.drop(['closeIndex_after30days'],axis = 1) # remove the value being predicted from the features
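
The look-back and look-ahead columns built by the loop above can also be expressed with rolling windows. The following is only a sketch under stated assumptions: it targets Python 3 and a current pandas rather than the Python 2 environment used in this post, the helper name `add_window_features` is hypothetical, and the data is synthetic instead of coming from `DataAPI.MktIdxdGet`.

```python
import numpy as np
import pandas as pd

def add_window_features(df, window=30):
    """Return a copy of df with look-back features and the look-ahead close.

    Hypothetical helper; column names follow the ones used in this post.
    """
    out = df.copy()
    out['max_difference'] = out['highestIndex'] - out['lowestIndex']
    # rolling(window) at row t covers rows t-window+1 .. t; shift(1) moves that
    # result down one row, so row t sees rows t-window .. t-1, the same slice
    # [i:i+30] that the loop writes to row i+30.
    out['max_of_30day'] = out['highestIndex'].rolling(window).max().shift(1)
    out['min_of_30day'] = out['lowestIndex'].rolling(window).min().shift(1)
    out['max_difference_of_30day'] = out['max_difference'].rolling(window).max().shift(1)
    # shift(-window) brings the close from `window` rows later onto the current row.
    out['closeIndex_after30days'] = out['closeIndex'].shift(-window)
    # Rows without a full look-back or look-ahead window contain NaN and are dropped.
    return out.dropna()

# Synthetic index series, for illustration only.
rng = np.random.RandomState(0)
n = 100
close = 3000 + np.cumsum(rng.randn(n))
demo = pd.DataFrame({
    'closeIndex': close,
    'highestIndex': close + 5.0,
    'lowestIndex': close - 5.0,
})

features_demo = add_window_features(demo)
labels_demo = features_demo['closeIndex_after30days'] > features_demo['closeIndex']
print(features_demo.shape)
```

With 100 input rows and a 30-row window on each side, only the middle rows survive, which mirrors the "drop the first 30 and last 30" step in the loop version.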

Before any tuning, we first measure the accuracy once and get 0.88.

from sklearn import cross_validation
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(features)
features_scaler = scaler.transform(features)
# The two lines above standardize the data

X_train,X_test, y_train, y_test = cross_validation.train_test_split(features_scaler, lables, test_size = 0.2, random_state = 0)

clf_tree = DecisionTreeClassifier(random_state=0)
clf_tree.fit(X_train, y_train)
print "Prediction accuracy: %0.2f" % (clf_tree.score(X_test, y_test))

Next comes parameter tuning, starting with max_depth. I first sweep max_depth over 1 to 99 (range(1, 100)) and plot the result.

i_list = []
score_list = []
for i in range(1,100,1):
    clf_tree = DecisionTreeClassifier(max_depth = i)   # decision tree classifier used to predict up or down
    clf_tree.fit(X_train, y_train)
    i_list.append(i)
    score_list.append(clf_tree.score(X_test, y_test))

score_list_df =  pd.DataFrame({'max_depth':i_list,'score_list':score_list})
score_list_df.plot(x='max_depth' ,y='score_list',title='score change with max_depth')

[Figure: score change with max_depth]

Next is min_samples_leaf, handled the same way. Here it varies from 0.1 to just under 10, in steps of 0.1.

i_list = []
score_list = []
for i in range(1,100,1):
    i=i/10.
    clf_tree = DecisionTreeClassifier(min_samples_leaf  = i ) 
    clf_tree.fit(X_train, y_train)
    i_list.append(i)
    score_list.append(clf_tree.score(X_test, y_test))

score_list_df =  pd.DataFrame({'min_samples_leaf':i_list,'score_list':score_list})
score_list_df.plot(x='min_samples_leaf' ,y='score_list',title='score change with min_samples_leaf')

[Figure: score change with min_samples_leaf]

Next is min_samples_split, swept over 2 to 49 (range(2, 50)).

i_list = []
score_list = []

min_samples  =  range(2,50,1)
for i in min_samples :
    clf_tree = DecisionTreeClassifier(min_samples_split  = i ) 
    clf_tree.fit(X_train, y_train)
    i_list.append(i)
    score_list.append(clf_tree.score(X_test, y_test))

score_list_df =  pd.DataFrame({'min_samples_split':i_list,'score_list':score_list})
score_list_df.plot(x='min_samples_split' ,y='score_list',title='score change with min_samples_split')

[Figure: score change with min_samples_split]

Next is min_weight_fraction_leaf; see the plot. Adjusting this parameter turns out to have little meaningful effect on the model.

i_list = []
score_list = []

min_weight  =  [x /1000. for x in range(1,500,1)]
for i in min_weight :
    clf_tree = DecisionTreeClassifier(min_weight_fraction_leaf  = i ) 
    clf_tree.fit(X_train, y_train)
    i_list.append(i)
    score_list.append(clf_tree.score(X_test, y_test))

score_list_df =  pd.DataFrame({'min_weight_fraction_leaf':i_list,'score_list':score_list})
score_list_df.plot(x='min_weight_fraction_leaf' ,y='score_list',title='score change with min_weight')

[Figure: score change with min_weight]

Now that we know the rough optimal range of each parameter, we use GridSearchCV to find the best combination within those ranges.

from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import ShuffleSplit
params = {'max_depth':range(1,50,1),'min_samples_split':range(2,50,1)}

X_train,X_test, y_train, y_test = cross_validation.train_test_split(features_scaler, lables, test_size = 0.2, random_state = 0)

clf_tree = DecisionTreeClassifier() 

# cv_sets = ShuffleSplit(X_train.shape[0], n_iter = 10, test_size = 0.20, random_state = 0)

grid = GridSearchCV(clf_tree, params)
grid = grid.fit(X_train, y_train)
print grid.best_estimator_
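
The `sklearn.grid_search` and `sklearn.cross_validation` modules used above were removed in scikit-learn 0.20; in current releases the same tools live in `sklearn.model_selection`. The sketch below shows the equivalent search on that newer API. It is an illustration only: it assumes Python 3, uses synthetic data in place of `features_scaler` and `lables`, and searches a deliberately small grid so that it runs quickly.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, ShuffleSplit, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the standardized features and up/down labels.
rng = np.random.RandomState(0)
X_demo = rng.randn(300, 4)
y_demo = X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

# A much smaller grid than the one in the post, to keep the example fast.
params = {
    'max_depth': list(range(1, 8)),
    'min_samples_split': list(range(2, 10, 2)),
}

# Repeated random train/validation splits drawn from the training portion only.
cv_sets = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

grid = GridSearchCV(DecisionTreeClassifier(random_state=0), params, cv=cv_sets)
grid.fit(X_tr, y_tr)

print(grid.best_params_)
# score() uses the best estimator, refitted on the whole training portion.
print("held-out accuracy: %0.2f" % grid.score(X_te, y_te))
```

Passing an explicit `cv` object plays the role of the commented-out `cv_sets` line above; without it, GridSearchCV falls back to its default k-fold scheme.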

Then we recompute the accuracy using the best parameters found.

from sklearn import cross_validation
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(features)
features_scaler = scaler.transform(features)
# The two lines above standardize the data

X_train,X_test, y_train, y_test = cross_validation.train_test_split(features_scaler, lables, test_size = 0.2, random_state = 0)

clf_tree = DecisionTreeClassifier(max_depth = 32 , min_samples_split=3 )   
clf_tree.fit(X_train, y_train)
print "Prediction accuracy: %0.2f" % (clf_tree.score(X_test, y_test))

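Next we retrieve the feature importances to see which features influence the decision tree most.

The table (and bar chart) below shows that the most influential features are the 30-day minimum, the 30-day maximum and the largest daily range within the 30 days, followed by the day's trading volume. Put simply, the size of the swings and the largest oscillation over the previous 30 days reflect, to some extent, the direction of the market 30 days later.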

[Figure: feature importance table and bar chart]
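
The code that produced the importance table is not shown in this post. As a rough sketch of how such a table can be obtained, a fitted scikit-learn tree exposes `feature_importances_`, which can be paired with the column names. The data below is synthetic and the three column names are borrowed from this post purely for illustration; Python 3 is assumed.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic features; by construction the label depends only on 'min_of_30day'.
rng = np.random.RandomState(0)
demo_features = pd.DataFrame(
    rng.randn(400, 3),
    columns=['min_of_30day', 'max_of_30day', 'turnoverVol'])
demo_labels = demo_features['min_of_30day'] > 0

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(demo_features, demo_labels)

# feature_importances_ is normalized to sum to 1; pair it with the column
# names and sort so that the most influential feature comes first.
importances = pd.Series(tree.feature_importances_, index=demo_features.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Applied to the model in this post, the analogous call would presumably be `pd.Series(clf_tree.feature_importances_, index=features.columns)`, and `importances.plot(kind='bar')` would draw the bar chart.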

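To judge whether the model is robust, we keep changing the training set and plot how the accuracy fluctuates as it changes. The data set starts at 1000 samples and grows from there (described as 1000 to 2500 samples in steps of 10; the loop below adds 2 samples per iteration).

The plot shows a fluctuation range of about 0.1 around a mean of about 0.88, so the model looks reasonably robust, more or less.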

num_list = []
score_list = []
for i in xrange((len(features_scaler)-1000)/2):
    num_now = len(features_scaler)%2 + 2*i +1000
    X_train,X_test, y_train, y_test = cross_validation.train_test_split(features_scaler[:num_now], lables[:num_now], test_size = 0.2, random_state = 0)
    clf_tree = DecisionTreeClassifier( max_depth = 32 , min_samples_split=3 )   # decision tree classifier used to predict up or down
    clf_tree.fit(X_train, y_train)
    num_list.append(num_now)
    score_list.append(clf_tree.score(X_test, y_test))
    
score_list_df =  pd.DataFrame({'sets_num':num_list,'accuracy':score_list})
score_list_df.plot(x='sets_num' ,y='accuracy',title='Accuracy with sets')

[Figure: Accuracy with sets]
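
The mean and the fluctuation range quoted above are read off the plot. A small helper can compute the same two numbers directly from a list such as `score_list`. This is a minimal sketch: the helper name `summarize_scores` is hypothetical, and the scores below are made-up values for illustration, not results from this post.

```python
def summarize_scores(scores):
    """Return (mean, spread), where spread is the maximum minus the minimum."""
    scores = list(scores)
    mean = sum(scores) / float(len(scores))
    spread = max(scores) - min(scores)
    return mean, spread

# Made-up accuracy values, for illustration only.
demo_scores = [0.85, 0.90, 0.88, 0.93, 0.84]

mean_acc, spread_acc = summarize_scores(demo_scores)
print("mean accuracy: %0.2f, spread: %0.2f" % (mean_acc, spread_acc))
```

Calling `summarize_scores(score_list)` after the loop above would give the corresponding figures for the real run.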

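Next is the control group used for comparison: a purely random strategy (no risk control; it simply buys at random and sells at 1.20 times the cost).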

import random
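start = '2016-01-01'                       # backtest start date
end = '2016-12-01'                         # backtest end date
benchmark = 'HS300'                        # strategy benchmark
universe = set_universe('HS300')           # security pool; supports stocks and funds
capital_base = 100000                      # starting capital
freq = 'd'                                 # strategy type: 'd' is a daily strategy backtested on daily bars, 'm' is an intraday strategy backtested on minute bars
refresh_rate = 1                           # rebalancing frequency: interval between handle_data calls, in trading days when freq = 'd' and in minutes when freq = 'm'

def initialize(account):                   # initialize the virtual account state
    pass

features_list = []
def handle_data(account):
    random.shuffle(account.universe)       # shuffle the stock pool so that buying is random
    for stock in account.universe:         # stocks in the pool; Uqer (優礦) automatically excludes stocks suspended or delisted that day
        p = account.reference_price[stock]        # previous day's closing price
        cost = account.security_cost.get(stock)  # average holding cost of the stock
        if not cost:                           # the stock is not currently held
            order_pct_to(stock, 0.10)          # buy so that the position is 10% of the virtual account's value
        elif cost and p >= cost * 1.20:        # sell condition: the price has reached 1.20 times the cost
            order_to(stock, 0)        # sell down to 0 shares, i.e. close the position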

[Figure: backtest result of the pure random strategy]

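Then, on top of the pure random strategy, we add only the index prediction: if the index is predicted to rise, buy at random; otherwise do not buy. Compared with the pure random strategy, it is indeed a little better.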

import random
start = '2016-01-01'                       # backtest start date
end = '2016-12-15'                         # backtest end date
benchmark = 'HS300'                        # strategy benchmark
universe = set_universe('HS300')           # security pool; supports stocks and funds
capital_base = 100000                      # starting capital
freq = 'd'                                 # strategy type: 'd' is a daily strategy backtested on daily bars, 'm' is an intraday strategy backtested on minute bars
refresh_rate = 1                           # rebalancing frequency: interval between handle_data calls, in trading days when freq = 'd' and in minutes when freq = 'm'
stock = '000300' # index to predict: the CSI 300, consistent with the benchmark
fields = ['tradeDate','closeIndex', 'highestIndex','lowestIndex', 'turnoverVol','CHG','CHGPct']
# tradeDate is the trading day, closeIndex the closing level, highestIndex the day's high, lowestIndex the day's low, CHG the change

def initialize(account):                   # initialize the virtual account state
    pass

features_list = []
def handle_data(account):
    # build the buy list
    last_date = account.previous_date.strftime("%Y-%m-%d") # previous trading day, formatted
    begin_date = pd.date_range(end=last_date,periods=60)[0] # 60 calendar days back, intended to cover at least 30 trading days
    begin_date = begin_date.strftime("%Y-%m-%d") # format the date
    to_class = DataAPI.MktIdxdGet(ticker='000300',beginDate=begin_date,endDate=last_date,field=fields,pandas="1")
    to_class = to_class.dropna()
    to_class = to_class[-30:] # keep the most recent 30 trading days of index data
    to_class_date = to_class.set_index('tradeDate')
    to_class_date['max_difference'] = to_class_date['highestIndex'] - to_class_date['lowestIndex']

    to_class_date_max_of_30day = max(to_class_date['highestIndex'])
    # highest level over the previous 30 trading days
    to_class_date_min_of_30day = min(to_class_date['lowestIndex'])
    # lowest level over the previous 30 trading days
    to_class_date_max_difference_of_30day = max(to_class_date['max_difference'])
    # largest daily high-low range over the previous 30 trading days

    features_for_predict = to_class_date[-1:].copy()   # copy so the new columns are not written to a slice
    features_for_predict['max_of_30day'] = to_class_date_max_of_30day
    features_for_predict['min_of_30day'] = to_class_date_min_of_30day
    features_for_predict['max_difference_of_30day'] = to_class_date_max_difference_of_30day

    features_fp_scaler = scaler.transform(features_for_predict)
    predict_up = clf_tree.predict(features_fp_scaler)[0]
    # predict_up: is the close 30 trading days from now predicted to be higher?

    random.shuffle(account.universe)
    for stock in account.universe:         # stocks in the pool; Uqer (優礦) automatically excludes stocks suspended or delisted that day
        p = account.reference_price[stock]        # previous day's closing price
        cost = account.security_cost.get(stock)  # average holding cost of the stock
        if predict_up and not cost:                        # the index is predicted to rise and the stock is not currently held
            order_pct_to(stock, 0.10)          # buy so that the position is 10% of the virtual account's value
        elif cost and p >= cost * 1.20:        # sell condition: the price has reached 1.20 times the cost
            order_to(stock, 0)        # sell down to 0 shares, i.e. close the position
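
The only change from the control strategy is the extra `predict_up` condition on the buy branch. To make that rule easy to check outside the backtest engine, it can be pulled into a small pure function. This is a sketch for illustration: the function name `decide` and its `take_profit` parameter are hypothetical and are not part of the Uqer API.

```python
def decide(predict_up, cost, price, take_profit=1.20):
    """Return 'buy', 'sell' or 'hold' for a single stock.

    predict_up  -- True if the index is predicted to be higher in 30 trading days
    cost        -- average holding cost, or None/0 if the stock is not held
    price       -- reference price (previous close)
    take_profit -- sell once price reaches cost * take_profit
    """
    if predict_up and not cost:
        return 'buy'       # not held, and the index is predicted to rise
    if cost and price >= cost * take_profit:
        return 'sell'      # held, and the profit target has been reached
    return 'hold'          # otherwise do nothing


# A few cases covering each branch.
print(decide(True, None, 10.0))    # not held, rise predicted
print(decide(False, None, 10.0))   # not held, fall predicted
print(decide(True, 10.0, 12.5))    # held, target reached
print(decide(False, 10.0, 11.0))   # held, target not reached
```

Note that the prediction only gates new purchases; an existing position is still sold by the profit target alone, exactly as in the control strategy.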

[Figure: backtest result of the random strategy with index prediction]

# Low-price strategy: buy stocks priced below 4
import random
start = '2015-01-01'                       # backtest start date
end = '2016-12-15'                         # backtest end date
benchmark = 'HS300'                        # strategy benchmark
universe = set_universe('HS300')           # security pool; supports stocks and funds
capital_base = 100000                      # starting capital
freq = 'd'                                 # strategy type: 'd' is a daily strategy backtested on daily bars, 'm' is an intraday strategy backtested on minute bars
refresh_rate = 1                           # rebalancing frequency: interval between handle_data calls, in trading days when freq = 'd' and in minutes when freq = 'm'
stock = '000300' # index to predict: the CSI 300, consistent with the benchmark
fields = ['tradeDate','closeIndex', 'highestIndex','lowestIndex', 'turnoverVol','CHG','CHGPct']
# tradeDate is the trading day, closeIndex the closing level, highestIndex the day's high, lowestIndex the day's low, CHG the change

def initialize(account):                   # initialize the virtual account state
    pass

features_list = []
def handle_data(account):
    # build the buy list
    last_date = account.previous_date.strftime("%Y-%m-%d") # previous trading day, formatted
    begin_date = pd.date_range(end=last_date,periods=60)[0] # 60 calendar days back, intended to cover at least 30 trading days
    begin_date = begin_date.strftime("%Y-%m-%d") # format the date
    to_class = DataAPI.MktIdxdGet(ticker='000300',beginDate=begin_date,endDate=last_date,field=fields,pandas="1")
    to_class = to_class.dropna()
    to_class = to_class[-30:] # keep the most recent 30 trading days of index data
    to_class_date = to_class.set_index('tradeDate')
    to_class_date['max_difference'] = to_class_date['highestIndex'] - to_class_date['lowestIndex']

    to_class_date_max_of_30day = max(to_class_date['highestIndex'])
    # highest level over the previous 30 trading days
    to_class_date_min_of_30day = min(to_class_date['lowestIndex'])
    # lowest level over the previous 30 trading days
    to_class_date_max_difference_of_30day = max(to_class_date['max_difference'])
    # largest daily high-low range over the previous 30 trading days

    features_for_predict = to_class_date[-1:].copy()   # copy so the new columns are not written to a slice
    features_for_predict['max_of_30day'] = to_class_date_max_of_30day
    features_for_predict['min_of_30day'] = to_class_date_min_of_30day
    features_for_predict['max_difference_of_30day'] = to_class_date_max_difference_of_30day

    features_fp_scaler = scaler.transform(features_for_predict)
    predict_up = clf_tree.predict(features_fp_scaler)
    # predict_up: is the close 30 trading days from now predicted to be higher?
    # Note: predict_up is computed here but not used in the buy condition below.

    random.shuffle(account.universe)
    for stock in account.universe:         # stocks in the pool; Uqer (優礦) automatically excludes stocks suspended or delisted that day
        p = account.reference_price[stock]        # previous day's closing price
        cost = account.security_cost.get(stock)  # average holding cost of the stock
        if p<4 and  not cost:                        # the price is below 4 and the stock is not currently held
            order_pct_to(stock, 0.10)          # buy so that the position is 10% of the virtual account's value
        elif cost and p >= cost * 1.20:        # sell condition: the price has reached 1.20 times the cost
            order_to(stock, 0)        # sell down to 0 shares, i.e. close the position

[Figure: backtest result of the strategy above]
