Introduction to the Handwritten Digits dataset
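According to the official description, the complete handwritten digit image data comes as two sets: 3823 training samples and 1797 test samples. Each image is represented as an 8x8 pixel matrix, giving 64 pixel attributes, plus 1 target attribute that labels the digit class of each sample. The data has no missing feature values, and the digit classes are sampled almost evenly in both the training and the test set, so it is a very tidy dataset.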
We used preprocessing programs made available by NIST to extract normalized bitmaps of handwritten digits from a preprinted form. From a total of 43 people, 30 contributed to the training set and a different 13 to the test set. The 32x32 bitmaps are divided into nonoverlapping blocks of 4x4, and the number of on pixels is counted in each block. This generates an input matrix of 8x8 where each element is an integer in the range 0..16. This reduces dimensionality and gives invariance to small distortions.
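The block-counting step described above can be sketched in a few lines. This is only an illustration of the idea, not the NIST preprocessing program: the function name `block_counts` and the sample bitmap are made up for this example.

```python
def block_counts(bitmap, block=4):
    """Downsample a square binary bitmap by counting 'on' pixels per block."""
    size = len(bitmap)
    if size % block != 0 or any(len(row) != size for row in bitmap):
        raise ValueError("bitmap must be square with a side divisible by block")
    cells = size // block
    out = [[0] * cells for _ in range(cells)]
    for r in range(size):
        for c in range(size):
            if bitmap[r][c]:
                # Pixel (r, c) belongs to block (r // block, c // block).
                out[r // block][c // block] += 1
    return out


# Hypothetical 32x32 bitmap: a filled square over rows and columns 4..11,
# which exactly covers the four blocks (1,1), (1,2), (2,1) and (2,2).
bitmap = [[1 if 4 <= r < 12 and 4 <= c < 12 else 0 for c in range(32)]
          for r in range(32)]
matrix = block_counts(bitmap)                        # 8x8 matrix, values 0..16
features = [value for row in matrix for value in row]  # 64 input attributes
```

A fully covered 4x4 block yields 16 and an empty one yields 0, which is why every attribute falls in the range 0..16.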
Number of Instances:
optdigits.tra Training 3823
optdigits.tes Testing 1797
The way we used the dataset was to use half of training for actual training, one-fourth for validation and one-fourth for writer-dependent testing. The test set was used for writer-independent testing and is the actual quality measure.
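As a rough sketch of the half / one-fourth / one-fourth split of the training file: the description does not say whether the rows were shuffled or how the odd total was rounded, so the contiguous slicing and floor rounding below are assumptions, and `split_training` is a hypothetical helper.

```python
def split_training(n_samples):
    """Split row indices into train / validation / writer-dependent test."""
    half = n_samples // 2       # assumption: round down
    quarter = n_samples // 4    # assumption: round down
    indices = list(range(n_samples))
    train = indices[:half]
    validation = indices[half:half + quarter]
    writer_dependent_test = indices[half + quarter:]  # takes the remainder
    return train, validation, writer_dependent_test


train_idx, val_idx, wd_test_idx = split_training(3823)
print(len(train_idx), len(val_idx), len(wd_test_idx))  # prints: 1911 955 957
```

The 1797 rows of optdigits.tes stay untouched by this split and serve as the writer-independent test set.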
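6. Number of Attributes: 64 input + 1 class attribute
7. For Each Attribute: All input attributes are integers in the range 0..16. The last attribute is the class code 0..9
8. Missing Attribute Values: None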
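The attribute layout can be checked record by record. The sketch below assumes each record is one line of 65 comma-separated integers (64 inputs followed by the class code); `parse_record` and the example line are hypothetical and the line is not taken from the dataset.

```python
def parse_record(line):
    """Split one comma-separated record into (64 input attributes, class code)."""
    values = [int(field) for field in line.strip().split(",")]
    if len(values) != 65:
        raise ValueError(f"expected 65 values, got {len(values)}")
    inputs, label = values[:64], values[64]
    if not all(0 <= v <= 16 for v in inputs):
        raise ValueError("input attributes must lie in 0..16")
    if not 0 <= label <= 9:
        raise ValueError("class code must lie in 0..9")
    return inputs, label


# Made-up record for illustration only.
example_line = ",".join(["0"] * 32 + ["16"] * 32 + ["7"])
inputs, label = parse_record(example_line)
```

Raising on out-of-range values is a design choice for this sketch; since the data is described as having no missing values, any failure here would point at a corrupted or wrongly formatted file.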
Content reproduced from: Optical Recognition of Handwritten Digits
9. Class Distribution
Class: No of examples in training set
0: 376
1: 389
2: 380
3: 389
4: 387
5: 376
6: 377
7: 387
8: 380
9: 382
Class: No of examples in testing set
0: 178
1: 182
2: 177
3: 183
4: 181
5: 182
6: 181
7: 179
8: 174
9: 180
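To put a number on how evenly the classes are sampled, the counts can be turned into shares and a max/min ratio. In this sketch the class 9 test count is computed from the stated total of 1797 rather than typed in; the helper names are hypothetical.

```python
TRAIN_COUNTS = {0: 376, 1: 389, 2: 380, 3: 389, 4: 387,
                5: 376, 6: 377, 7: 387, 8: 380, 9: 382}
TEST_COUNTS = {0: 178, 1: 182, 2: 177, 3: 183, 4: 181,
               5: 182, 6: 181, 7: 179, 8: 174}
# Class 9 is whatever remains of the 1797 test instances.
TEST_COUNTS[9] = 1797 - sum(TEST_COUNTS.values())


def class_shares(counts):
    """Fraction of all examples that falls into each class."""
    total = sum(counts.values())
    return {digit: n / total for digit, n in counts.items()}


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size (1.0 = perfectly even)."""
    return max(counts.values()) / min(counts.values())


for name, counts in (("train", TRAIN_COUNTS), ("test", TEST_COUNTS)):
    shares = class_shares(counts)
    print(name, sum(counts.values()),
          f"min share {min(shares.values()):.3f}",
          f"max share {max(shares.values()):.3f}",
          f"ratio {imbalance_ratio(counts):.3f}")
```

Every class holds close to one tenth of the examples in both sets, which supports treating the data as balanced.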
Installing the Handwritten Digits dataset
Click the corresponding data file to download it.
Dataset download:
https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/
Training set URL:
https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra
Using the Handwritten Digits dataset
Two versions of this database are available.
1) Preprocessed data can be found in optdigits.tra and optdigits.tes
See optdigits.names for information regarding the preprocessing.
2) The original format of the data can be found in files prefixed with
optdigits-orig.
Cathy Blake
Sept 3,1998
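A minimal sketch of loading the preprocessed files with the Python standard library follows. It assumes the files are plain comma-separated text with the class code in the last column, and that the directory URL given above is still reachable; `read_optdigits` and `fetch_optdigits` are hypothetical helper names. The demonstration at the bottom runs offline on made-up rows.

```python
import csv
import io
import urllib.request

BASE_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/"


def read_optdigits(text_stream):
    """Read comma-separated rows into (features, labels) lists."""
    features, labels = [], []
    for row in csv.reader(text_stream):
        if not row:  # skip blank lines
            continue
        values = [int(v) for v in row]
        features.append(values[:-1])  # all columns except the last
        labels.append(values[-1])     # last column is the class code
    return features, labels


def fetch_optdigits(filename):
    """Download one file, e.g. 'optdigits.tra', and parse it (needs network)."""
    with urllib.request.urlopen(BASE_URL + filename) as response:
        text = response.read().decode("ascii")
    return read_optdigits(io.StringIO(text))


# Offline demonstration on two made-up rows in the assumed layout.
sample_rows = "\n".join(",".join(["0"] * 64 + [str(d)]) for d in (3, 8))
X, y = read_optdigits(io.StringIO(sample_rows))
```

With network access, `fetch_optdigits("optdigits.tra")` and `fetch_optdigits("optdigits.tes")` would be expected to return 3823 and 1797 rows respectively, matching the instance counts quoted earlier.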