
Summary of problems encountered when installing scikit-learn for Python

scikit-learn is an open-source Python toolkit for machine learning. Home page: http://scikit-learn.org/stable/index.html

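The package implements the classic machine learning algorithms in Python, which makes it good material for learning machine learning by combining theory with practice. Installing scikit-learn, however, produced all sorts of odd problems, so here is a summary.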

To make installing Python packages easier later on, you can first install Python's easy_install.

Download and install the Python installation tool

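I installed it under D:\pytho27\Scripts. You can add this path to the PATH environment variable so that it can be called directly from cmd, as in the figure below: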

[Screenshot: the Scripts directory added to PATH]

To check whether the installation succeeded, see the figure below:

[Screenshot: verifying the easy_install installation from cmd]
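The same check can also be done from Python itself. This is only a minimal sketch, assuming Python 3.3 or newer (shutil.which does not exist on the Python 2.7 used in this post); the helper name find_tool is made up for illustration.

```python
import shutil


def find_tool(name="easy_install"):
    """Return the full path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        # Most likely the Scripts directory has not been added to PATH yet.
        print("easy_install is not on PATH; check the Scripts directory setting")
    else:
        print("easy_install found at", path)
```

If this prints a path, cmd should be able to call easy_install directly as well.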

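With easy_install installed, installing Python libraries is simple. From then on, whenever you need a Python library, run the following on the command line:

easy_install + the name of the Python library, for example: easy_install numpy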

scikit-learn requires the following packages or tools:

  • Python (>= 2.6 or >= 3.3),
  • NumPy (>= 1.6.1),
  • SciPy (>= 0.9).
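Before installing scikit-learn it can help to confirm that the installed NumPy and SciPy meet these minimums. The sketch below is one way to do that; the helper names parse_version and check_dependencies are hypothetical, and the version parsing is deliberately simple (it reads only the leading numeric parts of a version string).

```python
import importlib

# Minimum versions copied from the requirement list above.
MINIMUMS = {"numpy": "1.6.1", "scipy": "0.9"}


def parse_version(text):
    """Turn '1.6.1' or '1.8.0rc1' into a tuple of ints such as (1, 6, 1)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def check_dependencies(minimums=MINIMUMS):
    """Return {package: 'ok' | 'too old' | 'missing'} for each requirement."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        installed = getattr(module, "__version__", "0")
        if parse_version(installed) >= parse_version(minimum):
            report[name] = "ok"
        else:
            report[name] = "too old"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print(name, status)
```

A package reported as "missing" or "too old" should be installed or upgraded first, for example with easy_install numpy.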

But after installing, I ran into the following errors:

I cannot import datetime from a python script,

ValueError: numpy.ufunc has the wrong size, try recompiling

Later I came across http://stackoverflow.com/questions/17709641/valueerror-numpy-dtype-has-the-wrong-size-try-recompiling

Numpy developers follow in general a policy of keeping a backward compatible binary interface (ABI). However, the ABI is not forward compatible.

What that means:

A package that uses numpy in a compiled extension is compiled against a specific version of numpy. Future versions of numpy will be compatible with the compiled extension of the package (for exceptions see below). Distributors of those other packages do not need to recompile their package against a newer version of numpy, and users do not need to update these other packages when they update to a newer version of numpy.

However, this does not go in the other direction. If a package is compiled against a specific numpy version, say 1.7, then there is no guarantee that the binaries of that package will work with older numpy versions, say 1.6, and very often or most of the time they will not.

The binary distribution of packages like pandas and statsmodels, that are compiled against a recent version of numpy, will not work when an older version of numpy is installed. Some packages, for example matplotlib, if I remember correctly, compile their extensions against the oldest numpy version that they support. In this case, users with the same old or any more recent version of numpy can use those binaries.

The error message in the question is a typical result of binary incompatibilities.

The solution is to get a binary compatible version, either by updating numpy to at least the version against which pandas or statsmodels were compiled, or to recompile pandas and statsmodels against the older version of numpy that is already installed.

Breaking the ABI backward compatibility:

Sometimes improvements or refactorings in numpy break ABI backward compatibility. This happened (unintentionally) with numpy 1.4.0. As a consequence, users that updated numpy to 1.4.0, had binary incompatibilities with all other compiled packages, that were compiled against a previous version of numpy. This requires that all packages with binary extensions that use numpy have to be recompiled to work with the ABI incompatible version.

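The gist is that my numpy version did not match my scikit-learn version. So I uninstalled numpy and tried versions from numpy 1.6 up to 1.8, and found that the conflict disappeared once 1.8 was installed. What a painful installation. I recommend going straight to a bundled environment such as WinPython, which keeps configuration simple and matches up the package versions for you.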
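When trying one numpy version after another like this, a small script can tell a missing package apart from the binary mismatch described above. This is a rough sketch: the function name diagnose_import is made up, and matching on the text "wrong size" is an assumption based on the error message quoted earlier; other numpy versions may word the error differently.

```python
import importlib


def diagnose_import(name):
    """Try to import `name` and classify the result as a short message."""
    try:
        importlib.import_module(name)
    except ImportError as exc:
        return "not installed: %s" % exc
    except ValueError as exc:
        # e.g. "numpy.ufunc has the wrong size, try recompiling"
        if "wrong size" in str(exc):
            return "binary mismatch with numpy: %s" % exc
        raise  # some other ValueError; do not hide it
    return "ok"


if __name__ == "__main__":
    for package in ("numpy", "scipy", "sklearn"):
        print(package, "->", diagnose_import(package))
```

A "binary mismatch" result for sklearn suggests upgrading numpy (or reinstalling scikit-learn built against the installed numpy) rather than reinstalling everything.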

A simpler installation process on Windows

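Microsoft really is the hope of humanity. On Windows, installing scikit only requires installing one "all-in-one bundle" (Cocoa's name for it) to get all the dependencies installed. The steps are as follows: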

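  1. Install Python 2.7.6: download link. If you have no particular requirements, Python 2 is fine. Note, though, that there are separate 64-bit and 32-bit builds.
  2. Install the all-in-one bundle: download link. It contains all the libraries scikit needs, with builds for Python 2 and Python 3 in both 64-bit and 32-bit, which is really convenient.
  3. Install scikit: download link
  4. Done.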