Using IPython Notebook with Spark

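IPython Notebook, now renamed Jupyter Notebook, is an interactive notebook: a web application for creating and sharing documents that contain live code, equations, visualizations, and explanatory text. Spark ships a Python shell, pyspark, and IPython Notebook lets you work with pyspark through this more interactive notebook interface instead of the plain terminal.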

[[email protected] ~]# IPYTHON_OPTS="notebook --ip=1.2.3.4" pyspark
SPARK_MAJOR_VERSION is set to 1, using Spark
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
[I 22:28:23.204 NotebookApp] [nb_conda_kernels] enabled, 2 kernels found
[I 22:28:23.269 NotebookApp] ✓ nbpresent HTML export ENABLED
[W 22:28:23.269 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named nbbrowserpdf.exporters.pdf
[I 22:28:23.271 NotebookApp] [nb_conda] enabled
[I 22:28:23.333 NotebookApp] [nb_anacondacloud] enabled
[I 22:28:23.337 NotebookApp] Serving notebooks from local directory: /root
[I 22:28:23.337 NotebookApp] 0 active kernels 
[I 22:28:23.337 NotebookApp] The Jupyter Notebook is running at: http://1.2.3.4:8888/
[I 22:28:23.337 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 22:28:23.337 NotebookApp] No web browser found: could not locate runnable browser.
[I 22:28:34.702 NotebookApp] 302 GET / (172.31.64.222) 0.86ms
[I 22:32:28.357 NotebookApp] Creating new notebook in /Documents
[I 22:32:28.382 NotebookApp] Writing notebook-signing key to /root/.local/share/jupyter/notebook_secret
[I 22:32:36.049 NotebookApp] Kernel started: 4d304d11-f29f-456e-a9c2-c7dc30204cfd
           

The launch command is IPYTHON_OPTS="notebook --ip=1.2.3.4" pyspark, where --ip is the address the notebook server listens on.

IPython must be installed beforehand. Installing it through Anaconda is recommended.
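Before launching, it can help to confirm that the notebook tools are actually on the PATH. The following is a minimal sketch; the function name check_tools is made up for illustration and is not part of Spark or IPython.

```shell
# Hypothetical helper: report whether ipython and jupyter are on the PATH.
check_tools() {
    for tool in ipython jupyter; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool found at $(command -v "$tool")"
        else
            echo "$tool not found on PATH"
        fi
    done
}

check_tools
```

If either tool is reported missing, install it (for example through Anaconda) before starting pyspark.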

In Spark 2.0 and later, the command above fails with an error:

[xdwan[email protected] bin]$ IPYTHON_OPTS="notebook --ip=211.71.76.25" ./pyspark

Error in pyspark startup:

IPYTHON and IPYTHON_OPTS are removed in Spark 2.0+. Remove these from the environment and set PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS instead.
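The error message asks for the old variables to be removed from the environment. In the failing command above they were only set inline for one invocation, but if they were also exported from a profile script (an assumption, not something the source shows), a sketch like the following clears them in the current shell:

```shell
# Remove the variables that Spark 2.0+ rejects from the current shell.
unset IPYTHON IPYTHON_OPTS

# Confirm that neither variable is still exported.
if env | grep -E '^IPYTHON(_OPTS)?=' >/dev/null; then
    echo "IPYTHON variables are still set"
else
    echo "IPYTHON variables cleared"
fi
```

Any export IPYTHON... lines in profile files would also need to be deleted, otherwise the variables return in the next login shell.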

Add the replacement environment variables to .bashrc:

vi .bashrc

Add the following lines:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=211.71.76.25"
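As an alternative to exporting the variables globally, they can be set for a single invocation only, so that plain pyspark still opens the ordinary terminal shell. This is a sketch under that assumption; the wrapper name pyspark_notebook is hypothetical and not a Spark command.

```shell
# Hypothetical wrapper: start pyspark with Jupyter as the driver for this call only.
# Usage: pyspark_notebook IP
pyspark_notebook() {
    if [ -z "${1:-}" ]; then
        echo "usage: pyspark_notebook IP" >&2
        return 1
    fi
    # Assignments placed before a command apply to that command's environment.
    PYSPARK_DRIVER_PYTHON=jupyter \
    PYSPARK_DRIVER_PYTHON_OPTS="notebook --ip=$1" \
    pyspark
}
```

Calling pyspark_notebook 211.71.76.25 would then be equivalent to running pyspark with the two exports above in place.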
           

Open a new shell (or run source ~/.bashrc) so the variables take effect, then start pyspark again:

[[email protected] ~]$ pyspark
[I 14:33:18.032 NotebookApp] [nb_conda_kernels] enabled, 2 kernels found
[I 14:33:18.045 NotebookApp] Writing notebook server cookie secret to /home/xdwang/.local/share/jupyter/runtime/notebook_cookie_secret
[I 14:33:18.491 NotebookApp] ✓ nbpresent HTML export ENABLED
[W 14:33:18.491 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named 'nbbrowserpdf'
[I 14:33:18.922 NotebookApp] [nb_anacondacloud] enabled
[I 14:33:18.933 NotebookApp] [nb_conda] enabled
[I 14:33:18.962 NotebookApp] Serving notebooks from local directory: /home/xdwang
[I 14:33:18.962 NotebookApp] 0 active kernels 
[I 14:33:18.962 NotebookApp] The Jupyter Notebook is running at: http://211.71.76.25:8888/
[I 14:33:18.963 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 14:33:18.964 NotebookApp] No web browser found: could not locate runnable browser.
[I 14:33:44.921 NotebookApp] 302 GET / (202.205.97.62) 1.95ms
           
