How to start Anaconda (Data Science Python toolset) on Arch Linux

data_science
python
linux

Anaconda - what?

Anaconda is a pre-packaged Python development environment that bundles many useful packages for data science tasks. If you like IPython-style data science tools such as Google DataLab, you might also like Anaconda. Compiling hundreds of Python modules for your tools on each and every OS environment can be quite time-consuming. Instead, you can use a unified Python distribution like Anaconda.

It's another Python

Anaconda ships its own interpreter, so you have to switch to it by activating its environment, which works much like a virtualenv:

➜ cd /opt/anaconda2 
➜  source bin/activate
(root) ➜  anaconda2 python
Python 2.7.11 |Anaconda custom (64-bit)| (default, Jun 15 2016, 15:21:30) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> 

Now this looks good:

  • 64-bit Python 2.7
  • supported build
  • all from the AUR
  • In case you have a support issue, you are on RHEL of course (see the GCC line in the banner) :)
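If you want to confirm these points from inside the interpreter rather than from the banner, a minimal sketch could look like this. The helper name `describe_interpreter` is made up for this post, and checking for the word "Anaconda" in `sys.version` is only an assumption based on the banner above; other builds may not include it.

```python
from __future__ import print_function

import platform
import sys


def describe_interpreter():
    """Collect the facts that the interpreter banner shows (hypothetical helper)."""
    return {
        # Path of the running interpreter; should point into /opt/anaconda2
        # once the environment is activated.
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        # A 64-bit build can address more than 2**32 items.
        "is_64bit": sys.maxsize > 2 ** 32,
        # Assumption: the Anaconda build mentions itself in sys.version,
        # as in the banner shown in this post.
        "banner_mentions_anaconda": "Anaconda" in sys.version,
        "implementation": platform.python_implementation(),
    }


for key, value in sorted(describe_interpreter().items()):
    print("%-26s %s" % (key, value))
```

Run it once before and once after `source bin/activate`; the `executable` line is the one that should change.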

Please note that root here does not refer to the Linux system user. It is the name of the default conda environment, which is active because we have not created a separate conda environment for our tasks. You would create one if you needed to extend the pre-distributed modules, which is a different topic at this point.
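To see which environment name the prompt prefix refers to, you can ask the process environment. This is a small sketch, assuming that the activate script exports the name in the `CONDA_DEFAULT_ENV` variable; the helper `active_conda_env` is a name invented here.

```python
from __future__ import print_function

import os
import sys


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if none is set.

    Assumption: the activate script exports CONDA_DEFAULT_ENV.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


print("conda environment:", active_conda_env())
print("interpreter prefix:", sys.prefix)
```

The environment name is unrelated to the Unix user; you can compare it with the output of `whoami` to convince yourself.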

Start the navigator

(root) ➜  anaconda2 pwd
/opt/anaconda2
(root) ➜  anaconda2 bin/anaconda-navigator 

The result should be similar to this:

Starting the Navigator from a shell inside the activated environment is probably unexpected, but it makes sense, and it is how it works. There might be desktop menu entries for it, but these won’t work.
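If you would rather not type the shell steps each time, a small launcher script is one option. This is only a sketch: `/opt/anaconda2` is the install location from this post, the function names are made up, and prepending `bin` to `PATH` is my approximation of what activation does, not a full replacement for it.

```python
import os
import subprocess

ANACONDA_PREFIX = "/opt/anaconda2"  # install location used in this post


def navigator_path(prefix=ANACONDA_PREFIX):
    """Build the path to the Navigator executable below the given prefix."""
    return os.path.join(prefix, "bin", "anaconda-navigator")


def launch_navigator(prefix=ANACONDA_PREFIX):
    """Start the Navigator with Anaconda's bin directory first on PATH."""
    path = navigator_path(prefix)
    if not os.path.isfile(path):
        raise IOError("anaconda-navigator not found at %s" % path)
    env = dict(os.environ)
    # Assumption: putting bin/ first on PATH is close enough to activation
    # for the Navigator to find Anaconda's own python and tools.
    env["PATH"] = os.path.join(prefix, "bin") + os.pathsep + env.get("PATH", "")
    return subprocess.Popen([path], cwd=prefix, env=env)


# Usage (not run automatically):
#   process = launch_navigator()
#   process.wait()
```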

From there we start our IPython notebook (Jupyter) and work with Python and annotations instead of comments. We also get all the plots, charts and visualizations inline if we want. I have also rendered iframes inside of it, for d3.js (SVG) or babylon.js (WebGL). It’s quite a handy tool, and so are the others.

  • Jupyter is probably the most exploration-friendly and interactive coding tool for Python. In contrast to an IDE, you create a document and not a program. For many scripting tasks this is better.
  • qtconsole is a Python interpreter shell with extra functions.
  • Spyder is an IDE. Personally I use PyCharm with Anaconda, mostly because I find it easier to include libs like pyspark or elasticsearch this way.
  • glueviz is a little bit like IBM SPSS for statistical exploration, with a limited feature set.
  • I have never used orange-app.
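As for rendering iframes inside a notebook: Jupyter displays any object that defines a `_repr_html_` method as HTML, so a tiny wrapper class is enough. This is a sketch and not the code I actually used; the class name `InlineFrame` and the file name `chart.html` are placeholders. IPython also ships a ready-made `IPython.display.IFrame` that does the same job.

```python
class InlineFrame(object):
    """Wrap a URL so that a notebook renders it as an inline iframe."""

    def __init__(self, src, width=600, height=400):
        self.src = src
        self.width = width
        self.height = height

    def _repr_html_(self):
        # Escape the characters that would break out of the attribute value.
        src = self.src.replace("&", "&amp;").replace('"', "&quot;")
        return '<iframe src="%s" width="%d" height="%d"></iframe>' % (
            src, self.width, self.height)


# In a notebook cell, leaving the object as the last expression renders it:
#   InlineFrame("chart.html", 800, 500)
frame = InlineFrame("chart.html", 800, 500)
```

The page behind `src` can then hold the d3.js or babylon.js visualization, which keeps the JavaScript out of the notebook itself.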

Results

If you are using Arch Linux, you can simply install and run Anaconda. It’s a useful cross-platform Python distribution for data science tasks, with a lot of packages that can take time to compile if you install them through a package manager. Pip and easy_install (setuptools) can be quite messy to deal with and distract from the core tasks at hand. And a 64-bit Python 2.7 interpreter might not be the standard for every OS or Linux distribution.

