How to set up IPython for Statistics on Linux

People who come to Python from R for statistical analysis soon notice that a fresh IPython session has neither the DataFrame class nor the shorthands that make immediate analysis convenient. Here is a simple way to load them by default at start-up, which I find very useful.

Advice

1. Generate IPython profiles

$ ipython profile create

This will create files such as:

/home/you/.config/ipython/profile_default/ipython_config.py
/home/you/.config/ipython/profile_default/ipython_qtconsole_config.py
/home/you/.config/ipython/profile_default/ipython_notebook_config.py
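
Depending on the IPython version, the profile directory may live under ~/.ipython rather than ~/.config/ipython (running "ipython locate profile default" prints the actual location). As a minimal sketch, assuming only the standard library, the hypothetical helper list_profile_configs below lists which config files exist in a given profile directory:

```python
from pathlib import Path


def list_profile_configs(ipython_dir, profile="default"):
    """Return the sorted names of the ipython*_config.py files in a profile.

    ipython_dir is an assumption: pass whichever base directory your
    installation uses, e.g. ~/.config/ipython or ~/.ipython.
    """
    profile_dir = Path(ipython_dir).expanduser() / ("profile_" + profile)
    if not profile_dir.is_dir():
        return []
    return sorted(p.name for p in profile_dir.glob("ipython*_config.py"))


if __name__ == "__main__":
    # Check both candidate locations and show what each one contains.
    for base in ("~/.config/ipython", "~/.ipython"):
        print(base, list_profile_configs(base))
```

An empty list means the profile has not been created in that location yet.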

2. Define your imports and functions, for instance:

Edit ipython_notebook_config.py (or ipython_config.py for the terminal shell) to include:

c.InteractiveShellApp.exec_lines = [
    'import pylab',
    'import numpy as np',
    'from numpy import array as Array',
    'import pandas as pd',
    'import matplotlib.pyplot as plt',
    'import scipy as sp',
    'import scipy.stats as st',
    'import statsmodels.api as sm',
    'import statsmodels.formula.api as stm',
    'from pandas import Index, MultiIndex, Series, Categorical, DataFrame, Panel',
    '%load_ext rmagic',
    'import pandas.rpy.common as rpy',
    'pd.set_option("display.notebook_repr_html", False)',
    'pd.set_option("display.max_rows", 5000)',
    'pd.set_option("display.max_columns", 50)',
    'pd.set_option("display.height", 10000)',
    'pd.set_option("display.width", 10000)',
    'import sympy as sy',
    'from sympy import diff',
    'from sympy import integrate as intt',
    'a, b, c, x, y, z, t = sy.symbols("a b c x y z t")',
    'k, m, n = sy.symbols("k m n", integer=True)',
    'f, g, h = sy.symbols("f g h", cls=sy.Function)',
]
 
c.InteractiveShellApp.exec_files = [ "/home/you/.config/ipython/runatstartup.py" ]
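
If one of the packages listed in exec_lines is not installed, its import line produces an error at start-up. Here is a minimal sketch of one way to guard against that, assuming only the standard library: the hypothetical helper available_imports keeps an import line only when its top-level module can be found. The helper name and the candidate list are illustrative and not part of IPython.

```python
import importlib.util


def available_imports(candidates):
    """Keep only the import lines whose top-level module is installed.

    candidates is a list of (top_level_module, import_line) pairs.
    """
    lines = []
    for module, line in candidates:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(module) is not None:
            lines.append(line)
    return lines


# Illustrative subset of the imports used in this post.
candidates = [
    ("numpy", "import numpy as np"),
    ("pandas", "import pandas as pd"),
    ("scipy", "import scipy.stats as st"),
    ("sympy", "import sympy as sy"),
]

startup_lines = available_imports(candidates)
# In the config file this would become:
# c.InteractiveShellApp.exec_lines = startup_lines
```

Because the config file is ordinary Python, the list can be built this way before it is assigned.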

Then edit runatstartup.py to include any extra multi-line functions you need:

def to_r(df, name):
    """Send a DataFrame to R under the given name (thanks to Wes McKinney)."""
    from rpy2.robjects import globalenv
    # rpy is pandas.rpy.common, already imported via exec_lines
    r_df = rpy.convert_to_r_dataframe(df)
    globalenv[name] = r_df

Note: On Windows XP this does not work; the best alternative I have found is:

def to_r(df, name):
    """Sends variables to R (Windows Version)"""
    from rpy2.robjects import r
    r_df = rpy.convert_to_r_dataframe(df)
    r.assign(name, r_df)

3. Restart your IPython

Here is an example of just one of the features that become available:
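
As a small sketch of what this enables (the data below is made up for illustration), a DataFrame can be built and summarized without typing any imports first. The import line is repeated here only so that the snippet also runs outside the configured session:

```python
# DataFrame is preloaded by exec_lines in the configured session.
from pandas import DataFrame

# Made-up data, for illustration only.
X = DataFrame({'x': [1, 2, 3], 'y': [4, 6, 8]})

col_means = X.mean()           # per-column means
xy_corr = X['x'].corr(X['y'])  # Pearson correlation between the two columns

print(col_means)
print(xy_corr)
```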

By the way, to spare other people from typing all the imports, you can collect them in a single file and reference it at the top of your Python script like this:

# Imports: http://git.io/JGB_XQ
X = DataFrame({'x': [1,2,3], 'y': [4,5,6]})

You can generate a short URL on git.io. This way you can share short, useful snippets of code with others.