IPython Statistics Essentials and the "stt" Package

OK. Python is not convenient for statistical data analysis, because you have to import a number of packages every time you want to do mathematical operations: NumPy, SciPy, pandas, scikit-learn, statsmodels, SymPy, etc. Follow this guide to create your own package of imports. I have started one that suits me, the stt package, which you can use after you install everything from http://ipython.org/. Once that is ready, install it with pip install -U stt. Then, whenever you need statistical capabilities inside Python, run from stt import *, and it will import the things often needed in statistical analysis. You can check exactly what it imports by typing from stt.verbose import *.

Anyway, once I have that installed, I can work with Python almost like with R.

So, open IPython with the command below, and let's try:

ipython qtconsole --pylab inline

Reading data

# imports
from stt import *
 
# read data
df = pd.read_csv('http://www.quandl.com/api/v1/datasets/GOOG/NASDAQ_GOOG.csv?sort_order=asc', parse_dates=[0], index_col=[0])

Viewing data

df.head()
df.loc['2013-01-01':'2013-01-10', ['Open', 'Close']]

Computing Descriptive Summary Statistics

df.describe()
s = df.describe().T; s

Adding Variables

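# add the skewness of each column to the summary table
s['skew'] = df.skew()
# add a running row number to use as the regressor
df['Id'] = range(df.shape[0])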

Fitting Linear Regression Model

model = smf.ols('Close ~ Id', df).fit()
model.summary()

Computing Predictions from the Model and Residuals

# predicted (fitted) values for every row
model.predict(df)
# store the predictions and the residuals as new columns
df['EY'] = model.predict(df)
df['Residuals'] = df['Close'] - df['EY']
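To see what the fit, the prediction and the residuals boil down to for a single-predictor model like Close ~ Id, here is a small pure-Python sketch. It uses the closed-form least-squares formulas rather than statsmodels, and the numbers are made up for illustration (they are not GOOG prices); the variable names are mine.

```python
# Made-up closing prices (illustrative only, not real GOOG data)
close = [10.0, 12.0, 11.0, 15.0, 14.0]
ids = list(range(len(close)))  # plays the role of df['Id']

n = len(close)
mean_x = sum(ids) / n
mean_y = sum(close) / n

# Closed-form least-squares estimates for y = intercept + slope * x
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ids, close))
sxx = sum((x - mean_x) ** 2 for x in ids)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fitted values (like df['EY']) and residuals (like df['Residuals'])
fitted = [intercept + slope * x for x in ids]
residuals = [y - f for y, f in zip(close, fitted)]

print(slope, intercept)
print(residuals)
```

With an intercept in the model, the residuals sum to zero up to rounding error, which is a quick sanity check you can also run on df['Residuals'].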

Plotting the Model and Residuals

df[['EY', 'Close']].plot()
df['Residuals'].plot()

All in one plot:

rcParams['figure.figsize'] = 12, 5
fig, axes = plt.subplots(ncols=2)
plot1 = df[['EY', 'Close']].plot(ax=axes[0], title='Regression Line')
plot2 = df['Residuals'].plot(ax=axes[1], title='Residuals')

Plotting Correlation Matrix

sm.graphics.plot_corr(df.corr(), xnames=list(df.columns));
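By default, df.corr() computes pairwise Pearson correlations between the columns. As a rough sketch of what ends up in each cell of that matrix, here is the same calculation in pure Python on made-up columns (not real GOOG data); the pearson helper and the columns dictionary are hypothetical names used only for this illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up columns (illustrative only, not real GOOG data)
columns = {
    "Open": [10.0, 11.5, 11.0, 14.0, 14.5],
    "Close": [10.0, 12.0, 11.0, 15.0, 14.0],
    "Id": [0, 1, 2, 3, 4],
}

names = list(columns)
# corr[i][j] is the correlation between column i and column j
corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]

for name, row in zip(names, corr):
    print(name, [round(r, 3) for r in row])
```

The diagonal is 1 and the matrix is symmetric, which is why the plot produced by plot_corr mirrors itself across the diagonal.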