You Are What You Decide: A Journey in Automation of Our Selves.
OK. Python is not convenient for statistical data analysis, because you have to import a number of packages every time you want to do mathematical operations: NumPy, SciPy, Pandas, scikit-learn, StatsModels, SymPy, etc. Follow this guide to create your own package of imports. I have started creating one that suits me, the stt package, which you can use after you install everything from http://ipython.org/. Once that is ready, install it with
pip install -U stt
and then, whenever you need statistical capabilities inside Python, run
from stt import *
which imports the things often needed in statistical analysis. You can check exactly what it imports by typing
from stt.verbose import *
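For a sense of what such a package of imports can look like, here is a minimal sketch. It is not the actual contents of stt: the alias table and the helper name load_aliases are assumptions made for illustration. It tries each import and reports any library that is not installed instead of failing on the first one.

```python
import importlib

# Hypothetical alias table: short name -> module to import.
# These follow common community conventions; stt itself may differ.
ALIASES = {
    "np": "numpy",
    "pd": "pandas",
    "plt": "matplotlib.pyplot",
    "sm": "statsmodels.api",
    "smf": "statsmodels.formula.api",
}


def load_aliases(aliases=ALIASES):
    """Import every module in `aliases`; return (loaded, missing)."""
    loaded, missing = {}, []
    for alias, module_name in aliases.items():
        try:
            loaded[alias] = importlib.import_module(module_name)
        except ImportError:
            # Library not installed: remember it and carry on.
            missing.append(module_name)
    return loaded, missing


loaded, missing = load_aliases()
globals().update(loaded)  # make np, pd, ... available as plain names
if missing:
    print("Not installed:", ", ".join(missing))
```

Saved as a module of your own, the same short names become available through a single star import, which is the convenience the stt package is after.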
Anyway, once I have that installed, I can work with Python almost as I would with R.
So, open IPython with the command below, and let's try:
ipython qtconsole --pylab inline
# imports
from stt import *
# read data
df = pd.read_csv('http://www.quandl.com/api/v1/datasets/GOOG/NASDAQ_GOOG.csv?sort_order=asc', parse_dates=[0], index_col=[0])
df.head()
df[['Open', 'Close']]['2013-01-01':'2013-01-10']
df.describe()
s = df.describe().T; s
s['skew'] = df.skew()
df['Id'] = range(df.shape[0])
model = smf.ols('Close ~ Id', df).fit()
model.summary()
model.predict(df)
df['EY'] = model.predict(df)
df['Residuals'] = df['Close'] - df['EY']
df[['EY', 'Close']].plot()
df['Residuals'].plot()
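To see what the fit above computes, the following sketch works through ordinary least squares for a single predictor by hand. The numbers are made-up toy values rather than Google prices, and the function name fit_line is hypothetical. For a model of the form Close ~ Id, smf.ols arrives at the same intercept and slope, along with many more diagnostics.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (assumed for illustration): Id plays the role of a time index.
ids = [0, 1, 2, 3, 4]
close = [10.0, 12.0, 13.0, 15.0, 20.0]

intercept, slope = fit_line(ids, close)
fitted = [intercept + slope * i for i in ids]        # like df['EY']
residuals = [c - f for c, f in zip(close, fitted)]   # like df['Residuals']

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

With an intercept in the model the residuals sum to zero, so a residual plot that drifts away from zero for long stretches suggests that a straight line in Id is too simple a description of the series.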
All in one plot:
rcParams['figure.figsize'] = 12, 5
fig, axes = plt.subplots(ncols=2)
plot1 = df[['EY', 'Close']].plot(ax=axes[0], title='Regression Line')
plot2 = df['Residuals'].plot(ax=axes[1], title='Residuals')
sm.graphics.plot_corr(df.corr(), xnames=list(df.columns));
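The matrix passed to plot_corr holds pairwise Pearson correlation coefficients, which is the default method of df.corr(). As a rough sketch of what a single entry means, the function below computes the coefficient for one pair of columns. The name pearson is hypothetical and the two short series are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


# Invented values standing in for two price columns.
open_ = [10.0, 11.0, 12.5, 12.0, 14.0]
close = [10.5, 11.5, 12.0, 12.5, 14.5]

print(round(pearson(open_, close), 3))  # strongly positive for these values
print(pearson(open_, open_))            # a column compared with itself
```

Every diagonal cell of the plotted matrix is a column compared with itself, which is why the diagonal always shows the maximum correlation.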