You Are What You Decide: A Journey in Automation of Our Selves.
One of the major goals of the Infinity Project is to help us define and pursue a common goal: "if we are able to define our common goal, the problem of creating friendly artificial intelligence reduces to creating an optimization system to optimize for our common goal."
We started with a conceptualization of categories (need, goal, idea, plan, step, task, work) that people seem to use when they break down the work they do, and built a goal-pursuit/task-management system to help people define their goals explicitly.
Unfortunately, people don't always know what they want, and are often unable to explicitly define their true goals. Fortunately...
Probabilistic programming is a relatively new form of programming that grew out of a conceptual breakthrough: generalizing probabilistic modelling from inflexible Bayesian networks and graphical models to a probabilistic analogue of higher-order logic (HOL). There is a very good Google Tech Talk by Noah Goodman from the Fourth Conference on Artificial General Intelligence (2011), explaining this new paradigm.
Unlike deep learning, probabilistic programming enables humans to understand the structure of complex probability distributions in a similar way that ...
The Infinity Project presents one possible way to understand how everything people have ever created was created, by introducing abstract categories (need, goal, idea, plan, step, task, work) that people seem to use when they break down the work they do. In this post, I look at how the Infinity Project appears in the context of context-free grammars, an abstract model for modelling languages. I conclude that, from one standpoint, these categories can be viewed as non-terminal symbols, from which we could learn the production rules people use(d) to solve problems.
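To make the grammar analogy concrete, here is a minimal sketch in Python. The production rules below are invented purely for illustration (the Infinity Project does not define them this way): the categories act as non-terminal symbols, 'work' is the only terminal, and a derivation repeatedly rewrites symbols until only concrete units of work remain.

```python
import random

# A toy context-free grammar over the categories.
# Each non-terminal maps to a list of possible productions;
# these particular rules are made up for illustration only.
grammar = {
    'NEED': [['GOAL']],
    'GOAL': [['IDEA'], ['IDEA', 'GOAL']],
    'IDEA': [['PLAN']],
    'PLAN': [['STEP'], ['STEP', 'PLAN']],
    'STEP': [['TASK']],
    'TASK': [['work']],  # 'work' is the only terminal symbol
}

def expand(symbol, rng):
    """Rewrite a symbol using random productions until only terminals remain."""
    if symbol not in grammar:  # terminal symbol
        return [symbol]
    production = rng.choice(grammar[symbol])
    result = []
    for s in production:
        result.extend(expand(s, rng))
    return result

rng = random.Random(0)
sentence = expand('NEED', rng)
# every derivation bottoms out in a sequence of concrete 'work' tokens
```

Learning the production rules people actually use would then amount to grammar induction over observed breakdowns, rather than hand-writing rules as above.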
Here you can find some useful IPython Notebook extensions. For example, Django ORM Magic lets you construct and query a Django database as follows:
In [1]:
%load_ext django_orm_magic
In [2]:
%%django_orm
from django.db import models

class Poll(models.Model):
    question = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')
Every life form exists today because it survived evolutionary pressure over billions of years, which induced an inclination to choose actions that optimize for survival. So life forms are good at recognizing what is good for themselves, but have difficulty recognizing what is good universally. However, there seems to be a criterion for deciding what is good universally: good is to let everything exist, and bad is to destroy everything, where "everything" is defined as the world as a whole, as well as the world as seen from the perspective of every part of it, no matter how small or large.
Wouldn't it be cool to import a Jupyter Notebook just like a Python module? Well, there is a convenient way: you just need to add one line to your Jupyter configuration to execute a Python file on start-up. So, do:
1. Generate the profile files.
$ jupyter notebook --generate-config
/home/you/.jupyter/jupyter_notebook_config.py
2. Edit the jupyter_notebook_config.py to append:
c.InteractiveShellApp.exec_files = [ "/home/you/.jupyter/notebook_finder.py" ]
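For illustration, notebook_finder.py could contain a minimal importer like the following. This is a simplified sketch that parses the .ipynb JSON directly and executes only its code cells; the full recipe in the IPython/Jupyter documentation also transforms cell magics and handles notebook packages.

```python
import json
import os
import sys
from importlib.abc import Loader, MetaPathFinder
from importlib.machinery import ModuleSpec

class NotebookLoader(Loader):
    """Load a .ipynb file by executing its code cells in a fresh module."""

    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # use Python's default module creation

    def exec_module(self, module):
        with open(self.path) as f:
            nb = json.load(f)
        # run each code cell in the module's namespace
        for cell in nb.get('cells', []):
            if cell['cell_type'] == 'code':
                exec(''.join(cell['source']), module.__dict__)

class NotebookFinder(MetaPathFinder):
    """Find notebooks on sys.path, so `import mynotebook` just works."""

    def find_spec(self, fullname, path, target=None):
        filename = fullname.rsplit('.', 1)[-1] + '.ipynb'
        for entry in (path or sys.path):
            candidate = os.path.join(entry or '.', filename)
            if os.path.isfile(candidate):
                return ModuleSpec(fullname, NotebookLoader(candidate))
        return None

sys.meta_path.append(NotebookFinder())
```

After this file runs, `import some_notebook` will pick up some_notebook.ipynb from any directory on sys.path.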
Sometimes you want to create a complex multi-indexed DataFrame with named axes in Pandas in just one statement. Here's one way:
import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=pd.MultiIndex.from_arrays(
        [['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
        names=['AAAAA', 'BBBBBB']),
    columns=pd.MultiIndex.from_arrays(
        # the second column level was cut off in the original text;
        # completed here with values matching the later example
        [['VAR-A', 'VAR-A', 'VAR-B'],
         ['var-a', 'var-b', 'var-c']]))
Suppose we have a multi-indexed dataframe, and want to select all observations with certain ids in a certain level or levels.
df = pd.DataFrame(
    np.arange(15).reshape(5, 3),
    index=[['x', 'x', 'x', 'y', 'z'],
           ['a', 'a', 'b', 'b', 'c'],
           [1, 2, 1, 2, 3]],
    columns=[['VAR', 'VAR', 'VAR'],
             ['VAR-A', 'VAR-A', 'VAR-B'],
             ['var-a', 'var-b', 'var-c']])
df.index.names = ['one', 'two', 'three']
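As a sketch of such selections (repeating the DataFrame so the snippet is self-contained), one can use `xs` for a single id in one level, and a boolean mask built from `get_level_values` for several ids:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(15).reshape(5, 3),
    index=[['x', 'x', 'x', 'y', 'z'],
           ['a', 'a', 'b', 'b', 'c'],
           [1, 2, 1, 2, 3]],
    columns=[['VAR', 'VAR', 'VAR'],
             ['VAR-A', 'VAR-A', 'VAR-B'],
             ['var-a', 'var-b', 'var-c']])
df.index.names = ['one', 'two', 'three']

# all observations with id 'x' in level 'one'
x_only = df.xs('x', level='one')

# all observations whose level-'two' id is 'a' or 'b'
ab_only = df[df.index.get_level_values('two').isin(['a', 'b'])]
```

`xs` drops the level it selected on, while the boolean-mask approach keeps the full index, which is often preferable when selecting several ids at once.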
Suppose you have a list of dicts, and you want to get them into a DataFrame. Here's an example:
from pandas import DataFrame

data = [{'a': 1, 'b': 2, 'c': '3'},
        {'a': 2, 'b': 1},
        {'c': 1}]
df = DataFrame.from_records(data)
df.head()
Here's what you get:
     a    b    c
0    1    2    3
1    2    1  NaN
2  NaN  NaN    1

[3 rows x 3 columns]
# Python 2 example (urllib2); on Python 3, use urllib.request.urlopen instead
import matplotlib.image as mpimg
import urllib2 as urllib
import io

fd = urllib.urlopen("http://mindey.com/wamh.jpg")
image_file = io.BytesIO(fd.read())
img = mpimg.imread(image_file, format='jpg')
More info: http://mindey.com/reading_image_to_r_python.html, http://mindey.com/jpg2matrix.html
OK. Python is not convenient for statistical data analysis out of the box, because you have to import a number of packages every time you want to do mathematical operations: NumPy, SciPy, Pandas, SkLearn, StatsModels, SymPy, etc. Follow this guide to create your own package of imports. I have started creating one that suits me, the stt package, which you can use after you install everything from http://ipython.org/.
1. Install Pidgin (on Ubuntu, you can find it in the Software Center).
2. Install the pidgin-lwqq plugin:

sudo add-apt-repository ppa:lainme/pidgin-lwqq
sudo apt-get update
sudo apt-get install pidgin-lwqq

This way there is no need to install the Windows version of QQ under Wine (the approach below).
Note: nowadays it is enough to download the latest version from http://www.longene.org/forum/viewtopic.php?f=6&t=4700 and install it.
It is very convenient to collect and slice various data streams using MongoDB and Pandas (mentioned here). However, since I have recently been reading Rob J Hyndman's wonderful R course on forecasting, I realized that I need to be able to convert Pandas DataFrames to R's xts and ts objects.
While working with AdWords data in R, I wanted to get data from a Python function. In such situations, the rPython package comes in handy. It lets you define and execute arbitrary Python functions and get the results back.
install.packages("rPython", repos = "http://www.datanalytics.com/R")
Suppose you have a Python function that returns a JSON-like dictionary. Then you can import the result into an R data.frame as follows:
Improving information processing capabilities:
1) Gave birth to life. Over four billion years ago, when the first life forms appeared, they continued to exist because certain molecules (like RNA) were able to process information, i.e., to copy themselves, and that was useful for survival;
R users who come to Python for statistical analysis surely know that IPython does not come with the all-important DataFrame class, nor with other functions that make it convenient to start data analysis right away. So one of the most annoying things about using Python for data analysis, I find, is having to import these packages almost every time. Here I will describe a very simple way to import all of the important tools automatically:
$ ipython profile create
/home/you/.config/ipython/profile_default/ipython_config.py
/home/you/.config/ipython/profile_default/ipython_qtconsole_config.py
/home/you/.config/ipython/profile_default/ipython_notebook_config.py
Edit all of the files generated above to include, for example:
c.InteractiveShellApp.exec_lines = [
    # the list contents were cut off in the original text; a typical choice:
    "import numpy as np",
    "import pandas as pd",
]
People who come from R to Python to do statistical analysis know that opening IPython does not immediately give you the needed DataFrame class, nor a number of shorthands for immediate analysis. So here is a simple way to have them available by default on start-up, which I find very useful.
$ ipython profile create
This will create files such as:
/home/you/.config/ipython/profile_default/ipython_config.py
/home/you/.config/ipython/profile_default/ipython_qtconsole_config.py
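For example, ipython_config.py could then contain something like the following (the package list is just an illustration of my own defaults, not a requirement):

```python
# in ipython_config.py; `c` is provided by the IPython configuration machinery
c.InteractiveShellApp.exec_lines = [
    "import numpy as np",
    "import pandas as pd",
    "from pandas import DataFrame, Series",
]
```

Every new IPython session in this profile then starts with these lines already executed.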
MongoDB collections consist of binary JSON (BSON) objects, and reading them in Python is well covered here. However, I did not find a straightforward way to read the JSON objects into DataFrames, so here is one way I found to complete the task.
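A minimal sketch of the idea follows. A plain list of dicts stands in here for the pymongo cursor, so the snippet runs without a MongoDB server; with pymongo installed you would pass `list(collection.find())` instead (the field names below are made up):

```python
import pandas as pd

# documents as they might come back from collection.find();
# each BSON document becomes a plain dict in Python
docs = [
    {'_id': 1, 'price': 9.5, 'qty': 3},
    {'_id': 2, 'price': 7.0, 'qty': 5},
]

# list of dicts -> DataFrame, keeping Mongo's _id as the index
df = pd.DataFrame(docs).set_index('_id')
```

Missing fields across documents simply become NaN, just as in the list-of-dicts example above.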
Recently, out of an interest in risk, instead of machine-learning data classification methods I wanted to apply the optimal Bayes classifier. However, I want to classify high-dimensional data. Also, because of the interest in risk, I want to practice fitting parametric distributions. But we know that in quite a few situations (for example, the surface of a human face) we have no parametric distribution that fits the data, so I want to use non-parametric distributions. In fact, the author of a book I have recently been reading about the distributions used in risk analysis says we should use parametric distributions only in the following four situations: 1) theory says the data should follow the distribution; 2) no theory confirms it, but it is widely accepted that such a random variable follows the distribution; 3) the distribution is a good model of expert opinion and high accuracy is not needed; 4) we want a distribution with a long tail (extending beyond the extreme observed values). Outside these situations, the author recommends non-parametric distributions. So I made the following plan:
1. First learn how to generate high-dimensional random vectors in R.
2. Learn how to do principal component analysis in R.
3. Learn how to fit distributions in R.
4. Learn how to estimate non-parametric distributions in R.
5. Apply a simple Bayes decision rule.
Then I wrote the following code:
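The original code was written in R and is not reproduced here. As an illustration only, the five steps above could be sketched in Python with made-up data: simulated Gaussian classes, PCA via SVD, and Gaussian KDE as the non-parametric density estimate.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# 1. simulate two classes of high-dimensional (10-D) random vectors
X0 = rng.normal(0.0, 1.0, size=(200, 10))   # class 0
X1 = rng.normal(1.5, 1.0, size=(200, 10))   # class 1, shifted mean
X = np.vstack([X0, X1])

# 2. PCA via SVD: project onto the first two principal components
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T
Z0, Z1 = Z[:200], Z[200:]

# 3./4. non-parametric density estimate per class (Gaussian KDE)
kde0 = gaussian_kde(Z0.T)   # gaussian_kde expects shape (dims, n_points)
kde1 = gaussian_kde(Z1.T)

# 5. simple Bayes decision rule with equal priors:
#    assign a point to the class with the higher estimated density
def classify(z):
    return int(kde1(z)[0] > kde0(z)[0])

# most points drawn from class 1 should be classified as class 1
acc = np.mean([classify(z) for z in Z1])
```

This is only a sketch of the plan, not the original R code; with real data the priors, the number of components, and the KDE bandwidth would all need attention.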
Today I was interviewed by people from AIG. I would like to work as a Risk Analyst, because I want to automate optimal decision-making. I think I need to review my knowledge of classifiers. Please have a look and tell me whether I have made any mistakes.
So, let us first recall the theorem.
Theorem (Bayes). If $A, B$ are two random events (let us call $A$ the hypothesis), and we know these three probabilities: the ratio $\frac{P(A)}{P(B)}$ of the probabilities $P(A), P(B)$, and the conditional probability $P(B|A)$, then, once the event $B$ has occurred, we can compute the probability $P(A|B)$ of the hypothesis $A$:
$$P(A|B) = P(B|A) \frac{P(A)}{P(B)}.$$
The proof of this theorem relies on the definition of conditional probability: $P(X|Y) \overset{def}{=} \frac{P(X \cap Y)}{P(Y)}$. Substituting this definition into the statement of Bayes' theorem, we obtain: $$\frac{P(A \cap B)}{P(B)} = \frac{P(A \cap B)}{\color{red}{P(A)}} \frac{\color{red}{P(A)}}{P(B)}.$$
Cancelling $\color{red}{P(A)}$ here completes the proof of the equality.
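The theorem can also be sanity-checked numerically; the probabilities below are made up for illustration:

```python
# made-up probabilities for a quick numerical check of Bayes' theorem
p_A = 0.3              # P(A), the prior probability of the hypothesis
p_B_given_A = 0.5      # P(B|A)
p_B_given_not_A = 0.2  # P(B|not A), needed only to obtain P(B)

# total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B

# cross-check against the definition P(A|B) = P(A and B) / P(B)
p_A_and_B = p_B_given_A * p_A
```

The two routes to $P(A|B)$ agree, which is exactly what the cancellation of $P(A)$ in the proof says.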
We are all surely pursuing the most useful knowledge. "Useful" surely means "helps achieve our goals". If we had to choose the factor under our control that most influences our success, that factor would surely be our decisions, right? The quality of our decisions is surely what influences our lives the most. Thinking along these lines, I recently read a very simple book about decision theory called "The Right Decision". The book is very non-technical, and all of its truly useful content could be summarized in three sentences. Still, it reminded me of some things I used to know about optimal decisions, risk analysis, and so on. I will first write about some of the important (useful) concepts from the book, and then about similar concepts from risk analysis.