Mindey's blog

How To Learn The True Intent of Society?

One of the major goals of the Infinity Project is to help us define and pursue a common goal: "if we are able to define our common goal, the problem of creating friendly artificial intelligence reduces to creating an optimization system to optimize for our common goal."

We started with a conceptualization of categories (need, goal, idea, plan, step, task, work), which people seem to use when they break down all of the work they do, and built a goal-pursuit/task-management system to help people define their goals explicitly.

Unfortunately, people don't always know what they want, and are often unable to explicitly define their true goals. Fortunately...

Probabilistic Programming

Probabilistic programming is a relatively new form of programming that grew out of a conceptual breakthrough: generalizing probabilistic modelling from inflexible Bayesian networks and graphical models to a probabilistic analogue of higher-order logic (HOL). Noah Goodman gave a very good Google talk explaining this new paradigm at the Fourth Conference on Artificial General Intelligence in 2011.

Unlike deep learning, probabilistic programming enables humans to understand the structure of complex probability distributions in a similar way that ...

The Infinity Project in the Context of Context-Free Grammars

The Infinity Project presents one possible way to understand how everything that people have ever created was created, by introducing abstract categories (need, goal, idea, plan, step, task, work), which people seem to use when they break down the work they do. In this post, I look at how the Infinity Project appears in the context of context-free grammars, an abstract model for describing languages. I conclude that, from one standpoint, these categories can be viewed as non-terminal symbols, from which we could learn the production rules people use(d) to solve problems.
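To make the idea of categories as non-terminal symbols concrete, here is a minimal sketch in Python. The production rules and the terminal symbol "result" are hypothetical, invented only for illustration; the real rules would have to be learned from how people actually break down their work.

```python
# Hypothetical production rules: each category is a non-terminal symbol
# that expands into a sequence of other symbols.
RULES = {
    "need": [["goal"]],
    "goal": [["idea"]],
    "idea": [["plan"]],
    "plan": [["step", "step"]],   # a plan made of two steps
    "step": [["task"]],
    "task": [["work"]],
    "work": [["result"]],         # "result" is a terminal symbol
}


def derive(symbol, rules):
    """Expand a symbol into terminals, always using its first production."""
    if symbol not in rules:       # terminal: nothing left to expand
        return [symbol]
    terminals = []
    for child in rules[symbol][0]:
        terminals.extend(derive(child, rules))
    return terminals


derivation = derive("need", RULES)
print(derivation)
```

With these made-up rules, a single need is derived down to the concrete results of its two steps.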

IPython Notebook Extensions, Django ORM Magic

Here you can find some useful IPython Notebook extensions. For example, Django ORM Magic lets you construct and query a Django database as follows:

[In] 1:

%load_ext django_orm_magic

[In] 2:

%%django_orm
 
from django.db import models
 
class Poll(models.Model):
   question = models.CharField(max_length=200)
   pub_date = models.DateTimeField('date published')

...

Wise Intelligence

Every life form that exists today does so because it survived evolutionary pressure over billions of years, which induced an inclination to choose actions that optimize for survival. As a result, life forms are good at recognizing what is good for themselves, but have difficulty recognizing what is good universally. However, there seems to be a criterion for deciding what is good universally: good is to let everything exist, and bad is to destroy everything, where "everything" is defined as the world as a whole, as well as the world as seen from the perspective of every part of it, no matter how small or large.

Importing IPython Notebooks as Modules

Wouldn't it be cool to import a Jupyter Notebook just like a Python module? Well, there is a convenient way: you just need to add one line to your Jupyter configuration to execute several Python functions. So, do:

1. Generate the profile files.

$ jupyter notebook --generate-config

/home/you/.jupyter/jupyter_notebook_config.py

2. Edit the jupyter_notebook_config.py to append:

c.InteractiveShellApp.exec_files = [ "/home/you/.jupyter/notebook_finder.py" ]
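The contents of notebook_finder.py are not shown in this excerpt. As a rough illustration of the underlying idea only (not the author's file), here is a minimal sketch that runs the code cells of a .ipynb file inside a fresh module object using just the standard library. The function name load_notebook_as_module and the demo notebook are hypothetical, and cells containing IPython magics would not work with this approach.

```python
import json
import os
import tempfile
import types


def load_notebook_as_module(path, name="notebook_module"):
    """Execute the code cells of a .ipynb file in a new module namespace."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    module = types.ModuleType(name)
    module.__file__ = path
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat 4 stores source as a list of lines
            source = "".join(source)
        exec(compile(source, path, "exec"), module.__dict__)
    return module


# Demo: write a tiny notebook to a temporary file and load it.
demo_nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["def double(x):\n", "    return 2 * x\n"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
}
with tempfile.NamedTemporaryFile("w", suffix=".ipynb", delete=False) as f:
    json.dump(demo_nb, f)
    demo_path = f.name

demo = load_notebook_as_module(demo_path)
os.remove(demo_path)
print(demo.double(21))
```

A real import hook would register a finder on sys.meta_path so that a plain import statement works; this sketch only shows the loading step.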

Creating a Multi-Indexed and Named DataFrame in Pandas in One Statement

Sometimes you want to create a complex multi-indexed DataFrame with named axes in Pandas in just one statement. Here's one way to do it:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(12).reshape(4,3),
                 index=pd.MultiIndex.from_arrays([['a','a','b','b'],
                                                  [1,2,1,2]],
                                                  names=['AAAAA','BBBBBB']),
                 columns=pd.MultiIndex.from_arrays([['VAR-A','VAR-A', 'VAR-B'],
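The excerpt above is cut off before the column index is finished. A complete statement along the same lines might look like the sketch below; the second column level ("x", "y", "z") and the column level names ("upper", "lower") are assumptions made up for illustration, not taken from the post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    # Two-level row index with named levels, as in the excerpt.
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]],
        names=["AAAAA", "BBBBBB"],
    ),
    # Two-level column index; the second level and both names are hypothetical.
    columns=pd.MultiIndex.from_arrays(
        [["VAR-A", "VAR-A", "VAR-B"], ["x", "y", "z"]],
        names=["upper", "lower"],
    ),
)
print(df)
```

Passing the names directly to MultiIndex.from_arrays is what keeps everything in a single statement, instead of assigning df.index.names afterwards.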

Selecting Pandas DataFrame Observations from Custom Multiple Levels of MultiIndex

Suppose we have a multi-indexed dataframe, and want to select all observations with certain ids in a certain level or levels.

df = pd.DataFrame(np.arange(15).reshape(5,3),
                         index=[['x','x','x','y','z'],
                                ['a','a','b','b','c'],
                                [1,2,1,2,3]],
                         columns=[['VAR', 'VAR', 'VAR'],
                                  ['VAR-A','VAR-A', 'VAR-B'],
                                  ['var-a','var-b', 'var-c']]
)
df.index.names = ['one', 'two', 'three']
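One way to do the selection, sketched here under the assumption that we want rows whose level 'two' is 'b' or 'c' and whose level 'three' is 2 or 3 (these ids are chosen only for illustration), is to build a boolean mask per level with get_level_values and isin, and combine the masks:

```python
import numpy as np
import pandas as pd

# Same frame as above, rebuilt so that the example is self-contained.
df = pd.DataFrame(np.arange(15).reshape(5, 3),
                  index=[['x', 'x', 'x', 'y', 'z'],
                         ['a', 'a', 'b', 'b', 'c'],
                         [1, 2, 1, 2, 3]],
                  columns=[['VAR', 'VAR', 'VAR'],
                           ['VAR-A', 'VAR-A', 'VAR-B'],
                           ['var-a', 'var-b', 'var-c']])
df.index.names = ['one', 'two', 'three']

# One boolean mask per level; the wanted ids are illustrative assumptions.
mask_two = df.index.get_level_values('two').isin(['b', 'c'])
mask_three = df.index.get_level_values('three').isin([2, 3])

# Keep only the rows that satisfy both conditions.
selected = df[mask_two & mask_three]
print(selected)
```

Further levels can be added by and-ing in more masks, so the same pattern covers any custom combination of levels.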

How to read a list of dicts into a Pandas DataFrame

Suppose you have a list of dicts, and you want to get them into a dataframe. Here's an example:

from pandas import DataFrame
data = [{'a': 1, 'b': 2, 'c': '3'},
        {'a':2, 'b':1},
        {'c':1}]
df = DataFrame.from_records(data)
df.head()

Here's what you get:

    a   b    c
0   1   2    3
1   2   1  NaN
2 NaN NaN    1
 
[3 rows x 3 columns]

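How to read a JPEG image into a NumPy array

import io
import matplotlib.image as mpimg
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2, as in the original post
# Download the image bytes and wrap them in a file-like object.
fd = urlopen("http://mindey.com/wamh.jpg")
image_file = io.BytesIO(fd.read())
# Decode the JPEG into a NumPy array (non-PNG formats require Pillow).
img = mpimg.imread(image_file, format='jpg')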

More info: http://mindey.com/reading_image_to_r_python.html, http://mindey.com/jpg2matrix.html

IPython Statistics Essentials, and "stt" Package

Python can be inconvenient for statistical data analysis, because you have to import a number of packages every time you want to do mathematical operations: NumPy, SciPy, Pandas, scikit-learn, StatsModels, SymPy, etc. Follow this guide to create your own package of imports. I have started creating one that suits me, the stt package, which you can use after you install everything from http://ipython.org/.

How to log in to QQ Messenger on Linux

Using Pidgin

1. Install Pidgin (on Ubuntu, you can find it in the Software Center)
2. Install the pidgin-lwqq plugin

sudo add-apt-repository ppa:lainme/pidgin-lwqq
sudo apt-get update
sudo apt-get install pidgin-lwqq

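3. When adding a new account, choose "WebQQ" and enter your account's username and password.

With this, there is no need to install the Windows version of QQ with Wine (the method below).

Installing QQ2012 with Wine (Thanks, S!)

Note: it is now enough to download the latest version from http://www.longene.org/forum/viewtopic.php?f=6&t=4700 and install it.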
 

How to convert Pandas DataFrame to xts and ts TimeSeries

It is very convenient to collect and slice various data streams using MongoDB and Pandas (mentioned here). However, since I have recently been reading Rob J Hyndman's wonderful R course on forecasting, I realized that I need to be able to convert Pandas DataFrames to R's xts and ts objects.

How to get data from Python to R

While working with AdWords data in R, I wanted to get data from a Python function. In such situations, the rPython package comes in handy. It lets you define and execute arbitrary Python functions and get the results.

install.packages("rPython", repos = "http://www.datanalytics.com/R") 

Example

Suppose you have a Python function that returns a JSON-like dictionary. Then you can import it into an R data.frame as follows:

Why You Should Invest in Information Processing Capabilities

Improving information processing capabilities:

1) Gave birth to life. Over four billion years ago, when the first life forms appeared, they continued to exist because certain molecules (like RNA) were able to process information, i.e., copy it, and that was useful for survival;

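Some advice on setting up Python for statistical analysis

R users who come to Python for statistical analysis surely know that IPython does not have the all-important DataFrame class out of the box, nor other functions that are convenient for starting data analysis right away. So, when I want to use Python for data analysis, one of the most annoying things, I find, is having to import these packages almost every time. Here I describe a very simple way to import all the important tools automatically:

Advice

1. Generate the IPython profile files:

$ ipython profile create

This command should generate config files, such as:
/home/you/.config/ipython/profile_default/ipython_config.py
/home/you/.config/ipython/profile_default/ipython_qtconsole_config.py
/home/you/.config/ipython/profile_default/ipython_notebook_config.py

2. Define what you need:

Edit all of the files generated above so that they include, for example: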

c.InteractiveShellApp.exec_lines = [

How to set up IPython for Statistics on Linux

People who come from R to Python for statistical analysis know that a freshly opened IPython session does not have the DataFrame class they need, nor a number of shorthands for immediate analysis. So, here is a simple way to have them available by default at start-up, which I find very useful.

Advice

1. Generate IPython profiles

$ ipython profile create

This will create files such as:

/home/you/.config/ipython/profile_default/ipython_config.py
/home/you/.config/ipython/profile_default/ipython_qtconsole_config.py
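The rest of the post is cut off here. The step of defining what gets loaded at start-up can be sketched as the following fragment for ipython_config.py. It relies on the exec_lines option of InteractiveShellApp; the particular imports and aliases are assumptions chosen for illustration, and the packages must already be installed.

```python
# Fragment for ipython_config.py: IPython provides get_config() when it
# loads this file, so the fragment is not meant to be run on its own.
c = get_config()

# Lines executed in every new session; the choice of imports is illustrative.
c.InteractiveShellApp.exec_lines = [
    "import numpy as np",
    "import pandas as pd",
    "from pandas import DataFrame, Series",
]
```

After restarting IPython, names such as DataFrame and np should be available without any manual imports.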

How to read a MongoDB into Pandas DataFrame

MongoDB collections consist of binary JSON objects, the reading of which in Python is well covered here. However, I did not find a straightforward way to read the JSON objects into DataFrames, so here is one way I found to complete the task.
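The post's own solution is not shown in this excerpt. As a minimal sketch of one possible approach: a query cursor yields plain dicts, so the documents can be collected into a list and handed to the DataFrame constructor. The helper name documents_to_dataframe, the database and collection names, and the sample documents are all hypothetical; plain dicts stand in for a real cursor so that the example runs without a server.

```python
import pandas as pd


def documents_to_dataframe(documents, drop_id=True):
    """Build a DataFrame from an iterable of MongoDB-style documents (dicts)."""
    df = pd.DataFrame(list(documents))
    if drop_id and "_id" in df.columns:
        df = df.drop(columns="_id")  # the ObjectId column is rarely needed
    return df


# With a running server and pymongo installed, the documents would come from
# a cursor, e.g. (hypothetical database and collection names):
#   from pymongo import MongoClient
#   cursor = MongoClient()["mydb"]["mycollection"].find()
#   df = documents_to_dataframe(cursor)

# Plain dicts stand in for the cursor here.
sample = [
    {"_id": 1, "a": 1, "b": 2},
    {"_id": 2, "a": 3},
]
df = documents_to_dataframe(sample)
print(df)
```

Fields missing from a document simply become NaN in the corresponding row, just as with any list of dicts.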

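An example of classifying high-dimensional data

Recently, out of an interest in risk, I wanted to apply the optimal Bayes classifier rather than other machine-learning classification methods. However, I want to classify high-dimensional data. Also, because of my interest in risk, I wanted to practice fitting parametric distributions. But we know that in many situations (for example, the surface of a human face) we do not have a parametric distribution that suits the data, so I wanted to use nonparametric distributions. In fact, the author of a text I recently read on the distributions used in risk analysis says that we should use parametric distributions only in the following four situations: 1) theory says the data should follow a certain distribution; 2) no theory supports it, but it is generally accepted that the random variable follows a certain distribution; 3) the distribution is a good model of expert opinion, and high accuracy is not required; 4) one wants a distribution with a long tail (extending beyond the extremes of the observed values). In all other situations, the author recommends nonparametric distributions. So I made the following plan:

1. First, learn how to generate high-dimensional random vectors in R.
2. Learn how to do principal component analysis in R.
3. Learn how to do distribution fitting in R.
4. Learn how to estimate nonparametric distributions in R.
5. Apply a simple Bayes decision rule.

Then I wrote the following code:

# Step 1: Random vector generation: multivariate normal distribution.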

 

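Some important classifiers

I was interviewed today by people from AIG. I hope to work as a Risk Analyst, because I want to automate optimal decision-making. I feel I should review my knowledge of classifiers. Please have a look and tell me: have I made any mistakes?

Bayes classifier

So, let us first recall this theorem.

Theorem (Bayes). If $A, B$ are two random events (let us call $A$ the hypothesis), then if we know the ratio $\frac{P(A)}{P(B)}$ of the probabilities $P(A), P(B)$ together with the conditional probability $P(B|A)$, and if the random event $B$ has already occurred, we can compute the probability $P(A|B)$ of the hypothesis $A$.

$$P(A|B) = P(B|A) \frac{P(A)}{P(B)}.$$

The proof of this theorem relies on the definition of conditional probability: $P(X|Y) \overset{def}{=} \frac{P(X \cap Y)}{P(Y)}$. Substituting this definition into the statement of Bayes' theorem gives: $$\frac{P(A \cap B)}{P(B)} = \frac{P(A \cap B)}{\color{red}{P(A)}} \frac{\color{red}{P(A)}}{P(B)}.$$

Cancelling $\color{red}{P(A)}$ here proves the equality.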
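The identity can also be checked numerically. The sketch below uses exact fractions and made-up probabilities (the three input values are assumptions chosen only for illustration) to compute $P(A|B)$ both directly from the definition and through Bayes' theorem:

```python
from fractions import Fraction

# Hypothetical probabilities, chosen only for illustration.
p_a_and_b = Fraction(1, 10)  # P(A ∩ B)
p_a = Fraction(1, 4)         # P(A)
p_b = Fraction(1, 5)         # P(B)

# Definition of conditional probability, applied in both directions.
p_b_given_a = p_a_and_b / p_a
p_a_given_b_direct = p_a_and_b / p_b

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
p_a_given_b_bayes = p_b_given_a * (p_a / p_b)

print(p_a_given_b_direct, p_a_given_b_bayes)
```

Both routes give the same value, because the factor $P(A)$ cancels exactly as in the proof.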

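Decision theory and risk analysis

We are surely all in pursuit of the most useful knowledge. "Useful" must mean "helping us achieve our goals". If we had to choose the one factor within our control that most affects our success, that factor would surely be our decisions, right? How good our decisions are is surely what affects our lives the most. With this in mind, I recently read a very simple book on decision theory called "The Ri9ht Decision". The book is very non-technical, and all of its truly useful content can be summarized in three sentences. However, it reminded me of some things I used to know about optimal decisions, risk analysis, and so on. I will first write about some important (useful) concepts I learned from this book, and then about similar concepts in risk analysis.

On decision theory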

The Freedom of People Lies in Free Automation


One of the greatest weak links for people's freedom is the connection of monetary system to life supplies: food supply, housing and transportation, and ultimately, energy supply. To free itself, the public has to automate, and own the means of automation of the supply of the goods needed for life. Mindey, 2010...