I have the following list of data frames:
import pandas as pd
rep1 = pd.DataFrame.from_items([('Probe', ['x', 'y', 'z']), ('Gene', ['foo', 'bar', 'qux']), ('RP1',[1.00,23.22,11.12])], orient='columns')
rep2 = pd.DataFrame.from_items([('Probe', ['x', 'y', 'w']), ('Gene', ['foo', 'bar', 'wux']), ('RP2',[11.33,31.25,22.12])], orient='columns')
rep3 = pd.DataFrame.from_items([('Probe', ['x', 'y', 'z']), ('Gene', ['foo', 'bar', 'qux'])], orient='columns')
tmp = []
tmp.append(rep1)
tmp.append(rep2)
tmp.append(rep3)
The list prints as follows:
In [35]: tmp
Out[35]:
[ Probe Gene RP1
0 x foo 1.00
1 y bar 23.22
2 z qux 11.12, Probe Gene RP2
0 x foo 11.33
1 y bar 31.25
2 w wux 22.12, Probe Gene
0 x foo
1 y bar
2 z qux]
Note the following:
- Each DF will contain 3 columns, but the last column can have a different name in each DF.
- rep3 contains no values in the 3rd column, so we'd like to discard it automatically.
- The row w wux only exists in rep2; we'd like to include it and give the value 0 for any other data frame that doesn't contain it.
What I want to do is perform an outer merge so that it produces the following result:
  Probe Gene    RP1    RP2
0     x  foo   1.00  11.33
1     y  bar  23.22  31.25
2     z  qux  11.12      0
3     w  wux      0  22.12
I tried this, but it doesn't work:
In [25]: reduce(pd.merge,how="outer",tmp)
File "<ipython-input-25-1b2a5f2dd378>", line 1
reduce(pd.merge,how="outer",tmp)
SyntaxError: non-keyword arg after keyword arg
What's the right way to do it?
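For reference, here is a minimal sketch of one way the requirements above could be met. It is not necessarily the only or best approach. It assumes Python 3 (where `reduce` must be imported from `functools`) and a recent pandas, so the frames are built with the plain `pd.DataFrame` constructor instead of `from_items`. The names `KEYS`, `with_values`, and `outer_merge` are made up for illustration. The idea is to wrap `pd.merge` in a small two-argument function, since `reduce` only passes positional arguments and cannot forward `how="outer"` itself.

```python
from functools import reduce

import pandas as pd

# Same data as in the question, built from dicts.
rep1 = pd.DataFrame({"Probe": ["x", "y", "z"],
                     "Gene": ["foo", "bar", "qux"],
                     "RP1": [1.00, 23.22, 11.12]})
rep2 = pd.DataFrame({"Probe": ["x", "y", "w"],
                     "Gene": ["foo", "bar", "wux"],
                     "RP2": [11.33, 31.25, 22.12]})
rep3 = pd.DataFrame({"Probe": ["x", "y", "z"],
                     "Gene": ["foo", "bar", "qux"]})
tmp = [rep1, rep2, rep3]

# Assumption: every frame shares these two key columns.
KEYS = ["Probe", "Gene"]

# Discard frames that carry no value column beyond the keys (rep3 here).
with_values = [df for df in tmp if len(df.columns) > len(KEYS)]


def outer_merge(left, right):
    """Two-argument wrapper so reduce() can apply an outer merge."""
    return pd.merge(left, right, on=KEYS, how="outer")


# Fold the remaining frames together, then replace missing values with 0.
merged = reduce(outer_merge, with_values).fillna(0)
print(merged)
```

The row order of `merged` may differ from the desired output shown above, because an outer merge can reorder the keys; if the order matters, it would need to be restored in a separate step.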