Python Pandas, A Function Will Be Applied To The Combinations Of The Elements In One Row Based On A Condition On The Other Row

It seems like there are similar questions, but I couldn't find a proper answer. Let's say this is my dataframe, which holds several observations for different car brands: df =

Solution 1:

UPDATE:

In [49]: x = pd.DataFrame(np.triu(squareform(pdist(df[['distance']], my_func))),
    ...:                  columns=df.Car.str.split('_').str[0],
    ...:                  index=df.Car.str.split('_').str[0]).replace(0, np.nan)
    ...:

In [50]: x[x.apply(lambda col: col.index != col.name)].max(axis=1).groupby(level=0).max()
Out[50]:
Car
BMW     197.0
Fiat      NaN
WW      221.0
dtype: float64

OLD answer:

IIUC you can do something like the following:

import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def my_func(x, y):
    # called by pdist once per pair of rows
    return 2*x + 3*y

x = pd.DataFrame(
    squareform(pdist(df[['distance']], my_func)),
    columns=df.Car.str.split('_').str[0],
    index=df.Car.str.split('_').str[0])

it produced:

In [269]: x
Out[269]:
Car     BMW    BMW    BMW     WW     WW   Fiat   Fiat
Car
BMW     0.0   95.0   86.0   92.0  131.0  119.0  167.0
BMW    95.0    0.0  116.0  122.0  161.0  149.0  197.0
BMW    86.0  116.0    0.0  116.0  155.0  143.0  191.0
WW     92.0  122.0  116.0    0.0  159.0  147.0  195.0
WW    131.0  161.0  155.0  159.0    0.0  173.0  221.0
Fiat  119.0  149.0  143.0  147.0  173.0    0.0  213.0
Fiat  167.0  197.0  191.0  195.0  221.0  213.0    0.0

excluding the same brand:

In [270]: x.apply(lambda col: col.index != col.name)
Out[270]:
Car     BMW    BMW    BMW     WW     WW   Fiat   Fiat
Car
BMW   False  False  False   True   True   True   True
BMW   False  False  False   True   True   True   True
BMW   False  False  False   True   True   True   True
WW     True   True   True  False  False   True   True
WW     True   True   True  False  False   True   True
Fiat   True   True   True   True   True  False  False
Fiat   True   True   True   True   True  False  False

In [273]: x[x.apply(lambda col: col.index != col.name)]
Out[273]:
Car     BMW    BMW    BMW     WW     WW   Fiat   Fiat
Car
BMW     NaN    NaN    NaN   92.0  131.0  119.0  167.0
BMW     NaN    NaN    NaN  122.0  161.0  149.0  197.0
BMW     NaN    NaN    NaN  116.0  155.0  143.0  191.0
WW     92.0  122.0  116.0    NaN    NaN  147.0  195.0
WW    131.0  161.0  155.0    NaN    NaN  173.0  221.0
Fiat  119.0  149.0  143.0  147.0  173.0    NaN    NaN
Fiat  167.0  197.0  191.0  195.0  221.0    NaN    NaN
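The same-brand mask can also be built without `apply`, by comparing the brand labels against themselves with NumPy broadcasting. This is only a sketch: the `brands` array is an assumption that mirrors the row and column labels printed above.

```python
import numpy as np
import pandas as pd

# Assumed brand labels, in the same order as the rows/columns shown above.
brands = np.array(["BMW", "BMW", "BMW", "WW", "WW", "Fiat", "Fiat"])

# brands[:, None] is a column vector, brands[None, :] a row vector;
# comparing them yields a square boolean matrix that is True
# wherever the row brand differs from the column brand.
mask = pd.DataFrame(brands[:, None] != brands[None, :],
                    index=brands, columns=brands)
```

Applying it as `x.where(mask.to_numpy())` should blank out the same-brand cells in the same way as the `apply` version.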

selecting maximum per row:

In [271]: x[x.apply(lambda col: col.index != col.name)].max(1)
Out[271]:
Car
BMW     167.0
BMW     197.0
BMW     191.0
WW      195.0
WW      221.0
Fiat    173.0
Fiat    221.0
dtype: float64

max per brand:

In [276]: x[x.apply(lambda col: col.index != col.name)].max(axis=1).groupby(level=0).max()
Out[276]:
Car
BMW     197.0
Fiat    221.0
WW      221.0
dtype: float64
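For reference, here is a self-contained sketch of the same idea that needs only NumPy and pandas. It swaps `pdist`/`squareform` for `np.triu_indices`, which visits the same `i < j` pairs. The sample frame is an assumption: the question's data is not shown, so the values are chosen to be consistent with the outputs printed above. `pairs`, `cross` and `both_sides` are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed sample data (the original frame is not shown); the values are
# picked so that the results agree with the outputs printed in this answer.
df = pd.DataFrame({
    "Car": ["BMW_1", "BMW_2", "BMW_3", "WW_1", "WW_2", "Fiat_1", "Fiat_2"],
    "distance": [10, 25, 22, 24, 37, 33, 49],
})

def my_func(x, y):
    return 2 * x + 3 * y

brand = df["Car"].str.split("_").str[0].to_numpy()
d = df["distance"].to_numpy()

# All index pairs with i < j (the strict upper triangle).
i, j = np.triu_indices(len(df), k=1)
pairs = pd.DataFrame({
    "brand_i": brand[i],
    "brand_j": brand[j],
    "value": my_func(d[i], d[j]),
})

# Keep only pairs whose two cars belong to different brands.
cross = pairs[pairs["brand_i"] != pairs["brand_j"]]

# Count each pair for both brands involved, then take the maximum per brand.
both_sides = pd.concat([
    cross.set_index("brand_i")["value"],
    cross.set_index("brand_j")["value"],
])
max_per_brand = both_sides.groupby(level=0).max()
```

Working on a long table of pairs avoids the duplicate row and column labels of the square matrix, at the cost of concatenating the pairs twice so that each brand sees every pair it takes part in.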

Solution 2:

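import numpy as np
import pandas as pd

# index pairs (i, j) to evaluate
i, j = np.tril_indices(len(df), 1)

def my_func(x, y):
    z = 2 * x + 3 * y
    return z

d = df.distance.values
c = df.Car.values
# one value per pair, indexed by the two car labels
s = pd.Series(my_func(d[i], d[j]), [c[i], c[j]])

def test_name(df):
    # keep a pair only when the two brand prefixes differ
    name = df.index[0]
    n1, n2 = map(lambda x: x.split('_')[0], name)
    return n1 != n2

s.groupby(level=[0, 1]).filter(test_name).groupby(level=1).apply(list)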

BMW_1       [78, 104, 96, 128]
BMW_2     [123, 149, 141, 173]
BMW_3     [114, 140, 132, 164]
Fiat_1                   [173]
WW_1           [116, 138, 170]
WW_2                [177, 209]
dtype: object
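A variant of the same approach, offered as a sketch: it replaces the `groupby(...).filter(test_name)` step with a boolean mask over the index pairs and keeps the `np.tril_indices(len(df), 1)` call unchanged. The sample frame is an assumption (the question's data is not shown), with values chosen to be consistent with the output above; `brand`, `keep` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed sample data (the original frame is not shown).
df = pd.DataFrame({
    "Car": ["BMW_1", "BMW_2", "BMW_3", "WW_1", "WW_2", "Fiat_1", "Fiat_2"],
    "distance": [10, 25, 22, 24, 37, 33, 49],
})

def my_func(x, y):
    return 2 * x + 3 * y

d = df["distance"].to_numpy()
c = df["Car"].to_numpy()
brand = df["Car"].str.split("_").str[0].to_numpy()

# Same index pairs as in the answer above.
i, j = np.tril_indices(len(df), 1)

# True where the two cars of a pair belong to different brands.
keep = brand[i] != brand[j]

# Evaluate the function on the kept pairs only, labelled by the second car.
s = pd.Series(my_func(d[i[keep]], d[j[keep]]), index=c[j[keep]])
result = s.groupby(level=0).apply(list)
```

Filtering on the index arrays before calling `my_func` means the function is never evaluated for same-brand pairs.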
