How To Find Duplicate Words In A Line Using Pandas?
Here is the sample JSON data:
id  opened_date         title                                         exposure  state
1   06/11/2014 9:28 AM  Device rebooted and crashed with error 0x024  critical  open
2   06/11/2014 7:12 AM
Solution 1:
You can use loc to select rows by a condition built with str.contains and the parameter case=False. Finally, if you need a list, use tolist:
li = ['Sensor', '0x024']
for i in li:
    print(df.loc[df['title'].str.contains(i, case=False), 'id'].tolist())
[3, 4]
[1, 4]
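For a version of the snippet above that runs on its own, here is a minimal sketch. The original data is only partly shown, so the titles for ids 2 to 4 are invented for illustration and chosen so that the matches agree with the output above. The helper name ids_containing is also hypothetical.

```python
import pandas as pd

# Hypothetical sample data: only the first title comes from the question;
# the other three are made up so the example can run.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "title": [
        "Device rebooted and crashed with error 0x024",
        "Application failed to start",
        "Sensor reading out of range",
        "sensor crashed with error 0x024",
    ],
})


def ids_containing(frame, word):
    """Return the ids whose title contains `word`, ignoring case."""
    # str.contains builds a boolean mask; loc keeps the matching rows
    mask = frame["title"].str.contains(word, case=False)
    return frame.loc[mask, "id"].tolist()


for word in ["Sensor", "0x024"]:
    print(word, ids_containing(df, word))
```

With this made-up data the loop prints [3, 4] for Sensor and [1, 4] for 0x024.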
To store the results, you can use a dict comprehension:
dfs = { i: df.loc[df['title'].str.contains(i, case=False),'id'].tolist() for i in li }
print (dfs['Sensor'])
[3, 4]
print (dfs['0x024'])
[1, 4]
If you need a function, try get_id:
def get_id(id):
    ids = df.loc[df['title'].str.contains(id, case=False), 'id'].tolist()
    # parentheses let the concatenated string span several lines
    return ("Input String = %s : Output = ID " % id +
            " and ".join(str(x) for x in ids) +
            " has '%s' in it." % id)

print(get_id('Sensor'))
Input String = Sensor : Output = ID 3 and 4 has 'Sensor' in it.
print(get_id('0x024'))
Input String = 0x024 : Output = ID 1 and 4 has '0x024' in it.
EDIT by comment:
Matching several words at once is more complicated, because the individual conditions have to be combined with a logical and:
import numpy as np

def get_multiple_id(id):
    # split the input string and create a list of boolean Series, one per word
    ids1 = [df['title'].str.contains(x, case=False) for x in id.split()]
    # http://stackoverflow.com/a/20528566/2901002
    cond = np.logical_and.reduce(ids1)
    ids = df.loc[cond, 'id'].tolist()
    return ("Input String = '%s' : Output = ID " % id +
            ' and '.join(str(x) for x in ids) +
            " has '%s' in it." % id)

print(get_multiple_id('0x024 Sensor'))
Input String = '0x024 Sensor' : Output = ID 4 has '0x024 Sensor' in it.
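To make the logical and step easier to follow, here is a small self-contained sketch of combining one boolean mask per word with np.logical_and.reduce. The DataFrame is hypothetical (only the first title comes from the question), and the function name ids_with_all_words is an illustrative choice, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: titles for ids 2 to 4 are invented.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "title": [
        "Device rebooted and crashed with error 0x024",
        "Application failed to start",
        "Sensor reading out of range",
        "sensor crashed with error 0x024",
    ],
})


def ids_with_all_words(frame, words):
    """Return ids whose title contains every whitespace-separated word."""
    # One boolean Series per word
    masks = [frame["title"].str.contains(w, case=False) for w in words.split()]
    # Element-wise and across all masks gives a single boolean array
    cond = np.logical_and.reduce(masks)
    return frame.loc[cond, "id"].tolist()


print(ids_with_all_words(df, "0x024 Sensor"))
```

With this data only id 4 contains both words, so the result is [4]. The sketch assumes the input holds at least one word.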
With a logical or it is easier, because or in a regular expression (re) is |, so you can use the pattern 0x024|Sensor:
def get_multiple_id(id):
    # replace spaces with | so the words form a regex alternation
    ids = df.loc[df['title'].str.contains(id.replace(' ', '|'), case=False), 'id'].tolist()
    return ("Input String = '%s' : Output = ID " % id +
            ' and '.join(str(x) for x in ids) +
            " has '%s' in it." % id)

print(get_multiple_id('0x024 Sensor'))
Input String = '0x024 Sensor' : Output = ID 1 and 3 and 4 has '0x024 Sensor' in it.
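Because str.contains treats its pattern as a regular expression by default, search words containing characters such as . or ( could match more than intended or raise an error. A possible variation, sketched below under the assumption that the words should be matched literally, escapes each word with re.escape before joining them with |. The sample data and the name ids_with_any_word are hypothetical.

```python
import re

import pandas as pd

# Hypothetical sample data: titles for ids 2 to 4 are invented.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "title": [
        "Device rebooted and crashed with error 0x024",
        "Application failed to start",
        "Sensor reading out of range",
        "sensor crashed with error 0x024",
    ],
})


def ids_with_any_word(frame, words):
    """Return ids whose title contains at least one of the words."""
    # Escape each word so regex metacharacters are matched literally,
    # then join with | (regex alternation, i.e. logical or)
    pattern = "|".join(re.escape(w) for w in words.split())
    mask = frame["title"].str.contains(pattern, case=False)
    return frame.loc[mask, "id"].tolist()


print(ids_with_any_word(df, "0x024 Sensor"))
```

With this data the call returns [1, 3, 4], matching the output of the or version above.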