Replace Any String In Columns With 1
Solution 1:
You can do this with DataFrame.replace() and a regular expression:
In [14]: df
Out[14]:
fol T_opp T_Dir T_Enh
0 1 0 0 vo
1 2 vr 0 0
2 2 0 0 0
3 3 0 bt 0
In [15]: df.replace(regex={'vr|bt|vo': '1'}).convert_objects(convert_numeric=True)
Out[15]:
fol T_opp T_Dir T_Enh
0 1 0 0 1
1 2 1 0 0
2 2 0 0 0
3 3 0 1 0
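Note that convert_objects has since been deprecated and removed from pandas. A minimal sketch of the same idea on a current version, with pd.to_numeric doing the coercion (the example frame is rebuilt here so the snippet runs on its own):

import pandas as pd

# rebuild the example frame from this answer
df = pd.DataFrame({'fol':   [1, 2, 2, 3],
                   'T_opp': ['0', 'vr', '0', '0'],
                   'T_Dir': ['0', '0', '0', 'bt'],
                   'T_Enh': ['vo', '0', '0', '0']})

# replace the strings, then coerce the (still object-dtype) columns to numbers
out = df.replace(regex={'vr|bt|vo': '1'}).apply(pd.to_numeric)
print(out.dtypes)  # every column ends up int64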
If for some reason you're against dicts, you can be very explicit about it too:
In [19]: df.replace(regex='vr|bt|vo', value='1')
Out[19]:
fol T_opp T_Dir T_Enh
fol T_opp T_Dir T_Enh
0 1 0 0 1
1 2 1 0 0
2 2 0 0 0
3 3 0 1 0
But wait, there's more! You can specify the columns you want to operate on by passing a nested dict (the keys cannot be regular expressions; well, they can, but it won't do anything except return the frame):
In [22]: df.replace({'T_opp': {'vr': 1}, 'T_Dir': {'bt': 1}})
Out[22]:
fol T_opp T_Dir T_Enh
0 1 0 0 vo
1 2 1 0 0
2 2 0 0 0
3 3 0 1 0
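The same nested-dict form works for as many columns, and as many values per column, as you need. A small sketch that also covers T_Enh, rebuilt so it runs on its own (an illustration, not part of the original session):

import pandas as pd

df = pd.DataFrame({'fol':   [1, 2, 2, 3],
                   'T_opp': ['0', 'vr', '0', '0'],
                   'T_Dir': ['0', '0', '0', 'bt'],
                   'T_Enh': ['vo', '0', '0', '0']})

# outer keys pick the columns, inner dicts map old value -> new value
out = df.replace({'T_opp': {'vr': 1}, 'T_Dir': {'bt': 1}, 'T_Enh': {'vo': 1}})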
EDIT: Since you want to replace all strings with the number 1 (as per your comments below), do:
In [23]: df.replace(regex={r'\D+': 1})
Out[23]:
fol T_opp T_Dir T_Enh
fol T_opp T_Dir T_Enh
0 1 0 0 1
1 2 1 0 0
2 2 0 0 0
3 3 0 1 0
EDIT: Microbenchmarks might be useful here:
Andy's method (faster):
In [11]: timeit df.convert_objects(convert_numeric=True).fillna(1)
1000 loops, best of 3: 590 µs per loop
DataFrame.replace():
In [46]: timeit df.replace(regex={r'\D': 1})
1000 loops, best of 3: 801 µs per loop
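Those timings come from an old pandas, so treat them as historical. If you want to re-run a rough comparison on a current version, a sketch with the standard timeit module (convert_objects is swapped for pd.to_numeric here, and the exact numbers will depend on your machine and pandas version):

import timeit
import pandas as pd

df = pd.DataFrame({'fol':   [1, 2, 2, 3],
                   'T_opp': ['0', 'vr', '0', '0'],
                   'T_Dir': ['0', '0', '0', 'bt'],
                   'T_Enh': ['vo', '0', '0', '0']})

cases = {
    'to_numeric + fillna': lambda: df.apply(pd.to_numeric, errors='coerce').fillna(1),
    'replace(regex=...)':  lambda: df.replace(regex={r'\D': 1}),
}
for name, fn in cases.items():
    t = timeit.timeit(fn, number=1000)
    print(f'{name}: {t / 1000 * 1e6:.0f} µs per loop')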
If you have columns containing strings that you want to keep:
In [45]: cols_to_replace = 'T_opp', 'T_Dir', 'T_Enh'
In [46]: d = dict(zip(cols_to_replace, [{r'\D': 1}] * len(cols_to_replace)))
In [47]: d
Out[47]: {'T_Dir': {'\\D': 1}, 'T_Enh': {'\\D': 1}, 'T_opp': {'\\D': 1}}
In [48]: df.replace(d)
Out[48]:
fol T_opp T_Dir T_Enh Activity
0 1 0 0 1 hf
1 2 1 0 0 hx
2 2 0 0 0 fe
3 3 0 1 0 rn
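A dict comprehension is a slightly tidier way to build that mapping. A sketch rebuilt to run on its own; note that on current pandas the nested dict needs to go through the regex= keyword (or regex=True) so the inner keys are treated as patterns:

import pandas as pd

df = pd.DataFrame({'fol':      [1, 2, 2, 3],
                   'T_opp':    ['0', 'vr', '0', '0'],
                   'T_Dir':    ['0', '0', '0', 'bt'],
                   'T_Enh':    ['vo', '0', '0', '0'],
                   'Activity': ['hf', 'hx', 'fe', 'rn']})

cols_to_replace = ['T_opp', 'T_Dir', 'T_Enh']

# one {pattern: replacement} mapping per column we want to touch
d = {col: {r'\D': 1} for col in cols_to_replace}

# regex= makes the inner keys regular expressions; Activity is left alone
out = df.replace(regex=d)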
Yet another way is to use filter and join the results together after replacement:
In [10]: df
Out[10]:
fol T_opp T_Dir T_Enh Activity
0 1 0 0 vo hf
1 2 vr 0 0 hx
2 2 0 0 0 fe
3 3 0 bt 0 rn
In [11]: filtered = df.filter(regex='T_.+')
In [12]: res = filtered.replace({'\D': 1})
In [13]: res
Out[13]:
T_opp T_Dir T_Enh
0 0 0 1
1 1 0 0
2 0 0 0
3 0 1 0
In [14]: not_filtered = df[df.columns - filtered.columns]
In [15]: not_filtered
Out[15]:
Activity fol
0 hf 1
1 hx 2
2 fe 2
3 rn 3
In [16]: res.join(not_filtered)
Out[16]:
T_opp T_Dir T_Enh Activity fol
0 0 0 1 hf 1
1 1 0 0 hx 2
2 0 0 0 fe 2
3 0 1 0 rn 3
Note that the original order of the columns is not retained. You can use regular expressions to search for column names, which might be more useful than explicitly constructing a list if you have many columns to keep. The - operator performs set difference when used with two Index objects (df.columns is an Index).
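On newer pandas the - set difference on Index objects has been removed in favour of Index.difference, and reindexing by df.columns gets the original column order back. A sketch under those assumptions, rebuilt so it runs on its own:

import pandas as pd

df = pd.DataFrame({'fol':      [1, 2, 2, 3],
                   'T_opp':    ['0', 'vr', '0', '0'],
                   'T_Dir':    ['0', '0', '0', 'bt'],
                   'T_Enh':    ['vo', '0', '0', '0'],
                   'Activity': ['hf', 'hx', 'fe', 'rn']})

filtered = df.filter(regex='T_.+')
res = filtered.replace(regex={r'\D': 1})

# Index.difference stands in for the old `-` set-difference operator
not_filtered = df[df.columns.difference(filtered.columns)]

# reindexing by df.columns restores the original column order
out = res.join(not_filtered)[df.columns]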
You'll probably need to call DataFrame.convert_objects() afterward unless your columns are mixed string/integer columns. My solution assumes they are all strings, so I call convert_objects() to coerce the values to int dtype.
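On current pandas, pd.to_numeric plays the role convert_objects played here. A tiny standalone sketch of that coercion step (the frame mimics what the replacement leaves behind: object columns holding a mix of '0' strings and integer 1s):

import pandas as pd

# what the T_ columns look like right after the regex replacement
res = pd.DataFrame({'T_opp': ['0', 1, '0', '0'],
                    'T_Dir': ['0', '0', '0', 1],
                    'T_Enh': [1, '0', '0', '0']})

res = res.apply(pd.to_numeric)  # coerce each object column to int64
print(res.dtypes)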
Solution 2:
Another option is to do this the other way around, first convert to numeric:
In [11]: df.convert_objects(convert_numeric=True)
Out[11]:
fol T_opp T_Dir T_Enh Activity
0 1 0 0 NaN hf
1 2 NaN 0 0 hx
2 2 0 0 0 fe
3 3 0 NaN 0 rn
And then fill in the NaNs with 1:
In [12]: df.convert_objects(convert_numeric=True).fillna(1)
Out[12]:
fol T_opp T_Dir T_Enh Activity
0 1 0 0 1 hf
1 2 1 0 0 hx
2 2 0 0 0 fe
3 3 0 1 0 rn
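Since convert_objects is gone from current pandas, here is a sketch of the same convert-then-fill idea using pd.to_numeric, restricted to the T_ columns so the Activity strings are left untouched (selecting them with filter is my assumption here, not part of the original answer):

import pandas as pd

df = pd.DataFrame({'fol':      [1, 2, 2, 3],
                   'T_opp':    ['0', 'vr', '0', '0'],
                   'T_Dir':    ['0', '0', '0', 'bt'],
                   'T_Enh':    ['vo', '0', '0', '0'],
                   'Activity': ['hf', 'hx', 'fe', 'rn']})

out = df.copy()
t_cols = out.filter(regex='T_.+').columns

# coerce to numbers (strings become NaN), fill the NaNs with 1, go back to int
out[t_cols] = out[t_cols].apply(pd.to_numeric, errors='coerce').fillna(1).astype(int)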