Why Not Just Use `shell=true` In Subprocess.popen In Python?
Solution 1:
Using shell = True
can be a security risk if your input data comes from an untrusted source. E.g. what if the content of your mid
variable is "/dev/null; rm -rf /"
. This does not seem to be the case in your scenario, so I would not worry too much about it.
In your code you write the result of awk
directly to the filename in mid
. To debug the problem, you might want to use subprocess.check_output
and read the result from your awk
invocation in your python program.
cmd = """sort -n -r -k5 %s |
head -n 500|
awk 'OFS="\t"{{if($2-{1}>0){{print $1,$2-{1},$3+{1},$4,$5}}}}'""".format(summit, top_count)
subprocess.check_call(cmd, shell=True, stdout=file)
Solution 2:
(1) whether the "second way" will be slower than "first way"
Starting a new process is an expensive operation therefore there should not be a large difference between allowing the shell to parse the command line and start child processes and doing it yourself in Python. The only benchmark that matters is your code on your hardware. Measure it.
(2) if I have to write in "first way" anyway (because it's faster to write), how can I avoid the complain like broken pipe
The first "broken pipe" might be similar to: 'yes' reporting error with subprocess communicate(). Try the workaround I've provided there.
The second broken pipe you could fix by redirecting the pipeline stdout to the mid
file:
withopen(mid, 'wb') as file:
check_call(pipeline, shell=True, stdout=file)
It implements > {2}
in your command without the shell.
(3) what might be the most compelling reason that I shouldn't write in "first way"
if any of top_count
, extend
, mid
, summit
come from a source that is not completely under your control then you risk running an arbitrary command under your user.
plumbum
module provides both security and readability (measure time performance if it is important for you in this case):
from plumbum.cmd import awk, head, sort
awk_cmd = 'OFS="\t"{if($2-%s>0){print $1,$2-%s,$3+%s,$4,$5}}' % (extend/2,)*3
(sort["-n", "-r", "-k5", summit] | head["-n", "500"] | awk[awk_cmd] > mid)()
See, How do I use subprocess.Popen to connect multiple processes by pipes?
Solution 3:
It is unlikely to be any slower, but you can always test it with timeit to be sure. There are two good reasons not to do it the first way. The first is that while it may be marginally faster to type the first time, readability is greatly reduced, and Readability Counts. The second is that using shell=True
is a huge security risk, and should be avoided as a matter of principal.
Post a Comment for "Why Not Just Use `shell=true` In Subprocess.popen In Python?"