How to mass Search & Replace in text files.
William O'Higgins Witteman
william.ohiggins-H217xnMUJC0sA/PxXw9srA at public.gmane.org
Fri May 1 20:28:11 UTC 2009
On Fri, May 01, 2009 at 03:25:19PM -0400, Lance F. Squire wrote:
> Not finding a quick solution that worked in this case,
Sorry, work interferes :-)
> I have waded through the 154 files and cleaned them manually.
Drat, I'm late. For posterity, here is my quick-and-dirty solution,
which does work on my test data from your pastebin:
#!/usr/bin/python
"""
Strip the bad trojan horse junk out of an HTML file.
"""
import re, os, fnmatch

def stripick(string):
    """Return string with the injected script block after </head> removed."""
    firstbit = r"</head>\s*<script language=javascript><!--\s*"
    lastbit = r"\s*--></script>"
    badbit = r"\(function\(t.*?;"
    wholething = firstbit + badbit + lastbit
    pattern = re.compile(wholething)
    newstring = re.sub(pattern, r"</head>", string)
    return newstring

# Set the root of your recursive search here
top = "/home/willyyam/misc/python/cleanfiles"
for root, dirs, files in os.walk(top):
    for file in files:
        path = os.path.join(root, file)
        print(path)
        if fnmatch.fnmatch(path, "*.html"):
            fileobj = open(path, "r")
            filestring = fileobj.read()  # Get the file contents in memory
            fileobj.close()  # Close the file
            newfilestring = stripick(filestring)  # Clean the string
            # Open the file for writing, clobbering it
            fileobj = open(path, "w")
            fileobj.write(newfilestring)  # Write the new string into the file
            fileobj.close()  # Close the file
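
Since the script clobbers files in place, it may be worth checking the
pattern against a small sample before pointing it at a real tree. What
follows is only a sketch: the sample string is made up for illustration
(it is not the payload from the pastebin), and the names PATTERN,
infected, clean and multiline are hypothetical, not part of the script
above.

```python
import re

# Same regex pieces as stripick() above, repeated so this runs on its own.
# PATTERN is a hypothetical module-level name, not in the original script.
PATTERN = re.compile(
    r"</head>\s*<script language=javascript><!--\s*"
    r"\(function\(t.*?;"
    r"\s*--></script>"
)

def stripick(text):
    """Replace </head> plus the injected script block with a bare </head>."""
    return PATTERN.sub("</head>", text)

# Made-up sample input; the real injected code will look different.
infected = (
    "<html><head><title>t</title></head>\n"
    "<script language=javascript><!--\n"
    "(function(t){var x=1})(2);\n"
    "--></script>\n"
    "<body>hello</body></html>"
)
clean = "<html><head><title>t</title></head>\n<body>hello</body></html>"

# The injected block is removed, and an already clean file is left alone.
assert stripick(infected) == clean
assert stripick(clean) == clean

# Without re.DOTALL, "." does not match a newline, so a payload that
# spans several lines is NOT matched and the text comes back unchanged.
multiline = infected.replace("{var x=1}", "{var x=1\n}")
assert stripick(multiline) == multiline

# Compiling the same pattern with re.DOTALL lets ".*?" cross line breaks.
dotall = re.compile(PATTERN.pattern, re.DOTALL)
assert dotall.sub("</head>", multiline) == clean
```

If the junk in your files is all on one line, as it was in my test data,
the plain pattern is enough. The re.DOTALL variant is only a guess at
what a multi-line payload would need, so try it on copies first.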
--
yours,
William