Nuxeo mailing list archives
[Nuxeo-tickets] [Nuxeo Repository] #1760: PortalTransforms: faster scrubHTML
Nuxeo Repository
trac at nuxeo.com
Tue Oct 17 17:32:48 CEST 2006
#1760: PortalTransforms: faster scrubHTML
------------------------------+---------------------------------------------
 Reporter:  ybastide          |       Owner:  trac
     Type:  enhancement       |      Status:  new
 Priority:  P2                |   Milestone:
Component:  PortalTransforms  |     Version:  TRUNK
 Severity:  normal            |    Keywords:  PortalTransforms lxml html
------------------------------+---------------------------------------------
Hi,
PortalTransforms.libtransforms.utils.scrubHTML is a function that cleans
HTML: it removes unknown tags and raises an exception if scripts, objects
and the like are present. It uses sgmllib's SGMLParser and is slow
as a dog.
Here's a much faster version using lxml.
Comments?
yves
--
Ticket URL: <http://svn.nuxeo.org/trac/pub/ticket/1760>
Nuxeo Repository <http://www.cps-project.org/>