[Nuxeo-tickets] [Nuxeo Repository] #1760: PortalTransforms: faster scrubHTML

Nuxeo Repository trac at nuxeo.com
Tue Oct 17 17:32:48 CEST 2006


#1760: PortalTransforms: faster scrubHTML
------------------------------+---------------------------------------------
 Reporter:  ybastide          |       Owner:  trac                      
     Type:  enhancement       |      Status:  new                       
 Priority:  P2                |   Milestone:                            
Component:  PortalTransforms  |     Version:  TRUNK                     
 Severity:  normal            |    Keywords:  PortalTransforms lxml html
------------------------------+---------------------------------------------
 Hi,

 !PortalTransforms.libtransforms.utils.scrubHTML is a function for cleaning
 HTML: it removes unknown tags and raises an exception if scripts, objects
 and the like are present. It is built on sgmllib's SGMLParser, a
 pure-Python parser, and is very slow.

 Here's a much faster version using lxml.

 Comments?

 yves

-- 
Ticket URL: <http://svn.nuxeo.org/trac/pub/ticket/1760>
Nuxeo Repository <http://www.cps-project.org/>