When rewriting image tags, repoze.bitblt removes the doctype of any (X)HTML
content (cf. attached test). It should not.
I have found a fix for XHTML content (cf. attached patch) by changing how the
content is parsed. However, the bug persists for HTML content (when 'try_html'
is not enforced). I tried the same technique as for XHTML (using
lxml.etree.parse() instead of lxml.html.document_fromstring()), but the
transformed content then always includes a doctype, even when the original had
none. We could perhaps remove it when it was not present in the original
content, but that starts to be more complicated than it should be... (I admit
that I did not dig very deep into lxml.)
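
For the record, here is a rough sketch of the "remove the doctype when the
original had none" idea. It is not part of the attached patch: the helper names
(has_doctype, drop_added_doctype) are made up for illustration, it works on the
serialized strings rather than through lxml, and the regular expression is an
assumption that only covers simple doctypes (no internal subset containing '>').

```python
import re

# Hypothetical helpers, not part of repoze.bitblt or lxml.
# Assumption: the doctype contains no '>' before its closing bracket.
DOCTYPE_RE = re.compile(r'<!DOCTYPE[^>]*>\s*', re.IGNORECASE)


def has_doctype(content):
    """Return True if the markup contains a doctype declaration."""
    return DOCTYPE_RE.search(content) is not None


def drop_added_doctype(original, transformed):
    """Strip the doctype from `transformed` if `original` had none.

    If the original content already had a doctype, the transformed
    content is returned untouched.
    """
    if has_doctype(original):
        return transformed
    # Remove only the first doctype, plus the whitespace that follows it.
    return DOCTYPE_RE.sub('', transformed, count=1)
```

This would be called on the output of the transform, with the untouched input
as the first argument. It is string surgery on top of the parser, which is
exactly the extra complication mentioned above.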
In a nutshell, the attached patch keeps the doctype for XHTML content. For
HTML content, the current (buggy) behaviour is unchanged: the doctype is still
removed. Malthe (or anyone else who uses this package), if you do not object,
I'll commit the patch.