Interview with Edward Z. Yang
In case you didn't know, Edward Z. Yang is the man behind HTML Purifier, a highly effective whitelist filter that prevents Cross Site Scripting. His name is worth remembering, by the way.
A couple of days ago I thought it would be a good idea to interview him about his product in order to promote it. Funnily enough, Chris Shiflett apparently had the same idea.
Thanks to Edward for answering my questions. I hope you enjoy it as much as I did.
1) Could you tell my readers a little about yourself?
Hi, my name is Edward Z. Yang, and I am responsible for bringing HTML Purifier into this world. I'm a PHP programmer, and you'll also find me helping other people with their questions on the DevNetwork forums and contributing to PHP's documentation.
2) What is HTML Purifier and what's so special about it?
HTML Purifier is a standards-compliant HTML filter. What makes it special is the keyword "standards-compliant": HTML Purifier operates on the principle that if you implement the HTML spec, you can create a foolproof filter. HTML Purifier knows everything there is to know about HTML: valid attributes, content models, CSS, chameleon tags, etc. Plus, it attempts to fix poorly written HTML rather than emit cryptic error messages.
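To make the whitelist principle concrete, here is a toy sketch in Python using the standard library's `html.parser`. It is not HTML Purifier's code or API: the `ALLOWED` table, the `SAFE_SCHEMES` tuple and the `purify()` helper are hypothetical, and the sketch ignores content models, CSS and everything else the real library validates. It only shows the core rule: anything not explicitly permitted is dropped, and open tags are closed so the output stays well-formed.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: allowed tag -> set of allowed attributes.
ALLOWED = {
    "p": set(), "b": set(), "i": set(), "em": set(), "strong": set(),
    "a": {"href", "title"},
}
# Hypothetical URL scheme whitelist (relative URLs are rejected to keep it simple).
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []         # pieces of the cleaned document
        self.stack = []       # allowed tags currently open
        self.skip_depth = 0   # > 0 while inside <script>/<style>, whose text is dropped

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if tag not in ALLOWED:
            return  # unknown tag: drop the tag itself but keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # attribute not on the whitelist
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if tag in self.stack:
            # Close any tags opened after this one so nesting stays valid.
            while self.stack:
                open_tag = self.stack.pop()
                self.out.append(f"</{open_tag}>")
                if open_tag == tag:
                    break

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()
        while self.stack:  # close anything the input left open
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def purify(html: str) -> str:
    f = WhitelistFilter()
    f.feed(html)
    return f.result()


print(purify('<p onclick="x()">Hi <script>alert(1)</script>'
             '<a href="javascript:alert(1)">link</a></p>'))
# -> <p>Hi <a>link</a></p>
```

The important design choice is the default: an attack vector nobody anticipated is rejected simply because it is not on the list, whereas a blacklist has to know about it in advance.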
3) What is technically required to use it?
HTML Purifier is written in PHP and has been tested with PHP 4.3.2 or higher. I have, however, had individuals contact me about interfacing with the library from other programming languages. While no port of HTML Purifier currently exists (last I heard, someone was attempting a Java port, but I am not sure if it ever came to fruition), it is not difficult to write a command-line wrapper script that calls HTML Purifier.
4) When did you start working on your product and what was your intention at that time?
The concept of HTML Purifier emerged in the spring of 2006. However, I had been toying around with the idea as far back as 2005; originally, I needed some way to filter HTML for a literature management system (now defunct). One class survived from that original body of code: MarkupLexer, which was essentially a token-based HTML parser. Everything else followed from there.
5) What kind of feedback did you receive after the first release?
The first public beta was released on August 16, 2006; the 1.0.0 release followed shortly after on September 1st. I vaguely remember the response being lukewarm. The original pitch went to members of the DevNetwork forum, who loved the library, but I didn't do very much publicity: I submitted a Digg story which got 7 diggs. (2.0.0 didn't do much better there, but I diversified, and HTML Purifier was a hit over at DZone and del.icio.us.)
6) Who actually uses the Purifier today?
The four projects I know of that use HTML Purifier by default are BitWeaver, PHProjekt, Lilina and TikiWiki (BitWeaver hasn't officially released the HTML Purifier-enabled version yet). We also have extensions available for Drupal, WordPress, and MODx. And then, of course, there are developers all over the world using HTML Purifier; I've talked to French, Japanese, Chinese and German users.
7) You have a comparison between HTML Purifier and similar filtering solutions on your website. Could you summarize the results?
In a nutshell, the comparison states that HTML Purifier is better than the rest. ::laughs:: Of course, no one would believe me if I said just that, so the document is pretty lengthy. Most of the filters use blacklists, which are fundamentally insecure, and I've also noticed that most of them don't seem to be actively maintained, which is a big no-no in combination with blacklists. None of them can offer standards-compliance, although SafeHTMLChecker comes close, and none of them combine standards-compliance with an attempt to correct poorly written HTML.
8) You recently released versions 2.0.0 and 2.0.1. Could you describe the major improvements over previous versions?
HTML Purifier 2.0 adds the Tidy module (nothing to do with HTMLTidy, by the way) and the Advanced API, which together effectively make HTML Purifier feature-complete with regard to HTML filtering. There's a little more work to be done on cleaning up MSWord HTML, but users have all the facilities they need to implement custom HTML tags and attributes. 2.0.1 is your average stability/maintenance release, but it also sneaks in a number of experimental features such as error reporting and auto-paragraphing.
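The weakness of blacklists is easy to demonstrate. The hypothetical `naive_blacklist()` below is a Python sketch, not any of the filters from the comparison: it strips `<script>` tags with a single regular-expression pass. Nested input reassembles a working tag once the inner copies are removed, and an attribute-based vector the blacklist never thought of passes straight through.

```python
import re

# Hypothetical blacklist filter: remove opening and closing <script> tags
# in a single, non-recursive pass.
SCRIPT_TAG = re.compile(r"</?script[^>]*>", re.IGNORECASE)


def naive_blacklist(html: str) -> str:
    return SCRIPT_TAG.sub("", html)


# Nested input: stripping the inner tags glues the outer fragments together.
print(naive_blacklist("<scr<script>ipt>alert(1)</scr</script>ipt>"))
# -> <script>alert(1)</script>

# A vector that is not on the list is left untouched.
print(naive_blacklist('<img src="x" onerror="alert(1)">'))
# -> <img src="x" onerror="alert(1)">
```

Every bypass like this has to be discovered and patched individually, which is why an unmaintained blacklist filter is especially dangerous.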
9) What do you think about the present status of Web application security in general?
It's still far too easy to do the wrong thing. This is quite evident when I help out newbies at DevNetwork: people come in because their code doesn't work, and we end up also fixing SQL injections, XSS vectors, and poor coding in general. But things are getting better: there's more literature out there on security, and general awareness of the issue has been rising.
10) Is there anything else you'd like to say?
For more information, you can check out the library at its website, or poke at it in the demo.
Thanks!
