Sanitize HTML

We've allowed HTML in paragraphs for far too long. The original wiki disallowed all tags and added its own markup for bullets, italic and bold. Can we find something simpler? Maybe several things? issue

Notes

allow pasting code that contains a raw <, escaping it properly for display.

restrict markup to a subset: h3, br, li, i and b tags.

create a paragraph type for more generous HTML, such as embed tags, perhaps for paragraphs that start with an angle bracket.

Consider additional protections regarding CSRF attacks.
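
One way the first two notes could fit together is sketched below in Ruby, using only the standard library. It is an illustration and not the wiki's implementation: the names ALLOWED_TAGS, ALLOWED_TAG_PATTERN and sanitize_paragraph are hypothetical. The idea is to escape everything first, so that a pasted < always shows up as text, and then restore only attribute-free tags from the subset.

```ruby
require 'cgi'

# Hypothetical whitelist taken from the notes; attributes are never allowed.
ALLOWED_TAGS = %w[h3 br li i b].freeze

# Matches an escaped, attribute-free tag from the whitelist,
# e.g. "&lt;b&gt;", "&lt;/li&gt;" or "&lt;br/&gt;".
ALLOWED_TAG_PATTERN = %r{&lt;(/?)(#{ALLOWED_TAGS.join('|')})\s*(/?)&gt;}i

# Escape all markup, then turn whitelisted tags back into real tags.
# Anything else, including whitelisted tags that carry attributes,
# stays escaped and is displayed as plain text.
def sanitize_paragraph(text)
  escaped = CGI.escapeHTML(text)
  escaped.gsub(ALLOWED_TAG_PATTERN) { "<#{$1}#{$2.downcase}#{$3}>" }
end

puts sanitize_paragraph('<b>bold</b> and <script>alert(1)</script>')
puts sanitize_paragraph('if a < b then <i>swap</i>')
```

This sketch does not balance tags, so a stray closing tag from the subset would pass through unchanged. A reviewed library such as those listed under Resources is likely a better choice for anything beyond this small subset.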

Resources

Stack Overflow has some good suggestions.

Google's Caja has the most carefully reviewed solution.

Ruby sanitizer has options for permitting specific embed tags. github

Ruby gem for massaging HTML, including sanitization. github

require 'html_massage'

def massage_html(html, url)
  # Start from the gem's defaults, then narrow the permitted elements and attributes.
  sanitize_options = HtmlMassage::DEFAULT_SANITIZE_OPTIONS.merge(
    :elements => %w[ a img h1 h2 h3 hr table th tr td em strong b i ],
    :attributes => {
      :all => [],
      'a' => %w[ href ],
      'img' => %w[ src alt ],
    }
  )
  begin
    HtmlMassage.html html,
      :source_url => url,
      :links => :absolute,
      :images => :absolute,
      :sanitize => sanitize_options
  rescue Encoding::CompatibilityError
    # Returns nil when the input has an incompatible encoding.
    return
  end
end