We've allowed HTML in paragraphs for far too long. The original wiki disallowed all tags and added its own markup for bullets, italics and bold. Can we find something simpler? Maybe several things? issue
Notes
Paste code with < properly escaped, so it displays as text instead of being parsed as a tag.
Restrict markup to a subset: h3, br, li, i and b tags.
Create a paragraph type for more generous HTML, such as embed tags, perhaps one whose text starts with an angle bracket.
Consider additional protections regarding CSRF attacks.
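One possible way to combine the first two notes is to escape everything and then restore only the bare tags in the subset. The sketch below is an illustration rather than the wiki's actual code: the names ALLOWED_TAGS, ALLOWED_TAG_PATTERN and escape_with_subset are hypothetical, and only Ruby's standard CGI.escapeHTML is assumed. Tags carrying attributes stay escaped, so things like onclick handlers never reach the page.

```ruby
require 'cgi'

# Hypothetical names for illustration; the subset comes from the notes above.
ALLOWED_TAGS = %w[h3 br li i b].freeze

# Matches an escaped opening or closing tag from the subset, with no attributes.
ALLOWED_TAG_PATTERN = /&lt;(\/?)(#{ALLOWED_TAGS.join('|')})&gt;/i

def escape_with_subset(text)
  # Step 1: escape everything, so pasted code with < shows up as text.
  escaped = CGI.escapeHTML(text)
  # Step 2: turn only the bare, allowed tags back into real markup.
  escaped.gsub(ALLOWED_TAG_PATTERN) { "<#{$1}#{$2.downcase}>" }
end

puts escape_with_subset('<b>bold</b> and <script>alert(1)</script>')
```

Escaping first and re-enabling second means anything the pattern does not recognize stays inert by default. A parser-based sanitizer, like those listed under Resources, would be the sturdier choice for the more generous paragraph type.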
Resources
Stack Overflow has some good suggestions.
Google's Caja has the most carefully reviewed solution.
Ruby sanitizer has options for permitting specific embed tags. github
Ruby gem for massaging HTML, including sanitization. github
def massage_html(html, url)
  sanitize_options = HtmlMassage::DEFAULT_SANITIZE_OPTIONS.merge(
    :elements => %w[ a img h1 h2 h3 hr table th tr td em strong b i ],
    :attributes => {
      :all => [],
      'a' => %w[ href ],
      'img' => %w[ src alt ],
    }
  )
  begin
    HtmlMassage.html html,
      :source_url => url,
      :links => :absolute,
      :images => :absolute,
      :sanitize => sanitize_options
  rescue Encoding::CompatibilityError
    return
  end
end