Along with the upgrade to Rails 3, there have been significant changes and improvements to our HTML sanitizing and parsing in Release 0.8.2. These changes should make things clearer for authors and much faster for readers!
Here is a quick breakdown for those who just want the highlights, followed by a more detailed explanation of what was changed and how it all works.
Highlights
- Blank lines and carriage returns will now be converted to paragraph (<p></p>) and line-break (<br />) tags in the text editor.
- The text will automatically be parsed and "cleaned up" -- any tags that were left open get closed, any mis-nested tags get fixed, etc.
- The text will be sanitized, to remove any elements that are potentially harmful to our server.
- This change fixes the known bug where switching from HTML mode to Rich Text mode causes all your paragraphs to disappear. (Yay!)
- This change will also allow users to embed video from YouTube, Vimeo, blip.tv, Dailymotion, Viddler, Metacafe, and 4shared. (Yay!)
What's Behind the Scenes
The new back end for content works in three steps.
First, a new paragraph-adder converts blank lines and carriage returns into paragraph tags (<p></p>) and break tags (<br />), based on a few simple rules:
- A blank line left between two pieces of text makes them separate paragraphs:
Here is paragraph one.

Here is paragraph two.
will become:
<p>Here is paragraph one.</p>
<p>Here is paragraph two.</p>
- A carriage return or newline in the middle of text adds a break tag:
Here is a line
with a carriage return in the middle.
will become:
Here is a line <br />
with a carriage return in the middle.
- Extra blank lines are preserved: if you have TWO blank lines in a row, an empty paragraph is added:
Here is paragraph one, and I want extra space after it.


Here is paragraph two.
will become:
<p>Here is paragraph one, and I want extra space after it.</p>
<p> </p>
<p>Here is paragraph two.</p>
- Note: the paragraph-adder puts a <br /> tag at the end of each line wherever there is a carriage return, even inside things like lists. So if you have a nice chunk of HTML in your story that you coded up by hand like this:
<ul>
<li>Item one.</li>
<li>Item two.</li>
</ul>
each of those line breaks will get a <br /> tag. You can avoid having <br /> tags added by putting the list on a single line with no carriage returns instead:
<ul><li>Item one.</li><li>Item two.</li></ul>
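The paragraph-adder rules can be sketched in a few lines. The Archive is a Ruby on Rails application and its actual paragraph-adder is not shown in this post, so this is only an illustration in Python under our own assumptions: the function name add_paragraphs is hypothetical, every block of text is wrapped in <p></p> (even a lone block), and each break is written as " <br />" followed by a newline, as in the carriage-return example.

```python
import re


def add_paragraphs(text: str) -> str:
    """Hypothetical paragraph-adder: blank lines -> <p> tags, newlines -> <br />."""
    # Treat Windows (\r\n) and old Mac (\r) line endings as plain newlines,
    # and ignore leading/trailing whitespace around the whole text.
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    if not text:
        return ""
    # Split on runs of blank lines (lines that are empty or only spaces/tabs).
    # The capturing group keeps each run, so we can count how many there were.
    parts = re.split(r"(\n(?:[ \t]*\n)+)", text)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # A separator with k newlines holds k - 1 blank lines. One blank
            # line just separates paragraphs; each extra one becomes an
            # empty paragraph so the extra space is preserved.
            blank_lines = part.count("\n") - 1
            out.extend(["<p> </p>"] * (blank_lines - 1))
        else:
            # Inside a paragraph, every remaining newline becomes a break tag.
            out.append("<p>" + " <br />\n".join(part.split("\n")) + "</p>")
    return "\n".join(out)
```

Feeding it the hand-coded list shows why the note about lists matters: each of the three line breaks inside the list turns into a " <br />", while the single-line version of the list comes through with none.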
The next step is a Ruby gem (basically a kind of plugin) called Nokogiri, which parses the text and gives it back to us as a well-formed chunk of XHTML. Among other things, this means that:
- any tags that were left open get closed
- any mis-nested tags get fixed (e.g., if you write <strong><em>foo!</strong></em>, Nokogiri will turn that into the correct version, <strong><em>foo!</em></strong>)
- any attribute values that aren't properly in quotes get fixed
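To make these three fixes concrete, here is a deliberately simplified tag balancer in Python. It does not reproduce Nokogiri's real parsing, which recovers from many more kinds of broken markup; the names TagBalancer and balance are hypothetical. The sketch only keeps a stack of open tags: a closing tag first closes anything opened after it, stray closing tags are dropped, tags still open at the end are closed, and every attribute value is re-emitted in double quotes.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never get a closing tag; written XHTML-style, e.g. <br />.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Hypothetical, simplified stand-in for the parse step (not Nokogiri)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # serialized output pieces
        self.stack = []  # names of elements that are currently open

    def _open_tag(self, tag, attrs):
        # Re-emit every attribute with a double-quoted, escaped value, so an
        # unquoted value like align=center comes out as align="center".
        parts = [tag]
        for name, value in attrs:
            parts.append(f'{name}="{escape(name if value is None else value)}"')
        return "<" + " ".join(parts)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self.out.append(self._open_tag(tag, attrs) + " />")
        else:
            self.out.append(self._open_tag(tag, attrs) + ">")
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Input written as <br/> or <p/>: void tags stay self-closing,
        # anything else becomes an empty open/close pair.
        if tag in VOID:
            self.out.append(self._open_tag(tag, attrs) + " />")
        else:
            self.out.append(self._open_tag(tag, attrs) + f"></{tag}>")

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag with no matching open tag: drop it
        # Close everything opened after this tag first, so the mis-nested
        # <strong><em>foo!</strong></em> becomes <strong><em>foo!</em></strong>.
        while self.stack:
            name = self.stack.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any text still buffered by the parser
        # Close tags that were left open at the end of the text.
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def balance(html_text):
    parser = TagBalancer()
    parser.feed(html_text)
    return parser.result()
```

Once the first close tag has closed both elements, the trailing </em> no longer matches anything open and is simply dropped.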
Finally, we use the Sanitize gem to clean up this XHTML and remove anything that is legal but not necessarily safe. Sanitize uses a whitelist, meaning that only the tags and attributes we specifically allow are let through. It's very customizable, and we have been able to write special rules for Sanitize to safely allow video embeds from specific sites (currently YouTube, Vimeo, blip.tv, Dailymotion, Viddler, Metacafe, and 4shared). Once Sanitize is done, the final version is saved into the database.
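The whitelist idea can be sketched the same way. This is a Python illustration, not the Sanitize gem or the Archive's configuration: the allowed tags and attributes in ALLOWED, the two hosts in EMBED_HOSTS, and the names http_host, Sanitizer, and sanitize are all hypothetical. Any tag not on the list is removed (its text is kept), href and src values must be plain http(s) URLs, <script> and <style> are removed along with their contents, and an <iframe> survives only when its src points at a listed host, standing in for the special video-embed rules.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical whitelist: tag -> attributes allowed on that tag.
ALLOWED = {
    "p": {"align"}, "br": set(), "em": set(), "strong": set(),
    "ul": set(), "ol": set(), "li": set(), "a": {"href"},
}
# Hypothetical stand-in for the video-embed rules.
EMBED_HOSTS = {"www.youtube.com", "player.vimeo.com"}
# Elements removed together with everything inside them.
SKIP_CONTENT = {"script", "style", "iframe"}


def http_host(value):
    """Host name of a plain http(s) URL, or None for anything else."""
    try:
        parts = urlparse(value.strip())
        host = parts.hostname
    except ValueError:
        return None
    if parts.scheme.lower() not in ("http", "https"):
        return None  # rejects javascript:, data:, relative URLs, etc.
    return host or ""


class Sanitizer(HTMLParser):
    """Hypothetical whitelist filter; assumes already well-formed input."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skipping = None  # element whose whole content is being dropped

    def _attrs(self, allowed, attrs):
        kept = []
        for name, value in attrs:
            if name not in allowed or value is None:
                continue
            if name in ("href", "src") and http_host(value) is None:
                continue
            kept.append(f' {name}="{escape(value)}"')
        return "".join(kept)

    def handle_starttag(self, tag, attrs):
        if self.skipping:
            return
        if tag in SKIP_CONTENT:
            # Keep an iframe only when its src is on a listed host; its
            # inner fallback content is dropped either way.
            src = dict(attrs).get("src") or ""
            if tag == "iframe" and http_host(src) in EMBED_HOSTS:
                kept = self._attrs({"src", "width", "height"}, attrs)
                self.out.append(f"<iframe{kept}></iframe>")
            self.skipping = tag
        elif tag in ALLOWED:
            end = " />" if tag == "br" else ">"
            self.out.append(f"<{tag}{self._attrs(ALLOWED[tag], attrs)}{end}")
        # Any other tag is dropped, but the text inside it is kept.

    def handle_endtag(self, tag):
        if self.skipping:
            if tag == self.skipping:
                self.skipping = None
        elif tag in ALLOWED and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    s = Sanitizer()
    s.feed(html_text)
    s.close()
    return "".join(s.out)
```

The point of a whitelist is visible in the final comment of handle_starttag: an unknown tag is never trusted by default, so adding a new allowed tag or embed site means adding it to the list explicitly.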
There is lots of documentation available on Nokogiri and Sanitize on their respective sites.
What you see when editing
- If you are working in a field (like content in the Post New Work form) that allows you to use the Rich Text Editor, the <p> and <br /> tags will show. Otherwise, switching to the Rich Text Editor would do that horrible thing where your whitespace disappears and all your text runs together into one giant blob!
- If you manually add <p> tags with extra attributes on them, like "<p align=center>", those tags will show.
- The <p> and <br /> tags will not show when you edit fields like notes and summary, however, where there is no option to use the Rich Text Editor.
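Hiding the tags in fields like notes and summary implies some reverse conversion from stored tags back to line breaks. A minimal sketch of how that could work, assuming the stored text looks like the examples earlier in this post, is below in Python; the function name to_plain_editing is hypothetical and this is not the Archive's actual code. Bare <p> and <br /> tags become blank lines and newlines, while a <p> carrying attributes, such as <p align="center">, never matches the bare-tag pattern and so stays visible.

```python
import re


def to_plain_editing(stored: str) -> str:
    """Hypothetical reverse of the paragraph-adder for fields like notes and
    summary: bare <p>/<br /> tags turn back into blank lines and newlines."""
    # A break tag (plus the space before it and the newline after it)
    # becomes a single newline again.
    text = re.sub(r"[ \t]*<br\s*/?>\n?", "\n", stored)

    # Each bare <p>...</p> becomes its text followed by a blank line; an
    # empty paragraph such as <p> </p> adds one extra newline, restoring the
    # "two blank lines" spacing. A <p> with attributes never matches "<p>".
    def unwrap(match):
        body = match.group(1)
        return body + "\n\n" if body.strip() else "\n"

    text = re.sub(r"<p>(.*?)</p>\n?", unwrap, text, flags=re.S)
    return text.rstrip("\n")
```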
Here's an example of how the tags will look on content in the Post New Work form:

