Auto HTML-aware text formatting (PHP)

I've written a small(ish) PHP library for HTML-aware text formatting. The idea is that you pass it a string of text that uses whitespace (blank lines and line breaks) to mark out its general structure, and it should fill in the missing markup itself, as well as doing other things like auto-linking URLs. It should also leave existing HTML alone, which is pretty important, unless that HTML appears badly formed, in which case it will try to correct it (with varying success).
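To give a feel for the "leave existing HTML alone" part, here is a minimal sketch of auto-linking that only touches text outside tags. It is not the library's code: the function name `autolink_outside_tags` and the simple URL pattern are assumptions for illustration. It splits the input on tags with `preg_split` (keeping the tags via `PREG_SPLIT_DELIM_CAPTURE`), skips text that is already inside an `<a>` element, and links bare `http://`/`https://` URLs everywhere else.

```php
<?php
// Hypothetical helper (not HTMLFormatter's API): link bare URLs in text,
// but pass existing tags through untouched and skip text inside <a>.
function autolink_outside_tags($html)
{
    // With one capture group, preg_split returns text at even indices
    // and the captured tags at odd indices.
    $tokens = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $insideAnchor = false;
    $out = '';
    foreach ($tokens as $i => $token) {
        if ($i % 2 === 1) {
            // A tag: note whether it opens or closes a link, then copy it as-is.
            if (preg_match('~^<a[\s>]~i', $token)) {
                $insideAnchor = true;
            } elseif (preg_match('~^</a\s*>~i', $token)) {
                $insideAnchor = false;
            }
            $out .= $token;
        } elseif ($insideAnchor) {
            $out .= $token; // already linked, leave alone
        } else {
            // Assumed URL pattern: http(s):// up to whitespace, '<', '>' or '"'.
            $out .= preg_replace('~\bhttps?://[^\s<>"]+~i', '<a href="$0">$0</a>', $token);
        }
    }
    return $out;
}

echo autolink_outside_tags('See http://example.com and <a href="http://x.org">http://x.org</a>');
```

Because tags are split off first, a URL inside an attribute such as `href="..."` is never treated as plain text, so it can't be double-linked. Trailing punctuation (a full stop straight after a URL, say) is not trimmed in this sketch.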

Code is on GitHub.

My host is still on PHP 5.2, so I'm afraid you'll have to live without namespaces (they only arrived in PHP 5.3).

Usage

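require_once 'src/HTMLFormatter.class.php';

// $string holds the whitespace-structured text to format
$string = "A paragraph with a link: http://example.com\n\nAnother paragraph.";

$f = new HTMLFormatter();
echo $f->format($string);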

More Information

Turns out this is a pretty hard problem if you want to do it right. Consider the fragment:

<h1>Heading</h1>

<ul>
    <li> 
        Something 
    </li>
</ul>

If we just apply the obvious rules (wrap blank-line-separated text in paragraphs, turn newlines into line breaks), we risk ending up with something like this:

<h1>Heading</h1>
<p></p>
<ul>
    <br><li>
	<br>Something
    <br></li>
<br></ul>
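One concrete version of those obvious rules, sketched with a hypothetical `naive_format` function, shows where the trouble comes from: it splits on blank lines, wraps each block in `<p>`, and turns every remaining newline into `<br>` without looking at the tags around it. Its exact output differs from the illustration above, but it breaks in the same way.

```php
<?php
// Hypothetical sketch of the "obvious rules" (not the library's code):
// wrap each blank-line-separated block in <p>...</p> and turn the remaining
// newlines into <br>. Nothing here knows about HTML structure.
function naive_format($text)
{
    $text = str_replace("\r\n", "\n", trim($text));
    $blocks = preg_split('/\n\s*\n/', $text);
    $out = array();
    foreach ($blocks as $block) {
        // Every single newline becomes a <br>, even between list tags.
        $out[] = '<p>' . str_replace("\n", "<br>\n", $block) . '</p>';
    }
    return implode("\n", $out);
}

$input = "<h1>Heading</h1>\n\n<ul>\n    <li>\n        Something\n    </li>\n</ul>";
echo naive_format($input);
```

On the heading-and-list fragment this wraps `<h1>` and `<ul>` in `<p>` (invalid, since a paragraph can only contain phrasing content) and scatters `<br>` elements between the list tags.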

The formatter is meant to avoid output like this, but there might still be a few bugs hiding.
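One way to avoid that kind of output is to make both rules aware of block-level tags. The sketch below is an assumption about the general approach, not a description of how `HTMLFormatter` works; `block_aware_format` and its deliberately short list of block-level elements are hypothetical. A block that already starts with a block-level tag is not wrapped in `<p>`, and a newline only becomes `<br>` when the line before it doesn't end with a block-level tag and the line after it doesn't start with one.

```php
<?php
// Hypothetical sketch, not HTMLFormatter's actual algorithm: the
// paragraph and line-break rules skip anything next to block-level tags.
function block_aware_format($text)
{
    // Assumed, deliberately incomplete list of block-level elements.
    $block = '(?:p|div|h[1-6]|ul|ol|li|table|tr|td|th|pre|blockquote)';
    $startsWithBlock = '~^\s*</?' . $block . '\b~i';      // e.g. "  <li>" or "</ul>"
    $endsWithBlock = '~</?' . $block . '\b[^>]*>\s*$~i';   // e.g. "<h1>Heading</h1>"

    $text = str_replace("\r\n", "\n", trim($text));
    $out = array();
    foreach (preg_split('/\n\s*\n/', $text) as $chunk) {
        $lines = explode("\n", $chunk);
        $last = count($lines) - 1;
        $html = '';
        foreach ($lines as $i => $line) {
            $html .= $line;
            if ($i < $last) {
                // Only a newline between two non-block lines becomes <br>.
                $needsBr = !preg_match($endsWithBlock, $line)
                    && !preg_match($startsWithBlock, $lines[$i + 1]);
                $html .= ($needsBr ? '<br>' : '') . "\n";
            }
        }
        // Chunks that already open with a block-level tag are not wrapped in <p>.
        $out[] = preg_match($startsWithBlock, $chunk) ? $html : '<p>' . $html . '</p>';
    }
    return implode("\n\n", $out);
}

echo block_aware_format("<h1>Heading</h1>\n\n<ul>\n    <li>\n        Something\n    </li>\n</ul>");
```

On the heading-and-list fragment this returns the input unchanged, while plain text such as `"Hello\nworld\n\nSecond"` still gets paragraphs and a `<br>`. It works line by line, so tags split across lines, inline elements and malformed nesting would all need proper tokenising, which is where the problem gets hard.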

Performance seems okay. The library is currently powering this blog, and the page rendering time (shown at the bottom left of each page) appears only marginally slower than with the less robust solution it's replacing. I haven't benchmarked it beyond that.
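For a number rather than an impression, a rough harness like this one times repeated calls with `microtime(true)`. `average_seconds` is a hypothetical helper and `nl2br` is only a stand-in formatter; with the library loaded you could pass `array($f, 'format')` as the callback for an `HTMLFormatter` instance `$f`, and time the old formatter the same way. Wall-clock timings are noisy, so compare runs on the same machine with the same sample text.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $callback
// on $input, measured over $iterations runs with microtime(true).
function average_seconds($callback, $input, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callback, $input);
    }
    return (microtime(true) - $start) / $iterations;
}

// Sample text with paragraphs and URLs; 'nl2br' stands in for a formatter.
$sample = str_repeat("Some text with a link http://example.com\n\n", 200);
printf("nl2br: %.6f s/call\n", average_seconds('nl2br', $sample, 100));
```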
