Refactoring

Definition, for programming:

a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior

When I apply the term "refactoring" to markup, I'm mentally substituting:
  • "software" with "markup, stylesheets, and whatever other components a web browser interprets to display our web site"
  • "behavior" with "look, layout, style, and to some extent behavior too"

A warm thanks to Martin Fowler for giving us the terms we need in our industry to express ourselves coherently and succinctly.

So we're taking a site in its present state, possibly a not-so-maintainable state, and making small, discrete, isolated changes. After each one, we test that the change did not break or alter anything. Once we're satisfied, we can move on to the next small change. Over time, the entire site is slowly transformed into a clean, maintainable (even beautiful) one.

Small (yet important) flaws that were difficult to see (due to the overwhelming amount of markup in the original version) surface and are easily fixed. The pages become smaller, lighter. Browser parsing engines don't have as difficult a time loading our content. Fewer calls are made to the server to fetch a bunch of images. We discover that we didn't really need to make page x dynamic after all. It was an illusion, difficult to diagnose due to the complex structure of our site before we applied the refactorings. Indeed, the process of refactoring a site is interleaved with the process of evolving it, growing it.

I recently inherited my local JUG (Java User Group) site.

I admit it: like most JUG sites, our JUG site was really ugly. Our little community is thankful to one of our members for one day volunteering to redesign our site and give it a much improved, consistent look. It was a few months later that I inherited the duty of updating the content with new meeting information. I expected to find a modern, clean site. I was surprised when I opened the lid to find so many code smells lying beneath.

This particular site, I decided to refactor piecemeal. The main reason was simple: it was not a site I'd designed. I didn't quite understand how the various effects were implemented. I didn't want to break anything. The cautious way to proceed was to make small changes and test that I did not break anything.

Surprisingly, this site already contained a small stylesheet. I did the obvious thing first: I added global font styles to the stylesheet. Once I made sure that the global font styles properly applied to the text on the pages, I started removing <font> tags from the HTML pages. The DRY principle (Don't Repeat Yourself!) is at work here. Much refactoring in this domain has to do with removing duplication. In this case, we've got dozens, maybe hundreds of font tags, replaced with a single rule in a stylesheet. CSS inheritance of font styles ensures that styles applied to parent elements also "percolate" to the children.
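
As a rough illustration of this step, here is a minimal sketch. The font family, size, and sample markup are invented for the example and are not taken from the JUG site. A single rule on body stands in for the repeated <font> tags, and child elements pick it up through inheritance:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Global font styles (illustrative)</title>
  <style>
    /* One rule replaces the per-paragraph <font> tags.
       The family and size are assumed values. */
    body {
      font-family: Verdana, Arial, sans-serif;
      font-size: 0.9em;
    }
  </style>
</head>
<body>
  <!-- Before: <p><font face="Verdana" size="2">Next meeting ...</font></p> -->
  <!-- After: the paragraph inherits its font from body. -->
  <p>Next meeting: see the announcements page.</p>
  <ul>
    <li>List items inherit the same font as well.</li>
  </ul>
</body>
</html>
```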

Next, I began looking for tables containing only a single column. The technique of employing tables for the task of simulating the CSS box model used to be rampant. In pre-CSS days, there was no alternative. Just for the sake of applying padding, borders, and margins, a table would be summoned. Much of this markup was replaced with simple div tags. Each type of block needing a specific type of look was categorized via the HTML class attribute and styled appropriately in my stylesheet. This attribute was added to the HTML specification specifically to accommodate CSS.
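
A sketch of what such a replacement might look like, assuming a hypothetical class name ("announcement") and made-up spacing values. The old single-cell table is kept in a comment for comparison:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Box model without a table (illustrative)</title>
  <style>
    /* "announcement" is a hypothetical class name. */
    .announcement {
      margin: 1em 0;          /* space outside the box */
      padding: 0.5em 1em;     /* space inside the box, was cellpadding */
      border: 1px solid #999; /* was the table border attribute */
    }
  </style>
</head>
<body>
  <!-- Before:
       <table border="1" cellpadding="8" cellspacing="0" width="100%">
         <tr><td>Meeting details go here.</td></tr>
       </table>
  -->
  <div class="announcement">Meeting details go here.</div>
</body>
</html>
```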

Non-breaking spaces were all removed and their effects replaced with CSS margin and padding specifications. The same went for the <br> tag. References to blank images were removed and replaced with margins and padding as well.
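
To make the idea concrete, here is a small sketch. The class names, the measurements, and the blank.gif file name are all assumptions made for illustration. Each spacing hack is shown in a comment next to the rule that takes over its job:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Spacing with margins and padding (illustrative)</title>
  <style>
    /* Hypothetical class names; the values are guesses at equivalent spacing. */
    .intro   { margin-bottom: 2em; }  /* replaces a run of <br> tags */
    .byline  { padding-left: 1.5em; } /* replaces leading non-breaking spaces */
    .content { margin-top: 20px; }    /* replaces a blank spacer image */
  </style>
</head>
<body>
  <!-- Before:
       <p>Welcome to the group.</p><br><br>
       <p>&nbsp;&nbsp;&nbsp;&nbsp;Posted by the organizers</p>
       <img src="blank.gif" width="1" height="20">
       <p>Main content.</p>
  -->
  <p class="intro">Welcome to the group.</p>
  <p class="byline">Posted by the organizers</p>
  <p class="content">Main content.</p>
</body>
</html>
```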

I discovered a page that attempted to use tables to essentially simulate a float effect. The table was done away with and replaced with the CSS float property, designed specifically for such a layout task.
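
A minimal sketch of a floated block, under the assumption that the original table placed a small box beside a run of text. The class name "callout", the width, and the content are hypothetical:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Float instead of a layout table (illustrative)</title>
  <style>
    /* "callout" is a hypothetical class name. */
    .callout {
      float: right;        /* box sits at the right edge; text wraps on its left */
      width: 12em;
      margin: 0 0 1em 1em; /* keeps the wrapped text off the box */
      padding: 0.5em;
      border: 1px solid #999;
    }
  </style>
</head>
<body>
  <!-- Before: a two-cell table, text in the left cell and the box in the right. -->
  <div class="callout">Meetings are held monthly.</div>
  <p>The main text starts level with the floated box and flows around it,
     continuing at full width once it passes the bottom of the box.</p>
</body>
</html>
```

Unlike the table version, the text here reclaims the full width below the box, which a fixed two-column table cannot do.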

Sidebar elements were marked up uniformly and the styles for each list item were now applied uniformly. Another sidebar on the right-hand side had the same problem: repetitive applications of the same styles were consolidated into a single set of rules that applied the style uniformly. The look of the site actually improved slightly. All margins between blocks were now uniform. The reason is simple: oftentimes, when the same style has to be applied repetitively, mistakes are made, and quality suffers.
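
The consolidation might look something like the sketch below. The "sidebar" class, the link targets, and the spacing values are assumptions for illustration. The point is that the spacing is stated once, so every item is guaranteed to match:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Uniform sidebar items (illustrative)</title>
  <style>
    /* "sidebar" is a hypothetical class name. */
    .sidebar ul {
      list-style: none;
      margin: 0;
      padding: 0;
    }
    /* One rule for every item, so margins cannot drift apart. */
    .sidebar li {
      margin: 0 0 0.5em 0;
      padding: 0.25em 0.5em;
      border-bottom: 1px solid #ccc;
    }
  </style>
</head>
<body>
  <div class="sidebar">
    <ul>
      <li><a href="#">Home</a></li>
      <li><a href="#">Meetings</a></li>
      <li><a href="#">Mailing list</a></li>
    </ul>
  </div>
</body>
</html>
```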

CSS allowed me to easily apply diagnostic borders to various blocks to gain insight into the structure of the pages. This insight in turn revealed ways to simplify structures and collapse tables, nested divs, etc.
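
One way to apply such diagnostic borders is a handful of temporary rules like the ones sketched here. The colors and the choice of elements are arbitrary, and the sample markup is invented to show a nested structure:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Diagnostic borders (illustrative)</title>
  <style>
    /* Temporary rules for inspection only; delete them afterwards. */
    div   { border: 1px dashed red; }
    table { border: 1px dashed blue; }
    td    { border: 1px dotted green; }
  </style>
</head>
<body>
  <div>
    <div>
      <table>
        <tr><td>A table nested two divs deep shows up immediately.</td></tr>
      </table>
    </div>
  </div>
</body>
</html>
```

Keep in mind that borders take up space and can nudge the layout slightly. The CSS outline property draws a similar line without affecting the size of the box.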

Once the minor stuff was cleaned up, I was in a position to take a look at how some of the dynamic aspects of the site worked. There weren't many. The basic effect was the use of a single JSP that included a header, navbar, and content based on a parameter in the URL. Once I discovered that this was the only responsibility of the JSP file, I was able to consolidate two JSPs into a single file. I was able to eliminate a small but significant number of pages. I renamed pages to represent more clearly the content they carried, the role they played. I was now pleased that a directory listing yielded fewer files and more information.

I now knew where everything resided and where I needed to make a change. I'd finally gotten to a point where the cumulative effects of my refactorings had created a site whose feel was distinctly better than its former state.

I continued refactoring in a similar fashion. I attacked other aspects of the site, one at a time. I inspected the construction of the site's header and undid its complexity. It's akin to taking knots out of a tangled rope. The original version used tables and table cell rowspan and colspan tricks to make the ends of the header rectangle appear to protrude on either side. CSS relative positioning was applied instead to reproduce the look. In this one case, although I was able to reproduce the original effect, I finally decided to deviate from the original design and ended up going with a simpler (yet still decent) look.

Wherever possible, I removed extraneous content or markup. Less was more. The personality of the site changed subtly. As far as I know, no one has commented on these site changes to me. I'm assuming no one's noticed them.

Now, updating my local JUG's web site takes me less time, and is a much more, shall we say, fragrant experience.

If you're interested in actually reviewing the before/after pictures of this small site, I did check it into CVS on austinjug.dev.java.net fairly early in the refactoring process. I do wish I'd made my CVS comments more specific instead of the useless "more cleanup" comment I usually went with. If you follow the above link, click on "Version Control", then on "Setup CVS command line client" for instructions on checking out the code. As for stats, I was able to make the site approximately 25% smaller, all the while adding content. But these figures depend heavily on the quantity of content. A more useful figure would be the ratio of before to after markup, excluding the content (text). But I did not bother generating it.