Content Development

While many HTML editors claim to be WYSIWYG in style, my early experience of using them as the primary method of developing content proved frustrating. For this reason, a conventional WYSIWYG editor, such as Microsoft Word, was preferred, as other useful formats could also be generated from it. However, using Microsoft Word to produce usable HTML code is not without its own set of problems.


In the current context, we are potentially talking about the development of content in the form of text, audio, images, animations and videos. In principle, Microsoft Word can handle all these formats within a single document, with all the perceived advantages that you would expect from an established product, while supporting a large range of output formats, including HTML. However, the format of the Microsoft HTML file is problematic when a clear separation of the data content from its formatting is required, which is a goal of Cascading Style Sheets (CSS). While the HyperText Markup Language (HTML) is not a programming language, its development parallels some of the ideas embodied in object-orientated design, in that the data/content is separated from the code/formatting.

<html>
<head>
<meta ....>
<link ....>
<script >
</head>
<body>
<HTML tags> plus page content
</body>
</html>

The example above is representative of an HTML page at a very basic level, where <html> and </html> define the scope of the web page, within which <head> and </head> contain instructions and information associated with the page. The main content of the page is then encoded between <body> and </body> using numerous HTML <tags> that help define how the content should be displayed, e.g.

<p><b>This text is bold</b></p>
<p><i>This text is italic</i></p>
<p>This is <sub>subscript</sub> and <sup>superscript</sup></p>

Again, the example above shows only a few basic HTML formatting <tags>, which can be extended to include the definition of lists, tables, images and the all-important hyperlinks to other webpages. However, over time, the number of HTML <tags> continued to expand to meet the ever-growing sophistication of end-user devices.

But what has this to do with developing content?

The history of HTML goes back to the end of the 1980s, when the formatting requirements were essentially orientated towards a one-colour standardised PC screen. However, the last 20 years have seen an explosion of device formats with different screen sizes and graphics capabilities, which has required decisions about the formatting of data content to be made dynamically, i.e. at run-time, based on the capability of the device receiving the webpage. Without addressing all the historical details, Cascading Style Sheets (CSS) were developed, which could be written as separate files, i.e. external to the main HTML file, and then included as required by instructions placed in the <head> section of the HTML page. Later, with the uptake of the JavaScript language, decisions about loading a specific version of a CSS file could be made, at run-time, based on the capabilities of the device loading the HTML file. However, the situation was further complicated by the development of different browsers by competing companies, each offering 'enhancements' over and above the generalised HTML standards, which themselves existed in different versions across the market as a whole.

OK, but again, how does this affect the selection of a content editor?

As previously indicated, the development of HTML and CSS led to a partial separation of data content and formatting information. Therefore, web developers needed to keep tighter control over what goes into the CSS files, as opposed to the main HTML file. In essence, much of the formatting of the data content, which was originally contained within the HTML file, is now defined in external CSS files, which can address the capabilities of different devices. In this respect, Microsoft Word is not well suited to the role of a general-purpose content editor, even though it supports the creation of an HTML file. For example, if you open a Word file, type a few characters, 'Save As' a filtered web page and then examine the actual contents of the HTML output file using a plain text editor, such as Notepad, you will find that the <head> section contains a myriad of style definitions, along with an excessive use of verbose class definitions within the main <body> section. In addition, the HTML output file, as created by Microsoft, appears to be designed to offer maximum compatibility with its own Internet Explorer browser rather than with its competitors.
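As a rough way of seeing this overhead for yourself, the following is a minimal Python sketch that tallies the characters held in <style> blocks and the number of class attributes against the amount of actual page text. It is only an illustration: the SAMPLE string is a hand-written approximation of the kind of markup described above, not verbatim Word output, and the names MarkupAudit and audit are hypothetical.

```python
from html.parser import HTMLParser

# Illustrative input only: a hand-written approximation, not verbatim Word output.
SAMPLE = (
    "<html><head><style>p.MsoNormal {margin:0cm;}</style></head>"
    "<body><p class=MsoNormal>abc</p></body></html>"
)


class MarkupAudit(HTMLParser):
    """Tally formatting overhead against the visible text of a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_style = False
        self.style_chars = 0   # characters inside <style> blocks
        self.class_attrs = 0   # number of class attributes found on tags
        self.text_chars = 0    # characters of actual page text

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
        self.class_attrs += sum(1 for name, _ in attrs if name == "class")

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self.style_chars += len(data)
        else:
            # Ignore the whitespace that only separates tags.
            self.text_chars += len(data.strip())


def audit(html: str) -> MarkupAudit:
    """Parse the page and return the parser holding the three tallies."""
    parser = MarkupAudit()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    result = audit(SAMPLE)
    print(result.style_chars, result.class_attrs, result.text_chars)
```

Running the same function over a real 'Save As' file, read in as a string, should give a feel for how much of the file is formatting rather than content.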

So why not use a purpose-built HTML editor?

In theory, Microsoft's own replacement product, i.e. Expression Web, offers an enhanced ability to edit HTML, CSS and JavaScript files, inasmuch as it recognises the syntax of these languages. It also supports a level of WYSIWYG when displaying the HTML in 'design' mode as opposed to 'code' mode. The only problem is that, while most HTML editors are now much improved and offer a form of WYSIWYG, they all seem to come with their own set of limitations. So, despite the problems cited above, Microsoft Word is still used to develop all the content, simply because it is a better 'content' editor that also offers the flexibility of producing other useful formats, not just single pages of HTML. However, the standard 'Save As' filtered HTML output has to be 'processed' through a Visual Basic program; see Update Automation for further details. So, based on this approach, all webpages are developed as Word documents from which HTML output files are created, after which all of Microsoft's verbose formatting and styles are removed using the Visual Basic program. This form of 'filtered' HTML file is then edited, using the HTML editor within the Microsoft Expression Web package, in order to link the content to the various style classes required by this specific website. As indicated, Expression Web can also be used to create and edit the CSS and JavaScript files, and it has a number of tools for ensuring compatibility with an appropriate HTML standard.
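The stripping step can be sketched in a few lines. The actual program used for this website is written in Visual Basic and is not reproduced here; what follows is a minimal Python sketch of the same idea, assuming that the unwanted formatting consists of <style> blocks, class, style and lang attributes, and the <span> wrappers left empty once those attributes are gone. The SAMPLE string is a hand-written approximation rather than verbatim Word output, and strip_formatting is a hypothetical name.

```python
import re

# Illustrative input only: a hand-written approximation of the kind of
# markup described above, not verbatim Word output.
SAMPLE = """<html>
<head>
<style>
p.MsoNormal {margin:0cm; font-size:11.0pt;}
</style>
</head>
<body>
<p class=MsoNormal><span style='font-size:12.0pt'>Some <b>bold</b> text</span></p>
</body>
</html>"""


def strip_formatting(html: str) -> str:
    """Remove style blocks, class/style/lang attributes and bare spans."""
    # Drop whole <style>...</style> blocks from the head.
    html = re.sub(r"<style\b.*?</style>\s*", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Drop class, style and lang attributes, whether quoted or unquoted.
    html = re.sub(
        r"""\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Spans left with no attributes carry no information, so unwrap them.
    html = re.sub(r"</?span\s*>", "", html, flags=re.IGNORECASE)
    return html


if __name__ == "__main__":
    print(strip_formatting(SAMPLE))
```

Regular expressions are used only to keep the sketch short. They can be fooled, for example by body text that happens to contain something like class=..., so a more robust version would probably work through an HTML parser instead.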


It is recognised that this approach might appear overly complicated, but it was driven by the desire to first develop the content in a recognised document format, which could then be used to create various formats, not just HTML. While there are many who do not like the dominance of Microsoft, the simple fact is that Microsoft Office has become a de facto standard that is used by something like 500 million users worldwide. This number can be extended by those who prefer to use OpenOffice, which can also read and write the Word format. As such, the Word document format has proved to be reasonably future-proof against advances in technology over the last 20 years. However, it would be nice if somebody from Microsoft would consider adding a 'Save As' option that actually created a stripped-out HTML file containing only the most basic HTML tags, with no additional class definitions.