Red Ant

Welcome to our world

Semantics and web content management

The migration of the web toward semantics

In terms of formalisation, for most of the web the prefix WWW should stand for Wildly Wrong Web rather than the accepted World Wide Web. However, the pervasiveness of the Internet and the web often obscures the fact that it is still relatively young, with only a little over ten years of mainstream commercial and consumer adoption in the UK. The growth of new technologies and techniques, together with their relative ease of introduction, bodes well for greater formalisation.

For early commercial web sites the prime concern was how they looked in Internet Explorer on Microsoft Windows, the consumers’ browser of choice. Certain companies may have pushed the boundaries a little and considered other browsers (Netscape), and even other platforms (Apple / UN*X). In each case the primary concern was how a person might view the site via whichever tool the site author considered they might be using. Beyond how easy a piece of HTML might be to maintain, little thought was generally given to the underlying structure and meaning of the HTML source; only the visual aspect of a site was important.

The growth of search engines, triggered in no small part by the exponential growth in the number of sites available to search, has caused a degree of reassessment; as search engines have grown more astute, a move towards semantics has occurred. The original search engines relied too heavily on meta-data, which pushed commercial sites to add keywords and descriptions to each page. Best practice was to tag and describe the content of the page; in all likelihood the meta-data would instead be abused, filled with whatever terms best matched what people might be searching for.

The abuse of meta-data led search engines to start reviewing the content of the pages. Once content was being reviewed, visual tricks (shrinking text, hiding text in comments, making text the same colour as the background) were used to gain higher rankings. Again search engine programmers were forced to devise more intelligent algorithms.

Since the web is big business this pattern has continued, but engines are now reaching a point that encourages well-crafted websites and increased semantic adoption. Web pages and sites that rate highly in engines are those that split presentation from content and follow good syntactical practice. By necessity the migration of the web has:

Overtaken cunning [...], had sped past devious, had left artful far behind and had now, by a roundabout route, arrived at straightforward.

The meaning of web semantics

As a general term the title wraps it up: semantics is the study of meaning. In reference to the web, semantics is about placing meaning on the content returned by web sites and web pages.

The relevance of meaning is highlighted in the previous section in reference to search engines. Crawling search engines are driven by computers, and computers are by nature not intelligent. A computer program may download, store and run algorithms against any particular web page, but it will not be able to adequately understand its meaning, content and context.

Semantics is closely tied to ontology. Ontology is the study of what elements are and how they relate to other elements. Referring back to search engines: a computer program may download a web page with content on it, but it does not know what that content is. Rather it will have a collection of words, and maybe some structural hints if the page is put together properly, but it cannot tell whether the page is a press release, news article, bank statement, etc. The program will have little idea about the content on the page or the relationship it has with other content, whether on the same page or elsewhere.

Together semantics and ontologies combine to provide machine readable web pages and/or frameworks that describe what the content is and what the content means in context. When machines can follow what is going on across the web they can better serve the needs of people.

W3C Semantic Web

The W3C framework for providing semantics to the web uses two core technologies: the Resource Description Framework (RDF) and the Web Ontology Language (OWL). In relation to a web page the RDF could be the following:

RDF-Ontology

Looking at the Dublin Core meta-data standard for web pages an RDF might take the following format:

RDF Snippet 1

In the example the Dublin Core namespace has been used to define general meta-data for the page. This can be extended to support multiple namespaces, as in the following example:

RDF Snippet 2
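As a rough illustration of the kind of description discussed above (a sketch, not a reproduction of the snippets), the following Python builds an RDF/XML description of a page using Dublin Core elements plus one property from a second namespace. The page URL, the `ex` namespace and its `contentType` property are hypothetical and exist only to show two vocabularies being mixed.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical second vocabulary, used only to show namespaces being mixed.
EX_NS = "http://example.org/cms/terms#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("ex", EX_NS)


def describe_page(url, title, creator, date, content_type=None):
    """Build an RDF/XML description of one web page."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": url})
    # General page meta-data, expressed with Dublin Core elements.
    ET.SubElement(desc, f"{{{DC_NS}}}title").text = title
    ET.SubElement(desc, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(desc, f"{{{DC_NS}}}date").text = date
    # Optional property from the hypothetical second namespace.
    if content_type is not None:
        ET.SubElement(desc, f"{{{EX_NS}}}contentType").text = content_type
    return ET.tostring(root, encoding="unicode")


rdf_xml = describe_page(
    "http://example.org/news/semantic-cms",  # hypothetical URL
    "Semantics and web content management",
    "Richard Conyard",
    "2006-06-23",
    content_type="NewsArticle",
)
print(rdf_xml)
```

Leaving out `content_type` gives a description that uses the Dublin Core namespace alone; supplying it adds the second namespace without disturbing the first.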

OWL, the Web Ontology Language, pushes back the boundaries of RDF and provides richer classification when describing ontologies.

Although OWL and RDF provide a framework, until there are widely adopted standard ontologies this approach will remain the future of the semantic web rather than its present.

Semantics in HTML

Semantics within HTML documents are limited to document paradigm semantics (headers, lists, emphasised words etc.), which do not truly reflect the nature of the content. Whilst the nature and structure of the document can be defined using a collection of title, header, paragraph and list elements, the means of describing the content itself are limited. Using the previous RDF example of a news article sitting on the same page as a piece of content, HTML provides no standard way of differentiating between each item of content.

An attempt to tackle the lack of semantic constructs within HTML is the microformats project. Microformats are a collection of open data formats using common HTML patterns and CSS class definitions. An example would be the hCard microformat which follows the vCard standard (RFC2426):

vCard

The corresponding hCard is:

hCard
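To make the correspondence concrete, here is a minimal sketch in Python that reads a few simple vCard properties and emits the matching hCard markup. The contact details are invented, and the parser is deliberately naive: it handles only plain `NAME:value` lines and ignores vCard parameters and line folding. The class names `vcard`, `fn`, `org`, `tel` and `email` are the ones hCard takes from vCard.

```python
from html import escape

# Hypothetical contact details, used purely for illustration.
VCARD = """BEGIN:VCARD
VERSION:3.0
FN:Jane Example
ORG:Example Ltd
TEL:+44 20 7946 0000
EMAIL:jane@example.org
END:VCARD"""

# vCard property name -> hCard class name for the simple text properties.
HCARD_CLASSES = {"FN": "fn", "ORG": "org", "TEL": "tel", "EMAIL": "email"}


def parse_vcard(text):
    """Read simple 'NAME:value' lines; ignores parameters and line folding."""
    fields = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        name = name.split(";")[0].upper()
        if name in HCARD_CLASSES:
            fields[name] = value.strip()
    return fields


def to_hcard(fields):
    """Render parsed fields as an hCard: ordinary HTML plus class names."""
    parts = ['<div class="vcard">']
    for name, css_class in HCARD_CLASSES.items():
        if name in fields:
            parts.append(
                f'  <span class="{css_class}">{escape(fields[name])}</span>')
    parts.append("</div>")
    return "\n".join(parts)


hcard_html = to_hcard(parse_vcard(VCARD))
print(hcard_html)
```

The output is ordinary HTML that a browser renders as normal text; the meaning is carried entirely by the class names.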

Microformats give grassroots developers a mechanism for enriching the semantics of the existing web.

Need for Standardised Ontologies and Co-operation

Hopefully by this point the need for standardised ontologies is clear. Without standard ontologies, software, even when supplied with structured ontological grammars such as OWL and RDF, will not be able to interpret the data correctly.

Microformats must follow, or lead, the OWL/RDF initiatives to provide the link between the semantics of the existing web and those of the future web. By adopting both routes towards semantics, each co-operating with the other as and when required, the split between haves and have-nots over the limits of technology and feasibility can remain controlled.

With each of these points there is a foundation through which semantics can both evolve and grow around the web.

The Role of Content Management in Semantics

Use of a CMS is not an immediate panacea for web semantics. In fact many content management tools (both commercial and free) pay scant regard to even the most basic of semantic constructs. This is a pity, since by constructing a CMS in the correct way the majority of the semantic overhead can be dealt with automatically, to the point where one computer generates what another computer can understand.

Areas of a semantic ready CMS

  • Ontological Breakdown

    It may sound like a mystical riddle, but “When is content not content?” The answer is a little less cryptic in that it is always content, but sometimes it’s a different type of content. Web pages are built from HTML, but that HTML may contain news articles, event listings, shopping baskets and a plethora of other types of content that is in addition to the document paradigm which forms HTML.

    A content management system that is semantic ready will have devised ontologies that link the more standard concepts of site, page and sub-page with more abstract concepts such as news, events, products, baskets, etc.

    The structures used within the ontological breakdown should wherever possible use or extend acknowledged existing formal grammars.

  • Content Abstraction

    Once the type of content is defined it must be possible to extract the data of the content from both presentation and render. If the data of the content is embedded within presentational or render information then any connecting semantic browser will have to attempt to extract the data, which is little better than having it sitting on the web page.

    For ease of interchange content data should be malleable. XML is a malleable means of passing messages and data, and with RDF, OWL and XHTML all adopting XML it is an obvious choice. XML allows extension over time and rapid support of future standardised ontologies through XSL.

  • Best use of existing semantic constructs

    Whilst the semantic constructs available in HTML are limited to the document paradigm any semantic ready CMS should make best use of them. It should help authors and editors control document structure and encourage the use of semantic elements over purely visual elements.

  • Content in Context

    Content should be atomic in nature and, within the bounds of a CMS, aware of its context. Consider a news article sitting within a page: if the page is already defined then, within the semantic structure of the document, it is likely to have an H1 element already, so the rendered news article should not include an additional H1 element. However, if the news article has been requested on its own it takes precedence, and the title of the news article should form the H1 element and page title.

  • Multiple Delivery Mechanisms and the Semantic layer

    By using XML and controlled ontological breakdown it is possible to introduce a semantic layer into a content management system, as is shown in the following diagram:

    Layers

    The introduction of the semantic layer allows delivery in a number of different formats and technologies such that the underlying data can be passed seamlessly.

    By introducing the semantic layer, content need only be entered once to serve each of the different delivery methods. Future ontologies can also be adopted easily by adding further delivery mechanisms.

    The last render method in the diagram was that of a SOAP render. Whilst there would be further steps to consider in any web service implementation, the application of a semantic layer to a CMS is a distinct move towards being able to provide coherent web services and service orientated architectures.
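The areas above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any particular CMS: the `NewsArticle` type, its field names and the two render functions are hypothetical. The article is held once as plain data, the HTML render chooses its heading level from context, and a second render delivers the same data as XML.

```python
from dataclasses import dataclass
from html import escape
import xml.etree.ElementTree as ET


@dataclass
class NewsArticle:
    """Content held once, free of presentation; field names are illustrative."""
    title: str
    body: str
    published: str


def render_html(article, standalone):
    """HTML delivery: the heading level depends on the article's context."""
    # Viewed on its own the article title is the page's H1; embedded in a
    # page that already has an H1 it drops to H2.
    level = 1 if standalone else 2
    return (f"<h{level}>{escape(article.title)}</h{level}>\n"
            f"<p>{escape(article.body)}</p>")


def render_xml(article):
    """A second delivery mechanism fed from the same stored data."""
    root = ET.Element("newsArticle", {"published": article.published})
    ET.SubElement(root, "title").text = article.title
    ET.SubElement(root, "body").text = article.body
    return ET.tostring(root, encoding="unicode")


article = NewsArticle("Site relaunch", "The new site is live.", "2006-06-23")
print(render_html(article, standalone=True))   # article requested on its own
print(render_html(article, standalone=False))  # article embedded in a page
print(render_xml(article))
```

Adding a further delivery mechanism means adding another render function; the stored content and the existing renders are left untouched.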

Semantic Content Management

Content management is the leveraging technology that shields content authors from the technicalities of semantics. In the migration towards semantics and formalised ontologies it is the responsibility of content management system programmers to introduce semantics into their respective systems; those procuring content management systems should be requesting semantic delivery methods as standard; and bodies such as W3C should be adopting not only ontological frameworks, but ontologies. There is a long way to go to deliver a semantic web, but it is a journey worth taking.

This blog entry was originally written by Richard Conyard 23rd June 2006.

April 26, 2009 | Posted in Semantics

An introduction to Web Accessibility

Developing a website with accessibility in mind not only maximises your market but has the added benefits of easier maintenance and greater support for search engines.

What is accessibility?

Some people use assistive technology such as screen readers, Braille displays or magnification, and access the internet on a variety of platforms including desktop computers, laptops, mobile phones and other devices. Accessibility is about making sure as many people as possible are able to access and interact with your website, regardless of disability or the restrictions of their browsing environment.

There are several guidelines and standards in place to ensure websites conform to a decent level of accessibility, most notably, the Web Accessibility Initiative (WAI) Web Content Accessibility Guidelines (WCAG). In the UK, the Disability Discrimination Act makes it a legal requirement for service providers to take reasonable steps to allow disabled people to access their services – and that includes websites.

Page Structure and CSS

HTML is designed to describe content, not to dictate styling. Traditionally, some web designers may have used a combination of text size and colour alone to distinguish headings and other page elements. While this may look correct visually, there is nothing in the HTML code to actually identify the different levels of headings, paragraphs and lists.

Assistive software such as screen readers relies on semantic HTML markup to correctly convey information to the user and to provide easy navigation methods to those who are unable to use a mouse. For example, many systems allow users to skip between headings – this only works if headings have been correctly identified in the source and not simply given a special colour or size.

Using correct markup and implementing Cascading Style Sheets (CSS) to dictate styling, keeping content and appearance separate, means that each content element has true meaning, regardless of its appearance. This separation allows the appearance of a web page to be customized to suit the user, and in addition to aiding accessibility, has the added benefit of providing search engines with useful information about the web page and again, allows for easier maintenance.
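The difference is easy to demonstrate. The sketch below, using only Python's standard `html.parser` module, extracts a heading outline in roughly the way a tool offering skip-between-headings navigation might. The two sample fragments are invented: one uses real heading elements, the other fakes headings with size and colour.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) for each h1-h6 element, in document order."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a heading element.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def heading_outline(html):
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


semantic = "<h1>Products</h1><p>Intro</p><h2>Widgets</h2><p>Details</p>"
visual_only = ('<p><font size="6" color="red">Products</font></p>'
               '<p>Intro</p><p><b>Widgets</b></p>')

print(heading_outline(semantic))     # [(1, 'Products'), (2, 'Widgets')]
print(heading_outline(visual_only))  # []
```

Both fragments could look alike on screen, but only the first gives software anything to navigate by.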

Links and Navigation

Imagine you find a link that says “click here”, without any surrounding text or explanation. Where will that link take you? Why should you click on it? Many users of assistive technology navigate web pages by ‘tabbing’ through the links on each page or by viewing a separate list of all the links, in a similar way to how sighted users can scan the page visually to look for what they want. “Click here” may make sense when read within a sentence, but when you encounter it out of context, there is no way to identify the target.

All text links must be descriptive and must clearly identify the target when read without the surrounding content. This technique has equally beneficial results for search engine optimization when keywords are included. Where images also act as links, the image alt attribute should be used to both describe the image and identify the target of the link.
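A simple automated check can catch the worst offenders. The following sketch lists each link as it would appear in an out-of-context list of links, using the link text plus the alt text of any images inside it, and flags those that are empty or say nothing about the target. The set of vague phrases and the sample markup are illustrative assumptions, not an agreed standard.

```python
from html.parser import HTMLParser

# Phrases that say nothing about the target; the list is illustrative only.
VAGUE_PHRASES = {"click here", "here", "more", "read more", "link"}


class LinkAudit(HTMLParser):
    """Collect the text of each link: its own text plus any image alt text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._text = []
        elif tag == "img" and self._href is not None:
            # An image inside a link contributes its alt text.
            self._text.append(attrs.get("alt") or "")

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def vague_links(html):
    """Return (href, text) for links whose text is empty or uninformative."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links
            if not text or text.lower() in VAGUE_PHRASES]


page = ('<p>To see prices <a href="/prices">click here</a>.</p>'
        '<p><a href="/contact">Contact our sales team</a></p>'
        '<p><a href="/home"><img src="logo.png" alt=""></a></p>')

print(vague_links(page))  # [('/prices', 'click here'), ('/home', '')]
```

The image link is flagged because an empty alt leaves the link with no text at all, which is exactly the case where the alt attribute should identify the target.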

Keyboard Navigation

Not everyone is able to use a mouse. Blind people cannot see where the mouse pointer is on the screen and users with mobility problems, including older users, don’t always have enough control in their hands to operate a mouse. Many of these users will use their keyboard as an alternative, and most tasks can be achieved using this method if the website is designed with these people in mind.

It should be possible to navigate to every page of the website using only the keyboard. If navigation systems are used that require mouse control, as is the case with some drop-down menus, an easily identifiable and accessible alternative route must be provided. It is also important that users are able to navigate through each page in a logical order, without triggering unexpected results such as pop-up windows or causing the keyboard focus to be lost.

The Title Attribute

Often abused and misunderstood, the title attribute can be used with most HTML elements to provide additional information, which is normally rendered by web browsers as a ‘tooltip’ when hovering over the element. Screen reader support for the title attribute varies, and titles are often not announced under the software’s default verbosity settings. For this reason, only additional, non-essential information should be provided in the title.

A common misconception is that every link must have a title. In the case of text links where the link phrase is descriptive enough, a title is unnecessary and simply repeating the link text in the title is worthless, particularly for screen reader users who may have to listen to the phrase twice.

The Alt Attribute

The alt attribute is used to give alternative textual content for non-textual elements, such as images. Users with screen readers or Braille displays cannot ‘see’ images – a screen reader will announce an image when it finds one, but it cannot tell what the image shows or what its meaning is within the context of the page.

The alt attribute allows us to provide a description of the image, which can then be announced by a screen reader, transcribed into Braille or displayed in place of the image. As with the title attribute, there are situations where the alt text should be left empty. Where images are used purely for decoration or layout and have no real meaning or impact on the page content, alt text is often more of a hindrance than a help.
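A small audit script can separate the cases. The sketch below assumes the common convention that a decorative image carries an empty alt attribute (`alt=""`), whereas an image with no alt attribute at all is treated as an oversight to be fixed. The class name, function name and sample file names are hypothetical.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Sort images into: no alt attribute, empty alt, and descriptive alt."""

    def __init__(self):
        super().__init__()
        self.missing = []      # no alt attribute at all: needs attention
        self.decorative = []   # alt="": deliberately silent
        self.described = []    # (src, alt) pairs with real descriptions

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        if "alt" not in attrs:
            self.missing.append(src)
        elif not (attrs["alt"] or "").strip():
            self.decorative.append(src)
        else:
            self.described.append((src, attrs["alt"]))


def audit_images(html):
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser


page = ('<img src="chart.png" alt="Bar chart of monthly visitors">'
        '<img src="spacer.gif" alt="">'
        '<img src="photo.jpg">')

audit = audit_images(page)
print(audit.missing)     # ['photo.jpg']
print(audit.decorative)  # ['spacer.gif']
print(audit.described)   # [('chart.png', 'Bar chart of monthly visitors')]
```

Only the `missing` list needs a human decision: each image in it should either gain a description or be marked as decorative.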

Forms

One of the major benefits of a website is the ability to collect information from your visitors, and the easiest way to achieve this is by using online forms. However, forms can present several barriers to accessibility if incorrectly implemented. Form fields should be labelled with correctly positioned and associated ‘label’ tags – this allows screen readers, for example, to announce the correct name when interacting with each field and has the added benefit of providing a larger clickable area for mouse users.

Mandatory fields (those that require information to be entered before the form can be submitted) must be clearly identified in the field label – simply highlighting these fields with a different colour or other visual clue is insufficient for users who are unable to see the screen. Similarly, if a form is returned with errors, it should be made obvious to which field the error applies and what the user has to do to correct their mistake.
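Label association can also be checked mechanically. This sketch looks for form controls that have no `label` element whose `for` attribute matches the control's `id`. It is deliberately simple and rests on assumptions: it does not recognise controls nested inside a `label`, and the list of input types treated as not needing a label is a judgment call. The sample form is invented.

```python
from html.parser import HTMLParser


class LabelAudit(HTMLParser):
    """Find form controls with no <label for="..."> pointing at their id."""

    CONTROLS = {"input", "select", "textarea"}
    # Input types that do not need a text label of their own (an assumption).
    UNLABELLED_TYPES = {"hidden", "submit", "reset", "button", "image"}

    def __init__(self):
        super().__init__()
        self.label_targets = set()
        self.controls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label" and attrs.get("for"):
            self.label_targets.add(attrs["for"])
        elif tag in self.CONTROLS:
            input_type = (attrs.get("type") or "text").lower()
            if input_type not in self.UNLABELLED_TYPES:
                self.controls.append((attrs.get("id"), attrs.get("name")))


def unlabelled_controls(html):
    """Return the name (or id) of each control lacking an associated label."""
    parser = LabelAudit()
    parser.feed(html)
    parser.close()
    return [name or control_id
            for control_id, name in parser.controls
            if control_id is None or control_id not in parser.label_targets]


form = ('<form>'
        '<label for="email">Email address (required)</label>'
        '<input type="text" id="email" name="email">'
        'Phone <input type="text" id="phone" name="phone">'
        '<input type="submit" value="Send">'
        '</form>')

print(unlabelled_controls(form))  # ['phone']
```

The phone field has visible text beside it, but nothing in the markup ties that text to the control, so a screen reader has no name to announce for it.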

JavaScript

Another common misconception is that users of screen readers and other assistive systems browse the internet with JavaScript disabled, or in a special browser that does not support JavaScript. In reality, the majority of these people use standard browsers such as Microsoft Internet Explorer and Mozilla Firefox and have no need to alter their basic configuration.

If JavaScript is used, it should not cause the user’s browser to behave unexpectedly, for example by spawning new windows without warning or causing the page to refresh or redirect. Equally, the website should remain usable if JavaScript is disabled.

April 26, 2009 | Posted in Accessibility, Semantics

   
