Schema.org site aims to standardize tags for Web content managers

Schema.org is a group working toward a standard for HTML tags to help Web content managers develop metadata for better search results across the Web.

When it comes to the future of Web content management (WCM), the new digital government strategy laid out by the Obama administration has it just about right. In part it reads, "New expectations require the federal government to be ready to deliver and receive digital information and services anytime, anywhere and on any device."

To get to the point envisioned by the administration's plan, federal agencies have been charged with adopting the strategy, one principle of which is an information-centric approach for delivering services to consumers. Along those lines, agencies will have to separate information from presentation, and Web content managers will be tasked with administering "discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information."

Gone are the days of self-contained federal agency websites and predefined Web pages. Government offices and departments are adding semantics and structure to the content they publish, steps that will make the information more useful to website visitors.

It's time for other public and private organizations and their Web content managers to follow these pioneering efforts, and a group called Schema.org is working toward a standard for HTML tags to help reach that goal.

The magic of tagging

Of course, notions about tagging and organizing content are not new. After all, site owners have been adding hidden tags to their Web pages (for remote Web crawlers to read) since the mid-1990s. But what has now changed is that open standards for marking up information and for parsing all types of content have matured to deliver smart results. Tim Berners-Lee's vision for defining the semantic Web as "a web of data that can be processed directly and indirectly by machines" is rapidly becoming a reality.

Not surprisingly, search engines have pioneered the path toward semantic services. With the introduction of the now-discontinued Yahoo SearchMonkey in 2008 and Google Rich Snippets in 2009, Web managers were offered a compelling value proposition. If they marked up their content using predefined tags with a specific syntax, then Yahoo and Google would deliver better search results. The problem was that each search engine defined things in slightly different ways. Microsoft, meanwhile, wanted to provide comparable customer experiences with Bing.

It quickly became clear that a proliferation of differing tag sets for WCM wasn't the answer. It made little sense to require content publishers to use different terms for different search engines. Fortunately, the technical staffs at the three search engine vendors began coordinating their efforts to standardize the tagging terms and syntax in an open format.

Toward a Web-wide tagging standard

Schema.org -- launched in June of 2011 -- exists to support this initiative and develop the categories and tag sets for commonly used items across the Web.

The website includes a core set of almost 300 different item types (encompassing familiar entities such as "event," "organization," "person" and "place") in an extensible and continuously growing hierarchy of related terms. External authorities (groups that have domain knowledge about a topic) can define the tag sets for items in their areas of expertise and work with Schema.org's maintainers to incorporate the relevant item types into the schema. With the World Wide Web Consortium's June 2012 release of RDFa Lite 1.1, a subset of the Resource Description Framework in Attributes specification, there is also agreement on the syntax for defining structured data in HTML, which provides the mechanism for processing semantically annotated content across the Web.
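To make the combination of item types and syntax concrete, here is a minimal sketch in Python. It assumes a made-up page fragment that describes a "Person" item using the RDFa Lite attributes vocab, typeof and property, and then reads the tags back with the standard library's HTML parser. The sample markup, the person described and the names RdfaLiteExtractor and extract_item are illustrative assumptions, not part of Schema.org or any search engine's tooling, and the parser handles only a single flat item rather than the full RDFa Lite processing rules.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: a Schema.org "Person" item marked up with
# the RDFa Lite attributes vocab, typeof and property.
SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Jane Doe</span>
  <span property="jobTitle">Web Content Manager</span>
</div>
"""


class RdfaLiteExtractor(HTMLParser):
    """Collects property/value pairs from one flat RDFa Lite item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current = None  # property whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            self.item_type = attrs["typeof"]
        if "property" in attrs:
            self._current = attrs["property"]

    def handle_data(self, data):
        # Keep only text that sits inside an element carrying a property.
        if self._current and data.strip():
            self.properties[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def extract_item(html):
    """Return (item type, {property: value}) for the markup in html."""
    parser = RdfaLiteExtractor()
    parser.feed(html)
    return parser.item_type, parser.properties


if __name__ == "__main__":
    print(extract_item(SAMPLE_HTML))
```

The point of the sketch is that the descriptive tags travel with the visible text: a person reading the page sees a name and a job title, while a crawler reading the same markup can tell which string is which.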

As a result, Schema.org's open markup vocabulary is rapidly gaining popularity as the underlying technical standard for enriching content with the descriptive tags that embed semantics within information streams.

The importance of plumbing

Why are these recent developments and their underlying Web plumbing going to be so important? Better search experiences are only the tip of the iceberg for developing semantically aware applications for better Web content management.

Once open data and content are defined by a standardized syntax and standardized term sets, search engines and other machines will automatically recognize what particular chunks of information are about and make connections among related items. Information accessible on the Web is going to be a lot smarter. Just take a look at what Google is now delivering with its Knowledge Graph. It's an example of what Web publishers might do in the future.

Web developers and content managers are going to find it a lot easier to integrate information from third-party data sources, such as data published by federal agencies, and they will be able to build more useful applications because of that capability. People are going to find their Web experiences more rewarding. The digital government strategy is fast becoming reality. Perhaps information from federal agencies can even be incorporated into Google's Knowledge Graph for netizens to discover.

Evolving with smart content

How then can you make your content available anytime, anywhere and on any device, starting with what you already have?

Clearly, only large organizations can devote resources that compare to what the U.S. government is dedicating to this effort. Yet everybody benefits when Web-accessible content is smart and useful.

As a first step, WCM professionals must optimize their content for search results. It's important for website visitors to easily find the stuff they want. Google Rich Snippets is a great incentive for tagging content: pages marked up with the supported tags can appear as richer listings in search results. Rich Snippets also incorporates the standardization efforts from Schema.org, so be sure to use that markup when appropriate.

Along the way, organizations are going to need to define the tag sets that describe their content. Once enterprises and Web content managers identify their areas of expertise and their site users, they can readily define the relevant terms (sometimes referred to as controlled vocabulary) for categorizing their content. This takes time and Web design insights, so be prepared to devote attention to the task.
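As a rough illustration of what a controlled vocabulary looks like in practice, the Python sketch below maps a few approved tags to the phrases that should trigger them and then applies those tags to a piece of text. The vocabulary entries and the names CONTROLLED_VOCABULARY and tag_content are hypothetical, chosen only for this example; a real term set would come from the organization's own subject experts and site users.

```python
import re

# Hypothetical controlled vocabulary: each approved tag maps to the
# phrases that should trigger it. Real term sets are defined by the
# organization that owns the content.
CONTROLLED_VOCABULARY = {
    "Event": ["conference", "webinar", "town hall"],
    "Organization": ["agency", "department", "consultancy"],
    "Place": ["office", "headquarters"],
}


def tag_content(text, vocabulary=CONTROLLED_VOCABULARY):
    """Return the sorted tags whose trigger phrases appear in text."""
    lowered = text.lower()
    tags = []
    for tag, phrases in vocabulary.items():
        for phrase in phrases:
            # Whole-word match, so "office" does not fire on "officer".
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                tags.append(tag)
                break  # one hit is enough to apply this tag
    return sorted(tags)


if __name__ == "__main__":
    print(tag_content("The agency will host a town hall next month."))
```

Simple phrase matching like this misses plurals and synonyms, which is one reason that defining and maintaining the term set takes real time and attention.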

Finally, it's important to redouble your efforts around site information architecture. Whatever else is going on, if you have a website, you're in the business of organizing and structuring information for a target audience. You've already done a lot of the background work during site development; now it's time to make this implicit structure explicit. Enriching content and automatically categorizing it using relevant terms is an essential step for producing compelling customer experiences.

ABOUT THE AUTHOR
Geoffrey Bock is the principal of Bock & Co., a consultancy focusing on digital strategies for content and collaboration. He also is an author specializing in the business impacts of content technologies. He can be reached at geoffbock@gmail.com.

This was first published in September 2012
