Essential ATG Dynamo Training - Got atg Certified Relationship Management Developer?
This is not an official ATG site: ATG, Dynamo, Scenario Server and Personalization Server are trademarks or registered trademarks of Art Technology Group

How to Auto-Load Content into a Content Repository

This article builds on the previous article, How to set up an SQL Content Repository from Scratch.

DOWNLOAD FILES - Hybrid2.zip is a module implementing this example

In the previous article we saw how to create a hybrid SQL Content repository. We also saw how labor-intensive and error-prone it would be to maintain such a repository manually. This article discusses how that process can be automated.

Before you Start

  • Work through the previous article to set up the repository and understand what is going on
  • You must have the Hybrid module installed to run the Hybrid2 module for this article

Content Repository Loader

atg.adapter.gsa.ContentRepositoryLoader is the Java class which implements automated content loading. A component of this class scans a given set of disk folders and synchronizes the database with the contents of those folders. This includes reading meta tag properties from the files and populating the corresponding fields in the database.
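One way to picture what "synchronize" means here is as a comparison between the set of paths found on disk and the set of paths already in the database. The sketch below is plain Java and is not ATG code. The class name SyncPlan and the sample paths are made up for illustration, and the real loader also updates items whose files have changed, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: not the ATG implementation.
// Works out which content paths would need adding to or removing from the
// database so that it matches what is on disk.
class SyncPlan {
    final Set<String> toAdd;     // on disk, missing from the database
    final Set<String> toRemove;  // in the database, no longer on disk

    SyncPlan(Set<String> onDisk, Set<String> inDatabase) {
        toAdd = new TreeSet<>(onDisk);
        toAdd.removeAll(inDatabase);
        toRemove = new TreeSet<>(inDatabase);
        toRemove.removeAll(onDisk);
    }

    public static void main(String[] args) {
        // Hypothetical sample data
        Set<String> onDisk = new TreeSet<>(Arrays.asList(
            "/articles/diary.jhtml", "/articles/journal.jhtml"));
        Set<String> inDatabase = new TreeSet<>(Arrays.asList(
            "/articles/diary.jhtml", "/articles/old.jhtml"));

        SyncPlan plan = new SyncPlan(onDisk, inDatabase);
        System.out.println("add:    " + plan.toAdd);
        System.out.println("remove: " + plan.toRemove);
    }
}
```

Doing this comparison by hand for every file is exactly the manual work the loader is meant to take away.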

The loader can populate the content repository database when it starts, or it can be configured to scan the disk folders on a schedule. In a production environment you probably won't want to configure the schedule: scheduled scans waste cycles on production servers, and you probably already have an event which fires when content is changed and approved for publication. It's probably better to start the loader in response to that event. In a site with more than one Dynamo Server, remember that they all share the same repository database, so you don't want to set up a loader on every server.

Repository Definition Tags used by the Loader

The Hybrid2 module contains a db\hybrid.xml file which will be combined with the file of the same name from the Hybrid module. Here is the file, followed by the main points of the combination:

<gsa-template>
	<header>
		<description>This file adds the properties used by the content loader</description>
	</header>
	<!-- set the content-path-property to be used by the content loader -->
	<item-descriptor name="folder" content-path-property="path">
		<table name="hsqlc_folders" type="primary" id-column-name="id">
			<property name="path" data-type="string"/>
		</table>
	</item-descriptor>
	<item-descriptor name="article" content-path-property="path">
		<table name="hsqlc_articles" type="primary" id-column-name="id">
			<property name="path" data-type="string"/>
		</table>
	</item-descriptor>
</gsa-template>

Folder Item Descriptor

The content loader requires a content-path-property to be specified. We'll set it to path, a property which is now stored in the database:

<item-descriptor name="folder" content-path-property="path">
	<table name="hsqlc_folders" type="primary" id-column-name="id">
		<property name="path" data-type="string"/>
	</table>
</item-descriptor>

Article Item Descriptor

Here too we set the content-path-property to path, which we now store in the database:

	<item-descriptor name="article" content-path-property="path">
		<table name="hsqlc_articles" type="primary" id-column-name="id">
			<property name="path" data-type="string"/>
		</table>
	</item-descriptor>

Configuring the Loader

The loader uses a component of type HTMLMetaTagParser. This component doesn't require much setup; we just need to create it at /db/HTMLMetaTagParser:

$class=atg.adapter.html.HTMLMetaTagParser
$scope=global
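To make concrete what reading meta tags involves, here is a rough plain-Java sketch that pulls name/content pairs out of an HTML string with a regular expression. It is not how atg.adapter.html.HTMLMetaTagParser works internally (I haven't seen its source), and the class name MetaTagSketch and the title and author tags are hypothetical examples. It also assumes the name attribute comes before content, which real HTML does not guarantee.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: not the ATG parser.
class MetaTagSketch {
    // Matches <meta name="..." content="..."> with the attributes in that order.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    // Returns the meta tags found in the HTML, keyed by name, in document order.
    static Map<String, String> parse(String html) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            tags.put(m.group(1), m.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hypothetical article header
        String html = "<html><head>"
            + "<meta name=\"title\" content=\"My Diary\">"
            + "<meta name=\"author\" content=\"A. Writer\">"
            + "</head><body>...</body></html>";
        System.out.println(parse(html));
    }
}
```

Each name found this way would correspond to a property of the article item descriptor, which is how the values end up in database columns.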

The loader component itself requires a little more work. I created mine at /db/ContentRepositoryLoader:

$class=atg.adapter.gsa.ContentRepositoryLoader
$scope=global
HTMLMetaTagParser=/db/HTMLMetaTagParser
contentItemDescriptorName=article
ignoreMissingUpdatedStorageFile=true
lastUpdatedStorage=hybrid_auto_loader_update.txt
loggingDebug=true
monitoredPaths=articles
relativePathParent=..\\\\hybrid\\\\doc
removeStaleContentOnUpdate=true
repository=/db/HybridRepository
repositoryType=HTML
scanForUpdates=true
schedule=every\ 1\ minute
scheduler=/atg/dynamo/service/Scheduler

Running the Loader

The loader can be started manually from the DCC. Before starting the loader it's probably best to remove the manually entered folders and articles from the database, either via the DCC or with SQL:

delete from hsqlc_articles
delete from hsqlc_folders

After you start the loader component, check the Dynamo console for messages. With debug logging turned on I got these messages:

**** debug      Tue Mar 27 18:16:18 PST 2001    985745778617    /db/ContentRepositoryLoader     atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100004(/articles/diary.jhtml)
**** debug      Tue Mar 27 18:16:18 PST 2001    985745778637    /db/ContentRepositoryLoader     atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100005(/articles/thoughts.jhtml)
**** debug      Tue Mar 27 18:16:18 PST 2001    985745778657    /db/ContentRepositoryLoader     atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100006(/articles/journal.jhtml)
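The item paths in these messages (/articles/diary.jhtml and so on) look like file locations expressed relative to the relativePathParent folder, written with forward slashes. That reading is inferred from the configuration and the log output, not from the loader's documentation, so treat the plain-Java sketch below as an illustration of the idea only. The class and method names are made up.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: not ATG code.
class ContentPathSketch {
    // Expresses 'file' relative to 'parent' as a forward-slash path
    // with a leading slash, whatever the platform separator is.
    static String contentPath(Path parent, Path file) {
        Path relative = parent.toAbsolutePath().normalize()
            .relativize(file.toAbsolutePath().normalize());
        StringBuilder sb = new StringBuilder();
        for (Path part : relative) {
            sb.append('/').append(part.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-ins for relativePathParent and one file under monitoredPaths
        Path parent = Paths.get("hybrid", "doc");
        Path file = Paths.get("hybrid", "doc", "articles", "diary.jhtml");
        System.out.println(contentPath(parent, file));
    }
}
```

If that reading is right, this is the value that ends up in the path property named by content-path-property.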

To view your new data in the DCC's By File view, you may need to disconnect the DCC from the server and reconnect.

Summary

We've created the basics of a Hybrid SQL Content Repository, and demonstrated how to configure a content loader component to synchronize the database with the content on the file system.

 


