How to Auto-Load Content into a Content Repository

This article builds upon the previous article, How to set up an SQL Content Repository from Scratch.

Download files: Hybrid2.zip is a module implementing this example.

In the previous article we saw how to create a hybrid SQL Content Repository. We also saw how labor-intensive and error-prone it would be to maintain such a repository manually. This article discusses how that process can be automated.

Before you Start
Content Repository Loader

atg.adapter.gsa.ContentRepositoryLoader is the Java class that implements automated content loading. A component of this class scans a given set of disk folders and synchronizes the database with the contents of those folders. This includes reading meta tag properties from each file and populating the corresponding fields in the database.

The loader can populate the content repository database when it starts, or it can be configured to scan the disk folders on a schedule. In a production environment you probably won't want to configure the schedule, because repeated scans waste cycles on production servers, and you probably already have an event you can fire when content is changed and approved for publication. It's usually better to start the loader in response to that event.

In a site with more than one Dynamo server, remember that all the servers share the same repository database, so you don't want to set up a loader on every server.

Repository Definition Tags used by the Loader

The Hybrid2 module contains a db\hybrid.xml file, which will be combined with the hybrid.xml file from the Hybrid module.
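Each item's content-path-property holds the location of its file. Judging by the loader's debug output, this is a path relative to the document root, such as /articles/diary.jhtml, where the root and the scanned folders come from the relativePathParent and monitoredPaths properties in the loader configuration later in this article. The Java sketch below illustrates that mapping under this assumption. The class ContentPathSketch and its methods are hypothetical, and this is not how ContentRepositoryLoader is actually implemented.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, NOT the ATG implementation: maps a file under the
// document root to a repository path such as /articles/diary.jhtml.
public class ContentPathSketch {

    /** Path of file relative to root, with '/' separators and a leading '/'. */
    public static String contentPath(Path root, Path file) {
        Path rel = root.toAbsolutePath().normalize()
                .relativize(file.toAbsolutePath().normalize());
        StringBuilder sb = new StringBuilder();
        for (Path part : rel) {              // iterate the name elements
            sb.append('/').append(part);
        }
        return sb.toString();
    }

    /** True if contentPath lies inside one of the monitored folders. */
    public static boolean isMonitored(String contentPath, String... monitoredPaths) {
        for (String folder : monitoredPaths) {
            if (contentPath.startsWith("/" + folder + "/")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed to mirror relativePathParent (..\hybrid\doc) and monitoredPaths=articles.
        Path root = Paths.get("..", "hybrid", "doc");
        String p = contentPath(root, root.resolve("articles").resolve("diary.jhtml"));
        System.out.println(p + " monitored=" + isMonitored(p, "articles"));
    }
}
```

Building the path from name elements rather than from the platform's toString() keeps the stored value the same on Windows and Unix, which matters if several servers read the same repository.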
Here are the main points of the combination:

    <gsa-template>
      <header>
        <description>This file adds the properties used by the content loader</description>
      </header>
      <!-- set the content-path-property to be used by the content loader -->
      <item-descriptor name="folder" content-path-property="path">
        <table name="hsqlc_folders" type="primary" id-column-name="id">
          <property name="path" data-type="string"/>
        </table>
      </item-descriptor>
      <item-descriptor name="article" content-path-property="path">
        <table name="hsqlc_articles" type="primary" id-column-name="id">
          <property name="path" data-type="string"/>
        </table>
      </item-descriptor>
    </gsa-template>

Folder Item Descriptor

The content loader requires a content-path-property to be specified. We'll set it to path, which we now store in the database:

    <item-descriptor name="folder" content-path-property="path">
      <table name="hsqlc_folders" type="primary" id-column-name="id">
        <property name="path" data-type="string"/>
      </table>
    </item-descriptor>

Article Item Descriptor

Here too we set the content-path-property to path, which we now store in the database:

    <item-descriptor name="article" content-path-property="path">
      <table name="hsqlc_articles" type="primary" id-column-name="id">
        <property name="path" data-type="string"/>
      </table>
    </item-descriptor>

Configuring the Loader

The loader uses a component of type HTMLMetaTagParser. This component doesn't require much setup; we just need to create the component /db/HTMLMetaTagParser:

    $class=atg.adapter.html.HTMLMetaTagParser
    $scope=global

The loader component itself requires a little more work. I created mine in /db/ContentRepositoryLoader:

    $class=atg.adapter.gsa.ContentRepositoryLoader
    $scope=global
    HTMLMetaTagParser=/db/HTMLMetaTagParser
    contentItemDescriptorName=article
    ignoreMissingUpdatedStorageFile=true
    lastUpdatedStorage=hybrid_auto_loader_update.txt
    loggingDebug=true
    monitoredPaths=articles
    relativePathParent=..\\\\hybrid\\\\doc
    removeStaleContentOnUpdate=true
    repository=/db/HybridRepository
    repositoryType=HTML
    scanForUpdates=true
    schedule=every\ 1\ minute
    scheduler=/atg/dynamo/service/Scheduler

Running the Loader

The loader can be started manually from the DCC. Before starting the loader, it's probably best to remove the manually entered folders and articles from the database, either via the DCC or with SQL:

    delete from hsqlc_articles;
    delete from hsqlc_folders;

After you start the loader component, check the Dynamo console for messages. With debug logging turned on, I got these messages:

    **** debug Tue Mar 27 18:16:18 PST 2001 985745778617 /db/ContentRepositoryLoader atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100004(/articles/diary.jhtml)
    **** debug Tue Mar 27 18:16:18 PST 2001 985745778637 /db/ContentRepositoryLoader atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100005(/articles/thoughts.jhtml)
    **** debug Tue Mar 27 18:16:18 PST 2001 985745778657 /db/ContentRepositoryLoader atg.adapter.gsa.ContentRepositoryLoaderResources->loadedNewHTMLItem : loaded new HTML item: article:1100006(/articles/journal.jhtml)

To view your new data in the DCC's By File view, you may need to disconnect the DCC from the server and reconnect it.

Summary

We've created the basics of a Hybrid SQL Content Repository and demonstrated how to configure a content loader component to synchronize the database with the content on the file system.
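The synchronization the loader performs can be pictured as a comparison between the files on disk and the items already in the repository. The following Java sketch is a hypothetical simplification, not the ContentRepositoryLoader implementation: the class SyncPassSketch, its Plan type, and the rule that a file needs updating if it was modified after the previous scan are all assumptions, loosely modeled on the lastUpdatedStorage and removeStaleContentOnUpdate settings in the loader configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of one loader pass; NOT the ATG ContentRepositoryLoader code.
public class SyncPassSketch {

    /** What a single pass would do: create, refresh, or delete items. */
    public static final class Plan {
        public final List<String> toAdd = new ArrayList<>();
        public final List<String> toUpdate = new ArrayList<>();
        public final List<String> toRemove = new ArrayList<>();
    }

    /**
     * @param filesOnDisk content path -> file last-modified time (millis)
     * @param itemsInRepo content paths already stored in the repository
     * @param lastScan    time of the previous pass (assumed to be persisted,
     *                    similar in spirit to lastUpdatedStorage)
     * @param removeStale assumed analogue of removeStaleContentOnUpdate
     */
    public static Plan plan(Map<String, Long> filesOnDisk, Set<String> itemsInRepo,
                            long lastScan, boolean removeStale) {
        Plan plan = new Plan();
        // TreeMap/TreeSet give a sorted, deterministic order.
        for (Map.Entry<String, Long> e : new TreeMap<>(filesOnDisk).entrySet()) {
            if (!itemsInRepo.contains(e.getKey())) {
                plan.toAdd.add(e.getKey());          // no item yet: create one
            } else if (e.getValue() > lastScan) {
                plan.toUpdate.add(e.getKey());       // file changed since last pass
            }
        }
        if (removeStale) {
            for (String path : new TreeSet<>(itemsInRepo)) {
                if (!filesOnDisk.containsKey(path)) {
                    plan.toRemove.add(path);         // backing file is gone
                }
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; old.jhtml stands for a deleted file.
        Map<String, Long> disk = Map.of(
                "/articles/diary.jhtml", 2_000L,
                "/articles/thoughts.jhtml", 500L,
                "/articles/journal.jhtml", 3_000L);
        Set<String> repo = Set.of(
                "/articles/thoughts.jhtml", "/articles/journal.jhtml", "/articles/old.jhtml");
        Plan p = plan(disk, repo, 1_000L, true);
        System.out.println("add=" + p.toAdd + " update=" + p.toUpdate + " remove=" + p.toRemove);
    }
}
```

With the sample data in main, diary.jhtml is planned as new, journal.jhtml as updated, thoughts.jhtml is left alone because it has not changed since the last scan, and the hypothetical old.jhtml is planned for removal.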