WPS Portal project

Saturday, November 08, 2008

Migrating from PDM to Alfresco

As PDM (Portal Document Manager) will no longer exist in WPS 6.1, we are currently migrating all our PDM libraries to the Alfresco document management system.

Basically, our approach is to create a root space in Alfresco for each PDM library, and to replace the standard PDM authoring portlet with our custom Alfresco portlet.

Here I would like to briefly describe how we plan to do the migration and the main challenges we are currently facing:


A/ Extracting data from PDM:

First, you should know that extracting documents from PDM is not that easy:

A.1/ You have to install the 'PDM export tool' (this requires a specific fix to be installed on WPS).

(Note: you might have some trouble configuring this tool, especially if you use LDAP security. In that case, try using the full LDAP 'dn' as the login of the admin account, and make sure the upper/lower case of the login is correct.)

A.2/ The role of the PDM export tool is to export all library data (folders, files, security, etc.) to the filesystem. You can choose between 2 modes:
- File mode exports everything into a single big XML file.
- Folder mode exports a tree structure of several distinct node files corresponding to the PDM folders and files.
In both cases the structure looks like a JCR-like tree of nodes.
If you have a large amount of data, Folder mode is probably more appropriate, because it outputs many small node files rather than one big file. Please note that there is no option to export one library at a time: the tool exports all libraries existing on the portal server.
In our case we have 24 GB of data and the export ran for approximately 10 hours, but it completed successfully.
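
After such a long export, a quick sanity check of what landed on the filesystem can be useful. The following is only a minimal Python sketch, assuming Folder mode was used and that the export directory can simply be walked like any other directory tree. The function name is made up for illustration and is not part of the PDM export tool.

```python
import os

def summarize_export(root):
    """Return (file_count, total_bytes) for everything under the export root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return file_count, total_bytes
```

Comparing the total with the expected volume (about 24 GB in our case, probably somewhat more on disk because of the encoding) gives a first hint of whether the export is complete.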

------------------

B/ Reading PDM data:

Next, you need to be able to read and browse this JCR XML tree (file contents are Base64-encoded) to get the data out before importing it into the target server.
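
To give an idea of what reading such a tree could look like, here is a minimal Python sketch. It assumes the export follows the standard JCR system-view XML layout (`sv:node`, `sv:property`, `sv:value`, with binary values stored as Base64). The real PDM export format may differ, so the element names and the sample document below are assumptions for illustration only.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace of the standard JCR system-view XML format (assumed here).
SV = "{http://www.jcp.org/jcr/sv/1.0}"

def iter_binary_properties(xml_text):
    """Yield (node_path, property_name, raw_bytes) for every Binary property."""
    def walk(node, parent_path):
        path = parent_path + "/" + node.get(SV + "name", "")
        for prop in node.findall(SV + "property"):
            if prop.get(SV + "type") != "Binary":
                continue
            value = prop.find(SV + "value")
            if value is not None and value.text:
                # b64decode ignores the line breaks often present in long values.
                yield path, prop.get(SV + "name"), base64.b64decode(value.text)
        for child in node.findall(SV + "node"):
            yield from walk(child, path)

    yield from walk(ET.fromstring(xml_text), "")

# Tiny hand-written sample, not real PDM output.
SAMPLE = """<sv:node xmlns:sv="http://www.jcp.org/jcr/sv/1.0" sv:name="library">
  <sv:node sv:name="readme.txt">
    <sv:property sv:name="jcr:data" sv:type="Binary">
      <sv:value>aGVsbG8=</sv:value>
    </sv:property>
  </sv:node>
</sv:node>"""

for path, prop_name, data in iter_binary_properties(SAMPLE):
    print(path, prop_name, len(data), "bytes")
```

This loads the whole document in memory, which is fine for the small node files of Folder mode. For the single big file of File mode, a streaming parser such as `xml.etree.ElementTree.iterparse` would probably be a better fit.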

To do that, we have chosen to work with an IT partner that specializes in this type of document migration.

Some of them provide a packaged connector (between PDM and Alfresco), others provide custom batch scripts. Depending on your budget and your needs, each approach has pros and cons.

The obvious benefit of connectors is that they can usually plug directly into the PDM data source, so you do not have to export your PDM data (which is a long process). As a result, you can re-run the synchronisation more than once if needed.
Batch scripts (which read the extracted XML data) have more constraints because you might need to redo the extract, but they can also be less expensive.

------------------

C/ Importing data into Alfresco:

To do the import into Alfresco, one could use either the API or the existing web service (or even the Alfresco bulk upload of an archive).
The most difficult part of the import is making sure we do not lose any metadata. For example, retrieving the last modification time of the original document seems feasible, but I think we will then have to create a custom field in Alfresco to store this initial value.
We will also have to think about how to map the PDM security to the corresponding roles in Alfresco...
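
The two concerns above (keeping the original modification time and mapping security) can be sketched in Python as follows. This is only an illustration under assumptions: the custom property name `pdm:originalModified` and the function name are hypothetical, the role table is a guess that would need to be validated, and the resulting dict is a plain data structure, not a call to any Alfresco API.

```python
from datetime import datetime, timezone

# Hypothetical role mapping, to be validated against the real role sets.
PDM_TO_ALFRESCO_ROLE = {
    "User": "Consumer",
    "Contributor": "Contributor",
    "Editor": "Editor",
    "Manager": "Coordinator",
}

def build_import_record(name, pdm_modified, pdm_acl):
    """Build a plain dict describing one document to import.

    pdm_acl maps a principal (user or group) to its PDM role name.
    """
    permissions = []
    unmapped = []
    for principal, pdm_role in sorted(pdm_acl.items()):
        alfresco_role = PDM_TO_ALFRESCO_ROLE.get(pdm_role)
        if alfresco_role is None:
            unmapped.append((principal, pdm_role))  # needs a manual decision
        else:
            permissions.append((principal, alfresco_role))
    return {
        "name": name,
        # Hypothetical custom property keeping the original PDM timestamp.
        "properties": {"pdm:originalModified": pdm_modified.isoformat()},
        "permissions": permissions,
        "unmapped": unmapped,
    }
```

Keeping the unmapped entries in a separate list, instead of silently dropping them, makes it possible to review every ACL entry that has no obvious Alfresco equivalent before the real import.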

------------------


D/ Replacing the IBM PDM portlets:

Finally, we will have to replace all the IBM PDM portlets with the Alfresco portlet. We will do that manually.

We first had to identify all our IBM PDM portlets in order to create the corresponding new instances of the Alfresco portlet. To do that, we 'simply' used an xmlaccess export file containing all the PDM portlets of our website, and then applied Ant/XSL transformations to get an Excel list, which is more human-readable.
Basically, the xls file should contain all PDM libraries and, for each library, all associated portlets (including portlet ACL, PDM default folder, etc.).
The corresponding Alfresco portlets will be created manually.
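
We used Ant/XSL for this step, but the same kind of list could also be produced with a short script. The Python sketch below is an alternative illustration, not what we actually ran. The element and attribute names (`content-node`, `uniquename`, `portletinstance`, `portletref`, `preferences`) and the preference names (`pdm.library`, `pdm.default.folder`) are assumptions and should be checked against a real xmlaccess export file. It also assumes that pages are not nested inside each other in the export.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["page", "portlet", "library", "default_folder"]

def list_pdm_portlets(xml_text):
    """Return one row (dict) per portlet instance found in the export."""
    rows = []
    root = ET.fromstring(xml_text)
    for page in root.iter("content-node"):
        for inst in page.iter("portletinstance"):
            prefs = {}
            for pref in inst.findall("preferences"):
                value = pref.find("value")
                prefs[pref.get("name")] = value.text if value is not None else ""
            rows.append({
                "page": page.get("uniquename", ""),
                "portlet": inst.get("portletref", ""),
                # Hypothetical preference names for the library and folder.
                "library": prefs.get("pdm.library", ""),
                "default_folder": prefs.get("pdm.default.folder", ""),
            })
    return rows

def to_csv(rows):
    """Serialize the rows as CSV text, which Excel can open directly."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Sorting or grouping the resulting rows by library then gives the same kind of "library, then associated portlets" view as the xls file described above.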

------------------

We will start the real data migration in a few weeks, so I will try to share the next steps with you in a future post.
