WPS Portal project

Wednesday, March 18, 2009

Sun Microsystems and Liferay initiative

Some news about open-source portal.

Keywords: Portal, Liferay, Sun, GlassFish, Web 2.0, Collaboration tool

Sun Microsystems and Liferay Launch Initiative to Develop Next-Generation Web Technologies

Sunday, March 15, 2009

How to count WCM content items in v6 JCR database ?

As we did have big performances issues in portal v5 due to the high number of WCM content items in our database, we have decided to continue monitoring this count also in v6.
Of course the new JCR design in v6, the WCM software is now supposed to support about 500.000 content entries....but anyway I think it is still a good idea to monitor the volume of data you have in your system (this could be useful to report that to your management, and/or just to prevent reaching the limit).

Now, in v6 counting and analyzing the number of items is little bit more complex than in v5, mainly because the JCR database design is really different compare to a standard SQL schema design.

Actually, the v5 database request for counting the total number of content items are quite similar:
- Old request (in v5):
select count(*) from AJPE where COMPONENT_TYPE='AJPE_CONTENTS'
- New request (in v6.x):
SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877

But the problem is that in JCR database, there is no simple 1 to 1 correspondancy between the WCM content items and the number of entries in db...
In v5, when creating a new content or deleting it, you can see that the count is simply increased (+1) or decreased (-1) accordingly, whatever the status of the content.

Now in v6 JCR model, here is the behaviour of the system:

Let's assume at start you have:
select count(ITEMID) from JCR.EV_ENTRY -> 39004
select count(ITEMID) from JCR.EV_ENTRY where ITEMCLASS=1762875877 -> 27818

I will give you more details in the next section below, but basically:
- select count(ITEMID) from JCR.EV_ENTRY : this request gives you the total number of WCM objects in the db (could be content, but also design elements).
- select count(ITEMID) from JCR.EV_ENTRY where ITEMCLASS=1762875877 : give you only the WCM content items (like news, etc), assuming in your db "1762875877" also corresponds to the content item object type.

After creating 1 Published content, you will see that the count is increased by "+2":
select count(ITEMID) from JCR.EV_ENTRY -> 39006
select count(ITEMID) from JCR.EV_ENTRY where ITEMCLASS=1762875877 -> 27820

After creating 1 new Draft content, you will see that the count is increased only by "+1":
select count(ITEMID) from JCR.EV_ENTRY -> 39007
select count(ITEMID) from JCR.EV_ENTRY where ITEMCLASS=1762875877 -> 27821

Also, please note that deleting content does not affect the above count values.


All these results clearly show that:
- 1 WCM content item can correspond to several JCR nodes,
- Deleting a WCM content will not necessarily decrease the database count (as content is probably not deleted physically, but its status might only be updated).


To validate this last hypothesis (regarding delete management), I did the following tests:

Get the total number of content items:
db2 => SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877
-----------
27818

Get the distinct 'ITEMTYPE' values that exist in the db:
db2 => SELECT DISTINCT ITEMTYPE FROM JCR.EV_ENTRY
ITEMTYPE
--------
D
L
P

Then check the corresponding content items count, depending on the 'ITEMTYPE':
SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877 AND ITEMTYPE='D'
=> 566
SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877 AND ITEMTYPE='L'
=> 0
SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877 AND ITEMTYPE='P'
=> 27252

Now we can see that the count of ITEMTYPE='P' and ITEMTYPE='D' is equal to the total 27818 of content items.

An obvious assumption is that 'D' corresponds to DRAFT contents, and 'P' to PUBLISHED content. These results match the status of our contents in our db : we have about 500 draft content items, and about 13.500 published content items (remind that a published item = 2 entries in JCR.EV_ENTRY table).


So basically, to evaluate the count of WCM content items, I think the following equation makes sense:
[SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877 AND ITEMTYPE='D'] + 0,5*[SELECT COUNT(*) FROM JCR.EV_ENTRY WHERE ITEMCLASS=1762875877 AND ITEMTYPE='P']

Note: I have no idea for the moment of what could be the meaning of ITEMTYPE='L' ? (if 'P' is published, and 'D' is draft, then it could correspond to EXPIRED status, but this count does not match our number of expired content....).


Hope this will help you to better understand the structure of the WCM JCR database. Also please find below the detail of the database requests I made to understand the model.

==========================
To connect to the JCR database, you first have to get the following values (in our case, we are using DB2):
# DbType: The type of database to be used for WebSphere Portal JCR domain
jcr.DbType=db2
jcr.DbName=jcrdbe6
jcr.DbSchema=jcr


v6.x PORTAL DATABASES configuration file can be found here:
/portal/WebSphere/wp_profile/ConfigEngine/
properties/wkplc_comp.properties

The table JCR.EV_ENTRY contains the reference to all the content items.

db2 => DESCRIBE TABLE JCR.EV_ENTRY
Column Type Type
name schema name Length Scale Nulls
------------------------------ --------- ------------------ -------- ----- ------
IGID SYSIBM CHARACTER 64 0 No
ITEMID SYSIBM CHARACTER 64 0 No
ITEMCLASS SYSIBM INTEGER 4 0 Yes
ITEMTYPE SYSIBM CHARACTER 1 0 No
STATE SYSIBM BIGINT 8 0 Yes
DELETED SYSIBM CHARACTER 1 0 Yes
PARENTID SYSIBM CHARACTER 64 0 Yes
TSTAMP SYSIBM TIMESTAMP 10 0 Yes

The table JCR.EV_TYPES gives you the list of WCM object types:

SELECT TYPENAME FROM JCR.EV_TYPES
com.ibm.workplace.wcm.services.library.Library
com.aptrix.pluto.site.Site
com.aptrix.pluto.site.SiteArea
com.aptrix.pluto.presentation.Template
com.aptrix.pluto.workflow.stages.WorkflowStage
com.aptrix.pluto.workflow.Workflow
com.aptrix.pluto.workflow.actions.PublishAction
com.aptrix.pluto.content.Content
com.aptrix.pluto.taxonomy.Taxonomy
com.aptrix.pluto.taxonomy.Category
com.aptrix.pluto.content.link.ContentLink
com.aptrix.pluto.presentation.Style
com.aptrix.pluto.workflow.actions.ExpireAction
com.aptrix.pluto.cmpnt.FileResourceCmpnt
com.aptrix.pluto.cmpnt.ImageResourceCmpnt
com.aptrix.pluto.cmpnt.MenuCmpnt
com.aptrix.pluto.cmpnt.HTMLCmpnt
com.aptrix.pluto.cmpnt.NavigatorCmpnt
com.aptrix.pluto.cmpnt.JSPCmpnt


The first field of the JCR.EV_TYPES tables is the object type ID than can be used to identify content in the JCR.EV_ENTRY table. Based on the list below, the value 1762875877 corresponds to the ITEMCLASS column in the JCR.EV_ENTRY table.

SELECT * FROM JCR.EV_TYPES
473104548 com.ibm.workplace.wcm.services.library.Library
298371183 com.aptrix.pluto.site.Site
-161464932 com.aptrix.pluto.site.SiteArea
1494027823 com.aptrix.pluto.presentation.Template
-325197706 com.aptrix.pluto.workflow.stages.WorkflowStage
540603119 com.aptrix.pluto.workflow.Workflow
1448064964 com.aptrix.pluto.workflow.actions.PublishAction
176287587 com.aptrix.pluto.content.Content
-1290676433 com.aptrix.pluto.taxonomy.Taxonomy
-977641252 com.aptrix.pluto.taxonomy.Category
-965915661 com.aptrix.pluto.content.link.ContentLink
629030940 com.aptrix.pluto.presentation.Style
-1363088202 com.aptrix.pluto.workflow.actions.ExpireAction
858827521 com.aptrix.pluto.cmpnt.FileResourceCmpnt
-1535525020 com.aptrix.pluto.cmpnt.ImageResourceCmpnt
-1598427284 com.aptrix.pluto.cmpnt.MenuCmpnt
-1110739776 com.aptrix.pluto.cmpnt.HTMLCmpnt
538485974 com.aptrix.pluto.cmpnt.NavigatorCmpnt
1671285158 com.aptrix.pluto.cmpnt.JSPCmpnt

Thursday, March 12, 2009

New blog about Alfresco and Share

I just would like to let you know that I have created this new blog about Alfresco DM and the Share collaboration tool. For more info see

The role of Alfresco is growing more and more each day in our company, and it is becoming popular very quickly (I would say even more quickly than the underlying infrastructure :-)
So it's time to start a new blog dedicated to this solution.

And as we have connected Alfresco with our Portal (through a custom JSR 168 Portlet), I will continue to post news about this integration here in this blog.

How to copy WCM v6.x JCR database ?

As we have a big WCM content repository (more than 18.000 content items now) we have tried to write a procedure in order to be able to quickly copy an existing v6.x WCM database and to restore it on a target portal system.

This experience was interesting, and I discover that the process is quite different than the one we used in WCM v5.x.Basically, in v5.x, we simply had to do a backup/restore of the (DB2) database, then delete WCM index on the target, and after restarting the portal, everything was working fine...but now with v6.x JCR database format it's a little bit more complex...

The problem is that v6 Portal (or WCM ?) system creates dynamically JCR table view (and tables ?) at startup. Moreover, it seems that standard
database backup process (to copy database) does not copy this tables and tables views. I have no idea of why exactly, but this problem is described
at leat on z/OS (See : Workspace Validation utility to verify JCR Workspace database views are in sync for DB2 on z/OS).

So after restoring WCM database copy, and starting the portal, we had the following errors in the logs:
Caused by: com.ibm.content.exception.ServiceException: javax.jcr.RepositoryException: DB2 SQL error: SQLCODE: -204, SQLSTATE: 42704, SQLERRMC:
JCR.ICMSTJCRL00001Thread 430447016 : lException: DB2 SQL error: SQLCODE: -204, SQLSTATE: 42704, SQLERRMC: JCR.ICMSTJCRL00001

Basically, this means that the table "JCR.ICMSTJCRL00001" was missing on the target portal...

After checking the list of existing tables on the target portal, and compare it with the source, I have seen that a lots of tables were missing on
v6.x.On the target portal, the following kind of tables was not available:
ICMSTJCRN00001, ICMSTJCRN00002, ICMSTJCRN00003, ICMSTJCRN0000x, etc
ICMSTJCRL00001, ICMSTJCRL00002, ICMSTJCRL00003, ICMSTJCRL0000x, etc

Running the command "db2 list tables for schema jcr" reported about 2988 records tables in the v6 source, and only 875 record(s) in the v6 target.
We have asked the DBA why this table was not copied by the DB2 backup/copy process ? (but I have no reply so far, he is still doing some tests).

So I have tried to fix it by myself :-)
Here is the solution I found (please note that this solution might NOT be supported by IBM, I still have to open a PMR to confirm it is):

Based on the previous article (see), I add the option:

jcr.workspace.view.validation=FIX

in the "/portal/WebSphere/wp_profile/PortalServer/jcr/lib/
com/ibm/icm/icm.properties" config file (I first tried with value "REPORT"), and then restart the portal.
(please not that our portal is running on AIX, not z/OS, but this option seems to work anyway)...

In the log file I have seen that the portal (WCM ?) was creating table views at startup:
[3/12/09 11:40:46:915 CET] 00000019 WorkspaceUtil I com.ibm.icm.da.portable.data.WorkspaceUtils validate Workspace Validation is starting. [3/12/09 11:40:47:439 CET] 00000019 WorkspaceUtil W com.ibm.icm.da.portable.data.WorkspaceUtils validate Workspace Validation has discovered the
following Database View(s) are missing. Please run the Validation utility with the FIX option to recreate: ICMSTJCRN00548 ICMSTJCRL00548
ICMSTJCRLR00548 ICMSTJCRLV00548 [3/12/09 11:40:47:452 CET] 00000019 WorkspaceUtil W com.ibm.icm.da.portable.data.WorkspaceUtils validate Workspace Validation has discovered the
following Database View(s) are missing. Please run the Validation utility with the FIX option to recreate: ICMSTJCRN00538 ICMSTJCRL00538
ICMSTJCRLR00538 ICMSTJCRLV00538


And after startup, I was able to see that all the missing tables (JCR.ICMSTJCRL00001, ICMSTJCRL00002, ICMSTJCRL00003, and also ICMSTJCRLV00001, etc,
etc) has been created.
The command "db2 list tables for schema jcr" now report a similar number of tables on the target v6 portal.

Now, I can see that WCM Content library is available in the portal admin interface, and that WCM Portlet rendering is working fine.


However, just a "minor" issue: after starting the portal I had a lots of "JURU" or "indexing" traces in the logs: I think it's because the JCR text search indexer was trying to rebuild WCM data index.
So I did set this option to false, to stop this process (and restart the portal):

jcr.textsearch.enabled=false(in /portal/WebSphere/wp_profile/PortalServer/jcr/lib/
com/ibm/icm/icm.properties)


That's all. Hope this will help you....If some of you also did try a similar process to copy WCM v6 database, do not hesitate to give your feedback.

------------------------
Also, for more information about this topic:

See this IBM article : Java Content Repository (JCR) information and Frequently Asked Questions (FAQ) for IBM Web Content Management (WCM)

See also this post on Vivek Agarwal's Blog, who had a quite similar issue on Quickr : Resolved an exception stacktrace related to Quickr/WebSphere Portal search tables