Re: [oXygen-user] Feature request: Improvement of Japanese search for WebHelp

19 Nov 2013

Dear Naoki-san,

The Webhelp content indexer is indeed based on the Lucene engine, just 
like the Kuromoji morphological analyzer, so delegating the indexing of 
Japanese content to the Kuromoji analyzer at build time (when the 
Webhelp pages are created by the Oxygen Webhelp transformation) is 
doable. However, the Webhelp search is performed at runtime on the 
client side, by JavaScript code running in the browser on the user's 
machine, not on the server where the Webhelp pages are stored. The 
difficulty in integrating a language-specific morphological analyzer 
like Kuromoji comes from the lack of an equivalent JavaScript analyzer 
that can split the search string entered by the user into the same 
morphological components that the Lucene-based analyzer produced when 
it built the index database at build time.
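
To illustrate the mismatch, here is a minimal sketch in Python (the real 
Webhelp search runs in JavaScript). The token lists are hand-written 
stand-ins for what a morphological analyzer might produce at build time, 
not actual Kuromoji output, and the page names and function names are 
hypothetical. The sketch only shows that a query split on whitespace 
cannot match an index built from morphological tokens unless the query 
is segmented the same way.

```python
# Hand-written stand-in for build-time analyzer output (an assumption:
# a real build would obtain these tokens from the analyzer itself).
BUILD_TIME_TOKENS = {
    "topic1.html": ["検索", "機能", "の", "改善"],
    "topic2.html": ["日本語", "の", "形態素", "解析"],
}


def build_index(tokens_by_page):
    """Map each token to the set of pages that contain it."""
    index = {}
    for page, tokens in tokens_by_page.items():
        for token in tokens:
            index.setdefault(token, set()).add(page)
    return index


def whitespace_search(index, query):
    """Look up a query that is split on whitespace only."""
    hits = set()
    for term in query.split():
        hits |= index.get(term, set())
    return hits


index = build_index(BUILD_TIME_TOKENS)

# Typed as a Japanese user normally would, without spaces: the whole
# string is treated as one term, which is not in the index.
no_hits = whitespace_search(index, "検索機能")

# The same words, already segmented the way the index was built.
hits = whitespace_search(index, "検索 機能")
```

With this sample data the unsegmented query finds nothing, while the 
segmented query finds topic1.html. That segmentation step is what would 
have to run in the browser.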

I searched on Google but could not find a client-side JavaScript 
implementation of a Japanese morphological analyzer. If you can suggest 
one, we would certainly consider it as a future improvement for the 
Webhelp search.

Kind regards,
Sorin

Naoki Hirai wrote:
...
Hi,
I like Oxygen WebHelp very much and recommend it to Japanese users. 
WebHelp is a sophisticated online manual solution, but one issue still 
remains for Japanese users: Japanese search. In Japanese it is 
difficult to extract words from sentences because the words are not 
separated by spaces. Therefore, a morphological analyzer is generally 
used to extract the words from the sentences. Recently, an open source 
Japanese morphological analyzer called "Kuromoji" has become popular. 
Apache Solr has introduced Kuromoji as a morphological analyzer.
So, my feature request is that the Oxygen WebHelp plug-in incorporate 
Kuromoji as the morphological analyzer and add a parameter that selects 
the stemmer used when generating WebHelp output. I can help with the 
development and the evaluation.
Please consider it.
Best regards,
Naoki

Sorin Ristache