
Hi Lou,

Yes, it does mean that for each paragraph you'd be tokenising (or grouping) into sentences, which might carry a slight efficiency hit (though I doubt it's much), but it would make choosing the number of sentences that fit under $maxWords easier. I was assuming that you wanted the output to have the sentences marked as <s>, my mistake.

-James

On Mon, 5 Nov 2018 at 12:31, Lou Burnard <lou.burnard@retired.ox.ac.uk> wrote:
Thanks for the quick reply, James, but doesn't your approach imply that the tokenisation into sentences has already been done? I'm trying to avoid a two-pass solution, as I expect to be doing this hundreds of times.
------------------------------
*From:* James Cummings <james@blushingbunny.net>
*Sent:* Monday, November 5, 2018 1:10:02 PM
*To:* Lou Burnard
*Cc:* oxygen-user@oxygenxml.com
*Subject:* Re: [oXygen-user] an xslt challenge
Hi Lou,
Would it make sense to use xsl:for-each-group to group the sentences into <s> units to make this easier? Then I'd probably recursively call a template or function, passing the current collection of <s> units as a variable of type item()*, and testing whether its tokenised word count is above or below $maxWords.
Not got time to write that out as a solution atm, and I'm sure it can be done without the recursivity as well, but that is the approach that would have occurred to me at least.
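[Editorial sketch, not part of the original message.] The recursive idea described above might look roughly like the following in XSLT 2.0. It recurses over plain sentence strings rather than over pre-built <s> elements, so it is a variation on the suggestion and not a transcription of it. The function name f:take, the urn:example:local namespace, the default of 50 for $maxWords, and the binding of the t prefix to the TEI namespace are all assumptions made for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:t="http://www.tei-c.org/ns/1.0"
    xmlns:f="urn:example:local"
    exclude-result-prefixes="xs t f">

  <!-- Assumed default; override when invoking the transformation. -->
  <xsl:param name="maxWords" as="xs:integer" select="50"/>

  <!-- Hypothetical helper: returns the longest leading run of $sentences
       whose cumulative word count stays below $maxWords. -->
  <xsl:function name="f:take" as="xs:string*">
    <xsl:param name="sentences" as="xs:string*"/>
    <xsl:param name="wordsSoFar" as="xs:integer"/>
    <xsl:if test="exists($sentences)">
      <!-- Words are space-delimited strings, as in the problem statement. -->
      <xsl:variable name="total" as="xs:integer"
          select="$wordsSoFar
                  + count(tokenize(normalize-space($sentences[1]), ' '))"/>
      <!-- Stop recursing as soon as the limit would be reached. -->
      <xsl:if test="$total lt $maxWords">
        <xsl:sequence
            select="$sentences[1], f:take(subsequence($sentences, 2), $total)"/>
      </xsl:if>
    </xsl:if>
  </xsl:function>

  <xsl:template match="t:p">
    <!-- Split on full stop + whitespace, as in the original attempt. -->
    <xsl:for-each select="f:take(tokenize(normalize-space(.), '\.\s'), 0)">
      <s n="{position()}"><xsl:value-of select="."/></s>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the function stops at the first sentence that would push the total to $maxWords or beyond, later sentences are never counted. Note that tokenising on '\.\s' drops the full stop from every sentence except possibly the last.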
-James
On Mon, 5 Nov 2018 at 12:03, Lou Burnard <lou.burnard@retired.ox.ac.uk> wrote:
I hope I am not abusing this list in asking occasionally for advice on the best way to hack something in xslt.
Today's problem is to output only the first x sentences (strings terminated by a full stop) of a paragraph such that the total number of words (space-delimited strings) is less than some limit (call it $maxWords). Since the sentences are of variable length, obviously I don't know what x is.
Here's where I got to so far:
<xsl:template match="t:p">
  <xsl:variable name="pString">
    <xsl:value-of select="."/>
  </xsl:variable>
  <xsl:for-each select="tokenize($pString, '\.\s')">
    <xsl:variable name="seq">
      <xsl:value-of select="string(position())"/>
    </xsl:variable>
    <xsl:variable name="wordsSoFar">
      <xsl:value-of select="string-length(translate(normalize-space(preceding-sibling::text()), ' ', '')) + 1"/>
    </xsl:variable>
    <xsl:if test="$wordsSoFar &lt; $maxWords">
      <s n="{$seq}">
        <xsl:value-of select="."/>
      </s>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
But this is not valid because preceding-sibling:: wants a node() not a string (even though "text()" *is* a node imho).
Am I going about this entirely the wrong way?
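[Editorial sketch, not part of the original message.] One way around the preceding-sibling:: problem, assuming XSLT 2.0, is to hold the tokenised sentences in a variable and count the words in the leading subsequence up to the current position, instead of trying to navigate from a string. The default of 50 for $maxWords and the binding of the t prefix to the TEI namespace are assumptions; the running total here includes the current sentence, which is one reading of "total number of words is less than $maxWords".

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:t="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="t">

  <!-- Assumed default; override when invoking the transformation. -->
  <xsl:param name="maxWords" select="50"/>

  <xsl:template match="t:p">
    <!-- The sentences are atomic strings, so they have no siblings;
         keep the whole sequence in a variable and index into it. -->
    <xsl:variable name="sentences"
        select="tokenize(normalize-space(.), '\.\s')"/>
    <xsl:for-each select="$sentences">
      <xsl:variable name="pos" select="position()"/>
      <!-- Word count of sentences 1..$pos, current sentence included. -->
      <xsl:variable name="wordsSoFar"
          select="count(tokenize(
                    string-join(subsequence($sentences, 1, $pos), ' '),
                    '\s+'))"/>
      <xsl:if test="$wordsSoFar lt $maxWords">
        <s n="{$pos}"><xsl:value-of select="."/></s>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

This stays within a single template with no intermediate <s> markup, at the cost of recounting the leading sentences at each step. Since the running total only grows, the test passes for a leading run of sentences and then fails for all the rest.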
_______________________________________________ oXygen-user mailing list oXygen-user@oxygenxml.com https://www.oxygenxml.com/mailman/listinfo/oxygen-user