Large XML file


Bradley, Peter

7 Oct 2008

3:25 p.m.

Hi,

I am producing a very large (50MB) XML file via a transformation. Once the file has been produced, it passes through a validation process (another transformation) that checks the original data for business-rules compliance, and as a result we occasionally need to examine the XML file in an editor.

When I use oXygen to view the file, it takes over 30 minutes to load.

The large file viewer is of no use because it does not format the input. Without formatting, the input is just a single long line of unformatted XML.

The contents of the oxygen.vmoptions file are:

-Xmx1450m -Dcom.oxygenxml.language=English

oXygen is set to format and indent on load.

Is there any way that oXygen can be configured to load and display the data more quickly?

Cheers

Peter


Lars Huttar

7 Oct 2008

3:55 p.m.

Peter,

I saw your question on the XSLT and Oxygen mailing lists...

On 10/7/2008 10:25 AM, Bradley, Peter wrote:

Hi,

I am producing a very large (50MB) XML file via a transformation. Once the file has been produced, it passes through a validation process (another transformation) - for business rules compliance of the original data - as a result of which we occasionally need to examine the xml file in an editor.

One question I have: do you need to *edit* the resulting XML file, or just examine it? If you only need to *see* it, that would open up other alternatives. E.g. opening the file in a browser.

Currently Firefox is slow at opening large XML files (you have to wait until the whole file is rendered with styles before you can see any of it), but I have modified the stylesheet it uses so that it only renders the first n (e.g. 1000) elements. That in itself may not be what you want, but my point is that browsers do indent and display XML documents, and there may be some browsers or configurations that are fast enough.

I don't remember how IE performs when rendering XML documents. Google Chrome won't work: it just displays the text content.

Lars


Bradley, Peter

8 Oct 2008

7:38 a.m.

From: Lars Huttar [mailto:lars_huttar@sil.org] Sent: 07 October 2008 16:55 To: Bradley, Peter Cc: oXygen User ML Subject: Re: [oXygen-user] Large XML file

...

If you only need to *see* it, that would open up other alternatives. E.g. opening the file in a browser.

Thanks Lars, and everyone else who's answered.

And you are correct: we only need to view the file. So it's really just a matter of loading and formatting. Using a browser or other tool looks like the way forward.

Thanks

Peter

George Cristian Bina

10:03 a.m.

Hi Peter,

oXygen also provides a tree-based editor that is available as a tool, see Tools->Tree Editor (CTRL+T), and also as a separate application, see the Oxygen Tree Editor (treeEditor.exe) launcher. Please try that as well and see if it helps.

It would be interesting for us to run some tests with the file you had problems with, so if you can make it available please let us know at support@oxygenxml.com how to get access to it (it will not work to just attach it to an email).

One thing that may render the format and indent on open useless is the possible presence of an xml:space="preserve" attribute, possibly on the root element or on some element with a large content.

Best Regards, George

-- George Cristian Bina <oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger http://www.oxygenxml.com
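A quick way to check for the xml:space="preserve" attribute George mentions, without opening the file in an editor, might look like the following. This is only a minimal sketch in Python using the standard library; the function name is made up for illustration, and the sample document is a tiny stand-in, not the real return.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved xml: prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def elements_preserving_space(source):
    """Return the tags of all elements that carry xml:space="preserve".

    `source` is a file name or a binary file object. The document is
    streamed, and each element is cleared once it has been fully read.
    """
    found = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Attributes are already available when the start tag is seen.
            if elem.get(XML_SPACE) == "preserve":
                found.append(elem.tag)
        else:
            elem.clear()  # drop content we no longer need
    return found

# Tiny stand-in documents (hypothetical content, for illustration only).
with_preserve = io.BytesIO(
    b'<Institution xml:space="preserve"><Student/></Institution>'
)
without_preserve = io.BytesIO(b"<Institution><Student/></Institution>")

print(elements_preserving_space(with_preserve))
print(elements_preserving_space(without_preserve))
```

An empty result would suggest that xml:space is not what is defeating format and indent on open.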

Bradley, Peter

10:26 a.m.

Hi George,

I'll try that. Bit pushed this morning, but as soon as I get a chance ...

I'd love to be able to supply you with an example file, too, but unfortunately it contains private data (e.g. disability data) about students at our institution, so for DPA reasons I can't easily release a copy of the file to you.

The output file is a government return, and you can see the detail about the makeup of the file by visiting the public web site:

http://www.hesa.ac.uk/index.php?option=com_studrec&Itemid=232&mnl=07051

You can download the schemas from there, for example, and a sample XML file. The sample XML file is, of course, much smaller than our return. Our return contains details of over 11,000 students along with all their courses (in the thousands), all the modules associated with those courses (in the thousands), and all the enrolments of students on modules (in the tens of thousands at least).

Given the schema, you might perhaps be able to generate some data?

Cheers

Peter

Bradley, Peter

10:56 a.m.

Hi again,

Well, I got to try it. It loads very quickly (once I'd increased the allocated memory), but unfortunately it's infeasibly slow in operation. Doing anything other than just opening/closing a node by clicking on the +/- freezes the interface for a noticeable length of time.

Searching for data is also really slow. The kind of thing we want to do is to locate, say, a student with a particular HUSID to check data errors discovered for that student's entry.

Entering an XPath query like (with an, obviously, invalid HUSID):

(/Institution/Student/HUSID[.='999999999999'])

... took several minutes, with the interface freezing after the entry of each forward slash.

It's certainly better than the editor and we may use it: but it's still very slow. It took me three or four minutes to make a query of the kind shown above.

Cheers

Peter

Syd Bauman

11:08 a.m.

Please don't get me wrong (as this is my 2nd post on this thread recommending you check out a non-oXygen solution): I really really like oXygen a lot, and use it very frequently. But it strikes me that this task of yours

(/Institution/Student/HUSID[.='999999999999'])

at least on very large files, is the kind of task for which XML databases like eXist exist.

Note that a) I am *not* much of an eXist user (yet), so can't give you details on what to do, and b) at least in the more expensive editions, oXygen provides a hook into eXist.

Bradley, Peter

11:13 a.m.

No, that's not a problem. As I replied before, I'm evaluating a number of solutions including your suggestions. It's just that this is the oXygen list, and having had some suggestions as to what to do using oXygen I thought I should try those as well and report what I found. I've no particular preference for an oXygen solution.

Cheers

Peter

Andrew Welch

11:13 a.m.

...

Searching for data is also really slow. The kind of thing we want to do is to locate, say, a student with a particular HUSID to check data errors discovered for that student's entry.

Entering an XPath query like (with an, obviously, invalid HUSID):

(/Institution/Student/HUSID[.='999999999999'])

... took several minutes, with the interface freezing after the entry of each forward slash.

If the XML is basically a repeating structure, you will find splitting it into smaller chunks is more manageable than handling one massive file.

Equally, if you want to run XPaths against it then you should really be using an XML database (such as eXist), or investigate streaming options.

-- Andrew Welch http://andrewjwelch.com Kernow: http://kernowforsaxon.sf.net/
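As an illustration of the streaming option Andrew mentions, here is a minimal Python sketch that walks the file one record at a time instead of loading it into an editor. The element names Institution, Student and HUSID come from the XPath quoted in this thread; the function name, the Name element and the sample data are invented for the example, and the sketch assumes the document does not use XML namespaces.

```python
import io
import xml.etree.ElementTree as ET

def find_student(source, husid):
    """Return the first /Institution/Student whose HUSID equals `husid`.

    `source` is a file name or a binary file object. Only one top-level
    record is kept in memory at a time; returns None if nothing matches.
    """
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a direct child of the root element has just closed
            if elem.tag == "Student" and elem.findtext("HUSID") == husid:
                return elem
            elem.clear()  # discard the record we no longer need
    return None

# Tiny stand-in for the real return (hypothetical data).
sample = (
    b"<Institution>"
    b"<Student><HUSID>111</HUSID><Name>A</Name></Student>"
    b"<Student><HUSID>222</HUSID><Name>B</Name></Student>"
    b"</Institution>"
)

match = find_student(io.BytesIO(sample), "222")
if match is not None:
    print(ET.tostring(match, encoding="unicode"))
```

The same loop could write each record out to its own file, which would be one way to split the document into smaller chunks without opening it in an editor first.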

Bradley, Peter

11:23 a.m.

...

If the XML is basically a repeating structure, you will find splitting it into smaller chunks is more manageable than handling one massive file.

Heh! All I have is the very large file. I've no control over that. I'd have to open it to split it up - and that's what I can't (easily) do.

...

Equally, if you want to run XPaths against it then you should really be using an XML database (such as eXist), or investigate streaming options.

As I've mentioned, I'll check this out. Guys, I don't want to make a big thing of this. The world won't end because of it. I was only enquiring, originally, just to find out whether what I was experiencing was expected or whether I was doing something wrong. Being new to this sort of thing, I wasn't sure. I'm grateful for all your input. Thanks Peter

George Cristian Bina

11:57 a.m.

Hi Peter,

To avoid the delays when entering XPath you need to disable XPath content completion by turning off the option at Options->Preferences -- Editor / Content Completion / XPath -- Enable content completion in XPath expressions.

oXygen supports eXist and Berkeley XML DB in all editions; the Enterprise version covers the commercial databases.

Best Regards, George

-- George Cristian Bina <oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger http://www.oxygenxml.com

Bradley, Peter

12:14 p.m.

Hi George and all,

...

To avoid the delays when entering XPath you need to disable the XPath content completion

Paul Ryan

5:57 p.m.

Our group uses eXist to load large (50 to 100MB) files and query in a viewer quite often with a lot of success, so I would add my recommendation to those previously stated on using eXist for these kinds of large file queries.

-- Paul Ryan

-----Original Message----- From: oxygen-user-bounces@oxygenxml.com [mailto:oxygen-user-bounces@oxygenxml.com] On Behalf Of Bradley, Peter Sent: Wednesday, October 08, 2008 6:14 AM To: George Cristian Bina Cc: oXygen User ML Subject: RE: [oXygen-user] Large XML file

Hi George and all,

To avoid the delays when entering XPath you need to disable the XPath content completion

Thanks. I'll do that. And investigate eXist, which looks cool, as my grandchildren say.

Despite the temptation to say, "But I like content completion!", I think it's time to call this a day, for me at least. I've had lots of good advice, which is exactly what I needed, and I've learnt some things about XML tools, too. So my thanks to all.

Cheers

Peter

Bradley, Peter

9 Oct 2008

4:40 p.m.

From: Paul Ryan [mailto:pryan@infotrustgroup.com] Sent: Wed 08/10/2008 18:57 To: Bradley, Peter Cc: oXygen User ML Subject: RE: [oXygen-user] Large XML file

...

Our group uses eXist to load large (50 to 100MB) files and query in a viewer quite often with a lot of success so I would add my recommendation to those previously stated on using eXist for these kinds of large file queries.

...

-- Paul Ryan

Yes. I downloaded and tried eXist yesterday. It's excellent. Thanks to all those who pointed me towards it. Cheers Peter

Syd Bauman

7 Oct 2008

4:04 p.m.

...

The large file viewer is of no use because it does not format the input. Without formatting, the input is just a single long line of unformatted XML.

There may well be an oXygen-only solution, but formatting the XML before loading into the large file editor may be an easy way to do this.

E.g., on a Mac or GNU/Linux system

$ xmllint --format bigUglyInput.xml > bigPrettyOutput.xml

(I'm sure there are lots of other tools that do this, too.)
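For systems without xmllint, the same pre-formatting step could be sketched in Python with the standard library. This is an illustration only, not something from the thread: the function name is made up, ET.indent needs Python 3.9 or later, and unlike a streaming approach it reads the whole document into memory, so a 50MB input will need a correspondingly larger amount of RAM.

```python
import io
import xml.etree.ElementTree as ET

def pretty_print(src, dst):
    """Read XML from `src` and write an indented copy to `dst`.

    Both arguments may be file names or binary file objects.
    """
    tree = ET.parse(src)
    ET.indent(tree, space="  ")  # adds line breaks and two-space indents
    tree.write(dst, encoding="utf-8", xml_declaration=True)

# Tiny one-line stand-in for the real file (hypothetical content).
one_line = io.BytesIO(
    b"<Institution><Student><HUSID>111</HUSID></Student></Institution>"
)
formatted = io.BytesIO()
pretty_print(one_line, formatted)
print(formatted.getvalue().decode("utf-8"))
```

With the file names used above, the call would be pretty_print("bigUglyInput.xml", "bigPrettyOutput.xml"), after which the output can be opened in the large file viewer.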

Jan Nylund

8 Oct 2008

8:01 a.m.

On 7 Oct 2008, at 19.04, Syd Bauman wrote:

The large file viewer is of no use because it does not format the input. Without formatting, the input is just a single long line of unformatted XML.

There may well be an oXygen-only solution, but formatting the XML before loading into the large file editor may be an easy way to do this.

E.g., on a Mac or GNU/Linux system

$ xmllint --format bigUglyInput.xml > bigPrettyOutput.xml

(I'm sure there are lots of other tools that do this, too.)

This would be handy as an integrated function of the large file viewer: ask "would you like to indent?" and then indent the file to a temporary file.

Br, Jan

-- Jan Nylund Senior System Designer Citec Information Oy Ab
