Performance XSD validation

Topics: Improvements
Jun 9, 2015 at 8:13 PM
Hi Frank,

As you know, we are using the XSD validation in an SSIS flow to automatically validate XML files generated by a data integration system and, once a file validates successfully, to send it to all of its subscribers.

We now see that validation time does not appear to scale linearly with the size of the input file. On one specific infrastructure configuration, where there is certainly room for optimization, a file of around 300MB finishes in a couple of minutes, whereas a file of ~4GB takes 2-3 hours.

Over the coming days we will gather the performance details and prepare a setup on a development environment to test the hypothesis that the validation scales non-linearly. The validation code is written so that the validation is performed step by step rather than on the complete file. However, my first thought is that the XML file under validation is still kept fully in memory so that it can be validated step by step, and that the infrastructure's memory limits then cause swapping and performance issues.
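To make the memory hypothesis concrete, here is a minimal sketch of what step-by-step processing looks like when the file is not held in memory as a whole. It is written in Python for illustration only and is not the .NET code used in the SSIS flow. The tag name `record` and the `is_valid` callback are hypothetical stand-ins (the real check would be the XSD validation), and the sketch assumes that the records are direct children of the root element.

```python
import io
import xml.etree.ElementTree as ET


def count_valid_records(source, record_tag, is_valid):
    """Stream `source` and check each <record_tag> element on its own.

    Assumes the records are direct children of the root element.
    Returns a (valid, invalid) tuple of counts.
    """
    valid = invalid = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # `is_valid` is a hypothetical placeholder for the real per-record check.
            if is_valid(elem):
                valid += 1
            else:
                invalid += 1
            # Drop the records processed so far, so memory use does not
            # grow with the size of the file.
            root.clear()
    return valid, invalid


if __name__ == "__main__":
    sample = b"<batch><record id='1'/><record/><record id='3'/></batch>"
    print(count_valid_records(io.BytesIO(sample), "record", lambda e: "id" in e.attrib))
```

If memory use still grows with file size in a setup like this, something other than the per-record check is holding on to the document.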

Do you have any suggestions?
Coordinator
Jun 12, 2015 at 8:25 AM
Edited Jul 28, 2015 at 11:26 AM
Hi Marco,

Can you pinpoint whether this is happening in the validation step? The xmlrecord splitter on which this code is based splits a 10 GB file in 40 minutes.
Of course, the size of the XML under validation still varies, but the largest I have ever seen was a few megabytes (max 10). Can you send me an anonymized version that I can test?


I would recommend the following steps:
  1. What's the performance without the validation of the body? If I remember correctly, this should be off for ILVB because of a validation issue that still needs to be resolved, but I conclude from your mail that it's on.
  2. Can you split the file? What's the biggest file that appears? The validation is done by an XMLReader on an in-memory XMLDocument. I've already found with the XMLCompare utility that files larger than 20MB can pose a problem. That's why the transaction file for ILVV is split first and then compared.
  3. I may be able to write a new utility called XMLValidate that can validate the file in question so we can reproduce the problem, but I need the file in order to test it properly.
  4. What's the processing power of the server? All utilities run better on dual or quad core systems, so this could also be the case for the validation. Please monitor the process and see if any processors are taxed to 100%. This might be a processor issue and not a memory issue. The servers should have more than enough memory, but the number of processors might be the bottleneck, especially since these are single-core virtual processors, so the servers might have only half the processing power we expected.
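One rough way to tell a processor bottleneck from a memory or I/O bottleneck is to compare the CPU time a run consumes with the wall-clock time it takes. The sketch below does this in Python for illustration only, not in the .NET code of the utilities. The `cpu_share` helper and the `work` callable are hypothetical names standing in for a single validation run.

```python
import time


def cpu_share(work):
    """Run `work()` and return (wall_seconds, cpu_seconds, cpu_seconds / wall_seconds).

    `work` is a hypothetical stand-in for one validation run.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else 0.0
    return wall, cpu, ratio


if __name__ == "__main__":
    # A busy loop spends its time on the CPU; a sleep spends it waiting.
    print(cpu_share(lambda: sum(range(2_000_000))))
    print(cpu_share(lambda: time.sleep(0.2)))
```

For a single-threaded run, a ratio close to 1 suggests the processor is the limit, while a ratio well below 1 suggests the process is mostly waiting, for example on disk or on swapping.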
You can send any details to my email address.

Kind regards, Frank
Marked as answer by FrankvdnThillart on 7/28/2015 at 4:33 AM