Excel vba xml解析性能(Excel vba xml parsing performance)

我正在使用Excel中的一些输入数据,将其解析为xml并使用它来运行SQL存储过程,但是我正在运行xml解析中的性能问题。 输入表看起来像这样:

Dates_|_Name1_Name2_Name3_..._NameX Date1 | Date2 | . . . | Date1Y|

我有一些代码循环遍历每个单元格并将数据解析为xml字符串,但即使是大约300乘300网格,执行需要大约五分钟的时间,我希望使用可能的数据集有数千列长。 我已经尝试了一些事情来帮助加快速度,比如将数据读入Variant,然后迭代虽然或不包含DoEvents,但我一直无法获得加速。 这是问题的一小部分代码:

Dim lastRow As Long lRows = (oWorkSheet.Cells(Rows.Count, 1).End(xlUp).Row) Dim lastColumn As Long lCols = (oWorkSheet.Cells(1, Columns.Count).End(xlToLeft).Column) Dim sheet As Variant With Sheets(sName) sheet = .Range(.Cells(1, 1), .Cells(lRows, lCols)) End With ReDim nameCols(lCols) As String

...

resultxml = "<DataSet>" For i = 2 To rows resultxml = resultxml & "<DateRow>" For j = 1 To cols If Trim(sheet(i, j)) <> "" Then lResult = "<" & nameCols(j) & ">" rResult = "</" & nameCols(j) & ">" tmpValue = Trim(sheet(i, j)) If IsDate(tmpValue) And Not IsNumeric(tmpValue) Then If Len(tmpValue) >= 8 Then tmpValue = Format(tmpValue, "yyyy-mm-dd") End If End If resultxml = resultxml & lResult & tmpValue & rResult DoEvents End If Next j resultxml = resultxml & "</DateRow>" Next i resultxml = resultxml & "</DataSet>"

任何意见,以减少运行时间将不胜感激。

I'm working on taking some input data in excel, parsing it to xml and using that to run a SQL stored procedure, but I'm running into performance issue on the xml parsing. The input sheet looks something like this:

Dates_|_Name1_Name2_Name3_..._NameX Date1 | Date2 | . . . | Date1Y|

I've got some code to loop though each cell and parse out the data into an xml string but even for about a 300 by 300 grid the execution takes something on the order of five minutes and I'm looking to use data sets that could be several thousand columns long. I've tries a couple things to help speed it up like reading the data into a Variant then iterating though that or excluding DoEvents but I haven't been able to get the speed up. Here's the bit of code that's the issue:

Dim lastRow As Long lRows = (oWorkSheet.Cells(Rows.Count, 1).End(xlUp).Row) Dim lastColumn As Long lCols = (oWorkSheet.Cells(1, Columns.Count).End(xlToLeft).Column) Dim sheet As Variant With Sheets(sName) sheet = .Range(.Cells(1, 1), .Cells(lRows, lCols)) End With ReDim nameCols(lCols) As String

...

resultxml = "<DataSet>" For i = 2 To rows resultxml = resultxml & "<DateRow>" For j = 1 To cols If Trim(sheet(i, j)) <> "" Then lResult = "<" & nameCols(j) & ">" rResult = "</" & nameCols(j) & ">" tmpValue = Trim(sheet(i, j)) If IsDate(tmpValue) And Not IsNumeric(tmpValue) Then If Len(tmpValue) >= 8 Then tmpValue = Format(tmpValue, "yyyy-mm-dd") End If End If resultxml = resultxml & lResult & tmpValue & rResult DoEvents End If Next j resultxml = resultxml & "</DateRow>" Next i resultxml = resultxml & "</DataSet>"

Any advice for getting the run time down would be greatly appreciated.

最满意答案

考虑使用MSXML ,这是一个全面的W3C兼容的XML API库,您可以使用它来使用DOM方法( createElement , appendChild , setAttribute )构建XML,而不是连接文本字符串。 XML不是一个文本文件,而是一个带有编码和树结构的标记文件。 Excel通过引用或后期绑定配备了MSXML COM对象,并且可以从Excel数据中迭代构建树,如下所示。

随着300行12列的随机日期,下面甚至不需要一分钟(单击宏后几秒钟),它甚至可以使用嵌入式XSLT样式表打印原始输出和换行符(如果不打印漂亮, MSXML将文档输出为一条长而连续的行。

输入

名称日期电子表格

VBA (当然与实际数据对齐)

Sub xmlExport()
On Error GoTo ErrHandle
    ' VBA REFERENCE MSXML, v6.0 '
    Dim doc As New MSXML2.DOMDocument60, xslDoc As New MSXML2.DOMDocument60, newDoc As New MSXML2.DOMDocument60
    Dim root As IXMLDOMElement, dataNode As IXMLDOMElement, datesNode As IXMLDOMElement, namesNode As IXMLDOMElement
    Dim i As Long, j As Long
    Dim tmpValue As Variant

    ' DECLARE XML DOC OBJECT '
    Set root = doc.createElement("DataSet")
    doc.appendChild root

    ' ITERATE THROUGH ROWS '
    For i = 2 To Sheets(1).UsedRange.Rows.Count

        ' DATA ROW NODE '
        Set dataNode = doc.createElement("DataRow")
        root.appendChild dataNode

        ' DATES NODE '
        Set datesNode = doc.createElement("Dates")
        datesNode.Text = Sheets(1).Range("A" & i)
        dataNode.appendChild datesNode

        ' NAMES NODE '
        For j = 1 To 12
            tmpValue = Sheets(1).Cells(i, j + 1)
            If IsDate(tmpValue) And Not IsNumeric(tmpValue) Then
                Set namesNode = doc.createElement("Name" & j)
                namesNode.Text = Format(tmpValue, "yyyy-mm-dd")
                dataNode.appendChild namesNode
            End If
        Next j

    Next i

    ' PRETTY PRINT RAW OUTPUT '
    xslDoc.LoadXML "<?xml version=" & Chr(34) & "1.0" & Chr(34) & "?>" _
            & "<xsl:stylesheet version=" & Chr(34) & "1.0" & Chr(34) _
            & "                xmlns:xsl=" & Chr(34) & "http://www.w3.org/1999/XSL/Transform" & Chr(34) & ">" _
            & "<xsl:strip-space elements=" & Chr(34) & "*" & Chr(34) & " />" _
            & "<xsl:output method=" & Chr(34) & "xml" & Chr(34) & " indent=" & Chr(34) & "yes" & Chr(34) & "" _
            & "            encoding=" & Chr(34) & "UTF-8" & Chr(34) & "/>" _
            & " <xsl:template match=" & Chr(34) & "node() | @*" & Chr(34) & ">" _
            & "  <xsl:copy>" _
            & "   <xsl:apply-templates select=" & Chr(34) & "node() | @*" & Chr(34) & " />" _
            & "  </xsl:copy>" _
            & " </xsl:template>" _
            & "</xsl:stylesheet>"

    xslDoc.async = False
    doc.transformNodeToObject xslDoc, newDoc
    newDoc.Save ActiveWorkbook.Path & "\Output.xml"

    MsgBox "Successfully exported Excel data to XML!", vbInformation
    Exit Sub

ErrHandle:
    MsgBox Err.Number & " - " & Err.Description, vbCritical
    Exit Sub

End Sub
 

产量

<?xml version="1.0" encoding="UTF-8"?> <DataSet> <DataRow> <Dates>Date1</Dates> <Name1>2016-04-23</Name1> <Name2>2016-09-22</Name2> <Name3>2016-09-23</Name3> <Name4>2016-09-24</Name4> <Name5>2016-10-31</Name5> <Name6>2016-09-26</Name6> <Name7>2016-09-27</Name7> <Name8>2016-09-28</Name8> <Name9>2016-09-29</Name9> <Name10>2016-09-30</Name10> <Name11>2016-10-01</Name11> <Name12>2016-10-02</Name12> </DataRow> <DataRow> <Dates>Date2</Dates> <Name1>2016-06-27</Name1> <Name2>2016-08-14</Name2> <Name3>2016-07-08</Name3> <Name4>2016-08-22</Name4> <Name5>2016-11-03</Name5> <Name6>2016-07-28</Name6> <Name7>2016-08-23</Name7> <Name8>2016-11-01</Name8> <Name9>2016-11-01</Name9> <Name10>2016-08-11</Name10> <Name11>2016-08-18</Name11> <Name12>2016-09-23</Name12> </DataRow> ...

Consider using MSXML, a comprehensive W3C compliant library of XML APIs which you can use to build your XML with DOM methods (createElement, appendChild, setAttribute) instead of concatenating text strings. XML is not quite a text file but a markup file with encoding and tree structure. Excel comes equipped with the MSXML COM object by reference or late-binding, and can iteratively build a tree from Excel data as shown below.

With 300 rows by 12 cols of random dates, below didn't even take a minute (literally seconds after clicking macro) AND it even pretty printed raw output with line breaks and indentation using an embedded XSLT stylesheet (if you do not pretty print, the MSXML outputs document as one long, continuous line).

Input

Name Date Spreadsheet

VBA (of course align to actual data)

Sub xmlExport()
On Error GoTo ErrHandle
    ' VBA REFERENCE MSXML, v6.0 '
    Dim doc As New MSXML2.DOMDocument60, xslDoc As New MSXML2.DOMDocument60, newDoc As New MSXML2.DOMDocument60
    Dim root As IXMLDOMElement, dataNode As IXMLDOMElement, datesNode As IXMLDOMElement, namesNode As IXMLDOMElement
    Dim i As Long, j As Long
    Dim tmpValue As Variant

    ' DECLARE XML DOC OBJECT '
    Set root = doc.createElement("DataSet")
    doc.appendChild root

    ' ITERATE THROUGH ROWS '
    For i = 2 To Sheets(1).UsedRange.Rows.Count

        ' DATA ROW NODE '
        Set dataNode = doc.createElement("DataRow")
        root.appendChild dataNode

        ' DATES NODE '
        Set datesNode = doc.createElement("Dates")
        datesNode.Text = Sheets(1).Range("A" & i)
        dataNode.appendChild datesNode

        ' NAMES NODE '
        For j = 1 To 12
            tmpValue = Sheets(1).Cells(i, j + 1)
            If IsDate(tmpValue) And Not IsNumeric(tmpValue) Then
                Set namesNode = doc.createElement("Name" & j)
                namesNode.Text = Format(tmpValue, "yyyy-mm-dd")
                dataNode.appendChild namesNode
            End If
        Next j

    Next i

    ' PRETTY PRINT RAW OUTPUT '
    xslDoc.LoadXML "<?xml version=" & Chr(34) & "1.0" & Chr(34) & "?>" _
            & "<xsl:stylesheet version=" & Chr(34) & "1.0" & Chr(34) _
            & "                xmlns:xsl=" & Chr(34) & "http://www.w3.org/1999/XSL/Transform" & Chr(34) & ">" _
            & "<xsl:strip-space elements=" & Chr(34) & "*" & Chr(34) & " />" _
            & "<xsl:output method=" & Chr(34) & "xml" & Chr(34) & " indent=" & Chr(34) & "yes" & Chr(34) & "" _
            & "            encoding=" & Chr(34) & "UTF-8" & Chr(34) & "/>" _
            & " <xsl:template match=" & Chr(34) & "node() | @*" & Chr(34) & ">" _
            & "  <xsl:copy>" _
            & "   <xsl:apply-templates select=" & Chr(34) & "node() | @*" & Chr(34) & " />" _
            & "  </xsl:copy>" _
            & " </xsl:template>" _
            & "</xsl:stylesheet>"

    xslDoc.async = False
    doc.transformNodeToObject xslDoc, newDoc
    newDoc.Save ActiveWorkbook.Path & "\Output.xml"

    MsgBox "Successfully exported Excel data to XML!", vbInformation
    Exit Sub

ErrHandle:
    MsgBox Err.Number & " - " & Err.Description, vbCritical
    Exit Sub

End Sub
 

Output

<?xml version="1.0" encoding="UTF-8"?> <DataSet> <DataRow> <Dates>Date1</Dates> <Name1>2016-04-23</Name1> <Name2>2016-09-22</Name2> <Name3>2016-09-23</Name3> <Name4>2016-09-24</Name4> <Name5>2016-10-31</Name5> <Name6>2016-09-26</Name6> <Name7>2016-09-27</Name7> <Name8>2016-09-28</Name8> <Name9>2016-09-29</Name9> <Name10>2016-09-30</Name10> <Name11>2016-10-01</Name11> <Name12>2016-10-02</Name12> </DataRow> <DataRow> <Dates>Date2</Dates> <Name1>2016-06-27</Name1> <Name2>2016-08-14</Name2> <Name3>2016-07-08</Name3> <Name4>2016-08-22</Name4> <Name5>2016-11-03</Name5> <Name6>2016-07-28</Name6> <Name7>2016-08-23</Name7> <Name8>2016-11-01</Name8> <Name9>2016-11-01</Name9> <Name10>2016-08-11</Name10> <Name11>2016-08-18</Name11> <Name12>2016-09-23</Name12> </DataRow> ...

更多推荐