
The Document Object Model

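The xml.dom module is probably the most powerful tool available to a Python programmer working with XML documents. Unfortunately, the documentation provided by the XML-SIG is still rather sparse. The W3C's language-neutral DOM specification fills part of that gap, but Python programmers are better served by a quick-start guide to DOM that is specific to Python. This article aims to be that guide. As in the previous column, some of the samples use the quotations.dtd file, and those files can be used together with the archive of code samples for this article.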

It is worth understanding exactly what DOM is. The official explanation is a good one:

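"The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page." (World Wide Web Consortium DOM Working Group)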

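DOM converts an XML document into a tree (or forest) representation. As an example, the World Wide Web Consortium (W3C) specification gives the DOM version of an HTML table.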

[Figure: DOM tree representation of an HTML table]

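As the figure above shows, DOM defines, from a more abstract point of view, a set of methods for traversing, pruning, rearranging, outputting, and otherwise manipulating the tree. Working this way is more convenient than working with the linear representation of an XML document.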
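The tree view can be made concrete with a short sketch. It assumes the standard-library xml.dom.minidom module (not the xml.dom classes used in the listings of this article), and the table content and the outline() helper are illustrative names only.

```python
from xml.dom import minidom

# A tiny XML stand-in for an HTML table (illustrative data).
TABLE = ("<table><tr><td>Shady Grove</td><td>Aeolian</td></tr>"
         "<tr><td>Over the River, Charlie</td><td>Dorian</td></tr></table>")

def outline(node, depth=0):
    """Return one indented line per element or non-blank text node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + "<" + node.tagName + ">")
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + repr(node.data.strip()))
    # Text nodes have an empty child list, so the recursion ends there.
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines

doc = minidom.parseString(TABLE)
tree_lines = outline(doc.documentElement)
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one level of the DOM tree, which is the structure the traversal and rearrangement methods operate on.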

Converting HTML to XML

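Valid HTML is almost valid XML, but not quite. There are two main differences: XML tags are case-sensitive, and every XML tag needs an explicit close, either as a closing tag (which is optional for some HTML tags) or in the empty-tag form, for example <img src="X.png" />. A simple example of using xml.dom is to convert HTML to XML with the HtmlBuilder() class.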
try_dom1.py

    """Convert a valid HTML document to XML
      USAGE: python try_dom1.py < infile.html > outfile.xml
    """
    import sys
    from xml.dom import core
    from xml.dom.html_builder import HtmlBuilder

    # Construct an HtmlBuilder object and feed the data to it
    b = HtmlBuilder()
    b.feed(sys.stdin.read())

    # Get the newly-constructed document object
    doc = b.document

    # Output it as XML
    print doc.toxml()

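The HtmlBuilder() class readily implements the functions of the underlying xml.dom.builder template it inherits from, and its source is worth studying. However, even if we implemented the template functions ourselves, the outline of a DOM program would be similar. In the general case, we build a DOM instance by some means and then operate on that instance. The .toxml() method of a DOM instance is a simple way to produce a string representation of it (in the case above, we just print it once it has been generated).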
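The same build-then-operate outline can be sketched with current standard-library modules. This is not the HtmlBuilder() implementation. It assumes html.parser.HTMLParser for tokenizing and xml.dom.minidom for the tree, and the DomBuilder class and VOID_TAGS set are hypothetical names introduced here. It handles only a single root element and ignores text outside it.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# HTML elements that never take a closing tag (a partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

class DomBuilder(HTMLParser):
    """Hypothetical helper: feed it HTML, read the result from .document."""

    def __init__(self):
        super().__init__()
        self.document = minidom.Document()
        self.stack = [self.document]      # currently open nodes, innermost last

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, which settles the case mismatch.
        elem = self.document.createElement(tag)
        for name, value in attrs:
            elem.setAttribute(name, value if value is not None else name)
        self.stack[-1].appendChild(elem)
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element, if there is one.
        # This also closes elements whose end tag was omitted (such as <p>).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if len(self.stack) > 1:           # ignore text outside the root element
            self.stack[-1].appendChild(self.document.createTextNode(data))

b = DomBuilder()
b.feed('<HTML><BODY><P>Hello<BR>world<IMG SRC="X.png"></BODY></HTML>')
b.close()
xml_text = b.document.documentElement.toxml()
print(xml_text)
```

The output addresses both differences named above: tag names come out in one consistent case, and every element is explicitly closed, either with an end tag or in the empty-tag form.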

Converting a Python object to XML

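Python programmers can gain a good deal of power and generality by exporting arbitrary Python objects as XML instances. This lets us work with Python objects in the way we are used to, and decide only at the end whether instance attributes become tags in the generated XML. With just a few lines (derived from the building.py example), we can convert a "native" Python object to a DOM object, recursing into those attributes that are themselves contained objects.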
try_dom2.py

    """Build a DOM instance from scratch, write it to XML
      USAGE: python try_dom2.py > outfile.xml
    """
    import types
    from xml.dom import core
    from xml.dom.builder import Builder

    # Recursive function to build DOM instance from Python instance
    def object_convert(builder, inst):
        # Put entire object inside an elem w/ same name as the class.
        builder.startElement(inst.__class__.__name__)
        for attr in inst.__dict__.keys():
            if attr[0] == '_':      # Skip internal attributes
                continue
            value = getattr(inst, attr)
            if type(value) == types.InstanceType:
                # Recursively process subobjects
                object_convert(builder, value)
            else:
                # Convert anything else to string, put it in an element
                builder.startElement(attr)
                builder.text(str(value))
                builder.endElement(attr)
        builder.endElement(inst.__class__.__name__)

    if __name__ == '__main__':
        # Create container classes
        class quotations: pass
        class quotation: pass

        # Create an instance, fill it with hierarchy of attributes
        inst = quotations()
        inst.title = "Quotations file (not quotations.dtd conformant)"
        inst.quot1 = quot1 = quotation()
        quot1.text = """'"is not a quine" is not a quine' is a quine"""
        quot1.source = "Joshua Shagam, kuro5hin.org"
        inst.quot2 = quot2 = quotation()
        quot2.text = "Python is not a democracy. Voting doesn't help. "+\
                     "Crying may..."
        quot2.source = "Guido van Rossum, comp.lang.python"

        # Create the DOM Builder
        builder = Builder()
        object_convert(builder, inst)
        print builder.document.toxml()

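The function object_convert() has some limitations. For example, the procedure above cannot produce an XML document that conforms to quotations.dtd: #PCDATA text cannot be placed directly inside a quotation class, only inside an attribute of the class (such as .text). A simple workaround is to have object_convert() give special treatment to an attribute with a reserved name, for example .PCDATA. The conversion to DOM could be made cleverer in various ways, but the beauty of the approach is that we can start with whole Python objects and turn them into XML documents in a straightforward way.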
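The .PCDATA workaround could look roughly like the sketch below. It assumes the standard-library xml.dom.minidom module in place of the Builder class, and the function name object_to_dom() is hypothetical. The test for "is a nested object" (having a __dict__) is a simplification chosen for illustration.

```python
from xml.dom import minidom

def object_to_dom(doc, inst):
    """Return an element for inst; a .PCDATA attribute becomes direct text."""
    elem = doc.createElement(inst.__class__.__name__)
    for attr, value in vars(inst).items():
        if attr.startswith('_'):            # skip internal attributes
            continue
        if attr == 'PCDATA':                # reserved name: text goes straight into the element
            elem.appendChild(doc.createTextNode(str(value)))
        elif hasattr(value, '__dict__'):    # nested object: recurse
            elem.appendChild(object_to_dom(doc, value))
        else:                               # anything else: <attr>str(value)</attr>
            child = doc.createElement(attr)
            child.appendChild(doc.createTextNode(str(value)))
            elem.appendChild(child)
    return elem

class quotations: pass
class quotation: pass

inst = quotations()
inst.quot1 = quot1 = quotation()
quot1.PCDATA = "Python is not a democracy."
quot1.source = "Guido van Rossum, comp.lang.python"

doc = minidom.Document()
doc.appendChild(object_to_dom(doc, inst))
xml_text = doc.documentElement.toxml()
print(xml_text)
```

With the reserved attribute, the quotation text sits directly inside the quotation element and only the source is wrapped in its own tag.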

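It is also worth noting that elements at the same level of the generated XML document come out in no particular order. For example, on the author's system, with the particular Python version in use, the second quotation defined in the source appears first in the output, and this order can change between versions and systems. This makes sense, because the attributes of a Python object are not kept in any fixed order. For data tied to a database system this is the behavior we want, but for an article marked up as XML it clearly is not (unless we want to update William Burroughs' "cut-up" method).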
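One way to take the ordering out of the interpreter's hands is to choose an explicit order before emitting elements. The sketch below assumes alphabetical order is acceptable, which it often is for record-like data and rarely is for prose. The author_note attribute is an illustrative extra field. Note that Python 3.7 and later do preserve the insertion order of instance attributes, which the observation above predates.

```python
class quotation: pass

q = quotation()
q.text = "Crying may..."
q.source = "Guido van Rossum, comp.lang.python"
q.author_note = "illustrative extra field"

as_stored = list(vars(q))     # whatever order the interpreter keeps
as_sorted = sorted(vars(q))   # the same on every version and system
print(as_stored)
print(as_sorted)
```

Iterating over the sorted names inside a conversion function would make the sibling order reproducible, at the cost of losing any order the author intended.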

Converting an XML document to a Python object

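Generating a Python object from an XML document is just as simple as the reverse process. In most cases the xml.dom methods are all we need. In some cases, though, it is better to handle objects generated from XML documents with the same techniques we use for all "generic" Python objects. In the code below, for example, the function pyobj_printer() might be one we already use to handle arbitrary Python objects.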
try_dom3.py

    """Read in a DOM instance, convert it to a Python object
    """
    from xml.dom.utils import FileReader

    class PyObject: pass

    def pyobj_printer(py_obj, level=0):
        """Return a "deep" string description of a Python object"""
        from string import join, split
        import types
        descript = ''
        for membname in dir(py_obj):
            member = getattr(py_obj,membname)
            if type(member) == types.InstanceType:
                descript = descript + (' '*level) + '{'+membname+'}\n'
                descript = descript + pyobj_printer(member, level+3)
            elif type(member) == types.ListType:
                descript = descript + (' '*level) + '['+membname+']\n'
                for i in range(len(member)):
                    descript = descript+(' '*level)+str(i+1)+': '+ \
                               pyobj_printer(member[i],level+3)
            else:
                descript = descript + membname+'='
                descript = descript + join(split(str(member)[:50]))+'...\n'
        return descript

    def pyobj_from_dom(dom_node):
        """Converts a DOM tree to a "native" Python object"""
        py_obj = PyObject()
        py_obj.PCDATA = ''
        for node in dom_node.get_childNodes():
            if node.name == '#text':
                py_obj.PCDATA = py_obj.PCDATA + node.value
            elif hasattr(py_obj, node.name):
                getattr(py_obj, node.name).append(pyobj_from_dom(node))
            else:
                setattr(py_obj, node.name, [pyobj_from_dom(node)])
        return py_obj

    # Main test
    dom_obj = FileReader("quotes.xml").document
    py_obj = pyobj_from_dom(dom_obj)
    if __name__ == "__main__":
        print pyobj_printer(py_obj)

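The focus here should be on the function pyobj_from_dom(), and in particular on the xml.dom method .get_childNodes(), which does the real work. In pyobj_from_dom() we pull out all the text found directly between the tags and put it in the reserved attribute .PCDATA. For each nested tag we encounter, we create a new attribute whose name matches the tag and assign a list to it, so that it can hold tags that occur more than once within the parent block. Using a list also preserves the order in which the tags were encountered in the XML document.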
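For comparison, here is a minimal sketch of the same conversion written against the standard-library xml.dom.minidom module, which exposes children as the .childNodes attribute and not through a .get_childNodes() method. The SAMPLE document is an illustrative stand-in, since the contents of quotes.xml are not shown here.

```python
from xml.dom import minidom

class PyObject:
    pass

def pyobj_from_dom(dom_node):
    """Convert a minidom node into nested PyObject instances."""
    py_obj = PyObject()
    py_obj.PCDATA = ''
    for node in dom_node.childNodes:
        if node.nodeType == node.TEXT_NODE:
            py_obj.PCDATA += node.data
        elif node.nodeType == node.ELEMENT_NODE:
            # One list per tag name, so repeated tags keep document order.
            if not hasattr(py_obj, node.tagName):
                setattr(py_obj, node.tagName, [])
            getattr(py_obj, node.tagName).append(pyobj_from_dom(node))
    return py_obj

# Illustrative stand-in for quotes.xml.
SAMPLE = """<quotations>
  <quotation>First quotation text<source>Source A</source></quotation>
  <quotation>Second quotation text<source>Source B</source></quotation>
</quotations>"""

py_obj = pyobj_from_dom(minidom.parseString(SAMPLE))
second = py_obj.quotations[0].quotation[1]
print(second.PCDATA, '|', second.source[0].PCDATA)
```

Every tag becomes a list even when it occurs only once, which is why the access path needs an index at each step.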

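Besides using our old generic pyobj_printer() function (or a more elaborate and robust one), we can access the elements of py_obj with ordinary attribute notation.
Python interactive session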

    >>> from try_dom3 import *
    >>> py_obj.quotations[0].quotation[3].source[0].PCDATA
    'Guido van Rossum, '

Rearranging a DOM tree

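One great advantage of DOM is that it lets the programmer operate on an XML document in a non-linear way. Each block enclosed by matching open and close tags is simply a "node" in the DOM tree. Although the nodes are kept in a list-like structure to preserve order information, there is nothing special or immutable about that order. We can easily cut off a node and graft it onto another place in the DOM tree (even onto another level, if the DTD allows it), or add new nodes, delete existing ones, and so on.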
try_dom4.py

    """Manipulate the arrangement of nodes in a DOM object
    """
    from try_dom3 import *

    #-- Var 'doc' will hold the single <quotations> "trunk"
    doc = dom_obj.get_childNodes()[0]

    #-- Pull off all the nodes into a Python list
    # (each node is a <quotation> block, or a whitespace text node)
    nodes = []
    while 1:
        try: node = doc.removeChild(doc.get_childNodes()[0])
        except: break
        nodes.append(node)

    #-- Reverse the order of the quotations using a list method
    # (we could also perform more complicated operations on the list:
    # delete elements, add new ones, sort on complex criteria, etc.)
    nodes.reverse()

    #-- Fill 'doc' back up with our rearranged nodes
    for node in nodes:
        # if second arg is None, insert is to end of list
        doc.insertBefore(node, None)

    #-- Output the manipulated DOM
    print dom_obj.toxml()
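The same cut-and-graft sequence can be sketched with the standard-library xml.dom.minidom module. The short document and the name trunk are illustrative. The loop tests .firstChild explicitly and does not rely on an exception to detect that the node is empty.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<quotations><q>one</q><q>two</q><q>three</q></quotations>")
trunk = doc.documentElement

# Detach every child into a plain Python list.
nodes = []
while trunk.firstChild is not None:
    nodes.append(trunk.removeChild(trunk.firstChild))

# Any list operation would do here; reversing is the simplest to check.
nodes.reverse()

# Re-attach; insertBefore with None as the reference node appends at the end.
for node in nodes:
    trunk.insertBefore(node, None)

result = trunk.toxml()
print(result)
```

Once the children are in an ordinary list, sorting, filtering, or splicing them is plain Python, and the DOM methods are needed only to detach and re-attach the nodes.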