How to convert XML to data.frame when nodes have only attributes?

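I am trying to use the XML package with either the xmlToList or the xmlToDataFrame function. My input data is on the internet (first two lines), and I only need to work with a certain part of the XML (see the getNodeSet call on the third line).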

    url <- 'http://ClinicalTrials.gov/show/NCT00191100?resultsxml=true'
    xml <- xmlTreeParse(url, useInternalNodes = TRUE)
    ns <- getNodeSet(xml, '/clinical_study/clinical_results/reported_events/serious_events/category_list')

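It is a list of categories, and inside the categories are "events". Events have counts, and the counts are specific to clinical trial arms (e.g., drug vs. placebo arms).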

I only need the events, so the best listing I have is this one for cardio-respiratory arrest, produced with xmlToList:

    xl <- xmlToList(url)
    set2 <- xl$clinical_results$reported_events$serious_events$category_list
    set2[[3]]

    > set2[[3]]
    $title
    [1] "Cardiac disorders"

    $event_list
    $event_list$event
    $event_list$event$sub_title
    [1] "Cardio-respiratory arrest"

    $event_list$event$counts
             group_id            events subjects_affected  subjects_at_risk
                 "E1"               "1"               "1"             "260"

    $event_list$event$counts
             group_id            events subjects_affected  subjects_at_risk
                 "E2"               "0"               "0"             "255"

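I am not able to use xmlToDataFrame because of the error below. (The node set, hopefulyDF, holds all of its data in XML attributes, and I suspect xmlToDataFrame does not handle that.)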

    hopefulyDF <- getNodeSet(xml, '/clinical_study/clinical_results/reported_events/serious_events/category_list/category/event_list/event/counts')
    xmlToDataFrame(node = hopefulyDF)
    Error in matrix(vals, length(nfields), byrow = TRUE) :
      'data' must be of a vector type, was 'NULL'

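How can I best extract the counts data? I tried unlist, but I am probably not advanced enough in R. I would like to avoid a loop and manual xmlGetAttr calls, but in the worst case any solution is accepted. I find the XML package very dense, with two representations of XML data: lists and node sets... :-(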

Ideal output would look like this (for all events, not just entry 3):

    event                      group_ID  numerator  denominator
    Cardio-respiratory arrest  E1        1          260
    Cardio-respiratory arrest  E2        0          255

(Having a category column as well, e.g. "Cardiac disorders", would be super-ideal.)

P.S. I tried the questions "How to transform XML data into a data.frame?" and "R list to data frame", but with no luck. :-(


Best answer

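You can simplify the XML extraction by iterating over each event and extracting the counts attributes via a relative XPath. By using rbindlist from the data.table package, you can deal with the missing attributes without adding conditional code: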

    library(XML)
    library(data.table)

    url <- 'http://ClinicalTrials.gov/show/NCT00191100?resultsxml=true'
    xml <- xmlTreeParse(url, useInternalNodes = TRUE)

    # Select every <event> node in the document
    ns <- getNodeSet(xml, '//event')

    # One small data.frame per event; fill = TRUE pads missing attributes with NA
    rbindlist(lapply(ns, function(x) {
      event <- xmlValue(x)
      data.frame(event, t(xpathSApply(x, ".//counts", xmlAttrs)))
    }), fill = TRUE)

    ##                              event group_id subjects_affected events subjects_at_risk
    ##   1: Total, serious adverse events       E1                44     NA               NA
    ##   2: Total, serious adverse events       E2                17     NA               NA
    ##   3:                       Anaemia       E1                 6      6              260
    ##   4:                       Anaemia       E2                 0      0              255
    ##   5:           Febrile neutropenia       E1                 6      6              260
    ##  ---
    ## 174:                         Cough       E2                15     16              255
    ## 175:                      Pruritus       E1                14     16              260
    ## 176:                      Pruritus       E2                 9      9              255
    ## 177:                  Hypertension       E1                19     19              260
    ## 178:                  Hypertension       E2                21     21              255

You can always convert the result back to a data.frame and/or rename columns if needed.
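For the category column and the column names requested in the question, something along the following lines might work. This is only a sketch under stated assumptions: the inline XML is a made-up miniature of the reported_events structure (not the real trial record), the names doc_text, rows, result and result_df are illustrative, and mapping "numerator" to subjects_affected and "denominator" to subjects_at_risk is a guess at what the question intends. It walks up from each counts node with xmlParent instead of using the per-event XPath shown above.

```r
library(XML)
library(data.table)

# Hypothetical miniature of the reported_events structure (not the real trial data)
doc_text <- '<clinical_study>
  <category_list>
    <category>
      <title>Total</title>
      <event_list>
        <event>
          <sub_title>Total, serious adverse events</sub_title>
          <counts group_id="E1" subjects_affected="44"/>
          <counts group_id="E2" subjects_affected="17"/>
        </event>
      </event_list>
    </category>
    <category>
      <title>Cardiac disorders</title>
      <event_list>
        <event>
          <sub_title>Cardio-respiratory arrest</sub_title>
          <counts group_id="E1" events="1" subjects_affected="1" subjects_at_risk="260"/>
          <counts group_id="E2" events="0" subjects_affected="0" subjects_at_risk="255"/>
        </event>
      </event_list>
    </category>
  </category_list>
</clinical_study>'

doc <- xmlParse(doc_text, asText = TRUE)
counts_nodes <- getNodeSet(doc, "//event/counts")

# One named list per <counts> node: its attributes plus the enclosing
# event name and category title, found by walking up the tree.
rows <- lapply(counts_nodes, function(node) {
  event_node <- xmlParent(node)                          # <event>
  category_node <- xmlParent(xmlParent(event_node))      # <event_list> -> <category>
  c(list(category = xmlValue(category_node[["title"]]),
         event = xmlValue(event_node[["sub_title"]])),
    as.list(xmlAttrs(node)))
})

# fill = TRUE pads attributes that are absent on some nodes with NA
result <- rbindlist(rows, fill = TRUE)

# Attribute values arrive as character strings; convert the counts to integers
count_cols <- c("events", "subjects_affected", "subjects_at_risk")
result[, (count_cols) := lapply(.SD, as.integer), .SDcols = count_cols]

# Assumed mapping onto the column names used in the question
setnames(result,
         c("group_id", "subjects_affected", "subjects_at_risk"),
         c("group_ID", "numerator", "denominator"))

result_df <- as.data.frame(result)
print(result_df)
```

To try this against the real record, doc would be replaced by the xml object parsed from the URL above; whether every counts node there sits exactly three levels below its category is an assumption worth checking first.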

