Querying XML Files in Java with XPath and XQuery

The XML file
File processing
Creating a Java project
Adding jar files in IDEA
XQuery
XPath

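The XML file

The data can be downloaded from https://dblp.uni-trier.de/xml . After decompression, the downloaded dblp.xml.gz is more than 2 GB, and reading it directly caused the program to run out of memory.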
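
One way to avoid the out-of-memory error is to stream the file instead of loading it whole. The sketch below uses the JDK's StAX API (javax.xml.stream) to count elements by name without building a document tree. The class name StreamingCount, the method countElements, and the sample XML are illustrative assumptions, not part of the original project. The full dblp.xml relies on its DTD for character entities, so the real file would likely need additional parser configuration that this sketch does not cover.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingCount {
    // Counts start tags with the given name while streaming through the input.
    // Only the current parser event is held in memory, never the whole document.
    public static int countElements(Reader xml, String elementName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            int count = 0;
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the XML input", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the real file (assumption for illustration).
        String sample = "<dblp><article><title>A</title></article>"
                + "<article><title>B</title></article></dblp>";
        System.out.println(countElements(new StringReader(sample), "article"));
    }
}
```

For a real file, a FileReader or an InputStreamReader with the file's declared encoding could be passed in place of the StringReader.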

File processing

The workaround uses the more command in the Windows command prompt (cmd). Use cd to change to the directory that contains the XML file, then run:

more dblp.xml

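Pressing the ↓ key then brings up the data piece by piece, which avoids loading the whole file and the excessive memory use that comes with it. For this example, the first few dozen article elements were copied out, the closing dblp tag was added by hand, and the result was saved as dblp_part.xml in the root of the C drive.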
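
The same "look at only the beginning" idea can be sketched in Java: read lines lazily and stop after a fixed number, so the rest of the file is never loaded. The class name HeadOfFile and the method firstLines are hypothetical, and the commented-out file path and ISO-8859-1 encoding are assumptions about the local copy of dblp.xml. The truncated output would still need its closing dblp tag added, as described above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class HeadOfFile {
    // Returns at most maxLines lines from the start of the input.
    // Reading stops as soon as enough lines are collected.
    public static List<String> firstLines(Reader source, int maxLines) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // For the real file, a reader could be opened like this (path and encoding are assumptions):
        // Reader source = java.nio.file.Files.newBufferedReader(
        //         java.nio.file.Paths.get("dblp.xml"), java.nio.charset.StandardCharsets.ISO_8859_1);
        Reader source = new StringReader("<dblp>\n<article>\n</article>\n</dblp>\n");
        for (String line : firstLines(source, 2)) {
            System.out.println(line);
        }
    }
}
```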

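As for the language, Python was the first choice for processing the file. However, after the XPath part was finished in Python, it turned out to be difficult to run XQuery against the file in a Python environment, so the work moved to Java.

The IDE used is IDEA.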

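Creating a Java project

The Java environment is set up in IDEA. First install Java on the computer and configure the environment variables (not covered here), then install IDEA.

Once IDEA is installed, create a new Java project as follows: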

File->New->Project…

In the dialog that opens, make sure the Java environment is set, then click Next all the way through.

Note that the project name can be chosen in the last step.

Once the project is created, find src in the directory tree on the left, right-click it, and choose New->Java Class

Name the new file, then select Class.

Now you can start coding.

Adding jar files in IDEA

Some tasks require third-party jar files. Download the required jar first, then add it to the project.

To add a downloaded jar in IDEA:

File->Project Structure

Then click Modules->✚->JARs or directories

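After selecting the files, click Apply to add the jars to the project. Classes that could not be imported before can now be imported, and their methods become available.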
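
A quick way to confirm that a jar really ended up on the project's classpath is to try loading one of its classes by name. The sketch below is only an illustration: the class name ClasspathCheck and the method isAvailable are hypothetical, and the checked class names are taken from the imports used in the code later in this article.

```java
public class ClasspathCheck {
    // Returns true if the named class can be located on the classpath.
    // The class is looked up without being initialized.
    public static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] classNames = {
            "org.dom4j.io.SAXReader",
            "javax.xml.xquery.XQDataSource",
            "com.saxonica.xqj.SaxonXQDataSource"
        };
        for (String name : classNames) {
            System.out.println(name + " -> " + (isAvailable(name) ? "found" : "missing"));
        }
    }
}
```

If a class is reported as missing, the corresponding jar was most likely not added to the module.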

XQuery

This example runs XQuery using Saxon.

Find the jars yourself, or download them from https://download.csdn.net/download/xman4code/12082958

First download the Saxon package. After unpacking it, locate the three jar files it contains.

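Once the jars have been added to the project, the preparation is complete and the actual work can begin.

First, use Notepad or Notepad++ to create a file with the following content.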

for $x in doc("C://dblp_part.xml")/dblp/article
where $x/year > 1990
return $x/title

The content is an XQuery query and can be edited as needed.

Then save it as dblp.xqy in the root of the C drive.

The Java code is as follows:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.xml.xquery.XQConnection;
import javax.xml.xquery.XQDataSource;
import javax.xml.xquery.XQException;
import javax.xml.xquery.XQPreparedExpression;
import javax.xml.xquery.XQResultSequence;

import com.saxonica.xqj.SaxonXQDataSource;

public class XQueryExperiment {
    public static void main(String[] args) {
        try {
            query();
        } catch (IOException | XQException e) {
            e.printStackTrace();
        }
    }

    // Runs the query stored in the .xqy file and prints each result item
    private static void query() throws IOException, XQException {
        XQDataSource ds = new SaxonXQDataSource();
        XQConnection conn = ds.getConnection();
        // try-with-resources closes the query file even if execution fails
        try (InputStream inputStream = new FileInputStream("C://dblp.xqy")) {
            XQPreparedExpression exp = conn.prepareExpression(inputStream);
            XQResultSequence result = exp.executeQuery();

            while (result.next()) {
                System.out.println(result.getItemAsString(null));
            }
        } finally {
            // Closing the connection also closes the expressions and sequences created from it
            conn.close();
        }
    }

}

XPath

The XPath approach is similar to the one above, but the required jars are dom4j and jaxen. Find them yourself, or download them from https://download.csdn.net/download/xman4code/12082938

After downloading, the package contains dom4j-1.6.1.jar at the top level and a lib folder. Open the lib folder to find jaxen-1.1-beta-6.jar.

After adding the jars to the project, the code is as follows:

import java.io.File;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class XPathExperiment {
    public static void main(String[] args) throws Exception {
        xpathSelect();
    }

    public static void xpathSelect() throws Exception {
        SAXReader saxReader = new SAXReader();
        // Load the truncated file prepared earlier; the full dblp.xml is too large to load this way
        Document doc = saxReader.read(new File("C:\\dblp_part.xml"));
        // Ten arbitrary sample queries
        String xpath1 = "//article[@publtype = 'informal']/title";
        String xpath2 = "/dblp/article/author[1]";
        String xpath3 = "/dblp/article/author[last()]";
        String xpath4 = "/dblp/article[2]/author[last()-1]";
        String xpath5 = "/dblp/article[title = 'Meltdown']/author";
        String xpath6 = "/dblp/article[3]/title";
        String xpath7 = "/dblp/*[author = 'Daniel Gruss']/title";
        String xpath8 = "/dblp/article[2]/author";
        String xpath9 = "/dblp/article[year = '1991']/title";
        String xpath10 = "/dblp/article[journal = 'GTE Laboratories Incorporated']/title";

        // Query for a single result: only the first matching node is returned
        Element authorElem = (Element) doc.selectSingleNode(xpath5);
        if (authorElem != null) {
            System.out.println(authorElem.getName() + ":" + authorElem.getText());
        } else {
            System.out.println("No match for: " + xpath5);
        }

        // Query for multiple results: every matching node is returned
        List<?> titleList = doc.selectNodes(xpath10);
        for (Object node : titleList) {
            Element titleElem = (Element) node;
            System.out.println(titleElem.getName() + ":" + titleElem.getData());
        }
    }
}
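
For comparison, the same kind of query can also be run with the XPath support built into the JDK (javax.xml.xpath), which needs no extra jars. This is a minimal sketch, not the approach used above: the class name JdkXPathSketch, the helper select, and the inline sample XML (with placeholder author names) are assumptions made for illustration. It also builds a full DOM tree, so it is only suitable for a small file such as dblp_part.xml.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class JdkXPathSketch {
    // Evaluates an XPath expression against the XML text and
    // returns the text content of every matching node, in document order.
    public static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample in the same shape as the dblp data (not real records).
        String sample = "<dblp>"
                + "<article><author>Author A</author><author>Author B</author>"
                + "<title>Meltdown</title></article>"
                + "<article><author>Author C</author>"
                + "<title>Another Title</title><year>1991</year></article>"
                + "</dblp>";
        // All authors of the article titled 'Meltdown'
        System.out.println(select(sample, "/dblp/article[title = 'Meltdown']/author"));
        // The last author of every article
        System.out.println(select(sample, "/dblp/article/author[last()]"));
        // Titles of articles published in 1991
        System.out.println(select(sample, "/dblp/article[year = '1991']/title"));
    }
}
```

Position predicates such as author[1] and author[last()] are evaluated per parent element, so a path like /dblp/article/author[last()] yields one author for each article rather than a single node for the whole document.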
