Exporting Large Data to a JSON File

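Occasionally you need to export a large amount of data to a JSON file. It may be an "export all data to JSON" feature, or the GDPR "right to data portability", which in practice requires the same thing.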

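As with any large dataset, you cannot load it all into memory and then write it to a file. The export takes a while and reads many entries from the database, so you need to be careful that it neither overloads the whole system nor runs out of memory.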

Luckily, this is fairly straightforward with the help of Jackson's <code>SequenceWriter</code> and, optionally, piped streams. Here is how it looks:


<pre><code>import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.databind.SequenceWriter;
import com.google.common.base.Stopwatch;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.zip.GZIPOutputStream;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.AsyncResult;
import org.springframework.util.concurrent.ListenableFuture;

private ObjectMapper jsonMapper = new ObjectMapper();
private ExecutorService executorService = Executors.newFixedThreadPool(5);

@Async
public ListenableFuture<Boolean> export(UUID customerId) {
    try (PipedInputStream in = new PipedInputStream();
            PipedOutputStream pipedOut = new PipedOutputStream(in);
            GZIPOutputStream out = new GZIPOutputStream(pipedOut)) {
        Stopwatch stopwatch = Stopwatch.createStarted();
        ObjectWriter writer = jsonMapper.writer().withDefaultPrettyPrinter();
        Future<?> storageFuture;

        try (SequenceWriter sequenceWriter = writer.writeValues(out)) {
            sequenceWriter.init(true);
            // read the other end of the pipe on a separate thread
            storageFuture = executorService.submit(() ->
                    storageProvider.storeFile(getFilePath(customerId), in));

            int batchCounter = 0;
            while (true) {
                List<Record> batch = readDatabaseBatch(batchCounter++);
                for (Record record : batch) {
                    sequenceWriter.write(record);
                }
                if (batch.isEmpty()) {
                    // if there are no more batches, stop.
                    break;
                }
            }
        } // closing the writer ends the array and the GZIP stream, so the reader sees end-of-stream

        // wait for storing to complete (only after the writer has been closed)
        storageFuture.get();
        // send the customer a notification and a download link
        notifyCustomer(customerId);
        logger.info("Exporting took {} seconds", stopwatch.stop().elapsed(TimeUnit.SECONDS));
        return AsyncResult.forValue(true);
    } catch (Exception ex) {
        logger.error("Failed to export data", ex);
        return AsyncResult.forValue(false);
    }
}</code></pre>

The code does a few things:

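It uses a SequenceWriter to write records continuously. The writer is initialized with an OutputStream, to which everything is written. This could be a simple FileOutputStream, or a piped stream as discussed below. Note that the naming is a bit misleading: <code>writeValues(out)</code> sounds like you are telling the writer to write something now; instead, it configures the writer to use that stream later.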

The <code>SequenceWriter</code> is initialized with <code>true</code>, which means "wrap in an array". You are writing many records of the same kind, so they should form an array in the final JSON.
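To make the array wrapping concrete, here is a minimal sketch that writes two tiny records to an in-memory <code>StringWriter</code> instead of a file. The class name <code>ArrayWrapDemo</code>, the helper method and the sample records are assumptions made for illustration; only the Jackson calls come from the article.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SequenceWriter;

import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Map;

public class ArrayWrapDemo {

    // Writes two sample records one at a time and returns the resulting JSON text.
    static String writeTwoRecords() {
        ObjectMapper mapper = new ObjectMapper();
        StringWriter target = new StringWriter();

        // writeValues only binds the writer to its target; no record is written yet.
        try (SequenceWriter seq = mapper.writer().writeValues(target)) {
            seq.init(true);              // open the enclosing JSON array
            seq.write(Map.of("id", 1));  // each record is serialized as it arrives
            seq.write(Map.of("id", 2));
        } catch (IOException e) {        // close() has already written the closing bracket
            throw new UncheckedIOException(e);
        }
        return target.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeTwoRecords()); // prints [{"id":1},{"id":2}]
    }
}
```

Jackson's <code>ObjectWriter</code> also offers <code>writeValuesAsArray</code>, which opens the array in the same call and may read more clearly than a separate <code>init(true)</code>.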

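It uses <code>PipedOutputStream</code> and <code>PipedInputStream</code> to link the <code>SequenceWriter</code> to an <code>InputStream</code> that is then passed to the storage service. If we were working with files explicitly, this would not be needed; passing a <code>FileOutputStream</code> would do. However, you may want to store the file differently, e.g. in Amazon S3, where the putObject call requires an InputStream from which it reads the data to store in S3. So, in effect, you are writing to an OutputStream whose contents appear directly on an InputStream, and when that InputStream is read, everything ends up written to another OutputStream.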
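The piping can be tried in isolation with only the JDK. The sketch below is not the article's storage code: <code>PipeDemo</code>, <code>store</code> and <code>pipeThrough</code> are hypothetical names, and <code>store</code> merely collects the bytes where a real service would upload them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PipeDemo {

    // Hypothetical stand-in for a storage call that only accepts an InputStream.
    static String store(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the chunks into the pipe while a second thread consumes them.
    static String pipeThrough(String... chunks) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try (PipedInputStream in = new PipedInputStream();
             PipedOutputStream out = new PipedOutputStream(in)) {
            // The reader must run on another thread; the pipe's buffer is small.
            Future<String> stored = pool.submit(() -> store(in));
            for (String chunk : chunks) {
                out.write(chunk.getBytes(StandardCharsets.UTF_8));
            }
            out.close();         // signals end-of-stream to the reader
            return stored.get(); // wait only after the write end is closed
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(pipeThrough("[1,", "2]")); // prints [1,2]
    }
}
```

The order matters: waiting on the future before the write end is closed would leave the reader waiting for an end-of-stream that never arrives.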

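Storing the file is invoked in a separate thread, so that writing to the file does not block the current thread, whose purpose is to read from the database. With piped streams the second thread is in fact required, because using both ends of a pipe from a single thread may deadlock. Again, none of this would be needed if a simple FileOutputStream were used.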

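The whole method is marked as <code>@Async</code> (Spring) so that it does not block the caller: it gets invoked and finishes when ready (using an internal Spring executor service with a limited thread pool).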

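The database batch reading code is not shown here, as it varies depending on the database. The point is that you should fetch your data in batches rather than running <code>SELECT * FROM X</code>.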
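Since the article leaves <code>readDatabaseBatch</code> out, the following is only a sketch of the contract the export loop relies on: each call returns one page, and an empty list means there is nothing left. An in-memory list stands in for the table, and <code>BATCH_SIZE</code>, <code>TABLE</code> and <code>batchSizes</code> are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchReadDemo {

    static final int BATCH_SIZE = 3; // hypothetical page size

    // Hypothetical stand-in for a table; real code would query the database.
    static final List<Integer> TABLE = List.of(1, 2, 3, 4, 5, 6, 7);

    // Returns one page of rows, or an empty list once the data is exhausted.
    // In SQL this might correspond to an ordered query with LIMIT and OFFSET.
    static List<Integer> readDatabaseBatch(int batchIndex) {
        int from = Math.min(batchIndex * BATCH_SIZE, TABLE.size());
        int to = Math.min(from + BATCH_SIZE, TABLE.size());
        return TABLE.subList(from, to);
    }

    // Walks all batches the way the export loop does and records each batch size.
    static List<Integer> batchSizes() {
        List<Integer> sizes = new ArrayList<>();
        int batchCounter = 0;
        while (true) {
            List<Integer> batch = readDatabaseBatch(batchCounter++);
            if (batch.isEmpty()) {
                break; // no more batches
            }
            sizes.add(batch.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes()); // prints [3, 3, 1]
    }
}
```

Only one batch is held in memory at a time, which is what keeps the export's footprint bounded regardless of table size.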

The OutputStream is wrapped in a GZIPOutputStream, as text files with repetitive elements, such as JSON, benefit significantly from compression.
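A quick way to see the effect is to compress some repetitive JSON in memory and compare sizes. This is an illustrative sketch: the <code>GzipDemo</code> class and the sample record layout are made up for the demonstration, and the actual ratio depends on your data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {

    // Builds a JSON array of records that repeat the same field names and values.
    static byte[] sampleJson(int records) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < records; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"id\":").append(i)
              .append(",\"status\":\"ACTIVE\",\"country\":\"BG\"}");
        }
        return sb.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compresses the given bytes with GZIP and returns the compressed form.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // complete: the GZIP stream is closed by now
    }

    public static void main(String[] args) {
        byte[] raw = sampleJson(1000);
        byte[] compressed = gzip(raw);
        System.out.println("raw bytes: " + raw.length);
        System.out.println("gzip bytes: " + compressed.length);
    }
}
```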

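The main work is done by Jackson's SequenceWriter, and the (somewhat obvious) point to take home is: don't assume your data will fit in memory. It almost never does, so do everything in batches and incremental writes.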
