天天看点

ArcEngine导出数据量较大的属性表(ICursor接口的使用)

之前的博文“ArcEngine查看数据量较大属性表的解决方案”中,用了ICursor接口来实现数据量较大属性表的分页查询。既然实现了分页查询,那么就有可能会遇到数据量较大的属性表导出至本地的需求(虽然我觉得这样做意义不大,但是毕竟也算是客户的需求)。在这种情况下,一次性导出至本地显然是不合理的,势必会造成内存溢出的异常,所以考虑用ICursor接口。在ArcObject的帮助文档中对ICursor的说明中我看到了这样一句话:

Cursors are forward only; they do not support backing up and retrieving rows that have already been retrieved or making multiple passes over data. If an application needs to make multiple passes over the data, the application needs to reexecute the query that returned the cursor.

大概意思就是游标(Cursor)是单向、不可逆的,不能被检索的,如果要检索或者查询那么就要从头来过,所以对于数据量较大的属性表如果通过行号来检索刚开始可能造成的时间冗余不多,但是查询越往后的分页时间花费得越多,远远超出IO读写所需要得时间。经过测试,用之前没有优化得代码来导出约3万条数据大约需要十分钟,优化之后不到一分钟。下面是核心代码(excel支持的最大行数有限(65535?),所以对于数据量较大的表,导出为txt或csv是更好的选择):

/// <summary>
        /// 分块导出属性表(用于属性表属性较大的情况)
        /// </summary>
        /// <param name="exportPath"></param>
        /// <param name="dataTable"></param>
        /// <param name="shapeType"></param>
        /// <param name="pCursor"></param>
        private bool ExportTableByBlock2(string exportPath, DataTable dataTable, string shapeType, int totalRowCount, ref ICursor pCursor)
        {
            try
            {
                int n = 0;
                IRow pRow = null;
                DataRow pDataRow = null;
                ExportDataTableToCSV(dataTable, exportPath);
                while ((pRow = pCursor.NextRow()) != null)
                {
                    ++n;
                    pDataRow = dataTable.NewRow();
                    //500条追加一次
                    if (n % 500 == 0 || n == totalRowCount)
                    {
                        for (int i = 0; i < pRow.Fields.FieldCount; i++)
                        {
                            //如果字段类型为esriFieldTypeGeometry,则根据图层类型设置字段值
                            if (pRow.Fields.get_Field(i).Type == esriFieldType.esriFieldTypeGeometry)
                            {
                                pDataRow["Shape"] = shapeType;
                            }
                            //当图层类型为Anotation时,要素类中会有esriFieldTypeBlob类型的数据,
                            //其存储的是标注内容,如此情况需将对应的字段值设置为Element
                            else if (pRow.Fields.get_Field(i).Type == esriFieldType.esriFieldTypeBlob)
                            {
                                pDataRow[i] = "Element";
                            }
                            else
                            {
                                pDataRow[pRow.Fields.Field[i].Name] = pRow.get_Value(i);
                            }
                        }
                        dataTable.Rows.Add(pDataRow);
                        pDataRow = null;
                        using (StreamWriter sw = new StreamWriter(new System.IO.FileStream(exportPath, FileMode.Append, FileAccess.Write), Encoding.UTF8))
                        {
                            AppendDataTableToCSV(dataTable, exportPath, sw, n - 500);
                            sw.Close();
                        }
                        dataTable.Rows.Clear();
                        continue;
                    }

                    for (int i = 0; i < pRow.Fields.FieldCount; i++)
                    {
                        //如果字段类型为esriFieldTypeGeometry,则根据图层类型设置字段值
                        if (pRow.Fields.get_Field(i).Type == esriFieldType.esriFieldTypeGeometry)
                        {
                            pDataRow["Shape"] = shapeType;
                        }
                        //当图层类型为Anotation时,要素类中会有esriFieldTypeBlob类型的数据,
                        //其存储的是标注内容,如此情况需将对应的字段值设置为Element
                        else if (pRow.Fields.get_Field(i).Type == esriFieldType.esriFieldTypeBlob)
                        {
                            pDataRow[i] = "Element";
                        }
                        else
                        {
                            pDataRow[pRow.Fields.Field[i].Name] = pRow.get_Value(i);
                        }
                    }
                    dataTable.Rows.Add(pDataRow);
                    pDataRow = null;
                }
                return true;
            }
            catch (Exception ex)
            {
                return false;
                throw ex;
            }
        }        

        /// <summary>
        /// DataTable对象导出成csv
        /// </summary>
        /// <param name="dataTable"></param>
        /// <param name="exportPath"></param>
        /// <param name="isWriteColumnName">是否写入列名,默认写入</param>
        public void ExportDataTableToCSV(DataTable dataTable, string exportPath, bool isWriteColumnName = true)
        {
            using (StreamWriter sw = new StreamWriter(new FileStream(exportPath, FileMode.Create, FileAccess.Write), Encoding.UTF8))
            {
                StringBuilder sb = new StringBuilder();
                if (isWriteColumnName)
                {
                    for (int i = 0; i < dataTable.Columns.Count; i++)
                    {
                        sb.Append(dataTable.Columns[i].ColumnName + ",");
                    }
                    sb.Remove(sb.Length - 1, 1);
                    sw.WriteLine(sb);
                }

                sb = new StringBuilder();
                for (int i = 0; i < dataTable.Rows.Count; i++)
                {
                    sb = new StringBuilder();
                    for (int j = 0; j < dataTable.Columns.Count; j++)
                    {
                        sb.Append(dataTable.Rows[i][j].ToString() + ",");
                    }
                    sb.Remove(sb.Length - 1, 1);
                    sw.WriteLine(sb);
                }
                sw.Close();
            }
        }

        /// <summary>
        /// 通过追加的方式将DataTable导出成xlsx文件(适用于导出数据量较大的情况)
        /// </summary>
        /// <param name="dataTable">导出的DataTable表格(一般是大表中的子表)</param>
        /// <param name="exportPath">导出路径</param>
        /// <param name="sw">写入流对象,需要提前初始化</param>
        /// <param name="rowIndex">行序号(DataTable在大表中的行初始序号)</param>
        /// <param name="isWriteColumnName">是否写入列名,第一个子表为true,后面均为false</param>
        public void AppendDataTableToCSV(DataTable dataTable, string exportPath, StreamWriter sw, int rowIndex, bool isWriteColumnName = false)
        {
            try
            {
                StringBuilder sb = new StringBuilder();
                if (isWriteColumnName)
                {
                    for (int i = 0; i < dataTable.Columns.Count; i++)
                    {
                        sb.Append(dataTable.Columns[i].ColumnName + ",");
                    }
                    sb.Remove(sb.Length - 1, 1);
                    sw.WriteLine(sb);
                }

                sb = new StringBuilder();
                for (int i = 0; i < dataTable.Rows.Count; i++)
                {
                    sb = new StringBuilder();
                    for (int j = 0; j < dataTable.Columns.Count; j++)
                    {
                        sb.Append(dataTable.Rows[i][j].ToString().Replace("\n", "").Replace("\r", "") + ",");
                    }
                    sb.Remove(sb.Length - 1, 1);
                    sw.WriteLine(sb);
                }
            }
            catch (Exception ex)
            {
                throw(ex);
            }
        }
           

大致的思路就是先导出一个空的包含表头的表,然后用一个引用ICursor变量来记录已经遍历的游标,一个数值型变量来记录已导出的条数,每500条追加导出一次。这样既解决了效率的问题,又不容易内存溢出。