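Reading Files in R

1. Reading txt files in R

To read a txt file in R, call read.table() directly. It is part of base R, so no additional package needs to be loaded.

read.table("/home/slave/test.txt", header = TRUE, na.strings = c("NA"))

Here na.strings = c("NA") means that missing values in the file are written as NA. When reading a text file, the default separator (sep = "") is whitespace: one or more spaces, tabs, newlines, or carriage returns. The full signature, with defaults, is: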

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
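As a minimal sketch of the arguments discussed above, the following writes a small whitespace-delimited file to a temporary path and reads it back. The file contents, the column names, and the use of "-" as a second missing-value marker are assumptions made up for illustration; they are not from the original example file.

```r
# Write a small whitespace-delimited file to a temporary location.
# The data below is hypothetical and exists only for this sketch.
txt_path <- tempfile(fileext = ".txt")
writeLines(c(
  "id   name   score",
  "1    alice  90.5",
  "2    bob    NA",
  "3    carol  -"
), txt_path)

# header = TRUE: the first line holds the column names.
# sep is left at its default (""), so any run of spaces or tabs splits fields.
# na.strings: treat both "NA" and "-" as missing values.
scores <- read.table(txt_path, header = TRUE, na.strings = c("NA", "-"),
                     stringsAsFactors = FALSE)

str(scores)             # column types inferred by read.table
colSums(is.na(scores))  # number of missing values per column
```

Because both markers are mapped to NA before type conversion, the score column should still come back as numeric rather than character.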

2. Reading csv files in R

Reading a csv file is much like reading a txt file. Use read.csv(), which takes mostly the same arguments as read.table().

read.csv("/home/slave/test.csv", header = TRUE, na.strings = c("NA"))
read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)
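The defaults in the signature above differ from read.table() in a few ways that are easy to overlook: the separator is a comma, only double quotes delimit strings, and comment.char is empty, so a "#" is ordinary data. The sketch below tries to show this with a made-up temporary file; the city data and column names are hypothetical.

```r
# Write a small csv file to a temporary location (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "city,population,note",
  "Taipei,2600000,\"capital, north\"",
  "Tainan,NA,#old capital"
), csv_path)

# header = TRUE and sep = "," are already the read.csv() defaults;
# header is spelled out here only to mirror the call shown earlier.
cities <- read.csv(csv_path, header = TRUE, na.strings = "NA",
                   stringsAsFactors = FALSE)

# The quoted field keeps its embedded comma, and the field starting
# with "#" is kept as text because comment.char defaults to "".
cities$note
is.na(cities$population)
```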

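3. Reading xls and xlsx files in R

There are many ways to read xls and xlsx files, but quite a few of them are awkward to use. The xls reader in the RODBC package, for example, is not very convenient and sometimes runs into all sorts of problems. After some trial and error, I found two packages that read xls files relatively well. Each is described below.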

gdata

install.packages("gdata")
library(gdata)
read.xls("/home/slave/test.xls",sheet=1,na.strings=c("NA","#DIV/0!"))

Here sheet=1 reads the contents of the first sheet, and na.strings=c("NA","#DIV/0!") treats both "NA" and "#DIV/0!" as missing values. The full signatures of read.xls() and its helper functions are:

read.xls(xls, sheet=1, verbose=FALSE, pattern, na.strings=c("NA","#DIV/0!"),
         ..., method=c("csv","tsv","tab"), perl="perl")

xls2csv(xls, sheet=1, verbose=FALSE, blank.lines.skip=TRUE, ..., perl="perl")
xls2tab(xls, sheet=1, verbose=FALSE, blank.lines.skip=TRUE, ..., perl="perl")
xls2tsv(xls, sheet=1, verbose=FALSE, blank.lines.skip=TRUE, ..., perl="perl")
xls2sep(xls, sheet=1, verbose=FALSE, blank.lines.skip=TRUE, ...,
        method=c("csv","tsv","tab"), perl="perl")

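The gdata package offers a lot of functionality, but it has many dependencies (read.xls(), for instance, needs a working Perl installation, hence the perl="perl" argument), which can lead to hard-to-predict problems. The next package has far fewer dependencies.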

readxl

install.packages("readxl")
library(readxl)
read_excel("/home/slave/test.xls",sheet=1,na="NA")
read_excel(path, sheet = 1, col_names = TRUE, col_types = NULL, na = "", skip = 0)

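The readxl package makes it easy to get data out of Excel and into R. Compared with many existing packages (e.g. gdata, xlsx, xlsReadWrite), readxl has no external dependencies, so it is easy to install and use on all operating systems. It is designed to work with tabular data. readxl supports both the legacy .xls format and the modern XML-based .xlsx format.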

Usage

library(readxl)

readxl includes several example files, which we use throughout the documentation. Use the helper readxl_example() with no arguments to list them or call it with an example filename to get the path.

readxl_example()
#>  [1] "clippy.xls"    "clippy.xlsx"   "datasets.xls"  "datasets.xlsx"
#>  [5] "deaths.xls"    "deaths.xlsx"   "geometry.xls"  "geometry.xlsx"
#>  [9] "type-me.xls"   "type-me.xlsx"
readxl_example("clippy.xls")
#> [1] "/Users/jenny/resources/R/library/readxl/extdata/clippy.xls"

read_excel() reads both xls and xlsx files and detects the format from the extension.

xlsx_example <- readxl_example("datasets.xlsx")
read_excel(xlsx_example)
#> # A tibble: 150 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1         5.10        3.50         1.40       0.200 setosa 
#> 2         4.90        3.00         1.40       0.200 setosa 
#> 3         4.70        3.20         1.30       0.200 setosa 
#> # ... with 147 more rows

xls_example <- readxl_example("datasets.xls")
read_excel(xls_example)
#> # A tibble: 150 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1         5.10        3.50         1.40       0.200 setosa 
#> 2         4.90        3.00         1.40       0.200 setosa 
#> 3         4.70        3.20         1.30       0.200 setosa 
#> # ... with 147 more rows

List the sheet names with excel_sheets().

excel_sheets(xlsx_example)
#> [1] "iris"     "mtcars"   "chickwts" "quakes"

Specify a worksheet by name or number.

read_excel(xlsx_example, sheet = "chickwts")
#> # A tibble: 71 x 2
#>   weight feed     
#>    <dbl> <chr>    
#> 1   179. horsebean
#> 2   160. horsebean
#> 3   136. horsebean
#> # ... with 68 more rows
read_excel(xls_example, sheet = 4)
#> # A tibble: 1,000 x 5
#>     lat  long depth   mag stations
#>   <dbl> <dbl> <dbl> <dbl>    <dbl>
#> 1 -20.4  182.  562.  4.80      41.
#> 2 -20.6  181.  650.  4.20      15.
#> 3 -26.0  184.   42.  5.40      43.
#> # ... with 997 more rows

There are various ways to control which cells are read. You can even specify the sheet here, if providing an Excel-style cell range.

read_excel(xlsx_example, n_max = 3)
#> # A tibble: 3 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1         5.10        3.50         1.40       0.200 setosa 
#> 2         4.90        3.00         1.40       0.200 setosa 
#> 3         4.70        3.20         1.30       0.200 setosa
read_excel(xlsx_example, range = "C1:E4")
#> # A tibble: 3 x 3
#>   Petal.Length Petal.Width Species
#>          <dbl>       <dbl> <chr>  
#> 1         1.40       0.200 setosa 
#> 2         1.40       0.200 setosa 
#> 3         1.30       0.200 setosa
read_excel(xlsx_example, range = cell_rows(1:4))
#> # A tibble: 3 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1         5.10        3.50         1.40       0.200 setosa 
#> 2         4.90        3.00         1.40       0.200 setosa 
#> 3         4.70        3.20         1.30       0.200 setosa
read_excel(xlsx_example, range = cell_cols("B:D"))
#> # A tibble: 150 x 3
#>   Sepal.Width Petal.Length Petal.Width
#>         <dbl>        <dbl>       <dbl>
#> 1        3.50         1.40       0.200
#> 2        3.00         1.40       0.200
#> 3        3.20         1.30       0.200
#> # ... with 147 more rows
read_excel(xlsx_example, range = "mtcars!B1:D5")
#> # A tibble: 4 x 3
#>     cyl  disp    hp
#>   <dbl> <dbl> <dbl>
#> 1    6.  160.  110.
#> 2    6.  160.  110.
#> 3    4.  108.   93.
#> # ... with 1 more row

If NAs are represented by something other than blank cells, set the na argument.

read_excel(xlsx_example, na = "setosa")
#> # A tibble: 150 x 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#> 1         5.10        3.50         1.40       0.200 <NA>   
#> 2         4.90        3.00         1.40       0.200 <NA>   
#> 3         4.70        3.20         1.30       0.200 <NA>   
#> # ... with 147 more rows

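Features

No external dependency on, e.g., Java or Perl.

Re-encodes non-ASCII characters to UTF-8.

Loads datetimes into POSIXct columns. Both Windows (1900) and Mac (1904) date specifications are processed correctly.

Discovers the minimal data rectangle and returns that by default. The user can exert more control with range, skip, and n_max.

Column names and types are determined from the data in the sheet by default. The user can also supply them via col_names and col_types.

Returns a tibble, i.e. a data frame with an additional tbl_df class. Among other things, this provides nicer printing.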
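The usage examples cover range and n_max but not col_names and col_types, so here is a rough sketch of supplying both. It assumes readxl is installed and reuses the bundled datasets.xlsx workbook, whose first sheet holds the iris data. The replacement column names are hypothetical and chosen only for this illustration.

```r
library(readxl)

# Example workbook that ships with readxl; its first sheet is the iris data.
xlsx_example <- readxl_example("datasets.xlsx")

# Hypothetical replacement column names, made up for this sketch.
new_names <- c("sepal_len", "sepal_wid", "petal_len", "petal_wid", "species")

iris_typed <- read_excel(
  xlsx_example,
  skip = 1,               # skip the sheet's own header row
  col_names = new_names,  # supply column names instead of reading them
  col_types = c("numeric", "numeric", "numeric", "numeric", "text"),
  n_max = 5               # read only the first five data rows
)

names(iris_typed)
sapply(iris_typed, class)
```

Passing skip = 1 together with an explicit col_names vector is one way to replace a header row. Without the skip, the original header would be read as the first data row.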
