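Writing Efficient Web Crawlers in Go

Below are some common techniques for writing web crawlers in Go:

Making HTTP requests with the net/http package

Go's standard library provides the net/http package for making HTTP requests. You can send GET or POST requests with http.Get() or http.Post(), and read the response body with ioutil.ReadAll() (io.ReadAll() since Go 1.16, where the io/ioutil package was deprecated).

Example code: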

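package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Send a GET request to the target page.
    resp, err := http.Get("https://example.com")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // Stop on non-200 responses instead of printing an error page.
    if resp.StatusCode != http.StatusOK {
        panic("unexpected status: " + resp.Status)
    }

    // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }

    fmt.Println(string(body))
}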
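The text also mentions http.Post(), which the example above does not exercise. The following is a minimal sketch of a POST request, not a definitive implementation. It assumes a local test server (the hypothetical helper startEchoServer, built on net/http/httptest) in place of a real site, and the helper name postText and the 10-second timeout are illustrative choices that do not come from the original article.

```go
package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
    "strings"
    "time"
)

// startEchoServer starts a local server that echoes the request body.
// It is a hypothetical stand-in for a real site, used only in this sketch.
func startEchoServer() (string, func()) {
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        data, _ := io.ReadAll(r.Body)
        fmt.Fprintf(w, "received: %s", data)
    }))
    return srv.URL, srv.Close
}

// postText sends body as a form-encoded POST request and returns the
// response status code and response body.
func postText(target, body string) (int, string, error) {
    // An explicit timeout keeps a slow server from blocking the crawler forever.
    client := &http.Client{Timeout: 10 * time.Second}
    resp, err := client.Post(target, "application/x-www-form-urlencoded", strings.NewReader(body))
    if err != nil {
        return 0, "", err
    }
    defer resp.Body.Close()

    data, err := io.ReadAll(resp.Body)
    if err != nil {
        return resp.StatusCode, "", err
    }
    return resp.StatusCode, string(data), nil
}

func main() {
    target, stop := startEchoServer()
    defer stop()

    status, body, err := postText(target, "q=golang")
    if err != nil {
        panic(err)
    }
    fmt.Println(status, body)
}
```

Using an http.Client with a Timeout rather than the package-level http.Post() is a design choice for this sketch: the default client has no timeout, which is rarely what a crawler wants.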

Parsing HTML documents with the goquery package

goquery is a popular Go package for parsing HTML documents. It provides a jQuery-like API that is easy to learn and use.

Example code:

package main

import (
    "fmt"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    resp, err := http.Get("https://example.com")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // Build a queryable document from the response body.
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        panic(err)
    }

    // Visit every <a> element and print its href attribute.
    doc.Find("a").Each(func(i int, s *goquery.Selection) {
        // Skip <a> elements that have no href attribute.
        if href, ok := s.Attr("href"); ok {
            fmt.Println(href)
        }
    })
}

In the example above, we use goquery to parse the page at example.com and print the href attribute of every link.
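The href values printed above are often relative (for example /about or page2.html), so a crawler usually has to turn them into absolute URLs before following them. The following is a minimal sketch using only the standard net/url package. The function name resolveLinks and the choice to skip non-HTTP(S) links and drop fragments are assumptions made for illustration, not part of the original example.

```go
package main

import (
    "fmt"
    "net/url"
)

// resolveLinks converts hrefs found on pageURL into absolute URLs.
// Hrefs that fail to parse and links with non-HTTP(S) schemes are skipped.
func resolveLinks(pageURL string, hrefs []string) ([]string, error) {
    base, err := url.Parse(pageURL)
    if err != nil {
        return nil, err
    }

    var out []string
    for _, h := range hrefs {
        ref, err := url.Parse(h)
        if err != nil {
            continue
        }
        abs := base.ResolveReference(ref)
        if abs.Scheme != "http" && abs.Scheme != "https" {
            continue
        }
        // Drop the fragment so "#top" variants map to the same page.
        abs.Fragment = ""
        out = append(out, abs.String())
    }
    return out, nil
}

func main() {
    links, err := resolveLinks("https://example.com/docs/index.html", []string{
        "/about",
        "page2.html",
        "mailto:someone@example.com",
        "https://other.example/x#top",
    })
    if err != nil {
        panic(err)
    }
    for _, l := range links {
        fmt.Println(l)
    }
}
```

In a real crawler, the hrefs slice would be filled inside the Each callback from the goquery example instead of being hard-coded.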

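Storing data

After crawling, you may need to store the data in a local file or a database. In Go you can write data to a local file with ioutil.WriteFile() (os.WriteFile() since Go 1.16) and store it in a database with the database/sql package.

Example code: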

package main

import (
    "database/sql"
    "fmt"
    "io"
    "net/http"
    "os"

    // Registers the MySQL driver with database/sql.
    _ "github.com/go-sql-driver/mysql"
)

func main() {
    resp, err := http.Get("https://example.com")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }

    // Store the data in a local file.
    // os.WriteFile replaces the deprecated ioutil.WriteFile (Go 1.16+).
    err = os.WriteFile("data.txt", body, 0644)
    if err != nil {
        panic(err)
    }

    // Store the data in a database.
    // Replace user, password, and database with your own values.
    db, err := sql.Open("mysql", "user:password@tcp(localhost:3306)/database")
    if err != nil {
        panic(err)
    }
    defer db.Close()

    // The ? placeholder lets the driver pass body as a parameter.
    stmt, err := db.Prepare("INSERT INTO data (body) VALUES (?)")
    if err != nil {
        panic(err)
    }
    defer stmt.Close()

    _, err = stmt.Exec(body)
    if err != nil {
        panic(err)
    }

    fmt.Println("Data stored successfully")
}

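In the example above, we write the data to a local file named data.txt (using ioutil.WriteFile(), or its replacement os.WriteFile() on Go 1.16 and later) and use the database/sql package to insert it into a MySQL database named database. Note that you need to replace user, password, and database in the MySQL connection string with your own values.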
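The example above stores the raw response body. When a crawler has already extracted fields such as a title and links, saving them as structured data can be more convenient. The following is a rough sketch that writes one record as JSON to a local file and reads it back. The Page type, its fields, and the helper names savePage and loadPage are hypothetical and are not part of the original example.

```go
package main

import (
    "encoding/json"
    "fmt"
    "os"
)

// Page is a hypothetical record holding data extracted from one page.
type Page struct {
    URL   string   `json:"url"`
    Title string   `json:"title"`
    Links []string `json:"links"`
}

// savePage writes p as indented JSON to a new file in dir and returns the
// file path. If dir is empty, the system temporary directory is used.
func savePage(dir string, p Page) (string, error) {
    data, err := json.MarshalIndent(p, "", "  ")
    if err != nil {
        return "", err
    }
    f, err := os.CreateTemp(dir, "page-*.json")
    if err != nil {
        return "", err
    }
    defer f.Close()
    if _, err := f.Write(data); err != nil {
        return "", err
    }
    return f.Name(), nil
}

// loadPage reads a record previously written by savePage.
func loadPage(path string) (Page, error) {
    var p Page
    data, err := os.ReadFile(path)
    if err != nil {
        return p, err
    }
    err = json.Unmarshal(data, &p)
    return p, err
}

func main() {
    path, err := savePage("", Page{
        URL:   "https://example.com",
        Title: "Example Domain",
        Links: []string{"https://example.com/about"},
    })
    if err != nil {
        panic(err)
    }
    defer os.Remove(path) // clean up the demo file

    p, err := loadPage(path)
    if err != nil {
        panic(err)
    }
    fmt.Println(p.Title, len(p.Links))
}
```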

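Concurrent crawling

When crawling a website, you may need to send several HTTP requests at the same time. In Go, you can implement concurrent crawling with goroutines and channels: goroutines let you start multiple tasks at once, and channels let those tasks communicate with each other.

Example code: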

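package main

import (
    "fmt"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    urls := []string{
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3",
    }

    ch := make(chan string)

    for _, url := range urls {
        // Start one goroutine per URL. Each goroutine sends exactly one
        // message, so the receive loop below never blocks, and a single
        // failed page does not crash the whole program.
        go func(url string) {
            resp, err := http.Get(url)
            if err != nil {
                ch <- fmt.Sprintf("%s: request failed: %v", url, err)
                return
            }
            defer resp.Body.Close()

            doc, err := goquery.NewDocumentFromReader(resp.Body)
            if err != nil {
                ch <- fmt.Sprintf("%s: parse failed: %v", url, err)
                return
            }

            ch <- doc.Find("title").Text()
        }(url)
    }

    // Receive one message per URL; this waits for all goroutines to finish.
    for i := 0; i < len(urls); i++ {
        fmt.Println(<-ch)
    }
}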

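In the example above, we use goroutines to crawl several pages at the same time and a channel to send each page's title back to the main function. The final loop receives len(urls) values from the channel, which waits for every task to finish, and prints each page's title.

Summary:

The above are some common techniques for writing crawlers in Go. They are only the tip of the iceberg; many other techniques and tools are available. Before writing a crawler, investigate the target site's anti-crawling mechanisms and terms of use, and comply with the applicable rules and laws as far as possible.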
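The example above starts one goroutine per URL, which is fine for three pages but can overwhelm a site when the list is long. The following is a minimal sketch of one way to cap the number of requests in flight, using a buffered channel as a semaphore together with sync.WaitGroup. It assumes a local test server (the hypothetical helper startPagesServer) instead of a real site, and the names fetchAll, fetch, and result, as well as the limit of 2, are illustrative only.

```go
package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
    "sync"
    "time"
)

// result holds the outcome of fetching one URL.
type result struct {
    URL  string
    Body string
    Err  error
}

// startPagesServer starts a local server that stands in for a real site.
// It is a hypothetical helper used only in this sketch.
func startPagesServer() (string, func()) {
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "content of %s", r.URL.Path)
    }))
    return srv.URL, srv.Close
}

// fetch downloads one URL and reports errors in the result instead of panicking.
func fetch(client *http.Client, u string) result {
    resp, err := client.Get(u)
    if err != nil {
        return result{URL: u, Err: err}
    }
    defer resp.Body.Close()
    data, err := io.ReadAll(resp.Body)
    return result{URL: u, Body: string(data), Err: err}
}

// fetchAll downloads every URL with at most limit requests in flight.
// Results are returned in completion order, not input order.
func fetchAll(urls []string, limit int) []result {
    if limit < 1 {
        limit = 1
    }
    client := &http.Client{Timeout: 10 * time.Second}
    sem := make(chan struct{}, limit)        // counting semaphore
    results := make(chan result, len(urls))  // buffered so senders never block
    var wg sync.WaitGroup

    for _, u := range urls {
        wg.Add(1)
        go func(u string) {
            defer wg.Done()
            sem <- struct{}{}        // acquire a slot
            defer func() { <-sem }() // release the slot
            results <- fetch(client, u)
        }(u)
    }

    wg.Wait()
    close(results)

    var out []result
    for r := range results {
        out = append(out, r)
    }
    return out
}

func main() {
    base, stop := startPagesServer()
    defer stop()

    urls := []string{base + "/page1", base + "/page2", base + "/page3"}
    for _, r := range fetchAll(urls, 2) {
        if r.Err != nil {
            fmt.Println(r.URL, "failed:", r.Err)
            continue
        }
        fmt.Println(r.URL, "->", r.Body)
    }
}
```

Because the results channel is buffered to len(urls), workers never block on sending, so it is safe to call wg.Wait() before draining the channel.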
