
Neighbour subsystem: ARP input

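Two conditions must be met before a neighbour entry is successfully added:

1. The local host uses the entry;

2. The peer host has confirmed it.

Adding an entry also involves the NUD (Neighbour Unreachability Detection) mechanism: going from the freshly created NUD_NONE state to the usable NUD_REACHABLE state takes a series of state transitions.

Depending on the order in which the two conditions are met, there are two routes:

Use first, then confirm: NUD_NONE -> NUD_INCOMPLETE -> NUD_REACHABLE

Confirm first, then use: NUD_NONE -> NUD_STALE -> NUD_DELAY -> NUD_PROBE -> NUD_REACHABLE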
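The two routes can be sketched as a small state machine. This is a user-space illustration only: the enum names, the event names and the `nud_next`/`nud_run` functions are hypothetical and are not kernel identifiers (the kernel stores NUD states as bit flags), and timer-driven fallbacks such as failure after too many probes are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified states; the kernel uses NUD_* bit flags. */
enum nud_state { S_NONE, S_INCOMPLETE, S_STALE, S_DELAY, S_PROBE, S_REACHABLE };

/* Hypothetical events that drive the happy-path transitions. */
enum nud_event {
    EV_USE,           /* local host wants to send through the entry */
    EV_LEARN,         /* address learned passively, e.g. from an ARP request */
    EV_CONFIRM,       /* peer confirmed, e.g. with a unicast ARP reply */
    EV_DELAY_EXPIRED  /* delay timer ran out without a confirmation */
};

/* Return the state that follows s when ev happens; unknown pairs keep s. */
enum nud_state nud_next(enum nud_state s, enum nud_event ev)
{
    switch (s) {
    case S_NONE:
        if (ev == EV_USE)
            return S_INCOMPLETE;
        if (ev == EV_LEARN)
            return S_STALE;
        break;
    case S_INCOMPLETE:
        if (ev == EV_CONFIRM)
            return S_REACHABLE;
        break;
    case S_STALE:
        if (ev == EV_USE)
            return S_DELAY;
        break;
    case S_DELAY:
        if (ev == EV_CONFIRM)
            return S_REACHABLE;
        if (ev == EV_DELAY_EXPIRED)
            return S_PROBE;
        break;
    case S_PROBE:
        if (ev == EV_CONFIRM)
            return S_REACHABLE;
        break;
    case S_REACHABLE:
        break;
    }
    return s;
}

/* Feed a sequence of n events through the state machine. */
enum nud_state nud_run(enum nud_state s, const enum nud_event *evs, size_t n)
{
    size_t i;

    assert(evs != NULL || n == 0);
    for (i = 0; i < n; i++)
        s = nud_next(s, evs[i]);
    return s;
}
```

Feeding `{ EV_USE, EV_CONFIRM }` from `S_NONE` walks the first route, and `{ EV_LEARN, EV_USE, EV_DELAY_EXPIRED, EV_CONFIRM }` walks the second.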

/*
 *    Process an arp request.
 */

static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
{
    struct net_device *dev = skb->dev;
    struct in_device *in_dev = __in_dev_get_rcu(dev);
    struct arphdr *arp;
    unsigned char *arp_ptr;
    struct rtable *rt;
    unsigned char *sha;
    unsigned char *tha = NULL;
    __be32 sip, tip;
    u16 dev_type = dev->type;
    int addr_type;
    struct neighbour *n;
    struct dst_entry *reply_dst = NULL;
    bool is_garp = false;

    /* arp_rcv below verifies the ARP header and verifies the device
     * is ARP'able.
     */

    if (!in_dev) /* dev->ip_ptr: make sure the device has an IPv4 configuration block */
        goto out_free_skb;

    arp = arp_hdr(skb);

    switch (dev_type) {
    default:
        if (arp->ar_pro != htons(ETH_P_IP) ||
            htons(dev_type) != arp->ar_hrd)
            goto out_free_skb;
        break;
    case ARPHRD_ETHER:
    case ARPHRD_FDDI:
    case ARPHRD_IEEE802:
        /*
         * ETHERNET, and Fibre Channel (which are IEEE 802
         * devices, according to RFC 2625) devices will accept ARP
         * hardware types of either 1 (Ethernet) or 6 (IEEE 802.2).
         * This is the case also of FDDI, where the RFC 1390 says that
         * FDDI devices should accept ARP hardware of (1) Ethernet,
         * however, to be more robust, we'll accept both 1 (Ethernet)
         * or 6 (IEEE 802.2)
         */
        if ((arp->ar_hrd != htons(ARPHRD_ETHER) &&
             arp->ar_hrd != htons(ARPHRD_IEEE802)) ||
            arp->ar_pro != htons(ETH_P_IP))
            goto out_free_skb;
        break;
    case ARPHRD_AX25:
        if (arp->ar_pro != htons(AX25_P_IP) ||
            arp->ar_hrd != htons(ARPHRD_AX25))
            goto out_free_skb;
        break;
    case ARPHRD_NETROM:
        if (arp->ar_pro != htons(AX25_P_IP) ||
            arp->ar_hrd != htons(ARPHRD_NETROM))
            goto out_free_skb;
        break;
    }

    /* Understand only these message types:
     * only ARP requests and ARP replies are handled.
     */

    if (arp->ar_op != htons(ARPOP_REPLY) &&
        arp->ar_op != htons(ARPOP_REQUEST))
        goto out_free_skb;

/*
 *    Extract fields
 */
    arp_ptr = (unsigned char *)(arp + 1);
    sha    = arp_ptr;
    arp_ptr += dev->addr_len;
    memcpy(&sip, arp_ptr, 4); /* sender IP (sip) */
    arp_ptr += 4;
    switch (dev_type) {
#if IS_ENABLED(CONFIG_FIREWIRE_NET)
    case ARPHRD_IEEE1394:
        break;
#endif
    default:
        tha = arp_ptr;
        arp_ptr += dev->addr_len;
    }
    /* target IP (tip) */
    memcpy(&tip, arp_ptr, 4);
/*
 *    Check for bad requests for 127.x.x.x and requests for multicast
 *    addresses.  If this is one such, delete it.
 */
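 /*
  * Drop the packet if the target IP is multicast, or if it is a loopback
  * address and route_localnet is not enabled.
  * route_localnet controls whether a device accepts packets whose source or
  * destination is in 127/8, i.e. traffic from or to the lo device.
  * From the kernel documentation:
  *   Do not consider loopback addresses as martian source or destination
  *   while routing. This enables the use of 127/8 for local routing purposes.
  */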
    if (ipv4_is_multicast(tip) ||
        (!IN_DEV_ROUTE_LOCALNET(in_dev) && ipv4_is_loopback(tip)))
        goto out_free_skb;

 /*
  *    For some 802.11 wireless deployments (and possibly other networks),
  *    there will be an ARP proxy and gratuitous ARP frames are attacks
  *    and thus should not be accepted.
  */
    if (sip == tip && IN_DEV_ORCONF(in_dev, DROP_GRATUITOUS_ARP)) /* drop gratuitous ARP packets */
        goto out_free_skb;

/*
 *     Special case: We must set Frame Relay source Q.922 address
 */
    if (dev_type == ARPHRD_DLCI)
        sha = dev->broadcast;

/*
 *  Process entry.  The idea here is we want to send a reply if it is a
 *  request for us or if it is a request for someone else that we hold
 *  a proxy for.  We want to add an entry to our cache if it is a reply
 *  to us or if it is a request for our address.
 *  (The assumption for this last is that if someone is requesting our
 *  address, they are probably intending to talk to us, so it saves time
 *  if we cache their address.  Their address is also probably not in
 *  our cache, since ours is not in their cache.)
 *
 *  Putting this another way, we only care about replies if they are to
 *  us, in which case we add them to the cache.  For requests, we care
 *  about those for us and those for our proxies.  We reply to both,
 *  and in the case of requests for us we add the requester to the arp
 *  cache.
 */

    if (arp->ar_op == htons(ARPOP_REQUEST) && skb_metadata_dst(skb))
        reply_dst = (struct dst_entry *)
                iptunnel_metadata_reply(skb_metadata_dst(skb),
                            GFP_ATOMIC);

    /* Special case: IPv4 duplicate address detection packet (RFC2131);
     * this ARP packet is used to detect address conflicts.
     */
    if (sip == 0) {
        /* Reply only once the target IP is confirmed to be a local address */
        if (arp->ar_op == htons(ARPOP_REQUEST) &&
            inet_addr_type_dev_table(net, dev, tip) == RTN_LOCAL &&
            !arp_ignore(in_dev, sip, tip))
            /* arp_ignore controls whether the system answers ARP requests
             * received from outside; if it allows it, send the ARP reply.
             */
            arp_send_dst(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip,
                     sha, dev->dev_addr, sha, reply_dst);
        goto out_consume_skb;
    }
/* For an ARP request, look up the route for the target IP (tip) */
    if (arp->ar_op == htons(ARPOP_REQUEST) &&
        ip_route_input_noref(skb, tip, sip, 0, dev) == 0) {

        rt = skb_rtable(skb);
        addr_type = rt->rt_type;

        if (addr_type == RTN_LOCAL) { /* ARP request addressed to this host */
            int dont_send;

            dont_send = arp_ignore(in_dev, sip, tip);
            if (!dont_send && IN_DEV_ARPFILTER(in_dev))
                dont_send = arp_filter(sip, tip, dev);
            if (!dont_send) {
                /* neigh_event_ns() calls neigh_update(neigh, lladdr, NUD_STALE, NEIGH_UPDATE_F_OVERRIDE, 0) to update the neighbour entry */
                n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
                if (n) {
                    arp_send_dst(ARPOP_REPLY, ETH_P_ARP,
                             sip, dev, tip, sha,
                             dev->dev_addr, sha,
                             reply_dst);
                    neigh_release(n);
                }
            }
            goto out_consume_skb;
        } else if (IN_DEV_FORWARD(in_dev)) { /* the ARP request is not for this host */
            if (addr_type == RTN_UNICAST  &&
               (arp_fwd_proxy(in_dev, dev, rt) ||
                 arp_fwd_pvlan(in_dev, dev, rt, sip, tip) ||
                 (rt->dst.dev != dev &&
                  pneigh_lookup(&arp_tbl, net, &tip, dev, 0)))) {
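              /*
               * 1. Is proxying allowed?
               * 2. Do the input and output devices differ, and does the
               *    ARP table hold a matching proxy entry?
               *
               * Pairing neigh_event_ns() with neigh_release() does not mean
               * the entry is created and then freed straight away. A neigh
               * is freed only when neigh->refcnt reaches 0. It is created
               * with refcnt = 1, neigh_event_ns() adds 1 and neigh_release()
               * subtracts 1, so refcnt is still 1 afterwards. The entry is
               * freed only by a later, unpaired neigh_release().
               */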
                n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
                if (n)
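                    neigh_release(n); /* drop the reference taken by neigh_event_ns() */
                /* If the skb comes from the proxy queue, or the ARP packet is
                 * addressed to this host, or no proxy delay is configured,
                 * answer right away.
                 */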
                if (NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED ||
                    skb->pkt_type == PACKET_HOST ||
                    NEIGH_VAR(in_dev->arp_parms, PROXY_DELAY) == 0) {
                    arp_send_dst(ARPOP_REPLY, ETH_P_ARP,
                             sip, dev, tip, sha,
                             dev->dev_addr, sha,
                             reply_dst);
                } else {
                /* The proxy reply must be delayed: queue the skb and start the timer */
                    pneigh_enqueue(&arp_tbl,
                               in_dev->arp_parms, skb);
                    goto out_free_dst;
                }
                goto out_consume_skb;
            }
        }
    }

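    /* Update our ARP tables.
     * This covers ARP replies and any ARP request not handled above.
     */
/* The last argument of __neigh_lookup() is 0: look up by sip only and
 * do not create an entry if none is found.
 */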
    n = __neigh_lookup(&arp_tbl, &sip, dev, 0);
/*
arp_accept - BOOLEAN
    Define behavior for gratuitous ARP frames who's IP is not
    already present in the ARP table:
    0 - don't create new entries in the ARP table
    1 - create new entries in the ARP table

    Both replies and requests type gratuitous arp will trigger the
    ARP table to be updated, if this setting is on.

    If the ARP table already contains the IP address of the
    gratuitous arp frame, the arp table will be updated regardless
    if this setting is on or off.

*/
    addr_type = -1;
    if (n || IN_DEV_ARP_ACCEPT(in_dev)) {
        /* Is this a gratuitous ARP? */
        is_garp = arp_is_garp(net, dev, &addr_type, arp->ar_op,
                      sip, tip, sha, tha);
    }

    if (IN_DEV_ARP_ACCEPT(in_dev)) {
        /* Unsolicited ARP is not accepted by default.
           It is possible, that this option should be enabled for some
           devices (strip is candidate)
         */
        if (!n &&
            (is_garp || /* gratuitous ARP: create the neigh entry */
             (arp->ar_op == htons(ARPOP_REPLY) &&
              (addr_type == RTN_UNICAST ||
               (addr_type < 0 &&
            /* postpone calculation to as late as possible */
            inet_addr_type_dev_table(net, dev, sip) ==
                RTN_UNICAST)))))
            n = __neigh_lookup(&arp_tbl, &sip, dev, 1);
    }

    if (n) {
        int state = NUD_REACHABLE;
        int override;

        /* If several different ARP replies follows back-to-back,
           use the FIRST one. It is possible, if several proxy
           agents are active. Taking the first reply prevents
           arp trashing and chooses the fastest router.
         */
        override = time_after(jiffies,
                      n->updated +
                      NEIGH_VAR(n->parms, LOCKTIME)) ||
               is_garp;

        /* Broadcast replies and request packets
           do not assert neighbour reachability.
         */
        if (arp->ar_op != htons(ARPOP_REPLY) ||
            skb->pkt_type != PACKET_HOST)
            state = NUD_STALE;
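        /* A unicast ARP reply addressed to this host yields NUD_REACHABLE;
         * anything else yields NUD_STALE. If the last update is older than
         * LOCKTIME, or this is a gratuitous ARP, NEIGH_UPDATE_F_OVERRIDE
         * is passed.
         */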
        neigh_update(n, sha, state,
                 override ? NEIGH_UPDATE_F_OVERRIDE : 0, 0);
        neigh_release(n);
    }
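/*
 * Use first, then confirm: NUD_NONE -> NUD_INCOMPLETE -> NUD_REACHABLE
 * Confirm first, then use: NUD_NONE -> NUD_STALE -> NUD_DELAY -> NUD_PROBE -> NUD_REACHABLE
 *
 * NEIGH_CB(skb) is simply skb->cb, declared in struct sk_buff as
 * char cb[48]. It is a private data area (control buffer) in which each
 * protocol module can store whatever it needs. The ARP module stores a
 * struct neighbour_cb there (8 bytes on a 32-bit build). That structure
 * matters when proxy ARP queues a request for delayed handling:
 * sched_next is the time at which the skb is next scheduled, and flags
 * holds the flags.
 *
 * On receiving an ARP request: NUD_NONE -> NUD_STALE.
 * On receiving an ARP reply: NUD_INCOMPLETE/NUD_DELAY/NUD_PROBE -> NUD_REACHABLE.
 *
 * Open question: do NUD_NONE -> NUD_REACHABLE and
 * NUD_INCOMPLETE -> NUD_STALE transitions also exist?
 *
 * The neigh_timer_handler timer and the neigh_periodic_work work item
 * change the NUD state asynchronously. neigh_timer_handler serves
 * NUD_INCOMPLETE, NUD_DELAY, NUD_PROBE and NUD_REACHABLE;
 * neigh_periodic_work serves NUD_STALE. Every entry has its own
 * neigh_timer_handler timer, whereas there is a single neigh_periodic_work
 * per neighbour table. NUD_STALE entries do not need a timer each: checking
 * them periodically for expiry is enough and saves a lot of resources.
 *
 * neigh_update() is dedicated to updating the state of an entry, while
 * neigh_event_send() updates the state when an entry is being resolved.
 */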
out_consume_skb:
    consume_skb(skb);

out_free_dst:
    dst_release(reply_dst);
    return NET_RX_SUCCESS;

out_free_skb:
    kfree_skb(skb);
    return NET_RX_DROP;
}      
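The "Extract fields" step near the top of arp_process walks a pointer over the variable-length part of the ARP packet. The sketch below reproduces that walk in user space for the common layout (sender hardware address, sender IP, target hardware address, target IP). `struct arp_fields` and `arp_extract` are hypothetical names, and the IEEE 1394 case, where the target hardware address is absent, is deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the four fields that follow the fixed header. */
struct arp_fields {
    const unsigned char *sha;  /* sender hardware address */
    const unsigned char *tha;  /* target hardware address */
    uint32_t sip;              /* sender IP, network byte order */
    uint32_t tip;              /* target IP, network byte order */
};

/*
 * payload points just past the fixed ARP header (what the kernel code
 * calls arp + 1); addr_len is the hardware address length, 6 for Ethernet.
 */
void arp_extract(const unsigned char *payload, size_t addr_len,
                 struct arp_fields *f)
{
    const unsigned char *p = payload;

    assert(payload != NULL && f != NULL);

    f->sha = p;
    p += addr_len;
    memcpy(&f->sip, p, 4);   /* memcpy avoids unaligned loads */
    p += 4;
    f->tha = p;
    p += addr_len;
    memcpy(&f->tip, p, 4);
}
```

For an Ethernet/IPv4 packet the payload is 20 bytes long: 6 + 4 + 6 + 4.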
/*
 * arp_ignore: define different modes for sending replies in response to
 * received ARP requests that resolve local target IP addresses:
 * 0 - (default): reply for any local target IP address, configured on any interface
 * 1 - reply only if the target IP address is local address configured on the incoming interface
 * 2 - reply only if the target IP address is local address configured on the incoming interface
 *     and both with the sender's IP address are part from same subnet on this interface
 * 3 - do not reply for local addresses configured with scope host, only resolutions for
 *     global and link addresses are replied
 * 4-7 - reserved
 * 8 - do not reply for all local addresses
 *
 * In other words:
 * 0: answer ARP requests for any local IP address, received on any interface
 *    (including addresses on the loopback interface), whether or not the
 *    target IP is configured on the receiving interface.
 * 1: answer only if the target IP is a local address of the receiving interface.
 * 2: as 1, and the source IP of the request must be in the same subnet as
 *    the receiving interface.
 * 3: do not answer if the local address matching the requested IP has host
 *    scope; answer if its scope is global or link.
 * 4-7: reserved, unused.
 * 8: do not answer any ARP request for local addresses.
 */
static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
{
    struct net *net = dev_net(in_dev->dev);
    int scope;

    switch (IN_DEV_ARP_IGNORE(in_dev)) {
    case 0:    /* Reply, the tip is already validated */
        return 0;
    case 1:    /* Reply only if tip is configured on the incoming interface */
        sip = 0;
        scope = RT_SCOPE_HOST;
        break;
    case 2:    /*
         * Reply only if tip is configured on the incoming interface
         * and is in same subnet as sip
         */
        scope = RT_SCOPE_HOST;
        break;
    case 3:    /* Do not reply for scope host addresses */
        sip = 0;
        scope = RT_SCOPE_LINK;
        in_dev = NULL;
        break;
    case 4:    /* Reserved */
    case 5:
    case 6:
    case 7:
        return 0;
    case 8:    /* Do not reply */
        return 1;
    default:
        return 0;
    }
    return !inet_confirm_addr(net, in_dev, sip, tip, scope);
}
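The switch in arp_ignore can be read as a two-step decision: some modes are settled immediately, the others depend on the address lookup done by inet_confirm_addr(). The sketch below models only the first step, plus the rule from the sysctl documentation that the larger of conf/all and conf/{interface} applies. All names in it (`arp_ignore_effective`, `arp_ignore_precheck`, the `PRE_*` values) are hypothetical and exist only for illustration.

```c
/* Outcome of the arp_ignore switch before any address lookup. */
enum arp_ignore_pre {
    PRE_REPLY,       /* reply without further checks */
    PRE_DROP,        /* never reply */
    PRE_CHECK_ADDR   /* outcome depends on the address lookup */
};

/* The larger of conf/all and conf/<interface> is the mode in effect. */
int arp_ignore_effective(int conf_all, int conf_if)
{
    return conf_all > conf_if ? conf_all : conf_if;
}

/* Mirror of the switch statement, minus the lookup itself. */
enum arp_ignore_pre arp_ignore_precheck(int mode)
{
    switch (mode) {
    case 1:
    case 2:
    case 3:
        return PRE_CHECK_ADDR;
    case 8:
        return PRE_DROP;
    default:  /* 0, reserved 4-7 and unknown values all reply */
        return PRE_REPLY;
    }
}
```

Note that the reserved values 4 to 7 behave like mode 0 in the code shown: the function returns 0, so a reply is sent.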
/*
 * Using the sender IP and target IP of the ARP request, look up the output
 * route back towards the sender of the request.
 *
 * arp_filter - BOOLEAN
 * 1 - Allows you to have multiple network interfaces on the same subnet,
 * and have the ARPs for each interface be answered based on whether or not
 * the kernel would route a packet from the ARP'd IP out that interface
 * (therefore you must use source based routing for this to work). In other
 * words it allows control of which cards (usually 1) will respond to an arp
 * request.
 *
 * 0 - (default) The kernel can respond to arp requests with addresses from
 * other interfaces. This may seem wrong but it usually makes sense, because
 * it increases the chance of successful communication. IP addresses are
 * owned by the complete host on Linux, not by particular interfaces. Only
 * for more complex setups like load-balancing, does this behaviour cause
 * problems.
 * arp_filter for the interface will be enabled if at least one of
 * conf/{all,interface}/arp_filter is set to TRUE, it will be disabled
 * otherwise.
 *
 * This parameter decides whether to answer based on the source IP of the
 * ARP request.
 *
 * With arp_filter = 0: if an ARP request arrives on some interface and the
 * target IP belongs to this host, the host answers, whether or not the
 * target IP is configured on the receiving interface. The MAC address in
 * the reply is that of the receiving interface.
 *
 * With arp_filter = 1: under the same conditions (the target IP still need
 * not be on the receiving interface), the host looks up which interface the
 * route to the source IP of the request goes through. If it is the
 * receiving interface, the host replies with that interface's MAC address;
 * otherwise it sends no reply.
 */
static int arp_filter(__be32 sip, __be32 tip, struct net_device *dev)
{
    struct rtable *rt;
    int flag = 0;
    /*unsigned long now; */
    struct net *net = dev_net(dev);

    rt = ip_route_output(net, sip, tip, 0, 0);
    if (IS_ERR(rt))
        return 1;
    if (rt->dst.dev != dev) {
        __NET_INC_STATS(net, LINUX_MIB_ARPFILTER);
        flag = 1;
    }
    ip_rt_put(rt);
    return flag;
}      
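The enable rule quoted in the comment above arp_filter (on if at least one of conf/all and conf/{interface} is set) can be checked from user space. The following is only a sketch: `read_sysctl_int` and `arp_filter_enabled` are hypothetical helper names, and the /proc paths mentioned afterwards assume a Linux system with procfs mounted.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a sysctl-style text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_sysctl_int(const char *path, int *out)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (!fp)
        return -1;
    ok = (fscanf(fp, "%d", out) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/* arp_filter is on if at least one of the two values is non-zero. */
int arp_filter_enabled(int conf_all, int conf_if)
{
    return conf_all != 0 || conf_if != 0;
}
```

A caller would read `/proc/sys/net/ipv4/conf/all/arp_filter` and the same file under the directory of the interface of interest, then pass both values to `arp_filter_enabled`.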
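/* Called when a timer expires for a neighbour entry.
 * neigh_timer_handler: the per-entry timer function.
 *
 * A timer is armed while a neigh is in NUD_INCOMPLETE, NUD_DELAY, NUD_PROBE
 * or NUD_REACHABLE; neigh_timer_handler handles the expiry for each of
 * these states.
 *
 * In NUD_REACHABLE the state diagram allows three transitions, matching the
 * three branches in the code. neigh->confirmed is the time a packet was
 * last received from the neighbour; neigh->used is the time the entry was
 * last used.
 * - Timer expired, but a packet from the peer arrived in the meantime: keep
 *   the state and re-arm the timer for neigh->confirmed + reachable_time.
 * - Timer expired, nothing received, but the host used the entry: move to
 *   NUD_DELAY and re-arm the timer for now + delay_probe_time.
 * - Timer expired, nothing received and the entry was not used: the entry
 *   may no longer be valid, so move it to NUD_STALE instead of deleting it
 *   at once. neigh_periodic_work() periodically clears NUD_STALE entries.
 *
 * In NUD_DELAY the state diagram allows two transitions, matching the two
 * branches in the code.
 * - Timer expired and a packet from the peer arrived in the meantime: move
 *   to NUD_REACHABLE and record the next check time in next.
 * - Timer expired and nothing arrived: move to NUD_PROBE and record the
 *   next check time in next.
 * NUD_DELAY sits between NUD_STALE and NUD_PROBE to cut down the number of
 * ARP packets: the hope is that a confirmation from the peer arrives within
 * the delay, so that no new address resolution is needed.
 */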

static void neigh_timer_handler(unsigned long arg)
{
    unsigned long now, next;
    struct neighbour *neigh = (struct neighbour *)arg;
    unsigned int state;
    int notify = 0;

    write_lock(&neigh->lock);

    state = neigh->nud_state;
    now = jiffies;
    next = now + HZ;

    if (!(state & NUD_IN_TIMER))
        goto out;

    if (state & NUD_REACHABLE) {
        /* NUD_REACHABLE: three outcomes, one per branch below.
         * Recently confirmed: stay in NUD_REACHABLE.
         * Not confirmed but recently used: move to NUD_DELAY.
         * Neither: move to NUD_STALE.
         */
        if (time_before_eq(now,
                   neigh->confirmed + neigh->parms->reachable_time)) {
            neigh_dbg(2, "neigh %p is still alive\n", neigh);
            next = neigh->confirmed + neigh->parms->reachable_time;
        } else if (time_before_eq(now,
                      neigh->used +
                      NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) {
            neigh_dbg(2, "neigh %p is delayed\n", neigh);
            neigh->nud_state = NUD_DELAY;
            neigh->updated = jiffies;
            neigh_suspect(neigh);
            next = now + NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME);
        } else {
            neigh_dbg(2, "neigh %p is suspected\n", neigh);
            neigh->nud_state = NUD_STALE;
            neigh->updated = jiffies;
            neigh_suspect(neigh);
            notify = 1;
        }
    } else if (state & NUD_DELAY) {
        /* NUD_DELAY: two outcomes, one per branch below.
         * Confirmed within DELAY_PROBE_TIME: move to NUD_REACHABLE.
         * Otherwise: move to NUD_PROBE.
         * Either way the next check time is stored in next.
         */
        if (time_before_eq(now,
                   neigh->confirmed +
                   NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME))) {
            neigh_dbg(2, "neigh %p is now reachable\n", neigh);
            neigh->nud_state = NUD_REACHABLE;
            neigh->updated = jiffies;
            neigh_connect(neigh);
            notify = 1;
            next = neigh->confirmed + neigh->parms->reachable_time;
        } else {
            neigh_dbg(2, "neigh %p is probed\n", neigh);
            neigh->nud_state = NUD_PROBE;
            neigh->updated = jiffies;
            atomic_set(&neigh->probes, 0);
            notify = 1;
            next = now + NEIGH_VAR(neigh->parms, RETRANS_TIME);
        }
    } else {
        /* NUD_PROBE|NUD_INCOMPLETE: only record the next check time in next.
         * Both states send ARP solicitations, so their transitions depend on
         * the progress of ARP resolution.
         */
        next = now + NEIGH_VAR(neigh->parms, RETRANS_TIME);
    }

    if ((neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) &&
        atomic_read(&neigh->probes) >= neigh_max_probes(neigh)) {
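        /* After the timeout-driven transitions, an entry in NUD_PROBE or
         * NUD_INCOMPLETE is about to send another ARP packet. The number of
         * probes already sent is checked first: once the limit is reached,
         * the peer is considered unresponsive and the entry goes to
         * NUD_FAILED, to be released later.
         */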
        neigh->nud_state = NUD_FAILED;
        notify = 1;
        neigh_invalidate(neigh);
        goto out;
    }
/*
 * In short, neigh_timer_handler handles the timeouts of the states that run
 * a timer. Note the NUD_DELAY -> NUD_REACHABLE transition: arp_process also
 * performs it when an ARP reply arrives. The difference is that arp_process
 * reacts to an ARP-level confirmation, whereas neigh_timer_handler reacts to
 * a confirmation coming from layer 4.
 */
    if (neigh->nud_state & NUD_IN_TIMER) {
        if (time_before(next, jiffies + HZ/2))
            next = jiffies + HZ/2;
        if (!mod_timer(&neigh->timer, next))
            neigh_hold(neigh);
    }
    if (neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) {
        neigh_probe(neigh);
    } else {
out:
        write_unlock(&neigh->lock);
    }

    if (notify)
        neigh_update_notify(neigh, 0);

    neigh_release(neigh);
}      
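The NUD_REACHABLE branch of the handler boils down to two wrap-safe time comparisons. The sketch below isolates that decision in user space; `reachable_timeout`, `tick_before_eq` and the `R_*` values are hypothetical names, and all times are plain tick counts rather than real jiffies.

```c
/* Hypothetical result of the NUD_REACHABLE timer check. */
enum reach_next { R_STAY_REACHABLE, R_TO_DELAY, R_TO_STALE };

/*
 * Wrap-safe "a <= b" for free-running tick counters, the same idea as the
 * kernel's time_before_eq(): compare the signed difference, not the values.
 */
static int tick_before_eq(unsigned long a, unsigned long b)
{
    return (long)(b - a) >= 0;
}

/* Decide what happens to a NUD_REACHABLE entry when its timer fires. */
enum reach_next reachable_timeout(unsigned long now,
                                  unsigned long confirmed,
                                  unsigned long used,
                                  unsigned long reachable_time,
                                  unsigned long delay_probe_time)
{
    if (tick_before_eq(now, confirmed + reachable_time))
        return R_STAY_REACHABLE;  /* confirmed recently enough */
    if (tick_before_eq(now, used + delay_probe_time))
        return R_TO_DELAY;        /* not confirmed, but recently used */
    return R_TO_STALE;            /* neither confirmed nor used */
}
```

Comparing the signed difference keeps the result correct even when the tick counter wraps around between `confirmed` and `now`.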

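neigh_periodic_work: the periodic function for the NUD_STALE state

While a neigh is in NUD_STALE it simply waits. If the host references it, it moves to NUD_DELAY; if nobody references it, neigh_periodic_work eventually removes it from the table, the same way it reclaims NUD_FAILED entries.

Unlike the per-entry timer used for NUD_INCOMPLETE, NUD_DELAY, NUD_PROBE and NUD_REACHABLE, NUD_STALE is checked asynchronously, by running neigh_periodic_work() at regular intervals.

At the end of each run the work re-queues itself with a delay of 1/2 base_reachable_time, so neigh_periodic_work runs once every 1/2 base_reachable_time. Conceptually:

schedule_delayed_work(&tbl->gc_work, tbl->parms.base_reachable_time >> 1);

neigh_periodic_work runs periodically, but an entry must not be reclaimed right after it has been added.

The policy is that gc_staletime is larger than 1/2 base_reachable_time. By default, gc_staletime = 60 seconds and base_reachable_time = 30 seconds.

In other words, neigh_periodic_work runs every 15 seconds, and an entry is reclaimed only after it has gone unused for more than 60 seconds. Because the check happens only every 15 seconds, the actual removal comes between 60 and about 75 seconds after the last use.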

/*
 * It is random distribution in the interval (1/2)*base...(3/2)*base.
 * It corresponds to default IPv6 settings and is not overridable,
 * because it is really reasonable choice.
 *
 * When neigh_periodic_work runs, it first recomputes reachable_time. Note
 * that reachable_time is drawn from 1/2 base to 3/2 base, where
 * base = base_reachable_time. An entry in NUD_REACHABLE starts a timer of
 * length reachable_time, so an entry that is not used stays in that state
 * for between 1/2 and 3/2 of base_reachable_time.
 */

unsigned long neigh_rand_reach_time(unsigned long base)
{
    return base ? (prandom_u32() % base) + (base >> 1) : 0;
}
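To make the bounds of that interval concrete, the arithmetic can be replayed with the random number passed in explicitly. This is a sketch, not kernel code: `reach_time_from` is a hypothetical name and `r` stands in for the value that prandom_u32() would return.

```c
/*
 * Same arithmetic as neigh_rand_reach_time(), with the random number
 * supplied by the caller so that the result is reproducible.
 */
unsigned long reach_time_from(unsigned long base, unsigned long r)
{
    return base ? (r % base) + (base >> 1) : 0;
}
```

The smallest result is base/2 (when `r % base` is 0) and the largest is base - 1 + base/2, just under 3/2 base. For base = 30 the results therefore range from 15 to 44.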

static void neigh_periodic_work(struct work_struct *work)
{
    struct neigh_table *tbl = container_of(work, struct neigh_table, gc_work.work);
    struct neighbour *n;
    struct neighbour __rcu **np;
    unsigned int i;
    struct neigh_hash_table *nht;

    NEIGH_CACHE_STAT_INC(tbl, periodic_gc_runs);

    write_lock_bh(&tbl->lock);
    nht = rcu_dereference_protected(tbl->nht,
                    lockdep_is_held(&tbl->lock));

    /*
     *    periodically recompute ReachableTime from random function
     */

    if (time_after(jiffies, tbl->last_rand + 300 * HZ)) {
        struct neigh_parms *p;
        tbl->last_rand = jiffies;
        list_for_each_entry(p, &tbl->parms_list, list)
            p->reachable_time =
                neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME));
    }

    if (atomic_read(&tbl->entries) < tbl->gc_thresh1)
        goto out;

    for (i = 0 ; i < (1 << nht->hash_shift); i++) {
        np = &nht->hash_buckets[i];

        while ((n = rcu_dereference_protected(*np,
                lockdep_is_held(&tbl->lock))) != NULL) {
            unsigned int state;

            write_lock(&n->lock);

            state = n->nud_state;
            if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
                write_unlock(&n->lock);
                goto next_elt;
            }

            if (time_before(n->used, n->confirmed))
                n->used = n->confirmed;
/*
 * Walk the whole neighbour table, every entry of every hash bucket;
 * an entry that has not been referenced within gc_staletime is removed
 * from the table.
 */
            if (atomic_read(&n->refcnt) == 1 &&
                (state == NUD_FAILED ||
                 time_after(jiffies, n->used + NEIGH_VAR(n->parms, GC_STALETIME)))) {
                *np = n->next;
                n->dead = 1;
                write_unlock(&n->lock);
                neigh_cleanup_and_release(n);
                continue;
            }
            write_unlock(&n->lock);

next_elt:
            np = &n->next;
        }
        /*
         * It's fine to release lock here, even if hash table
         * grows while we are preempted.
         */
        write_unlock_bh(&tbl->lock);
        cond_resched();
        write_lock_bh(&tbl->lock);
        nht = rcu_dereference_protected(tbl->nht,
                        lockdep_is_held(&tbl->lock));
    }
out:
    /* Cycle through all hash buckets every BASE_REACHABLE_TIME/2 ticks.
     * ARP entry timeouts range from 1/2 BASE_REACHABLE_TIME to 3/2
     * BASE_REACHABLE_TIME.
     */
    queue_delayed_work(system_power_efficient_wq, &tbl->gc_work,
                  NEIGH_VAR(&tbl->parms, BASE_REACHABLE_TIME) >> 1);
    write_unlock_bh(&tbl->lock);
}      
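The removal test inside the loop can be isolated as a single predicate. The sketch below assumes the entry has already passed the earlier check that skips NUD_PERMANENT entries and entries with a running timer; `gc_should_remove` and `tick_after` are hypothetical names and times are plain tick counts.

```c
#include <stdbool.h>

/* Wrap-safe "a is later than b", the same idea as the kernel's time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * An entry is reclaimed only if the table holds the last reference
 * (refcnt == 1) and it is either failed or unused for longer than
 * gc_staletime.
 */
bool gc_should_remove(int refcnt, bool failed, unsigned long now,
                      unsigned long used, unsigned long gc_staletime)
{
    return refcnt == 1 &&
           (failed || tick_after(now, used + gc_staletime));
}
```

An entry that is still referenced elsewhere (refcnt greater than 1) is never reclaimed by this pass, however old it is.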
arp_announce - INTEGER
    Define different restriction levels for announcing the local
    source IP address from IP packets in ARP requests sent on
    interface:
    0 - (default) Use any local address, configured on any interface
    1 - Try to avoid local addresses that are not in the target's
    subnet for this interface. This mode is useful when target
    hosts reachable via this interface require the source IP
    address in ARP requests to be part of their logical network
    configured on the receiving interface. When we generate the
    request we will check all our subnets that include the
    target IP and will preserve the source address if it is from
    such subnet. If there is no such subnet we select source
    address according to the rules for level 2.
    2 - Always use the best local address for this target.
    In this mode we ignore the source address in the IP packet
    and try to select local address that we prefer for talks with
    the target host. Such local address is selected by looking
    for primary IP addresses on all our subnets on the outgoing
    interface that include the target IP address. If no suitable
    local address is found we select the first local address
    we have on the outgoing interface or on all other interfaces,
    with the hope we will receive reply for our request and
    even sometimes no matter the source IP address we announce.

    The max value from conf/{all,interface}/arp_announce is used.

    Increasing the restriction level gives more chance for
    receiving answer from the resolved target while decreasing
    the level announces more valid sender's information.

arp_ignore - INTEGER
    Define different modes for sending replies in response to
    received ARP requests that resolve local target IP addresses:
    0 - (default): reply for any local target IP address, configured
    on any interface
    1 - reply only if the target IP address is local address
    configured on the incoming interface
    2 - reply only if the target IP address is local address
    configured on the incoming interface and both with the
    sender's IP address are part from same subnet on this interface
    3 - do not reply for local addresses configured with scope host,
    only resolutions for global and link addresses are replied
    4-7 - reserved
    8 - do not reply for all local addresses

    The max value from conf/{all,interface}/arp_ignore is used
    when ARP request is received on the {interface}
    
arp_accept - BOOLEAN
    Define behavior for gratuitous ARP frames who's IP is not
    already present in the ARP table:
    0 - don't create new entries in the ARP table
    1 - create new entries in the ARP table

    Both replies and requests type gratuitous arp will trigger the
    ARP table to be updated, if this setting is on.

    If the ARP table already contains the IP address of the
    gratuitous arp frame, the arp table will be updated regardless
    if this setting is on or off.