
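The figure shows a basic data center architecture.
The top layer is the Internet Edge, also called the Edge Router or Border Router. It connects the data center to the Internet.
It connects to multiple network providers to provide redundant, reliable connectivity.
Externally, it provides routing via BGP so that the outside world can reach internal IPs.
Internally, it provides routing via iBGP so that internal hosts can reach external IPs.
It provides perimeter security so that outside hosts cannot freely access the internal network.
It controls access from the internal network to the outside.
For high availability (HA), there are usually two Border Routers.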
Typical enterprise Internet connectivity design:
Two edge routers, IR1 and IR2, provide direct connectivity to the Internet.
A pair of firewalls provides stateful inspection and access to both the internal network and the demilitarized zone (DMZ).
The DMZ contains public-facing services such as web servers; it is the only network directly accessible from the public Internet.
The internal network should never be accessed directly from the Internet, but traffic sourced from the internal network must be able to reach Internet sites.
Access control lists (ACLs) are commonly configured on the Border Router to control access.
Border Router ACLs:
Special-use address and anti-spoofing entries that deny illegitimate sources, and that deny packets arriving from an external source whose source addresses belong within your network.
RFC 1918 defines reserved address space that is not a valid source address on the Internet.
RFC 3330 defines special-use addresses that might require filtering (RFC 3330 has since been obsoleted; see RFC 6890).
RFC 2827 provides anti-spoofing guidelines.
Explicitly permitted return traffic for internal connections to the Internet
Explicitly permitted externally sourced traffic destined to protected internal addresses
Explicit deny statement
An example ACL:
------------------------------------------------------------------------------------------------
!--- Add anti-spoofing entries.
!--- Deny special-use address sources.
!--- Refer to RFC 3330 for additional special use addresses.
access-list 110 deny ip 127.0.0.0 0.255.255.255 any
access-list 110 deny ip 192.0.2.0 0.0.0.255 any
access-list 110 deny ip 224.0.0.0 31.255.255.255 any
access-list 110 deny ip host 255.255.255.255 any
!--- The deny statement should not be configured
!--- on Dynamic Host Configuration Protocol (DHCP) relays.
access-list 110 deny ip host 0.0.0.0 any
!--- Filter RFC 1918 space.
access-list 110 deny ip 10.0.0.0 0.255.255.255 any
access-list 110 deny ip 172.16.0.0 0.15.255.255 any
access-list 110 deny ip 192.168.0.0 0.0.255.255 any
!--- Permit Border Gateway Protocol (BGP) to the edge router.
access-list 110 permit tcp host bgp_peer gt 1023 host router_ip eq bgp
access-list 110 permit tcp host bgp_peer eq bgp host router_ip gt 1023
!--- Deny your space as source (as noted in RFC 2827).
access-list 110 deny ip your Internet-routable subnet any
!--- Explicitly permit return traffic.
!--- Allow specific ICMP types.
access-list 110 permit icmp any any echo-reply
access-list 110 permit icmp any any unreachable
access-list 110 permit icmp any any time-exceeded
access-list 110 deny icmp any any
!--- These are outgoing DNS queries.
access-list 110 permit udp any eq 53 host primary DNS server gt 1023
!--- Permit older DNS queries and replies to primary DNS server.
access-list 110 permit udp any eq 53 host primary DNS server eq 53
!--- Permit legitimate business traffic.
access-list 110 permit tcp any Internet-routable subnet established
access-list 110 permit udp any range 1 1023 Internet-routable subnet gt 1023
!--- Allow ftp data connections.
access-list 110 permit tcp any eq 20 Internet-routable subnet gt 1023
!--- Allow tftp data and multimedia connections.
access-list 110 permit udp any gt 1023 Internet-routable subnet gt 1023
!--- Explicitly permit externally sourced traffic.
!--- These are incoming DNS queries.
access-list 110 permit udp any gt 1023 host primary DNS server eq 53
!--- These are zone transfer DNS queries to primary DNS server.
access-list 110 permit tcp host secondary DNS server gt 1023 host primary DNS server eq 53
!--- Permit older DNS zone transfers.
access-list 110 permit tcp host secondary DNS server eq 53 host primary DNS server eq 53
!--- Deny all other DNS traffic.
access-list 110 deny udp any any eq 53
access-list 110 deny tcp any any eq 53
!--- Allow IPSec VPN traffic.
access-list 110 permit udp any host IPSec headend device eq 500
access-list 110 permit udp any host IPSec headend device eq 4500
access-list 110 permit 50 any host IPSec headend device
access-list 110 permit 51 any host IPSec headend device
access-list 110 deny ip any host IPSec headend device
!--- These are Internet-sourced connections to
!--- publicly accessible servers.
access-list 110 permit tcp any host public web server eq 80
access-list 110 permit tcp any host public web server eq 443
access-list 110 permit tcp any host public FTP server eq 21
!--- Data connections to the FTP server are allowed
!--- by the permit established ACE.
!--- Allow PASV data connections to the FTP server.
access-list 110 permit tcp any gt 1023 host public FTP server gt 1023
access-list 110 permit tcp any host public SMTP server eq 25
!--- Explicitly deny all other traffic.
access-list 110 deny ip any any
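To make the evaluation order concrete, here is a minimal Python sketch (not Cisco code) of how the source-address entries at the top of ACL 110 behave: entries are checked in order, the first match wins, and a packet that matches nothing hits the implicit deny at the end of every IOS ACL. Only source addresses are modeled; the final permit-any entry is a hypothetical stand-in for the protocol- and port-specific permits above, and the function names are illustrative.

```python
import ipaddress

# Source-address entries of ACL 110 as (action, address, wildcard mask).
# In a Cisco wildcard mask, a 1 bit means "don't care" and a 0 bit must match.
ACL_110_SOURCES = [
    ("deny", "127.0.0.0", "0.255.255.255"),
    ("deny", "192.0.2.0", "0.0.0.255"),
    ("deny", "224.0.0.0", "31.255.255.255"),
    ("deny", "255.255.255.255", "0.0.0.0"),   # "host 255.255.255.255"
    ("deny", "0.0.0.0", "0.0.0.0"),           # "host 0.0.0.0"
    ("deny", "10.0.0.0", "0.255.255.255"),
    ("deny", "172.16.0.0", "0.15.255.255"),
    ("deny", "192.168.0.0", "0.0.255.255"),
    # Hypothetical stand-in for the protocol/port-specific permits that follow.
    ("permit", "0.0.0.0", "255.255.255.255"),
]


def _to_int(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))


def wildcard_match(src: str, net: str, wildcard: str) -> bool:
    """True if src equals net on every bit where the wildcard bit is 0."""
    care = ~_to_int(wildcard) & 0xFFFFFFFF
    return (_to_int(src) & care) == (_to_int(net) & care)


def evaluate(acl, src: str) -> str:
    """Return the action of the first matching entry, else the implicit deny."""
    for action, net, wildcard in acl:
        if wildcard_match(src, net, wildcard):
            return action
    return "deny"


for src in ["10.1.2.3", "172.32.0.1", "239.1.1.1"]:
    print(src, evaluate(ACL_110_SOURCES, src))
# 10.1.2.3 deny
# 172.32.0.1 permit
# 239.1.1.1 deny
```

Note that 172.32.0.1 falls just outside 172.16.0.0 0.15.255.255 (which covers 172.16.0.0 through 172.31.255.255), so it reaches the stand-in permit.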
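The second layer is the core network, which contains many core switches. It:
Carries traffic between the Availability Zones and the Edge Routers
Carries traffic between Availability Zones
Provides highly available (HA) connectivity
Provides Intrusion Prevention Services
Provides Distributed Denial of Service (DDoS) attack analysis and mitigation
Provides a Tier 1 Load Balancer
For HA, two core networks are usually built. They are isolated from each other by VLANs so that they do not interfere with each other, and each core network connects to both Border Routers and to every Availability Zone.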
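The core network uses high-capacity switches. To provide high availability without relying on STP (which would block the redundant links), the links between switches are combined using Link Aggregation.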
Link aggregation allows you to bond multiple parallel links into a single virtual link (from the STP perspective).
With parallel links being replaced by a single link, STP detects no loops and all the physical links can be fully utilized.
Traditional LA goes by several names: port channel, EtherChannel, link bonding, or multi-link trunking.
Traditional Link Aggregation
A port channel bundles up to eight individual interfaces into a group to provide increased bandwidth and redundancy.
Port channeling also load balances traffic across these physical interfaces.
You create a port channel by bundling compatible interfaces.
You can configure and run either static port channels or port channels running the Link Aggregation Control Protocol (LACP).
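Port channels usually spread traffic per flow rather than per packet, so that packets of one conversation stay in order. The sketch below assumes a hash over source and destination IP addresses, with CRC32 standing in for the vendor-specific hash (real switches may also hash MAC addresses or L4 ports); `select_member` and the interface names are hypothetical.

```python
import zlib


def select_member(src_ip: str, dst_ip: str, members: list) -> str:
    """Map a flow to one member link of a port channel.

    All packets with the same (src_ip, dst_ip) hash to the same link, so a
    single flow is never spread across links and its packets stay in order.
    """
    if not members:
        raise ValueError("port channel has no active members")
    key = f"{src_ip}|{dst_ip}".encode()
    return members[zlib.crc32(key) % len(members)]


# Hypothetical four-member port channel carrying 100 flows.
links = ["Eth1/1", "Eth1/2", "Eth1/3", "Eth1/4"]
flows = [(f"10.0.0.{i}", "10.0.1.1") for i in range(1, 101)]
usage = {link: 0 for link in links}
for src, dst in flows:
    usage[select_member(src, dst, links)] += 1
print(usage)  # per-link flow counts; exact values depend on the hash
```

Because the hash is taken modulo the number of active members, removing a failed link from `members` redistributes its flows over the remaining links.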
LACP (Link Aggregation Control Protocol)
Individual links can be combined into LACP port channels and channel groups.
Static LACP: channel groups are created and ports are added manually; LACP only determines whether each port is selected (active) or standby.
Dynamic LACP: all of the above is negotiated between the two sides via LACPDUs (LACP Data Units).
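Some advanced switches now provide built-in clustering: using a vendor-defined protocol and hardware support, multiple switches form a single cluster that both provides HA and increases bandwidth.
This is how HA is implemented.
To further increase bandwidth, two switch clusters can also be joined by LACP link aggregation; this approach is called Multi-Chassis Link Aggregation.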
In Multichassis EtherChannel (MCEC), a dual-homed device (DHD) is connected to two upstream points of attachment (PoAs). The DHD is incapable of running any loop prevention control protocol such as Multiple Spanning Tree (MST).
One method is to place the DHD's uplinks in a link aggregation group (LAG), commonly referred to as an EtherChannel, with LACP enabled.
LACP is a link-level control protocol that allows the dynamic negotiation and establishment of LAGs.
Multichassis LACP: an extension of the LACP implementation on the PoAs is required to convey to a DHD that it is connected to a single virtual LACP peer and not to two disjoint devices.
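A simplified model of why Multichassis LACP works: the DHD bundles its uplinks into one LAG only when the LACPDUs on every uplink report the same partner identity. In this sketch that identity is reduced to a (system ID, key) pair, which is an assumption; real LACP also compares system priority and the local actor key. The class names and MAC addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartnerInfo:
    """Partner identity learned from LACPDUs on one uplink (simplified)."""
    system_id: str  # partner's LACP system ID, a MAC address
    key: int        # partner's operational key for the port


def can_bundle(uplinks: dict) -> bool:
    """Bundle all uplinks into one LAG only if they all see the same partner."""
    return len(uplinks) > 0 and len(set(uplinks.values())) == 1


# Two independent upstream switches: different system IDs, so no single LAG.
independent = {"uplink1": PartnerInfo("00:11:11:11:11:11", 1),
               "uplink2": PartnerInfo("00:22:22:22:22:22", 1)}
# An mLACP PoA pair presenting a shared system ID and key: one virtual peer.
mlacp_pair = {"uplink1": PartnerInfo("02:aa:aa:aa:aa:aa", 1),
              "uplink2": PartnerInfo("02:aa:aa:aa:aa:aa", 1)}
print(can_bundle(independent), can_bundle(mlacp_pair))  # False True
```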
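The figure below shows the architecture of the L2 core network; red and green denote different VLANs.
Next come the individual Availability Zones, also called Data Center LANs.
The third layer, the top layer of each AZ, is called the Aggregation Layer.
This layer consists of aggregation routers or Layer 3 aggregation switches.
It also has IDS/IPS,
and Tier 2 Load Balancers.
The Aggregation Layer is an AZ's entry point from the outside and connects upward to the L2 Core Network.
The Border Routers and the Aggregation Routers are connected through the L2 Core Network, forming one large Layer 2 connection.
These two tiers of routers must run a routing protocol between them. The Aggregation Routers learn the Border Routers' routes so that machines in the AZ can reach the external network, and the Border Routers learn the Aggregation Routers' routes so that the external network can reach the public IPs inside the AZ.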
Routing Algorithm
Nonadaptive algorithms/static routing do not base their routing decisions on any measurements or estimates of the current topology and traffic
computed in advance, offline, and downloaded to the routers when the network is booted
Adaptive algorithms/dynamic routing change their routing decisions to reflect changes in the topology and the traffic
distance vector routing, RIP (Routing Information Protocol)
link state routing, OSPF (Open Shortest Path First)
Distance Vector Routing
Each router maintains a routing table with one entry for each router in the network.
Each entry has two parts: the preferred outgoing line for that destination,
and an estimate of the distance to that destination.
The router is assumed to know the ‘‘distance’’ to each of its neighbors.
Once every T msec, each router sends to each neighbor a list of its estimated distances to each destination. It also receives a similar list from each neighbor.
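If neighbor X reports that its estimated distance to router I is Xi, and my distance to X is M, then my distance to router I via X is Xi + M. The router computes this for every neighbor and keeps the smallest value, together with the corresponding outgoing line.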
the old routing table is not used in the calculation.
Convergence: after a number of rounds, the routes settle onto the best paths across the network.
It reacts rapidly to good news, but leisurely to bad news.
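The distance-vector update above, one round for a single router, sketched in Python. The router names, delays, and neighbor vectors are hypothetical; note that the router's previous table is not consulted, only the neighbors' latest vectors and the router's own link costs.

```python
import math


def dv_update(link_costs, neighbor_vectors, destinations):
    """Compute a fresh routing table from neighbors' distance vectors.

    link_costs: {neighbor: my cost (e.g. delay) to that neighbor}
    neighbor_vectors: {neighbor: {destination: neighbor's estimated distance}}
    Returns {destination: (estimated distance, outgoing neighbor)}.
    """
    table = {}
    for dest in destinations:
        best = (math.inf, None)
        for nbr, m in link_costs.items():
            x_i = neighbor_vectors.get(nbr, {}).get(dest, math.inf)
            if x_i + m < best[0]:  # distance via nbr is Xi + M
                best = (x_i + m, nbr)
        table[dest] = best
    return table


# Hypothetical router J with neighbors A (delay 8) and K (delay 6).
costs = {"A": 8, "K": 6}
vectors = {"A": {"A": 0, "B": 12, "G": 18},
           "K": {"K": 0, "B": 10, "G": 31}}
print(dv_update(costs, vectors, ["A", "B", "G", "K"]))
# {'A': (8, 'A'), 'B': (16, 'K'), 'G': (26, 'A'), 'K': (6, 'K')}
```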
Link State Routing
1. Discover its neighbors and learn their network addresses.
•When a router is booted, its first task is to learn who its neighbors are.
•It sends a special HELLO packet on each point-to-point line.
•The router on the other end will send back a reply giving its name (globally unique)
2. Set the distance or cost metric to each of its neighbors.
•To determine the delay:
–send over the line a special ECHO packet that the other side is required to send back immediately.
–By measuring the round-trip time and dividing it by two, the sending router obtains an estimate of the delay.
3. Construct a packet containing everything it has just learned.
•The packet contains:
–identity of the sender
–a sequence number and age
–a list of neighbors and the cost to each neighbor.
4. Send this packet to and receive packets from all other routers.
•Basic Algorithm
–Each packet contains a sequence number incremented for each new packet sent.
–Routers keep track of all the (source router, sequence) pairs they see.
–When a new link state packet comes in, it is checked against the list of packets already seen.
–If it is new, it is forwarded on all lines except the one it arrived on.
–If it is a duplicate, it is discarded.
–If a packet with a sequence number lower than the highest one seen so far ever arrives, it is rejected as being obsolete
•Problems:
–If a router ever crashes, it will lose track of its sequence number. If it starts again at 0, the next packet it sends will be rejected as a duplicate.
–If a sequence number is ever corrupted and 65,540 is received instead of 4 (a 1-bit error), packets 5 through 65,540 will be rejected as obsolete.
•Refine 1: for problems 1 and 2
–include the age of each packet after the sequence number
–decrement it once per second.
–When the age hits zero, the information from that router is discarded.
•If a router crashes and its sequence number restarts from 0, its new packets are rejected until the old entry in the list times out; after that, the new packets are accepted. Likewise, a corrupted sequence number such as 65,540 blocks newer packets only until its entry ages out.
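•Refine 2: making flooding more robust
–When a link state packet comes in to a router for flooding, it is not queued for transmission immediately.
–Instead, it is put in a holding area to wait a short while in case more links are coming up or going down.
–If another link state packet from the same source arrives before the first is transmitted, their sequence numbers are compared: if they are equal, the duplicate is discarded; if they differ, the older one is thrown out.
–To guard against errors on the links, all link state packets are acknowledged.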
The packet buffer for router B
•Each row here corresponds to a recently arrived, but as yet not fully processed, link state packet.
•The send flags mean that the packet must be sent on the indicated link.
•The acknowledgement flags mean that it must be acknowledged there.
5. Compute the shortest path to every other router.
•Once a router has accumulated a full set of link state packets, it can construct the entire network graph
•Dijkstra’s algorithm can be run locally to construct the shortest paths to all possible destinations.
•link state routing requires more memory and computation.
–For a network with n routers, each of which has k neighbors
–the memory required to store the input data is proportional to kn
–the computation time grows faster than kn
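Steps 4 and 5 can be sketched together as a toy link-state database in Python: `receive` applies the sequence-number rule (duplicates and obsolete packets are discarded), `tick` implements the age refinement, and `shortest_paths` runs Dijkstra's algorithm over the graph assembled from the stored packets. This is an illustration under simplifying assumptions (no actual flooding, holding area, or acknowledgements; class and method names are hypothetical), not a real OSPF implementation.

```python
import heapq


class LinkStateDB:
    """Toy link-state database: packet acceptance rule plus Dijkstra."""

    def __init__(self):
        # source router -> (sequence number, remaining age, {neighbor: cost})
        self.lsps = {}

    def receive(self, source, seq, age, links):
        """Store a new packet; return True if it is new and should be flooded on."""
        stored = self.lsps.get(source)
        if stored is not None and seq <= stored[0]:
            return False  # duplicate (equal seq) or obsolete (lower seq): discard
        self.lsps[source] = (seq, age, dict(links))
        return True

    def tick(self, seconds=1):
        """Decrement every age; forget a router's information when its age hits zero."""
        for source, (seq, age, links) in list(self.lsps.items()):
            if age - seconds <= 0:
                del self.lsps[source]
            else:
                self.lsps[source] = (seq, age - seconds, links)

    def shortest_paths(self, root):
        """Dijkstra over the graph assembled from all stored packets.

        Returns (distance, first hop) maps for every reachable router.
        """
        graph = {src: links for src, (_, _, links) in self.lsps.items()}
        dist, first_hop = {root: 0}, {root: None}
        heap, done = [(0, root)], set()
        while heap:
            d, node = heapq.heappop(heap)
            if node in done:
                continue  # stale heap entry
            done.add(node)
            for nbr, cost in graph.get(node, {}).items():
                if d + cost < dist.get(nbr, float("inf")):
                    dist[nbr] = d + cost
                    # Leaving the root, the first hop is the neighbor itself;
                    # otherwise it is inherited from the node we came through.
                    first_hop[nbr] = nbr if node == root else first_hop[node]
                    heapq.heappush(heap, (dist[nbr], nbr))
        return dist, first_hop


db = LinkStateDB()
db.receive("A", 1, 60, {"B": 2, "C": 5})
db.receive("B", 1, 60, {"A": 2, "C": 1})
db.receive("C", 1, 60, {"A": 5, "B": 1})
print(db.shortest_paths("A"))
# ({'A': 0, 'B': 2, 'C': 3}, {'A': None, 'B': 'B', 'C': 'B'})

# C crashes and restarts at sequence 0: rejected until the old entry ages out.
print(db.receive("C", 0, 60, {"A": 5, "B": 1}))  # False
db.tick(60)
print(db.receive("C", 0, 60, {"A": 5, "B": 1}))  # True
```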
Hierarchical Routing
•The routers are divided into regions.
•Each router knows all the details about how to route packets to destinations within its own region but knows nothing about the internal structure of other regions.
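A worked count of why the hierarchy helps, assuming a two-level hierarchy with equal-sized regions: each router keeps one entry per router in its own region plus one entry per other region. The 720-router figures are an illustrative example, and the function name is hypothetical.

```python
def entries_per_router(n_routers: int, n_regions: int) -> int:
    """Routing-table entries per router in a two-level hierarchy.

    Assumes equal-sized regions: one entry per router in the local region,
    plus one entry per other region.
    """
    if n_routers % n_regions:
        raise ValueError("this sketch assumes equal-sized regions")
    return n_routers // n_regions + (n_regions - 1)


print(entries_per_router(720, 1))   # 720 (flat: one entry per router)
print(entries_per_router(720, 24))  # 53 (30 local routers + 23 other regions)
```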
OSPF—An Interior Gateway Routing Protocol
•requirements
–Open: the algorithm is published in the open literature (the "O" in OSPF)
–support a variety of distance metrics (physical distance, delay)
–dynamic algorithm
–support routing based on type of service (IP has a Type of Service field, but in practice it went unused)
–load balancing: split the load over multiple lines instead of sending all packets over the single best route
–support for hierarchical systems
–Security : prevent routers from sending false routing information
–Can deal with routers that were connected to the Internet via a tunnel.
–OSPF supports both point-to-point links (e.g., SONET) and broadcast networks (e.g., most LANs).
–support networks with multiple routers, each of which can communicate directly with the others (called multi-access networks)
1. OSPF runs within a single autonomous system (AS).
2. It abstracts the actual networks, routers, and links into a directed graph.
3. Use the link state method to have every router compute the shortest path from itself to all other nodes.
•Multiple paths may be found that are equally short.
•OSPF remembers the set of shortest paths and during packet forwarding, traffic is split across them.
•This helps to balance load. It is called ECMP (Equal Cost MultiPath).
4. OSPF allows an AS to be divided into numbered areas
•Internal Routers : Routers that lie wholly within an area.
•Backbone Routers:
–Every AS has a backbone area, called area 0.
–The routers in this area are called backbone routers.
–All areas are connected to the backbone
•Area border router:
–A router connected to two or more areas. It must also be part of the backbone.
–Its job is to summarize the destinations in one area and to inject this summary into the other areas
–Passing cost information allows hosts in other areas to find the best area border router to use to enter an area.
–Not passing topology information reduces traffic and simplifies the shortest-path computations
•AS boundary router:
–It injects routes to external destinations on other ASes into the area.
–The external routes then appear as destinations that can be reached via the AS boundary router with some cost.
–An external route can be injected at one or more AS boundary routers.
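What an area border router's summarization might look like, sketched with Python's standard `ipaddress` module: contiguous prefixes in an area are collapsed into fewer covering prefixes before being injected into other areas. Giving each summary the largest cost among its component prefixes is an assumption of this sketch, and the prefixes and costs are hypothetical.

```python
import ipaddress


def summarize_area(costs):
    """Collapse an area's prefixes into covering summaries for other areas.

    costs maps each prefix in the area to its cost. Each summary is given the
    largest cost among the prefixes it covers (an assumption of this sketch).
    """
    nets = {ipaddress.ip_network(p): c for p, c in costs.items()}
    summaries = []
    for summary in ipaddress.collapse_addresses(nets):
        covered = [c for n, c in nets.items() if n.subnet_of(summary)]
        summaries.append((str(summary), max(covered)))
    return summaries


area1 = {"10.1.0.0/24": 10, "10.1.1.0/24": 20, "10.1.2.0/24": 10,
         "10.1.3.0/24": 30, "10.9.0.0/24": 5}
print(summarize_area(area1))  # [('10.1.0.0/22', 30), ('10.9.0.0/24', 5)]
```

Five component prefixes become two advertisements, and none of the area's internal topology is exposed.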
OSPF All in One
•Using flooding, each router informs all the other routers in its area of its links to other routers and networks and the cost of these links.
•This information allows each router to construct the graph for its area(s) and compute the shortest paths.
•The backbone area does this work, too.
•The backbone routers accept information from the area border routers in order to compute the best route from each backbone router to every other router.
•This information is propagated back to the area border routers, which advertise it within their areas.
•Using this information, internal routers can select the best route to a destination outside their area, including the best exit router to the backbone.
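The last step, choosing the exit toward another area, amounts to minimizing the cost to reach an area border router plus the cost that router advertises for the destination. A minimal sketch, with hypothetical router names and costs:

```python
def best_abr(cost_to_abr, advertised_cost):
    """Choose the area border router with the lowest total cost to a destination.

    cost_to_abr: intra-area cost from this router to each ABR.
    advertised_cost: cost each ABR advertises (summary) for the destination.
    """
    candidates = cost_to_abr.keys() & advertised_cost.keys()
    return min(candidates, key=lambda abr: cost_to_abr[abr] + advertised_cost[abr])


# Hypothetical: ABR2 is closer, but ABR1 gives the cheaper end-to-end path.
print(best_abr({"ABR1": 10, "ABR2": 3}, {"ABR1": 5, "ABR2": 20}))  # ABR1
```

The nearest ABR is not necessarily the best one, which is why area border routers pass cost information into the area.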
BGP—The Exterior Gateway Routing Protocol
•The goals of an intradomain protocol and an interdomain protocol are not the same.
–All an intradomain protocol has to do is move packets as efficiently as possible from the source to the destination.
–In contrast, interdomain routing protocols have to worry about politics
•A routing policy is implemented by deciding what traffic can flow over which of the links between ASes.
BGP – Transit Service
•The customer ISP can buy transit service from the provider ISP.
•Provider ISP
–It should advertise routes to all destinations on the Internet to the customer over the link that connects them
–so that the customer will have a route to use to send packets anywhere.
•Customer ISP
–the customer should advertise routes only to the destinations on its network to the provider.
–This will let the provider send traffic to the customer only for those addresses
–the customer does not want to handle traffic intended for other destinations.
BGP – Peering
•Suppose that AS2 and AS3 exchange a lot of traffic. If their networks are already connected, they can send traffic directly to each other for free.
•This policy is called peering.
•To implement peering, two ASes send routing advertisements to each other for the addresses that reside in their networks.
•AS2 can send AS3 packets from A destined to B and vice versa.
•Peering is not transitive.
–AS3 and AS4 also peer with each other.
–This peering allows traffic from C destined for B to be sent directly to AS4
–If C sends a packet to A, the traffic will not pass from AS4 to AS3 to AS2, even though a physical path exists.
–AS3 is not paid to carry traffic between its peers, so it does not want to; instead, AS1 carries the packet from C to A.
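The transit and peering policies above are often summarized as a per-relationship export rule (commonly attributed to Gao and Rexford). The sketch below encodes that rule; it is a model chosen for illustration, not something BGP itself enforces, and `Rel` and `should_export` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Rel(Enum):
    CUSTOMER = "customer"  # neighbor buys transit from us
    PEER = "peer"          # settlement-free peering
    PROVIDER = "provider"  # we buy transit from the neighbor


def should_export(learned_from: Optional[Rel], send_to: Rel) -> bool:
    """Decide whether a route may be advertised to a neighbor.

    learned_from is None for prefixes in our own network.
    """
    if learned_from is None or learned_from is Rel.CUSTOMER:
        return True  # our own and our customers' routes: advertise to everyone
    # Routes learned from a peer or a provider are advertised only to customers.
    return send_to is Rel.CUSTOMER


# AS3 learned B's prefix from peer AS2: it does not pass it on to peer AS4.
print(should_export(Rel.PEER, Rel.PEER))          # False
# A provider hands its full table to a paying customer.
print(should_export(Rel.PROVIDER, Rel.CUSTOMER))  # True
# A customer advertises only its own prefixes to its provider.
print(should_export(None, Rel.PROVIDER))          # True
print(should_export(Rel.PROVIDER, Rel.PROVIDER))  # False
```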
BGP – Multi-homing
•A is a stub network that is connected to the rest of the Internet by only one link, so it does not need to run BGP.
•Some company networks are connected to multiple ISPs, which is called multi-homing. They should run an interdomain routing protocol (e.g., BGP) to tell other ASes which addresses should be reached via which ISP links.
How BGP advertises routes
•Path vector protocol
•The path consists of the next hop router and the sequence of ASes, or AS path
•Pairs of BGP routers communicate with each other by establishing TCP connections.
•Carrying the complete path with the route makes it easy for the receiving router to detect and break routing loops.
–When a router receives a route, it checks to see if its own AS number is already in the AS path.
–If it is, a loop has been detected and the advertisement is discarded.
•BGP does not distinguish between different ASes: every AS in a path counts the same, regardless of its size.
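The AS-path mechanics can be sketched as two small functions: a receiving router rejects any route whose path already contains its own AS number, and a router prepends its AS number before passing a route to an eBGP neighbor. The AS numbers, prefix, and next-hop address are hypothetical, and the dictionary is a stand-in for a real BGP UPDATE message.

```python
def accept(my_as, as_path):
    """Loop check on receipt: reject a route whose AS path already contains my AS."""
    return my_as not in as_path


def advertise(my_as, prefix, next_hop, as_path):
    """Build the advertisement passed to an eBGP neighbor: prepend my AS number."""
    return {"prefix": prefix, "next_hop": next_hop, "as_path": [my_as] + list(as_path)}


route = {"prefix": "203.0.113.0/24", "as_path": [64501, 64502]}
if accept(64500, route["as_path"]):
    out = advertise(64500, route["prefix"], "198.51.100.1", route["as_path"])
    print(out["as_path"])  # [64500, 64501, 64502]

print(accept(64500, [64501, 64500, 64502]))  # False: loop detected
```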
iBGP
•So far we have seen how a route advertisement is sent across the link between two ISPs.
•We still need some way to propagate BGP routes from one side of the ISP to the other, so they can be sent on to the next ISP.
•This task could be handled by the intradomain protocol (IGP), but because BGP is very good at scaling to large networks, a variant of BGP is often used.
•It is called iBGP (internal BGP) to distinguish it from the regular use of BGP as eBGP (external BGP).
iBGP rules
•Each BGP router may learn a route for a given destination from the router it is connected to in the next ISP and from all of the other boundary routers
•Each router must decide which route in this set of routes is the best one to use.
iBGP strategies
•routes via peered networks are chosen in preference to routes via transit providers.
•By default, shorter AS paths are better, although AS path length is only a rough measure (a path through several small ASes may be shorter than one through a single large AS).
•prefer the route that has the lowest cost within the ISP. (early exit or hot-potato routing)
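The three strategies can be sketched as a ranking key applied in order. This is a deliberate simplification: real BGP implementations apply a longer decision process (for example, peer preference is typically expressed through the LOCAL_PREF attribute), and `Route` and `best_route` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Route:
    exit_router: str   # boundary router where the route enters our ISP
    learned_from: str  # "peer" or "transit"
    as_path: list      # AS numbers toward the destination
    igp_cost: int      # intra-ISP cost to reach exit_router


def best_route(routes):
    """Rank candidate routes by the three strategies, in order."""
    return min(
        routes,
        key=lambda r: (
            0 if r.learned_from == "peer" else 1,  # 1. peers before transit
            len(r.as_path),                        # 2. shorter AS path
            r.igp_cost,                            # 3. hot potato: nearest exit
        ),
    )


routes = [
    Route("R1", "transit", [64510, 64520], 5),
    Route("R2", "peer", [64530, 64540, 64520], 20),
    Route("R3", "peer", [64530, 64520], 30),
    Route("R4", "peer", [64550, 64520], 10),
]
print(best_route(routes).exit_router)  # R4
```

R1 is the nearest exit but is a transit route, and R2 is a peer route with a longer AS path; between the remaining peers R3 and R4, the hot-potato rule picks the cheaper exit.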
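The Aggregation Layer switches are often clustered as well, and connect to the access layer switches via multi-chassis LACP.
The fourth layer is the access layer.
It consists of racks of servers, connected together by Top of Rack (TOR) switches.
Top of Rack (TOR) vs. End of Row (EOR)
The fifth layer is called the storage layer.
Many data centers deploy a separate network for the storage system.
Machines connect to a SAN via iSCSI or Fibre Channel, which attaches block storage to them.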