
Saturday, 07 April 2012

Linux Network Troubleshooting

So you've installed Linux, but can't get your browser to see the outside world. Or everything was working, but all of a sudden, it's not. However do you track down the problem?
Interface Configuration
I always start by checking to see that the machine's interfaces are up and have an IP address. To do this, use the ifconfig command:
[root@sleipnir root]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:6E:0A:3D:26
         inet addr:  Bcast:  Mask:
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:13647 errors:0 dropped:0 overruns:0 frame:0
         TX packets:12020 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:100
         RX bytes:7513605 (7.1 Mb)  TX bytes:1535512 (1.4 Mb)

lo        Link encap:Local Loopback
         inet addr:  Mask:
         UP LOOPBACK RUNNING  MTU:16436  Metric:1
         RX packets:8744 errors:0 dropped:0 overruns:0 frame:0
         TX packets:8744 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:892258 (871.3 Kb)  TX bytes:892258 (871.3 Kb)

Here, you can see that this machine is running normally. The first line of output tells us that the Ethernet interface eth0 has a layer 2 (MAC or hardware) address of 00:0C:6E:0A:3D:26. This confirms that the device driver is able to connect to the card, as it has read the Ethernet address burned into the network card's ROM. The next line gives the interface's IP address, along with a subnet mask and broadcast address that are consistent with the network the machine is on.
The next line specifies the flags on the interface; this one is up and running (UP), has had the broadcast address set (BROADCAST), has resources allocated (RUNNING) and supports multicasting (MULTICAST). The MTU (maximum transmission unit) size is 1500 bytes, which is correct for DEC/Intel/Xerox Ethernet II, and the metric, or cost of using the interface, is the default and minimum value of 1. For full details of the flags see man 7 netdevice on your system.
The next couple of lines confirm that. On the receive side, 13,647 frames (that's what we call Ethernet packets) have been received, with no errors of any kind. And 12,020 frames have been sent, also without errors. There have been no collisions - there couldn't be, because this interface is connected to a 100 Mbps switch, not a hub - and the transmit queue length is 100.
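If you have several interfaces to check, the error counters can be pulled out mechanically. The following is only a sketch: iface_errors is a hypothetical helper name, and it assumes the old net-tools output format shown above ("RX packets:N errors:N ..."); newer versions of ifconfig lay these lines out differently and would need a different pattern.

```shell
# Hypothetical helper (not a standard tool): summarise the error counters
# from old-style net-tools ifconfig output read on stdin.
iface_errors() {
    awk '
        /^[^ \t]/     { iface = $1 }                      # unindented line: new interface block
        /RX packets:/ { split($3, f, ":"); rx = f[2] }    # $3 looks like "errors:0"
        /TX packets:/ { split($3, f, ":")
                        print iface, "rx_errors=" rx, "tx_errors=" f[2] }
    '
}
```

Typical use would be "ifconfig | iface_errors"; any non-zero count is a hint to look at cabling, duplex settings or the card itself.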
Networking Not Configured During Installation
This usually means that the installer program was not able to probe and identify your network card. Generally, this is because the network card is too old (many modern installers don't expect to see old-fashioned 10Base-2 coax network cards like the 3Com Etherlink II, for example) or is too new, so that there is no support for it in the kernel supplied as part of your distribution. It can also mean that the card needs to be manually configured - this often happens with ISA bus NE-2000 cards, EtherExpress Pros and the like.
First - if you know the type of card, take a look at /lib/modules/kernelver/kernel/drivers/net and see if you can identify the required driver. The 3Com Etherlink II, for example, uses the 3c503.o module, so I can attempt to load it with the command
modprobe 3c503

As usual, no messages indicates success. If you see something like:
[root@sleipnir net]# modprobe 3c503
/lib/modules/2.4.20-31.9/kernel/drivers/net/3c503.o: init_module: No such device or address
Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters.

then either it is the wrong driver module, or you need to pass some parameters for the driver. One of our machines here, for example, has an Intel EtherExpress Pro, and its driver module has to be loaded with the command line:
[root@baldur root]# modprobe eepro irq=5 io=0x300

This specifies the IRQ line and I/O port address the card is using.
If you're completely stuck and desperate, one trick that might help is to try to load all network card drivers in turn and see what sticks, with the command:
[root@baldur root]# modprobe -t net \*

Expect to see lots of error messages fly by as drivers load and fail to find the hardware they expect. Once it has finished, run the lsmod command and see whether any network card driver module has "stuck".
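Scanning a long lsmod listing by eye is error-prone, so a small filter can help. This is a minimal sketch, and module_loaded is a hypothetical helper name; it relies only on lsmod printing a header line followed by one module per line with the module name in the first column.

```shell
# Hypothetical helper: succeed if the named module appears in lsmod output
# read on stdin. The first line of lsmod output is a header and is skipped.
module_loaded() {
    awk -v mod="$1" 'NR > 1 && $1 == mod { found = 1 } END { exit !found }'
}
```

For example, "lsmod | module_loaded eepro && echo loaded" prints "loaded" only if the eepro driver stayed resident.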
If your network card is new or otherwise not supported by the kernel, then you'll have to obtain a driver from the manufacturer's web site. I recently worked with some machines that used an on-board Intel EtherExpress Gigabit Ethernet chipset, for example. We had to use Red Hat 9 on the machines, but the Anaconda installer simply skipped network configuration as it didn't "recognise" the interface. Although RH 9 has an e1000.o module, it obviously wasn't the right one for this interface, so I had to visit the Intel web site and download the driver source code, compile it and install it manually. The good news is that Intel supports easy building of an RPM package, and once that has been done and copied to floppy, it's a snap to install on the other machines.
Once you've identified (and installed) the correct driver module, you can make it permanent by loading it from within /etc/modules.conf (2.4 kernels) or /etc/modprobe.conf (2.6 kernels). For example, the syntax to load the eepro.o module with the parameters above looks like this:
alias eth0 eepro
options eepro irq=5 io=0x300

You should now be able to set up an IP address on the interface as described in the following sections.
Incorrect Address, Broadcast Address, Subnet Mask
Problems with an interface showing the wrong IP address, broadcast address or subnet mask are usually down to an incorrect entry in the configuration files. You can fix the problem by reconfiguring an interface on the fly with the ifconfig command. For example, to give eth0 a new address, a command of the form
ifconfig eth0 <address>

will do the trick. This assumes that your network is Class-C sized (/24), which it probably is, and so it sets the subnet mask to 255.255.255.0 and derives the network and broadcast addresses to match. However, you can override these settings; for example, I use four /24 subnets - in other words, a /22 - so I could configure the interface like this:
ifconfig eth0 <address> netmask 255.255.252.0

However, this is only temporary - it will remain in effect until the machine is shut down or the interface is reconfigured. In order to configure the interface on power-up, the system runs startup scripts which will read the configuration from a file somewhere. On systems that use System V-style init scripts, this is often a file in /etc/sysconfig/network-scripts, e.g. /etc/sysconfig/network-scripts/ifcfg-eth0. Here's an example from one of my systems:
[root@sleipnir root]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
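As an illustration only (hypothetical values from the 192.0.2.0/24 documentation range, not the contents of a real system's file), a statically configured interface file on a Red Hat-style system generally looks something like this:

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 (hypothetical static example)
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.0.2.10
NETMASK=255.255.255.0
GATEWAY=192.0.2.1
```

The file is read by the startup scripts as a list of shell variable assignments, which is why there are no spaces around the equals signs.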

You can see that this sets the IP address and subnet mask (the broadcast and network addresses are implicitly derived from these). It also sets the default route on this interface. If you are running Slackware, you will need to edit /etc/rc.d/rc.inet1 in order to set the interface parameters.
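The way the netmask, network address and broadcast address follow from an address and a prefix length can be sketched in shell arithmetic. This is an illustrative sketch, not part of any standard tool: the function names are hypothetical, the addresses come from the 192.0.2.0/24 documentation range, and it assumes a shell with 64-bit arithmetic (the norm on current systems).

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() (
    IFS=.
    set -- $1                      # split the address into its four octets
    echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
)

# Convert an integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# subnet_info ADDRESS PREFIXLEN: print the netmask, network and broadcast.
subnet_info() (
    addr=$(ip_to_int "$1")
    mask=$(( (0xFFFFFFFF << (32 - $2)) & 0xFFFFFFFF ))   # PREFIXLEN one-bits from the top
    net=$(( addr & mask ))                                # host bits cleared
    bcast=$(( net | (~mask & 0xFFFFFFFF) ))               # host bits all set
    echo "netmask $(int_to_ip "$mask") network $(int_to_ip "$net") broadcast $(int_to_ip "$bcast")"
)
```

Running "subnet_info 192.0.2.130 24" prints "netmask 255.255.255.0 network 192.0.2.0 broadcast 192.0.2.255", while the same address with a prefix length of 22 gives "netmask 255.255.252.0 network 192.0.0.0 broadcast 192.0.3.255", showing how a /22 spans four /24 networks.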
Configuration by DHCP
If your network interface is configured by a DHCP server, then the configuration script above will not specify the IP address, etc. Instead, it will say something like this:
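For illustration (a hypothetical file, not copied from a real system), a DHCP-configured ifcfg-eth0 on a Red Hat-style system might contain:

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 (hypothetical DHCP example)
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
PEERDNS=yes
```

Note that there is no IPADDR or NETMASK entry; those values are supplied by the DHCP server when the lease is granted.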

With BOOTPROTO set to dhcp, the startup scripts will run a DHCP client program to retrieve the interface configuration from the DHCP server. There are several different DHCP clients that have been used over the years and by different distributions. If your interface has not come up correctly, you can attempt to obtain a DHCP lease with the following command:
[root@sleipnir root]# dhclient eth0
Internet Software Consortium DHCP Client V3.0pl1
Copyright 1995-2001 Internet Software Consortium.
All rights reserved.
For info, please visit

Listening on LPF/eth0/00:0c:6e:0a:3d:26
Sending on   LPF/eth0/00:0c:6e:0a:3d:26
Sending on   Socket/fallback
DHCPREQUEST on eth0 to port 67
bound to -- renewal in 4215 seconds.

If the dhclient command is not found, then you should try "pump -i eth0" and "dhcpcd eth0". If one of these commands works, and you get an IP address, then you should set about making the DHCP configuration permanent, using your distribution's configuration tools or by directly editing the network startup scripts.
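Working out which of the three clients is installed can be scripted. The sketch below is an assumption-laden convenience, not a standard utility: find_dhcp_client is a hypothetical name, and it only prints the command line to try (in the same order as above) rather than running it.

```shell
# Hypothetical helper: print the command line for the first DHCP client
# found on the PATH, trying dhclient, then pump, then dhcpcd.
find_dhcp_client() {
    iface=${1:-eth0}               # interface to configure; defaults to eth0
    if command -v dhclient >/dev/null 2>&1; then
        echo "dhclient $iface"
    elif command -v pump >/dev/null 2>&1; then
        echo "pump -i $iface"
    elif command -v dhcpcd >/dev/null 2>&1; then
        echo "dhcpcd $iface"
    else
        echo "no DHCP client found" >&2
        return 1
    fi
}
```

Run it as "find_dhcp_client eth0", check the suggestion, and then run the printed command as root.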
Network Reachability (arp, ping)
Once you've got an IP address on an interface, the next step is to test whether you can reach remote systems. I always start by pinging the system's local interface, then the local router and then some distant systems. For example:
[root@sleipnir root]# ifconfig|grep inet
         inet addr:  Bcast:  Mask:
         inet addr:  Mask:
[root@sleipnir root]# # Ping the local interface first
[root@sleipnir root]# ping -c 2
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.041 ms
64 bytes from icmp_seq=2 ttl=64 time=0.042 ms

--- ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.041/0.041/0.042/0.006 ms
[root@sleipnir root]# # Now ping the upstream router
[root@sleipnir root]# ping -c 2
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.368 ms
64 bytes from icmp_seq=2 ttl=64 time=0.282 ms

--- ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.282/0.325/0.368/0.043 ms
[root@sleipnir root]# # Finally, ping a remote host
[root@sleipnir root]# ping -c 2
PING ( 56(84) bytes of data.

--- ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

The ping command on UNIX systems defaults to sending a packet every second until you press Ctrl+C to interrupt it. Using the "-c 2" option makes it send two packets and then stop. In this example, you can see that the attempt to ping a remote system has failed, with 100% packet loss. What's going on here?
Don't assume too much from the ping command. It only indicates that a destination host is up and running, and that it is reachable. It's possible that the machine's OS might have crashed, but still be responding to interrupts, or that the particular daemon or network service that you want to use has crashed or is unable to respond. It might also happen that the wrong host replies - I've seen that happen when someone accidentally sets a machine to the wrong IP address. And of course, the lack of a reply doesn't necessarily indicate a problem, either. In the example above, the destination machine has firewall rules and will not respond to ICMP echo requests (pings) for security reasons, even though it's up and running. Make sure that you try a few external hosts, ideally ones that you know will respond to ICMP echo requests.
If you do not get back a reply from the local interface, something is really very wrong - the steps listed above using the ifconfig command should have given you a working local interface and you should go back and double-check them.
If you do not get back a reply from the upstream router, try some other machines on the local network. If you still do not get a response, then use the arp command to see whether your machine was able to work out their Ethernet (MAC) addresses.
Here's what happens when you've got a machine disconnected or down:
[root@sleipnir root]# ping baldur
PING ( 56(84) bytes of data.

--- ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1012ms

[root@sleipnir root]# arp -a
( at <incomplete> on eth0
( at 00:10:A4:82:99:C1 [ether] on eth0
( at 00:10:B5:39:E4:18 [ether] on eth0
( at 00:40:F4:3C:85:6A [ether] on eth0

When I gave the "ping baldur" command, my machine looked up baldur's IP address correctly (more on that shortly) and, since both machines are on the same network, it referred to its ARP cache to get the corresponding Ethernet address. It then sent out the ICMP echo request datagrams, but got no replies (in this case because the machine had been unplugged while we did some work on the nearby cabling). It also purged the Ethernet address out of its cache, as shown by the arp -a command.
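On a busy network the ARP cache can be long, and the unresolved entries are the interesting ones. As a rough sketch (arp_incomplete is a hypothetical helper name, and it assumes the "host (address) at <incomplete> on iface" layout that arp -a prints on Linux), they can be picked out like this:

```shell
# Hypothetical helper: list the IP addresses whose MAC address could not be
# resolved, from "arp -a" (or "arp -an") output read on stdin.
arp_incomplete() {
    awk '$4 == "<incomplete>" { gsub(/[()]/, "", $2); print $2 }'
}
```

Used as "arp -an | arp_incomplete", it prints one address per line; an empty result means every cached neighbour answered its ARP request.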
If baldur's Ethernet address hadn't been in the ARP cache in the first place, we would have seen this:
[root@sleipnir root]# ping -c 2 baldur
PING ( 56(84) bytes of data.
From ( icmp_seq=1 Destination Host Unreachable
From ( icmp_seq=2 Destination Host Unreachable

--- ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1013ms
, pipe 2
[root@sleipnir root]# arp -a
( at <incomplete> on eth0
( at 00:10:A4:82:99:C1 [ether] on eth0
( at 00:10:B5:39:E4:18 [ether] on eth0
( at 00:40:F4:3C:85:6A [ether] on eth0

The "Destination Host Unreachable" messages are a clue that this machine could not even send the ICMP datagrams in the first place.
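The sequence used in this section (local interface first, then the router, then something distant) can be wrapped in a small function. This is a sketch under assumptions: staged_ping is a hypothetical name, you supply the addresses yourself in nearest-first order, and, as noted above, a failure may simply mean that the target filters ICMP echo requests.

```shell
# Hypothetical helper: ping each target in the order given and stop at the
# first one that does not answer, so the failing stage is obvious.
staged_ping() {
    for target in "$@"; do
        # -n skips reverse DNS lookups; -c 2 sends two packets and stops
        if ping -n -c 2 "$target" >/dev/null 2>&1; then
            echo "ok $target"
        else
            echo "FAIL $target"
            return 1
        fi
    done
}
```

Call it with your own addresses, for example "staged_ping 127.0.0.1 <local address> <router address> <remote address>"; the first FAIL line tells you which stage to investigate.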
Routing (route add, route, netstat, traceroute)
If you can ping machines on the local network but cannot get to all or some destinations on remote networks, then the problem could be in the routing table of your machine or an upstream router. You can add and remove routes with the route add and route delete commands, and display them with route:
[root@dvalin root]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
artemis.rr.lesb *                               UH    0      0        0 ppp0
                *                               U     0      0        0 eth1
                *                               U     0      0        0 wlan0
                *                               U     0      0        0 eth0
                *                               U     0      0        0 lo
default         midgard.lesbell         UG    0      0        0 eth1

This is a more complex routing table than the average - it's for a firewall machine that has two Ethernet interfaces (eth0 and eth1), a Wi-Fi interface (wlan0) and a PPP connection to a dial-up host (ppp0). Here you can see the route to artemis (the dial-in host) has the H flag set, indicating this is a host route (i.e. a route to a single host rather than a network, as the netmask value confirms). There's an entry for the DMZ network on eth1, an entry for the wireless network on wlan0 and one for the private Ethernet LAN on eth0. You can also see the entry for the loopback interface, lo, and finally the default entry, which passes datagrams for all other destinations to the external router, midgard. This entry has the G flag turned on, indicating it routes via a gateway - naturally, since that's what midgard is.
You can diagnose routing problems by using the traceroute command. This will show the routers through which your datagrams will pass on the way to the destination. However, as with ping, don't assume too much - it's possible that routers along the path may be configured not to respond to traceroute. For example:
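A missing default entry is the most common routing fault on an ordinary workstation, and it is easy to test for. The following is a minimal sketch: default_route is a hypothetical helper name, and it assumes the numeric column layout printed by "route -n", where the default destination appears as 0.0.0.0.

```shell
# Hypothetical helper: print the gateway and interface of the default route
# from "route -n" output on stdin; exit non-zero if there is none.
default_route() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2, $8; found = 1 }
         END { exit !found }'
}
```

For example, "route -n | default_route || echo 'no default route'" either shows where non-local traffic is being sent or tells you that one needs to be added with route add default gw.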
[root@sleipnir root]# traceroute
traceroute to (, 30 hops max, 38 byte packets
1  dvalin (  0.705 ms  0.289 ms  0.391 ms
2 (  1.829 ms  1.553 ms  1.673 ms
3 (  18.125 ms  17.887 ms  19.615 ms
4 (  20.833 ms  20.587 ms  19.583 ms
5 (  21.840 ms  21.336 ms  20.195 ms
6  * * *
7  * * *

[root@sleipnir root]#

Each line shows the next router along the path to the destination, along with the round-trip-times for three queries and responses. You can see the firewall and upstream router in my office, then a router at Telstra's Kent Street exchange, then the gigabit Ethernet backbone there, then a link to austra426 - and then everything goes quiet. The three asterisks indicate a timeout, and after a couple of these, I pressed Ctrl+C to stop traceroute. This doesn't mean that I can't get to the PC User website - just that a firewall along the way is not responding to the UDP datagrams sent by traceroute. In any case, I can see that the link from my office into Telstra's network is working OK.
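When a trace is long, it can help to pull out the hop at which replies stop. This is only a sketch: first_silent_hop is a hypothetical helper name, and it assumes the usual three-probe output in which a hop that times out completely is printed as the hop number followed by three asterisks.

```shell
# Hypothetical helper: print the number of the first hop at which all three
# probes timed out, reading traceroute output on stdin; exit non-zero if
# every hop answered.
first_silent_hop() {
    awk '$2 == "*" && $3 == "*" && $4 == "*" { print $1; found = 1; exit }
         END { exit !found }'
}
```

Used as "traceroute -n <host> | first_silent_hop", it reports where the path goes quiet. Remember that this marks where replies stop, not necessarily where forwarding stops.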
When tracerouting to distant hosts, you can often see the round-trip-time jump as you cross the trans-Pacific link. For example:
[root@sleipnir root]# traceroute
traceroute: Warning: has multiple addresses; using
traceroute to (, 30 hops max, 38 byte packets
1  dvalin (  0.739 ms  0.364 ms  0.265 ms
2 (  1.890 ms  1.315 ms  1.292 ms
3 (  17.909 ms  19.071 ms  19.581 ms
4 (  31.446 ms  208.883 ms  171.406 ms
5 (  20.327 ms  20.912 ms  18.271 ms
6 (  20.309 ms  20.360 ms  19.721 ms
7 (  169.611 ms  170.647 ms  168.839 ms
8 (  191.602 ms  188.565 ms  177.935 ms
9 (  201.521 ms  200.113 ms  202.050 ms
10 (  201.732 ms  201.403 ms  201.674 ms
11 (  228.770 ms  227.226 ms  229.076 ms
12 (  228.021 ms  228.383 ms  237.262 ms
13 (  226.879 ms  228.772 ms  230.193 ms

[root@sleipnir root]#

Notice how the RTT jumps from around 20 ms to 170 ms as it crosses the Pacific. By the way, there are fancy graphical traceroute programs - for example, xtraceroute - that draw a world map to show where your packets are going, but they're really not much help for troubleshooting.
DNS Configuration (host, dig)
A common problem that afflicts users on dial-up connections, and sometimes those with DHCP-allocated IP addresses on broadband, is an inability to resolve names into IP addresses. Symptoms include popup error messages from your browser, such as "hostname could not be found. Please check the name and try again" or command-line error messages like "Temporary failure in name resolution".
Your machine will turn names into IP addresses by asking a Domain Name Server, and the IP addresses of one or more DNS servers will be set up in the file /etc/resolv.conf. It should look something like this:

In this example, there are two domain name servers listed, but you can have between one and three (by default) entries. If you are seeing errors in resolving hostnames, first check that this file exists, and then try pinging the DNS servers listed.
If the file does not exist, you can manually create it, using the IP addresses of your own DNS servers or those provided by your ISP. However, if your system is configured by DHCP - e.g. you are using a consumer-grade ADSL or cable modem connection - or is configured by a dial-up PPP connection, then the file is normally created or overwritten when the connection is set up. If this is not happening, you will need to check the documentation for your DHCP client program or PPP configuration to try to determine what the problem is. In general, for DHCP you should check your ifcfg-eth0 file for the presence of a "PEERDNS=yes" statement, while for a PPP connection, check /etc/ppp/options for a "usepeerdns" statement.
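Each server appears in /etc/resolv.conf on its own line as the keyword nameserver followed by an IP address, so the list is easy to extract for testing. A minimal sketch, with list_nameservers as a hypothetical helper name:

```shell
# Hypothetical helper: print the DNS server addresses listed in a
# resolv.conf-style file (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

A loop such as "for ns in $(list_nameservers); do ping -n -c 2 $ns; done" then pings each server in turn. If the function prints nothing, the file has no nameserver lines and name resolution cannot work.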
You can check whether your system is able to connect to a DNS and resolve a hostname into an IP address with the host command:
[les@sleipnir les]$ host fulbert
 has address

If you need to get down to low-level debugging of DNS lookups, then you really should learn the nuances of the dig command. This will let you query any name server for all kinds of information the DNS service can provide:
[les@sleipnir les]$ dig

; <<>> DiG 9.2.1 <<>>
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17821
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 1

;             IN      A

;; ANSWER SECTION:
      36417   IN      A

;; AUTHORITY SECTION:
          36417   IN      NS
          36417   IN      NS
          36417   IN      NS

;; ADDITIONAL SECTION:
        104627  IN      A

;; Query time: 3 msec
;; WHEN: Wed Nov 10 14:17:27 2004
;; MSG SIZE  rcvd: 152

As you can see, dig will provide a lot of data, including details of the query it sent, the answer it received and how it got the information. Another common use of dig is for turning IP addresses into host names by doing a reverse lookup:
[les@sleipnir les]$ dig -x

; <<>> DiG 9.2.1 <<>> -x
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41638
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 1

;    IN      PTR


;; AUTHORITY SECTION:
 86400  IN      NS
 86400  IN      NS
 86400  IN      NS

;; ADDITIONAL SECTION:
        104486  IN      A

;; Query time: 2087 msec
;; WHEN: Wed Nov 10 14:19:48 2004
;; MSG SIZE  rcvd: 173

Common Errors
Firewall in the way
Quite often, everything is configured fine at your end, but between you and the system you are trying to connect to, there is a firewall. If it's not your firewall, there's not a lot you can do about that - although Linux boasts an interesting armoury of tools for getting past firewalls, their use is beyond the scope of this article. If you're trying to connect to your own server, check that you have added a firewall rule to allow access to the appropriate protocol and port number - you can use the iptables -L command to dump your firewall rules or edit your firewall configuration file.
Daemon not listening
If you can ping a machine, but cannot connect to a specific service, and you've eliminated firewall rules as a problem, then check that the daemon you are trying to connect to is in fact running. Use the ps aux command to list running processes, and/or use the netstat -pat and netstat -pau commands to list processes that are listening on TCP and UDP sockets, respectively. If you don't see the daemon you need, then start it, with a command like service httpd start or apachectl start, depending on your Linux distribution.
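Checking for a listener on one particular TCP port can be done with a small filter over netstat output. This is a sketch under assumptions: port_listening is a hypothetical helper name, and it expects the column layout of "netstat -tln" (listening TCP sockets, numeric addresses), where the local address is the fourth column and the state is the sixth.

```shell
# Hypothetical helper: succeed if some socket in "netstat -tln" output
# (read on stdin) is in the LISTEN state on the given TCP port.
port_listening() {
    awk -v port="$1" '$6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
                      END { exit !found }'
}
```

For example, "netstat -tln | port_listening 80 || echo 'nothing listening on port 80'" tells you quickly whether the web server is worth looking for in the ps aux listing.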
Remember, packets have to flow both ways!
When setting up routing tables in more complex internetworks, a common failing is to think about how datagrams get routed away from where you are sitting - but don't forget that replies have to be routed back again! This may require your host or subnet to be added to the routing table of "upstream" routers and firewalls.
Linux network configuration offers lots of options, but most distributions provide graphical configuration tools to keep things relatively simple. When things don't work out, especially for more complex configurations like servers, routers and firewalls, there's a huge selection of diagnostic tools that you can use to sort the problem out.
Hint Sidebar
Most Linux network-related commands will try to display hostnames rather than IP addresses, and they do this by doing a reverse DNS lookup. However, if you're using these commands because you've got some network problem, there's a good chance that your DNS is unreachable, and this will make the commands run very slowly as they try to do the reverse lookups and time out. Fortunately, most of these commands accept the -n (numeric) option, which displays addresses and dispenses with the reverse DNS lookup so that things run a lot faster. See the difference:
[root@sleipnir root]# ping -nc2
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.127 ms
64 bytes from icmp_seq=2 ttl=64 time=0.285 ms

--- ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.127/0.206/0.285/0.079 ms

Jargon File
ARP Address Resolution Protocol - converts IP addresses into MAC addresses by broadcasting a "Who has this IP address?" request, seeing who replies, and then remembering the replying MAC address for future use.
ICMP The Internet Control Message Protocol. Routers and hosts generate messages in this protocol when they have problems. ICMP datagrams can indicate network congestion, destination unreachable for a variety of reasons, routing table errors and other problems. Perhaps the best known ICMP datagrams are the "echo request" and "echo reply" packets which are sent and received by the ping command.
IP Address A 32-bit address used by the Internet Protocol. This allows communication between networks, since the IP address can be broken down into a network part, subnetwork part and the final host address - in much the same way as a phone number (in the US at least) consists of an area code, an exchange within that area, and then the line that connects to the final phone.
MAC Address Also known as a hardware address or Ethernet address, this is the 48-bit address used by Ethernet cards to talk directly over a local area network. Other LAN protocols such as Token Ring and 802.11 wireless LANs also use 48-bit addresses in the same way. You cannot communicate with a network card on a different network using this address, which is why IP addresses are required.
TCP Transmission Control Protocol - a connection-oriented protocol used by services that require reliable transfer of variable length data, such as email, ftp and the web.
UDP User Datagram Protocol - a connectionless protocol used by some low-level system services like DNS, as well as multimedia applications like Real Audio.
