Network management best practices: Collecting essential data to optimise network performance

This interview covers network management accounting and performance strategies, and explains how to collect essential data from your network to optimise performance.

When network performance and accounting technologies overwhelm both Network Management Systems (NMS) applications and network managers, ask this question: What is the purpose of collecting your data?

We interviewed Cisco Distinguished Engineer Benoit Claise and Cisco's Senior Manager of Consulting Engineering Ralph Wolter, authors of Network Management: Accounting and Performance Strategies, to bring you information on how to simplify network management and optimise network performance.

How is performance management best approached on enterprise networks?

Benoit Claise and Ralph Wolter: The best approach to performance management depends less on the classification of "enterprise versus service provider" and more on the network size and -- most important -- the business requirements. If the only objective is to get simple performance statistics, and the administrator knows exactly which parameters need to be collected and how to collect them, a tool such as MRTG may be the right choice. Enterprises requiring performance statistics, baseline reports or service level reports either want these reports "out of the box" or need customised reporting, which leads to more complex network management application suites such as HP OpenView, IBM Tivoli or others. In both cases, the key to success is effective device instrumentation!

How can a network manager implement effective device instrumentation?

Claise and Wolter: By identifying what is relevant to collect, which links directly to the business case: What is the problem you are trying to solve by collecting accounting and performance records?

The following diagram from our book Network Management: Accounting and Performance Strategies offers guidance on how to get started:

Are there open source equivalents of these tools, and why would or wouldn't you recommend them?

Claise and Wolter: As mentioned above, MRTG is an excellent open source tool for monitoring. The advantage is that it is free of charge; the disadvantage is the lack of support and a potentially longer learning curve. CAIDA offers a set of very nice applications, e.g., cflowd for collecting NetFlow records.

In the decision phase between open source and commercial software, one should avoid comparing apples and oranges. Although open source software is available free of charge, it has operational costs directly related to it, such as acquiring the knowledge of what data to collect and how to collect it, configuring and customising the application, and so on. So-called "commercial off-the-shelf" (COTS) products offer the benefit of pre-defined reports, pre-tested and integrated functionality, and built-in knowledge about the collection procedures. Integrated knowledge is offered not only by NMS application software but also by collection devices such as the Cisco NAM (Network Analysis Module), which collects a variety of details, such as RMON, ART, NetFlow and IP telephony statistics, and presents them in a single GUI.

How do you identify the different application types in the network?

Claise and Wolter: There are two approaches. The first is static application recognition, using NetFlow for monitoring or access control lists (ACLs) to map applications via well-known ports to different Quality of Service (QoS) classes. The second is stateful application recognition via Deep Packet Inspection (DPI), where the network element inspects data packets, including the payload, and compares the data with a set of pre-defined traffic patterns.

Method 1 is straightforward and has a limited CPU impact at the device. It is limited, however, as it cannot identify applications that dynamically change ports. Method 2 is very flexible but has a certain CPU impact, unless implemented in hardware (ASICs). Examples of Method 2 are Cisco NBAR (Network Based Application Recognition) and the Cisco Service Control Engine (SCE). Note that the Cat6500 PISA Supervisor card implements NBAR collection in hardware.
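
As a rough illustration of Method 1, the sketch below maps a flow's protocol and port numbers to an application name and QoS class by table lookup. The table contents, the class names and the function name `classify` are assumptions made for this example; they are not taken from the book or from any Cisco configuration.

```python
# Hypothetical port-to-class table; a real deployment defines its own mapping.
WELL_KNOWN_PORTS = {
    ("tcp", 22): ("ssh", "management"),
    ("tcp", 25): ("smtp", "bulk"),
    ("udp", 53): ("dns", "network-control"),
    ("tcp", 80): ("http", "best-effort"),
    ("tcp", 443): ("https", "best-effort"),
    ("udp", 5060): ("sip", "voice-signalling"),
}


def classify(protocol, src_port, dst_port):
    """Return (application, qos_class) using a static well-known-port lookup."""
    proto = protocol.lower()
    # Check the destination port first, then the source port (return traffic).
    for port in (dst_port, src_port):
        match = WELL_KNOWN_PORTS.get((proto, port))
        if match is not None:
            return match
    # Applications on dynamic or non-standard ports cannot be recognised here.
    return ("unknown", "best-effort")
```

The last branch shows the limitation described above: a flow that uses dynamically negotiated ports falls through to "unknown", which is the case that payload inspection is meant to cover.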

In your book, you mention that there are types of data that are nice to have and types of data that are necessary in order to evaluate your network. What metrics are additional as opposed to necessary?

Claise and Wolter: In most cases, this depends on the type of network that the enterprise has built. For example, if the network has different QoS classes configured, the DSCP/ToS field is a must-have field. Otherwise, it is just nice to have for potential future network redesigns.

Another example is traffic monitoring: If I need a report that shows only the destination of traffic, the IP destination address is required, while the IP source address is not necessarily required. Unless application statistics are monitored, the source and destination port numbers need not necessarily be collected. Some large enterprises run their own Multiprotocol Label Switching (MPLS) networks and collect statistics for core traffic analysis. In this case, the Forwarding Equivalence Class (FEC -- typically this is the destination IP address at the exit point of the MPLS cloud) field is necessary to distinguish traffic with a destination in the MPLS cloud from traversing traffic that is destined for other networks. In this scenario, the MPLS label is nice to have but does not add much value.
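
The destination-only report mentioned above can be sketched as a simple aggregation. The record layout (dictionaries with `dst_addr` and `bytes` keys) is an assumption for illustration and does not correspond to a specific NetFlow export format.

```python
from collections import defaultdict


def bytes_per_destination(flow_records):
    """Sum byte counts per destination address.

    Only the destination address and byte count are read; any other
    field in a record (source address, ports, ...) is ignored.
    """
    totals = defaultdict(int)
    for record in flow_records:
        totals[record["dst_addr"]] += record["bytes"]
    return dict(totals)


# Hypothetical sample records; the source address is optional for this report.
flows = [
    {"dst_addr": "10.0.0.5", "bytes": 1200, "src_addr": "10.1.1.1"},
    {"dst_addr": "10.0.0.9", "bytes": 300},
    {"dst_addr": "10.0.0.5", "bytes": 800},
]

print(bytes_per_destination(flows))  # {'10.0.0.5': 2000, '10.0.0.9': 300}
```

Because the report never reads the source address or port numbers, collecting them adds volume without adding value for this particular business case.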

For an enterprise network, what kind of data is essential to have for network performance or otherwise?

Claise and Wolter: This question can easily be answered after identifying which of the following three categories are relevant:

  • Device performance monitoring:
    • Interface and subinterface utilisation
    • Per class of service utilisation
    • Traffic per application
  • Network performance monitoring:
    • Communication patterns in the network
    • Path utilisation between devices in the network
  • Service performance monitoring:
    • Traffic per server
    • Traffic per service
    • Traffic per application
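
Interface utilisation, the first item under device performance monitoring, is commonly derived from two samples of an octet counter (for example `ifInOctets` from the IF-MIB) and the interface speed. The sketch below is one way to do the arithmetic; the function name and parameters are assumptions, and it assumes the counter wrapped at most once between the two samples.

```python
def interface_utilisation(octets_start, octets_end, interval_s,
                          if_speed_bps, counter_bits=32):
    """Return utilisation in percent between two octet-counter samples.

    Assumes at most one counter wrap during the polling interval.
    """
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    modulus = 2 ** counter_bits
    # The modulo handles a single wrap of the 32- or 64-bit counter.
    delta_octets = (octets_end - octets_start) % modulus
    bits_transferred = delta_octets * 8
    return bits_transferred * 100.0 / (interval_s * if_speed_bps)
```

For example, 37,500,000 octets received in 300 seconds on a 10 Mbit/s link works out to 10 percent utilisation. On fast links a 32-bit counter can wrap more than once per polling interval, in which case a 64-bit counter or a shorter interval is needed.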

How can a network administrator check connectivity proactively?

Claise and Wolter: There are two approaches to this: Either the NMS application monitors all devices, interfaces and other details regularly, or the network elements monitor themselves by applying self-management concepts. Note that the device self-monitoring still works even if the connection to the NMS application is cut off. Best current practice suggests using an approach that combines both methods.
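
The NMS-side half of this approach can be sketched as a regular reachability poll. The example below uses a TCP connection attempt as the probe, which is an assumption made for simplicity; an NMS would more typically rely on ICMP or SNMP. The function names and target list are hypothetical, and device self-monitoring is not shown.

```python
import socket


def tcp_reachable(host, port, timeout_s=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def poll(targets, probe=tcp_reachable):
    """Probe every (host, port) target once and return {target: reachable}."""
    return {(host, port): probe(host, port) for host, port in targets}


def unreachable(results):
    """Return the sorted list of targets whose last probe failed."""
    return sorted(target for target, ok in results.items() if not ok)
```

Calling `poll` on a schedule and alerting on the output of `unreachable` gives the central view. If the NMS itself loses connectivity, every target appears unreachable, which is why the answer above recommends combining polling with device self-monitoring.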

There seem to be countless data collection features. Which of these are used by network managers to manage network performance, and is there an advantage of one over another, depending on the size of your company?

Claise and Wolter: In the book, we distinguish between data collection features (RMON, NBAR, ART/APM, NetFlow, IP SLA, etc.) and transport protocols (such as SNMP, NetFlow, FTP, RADIUS, and XML). Most of these features use SNMP as the transport protocol (RMON, NBAR, ART/APM, IP SLA), which means that there is little choice of transport protocol. The network planner has a choice only if multiple export protocols are available; for example, NetFlow data can be exported in a push model as detailed records (via the Cisco NetFlow or IETF IPFIX protocol) or summarised in a MIB with a pull model. Therefore, a feature comparison is mainly related to functionality and less to the transport protocol.

If there were one rule you would want the readers of your book to follow to make them better network managers, what would it be?

Claise and Wolter: The book is founded on this question: What is the purpose of collecting data? State-of-the-art accounting and performance technologies collect so many details that both NMS applications and network managers can be overwhelmed by the amount of data collected. A good network manager always starts by asking the following questions:

  1. What is the business case?
  2. What information is required to solve the request?
  3. What is the best technology to gather the right amount of detail?
  4. How and where do I collect the data records?
  5. What is the right NMS application to provide the answers to questions 1 through 4?

Note that the flow chart from the second question provides additional answers to this question.
