This document presents some of the fundamentals of system management to help readers become familiar with the basic functionality of a system management product.
System management, in general, is a set of activities in which IT personnel use a variety of tools, applications, and devices to monitor and maintain information technology systems. System management means different things to different people. For the CxO of an organization, it means being able to ensure that the enterprise IT infrastructure (consisting of departments, locations, and services) is performing optimally. For the system manager, it means managing the details that make up this high-level view. In the simplest terms, system management translates into managing faults and performance across applications, servers, and systems.
Most system management architectures share the same fundamental structure and set of relationships. End-user devices or stations, such as desktop computers and other networked devices, either run software that enables them to send alerts when problems are recognized (e.g. when one or more user-defined thresholds are exceeded), or are periodically polled to determine their health. The management system receiving these alerts presents the data for review by IT personnel and reacts by executing one or more actions, including notifying operators and engineers, logging events, determining the root causes of problems, and initiating relevant automatic repair actions.
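The alert path described above can be sketched in a few lines of Python. This is only an illustrative sketch, not any product's implementation: the `Alert` class, the function names, and the metric names such as `cpu_pct` are all hypothetical, and the "reaction" is reduced to logging the event and producing an operator notification string.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """One threshold violation reported by a managed device (hypothetical shape)."""
    device: str
    metric: str
    value: float
    threshold: float


def check_thresholds(device, readings, thresholds):
    """Device side: return an Alert for every reading above its user-defined threshold."""
    alerts = []
    for metric, value in readings.items():
        limit = thresholds.get(metric)
        # Metrics without a configured threshold are never reported.
        if limit is not None and value > limit:
            alerts.append(Alert(device, metric, value, limit))
    return alerts


def handle_alerts(alerts, event_log):
    """Management side: log each alert and build a notification for operators."""
    notifications = []
    for alert in alerts:
        event_log.append(alert)
        notifications.append(
            f"{alert.device}: {alert.metric}={alert.value} exceeds {alert.threshold}"
        )
    return notifications
```

In a polled design, the management system would call something like `check_thresholds` itself on the values it collects, rather than waiting for the device to report them.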
Polling of end devices by the management system can be automatic or user-initiated. Agents, or data-gathering engines, are software modules that collect detailed information about managed devices, store this information in a central or distributed database, and provide it (proactively or reactively) to the core management system, often called a network management system (NMS), using a variety of management protocols. Simple Network Management Protocol (SNMP) and Common Management Information Protocol (CMIP) are well-known examples.
System management tools give operators and engineers information by monitoring and measuring a range of performance metrics. The most common metrics in the networking arena are availability, throughput, bandwidth utilization, and latency (or delay). Availability, for example, is the percentage of time that a resource is available for use. In addition to these core metrics, administrators are often interested in error rates and in the performance of individual systems, including CPU and memory utilization.
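As a small worked example of two of these metrics, the sketch below computes availability as uptime divided by the total observation window, and bandwidth utilization as bits transferred divided by what the link could have carried in the same interval. The function and parameter names are assumptions made for illustration; real tools differ in how they define the observation window and count traffic.

```python
def availability_pct(uptime_seconds, total_seconds):
    """Percentage of the observation window during which the resource was available."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * uptime_seconds / total_seconds


def utilization_pct(bits_transferred, interval_seconds, link_capacity_bps):
    """Percentage of link capacity used over the measurement interval."""
    if interval_seconds <= 0 or link_capacity_bps <= 0:
        raise ValueError("interval and capacity must be positive")
    return 100.0 * bits_transferred / (interval_seconds * link_capacity_bps)
```

For instance, a resource that was down for 864 seconds of an 86,400-second day was available for 99% of that day.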
System management products range from simple single-device applications to complex hierarchical and distributed systems, and they use a variety of monitoring techniques. Some rely on passive monitoring to gather information, whereas others actively poll devices to collect performance data. An integrated SNMP-based management system uses SNMP to provide a complete view of the managed environment. Passive performance tools, also called packet capture tools or packet sniffers, generate no traffic of their own and simply listen to the data on the network. The breadth of analysis they enable is limited because they can see only traffic that is local to the device running the sniffer.
Application and service monitoring tools monitor individual applications, focusing less on equipment and infrastructure and more on the servers and applications that deliver user services. Flow monitoring analyzes traffic as flows, aggregating it by individual connection, user, protocol, or application. Flow monitoring tools can therefore provide a bigger-picture view, including specific information on application and connection performance as well as insight into routing and even security.
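The aggregation step behind flow monitoring can be illustrated with a short Python sketch. It assumes each packet is a dictionary with hypothetical keys (`src`, `dst`, `sport`, `dport`, `proto`, `size`) and groups packets by that five-field connection key; actual flow tools define and export flows in their own formats, so treat this purely as an illustration of the idea.

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packet records by connection key and total their packets and bytes."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        # Assumed flow key: source, destination, ports, and protocol.
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p["size"]
    return dict(flows)


def bytes_by_protocol(flows):
    """Roll the per-connection flows up to one byte total per protocol."""
    totals = defaultdict(int)
    for key, stats in flows.items():
        protocol = key[4]
        totals[protocol] += stats["bytes"]
    return dict(totals)
```

The same flow table can be rolled up by user or application instead of protocol by choosing a different field of the key.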
The ISO management model consists of five conceptual areas: performance, configuration, accounting, fault, and security management. The goal of performance management is to measure the various aspects of performance so that operation can be maintained at an acceptable level. The goal of configuration management is to monitor network and system configuration information so that the effects of various versions of hardware and software elements on operation can be tracked and managed. The goal of accounting management is to measure utilization parameters so that individual or group users can be regulated appropriately. The goal of fault management is to detect, log, notify users of, and (to the extent possible) automatically fix problems to keep the system running effectively. The goal of security management is to control access to resources according to local guidelines so that the system cannot be sabotaged (intentionally or unintentionally) and sensitive information cannot be accessed by those without appropriate authorization.
Choosing a system management system typically involves understanding the following issues:
© 2008-2010 Zyrion, Inc. All rights reserved.