Event Correlation
Definition: Event correlation is the process of monitoring what is happening on networks and other systems in order to identify patterns of events that might signify attacks, intrusions, misuse or failure.
In today’s interconnected world, network management is critically important. Those who maintain the network need to quickly pinpoint and fix any problem, whether it’s a malfunctioning mail daemon or a damaged fiber-optic link.
Luckily, almost every part of a modern network provides data about what it’s doing:
● Operating systems log system and security events.
● Servers keep records of what they do.
● Applications log errors, warnings and failures.
● Firewalls and virtual private network gateways record traffic deemed suspicious.
● Network routers and switches watch what goes on between network segments.
● Messaging systems forward alerts, such as Simple Network Management Protocol (SNMP) traps, to a central management console.
Besides monitoring their own behavior, all these devices and management programs receive and relay messages from other network systems, leading to duplicate alerts. A single failure or problem can generate a blizzard of event messages.
The more complex the network and the more widely its applications are distributed, the more event messages, alarms and alerts its devices will generate. In the end, far more data is generated than anyone can easily scan.
According to Chris Jordan, a security manager at Computer Sciences Corp., an OC-12 connection can generate about 850 megabytes of event data in an hour. (OC-12 is a fiber-optic connection with a bandwidth of 622Mbit/sec.) That translates into more than 600GB of data per month, or 7TB a year, just for the logs and alerts related to a single network link.
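The arithmetic behind those totals can be checked with a short sketch. It assumes decimal units (1GB = 1,000MB), a 30-day month and a 365-day year, none of which the text states; the variable names are illustrative.

```python
# Hourly figure quoted in the text; the unit conventions below are assumptions.
MB_PER_HOUR = 850

mb_per_day = MB_PER_HOUR * 24
gb_per_month = mb_per_day * 30 / 1_000        # assumed 30-day month, 1 GB = 1,000 MB
tb_per_year = mb_per_day * 365 / 1_000_000    # assumed 365-day year, 1 TB = 1,000,000 MB

print(f"{gb_per_month:.0f} GB per month")
print(f"{tb_per_year:.2f} TB per year")
```

Under these assumptions the result is about 612GB a month and roughly 7.4TB a year, which agrees with the rounded figures quoted above.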
Event correlation simplifies and speeds the monitoring of network events by consolidating alerts and error logs into a short, easy-to-understand package. A network administrator can handle, say, 25 events produced by cross-referencing intrusion alerts against firewall entries and host/asset databases far more efficiently than he can scan 10,000 mostly normal log entries.
The benefits can be very real: more efficient use of staff time and skills, as well as the prevention of revenue loss resulting from downtime.
According to Marcus Ranum, an independent computer and communications security consultant in Woodbine, Md., correlation is something everyone wants, but nobody even knows what it is. It’s like liberty or free beer: everyone thinks it’s a great idea and we should all have it, but there’s no road map for getting from here to there. Still, a variety of technologies and operations are associated with event correlation:
Compression takes multiple occurrences of the same event, examines them for duplicate information, removes redundancies and reports them as a single event. So 1,000 “route failed” alerts become a single alert that says “route failed 1,000 times.”
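A minimal sketch of compression, assuming events arrive as plain strings and that two events are "the same" only when their text is identical. The function name `compress` and the sample alerts are illustrative, not taken from any particular product.

```python
from collections import Counter

def compress(events):
    """Collapse identical events into one summary line per distinct event."""
    counts = Counter(events)  # a dict subclass, so first-seen order is kept
    return [
        event if n == 1 else f"{event} {n:,} times"
        for event, n in counts.items()
    ]

alerts = ["route failed"] * 1000 + ["link up"]
print(compress(alerts))  # ['route failed 1,000 times', 'link up']
```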
Counting reports a specified number of similar events as one. It differs from compression in two ways: it groups events that are similar rather than only tallying identical ones, and a threshold must be reached before a report is triggered.
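Counting might be sketched as follows. The sketch assumes that "similar" is decided by a caller-supplied `key` function and that the counter resets after each report; both are illustrative choices, as are the sample messages.

```python
from collections import defaultdict

def count_correlate(events, threshold, key):
    """Emit one report each time `threshold` similar events have accumulated."""
    pending = defaultdict(int)
    reports = []
    for event in events:
        group = key(event)            # events with the same key count as similar
        pending[group] += 1
        if pending[group] == threshold:
            reports.append(f"{threshold} events of type '{group}'")
            pending[group] = 0        # assumed policy: start counting afresh
    return reports

events = [
    "login failed: alice", "login failed: bob", "disk warning: sda",
    "login failed: carol", "login failed: dave",
]
# Hypothetical similarity rule: the text before the colon identifies the type.
reports = count_correlate(events, threshold=3, key=lambda e: e.split(":")[0])
print(reports)  # ["3 events of type 'login failed'"]
```

The single disk warning and the fourth login failure stay below the threshold, so neither produces a report.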
Suppression associates priorities with alarms and lets the system suppress an alarm for a lower-priority event if a higher-priority event has occurred.
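A simplified sketch of suppression, assuming each alarm is a `(priority, message)` pair in arrival order and that a larger number means a higher priority. A real system would probably also limit suppression to a time window and to related devices; that is left out here, and the sample alarms are made up.

```python
def suppress(alarms):
    """Drop alarms that are outranked by a higher-priority alarm seen earlier."""
    kept = []
    highest_seen = None
    for priority, message in alarms:
        if highest_seen is not None and priority < highest_seen:
            continue  # a higher-priority event has already occurred
        kept.append((priority, message))
        highest_seen = priority if highest_seen is None else max(highest_seen, priority)
    return kept

alarms = [
    (1, "high latency"),
    (3, "core router down"),
    (1, "host unreachable"),
    (2, "VPN tunnel dropped"),
]
print(suppress(alarms))  # [(1, 'high latency'), (3, 'core router down')]
```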
Generalization associates alarms with a higher-level event, and that event is what gets reported. This is useful for correlating events that involve multiple ports on the same switch or router when the unit itself fails. You don’t need to see each specific failure if you can determine that the entire unit has problems.
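One way to sketch generalization is to roll port-level alarms up to the device once enough of its ports are affected. The `min_fraction` cutoff, the device names and the port counts below are assumptions made for illustration; the text does not say how a system decides that the whole unit has failed.

```python
from collections import defaultdict

def generalize(port_alarms, ports_per_device, min_fraction=0.5):
    """Replace per-port alarms with one device-level event when enough ports fail."""
    failed = defaultdict(set)
    for device, port in port_alarms:
        failed[device].add(port)

    reports = []
    for device, ports in failed.items():
        if len(ports) / ports_per_device[device] >= min_fraction:
            reports.append(f"{device}: unit failure suspected ({len(ports)} ports down)")
        else:
            reports.extend(f"{device}: port {p} down" for p in sorted(ports))
    return reports

port_alarms = [("switch-1", 1), ("switch-1", 2), ("switch-1", 3), ("switch-2", 7)]
ports_per_device = {"switch-1": 4, "switch-2": 24}  # hypothetical inventory
print(generalize(port_alarms, ports_per_device))
```

Here three of the four ports on `switch-1` are down, so a single unit-level event is reported, while the lone failure on `switch-2` is still reported as a port event.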
Time-based correlation can help establish causality, for instance by tracing a connectivity problem to a failed piece of hardware. Often more information can be gleaned by correlating events that have specific time-based relationships, and some problems can be determined only through such temporal correlation. Examples of time-based relationships include the following:
● Event A is followed by Event B.
● This is the first Event A since the recent Event B.
● Event A follows Event B within two minutes.
● Event A wasn’t observed within Interval I.
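Two of the relationships above, "Event A follows Event B within two minutes" and "Event A wasn’t observed within Interval I", can be sketched as below. The sketch assumes events are `(timestamp, name)` pairs already sorted by time; the function names and the sample timestamps are arbitrary.

```python
from datetime import datetime, timedelta

def follows_within(events, first, second, window):
    """Return (first_time, second_time) pairs where `second` follows `first` within `window`."""
    pairs = []
    last_first = None
    for timestamp, name in events:  # assumes events are sorted by timestamp
        if name == first:
            last_first = timestamp
        elif name == second and last_first is not None:
            if timestamp - last_first <= window:
                pairs.append((last_first, timestamp))
    return pairs

def absent_during(events, name, start, end):
    """True if no event called `name` was observed in the interval [start, end]."""
    return not any(n == name and start <= t <= end for t, n in events)

t0 = datetime(2024, 1, 1, 12, 0, 0)  # arbitrary starting point
events = [
    (t0, "B"),
    (t0 + timedelta(seconds=90), "A"),
    (t0 + timedelta(minutes=10), "A"),
]

matches = follows_within(events, "B", "A", timedelta(minutes=2))
quiet = absent_during(events, "A", t0 + timedelta(minutes=3), t0 + timedelta(minutes=5))
print(len(matches), quiet)  # 1 True
```

Only the first Event A falls inside the two-minute window after Event B; the second arrives ten minutes later and is not paired.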
Event correlation, in its basic form, is becoming almost a commodity product. If you want to reduce the number of events and alarms and have some level of topological awareness to eliminate duplicates, that capability is fairly standard and works today.