
Mining Terrorism

management information system

Uploaded by

Yu Thian
Copyright
© Attribution Non-Commercial (BY-NC)

Chapter 3

Data Mining for Counter-Terrorism


Bhavani Thuraisingham
The MITRE Corporation, Burlington Road, Bedford, MA. On leave at the National Science Foundation, Arlington, VA

Abstract: Data mining is becoming a useful tool for detecting and preventing terrorism. This paper first discusses some technical challenges for data mining as applied to counter-terrorism applications. Next, it provides an overview of the various types of terrorist threats and describes how data mining techniques could provide solutions for counter-terrorism. Finally, it discusses some privacy concerns and potential solutions that could detect terrorist activities and yet attempt to maintain privacy.

Keywords: Counter-terrorism, Data Mining, Privacy

3.1 Introduction
Data mining is the process of posing queries and extracting useful patterns or trends, often previously unknown, from large amounts of data using various techniques such as those from pattern recognition and machine learning. There have been several developments in data mining, and the technology is being used for a wide variety of applications, from marketing and finance to medicine and biotechnology to multimedia and entertainment. Recently there has been much interest in exploring the use of data mining for counter-terrorism applications. For example, data mining can be used to detect unusual patterns, terrorist activities and fraudulent behavior. While all of these applications of data mining can benefit humans and save lives, there is also a negative side to this technology, since it could be a threat to the privacy of individuals. This is because data mining tools are available on the web or otherwise, and even naive users can apply these tools to extract information from the data stored in various databases and files and consequently violate the privacy of individuals. To carry out effective data mining and extract useful information for counter-terrorism and national security, we need to gather all kinds of information about individuals. However, this information could be a threat to individuals' privacy and civil liberties.

In this paper we provide an overview of applying data mining for counter-terrorism. At the workshop on Next Generation Data Mining (NGDM), a panel was conducted on Data Mining for Counter-terrorism. The panel raised many interesting technical challenges. In Section 3.2 we discuss some of these challenges. To understand how data mining could be applied, we need a good understanding of what the terrorist threats are. We have grouped the threats into several categories and discuss them in Section 3.3. Applying data mining techniques for counter-terrorism is the subject of Section 3.4. There have been many discussions recently of the privacy violations that could occur as a result of data mining. In Section 3.5 we address privacy and discuss data mining solutions that attempt to detect and prevent terrorism while maintaining some level of privacy. The paper is concluded in Section 3.6.

3.2 Research Challenges


The panel on data mining for counter-terrorism at the NGDM workshop discussed several technical challenges. We discuss a few of them in this section.

Data mining technologies have advanced a great deal and are now being applied in many applications. The main question is whether they are ready for detecting and/or preventing terrorist activities. For example, can we completely eliminate false positives and false negatives? False positives could be disastrous for the individuals concerned. False negatives could allow terrorist activities to increase. The challenge is to find the needle in the haystack. We need knowledge-directed data mining to eliminate false positives and false negatives as much as possible.

Another challenge is mining data in real-time. We now have tools to detect credit card violations and calling card violations, and these tools function in real-time. However, can one build models in real-time? The general view among the research community is that real-time model building is a challenge. Furthermore, for detecting terrorist activities we need good training examples. How can we get such examples, especially in an unclassified setting?

A third challenge is multimedia data mining. While we now have tools to mine structured and relational databases, mining unstructured databases is still a challenge. Do we extract structure from unstructured databases and then mine the structured data, or do we apply mining tools directly to unstructured data? Furthermore, while there is progress on text mining, we need work on audio and video as well as on image mining.

Other directions include graph and pattern mining. For example, one has to connect all the dots. Essentially one builds a graph structure based on the information one has. If multiple agencies are working on the problem, then each agency will have its own graph. The challenge is to be able to make inferences about missing nodes and links in the graph. Also, the graph could be very large. The question is how one can reduce the graph to a more manageable size.

Finding the data to test the ideas is still a major challenge. How can we get unclassified data? Is it possible to scrub and clean the classified data and produce reasonable data at the unclassified level? How can we find large data sets consisting of multimedia data types? Is it possible to develop a test-bed where one can apply the various data mining tools to determine their efficiency?

Web mining is a challenge for detecting unusual patterns. In a way web mining encompasses data mining, as one has to mine all the data on the web as well as mine the structure and usage patterns. By mining the usage patterns one could find, for example, that there is an unusual number of visits to a federal web site from Paris around 3 a.m. Data on the web includes structured as well as unstructured data. Therefore the tools developed for data mining apply to web mining also. In addition, we need tools to mine the structure of the web as well as the usage patterns.

Privacy is a major challenge with respect to data mining for counter-terrorism. The challenge is to extract useful information through data mining while at the same time maintaining privacy. Several efforts are under way on privacy-preserving data mining. The idea is to use various techniques such as randomization, cover stories, and multi-party policy enforcement. While there is some progress, the effectiveness of these techniques needs to be determined.

The above are some of the challenges for data mining for counter-terrorism discussed at the workshop. That is, while data mining could become a useful tool for counter-terrorism, there are many challenges that need to be addressed. They include mining multimedia data, graph mining, building models in real-time, knowledge-directed data mining to eliminate false positives and false negatives, web mining, and privacy-sensitive data mining. Research is progressing in the right direction. However, there is still much to be done (see also [14]).

Now that we have provided an overview of the challenges for data mining for counter-terrorism, in the next three sections we provide some more details on this topic. To understand how data mining may be applied, we need a good understanding of what the threats are. In Section 3.3 we provide an overview of various threats and protection measures. In Section 3.4 we examine how data mining could provide potential counter-terrorism solutions, especially for the threats discussed in Section 3.3. Because of the importance of privacy and the potential threats to privacy due to data mining, we discuss various privacy issues in Section 3.5.
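Of the privacy-preserving techniques named in this section, randomization is perhaps the easiest to illustrate. The sketch below uses randomized response, one classical form of randomization; it is offered only as an illustration and is not a method presented at the workshop. Each record reports its true value with probability `p_truth` and the opposite value otherwise, so no single report can be trusted, yet the overall rate can still be estimated. The function names, the 10% rate, and `p_truth = 0.75` are all hypothetical choices made for this example.

```python
import random
from typing import List


def randomize(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true value with probability p_truth, else report its negation."""
    return truth if rng.random() < p_truth else not truth


def estimate_rate(reports: List[bool], p_truth: float) -> float:
    """Estimate the true rate from randomized reports.

    If pi is the true rate, the expected fraction of True reports is
    p_truth * pi + (1 - p_truth) * (1 - pi); this function inverts that relation.
    """
    if p_truth == 0.5:
        raise ValueError("p_truth = 0.5 destroys all information about the rate")
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


# Hypothetical data: 20,000 records, 10% of which have the sensitive attribute.
rng = random.Random(0)
true_values = [i < 2000 for i in range(20000)]
reports = [randomize(v, 0.75, rng) for v in true_values]

estimate = estimate_rate(reports, 0.75)
print(f"estimated rate: {estimate:.3f}")
```

The estimate approaches the true rate as the number of records grows, while any individual report remains deniable. How well such schemes hold up on real counter-terrorism data is exactly the open question raised above.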


3.3 Some Information on Terrorism, Security Threats, and Protection Measures


3.3.1 Overview

Now we are ready to embark on a critical application of data mining technologies: counter-terrorism. Counter-terrorism is mainly about developing counter-measures to threats arising from terrorist activities. In this section we focus on the various types of threats that could occur. In Section 3.4 we discuss how data mining could help prevent and detect the threats.

Our discussion of counter-terrorism is rather preliminary. We are not claiming to be counter-terrorism experts. The information on terrorist threats presented here has been obtained entirely from unclassified newspaper articles and news reports that have appeared over the years. Our focus is to illustrate how data mining could help towards combating terrorism. We are not saying that data mining solves all the problems. But because data mining has the capability to extract patterns and trends, often previously unknown, we should certainly explore the various data and web data mining technologies for counter-terrorism.

For us, web data mining goes beyond data mining. It not only includes data mining techniques, but also focuses on web traffic and usage mining as well as web structure mining. That is, there are additional challenges for web data mining that are not present for data mining alone. Web data mining also includes structured as well as unstructured data mining. Furthermore, we believe that much of the data will eventually be on the web, whether on public networks such as the Internet or on private ones such as corporate intranets and classified intranets. Therefore, studying web data mining encompasses studying data mining as well.

Before we embark on a discussion of counter-terrorism, we need to discuss the types of threats. Note that threats could be malicious threats due to terror attacks or non-malicious threats due to inadvertent errors. While our main focus is on malicious attacks, we also cover some of the inadvertent errors, as there may be similar solutions to combat such problems.

The types of terrorist threats we discuss include non-information related terrorism, information related terrorism, and bio-terrorism, chemical and nuclear attacks. By non-information related terrorism we mean people attacking others with, say, bombs and guns. For this we need to find out who these people are by analyzing their connections and then develop counter-terrorism solutions. By information related threats we mean threats due to the existence of computer systems and networks. These are unauthorized intrusions and viruses as well as computer related vandalism. Information related terrorism is essentially cyber terrorism. Then there are bio-terrorism, chemical and nuclear attacks. These are terrorist attacks caused by biological substances and chemical or nuclear weapons. These are not all the types of threats that exist, but they are the threats we will be examining. We will discuss how data mining could perhaps be used to help prevent and detect attacks due to such threats.

The organization of this section is as follows. Section 3.3.2 discusses threats from natural disasters as well as human errors. We then focus on malicious threats in the next three sections. Non-information related threats are discussed in Section 3.3.3. These include terrorist attacks as well as insider threats and border and transportation threats. In Section 3.3.4 we discuss information related threats. Essentially this is about cyber-terrorism. Threats arising from biological, chemical and nuclear weapons are discussed in Section 3.3.5. Attacks on critical infrastructures are given special consideration in Section 3.3.6. Note that infrastructures may also be attacked during information related attacks and non-information related attacks. We group the threats into two categories in Section 3.3.7: non real-time threats and real-time threats. We analyze the threats discussed in Sections 3.3.3 through 3.3.6 to see whether they are non real-time threats or real-time threats. Then we focus on counter-terrorism measures in Section 3.3.8. These include counter-terrorism for non-information related threats, information related threats, and bio-terrorism. We also briefly examine counter-terrorism measures for non real-time threats as well as for real-time threats.

Note that when we want to carry out data mining to combat terrorism, we need good data. This means that we need data about terrorists as well as terrorist activities. This also means we will have to gather data about all kinds of people, events and entities. Therefore, there could be a serious threat to privacy. We address privacy and civil liberties in Section 3.5.

3.3.2 Natural Disasters and Human Errors

As we have stated in Section 3.3.1, threats could occur due to natural disasters and human errors as well as through malicious attacks. While the near-term emergency responses to these threats may not be that different, the way to combat them in the longer term will very likely be quite different.

By natural disasters we mean disasters due to hurricanes, earthquakes, fires, power failures and accidents. Some of these disasters may be due to human errors, such as pressing the wrong button in a process plant and causing the plant to explode. Data mining could help detect some natural disasters. That is, by analyzing a lot of geological data, a data mining tool may predict that an earthquake is about to occur, in which case the people in the area could be evacuated beforehand. Similarly, by analyzing weather data, the tool could predict that hurricanes are about to occur.

Emergency responses, whether a building catches fire through a natural disaster or a terrorist attack, may not be that different. In both cases there will be intense panic, although if the building explodes due to a bomb the panic may be more intense and the collapse more rapid. We need effective emergency response teams to handle such events. Data mining could be used to analyze, say, previous attacks, train various tools, and then give advice on how to handle the emergency situation. Here again we need training examples, some of which may not exist. In this case we may need to train with hypothetical scenarios and simulated examples.

The long-term measures to be taken for natural disasters may be quite different from those for terrorist attacks. It is not every day that we have an earthquake, even in the most earthquake-prone regions. It is not often that we have hurricanes, even in the most hurricane-prone regions. Therefore we have time to plan and react. This does not mean that a natural disaster is less complex to manage. It could be devastating and take many human lives. Nevertheless, countries usually plan for such disasters, mainly through experience.

Human errors are also a source of major concern. We need to continually train, say, the operators and advise them to be cautious and alert. We need to take proper action if humans have been careless. That is, unless there is a very good excuse, human errors should not be treated lightly. This way, humans will be cautious and perhaps not make such errors.

Terrorist attacks are quite different. The problem is that one does not know when an attack will happen or how it will happen. Many of us could never have imagined that airplanes would be used as weapons of mass destruction to bring down the World Trade Center towers. Many of us still may not know what the next attack may be. Will it be caused by suicide bombers, by chemical weapons, or by cyber terrorism? The counter-measures for prevention and detection may be quite intense for terrorist attacks.

As we have stated, we are not experts on counter-terrorism, nor have we studied the nature of the attacks. Our goal is to examine the various data mining techniques to see how they could be applied to handle the various threats that have been discussed almost daily in the newspapers and on television. It should however be noted that to develop effective techniques, data mining specialists have to work together with counter-terrorism experts. That is, one cannot use the techniques without a good understanding of what the threats are. Therefore, while the contents of this paper may be used as a reference, I would urge those interested in applying data mining techniques to real-world problems and terrorist attacks to work with counter-terrorism specialists. In the next few sections we discuss various types of terrorism and counter-terrorism measures.

3.3.3 Non-Information Related Terrorism

3.3.3.1 Overview

In this section we provide an overview of various types of non-information related terrorism. Note that by information related terrorism we mean attacks essentially on computers and networks. That is, they are threats that damage electronic information. By non-information related terrorism we mean terrorism by other means, such as terrorist attacks, car bombings, and vandalism such as setting fires. The organization of this section is as follows. We discuss terrorist attacks and external threats in Section 3.3.3.2. Insider threats are discussed in Section 3.3.3.3. Attacks on borders and transportation are discussed in Section 3.3.3.4. Although border and transportation attacks may be considered part of non-information related attacks, we have given them special consideration, as there is now so much discussion about securing borders and transportation mechanisms.

3.3.3.2 Terrorist Attacks and External Threats

When we hear the word terrorism, it is the external threats that come to mind. My earliest recollection of terrorism is riots where one ethnic group attacks another by killing, looting, setting fire to houses, and other acts of terrorism and vandalism. Later on we heard of airplane hijackings, where a group of terrorists hijacks an airplane and then makes demands on governments, such as releasing political prisoners who could possibly be terrorists. Then we heard of suicide bombings, where terrorists carry bombs and blow up themselves as well as others nearby. Such attacks usually occur in crowded places. More recently we have heard of airplanes being used to blow up buildings.

While the above acts are all terrorist attacks, we hear almost daily about someone shooting and killing someone else when neither party belongs to any gang or terrorist group. This in a way is terrorism also, but these acts are more difficult to detect and prevent because there are always what are called "crazy people" in our society. While the technologies should detect and prevent such attacks also, this paper focuses on how to detect attacks from people belonging to terrorist groups.

All of the threats we have discussed above are external threats. These are threats coming from the outside. In general, the terrorists are neither friends nor acquaintances of the victims involved. But there are also other kinds of threats, namely insider threats. We discuss them in the next section.

3.3.3.3 Insider Threats

Insider threats are threats from people inside an organization attacking the others around them, perhaps not with bombs and airplanes but with other sinister mechanisms. One example of an insider threat is someone from a corporation giving information about proprietary products to a competitor. Another example is an agent from an intelligence agency committing espionage. A third example is a threat coming from one's own family, for example, betrayal by a spouse who has insider information about assets and gives that information to a competitor to his or her advantage. That is, insider threats can occur at all levels and in all walks of life and could be quite dangerous and sinister because you never know who these terrorists are. They may be your so-called best friends or even your spouse or your siblings.

Note that people on the inside could also use guns to shoot people around them. We often hear about office shootings. But these shootings are not in general insider threats, as they are not happening in sinister ways. That is, these shootings are a form of external threat although they come from people within an organization. We also often hear about domestic abuse and violence, such as husbands shooting wives or vice versa. These are also external threats although they come from the inside. Insider threats are threats where others around are totally unaware until perhaps something quite dangerous occurs. We have heard that espionage can go on for years before someone gets caught. While both insider threats and external threats are very serious and could be devastating, insider threats can be even more dangerous because one never knows who these terrorists are.

3.3.3.4 Transportation and Border Security Violations

Let us examine border threats first and then discuss transportation threats. Safeguarding the borders is critical for the security of a nation. Threats at borders range from illegal immigration to gun and drug trafficking, human trafficking, and terrorists entering a country. We are not saying that illegal immigrants are dangerous or are terrorists. They may be very decent people. However, they have entered a country without the proper papers, and that could be a major issue. For official immigration into, say, the USA, one needs to go through interviews at US embassies, medical checkups and X-rays, checks for diseases such as tuberculosis, background checks and many more things. This does not mean that people who have entered a country legally are always innocent. They could be terrorists also. But at least there is some assurance that proper procedures have been followed. Illegal immigration can also cause problems for the economy of a society and violate human rights through cheap illegal labor.

As we have stated, a lot of drug trafficking occurs at borders. Drugs are a danger to society. They could cripple a nation, corrupt its children, cause havoc in families, and damage the education system. It is therefore critical that we protect the borders from drug trafficking as well as other types of trafficking, including firearms and human slaves. Other threats at borders include prostitution and child pornography, which are serious threats to decent living. This does not mean that everything is safe inside the country and that these problems occur only at borders. Nevertheless, we have to protect our borders so that a nation faces no additional problems.

Transportation systems security violations can also cause serious problems. Buses, trains and airplanes are vehicles that can carry tens or hundreds of people at the same time, and any security violation could cause serious damage and even deaths. A bomb exploding in an airplane, a train or a bus could be devastating. Transportation systems are also the means for terrorists to escape once they have committed crimes. Therefore transportation systems have to be secure. A key aspect of transportation systems security is port security. These ports are responsible for ships of the United States Navy. Since these ships are at sea throughout the world, terrorists may have opportunities to attack the ships and their cargo. Therefore, we need security measures to protect the ports, cargo, and our military bases.

In Section 3.3.8 we discuss various counter-terrorism measures for the threats we have discussed here. The next three sections discuss additional types of terrorism.

3.3.4 Information Related Terrorism

3.3.4.1 Overview

This section discusses information related terrorism. By information related terrorism we mean cyber-terrorism as well as security violations through access control and other means. Trojan horses and viruses are also information related security violations, which we group into information related terrorism activities.

In the next few subsections we discuss various information related terrorist attacks. In Section 3.3.4.2 we give an overview of cyber-terrorism and then discuss insider threats and external attacks. Malicious intrusions are the subject of Section 3.3.4.3. Credit card fraud and identity theft are discussed in Section 3.3.4.4. Information security violations such as access control violations are discussed in Section 3.3.4.5. Since the web is a major means of information transportation, we give web security threats special consideration in Section 3.3.4.6. Note that an excellent book on web security discussing various threats and solutions is the one by Ghosh [10]. We also discuss some of the cyber threats and countermeasures in [11].

3.3.4.2 Cyber-terrorism, Insider Threats, and External Attacks

Cyber-terrorism is one of the major terrorist threats posed to our nation today. As we have mentioned earlier, there is now so much information available electronically and on the web. Attacks on our computers as well as on networks, databases and the Internet could be devastating to businesses. It is estimated that cyber-terrorism could cost businesses billions of dollars. For example, consider a banking information system. If terrorists attack such a system and deplete accounts of their funds, then the bank could lose millions and perhaps billions of dollars. By crippling the computer system, millions of hours of productivity could be lost, and that equates to money in the end. Even a simple power outage at work through some accident could cause several hours of productivity loss and, as a result, a major financial loss. Therefore it is critical that our information systems be secure.

Next we discuss various types of cyber-terrorist attacks. One is spreading viruses and Trojan horses that can wipe out files and other important documents. Another is intruding into computer networks, which we discuss in the next section. Information security violations such as access control violations, as well as various other threats such as sabotage and denial of service, are discussed later.

Note that threats can occur from outside or from inside an organization. Outside attacks are attacks on computers from someone outside the organization. We hear of hackers breaking into computer systems and causing havoc within an organization. There are hackers who spread viruses, and these viruses cause great damage to the files in various computer systems. But a more sinister problem is the insider threat. Just as with non-information related attacks, there is an insider threat with information related attacks. There are people inside an organization who have studied its business practices and develop schemes to cripple the organization's information assets. These people could be regular employees or even those working at computer centers. The problem is quite serious, as someone may be masquerading as someone else and causing all kinds of damage. In the next few sections we examine how data mining could detect and perhaps prevent such attacks.

3.3.4.3 Malicious Intrusions

We have discussed some aspects of malicious intrusions. These could be intrusions into networks, web clients and servers, databases, operating systems, and so on. Many of the cyber-terrorism attacks that we have discussed in the previous sections are malicious intrusions. We revisit them in this section.

We hear a lot about network intrusions. What happens here is that intruders try to tap into the networks and get the information that is being transmitted. These intruders may be human intruders or Trojan horses set up by humans. Intrusions could also happen on files. For example, one can masquerade as someone else, log into that person's computer system and access the files. Intrusions can also occur on databases. Intruders posing as legitimate users can pose queries such as SQL queries and access data that they are not authorized to know.

Essentially, cyber-terrorism includes malicious intrusions as well as sabotage, through malicious intrusions or otherwise. Cyber security consists of security mechanisms that attempt to provide solutions to cyber attacks or cyber-terrorism. When we discuss malicious intrusions or cyber attacks, we need to think about the non-cyber world, that is, non-information related terrorism, and then translate those attacks to attacks on computers and networks. For example, a thief could enter a building through a trap door. In the same way, a computer intruder could enter the computer or network through some sort of trap door that has been intentionally built by a malicious insider or left unattended through careless design. Another example is a thief entering a bank with a mask and stealing the money. The analogy here is an intruder masquerading as someone else, entering the system as if legitimately, and taking all the information assets. Money in the real world translates to information assets in the cyber world. That is, there are many parallels between non-information related attacks and information related attacks. We can proceed to develop counter-measures for both types of attacks. These counter-measures are discussed in Section 3.3.8.

3.3.4.4 Credit Card Fraud and Identity Theft

We are hearing a lot these days about credit card fraud and identity theft. In the case of credit card fraud, others get hold of a person's credit card and make all kinds of purchases. By the time the owner of the card finds out, it may be too late. The thief may have left the country by then. A similar problem occurs with telephone calling cards. In fact this type of attack happened to me once. Perhaps while I was making phone calls using my calling card at airports, someone noticed, say, the dial tones and used my calling card. This was my company calling card. Fortunately our telephone company detected the problem and informed my company. The problem was dealt with immediately.

A more serious theft is identity theft. Here one assumes the identity of another person, say, by getting hold of that person's social security number, and essentially carries out all the transactions under the other person's name. This could even include selling houses and depositing the income in a fraudulent bank account. By the time the owner finds out, it will be far too late. It is very likely that the owner may have lost millions of dollars due to the identity theft.

We need to explore the use of data mining both for credit card fraud detection and for identity theft. There have been some efforts on detecting credit card fraud (see [AFCE]). We need to start working actively on detecting and preventing identity theft.

3.3.4.5 Information Security Violations

In this section we provide an overview of the various information security violations. These violations do not necessarily occur through cyber attacks or cyber-terrorism. They could occur through bad security design and practices. Nevertheless, we have included this discussion for completeness.

Information security violations typically occur due to access control violations. That is, users are granted access depending on their roles (which is called role-based access control), on their clearance level (which is called multilevel access control), or on a need-to-know basis. Access controls are usually violated due to poor design or designer errors. For example, suppose John does not have access to salary data. By some error this rule may not be enforced, and as a result John gets access to salary values. Access control violations can also occur due to malicious attacks. That is, someone could enter the system by pretending to be the system administrator and delete the access control rule stating that John does not have access to salaries. Another way is for a Trojan horse to operate on behalf of malicious users, so that each time John makes a request, the malicious code ensures that the access control rule is bypassed.
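The John example can be made concrete with a small sketch. It assumes a toy policy in which roles grant access and explicit denial rules override those grants; the class `AccessPolicy`, the role `staff`, and the resource names are hypothetical and chosen only for illustration, not taken from any particular system.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class AccessPolicy:
    """Toy policy: roles grant access; explicit denial rules override grants."""
    role_permissions: Dict[str, Set[str]]
    user_roles: Dict[str, Set[str]]
    denials: Set[Tuple[str, str]] = field(default_factory=set)

    def can_read(self, user: str, resource: str) -> bool:
        # An explicit denial always wins over any role-based grant.
        if (user, resource) in self.denials:
            return False
        # Otherwise access is granted only if one of the user's roles allows it.
        roles = self.user_roles.get(user, set())
        return any(resource in self.role_permissions.get(r, set()) for r in roles)


policy = AccessPolicy(
    role_permissions={"staff": {"name", "department", "salary"}},
    user_roles={"john": {"staff"}},
    denials={("john", "salary")},
)

# With the rule in place, John cannot read salaries.
print(policy.can_read("john", "salary"))

# The rule is lost, whether through a design error or because an attacker
# posing as the administrator deleted it.
policy.denials.discard(("john", "salary"))

# John's role-based grant now applies, which is the violation described above.
print(policy.can_read("john", "salary"))
```

The first check prints `False` and the second prints `True`. The point of the sketch is that the violation requires no change to John's own privileges: removing or failing to enforce a single rule is enough.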

3.3.4.6 Security Problems for the Web

As mentioned in section 3.3.4.1, there are numerous security attacks that can occur due to the web. We discuss some of the web security threats in this section. As we have mentioned, Ghosh [10] has provided an excellent introduction to web security and various threats in his book. Note that while we have focused on web threats in this section, the threats discussed are applicable to any information system such as networks, databases and operating systems. The threats include access control violations, integrity violations, sabotage, fraud, denial of service and infrastructure attacks.

For example, the traditional access control violations could be extended to the web. Users may access unauthorized data across the web. Note that with the web there is so much data all over the place that controlling access to this data will be quite a challenge. Data on the web may also be subject to unauthorized modifications, which makes it easier to corrupt the data. Also, data could originate from anywhere and the producers of the data may not be trustworthy. Incorrect data could cause serious damage such as incorrect bank account records, which could result in incorrect transactions. We hear of hackers breaking into systems and posting inappropriate messages. With so much business and commerce being carried out on the web without proper controls, Internet fraud could cause businesses to lose millions of dollars. An intruder could obtain the identity of legitimate users and, through masquerading, empty their bank accounts. We hear about infrastructures being brought down by hackers. Infrastructures could be the telecommunication system, the power system, and the heating system. These systems are being controlled by computers and often through the Internet. Such attacks would cause denials of service.

Other threats include violations of confidentiality, authenticity, and non-repudiation. Confidentiality violations enable intruders to listen in on the message. Authentication violations include using passwords without permission, and non-repudiation violations enable someone to deny that he sent the message. The web threats discussed here occur because of insecure clients, servers and networks. To have complete security, one needs end-to-end security; that means secure clients, secure servers, secure operating systems, secure databases, secure middleware and secure networks.


3.3.5 Bio-Terrorism, Chemical and Nuclear Attacks

The previous two sections discussed non-information related as well as information related terrorist attacks. Note that by information related attacks we mean cyber attacks. Non-information related attacks mean everything else. However we have separated bio-terrorism and chemical weapons attacks from non-information related attacks. We have also given special consideration for critical infrastructure attacks. That is, the non-information related attacks are essentially attacks due to bombs, explosions and other similar activities. While bio-terrorism and chemical/nuclear weapons attacks have been discussed at least for several decades, it is only after September 11, 2001 that the public is paying a lot of attention to these discussions. The anthrax attacks that occurred during the latter part of 2001 have resulted in increased fear and awareness of the potential dangers of bio-terrorism attacks and chemical/nuclear weapons attacks. Such attacks could kill several million people within a short space of time. More recently there is increasing awareness of the dangers due to bio-terrorism attacks resulting in the spread of infectious diseases such as smallpox, yellow fever, and similar diseases. These diseases are so infectious that it is critical that their spread is detected as soon as they occur. Preventing such attacks would be the ultimate goal. One option is to carry out mass vaccination. But this would mean some health hazards to various groups of people. Our challenge is to use technology to prevent and detect such deadly attacks. Technologies would include sensor technology and data mining and data management technologies. Attacks using chemical weapons are equally deadly. One could spray poisonous gas and other chemicals into the air, water and food supplies. For example, various dangerous chemical agents could be sprayed from the air on plants and crops. These plants and crops could get into the food supply and kill millions. 
We have to develop technologies to detect and prevent such deadly attacks. Another form of deadly attack is the nuclear attack. Such attacks could wipe out the entire population of the world. Various nations are developing nuclear weapons without authorization to do so; that is, these weapons are being developed illegally. This is what makes the world very dangerous. Here too, we have to develop technologies to detect and prevent such attacks.

In this section we have only briefly mentioned the various biological, chemical and nuclear attacks. Some good books are being written about such terrorist activities (see [4] and [5]). As we have stressed, we are not counter-terrorism experts; nor have we studied the various types of terrorist attacks in any depth. Our information is obtained from various newspaper articles and documentaries. Our main goal is to examine various data mining techniques and see how they could be applied to detect and prevent such deadly terrorist attacks. Data mining for counter-terrorism will be discussed in sections 3.4 and 3.5.

3.3.6 Attacks on Critical Infrastructures

Attacks on critical infrastructures could cripple a nation and its economy. Infrastructure attacks include attacks on the telecommunication lines, the electric power and gas lines, reservoirs and water supplies, food supplies and other basic entities that are critical for the operation of a nation. Attacks on critical infrastructures could occur during any type of attack, whether non-information related, information related or bio-terrorism. For example, one could attack the software that runs the telecommunications industry and close down all the telecommunications lines. Similarly, software that runs the power and gas supplies could be attacked. Attacks could also occur through bombs and explosives; that is, the telecommunication lines could be attacked through bombs. Attacking transportation lines such as highways and railway tracks is also an attack on infrastructure. As we have mentioned in Section 3.3.2, infrastructures could also be damaged by natural disasters such as hurricanes and earthquakes. Our main interest here is in malicious attacks on infrastructures, both information related and non-information related. Our goal is to examine data mining and related data management technologies to detect and prevent such infrastructure attacks.

3.3.7 Non Real-time Threats vs. Real-time Threats

The threats that we have discussed so far can be grouped into two categories: non real-time threats and real-time threats. In a way all threats are real-time, as we have to act in real-time once the threats have occurred. However, some threats are analyzed over a period of time while others have to be handled immediately. We discuss the various threats here.

Consider for example the biological, chemical and nuclear threats. These threats have to be handled in real-time. That is, the response to these threats has timing constraints. If the smallpox virus is being spread maliciously, then we have to start vaccinations immediately. Similarly, if networks, say for critical infrastructures, are being attacked, the response has to be immediate. Otherwise we could lose millions of lives and/or millions of dollars.

There are some other threats that do not have to be handled in real-time. For example, consider the behavior of suspicious people such as those belonging to a certain terrorist organization or those enrolling in flight training schools. In a way these people are also planning attacks, but sometimes even they are not sure when they will attack. Therefore, one has to monitor these people, analyze their behavior and predict their actions. While there are timing constraints for these threats, the urgency is not as great as, say, the spread of the smallpox virus. But one should be vigilant about these non real-time threats also.

In general there is no way to say that A is a real-time threat and B is a non real-time threat. A non real-time threat could turn into a real-time threat. For example, once the terrorists had hijacked the airplanes on September 11, 2001, the threat became a real-time threat, as action had to be taken within, say, an hour.


3.3.8 Aspects of Counter-Terrorism

3.3.8.1 Overview

Now that we have provided some discussion on various types of terrorist attacks, including non-information related terrorism, information related terrorism and bio-terrorism, we will discuss what counter-terrorism is all about. Counter-terrorism is a collection of techniques used to combat, prevent, and detect terrorism. Our goal in this paper is to examine various data mining techniques to see how we can combat terrorism using these techniques. In this section we will briefly discuss counter-terrorism measures for the terrorist attacks discussed in the previous sections.

In Section 3.3.8.2 we discuss protecting from non-information related terrorism. In Section 3.3.8.3 we discuss protecting from information related terrorism. In particular, we briefly discuss various web security measures as well as other aspects such as intrusion detection and access control. In Section 3.3.8.4 we discuss protecting from bio-terrorism, chemical attacks and nuclear attacks. In Section 3.3.8.5 we discuss protecting the critical infrastructures. We analyze counter-terrorism measures for non real-time threats as well as for real-time threats in Section 3.3.8.6.

3.3.8.2 Protecting from Non-information Related Terrorism

As we have stated, non-information related counter-terrorism includes protecting from bombings, explosions, vandalism and other kinds of terrorist attacks that do not involve computers. For example, hijacking an airplane and attacking buildings with airplanes is a case of non-information related terrorist activity. The question is, how do we protect against such terrorist attacks?

First of all we need to gather information about various scenarios and examples. That is, we need to identify all kinds of terrorist acts that have occurred in history, from airplane hijackings to bombings of buildings. We also need to gather information about those under suspicion. All of the data that we have gathered needs to be analyzed to see if any patterns emerge.

We also need to ensure there are physical safety measures. For example, we need to check identity at airports and other places. We need to check identity randomly, say in trains, as well as routinely, say at checkpoints. We need to check the belongings of a person, either randomly, routinely or if that person arouses suspicion, to see if there are dangerous weapons or chemicals among them. We should also use sniffer dogs and sensor devices to see if there are potentially hazardous materials. We need surveillance cameras to see who is entering a building. These cameras should perhaps also capture the facial expressions of various people. The data gathered from the cameras should be analyzed further for suspicious behavior. We also need to enforce access control measures at military bases and seaports.

In summary, several counter-terrorism measures have to be taken to combat non-information related terrorism. These include information gathering and analysis, surveillance, physical security and various other mechanisms. In the next few sections we will examine the data mining techniques and see how they can detect and prevent such terrorist attacks.

3.3.8.3 Protecting from Information Related Terrorism

General Discussion

We will first provide an overview of counter-terrorism with respect to information related terrorism. We will give special consideration to security solutions for the web later on. Essentially, protecting from information related terrorism involves detecting and preventing malicious attacks and intrusions. These attacks could be due to viruses, spoofing, masquerading, or the stealing of, say, information assets. They could also be attacks on databases and malicious corruption of data. That is, terrorist attacks are not necessarily limited to stealing and accessing unauthorized information. They could also include malicious corruption and alteration of the data so that the data will be of little or no use. Terrorist attacks also include credit card fraud and identity theft.

Various data mining techniques are being proposed for detecting intrusions as well as credit card fraud. We will discuss them in later sections. Preventing malicious attacks is more challenging. We need to design systems in such a way that malicious attacks and intrusions are prevented. When an intruder attempts to attack the system, the system would figure this out and alert the security officer. There is research being carried out on secure systems design so that such intrusions are prevented. However, there is more focus on detecting such intrusions than on preventing them.

Enforcing appropriate access control techniques is also a way to enforce security. For example, users may have certificates to access the information they need to carry out the jobs that they are assigned to do. The organization should give the users no more and no fewer privileges than they need. There is much research on managing privileges and access rights to various types of systems. We have briefly discussed cyber security measures. We will discuss security solutions for the web in more detail next.
Note that there are also additional problems such as the inference problem, where users pose sets of queries and infer sensitive information. This is also an attack. We will revisit the inference problem later when we discuss privacy.

Security Solutions for the Web

We need end-to-end security, and therefore the components include secure clients, secure servers, secure databases, secure operating systems, secure infrastructures, secure networks, secure transactions and secure protocols. One needs good encryption mechanisms to ensure that the sender and receiver communicate securely. Ultimately, whether it be exchanging messages or carrying out transactions, the communication between sender and receiver, or between the buyer and the seller, has to be secure. Secure client solutions include securing the browser, securing the Java virtual machine, securing Java applets, and incorporating various security features into languages such as Java. Note that Java is not the only component that has to be secure. Microsoft has come up with a collection of products including ActiveX, and these products have to be secure also. Securing the protocols includes secure HTTP and the secure socket layer. Securing the web server means that the server has to be installed securely and that it has to be ensured that the server cannot be attacked. Various mechanisms that have been used to secure operating systems and databases may be applied here. Notable among them are access control lists, which specify which users have access to which web pages and data. The web servers may be connected to databases at the backend and these databases have to be secure. Finally, various encryption algorithms are being implemented for the networks, and groups such as the OMG (Object Management Group) are envisaging security for middleware such as ORBs (Object Request Brokers).

One of the challenges faced by web managers is implementing security policies. One may have policies for clients, servers, networks, middleware, and databases. The question is, how do you integrate these policies? That is, how do you make these policies work together? Who is responsible for implementing these policies? Is there a global administrator or are there several administrators that have to work together? Security policy integration is an area that is being examined by researchers.

Finally, one of the emerging technologies for ensuring that an organization's assets are protected is firewalls. Various organizations now have web infrastructures for internal and external use. To access the external infrastructure one has to go through the firewall. These firewalls examine the information that comes into and goes out of an organization. This way, the internal assets are protected and inappropriate information may be prevented from coming into an organization. We can expect sophisticated firewalls to be developed in the future. Other security mechanisms include cryptography.

3.3.8.4 Protecting from Bio-terrorism and Chemical Attacks

We discussed biological, chemical and nuclear threats in Section 3.3.5. In this section we discuss counter-terrorism measures. First of all, unlike, say, non-information related terrorism, where bombings and shootings are fairly explicit, bio-terrorism and even chemical attacks are not immediately obvious. Suppose a terrorist spreads the smallpox virus. It takes time, at least a few days, before the symptoms surface and a few more days before the diagnosis is made. By then it may be too late, as millions of people may have been infected in trains and planes and at large gatherings and meetings. The challenge here is to prevent as well as detect such attacks as soon as possible. Preventing such attacks could mean developing special sensors to sense that there are certain viruses in the air. The sensors may also have to detect what these viruses are. A cold virus may not be as harmful as a smallpox virus. If the disease has spread, then some quick decisions have to be made as to who and how many to vaccinate.

Chemical weapons may be treated similarly. One needs sensors to detect who has these weapons. Once the dangerous chemicals are spilt, we need to determine what other agents to spray to limit the damage caused by the chemicals. For example, when one spills acidic material, one counters it by washing with soap-based materials. In the case of nuclear attacks, we need to determine what nuclear weapons have been used and then decide what actions to take. How do we evacuate the various groups of people in an organized fashion? What medications do we give them? These are very difficult challenges. Research activities are proceeding, but it will take a very long time to find viable solutions.

3.3.8.5 Critical Infrastructure Protection

Next we discuss critical infrastructure protection. Our critical infrastructures are telecommunication lines, networks, water, food, gas and electric lines, etc. Attacking the critical infrastructure could cripple businesses and the country. We need to determine the measures to be taken when the infrastructures are attacked.

Essentially the counter-measures include those developed for non-information related terrorism as well as for information related terrorism. For example, one could bomb the telecommunication lines or create viruses that would affect the telecommunications software. This means that communication through telephones, as well as computer communication that occurs through phone lines, could be crippled. The counter-measures developed for non-information related terrorism as well as for information related terrorism could be applied here. We need to gather information about the terrorist groups and extract patterns. We also need to detect any unauthorized intrusions. Our ultimate goal is to prevent such disastrous acts.

Even biological, chemical and nuclear weapons could be used to attack the infrastructure of the nation. For example, our food supplies, water supplies and hospitals could be damaged by biological warfare. Here again we need to examine the counter-terrorism measures for biological, chemical and nuclear attacks and apply them.

3.3.8.6 Protecting from Non Real-time and Real-time Threats

In section 3.3.7 we discussed both non real-time and real-time threats. As we have mentioned, it is difficult to state that A is a real-time threat and B is a non real-time threat. Over time, a non real-time threat could become a real-time threat. Real-time threats have to be handled in real-time. An example of handling a real-time threat is detecting and preventing the spread of the smallpox virus. When it comes to counter-measures, one needs to develop techniques that meet timing constraints in order to handle real-time threats. For example, if data mining is to be used to detect and prevent malicious intrusions into, say, corporate networks, then these data mining techniques have to give results in real-time. In the case of non real-time threats, the data mining techniques could analyze the data and make predictions that certain threats could occur, say, in July 2003. In the next section we will revisit non real-time threats and real-time threats from a data mining perspective. While real-time threats need immediate response, both non real-time threats and real-time threats could be deadly and have to be taken seriously.

3.4 Data Mining Applications in Counter-Terrorism


3.4.1 Overview
In the previous section we discussed various threats and counter-measures. In particular, we discussed non information related attacks such as bombings and explosions; information related attacks such as cyber terrorism; biological, chemical and nuclear attacks such as the spread of smallpox; and critical infrastructure attacks such as attacks on power and gas lines. Counter-terrorism measures include ways of protecting from non-information related attacks, information related attacks, biological, chemical and nuclear attacks, as well as critical infrastructure attacks.

In this section we will provide a high level overview of how web data mining as well as data mining could help toward counter-terrorism. Note that we have used web data mining and data mining more or less interchangeably, as our definition of web data mining goes beyond just mining structured data. We have included mining unstructured data, mining for business intelligence, web usage mining and web structure mining as part of web data mining. That is, in a way web data mining encompasses data mining. As we have stated, data mining could contribute towards counter-terrorism. We are not saying that data mining will solve all our national security problems. However, the ability to extract hidden patterns and trends from large quantities of data is very important for detecting and preventing terrorist attacks.

The organization of this section is as follows. Section 3.4.2 provides an overview of web data mining for counter-terrorism. We will analyze the techniques in Section 3.4.3. A particular technique, called link analysis, that may be very important for counter-terrorism applications will be given more consideration in Section 3.4.4. The section is summarized in section ??.

3.4.2 Data Mining for Handling Threats

3.4.2.1 Overview

In Section 3.3 we grouped threats in different ways. One grouping was based on whether they were information related or non-information related. It was somewhat artificial, as we need information for all types of threats. However, in our terminology, information related threats were threats dealing with computers; some of these threats were real-time threats while others were non real-time threats. Even here the grouping was somewhat arbitrary, as a non real-time threat could become a real-time threat. For example, one could suspect that a group of terrorists will eventually perform some act of terrorism. However, when we set time bounds, such as that a threat will likely occur, say, before July 1, 2003, then it becomes a real-time threat and we have to take actions immediately. If the time bounds are tighter, such as that a threat will occur within two days, then we cannot afford to make any mistakes in our response.

The purpose of this section is to examine both the non real-time threats and the real-time threats and see how data mining in general and web data mining in particular could handle such threats. Again we want to stress that web data mining in our terminology encompasses data mining, as it deals with data mining on the web as well as mining structured and unstructured data. Furthermore, we are assuming that much of the data will be on the web, whether on public networks such as the Internet or private networks such as corporate intranets. Therefore, we are using the terms data mining and web data mining interchangeably.

In section 3.4.2.2 we discuss non real-time threats and in section 3.4.2.3 we discuss real-time threats. We will refer to the specific examples that we have mentioned in the previous section in our discussions as needed. Section 3.4.3 will examine the various data mining outcomes and techniques and see how they can help toward counter-terrorism.
Some very good articles on data mining for counter-terrorism have been presented at the Security Informatics Workshop held in June 2003 (see [6]).

3.4.2.2 Non Real-time Threats

Non real-time threats are threats that do not have to be handled in real-time. That is, there are no strict timing constraints for these threats. For example, we may need to collect data over months, analyze the data and then detect and/or prevent some terrorist attack, which may or may not occur. The question is, how does data mining help with such threats and attacks? As we have stressed in [14], we need good data to carry out data mining and obtain useful results. We also need to reason with incomplete data. This is the big challenge, as organizations are often not prepared to share their data. This means that the data mining tools have to make assumptions about the data belonging to other organizations. The alternative is to carry out federated data mining under some federated administrator. For example, the Homeland Security department could serve as the federated administrator and ensure that the various agencies have autonomy but at the same time collaborate when needed.

Next, what data should we collect? We need to start gathering information about various people. The question is, who? Everyone in the world? This is quite impossible. Nevertheless we need to gather information about as many people as possible, because sometimes even those who seem most innocent may have ulterior motives. One possibility is to group the individuals depending on, say, where they come from, what they are doing, who their relatives are, etc. Some people may have more suspicious backgrounds than others. If we know that someone has a criminal record, then we need to be more vigilant about that person. Again, to have complete information about people, we need to gather all kinds of information about them. This information could include information about their behavior, where they have lived, their religion and ethnic origin, their relatives and associates, their travel records, etc. Yes, gathering such information is a violation of one's privacy and civil liberties.
The question is, what alternative do we have? By omitting information we may not have the complete picture. From a technology point of view, we need complete data not only about individuals but also about various events and entities. For example, suppose I drive a particular vehicle and information is being gathered about me. This will also include information about my vehicle, how long I have driven, whether I have other hobbies or interests such as flying airplanes, and whether I have enrolled in a flight school and told the instructor that I would like to learn to fly an airplane but do not care about learning take-offs or landings.

Once the data is collected, it has to be formatted and organized. Essentially one may need to build a warehouse to analyze the data. Data may be structured or unstructured. Also, some of the data that is warehoused may not be of much use. For example, the fact that I like ice cream may not help the analysis a great deal. Therefore, we can segment the data into critical data and non-critical data.

Once the data is gathered and organized, the next step is to carry out mining. The question is what mining tools to use and what outcomes to find. Do we want to find associations or clusters? This will depend on what our goal is. We may want to find anything that is suspicious. For example, the fact that I want to learn flying without caring about take-off or landing should raise a red flag, as in general one would want to take a complete course on flying. In Section 3.4.3 we discuss the various outcomes of interest to counter-terrorism activities. Once we determine the outcomes we want, we determine the mining tools to use and start the mining process.

Then comes the very hard part. How do we know that the mining results are useful? There could be false positives and false negatives. For example, the tool could incorrectly produce the result that John is planning to attack the Empire State Building on July 1, 2003. Then the law enforcement officials will be after John and the consequences could be disastrous. The tool could also incorrectly produce the result that James is innocent when he is in fact guilty. In this case the law enforcement officials may not pay much attention to James. The consequences here could be disastrous also. As we have stated, we need intelligent mining tools. At present we need human specialists to work with the mining tools. If the tool states that John could be a terrorist, the specialist will have to do some more checking before arresting or detaining John. On the other hand, if the tool states that James is innocent, the specialist should do some more checking in this case also.

Essentially, with non real-time threats we have time to gather data, build, say, profiles of terrorists, analyze the data and take actions. Now, a non real-time threat could become a real-time threat. That is, the data mining tool could state that there could be some potential terrorist attacks. But after a while, with some more information, the tool could state that the attacks will occur between September 10, 2001 and September 12, 2001. Then it becomes a real-time threat. The challenge will then be to find exactly what the attack will be. Will it be an attack on the World Trade Center, on the Tower of London or on the Eiffel Tower? We need data mining tools that can continue with the reasoning as new information comes in. That is, as new information comes in, the warehouse needs to be updated, and the mining tools should be dynamic and take the new data and information into consideration in the mining process.

3.4.2.3 Real-time Threats

In the previous section we discussed non real-time threats, where we have time to handle the threats. In the case of real-time threats there are timing constraints. That is, such threats may occur within a certain time and therefore we need to respond to them immediately. Examples of such threats are the spread of the smallpox virus, chemical attacks, nuclear attacks, network intrusions, the bombing of a building before 9 a.m., etc. The question is, what type of data mining techniques do we need for real-time threats?

By definition, data mining works on data that has been gathered over a period of time. The goal is to analyze the data, make deductions and predict future trends. Ideally it is used as a decision support tool. However, the real-time situation is entirely different. We need to rethink the way we do data mining so that the tools can give out results in real-time.

For data mining to work effectively, we need many examples and patterns. We use known patterns and historical data and then make predictions. Often for real-time data mining as well as for terrorist attacks we have no prior knowledge. For example, the attack on the World Trade Center came as a surprise to many of us. As ordinary citizens, we could not have imagined that the buildings would be attacked by airplanes. Another good example is the recent sniper attacks in the Washington DC area. Here again many of us could never have imagined that the sniper would do the shootings from the trunk of a car. So the question is, how do we train data mining tools such as neural networks without historical data? Here we need to use hypothetical data as well as simulated data. We need to work with counter-terrorism specialists and get as many examples as possible. Once we gather the examples and start training the neural networks and other data mining tools, the question is, what sort of models do we build? Often the models for data mining are built beforehand. These models are not dynamic. To handle real-time threats, we need the models to change dynamically. This is a big challenge.

Data gathering is also a challenge for real-time data mining. In the case of non real-time data mining, we can collect data, clean data, format the data, build warehouses and then carry out mining. All these tasks may not be possible for real-time data mining, as there are time constraints. Therefore, the questions are: what tasks are critical and what tasks are not? Do we have time to analyze the data? Which data do we discard? How do we build profiles of terrorists for real-time data mining? We need real-time data management capabilities for real-time data mining.

From the previous discussion it is clear that a lot has to be done before we can effectively carry out real-time data mining. Some have argued that there is no such thing as real-time data mining and that it will be impossible to build models in real-time. Others have argued that without real world examples and historical data we cannot do effective data mining. These arguments may be true. However, our challenge is then perhaps to redefine data mining and figure out ways to handle real-time threats.

As we have stated, there are several situations that have to be managed in real-time. Examples are the spread of smallpox, network intrusions, and even analyzing data emanating from sensors.
For example, there are surveillance cameras placed in various places such as shopping centers, in front of embassies and in other public places. The data emanating from the sensors have to be analyzed, in many cases in real-time, to detect or prevent attacks. For example, by analyzing the data, we may find that there are some individuals at a mall carrying bombs. Then we have to alert law enforcement officials so that they can take action. This also raises questions of privacy and civil liberties. What alternatives do we have? Should we sacrifice privacy to protect the lives of millions of people? As stated in [12], we need technologists, policy makers and lawyers to work together to come up with viable solutions. We will revisit privacy in section 3.5.
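
To make the idea of a model that changes dynamically a little more concrete, the following is a minimal sketch of one possible approach and not a method proposed in this paper. It assumes a stream of numeric sensor readings and keeps a running mean and variance that are updated with every new reading (Welford's algorithm), so no warehouse has to be rebuilt before a reading can be scored. The names RunningStats and flag_outliers, the threshold of three standard deviations and the warm-up of five readings are all assumptions made for illustration.

```python
import math


class RunningStats:
    """Incrementally tracks the mean and variance of a stream (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until at least two readings are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_outliers(readings, threshold=3.0, warmup=5):
    """Return the indices of readings that lie more than `threshold` standard
    deviations from the mean of the readings seen before them."""
    stats = RunningStats()
    flagged = []
    for i, x in enumerate(readings):
        sd = stats.std()
        if stats.n >= warmup and sd > 0 and abs(x - stats.mean) > threshold * sd:
            flagged.append(i)
        # Assumption: every reading, flagged or not, updates the model.
        stats.update(x)
    return flagged


# Hypothetical sensor readings, invented for illustration.
print(flag_outliers([10, 11, 9, 10, 12, 10, 11, 50, 10, 9]))  # [7]
```

Each reading is scored against the model as it stood before that reading arrived, which is the property a real-time setting requires. A flagged reading is only a candidate for review by a human specialist; as discussed above, it may well be a false positive.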

3.4.3 Analyzing the Techniques

In section 3.4.2 we discussed data mining for both non real-time threats and real-time threats. As we have mentioned, applying data mining to real-time threats is a major challenge. This is because the goal of data mining is to analyze data and to predict trends. Current tools are not capable of making such predictions in real-time, although some real-time data mining tools are emerging, and some of them are listed in [16]. The challenge is to develop models in real-time as well as to obtain patterns and trends based on real world examples.

In this section we will examine the various data mining outcomes and discuss how they could be applied for counter-terrorism. Note that the outcomes include making associations, link analysis, forming clusters, classification and anomaly detection. The techniques that result in these outcomes are based on neural networks, decision trees, market basket analysis, inductive logic programming, rough sets, link analysis based on graph theory, and nearest neighbor techniques. As we have stated in [14], the methods used for data mining are either top-down reasoning, where we start with a hypothesis and then determine whether the hypothesis is true, or bottom-up reasoning, where we start with examples and then come up with a hypothesis.

Let us start with association techniques. Examples of these techniques are market basket analysis techniques. The goal is to find which items go together. For example, we may apply a data mining tool to data that has been gathered and find that John comes from Country X and that he has associated with James, who has a criminal record. The tool also outputs the result that an unusually large percentage of people from Country X have performed some form of terrorist attack. Because of the associations between John and Country X, between John and James, and between James and criminal records, one may conclude that John has to be placed under observation. This is an example of an association.

Link analysis is closely related to making associations. While association-rule based techniques are essentially intelligent search techniques, link analysis uses graph theoretic methods for detecting patterns. With graphs (i.e., nodes and links), one can follow the chain and find links. For example, A is seen with B, B is friends with C, C and D travel a lot together, and D has a criminal record. The question is, what conclusions can we draw about A? Link analysis is becoming a very important technique for detecting abnormal behavior. Therefore, we will discuss this technique in a little more detail in the next section.

Next let us consider clustering techniques.
One could analyze the data and form various clusters. For example, people with origins in country X and who belong to a certain religion may be grouped into Cluster I. People with origins in country Y and who are less than 50 years old may form Cluster II. These clusters are formed based on travel patterns, eating patterns, buying patterns or behavior patterns. While clustering divides the population without any pre-specified condition, classification divides the population based on some predefined condition. The condition is found based on examples. For example, we can form a profile of a terrorist. He could have the following characteristics: male, less than 30 years old, of a certain religion and of a certain ethnic origin. This means all males under 30 years belonging to the same religion and the same ethnic origin will be classified into this group and could possibly be placed under observation.

Another data mining outcome is anomaly detection. A good example here is learning to fly an airplane without wanting to learn to take off or land. The general pattern is that people want a complete training course in flying. However, there are now some individuals who want to learn to fly but do not care about takeoff or landing. This is an anomaly. Another example: John always goes to the grocery store on Saturdays. But on Saturday, October 26, 2002 he goes to a firearms store and buys a rifle. This is an anomaly and may need some further analysis as to why he is going to a firearms store when he has never done so before. Is it because he is nervous after hearing about the sniper shootings, or is it because he has some ulterior motive? If he is living, say, in the Washington DC area, then one could understand why he wants to buy a firearm, possibly to protect himself. But if he is living in, say, Socorro, New Mexico, then his actions may have to be followed up further.

As we have stated, all of the discussions on data mining for counter-terrorism have consequences when it comes to privacy and civil liberties. As we have mentioned repeatedly, what are our alternatives? How can we carry out data mining and at the same time preserve privacy? We revisit privacy in section 3.5.
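
The association techniques described in this section rest on two standard measures from market basket analysis: the support of a set of items (how often the items occur together) and the confidence of a rule (how often the consequent occurs given the antecedent). The sketch below shows both on a deliberately neutral shopping example. The function names, the basket contents and the minimum support of 0.5 are assumptions made for illustration, and the brute-force pair search stands in for the pruning that a real tool would perform.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def frequent_pairs(transactions, min_support):
    """All item pairs whose support is at least `min_support` (brute force)."""
    items = sorted({item for t in transactions for item in t})
    return [
        pair for pair in combinations(items, 2)
        if support(transactions, pair) >= min_support
    ]


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
print(frequent_pairs(baskets, 0.5))  # [('bread', 'butter'), ('bread', 'milk')]
print(confidence(baskets, {"bread"}, {"butter"}))
```

A rule with high confidence still describes only a co-occurrence in the data. As with the John and James example above, it is a reason for further checking and not a conclusion in itself.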

3.4.4 Link Analysis

In this section we discuss a particular data mining technique that is especially useful for detecting abnormal patterns: link analysis. There have been many discussions in the literature on link analysis. In fact, one of the earlier books on data mining, by Berry and Linoff [2], discussed link analysis in some detail. As mentioned in the previous section, link analysis uses various graph theoretic techniques. It is essentially about analyzing graphs. Note that link analysis is also used in web data mining, especially for web structure mining. With web structure mining the idea is to mine the links and extract patterns and structures about the web. Search engines such as Google use some form of link analysis for displaying the results of a search.

As mentioned in [2], the challenge in link analysis is to reduce the graphs into manageable chunks. As in the case of market basket analysis, where one needs to carry out intelligent searching by pruning unwanted results, with link analysis one needs to reduce the graphs so that the analysis is manageable and not combinatorially explosive. Therefore, results in graph reduction need to be applied to the graphs that are obtained by representing the various associations. The challenge here is to find the interesting associations and then determine how to reduce the graphs. Various graph theoreticians are working on graph reduction problems. We need to determine how to apply the techniques to detect abnormal and suspicious behavior.

Another challenge in using link analysis for counter-terrorism is reasoning with partial information. For example, agency A may have a partial graph, agency B another partial graph and agency C a third partial graph. The question is, how do we find the associations between the graphs when no agency has the complete picture? One would argue that we need a data miner that would reason under uncertainty and be able to figure out the links between the three graphs.
This would be the ideal solution, and the research challenge is to develop such a data miner. The other approach is to have an organization above the three agencies that will have access to the three graphs and make the links. One can think of this organization as the Homeland Security agency. In the next section as well as in some of the ensuing sections we will discuss various federated architectures for counter-terrorism. We need to conduct extensive research on link analysis as well as on other data and web data mining techniques to determine how they can be applied effectively for counter-terrorism. For example, by following the various links, one could perhaps trace, say, the financing of terrorist operations to the president of, say, country X.

Another challenge with link analysis, as well as with other data mining techniques, is having good data. For the domain that we are considering, however, much of the data could be classified. If we are to truly get the benefits of the techniques we need to test them with actual data. But not all researchers have the clearances to work on classified data. The challenge is to find unclassified data that is a representative sample of the classified data. It is not straightforward to do this, as one has to make sure that all classified information, even information that could be inferred through implication, is removed. Another alternative is to find as good data as possible in an unclassified setting for the researchers to work on. However, the researchers have to work not only with counter-terrorism experts but also with data mining specialists who have the clearances to work in classified environments. That is, the research carried out in an unclassified setting has to be transferred to a classified setting later to test the applicability of the data mining algorithms. Only then can we get the true benefits of data mining.
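
The two ideas in this section, following a chain of links and combining partial graphs held by different agencies, can be sketched with a plain breadth-first search. This is only an illustration under simplifying assumptions: links are undirected and certain, the partial graphs are simply unioned (which corresponds to the second approach above, an organization with access to all three graphs), and nothing here addresses reasoning under uncertainty or graph reduction. The function names and the three hypothetical edge lists are ours; the nodes A, B, C and D follow the example in Section 3.4.3.

```python
from collections import defaultdict, deque


def merge_graphs(*edge_lists):
    """Union several partial edge lists into one undirected adjacency map."""
    adjacency = defaultdict(set)
    for edges in edge_lists:
        for u, v in edges:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def shortest_chain(adjacency, start, goal):
    """Breadth-first search; returns the shortest chain of links, or None."""
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor in parent:
                continue
            parent[neighbor] = node
            if neighbor == goal:
                # Walk back through the parents to rebuild the chain.
                chain = [goal]
                while parent[chain[-1]] is not None:
                    chain.append(parent[chain[-1]])
                return chain[::-1]
            queue.append(neighbor)
    return None


# Hypothetical partial graphs, one per agency; no single agency links A to D.
agency_1 = [("A", "B")]
agency_2 = [("B", "C")]
agency_3 = [("C", "D")]

print(shortest_chain(merge_graphs(agency_1), "A", "D"))  # None
print(shortest_chain(merge_graphs(agency_1, agency_2, agency_3), "A", "D"))
# ['A', 'B', 'C', 'D']
```

With only the first agency's graph no chain from A to D exists; the chain appears only once the three graphs are combined. On realistic data the merged graph grows quickly, which is where the graph reduction problem discussed above becomes the limiting factor.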

3.5 A Note on Privacy


In section 3.4 we briefly mentioned the challenges to privacy due to data mining. There has been much debate recently among counter-terrorism experts, civil liberties unions and human rights lawyers about the privacy of individuals. That is, gathering information about people, mining information about people, conducting surveillance activities and examining, say, e-mail messages and phone conversations are all threats to privacy and civil liberties. However, what are the alternatives if we are to combat terrorism effectively? Today we do not have any effective solutions. Do we wait until privacy violations occur and then prosecute, or do we wait until national security disasters occur and then gather information? What is more important: protecting nations from terrorist attacks or protecting the privacy of individuals? This is one of the major challenges faced by technologists, sociologists and lawyers. That is, how can we have privacy but at the same time ensure the safety of nations? What should we be sacrificing and to what extent?

The challenge is to provide solutions that enhance national security but at the same time ensure privacy. There is now research at various laboratories on privacy-enhanced, sometimes called privacy-sensitive, data mining (e.g., Agrawal at IBM Almaden, Gehrke at Cornell University and Clifton at Purdue University; see for example [1,3,9]). The idea here is to continue with mining but at the same time ensure privacy as much as possible. For example, Clifton has proposed the use of the multiparty security policy approach for carrying out privacy sensitive data mining. While there is some progress, we still have a long way to go. Some useful references are provided in [3] (see also [8]).

An approach we are proposing is to process privacy constraints in a database management system. Note that one mines the data and extracts patterns and trends. The privacy constraints determine which patterns are private and to what extent.
For example, suppose one could extract names and healthcare records. If we have a privacy constraint that states that names and healthcare records are private, then this information is not released to the general public. If the information is semi-private, then it is released to those who have a need to know. Essentially, the inference controller approach we have discussed in [15] is one solution to achieving some level of privacy. It could be regarded as a type of privacy sensitive data mining. In [13] we have proposed an approach to handle privacy constraints during query, update and database design operations. Also, IBM Almaden Research Center has recently been developing a similar approach to privacy management. They call their approach Hippocratic databases (see [7]).

Note that not all approaches to privacy enhanced data mining are the same. Researchers are taking different approaches to such data mining. Some have argued that privacy enhanced data mining may be time consuming and may not be scalable. However, we need to investigate this area more before we can come up with viable solutions.
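
The names and healthcare records example can be sketched as a small release check. This is a toy illustration only; it is not the inference controller of [15] nor the constraint processor of [13], whose designs are not described here. We assume that a constraint names a set of attributes that are sensitive when they appear together, with a level of public, semi-private or private, and that private results are never released while semi-private results are released only on a need-to-know basis. The constraint on name and employer is invented to show the semi-private case.

```python
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 0
    SEMI_PRIVATE = 1
    PRIVATE = 2


# Hypothetical constraints: attributes that are sensitive when released
# together, and the privacy level assigned to that combination.
CONSTRAINTS = [
    (frozenset({"name", "healthcare_record"}), Level.PRIVATE),
    (frozenset({"name", "employer"}), Level.SEMI_PRIVATE),
]


def level_of(attributes):
    """Highest level of any constraint fully covered by `attributes`."""
    attributes = set(attributes)
    levels = [lvl for attrs, lvl in CONSTRAINTS if attrs <= attributes]
    return max(levels, default=Level.PUBLIC)


def may_release(attributes, need_to_know=False):
    """Decide whether a result containing `attributes` may be released."""
    level = level_of(attributes)
    if level == Level.PRIVATE:
        return False  # assumption: private results are never released
    if level == Level.SEMI_PRIVATE:
        return need_to_know
    return True


print(may_release({"name"}))                             # True
print(may_release({"name", "healthcare_record"}))        # False
print(may_release({"name", "employer"}, need_to_know=True))  # True
```

The check applies to combinations of attributes, not to single attributes: a name alone is released, but the same name together with a healthcare record is not. A real system would also have to consider what can be inferred by combining several separately released results.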

3.6 Summary and Directions


We first provided an overview of some of the challenges in applying data mining for counter-terrorism. These include eliminating false positives and false negatives, multimedia data mining, real-time data mining and privacy. Next we discussed various threats. That is, we provided a fairly broad overview of various aspects of threats and counter-terrorism measures. First we discussed natural disasters and human errors. Then we divided the threats into various groups including non-information related terrorism, information related terrorism and biological, chemical, and nuclear threats. We also discussed critical infrastructure threats. Next we discussed counter-terrorism measures for all types of threats. For example, we need to gather information about terrorists and terrorist groups, mine the information and extract patterns. In the case of bio-terrorism, we need to prevent terrorist attacks with, say, the use of sensors.

Next we provided a rather broad overview of data mining for counter-terrorism. We have used the terms data mining and web data mining interchangeably. Again, we can expect much of the data to be on the web, whether on the Internet or on corporate intranets, and therefore mining the data sources and databases on the web to detect and prevent terrorist attacks will become a necessity. These databases could be public databases or private databases. First we discussed data mining for non real-time threats. The idea here is to gather data, build profiles of terrorists, learn from examples and then detect as well as prevent attacks. The challenge here is to find real world examples, as in many cases a particular attack has not happened before. Next we discussed real-time data mining. Here the challenge is to build models in real-time. Finally we discussed data mining outcomes and techniques for counter-terrorism and focused on link analysis for counter-terrorism.

We are not counter-terrorism experts.
Our discussions on counter-terrorism are based on various newspaper articles and documentaries. Our goal is to explore how data mining can be exploited for counter-terrorism. We want to raise awareness that data mining could possibly help detect and prevent terrorist attacks. Again, this is a new area, and a lot of research needs to be done. It should be noted that we also need to make sure that the data mining tools produce accurate and useful results. For example, if there are false positives, the effects could be disastrous. That is, we do not want to investigate someone who is innocent. This will raise many privacy concerns. We also do not want the data mining tools to give out false negatives. We hope that this paper will spawn interesting ideas so that researchers and practitioners start or continue to work on data mining and apply the techniques for counter-terrorism.

We also provided an overview of some of the privacy concerns and discussed the directions in privacy preserving data mining and privacy constraint processing. There are many discussions now on privacy preserving approaches, and we need to continue with this research and develop viable solutions that can carry out useful mining and at the same time ensure privacy.

Data mining and web data mining technologies will have a significant impact on counter-terrorism. As we are seeing, one of the major concerns of our nation today is to detect and prevent terrorist attacks. This is also becoming the goal of many nations in the world. We need to examine the various data mining and web mining technologies and see how they can be adapted for counter-terrorism. We also need to develop special web mining techniques for counter-terrorism. As we have stressed in [14], we expect much of the data to be on the web. The web could be the Internet or intranets. Analysts will have to collaborate via the web within an agency or between agencies. Also, the founding of the Homeland Security department may have an impact on how data mining will be carried out.

In addition to improving data mining and web mining techniques and adapting them for counter-terrorism, we also need to focus on federated data mining. We can expect agencies to work together collaboratively. They will have to share the data as well as mine the data collaboratively. We can expect to see an increased interest in federated data mining. In this paper we have discussed just the high level ideas. We need to explore the details.

Some other areas of interest include multilingual data mining. Terrorism is not confined to one country and it has no borders. There is terrorism everywhere, carried out by people from different countries speaking different languages. We need technologies to understand the various languages as well as mine the text in different languages. We also need translators to translate from one language to another before mining. We also need language experts to work with technologists on multilingual data management and mining. We need to understand these languages without any ambiguity.
As we have stressed in [14], we cannot forget about privacy. National security measures will mean violating privacy and civil liberties. We cannot abandon our quest to eliminate terrorism. However, we also have to be sensitive to the privacy of individuals. This will be a major challenge. We need to develop techniques for privacy sensitive data sharing and data mining.

Disclaimer: The views and conclusions expressed in this paper are those of the author and do not reflect the policies or procedures of the MITRE Corporation or of the National Science Foundation.

Bibliography
[1] R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of the ACM SIGMOD Conference, Dallas, TX, May 2000.
[2] M. Berry and G. Linoff. Data Mining Techniques for Marketing, Sales, and Customer Support. John Wiley, New York, 1997.
[3] C. Clifton, M. Kantarcioglu, and J. Vaidya. Defining privacy for data mining. Technical report, Purdue University, 2002. (See also Next Generation Data Mining Workshop, Baltimore, MD, November 2002.)
[4] H. Ellison. Handbook of Chemical and Biological Warfare Agents. CRC Press, 1999.
[5] F. Bolz et al. The Counterterrorism Handbook: Tactics, Procedures, and Techniques. CRC Press, 2001.
[6] H. Chen et al. In Proceedings of the 1st Conference on Security Informatics, Tucson, AZ, June 2003.
[7] R. Agrawal et al. Hippocratic databases. In Proceedings of VLDB, 2002.
[8] A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke. Privacy preserving mining of association rules. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, July 2002.
[9] J. Gehrke. Research problems in data stream processing and privacy-preserving data mining. In Proceedings of the Next Generation Data Mining Workshop, Baltimore, MD, November 2002.
[10] A. Ghosh. Ecommerce Security, Weak Links and Strong Defenses. John Wiley, New York, 1998.
[11] B. Thuraisingham. Managing threats to web databases and cyber systems: Issues, solutions and challenges. In V. Kumar et al., editors, Cyber Security: Threats and Countermeasures. Kluwer.
[12] B. Thuraisingham. Data mining, national security, privacy and civil liberties. SIGKDD Explorations, January 2003.
[13] B. Thuraisingham. Privacy constraint processing in a database management system. Data and Knowledge Engineering Journal, 2003 (accepted for publication).
[14] B. Thuraisingham. Web Data Mining Technologies and Their Applications in Business Intelligence and Counter-terrorism. CRC Press, June 2003.
[15] B. Thuraisingham and W. Ford. Security constraint processing in a multilevel distributed database management system. IEEE Transactions on Knowledge and Data Engineering, April 1995.
[16] http://www.kdnuggets.com.
