Foundation of Semantic Rule Engine to Protect Web Application Attacks

Post on 22-Mar-2023


Abdul Razzaq1, Ali Hur1, Muddassar Masood1, Khalid Latif1, H Farooq Ahmad1,3, Hironao Takahashi2,3

1School of Electrical Engineering and Computer Science (SEECS) National University of Sciences and Technology, Islamabad, Pakistan

2Department of Computer Science, Tokyo Institute of Technology 2-12-1 Ookayama Meguro, Tokyo, 152-8522, Japan

3DTS Inc, Daiichjisyo 5F, 3-39-5 Higashiueno, Taitou-ku, Tokyo 110-0015 JAPAN

{abdul.razzaq, Ali.hur, muddassar.masood, khalid.latif, farooq.ahmad}@seecs.edu.pk, hiro@dts-1.com

Abstract

The exponential growth of cyber threats that has accompanied the expansion of web applications has become the biggest security concern for e-business and information-sharing communities. Current surveys show that the application layer is the most prone to web attacks. A recent survey by the National Vulnerability Database shows that, on average, 15 new vulnerabilities are released per day, which indicates that existing application security mechanisms do not provide a complete security solution. We propose an intelligent intrusion detection system (IDS) based on an ontology that specifies the different categories of attacks, the encoding schemes used by the hacker, the location of the attack, the system component affected by the attack, the protocols used, and the policies/rules for mitigating these attacks. The proposed ontology-based system can be refined and expanded over time. The system semantically analyzes the specific fields of the payload and headers where an attack is possible. The inference ability of the system provides the capability to detect zero-day and complex web application attacks that easily elude packet-level inspection. Because it analyzes only the specified fields of the protocol, the proposed system is time efficient and is able to provide significant search space reduction as well as a low false positive rate. The Protégé tool is used to describe the semantic concepts, and OWL-DL is used to describe logical classes with restrictions. For consistency checking and inference, the Pellet tool is used as the inference engine; rules are specified using the Jena API, which also provides reasoning ability.

Keywords: semantic security, web application security, ontology security

2011 10th International Symposium on Autonomous Decentralized Systems

978-0-7695-4349-9/11 $26.00 © 2011 IEEE

DOI 10.1109/ISADS.2011.19

I. INTRODUCTION

Web application security has become increasingly important in the last decade and is now one of the hottest issues, owing to the exponential increase in electronic communication with billions of users globally through a diverse range of applications. A security assessment by the Application Defense Center, which included more than 250 Web applications from e-commerce, online banking, enterprise collaboration, and supply chain management sites, concluded that at least 92% of Web applications are vulnerable to some form of attack [1]. Another survey found that about 75% of all attacks against web servers target web-based applications [2]. SQL injection and cross-site scripting are two of the most common security vulnerabilities that plague web applications nowadays [3]. On April 24, 2008, hundreds of thousands of Microsoft web servers were hacked, including several at the United Nations and in the U.K. government, through exploitation of IIS vulnerabilities [4]. According to the National Vulnerability Database (NVD), there are over 18,500 vulnerabilities in web-based applications, including 2,147 cross-site scripting (XSS), 2,757 buffer overflow and 1,600 SQL injection vulnerabilities [5].

Unchecked input is the major source of attacks at the web application level. According to the Open Web Application Security Project (OWASP) [6], four of the top ten vulnerabilities for web applications are input validation problems. A vulnerability caused by unchecked input lets the hacker inject malicious code that bypasses or modifies the originally intended functionality of the program in order to gain information, escalate privileges or obtain unauthorized access to a system. For example, in an XSS attack the user is deceived into clicking on a link that appears to point to a trusted site, e.g., http://www.citibank.com/account.html, but that contains a malicious script from the hacker, e.g., http://www.citibank.com <script>http://www.evil.com/payload?c='+document.cookie</script>, which sends the cookie information to the hacker's site. Similarly, a hacker can get root privileges by accessing the bin directory using shell, chmod, ps or other UNIX commands, e.g. http://host/cgi-bin/bad.cgi?. Sometimes the hacker uses the ~ character in a request in an attempt to guess a valid user, e.g. http://xxx.xxx.xxx.xxx/~root. In a SQL injection attack, the query is manipulated, for example: SELECT * FROM users WHERE username = 'john' AND password = 'anything' OR 'x' = 'x'; the expression 'x' = 'x' is always true, so the query returns all tuples from the USERS table. Similarly, the hacker can use DELETE, DROP or UNION commands through malicious input.

Traditional security solutions such as web scanners provide the first line of defense against web attacks and detect the "well-known" security flaws that have signatures. Scanners lack semantics and are therefore unable to make intelligent decisions about data leakage or business logic flaws [7]; they raise false alarms and fail to detect novel and critical vulnerabilities [8]. Signature-based solutions usually maintain a white list (positive security model) and a black list (negative security model), which contain the signatures of benign inputs and of malicious attack vectors respectively. These lists need constant signature updates, cannot detect zero-day attacks, and generate false positive and false negative alarms [9]. Both positive and negative security models have limitations in terms of configuration, tuning and learning capabilities [27]. Furthermore, most network solutions ignore the payload and scan only the header of the request. Because they lack contextual awareness, network solutions are ineffective at mitigating application-level attacks.

There is therefore a need for a semantic system that can intelligently understand the application's context, the data and the contextual nature of attacks. The system should validate input both syntactically and semantically. Syntax-based validation imposes size or content restrictions, whereas semantic validation focuses on specific data types, specific formats, and understanding potentially dangerous and malicious commands or content with respect to their context and consequences.
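The tautology in the SQL injection example above can be reproduced in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module. The table contents, the function names and the parameterized variant shown for contrast are illustrative assumptions and are not part of the proposed system.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory users table with hypothetical demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("john", "s3cret"), ("mary", "pa55"), ("root", "toor")],
    )
    return conn


def login_vulnerable(conn, username, password):
    """Build the query by string concatenation, as in the attack above."""
    query = (
        "SELECT * FROM users WHERE username = '" + username
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchall()


def login_parameterized(conn, username, password):
    """Bind the inputs as data so quotes cannot change the query structure."""
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchall()


conn = build_demo_db()
malicious_password = "anything' OR 'x' = 'x"

# The concatenated query becomes
#   ... WHERE username = 'john' AND password = 'anything' OR 'x' = 'x'
# and AND binds tighter than OR, so the always-true branch selects every row.
leaked_rows = login_vulnerable(conn, "john", malicious_password)

# With bound parameters the same input is compared as a literal password.
safe_rows = login_parameterized(conn, "john", malicious_password)
```

In this sketch `leaked_rows` holds every row of the table, while `safe_rows` is empty because no stored password equals the injected string.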
Our system semantically analyzes the header and payload of the specific protocol in use. Since in the present scenario almost 90% of network traffic targets HTTP, i.e. port 80, our system mainly focuses on checking HTTP headers and payload for attacks. The payload is presented as HTML or XHTML. Ontology-driven software systems are capable of expressing a shared understanding of structured information about the concepts within a specific domain, and they provide reasoning and a greater ability to analyze the information automatically. An ontology also specifies the various semantic relationships among different concepts, mitigates interoperability issues, and can be reused and evolved over time. Our system stores the concepts of different entities such as protocol, data and attacks, together with their relationships, in the form of ontologies; this supports reasoning that improves detection ability and also makes the system more efficient and robust. Specific rules are generated by capturing the context of the domain and the relationships between the entities. Our data model is implemented with ontology representation languages, namely the Resource Description Framework Schema (RDFS) and OWL-DL. The Protégé tool is used for structuring the ontologies, the Pellet inference engine is used to facilitate the inference process, and the Java-based Jena API is used for rules and reasoning. The prototype is developed in the Java NetBeans environment.

II. RELATED WORK

Various security mechanisms have been proposed for making web applications secure. The characteristics of the best-known techniques and their weaknesses are briefly discussed below.

Anomaly-based IDS

Traditional intrusion detection systems are anomaly or signature based, and neither provides a foolproof solution to application-level attacks. Anomaly-based systems analyze the input stream against an established profile and classify all abnormal behavior as malicious [10]. An anomaly model is proposed in [11] for securing a database server against SQL injection attacks. A poorly trained model reduces detection performance. Application-level IDSs are effective in detecting novel attacks but are less efficient because of a high rate of false positives [11].

Signature-based IDS

Misuse or signature-based systems hold the signatures of already known attacks or vulnerabilities and apply raw pattern matching algorithms [12] for attack detection. Signature-based detection is performed at the network level by analyzing network traffic [13] and at the application level by analyzing server logs [14]. However, these signatures are ineffective against zero-day attacks, and such systems are easily evaded through the polymorphic nature of these attacks. Because the variety of attacks increases constantly, a signature-based IDS faces a constantly growing set of signature rules and database updates, which is a tedious and time-consuming task. Snort has more than 2500 signature rules [15]. Network IDSs are also limited when working on encrypted web communications [16]. Context-based application IDSs [17] face the problem of creating signatures manually and lack protection against zero-day attacks.

Data mining or statistical-based IDS

Data Mining Methods for Anomaly Detection [18] provides a framework for web application attacks based on statistical techniques, but the framework lacks the semantics to analyze a malicious payload on a contextual basis and thus fails by assigning equal probabilities to an attack string and a benign string of equal length. The network-based IDSs in [19] and [20] use data mining techniques, but these techniques consider only character frequencies and occurrence probabilities in malicious data and lack the semantics to understand the contextual nature of an attack and its consequences.

Ontology-based IDS

Ontology-based solutions are used in information security. Raskin et al. [21] developed an ontology for the data integrity of web resources, and Denker et al. [22] derived access control from an ontology developed in DAML+OIL [23], but these ontologies have not been fully utilized because of their simple representation of attack attributes and are thus inefficient for intrusion detection. In [28] a better approach was developed that uses an ontology to capture the domain knowledge of the application. Similarly, [24, 25] adopted a good approach but carry overhead because they lack search space reduction. Their solutions are given in a general form and ignore the most important web application attacks, such as XSS and SQL injection. Our system reuses and modifies the taxonomy of [24] to develop a comprehensive ontology of attacks.

III. SYSTEM ARCHITECTURE

The security mechanism acts like a proxy deployed in front of the web server to protect the target application. A user request is analyzed by the proposed system, deployed on the web server, before it is delivered to the web application, as shown in Figure 1.

[Figure 1 diagram: Client, Proposed System, Web Server and Database, connected by request and response arrows]

Figure.1. How the IDS works

The system is composed of the following modules: System Analyzer, Knowledge Base, Ontology Generator, Inference Engine and Rule Engine. A high-level view of the system architecture is given in Figure 2. A brief description of each component is given in the following subsections.

[Figure 2 diagram: surviving labels are Knowledge Base and Input Network Stream]

Figure.2. System architecture

Ontology Generator

This module of the system comprises two layers: the Conceptualization of Domain layer and the Ontology layer. All concepts (classes) are defined using the Protégé API with the help of the Protégé GUI. Concepts can be represented in RDF (Resource Description Framework) format. Using the OWL (Web Ontology Language) plug-in, the RDF format can be represented in OWL-DL, which is more amenable to machine processing. The ontology file is stored with the .owl extension and is accessible on the Java platform through the Jena API.

Inference Engine

The inference engine detects inconsistencies among the different ontologies in the knowledge base and keeps the knowledge base consistent and up to date. It also provides the classification hierarchies and computes inferences among concepts that are not explicitly mentioned in the domain ontology. For example, the referrer field, which is part of the header of an HTTP request, is not usually checked for malicious script, but the system's inference engine, by traversing the relationships, will also check the referrer field, as shown in Figure 3. This mechanism shows how our system intelligently analyzes the packet for possible attacks. We have used the Pellet inference engine for the implementation.

Rule Engine

The rule engine layer generates appropriate rules for a specific request after inference over the knowledge base. It processes the assertions and rules and also deduces new rules and statements. For the implementation we have used the Java-based Jena rule engine.
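The referrer-field inference described for the Inference Engine can be illustrated with a small forward-chaining sketch. It is written in plain Python rather than with Pellet or Jena, and the fact names (`ReferrerField`, `contains`, `mayCarry` and so on) are hypothetical stand-ins, not identifiers from the authors' ontology files.

```python
# Hypothetical facts as (subject, predicate, object) triples.
FACTS = {
    ("StartLine", "contains", "URL"),
    ("ReferrerField", "contains", "URL"),
    ("URL", "contains", "QueryString"),
    ("QueryString", "mayCarry", "MaliciousScript"),
    ("HostField", "contains", "HostName"),
}


def infer(facts):
    """Apply one rule until nothing new is derived:
    if X contains Y and Y mayCarry Z, then X mayCarry Z."""
    known = set(facts)
    while True:
        derived = {
            (x, "mayCarry", z)
            for (x, p1, y) in known if p1 == "contains"
            for (y2, p2, z) in known if p2 == "mayCarry" and y2 == y
        }
        if derived <= known:
            return known
        known |= derived


def fields_to_check(facts, threat="MaliciousScript"):
    """List every element that can carry the threat, stated or inferred."""
    return sorted(
        s for (s, p, o) in infer(facts) if p == "mayCarry" and o == threat
    )


checked = fields_to_check(FACTS)
```

Only `QueryString` is stated explicitly as able to carry a malicious script. The rule propagates that property to `URL`, and from there to `StartLine` and `ReferrerField`, while `HostField` is left out because it never contains a query string in these assumed facts.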

Figure.3. Inference on Referrer field containing URL, Query string and malicious code

Systems Analyzer

The Analyzer is the most important component of our system. It analyzes user input for suspected malicious content based on the information stored in the knowledge base. The Analyzer works according to the following algorithm:

- Analyze the captured network stream
- Filter out the HTTP packets
- Parse the message on the basis of the HTTP ontology stored in the knowledge base
- Perform Message-Header checking
  o Read the URL in the start line and in the Referrer field
  o Check only the query string portion (the portion after the ? mark)
  o Analyze the values of the parameters for malicious script
  o Generate an alert in case of a malicious value
- Perform Message-Body checking
  o Read the URL, form entry points, and HTML tags
  o Check only the query string portion (the portion after the ? mark)
  o Analyze the values of the parameters in form fields for malicious script
  o Analyze the href, image, input, anchor and src tags
  o Generate an alert in case of a malicious value
- If no malicious activity is found, the packet is delivered to the web application for retrieval of the requested data.
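The Message-Header checking steps of the algorithm above can be sketched as follows. This is a simplified illustration only: the regular expressions in `SUSPICIOUS` are hypothetical indicators chosen for the example, whereas the proposed system derives its checks from the knowledge base. The HTTP header is spelled `Referer` on the wire, so the sketch accepts both spellings.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical indicator patterns, used only for this example.
SUSPICIOUS = [
    re.compile(r"<\s*script", re.IGNORECASE),                # script injection
    re.compile(r"'\s*or\s*'[^']*'\s*=\s*'", re.IGNORECASE),  # SQL tautology
    re.compile(r"\b(union|drop|delete)\b", re.IGNORECASE),   # SQL keywords
]


def query_values(url):
    """Return only the decoded parameter values after the '?' mark."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return [value for _, value in pairs]


def check_headers(raw_request):
    """Check the start-line URL and the Referer header of one HTTP request."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    urls = []
    parts = lines[0].split(" ")
    if len(parts) >= 2:
        urls.append(("start-line", parts[1]))
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() in ("referer", "referrer"):
            urls.append(("referer", value.strip()))
    alerts = []
    for location, url in urls:
        for value in query_values(url):
            if any(pattern.search(value) for pattern in SUSPICIOUS):
                alerts.append((location, value))
    return alerts


request = (
    "GET /search?q=%3Cscript%3Ealert(1)%3C%2Fscript%3E HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Referer: http://example.com/login?user=john&pw=x'%20OR%20'x'%3D'x\r\n"
    "\r\n"
)
alerts = check_headers(request)
```

Restricting the scan to parameter values mirrors the search space reduction the algorithm relies on: the path, the host name and the other headers are never matched against the patterns.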

The Analyzer accepts the network stream, captures each HTTP message, and parses the message on the basis of the HTTP ontology stored in the knowledge base. Each HTTP message has two sections, i.e., header and payload. The header section contains zero or more headers, which carry control information and information about the payload. The Analyzer interacts with the inference engine in order to analyze the captured payload. If the packet is found to be benign it is allowed; otherwise it is blocked and a proper error message is displayed to either the client or the administrator. Messages are analyzed for malicious content. Each part of the message can participate in identifying different attacks. Each section is analyzed only for a subset of attacks, which substantially reduces the search space and ultimately results in efficient working of the IDS. The payload is also parsed according to its tags, and each tag is then further analyzed for possible malicious content. Here again the intelligent analysis helps in efficient processing of the payload, by processing only those parts of the payload that can carry malicious content for the specific attack under consideration.
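The tag-by-tag payload analysis described above could look roughly like the following sketch, which uses Python's standard `html.parser` module. The class name, the attribute map and the alert format are assumptions made for illustration; the .jpg/.gif restriction follows the policy the paper states for image tags.

```python
from html.parser import HTMLParser

ALLOWED_IMAGE_EXTENSIONS = (".jpg", ".gif")
# Hypothetical map of tags to the attribute that holds a URL.
URL_ATTRIBUTES = {"a": "href", "img": "src", "form": "action"}


class PayloadChecker(HTMLParser):
    """Collect alerts for tags that can carry malicious content."""

    def __init__(self):
        super().__init__()
        self.alerts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script":
            self.alerts.append(("script", "inline script element"))
        attr_name = URL_ATTRIBUTES.get(tag)
        value = (attributes.get(attr_name) or "") if attr_name else ""
        lowered = value.strip().lower()
        # Script smuggled into a URL-bearing attribute.
        if lowered.startswith("javascript:") or "<script" in lowered:
            self.alerts.append((tag, value))
        # Image sources must end in one of the allowed extensions.
        if tag == "img" and value:
            path = lowered.split("?", 1)[0]
            if not path.endswith(ALLOWED_IMAGE_EXTENSIONS):
                self.alerts.append((tag, value))


def check_payload(html):
    """Parse an HTML payload and return the list of (tag, value) alerts."""
    checker = PayloadChecker()
    checker.feed(html)
    checker.close()
    return checker.alerts


page = (
    '<a href="javascript:steal()">offer</a>'
    '<img src="logo.gif"><img src="run.exe">'
    '<form action="/login"><input type="hidden" name="t" value="1"></form>'
)
payload_alerts = check_payload(page)
```

Only the anchor, image and form tags are inspected for URLs here, so the rest of the markup is skipped, in the spirit of processing only the parts of the payload that can carry an attack.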

Figure.4. Working of Analyzer

Overall, the system analyzes the input semantically and syntactically, as specified in the Policy ontology. It also checks hidden field values, the URI of each page, the Method attribute (GET or POST), the Action attribute indicating the proper destination URI, and the Href tag for malicious script. Image tags are analyzed for the .jpg or .gif extension, and Input fields are checked for all the information given in the input parameter tags, such as size, name, maxLength, and their type (checkbox, radio button, hidden field, etc.).

Knowledge Base

The knowledge base contains knowledge about the different categories of attacks, the encoding schemes used by the hacker, the location of the attack, the system component affected by the attack, the specification of the protocols used, and the policies/rules for mitigating these attacks, in the form of an ontology that can be refined and expanded over time. Issues such as the knowledge acquisition problem and the verification and validation of knowledge have been considered. An ontological representation of knowledge provides many benefits over simple string matching techniques and mitigates attacks through reasoning and intelligent decisions.

The knowledge base contains the top-level class System, which has the property victimOf, defined by the class Attack, as shown in Figure 5. The class Attack has the properties receivedFrom, directedTo, categoriesBy, resultingIn, using, deceivedBy and mitigatedBy, which are defined by the classes Port Number, System Components, Attack Type, Consequence, Protocol, Encoding Scheme and Policies respectively. These classes are further subclassified; only the important ones are discussed here. The class Attack Type is subclassified into Traceable attack and Untraceable attack. The subclass Traceable attack is further subclassified into Known software implementation error, Http specification error and Misconfiguration error. Each of these subclasses is subclassified in turn; for example, the Known software implementation error class has the classes Race condition exploit, Heap overflow exploit, Stack overflow exploit and Format string exploit.
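The Attack Type hierarchy just described can be written down as a nested structure to show how a subclass test works. This is a plain Python sketch, not OWL. The class names are the ones listed in the paragraph above with spelling normalized, and the helper functions are hypothetical.

```python
# Each key is a class; its value holds the subclasses named in the text.
ATTACK_TAXONOMY = {
    "Attack Type": {
        "Traceable attack": {
            "Known software implementation error": {
                "Race condition exploit": {},
                "Heap overflow exploit": {},
                "Stack overflow exploit": {},
                "Format string exploit": {},
            },
            "Http specification error": {},
            "Misconfiguration error": {},
        },
        "Untraceable attack": {},
    }
}


def path_to(name, tree, trail=()):
    """Return the chain of classes from the root down to `name`, or None."""
    for label, children in tree.items():
        here = trail + (label,)
        if label == name:
            return here
        found = path_to(name, children, here)
        if found:
            return found
    return None


def is_subclass_of(name, ancestor, tree=ATTACK_TAXONOMY):
    """True if `ancestor` appears above `name` in the hierarchy."""
    path = path_to(name, tree)
    return path is not None and ancestor in path[:-1]


heap_path = path_to("Heap overflow exploit", ATTACK_TAXONOMY)
```

A reasoner computes this kind of classification hierarchy automatically from the OWL-DL definitions; the sketch only makes the resulting parent and child relationships concrete.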

Figure.5. Ontology of Attack

The Untraceable attack class is our main focus. It is subclassified into Forgery of client code, DOS and Malicious Input (input validation attack). The subclasses of Forgery of client code and Malicious Input are shown in Figure 6. Similarly, the Encoding Scheme class has the subclasses UTF-8, UTF-16, ASCII, Base 16 and Hex; the Consequence class has the subclasses Access Sensitive Data, Denial of Service, Authentication Bypass (Remote to local), Authorization Bypass (User to root), Probe and Crash server/application; and the System Component class has the subclasses Browser, Web Server, Server-side script preprocessor and Database. Method has the subclasses Get, Post, Delete, Put and Trace. The subclasses of Protocol, of HTTP Message Structure and URL, and of Policies are shown in Figures 7, 8 and 9 respectively.

The logical construct about an attack can be expressed as: Attack (SQL Injection) is Received from Port Number (80), Directed to System Component (Database), Using the Protocol (Http), Deceived by Encoding Scheme (UTF-8), Resulting in Consequence (Authentication bypass), Categorized into Untraceable attack (malicious input) and Mitigated by Policy (Rule1). Web applications normally use HTTP as their communication protocol, so the system has knowledge of the specific attributes related to the HTTP protocol. An HTTP request is composed of headers and payload. The HTTP payload is specified in a web presentation language, i.e. HTML. The relationship between HTTP and HTML is used in the context of the target application, which helps in identifying attacks more efficiently and enhances detection ability.
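The logical construct for SQL Injection given above maps naturally onto subject, predicate, object triples. The sketch below stores it that way in plain Python. The property names are the ones the paper lists for the class Attack, while the second attack entry, the identifier spellings and the helper functions are hypothetical additions for illustration.

```python
ATTACK_FACTS = [
    # From the example sentence in the text.
    ("SQLInjection", "receivedFrom", "Port80"),
    ("SQLInjection", "directedTo", "Database"),
    ("SQLInjection", "using", "HTTP"),
    ("SQLInjection", "deceivedBy", "UTF-8"),
    ("SQLInjection", "resultingIn", "AuthenticationBypass"),
    ("SQLInjection", "categoriesBy", "MaliciousInput"),
    ("SQLInjection", "mitigatedBy", "Rule1"),
    # Hypothetical second attack, added only so the query has a contrast.
    ("XSS", "directedTo", "Browser"),
    ("XSS", "using", "HTTP"),
]


def describe(attack, facts=ATTACK_FACTS):
    """Collect all property values recorded for one attack."""
    return {p: o for (s, p, o) in facts if s == attack}


def attacks_matching(facts=ATTACK_FACTS, **constraints):
    """Return attacks whose properties satisfy every given constraint."""
    names = sorted({s for (s, _, _) in facts})
    return [
        name for name in names
        if all(describe(name, facts).get(p) == o for p, o in constraints.items())
    ]


sql_profile = describe("SQLInjection")
database_attacks = attacks_matching(directedTo="Database", using="HTTP")
```

Querying by property, as `attacks_matching` does, is what lets each part of a message be checked only against the subset of attacks that can target it.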

Figure.6. Ontology of malicious Input

Figure.7. Ontology of Protocol

Figure.8. Ontology of Http and URL

Figure.9. Ontology of Policies and Rule class

IV. EVALUATION

The system is efficient because it analyzes and searches for attack patterns in specific portions of packets rather than considering the entire payload. Focusing only on the specific portion of input where an attack or exploit is possible reduces the search space and saves processing time. For example, in a URL it focuses on the parameter values of the query string. Secondly, the whole ontology of attacks, which comprises 1000s of classes, can be inferenced in limited time. For example, the Pellet inference engine checks for inconsistency in 0.063 seconds, establishes the class hierarchy in 0.234 seconds and computes inferences in 0.016 seconds, as shown in Figure 10.


Figure.10. Inferencing through Pellet

Thirdly, we have focused on the HTTP protocol, which is widely deployed on the web (more than 80% of traffic is based on HTTP), so more attacks and security threats can be considered and mitigated.

Figure.11. Maximum usage of the HTTP protocol on the web

Focusing only on HTTP also reduces the response time compared with considering the whole network stream. The lower line in Figure 12 indicates the response time of HTTP traffic.

Figure.12. Traffic Analysis

We make the following assumptions, using the criteria mentioned in [26], while calculating the detection rate and the false alarm rate:

False Positive = FP: the total number of normal records that are classified as anomalous
False Negative = FN: the total number of anomalous records that are classified as normal
Total #Normal = TN: the total number of normal records
Total #Attack = TA: the total number of attack records
Detection Rate = [(TA - FN) / TA] * 100
False Alarm Rate = [FP / TN] * 100
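The two formulas above translate directly into code. The sketch below is a straightforward transcription; the function names are hypothetical, and the counts in the usage lines are made-up numbers chosen for the arithmetic, not results from the paper.

```python
def detection_rate(total_attack, false_negative):
    """Detection Rate = [(TA - FN) / TA] * 100."""
    if total_attack <= 0:
        raise ValueError("total_attack must be positive")
    return 100 * (total_attack - false_negative) / total_attack


def false_alarm_rate(false_positive, total_normal):
    """False Alarm Rate = [FP / TN] * 100, where TN is the total number
    of normal records (not the count of true negatives)."""
    if total_normal <= 0:
        raise ValueError("total_normal must be positive")
    return 100 * false_positive / total_normal


# Hypothetical counts: 200 attack records with 4 missed,
# 150 normal records with 3 flagged as anomalous.
example_detection = detection_rate(total_attack=200, false_negative=4)
example_false_alarm = false_alarm_rate(false_positive=3, total_normal=150)
```

With these assumed counts the detection rate is 98.0 and the false alarm rate is 2.0. Note that TN in this notation is a record total, which differs from the usual confusion-matrix meaning of the abbreviation.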

Figure.13. System shows the False Alarm Rate and Detection Rate of different Attacks

[Chart: x-axis "Attack Type" with categories XSS, SQL Injection, Buffer overflow, Directory traversal, Form tampering, Worm and DOS; axis label "Detection Rate"; legend: #Normal Record, #Attack Record, False Positive, False Negative, Detection Rate, False Alarm Rate]

Figure.14. System shows negligible False Alarm Rate and maximum Detection Rate

V. CONCLUSIONS AND FUTURE WORK

In this paper we have presented a brief overview of various security techniques. Because of the increasing security concern for web applications, the future survival of e-business organizations depends on effective security measures at the application level. We have critically studied the existing techniques and concluded that a semantic-based intrusion detection system, capable of making intelligent decisions based on the context of the target domain, is required. The proposed system uses an ontological representation of the target domain. It contains knowledge of the attacks, the protocol, the data and the target application, which enhances detection capability and results in efficient working of the intrusion detection system. OWL-DL, the Pellet inference engine and the Jena API have been used for the system implementation. In future work we intend to turn the whole system into a product that is open source for the whole community, for security protection and further security development.

VI. REFERENCES

[1] WebCohort, Inc. "Only 10% of Web applications are secured against common hacking techniques", http:/ /2004-feb-02.html, 2004.

[2] G. Hulme. "New software may improve application security". http://www.informationweek.com, 2001.

[3] Monica S. Lam, Michael Martin, Benjamin Livshits, and John Whaley. "Securing Web Applications with Static and Dynamic Information Flow Tracking". PEPM'08, January 7-8, 2008, San Francisco, California, USA.

[4] http://www.shadowserver.org/wiki/pmwiki.php?n=Calendar.20080424

[5] National Vulnerability Database (NVD), http://nvd.nist.gov.

[6] Open Web Application Security Project. "The ten most critical Web application security vulnerabilities", http://umn.dl.sourceforge.net/sourceforge/owasp/OWASPTopTen2004.pdf.

[7] "Analyzing the Effectiveness and Coverage of Web Application Security Scanners", 2007, http://searchsecurity.techtarget.com/tip.

[8] Jeff Forristal and Greg Shipley. "Vulnerability Assessment Scanners". http://www.networkcomputing.com

[9] Scott Matsumoto. "Mitigating XSS - Why Input Validation is Bogus", http://www.cigital.com/justiceleague/2007/08/10/mitigating-xss-why-input-validation-is-bogus

[10] C. Kruegel, T. Toth, and E. Kirda. "Service-specific Anomaly Detection for Network Intrusion Detection".

[11] Frank S. Rietta. "Application Layer Intrusion Detection for SQL Injection". ACM SE'06, March 10-12, 2006, Melbourne, Florida, USA.

[12] R. Boyer and J. Moore. "A fast string-searching algorithm". Communications of the ACM, 1977.

[13] R. Liu, N. Huang, C. Kao, C. Chen, C. Chou. "A Fast Pattern-Match Engine for Network Processor-based Network Intrusion Detection System". Proceedings of the International Conference on Information Technology, IEEE, Vol. 1, pp. 97-101.

[14] T. Ryutov, C. Neuman, D. Kim, L. Zhou. "Integrated Access Control and Intrusion Detection for Web Servers". IEEE Transactions on Parallel and Distributed Systems, Vol. 14, No. 9, September 2003.

[15] M. Roesch. "Snort – Lightweight Intrusion Detection for Networks". Proceedings of the USENIX LISA'99 Conference, November 1999.

[16] Xin Zhao and Atul Prakash. "WSF: An HTTP-level Firewall for Hardening Web Servers".

[17] A. Anitha and V. Vaidehi. "Context based Application Level Intrusion Detection". IEEE International Conference on Networking and Services (ICNS'06), 2006.

[18] Xiao-Feng Wang, Jing-Li Zhou, Sheng-Sheng Yu, and Long-Zheng Cai. "Data Mining Methods for Anomaly Detection of HTTP Request Exploitations". Springer-Verlag Berlin Heidelberg, 2005.

[19] Y.B. Reddy, R. Guha. "Intrusion Detection using Data Mining Techniques". Artificial Intelligence and Applications (AIA-2004), pp. 232-241, 2004.

[20] S. Stolfo, A.L. Prodromidis, S. Tselepis, W. Lee, D.W. Fan, P.K. Chan. "JAM: Java Agents for Meta-Learning over Distributed Databases". Proceedings of KDD-97, pp. 74-81, 1997.

[21] V. Raskin, C.F. Hempelmann, K.E. Triezenberg, Nirenburg. "Ontology in Information Security: A Useful Theoretical Foundation and Methodological Tool". Proceedings of the 2001 Workshop on New Security Paradigms (NSPW-2001), pp. 53-59, 2001.

[22] G. Denker, L. Kagal, T. Finin, M. Paolucci, K. Sycara. "Security for DAML Web Services: Annotation and Matchmaking". The Semantic Web (ISWC 2003), LNCS 2870, Springer, 2003.

[23] DAML+OIL. Available at: http://www.daml.org/2000/12/daml+oil.dam.

[24] J. Undercoffer, J. Pinkston, A. Joshi, T. Finin. "Target-Centric Ontology for Intrusion Detection". IJCAI Workshop on Ontologies and Distributed Systems (IJCAI'03), August 2003.

[25] Shao-Shin Hung and Damon Shing-Min Lio. "A User-centric Intrusion Detection System by using Ontology Approach". JCIS 2006, http://dx.doi.org/10.2991/jcis.2006.

[26] S.T. Sarasamma, Q.A. Zhu, and J. Huff. "Hierarchical Kohonenen Net for Anomaly Detection in Network Security". IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35(2), 2005, pp. 302-312.

[27] Jim Beechey. "Web Application Firewalls: Defense in Depth for Your Web Infrastructure". March 2009.

[28] Abdul Razzaq, Ali Hur, Hafiz Farooq Ahmad, Nasir Haider. "Ontology Based Application Level Intrusion Detection System by using Bayesian Filter". The 2nd IEEE International Conference on Computer, Control & Communication (IEEE-IC4) 2009, PNEC Karachi, Pakistan.