Log File Categorization and Anomaly Analysis Using Grammar Inference
Memon, Ahmed Umar
MetadataShow full item record
In the information age of today, vast amounts of sensitive and confidential data is exchanged over an array of different mediums. Accompanied with this phenomenon is a comparable increase in the number and types of attacks to acquire this information. Information security and data consistency have hence, become quintessentially important. Log file analysis has proven to be a good defense mechanism as logs provide an accessible record of network activities in the form of server generated messages. However, manual analysis is tedious and prohibitively time consuming. Traditional log analysis techniques, based on pattern matching and data mining approaches, are ad hoc and cannot readily adapt to different kinds of log files. The goal of this research is to explore the use of grammar inference for log file analysis in order to build a more adaptive, flexible and generic method for message categorization, anomaly detection and reporting. The grammar inference process employs robust parsing, islands grammars and source transformation techniques. We test the system by using three different kinds of log file training sets as input and infer a grammar and generate message categories for each set. We detect anomalous messages in new log files using the inferred grammar as a catalog of valid traces and present a reporting program to extract the instances of specified message categories from the log files.