UTBox is a set of building blocks for Splunk specially created for URL manipulation.
UTBox has been created to be modular, easy to use and easy to deploy in any Splunk environment. It only needs to be deployed on Splunk Search Heads and the bundles will automatically be sent to your Splunk Indexers.
One of the core features of UTBox is correct parsing of URLs and complicated TLDs (Top Level Domains) using the Mozilla Suffix List. Other functions such as Shannon entropy, counting, suites, meaning ratio, Bayesian analysis, etc., are also available.
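To illustrate why a suffix list is needed for multi-label TLDs such as co.uk, here is a minimal Python sketch of longest-suffix matching. It is not UTBox's implementation: the SUFFIXES set and the split_domain helper are hypothetical, and the real Mozilla list is far larger and includes wildcard and exception rules that this sketch ignores.

```python
# Hypothetical, tiny excerpt of a suffix list (assumption for illustration only).
SUFFIXES = {"com", "org", "uk", "co.uk"}


def split_domain(host):
    """Return (subdomain, registered_domain, suffix) using the longest matching suffix."""
    labels = host.split(".")
    # Start at index 1 so the whole host is never treated as a suffix;
    # earlier indexes give longer candidates, so the first hit is the longest match.
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            registered = ".".join(labels[i - 1:])
            subdomain = ".".join(labels[:i - 1])
            return subdomain, registered, candidate
    # No known suffix: return the host unchanged.
    return "", host, ""


print(split_domain("www.example.co.uk"))  # ('www', 'example.co.uk', 'co.uk')
```

A naive split on the last dot would report "uk" as the TLD and "co.uk" as the domain, which is why the longest match is preferred here.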
UTBox was first created for security analysts but may fit other needs, since it is a set of building blocks. Enterprise Security users will need to modify the import statement to use UTBox.
You should also take a look at URLParser for efficient URL parsing: https://splunkbase.splunk.com/app/3396/
🐞 For assistance, create an issue at: https://github.com/splunk/utbox/issues/new
Maintainer: GSS FDSE @ Splunk
Code Committers: FDSE, Daniel, Mayur, Cedric, and Ian.
Documentation
This tool ships with embedded documentation, available after installation at $SPLUNK_HOME/etc/apps/utbox/appserver/static/documentation.pdf
What is what?
The syntax of a URL is as follows:
scheme://[user:password@]domain:port/path?query_string#fragment_id
Component details:
- The scheme, which in many cases is the name of a protocol (but not always), defines how the resource will be obtained. Examples include http, https, ftp, file and many others.
- The domain name or literal numeric IP address gives the destination location for the URL.
- The port number, given in decimal, is optional; if omitted, the default for the scheme is used (80 for http, 443 for https, etc).
- The path is used to specify and perhaps find the resource requested.
- The query string contains data to be passed to software running on the server. It may contain name/value pairs separated by ampersands, for example ?first_name=John&last_name=Doe.
- The fragment identifier, if present, specifies a part or a position within the overall resource or document. When used with HTML, it usually specifies a section or location within the page; combined with anchor elements or the "id" attribute of an element, it makes the browser scroll to display that part of the page.
Source: http://en.wikipedia.org/wiki/Uniform_resource_locator
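The components listed above can be seen by splitting a sample URL with Python's standard library. This is only an illustration of the URL syntax, not how UTBox parses URLs; the sample URL is made up.

```python
from urllib.parse import urlsplit, parse_qs

# Made-up URL that contains every component described above.
url = "https://user:secret@www.example.com:8443/docs/index.html?first_name=John&last_name=Doe#intro"

parts = urlsplit(url)
print(parts.scheme)    # https
print(parts.username)  # user
print(parts.password)  # secret
print(parts.hostname)  # www.example.com
print(parts.port)      # 8443
print(parts.path)      # /docs/index.html
print(parts.query)     # first_name=John&last_name=Doe
print(parts.fragment)  # intro

# The query string decodes into name/value pairs (values are lists).
params = parse_qs(parts.query)
print(params["first_name"])  # ['John']
```

Note that urlsplit does not know about public suffixes, so it cannot separate the subdomain, domain and TLD inside the host name.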
List of provided Lookups
For more information, please refer to the embedded documentation.
- ut_parse_simple(url)
- ut_parse(url, list) or ut_parse_extended(url, list)
- ut_shannon(word)
- ut_countset(word, set)
- ut_suites(word, sets)
- ut_meaning(word)
- ut_bayesian(word)
- ut_levenshtein(word1, word2)
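As a rough idea of what an entropy score such as the one returned by ut_shannon(word) measures, here is a minimal Python sketch of Shannon entropy over the characters of a word. The function name, the per-character counting and the base-2 logarithm are assumptions made for illustration; the embedded documentation describes what UTBox actually computes.

```python
import math
from collections import Counter


def shannon_entropy(word):
    """Shannon entropy of the character distribution of word, in bits per character."""
    if not word:
        return 0.0
    counts = Counter(word)
    total = len(word)
    # H = -sum(p * log2(p)) over the relative frequency p of each distinct character.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A random-looking label scores higher than a repetitive one.
print(shannon_entropy("x7f9q2kd"))
print(shannon_entropy("aaaaaaaa"))
```

Higher values indicate a more even spread of characters, which is one signal analysts use when looking for machine-generated domain names.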
Lookup & Macros
A generic lookup call in Splunk is of the format:
... | lookup <lookup_name> field AS field
For example:
... | lookup ut_parse_simple_lookup url AS cs_uri
UTBox also provides a macro definition for each lookup to make the lookups easier to call. In the previous example, the call would be:
... | `ut_parse_simple(cs_uri)`
It is important to understand that these macros are simply shortcuts for lookup calls. Use whichever form you prefer.
History
- v1.6, April 2016
- new feature: the list parameter now accepts a star (*) to load all lists (Mozilla, IANA, and Custom) in order to return the longest matching TLD.
- Thanks to @davelugo for the idea!
- v1.5, December 2015 (important changes)
- new feature: users can choose which list of TLDs to load (2 provided by default, Mozilla Suffix List and IANA List)
- ut_parse_extended now requires 2 arguments (the url to parse and the list to use: 'mozilla', 'iana' or 'custom')
- ut_parse, which is mapped to ut_parse_extended, requires the same arguments.
- v1.4, November 2015
- Fix incorrect parsing for hosts having a port specified (e.g. tcp://host.tld:443/)
- v1.3, September 2015
- Hosts are no longer lowercased (useful when dealing with Base64-encoded data).
- v1.2, May 2015
- v1.1, May 2015