NLP Text Analytics

Overview

Details

Have you ever wanted to perform advanced text analytics inside Splunk? Splunk has some ways to handle text, but it lacks some of the more advanced features that NLP libraries offer. Those features can also benefit use cases that involve Splunk's Machine Learning Toolkit (https://splunkbase.splunk.com/app/2890/). The intent of this app is to provide a simple interface for analyzing text in Splunk using Python natural language processing libraries (currently just NLTK 3.4.5) and Splunk's Machine Learning Toolkit. The app provides custom commands, plus dashboards that show how to use them.

See the related Splunk blog: https://www.splunk.com/blog/2019/04/11/let-s-talk-about-text-baby.html

Version: 1.1.0

Author: Nathan Worsham
Created for MSDS692 Data Science Practicum I at Regis University, 2018
See the associated blog for detailed information on the project's creation.

Update
Additional content (combined features algorithms) was created for MSDS696 Data Science Practicum II at Regis University, 2018.
See the associated blog for detailed information on the project's creation, as well as the associated Splunk blog.
This app was part of the basis for a breakout session I was lucky enough to present at Splunk Conf18: "Extending Splunk MLTK using GitHub Community."
Session Slides
Session Recording

Description and Use-cases

Requirements

Splunk ML Toolkit 3.2 or greater https://splunkbase.splunk.com/app/2890/
Wordcloud Custom Visualization https://splunkbase.splunk.com/app/3212/ (preferred) OR Splunk Dashboard Examples https://splunkbase.splunk.com/app/1603/
Parallel Coordinates Custom Visualization https://splunkbase.splunk.com/app/3137/
Force Directed App For Splunk https://splunkbase.splunk.com/app/3767/
Halo - Custom Visualization https://splunkbase.splunk.com/app/3514/
Sankey Diagram - Custom Visualization https://splunkbase.splunk.com/app/3112/

How to use

Install

Follow the normal app installation process described at https://docs.splunk.com/Documentation/AddOns/released/Overview/AboutSplunkadd-ons. In short, download the app and install it from the Web UI, or extract the file into the $SPLUNK_HOME/etc/apps folder.

Example Texts

The app comes with example Gutenberg texts formatted as CSV lookups, along with the popular "20 newsgroups" dataset. Load them with the syntax | inputlookup <filename.csv>, for example | inputlookup moby_dick.csv.

Text Names
20newsgroups.csv
moby_dick.csv
peter_pan.csv
pride_prejudice.csv

Detailed Documentation

Documentation for the app will be kept up to date on GitHub. Because of Splunkbase character limits, it is recommended to view the documentation on GitHub.

Summary Documentation

Custom Commands

bs4

Description

A wrapper script that brings some of BeautifulSoup4's functionality to Splunk, for extracting HTML/XML tags and the text inside them. By default it gets the text and sends it to a new field, 'get_text'; otherwise the selection is returned in a field named 'soup'. The default parser is 'lxml', though you can specify others ('html5lib' is not currently included). The find methods can be used in conjunction; their order of operation is find > find_all > find_child > find_children. Each of these options has a similarly named counterpart with '_attrs' appended that accepts inner- and outer-quoted key:value pairs for more precise selections.

Syntax

*| bs4 textfield=<field> [get_text=<bool>] [get_text_label=<string>] [get_attr=<attribute_name_string>] [parser=<string>] [find=<tag>] [find_attrs=<quoted_key:value_pairs>] [find_all=<tag>] [find_all_attrs=<quoted_key:value_pairs>] [find_child=<tag>] [find_child_attrs=<quoted_key:value_pairs>] [find_children=<tag>] [find_children_attrs=<quoted_key:value_pairs>]
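To make the default get_text behaviour concrete, here is a rough standard-library sketch. It is a stand-in, not the command's implementation: the bs4 command uses BeautifulSoup (with the lxml parser by default), whereas this sketch swaps in Python's built-in html.parser so it runs without extra packages. The helper names TextCollector and get_text(html, tag=None) are hypothetical; with tag given, the helper keeps only text nested inside that tag, loosely approximating find_all(tag) followed by get_text.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes, optionally only those nested inside a given tag."""

    def __init__(self, tag=None):
        super().__init__()
        self.tag = tag
        self.depth = 0    # how many matching tags we are currently inside
        self.chunks = []  # non-empty, stripped text fragments

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text everywhere when no tag is given, else only inside the tag
        if self.tag is None or self.depth > 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def get_text(html, tag=None, sep=" "):
    """Hypothetical helper: join the collected text fragments with `sep`."""
    parser = TextCollector(tag)
    parser.feed(html)
    parser.close()
    return sep.join(parser.chunks)


html = "<html><body><h1>Title</h1><p>First <b>bold</b> para.</p><p>Second</p></body></html>"
print(get_text(html))           # Title First bold para. Second
print(get_text(html, tag="p"))  # First bold para. Second
```

Note that stripping each fragment and rejoining with a single separator is a simplification; BeautifulSoup's own get_text preserves the original whitespace unless you pass strip=True.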

cleantext

Description

Tokenize and normalize text (remove punctuation and digits, reduce words to their base form). Some options give better but slower cleaning: base_type="lemma_pos" is the slowest option, while base_type="lemma" assumes every word is a noun, which is faster but still gives decent lemmatization. Most options already have defaults; textfield is the only required field. By default the command produces a multivalued field that is ready for use with stats count by. Optionally it also returns special fields for analysis: pos_tags and ngrams.

Syntax

*| cleantext textfield=<field> [keep_orig=<bool>] [default_clean=<bool>] [remove_urls=<bool>] [remove_stopwords=<bool>] [base_word=<bool>] [base_type=<string>] [mv=<bool>] [force_nltk_tokenize=<bool>] [pos_tagset=<string>] [custom_stopwords=<comma_separated_string_list>] [term_min_len=<int>] [ngram_range=<int>-<int>] [ngram_mix=<bool>]
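The core of this cleaning can be sketched with the standard library alone: lowercase, drop URLs, turn punctuation and digits into spaces, split into tokens, filter stopwords and short terms, then optionally build n-grams over a range. This is a simplified sketch, not the command's code. It skips lemmatization (base_word/base_type) and POS tagging, which need NLTK. The stopword set is a tiny illustrative assumption, the treatment of term_min_len as "drop tokens shorter than this" is an assumption, and clean_text and ngrams are hypothetical helper names.

```python
import re
import string

# Tiny illustrative stopword list (assumption); NLTK's English list is much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "was"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Map every punctuation character and digit to a space
STRIP_TABLE = str.maketrans({c: " " for c in string.punctuation + string.digits})


def clean_text(text, remove_urls=True, remove_stopwords=True, term_min_len=0):
    """Return normalized tokens, one per value of a multivalued field."""
    text = text.lower()
    if remove_urls:
        # Remove URLs before punctuation stripping breaks them into fragments
        text = URL_RE.sub(" ", text)
    tokens = text.translate(STRIP_TABLE).split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    # Assumed meaning of term_min_len: drop tokens shorter than this length
    return [t for t in tokens if len(t) >= term_min_len]


def ngrams(tokens, n_min, n_max):
    """All n-grams for n in [n_min, n_max], each joined with spaces."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


tokens = clean_text("The 2 whales, and the sea! See https://example.com")
print(tokens)                # ['whales', 'sea', 'see']
print(ngrams(tokens, 1, 2))  # ['whales', 'sea', 'see', 'whales sea', 'sea see']
```

Returning a plain list mirrors the command's default multivalued output, which is why the result can feed straight into stats count by.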

similarity

Description

A wrapper that brings NLTK distance metrics for comparing text into Splunk. Similarity (and distance) metrics tell how far apart two pieces of text are, and some algorithms also return the number of edits needed to make the texts the same. These metrics do not extract meaning, but they are often used in text analytics for plagiarism detection, fuzzy searching, spell checking, and more. The default is the Levenshtein distance algorithm, but several other algorithms are included (Damerau-Levenshtein, Jaro, Jaro-Winkler), among them some set-based algorithms (Jaccard, MASI). The command can handle multivalued comparisons, with an option to limit output to a given number of top matches. Multivalued output can be zipped together or returned separately.

Syntax

*| similarity textfield=<field> comparefield=<field> [algo=<string>] [limit=<int>] [mvzip=<bool>]
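For intuition about the default algorithm and a set-based one, here is a plain-Python sketch of Levenshtein edit distance and Jaccard distance over word sets. NLTK ships its own versions (nltk.metrics.distance.edit_distance and jaccard_distance), which the command wraps; these re-implementations are only illustrative. The top_matches helper is a hypothetical reading of the limit option: it ranks multivalued candidates by distance and keeps the closest few.

```python
def levenshtein(a, b):
    """Minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def jaccard_distance(a, b):
    """Set-based distance over word sets: 1 - |A & B| / |A | B|."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    if not union:  # two empty texts: treat as identical
        return 0.0
    return 1 - len(sa & sb) / len(union)


def top_matches(text, candidates, limit=None):
    """Hypothetical: rank candidates by Levenshtein distance, keep the closest `limit`."""
    scored = sorted((levenshtein(text, c), c) for c in candidates)
    return scored if limit is None else scored[:limit]


print(levenshtein("kitten", "sitting"))                # 3
print(jaccard_distance("the cat sat", "the cat ran"))  # 0.5
print(top_matches("splunk", ["sprint", "spelunk", "spunk"], limit=2))
# [(1, 'spelunk'), (1, 'spunk')]
```

The Levenshtein version keeps only two rows of the dynamic-programming table, so memory grows with the length of the second string rather than with the product of both lengths.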

vader

Description

Sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner). The compound score is always returned as the field "sentiment"; the full_output option also returns the neutral, positive, and negative scores that make up the compound score. It is best to feed in uncleaned text, because VADER takes capitalization and punctuation into account.

Syntax

*| vader textfield=<field> [full_output=<bool>]
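As a rough illustration of why the "sentiment" field always lies between -1 and 1: VADER sums per-word valence scores (after its rules adjust for capitalization, punctuation, negation and so on) and then squashes that unbounded sum. The sketch below reproduces only that final normalization step, assuming the x / sqrt(x^2 + alpha) formula with alpha = 15 used in the reference VADER implementation. It does not include the lexicon or the rules, and the input values are arbitrary illustrative sums.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into [-1, 1] via x / sqrt(x^2 + alpha)."""
    norm = score / math.sqrt(score * score + alpha)
    # Clamp as a guard; the formula itself already stays strictly inside (-1, 1)
    return max(-1.0, min(1.0, norm))


print(round(normalize(1.9), 4))   # 0.4404
print(normalize(0.0))             # 0.0
print(round(normalize(-1.9), 4))  # -0.4404
```

Because the curve flattens as the sum grows, piling on more positive words pushes the compound score toward 1 with diminishing returns.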

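ML Algorithms

The app includes the following algorithms for the Machine Learning Toolkit: TruncatedSVD, LatentDirichletAllocation, NMF, TFBinary, MinMaxScaler, LinearSVC, and ExtraTreesClassifier. Their descriptions and syntax are cut off here by the Splunkbase character limit; see the documentation on GitHub.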

Release Notes

Version 1.1.4

Aug. 4, 2022

Added language support to the cleantext command: it now supports more than just English (thank you, Paul-Alexandre Fourrière!), but note that the sentiment command (vader) still supports only English. Minor UI updates for Splunk 9.0 compatibility.

Version 1.1.3

Jan. 24, 2022

Fixes for Splunk Cloud. Fixed the LinearSVC and MinMaxScaler algorithms to work with MLTK 5.3.x. Changed the heights of various panels that needed adjustment for Splunk 8.2.

Version 1.1.2

June 23, 2021

Upgraded splunklib to 1.6.16. Updated to a local copy of jQuery 3.6.0 for Splunk 8.2 compatibility.

Version 1.1.1

April 13, 2021

Upgraded splunklib to 1.6.15 to fix several known issues, including Python 3 compatibility with multibyte characters (https://github.com/splunk/splunk-sdk-python/issues/290#issuecomment-638359587) and SPL-194426 (https://docs.splunk.com/Documentation/Splunk/8.1.3/ReleaseNotes/Knownissues#Search_issues). Updated the clustering dashboard to allow showing the original text from each cluster.

Downloads: 4,287

Built by

Nathan Worsham

Support

Developer Supported


Compatibility

This version of the app (1.1.2) is not available for Splunk Cloud. However, version 1.1.4 of this app is available for Splunk Cloud.

This version of the app (1.1.1) is not available for Splunk Cloud. However, version 1.1.4 of this app is available for Splunk Cloud.

Products: Splunk Enterprise, Splunk Cloud

Products: Splunk Enterprise

Splunk Versions: 9.2, 9.1, 9.0, 8.2, 8.1

Platform: Platform Independent

Splunk Versions: 9.2, 9.1, 9.0, 8.2, 8.1

Platform: Platform Independent

Splunk Versions: 9.2, 9.1, 9.0, 8.2, 8.1, 8.0

Platform: Platform Independent

Splunk Versions: 9.2, 9.1, 9.0, 8.2, 8.1, 8.0, 7.3, 7.2

Platform: Platform Independent

Licensing

MIT License

Category & Contents

Categories: Business Analytics, Utilities

App Type: App
