Once you install the app, it will ask you to set it up on the app configuration page. The setup page only contains options for configuring a proxy server; if you don't use a proxy server, you can just press Save.
You will need to create an input to define the websites that you would like to extract information from. You can set up a new input using the wizard, using the page in Splunk's manager at Settings » Data Inputs » Web-pages, or using the GUI provided in the app itself. The most difficult part of configuring the app is writing the CSS selector that captures the data you want. See W3Schools for information on how to create CSS selectors.
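To illustrate what a selector does, here is a toy matcher built on Python's standard `html.parser`. It is a minimal sketch, not how the app evaluates selectors: it only understands a single simple selector (a tag name plus `#id` and `.class` parts, such as `td.price`) and rejects combinators like `#prices td`. The names `select_text`, `SelectorMatcher`, `parse_simple_selector` and the sample page are hypothetical.

```python
import re
from html.parser import HTMLParser

# Elements that never have an end tag, so they can only produce empty matches.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_simple_selector(selector):
    """Split a simple selector such as 'td.price' or 'table#prices' into
    (tag, ids, classes). Combinators and attribute selectors are rejected."""
    selector = selector.strip()
    m = re.fullmatch(r"([a-zA-Z][a-zA-Z0-9]*)?((?:[#.][\w-]+)*)", selector)
    if not selector or m is None:
        raise ValueError("unsupported selector: %r" % selector)
    tag = m.group(1).lower() if m.group(1) else None
    ids = re.findall(r"#([\w-]+)", m.group(2))
    classes = set(re.findall(r"\.([\w-]+)", m.group(2)))
    return tag, ids, classes


class SelectorMatcher(HTMLParser):
    """Collect the whitespace-normalized text of each element matching a selector."""

    def __init__(self, selector):
        super().__init__()
        self.want_tag, self.want_ids, self.want_classes = parse_simple_selector(selector)
        self.pending = []  # [tag, nesting depth, text parts] for unfinished matches
        self.matches = []  # text of each finished match, in closing order

    def _is_match(self, tag, attrs):
        attrs = dict(attrs)
        if self.want_tag is not None and tag != self.want_tag:
            return False
        if any(i != attrs.get("id") for i in self.want_ids):
            return False
        return self.want_classes <= set((attrs.get("class") or "").split())

    def handle_starttag(self, tag, attrs):
        # A nested element with the same tag must close before the match ends.
        for entry in self.pending:
            if entry[0] == tag:
                entry[1] += 1
        if self._is_match(tag, attrs):
            if tag in VOID_TAGS:
                self.matches.append("")
            else:
                self.pending.append([tag, 1, []])

    def handle_endtag(self, tag):
        still_open = []
        for entry in self.pending:
            if entry[0] == tag:
                entry[1] -= 1
            if entry[1] == 0:
                self.matches.append(" ".join(" ".join(entry[2]).split()))
            else:
                still_open.append(entry)
        self.pending = still_open

    def handle_data(self, data):
        # Text belongs to every match that is still open.
        for entry in self.pending:
            entry[2].append(data)


def select_text(html, selector):
    """Return the text of every element in html that matches selector."""
    matcher = SelectorMatcher(selector)
    matcher.feed(html)
    matcher.close()
    return matcher.matches


if __name__ == "__main__":
    page = """
<table id="prices">
  <tr><td class="name">Widget</td><td class="price">$5</td></tr>
  <tr><td class="name">Gadget</td><td class="price">$12</td></tr>
</table>
"""
    print(select_text(page, "td.price"))      # ['$5', '$12']
    print(select_text(page, "table#prices"))  # ['Widget $5 Gadget $12']
```

Real CSS selectors also support descendant and child combinators, attribute selectors and pseudo-classes, which this sketch deliberately leaves out.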
You can usually ignore the "Output" section. It is only needed if you want to name the fields that the input produces based on content within the page (see "Can I use attributes to set the field names?" for details).
The "Authentication" section can be left blank unless the web page requires authentication. Only HTTP authentication is supported at the current time.
The preview window may highlight a selector as matching in the UI even though the selector returns no matches when the preview is executed. This happens because web browsers sometimes modify the HTML before rendering it. A common case is a table whose markup has no tbody element: the browser adds tbody even though it doesn't exist in the original HTML, so a selector that includes tbody matches in the browser but not against the HTML the input downloads.
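The sketch below shows the difference for a table whose markup omits tbody. It uses only Python's standard `html.parser`, which reports tags exactly as written and, unlike a browser, does not insert missing elements. The class and function names are hypothetical; this is not code from the app.

```python
from html.parser import HTMLParser

# Elements without end tags; they are never pushed onto the ancestor stack.
VOID_TAGS = {"br", "col", "hr", "img", "input", "meta", "link", "wbr"}


class CellPathCollector(HTMLParser):
    """Record the ancestor path of each <td>, using only tags present in the source."""

    def __init__(self):
        super().__init__()
        self.ancestors = []
        self.paths = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.paths.append(" > ".join(self.ancestors + [tag]))
        if tag not in VOID_TAGS:
            self.ancestors.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; ignore stray end tags.
        if tag in self.ancestors:
            while self.ancestors.pop() != tag:
                pass


def td_paths(html):
    collector = CellPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


if __name__ == "__main__":
    raw = "<table><tr><td>42</td></tr></table>"
    print(td_paths(raw))  # ['table > tr > td']
```

A browser builds table > tbody > tr > td from the same markup, so a selector such as `table > tbody > tr > td` copied from the browser's developer tools matches in the browser but not against the raw source.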
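To fix this, you can do one of the following:
1) Change the selector so that it doesn't include tbody (for example, use "table tr" instead of "table tbody tr")
2) Have the input render the page with a web browser (Firefox or Chrome) so that it sees the same HTML as the preview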
See the links below for answers to frequently asked questions:
* Can I specify more than one selector (to match different things on a single page)?
* Can I use attributes to set the field names?
* I changed the sourcetype and now the match field is no longer a multi-value field; what do I do?
* The input isn't extracting content even though I can see it in my web browser
This project is open source. See GitHub for the source or LukeMurphey.net for more information.
1) Updated the code to be more compliant with Python 3
2) Fixed issue where the results could be in the wrong order
1) Added link to open URL in new tab
2) Improved code for communicating with the preview iframe
1) Fixed another error that occurred when outputting values as multi-valued fields
2) Updated geckodriver to 0.24 so that newer versions of Firefox work
3) Added link to search logs to determine why the browser test failed
4) Fixed issue where the integrated browser test failed on the input wizard
1) Fixed error that occurred when outputting values as multi-valued fields
2) Fixed issue where proxy password from secure storage was not being used
1) Fixed issue where passwords were not loaded if there were more than 30
2) Improved styling on Splunk 7.0+
1) Fixed issue where the "when_matches_change" setting of "output_results" produced results even when the matches hadn't changed
2) Fixed issue where the severity chart on the health page filtered based on the severity filter and thus didn't show all entries
Updated the styling to work better on Splunk 7.0 and 7.1
1) Input now handles large files much better by only downloading the first 512 KB of the file
2) Updated the Chrome driver so that the input works with newer versions of Chrome
3) The input creation wizard now auto-suggests a URL filter when using spidering
4) Output is now streamed (as opposed to being cached) in order to reduce memory usage
5) The input now gracefully handles websites that return a bad encoding
6) Fixed issue where you could not drill-down on logs from the health dashboard
1) Input is now resilient to transient Splunkd outages
2) Fixed issue where index selection input was super-wide on Splunk 7.0
1) Added support for forms authentication with browsers
2) Fixed issue where user-agent string was not set for Firefox and Chrome
3) Fixed issue where the browser testing functionality on the UI didn't use the proxy server
1) Added support for forms authentication
2) Added ability to set a default value for the user-agent globally
3) Removed support for proxy authentication on Splunk Cloud
1) Passwords are now stored using Splunk secure storage
2) Setup page has been updated to make it easier to use
3) Pages can now be rendered using Google Chrome
4) Added help page to guide users on how to use a web browser for rendering; added browser test to input page
5) Fixed a couple of small bugs on the Overview dashboard
1) Improved compatibility with Splunk 6.6
2) Fixed issue where users sometimes could not enable inputs
Added ability to only output results when they change
1) Fixed issue where the host field could not be overridden
2) Reduced some unimportant log messages to debug level
Added support for running the app on a Splunk free license
Fixed issue where Firefox driver was not correctly added to the path on Windows
1) Fixed issue where some sites could not be previewed
2) Fixed issue where selectors would not match an ID that was not lowercase
3) Added ability to include empty matches
4) Added ability to delete inputs
1) Fixed issue where HTTP authentication didn't work with Firefox
2) Fixed issue where Firefox rendering didn't work on headless environments
3) Other minor changes
Various bug fixes and minor improvements
Vastly updated UI, various bug fixes and lots of smaller enhancements
1) Improved compatibility with versions of Splunk
2) Fixed overly restrictive URL validation
3) Fixed issue where some parts of the stash file may not have been indexed, losing parts of large result sets
4) Fixed controller logs which were not sourcetyped correctly
Fixed problem where matches were not visible when the content is very long
Fixed problem where you could not create new inputs
Added ability to grant access to make inputs to non-admin users
Password no longer must be re-typed every time an input is modified
Fixed issue where fields without spaces were not being extracted as multi-value fields by default
Updated to the latest version of the modular input library; should fix problems where the input crashes
Added ability to specify the user-agent string
Fixed issue where the input would:
* sometimes fail due to exception thrown from sleep() being interrupted
* sometimes fail due to splunkd connection failure
* ignore the host field that was set on the configuration page
Fixed issue where preview did not work
Added ability to use a proxy server
Fixed problem where websites in non-ASCII encodings did not get decoded correctly
A Splunk input for retrieving and indexing information from web-pages