Website input

This app has been archived. Learn more about app archiving.
This app is NOT supported by Splunk. Please read about what that means for you here.
The Website Input app provides a mechanism for scraping web-pages for data and indexing it in your Splunk instance to make it searchable.

Please consider financially supporting my work on this app in order to promote continued development; see https://github.com/sponsors/LukeMurphey

Features

  • Website Data Extraction: set up an input that will extract data from a web-page and get it into Splunk
  • Data Preview: select the data on a web-page that you would like to extract and preview a sample of the output before you save the configuration
  • Website crawling: you can have the input crawl web-pages to automatically discover related content in other pages

Configuration

Initial setup

Once you install the app, it will ask you to set it up on the app configuration page. The setup only contains options related to configuring a proxy server. If no proxy server is used, you can just press save.

Creating an input

You will need to create an input to define the websites that you would like to extract information from. You can set up a new input in one of three ways: using the wizard, using the page in Splunk's manager at Settings » Data Inputs » Web-pages, or using the GUI provided in the app itself. The most difficult part of configuring the app is writing the CSS selector that will capture the data you want. See W3schools for information on how to create CSS selectors.
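To make the role of the selector concrete, here is a minimal sketch of what a simple selector such as "li.price" captures from a page. It is not the app's implementation: the class name `SimpleSelectorExtractor` and the sample markup are hypothetical, and it only understands selectors of the form "tag", ".class" or "tag.class", using nothing but the Python standard library.

```python
from html.parser import HTMLParser


class SimpleSelectorExtractor(HTMLParser):
    """Collect the text of elements matching 'tag', '.class' or 'tag.class'.

    Hypothetical helper for illustration only; real CSS selectors
    support far more than this.
    """

    def __init__(self, selector):
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.want_tag = tag or None
        self.want_class = cls or None
        self.depth = 0        # greater than 0 while inside a matching element
        self.open_tag = None  # tag name of the element currently being captured
        self.matches = []

    def _is_match(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        return ((self.want_tag is None or tag == self.want_tag)
                and (self.want_class is None or self.want_class in classes))

    def handle_starttag(self, tag, attrs):
        if self.depth == 0 and self._is_match(tag, attrs):
            self.open_tag = tag
            self.depth = 1
            self.matches.append("")
        elif self.depth and tag == self.open_tag:
            # Same tag nested inside the match: track it so the right
            # closing tag ends the capture.
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.open_tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


# Made-up sample markup
page = ('<ul><li class="price">$10</li>'
        '<li class="name">Widget</li>'
        '<li class="price">$12</li></ul>')

extractor = SimpleSelectorExtractor("li.price")
extractor.feed(page)
print(extractor.matches)  # ['$10', '$12']
```

Each match becomes one extracted value, which is roughly what you see in the preview when you test a selector against a page.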

You can usually ignore the "Output" section. This is only necessary if you want to name the fields that the input will get based on content within the page (see "Can I use attributes to set the field names?" for details).

The "Authentication" section can be left blank unless the web-page requires authentication. Only HTTP authentication is supported at the current time.

Known Issues

The UI shows matches for a selector, but the preview shows none and the input matches nothing

The preview window may show that a selector matches in the UI even though the selector doesn't match when executed in preview. This happens because web-browsers sometimes manipulate the HTML before rendering it. One common case is a table that does not have a tbody element (which it is supposed to have): the browser adds the tbody element even though it doesn't exist in the original HTML.

To fix this, you can do one of the following:

  1. Use a selector that matches the original HTML even though it doesn't match in the preview page
  2. Make your selector more generic (like converting "font > table > tr" to "font table tr")
  3. Make a selector that matches both (like "font > table > tr,font > table > tbody > tr")
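The difference between the original HTML and the browser's version can be sketched with the Python standard library. This is an illustration under stated assumptions, not the app's code: `RowPathCollector`, `child_chain_matches` and `descendant_chain_matches` are hypothetical helpers that mimic the ">" (child) and space (descendant) combinators for plain tag names, and the browser's DOM path is written out by hand rather than produced by a real browser.

```python
from html.parser import HTMLParser


class RowPathCollector(HTMLParser):
    """Record the chain of open tags leading to every <tr> in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.row_paths = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if tag == "tr":
            self.row_paths.append(list(self.stack))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates unclosed tags).
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def child_chain_matches(path, chain):
    """Mimic 'a > b > c': the path must end with exactly this chain."""
    return path[-len(chain):] == chain


def descendant_chain_matches(path, chain):
    """Mimic 'a b c': the chain must appear in order, ending on the last tag."""
    if not path or path[-1] != chain[-1]:
        return False
    ancestors = iter(path[:-1])
    return all(tag in ancestors for tag in chain[:-1])


# Original HTML as served: the table has no tbody element.
raw_html = ("<font><table><tr><td>1</td></tr>"
            "<tr><td>2</td></tr></table></font>")
collector = RowPathCollector()
collector.feed(raw_html)
raw_path = collector.row_paths[0]

# What a browser would typically expose after inserting tbody (written by hand).
browser_path = ["font", "table", "tbody", "tr"]

strict = ["font", "table", "tr"]                  # font > table > tr
strict_tbody = ["font", "table", "tbody", "tr"]   # font > table > tbody > tr

print(child_chain_matches(raw_path, strict))           # True
print(child_chain_matches(browser_path, strict))       # False
print(descendant_chain_matches(raw_path, strict))      # True (option 2)
print(descendant_chain_matches(browser_path, strict))  # True (option 2)

# Option 3: a comma-separated selector matches if either alternative does.
for path in (raw_path, browser_path):
    print(any(child_chain_matches(path, chain) for chain in (strict, strict_tbody)))
```

The strict child selector matches only one of the two documents, while the generic selector and the combined selector match both, which is why options 2 and 3 work regardless of whether the browser has added a tbody.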

FAQs

See the links below for answers to frequently asked questions:

Can I specify more than one selector (to match different things on a single page)?

Can I use attributes to set the field names?

I changed the sourcetype and now the match field is no longer a multi-value field; what do I do?

The input isn't extracting content, even though I can see it in my web-browser

More Information

This project is open source. See GitHub for the source or LukeMurphey.net for more information.

Release Notes

Version 4.5.10
July 5, 2020

1) Updated the code to be more compliant with Python 3
2) Fixed issue where the results could be in the wrong order

Version 4.5.9
Jan. 31, 2020

1) Added link to open URL in new tab
2) Improved code for communicating to the preview iframe

Version 4.5.8
Nov. 14, 2019

1) Adding support for Python 3
2) Fixing issues on Splunk 8.0.0
3) Updated the Geckodriver for Mac and Linux to version 0.26.0

Version 4.5.7
June 14, 2019

1) Fixed another error that occurred when outputting values as multi-valued fields
2) Updated the geckodriver to 0.24 so that newer versions of Firefox work
3) Added link to search logs to determine why browser test failed
4) Fixed issue where integrated browser test failed on the input wizard

Version 4.5.6
Feb. 22, 2019

1) Fixed error that occurred when outputting values as multi-valued fields
2) Fixed issue where proxy password from secure storage was not being used

Version 4.5.5
Nov. 10, 2018

1) Fixed issue where passwords were not loaded if there were more than 30
2) Improved styling on Splunk 7.0+

Version 4.5.4
July 9, 2018

1) Fixed issue where the "when_matches_change" setting of "output_results" output results even when the matches hadn't changed
2) Fixed issue where the severity chart on the health page filtered based on the severity filter and thus didn't show all entries

Version 4.5.3
June 15, 2018

Updating the styling to work better on Splunk 7.0 and 7.1

Version 4.5.2
March 14, 2018

1) Input now handles large files much better by only downloading the first 512 KB of the file
2) Updated the Chrome driver so that the input works with newer versions of Chrome
3) The input creation wizard auto-suggests a URL filter now when using spidering
4) Output is now streamed (as opposed to being cached) in order to reduce memory usage
5) The input now gracefully handles websites that return a bad encoding
6) Fixed issue where you could not drill-down on logs from the health dashboard

Version 4.5.1
Oct. 5, 2017

1) Input is now resilient to transient Splunkd outages
2) Fixed issue where index selection input was super-wide on Splunk 7.0

Version 4.5
Sept. 2, 2017

1) Added support for forms authentication with browsers
2) Fixed issue where user-agent string was not set for Firefox and Chrome
3) Fixed issue where the browser testing functionality on the UI didn't use the proxy server

Version 4.4
Aug. 7, 2017

1) Added support for forms authentication
2) Added ability to set a default value for the user-agent globally
3) Removed support for proxy authentication on Splunk Cloud

Version 4.3.0
July 21, 2017

1) Passwords are now stored using Splunk secure storage
2) Setup page has been updated to make it easier to use
3) Pages can now be rendered using Google Chrome
4) Added help page to guide users on how to use a web browser for rendering; added browser test to input page
5) Fixed a couple small bugs on the Overview dashboard

Version 4.2.1
May 4, 2017

1) Improved compatibility with Splunk 6.6
2) Fixed issue where users sometimes could not enable inputs

Version 4.2
April 7, 2017

Adding ability to only output results when they change

Version 4.1.3
April 3, 2017

1) Fixed issue where the host field could not be overridden
2) Reduced some unimportant log messages to debug level

Version 4.1.2
March 19, 2017

Added support for running the app on a Splunk free license

Version 4.1.1
March 13, 2017

Fixed issue where Firefox driver was not correctly added to the path on Windows

Version 4.1
March 9, 2017

1) Fixed issue where some sites could not be previewed
2) Fixed issue where selectors would not match an ID that was not lowercase
3) Added ability to include empty matches
4) Added ability to delete inputs

Version 4.0.2
Jan. 18, 2017

1) Fixed issue where HTTP authentication didn't work with Firefox
2) Fixed issue where Firefox rendering didn't work on headless environments
3) Other minor changes

Version 4.0.1
Dec. 3, 2016

Various bug fixes and minor improvements

Version 4.0
Dec. 1, 2016

Vastly updated UI, various bug fixes and lots of smaller enhancements

Version 3.2.1
Nov. 24, 2016

1) Improved compatibility with versions of Splunk
2) Fixed overly restrictive URL validation
3) Fixed issue where some parts of the stash file may not have been indexed, losing parts of large result sets
4) Fixed controller logs which were not sourcetyped correctly

Version 3.2
Sept. 21, 2016

  • Added ability to view results in search from the modular input creation page
  • Improved documentation on the search command options

Version 3.1.2
Sept. 20, 2016

Fixed problem where matches were not visible when the content is very long

Version 3.1.1
July 14, 2016

Fixed problem where you could not create new inputs

Version 3.1
July 11, 2016

Added ability to grant access to make inputs to non-admin users

Version 3.0
May 26, 2016

  • Added ability to render pages using a browser (to get page contents after JS rendering has executed)
  • MD5 and SHA224 hashes are now included in the results
  • Added ability to output matches as separate fields
  • Matches are now listed in results in the order in which they were discovered

Version 2.1
May 24, 2016

  • Simplified the data input configuration screen
  • Added ability to include the raw content in case you want to do your own parsing in SPL
  • Added ability to specify a custom string that will separate extracted values
  • Fixed incorrect reporting of matches count

Version 2.0
May 3, 2016

  • Added ability to crawl websites

Version 1.2.0
Jan. 3, 2016

  • Added the ability to use the tag names as the field names
  • Fixed issue where the selector would sometimes not match if the content was upper-case and the selector wasn't
  • Added a BNF file for the search command

Version 1.1.3
Dec. 16, 2015

Password no longer must be re-typed every time an input is modified

Version 1.1.2
Nov. 30, 2015

Fixed issue where fields without spaces were not being extracted as multi-value fields by default

Version 1.1.1
Sept. 7, 2015

Updated to the latest version of the modular input library; should fix problems where the input crashes

Version 1.1
Aug. 24, 2015

Added ability to specify the user-agent string

Version 1.0.5
June 22, 2015

  • Fixed issue where web input controller used the incorrect logger name
  • Fixed issue where you could not select the sourcetype correctly in some cases
  • Added a search command for performing web scrapes from the search page

Version 1.0.4
March 28, 2015

  • Fixed issue where some files could not be parsed because lxml sometimes fails to parse correctly encoded files
  • Enhanced logging for when interval gap is too large and when checkpoint file could not be found

Version 1.0.3
Jan. 9, 2015

  • Fixed issue where the input would not stay on the interval because it included processing time in the interval
  • Fixed issue where the modular input logs were not sourcetyped correctly

Version 1.0.2
Nov. 29, 2014

Fixed issue where the input would:
* sometimes fail due to exception thrown from sleep() being interrupted
* sometimes fail due to splunkd connection failure
* ignore the host field that was set on the configuration page

Version 1.0.1
Nov. 12, 2014

Fixed issue where preview did not work

Version 1.0
Oct. 28, 2014

Added ability to use a proxy server

Version 0.9
Aug. 17, 2014

  • Fixed issue where not all matches were returned
  • Added preview dialog to modular input page
  • Added raw_match_count to output which counts CSS matches, even if they included no text
  • Fixed incompatibility with other apps that also import the modular_input base class
  • Fixed issue where entering and then clearing the sourcetype causes an error
  • Added ability to specify attributes that should be used for the field names

Version 0.8
July 13, 2014

Fixed problem where websites in non-ASCII encoding did not get decoded correctly

Version 0.7
July 11, 2014

Version 0.6
July 8, 2014

  • Switched to multi-value output of matches and added transform for parsing match field
  • Fixed exception that could happen if the web-page was not available
  • Put authentication fields on a separate location on the manager page

Version 0.5
July 7, 2014

A Splunk input for retrieving and indexing information from web-pages

