Downloading Website input
SHA256 checksum (website-input_4510.tgz) 0ec713fe180d73e897bc0ac1515450eaeaaa159b899cad3341ea64939f6a7ecf
SHA256 checksum (website-input_459.tgz) e4a2175d450a979d2c96d1683d6550d3743bb16acd6f8676a376e066a89775eb
SHA256 checksum (website-input_458.tgz) edd9d82e281fcaf5d720184156390a6fcefb133f95cfb3ec48870e42f4cc310f
SHA256 checksum (website-input_457.tgz) 612cd307bab7c7f311490b21384a60e6ea8e062bdf9ba2d67ede225212929b83
SHA256 checksum (website-input_456.tgz) 143b30e15bac010c28476b37405ba8b9661d0abff1788ca2748e320244048b45
SHA256 checksum (website-input_455.tgz) f5e829f11af54e405518b2b46bfa0b7f3f0116d6fb875b786a06f77ff88c47ea
SHA256 checksum (website-input_454.tgz) e58f8c043c294710a680bfd5437d29b1936819ddd738b465b721a642d6b8335c
SHA256 checksum (website-input_453.tgz) 04823359c2e02fd07450152b5226734bc3e0461ed68c8a14f6a09f3ab9742352
SHA256 checksum (website-input_452.tgz) 342308738b048667e2c118d3bfc78c07f03a4284e295274a7d0e60ab2f524cbf
SHA256 checksum (website-input_451.tgz) c54ee3276175cb880fcfbc0515b141f06cda81e7180fdc58b764341b08e21da4
SHA256 checksum (website-input_45.tgz) 18bb5c40aecd7573cb2526fbb82f7d8dc0b6e5eeb3657b642154b14ddf8d4dc4
SHA256 checksum (website-input_44.tgz) 8f94d3b9e211858e39be2a43ae07eebe5e5c21205f0d319cabb7770346599622
SHA256 checksum (website-input_430.tgz) d8389176c8fc845ee65130472123da57f45aba5f4d7b768551dcdf493fbe8c8e
SHA256 checksum (website-input_421.tgz) 462826258c52e60f4fb61427f0d19b9b69f837d60fbd2e60fa93dc910fb65ff4
SHA256 checksum (website-input_42.tgz) 708d4a49b03242b2c1cb3797213468bffec8b6c9e921ad3f62e5a968e98ea692
SHA256 checksum (website-input_413.tgz) 92e50fc7cc7fa9428cf957335882dfc0b3a1d1596bc94a99865bf89d285b9c62
SHA256 checksum (website-input_412.tgz) a4bd401aa42049e2313a3430af9f38e2820b5aab72dc9329fa10a59135926b79
SHA256 checksum (website-input_411.tgz) 0eec43da99c9d2044a95946b15ad34a3bbfdf4fa744a96e44357f1eba9145bad
SHA256 checksum (website-input_41.tgz) f437572ca4e38c7c08fb5fd3208b116079bb5803108876b60b07cbb7e4cf42ee
SHA256 checksum (website-input_402.tgz) 675765bff0e35ae9f51eae901cf21fa0dd4d4372010aeb1ce9102fb9ed6c7c28
SHA256 checksum (website-input_401.tgz) 515ce5281921fea2e42f4e03ee3f7294154b9e78b15ba4f29685ad0fcadcb910
SHA256 checksum (website-input_40.tgz) 956c9546f26ee1a8d7739dde47e9163e39865e7e37e40411a3a21dde30df269e
SHA256 checksum (website-input_321.tgz) 14e9d9cd9125db3501bf1b880d8f5c1c7b56dbc8f40512a0df3c1a340e12f880
SHA256 checksum (website-input_32.tgz) 38cd3a69c68f92dd08c53c7d1ebce07adf5d5415daa8095971197f07324cdfb0
SHA256 checksum (website-input_312.tgz) 5e94fae1e3de0e4eb1ecd75de34c99865eb82dce5924b5a273e3fe5bc1db5ab8
SHA256 checksum (website-input_311.tgz) 8801f6ed3b863cd6e22552bef8906519ab6ee886d24e61596fb9cae77c4a744f
SHA256 checksum (website-input_31.tgz) 735fa1b2db14f66c4b73f34d261cf79c3a7e03877afcedd0d3d20148d0ceb82b
SHA256 checksum (website-input_30.tgz) 84b5026999d6cacad248970dea4d2dcba0799c09b4755ca7926ee3699d206955
SHA256 checksum (website-input_21.tgz) 7c4e5ccd11713c95ed758a7e23f23369f44d39fff1abead1504c1c0d24661f34
SHA256 checksum (website-input_20.tgz) 063b1df3fd734d2d5384eda3a3c4057ef1d5fc4ee9eb86470403b0fdcbf09848
SHA256 checksum (website-input_120.tgz) a4fa5a99639dcdbf0f276796a284fc44261edea5d4cc930e98b85c9514573d1a
SHA256 checksum (website-input_113.tgz) 7af1367bfad4c06ef3ac4c698a105d5c43c6cccea50f86952784d511ed388e34
SHA256 checksum (website-input_112.tgz) 18afef95875563a4c2982fd7186279677780db2b3030e75a74054d0c596b1321
SHA256 checksum (website-input_111.tgz) 068f3b6d3557f4496ac0c3d0329901cb343fe76c8a07853b0ac1758fb97b94ae
SHA256 checksum (website-input_11.tgz) 78ba79afb7b85f94500adf20edf1b004abaa882e5d0609bcf570e6159fe6bb75
SHA256 checksum (website-input_105.tgz) 988964a7239b57fa60a9b52f55cee0ed6cfeba4d8ba7e45c2599922eaa2790e6
SHA256 checksum (website-input_104.tgz) f905d5247a981d1101ce5f519b99fe9962dde563ba749d771dfae211c46d02a9
SHA256 checksum (website-input_103.tgz) 9248efd4a52a85c6d5a4c51364793e9458feed676c8de64b43b6c00dd55b97c2
SHA256 checksum (website-input_102.tgz) f6d568cfbd83ea9919b60d0c60880215125949c505df5710092d350aa55e4c06
SHA256 checksum (website-input_101.tgz) 96471c5085ce73286719a4ef9dc67c3239c7c24787c2d35650536a0859906e5b
SHA256 checksum (website-input_10.tgz) 0c3d1ce34c7924dff46bce9d59abef30895dc8b70263955a96e54b583c0528a9
SHA256 checksum (website-input_09.tgz) 325002c1c8dab1bcb3a1f0b8f260030e946965db299f973f2031a63dc923f507
SHA256 checksum (website-input_08.tgz) 4c7f127597880e0ee1ed8e655959f3b14fb162e1e5de5ab164569a59cc1dfcc0
SHA256 checksum (website-input_07.tgz) 1516856e707cbda60444d73d5607ccbdc71e8b43e8ebbb1d87668d2659e20348
SHA256 checksum (website-input_06.tgz) 15bf996a949c14921eaa26b3eb31d00b1f78d8a8882e9c8aec9361f1278cbabd
SHA256 checksum (website-input_05.tgz) 7378e87a1ee64b8e74000e72ca81ef833037268c5882cfaf8778fd0eddc8d4b8
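
A downloaded archive can be compared against its published SHA256 checksum before installing it. The following is a minimal sketch using only the Python standard library; the function names and the chunk size are illustrative assumptions and are not part of the app.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare a file's digest with a checksum copied from the list above."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

For example, `matches_published_checksum("website-input_4510.tgz", "0ec713fe...")` (with the full checksum pasted in) should return True for an intact download of that version.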

Website input

This app is NOT supported by Splunk. Please read about what that means for you here.
The Website Input app provides a mechanism for scraping web-pages for data and indexing it in your Splunk instance to make it searchable.

Please consider financially supporting me in developing this app in order to promote continued development; see https://github.com/sponsors/LukeMurphey

Features

  • Website Data Extraction: set up an input that will extract data from a web-page and get it into Splunk
  • Data Preview: select the data on a web-page that you would like to extract and preview the results to see a sample of what the output would look like before you save the configuration
  • Website crawling: you can have the input crawl web-pages to automatically discover related content in other pages
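
The crawling idea can be illustrated with a small link-discovery sketch. This is not the app's implementation; it is a simplified example, using only the Python standard library, of how links on a page might be collected and restricted to the same host. The names `LinkCollector` and `discover_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def discover_links(html, page_url):
    """Return absolute, de-duplicated links that stay on the page's own host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

A crawler would fetch each returned URL in turn and repeat the process, typically with a limit on depth or page count.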

Configuration

Initial setup

Once you install the app, it will ask you to set it up on the app configuration page. The setup only contains options related to configuring a proxy server. If no proxy server is used, you can just press save.

Creating an input

You will need to create an input to define the websites that you would like to extract information from. You can set up a new input using the wizard, using the page in Splunk's manager at Settings » Data Inputs » Web-pages, or using the GUI provided in the app itself. The most difficult part of configuring the app is writing the CSS selector that will capture the data you want. See W3schools for information on how to create CSS selectors.
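
To make the role of the selector concrete, here is a deliberately simplified sketch of what a selector such as "li.item" does: it picks the matching elements and returns their text. It uses only the Python standard library and supports just one tag name plus one class, so it is a stand-in for a real CSS selector engine and not the app's own code. The names `SimpleSelectorExtractor` and `extract` are hypothetical.

```python
from html.parser import HTMLParser


class SimpleSelectorExtractor(HTMLParser):
    """Collects the text of every element matching a `tag.class` selector."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.matches = []   # accumulated text of each matched element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements of the same tag so the match closes correctly
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def extract(html, tag, css_class):
    """Return the stripped text of each element matching `tag.css_class`."""
    parser = SimpleSelectorExtractor(tag, css_class)
    parser.feed(html)
    return [text.strip() for text in parser.matches]
```

Calling `extract(page_html, "li", "item")` corresponds roughly to applying the selector "li.item" to the page.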

You can usually ignore the "Output" section. This is only necessary if you want to name the fields that the input will get based on content within the page (see "Can I use attributes to set the field names?" for details).

The "Authentication" section can be left blank unless the web-page requires authentication. Only HTTP authentication is supported at the current time.
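
For background, HTTP authentication works by sending credentials in a request header. The sketch below assumes the Basic scheme (the text above does not say which HTTP schemes the app supports) and uses only the Python standard library; the function names are hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic `Authorization` header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_request(url, username, password):
    """Create a request object that carries the Basic credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request
```

The request returned by `build_request` could then be passed to `urllib.request.urlopen` to fetch a protected page.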

Known Issues

The UI shows matches for a selector, but the preview shows none and the input matches nothing

The preview window may show that a selector matches in the UI even though the selector matches nothing when the preview is executed. This happens because web-browsers sometimes manipulate the HTML before rendering it. For example, when a table does not have a tbody element (which it is supposed to), the browser adds the tbody element even though it doesn't exist in the original HTML.

To fix this, you can do one of the following:

  1. Use a selector that matches the original HTML even though it doesn't match in the preview page
  2. Make your selector more generic (like converting "font > table > tr" to "font table tr")
  3. Make a selector that matches both (like "font > table > tr,font > table > tbody > tr")
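
The tbody mismatch and the three workarounds can be reproduced with a small experiment. This sketch uses the limited path syntax of Python's `xml.etree.ElementTree` as a stand-in for CSS selectors: "/" plays the role of the child combinator ">" and "//" plays the role of the descendant combinator (a space). The two markup strings are made-up examples, and all paths are evaluated relative to the root `font` element.

```python
import xml.etree.ElementTree as ET

# Markup as served by the website (no tbody) and as rewritten by a browser
ORIGINAL_HTML = "<font><table><tr><td>42</td></tr></table></font>"
BROWSER_DOM = "<font><table><tbody><tr><td>42</td></tr></tbody></table></font>"


def count_rows(markup, path):
    """Count the elements matched by `path`, relative to the root <font> element."""
    root = ET.fromstring(markup)
    return len(root.findall(path))


def count_rows_either(markup):
    """Analogue of "font > table > tr,font > table > tbody > tr"."""
    root = ET.fromstring(markup)
    return len(root.findall("./table/tr") + root.findall("./table/tbody/tr"))


# Strict child path: matches the original HTML only
strict = (count_rows(ORIGINAL_HTML, "./table/tr"), count_rows(BROWSER_DOM, "./table/tr"))
# Generic descendant path: matches both forms
generic = (count_rows(ORIGINAL_HTML, "./table//tr"), count_rows(BROWSER_DOM, "./table//tr"))
```

Here `strict` shows why a selector written against the original HTML finds nothing in the browser-rendered page, while `generic` and `count_rows_either` match in both cases.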

FAQs

See the links below for answers to frequently asked questions:

Can I specify more than one selector (to match different things on a single page)?

Can I use attributes to set the field names?

I changed the sourcetype and now the match field is no longer a multi-value field; what do I do?

The input isn't extracting content, even though I can see it in my web-browser

More Information

This project is open source. See GitHub for the source or LukeMurphey.net for more information.

Release Notes

Version 4.5.10
July 5, 2020

1) Updated the code to be more compliant with Python 3
2) Fixed issue where the results could be in the wrong order

Version 4.5.9
Jan. 31, 2020

1) Added link to open URL in new tab
2) Improved code for communicating to the preview iframe

Version 4.5.8
Nov. 14, 2019
  1. Adding support for Python 3
  2. Fixing issues on Splunk 8.0.0
  3. Updated the Geckodriver for Mac and Linux to version 0.26.0
Version 4.5.7
June 14, 2019

1) Fixed another error that occurred when outputting values as multi-valued fields
2) Updated the geckodriver to 0.24 so that newer versions of Firefox work
3) Added link to search logs to determine why browser test failed
4) Fixed issue where integrated browser test failed on the input wizard

Version 4.5.6
Feb. 22, 2019

1) Fixed error that occurred when outputting values as multi-valued fields
2) Fixed issue where proxy password from secure storage was not being used

Version 4.5.5
Nov. 10, 2018

1) Fixed issue where passwords were not loaded if there were more than 30
2) Improved styling on Splunk 7.0+

Version 4.5.4
July 9, 2018

1) Fixed issue where the "when_matches_change" setting of "output_results" output results even when the matches hadn't changed
2) Fixed issue where the severity chart on the health page filtered based on the severity filter and thus didn't show all entries

Version 4.5.3
June 15, 2018

Updating the styling to work better on Splunk 7.0 and 7.1

Version 4.5.2
March 14, 2018

1) Input now handles large files much better by only downloading the first 512 KB of the file
2) Updated the Chrome driver so that the input works with newer versions of Chrome
3) The input creation wizard auto-suggests a URL filter now when using spidering
4) Output is now streamed (as opposed to being cached) in order to reduce memory usage
5) The input now gracefully handles websites that return a bad encoding
6) Fixed issue where you could not drill-down on logs from the health dashboard

Version 4.5.1
Oct. 5, 2017

1) Input is now resilient to transient Splunkd outages
2) Fixed issue where index selection input was super-wide on Splunk 7.0

Version 4.5
Sept. 2, 2017

1) Added support for forms authentication with browsers
2) Fixed issue where user-agent string was not set for Firefox and Chrome
3) Fixed issue where the browser testing functionality on the UI didn't use the proxy server

Version 4.4
Aug. 7, 2017

1) Added support for forms authentication
2) Added ability to set a default value for the user-agent globally
3) Removed support for proxy authentication on Splunk Cloud

Version 4.3.0
July 21, 2017

1) Passwords are now stored using Splunk secure storage
2) Setup page has been updated to make it easier to use
3) Pages can now be rendered using Google Chrome
4) Added help page to guide users on how to use a web browser for rendering; added browser test to input page
5) Fixed a couple small bugs on the Overview dashboard

Version 4.2.1
May 4, 2017

1) Improved compatibility with Splunk 6.6
2) Fixed issue where users could sometimes not enable inputs

Version 4.2
April 7, 2017

Adding ability to only output results when they change

Version 4.1.3
April 3, 2017

1) Fixed issue where the host field could not be overridden
2) Reduced some unimportant log messages to debug level

Version 4.1.2
March 19, 2017

Added support for running the app on a Splunk free license

Version 4.1.1
March 13, 2017

Fixed issue where Firefox driver was not correctly added to the path on Windows

Version 4.1
March 9, 2017

1) Fixed issue where some sites could not be previewed
2) Fixed issue where selectors would not match an ID that was not lowercase
3) Added ability to include empty matches
4) Added ability to delete inputs

Version 4.0.2
Jan. 18, 2017

1) Fixed issue where HTTP authentication didn't work with Firefox
2) Fixed issue where Firefox rendering didn't work on headless environments
3) Other minor changes

Version 4.0.1
Dec. 3, 2016

Various bug fixes and minor improvements

Version 4.0
Dec. 1, 2016

Vastly updated UI, various bug fixes and lots of smaller enhancements

Version 3.2.1
Nov. 24, 2016

1) Improved compatibility with versions of Splunk
2) Fixed overly restrictive URL validation
3) Fixed issue where some parts of the stash file may not have been indexed, losing parts of large result sets
4) Fixed controller logs which were not sourcetyped correctly

Version 3.2
Sept. 21, 2016
  • Added ability to view results in search from the modular input creation page
  • Improved documentation on the search command options
Version 3.1.2
Sept. 20, 2016

Fixed problem where matches were not visible when the content is very long

Version 3.1.1
July 14, 2016

Fixed problem where you could not create new inputs

Version 3.1
July 11, 2016

Added ability to grant access to make inputs to non-admin users

Version 3.0
May 26, 2016
  • Added ability to render pages using a browser (to get page contents after JS rendering has executed)
  • MD5 and SHA224 hashes are now included in the results
  • Added ability to output matches as separate fields
  • Matches are now listed in results in the order that they were discovered
Version 2.1
May 24, 2016
  • Simplified the data input configuration screen
  • Added ability to include the raw content in case you want to do your own parsing in SPL
  • Added ability to specify a custom string that will separate extracted values
  • Fixed incorrect reporting of matches count
Version 2.0
May 3, 2016
  • Added ability to crawl websites
Version 1.2.0
Jan. 3, 2016
  • Added the ability to use the tag names as the field names
  • Fixed issue where the selector would sometimes not match if the content was upper-case and the selector wasn't
  • Added a BNF file for the search command
Version 1.1.3
Dec. 16, 2015

Password no longer must be re-typed every time an input is modified

Version 1.1.2
Nov. 30, 2015

Fixed issue where fields without spaces were not being extracted as multi-value fields by default

Version 1.1.1
Sept. 7, 2015

Updated to the latest version of the modular input library; should fix problems where the input crashes

Version 1.1
Aug. 24, 2015

Added ability to specify the user-agent string

Version 1.0.5
June 22, 2015
  • Fixed issue where web input controller used the incorrect logger name
  • Fixed issue where you could not select the sourcetype correctly in some cases
  • Added a search command for performing web scrapes from the search page
Version 1.0.4
March 28, 2015
  • Fixed issue where some files could not be parsed because lxml won't parse correctly encoded files sometimes
  • Enhanced logging for when interval gap is too large and when checkpoint file could not be found
Version 1.0.3
Jan. 9, 2015
  • Fixed issue where the input would not stay on the interval because it included processing time in the interval
  • Fixed issue where the modular input logs were not sourcetyped correctly
Version 1.0.2
Nov. 29, 2014

Fixed issue where the input would:
* sometimes fail due to exception thrown from sleep() being interrupted
* sometimes fail due to splunkd connection failure
* ignore the host field that was set on the configuration page

Version 1.0.1
Nov. 12, 2014

Fixed issue where preview did not work

Version 1.0
Oct. 28, 2014

Added ability to use a proxy server

Version 0.9
Aug. 17, 2014
  • Fixed issue where not all matches were returned
  • Added preview dialog to modular input page
  • Added raw_match_count to output which counts CSS matches, even if they included no text
  • Fixed incompatibility with other apps that also import the modular_input base class
  • Fixed issue where entering and then clearing the sourcetype causes an error
  • Added ability to specify attributes that should be used for the field names
Version 0.8
July 13, 2014

Fixed problem where websites in non-ASCII encodings did not get decoded correctly

Version 0.7
July 11, 2014
Version 0.6
July 8, 2014
  • Switched to multi-value output of matches and added transform for parsing match field
  • Fixed exception that could happen if the web-page was not available
  • Put authentication fields on a separate location on the manager page
Version 0.5
July 7, 2014

A Splunk input for retrieving and indexing information from web-pages

