Excluding Published Document Paths or File Types from the InSite Search Index

Product: InSite Search

Version: All, ISS 2, 2.6, 2.6.12

Published: June 1, 2017

Last updated: January 30, 2021

Description

The iFilter indexing process used by InSite Search indexes every readable file within the Documents asset path. If your Documents folder contains file types that you do not want to appear in search results, configure the Custom Setting ExcludeDocumentExtensions to list the file extensions that should not be indexed.

If specific folders within your Documents asset path should not be returned in search results, set the ExcludeDocumentPaths Custom Setting to prevent those subdirectories from being indexed.

Purpose

Both Custom Settings let the indexing process skip specific files. Keeping unwanted documents out of the index means they cannot appear in InSite Search results, which keeps those results relevant.

Requirements

  • InSite Search 2.6.12 and above.
  • CMS 9 SR6 and above (requirement for InSite Search 2.6.12).
  • CMS Published content, to be rendered through a DSS site.

Step-by-Step

To exclude documents:

  1. Navigate to Administration > Search Configuration > Settings.
  2. Add a new Custom Setting. On publish, it is written to your CMS SearchSource.config:
    • For excluding documents by File extension:
      • Name: ExcludeDocumentExtensions
      • Value: a comma-delimited list of file extensions, e.g. "xml,css,js"

    • For excluding documents by path:
      • Name: ExcludeDocumentPaths
      • Value: a comma-delimited list of folder paths, relative to the \Documents folder, e.g. "IntranetDocs,MyDocs\Private,Do\Not\Index"

  3. Save the settings, then publish from the CMS so that the Custom Settings are written to a new SearchSource.config.
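
Before publishing, it can help to preview which documents a given pair of values would exclude. The following Python sketch is not part of InSite Search; it only illustrates one plausible reading of the two settings. The function names parse_list and is_excluded are hypothetical, and the matching rules (case-insensitive comparison, and a folder entry excluding everything beneath it) are assumptions rather than confirmed product behavior.

```python
from pathlib import PureWindowsPath


def parse_list(value: str) -> list[str]:
    """Split a comma-delimited setting value, dropping blanks and whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_excluded(relative_path: str, extensions_value: str, paths_value: str) -> bool:
    """Return True if a file, given relative to the Documents folder, would be skipped.

    Assumed semantics (not confirmed product behavior): comparisons are
    case-insensitive, and a folder entry excludes everything beneath it.
    """
    path = PureWindowsPath(relative_path)

    # Extension check: the setting holds extensions without a leading dot.
    excluded_exts = {e.lower().lstrip(".") for e in parse_list(extensions_value)} - {""}
    if path.suffix.lower().lstrip(".") in excluded_exts:
        return True

    # Path check: compare whole folder names so that an entry such as
    # MyDocs\Private does not also match MyDocs\PrivateArchive.
    folder_parts = tuple(part.lower() for part in path.parent.parts)
    for entry in parse_list(paths_value):
        entry_parts = tuple(p.lower() for p in PureWindowsPath(entry).parts)
        if entry_parts and folder_parts[: len(entry_parts)] == entry_parts:
            return True
    return False


if __name__ == "__main__":
    # Example values taken from the steps above.
    exts = "xml,css,js"
    paths = "IntranetDocs,MyDocs\\Private,Do\\Not\\Index"
    for doc in ["Reports\\annual.pdf", "Reports\\feed.xml", "MyDocs\\Private\\plan.docx"]:
        print(doc, "excluded" if is_excluded(doc, exts, paths) else "indexed")
```

Running a check like this against a listing of your staging Documents folder gives a rough idea of what will drop out of the index, but the actual result should always be confirmed by re-indexing in staging.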

Additional Information

Because this feature affects index-time boost score calculation, you'll need to delete and then re-index your published content when the feature is first enabled and each time its configuration is adjusted. For this reason, it's recommended that you test your configuration in a staging environment before implementing changes in production environments.

