Posts Tagged search
While on the subject of indexes, we recently had a discussion with one of our users who complained about long response times with language-specific search filters such as (cn:fr.6:=*John*).
These extended filters rely on I18N Collation matching rules and indexes that I’ve described in an old post for OpenDS.
It turned out that he had defined the matching rule for the index, and rebuilt it, but had missed an important part: the index-type did not include “extensible”.
The proper command to create an extensible index is the following:
dsconfig set-local-db-index-prop --backend-name userRoot --index-name cn \
  --set index-extensible-matching-rule:fr.6 \
  --add index-type:extensible \
  --hostname localhost --port 4444 \
  --bindDN cn=Directory\ Manager --bindPassword ****** \
  -X -n
fr.6 is the shortcut for the French substring collation matching rule, whose full OID is 184.108.40.206.220.127.116.11.18.104.22.168.1.
Note that if you don’t specify the extensible index-type, the server will not build the index for the extensible matching rule. The use of the index-type is consistent with the other types of index (equality and so on), and allows you to disable and re-enable extensible indexes without having to re-enter all OIDs.
In a previous post I talked about analyzing search filters and indexes. Matt added in a comment that OpenDJ has another means of understanding how indexes are used in a search. Here’s a detailed post.
The OpenDJ LDAP directory server supports a “magic” operational attribute that allows an administrator to get from the server information about the processing of indexes for a specific search query: debugsearchindex.
If the attribute is included in the requested attributes of a search operation, the server does not return the matching entries as usual, but a single result entry with a fixed distinguished name and a single-valued attribute, debugsearchindex. Its value describes the index processing: the number of candidate entries per filter component, the overall number of candidates, and whether any or all of the search is indexed.
$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "(&(mail=user.*)(cn=*Denice*))" debugsearchindex
Password for user 'cn=Directory Manager': *******
dn: cn=debugsearch
debugsearchindex: filter=(&(mail=user.*)[INDEX:mail.substring][COUNT:2000](cn=*Denice*)[INDEX:cn.substring][COUNT:1])[COUNT:1] final=[COUNT:1]

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "objectclass=*" debugsearchindex
Password for user 'cn=Directory Manager': *********
dn: cn=debugsearch
debugsearchindex: filter=(objectClass=*)[NOT-INDEXED] scope=wholeSubtree[COUNT:2007] final=[COUNT:2007]

$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "mail=user.1*" debugsearchindex
Password for user 'cn=Directory Manager': *********
dn: cn=debugsearch
debugsearchindex: filter=(mail=user.1*)[INDEX:mail.substring][COUNT:1111] scope=wholeSubtree[COUNT:2007] final=[COUNT:1111]
Note that OpenDJ sometimes optimizes the query and uses an index other than the usual one. For example, it might use the equality index for an initial substring filter. The index actually used during the search appears in the debugsearchindex attribute. Also, once the candidate set has been narrowed down to very few entries, the server stops consulting indexes and evaluates the remaining filter components directly against those entries, as in the example below:
$ bin/ldapsearch -h localhost -p 1389 -D "cn=Directory Manager" -b "dc=example,dc=com" "(&(cn=Denice*)(mail=user.9*))" debugsearchindex
Password for user 'cn=Directory Manager':
dn: cn=debugsearch
debugsearchindex: filter=(&(cn=Denice*)[INDEX:cn.equality][COUNT:1])[COUNT:1] final=[COUNT:1]
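When comparing many filters, it can help to turn a debugsearchindex value into structured data. The following is a minimal Python sketch, assuming the value follows the layout shown in the examples above ([INDEX:...][COUNT:...] or [NOT-INDEXED] after each filter component, and a final=[COUNT:...] suffix). The function name parse_debugsearchindex is hypothetical and not part of OpenDJ.

```python
import re

# One annotated filter component, as seen in the sample outputs, e.g.
#   (mail=user.*)[INDEX:mail.substring][COUNT:2000]
#   (objectClass=*)[NOT-INDEXED]
COMPONENT = re.compile(
    r"\((?P<filter>[^()]+)\)"
    r"(?:\[INDEX:(?P<index>[^\]]+)\]\[COUNT:(?P<count>\d+)\]|\[NOT-INDEXED\])"
)
FINAL = re.compile(r"final=\[COUNT:(?P<count>\d+)\]")


def parse_debugsearchindex(value):
    """Return (components, final_count) for a debugsearchindex value.

    Assumption: the value looks like the samples in this post; other
    layouts are not handled.
    """
    components = []
    for m in COMPONENT.finditer(value):
        components.append({
            "filter": m.group("filter"),
            "index": m.group("index"),  # None when the component is not indexed
            "count": int(m.group("count")) if m.group("count") else None,
        })
    final = FINAL.search(value)
    final_count = int(final.group("count")) if final else None
    return components, final_count
```

A component whose "index" is None was evaluated without an index, and a large gap between a component count and the final count suggests that component contributes little to narrowing the search.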
LDAP directory services greatly rely on indexes to provide fast and accurate search results.
OpenDJ, the open source LDAP directory services for the Java platform, provides a number of tools to ensure indexes are used efficiently, or to optimize them for even better performance.
To start with, OpenDJ rejects all unindexed searches by default, unless the authenticated user has the privilege to perform them. Unindexed searches are rejected because they result in scanning the whole database, which consumes a lot of resources and time. There are legitimate uses of unindexed searches though, and OpenDJ offers a way to control who can perform them through a privilege. To learn more about privileges and how to grant them, please check the Administration Guide or some of my previous posts.
When unindexed searches are completed, OpenDJ (starting with revision 7148 of the OpenDJ trunk, and therefore OpenDJ 2.5) logs the “Unindexed” keyword as part of the Search Response access log message. But the access log file can also be used to identify search operations that are not making optimal use of indexes. Simply check for search responses that have been returned with an etime (execution time) greater than the average.
The access log example below contains both an unusually high etime (expressed in ms) and the Unindexed tag.
[27/Jul/2011:20:27:27 +0200] SEARCH RES conn=0 op=1 msgID=2 result=0 nentries=10001 Unindexed etime=1846
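As a rough illustration, scanning an access log for such responses could look like the Python sketch below. It assumes SEARCH RES lines shaped like the one above; the function name slow_or_unindexed and the default threshold of 1000 ms are hypothetical choices, and in practice the threshold should be derived from the average etime of your own deployment.

```python
import re

# Matches SEARCH RES lines in the layout of the sample above.
# Assumption: other access log formats or extra fields are not handled.
SEARCH_RES = re.compile(
    r"SEARCH RES conn=(?P<conn>-?\d+) op=(?P<op>\d+) .*?"
    r"nentries=(?P<nentries>\d+)(?P<unindexed> Unindexed)? etime=(?P<etime>\d+)"
)


def slow_or_unindexed(lines, etime_threshold_ms=1000):
    """Yield (conn, op, etime, unindexed) for suspicious search responses.

    A response is reported when it carries the Unindexed tag or when its
    etime exceeds etime_threshold_ms (an arbitrary illustrative default).
    """
    for line in lines:
        m = SEARCH_RES.search(line)
        if not m:
            continue
        etime = int(m.group("etime"))
        unindexed = m.group("unindexed") is not None
        if unindexed or etime > etime_threshold_ms:
            yield int(m.group("conn")), int(m.group("op")), etime, unindexed
```

The conn and op values can then be used to find the matching SEARCH REQ line in the same log, which contains the filter that needs attention.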
The verify-index command lets you check that no index is corrupted (i.e. no data is missing from indexes).
The rebuild-index command lets you build or rebuild an index that is corrupted or has had its configuration changed.
One of the tuning parameters of indexes is the index-entry-limit (known in Sun DSEE as the AllIDsThreshold): the maximum number of entries kept in an index record before the server stops maintaining that record and considers it more efficient to scan the whole database. For more information on the index entry limit, check Section 7.2.4, Changing Index Entry Limits, of the Indexing chapter of the Administration Guide.
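To make the idea concrete, here is a toy Python model of a single index record with an entry limit. It is only a sketch of the concept described above, not OpenDJ's actual implementation; the class name IndexRecord and the choice to give up as soon as the record holds more than entry_limit IDs are assumptions.

```python
class IndexRecord:
    """Toy model of one index key and the entry IDs that match it."""

    def __init__(self, entry_limit):
        self.entry_limit = entry_limit
        self.entry_ids = set()   # maintained while the record is within the limit
        self.undefined = False   # True once the limit has been exceeded

    def add(self, entry_id):
        if self.undefined:
            return  # the record is no longer maintained
        self.entry_ids.add(entry_id)
        if len(self.entry_ids) > self.entry_limit:
            # Too many matches: drop the ID list and stop maintaining it.
            self.entry_ids = None
            self.undefined = True

    def candidates(self):
        """Return the candidate IDs, or None meaning 'no usable index data'."""
        return None if self.undefined else set(self.entry_ids)
```

In this model, a search that hits an undefined record gets no candidate list back and has to fall back to examining entries directly, which is why a limit that is too low for a selective attribute hurts performance.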
OpenDJ provides a static analyzer of indexes, which can help you understand how well the attributes are indexed and tune the index entry limit. This tool is a function of the dbtest utility and is used as follows:
$ bin/dbtest list-index-status -n userRoot -b "dc=example,dc=com"
Index Name                 Index Type  JE Database Name                             Index Valid  Record Count  Undefined  95%  90%  85%
---------------------------------------------------------------------------------------------------------------------------------------
id2children                Index       dc_example_dc_com_id2children                true         2             0          0    0    0
id2subtree                 Index       dc_example_dc_com_id2subtree                 true         2             0          0    0    0
uid.equality               Index       dc_example_dc_com_uid.equality               true         2000          0          0    0    0
aci.presence               Index       dc_example_dc_com_aci.presence               true         0             0          0    0    0
ds-sync-conflict.equality  Index       dc_example_dc_com_ds-sync-conflict.equality  true         0             0          0    0    0
givenName.equality         Index       dc_example_dc_com_givenName.equality         true         2000          0          0    0    0
givenName.substring        Index       dc_example_dc_com_givenName.substring        true         5777          0          0    0    0
objectClass.equality       Index       dc_example_dc_com_objectClass.equality       true         6             0          0    0    0
member.equality            Index       dc_example_dc_com_member.equality            true         0             0          0    0    0
uniqueMember.equality      Index       dc_example_dc_com_uniqueMember.equality      true         0             0          0    0    0
cn.equality                Index       dc_example_dc_com_cn.equality                true         2000          0          0    0    0
cn.substring               Index       dc_example_dc_com_cn.substring               true         19407         0          0    0    0
sn.equality                Index       dc_example_dc_com_sn.equality                true         2000          0          0    0    0
sn.substring               Index       dc_example_dc_com_sn.substring               true         8147          0          0    0    0
telephoneNumber.equality   Index       dc_example_dc_com_telephoneNumber.equality   true         2000          0          0    0    0
telephoneNumber.substring  Index       dc_example_dc_com_telephoneNumber.substring  true         16506         0          0    0    0
ds-sync-hist.ordering      Index       dc_example_dc_com_ds-sync-hist.ordering      true         1             0          0    0    0
mail.equality              Index       dc_example_dc_com_mail.equality              true         2000          0          0    0    0
mail.substring             Index       dc_example_dc_com_mail.substring             true         7235          0          0    0    0
entryUUID.equality         Index       dc_example_dc_com_entryUUID.equality         true         2002          0          0    0    0

Total: 20
If an index shows a nonzero value (N) in the Undefined column, it means N index keys have reached the index entry limit and are no longer maintained. This can be normal, for example with the objectClass equality index, where the vast majority of entries will have the same object classes (top, person, organizationalPerson, inetOrgPerson). But for other attributes, such as cn, it may indicate that the index entry limit is too low.
Finally, OpenDJ has an option to do a live analysis of search filters and how they use indexes. To enable live index analysis, simply enable it for the database backend that contains the data:
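Checking the Undefined column by eye gets tedious on a backend with many indexes. The Python sketch below picks out the rows worth a closer look, assuming the list-index-status output keeps the nine whitespace-separated columns described in this post; the function name indexes_over_limit is hypothetical.

```python
def indexes_over_limit(status_output):
    """Return {index name: undefined key count} for rows with Undefined > 0.

    Assumption: each data row has exactly 9 whitespace-separated fields
    (name, type, JE database name, valid, record count, undefined, 95%,
    90%, 85%), as in the dbtest list-index-status output described above.
    """
    flagged = {}
    for line in status_output.splitlines():
        fields = line.split()
        # Skip the header, the separator, and the "Total:" line.
        if len(fields) != 9 or fields[3] not in ("true", "false"):
            continue
        name, undefined = fields[0], int(fields[5])
        if undefined > 0:
            flagged[name] = undefined
    return flagged
```

The result still needs human judgment: a few undefined keys on objectClass.equality are expected, while the same on cn.equality would be a reason to revisit the index entry limit.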
dsconfig set-backend-prop --backend-name userRoot --set index-filter-analyzer-enabled:true \
  --set max-entries:50 -h localhost -p 4444 -D cn=Directory\ Manager -w ****** -n -X
The max-entries parameter specifies how many filter items are analyzed and kept in memory. Only the last max-entries items are kept. If there is a huge variety of requests against the directory service, you might want to increase the number. However, keep in mind that the analysis is kept in memory: the higher the number, the larger the impact on the overall performance of the server.
We do not recommend that you leave the index analysis enabled all the time, especially in production. The index analyzer should be used to gather statistics over a flow of requests for a short period of time, and should be disabled afterwards to free the resources.
The result of the index analyzer can be retrieved under the cn=monitor suffix, more specifically as part of the database environment of the backend.
$ bin/ldapsearch -p 1389 -D cn=directory\ manager -w secret12 \
  -b "cn=userRoot Database Environment,cn=monitor" '(objectclass=*)' filter-use
dn: cn=userRoot Database Environment,cn=monitor
filter-use: (uid=user.*) hits:1 maxmatches:20 message:
filter-use: (tel=*) hits:1 maxmatches:-1 message:presence index type is disabled for the tel attribute
filter-use: (objectClass=groupOfURLs) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfEntries) hits:1 maxmatches:0 message:
filter-use: (objectClass=person) hits:1 maxmatches:20 message:
filter-use: (objectClass=ds-virtual-static-group) hits:1 maxmatches:0 message:
filter-use: (aci=*) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfNames) hits:1 maxmatches:0 message:
filter-use: (objectClass=groupOfUniqueNames) hits:1 maxmatches:0 message:
filter-use: (objectClass=ldapSubentry) hits:1 maxmatches:0 message:
filter-use: (objectClass=subentry) hits:1 maxmatches:0 message:
hits represents the number of times the filter was used; maxmatches represents the maximum number of entries that were returned for that filter.
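Once the filter-use values have been collected, a few lines of Python are enough to sort them and surface the filters the analyzer complained about. This is a sketch that assumes the value layout shown above ("(filter) hits:N maxmatches:M message:text"); the function names are hypothetical, and treating a non-empty message as a sign of trouble is my own reading of the output.

```python
import re

# Layout of one filter-use value, as in the monitor output above.
FILTER_USE = re.compile(
    r"^(?P<filter>\(.*\)) hits:(?P<hits>\d+) "
    r"maxmatches:(?P<maxmatches>-?\d+) message:(?P<message>.*)$"
)


def parse_filter_use(values):
    """Parse filter-use attribute values into dictionaries."""
    records = []
    for value in values:
        m = FILTER_USE.match(value.strip())
        if m:
            records.append({
                "filter": m.group("filter"),
                "hits": int(m.group("hits")),
                "maxmatches": int(m.group("maxmatches")),
                "message": m.group("message").strip(),
            })
    return records


def with_messages(records):
    """Return records carrying a message, most frequently used first."""
    flagged = [r for r in records if r["message"]]
    return sorted(flagged, key=lambda r: r["hits"], reverse=True)
```

Sorting by hits puts the filters that would benefit most from a new or adjusted index at the top of the list.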
Index analysis and tuning is not a simple task, and I recommend experimenting with these tools extensively in a test environment to understand how to get the best out of them. But, as you can see, OpenDJ provides you with all the tools you need to get the best performance out of your LDAP directory.