ICU 50.1.2
icu::BoyerMooreSearch Class Reference

BoyerMooreSearch.

#include <bmsearch.h>

Inheritance diagram for icu::BoyerMooreSearch:
icu::UObject icu::UMemory

Public Member Functions

 BoyerMooreSearch (CollData *theData, const UnicodeString &patternString, const UnicodeString *targetString, UErrorCode &status)
 Construct a BoyerMooreSearch object.
 
 ~BoyerMooreSearch ()
 The destructor.
 
UBool empty ()
 Test the pattern to see if it generates any CEs.
 
UBool search (int32_t offset, int32_t &start, int32_t &end)
 Search for the pattern string in the target string.
 
void setTargetString (const UnicodeString *targetString, UErrorCode &status)
 Set the target string for the match.
 
CollData * getData ()
 Return the CollData object used for searching.
 
CEList * getPatternCEs ()
 Return the CEs generated by the pattern string.
 
BadCharacterTable * getBadCharacterTable ()
 Return the BadCharacterTable object computed for the pattern string.
 
GoodSuffixTable * getGoodSuffixTable ()
 Return the GoodSuffixTable object computed for the pattern string.
 
virtual UClassID getDynamicClassID () const
 UObject glue...
 
Public Member Functions inherited from icu::UObject
virtual ~UObject ()
 Destructor.
 

Static Public Member Functions

static UClassID getStaticClassID ()
 UObject glue...
 

Detailed Description

BoyerMooreSearch.

This object holds the information needed to do a collation-sensitive Boyer-Moore search. It encapsulates the pattern, the "bad character" and "good suffix" tables, the Collator-based data needed to compute them, and a reference to the text being searched.
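
As a rough illustration of what a "bad character" table does, the sketch below runs a classical Horspool-style search over plain 32-bit keys. It is not ICU's BadCharacterTable: the keys merely stand in for collation elements, the names Key, buildShiftTable and findKeys are hypothetical, and nothing collation-specific (expansions, contractions, ignorable elements, strength) is modeled. The good suffix table is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustration only: the keys stand in for collation elements.
// This is not ICU's BadCharacterTable.
using Key = std::uint32_t;

// For each key in the pattern except the last, record how far the window may
// shift when that key sits under the window's final position.
std::unordered_map<Key, std::size_t> buildShiftTable(const std::vector<Key> &pattern) {
    std::unordered_map<Key, std::size_t> shift;
    for (std::size_t i = 0; i + 1 < pattern.size(); ++i) {
        shift[pattern[i]] = pattern.size() - 1 - i;
    }
    return shift;
}

// Return the index of the first match at or after `from`, or -1 if there is none.
std::ptrdiff_t findKeys(const std::vector<Key> &text, const std::vector<Key> &pattern,
                        std::size_t from = 0) {
    assert(!pattern.empty());  // an empty pattern has nothing to match
    const std::size_t m = pattern.size();
    const auto shift = buildShiftTable(pattern);
    std::size_t pos = from;
    while (pos + m <= text.size()) {
        // Compare right to left, as Boyer-Moore variants do.
        std::size_t j = m;
        while (j > 0 && text[pos + j - 1] == pattern[j - 1]) {
            --j;
        }
        if (j == 0) {
            return static_cast<std::ptrdiff_t>(pos);
        }
        // Shift by the entry for the key under the window's last slot;
        // keys absent from the table allow a full-length shift.
        const auto it = shift.find(text[pos + m - 1]);
        pos += (it != shift.end()) ? it->second : m;
    }
    return -1;
}
```

With the text {1, 2, 3, 4, 2, 3, 5} and the pattern {2, 3, 5}, the second window can jump three positions at once, because the key 4 under its last slot never occurs in the pattern.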

To do a search, you first need to get a CollData object by calling CollData::open. Then you construct a BoyerMooreSearch object from the CollData object, the pattern string and the target string. Then you call the search method. Here's a code sample:

void boyerMooreExample(UCollator *collator, UnicodeString *pattern, UnicodeString *target)
{
    UErrorCode status = U_ZERO_ERROR;
    CollData *collData = CollData::open(collator, status);
    if (U_FAILURE(status)) {
        // could not create a CollData object
        return;
    }
    BoyerMooreSearch *search = new BoyerMooreSearch(collData, *pattern, target, status);
    if (U_FAILURE(status)) {
        // could not create a BoyerMooreSearch object;
        // the only safe operation on it now is destruction
        delete search;
        CollData::close(collData);
        return;
    }
    int32_t offset = 0, start = -1, end = -1;
    // Find all matches
    while (search->search(offset, start, end)) {
        // process the match between start and end
        ...
        // advance past the match
        offset = end; 
    }
    // at this point, if offset == 0, there were no matches
    if (offset == 0) {
        // handle the case of no matches
    }
    delete search;
    CollData::close(collData);
    // CollData objects are cached, so the call to
    // CollData::close doesn't delete the object.
    // Call this if you don't need the object any more.
    CollData::flushCollDataCache();
}

NOTE: This is a technology preview. The final version of this API may not bear any resemblance to this API.

Known limitations: 1) Backwards searching has not been implemented.

2) For Han and Hangul characters, this code ignores any Collation tailorings. In general, this isn't a problem, but in Korean locales, at strength 1, Hangul characters are tailored to be equal to Han characters with the same pronunciation. Because this code ignores tailorings, searching for a Hangul character will not find a Han character and vice versa.

3) In some cases, searching for a pattern that needs to be normalized and ends in a discontiguous contraction may fail. The only known cases of this are with the Tibetan script. For example, searching for the pattern "\u0F7F\u0F80\u0F81\u0F82\u0F83\u0F84\u0F85" will fail. (This case is artificial. We've been unable to find a practical, real-world example of this failure.)

Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
See Also
CollData

Definition at line 108 of file bmsearch.h.

Constructor & Destructor Documentation

icu::BoyerMooreSearch::BoyerMooreSearch ( CollData * theData,
const UnicodeString & patternString,
const UnicodeString * targetString,
UErrorCode & status
)

Construct a BoyerMooreSearch object.

Parameters
theData- A CollData object holding the Collator-sensitive data
patternString- the string for which to search
targetString- the string in which to search, or NULL if you will set it later by calling setTargetString.
status- will be set if any errors occur.

Note: if on return, status is set to an error code, the only safe thing to do with this object is to call the destructor.

Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
icu::BoyerMooreSearch::~BoyerMooreSearch ( )

The destructor.

Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview

Member Function Documentation

UBool icu::BoyerMooreSearch::empty ( )

Test the pattern to see if it generates any CEs.

Returns
TRUE if the pattern string did not generate any CEs
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
BadCharacterTable* icu::BoyerMooreSearch::getBadCharacterTable ( )

Return the BadCharacterTable object computed for the pattern string.

Returns
the BadCharacterTable object.
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
CollData* icu::BoyerMooreSearch::getData ( )

Return the CollData object used for searching.

Returns
the CollData object used for searching
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
virtual UClassID icu::BoyerMooreSearch::getDynamicClassID ( ) const
virtual

UObject glue...

Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview

Implements icu::UObject.

GoodSuffixTable* icu::BoyerMooreSearch::getGoodSuffixTable ( )

Return the GoodSuffixTable object computed for the pattern string.

Returns
the GoodSuffixTable object computed for the pattern string.
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
CEList* icu::BoyerMooreSearch::getPatternCEs ( )

Return the CEs generated by the pattern string.

Returns
a CEList object holding the CEs generated by the pattern string.
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
static UClassID icu::BoyerMooreSearch::getStaticClassID ( )
static

UObject glue...

Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
UBool icu::BoyerMooreSearch::search ( int32_t offset,
int32_t & start,
int32_t & end
)

Search for the pattern string in the target string.

Parameters
offset- the offset in the target string at which to begin the search
start- will be set to the starting offset of the match, or -1 if there's no match
end- will be set to the ending offset of the match, or -1 if there's no match
Returns
TRUE if the match succeeds, FALSE otherwise.
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
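
The search contract described above (begin at offset, report start and end, set both to -1 on failure) can be exercised with a small stand-in. The sketch below is an assumption-laden illustration, not ICU code: PlainSearch and findAll are hypothetical names, the stand-in compares UTF-16 code units directly instead of collation elements, and it treats end as exclusive, which is an assumption consistent with the class sample's "offset = end" step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in that mimics the documented search() contract.
// It knows nothing about collation.
class PlainSearch {
public:
    PlainSearch(std::u16string pattern, std::u16string target)
        : pattern_(std::move(pattern)), target_(std::move(target)) {}

    // On success set [start, end) and return true; otherwise set both to -1.
    bool search(std::int32_t offset, std::int32_t &start, std::int32_t &end) const {
        start = end = -1;
        if (pattern_.empty() || offset < 0 ||
            static_cast<std::size_t>(offset) > target_.size()) {
            return false;
        }
        const std::size_t pos = target_.find(pattern_, static_cast<std::size_t>(offset));
        if (pos == std::u16string::npos) {
            return false;
        }
        start = static_cast<std::int32_t>(pos);
        end = static_cast<std::int32_t>(pos + pattern_.size());
        return true;
    }

private:
    std::u16string pattern_;
    std::u16string target_;
};

// Collect every non-overlapping match as a (start, end) pair.
std::vector<std::pair<std::int32_t, std::int32_t>> findAll(const PlainSearch &searcher) {
    std::vector<std::pair<std::int32_t, std::int32_t>> matches;
    std::int32_t offset = 0, start = -1, end = -1;
    while (searcher.search(offset, start, end)) {
        assert(end > start);  // a non-empty match guarantees the loop advances
        matches.emplace_back(start, end);
        offset = end;  // resume just past the match
    }
    return matches;
}
```

Collecting the matches into a container also avoids inferring "no matches" from the final value of offset; an empty result says so directly.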
void icu::BoyerMooreSearch::setTargetString ( const UnicodeString * targetString,
UErrorCode & status
)

Set the target string for the match.

Parameters
targetString- the new target string
status- will be set if any errors occur.
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview

The documentation for this class was generated from the following file: