**Problem Statement:**
**Implement Rabin Karp hashing based Pattern-Matching Algorithm.**
Example:

** Text (T): ** * rabing karp uses hashing technique*
**Pattern (P) :** * hashing*
**n **: length of text

**m : **length of pattern

**Download Source Code**
**Solution:**
Rabin Karp Algorithm finds a match of the pattern in the text by using hashing. Match is found only if hash of Pattern and hash of text of m characters gives same result.

Our objective is to reduce complexity from O(m x n ) as was in case of Brute force. We will have to desigyn a hash function which take O(1) complexity for finding hash of the pattern.

We will consider Horner's rule to calculate the hash of the pattern and hash of text of

*'m' *length.

Let us denote pattern as polynomial

here we will call 'x' as radix , and above polynomial can also be represented as

The pattern **'P'** will match Text '**T**' iff T[i , i+1 , .....i+m-1] = P [0,1,......, m-1 ]

**t****i**** : **represents the hash value from T[i , .....i+m-1]

and **t****i+1** : represents the hash value from T[i+1 , .....i+m-1 , i+m]

Lets consider a Text containing all numbers as characters , i.e. Character set is {1,2,3,4,5,6,7,8,9,0} and

lets say radix = 10, therefore calculating the numerical value of "35678" which will be 35,678.

Therefore, **p = P[m-1] + 10 { P[m-2] + 10 { P[m-3] + 10 { ...... + 10 { P[1] + 10 * P[0] } } } }**

**for **"35678"

p =** 8 + 7*10 + 6*102 + 5*103 + 3 *104**

= 8 + 10 { 7 + 10 { 6 + 10 { 5 + 10 * 3 } } }

we assume that to calculate this hash takes O(1) complexity , assume if **t****i** does not match with pattern P, we will have to calculate T(i +1 ) next , here we can leverage **t****i** to calculate **t****i+1**.

**t****i** = **T[i+m-1] + 10 { T[i+m-2] + 10 { T[i+m-3] +10 { ...... + 10 { T[i+1] + 10 * T[i] } } } }**

Now,

**t****i+1**** = 10 (t****i – 10m-1 T[i]) + T[i+m]**

N.B.: **10m-1** value can also be pre-computed.** **

Now consider 35678 does not match and next digit to be considered is 4, then

- Remove Most Significant digit 35678 - 3 * 10000 = 5678
- Multiply above value by 10 and add 4 , 5678*10 + 4 = 56784

Below is the implementation:

/**
* Rabin Karp uses hashing technique to search for Strings
* @author PRATEEK Rathore
*/
public class RabinKarp {
//Contant prime number
private static final int CONSTANT_PRIME = 101;
//Hash Value of the pattern which will remain constant through out the problem
private long pathash;
private int radix;
//Input pattern
private String pat;
//Pattern Length
private int patLen;
//Its the place value of Most Significant bit
private long RM1;
//Preprocessing of pattern in the constructor
public RabinKarp( String pat) {
radix = 256;
patLen = pat.length();
this.pat = pat;
RM1 = 1;
for (int i = 1; i < patLen; i++)
RM1 = (radix * RM1) % CONSTANT_PRIME;
//return hash value of pattern
pathash = hash(pat, patLen);
}
//Return hash value of given string from 0 to the input length
private long hash(String pat, int len) {
long h = 0;
for (int j = 0; j < len; j++)
h = (radix * h + pat.charAt(j)) % CONSTANT_PRIME;
return h;
}
/**
* Sub-routine called when hash-function is equal,
* character by character comparison is done.
*/
private boolean doesMatch(String text, int currIndex) {
for (int j = 0; j < patLen; j++)
if (pat.charAt(j) != text.charAt(currIndex + 1 + j))
return false;
return true;
}
/**
* Search for the pattern in the input string
* @param text
* @return: index where pattern is found in text
*/
public int search(String text) {
int textLen = text.length();
if (patLen > textLen)
return -1;
long textHash = hash(text, patLen);
for (int i = 0; i < textLen - patLen; i++) {
textHash = radix * (textHash - RM1 * text.charAt(i))% CONSTANT_PRIME +
CONSTANT_PRIME + text.charAt(i + patLen) % CONSTANT_PRIME;
if (textHash == pathash && doesMatch(text, i))
return i;
}
return -1;
}
public static void main(String[] args) {
String pat = "prateek";
String txt = "Welcome prateek to Bangalore";
RabinKarp rabin = new RabinKarp( pat);
int index= rabin.search(txt);
System.out.println("Text found at "+ index + ": " + txt.substring(index, index+ pat.length()+1));
}
}

Time Complexity:
Worst Case is : O(m ( n- m+1 ))

Best and Average Case is : O (m + n )

References:

1. http://en.wikipedia.org/wiki/Horner's_method

Please comment and post your suggestions.

Happy Coding !! :)