# Looking for search substrings in a list of products

Reently, Sam asked me to help with a problem.

I am trying to create a function whereby users of my spreadsheet can type in 1–3 words to search a range of parts descriptions (text entries) and pull up their associated parts numbers and descriptions.

The intent is that they don't know the part number to order or the exact description, so they enter up to three words and it returns all the part numbers with those three words (or some of those words) and the descriptions of thos parts. Hopefully that makes sense.

I don't want it to be case or context sensitive. Is there a way to do this in excel?

Well yes there is.

Let's say that the user enters their search string into cell B3. The first job is to parse the search string, separating it into its three words. To help with this, I've added a space to the end of the search string.

=B3&" "

To find the first word in the string, we have the following formula:

=IF(ISERROR(TRIM(LEFT(B4, FIND(" ", B4,1)-1))),"",TRIM(LEFT(B4, FIND(" ", B4, 1)-1)))

The core of this formula is

LEFT(B4, FIND(" ", B4, 1)-1)

The FIND function is looking for the first instance of a space in the string, and taking one off this value. The LEFT function is bringing back the leftmost characters up to that point. So if the first space appears in character 8, then it's bringing back the first seven characters.

This I've trimmed, which removes double-spaces and any leading or trailing spaces.

And I've wrapped the whole thing in a big IF(ISERROR) clause. This basically says, if this whole formula yields an error, bring back nothing; otherwise, bring back the string.

The formulae for the second and third word are slightly more complex, but follow the same sort of logic.

=IF(ISERROR(TRIM(MID(B4, LEN(B7)+2,FIND(" ", B4, LEN(B7)+2)-LEN(B7)-1))), "", TRIM(MID(B4, LEN(B7)+2, FIND(" ", B4, LEN(B7)+2)-LEN(B7)-1)))

=IF(ISERROR(TRIM(MID(B3, LEN(B7)+LEN(B8)+3, 100))), "", TRIM(MID(B3, LEN(B7)+LEN(B8)+3, 100)))

Note, if the user enters four or more words, the third word formula currently breaks. But that could easily be solved. (I was merely following the business requirements.)

I've then created uppercase versions of each of these sub-strings in H1, I1 and J1.

=UPPER(B7)

In my example, the user has typed in <banana chicken muffin>. For some reason, they're interested in products with any one of those words in them.

Now let's say that we have an inventory list in column F, starting in row 3. In column G, we uppercase that using the same UPPER formula. That means we'll be looking for uppercase words within uppercase strings, so we should be good.

So we have column labels in H, I and J for each of our keywords, and our products appear in the rows. Let's say the first product is chicken curry.

In the intersections (e.g. H3), we put the following formula:

=IF(H\$1="",0,IF(ISERROR(FIND(H\$1,\$G3)),0,1))

This is looking for the word BANANA (the first word of our search string (H1) in cell G3 (which contains CHICKEN CURRY). If it can't find it, it'll return a 0. Otherwise, it'll return a 1.

This formula can be copied down the column and across to columns I and J. So looking for <banana> in <chicken curry> will yield a 0. But searching for <chicken> in chicken curry will yield a 1, as will the search for <banana> against <banana muffin>.

So we now have a matrix of 1s and 0s.

In column D, I've created a simple MAX against the relevant row.

=MAX(H3:J3)

This gives a 1 if that product was found in any of the keywords was found (i.e. the product is relevant) and a 0 if not. And then I've added an incrementing serial number in column E.

=IF(D4=1,E3+1,E3)

So if the current product has been found, add a 1 to the previous serial number. Otherwise, repeat the previous product's serial number.

So <chicken curry> has a serial number of 1, <banana split> gets a 2, but <beef rendang> repeats the 2, indicating that the product is not suitable. <banana muffin> is the next one to increment the serial number to 3.

Elsewhere, in column L say, I've put the numbers 1 through 40. And against each, I have a VLOOKUP to bring back the relevant products.

=IF(ISERROR(VLOOKUP(L2, E:F, 2, 0)), "", VLOOKUP(L2, E:F, 2, 0))

This is looking for an exact match of the serial number in column E (our list of sequence numbers) and bringing back the product name (column 2) for the first such occurrence. So against the number 2, it is looking for a 2 in the sequence number, and bringing back <banana split> the first product with that serial number. For number 3, it'll bring back <banana muffin>.

You can hide the redundant product numbers by giving them a Conditional format of <;;;> (without the < and >) if the product field is blank.

Job done!

This entry was posted in Uncategorized. Bookmark the permalink.