No, I ain't paying for full-text search | Implementing Full-Text Search in Firestore

It's been a long time since I posted anything here. Life, work, the hustle, everything's been going hard on me these days. I've switched my mother tongue to Satirish™. This is not my best piece of writing, so viewer discretion is a must.

My latest endeavor, a website where ELT teachers can share their resources, got stuck at the point where I needed to let users search for resources with a simple search bar. It kinda looks like Google's home page.

[Image: home page of my new project]

The problem is this: The simpler things are for the end user, the more complex they get for the developer to implement.

The model looks like this (in TypeScript):

type Resource = {
  createdAt: number,
  uid: string,
  title: string,
  shortDescription: string,
  longDescription: string,
  files: string[],
  tags: string[],
} & Model; // Model is another type that has an "id" field

So, the search I need to implement has to do these things:

  • Check if title contains any token in the query.

  • Check if shortDescription contains any token in the query.

  • Check if longDescription contains any token in the query.

  • Check if tags contain any token in the query.
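For reference, here's roughly what that checklist would look like if I could run it in plain TypeScript over resources that are already in memory. This is only a sketch of the behavior I'm after, not something Firestore can run for me. The names SearchableResource, tokenize and matchesQuery are made up for illustration.

```typescript
// Subset of Resource holding just the fields the search cares about.
type SearchableResource = {
  title: string;
  shortDescription: string;
  longDescription: string;
  tags: string[];
};

// Hypothetical helper: split a raw query into lowercase, non-empty tokens.
const tokenize = (text: string): string[] =>
  text
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => token.length > 0);

// Hypothetical helper: true if any query token appears as a substring of
// the title, either description, or any tag (case-insensitive).
const matchesQuery = (resource: SearchableResource, query: string): boolean => {
  const haystacks = [
    resource.title,
    resource.shortDescription,
    resource.longDescription,
    ...resource.tags,
  ].map((text) => text.toLowerCase());

  return tokenize(query).some((token) =>
    haystacks.some((haystack) => haystack.indexOf(token) !== -1),
  );
};
```

Easy enough in memory. The hard part is getting the database to do the same thing without downloading every document first.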

This is a rather complex query, and Google, despite being a billion-dollar company, offers limited functionality in Firestore. Mainly:

  • You can compound 30 filters at most in a single query.

  • There's nothing like LIKE, like in relational databases, in Firestore (are you confused yet? double snap on your face). So, you can't search for a substring in a string field in Firestore.

So, the Firebase docs say they can't handle a complex query (despite being owned by a multibillion peak engineering company) and suggest I use a third-party company instead (multimillion this time).

Don't get me wrong. Full-text search solutions are great because they can handle fuzzy searches, which is exactly my case.

I've checked all the solutions they suggest. Algolia seemed the best fit since it allows 10k free searches every month. On the other hand, I'd be duplicating my data, sending it to Algolia just for some fancy search functionality. And the more data I store, the costlier it gets. Also, integrating another service makes testing challenging.

Another thing is, the Algolia integration through Firebase Extensions costs 2 cents every month, which is not much, but I have a big problem: I live in the third world.

I live in the third world and people tend to vote for demagogues here, which, as a result, makes life more expensive every second.

So, as an economically challenged personality who cannot afford a satirical attitude towards life but has one anyway, my brainmeats have been overheating for the last week trying to make this as free and functional as possible.

FREE FULL-TEXT SEARCH IN FIREBASE (not rly, let's call it semi-text search)

Remember the Resource model? Let's throw a searchQueries: string[] field in there:

type Resource = {
  createdAt: number,
  uid: string,
  title: string,
  shortDescription: string,
  longDescription: string,
  files: string[],
  tags: string[],
  searchQueries: string[], // you're here
} & Model;

You might be asking: What are we gonna do with this? We gon trim, split and toLowerCase every field we want searchable into it. Here's a sample createResource Firebase function:

const createResource = functions.https.onCall(async (raw: AddResourceSchemaType, context) => {
  if (context.auth?.uid === undefined) {
    console.error("anon user call");
    return;
  }

  // i validate and parse the data
  const data = AddResourceSchema.parse(raw);

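  const searchQueries = [
    // make title field searchable: lowercase, split on any whitespace, drop empty tokens
    ...data.title.trim().toLowerCase().split(/\s+/).filter(s => s.length > 0),
    // make tags field searchable (each tag stays a single token)
    ...data.tags.map(s => s.trim().toLowerCase()),
  ];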

  await firestore.collection("resources").add({
    // ... other fields ...
    searchQueries,
  } as Omit<Resource, "id">);
});
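The sample above only indexes title and tags. If you also want the descriptions in there, the tokenizing could be pulled into a helper along these lines. This is a rough sketch, not what runs on my site: SearchableFields, toTokens and buildSearchQueries are hypothetical names, and the punctuation list in the regex is an assumption you'd tune for your own content.

```typescript
// Hypothetical input shape: just the fields we want searchable.
type SearchableFields = {
  title: string;
  shortDescription: string;
  longDescription: string;
  tags: string[];
};

// Lowercase the text, split on whitespace and common punctuation,
// and drop the empty pieces that splitting leaves behind.
const toTokens = (text: string): string[] =>
  text
    .toLowerCase()
    .split(/[\s.,;:!?()]+/)
    .filter((token) => token.length > 0);

// Collect tokens from every searchable field and remove duplicates,
// keeping the first occurrence of each token.
const buildSearchQueries = (data: SearchableFields): string[] => {
  const texts = [
    data.title,
    data.shortDescription,
    data.longDescription,
    ...data.tags,
  ];
  const tokens = texts
    .map(toTokens)
    .reduce((all, next) => all.concat(next), [] as string[]);
  return tokens.filter((token, index) => tokens.indexOf(token) === index);
};
```

Two things to keep in mind: this version splits multi-word tags into separate words, and a long longDescription makes the array (and the document) grow, so you may prefer to leave it out.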

Now that we have the Resource.searchQueries field, we can filter the resources with the array-contains-any operator. (Thank god we have this operator at least.)

There's only one problem, though. As the docs for array-contains-any state:

Use the array-contains-any operator to combine up to 30 array-contains clauses on the same field with a logical OR.

I can live with that, and my users can live with that as well. As a matter of fact, anyone who searches with a 30-word query on the website? I call 'em crazy.
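To make the limit and the matching rule concrete, here's a small sketch. toQueryTokens is a hypothetical helper that normalizes a pipe-separated query the same way searchQueries was built and caps it at 30 values. arrayContainsAny is only an in-memory stand-in for how I understand the operator to behave (exact element match, OR-ed together), not Firestore's actual implementation.

```typescript
// Limit taken from the docs quote above.
const MAX_ARRAY_CONTAINS_ANY = 30;

// Hypothetical helper: turn "Grammar| b1 ||grammar" into ["grammar", "b1"],
// keeping at most `max` distinct tokens.
const toQueryTokens = (
  q: string,
  max: number = MAX_ARRAY_CONTAINS_ANY,
): string[] =>
  q
    .split("|")
    .map((token) => token.trim().toLowerCase())
    .filter((token) => token.length > 0)
    .filter((token, index, all) => all.indexOf(token) === index)
    .slice(0, max);

// In-memory stand-in for array-contains-any: true if the array field
// contains at least one of the values as a whole element.
const arrayContainsAny = (field: string[], values: string[]): boolean =>
  values.some((value) => field.indexOf(value) !== -1);
```

Note the "whole element" part: a stored token "present" is matched by the query token "present", but not by "pres". That's why I call this semi-text search.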

That's why my custom useSearch hook implementation looks like this:

const useSearch = (): UseSearchReturnType => {
  // other states and contexts

  const [ searchParams, ] = useSearchParams();

  const params = {
    uid: searchParams.get('uid'),
    q: searchParams.get('q'), // this is what i use
    tags: searchParams.get('tags'),
  }

  // create a `(QueryFieldFilterConstraint | QueryLimitConstraint)[]` to use it with firestore later on
  const filters = [
    ...(
      params.uid === null
        ? []
        : [where('uid', '==', params.uid)]
    ),
    ...(
      params.q === null
        ? []
        : [
          // here i use the filter on searchQueries
          where(
            'searchQueries',
            'array-contains-any',
            // lowercase to match what's stored in searchQueries; firestore limit is 30, 20 to be safe
            params.q.toLowerCase().split('|').slice(0, 20),
          )
        ]
    ),
    ...(
      params.tags === null
        ? []
        : [
          where(
            'tags',
            'array-contains-any',
            params.tags.split('|').slice(0, 5), // max 5 tags, cuz why not?
          )
        ]
    ),
    limit(100),
  ]; // no null entries can end up in here, so no extra filtering is needed

  // return a widget or smn
}
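The hook expects q to arrive as tokens separated by "|". Something has to turn what the user typed in the search bar into that shape, so here's a minimal sketch of that step. buildSearchPath is a hypothetical helper and the "/search" path is an assumption, not necessarily the route my site uses.

```typescript
// Hypothetical helper: turn raw search-bar input into a path whose `q`
// parameter holds lowercase tokens joined with "|".
const buildSearchPath = (input: string, basePath: string = "/search"): string => {
  const tokens = input
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => token.length > 0);

  // Nothing to search for: go to the bare path without a `q` parameter.
  if (tokens.length === 0) {
    return basePath;
  }

  // encodeURIComponent escapes "|" and any other reserved characters;
  // reading the parameter back with searchParams.get('q') decodes it again.
  return basePath + "?q=" + encodeURIComponent(tokens.join("|"));
};
```

With this, the lowercasing happens once on the way in, and the hook only has to split on "|".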

And tell you what?