Han Solo: Multiprocessing Strikes Back

Neither that long ago nor that far, far away, but on this very blog, we talked about the first iteration of what we call Han Solo or, in other words, the Feed Processor.

This is a follow-up to that previous post, and a main course is best served after the appetizer. Still, we'll recap enough here that you can follow along without having read it.

Previously, on Han Solo

Han Solo is a project that ingests nightly feeds, provided by our integrated retailers and other sources, and produces updates for the catalog to be as accurate as possible.

We found some problems in our project. In short:

  • We had a problem with code quality. Little by little, the codebase became cumbersome and complicated to work with.
  • The coupling of the several stages that make up the pipe structure of Han Solo was too tight, as they had some dependencies that made it hard for them to be tested separately.
  • In the nightly runs, we saw that we’d eventually run out of processing time since all the feeds need to be ingested during a specific time window.

Now, don’t think that problems were the only motivation to move forward: we also saw some room for improvement!

Let’s welcome Han Solo v2: the Feed Processor

To make this easier to follow, we’ll call the first iteration Han Solo and the second iteration Feed Processor.

A decoupled nature: take two

One of the problems was that Han Solo was synchronous. We would start a run and wait for it to finish before moving on to the next one.
We realized that there’s no reason to do this since each feed is independent from the other in terms of processing. So, for that, we created a brand new architecture that would allow us to start several feed runs at once.
This way, at any given time, we can find feed runs being processed at different stages!
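
To make the idea concrete, here is a minimal sketch of starting several independent feed runs at once with Python's `concurrent.futures`. It is an illustration under assumptions, not our actual code: `process_feed`, `run_feeds` and the feed contents are hypothetical names and data.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def process_feed(feed_name, raw_items):
    """Hypothetical stand-in for one complete feed run."""
    cleaned = [item.strip().lower() for item in raw_items]
    return feed_name, len(cleaned)


def run_feeds(feeds, executor_cls=ProcessPoolExecutor, max_workers=4):
    """Start every feed run at once and collect each result as it finishes."""
    results = {}
    with executor_cls(max_workers=max_workers) as pool:
        # Feeds are independent, so no run has to wait for another one.
        futures = [pool.submit(process_feed, name, items) for name, items in feeds.items()]
        for future in as_completed(futures):
            name, item_count = future.result()
            results[name] = item_count
    return results
```

With the default `ProcessPoolExecutor`, each run gets its own process, so `run_feeds` should be called from under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` swaps in threads, which is handy for a quick experiment. Either way, every run still lives on the host that started it.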

But multiprocessing alone is not flawless: the processing of each feed would still be bound to the host where it was started, making it harder for us to keep the processing load evenly distributed across hosts. And we have yet to meet the main character of this upgrade: RabbitMQ.

Feed Processor flow

Now, some of these stages may seem familiar. Nevertheless, there’s a big difference: now, each stage has a RabbitMQ consumer and at least one RabbitMQ producer.

Let’s highlight the common ground with its predecessor:

  • Until Feed processing, each RabbitMQ message accounts for the whole feed. After that, each message represents an item of the feed.
  • The overall flow is more or less kept in the same way, and while the needs have evolved, they are mostly the same (albeit bigger 👀).

The key differences are:

  • Each stage is now handled by a separate component, with its own logic and subcomponents. The component that connects one stage to the next one (or ones) is called the connector.
    Each connector has a subcomponent, which we call the worker, that handles the business logic of that specific stage. The worker’s output is what the connector sends to the next stage or stages.
  • The Feed Processor only accounts for the processing part (blue) of the flow. The feed fetching and normalizing tasks have been moved to their very own project.
  • Each connector gets to work when it receives a message and passes the results on to the next stage/s.
  • Since messages are queued into RabbitMQ, if a process dies, there’s very little risk of data loss.
    Still, to protect ourselves against such cases, we make use of the confirmation mechanism that AMQP offers to ensure that the flow is reliable and the data, whole.
  • Having the connectors separated and completely independent from one another means we can increase or decrease the number of processes running a stage if we see there’s more or less demand for a specific stage.
  • The finish line for a feed run is not “when the process is done” anymore since it’s not done in a synchronous way.
  • The communication between different processes that consume or produce messages is no longer language-specific.
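
The connector and worker split described above can be sketched roughly as follows. This is a simplified illustration, not the real implementation: in-memory `queue.Queue` objects stand in for RabbitMQ queues, `task_done()` stands in for an AMQP acknowledgement, and `Connector`, `feed_worker`, `item_worker` and the message shapes are all hypothetical.

```python
import queue


class Connector:
    """Links one stage to the next: consume a message, run the worker, publish its output."""

    def __init__(self, worker, inbox, outboxes):
        self.worker = worker      # business logic for this stage
        self.inbox = inbox        # queue this stage consumes from
        self.outboxes = outboxes  # queues of the next stage(s)

    def handle_next(self):
        """Process one message; return False when the inbox is empty."""
        try:
            message = self.inbox.get_nowait()
        except queue.Empty:
            return False
        for output in self.worker(message):
            for outbox in self.outboxes:
                outbox.put(output)
        # task_done() stands in for an AMQP ack here. With a real broker, acking
        # only after publishing lets an unfinished message be redelivered.
        self.inbox.task_done()
        return True

    def drain(self):
        """Handle messages until the inbox is empty; return how many were handled."""
        handled = 0
        while self.handle_next():
            handled += 1
        return handled


def feed_worker(feed_message):
    """Feed processing: one message for the whole feed in, one message per item out."""
    for item in feed_message["items"]:
        yield {"feed_id": feed_message["feed_id"], **item}


def item_worker(item_message):
    """Item processing: parse the price of a single item."""
    yield {**item_message, "price": float(item_message["price"])}


feed_queue, item_queue, update_queue = queue.Queue(), queue.Queue(), queue.Queue()
feed_connector = Connector(feed_worker, feed_queue, [item_queue])
item_connector = Connector(item_worker, item_queue, [update_queue])

feed_queue.put({
    "feed_id": "store-a",
    "items": [{"sku": "1", "price": "2.50"}, {"sku": "2", "price": "3.00"}],
})
feeds_handled = feed_connector.drain()
items_handled = item_connector.drain()
print(feeds_handled, items_handled, update_queue.qsize())
```

Because a connector only knows its inbox and its outboxes, running more copies of a busy stage is just a matter of starting more connectors on the same queue.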

What’s the opposite of sync, you say? 🥁

Enter async

Thanks to RabbitMQ, we are now able to spread the processing of a feed across the Feed Processor’s stages, even if they are running on different hosts.

Furthermore, this immediately translates into an improved balancing of the tasks. For example, if two feeds get run at the same time and one has 10 items and the second one has 1000, in Han Solo one host would process 10 and another host would process 1000.

Now take the same scenario with Feed Processor. Let’s say we are at the Item processing stage, with two hosts and two connectors per host. The Feed Processor will then spread the 1010 items across all four connectors that are currently consuming on the two hosts!

What does this mean? We just broke the host boundary and balanced the work!
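
The arithmetic behind this example is easy to check with a small simulation. It assumes, as a simplification, that item messages are dealt out to the consuming connectors in round-robin order; the host and connector names are made up for illustration.

```python
from collections import Counter
from itertools import cycle

feeds = {"small-feed": 10, "large-feed": 1000}

# Han Solo style: each feed run stays on the host that started it.
han_solo_load = Counter({"host-1": feeds["small-feed"], "host-2": feeds["large-feed"]})

# Feed Processor style: every item is its own message, and the four
# connectors of the Item processing stage all consume from the same queue.
connectors = [
    ("host-1", "connector-1"),
    ("host-1", "connector-2"),
    ("host-2", "connector-1"),
    ("host-2", "connector-2"),
]
item_messages = [(feed, index) for feed, count in feeds.items() for index in range(count)]

connector_load = Counter()
for _message, connector in zip(item_messages, cycle(connectors)):
    connector_load[connector] += 1  # assumed round-robin delivery

host_load = Counter()
for (host, _name), count in connector_load.items():
    host_load[host] += count

print("Han Solo:", dict(han_solo_load))
print("Feed Processor:", dict(host_load))
```

Under that assumption, the hosts end up with 506 and 504 items instead of 10 and 1000.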

The time window v2

Changing the flow from synchronous to asynchronous lets us fit more processing, and therefore more feeds, into the same time window!

Don’t wait for that result

Even though we run the Feed Processor during the same time window as before, we’re now handling close to three times the number of stores (and counting!) that we handled with Han Solo. Several improvements make this possible.

One of them is not waiting for the feed to be done before processing others, but this means we lose observability on when it’s actually done.

Since each item is an asynchronous message, and we need all of them to be processed before we decide that we are done, we had to build a new “feed run is done” flow. This new flow doesn’t wait for the run to finish; instead, it checks in periodically on the progress.
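
One possible shape for such a check, sketched under assumptions: we keep a count of expected and processed items per feed run, and a periodic job asks which runs have caught up. `FeedRunTracker` and `check_in` are hypothetical names, and the counters live in memory here purely for illustration.

```python
class FeedRunTracker:
    """Counts processed items per feed run (kept in memory for illustration)."""

    def __init__(self):
        self.expected = {}
        self.processed = {}

    def start_run(self, run_id, expected_items):
        self.expected[run_id] = expected_items
        self.processed[run_id] = 0

    def item_processed(self, run_id):
        self.processed[run_id] += 1

    def is_done(self, run_id):
        return self.processed[run_id] >= self.expected[run_id]


def check_in(tracker, monitored_runs):
    """One periodic check: return the runs that finished and stop monitoring them."""
    finished = {run_id for run_id in monitored_runs if tracker.is_done(run_id)}
    monitored_runs.difference_update(finished)
    return finished


tracker = FeedRunTracker()
tracker.start_run("store-a", expected_items=3)
tracker.start_run("store-b", expected_items=2)
monitored = {"store-a", "store-b"}

# Item messages from both runs are processed in whatever order they arrive.
for run_id in ["store-a", "store-b", "store-b", "store-a"]:
    tracker.item_processed(run_id)
first_check = check_in(tracker, monitored)   # store-b has caught up, store-a has not

tracker.item_processed("store-a")
second_check = check_in(tracker, monitored)  # now store-a is done too
print(first_check, second_check, monitored)
```

Nothing blocks between checks, so the connectors keep processing other feeds while a run is still in flight.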

In Han Solo we had the constraint of having a bulk update at the end of the nightly runs (check A Brief History of Price Updates (Part 3) for a quick refresher!), but this constraint has been lifted. Now, in Feed Processor, a feed run being done means we can trigger the updates in the next services instead of waiting for the morning bulk update!

Retry that stage

Ever heard Uncle Ben’s famous statement?

[Meme image, via imgflip]

Since each stage stands alone, each one handles its own retries when necessary. If a retry succeeds, the output simply reaches the next stage as input, as if the error had never happened. If we can’t recover from the error, we stop monitoring the feed for completion and store the error for further investigation.

This retrying logic allows for:

  • Better times: the whole flow is not blocked while retrying a problematic feed, except for a specific worker of a given stage.
  • Better times (reprise): we retry just the bit that failed instead of the entire thing!
  • Better traceability: it’s much easier to know what happened if the problem can affect just a small part of the code.
  • Better decoupling: each stage decides when, if and how many times to retry according to what the problem was.
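
As a rough sketch of what per-stage retrying can look like: the stage retries its own worker on errors it considers recoverable, and stores the error when it gives up. The names (`RecoverableError`, `run_with_retries`, the sample workers) and the retry policy are assumptions made for the example, not our production logic.

```python
class RecoverableError(Exception):
    """Hypothetical error type that a stage considers worth retrying."""


def run_with_retries(worker, message, max_attempts, stored_errors):
    """Run one stage's worker, retrying only this stage on recoverable errors."""
    last_error = None
    for _attempt in range(max_attempts):
        try:
            return worker(message)
        except RecoverableError as error:
            last_error = error  # retry just this message in this stage
        except Exception as error:
            last_error = error  # not recoverable: retrying would not help
            break
    # Out of attempts or unrecoverable: store the error for later investigation.
    stored_errors[message["run_id"]] = repr(last_error)
    return None


def make_flaky_worker(failures_before_success):
    """Build a worker that fails a given number of times, then succeeds."""
    calls = {"count": 0}

    def worker(message):
        calls["count"] += 1
        if calls["count"] <= failures_before_success:
            raise RecoverableError("temporary glitch")
        return {**message, "status": "processed"}

    return worker, calls


def broken_worker(message):
    """A worker whose failure cannot be fixed by retrying."""
    raise ValueError("malformed item")


stored_errors = {}
flaky_worker, flaky_calls = make_flaky_worker(failures_before_success=2)
recovered = run_with_retries(flaky_worker, {"run_id": "store-a"}, 3, stored_errors)
abandoned = run_with_retries(broken_worker, {"run_id": "store-b"}, 3, stored_errors)
print(recovered, abandoned, stored_errors)
```

The next stage only ever sees `recovered`, exactly as if the first two attempts had never failed, while the failed run leaves a stored error behind.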

Even more!

Switching over to asynchronous processing with RabbitMQ has had more advantages than the ones mentioned so far. It also brought new challenges and changes, which had to do not only with being asynchronous but also with growing more and more.

Some more crucial changes were possible only thanks to migrating to Feed Processor. These are not just performance gains but also brand-new features that bring a lot of value!

Updating prices, now in async flavor

In the early days of the Feed Processor, we still had to communicate with the price service synchronously. The result was not important, but knowing whether the request had reached the service was, so we had to wait (again?!) for the service to confirm that it had received it. While this wait is negligible for a single price update, it became quite costly once we added up all the price updates made during the night.

Eventually, our price update service enabled communication via RabbitMQ as well and, at last, another wait bites the dust! 🚀

The item updates

We’ve gone into detail about prices, but what about item updates?

The catalog service receives the information for the creation or update of an item, concerning the data that the app will show to the user, such as the name of the item, the image, etc.
Both Han Solo and Feed Processor send item creation payloads to the catalog service, but Han Solo wasn’t flexible enough to add the update feature.

In Feed Processor, we were finally able to process the information for existing items and send it forward so they get automatically updated.

Feed Processor: New Horizons

Even with all these new and shiny features, the journey is not over: there’s still much more to do and plenty of room to improve!

If you want to be part of the next shiny bits, take a look at our current openings!

Author: @arudp
