Thursday, April 27, 2017

Benchmarking Mule Batch Approaches

Note: This blog post demonstrates that, with some fine-tuning, a Mule application can process really big volumes of data in a matter of minutes. A follow-up blog post by Irving Casillas shows you exactly how to do this while adding resiliency: http://blog.ioconnectservices.com/2017/08/mule-batch-adding-resiliency-to-manual.html.

Bulk processing is one of the most overlooked use cases in enterprise systems, even though it is very useful when handling big loads of data. In this post, I will showcase different scenarios of bulk upserts to a database using Mule batch components and evaluate some aspects of their performance. The objective is to establish a best practice for configuring Mule batch components and jobs that process big loads of data. Our goal is to optimize execution time without compromising computer resources such as memory and network connections.


The computer used for this proof of concept has a commodity configuration:
  • Lenovo ThinkPad T460
  • Intel Core i7 2.8 GHz
  • 8 GB RAM
  • 100 GB SSD


The software installed is:
  • MySQL Server v8.0
  • Mule v3.8.2
  • Anypoint Studio v6.2.2
  • JProfiler v9.2.1


The evaluation consists of processing a comma-separated values (CSV) file that contains 821000+ records with 40 columns each. First, we consume the file content, then we transform the data, and finally we store it in a database table. The file is 741 MB uncompressed. To ensure that each record has the latest information, the database queries must implement the upsert statement pattern.


Three approaches are shown here:
  1. Out-of-the-box Mule batch jobs and batch commit components.
  2. Using a custom pagination algorithm.
  3. A hybrid solution.


The first two approaches use the bulk mode of the Insert operation in the Database connector. The third approach uses the Bulk Execute operation of the Database connector.


One important distinction here: the term “upsert execution” refers to the action of inserting a new record or, if a record with the same key already exists, updating it with the new values, if any. In MySQL this is the “INSERT ... ON DUPLICATE KEY UPDATE” statement, documented at https://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html. The term “Insert operation” refers to the Insert configuration element in Mule, documented at https://docs.mulesoft.com/mule-user-guide/v/3.8/database-connector-reference#insert. In other words, we are executing MySQL upsert statements with the Insert operation of the Database connector in Mule.
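
To make this concrete, here is a minimal sketch of such an upsert wired through the Insert operation of the Database connector. The table, column, and configuration names (customer, first_name, email, MySQL_Configuration) are made up for illustration; the actual queries live in the GitHub repository linked below.

<!-- Illustrative only: table, columns, and config-ref are placeholders -->
<db:insert config-ref="MySQL_Configuration" bulkMode="true" doc:name="Bulk upsert">
    <db:parameterized-query><![CDATA[
        INSERT INTO customer (id, first_name, email)
        VALUES (#[payload.id], #[payload.firstName], #[payload.email])
        ON DUPLICATE KEY UPDATE
            first_name = VALUES(first_name),
            email = VALUES(email)
    ]]></db:parameterized-query>
</db:insert>

With bulkMode="true" the payload is expected to be a collection of records, and the connector runs the parameterized statement for all of them as a single JDBC batch, which is why each block of records translates into roughly one round trip to the database.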


You can find the code in our GitHub repository https://github.com/ioconnectservices/upsertpoc.

Approach 1: Out-of-the-box Mule batch jobs and batch commit components

Some background about the Mule batch module: in the batch job you can configure the block size and the maximum number of threads, and in the Batch Commit component you can also configure the commit size of the batch. This gives a lot of flexibility in terms of performance and memory tuning.


This flexibility comes at a price: you must calculate the amount of memory the computer will need for this process alone for each set of blocks being processed concurrently. This can be easily estimated with the following formula:


Maximum memory = record size * block size * maximum number of threads


For instance, in our test the size of a record, which is a SQL upsert statement, is 3.1 KB. The batch job is configured with a block size of 200 records and 25 running threads, which requires about 15.13 MB of memory at any given time (3.1 KB * 200 * 25). In this case, this will be executed approximately 4105 times (remember the 821000 records?). You must also verify that the host has enough CPU and memory available for garbage collection.


The flow

Figure 1. The batch flow. (Image: batchFlow.png)


  1. The batch job is configured to use a maximum of 25 threads and a block size of 200 records.
  2. The file content is transformed into an iterator which is then passed to the batch step.
  3. The commit size of the Batch Commit is set to 200. This matches the block size of the batch job, meaning the full block will be committed.
  4. The Database connector uses the Insert operation in bulk mode with a parameterized query.


This is a simple flow: all the pagination and bulk construction is done by Mule, so we only need to worry about the SQL statement and performance.
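
For reference, a stripped-down sketch of how this can be wired in Mule 3.8 XML is shown below. The job, step, and configuration names are placeholders, and I am assuming the 3.8 block-size attribute and the batch threading profile; the real flow lives in the batch project of the PoC repository.

<!-- Sketch only: names are placeholders and the input phase is abbreviated -->
<batch:job name="upsertBatchJob" block-size="200">
    <batch:threading-profile poolMaxSize="25"/>
    <batch:input>
        <!-- file read + DataWeave transform that outputs an iterator of records -->
        <logger message="Input phase goes here" level="INFO" doc:name="Placeholder"/>
    </batch:input>
    <batch:process-records>
        <batch:step name="upsertStep">
            <batch:commit size="200" doc:name="Batch Commit">
                <db:insert config-ref="MySQL_Configuration" bulkMode="true" doc:name="Bulk upsert">
                    <db:parameterized-query><![CDATA[
                        INSERT INTO customer (id, first_name, email)
                        VALUES (#[payload.id], #[payload.firstName], #[payload.email])
                        ON DUPLICATE KEY UPDATE first_name = VALUES(first_name), email = VALUES(email)
                    ]]></db:parameterized-query>
                </db:insert>
            </batch:commit>
        </batch:step>
    </batch:process-records>
</batch:job>

The commit size intentionally matches the block size, so each block of 200 records is flushed to MySQL as one bulk upsert.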

The metrics

As explained before, batch jobs are designed to run as fast as possible by processing records in multiple threads.


  1. The total time of execution is ~7 minutes.
  2. 4105 round trips are made to insert the records into the database.
  3. The maximum memory used during the execution of the batch is 1.29 GB.

Approach 2: Custom pagination

The overall idea here is to read the file, transform the content into a list, and iterate through the list creating pages of records from which a series of SQL queries is constructed. The queries are then sent in bulk to a Database connector with the bulk mode flag enabled.

The flow

Figure 2. Custom pagination flow. (Image: manual_pagination.PNG)


  1. The number of pages is determined based on the page size. In this case, the page size is set to 200 like in the first approach.
  2. The For Each scope component takes the number of pages as the collection input.
  3. The CSV File Reader consumes the file, builds the map that will be used as the payload, and maps the CSV fields to columns of a database record.
  4. The created queries are passed to an asynchronous scope that executes the bulk statements through the Database connector with a maximum of 25 threads, as in approach 1 (a sketch follows below).
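
A rough, illustrative outline of that flow follows. The buildPageOfRecords step stands in for the custom CSV reader component from the PoC repository, and flowVars.pages is assumed to hold the list of page indexes computed up front (total lines divided by the page size); both names are hypothetical.

<!-- Sketch only: the page-building logic is custom code, shown here as a flow-ref -->
<flow name="customPaginationFlow">
    <set-variable variableName="pageSize" value="#[200]" doc:name="Set page size"/>
    <foreach collection="#[flowVars.pages]" doc:name="For Each page">
        <!-- placeholder: read the next page of CSV lines and turn them into a List of Maps -->
        <flow-ref name="buildPageOfRecords" doc:name="Build page of records"/>
        <async doc:name="Async bulk upsert">
            <db:insert config-ref="MySQL_Configuration" bulkMode="true" doc:name="Bulk upsert">
                <db:parameterized-query><![CDATA[
                    INSERT INTO customer (id, first_name, email)
                    VALUES (#[payload.id], #[payload.firstName], #[payload.email])
                    ON DUPLICATE KEY UPDATE first_name = VALUES(first_name), email = VALUES(email)
                ]]></db:parameterized-query>
            </db:insert>
        </async>
    </foreach>
</flow>

Capping the asynchronous upserts at 25 threads is a matter of the threading profile behind the async scope, which I leave out of the sketch.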

The metrics

  1. The total time of execution was 5 minutes on average. The processing of the SQL bulk is done in a single thread, but the upsert execution is done asynchronously.
  2. The total of round trips to the database was 4109.
  3. The memory consumption was at a maximum of 1.42 GB.

Approach 3 (extra): Hybrid

This approach was also tested, but the results were not as satisfactory as the two above in terms of execution time; however, it showed the lowest memory consumption. The results of the testing are presented next.


The SQL bulk query is constructed manually but the pagination is now handled by the batch job.

The flow

Figure 3. Batch manually building the SQL bulk statement. (Image: hybrid-approach.PNG)


  1. The CSV content is transformed into an iterator to be passed to the batch process step.
  2. The batch process handles the block size and pagination.
  3. In the batch step, each record is used to construct its own SQL query. The query is added to a collection that is later appended into one single SQL statement to be processed in bulk.
  4. The Batch Commit component uses a commit size equal to the block size of the batch job.
  5. The Database connector uses the Bulk Execute operation to insert the records into the database (a sketch follows below).
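
A sketch of that arrangement is shown below. I am assuming that the Bulk Execute body can resolve a MEL expression holding the concatenated statements, and buildBulkUpsertSql is a placeholder for the custom logic that appends one upsert per committed record; the exact wiring is in the hybrid project of the PoC repository.

<!-- Sketch only: names are placeholders -->
<batch:job name="hybridUpsertJob" block-size="200">
    <batch:threading-profile poolMaxSize="25"/>
    <batch:process-records>
        <batch:step name="buildSqlStep">
            <batch:commit size="200" doc:name="Batch Commit">
                <!-- placeholder: concatenate one upsert per committed record into a
                     single semicolon-separated string left in the payload -->
                <flow-ref name="buildBulkUpsertSql" doc:name="Build bulk SQL"/>
                <!-- assumption: the statements built above are passed in as an expression -->
                <db:bulk-execute config-ref="MySQL_Configuration" doc:name="Bulk Execute">
                    #[payload]
                </db:bulk-execute>
            </batch:commit>
        </batch:step>
    </batch:process-records>
</batch:job>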

The metrics

  1. The total time of execution, when it completes successfully, is 18 minutes on average.
  2. The number of connections matches the maximum of 25 threads running in the batch job. This gives a total close to 4105 round trips to the database.
  3. The maximum memory used during the execution of the batch is 922 MB.


Conclusion

Many times, a thoughtful design is more helpful than the out-of-the-box features that any platform may offer. In this scenario, the custom pagination approach upserts the records into the database faster than the batch approach. However, there are a couple of things to consider as the outcome of this proof of concept:
  • The custom pagination approach is more flexible when treating data that can’t easily be split into records.
  • For scenarios where you have a source with millions of records coming from separate systems, it’s generally a good practice to consume the content as a stream so as not to blow up the memory or the network.
  • It’s easier to maintain the batch job flow than the custom pagination flow.
  • Using Mule’s batch jobs gives you better facilities for batch result reporting, as it tracks the total, succeeded, and failed records.
  • If memory management is the most important factor to honor in your solution, then the hybrid approach is better, as it shows the best memory numbers.


As a side experiment, I also observed that using the Bulk Execute operation in the Database connector is slower than the Insert operation in bulk mode. Moreover, the parameterized mode allows you to take the data from any source, trusted or untrusted, and still have the queries sanitized.
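
To illustrate that last point, with made-up table, column, and configuration names, a parameterized query hands the values to the JDBC driver as bind parameters, while a dynamic query splices the expression results into the statement text:

<!-- Sanitized: values are bound by the driver, so untrusted input cannot alter the SQL -->
<db:insert config-ref="MySQL_Configuration" doc:name="Parameterized upsert">
    <db:parameterized-query><![CDATA[
        INSERT INTO customer (id, email) VALUES (#[payload.id], #[payload.email])
        ON DUPLICATE KEY UPDATE email = VALUES(email)
    ]]></db:parameterized-query>
</db:insert>

<!-- Riskier: the expression results become part of the SQL text itself -->
<db:insert config-ref="MySQL_Configuration" doc:name="Dynamic upsert">
    <db:dynamic-query><![CDATA[
        INSERT INTO customer (id, email) VALUES (#[payload.id], '#[payload.email]')
        ON DUPLICATE KEY UPDATE email = VALUES(email)
    ]]></db:dynamic-query>
</db:insert>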

Comments

  1. Nice article! Could you please clarify how your custom pagination works? For approaches 1 and 2 to be truly comparable, you need to take into account that the batch module stores all the records in a persistent queue before it starts executing any of the steps. That's because that module is built to prioritize reliability over performance. So note that while a 100% in-memory solution (as approach 2 seems to be) would be faster, the batch module would survive a system crash without data loss.

    It would be great if you could clarify.
    Thanks!

    1. As you mention, this exercise was designed with memory consumption and execution time as the priorities. I'm leaving reliability for a separate post later.

      To answer your question, when the file is read, the flow calculates the number of pages based on the total number of lines in the file and the specified batch size (pages = total lines / batch size). Then it reads the file line by line, creates the SQL statement, and collects the batch so that it can execute the asynchronous upsert against the database. This is repeated until all calculated pages are completed.

      Hope this helps you better understand my approach. As I said before, reliability is planned for a later exercise.

  2. Thanks for the read; I know setting all this up and testing is time consuming. Your side experiment, "Bulk Execute operation in the Database connector is slower in performance than the Insert operation in bulk mode", is interesting and not what I would have expected, given that bulk execute should not need to do all the steps (like preparing the statement) that the insert operation performs. Can you elaborate on where those timings were taken? For example, the times could have been taken on the database side during processing, or they could have been taken from Mule prior to the wire transfer. I'm just trying to understand all the elements that could affect those timings.

    1. I was surprised by this too. The metrics were taken using JProfiler, which includes a probe module that tracks activity on JDBC connections.

      https://resources.ej-technologies.com/jprofiler/help/doc/views/probes/databases.html

      In this probe view, you can see how long it takes a connection to complete an operation; that is, it tracks when java.sql.Statement#executeBatch starts and finishes.

      This timing was double the time taken by the Insert operation in bulk mode, with the same batch size in both scenarios.

  3. Hi Victor, first of all, this is a very good article. Tuning the block size and threads does help batch processing run effectively, and your post does a great job of illustrating that.

    Now, did you profile memory and CPU while the batch was in its input phase? I believe you should have seen a hockey-stick spike in memory utilization; that's where the critical section of the batch is, I believe. Streaming is a good option for that, but DataWeave is not good at it. What was your observation there?

    1. Hi, thanks for reading and your kind words.

      Yes, as you mention, the input phase is really the heavyweight part. If you don't handle the DataWeave component properly, you'll end up with an OutOfMemoryError, like I did at the beginning of my experiments. To overcome this problem, the transformation I propose, which goes along with the streaming technique, is to output an iterator. This way only a portion of the file content is kept on the heap, keeping you away from the dreaded OOME with a more manageable amount of memory. I don't have the exact number on hand but will look in my archives.

      https://docs.mulesoft.com/mule-user-guide/v/3.8/dataweave-types#iterators

      See this piece of XML in the batch project
      https://github.com/ioconnectservices/upsertpoc/blob/master/batch/src/main/app/batch-approach.xml#L28

      One thing to note is that this phase takes a little more than 1 minute.

      Hope this answers your question.

  4. Nice post. Batch processing is such an integral part of business operations that you should approach modernization in phases and with proper planning.

    1. Thank you, Michael.
      Glad you liked it.
