What is batch processing?
Batch processing is the method computers use to periodically complete high-volume, repetitive data jobs. Certain data processing tasks, such as backups, filtering, and sorting, can be compute-intensive and inefficient to run on individual data transactions. Instead, data systems process such tasks in batches, often during off-peak times when computing resources are more readily available, such as at the end of the day or overnight. For example, consider an ecommerce system that receives orders throughout the day. Instead of processing every order as it occurs, the system might collect all orders at the end of each day and share them in one batch with the order fulfillment team.
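The ecommerce example can be sketched in a few lines of Python. This is a minimal illustration, not how any particular ecommerce platform works: the `Order` and `DailyOrderBatch` types are hypothetical, and `send_batch` stands in for whatever hand-off to the fulfillment team a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Order:
    order_id: str
    amount: float


@dataclass
class DailyOrderBatch:
    """Collects orders during the day and hands them off together."""

    # Hypothetical hand-off to the fulfillment team (e.g. a file drop or API call).
    send_batch: Callable[[list[Order]], None]
    pending: list[Order] = field(default_factory=list)

    def record(self, order: Order) -> None:
        # No per-order processing here: the order is only queued.
        self.pending.append(order)

    def end_of_day(self) -> int:
        # Hand every queued order over in a single batch, then start a new day.
        batch, self.pending = self.pending, []
        if batch:
            self.send_batch(batch)
        return len(batch)


sent: list[list[Order]] = []
day = DailyOrderBatch(send_batch=sent.append)
day.record(Order("A-1", 19.99))
day.record(Order("A-2", 5.00))
print(day.end_of_day())  # 2
print(len(sent))         # 1
```

The design choice is simply to defer work: recording an order is cheap, and the expensive hand-off happens once per day instead of once per order.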
Why is batch processing important?
Organizations use batch processing because it requires minimal human interaction and makes repetitive tasks more efficient to run. You can set up batches of jobs composed of millions of records to be worked through together when compute power is most readily available, putting less stress on your systems. Modern batch processing also needs little supervision or management: if there is an issue, the system automatically notifies the responsible team so it can fix it. Managers can take a hands-off approach, trusting their batch processing software to do its job.
What is the history of batch processing?
Batch processing is more than a century old, although the details of how it works have continually evolved. One of the earliest examples dates back to 1890, when an electromechanical tabulator was used to record information for the United States Census Bureau. Census workers punched data into cards, called punch cards, which were then processed in batches through the tabulating machine. By the 1960s, developers could schedule batch programs on magnetic tape for computers to run sequentially throughout the day. Batch jobs also became commonplace as mainframe computers grew more powerful and efficient. Modern organizations use software-based batch applications for common business processes such as generating reports, printing documents, or updating information at the end of the day.
What are examples of jobs batch processing can automate?
Batch processing systems are used to process various types of data and requests. Some of the most common batch processing jobs include:
- Weekly/monthly billing
- Payroll
- Inventory processing
- Report generation
- Data conversion
- Subscription cycles
- Supply chain fulfillment
What are some use cases of batch processing systems?
There are many use cases for batch processing systems. Key examples follow.
Financial services
Financial services organizations, from agile fintech companies to legacy enterprises, use batch processing in areas such as high-performance computing (HPC) for risk management, end-of-day transaction processing, and fraud surveillance. Batch processing helps them minimize human error, increase speed and accuracy, and reduce costs through automation.
Software as a service
Enterprises that deliver software as a service (SaaS) applications often run into scalability issues. With batch processing, you can scale to meet customer demand while automating job scheduling. Building containerized application environments that scale for high-volume processing can take months or even years, but batch processing systems can help you achieve the same result in a much shorter timeframe.
Medical research
Analyzing large amounts of data, or big data, is a common requirement in research. You can apply batch processing in data analytics applications such as computational chemistry, clinical modeling, molecular dynamics, and genomic sequencing and analysis. For example, scientists use batch processing to capture better data for early-stage drug design and to gain a deeper understanding of the role of a particular biochemical process.
Digital media
Media and entertainment enterprises require highly scalable batch processing systems to automatically process data, such as files, graphics, and visual effects, for high-resolution video content. You can use batch processing to accelerate content creation, dynamically scale media packaging, and automate media workloads.
How does batch processing work?
While batch processing applications vary depending on the task, the basics of any batch job remain the same. To run a batch job, the user specifies the following details:
- Name of the person submitting the job
- Batch processes or programs that need to run
- System location of the data input
- System location for processed data output
- Time, or batch window, when the batch job should be run
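The details above map naturally onto a small job specification. The following is a sketch under stated assumptions: the `BatchJobSpec` type, its field names, and the `s3://example-bucket/...` locations are all hypothetical, chosen only to mirror the list. It also shows one subtlety of batch windows: an overnight window such as 22:00 to 06:00 wraps past midnight and has to be checked differently from a same-day window.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class BatchJobSpec:
    submitted_by: str      # name of the person submitting the job
    program: str           # batch process or program to run
    input_location: str    # where the input data lives
    output_location: str   # where processed output is written
    window_start: time     # start of the batch window
    window_end: time       # end of the batch window (exclusive)

    def in_window(self, now: datetime) -> bool:
        """Return True if `now` falls inside the batch window.

        Windows such as 22:00-06:00 wrap past midnight, so they are
        handled separately from same-day windows such as 01:00-05:00.
        """
        t = now.time()
        if self.window_start <= self.window_end:
            return self.window_start <= t < self.window_end
        return t >= self.window_start or t < self.window_end


job = BatchJobSpec(
    submitted_by="j.doe",
    program="nightly_report",
    input_location="s3://example-bucket/input/",
    output_location="s3://example-bucket/output/",
    window_start=time(22, 0),
    window_end=time(6, 0),
)
print(job.in_window(datetime(2024, 5, 1, 23, 30)))  # True
print(job.in_window(datetime(2024, 5, 1, 12, 0)))   # False
```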
The user also specifies the batch size, or the number of work units that the system needs to process in one complete batch operation. Some examples of batch size include:
- Number of batch file lines to read and store in the database
- Number of messages to read and process from a queue
- Number of transactions to sort and send to the next application
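Whatever the work unit is (file lines, queue messages, or transactions), the batch size boils down to splitting the input into groups of at most that many units. A minimal sketch of that splitting follows; the `batched` helper is written out here for illustration, although Python 3.12 and later ship a similar `itertools.batched` that yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `batch_size` work units."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    # Keep pulling up to batch_size items until the input is exhausted.
    while chunk := list(islice(it, batch_size)):
        yield chunk


# For example, lines of an input file, read 3 at a time before each database write.
lines = [f"record-{i}" for i in range(7)]
for batch in batched(lines, batch_size=3):
    print(len(batch), batch[0])
# 3 record-0
# 3 record-3
# 1 record-6
```

Because it works on any iterable, the same helper can consume a file object or a generator without loading all records into memory first.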
During the batch window, the batch processing system uses the batch size information to allocate the resources needed to run the batch job efficiently. Modern systems can run hundreds of thousands of batch jobs on premises or in the cloud.
Dependencies
Batch job tasks can run sequentially or simultaneously, and the sequence can differ depending on whether an earlier task completed successfully. A dependency can also be an external event, such as a customer placing an order in an online store or paying a bill, and such an event can be set up to start a job processing cycle.
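Task dependencies form a directed graph, so one way to sketch them is with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). This is an illustration, not the behavior of any specific batch product: the task names are hypothetical, each task returns `True` on success, and the rule "skip a task if any prerequisite did not succeed" is an assumption chosen to show success-dependent sequencing.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_with_dependencies(
    tasks: dict[str, Callable[[], bool]],
    depends_on: dict[str, set[str]],
) -> dict[str, str]:
    """Run tasks in dependency order; skip any task whose prerequisite did not succeed."""
    status: dict[str, str] = {}
    # TopologicalSorter takes a mapping of node -> its predecessors.
    graph = {name: depends_on.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        if name not in tasks:
            raise KeyError(f"unknown task: {name}")
        prereqs = depends_on.get(name, set())
        if any(status.get(p) != "succeeded" for p in prereqs):
            status[name] = "skipped"
            continue
        status[name] = "succeeded" if tasks[name]() else "failed"
    return status


tasks = {
    "extract": lambda: True,
    "transform": lambda: False,  # simulate a failure
    "load": lambda: True,
    "notify": lambda: True,
}
deps = {"transform": {"extract"}, "load": {"transform"}, "notify": {"extract"}}
result = run_with_dependencies(tasks, deps)
print(result)
```

Here `load` is skipped because `transform` failed, while `notify` still runs because it depends only on `extract`. `static_order` raises `graphlib.CycleError` if the dependencies contain a cycle.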
Cron commands
Cron is a time-based job scheduler on Unix-like operating systems, and a cron command (or cron job) is a batch job that it runs on a recurring schedule. You can set up recurrence patterns for batch jobs, for example, a job that invoices subscriptions at the end of every month.
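The month-end invoicing example is trickier than it looks, because a standard crontab entry (minute, hour, day of month, month, day of week) has no "last day of the month" value. A common workaround is a line like `0 23 28-31 * *` that runs on days 28 to 31 and lets the job decide whether today is really the last day. The sketch below shows the scheduling logic itself in Python; the 23:00 run time is an assumption.

```python
import calendar
from datetime import datetime


def next_month_end_run(now: datetime, hour: int = 23) -> datetime:
    """Return the next run time on the last day of a month at `hour`:00."""
    year, month = now.year, now.month
    while True:
        # monthrange returns (weekday of day 1, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        candidate = datetime(year, month, last_day, hour)
        if candidate > now:
            return candidate
        # Already past this month's run: move to the following month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


print(next_month_end_run(datetime(2024, 2, 10)))           # 2024-02-29 23:00:00
print(next_month_end_run(datetime(2024, 12, 31, 23, 30)))  # 2025-01-31 23:00:00
```

Leap years and 30-day months are handled by `calendar.monthrange`, so the scheduler never needs a hard-coded table of month lengths.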
How can you monitor batch processing?
While batch processing systems work with minimal input from personnel, they still need some oversight. To monitor batch processes, you can set up alerts, also called exceptions, that are sent when a batch job succeeds, fails, or finishes running.
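A minimal sketch of these three alert types wraps each job in Python's `try`/`except`/`else`/`finally`. The `notify` callback is a hypothetical stand-in for email, chat, or paging integrations, and the job names are illustrative only.

```python
from typing import Callable


def run_with_alerts(
    job_name: str,
    job: Callable[[], None],
    notify: Callable[[str], None],
) -> bool:
    """Run `job`, alert on success or failure, and always alert when it finishes."""
    try:
        job()
    except Exception as exc:  # report any failure instead of crashing the scheduler
        notify(f"{job_name} failed: {exc}")
        return False
    else:
        notify(f"{job_name} succeeded")
        return True
    finally:
        # Runs on both paths, after the success or failure alert.
        notify(f"{job_name} finished")


alerts: list[str] = []


def broken_job() -> None:
    raise RuntimeError("input file missing")


run_with_alerts("nightly_billing", broken_job, alerts.append)
print(alerts)
# ['nightly_billing failed: input file missing', 'nightly_billing finished']
```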
Monitors
Monitors in batch processes look for abnormalities, such as a job taking longer than expected to complete. In that case, the monitor can stop the next job from starting and notify the relevant staff of the exception.
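The duration check can be sketched as follows. To keep the example simple, it assumes each job has a hypothetical maximum expected duration and measures elapsed time only after the job returns; a production monitor would usually watch jobs while they run. When a job overruns, the sketch alerts and does not start the next job.

```python
import time
from typing import Callable


def run_chain_with_monitor(
    jobs: list[tuple[str, Callable[[], None], float]],
    notify: Callable[[str], None],
) -> list[str]:
    """Run jobs in order; if one exceeds its expected duration, alert and stop the chain.

    Each entry is (name, job, max_seconds). Returns the names of jobs that ran.
    """
    ran: list[str] = []
    for name, job, max_seconds in jobs:
        start = time.monotonic()  # monotonic clock is safe against wall-clock changes
        job()
        elapsed = time.monotonic() - start
        ran.append(name)
        if elapsed > max_seconds:
            notify(f"{name} took {elapsed:.2f}s (limit {max_seconds:.2f}s); halting chain")
            break  # do not start the next job
    return ran


alerts: list[str] = []
ran = run_chain_with_monitor(
    [
        ("extract", lambda: None, 1.0),
        ("transform", lambda: time.sleep(0.2), 0.05),  # slower than expected
        ("load", lambda: None, 1.0),
    ],
    alerts.append,
)
print(ran)  # ['extract', 'transform']
```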
Post-processing analysis
You can review the history of a batch job after it has been processed. Most batch processes write log files that record messages generated while the job was running.
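As an illustration of post-processing analysis, the sketch below uses Python's standard `logging` module to write a log file during a hypothetical job and then summarizes that log after the job has finished. The job logic (treating empty records as errors) and the log format are assumptions made for the example.

```python
import logging
import tempfile
from collections import Counter
from pathlib import Path


def run_logged_job(log_path: Path, records: list[str]) -> None:
    """Process records and write messages to a log file while the job runs."""
    logger = logging.getLogger("batch_job")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        for record in records:
            if record:
                logger.info("processed %s", record)
            else:
                logger.error("skipped empty record")
    finally:
        logger.removeHandler(handler)
        handler.close()


def summarize_log(log_path: Path) -> Counter:
    """Post-processing: count log lines by level after the job has finished."""
    levels: Counter = Counter()
    for line in log_path.read_text().splitlines():
        parts = line.split()
        # With the format above, the fields are: date, time, level, message...
        if len(parts) >= 3:
            levels[parts[2]] += 1
    return levels


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "job.log"
    run_logged_job(log_file, ["r1", "", "r2"])
    summary = summarize_log(log_file)
print(summary)  # Counter({'INFO': 2, 'ERROR': 1})
```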
What is the difference between batch processing and stream processing?
Whereas batch systems process large, finite volumes of data and requests in sequential order, stream processing continually analyzes data as it flows through a system or between devices. Stream processing handles data in real time and continually passes results on through the network. Because it must keep pace with large amounts of continuously arriving data, it can require more processing power.
When the size of the data is unknown or unbounded, stream processing can be preferable to batch processing. As a result, stream processing is commonly used for business functions such as cybersecurity, Internet of Things (IoT), personalized marketing services, and log monitoring.
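The difference shows up even in a toy aggregation. In this sketch, the batch function needs the complete, finite list before it can return anything, while the streaming generator emits an updated result for every event and therefore also works on unbounded input. The function names are illustrative only, and `itertools.repeat` stands in for an endless event stream.

```python
import itertools
from typing import Iterable, Iterator


def batch_total(amounts: list[float]) -> float:
    """Batch: the full, finite dataset is available before processing starts."""
    return sum(amounts)


def running_totals(amounts: Iterable[float]) -> Iterator[float]:
    """Stream: emit an updated result for every event as it arrives."""
    total = 0.0
    for amount in amounts:
        total += amount
        yield total


print(batch_total([10.0, 2.5, 7.5]))           # 20.0
print(list(running_totals([10.0, 2.5, 7.5])))  # [10.0, 12.5, 20.0]

endless = itertools.repeat(1.0)  # stands in for an unbounded event stream
print(list(itertools.islice(running_totals(endless), 3)))  # [1.0, 2.0, 3.0]
```

Calling `batch_total` on the endless stream would never return, which is the practical reason unbounded data favors the streaming style.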
Given their complementary capabilities, some enterprises have implemented a hybrid system that includes batch processing and stream processing in their daily operations.
How does AWS help with batch processing?
You can save up to 90% on fully managed batch processing with AWS Batch. AWS Batch dynamically provisions the optimal quantity and type of compute resources, such as CPU-optimized or memory-optimized instances, and eliminates the need to install and manage batch processing infrastructure. You can spend less time managing infrastructure and more time analyzing results and solving problems.
You can also run your batch workloads on Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances. Spot Instances use spare Amazon EC2 capacity, available at up to a 90% discount compared to On-Demand Instance prices. Spot Instances are well suited to batch processing applications because you can run hyperscale workloads at significant cost savings, or accelerate your workloads by running parallel tasks.
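To make this concrete, here is a minimal sketch of submitting a job to AWS Batch from Python with the boto3 `batch` client's `submit_job` call. It assumes you have already created a job queue (for example, one backed by a Spot compute environment) and a registered job definition; the names `example-spot-queue` and `example-report-def:1` are hypothetical placeholders. The request builder runs without AWS access, while `submit` requires boto3 and configured credentials.

```python
from typing import Any, Optional


def build_submit_job_request(
    job_name: str,
    job_queue: str,
    job_definition: str,
    array_size: Optional[int] = None,
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Build keyword arguments for the AWS Batch SubmitJob API."""
    request: dict[str, Any] = {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Retry failed attempts, e.g. when a Spot Instance is reclaimed mid-job.
        "retryStrategy": {"attempts": max_attempts},
    }
    if array_size is not None:
        if not 2 <= array_size <= 10_000:
            raise ValueError("AWS Batch array size must be between 2 and 10,000")
        # An array job fans one submission out into many parallel child jobs.
        request["arrayProperties"] = {"size": array_size}
    return request


def submit(request: dict[str, Any]) -> str:
    """Submit the job and return its ID (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("batch")
    response = client.submit_job(**request)
    return response["jobId"]


req = build_submit_job_request(
    "nightly-report", "example-spot-queue", "example-report-def:1", array_size=100
)
# job_id = submit(req)  # uncomment once the queue and job definition exist
```

Using an array job is one way to get the parallel-task speedup mentioned above: each child job receives its own index and can process a separate slice of the input.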