Saturday, May 17, 2014

Load Appnexus Country List using Spring Batch: Part 1

In this post we will discuss the details of loading the list of countries from Appnexus using Spring Batch.

Sequence of steps

The following diagram shows the sequence of steps involved in fetching the list of countries from Appnexus.


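[Sequence diagram between MyServer and Appnexus]

  MyServer -> Appnexus : Authentication Request
  Appnexus -> MyServer : Authentication Token
  (Authentication Complete)
  MyServer -> Appnexus : GET /country?start_element=0
  Appnexus -> MyServer : country_list, Page 1 (num_element=100, start_element=0, count=250)
  MyServer -> Appnexus : GET /country?start_element=100
  Appnexus -> MyServer : country_list, Page 2 (num_element=100, start_element=100, count=250)
  MyServer -> Appnexus : GET /country?start_element=200
  Appnexus -> MyServer : country_list, Page 3 (num_element=50, start_element=200, count=250)
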
Summary of steps

  1. Fetch the authentication token from Appnexus using the user name and password.
  2. Request the country endpoint with start element zero.
  3. Determine the remaining elements from the three values start element, num_elements and count.
  4. Request the subsequent elements using the same endpoint.
  5. Continue the process until all the pages are read.
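
The arithmetic behind step 3 can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this post, and it assumes the three values mean what the diagram suggests (start_element is the offset of the page, num_elements is the size of the page, count is the total).

```java
// Hypothetical helper, not part of Appnexus or Spring Batch.
class PagingMath {

    /** Offset to request next: the current offset plus the size of the page just read. */
    static int nextStartElement(int startElement, int numElements) {
        return startElement + numElements;
    }

    /** Number of elements still unread after the current page. */
    static int remaining(int startElement, int numElements, int count) {
        return Math.max(0, count - nextStartElement(startElement, numElements));
    }

    /** True while at least one more page has to be requested. */
    static boolean hasMore(int startElement, int numElements, int count) {
        return remaining(startElement, numElements, count) > 0;
    }

    public static void main(String[] args) {
        // The three responses from the diagram: {start_element, num_elements, count}.
        int[][] pages = { {0, 100, 250}, {100, 100, 250}, {200, 50, 250} };
        for (int[] p : pages) {
            System.out.println("start=" + p[0]
                    + " remaining=" + remaining(p[0], p[1], p[2])
                    + " hasMore=" + hasMore(p[0], p[1], p[2]));
        }
    }
}
```

For the values in the diagram, the first two pages report that more elements remain and the third page reports that nothing is left.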


Using a for loop

The above interaction could be achieved simply by using a for loop or a while loop. But the objective of this post is to present an alternate mechanism for achieving the same thing, along with its advantages. I have chosen Spring Batch for this purpose. In the rest of this post we will look at a brief introduction to Spring Batch followed by the advantages of choosing it.
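
For comparison, here is a minimal sketch of the plain-loop approach. The CountryClient interface is a hypothetical stand-in for the authenticated GET /country call, and the fake client only serves generated names from memory so that the example runs on its own. None of these names come from the Appnexus API.

```java
import java.util.ArrayList;
import java.util.List;

class CountryLoopSketch {

    /** One page of results plus the paging values from the response. */
    static final class CountryPage {
        final List<String> countries;
        final int startElement;
        final int numElements;
        final int count;

        CountryPage(List<String> countries, int startElement, int numElements, int count) {
            this.countries = countries;
            this.startElement = startElement;
            this.numElements = numElements;
            this.count = count;
        }
    }

    /** Hypothetical stand-in for the authenticated call to the country endpoint. */
    interface CountryClient {
        CountryPage fetch(int startElement);
    }

    /** Reads page after page until the paging values say nothing is left. */
    static List<String> loadAll(CountryClient client) {
        List<String> all = new ArrayList<>();
        int start = 0;
        while (true) {
            CountryPage page = client.fetch(start);
            all.addAll(page.countries);
            start = page.startElement + page.numElements;
            // Stop on an empty page as well, to avoid looping forever.
            if (page.numElements == 0 || start >= page.count) {
                break;
            }
        }
        return all;
    }

    /** In-memory fake that serves `total` generated names in pages of `pageSize`. */
    static CountryClient fakeClient(int total, int pageSize) {
        return start -> {
            int n = Math.max(0, Math.min(pageSize, total - start));
            List<String> names = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                names.add("country-" + (start + i));
            }
            return new CountryPage(names, start, n, total);
        };
    }

    public static void main(String[] args) {
        List<String> all = loadAll(fakeClient(250, 100));
        System.out.println("Loaded " + all.size() + " countries");
    }
}
```

The loop is short, but it keeps no record of what was loaded and cannot resume from the page where it failed.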

Introduction to Spring Batch 

Spring Batch is a framework designed to provide the necessary infrastructure for the execution of batch jobs. For more information about Spring Batch, refer to the Wikipedia entry or the official homepage.

Introduction to batch jobs 

For a job to be classified as a batch job it should possess a few characteristics. For instance,

  • it should be long-running
  • it should not need human interaction
  • most often it will be run on a periodic basis. 

These are just a few of the characteristics that are vital for a batch job. For more information on batch jobs and their characteristics, kindly refer to the Wikipedia entry on batch jobs.

Why batch infrastructure? 

As I suggested in one of the above paragraphs, the specific problem of loading the list of countries from Appnexus could be solved with just a for loop, so why should we use Spring Batch to do it? There are several disadvantages in doing it with plain Java.

  • No history of the jobs run. 
  • No restart capability. 
  • No protection against the job being run again. 
  • Retry and failure handling have to be implemented by ourselves. 
  • In case of complex jobs with multiple steps, the flow has to be managed manually. 

Spring Batch provides a lot of functionality, such as a history of jobs, restart capability, protection against unwanted repetition of jobs, error handling and custom step flow. Getting into more detail about Spring Batch is not the purpose of this post. Kindly refer to the official documentation for more information.

Spring Batch Terminology

Job

A Job is an entity that encapsulates an entire batch process.

Step

A Step is a domain object that encapsulates an independent, sequential phase of a batch job. Therefore, every Job is composed entirely of one or more steps. A Step contains all of the information necessary to define and control the actual batch processing.

Tasklet

The Tasklet is a simple interface that has one method, execute, which will be called repeatedly by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure.

All the above definitions are taken from the official documentation. Kindly refer to it for more information on Spring Batch architecture and terminology.

Required Components

Going by the above definitions, we could adapt the sequence shown in the above diagram as follows:

Appnexus country load Job - Overall flow of loading all the countries from Appnexus.

Appnexus country load Step - As you can see, the job makes three calls to the Appnexus country endpoint, which are naturally the steps involved.

To summarise, the Appnexus country load has one job with three steps, but all the steps are identical except for the start element. This leaves us with a single job with a single step that is called repeatedly until all the pages are read.

Spring Batch Configuration

Please find below the configuration file for the job we just discussed: country-load-job.xml

<batch:job id="countryLoadJob">
 <batch:step id="step1">
  <batch:tasklet ref="countryLoadTasklet" />
  <batch:next on="CONTINUE*" to="step1" />
  <batch:end on="END" />
  <batch:listeners>
   <batch:listener ref="pagingAwareStepExecutionListener" />
  </batch:listeners>
 </batch:step>
</batch:job>

As you can see from the XML, the configuration is simple. There is only one step, "step1", which is called repeatedly as long as the exit status is "CONTINUE"; the job is terminated once the status is "END". Kindly pay attention to the listener "pagingAwareStepExecutionListener", which determines the status of the step depending upon the start element, the number of elements and the total number of elements.
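
To make the role of the two status values concrete, here is a rough sketch of the decision such a listener has to make. It is written as plain Java with no Spring dependency, and the class and method names are assumptions for illustration only; it is not the actual pagingAwareStepExecutionListener. In a real Spring Batch listener, the returned string would typically be wrapped in an ExitStatus and returned from the afterStep method of a StepExecutionListener.

```java
// Hypothetical sketch of the paging decision, independent of Spring Batch.
class PagingExitStatusSketch {

    static final String CONTINUE = "CONTINUE"; // matched by on="CONTINUE*" in the XML
    static final String END = "END";           // matched by on="END" in the XML

    /** Picks the exit code for the step from the paging values of the last response. */
    static String exitCodeFor(int startElement, int numElements, int count) {
        // An empty page is treated as the end so the step cannot loop forever.
        boolean morePages = numElements > 0 && startElement + numElements < count;
        return morePages ? CONTINUE : END;
    }

    public static void main(String[] args) {
        // The three responses from the sequence diagram.
        System.out.println(exitCodeFor(0, 100, 250));
        System.out.println(exitCodeFor(100, 100, 250));
        System.out.println(exitCodeFor(200, 50, 250));
    }
}
```

With the values from the diagram, the first two pages send the job back to "step1" and the third page ends it.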

This brings us to the end of this post on loading Appnexus countries using Spring Batch. In the following post we shall get into the details of the implementation of countryLoadTasklet and pagingAwareStepExecutionListener.

Kindly post a comment if you either like it or if you feel the content is inappropriate in some way.
