Monday, May 19, 2014

Loading Appnexus Reports using Spring Batch

Introduction to Appnexus reporting API

Unlike most APIs, which follow a simple request-response model, the Appnexus reporting API does not let you download a report directly. The report has to be scheduled first and fetched once it is ready. The sequence of steps is as follows:

  • Step 1. Create a JSON-formatted report request
  • Step 2. POST the request to the Report Service
  • Step 3. GET the report status from the Report Service
  • Step 4. GET the report data from the Report Download Service

The following sequence diagram shows the steps involved in requesting and downloading a report from Appnexus.

[Sequence diagram: MyServer sends "I need a report (spec)" to Appnexus and receives a report_id. MyServer then asks "Is report ready?" repeatedly, getting "nope!" until Appnexus answers "yes". Finally MyServer calls /download?report_id=xx and receives the CSV file.]

Modelling the above sequence using Spring Batch

Configuration XML

The following listing shows a sample configuration file for creating the batch job.


<batch:job id="reportLoadJob">
 <batch:step id="reportStep1" next="reportStep2">
  <batch:tasklet ref="reportRequestTasklet" />
 </batch:step>
 <batch:step id="reportStep2" next="reportStep3">
  <batch:tasklet ref="reportStatusCheckTasklet" />
 </batch:step>
 <batch:step id="reportStep3" next="reportStep4">
  <batch:tasklet ref="reportDownloadTasklet" />
 </batch:step>
 <batch:step id="reportStep4">
  <batch:tasklet>
   <batch:chunk reader="cvsFileItemReader"
    writer="mysqlItemWriter" commit-interval="2">
   </batch:chunk>
  </batch:tasklet>
 </batch:step>
</batch:job>


Step 1: Report Request Tasklet

This Tasklet calls the Appnexus reporting API with the required parameters and receives the ID of the report that Appnexus will eventually generate.
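To make the request step more concrete, here is a minimal sketch in plain Java of the two things this Tasklet has to do: build the JSON body and pull the report ID out of the response. The JSON field names (report_type, columns, report_interval, format), the response shape and the class name ReportRequestSketch are assumptions for illustration only, so check them against the Appnexus documentation before relying on them. The HTTP POST itself is left out.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReportRequestSketch {

    // Builds the JSON body of a report request.
    // All field names here are assumed for illustration, not taken from the post.
    public static String buildRequestJson(String reportType, List<String> columns, String interval) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"report\":{");
        sb.append("\"report_type\":\"").append(reportType).append("\",");
        sb.append("\"columns\":[");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(columns.get(i)).append('"');
        }
        sb.append("],");
        sb.append("\"report_interval\":\"").append(interval).append("\",");
        sb.append("\"format\":\"csv\"");
        sb.append("}}");
        return sb.toString();
    }

    // Extracts the report ID from a response body.
    // Assumes the response carries a string field named "report_id".
    public static String extractReportId(String responseJson) {
        Matcher m = Pattern.compile("\"report_id\"\\s*:\\s*\"([^\"]+)\"").matcher(responseJson);
        if (!m.find()) {
            throw new IllegalStateException("No report_id in response: " + responseJson);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        String body = buildRequestJson("network_analytics", List.of("day", "imps", "clicks"), "yesterday");
        System.out.println(body);

        // Made-up response, only to exercise the extraction.
        String fakeResponse = "{\"response\":{\"status\":\"OK\",\"report_id\":\"abc123\"}}";
        System.out.println(extractReportId(fakeResponse));
    }
}
```

In the real Tasklet the extracted ID would be saved in the job execution context so that the status check step can pick it up. A proper JSON library would be a safer choice than string building and a regular expression once the values can contain quotes or other special characters.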


Step 2: Report Status Check Tasklet

This Tasklet calls the Appnexus reporting API with the report ID fetched in the previous step to check whether the requested report has been generated. If it has not, the step is repeated after a wait interval until the report is ready. Once the report is ready, you will receive a response with status okay, along with the download URL, which the subsequent step uses to download the report.
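The repeat-until-ready behaviour can be sketched in plain Java as below. In Spring Batch itself, a Tasklet that returns RepeatStatus.CONTINUABLE is invoked again and one that returns RepeatStatus.FINISHED ends the step; the Repeat enum here only mirrors those two outcomes so the sketch runs without the framework. The status string "ready", the attempt limit and the class name StatusPollSketch are assumptions for illustration.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

class StatusPollSketch {

    // Stand-in for the two outcomes a Spring Batch Tasklet can return.
    enum Repeat { CONTINUABLE, FINISHED }

    // One status check: finished only when the (assumed) status string is "ready".
    public static Repeat checkOnce(Supplier<String> statusCall) {
        return "ready".equals(statusCall.get()) ? Repeat.FINISHED : Repeat.CONTINUABLE;
    }

    // Repeats the check, sleeping between attempts, and returns how many checks were made.
    // Gives up with an exception after maxAttempts so the job cannot loop forever.
    public static int pollUntilReady(Supplier<String> statusCall, long waitMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (checkOnce(statusCall) == Repeat.FINISHED) {
                return attempt;
            }
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the report", e);
            }
        }
        throw new IllegalStateException("Report not ready after " + maxAttempts + " checks");
    }

    public static void main(String[] args) {
        // Fake status endpoint: answers "pending" twice, then "ready".
        Iterator<String> answers = List.of("pending", "pending", "ready").iterator();
        int checks = pollUntilReady(answers::next, 10, 5);
        System.out.println("Report ready after " + checks + " checks");
    }
}
```

The attempt limit is a design choice that the post does not mention. Without some limit, a report that never becomes ready would keep the job running indefinitely.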


Step 3: Report Download Tasklet

In this Tasklet, the download URL received in the previous step is used to download the generated CSV file. The received file is stored in the local file system and its name is saved in the job execution context, so that the next step can read the file name from it.
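A minimal sketch of the store-and-record part of this step follows, again in plain Java. A Map stands in for the Spring Batch job ExecutionContext, and an in-memory stream stands in for the HTTP response body. The context key report.file.name, the file naming scheme and the class name ReportDownloadSketch are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

class ReportDownloadSketch {

    // Hypothetical context key; the reader in the next step must use the same one.
    public static final String FILE_NAME_KEY = "report.file.name";

    // Copies the downloaded bytes to a local file and records its name in the context.
    public static Path saveReport(InputStream body, Path targetDir, String reportId,
                                  Map<String, Object> jobContext) {
        try {
            Files.createDirectories(targetDir);
            Path target = targetDir.resolve("report-" + reportId + ".csv");
            Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
            jobContext.put(FILE_NAME_KEY, target.toString());
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up CSV content in place of the real download.
        InputStream fakeBody = new ByteArrayInputStream(
                "day,imps,clicks\n2014-05-18,1000,12\n".getBytes(StandardCharsets.UTF_8));
        Map<String, Object> jobContext = new HashMap<>();
        Path dir = Files.createTempDirectory("reports");
        saveReport(fakeBody, dir, "abc123", jobContext);
        System.out.println("Saved to " + jobContext.get(FILE_NAME_KEY));
    }
}
```

Inside a real Tasklet the context would be reached through the ChunkContext passed to execute, via the step execution and its job execution, rather than through a plain Map.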


Step 4: Report Persist Tasklet

This step is responsible for reading the data from the CSV file and persisting it into the database. To do this, an entity that represents each line of the file has to be created first. We can then use chunk-oriented processing to save the data in the database. This Tasklet is different from the other Tasklets we have seen so far, because here the Tasklet consists of a reader, a writer and a commit interval.

The reader is responsible for reading the CSV file one line at a time. The items are passed on to the writer in chunks whose size is specified by the commit interval, and the writer then saves the items to the database.
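The reader, writer and commit interval fit together roughly as in the plain Java sketch below. It is not how Spring Batch implements chunking internally; it only illustrates the idea of reading one line at a time and handing the writer a list each time the commit interval is reached. The ReportRow entity, its three columns and the header line are assumptions, since the post does not show the report layout. In the real job the reader and writer beans would typically be Spring Batch classes such as FlatFileItemReader and JdbcBatchItemWriter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ChunkProcessingSketch {

    // Hypothetical entity representing one line of the CSV file.
    static final class ReportRow {
        final String day;
        final long imps;
        final long clicks;

        ReportRow(String day, long imps, long clicks) {
            this.day = day;
            this.imps = imps;
            this.clicks = clicks;
        }
    }

    // Maps one CSV line (assumed layout: day,imps,clicks) to an entity.
    public static ReportRow parse(String line) {
        String[] fields = line.split(",");
        return new ReportRow(fields[0], Long.parseLong(fields[1]), Long.parseLong(fields[2]));
    }

    // Reads line by line and calls the writer once per full chunk, plus once for any
    // leftover items at the end. Returns the number of chunks handed to the writer.
    public static int process(BufferedReader reader, int commitInterval,
                              Consumer<List<ReportRow>> writer) {
        try {
            reader.readLine(); // skip the (assumed) header line
            List<ReportRow> chunk = new ArrayList<>();
            int chunks = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                chunk.add(parse(line));
                if (chunk.size() == commitInterval) {
                    writer.accept(new ArrayList<>(chunk));
                    chunk.clear();
                    chunks++;
                }
            }
            if (!chunk.isEmpty()) {
                writer.accept(new ArrayList<>(chunk));
                chunks++;
            }
            return chunks;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "day,imps,clicks\n2014-05-16,100,1\n2014-05-17,200,2\n2014-05-18,300,3\n";
        // The writer only prints here; a real one would insert the rows into the database.
        int chunks = process(new BufferedReader(new StringReader(csv)), 2,
                rows -> System.out.println("writing " + rows.size() + " rows"));
        System.out.println(chunks + " chunks written");
    }
}
```

With a commit interval of 2, as in the configuration above, three data lines are written as one chunk of two rows followed by one chunk of one row. A larger interval means fewer database round trips, at the cost of more work being redone if a chunk fails.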

Thus it is quite easy to model a multistep job using Spring Batch. Kindly post a comment if you like it or if you feel the content is inappropriate in some way. If you need help modelling any of your jobs using Spring Batch, kindly post a comment, and it will be taken up as a topic for a subsequent post.
