Thursday, March 13, 2014

Hibernate optimizations beyond JPA

Persistence-layer optimizations that are not covered by the JPA standard

The JPA 2 standard brought much better support for storing and referencing data in collections.
Entities can now reference collections of basic types (primitives, wrappers, Strings) and embeddables.
The annotation supporting this is @ElementCollection.
With it, a separate table is created to store the basic types.
The mapping can be specified further with @CollectionTable and @Column.

For a small number of values you would define the class containing them as @Embeddable.

If the number of values is large or very large, you would keep the default behavior and let the persistence provider lazy-load the values as they are needed.

More thought has to be put into real relationships between entities.
If you don't pay attention, you can very easily run into the famous N+1 SELECT problem.
If an entity has a one-to-many relationship to another entity and no specific loading strategy is defined, the default lazy loading causes each access to the related collection of another parent entity to trigger a separate load.
Common use case (sketched in code below):
1) N entities are loaded from a persistence service
2) The code loops over this result set of N entities and applies certain business logic to it
3) In doing so, the entity graph is used and the code navigates the one-to-many relationship
4) With the lazy-loading configuration of this relationship, every item in the loop triggers another fetch of the one-to-many relationship; the persistence provider generates an additional SELECT each time
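A minimal code sketch of this pattern (entity and method names are hypothetical, em is a JPA EntityManager):

List<Order> orders = em.createQuery("SELECT o FROM Order o", Order.class)
                       .getResultList();       // 1 SELECT loads N orders

for (Order order : orders) {
    for (OrderItem item : order.getItems()) {  // lazy collection: 1 extra SELECT per order
        applyBusinessLogic(item);              // N+1 SELECTs in total
    }
}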

What to do in order to optimize this situation?
  • usage of @BatchSize  (non-JPA compliant)
  • usage of SUBSELECT-FETCH  (non-JPA compliant)
  • own criteria/JPQL for joining both entities
  • ...
But with the first two options we are already outside of the JPA standard.
@BatchSize eases the problem: the number of SELECTs is reduced from N+1 to roughly N/batchSize + 1.

@Entity
public class A {

    @OneToMany
    @BatchSize(size = 5)  // Hibernate-specific annotation
    public Set<B> getBs() {
        ..
    }
}

The second option, subselect fetching, loads the complete collections of all loaded A entities with one additional SELECT as soon as one of the collections is accessed.
@Entity
public class A {

    @OneToMany
    @Fetch(FetchMode.SUBSELECT)
    public Set<B> getBs() {
        ..
    }
}

But this again introduces a Hibernate dependency, and Hibernate supports it on collections, i.e. on XToMany relationships.

That leaves the third option, a custom criteria/JPQL query joining both entities, with the danger of retrieving too much data.
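A minimal JPQL sketch of such a fetch join (entity names A and B as above, the attribute name bs is hypothetical):

List<A> result = em.createQuery(
        "SELECT DISTINCT a FROM A a JOIN FETCH a.bs", A.class)
    .getResultList();

The JOIN FETCH loads both entities with one SELECT; the DISTINCT removes the duplicates the join would otherwise produce. The price is that the whole object graph is materialized at once.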

Thursday, February 27, 2014

JPA2 new features

What features came with JPA2?

JPA 2 was delivered with Java EE 6.
JPA 2.1 shipped with Java EE 7 and is currently the latest version that can be used.
Features:
  1. configuration properties have been standardized
  2. support for cache solutions
  3. better and finer-grained locking support
  4. enhancements of JPQL
  5. support of the validation API
1. In the first JPA version the properties in the XML configuration were proprietary, so the property names differed between JPA providers:
in Hibernate the URL of the datasource was named "hibernate.connection.url", in TopLink it was named "toplink.jdbc.url". Now the common properties have been standardized:
<property name="javax.persistence.jdbc.driver" value="XXX"/>
<property name="javax.persistence.jdbc.url" value="XXX"/>
<property name="javax.persistence.jdbc.user" value="XXX"/>
<property name="javax.persistence.jdbc.password" value="XXX"/>


2) Cache support allows the main operations like
  • check whether an entity is in the cache: boolean contains(Class cls, Object primaryKey)
  • remove an entity from the cache: evict(Class cls, Object primaryKey)
  • remove all entities of a type: evict(Class cls)
  • clear the cache: evictAll()
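A short usage sketch of the JPA 2 Cache API (entity name and id are hypothetical):

Cache cache = entityManagerFactory.getCache();

if (cache.contains(Customer.class, customerId)) {  // is this entity cached?
    cache.evict(Customer.class, customerId);       // remove a single entity
}
cache.evict(Customer.class);                       // remove all entities of the type
cache.evictAll();                                  // clear the whole second-level cache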

3) Better support for lock modes
  • OPTIMISTIC
  • OPTIMISTIC_FORCE_INCREMENT
  • PESSIMISTIC_READ
  • PESSIMISTIC_WRITE
  • PESSIMISTIC_FORCE_INCREMENT
In the retrieval API of the entity manager you can specify one of the above-mentioned modes, or lock the entity after obtaining it:
      EntityClassX entity = em.find(EntityClassX.class, id, LockModeType.PESSIMISTIC_WRITE);
vs.
      em.lock(entity, LockModeType.PESSIMISTIC_WRITE);

Of course it is also possible to read the entity without a pessimistic lock mode, apply business logic to it, and then obtain the lock towards the end of the business transaction:
      em.refresh(entity, LockModeType.PESSIMISTIC_WRITE);



4) Enhancements of JPQL
  • date and time literals like {d '2014-02-27'} or {t '14:00:00'}
  • member support: FROM ORDER O WHERE 'RECURRING_INVOICES' MEMBER OF O.TYPES
  • comparing collections to empty: FROM ORDER O WHERE O.ORDERITEMS IS EMPTY
  • index support (retrieving items by their position in an ordered list collection): WHERE INDEX(t) BETWEEN x AND y
  • ...

5) Validation
The validation support in JPA 2 is based on the JSR 303 specification, whose reference implementation is Hibernate Validator.
Important to mention is that JPA 2 does not mandate a specific bean validation implementation.
So each JPA provider can ship its own bean validation support; with Hibernate as the JPA provider, Hibernate Validator is used.
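A minimal sketch of JSR 303 constraints on an entity (field names are hypothetical); depending on the configured validation mode, the provider validates on persist and update:

@Entity
public class Customer {

    @NotNull
    @Size(min = 1, max = 80)  // name must be present and at most 80 characters
    private String name;

    @Min(18)                  // only adult customers in this example
    private int age;
}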




Saturday, February 22, 2014

Spring batch - special aspects of batch processing

What is Spring Batch for?

Spring Batch is a framework specifically designed for batch processing.
It is intended for processing large amounts of data, e.g. from files, and provides a clear DSL based on XML.
The framework comes with abstractions and defaults that offer "extension points" where business or processing logic can be placed.

Why should I use Spring Batch?

It is regarded as the standard framework for batch processing. A lot of developers know how to use it.
It provides useful abstractions and can be configured in many regards to support more advanced requirements:
  • transaction support
  • retry support
  • skip functionality
  • perfectly integrated into the Spring world (DI etc.)
  • strong layered architecture
  • very scalable due to support of step partitioning, multi-threaded steps, …

Basic concept

In Spring Batch the processing starts with a Spring Batch job, which consists of steps.
A step can be chunk-oriented or a so-called TaskletStep, the latter mainly intended for supporting legacy code.
The main components of spring batch are: 
  • ItemReader - reads one item
  • ItemProcessor - processes one item (optional)
  • ItemWriter - writes a list of items
Besides that, there are different types of listeners for placing business logic:
  • job/step execution listener
  • chunk listener
  • ItemRead/Process/Write- Listener
  • SkipListener
The transaction boundary can never be around steps or a complete job.
Metadata like execution start/end time, the number of commits/rollbacks, the step status etc. are saved at several points:
  • step execution context - a map that is used for serializing data
  • chunk execution context - used inside a chunk transaction to know the current item in process
The default rollback behavior: if an uncaught exception occurs during the processing of a chunk, the current chunk transaction is rolled back.
All chunks committed until then stay committed, but the complete job fails.

The metadata of a step is initialized at the beginning of the step and updated at its end. This happens in separate transactions: the step status has to be updated in its own transaction, because the processing of the step itself can fail and must then be rolled back.

A Spring Batch job consists of steps, as we know now, and these steps consist of chunks. Each chunk is executed in its own transaction.
How does Spring Batch know how much data has to be read into a chunk?
This is specified by a policy: the CompletionPolicy.
Specifying the commit-interval attribute on the chunk tag leads to a SimpleCompletionPolicy.
As soon as the number of read items satisfies the completion policy, the read and processed items are passed to the ItemWriter.
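A minimal sketch of a chunk-oriented step in the XML DSL (job, step and bean ids are hypothetical); commit-interval="10" results in a SimpleCompletionPolicy of size 10:

<batch:job id="invoiceJob">
    <batch:step id="processInvoices">
        <batch:tasklet>
            <batch:chunk reader="invoiceReader"
                         processor="invoiceProcessor"
                         writer="invoiceWriter"
                         commit-interval="10"/>
        </batch:tasklet>
    </batch:step>
</batch:job>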

Restart of a job

How is a restart of a job done?
Well, if the data is not retrieved from a file but from database tables or a messaging system, the retrieval is declared on a non-transactional datasource, because a rollback after a processing error would otherwise close the data retrieval channel as well.
No other job, component or module should operate on this non-transactional datasource. The data is normally read using a database cursor in order to avoid memory issues; Spring Batch provides the JdbcCursorItemReader for this.
Whether a run is a restart is determined by Spring Batch itself: if a job is started with the same job parameters and the previous execution ended with a failure, Spring Batch treats this as a restart of the job.
To accomplish that, the state is saved in the execution context of the chunk.
Every reader persists the counter of read items inside the transaction of the chunk, and every chunk commits its work inside its own transaction at the end of its work. That means: if the completion policy is set to 10, the chunk tries to commit its work after the 10th item. On a job restart, processing resumes from the execution point of the last successful chunk, and the uncommitted items are processed again.
The execution context is seen by all readers, which means the state of the counter can be modified by different readers. This setup is not thread-safe!

Ordering

For the restartability of a job, the ordering of the read data must be well-defined.
Whether the data comes from a database or from a file, the ordering of the data retrieval must be set explicitly, so that a restart of the job processes the remaining items in the same order as the first run.
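A sketch of a cursor-based reader with explicit ordering (table, columns and bean ids are hypothetical); the ORDER BY makes the read sequence deterministic across restarts:

<bean id="invoiceReader"
      class="org.springframework.batch.item.database.JdbcCursorItemReader">
    <property name="dataSource" ref="nonTransactionalDataSource"/>
    <!-- explicit ORDER BY so a restart reads the items in the same order -->
    <property name="sql" value="SELECT id, amount FROM invoice ORDER BY id"/>
    <property name="rowMapper" ref="invoiceRowMapper"/>
</bean>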




Friday, February 21, 2014

What is the impact of the ESB on the work of a requirement analyst?

  • The work gets easier
  • It helps the requirements analysis
  • It moves the focus to the business events
  • The Volère model



Why?

Let's look at the starting point of the requirements analysis:

Everything begins with the system idea document.
The next step on the list: the stakeholder matrix.
This leads to the business context diagram:
  It mainly covers the actors, parties and processes
  of the business process that is to be supported or automated.
  It should discover all business events coming to the work.

The next starting point for requirements trawling:
Start from the business events.
Every business event is answered by a business use case of the work.
The requirements analyst should analyse each business use case with use case templates,
then talk to the solution designers and architects.


What is easier?

Business events are coming to the bus!
The BA should start the work here.



Wednesday, February 19, 2014

Benefits and usage of spring data JPA

What are the benefits of using spring data JPA?

Spring Data JPA addresses the following situations:
  • it is unclear how the persistence layer will develop
    • because of time and focus, the first prototype starts with a map; later it will probably be replaced with a long-term persistence solution
  • the persistence layer might change from relational to NoSQL or vice versa
  • for the sake of a fast-running unit-test suite, the persistence layer might be configured to use a light-weight store like a simple map

To stay really open regarding the persistence layer, the domain layer should be separated from the data access layer. For this, an approach like the repository pattern described by Martin Fowler is common practice.
The repository enforces treating the objects of a type as a "conceptual set", like a collection.
With a simple DAO approach you see the DAO as a gateway for accessing the database.
Such DAOs tend to grow extensively as new querying or update functionality is needed,
which leads to blurred responsibilities. With a repository you treat all the objects as a conceptual set.
For querying and update extensions the repository makes use of DAOs.
So the DAOs stay well-focused and have a single responsibility for reading and changing data.
The set of objects of a type is handled in the repository.
At the beginning of your development you can use a simple in-memory store such as a map
in order to focus on the domain logic. Later you can delegate storage and access to sophisticated DAOs.
The business logic in the domain layer is thus developed against the domain interfaces exposed by the repository interfaces.

The repository layer is placed on top of the JPA entities.
The repository interfaces live next to the domain objects and always present to the outside the domain interfaces used by the domain layer. These repository interfaces provide basic CRUD functionality.

Example:
domain layer: Customer implements ICustomer
repository layer: CustomerRepository delivers ICustomer
persistence layer: CustomerRepositoryImpl implements the CustomerRepository

The CustomerRepositoryImpl could also make further use of DAOs to access the objects.
It uses the JPA EntityManager and defines the transactional context:

public class CustomerRepositoryImpl implements CustomerRepository {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public ICustomer save(ICustomer customer) {
        Customer c = new Customer(customer);
        entityManager.persist(c);
        return c;
    }
}

Usage of spring data jpa

Spring Data JPA has the objective of simplifying the development of the repository layer mentioned above, as this code is boilerplate code. With Spring Data JPA you only have to define the interface, an implementation for delegation, and the corresponding Spring configuration.
The rest is instantiated and delivered by Spring.
So first of all we have to define the repository interface:

public interface CustomerJpaRepository extends JpaRepository<Customer, Long> {
    Customer save(Customer customer);
}

Second, we need the Spring configuration:
<jpa:repositories base-package="…">
   <jpa:repository id="customerJpaRepository" />
</jpa:repositories>

Unfortunately Spring Data JPA cannot operate directly on the domain interface; it always needs the specific JPA entity, as in the CustomerJpaRepository defined above.
Therefore we need to implement the CustomerJpaRepository interface ourselves.
We inject the dynamically provided instance of CustomerJpaRepository and delegate all operations to it:

public class CustomerJpaRepositoryImpl implements CustomerJpaRepository {

    @Autowired
    private CustomerJpaRepository repo;

    public Customer save(Customer customer) {
        return repo.save(customer);
    }
}

In the background Spring Data JPA dynamically provides an instance of the interface and makes it available under the id customerJpaRepository.


Advantages of spring data jpa

Spring Data JPA provides finder methods out of the box.
Based on naming conventions, a findByX method is provided by Spring Data JPA dynamically and returns all entities whose field X matches the corresponding parameter value.
Besides that there are other useful features like paging and sorting; a short sketch follows below.
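A short sketch of both features (assuming the Customer entity has a lastName field):

public interface CustomerJpaRepository extends JpaRepository<Customer, Long> {

    // derived query: returns all customers whose lastName equals the parameter
    List<Customer> findByLastName(String lastName);

    // paging and sorting come for free via the Pageable parameter
    Page<Customer> findByLastName(String lastName, Pageable pageable);
}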





Friday, February 14, 2014

Requirements analysis: Business context diagram

Benefit of a business context diagram?

Defines the scope of the work we have to study.
It shows the work as a single, as-yet-uninvestigated process.
It is surrounded by adjacent systems and actors.
Arrows show the data flows between the work and the adjacent systems → carried as business events.

The business context diagram shows where the responsibilities of the adjacent systems start and end.

The data flow makes it clear what work has to be done by the adjacent systems and what has to be done by the work itself.
Business use cases are preplanned and are activated as soon as an actor initiates a business event.


Why do business events and business use cases help?

  • They are a way to partition the work in a non-subjective way, by identifying responses to outside stimuli.
  • They give a clear view of the needed functionality.
  • Internal partitions are mainly the result of technologies, design and history.
  • Business events point out what belongs together.
  • They are the perfect vehicle for further requirements analysis work!






Benefits of an ESB

The source application / service does not have to be changed

Business events that happen are published by the source application to the bus.
Typically there is an application interested in such an event, so it consumes it, just as the processing would have been done with a point-to-point communication.
→ If business requirements change and the enterprise landscape grows, more and more applications might become interested in these kinds of events. The producing application and the already existing consumers do not have to be changed at all.

Easy integration through a DSL

Additional systems that are interested in an event simply subscribe to the topic; alternatively, a dispatcher application consumes special kinds of messages from the bus and dispatches requests to the corresponding systems.
Integration scenarios are typically configured / implemented with the help of a DSL, leading to better maintainability and faster realization.

Further benefits


  • Heterogeneous environments become interoperable
  • Service virtualization aka location transparency
  • Service versioning
  • Transaction-based message processing