Thursday, March 13, 2014

Hibernate optimizations beyond JPA

Optimizations on the persistence layer not supported by JPA

With the JPA 2 standard came solid support for storing and referencing data in collections.
Entities can now reference collections of basic types, i.e. primitives and their wrappers, Strings, etc.
The annotation supporting this is @ElementCollection.
With it a separate table is created to store the basic-type values.
The mapping can be specified further with @CollectionTable and @Column.

For a small number of values you would define the class containing the basic types as @Embeddable.

If the number of values is large or very large, you would leave the default behavior and let the persistence provider lazily load the values as they are needed.
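
A minimal sketch of such a mapping (the Book/tags names are hypothetical and only serve as illustration; lazy loading is the default for element collections):

import javax.persistence.*;
import java.util.HashSet;
import java.util.Set;

@Entity
public class Book {

    @Id
    private Long id;

    // stored in a separate collection table, loaded lazily by default
    @ElementCollection
    @CollectionTable(name = "BOOK_TAG", joinColumns = @JoinColumn(name = "BOOK_ID"))
    @Column(name = "TAG")
    private Set<String> tags = new HashSet<String>();
}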

More thought has to be put into real relationships between entities.
If you don't pay attention you can very easily run into the famous N+1 SELECT problem.
If an entity has a one-to-many relationship to another entity and no specific loading strategy is defined, the default lazy loading leads to the fact that every access to the related entities loads them separately.
Common use case (illustrated in the sketch after this list):
1) N entities are loaded from a persistence service
2) The code loops over this result set of N entities and applies certain business logic
3) In doing so the entity graph is used and the code navigates the one-to-many relationship
4) With the lazy-loading configuration of this relationship, every item in the loop triggers another fetch of the one-to-many relationship; the persistence provider generates an additional SELECT.
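
A minimal sketch of this pattern, using the entities A and B from the snippets further below (em is an EntityManager, applyBusinessLogic is a hypothetical method):

List<A> as = em.createQuery("select a from A a", A.class).getResultList(); // 1 SELECT
for (A a : as) {
    // first access to the lazy collection triggers one additional SELECT per A
    applyBusinessLogic(a.getBs());
}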

What can be done to optimize this situation?
  • usage of @BatchSize (not JPA-compliant)
  • usage of subselect fetching (not JPA-compliant)
  • own criteria/JPQL query joining both entities
  • ...
But with the first two options we are already outside of the JPA standard.
@BatchSize eases the problem: instead of N+1 SELECTs, the collections are fetched in batches, reducing the number of queries to roughly N/batchSize + 1.

@Entity
public class A {

    @OneToMany
    @BatchSize(size = 5)   // org.hibernate.annotations.BatchSize
    public Set<B> getBs() {
        ...
    }
}

The second option loads the complete collections with a single subselect query once they are first accessed after entity A has been loaded.
@Entity
public class A {

    @OneToMany
    @Fetch(FetchMode.SUBSELECT)   // org.hibernate.annotations.Fetch
    public Set<B> getBs() {
        ...
    }
}

But this again introduces a Hibernate dependency, and Hibernate supports it only on collections, i.e. on the XToMany relationships.

That leaves the third option, which carries the danger of retrieving too much data.
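
For illustration, a minimal JPQL fetch-join sketch for the A/B entities above (em is an EntityManager); the fetch join loads the As together with their Bs in one SELECT, so no further queries occur in the loop:

List<A> as = em.createQuery(
        "select distinct a from A a join fetch a.bs", A.class)
    .getResultList();

The distinct avoids duplicate A instances produced by the join.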

Thursday, February 27, 2014

JPA2 new features

What features came with JPA2?

JPA 2.0 was delivered with Java EE 6.
JPA 2.1 was shipped with Java EE 7 and is currently the latest version that can be used.
Features:
  1. configuration properties have been standardized
  2. support for using cache solutions
  3. better and finer-grained locking support
  4. enhancements of JPQL
  5. support of the validation API
1. In the first JPA version the properties in the XML configuration were proprietary, so the property names differed for each JPA provider:
in Hibernate the URL to the datasource was named "hibernate.connection.url", in TopLink it was named "toplink.jdbc.url". Now the common properties have been standardized:
<property name="javax.persistence.jdbc.driver" value="XXX"/>
<property name="javax.persistence.jdbc.url" value="XXX"/>
<property name="javax.persistence.jdbc.user" value="XXX"/>
<property name="javax.persistence.jdbc.password" value="XXX"/>


2) Cache support allows the main operations:
  • check whether an entity exists in the cache: boolean contains(Class cls, Object primaryKey)
  • remove an entity from the cache: evict(Class cls, Object primaryKey)
  • remove all entities of a type: evict(Class cls)
  • clear the cache: evictAll()
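
The Cache instance is obtained from the EntityManagerFactory; a small sketch (Customer and customerId are hypothetical):

Cache cache = entityManagerFactory.getCache();
if (cache.contains(Customer.class, customerId)) {
    cache.evict(Customer.class, customerId);  // remove this one entity
}
cache.evict(Customer.class);  // remove all Customer entities
cache.evictAll();             // clear the whole second-level cache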

3) Better support for locking modes (LockModeType)
  • OPTIMISTIC
  • OPTIMISTIC_FORCE_INCREMENT
  • PESSIMISTIC_READ / PESSIMISTIC_WRITE
  • PESSIMISTIC_FORCE_INCREMENT
In the retrieval API of the entity manager you can specify one of the above-mentioned modes, or lock the entity after obtaining it:
      EntityClassX entity = em.find(EntityClassX.class, id, LockModeType.PESSIMISTIC_WRITE);
vs.
      em.lock(entity, LockModeType.PESSIMISTIC_WRITE);

Of course it is also possible to read the entity without a strict lock mode, apply business logic to it, and only obtain the lock toward the end of the business transaction:
      em.refresh(entity, LockModeType.PESSIMISTIC_WRITE);



4) Enhancements of JPQL
  • date and time literals like {d '2014-02-27'} or {t '14:00:00'}
  • member support: FROM Order o WHERE 'RECURRING_INVOICES' MEMBER OF o.types
  • comparing collections to empty: FROM Order o WHERE o.orderItems IS EMPTY
  • index support (retrieving items based on their position in an ordered list): WHERE INDEX(t) BETWEEN x AND y
  • ...

5) Validation
The validation support used with JPA 2 is based on the JSR 303 specification, whose reference implementation is Hibernate Validator.
It is important to mention that JPA 2 does not define a bean validation implementation of its own.
So the JPA provider integrates a bean validation implementation; with Hibernate as the JPA provider, Hibernate Validator is used.
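
A small sketch of such constraints on an entity (field names hypothetical); with a bean validation provider on the classpath, the entity is validated by default on the pre-persist and pre-update lifecycle events:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;

@Entity
public class Customer {

    @Id
    private Long id;

    @NotNull            // rejects null names before the INSERT/UPDATE is issued
    @Size(max = 80)     // rejects names longer than 80 characters
    private String name;
}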




Saturday, February 22, 2014

Spring batch - special aspects of batch processing

Spring batch what for?

Spring Batch is a framework specially designed for batch processing.
Intended for processing e.g. files with large amounts of data, it provides a clear DSL defined in XML.
The framework comes with abstractions and defaults that have "extension points" where business or processing logic can be placed.

Why should I use spring batch?

It is regarded as the standard framework for batch processing, and a lot of developers know how to use it.
It provides useful abstractions and can be configured in many regards to support more advanced requirements:
  • transaction support
  • retry support
  • skip functionality
  • perfectly integrated in the spring world (DI etc.)
  • strong layered architecture
  • very scalable due to support of step partitioning, multi-threaded steps, …

Basic concept

In Spring Batch the processing starts with a Spring Batch job, which consists of steps.
The steps can be chunk-oriented or so-called TaskletSteps, the latter mainly for supporting legacy code.
The main components of Spring Batch are:
  • ItemReader - reads one item
  • ItemProcessor - processes one item (optional)
  • ItemWriter - writes a list of items
Besides that there are different types of listeners for placing business logic:
  • job/step execution listeners
  • chunk listeners
  • ItemRead/Process/Write listeners
  • SkipListener
The transaction boundary can never be around whole steps or a complete job.
Metadata like execution start/end points, number of commits/rollbacks, step status etc. are saved at several points:
  • step execution context - a map that is used for serializing data
  • chunk execution context - used inside a chunk transaction to keep track of the current item in process
The default rollback behavior: if an uncaught exception occurs during the processing of a chunk, that chunk is rolled back.
All chunks committed up to that point stay committed, but the complete job fails.

The metadata of a step is initialized at the beginning of the step and updated at its end. This is done in separate transactions: the step status has to be updated in its own transaction, because the processing of the step itself can fail and then has to be rolled back.

A Spring Batch job thus consists of steps, and these steps consist of chunks. Each chunk is executed in its own transaction.
How does Spring Batch know how much data has to be read into a chunk?
This is specified by a policy: the CompletionPolicy.
Specifying the commit interval on the chunk tag leads to a SimpleCompletionPolicy.
As soon as the number of items read satisfies the completion policy, the read and processed items are passed to the ItemWriter.
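
As a sketch, the same chunk definition in Spring Batch's Java configuration (available since Spring Batch 2.2; stepBuilderFactory, reader, processor and writer are assumed to exist); .chunk(10) corresponds to commit-interval="10" on the chunk tag and results in a SimpleCompletionPolicy:

Step step = stepBuilderFactory.get("processInvoices")
        .<Invoice, Invoice>chunk(10)   // 10 items per chunk, one transaction per chunk
        .reader(reader)
        .processor(processor)
        .writer(writer)
        .build();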

Restart of a job

How is a restart of a job done?
For batch processing where the data is not retrieved from a file but from database tables or a messaging system, the retrieval is done over a separately declared non-transactional datasource, because on a processing error the rollback would otherwise close the data retrieval channel as well.
No other job, component or module should operate on this non-transactional datasource. The data is normally read using a database cursor in order to avoid memory issues; Spring Batch provides the JdbcCursorItemReader for this.
The restart of a job itself is detected by Spring Batch: if a job is called with the same job parameters and the previous run ended with a failure, Spring Batch treats this as a restart of the job.
To accomplish that, the state is saved in the execution context of the chunk.
Every reader persists the counter of read items inside the transaction of the chunk, and every chunk commits its work inside its own transaction at the end of its work. That means, with a completion policy of 10, the chunk tries to commit its work after the 10th item. On a job restart the execution point of the last successful chunk is taken, and the not-yet-committed items are processed.
The execution context is seen by all the readers - that means the state of the counter can be modified from different places. This setup is not thread-safe!

Ordering

For the restartability of a job, the ordering of the read data must be well-defined.
Whether the data comes from a database or from a file, the ordering of the data retrieval must be set explicitly, so that a restart of the job gets the items that have to be reprocessed in the same order as on the first run; see the sketch below.
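
A sketch of such an explicitly ordered reader (the invoice table, Invoice class and InvoiceRowMapper are hypothetical):

JdbcCursorItemReader<Invoice> reader = new JdbcCursorItemReader<Invoice>();
reader.setDataSource(nonTransactionalDataSource);
// explicit ORDER BY: a restart sees the remaining items in the same order as the first run
reader.setSql("select id, amount from invoice order by id");
reader.setRowMapper(new InvoiceRowMapper());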



Friday, February 21, 2014

What is the impact of the ESB on the work of a requirement analyst?


What is the impact of the ESB on the work of a requirement analyst?

  • Work gets easier
  • Helping the requirement analysis
  • Moves the focus to the business events
  • Volere model



Why?

Let's look at the starting point of the requirements analysis:

Everything begins with the system idea document.
Next step on the to-do list: the stakeholder matrix.
This leads to the business context diagram:
  It mainly covers the actors, parties, and processes
  of the business process that is to be supported or automated.
  It should discover all business events coming to the work.

Next starting point for requirements trawling:
Start from the business events.
Every business event is answered by a business use case of the work.
The requirements analyst should analyze the business use cases with use case templates,
then talk to the solution designers and architects.

What is easier?

Business events are coming to the bus!
The BA should start the work here.



Wednesday, February 19, 2014

Benefits and usage of spring data JPA

What are the benefits of using spring data JPA?

Spring Data JPA addresses the following situations:
  • it is unclear how the persistence layer will develop
    • because of time and focus, the first prototype starts with a map; later it will probably be replaced with a long-term persistence solution
  • the persistence layer might change from relational to NoSQL or vice versa
  • for the sake of a set of fast-running unit tests the persistence layer might be configured to use a light-weight persistence mechanism like a simple map

To be really open regarding the persistence layer, the domain layer should be separated from the data access layer. For this an approach like the repository pattern from Martin Fowler is common practice.
The repository enforces treating objects of a type as a "conceptual set", like a collection.
With a simple DAO approach you see the DAO as a gateway for accessing the database.
Such DAOs tend to grow extensively as new querying or update functionality is needed,
which leads to blurred responsibilities. With the repository you treat all the objects as a conceptual set.
For querying and update extensions the repository will make use of DAO(s).
So the DAOs stay well-focused and have a single responsibility for gathering / changing data.
The set of objects of a type is handled in the repository.
At the beginning of your development you can have a simple in-memory storage such as a map
in order to focus on the domain logic etc. Later you can delegate the storage and access to sophisticated DAO(s).
So the domain objects used by the business logic in the domain layer are developed against the interfaces that are exposed by the repository interfaces.

On top of the JPA entities the repository layer is placed.
Next to the domain objects the repository interfaces are placed. They present to the outside the interfaces that are used by the domain layer, and they provide basic CRUD functionality.

Example:
domain layer: Customer implements ICustomer
repository layer: CustomerRepository delivers ICustomer
persistence layer: CustomerRepositoryImpl implements the CustomerRepository

The CustomerRepositoryImpl also could make further usage of DAOs to access the objects.
The CustomerRepositoryImpl will make usage of the EntityManager of JPA and will define the transactional context:

public class CustomerRepositoryImpl implements CustomerRepository {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public ICustomer save(ICustomer customer) {
        Customer c = new Customer(customer);
        entityManager.persist(c);
        return c;
    }

}

Usage of spring data jpa

Spring Data JPA has the objective of simplifying the development of the repository layer mentioned above, as this code is boilerplate. With Spring Data JPA you only have to define the interface, an implementation for delegation, and the corresponding Spring configuration.
The rest will be instantiated and delivered by Spring.
So first of all we have to define the repository interface:

public interface CustomerJpaRepository extends JpaRepository<Customer, Long> {
    Customer save(Customer customer);
}

Second we need the spring configuration:
<jpa:repositories base-package="…">
   <jpa:repository id="customerJpaRepository" />
</jpa:repositories>

Unfortunately Spring Data JPA can't operate directly on the domain interface (ICustomer); it always needs the specific JPA entity, as in the CustomerJpaRepository defined above.
Therefore we implement the CustomerJpaRepository interface ourselves,
inject the dynamically provided instance of the CustomerJpaRepository, and delegate all operations to that instance:

public class CustomerJpaRepositoryImpl implements CustomerJpaRepository {

    private CustomerJpaRepository repo;

    public Customer save(Customer customer) {
        return repo.save(customer);
    }

}

In the background spring data jpa will dynamically provide an instance of the interface and make it available under the id customerJpaRepository.


Advantages of spring data jpa

Spring Data JPA provides finder methods out of the box.
Based on naming conventions, a findByX method will be provided dynamically by Spring Data JPA and will return all entities whose field X matches the given parameter value.
Besides that there are other useful features like paging, including sorting, and others.
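
A small sketch of such derived finder methods (the lastname field is hypothetical):

public interface CustomerJpaRepository extends JpaRepository<Customer, Long> {

    // derived query: returns all customers whose lastname equals the parameter
    List<Customer> findByLastname(String lastname);

    // paging and sorting come for free via the Pageable parameter,
    // e.g. findByLastname("Smith", new PageRequest(0, 20))
    Page<Customer> findByLastname(String lastname, Pageable pageable);
}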





Friday, February 14, 2014

Requirements analysis: Business context diagram

Benefit of a business context diagram?

It defines the scope of the work we have to study.
It shows the work as a single, as-yet-uninvestigated process,
surrounded by adjacent systems and actors.
Arrows show the data flows between the work and the adjacent systems → carried as business events.

The business context diagram shows where the responsibilities of the adjacent systems start and end.

The data flow makes it clear what work has to be done by the adjacent systems and what has to be done by the work itself: preplanned business use cases that are activated as soon as an actor initiates a business event.


Why do business events and business use cases help?

  • A way to partition the work in a non-subjective way by identifying responses to outside stimuli.
  • The benefit of a clear view of the needed functionality.
  • Internal partitions are mainly the result of technology, design, and history.
  • Business events point out what belongs together.
  • A perfect vehicle for further requirements analysis work!






Benefits of an ESB

The source application / service does not have to be changed

Business events that happen are published by the source application to the bus.
Typically there is an application interested in this event, and it will consume it, just as this processing would have been done with a point-to-point communication.
→ If business requirements change and the enterprise landscape grows, more and more applications might be interested in that kind of event. The producing application and the already existing consumers do not have to be changed at all.

Easy integration through a DSL

Additional systems that are interested in that event just subscribe to the topic; alternatively, a dispatcher application that consumes special kinds of messages lying on the bus can dispatch requests to the corresponding systems.
Integration scenarios are typically configured / implemented with the help of a DSL, leading to better maintainability and faster realization.

Further benefits


  • Heterogeneous environments become interoperable
  • Service virtualization aka location transparency
  • Service versioning
  • Transaction-based message processing






Sunday, February 9, 2014

Message routing over an ESB

How is the routing done over the ESB / bus?

Routing configuration can be held on the client or on the bus and can be changed at runtime.
Certain bus products allow you to configure the routing of messages using a DSL.
Here is an example:

from("invoices").choise().when(header("invoiceType").isEqualTo("clearing")
.to("clearingQueue")
.otherwise("costcenterQueue")

This shows a content-based routing. Depending on a meta data the message from one queue is routed to one of other 2 queues.

Further example with Camel:

For routing purposes we take a quick look at Camel.
Camel is an integration platform, a framework with the goal of providing EIP-based components.
Is Camel itself already an ESB? As there is no standard definition, the answer is yes and no.
But the core functionality it brings is definitely a routing and transformation engine.
Routes are defined in Camel with XML configuration or with a DSL like the one above.
The messages are collected at the endpoints and processed through the defined routes.
A route itself consists of flow and integration logic.
A message always has a producer and a consumer - inside the Camel context (the runtime system of Camel)
there are processors that process the message further by filtering, enriching, routing, transforming etc.
The component inside Camel that manages this processing between the service provider and the consumer
is called the MEC (Message Exchange Container). This component holds further information such as the unique message ID, exception information, etc.

The routes are defined in, or more precisely added to, the Camel context.
This runtime system brings all the components defined in Camel together.
A route definition is a plain Java class that needs to extend RouteBuilder and implement the configure method.
In there the route must start with from(…) and end with to(…).
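
A minimal sketch of such a route class (endpoint URIs hypothetical), mirroring the content-based routing example from above:

import org.apache.camel.builder.RouteBuilder;

public class InvoiceRoute extends RouteBuilder {

    @Override
    public void configure() {
        // the route starts with from(...) and ends with to(...)
        from("jms:queue:invoices")
            .choice()
                .when(header("invoiceType").isEqualTo("clearing"))
                    .to("jms:queue:clearingQueue")
                .otherwise()
                    .to("jms:queue:costcenterQueue");
    }
}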

As in the snippets above, all the processing logic happens between these two points.

Thursday, February 6, 2014

What is the difference between an ESB, SOA and EAI?

ESB - Enterprise Service Bus

An ESB is an integration platform into which applications that want to communicate with each other are integrated. It also forms a backbone of your enterprise landscape so that applications and services can communicate easily.

SOA - Service Oriented Architecture

A SOA describes an architecture style in which software resources of an enterprise are made accessible and discoverable on the network as dedicated, well-defined services.

EAI - Enterprise Application Integration

EAI is driven by business needs: connecting applications inside an enterprise and external partner systems to achieve a certain business objective. It is thus a concept for integrating business functions along the value chain with the help of a dedicated IT infrastructure. As these functions are by nature provided by different applications and platforms, EAI deals with data and business process integration.


So what is the difference?

To make it short:
SOA is an architecture style based on services; EAI is a concept about connecting applications and services into new, valuable services; the ESB, in contrast, is a concrete method for establishing an integration platform for inter-application communication.
Both SOA and EAI need concrete components for their realization, and here the ESB plays a very important role.

Monday, February 3, 2014

ESB: Drawbacks and risks

Depending on the choice of the broker architecture:

Very often a central hub is chosen as the broker architecture. That means all the messages on the bus go through the hub.
- Scalability
-- less scalable than a distributed broker architecture

- Maintenance
++ easy to monitor
++ easy understanding 
-- maintenance work on hub gets crucial

- Stability
-- single point of failure


Service locator
The service locator is responsible for locating / identifying the service of an application that needs to be called to process a certain message.
Every service provider has to register its service at the service registry. Service consumers ask for the service and retrieve the endpoint in order to start consuming the service.
The danger of a centralized service locator is similar to that of the central hub: if the service locator is unavailable at some moment, no messages are processed, because no services can be located.
It is important that there are multiple instances of the service registry and that the configuration information is synchronized between them.

Messaging
Messaging leads to decoupling between the communication partners - delivery of messages is guaranteed through buffering, storing and forwarding.
But there are situations where asynchronous communication just adds overhead and complexity, swallowing the advantages.

Batch processing
The bus should not be misused for batch processing, as the throughput may be much lower than with direct batch processing.
Mass data processing should still be accomplished by corresponding batch applications, which might publish their results or special cases to the bus.

Business logic
A big danger is the possible development that business logic leaks into the bus - leading to maintainability and scalability problems.
Business logic should not be handled directly in the bus. 

Commands
The bus should not be thought of as an assembly line for placing orders.

Much better is to place occurred business events on the bus and let the connected applications / services consume those events.

ESB: Typical tasks

- Transports
Transportation of the messages
- Messaging
Allowing asynchronous and synchronous messaging.
- Security
Application that wants access to the bus must authenticate and must get authorization for the requested operation.
- Data transformations
A message of data model A is transformed into a message of data model B without losing content.
- Service locator
On receipt of a message the bus identifies / locates the service of an application to be called.
- Interceptors
At certain stages of message processing, the processing can be intercepted by configured service calls - allowing e.g. the realization of cross-cutting concerns.
- Protocol bindings
The bus allows forwarding requests from one endpoint to another, leaving the message unchanged or enriching or transforming it.
- Service model

Sources that will / should be connected to the bus:
- Web services
- Queues
- Portals
- File / FTP / ...
- BPEL

Systems that would / should be connected to the bus:
- business applications
- mobile devices
- partners
- browsers
- rich clients

Friday, January 31, 2014

ESB concept / bus

1) Explanation
The ESB (Enterprise Service Bus) is the holy grail of application integration.
It is an integration platform used to allow applications to integrate more easily into an enterprise landscape.
The ESB defines the infrastructure, the foundation, into which the different applications and services of an enterprise landscape integrate. It stands for a specific architecture style enforcing a communication bus for inter-application or inter-service communication. Depending on the structuring of the enterprise landscape - service-oriented or application-based - the bus is used for service integration, as opposed to "only" connecting different applications over a common communication bus.

2) Initial situation
An enterprise landscape with many applications that need to communicate with each other leads to many point-to-point communications, with all their disadvantages:
implementing a retry mechanism in each application that interacts with another application, and
dealing with the complexity of timeouts (connection and read timeouts) - finding the right timeout window, distinguishing between operations that can and that shouldn't be repeated.

3) Benefits
Business events that happen are published by the source application to the bus.
Typically there is an application interested in this event, and it will consume it, just as this processing would have been done with a point-to-point communication.
But now the benefits of an enterprise bus come into play: if business requirements change and the enterprise landscape grows, more and more applications might be interested in that kind of event, and the producing application does not have to be changed at all.
Additional systems that are interested in that event just subscribe to the topic; alternatively, a dispatcher application that consumes special kinds of messages lying on the bus can dispatch requests to the corresponding systems.
Integration scenarios are typically configured / implemented with the help of a DSL, leading to better maintainability and faster realization.


The ESB allows:
- Heterogeneous environments become interoperable
- Service virtualization aka location transparency
The service consumer and the service provider are decoupled by the ESB.
With service virtualization the service consumer does not have to be reconfigured if the service provider's
   endpoint information changes.

- Service versioning
With message transformation, a message of an interface version that is no longer supported is translated into a
  message of the new interface version,
benefiting again from the fact that the service consumer is decoupled from the service provider.
The additional "layer" can be used to provide a technical mapping between an old interface and the new one,
allowing business adaptation independent of interface availability.

- Transaction-based message processing
The message is taken from the queue, and during the processing of the defined workflow - possibly involving
   different services and service calls - the whole transaction is only committed once the process reaches the last
   service, e.g. a database adapter.
So the ESB can be used to coordinate distributed transactions with different services involved. The client only has to
   mark the beginning and the end of a transaction.

The coordination work is done by the bus.


4) Background
Basic groundwork was done by Hohpe and Woolf with their work on EIP - Enterprise Integration Patterns.
They described common integration scenarios, abstracted them into different patterns, and categorized them into the following six categories:

- Message endpoints
All kinds of patterns for connecting applications that should integrate with each other.
E.g.:
  - Polling consumer - an adapter that periodically polls a data source and consumes the data that needs to be
           processed.
  - Service activator - a component that locates a service of an application, which then will be called.

- Message construction
All kinds of patterns that deal with the messages themselves.
E.g.:
  - Correlation identifier - the correlation ID is used to map messages to a transaction; messages that took part in a
           transaction should have the same correlation ID.
  - Message expiration - messages have a period of time in which they need to be processed. If this time passes
           without processing, they are dropped by the message broker or moved to a dead letter queue.

- Message channels
All kinds of patterns that describe how messages are delivered to message endpoints.
E.g.:
  - Guaranteed delivery - the producer can rely 100% on the fact that once a topic or queue has confirmed the receipt
           of a message, the message will be processed later on. After the confirmation the resources of the JMS client
           are freed and it can continue with further actions/processing.
  - Point-to-point channel - a message channel pattern to realize synchronous communication based on JMS.
           The sender blocks until the receiver of a message has processed it and delivered the result back to the sender.

- Message routing
All kinds of patterns that do not change the semantics of a message - no content change - but forward messages to
     different endpoints based on certain rules.
E.g.:
  - Aggregator - split messages are composed into a new resulting message.
  - Splitter - a message is split into several parts that result in new messages.

- Message transformation
All kinds of patterns that change the content of a message.
E.g.:
  - Enricher - a message is enriched with further information from other services, data sources etc.
  - Message translator - transforms a message of data model A into a message of data model B without
          changing the semantics of the message.

- System management
All kinds of patterns that can't be categorized into the above categories or have general supporting characteristics.


Saturday, January 25, 2014

GAE and JPA

How does JPA work in a GAE environment?

The Google App Engine (GAE) supports JPA, but the persistence is not done in a relational database.
It uses a NoSQL database based on BigTable technology.
So there are some restrictions:
1) polymorphic queries
2) aggregation functions
3) transactional behavior: in a transaction only objects of the same entity group may be changed
4) ...

1) This concerns the object-oriented abstraction where the data model knows about inheritance relations between the entities and they get persisted accordingly, e.g. each class is saved into a separate table.
When querying such a structure with JPQL, GAE does not allow the use of polymorphic queries:
A extends B extends C
If you are not interested in retrieving a special entity type, it is very handy to retrieve all entities based on C, the top superclass, and apply abstracted treatment in cases where common treatment can be done.
So a "from C where ...." JPQL query is possible on a relational database, but will unfortunately fail in a GAE environment.

2) Aggregation functions like SUM, AVG, ... are not usable on GAE.
The LIKE operator is limited in use - it can only be used at the end of a search token, e.g. ... like 'Adam R%' ...

3) The transactional behavior JPA offers is limited on the GAE platform.
In one single transaction only objects of the same entity group may be changed.
Their changes will then be applied accordingly in the database.
An entity group is a collection or grouping of objects that forms a data structure. This data structure consists of root objects and dependent objects. Members of an entity group are
- root objects (the starting entities)
- and, below them, dependent objects.
On creation of an instance, an object can point to a parent entity.
Entities without a parent entity are the so-called root entities.
The datasets of these entity groups reside on different nodes of the cluster in the distributed data storage, but one single dataset is normally physically located on one node, so that communication overhead during data processing is reduced.

One may ask how the relationships inside an entity group are realized, as GAE does not run on a relational database.
The relationship is not handled, as on a relational database, by using attributes (in JPA language: properties). Instead, the primary key of the entity is used to route through the hierarchy:
in such an entity group we have a parent with a primary key, and all children have a primary key that contains the parent's key. The key normally consists of the type and an ID; hence a child key consists of the type plus the parent's ID plus its own ID.
For example: Invoice(5)/InvoiceItem(1)

Because of this, @ManyToMany and joining are not usable either.
Because of this @ManyToMany and joining is not usable as well.