# Understanding Hibernate's Application-Level Repeatable Reads

When you invoke methods such as `entityManager.find` or `CrudRepository.findById`, or query by natural ID, Hibernate first checks the first-level cache (a.k.a. the Session or persistence context). If the requested entity is already present in the cache, Hibernate does not execute an SQL query; it returns the cached entity instead.

What about other query methods? Does Hibernate return cached entities in those cases as well? Let's explore this by diving into the source code of [Hibernate-ORM 6.6](https://github.com/hibernate/hibernate-orm/tree/6.6) and walking through a concrete example of how such a query is handled.

```java
@Repository
public interface AccountRepository extends JpaRepository<Account, Long> {
    Account findByName(String name);
}

@Service
public class AccountService {

    // Injected through the constructor so the calls below have a repository to use
    private final AccountRepository accountRepository;

    public AccountService(AccountRepository accountRepository) {
        this.accountRepository = accountRepository;
    }

    @Transactional
    public void readValueTwice(String name) {
        Account account = accountRepository.findByName(name);
        //...
        account = accountRepository.findByName(name);
        //...
    }
}
```

Here is an example where a transaction reads an account's value twice by its name. When the `findByName` method is invoked and Hibernate executes the corresponding query, the `doExecuteQuery` method of Hibernate's `JdbcSelectExecutorStandardImpl` is eventually called as part of the query execution pipeline.

```java
public class JdbcSelectExecutorStandardImpl implements JdbcSelectExecutor {
    //...
    private <T, R> T doExecuteQuery(
            JdbcOperationQuerySelect jdbcSelect,
            JdbcParameterBindings jdbcParameterBindings,
            ExecutionContext executionContext,
            RowTransformer<R> rowTransformer,
            Class<R> domainResultType,
            int resultCountEstimate,
            StatementCreator statementCreator,
            ResultsConsumer<T, R> resultsConsumer) {
        //...
        final JdbcValues jdbcValues = resolveJdbcValuesSource(
                executionContext.getQueryIdentifier( deferredResultSetAccess.getFinalSql() ),
                jdbcSelect,
                resultsConsumer.canResultsBeCached(),
                executionContext,
                deferredResultSetAccess
        );
        //...
        final RowReader<R> rowReader = ResultsHelper.createRowReader(
                session.getFactory(),
                rowTransformer,
                domainResultType,
                jdbcValues
        );
        final RowProcessingStateStandardImpl rowProcessingState = new RowProcessingStateStandardImpl(
                valuesProcessingState,
                executionContext,
                rowReader,
                jdbcValues
        );
        final T result = resultsConsumer.consume(
                jdbcValues,
                session,
                processingOptions,
                valuesProcessingState,
                rowProcessingState,
                rowReader
        );
        //...
        return result;
    }
}
```

In the `doExecuteQuery` method of `JdbcSelectExecutorStandardImpl`, the `resolveJdbcValuesSource` method is first invoked to execute the prepared statement. The result of this execution is stored in a `JdbcValues` object, which represents the raw data returned from the database.

Next, an instance of `StandardRowReader` named `rowReader` is created by the `ResultsHelper.createRowReader` method. This object is responsible for coordinating the reading and transformation of result values from the `jdbcValues`.

A `RowProcessingStateStandardImpl` instance, `rowProcessingState`, is then created to maintain the state associated with reading and processing the data of each row.

Both `rowReader` and `rowProcessingState` are passed to the `consume` method of the `resultsConsumer` (in this case, an instance of `ListResultsConsumer`). This method orchestrates the reading and transformation of the result rows, ultimately returning the final query result.

```java
public class ListResultsConsumer<R> implements ResultsConsumer<List<R>, R> {
    //...
    @Override
    public List<R> consume(
            JdbcValues jdbcValues,
            SharedSessionContractImplementor session,
            JdbcValuesSourceProcessingOptions processingOptions,
            JdbcValuesSourceProcessingStateStandardImpl jdbcValuesSourceProcessingState,
            RowProcessingStateStandardImpl rowProcessingState,
            RowReader<R> rowReader) {
        //...
        rowReader.startLoading( rowProcessingState );
        //...
        try {
            final JavaType<R> domainResultJavaType = resolveDomainResultJavaType(
                    rowReader.getDomainResultResultJavaType(),
                    rowReader.getResultJavaTypes(),
                    typeConfiguration
            );
            //...
            final Results<R> results;
            //...
            else {
                results = new Results<>( domainResultJavaType, initialCollectionSize );
            }
            final int readRows;
            //...
            else {
                readRows = read( rowProcessingState, rowReader, results );
            }
            //...
            return results.getResults();
        }
        //...
    }
    //...
    private static <R> int read(
            RowProcessingStateStandardImpl rowProcessingState,
            RowReader<R> rowReader,
            Results<R> results) {
        int readRows = 0;
        while ( rowProcessingState.next() ) {
            results.add( rowReader.readRow( rowProcessingState ) );
            rowProcessingState.finishRowProcessing( true );
            readRows++;
        }
        return readRows;
    }
    //...
}
```

The call to `rowReader.startLoading( rowProcessingState )` creates the `InitializerData` instances for `rowProcessingState` and `rowReader`. These instances will later hold the state information needed to resolve and initialize entity instances during row processing.

The method `read(rowProcessingState, rowReader, results)` iterates over all result rows. For each row, it calls `rowReader.readRow(rowProcessingState)`, adds the result to the `results` object, and completes processing for that row.

Let's check the implementation details of the `readRow` method of the `StandardRowReader` class:

```java
public class StandardRowReader<T> implements RowReader<T> {
    //...
    @Override
    public T readRow(RowProcessingState rowProcessingState) {
        coordinateInitializers( rowProcessingState );
        final T result;
        //...
        else {
            if ( resultAssemblers.length == 1 && rowTransformer == null ) {
                result = (T) resultAssemblers[0].assemble( rowProcessingState );
            }
            //...
        }
        //...
        return result;
    }

    private void coordinateInitializers(RowProcessingState rowProcessingState) {
        for ( int i = 0; i < resultInitializers.length; i++ ) {
            resultInitializers[i].resolveKey( resultInitializersData[i] );
        }
        //...
    }
    //...
}
```

The invocation `resultInitializers[i].resolveKey( resultInitializersData[i] )` uses the current row's values (from `jdbcValues`) and the associated `EntityPersister` (stored as the field `concreteDescriptor` in `resultInitializersData[i]`) to construct an `EntityKey`. It then checks whether this key exists in the first-level cache (the field `HashMap<EntityKey, EntityHolderImpl> entitiesByKey` of `StatefulPersistenceContext`).

If the cache contains the entity, its associated `EntityHolderImpl` is used to populate the `entityHolder` field of the corresponding `InitializerData`. The state of the `InitializerData` is then set to `INITIALIZED`, and its `instance` field is assigned the `entity` field from the `EntityHolderImpl`. The resolved entity instance is then returned as the result of the `readRow` method and ultimately becomes part of the final result list.

This logic is encapsulated primarily within the `EntityInitializerImpl` and `StatefulPersistenceContext` classes. The following code snippet highlights several of the key methods involved.

```java
public class EntityInitializerImpl
        extends AbstractInitializer<EntityInitializerImpl.EntityInitializerData>
        implements EntityInitializer<EntityInitializerImpl.EntityInitializerData> {
    //...
    protected void resolveEntityKey(EntityInitializerData data, Object id) {
        EntityPersister concreteDescriptor = data.concreteDescriptor;
        if ( concreteDescriptor == null ) {
            concreteDescriptor = data.concreteDescriptor = determineConcreteEntityDescriptor(
                    data.getRowProcessingState(),
                    discriminatorAssembler,
                    entityDescriptor
            );
            assert concreteDescriptor != null;
        }
        data.entityKey = new EntityKey( id, concreteDescriptor );
    }
    //...
    @Override
    public void resolveInstance(EntityInitializerData data) {
        //...
        final PersistenceContext persistenceContext = rowProcessingState.getSession()
                .getPersistenceContextInternal();
        data.entityHolder = persistenceContext.claimEntityHolderIfPossible(
                data.entityKey,
                null,
                rowProcessingState.getJdbcValuesSourceProcessingState(),
                this
        );
        //...
    }

    protected void resolveEntityInstance1(EntityInitializerData data) {
        //...
        else {
            final Object existingEntity = data.entityHolder.getEntity();
            if ( existingEntity != null ) {
                data.setInstance( data.entityInstanceForNotify = existingEntity );
                if ( data.entityHolder.getEntityInitializer() == null ) {
                    assert data.entityHolder.isInitialized() == isExistingEntityInitialized( existingEntity );
                    if ( data.entityHolder.isInitialized() ) {
                        data.setState( State.INITIALIZED );
                    }
                    //...
                }
                //...
            }
            //...
        }
        //...
    }
    //...
}

public class StatefulPersistenceContext implements PersistenceContext {
    //...
    private HashMap<EntityKey, EntityHolderImpl> entitiesByKey;
    //...
    private Map<EntityKey, EntityHolderImpl> getOrInitializeEntitiesByKey() {
        if ( entitiesByKey == null ) {
            entitiesByKey = CollectionHelper.mapOfSize( INIT_COLL_SIZE );
        }
        return entitiesByKey;
    }
    //...
    @Override
    public EntityHolder claimEntityHolderIfPossible(
            EntityKey key,
            Object entity,
            JdbcValuesSourceProcessingState processingState,
            EntityInitializer<?> initializer) {
        final Map<EntityKey, EntityHolderImpl> entityHolderMap = getOrInitializeEntitiesByKey();
        final EntityHolderImpl oldHolder = entityHolderMap.get( key );
        final EntityHolderImpl holder;
        if ( oldHolder != null ) {
            //...
            // Skip setting a new entity initializer if there already is one owner
            // Also skip if an entity exists which is different from the effective optional object.
            // The effective optional object is the current object to be refreshed,
            // which always needs re-initialization, even if already initialized
            if ( oldHolder.entityInitializer != null
                    || oldHolder.entity != null
                    && oldHolder.state != EntityHolderState.ENHANCED_PROXY
                    && ( processingState.getProcessingOptions().getEffectiveOptionalObject() == null
                    || oldHolder.entity != processingState.getProcessingOptions().getEffectiveOptionalObject() ) ) {
                return oldHolder;
            }
            //...
        }
        //...
    }
    //...
}
```

## Summary

When you query with methods other than `entityManager.find`, `CrudRepository.findById`, or queries by natural ID, Hibernate always executes the SQL statement, regardless of whether the first-level cache already contains the entity.

After executing the query, Hibernate constructs an `EntityKey` from the result. It then uses this key to check the persistence context (first-level cache). If the entity is already present, Hibernate discards the query result and returns the cached entity instead. This is how Hibernate provides application-level repeatable reads.

### REPEATABLE_READ Isolation Level vs. Hibernate's Application-Level Repeatable Reads

1. The REPEATABLE_READ isolation level prevents not only non-repeatable reads but also dirty reads. In modern databases like PostgreSQL, it can also prevent phantom reads. On the other hand, if application-level repeatable reads are used with the READ_UNCOMMITTED isolation level, dirty reads might still occur because the underlying database allows them. Additionally, application-level repeatable reads cannot prevent phantom reads by themselves.

2. The REPEATABLE_READ isolation level ensures that all reads in a transaction see data from the same snapshot. In contrast, application-level repeatable reads only guarantee that an entity read multiple times within the same Hibernate session will return the same in-memory object (from the first-level cache). When used with the READ_UNCOMMITTED or READ_COMMITTED isolation level, this can lead to unexpected results if a transaction reads some parts of the data directly from the database (potentially showing updates from other transactions) while other parts are served from Hibernate's session cache (see [this example](https://stackoverflow.com/questions/25106636/strategies-for-dealing-with-concurrency-issues-caused-by-stale-domain-objects-g)). The cache is not aware of changes committed by other transactions unless the entities are explicitly refreshed.

This [GitHub repository](https://github.com/EddieChoCho/isolation-level-demo) contains demonstrations of Hibernate's application-level repeatable reads.

## References

1. [Hibernate-ORM GitHub Repository](https://github.com/hibernate/hibernate-orm)
2. [Hibernate User Guide](https://docs.jboss.org/hibernate/stable/orm/userguide/html_single/Hibernate_User_Guide.html)
3. [The JPA and Hibernate first-level cache](https://vladmihalcea.com/jpa-hibernate-first-level-cache/)
4. [How does Hibernate guarantee application-level repeatable reads](https://vladmihalcea.com/hibernate-application-level-repeatable-reads/#:~:text=A%20Hibernate%20persistence%20context%20can,have%20application%2Dlevel%20repeatable%20reads)
5. [Designing Data-Intensive Applications](https://dataintensive.net/)
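
## Appendix: A Toy Model of the Mechanism

The behavior described in the summary (the query always runs, but an instance already in the persistence context wins over the freshly read row) can be illustrated without Hibernate at all. The sketch below is only a simplified model, not Hibernate code: `RepeatableReadSketch`, `ToySession`, `Row`, and the toy `Account` class are hypothetical names invented for this illustration, and a plain `HashMap` stands in for both the database table and `entitiesByKey`.

```java
import java.util.HashMap;
import java.util.Map;

public class RepeatableReadSketch {

    /** Hypothetical stand-in for one row returned by the database. */
    static final class Row {
        final long id;
        final String name;
        final long balance;

        Row(long id, String name, long balance) {
            this.id = id;
            this.name = name;
            this.balance = balance;
        }
    }

    /** Hypothetical stand-in for a managed entity. */
    static final class Account {
        final long id;
        final String name;
        final long balance;

        Account(Row row) {
            this.id = row.id;
            this.name = row.name;
            this.balance = row.balance;
        }
    }

    /** Toy session: a first-level cache keyed by entity id. */
    static final class ToySession {
        private final Map<Long, Row> database;                 // shared "table"
        private final Map<Long, Account> entitiesByKey = new HashMap<>();
        int queriesExecuted = 0;

        ToySession(Map<Long, Row> database) {
            this.database = database;
        }

        Account findByName(String name) {
            queriesExecuted++;                                 // the query always runs
            for (Row row : database.values()) {
                if (row.name.equals(name)) {
                    // Build the key from the row, then consult the cache.
                    Account cached = entitiesByKey.get(row.id);
                    if (cached != null) {
                        return cached;                         // row values are discarded
                    }
                    Account fresh = new Account(row);
                    entitiesByKey.put(row.id, fresh);
                    return fresh;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Map<Long, Row> database = new HashMap<>();
        database.put(1L, new Row(1L, "alice", 100));
        ToySession session = new ToySession(database);

        Account first = session.findByName("alice");
        // Simulate another transaction committing an update between the reads.
        database.put(1L, new Row(1L, "alice", 250));
        Account second = session.findByName("alice");

        System.out.println(first == second);          // true
        System.out.println(second.balance);           // 100
        System.out.println(session.queriesExecuted);  // 2
    }
}
```

Both reads hit the toy database, yet the second read returns the instance created by the first, so the concurrent update to the balance stays invisible to this session. This mirrors the stale-read caveat discussed in the comparison with the REPEATABLE_READ isolation level.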