Spring Boot JPA Batch Insert: How to Skip Select?

Learn how to batch insert with Spring Boot + JPA without triggering select statements for composite keys using Persistable.isNew().
  • ⚡ Spring Boot JPA batch insert can trigger unexpected SELECTs when using composite keys, making performance much worse.
  • 🔁 Hibernate issues a SELECT before INSERT because it checks if an entity exists and because composite keys do not have auto-generated IDs.
  • 🛠️ Implementing the Persistable interface lets you override JPA’s check for new entities, stopping unwanted SELECTs.
  • 🚀 Turning on Hibernate’s batch processing settings in Spring Boot makes things faster and reduces database load.
  • 🧪 Controlled tests and SQL logging verify that your batch inserts skip SELECTs and keep data correct.

If you have used Spring Boot with JPA for batch inserts and seen SELECT statements you did not expect before each INSERT — especially with composite keys — you are not alone. This behavior can really hurt performance in services that handle a lot of data. The good news is there is a simple workaround using the Persistable interface. In this guide, you will learn why this happens, how it hurts performance, and how to fix it well with Spring Boot.


The Batch Insert Problem in JPA

Batch inserts are very important for performance in today's microservices and applications that handle a lot of data. You would expect JPA and Hibernate to support good batch processing. But when you use composite keys, something odd happens: a SELECT runs before every INSERT.

Why This Is a Problem

Each SELECT call adds:


  • An extra round trip to the database
  • Higher latency
  • More CPU load on the database server
  • Slower ETL and data-import pipelines

In the worst cases, this can turn a batch operation that takes seconds into one that takes many minutes in the database.

What Happens Inside JPA

To understand why this happens, you need to look at how JPA (through Hibernate) handles saving entities.

Spring Data JPA delegates entity persistence to Hibernate. When you call methods like save() or saveAll(), JPA does not immediately issue an INSERT. Instead, it first checks whether the entity already exists, using this logic:

  1. Is there an Identity Strategy?
    If you use GenerationType.IDENTITY (an auto-incremented ID), Hibernate treats a null id as a new record and skips the SELECT. (Note that IDENTITY generation also disables JDBC insert batching in Hibernate, since the ID is only known after each individual insert.)

  2. Is it a Composite Key or Manually Assigned ID?
    If your entity has a composite ID, or you assign the identifier yourself, Hibernate cannot tell whether the entity already exists. So it issues a SELECT.

This check involves either:

  • Calling EntityManager.contains(entity)
  • Or calling find() on the ID

In both cases, Hibernate does a SELECT to figure out if an INSERT or UPDATE is needed.

Understanding Composite Keys in JPA

Composite keys let multiple fields together form an entity's primary key. This is often useful for modeling real-world relationships, such as composite foreign keys in join tables.

Implementing with @EmbeddedId

JPA handles composite keys with the @Embeddable and @EmbeddedId annotations. Here is an example:

@Embeddable
public class OrderItemId implements Serializable {
    private Long orderId;
    private Long productId;

    // Implement equals() and hashCode()
}
@Entity
public class OrderItem {
    @EmbeddedId
    private OrderItemId id;

    private int quantity;

    // Other fields, constructors, and accessors
}

Using manual identifiers means Hibernate cannot tell if a record is new. So it does a SELECT to check if it exists before inserting.
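The equals() and hashCode() contract on the embeddable matters, because Hibernate uses the key for identity inside the persistence context. A minimal sketch for the OrderItemId above (the @Embeddable annotation is omitted here so the snippet compiles without a JPA dependency):

```java
import java.io.Serializable;
import java.util.Objects;

// Composite key class; annotate with @Embeddable in a real entity model.
class OrderItemId implements Serializable {
    private Long orderId;
    private Long productId;

    public OrderItemId() {}

    public OrderItemId(Long orderId, Long productId) {
        this.orderId = orderId;
        this.productId = productId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OrderItemId)) return false;
        OrderItemId other = (OrderItemId) o;
        // Two keys are equal only when every component matches.
        return Objects.equals(orderId, other.orderId)
            && Objects.equals(productId, other.productId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(orderId, productId);
    }
}
```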

The Cause: Why Does a SELECT Occur for Every INSERT?

In technical terms: by default, Spring Data JPA's CrudRepository.save (and JpaRepository.save) chooses between EntityManager.persist(entity) and EntityManager.merge(entity). What happens is:

  • If the entity is new, persist() is called, which leads to an INSERT.
  • If the entity may already exist, merge() is called, and merge() on a detached entity loads the current database state first.

But JPA cannot tell the difference between "new" and "existing" from a manually assigned composite ID alone. So it falls back to merge(), and that is where the SELECT comes from.
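Conceptually, the dispatch works like the following sketch. This is a simplified model, not the actual SimpleJpaRepository source, and the interface name here is illustrative; it exists only to show where the "is it new?" question is answered:

```java
// Stand-in for Spring Data's Persistable check; illustrative names only.
interface MaybePersistable {
    boolean isNew();
}

class SaveDispatcher {
    // Returns which EntityManager operation the save path would choose.
    static String save(Object entity) {
        if (entity instanceof MaybePersistable) {
            // A Persistable entity answers the "is it new?" question itself,
            // so no database round trip is needed to decide.
            return ((MaybePersistable) entity).isNew() ? "persist" : "merge";
        }
        // Without that hint, an assigned/composite ID forces merge(),
        // and merge() on a detached entity triggers the SELECT.
        return "merge";
    }
}
```

This is exactly the hook the Persistable solution in the next section relies on.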

Solution: Implementing Persistable<T>

The best way to fix this is to implement the org.springframework.data.domain.Persistable<T> interface. It lets you tell the framework something it cannot figure out — whether the entity is "new".

How Persistable<T> Works

The interface looks like this:

public interface Persistable<ID> {
    ID getId();
    boolean isNew();
}

By implementing it, you tell Spring Data directly to:

  • Use EntityManager.persist(), which leads to an INSERT
  • Not check the database (by returning true from isNew())
  • Stop unnecessary SELECTs and dirty checks

Complete Example:

@Entity
public class OrderItem implements Persistable<OrderItemId> {
    @EmbeddedId
    private OrderItemId id;

    private int quantity;

    @Transient
    private boolean isNew;

    public OrderItem() {}

    public OrderItem(OrderItemId id, int quantity) {
        this.id = id;
        this.quantity = quantity;
        this.isNew = true;
    }

    @Override
    public OrderItemId getId() {
        return id;
    }

    @Override
    public boolean isNew() {
        return isNew;
    }

    public void setNew(boolean isNew) {
        this.isNew = isNew;
    }
}

Real-World Code Example

Here is how you might perform an efficient Spring Boot batch insert using entities that implement Persistable:

List<OrderItem> items = new ArrayList<>();
Long orderId = 123L;

for (int i = 0; i < 1000; i++) {
    OrderItemId itemId = new OrderItemId(orderId, (long) i);
    OrderItem item = new OrderItem(itemId, (i + 1) * 10);
    item.setNew(true); // Explicitly mark as new
    items.add(item);
}

orderItemRepository.saveAll(items); // Fast, SQL-efficient insert

This approach ensures no SELECT precedes the inserts, making batch operations significantly faster.
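The repository used above is assumed to be a standard Spring Data interface; its name is not defined in this article, so the following is a hypothetical declaration:

```java
// Hypothetical repository for the example; the name is an assumption.
// JpaRepository is parameterized with the entity and its composite ID type.
public interface OrderItemRepository extends JpaRepository<OrderItem, OrderItemId> {
}
```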

Batch Insert Configuration in Spring Boot & Hibernate

JPA does not batch automatically. You must set up Hibernate batch properties in your application.properties or application.yml:

spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.generate_statistics=true
logging.level.org.hibernate.SQL=DEBUG
logging.level.org.hibernate.type.descriptor.sql.BasicBinder=TRACE

Explanation:

  • hibernate.jdbc.batch_size=50: groups up to 50 INSERTs into a single JDBC batch.
  • order_inserts=true / order_updates=true: sorts statements by entity type so consecutive statements can be batched together.
  • generate_statistics=true: exposes Hibernate statistics so you can verify the behavior.

Note: on Hibernate 6, the bind-parameter logger is org.hibernate.orm.jdbc.bind rather than org.hibernate.type.descriptor.sql.BasicBinder.

⚠️ Combine these settings with proper transaction management and periodic EntityManager.flush()/clear() calls when ingesting large volumes of data.

Verifying Behavior with SQL Logs and Hibernate Stats

To make sure the fix works:

  1. Turn on detailed logging for SQL.
  2. Count the SELECTs that are logged during a batch save.
  3. Check the total SQL queries against the batch size.

You can also turn on and read Hibernate statistics:

Statistics stats = entityManagerFactory.unwrap(SessionFactory.class).getStatistics();
System.out.println("Entity INSERT Count: " + stats.getEntityInsertCount());
System.out.println("Entity SELECT Count: " + stats.getEntityFetchCount());

Ideally, EntityFetchCount should be zero or very small in a batch insert.

Performance Benefits of Skipping SELECTs

Eliminating SELECTs during batch inserts can give you:

  • Fewer database IOPS (input/output operations per second) consumed per batch
  • Less contention and fewer locks on frequently used tables
  • Less per-operation network latency

According to DZone, batch performance can improve threefold or more by eliminating needless SELECTs.

In large company applications, skipping 1,000 SELECTs during a batch insert can mean:

  • 250ms saved per operation
  • Less JDBC connection time
  • Less downstream latency on the database

Edge Cases and Caveats

💣 Pitfalls to Avoid

  1. False Positives on isNew()

    • If isNew() incorrectly returns true for records that already exist, the insert fails with a duplicate-key violation.
  2. Composite Keys with Generated Columns

    • If any part of the key is database-generated (@GeneratedValue), this pattern does not apply, and IDENTITY generation also disables JDBC insert batching.
  3. Mixing Persistable with Merged Entities

    • The isNew flag is hard to manage correctly if the same entity class is used in both insert and update flows.
  4. Skipping Lifecycle Callbacks

    • @PrePersist and @PostPersist may behave differently if entities bypass the normal persist lifecycle (for example, when they go through merge()).

Extendability and Clean Code Patterns

To cut down on repeated code across many entities, use an abstract base class for all Persistable entities:

@MappedSuperclass
public abstract class AbstractPersistable<T> implements Persistable<T> {
    @Transient
    protected boolean isNew = true;

    public void setNew(boolean isNew) {
        this.isNew = isNew;
    }

    @Override
    public boolean isNew() {
        return isNew;
    }

    @PostPersist
    @PostLoad
    void markNotNew() {
        // Once persisted or loaded, the entity is no longer new.
        this.isNew = false;
    }
}

Now update your entity to extend this base:

@Entity
public class OrderItem extends AbstractPersistable<OrderItemId> {
    @EmbeddedId
    private OrderItemId id;

    private int quantity;

    @Override
    public OrderItemId getId() {
        return id;
    }
}

This makes code cleaner, consistent, and DRY (Don't Repeat Yourself).

Testing Strategy: Ensuring Safe Batch Inserts

Unit or integration tests can act like real batch inserts. You should check:

  • No duplicate entries
  • Foreign key rules stay in place
  • Row count in the database is the same as the number of saved entities

Here is a sample test:

@Test
@Transactional
public void verifyBatchInsertSkipsSelects() {
    List<OrderItem> items = prepareBatchOf100Items();
    orderItemRepository.saveAll(items);

    assertEquals(100, orderItemRepository.count());
}

Use query-inspection proxies such as datasource-proxy or p6spy to verify the SQL that actually reaches the database.
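With datasource-proxy, for example, you can wrap the DataSource and read the actual statement counts after the batch runs. This is a sketch assuming the net.ttddyy.dsproxy dependency is on the classpath; verify the API against the version you use:

```java
// Wrap the real DataSource so every statement is counted.
DataSource proxied = ProxyDataSourceBuilder
        .create(realDataSource)   // realDataSource: your configured DataSource
        .countQuery()             // enables QueryCountHolder bookkeeping
        .build();

// ... run the batch insert through the proxied DataSource ...

QueryCount counts = QueryCountHolder.getGrandTotal();
// Expect INSERTs, and ideally zero SELECTs from the save path.
System.out.println("INSERTs: " + counts.getInsert());
System.out.println("SELECTs: " + counts.getSelect());
QueryCountHolder.clear();
```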

Additional Optimization Tips

  • Use @Transactional at the method level to batch inserts in one session.
  • Call entityManager.flush() and entityManager.clear() every so often in loops with more than 1,000 records.
  • Make sure composite-key columns are backed by appropriate indexes.
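The flush/clear tip above usually takes the form of a chunked loop inside one transaction. A sketch, assuming an injected EntityManager; the chunk size should normally match hibernate.jdbc.batch_size:

```java
@Transactional
public void importItems(List<OrderItem> items) {
    final int batchSize = 50; // keep in sync with hibernate.jdbc.batch_size

    for (int i = 0; i < items.size(); i++) {
        entityManager.persist(items.get(i));

        if ((i + 1) % batchSize == 0) {
            // Push the pending INSERTs to the database as one JDBC batch...
            entityManager.flush();
            // ...then detach them so the persistence context stays small.
            entityManager.clear();
        }
    }
    // Any remaining entities are flushed when the transaction commits.
}
```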

When NOT to Use This Pattern

Think twice before using Persistable.isNew() if:

  • Data is already partly in the database
  • There are complicated rules for deciding if an entity is really new
  • The batch job has random inserts and updates

Instead, partition the workload into insert-only and update-only batches.

Clean, Controlled, and Efficient Batch Inserts in JPA

Spring Boot JPA batch insert performance suffers when entities use composite keys but give Hibernate no way to tell whether they are new. By implementing Persistable<T> and tuning the application's batch settings, developers can skip needless SELECTs, speed up inserts, and handle larger workloads.

This is not just a niche workaround. It is a significant performance win for any organization running large or real-time data-loading systems. Apply it carefully, and your systems (and users) will thank you.

Looking for more Spring Data performance tips? Join our newsletter and never miss a tip again.


