Aggregations
OpenSearch
ElasticSearch
Elastic
AWS
Java
Data analytics
Navigating Complex Aggregations in Amazon OpenSearch with Java

by: Jerrish Varghese

February 05, 2024

titleImage

In the realm of data analytics, the ability to efficiently aggregate and analyze data can uncover valuable insights that drive strategic decision-making. Amazon OpenSearch Service, building upon the powerful features of Elasticsearch, offers robust capabilities for handling complex data operations. This blog post delves into executing complex aggregations on an OpenSearch index and demonstrates how to process these results using Java, focusing on integration within a Spring Boot application using Spring Data Elasticsearch.

Scenario: Analyzing Building Registry Data

Imagine you are tasked with analyzing data from a buildings index in OpenSearch. This index contains records for buildings, including details about their authorization status, data availability, and their current state (active, under construction, or demolished). The objective is to perform an aggregation that filters buildings based on an office code and ward year, then categorizes them by their authorization status, data availability, and overall status.

The OpenSearch Aggregation Query

To achieve this, we start by constructing an OpenSearch aggregation query. This query will filter documents based on the given office code and ward year, then aggregate results by ward ID, diving further into nested aggregations to analyze the data at a deeper level.

Here’s a sample query:

GET /buildings/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {"nested": {
          "path": "localBody",
          "query": {"term": {"localBody.officeCode": {"value": 111111}}}
        }},
        {"nested": {
          "path": "doors.ward",
          "query": {"term": {"doors.ward.wardYear": "2000"}}
        }}
      ]
    }
  },
  "aggs": {
    "doors_info": {
      "nested": {"path": "doors"},
      "aggs": {
        "nested_ward": {
          "nested": {"path": "doors.ward"},
          "aggs": {
            "filter_by_wardYear": {
              "filter": {"term": {"doors.ward.wardYear": "2000"}},
              "aggs": {
                "by_ward": {
                  "terms": {"field": "doors.ward.ward", "size": 10},
                  "aggs": {
                    "ward_name": {"terms": {"field": "doors.ward.name", "size": 1}},
                    "ward_number": {"terms": {"field": "doors.ward.wardNumber", "size": 1}},
                    "reverse_to_doors": {
                      "reverse_nested": {},
                      "aggs": {
                        "authorization_type": {
                          "nested": {"path": "authorizationType"},
                          "aggs": {"auth_type": {"terms": {"field": "authorizationType.name", "size": 10}}}
                        },
                        "data_availability": {"terms": {"field": "isDataAvailable", "size": 2}},
                        "building_status": {"terms": {"field": "status", "size": 4}}
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Executing the Query and Processing Results in Java

To execute this query and process the results in a Spring Boot application, we leverage the ElasticsearchOperations interface provided by Spring Data Elasticsearch.

Preparing the Aggregation Query in Java

We start by constructing our aggregation query using NativeSearchQueryBuilder:

private static NativeSearchQuery prepareAggregationQuery(Long officeCode, Integer wardYear) {
    TermQueryBuilder officeCodeQuery =
            QueryBuilders.termQuery("localBody.officeCode", officeCode);

    NestedQueryBuilder localBodyQuery =
            QueryBuilders.nestedQuery("localBody", officeCodeQuery, ScoreMode.None);

    TermQueryBuilder wardYearQuery = QueryBuilders.termQuery("doors.ward.wardYear", wardYear);

    FilterAggregationBuilder filterByWardYear =
            AggregationBuilders.filter("filter_by_wardYear", wardYearQuery)
                    .subAggregation(
                            AggregationBuilders.terms("by_ward")
                                    .field("doors.ward.ward")
                                    .size(1000)
                                    .subAggregation(
                                            AggregationBuilders.terms("ward_name")
                                                    .field("doors.ward.name")
                                                    .size(1))
                                    .subAggregation(
                                            AggregationBuilders.terms("ward_number")
                                                    .field("doors.ward.wardNumber")
                                                    .size(1))
                                    .subAggregation(
                                            new ReverseNestedAggregationBuilder(
                                                            "reverse_to_doors")
                                                    .subAggregation(
                                                            AggregationBuilders.nested(
                                                                            "authorization_type",
                                                                            "authorizationType")
                                                                    .subAggregation(
                                                                            AggregationBuilders
                                                                                    .terms(
                                                                                            "auth_type")
                                                                                    .field(
                                                                                            "authorizationType.name")
                                                                                    .size(10)))
                                                    .subAggregation(
                                                            AggregationBuilders.terms(
                                                                            "data_availability")
                                                                    .field(
                                                                            "isDataAvailable")
                                                                    .size(2))
                                                    .subAggregation(
                                                            AggregationBuilders.terms(
                                                                            "building_status")
                                                                    .field("status")
                                                                    .size(4))));

    NestedAggregationBuilder doorsInfoAgg =
            AggregationBuilders.nested("doors_info", "doors")
                    .subAggregation(
                            AggregationBuilders.nested("nested_ward", "doors.ward")
                                    .subAggregation(filterByWardYear));

    return new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.boolQuery().filter(localBodyQuery))
            .withAggregations(doorsInfoAgg)
            .build();
}

Executing the Query and Extracting Results

With the query prepared, we then execute it and process the results:

public List<BuildingCountAggregationResult> findAggregationsByOfficeCodeAndWardYear(Long officeCode, Integer wardYear) {
    NativeSearchQuery searchQuery = prepareAggregationQuery(officeCode, wardYear);
    SearchHits<BuildingRegistry> searchHits = elasticsearchOperations.search(searchQuery, BuildingRegistry.class);
    return extractResults(Objects.requireNonNull(searchHits.getAggregations()));
}

Processing Aggregation Results

The extractResults method navigates through the nested aggregation structure, extracting relevant data:

private List<BuildingCountAggregationResult> extractResults(Aggregations aggregations) {
    // Implement the logic to parse and map aggregation results based on the structure
    // demonstrated in the sample aggregation query result
}

This method would carefully traverse through the aggregation structure, mapping the extracted data into a list of BuildingCountAggregationResult objects, each representing summarized information for a ward.

Conclusion

Complex aggregations in Amazon OpenSearch open a gateway to deeply understanding your data. By leveraging Java and Spring Data Elasticsearch, developers can not only execute these powerful queries but also seamlessly integrate the insights gained into their applications. This blend of powerful search capabilities and programmatic access to aggregation results empowers developers to build richer, data-driven features that enhance the value of their software solutions.

In navigating the challenges of data analysis, OpenSearch and Spring Data together offer a robust toolkit for transforming raw data into actionable insights, proving indispensable in the modern data landscape.

contact us

Get started now

Get a quote for your project.
logofooter
title_logo

USA

Edstem Technologies LLC
254 Chapman Rd, Ste 208 #14734
Newark, Delaware 19702 US

INDIA

Edstem Technologies Pvt Ltd
Office No-2B-1, Second Floor
Jyothirmaya, Infopark Phase II
Ernakulam, Kerala 682303

© 2024 — Edstem All Rights Reserved

Privacy PolicyTerms of Use