Bulk Address Validation & Geocoding with Oracle eLocation & APEX

Bulk Address Validation & Geocoding with Oracle eLocation & APEX

ยท

8 min read

Introduction

Much has been written about the native Map regions in APEX; most of it by Plamen Mushkov. APEX Map regions provide a Low Code way for visualizing your addresses (and related data) using SQL.

With the introduction of the Geocoded Address item type in APEX 21.2, you can also validate and geocode addresses during user input. This item type allows you to validate Geocode addresses before they enter your database.

There are many scenarios, however, where invalid addresses enter your database from other sources (e.g., integrations, bulk uploads, etc.), or you may already have invalid legacy addresses.

This post will focus on how we can use the same underlying technology as the APEX Geocoded Address item type, i.e., the Oracle eLocation APIs, to perform bulk validation and geocoding of addresses that are already in your database.

I will illustrate this with a Demo APEX App that allows you to upload an Excel spreadsheet of addresses, validate them using the eLocation API, and review the validation results. You can get the code for the Demo APEX App from my GitHub repository here.

Here is a video showing the Demo APEX App in action:

Example Use Case

There are many possible use cases for address validation and Geocoding. One that I am familiar with is validating addresses in Oracle e-Business Suite (EBS).

Many organizations have hundreds of thousands of Customer and Supplier addresses in EBS that have never been validated.

Having Validated and Geocoded addresses in EBS has many advantages, including the following:

  • Reduce mistakes in shipping
  • Improved logistics and route planning
  • Ability to Visualize customers and suppliers (and their related data) to make informed business decisions
  • Improve customer demographic awareness

Oracle eLocation API

The eLocation APIs are APIs that allow you to pass in an XML document containing addresses to validate. The API then analyzes these addresses and returns an XML or JSON response with the validation details and related Longitudes and Latitudes. The eLocation APIs do not require any authentication.

Postman Example Request

Let's see an example request using Postman. In this example, we are going to validate two addresses.

  • Endpoint https://elocation.oracle.com/geocoder/gcserver
  • Parameters
    • xml_request XML payload containing addresses to validate
    • format determines if you want an XML or JSON formatted response

Screenshot of Request

Postman_Example_1.png Postman_Example_2.png

Screenshot of the Response

Postman_Example_3.png

Example XML Payload

The XML below is from the sample request above.

<?xml version="1.0" encoding="UTF-8"?>
<geocode_request
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../geocode_request.xsd">
    <address_list>
        <input_location id="1101">
            <input_address match_mode="RELAX_POSTAL_CODE">
                <gdf_form name = "" street="3201+NATIONAL+AVENUE+" builtup_area="SAN+DIEGO" order1_area="CA" postal_code="92113" country="US"/>
            </input_address>
        </input_location>
        <input_location id="1102">
            <input_address match_mode="RELAX_POSTAL_CODE">
                <gdf_form name = "" street="5036+IMPERIAL+AVE+" builtup_area="SAN+DIEGO" order1_area="CA" postal_code="92113" country="US"/>
            </input_address>
        </input_location>
    </address_list>
</geocode_request>

I will now discuss some of the key fields from the above XML.

Address Representation. In the above example, I am using gdf_form to determine which address fields need to be passed in for validation. The gdf_form is of type gdf_formType. Here is an excerpt of the XSD showing all of the possible input fields for the type gdf_formType:

<xsd:complexType name="gdf_formType">
    <xsd:attribute name="name" type="xsd:string"/>
    <xsd:attribute name="street" type="xsd:string"/>
    <xsd:attribute name="intersecting_street" type="xsd:string"/>
    <xsd:attribute name="builtup_area" type="xsd:string"/>
    <xsd:attribute name="order8_area" type="xsd:string"/>
    <xsd:attribute name="order2_area" type="xsd:string"/>
    <xsd:attribute name="order1_area" type="xsd:string"/>
    <xsd:attribute name="country" type="xsd:string"/>
    <xsd:attribute name="postal_code" type="xsd:string"/>
    <xsd:attribute name="postal_addon_code" type="xsd:string"/>
  </xsd:complexType>

You can see details of other address formats by examining the XSD in the documentation Address Representation.

Match Mode. The match_mode field is another critical element of the XML payload. Match Mode determines how closely the attributes of an input address must match the addresses used by Oracle for Geocoding. Input addresses can include different ways of representing the same thing (such as Street and the abbreviation St), and they can consist of minor errors (such as the wrong postal code, even though the street address and city are correct and the street address is unique within the city).

In the above example, I use a match_mode of RELAX_POSTAL_CODE. Definition - The postal code (if provided), base name, house or building number, and street type can be different from the data used for geocoding.

You can learn more about the Match Modes here.

Location ID. The location_id field allows you to pass in a unique identifier for each address that is being validated. The same ID values are returned in the response making it easy to match input addresses with the Geocoded addresses.

Example XML Response

Now let's look at an example XML response. The response below was generated using the example XML payload above.

<?xml version="1.0" encoding="UTF-8"?>
<geocode_response>
    <geocode id="1101" match_count="1">
        <match sequence="0" longitude="-117.12509" latitude="32.69598" match_code="1"  error_message="??X?#ENUT?B281CP?" match_vector="??010101010??000?" srid="8307">
            <output_address name="" house_number="3201" street="National Ave" settlement="San Diego" builtup_area="San Diego" municipality="San Diego" order1_area="CA" order8_area="" country="US" language="ENG" postal_code="92113" postal_addon_code="" side="R" percent="0.88" edge_id="1175166931" />
        </match>
    </geocode>
    <geocode id="1102" match_count="1">
        <match sequence="0" longitude="-117.08563" latitude="32.70428" match_code="1"  error_message="??X?#ENUT?B281CP?" match_vector="??010101010??000?" srid="8307">
            <output_address name="" house_number="5036" street="Imperial Ave" settlement="San Diego" builtup_area="San Diego" municipality="San Diego" order1_area="CA" order8_area="" country="US" language="ENG" postal_code="92113" postal_addon_code="" side="L" percent="1.0" edge_id="1175765016" />
        </match>
    </geocode>
</geocode_response>

Now let's look at some of the key fields from the response.

Number of Matches. The match_count field indicates how many matches were found for the input address. Multiple potential addresses are not often returned, but it is possible, so you need to code for it.

Match Code . The match_code field is a number that summarizes which input address attributes matched the data used by Oracle for geocoding. You can learn more about match codes here.

You can use match_code to indicate to your users how close a match the input address was. For example, you may decide that Match Codes other than 1 or 2 indicate an invalid address that needs to be corrected.

  • Match Code 1 (Exact match: the city name, postal code, street base name, street type (and suffix or prefix or both, if applicable), and house or building number match the data used for geocoding)
  • Match Code 2 (The city name, postal code, street base name, and house or building number match the data used for geocoding, but the street type, suffix, or prefix does not match)

Match Vector . The field match_vector contains a string that indicates how each address attribute has been matched against the data used for geocoding. Use this field if you need to analyze which fields in the input address were matched. You can read more here.

Example JSON Response

Although the API only accepts XML input, you can specify a JSON response by setting the parameter format to JSON. Although more convenient than XML, the JSON response does not contain as much detail as the XML response.

[
    [
        {id: "1101",x: -117.12509,y: 32.69598,houseNumber: "3201",street: "National Ave",settlement: "San Diego",municipality: "San Diego",region: "CA",postalCode: "92113",country: "US",language: "ENG",name: "",edgeId: 1175166931,percent: 0.88,side: "R",matchVector: "??010101010??000?",accuracy: 1,matchCode: 1
        }
    ],
    [
        {id: "1102",x: -117.08563,y: 32.70428,houseNumber: "5036",street: "Imperial Ave",settlement: "San Diego",municipality: "San Diego",region: "CA",postalCode: "92113",country: "US",language: "ENG",name: "",edgeId: 1175765016,percent: 1.0,side: "L",matchVector: "??010101010??000?",accuracy: 1,matchCode: 1
        }
    ]
]

APEX Demo Application

Now that we better understand the API, we can incorporate it into our batch validation Demo APEX App. This section will focus on code in the Demo App, which can be downloaded here GitHub Repository.

Uploading Addresses

Page 1 allows you to upload an Excel Spreadsheet of Addresses to validate. I have included a sample Excel file in the GitHub Repository.

The procedure cndemo_addr_utl_pk.import_file is called from Page 1 and it parses the Excel file using apex_data_parser and inserts records into the table cndemo_addresses.

Processing Addresses

The APEX Automation validate-addresses calls the procedure cndemo_addr_utl_pk.validate_addressses to loop through batches of 10 PENDING addresses at a time. I have implemented this as an Automation so that it can run on a schedule in the background.

For each batch of addresses, the procedure call_elocation is called. This procedure calls the eLocation API, parses the XML response, and updates the table cndemo_addresses with match results. At the end of the procedure, any address records not returned by the eLocation API are marked as errors.

Timing

The process can validate up to 29 addresses per second. This level of performance may sound fast, but processing tens of thousands of addresses can take a while. If you need to process hundreds of thousands of addresses, I recommend using a paid address validation solution such as Smarty. The good news is you only need to validate an address once ๐Ÿ˜Š.

I tested the validation of 200 addresses using three different batch sizes. The optimal batch size appears to be about 50.

  • Batch Size: 10, Total Seconds: 9, Addresses Per Second: 22
  • Batch Size: 50, Total Seconds: 7, Addresses Per Second: 29
  • Batch Size: 100, Total Seconds: 7, Addresses Per Second: 29

Other Functionality

  • Page 22 of the Demo App maps a single address using the 'Display Map' item type. This item type expects GeoJSON as input. The function address_geojson fetches the longitude and latitude for a given address and returns the appropriate GeoJSON.
  • Page 30 was created using the 'Map' type in the Create Page Wizard. This page shows a heat map illustrating where the addresses are focused.

Conclusion

As we have seen in this post, valid addresses bring important benefits to your business. In addition, having accurately Geocoded addresses can turn your addresses (and related data) into valuable insights that can drive business decisions.

The Oracle eLocation APIs provide the ability for you to both validate and Geocode your addresses.

๐Ÿ”— Read More

ย