Run SQL Directly Against Files in Cloud Object Store with APEX

Run SQL Directly Against Files in Cloud Object Store with APEX

ยท

4 min read

Did you know that using APEX PL/SQL APIs, you can run SQL queries against flat files stored in Cloud Object Storage? In this short, I will take you through an example of how you can do this with Oracle Cloud Infrastructure (OCI) Object Store. The principles are the same for other object stores, e.g., AWS S3, Microsoft SharePoint, etc.

Pre-Requisites

  • You must have the root SSL certificate related to the Cloud Provider Object Store URL installed in your database wallet. This is taken care of for you if you use apex.oracle.com, Oracle Autonomous Database, or APEX Service.
  • You must have an APEX Web Credential that has read access to a bucket in OCI Object Store. See my post Secure APEX Access to OCI Object Storage for details on how to do this.

Setup

  • I downloaded a CSV file from Kaggle with International football โšฝ results from 1872 to 2022.
  • The file has 43,752 records and is 3.12 MB in size
  • I uploaded the CSV to an Oracle Cloud Infrastructure (OCI) Object Store Bucket called APEX_OCI_BLOG_FILES
  • The URL for the file in OCI is https://objectstorage.us-ashburn-1.oraclecloud.com/n/XXXX/b/APEX_OCI_BLOG_FILES/o/results.csv
  • I have an APEX web credential called APEX_OCI_BLOG_CREDENTIAL, which has read access to the OCI bucket

Oracle APEX PL/SQL APIs

We will work with two APEX PL/SQL APIs, which make querying files in cloud object stores possible. These are apex_data_parser and apex_web_service.

Determine the File Structure

The first thing we need to do is understand the file's structure.

SELECT * 
FROM   apex_data_parser.get_columns
         (apex_data_parser.discover
          (p_content => apex_web_service.make_rest_request_b
                         (p_url => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/XXXX/b/APEX_OCI_BLOG_FILES/o/results.csv',
                          p_http_method => 'GET',
                          p_credential_static_id => 'APEX_OCI_BLOG_CREDENTIAL'),
           p_max_rows  => 1000,
           p_file_name =>'test.csv' ))
  • The API apex_web_service.make_rest_request_b fetches the CSV file from OCI Object Storage and returns a BLOB. The Web Credential APEX_OCI_BLOG_CREDENTIAL is used to authenticate into OCI Object Storage
  • The result from apex_web_service.make_rest_request_b is passed into apex_data_parser.discover, which inspects the file and creates a JSON document describing the files structure
    • The name for p_file_name is not important, but the extension is. The extension is used to determine the file type that is being parsed
    • p_max_rows tells the API to only look at the first 1,000 records when inspecting the structure
  • The JSON from apex_data_parser.discover is passed into apex_data_parser.get_columns which generates a columnar representation of the JSON like this: Results from apex_data_parser.discover

Web Credentials can be used with apex_web_service outside of the context of an APEX Session, as long as the schema is mapped to an APEX Workspace. If the schema is mapped to multiple workspaces, you must first call APEX_UTIL.SET_WORKSPACE or APEX_UTIL.SET_SECURITY_GROUP_ID. See documentation for details.

Querying the File

Now that we know the file's structure, we can run queries against it. In the SQL below, I am fetching the win, loss, and draw record for Wales ๐Ÿด๓ ง๓ ข๓ ท๓ ฌ๓ ณ๓ ฟ. BTW, at the time of writing this post, it is 221 wins, 315 Losses, and 151 draws ๐Ÿ˜”

WITH scores AS 
 (SELECT line_number
  ,      TO_DATE(col001,'YYYY"-"MM"-"DD')  date_played
  ,      col002  home_team
  ,      col003  away_team
  ,      TO_NUMBER(col004)  home_score
  ,      TO_NUMBER(col005)  away_score
  ,      col006  tournament
  ,      col007  city
  ,      col008  country
  FROM   apex_data_parser.parse
               (p_content => apex_web_service.make_rest_request_b
                                (p_url => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/XXXX/b/APEX_OCI_BLOG_FILES/o/results.csv',
                                 p_http_method  => 'GET',
                                 p_credential_static_id => 'APEX_OCI_BLOG_CREDENTIAL'),
                p_skip_rows => 1,
                p_detect_data_types => 'N',
                p_file_name => 'test.csv'))
SELECT SUM(CASE 
             WHEN home_team = 'Wales' AND home_score > away_score THEN 1
             WHEN away_team = 'Wales' AND away_score > home_score THEN 1
             ELSE 0
            END)  number_of_wins
,       SUM(CASE 
             WHEN home_team = 'Wales' AND home_score < away_score THEN 1
             WHEN away_team = 'Wales' AND away_score < home_score THEN 1
             ELSE 0
            END)  number_of_losses
,       SUM(CASE 
             WHEN home_team = 'Wales' AND home_score = away_score THEN 1
             WHEN away_team = 'Wales' AND away_score = home_score THEN 1
             ELSE 0
            END)  number_of_draws
FROM   scores
WHERE  (home_team = 'Wales' OR away_team = 'Wales')
  • The BLOB returned from apex_web_service.make_rest_request_b is passed into apex_data_parser.parse, which parses the file and returns rows
  • apex_data_parser.parse returns a line number and generic columns col001 thru col300.
    • I have aliased the generic column names with the names found during discovery
    • p_skip_rows tells the API to skip the first row
    • p_detect_data_types tells the API not to inspect the data types in the file (we already know these from the previous step). This helps with performance
  • Read my previous post Easy file Parsing with the APEX_DATA_PARSER API for more details on the APEX_DATA_PARSER API

Conclusion

I showed you how to query a CSV file stored in Oracle Object Storage using APEX PL/SQL APIs. You can use the same APIs to query Excel, JSON, and XML files.

You can run these same SQL statements from APEX Components such as Interactive Reports, Classic Reports, etc. You can incorporate this technique into PL/SQL packages for more complex logic.

๐Ÿ”— Read More

ย