# The Robots are Coming for your ATP APEX Website

# Introduction

I recently used the APEX 22.2 Meta Tags feature to add [Open Graph](https://ogp.me/) tags to my APEX Developer Blogs website. Open Graph meta tags let you create fancy Tweets with a title and preview card, like the one below:

![Oracle APEX Developer Blogs](https://cdn.hashnode.com/res/hashnode/image/upload/v1703083363295/8323253f-f7ca-4426-aba6-8f630aa1993d.png align="center")

For more on APEX Meta Tags, check out [this post](https://blog.apexapplab.dev/make-your-app-stand-out-using-meta-tags-better-seo-and-shareability) from @[Plamen Mushkov](@plamen9).

My problem was that the title and card were not displayed when I entered the URL containing Meta Tags into Twitter, LinkedIn, etc.

<div data-node-type="callout">
<div data-node-type="callout-emoji">🤖</div>
<div data-node-type="callout-text">After some digging around, I discovered the issue was caused by <code>robots.txt</code>.</div>
</div>

At this point, I should say that the APEX Developer Blogs website is hosted on the Oracle OCI APEX Service (ATP Lite). I am also using a Vanity URL, which is another important factor.

Read [this post](https://tm-apex.hashnode.dev/boost-your-brand-identity-with-vanity-urls-for-oracle-autonomous-database-via-oci-load-balancer) from @[Timo Herwix](@timoherwix) to find out how to add a vanity URL to your OCI APEX Service or ATP APEX Apps.

# What is robots.txt

`robots.txt` is a text file webmasters use to communicate with web crawlers and other web robots, indicating which parts of a website should not be processed or scanned. This **file is placed in the website's root directory and is publicly accessible** to provide instructions based on the Robots Exclusion Standard. Here is a [link](https://developers.google.com/search/docs/crawling-indexing/robots/intro) to a Google guide to robots.txt.

You might wonder what robots.txt has to do with Meta Tags on social media sites. Twitter, for example, checks your robots.txt before scanning your URLs. Here is an excerpt from their [Cards Getting Started Guide](https://developer.twitter.com/en/docs/twitter-for-websites/cards/guides/getting-started).

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Twitter’s crawler respects Google’s <a target="_blank" rel="noopener noreferrer nofollow" class="reference external" href="https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt" style="pointer-events: none">robots.txt specification</a> when scanning URLs. If a page with card markup is blocked, no card will be shown. If an image URL is blocked, no thumbnail or photo will be shown.</div>
</div>

# The Issue

The issue is caused by the default robots.txt served by Oracle OCI, which looks like the one below. It tells every crawler that it may not crawl anything on the server.

```plaintext
User-agent: *
Disallow: /
```
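
To see the effect of these two lines without waiting on a social network's crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The page URL is a made-up placeholder, and the agent names are just examples of crawler user agents; this illustrates how a standards-following crawler reads the file, not how any particular crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# The default robots.txt shown above.
DEFAULT_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page URL, used for illustration only.
page = "https://www.yourdomain.com/ords/r/blogs/home"

for agent in ("Twitterbot", "LinkedInBot", "Googlebot"):
    print(agent, is_allowed(DEFAULT_ROBOTS_TXT, agent, page))
```

Every agent is reported as not allowed, which matches the symptom: the page with the Meta Tags is never read, so no card is shown.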

Even without Vanity URLs, if you have an Always Free ATP instance, you can see the default robots.txt by appending `/robots.txt` to the hostname of your Always Free generated URL:

![Oracle OCI Always Free Default robots.txt](https://cdn.hashnode.com/res/hashnode/image/upload/v1703086952108/2d991431-1182-47e6-9fe9-48b0cda721df.png align="center")

The second part of the issue is that because it is a fully managed solution, you cannot access the webserver to change the robots.txt file.

# The Solution

## Vlad's Solution

I found the solution in the comments of this [idea](https://apex.oracle.com/ideas/FR-3105) from the APEX Ideas App. The bullet points below are a direct copy and paste from **Vlad Uvarov's** response to the idea. Vlad is from the APEX Development team.

> So, the possible solutions (some of which were already mentioned in this Idea):
> 
> 1. A **web server** on Compute for the static website, including /robots.txt, or the **customer-managed ORDS** with its own docroot, fronted by the vanity URL LBaaS. This would be an overkill for just the *robots.txt* requirement, but customers may also be interested in other benefits these options bring. No extra cost (can use Always Free resources).
>     
> 2. An **ORDS RESTful module** that prints out the desired static or dynamic *robots.txt* directives, and an LBaaS Rule Set with the 302 URL redirect rule for the /robots.txt path. Crawlers usually follow these 3xx redirects (see [here](https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt#http-status-codes)). The same RESTful module can dynamically generate the Sitemap based on APEX dictionary views. No extra cost.
>     
> 3. A static *robots.txt* in a public **Object Storage** bucket, and an LBaaS Rule Set with the 302 URL redirect. Object Storage comes with a large number of [free requests](https://oracle.com/free), but even if exceeded, the cost would typically be under $1 per month.
>     
> 4. A **Web Application Firewall (WAF)** rule for [Request Access Control](https://docs.oracle.com/en-us/iaas/Content/WAF/AccessControl/create_access_rule_request_control.htm), which is applied to the /robots.txt path and performs the HTTP Response [Action](https://docs.oracle.com/en-us/iaas/Content/WAF/Actions/create_action.htm) to return your static *robots.txt* content as the response body with 200 OK status code. This essentially intercepts and overrides the default /robots.txt. Attach this WAF to your vanity URL LBaaS. No extra cost (but you need to use a Paid account and be within the [number of requests limit](https://www.oracle.com/security/cloud-security/web-application-firewall/)) and you get all benefits of WAF.
>     

## Implementing Option 3: Static robots.txt

I will focus on implementing option three, which I saw as the simplest solution. It depends on having a Vanity URL configured and on having access to the OCI Load Balancer on which that Vanity URL is configured.

### 1 - Create robots.txt

Using any text editor, create a file called robots.txt. Follow this [Google Guide](https://developers.google.com/search/docs/crawling-indexing/robots/intro) to find out more about the specifications.

TL;DR, my robots.txt now looks like this. It says that any crawler can crawl my whole site, and it tells crawlers where to find my sitemap.

```plaintext
User-agent: *
Disallow:

User-agent: AdsBot-Google
Allow: /

Sitemap: https://cloudnueva.com/ords/cnapps/website/sitemap
```
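
As a quick sanity check before uploading, the same standard-library parser can confirm that these directives really do open the site up. This is a sketch under assumptions: the page URL is a hypothetical example, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The same directives as the robots.txt above.
NEW_ROBOTS_TXT = """\
User-agent: *
Disallow:
User-agent: AdsBot-Google
Allow: /

Sitemap: https://cloudnueva.com/ords/cnapps/website/sitemap
"""

parser = RobotFileParser()
parser.parse(NEW_ROBOTS_TXT.splitlines())

# Hypothetical page URL, used for illustration only.
page = "https://cloudnueva.com/ords/r/cnapps/website/home"

# An empty "Disallow:" means nothing is disallowed for that group.
allowed = {
    agent: parser.can_fetch(agent, page)
    for agent in ("Twitterbot", "Googlebot", "AdsBot-Google")
}
sitemaps = parser.site_maps()  # list of Sitemap URLs, or None if there are none

print(allowed)
print(sitemaps)
```

If any agent comes back as not allowed, or the sitemap list is empty, the file is worth rechecking before it goes into the bucket.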

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">If you want to improve your APEX Site's SEO, you should also read <a target="_blank" rel="noopener noreferrer nofollow" href="https://www.insum.ca/search-engine-optimization-with-apex-creating-a-google-sitemap/" style="pointer-events: none">this post</a> from Insum, which shows you how to create a SiteMap using ORDS.</div>
</div>

### 2 - Upload robots.txt to a Public OCI Bucket

Create a Public Bucket in OCI:

![Public OCI Storage Bucket](https://cdn.hashnode.com/res/hashnode/image/upload/v1703087630970/f85c0641-e461-44d0-ac90-f87d7852be16.png align="center")

Upload your robots.txt to the bucket and save the URL somewhere:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1703088019942/b560bd1b-a625-4cda-bbb2-b0d7a70d29be.png align="center")

### 3 - Add a Redirect to the Load Balancer

From the OCI console, access the Load Balancer where your Vanity URL is configured:

![OCI Load Balancer](https://cdn.hashnode.com/res/hashnode/image/upload/v1703088244855/bdc3c1f1-880c-4540-ab0c-59bbf8511490.png align="center")

Then click the 'Rule sets' link to view the Rule sets you have configured.

![OCI Load Balancer Rule Sets](https://cdn.hashnode.com/res/hashnode/image/upload/v1703088223568/f75e575f-fd83-41f2-8872-cc30d2d96c7d.png align="center")

After selecting the appropriate Rule set, click on the 'URL redirect rules' link:

![OCI Load Balancer Rule Sets redirect rules](https://cdn.hashnode.com/res/hashnode/image/upload/v1703088407286/ba2d7bbd-6dba-4e3d-9aae-9c4080e539a7.png align="center")

Click Edit, and add the below rule:

![OCI Load Balancer redirect rule for robots.txt](https://cdn.hashnode.com/res/hashnode/image/upload/v1703088622810/b4ffef83-7d02-4cc8-b5b4-e7862bd018bf.png align="center")

* This rule tells the Load Balancer to redirect requests for `www.yourdomain.com/robots.txt` to the static file hosted in your OCI bucket.
    
* Note that the 'Hostname' and the 'Path' are broken into two parts in the rule.
    
* **Note**: If you have subdomains, e.g., apps.cloudnueva.com, then you need to set up redirect rules for them also.
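
Because the rule wants the host and the path separately, it can help to split the bucket URL you saved in step 2 rather than copying pieces by hand. The sketch below uses Python's `urllib.parse`; the URL is a made-up example in the usual Object Storage shape (the region, namespace, and bucket name are placeholders), and the dictionary keys are my own labels, not necessarily the exact field names in the console.

```python
from urllib.parse import urlsplit

# Hypothetical Object Storage URL; region, namespace and bucket are placeholders.
bucket_url = (
    "https://objectstorage.us-ashburn-1.oraclecloud.com"
    "/n/mynamespace/b/public-web/o/robots.txt"
)

parts = urlsplit(bucket_url)

# The pieces to enter into the redirect rule (labels are illustrative).
redirect_target = {
    "protocol": parts.scheme.upper(),
    "host": parts.hostname,
    "path": parts.path,
}

for field, value in redirect_target.items():
    print(f"{field}: {value}")
```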
    

### 4 - Test Your robots.txt

If all is well, you should be able to access `https://yourdomain.com/robots.txt` and be redirected to your robots.txt file. For example, `https://cloudnueva.com/robots.txt`

![Cloud Nueva robots.txt](https://cdn.hashnode.com/res/hashnode/image/upload/v1703088798834/c3e31306-5c38-4df9-a131-b51bbbe0ff9a.png align="center")
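
The same check can be scripted. The helper below is a minimal sketch using only the Python standard library; `fetch_robots` is a name I made up for illustration. `urlopen` follows 3xx redirects by default, so the final URL it reports should be the bucket URL rather than your own domain.

```python
from urllib.request import urlopen


def fetch_robots(site_url: str, timeout: float = 10.0) -> tuple:
    """Fetch <site_url>/robots.txt, following any redirects.

    Returns (final_url, http_status, body_text).
    """
    robots_url = site_url.rstrip("/") + "/robots.txt"
    with urlopen(robots_url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        # geturl() is the URL actually served after redirects were followed.
        return response.geturl(), response.status, body
```

For example, `fetch_robots("https://yourdomain.com")` should return a final URL that points at your Object Storage bucket, a status of 200, and the contents of the file you uploaded. If the body still shows `Disallow: /`, the redirect rule is probably not attached to the listener that serves that hostname.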

# Other Testing Tools

Here are some other tools that you may find useful.

## Google Search Console

If you are using the Google Search Console, verify that Google can read your robots.txt and check which version of the file it has fetched. Sign in to the Google Search Console, click Settings, then open the robots.txt report.

![Google Search Console Settings](https://cdn.hashnode.com/res/hashnode/image/upload/v1703089147228/04b5c007-633b-4ace-a8e3-5b3387562efe.png align="center")

Click on a specific file to see what Google thinks is in your robots.txt file. You can also submit a request for Google to recheck the robots.txt from here.

![Google Search Console robots.txt](https://cdn.hashnode.com/res/hashnode/image/upload/v1703089320931/119f7828-d3a6-4f39-965f-24c2f2c50d78.png align="center")

Another side effect of Google not being able to see the correct robots.txt file is that Google also won't read your sitemap. Once you have added your new robots.txt, verify and resubmit your sitemap under the 'Sitemaps' menu option.

![Google Search Console Sitemaps](https://cdn.hashnode.com/res/hashnode/image/upload/v1703089441686/43571817-484e-4eef-bd01-ca10da2ee286.png align="center")

## SEO Website Testing Tools

Various SEO websites have free tools that allow you to check that certain agents can access your robots.txt, e.g., [https://technicalseo.com/tools/robots-txt/](https://technicalseo.com/tools/robots-txt/)

![robots.txt testing tool screenshot](https://cdn.hashnode.com/res/hashnode/image/upload/v1703089858476/de40ebe5-ed99-4a9d-919b-206e7cc9f910.png align="center")

## Test Your Social Posts

Finally, back to the point of this post. Now that we have fixed our robots.txt, we should be able to see our fancy Meta Tag-driven cards when we post on Twitter and LinkedIn. The opengraph.xyz site has a nice little tool that not only tests your Meta Tags but also previews what your posts would look like on various platforms. You can access the tool by following [this link](https://www.opengraph.xyz/).

![Open Graph Testing Tool Screenshot](https://cdn.hashnode.com/res/hashnode/image/upload/v1703090147819/ccaafcd4-988e-43e3-be11-2b4d05945462.png align="center")

# Conclusion

I accept that this post covers a bit of an edge case, but as APEX continues to gain popularity, I fully expect people to start hosting APEX-based websites where SEO is important to them. Given this, we need to make sure APEX sites can match those created on other content creation platforms.

Hopefully, the APEX Development team takes this APEX Ideas App [idea](https://apex.oracle.com/ideas/FR-3105) seriously and makes it easier to deal with robots.txt (and sitemaps, for that matter).

## References

* [Insum Search Engine Optimization With APEX: Creating a Google Sitemap](https://www.insum.ca/search-engine-optimization-with-apex-creating-a-google-sitemap/)
    
* [Make your app stand out using Meta Tags - better SEO and shareability](https://blog.apexapplab.dev/make-your-app-stand-out-using-meta-tags-better-seo-and-shareability)
    
* [Boost Your Brand Identity with Vanity URLs for Oracle Autonomous Database via OCI Load Balancer](https://tm-apex.hashnode.dev/boost-your-brand-identity-with-vanity-urls-for-oracle-autonomous-database-via-oci-load-balancer)
    
* [Open Graph Meta Tag Testing Tool](https://www.opengraph.xyz/)
