This is a submission for the Xano AI-Powered Backend Challenge: Production-Ready Public API
What I Built
I built LucideCrawl — a production-ready public API that allows developers to safely ingest and analyze web content at scale while protecting users from phishing, scams, and malicious sites.
LucideCrawl provides four core capabilities, all implemented entirely in Xano with robust authentication, per-user rate limiting, usage tracking, and audit logging:
- Phishing & Safety Detection – AI-powered, real-time evaluation of URLs to detect scams, impersonation, urgent threats, and security risks.
- Ask Questions About Web Pages – Extract clean content and answer natural language questions grounded in the page content. Perfect for RAG, summarization, or compliance checks.
- Sitemap-Based Bulk Ingestion – Crawl all pages from a sitemap.xml with include/exclude path filtering.
- Full Website Crawl – Depth-controlled crawling of entire websites with domain/path rules, delivering structured, clean data.
LucideCrawl is ideal for:
- Browser extensions and email tools needing instant phishing detection
- AI agents requiring safe, grounded web data
- Knowledge platforms building search indexes or SEO audits
- Security teams monitoring brand impersonation
All core logic, authentication, API key management, and rate limiting are built natively in Xano.
API Documentation
Base URL: https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx
Authentication:
- All endpoints require an x-api-key header.
- API keys are generated upon signup and displayed only once. Users can manage them in their account.
Rate Limits (monthly, per user):
| Plan | trust_scan | ask_the_page | load_sitemap | site_crawl |
|---|---|---|---|---|
| Free | 4 | 5 | 5 | 2 |
| Pro | 100 | 5,000 | 50 | 20 |
| Enterprise | 1,000 | 50,000 | 500 | 200 |
Core Endpoints:
POST /trust_scan – AI-powered URL safety scan
Input: { "url": "https://example.com" }
Returns: safety_score, safety_label, confidence_level, phishing_category, impersonated_brand, detected_threats, risk_factors, details, and user_action_recommendation.

POST /ask_the_page – Answer questions about a web page
Input: { "url": "...", "question": "..." }
Returns: Grounded AI answer with metadata.

POST /load_sitemap – Bulk page ingestion from sitemap.xml
Input: { "sitemap_url": "...", "include_paths": [...], "exclude_paths": [...] }
Returns: Array of structured page data.

POST /site_crawl – Depth-first crawl of a website
Input:
Input:
{
"base_url": "https://example.com",
"page_limit": 100,
"crawl_depth": 3,
"include_subdomains": false,
"follow_external_links": false,
"include_paths": ["/blog/", "/docs/"],
"exclude_paths": ["/login", "/checkout"]
}
Returns: Array of crawled pages in clean, structured format.
Each response includes a usage object with monthly consumption and remaining quota.
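For reference, here is a minimal TypeScript sketch of how a client might call the API and read that usage object. It assumes only what is documented above (the x-api-key header and the { success, data, usage } response shape); it is illustrative, not an official SDK.

```typescript
// Minimal client sketch. Assumes only the documented x-api-key header and
// the { success, data, usage } response envelope shown in the examples below.

const BASE_URL = "https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx";

async function trustScan(apiKey: string, url: string) {
  const res = await fetch(`${BASE_URL}/trust_scan`, {
    method: "POST",
    headers: {
      "x-api-key": apiKey,              // required on every endpoint
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url }),
  });
  if (!res.ok) {
    throw new Error(`trust_scan failed: ${res.status}`);
  }
  const payload = await res.json();
  // Every response carries a usage object with the monthly quota state.
  console.log(`Used ${payload.usage.used}/${payload.usage.limit} scans this month`);
  return payload.data;
}
```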
Demo
/trust_scan API
curl -X POST https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx/trust_scan \
-H "x-api-key: sk_your_key_here" \
-H "Content-Type: application/json" \
-d '{"url": "https://paypal-security-update-2025.com/login"}'
Response (simplified):
{
"success": true,
"data": {
"safety_score": 0.08,
"safety_label": "Danger",
"confidence_level": "high",
"phishing_category": "financial",
"impersonated_brand": "PayPal",
"detected_threats": [
"URGENT_ACTION_REQUIRED",
"FAKE_LOGIN_FORM"
],
"details": "This page mimics PayPal's login interface and uses urgency tactics to steal credentials.",
"user_action_recommendation": "Do not enter any information. Close immediately."
},
"usage": {
"month": "2025-12",
"used": 12,
"limit": 100,
"remaining": 88
}
}
/ask_the_page API
Ask a direct question about a webpage and receive an AI-generated explanation based on the page’s content and domain signals.
Example: Verify a banking login page
curl -X POST https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx/ask_the_page \
-H "x-api-key: sk_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"url": "https://bankofamerica-secure-login-2025.com",
"question": "Is this the real Bank of America login page?"
}'
Response (simplified):
{
"success": true,
"data": {
"url": "https://bankofamerica-secure-login-2025.com",
"question": "Is this the real Bank of America login page?",
"answer": "No, this is not a legitimate Bank of America page. The domain is not owned by Bank of America and uses urgency-driven login prompts commonly associated with phishing attacks.",
"page_title": "Bank of America Secure Login"
},
"usage": {
"month": "2025-12",
"used": 13,
"limit": 100,
"remaining": 87
}
}
/crawl_webpage API
Fetch and extract structured data from any public webpage.
This endpoint is useful for content analysis, AI training, indexing, or downstream security scans.
Example: Crawl a webpage
curl -X POST https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx/crawl_webpage \
-H "x-api-key: sk_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com"
}'
Response (simplified):
{
"success": true,
"data": {
"url": "https://example.com",
"raw_html": "<!doctype html>...</html>",
"clean_text": "This domain is for use in documentation examples without needing permission. Avoid use in operations. Learn more",
"metadata": {
"title": "Example Domain",
"description": null,
"keywords": null,
"canonical": null,
"language": "en"
},
"headings": [
{
"level": "h1",
"text": "Example Domain"
}
],
"links": [
{
"url": "https://iana.org/domains/example",
"text": "Learn more",
"is_external": true
}
],
"structured_page_json": {
"title": "Example Domain",
"url": "https://example.com",
"language": "en",
"clean_text": "This domain is for use in documentation examples without needing permission. Avoid use in operations. Learn more",
"metadata": {
"title": "Example Domain",
"description": null,
"keywords": null,
"canonical": null,
"language": "en"
},
"headings": [
{
"level": "h1",
"text": "Example Domain"
}
],
"links": [
{
"url": "https://iana.org/domains/example",
"text": "Learn more",
"is_external": true
}
]
}
},
"usage": {
"month": "2025-12",
"used": 24,
"limit": 100,
"remaining": 76
}
}
/site_crawl API
Crawl an entire website (or section of it) and return structured content from multiple pages in a single request.
Ideal for bulk analysis, AI training, indexing, and large-scale security scans.
Input Parameters
- url – Starting URL for the crawl
- pageLimit – Maximum number of pages to crawl
- crawlDepth – How deep the crawler should follow internal links
- includeSubdomains – Whether to include subdomains
- followExternalLinks – Whether to crawl external domains
- includePaths – Optional allowlist of URL paths
- excludePaths – Optional blocklist of URL paths
Example: Crawl a small website
curl -X POST https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx/site_crawl \
-H "x-api-key: sk_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"pageLimit": 3,
"crawlDepth": 1,
"includeSubdomains": false,
"followExternalLinks": false,
"includePaths": [],
"excludePaths": []
}'
Response (simplified):
{
"success": true,
"message": "Crawl completed successfully",
"crawl_id": 6,
"pages_crawled": 2,
"formattedData": [
{
"url": "https://httpbin.org/forms/post",
"title": null,
"content": "Customer name: Telephone: E-mail address: Pizza Size Small Medium Large...",
"metadata": {
"title": null,
"description": null,
"keywords": null,
"canonical": null,
"language": null
},
"headings": []
}
],
"crawledData": [
{
"url": "https://httpbin.org",
"raw_html": "<!DOCTYPE html>...</html>",
"clean_text": "A simple HTTP Request & Response Service.",
"metadata": {
"title": "httpbin.org",
"description": null,
"keywords": null,
"canonical": null,
"language": "en"
},
"headings": [
{
"level": "h2",
"text": "httpbin.org"
}
],
"links": [
{
"url": "https://github.com/requests/httpbin",
"is_external": true
}
]
}
],
"usage": {
"current": 49,
"limit": 50
}
}
/load_sitemap API
Load and process an XML sitemap, extract valid URLs, and crawl only the pages you care about using path filters.
Ideal for bulk ingestion, SEO analysis, AI training, and large-scale monitoring without full site crawling.
Input Parameters
- sitemap_url – Full URL to the sitemap XML
- include_paths – Optional list of URL path prefixes to allow
- exclude_paths – Optional list of URL path prefixes to block

include_paths and exclude_paths work together to precisely control which sitemap URLs are processed.
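Conceptually, the filtering behaves like the sketch below. It assumes simple path-prefix matching; the exact matching rules inside LucideCrawl may differ, so treat this as an illustration of intent rather than the implementation.

```typescript
// Conceptual sketch of include/exclude path filtering.
// Assumes simple prefix matching on the URL path; LucideCrawl's exact rules may differ.

function shouldProcess(
  url: string,
  includePaths: string[],
  excludePaths: string[],
): boolean {
  const path = new URL(url).pathname;
  // Exclusions always win.
  if (excludePaths.some((prefix) => path.startsWith(prefix))) return false;
  // An empty allowlist means "everything not excluded".
  if (includePaths.length === 0) return true;
  return includePaths.some((prefix) => path.startsWith(prefix));
}

// e.g. shouldProcess("https://example.com/blog/post-1", ["/blog/"], ["/blog/drafts/"]) === true
```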
Example: Load and filter a sitemap
curl -X POST https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx/load_sitemap \
-H "x-api-key: sk_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"sitemap_url": "https://octopus.do/sitemap.xml",
"include_paths": [
"/sitemap/changelog/"
],
"exclude_paths": []
}'
Response (simplified):
{
"success": true,
"message": "Sitemap processed successfully",
"sitemap_id": 4,
"data": {
"message": "Sitemap processed. Found 1 valid pages.",
"formattedData": [
{
"url": "https://octopus.do/sitemap/changelog",
"title": "Check out our Updates and Roadmap | Octopus.do",
"content": "November 28, 2025 Feature Export to Excel... Octopus core refactoring...",
"metadata": {
"title": "Check out our Updates and Roadmap | Octopus.do",
"description": "Our changelog includes new product features and development updates.",
"keywords": null,
"canonical": null,
"language": "en"
},
"headings": [
{ "level": "h1", "text": "Changelog" },
{ "level": "h2", "text": "Export to Excel" },
{ "level": "h2", "text": "Introducing Sitemap AI assistant BETA" }
]
}
],
"raw": [
{
"url": "https://octopus.do/sitemap/changelog",
"raw_html": "<!DOCTYPE html>...</html>",
"clean_text": "November 28, 2025 Feature Export to Excel...",
"metadata": {
"title": "Check out our Updates and Roadmap | Octopus.do",
"description": "Our changelog includes new product features and development updates.",
"keywords": null,
"canonical": null,
"language": "en"
},
"headings": [
{ "level": "h1", "text": "Changelog" }
],
"links": [
{
"url": "https://x.com/octopusdoHQ",
"text": "Follow for updates",
"is_external": true
}
]
}
]
},
"usage": {
"current": 70,
"limit": 70,
"remaining": 0
}
}
| Endpoint | Purpose |
|---|---|
| /crawl_webpage | Crawl and extract structured data from a single webpage |
| /site_crawl | Crawl multiple pages across a website with depth and path controls |
| /load_sitemap | Load and process URLs from an XML sitemap with smart filtering |
| /trust_scan | Detect phishing, scams, and impersonation signals |
| /ask_the_page | Explain page legitimacy and risks in clear, human-readable language |
Account & Key Management Endpoints
These endpoints allow developers to programmatically manage their access keys.
5. `GET /get_api_keys` – List Active Keys
Retrieves a paginated list of API keys for the authenticated user.
- Input: None (uses Auth Token).
- Returns: A list of keys with values masked (e.g., sk_123...890), creation date, and plan type.
6. `POST /generate_api_key` – Create New Key
Generates a new secure API key (24 random bytes, hex-encoded).
- Input: { "name": "Production App" } (optional label).
- Returns: The full API key string (shown only once), id, and name.
- Note: This action is logged in the event_log table for security auditing.
7. `DELETE /delete_api_key` – Revoke Key
Permanently deletes an API key.
- Input: { "key_id": 15 }
- Returns: Success message.
- Security: Validates that the key belongs to the authenticated user before deletion.
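A typical key-rotation flow using these endpoints might look like the sketch below. It assumes the account auth token is sent as a standard Authorization: Bearer header (Xano's usual convention); adjust if your workspace is configured differently.

```typescript
// Sketch of programmatic key rotation. Assumes a "Authorization: Bearer <token>"
// header for the account auth token; the endpoint bodies mirror the docs above.

const BASE_URL = "https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx";

async function rotateKey(authToken: string, oldKeyId: number) {
  const headers = {
    Authorization: `Bearer ${authToken}`,
    "Content-Type": "application/json",
  };

  // 1. Create a replacement key (the full value is returned only once).
  const created = await fetch(`${BASE_URL}/generate_api_key`, {
    method: "POST",
    headers,
    body: JSON.stringify({ name: "Production App (rotated)" }),
  }).then((r) => r.json());

  // 2. Revoke the old key once the new one is stored securely.
  await fetch(`${BASE_URL}/delete_api_key`, {
    method: "DELETE",
    headers,
    body: JSON.stringify({ key_id: oldKeyId }),
  });

  return created; // includes id, name, and the one-time full key string
}
```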
The AI Prompt I Used
“Build a production-ready backend for an application called LucideCrawl.
LucideCrawl is a Web Crawling + Content Extraction API that allows developers to crawl websites, extract clean data, and optionally run AI analysis using Google Gemini.
The backend must expose a secure public API with proper documentation, authentication, rate limiting, and usage logs.
Focus on practicality and simplicity—avoid overly complex ML/RAG pipelines.”
The AI produced strong foundations that I then refined for production readiness.
How I Refined the AI-Generated Code
I used the AI-generated Xano backend as a strong starting point, then refined and extended it to meet the needs of a production-ready, public API for website scanning and phishing detection. My goal was to align the generated structure with real-world usage patterns such as API access control, usage tracking, detailed scan results, and long-term data integrity.
Below are the key transformations I made, with examples where applicable.
1. API Key Management and Access Control
Initial state (AI-generated):
- Basic API key support tied to a user
Improvements made:
- Introduced a dedicated api_key table that supports:
  - Multiple API keys per user
  - A unique constraint on the key field
  - An optional name field so users can label keys (e.g., “Production”, “Testing”)
  - A plan_type field to support tiered access (free / pro)
api_key
- id
- user_id
- key (unique)
- name (optional)
- plan_type
- created_at
This makes API access more flexible, secure, and easier to manage for end users.
2. Monthly Usage Aggregation
Initial state:
- Request-level usage was logged, but not aggregated
Improvements made:
- Added an api_monthly_usage table to track request counts per user per month
- Enforced a unique constraint on (user_id, month) to guarantee one row per user per billing cycle
api_monthly_usage
- user_id
- month (YYYY-MM)
- call_count
This structure enables efficient rate limiting, analytics, and future billing logic without relying on expensive log scans.
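The aggregation pattern the table supports is sketched below in TypeScript pseudocode. The real logic lives in Xano's function stack, so the in-memory map here is only a stand-in for the api_monthly_usage table.

```typescript
// Sketch of the monthly aggregation pattern; the Map is a stand-in for the
// api_monthly_usage table and its unique (user_id, month) constraint.

interface MonthlyUsage {
  user_id: number;
  month: string;     // "YYYY-MM"
  call_count: number;
}

const usageTable = new Map<string, MonthlyUsage>();

function recordCall(userId: number, now: Date): MonthlyUsage {
  const month = now.toISOString().slice(0, 7);  // e.g. "2025-12"
  const key = `${userId}:${month}`;             // mirrors the unique constraint
  const row = usageTable.get(key) ?? { user_id: userId, month, call_count: 0 };
  row.call_count += 1;                          // one cheap counter update per request,
  usageTable.set(key, row);                     // no scan over raw usage logs
  return row;
}
```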
3. Usage Log Enhancements
Initial state:
- Core usage logging was already present
Improvements made:
- Extended the existing usage_log table by adding a url field
- Preserved all existing fields and historical data
This provides better visibility into how each endpoint is used and which URLs are being analyzed, improving traceability and auditing.
4. Detailed Scan Result Storage
Initial state:
- Scan results were stored at a high level
Improvements made:
- Expanded the scan_history table to store detailed analysis returned by the external scanning API
- Added only missing columns to ensure idempotent and non-destructive schema updates
New fields include:
- Safety score and label
- Confidence level
- Phishing category
- Detected threats and risk factors (JSON)
- User-facing explanations and recommendations
- External verification results
- Scan timestamp
This allows scan results to be both machine-readable and human-friendly.
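As a rough illustration, a stored scan record corresponds to a shape like the interface below, based on the fields listed above and the /trust_scan response example. The exact column names and types in the Xano table may differ.

```typescript
// Approximate shape of an expanded scan_history row, inferred from the field
// list above and the /trust_scan example; actual column names/types may differ.

interface ScanHistoryRecord {
  url: string;
  safety_score: number;              // 0 to 1, lower is riskier
  safety_label: string;              // e.g. "Danger"
  confidence_level: string;          // e.g. "high"
  phishing_category: string | null;  // e.g. "financial"
  impersonated_brand: string | null;
  detected_threats: string[];        // stored as JSON
  risk_factors: string[];            // stored as JSON
  details: string;                   // user-facing explanation
  user_action_recommendation: string;
  external_verification: unknown;    // external verification results
  scanned_at: string;                // scan timestamp
}
```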
5. Question & Answer History Tracking
New addition:
- Created an ask_history table to persist user questions, AI responses, and related metadata such as:
  - Model used
  - Scraped text length
  - Timestamp of the request
This provides a clear audit trail and supports analytics, debugging, and future model evaluation.
6. Crawl and Sitemap History Separation
Improvements made:
- Added a crawl_history table to store website crawl configurations and outcomes
- Added a sitemap_history table to track sitemap processing independently
By separating crawl and sitemap data, the backend remains clean, easier to query, and more adaptable as crawling features expand.
7. Schema Evolution and Safety
All schema changes were applied in a way that:
- Adds new capabilities without breaking existing functionality
- Avoids recreating tables or modifying existing columns
- Preserves historical data
This approach ensures smooth iteration while maintaining system stability.
The AI-generated backend provided a solid foundation in Xano. By thoughtfully extending it with additional tables, constraints, and fields, I tailored the system to support real-world API usage, detailed scan analysis, and long-term scalability—while staying fully aligned with Xano best practices.
Key improvements:
- Robust header handling: Case-insensitive x-api-key detection
- Per-endpoint rate limiting: Separate usage tracking for each endpoint
- Atomic usage counting: Prevents accidental overcharging on failed requests (see the sketch below)
- Comprehensive history tables: For auditing and dashboard support
- Plan-based dynamic limits: Free, Pro, Enterprise
- Long operation timeouts: Up to 600s for deep crawls
- Consistent response format: Always { success, data, usage }
These refinements ensure fairness, transparency, and a developer-friendly API.
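The per-endpoint limit check and "count only on success" pattern are sketched below in TypeScript pseudocode. The production version is implemented as Xano function-stack steps, not this code; the limits table simply mirrors the plan table earlier in this post.

```typescript
// Pseudocode sketch of the rate-limit check and atomic usage counting pattern;
// the real implementation is a Xano function stack, not application code.

type Plan = "free" | "pro" | "enterprise";
type Endpoint = "trust_scan" | "ask_the_page" | "load_sitemap" | "site_crawl";

// Monthly limits per plan and endpoint, mirroring the table above.
const LIMITS: Record<Plan, Record<Endpoint, number>> = {
  free:       { trust_scan: 4,     ask_the_page: 5,      load_sitemap: 5,   site_crawl: 2 },
  pro:        { trust_scan: 100,   ask_the_page: 5_000,  load_sitemap: 50,  site_crawl: 20 },
  enterprise: { trust_scan: 1_000, ask_the_page: 50_000, load_sitemap: 500, site_crawl: 200 },
};

async function handleRequest(
  plan: Plan,
  endpoint: Endpoint,
  used: number,                      // current monthly count for this user/endpoint
  doWork: () => Promise<unknown>,    // the actual crawl / scan / answer step
) {
  const limit = LIMITS[plan][endpoint];
  if (used >= limit) {
    return { success: false, error: "Monthly quota exceeded", usage: { used, limit, remaining: 0 } };
  }
  const data = await doWork();       // if this throws, the counter is never touched,
  const newUsed = used + 1;          // so failed requests are not billed
  return { success: true, data, usage: { used: newUsed, limit, remaining: limit - newUsed } };
}
```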
My Overall Experience Using Xano
Xano made it possible to build a fully featured, secure, public-facing API in a relatively short amount of time.
One of the most helpful aspects was the visual canvas and function stack, which clearly shows how data flows through each API—from authentication, to rate limiting, processing, logging, and response handling. While I didn’t have enough time to explore the visual builder as deeply as I would have liked due to the deadline, it was still very useful for understanding and structuring complex logic.
I relied mostly on XanoScript for implementation. As someone still gaining experience with it, I encountered several syntax and structure errors along the way. However, the Logic Assistant was extremely helpful in resolving these issues. It not only identified errors quickly but often suggested cleaner, more efficient ways to write the logic, which significantly improved the quality of my code and helped me learn faster.
Overall, Xano struck a great balance between visual tooling and low-level control. Even when working primarily with XanoScript, the platform provided enough guidance and tooling to move quickly, fix mistakes, and ship with confidence.
LucideCrawl is now live, helping developers build safer, smarter web-powered applications.
Demo: Trust Shield Chrome Extension
To showcase LucideCrawl in a real-world scenario, I built Trust Shield — a lightweight Chrome extension that uses the POST /trust_scan endpoint to protect users from phishing in real time.
Trust Shield turns LucideCrawl from a backend API into always-on, user-facing protection that runs quietly in the background while users browse.
What Trust Shield Does
On every page visit:
- The current URL is automatically scanned using LucideCrawl
- The extension badge updates instantly:
  - 🛡️ Green — Safe
  - ⚠️ Red — Potentially dangerous
- Risky sites trigger a Chrome notification with:
  - Safety label
  - Risk score
  - Clear next-step guidance
- Clicking the extension icon opens a popup with:
  - Safety score and confidence
  - Detected threats and risk factors
  - Impersonated brand (if applicable)
  - Human-readable explanation of the risk
The user never has to think about scanning; protection happens by default.
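A simplified sketch of the kind of background logic involved is shown below (Manifest V3 service worker, TypeScript with @types/chrome assumed). The actual extension source is linked in the next section; names, thresholds, and the icon path here are illustrative only.

```typescript
// Simplified sketch of Trust Shield's background logic (MV3 service worker).
// Illustrative only; see the linked repository for the real implementation.

const BASE_URL = "https://xmmh-djbw-xefx.n7e.xano.io/api:x9tl6bvx";

chrome.tabs.onUpdated.addListener(async (tabId, changeInfo, tab) => {
  if (changeInfo.status !== "complete" || !tab.url?.startsWith("http")) return;

  const { apiKey } = await chrome.storage.sync.get("apiKey"); // saved via the popup
  if (!apiKey) return;

  const res = await fetch(`${BASE_URL}/trust_scan`, {
    method: "POST",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ url: tab.url }),
  });
  const { data } = await res.json();

  const safe = data.safety_label !== "Danger";
  await chrome.action.setBadgeText({ tabId, text: safe ? "OK" : "!" });
  await chrome.action.setBadgeBackgroundColor({ tabId, color: safe ? "#2e7d32" : "#c62828" });

  if (!safe) {
    chrome.notifications.create({
      type: "basic",
      iconUrl: "icon.png", // hypothetical asset path
      title: `⚠️ ${data.safety_label} (score ${data.safety_score})`,
      message: data.user_action_recommendation,
    });
  }
});
```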
Source Code
Open source: trust-shield
Try It Yourself (≈ 2 Minutes)
1. Get an API key
Create a free LucideCrawl account via the POST /auth/signup endpoint using any REST client (Postman, cURL, etc.).
Copy the generated sk_... API key (shown once).
2. Install the Chrome extension
- Clone or download: trust-shield
- Open chrome://extensions/
- Enable Developer mode
- Click Load unpacked and select the project folder
3. Connect your API key
- Click the Trust Shield icon
- Paste your API key
- Click Save
4. Test it
- Visit a safe site → badge turns green 🛡️
- Visit a phishing test site → badge turns red ⚠️ and a warning appears
No refreshes, no manual scans — protection is automatic.




