Natural language over security data.

Security teams shouldn't need a SQL expert to query their scan results. The Query model (internally named NL-PGSQL) lets anyone on the team ask questions about findings, trends, and exposure in natural language, and receive a precise PostgreSQL query they can run against the AquilaX data store.

This powers the analytics layer of Securitron: when engineers ask questions in the chat interface about scan trends or repository risk rankings, the QnA model hands off data retrieval tasks to NL-PGSQL, which generates the appropriate query and returns structured results.

Model ID: AquilaX-AI/NL-PGSQL, available on HuggingFace. Base architecture: google/flan-t5-base. Task: text-to-SQL (natural language → PostgreSQL). Trained on AquilaX's security data schema.

Architecture.

NL-PGSQL is fine-tuned from google/flan-t5-base, a 250M-parameter encoder-decoder model from Google pre-trained on 1,800+ NLP tasks via instruction tuning. FLAN-T5's instruction-following capability makes it well-suited for constrained text generation tasks like SQL synthesis, where the output must follow strict grammatical rules.

Training details:

  • Base model: google/flan-t5-base (250M parameters, encoder-decoder)
  • Task: Seq2Seq (natural language question → PostgreSQL statement)
  • Training split: 90% train / 10% validation
  • Evaluation metric: SacreBLEU, which measures token-level overlap between generated and reference SQL
  • Training data: Paired NL/SQL examples drawn from AquilaX's security schema (findings, scans, repositories, organisations, severity distributions)

5-step inference pipeline.

Every natural language query goes through a standardised 5-step pipeline before the SQL statement is returned:

1. Preprocess
   Normalise the input: strip extra whitespace, lower-case, and resolve abbreviations (e.g. "crit" → "critical", "repo" → "repository").

2. Add task prefix
   Prepend the model's task instruction: "Translate the following text to PGSQL: ". This prefix activates the fine-tuned SQL generation behaviour.

3. Tokenise
   Encode the prefixed input with the FLAN-T5 tokeniser. The maximum input length is 512 tokens; longer inputs are truncated with a warning.

4. Generate
   Run encoder-decoder inference with beam search (beam width 4). The decoder generates SQL tokens one at a time until an EOS token or the maximum length is reached.

5. Decode
   Convert output token IDs back to text with skip_special_tokens=True, then post-process to ensure valid SQL syntax: validate brackets, aliases, and table name references.
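Steps 1 and 5 are plain string handling around the model call. The sketch below illustrates both with stdlib Python; the abbreviation map and the bracket check are illustrative assumptions, not the actual AquilaX post-processing code, which also validates aliases and table references.

```python
import re

# Hypothetical abbreviation map -- the real mapping used in the pipeline
# is not published; these entries illustrate step 1.
ABBREVIATIONS = {"crit": "critical", "repo": "repository", "repos": "repositories"}

def preprocess(question: str) -> str:
    """Step 1: normalise whitespace, lower-case, expand abbreviations."""
    text = re.sub(r"\s+", " ", question).strip().lower()
    return " ".join(ABBREVIATIONS.get(w, w) for w in text.split(" "))

def brackets_balanced(sql: str) -> bool:
    """Part of step 5: a minimal check that parentheses pair up."""
    depth = 0
    for ch in sql:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

print(preprocess("Show  crit findings per Repo"))
# -> show critical findings per repository
print(brackets_balanced("SELECT COUNT(*) FROM findings"))
# -> True
```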

Query examples.

Natural language:
  Show me all critical findings from the last 7 days, ordered by repository
Generated SQL:
  SELECT f.*, r.name AS repo_name FROM findings f JOIN repositories r ON f.repo_id = r.id WHERE f.severity = 'CRITICAL' AND f.created_at >= NOW() - INTERVAL '7 days' ORDER BY r.name ASC;

Natural language:
  Which repositories have the most unresolved high severity findings?
Generated SQL:
  SELECT r.name, COUNT(f.id) AS finding_count FROM findings f JOIN repositories r ON f.repo_id = r.id WHERE f.severity = 'HIGH' AND f.status != 'resolved' GROUP BY r.name ORDER BY finding_count DESC LIMIT 10;

Natural language:
  What is the average security score across all repositories in my organisation?
Generated SQL:
  SELECT AVG(security_score) AS avg_score FROM repositories WHERE org_id = $1 AND last_scanned_at IS NOT NULL;

Natural language:
  Count findings by CWE type for the past month
Generated SQL:
  SELECT cwe_id, cwe_name, COUNT(*) AS count FROM findings WHERE created_at >= NOW() - INTERVAL '30 days' GROUP BY cwe_id, cwe_name ORDER BY count DESC;
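Pairs like the ones above, combined with the task prefix from step 2, are the shape of the seq2seq training data. A minimal sketch of how such a record might be assembled is shown below; the field names are assumptions for illustration, as the actual dataset layout is not published.

```python
import json

TASK_PREFIX = "Translate the following text to PGSQL: "

def to_seq2seq_record(question: str, sql: str) -> dict:
    # Hypothetical record layout for a paired NL/SQL training example.
    return {"input_text": TASK_PREFIX + question, "target_text": sql}

pair = to_seq2seq_record(
    "Count findings by CWE type for the past month",
    "SELECT cwe_id, cwe_name, COUNT(*) AS count FROM findings "
    "WHERE created_at >= NOW() - INTERVAL '30 days' "
    "GROUP BY cwe_id, cwe_name ORDER BY count DESC;",
)
print(json.dumps(pair, indent=2))
```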

Running inference.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "AquilaX-AI/NL-PGSQL"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model.eval()

def nl_to_sql(question: str) -> str:
    # Steps 1-2: Preprocess and add the task prefix
    prefixed = f"Translate the following text to PGSQL: {question.strip()}"

    # Step 3: Tokenise (inputs beyond 512 tokens are truncated)
    inputs = tokenizer(
        prefixed,
        return_tensors="pt",
        max_length=512,
        truncation=True
    )

    # Step 4: Generate with beam search (no gradients needed at inference)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            num_beams=4,
            early_stopping=True
        )

    # Step 5: Decode, skipping special tokens
    sql = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return sql

# Example usage
question = "Which repositories have critical SQL injection findings?"
sql_query = nl_to_sql(question)
print(sql_query)
# Output: SELECT r.name FROM repositories r JOIN findings f ON f.repo_id = r.id
#         WHERE f.cwe_id = 89 AND f.severity = 'CRITICAL';

Evaluation and limitations.

The model is evaluated on a held-out 10% validation set using SacreBLEU, which measures how closely the generated SQL matches reference queries at the token level. A SacreBLEU score above 40 is generally considered good for text-to-SQL tasks.
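To make the token-overlap idea concrete, the toy function below computes a single modified n-gram precision over whitespace-tokenised SQL. This is only the core ingredient of BLEU-style scoring, not real SacreBLEU, which averages 1- to 4-gram precisions and applies smoothing and a brevity penalty.

```python
from collections import Counter

def ngram_precision(candidate: str, reference: str, n: int = 2) -> float:
    """Toy modified n-gram precision on whitespace tokens.

    Illustrates the token-overlap idea behind scoring generated SQL
    against a reference; this is NOT the actual SacreBLEU metric.
    """
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    if not cand_ngrams:
        return 0.0
    # Clip each candidate n-gram count by its count in the reference
    overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return overlap / sum(cand_ngrams.values())

gen = "SELECT name FROM repositories ORDER BY name ASC ;"
ref = "SELECT name FROM repositories ORDER BY name DESC ;"
print(ngram_precision(gen, ref, n=2))
# -> 0.75  (6 of 8 bigrams match; only ASC vs DESC differs)
```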

Scope limitation: NL-PGSQL is trained specifically on AquilaX's security data schema. It produces reliable queries for findings, repositories, scans, organisations, and severity data, but it should not be used as a general-purpose NL-to-SQL translator for arbitrary database schemas.

Generated queries should be reviewed before execution in production environments. The AquilaX platform runs all generated queries through a validation layer that checks for syntax errors, verifies table and column references, and enforces row-level security based on the requesting user's organisation scope.
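The platform's validation layer is internal, but its first line of defence might look like the minimal sketch below: accept only a single read-only SELECT over allowlisted tables. The table allowlist and regexes are illustrative assumptions; the real layer also verifies column references and enforces row-level security, neither of which is shown here.

```python
import re

# Hypothetical allowlist for illustration only.
ALLOWED_TABLES = {"findings", "repositories", "scans", "organisations"}
FORBIDDEN = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|GRANT)\b", re.I)

def is_safe_select(sql: str) -> bool:
    """Reject anything that is not a single read-only SELECT over known tables."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # more than one statement
        return False
    if not stripped.upper().startswith("SELECT"):
        return False
    if FORBIDDEN.search(stripped):
        return False
    tables = re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_]+)", stripped, re.I)
    return all(t.lower() in ALLOWED_TABLES for t in tables)

print(is_safe_select("SELECT * FROM findings WHERE severity = 'HIGH';"))
# -> True
print(is_safe_select("SELECT 1; DROP TABLE findings;"))
# -> False
```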

"The goal was never to replace SQL developers. It was to let a Head of Engineering ask 'how many critical findings were introduced this sprint?' without filing a ticket to the data team."