Audio Agent

The Audio Agent provides automated audio transcription and intelligent question-answering capabilities. It processes audio files, generates transcriptions using AI-powered speech recognition, and enables users to query the audio content using natural language.

Base URL

/api/agents/audio_agent

Authentication

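All endpoints except the health check require authentication. Sign up at https://cloud.nextneural.ai to get your API key, then pass it as a Bearer token in the Authorization header, as in the request examples below.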
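As a minimal sketch of how the base URL and the Bearer header fit together, the Python helper below builds a request without sending it. The host is copied from the curl examples in this document; the function name build_request is hypothetical and not part of any official SDK.

```python
import json
import urllib.request

# Host taken from the curl examples in this document; path from "Base URL".
BASE_URL = "https://nextneural-api.superteams.ai/api/agents/audio_agent"


def build_request(method, path, token=None, body=None):
    """Build (but do not send) a request for an Audio Agent endpoint."""
    headers = {}
    if token is not None:
        # Same header the curl examples pass with -H "Authorization: Bearer ..."
        headers["Authorization"] = f"Bearer {token}"
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


req = build_request(
    "POST",
    "/process-audio",
    token="YOUR_TOKEN",
    body={"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"},
)
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would perform the actual call; keeping construction separate from sending makes the request easy to inspect first.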

How It Works

The Audio Agent performs comprehensive audio processing:

Audio Transcription: Converts audio files to text using AI-powered speech recognition
Intelligent Search: Finds relevant content from your audio based on natural language queries
Answer Generation: Provides contextual answers to questions about your audio content
Conversation Management: Maintains chat history and conversation context for seamless interactions

Endpoints

1. Health Check

Check if the Audio Agent service is running.

Endpoint: GET /health

Authentication: None required

Response:

{
  "status": "healthy",
  "service": "AUDIO Agent"
}
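A monitoring script might interpret the health response along these lines. This is only a sketch: is_healthy is a hypothetical helper, and it assumes the body has the shape shown above.

```python
import json


def is_healthy(response_text):
    """Return True when a /health response body reports a healthy service."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        # A non-JSON body (for example a proxy error page) counts as unhealthy.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


print(is_healthy('{"status": "healthy", "service": "AUDIO Agent"}'))
```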

2. Process Audio File

Process an audio file to generate a transcription. The file path is retrieved automatically from the knowledge base document.

Endpoint: POST /process-audio

Authentication: Required (scope: agent:audio)

Request Body:

{
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}

Parameters:

kb_document_id (required, string): UUID of the knowledge base document containing the audio file

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/process-audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
  }'

Response (New Processing):

{
  "message": "Audio file processed and stored successfully.",
  "already_processed": false,
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
  "transcript": "This is the full transcription of the audio file..."
}

Response (Already Processed):

{
  "message": "Audio already processed",
  "already_processed": true,
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
  "transcript": "This is the full transcription of the audio file...",
  "processed_date": "2025-01-14T10:30:00"
}

Notes:

The audio file path is automatically fetched from the knowledge base document
Document ownership is verified before processing
If audio was already processed, returns cached result without re-processing
High-accuracy AI transcription ensures quality results
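The two response shapes above differ only in already_processed and processed_date, so a client can handle both in one place. The following Python sketch assumes exactly those field names; summarize_process_response is a hypothetical helper.

```python
from datetime import datetime


def summarize_process_response(payload):
    """Reduce a /process-audio response to the fields a caller usually needs."""
    cached = bool(payload.get("already_processed"))
    processed_at = None
    if cached and payload.get("processed_date"):
        # processed_date only appears in the "Already Processed" response.
        processed_at = datetime.fromisoformat(payload["processed_date"])
    return {
        "cached": cached,
        "processed_at": processed_at,
        "transcript_chars": len(payload.get("transcript", "")),
    }


example = {
    "message": "Audio already processed",
    "already_processed": True,
    "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
    "transcript": "This is the full transcription of the audio file...",
    "processed_date": "2025-01-14T10:30:00",
}
print(summarize_process_response(example))
```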

3. Re-parse Audio File

Re-process an already processed audio file by deleting old data and re-transcribing.

Endpoint: POST /reparse-audio

Authentication: Required (scope: agent:audio)

Request Body:

{
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}

Parameters:

kb_document_id (required, string): UUID of the knowledge base document containing the audio file

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/reparse-audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
  }'

Response:

{
  "message": "Audio file re-parsed and stored successfully.",
  "reparsed": true,
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
  "transcript": "This is the newly generated transcription..."
}

Notes:

Deletes existing audio record and re-processes from scratch
Useful when transcription quality was poor or audio was updated
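Because /process-audio and /reparse-audio take the same body, a client can pick between them with a single flag. The sketch below is illustrative only: choose_endpoint and its force parameter are hypothetical names, and the UUID check is a client-side convenience rather than something the API requires.

```python
import uuid


def choose_endpoint(kb_document_id, force=False):
    """Return (path, body); force=True selects /reparse-audio."""
    # uuid.UUID raises ValueError when the string is not a valid UUID,
    # which catches typos before a request is made.
    uuid.UUID(kb_document_id)
    path = "/reparse-audio" if force else "/process-audio"
    return path, {"kb_document_id": kb_document_id}


print(choose_endpoint("550e8400-e29b-41d4-a716-446655440000", force=True))
```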

4. Ask Question (RAG Query)

Query the audio content using natural language. The system retrieves relevant chunks and generates contextual answers.

Endpoint: POST /ask_audio

Authentication: Required

Request Body:

{
  "question": "What were the main topics discussed in the meeting?",
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}

Parameters:

question (required, string): Natural language question about the audio content
kb_document_id (required, string): UUID of the knowledge base document to query

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/ask_audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What were the main topics discussed?",
    "kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
  }'

Response:

{
  "answer": "Based on the audio transcription, the main topics discussed were: 1) Project timeline and milestones, 2) Budget allocation for Q2, 3) Team resource planning, and 4) Client feedback on the prototype.",
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}

Supported Query Types:

Specific questions: "What is the project deadline?"
Summary requests: "Give me a summary of the audio"
Full transcript: "Show me the complete transcript"
Key highlights: "What are the important points?"
Topic exploration: "What topics are covered?"

Notes:

Intelligent search finds the most relevant content from your audio
Generates accurate, contextual answers
Handles greetings and casual conversation naturally
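The retrieval method the service uses is not documented here. Purely as a toy illustration of the general idea of retrieving relevant chunks before answering, the Python sketch below ranks transcript chunks by word overlap with the question. The functions, the scoring, and the sample chunks are all assumptions made for this example and should not be read as the Audio Agent's implementation.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question, chunks, k=2):
    """Return up to k chunks that share at least one word with the question."""
    query_words = tokenize(question)
    scored = [(len(query_words & tokenize(chunk)), index)
              for index, chunk in enumerate(chunks)]
    # Highest overlap first; ties keep their original order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [chunks[index] for score, index in scored[:k] if score > 0]


chunks = [
    "The project deadline is the end of March.",
    "Budget allocation for Q2 was discussed next.",
    "The client liked the prototype.",
]
print(top_chunks("What is the project deadline?", chunks, k=1))
```

Production systems typically use embeddings rather than raw word overlap, which is one reason a rephrased question can surface different content.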

5. Create Conversation

Create a new conversation session for chat history tracking.

Endpoint: POST /conversations/create

Authentication: Required

Request Body:

{
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Meeting Discussion"
}

Parameters:

kb_document_id (optional, string): UUID of the knowledge base document
title (optional, string): Custom title for the conversation

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/create" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
    "title": "Q4 Planning Meeting"
  }'

Response:

{
  "id": 789,
  "user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Q4 Planning Meeting",
  "started_at": "2025-01-14T10:30:00",
  "last_message_at": "2025-01-14T10:30:00"
}

Notes:

If kb_document_id is provided without audio_id, the system finds the most recent audio for that document
Document ownership is verified
Conversations track chat history and context

6. Add Message to Conversation

Add a user or assistant message to an existing conversation.

Endpoint: POST /conversations/{conversation_id}/messages

Authentication: Required

Path Parameters:

conversation_id (required, integer): ID of the conversation

Request Body:

{
  "conversation_id": 789,
  "role": "user",
  "content": "What were the action items?"
}

Parameters:

conversation_id (required, integer): ID of the conversation
role (required, string): Either "user" or "assistant"
content (required, string): Message content

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789/messages" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "conversation_id": 789,
    "role": "user",
    "content": "What were the action items?"
  }'

Response:

{
  "id": "a1b2c3d4-e5f6-5678-90ab-cdef12345678",
  "conversation_id": 789,
  "role": "user",
  "content": "What were the action items?",
  "timestamp": "2025-01-14T10:35:00"
}

Notes:

Conversation must belong to the authenticated user
Updates conversation's last_message_at timestamp
Messages are ordered by timestamp
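Note that conversation_id appears both in the path and in the body. A small client-side builder can keep the two in sync and reject roles other than the two listed above. This is a sketch with hypothetical names, not an official client.

```python
ALLOWED_ROLES = {"user", "assistant"}


def message_request(conversation_id, role, content):
    """Return (path, body) for POST /conversations/{conversation_id}/messages."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {sorted(ALLOWED_ROLES)}")
    if not content:
        raise ValueError("content must not be empty")
    path = f"/conversations/{conversation_id}/messages"
    # The same ID goes into the body, mirroring the documented request.
    body = {"conversation_id": conversation_id, "role": role, "content": content}
    return path, body


print(message_request(789, "user", "What were the action items?"))
```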

7. Get Conversation History

Retrieve all conversations for the authenticated user.

Endpoint: GET /conversations/history

Authentication: Required

Query Parameters:

limit (optional, integer, default: 100): Maximum number of conversations to return

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/history?limit=50" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

[
  {
    "id": 789,
    "fileName": "meeting_recording.mp3",
    "analyzedAt": "2025-01-14T10:35:00",
    "duration": "5 messages",
    "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000"
  },
  {
    "id": 788,
    "fileName": "Q3 Review",
    "analyzedAt": "2025-01-13T15:20:00",
    "duration": "12 messages",
    "kbDocumentId": "660e8400-e29b-41d4-a716-446655440001"
  }
]

Notes:

Returns conversations in reverse chronological order (newest first)
The "duration" field holds the conversation's message count (for example "5 messages"), not the audio length
Displays audio filename or KB document title
Only returns user's own conversations
Internal IDs are not exposed for security
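Since the message count arrives as text inside "duration", a client that wants a number has to parse it. The Python sketch below assumes the "N messages" format shown in the sample response; parse_history_entry is a hypothetical helper.

```python
import re
from datetime import datetime


def parse_history_entry(entry):
    """Convert one item of the /conversations/history array to typed values."""
    match = re.match(r"(\d+)", entry.get("duration", ""))
    return {
        "id": entry["id"],
        # fileName may hold an audio filename or a KB document title.
        "label": entry["fileName"],
        "analyzed_at": datetime.fromisoformat(entry["analyzedAt"]),
        "message_count": int(match.group(1)) if match else None,
        "kb_document_id": entry.get("kbDocumentId"),
    }


sample = {
    "id": 789,
    "fileName": "meeting_recording.mp3",
    "analyzedAt": "2025-01-14T10:35:00",
    "duration": "5 messages",
    "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
}
print(parse_history_entry(sample))
```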

8. Get Specific Conversation

Retrieve a specific conversation with all its messages.

Endpoint: GET /conversations/{conversation_id}

Authentication: Required

Path Parameters:

conversation_id (required, integer): ID of the conversation to retrieve

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "id": 789,
  "user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Q4 Planning Meeting",
  "started_at": "2025-01-14T10:30:00",
  "last_message_at": "2025-01-14T10:35:00",
  "audio": {
    "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
    "file_name": "audio_550e8400-e29b-41d4-a716-446655440000"
  },
  "messages": [
    {
      "id": "a1b2c3d4-e5f6-5678-90ab-cdef12345678",
      "role": "user",
      "content": "What were the action items?",
      "timestamp": "2025-01-14T10:35:00"
    },
    {
      "id": "b2c3d4e5-f6a7-6789-01bc-def123456789",
      "role": "assistant",
      "content": "The action items mentioned were...",
      "timestamp": "2025-01-14T10:35:05"
    }
  ]
}

Notes:

Only the conversation owner can access it
Returns 404 if the conversation doesn't exist or access is denied
Messages are ordered chronologically

9. Delete Conversation

Delete a conversation and all its messages.

Endpoint: DELETE /conversations/{conversation_id}

Authentication: Required

Path Parameters:

conversation_id (required, integer): ID of the conversation to delete

Request Example:

curl -X DELETE "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "success": true,
  "message": "Conversation deleted successfully"
}

Notes:

Only the conversation owner can delete it
All messages are cascade deleted
Returns 404 if the conversation doesn't exist or access is denied

Data Models

AudioInfo Structure

{
  "id": 456,
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
  "file_name": "audio_550e8400-e29b-41d4-a716-446655440000",
  "file_size": 5242880,
  "total_character": 15000,
  "full_text": "Complete transcription text...",
  "date_time": "2025-01-14T10:30:00",
  "user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Field Descriptions:

id: Unique identifier for the audio record (integer)
kb_document_id: UUID reference to knowledge base document (string)
file_name: Internal audio filename (string)
file_size: Size of transcript file in bytes (integer)
total_character: Total character count in transcript (integer)
full_text: Complete transcription text (string)
date_time: Processing timestamp (ISO 8601 string)
user_id: UUID of the owner (string)

Conversation Structure

{
  "id": 789,
  "user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Meeting Discussion",
  "started_at": "2025-01-14T10:30:00",
  "last_message_at": "2025-01-14T10:35:00",
  "is_active": true
}

Message Structure

{
  "id": "a1b2c3d4-e5f6-5678-90ab-cdef12345678",
  "conversation_id": 789,
  "role": "user",
  "content": "What were the action items?",
  "timestamp": "2025-01-14T10:35:00"
}
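For clients that prefer typed objects, the Conversation and Message structures above could be mirrored as Python dataclasses. This is a sketch that assumes the field names shown; the class layout and the choice of which fields are optional are illustrative, not mandated by the API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    id: str
    role: str
    content: str
    timestamp: datetime
    # Absent on messages nested inside a conversation response.
    conversation_id: Optional[int] = None

    @classmethod
    def from_json(cls, data):
        return cls(
            id=data["id"],
            role=data["role"],
            content=data["content"],
            timestamp=datetime.fromisoformat(data["timestamp"]),
            conversation_id=data.get("conversation_id"),
        )


@dataclass
class Conversation:
    id: int
    user_id: str
    kb_document_id: Optional[str]
    title: Optional[str]
    started_at: datetime
    last_message_at: datetime
    # Not present in every response shown in this document.
    is_active: Optional[bool] = None

    @classmethod
    def from_json(cls, data):
        return cls(
            id=data["id"],
            user_id=data["user_id"],
            kb_document_id=data.get("kb_document_id"),
            title=data.get("title"),
            started_at=datetime.fromisoformat(data["started_at"]),
            last_message_at=datetime.fromisoformat(data["last_message_at"]),
            is_active=data.get("is_active"),
        )
```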

Error Responses

All endpoints may return the following error responses:

400 Bad Request:

{
  "detail": "kb_document_id is required."
}

400 Bad Request (Invalid Document):

{
  "detail": "Document with UUID 550e8400-e29b-41d4-a716-446655440000 does not exist in knowledgebase"
}

400 Bad Request (Access Denied):

{
  "detail": "Access denied. Document 550e8400-e29b-41d4-a716-446655440000 does not belong to user a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

404 Not Found:

{
  "detail": "File path not found for KB document."
}

404 Not Found (Conversation):

{
  "detail": "Conversation not found or access denied"
}

500 Internal Server Error:

{
  "detail": "Transcription failed: [error message]"
}
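Every error body above carries a single "detail" string, so a client can funnel them into one exception type. The sketch below is hypothetical: AudioAgentError and parse_error are illustrative names, and treating 5xx responses as retryable is an assumption rather than documented behavior.

```python
import json


class AudioAgentError(Exception):
    """Carries the HTTP status and the "detail" message from an error body."""

    def __init__(self, status, detail):
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail

    @property
    def retryable(self):
        # Assumption: only server-side (5xx) failures are worth retrying.
        return self.status >= 500


def parse_error(status, body_text):
    """Build an AudioAgentError, falling back to the raw body if needed."""
    try:
        detail = json.loads(body_text).get("detail", body_text)
    except (json.JSONDecodeError, AttributeError):
        # Body was not JSON, or was JSON but not an object.
        detail = body_text
    return AudioAgentError(status, detail)


err = parse_error(404, '{"detail": "Conversation not found or access denied"}')
print(err.status, err.detail, err.retryable)
```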

Best Practices

Audio Quality Requirements

Format: MP3, WAV, M4A, or other common audio formats
Duration: Any length (longer files take more time to process)
Audio Quality: Clear speech, minimal background noise
Language: English (primary), with support for multiple languages
Bitrate: 128 kbps or higher recommended

Recording Guidelines

Use a good quality microphone
Record in a quiet environment
Speak clearly and at moderate pace
Avoid overlapping speech in multi-speaker scenarios
Keep audio files under 100 MB for optimal processing
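The format and size guidance above can be checked before upload. The following Python sketch is illustrative: preflight is a hypothetical helper, the extension set covers only the three formats named in this document, and reading 100 MB as 100 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import PurePath

# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
RECOMMENDED_MAX_BYTES = 100 * 1024 * 1024
# Only the formats named explicitly above; other common formats may also work.
NAMED_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def preflight(file_name, size_bytes):
    """Return a list of warnings for a file about to be uploaded."""
    warnings = []
    suffix = PurePath(file_name).suffix.lower()
    if suffix not in NAMED_EXTENSIONS:
        warnings.append(f"{suffix or 'no extension'}: not one of the named formats")
    if size_bytes >= RECOMMENDED_MAX_BYTES:
        warnings.append("file is at or above the recommended 100 MB")
    return warnings


print(preflight("meeting_recording.mp3", 5242880))
```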

Query Best Practices

Specific Questions: Ask direct questions for precise answers
Summary Requests: Use keywords like "summary", "overview", "main points"
Full Transcript: Request "complete transcript" or "everything said"
Contextual Queries: Reference specific topics or speakers when possible
Follow-up Questions: Use conversations to maintain context

Integration Workflow

# 1. Upload audio to media directory (via your file upload system)
# Audio file is linked to a knowledge base document with UUID

# 2. Process the audio file
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/process-audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
  }'

# 3. Create a conversation
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/create" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
    "title": "Meeting Analysis"
  }'

# 4. Ask questions about the audio
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/ask_audio" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What were the main action items?",
    "kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
  }'

# 5. Retrieve conversation history
curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/history" \
  -H "Authorization: Bearer YOUR_TOKEN"
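The same workflow can be sketched in Python with only the standard library. The code below builds the requests for steps 2 to 5 in order and provides a separate send function for executing them. The function names are hypothetical, and the 300-second default timeout is an assumption based on the upper transcription estimate in this document.

```python
import json
import urllib.request

# Host and path copied from the curl commands above.
BASE_URL = "https://nextneural-api.superteams.ai/api/agents/audio_agent"


def _request(method, path, token, body=None):
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def workflow_requests(token, kb_document_id, question, title="Meeting Analysis"):
    """Return the requests for steps 2-5, in order, without sending them."""
    doc = {"kb_document_id": kb_document_id}
    return [
        _request("POST", "/process-audio", token, doc),
        _request("POST", "/conversations/create", token, {**doc, "title": title}),
        _request("POST", "/ask_audio", token, {**doc, "question": question}),
        _request("GET", "/conversations/history", token),
    ]


def send(request, timeout=300):
    """Send one request and decode its JSON body (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send on each request in turn reproduces the curl sequence; step 1 (uploading the file) happens outside this API and is not covered.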

Performance Considerations

Transcription Time:
- Short audio (< 5 min): 10-30 seconds
- Medium audio (5-20 min): 30-90 seconds
- Long audio (> 20 min): 1-5 minutes
Query Response Time: Typically 1-3 seconds for standard queries
Transcript Size: Each minute of audio yields approximately 150-200 words of transcript
Caching: Processed audio is cached; use /reparse-audio to force re-processing
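These figures can be turned into rough client-side estimates, for example to choose a request timeout. The sketch below simply encodes the ranges listed above; the function names are hypothetical and the results are estimates, not guarantees.

```python
def expected_transcription_seconds(audio_minutes):
    """Return the (low, high) transcription time estimate in seconds."""
    if audio_minutes < 5:
        return (10, 30)
    if audio_minutes <= 20:
        return (30, 90)
    return (60, 300)  # 1-5 minutes for long audio


def expected_word_range(audio_minutes):
    """Return the (low, high) transcript word estimate at 150-200 words/min."""
    return (round(150 * audio_minutes), round(200 * audio_minutes))


print(expected_transcription_seconds(12), expected_word_range(12))
```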

Security Features

User Isolation: All queries are private to your account
Document Ownership: Only you can access your documents
Authentication: All endpoints except the health check require a valid API token
Conversation Privacy: Your conversations are completely private
UUID-based Identification: Secure, non-sequential identifiers for documents and messages
No Internal ID Exposure: Internal database IDs are never exposed in API responses

Troubleshooting

Transcription Issues

Problem: Poor transcription quality

Cause: Background noise, unclear speech, low audio quality
Solution: Re-record with better audio quality, then call the /reparse-audio endpoint to regenerate the transcript

Problem: Transcription failed

Cause: Unsupported audio format, corrupted file, API issues
Solution: Convert to MP3/WAV, verify file integrity, check API key configuration

Query Issues

Problem: "No relevant information found"

Cause: Query doesn't match audio content, audio not processed
Solution: Verify audio was processed, rephrase query, ask for summary first

Problem: Incomplete answers

Cause: Query doesn't match available content well
Solution: Ask more specific questions, request full transcript

Performance Issues

Problem: Slow transcription

Cause: Large audio file
Solution: Split large files, retry if timeout occurs

Problem: Slow query responses

Cause: Complex query or large audio file
Solution: Use more specific queries
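The "retry if timeout occurs" advice can be implemented with exponential backoff. The Python sketch below is a generic pattern, not part of the API: retry_on_timeout is a hypothetical helper, it assumes the caller's HTTP layer raises TimeoutError, and the attempt count and delays are arbitrary defaults.

```python
import time


def retry_on_timeout(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on TimeoutError wait and retry, doubling the delay."""
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last timeout.
            sleep(base_delay * 2 ** attempt)
```

Because already processed audio is returned from cache, retrying /process-audio after a client-side timeout should not trigger a second transcription once the first one has completed.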

Limitations

Language Support: Optimized for English; other languages may have reduced accuracy
Audio Length: Very long files (> 2 hours) may have processing delays

Future Enhancements

Speaker diarization (identify different speakers)
Multi-language support with automatic detection
Real-time streaming transcription
Audio quality analysis and enhancement
Custom vocabulary and domain-specific training
Integration with video files (extract audio track)
