Audio Agent

The Audio Agent provides automated audio transcription and intelligent question-answering capabilities. It processes audio files, generates transcriptions using AI-powered speech recognition, and enables users to query the audio content using natural language.

Base URL​

/api/agents/audio_agent

Authentication​

All endpoints except the health check require authentication. Sign up at https://cloud.nextneural.ai to get your API key.
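As an illustration, an authenticated request can be assembled with Python's standard library. This is a minimal sketch under stated assumptions: the host and the Bearer scheme are copied from the curl examples in this document, and build_request is a hypothetical helper name, not part of an official SDK. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: host taken from the curl examples in this document.
BASE_URL = "https://nextneural-api.superteams.ai/api/agents/audio_agent"


def build_request(method, path, token, body=None):
    """Build (but do not send) an authenticated Audio Agent request.

    build_request is a hypothetical helper used only for illustration.
    """
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # JSON bodies need the Content-Type header, as in the curl examples.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


req = build_request(
    "POST",
    "/process-audio",
    "YOUR_TOKEN",
    {"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"},
)
print(req.get_method(), req.full_url)
```

Sending the request would be a single call to urllib.request.urlopen(req); it is left out so the sketch runs without network access.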

How It Works​

The Audio Agent performs comprehensive audio processing:

  1. Audio Transcription: Converts audio files to text using AI-powered speech recognition
  2. Intelligent Search: Finds relevant content from your audio based on natural language queries
  3. Answer Generation: Provides contextual answers to questions about your audio content
  4. Conversation Management: Maintains chat history and conversation context for seamless interactions

Endpoints​

1. Health Check​

Check if the Audio Agent service is running.

Endpoint: GET /health

Authentication: None required

Response:

{
"status": "healthy",
"service": "AUDIO Agent"
}

2. Process Audio File​

Process an audio file to generate a transcription. The file path is automatically retrieved from the knowledge base document.

Endpoint: POST /process-audio

Authentication: Required (scope: agent:audio)

Request Body:

{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}

Parameters:

  • kb_document_id (required, string): UUID of the knowledge base document containing the audio file

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/process-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}'

Response (New Processing):

{
"message": "Audio file processed and stored successfully.",
"already_processed": false,
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"transcript": "This is the full transcription of the audio file..."
}

Response (Already Processed):

{
"message": "Audio already processed",
"already_processed": true,
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"transcript": "This is the full transcription of the audio file...",
"processed_date": "2025-01-14T10:30:00"
}

Notes:

  • The audio file path is automatically fetched from the knowledge base document
  • Document ownership is verified before processing
  • If the audio was already processed, the cached result is returned without re-processing
  • High-accuracy AI transcription ensures quality results
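The two response shapes above can be told apart by the already_processed flag. The following sketch shows one way a client might summarize either response; summarize_processing is a hypothetical helper, and it assumes processed_date appears only on cached responses, as in the examples.

```python
import json


def summarize_processing(response_text):
    """Summarize a /process-audio response body (hypothetical helper)."""
    payload = json.loads(response_text)
    return {
        "kb_document_id": payload["kb_document_id"],
        # True when the server returned a cached transcription.
        "cached": bool(payload.get("already_processed")),
        # Assumption: only present on "already processed" responses.
        "processed_date": payload.get("processed_date"),
        "transcript_chars": len(payload.get("transcript", "")),
    }


sample = json.dumps({
    "message": "Audio already processed",
    "already_processed": True,
    "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
    "transcript": "This is the full transcription of the audio file...",
    "processed_date": "2025-01-14T10:30:00",
})
print(summarize_processing(sample))
```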

3. Re-parse Audio File​

Re-process an already processed audio file by deleting old data and re-transcribing.

Endpoint: POST /reparse-audio

Authentication: Required (scope: agent:audio)

Request Body:

{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}

Parameters:

  • kb_document_id (required, string): UUID of the knowledge base document containing the audio file

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/reparse-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}'

Response:

{
"message": "Audio file re-parsed and stored successfully.",
"reparsed": true,
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"transcript": "This is the newly generated transcription..."
}

Notes:

  • Deletes existing audio record and re-processes from scratch
  • Useful when transcription quality was poor or audio was updated
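Because /process-audio and /reparse-audio take the same request body, a client can choose between them with a single flag. The sketch below is illustrative only: processing_call and its force parameter are hypothetical names, and the local UUID check is a client-side convenience, not documented server behavior.

```python
import uuid


def processing_call(kb_document_id, force=False):
    """Return (path, body) for processing or re-parsing a document.

    Hypothetical helper: force=True selects /reparse-audio.
    """
    # Fail early on malformed IDs; uuid.UUID raises ValueError.
    uuid.UUID(kb_document_id)
    path = "/reparse-audio" if force else "/process-audio"
    return path, {"kb_document_id": kb_document_id}


print(processing_call("550e8400-e29b-41d4-a716-446655440000", force=True))
```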

4. Ask Question (RAG Query)​

Query the audio content using natural language. The system retrieves relevant chunks and generates contextual answers.

Endpoint: POST /ask_audio

Authentication: Required

Request Body:

{
"question": "What were the main topics discussed in the meeting?",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}

Parameters:

  • question (required, string): Natural language question about the audio content
  • kb_document_id (required, string): UUID of the knowledge base document to query

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/ask_audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"question": "What were the main topics discussed?",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}'

Response:

{
"answer": "Based on the audio transcription, the main topics discussed were: 1) Project timeline and milestones, 2) Budget allocation for Q2, 3) Team resource planning, and 4) Client feedback on the prototype.",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}

Supported Query Types:

  • Specific questions: "What is the project deadline?"
  • Summary requests: "Give me a summary of the audio"
  • Full transcript: "Show me the complete transcript"
  • Key highlights: "What are the important points?"
  • Topic exploration: "What topics are covered?"

Notes:

  • Intelligent search finds the most relevant content from your audio
  • Generates accurate, contextual answers
  • Handles greetings and casual conversation naturally
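A small sketch of the request and response handling for /ask_audio follows. The helper names build_ask_payload and extract_answer are hypothetical, and rejecting empty questions on the client is an assumption made for illustration; the server's own validation is not described here.

```python
import json


def build_ask_payload(question, kb_document_id):
    """Build the JSON body for POST /ask_audio (hypothetical helper)."""
    question = question.strip()
    if not question:
        # Assumption: an empty question is never useful to send.
        raise ValueError("question must not be empty")
    return {"question": question, "kb_document_id": kb_document_id}


def extract_answer(response_text):
    """Pull the "answer" field out of an /ask_audio response body."""
    return json.loads(response_text)["answer"]


payload = build_ask_payload(
    "  What were the main topics discussed?  ",
    "550e8400-e29b-41d4-a716-446655440000",
)
print(payload["question"])
```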

5. Create Conversation​

Create a new conversation session for chat history tracking.

Endpoint: POST /conversations/create

Authentication: Required

Request Body:

{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Meeting Discussion"
}

Parameters:

  • kb_document_id (optional, string): UUID of the knowledge base document
  • title (optional, string): Custom title for the conversation

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/create" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Q4 Planning Meeting"
}'

Response:

{
"id": 789,
"user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Q4 Planning Meeting",
"started_at": "2025-01-14T10:30:00",
"last_message_at": "2025-01-14T10:30:00"
}

Notes:

  • If kb_document_id is provided without audio_id, the system finds the most recent audio for that document
  • Document ownership is verified
  • Conversations track chat history and context
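Since both body fields are optional, a client may prefer to omit unset fields rather than send nulls. The sketch below shows that choice; build_conversation_payload is a hypothetical helper, and whether the server accepts explicit null values is not stated in this document.

```python
def build_conversation_payload(kb_document_id=None, title=None):
    """Build the body for POST /conversations/create (hypothetical helper).

    Optional fields that are not set are left out entirely.
    """
    payload = {}
    if kb_document_id is not None:
        payload["kb_document_id"] = kb_document_id
    if title is not None:
        payload["title"] = title
    return payload


print(build_conversation_payload(title="Q4 Planning Meeting"))
```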

6. Add Message to Conversation​

Add a user or assistant message to an existing conversation.

Endpoint: POST /conversations/{conversation_id}/messages

Authentication: Required

Path Parameters:

  • conversation_id (required, integer): ID of the conversation

Request Body:

{
"conversation_id": 789,
"role": "user",
"content": "What were the action items?"
}

Parameters:

  • conversation_id (required, integer): ID of the conversation
  • role (required, string): Either "user" or "assistant"
  • content (required, string): Message content

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789/messages" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"conversation_id": 789,
"role": "user",
"content": "What were the action items?"
}'

Response:

{
"id": "a1b2c3d4-e5f6-5678-90ab-cdef12345678",
"conversation_id": 789,
"role": "user",
"content": "What were the action items?",
"timestamp": "2025-01-14T10:35:00"
}

Notes:

  • Conversation must belong to the authenticated user
  • Updates conversation's last_message_at timestamp
  • Messages are ordered by timestamp
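The request above carries conversation_id in both the path and the body, and role is limited to two values. A minimal sketch of building such a call with a client-side role check (build_message_call is a hypothetical helper name):

```python
ALLOWED_ROLES = {"user", "assistant"}


def build_message_call(conversation_id, role, content):
    """Return (path, body) for adding a message (hypothetical helper)."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {sorted(ALLOWED_ROLES)}")
    path = f"/conversations/{conversation_id}/messages"
    # The documented example repeats conversation_id in the body.
    body = {"conversation_id": conversation_id, "role": role, "content": content}
    return path, body


print(build_message_call(789, "user", "What were the action items?"))
```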

7. Get Conversation History​

Retrieve all conversations for the authenticated user.

Endpoint: GET /conversations/history

Authentication: Required

Query Parameters:

  • limit (optional, integer, default: 100): Maximum number of conversations to return

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/history?limit=50" \
-H "Authorization: Bearer YOUR_TOKEN"

Response:

[
{
"id": 789,
"fileName": "meeting_recording.mp3",
"analyzedAt": "2025-01-14T10:35:00",
"duration": "5 messages",
"kbDocumentId": "550e8400-e29b-41d4-a716-446655440000"
},
{
"id": 788,
"fileName": "Q3 Review",
"analyzedAt": "2025-01-13T15:20:00",
"duration": "12 messages",
"kbDocumentId": "660e8400-e29b-41d4-a716-446655440001"
}
]

Notes:

  • Returns conversations in reverse chronological order (newest first)
  • The "duration" field holds the message count (for example, "5 messages"), not the audio length
  • The "fileName" field holds the audio filename or the KB document title
  • Only returns user's own conversations
  • Internal IDs are not exposed for security
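To work with the history response programmatically, the "duration" string can be converted to an integer message count. This sketch assumes the field always has the form "<n> messages", as in the example response; parse_history is a hypothetical helper.

```python
import json
from datetime import datetime


def parse_history(response_text):
    """Convert a /conversations/history body into plain dicts."""
    rows = []
    for item in json.loads(response_text):
        rows.append({
            "id": item["id"],
            "name": item["fileName"],
            "analyzed_at": datetime.fromisoformat(item["analyzedAt"]),
            # Assumption: "duration" looks like "5 messages".
            "message_count": int(item["duration"].split()[0]),
            "kb_document_id": item["kbDocumentId"],
        })
    return rows


sample = json.dumps([{
    "id": 789,
    "fileName": "meeting_recording.mp3",
    "analyzedAt": "2025-01-14T10:35:00",
    "duration": "5 messages",
    "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
}])
print(parse_history(sample)[0]["message_count"])
```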

8. Get Specific Conversation​

Retrieve a specific conversation with all its messages.

Endpoint: GET /conversations/{conversation_id}

Authentication: Required

Path Parameters:

  • conversation_id (required, integer): ID of the conversation to retrieve

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789" \
-H "Authorization: Bearer YOUR_TOKEN"

Response:

{
"id": 789,
"user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Q4 Planning Meeting",
"started_at": "2025-01-14T10:30:00",
"last_message_at": "2025-01-14T10:35:00",
"audio": {
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"file_name": "audio_550e8400-e29b-41d4-a716-446655440000"
},
"messages": [
{
"id": "a1b2c3d4-e5f6-5678-90ab-cdef12345678",
"role": "user",
"content": "What were the action items?",
"timestamp": "2025-01-14T10:35:00"
},
{
"id": "b2c3d4e5-f6a7-6789-01bc-def123456789",
"role": "assistant",
"content": "The action items mentioned were...",
"timestamp": "2025-01-14T10:35:05"
}
]
}

Notes:

  • Only the conversation owner can access it
  • Returns 404 if the conversation doesn't exist or access is denied
  • Messages are ordered chronologically
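A client that wants to display the conversation can flatten the messages array into lines of text. The sketch below re-sorts by timestamp defensively, which relies on all timestamps sharing the ISO 8601 format shown above; format_conversation is a hypothetical helper.

```python
import json


def format_conversation(response_text):
    """Return (title, lines) for a conversation response body."""
    convo = json.loads(response_text)
    # Same-format ISO 8601 strings sort chronologically as text.
    messages = sorted(convo["messages"], key=lambda m: m["timestamp"])
    lines = [f'{m["role"]}: {m["content"]}' for m in messages]
    return convo["title"], lines


sample = json.dumps({
    "id": 789,
    "title": "Q4 Planning Meeting",
    "messages": [
        {"role": "assistant", "content": "The action items mentioned were...",
         "timestamp": "2025-01-14T10:35:05"},
        {"role": "user", "content": "What were the action items?",
         "timestamp": "2025-01-14T10:35:00"},
    ],
})
title, lines = format_conversation(sample)
print(title)
print("\n".join(lines))
```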

9. Delete Conversation​

Delete a conversation and all its messages.

Endpoint: DELETE /conversations/{conversation_id}

Authentication: Required

Path Parameters:

  • conversation_id (required, integer): ID of the conversation to delete

Request Example:

curl -X DELETE "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789" \
-H "Authorization: Bearer YOUR_TOKEN"

Response:

{
"success": true,
"message": "Conversation deleted successfully"
}

Notes:

  • Only the conversation owner can delete it
  • All of the conversation's messages are deleted with it (cascade delete)
  • Returns 404 if the conversation doesn't exist or access is denied

Data Models​

AudioInfo Structure​

{
"id": 456,
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"file_name": "audio_550e8400-e29b-41d4-a716-446655440000",
"file_size": 5242880,
"total_character": 15000,
"full_text": "Complete transcription text...",
"date_time": "2025-01-14T10:30:00",
"user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Field Descriptions:

  • id: Unique identifier for the audio record (integer)
  • kb_document_id: UUID reference to knowledge base document (string)
  • file_name: Internal audio filename (string)
  • file_size: Size of transcript file in bytes (integer)
  • total_character: Total character count in transcript (integer)
  • full_text: Complete transcription text (string)
  • date_time: Processing timestamp (ISO 8601 string)
  • user_id: UUID of the owner (string)

Conversation Structure​

{
"id": 789,
"user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Meeting Discussion",
"started_at": "2025-01-14T10:30:00",
"last_message_at": "2025-01-14T10:35:00",
"is_active": true
}

Message Structure​

{
"id": "a1b2c3d4-e5f6-5678-90ab-cdef12345678",
"conversation_id": 789,
"role": "user",
"content": "What were the action items?",
"timestamp": "2025-01-14T10:35:00"
}
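The Conversation and Message structures above map naturally onto typed records. The following is a sketch using Python dataclasses; the class layout and from_dict constructors are illustrative choices, and is_active is treated as optional because it does not appear in every example response.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    id: str
    conversation_id: int
    role: str
    content: str
    timestamp: datetime

    @classmethod
    def from_dict(cls, d):
        return cls(d["id"], d["conversation_id"], d["role"], d["content"],
                   datetime.fromisoformat(d["timestamp"]))


@dataclass
class Conversation:
    id: int
    user_id: str
    kb_document_id: Optional[str]
    title: Optional[str]
    started_at: datetime
    last_message_at: datetime
    is_active: Optional[bool] = None  # assumption: may be absent

    @classmethod
    def from_dict(cls, d):
        return cls(
            id=d["id"],
            user_id=d["user_id"],
            kb_document_id=d.get("kb_document_id"),
            title=d.get("title"),
            started_at=datetime.fromisoformat(d["started_at"]),
            last_message_at=datetime.fromisoformat(d["last_message_at"]),
            is_active=d.get("is_active"),
        )


msg = Message.from_dict({
    "id": "a1b2c3d4-e5f6-5678-90ab-cdef12345678",
    "conversation_id": 789,
    "role": "user",
    "content": "What were the action items?",
    "timestamp": "2025-01-14T10:35:00",
})
print(msg.role, msg.timestamp.isoformat())
```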

Error Responses​

All endpoints may return the following error responses:

400 Bad Request:

{
"detail": "kb_document_id is required."
}

400 Bad Request (Invalid Document):

{
"detail": "Document with UUID 550e8400-e29b-41d4-a716-446655440000 does not exist in knowledgebase"
}

400 Bad Request (Access Denied):

{
"detail": "Access denied. Document 550e8400-e29b-41d4-a716-446655440000 does not belong to user a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

404 Not Found:

{
"detail": "File path not found for KB document."
}

404 Not Found (Conversation):

{
"detail": "Conversation not found or access denied"
}

500 Internal Server Error:

{
"detail": "Transcription failed: [error message]"
}
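All of the error bodies above share a single "detail" field, so a client can wrap them in one exception type. This is a sketch, not a documented client: AudioAgentError, error_from_response, and is_retryable are hypothetical names, and treating only 5xx statuses as retryable is an assumption.

```python
import json


class AudioAgentError(Exception):
    """Carries the HTTP status and the "detail" message of an error."""

    def __init__(self, status, detail):
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def error_from_response(status, body_text):
    """Build an AudioAgentError from a status code and raw body."""
    try:
        detail = json.loads(body_text).get("detail", body_text)
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object: keep the raw text.
        detail = body_text
    return AudioAgentError(status, detail)


def is_retryable(err):
    # Assumption: only server-side (5xx) failures are worth retrying.
    return err.status >= 500


err = error_from_response(400, '{"detail": "kb_document_id is required."}')
print(err, is_retryable(err))
```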

Best Practices​

Audio Quality Requirements​

  • Format: MP3, WAV, M4A, or other common audio formats
  • Duration: Any length (longer files take more time to process)
  • Audio Quality: Clear speech, minimal background noise
  • Language: English (primary); other languages are supported but may have reduced accuracy
  • Bitrate: 128 kbps or higher recommended

Recording Guidelines​

  1. Use a good quality microphone
  2. Record in a quiet environment
  3. Speak clearly and at moderate pace
  4. Avoid overlapping speech in multi-speaker scenarios
  5. Keep audio files under 100 MB for optimal processing
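The format and size recommendations above can be checked on the client before upload. The sketch below is a hedged example: check_audio_file is a hypothetical helper, the extension list covers only the three formats named above, and 100 MB is interpreted as 100 * 1024 * 1024 bytes, which is an assumption.

```python
import os

# Assumption: "100 MB" is read as binary megabytes.
MAX_BYTES = 100 * 1024 * 1024
# Only the formats named explicitly in this document.
KNOWN_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def check_audio_file(name, size_bytes):
    """Return a list of warnings; an empty list means no concerns."""
    warnings = []
    ext = os.path.splitext(name)[1].lower()
    if ext not in KNOWN_EXTENSIONS:
        warnings.append(f"extension {ext or '(none)'} is not in the named formats")
    if size_bytes > MAX_BYTES:
        warnings.append("file is larger than the recommended 100 MB")
    return warnings


print(check_audio_file("meeting_recording.mp3", 5242880))
```

These are warnings rather than hard errors, since the document allows "other common audio formats" and treats the size limit as a recommendation.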

Query Best Practices​

  1. Specific Questions: Ask direct questions for precise answers
  2. Summary Requests: Use keywords like "summary", "overview", "main points"
  3. Full Transcript: Request "complete transcript" or "everything said"
  4. Contextual Queries: Reference specific topics or speakers when possible
  5. Follow-up Questions: Use conversations to maintain context

Integration Workflow​

# 1. Upload audio to media directory (via your file upload system)
# Audio file is linked to a knowledge base document with UUID

# 2. Process the audio file
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/process-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}'

# 3. Create a conversation
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/create" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Meeting Analysis"
}'

# 4. Ask questions about the audio
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/ask_audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"question": "What were the main action items?",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}'

# 5. Retrieve conversation history
curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/history" \
-H "Authorization: Bearer YOUR_TOKEN"
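The same workflow (steps 2 to 5) can be expressed as an ordered list of calls in Python. This sketch only describes the calls and sends nothing; workflow_calls is a hypothetical helper, and step 1 (uploading the file) is outside this API and therefore omitted.

```python
def workflow_calls(kb_document_id, question, title="Meeting Analysis"):
    """Return the documented workflow as (method, path, body) tuples."""
    return [
        # 2. Process the audio file
        ("POST", "/process-audio", {"kb_document_id": kb_document_id}),
        # 3. Create a conversation
        ("POST", "/conversations/create",
         {"kb_document_id": kb_document_id, "title": title}),
        # 4. Ask questions about the audio
        ("POST", "/ask_audio",
         {"question": question, "kb_document_id": kb_document_id}),
        # 5. Retrieve conversation history
        ("GET", "/conversations/history", None),
    ]


for method, path, body in workflow_calls(
    "550e8400-e29b-41d4-a716-446655440000",
    "What were the main action items?",
):
    print(method, path)
```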

Performance Considerations​

  • Transcription Time:
    • Short audio (< 5 min): 10-30 seconds
    • Medium audio (5-20 min): 30-90 seconds
    • Long audio (> 20 min): 1-5 minutes
  • Query Response Time: Typically 1-3 seconds for standard queries
  • Storage: Each minute of audio yields approximately 150-200 words of transcript
  • Caching: Processed audio is cached; use /reparse-audio to force re-processing
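The transcription time ranges above can inform a client-side timeout. The following sketch maps audio length to the stated range and applies a safety margin; both function names and the default safety factor of 2 are assumptions chosen for illustration, not recommendations from the service.

```python
def expected_transcription_seconds(audio_minutes):
    """Return the (low, high) range in seconds from the table above."""
    if audio_minutes < 5:
        return (10, 30)
    if audio_minutes <= 20:
        return (30, 90)
    return (60, 300)  # 1-5 minutes


def suggested_timeout(audio_minutes, safety_factor=2.0):
    """Upper bound of the range times a safety factor (assumed)."""
    return expected_transcription_seconds(audio_minutes)[1] * safety_factor


print(suggested_timeout(12))
```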

Security Features​

  • User Isolation: All queries are private to your account
  • Document Ownership: Only you can access your documents
  • Authentication: All endpoints except /health require a valid API token
  • Conversation Privacy: Your conversations are completely private
  • UUID-based Identification: Secure, non-sequential identifiers for documents and messages
  • No Internal ID Exposure: Internal database IDs are never exposed in API responses

Troubleshooting​

Transcription Issues​

Problem: Poor transcription quality

  • Cause: Background noise, unclear speech, low audio quality
  • Solution: Re-record with better audio quality, then re-process with the /reparse-audio endpoint

Problem: Transcription failed

  • Cause: Unsupported audio format, corrupted file, API issues
  • Solution: Convert to MP3/WAV, verify file integrity, check API key configuration

Query Issues​

Problem: "No relevant information found"

  • Cause: The query doesn't match the audio content, or the audio has not been processed
  • Solution: Verify audio was processed, rephrase query, ask for summary first

Problem: Incomplete answers

  • Cause: Query doesn't match available content well
  • Solution: Ask more specific questions, request full transcript

Performance Issues​

Problem: Slow transcription

  • Cause: Large audio file
  • Solution: Split large files, retry if timeout occurs

Problem: Slow query responses

  • Cause: Complex query or large audio file
  • Solution: Use more specific queries

Limitations​

  • Language Support: Optimized for English; other languages may have reduced accuracy
  • Audio Length: Very long files (> 2 hours) may have processing delays

Future Enhancements​

  • Speaker diarization (identify different speakers)
  • Multi-language support with automatic detection
  • Real-time streaming transcription
  • Audio quality analysis and enhancement
  • Custom vocabulary and domain-specific training
  • Integration with video files (extract audio track)