Audio Agent
The Audio Agent provides automated audio transcription and intelligent question-answering capabilities. It processes audio files, generates transcriptions using AI-powered speech recognition, and enables users to query the audio content using natural language.
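Base URL
/api/agents/audio_agent
Endpoint paths below are relative to this base. The examples use the full URL https://nextneural-api.superteams.ai/api/agents/audio_agent.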
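Authentication
All endpoints except the health check require authentication. Sign up at https://cloud.nextneural.ai to get your API key, then send it in the Authorization header as a bearer token (Authorization: Bearer YOUR_TOKEN), as in the examples below.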
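A minimal Python sketch of how a client could assemble an authenticated request using only the standard library. It assumes the API key is sent in the bearer header shown in the curl examples and that request bodies are JSON. BASE_URL is taken from those examples, and build_request is a hypothetical helper, not part of the API. The request is built but not sent.

```python
import json
import urllib.request

# Full base URL taken from the request examples in this document.
BASE_URL = "https://nextneural-api.superteams.ai/api/agents/audio_agent"


def build_request(path, token=None, payload=None, method="GET"):
    """Build (but do not send) a request to an Audio Agent endpoint.

    Hypothetical helper: `token` is your API key and `payload` is a dict
    serialized as the JSON request body.
    """
    headers = {}
    if token is not None:
        # Same header the curl examples use: Authorization: Bearer YOUR_TOKEN
        headers["Authorization"] = f"Bearer {token}"
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


req = build_request(
    "/process-audio",
    token="YOUR_TOKEN",
    payload={"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"},
    method="POST",
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

The /health endpoint can use the same helper with no token, which leaves the Authorization header off entirely.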
How It Works
The Audio Agent combines four capabilities:
- Audio Transcription: Converts audio files to text using AI-powered speech recognition
- Intelligent Search: Finds relevant content from your audio based on natural language queries
- Answer Generation: Provides contextual answers to questions about your audio content
- Conversation Management: Stores conversations and their message history so interactions with an audio file can be tracked and revisited
Endpoints
1. Health Check
Check if the Audio Agent service is running.
Endpoint: GET /health
Authentication: None required
Response:
{
"status": "healthy",
"service": "AUDIO Agent"
}
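A minimal health-probe sketch in Python. It sends no Authorization header, matching the "None required" note above, and treats anything other than "status": "healthy" (including non-JSON bodies and network errors) as unhealthy. The URL is assembled from the base URL used in the examples; the function names are hypothetical.

```python
import json
import urllib.request

HEALTH_URL = "https://nextneural-api.superteams.ai/api/agents/audio_agent/health"


def is_healthy(body: bytes) -> bool:
    """Return True if a /health response body reports a healthy service."""
    try:
        data = json.loads(body)
    except ValueError:
        # Not JSON, for example an HTML error page from a proxy
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def check_health(timeout: float = 5.0) -> bool:
    """Call GET /health (no Authorization header) and interpret the body."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=timeout) as resp:
            return is_healthy(resp.read())
    except OSError:
        # Connection errors, timeouts and HTTP errors (URLError subclasses OSError)
        return False
```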
2. Process Audio File
Transcribe an audio file and store the result. The file path is retrieved automatically from the knowledge base document.
Endpoint: POST /process-audio
Authentication: Required (scope: agent:audio)
Request Body:
{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}
Parameters:
kb_document_id (required, string): UUID of the knowledge base document containing the audio file
Request Example:
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/process-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}'
Response (New Processing):
{
"message": "Audio file processed and stored successfully.",
"already_processed": false,
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"transcript": "This is the full transcription of the audio file..."
}
Response (Already Processed):
{
"message": "Audio already processed",
"already_processed": true,
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"transcript": "This is the full transcription of the audio file...",
"processed_date": "2025-01-14T10:30:00"
}
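Notes:
- The audio file path is fetched automatically from the knowledge base document
- Document ownership is verified before processing
- If the audio was already processed, the stored transcript is returned (with already_processed: true and a processed_date) and the file is not transcribed again; use /reparse-audio to force a new transcription
- Transcription uses AI speech recognition; accuracy depends on audio quality (see Best Practices)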
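A sketch of how a client might branch on already_processed, reading only the fields shown in the two sample responses above. The function name and summary format are illustrative assumptions, and the transcript text is shortened sample data.

```python
def describe_process_result(data: dict) -> str:
    """Summarize a decoded /process-audio response."""
    doc_id = data["kb_document_id"]
    length = len(data.get("transcript", ""))
    if data.get("already_processed"):
        # Cached result: the stored transcript was returned without re-transcribing
        return f"{doc_id}: cached transcript, {length} chars, processed {data.get('processed_date')}"
    return f"{doc_id}: new transcript, {length} chars"


fresh = {
    "message": "Audio file processed and stored successfully.",
    "already_processed": False,
    "kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
    "transcript": "Hello team",
}
cached = dict(fresh, message="Audio already processed", already_processed=True,
              processed_date="2025-01-14T10:30:00")
print(describe_process_result(fresh))
print(describe_process_result(cached))
```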
3. Re-parse Audio File
Re-process an audio file that has already been processed: the existing record is deleted and the audio is transcribed again.
Endpoint: POST /reparse-audio
Authentication: Required (scope: agent:audio)
Request Body:
{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}
Parameters:
kb_document_id (required, string): UUID of the knowledge base document containing the audio file
Request Example:
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/reparse-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}'
Response:
{
"message": "Audio file re-parsed and stored successfully.",
"reparsed": true,
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"transcript": "This is the newly generated transcription..."
}
Notes:
- Deletes existing audio record and re-processes from scratch
- Useful when transcription quality was poor or audio was updated
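Because /process-audio returns a cached transcript while /reparse-audio always re-transcribes, a client can expose the choice as a single flag. A minimal sketch; transcription_request is a hypothetical helper that returns the method, path, and body without sending anything, and the empty-ID check mirrors the documented "kb_document_id is required." error.

```python
def transcription_request(kb_document_id: str, force: bool = False):
    """Return (method, path, body) for transcribing a KB document.

    force=False -> /process-audio (returns the cached transcript if one exists)
    force=True  -> /reparse-audio (deletes the stored record and re-transcribes)
    """
    if not kb_document_id:
        raise ValueError("kb_document_id is required.")
    path = "/reparse-audio" if force else "/process-audio"
    return "POST", path, {"kb_document_id": kb_document_id}


print(transcription_request("550e8400-e29b-41d4-a716-446655440000", force=True))
```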
4. Ask Question (RAG Query)
Ask a natural-language question about processed audio. The system retrieves the most relevant chunks of the transcript and generates an answer from them.
Endpoint: POST /ask_audio
Authentication: Required
Request Body:
{
"question": "What were the main topics discussed in the meeting?",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}
Parameters:
question (required, string): Natural language question about the audio content
kb_document_id (required, string): UUID of the knowledge base document to query
Request Example:
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/ask_audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"question": "What were the main topics discussed?",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}'
Response:
{
"answer": "Based on the audio transcription, the main topics discussed were: 1) Project timeline and milestones, 2) Budget allocation for Q2, 3) Team resource planning, and 4) Client feedback on the prototype.",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}
Supported Query Types:
- Specific questions: "What is the project deadline?"
- Summary requests: "Give me a summary of the audio"
- Full transcript: "Show me the complete transcript"
- Key highlights: "What are the important points?"
- Topic exploration: "What topics are covered?"
Notes:
- Answers are generated from the parts of the transcript most relevant to the question
- The audio must be processed first (see Process Audio File); otherwise there is no transcript to search
- Greetings and casual messages receive a conversational reply
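A minimal sketch of building an /ask_audio body and reading the answer, using the field names documented above. The client-side checks (non-empty question, kb_document_id parsing as a UUID) are assumptions added for illustration, not documented server behavior; the function names are hypothetical.

```python
import json
import uuid


def ask_audio_body(question: str, kb_document_id: str) -> bytes:
    """Build the JSON body for POST /ask_audio."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    # Client-side check that kb_document_id parses as a UUID; raises ValueError otherwise
    uuid.UUID(kb_document_id)
    payload = {"question": question, "kb_document_id": kb_document_id}
    return json.dumps(payload).encode("utf-8")


def answer_text(response_body: bytes) -> str:
    """Extract the generated answer from an /ask_audio response body."""
    return json.loads(response_body)["answer"]


body = ask_audio_body("What were the main topics discussed?",
                      "550e8400-e29b-41d4-a716-446655440000")
print(body.decode("utf-8"))
```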
5. Create Conversation
Create a new conversation session for chat history tracking.
Endpoint: POST /conversations/create
Authentication: Required
Request Body:
{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Meeting Discussion"
}
Parameters:
kb_document_id (optional, string): UUID of the knowledge base document
title (optional, string): Custom title for the conversation
Request Example:
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/create" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Q4 Planning Meeting"
}'
Response:
{
"id": 789,
"user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Q4 Planning Meeting",
"started_at": "2025-01-14T10:30:00",
"last_message_at": "2025-01-14T10:30:00"
}
Notes:
- If kb_document_id is provided without audio_id, the system uses the most recent audio for that document
- Document ownership is verified
- Conversations track chat history and context
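Since both body fields are optional, a client can simply leave out the ones it does not set. A minimal sketch with hypothetical helper names; omitting unset fields rather than sending null is a conservative choice, because whether the service accepts explicit nulls is not documented.

```python
def create_conversation_body(kb_document_id=None, title=None) -> dict:
    """Build the body for POST /conversations/create, omitting unset optional fields."""
    body = {}
    if kb_document_id is not None:
        body["kb_document_id"] = kb_document_id
    if title is not None:
        body["title"] = title
    return body


def conversation_id(response: dict) -> int:
    """The integer id from the create response, used by /conversations/{conversation_id}."""
    return int(response["id"])


print(create_conversation_body("550e8400-e29b-41d4-a716-446655440000", "Q4 Planning Meeting"))
```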
6. Add Message to Conversation
Add a user or assistant message to an existing conversation.
Endpoint: POST /conversations/{conversation_id}/messages
Authentication: Required
Path Parameters:
conversation_id (required, integer): ID of the conversation
Request Body:
{
"conversation_id": 789,
"role": "user",
"content": "What were the action items?"
}
Parameters:
conversation_id (required, integer): ID of the conversation
role (required, string): Either "user" or "assistant"
content (required, string): Message content
Request Example:
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789/messages" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"conversation_id": 789,
"role": "user",
"content": "What were the action items?"
}'
Response:
{
"id": "a1b2c3d4-e5f6-5678-90ab-cdef12345678",
"conversation_id": 789,
"role": "user",
"content": "What were the action items?",
"timestamp": "2025-01-14T10:35:00"
}
Notes:
- Conversation must belong to the authenticated user
- Updates the conversation's last_message_at timestamp
- Messages are ordered by timestamp
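The documented body repeats conversation_id, so a client should put the same value in both the path and the body. A minimal sketch; message_request is a hypothetical helper that also rejects roles other than the two documented ones before any request is made.

```python
ALLOWED_ROLES = {"user", "assistant"}


def message_request(conversation_id: int, role: str, content: str):
    """Return (method, path, body) for POST /conversations/{conversation_id}/messages."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {sorted(ALLOWED_ROLES)}, got {role!r}")
    path = f"/conversations/{conversation_id}/messages"
    # Same conversation_id in the path and in the body, as in the request example
    body = {"conversation_id": conversation_id, "role": role, "content": content}
    return "POST", path, body


print(message_request(789, "user", "What were the action items?"))
```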
7. Get Conversation History
Retrieve the authenticated user's conversations, up to the requested limit.
Endpoint: GET /conversations/history
Authentication: Required
Query Parameters:
limit (optional, integer, default: 100): Maximum number of conversations to return
Request Example:
curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/history?limit=50" \
-H "Authorization: Bearer YOUR_TOKEN"
Response:
[
{
"id": 789,
"fileName": "meeting_recording.mp3",
"analyzedAt": "2025-01-14T10:35:00",
"duration": "5 messages",
"kbDocumentId": "550e8400-e29b-41d4-a716-446655440000"
},
{
"id": 788,
"fileName": "Q3 Review",
"analyzedAt": "2025-01-13T15:20:00",
"duration": "12 messages",
"kbDocumentId": "660e8400-e29b-41d4-a716-446655440001"
}
]
Notes:
- Returns conversations in reverse chronological order (newest first)
- The duration field holds the conversation's message count (for example, "5 messages"), not the audio length
- fileName holds the audio filename or the KB document title
- Only the authenticated user's conversations are returned
- Internal IDs are not exposed for security
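A minimal sketch of requesting history with a limit and turning the duration strings back into numbers. It assumes duration always starts with the count, as in "5 messages"; entries that do not match are counted as 0. The helper names are hypothetical.

```python
import re
from urllib.parse import urlencode


def history_path(limit: int = 100) -> str:
    """Path plus query string for GET /conversations/history."""
    return "/conversations/history?" + urlencode({"limit": limit})


def message_count(duration) -> int:
    """Parse the leading number from a duration value such as "5 messages"."""
    match = re.match(r"\s*(\d+)", duration or "")
    return int(match.group(1)) if match else 0


def summarize_history(items):
    """(id, fileName, message count) per entry, in the order the API returned them."""
    return [(item["id"], item["fileName"], message_count(item.get("duration"))) for item in items]


sample = [
    {"id": 789, "fileName": "meeting_recording.mp3", "analyzedAt": "2025-01-14T10:35:00",
     "duration": "5 messages", "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000"},
    {"id": 788, "fileName": "Q3 Review", "analyzedAt": "2025-01-13T15:20:00",
     "duration": "12 messages", "kbDocumentId": "660e8400-e29b-41d4-a716-446655440001"},
]
print(history_path(50))
print(summarize_history(sample))
```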
8. Get Specific Conversation
Retrieve a specific conversation with all its messages.
Endpoint: GET /conversations/{conversation_id}
Authentication: Required
Path Parameters:
conversation_id (required, integer): ID of the conversation to retrieve
Request Example:
curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789" \
-H "Authorization: Bearer YOUR_TOKEN"
Response:
{
"id": 789,
"user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Q4 Planning Meeting",
"started_at": "2025-01-14T10:30:00",
"last_message_at": "2025-01-14T10:35:00",
"audio": {
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"file_name": "audio_550e8400-e29b-41d4-a716-446655440000"
},
"messages": [
{
"id": "a1b2c3d4-e5f6-5678-90ab-cdef12345678",
"role": "user",
"content": "What were the action items?",
"timestamp": "2025-01-14T10:35:00"
},
{
"id": "b2c3d4e5-f6a7-6789-01bc-def123456789",
"role": "assistant",
"content": "The action items mentioned were...",
"timestamp": "2025-01-14T10:35:05"
}
]
}
Notes:
- Only the conversation owner can access it
- Returns 404 if the conversation doesn't exist or belongs to another user
- Messages are ordered chronologically
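A minimal sketch that renders a decoded conversation response as plain text, using the fields from the sample response above. format_conversation is a hypothetical helper; re-sorting by timestamp is only a safeguard, since the API already orders messages chronologically.

```python
from datetime import datetime


def format_conversation(conv: dict) -> str:
    """Render a GET /conversations/{conversation_id} response, oldest message first."""
    lines = [f"{conv['title']} (conversation {conv['id']})"]
    messages = sorted(conv.get("messages", []),
                      key=lambda m: datetime.fromisoformat(m["timestamp"]))
    for m in messages:
        lines.append(f"[{m['timestamp']}] {m['role']}: {m['content']}")
    return "\n".join(lines)
```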
9. Delete Conversation
Delete a conversation and all its messages.
Endpoint: DELETE /conversations/{conversation_id}
Authentication: Required
Path Parameters:
conversation_id (required, integer): ID of the conversation to delete
Request Example:
curl -X DELETE "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/789" \
-H "Authorization: Bearer YOUR_TOKEN"
Response:
{
"success": true,
"message": "Conversation deleted successfully"
}
Notes:
- Only the conversation owner can delete it
- All messages in the conversation are deleted with it (cascade delete)
- Returns 404 if the conversation doesn't exist or belongs to another user
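A minimal Python sketch of the DELETE call. Treating 404 as "nothing to delete" (returning False instead of raising) is a design choice for idempotent cleanup, not documented API behavior; the base URL comes from the examples, and the opener parameter exists only so the function can be exercised without a network.

```python
import urllib.error
import urllib.request

BASE_URL = "https://nextneural-api.superteams.ai/api/agents/audio_agent"


def delete_conversation(conversation_id: int, token: str, opener=urllib.request.urlopen) -> bool:
    """Delete a conversation; return True on success, False on 404."""
    req = urllib.request.Request(
        f"{BASE_URL}/conversations/{conversation_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )
    try:
        with opener(req) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Missing or owned by another user; the API does not distinguish the two
            return False
        raise
```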
Data Models
AudioInfo Structure
{
"id": 456,
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"file_name": "audio_550e8400-e29b-41d4-a716-446655440000",
"file_size": 5242880,
"total_character": 15000,
"full_text": "Complete transcription text...",
"date_time": "2025-01-14T10:30:00",
"user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
Field Descriptions:
id: Unique identifier for the audio record (integer)
kb_document_id: UUID reference to the knowledge base document (string)
file_name: Internal audio filename (string)
file_size: Size of the transcript file in bytes (integer)
total_character: Total character count in the transcript (integer)
full_text: Complete transcription text (string)
date_time: Processing timestamp (ISO 8601 string)
user_id: UUID of the owner (string)
Conversation Structure
{
"id": 789,
"user_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Meeting Discussion",
"started_at": "2025-01-14T10:30:00",
"last_message_at": "2025-01-14T10:35:00",
"is_active": true
}
Message Structure
{
"id": "a1b2c3d4-e5f6-5678-90ab-cdef12345678",
"conversation_id": 789,
"role": "user",
"content": "What were the action items?",
"timestamp": "2025-01-14T10:35:00"
}
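A minimal sketch of client-side types for the Conversation and Message structures, parsing the ISO 8601 timestamps into datetime objects. The optional fields and defaults are assumptions: the create response omits is_active, and the messages nested in a GET /conversations/{conversation_id} response omit conversation_id.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    id: str                          # UUID string
    conversation_id: Optional[int]   # absent in nested message lists
    role: str                        # "user" or "assistant"
    content: str
    timestamp: datetime

    @classmethod
    def from_json(cls, d: dict) -> "Message":
        return cls(
            id=d["id"],
            conversation_id=d.get("conversation_id"),
            role=d["role"],
            content=d["content"],
            timestamp=datetime.fromisoformat(d["timestamp"]),
        )


@dataclass
class Conversation:
    id: int
    user_id: str
    kb_document_id: Optional[str]    # optional when the conversation is created
    title: Optional[str]
    started_at: datetime
    last_message_at: datetime
    is_active: bool = True           # assumed default when the field is missing

    @classmethod
    def from_json(cls, d: dict) -> "Conversation":
        return cls(
            id=d["id"],
            user_id=d["user_id"],
            kb_document_id=d.get("kb_document_id"),
            title=d.get("title"),
            started_at=datetime.fromisoformat(d["started_at"]),
            last_message_at=datetime.fromisoformat(d["last_message_at"]),
            is_active=d.get("is_active", True),
        )
```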
Error Responses
Endpoints may return the following errors, each with a detail message; which ones apply depends on the endpoint:
400 Bad Request:
{
"detail": "kb_document_id is required."
}
400 Bad Request (Invalid Document):
{
"detail": "Document with UUID 550e8400-e29b-41d4-a716-446655440000 does not exist in knowledgebase"
}
400 Bad Request (Access Denied):
{
"detail": "Access denied. Document 550e8400-e29b-41d4-a716-446655440000 does not belong to user a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
404 Not Found:
{
"detail": "File path not found for KB document."
}
404 Not Found (Conversation):
{
"detail": "Conversation not found or access denied"
}
500 Internal Server Error:
{
"detail": "Transcription failed: [error message]"
}
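Every error above carries a detail string, so a client can surface it directly. A minimal sketch with a hypothetical exception class. Note that access-denied errors for documents arrive as 400 rather than 403, so code that needs to tell them apart should inspect detail as well as the status.

```python
import json


class AudioAgentError(Exception):
    """Hypothetical client-side exception carrying the HTTP status and detail text."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail


def raise_for_error(status: int, body: bytes) -> None:
    """Raise AudioAgentError for non-2xx responses, using the "detail" field when present."""
    if 200 <= status < 300:
        return
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        # Body was not a JSON object; fall back to the raw text
        detail = body.decode("utf-8", errors="replace")
    raise AudioAgentError(status, detail)
```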
Best Practices
Audio Quality Requirements
- Format: MP3, WAV, M4A, or other common audio formats
- Duration: Any length (longer files take more time to process)
- Audio Quality: Clear speech, minimal background noise
- Language: English (primary), with support for multiple languages
- Bitrate: 128 kbps or higher recommended
Recording Guidelines
- Use a good quality microphone
- Record in a quiet environment
- Speak clearly and at moderate pace
- Avoid overlapping speech in multi-speaker scenarios
- Keep audio files under 100MB for optimal processing
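As a planning aid, constant-bitrate audio occupies bitrate × duration / 8 bytes. The sketch below applies that to the 128 kbps recommendation and the 100MB guideline, assuming constant bitrate and 1 MB = 1,000,000 bytes (the guideline does not say which megabyte it means). The function names are hypothetical.

```python
def estimated_size_bytes(minutes: float, kbps: float = 128) -> int:
    """Approximate size of constant-bitrate audio: kbps * 1000 / 8 bytes per second."""
    return round(kbps * 1000 / 8 * minutes * 60)


def max_minutes(limit_bytes: int = 100_000_000, kbps: float = 128) -> float:
    """Longest recording that stays under limit_bytes at the given bitrate."""
    return limit_bytes / (kbps * 1000 / 8) / 60


print(estimated_size_bytes(60))  # one hour at 128 kbps
print(round(max_minutes(), 1))   # minutes that fit in 100 MB at 128 kbps
```

At 128 kbps an hour of audio is about 57.6 MB, and the 100MB guideline corresponds to roughly 104 minutes. Uncompressed CD-quality WAV (1411.2 kbps) reaches the same size in under 10 minutes.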
Query Best Practices
- Specific Questions: Ask direct questions for precise answers
- Summary Requests: Use keywords like "summary", "overview", "main points"
- Full Transcript: Request "complete transcript" or "everything said"
- Contextual Queries: Reference specific topics or speakers when possible
- Follow-up Questions: Use conversations to maintain context
Integration Workflow
# 1. Upload audio to media directory (via your file upload system)
# Audio file is linked to a knowledge base document with UUID
# 2. Process the audio file
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/process-audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}'
# 3. Create a conversation
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/create" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Meeting Analysis"
}'
# 4. Ask questions about the audio
curl -X POST "https://nextneural-api.superteams.ai/api/agents/audio_agent/ask_audio" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"question": "What were the main action items?",
"kb_document_id": "550e8400-e29b-41d4-a716-446655440000"
}'
# 5. Retrieve conversation history
curl -X GET "https://nextneural-api.superteams.ai/api/agents/audio_agent/conversations/history" \
-H "Authorization: Bearer YOUR_TOKEN"
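The same workflow as a Python sketch. Because /ask_audio takes no conversation_id, this sketch assumes the client records each question and answer in the conversation itself via the messages endpoint. The send parameter is a hypothetical transport (method, path, JSON body) returning decoded JSON, which keeps the workflow testable; http_transport is one possible urllib-based implementation using the base URL from the examples.

```python
import json
import urllib.request
from typing import Callable, Optional

# A transport takes (method, path, json_body) and returns the decoded JSON response.
Transport = Callable[[str, str, Optional[dict]], dict]


def analyze_audio(send: Transport, kb_document_id: str, questions,
                  title: str = "Meeting Analysis") -> dict:
    """Process the audio, open a conversation, ask each question, and log the exchange."""
    send("POST", "/process-audio", {"kb_document_id": kb_document_id})
    conv = send("POST", "/conversations/create",
                {"kb_document_id": kb_document_id, "title": title})
    answers = {}
    for q in questions:
        answer = send("POST", "/ask_audio",
                      {"question": q, "kb_document_id": kb_document_id})["answer"]
        # Record both sides of the exchange in the conversation
        for role, content in (("user", q), ("assistant", answer)):
            send("POST", f"/conversations/{conv['id']}/messages",
                 {"conversation_id": conv["id"], "role": role, "content": content})
        answers[q] = answer
    return answers


def http_transport(token: str,
                   base_url: str = "https://nextneural-api.superteams.ai/api/agents/audio_agent"
                   ) -> Transport:
    """Transport that sends real HTTP requests with the bearer token."""
    def send(method: str, path: str, body: Optional[dict]) -> dict:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        req = urllib.request.Request(
            base_url + path, data=data, method=method,
            headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())
    return send
```

Usage: analyze_audio(http_transport("YOUR_TOKEN"), "550e8400-e29b-41d4-a716-446655440000", ["What were the main action items?"]).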
Performance Considerations
- Transcription Time:
- Short audio (< 5 min): 10-30 seconds
- Medium audio (5-20 min): 30-90 seconds
- Long audio (> 20 min): 1-5 minutes
- Query Response Time: Typically 1-3 seconds for standard queries
- Transcript length: Each minute of audio produces approximately 150-200 words of transcript
- Caching: Processed audio is cached; use /reparse-audio to force re-processing
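The figures above can be turned into rough client-side expectations, for example to size a progress indicator. A minimal sketch with hypothetical function names; the numbers are the typical ranges listed above, not guarantees.

```python
def transcript_word_range(minutes: float, low_wpm: int = 150, high_wpm: int = 200):
    """Rough transcript length from the 150-200 words per minute of audio figure."""
    return round(minutes * low_wpm), round(minutes * high_wpm)


def typical_transcription_time(minutes: float) -> str:
    """Map audio length to the typical transcription-time ranges."""
    if minutes < 5:
        return "10-30 seconds"
    if minutes <= 20:
        return "30-90 seconds"
    return "1-5 minutes"


print(transcript_word_range(30))       # a 30-minute meeting
print(typical_transcription_time(30))
```

A 30-minute meeting therefore yields roughly 4,500-6,000 words of transcript.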
Security Features
- User Isolation: All queries are private to your account
- Document Ownership: Only you can access your documents
- Authentication: All endpoints require valid API tokens
- Conversation Privacy: Your conversations are completely private
- UUID-based Identification: Secure, non-sequential identifiers for documents and messages
- No Internal ID Exposure: Internal database IDs are never exposed in API responses
Troubleshooting
Transcription Issues
Problem: Poor transcription quality
- Cause: Background noise, unclear speech, low audio quality
- Solution: Re-record with better audio quality, then use the /reparse-audio endpoint
Problem: Transcription failed
- Cause: Unsupported audio format, corrupted file, API issues
- Solution: Convert to MP3/WAV, verify file integrity, check API key configuration
Query Issues
Problem: "No relevant information found"
- Cause: Query doesn't match audio content, audio not processed
- Solution: Verify audio was processed, rephrase query, ask for summary first
Problem: Incomplete answers
- Cause: Query doesn't match available content well
- Solution: Ask more specific questions, request full transcript
Performance Issues
Problem: Slow transcription
- Cause: Large audio file
- Solution: Split large files, retry if timeout occurs
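For uncompressed PCM WAV recordings, Python's standard wave module is enough to split a file into fixed-length pieces. A minimal sketch: split_wav is a hypothetical helper, it does not handle MP3 or M4A (those need an external tool such as ffmpeg), fixed boundaries can cut a word in half, and it assumes each piece is then added as its own knowledge base document and processed separately.

```python
import io
import wave


def split_wav(data: bytes, chunk_seconds: float):
    """Split PCM WAV bytes into consecutive WAV chunks of at most chunk_seconds each."""
    chunks = []
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_seconds))
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy channel count, sample width and rate; the frame count in the
                # header is corrected automatically when the writer is closed
                dst.setparams(params)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```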
Problem: Slow query responses
- Cause: Complex query or large audio file
- Solution: Use more specific queries
Limitations
- Language Support: Optimized for English; other languages may have reduced accuracy
- Audio Length: Very long files (> 2 hours) may have processing delays
Future Enhancements
- Speaker diarization (identify different speakers)
- Multi-language support with automatic detection
- Real-time streaming transcription
- Audio quality analysis and enhancement
- Custom vocabulary and domain-specific training
- Integration with video files (extract audio track)