Advanced Video KYC Agent

The Advanced Video KYC Agent provides comprehensive identity verification with built-in fraud detection capabilities. It processes video files containing Aadhaar card information, extracts and validates identity details from both audio and visual components, and performs advanced security checks including lip-sync analysis, speaker detection, and background noise profiling.

Base URL​

/api/agents/advance_video_kyc

Authentication

All endpoints except the health check require authentication. Sign up at https://cloud.nextneural.ai to get your API key, then send it as a Bearer token in the Authorization header.

How It Works​

The Advanced Video KYC Agent performs multi-layered identity verification:

  1. Audio Processing: Extracts audio from video, transcribes speech, translates to English, and extracts Aadhaar details
  2. Visual Processing: Detects Aadhaar card in video frames and extracts text using OCR
  3. Fraud Detection: Runs three parallel security checks:
    • Background Noise Analysis: Detects loud or unstable audio environments
    • Speaker Count Detection: Identifies multiple speakers (red flag for fraud)
    • Lip Sync Verification: Detects deepfakes and dubbed audio
  4. Cross-Validation: Compares audio and visual data with fraud penalties to calculate final verification score
  5. Background Processing: Uses Celery workers for scalable, persistent task processing

Key Differences from Basic Video KYC

| Feature | Basic Video KYC | Advanced Video KYC |
| --- | --- | --- |
| Fraud Detection | None | Full (Lip Sync, Speaker Count, Noise) |
| Processing | Synchronous | Async (Celery Workers) |
| Score Breakdown | Simple | Detailed with categories |
| Deepfake Detection | No | Yes (Lip Sync Analysis) |
| Multi-Speaker Detection | No | Yes (Replicate AI) |
| Task Persistence | No | Yes (Redis Queue) |

Endpoints​

1. Health Check​

Check if the Advanced Video KYC agent service is running.

Endpoint: GET /health

Authentication: None required

Response:

{
  "status": "healthy",
  "service": "Advanced Video KYC Agent"
}

2. Process KYC Video​

Process a video file for KYC verification with fraud detection. Returns immediately with "PROCESSING" status and dispatches work to background Celery workers.

Endpoint: POST /advance_video_kyc

Authentication: Required (scope: agent:advance_video_kyc)

Request Body:

{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "force_reprocess": false
}

Parameters:

  • document_id (required, string): UUID of the knowledge base document containing the video file
  • force_reprocess (optional, default: false): Re-process video even if already analyzed

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/advance_video_kyc" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "550e8400-e29b-41d4-a716-446655440000",
    "force_reprocess": false
  }'

Response (Processing Started):

{
  "status": "PROCESSING",
  "message": "Analysis queued in background worker",
  "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
  "taskId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Response (Cached Result):

{
  "status": "COMPLETED",
  "record": {
    "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
    "audioTranscript": "My name is Rajesh Kumar Singh...",
    "parsedAudioData": { ... },
    "parsedVisualData": { ... },
    "fraudDetection": { ... },
    "verificationScore": 85.0,
    "scoreBreakdown": [ ... ],
    "alreadyEvaluated": true
  }
}

Response (Already Processing):

{
  "status": "PROCESSING",
  "message": "Analysis is already in progress",
  "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000"
}

Notes:

  • Returns immediately with PROCESSING status; poll the Check Processing Status endpoint to detect completion
  • Tasks persist in the Redis queue even if the backend restarts
  • Failed tasks are retried automatically (2 retries with exponential backoff)
  • The worker pool enforces concurrency limits
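The request above can also be issued from Python. The following is a minimal sketch using only the standard library; the function names build_kyc_request and submit_kyc are illustrative, and BASE_URL is taken from the curl examples on this page rather than from any official client.

```python
import json
import urllib.request

# Host and path copied from the curl examples; adjust if your deployment differs.
BASE_URL = "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc"


def build_kyc_request(document_id, token, force_reprocess=False):
    """Build (but do not send) the POST request that queues a KYC analysis."""
    body = json.dumps(
        {"document_id": document_id, "force_reprocess": force_reprocess}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/advance_video_kyc",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def submit_kyc(document_id, token, force_reprocess=False, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_kyc_request(document_id, token, force_reprocess)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload easy to inspect or log before any network call is made.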

3. Check Processing Status​

Poll this endpoint to check if background analysis is complete.

Endpoint: GET /check_status/{document_id}

Authentication: Required (scope: agent:advance_video_kyc)

Path Parameters:

  • document_id (required): The UUID of the knowledge base document

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/check_status/550e8400-e29b-41d4-a716-446655440000" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response (Processing):

{
  "status": "PROCESSING",
  "updated_at": "2025-01-14T10:30:00"
}

Response (Completed):

{
  "status": "COMPLETED",
  "record": {
    "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
    "audioTranscript": "My name is Rajesh Kumar Singh. My date of birth is 15th August 1990. My Aadhaar number is 1234 5678 9012.",
    "parsedAudioData": {
      "name": "Rajesh Kumar Singh",
      "dob": "15/08/1990",
      "aadharId": "1234 5678 9012"
    },
    "parsedVisualData": {
      "name": "RAJESH KUMAR SINGH",
      "dob": "15/08/1990",
      "aadharId": "1234 5678 9012",
      "gender": "Male",
      "confidence": 92.0
    },
    "fraudDetection": {
      "backgroundNoiseDetected": false,
      "backgroundNoiseType": null,
      "speakerCount": 1,
      "lipSyncFraudDetected": false
    },
    "verificationScore": 100.0,
    "scoreBreakdown": [
      {
        "category": "Name Match",
        "description": "Audio and visual names match exactly",
        "impact": "+40",
        "type": "positive"
      },
      {
        "category": "Date of Birth Match",
        "description": "Audio and visual DOB match exactly",
        "impact": "+20",
        "type": "positive"
      },
      {
        "category": "Aadhaar Number Match",
        "description": "Audio and visual Aadhaar numbers match exactly",
        "impact": "+40",
        "type": "positive"
      },
      {
        "category": "Background Noise",
        "description": "Clean audio environment detected",
        "impact": "0",
        "type": "positive"
      },
      {
        "category": "Speaker Count",
        "description": "Single speaker detected (valid)",
        "impact": "0",
        "type": "positive"
      },
      {
        "category": "Lip Sync Integrity",
        "description": "Lip sync verification passed",
        "impact": "0",
        "type": "positive"
      }
    ],
    "detectionConfidence": 92.0,
    "warning": null,
    "alreadyEvaluated": false,
    "evaluatedAt": "2025-01-14T10:30:00",
    "isReevaluation": false,
    "createdAt": "2025-01-14T10:30:00",
    "updatedAt": "2025-01-14T10:30:00"
  }
}

Response (Failed):

{
  "status": "FAILED",
  "error": "Error processing video: No face detected in video"
}

Response (Not Found):

{
  "status": "NOT_FOUND"
}
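The four response shapes above suggest a simple polling loop that stops on any status other than PROCESSING. The sketch below assumes a caller-supplied fetch_status callable (hypothetical) that performs the GET request and returns the decoded JSON; the 5-second interval and 120-attempt cap are illustrative defaults, not documented limits.

```python
import time

# Statuses after which further polling is pointless.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "NOT_FOUND"}


def poll_status(fetch_status, interval=5.0, max_attempts=120, sleep=time.sleep):
    """Call fetch_status() until it reports a terminal status.

    fetch_status: zero-argument callable returning the decoded JSON dict
                  from GET /check_status/{document_id}.
    sleep:        injectable so the loop can be tested without waiting.
    """
    for attempt in range(max_attempts):
        result = fetch_status()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"still processing after {max_attempts} polls")
```

Bounding the number of attempts avoids an endless loop when a task never leaves the PROCESSING state.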

4. Get My KYC Records​

Retrieve all KYC records for the authenticated user.

Endpoint: GET /my_kyc_records

Authentication: Required (scope: agent:advance_video_kyc)

Query Parameters:

  • limit (optional, default: 100): Maximum number of records to return

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/my_kyc_records?limit=50" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "success": true,
  "userId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "count": 3,
  "records": [
    {
      "id": 789,
      "analyzedAt": "2025-01-14T10:30:00",
      "verificationScore": 100.0,
      "name": "Rajesh Kumar Singh",
      "status": "COMPLETED"
    },
    {
      "id": 788,
      "analyzedAt": "2025-01-13T15:20:00",
      "verificationScore": 60.0,
      "name": "Priya Sharma",
      "status": "COMPLETED"
    },
    {
      "id": 787,
      "analyzedAt": "2025-01-12T09:45:00",
      "verificationScore": 0,
      "name": "Unknown",
      "status": "PROCESSING"
    }
  ]
}

Notes:

  • Returns records in reverse chronological order (newest first)
  • status field indicates current processing state (PENDING, PROCESSING, COMPLETED, FAILED)
  • Use status to show appropriate UI (spinner for PROCESSING, checkmark for COMPLETED)

5. Get Specific KYC Record​

Retrieve detailed information for a specific KYC record.

Endpoint: GET /kyc_record/{record_id}

Authentication: Required (scope: agent:advance_video_kyc)

Path Parameters:

  • record_id (required): The ID of the KYC record to retrieve

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/kyc_record/789" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "success": true,
  "record": {
    "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
    "audioTranscript": "My name is Rajesh Kumar Singh...",
    "parsedAudioData": {
      "name": "Rajesh Kumar Singh",
      "dob": "15/08/1990",
      "aadharId": "1234 5678 9012"
    },
    "parsedVisualData": {
      "name": "RAJESH KUMAR SINGH",
      "dob": "15/08/1990",
      "aadharId": "1234 5678 9012",
      "gender": "Male",
      "confidence": 92.0
    },
    "fraudDetection": {
      "backgroundNoiseDetected": false,
      "backgroundNoiseType": null,
      "speakerCount": 1,
      "lipSyncFraudDetected": false
    },
    "verificationScore": 100.0,
    "scoreBreakdown": [ ... ],
    "detectionConfidence": 92.0,
    "warning": null,
    "alreadyEvaluated": false,
    "evaluatedAt": "2025-01-14T10:30:00",
    "isReevaluation": false,
    "createdAt": "2025-01-14T10:30:00",
    "updatedAt": "2025-01-14T10:30:00"
  }
}

Data Models​

Parsed Audio Data Structure​

{
  "name": "Full Name",
  "dob": "DD/MM/YYYY",
  "aadharId": "XXXX XXXX XXXX"
}

Parsed Visual Data Structure​

{
  "name": "FULL NAME",
  "dob": "DD/MM/YYYY",
  "aadharId": "XXXX XXXX XXXX",
  "gender": "Male/Female",
  "confidence": 92.0
}

Fraud Detection Structure​

{
  "backgroundNoiseDetected": true,
  "backgroundNoiseType": "Loud Background: Room noise is -35.5dB",
  "speakerCount": 1,
  "lipSyncFraudDetected": false
}

Score Breakdown Item Structure​

{
  "category": "Name Match",
  "description": "Audio and visual names match exactly",
  "impact": "+40",
  "type": "positive"
}

Type Values:

  • positive: Check passed, contributes positively to score
  • warning: Partial match or minor issue
  • negative: Check failed, may reduce score

Full Record Structure​

{
  "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
  "audioTranscript": "Transcribed and translated speech...",
  "parsedAudioData": {
    "name": "Full Name",
    "dob": "DD/MM/YYYY",
    "aadharId": "XXXX XXXX XXXX"
  },
  "parsedVisualData": {
    "name": "FULL NAME",
    "dob": "DD/MM/YYYY",
    "aadharId": "XXXX XXXX XXXX",
    "gender": "Male/Female",
    "confidence": 92.0
  },
  "fraudDetection": {
    "backgroundNoiseDetected": false,
    "backgroundNoiseType": null,
    "speakerCount": 1,
    "lipSyncFraudDetected": false
  },
  "verificationScore": 100.0,
  "scoreBreakdown": [ ... ],
  "detectionConfidence": 92.0,
  "warning": null,
  "alreadyEvaluated": false,
  "evaluatedAt": "2025-01-14T10:30:00",
  "isReevaluation": false,
  "createdAt": "2025-01-14T10:30:00",
  "updatedAt": "2025-01-14T10:30:00"
}

Verification Score Calculation

The verification score (0-100) is the sum of the data-matching points minus any fraud detection penalties:

Positive Score Components (Max: 100 points)

| Comparison | Points | Condition |
| --- | --- | --- |
| Name - Exact Match | +40 | Audio name exactly matches visual name |
| Name - Partial Match | +20 | Some name parts match between audio and visual |
| DOB - Exact Match | +20 | Date of birth matches exactly |
| Aadhaar - Exact Match | +40 | Aadhaar number matches exactly |

Fraud Detection Penalties

| Check | Penalty | Condition |
| --- | --- | --- |
| Background Noise | -10 | High or unstable background noise detected |
| No Speakers | -40 | No speakers detected in audio |
| Multiple Speakers | -20 | More than 1 speaker detected |
| Lip Sync Fraud | -30 | Lip movements don't match audio (deepfake indicator) |
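The point and penalty values above can be combined as in the following sketch. It is an illustration, not the service's actual implementation: the case-insensitive name comparison is inferred from the sample response (where "Rajesh Kumar Singh" and "RAJESH KUMAR SINGH" count as an exact match), while the token-overlap rule for partial matches, the digit-only Aadhaar comparison, and the clamping to 0-100 are assumptions.

```python
def name_match(audio_name, visual_name):
    """Classify a name comparison as 'exact', 'partial', or 'none' (assumed rules)."""
    audio_parts = audio_name.casefold().split()
    visual_parts = visual_name.casefold().split()
    if audio_parts and audio_parts == visual_parts:
        return "exact"
    if set(audio_parts) & set(visual_parts):
        return "partial"
    return "none"


def _digits(value):
    """Keep only the digits so '1234 5678 9012' and '123456789012' compare equal."""
    return "".join(ch for ch in (value or "") if ch.isdigit())


def verification_score(audio, visual, fraud):
    """Sum the documented match points, subtract the documented penalties."""
    score = {"exact": 40, "partial": 20, "none": 0}[
        name_match(audio.get("name") or "", visual.get("name") or "")
    ]
    if audio.get("dob") and audio.get("dob") == visual.get("dob"):
        score += 20
    audio_id = _digits(audio.get("aadharId"))
    if audio_id and audio_id == _digits(visual.get("aadharId")):
        score += 40

    if fraud.get("backgroundNoiseDetected"):
        score -= 10
    speakers = fraud.get("speakerCount")
    if speakers == 0:
        score -= 40
    elif speakers is not None and speakers > 1:
        score -= 20
    if fraud.get("lipSyncFraudDetected"):
        score -= 30

    # Assumption: the result is clamped to the documented 0-100 range.
    return float(max(0, min(100, score)))
```

With the sample record from the Check Processing Status response, all three comparisons match and no penalty applies, which gives 40 + 20 + 40 = 100.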

Score Interpretation

| Score Range | Interpretation | Recommended Action |
| --- | --- | --- |
| 90-100 | Excellent | Auto-approve |
| 70-89 | Good | Accept with minor review |
| 50-69 | Moderate | Manual review required |
| 30-49 | Poor | Request new video |
| 0-29 | Failed / Fraud Risk | Reject, investigate if fraud flags present |
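A client can map a score to its band with a short lookup. This sketch assumes each band starts at its lower bound, so a fractional score such as 89.5 falls in the 70-89 band; the function name interpret_score is illustrative.

```python
# (lower bound, interpretation, recommended action), highest band first.
SCORE_BANDS = [
    (90, "Excellent", "Auto-approve"),
    (70, "Good", "Accept with minor review"),
    (50, "Moderate", "Manual review required"),
    (30, "Poor", "Request new video"),
    (0, "Failed / Fraud Risk", "Reject, investigate if fraud flags present"),
]


def interpret_score(score):
    """Return (interpretation, recommended_action) for a 0-100 score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, interpretation, action in SCORE_BANDS:
        if score >= lower_bound:
            return interpretation, action
```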

Fraud Detection Details​

1. Background Noise Analysis​

Analyzes audio quality using librosa for spectral analysis.

Detection Criteria:

  • Noise Floor Limit: -42.0 dB (a noise floor below this level is considered clean)
  • Stability Limit: 8.0 dB fluctuation (excessive variation indicates unstable environment)

Detected Issues:

  • Loud background (room noise exceeds threshold)
  • Unstable background (noise fluctuates significantly)
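To make the two thresholds concrete, here is a pure-Python sketch of a noise check. It does not use librosa and is not the service's algorithm: estimating the noise floor as the mean level of the quietest 20% of frames, measuring fluctuation as the spread of those frames, and the wording of the "Unstable Background" message are all assumptions. Only the two limits and the "Loud Background" message format come from this page.

```python
import math

NOISE_FLOOR_LIMIT_DB = -42.0  # documented noise floor limit
STABILITY_LIMIT_DB = 8.0      # documented fluctuation limit


def frame_levels_db(samples, frame_size=1024):
    """RMS level in dB (relative to full scale 1.0) for each complete frame."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        levels.append(20 * math.log10(max(rms, 1e-10)))  # avoid log10(0)
    return levels


def analyze_noise(levels_db, quiet_fraction=0.2):
    """Return (noise_detected, noise_type) from per-frame dB levels."""
    if not levels_db:
        raise ValueError("no audio frames to analyze")
    # Assumption: the quietest frames approximate the room noise between words.
    quiet = sorted(levels_db)[:max(1, int(len(levels_db) * quiet_fraction))]
    noise_floor = sum(quiet) / len(quiet)
    fluctuation = max(quiet) - min(quiet)
    if noise_floor > NOISE_FLOOR_LIMIT_DB:
        return True, f"Loud Background: Room noise is {noise_floor:.1f}dB"
    if fluctuation > STABILITY_LIMIT_DB:
        # Hypothetical message text; the real wording is not documented here.
        return True, f"Unstable Background: Noise varies by {fluctuation:.1f}dB"
    return False, None
```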

2. Speaker Count Detection​

Uses Replicate's whisper-diarization model to detect number of distinct speakers.

Red Flags:

  • 0 speakers: No valid speech detected
  • More than 1 speaker: Multiple people speaking (potential coaching/prompting)

Valid Scenario:

  • Exactly 1 speaker detected
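The rules above reduce to counting distinct speaker labels. The sketch below assumes the diarization output has already been reduced to a list of segments, each a dict with a "speaker" label and the transcribed "text"; that shape is an assumption made for illustration and should be checked against the model's actual output.

```python
def count_speakers(segments):
    """Count distinct speaker labels among segments that contain speech."""
    return len({
        segment["speaker"]
        for segment in segments
        if segment.get("text", "").strip()  # ignore empty segments
    })


def speaker_check(count):
    """Apply the documented rule: exactly one speaker is the only valid case."""
    if count == 0:
        return {"valid": False, "reason": "No valid speech detected"}
    if count > 1:
        return {"valid": False,
                "reason": "Multiple people speaking (potential coaching/prompting)"}
    return {"valid": True, "reason": None}
```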

3. Lip Sync Verification (Deepfake Detection)​

Advanced analysis using MediaPipe face mesh and audio correlation.

Detection Methods:

  • Ghost Speaker Detection: Audio is loud but mouth is closed (>35% mismatch)
  • Correlation Analysis: Mouth movements don't correlate with audio (score < 0.20)
  • Dubbing Detection: Audio delayed by >100ms from visual

Detected Issues:

  • Lip sync mismatch (potential deepfake)
  • Dubbed audio (voice recorded separately)
  • Ghost speaker (audio playing without corresponding mouth movement)
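Two of the three detection methods can be illustrated on per-frame signals. This sketch assumes the video has already been reduced to two equal-length series, a mouth-opening measure and an audio energy measure per frame, both normalized to 0-1; the open_threshold and loud_threshold values are hypothetical, and the dubbing (delay) check is omitted. Only the 35% and 0.20 limits come from this page.

```python
import math

GHOST_MISMATCH_LIMIT = 0.35  # documented: >35% of loud frames with a closed mouth
MIN_CORRELATION = 0.20       # documented: correlation score below 0.20 is suspicious


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def lip_sync_flags(mouth_open, audio_energy, open_threshold=0.1, loud_threshold=0.5):
    """Flag ghost-speaker and low-correlation cases from per-frame measurements."""
    if not mouth_open or len(mouth_open) != len(audio_energy):
        raise ValueError("series must be non-empty and of equal length")
    loud_frames = [i for i, energy in enumerate(audio_energy) if energy >= loud_threshold]
    closed_while_loud = sum(1 for i in loud_frames if mouth_open[i] < open_threshold)
    ghost_ratio = closed_while_loud / len(loud_frames) if loud_frames else 0.0
    correlation = pearson(mouth_open, audio_energy)
    return {
        "ghostSpeaker": ghost_ratio > GHOST_MISMATCH_LIMIT,
        "lowCorrelation": correlation < MIN_CORRELATION,
        "ghostRatio": ghost_ratio,
        "correlation": correlation,
    }
```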

Processing States

| Status | Description |
| --- | --- |
| PENDING | Record created, waiting for processing to start |
| PROCESSING | Video is being analyzed by Celery worker |
| COMPLETED | Processing finished successfully |
| FAILED | Processing failed (see warning_message for details) |

Error Responses​

All endpoints may return the following error responses:

400 Bad Request:

{
  "detail": "document_id is required for security validation"
}

403 Forbidden (Document Access):

{
  "detail": "Access denied. Document 550e8400-e29b-41d4-a716-446655440000 does not belong to user a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

403 Forbidden (Invalid Document):

{
  "detail": "Document with UUID 550e8400-e29b-41d4-a716-446655440000 does not exist in knowledgebase"
}

404 Not Found:

{
  "detail": "File path not found for KB document."
}

404 Not Found (Record):

{
  "detail": "KYC record not found or you don't have access to it"
}

500 Internal Server Error:

{
  "detail": "Error initiating processing: [error message]"
}
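Since every error body shown above carries a single "detail" string, a client can surface it with a small helper. This is a sketch; the function name error_detail and the fallback text are illustrative, and the helper tolerates bodies that are not JSON.

```python
import json


def error_detail(status_code, body):
    """Turn an HTTP status code and raw response body (bytes) into a readable message."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        payload = None  # body was not valid UTF-8 JSON
    if isinstance(payload, dict) and isinstance(payload.get("detail"), str):
        return f"HTTP {status_code}: {payload['detail']}"
    return f"HTTP {status_code}: (no detail provided)"
```

With urllib, for example, the arguments would come from the caught urllib.error.HTTPError as err.code and err.read().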

Warning Messages​

The system returns prioritized warning messages:

Critical Warnings (Fraud Detected)​

  • "CRITICAL: Lip sync mismatch detected (Potential Deepfake/Dubbing)"
    • Highest priority, indicates possible video manipulation

Security Warnings​

  • "WARNING: Multiple speakers detected in audio"
    • Multiple people detected, possible coaching
  • "WARNING: High background noise detected ({noise_type})"
    • Audio quality compromised

Data Quality Warnings​

  • "No Aadhaar card detected in video"
    • OCR couldn't find Aadhaar card in any frame
  • "Incomplete data extracted from video. Please upload a clearer video"
    • Some fields (name, DOB, or Aadhaar) missing from extracted data
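The prioritization can be sketched as a chain of checks that returns the first applicable message. The lip sync warning is documented as the highest priority; the relative order of the remaining warnings below simply follows the order in which they are listed here and is an assumption, as are the function name and its parameters.

```python
REQUIRED_FIELDS = ("name", "dob", "aadharId")


def select_warning(fraud, parsed_visual, aadhaar_detected=True):
    """Return the highest-priority warning message, or None if nothing applies."""
    if fraud.get("lipSyncFraudDetected"):
        return "CRITICAL: Lip sync mismatch detected (Potential Deepfake/Dubbing)"
    speakers = fraud.get("speakerCount")
    if speakers is not None and speakers > 1:
        return "WARNING: Multiple speakers detected in audio"
    if fraud.get("backgroundNoiseDetected"):
        noise_type = fraud.get("backgroundNoiseType")
        return f"WARNING: High background noise detected ({noise_type})"
    if not aadhaar_detected:
        return "No Aadhaar card detected in video"
    if any(not parsed_visual.get(field) for field in REQUIRED_FIELDS):
        return "Incomplete data extracted from video. Please upload a clearer video"
    return None
```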

Best Practices​

Video Quality Requirements​

  • Resolution: Minimum 720p recommended
  • Duration: 10-30 seconds optimal
  • Lighting: Good, even lighting on the Aadhaar card
  • Focus: Card should be clearly visible and in focus
  • Audio: Clear speech, minimal background noise
  • Content: User should speak name, DOB, and Aadhaar number clearly
  • Single Speaker: Only the applicant should speak in the video

Recording Guidelines​

  1. Record in a quiet environment
  2. Hold the Aadhaar card steady in frame for at least 3-5 seconds
  3. Ensure card is flat and not tilted
  4. Avoid glare or reflections on the card
  5. Speak clearly and at moderate pace
  6. Pronounce Aadhaar number digit by digit
  7. Use proper date format (day, month, year)
  8. Ensure only one person speaks throughout

Integration Workflow​

# 1. Upload video to knowledge base (via your file upload system)
#    This creates a KB document with a UUID

# 2. Start KYC processing (returns immediately)
curl -X POST "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/advance_video_kyc" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "550e8400-e29b-41d4-a716-446655440000"
  }'

# 3. Poll for status (recommended: every 5 seconds); give up after 120 polls (10 minutes)
for attempt in $(seq 1 120); do
  STATUS=$(curl -s -X GET "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/check_status/550e8400-e29b-41d4-a716-446655440000" \
    -H "Authorization: Bearer YOUR_TOKEN" | jq -r '.status')

  # Stop on any terminal status; NOT_FOUND means no record exists for this document
  if [ "$STATUS" = "COMPLETED" ] || [ "$STATUS" = "FAILED" ] || [ "$STATUS" = "NOT_FOUND" ]; then
    break
  fi
  sleep 5
done

# 4. Get full record details (789 is the record id, as listed by GET /my_kyc_records)
curl -X GET "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/kyc_record/789" \
  -H "Authorization: Bearer YOUR_TOKEN"

# 5. Decision logic based on score and fraud flags:
# - Score >= 70 AND no fraud flags → Approve
# - Score >= 50 AND no critical fraud → Manual Review
# - Fraud flags present OR score < 50 → Reject/Investigate

# 6. Re-process if needed (e.g., improved AI model)
curl -X POST "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/advance_video_kyc" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "550e8400-e29b-41d4-a716-446655440000",
    "force_reprocess": true
  }'
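The decision rules in step 5 could be coded as below. This is a sketch under two assumptions: the rules are evaluated in the order listed, so a score of 80 with only a noise flag goes to manual review rather than rejection, and "critical fraud" means the lip sync flag, following the Critical Warnings section. The returned labels are illustrative.

```python
def decide(score, fraud):
    """Map a verification score and fraudDetection dict to a review decision."""
    critical = bool(fraud.get("lipSyncFraudDetected"))
    speakers = fraud.get("speakerCount")
    speaker_flag = speakers is not None and speakers != 1
    any_flag = critical or speaker_flag or bool(fraud.get("backgroundNoiseDetected"))

    if score >= 70 and not any_flag:
        return "APPROVE"
    if score >= 50 and not critical:
        return "MANUAL_REVIEW"
    return "REJECT_OR_INVESTIGATE"
```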

Performance Considerations​

  • Processing Time: Typically 30-60 seconds per video (depends on video length)
  • Fraud Checks: Run concurrently with video/audio processing
  • Caching: Completed results are cached; use force_reprocess=true to re-analyze
  • Concurrent Processing: Multiple videos processed by separate Celery workers
  • Task Persistence: Tasks survive server restarts (persisted in Redis)
  • Retry Logic: Automatic 2 retries with exponential backoff on failure

Security Features​

  • User Isolation: All queries are private to your account
  • Document Ownership: Only you can access your documents
  • Authentication: All endpoints require valid API tokens with agent:advance_video_kyc scope
  • UUID-based Identification: Secure, non-sequential identifiers for documents
  • No File Path Exposure: File paths are never exposed in API requests or responses
  • Fraud Detection: Multi-layered security checks for deepfakes and manipulation
  • Background Workers: Processing isolated in separate worker processes

Troubleshooting​

Low Verification Scores​

Possible Causes:

  • Audio and visual data don't match
  • Fraud detection penalties applied
  • Poor video quality

Solutions:

  • Ensure person speaks details matching their Aadhaar card
  • Record in quiet environment (reduces noise penalty)
  • Ensure face is clearly visible (for lip sync analysis)

Lip Sync Fraud Detected​

Possible Causes:

  • Video was dubbed or audio replaced
  • Face not clearly visible throughout video
  • Poor lighting on face
  • Video recorded without audio, dubbed later

Solutions:

  • Record video with live audio in one take
  • Ensure face is well-lit and clearly visible
  • Don't use edited or spliced videos

Multiple Speakers Detected​

Possible Causes:

  • Another person speaking in background
  • TV/Radio playing in background
  • Someone coaching the applicant

Solutions:

  • Record in a private, quiet room
  • Ensure only the applicant speaks
  • Turn off all audio sources

Background Noise Issues​

Possible Causes:

  • Recording in noisy environment
  • Poor microphone quality
  • Wind or fan noise

Solutions:

  • Record in a quiet indoor environment
  • Use device's primary microphone
  • Reduce background noise sources

No Aadhaar Card Detected​

Possible Causes:

  • Card not visible in any frame
  • Card too small or far from camera
  • Poor lighting or focus

Solutions:

  • Hold card closer to camera
  • Ensure entire card is visible for 3-5 seconds
  • Improve lighting conditions
  • Keep card flat and avoid angles

Processing Stuck in PROCESSING State​

Possible Causes:

  • Celery worker overloaded
  • Worker crashed during processing
  • Redis connection issues

Solutions:

  • Wait and poll again (workers retry failed tasks automatically)
  • Resubmit with force_reprocess=true to restart the analysis
  • Contact support if the status stays PROCESSING for more than 10 minutes