Advanced Video KYC Agent

The Advanced Video KYC Agent provides comprehensive identity verification with built-in fraud detection capabilities. It processes video files containing Aadhaar card information, extracts and validates identity details from both audio and visual components, and performs advanced security checks including lip-sync analysis, speaker detection, and background noise profiling.

Base URL

/api/agents/advance_video_kyc

Authentication

All endpoints except the health check require authentication. Sign up at https://cloud.nextneural.ai to get your API key, then send it as a Bearer token in the Authorization header.
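
As a minimal sketch of how a client might attach the key, the Python snippet below builds the request headers and full endpoint URLs. The host and the Bearer header format are taken from the request examples in this document; the helper names `auth_headers` and `endpoint_url` and the `NEXTNEURAL_API_KEY` environment variable are illustrative assumptions, not part of the API.

```python
import os

# Host taken from the curl examples in this document; path from "Base URL".
API_HOST = "https://nextneural-api.superteams.ai"
BASE_PATH = "/api/agents/advance_video_kyc"


def auth_headers(api_key: str) -> dict:
    """Return headers carrying the API key as a Bearer token."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def endpoint_url(path: str) -> str:
    """Join the base URL with an endpoint path such as 'my_kyc_records'."""
    return f"{API_HOST}{BASE_PATH}/{path.lstrip('/')}"


# NEXTNEURAL_API_KEY is a hypothetical variable name chosen for this sketch.
headers = auth_headers(os.environ.get("NEXTNEURAL_API_KEY", "YOUR_TOKEN"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.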

How It Works

The Advanced Video KYC Agent performs multi-layered identity verification:

Audio Processing: Extracts audio from video, transcribes speech, translates to English, and extracts Aadhaar details
Visual Processing: Detects Aadhaar card in video frames and extracts text using OCR
Fraud Detection: Runs three parallel security checks:
- Background Noise Analysis: Detects loud or unstable audio environments
- Speaker Count Detection: Identifies multiple speakers (red flag for fraud)
- Lip Sync Verification: Detects deepfakes and dubbed audio
Cross-Validation: Compares audio and visual data with fraud penalties to calculate final verification score
Background Processing: Uses Celery workers for scalable, persistent task processing

Key Differences from Basic Video KYC

Feature	Basic Video KYC	Advanced Video KYC
Fraud Detection	None	Full (Lip Sync, Speaker Count, Noise)
Processing	Synchronous	Async (Celery Workers)
Score Breakdown	Simple	Detailed with categories
Deepfake Detection	No	Yes (Lip Sync Analysis)
Multi-Speaker Detection	No	Yes (Replicate AI)
Task Persistence	No	Yes (Redis Queue)

Endpoints

1. Health Check

Check if the Advanced Video KYC agent service is running.

Endpoint: GET /health

Authentication: None required

Response:

{
  "status": "healthy",
  "service": "Advanced Video KYC Agent"
}

2. Process KYC Video

Process a video file for KYC verification with fraud detection. Returns immediately with "PROCESSING" status and dispatches work to background Celery workers.

Endpoint: POST /advance_video_kyc

Authentication: Required (scope: agent:advance_video_kyc)

Request Body:

{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "force_reprocess": false
}

Parameters:

document_id (required, string): UUID of the knowledge base document containing the video file
force_reprocess (optional, boolean, default: false): Re-process the video even if it has already been analyzed

Request Example:

curl -X POST "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/advance_video_kyc" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "550e8400-e29b-41d4-a716-446655440000",
    "force_reprocess": false
  }'

Response (Processing Started):

{
  "status": "PROCESSING",
  "message": "Analysis queued in background worker",
  "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
  "taskId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Response (Cached Result):

{
  "status": "COMPLETED",
  "record": {
    "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
    "audioTranscript": "My name is Rajesh Kumar Singh...",
    "parsedAudioData": { ... },
    "parsedVisualData": { ... },
    "fraudDetection": { ... },
    "verificationScore": 85.0,
    "scoreBreakdown": [ ... ],
    "alreadyEvaluated": true
  }
}

Response (Already Processing):

{
  "status": "PROCESSING",
  "message": "Analysis is already in progress",
  "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000"
}

Notes:

Returns immediately with PROCESSING status unless a cached result exists (then COMPLETED is returned); poll the Check Processing Status endpoint to detect completion
Tasks persist in Redis queue even if backend restarts
Automatic retry on failure (2 retries with exponential backoff)
Worker pool manages concurrency limits
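
The request above could be assembled in Python roughly as follows. This is a sketch using only the standard library: `build_process_request` and `needs_polling` are hypothetical helper names, and the function builds the request without sending it so that it can be inspected first.

```python
import json
import urllib.request

# URL copied from the curl example above.
PROCESS_URL = (
    "https://nextneural-api.superteams.ai"
    "/api/agents/advance_video_kyc/advance_video_kyc"
)


def build_process_request(document_id: str, token: str,
                          force_reprocess: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts KYC processing."""
    body = json.dumps({
        "document_id": document_id,
        "force_reprocess": force_reprocess,
    }).encode("utf-8")
    return urllib.request.Request(
        PROCESS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def needs_polling(response: dict) -> bool:
    """True when the response means work is still running in the background."""
    return response.get("status") == "PROCESSING"
```

Passing the returned object to `urllib.request.urlopen` sends it. A COMPLETED response already carries the cached record, so polling is only needed when `needs_polling` returns true.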

3. Check Processing Status

Poll this endpoint to check if background analysis is complete.

Endpoint: GET /check_status/{document_id}

Authentication: Required (scope: agent:advance_video_kyc)

Path Parameters:

document_id (required): The UUID of the knowledge base document

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/check_status/550e8400-e29b-41d4-a716-446655440000" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response (Processing):

{
  "status": "PROCESSING",
  "updated_at": "2025-01-14T10:30:00"
}

Response (Completed):

{
  "status": "COMPLETED",
  "record": {
    "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
    "audioTranscript": "My name is Rajesh Kumar Singh. My date of birth is 15th August 1990. My Aadhaar number is 1234 5678 9012.",
    "parsedAudioData": {
      "name": "Rajesh Kumar Singh",
      "dob": "15/08/1990",
      "aadharId": "1234 5678 9012"
    },
    "parsedVisualData": {
      "name": "RAJESH KUMAR SINGH",
      "dob": "15/08/1990",
      "aadharId": "1234 5678 9012",
      "gender": "Male",
      "confidence": 92.0
    },
    "fraudDetection": {
      "backgroundNoiseDetected": false,
      "backgroundNoiseType": null,
      "speakerCount": 1,
      "lipSyncFraudDetected": false
    },
    "verificationScore": 100.0,
    "scoreBreakdown": [
      {
        "category": "Name Match",
        "description": "Audio and visual names match exactly",
        "impact": "+40",
        "type": "positive"
      },
      {
        "category": "Date of Birth Match",
        "description": "Audio and visual DOB match exactly",
        "impact": "+20",
        "type": "positive"
      },
      {
        "category": "Aadhaar Number Match",
        "description": "Audio and visual Aadhaar numbers match exactly",
        "impact": "+40",
        "type": "positive"
      },
      {
        "category": "Background Noise",
        "description": "Clean audio environment detected",
        "impact": "0",
        "type": "positive"
      },
      {
        "category": "Speaker Count",
        "description": "Single speaker detected (valid)",
        "impact": "0",
        "type": "positive"
      },
      {
        "category": "Lip Sync Integrity",
        "description": "Lip sync verification passed",
        "impact": "0",
        "type": "positive"
      }
    ],
    "detectionConfidence": 92.0,
    "warning": null,
    "alreadyEvaluated": false,
    "evaluatedAt": "2025-01-14T10:30:00",
    "isReevaluation": false,
    "createdAt": "2025-01-14T10:30:00",
    "updatedAt": "2025-01-14T10:30:00"
  }
}

Response (Failed):

{
  "status": "FAILED",
  "error": "Error processing video: No face detected in video"
}

Response (Not Found):

{
  "status": "NOT_FOUND"
}
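
A polling loop over these responses might look like the sketch below. It treats COMPLETED, FAILED, and NOT_FOUND as terminal, uses the 5-second interval suggested in the Integration Workflow, and gives up after 10 minutes, the point at which Troubleshooting suggests contacting support. The function name and the injected `fetch_status` callable are assumptions made so that the loop can be exercised without a network.

```python
import time
from typing import Callable

TERMINAL_STATES = {"COMPLETED", "FAILED", "NOT_FOUND"}


def poll_status(fetch_status: Callable[[], dict],
                interval_s: float = 5.0,
                timeout_s: float = 600.0,
                sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_status until a terminal status arrives or the timeout passes.

    fetch_status is any callable returning the parsed JSON body of
    GET /check_status/{document_id}.
    """
    waited = 0.0
    while True:
        body = fetch_status()
        if body.get("status") in TERMINAL_STATES:
            return body
        if waited >= timeout_s:
            raise TimeoutError(
                f"status still {body.get('status')} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` lets tests run instantly; production callers can leave the default.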

4. Get My KYC Records

Retrieve all KYC records for the authenticated user.

Endpoint: GET /my_kyc_records

Authentication: Required (scope: agent:advance_video_kyc)

Query Parameters:

limit (optional, default: 100): Maximum number of records to return

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/my_kyc_records?limit=50" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "success": true,
  "userId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "count": 3,
  "records": [
    {
      "id": 789,
      "analyzedAt": "2025-01-14T10:30:00",
      "verificationScore": 100.0,
      "name": "Rajesh Kumar Singh",
      "status": "COMPLETED"
    },
    {
      "id": 788,
      "analyzedAt": "2025-01-13T15:20:00",
      "verificationScore": 60.0,
      "name": "Priya Sharma",
      "status": "COMPLETED"
    },
    {
      "id": 787,
      "analyzedAt": "2025-01-12T09:45:00",
      "verificationScore": 0,
      "name": "Unknown",
      "status": "PROCESSING"
    }
  ]
}

Notes:

Returns records in reverse chronological order (newest first)
status field indicates current processing state (PENDING, PROCESSING, COMPLETED, FAILED)
Use status to show appropriate UI (spinner for PROCESSING, checkmark for COMPLETED)
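
The status-to-UI mapping described in these notes could be sketched as follows. The spinner and checkmark hints come from the note above; the hints for PENDING and FAILED, and both function names, are assumptions for illustration.

```python
from collections import Counter

# COMPLETED and PROCESSING hints come from the notes above;
# the PENDING and FAILED hints are assumptions made for this sketch.
STATUS_UI = {
    "PENDING": "spinner",
    "PROCESSING": "spinner",
    "COMPLETED": "checkmark",
    "FAILED": "error",
}


def ui_state(record: dict) -> str:
    """Pick a UI hint for one entry of the 'records' array."""
    return STATUS_UI.get(record.get("status"), "unknown")


def count_by_status(response: dict) -> Counter:
    """Count records per status in a /my_kyc_records response body."""
    return Counter(r.get("status") for r in response.get("records", []))
```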

5. Get Specific KYC Record

Retrieve detailed information for a specific KYC record.

Endpoint: GET /kyc_record/{record_id}

Authentication: Required (scope: agent:advance_video_kyc)

Path Parameters:

record_id (required): The ID of the KYC record to retrieve

Request Example:

curl -X GET "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/kyc_record/789" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response:

{
  "success": true,
  "record": {
    "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
    "audioTranscript": "My name is Rajesh Kumar Singh...",
    "parsedAudioData": {
      "name": "Rajesh Kumar Singh",
      "dob": "15/08/1990",
      "aadharId": "1234 5678 9012"
    },
    "parsedVisualData": {
      "name": "RAJESH KUMAR SINGH",
      "dob": "15/08/1990",
      "aadharId": "1234 5678 9012",
      "gender": "Male",
      "confidence": 92.0
    },
    "fraudDetection": {
      "backgroundNoiseDetected": false,
      "backgroundNoiseType": null,
      "speakerCount": 1,
      "lipSyncFraudDetected": false
    },
    "verificationScore": 100.0,
    "scoreBreakdown": [ ... ],
    "detectionConfidence": 92.0,
    "warning": null,
    "alreadyEvaluated": false,
    "evaluatedAt": "2025-01-14T10:30:00",
    "isReevaluation": false,
    "createdAt": "2025-01-14T10:30:00",
    "updatedAt": "2025-01-14T10:30:00"
  }
}

Data Models

Parsed Audio Data Structure

{
  "name": "Full Name",
  "dob": "DD/MM/YYYY",
  "aadharId": "XXXX XXXX XXXX"
}

Parsed Visual Data Structure

{
  "name": "FULL NAME",
  "dob": "DD/MM/YYYY",
  "aadharId": "XXXX XXXX XXXX",
  "gender": "Male/Female",
  "confidence": 92.0
}

Fraud Detection Structure

{
  "backgroundNoiseDetected": true,
  "backgroundNoiseType": "Loud Background: Room noise is -35.5dB",
  "speakerCount": 1,
  "lipSyncFraudDetected": false
}

Score Breakdown Item Structure

{
  "category": "Name Match",
  "description": "Audio and visual names match exactly",
  "impact": "+40",
  "type": "positive"
}

Type Values:

positive: Check passed, contributes positively to score
warning: Partial match or minor issue
negative: Check failed, may reduce score
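
Because `impact` is a string, clients that want to total a breakdown need to parse it first. The sketch below assumes every impact is a signed integer string such as "+40", "0", or "-20" (the negative form is inferred from the penalty table, not shown in a sample response); the function names are hypothetical.

```python
def parse_impact(impact: str) -> int:
    """Convert an impact string such as "+40", "0", or "-20" to an integer."""
    return int(impact)


def summarize_breakdown(items: list) -> dict:
    """Total the impact per type ("positive", "warning", "negative")."""
    totals = {"positive": 0, "warning": 0, "negative": 0}
    for item in items:
        kind = item["type"]
        totals[kind] = totals.get(kind, 0) + parse_impact(item["impact"])
    return totals
```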

Full Record Structure

{
  "kbDocumentId": "550e8400-e29b-41d4-a716-446655440000",
  "audioTranscript": "Transcribed and translated speech...",
  "parsedAudioData": {
    "name": "Full Name",
    "dob": "DD/MM/YYYY",
    "aadharId": "XXXX XXXX XXXX"
  },
  "parsedVisualData": {
    "name": "FULL NAME",
    "dob": "DD/MM/YYYY",
    "aadharId": "XXXX XXXX XXXX",
    "gender": "Male/Female",
    "confidence": 92.0
  },
  "fraudDetection": {
    "backgroundNoiseDetected": false,
    "backgroundNoiseType": null,
    "speakerCount": 1,
    "lipSyncFraudDetected": false
  },
  "verificationScore": 100.0,
  "scoreBreakdown": [ ... ],
  "detectionConfidence": 92.0,
  "warning": null,
  "alreadyEvaluated": false,
  "evaluatedAt": "2025-01-14T10:30:00",
  "isReevaluation": false,
  "createdAt": "2025-01-14T10:30:00",
  "updatedAt": "2025-01-14T10:30:00"
}

Verification Score Calculation

The verification score (0-100) is calculated by combining data matching and fraud detection:

Positive Score Components (Max: 100 points)

Comparison	Points	Condition
Name - Exact Match	+40	Audio name exactly matches visual name
Name - Partial Match	+20	Some name parts match between audio and visual
DOB - Exact Match	+20	Date of birth matches exactly
Aadhaar - Exact Match	+40	Aadhaar number matches exactly

Fraud Detection Penalties

Check	Penalty	Condition
Background Noise	-10	High or unstable background noise detected
No Speakers	-40	No speakers detected in audio
Multiple Speakers	-20	More than 1 speaker detected
Lip Sync Fraud	-30	Lip movements don't match audio (deepfake indicator)
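
The two tables above can be combined into a small scoring function. This is a sketch of the arithmetic only, not the service's implementation. It assumes that names are compared case-insensitively (the sample response scores "Rajesh Kumar Singh" against "RAJESH KUMAR SINGH" as an exact match), that a partial match means at least one shared name token, that spacing in the Aadhaar number is ignored, and that the result is clamped to 0-100.

```python
def _norm(value) -> str:
    """Upper-case and collapse whitespace; None becomes an empty string."""
    return " ".join(str(value or "").upper().split())


def _digits(value) -> str:
    return "".join(ch for ch in str(value or "") if ch.isdigit())


def name_points(audio_name, visual_name) -> int:
    a, v = _norm(audio_name), _norm(visual_name)
    if not a or not v:
        return 0
    if a == v:
        return 40                      # exact match
    if set(a.split()) & set(v.split()):
        return 20                      # partial match (assumed: any shared token)
    return 0


def verification_score(audio: dict, visual: dict, fraud: dict) -> float:
    score = name_points(audio.get("name"), visual.get("name"))
    if audio.get("dob") and audio.get("dob") == visual.get("dob"):
        score += 20
    audio_id = _digits(audio.get("aadharId"))
    if audio_id and audio_id == _digits(visual.get("aadharId")):
        score += 40

    if fraud.get("backgroundNoiseDetected"):
        score -= 10
    speakers = fraud.get("speakerCount")
    if speakers is not None:
        if speakers == 0:
            score -= 40
        elif speakers > 1:
            score -= 20
    if fraud.get("lipSyncFraudDetected"):
        score -= 30
    return float(max(0, min(100, score)))  # clamping is an assumption
```

For example, a video with all three fields matching but two speakers detected would score 100 - 20 = 80 under these assumptions.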

Score Interpretation

Score Range	Interpretation	Recommended Action
90-100	Excellent	Auto-approve
70-89	Good	Accept with minor review
50-69	Moderate	Manual review required
30-49	Poor	Request new video
0-29	Failed / Fraud Risk	Reject, investigate if fraud flags present
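
The interpretation table maps directly onto a lookup. In the sketch below the function name is hypothetical, and fractional scores such as 89.5 are assumed to fall into the band whose lower bound they meet, since the table only lists whole-number ranges.

```python
# (lower bound, interpretation, recommended action), highest band first.
BANDS = [
    (90, "Excellent", "Auto-approve"),
    (70, "Good", "Accept with minor review"),
    (50, "Moderate", "Manual review required"),
    (30, "Poor", "Request new video"),
    (0, "Failed / Fraud Risk", "Reject, investigate if fraud flags present"),
]


def interpret_score(score: float) -> tuple:
    """Return (interpretation, recommended_action) for a 0-100 score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower, label, action in BANDS:
        if score >= lower:
            return label, action
    raise AssertionError("unreachable: the last band starts at 0")
```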

Fraud Detection Details

1. Background Noise Analysis

Analyzes audio quality using librosa for spectral analysis.

Detection Criteria:

Noise Floor Limit: -42.0 dB (a noise floor below this level is considered clean)
Stability Limit: 8.0 dB fluctuation (excessive variation indicates unstable environment)

Detected Issues:

Loud background (room noise exceeds threshold)
Unstable background (noise fluctuates significantly)
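
Applied to already-measured values, the two thresholds reduce to a pair of comparisons. The sketch below does not reproduce the service's librosa analysis; it assumes the noise floor and fluctuation have been computed elsewhere, that a floor exactly at the limit counts as loud, and it returns its own short labels rather than the service's `backgroundNoiseType` strings.

```python
NOISE_FLOOR_LIMIT_DB = -42.0   # from the criteria above
STABILITY_LIMIT_DB = 8.0       # from the criteria above


def classify_noise(noise_floor_db: float, fluctuation_db: float) -> tuple:
    """Return (detected, reason) where reason is 'loud', 'unstable', or None."""
    if noise_floor_db >= NOISE_FLOOR_LIMIT_DB:
        return True, "loud"        # room noise is not below the limit
    if fluctuation_db > STABILITY_LIMIT_DB:
        return True, "unstable"    # noise level varies too much
    return False, None
```

With these thresholds, a room measured at -35.5 dB is classified as loud regardless of how stable it is.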

2. Speaker Count Detection

Uses Replicate's whisper-diarization model to detect number of distinct speakers.

Red Flags:

0 speakers: No valid speech detected
More than 1 speaker: Multiple people speaking (potential coaching/prompting)

Valid Scenario:

Exactly 1 speaker detected

3. Lip Sync Verification (Deepfake Detection)

Advanced analysis using MediaPipe face mesh and audio correlation.

Detection Methods:

Ghost Speaker Detection: Audio is loud but mouth is closed (>35% mismatch)
Correlation Analysis: Mouth movements don't correlate with audio (score < 0.20)
Dubbing Detection: Audio delayed by >100ms from visual

Detected Issues:

Lip sync mismatch (potential deepfake)
Dubbed audio (voice recorded separately)
Ghost speaker (audio playing without corresponding mouth movement)
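
The three thresholds can be expressed as a simple rule check over precomputed metrics. This sketch does not perform any MediaPipe or audio analysis. It assumes that ">35% mismatch" means the fraction of loud-audio frames in which the mouth is closed, and the class, field, and label names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LipSyncMetrics:
    closed_mouth_loud_ratio: float  # assumed: share of loud frames with mouth closed
    correlation: float              # mouth-movement vs. audio correlation score
    audio_delay_ms: float           # how far the audio lags the visual


def lip_sync_issues(m: LipSyncMetrics) -> list:
    """Return the list of rules that fire; an empty list means the check passed."""
    issues = []
    if m.closed_mouth_loud_ratio > 0.35:
        issues.append("ghost_speaker")
    if m.correlation < 0.20:
        issues.append("lip_sync_mismatch")
    if m.audio_delay_ms > 100:
        issues.append("dubbed_audio")
    return issues
```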

Processing States

Status	Description
`PENDING`	Record created, waiting for processing to start
`PROCESSING`	Video is being analyzed by Celery worker
`COMPLETED`	Processing finished successfully
`FAILED`	Processing failed (see the error field of the status response for details)

Error Responses

All endpoints may return the following error responses:

400 Bad Request:

{
  "detail": "document_id is required for security validation"
}

403 Forbidden (Document Access):

{
  "detail": "Access denied. Document 550e8400-e29b-41d4-a716-446655440000 does not belong to user a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

403 Forbidden (Invalid Document):

{
  "detail": "Document with UUID 550e8400-e29b-41d4-a716-446655440000 does not exist in knowledgebase"
}

404 Not Found:

{
  "detail": "File path not found for KB document."
}

404 Not Found (Record):

{
  "detail": "KYC record not found or you don't have access to it"
}

500 Internal Server Error:

{
  "detail": "Error initiating processing: [error message]"
}
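
Since every error body above carries a `detail` string, a client can funnel them through one exception type. The sketch below is one possible shape; `KycApiError`, `raise_for_error`, and the choice to treat only 5xx responses as retryable are assumptions, not documented behavior.

```python
class KycApiError(Exception):
    """Raised for any 4xx/5xx response from the agent."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


def raise_for_error(status_code: int, body: dict) -> None:
    """Raise KycApiError when the HTTP status signals an error."""
    if status_code < 400:
        return
    raise KycApiError(status_code, body.get("detail", "unknown error"))


def is_retryable(err: KycApiError) -> bool:
    """Assumed policy: retry server errors, never client errors."""
    return err.status_code >= 500
```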

Warning Messages

The system returns prioritized warning messages:

Critical Warnings (Fraud Detected)

"CRITICAL: Lip sync mismatch detected (Potential Deepfake/Dubbing)"
- Highest priority, indicates possible video manipulation

Security Warnings

"WARNING: Multiple speakers detected in audio"
- Multiple people detected, possible coaching
"WARNING: High background noise detected ({noise_type})"
- Audio quality compromised

Data Quality Warnings

"No Aadhaar card detected in video"
- OCR couldn't find Aadhaar card in any frame
"Incomplete data extracted from video. Please upload a clearer video"
- Some fields (name, DOB, or Aadhaar) missing from extracted data
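
The message prefixes shown above are enough to sort a record's `warning` value into these three groups. The sketch below relies on the "CRITICAL:" and "WARNING:" prefixes appearing exactly as in the examples and treats any other non-empty message as a data quality warning; the function name and returned labels are hypothetical.

```python
def warning_severity(warning) -> str:
    """Classify a record's 'warning' field by its message prefix."""
    if not warning:
        return "none"
    if warning.startswith("CRITICAL:"):
        return "critical"
    if warning.startswith("WARNING:"):
        return "security"
    return "data_quality"
```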

Best Practices

Video Quality Requirements

Resolution: Minimum 720p recommended
Duration: 10-30 seconds optimal
Lighting: Good, even lighting on the Aadhaar card
Focus: Card should be clearly visible and in focus
Audio: Clear speech, minimal background noise
Content: User should speak name, DOB, and Aadhaar number clearly
Single Speaker: Only the applicant should speak in the video

Recording Guidelines

Record in a quiet environment
Hold the Aadhaar card steady in frame for at least 3-5 seconds
Ensure card is flat and not tilted
Avoid glare or reflections on the card
Speak clearly and at moderate pace
Pronounce Aadhaar number digit by digit
Use proper date format (day, month, year)
Ensure only one person speaks throughout

Integration Workflow

# 1. Upload video to knowledge base (via your file upload system)
# This creates a KB document with UUID

# 2. Start KYC processing (returns immediately)
curl -X POST "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/advance_video_kyc" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "550e8400-e29b-41d4-a716-446655440000"
  }'

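# 3. Poll for status (recommended: every 5 seconds; stop after 10 minutes)
STATUS_URL="https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/check_status/550e8400-e29b-41d4-a716-446655440000"
STATUS="PROCESSING"
for attempt in $(seq 1 120); do
  STATUS=$(curl -s -X GET "$STATUS_URL" \
    -H "Authorization: Bearer YOUR_TOKEN" | jq -r '.status')

  # COMPLETED, FAILED, and NOT_FOUND are terminal; keep waiting otherwise
  if [ "$STATUS" = "COMPLETED" ] || [ "$STATUS" = "FAILED" ] || [ "$STATUS" = "NOT_FOUND" ]; then
    break
  fi
  sleep 5
done
echo "Final status: $STATUS"

# 4. Get full record details (789 is an example ID; look up real IDs via /my_kyc_records)
curl -X GET "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/kyc_record/789" \
  -H "Authorization: Bearer YOUR_TOKEN"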

# 5. Decision logic based on score and fraud flags:
# - Score >= 70 AND no fraud flags → Approve
# - Score >= 50 AND no critical fraud → Manual Review
# - Fraud flags present OR score < 50 → Reject/Investigate

# 6. Re-process if needed (e.g., improved AI model)
curl -X POST "https://nextneural-api.superteams.ai/api/agents/advance_video_kyc/advance_video_kyc" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "550e8400-e29b-41d4-a716-446655440000",
    "force_reprocess": true
  }'
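
The decision rules in step 5 could be coded as below. This is a sketch under stated assumptions: the rules are applied in the order listed, "critical fraud" is taken to mean a lip sync failure (the only warning this document labels CRITICAL), and any of the three fraud checks firing counts as a fraud flag. The function name and return labels are hypothetical.

```python
def decide(record: dict) -> str:
    """Return 'approve', 'manual_review', or 'reject' for a completed record."""
    score = record.get("verificationScore", 0)
    fraud = record.get("fraudDetection") or {}

    critical = bool(fraud.get("lipSyncFraudDetected"))
    any_flag = (
        critical
        or bool(fraud.get("backgroundNoiseDetected"))
        or fraud.get("speakerCount", 1) != 1
    )

    if score >= 70 and not any_flag:
        return "approve"
    if score >= 50 and not critical:
        return "manual_review"
    return "reject"
```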

Performance Considerations

Processing Time: Typically 30-60 seconds per video (depends on video length)
Fraud Checks: Run concurrently with video/audio processing
Caching: Completed results are cached; use force_reprocess=true to re-analyze
Concurrent Processing: Multiple videos processed by separate Celery workers
Task Persistence: Tasks survive server restarts (persisted in Redis)
Retry Logic: Automatic 2 retries with exponential backoff on failure

Security Features

User Isolation: All queries are private to your account
Document Ownership: Only you can access your documents
Authentication: All endpoints except the health check require valid API tokens with agent:advance_video_kyc scope
UUID-based Identification: Secure, non-sequential identifiers for documents
No File Path Exposure: File paths are never exposed in API requests or responses
Fraud Detection: Multi-layered security checks for deepfakes and manipulation
Background Workers: Processing isolated in separate worker processes

Troubleshooting

Low Verification Scores

Possible Causes:

Audio and visual data don't match
Fraud detection penalties applied
Poor video quality

Solutions:

Ensure person speaks details matching their Aadhaar card
Record in quiet environment (reduces noise penalty)
Ensure face is clearly visible (for lip sync analysis)

Lip Sync Fraud Detected

Possible Causes:

Video was dubbed or audio replaced
Face not clearly visible throughout video
Poor lighting on face
Video recorded without audio, dubbed later

Solutions:

Record video with live audio in one take
Ensure face is well-lit and clearly visible
Don't use edited or spliced videos

Multiple Speakers Detected

Possible Causes:

Another person speaking in background
TV/Radio playing in background
Someone coaching the applicant

Solutions:

Record in a private, quiet room
Ensure only the applicant speaks
Turn off all audio sources

Background Noise Issues

Possible Causes:

Recording in noisy environment
Poor microphone quality
Wind or fan noise

Solutions:

Record in a quiet indoor environment
Use device's primary microphone
Reduce background noise sources

No Aadhaar Card Detected

Possible Causes:

Card not visible in any frame
Card too small or far from camera
Poor lighting or focus

Solutions:

Hold card closer to camera
Ensure entire card is visible for 3-5 seconds
Improve lighting conditions
Keep card flat and avoid angles

Processing Stuck in PROCESSING State

Possible Causes:

Celery worker overloaded
Worker crashed during processing
Redis connection issues

Solutions:

Wait and poll again (workers auto-retry)
Use force_reprocess=true to restart
Contact support if persists > 10 minutes
