IKAR Lab 3: Forensic Audio Suite

Professional hardware and software suite for speech signal analysis.

Since its launch in 1992, IKAR Lab has evolved from a sound editor application into the most popular forensic audio laboratory in the world. Today it serves experts in 350 laboratories in more than 40 countries.


Overview

IKAR Lab 3: the new generation of the forensic audio analysis software and hardware suite.

IKAR Lab 3 is a professional hardware and software solution for advanced speech signal analysis. It provides the capabilities to perform a multitude of valuable audio processing, analysis, audio restoration and voice comparison functions.

IKAR Lab 3 makes it possible to perform an in-depth analysis of voice and speech using numerous visualisation tools and both automated and human-assisted comparison instruments.

IKAR Lab 3 comprises advanced, time-tested technologies and algorithms that are already in use at over 450 installations in more than 70 countries worldwide, making it the most popular suite of audio processing, analysis and voice biometric matching tools available today. IKAR Lab 3 is available either as a software-and-hardware-components solution or as a complete turnkey solution that includes the workstation hardware, auxiliary equipment and training courses.

Main components

SIS
Audio forensic software

Sound Cleaner
Noise reduction and audio enhancement software

Caesar
Audio transcription software

STC-H246
Audio hardware

EdiTracker
A plugin that diagnoses the authenticity of analogue and digital audio recordings

Diagnostic module
A new SIS module for a more reliable assessment of the authenticity and examinability of an audio recording

SIS


Audio forensic software.

SIS is the core software of the IKAR Lab 3 forensic audio kit. It includes powerful tools for speech signal research and enhanced speech visualisation and analysis, including speech segmentation, text transcription, automatic and semi-automatic identification tools and many others.

Methods:

  • Visualisation
  • Editing and processing
  • Detecting speech and noises
  • Text transcription and speech segmentation
  • Separation of speakers in a dialogue/polylogue
  • Multi-window interface
  • Signal comparison
  • Signal analysis
  • Managing projects and creating reports
  • Identification
  • Automatic comparison
  • Comparison of formants
  • Pitch comparison
  • Identification Wizard
  • Overall conclusion
  • Analysis of an audio recording extracted from a video file
  • EdiTracker and the diagnostic module

Visualisation

The algorithms used for the spectral representation of the signal ensure the highest possible quality and clarity of visible speech. The user selects the optimal display parameters on the fly or uses presets for various types of spectral analysis.

  • Waveform
  • FFT and LPC spectrograms
  • Average and instantaneous spectrum
  • Cepstogram
  • Autocorrelation
  • Pitch extractor
  • Formant extractor
  • Signal energy
  • Histogram and histogram correlation
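As an illustration of the spectral tools listed above, an FFT magnitude spectrogram can be sketched in a few lines of NumPy. This is a generic textbook sketch, not SIS's implementation; the window and hop sizes are arbitrary choices.

```python
import numpy as np

def fft_spectrogram(signal, fs, win_len=512, hop=256):
    """Magnitude spectrogram: frame the signal, apply a Hann window, FFT."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))      # (frames, bins)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return freqs, spec.T                            # (bins, frames)

# A 1 kHz tone sampled at 8 kHz should peak in the bin nearest 1 kHz.
fs = 8000
t = np.arange(fs) / fs
freqs, spec = fft_spectrogram(np.sin(2 * np.pi * 1000 * t), fs)
peak_hz = freqs[spec.mean(axis=1).argmax()]
```

The same framing underlies the cepstogram and autocorrelation views; only the per-frame transform changes.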

Editing and processing

SIS provides a wide variety of expert editing and signal processing tools that improve the intelligibility of recorded speech and prepare audio recordings for further analysis.

  • Amplitude normalisation
  • Linear transformation
  • DC Offset Suppression
  • Mixing
  • Modulation
  • Tempo correction
  • Resampling
  • Bit depth conversion
  • Stereo separation and merging two mono signals to stereo
  • Phase change
  • Adaptive inverse filter
  • Adaptive tone suppressor
  • Adaptive broadband noise filter
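Two of the simpler operations in this list, DC offset suppression and amplitude normalisation, reduce to a mean subtraction and a peak rescale. The sketch below is a minimal illustration under that assumption, not the product's implementation.

```python
import numpy as np

def remove_dc(x):
    """DC offset suppression: subtract the signal's mean value."""
    return x - x.mean()

def normalise(x, peak=0.99):
    """Amplitude normalisation: scale so the largest sample reaches `peak`."""
    m = np.abs(x).max()
    return x * (peak / m) if m > 0 else x

# A tone with a constant 0.1 DC bias, cleaned and brought to full scale.
x = 0.3 * np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.1
y = normalise(remove_dc(x))
```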

Detecting speech and noises

The speech detector automatically marks speech fragments in the audio signal that are suitable for identification. The module can also be configured to detect noisy areas: dial tones, clipped fragments and clicks.
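The underlying idea can be illustrated with a simple frame-energy detector: frames whose RMS level falls far below the loudest frame are treated as pauses. This is a toy sketch; the actual module applies more sophisticated criteria.

```python
import numpy as np

def mark_speech(x, frame=160, threshold_db=-30):
    """Per-frame flags: True where frame RMS is within `threshold_db`
    of the loudest frame, i.e. likely speech rather than silence."""
    n = len(x) // frame
    rms = np.sqrt(np.mean(x[:n * frame].reshape(n, frame) ** 2, axis=1))
    db = 20 * np.log10(np.maximum(rms, 1e-12) / rms.max())
    return db > threshold_db

rng = np.random.default_rng(0)
silence = 0.001 * rng.standard_normal(1600)               # low-level noise
speech = np.sin(2 * np.pi * 200 * np.arange(1600) / 8000)  # stand-in for speech
flags = mark_speech(np.concatenate([silence, speech, silence]))
```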

Text transcription and speech segmentation

The speech-to-text plugin automatically obtains the text content of the speech signal in an audio recording in Russian, English, Spanish, Kazakh and Arabic. The transcription is accompanied by word-by-word segmentation indicating the location of each spoken word. This functionality allows the expert to work effectively with large volumes of audio recordings.

In manual mode, selected audio fragments can easily be assigned to particular categories (e.g., different speakers, sounds or noises) with text comments, while the full text can be exported to MS Word. Given two files of transcribed text, the programme can automatically find all matching words in the compared audio recordings.

Separating speakers in a dialogue/polylogue

The module automatically marks lines according to speakers. Its reliability is up to 95% given a signal-to-noise ratio of at least 20 dB and at least 16 seconds of speech per speaker.

Using built-in algorithms, the module allows segmentation of the lines spoken by up to 5 speakers.

Multi-window interface

SIS allows several audio files to be opened in one or several windows at the same time. The windows can be positioned according to a particular task: vertically for identification purposes or horizontally to compare copies of audio recordings or the various sound cleaning options.
Signals can be opened in several layers in one window, and their colours and transparency can be changed for better visualisation.

Signal comparison

Windows can be connected according to time and spectral domain, which makes measurement easier using vertical and horizontal cursors. The instant spectrum can be overlaid for better visual comparison. Pitch histograms can be compared visually or numerically using values of minimum, maximum, median, asymmetry and general correlation.
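The numerical side of pitch-histogram comparison can be sketched with synthetic data as below. The bin count, frequency range and thresholds are illustrative assumptions, not SIS settings.

```python
import numpy as np

def pitch_stats(f0):
    """Min, max, median and skewness (asymmetry) of a pitch track in Hz."""
    f0 = np.asarray(f0, float)
    m, s = f0.mean(), f0.std()
    return {"min": float(f0.min()), "max": float(f0.max()),
            "median": float(np.median(f0)),
            "skewness": float(((f0 - m) ** 3).mean() / s ** 3)}

def histogram_correlation(f0_a, f0_b, bins=40, frange=(50, 400)):
    """Pearson correlation of two pitch histograms on a shared grid."""
    ha, _ = np.histogram(f0_a, bins=bins, range=frange, density=True)
    hb, _ = np.histogram(f0_b, bins=bins, range=frange, density=True)
    return float(np.corrcoef(ha, hb)[0, 1])

rng = np.random.default_rng(1)
a = rng.normal(120, 15, 2000)    # speaker A, recording 1
b = rng.normal(122, 15, 2000)    # speaker A, recording 2 (similar pitch)
c = rng.normal(210, 25, 2000)    # a different speaker
```

Similar distributions yield a histogram correlation near 1; clearly different speakers score much lower.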

Signal analysis

SIS automatically calculates the signal characteristics on the basis of which the expert decides whether the recording is suitable for identification analysis.

  • Frequency response
  • Signal-to-noise ratio
  • Reverberation time
  • Clipping and tonal noises
  • Clear speech duration
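One of these characteristics, the signal-to-noise ratio, can be estimated from a speech-bearing segment and a pause containing only noise. The power-subtraction sketch below is a simplified model, not SIS's measurement procedure.

```python
import numpy as np

def snr_db(signal_plus_noise, noise):
    """Estimate SNR (dB) from a noisy segment and a noise-only segment."""
    p_total = np.mean(signal_plus_noise ** 2)
    p_noise = np.mean(noise ** 2)
    p_signal = max(p_total - p_noise, 1e-12)   # subtract the noise power
    return 10 * np.log10(p_signal / p_noise)

rng = np.random.default_rng(0)
t = np.arange(80000) / 8000
speech = 0.5 * np.sin(2 * np.pi * 180 * t)     # stand-in for voiced speech
noise = 0.05 * rng.standard_normal(t.size)
est = snr_db(speech + noise, noise)            # true value is about 17 dB
```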

Working with projects and creating reports

IKAR Lab 3 organises the expert's workflow efficiently. Files related to an examination, whether audio, text, video or photographic, are opened directly from the SIS project. These files and the identification results can be saved in a structured way, as can reports created in MS Word. A report can be supplemented with the settings used for illustrations and visible-speech representations, and with screenshots of the working screen or an area of it.

Identification

This unique tool based on biometric algorithms and expert modules is made to automate and formalise the processes involved in audio forensics identification research: searching for comparable words and sounds, selecting sounds and melodic fragments to be compared, comparing speakers’ formants and pitches, and performing speech analysis. The results are presented as numerical indicators to contribute to the overall identification conclusion.

Automatic comparison

The module performs 1:1 voice signal comparison. The method it uses depends on the speech signal characteristics of the audio recordings studied. All results are based on the extraction of voice biometric traits and calculations regarding their similarity.

Several comparison methods are available: cxvector (a development of xvector) is used as the main method, supplemented by smart-speaker and gen6-v3 when the clear speech content of an audio recording is between 1.5 and 5 seconds. The new functionality offers faster and more reliable identification.

The module's machine-learning process involved tens of thousands of speakers, so the engine was trained on audio recordings of speakers of different genders, ages, ethnicities and languages. The varied speech material was captured over various channels and in multiple recording sessions. The high reliability of the biometric engine has been confirmed in NIST testing.
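Scoring in embedding-based systems of this kind typically reduces to comparing fixed-length voice embeddings. The sketch below uses random vectors as stand-ins for neural-network embeddings; the 512-dimension figure and cosine scoring are generic assumptions, not cxvector specifics.

```python
import numpy as np

def cosine_score(emb_a, emb_b):
    """1:1 comparison score: cosine similarity of two voice embeddings."""
    return float(emb_a @ emb_b /
                 (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))

rng = np.random.default_rng(0)
enrol = rng.standard_normal(512)                # hypothetical embedding
same = enrol + 0.1 * rng.standard_normal(512)   # same speaker, new session
diff = rng.standard_normal(512)                 # a different speaker
```

Same-speaker pairs score close to 1, while unrelated embeddings in high dimensions score near 0; a calibrated threshold between the two drives the decision.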

Comparison of formants

The process of comparing formants with the module involves two stages.

1. Search and selection of reference sound fragments for known and unknown speakers:

  • using the scatter plot with the vowel triangle and highlighting the search area
  • specifying the frequency range of the formant search
  • positioning horizontal marks indicating the limits in hertz and as a percentage
  • using a graphical vowel chart

2. Expert comparison.

The module automatically calculates FR, FA and LR for the selected sounds and decides whether the outcome of identification is positive, negative or undefined.

Additional features:

  • Visual comparison of selected sounds on a vowel chart
  • Comparison of the average formant values for selected sounds of two speakers
  • Specifying words or triads as textual comments on reference fragments
  • Exporting tables of reference fragments and results to MS Word

Pitch comparison

The pitch comparison module compares the specificities of speakers' melodic patterns. It enables melodic fragments to be selected, attributes them to one of 18 possible melodic types and compares them according to 15 parameters, including maximum, average and minimum pitch values, rate of pitch change, skewness, kurtosis and others. The algorithm generates results as a match percentage for each parameter and delivers an overall identification/elimination conclusion or an inconclusive result. All data can easily be exported as text reports.
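A few of those melodic parameters, and a naive per-parameter match score, can be sketched as follows. The parameter formulas and the 20% tolerance are illustrative assumptions, not the module's actual definitions.

```python
import numpy as np

def melodic_parameters(f0, frame_rate=100):
    """A subset of contour parameters: max/mean/min pitch, rate of
    change (Hz/s), skewness and excess kurtosis."""
    f0 = np.asarray(f0, float)
    m, s = f0.mean(), f0.std()
    return {"max": f0.max(), "mean": m, "min": f0.min(),
            "rate_hz_per_s": np.mean(np.abs(np.diff(f0))) * frame_rate,
            "skewness": ((f0 - m) ** 3).mean() / s ** 3,
            "kurtosis": ((f0 - m) ** 4).mean() / s ** 4 - 3}

def match_percentage(pa, pb, tolerance=0.2):
    """Per-parameter match: 100% when equal, 0% at `tolerance` relative gap."""
    out = {}
    for k in pa:
        denom = max(abs(pa[k]), abs(pb[k]), 1e-9)
        out[k] = max(0.0, 1 - abs(pa[k] - pb[k]) / (tolerance * denom)) * 100
    return out

t = np.linspace(0, 1, 100)
rise_a = 120 + 40 * t        # a rising melodic fragment
rise_b = 122 + 39 * t        # a similar fragment from another recording
scores = match_percentage(melodic_parameters(rise_a),
                          melodic_parameters(rise_b))
```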

Identification wizard

This plugin offers a step-by-step identification process, displays the stages of research, and visualises the results for any comparison made.

Overall conclusion

The outcome of each method can be saved in a given project. The programme is designed to take the results of each module into account when forming an overall conclusion. The expert can adjust the relative weight of each method in the overall conclusion, or their significance can be assigned automatically through a calculation of the qualitative and quantitative characteristics of the audio recordings being compared. Based on the results, the expert can automatically generate a detailed report.

Analysis of an audio track extracted from a video

With the new SIS method, the expert gets immediate access to the audio track of a video file without needing any additional editors. Just upload the video file and SIS will automatically extract the audio track and open it in a separate window. The module allows work to be done simultaneously on the video in the video player and the audio track in the editor. Video and sound are synchronised, and the video is automatically updated while the audio track is being edited.

Sound Cleaner


Noise reduction and audio enhancement software

Usually, the examination of audio recordings requires the creation of a verbatim record, or transcription, thereof. Since audio recordings obtained in an operational context are often recorded in difficult conditions and not readily intelligible, the first step is to clean the sound of noise. To do this, the IKAR Lab 3 suite is optionally equipped with Sound Cleaner. It includes modern signal processing algorithms that are effective at suppressing broadband noise, tonal interference and pulses, while performing frequency response correction, equalising the signal, etc.

All filters work in real time — the result can be heard immediately after the filter has been added to the processing chain so that the user can select the optimal parameters by ear.

STC Auto Filter
Significantly reduces the level of the most common types of noise using a single controller

Cell phone Noise Filter
Reduces interference from the characteristic intermittent sounds of incoming mobile phone calls.

Broadband Noise Filter
Reduces the noise level from rooms and streets and interference from communication channels or recording equipment. Such noises take the form of a hiss and cannot be suppressed by other methods, since the interference spectra intersect or coincide with the useful signal spectra.

Inverse Filter
Equalises the frequency response of the communication channel in which the recording was created. The filter has two settings: amplification of the weak spectral components of the signal and suppression of the strong ones (flattening the average spectrum).

Tone Suppressor
Suppresses the stationary narrow-band and regular interferences (vibrations, network pickup, noises from home appliances, slow music, the sound of a car passing, noise from water or a room, reverberation, etc.).

Reverb Suppressor
Increases the intelligibility of speech, decreases the level of reverberation in recordings and reduces user fatigue by making a reverberated speech signal easier to perceive despite the presence of additional noise.

Click Suppressor
Automatically restores the speech or music signals distorted by pulse noises (clicks, radio noises, knocks, crackles, etc.).

Dynamic Range control
Improves intelligibility when there are large drops in signal level, for example by amplifying a weak signal and suppressing a strong one to balance the amplitude of the output signal.

Equaliser
A 4096-band graphic equaliser with a built-in spectrograph for detailed spectrum correction in distorted recordings.

Clip Restorer
Restores overloaded recording fragments by reconstructing their waveforms.

Reference Noise Suppressor
Suppresses any noise in the main channel that is also present in the reference channel (for example, TV or radio broadcasts, music, etc.).

DTMF Suppressor
Processes telephone dialling (DTMF) signals: sequences of short pulses, each filled with a pair of tones.
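DTMF tone pairs can be identified with the Goertzel algorithm, which measures the energy at each of the eight standard dialling frequencies. This is a textbook sketch of the detection principle, unrelated to the filter's internals.

```python
import math

def goertzel_power(samples, fs, freq):
    """Goertzel algorithm: power of one frequency component of a block."""
    k = round(len(samples) * freq / fs)              # nearest DFT bin
    coeff = 2 * math.cos(2 * math.pi * k / len(samples))
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

# DTMF digit '5' is the sum of a 770 Hz row tone and a 1336 Hz column tone.
fs, n = 8000, 800
tone = [math.sin(2 * math.pi * 770 * i / fs) +
        math.sin(2 * math.pi * 1336 * i / fs) for i in range(n)]
row = max((697, 770, 852, 941), key=lambda f: goertzel_power(tone, fs, f))
col = max((1209, 1336, 1477, 1633), key=lambda f: goertzel_power(tone, fs, f))
```

The strongest row and column frequencies together identify the dialled digit, which can then be located and suppressed.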

Sound Cleaner saves its processing results in WAV format and automatically generates comprehensive textual reports that log the process. The programme is compatible with any sound editor that supports the VST 3 format.

Caesar


Audio recording transcription module

The module is designed to produce a verbatim transcription of recorded speech. The text is output to MS Word and automatically synchronised with the audio recording, which simplifies both searching for the corresponding audio fragment and editing the text. The ability to play back the recording and transcribe it in a single offline interface makes the expert's work easier.

STC-H246

Audio hardware

To guarantee the high quality of input and output signals, the IKAR Lab 3 suite is equipped with a professional STC-H246 audio hardware device.

STC-H246 is perfect for setting up a workstation for digitising analogue audio recordings. The device is designed for measuring parameters and generating electrical signals in the audio frequency range.

Specs

  • Sampling rate: 8–200 kHz
  • ADC/DAC resolution: 24 bit
  • Signal-to-noise ratio in the end-to-end channel (20 Hz – 20 kHz band): 105 dB
  • Input/output channel connectors: XLR, RCA, S/PDIF, TRS 6.3
  • Number of channels: 2
  • Case: metal
  • Size: 111×166×190 mm
  • OS: Windows 7, 8, 10

EdiTracker


The plugin diagnoses the authenticity of analogue and digital audio recordings and greatly simplifies expert analysis in SIS by providing the user with both manual and automatic analysis methods.


EdiTracker analysis methods:

  • Specifying the recording device parameters
  • Identifying traces of previous digital signal processing
  • Auditory analysis
  • Detecting traces of tampering through phase shifts in the harmonics and phase scanning
  • Scanning background noise

Specifying the recording device parameters:

Every analogue recording device has unique characteristics, such as frequency response, total harmonic distortion, pitch variation, effective frequency range, tempo deviation, etc.

EdiTracker automatically assesses these characteristics using a test signal. A mismatch between recording device parameters and characteristics of a signal allegedly recorded with that unit may be an indication of tampering.

Identifying traces of digital preprocessing:

Digital processing of analogue signals always requires a specific sample rate. During the digitising process, a phenomenon known as aliasing occurs: high-frequency components are superimposed on low-frequency ones, degrading the audio quality. The vast majority of analogue-to-digital and digital-to-analogue converters therefore use anti-aliasing filters. EdiTracker automatically detects traces of such filters, whose presence may indicate that the audio has been digitised.

Detecting traces of tampering through phase shifts in the harmonics

EdiTracker automatically scans audio for technical narrow-band signals which normally come from an electrical network (ENF), batteries, nearby electrical appliances, etc., and estimates their phase continuity. An unjustified phase break can be interpreted as potential evidence of audio editing.
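The phase-continuity idea can be sketched as follows: demodulate each analysis window at the mains frequency and compare phases across windows. Here the window spans a whole number of 50 Hz cycles, so a continuous hum gives a constant phase and a splice shows up as a jump. This is a simplified model; real ENF analysis must also track mains-frequency drift.

```python
import numpy as np

def hum_phase(x, fs, f=50.0):
    """Phase of the mains (ENF) component of a window, via complex demodulation."""
    n = np.arange(len(x))
    return float(np.angle(np.dot(x, np.exp(-2j * np.pi * f * n / fs))))

def phase_steps(x, fs, f=50.0, win=4000):
    """Absolute phase step between consecutive windows, wrapped to [0, pi].
    `win` is chosen as a whole number of mains cycles (4000/8000 s = 25 cycles)."""
    phases = [hum_phase(x[s:s + win], fs, f)
              for s in range(0, len(x) - win + 1, win)]
    return np.abs(np.angle(np.exp(1j * np.diff(phases))))

fs = 8000
hum = 0.05 * np.sin(2 * np.pi * 50 * np.arange(10 * fs) / fs)
# Simulate an edit: remove 65 ms (3.25 mains cycles) from the middle,
# producing a pi/2 phase break at the splice point.
spliced = np.concatenate([hum[:4 * fs], hum[4 * fs + 520:]])
steps = phase_steps(spliced, fs)
```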

Scanning background noise:

Background scanning detects dramatic changes in the spectrum that are unnoticeable on the waveform and which may be signs of audio editing. EdiTracker also automatically scans the integrity of background noises and marks any abrupt change in noise level.

Auditory analysis:

Auditory analysis of these events, based on the known characteristics of the recording equipment and methods used, can reveal possible violations of the integrity of the overall audio picture and identify the location, facts and methods of such violations. EdiTracker provides an extended list of auditory and linguistic indicators that may point to breaches in the authenticity of a recording. These resources can be used to create a textual report.

Diagnostic module

A new SIS module for a more reliable assessment of the authenticity and examinability of an audio recording. The module detects various signal features that explain the nature of its origin or possible processing methods, which may either be unknown or deliberately hidden. In addition to EdiTracker, it detects the application of certain operations on a signal using the following methods:

  • Spoofing detection
  • DC offset analysis
  • Analysis of A/μ encoding traces
  • Analysis of MP3 encoding traces


Spoofing detection:

The spoofing detector searches the audio recording for traces of spoofing attacks such as replays, speech synthesis and voice disguising. The algorithm is based on a neural network trained on various types of spoofing, so it can conclude whether or not the audio recording is masquerading as an authentic recording of a speaker.


DC offset analysis:

This module analyses the audio recording to identify any dramatic change in DC offset, as this may be a sign of integrity violation. If such a violation is detected, the module highlights the corresponding areas.
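A minimal version of such a check: compute the mean (DC offset) over successive windows and flag any abrupt change between neighbours. The window size and threshold here are arbitrary illustrative choices.

```python
import numpy as np

def dc_offset_jumps(x, win=1000, threshold=0.02):
    """Indices of window gaps where the per-window mean changes abruptly."""
    n = len(x) // win
    means = x[:n * win].reshape(n, win).mean(axis=1)  # DC offset per window
    return np.flatnonzero(np.abs(np.diff(means)) > threshold)

rng = np.random.default_rng(0)
part_a = 0.1 * rng.standard_normal(5000)            # original recording
part_b = 0.1 * rng.standard_normal(5000) + 0.05     # insert with a DC bias
flags = dc_offset_jumps(np.concatenate([part_a, part_b]))
# The gap between windows 4 and 5 (the splice point) is flagged.
```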


Detection of A/μ coding:


This module analyses the audio recording to detect areas showing signs of A-law/μ-law encoding. Processing with these codecs cannot be inferred from the recording format alone. If such coding is detected, the module highlights the corresponding areas or the entire audio recording.


Detection of MP3 coding:


This module analyses the audio recording to identify signs of MP3 coding. Processing with this codec cannot be inferred from the recording format alone. If MP3 coding is detected, the module displays a message describing the signs found, along with spectrograms, graphs and histograms explaining the algorithm's decision.


Documents

  • Channel compensation for forensic speaker identification using inverse processing (article)
  • Speaker identification based on the statistical analysis of F0 (article)
  • Semi-automated technique for noisy recording enhancement using an independent reference recording (article)
  • IKAR Lab 3 Leaflet
  • IKAR Lab 3 Brochure
  • IKAR Lab 3 Leaflet (Spanish)
  • IKAR Lab 3 Brochure (Spanish)