The audio capture format is specified as follows.
Property | Detail | Expected by SP |
---|---|---|
'RIFF' | RIFF file identification (chunk ID) | 'RIFF' |
'WAVE' | File type header; always 'WAVE' for our purposes | 'WAVE' |
fmt marker | Format chunk marker (sub-chunk ID) | 'fmt ' (four bytes, with a trailing space) |
fmt size | Length of the format data (sub-chunk size), in bytes | 16 |
Audio format | Format type, a 2-byte integer (1 is PCM) | 1 |
Channels | Number of channels | 1 (mono) |
Sample rate | Sample rate in kHz (stored in the header in Hz) | 16 / 44.1 / 48 |
Bit depth | Bits per sample | 16 |
Summary 16000:
RIFF (little-endian) with 'fmt ' marker and fmt size = 16, WAVE audio, Microsoft PCM, 16 bit, mono, 16000 Hz
Summary 44100:
RIFF (little-endian) with 'fmt ' marker and fmt size = 16, WAVE audio, Microsoft PCM, 16 bit, mono, 44100 Hz
Summary 48000:
RIFF (little-endian) with 'fmt ' marker and fmt size = 16, WAVE audio, Microsoft PCM, 16 bit, mono, 48000 Hz
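The expected values above can be checked programmatically. The following is a minimal sketch using Python's standard `wave` module; the function name `check_capture_format` and the constant `ACCEPTED_RATES_HZ` are illustrative, not part of any SP tooling. Note that `wave` does not expose the fmt size, so this sketch does not verify that it equals 16.

```python
import wave

# Sample rates from the table above, expressed in Hz (assumed constant name).
ACCEPTED_RATES_HZ = (16000, 44100, 48000)


def check_capture_format(source):
    """Return a list of mismatches against the expected format (empty if OK).

    `source` is a file path or a binary file object, as accepted by wave.open.
    """
    try:
        wav = wave.open(source, "rb")
    except (wave.Error, EOFError) as exc:
        # wave.open rejects files that are not RIFF/WAVE or not PCM.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    with wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16 bit
            problems.append(f"expected 16 bit, got {8 * wav.getsampwidth()} bit")
        if wav.getframerate() not in ACCEPTED_RATES_HZ:
            problems.append(f"unsupported sample rate {wav.getframerate()} Hz")
    return problems
```

An empty list means the file matches one of the three summaries; otherwise each entry names one property that differs.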
Key fields of the 44-byte header (the sample rate varies with the recording's sampling rate; 44100 is shown here):
File marker or chunk ID = 'RIFF'
File type header = 'WAVE'
Format chunk marker or sub-chunk 1 ID = 'fmt '
Length of format data or sub-chunk 1 size = 16
Type of format or audio format = 1 (PCM)
Number of channels = 1 (mono)
Sample rate = 44100
Bits per sample = 16
Data chunk header or sub-chunk 2 ID = 'data'
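As an illustration of how these fields fit into 44 bytes, here is a minimal sketch that packs such a header with Python's `struct` module. The function name `build_wav_header` and its parameters are assumptions made for this example; it covers only the little-endian PCM case described here.

```python
import struct


def build_wav_header(sample_rate, num_samples, channels=1, bits_per_sample=16):
    """Pack a 44-byte little-endian PCM WAV header.

    `num_samples` counts samples per channel.
    """
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align          # bytes per second
    data_size = num_samples * block_align          # bytes of audio data
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size,  # file length minus the 8-byte RIFF header
        b"WAVE",
        b"fmt ", 16,              # format chunk marker and fmt size
        1,                        # audio format: 1 = PCM
        channels,
        sample_rate,
        byte_rate,
        block_align,
        bits_per_sample,
        b"data", data_size,
    )


header = build_wav_header(sample_rate=44100, num_samples=275456)
print(len(header))  # 44
```

For 275456 mono 16-bit samples at 44100 Hz, the two length fields come out as 550948 and 550912, the same numbers reported as header_size and header_data_size in the example log.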
Example log
fmt_size = 16
header_size = 550948
format = 1
channels = 1
sample rate = 44100
blocksize = 2
byte per sec = 88200
bit depth = 16
header_data_size = 550912
sample count = 275456
actual extracted samples extracted = 275456
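Several values in the log follow from the others, which gives a quick consistency check. The sketch below recomputes them; the function name `derived_fields` and the dictionary keys are illustrative and do not come from the logging code.

```python
def derived_fields(sample_rate, channels, bits_per_sample, data_size):
    """Recompute the header values that depend on the basic format fields."""
    block_size = channels * bits_per_sample // 8  # bytes per sample frame
    bytes_per_sec = sample_rate * block_size      # average byte rate
    sample_count = data_size // block_size        # samples per channel
    return {
        "block_size": block_size,
        "bytes_per_sec": bytes_per_sec,
        "sample_count": sample_count,
        "duration_s": sample_count / sample_rate,
    }


fields = derived_fields(sample_rate=44100, channels=1,
                        bits_per_sample=16, data_size=550912)
print(fields["block_size"], fields["bytes_per_sec"], fields["sample_count"])
# 2 88200 275456
```

These match the blocksize, byte per sec, and sample count entries in the log, and the recording works out to a little over 6 seconds of audio.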
Reference layout of a PCM file with a WAV header:
Bytes | Variable | Description |
---|---|---|
0 - 3 | 'RIFF' / 'RIFX' | Little-endian / big-endian |
4 - 7 | wRiffLength | Length of file minus the 8-byte RIFF header |
8 - 11 | 'WAVE' | |
12 - 15 | 'fmt ' | |
16 - 19 | wFmtSize | Length of format chunk minus the 8-byte header |
20 - 21 | wFormatTag | Identifies PCM, ULAW, etc. |
22 - 23 | wChannels | |
24 - 27 | dwSamplesPerSecond | Samples per second per channel |
28 - 31 | dwAvgBytesPerSec | Non-trivial for compressed formats |
32 - 33 | wBlockAlign | Basic block size |
34 - 35 | wBitsPerSample | Non-trivial for compressed formats |

PCM formats then go straight to the data chunk:
Bytes | Variable | Description |
---|---|---|
36 - 39 | 'data' | |
40 - 43 | dwDataLength | Length of data chunk minus the 8-byte header |
44 - (dwDataLength + 43) | the data | Followed by a padding byte if dwDataLength is odd |
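The byte offsets above translate directly into a parser. This is a minimal sketch, assuming the canonical 44-byte layout with no extra chunks between 'fmt ' and 'data'; the function name `parse_wav_header` is hypothetical, and the dictionary keys reuse the variable names from the reference.

```python
import struct


def parse_wav_header(header):
    """Decode the first 44 bytes of a PCM WAV file into named fields."""
    if len(header) < 44:
        raise ValueError("need at least 44 bytes")
    # Bytes 0 - 3 select the byte order for every numeric field that follows.
    if header[0:4] == b"RIFF":
        endian = "<"
    elif header[0:4] == b"RIFX":
        endian = ">"
    else:
        raise ValueError("missing RIFF/RIFX marker")

    (riff_len, wave_id, fmt_id, fmt_size, format_tag, channels,
     sample_rate, byte_rate, block_align, bits, data_id, data_len) = struct.unpack(
        endian + "I4s4sIHHIIHH4sI", header[4:44])

    if wave_id != b"WAVE" or fmt_id != b"fmt " or data_id != b"data":
        raise ValueError("not a canonical 44-byte PCM header")

    return {
        "wRiffLength": riff_len,
        "wFmtSize": fmt_size,
        "wFormatTag": format_tag,
        "wChannels": channels,
        "dwSamplesPerSecond": sample_rate,
        "dwAvgBytesPerSec": byte_rate,
        "wBlockAlign": block_align,
        "wBitsPerSample": bits,
        "dwDataLength": data_len,
        "last_data_byte": data_len + 43,      # offset of the final data byte
        "has_padding_byte": data_len % 2 == 1,
    }
```

A caller would typically read the first 44 bytes of the file in binary mode and pass them in. Files that carry additional chunks before 'data' are rejected by this sketch rather than skipped over.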