OPUS

Opus is a codec developed by the Xiph.Org Foundation and standardized by the IETF (RFC 6716). Its specifications are as follows.

sampling rate: 8 kHz to 48 kHz

bitrate: 6 kbps to 510 kbps

delay (algorithmic): 2.5 ms to 60 ms

number of channels: mono/stereo (up to 255 channels are currently supported)
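To get a feel for these ranges, the following minimal sketch converts a frame duration into samples per frame and an average payload size. It assumes the 2.5 ms to 60 ms figures can be treated as frame durations and that the bitrate is constant; the function names `frame_samples` and `frame_bytes` are illustrative and are not part of any Opus API.

```python
def frame_samples(sample_rate_hz: int, frame_ms: float) -> int:
    """Number of PCM samples per channel covered by one frame."""
    return int(round(sample_rate_hz * frame_ms / 1000))


def frame_bytes(bitrate_bps: int, frame_ms: float) -> float:
    """Average payload size of one frame at a constant bitrate."""
    return bitrate_bps * frame_ms / 1000 / 8


# Walk through the shortest, a typical, and the longest duration in the table.
for frame_ms in (2.5, 20, 60):
    samples = frame_samples(48000, frame_ms)
    payload = frame_bytes(64000, frame_ms)
    print(f"{frame_ms} ms: {samples} samples at 48 kHz, {payload} bytes at 64 kbps")
```

Under these assumptions, shorter frames lower the delay but leave fewer bytes per frame to spend, which is one way to see the trade-off between latency and bitrate.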

The most notable features of Opus are its low bitrate and low latency. The Opus codec is broadly divided into two modules: CELT (Constrained Energy Lapped Transform), which follows the perceptual coding approach common to general-purpose audio codecs, and SILK, which compresses using LPC (Linear Predictive Coding). Opus operates in SILK-only, hybrid, or CELT-only mode, and the audio bandwidth differs in each case.

CELT

SILK

SILK is an audio codec developed by Skype that targets speech signals and is based on LPC. In the frequency response shown in Pic 1, the red line is the result of applying the STFT (Short-time Fourier Transform) to a human speech signal, and the blue line is the result of additionally applying LPC to that STFT result. The frequency response of human speech contains closely spaced fluctuations, but even the overall envelope has local maxima and local minima. This envelope arises because the sound produced at the vocal cords resonates with physical structures such as the oral cavity and the airway as it passes through the vocal tract. In acoustics, each peak of the envelope is called a formant, and formants are a key factor in distinguishing one pronounced sound from another.

Pic 1. Frequency response of speech signal [3]

Only the concept of LPC is explained here. LPC assumes a linear system and fits that system to a given frequency response. To find the formants, the human vocal tract is modeled as the following system with k poles, and the coefficients are obtained by minimizing the error e[n].

$$s[n] = a_{1}s[n-1] + a_{2}s[n-2] + \dots + a_{k}s[n-k] + e[n]$$
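As an illustration of how the coefficients in this model can be estimated, here is a minimal sketch using the autocorrelation method with the Levinson-Durbin recursion, one common way to minimize the squared prediction error. It is not necessarily the estimator SILK itself uses. The helper names `autocorr`, `levinson_durbin`, and `synth_ar`, as well as the test coefficients 1.3 and -0.6, are assumptions made for this example.

```python
import random


def autocorr(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n + lag], for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_k; returns (coeffs, error energy)."""
    a = [0.0] * (order + 1)   # a[0] is unused so indices match the equation
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        refl = acc / err      # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = refl
        for j in range(1, i):
            new_a[j] = a[j] - refl * a[i - j]
        a = new_a
        err *= 1.0 - refl * refl
    return a[1:], err


def synth_ar(coeffs, n_samples, seed=0):
    """Generate s[n] = sum_i a_i * s[n-i] + e[n] with Gaussian e[n]."""
    rng = random.Random(seed)
    s = []
    for n in range(n_samples):
        pred = sum(c * s[n - 1 - i]
                   for i, c in enumerate(coeffs) if n - 1 - i >= 0)
        s.append(pred + rng.gauss(0.0, 1.0))
    return s


true_a = [1.3, -0.6]                      # hypothetical 2-pole system
signal = synth_ar(true_a, 5000)
est_a, err = levinson_durbin(autocorr(signal, 2), 2)
print(est_a)                              # should land close to true_a
```

The synthetic signal is generated from the same model as the equation, so recovering values near `true_a` is a quick sanity check that the estimation matches the sign convention used here.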

By transmitting the coefficients obtained in this way, the formant information can be sent with very few bits, and the SILK algorithm is completed by quantizing and transmitting the remaining (residual) signal.
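The idea of sending prediction coefficients plus a quantized residual can be sketched as a simple closed-loop predictive coder. This is a heavily simplified illustration, not the actual SILK design: it assumes a plain uniform scalar quantizer with a hypothetical step size, fixed coefficients, and no entropy coding. The names `predict`, `encode`, and `decode` are made up for this example.

```python
import math


def predict(history, a):
    """Linear prediction from the most recent reconstructed samples."""
    return sum(a[i] * history[-1 - i]
               for i in range(min(len(a), len(history))))


def encode(s, a, step):
    """Quantize the prediction residual; returns integer codes."""
    recon, codes = [], []
    for x in s:
        pred = predict(recon, a)          # predict from what the decoder will have
        q = round((x - pred) / step)      # uniform quantization of the residual
        codes.append(q)
        recon.append(pred + q * step)     # mirror the decoder state
    return codes


def decode(codes, a, step):
    """Rebuild the signal from the codes and the same coefficients."""
    recon = []
    for q in codes:
        recon.append(predict(recon, a) + q * step)
    return recon


a = [1.3, -0.6]                           # hypothetical LPC coefficients
step = 0.05                               # hypothetical quantizer step size
s = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]

codes = encode(s, a, step)
out = decode(codes, a, step)
max_err = max(abs(x - y) for x, y in zip(s, out))
print(max_err <= step / 2 + 1e-9)
```

Because the encoder predicts from the reconstructed samples rather than the originals, the reconstruction error per sample stays within half a quantizer step in this sketch.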

  • [1] This paper was accepted for publication at the 135th AES Convention. This version of the paper is from the authors and not from the AES

  • [3] http://amateurselectronics.blogspot.com/2013/07/simple-speech-recognition-system-using.html
