Internet audio and video streaming tutorial for noobs

|DRC| Wartex

Wednesday, July 27 2005, 08:40:54 #27796     Internet audio and video streaming tutorial for noobs

Wartex's guide to digital audio and video, media streaming and more.

To understand the following, you have to know the basics, such as how motion pictures work, what network bandwidth is, and how computers store and display graphical data.


Short glossary:

FPS = framerate, frames per second

B = byte = 8 bits

MB = Megabyte

Mb or Mbit= megabit

KB = kilobyte

Kb or kbit = kilobit

Codec (from coder/decoder) - piece of software and/or hardware used to code/decode audio or video data.

PCM - pulse-code modulation

Bpp - bits per pixel, describes color depth or compression ratio

Bitrate - number of bits that are conveyed or processed per unit of time

Chrominance (aka chroma) - color information

Luma (don't confuse with Luminance) - brightness information

CRT - cathode ray tube


Digital vs Analog

What is the difference between analog and digital? Basically, it's all about how the data is described and transmitted from device (say, TV station) to device (your TV set). If you look very closely at your TV screen or monitor, you will notice the images are made of tiny glowing dots (I'm not talking about pixels, I'm talking about physical fluorescent dots).

Usually there are 3 colors - red, green and blue. They form triads or blocks that repeat in a pattern. The picture is drawn by lighting up the corresponding dots in a zigzag-like manner starting from the top-left corner, 1 row at a time, down to the bottom-right corner of the screen. The screen is divided into a logical grid of rows and columns (rows are also referred to as "scan lines"); each element of the grid is called a "pixel". Note that in CRTs a pixel might occupy more than 1 triad/block, and it can even occupy parts of neighboring triads/blocks (individual dots). In North American (NTSC) television the picture is drawn as 525 scan lines per frame, of which about 480 are visible, at 60 interlaced fields (about 30 full frames) per second. The number of rows and columns varies based on the standard.

Extra info for extra brain damage: Dots are lit up by beams of electrons ejected from 3 electron guns (for red, green and blue dots) inside the screen tube. Once the electrons hit the luminous dots, their [kinetic] energy is converted to light. It's those beams that "scan" the surface of the screen in a zigzag-like manner. Almost like cross-stitching a picture on a sweater, one row at a time.

Extra info for anal-retentive: When electrons hit the colored dots, their energy is absorbed and electrons in the fluorescent material jump to higher energy levels (aka quantum leap). Then they fall back to a low energy level and the difference in energy is released as photons (aka light particles). Each dot releases photons of a specific energy, which defines what color we see. All photons are the same, they just have different energy or wavelength (frequency). Energy, frequency and wavelength are tied together: a photon's energy is proportional to its frequency and inversely proportional to its wavelength, so stating one fixes the other two.

Parts of a color CRT and its electron gun (labels from the diagram):

1. Electron guns
2. Electron beams
3. Focusing coils
4. Deflection coils
5. Anode connection
6. Mask for separating beams for red, green, and blue part of displayed image
7. Phosphor layer with red, green, and blue zones
8. Close-up of the phosphor-coated inner side of the screen

In analog transmission, the data about each pixel's color is transmitted in real time as the electron guns trace the screen. Pixels don't really exist for electron guns, only the physical dots. A 52" TV will have more dots per pixel than a 12" TV. CRTs have circuitry inside that synchronizes all this and centers and stretches the image over the screen area as necessary.

In LCD monitors, each pixel of the panel is exactly 1 block (3 dots - red, green, blue. Some LCD monitors have 2 blue dots, making it 4 dots per block). When the incoming signal has exactly as many pixels as the panel has blocks, it is said that the LCD monitor is running in "native" resolution. In an LCD the dots are actually tiny liquid crystal cells that change how much of the backlight passes through them when electrified; they do not emit light themselves.

If an LCD monitor was manufactured with 1024 x 768 (columns x rows) blocks and you try to display a 640x480 signal, it will look blurry in some areas, since some pixels will have to occupy more than one block (1024 / 640 = 1.6, i.e. each pixel will be 1.6 blocks wide). The monitor circuitry has to borrow dots (aka "subpixels") from neighboring blocks and calculate how to light the partially occupied blocks to accommodate them.
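To make the 1.6-blocks-per-pixel arithmetic concrete, here is a rough Python sketch that works out which panel blocks one source pixel lands on, and how much of each block it covers. The function name and the idea of reporting coverage as a fraction are my own illustration; real scaler chips use their own filtering methods.

```python
def block_coverage(source_width, panel_width, pixel_index):
    """Return {panel block: fraction of that block covered} for one source pixel."""
    scale = panel_width / source_width      # panel blocks per source pixel
    start = pixel_index * scale             # left edge of the pixel, in blocks
    end = start + scale                     # right edge of the pixel, in blocks
    coverage = {}
    block = int(start)
    while block < end:
        # Overlap between the pixel [start, end) and the block [block, block + 1)
        overlap = min(end, block + 1.0) - max(start, float(block))
        if overlap > 0:
            coverage[block] = round(overlap, 3)
        block += 1
    return coverage

print(block_coverage(640, 1024, 0))  # {0: 1.0, 1: 0.6}
print(block_coverage(640, 1024, 1))  # {1: 0.4, 2: 1.0, 3: 0.2}
```

Block 1 is shared by two different source pixels (0.6 from one, 0.4 from the other), which is exactly where the blur comes from.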

Back to our sheep. So analog transmission delivers image data as it is being drawn, carrying information about how intense a specific electron gun has to be while it passes over a specific area of the screen. In digital transmission, the information about an entire frame or part of a frame arrives in full, and only then is it decoded, converted to analog and drawn on the screen.

Well, they both do the same thing, so what's the damn difference?

The difference is in quality and in the price of delivering the content. Each analog TV channel broadcasts the timing (sync), chroma and luma signals on its own frequency, and the frequency range (radio spectrum) is limited (by government regulations and physical cable properties). That means you can only stuff so many channels into the airwaves or cable.

TV "channels" are effectively fixed blocks of radio spectrum. In North America, "channel 2" refers to the broadcast band of 54 to 60 MHz, with carrier frequencies of 55.25 MHz for NTSC analog video and 59.75 MHz for analog audio.

In analog world, even when there is not much data transmitted on a specific channel, such as a test screen you see at 4 AM (color tuning chart) or a static picture, entire channel's frequency range is used and no other information can be squeezed in, thus bandwidth is "wasted". You can only have one show per channel at a time.

In the digital cable world, any analog "channel" can carry video and audio of any show being broadcast, in some cases more than 2 shows per analog channel. This is accomplished by encoding each video frame of all shows into digital "packets" or chunks of information, numbering the packets so that the digital cable box knows which packet belongs to which show, and sending them all at the same time on various analog channels. Some pictures will contain a lot of fine detail, which will make the packets larger, and some will be simple, which will make the packets smaller. (You can test this by using your digital camera and taking a picture of a cloudless sky and then a picture of a tree and then comparing the file sizes. The sky one will be much smaller because it takes a lot more information to describe the detail in the tree's leaves and branches than a plain sky color.)

The cable box will only pick out the packets that are relevant to the show you are watching, convert them back to frames and show them on the screen one by one.
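A toy sketch of the packet idea in Python. The Packet fields, the show numbers and the function names are made up for illustration; real digital cable uses a far more involved transport format, so treat this only as a picture of "tag the packets, mix them, let the box filter them".

```python
from dataclasses import dataclass

@dataclass
class Packet:
    show_id: int      # which show this packet belongs to (hypothetical field)
    sequence: int     # order of the packet within its show
    payload: bytes    # compressed picture data

def multiplex(streams):
    """Interleave packets from several shows into one shared stream."""
    mixed = []
    longest = max(len(chunks) for chunks in streams.values())
    for seq in range(longest):
        for show_id, chunks in streams.items():
            if seq < len(chunks):
                mixed.append(Packet(show_id, seq, chunks[seq]))
    return mixed

def tune(mixed, show_id):
    """What the cable box does: keep only the packets for one show, in order."""
    wanted = [p for p in mixed if p.show_id == show_id]
    wanted.sort(key=lambda p: p.sequence)
    return [p.payload for p in wanted]

streams = {
    7: [b"news-frame-0", b"news-frame-1"],
    9: [b"movie-frame-0", b"movie-frame-1", b"movie-frame-2"],
}
channel = multiplex(streams)   # both shows share one "analog channel"
print(tune(channel, 9))        # only the movie's packets come out
```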

The whole coding and decoding process of video is done by codecs.

Why use codecs?

Codecs are used to compress data before storage or transportation and to decode it upon reception/playback. For example, when video is transmitted over digital cable or satellite broadcast, each video frame is broken up into small square blocks of pixels (16x16 "macroblocks" in MPEG-2), each block is compressed (you can say "zipped") - CODED - and then sent to your cable/satellite box. Your box "unzips" - DECODES - each piece and reassembles the pieces back into a frame. It happens very fast, however you can see the effects of this process when you flip channels - blocks that quickly build the image. This is because when you switch to that channel, you catch the box in the middle of "reassembling" the image. This is not an exact description, as there are many algorithms for this. Why do we need codecs? Next part:

Compression - imagine a movie (say, an mpeg file) that has 640 columns and 480 rows of pixels per frame. That means one frame consists of 640 * 480 = 307200 pixels. If we use 24 bit color, we need to use 3 bytes of data per pixel (1 byte = 8 bits, 24 bit / 8 = 3 bytes). That means one frame will be [307200 pixels] * [3 bytes per pixel] = 921600 bytes, or 922 KB. A normal movie has 24 frames per second. 921600 bytes * 24 times a second = 22 118 400 bytes/sec - 1 second of uncompressed video requires 22 MB of storage or bandwidth. You need a fiber optic line that can pump through 22 MB/s (about 177 Mbit/s, or 0.18 gigabit/s) to watch uncompressed video. It will cost you about $24 000/mo to have such a connection.
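The same arithmetic as a few lines of Python, in case you want to plug in other resolutions or frame rates. The function name is just for illustration, and the megabit figure uses decimal units (1 Mbit = 1 000 000 bits).

```python
def uncompressed_video_rate(width, height, bits_per_pixel, fps):
    """Return (bytes_per_frame, bytes_per_second) for raw, uncompressed video."""
    bytes_per_pixel = bits_per_pixel // 8
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame, bytes_per_frame * fps

frame_bytes, rate_bytes = uncompressed_video_rate(640, 480, 24, 24)
rate_mbit = rate_bytes * 8 / 1_000_000  # megabits per second

print(frame_bytes)          # 921600
print(rate_bytes)           # 22118400
print(round(rate_mbit, 1))  # 176.9
```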

Now, if you take a JPEG file that is 640x480, it is obviously smaller than 922 KB. So if we show a sequence of JPEGs at, say, 25 KB a frame, we are going to need only 24 FPS * 25 KB/frame = 600 KB of storage per second, or 600 KB/s - a speed you can get with a cable connection. An MPEG file is, basically, a sequence of JPEGs. There is an additional compression method where frames only contain the difference from the previous frame - you can see this in broken avi files. Such a method allows lowering the data rate tenfold - to about 60 KB/s. Shrink the picture and drop the frame rate and you can get low enough for a 56k modem, which moves only about 7 KB/s. This is why you can stream video off the internet even with a 56k modem - because it's encoded/compressed with a codec.

The core of any codec is the algorithm, or formula, that is used to compress data. This algorithm is also called a "profile" (well, "profile" = codec + tweaks) or "filter". Different codecs (aka profiles aka filters) do a different job of compressing the same piece of data. In general, it's a trade-off between output quality and resources consumed. Higher output quality requires more CPU power for encoding/decoding and more bandwidth, since the compression ratio won't be as good as with lower output quality. The lower the quality, the lower the bandwidth and coding/decoding CPU power required; however, in some instances the decoder does a lot of "restoration" and requires a lot of CPU power.
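Here is a bare-bones Python illustration of the "only store the difference from the previous frame" idea. It treats a frame as a flat list of pixel values and records changed pixels one by one; real video codecs work on blocks and track motion, so this is a simplification of the principle, not how MPEG actually does it. The function names are made up.

```python
def encode_delta(previous, current):
    """Record only the pixels that changed since the previous frame."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame

frame1 = [10] * 16           # a tiny 16-pixel "frame", all one shade
frame2 = list(frame1)
frame2[5] = 200              # only one pixel changes between frames

delta = encode_delta(frame1, frame2)
print(delta)                                 # [(5, 200)]
print(apply_delta(frame1, delta) == frame2)  # True
```

One changed pixel costs one small record instead of a whole new frame. This also hints at why a broken file shows garbage: lose one frame and every later difference is applied to the wrong picture.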

Lossy and lossless compression

If you take a BMP file and zip it, then unzip it - does it look different? No, because zip is a lossless compression. For example, the GIF*, PNG and TGA formats use zip-like lossless compression. So any image saved to TGA or PNG format will retain 100% of the color and detail. The reason I put a star next to GIF is that GIF only supports 256 colors, so there will be some color loss if the original image has more than 256 colors.

If you save a file as JPEG and then convert the JPEG back to BMP, you will see it is different from the original. This is because JPEG uses lossy compression - some detail and color information is thrown out of the file to reduce size. Say you have a region in your image, for example sky in the background; it is mostly blue, but if you zoom in it has a lot of noise in it. Those "noise" pixels need to be described somehow, thus requiring more storage and making the file bigger. What JPEG does is blur those pixels into an "average" color. The difference is almost unrecognizable to you, yet the detail of the image is lower because the background is now a solid color. If you take a 100x100 pixel image filled with 1 color and save it as a 256-color bitmap, it will be about 11 KB (100 * 100 pixels at 1 byte each, plus the header and palette). If you zip it, it will be around 300 bytes. Whoa, what a compression ratio!

But if you add some detail and zip it again, the zipped file will be larger. This is because monotonous data patterns (pixels in this case) have a better compression ratio than more random patterns. For example, this article is boring because it's monotonous, so you can describe all 2 kilobytes of text I typed so far as "boring". The word "boring" has only 6 bytes. But if you are interested in it, you will remember it in all its detail and it will take more than 1 word to describe it to someone else.
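You can watch the "monotonous compresses, random doesn't" effect in a few lines of Python. This uses zlib, which implements the same DEFLATE method that zip files normally use; the raw buffers stand in for pixel data only (no BMP header or palette), so the sizes will not match a real zipped .bmp exactly.

```python
import os
import zlib

size = 100 * 100                      # one byte per pixel, like a 256-color image
flat = bytes([42]) * size             # every pixel the same color
noisy = os.urandom(size)              # every pixel random

flat_packed = zlib.compress(flat)
noisy_packed = zlib.compress(noisy)

print(len(flat), len(flat_packed))    # 10000 raw bytes shrink to well under 100
print(len(noisy), len(noisy_packed))  # random data barely shrinks, or even grows a bit

# Lossless means we get back exactly what we put in.
print(zlib.decompress(flat_packed) == flat)
```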

Bah! What does this have to do with radio?

Be patient.

Sampling rate - how many times per second an analog signal is measured and saved as a digital value. For example, the sampling rate of CD audio is 44100 Hz - when sound is recorded from a live performance onto a master CD, each second of sound is broken up into 44100 pieces, where each piece is measured as a signal level (voltage on the input of the sound card) and recorded as a value. Why 44100? To capture a frequency you need to sample at least twice as fast as that frequency, and human hearing tops out at around 20 kHz, so 44100 measurements per second is "detailed" enough for the human ear. In practice, professional audio is often recorded at higher rates such as 96 kHz or 192 kHz (192 000 samples per second). So recorded as what value? What, like volts? Or amperes? Read on.
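A small Python sketch of what "measuring the signal 44100 times a second" means. It fakes the analog input with a computed 440 Hz sine wave (an assumption for the example; a real sound card measures a voltage), and the function name is made up.

```python
import math

SAMPLE_RATE = 44100  # CD audio: measurements per second

def sample_tone(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Measure a sine wave at evenly spaced instants, like a sound card would."""
    count = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate) for n in range(count)]

samples = sample_tone(440, 1.0)   # one second of a 440 Hz tone
print(len(samples))               # 44100
print(SAMPLE_RATE / 2)            # 22050.0, the highest frequency this rate can capture
```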

Resolution - what value range to use per 1 signal measurement. The resolution of digital audio is measured in bits. The CD audio format uses 16 bits (per channel, and we have 2 channels - left and right). You can describe 65536 numbers/combinations using 16 bits. On a CD they are stored as signed values from -32768 to 32767, where 0 is silence and the extremes are maximum loud (usually 250 millivolts). Ever watched how a driver (speaker) cone moves when it's reproducing sound? Imagine that at 0 the speaker is in its rest position and at 32767 it's pushed out to the maximum. The higher the resolution, the more precisely you can describe the movement of the speaker. If your resolution was, say, 2 bits, you could only describe 4 signal levels.


If you can only have 4 speaker cone positions, you can do some buzzing sound at best. PC speaker (built in, the one you connect to mobo) uses 6 bits, this is why it sounds like shit. Old cellphones use 12 bits. DVD uses 24 bits - about 16.7 million speaker cone positions - very precise - hence high quality.
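To see how much precision the bit depth buys, here is a Python sketch that snaps one cycle of a sine wave to the nearest allowed level at 2 bits and at 16 bits. The -1.0 to 1.0 signal range and the function name are choices made for this example, not part of any audio format.

```python
import math

def quantize(value, bits):
    """Snap a signal level in the range -1.0..1.0 to the nearest of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)              # spacing between neighboring levels
    index = round((value + 1.0) / step)    # nearest level number, 0..levels-1
    return index * step - 1.0

wave = [math.sin(2 * math.pi * n / 32) for n in range(32)]  # one cycle of a sine

for bits in (2, 16):
    distinct = sorted({quantize(v, bits) for v in wave})
    worst = max(abs(v - quantize(v, bits)) for v in wave)
    print(bits, "bits:", len(distinct), "levels used, worst error", round(worst, 5))
```

At 2 bits the smooth wave collapses into 4 steps and the error is a large chunk of the signal; at 16 bits the error is too small to matter.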

Now let's put this all together:

An audio CD has 2 channels - left and right. When recorded from an analog source, each channel's sound is measured 44100 times a second and the signal level at each measurement is recorded with 16 bit resolution - 2 bytes a measurement. [2 channels] * [2 bytes] * [44100 samples] = 176400 bytes/sec. That means 1 second of CD audio takes 176 kilobytes of space. How much would a 5-minute song be? 5 mins = 300 seconds. 300 * 176400 = 52 920 000 bytes or 53 megabytes.
Now go and rip a 5-minute song into a WAV file and see how big it is. It will be almost exactly 53 MB (the WAV header adds only a few dozen bytes). CD and WAV are pretty much the same format, and the bad part about it is that it's "raw", meaning it contains 100% of the information that was recorded through the microphone when the original recording was created. Everything the microphone heard is measured and stored, even very quiet sounds or sounds of very high frequency that we can't hear.
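If you don't have a CD handy, Python's standard wave module can confirm the numbers. This sketch writes one second of CD-format silence into memory and checks the size; the function name is made up, and silence is used only because the content doesn't affect the size of raw PCM.

```python
import io
import wave

SAMPLE_RATE = 44100   # samples per second
CHANNELS = 2          # left and right
SAMPLE_BYTES = 2      # 16 bits per sample

def wav_size(seconds):
    """Write `seconds` of silence as CD-quality WAV in memory, return its size in bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_BYTES)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(CHANNELS * SAMPLE_BYTES * SAMPLE_RATE * seconds))
    return len(buffer.getvalue())

print(CHANNELS * SAMPLE_BYTES * SAMPLE_RATE)        # 176400 bytes of audio per second
print(wav_size(1))                                  # 176444: the audio plus a 44-byte header
print(300 * CHANNELS * SAMPLE_BYTES * SAMPLE_RATE)  # 52920000 bytes for a 5-minute song
```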

So how can we stuff 3 hours of mp3 onto an 80-minute CD? Next step - psychoacoustic compression.

When a CD is recorded, the signal value is measured and recorded no matter what. If you are in a recording studio and recording is on, but you are not singing yet and there is complete silence around, the mic is still picking up your breath, dust crackling under your shoes, air movement noise and all the other sounds you can't really hear - they are still recorded and take up space. Now you start singing and nodding your head to keep the rhythm. You think the mic is only recording your voice, but it also records the sounds of your hair, t-shirt, shoes, air whistling in your trachea and so on. A bunch of redundant information. If there is music overlaid, the louder sounds of your voice mask the quieter sounds of the music, and the drums mask your voice. You can only hear the louder one (well, you hear all of them but your brain perceives only some), but the hardware is recording a mixture of ALL sounds. Now take the 53 MB WAV file and zip it. What, disappointed? Only 47 MB - not much space saved. This is because zip is a lossless compression - all the "inaudible/imperceptible" detail is still there in the WAV file.

Now we are going to take an MP3 codec and encode our file into mp3, that will remove all the inaudible detail from our WAV file just like JPEG removes invisible "noise" - it's called psychoacoustic compression, making the file much smaller. How much smaller? Read on.

A codec in itself is a formula, a software program that processes data using a formula. A formula that describes what information to keep, what to throw away and how much. Audio codecs, such as MP3 codecs, have 1 main parameter - bitrate. As I said above, a WAV file or a CD is a "raw" recording, having a 176 KB/s or 1411 kbit/s bitrate (176.4 * 8 bits). So say, if we encode a 1411 kbit/s WAV file into a 128 kbit/s mp3 (you can say "mp3 with 128 kbit/s bitrate"), that means the encoder will throw away ~90% of the data - sounds that are there but that we cannot hear. But no one is perfect. Unfortunately it will throw away some audible sounds too, mostly high frequencies - hiss, cymbals etc. The lower the bitrate of the output mp3 file, the more sounds the encoder will remove from the source WAV and the lower the quality. A WAV/CD encoded into a 192 kbit/s or higher mp3 in general has no audible sounds removed, and 99% of people cannot tell the difference between a "raw" 53 MB WAV and the same song compressed to a 10 MB 256 kbit/s mp3. You get a 5x compression ratio and no audible difference! Now you can stuff 5 CDs of music onto 1 CD!
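The bitrate math in Python, for anyone who wants to try other bitrates. It assumes a constant bitrate and decimal units (1 kbit = 1000 bits, 1 MB = 1 000 000 bytes); the function names are just for this example.

```python
CD_BYTES_PER_SECOND = 2 * 2 * 44100            # channels * bytes per sample * samples
CD_KBIT_PER_SECOND = CD_BYTES_PER_SECOND * 8 / 1000

def mp3_size_mb(seconds, kbit_per_second):
    """Approximate size of a constant-bitrate file in megabytes."""
    return seconds * kbit_per_second * 1000 / 8 / 1_000_000

def kept_fraction(kbit_per_second):
    """Share of the raw CD data rate that survives at the given bitrate."""
    return kbit_per_second / CD_KBIT_PER_SECOND

print(CD_KBIT_PER_SECOND)                   # 1411.2
print(round(kept_fraction(128) * 100, 1))   # 9.1 percent kept, so about 91% is thrown away
print(mp3_size_mb(300, 256))                # 9.6 MB for a 5-minute song at 256 kbit/s
```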

So wait a second, why do we need RM (Real Media), WMA (Windows Media Audio), AAC (iTunes, MPEG-4), AC3 (Dolby Digital, DVD sound) and ATRAC (MiniDisc) formats if they all do the same thing?

Well, one reason is that some are better than others at sorting out what sounds to remove during compression, some are faster, and some are developed just to be incompatible so companies can cash in. For example, you can technically save mp3 files on MiniDiscs, but Sony doesn't want you to; they want you to use the ATRAC format instead (ATRAC actually sounds better than mp3 at the same bitrate).

Cool, so lets just encode all files into ATRAC if it sounds better! And make ATRAC players instead of mp3 players!

This is where patents come into play. Companies who develop algorithms like mp3, ac3, aac want you to pay them royalties for using the _algorithm_ regardless of how it is implemented - whether it's a CD ripping program or an mp3 player like Winamp or an iPod. For every mp3 decoder chip Apple puts in the iPods they pay Thomson (the company that owns the patent on the mp3 compression algorithm) $2.50 for the _algorithm_ and another $10 for the chip itself to Texas Instruments. For every Winamp Pro player sold, Nullsoft shells out $0.75 to Thomson. For every MiniDisc player Aiwa makes they pay out $3 to Sony for the ATRAC decoder algorithm, plus the price of the chip. The chips themselves can be made by any company. For every cable box StarChoice makes, they pay something to a patent holder for a video/audio compression algorithm AND the price of the hardware.

This is where Ogg Vorbis comes into play. It is a 100% free, patent-free algorithm, WAY BETTER than mp3. A 96 kbps ogg file sounds like a 160 kbit mp3 file. This means it gets a higher compression ratio at the same quality.

I will stream our radio at lo-fi (48 kbps ogg), hi-fi (64 kbps ogg) and Premium (128+ kbit) rates. 48 for people like Slowhand, with weak lines, 64 for the general public and 128 - for money, with on-demand streaming (you can listen to the same song again, or any other song, at any time). Note that 128 kbps ogg is near "raw" CD quality. You can actually save a stream to your hard drive and later burn it onto a CD, and have a decent quality CD!
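A back-of-the-envelope Python helper for how many listeners one home connection can feed at each of those rates. The 512 kbit/s uplink and the 20% of bandwidth kept free for everything else are assumptions picked for the example, not measurements, and the function names are made up.

```python
def listeners_supported(upload_kbit, stream_kbit, reserve_fraction=0.2):
    """How many listeners one node can feed, keeping part of the uplink free."""
    usable = upload_kbit * (1 - reserve_fraction)
    return int(usable // stream_kbit)

def upstream_kbytes(listeners, stream_kbit):
    """Upload rate in KB/s needed to serve the given number of listeners."""
    return listeners * stream_kbit / 8

for rate in (48, 64, 128):
    print(rate, "kbit/s:", listeners_supported(512, rate), "listeners on a 512 kbit/s uplink")

print(upstream_kbytes(3, 64))   # 24.0 KB/s for three hi-fi listeners
```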

Here are some ideas:

1) At first we will not use any paid hosting, we are already paying for lines we hardly ever load 100%. This unused bandwidth can be used for streaming radio. I was playing quake today, and Elusive, CornBread and Zenex were streaming from me at 37 KB/s and I wasn't lagging. I already developed infrastructure for distributed network.

2) If you have an old computer at home that you have no use for, we can use it as a "broadcast node". It does not need to have a monitor, keyboard or mouse. It has to run 24/7 and has to be connected to the internet via a router or a switch (no hubs). It will encode files from playlist into an ogg stream and then forward it to "distribution nodes" - our computers. It will have a constant upstream of 15-25 KB/s. If you have cable internet, it will not affect your gaming or downloads in any way.

3) I developed a "failover" mechanism, this means if one of the distribution nodes goes out (you turn off your computer) listeners will be rolled over to remaining nodes. If one of the broadcast nodes fails (say winamp crashed), other one will pick up.
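To give an idea of what rolling listeners over could look like, here is a toy Python model. It is NOT the actual mechanism described in point 3 (that code isn't shown here); the class, the node names and the "put each listener on the least loaded node" rule are all invented for illustration.

```python
class Network:
    """Toy model: listeners attached to distribution nodes, with rollover on failure."""

    def __init__(self, node_names):
        self.listeners = {name: [] for name in node_names}

    def connect(self, listener):
        """Attach a listener to the node that currently has the fewest listeners."""
        node = min(self.listeners, key=lambda name: len(self.listeners[name]))
        self.listeners[node].append(listener)
        return node

    def fail(self, node):
        """Remove a node and roll its listeners over to the remaining nodes."""
        orphans = self.listeners.pop(node)
        if not self.listeners:
            raise RuntimeError("no distribution nodes left")
        for listener in orphans:
            self.connect(listener)

net = Network(["node-a", "node-b", "node-c"])
for name in ["listener-1", "listener-2", "listener-3", "listener-4"]:
    net.connect(name)

net.fail("node-a")      # someone turned off their computer
print(net.listeners)    # everyone is still connected, spread over node-b and node-c
```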

More to come.

Last edited by |DRC| Wartex on 21:08, Saturday Aug 15 2009; edited 8 times in total

|DRC| Wartex

Wednesday, July 27 2005, 16:14:20 #27806     


Wednesday, July 27 2005, 16:48:29 #27807     

considered vbr? does ogg have a vbr option?

Wednesday, July 27 2005, 16:55:37 #27808     

damn....that should be published somewhere. Do you think in binary? Again, a display of superior intelligence and reasoning.

I am into the distributed computing thing, I have 3 PCs here all folding@home. My PC is always on, and so is my cable. 512k upload (I think), so you can put me on the list...but don't ask me to write code. I can just barely get by with Quake scripts.

Wednesday, July 27 2005, 17:06:54 #27809     

Great article. Good read, and the distributed network idea is a plus.

Wednesday, July 27 2005, 18:55:32 #27811     

Good research.

I'm down with distributed computing rather than paying for a line. I have an old 400Mhz celeron with everything needed to be added to my network, with the exception of a hd. I know I've gotta have one around here somewhere though. I'll see what I can find or maybe salvage from another pc.

Wednesday, July 27 2005, 20:24:26 #27816     

Yeah, thanks for the research wartex. You know I have a 2nd pc waiting for the go-ahead.
|DRC| Wartex

Wednesday, July 27 2005, 22:20:42 #27817     

|DRC| SLIM wrote:
Good research.

I'm down with distributed computing rather than paying for a line. I have an old 400Mhz celeron with everything needed to be added to my network, with the exception of a hd. I know I've gotta have one around here somewhere though. I'll see what I can find or maybe salvage from another pc.

How much RAM do you have? If you have 256 MB or more, we can stuff Linux onto 1 floppy, then create a RAM disk and load all the mp3s into the RAM disk. No HDD required.
|DRC| Wartex

Wednesday, July 27 2005, 22:32:32 #27818     

Yes, and to underline my intelligence and accent my level of maturity, it may be interesting for you to know, I farted about 85 times when I wrote this article because I ate 2 pounds of broccoli. At some point my cats left the room to sit near the balcony.

Thursday, July 28 2005, 02:24:18 #27822     

It has 384MB of ram, but I did find a 30gig 5400rpm hd and am currently installing windows xp pro on it.. but i will put linux on it if you like. Your call.

btw you're a sick bastard.
|DRC| Blackshark

Thursday, July 28 2005, 03:46:11 #27827     

pretty nice idea and concept wartex

Saturday, July 30 2005, 07:08:11 #27855     

She's ready to go. What now?

Saturday, July 30 2005, 20:52:26 #27864     

do you count the fart as a binary calculation too? does it affect bandwidth somewhere?

Sunday, July 31 2005, 05:37:22 #27869     

zenex wrote:
do you count the fart as a binary calculation too? does it affect bandwidth somewhere?

Every action has an equal and opposite reaction, right?
|DRC| Wartex

Monday, August 1 2005, 17:51:24 #27891     

Install VNC on it. The "free" edition.
|DRC| Wartex

Saturday, August 15 2009, 21:12:05 #42439     


i gotta dig up and finish all my articles