How It Works
The video has already been made; it is a finished product. An HTTP client is used to download the video from an HTTP server, and the user views the video at a later time. Production, transmission, and use all happen at different times.
Furthermore, real-time traffic can be divided into conversational and streaming traffic. Conversational traffic, such as voice or video telephony, involves an interactive exchange of streaming data. In a video conference, for example, a camera connected to a server transmits the video information as it is produced, and everything that happens at the server site can be displayed on the screen at the receiving site. This is both multimedia (video) and real-time traffic (production and use at the same time). Humans need to perceive the voice and/or video as arriving instantaneously, or almost instantaneously, in order to carry on a normal conversation. Therefore, the end-to-end delay from recording the voice and/or video on one side to playing it on the other side(s) must be very low. Studies have found one-way delays of less than 150 ms to be best, whereas up to 400 ms may be tolerable under certain conditions, such as when people are physically across the globe and psychologically prepared for small delays. On the other hand, some kinds of streaming real-time multimedia, such as streaming movies from a video server, do not have such tight delay constraints.
Figure 1.2: Video frames are transmitted as they are produced and displayed

Real-time multimedia traffic has stringent requirements on the playback of voice and video at the receiving side. Even if a stream is broken into segments (a requirement for transmission in packet form, as is the case for multimedia over IP), the playing of the voice or video needs to be smooth. In other words, the real-time relationship between when the segments are played needs to match the real-time relationship between when the segments were recorded at the sending site. Therefore, little variance can be tolerated in the time sequencing of consecutive segments of the voice or video stream (variance, that is, from the time sequencing on the recording side). For example, suppose each packet holds ten seconds of video information and the transfer delay is assumed constant and equal to one second. The receiver then sees the video at the same speed as it is created, so the constant delay is immaterial; a sketch of this playout schedule follows.
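The following minimal sketch works through the constant-delay example above. The function name and the printed schedule are illustrative; the packet duration and one-second delay come from the text.

```python
# Packet i finishes recording at t = 10*i seconds (each packet holds ten
# seconds of video) and the transfer delay is assumed constant at one
# second, so the playout timeline is the recording timeline shifted by 1 s.

PACKET_DURATION_S = 10.0   # seconds of video per packet (from the text)
TRANSFER_DELAY_S = 1.0     # assumed constant network delay

def playout_schedule(num_packets: int) -> list[tuple[float, float]]:
    """Return (send_time, playout_time) pairs for each packet."""
    schedule = []
    for i in range(num_packets):
        send_time = i * PACKET_DURATION_S            # when recording finishes
        playout_time = send_time + TRANSFER_DELAY_S  # arrival equals playout
        schedule.append((send_time, playout_time))
    return schedule

# Inter-packet spacing at the receiver equals the spacing at the sender, so
# the video plays at the speed it was created; the 1 s shift is immaterial.
for send, play in playout_schedule(3):
    print(f"sent at {send:5.1f} s, played at {play:5.1f} s")
```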
In the current best-effort Internet service model, no service guarantees can be made with respect to packet loss, delay jitter, or available bandwidth. Packet loss most often occurs due to congestion in network nodes: as congestion increases, more and more packets are dropped by routers in IP networks. While packet loss is one of the mechanisms that make the TCP protocol efficient and fair for non-real-time applications communicating over IP networks, it is a major issue for real-time applications such as the streaming of audiovisual media using the RTP protocol over UDP/IP. Even delay jitter manifests itself as packet loss, since packets received after their intended playout/presentation times are not useful; the sketch below illustrates this.
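A minimal jitter-buffer sketch, assuming a fixed playout offset, shows how delay jitter turns into loss: a packet that arrives after its scheduled presentation deadline is as useless as one that was dropped. All names and timing values here are illustrative assumptions, not from any standard.

```python
PLAYOUT_OFFSET_MS = 100.0  # assumed fixed buffering delay at the receiver

def classify(send_ms: float, arrival_ms: float) -> str:
    """Classify a packet as playable or effectively lost."""
    deadline = send_ms + PLAYOUT_OFFSET_MS   # scheduled presentation time
    return "playable" if arrival_ms <= deadline else "late (counts as lost)"

# Three packets sent 20 ms apart; the second suffers a jitter spike and
# arrives 120 ms after it was sent, missing its 100 ms playout deadline.
for send, arrival in [(0, 60), (20, 140), (40, 105)]:
    print(f"packet sent at {send:3d} ms -> {classify(send, arrival)}")
```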
Figure 1.4: Jitter

One major challenge in video compression is the transmission of video in lossy environments. One solution is to make the packets transmitted in real-time multimedia environments self-contained, so that no packet relies on other packets in the reconstruction process. As an example, H.264/AVC defines a network abstraction layer (NAL) in addition to the video coding layer (VCL), which allows the same video syntax to be used in multiple environments. The VCL consists of the core compression engine and performs all the classic signal processing tasks. It is designed to be as network independent as possible. The VCL comprises the syntactical levels known as the block, macroblock, and slice levels, and it contains coding tools that enhance the error resilience of the compressed video stream.
The NAL defines the interface between the video codec itself and the outside world, adapting the bit strings generated by the VCL to various network and multiplex environments in a network-friendly way. It covers all syntactical levels above the slice level and operates on NAL units, which support the packet-based approach of most existing networks. A NAL unit (NALU) is effectively a packet that contains an integer number of bytes. The first byte of each NAL unit is a header byte that indicates the type of data in the NAL unit, and the remaining bytes contain payload data of the type indicated by the header. The payload data in the NAL unit is interleaved as necessary with emulation prevention bytes, which are bytes inserted with a specific value to prevent a particular pattern of data, called a start code prefix, from being accidentally generated inside the payload (a sketch of this structure follows the list of applications below). The NALU structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NALUs generated by an encoder is referred to as a NAL unit stream. Currently, three major applications for H.264/AVC using the IP protocol as a transport may be identified:
- The download of complete, pre-coded video streams. Here, the bit string is transmitted as a whole, using reliable protocols such as FTP or HTTP. There are no restrictions in terms of delay, real-time encoding/decoding, or error resilience.
- IP-based streaming. In general, this allows video playback to start before the whole video bit stream has been transmitted, with an initial delay of only a few seconds, in a near-real-time fashion. The video stream may be either pre-recorded or a live session in which the video is compressed in real time, often at different bit rates.
- Conversational applications, such as videoconferencing and video telephony. For such applications strict delay constraints apply (significantly less than one second of end-to-end latency, with less than 150 ms as the goal), so real-time encoding and decoding processes are the main issues.
The use of H.264/AVC coded video in wireless environments has also been described. The development of efficient scalable coding schemes is motivated mainly by the possibility of adapting the encoded data to match channel/network conditions.
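The sketch below illustrates the NALU structure just described: the single header byte splits into the forbidden_zero_bit, nal_ref_idc, and nal_unit_type fields of H.264/AVC, and the decoder removes each emulation prevention byte (0x03 following two zero bytes) so the start code prefix 0x00 0x00 0x01 can never be confused with payload data. The example NALU bytes are invented for illustration.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the single H.264 NAL header byte into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,  # must be 0
        "nal_ref_idc": (first_byte >> 5) & 0x3,         # reference importance
        "nal_unit_type": first_byte & 0x1F,             # e.g. 5 = IDR slice
    }

def strip_emulation_prevention(payload: bytes) -> bytes:
    """Remove each 0x03 that follows two zero bytes (decoder side)."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 0x03:   # emulation_prevention_three_byte
            zeros = 0
            continue                   # drop it and keep scanning
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)

nalu = bytes([0x65, 0x00, 0x00, 0x03, 0x01, 0xAB])  # invented example NALU
print(parse_nal_header(nalu[0]))                    # type 5 (IDR), ref_idc 3
print(strip_emulation_prevention(nalu[1:]).hex())   # '000001ab' restored
```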
Figure 1.5: Transport environment of H.264/AVC

For video, scalable coding was included in MPEG-4, and a scalable extension of the H.264 standard has been proposed. Both coding schemes are of the traditional block-based hybrid type; that is, coding is done by first identifying and compensating for motion between successive frames and then encoding the residual image using a still-image-like coder. Such coding schemes are not ideal for scalability because of the so-called drift problem (encoder and decoder mismatch): since the encoder uses the full-rate, full-resolution encoded frames as references for its inter-frame prediction, an asymmetry arises between encoder and decoder when a lower-rate or lower-resolution version of the resource is decoded. For this reason, more inherently scalable video coding schemes have been investigated. 3D subband video coding uses, as the name suggests, a subband decomposition in both time and space prior to quantization and subsequent processing. This can easily be combined with embedded coding schemes, resulting in an embedded bit stream that is, by definition, scalable (decoding can be stopped at any point in the bit stream). This also eliminates the drift problem. For most sequences, these 3D coding schemes achieve performance comparable to (non-scalable) H.264. Furthermore, since the generated bit stream is embedded, effective unequal error protection (UEP) schemes can easily be applied. These schemes exploit the fact that the importance of each bit is (conceptually) monotonically decreasing as one moves from the start to the end of the bit stream. Different parts of the bit stream are assigned error correction codes of different strengths, depending on the relative importance of the bits and the channel/network conditions. Recently, this concept has been extended to transmission over multiple parallel channels with possibly different characteristics.
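A minimal sketch of the UEP idea follows: earlier segments of an embedded bit stream matter more, so they receive stronger error correction. The segment size and the schedule of FEC code rates are illustrative assumptions, not values from any standard.

```python
# Lower code rate = more redundancy = stronger protection. The schedule is
# an invented example; real systems pick rates from channel conditions.
def assign_protection(stream: bytes, segment_size: int = 1024) -> list[tuple[bytes, float]]:
    """Pair each segment with an FEC code rate (lower = more redundancy)."""
    code_rates = [0.5, 0.67, 0.8, 0.9]  # strongest first, weakest last
    segments = [stream[i:i + segment_size]
                for i in range(0, len(stream), segment_size)]
    protected = []
    for idx, seg in enumerate(segments):
        # Importance decreases monotonically toward the end of the embedded
        # stream, so later segments get progressively weaker (higher-rate) codes.
        rate = code_rates[min(idx, len(code_rates) - 1)]
        protected.append((seg, rate))
    return protected

embedded_stream = bytes(4096)  # placeholder for an embedded bit stream
for i, (seg, rate) in enumerate(assign_protection(embedded_stream)):
    print(f"segment {i}: {len(seg)} bytes, FEC code rate {rate}")
```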
Figure 1.6: Streaming media architecture

In addition, several related protocols support real-time traffic over the Internet:
- RTP (Real-time Transport Protocol) is used for real-time data transport; it extends UDP and sits between the application and UDP.
- RTCP (Real-time Transport Control Protocol) is used to exchange control information between sender and receiver; it works in conjunction with RTP.
- SIP (Session Initiation Protocol) provides mechanisms for establishing calls over IP.
- RTSP (Real-Time Streaming Protocol) allows the user to control playback with operations such as rewind and pause.
- RSVP (Resource Reservation Protocol) adds determinism to connectionless communication and provides QoS to some extent.
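As a concrete anchor for the list above, the following sketch packs the 12-byte fixed RTP header defined in RFC 3550. The payload type, sequence number, timestamp, and SSRC values are arbitrary examples.

```python
import struct

def build_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                     marker: bool = False) -> bytes:
    """Pack the fixed RTP header fields in network byte order."""
    version = 2                      # current RTP version
    byte0 = version << 6             # padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

# Example: a video packet using dynamic payload type 96, a common choice
# for H.264 over RTP (the exact mapping is negotiated out of band).
header = build_rtp_header(payload_type=96, seq=1,
                          timestamp=90000, ssrc=0x12345678)
print(header.hex())  # 12 bytes, beginning with 0x80 (version 2)
```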