DRM Protected Dynamic Adaptive HTTP Streaming

Frank Hartung

[email protected]

Sinan Kesici Ericsson GmbH

Research Multimedia Technologies, Ericsson Allee 1, 52134 Herzogenrath

[email protected]

Daniel Catrein

[email protected]

ABSTRACT

Dynamic adaptive HTTP streaming (DASH) is a new concept for video streaming using consecutive downloads of short video segments. 3GPP has developed the basic DASH standard, which has been further extended by the Open IPTV Forum (OIPF) and MPEG. In all versions available to date, only very simple content protection use cases are enabled. Extensions are needed to enable important advanced use cases like pay-per-view and license change in an ongoing video channel.

In this publication, we analyze what is missing in the current DASH standards with regard to content protection, and propose changes and extensions to DASH in order to enable the application of DRM. This includes changes to the Media Presentation Description (MPD) and to the file format. With a suitable key and license structure used together with DASH, even complex use cases like pay-per-maximum-quality are possible.

Besides the analysis of required changes to DASH for content protection, and the description of suitable key and license structures applied to DASH, we also present a proof-of-concept implementation of the proposed concepts.

Categories and Subject Descriptors H.3 [Information Storage and Retrieval]: Systems and Software - distributed systems, information networks; J.7 [Computers in Other Systems]

General Terms: Security, Standardization

Keywords: Adaptive HTTP streaming, content protection, digital rights management, encryption

1. INTRODUCTION

Digital video has become so popular that it nowadays constitutes the majority of Internet traffic. Different protocols and principles for video transport have been developed and are in use. Previously, the idea was widely accepted that streaming video, in contrast to e.g. file transfer, can cope with packet losses, and that lossless transmission should be avoided in order to keep transmission delays due to re-transmissions low. The transport mechanism that was developed in that spirit is based on RTP transport and RTSP control commands. An example of an end-to-end streaming system that builds on RTP is the 3GPP Packet-switched Streaming Standard [1], PSS, which is implemented in virtually all mobile phones sold today. A drawback of RTP-based streaming is, however, that special streaming servers are needed that support the RTP stack and RTSP-based control.

Recently, the idea of using off-the-shelf and possibly cloud-based web servers for video delivery gained popularity. This implies the use of HTTP/TCP, instead of RTP/UDP, based transport. The requirements of trick play control and adaptivity directly lead to the concept of segmentation into small video segments which are concatenated for playback. The family of methods that use these ideas is called “dynamic adaptive HTTP streaming” (DASH). DASH methods have other advantages too: they provide reliability through the use of TCP; content can be delivered through firewalls without problems; and the traffic is congestion-controlled by TCP, while the client can adapt to congestion by switching quality.

The first representatives of the DASH family were proprietary schemes proposed by Apple [7] and Microsoft [8]. Meanwhile, standards bodies have also developed standards for adaptive HTTP streaming. Interestingly, as in the case of RTP-based PSS streaming, 3GPP was again the pioneer, with its adaptive HTTP streaming specification described in [1]. The Open IPTV Forum (OIPF) adopted the 3GPP solution as baseline and extended it, most notably with support for MPEG-2 transport stream encoding [2]. Meanwhile, MPEG and IETF are also working on adaptive HTTP streaming standards, partly based on 3GPP DASH [1] as starting point.

If DASH methods are to be used for commercial high-definition video content, they must satisfy the relevant technical and commercial requirements. A major requirement of video content providers and the media industry is support for content protection, that is, for encryption and key management, which together are commonly called Digital Rights Management (DRM). In this paper, we investigate concepts and needed extensions for the application of DRM protection to DASH.

This paper is organized as follows: first, we briefly review the concepts of DASH and DRM systems. Then, we identify and discuss what is missing in the standards for connecting DRM and DASH. We explain how these gaps can and should be filled to get a functional end-to-end system for DRM protected DASH. Finally, we present a proof-of-concept implementation that we developed to demonstrate the feasibility of protected DASH. We used Marlin DRM, but this should be regarded as an example; the use of other DRMs is equally possible.

In the following, when talking about DASH, we refer to 3GPP DASH [1], unless stated otherwise.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MMSys’11, February 23–25, 2011, San Jose, California, USA. Copyright 2011 ACM 978-1-4503-0517-4/11/02...$10.00.

2. BACKGROUND

2.1 Dynamic Adaptive HTTP Streaming (DASH)

The DASH standard is composed of two main parts. One part defines the Media Presentation Description (MPD) that is used by the server to describe, and by the client to access, content using HTTP requests. The other part defines the format of the media segments as extensions to the 3GPP File Format (3GP).

The purpose of the MPD is to give the client the location and timing information it needs to fetch and play back the media segments of a particular content. The MPD syntax is defined in XML. Typically the MPD file is fetched using HTTP at the start of the streaming session.

Figure 1 Media Presentation Description (MPD) layout

The MPD consists of three major components, namely Periods, Representations and Segments (Fig. 1). Period elements are the outermost part of the MPD. Periods are typically larger pieces of media that are played out sequentially. Inside a period, multiple different encodings of the content may occur, called representations. These alternative representations can have, for example, different bitrates, frame rates or video resolutions. Finally, each representation describes a series of segments by HTTP URLs. Those URLs are either explicitly described in the representation (similar to a playlist) or described through a template construction, which allows the client to derive a valid URL for each segment of a representation. The MPD format is flexible and can support other media container formats such as MPEG-2 TS.
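The relation between representations, segment URL templates and adaptation can be illustrated with a minimal Python sketch. The class, the `{rep_id}`/`{index}` placeholder syntax, the example URLs and the throughput-based selection rule are all assumptions made for illustration; they are not the template syntax or the adaptation logic defined in [1].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Representation:
    """One encoding of the content inside a period (hypothetical model)."""
    rep_id: str
    bandwidth: int        # bits per second
    url_template: str     # hypothetical placeholders: {rep_id}, {index}
    segment_count: int

    def segment_url(self, index: int) -> str:
        """Derive the URL of the media segment with the given 1-based index."""
        if not 1 <= index <= self.segment_count:
            raise IndexError(f"segment index {index} out of range")
        return self.url_template.format(rep_id=self.rep_id, index=index)


def pick_representation(reps: List[Representation], available_bps: int) -> Representation:
    """Pick the highest-bandwidth representation that fits the throughput.

    Falls back to the lowest-bandwidth representation if none fits.
    """
    fitting = [r for r in reps if r.bandwidth <= available_bps]
    if fitting:
        return max(fitting, key=lambda r: r.bandwidth)
    return min(reps, key=lambda r: r.bandwidth)


# Hypothetical period with three alternative representations.
TEMPLATE = "http://example.com/ch1/{rep_id}/seg{index}.3gs"
reps = [
    Representation("low", 500_000, TEMPLATE, 10),
    Representation("mid", 1_500_000, TEMPLATE, 10),
    Representation("high", 4_000_000, TEMPLATE, 10),
]

chosen = pick_representation(reps, available_bps=2_000_000)
print(chosen.segment_url(3))
```

Because every client derives the same URL for the same segment, switching quality only changes which URL is requested next.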

The 3GP file format is based on the ISO base media file format. A 3GP segment is either an initialization segment or a media segment. An initialization segment contains configuration data (formatted as so-called ‘ftyp’ and ‘moov’ boxes of the file format), whereas a media segment is a concatenation of one or more movie fragments of media pointers and samples (‘moof’ and ‘mdat’ boxes). Concatenation of the initialization segment and one or more media segments of the same representation results in a valid 3GP file (Fig. 2).
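The box structure described above can be made concrete with a short Python sketch that walks the top-level boxes of a byte string. It relies only on the generic box header of the ISO base media file format (a 32-bit big-endian size followed by a four-character type, with an optional 64-bit size); the `make_box` helper and the toy payloads are illustrative assumptions and do not produce a playable file.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (box_type, payload) for each top-level box in `data`.

    Handles the 32-bit size form, the 64-bit 'largesize' form (size == 1),
    and size == 0, which means the box extends to the end of the data.
    """
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            if offset + 16 > len(data):
                raise ValueError("truncated largesize field")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("truncated or malformed box")
        yield box_type.decode("ascii"), data[offset + header:offset + size]
        offset += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size header (illustrative helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# Toy segments: the payloads are placeholders, not valid box contents.
init_segment = make_box("ftyp", b"3gp6") + make_box("moov")
media_segment = make_box("moof") + make_box("mdat", b"\x00" * 16)

# Concatenating the initialization segment and a media segment yields one
# box sequence, mirroring the concatenation property described above.
types = [t for t, _ in iter_boxes(init_segment + media_segment)]
print(types)
```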

The 3GP file format was extended for the specific HTTP streaming requirements.

The optional Segment Index box (‘sidx’) helps a client to seek and switch in large or overlapping media segments by locating random access points and parts of a media segment suitable for partial download. It also provides absolute timing information for time recovery after seeking. Another extension is the Segment Type box (‘styp’) which includes brand (i.e., type) information for media segments and enables compatible usage between standards, for segments that comply with multiple DASH standards.

Figure 2 3GP based HTTP streaming segments

Media segments are identical for all users; adaptivity is obtained simply by switching between segments of alternative representations. This property makes DASH friendly to HTTP caches and Content Delivery Networks (CDNs). The media segments, uniquely identified by their URLs, can be served from intermediate HTTP proxies and caches in the same way as any other web content.

The Open IPTV Forum has extended DASH to also be usable with MPEG-2 Transport Streams (MPEG-2 TS). The MPD indicates through MIME types that the format of the media segments is MPEG-2 TS. Only restrictions, but no extensions, on the MPEG-2 TS format have been defined in OIPF. This makes it possible to create a compliant MPEG-2 TS stream by concatenating the media segments fetched by the client. The Program Specific Information (PSI) tables may be contained either in an initialization segment or in the media segments.

2.2 Digital Rights Management (DRM)

Digital Rights Management denotes technologies that are intended to prevent unauthorized use and duplication of digital media. Conceptually, this is achieved by specifying and expressing usage permissions pertaining to the data, and by ensuring that the data is only rendered in accordance with those permissions. Technically, this requires encryption of the data. The key used for decryption, called content key, is itself encrypted and bundled with the permissions into a “license” or “rights object” (Fig. 3).

Figure 3 Basic DRM principle: license and encrypted content together give content access

Encrypted data and the corresponding licenses are typically associated with each other using unique content identifiers. The hardest problem of DRM is ensuring that only the intended single receiver can access and use the license, and thus the content key, and thus the content. This is typically achieved by encrypting the license with a key that is only known to the sender and the intended receiver. This can conveniently be done by deploying a public-key infrastructure and a surrounding trust ecosystem. It is thereby ensured that only the authorized and trusted device can decrypt the license, obtain the content key, and access the content. As far as is publicly known, most of the widely used DRM systems today follow the described principles, for example Open Mobile Alliance (OMA) DRM [19], Marlin DRM [5][6], or Microsoft PlayReady DRM. For a more in-depth introduction to DRM please refer to [20].

3. DRM PROTECTED DASH

In the previous section, we briefly reviewed plain DASH, without protection. Now we discuss extensions for the support of DRM protection.

3.1 Requirements

First, we need to be clear about what we want to achieve. The main goal is to enable encryption of DASH video and to signal the required information for license and key acquisition to the receiver. This shall work with any DRM system, e.g. OMA DRM 2.1 [19] or Marlin Broadband (Marlin BB) [5][6], and not be DRM system specific. It shall also be possible to use different DRM systems for key management in parallel. It shall be possible to use MP4 file based or MPEG-2 TS based DASH.

For television or IPTV use cases, we adopt the concept of video channels, as known from classical TV channels. In conjunction with DASH, each channel is available in multiple representations. It is desirable to be optionally able to bundle video channels into a channel group (often also called channel bouquet), that can then be purchased as an entity and be accessed with one common license. Also, it shall be optionally possible to have pay-per-view (PPV) parts within a channel, in other words be able to use a different key for certain parts of a video channel.

It shall further be possible to grant access to individual representations, i.e., quality levels. This is called pay-per-quality. An additional optional requirement is that access to a certain quality/bit-rate, i.e. representation, of a video also enables access to lower representations, but not to higher representations. This allows for different subscriptions to video services at different (maximum) quality representations, in other words, pay-per-maximum-quality.

3.2 Gap analysis: what is missing in the current standards

Both 3GPP and OIPF have already included some provisions for content protection or DRM, but only in a rudimentary way.

In 3GPP DASH [1], some fields in the MPD are defined that can carry basic content protection information. This is restricted to the specification of one or more content protection systems in the form of a schemeIdUri, per representation. The choice of DRM system is left open; however, if OMA DRM is used, the file format must be the OMA non-streamable Packetized DRM Content Format (PDCF), which is a special version of the ISO file format with additional OMA DRM specific boxes for OMA DRM parameters and metadata. In all other cases, the file format used in [1] is the 3GP file format [17]. Two things are missing: no DRM specific content identifiers, which are typically used to associate content and license, are included, and no segment-specific DRM information can be conveyed.

In OIPF DASH [2], it is clarified that the schemeIdUri shall carry the same type of DRM system identifier used elsewhere in the OIPF specification. For MPEG-2 TS, [2] supports the Marlin Broadband Transport Stream (BBTS) format [16], which is an extension of the original MPEG-2 TS specification [18] with some added crypto metadata. It is required that a file that concatenates the initialization segment and a set of media segments is a BBTS compliant file. This may be achieved by using the same crypto-period boundaries and keys across different representations.

The TS carries DRM related metadata. General metadata, e.g. indicating the protection system used and the crypto parameters, is included in the Program Map Table (PMT), which contains conditional access (CA) descriptors, and in the Conditional Access Table (CAT). Key streams are multiplexed with the media elementary streams: so-called Entitlement Management Message (EMM) streams contain long-lived keys, and so-called Entitlement Control Message (ECM) streams contain short-lived keys. The DRM metadata relating to a certain elementary stream is delivered as part of either the initialization segment or the media segments that carry the samples of the elementary stream. The ECM stream of a protection system relating to a certain elementary stream has the same packet identifier (PID) in all segments in which it is included.

For MP4, [2] supports two protected file formats: OMArlin PDCF [14], which is an extension of OMA DRM PDCF [19], and the so-called Marlin MIPMP format [15]. It is required that a file that concatenates the initialization segment and arbitrary media segments of any complete representation, or of the set of partial representations, is either a PDCF compliant or a MIPMP compliant file. Additional DRM metadata, besides the metadata in the MPD, can be stored in the file, specifically in the initialization segment.

Thus, it is possible with the existing standard to encrypt DASH segments; for example by using the encrypted OMA PDCF flavor of the ISO file format, or by using the encrypted Marlin BBTS flavor of the MPEG-2 TS format. However, in both cases, since adequate signaling in the MPD is missing, the client will only know after downloading and parsing of the segments whether it has a suitable license and is able to decrypt them.

In order to enable all requirements outlined above, some components need to be added.

First of all, more sophisticated MPD signaling is required. A content identifier, DRMContentID, and a URL to the license server are needed in the MPD in order to allow checking for availability of licenses, and acquisition of missing licenses, even before the initialization segment is downloaded. Further, the granularity of these content identifiers should be per segment, not per representation. Another missing attribute is a pay-per-view (PPV) indicator, which is not technically mandatory but would allow signaling of PPV content to the end user.

Secondly, the key and license hierarchy should enable subscription to channel groups and PPV services, and quality-dependent access to media. However, this can be regarded to be “on top of” DASH, as it superimposes a license structure, but uses DASH as it is.

In order to enable PPV and fine-grained access, and enhance security in general, we introduce (possibly) separate keys per segment. This is similar to short-term keys as known in TV encryption.

We describe these extensions in more detail below.

3.3 Proposed Extensions to DASH for Content Protection

In order to fulfill the requirements and use cases outlined above, we propose the following additions to the DASH standard, and the following way of structuring keys and licenses. Please note that both parts are independent of each other.

3.3.1 MPD extensions

We propose to place a child element ContentProtection with additional attributes into each MPD, on representation level or alternatively on segment level. The first additional attribute is the rights issuer URL, RIURL, which gives an absolute URL to the license server or rights issuer of the content. This URL can be used to request a license from the license server, using the protocol defined for the respective DRM system (please note that the DRM system is also signaled in the MPD). The ContentProtection element further includes a DRMContentID attribute to allow license matching before data streaming. A PPV attribute is also placed into ContentProtection to indicate whether the user needs special per-view licenses for a program. The necessity of the PPV attribute will be explained below in section 3.3.2. Further, an extension mechanism holding DRM system specific metadata should be added. In our case, this was not necessary, so we did not implement such an extension.
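As an illustration of how a client could use this signaling, the following Python sketch attaches a ContentProtection element to each representation and then lists the licenses that are still missing before any segment is downloaded. The attribute names RIURL, DRMContentID and PPV follow the text above; the exact XML serialization, the boolean encoding of PPV, the license server URL and the content identifiers are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

# Marlin system identifier as used in the proof-of-concept in section 4.
MARLIN_SCHEME = "urn:dvb:casystemid:19188"


def add_content_protection(parent: ET.Element, scheme_id_uri: str, ri_url: str,
                           drm_content_id: str, ppv: bool = False) -> ET.Element:
    """Attach a ContentProtection child carrying the proposed attributes."""
    cp = ET.SubElement(parent, "ContentProtection")
    cp.set("schemeIdUri", scheme_id_uri)
    cp.set("RIURL", ri_url)                    # absolute URL of the rights issuer
    cp.set("DRMContentID", drm_content_id)     # used for license matching
    cp.set("PPV", "true" if ppv else "false")  # assumed boolean encoding
    return cp


def licenses_to_acquire(mpd_root: ET.Element,
                        owned_content_ids: Set[str]) -> List[Tuple[str, str]]:
    """Return (RIURL, DRMContentID) pairs for which no license is held yet."""
    missing: List[Tuple[str, str]] = []
    for cp in mpd_root.iter("ContentProtection"):
        pair = (cp.get("RIURL"), cp.get("DRMContentID"))
        if pair[1] not in owned_content_ids and pair not in missing:
            missing.append(pair)
    return missing


# Hypothetical MPD with two representations of one channel.
mpd = ET.Element("MPD")
period = ET.SubElement(mpd, "Period")
for rep_id in ("low", "high"):
    rep = ET.SubElement(period, "Representation", {"id": rep_id})
    add_content_protection(rep, MARLIN_SCHEME,
                           "https://license.example.com/ri",
                           f"urn:example:content:ch1-{rep_id}")

# The client already holds a license for the low-quality representation.
missing = licenses_to_acquire(mpd, {"urn:example:content:ch1-low"})
print(missing)
```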

The proposed MPD extensions allow early license acquisition, and thus help avoid delays when starting the DASH streaming session. The merit of the PPV attribute is mainly enhanced usability, as pay-per-view media can be signaled to the user.

3.3.2 Architecture of Key Hierarchy

For TV delivery, the use of a key hierarchy is an old and known concept. A long-lived key (with a lifetime of e.g. a day) is used to encrypt short-lived keys (with a lifetime of e.g. 10 seconds) which are used to encrypt the data. Initially it seems redundant to encrypt an encryption key; the idea behind it is however a) to change the content key frequently to make sharing it difficult, b) to prevent direct access to the broadcast content key, and c) to be able to grant individual users access to the broadcast content key by issuing targeted individual keys.

In that spirit, ITU Recommendation 810 [12] proposed in 1992 a three-level key hierarchy. The keys are called Control Word (CW), Authorization Key (AK) and Distribution Key (DK) respectively, from upper level to lower level. The CW is used as short-term key to scramble the content, the AK is used to encrypt CWs, and the DK is used to transmit the AK in a secure way. The DK is common to all users. The scheme is however not used anymore, because it is vulnerable to piracy: once the DK is compromised, the service provider has to change all security modules.

Lee et al. [10] in 1996, Tu et al. [9] in 1999 and Huang et al. [11] in 2004 proposed advanced four-level key hierarchies which solved the above problem. The group-oriented key distribution scheme of Huang et al. builds different channel groups. Each channel group is a subgroup of another channel group, so that a user who registers for any channel group can access its subgroups as well. We used this concept as a basis for our proposed key and license system for DASH.

In this section, we propose an efficient key distribution scheme based on a three-level key hierarchy which makes the use cases mentioned in section 3.1 possible for DASH and can be combined with any DRM system. We call the keys in the respective layers Segment Key (SK), Representation Key (RK) and Channel Group Key (CGK). The key of each level is used to encrypt/decrypt the keys of the level below it: the CGK protects the RK, and the RK protects the SK.

Figure 4 shows an example of a channel group. The channel group is composed of multiple channels with three quality representations. Each representation has one initial segment and multiple media segments. SK is the short term key and is used to encrypt/decrypt segments. To increase the security, each segment is encrypted with a different SK.

RK is used to encrypt/decrypt SK. Each representation of any channel has a unique RK.

Figure 4 Example channel group

The service provider encrypts the SK using the RK and then transmits the encrypted SK within the segments as:

$\{SK_{i,j,k,l}\}_{RK_{i,j,k}} \longrightarrow \mathrm{Segment}_{i,j,k,l}$

where i is the channel group index, j is the channel index, k is the representation index and l is the segment index. That means the SK of the l-th segment in the k-th representation of channel j of channel group i is encrypted with the RK of this representation and embedded in the l-th segment.

The RKs are updated per program because the service provider may offer PPV services where the RKs are delivered to the users within a license.

Each channel group has different CGKs per representation in order to encrypt/decrypt the RK of the corresponding representation. The encrypted RK is transmitted within the initial segment of the same representation as:

$\{RK_{i,j,k}\}_{CGK_{i,k}} \longrightarrow \mathrm{InitialSegment}_{i,j,k}$

and the CGK is distributed to the users in a DRM license which needs to be acquired according to the license acquisition process and protocol for the used DRM system. As an example, CGKs are updated once a month, for monthly subscription. The CGK of the highest representation of any channel group is created randomly and the successive CGKs are derived from the previous one with a one-way hash function as:

$CGK_{i,k} = H(CGK_{i,k-1})$

where H() is a one-way hash function based on any cryptographically secure hash algorithm. With this scheme, users can subscribe to any representation of a channel group by purchasing the CGK corresponding to that representation, and can also access lower representations by deriving their CGKs with the one-way hash function H(). However, they do not have access to higher qualities than purchased, because H() cannot be inverted. The use of the one-way hash function thus enables the pay-per-maximum-quality use case. In the case of subscription to a group of channels, the service provider has to make sure that channel groups are disjoint.
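The hash chain can be sketched in a few lines of Python. The text leaves the choice of H() open; SHA-256 is assumed here because its 256-bit output matches the key size used in section 3.4. Index 0 denotes the highest quality, and the function names are illustrative.

```python
import hashlib
import os
from typing import Dict, List


def derive_cgk_chain(top_key: bytes, num_representations: int) -> List[bytes]:
    """Return [CGK_0, CGK_1, ...] with CGK_k = H(CGK_{k-1}).

    CGK_0 belongs to the highest quality; each following key belongs to
    the next lower quality.
    """
    chain = [top_key]
    for _ in range(num_representations - 1):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain


def accessible_keys(purchased_key: bytes, purchased_index: int,
                    num_representations: int) -> Dict[int, bytes]:
    """Keys a subscriber can compute: the purchased level and all lower ones."""
    keys = {purchased_index: purchased_key}
    key = purchased_key
    for k in range(purchased_index + 1, num_representations):
        key = hashlib.sha256(key).digest()
        keys[k] = key
    return keys


# Service provider side: random key for the highest quality, three levels.
chain = derive_cgk_chain(os.urandom(32), 3)

# A subscriber to the middle quality (index 1) receives only chain[1].
subscriber_keys = accessible_keys(chain[1], 1, 3)
print(sorted(subscriber_keys))
```

The subscriber can reproduce the key of the lowest quality, but obtaining the key of the highest quality would require inverting the hash function.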

In the PPV case, the MPD should inform the user that the program is PPV and give the necessary information for temporary license acquisition (the user may have to interactively order the license and confirm payment). The RKs are created in the same way as the CGK by using a one-way hash function H() in order to let the users only access the purchased and lower representations, i.e., qualities. The described key architecture allows pay-per-view and key change, as well as pay-per-quality and pay-per-maximum-quality use cases. Further, it allows bundling of channels into channel groups that can be accessed through a common license.

3.4 Encryption

In the proposed key distribution scheme, encryption is applied in three different steps: encryption of segments (i.e., content), encryption of SKs and encryption of RKs. The encryption algorithm deployed is the Advanced Encryption Standard (AES) with 128-bit block size and 256-bit key size in cipher-block chaining (CBC) mode. AES is the most widely used encryption algorithm for DRM systems, but other encryption algorithms can be applied as well. Selective encryption is not used in our scheme.

In general, switching between quality representations is done at segment boundaries. Therefore, encryption of a segment as a whole is not a problem in terms of quality representation switching. Anticipating that switching between quality representations within segments may be supported in the future, segments are encrypted sample by sample in our scheme.

Encrypted SKs and RKs are delivered to the users within a special box called ‘imif’ of media and initial segments, respectively. This box is a sub-box of ‘sinf’, located inside the ‘ipro’ box, which is a sub-box of the ‘meta’ box. This means four extra boxes are created for delivering encrypted keys. More information about MPEG-4 based file boxes can be found in [4].

3.4.1 Effect of Encryption on Data Size

The encryption algorithm increases the size of the initialization segment and the media segments because of padding (insertion of dummy values to fill up the ciphertext to block boundaries) and inclusion of extra DRM/crypto related boxes.

If the program is not PPV, the initial segment of each representation of the program contains an encrypted RK. The four additional empty boxes increase the size by 256 bits (64 bits per box). After encrypting with the CGK, the size of the RK increases from 256 bits to 384 bits because of the Initialization Vector (IV) that is placed in front of the encrypted RK. As a result, the size of the initial segment expands by 640 bits. If the program is PPV, the RK is delivered to the users within a DRM license, hence out-of-band with respect to the media data. Thus, the size of the initial segment does not change.

Embedding the encrypted SK into a segment increases the size of the segment by 640 bits, as in the case of the initial segment. Each segment is encrypted sample by sample with the same SK, but with different initialization vectors (IVs). In our system, the IV of the first sample is created randomly and the other IVs are derived from the first IV. Therefore it is sufficient to signal just the first IV. This results in another 128 bits of expansion in a media segment. In total, there is a data expansion of 768 bits in a media segment. In addition, sample-by-sample encryption requires padding for each sample unless the sample size is a multiple of 128 bits. This results in a random data expansion between 0 and 127 bits per sample, i.e. around 64 bits on average.
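The overhead figures above can be reproduced with a short Python calculation. The constants are taken directly from the text; the function names and the example sample size are illustrative, and the padding model (no padding when the sample is already block-aligned) follows the description in the text rather than any particular padding standard.

```python
BOX_HEADER_BITS = 64   # per empty box, as stated in the text
EXTRA_BOXES = 4        # 'meta', 'ipro', 'sinf', 'imif'
KEY_BITS = 256         # AES-256 key (RK or SK), two full cipher blocks
IV_BITS = 128          # one AES block placed in front of the encrypted key
BLOCK_BITS = 128       # AES block size


def key_box_overhead_bits() -> int:
    """Overhead of embedding one encrypted key (boxes + IV + key)."""
    return EXTRA_BOXES * BOX_HEADER_BITS + IV_BITS + KEY_BITS


def media_segment_overhead_bits() -> int:
    """Fixed overhead per media segment: encrypted SK plus first sample IV."""
    return key_box_overhead_bits() + IV_BITS


def padding_bits(sample_bits: int) -> int:
    """Padding needed to fill a sample up to the next block boundary."""
    return (-sample_bits) % BLOCK_BITS


print(key_box_overhead_bits())        # initial segment, non-PPV case
print(media_segment_overhead_bits())  # media segment, before padding
print(padding_bits(1000))             # padding for a 1000-bit sample
```

For an initial segment this gives 640 bits, and for a media segment 768 bits plus the per-sample padding.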

3.4.2 Effect of Encryption on Segmentation Time

In order to get an impression of the effect of encryption on the segmentation time, we segmented, with and without encryption, an example video with three representations. Each representation of the content is divided into 10 segments with equal duration of 12 seconds. The segmentation times are given in Figure 5.

Figure 5 Server Segmentation Time

Figure 5 shows the segmentation time of 30 segments on the content creation server, with and without encryption. The red circles indicate the segmentation time with encryption, while the blue stars represent the segmentation time of the same segments without encryption. As can be seen from the figure, the delay caused by encryption is comparable to the segmentation time itself, but small compared to the segment duration of 12 seconds. This indicates that encryption will not cause problems even in the live streaming case; encryption time is not a significant delay factor.

4. PROOF-OF-CONCEPT: MARLIN BB PROTECTED DASH We chose Marlin Broadband (BB) as the DRM system for a proof-of-concept implementation that incorporates the concepts outlined above. Marlin BB has been developed by the Marlin Developer Community (MDC). MDC is an industry forum, with members mainly from the consumer electronics industry, that has developed a state-of-the-art family of DRM systems. Marlin BB is conceptually similar to e.g. OMA DRM or proprietary DRMs like Windows Media DRM.

The DRMSystemID element in the MPD is set to “urn:dvb:casystemid:19188”, using the value for Marlin DRM defined in the DVB forum.

Our proof-of-concept system is mainly composed of six components: Key Database, Content Rendering, HTTP Server, Marlin BB Server, HTTP Client and Marlin Client. Figure 6 shows the interaction between the different components.

The content rendering component is responsible for off-line media segmentation and encryption using the keys in the key database. It produces the MPD, the unencrypted initialization segment per representation and encrypted media segments, which are all stored at the HTTP Server.

Figure 6 Proof-of-Concept Architecture

The HTTP client first downloads and interprets the MPD. Using the protection information in the MPD, the Marlin client checks for the availability of a suitable license for the content. If necessary, the Marlin client requests a license from the Marlin BB server. The Marlin BB server checks the request and returns the license object, which includes the key, the DRMContentID, the license expiration date and possibly other information. If the program is PPV, the requested license is a temporary license; otherwise, the requested license is a channel group license. Subsequently, the HTTP client requests media segments from the HTTP Server via the segment addresses in the MPD. As the license, and thus the content key, is available at the Marlin client, the Marlin client delivers the key to the HTTP client, which decrypts and renders the received media segments.
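
The license check and acquisition step can be sketched as follows. This is an in-memory illustration only: the classes `License`, `LicenseServer` and `DrmClient`, the one-hour validity period, the placeholder key, and the content ID are all hypothetical, and the actual Marlin protocol exchanges are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict

@dataclass
class License:
    content_id: str    # DRMContentID
    key: bytes
    expires: datetime  # license expiration date
    kind: str          # "temporary" (PPV) or "channel_group"

class LicenseServer:
    """Stand-in for the license server; issues a license per request."""
    def __init__(self) -> None:
        self.requests = 0

    def request(self, content_id: str, is_ppv: bool, now: datetime) -> License:
        self.requests += 1
        kind = "temporary" if is_ppv else "channel_group"
        # Placeholder key and an assumed one-hour validity period.
        return License(content_id, bytes(32), now + timedelta(hours=1), kind)

class DrmClient:
    """Stand-in for the DRM client; reuses valid licenses, else requests one."""
    def __init__(self, server: LicenseServer) -> None:
        self.server = server
        self.licenses: Dict[str, License] = {}

    def key_for(self, content_id: str, is_ppv: bool, now: datetime) -> bytes:
        lic = self.licenses.get(content_id)
        if lic is None or lic.expires <= now:
            lic = self.server.request(content_id, is_ppv, now)
            self.licenses[content_id] = lic
        return lic.key

start = datetime(2011, 1, 1, 20, 0)
server = LicenseServer()
client = DrmClient(server)
client.key_for("program-42", True, start)
client.key_for("program-42", True, start + timedelta(minutes=10))
print(server.requests)  # 1: the cached license is still valid
client.key_for("program-42", True, start + timedelta(hours=2))
print(server.requests)  # 2: the license expired and was requested again
```

Caching the license at the DRM client means that the license server is contacted only when no valid license is available, not for every segment download.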

Marlin Client and Marlin Server both perform the Marlin protocols as specified in [6], namely Marlin registration, Node acquisition, Link acquisition, Marlin License acquisition, and, if necessary, Marlin de-registration.

Figure 7 Example Electronic Program Guide Detail

The proposed scheme has been implemented as part of a DASH client and server test system. In an example service, two channel groups are offered to the user. Each channel group has multiple channels with three quality levels/representations: HD TV, Standard TV and Mobile TV, each of which is composed of multiple segments. Figure 7 is a snapshot of the user interface, in TV terms the Electronic Program Guide (EPG), that is offered to the user.

Channel group I has three channels (channel 1, channel 2, channel 3). In an example case, the user has registered to the standard TV level of channel group I. Thus, the user can access standard TV and mobile TV levels of all channels in channel group I as shown by the green “OK” signs in Fig. 7. Channel 3 is an exception because the program running on this channel at this time is pay-per-view, and thus not included in the channel group license. In this case, the EPG guides the user to buy an additional temporary PPV license.

The license that the user bought for channel group I is not valid for channel group II. Thus, if the user wants to access the only channel of group II, the user has to buy another license, which in the example case is the HD license. Using that license, the user can access all three representations of channel group II, as shown in Figure 7.
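
The access rules of this example service can be written down as a small decision function. This is a sketch of the example only: the names `GroupLicense` and `can_access` are hypothetical, and treating a PPV license as valid for every quality level is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

LEVELS = ("Mobile TV", "Standard TV", "HD TV")  # ascending quality

@dataclass(frozen=True)
class GroupLicense:
    channel_group: str
    max_level: str  # highest quality level covered by the license

def can_access(group_licenses: Iterable[GroupLicense],
               ppv_licenses: Set[str],
               channel_group: str,
               level: str,
               ppv_program: Optional[str] = None) -> bool:
    """ppv_program is the ID of the running program if it is PPV, else None."""
    if ppv_program is not None:
        # PPV programs are not covered by channel group licenses.
        return ppv_program in ppv_licenses
    rank = LEVELS.index(level)
    return any(lic.channel_group == channel_group
               and LEVELS.index(lic.max_level) >= rank
               for lic in group_licenses)

owned = [GroupLicense("I", "Standard TV")]
print(can_access(owned, set(), "I", "Mobile TV"))    # True
print(can_access(owned, set(), "I", "HD TV"))        # False
print(can_access(owned, set(), "I", "Standard TV",
                 ppv_program="movie-1"))             # False, PPV needs its own license
print(can_access(owned, set(), "II", "Mobile TV"))   # False, other channel group

owned.append(GroupLicense("II", "HD TV"))
print(all(can_access(owned, set(), "II", lvl) for lvl in LEVELS))  # True
```

Ordering the quality levels lets a single license field express pay-per-maximum-quality: a license for a given level implicitly covers all lower levels of the same channel group.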

5. SUMMARY AND CONCLUSIONS Dynamic adaptive HTTP streaming (DASH) is a new concept for video streaming using consecutive downloads of short video segments. 3GPP has developed the basic DASH standard which is further extended by OIPF and MPEG. In all versions available to date, content protection is not properly enabled. Extensions are needed to enable important use cases like pay-per-view, license change in an ongoing video channel, and pay-per-maximum-quality.

In this publication, we have examined which extensions are needed in order to use DASH for DRM protected content. These comprise changes in the DASH standard, namely MPD metadata extensions, changes in the transport file formats used, namely the inclusion of an ISO file format box carrying a segment key, and finally a suitable key and license structure applied to the underlying DASH concept. All those changes have been proposed and explained in the paper. With these changes and additions, which do not change the core idea of DASH, even more complex use cases like pay-per-view and pay-per-maximum-quality are possible.

As a proof-of-concept, we have implemented the proposed changes and integrated them with a real DRM key and license management system. For the proof-of-concept, we have used Marlin DRM, but any other similar DRM would be equally usable.

6. REFERENCES
[1] 3GPP TS 26.234: Transparent end to end packet switched Streaming Service (PSS), Protocols and codecs, v9.4.0
[2] Open IPTV Forum. HTTP Adaptive Streaming. Technical report, V2.0
[3] Open IPTV Forum, Authentication, Content Protection and Service Protection. V2.0
[4] ISO/IEC International standard 14496, Information technology – Coding of audio-visual objects, Part 12: ISO base media file format
[5] Marlin Developer Community. Marlin Architecture Overview
[6] Marlin Developer Community. Marlin Broadband Architecture Overview for Marlin Adopters
[7] Apple. HTTP Streaming Overview. Technical report
[8] Microsoft Corporation. IIS Smooth Streaming Technical Overview. Technical report
[9] F. K. Tu, C. S. Laih, and S. H. Toung, “On key distribution management for conditional access system on pay-TV system,” IEEE Trans. Consumer Electron., vol. 45, no. 1, pp. 151–158, Feb 1999
[10] J. W. Lee, “Key distribution and management for conditional access system on DBS” in Proc. Int. Conf. Cryptology and Information Security, pp. 82–86, 1996
[11] Y. L. Huang and S. Shieh, “Efficient key distribution schemes for secure media delivery in pay-TV systems,” IEEE Trans. Multimedia, vol. 6, no. 5, pp. 760–769, October 2004
[12] Conditional-Access Broadcasting systems, ITU-R Recommendation 810, 1992
[13] Open IPTV Forum, “Release 2 specification, Volume 2 – Media Formats”, V2.0
[14] Marlin Developer Community, OMArlin Specification, Version 1.0.3
[15] Marlin Developer Community, Marlin – File Formats Specification, Version 1.1.2
[16] Marlin Developer Community, Marlin Broadband Transport Stream Specification. Version 1.0.2
[17] 3GPP TS 26.244, Transparent end-to-end packet switched streaming service (PSS); 3GPP file format (3GP)
[18] Information technology – Generic coding of moving pictures and associated audio information: Systems, ISO/IEC 13818-1:2000(E)
[19] Open Mobile Alliance, OMA Digital Rights Management V2.2
[20] E. Becker, W. Buhse, D. Günnewig, N. Rump (Eds.), “Digital Rights Management - Technological, Economic, Legal and Political Aspects”, Springer, 2nd edition, 2004
