Pronunciation Gap Analysis

    W3C First Public Working Draft

    This version:
    https://www.w3.org/TR/2019/WD-pronunciation-gap-analysis-20190905/
    Latest published version:
    https://www.w3.org/TR/pronunciation-gap-analysis/
    Latest editor's draft:
    https://w3c.github.io/pronunciation/gap-analysis
    Editors:
    (Educational Testing Service)
    (Pearson)
    (Educational Testing Service)
    (W3C)

    Abstract

    This document is the Gap Analysis Review which presents required features of Spoken Text Pronunciation and Presentation and existing standards or specifications that may support (or enable support) of those features. Gaps are defined when a required feature does not have a corresponding method by which it can be authored in HTML.

    Status of This Document

    This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

    This is a First Public Working Draft of Pronunciation Gap Analysis by the Accessible Platform Architectures Working Group. It was initially developed by the Pronunciation Task Force to present required features of Spoken Text Pronunciation and Presentation and existing standards or specifications that may support (or enable support) of those features.

    To comment, file an issue in the W3C pronunciation GitHub repository. If this is not feasible, send email to [email protected] (subscribe, archives). Comments are requested by 14 October 2019. In-progress updates to the document may be viewed in the publicly visible editors' draft.

    Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

    This document was produced by a group operating under the W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

    This document is governed by the 1 March 2019 W3C Process Document.

    1. Introduction

    This section is non-normative.

    Accurate, consistent pronunciation and presentation of content spoken by text to speech synthesis (TTS) is an essential requirement in education and other domains. Organizations such as educational publishers and assessment vendors are looking for a standards-based solution to enable authoring of spoken presentation guidance in HTML which can then be consumed by assistive technologies and other applications which utilize text to speech synthesis for rendering of content.

    W3C has developed two standards pertaining to the presentation of speech synthesis which have reached Recommendation status: the Speech Synthesis Markup Language (SSML) and the Pronunciation Lexicon Specification (PLS). Both standards are directly consumed by a speech synthesis engine supporting them. While a PLS file may be referenced from an HTML page using link rel, there is no known uptake of PLS via this method by assistive technologies. While there are technically methods that allow authors to inline SSML within HTML (using namespaces), such an approach has not been adopted, and anecdotal comments from browser and assistive technology vendors suggest it is not a viable approach.

    CSS Speech Module is a retired W3C Working Group Note that describes a mechanism by which content authors may apply a variety of speech styling and presentation properties to HTML. This approach has a variety of advantages but does not implement the full set of features required for pronunciation. Section 16 of the Note specifically references the issue of pronunciation:

    CSS does not specify how to define the pronunciation (expressed using a well-defined phonetic alphabet) of a particular piece of text within the markup document. A "phonemes" property was described in earlier drafts of this specification, but objections were raised due to breaking the principle of separation between content and presentation (the "phonemes" authored within aural CSS stylesheets would have needed to be updated each time text changed within the markup document). The "phonemes" functionality is therefore considered out-of-scope in CSS (the presentation layer) and should be addressed in the markup / content layer.

    While a portion of CSS Speech was demonstrated by Apple in 2011 on iOS with Safari and VoiceOver, it is not presently supported on any platform with any Assistive Technology, and work on the standard has itself been stopped by the CSS working group.

    Efforts to address this need have been considered by both assessment technology vendors and the publishing community. Citing the need for pronunciation and presentation controls, the IMS Global Learning Consortium added the ability to author SSML markup, specify PLS files, and reference CSS Speech properties to the Question and Test Interoperability (QTI) Accessible Portable Item Protocol (APIP). In practice, QTI/APIP-authored content is transformed into HTML for rendering in web browsers, which exposes the dilemma that there is no standardized (and supported) method for inlining SSML in HTML, nor is there support for CSS Speech. As a result, SSML is the primary authoring model, with assessment vendors implementing custom methods for adding SSML (or SSML-like) features to HTML using non-standard or data attributes, and customized read-aloud software consuming those attributes for text to speech synthesis. Given the need to deliver accurate spoken presentation, non-standard approaches often include misuse of WAI-ARIA and novel or contextually non-valid attributes (e.g., label). A particular problem occurs when custom pronunciation is applied via a misuse of the aria-label attribute: for screen reader users who also rely upon refreshable braille, a hinted pronunciation intended only for a text to speech synthesizer also appears on the braille display.

    The attribute model for adding pronunciation and presentation guidance for assistive technologies and text to speech synthesis has gained traction among vendors trying to solve this need. It should be noted that many of the required features are not well supported by a single attribute, as most follow the form of a presentation property / value pairing. Using multiple attributes to provide guidance to assistive technologies is not novel, as seen with WAI-ARIA, where multiple attributes may be applied to a single element, for example, role and aria-checked. The EPUB standard for digital publishing introduced a namespaced version of the SSML phoneme and alphabet attributes enabling content authors to provide pronunciation guidance. Uptake by the publishing community has been limited, reportedly due to the lack of support in reading systems and assistive technologies.
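    A minimal sketch of the EPUB attribute model (the word and its IPA transcription are illustrative, and the ssml prefix is assumed to be bound to the SSML namespace in the content document):

    <span ssml:ph="təˈmɑːtoʊ" ssml:alphabet="ipa">tomato</span>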

    2. Core Features for Pronunciation and Spoken Presentation

    The common spoken pronunciation requirements from the education domain serve as a primary source for these features. These requirements can be broken down into the following main functions that would support authoring and spoken presentation needs.

    2.1 Language

    When content is authored in mixed language, a mechanism is needed to allow authors to indicate both the base language of the content as well as the language of individual words and phrases. The expectation is that assistive technologies and other tools that utilize text to speech synthesis would detect and apply the language requested when presenting the text.

    Voice Family / Gender

    Content authors may elect to adjust the voice family or gender used for spoken presentation, for purposes such as providing a gender-specific voice to reflect that of the author, or for a character (or characters) in a theatrical presentation of a story. Many assistive technologies already provide user selection of voice family and gender independent of any authored intent.

    2.2 Phonetic Pronunciation of String Values

    In some cases, words may need to have their phonetic pronunciation prescribed by the content author. This may occur with uncommon words not supported by text to speech synthesizers, or in cases where a word's pronunciation varies based on context and that context may not be correctly determined.

    2.3 String Substitution

    There are cases where content that is visually presented may require replacement (substitution) with an alternate textual form to ensure correct pronunciation by text to speech synthesizers. In some cases phonetic pronunciation may be a solution to this need.

    2.4 Rate / Pitch / Volume

    While end users should have full control over spoken presentation parameters such as speaking rate, pitch, and volume (e.g., WCAG 1.4.2), content authors may elect to make adjustments of those parameters to control the spoken presentation for purposes such as a theatrical presentation of a story. Many assistive technologies already provide user control of speaking rate, pitch, and volume independent of any authored intent.

    2.5 Emphasis

    In written text, an author may find it necessary to add emphasis to an important word or phrase. HTML supports both semantic elements (e.g., em) and CSS properties which, through a variety of style options (e.g., font-weight: bold), make programmatic detection of authored emphasis difficult. While the em element has existed since HTML 2.0, there is currently no uptake by assistive technology or read-aloud tools to speak with emphasis text that is semantically tagged for emphasis.

    2.6 Say As

    While text to speech engines continue to improve in their ability to process text and provide accurate spoken rendering of acronyms and numeric values, there can be instances where uncommon terms or alphanumeric constructs pose challenges. Further, some educators may have specific requirements as to how a numeric value be spoken which may differ from a TTS engine's default rendering. For example, the Smarter Balanced Assessment Consortium has developed Read Aloud Guidelines to be followed by human readers used by students who may require a spoken presentation of an educational test, which includes specific examples of how numeric values should be read aloud.

    2.6.1 Presentation of Numeric Values

    Precise control as to how numeric values should be spoken may not always be correctly determined by text to speech engines from context.  Examples include speaking a number as individual digits, correct reading of year values, and the correct speaking of ordinal and cardinal numbers.

    2.6.2 Presentation of String Values

    Precise control may be needed over how string values are spoken, as the correct rendering may not be determined from context by text to speech synthesizers.

    2.7 Pausing

    Specific spoken presentation requirements exist in the Accessibility Guidelines from PARCC, and include requirements such as inserting pauses in the spoken presentation before and after emphasized words and mathematical terms. In practice, content authors may find it necessary to insert pauses between numeric values to limit the chance of multiple numbers being heard as a single value. One common technique to achieve pausing to date has involved inserting non-visible commas before or after a text string requiring a pause. While this may work in practice for a read-aloud TTS tool, it is problematic for screen reader users, who may, based on verbosity settings, hear the multiple commas announced, and for refreshable braille users, who will have the commas visible in braille.

    3. Gap Analysis

    Based on the features defined in the prior section, the following table presents existing speech presentation standards, HTML features, and WAI-ARIA attributes that may offer a method to achieve the requirement for HTML authors.

    Requirement              HTML   WAI-ARIA   PLS   CSS Speech   SSML
    Language                 Yes    -          -     -            Yes
    Voice Family/Gender      -      -          -     Yes          Yes
    Phonetic Pronunciation   -      -          Yes   -            Yes
    Substitution             -      Partial    -     -            Yes
    Rate/Pitch/Volume        -      -          -     Yes          Yes
    Emphasis                 Yes    -          -     Yes          Yes
    Say As                   -      -          -     -            Yes
    Pausing                  -      -          -     Yes          Yes

    The following sections present how each of the required features may or may not be met by use of existing standards. A key consideration in the analysis is whether a means exists to directly author (or annotate) HTML content to incorporate the spoken presentation and pronunciation feature.

    3.1 Language

    Allow content authors to specify the language of text contained within an element so that the TTS used for rendering will select the appropriate language for synthesis.

    HTML

    The lang attribute can be applied at the document level or to individual elements. (WCAG) (AT Supported: some)
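    For example, a French phrase within English content can be marked as:

    <p lang="en">In Paris, they pronounce it <span lang="fr">Paris</span>.</p>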

    SSML

    <speak> In Paris, they pronounce it <lang xml:lang="fr-FR">Paris</lang> </speak>

    3.2 Voice Family/Gender

    Allow content authors to specify a specific TTS voice to be used to render text. For example, in content which presents a dialog between two people, a woman and a man, the author may specify that a female voice be used for the woman's text and a male voice for the man's. Some platform TTS services may support a variety of voices, identified by name, gender, or even age.

    CSS

    The voice-family property of the CSS Speech Module allows content authors to request a voice by name, gender, and/or age (e.g., voice-family: female;). (AT Supported: none)

    SSML

    Editor's note

    To be added
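    A minimal sketch using the SSML voice element (the dialog text is illustrative):

    <speak> <voice gender="female">I never said that.</voice> <voice gender="male">You did.</voice> </speak>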

    3.3 Phonetic Pronunciation

    Allow content authors to precisely specify the phonetic pronunciation of a word or phrase.

    PLS

    Editor's note

    To be added
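    A minimal sketch of a PLS lexicon entry (the word and its IPA transcription are illustrative):

    <lexicon version="1.0" alphabet="ipa" xml:lang="en-US"
             xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
      <lexeme>
        <grapheme>tomato</grapheme>
        <phoneme>təˈmɑːtoʊ</phoneme>
      </lexeme>
    </lexicon>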

    SSML

    Editor's note

    To be added
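    A minimal sketch using the SSML phoneme element (transcription illustrative):

    <speak> <phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme> </speak>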

    3.4 Substitution

    Allow content authors to substitute a text string to be rendered by TTS instead of the actual text contained in an element.

    WAI-ARIA

    The aria-label and aria-labelledby attributes can be used by an author to supply a text string that becomes the accessible name for the element on which it is applied. This usage effectively provides a mechanism for performing text substitution that is supported by screen readers. However, it is problematic for one significant reason: for users who utilize screen readers and refreshable braille, the content that is voiced will not match the content that is sent to the refreshable braille device. This mismatch would not be acceptable for some content, particularly assessment content.

    SSML

    Editor's note

    To be added
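    A minimal sketch using the SSML sub element:

    <speak> <sub alias="World Wide Web Consortium">W3C</sub> </speak>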

    3.5 Rate/Pitch/Volume

    Allow content authors to specify characteristics, such as rate, pitch, and/or volume of the TTS rendering of the text.

    CSS

    Editor's note

    To be added
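    A minimal sketch using the CSS Speech voice-rate, voice-pitch, and voice-volume properties (the selector is illustrative):

    .narrator { voice-rate: slow; voice-pitch: low; voice-volume: soft; }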

    SSML

    Editor's note

    To be added
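    A minimal sketch using the SSML prosody element:

    <speak> <prosody rate="slow" pitch="low" volume="soft">Once upon a time</prosody> </speak>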

    3.6 Emphasis

    Allow content authors to specify that text content be spoken with emphasis, for example, louder and more slowly. This can be viewed as a simplification of the Rate/Pitch/Volume controls to reduce authoring complexity.

    HTML

    Editor's note

    To be added
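    A minimal sketch using the HTML em element:

    <p>You <em>must</em> answer every question.</p>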

    CSS

    Editor's note

    To be added
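    A minimal sketch using the CSS Speech voice-stress property:

    em { voice-stress: strong; }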

    SSML

    Editor's note

    To be added
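    A minimal sketch using the SSML emphasis element:

    <speak> You <emphasis level="strong">must</emphasis> answer every question. </speak>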

    3.7 Say As

    Editor's note

    To be added

    CSS

    Editor's note

    To be added
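    A minimal sketch using the CSS Speech speak-as property (the selector is illustrative):

    .account-number { speak-as: digits; }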

    SSML

    Editor's note

    To be added
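    A minimal sketch using the SSML say-as element (interpret-as values are synthesizer-dependent):

    <speak> Enter <say-as interpret-as="digits">123</say-as> </speak>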

    3.8 Pausing

    Editor's note

    To be added

    CSS

    Editor's note

    To be added
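    A minimal sketch using the CSS Speech pause-before and pause-after properties (the selector is illustrative):

    .math-term { pause-before: strong; pause-after: strong; }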

    SSML

    Editor's note

    To be added
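    A minimal sketch using the SSML break element to separate two numeric values:

    <speak> 12 <break time="500ms"/> 34 </speak>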

    A. Acknowledgments

    This section is non-normative.

    The following people contributed to the development of this document.

    A.1 Participants active in the Pronunciation TF at the time of publication
