
Aural CSS: Support for CSS 2 Aural Style Sheets / CSS 3 Speech Module

Last updated: May 12, 2010 (created on November 21, 2006)

Questions about support for aural CSS (Cascading Style Sheets) have been popping up in various corners of the Web lately, so I thought I would compile what I know as a supplementary page to my Screen Readers and Abbreviations tests.

If you find this information to be incomplete or inaccurate, please let me know so that I can update this page.

Note: Some new sources of information will be added to this page pending review. In the meantime, you may like to follow the links I have included at the bottom of the page and read for yourself.

  1. Introduction
  2. Summary of Known Support
  3. Details of Aural CSS Properties
  4. CSS 2 (superseded)
  5. CSS 2.1 (W3C Candidate Recommendation, 08 September 2009)
  6. CSS 3 (W3C Working Draft, 16 December 2004)
  7. Related Links
  8. New Sources Pending Addition

Introduction

CSS includes 'aural' (or 'speech') properties that give web designers and developers control over the way in which HTML (and XML) is synthesised as speech by CSS-aware software. However, these properties enjoy very limited support in current web browsers, screen readers and other assistive technology software where they could be of benefit.
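
As a rough illustration, a handful of CSS 2 / CSS 2.1 aural rules might look like the following. The selectors are arbitrary and the voice name paul is only an example (a specific voice, with the generic male voice as a fallback). Visual browsers simply ignore these properties:

```css
/* Illustrative CSS 2 / CSS 2.1 aural rules. Only a CSS-aware
   speech user agent would act on these properties. */
h1, h2, h3 {
  voice-family: paul, male; /* a specific voice, then a generic fallback */
  stress: 20;               /* less pronounced stress than the initial value of 50 */
  richness: 90;             /* a richer voice that "carries" further (initial 50) */
  pause-after: 500ms;       /* half a second of silence after each heading */
}

em {
  pitch: high;              /* raise the pitch of emphasised text */
}
```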

If you want a good introduction to aural style sheets, the “Aural stylesheets” section of Joe Clark's book, Building Accessible Websites, is very informative.


Unfortunately, such limited support makes aural style sheets practically useless. Without better support from software vendors, web designers and developers are unlikely to use them. This creates a chicken-and-egg problem: vendors, in turn, are unlikely to prioritise support for something that is not used and brings them no benefit. Instead, current screen reader software (such as JAWS) and speaking browsers (such as Home Page Reader) decide how words should be pronounced using their own algorithms, which are not based on CSS.

Even if support were better than it is, using the aural properties well is another matter. The average web designer or developer would still need the skill to write an appropriate and considerate aural style sheet, selecting voices and perhaps positioning them spatially. Considering how few designers actually use print style sheets, how many would implement aural style sheets?

It's also worth considering whether or not aural style sheets are as useful as they first sound. Should speech properties be in the hands of web developers at all? Screen reader software typically allows users to set preferences for speech speed, voices, custom pronunciation dictionaries, etc. Such settings should remain under users' control rather than being decided when a web site is built. Anything that could override a user's settings could be harmful and insensitive.

There may, however, be some specific cases where aural style sheets would be useful. For example, screen readers may struggle to pronounce the name of the Australian airline Qantas: some speech synthesisers pronounce it “kan-tass” rather than “kwon-tass”. A web developer could use an aural style sheet to specify how speech synthesisers should pronounce the word. Remember, though, that the aural style sheet would only affect the web site to which it is applied, whereas screen readers are used not only across many web sites but also with all kinds of other software. A pronunciation set in the screen reader itself is therefore more useful to users than one set for a single web site.
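
The CSS 3 Speech Working Draft (16 December 2004) includes a phonemes property intended for this kind of case. The sketch below assumes that draft's @phonetic-alphabet rule and IPA notation, and the class name qantas is hypothetical; the syntax may well change in later drafts:

```css
@charset "UTF-8";
/* Sketch only, assuming the CSS 3 Speech Working Draft of 16 December 2004.
   @charset above must be the very first line; the IPA string below is UTF-8.
   Hypothetical markup: <span class="qantas">Qantas</span> */
@phonetic-alphabet "ipa";  /* declares the alphabet used by 'phonemes' strings */

.qantas {
  phonemes: "ˈkwɒntəs";    /* "kwon-tass" rather than "kan-tass" */
}
```

Even if a speech tool honoured this rule, the correction would still apply only on the site that carries the style sheet.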

Taking these considerations into account, I do not believe aural style sheets are important or useful enough for support in speech synthesisers to improve.

A little history

Aural CSS first appeared in the CSS 2 Specification, a W3C Recommendation since May 1998. The CSS 2.1 Specification, currently a W3C Candidate Recommendation that is intended to supersede CSS 2, adds one new property (speak-header), deprecates the 'aural' media type and reserves the 'speech' media type in its place. The CSS 3 Speech module reworks and replaces the aural properties as specified in CSS 2 (chapter 19, Aural style sheets) and CSS 2.1 (Appendix A, Aural style sheets). To quote some relevant sections of the CSS specifications:

“UAs are not required to implement the properties of this chapter in order to conform to CSS 2.1.”

CSS 2.1, Appendix A. Aural style sheets


“We expect that in a future level of CSS there will be new properties and values defined for speech output. Therefore CSS 2.1 reserves the 'speech' media type (see chapter 7, "Media types"), but does not yet define which properties do or do not apply to it.

“The properties in this appendix apply to a media type 'aural', that was introduced in CSS 2. The type 'aural' is now deprecated.”

CSS 2.1, Appendix A. Aural style sheets, A.1 The media types 'aural' and 'speech'

Summary of Known Support

The CSS 3 Speech module is currently supported in Opera 9 and, partially, in FireVox.

Note about FireVox and Firefox: Firefox does not parse aural/speech CSS properties, so FireVox support is achieved by parsing the CSS directly.

CSS 2 Aural Style Sheets are currently supported in:

Note about Safari with VoiceOver: It has been suggested that using Safari with VoiceOver offers support for aural CSS. This seems to be just a rumour: my own initial testing, using Safari 3.1.2 with VoiceOver, indicates that the speak property is not supported.
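
For anyone wanting to repeat this test, a minimal test case needs only two rules. The class names and markup are hypothetical; with support, the first paragraph should be read as “one three” and the abbreviation spelled out letter by letter:

```css
/* Minimal test for 'speak' (CSS 2 values: normal | none | spell-out).
   Hypothetical markup:
     <p>One <span class="silent">two</span> three</p>
     <p><abbr class="spell" title="National Aeronautics and Space Administration">NASA</abbr></p> */
.silent { speak: none; }      /* if supported: "two" is not spoken at all */
.spell  { speak: spell-out; } /* if supported: read as "N A S A" */
```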

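Note about iCab: It has also been implied that iCab should support CSS 2 Aural style sheets because it claims full CSS 2.1 support. However, CSS 2.1 does not require user agents to implement the aural properties in order to conform, so that claim alone does not imply aural support. I currently have no information to confirm support.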

Note about Window-Eyes: GW Micro are quoted as having said in December 2003 that they have no plans to support aural style sheets in Window-Eyes (see addendum to Shortened forms on the Web).


Details of Aural CSS Properties

The following table shows which properties are available in the different CSS specifications.

Table of Aural CSS Properties
CSS property        CSS 2   CSS 2.1   CSS 3
azimuth             y       y         n
cue                 y       y         y
cue-after           y       y         y
cue-before          y       y         y
elevation           y       y         n
mark                n       n         y
mark-after          n       n         y
mark-before         n       n         y
pause               y       y         y
pause-after         y       y         y
pause-before        y       y         y
phonemes            n       n         y
pitch               y       y         n
pitch-range         y       y         n
play-during         y       y         n
rest                n       n         y
rest-after          n       n         y
rest-before         n       n         y
richness            y       y         n
speak               y       y         y
speak-header        n       y         n
speak-numeral       y       y         n
speak-punctuation   y       y         n
speech-rate         y       y         n
stress              y       y         n
voice-balance       n       n         y
voice-duration      n       n         y
voice-family        y       y         y
voice-pitch         n       n         y
voice-pitch-range   n       n         y
voice-rate          n       n         y
voice-stress        n       n         y
voice-volume        n       n         y
volume              y       y         n
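
To show how the CSS 3 draft renames rather than simply extends the older properties, here is roughly the same intent expressed both ways. The pairings (volume/voice-volume, speech-rate/voice-rate, pitch/voice-pitch) are based on the property names in the table above, and the keyword values are assumed from the respective specifications, so treat them as approximate rather than exact equivalents:

```css
/* CSS 2 / CSS 2.1 aural properties ('aural' media type) */
@media aural {
  blockquote {
    volume: soft;
    speech-rate: slow;
    pitch: low;
    pause: 300ms;      /* 'pause' exists in every version */
  }
}

/* Approximately the same with CSS 3 Speech names ('speech' media type),
   assuming the keyword values of the 16 December 2004 Working Draft */
@media speech {
  blockquote {
    voice-volume: soft;
    voice-rate: slow;
    voice-pitch: low;
    pause: 300ms;
  }
}
```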

The following table shows the current support for aural/speech CSS properties.


Key:
n  not supported
y  supported
/  partial support
?  not tested / level of support unknown (but unlikely if CSS 2)

Table of Known Support for Aural CSS Properties
CSS property        Opera 9   FireVox   Emacspeak
azimuth             n         ?         ?
cue                 y         ?         ?
cue-after           y         ?         ?
cue-before          y         ?         ?
elevation           n         ?         ?
mark                ?         ?         ?
mark-after          ?         ?         ?
mark-before         ?         ?         ?
pause               y         ?         ?
pause-after         y         ?         ?
pause-before        y         ?         ?
phonemes            y         ?         ?
pitch               n         ?         ?
pitch-range         n         ?         ?
play-during         n         ?         ?
rest                ?         ?         ?
rest-after          ?         ?         ?
rest-before         ?         ?         ?
richness            n         ?         ?
speak               y         ?         ?
speak-header        n         ?         ?
speak-numeral       n         ?         ?
speak-punctuation   n         ?         ?
speech-rate         n         ?         ?
stress              n         ?         ?
voice-balance       y         ?         ?
voice-duration      y         ?         ?
voice-family        y         ?         ?
voice-pitch         y         /         ?
voice-pitch-range   y         ?         ?
voice-rate          y         /         ?
voice-stress        y         ?         ?
voice-volume        y         /         ?
volume              n         ?         ?

CSS 2

CSS 2 is now superseded by CSS 2.1. The following properties were defined in the CSS 2 W3C Recommendation, 12 May 1998 (revised 11 April 2008):

19 properties:

  1. azimuth
  2. cue
  3. cue-after
  4. cue-before
  5. elevation
  6. pause
  7. pause-after
  8. pause-before
  9. pitch
  10. pitch-range
  11. play-during
  12. richness
  13. speak
  14. speak-numeral
  15. speak-punctuation
  16. speech-rate
  17. stress
  18. voice-family
  19. volume

Note: The speak-date and speak-time properties were referenced in a W3C note in 1997, but never made it into a specification.
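
Several of these CSS 2 properties have no counterpart in the CSS 3 draft, notably the spatial ones (azimuth and elevation) and play-during. A sketch of how an author might use them to position voices and add background sound; the class names and sound files are hypothetical:

```css
@media aural {
  /* Two speakers in a transcript, placed either side of the listener */
  p.interviewer { azimuth: left;  voice-family: female; }
  p.guest       { azimuth: right; voice-family: male; }

  /* Footnotes sound as if they come from below the listener's ear level */
  .footnote { elevation: below; }

  /* An audio cue before each section heading */
  h2 { cue-before: url("chime.wav"); }

  /* Background sound, mixed with the parent element's sound and looped */
  body { play-during: url("ambience.wav") mix repeat; }
}
```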

CSS 2.1

W3C Candidate Recommendation, 08 September 2009:

The new property speak-header is introduced, and the 'aural' media type is deprecated in favour of the 'speech' media type.

20 properties:

  1. azimuth
  2. cue
  3. cue-after
  4. cue-before
  5. elevation
  6. pause
  7. pause-after
  8. pause-before
  9. pitch
  10. pitch-range
  11. play-during
  12. richness
  13. speak
  14. speak-header
  15. speak-numeral
  16. speak-punctuation
  17. speech-rate
  18. stress
  19. voice-family
  20. volume
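
As a sketch of the new property: speak-header controls whether a table's header text is spoken before each data cell. The example names both the deprecated 'aural' type and the reserved 'speech' type; since CSS 2.1 does not define which properties apply to 'speech', a given tool may honour only one of them. The class names are hypothetical:

```css
/* 'aural' is deprecated in CSS 2.1; 'speech' is reserved, but CSS 2.1
   does not yet say which properties apply to it, so name both. */
@media aural, speech {
  table.timetable {
    speak-header: always; /* speak the header before every pertinent cell */
  }
  table.summary {
    speak-header: once;   /* initial value: speak it once before a series of cells */
  }
}
```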

CSS 3

W3C Working Draft, 16 December 2004:

22 properties:

  1. cue
  2. cue-after
  3. cue-before
  4. mark
  5. mark-after
  6. mark-before
  7. pause
  8. pause-after
  9. pause-before
  10. phonemes
  11. rest
  12. rest-after
  13. rest-before
  14. speak
  15. voice-balance
  16. voice-duration
  17. voice-family
  18. voice-pitch
  19. voice-pitch-range
  20. voice-rate
  21. voice-stress
  22. voice-volume
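
Among the properties that are new in this draft are the rest-* and mark-* families and voice-balance. A sketch, assuming the syntax and semantics of the 16 December 2004 Working Draft (values may differ in later drafts); the class names and the mark string are hypothetical:

```css
/* Sketch, assuming the CSS 3 Speech Working Draft of 16 December 2004 */
@media speech {
  /* Left/right stereo balance; the draft drops azimuth and elevation */
  div.sidebar { voice-balance: left; }

  h2 {
    cue-before: url("chime.wav");
    pause-before: 500ms;  /* silence outside the cue (assumed draft semantics) */
    rest-before: 250ms;   /* silence between the cue and the content (assumed) */
    voice-stress: strong;
  }

  /* A named marker in the speech output, e.g. for synchronisation */
  div.chapter { mark-before: "chapter-start"; }
}
```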

New Sources Pending Addition

There are a few pages of information I've found that I still need to read through and/or digest, but you can take a look yourself in the meantime: