Last updated: May 12, 2010 (created on November 21, 2006)
Questions about support for aural CSS (Cascading Style Sheets) have been popping up in various corners of the Web lately, so I thought I would compile what I know as a supplementary page to my Screen Readers and Abbreviations tests.
If you find this information to be incomplete or inaccurate, please let me know so that I can update this page.
Note: Some new sources of information will be added to this page pending review. In the meantime, you may like to follow the links I have included at the bottom of the page and read for yourself.
CSS includes 'aural' (or 'speech') properties that allow web designers and developers control over the way in which HTML (and XML) is synthesised as speech by CSS-aware software. However, these properties enjoy very limited support in current web browsers, screen readers and in other assistive technology software where the properties may be of benefit.
If you want a good introduction to aural style sheets, the “Aural stylesheets” section of Joe Clark's book, Building Accessible Websites, is very informative.
Unfortunately, such limited support makes aural style sheets practically useless. Without an improved level of support from software vendors, web designers and developers are unlikely to use it as a tool. We have a paradox, however, as vendors are unlikely to prioritise support for something that is not used and has no benefit to them. Instead, current screen reader software (such as JAWS) and speaking browsers (such as Home Page Reader) analyse words to determine how they should be pronounced using their own non-CSS-based algorithms.
However, even if support were better than it is, using the aural properties is another matter. The average web designer or developer would still need the skill to write an appropriate and considerate aural style sheet, selecting voices, and perhaps positioning them spatially. If you think about how many designers actually use print style sheets, how many might actually implement aural style sheets?
It's also worth considering whether or not aural style sheets are as useful as they sound at first. Should speech properties be in the hands of web developers at all? Screen reader software typically allows users to set preferences for speech speed, voices, custom pronunciation dictionaries, etc. Such settings should remain under users' control rather than being part of the building of a web site. Anything that could potentially override a user's settings could be harmful and insensitive.
Admittedly, there may be some specific cases where aural style sheets would be useful. As an example, screen readers may struggle to pronounce the name of the Australian airline Qantas. Some speech synthesisers pronounce the name as “kan-tass” rather than “kwon-tass”. An aural style sheet could be used by a web developer to specify how speech synthesisers should pronounce the word. However, you must remember that the aural style sheet would only affect the web site to which it is applied, and that screen readers are used not only to access many web sites, but to access other computer software, too. A user's own screen reader settings, which apply everywhere, are more useful to that user than settings made for a single web site.
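As a rough sketch of the Qantas case, the pronunciation could be written with the phonemes property and @phonetic-alphabet rule from the CSS 3 Speech working draft. This assumes a speech synthesiser that honours them, which is a big assumption given the support described on this page. The class name .qantas is hypothetical, and the IPA string is only an approximation of “kwon-tass”.

```css
/* Sketch only: relies on features from the CSS 3 Speech Working Draft. */
@phonetic-alphabet "ipa";   /* phonemes strings below are written in IPA */

/* .qantas is a hypothetical class applied to each occurrence of the name,
   e.g. <span class="qantas">Qantas</span> */
.qantas {
  phonemes: "kwɒntəs";      /* approximate IPA for "kwon-tass" (assumption) */
}
```

Note that the rule would have to be repeated on every site that mentions the name, whereas an entry in a user's pronunciation dictionary applies everywhere.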
Taking these considerations into account, I do not believe aural style sheets are important or useful enough to see their support improve in speech synthesisers.
Aural CSS first appeared in the CSS 2 Specification, the current official W3C Recommendation for CSS. The CSS 2.1 Specification – currently a Candidate Recommendation that will become the next official W3C Recommendation – extends the specification to include a new property, but deprecates the 'aural' media type and reserves the favoured 'speech' media type. The CSS 3 Speech module reworks and replaces the 'aural' properties specified in CSS 2, chapter 19 (Aural style sheets) and CSS 2.1, Appendix A (Aural style sheets). To quote some relevant sections of the CSS specifications:
“UAs are not required to implement the properties of this chapter in order to conform to CSS 2.1.”
And:
“We expect that in a future level of CSS there will be new properties and values defined for speech output. Therefore CSS 2.1 reserves the 'speech' media type (see chapter 7, "Media types"), but does not yet define which properties do or do not apply to it.
“The properties in this appendix apply to a media type 'aural', that was introduced in CSS 2. The type 'aural' is now deprecated.”
CSS 2.1, Appendix A. Aural style sheets, A.1 The media types 'aural' and 'speech'
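To illustrate the difference between the two media types, here is a minimal sketch with illustrative selectors and values. The first block targets the deprecated 'aural' media type using CSS 2 properties; the second only shows where 'speech' rules would go, since CSS 2.1 does not define which properties apply to that media type.

```css
/* Deprecated in CSS 2.1: the 'aural' media type with CSS 2 aural properties */
@media aural {
  h1 {
    voice-family: male;   /* generic voice family */
    pause-before: 1s;     /* silence before the heading is spoken */
    stress: 60;           /* number between 0 and 100 */
  }
}

/* Reserved in CSS 2.1: the 'speech' media type. Which properties apply is
   left to a future level of CSS, so this rule is a placeholder only. */
@media speech {
  h1 {
    voice-family: male;
  }
}
```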
The CSS 3 Speech module is currently supported in:
Properties need the -xv- prefix to work in Opera, e.g. -xv-voice-balance: right.
Note about FireVox and Firefox: Firefox does not parse aural/speech CSS properties, so FireVox support is achieved by parsing the CSS directly.
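As an illustration of the -xv- prefix mentioned for Opera, the sketch below writes a few CSS 3 Speech properties in prefixed form, followed by the unprefixed form for any other software that might parse them. The selector and values are illustrative, and it is an assumption that each prefixed property accepts the same values as its unprefixed counterpart in the working draft.

```css
/* Sketch: CSS 3 Speech properties with Opera's -xv- prefix. */
blockquote {
  -xv-voice-balance: right;   /* example value from the note above */
  -xv-voice-rate: slow;
  -xv-voice-volume: soft;

  /* Unprefixed equivalents from the CSS 3 Speech Working Draft */
  voice-balance: right;
  voice-rate: slow;
  voice-volume: soft;
}
```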
CSS 2 Aural Style Sheets are currently supported in:
Note about Safari with VoiceOver: It has been suggested that using Safari with VoiceOver offers support for aural CSS. It seems that this is just rumour: I have not yet tested it thoroughly, but my own initial testing using Safari 3.1.2 with VoiceOver indicates that the speak property is not supported.
Note about iCab: It has also been implied that iCab should support CSS 2 Aural style sheets as it claims full CSS 2.1 support. I currently have no information to confirm support.
Note about Window-Eyes: GW Micro are quoted as having said in December 2003 that they have no plans to support aural style sheets in Window-Eyes (see addendum to Shortened forms on the Web).
References:
The following table shows which properties are available in the different CSS specifications.
CSS property | CSS 2 | CSS 2.1 | CSS 3 |
---|---|---|---|
azimuth | y | y | n |
cue | y | y | y |
cue-after | y | y | y |
cue-before | y | y | y |
elevation | y | y | n |
mark | n | n | y |
mark-after | n | n | y |
mark-before | n | n | y |
pause | y | y | y |
pause-after | y | y | y |
pause-before | y | y | y |
phonemes | n | n | y |
pitch | y | y | n |
pitch-range | y | y | n |
play-during | y | y | n |
rest | n | n | y |
rest-after | n | n | y |
rest-before | n | n | y |
richness | y | y | n |
speak | y | y | y |
speak-header | n | y | n |
speak-numeral | y | y | n |
speak-punctuation | y | y | n |
speech-rate | y | y | n |
stress | y | y | n |
voice-balance | n | n | y |
voice-duration | n | n | y |
voice-family | y | y | y |
voice-pitch | n | n | y |
voice-pitch-range | n | n | y |
voice-rate | n | n | y |
voice-stress | n | n | y |
voice-volume | n | n | y |
volume | y | y | n |
The following table shows the current support for aural/speech CSS properties.
Key:
CSS property | Opera 9 | FireVox | Emacspeak |
---|---|---|---|
azimuth | n | ? | ? |
cue | y | ? | ? |
cue-after | y | ? | ? |
cue-before | y | ? | ? |
elevation | n | ? | ? |
mark | ? | ? | ? |
mark-after | ? | ? | ? |
mark-before | ? | ? | ? |
pause | y | ? | ? |
pause-after | y | ? | ? |
pause-before | y | ? | ? |
phonemes | y | ? | ? |
pitch | n | ? | ? |
pitch-range | n | ? | ? |
play-during | n | ? | ? |
rest | ? | ? | ? |
rest-after | ? | ? | ? |
rest-before | ? | ? | ? |
richness | n | ? | ? |
speak | y | ? | ? |
speak-header | n | ? | ? |
speak-numeral | n | ? | ? |
speak-punctuation | n | ? | ? |
speech-rate | n | ? | ? |
stress | n | ? | ? |
voice-balance | y | ? | ? |
voice-duration | y | ? | ? |
voice-family | y | ? | ? |
voice-pitch | y | / | ? |
voice-pitch-range | y | ? | ? |
voice-rate | y | / | ? |
voice-stress | y | ? | ? |
voice-volume | y | / | ? |
volume | n | ? | ? |
CSS 2 is now superseded by CSS 2.1. The following properties were defined in the CSS 2 W3C Recommendation, 12 May 1998 (revised 11 April 2008): http://www.w3.org/TR/2008/REC-CSS2-20080411/aural.html
19 properties:
Note: The speak-date and speak-time properties were referenced in a W3C note in 1997, but never made it into a specification.
W3C Candidate Recommendation, 08 September 2009: http://www.w3.org/TR/CSS21/aural.html
The new property speak-header is introduced and the 'aural' media type is deprecated in favour of the 'speech' media type.
20 properties:
W3C Working Draft, 16 December 2004: http://www.w3.org/TR/css3-speech/#property-index
22 properties:
Q Tag” on A List Apart
There are a few pages of information I've found that I still need to read through and/or digest, but you can take a look yourself in the meantime:
The speak-header property mentioned was introduced in CSS 2.1, which isn't yet an official recommendation, so that particular property will probably never see the light of day. I'm not sure whether or not Emacspeak supports it. Also, those CSS 2 properties don't feature in CSS 3, as the CSS 3 Speech Module deprecates many of the CSS 2 aural properties. As a result, I think if software begins to support the aural CSS properties in any useful way, we'll be seeing support for the CSS 3 properties and not the older CSS 2 properties. The article also references Opera as supporting aural CSS. Opera only claims support for a selection of properties defined by the CSS 3 Speech Module, not the CSS 2 aural properties. The article received hype from sites like SitePoint.com (Jul 3, 2006 News Wire), so I'm a little worried that it puts across inaccurate information. Anyway, Accessites have updated the page (15 Feb 2007) with a note to this effect. It did serve to remind me about Fonix SpeakThis (a hosted text-to-speech service that would deliver your website in MP3 format), but I think referencing it now is somewhat out of date as SpeakThis seems to have died a death. Fonix seem to have replaced it with other services and I have found little reference to the software since 2003.
display:none; in CSS with a 'screen' media type will be parsed by a browser and the HTML element targeted by that CSS will (should) be removed from the rendered page (it stays in the DOM, but is not rendered). Hence, the model of the page in the screen reader's virtual buffer will not contain that HTML element either. It has been suggested that screen readers should ignore any CSS meant for other media (screen media, for example), but interpret any aural CSS, meaning a shift away from reliance on browsers and towards behaving as a speech browser rather than a screen reader.
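The suggested model can be sketched as follows, using hypothetical class names. A visual browser applies the 'screen' rule, while software behaving as a speech browser would skip it and apply only the aural/speech rules. This illustrates the suggestion; it is not a description of how any current screen reader behaves.

```css
/* A visual browser applies this rule and does not render the element. */
@media screen {
  .audio-hint {          /* hypothetical class name */
    display: none;
  }
}

/* Under the suggested model, a speech browser would ignore the 'screen'
   rule above and apply only rules such as these. */
@media aural, speech {
  .audio-hint {
    speak: normal;       /* speak the element's text normally */
  }
  .visual-only {         /* hypothetical class name */
    speak: none;         /* suppress speech for this element */
  }
}
```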