Speaking Emacs

An audio desktop

Visually impaired people have to use either a speech or a braille interface to use a computer. Such an interface is usually called a "screen reader"; it is provided either by the operating system or by a third-party application.

For example, macOS provides VoiceOver, and Linux, especially the GNOME desktop, comes with Orca. Various applications bring their own audio interface, e.g. Google Chrome offers ChromeVox. I have heard of Microsoft Narrator for Windows, but never used it. There are also third-party tools for Windows like JAWS or NVDA; again, I have no experience with them, as I try to avoid Windows.

But all these tools share one weakness: they are just crutches that let blind users operate a visual desktop. The concept of using the computer is still screen-oriented.

Imagine an audio desktop that provides a primary audio interface for visually impaired people to use a computer!

Emacspeak

Providing an audio desktop was the vision of T.V. Raman, who started implementing Emacspeak about 30 years ago. IMHO he is truly a pioneer in giving visually impaired people access to computers!

Emacspeak is based on Emacs, probably the most versatile editor around; actually an operating system in itself, a kind of descendant of the Lisp Machines. Emacs provides email readers, news readers, various shells, a terminal emulator written in elisp, web browsers, its own window manager EXWM, clients for Mastodon, Jabber, and Twitter, a calculator (actually a full math package), various personal information management tools, games, emulators, whatever is needed to edit code (language-specific modes, LSP, REPLs, …), and much more. See also the EmacsWiki to get an idea.

Around Emacs, T.V. Raman tailored a full ecosystem for blind users: his audio desktop Emacspeak. It does not simply speak content to blind users; it also provides audio icons (sound effects for various actions) and different voices for different kinds of text (e.g. a link or a headline is spoken in a different pitch).

Various applications are integrated into Emacspeak: an ebook shelf, preconfigured radio stations, reading email, Usenet news, and news feeds, browsing the web, keeping your notes, maintaining your computer, running a shell, handling files and directories, just to mention a few.

The actual speaking can be done by various back ends, e.g.:

DECtalk hardware,
DECtalk software,
Espeak,
IBM TTS: in the past you could obtain a license via Oralux, especially to use their excellent voxin voices, BUT this is no longer possible, as they seem to have run out of licenses. Important note: as far as I understand, this means you can't use the excellent voxin voices with Emacspeak except through workarounds (e.g. via multispeech). This is really bad news, as the voxin voices are IMHO the best voices currently available!
macOS's own voices.

The bridge between Emacspeak and the speech software/hardware mentioned above is Emacspeak's own speech server, written in Tcl/TclX.

Alternatively, there is multispeech by poretsky. Via multispeech and some hacks, I could also get Emacspeak to use the voxin voices mentioned above. Unfortunately, it is rather sluggish; see my post to the Emacspeak mailing list about my troubles.

Emacspeak has evolved over the years and currently contains about 80k lines of elisp and Tcl/TclX code! Many of T.V. Raman's adaptations hook directly into various Emacs features and packages. Unfortunately, this can cause issues when, e.g., these packages change, and not all of these integrations seem to be maintained anymore. Also, as T.V. Raman is the main contributor, Emacspeak is tailored very specifically to his needs and his ideas of an audio desktop.

speechd-el

Because of all the trouble with the speech servers, the unsatisfying situation around the voxin voices, and the difficulty of switching between German and English voices, but mostly because of what IMHO is the better design of speechd-el, I finally switched to speechd-el, maintained by Milan Zamazal in the context of the Free(B)Soft Laboratory.

speechd-el takes a very different approach from Emacspeak:

it "just" provides an audio interface to vanilla Emacs. Theoretically, if additional or third-party packages are implemented correctly on top of vanilla Emacs, speechd-el should be able to speak them by default. Surprisingly, this works quite well: many packages work out of the box, probably thanks to the sophisticated design of Emacs itself. But many components also don't work, or rather are not spoken correctly, or even speak too much.
it plugs directly into speech-dispatcher (as e.g. Orca does), which means you get the excellent voxin voices, as they work just fine with speech-dispatcher without IBM TTS!
unfortunately, this also means it is only usable on operating systems where speech-dispatcher is available: AFAIK currently Linux, OpenBSD, and probably some other *BSDs. So, no Windows or macOS?
the elisp code is much easier to grok (just 5700 lines of elisp), very well documented, and IMHO a much better and more complete software design, especially if you want to change things on your own.
it also supports sound icons. You'll have to get the sound-icons package and extract it to "/usr/share/sounds/sound-icons", or put your own sounds there.
speechd-el also supports braille displays (Emacspeak doesn't), but I have never used them.
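To find out which sound icon names you can pass to `speechd-say-sound`, a small helper can list the icon directory. This is a sketch assuming the icons were extracted to /usr/share/sounds/sound-icons as described above; `okflo-list-sound-icons` is a hypothetical name, and that the listed file names are exactly what `speechd-say-sound` expects is an assumption based on names like "at" and "piano-3.wav" used later in this config.

```elisp
;; Hypothetical helper: list the available sound icons, assuming they
;; live in /usr/share/sounds/sound-icons.
(defun okflo-list-sound-icons ()
  "Show the file names in the sound-icons directory in the echo area."
  (interactive)
  (let ((dir "/usr/share/sounds/sound-icons"))
    (if (file-directory-p dir)
        ;; The regexp skips "." and ".." (and any other dot files).
        (message "%s" (mapconcat #'identity
                                 (directory-files dir nil "\\`[^.]")
                                 ", "))
      (message "No sound icon directory at %s" dir))))
```

Since the result goes to the echo area, speechd-el speaks it like any other message.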

BUT, as mentioned, speechd-el just provides an audio interface to vanilla Emacs. Setting up email readers, a web browser, and anything a blind user specifically needs, like speaking the clock, is up to you!

So I started to adapt my Emacs configuration to incorporate speechd-el, with the final goal of having an audio desktop to use and manage my computer in every aspect.

I was very surprised that I didn't find any example configurations from users who went a similar path with Emacs and speechd-el. Perhaps I just missed them? Please let me know if you use Emacs + speechd-el in such a way!

My current setup of speechd-el

Below are my attempts to make Emacs + speechd-el my audio desktop. This is a work in progress, and I am by no means an expert in elisp. Please take it just as a starting point if you want to go a similar path.

All these snippets sit on top of my general Emacs configuration; only my speechd-el-specific configuration is shown here.

UPDATE: please see my updated config at https://diesenbacher.net/blog/entries/Updated-speechd-el-config.html.

prepare for speechd-el

debug on errors
disable eldoc mode, as it just speaks too much. If you need eldoc hints, just call (eldoc) and get the information in a separate buffer.
reduce the mode-line

(setq debug-on-error t)
(global-eldoc-mode -1)
;; mode-line-format is buffer-local, so set its default value for all buffers.
(setq-default mode-line-format '("%e" mode-line-buffer-identification mode-line-modified mode-line-position))

load speechd-el

add the speechd-el directory to the load-path.
reduce speechd-out-active-drivers to audio (ssip), as I don't use braille.

(add-to-list 'load-path (expand-file-name "~/install/speechd-el/"))
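;; Load speechd-speak right away instead of autoloading it: the settings
;; and key bindings below use its variables and `speechd-speak-mode-map'.
(require 'speechd-speak)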
(setf speechd-out-active-drivers '(ssip))

Settings

Various

Various speechd-el-specific settings; I prefer setting variables explicitly over using Emacs's Customize feature.

always speak the whole line
echo typed text word by word
reduce the buffer state changes that get spoken
the speechd event map

(setf speechd-speak-read-command-keys nil)

(setf speechd-speak-whole-line t)

(setf speechd-speak-echo 'word)

(setf speechd-speak-use-index-marks t)

(setf speechd-speak-buffer-insertions t)

(speechd-set-punctuation-mode 'all)

(setf speechd-speak-ignore-command-keys
      '(forward-char backward-char right-char left-char
                     next-line previous-line delete-char
                     comint-delchar-or-maybe-eof delete-backward-char
                     backward-delete-char-untabify
                     delete-forward-char c-electric-backspace
                     c-electric-delete-forward))

(setf speechd-speak-auto-speak-buffers '("*Help*"
                                         "*Completions*") )

(setf speechd-speak-by-properties-on-movement t)

(setf speechd-speak-state-changes
      '(
        ;; buffer-identification
        buffer-read-only
        ;; frame-name
        ;; frame-identification
        major-mode
        ;; minor-modes
        buffer-file-coding
        terminal-coding
        input-method
        ;; process
        ))

(setf speechd-out--event-mapping
      '((empty . empty-text)
        (whitespace . whitespace) 
        (beginning-of-line . beginning-of-line)
        (end-of-line . end-of-line)
        (start . start)
        (finish . finish)
        (minibuffer . prompt)
        (message . message)))
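Since `speechd-speak-echo` is set to `word` above, it can occasionally help to hear single characters instead (e.g. when checking a spelling). A minimal sketch, assuming toggling the global value is enough for speechd-el to pick it up on the next keystroke; `okflo-toggle-echo` is a hypothetical command, not part of speechd-el.

```elisp
;; Hypothetical command (not part of speechd-el): switch typing echo
;; between whole words and single characters.
(defun okflo-toggle-echo ()
  "Toggle `speechd-speak-echo' between `word' and `character'."
  (interactive)
  (setf speechd-speak-echo
        (if (eq speechd-speak-echo 'word) 'character 'word))
  ;; The message is spoken, so you hear which mode is now active.
  (message "Echo by %s" speechd-speak-echo))
```

Call it via M-x, or bind it to a key that is still free under the speechd-el prefix.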

Faces

Define voices, and which voice to use for which face, e.g.:

higher pitch for various links
lower pitch for headlines

(setf speechd-voices '((voice-link . ((pitch . 50)))
                       (voice-function-name . ((pitch . -30)
                                               (rate . -10)
                                               (style . 3)
                                               (punctuation-mode . all)))
                       (voice-heading . ((pitch . -100))) ; speech-dispatcher pitch range is -100..100
                       (voice-source-code . ((pitch . 0)
                                             (rate . -10)
                                             (punctuation-mode . all)))))

(setf speechd-face-voices '((font-lock-function-name-face . voice-function-name)
                            (link . voice-link)
                            (info-xref . voice-link)
                            (shr-link . voice-link)
                            (elpher-gemini . voice-link)
                            (org-level-1 . voice-heading)
                            (org-level-2 . voice-heading)
                            (org-level-3 . voice-heading)
                            (org-block . voice-source-code)
                            (org-source . voice-source-code)
                            (org-link . voice-link)
                            (shr-h1 . voice-heading)
                            (shrface-h1-face . voice-heading)
                            (shrface-h2-face . voice-heading)
                            (shrface-h3-face . voice-heading)
                            (shrface-h4-face . voice-heading)
                            (elpher-gemini-heading1 . voice-heading)
                            (elpher-gemini-heading2 . voice-heading)
                            (elpher-gemini-heading3 . voice-heading)
                            (shrface-href-face . voice-link)))
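To extend `speechd-face-voices` for a new package you first need to know which face the text uses. A sketch of a helper that speaks the face at point, plus one illustrative mapping; `okflo-speak-face-at-point` is hypothetical, and whether a running speechd-el picks up a changed `speechd-face-voices` immediately is an assumption worth checking.

```elisp
;; Hypothetical helper: speak the face of the character at point, to find
;; out which face name to add to `speechd-face-voices'.
(defun okflo-speak-face-at-point ()
  (interactive)
  ;; `get-char-property' looks at overlays as well as text properties;
  ;; the value may be a single face or a list of faces.
  (let ((face (get-char-property (point) 'face)))
    (speechd-say-text (if face (format "%s" face) "no face")
                      :priority 'important)))

;; Illustrative mapping only: give keywords the existing function-name voice.
(add-to-list 'speechd-face-voices '(font-lock-keyword-face . voice-function-name))
```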

Application/package specific adaptions

completion in the minibuffer

In addition to M-<up> and M-<down>, also bind C-<up>/C-<left> and C-<down>/C-<right> to move through completions, as they are much easier to reach on my keyboard.

(define-key minibuffer-mode-map (kbd "C-<up>") 'minibuffer-previous-completion)
(define-key minibuffer-mode-map (kbd "C-<left>") 'minibuffer-previous-completion)

(define-key minibuffer-mode-map (kbd "C-<down>") 'minibuffer-next-completion)
(define-key minibuffer-mode-map (kbd "C-<right>") 'minibuffer-next-completion)

various post-command-hooks

speak the number of yanked chars
TODO: yank-pop, kill-line
speak the number of chars saved to the kill ring

(defun okflo-post-command-hook ()
  ;; `bound-and-true-p' avoids a void-variable error (an error makes Emacs
  ;; remove this function from `post-command-hook') if speechd-speak
  ;; isn't loaded yet.
  (when (bound-and-true-p global-speechd-speak-mode)
    (cond
     ((equal this-command 'yank)
      (let ((yanked-text (car kill-ring-yank-pointer)))
        (speechd-say-text (format "yanked %s chars" (length yanked-text)) :priority 'important)))
     ;; TODO: yank-pop, kill-line
     ((equal this-command 'kill-ring-save)
      (let ((text (car kill-ring-yank-pointer)))
        (speechd-say-text (format "saved %s chars into killring" (length text)) :priority 'important))))))

(add-hook 'post-command-hook 'okflo-post-command-hook)
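For the kill-line part of the TODO, one possible sketch is a separate hook that reports how much text a kill command put into the kill ring. The list of commands is an assumption (extend it as needed), and `okflo-kill-post-command-hook` is a hypothetical name.

```elisp
;; Sketch: report the size of the newest kill after kill commands.
(defun okflo-kill-post-command-hook ()
  (when (and (bound-and-true-p global-speechd-speak-mode)
             (memq this-command '(kill-line kill-whole-line kill-word kill-region))
             kill-ring)
    ;; The newest kill is the car of `kill-ring'; consecutive kills are
    ;; appended to it, so this is the accumulated text.
    (speechd-say-text (format "killed %s chars" (length (car kill-ring)))
                      :priority 'important)))

(add-hook 'post-command-hook #'okflo-kill-post-command-hook)
```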

quickly switch between German and English voices

(defun okflo-switch-to-german ()
  (interactive)
  (speechd-set-voice "petra-ml-embedded-high")
  (speechd-set-language "de")
  (speechd-set-rate 20)
  (message "Deutsch"))

(define-key speechd-speak-mode-map "g" 'okflo-switch-to-german)

(defun okflo-switch-to-english ()
  (interactive)
  (speechd-set-voice "allison-embedded-high")
  (speechd-set-language "en")
  (speechd-set-rate 0)
  (message "english"))

(define-key speechd-speak-mode-map (kbd "C-g") 'okflo-switch-to-english)
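If you prefer a single key, the two commands above can be combined into a toggle. A sketch that assumes English is active at startup and tracks the current language in its own (hypothetical) variable `okflo-current-language`.

```elisp
;; Hypothetical toggle on top of okflo-switch-to-german/-english.
(defvar okflo-current-language 'en
  "Language last selected via `okflo-toggle-language'.")

(defun okflo-toggle-language ()
  "Switch between the German and the English voice."
  (interactive)
  (if (eq okflo-current-language 'en)
      (progn (okflo-switch-to-german)
             (setq okflo-current-language 'de))
    (okflo-switch-to-english)
    (setq okflo-current-language 'en)))
```

If a language is switched with one of the individual commands, the variable gets out of sync; the next toggle then just selects the language again.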

control sound volume

"C-e +" and "C-e -" increase or decrease the volume.

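;; call-process with a nil destination discards amixer's multi-line
;; status output, so only the short message below is shown and spoken.
(defun okflo-volume- ()
  (interactive)
  (call-process "amixer" nil nil nil "sset" "Master" "5%-")
  (message "Volume down"))

(define-key speechd-speak-mode-map "-" 'okflo-volume-)

(defun okflo-volume+ ()
  (interactive)
  (call-process "amixer" nil nil nil "sset" "Master" "5%+")
  (message "Volume up"))

(define-key speechd-speak-mode-map "+" 'okflo-volume+)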
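A companion command could report the current level. This sketch assumes `amixer sget Master` prints the level in the form "[NN%]", as ALSA's amixer usually does; `okflo-volume-speak` is a hypothetical name.

```elisp
;; Hypothetical command: announce the current Master volume.
(defun okflo-volume-speak ()
  "Show (and thereby speak) the current Master volume."
  (interactive)
  (let ((out (shell-command-to-string "amixer sget Master")))
    ;; Take the first "[NN%]" field of amixer's output.
    (if (string-match "\\[\\([0-9]+\\)%\\]" out)
        (message "Volume %s percent" (match-string 1 out))
      (message "Volume unknown"))))
```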

Filter current buffer in a new buffer

copies the content of the current buffer into a new buffer "*Filter:<buffer-name>*". This buffer is not associated with any file; its purpose is to (destructively) filter the content and make it easy to grok.
press "C-e f" to activate it.
in the new buffer, the minor mode `okflo-filter-mode` provides these commands (keep and delete always affect the whole buffer!):
- "C-c d": delete all lines matching a regex.
- "C-c k": keep only the lines matching a regex.
- "C-c h": count matches for a regex (plain `how-many`, which counts matches rather than lines, and only from point onwards).

(defun okflo-keep-lines (regexp &optional rstart rend interactive)
  (interactive
   (progn
     (barf-if-buffer-read-only)
     (keep-lines-read-args "Keep lines containing match for regexp")))
  (let ((orig-lines (count-lines (point-min) (point-max))))
    (keep-lines regexp (point-min) (point-max))
    (message "Reduced from %s to %s lines" orig-lines (count-lines (point-min) (point-max)))))

(defun okflo-delete-lines (regexp &optional rstart rend interactive)
  (interactive
   (progn
     (barf-if-buffer-read-only)
     (keep-lines-read-args "Flush lines containing match for regexp")))
  (let ((orig-lines (count-lines (point-min) (point-max))))
    (flush-lines regexp (point-min) (point-max))
    (message "Reduced from %s to %s lines" orig-lines (count-lines (point-min) (point-max)))))

(define-minor-mode okflo-filter-mode
  "Easily and destructively filter the buffer content."
  :lighter "OFM"
  :keymap `((,(kbd "C-c k") . okflo-keep-lines)
            (,(kbd "C-c d") . okflo-delete-lines)
            (,(kbd "C-c h") . how-many)))

(defun okflo-filter-buffer ()
  (interactive)
  (let* ((tobe-filtered-buf (current-buffer))
         (buf-name (format "*Filter:<%s>*" (buffer-name tobe-filtered-buf)))
         (new-buf (get-buffer-create buf-name)))
    (save-excursion
      (copy-to-buffer new-buf (point-min) (point-max)))
    ;; switch-to-buffer also makes new-buf the current buffer.
    (switch-to-buffer new-buf)
    (okflo-filter-mode)))

(define-key speechd-speak-mode-map "f" 'okflo-filter-buffer)

Speak time

(defun okflo-speak-time ()
  (interactive)
  (let ((dt (decode-time (current-time)))
        (months '(January February March April May June July
                  August September October November December)))
    (speechd-say-text
     ;; %02d zero-pads the minutes, so 9:05 isn't spoken as "9:5".
     (format "Time %s:%02d Date %s %s. %s"
             (decoded-time-hour dt)
             (decoded-time-minute dt)
             (nth (1- (decoded-time-month dt)) months)
             (decoded-time-day dt)
             (decoded-time-year dt))
     :priority 'important)))

(define-key speechd-speak-mode-map "t" 'okflo-speak-time)
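An alternative sketch lets `format-time-string` do the padding and supply the month name. Note that %B follows the system locale (LC_TIME), so month names may come out in German; `okflo-speak-time-alt` is a hypothetical name.

```elisp
;; Alternative sketch: %H:%M is zero-padded, %B is the locale's month
;; name, and %-d is the day of the month without padding.
(defun okflo-speak-time-alt ()
  (interactive)
  (speechd-say-text (format-time-string "Time %H:%M Date %B %-d. %Y")
                    :priority 'important))
```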

Battery

(require 'battery)

(defun okflo-speak-battery-status ()
  (interactive)
  (battery))

(define-key speechd-speak-mode-map "~" 'okflo-speak-battery-status)
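If the full `battery` message is too verbose, a shorter summary can be built with `battery-format`. A sketch assuming the battery backend reports %p (percentage) and %B (status), as the common Linux backends do; `okflo-speak-battery-short` is a hypothetical name.

```elisp
;; Sketch: speak only percentage and charging status.
;; `battery-status-function' is nil when battery.el found no backend.
(defun okflo-speak-battery-short ()
  (interactive)
  (if battery-status-function
      (speechd-say-text
       (battery-format "Battery %p percent, %B" (funcall battery-status-function))
       :priority 'important)
    (speechd-say-text "no battery information" :priority 'important)))
```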

Helm

snippet just for reference: for now I avoid completion frameworks like helm (there are many), because working with the vanilla *Completions* buffer seems to work best.

(require 'helm)

(defun okflo-helm-move-selection-after-hook ()
  ;; stolen from emacspeak-helm.el
  (let* ((inhibit-read-only t)
         (line (buffer-substring (line-beginning-position) (line-end-position)))
         (count-msg (format "%d of %d"
                            (- (line-number-at-pos) 1)
                            (- (count-lines (point-min) (point-max)) 1))))
    (when (and line count-msg)
      (speechd-say-text (concat line " - " count-msg )))))

(add-hook 'helm-move-selection-after-hook #'okflo-helm-move-selection-after-hook)
(add-hook 'helm-after-initialize-hook #'okflo-helm-move-selection-after-hook)


(defun helm-display-mode-line (source &optional force)
  "Do nothing."
  ;; Override helm-display-mode-line to do nothing, to prevent
  ;; "helm-customize-group" from being spoken.
  (ignore source force))

eshell / eat

an eshell command `d` (for "done") that you append as " ; d" to get audio feedback when a command has finished; optionally, pass a text to be spoken after "d" (defaults to "done.").
after each command finishes, give audio feedback and report how many lines of output it produced.

(require 'cl-lib)                       ; for cl-defun
(cl-defun eshell/d (&optional (text "done."))
  (speechd-say-sound "at" :priority 'important)
  (speechd-say-text text :priority 'important))

(defun okflo-eshell-post-command-hook ()
  (when global-speechd-speak-mode
    (speechd-say-sound "at"
                       :priority 'important)
    (speechd-say-text (format "output %s lines" (1- (count-lines (point) (1+ eshell-last-input-end))))
                      :priority 'important )))

(add-hook 'eshell-post-command-hook
          'okflo-eshell-post-command-hook)
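Besides the line count, it can help to hear when a command failed. A sketch assuming `eshell-last-command-status` (0 on success) already holds the exit status when `eshell-post-command-hook` runs; `okflo-eshell-speak-failure` is a hypothetical name.

```elisp
;; Sketch: announce a non-zero exit status after an eshell command.
(defun okflo-eshell-speak-failure ()
  (when (and (bound-and-true-p global-speechd-speak-mode)
             (numberp eshell-last-command-status)
             (not (zerop eshell-last-command-status)))
    (speechd-say-text (format "failed with status %s" eshell-last-command-status)
                      :priority 'important)))

(add-hook 'eshell-post-command-hook #'okflo-eshell-speak-failure)
```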

elfeed

change the entries in the search view so that the title gets spoken first.
elfeed-search-remain-on-entry needs to be set to t, otherwise the following entries are spoken in the entry view.

(require 'elfeed)

(setf elfeed-search-title-max-width 150)

(defun okflo-elfeed-search-print-entry (entry)
  "Print ENTRY to the buffer."
  (let* ((date (elfeed-search-format-date (elfeed-entry-date entry)))
         (title (or (elfeed-meta entry :title) (elfeed-entry-title entry) ""))
         (title-faces (elfeed-search--faces (elfeed-entry-tags entry)))
         (feed (elfeed-entry-feed entry))
         (feed-title
          (when feed
            (or (elfeed-meta feed :title) (elfeed-feed-title feed))))
         (tags (mapcar #'symbol-name (elfeed-entry-tags entry)))
         (tags-str (mapconcat
                    (lambda (s) (propertize s 'face 'elfeed-search-tag-face))
                    tags ","))
         (title-width (- (window-width) 10 elfeed-search-trailing-width))
         (title-column (elfeed-format-column
                        title (elfeed-clamp
                               elfeed-search-title-min-width
                               title-width
                               elfeed-search-title-max-width)
                        :left)))
    (insert (propertize title-column 'face title-faces 'kbd-help title) " ")
    (when feed-title
      (insert (propertize feed-title 'face 'elfeed-search-feed-face) " "))
    (insert (propertize date 'face 'elfeed-search-date-face) " ")
    (when tags
      (insert "(" tags-str ")"))))

(setf elfeed-search-print-entry-function #'okflo-elfeed-search-print-entry)

(setf elfeed-search-remain-on-entry t)

org-mode

Org-mode is simply great: whatever kind of notes you take, with timestamps, an agenda, and export to all kinds of document formats. And the best thing for visually impaired people: it provides structured text that is easier to skim and grok!

get feedback on whether an entry is folded, bound to C-c f in org buffers.

(defun okflo-say-org-fold-status ()
  (interactive)
  (save-excursion
    (end-of-line)
    (if (org-fold-folded-p)
        (speechd-say-text "folded" :priority 'important)
      (speechd-say-text "not folded" :priority 'important))))

;; org-mode-map only exists once org is loaded.
(with-eval-after-load 'org
  (define-key org-mode-map (kbd "C-c f") 'okflo-say-org-fold-status))
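In a deeply nested file it also helps to hear where you are in the outline. A sketch using `org-get-outline-path` with WITH-SELF set, so the current heading is included; the command name and the key C-c p are hypothetical choices (check that the key is free in your setup).

```elisp
;; Sketch: speak the outline path to the current entry, joined by " / ".
(defun okflo-say-org-outline-path ()
  (interactive)
  (if (org-before-first-heading-p)
      (speechd-say-text "before first heading" :priority 'important)
    (speechd-say-text (mapconcat #'identity (org-get-outline-path t) " / ")
                      :priority 'important)))

(with-eval-after-load 'org
  (define-key org-mode-map (kbd "C-c p") #'okflo-say-org-outline-path))
```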

EWW

Using a text browser like eww, without any JavaScript, works surprisingly well for GitHub, Stack Exchange, … , after pressing "R" (eww-readable) to strip the unnecessary content.

give audio feedback when loading/rendering of a page is done.
shrface: convert the HTML in eww into an org-mode presentation, so that we get a structured view.

(defun okflo-eww-page-loaded ()
  (when global-speechd-speak-mode
    (speechd-say-sound "piano-3.wav" :priority 'important)
    (speechd-say-text "page loaded" :priority 'important)))

(add-hook 'eww-after-render-hook #'okflo-eww-page-loaded)

(use-package shrface
  :ensure t
  :config
  (shrface-basic)
  (shrface-trial)
  (shrface-default-keybindings)         ; setup default keybindings
  (setq shrface-href-versatile t))

(add-hook 'eww-after-render-hook #'shrface-mode)

(with-eval-after-load 'eww
  (define-key eww-mode-map (kbd "<tab>") 'shrface-outline-cycle)
  (define-key eww-mode-map (kbd "S-<tab>") 'shrface-outline-cycle-buffer)
  (define-key eww-mode-map (kbd "C-t") 'shrface-toggle-bullets)
  (define-key eww-mode-map (kbd "C-j") 'shrface-next-headline)
  (define-key eww-mode-map (kbd "C-k") 'shrface-previous-headline)
  (define-key eww-mode-map (kbd "M-l") 'shrface-links-counsel)
  (define-key eww-mode-map (kbd "M-h") 'shrface-headline-counsel)
  (setq shr-inhibit-images t)) 

(setq org-startup-folded t)
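It can also be useful to hear the page title once a page is rendered. A sketch assuming eww keeps the title under :title in its buffer-local `eww-data` plist; `okflo-eww-speak-title` is a hypothetical name.

```elisp
;; Sketch: speak the page title after eww has rendered a page.
(defun okflo-eww-speak-title ()
  (when (bound-and-true-p global-speechd-speak-mode)
    (let ((title (plist-get eww-data :title)))
      ;; Skip pages without a (non-empty) title.
      (when (and (stringp title) (> (length title) 0))
        (speechd-say-text title :priority 'important)))))

(add-hook 'eww-after-render-hook #'okflo-eww-speak-title)
```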

mu4e

Among the many email readers in Emacs, mu4e is great!

when viewing a message, speak its subject

(require 'mu4e)
(setf mu4e-headers-fields
      '((:from-or-to . 30) (:thread-subject . 70) (:flags . 6) (:human-date . 12)))

(defun okflo-mu4e-speak-subject ()
  (when global-speechd-speak-mode
    ;; plist-get instead of getf, which needs the obsolete cl library.
    (speechd-say-text (or (plist-get (mu4e-message-at-point) :subject) "no subject")
                      :priority 'important)))

(add-hook 'mu4e-view-rendered-hook
          'okflo-mu4e-speak-subject)

finally

start speechd-el

(speechd-speak)