title author lastmod tags categories draft
Audacity and the telemetry pull request
Amolith
2023-01-27T13:00:37-05:00
Open source culture
Audio editing
Music
Drama
Technology
true

Five days ago at the time of writing, Dmitry Vedenko opened a Pull Request (PR) in Audacity's GitHub repository entitled "Basic telemetry for the Audacity". About two days later, all hell broke loose. That PR now has over 3.3 thousand downvotes and more than one thousand comments from nearly 400 individuals. I started reading the posts shortly after they began and kept up with them over the following days, reading every single new post. I recognise that few people are going to feel like wading through over a thousand comments, so this is my attempt to summarise both the PR itself, using the community's code reviews, and the various opinions conveyed in the comments.

When I reference comments, I'll provide a footnote that includes a link to the comment and a link to a screenshot just in case it's removed or edited in the future.

Audacity's acquisition

I haven't been able to find much information in this area so forgive me if I'm scant on details.

On 30 April, a company called Muse Group acquired Audacity. According to their website, Muse is the parent company behind many musical applications and tools. It was founded by Eugeny Naidenov just days before it acquired Audacity. Before all of this, Eugeny Naidenov founded Ultimate Guitar (UG) in 1998. The service grew rather quickly and now has over 300 million users. UG acquired Dean Zelinsky Guitars in 2012, Agile Partners in 2013, MuseScore in 2017, and Crescendo in 2018. Muse Group was established in 2021 and it seems as if all of the services UG acquired were (or will be) transferred to Muse Group, as well as UG itself. Immediately following its establishment, Muse not only acquired Audacity but also StaffPad.

I say 30 April because that's when Muse published their press release and when Martin Keary (Tantacrul) published a video entitled "I'm now in charge of Audacity. Seriously." According to his comment,1 Martin will help with proposing Audacity's roadmap and many of its future features as well as working with the community. This has been his role with MuseScore since he joined that project and he will be continuing it here.

-----BEGIN PERSONAL OPINION-----

Looking at his website, I also suspect he will play a large role in redesigning Audacity's interface. Considering that he was instrumental in designing the best mobile interface I've ever had the absolute pleasure of experiencing, I have high hopes that this is the case.

------END PERSONAL OPINION------

Telemetry implementation

Implementation Basics

A few days after the acquisition, a PR was opened that adds "Basic telemetry for the Audacity". This implementation collects "application opened" events and sends those to Yandex to estimate the number of Audacity users. It also collects session start and end events, errors for debugging, file formats used for import and export, OS and Audacity versions, and the use of effects, generators, and analysis tools so they can prioritise future improvements. Sending this data would be optional and the user would be presented with a dialogue the first time they launch the application after installation or after they update to the release that includes it. This description was mostly copied directly from the PR description itself.

Frontend Implementation

This is fairly straightforward and a pretty standard UI for prompting users to consent to analytics and crash logging. This section is included because the community has strong opinions regarding the language used and its design, but that will be discussed later. The screenshot below is copied directly from the PR.

{{< figure src="~/repos/sites/secluded/static/assets/pngs/audacity-pr/consentdialogue.png" link="~/repos/sites/secluded/static/assets/pngs/audacity-pr/consentdialogue.png" >}}

Backend Implementation

Many of the code reviews include the reviewer's personal opinion so I will summarise the comment, provide the code block in question, and link directly to the comment in a footnote.2

  if (!inputFile.Write (wxString::FromUTF8 (ClientID + "\n")))
    return false;

Lines 199-200 of TelemetryManager.cpp save the user's unique client ID to a file.3 This allows the analytics tool (in this case, Google Analytics) to aggregate data produced by a single user.

  def_vars()

    set( CURL_DIR "${_INTDIR}/libcurl" )
    set( CURL_TAG "curl-7_76_0")

Lines 3-6 of CMakeLists.txt "vendor in" libcurl.4 This is when an application directly includes sources for a utility rather than making use of utilities provided by the system itself.

  ExternalProject_Add(curl
     PREFIX "${CURL_DIR}"
     INSTALL_DIR "${CURL_DIR}"
     GIT_REPOSITORY https://github.com/curl/curl
     GIT_TAG ${CURL_TAG}
     GIT_SHALLOW Yes
     CMAKE_CACHE_ARGS ${CURL_CMAKE_ARGS}
  )

Lines 29-36 of CMakeLists.txt add curl as a remote dependency.5 This means that the machine building Audacity from its source code has to download curl during that build.

  S.Id (wxID_NO).AddButton (rejectButtonTitle);
  S.Id (wxID_YES).AddButton (acceptButtonTitle)->SetDefault ();

Lines 93-94 of TelemetryDialog.cpp add buttons to the dialogue asking the user whether they consent to data collection.6 SetDefault focuses the button indicating that the user does consent. This means that if the user doesn't really look at the dialogue and presses Spacebar or Enter, or if they do so accidentally by simply bumping the key, they unintentionally consent to data collection. If the user desires, this can later be changed in the settings menu. However, if they weren't aware of what they were consenting to, or that they had consented at all, they won't know to go back and opt out.
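The effect of the default button can be modelled in a few lines of plain C++. This is a toy sketch, not wxWidgets or Audacity code: `Choice`, `ConsentDialog`, `strayKeypressConsents`, and `demoDefaultButton` are hypothetical names invented for illustration, and the model assumes that pressing Enter simply activates whichever button is marked as the default.

```cpp
#include <cassert>

// Toy model of a two-button consent dialogue (hypothetical, not wxWidgets).
enum class Choice { Reject, Accept };

struct ConsentDialog {
    Choice defaultButton;  // the button that Enter activates (assumed behaviour)

    // The choice recorded when the user presses Enter without moving focus.
    Choice pressEnter() const { return defaultButton; }
};

// True when a stray keypress results in consent to data collection.
bool strayKeypressConsents(const ConsentDialog& dialog) {
    return dialog.pressEnter() == Choice::Accept;
}

// Compares the arrangement described above with the reverse arrangement.
void demoDefaultButton() {
    ConsentDialog asInPr{Choice::Accept};  // accept button is the default
    ConsentDialog reversed{Choice::Reject};  // reject button is the default
    assert(strayKeypressConsents(asInPr));
    assert(!strayKeypressConsents(reversed));
}
```

With the accept button as the default, a bumped Enter key counts as consent; with the reject button as the default, the same keypress opts the user out.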

There are other problems with the code that include simple mistakes, styling that's inconsistent with the rest of the project, unhandled return values resulting in skewed data, use of inappropriate functions, and spelling errors in the comments. I believe these are less important than those above so they won't be discussed.

Community opinions

There were many strong opinions regarding both the frontend and backend implementations of this PR, from the wording of the dialogue and highlighting the consent button to devices running something other than Windows and macOS not being able to send telemetry and thus skewing the data that was collected.

Opinions on the frontend

Really, the only frontend here is the consent dialogue. However, there are many comments about it, the most common of which is probably that the wording is not only too vague7 but also inaccurate.8 The assertion that Google Analytics is not anonymous and that any data sent can be trivially de-anonymised (or de-pseudonymised) is repeated many times over. Below are a few links to comments stating such. I searched for the term "anonymous", copied relevant links, and stopped when my scrollbar reached halfway down the page.

The next most pervasive comment is regarding the consent buttons at the bottom of the dialogue where users opt in or out.9 Many individuals call this design a dark pattern. Harry Brignull, a UX specialist focusing on deceptive interface practices, describes dark patterns as "tricks used in websites and apps that make you do things that you didn't mean to". The dark pattern in this situation is the opt-in button being highlighted. Many community members assert that users will see the big blue button and click it without actually reading the dialogue's contents. They just want to record their audio and this window is a distraction that prevents them from doing so; it needs to get out of the way and the quickest way to dismiss it is clicking that blue button. Below is a list of some comments criticising this design.

Another issue that was brought up by a couple of individuals was the lack of a privacy policy.10 The consent dialogue links to one, but, at the time of writing, one does not exist at the provided URL. I have archived the state of the page in case that changes in the future.

Opinions on the backend

  if (!inputFile.Write (wxString::FromUTF8 (ClientID + "\n")))
    return false;

The issue many individuals take with this snippet is saving the ClientID. Say an individual has an odd file that causes Audacity to crash any time they try to open it. Say they attempt to open it a hundred times. Without giving the client a unique ID, it could look like there are 100 people having an issue opening a file instead of just the one. However, by virtue of each installation having an entirely unique ID, this telemetry is not anonymous. Anonymity would be sending statistics in such a way that connecting those failed attempts to a single user would be impossible. At best, this implementation is pseudonymous: the client is given a random ID rather than being asked to sign in with an account, but every event it sends can still be tied back to that one ID.
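The crash scenario can be sketched in C++ to show what a persistent ID buys the collector and what it costs the user. This is an illustration only, not code from the PR: the `CrashEvent` struct, its fields, the sample ID `id-1234`, and the file name are all hypothetical, and the real payload format is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical crash report; field names are assumptions for illustration.
struct CrashEvent {
    std::string clientId;  // random per-installation ID
    std::string fileName;  // file that triggered the crash
};

// Without an ID, each report is indistinguishable from a separate user.
std::size_t countReports(const std::vector<CrashEvent>& events) {
    return events.size();
}

// With a persistent ID, reports can be grouped per installation.
std::size_t countAffectedInstallations(const std::vector<CrashEvent>& events) {
    std::set<std::string> ids;
    for (const auto& event : events) {
        ids.insert(event.clientId);
    }
    return ids.size();
}

// One installation crashing 100 times on the same file.
void demoCrashCounting() {
    std::vector<CrashEvent> events(100, CrashEvent{"id-1234", "odd-file.wav"});
    assert(countReports(events) == 100);
    assert(countAffectedInstallations(events) == 1);
}
```

The same grouping that collapses 100 reports into one installation is what links every event from that installation together, which is why the data is pseudonymous rather than anonymous.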

  def_vars()

    set( CURL_DIR "${_INTDIR}/libcurl" )
    set( CURL_TAG "curl-7_76_0")

Timothe Litt's comment gives a good description of why "vendoring in" libcurl is a bad idea11 and Tyler True's comment gives a good overview of the pros and cons of doing so.12 Many people take issue with this specifically because it's libcurl. Security flaws in it are very common and Audacity's copy would need to be manually kept up to date with every upstream release to ensure none of its vulnerabilities can be leveraged to compromise users. If the Audacity team was going to stay on top of all of the security fixes, they would need to release a new version every week or so.

  ExternalProject_Add(curl
     PREFIX "${CURL_DIR}"
     INSTALL_DIR "${CURL_DIR}"
     GIT_REPOSITORY https://github.com/curl/curl
     GIT_TAG ${CURL_TAG}
     GIT_SHALLOW Yes
     CMAKE_CACHE_ARGS ${CURL_CMAKE_ARGS}
  )

The problem with downloading curl at build-time is that it's simply disallowed for many Linux- and BSD-based operating systems. When a distribution builds an application from source, its build dependencies are often downloaded ahead of time and, as a security measure, the build machine is cut off from the internet to prevent any interference. Because downloading during the build is disallowed, the build will fail and the application won't be available on those operating systems.

Note, however, that these build machines would have the option to disable telemetry at build-time. This means the machine wouldn't attempt to download curl from GitHub and the build would succeed but, again, telemetry would be disabled for anyone not on Windows or macOS. This defeats the whole purpose of adding telemetry in the first place.

  S.Id (wxID_NO).AddButton (rejectButtonTitle);
  S.Id (wxID_YES).AddButton (acceptButtonTitle)->SetDefault ();

There was a lot of feedback about the decision to highlight the consent button but that was mentioned up in the frontend section; I won't rehash it here.

Broader and particularly well-structured comments

These are simply some comments I feel deserve particular attention.

From SndChaser...

The Audacity team's response


The privacy policy modification

https://github.com/audacity/audacity/issues/1213#issuecomment-875274890


  1. Link to the comment and link to the screenshot ↩︎

  2. Note that because I am not a C programmer, these reviews might not be entirely accurate and I wouldn't be able to catch the reviewer's error. I am relying on other community members to catch issues and comment on them; none of the reviews I link to have such comments so I'm assuming they are correct. ↩︎

  3. Link to the review and link to the screenshot ↩︎

  4. Link to the review and link to the screenshot ↩︎

  5. Link to the review and link to the screenshot ↩︎

  6. Link to the review and link to the screenshot ↩︎

  7. Link to the comment and link to the screenshot ↩︎

  8. Link to the comment and the screenshot is the same as previous ↩︎

  9. Link to the comment and link to the screenshot ↩︎

  10. Link to the comment and link to the screenshot ↩︎

  11. Link to the comment and link to the screenshot ↩︎

  12. Link to the comment and link to the screenshot ↩︎