Wikimedia Technology/Annual Plans/FY2019/CDP1: Privacy, Security, and Data Management/CDP Budget Segment 3/Goals

Program Goals and Status for FY18/19

  • Goal Owner: NRuiz (WMF)
  • Program Goals for FY18/19: Develop, maintain and mature our privacy, security, and data management practices in order to protect Wikimedia community member and donor information, comply with applicable privacy and data protection regulations, and ensure safe and secure connection to Wikimedia projects and sites in accordance with the values of the movement.
  • Annual Plan: Segment 3 - Analytics

Outcome / Output


Ensure the high-quality protection and security of our infrastructure and data.

Make systems compliant with security best practices, as vetted and recommended by Security.

Dependencies on: Security team


  • Implement more restrictive firewall rules for Kafka. task T204957
  • Review the requirements for a service implementing a stronger user authentication scheme for the Analytics Hadoop cluster and possibly for other related tools (such as ZooKeeper).   Done
  • STRETCH GOAL: Implement a prototype in Labs that the Analytics team can test and evaluate. task T198227   Done



  Note: October 19, 2018

We are working with SRE to evaluate the requirements for Kerberos.

  Note: November 14, 2018

Testing Kerberos in the Labs cluster is now   In progress

  Note: December 6, 2018

We have a prototype running in Labs that allows us to test Kerberos. We still need to decide which use cases to address first when moving this work to production, but the goals for this quarter are   Done.

Outcome / Output (Analytics)


Ensure the high-quality protection and security of our infrastructure and data.

Make systems compliant with security best practices, as vetted and recommended by Security.

Dependencies on: Security team, SRE


  • Set up an Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current Analytics Hadoop cluster task T212256   Done
  • Set up a Kerberos KDC service in production with minimal puppet automation task T212257   Partially done
  • Run critical Analytics Hadoop jobs and make sure that they work with the new auth settings task T212259
  • Create test Kerberos identities/accounts for some selected users from Analytics task T212258



  Note: February 14, 2019

  • We have set up a shadow test cluster to which we are adding Kerberos; we are on track to be able to test critical jobs.

  Note: March 14, 2019

These two work items are   Postponed until next quarter:
  • Run critical Analytics Hadoop jobs and make sure that they work with the new auth settings task T212259
  • Create test Kerberos identities/accounts for some selected users from Analytics task T212258

Outcome / Output (Analytics)


Ensure the high-quality protection and security of our infrastructure and data.

Make systems compliant with security best practices, as vetted and recommended by Security.

Dependencies on: Security team, SRE


  • Run critical Analytics Hadoop jobs and make sure that they work with the new auth settings task T212259
  • Create test Kerberos identities/accounts for some selected users from Analytics task T212258



May 2019

We delayed this work because of 1) Superset upgrades that took much longer than planned and 2) limited SRE availability to troubleshoot issues with the current Kerberos setup. Still, we hope to be mostly done by the end of the quarter (EOQ).

  To do June 2019
