Shepherdly actively monitors popular open-source repositories for research purposes. Below we’ll walk through what information Shepherdly could have provided the maintainers at the time and, more importantly, how surface-level statistics break down traditional engineering intuition.
First, the following analysis is in no way a judgment on the quality of work by these maintainers. We selected this example to raise awareness of how objective risk measurement can help software engineers decide which changes are more or less deserving of their time.
Small PRs are safer and should go faster, right?
The pull request we’ll be analyzing today is https://github.com/redis/redis/pull/11766. At first glance, it’s a tightly scoped change: ~100 lines across 7 files, including tests.
Many engineers and engineering metrics would treat this as a “small” change and therefore lower risk by default. Smaller changes also carry a compounding bias that they should be merged quickly. This change followed that trajectory, with a total cycle time of about one day.
This change introduced CVE-2023-41056 and was later fixed here.
Shepherdly's Take
Our model classified #11766 as high risk with a score of 76/100. The predictors in this case were the size of the change and the number of commits from reviewers.
Since the vast majority of changes within teams score well below this level, it’s imperative to focus mitigation effort when the risk is this high.
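To make the idea concrete, here is a toy sketch of a risk score built from the two predictors named above: the size of the change and the reviewers’ prior commit count in the touched code. This is purely illustrative — Shepherdly’s actual model and weights are not public, and the coefficients below are invented for demonstration.

```python
import math

def toy_risk_score(lines_changed: int, reviewer_commits: int) -> int:
    """Return an illustrative 0-100 risk score from two features.

    Hypothetical weights: larger diffs raise risk; more reviewer
    familiarity with the touched code lowers it.
    """
    z = 0.02 * lines_changed - 0.15 * reviewer_commits - 1.0
    probability = 1 / (1 + math.exp(-z))  # logistic squash to (0, 1)
    return round(probability * 100)

# A ~100-line change reviewed by someone with little history in the
# touched files can still score in the high-risk band despite being "small".
print(toy_risk_score(lines_changed=100, reviewer_commits=2))
```

The point of the sketch is that size alone is a weak signal: the same diff size can land at very different scores depending on how familiar the reviewers are with the code being changed.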
Traditional Mitigation Intuition Needs Help
Much of the prevailing wisdom for mitigating risk in code changes rests on review and testing. Those should remain, of course, but software engineering as a profession lacks precision about where risk will manifest before changes ship. Consider this research paper, which demonstrated that developers were 8x more likely to find vulnerabilities when given a reason to look for them.
What this highlights is that engineers have enormous latent ability to find and mitigate these issues, but they lack a definitive signal to justify where to allocate their time.