In software, we’ve moved from a world where a customer buys a piece of software to run on their own infrastructure, to a world where a customer pays a vendor to run software on the vendor’s infrastructure. With machine learning, we may see another evolution of this. Machine learning startups create models based on data provided by customers. Should customers be compensated for their contribution?
Unlike the first wave of SaaS companies, machine learning startups benefit from the data their customers share with them. Often, a machine learning startup trains one global model that is used across its customer base. Each marginal customer provides additional data that refines the model.
Today, I could argue customers contribute their data altruistically. Another might argue the customer pays for the right to contribute their data and benefit from the software startup’s models. A third might argue that the customer should actually benefit economically from this contribution. Which is the right viewpoint? Which is the viewpoint that will ultimately govern the customer/vendor relationships in SaaS?
Online advertising serves as a potential model for how the machine learning software as a service ecosystem might evolve. In the 2000s, Google and others built global machine learning models for ad targeting across websites. Some customers balked at the idea of contributing their data to benefit their competitors. But there wasn’t a simple way to isolate individual customer data and its benefit to the global targeting. So you either ran Google ads or you didn’t.
As the ecosystem evolved, website publishers sought to monetize the data they were providing. This created an opportunity for companies like BlueKai to build Data Management Platforms, which aggregated publishers' first-party data and allowed them to sell it on a marketplace. That initiative met with some success.
However, many of these publishers soon realized that they benefited more from leveraging their own data directly to improve their own monetization efforts. Rather than selling their data to third parties, they focused on building competitive advantage through their own data.
To summarize, publishers still contribute their data to global advertising networks to improve performance for everyone. But they have doubled down on extracting unique value for themselves from their first-party data.
I believe that the SaaS ecosystem will evolve similarly. Startups and incumbents alike focus on a few core areas of competitive differentiation. In those areas, these businesses will focus on first-party data, limit data sharing with competitors, and likely build models in-house to maximize their advantage.
For non-strategic areas, these businesses will pay to use ML-based SaaS. And they will be very unlikely to demand a royalty for data usage rights, because the marginal revenue from such an arrangement simply isn’t meaningful for many of them. A royalty would also challenge the business model of the SaaS startup.
In other words, the revenue from data royalties is worth less than the value of an intelligent human resources information system or customer relationship management system or customer support system. The universe of buyers would rather see a successful machine learning SaaS company than generate $15-100k annually in data-royalty revenue.