Access to external data can provide a competitive advantage. Our research shows that more than three-quarters (77%) of participants consider external data to be an important part of their machine learning (ML) efforts. The most important external data source identified is social media, followed by demographic data from data brokers. Organizations also identified government data, market data, environmental data and location data as important external data sources. External data is not just part of ML analyses though. Our research shows that external data sources are also a routine part of data preparation processes, with 80% of organizations incorporating one or more external data sources. And a similar proportion of participants in our research (84%) include external data in their data lakes.
External data enriches the customer data an organization collects, enabling better segmentation of the customer base and, ultimately, a better customer experience. External data also enriches artificial intelligence and machine learning (AI/ML), enhancing feature engineering for better accuracy in the models. Driver-based planning can be improved with external data, such as economic data, that may impact operations and capital costs. Organizations can also use external data to benchmark their performance relative to their peers and competitors. External data provides the basis for market share analysis and helps determine the size of the addressable market.
Despite the value it provides, acquiring relevant external data presents several challenges. It can be expensive and hard to access: hundreds of data providers, data aggregators and data marketplaces make finding and acquiring relevant data overwhelming. The data can be tedious to use, as manipulating, transforming and matching it to internal data takes significant time and effort. Finally, the data can be out of date and can pose compliance and governance risks.
Organizations can attempt to acquire and manage the external data they need, or they can work with an external data provider. An external data platform provider manages the information, keeps it current and makes it available as necessary. Ideally, the platform would also assist with analyses by automatically discovering features of the data and their potential impacts on ML models. The platform should help match and validate multiple external data sources with each other and with internal data sources. Where data needs to be loaded into internal systems, the data vendor can provide various ways to export data or analyses, such as ML models based on the data.
Organizations that are not using external data should strongly consider how this data could enhance their financial, operational and analytical processes. As organizations utilize external data, they should address the issues associated with purchasing, storing, accessing and maintaining this data. Each organization will need to address these issues with approaches appropriate for its situation, but should ensure its strategy encourages rather than discourages the use of external data.
Regards,
David Menninger