Data Science in Investing

For this post, I want to highlight a relatively recent phenomenon: the heavy use of data & analytics in the investment management process, specifically in private markets (e.g., private equity). Various funds are applying advanced data & analytics, machine learning (ML) and artificial intelligence (AI) to the three primary work streams of investing: (i) opportunity sourcing, (ii) company diligence and (iii) portfolio company value creation. For example, Blackstone has a team of approximately twenty data engineers / scientists working on, among other things, operationalizing portfolio company data. Vista Equity Partners took an alternative route in 2018, investing ~$100 million to acquire 7Park Data, a third-party alternative data (“alt data”) provider. While I could discuss the approach of either firm in detail, I believe it will be more helpful to discuss how private equity managers, broadly, are approaching the data conundrum.

Value Capture:

Although fund managers are contemplating using data both to enhance the existing process / product and to generate a new product (i.e., operationalizing portfolio company data to sell as data sets to others), the former is much more prevalent today. Irrespective of the use case, however, similar processes are used to capture, clean and implement data for advanced analytics, machine learning and (sparingly) artificial intelligence.

Interestingly, many of the algorithms needed for machine learning applications are available for free via open source projects. The challenge at hand therefore comes down to data sourcing, data engineering (e.g., ETL, or extract-transform-load) and iterative modelling. On the data sourcing side, the magic is truly in the combination of data sets (i.e., appending third-party data to first-party data) rather than in leveraging proprietary data alone. Accordingly, funds have partnered with third-party data brokers such as Eagle Alpha to obtain valuable data sets. In terms of data engineering, the market has yet to produce a “data cleansing in-a-box” tool, so this process remains extremely burdensome (time consuming and difficult). To tackle this, funds have typically followed one of two approaches, discussed below.

Model #1 – Hire data & analytics consulting firm for discrete use cases / questions:

Whether via a traditional consulting firm (e.g., McKinsey, Deloitte) that has added data & analytics capabilities or a niche firm that exclusively offers these services, funds have retained consultants to help answer discrete questions using data, ML, etc. These consultants typically leverage some form of machine learning to speed up the data ingestion / cleaning process and ultimately deliver easy-to-understand outputs / dashboards to the client, often in the due diligence phase of an investment process, where there is “budget” to adopt such expensive tools. Because the consultant can amortize the cost of building these ML tools over several client engagements, it should be able to bring down the cost of services over time.

Model #2 – Build internal team and proprietary capabilities / platform:

Another path that certain funds, such as EQT Partners or Blackstone, have taken is to build internal data science practices. While the end state can be extremely attractive (a proprietary data platform and easily accessible tools), the investment and effort that go into such an endeavor are substantial (and cost-prohibitive in most fund environments).


There are several challenges that must be appreciated when incorporating data science into investing. First, there is a persistent problem of getting companies to a level of unified data; data is so scattered that it is nearly impossible to join data sets together and create a unified view (i.e., analytics can’t be done at first). As a result, 80% of engineering time is spent on data ingestion / cleansing rather than on predictions. Second, as discussed previously, the price tags for these tools are high and must be constantly evaluated from an ROI (return on investment) perspective. Lastly, an often forgotten element is how difficult it can be for data scientists to communicate effectively with investment team members; cultural change is required for any of this to be effective.
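
To make the "unified view" problem a bit more concrete, here is a minimal Python sketch of joining a first-party data set to a third-party one when the only shared key is a messy company name. Everything in it is an assumption for illustration: the records are made up, the field names (`company`, `revenue_musd`, `web_visits_k`) are hypothetical, and the suffix list is far shorter than anything a real fund would need. It is not how any of the firms mentioned above actually do this.

```python
import re

# Hypothetical legal suffixes to strip; a real list would be much longer.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation and trailing legal suffixes to build a join key."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def join_datasets(first_party, third_party, name_field="company"):
    """Left-join first-party rows to third-party rows on the normalized name.

    Returns (joined_rows, unmatched_names) so the unmatched share stays visible.
    """
    lookup = {normalize_name(row[name_field]): row for row in third_party}
    joined, unmatched = [], []
    for row in first_party:
        match = lookup.get(normalize_name(row[name_field]))
        if match is None:
            unmatched.append(row[name_field])
            continue
        # Keep the first-party spelling of the name; append the other fields.
        extra = {k: v for k, v in match.items() if k != name_field}
        joined.append({**row, **extra})
    return joined, unmatched


# Illustrative, made-up records (not real company data).
first_party = [
    {"company": "Acme Widgets, Inc.", "revenue_musd": 12.5},
    {"company": "Blue Harbor LLC", "revenue_musd": 40.0},
    {"company": "Northwind Traders", "revenue_musd": 7.2},
]
third_party = [
    {"company": "ACME WIDGETS INC", "web_visits_k": 310},
    {"company": "blue harbor", "web_visits_k": 95},
]

joined, unmatched = join_datasets(first_party, third_party)
print(f"joined {len(joined)} rows; unmatched: {unmatched}")
```

Even in this toy case, most of the code is cleaning and matching rather than analysis, and the unmatched list is what an engineer would have to chase down by hand before any modelling could start.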



Student comments on Data Science in Investing

  1. Thanks for putting thought into this side of investing, Jibran. I think data science is inevitably going to grow as a part of investing but I’m excited to see how firms use it as a competitive advantage. As you mention, it is very costly to do this in-house. However, I feel that in-house is the main way that firms can use the analysis to their advantage. In an industry like investments, having an edge can go a long way. It makes me think that further consolidation in the industry may be next because only the larger firms can afford to build out their internal data science systems.
