Book publishing is an increasingly competitive industry, with the big players vying for a share of a mature market. The Covid-19 pandemic accelerated the shift toward digital distribution and consumption of books, which has put downward pressure on publisher margins.
Penguin Random House (PRH) is responding to these market trends by building a competitive edge through data and technology. The use of big data in publishing is not as developed as in other industries, but given the digitization trend in the industry, more data is becoming available for big players to capitalize on.
The first step for PRH was to upgrade its data infrastructure and break down the analytic silos created by legacy systems, which had prevented widespread use of data-driven insights across the organization. This required heavy investment in data storage, analytics, and visualization systems that are accessible to those with limited data literacy while still being useful and flexible enough for power users. The upgrade has improved the quality of the insights used to inform business decisions and reduced the time required to unearth them. PRH estimates time savings of 5,000 work hours per year from the initial infrastructure upgrades (Source).
Once the foundation was in place, PRH could start collecting and using the data in more innovative ways to create value for readers. Data comes from sales (online and brick-and-mortar), PRH website browsing data (e.g. what books readers are looking at online), other websites (e.g. best seller lists), and social media impressions.
- Understanding what readers want: The wealth of structured and unstructured data enables PRH to understand demand at a granular level. For example, the metadata shown on a book listing allows PRH to drill down to which keywords affect a book’s popularity. This helps them both in discovering new books and in marketing their existing ones.
- Ensuring online visibility: Metadata analysis also comes into play when working to ensure that PRH books are visible to potential buyers. PRH can optimize for keywords so that its authors appear higher in search results on Amazon and other e-tailers (Source).
- Attracting website traffic: PRH mines data across the web to compile a list of the top 5 Penguin Random House titles called “Today’s Top Books” (see below for example). The list is used in combination with search engine optimization and daily newsletters to help drive traffic to their website. It also showcases their innovative data practices, which enhance the brand image (Source).
These initiatives all create value because they improve the offering available and streamline the content decision process for readers. At the same time, they help PRH capture value through higher sales.
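The keyword analysis described in the bullets above can be illustrated with a rough sketch. This is not PRH's actual method, which the sources do not describe in detail. It assumes a hypothetical set of listings, each with metadata keywords and unit sales, and computes a simple "lift" for each keyword: the average sales of titles carrying that keyword divided by the average across all titles. The data, the function names, and the `min_titles` threshold are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical listing records: (title, metadata keywords, units sold).
# All values are invented for illustration; they are not PRH data.
LISTINGS = [
    ("Title A", {"thriller", "domestic", "twist"}, 1200),
    ("Title B", {"thriller", "legal"}, 800),
    ("Title C", {"romance", "historical"}, 500),
    ("Title D", {"romance", "twist"}, 900),
    ("Title E", {"historical", "legal"}, 300),
]


def keyword_lift(listings, min_titles=2):
    """Map each keyword to (average units for titles with it) / (overall average).

    Keywords that appear on fewer than `min_titles` listings are dropped,
    since a single title says little about the keyword itself.
    """
    overall = mean(units for _, _, units in listings)
    units_by_keyword = defaultdict(list)
    for _, keywords, units in listings:
        for keyword in keywords:
            units_by_keyword[keyword].append(units)
    return {
        keyword: mean(units) / overall
        for keyword, units in units_by_keyword.items()
        if len(units) >= min_titles
    }


def top_keywords(listings, n=3):
    """Return the n keywords with the highest lift (ties broken alphabetically)."""
    lift = keyword_lift(listings)
    return sorted(lift, key=lambda keyword: (-lift[keyword], keyword))[:n]


if __name__ == "__main__":
    for keyword in top_keywords(LISTINGS):
        print(keyword, round(keyword_lift(LISTINGS)[keyword], 2))
```

A lift above 1 only indicates that titles with the keyword sold better than average in this sample. It does not show that the keyword caused the sales, so in practice a result like this would be a starting point for testing rather than a conclusion.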
When it comes to sourcing data, the best-quality data comes from first-party sources like PRH’s website, where it has control over what types of data it can collect (reader profile, book listing clicks, etc.). However, the largest quantity of data comes from its Amazon sales. This means that PRH can generate better data insights by increasing traffic to its own website. In an attempt to collect more first-party data, PRH has waddled onto the rewards program trend with Penguin Rewards. The program helps to drive sales, but it also enables PRH to collect data about loyal readers (Source).
Despite these measures, competing with Amazon will continue to be a key challenge, as the behemoth has top-tier data analytics abilities and a vast store of data. Attracting tech-skilled talent will be difficult due to the traditional image of the industry. PRH’s parent company, Bertelsmann, is working to mitigate this risk with data science leadership development programs at the holding company level. Additional challenges may arise from the longstanding ways of working in publishing. It may be difficult to instigate structural change in areas such as how books are selected, how the editing process is managed, and how many books are released and at what rate.
There are many opportunities to expand the use of big data in book publishing, and PRH has barely scratched the surface. Data can be used to improve prospect discovery, content personalization for readers, the supply chain, and marketing and sales. Perhaps there is an opportunity to develop a book e-tailer to compete with Amazon, built as a collaboration among the top 5 publishing houses, so that they could offer readers the widest array of books while maximizing the quality and quantity of the data they collect. A united front of the top book publishers could be the only way to face the threat of Amazon.