Emerald Publishing is one of the world’s leading digital-first publishers, known for commissioning, curating, and showcasing research that drives societal progress.
By collaborating with thousands of universities and business schools globally, Emerald aims to share knowledge and spark debates that lead to positive change. With a strong commitment to people and innovation, Emerald supports a network of 500,000 researchers across 130 countries. Its website attracts more than 109 million visitors a year and generates 30 million downloads annually.
The Challenge
Emerald's existing on-premises data warehouse struggled to keep up with the growing analytical demands of new propositions. These included in-depth analysis of article submission trends and of author and institution data, as well as the integration of AI into its solutions. Data was spread across many sources, with varying formats and inconsistent quality, which limited insight and held back growth and expansion. Recognising the need for a more robust, future-proof solution, Emerald planned to move to a cloud-based data platform to support sustainability, innovation and growth.
The Solution
After a competitive pitch against several other data consultancies, Oakland was selected for the quality of our service and our deep understanding of Emerald's goals. Oakland's infrastructure and expertise were well aligned with Emerald's vision for a future-ready data analytics platform.
The Goal
Oakland was tasked with building a cloud-based data analytics platform designed to propel Emerald into the future. This platform needed to provide comprehensive data governance and quality functionality, reconciling conflicting data from diverse sources. Key requirements included:
- Integration of both batch and streaming data from disparate source systems.
- Secure and robust departmental self-service reporting capabilities, alongside support for data science and future machine learning initiatives.
- Financial viability and supportability by a dedicated internal data squad.
- Significant reduction in manual effort for reporting, integration, and maintenance.
- Compliance with data privacy laws, particularly GDPR.
This project was not solely about technology. It also involved identifying the people, skills, workflows and processes needed to sustain and enhance the new platform. By addressing these needs, Oakland aimed to help Emerald Publishing make full use of its data and move the business forward in an increasingly data-driven publishing landscape.
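One common way to meet the requirement to reconcile conflicting data from diverse sources is a "survivorship" rule. For each field, the platform keeps the value from the most trusted source that actually supplies one. The sketch below illustrates that idea under stated assumptions only. The source names (`submissions`, `crm`, `legacy_dw`), their priority order and the record fields are all hypothetical, and this case study does not describe Emerald's actual reconciliation rules.

```python
from typing import Any

# Hypothetical priority order: earlier sources win when values conflict.
SOURCE_PRIORITY = ["submissions", "crm", "legacy_dw"]


def reconcile(records: list[dict[str, Any]], key: str = "author_id") -> dict[Any, dict[str, Any]]:
    """Merge records sharing `key`, preferring higher-priority sources field by field."""
    rank = {source: i for i, source in enumerate(SOURCE_PRIORITY)}
    # Apply lowest-priority records first (unknown sources rank lowest) so that
    # higher-priority sources overwrite their values.
    ordered = sorted(records, key=lambda r: rank.get(r["source"], len(rank)), reverse=True)
    golden: dict[Any, dict[str, Any]] = {}
    for record in ordered:
        merged = golden.setdefault(record[key], {key: record[key]})
        for field, value in record.items():
            if field in (key, "source"):
                continue
            # Empty values never overwrite data supplied by another source.
            if value not in (None, ""):
                merged[field] = value
    return golden


records = [
    {"source": "legacy_dw", "author_id": 1, "name": "J. Smith", "institution": "Leeds"},
    {"source": "crm", "author_id": 1, "name": "Jane Smith", "institution": ""},
    {"source": "submissions", "author_id": 1, "name": None, "email": "j.smith@example.org"},
]
print(reconcile(records))
# {1: {'author_id': 1, 'name': 'Jane Smith', 'institution': 'Leeds', 'email': 'j.smith@example.org'}}
```

Field-level survivorship keeps the best-known value for every attribute, rather than discarding a whole record when one source is incomplete. In a Spark-based platform, the same rule would more typically be expressed as a window or aggregation over the combined sources.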
Tailoring the solution
At Oakland, each data platform is designed around an organisation’s unique business challenges and technology landscape. By replicating Emerald’s existing on-premises data warehouse and then scaling capabilities, we set about architecting a solution that could be launched quickly but also be built upon to deliver future data and AI capabilities.
The analytics platform was built using the ‘Oakland Modular Platform’. These modular templates allowed us to quickly customise the platform to fit the requirements while dramatically reducing build time from months to weeks. The templates also ensured best practices gained from years of experience in building data platforms were followed, reducing the risk of missing estimated delivery dates.
The Technical Components

Delivering a Microsoft Azure Platform
When Emerald Publishing sought to modernise its data infrastructure, Oakland was tasked with designing a cloud-based data analytics platform that would not only meet immediate needs but also future-proof the organisation in an evolving digital landscape. The technology choices we made were critical to ensuring the platform’s success and scalability.
Microsoft Azure
Microsoft Azure offers a range of benefits for building a data platform, making it a popular choice for organisations looking to modernise their data infrastructure.
Azure offered:
- Scalability: Azure allows you to scale resources up or down based on your needs, ensuring cost-efficiency and performance optimisation.
- Comprehensive Services: Azure provides a wide range of integrated services including databases, analytics, and machine learning, facilitating end-to-end data management and analysis.
- Security: Robust security features, including advanced threat protection and encryption, help keep your data secure and support compliance with regulations.
- Integration: Seamless integration with existing Microsoft products and other third-party services enhances productivity and collaboration.
- AI and Machine Learning: Built-in AI and machine learning capabilities allow you to derive insights and drive innovation from your data.
Microsoft Azure Synapse
Given the size and nature of Emerald's data, which lent itself to a data warehouse approach, Azure Synapse Analytics was a natural fit. Synapse, whose dedicated SQL pools evolved from Azure SQL Data Warehouse, provided the robust environment needed to handle complex queries and large-scale data processing.
Emerald's existing infrastructure was built on SQL Server, and its team was already familiar with that environment. Synapse therefore allowed a seamless transition while offering enhanced capabilities for data integration, analysis and reporting.
Apache Spark
Spark is known for its ability to handle large datasets and perform complex analytics at scale, making it a powerful addition to the data platform.
Emerald's long-term goals included integrating machine learning capabilities. PySpark, Spark's Python API, made Spark a natural fit for these advanced data science initiatives.
With Microsoft Fabric on the horizon, incorporating Spark pools into the architecture ensured that Emerald’s platform would be adaptable to future developments in the Microsoft ecosystem.
Balancing Familiarity and Innovation
One of the critical decisions was balancing new technology with Emerald's existing expertise. Moving to Spark meant introducing a new language and API, PySpark, alongside the SQL the team already knew. We made this transition smooth through extensive knowledge transfer and collaboration.
We conducted numerous training sessions and hands-on workshops, enabling Emerald’s team to become proficient with PySpark and effectively leverage the new capabilities.
Phased Approach
The project was executed in two key phases:
- Phase 1, Infrastructure Setup: Using Terraform, we defined the core infrastructure as code, ensuring it was scalable, secure and aligned with Emerald's needs. This phase also covered onboarding the initial data sources and implementing repeatable data quality reporting.
- Phase 2, Enhancing Capabilities: With additional budget allocated, we onboarded further data sources and focused heavily on machine learning. This included moving Emerald's ML processes from legacy systems to a more modern, streamlined environment within Azure.
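Repeatable data quality reporting of the kind built in Phase 1 is often organised as a set of named rules. Each rule is evaluated against every dataset and summarised as a pass rate with a threshold. The sketch below is a hypothetical, dependency-free illustration of that pattern. The rule names, columns and thresholds are assumptions rather than Emerald's actual checks. On a Synapse/Spark platform, the same rules would more likely run as PySpark jobs over full tables.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class Rule:
    name: str
    check: Callable[[Row], bool]  # returns True when a row passes the rule
    threshold: float = 1.0        # minimum pass rate for the rule to pass overall


def quality_report(rows: list[Row], rules: list[Rule]) -> list[dict[str, Any]]:
    """Evaluate every rule over every row and summarise the pass rate per rule."""
    report = []
    for rule in rules:
        passed = sum(1 for row in rows if rule.check(row))
        rate = passed / len(rows) if rows else 1.0
        report.append({
            "rule": rule.name,
            "pass_rate": round(rate, 3),
            "status": "PASS" if rate >= rule.threshold else "FAIL",
        })
    return report


# Hypothetical rules for an article-submissions dataset.
rules = [
    Rule("submission_id is present", lambda r: r.get("submission_id") is not None),
    Rule("journal is populated", lambda r: bool(r.get("journal")), threshold=0.6),
    Rule("submitted_year is plausible",
         lambda r: isinstance(r.get("submitted_year"), int) and 1990 <= r["submitted_year"] <= 2030),
]

rows = [
    {"submission_id": 1, "journal": "J1", "submitted_year": 2023},
    {"submission_id": 2, "journal": "", "submitted_year": 2024},
    {"submission_id": None, "journal": "J1", "submitted_year": 1890},
]

for line in quality_report(rows, rules):
    print(line)
```

Keeping each rule as data (a name, a predicate and a threshold) is what makes the report repeatable. New datasets reuse the same runner, and the results can be stored over time to track whether quality is improving.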
The combination of Azure Synapse and Spark allowed us to build a platform that not only meets Emerald’s current needs, but also positions them for future growth. This strategic choice of technology ensures that Emerald can continue to innovate in the rapidly changing publishing landscape, with a robust data platform that supports everything from basic analytics to advanced machine learning.
Outcomes
The new cloud-based data platform has equipped Emerald Publishing to meet current needs and set the stage for future growth and innovation. Here’s what it has enabled:
- Replication of Existing Capabilities: The transition from Emerald’s outdated on-premises data warehouse to a modern platform ensures continuity in reporting and analytics, with improved stability and scalability for seamless operations.
- Enhanced Regulatory Compliance: With automated data retention and improved handling of Personally Identifiable Information (PII), Emerald now meets stringent privacy regulations like GDPR, reducing risk and building stakeholder trust.
- Data Quality Assurance: A new toolset for monitoring data quality allows Emerald to identify and resolve issues early, supporting reliable data and innovation without disruptions.
- Foundation for Future Innovation: The new platform enables Emerald to explore AI and machine learning solutions that improve workflows and communication, something that was unattainable with the old system.
- Strategic Flexibility and Scalability: Built to adapt to technological trends, the platform can scale and integrate new tools, keeping Emerald competitive in academic publishing.
- Improved Efficiency and Discovery: Better-structured data is easier to find and use, opening up insights and strategies that drive the business forward.
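Automated data retention and PII handling, as described under Enhanced Regulatory Compliance, can be implemented in many ways. One possible pattern is sketched below under stated assumptions. Records whose last activity falls outside a retention window are dropped. In the remaining records, direct identifiers are replaced with a keyed hash (HMAC-SHA256), which still allows records to be joined on the hashed value. The field names, the two-year window and the key handling are hypothetical. Keyed hashing is pseudonymisation rather than anonymisation, so the output is still personal data under GDPR.

```python
import hashlib
import hmac
from datetime import date, timedelta
from typing import Any

PII_FIELDS = ("email", "full_name")   # hypothetical direct identifiers
RETENTION = timedelta(days=2 * 365)   # hypothetical two-year retention window


def pseudonymise(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash; the same input always maps to the same output."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def apply_privacy_rules(rows: list[dict[str, Any]], key: bytes, today: date) -> list[dict[str, Any]]:
    """Drop rows outside the retention window and pseudonymise PII in the rest."""
    cutoff = today - RETENTION
    kept = []
    for row in rows:
        if row["last_activity"] < cutoff:
            continue  # past retention: not carried forward
        clean = dict(row)  # copy so the input rows are left untouched
        for field in PII_FIELDS:
            if clean.get(field) is not None:
                clean[field] = pseudonymise(str(clean[field]), key)
        kept.append(clean)
    return kept


# Illustrative key only; a real key would come from a secret store such as Azure Key Vault.
KEY = b"example-key-do-not-use"
rows = [
    {"author_id": 1, "email": "a@example.org", "full_name": "Ann Lee", "last_activity": date(2024, 5, 1)},
    {"author_id": 2, "email": "b@example.org", "full_name": "Bo Chan", "last_activity": date(2019, 1, 1)},
]
result = apply_privacy_rules(rows, KEY, today=date(2025, 1, 1))
print(len(result), result[0]["author_id"])
```

A keyed hash, rather than a plain hash, matters here. Common identifiers such as email addresses can be guessed and hashed by anyone, but they cannot be re-derived without the key.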
In summary, this data platform isn’t just an upgrade; it’s a strategic foundation that secures Emerald’s present operations whilst unlocking potential for future growth and innovation in an evolving industry.
“Right from the start, the relationship with Oakland has been a strong one. There was a close cultural match between our two organisations, and it genuinely felt like we were working as one team. This kind of open and genuine working relationship isn’t always the norm, but with Oakland, it certainly has been.
The level of collaboration and adaptability throughout the project has been exceptional, making the entire experience incredibly positive. I couldn’t be more pleased with how we’ve worked together.”
– Daniel Molesworth, Emerald Publishing
If you want to discuss any elements of this case study or find out how Oakland could help you, please contact us to arrange a call.