Chapter 6: The Hargreaves Exceptions – A Case Study
As the literature review demonstrated, there were repeated calls for exceptions to copyright in order to improve adaptation to the digital shift. Recommended exceptions varied from better access for disabled people to education and teaching exceptions, and in the UK were possible based on provisions of the InfoSoc Directive.1 The implementation of those exceptions in the UK was based on research that has been criticised for its unreliability.2 This chapter will use two of the 2014 copyright exceptions, the private copying exception and the text and data mining (TDM) exception, as a case study for the implementation of copyright exceptions and the effects that followed from that implementation. The implementation of the private copying exception can be used as an example of the dangers of improper preliminary research. On the other hand, as the TDM exception was introduced very shortly after the technique became a valid research tool, it is an ideal case study for assessing how digital development can be dealt with in terms of copyright legislation. The assessment of a newly-implemented exception dealing with a recently-developed tool allows for a snapshot view of how digital development is dealt with, from the perspective of the individual author or researcher, the rights holders for databases and other large-scale text repositories, and the legislative arm of government.
This chapter will consider first the rationale for implementing an exception to copyright law, and then the specific way in which this applies to two exceptions in the UK. It discusses the implementation of the private copying exception and its subsequent judicial review, before then moving on to discuss the implementation of the TDM exception, as well as the effect that the exception has had in the UK. Finally, discussion will be expanded to similar calls for TDM exceptions across the European Union, including considering the available research associated with these proposals, before concluding.
Rationale for Exceptions
An exception to copyright law is a form of government intervention. Government intervention is justified where there is a ‘clear need which it is in the national interest for the government to address.’3 Examples of such government intervention in the copyright spectrum can be seen in the historical exceptions of fair dealing, which allowed for a copyright exception in the cases of
1 Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society [2001] OJ L167/10, Art 5 (InfoSoc Directive).
2 See section ‘Oxford Economics Report’ in Chapter 5.
3 Her Majesty’s Treasury, ‘The Green Book: Appraisal and Evaluation in Central Government’ (2011) 11.
research, study (non-commercial only), quotation, criticism, and review. Licensing on such a huge scale would have been unfeasible, and an exception allowed such activities to occur in a scenario where overly restrictive copyright laws would have prevented the free functioning of the market.
The definition of a clear need for intervention used for the purposes of this chapter will be taken from the UK Treasury Green Book (2003 update). The Green Book is a guide to best practice in project and programme appraisal for executive agencies and government central departments. It gives guidance on economic, financial, social and environmental assessments and draws on fundamental economic theory.4 The Green Book sets out two reasons for government intervention: market failure, and the need for clear government distributional objectives to be met. Market failure is not the only justification for an exception to copyright law: the Green Book itself gives a second reason, and overarching European legislation would also provide for exceptions, provided they met the Berne Three-Step Test.5 The scenario for these particular exceptions, however, was market failure. Market failure is a scenario where the market ‘has not and cannot of itself be expected to deliver an efficient outcome’,6 and thus intervention will aim to redress this failure.
An ‘efficient outcome’, in terms of economics, is an outcome where maximum economic efficiency is obtained. Economic efficiency, then, is defined as achieved when ‘nobody can be made better off without someone else being made worse off. Such efficiency enhances prosperity by ensuring that resources are allocated and used in the most productive manner possible.’7 Obviously, real life is not as simple as a theoretical example, and markets frequently cannot achieve such an outcome. In this scenario, they are said to fail. This can then negatively affect the market, through inefficient returns to society as a whole or to individuals or businesses involved. Such inefficiency can in turn affect motivations and behaviours in a way which will further negatively affect the market, leading to worse outcomes for society and for individuals. There may be many reasons for this failure.
The Green Book highlights the potential reasons for market failure:
4 JISC, ‘The Value and Benefit of Text Mining to UK Further and Higher Education Digital Infrastructure’ (2012) <> accessed 12 January 2016, 39.
5 The Berne Three-Step Test is discussed in more detail in Chapter 2.
6 Green Book (n 3) 11.
7 ibid 51.
» Due to the ‘public good’ characteristics of the goods or service under consideration
» Where there may be significant ‘externalities’ (positive or negative) involved
» Where there is imperfect information or information asymmetry between buyers and sellers
» As a result of market power or structure (eg a lack of competition, monopoly power or high entry costs deterring entrants)8
Thus, the implementation of a copyright exception should only be considered where the market fails to reach economic efficiency on its own – where there is no viable alternative to the government intervention, as the failure to intervene would lead to a total market failure. A total market failure is, of course, less desirable than government intervention, as intervention guarantees the availability of a good, service, or process, whereas total market failure would deny the possibility of its being made available entirely.
The market failure in the case of private copying is obvious. Private copying refers to the reproduction of legally held copyright material for the purposes of an individual’s private use for reasons which are neither directly nor indirectly commercial.9 This may include scenarios such as backup, remote storage, format-shifting, etc. For many consumers, the first step after purchasing a new CD is to rip it to their iTunes library or equivalent – the possibility of licensing this is remote at best. Thus, the need for an exception was clear – the market could not possibly license each private reproduction of legally held CDs onto iPods, phones, MP3 players or cloud storage systems.
The private copying exception,10 which came into force on October 1, 2014,11 was one of the recommendations made by the Hargreaves Review. It, together with the other Hargreaves exceptions, suffered a long journey to implementation, from the exception envisaged in the 2001 InfoSoc Directive,12 which the UK declined at the time to implement, through the Hargreaves
8 JISC (n 4).
9 The Copyright and Rights in Performances (Personal Copies for Private Use) Regulations 2014, SI 2014/2361 Reg 3(1).
12 InfoSoc Directive (n 1) Art 5.
Review,13 which recommended its revival. From this recommendation, it would be a further three years before the exception was tabled in the House of Commons. Even then it vanished without explanation mere days before it was due to be voted upon.14 It reappeared several months later, with an implementation date six months after what was originally envisaged. The Hargreaves Review recommended several exceptions, which were contained in five Statutory Instruments.15 Those exceptions were caricature, parody or pastiche, quotation, research and private study, text and data mining, education and teaching, archiving and preservation, public administration, private copying, and accessible formats for disabled people. Each exception was narrowly drawn, applicable only in certain circumstances, and to certain persons or bodies.16
The private copying exception created a right for consumers to copy and format-shift their own private music, films, ebooks, and other digital media and store them remotely, as long as it was for personal use, and only so long as the original or master copy was legitimately obtained. This meant that it could not be shared between family members or friends, nor was the exception a guard against torrented or otherwise illegitimately obtained creative content.17 The exception was implemented based on the powers granted by the InfoSoc Directive, which allowed for an optional private copying exception with fair compensation for rights holders.18 The UK government, in implementing a private copying exception, decided that no compensation was fair compensation, and enacted the exception without any kind of private copying levy, such as those seen in many parts of Europe.19 This decision not to implement a levy system was backed up by an Impact Assessment from the IPO.20
13 Ian Hargreaves, ‘Digital Opportunity: A Review of Intellectual Property and Growth’ (2011) (Hargreaves Review).
14 IPO, ‘Progress of the exceptions to copyright regulation’ (Gov.uk, 8 May 2014) <> accessed 12 January 2016.
15 The Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations 2014, SI 2014/1372; The Copyright and Rights in Performances (Disability) Regulations 2014, SI 2014/1384; The Copyright (Public Administration) Regulations 2014, SI 2014/1385; The Copyright and Rights in Performances (Quotation and Parody) Regulations 2014, SI 2014/2356; The Copyright and Rights in Performances (Personal Copies for Private Use) Regulations 2014, SI 2014/2361.
16 IPO, ‘Exceptions to Copyright: An Overview’ (2014) < o_copyright_-_An_Overview.pdf> accessed 23 November 2015.
17 Copyright and Rights in Performances (Personal Copies for Private Use) Regulations 2014 Reg 3.
18 InfoSoc Directive (n 1) Art 5(2)(b).
19 WIPO, ‘International Survey on Private Copying Law & Practice’ (2013) <> accessed 23 November 2015.
20 Department for Business, Innovation, and Skills, ‘Impact Assessment: Copyright Exception for Private Copying’ (2012) < RPC11-BIS-1055_3__-Copyright_exception_for_Private_Copying.pdf> accessed 18 December 2015.
The lack of any levy system was the reason for the exception’s being challenged shortly after implementation by the British Academy of Songwriters, Composers and Authors (BASCA), the Musicians’ Union,21 and industry representatives UK Music, who sought judicial review of the exception. In a judgment issued in June 2015, Green J concluded that the harm done to rights holders through the absence of a levy or compensation scheme was more than minimal.22 This was followed in July 2015 by a judgment quashing the exception entirely, albeit with prospective (ex nunc) effect. Mr Justice Green declined to comment on whether the quashing would have retrospective (ex tunc) effect. He further declined to make a reference to the CJEU.23 The quashing of the private copying exception was perhaps not unexpected, given that the UK’s regime made no provision for remuneration. It is, however, cause for the IPO to reconsider its pre-implementation research. Although the private copying exception was the subject of a preliminary report24 and an impact assessment,25 the quashing of the regulation, together with the suggestions made elsewhere that the harm it would cause to rights holders would be more than de minimis,26 should be taken as a cautionary tale by the IPO, and casts some doubt on the reliability of the research it relied upon. Thus, the IPO must be sure to consider carefully the implications of changes to copyright law before enacting them, and to rely on research which is verifiable and reproducible – following its own evidence policy.27
Definition of TDM
Text and data mining (TDM), which is also variously referred to as text mining, data mining, text data mining, or text analytics, is a process that analyses large amounts of text using a computer, so as to obtain insights which would not be possible with a single human researcher. The power and popularity of mining developed in tandem with that of computers and
21 BASCA <> accessed 18 December 2015.
22 BASCA v Secretary of State (Department for Business, Innovation and Skills) [2015] EWHC 1723 (Admin).
23 BASCA v Secretary of State (Department for Business, Innovation and Skills) [2015] EWHC 2041 (Admin).
24 Roberto Camerani and others, ‘Private Copying’ (2013) < private-150313.pdf> accessed 12 January 2016.
25 Department for BIS (n 20).
26 Karina Grisse and Stefan Koroch, ‘The British private copying exception’ (2015) 10(7) JIPLP 562.
27 IPO, ‘Guide to Evidence for Policy’ (2013) < copyright-evidence.pdf> accessed 18 December 2015.
processing power – in a 1999 paper the industry was described as ‘nascent’.28 The author pointed out that, at the time, TDM had ‘almost no practitioners’.29 In early 2015, opinions on TDM were that it ‘might transform the way scientists read […] literature’30 and ‘the potential that comes with mining scientific literature is enormous’.31
A strict definition of TDM is hard to pin down, because it is composed of a range of techniques, all of which are constantly evolving, changing, and developing. The underlying idea, however, is that it is the computer-based analysis of large amounts of data. The processing power of computers allows large amounts of text and data to be analysed and new inferences drawn from relationships which would not be apparent to individual human readers. It is available to almost anyone with the requisite level of skill, allowing them to assemble data in text, picture, sound, or other form, and analyse that data for new insights and knowledge. Various definitions of TDM are offered by each new publication; for the purposes of this chapter, we will stay in line with the proposal put forward by the Publishing Research Consortium (PRC) in their 2013 report on text mining:
Data mining is an analytical process that looks for trends and patterns in data sets that reveal new insights. These new insights are implicit, previously unknown and potentially useful pieces of information. The data, whether it is made up of words or numbers or both, is stored in relational databases. It may be helpful to think of this process as database mining or as some refer to it ‘knowledge discovery in databases’. Data mining is well established in fields such as astronomy and genetics.32
Thus, TDM is a range of techniques used in a multitude of ways, encompassing a host of subject areas and data types, all of which are aimed at increasing knowledge. The rise of TDM is compounded by the rise of Big Data33 – experts suggest that we have created more data in the years since 2010 than in the entirety of human history preceding that.34 Consumers, researchers,
28 Marti A Hearst, ‘Untangling Data Mining’ (The 37th Annual Meeting of the Association for Computational Linguistics, University of Maryland, June 1999) 3.
30 Editorial, ‘Gold in the Text?’ (2012) 483 Nature 124.
31 Sergey Filippov, ‘Mapping Text and Data Mining in Academic and Research Communities in Europe’ (2014) 16/2014 Lisbon Council Special Briefing, 3.
32 Jonathan Clark, ‘Text Mining and Scholarly Publishing’ (2012) Publishing Research Consortium, 6.
33 Seref Sagiroglu and Duygu Sinanc, ‘Big Data: A Review’ (Collaboration Technologies and Systems (CTS), 2013 International Conference, May 2013).
34 See, for example, Eric Schmidt, CEO of Google, making a similar statement in 2010: MG Siegler, ‘Eric Schmidt: Every 2 Days We Create as Much Information as We Did Up To 2003’ (TechCrunch.com, 4 August 2010) <> accessed 26 November 2015; Åse Dragland, ‘Big Data, for better or worse’ (SINTEF, 22 May 2013) < data–for-better-or-worse/> accessed 26 November 2015.
sales people and even computers communicating with each other all create data about their actions, and that data is ripe for analysis. The development of smart systems which can extract new links and liaisons from that excess of data is a move which could lead to new discoveries – it has already been used to find new uses for existing drugs, by mining the existing corpus of research for side effects and then using this for knowledge discovery.35 TDM, however, falls foul of copyright legislation in that mining engines create a copy of the text in order to mine the content – this was permitted under fair dealing exceptions for a single person with a photocopier,36 but the large-scale copying of potentially thousands of articles by a machine would not be covered by the exceptions enshrined in legislation.
The potential for TDM is almost limitless – as mentioned in the definition offered above, it is well established in astronomy and genetics, but the analysis of data can have applications in fields as distinct as digital humanities and medical research. As a research tool, TDM is nothing more than a blunt instrument, mechanically doing only what it is told to do, but its strength lies in the fact that it can drastically cut down the amount of time required to analyse a large amount of data. Searching a corpus of research papers by hand for a single word, for example, could take weeks; furthermore, human researchers are fallible, and could miss important instances of whatever it is they seek. A computerised system, on the other hand, is not only faster, but also more reliable. TDM is only as clever as the researcher who uses it, but it is a range of techniques which can be used in circumstances from analysing how gender affects assessment of teaching skill37 to analysing patient records to find new medical insights.38
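The single-word search described above can be illustrated with a short sketch in Python. It is not a description of any real mining engine: the function name count_term, the structure of the corpus, and the three placeholder texts are all assumptions made purely for the purposes of the example, and the matching is nothing more than a case-insensitive whole-word count.

```python
import re
from collections import Counter


def count_term(corpus: dict[str, str], term: str) -> Counter:
    """Count whole-word, case-insensitive occurrences of `term` per document.

    `corpus` maps a document identifier to its full text. Documents with no
    occurrences are left out of the result.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    hits = Counter()
    for doc_id, text in corpus.items():
        matches = len(pattern.findall(text))
        if matches:
            hits[doc_id] = matches
    return hits


# Hypothetical placeholder texts, invented for illustration only.
sample_corpus = {
    "paper_a": "The drug reduced symptoms. The drug was well tolerated.",
    "paper_b": "No treatment effect was observed.",
    "paper_c": "Drug interactions were recorded; drugs were withdrawn.",
}

result = count_term(sample_corpus, "drug")
for doc_id, matches in result.most_common():
    print(doc_id, matches)
```

Even in this toy form, the legal difficulty discussed in this chapter is visible: the full text of every document must be held in memory, and therefore copied, before it can be analysed.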
35 LIBER, ‘Text and Data Mining: The Need for a Change in Europe’ (2014) < content/uploads/2014/11/Liber-TDM-Factsheet-v2.pdf> accessed 21 March 2015, 1.
36 Copyright Designs and Patents Act 1988 s 29(1) (CDPA).
37 Benjamin H Schmidt, ‘Gendered Language in Teacher Reviews’ (benschmidt.org, 2015) <> accessed 17 November 2015.
38 Jon Hamilton, ‘Can a Cancer Drug Reverse Parkinson’s Disease and Dementia?’ (NPR.org, 20 October 2015) < parkinsons-disease-and-dementia> accessed 26 November 2015; Claire Zillman, ‘Parkinson’s Patients Show Improvement After Taking Cancer Drug’ (Fortune, 19 October 2015) <> accessed 26 November 2015.
Call for Exception in UK
Digital Opportunity, the Hargreaves Review of 2011,39 as in so many other examples in this thesis, was one of the sources of the call for a TDM exception in the UK. At its publication in 2011, the Review suggested that
The UK should […] promote at EU level an exception to support text and data analytics. The UK should give a lead at EU level to develop a further copyright exception designed to build into the EU framework adaptability to new technologies. This would be designed to allow uses enabled by technology of works in ways which do not directly trade on the underlying creative and expressive purpose of the work.40
Although it took several years for this recommendation to be put into action, the TDM exception41 was one of the exceptions that made their way through the British Parliament to reach implementation in 2014, coming into effect on the 1st June.42 The Hargreaves Review called for a TDM exception, but it did not cite any supporting evidence in this call. However, we can see from searching the publicly available submissions to the Review that ten of those mentioned TDM or data analytics.43 Of those ten, nine called for an exception to be included in current
39 As mentioned in the Literature Review, the Hargreaves Review was commissioned in 2010 by David Cameron in response to remarks by Google’s founders that the UK’s copyright regime would not have allowed them to set up their business in the UK. Professor Hargreaves, a journalist by trade, was asked to head the six-month review of the UK intellectual property regime, a mammoth task which he undertook admirably. The review built upon the Gowers Review, which was completed four years earlier by Andrew Gowers, and expanded upon some of Gowers’ suggestions, while also adding in recommendations which were unique to Hargreaves, due to the developments in IP, even in that short time. ‘The founders of Google have said they could never have started their company in Britain. The service they provide depends on taking a snapshot of all the content on the internet at any one time and they feel our copyright system is not as friendly to this sort of innovation as it is in the United States. Over there, they have what are called “fair use” provisions, which some people believe gives companies more breathing space to create new products and services.’ Hargreaves (n 13) 44.
40 Hargreaves (n 13) 51.
41 The Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations 2014, SI 2014/1372.
42 ibid s 1.
43 IPO, ‘Review of Intellectual Property and Growth: Submissions Received’ (2010) < -c4e.htm> accessed 17 March 2015. The submissions which mentioned TDM or data analytics were: AstraZeneca, British Library, British Library Driving UK Research (a collection of research perspectives, not intended to represent the opinion of the British Library), Copyright for Innovation (a collection of interested parties), Copyright for Knowledge, IBM, LACA (the Libraries and Archives Copyright Alliance), the National Centre for Text Mining, Scibella, and the Joint Information Systems Committee (JISC). It is worth noting that many of the parties involved in these submissions co-signed more than one – for example LACA’s chair signed LACA’s own submission, and the Copyright for Innovation submission; the Chair of Copyright for Knowledge was also a signatory to the Copyright for Innovation
copyright legislation, with Scibella’s submission mentioning text mining, but not calling for an exception. For perspective, 256 submissions to the Review are available to view44 – those nine calling for a TDM exception amounted to 3.52% of submissions.
An extensive literature search conducted for the purposes of this doctoral project revealed a startling lack of discussion of the need for a TDM exception. Although in 2014 there were reports commissioned and published by the DGs of the European Commission, as discussed later in this chapter, and the UN published a report in 2012 on Big Data,45 there was very little available which actually called for a TDM exception. Numerous publications were and are available on the topic of big data and data mining,46 and data was of concern in many digital reviews,47 but these publications did not call for an exception to the law. The submissions to the Hargreaves review stood somewhat alone in this respect.
In order to justify such an exception, there must have been a sufficient market failure that obtaining TDM licences from publishers was unfeasible for potential miners. In order to investigate whether there was support for this position, we will look to several publications from the years preceding the implementation of the 2014 Statutory Instrument, and consider their conclusions on the topic of the feasibility of TDM.
In 2011, the PRC commissioned a study from Dutch consulting company BV Bronfonteyn. This study consisted of 29 interviews and 190 surveys on the topic of journal article mining.48 It found that, at the time, the majority of publishers were receiving requests to mine, but in very low numbers – less than ten per year, or even less than five per year for research.49 It also found that 90% of research-focused requests received permission, with only 28% of respondents having no
(n 43, continued) submission; the British Library submitted its own piece, a collection of supporting essays, and co-signed the Copyright for Innovation submission, and JISC funded the National Centre for Text Mining.
44 IPO, Submissions Received (n 43).
45 United Nations Global Pulse, ‘Big Data for Development: Challenges & Opportunities’ (2012).
46 Sophia Ananiadou and others, ‘Supporting the Education Evidence Portal via Text Mining’ (2010) Philosophical Transactions of the Royal Society 368; James Manyika and others, ‘Big data: The next frontier for innovation, competition, and productivity’ (2011) McKinsey Global Institute; Paul Zikopoulos and Chris Eaton, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data (McGraw-Hill Osborne 2011); Andrew McAfee and Erik Brynjolfsson, ‘Big Data’ (October 2012) Harvard Business Review 59; Ian H Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (2nd edn, Elsevier 2005); Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques (3rd edn, Elsevier 2012).
47 See, for example, Department for Business, Innovation and Skills, ‘Digital Britain: Final Report’ (2009).
48 Eefke Smit and Maurits van der Graaf, ‘Journal Article Mining, A research study into Practices, Policies, Plans … and Promises.’ (2011) BV Bronfonteyn.
49 ibid 4.
stated policy on mining requests. 68% of respondents considered mining requests on a case-by-case basis, with 88% of publishers allowing mining in some or the majority of cases.50 This study was not based solely on the UK, but focused on a global scale, and considered several possibilities for creating common, cross-publisher solutions to the mining difficulties. It offered five possibilities, the most popular of which was ‘standardization of content formats for mining, of API-standard platforms, of basic semantics tagging terms, etc.’51 – this would allow publishers to continue to handle their own permissions, but would remove some of the difficulties of mining content from several publishers by standardising formats across publishers.52 It is worth noting that none of the options suggested included a legislative change. The PRC also commissioned a second study on the topic of TDM, which was published in 2013. The second report, ‘Text Mining and Scholarly Publishing’, was produced by Jonathan Clark, an independent advisor on strategy and innovation.53 It stated that requests for text mining permissions remained, at the time, relatively infrequent, but that they were expected to increase in the coming years.54 Of course, the fact that both of these reports were commissioned by a publishing group must be borne in mind when considering the possibility of bias. The use of third-party research groups goes some way to negating the possible accusations of bias, but it is hardly likely that a publishing group would publicise a report which did not support its position. Nonetheless, they were the only publications available which considered the available solutions to the issue of text and data mining in the UK at the time. This issue was not so pressing in the wider European context, discussed later in the chapter, where several reports considered the possible formations and implications of a TDM exception.
Another project which offered evidence on the proportion of publishers allowing mining access to their content was the UCSC Genocoding Project.55 This Californian project attempted to map genomic identifiers to the human genome by crawling published papers – using a computer system to mine the papers for the relevant information. In order to achieve this, however, it was necessary to seek the permission of the publishers. The UCSC Genocoding project kept track of the responses it received to its letter requesting permission to crawl articles, which was sent via
50 ibid 5: 35% allowed mining requests in the majority of cases, with 53% allowing in some cases.
51 ibid 14.
52 ibid 58-59.
53 Clark (n 32).
54 ibid 13.
55 UCSC Genome Bioinformatics, ‘Genocoding Project’ (UCSC Genocoding) <> accessed 18 December 2015.
email to multiple publishers.56 The project then published those responses, both positive and negative, on its website.57 Of the 43 publishers listed on the website, 28 permitted crawling access, and 15 either denied permission or did not respond – a 65% positive response. The letter was sent in early 2012, and those that did respond did so before the end of 2012. Thus, it is clear that, while crawling and other TDM requests were not particularly common at the time, publishers could and usually did deal with them, both granting and denying licensing requests.
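The proportions relied upon in this section can be checked with simple arithmetic. The sketch below recomputes three of them from the counts given in the text and footnotes; the helper name share is purely illustrative and is not drawn from any of the sources cited.

```python
def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to two decimal places."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 2)


# Nine of the 256 available submissions to the Hargreaves Review
# called for a TDM exception.
tdm_submission_share = share(9, 256)

# 28 of the 43 publishers listed by the UCSC Genocoding Project
# permitted crawling access.
genocoding_permission_share = share(28, 43)

# PRC study: 35% of publishers allowed mining in the majority of cases
# and 53% in some cases.
prc_allowing_share = 35 + 53

print(tdm_submission_share)         # 3.52
print(genocoding_permission_share)  # 65.12
print(prc_allowing_share)           # 88
```

The results agree with the figures of 3.52%, approximately 65%, and 88% given in the text.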
Similarly, the submissions to the Hargreaves review point out incidences in which content owners included clauses in contracts about mining – for example, the National Centre for Text Mining pointed out that both JISC Collections and LicensingModels attempted to account for TDM in their licences as early as 2010.58 While this submission did state that the provisions were inadequate, the market at the time was, and indeed still is, nascent, and thus must be allowed time to adjust and find the ideal economic balance.
As TDM became more popular, publishers began to implement solutions which made obtaining licences easier for those wishing to mine or crawl content. Several publishers implemented policies that allowed researchers to mine their content; Elsevier was one of these, provided crawlers did so through Elsevier’s proprietary API59 system. This allowed Elsevier to maintain some control over server load regarding their publications, but still permitted researchers to mine content.60 This system, however, was not entirely well-received by researchers, as it required researchers to mine only Elsevier content at any one time – part of the advantage of TDM is that it can analyse data from a variety of publishers, and a proprietary API system conflicted with this.61
56 UCSC Genome Bioinformatics, ‘Open Letter to Publishers’ (UCSC Genocoding) <> accessed 11 March 2015.
57 UCSC Genome Bioinformatics, ‘Progress’ (UCSC Genocoding) <> accessed 11 March 2015.
58 National Centre for Text Mining, ‘Text Mining and IP: Submission to the Independent Review of IP and Growth from the National Centre for Text Mining’ (2011) < nctm.pdf> accessed 17 March 2015.
59 An Application Programming Interface (API) is a set of rules and protocols which dictate how one piece of software may request data or services from another. In the case of mining, publisher-specific APIs restrict researchers from developing their own software for mining, and force them to use the system established by the rights holder.
60 Chris Shillum, ‘Elsevier updates text-mining policy to improve access for researchers’ (Elsevier, 31 January 2014) < access-for-researchers> accessed 12 March 2015.
61 Peter Murray-Rust, ‘Content Mining: Why you and I should NOT sign up for Elsevier’s TDM service’ (PeterMR’s Blog, 31 January 2014) < and-i-should-not-sign-up-for-elseviers-tdm-service/> accessed 17 March 2015.
However, even this issue had seen progress in the years leading up to the implementation of the exception – CrossRef, a digital object identifier (DOI) system, integrated support for data mining into its system, which allows researchers to request permissions for a variety of articles from publishers, thus cutting down the amount of time required to successfully obtain full-text access to content.62 Similarly, the Publishers Licensing Society, a CMO which already administered collective licensing in the UK, developed a system called PLSclear63 which identifies rights holders and allows potential reusers to request licensing through an automated system. This was further developed into a TDM-specific engine, PLSclear TDM, which allows up to 100 DOIs to be entered at a time, cutting the time required to obtain permissions down substantially.64 If given time to develop, it is more than feasible that similar systems would have proliferated, leading to a functional and effective system, giving researchers a choice of services without the need for legislative intervention.
Nonetheless, in late 2011 the government accepted Hargreaves’ recommendations in full.65 This then led, inter alia, to the implementation of a variety of exceptions to copyright,66 and the establishment of the Copyright Hub.67 The TDM recommendation took the form of a section of a Statutory Instrument enacted in 2014.68
Coming into effect on 1 June 2014, The Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations69 inserted a new section into the Copyright, Designs and Patents Act 1988.
29A Copies for text and data analysis for non-commercial research
62 CrossRef <> accessed 12 January 2016; for more information, see CrossRef, ‘Text and Data Mining for Researchers’ (CrossRef) <> accessed 17 March 2015.
63 PLSclear <> accessed 18 December 2015.
64 PLSclear TDM <> accessed 26 March 2015.
65 HM Government, ‘The Government Response to the Hargreaves Review of Intellectual Property and Growth’ (2011).
66 The Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations 2014, SI 2014/1372; The Copyright and Rights in Performances (Disability) Regulations 2014, SI 2014/1384; The Copyright (Public Administration) Regulations 2014, SI 2014/1385; The Copyright and Rights in Performances (Quotation and Parody) Regulations 2014, SI 2014/2356; The Copyright and Rights in Performances (Personal Copies for Private Use) Regulations 2014, SI 2014/2361.
67 Discussed in the introduction and Chapter 7.
68 The Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations 2014, SI 2014/1372.
(1) The making of a copy of a work by a person who has lawful access to the work does not infringe copyright in the work provided that—
(a) the copy is made in order that a person who has lawful access to the work may carry out a computational analysis of anything recorded in the work for the sole purpose of research for a non-commercial purpose, and
(b) the copy is accompanied by a sufficient acknowledgement (unless this would be impossible for reasons of practicality or otherwise).
(2) Where a copy of a work has been made under this section, copyright in the work is infringed if—
(a) the copy is transferred to any other person, except where the transfer is authorised by the copyright owner, or
(b) the copy is used for any purpose other than that mentioned in subsection (1)(a), except where the use is authorised by the copyright owner.
(3) If a copy made under this section is subsequently dealt with—
(a) it is to be treated as an infringing copy for the purposes of that dealing, and
(b) if that dealing infringes copyright, it is to be treated as an infringing copy for all subsequent purposes.
(4) In subsection (3) ‘dealt with’ means sold or let for hire, or offered or exposed for sale or hire.
(5) To the extent that a term of a contract purports to prevent or restrict the making of a copy which, by virtue of this section, would not infringe copyright, that term is unenforceable.70
This section sits under the umbrella of the general ‘Research and Private Study’ section of the Act, and does not distinguish between different types of work, meaning that mining and data analysis can be carried out on any kind of copyright work. This is in keeping with the other amendments to the Act, which omit the words ‘literary, dramatic or musical’ in favour of simply
70 ibid reg 3.
referring to a ‘work’.71 Subsection (5), which states that the TDM provision may not be circumvented by means of a contract term, is in keeping with all the exceptions implemented in 2014, which have similar statements making the full range of exceptions immune from contractual override.72
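The cumulative structure of section 29A(1) can be made explicit with a short sketch. The Python below is only an illustrative reading of the conditions quoted above, not a statement of the law; the class and field names are hypothetical, and it deliberately ignores subsections (2) to (5).

```python
from dataclasses import dataclass


@dataclass
class CopyingAct:
    """Hypothetical description of one act of copying for TDM purposes."""
    lawful_access: bool                         # copier has lawful access to the work
    computational_analysis: bool                # copy made to enable computational analysis
    sole_purpose_non_commercial_research: bool  # s 29A(1)(a) purpose condition
    acknowledgement_given: bool                 # s 29A(1)(b)
    acknowledgement_impossible: bool = False    # "for reasons of practicality or otherwise"


def within_s29a_1(act: CopyingAct) -> bool:
    """Return True only if every condition sketched from s 29A(1) is satisfied."""
    acknowledgement_ok = act.acknowledgement_given or act.acknowledgement_impossible
    return (
        act.lawful_access
        and act.computational_analysis
        and act.sole_purpose_non_commercial_research
        and acknowledgement_ok
    )
```

On this reading, a university researcher mining subscribed journals for unpaid research satisfies every condition, whereas the same copying without lawful access, or for a commercial purpose, falls outside the exception.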
Effect since implementation
The implementation of the exception for non-commercial TDM meant that there was no longer a need for non-commercial researchers (including those engaged in universities and other third-level institutions) to request permission to perform TDM on any works to which they have lawful access. This increased the potential for TDM, as it was no longer as arduous or time-consuming as it had previously been. The elimination of the requirement to search for each individual copyright-holder and request permissions made TDM a more viable research method for a variety of researchers. As awareness of TDM grows, more researchers will consider it a viable research method, leading to the discovery of new information and greater analysis of the vast swathes of data available in the digital age. Although publishers no longer have any control over whether or not their content may be mined for data analysis purposes, they maintain the option to deny any access to the work at all – researchers are required to have ‘lawful access’ to the work before they may copy it for TDM purposes.73 Publishers are also allowed to impose restrictions in order to maintain the stability or security of their network (eg by limiting downloads to a certain number per month) but these measures must not unreasonably restrict researchers’ access to the works.74 The exception also had the effect of denying publishers the chance to monetise non-commercial TDM requests. However, it is not outside the bounds of imagination that publishers may simply incorporate the costs of supporting non-commercial TDM into their licence fees for access, given that the right of access now includes the right to mine.
With regard to commercial exploitation of TDM techniques, this remains the province of individual publishers. Thus, they may create or decline to create systems which will allow commercial researchers to mine content, provide APIs, partake in joint initiatives such as
71 ibid reg 3(1)(a).
72 See, for example, The Copyright and Rights in Performances (Quotation and Parody) Regulations 2014, SI 2014/2356, reg 3.
73 CDPA (n 35) s 29A(1).
74 IPO, ‘Exceptions to Copyright: Research’ (2014) < > accessed 25 March 2015, 6.
CrossRef75 or PLSclear TDM,76 or simply refuse all TDM applications entirely. However, given the proportion of TDM requests which were non-commercial and now no longer need to be supported, it is not outside the bounds of possibility that the implementation of the non-commercial TDM exception has disincentivised publishers and content owners from creating systems which can deal with requests for commercial TDM permissions in a timely and efficient manner. Nonetheless, as TDM continues to grow, it is in publishers’ best interests to develop procedures that can efficiently issue TDM permissions – this may take the form of publisher-specific initiatives, or joint projects between multiple publishers. It is important, however, that publishers and rights holders are allowed to retain the freedom to control the commercial exploitation of their works – including permitting or denying copying for the purposes of data mining and analysis. To do otherwise would be to undermine the integrity of copyright rules entirely.
One area of potential confusion arises from the fact that the legislation gives no clear delineation between commercial and non-commercial research. While a common-sense approach seems the most likely to be taken, the question of what exactly constitutes non-commercial research, and to what degree such researchers can rely on the TDM exception, may yet lead to a certain amount of litigation before the courts. For example, a university researcher who conducts research using TDM techniques is likely to fall within the exception, especially if the results of their research are published in a scientific journal, usually for no fee. In fact, if the research is published under certain Open Access protocols, the researcher may even be required to pay a fee in order to publish their research.77 This is clearly not a commercial gain for that researcher. However, if their research then forms a chapter of a monograph, textbook, or other larger work for which royalties are payable, does this change the nature of the research? While the research may be the same, the method of publication in the latter case accrues a certain amount of financial gain. Thus, is the research no longer eligible to take advantage of the exception? This is a question which will have to be answered through litigation, as it is not explained in the legislation itself. One possibility is that a researcher who did not need to seek permissions to mine content originally would then have to retroactively seek a licence if they wished to
75 CrossRef (n 62).
76 PLSclear TDM (n 64).
77 For more on Open Access, see Chapter 7.
subsequently publish for commercial reasons, but this would be particular to individual circumstances.
The exception is relatively new, and has not yet been subject to litigation. In the event of a commercial researcher attempting to rely on the TDM exception, it would be interesting to consider which aggrieved party would bring a claim against that researcher. For each individual author, if the amount of data analysed was large, then the degree of use of their copyright work would be so minute that it might not attract more than nominal damages. In that case, the most likely parties to take a case against commercial miners would be publishers or licensing societies. Just as the NLA, the licensing society for newspapers in the UK, was the body which took a case against news clippings services,78 it would not be outside the bounds of imagination to suggest that licensing societies would again be the most logical avenue for litigation against TDM infringement. This is, of course, conjecture, and remains to be seen.
Naturally, TDM requests are not limited solely to publishers – there are data resources which could provide great insights and scientific developments if mining of the content were permitted. For example, while mining the scientific literature and published articles about the use of a specific drug could provide new data on potential side effects, a much wider source of information on this could be found in the NHS patient records. Therein one would find a comprehensive record of the effects of a particular drug on numerous patients and, were mining of that content allowed, new uses for drugs could be discovered with relative ease. However, the mining of data sources other than those which are commercially available (ie those to which a subscription may be purchased) is something outside of the ambit of the current exception. Thus, it does nothing to enable the progression of knowledge in areas other than specifically non-commercial TDM with data which may be lawfully purchased or obtained, and therefore fails to account for much data which could be the source of new scientific insights. This use of TDM may well be one which is impossible to license, and thus could be a potential case for a second TDM exception, but it is one which must be carefully researched and defined before legislation is enacted.
78 NLA Media Access, ‘Newspaper Websites – Copyright Law’ (2014) < -%20Background%20for%20Journalists%20_June_2014.pdf> accessed 12 January 2016.
In August 2015, the UK Publishers Licensing Society conducted a survey of its members regarding the use of the TDM exception.79 Of those publishers who responded to the survey, fewer than 20% stated that they had received mining requests.80 The survey covered the number of requests to mine which were received in the year 2014, and the overall number of requests from 111 publishers was just 91.81 Although there is no data explaining why this number is so low, the PLS suggested that it may have been due to licence terms including permissions to mine content, and the use of automated systems, such as CrossRef, which allow researchers to access material for TDM purposes without having to contact the publishers directly.82 This low incidence of requests for TDM permissions was not expected to rise greatly in the future, nor had it seen a notable rise or fall with the implementation of the UK TDM exception. It is suggested that the existence of automated services such as CrossRef, PLSclear TDM, and the Copyright Clearance Center may well have negated the need for a TDM exception at all.83
However, it is important to note that, as of the date of conclusion of this research project, no independent study of the effect of the TDM exception in the UK had been completed. Figures reported by an organisation such as the PLS may be open to accusations of bias, and thus should be treated with some scepticism; an independent project would be advisable for greater reliability.
Calls for wider exception in Europe
The movement toward TDM was not isolated to the UK. Across the EU, there was also a realisation of the growing need to acknowledge TDM as a research method, and consider the need for legislation to regulate the use of TDM. This resulted in the creation of a TDM working group in the Licences for Europe initiative, which was established in December of 2012 and launched the following February.84 Unfortunately, the working group did not proceed as successfully as had been hoped. In February 2013, a number of participants of the working group – mostly librarians – walked out, on the basis that the scope of the group was not wide enough to adequately account for the possibilities of TDM. They set out their concerns in a letter to the
79 Publishers Licensing Society, ‘Survey shows text and data mining supported by licensing not copyright exceptions’ (PLS News and Events, August 2015) < august-15/> accessed 18 October 2015.
84 Michel Barnier, ‘Licences for Europe: quality content and new opportunities for all Europeans in the digital era’ (Speech at the Launch of the initiative ‘Licences for Europe’, 4 February 2013) <> accessed 18 December 2015.
relevant Commissioners (Barnier, Geoghegan-Quinn, Kroes, and Vassiliou) later that same month, arguing that the group considered only re-licensing content, and not a wider variety of solutions to the TDM quandary, including exceptions or limitations to copyright law.85
Nonetheless, the plenary group produced a pledge, one of the ten pledges for getting more content online which were published in December of 2013.86 This stated that the group had achieved ‘Easier text and data mining of subscription-based material for non-commercial researchers: a commitment by scientific publishers’ in the form of a proposed licensing clause which allowed for TDM at no extra cost for non-commercial research. At the date of publication of the ten pledges, thirteen publishers had signed up to include this licence term.87
Despite the pledges, this did not appear to be a sufficient movement towards enabling TDM, and thus in 2014 a veritable plethora of reports were commissioned and published by two European Directorates-General (DGs), focusing on TDM. DG Research and Innovation published the Hargreaves-led Expert Group’s report ‘Standardisation in the area of innovation and technological development, notably in the field of Text and Data Mining’,88 while DG Market commissioned and supported March’s De Wolf and Partners ‘Study on the legal framework of text and data mining (TDM)’.89 A third report from 2014, ‘Assessing the economic impacts of adapting certain limitations and exceptions to copyright and related rights in the EU:
85 – –, ‘Letter from participants in response to “Licences for Europe - A Stakeholder Dialogue” text and data mining for scientific research purposes workshop’ (February 2013). Signatories were Sara Kelly, Executive Director, The Coalition for a Digital Economy; Jonathan Gray, Director of Policy and Ideas, The Open Knowledge Foundation; John McNaught, National Centre for Text Mining, University of Manchester; Aleks Tarkowski, Communia; Klaus-Peter Böttger, President, European Bureau of Library Information and Documentation Associations (EBLIDA); Paul Ayris, President, The Association of European Research Libraries (LIBER); Brian Hole, CEO, Ubiquity Press Ltd; and David Hammerstein, Trans-Atlantic Consumer Dialogue. The letter was additionally supported by more than fifty other interested parties.
86 Licences for Europe, ‘Ten pledges to bring more content online’ (2013) <> accessed 18 December 2015, 8.
87 ibid 8.
88 The Expert Group, ‘Standardisation in the area of innovation and technological development, notably in the field of Text and Data Mining’ (2014) < report_from_the_expert_group-042014.pdf> accessed 18 December 2015.
89 Jean-Paul Triaille, Jérôme de Meeûs d’Argenteuil and Amélie de Francquen, ‘Study on the legal framework of text and data mining (TDM)’ (2014) <> accessed 12 January 2016 (De Wolf and Partners).
Analysis of specific policy options’ from Charles River Associates (CRA), was also supported by DG Market.90
While these three reports were all published in the same year, they came to markedly different conclusions about the need for a TDM exception, and indeed the wider copyright system. While the Expert Group report from DG Research and Innovation called for an exception to copyright law, this was only a precursor to its recommendation to entirely overhaul the European copyright system. The De Wolf study from DG Market focused on the current legal framework, and whether TDM would fall under any of the exceptions which are currently mandated under European law. It concluded that TDM could not be permitted under any of the existing exceptions, and recommended that a TDM exception be put into place to remove ‘unjustified obstacles’91 to data analysis. The report by CRA, also supported by DG Market, offered four ways to proceed regarding the question of TDM and relevant legislation, before recommending the implementation of a non-commercial exception for scientific research in circumstances where licences were not offered. This exception could be relied upon where it was not possible to obtain a TDM licence, but would cease to apply once a publisher began to offer TDM licences, thus incentivising publishers to create licensing terms in order to avoid falling subject to the exception.
The Expert Group92 which was responsible for the DG Research report was led by Professor Ian Hargreaves, who was also responsible for the 2011 Hargreaves Review,93 discussed earlier in the chapter. Within the Expert Group report, one finds the statement that prolific use of TDM would add tens of billions of Euros in value to the EU’s aggregate GDP, but the important thing to note here is that this estimate is not based on empirical research. Indeed, ‘[t]here are no empirical estimates of the impact of TDM on the productivity effect of research’,94 and the
90 Julian Boulanger and others, ‘Assessing the economic impacts of adapting certain limitations and exceptions to copyright and related rights in the EU: Analysis of specific policy options’ (2014) < study_en.pdf> accessed 12 January 2016 (Charles River Associates).
91 De Wolf and Partners (n 89) 116.
92 The Expert Group was assembled for the purposes of the report discussed in this chapter. It was composed of Professor Hargreaves, of Cardiff University, Dr Lucie Guibault (University of Amsterdam, the Netherlands), Dr Christian Handke (University of Amsterdam and Erasmus University, the Netherlands), Professor Peggy Valcke (KU Leuven, Belgium) and economist Bertin Martens (JRC, IPTS, Seville). They were supported by Dr Ros Lynch (Department for Business Innovation & Skills, United Kingdom) as rapporteur. It further received research assistance from Dr Sergey Filippov, Assistant Professor of Innovation Management at Delft University of Technology and Non-resident Fellow of the Lisbon Council.
93 Hargreaves (n 12).
94 Expert Group (n 88) 33.
figures proffered in the report were based on data from a JISC report95 which applied only to the UK, and did not take into account either the variation between different European countries’ research outputs or the unique position that the UK occupies as a research centre. Nonetheless, the Expert Group report went on to apply the JISC figure (of 2%) to the EU as a whole, estimating an increase in the real value of EU research output of €5.3 billion. It used an elasticity estimate from Guellec and Van Pottelsberghe’s 2004 OECD economic study96 to model the long-term impact of a change in the volume of R&D expenditure, which came out at a €32.5 billion increase in terms of innovative products, productivity and consumer welfare increases.97 It did not, however, state over what length of time this long-term increase would apply. In addition, this would be a minor increase in terms of proportion of EU GDP – only 0.26%.
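The scale of these figures can be checked with simple arithmetic. The sketch below uses only the numbers quoted above (2%, €5.3 billion, €32.5 billion and 0.26%); the base values it derives are back-calculated for illustration and are not figures taken from the Expert Group report.

```python
def implied_base(increase_bn: float, share: float) -> float:
    """Back out a base value (EUR billions) from an increase and the share it represents."""
    return increase_bn / share


# A 2% uplift worth EUR 5.3 billion implies this research-output base.
research_output_bn = implied_base(5.3, 0.02)

# A EUR 32.5 billion gain equal to 0.26% of GDP implies this GDP base.
eu_gdp_bn = implied_base(32.5, 0.0026)

print(f"Implied EU research output: about EUR {research_output_bn:,.0f} billion")
print(f"Implied EU GDP: about EUR {eu_gdp_bn:,.0f} billion")
```

On these inputs the implied research base is roughly €265 billion and the implied GDP roughly €12,500 billion, which shows how small the projected long-term gain is relative to the economy as a whole.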
On the back of these assertions, the report proposed three action points, the first of which was licensing initiatives.98 The second was a wide-ranging exception for TDM, which removed it entirely from the scope of European copyright and database law.99 This was, however, an interim suggestion, to be implemented pending the progress of the paper’s third suggestion, a large-scale reform of European copyright law.100
The Expert Group dismissed almost out of hand the possibility of licensing options, citing the needs of digital age researchers, who ‘require legally reliable research access to many types of database, spread across numerous media platforms, disciplines, organisations and countries’.101 While this is not untrue, to dismiss the possibility of licensing is to deny rights holders a level of control over their protected works, and also fails to take into account joint enterprises such as the aforementioned PLSclear TDM engine and CrossRef.
With regard to a TDM exception, the report came to some of the same conclusions that will be put forward later in this chapter. The distinction between commercial and non-commercial research can, at times, be hard to make. Further, the definition of ‘scientific’ is also difficult to ascertain. The report concluded that the only viable exception would be one which applied to all scientific researchers, both commercial and non-commercial, in order to avoid confusion,
95 ibid 25.
96 Dominique Guellec and Bruno Van Pottelsberghe, ‘From R&D to Productivity Growth: Do the Institutional Settings and the Source of Funds of R&D Matter?’ (2004) CEB Working Paper No 04/010.
97 Expert Group (n 88) 33.
98 ibid 65.
99 ibid 66.
100 ibid 67.
101 ibid 65.
and to improve the status quo.102 However, the distinction between copying expressive works and the incidental copying which occurs as part of the data analysis process would still be difficult to draw. It is on the basis of this difficult distinction that the report made its ultimate recommendation – a new Copyright Directive.103 Thus, while it recommended the implementation of a ‘specific and mandatory exception to remove text and data mining for scientific purposes from the reach of European copyright and database law’,104 this was only to be considered as a medium-term amelioration pending the new Copyright Directive.
This recommendation, however, was incomplete – it did not take into account the nuances involved in scientific research. While the distinction between scientific non-commercial research and commercial research can be difficult to make, to create an exception which applied to all scientific research would be heavy-handed and would not take into account the difference in purpose behind commercial and non-commercial research. Further, the recommendation also failed to consider the crucial preliminary question of whether or not the researching party would have lawful access to the material to be analysed. To allow TDM without lawful access to material would be overreaching on the part of the legislature, which, rather than enhancing the value of European scientific research, could undermine it entirely. As has been established many times throughout this thesis, copyright strikes a balance between access and incentive, and a requirement of lawful access to the work to be analysed would be vitally important in maintaining that balance.
The second study,105 by De Wolf and Partners,106 assessed the legal framework of TDM in Europe. It considered the possibility that data analysis could fall under the current copyright and database legislation, before concluding that this would largely be impossible.107 It did point out that TDM, in part, could be justified under existing exceptions, as a part of non-commercial scientific research, but the lack of a specified exception could lead to confusion for researchers. It stated very clearly that more research would be needed, due to the fact that TDM as a research tool was so new.108 It then moved on to consider the possible construction of a new European
102 ibid 68.
103 ibid 68.
105 De Wolf and Partners (n 89).
106 De Wolf and Partners is a Belgian/Luxembourgish legal firm that specialises in corporate, commercial, TMT, tax, employment and real-estate law. It often produces high-level European reports, including in the area of intellectual property reform.
107 De Wolf and Partners (n 89) 41-84.
108 ibid 96.
data mining exception, taking guidance from existing legislation in the InfoSoc Directive,109 and the Database Directive110 and their provisions on both copyright and sui generis rights.111 The recommendation of the report was that, if a TDM exception were to be implemented, it should apply to non-commercial research which is mainly scientific in purpose, and that any requirement to acknowledge the source should be left to individual data miners.112 The report further suggested, inter alia, that such an exception should be made mandatory across all member states, unlike the InfoSoc exceptions, as this would lead to greater harmonisation in the EU, and would also simplify the possibilities of cross-border research,113 and lastly suggested that it should not be possible to override the exception by contract, much like the UK exceptions discussed earlier in this chapter.114
The De Wolf and Partners report was cautious in its approach, and extensive in its analysis, both of the current law, and the potential for change. Its suggestions were built on the need for further research into the area, and considered also the need for balance in the implementation of any new exceptions. This author disagrees with the need for a mandatory pan-European exception, especially given that the scientific research exception in InfoSoc has not been uniformly implemented, but respects that this would lead to greater clarity and more ease of cross-border research across the EU.
The third report, by Charles River Associates (CRA),115 was not limited solely to considering TDM. Rather, it considered a variety of possible exceptions to copyright law, including remote access, private copying, TDM, and library e-lending. It considered the licensing context in the EU at the time, and pointed out that for each possible TDM project, the time input in contacting the rights holder in order to seek permission to mine would be considerable.116 As well as this, many publishers had not yet developed TDM licences and thus the costs associated with obtaining a TDM licence could vary. This uncertainty could then make undertaking mining projects difficult, time-consuming and intimidating. The study made a strong point, however,
109 InfoSoc Directive (n 1).
110 Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases [1996] OJ L77/20.
111 De Wolf and Partners (n 89) 99.
112 ibid 100.
113 ibid 107.
114 ibid 106-113.
115 Charles River Associates is a global consulting firm, headquartered in Boston, which offers economic, financial and strategic consulting to major law firms, corporations, accounting firms, and governments around the world.
116 Charles River Associates (n 90) 16-17.
in pointing out that the market for TDM licences was nascent – publishers could not be expected to have developed licences to feed a demand that did not exist only a few short years previously.117 It further pointed out that there was no empirical evidence which stated the demand for TDM licences.118 However, the market was already adapting, as initiatives which have already been mentioned, such as PLSclear TDM, CrossRef’s automated TDM initiative, and the commitment by scientific, technical and medical (STM) publishers to a standard TDM licence indicated.119
The study assessed four separate policy options for TDM:
1. Maintaining the status quo for TDM
2. A specific harmonised and mandatory exception for text and data mining for the purpose of non-commercial scientific research in the absence of a licence agreement enabling text and data mining
3. A specific harmonised and mandatory exception for text and data mining for the purpose of non-commercial scientific research
4. A specific harmonised and mandatory exception for text and data mining for the purpose of scientific research (both commercial and non-commercial)120
Each of these four options was individually assessed, with the research concluding that options 3 and 4 (both of which were non-overridable exceptions) would not be warranted.121 It is worth noting that option 3 would extend to the entire EU the situation which has been in place in the UK since the implementation of the new TDM exception in 2014. The report made the point that, especially with regard to an exception which covered both commercial and non-commercial research, removing the potential for financial gain from TDM would disincentivise publishers from developing the TDM-specific structures which would allow TDM to flourish, and further benefit the EU as a whole.122 This exception, which at first glance seemed nothing but positive, could then lead to more difficulty in utilising TDM techniques. With regard to option 1, the report pointed out that maintaining the status quo could leave high transaction costs in place. It did acknowledge that new developments were emerging and maintaining the status quo could still encourage investment in TDM infrastructure.123 It did not state that
117 ibid 70.
118 ibid 65.
119 ibid 70.
120 ibid 74-81.
121 ibid 78, 80.
122 ibid 72.
123 ibid 74.
maintaining the status quo would be impossible, or even undesirable, but maintained that it would not be the most preferable outcome for Europe.124
The recommendation which received highest priority was the implementation of a specific harmonised and mandatory exception for TDM for the purpose of non-commercial scientific research in the absence of a licensing agreement enabling TDM.125 This exception, option 2, in contrast to options 3 and 4, could be over-ridden by the implementation or offering of a licence by a rights holder. This would apply only where there was lawful access to the content being mined – ie where the researcher or research institute had a subscription to a particular database. This option would still allow publishers to provide licences for TDM, but would not prevent researchers from accessing content where a publisher did not offer licences for TDM. It was suggested in the study that this would have little to no effect on larger publishers, as they had established licences for TDM (particularly for scientific purposes) and thus the exception would not apply.126 It was further suggested that the exception would also encourage smaller publishers to amend their subscription agreements in order to add mining clauses, meaning that the effect on them would be limited only for as long as they had not yet included or provided such a licence. It pointed out also that content from smaller publishers was less frequently mined than that from larger publishers.127 The study lastly suggested that the loss of revenue by publishers affected by the exception would encourage them to develop licences, leading to lower transaction costs in the future.128
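The conditional nature of option 2 can be expressed as a short decision rule. This is a minimal sketch of the logic as described above, with hypothetical function and parameter names; it is not drawn from the CRA study itself.

```python
def option2_exception_applies(
    lawful_access: bool,
    non_commercial_scientific_research: bool,
    tdm_licence_offered: bool,
) -> bool:
    """Option 2 as a fallback: it lapses once the rights holder offers a TDM licence."""
    # The exception is only ever in play for lawful, non-commercial scientific use.
    if not (lawful_access and non_commercial_scientific_research):
        return False
    # Where a licence enabling TDM is on offer, the researcher must use it instead.
    return not tdm_licence_offered
```

A publisher that begins to offer a TDM licence changes the third argument, at which point the exception ceases to apply to its content. Under options 3 and 4, by contrast, the availability of a licence would make no difference to the outcome.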
The detriment felt by the implementation of option 2 over maintaining the status quo would be limited, in that it would likely affect only smaller publishers, and even then only until they produced their own TDM licences. The increased production of TDM licences, the report suggested, would then likely reduce transaction costs.129 The recommendation would ensure that researchers were able to mine content, even where there was no licence available, but also maintain rights holders’ incentives, allowing the market the freedom to set its own rates. This would be an improvement over the status quo, as it would allow rights holders to develop their own TDM clauses, but also require that permission to mine be made available to researchers.
125 ibid 81.
126 ibid 75.
127 ibid 75.
128 ibid 75-6.
129 ibid 76.
This minimal level of intervention would allow the market to develop its own solutions in time, while still providing encouragement for rights holders.130
Of course, the discussion of TDM has not been limited solely to reports commissioned by legislators. Other interested parties have also been making their opinions known, across Europe. By way of example, in July 2014 the French Council for the Protection of Literary and Artistic Property (CSPLA)131 published the report of their ‘Mission sur l’exploration de données’,132 proffering their opinion that legislative initiative on TDM would be premature, and the market should be encouraged to develop its own licensing solutions.133
The Mission report, written by Jean Martin and Liliane de Carvalho, gave first a description of TDM, with specific reference to the subject matter concerning the CSPLA (that is to say, artistic and literary works), pointing out that the area had been slower to adapt than scientific journals, and had adapted differently – the notion of Open Access meant that for 50% of journal articles, mining was already permitted.134 The report then went on to note that the authorisation of rights holders, without legislative intervention, was still required for mining activities, presenting a logistical problem where the quantity of objects to be mined was great.135 However, it also noted that the value of TDM was more than that of a parasite – TDM could potentially create and add further value, justifying the use of TDM techniques.136 Thus, it suggested, a balance must be struck between the often intimidatingly high transaction and administrative costs and the value that TDM could create in the future. It also praised the Licences for Europe movement, mentioned earlier in this chapter, which created a standard TDM clause, along with commitments to implement this licence by multiple publishers.137
The discussion of TDM in this report is as extensive as those commissioned by the European DGs, but considered from a different angle. The final product of the report was that it made twelve recommendations, aimed at various different aspects of TDM, from changing the
130 ibid 77.
131 More correctly, ‘Conseil supérieur de la propriété littéraire et artistique’, but translated to English for ease.
132 Jean Martin and Liliane de Carvalho, ‘Mission sur l’exploration de données’ (July 2014) < artistique/Conseil-superieur-de-la-propriete-litteraire-et-artistique/Travaux-du- CSPLA/Missions/Mission-du-CSPLA-relative-au-text-and-data-mining-exploration-de-donnees> accessed 30 March 2015.
133 ibid 4-5.
134 ibid 10.
135 ibid 16.
136 ibid 18.
137 ibid 6.
perception of TDM from parasitic to symbiotic, to creating a public policy designed to manage the specific rights required for mass use of works for TDM, including prioritising self-regulation over legislative intervention, allowing access to public data for TDM, and maintaining that non-legislative conviction within the European Union and WIPO structures.138 The report also recommended that a two-year deadline be set to prepare an industry report on the possibility of an eventual need for legislative intervention, but maintained that legislative intervention at the point of publication (July 2014) would be premature. A move towards legislation is easy to implement, but hard to retract, and the recommendations of the CSPLA report, which included a review in two years, would allow the market enough time to adapt to what was still a very new development, while showing a willingness to co-operate with miners, reusers, and content adaptors in the future. The CSPLA report was indicative of the willingness of rights holders to engage with content miners, and come up with solutions to allow greater use of TDM, but also demonstrated their reluctance to embrace a mandatory legislative exception for TDM – this would remove all incentives for rights holders to cooperate in any way, and may then damage the prospects of TDM in the future.
Another viewpoint worth considering is that of librarians. As mentioned earlier in this chapter, librarian representatives walked out of the 2013 Licences for Europe working group on TDM, as they felt their concerns were not being heard.139 The Association of European Research Libraries (LIBER)140 continued to campaign for a European TDM exception, including in its response to the 2014 European Consultation on Copyright.141 In February 2015, former LIBER president Paul Ayris met Commissioner Oettinger, the Commissioner for Digital Economy and Society, to campaign for a mandatory, pan-European, non-overridable TDM exception to European copyright law, citing the difficulties of cross-border work and the risk of damage to the European single market if this were left to individual Member States to implement, as the InfoSoc copyright exceptions had been.142
The variety of opinions available from interested parties, librarians, researchers, content owners and independent research bodies meant that the European Parliament had a fine line to walk
138 ibid 4-5.
139 Letter from participants (n 81).
140 LIBER actually stands for Ligue des Bibliothèques Européennes de Recherche.
141 LIBER, ‘LIBER Response to the Public Consultation on the review of the EU copyright rules’ (2014) <> accessed 30 March 2015.
142 Paul Ayris, ‘LIBER Argues For Pan-European TDM Exception’ (LIBER, 23 February 2015) <> accessed 30 March 2015.
in order to reach a solution acceptable or at least tolerable to all interested parties, which would continue to promote European growth, research and profitability. The benefits of effective TDM are difficult to quantify in either financial or public good terms, but there seemed to be little disagreement that there was distinct potential. Thus, on such a hot-button topic, Europe was under an intense amount of pressure to find a solution which would result in maximised benefits through the effective use of TDM, and it seems that there is no easy answer to this quandary.
Text and data mining is a research tool that has the potential to speed up research in all areas of life, with possible untold benefits for all interested parties. However, enabling free use of TDM is fraught with issues, as it must walk a line between the interests of researchers and content owners, while preserving the free market. There are many issues that could arise with the implementation of a TDM exception, which may need some guidance before the exception could come into effect.
For example, there may be difficulty in defining what exactly scientific research is, and it is not suggested where such a definition or distinction should come from. The layman’s understanding of scientific research would suggest testing of a hypothesis within the hard sciences – so called because they produce quantifiable data, testable and reproducible via the scientific method – which are generally understood to be the natural sciences (astronomy, biology, chemistry, earth science, and physics). Of course, this definition could also be expected to include STEM subjects (science, technology, engineering, and maths) or STM (scientific, technical, and medical) research. It is clear that even within the constraints of hard science, there is the possibility for variation between subject areas. Nonetheless, the layman’s understanding fails to take into account the possibility of expanding the term scientific research to its broadest possible interpretation, which could also encompass the soft sciences, including social science, political science, and psychology.143 Social science in particular is an expansive area, encompassing economics, history, anthropology, law, linguistics, geography, education, and more.144 A narrow definition of scientific research would prevent many university and third-level non-commercial researchers from taking advantage of any potential exceptions or initiatives for scientific research, which would, in all likelihood, be a misinterpretation of the
143 For a discussion of some of the differences between hard and soft sciences, as well as a mention of the sometimes artificial distinctions between them, see Larry V Hedges ‘How Hard is Hard Science, How Soft is Soft Science?’ (1987) 42(2) American Psychologist 443.
144 Norman W Storer, ‘The Hard Sciences and the Soft: Some Sociological Observations’ (1967) 55(1) Bull Med Libr Assoc 75.
intention behind any exception. Furthermore, as the De Wolf and Partners report pointed out, limiting a TDM exception to solely scientific research would prevent many projects which have as their main aim scientific research from completing the entirety of their project, as secondary objectives which may veer more toward market research would not fall under the exception.145
A second problem to be tackled would be the issue of delineating between commercial and non-commercial research. Such a distinction would likely be difficult to draw, and guidance would need to either be vague, in the form of general principles, or very specific – even then, it is certainly not outside the bounds of imagination that a lack of clarity would lead to researchers relying on an exception which may not apply to them, if they are ‘fringe’ cases.
Implementing an exception for TDM at this stage could be considered premature. According to guidance from the Green Book, exceptions to copyright law should only be implemented where there is a market failure.146 While the market for TDM is nascent, it has not failed entirely. It has not yet been given time to grow. There are multiple initiatives in place to provide licences for mining, both individually by publishers,147 and through larger co-operative initiatives.148 There is thus no immediately obvious need for an exception for publishers, and indeed implementing an exception could lead to further issues in distinguishing between scientific and non-scientific, and commercial and non-commercial research. Furthermore, rights holders are price discriminating, as stated in the CRA study – they often grant TDM licences free of charge to non-commercial researchers149 – and, as the UK PLS survey stated, only a small proportion of publishers were receiving TDM requests, due to services such as PLSclear TDM and CrossRef and the inclusion of TDM clauses in standard licence agreements.150 This then alleviated the difficulty of licensing which was faced by potential TDM users in previous years.
The suggestion made by multiple parties, including De Wolf and Partners and LIBER, that a potential exception should be mandatory for all EU Member States would lead to a scenario in which all member states would have an exception for TDM, but not for scientific research in general, as there is not yet harmonisation of all the InfoSoc exceptions across the EU.151 This
145 De Wolf and Partners (n 85) 62.
146 Green Book (n 3) 11.
147 For example Elsevier has a mining API, and multiple publishers have signed up to the Licences for Europe standardised mining clause.
148 PLSclear TDM (n 60) and CrossRef (n 58), for example.
149 Charles River Associates (n 86) 64.
150 PLS (n 75).
151 Eleonora Rosati, ‘Copyright in the EU: in search of (in)flexibilities’ (2014) 9(7) JIPLP 585.
scenario would then mean that certain countries would be put in the situation where TDM is permissible, but other aspects of scientific research are not, a strange scenario by any stretch of the imagination.
Finally, the increase in Open Access (OA) means that a large proportion of scientific research is already available under a licence which allows reuse and transformative uses.152 In those circumstances, a TDM exception is redundant, as it does not place any onus on the rights holders to allow anything which they had not already allowed. Furthermore the UK requires that journal articles and conference proceedings published after 1 April 2016 must be made OA in order to qualify for the Research Excellence Framework, and encourages all institutions to implement an OA policy in advance of this date.153 This means that most, if not all, university and third-level researchers will make their research available OA online, negating the need for a TDM exception in order to mine this content. Similar policies may well be implemented across other parts of the EU, as the move towards OA is gaining momentum worldwide.154
As the non-commercial TDM exception in the UK has already been implemented, the issues of defining scientific research and non-commercial purposes may well be laid out for consideration in future case law. The existence of the UK TDM exception can act as a model, example, or even cautionary tale for the implementation of a wider European TDM exception. We can see from the implementation of this UK exception that there may well be no need for a European-level initiative to implement a new TDM exception, as it is possible to frame a TDM exception within the exceptions already permitted under the 2001 InfoSoc Directive (in the UK’s case, under the exception for Research and Private Study).155 Thus, the immediate implementation of a new instrument creating a TDM exception would be redundant. Any move toward a greater TDM exception in the EU should be approached thoughtfully and through a dialogue between stakeholders on all sides of the debate. The possibility of implementation of TDM exceptions in Member States under current legislation means that there is no need to rush to bring in a pan-European TDM exception, and thus legislators can afford the luxury of time, discussion, and careful consideration before deciding whether or not such an exception is a necessity.
152 Martin and de Carvalho (n 132) 10 suggested that more than 50% of journal articles were open access.
153 HEFCE, ‘Policy for open access in the post-2014 Research Excellence Framework’ (July 2014) <> accessed 30 March 2015.
154 For more, see chapter 7.
155 Of course, it is possible that this exception may be judicially reviewed, like the Private Copying exception, but this remains to be seen.
In conclusion, while there may be criticisms made of the UK TDM exception, it may in the future prove to be a good test case in order to allow for greater consideration and careful planning of any potential implementation of a greater TDM exception in the EU as a whole. The wide variety of opinions and expert reports available on the topic of TDM, as well as the strongly-held and research-supported views of interested parties, including rights holders, researchers and other related parties such as libraries, mean that any legislative intervention on the topic of TDM in the EU should be carefully considered, using the UK as an example of how things may pan out in an EU environment. There is little doubt that TDM is an exciting and potentially game-changing research tool which may redefine the landscape of scientific research, but to heedlessly update copyright law without careful consideration of the delicate and nuanced balance which is required between rights holders and researchers could have adverse consequences which would counteract the financial and welfare benefits that TDM could offer. Furthermore, as TDM encompasses a variety of research techniques and applies to a range of types of data, it runs into further issues with regard to privacy, data protection, ownership of data, and myriad other matters which cannot be resolved with a blanket exception.
For these reasons, the author feels that a pan-European exception for TDM would be premature, and could detrimentally affect the development of TDM in Europe. Thus, empirical research should be conducted and carefully assessed before changes are made to the European copyright structure. This should focus on the possible implications of a mandatory exception as against an optional exception; whether the exception applies only to non-commercial research or to all research; whether it covers scientific, mainly scientific, or all research; and whether it can be excluded through contract. The current framework allows individual Member States to implement TDM exceptions if they so wish, and those who choose to do so will have the example of the UK to use as a framework on which to base their own exceptions. The author believes that the introduction of non-commercial scientific research exceptions for TDM across EU Member States could be beneficial for all parties involved, both on the rights holder and researcher side, but without empirical evidence, this is only an opinion.
Furthermore, while TDM exceptions as suggested and discussed in this chapter may well improve access to scientific literature which is available through database and journal publication, there is a wealth of privately held data which would likely be more difficult to access, and thus could be a better case for a TDM exception. The lawful access requirement for the current TDM exception in the UK means that only data which is available for purchase can be mined. Thus, a carefully framed exception which improves access to other datasets, such as NHS patient records, may also be a candidate for consideration. Of course, given that this would also interact with issues of privacy and data control, it would be a more involved and lengthy process than an exception which contemplates only data which is available for purchase. Ultimately, though, it would be an exception which would be more beneficial not only to researchers, but potentially also to humanity as a whole.
While the UK TDM exception can be used as a test case, the implementation of the private copying exception can be used as a demonstration of how not to implement an exception. Despite the example of several other European countries’ private copying levy schemes, the UK declined to include a similar levy scheme. It also relied on flawed research and impact assessments, which misjudged the amount of harm to rights holders which would result from such an exception. As such, despite the thirteen-year gap between the ability to implement the exception156 and its actual beginnings,157 and the almost four years from the publication of the Hargreaves Review158 to the private copying exception’s debut, this can still be seen as a cautionary tale about the dangers of failing to properly research the implications of certain exceptions. As a result, while there is still a market failure regarding private copying, there is now no private copying scheme in place in the UK, and no intention to rectify this.159 Thus, in the future, the UK government must be sure to obtain objective and verifiable evidence of the effect of implementing exceptions to copyright before proceeding, lest the exception be quashed less than a year after its coming into effect.
156 Via the powers of the InfoSoc Directive (n 1), which dates from 2001.
157 The exception came into effect on 1 October 2014 (n 14).
158 The Hargreaves Review (n 12) was published in May of 2011.
159 Pinsent Masons, ‘UK government scraps plans to legalise private copying’ (Out-Law.com, 18 November 2015) < private-copying/> accessed 18 December 2015.