Try it on: Voice, concordancing and text-matching in doctoral writing

Appropriate use and acknowledgement of sources continues to be a central concern of academic integrity. A major challenge for research students in using sources is the development of a confident authorial voice that matches disciplinary expectations in language use, yet manages to do so without plagiarising through inappropriate text-matching or recycling of language. This is a daunting challenge for all research writers, particularly for English as an Additional Language (EAL) researchers who are still grappling with English grammar and syntax. In order to develop novice research writers' understanding of acceptable use of sources and mastery of disciplinary language, we have developed a process called “Try it on” that uses concordancing software alongside text-matching software (Turnitin). Here we present textual analyses of two cases using this process: in one, the student's percentage of matches decreased as he developed his authorial voice; in the second, the percentage of matches increased as the student's language choices came to reflect more closely the expected usage in the discipline, thus replicating the expected authorial voice for that particular audience. These cases demonstrate how “Try it on” can be used to help students write in an appropriate authorial voice while also avoiding plagiarism.


Introduction
Appropriate use and acknowledgement of sources continues to be a central concern of academic integrity. However, this is more complex than simply avoiding the copying associated with plagiarism and patchwriting (Chandrasoma, Thompson, & Pennycook, 2004; Howard, 1995; McCulloch, 2012; McGowan, 2008a, 2008b; Pecorari, 2003; Pennycook, 1996); sources also contain the disciplinary and academic language that writers must necessarily use. Thus novice research writers are forced to navigate a complex set of contradictory expectations before achieving a confident authorial voice. On one hand, they must make an original contribution in their "own words" and their own voice (Stolley & Brizee, 2010); on the other hand, they are required to acknowledge previous research using the standardised, highly formulaic language of their disciplinary discourse community. That is, they must "be original, but not too original" (Picard & Guerin, 2011, p. 221). Most novice research writers find this challenging, but it is particularly difficult for those English as an Additional Language (EAL) researchers who are not yet entirely confident or comfortable in manipulating English grammar and syntax. In order to develop novice research writers' understanding of acceptable use of sources and mastery of disciplinary language in relation to authorial voice, we have developed a process called "Try it on" that uses concordancing software alongside the text-matching system Turnitin. The text-matching capacity of Turnitin is increasingly being used as an educational tool (see, for example, Davis & Carroll, 2009; McCarthy & Rogerson, 2009; Rolfe, 2011). However, our process of "Try it on" is innovative in its addition of concordancing software as an extra step in this pedagogy.
In this paper we present textual analyses of two cases where research students used Turnitin in conjunction with concordancers to develop an appropriate voice in academic writing. In the first case the student's percentage of matches decreased as he developed his "authorial presence" (Zhao & Llosa, 2008, p. 159) and disciplinary and "situational voice" (Ede, 1992). In the second case, the percentage of matches increased as the student's language choices came to reflect more closely the expected usage in the discipline. We demonstrate how our process of "Try it on" helps students understand how to avoid plagiarism while developing a suitable authorial voice in their writing that recycles the expected disciplinary phrases and expressions.

To match or not to match?
For many novice research writers, the difficulty lies in knowing how much text needs to match other research writing (to meet disciplinary/academic expectations) and how much is too much (and therefore constitutes plagiarism). We demonstrate how Turnitin plus concordancers can illuminate this distinction for EAL research students who are in the process of discovering how to write in their own voice within the conventional language use of their discipline. The notion of voice, though, is a slippery one, and requires some exploration.

Voice
Discussions of voice tend to fall into two general categories: those that focus on voice as self-representation and expression of identity (see, for example, Flowerdew, 2000; Gale, 1994; Grobman, 2009; Hirvela & Belcher, 2001; Matsuda & Tardy, 2007; Ramanathan & Atkinson, 1999; Thompson, 2005); and those that focus on the linguistic and rhetorical strategies mobilised to create authorial voice in a text (see, for example, Chang & Schleppegrell, 2011; Harwood, 2005; Ivanic & Camps, 2001; Taş, 2010; Zhao & Llosa, 2008). For our purposes it is the intersection of these approaches that is of interest in terms of supporting doctoral candidates in their writing practices, and allows us to explore how identity is constituted in writing (Aitchison, 2009, 2012; Baker & Lattuca, 2010; Boud & Lee, 2005; Caffarella & Barnett, 2000; Catterall et al., 2011; Kamler & Thomson, 2008; Lee & Boud, 2003). Indeed, it is precisely in order to highlight the academic identity or "academicity" (Petersen, 2007) that is coming into being during doctoral writing that we draw students' attention to the concept of voice.
For all its imprecision as a term (DiPardo, Storms & Selland, 2011), we choose to evoke the notion of voice because "as a metaphor [voice] has to do with feeling-hearing-sensing a person behind the written words, even if that person is just a persona created for a particular text or a certain reading" (Bowden, 1999, quoted in Hirvela & Belcher, 2001). That is, it helps our students to notice this aspect of writing that requires them to adopt a particular identity or persona for a given situation (in this case, thesis writing). In their influential paper on mature EAL writers, Hirvela and Belcher (2001, p. 89) take up Ede's (1992) concept of the "situational voice", investigating how writers adopt different voices for different texts and purposes, much as they might put on different outfits for different occasions. A related metaphor is employed by Ivanic and Camps (2001, p. 21), who posit that the choices of "voice types" available in academic writing are pre-determined by the "disciplinary discourse communities they are entering, like second-hand clothes waiting to be selected and given new life when worn by someone new". Academic writing, then, can be understood as an act not only of putting on different outfits to suit particular occasions, but also of choosing that outfit from a selection of second-hand clothes that have been worn by academics before us. The value of these metaphors lies in their capacity to draw our attention to the ways in which academic texts borrow and recycle language, bringing together our concerns here with both voice and plagiarism.
How is voice constituted in the text? Ivanic and Camps (2001) map out a detailed framework to explain the concept of voice by building on systemic functional linguistics, delineating between ideational, interpersonal and textual positionings of the authorial voice and articulating the linguistic realisations of each of these categories. On closer inspection of their material, it becomes apparent that "voice" is created out of the enormous range of language choices made in writing any given text: vocabulary, verb tenses, modality, evaluation, linking, references to other texts and individuals (whether general or specific), classification and structuring of information, and choice of pronouns.
In an attempt to pin down the elements of voice in a manner that could be effectively used to assess voice in student writing, Helms-Park and Stapleton (2003) developed a "Voice intensity rating scale". Working from this, Zhao and Llosa (2008, p. 160) assessed four main components of voice in their study: 1. assertiveness: hedging language, intensifiers; 2. self-identification: pronoun use, active voice; 3. reiteration of central point: frequency and explicitness of presentation of central ideas; and 4. authorial presence and autonomy of thought: presentation of alternative viewpoints, and "reader's impression of the overall authorial presence in a particular piece of writing".
As students start to make progress in developing their own voices, this list can provide a relatively straightforward focus for examining voice in their writing.
Important for our purposes is an awareness of the relationship between author, text and reader. Our EAL students soon discover that their readers and discourse communities can hold "rigid prohibitions against allowing different voices using nonconventional and nonnative forms of language and rhetoric" (Cho, 2004, p. 50). Attempts to publish or submit theses for examination in those "different voices" are rarely accepted on their own terms. While one might well be troubled by the politics and power relations this implies, we would argue that we still owe it to our students to help them understand the conventions in order for them to make informed decisions about how they choose to present their writing. There are certainly examples of successful manipulation of new and different voices in academic writing (see, for example, Viete & Ha, 2007), but this is a complex and often risky business. In terms of the clothing metaphor outlined above, this might be equivalent to wearing a ball gown and tiara to a small, informal academic seminar: beautiful undoubtedly, but sending to onlookers a rather different message about one's purpose from the jeans and shirt more usually adopted for the occasion.
It is useful to put this unreceptiveness to different voices alongside Ivanic and Camps' (2001, p. 31) reminder that academic writing is in fact a "creative recombination of voices". As academic developers we can harness this concept for our students' benefit: "Learner-writers can discuss the aspects of voices they encounter in source texts that they would like to adopt and those they would like to avoid...the role of source texts in learning to write can be acknowledged and used creatively rather than dismissed with the moral outrage associated with 'plagiarism'" (Ivanic & Camps, 2001, p. 31). Thus, it is through a focus on voice that students can develop aspects of their writing that in turn allow them to participate effectively in their disciplinary and discourse communities.

Why does voice matter?
Although many doctoral candidates take a considerable length of time to develop an appropriate academic voice in their writing, this is an even greater hurdle for EAL students; indeed, it is argued elsewhere that this might in fact constitute a "threshold concept" in doctoral education (Guerin & Green, 2012). Selection of the appropriate levels of formality and assertiveness, signposting of central ideas in the argument, employment of jargon and technical terms, and assertion of authority in texts are not always obvious or transparent to novice writers and require experience and experimentation. Getting this right is crucial, however, in that it facilitates access to a desirable identity as a legitimate member of the targeted discourse community (Cho, 2004; Flowerdew, 2000). Ivanic and Camps (2001, p. 31) explain that, "For those learning to write in a second language, there may be a double demand for critical awareness: firstly, recognizing the extent to which the voice types supported by the new language are culturally acceptable or culturally alien to them, and secondly, recognizing differences between voice types associated with the range of genres and discourses to which they are exposed in the new culture." These challenges are borne out in the study conducted by Hirvela and Belcher (2001, p. 83), which recognises the importance of existing authorial voices in "mature multilingual writers". Their subjects are much like our EAL research students, who also find themselves in a situation where they are expected to adopt new identities that can in fact be at odds with the self they have previously successfully represented in their academic writing in other languages.
For example, as lecturers in their home countries, they presented their ideas with ease and authority to a student audience, but now as doctoral candidates they are expected to justify themselves to examiners; or their previous suitably deferential attitude towards established experts is suddenly regarded as lacking in critical awareness and judgement. These kinds of difficulties in establishing the appropriate academic voice can be particularly irksome when it comes to writing the discussion and analysis sections of the doctoral thesis (Bitchener & Basturkmen, 2006). Thus, EAL research students frequently need direct instruction in how to write about their discipline and their research within the field in ways that match examiners' and reviewers' expectations. Concordancers can be harnessed in particular ways to aid this instruction.

Concordancers
Concordancers were introduced into corpus study by English for Academic Purposes teachers in order to explicate lexical and grammatical patterns, and more recently they have been used in direct explorations of disciplinary language patterns (Cargill & Adams, 2005; Cheng, 2008; Conroy, 2010; Yoon, 2011). In highly specialised fields, corpora (bodies of written works) need to be tailor-made for the precise discipline (Lee & Swales, 2006). For novice writers working towards an appropriate academic voice, corpora and concordancers can provide an invaluable source of information about idiomatic or disciplinary use of English. Concordancers are designed to show words in context, and are thus useful for identifying collocations (words that go together) and grammatical patterns. This is particularly important for EAL writers seeking to enter discourse communities that are unsympathetic to the "different voices" mentioned above. At the same time, these writers need to develop a better understanding of the appropriate (that is, unplagiarised) recycling of language elements in the sources they employ. As Eira (2005) has demonstrated, research writers are engaged in a process of "obligatory intertextuality". The difficulty lies in knowing just how much text needs to match (to meet disciplinary/academic expectations) and how much is too much (and therefore constitutes plagiarism).
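For readers unfamiliar with what "showing words in context" looks like in practice, the following is a minimal Python sketch of a keyword-in-context (KWIC) display. It is an illustration only and does not reproduce the behaviour of any particular concordancer; the function name `kwic`, the `width` parameter and the two-sentence toy corpus are assumptions introduced here.

```python
import re


def kwic(texts, phrase, width=30):
    """Return one keyword-in-context line per occurrence of `phrase`.

    `texts` is a list of plain-text documents (a toy stand-in for a corpus).
    `width` is the number of characters of context shown on each side.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        flat = " ".join(text.split())  # collapse line breaks and repeated spaces
        for match in pattern.finditer(flat):
            left = flat[max(0, match.start() - width):match.start()]
            right = flat[match.end():match.end() + width]
            # Right-align the left context so the search phrase lines up in a column.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Hypothetical two-sentence corpus, for illustration only.
corpus = [
    "The differentiation of the T cell begins in the thymus.",
    "These cytokines play a major role in the differentiation of naive cells.",
]

for line in kwic(corpus, "differentiation of"):
    print(line)
```

Aligning the search phrase in a single column is what allows a writer to scan the words immediately to its left and right and so notice recurring collocations.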

Our Process: "Try it on"
In preparing ways for our students to learn how to work with disciplinary voices, we are guided by the principles of "learner autonomy" and "scaffolded learning". Learner autonomy is well established as an important element of language learning (Palfreyman & Smith, 2003), and is closely linked to other kinds of independence required of doctoral candidates. However, such autonomy requires initial nurturing if it is to prove robust. A scaffolded approach to writing instruction has been demonstrated as effective for university students in academic English language programs (see, for example, Cotterall & Cohen, 2003), even at doctoral level (Kamler & Thomson, 2006). In breaking down the steps and working with students at each stage, the expectations and requirements of academic writing can be more easily understood, integrated into the writing produced, and later applied in new situations. Together, these approaches provide guided steps towards independent research writing.
To provide a scaffolded approach to encouraging learner autonomy in research writing, we have developed a process that uses Turnitin in conjunction with concordancers to help EAL research students develop an appropriate voice in academic writing. At present it is tentatively known as "Try it on", reminding us of the range of matching choices and their recyclability (as well as the persona/clothing being tested out to see how it looks and feels). Below is a brief summary of the process we have described in detail elsewhere (Picard & Guerin, 2011).
Step 1: Text-matching
The first step of "Try it on" involves writers running their text through plagiarism detection software (e.g., Turnitin), and then categorising the text-matches according to type (see Table 1). The document is revised in response to these decisions. For most students, some initial guidance in this process is useful as they learn to interpret the Turnitin report. Most importantly, they need to understand that the actual percentage match is not the issue; rather, it is the type of match that is significant. In some cases we have seen a considerable percentage of matches to student papers on unrelated topics which our students have clearly never read. In such cases, the matches are to standard academic phrases of the kind commonly taught to EAL students in English for Academic Purposes classes. For example, students learn some standard strings of words such as "has been used to develop a theoretical framework", "although there are a number of studies", "the existing body of knowledge" and "to the best of my knowledge". A comfortable manipulation of standard, idiomatic phrases is precisely the kind of intertextuality that Pecorari and Shaw (2012) identify as being integral to scholarly writing.

Table 1 Text-match categories (Picard & Guerin, 2011, p. 228)
Too close: The text is too similar to the source and needs to be paraphrased or rewritten.
Not relevant: Some other text has been highlighted, e.g., a formula or a bibliographic reference.
Discipline-specific phrase: This is the way that concept must be expressed in this context.
Unsure: Anything else the student does not know how to categorise.

Concordancers can be used to confirm the author's belief that a phrase does in fact match discipline-specific language and that a particular example can indeed be found in numerous published articles on the subject. It is the last category, "Unsure", that is particularly beneficial in developing the student's voice when taken to the next step where the concordancer programs come into play.
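The categorisation in Table 1 can be thought of as a simple labelling exercise whose output determines which matches are carried forward to the concordancing step. The sketch below is one hypothetical way of recording it in Python; the names `MatchCategory`, `CHECK_WITH_CONCORDANCER` and `for_step_two`, and the labels attached to the example phrases, are assumptions made for illustration and are not part of the published process.

```python
from enum import Enum


class MatchCategory(Enum):
    """The four text-match categories of Table 1."""
    TOO_CLOSE = "Too close"
    NOT_RELEVANT = "Not relevant"
    DISCIPLINE_SPECIFIC = "Discipline-specific phrase"
    UNSURE = "Unsure"


# Assumption: matches in these two categories are the ones worth checking
# against a corpus, following the paragraph above.
CHECK_WITH_CONCORDANCER = {MatchCategory.DISCIPLINE_SPECIFIC, MatchCategory.UNSURE}


def for_step_two(labelled_matches):
    """Return the phrases to be searched for with a concordancer."""
    return [
        phrase
        for phrase, category in labelled_matches
        if category in CHECK_WITH_CONCORDANCER
    ]


# Hypothetical labelling of three highlighted matches, for illustration only.
labelled = [
    ("the existing body of knowledge", MatchCategory.UNSURE),
    ("fluid time scale", MatchCategory.DISCIPLINE_SPECIFIC),
    ("a bibliographic reference", MatchCategory.NOT_RELEVANT),
]

print(for_step_two(labelled))
```

Matches labelled "Too close" are left out of the list because, according to Table 1, they need to be paraphrased or rewritten rather than verified.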
Step 2: Concordancing
Writers can use an existing corpus (a body of written works) such as the Springer Exemplar (available at http://www.springerexemplar.com), or create their own corpus using the articles they have cited in their literature reviews. In the latter situation, students will also need to download concordancer software such as ConcApp© (available for purchase) or AdTAT© (the latter is a freeware program that has been developed at the University of Adelaide: http://www.adelaide.edu.au/red/adtat/). Students then search the corpus for instances of the phrases that have been identified as unoriginal by text-matching software such as Turnitin, but which they suspect are legitimately recycled (that is, they might be discipline-specific terms or standard academic expressions). If the phrase appears a number of times, it is likely that, rather than constituting plagiarism, it is the accepted language used by the discipline. Conversely, if it appears only once, it may indeed be a case of plagiarism.
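The decision rule at the end of Step 2 can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the procedure implemented by any concordancer: the choice to count the number of corpus documents containing the phrase, the threshold `min_documents=2`, the "not found" outcome, and the toy corpus are all introduced here for the sake of the example.

```python
import re


def documents_containing(corpus, phrase):
    """Count how many corpus documents contain `phrase` (case-insensitive)."""
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return sum(1 for text in corpus.values() if pattern.search(text))


def triage(corpus, phrase, min_documents=2):
    """Suggest how to treat a text-matched phrase.

    `min_documents` is an assumed threshold for "a number of times".
    """
    hits = documents_containing(corpus, phrase)
    if hits >= min_documents:
        return "likely accepted disciplinary language"
    if hits == 1:
        return "possible plagiarism: compare with the single source"
    return "not found in corpus"


# Hypothetical corpus keyed by document name, for illustration only.
corpus = {
    "article_a": "To the best of my knowledge, no study has measured this.",
    "article_b": "This is, to the best of my knowledge, the first report.",
    "article_c": "The results were strikingly at odds with every earlier account.",
}

print(triage(corpus, "to the best of my knowledge"))
print(triage(corpus, "strikingly at odds with every earlier account"))
```

Counting documents rather than raw occurrences is a deliberate simplification: a phrase repeated many times within one article is still evidence of only one author's usage. The result remains a prompt for the writer's judgement and does not settle the question on its own.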

Developing authorial voice
In what follows, we present two examples of how "Try it on" has been used to respond to different kinds of text-matching issues. In the first, there is too little sense of the student's own voice; in the second, there is too little sense of the required disciplinary language.

Student 1. Too much matching = 48%
The first situation involved a student who exhibited the common errors that result from patchwriting (Howard, 1995) and poor note-taking, in which large chunks of text are copied directly from the source document and there is very little evidence of the author's own contribution to the discussion. The major challenge here was to create a sense of authority in the writing through developing the student's own voice; that is, the student must be encouraged to offer confident opinions and critical assessments of the published ideas, to present him/herself as a researcher entering into dialogue with the discipline (Chanock, 2008; Wisker, 2005). In Zhao and Llosa's (2008) terms listed above, these elements draw on the first of their categories (assertiveness) and the fourth (authorial presence and autonomy of thought).
For this particular student, the first attempt at preparing a doctoral research proposal resulted in a 48% match in the Turnitin report, 26% of which was attributed to one article which served as the basis for establishing the research gap to be filled by the doctoral project. However, the other 22% comprised a range of matches to academic publications.
In establishing the research gap, this student relied heavily on a single text that was reproduced in the initial research proposal with only minimal changes or interventions from the student's voice. A major task was for the student to consider what precisely he himself needed to take from this article, and to interweave the voices from other texts into the discussion as well. We advised the student to step back from the secondary sources, put on the "cloak" of authoritative researcher in the discipline, and to prepare an outline of his own proposal that allowed him to identify the central points he wanted to present in his discussion. This process stripped away the reliance on the plagiarised text and created a new structure for the argument that allowed for a more overt authorial presence value-adding to existing debates on the topic.
The revised text reduced the match to the plagiarised article from an initial 26% to 16%. While still recycling more material than is acceptable, the student is gradually developing an ability to organise his own ideas about the field and to comment on the material presented, as well as going to the original sources, rather than simply copying what the main article had to say about these other studies.

Table 2 Text-matches for Student 1
Italics represent text-matches highlighted by the Turnitin report (that is, all of the text presented in the left-hand column). Underlined text draws attention to copied text remaining from the first report.

The left-hand column of Table 2 shows the text copied directly from one source that summarises the experiment performed in another study (that is, taken from the literature review of the key article). In the right-hand column we see that the student has now sought out the paper for himself, and has gleaned from it the points that are relevant to his own discussion. Unnecessary detail has now been eliminated, and the student has digested the material and integrated it into his own argument. The remaining matches are sometimes to disciplinary language confirmed through a concordancer search (e.g., "fluid time scale"), but other matches remain too close to the original source text to be acceptable. Clearly the student still has some distance to go before he is writing in a confident and appropriate academic voice, but we can at least see progress here.

Left-hand column: 48% matched text with Turnitin (entire paragraph copied)
Right-hand column: 16% matched text with Turnitin (student's own voice starting to appear): Ishima et al. (1993a)

Student 2. Too original = 0%
The second story is that of an EAL doctoral student whose first attempt at writing a research proposal resulted in a 0% matching report from Turnitin. While this might be possible, even desirable, in some disciplines, such a report in many Science, Technology, Engineering and Mathematics (STEM) disciplines requires further scrutiny. It is highly likely that the text is in fact "too original" in that it does not use the idiomatic phrases and terms one would expect in standard academic writing. As Pecorari (2009, p. 102) reminds us: "one of the reasons why lexical bundles are of practical interest is that they are part of fluent, native-like expression, and are therefore an important aspect of language learning." Without matching or recycling these phrases, the writing can become awkward, even inaccurate, as well as lacking the intertextuality that is required of scholarly writing (Pecorari & Shaw, 2012).
In this instance, an academic developer worked with the student and supervisor through a number of drafts of the document to identify expressions that did not meet the norms of the discipline. While the text displayed the components of voice outlined by the "Voice intensity rating scale" (Helms-Park & Stapleton, 2003; Zhao & Llosa, 2008), there was still a significant amount of work to be done in terms of vocabulary choices, word forms, verb tenses and phrasing. Examples of the kinds of errors are presented in Table 3 below, comparing the second iteration (which returned a text-match of 2%) and a later version (now up to 13% text-matching).

Table 3 Text-matches for Student 2
Italics in the middle column represent text-matches highlighted by the Turnitin report. Underlined text draws attention to altered text from the first report leading to text-match.

Earlier version: There are different types of T effector cells.
Later version: To date, different types of T helper cells have been described
Reason for match in draft 2: Correct noun has now resulted in a match

Earlier version: The differentiation of the T cell takes place in the thymus
Later version: The differentiation of the T cell begins in the thymus.
Reason for match in draft 2: Correct verb has now resulted in a match

Earlier version: And any defect in the Th2 cell type leads to atopic asthma and allergy. IL4 is majorly responsible for Th2 differentiation.
Later version: And any defect in the Th2 cell type leads to asthma and allergy [ref]. ... The cytokines TGF-β, IL6, IL-21 and IL-23 play a major role in the differentiation of naive T helper cells into Th17 cells.
Reason for match in draft 2: Idiomatic term has now resulted in a match

Earlier version: CD40L is highly expressed in these cells. CD40L activates, proliferates and differentiates B cells.
Later version: CD40L which is highly expressed on these cells help in the activation, proliferation and differentiation of B cells.
Reason for match in draft 2: Nominalisation has now resulted in a match

Throughout the document a range of inappropriate word choices were identified and rectified, including wrong verb or noun, unidiomatic phrases, and use of verbs rather than nominalised forms. Wrong choices in vocabulary can sometimes be the result of translation issues, or attempts to reword information using synonyms, but unfortunately result in unacceptable expression. The same applies to awkward reconstructions of unidiomatic phrases that, in the worst cases, obscure meaning. In such situations, one can sympathise with students who claim that the imperative to express ideas in their own words seems to lead to further complications for the author (Abasi & Akbari, 2008). However, the task here is to develop an authorial voice that more closely resembles the requirements of the disciplinary community.
In this particular case the student was instructed in the language expectations of the discipline as well as some of the more general norms of academic writing, for example, the preference for nominalisation and high levels of lexical density. The outcome was a revised text that more closely matched the expectations of the academic discipline and recycled the language that demonstrated the writer's developing identity as a member of the target discourse community. Concordancing was used to verify the common usage of idiomatic phrases, which now result in the matches indicated in Table 3. Thus, the student's voice as a "legitimate peripheral" (Lave & Wenger, 1991) member of the disciplinary community is gradually coming into being.

Conclusion
When we say we are concerned with students developing their own voices in writing as a way to avoid plagiarism, this can be understood as their process of developing a greater sense of confidence and authority in the persona they create in the text (Hirvela & Belcher, 2001; Matsuda & Tardy, 2007; Ramanathan & Atkinson, 1999; Thompson, 2005). We would argue that this is just as much a matter of learning how closely they need to resemble the standard language expectations of their discipline, as it is a matter of finding new words and phrases to express their ideas. Thus, it is a process of learning to distinguish between the elements of writing that are available for recycling and those which are not: which items of clothing are available from the second-hand rack, and those which must remain in the wardrobes of their original owners.