Comparative analysis of the sports chronicle quality written by artificial intelligence and journalists


Universidad de Castilla-La Mancha, Spain

Abstract

Introduction: In recent years, the presence of artificial intelligence programs in newsrooms has become standardised, which has allowed the writing of texts through algorithms. This research analyses the characteristics of sports chronicles written by artificial intelligence, comparing them with those written by journalists. The aim is to find out whether these texts meet the same quality standards as chronicles written by journalists. Methodology: The methodology is based on the content analysis of 28 journalistic chronicles, 14 written by journalists and 14 written by artificial intelligence programmes. In all cases, these are sports chronicles on matches of the Spanish Professional Football League (first and second division). Results: The results show that AI-generated texts are effective in collecting and ordering data and in reporting game action. However, they lack many of the qualities of a sports chronicle that are present in texts written by journalists. Conclusions: The chronicles produced by artificial intelligence do not represent a quality contribution to the journalistic genre, as they lack the analytical and interpretative character traditionally present in sports journalism.

Análisis comparado de la calidad de crónicas deportivas elaboradas por inteligencia artificial y periodistas

RESUMEN

Introducción: En los últimos años se ha normalizado la presencia de programas de inteligencia artificial en las redacciones periodísticas, lo que ha permitido la redacción de textos a través de algoritmos. La presente investigación analiza las características de las crónicas deportivas realizadas por inteligencia artificial, comparándolas con las realizadas por periodistas. El objetivo es conocer si este tipo de textos cuentan con los mismos estándares de calidad de los que disponen las crónicas realizadas por periodistas. Metodología: La metodología empleada parte del análisis de contenido de 28 crónicas periodísticas, 14 escritas por periodistas y 14 escritas mediante programas de inteligencia artificial. En todos los casos se trata de crónicas deportivas sobre partidos correspondientes a la Liga de Fútbol Profesional (de primera y segunda división). Resultados: Los resultados demuestran que los textos generados mediante la inteligencia artificial son eficaces a la hora de recoger y ordenar datos, así como dar a conocer las acciones del juego. Sin embargo, carecen de muchas de las cualidades de la crónica deportiva, presentes en los textos firmados por periodistas. Conclusiones: Las crónicas producidas por la inteligencia artificial no suponen un aporte de calidad al género periodístico, careciendo de carácter analítico o interpretativo, en ambos casos, cualidades presentes tradicionalmente en el periodismo deportivo.

PALABRAS CLAVE: inteligencia artificial; periodismo deportivo; producción periodística; crónica; calidad; periodismo; automatización de contenidos.

(Rubén Ramos Antón*) This author receives funding from the European Regional Development Fund (ERDF), call 2018/11744. (Luis Mauricio Calvo Rubio**) This author receives funding from the European Regional Development Fund (ERDF), call 2020/3771.

How to cite this article / Normalized reference

Murcia Verdú, F. J., Ramos Antón, R. y Calvo Rubio, L. M. (2022). Análisis comparado de la calidad de crónicas deportivas elaboradas por inteligencia artificial y periodistas. Revista Latina de Comunicación Social, 80, 91-111. https://doi.org/10.4185/RLCS-2022-1553

Translation by Paula González (Universidad Católica Andrés Bello, Venezuela)

Keywords

artificial intelligence, sports journalism, journalistic production, chronicle, quality, journalism, content automation

Introduction

Artificial intelligence has been incorporated into the dynamics of the media in recent years, coinciding with a time of widespread demand for content, in the full digital explosion. Although the very concept of artificial intelligence dates back to Alan Turing, in the middle of the last century, its possible application to journalism was not considered until the first decade of this century (Matsumoto et al., 2007). Its incorporation into the professional dynamics of the media represents an opportunity, as has happened with other elements of the digital ecosystem, such as social networks, mobile phones, or online video (Cornia et al., 2016). Likewise, the so-called data journalism, despite responding to an unspecific definition (Broussard, 2018, p. 91), also arises as a result of recent technological advances in newsrooms.

The incorporation of artificial intelligence into journalism, besides freeing journalists to dedicate themselves to other tasks and favoring the production of articles much more quickly (Sim and Shin, 2016), also raises many questions. Among them, Carlson (2015) identifies some that encompass issues as important as the style of writing and the production of the news. This technology is useful in generating routine news on repetitive topics for which clear, precise, and structured data is available (Graefe, 2016).

Its presence in journalistic newsrooms has aroused academic interest from various perspectives (Calvo-Rubio and Ufarte-Ruiz, 2021; Parrat-Fernández et al., 2021; Túñez, 2021). Studies have analyzed the possibilities it offers to combat disinformation (Manfredi and Ufarte, 2020), the profound changes that it will entail, even in the design of newsrooms (Ali and Hassoun, 2019), and the opportunity it represents for the media at a time when they need rapid content generation with lower production costs (Montal and Reich, 2016).

Although journalism is considered a human activity, some authors (Miroshnichenko, 2018) claim that it is already possible for robots to carry out journalistic work to the point of displacing professionals. The research carried out by Clerwall (2014) showed that readers were not able to differentiate news created by journalists from news produced by artificial intelligence systems. Graefe et al. (2016) found that readers gave greater credibility to news written by artificial intelligence that addressed topics related to football or the economy. Regarding the structure and writing of texts, the work of Túñez-López, Toural-Bran, and Valdiviezo-Abad (2019) does not find differences between those produced by humans and by artificial intelligence, although differences can be found in the approach and journalistic genre. This perspective is contrary to the one adopted by Diakopoulos (2019), who asserts that a human presence is necessary in what he defines as “algorithmic media” (p. 339), and by Murcia and Ufarte (2019), who warn about the risks to journalistic quality if this practice becomes generalized.

The relationship between artificial intelligence and human beings in newsrooms, as is the case in general with any technological advance in society, is subject to continuous negotiation (Lewis, Guzman, and Schmidt, 2019). The implementation of these techniques forces us to rethink the role played by journalists and, from an ethical perspective, requires the media to be transparent about the use of this type of tool. Readers have the right to know if a piece is written by a journalist or by an artificial intelligence program (Ufarte, Calvo, and Murcia, 2021). Lemelshtrich (2018) argues that to compete with artificial intelligence, human journalists must think differently, experiment with new forms of expression to tell their stories, or have a deeper multidisciplinary training in arts and philosophy, in the search for more creative and innovative solutions (p. 26). That is, they must delve into the most human qualities of journalism while becoming familiar with the possibilities offered by new technologies. Following this line, the study carried out by Túñez, Fieiras, and Vaz-Álvarez (2021), compiling the vision of academics and company managers in the Spanish sphere, indicates that the automation of journalism may acquire more importance in the coverage of topics with structured data than in local events, for which the encouragement of contributions from journalists is suggested.

Sports journalism, artificial intelligence, and quality

Despite its relevance in contemporary media, sports journalism has rarely been an object of academic analysis, in contrast with the attention that other areas of the media industry usually receive (English, 2016).

At a technological level, sports journalism has known how to innovate, incorporating advances such as data journalism into its chronicles and broadcasts (Rojas, 2014; Horky and Pelka, 2017) or adding 360º video to the usual narrative of journalistic coverage (Martínez and Torrado, 2017, p. 154). Not surprisingly, as Boyle and Haynes (2009) point out, one aspect that has traditionally characterized the culture of sports fans has been the constant need for speed in the transmission of information, something that the incorporation of these technologies favors (p. 183). This need for sports-related content has involved not only the media but also sports institutions, which have deployed different strategies on social networks to strengthen their bond with fans (Thompson et al., 2017).

In 2010, artificial intelligence projects linked to the production of sports news began to be developed (Bunz, 2010). In the Spanish sphere, it has been incorporated into the newsrooms for documentation or the preparation of chronicles, such as those carried out by BeSoccer (Segarra-Saavedra, Cristòfol, and Martínez-Sala, 2019) or AnaFut in El Confidencial (Rojas and Toural, 2019). In both cases, the opportunity that the use of this technology represents for the future of journalism is highlighted, in an area in which data analysis is so standardized.

Along the same lines, Galily (2018) argues that artificial intelligence applied to sports journalism may involve the incorporation of a greater number of professionals dedicated to generating, collecting, and managing the data that feed artificial intelligence generation processes. Technological innovations also alter the traditional role of the sports journalist, with the professional taking on the role of moderator rather than mediator (Perreault and Bell, 2020) or focusing on the search for journalistic stories about the competition or its protagonists (Rojas, 2019).

Traditionally, sports journalism has been accompanied by a perception of a lack of seriousness in contrast to that of other types of fields (English, 2017). In research carried out in Chile, Scherman and Mellado (2019) detected that sports journalism uses fewer sources of information, integrates fewer points of view, and presents less verifiable information to the public. Similar results were found in the research carried out in Spain and Mexico by Márquez-Ramírez and Rojas (2017) on the informative treatment of one of the biggest corruption cases in world soccer, the so-called FIFA Gate. In the Spanish case, this research found that the level of proactivity of El País doubled that of the newspaper Marca. However, changes in professional practices are being seen throughout the world (Weedon et al., 2016; Goikoetxea and Ramírez, 2020), which could result in the search for models more focused on quality.

Among the characteristics of sports language are passion (Naranjo de Arcos, 2011, p. 243; Loaiza, 2018, p. 221) and the habitual use of warlike or epic phrases, especially visible in some genres such as the journalistic chronicle (Arroyo and García, 2012; Martínez, 2018), where many of the usual techniques of the so-called new journalism were tried out from an early stage (Sánchez and Armañanzas, 2009).

The chronicle is usually understood as one of the most relevant genres of sports journalism, coming to be described as its “queen” (Sobrados, 2009, p. 82), with an outstanding role in sports such as soccer. A genre that stands out for its richness, which is prolific in the use of literary resources (Dauncey and Cooke, 2020) and rhetorical figures, such as metaphors (Kovljanin, 2018), anaphora (Quintero, 2015), hyperboles (Quintero and Hernández, 2019), or personifications (López, 2019).

It is a creative story in which the author’s style can be appreciated (Sobrados, 2009, p. 90) and which draws on the hybridization of current journalism, in which the chronicler, beyond informing, aims to delight the reader (Román, 2015). Marín (2000) considers the genre an “escape”, which gives “free rein, without exaggeration, to the cultural and literary virtues of sports journalism.” The nature of the chronicle also makes it possible to stimulate subjectivity and attachment to certain values as a mechanism for the cohesion of readers and fan bases (Naranjo de Arcos, 2011, p. 347).

Although the academy has addressed these issues from various fields, there has been no proliferation of works that simultaneously address sports journalism, artificial intelligence, and quality, so this research aims to provide knowledge on the subject.

Objectives

This research arises with the following general objective:

• To compare the journalistic quality between the sports chronicles prepared by artificial intelligence systems and those arising from the hand of a journalist.

Furthermore, it has two specific objectives that are extracted from the general objective:

• Determine the ability of artificial intelligence to reproduce the characteristics of sports chronicles.

• Establish the characteristics that define the chronicles elaborated through algorithms.

To help achieve these objectives, a series of research questions have been raised, which we will try to answer. They are the following:

P1. Do the automatically produced chronicles meet the characteristics of journalistic chronicles?

P2. Can artificial intelligence reproduce the passion/emphasis that characterizes sports chronicles?

P3. Do the chronicles elaborated by algorithms maintain evaluative elements to be able to fit them into the genres of interpretation?

The starting hypotheses are the following:

H1. The chronicles elaborated by artificial intelligence systems do not reach the quality of the chronicles made by humans as they do not comply with the characteristics of the genre.

H2. The lack of interpretation is one of the obstacles that limit the quality of the chronicles produced by machines.

Methodology

To develop the fieldwork, content analysis techniques have been used which, according to Bardin (2002, p. 32), allow “to obtain indicators (quantitative or not) through systematic and objective description procedures of the content of the messages, allowing the inference of knowledge relative to the conditions of production/reception (inferred variables) of these messages”.

As a first step, the object of study was defined (Abela, 2002): journalistic chronicles of the same soccer match elaborated by an artificial intelligence system and by a journalist. The sample is made up of 28 sports chronicles: 14 prepared by artificial intelligence systems and the same number by journalists. The size of the sample is justified by the concept of saturation: the analysis reached a point where increasing the number of cases did not provide significant new information regarding the stated objective (Berteaux, 1980; Callejo, 1998).

The selection has responded to convenience criteria. The authors contacted Narrativa, a Spanish company that emerged in 2015 and has become a benchmark in natural language generation systems, which provided 14 chronicles of football matches created by algorithms and published in the media. Half of these texts correspond to LaLiga Santander (first division) matches and the other half to LaLiga Smartbank (second division) matches. In the first case, the matches were held between October 2nd and 3rd, 2021. In the second, they correspond to the period between September 19th and 26th, 2021.

To compare each of these pieces, chronicles of the same matches published in provincial and national media and written by journalists have been located. Following these criteria, the sample has been made up as follows:

Table 1: Journalistic pieces that are part of the sample.

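| AI: ID | AI: Publication date | AI: Media | Match | Category | Author: Media | Author: Publication date | Author: ID |
|---|---|---|---|---|---|---|---|
| 1 | 2/10/21 | Encancha.cl | Cádiz-Valencia | First division | Diario de Cádiz | 2/10/21 | 15 |
| 2 | 3/10/21 | Encancha.cl | Granada-Sevilla | First division | Granada Hoy | 3/10/21 | 16 |
| 3 | 2/10/21 | Encancha.cl | Osasuna-Rayo Vallecano | First division | Diario de Navarra | 2/10/21 | 17 |
| 4 | 2/10/21 | Infobae | Atlético de Madrid - F.C. Barcelona | First division | As | 3/10/21 | 18 |
| 5 | 3/10/21 | Infobae | Espanyol-Real Madrid | First division | Mundo Deportivo | 3/10/21 | 19 |
| 6 | 3/10/21 | Infobae | Getafe-Real Sociedad | First division | ElDesmarque | 3/10/21 | 20 |
| 7 | 2/10/21 | Infobae | Mallorca-Levante | First division | Última Hora | 2/10/21 | 21 |
| 8 | 19/09/21 | Sport | Huesca-Fuenlabrada | Second division | Marca | 19/09/21 | 22 |
| 9 | 26/09/21 | Sport | Valladolid-Alcorcón | Second division | Marca | 26/09/21 | 23 |
| 10 | 12/09/21 | Sport | Ponferradina-Almería | Second division | Diario de Almería | 12/09/21 | 24 |
| 11 | 12/09/21 | Sport | Oviedo-Cartagena | Second division | La Voz de Asturias | 13/09/21 | 25 |
| 12 | 26/09/21 | Sport | Leganés-Mirandés | Second division | As | 26/09/21 | 26 |
| 13 | 13/09/21 | Sport | Lugo-Huesca | Second division | La Voz de Galicia | 14/09/21 | 27 |
| 14 | 26/09/21 | Sport | Tenerife-Valladolid | Second division | El Dorsal | 12/09/21 | 28 |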

Source: Own elaboration.

Each of these chronicles was established as a recording unit.

To assess the quality of the journalistic chronicles, an analysis sheet was prepared, made up of 11 dimensions and 20 variables. The selection of variables is based on the Journalistic Added Value (VAP for the acronym in Spanish of Valor Agregado Periodístico), an instrument developed by researchers from the School of Journalism of the Universidad Católica de Chile that constitutes one of the most solid attempts to evaluate the quality of journalistic content (Alessandri et al., 2001). In the words of Pellegrini and Mujica (2006, p. 15), the VAP is “what the medium adds to the information that the public could directly obtain”. This methodology is based on quality studies of journalistic works in different fields (Rodríguez, 2012; García, 2018; Pérez, 2013; Pérez and Luque, 2004).

To a selection of the VAP variables, others related to the characteristics of sports chronicles have been incorporated. The result has been an analysis sheet with the following variables and coding.

Table 2: Analysis sheet

| Dimension | Variable | Definition |
|---|---|---|
| Identification | Date | Publication date |
| | Media | Media in which it was published |
| | Drafting | Authorship of the piece (artificial intelligence/human) |
| Sources | Documentary | Presence of documentary sources (yes/no) |
| | Personal | Presence of personal sources (yes/no) |
| Narrative structure | Narration type | Type of structure that the text follows to narrate the match: inverted pyramid -from the most relevant to the least relevant-, chronological -account of events following a time order-, and mixed -use of characteristics of the two previous formats-. |
| Style | Number of evaluative adjectives | The number of adjectives that incorporate qualities to the noun they accompany and that imply an assessment of the author. |
| | Number of attribution verbs | The number of verbs used to indicate who is the author of a quote (affirmed, regretted, said, etc.) |
| | Use of literary resources | The presence of rhetorical figures used to increase the expressiveness of the text (metaphors, hyperbole, metonymy, etc.). Three sections are established according to the amount of these resources: little (less than five figures in the text), average (between six and ten), and a lot (more than 11) |
| Word frequency | Most used terms | Words that appear most frequently in the text |
| Content | Number of verifiable data | The total amount of data that can be verified with other sources. It is the sum of the following three variables. |
| | Number of verifiable data-background | Amount of data that can be verified with other sources and that are linked to circumstances that occurred before the events that are narrated |
| | Number of verifiable data-consequences | Amount of data that can be verified with other sources and that are linked to future events or circumstances related to the events that are narrated |
| | Number of verifiable data-development | Amount of data that can be verified with other sources and that are linked to the development of the events that are narrated |
| | Journalistic observation | Presence of evaluation elements of the author in the text. |
| Headline | Type of headline | Typology of the headline used in the piece: informative -they explain what happened concerning the protagonist-, expressive -they seek the impact on the reader through emotions-, or appellative -they seek to attract the reader’s attention and do not provide information about the fact itself-. |
| Multiperspective | Presence of different points of view | Presence in the text of different versions of the same event. It can take the following values: a single point of view, a point of view regarding another version, or a mix of points of view. |
| | Bias | Clear orientation of the text in favor of one of the participating teams (Yes/No) |
| Understanding | Level of understanding | Ease of understanding what happened through the text. It can adopt the following values: It is not understood -the text does not allow the reader to determine what happened-; No, for the most part -understanding presents many difficulties-; Yes, mostly -it is possible to easily determine what happened, although some parts are confusing-; Yes, totally -the text is clear and allows us to know exactly what happened-. |
| Strength | Emphasis | The attitude of the narrator, characterized by the force of expression, the intensity, and the emotion. It can adopt the following values: Opinion -when the emphasis is based on the personal vision of the narrator-, Speculative -the creator of the text relies on unfounded ideas or thoughts-, and Factual -when the narrator sticks to the facts-. |

Source: Own elaboration.

To avoid coding bias, the analysis was performed by three coders simultaneously, which can be understood as a “triangulation of researchers” that provides validity and reliability to the results (Martínez, 2006).

For the analysis of word frequency, the Nvivo 11 tool has been used. All the data have been entered into tables to facilitate their analysis.

Results/Discussion

Use of sources

In the selected chronicles, both those made by algorithms and those written by humans, no documentary sources appear. The immediacy of digital media means that chronicles are published right after the match, leaving no time to consult other types of sources, such as archives or match documents like the referee’s report, to delve deeper into the causes of player expulsions and other incidents.

Personal sources are rarely used in the selected chronicles, whether prepared with artificial intelligence or written by journalists. Only one personal source appears, in the Cádiz-Valencia match, and it is related to a precedent (ID-15). The only text that has this type of source is written by a human and draws on sports news from throughout the week, something that artificial intelligence does not use in its texts.

Narrative structure

The narrative structure is significant if the data is crossed with the type of authorship of the chronicle (AI/Human). Because of its reliance on statistics, the artificial intelligence of Narrativa uses a mixed structure in 100% of its texts: the most important elements appear at the beginning (inverted pyramid), followed by paragraphs that narrate events chronologically. Furthermore, most of them conclude with important data such as bookings and the overall La Liga standings.

The type of structure changes substantially when the text is written by a journalist. Of the seven texts corresponding to the first division, five are written chronologically, one uses the inverted pyramid, and one has a mixed organization. In contrast, 100% of the second division chronicles are written chronologically. Therefore, considering the global corpus of texts written by humans, 85.71% are written chronologically, 7.14% use the inverted pyramid, and the other 7.14% present a mixed narrative.
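The shares reported above follow directly from the per-division counts. As a minimal sketch, assuming only the counts stated in this section (the variable names are illustrative and not part of the study's tooling), the calculation can be reproduced as follows:

```python
from collections import Counter

# Narrative structure counts for journalist-written chronicles, as reported in the text.
first_division = Counter({"chronological": 5, "inverted pyramid": 1, "mixed": 1})
second_division = Counter({"chronological": 7})

# Combine both divisions and count the texts.
total = first_division + second_division
n_texts = sum(total.values())

# Share of each structure over all journalist-written texts, in percent.
shares = {
    structure: round(100 * count / n_texts, 2)
    for structure, count in total.items()
}

print(n_texts)
print(shares)
```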

Style

Style is what determines the greatest differences between the texts written by artificial intelligence and those composed by a journalist. The evaluative adjectives, the attribution verbs (in the texts in which personal sources appear), and the rhetorical figures make up an interpretive addition that enriches the text and that is a common feature in the pieces written by humans.

The count of evaluative adjectives is an important indicator of interpretation in sports chronicles. The pieces elaborated by algorithms contain eight evaluative adjectives in total, 4.47% of the adjectives counted and an average of 0.57 per piece. This is well below the figure for journalists’ texts, where the sum rises to 171, which represents 95.53% of the whole and an average of 12.21 per piece.

It is also significant that the category of the competition affects the number of adjectives used by journalists. While 101 are counted in the first division, the number falls to 70 in the second. The average number of evaluative adjectives in chronicles of the highest category is 14.43, and these adjectives represent 59.06% of those found in journalists’ texts. In the silver division, the evaluative presence is reduced to an average of ten adjectives per piece, 40.94% of those found in journalists’ texts. The length of the chronicle, the greater or lesser degree of bias, and proximity (local or national environment) influence how much assessment is used in the text. Some of the adjectives that can be read throughout the texts prepared by journalists are “lethal” (ID-22), “unleashed” (ID-23), “debatable” (ID-24), “bittersweet” (ID-20), “shameless” (ID-16), “disastrous” (ID-19), and “sharp” (ID-28), among others. In those made by algorithms, the one that is repeated the most is “positive” (ID-13), and the adjectives “correct” and “disputed” also appear (ID-10).
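The adjective figures in this subsection can be checked with a few lines of arithmetic. The sketch below assumes only the counts reported in the text (eight adjectives in the AI texts, 101 and 70 in the first and second division journalist texts, 14 texts per group, and seven per division). The helper `pct` and the variable names are hypothetical and introduced here for illustration.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

# Counts reported in the text.
ai_adjectives = 8
human_by_division = {"first": 101, "second": 70}
texts_per_group = 14      # 14 AI texts and 14 journalist texts
texts_per_division = 7    # 7 journalist texts per division

human_adjectives = sum(human_by_division.values())
all_adjectives = ai_adjectives + human_adjectives

# Shares of the whole corpus and averages per piece.
ai_share = pct(ai_adjectives, all_adjectives)
human_share = pct(human_adjectives, all_adjectives)
ai_mean = round(ai_adjectives / texts_per_group, 2)
human_mean = round(human_adjectives / texts_per_group, 2)

# Per-division averages and shares within the journalist texts.
division_mean = {d: round(n / texts_per_division, 2) for d, n in human_by_division.items()}
division_share = {d: pct(n, human_adjectives) for d, n in human_by_division.items()}

print(ai_share, human_share, ai_mean, human_mean)
print(division_mean, division_share)
```

Note that the averages per piece are plain ratios, not percentages, and that the first division average rounds to 14.43 at two decimals.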

Attribution verbs are directly related to personal sources. In this selection, attribution verbs are practically absent: only in the Cádiz-Valencia match can one of them be read. Specifically, “announced” (ID-15) is the verb used to introduce a personal source, in a text written by a journalist. It is a resource that is rarely used in the global corpus, but it is indicative of knowledge and monitoring of the topic.

The use of literary resources is an indicator that serves to differentiate between human and artificial intelligence authorship. When the number of rhetorical figures per text is classified as little (0-5), average (6-10), or a lot (>11), it can be seen that the texts written by humans predominantly include figures of speech. No text elaborated by algorithms exceeds three rhetorical figures. The 14 texts written with artificial intelligence contain nine rhetorical figures in total, an average of 0.64 per text, which places their use in the range of zero to five (little). The texts written by humans incorporate 184 figures of speech, an average of 13.14 per text, which places their use in the range of >11 (a lot). Of the 193 rhetorical figures in the global corpus, those used in texts produced by artificial intelligence represent only 4.66%, and those written by journalists the remaining 95.34%.

If broken down by categories, 131 rhetorical figures of the texts written by journalists correspond to the top division of Spanish football, with an average of 18.71 (a lot) per text, and account for 71.20% of the figures in journalists’ texts. In the lower category, the number of figures of speech is limited to 53, which means an average of 7.57 per text (6-10, average use) and 28.80% of the literary resources used by journalists. Some examples of rhetorical figures that appear in texts written by humans are: anaphora such as “and chew and chew” (ID-18), personifications such as “Granada CF is already smiling” (ID-16), hyperboles such as “the stadium and the celebration of the pichichi exploded” (ID-19), similes like “Pedrosa who arrived like a motorcycle” (ID-19), metaphors like “they turned the grass of the Metropolitano into a labyrinth of mirrors” (ID-18), and metonymies like “they were locked with a yellow to Merino” (ID-20), among other resources and examples.
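The rhetorical-figure averages, shares, and usage bands can be reproduced from the counts given above. This is a sketch under stated assumptions: the counts are those reported in the text (nine figures in AI texts, 131 and 53 in first and second division journalist texts), and the function `usage_band` is hypothetical. Because the analysis sheet does not say how boundary values such as exactly 5 or 11 are assigned, the thresholds below are an assumption made only so that every value falls into a band.

```python
def usage_band(figures_per_text):
    """Map a count (or average) of rhetorical figures to a usage band.

    Assumption: up to 5 is "little", above 5 and up to 10 is "average",
    anything higher is "a lot". The source leaves the boundaries ambiguous.
    """
    if figures_per_text <= 5:
        return "little"
    if figures_per_text <= 10:
        return "average"
    return "a lot"

# Counts reported in the text.
ai_figures = 9
human_figures = {"first division": 131, "second division": 53}
texts_per_group = 14
texts_per_division = 7

human_total = sum(human_figures.values())   # journalist-written figures
corpus_total = ai_figures + human_total     # figures in the whole corpus

ai_mean = round(ai_figures / texts_per_group, 2)
human_mean = round(human_total / texts_per_group, 2)
ai_share = round(100 * ai_figures / corpus_total, 2)
human_share = round(100 * human_total / corpus_total, 2)

print("AI:", ai_mean, usage_band(ai_mean), ai_share)
print("Journalists:", human_mean, usage_band(human_mean), human_share)

# Per-division averages, bands, and shares within the journalist texts.
for division, count in human_figures.items():
    mean = round(count / texts_per_division, 2)
    share = round(100 * count / human_total, 2)
    print(division, mean, usage_band(mean), share)
```

The per-division counts sum to 184 journalist-written figures, which together with the nine AI figures gives 193 in the global corpus. These totals are the ones that yield the 13.14 average and the 4.66% and 95.34% shares.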

Word frequency


When establishing word frequency, terms that are empty of content or that, in this context, do not add value to the study have been ruled out, such as conjunctions or connectors, nationalities and names of football players, cities, and football stadiums, among others. Words with three or fewer letters have also been discarded. It must be taken into account that the chronicles analyzed cover the same matches, some written by artificial intelligence (14) and others by journalists (14).
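The filtering rules just described can be illustrated with a short sketch. The study used the Nvivo 11 tool, so the code below is not the authors' procedure. It is a simplified stand-in that assumes a caller-supplied exclusion list and computes each percentage over all words in the text. The function name `word_frequencies` and the sample sentence are hypothetical, and the sketch does not merge singular and plural forms as the reported lists do (for example “team/s”).

```python
import re
from collections import Counter

def word_frequencies(text, excluded=frozenset(), min_length=4, top=20):
    """Return (word, count, percent) for the most frequent retained words.

    Words shorter than min_length or present in `excluded` are dropped.
    Assumption: percent is computed over all words in the text.
    """
    # Tokenize into runs of letters (accented letters included), lowercased.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    kept = [t for t in tokens if len(t) >= min_length and t not in excluded]
    counts = Counter(kept)
    total = len(tokens)
    return [(word, n, round(100 * n / total, 2)) for word, n in counts.most_common(top)]

# Hypothetical sample text and exclusion list (a city name is excluded).
sample = "The team scored and the team won the match in Cádiz"
frequencies = word_frequencies(sample, excluded={"cádiz"})
print(frequencies)
```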

The word frequency of the global corpus of texts written by artificial intelligence shows that a small set of terms is repeated very often in the journalistic texts created by the Narrativa software. This is the case of the terms “team/s”, which is repeated 61 times and accounts for 1.40% of all the words used in these journalistic pieces; “match/es”, with 53 repetitions and a 1.21% frequency; “point/s”, with 49 uses and 1.12%; and “part”, with 48 repetitions and 1.10%, which is the last word that exceeds 1% presence in journalistic texts written using artificial intelligence. Therefore, some words clearly prevail over others and there is no balance in the use of terms.

https://s3-us-west-2.amazonaws.com/typeset-prod-media-server/3d082c9e-0bf5-4816-b96e-c509c7f3cef5image6.jpeg
Figure 1: Cloud of words most used in texts written with artificial intelligence.

Source: Own elaboration with the Nvivo 11 tool.

Quite the opposite happens with texts written by humans. Although some words are used most often because of the very need to narrate a football match, their frequency is much more evenly distributed. The two words that appear most often in the texts written by journalists, “minute/s” and “first/s”, are each used on 40 occasions and account for 0.45% of the words in the 14 sports chronicles. They are followed by “ball” and “match/es”, both with 36 uses and 0.40% of the corpus.

https://s3-us-west-2.amazonaws.com/typeset-prod-media-server/3d082c9e-0bf5-4816-b96e-c509c7f3cef5image7.jpeg
Figure 2: Cloud of words most used in texts written by journalists.

Source: Own elaboration with the Nvivo 11 tool.

The words that coincide in the two lists of the 20 most used are: “team/s”, “match/es”, “point/s”, “part”, “minute/s”, “local/s”, and “arrived”. Therefore, of the 20 most frequent words of each of the writing methods, only these seven are identical. The most repeated terms that appear in the texts written with artificial intelligence and that do not fall within the most used in the pieces written by journalists are: “outcome”, “marker”, “against”, “visitor/s”, “set”, “three”, “stadium”, “win/s”, “encounter”, “front”, “card/s”, “last/s”, and “home”. The words that appear exclusively in the texts written by journalists are: “first/s”, “ball”, “game”, “area”, “little”, “center/s”, “out/s”, “new”, “first”, “yellow/s”, “second”, “final/s”, and “had”.
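The overlap between the two rankings is a simple set comparison. The sketch below assumes the word labels as they appear in Table 3 (lowercased). The variable names are illustrative, and “first” is listed twice in the journalists' ranking because it appears twice in the table.

```python
# Top-20 lists with labels taken from Table 3 (lowercased).
ai_top20 = [
    "team/s", "match/es", "point/s", "part", "outcome", "marker", "against",
    "minute/s", "visitor/s", "set", "three", "stadium", "local/s", "win/s",
    "encounter", "front", "card/s", "last/s", "home", "arrived",
]
journalist_top20 = [
    "minute/s", "first", "ball", "match/es", "team/s", "game", "area",
    "arrived", "point/s", "little", "part", "center/s", "out/s", "new",
    "first", "yellow", "local/s", "second", "final/s", "had",
]

# Words present in both rankings.
shared = set(ai_top20) & set(journalist_top20)

# Entries exclusive to each ranking, kept in ranking order.
ai_only = [w for w in ai_top20 if w not in shared]
journalist_only = [w for w in journalist_top20 if w not in shared]

print(sorted(shared))
print(ai_only)
print(journalist_only)
```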

Table 3: The 20 most used words in texts written with artificial intelligence and by journalists.

| Artificial intelligence | Count | % | Journalists | Count | % |
|---|---|---|---|---|---|
| Team/s | 61 | 1.40 | Minute/s | 40 | 0.45 |
| Match/es | 53 | 1.21 | First | 40 | 0.45 |
| Point/s | 49 | 1.12 | Ball | 36 | 0.40 |
| Part | 48 | 1.10 | Match/es | 36 | 0.40 |
| Outcome | 33 | 0.76 | Team/s | 29 | 0.32 |
| Marker | 32 | 0.73 | Game | 28 | 0.31 |
| Against | 27 | 0.62 | Area | 25 | 0.28 |
| Minute/s | 26 | 0.60 | Arrived | 21 | 0.23 |
| Visitor/s | 26 | 0.60 | Point/s | 21 | 0.23 |
| Set | 24 | 0.55 | Little | 20 | 0.22 |
| Three | 24 | 0.55 | Part | 19 | 0.21 |
| Stadium | 23 | 0.53 | Center/s | 19 | 0.21 |
| Local/s | 22 | 0.50 | Out/s | 19 | 0.21 |
| Win/s | 22 | 0.50 | New | 17 | 0.19 |
| Encounter | 21 | 0.48 | First | 17 | 0.19 |
| Front | 20 | 0.46 | Yellow | 16 | 0.18 |
| Card/s | 18 | 0.41 | Local/s | 16 | 0.18 |
| Last/s | 17 | 0.39 | Second | 16 | 0.18 |
| Home | 16 | 0.37 | Final/s | 15 | 0.17 |
| Arrived | 16 | 0.37 | Had | 15 | 0.17 |

Source: Own elaboration.

In this list, a metonymy can be seen in the texts written by journalists, who use “yellow” without specifying that it refers to “card/s”. The term “yellow” appears among their 20 most used words, while “card/s” does not. In the pieces written with artificial intelligence the opposite happens: “card/s” is among the most frequent terms and “yellow” does not appear among the most used.
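The overlap between the two top-20 lists can be checked with a few lines of Python. The sketch below is only an illustration: the word labels are transcribed from Table 3 (lowercased, without the “/s” plural markers), and the helper name `split_overlap` is hypothetical. Lists are used instead of sets because “first” appears twice in the journalists' column.

```python
# Labels transcribed from Table 3, in table order.
ai_top20 = [
    "team", "match", "point", "part", "outcome", "marker", "against",
    "minute", "visitor", "set", "three", "stadium", "local", "win",
    "encounter", "front", "card", "last", "home", "arrived",
]
journalist_top20 = [
    "minute", "first", "ball", "match", "team", "game", "area", "arrived",
    "point", "little", "part", "center", "out", "new", "first", "yellow",
    "local", "second", "final", "had",
]


def split_overlap(a: list[str], b: list[str]) -> tuple[list[str], list[str]]:
    """Return (words of a also found in b, words of a not found in b)."""
    b_set = set(b)
    shared = [word for word in a if word in b_set]
    only_a = [word for word in a if word not in b_set]
    return shared, only_a


shared, ai_only = split_overlap(ai_top20, journalist_top20)
_, journalist_only = split_overlap(journalist_top20, ai_top20)

print("Shared:", shared)
print("Only in AI list:", ai_only)
print("Only in journalists' list:", journalist_only)
```

The same helper also makes the “yellow” and “card/s” contrast visible, since each term falls into a different exclusive list.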

Word frequency can also be broken down by division (first, with seven texts, and second, with another seven) and by elaboration (artificial intelligence and human intervention). The five most repeated words in first division texts written by journalists are: “ball” (24, 0.46%), “minute/s” (24, 0.46%), “match/es” (23, 0.44%), “team/s” (16, 0.31%), and “arrived” (15, 0.29%). In the first division pieces created by artificial intelligence, the frequency is as follows: “team/s” (32, 1.69%), “point/s” (26, 1.37%), “match/es” (24, 1.27%), “part” (21, 1.11%), and “marker” (19, 1%).

The frequency of words in the silver category of Spanish football also has its particularities. The most repeated terms in the second division texts written by journalists are: “game” (18, 0.48%), “minute/s” (16, 0.42%), “team/s” (13, 0.34%), “match/es” (13, 0.34%), and “ball” (12, 0.32%). In the second division texts produced with the Narrativa software, the most used terms are: “team/s” (29, 1.17%), “match/es” (29, 1.17%), “part” (27, 1.09%), “point/s” (23, 0.93%), and “result” (17, 0.69%).

The first and second division texts with their own authorship coincide in four of the five most used words and differ in only one term: “arrived” for the gold category and “game” for the second. The same analysis applied to the texts produced with artificial intelligence also yields a coincidence of four of the five most frequent words. The only variation is one word in the first division (“marker”) and another in the second division (“result”).

The differences grow when the first division texts written by journalists are compared with those written by artificial intelligence. According to the frequencies listed above, only two of the five most frequent words coincide (“team/s” and “match/es”), while three differ: “ball”, “minute/s”, and “arrived” in the texts with authorship, and “point/s”, “part”, and “marker” in the pieces created by algorithms. In the second division, the coincidence is likewise only two of the five words. There, three words are used more depending on the writing procedure: “game”, “minute/s”, and “ball” in the texts written by journalists, and “part”, “point/s”, and “result” in the pieces created by the software.

Content

Verifiable data

Data is a fundamental part of artificial intelligence. Adding up all the verifiable data (antecedents, consequences, and facts), the texts written by artificial intelligence reach 184 uses, while those written by humans reach 128. In the first division, the chronicles are more evenly matched in their use of verifiable data, with 91 in the texts written by artificial intelligence and 74 in those written by journalists (three antecedents, ten consequences, and 61 facts). In the second division, the use is less even, with 93 in the texts prepared by artificial intelligence and 54 in those by journalists.

Verifiable data about antecedents is used more by artificial intelligence, with 38 appearances (19 in the first division and 19 in the second), than in the texts prepared by journalists, with seven (three in the first and four in the second). It is evident that artificial intelligence draws on its databases of previously played games, streaks of results, and previous position in the classification when writing its journalistic texts. Some examples are the following: “Sevilla won in their last two matches of the competition against Espanyol in their fiefdom and Valencia in their stadium, by 2-0 and 3-1” (ID-2); “accumulating a total of seven defeats in the competition” (ID-6); and “Almería had just beaten Málaga 2-0 in their fiefdom in the last match held” (ID-10), among others.

As for the verifiable data that deal with consequences, 54 appearances have been identified in the texts prepared by artificial intelligence: 28 in first division matches and 26 in second division matches. The chronicles written by humans contain less than half as many, with 22 appearances, ten in the first division and 12 in the second. This shows a greater use of databases by the algorithms, which build their sports chronicles with the climb or descent in the classification and the points accumulated after the match, among other data. All the texts produced with algorithms include this type of data, such as: “the team from Cadiz is fourteenth after the end of the match” (ID-1); “With this result, Mallorca is left with 11 points” (ID-7); and “the chicharrero team is second, while Valladolid is ninth at the end of the match” (ID-14).

The number of verifiable data about the events that occurred during the match is broadly similar. A total of 99 facts have been counted in the texts prepared by journalists, compared with 92 in those created by algorithms. The main difference is found in the first division, where the texts written by humans include 61 facts, while those of artificial intelligence reach 44. In the second division, the figures are more even, with 48 uses in artificial intelligence and 38 in texts written by journalists. Verifiable data appears in all texts to a greater or lesser extent, such as: “Lugo won 3-2 against Huesca” (ID-13), “after beating Mirandés in Anduva (1-2)” (ID-26), and “a goal from Baba at 75 minutes” (ID-7), among other data.
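The totals in this subsection can be recomputed from the per-category counts given in the three preceding paragraphs. The following Python sketch is only an arithmetic check: the dictionary layout and the function names `total` and `total_by_division` are hypothetical, while the figures are those reported in the text.

```python
# Verifiable data counts by author type, category, and division,
# as reported in the text.
counts = {
    "ai": {
        "antecedents": {"first": 19, "second": 19},
        "consequences": {"first": 28, "second": 26},
        "facts": {"first": 44, "second": 48},
    },
    "journalists": {
        "antecedents": {"first": 3, "second": 4},
        "consequences": {"first": 10, "second": 12},
        "facts": {"first": 61, "second": 38},
    },
}


def total_by_division(author: str, division: str) -> int:
    """Sum the three categories for one author type and one division."""
    return sum(category[division] for category in counts[author].values())


def total(author: str) -> int:
    """Sum all categories and both divisions for one author type."""
    return sum(sum(category.values()) for category in counts[author].values())


for author in counts:
    print(
        author,
        "first:", total_by_division(author, "first"),
        "second:", total_by_division(author, "second"),
        "total:", total(author),
    )
```

Keeping the counts in one structure makes it easy to see which category drives the gap between the two kinds of text: antecedents and consequences, rather than match facts.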

Journalistic observation

Journalistic observation does not appear in any text produced by algorithms, although many of these texts repeat stock phrases such as “he faced with enthusiasm”, “he wanted to improve his situation”, and “with reinforced spirits”. The texts published by journalists, for their part, all contain journalistic observation, which is directly related to the use of adjectives.

Headline

Artificial intelligence works with data, and this can be seen in the headlines of its journalistic texts. All its writings use titles of an informative type, with a simple structure and with the result of the encounter as a common denominator (in most of them). Instead, headline types vary if they are written by journalists and depending on the category. In the first division, four would be informative, two appellative, and one expressive. On the other hand, in the second division, six of them are informative and one is appellative.

Presence of different points of view and bias

Regarding the presence of one or more points of view, there is complete homogeneity in the results obtained and all the texts selected from the corpus only have a single point of view.

Bias is a common feature in sports chronicles. While no bias is observed in the texts produced with artificial intelligence, this characteristic predominates in those written by humans. Of the seven first division chronicles written by journalists, six show a bias, while the one on the Mallorca-Levante match, published on the website Ultimahora.es, treats the two teams objectively. The trend is repeated in the second division texts written by journalists: six show a bias and one does not, specifically the chronicle of the Valladolid-Alcorcón match, published in the online version of the newspaper Marca.

Understanding

Another of the points analyzed is the comprehension of the texts. All the writings are understandable, both those produced by artificial intelligence and those written by journalists. One of the main characteristics of journalism is clarity, and this element is fulfilled in all the selected pieces.

Emphasis

The emphasis that the journalist brings to the sports chronicles serves to determine the intervention of the editor in the text. In this case, 100% of the pieces produced by algorithms have a factual emphasis, that is, they are based on facts. On the other hand, the emphasis is on opinion in 12 of the 14 human-written texts, while another combines opinion and speculation. The remaining piece presents a factual emphasis.

Discussion and conclusions

As has already been pointed out, there are not many studies that analyze the quality provided by artificial intelligence in sports journalism. From a more general perspective, the study carried out shows, in line with Lemelshtrich (2018, p. 19), that technological development is still limited when it comes to writing quality journalistic stories. Although the industry has set itself the goal of improving expressiveness or using metaphors (Goldberg, 2013), these goals have not been achieved, at least in the sample analyzed in this research.

Contrary to what was argued by Miroshnichenko (2018), the data of this research indicate that the quality of human journalists is not overvalued compared to artificial intelligence. As this author defends, algorithms can dominate newsrooms for economic reasons, but it has not been proven that they do so because they are capable of producing better texts today.

In line with Rojas and Toural (2019), the future will determine whether the natural language generated by artificial intelligence is capable of achieving a degree of elaboration in its texts that matches that of professional journalists.

The comparative analysis of the 28 sports chronicles that make up the sample of this study allows us to draw a series of conclusions. In the first place, the presence of evaluative adjectives is substantially higher in texts written by journalists than in those created by artificial intelligence. Therefore, the interpretation, description, and assessment by the editor add quality to these texts compared to those produced by algorithms (H2/P1/P3). On the other hand, the greater use of rhetorical figures is also related to the texts written by journalists. It is another of the elements that help the narration and description of sports chronicles, as well as being essential to identify the style of the journalist. Therefore, the pieces created with artificial intelligence lack a narrative closer to what happened in the soccer match (H1/P1/P3). The chronicles of first division matches written by journalists include more rhetorical figures and adjectives, which indicates that this content has required more production work, especially in terms of style and language.

Another characteristic of sports journalism, visible in the large number of adjectives in texts written by humans, is passion. The texts produced using artificial intelligence lack this identifying element of sports chronicles (H1/P2). Furthermore, the emphasis of the texts written by journalists serves as an indicator of passion in sports chronicles. Most of them have an emphasis on opinion, so they provide interpretation and assessment to the chronicles, something that the pieces created by algorithms lack (H2/P2/P3).

Furthermore, bias is characteristic of texts written by humans and is related to journalistic observation. This element depends on the sporting or local leaning of the medium. On the other hand, the analysis of word frequency shows that the texts produced by artificial intelligence repeat concepts to a greater extent than those written by journalists. In this sense, the pieces produced with algorithms contain greater redundancies and a less rich lexicon.

The number of verifiable data is higher in texts written with algorithms (184) than in those written by humans (128). Therefore, the use of databases serves to offer more complete statistical information than that provided by journalists. Moreover, the data on antecedents and consequences, which also predominate in the texts produced by artificial intelligence, help to give important information to the reader and contextualize the facts. Regarding the distribution of the chronicles, the journalistic texts produced using algorithms have a mixed, predetermined structure that does not attend to what each type of chronicle would need. In the texts prepared by journalists, however, a chronological narration predominates, detailing each incident of the game that has news value. In the texts produced by artificial intelligence, by contrast, only the data appears.

From this analysis, it can be deduced that the application of artificial intelligence to sports journalism, in the case of the chronicles of football matches, represents an important advance in terms of the treatment of the actions or sets of the game, expressed through observable data. However, it does not represent a quality contribution in terms of the analytical character and interpretation typical of journalistic genres with as much tradition as the sports chronicle, limiting itself almost exclusively to the chronological exposition of the events that occurred during a football match.

This interpretative character is shown to be reserved for the chronicles written by journalists, with a greater production in those that correspond to the matches of the first division (LaLiga Santander). However, one of the limits of this research lies in the possibility that algorithmic applications will be developed in the future that allow these functions to be performed also in texts generated by artificial intelligence, a possibility not observed in the chronicles analyzed in this research.