% Options for packages loaded elsewhere
\PassOptionsToPackage{unicode}{hyperref}
\PassOptionsToPackage{hyphens}{url}
\PassOptionsToPackage{dvipsnames,svgnames,x11names}{xcolor}
%
\documentclass[
12pt,
a4paper,
]{article}
\usepackage{amsmath,amssymb}
\usepackage{setspace}
\usepackage{iftex}
\ifPDFTeX
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp} % provide euro and other symbols
\else % if luatex or xetex
\usepackage{unicode-math}
\defaultfontfeatures{Scale=MatchLowercase}
\defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
\fi
\usepackage{lmodern}
\ifPDFTeX\else
% xetex/luatex font selection
\setmainfont[]{Arial}
\fi
% Use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\IfFileExists{microtype.sty}{% use microtype if available
\usepackage[]{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
\KOMAoptions{parskip=half}}
\makeatother
\usepackage{xcolor}
\usepackage[top=25mm,left=25mm,right=25mm,bottom=20mm,heightrounded]{geometry}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\setcounter{secnumdepth}{5}
% Make \paragraph and \subparagraph free-standing
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\usepackage{graphicx}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
% Set default figure placement to htbp
\makeatletter
\def\fps@figure{htbp}
\makeatother
% definitions for citeproc citations
\NewDocumentCommand\citeproctext{}{}
\NewDocumentCommand\citeproc{mm}{%
\begingroup\def\citeproctext{#2}\cite{#1}\endgroup}
\makeatletter
% allow citations to break across lines
\let\@cite@ofmt\@firstofone
% avoid brackets around text for \cite:
\def\@biblabel#1{}
\def\@cite#1#2{{#1\if@tempswa , #2\fi}}
\makeatother
\newlength{\cslhangindent}
\setlength{\cslhangindent}{1.5em}
\newlength{\csllabelwidth}
\setlength{\csllabelwidth}{3em}
\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing
{\begin{list}{}{%
\setlength{\itemindent}{0pt}
\setlength{\leftmargin}{0pt}
\setlength{\parsep}{0pt}
% turn on hanging indent if param 1 is 1
\ifodd #1
\setlength{\leftmargin}{\cslhangindent}
\setlength{\itemindent}{-1\cslhangindent}
\fi
% set entry spacing
\setlength{\itemsep}{#2\baselineskip}}}
{\end{list}}
\newcommand{\CSLBlock}[1]{\hfill\break#1\hfill\break}
\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{\strut#1\strut}}
\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{\strut#1\strut}}
\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1}
\let\oldsection\section
\usepackage[font=it,labelfont=bf]{caption}
\usepackage{sectsty}
\sectionfont{\centering}
\subsectionfont{\raggedright}
\subsubsectionfont{\raggedright\itshape}
\usepackage{etoolbox}
\AtBeginEnvironment{longtable}{\small}
\pretocmd{\section}{\clearpage}{}{}
\usepackage{romannum}
\makeatletter
\@ifpackageloaded{caption}{}{\usepackage{caption}}
\AtBeginDocument{%
\ifdefined\contentsname
\renewcommand*\contentsname{Table of contents}
\else
\newcommand\contentsname{Table of contents}
\fi
\ifdefined\listfigurename
\renewcommand*\listfigurename{List of Figures}
\else
\newcommand\listfigurename{List of Figures}
\fi
\ifdefined\listtablename
\renewcommand*\listtablename{List of Tables}
\else
\newcommand\listtablename{List of Tables}
\fi
\ifdefined\figurename
\renewcommand*\figurename{Figure}
\else
\newcommand\figurename{Figure}
\fi
\ifdefined\tablename
\renewcommand*\tablename{Table}
\else
\newcommand\tablename{Table}
\fi
}
\@ifpackageloaded{float}{}{\usepackage{float}}
\floatstyle{ruled}
\@ifundefined{c@chapter}{\newfloat{codelisting}{h}{lop}}{\newfloat{codelisting}{h}{lop}[chapter]}
\floatname{codelisting}{Listing}
\newcommand*\listoflistings{\listof{codelisting}{List of Listings}}
\makeatother
\makeatletter
\@ifpackageloaded{caption}{}{\usepackage{caption}}
\@ifpackageloaded{subcaption}{}{\usepackage{subcaption}}
\makeatother
\ifLuaTeX
\usepackage[bidi=basic]{babel}
\else
\usepackage[bidi=default]{babel}
\fi
\babelprovide[main,import]{english}
\ifPDFTeX
\else
\babelfont{rm}[]{Arial}
\fi
% get rid of language-specific shorthands (see #6817):
\let\LanguageShortHands\languageshorthands
\def\languageshorthands#1{}
\ifLuaTeX
\usepackage{selnolig} % disable illegal ligatures
\fi
\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}}
\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
\urlstyle{same} % disable monospaced font for URLs
\hypersetup{
pdftitle={Socioeconomic Disruption by Artificial Intelligence},
pdfauthor={Fynn Jonas Rieg},
pdflang={en},
colorlinks=true,
linkcolor={black},
filecolor={Maroon},
citecolor={black},
urlcolor={black},
pdfcreator={LaTeX via pandoc}}
\title{Socioeconomic Disruption by Artificial Intelligence}
\usepackage{etoolbox}
\makeatletter
\providecommand{\subtitle}[1]{% add subtitle to \maketitle
\apptocmd{\@title}{\par {\large #1 \par}}{}{}
}
\makeatother
\subtitle{A comparative analysis on labor effects between industries in
the European Union}
\author{Fynn Jonas Rieg}
\date{2023-11-27}
\begin{document}
\maketitle
\setstretch{1.5}
\section*{Abstract}\label{sec-abstract}
\addcontentsline{toc}{section}{Abstract}
Artificial Intelligence technology has seen major breakthroughs in
recent years and is expected to have a significant impact on society.
However, the current literature on the possibly negative effects of AI
on labor is still inconclusive. This paper aims to add to the current
corpus of literature by assessing the relationship between AI innovation
and labor conditions within European industries by looking at European
patent application data and Eurostat's Structural Business Statistics.
The results suggest a decline in the number of employees and their gross
value added for the mining and quarrying industry, positive effects for
labor productivity and gross value added in the information and
communication industry, and mixed effects for the manufacturing
industry, with the number of enterprises and labor costs rising and
wage-adjusted labor productivity declining. However, the majority of results
are statistically insignificant. Retrieved data also entail limitations
and need to be interpreted with caution. Consequently, more research is
needed to assess the true relationship between AI innovation and labor
effects.
\newpage{}
\pagenumbering{Roman}
\newpage{}
\setstretch{1}
\renewcommand{\contentsname}{Table of Contents}
\tableofcontents
\newpage{}
\listoffigures
\newpage{}
\listoftables
\newpage{}
\pagenumbering{arabic}
\setstretch{1.5}
\section{Introduction}\label{sec-introduction}
In the last few years, Artificial Intelligence has seen major
breakthroughs in its capabilities and applicable domains
(\citeproc{ref-michael_l_littman_gathering_2021}{Michael L. Littman et
al., 2021, p. 12}). The popular AI chatbot ChatGPT has set a historical
record in its user acquisition pace (\citeproc{ref-hu_chatgpt_2023}{Hu,
2023}), and internet searches for the term ``AI'' are at an all-time
high (\citeproc{ref-google_google_2023}{Google, 2023}). This trend has
also arrived in the scientific community, with AI-related papers
exploding in popularity in recent years
(\citeproc{ref-catherine_cheung_growth_2022}{Catherine Cheung et al.,
2022}). However, the introduction of a new technology, in this case
Artificial Intelligence, undeniably raises concerns about its potential
implications for various aspects of society
(\citeproc{ref-gries_artificial_2018}{Gries and Naudé, 2018, p. 1}; see
\citeproc{ref-joint_research_centre_artificial_2018}{Joint Research
Centre, 2018, p. 77}; \citeproc{ref-lu_review_2021}{Lu and Zhou, 2021,
p. 1055}). Even OpenAI's co-founder and chief scientist Ilya
Sutskever admits that ``for every positive application of AGI there will
be a negative as well''\footnote{AGI stands for Artificial General
Intelligence, which is a form of AI that is capable of performing any
intellectual task a human can (\citeproc{ref-naude_race_2019}{Naudé,
2019, p. 4}).} (\citeproc{ref-sutskever_exciting_2023}{Sutskever,
2023}). While AI is not the first technology to raise such concerns
(\citeproc{ref-martens_will_2018}{Martens and Tolan, 2018, p. 5}), the
pace at which AI evolves and advances into various domains is unprecedented.
Mokyr et al. (\citeproc{ref-mokyr_history_2015}{2015, p. 32}) identify
two forms of technological anxiety: the fear of labor displacement
through technology and the fear of morally negative applications
resulting in declining welfare. This technological anxiety seems to be
increasing again in recent times, with the majority of the US population
assessing the potential impact of automation as generally unfavorable
rather than beneficial
(\citeproc{ref-anderson_automation_2017}{Anderson, 2017}). Because of
the recent advances in Artificial Intelligence, and its increasing
presence in the media, everyday life, and work, there is a growing need
for research to meticulously scrutinize AI technology's accompanying
concerns to objectively assess its true potential and risks. Given the
seemingly ubiquitous applicability of AI, there is a correspondingly
vast number of possible effects and side effects which AI might induce.
This paper focuses on the aforementioned technological
anxiety of labor displacement, specifically on Artificial Intelligence's
effects on labor displacement, which in this context also include
partial displacements induced by a reduction in wages and labor
bargaining power.

The paper is structured as follows: Section~\ref{sec-introduction}
provides an overview of the current literature on automation-induced
labor effects. Section~\ref{sec-methodology} introduces the
methodological approach used to assess AI's impact on labor with an
overview of the data sources (Section~\ref{sec-data-sources}), the data
acquisition process (Section~\ref{sec-data-acquisition}) and
preprocessing methods (Section~\ref{sec-preprocessing}), along with the
chosen model (Section~\ref{sec-model}) and its hypotheses
(Section~\ref{sec-hypotheses}). Results are then presented in
Section~\ref{sec-results}, followed by a discussion
(Section~\ref{sec-discussion}) which includes the implications of the
models' results (Section~\ref{sec-implications}), important limitations
(Section~\ref{sec-limitations}), and suggestions for future research
(Section~\ref{sec-further-research}). Finally,
Section~\ref{sec-conclusion} concludes the paper and
\nameref{sec-appendix} provides additional tables and figures
accompanying this research.
\subsection{Effects of Artificial Intelligence}\label{sec-effects-of-ai}
Brynjolfsson et al. (\citeproc{ref-brynjolfsson_what_2018}{2018a, p.
46}) found that machine learning affects different types of tasks than
earlier forms of automation. A year later, in a study comparing the
impact of AI on the job market between industries, Webb
(\citeproc{ref-webb_impact_2019}{2019, p. 46}) shows that AI affects
mostly the highly educated workforce and that this group is affected
significantly more by AI than by software or robots. Under
the assumption that the current trend in technological evolution is set
to continue, the speed of labor displacement through technological
innovation is found likely to outpace the speed at which labor can be
relocated (\citeproc{ref-mokyr_history_2015}{Mokyr et al., 2015, p.
43f}.). By constructing impact scores of Artificial Intelligence on
occupations, Felten et al. (\citeproc{ref-felten_effect_2019}{2019})
found low-income occupations to experience a decline in wage growth that
is attributed to the increased presence of AI and middle and high-income
occupations to experience an increase in wage growth (p.~6).
Furthermore, the authors found that occupations with a medium and high
degree of automation (degree of automation being the presence of
automation technologies --- not just AI) positively correlate with
employment when exposed to Artificial Intelligence, while they did not
find any relationship for occupations already exhibiting a low degree of
automation (p.~5). Damioli et al.
(\citeproc{ref-damioli_impact_2021}{2021, p. 14}) linked small and
medium-sized enterprises (SMEs), having previously filed patents related
to AI, to significant increases in labor productivity. The same effect,
however, could not be found once SMEs and large firms were studied
together, nor when only considering large firms (p.~14).
It has also been noted that the presence of Artificial Intelligence does
not have a linear impact on labor but depends on influencing factors,
such as price elasticity, complementarities, or elasticity of labor that
govern the implementation of these technologies
(\citeproc{ref-brynjolfsson_what_2017}{Brynjolfsson and Mitchell, 2017,
p. 1533f}.). Additionally, the adoption of AI technology is found to
significantly alter the skill-demand distribution of firms, with the
number of previously highly demanded skills declining while
simultaneously creating demand for new skills
(\citeproc{ref-acemoglu_ai_2020}{Acemoglu et al., 2020a, p. 19}). By
surveying 203 attendees at three AI conferences, Gruetzemacher et al.
(\citeproc{ref-gruetzemacher_forecasting_2020}{2020, pp. 4, 9}) found
that attendees, on average, estimate 22\% of human tasks to be prone to
replacement, with the number rising to 40\% in the next five years.
Researchers have also argued that AI technology can be seen as a new
general purpose technology (GPT) which has implications in every aspect
of society as had other GPTs before, such as the steam engine or
computers (\citeproc{ref-brynjolfsson_artificial_2018}{Brynjolfsson et
al., 2018b, p. 39}). In a meta-analysis of the current literature, Lu
and Zhou (\citeproc{ref-lu_review_2021}{2021, p. 1263}) came to the
conclusion that the general consensus among researchers is a definite
concern about AI's implications as well as expected labor displacement,
although researchers remain unsure about the extent of displacement and
whether these effects are offset elsewhere.
Given the still small body of empirical literature about the effects of AI
(\citeproc{ref-seamans_ai_2018}{Seamans and Raj, 2018, p. 3}), which is
due to the fact that AI is still a fairly new topic, with real increase
in dominance and interest only seen in recent years
(\citeproc{ref-acemoglu_ai_2020}{Acemoglu et al., 2020a, p. 23f}.), it
is worth noting the effects of previous technologies. The adoption of
machines (specifically often industrial robots
(\citeproc{ref-acemoglu_robots_2020}{Acemoglu and Restrepo, 2020a}; see
\citeproc{ref-graetz_robots_2018}{Graetz and Michaels, 2018})) and
software --- also referred to as computerization
(\citeproc{ref-autor_growth_2013}{Autor and Dorn, 2013};
\citeproc{ref-frey_future_2017}{Frey and Osborne, 2017}; see
\citeproc{ref-pajarinen_computerization_2015}{Pajarinen et al., 2015})
--- have been seen as previous stages in the evolution of automation,
with AI composing the next stage
(\citeproc{ref-acemoglu_harms_2021}{Acemoglu, 2021, p. 19}).
Furthermore, all of these technologies have been summarized under the
umbrella term ``automation'' (\citeproc{ref-mann_benign_2018}{Mann and
Püttmann, 2018, p. 40}) indicating common characteristics and thereby
--- possibly --- common effects.
\subsection{Effects of
Automation}\label{sec-effects-of-automation-on-labor}
In a 2018 study, the introduction of automation technology was found to
have positive effects on employment gains, but only within the same
commuting zone (\citeproc{ref-mann_benign_2018}{Mann and Püttmann, 2018,
p. 26}). These findings contradict the results from Autor et al.
(\citeproc{ref-autor_untangling_2015}{2015, p. 632}), who found no
relation between exposure to automation and employment as a whole but
found a significant decline in employment related to routine tasks in
the non-manufacturing sector (p.~641). Graetz and Michaels
(\citeproc{ref-graetz_robots_2018}{2018, p. 766}) found no relationship
between the usage of industrial robots and net employment. However,
usage of industrial robots was found to lower employment of low-skilled
workers. A later study, also looking at employment effects induced by
usage of industrial robots, found a significant decline of employment as
well as a reduction in wages related to robot exposure within a
commuting zone (\citeproc{ref-acemoglu_robots_2020}{Acemoglu and
Restrepo, 2020a, pp. 2215f, 2218}). Dauth et al.
(\citeproc{ref-dauth_german_2017}{2017, p. 25}) found no relation
between robot exposure and employment in the German market. A few years
later, Dauth et al. (\citeproc{ref-dauth_adjustment_2021}{2021, p.
3126ff}) found robot exposure to lead to within-firm and between-firm
job displacement, with displaced workers having difficulties
reallocating their jobs within the same industry, leading to a migration
of workers from manufacturing (where robot exposure is most present) to
the service sector. They also showed that a lack of worker
protections (for example unionization or tenure) is related to greater
displacement. These results were also confirmed by Boustan et al.
(\citeproc{ref-boustan_automation_2022}{2022, pp. 21, 23}) who observed
that displaced workers acquire new skills and concluded job displacement
by automation to be less discernible among unionized and high-skilled
workers. Similarly, Acemoglu and Restrepo
(\citeproc{ref-acemoglu_robots_2020}{2020a, pp. 2215f, 2218}) provided
evidence showing automation (adoption of industrial robots) within a
commuting zone (local labor market) relating to significant declines in
employment as well as wages. By studying 53 developing countries, Cirera
and Sabetti (\citeproc{ref-cirera_effects_2019}{2019, p. 172}) did not
find a relationship between exposure to automation and firm level
employment. However, while a net effect on employment was absent, in
line with the aforementioned literature, they did find automation to
alter the composition of tasks and skills within firms (p.~172).
In a purely theoretical approach to the effects of automation on labor,
Acemoglu and Restrepo (\citeproc{ref-acemoglu_low-skill_2018}{2018a, pp.
220, 224}) concluded that automation leads to labor displacement and
that the displacement of low-skilled labor leads to an increase in the
wage gap (pay gap between low-skilled and high-skilled workers) while
the displacement of high-skilled labor is followed by a reduction in the
wage gap as high-skill labor reallocates into medium- and low-skilled
occupations. This reallocation from displaced high-skill labor into
lower skilled occupations has also been shown by Beaudry et al.
(\citeproc{ref-beaudry_great_2016}{2016, p. 21}), who studied the
effects on labor when prices for specific types of labor fall, as is
induced when substitution (through technology) becomes economically
viable. While labor displacement induced by the introduction of
automation is followed by increased inequality between low-skill and
high-skilled labor in the short run
(\citeproc{ref-acemoglu_race_2018}{Acemoglu and Restrepo, 2018b, p.
1519}), the creation of new tasks --- that is followed by increased
productivity gains from automation --- is seen to reduce this gap in the
long run (p.~1521). However, this outlook of a net positive effect on
employment only holds true as long as the productivity effects, which
accompany the adoption of automation technologies, offset the
displacement effects incurred in the first place. Should the offset
be insufficient, automation is found to negatively impact the demand for
labor and its wages (\citeproc{ref-acemoglu_artificial_2018}{Acemoglu
and Restrepo, 2018c, p. 227}). There is also growing evidence suggesting
that automation causes a decline in real wages of low-skilled workers. For
example, Acemoglu and Restrepo
(\citeproc{ref-acemoglu_unpacking_2020}{2020b, p. 360f}.) found strong
relationships between the adoption of automation technology and wages.
Acemoglu and Restrepo (\citeproc{ref-acemoglu_tasks_2022}{2022, p.
1993}) found a relationship between labor displacement and a decrease in
relative wages, concluding automation to cause an increase in wage
inequality (p.~1998). The decline in
the demand for labor in the US over recent decades has also been attributed to automation
(\citeproc{ref-acemoglu_automation_2019}{Acemoglu and Restrepo, 2019, p.
21}).
Furthermore, Arntz et al. (\citeproc{ref-arntz_risk_2016}{2016, p. 14f}),
studying 21 OECD countries, found 9\% of overall employment in the US, and
6--12\% across countries, to be substitutable by automation,
while Acemoglu and Autor (\citeproc{ref-acemoglu_skills_2011}{2011, p.
61}) came to the conclusion that labor displacement by machines mostly
affects routine tasks.
\subsection{Effects of
Computerization}\label{sec-effects-of-computerization}
In a Finnish study, Pajarinen et al.
(\citeproc{ref-pajarinen_computerization_2015}{2015}) came to the
conclusion that computerization is likely to place high risk of
displacement on 35\% of the Finnish labor market (p.~5), 33\% of
Norwegian labor (p.~5) as well as 49\% in the US (p.~5). Frey and
Osborne (\citeproc{ref-frey_future_2017}{2017, p. 41}) found 47\% of US
employment to be at high risk of substitution by
computerization. They further classify the process of automation into
two ``waves'' with the first wave affecting routine tasks
(transportation, logistics, office and administration) (p.~41) followed
by a second wave that, once technological obstacles are overcome, will
affect the jobs involving creative or abstract tasks (p.~43). Evidence
also suggests computerization to significantly induce labor displacement
from occupations relying on routine tasks into higher-skilled
occupations as well as low-skilled service occupations
(\citeproc{ref-autor_growth_2013}{Autor and Dorn, 2013, p. 1573}).
\subsection{Changes in Occupational
Composition}\label{sec-changes-of-occupational-composition}
Furthermore, it is important to note that previous research on the
effects of robots, software and AI --- that have been summarized under
the umbrella term ``automation'' (\citeproc{ref-mann_benign_2018}{Mann
and Püttmann, 2018, p. 40}) --- in general may not have found net
negative effects on employment but a restructuring of the composition of
occupations. The aforementioned study from Autor et al.
(\citeproc{ref-autor_untangling_2015}{2015, p. 644}) found that automation,
while having no aggregate effects on employment, led to a decline in
occupations involving routine tasks and an increase in non-routine
(abstract) tasks. Graetz and Michaels
(\citeproc{ref-graetz_robots_2018}{2018, p. 766}) found the same effect
studying the introduction of industrial robots. Furthermore, using
weighted patents and firm level data together with Eurostat's Structural
Business Statistics, Van Roy et al.
(\citeproc{ref-van_roy_technology_2018}{2018, p. 7}) reasoned
technological innovation to only have positive effects on employment on
firm level as well as in high-tech and medium-tech sectors, and found no
relationship between technology innovation and employment in the service
sector. These effects remain harmless only as long as the assumption
holds true that displaced labor can in fact always reallocate itself to
new tasks. Should this assumption be contradicted, and the negative
effects of automation on employment are no longer offset by the positive
effects of reallocation, the phenomenon of occupational migration would
turn into an observation of job destruction.
\subsection{Changes in Labor Share}\label{changes-in-labor-share}
The introduction of capital, whether to complement or substitute labor,
inherently leads to a decline in a firm's profits paid to labor, as the
share of labor's input relative to the output value decreases (assuming
all else equal). In fact, Karabarbounis and Neiman
(\citeproc{ref-karabarbounis_global_2014}{2014, p. 99}) show that the
observed decline in capital prices explains almost half the decline in
global labor share that has been observed in recent decades. This might
seem problematic as an increasing portion of a firm's revenue remains as
corporate profits and savings (given that the capital invested leads to
a decrease in marginal costs through substitution of labor and/or
increased production) rather than being redistributed to labor.
Karabarbounis and Neiman (\citeproc{ref-karabarbounis_global_2014}{2014,
p. 102}) further show that the observed decline in labor share is
accompanied by an increase in corporate revenue and savings. This is
also put forward by Acemoglu and Restrepo
(\citeproc{ref-acemoglu_automation_2019}{2019, p. 27}) who conclude that
``{[}\ldots{]} automation always reduces the labor share and may reduce
labor demand {[}\ldots{]}'' but also mention that the creation of new
tasks necessarily increases the labor share. These results were further
solidified by Acemoglu et al.
(\citeproc{ref-acemoglu_competing_2020}{2020b, p. 387}) who investigated
the French manufacturing market and found firms exposed to automation
(in this study measured by the introduction of robots) to experience
significant declines in their labor share.
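
The labor share discussed in this subsection can be made concrete with a
minimal sketch. The notation below is an assumption introduced here for
illustration and is not taken from the cited studies. Writing $w$ for the
wage rate, $L$ for the amount of labor employed, $p$ for the output price
and $Y$ for the quantity of output, the labor share $s_L$ of a firm is
\begin{equation*}
s_L = \frac{wL}{pY}.
\end{equation*}
Holding $w$ and $p$ fixed, a capital investment that raises $Y$ by more, in
proportional terms, than it raises $L$ lowers $s_L$, which is the
all-else-equal mechanism described at the beginning of this subsection.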
\subsection{Definitions of AI}\label{sec-definitions-of-ai}
Lastly, research on Artificial Intelligence's implications has been
intrinsically difficult because there is no consensus on
the definition of AI yet (\citeproc{ref-damioli_impact_2021}{Damioli et
al., 2021, p. 7}; see \citeproc{ref-lu_review_2021}{Lu and Zhou, 2021,
p. 1063}). The classification of Artificial Intelligence also remains
difficult due to the fact that there is yet no widespread agreement on
the definition of intelligence itself (see
\citeproc{ref-legg_collection_2007}{Legg and Hutter, 2007}). While AI
and machine learning are sometimes regarded as two different terms, the
former applying to the industry and the latter applying to the
technology (\citeproc{ref-crawford_atlas_2021}{Crawford, 2021, p. 9}),
in this research, the term Artificial Intelligence refers to the
underlying technologies and their applications.
\subsection{Summary}\label{sec-summary-of-effects}
To conclude, the net impact assessment of automation on socioeconomic
factors widely differs in the aforementioned literature (see also
\citeproc{ref-frank_toward_2019}{Frank et al., 2019, p. 6532}). Some
research has focused on microeconomic data (see
\citeproc{ref-seamans_ai_2018}{Seamans and Raj, 2018}) or local labor
markets (commuting zones) (see
\citeproc{ref-acemoglu_robots_2020}{Acemoglu and Restrepo, 2020a};
\citeproc{ref-autor_untangling_2015}{Autor et al., 2015};
\citeproc{ref-autor_growth_2013}{Autor and Dorn, 2013}), while other
research has focused on national effects (see
\citeproc{ref-furman_ai_2019}{Furman and Seamans, 2019}) and
international effects (see \citeproc{ref-graetz_robots_2018}{Graetz and
Michaels, 2018}). While one would expect to see the same relationship
between the chosen variables on all levels, apart from differences in
research design, it may be difficult to assess effects on a greater
aggregate level as the number of variables that would need to be
included to account for differences between and within groups becomes
unfeasible. Given the various contradicting results on the relationship
between automation and labor effects and the increasing presence of AI,
this research aims to add to the current corpus of literature by
assessing the relationship between AI innovation and socioeconomic
factors. Specifically, the research question is as follows: How does AI
innovation across industries impact labor displacement and labor
conditions?
\section{Methodology}\label{sec-methodology}
The following section introduces the methodology adopted in this
research along with the data sources used, the data acquisition process,
the data preprocessing methods as well as an overview of the data, the
chosen model and its hypotheses. Note that the data acquisition,
preprocessing, as well as the statistical models, figures and tables
presented in this research have been implemented in Python and are
available in the GitHub repository accompanying this research
(\citeproc{ref-rieg_bt_ai_2023}{Rieg, 2023}). The repository also
contains the source code for this paper as a Quarto
(\citeproc{ref-allaire_quarto_2022}{Allaire et al., 2022}) document as
well as separate source code for most tables and figures provided in
this paper. To keep domain-specific technicalities about the
implementation of the following methodology to a minimum, methods are
mostly described in their characteristics and not in their
implementation. While the GitHub repository contains the source code,
for attribution purposes, it should be noted that the data acquisition
process via EPO's API was implemented using Python's Requests module
(\citeproc{ref-chandra_python_2015}{Chandra and Varanasi, 2015}),
processing and table creation was done using Pandas
(\citeproc{ref-the_pandas_development_team_pandas-devpandas_2023}{The
pandas development team, 2023}), NumPy
(\citeproc{ref-harris_array_2020}{Harris et al., 2020}) and SciPy
(\citeproc{ref-virtanen_scipy_2020}{Virtanen et al., 2020}). Regressions
and statistical tests were implemented with the Statsmodels module
(\citeproc{ref-seabold_statsmodels_2010}{Seabold and Perktold, 2010})
and figures were created using Plotly
(\citeproc{ref-plotly_technologies_inc_collaborative_2015}{Plotly
Technologies Inc, 2015}).
A key problem in current AI research is the lack of
precise data about the usage and implementation of AI technologies
(\citeproc{ref-seamans_ai_2018}{Seamans and Raj, 2018, p. 5f.}).
Therefore, this research adopts an approach which has similarities to
Mann and Püttmann (\citeproc{ref-mann_benign_2018}{2018, p. 13}) who
used patent counts as a proxy for estimating the level of automation
present within a US commuting zone and Van Roy et al.
(\citeproc{ref-van_roy_technology_2018}{2018}) who used firm-level
citation-weighted patent counts to measure effects on employment.
However, the method of patent selection presented here differs. While
Mann and Püttmann (\citeproc{ref-mann_benign_2018}{2018}) classified
texts based on the tasks they may affect within occupations, the
approach presented here uses API query composition to preselect patents
whose titles or abstracts match keywords associated with an industry. It
should be noted that there have been other approaches to measure the
presence of AI, such as using the AI Progress Measurement from the
Electronic Frontier Foundation (EFF), job postings
(\citeproc{ref-acemoglu_ai_2020}{Acemoglu et al., 2020a, p. 12}) and
surveys (\citeproc{ref-gruetzemacher_forecasting_2020}{Gruetzemacher et
al., 2020, p. 4}). However, the EFF project, while a promising
source of data, was discontinued in 2017
(\citeproc{ref-electronic_frontier_foundation_ai_2017}{Electronic
Frontier Foundation, 2017}).
\subsection{Data Sources}\label{sec-data-sources}
Data about patent publications is obtained from the European Patent
Office's Open Patent Services (OPS) API
(\citeproc{ref-european_patent_office_open_2023}{European Patent Office,
2023}), while data about industries is obtained from the Annual
Structural Business Statistics (SBS) by Eurostat
(\citeproc{ref-european_commission_eurostat_structural_nodate}{European
Commission, Eurostat, n.d.a}). Furthermore, Eurostat's code lists of
Statistical classification of economic activities in the European
Community (NACE Revision 2)
(\citeproc{ref-european_commission_eurostat_statistical_2023}{European
Commission, Eurostat, 2023a}) (henceforth ``NACE'') and economic
indicators for Eurostat's SBS
(\citeproc{ref-european_commission_eurostat_economical_2023}{European
Commission, Eurostat, 2023b}) are retrieved to map codes to their
respective definitions. Additionally, Cooperative Patent Classification
(CPC) codes are retrieved manually from the European Patent Office's
Espacenet website
(\citeproc{ref-european_patent_office_classification_nodate}{European
Patent Office, n.d.a}).
\subsubsection{Patents}\label{patents}
Cooperative Patent Classification is a classification system by the
European Patent Office and the US Patent and Trademark Office that
allows for a structural hierarchical classification of patents
(\citeproc{ref-european_patent_office_cooperative_nodate}{European
Patent Office, n.d.b}). As seen in Table~\ref{tbl-cpc-codes}, CPC codes
are composed of a section (alphabetical), class (numerical), subclass
(alphabetical), and main group (numerical). The CPC codes are used to
retrieve patents that utilize artificial intelligence technology. The
European Patent Office's OPS API allows for programmatic access to the
Patent Office's database
(\citeproc{ref-european_patent_office_open_2023}{European Patent Office,
2023}). With it, one can retrieve data on individual patents, such as
--- among others --- their title and abstract, date of application,
place of application, the names of the applicants, the patent's
classification (CPC), and a patent's references to other patents and
documents. The OPS is used to systematically retrieve patents that
contain specified attributes (see Section~\ref{sec-data-acquisition}).
Retrieved patents are used as a proxy for the current level of interest
and level of innovation in AI, which in turn is assumed to be an
indicator for the extent to which AI is present within an industry.
\subsubsection{Structural Business
Statistics}\label{structural-business-statistics}
Eurostat's Structural Business Statistics (SBS) are annually compiled
statistics about the economic structure and performance of businesses
across the EU as well as aggregates at the EU level. It currently holds data
for the years 2005 to 2020
(\citeproc{ref-european_commission_eurostat_structural_nodate-1}{European
Commission, Eurostat, n.d.b}).\footnote{At the time of writing,
  Eurostat has released its latest data on the SBS for the year 2021
  (\citeproc{ref-eurostat_enterprises_2023}{Eurostat, 2023a}).
  Unfortunately, the new statistics use new indicators that do not
align with previous ones
(\citeproc{ref-european_commission_commission_2020}{European
Commission, 2020, p. 131}).} It gathers data from national sources and
calculates EU-wide aggregates on the level of NACE sections and groups
about a variety of indicators, such as the number of enterprises present
in an industry, the number of employees, and monetary value produced
(\citeproc{ref-european_commission_commission_2009}{European Commission,
2009}). While the SBS offers a variety of indicators (see
\citeproc{ref-european_commission_commission_2009}{European Commission,
2009}), this research focuses on the following. First, the number of
enterprises present within an industry. This variable has been chosen to
describe a possible relationship between the current number of AI patent
applications and a possible trend towards a monopolistic market
structure. The intuition is that firms in a market trending towards
monopoly (not actually exhibiting monopoly) gain increasing leverage
(bargaining power) over labor.
Second, the number of employees. Given the literature introduced in the
previous section, one would expect two possible relationships between
the number of patents retrieved and the number of employees. Either
technology acts as a complementary input, enhancing labor productivity
and leading to industry growth, which further induces demand for labor.
Here one would expect to see a positive relationship between the
endogenous and exogenous variables. Or technology acts as a substitute
for labor, i.e., displacing labor at a rate higher than new occupations
are introduced into the industry. In this case, one expects a negative
relationship between the introduction of technology and the number of
employees.
Third, the wage-adjusted labor productivity. It is expressed as a ratio
of value added over average personnel expenses
(\citeproc{ref-european_commission_eurostat_wage_2023}{European
Commission, Eurostat, 2023c}). This variable has been chosen to describe
a possible relationship between the current number of AI patent
applications and the productivity of labor. Given the two possibilities
that new technology either displaces labor completely or complements
labor (which may include some displacement that is fully offset by the
creation of new jobs), the expectation is that the introduction of
technology always enhances labor productivity (either through
displacement or complementation). Both ways should exhibit a rise in
wage-adjusted labor productivity as the numerator of the ratio
increases. Of course, there may be scenarios in which the
denominator (wages) increases simultaneously.
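Following the indicator's description as apparent labor productivity by
average personnel costs, the ratio can be sketched as follows. The
symbols $VA$ (value added), $PE$ (persons employed), $PC$ (personnel
costs) and $E$ (employees) are notation introduced here for illustration
only, and the sketch assumes that apparent labor productivity is measured
per person employed and average personnel costs per employee.
\[
\mathrm{WALP} = \frac{VA / PE}{PC / E} \times 100
\]
Holding the denominator $PC / E$ fixed, any increase in value added per
person employed raises the ratio, whereas a simultaneous rise in average
personnel costs dampens or offsets that increase.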
Fourth, gross value added per employee. This variable was chosen on the
assumption of increased productivity through the adoption of new
technology. As capital (in this case AI technology) helps to increase
output on a marginal (per employee) basis, one would expect
the ratio to grow with increased adoption of technology.
Fifth, the percentage of personnel costs in production, which is a
derived value from production costs and personnel costs, calculated by
Eurostat
(\citeproc{ref-european_commission_eurostat_derived_nodate}{European
Commission, Eurostat, n.d.c, p. 1}). One would expect --- all else equal
--- the percentage share of labor costs in the production process to
decrease with the adoption of technology, either because capital
spending is increased, or the marginal cost of capital is decreased, or
production quantity (and value) is increased by adoption of new
technology. The SBS data's indicators are used as the endogenous
variables to be explained by the number of retrieved patent
applications.
\subsubsection{Definition of AI}\label{definition-of-ai}
To retrieve patents that relate to or incorporate AI technology, the
selection of correct CPC codes is crucial. While there are a variety of
possible technologies that may fall under the umbrella term ``Artificial
Intelligence'', this research aims to assess AI's socioeconomic impact,
which, if negative, falls into the governmental realm. Therefore, a
legal definition of AI is preferable as a classifier on which basis CPC
codes are selected. Furthermore, it is arguable that the political
definition is likely to have the greatest (socio)economic impact in the
near future due to possible (and probable) regulation. However, as there
is no legal definition yet --- at least in the EU --- technologies
listed in the European Commission's latest proposal for the ``Artificial
Intelligence Act{[}'s{]}''
(\citeproc{ref-european_commission_proposal_2021}{European Commission,
2021a}) annex (\citeproc{ref-european_commission_annexes_2021}{European
Commission, 2021b}) will be used.\footnote{The European Commission's
proposal for the ``Artificial Intelligence Act'' is currently in the
legislative process. At the time of writing, the European Parliament
has made amendments to this proposal, one of which --- unfortunately
--- is the removal of the list of technologies classified as AI from
the initial proposal's annex
(\citeproc{ref-european_parliament_texts_2023}{European Parliament,
  2023, p. 326f.}). For the time being, the EU Parliament's new
  definition (amendment 165, p.~111f.) of Artificial Intelligence is
  rather vague, which is why the European Commission's initial proposal's
  definition will be used.} In its annex \Romannum{1}, the European
Commission suggests the following definition for AI.
\setstretch{1}
\begin{quote}
``(a) Machine learning approaches, including supervised, unsupervised
and reinforcement learning, using a wide variety of methods including
deep learning;\\
(b) Logic- and knowledge-based approaches, including knowledge
representation, inductive (logic) programming, knowledge bases,
inference and deductive engines, (symbolic) reasoning and expert
systems;\\
(c) Statistical approaches, Bayesian estimation, search and optimization
methods.'' (\citeproc{ref-european_commission_annexes_2021}{European
Commission, 2021b, p. 2})
\end{quote}
\setstretch{1.5}
As there is no clear mapping between the European Commission's
definition and available Cooperative Patent Classification codes, CPC
codes are chosen to the best of the author's knowledge.
\setstretch{1}
\phantomsection\label{tbl-cpc-codes}
\begin{longtable}[]{@{}ll@{}}
\caption{\label{tbl-cpc-codes}Selected CPC Codes}\tabularnewline
\toprule\noalign{}
Class & CPC \\
\midrule\noalign{}
\endfirsthead
\toprule\noalign{}
Class & CPC \\
\midrule\noalign{}
\endhead
\bottomrule\noalign{}
\endlastfoot
Machine Learning & G06N20/00, G06N20/10, G06N20/20 \\
Supervised Learning & G06N3/09 \\
Unsupervised Learning & G06N3/088 \\
Reinforcement Learning & G06N3/092 \\
Deep Learning & G06N3/08 \\
\end{longtable}
\setstretch{1.5}
\subsection{Data Acquisition}\label{sec-data-acquisition}
In order to retrieve data from the European Patent Office's Open Patent
Services (OPS) API, queries are composed to link retrieved patents to
their respective industry. The query composition is based on the
selected CPC codes displayed in Table~\ref{tbl-cpc-codes}, as well as
keywords from the list of NACE codes that have been retrieved from
Eurostat. Each NACE code is composed of section (alphabetical), division
(numerical), group (numerical) and class (numerical) of a particular
economic activity. Sections relate to the overall industry, while
divisions, groups and classes relate to more specific activities within
the industry (\citeproc{ref-eurostat_nace_2023}{Eurostat, 2023b}). For
each industry, keywords are extracted from the NACE code's description.
This is done on the division level (the second level of NACE codes). As
a result, keywords are extracted and grouped by their respective
division. For example, for NACE industry ``A'', which relates to
``agriculture, forestry and fishing''
(\citeproc{ref-european_commission_eurostat_economical_2023}{European
Commission, Eurostat, 2023b}), keywords are extracted for its three
divisions, ``crop and animal production, hunting and related service
activities'' (A01), ``forestry and logging'' (A02), and ``fishing and
aquaculture'' (A03). To ensure only relevant keywords are used, each
description is cleaned of common characters and unrelated words (e.g.,
``,'', ``and'', ``or'', ``to'') as well as duplicate words. Descriptions
for each industry are then split into lists of single keywords that will
be used in the API query. As a result, extracted keywords are
identifiable by their section as well as division.
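The keyword extraction described above can be sketched in Python as
follows. This is a minimal illustration and not the repository's actual
implementation: the function name \texttt{extract\_keywords}, the
stop-word set (limited to the examples named in the text) and the use of
division descriptions as input are assumptions made for this example.

```python
import re

# Assumed stop-word set: only the example words named in the text.
STOP_WORDS = {"and", "or", "to"}


def extract_keywords(description: str) -> list[str]:
    """Split a NACE description into unique, lower-cased keywords."""
    # Keeping alphabetic tokens only drops commas and other punctuation.
    tokens = re.findall(r"[a-z]+", description.lower())
    keywords = []
    for token in tokens:
        # Skip stop words and words that were already collected.
        if token not in STOP_WORDS and token not in keywords:
            keywords.append(token)
    return keywords


# Division descriptions of NACE section "A" as quoted in the text.
division_descriptions = {
    "A01": "Crop and animal production, hunting and related service activities",
    "A02": "Forestry and logging",
    "A03": "Fishing and aquaculture",
}

# Keywords stay identifiable by division (and thereby by section).
keywords_by_division = {
    code: extract_keywords(text) for code, text in division_descriptions.items()
}
```

Keeping the division code as the dictionary key preserves the link
between each keyword list and its industry, which the later query
composition relies on.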
Because some industries contain a variety of different activities (e.g.,
NACE industry (section) ``A'' relates to ``Agriculture, forestry and
fishing''
(\citeproc{ref-european_commission_eurostat_economical_2023}{European
Commission, Eurostat, 2023b})), main (industry) keywords that relate to
the section as a whole are manually selected (see
Table~\ref{tbl-nacemainkeywords} in the \nameref{sec-appendix}). In
other words, while general (division) keywords are selected from the
descriptions of groups within a division, main keywords are extracted
from the description of a section. For each division within a section
(industry), queries are then built using the (manually selected) main
(industry) keywords, the general (division) keywords, as well as the
chosen CPC codes. The general structure of a query is as follows.
Queries are built on the level of divisions. For each division, a query
is composed that retrieves patents that have at least one of the main
keywords of the respective section (industry) in their title or abstract,
at least one keyword of the division's general keywords in their title or
abstract, at least one of the chosen CPC codes in their list of
CPC codes, and an application number starting with ``EP'', relating to
the European Patent Office.\footnote{To be precise, because of the API's
restrictions, there can be multiple queries for the same division. The
OPS API allows for a maximum number of 20 ``terms'' (keywords, such as
a single CPC code or industry keyword) but also only a maximum number
of 10 terms per argument (such as keywords that must be contained in
the patent's title or abstract; the argument is ``title or
abstract''). Given that each query contains seven CPC codes and one
application number, if there are together more than 12 main keywords
and general keywords, the general keywords are subdivided into smaller
chunks across multiple queries. Therefore, each query contains all
main keywords, CPC codes and the application number, while the
remaining terms are filled with the general keywords.}
The resulting query is then used to retrieve patents from the OPS API.
Initially, queries were created not only for the European Patent Office
but all patent offices within the European Union to retrieve patent data
on a national level. This approach would have resulted in a much richer
dataset and enabled better aggregates while also allowing for
between-country comparisons. However, initial tests showed that most of
the patents filed with a national patent office contain only patent
titles and abstracts in their native language, which renders the chosen
keywords in the query language (English) ineffective. As a result, the
decision was made to only retrieve patents filed with the European
Patent Office. This approach disregards patents filed with national
patent offices. The query is composed of the following elements:
\setstretch{1.0}
\begin{quote}
\textbf{(ta = Main Keywords) AND (ta = Description Keywords) AND (cpc =
CPC Codes) AND (ap = ``EP'')}\\
\emph{Note: ta = title or abstract; ap = Application Number, referring
to the Patent Office the patent was filed at. In this case, ``EP''
refers to the European Patent Office. See Table~\ref{tbl-queryexample}
for example queries.}
\end{quote}
\setstretch{1.5}
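The query composition, including the splitting of general keywords
across several queries, can be sketched as follows. This is an
illustration under stated assumptions rather than the repository's code:
the function name \texttt{build\_queries}, the example keywords and the
OR-joined string layout are hypothetical and follow the schematic
notation above, not the exact OPS query syntax. The sketch further
assumes that the limit of 10 terms applies to each parenthesised
argument separately.

```python
MAX_TERMS_PER_QUERY = 20     # limit stated in the text
MAX_TERMS_PER_ARGUMENT = 10  # limit stated in the text

# The seven CPC codes selected in the paper's CPC table.
CPC_CODES = [
    "G06N20/00", "G06N20/10", "G06N20/20",
    "G06N3/09", "G06N3/088", "G06N3/092", "G06N3/08",
]


def build_queries(main_keywords, general_keywords, cpc_codes=CPC_CODES, office="EP"):
    """Return one schematic query string per chunk of general keywords."""
    # Terms present in every query: main keywords, CPC codes, application number.
    fixed_terms = len(main_keywords) + len(cpc_codes) + 1
    room = min(MAX_TERMS_PER_ARGUMENT, MAX_TERMS_PER_QUERY - fixed_terms)
    if room < 1:
        raise ValueError("no room left for general keywords")

    queries = []
    for start in range(0, len(general_keywords), room):
        chunk = general_keywords[start:start + room]
        queries.append(
            f"(ta = {' OR '.join(main_keywords)}) AND "
            f"(ta = {' OR '.join(chunk)}) AND "
            f"(cpc = {' OR '.join(cpc_codes)}) AND "
            f'(ap = "{office}")'
        )
    return queries


# Hypothetical input: three main keywords and twelve placeholder general keywords.
queries = build_queries(
    ["agriculture", "forestry", "fishing"],
    [f"kw{i}" for i in range(12)],
)
```

With three main keywords, eleven terms are fixed, which leaves room for
nine general keywords per query, so the twelve placeholder keywords are
spread over two queries.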
The queries are then posted to the OPS API's Published Data Keywords
Search with Variable Constituents endpoint
(\citeproc{ref-european_patent_office_published_nodate}{European Patent
Office, n.d.c}). The API's response, which contains the data in
JSON format, is first enriched with metadata, such as
the section and division for which the query was posted, to allow a
mapping from the returned patents to the industry to which they belong.
Next, the data is converted from JSON format into a table (Pandas
DataFrame). Given the nested structure of JSON files, this is not a
straightforward process. Therefore, only relevant information, such as the patent office
of application, the industry (section) and division, the CPC codes, the
patents' filing dates, names of inventors, and citations, has been
extracted from the JSON file. The resulting table contains individual
patents and their attributes together with the metadata of the query's
section and division through which each patent has been retrieved.
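The flattening step can be sketched as follows. The response layout used
here (keys such as \texttt{documents}, \texttt{application} and
\texttt{classifications}) is a made-up stand-in and not the actual OPS
response schema, and the function name \texttt{flatten\_response} is
hypothetical. The sketch uses only the standard library; a list of such
rows could be passed to a Pandas DataFrame afterwards.

```python
def flatten_response(response: dict, section: str, division: str) -> list[dict]:
    """Turn a nested response into flat rows tagged with the query's industry."""
    rows = []
    for doc in response.get("documents", []):
        application = doc.get("application", {})
        rows.append({
            # Metadata of the query through which the patent was retrieved.
            "section": section,
            "division": division,
            # Selected attributes pulled out of the nested structure.
            "application_number": application.get("number"),
            "filing_date": application.get("date"),
            "cpc_codes": [c["code"] for c in doc.get("classifications", [])],
            "inventors": [i["name"] for i in doc.get("inventors", [])],
        })
    return rows


# Made-up response with a single document, for illustration only.
sample_response = {
    "documents": [
        {
            "application": {"number": "EP0000001", "date": "2015-03-01"},
            "classifications": [{"code": "G06N20/00"}, {"code": "G06N3/08"}],
            "inventors": [{"name": "Jane Doe"}],
        }
    ]
}

rows = flatten_response(sample_response, section="A", division="A01")
```

Attaching the section and division to every row at this stage is what
later allows patents to be grouped by industry.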
\subsection{Preprocessing}\label{sec-preprocessing}
Since Eurostat's SBS data only includes codes to refer to given
indicators as well as industries, data retrieved from Eurostat (SBS, and
Code Lists about NACE and SBS codes) is merged. This is done by matching
the NACE codes and SBS indicator codes to the respective NACE code and
indicator in the SBS data. The economic indicators ``Enterprises'' and
``Persons employed'' are reported as totals. ``Wage adjusted labor
productivity (Apparent labor productivity by average personnel costs)''
and ``Share of personnel costs in production'' are reported as
percentages, and ``Gross value added per employee'' is reported in
Euros. Because the number of employees is rather large for each
industry, it is divided by 1000 to reduce the scale
of the data. This increases the readability of tables in the following
regression results while keeping values large enough that
coefficients (coef.) and standard errors (SE) are unlikely to fall too far into
the decimals.\footnote{Note that this is done to ensure readability and
  does not affect the substance of the regression results. Scaling the data down by more than
  a factor of a thousand might lead to coefficients and standard errors falling into
  the decimals, which in turn may show up, due to rounding, as
  zeros despite having large-scale effects.\label{note3}}
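That rescaling the dependent variable changes only the scale of the
estimates can be checked with a small example. The following sketch uses
a hand-written simple regression and made-up numbers; it is not the
Statsmodels specification used in this research, and the function name
\texttt{simple\_ols} is hypothetical.

```python
import math


def simple_ols(x, y):
    """Return slope, its standard error and the t-statistic for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and the usual slope standard error.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


patents = [3, 5, 8, 13, 21]                                # made-up patent counts
employees = [410_000, 395_000, 402_000, 380_000, 371_000]  # made-up head counts

raw = simple_ols(patents, employees)
scaled = simple_ols(patents, [e / 1000 for e in employees])
```

In this example the slope and its standard error of the scaled
regression are one thousandth of the raw values, while the t-statistic
is identical, so inference is unaffected.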
Next, patent data retrieved from the OPS API, which returns data in JSON
format, is converted into a table. As multiple queries for the same
industry --- but with different keywords --- have been posted to the
API, duplicates in the patent data are removed. Specifically, duplicate
patent data (indicated by the patent application number) are removed in
each industry subset of the data. This ensures that each industry only
contains unique patents while patents can still appear in more than one
industry (as their applicable usage may not be restricted to only one
industry). Furthermore, as the SBS data only spans from 2011 to 2020,
patents that have been filed before or after this period are removed
from the data. As a next step, patents are grouped by their respective
industry and year of application and the patent count for each subgroup
is recorded. Furthermore, industries for which patents have been
retrieved in fewer than four years within 2011--2020 are removed from the
data to ensure a minimum sample size for the following statistics. The
sum of patents for each industry and year composes the exogenous
variable ``Sum patents'' that will be used in the regression analyses.
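The deduplication, period filter, counting and minimum-coverage filter
described above can be sketched as follows. The function name
\texttt{count\_patents}, the row layout and the sample data are
assumptions for illustration; the sketch uses only the standard library
instead of the Pandas operations of the actual implementation.

```python
from collections import Counter


def count_patents(rows, first_year=2011, last_year=2020, min_years=4):
    """Count unique patents per (industry, year) and drop sparse industries."""
    seen = set()
    counts = Counter()
    for row in rows:
        # Deduplicate within an industry; a patent may still appear in several.
        key = (row["industry"], row["application_number"])
        if key in seen:
            continue
        seen.add(key)
        # Keep only patents filed within the period covered by the SBS data.
        if first_year <= row["year"] <= last_year:
            counts[(row["industry"], row["year"])] += 1

    # Number of distinct years with at least one patent, per industry.
    years_per_industry = Counter(industry for industry, _ in counts)
    return {
        key: n for key, n in counts.items()
        if years_per_industry[key[0]] >= min_years
    }


# Made-up rows: industry "A" covers four years, industry "B" only two.
sample_rows = [
    {"industry": "A", "application_number": "EP1", "year": 2012},
    {"industry": "A", "application_number": "EP1", "year": 2012},  # duplicate
    {"industry": "A", "application_number": "EP2", "year": 2012},
    {"industry": "A", "application_number": "EP3", "year": 2013},
    {"industry": "A", "application_number": "EP4", "year": 2014},
    {"industry": "A", "application_number": "EP5", "year": 2015},
    {"industry": "A", "application_number": "EP6", "year": 2009},  # outside period
    {"industry": "B", "application_number": "EP1", "year": 2012},
    {"industry": "B", "application_number": "EP7", "year": 2016},
]

sum_patents = count_patents(sample_rows)
```

In the sample data, industry ``B'' is dropped because patents were
retrieved for only two years, while the duplicate and the 2009 filing in
industry ``A'' do not enter the counts.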
Additionally, the SBS data is merged with the patent data by matching
the industry and year of application with the industry and year of the
SBS data. This ensures that each industry and year combination in the