-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathCPiH.tex
1012 lines (837 loc) · 39.7 KB
/
CPiH.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
% -*- mode: LaTeX; compile-command: "rubber -d --unsafe CPiH.tex" -*-
\documentclass{book}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Packages
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{hyperref}
\usepackage{url}
\usepackage{xcolor}
\hypersetup{
colorlinks,
linkcolor={green!50!black},
citecolor={blue!50!black},
urlcolor={blue!80!black}
}
\usepackage{graphicx}
\graphicspath{{images/}}
\usepackage{etoolbox}
\usepackage{mdframed}
\usepackage{prettyref}
\usepackage{minted}
\usepackage{todonotes}
\usepackage[style=authoryear]{biblatex}
\addbibresource{references.bib}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% prettyref
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newrefformat{fig}{Figure~\ref{#1}}
\newrefformat{sec}{section~\ref{#1}}
\newrefformat{chap}{Chapter~\ref{#1}}
\newrefformat{eq}{equation~\eqref{#1}}
\newrefformat{prob}{Problem~\ref{#1}}
\newrefformat{tab}{Table~\ref{#1}}
\newrefformat{thm}{Theorem~\ref{#1}}
\newrefformat{lem}{Lemma~\ref{#1}}
\newrefformat{prop}{Proposition~\ref{#1}}
\newrefformat{defn}{Definition~\ref{#1}}
\newrefformat{cor}{Corollary~\ref{#1}}
\newrefformat{ex}{Exercise~\ref{#1}}
\newrefformat{alg}{Algorithm~\ref{#1}}
\newcommand{\pref}[1]{\prettyref{#1}}
% \Pref is just like \pref but it uppercases the first letter; for use
% at the beginning of a sentence.
\newcommand{\Pref}[1]{%
\expandafter\ifx\csname r@@#1\endcsname\relax {\scriptsize[ref]}
\else
\edef\reftext{\prettyref{#1}}\expandafter\MakeUppercase\reftext
\fi
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Misc semantic markup
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\module}[1]{\emph{#1}}
\newcommand{\pkg}[1]{\texttt{#1}}
\newcommand{\term}[1]{\emph{#1}}
\newcommand{\h}[1]{\mintinline{haskell}{#1}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Kattis
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\kattislogo}{\raisebox{-0.2em}{\includegraphics[height=0.9\baselineskip]{Kattis}}}
\newcommand{\kattislist}[1]{%
\def\nextitem{\def\nextitem{, }}
\renewcommand*{\do}[1]{\nextitem\kattislink{##1}}
\docsvlist{#1}
}
\newcommand{\kattis}[1]{%
\begin{mdframed}[frametitle={Practice problems}]
\kattislogo
\kattislist{#1}
\end{mdframed}
}
\newcommand{\kattislink}[1]{\href{https://open.kattis.com/problems/#1}{\texttt{#1}}}
\newcommand{\inlinekattis}[1]{\kattislogo\!\!\kattislist{#1}\!}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Title page
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\title{Competitive Programming in Haskell}
\author{Brent A. Yorgey}
\begin{document}
\maketitle
\tableofcontents
\chapter{Introduction}
\label{chap:intro}
\todo{code style: stylish-haskell? fourmolu?}
\chapter{Getting Started}
\label{chap:getting-started}
As a basic running example, we will consider the same example problem
that Open Kattis uses in its \href{https://open.kattis.com/help}{help
section}, A Different Problem \mbox{(\inlinekattis{different})}. In
this problem, we are told that the input will consist of a number of
pairs of integers between $0$ and $10^{15}$, one pair per line, and we
should output the absolute value of the difference between each
pair. The given example is that if the input looks like this:
\begin{verbatim}
10 12
71293781758123 72784
1 12345677654321
\end{verbatim}
then our program should produce output that looks like this:
\begin{verbatim}
2
71293781685339
12345677654320
\end{verbatim}
This is of course an extremely simple problem, so we can just focus on
the mechanics of solving a problem in Haskell.
\section{Competitive programming is functional}
\label{sec:cp-functional}
You can see that this problem specifies an \emph{input} in a
particular format (which will be provided via our program's standard
input), and requires us to produce a specified \emph{output} (via
standard output). Most competitive programming problems are of this
form.\footnote{There certainly can exist problems requiring
interaction beyond standard input and output, such as reading from a
certain file, doing network I/O, and so on, but we will not consider
such problems in this book.} An imperative approach to such a
problem involves doing a sequence of input commands, some computation,
and a sequence of output commands---possibly interleaved with one
another---and we might immediately think to start using functions like
\h{getLine} and \h{putStrLn} to do the required I/O in Haskell. However,
there is a much more fruitful functional perspective: we are simply
being asked to implement a particular (partial) function of type
\h{String -> String}.\footnote{If you are worried about the use of
\h{String}, fear not: in \pref{chap:parsing}, once we develop a simple parsing
framework, we will switch to \h{ByteString} for efficiency.} The fact
that the function's input and output should be hooked up to the
program's standard input and output is just an implementation detail.
Competitive programming is functional at heart.
It turns out that Haskell's standard library already has the perfect
built-in function for this scenario:
\begin{minted}{haskell}
interact :: (String -> String) -> IO ()
\end{minted}
\h{interact} takes a pure \h{String -> String} function and turns it into
an \h{IO} action which reads from standard input, passes the input to
the given function, and prints the result to standard output. It even
does this using \emph{lazy} I/O, which can be strange and problematic
in some scenarios, but is perfect for this situation: the input is
read lazily, as demanded by the function, so that the output and input
can be automatically interleaved depending on which parts of the
output depend on which parts of the input. In particular, this means
that that the entire input need not be stored in memory at once. If
the inputs can be processed into outputs in a streaming fashion---as
is the case in the example problem we are currently
considering---then the input and output will be interleaved.
Thus, the \h{interact} function lets us immediately pass to a functional
view of a problem, worrying only about the essential details of
transforming the given input into the requested output. This is the
last time \h{IO} will appear in this book! \todo{Check this---maybe I
want to write about interactive problems.}
\section{A basic solution pipeline}
\label{sec:pipeline}
So now we need to write a pure function which transforms the input
into the output. Of course, in true Haskell fashion, we will do this
by constructing a chained pipeline of functions to do the job
incrementally. The general plan of attack (for any problem) is
as follows:
\begin{enumerate}
\item First, parse the input, that is, transform the raw input
into some more semantically meaningful representation.
\item Next, solve the problem, turning a semantically meaningful
representation of the input into a semantically meaningful
representation of the output.
\item Finally, format the output into a \h{String}.
\end{enumerate}
Figure~\ref{fig:skeleton-different} has a simple skeleton solution along
these lines. There are a few things to point out.
\begin{figure}
\centering
\inputminted{haskell}{code/intro/Skeleton.hs}
\caption{A skeleton solution for A Different Problem}
\label{fig:skeleton-different}
\end{figure}
\begin{itemize}
\item Notice the use of the backwards function composition operator
\h{(>>>)} from \h{Control.Arrow} (essentially, \h{f >>> g = g . f},
although in actuality \h{(>>>)} is a bit more general than that).
When solving programming problems, I like to be able to type
function pipelines from left to right as I think about data flowing
through the pipeline from beginning to end. This is of course a
personal preference, and one could also write %
\h{format . solve . parse} instead of %
\h{parse >>> solve >>> format}.
\item If the machine on which our solution will run has a 64-bit
architecture (this is always true for Open Kattis, but not
necessarily so for other platforms such as Codeforces), for this
problem we could technically get away with using \h{Int} instead of
\h{Integer}. On a 64-bit architecture, \h{maxBound :: Int} is $2^{63} -
1$, which is a bit more than $9 \times
10^{18}$, plenty big enough for this problem, with inputs only up to
$10^{15}$. For more computationally intensive problems, using \h{Int}
instead of \h{Integer} can be an important optimization; however, for
simple problems, using \h{Integer} is preferred since it eliminates
the potential for bugs due to overflow.
\item In simple problems such as this, the \h{solve} function could
probably be inlined (it can be a fun challenge to solve easy
problems in a single line of code!). However, in general, I prefer
to split it out explicitly in order to specify its type, which both
prevents problems with \h{read}/\h{show} ambiguity, and also serves as a
sanity check on the parsing and formatting code.
\item And one last thing: I said we were going to parse the input into
a ``semantically meaningful representation'', but I lied a teensy
bit: the problem says we are going to get a \emph{pair} of integers,
but the type of \h{solve} says that it takes a \emph{list} of
integers. We will discuss and justify this choice in more detail in
\pref{sec:partial}.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Parsing and formatting}
\label{sec:parse-format}
The outer shell of our basic solution pipeline deals with converting
between the raw strings used as input and output, and more
semantically meaningful representations. In many cases, it suffices
to use basic list-processing functions from \module{Prelude} or
\module{Data.List}.
\begin{itemize}
\item For parsing input, some of the most frequently useful functions
include \h{lines}, \h{words}, \h{drop}, and \h{read}. The \h{split} function
from the \h{Data.List.Split} module (from the \pkg{split}
package) can also be helpful.
\item For formatting output, some of the most frequently useful
functions include \h{unlines}, \h{unwords}, \h{show}, \h{concat},
\h{intersperse}, and \h{intercalate}.
\end{itemize}
If any of these functions are unfamiliar to you, it's worth spending
some time reading their documentation and trying them on some
examples. In \pref{chap:parsing} we will consider more sophisticated
tools for parsing in particular, but until then we will stick to
problems that only require these basic tools. Fortunately, there are
many such problems.
Returning to our running example solving A Different Problem, the
given input consists of a number of lines, each containing two
numbers. This is easily parsed using a combination of \h{lines},
\h{words}, \h{read}, and \h{map} (\pref{fig:parse-different}).
\begin{figure}
\centering
\inputminted[firstline=5, lastline=6]{haskell}{code/intro/Different.hs}
\caption{\h{parse} function for A Different Problem}
\label{fig:parse-different}
\end{figure}
As another example, consider the problem Pot
\mbox{(\inlinekattis{pot})}. The input description for this problem says
\begin{quote}
The first line of input contains the integer $N (1 \leq N \leq 10)$, the number
of the addends from the task. Each of the following $N$ lines contains
the integer $P_i$ ($10 \leq P_i \leq 9999$, $i = 1, \dots, N$) from the task.
\end{quote}
At first glance, it looks like we need a more sophisticated parsing
framework for this problem: the first line is a number which tells us
\emph{how many} additional lines of input we need to parse. The need
for intermediate values to be able to affect later parsing is exactly
what necessitates the use of a \emph{monadic} parsing framework.
However, in this case, we can do something much simpler: just ignore
the number entirely, and process all the lines besides the first! We
don't actually need the number to tell us how many lines to expect
since the end of the input is delimited by EOF.
\begin{minted}{haskell}
parse :: String -> [Integer]
parse = lines >>> drop 1 >>> map read
\end{minted}
This approach often works. For example, even if we have a variable
number of lines, each of which has a variable number of items (with a
number at the beginning saying how many), we can ignore all the
counts: just split into lines, drop the first line, and then split
each line into words, dropping the first. Besides the lines being
delimited by EOF overall, each individual line is delimited by the end
of line character, so we don't need any of the counts to tell us how
far to read.
However, this doesn't work when we really have no intrinsic way to
tell when a certain section ends and another starts. For example,
some problems consist of a number of test cases, where each test case
input consists of a variable numbers of lines. In that case we would
need to use provided information about the length of each test case.
We will discuss such parsing in \pref{chap:parsing}.
\todo{include full solution for A Different Problem?}
\section{Using partial functions}
\label{sec:partial}
You might wonder about using a list as the output of \h{parse} and the
input of \h{solve}. Shouldn't we use something like \h{(Int,Int)}, since
we know we will be given exactly two integers? \todo{paste in blog post
about this. Note you can skip if you're not interested in the
philosophy.}
The fact is that I almost never use actual Haskell tuples in my
solutions, because they are too awkward and inconvenient. Representing
homogeneous tuples as Haskell lists of a certain known length allows
us to read and process ``tuples'' using standard functions like
\h{words} and \h{map}, to combine them using \h{zipWith}, and so on.
And since we get to assume that the input always precisely follows the
specification---which will never change---this is one of the few
situations where, in my opinion, we are fully justified in writing
partial functions like this if it makes the code easier to write. So
I often represent homogeneous tuples as lists and just pattern match
on lists of the appropriate (known) length. (If I need heterogeneous
tuples, on the other hand, I create an appropriate \h{data} type.)
\section{Explicit recursion}
\label{sec:explicit-recursion}
Some styles of Haskell programming eschew explicit recursion in favor
of folds and other recursion schemes. And certainly, one should use
simple higher-order functions like \h{map}, \h{filter}, and so on when
appropriate. However, it's often not worth trying to come up with
some clever fold when a simple recursive function will be faster to
write and easier to debug.
\todo{Example}
\section{Practice problems}
\label{sec:gs-practice}
Below are a few simple problems for you to practice on. This is just
a very small sample; there are many, many problems on Open Kattis that
can be solved with not much more than the techniques explained in this
chapter, and I encourage you to just solve a bunch of them! To start,
look for problems with a difficulty rating less than 2.0. To find
such problems easily, you can go to the list of all problems
(\url{http://open.kattis.com/problems/}) and sort by difficulty.
\kattis{jobexpenses, judgingmoose, quickestimate}
\chapter{Wholemeal Programming}
\label{chap:wholemeal}
As we have seen in the previous chapter, functional programming
encourages us to think in terms of assembling functions into pipelines
that incrementally transform an input into an output. \term{Wholemeal
programming} is a closely related concept, explained best by Ralf
Hinze \parencite*{hinze2009tour}:
\begin{quote}
Functional languages excel at wholemeal programming, a term coined
by Geraint Jones. Wholemeal programming means to think big: work
with an entire list, rather than a sequence of elements; develop a
solution space, rather than an individual solution; imagine a graph,
rather than a single path.
\end{quote}
This is often a very fruitful approach to solving competitive
programming problems. It helps us avoid getting lost---or making
mistakes!---dealing with trivial low-level details.
\section{Example: processing a list of strings}
As a simple example, consider the problem of writing a function which
takes a list of strings as input, and adds up the lengths of all
strings that do not contain the letter \h{'e'}. For example, given
the input
\begin{minted}{haskell}
["dog", "sheep", "horse", "capybara", "lemur"]
\end{minted}
the function should return 11 (the combined length of \h{"dog"} and
\h{"capybara"}).
A standard imperative approach to this problem would be to loop over
the list of words, updating an accumulator variable every time we see
a string that does not contain the letter e. An inexperienced
Haskeller might produce something similar:
\inputminted{haskell}{code/wholemeal/LengthWithoutE.hs}
This code works, but there is a better way. Instead of thinking in
terms of processing the list one item at a time, an experienced
Haskeller will think in terms of incrementally transforming the input
into the desired output: first, filter out the strings we don't want;
next, turn each string into its length; and finally sum the lengths.
\inputminted{haskell}{code/wholemeal/LengthWithoutEWholemeal.hs}
This is better in almost every way: it is shorter, easier to read and
understand, easier to refactor, harder to get wrong, and easier to
prove correct.
Experienced programmers might worry about efficiency: doesn't this do
three passes over the list instead of just one? Even worse, doesn't
it construct several intermediate lists from scratch? Actually,
Haskell's laziness, combined with optimization technology like list
fusion, make this very efficient in practice. And in any case, we
usually don't need to worry much about constant factors as long as we
have the right asymptotic complexity.
\section{Basic data structures}
Wholemeal programming in Haskell usually revolves around the use of
several basic data types provided in Haskell standard libraries, such
as lists, \h{Set}, \h{Map}, \h{Tree}, and \h{Array}. We will touch on
each of these, highlighting their important features and suggesting
some practice problems to solve.
\subsection{Lists}
\label{sec:lists}
Lists (including \h{String}s) are ubiquitous in Haskell, and the
standard library provides a large number of useful list-processing
functions. Trying to come up with a specific catalog of list
functions which are of use in competitive programming would be a
fool's errand, since almost all of the standard \h{Data.List}
functions
(\url{https://hackage.haskell.org/package/base-4.18.0.0/docs/Data-List.html})
can be of use. If you're not already familiar with all the standard
list functions, you should spend time looking through the
documentation and practicing their use, especially if you want to be
able to solve easy problems quickly.
One thing worth mentioning is efficiency: lists have quite a bit of
extra overhead compared to arrays. To avoid this becoming an issue, we
should aim to use lists as \emph{control structures} rather than
\emph{data structures}. That is, in a situation where you would write
a \texttt{for} loop in an imperative language, you can probably safely
use a list in Haskell. It is perfectly fine to have a long list and
then iterate over it multiple times with standard functions like
\h{map}, \h{filter}, and so on. The multiple traversals may fuse
away, and even if they don't the extra overhead is unlikely to make a
big difference. What you definitely don't want to do is use many
small lists as data structures. That is, for example, having two
lists with $10^5$ elements each is probably fine; having $10^5$ lists
with two elements each is probably going to be too slow.
\todo{pick two or three representative sample problems and discuss
their solutions? One using scanl?}
\kattis{foo, bar}
\subsection{Sets}
\label{sec:sets}
The standard \h{Data.Set} type stores elements in a balanced tree
structure, so most operations take time logarithmic in the number of
items stored. This is not as fast as a hash table, but quite fast in
practice, and provides a much richer API than a hash-based data
structure. (The \pkg{unordered-containers} package---which is
available in the Open Kattis test environment---does provide
\h{Data.HashSet}, but it's rarely needed.)
It is worth having a good working knowledge of standard \h{Set}
functions like \h{empty}, \h{singleton}, \h{fromList}, \h{toList},
\h{insert}, \h{delete}, \h{union}/\h{unions}, \h{intersection},
\h{difference}, \h{map}, \h{filter}, \h{member}, \h{null}, and
\h{size}.
There are many others which are occasionally useful and can be looked
up in the documentation when needed, such as (but not limited to)
\h{split}, \h{takeWhile}, \h{lookupIndex}, \h{lookupGT}, \h{elemAt},
and \h{findMin}/\h{findMax}.
\subsubsection{Example: Booking a Room}
As an example, consider \inlinekattis{bookingaroom}. We are given as
input integers $r$ and $n$, and then list of $n$ integers which are
between $1$ and $r$ inclusive, representing hotel rooms which are
booked. We must output the number of some available room (or an error
message if all rooms are booked).
There are many ways to solve this. One simple way might be to just
read the given numbers into a list, and then check every number from
$1$ to $r$ to see if it is a member of the list or not (using
\h{elem}), and print the first one we find which is not. This
solution would take $O(nr)$ time. Since $n$ and $r$ are guaranteed to
be at most $100$, it would work, but we can do better---not only in an
asymptotic sense, but in terms of code size! Checking every number
from $1$ to $r$ has a very ``imperative'' feel; we are looping through
some items and doing something with each one individually. Instead,
we can think in terms of computing the \emph{entire} set of available
rooms at once, by taking the difference of the set of all rooms
$\{1 \dots r\}$ and the set of booked rooms. Picking one arbitrary
element from a \h{Set} can then be accomplished either by converting
to a list and picking the first element (\emph{i.e.} \h{listToMaybe
. toList}), or, even better, we can use the built-in
\h{Data.Set.lookupMin :: Ord a => Set a -> Maybe
a}. \pref{fig:bookingaroom} demonstrates a complete solution.
\begin{figure}
\centering
\inputminted{haskell}{code/wholemeal/BookingARoom.hs}
\caption{Booking a Room solution}
\label{fig:bookingaroom}
\end{figure}
\kattis{foo, bar}
\subsection{Maps}
\label{sec:maps}
The standard \h{Data.Map} type stores key-value mappings in a balanced
tree structure, so, like \h{Data.Set}, most operations take time
logarithmic in the number of items stored. A hash table-based map
such as \h{Data.HashMap} would be faster in theory, but \h{Data.Map}
is quite fast in practice. There are other map variants as well, such
as \h{Data.Map.Strict} or \h{Data.IntMap}. In my experience it usually
doesn't make a whole lot of difference, and I mostly stick to
\h{Data.Map} (\pref{chap:dp} will explain one particular application
of lazy, as opposed to strict, maps). The major exceptions seem to
be:
\begin{itemize}
\item If you want to make a large map with string (\h{ByteString})
keys, a \h{HashMap} can be considerably faster than \h{Map}
(\inlinekattis{pakethanterare}).
\item \todo{When is IntMap faster?}
\end{itemize}
It is worth having a good working knowledge of standard \h{Map}
functions like \h{empty}, \h{null}, \h{size}, \h{singleton},
\h{fromList}, \h{fromListWith}, \h{assocs}, \h{elems}, \h{keys},
\h{keysSet}, \h{insert}, \h{insertWith}, \h{delete}, \h{alter},
\h{lookup}, \h{member}, \h{union}, \h{unionWith}, \h{intersection},
\h{intersectionWith}, and \h{map}. \h{Map k} is also an instance of
\h{Functor}, \h{Foldable}, and \h{Traversable}, allowing the use of
standard functions like \h{fmap}, \h{foldMap}, \h{sum}, \h{maximum},
\h{minimum}, and \h{traverse}/\h{mapM}. There are many, many
other functions which are occasionally useful and can be looked up in
the documentation when needed, such as (but not limited to)
\h{compose}, \h{split}, \h{restrictKeys}, \h{partition}, \h{mapMaybe},
\h{updateAt}, \h{take}, \h{drop}, and \h{findMin}/\h{findMax}.
\kattis{pakethanterare}
\subsection{Arrays}
\label{sec:arrays}
\subsection{Trees}
\label{sec:trees}
\section{Additional practice problems}
\label{sec:wholemeal-practice}
\kattis{foo,bar}
\chapter{Input Parsing}
\label{chap:parsing}
So far, all the example problems we have considered have input in a
format that we were able to parse using simple standard functions like
\h{lines} and \h{words}. There is another common class of problems,
however, which follow this pattern:
\begin{quote}
The first line of the input consists of an integer $T$. Each of the
next $T$ lines consists of \dots
\end{quote}
That is, the input contains integers which are not input data per se,
but just tell you how many things are to follow. This is really easy
to process in an imperative language like Java or C++. For example,
in Java we might write code like this:
\begin{minted}{java}
Scanner in = new Scanner(System.in);
int T = in.nextInt();
for (int i = 0; i < T; i++) {
// process each line
}
\end{minted}
As we already saw in \pref{chap:getting-started}, we can often get
away with completely ignoring this extra information. For example, if
the input consists of a number $T$ followed by $T$ lines, each of
which contains a number $n$ followed by a list of $n$ numbers, we can
simply write
\begin{minted}{haskell}
main = interact $
lines >>> drop 1 >>> map (words >>> drop 1 >>> map read) >>> ...
\end{minted}
%$
That is, we can ignore the first line containing $T$ since the
end-of-file will tell us how many lines there are; and we can ignore
the $n$ at the beginning of each line, since the newline character
tells us when the list on that line is done.
Sometimes, however, this isn't possible, especially when there are
multiple test cases, or when a single test case has multiple parts,
each of which can have a variable length. For example, consider
Popular Vote (\inlinekattis{vote}), which describes
its input as follows:
\begin{quote}
The first line of input contains a single positive integer $T \leq 500$
indicating the number of test cases. The first line of each test case
also contains a single positive integer $n$ indicating the number of
candidates in the election. This is followed by $n$ lines, with the $i$th
line containing a single nonnegative integer indicating the number of
votes candidate $i$ received.
\end{quote}
How would we parse this? We could still ignore $T$---just keep reading
until the end of the file---but there's no way we can ignore the $n$
values. Since the values for each test case are all on separate lines
instead of on one line, there's otherwise no way to know when one test
case ends and the next begins.
One possible approach would be to use \h{splitAt} and explicit
recursion, like so:
\begin{minted}{haskell}
type Election = [Int]
readInput :: String -> [Election]
readInput = lines >>> drop 1 {- ignore T -} >>> map read >>> go
where
go :: [Int] -> [Election]
go [] = []
go (n:xs) = votes : go rest
where (votes,rest) = splitAt n xs
\end{minted}
However, this is really annoying to write and easy to get wrong.
There are way too many variable names to keep track of (\h{n}, \h{xs},
\h{votes}, \h{rest}, \h{go}) and for more complex inputs it becomes
simply unmanageable. You might think we should switch to using a real
parser combinator library---\h{parsec} is indeed installed in the
environment Kattis uses to run Haskell solutions---and although
sometimes a full-blown parser combinator library is needed
(\pref{chap:parser-combs}), in this case it's quite a bit more
heavyweight than we would like.
\section{Scanner}
\label{sec:scanner}
The heart of the issue is that we want to be able to specify a
high-level description of the sequence of things we expect to see in
the input, without worrying about managing the stream of tokens
explicitly. Another key insight is that 99\% of the time, we
\emph{don't} need the ability to deal with parse failure or the
ability to parse multiple alternatives. With these insights in mind,
we can create a very simple \h{Scanner} abstraction, which is just a
\h{State}ful computation over a list of tokens:
\begin{minted}{haskell}
type Scanner = State [String]
runScanner :: Scanner a -> String -> a
runScanner s = evalState s . words
\end{minted}
To run a scanner, we just feed it the entire input as a \h{String},
which gets chopped into tokens using \h{words}. (Of course in some
scenarios we might want to use \h{lines} instead of \h{words}, or even do
more complex tokenization.)
Note since \h{Scanner} is just a type synonym for \h{State [String]}, it
is automatically an instance of \h{Functor}, \h{Applicative}, and
\h{Monad} (but not \h{Alternative}).
So let's develop a little \h{Scanner} DSL. The most fundamental thing
we can do is read the next token.
\begin{minted}{haskell}
str :: Scanner String
str = get >>= \case { s:ss -> put ss >> return s }
\end{minted}
(This uses the \h{LambdaCase} extension\todo{reference}, though we
could easily rewrite it without.) \h{str} gets the current list of
tokens, puts it back without the first token, and returns the first
token. Note that there is no case for the empty list. You might
think we want to include a case for the empty token list and have it
return the empty string or something like that. But since the input
will always be properly formatted, if \h{str} is ever called when
there are no tokens left it means our program has a bug---for example,
perhaps we misunderstood the description of the input format. In this
scenario we \emph{want} it to crash loudly, as soon as possible,
rather than continuing on with some bogus data.
We can now add some scanners for reading specific token types other
than \h{String}, simply by mapping the \h{read} function over the output
of \h{str}:
\begin{minted}{haskell}
int :: Scanner Int
int = read <$> str
integer :: Scanner Integer
integer = read <$> str
double :: Scanner Double
double = read <$> str
\end{minted}
%$
Again, these will crash if they see a token in an unexpected format,
and that is a very deliberate choice.
Now, as explained earlier, a very common pattern is to have an
integer $n$ followed by $n$ copies of something. So let's make a
combinator to encapsulate that pattern:
\begin{minted}{haskell}
numberOf :: Scanner a -> Scanner [a]
numberOf s = int >>= flip replicateM s
\end{minted}
\h{numberOf s} expects to first see an \h{Int} value $n$, and then it runs
the provided scanner $n$ times, returning a list of the results.
It's also sometimes useful to have a way to repeat a \h{Scanner} some
unknown number of times until encountering EOF (for example, the input
for some problems doesn't specify the number of test cases up front
the way that \inlinekattis{vote} does). This is similar to the \h{many}
combinator from \h{Alternative}, except that it only stops when the
token stream runs out, not when the given \h{Scanner} fails (remember
that we have no way to handle \h{Scanner} failure other than crashing).
\begin{minted}{haskell}
many :: Scanner a -> Scanner [a]
many s = get >>= \case { [] -> return []; _ -> (:) <$> s <*> many s }
\end{minted}
%$
\h{many s} repeats the scanner \h{s} as many times as it can, returning a
list of the results. In particular it first peeks at the current
token list to see if it is empty. If so, it returns the empty list of
results; if there are more tokens, it runs \h{s} once and then
recursively calls \h{many s}, consing the results together.
Finally, it's quite common to want to parse a specific small number of
something, for example, two double values representing a 2D coordinate pair.
We could just write \h{replicateM 2 double}, but this is common enough
that I find it helpful to define dedicated combinators with short
names:
\begin{minted}{haskell}
two, three, four :: Scanner a -> Scanner [a]
[two, three, four] = map replicateM [2..4]
\end{minted}
\todo{Also mention times / \h{><} and pair}
\todo{Show complete code listing? Or put it in Appendix? In any case,
link to complete code on GitHub.}
\section{Examples}
So what have we gained? Writing the parser for \kattisinline{vote} is now
almost trivial:
\begin{minted}{haskell}
type Election = [Int]
main = interact $ runScanner elections >>> ...
elections :: Scanner [Election]
elections = numberOf (numberOf int)
\end{minted}
%$
In practice I would probably just inline the definition of \h{elections}
directly: \h{interact $ runScanner (numberOf (numberOf int)) >>> ...}
%$
As a slightly more involved example, chosen almost at random, consider
Board Wrapping (\inlinekattis{wrapping}):
\begin{quote}
On the first line of input there is one integer, $N \leq 50$, giving
the number of test cases (moulds) in the input. After this line, $N$
test cases follow. Each test case starts with a line containing one
integer $n, 1 \leq n \leq 600$, which is the number of boards in the
mould. Then $n$ lines follow, each with five floating point numbers
$x,y,w,h,v$ where $0 \leq x,y,w,h \leq 10000$ and $-90^{\circ} < v
\leq 90^{\circ}$. The $x$ and $y$ are the coordinates of the center of
the board and $w$ and $h$ are the width and height of the board,
respectively. $v$ is the angle between the height axis of the board to
the $y$-axis in degrees, positive clockwise.
\end{quote}
Here's how I might set up the input, using \h{Scanner} and a custom
data type to represent boards (in practice I would actually use a
custom data type for 2D points/vectors defined in my geometry library
rather than the type \h{V} defined below; see \pref{chap:geometry}).
\begin{minted}{haskell}
import Scanner
type V = [Double] -- 2D vectors/points
newtype A = A Double -- angle (radians)
-- newtype helps avoid conversion errors
fromDeg :: Double -> A
fromDeg d = A (d * pi / 180)
data Board = Board { boardLoc :: !V, boardDims :: !V, boardAngle :: !A }
board :: Scanner Board
board = Board
<$> two double
<*> two double
<*> ((fromDeg . negate) <$> double)
main = interact $
runScanner (numberOf (numberOf board)) >>> ...
\end{minted}
%$
\section{ByteString}
Using the \h{String} type for input can be convenient, since we get to
use all the standard library list functions, but sometimes \h{String}
isn't good enough.
A good example is the problem Army Strength (Hard)
(\inlinekattis{armystrengthhard}). There are a number of separate
test cases; each test case consists of two lines of positive integers
which record the strengths of monsters in two different armies.
Supposedly the armies will have a sequence of battles, where the
weakest monster dies each time, with some complex-sounding rules about
how to break ties. It sounds way more complicated than it really is,
though; a bit of thought reveals that to find out who wins we really
just need to see which army's maximum-strength monster is strongest.
So our strategy for each test case is to read in the two lists of
integers, find the maximum of each list, and compare. Seems pretty
straightforward, right? Something like this:
\begin{minted}{haskell}
import Control.Arrow
import Data.List.Split
main = interact $
lines >>> drop 1 >>> chunksOf 4 >>>
map (drop 2 >>> map (words >>> map read) >>> solve) >>>
unlines
solve :: [[Int]] -> String
solve [gz, mgz] = case compare (maximum gz) (maximum mgz) of
LT -> "MechaGodzilla"
_ -> "Godzilla"
\end{minted}
%$
Note we didn't actually use the \h{Scanner} abstraction here, though
we could have; in this case it's actually easier to just ignore the
numbers telling us how many test cases there are and the length of
each line, and just split up the input by lines and go from there.
This seems straightforward enough, but sadly, it results in a Time
Limit Exceeded (TLE) error on the third of three test cases.
Apparently this program takes longer than the allowed 1 second.
What's going on?
If we look carefully at the limits for the problem, we see that there
could be up to $50$ test cases, each test case could have two lists of
length $10^5$, and the numbers in the lists can be up to $10^9$. If
all those are maxed out (as they probably are in the third, secret
test case), we are looking at an input file many megabytes in size.
At that point the time to simply read the input is a big factor.
Reading the input as a \h{String} has a lot of overhead: each
character gets its own cons cell, which form a linked list in memory;
breaking the input into lines and words requires traversing over these
cons cells one by one. We need a representation with less overhead.
Now, if this were a \emph{real} application, we would reach for
\h{Text}, which is made for representing textual information and can
correctly handle Unicode encodings and all that good stuff. However,
this isn't a real application: competitive programming problems
\emph{always} limit the input and output strictly to ASCII, so
characters are synonymous with bytes. Therefore we will commit a
``double no-no'': not only are we going to use \h{ByteString} to
represent text, we're going to use \h{Data.ByteString.Lazy.Char8}
which simply assumes that each 8 bits is one character. As explained
in \pref{chap:XYZ}, however, I think this is one of those things that
is usually a no-no but is completely justified in this context.
Let's start by just replacing some of our string manipulation with
corresponding \h{ByteString} versions:
\begin{minted}{haskell}
import Control.Arrow
import Data.ByteString.Lazy.Char8 qualified as BS
import Data.List.Split
main = BS.interact $
BS.lines >>> drop 1 >>> chunksOf 4 >>>
map (
drop 2 >>>
map (BS.words >>> map (BS.unpack >>> read)) >>>
solve
) >>>
BS.unlines
solve :: [[Int]] -> BS.ByteString
solve [gz, mgz] = case compare (maximum gz) (maximum mgz) of
LT -> BS.pack "MechaGodzilla"
_ -> BS.pack "Godzilla"
\end{minted}
%$
This already helps a lot: this version is actually accepted, taking
0.66 seconds.
But we can do even better: it turns out that \h{read} also has a lot
of overhead, and if we are specifically reading \h{Int} values we can
do it much faster. The \h{ByteString} module comes with a function
\begin{minted}{haskell}
BS.readInt :: BS.ByteString -> Maybe (Int, BS.ByteString)
\end{minted}
Since, in this context, we know we will always get an integer with
nothing left over, we can replace \h{BS.unpack >>> read} with \h{BS.readInt
>>> fromJust >>> fst}. While we're at it, we can also enable the
\h{OverloadedStrings} language extension so that we don't have to wrap
string literals in \h{BS.pack}:
\begin{minted}{haskell}
{-# LANGUAGE OverloadedStrings #-}
import Control.Arrow
import Data.ByteString.Lazy.Char8 qualified as BS
import Data.List.Split
import Data.Maybe (fromJust)
main = BS.interact $
BS.lines >>> drop 1 >>> chunksOf 4 >>>
map (drop 2 >>> map (BS.words >>> map readInt) >>> solve) >>>
BS.unlines
where
readInt = BS.readInt >>> fromJust >>> fst
solve :: [[Int]] -> BS.ByteString
solve [gz, mgz] = case compare (maximum gz) (maximum mgz) of
LT -> "MechaGodzilla" -- automatically wrapped in BS.pack
_ -> "Godzilla"
\end{minted}
Now we're talking---this version completes in a blazing 0.04 seconds!
We can take these principles and use them to make a variant of the
\h{Scanner} module which uses \h{ByteString} instead of
\h{String}, including the use of the \h{readInt} functions to read
\h{Int} values quickly. You can find it
at \url{https://github.com/byorgey/comprog-hs/blob/master/ScannerBS.hs}.
\chapter{Monoids}
\label{chap:monoids}
\chapter{Trees}
\label{chap:trees}
\chapter{Data Structures}
\label{chap:data-structures}
\chapter{Dynamic Programming}
\label{chap:dp}
\chapter{Number Theory}
\label{chap:number-theory}
\chapter{Combinatorics}
\label{chap:combinatorics}
\chapter{Geometry}
\label{chap:geometry}
\chapter{Graphs}
\label{chap:graphs}
\chapter{Strings}
\label{chap:strings}