{"pages":[{"title":"categories","text":"","link":"/categories/index.html"},{"title":"friends","text":"OIer们本校同学/学长little_sun: little_sun LCuter: LCuter’s Blog YXHXianyu:咸鱼鱼的Blog 外省大佬Siyuan: Siyuan’s Blog yzhang: yzhang’s Blog memset0: memset0’s Blog 清华校友学长tyanyuy3125: tyanyuy3125’s Blog clever_jimmy: Clever_Jimmy’s Blog 燃燃: 轻鸢’s Blog 同学捌拾壹 BI1TWG: zirno81’s Blog","link":"/friends/index.html"},{"title":"About","text":"About me一个初二学生,OIer,菜的一匹","link":"/about/index.html"},{"title":"summary","text":"比赛不要妄想写正解,首先保证暴力分拿到,毕竟暴力打好就能进队。 注意题目中的数据范围,避免Runtime Error的情况发生 斜率优化里的$k$是带$i$的项,$x$是带$j$的项,$y$是带$f_j$的项 注意双向边和单向边不要加错","link":"/summary/index.html"},{"title":"tags","text":"","link":"/tags/index.html"},{"title":"Todo","text":"每日刷题记录2021.1.22 洛谷 P4726 【模板】多项式指数函数(多项式 exp) 洛谷 P4725 【模板】多项式对数函数(多项式 ln) 洛谷 P5245 【模板】多项式快速幂 CF600E Lomsat gelral 2021.1.23 洛谷 P5906 【模板】回滚莫队&不删除莫队 JOISC 2014 C 歴史の研究 2021.1.24 洛谷 P6623 [省选联考 2020 A 卷] 树 洛谷 P4238 【模板】多项式乘法逆 正睿 #1720 [21省选day4]逆转函数 20210123校内模拟 T3 集合 set 订正","link":"/plan/index.html"}],"posts":[{"title":"「20190219」赛后总结","text":"总的来说这场比赛打完感触还是蛮深的. 深切体会到了背模板的意义 $T1$ $s$到$t$的路径上所有点显然一定会走到,以$s$为根时$t$子树中的点显然走不到,而其它点都有$\\frac{1}{2}$的概率会走到。 时间复杂度$O(n log_2{n} + m)$ $T2$ n较小时,我们可以直接用线性筛/埃氏筛法求出每个数的最小质因数。 考虑进行容斥。对于每个质数$x$,我们需要求出$1$~$n/x$中不被比$x$小的质数整除的数的个数。一种简单的思路是,对于$x \\leq k$的情况,我们进行常见的枚举子集容斥;对于$x>k$的情况,$n/x$较小,我们就在$n/k$的范围内进行线性筛/埃氏筛法。 注意到进行子集容斥时,枚举子集后贡献形如$(-1)^i·\\frac{n}{S}$,而$\\frac{n}{S}$只有$O(\\sqrt n )$种取值,可以对这个进行记忆化。 复杂度$\\frac{n^{\\frac{3}{4}}}{\\sqrt{log_2 n}}$ $T3$ 点分治统计树上的情况,然后单独考虑经过剩下那条边的答案。 在环上按顺序枚举一个端点,用树状数组维护另一个端点到这条边的距离。 时间复杂度$O(nlog_2n)$ 总的来说,T1T3的思路基本都有,但是因为代码能力有限,写的题太少,没写出来。。。","link":"/2019/02/19/20190219/"},{"title":"AGC022E Median Replace","text":"题目大意你有一个长度为$n$的串$\\texttt{S}$,其中有一些位置上的字符是?,其他的字符则是$0/1$之间的一种 每次可以进行一步操作:选择$3$个连续的字符,并把它们用它们的中位数替换 求有多少种把?替换成$0/1$的方案使得在进行$\\frac{n-1}{2}$次操作后剩下的字符为$1$? 
分析我们先考虑如果给定一个符合条件的串,要怎么判定这个串是否合法。 我们维护一个栈,这个栈从栈底到栈顶由一段连续的$1$和一段连续的$0$组成 对于新的一个字符$\\texttt{c}$,我们分$0/1$情况考虑 $c=0$,我们发现连续的$3$个$0$可以被抵消成为$1$个$0$,所以如果原来栈顶有$2$个连续的$0$,那么就把这$3$个$0$抵消掉$2$个变成$1$个$0$,否则直接把这个$0$插入栈顶 $c=1$,如果栈顶是$0$,则可以将这个$0$与$1$抵消(因为再找一个数,$3$个数取中位数的话,结果只与新找的数有关),否则把这个$1$插入栈。(如果栈中已经有了两个$1$,则怎么合并剩下的都是$1$,所以如果栈中已经有了两个$1$就可以忽略新的这个$1$了) 然后我们发现栈中$1$的个数只有$0\\sim2$这$3$种情况,$0$的个数也只有$0\\sim2$这$3$种情况,所以栈的种类数只有$3 \\times 3 = 9$种 现在我们考虑怎么$\\texttt{dp}$:我们可以把当前栈的状态当做$\\texttt{dp}$的状态,设$f[i][j][k]$表示当前处理第$i$位,栈中有$j$个$1$和$k$个$0$,则我们就可以按照上述的方式转移,对于?只要当做$0/1$分别转移一次就可以了。(具体转移可参见代码) 时间复杂度$\\mathcal{O}(n)$ 代码 #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int mod = 1e9 + 7;const int MaxN = 3e5 + 10;char s[MaxN];ll n, f[MaxN][3][3];void add(ll &a, ll b) {a += b, ((a >= mod) ? (a -= mod) : 0); }int main(){ f[0][0][0] = 1; scanf(\"%s\", s + 1), n = strlen(s + 1); for(int i = 0; i < n; i++) { for(int j = 0; j < 3; j++) { for(int k = 0; k < 3; k++) { if(s[i + 1] != '0') { if(k) add(f[i + 1][j][k - 1], f[i][j][k]); else add(f[i + 1][std::min(j + 1, 2)][k], f[i][j][k]); } if(s[i + 1] != '1') { if(k == 2) add(f[i + 1][j][1], f[i][j][k]); else add(f[i + 1][j][k + 1], f[i][j][k]); } } } } ll ans = 0; for(int i = 0; i < 3; i++) for(int j = 0; j <= i; j++) add(ans, f[n][i][j]); printf(\"%lld\\n\", ans); return 0;}","link":"/2020/03/11/AGC022E/"},{"title":"AGC034E Complete Compress","text":"题目大意有一棵有$n$个节点的树,每个节点上有$0/1$枚棋子,每次可以选择两个棋子并移动到它们的路径上的相邻节点(满足路径长度至少为$2$),求把所有棋子移到同一个节点的最小花费(无解输出$-1$)。 $n \\leq 2 \\times 10 ^ 3$ 分析枚举最后汇聚到的点$\\texttt{root}$,并以$\\texttt{root}$为根建树 我们可以发现,如果存在一个合法的方案,则必然是每次选择不存在祖先关系的两枚棋子,同时向着他们的$\\texttt{lca}$处跳一格,重复若干步,直到所有棋子都在$\\texttt{root}$ 由此我们联想到一个经典模型:有$\\texttt{sum}$个节点被分成了若干个集合,每次要找到不在同一集合的两个节点匹配并抵消。 设$\\texttt{max}$为最大的集合的大小,则当$sum - max \\geq max$时,刚好可以消去$\\lfloor \\frac{sum}{2} \\rfloor$对节点 否则剩下$2 \\times max - 
sum$个来自最大集合的节点,消去了$sum - max$对 现在我们回到原问题,考虑在$u$处做这个操作,设$f_u$表示在$u$的子树里最多消去了多少对。 我们把所有$u$的子树内的有棋子的节点$v$拆成$dis(v, \\; u)$个节点,则我们有如下转移(仍然设$\\texttt{sum}$为总结点个数,$\\texttt{max}$为最大的集合的大小) $sum - max \\geq max$ ,此时$f_u=\\lfloor \\frac{sum}{2} \\rfloor$ $sum - max < max$,此时需要最大子树$v$内的节点来抵消,此时$f_u=sum-max+ \\min (f_v, \\lfloor \\frac{2 \\times max - sum}{2} \\rfloor )$ 以$\\texttt{root}$为根的情况合法当且仅当$f_{root} = \\frac{\\Sigma_u dis(u, root)}{2}$,同时这也是以$\\texttt{root}$为根的答案 对$\\texttt{root}=1 - n$重复这个$\\texttt{dp}$过程,时间复杂度$\\texttt{O(}n^2\\texttt{)}$ 代码1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980#include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 1e4 + 10;const int inf = 0x3f3f3f3f;struct edge{ int next, to;};char s[MaxN];edge e[MaxN];int n, ans, cnt;int a[MaxN], head[MaxN], dis[MaxN], size[MaxN], f[MaxN];void init(){ for (int i = 1; i <= n; i++) dis[i] = size[i] = f[i] = 0;}void add_edge(int u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}void dfs(int u, int fa){ size[u] = a[u]; int maxp = 0; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v == fa) continue; dfs(v, u), size[u] += size[v], dis[v] += size[v]; dis[u] += dis[v], maxp = ((dis[maxp] > dis[v]) ? 
maxp : v); } if(!maxp) return (void) (f[u] = 0); if(dis[u] >= 2 * (dis[maxp])) f[u] = (dis[u] / 2); else f[u] = dis[u] - dis[maxp] + std::min(f[maxp], (2 * dis[maxp] - dis[u]) / 2);}int main(){ ans = inf; n = read(), scanf(\"%s\", s + 1); for (int i = 1; i <= n; i++) a[i] = s[i] - '0'; for (int i = 1; i < n; i++) { int u = read(), v = read(); add_edge(u, v), add_edge(v, u); } for (int i = 1; i <= n; i++) { init(), dfs(i, 0); if(dis[i] & 1) continue; if (f[i] * 2 >= dis[i]) ans = std::min(ans, dis[i] / 2); } printf(\"%d\\n\", (ans == inf) ? -1 : ans); return 0;}","link":"/2020/03/09/AGC034E/"},{"title":"AGC035D Add and Remove","text":"题目大意有一个长度为$n$的序列$\\{a_i\\}$,每次可以选择连续的$3$个数,把中间那个数加到左右两个数上后删除中间那个数。 求最后剩下的两个数的最小值。 $n \\leq 18$ 分析我们发现最后的结果肯定是每一个$a_i$乘上一个系数的和,我们考虑倒着$\\texttt{dp}$ 设$\\texttt{f[l][r][x][y]}$表示当前区间的左右端点分别是$l, \\; r$,$[l, r]$之间的部分已被删除,$a_l$的系数为$x$,$a_r$的系数为$y$,对整个序列贡献的答案。 那么我们模仿区间$\\texttt{dp}$的形式,枚举一个中间点$\\texttt{mid} \\in (l, r)$则$f[l][r][x][y] = \\min\\{f[l][mid][x][x+y]+f[mid][r][x+y][y]+a[mid] \\times (x+y)\\}$ 由于$n$不大,直接搜索即可。 代码12345678910111213141516171819202122232425#include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)ll n, a[50];ll f(ll l, ll r, ll x, ll y){ ll ans = 1e18; if(r - l <= 1) return 0; for(int i = l + 1; i <= r - 1; i++) ans = std::min((f(l, i, x, x + y) + f(i, r, x + y, y) + a[i] * (x + y)), ans); return ans;}int main(){ scanf(\"%lld\", &n); for(int i = 1; i <= n; i++) scanf(\"%lld\", &a[i]); printf(\"%lld\\n\", f(1, n, 1, 1) + a[1] + a[n]); return 0;}","link":"/2020/03/15/AGC035D/"},{"title":"「算法笔记」 莫队","text":"前言莫队,可是传说中能够解决所有离线区间问题的神奇算法 引子我们先来看这样一道题: 有一个包含了$n$个数的序列$a_i$ 有$m$次询问,每次询问$[l,r]$区间中有多少个不同的数 $n,m \\leq 5*10^4$ 你会怎么做?暴力? 
暴力复杂度是$O(nm)$的,会$T$ for(int i = 1; i <= m; i++){ int ans = 0; memset(cnt, 0, sizeof(cnt)); for(int j = l[i]; j <= r[i]; j++) { ++cnt[a[j]]; if(cnt[a[j]] == 1) ++ans; } printf(\"%d\\n\", ans);}// 暴力-未优化版 我们来观察一下暴力: 暴力每次询问$[l_i,r_i]$时上一次询问的$[l_{i-1},r_{i-1}]$所存储下来的信息就都被抛弃了 如果我们把上一次查询存储下来的信息再利用呢? /* 前面询问没有排序 */ int l = 1, r = 0, sum = 0;for (int i = 1; i <= m; i++){ while (l > q[i].l) l--, cnt[a[l]]++, sum += ((cnt[a[l]] == 1) ? 1 : 0); while (r < q[i].r) r++, cnt[a[r]]++, sum += ((cnt[a[r]] == 1) ? 1 : 0); while (l < q[i].l) cnt[a[l]]--, sum -= ((cnt[a[l]] == 0) ? 1 : 0), l++; while (r > q[i].r) cnt[a[r]]--, sum -= ((cnt[a[r]] == 0) ? 1 : 0), r--; printf(\"%d\\n\", sum);}// 暴力-优化*1 很不幸,这样写还是会$T$.出题人可以构造数据使你相邻两次查询没有相交项,然后这就成了一个比暴力还劣的算法 但是我们已经离真正的莫队很近了 我们可以把询问分块,把询问依照左端点分成$O(\\sqrt n)$块,块内再按右端点排序。 这样子算法的总复杂度就是$O(n \\sqrt n)$的 但是,这个复杂度要怎么证明呢? 莫队的复杂度分析块内转移 左端点 在同一个块里面,由于左端点都在一个长度为$O(\\sqrt n)$的区间里面所以在同一块里面移动一次,左端点最多变化$O(\\sqrt n)$总共有$m$个询问,那么同一个块里面的左端点变化最多是$O(m\\sqrt n)$的 右端点 由于每个块里面的询问都按右端点排序所以右端点在一个块里面最多变化$n$有 $\\sqrt n$个块,那么右端点移动最多就是$O(n\\sqrt n)$ 跨块转移容易发现这样的转移总共会发生$\\sqrt n$次 左端点 单次跨块转移的复杂度为$O(\\sqrt n)$,总复杂度为$O(\\sqrt n * \\sqrt n)=O(n)$ 右端点 由于跨块时,右端点是无序的,所以在最坏情况下右端点单次转移的复杂度为$O(n)$,总复杂度为$O(n * \\sqrt n)=O(n \\sqrt n)$ 综上所述,莫队算法的复杂度是$O(n\\sqrt n)$(由于$m$与$n$在同一个数量级,所以统一一下就成了$O(n\\sqrt n)$了,毕竟这是渐进时间复杂度) 范例代码 #include <bits/stdc++.h>#define ll long longconst int MaxN = 500010;struct query{ int l, r, id, pos; bool operator<(const query &x) const { if (pos == x.pos) return r < x.r; else return pos < x.pos; }};query q[MaxN];int a[MaxN], n, m, k;int cnt[MaxN << 1], ans[MaxN];inline int read(){ int x = 0; char ch = getchar(); while(ch > '9' || ch < '0') ch = getchar(); while(ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}int main(){ n = read(); for (int i = 1; i <= n; i++) a[i] = read(); m = read(); 
int size = (int)pow(n, 0.55); //莫队的块大小不一定要根号n,可以视题目而定 for (int i = 1; i <= m; i++) { q[i].l = read(), q[i].r = read(); q[i].id = i, q[i].pos = (q[i].l - 1) / size + 1; } std::sort(q + 1, q + m + 1); int l = 1, r = 0, sum = 0; for (int i = 1; i <= m; i++) { while (l > q[i].l) l--, cnt[a[l]]++, sum += ((cnt[a[l]] == 1) ? 1 : 0); while (r < q[i].r) r++, cnt[a[r]]++, sum += ((cnt[a[r]] == 1) ? 1 : 0); while (l < q[i].l) cnt[a[l]]--, sum -= ((cnt[a[l]] == 0) ? 1 : 0), l++; while (r > q[i].r) cnt[a[r]]--, sum -= ((cnt[a[r]] == 0) ? 1 : 0), r--; ans[q[i].id] = sum; } for (int i = 1; i <= m; i++) printf(\"%d\\n\", ans[i]); return 0;} 习题 P2709 小B的询问 普通莫队模板题,建议初学者从这道题入手 P1972 [SDOI2009]HH的项链 普通莫队模板题,建议初学者从这道题入手 P3901 数列找不同 可以转化成莫队来做 P1494 [国家集训队]小Z的袜子 普通莫队模板题 P4137 Rmq Problem / mex 普通莫队模板题,修改有一些思维难度 CF375D Tree and Queries 树上问题转区间问题,可以用莫队解决 SP3267 DQUERY - D-query HH的项链数据弱化版 P4396 [AHOI2013]作业 莫队套分块 P4867 Gty的二逼妹子序列 莫队套分块,是上题第二小问的数据加强版 P3709 大爷的字符串题 区间众数模板题,不要求在线 带修莫队留坑待填 树上莫队留坑待填","link":"/2019/02/10/Algorithm-莫队/"},{"title":"「算法笔记」Dijkstra","text":"前言 $SPFA$算法由于它上限 $O(NM) = O(VE)$的时间复杂度,被卡掉的几率很大.在算法竞赛中,我们需要一个更稳定的算法:$dijkstra$. 什么是$dijkstra$? $dijkstra$是一种单源最短路径算法,时间复杂度上限为$O(n^2)$(朴素),在实际应用中较为稳定$;$加上堆优化之后更是具有$O((n+m)\\log_{2}n)$的时间复杂度,在稠密图中有不俗的表现. $dijkstra$的原理/流程? $dijkstra$本质上的思想是贪心,它只适用于不含负权边的图. 我们把点分成两类,一类是已经确定最短路径的点,称为”白点”,另一类是未确定最短路径的点,称为”蓝点” $dijkstra$的流程如下$:$ $1.$ 初始化$dis[start] = 0,$其余节点的$dis$值为无穷大. $2.$ 找一个$dis$值最小的蓝点$x,$把节点$x$变成白点. 
$3.$ 遍历$x$的所有出边$(x,y,z),$若$dis[y] > dis[x] + z,$则令$dis[y] = dis[x] + z$ $4.$ 重复$2,3$两步,直到所有点都成为白点$.$ 时间复杂度为$O(n^2)$ $dijkstra$为什么是正确的 当所有边长都是非负数的时候,全局最小值不可能再被其他节点更新.所以在第$2$步中找出的蓝点$x$必然满足$:dis[x]$已经是起点到$x$的最短路径$.$我们不断选择全局最小值进行标记和拓展,最终可以得到起点到每个节点的最短路径的长度 图解 (令$start = 1$) 开始时我们把$dis[start]$初始化为$0$,其余点初始化为$inf$ 第一轮循环找到$dis$值最小的点$1$,将$1$变成白点,对所有与$1$相连的蓝点的$dis$值进行修改,使得$dis[2]=2,dis[3]=4,dis[4]=7$ 第二轮循环找到$dis$值最小的点$2$,将$2$变成白点,对所有与$2$相连的蓝点的$dis$值进行修改,使得$dis[3]=3,dis[5]=4$ 第三轮循环找到$dis$值最小的点$3$,将$3$变成白点,对所有与$3$相连的蓝点的$dis$值进行修改,使得$dis[4]=4$ 接下来两轮循环分别将$4,5$设为白点,算法结束,求出所有点的最短路径 时间复杂度$O(n^2)$ 为什么$dijkstra$不能处理有负权边的情况? 我们来看下面这张图 $2$到$3$的边权为$-4$,显然从$1$到$3$的最短路径为$-2$ $(1->2->3).$但在循环开始时程序会找到当前$dis$值最小的点$3$,并标记它为白点. 这时的$dis[3]=1,$然而$1$并不是起点到$3$的最短路径.因为$3$已经被标为白点,所以$dis[3]$不会再被修改了.我们在边权存在负数的情况下得到了错误的答案. $dijkstra$的堆优化? 观察$dijkstra$的流程,发现步骤$2$可以优化 怎么优化呢? 我会zkw线段树!我会斐波那契堆! 我会堆! 我们可以用堆对$dis$数组进行维护,用$O(\\log_{2}n)$的时间取出堆顶元素并删除,每条边松弛成功时用$O(\\log_{2}n)$的时间把新的$dis$值入堆,总复杂度$O((n+m)\\log_{2}n)$ 范例代码: #include<bits/stdc++.h>const int MaxN = 100010, MaxM = 500010;struct edge{ int to, dis, next;};edge e[MaxM];int head[MaxN], dis[MaxN], cnt;bool vis[MaxN];int n, m, s;inline void add_edge( int u, int v, int d ){ cnt++; e[cnt].dis = d; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}struct node{ int dis; int pos; bool operator <( const node &x )const { return x.dis < dis; }};std::priority_queue<node> q;inline void dijkstra(){ dis[s] = 0; q.push( ( node ){0, s} ); while( !q.empty() ) { node tmp = q.top(); q.pop(); int x = tmp.pos, d = tmp.dis; if( vis[x] ) continue; vis[x] = 1; for( int i = head[x]; i; i = e[i].next ) { int y = e[i].to; if( dis[y] > dis[x] + e[i].dis ) { dis[y] = dis[x] + e[i].dis; if( !vis[y] ) { q.push( ( node ){dis[y], y} ); } } } }}int main(){ scanf( \"%d%d%d\", &n, &m, &s ); for(int i = 1; i <= n; ++i)dis[i] = 0x7fffffff; for( register int i 
= 0; i < m; ++i ) { register int u, v, d; scanf( \"%d%d%d\", &u, &v, &d ); add_edge( u, v, d ); } dijkstra(); for( int i = 1; i <= n; i++ ) printf( \"%d \", dis[i] ); return 0;} 例题 入门模板:P3371 进阶模板:P4779 其余例题请右转洛谷 题库,搜索”最短路” 后记 本文部分内容摘自李煜东《算法竞赛进阶指南》和《信息学竞赛一本通》 友情提示:正权图请使用$dijkstra$算法,负权图请使用$SPFA$算法 感谢洛谷 各位管理员提供的平台","link":"/2019/02/06/Algorithm-Dijkstra/"},{"title":"CF1063B 【Labyrinth】","text":"一道锻炼代码能力的好题 只要bfs一下,向四个方向搜索,剪下枝,就A了(好像还跑的蛮快?) Code: 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667#include <bits/stdc++.h>#define check(x, y) (x >= 0 && x < n && y >= 0 && y < m)//判断是否越界const int MaxN = 2010;const int dx[] = {0, 1, -1, 0}, dy[] = {-1, 0, 0, 1};//bfs方向数组struct p{ int x, y; int cntx, cnty;};int ans;int n, m, x, y, limx, limy;std::string s[MaxN];int vis[MaxN][MaxN];int disx[MaxN][MaxN], disy[MaxN][MaxN];void bfs(int x, int y){ memset(disx, 0x3f, sizeof(disx)); memset(disy, 0x3f, sizeof(disy) ); std::queue<p> q; q.push((p){x, y, 0, 0}); disx[x][y] = disy[x][y] = 0; while (!q.empty()) { p tmp = q.front(); q.pop(); x = tmp.x, y = tmp.y; for (int i = 0; i <= 3; i++) { int nx = x + dx[i], ny = y + dy[i]; if (!check(nx, ny) || s[nx][ny] == '*')//当前位置是否合法 continue; int cntx = tmp.cntx + bool(dy[i] == -1), cnty = tmp.cnty + bool(dy[i] == 1);//计算向左/右走步数 if (cntx < std::min(disx[nx][ny], limx + 1) || cnty < std::min(disy[nx][ny], limy + 1))//判断,剪枝 { disx[nx][ny] = cntx; disy[nx][ny] = cnty;//更新向左/右走步数 q.push((p){nx, ny, cntx, cnty}); } } }}int main(){ scanf(\"%d%d\", &n, &m); scanf(\"%d%d\", &x, &y), --x, --y; scanf(\"%d%d\", &limx, &limy); for (int i = 0; i < n; i++) std::cin >> s[i]; bfs(x, y); for (int i = 0; i < n; i++) for (int j = 0; j < m; j++) if (disx[i][j] <= limx && disy[i][j] <= limy) ++ans;//统计答案 printf(\"%d\\n\", ans); return 0;}","link":"/2019/02/06/CF1063B/"},{"title":"CF1285D Dr. 
Evil Underscores","text":"题目大意你有一个数组${a_n}$,求一个数$x$ ,满足$\\max{a_i \\oplus x}$最小,输出这个最小值 题目分析首先看到xor操作我们可以想到01trie, 接下来我们来分析如何用01trie计算答案 容易发现这样一个结论:如果一个节点的两个孩子都存在,那么这一位的异或值只能是$1$ 设当前节点为$\\texttt{now}$, 当前节点处在第$\\texttt{dep}$位, 我们可以考虑这样一个过程 1.递归计算当前节点的孩子$\\texttt{ch[now][0], ch[now][1]}$的答案$\\texttt{ans[0], ans[1]}$,并求得$ans= \\min \\{ans[0], ans[1] \\}$ 2.若当前节点的两个孩子都存在,返回$ans+2^{dep}$,否则返回$ans$ 代码123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354#include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 5e6 + 10;const int lim = (1 << 29);int n, cnt = 1, a[MaxN], num[MaxN], ch[MaxN][2];inline void insert(int x){ int now = 1; for (int i = lim; i; i >>= 1) { if (ch[now][!!(x & i)]) now = ch[now][!!(x & i)]; else ++num[now], now = ch[now][!!(x & i)] = ++cnt; }}int query(int x, int k){ if (!num[x]) return 0; int ans = 1e9; if (ch[x][0]) ans = std::min(ans, query(ch[x][0], k >> 1)); if (ch[x][1]) ans = std::min(ans, query(ch[x][1], k >> 1)); return ((num[x] == 1) ? ans : (ans + k));}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}int main(){ n = read(); for (int i = 1; i <= n; i++) a[i] = read(), insert(a[i]); printf(\"%d\\n\", query(1, lim)); return 0;}","link":"/2020/01/13/CF1285D/"},{"title":"CF1141E Superhero Battle","text":"题目大意有一个有着$h$点血量的boss,你的每一个回合有$n$种攻击 第$i$种攻击可以对boss造成$-d[i]$的伤害($h=h+d[i]$) 求最早在什么时候能击败boss(即boss血量$\\leq0$) 题解首先我们发现,回合数越少越好(废话) 怎么让回合数最小呢? 我们发现,让boss剩下最多(但是在一个回合内能够击杀)的血量时回合数最少 然后就没有然后了。。。 把boss的血量压到一个回合能击杀的范围内然后枚举即可 具体见代码(注意开long long!) Code123456789101112131415161718192021222324252627#include <bits/stdc++.h>#define R register#define ll long long#define cmin(a, b) ((a < b) ? a : b)#define cmax(a, b) ((a < b) ? 
b : a)const int MaxN = 2e5 + 10;typedef std::pair<int, int> pa;ll h, n;ll d[MaxN], sum[MaxN];int main(){ scanf(\"%lld%lld\", &h, &n); for (int i = 1; i <= n; i++) scanf(\"%lld\", &d[i]), sum[i] = sum[i - 1] + d[i]; for (int i = 1; i <= n; i++) if (h + sum[i] <= 0) return 0 * printf(\"%d\\n\", i); if (sum[n] >= 0) return 0 * printf(\"-1\"); ll min = *std::min_element(sum + 1, sum + n + 1); ll cnt = (h + min - 1) / (-sum[n]) + 1; h += cnt * sum[n]; for (int i = 1; i <= n; i++) if (h + sum[i] <= 0) return 0 * printf(\"%lld\\n\", i + cnt * n); return 0;}","link":"/2019/03/21/CF1141E/"},{"title":"CF1285E Delete a Segment","text":"题目大意你有$n$个区间$[l_i, r_i]$, 你要恰好删掉一个区间,使得剩下的$n-1$个区间的并的总和最多 eg. [1,2], [3,5], [3,7]的并是[1,2], [3,7] 题目分析首先我们将给定的区间进行离散化 由于区间端点相邻的区间并不算相交,所以离散化时要进行特殊处理$(x=x*2-1)$ 然后本题就变成了这样一个问题: 1.对于所有$[l_i, r_i]$, 把对应区间的数值$a_i$全部加上1,并统计此时所有区间并的个数$num$ 2.对于每个$[l_i, r_i]$, 求出该区间内满足$a_i=1$的连续段个数,并统计最大值$ans$ 那么,$ans+num$即为答案 对于步骤$1$,可以用差分求出;对于步骤$2$, 可以用前缀和求出(注意特判开头和结尾相等的情况) 代码12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970#include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 1e6 + 10;std::map<int, int> m1, m2;int n, cnt, l[MaxN], r[MaxN], a[MaxN], s[MaxN];inline void prework(){ m1.clear(), m2.clear(); for (int i = 1; i <= n; i++) m1[l[i]] = m1[r[i]] = 1; cnt = 0; for (std::map<int, int>::iterator it = m1.begin(); it != m1.end(); it++) m2[it->first] = ++cnt; for (int i = 1; i <= n; i++) l[i] = m2[l[i]] * 2 - 1, r[i] = m2[r[i]] * 2 - 1;}inline int read(){ int x = 0, f = 1; char ch = getchar(); while (ch > '9' || ch < '0') { if (ch == '-') f = 0; ch = getchar(); } while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? 
x : (-x);}int main(){ int T = read(); while (T--) { n = read(), cnt = 0; for (int i = 1; i <= n; i++) l[i] = read(), r[i] = read(); prework(), cnt = cnt * 2 - 1; for (int i = 1; i <= n; i++) a[l[i]]++, a[r[i] + 1]--; for (int i = 1; i <= cnt; i++) a[i] += a[i - 1]; int num = 0, ans = -1000000; for (int i = 1; i <= cnt; i++) { s[i] = s[i - 1]; num += (a[i] && !a[i - 1]); if (a[i] > 1 && a[i - 1] <= 1) ++s[i]; } for (int i = 1; i <= n; i++) { int cur = s[r[i]] - s[l[i] - 1] + ((a[l[i]] > 1) && (a[l[i] - 1] > 1)) - 1; ans = std::max(ans, cur); } printf(\"%d\\n\", num + ans); for (int i = 0; i <= cnt * 2; i++) a[i] = s[i] = 0; } return 0;}","link":"/2020/01/13/CF1285E/"},{"title":"CF1299C Water Balance","text":"简要题意你有一个序列$\\{a\\}$,你的每次操作可以把一段区间里的数全部变成这个区间的平均数,求能得到的字典序最小的序列 分析我们发现字典序最小时显然$\\{a_i\\}$单调不降 那么我们可以维护一个单调栈,栈里维护的值$\\{l,r,avr\\}$,表示该段区间的左端点、右端点、平均值 每次插入$\\{i,i,a_i\\}$,然后暴力循环取出栈顶的两个节点$x, y$($y$在$x$后插入),如果满足$y.avr<x.avr$则合并$x,y$,直到不能合并为止(具体细节可以看代码) 最后把栈中记录的值输出即可 代码123456789101112131415161718192021222324252627282930313233343536373839404142#include <bits/stdc++.h> #define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod) const int MaxN = 1e6 + 10; struct node{ int l, r; double sum, len;}; int n;double a[MaxN];std::vector<node> vec; int main(){ int tmp; scanf(\"%d\", &n); for (int i = 1; i <= n; i++) scanf(\"%d\", &tmp), a[i] = tmp; for (int i = 1; i <= n; i++) { vec.push_back((node){i, i, a[i], 1}); while (vec.size() >= 2 && (vec[vec.size() - 2].sum / vec[vec.size() - 2].len) >= (vec[vec.size() - 1].sum / vec[vec.size() - 1].len)) { node x = (node){vec[vec.size() - 2].l, vec[vec.size() - 1].r, vec[vec.size() - 2].sum + vec[vec.size() - 1].sum, vec[vec.size() - 1].r - vec[vec.size() - 2].l + 1.0}; vec.pop_back(), vec.pop_back(), vec.push_back(x); } } for (int i = 0; i < vec.size(); i++) { for (int j = vec[i].l; j <= vec[i].r; j++) a[j] = (vec[i].sum / vec[i].len); } for (int i = 1; i <= n; i++) printf(\"%.10lf\\n\", a[i]); return 
0;}","link":"/2020/02/11/CF1299C/"},{"title":"CF1182E Product Oriented Recurrence","text":"题目大意给你$n,f_1,f_2,f_3,c$,让你求$f_n=c^{2n-6} \\times f_{n-1} \\times f_{n-2} \\times f_{n-3}$ 解析首先我们可以发现: $f_n \\times c^n = c^{3n-6} \\times f_{n-1} \\times f_{n-2} \\times f_{n-3}= c^{n-1} \\times f_{n-1} \\times c^{n-2} \\times f_{n-2} \\times c^{n-3} \\times f_{n-3}$ 设 $g_n=f_n*c^n$,则有$g_n=g_{n-1} \\times g_{n-2} \\times g_{n-3}$ 把它展开,发现 g_n=g_{n-1} \\times g_{n-2} \\times g_{n-3}=g_{n-2}^2 \\times g_{n-3}^2 \\times g_{n-4} = g_{n-3}^4 \\times g_{n-4}^3 \\times g_{n-5}^2 = \\cdots = g_3^{h_{2n-5}} \\times g_2^{h_{2n-6}} \\times g_1^{h_{2n-7}}于是我们的工作就变成了求$h_n$ 观察oeis一下这个式子我们可以发现: h_{2n}=h_{2n-1}+h_{2n-3},h_{2n+1}=h_{2n-1}+h_{2n-2},我们设$a_n=h_{2n+1},b_n=h_{2n}$,则可以得出如下矩阵 我们又注意到$mod=10^9+7$,于是我们可以把次数模上$\\phi(mod)=10^9+6$(欧拉定理) 于是我们就可以快乐的用矩快计算次数啦 注意最后的$f_n=\\frac{g_n}{c^n}$哦 1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192#include <bits/stdc++.h>#define R register#define ll unsigned long long#define sum(a, b, mod) ((a + b) % mod)const ll mod = 1e9 + 7, mode = 1e9 + 6;ll fast_mul(ll a, ll b, ll mod){ ll ans = 0; while (b) { if (b & 1) ans = (ans + a) % mod; a = (a + a) % mod; b >>= 1; } return ans;}ll fast_pow(ll a, ll b){ ll ret = 1; while (b) { if (b & 1) ret = fast_mul(ret, a, mod) % mod; a = fast_mul(a, a, mod) % mod; b >>= 1; } return ret;}struct matrix{ ll n, m, a[10][10]; matrix(ll x, ll y) { n = x, m = y; memset(a, 0, sizeof(a)); }};matrix mul(matrix a, matrix b){ ll n = a.n, m = a.m, k = b.m; matrix c(n, k); for (ll i = 1; i <= n; i++) for (ll l = 1; l <= k; l++) for (ll j = 1; j <= m; j++) c.a[i][j] = (c.a[i][j] + fast_mul(a.a[i][l], b.a[l][j], mode)) % mode; return c;}matrix pow_(matrix a, ll b){ matrix ret = a; while (b) { if (b & 1ll) ret = mul(ret, a); a = mul(a, a); b >>= 1ll; } return ret;}inline matrix init(){ matrix a(3, 3); a.a[1][1] = 
a.a[1][3] = a.a[2][1] = a.a[3][1] = a.a[3][2] = 1; return a;}inline matrix init_(){ matrix a(3, 1); a.a[1][1] = a.a[2][1] = a.a[3][1] = 1; return a;}int main(){ ll f[4], n, c; std::cin >> n >> f[1] >> f[2] >> f[3] >> c; matrix a = init(), b = init_(), x = pow_(a, n - 4), y = pow_(a, n - 4), z = pow_(a, n - 5); x = mul(x, b), y = mul(y, b), z = mul(z, b); ll A = x.a[1][1], B = y.a[3][1], C = z.a[1][1], X = fast_mul(f[3], fast_pow(c, 3), mod), Y = fast_mul(f[2], fast_pow(c, 2), mod), Z = fast_mul(f[1], c, mod); ll ans = fast_mul(fast_mul(fast_mul(fast_pow(X, A), fast_pow(Y, B), mod), fast_pow(Z, C), mod), fast_pow(fast_pow(c, n), mod - 2llu), mod); std::cout << ans; return 0;}","link":"/2019/10/18/CF1182E/"},{"title":"CF1316D Nash Matrix","text":"题目大意有一个$n \\times n$大小的棋盘,棋盘的每个格子上有一个字母(是U,L,R,D,X中之一),其中U表示向上走,D表示向下走,L表示向左走,R表示向右走,X表示走到这个格子就停止。 现在给你$n ^ 2$个坐标$(x_{i,j}, y_{i, j})$表示从$(i, j)$出发能走到的位置(如果无限循环则为$-1$),你需要构造出这个棋盘,或者输出INVALID,$n \\leq 10^3$ 分析容易发现一个性质:所有终点相同的点形成独立的联通块 证明很显然,如果$A$的终点是$(x_1, y_1)$,$B$的终点是$(x_2,y_2)$($(x_1,y_1) \\not= (x_2, y_2)$),且$A$有边连到$B$的话,那么$A$的终点就不可能是$(x_1, y_1)$,而会是$(x_2,y_2)$与题设矛盾。 那么问题就好解决了 我们首先忽略掉死循环的情况,对于一个(非死循环)联通块,我们可以从这个联通块的终点(即$(i,j)=(x_{i,j},y_{i,j})$的点)向外开始$\\texttt{DFS}$,遍历所有与他相邻的点并记录答案。 而对于死循环的情况,显然如果单独的一个点死循环的话肯定是不可能的,这时候输出INVALID即可 否则我们枚举两个相邻的点作为起点,把这两个点连成双元环,然后分别从这两个点开始$\\texttt{DFS}$,遍历所有在不经过其中一个点的情况下能走到的所有点并记录答案(详情参见代码) 最后,如果有某一个点没有被遍历到的话则输出INVALID,否则就输出VALID并输出前文处理出的答案 代码123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121#include <bits/stdc++.h> #define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)#define check(x, y) ((x > 0) && (x <= n) && (y > 0) && (y <= n)) const int MaxN = 1e3 + 10;const char op[] = {'U', 'L', 'D', 'R', 'X'};const int dx[] = {1, 0, -1, 0}, dy[] = {0, 1, 0, -1}; int n, 
vis[MaxN][MaxN];char ans[MaxN][MaxN];std::vector<std::pair<int, int>> v;std::pair<int, int> a[MaxN][MaxN]; int nxt(int x, int y, int ex, int ey){ for (int i = 0; i <= 3; i++) if (x + dx[i] == ex && y + dy[i] == ey) return i; return -1;} inline int read(){ int x = 0, f = 1; char ch = getchar(); while (ch > '9' || ch < '0') { if (ch == '-') f = 0; ch = getchar(); } while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? x : (-x);} void dfs(int x, int y, int Dx) // 正常联通块求解{ if (ans[x][y]) return; ans[x][y] = op[Dx]; for (int i = 0; i <= 3; i++) { int ex = x + dx[i], ey = y + dy[i]; if (check(ex, ey) && a[ex][ey] == a[x][y]) dfs(ex, ey, i); }} void get(int x, int y) // 求死循环联通块大小{ vis[x][y] = 1, v.push_back(std::make_pair(x, y)); for (int i = 0; i <= 3; i++) { int ex = x + dx[i], ey = y + dy[i]; if (check(ex, ey) && a[ex][ey].first == -1 && a[ex][ey].second == -1 && !vis[ex][ey]) get(ex, ey); }} void Dfs(int x, int y, int banx, int bany, int Dx) // 死循环联通块遍历{ if (ans[x][y]) return; ans[x][y] = op[Dx]; for (int i = 0; i <= 3; i++) { int ex = x + dx[i], ey = y + dy[i]; if (check(ex, ey) && (a[ex][ey].first == -1 && a[ex][ey].second == -1) && (ex != banx || ey != bany)) Dfs(ex, ey, banx, bany, i); }} int main(){ n = read(); for (int i = 1; i <= n; i++) { for (int j = 1; j <= n; j++) a[i][j].first = read(), a[i][j].second = read(); } for (int i = 1; i <= n; i++) for (int j = 1; j <= n; j++) if (a[i][j].first == i && a[i][j].second == j) dfs(i, j, 4); for (int i = 1; i <= n; i++) { for (int j = 1; j <= n; j++) { if (a[i][j].first == -1 && a[i][j].second == -1 && !vis[i][j]) { v.clear(), get(i, j); if (v.size() == 1) return 0 * printf(\"INVALID\"); for (int k = 1; k < v.size(); k++) { int x = nxt(v[k - 1].first, v[k - 1].second, v[k].first, v[k].second); if (~x) { Dfs(v[k].first, v[k].second, v[k - 1].first, v[k - 1].second, x); x = nxt(v[k].first, v[k].second, v[k - 1].first, v[k - 1].second); Dfs(v[k - 1].first, v[k - 1].second, 
v[k].first, v[k].second, x); break; } } } } } for (int i = 1; i <= n; i++) for (int j = 1; j <= n; j++) if (!ans[i][j]) return 0 * printf(\"INVALID\"); puts(\"VALID\"); for (int i = 1; i <= n; i++) { for (int j = 1; j <= n; j++) putchar(ans[i][j]); puts(\"\"); } return 0; }","link":"/2020/03/07/CF1316D/"},{"title":"CF1336B Xenia and Colorful Gems","text":"题目大意你有$3$个数组,分别是$\\texttt{r, g, b}$,长度分别是$n_r, n_b, n_g$ 你需要在这三个数组中选择一个数,设你选择的三个数为$x, y, z$,则你要使$(x-y)^2+(y-z)^2+(z-x)^2$最小 多组数据,$1 \\leq n_r, n_b, n_g \\leq 10^5$,值域$1 \\leq r_i, b_i, g_i \\leq 10^9$ 分析首先我们考虑枚举其中一个数,假设我们枚举的是$r$数组。 对于一个数$x$,在某一个数组(假设是$z$)满足$(x-y)^2$最小的$y$一定是$x$在$z$内的前驱或后继。 于是对于每一个$r_i$,我们找到$r_i$在$g$中的前驱和后继,记为$gr_0, gr_1$,再找到$gr_0, gr_1$在$b$里的前驱后继,记为$bl_0, bl_1, bl_2, bl_3$,那么答案就在$f(r_i, gr_0, bl_0), f(r_i, gr_0, bl_1), \\cdots, f(r_i, gr_1, bl_3)$中。(其中$f(x, y, z)$表示$1 \\leq n_r, n_b, n_g \\leq 10^5$) 但是这种可能会有缺漏, 我们可能漏掉了某些情况,例如下面这个例子 1234512 2 21 23 46 7 。我们会选择$2, 3, 6$而正确答案应该是$2, 4, 6$。 于是,我们考虑对$r, g, b$都做一遍上面这个过程,就能得到正确的答案了。 时间复杂度$\\mathcal{O}(n \\log n)$ 代码12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394#include <bits/stdc++.h>#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int Max = 2e9;const int Min = -1e9;const int MaxN = 2e5 + 10;ll n, m, k, ans;std::set<ll> R, G, B;ll r[MaxN], g[MaxN], b[MaxN];ll sqr(ll x) { return x * x; }ll sub(ll x, std::set<ll> &y){ if (x == 2000000000) return x; return *y.upper_bound(x);}ll pre(ll x, std::set<ll> &y){ ll res = *y.lower_bound(x); if (x == -1000000000) return x; if (res > x) res = *(--y.lower_bound(x)); return res;}void init(){ ans = 0x7f7f7f7f7f7f7f7fll; R.clear(), R.insert(-1e9), R.insert(2e9); G.clear(), G.insert(-1e9), G.insert(2e9); B.clear(), B.insert(-1e9), B.insert(2e9);}int main(){ int T; scanf(\"%d\", &T); while (T--) { init(); scanf(\"%lld%lld%lld\", &n, &m, &k); for (int i = 1; i <= n; i++) 
scanf(\"%lld\", &r[i]), R.insert(r[i]); for (int i = 1; i <= m; i++) scanf(\"%lld\", &g[i]), G.insert(g[i]); for (int i = 1; i <= k; i++) scanf(\"%lld\", &b[i]), B.insert(b[i]); for (int i = 1; i <= n; i++) { ll gr[2] = {}, bl[4] = {}; gr[0] = pre(r[i], G), gr[1] = sub(r[i], G); bl[0] = pre(gr[0], B), bl[1] = sub(gr[0], B); bl[2] = pre(gr[1], B), bl[3] = sub(gr[1], B); for (int j = 0; j < 2; j++) { for (int l = 0; l < 4; l++) if (gr[j] != Max && gr[j] != Min && bl[l] != Max && bl[l] != Min) ans = std::min(ans, sqr(r[i] - gr[j]) + sqr(gr[j] - bl[l]) + sqr(bl[l] - r[i])); } } for (int i = 1; i <= m; i++) { ll gr[2] = {}, bl[4] = {}; gr[0] = pre(g[i], R), gr[1] = sub(g[i], R); bl[0] = pre(gr[0], B), bl[1] = sub(gr[0], B); bl[2] = pre(gr[1], B), bl[3] = sub(gr[1], B); for (int j = 0; j < 2; j++) { for (int l = 0; l < 4; l++) if (gr[j] != Max && gr[j] != Min && bl[l] != Max && bl[l] != Min) ans = std::min(ans, sqr(g[i] - gr[j]) + sqr(gr[j] - bl[l]) + sqr(bl[l] - g[i])); } } for (int i = 1; i <= k; i++) { ll gr[2] = {}, bl[4] = {}; gr[0] = pre(b[i], R), gr[1] = sub(b[i], R); bl[0] = pre(gr[0], G), bl[1] = sub(gr[0], G); bl[2] = pre(gr[1], G), bl[3] = sub(gr[1], G); for (int j = 0; j < 2; j++) { for (int l = 0; l < 4; l++) if (gr[j] != Max && gr[j] != Min && bl[l] != Max && bl[l] != Min) ans = std::min(ans, sqr(b[i] - gr[j]) + sqr(gr[j] - bl[l]) + sqr(bl[l] - b[i])); } } printf(\"%lld\\n\", ans); } return 0;}","link":"/2020/04/16/CF1336B/"},{"title":"CF1419F Rain of Fire","text":"显然这题答案具有单调性,现在我们考虑给定一个$T$怎么check 首先我们可以把所有不通过新点就可以互达的点合并成一个联通块。 容易发现当联通块个数$=1$时肯定有解,$>4$时无解,剩下情况我们进行分类讨论 1. 有两个联通块如果我们能找到两个点位于不同的联通块,并且他们在一条直线上且曼哈顿距离<=2T 或 两点的x,y坐标绝对值之差均小于T则有解 2. 有三个联通块首先我们把所有在一条直线上相邻并且不属于一个联通块的点对(a, b)处理出来,对于每个点对可以找到点c与这两个点所属联通块都不同,若c到直线距离,a、b到c在线段上的投影距离均小于T,则合法 3. 
有四个联通块将找点c换成找点对(c,d)满足线段(a,b),(c,d)垂直,且a,b,c,d分属四个不同的联通块,接下来做法与3个联通块类似 时间复杂度O($n^2\\log_2{2\\times 10^9}$) 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192#include <bits/stdc++.h> #define R register#define ll long long#define pir std::pair<ll, ll>#define mp(i, j) std::make_pair(i, j)#define sum(a, b, mod) (((a) + (b)) % mod)#define It std::map<ll, std::vector<ll>>::iterator const ll MaxN = 1e3 + 10; ll n, x[MaxN], y[MaxN], f[MaxN];std::unordered_map<ll, ll> trash;std::map<ll, std::vector<ll>> lx, ly;std::vector<std::pair<ll, ll>> X, Y, line; ll Abs(ll x) { return (x < 0) ? (-x) : x; }ll cmpx(ll a, ll b) { return x[a] < x[b]; }ll cmpy(ll a, ll b) { return y[a] < y[b]; } ll getf(ll x){ if (x != f[x]) f[x] = getf(f[x]); return f[x];} void merge(ll x, ll y){ ll fx = getf(x), fy = getf(y); if (fx != fy) f[fx] = fy;} inline ll read(){ ll x = 0, f = 1; char ch = getchar(); while (ch > '9' || ch < '0') { if (ch == '-') f = 0; ch = getchar(); } while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? 
x : (-x);} ll check(ll mid){ trash.clear(); for (ll i = 1; i <= n; i++) f[i] = i; for (It it = lx.begin(); it != lx.end(); it++) { std::vector<ll> &v = it->second; for (ll i = 1; i < v.size(); i++) if (Abs(y[v[i]] - y[v[i - 1]]) <= mid) merge(v[i], v[i - 1]); } for (It it = ly.begin(); it != ly.end(); it++) { std::vector<ll> &v = it->second; for (ll i = 1; i < v.size(); i++) if (Abs(x[v[i]] - x[v[i - 1]]) <= mid) merge(v[i], v[i - 1]); } ll siz = 0; for (ll i = 1; i <= n; i++) // getf(i), printf(\"%lld \", f[i]), siz += (trash[getf(i)] == 0), trash[getf(i)] = 1; // printf(\"# %lld %lld\\n\", mid, siz); if (siz == 1) return 1; else if (siz == 2) { for (ll i = 1; i <= n; i++) for (ll j = i + 1; j <= n; j++) if (f[i] != f[j]) { if (Abs(x[i] - x[j]) == 0 && Abs(y[i] - y[j]) <= 2 * mid) return 1; if (Abs(y[i] - y[j]) == 0 && Abs(x[i] - x[j]) <= 2 * mid) return 1; if (Abs(x[i] - x[j]) <= mid && Abs(y[i] - y[j]) <= mid) return 1; } } else if (siz == 3) { line.clear(); for (It it = lx.begin(); it != lx.end(); it++) { std::vector<ll> &v = it->second; for (ll i = 1; i < v.size(); i++) if (f[v[i]] != f[v[i - 1]]) line.push_back(mp(v[i], v[i - 1])); } for (It it = ly.begin(); it != ly.end(); it++) { std::vector<ll> &v = it->second; for (ll i = 1; i < v.size(); i++) if (f[v[i]] != f[v[i - 1]]) line.push_back(mp(v[i], v[i - 1])); } for (auto li : line) { ll l = li.first, r = li.second; for (ll i = 1; i <= n; i++) { if (f[l] != f[i] && f[i] != f[r]) { if (x[l] == x[r]) { if (Abs(y[l] - y[i]) <= mid && Abs(y[r] - y[i]) <= mid) if (Abs(x[l] - x[i]) <= mid) return 1; } else if (Abs(x[l] - x[i]) <= mid && Abs(x[r] - x[i]) <= mid) if (Abs(y[l] - y[i]) <= mid) return 1; } } } } else if (siz == 4) { X.clear(), Y.clear(); for (It it = lx.begin(); it != lx.end(); it++) { std::vector<ll> &v = it->second; for (ll i = 1; i < v.size(); i++) if (f[v[i]] != f[v[i - 1]]) X.push_back(mp(v[i], v[i - 1])); } for (It it = ly.begin(); it != ly.end(); it++) { std::vector<ll> &v = it->second; for (ll i 
= 1; i < v.size(); i++) if (f[v[i]] != f[v[i - 1]]) Y.push_back(mp(v[i], v[i - 1])); } for (auto x1 : X) { for (auto y1 : Y) { ll a = x1.first, b = x1.second; ll c = y1.first, d = y1.second; if (f[a] != f[c] && f[a] != f[d]) if (f[b] != f[c] && f[b] != f[d]) if (Abs(x[a] - x[c]) <= mid && Abs(x[a] - x[d]) <= mid) if (Abs(y[c] - y[a]) <= mid && Abs(y[c] - y[b]) <= mid) return 1; } } } return 0;} signed main(){ n = read(); for (ll i = 1; i <= n; i++) { x[i] = read(), y[i] = read(); lx[x[i]].push_back(i); ly[y[i]].push_back(i); } for (It it = lx.begin(); it != lx.end(); it++) { std::vector<ll> &v = it->second; std::sort(v.begin(), v.end(), cmpy); } for (It it = ly.begin(); it != ly.end(); it++) { std::vector<ll> &v = it->second; std::sort(v.begin(), v.end(), cmpx); } ll l = 1, r = 0x7f7f7f7f; while (l < r) { ll mid = (l + r) >> 1; if (check(mid)) r = mid; else l = mid + 1; } if (l == 0x7f7f7f7f) l = -1; printf(\"%lld\\n\", l); return 0;}","link":"/2021/01/18/CF1419F/"},{"title":"CF1454F Array Partition","text":"Problem statement: Given a sequence $a$ of length $n$, split it into three non-empty consecutive segments of lengths $x, y, z$ such that $$\\max_{i=1}^x a_i = \\min_{i=x+1}^{x+y}a_i = \\max_{i=x+y+1}^n a_i$$ If a valid split exists, output $\\texttt{YES}$ and any such $x, y, z$; otherwise output $\\texttt{NO}$. $3 \\leq n \\leq 2 \\times 10^5, 1 \\leq a_i \\leq 10^9$ Analysis: Many existing solutions enumerate $x$; here is a different approach. For every $i \\in [2,n-1]$, we test whether $a_i$ can serve as the middle minimum $\\min_{j=x+1}^{x+y}a_j$. First, for each $i$ we precompute the nearest smaller element on its left and on its right, which a monotonic stack does in $\\Theta(n)$ time. Next we precompute the prefix maxima and suffix maxima; both arrays are monotone, so binary search gives the shortest and longest prefix (and suffix) whose maximum equals $a_i$. With this information we can quickly decide whether a position is feasible (see the code for details). Since any one answer suffices, once a feasible $i$ is found we simply enumerate $x, y$. The total time complexity is $\\Theta(n \\log n)$. Code: #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)#define meow(cat...) 
fprintf(stderr, cat)const int MaxN = 4e5 + 10;std::vector<int> v;std::unordered_map<int, int> cnt;int premax[MaxN], sufmax[MaxN];int n, flag, a[MaxN], l[MaxN], r[MaxN];void init(){ cnt.clear(); for(int i = 0; i <= n + 9; i++) l[i] = r[i] = premax[i] = sufmax[i] = 0;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}signed main(){ int T = read(); while (T--) { n = read(), flag = 0, init(); for (int i = 1; i <= n; i++) a[i] = read(), cnt[a[i]] += 1, premax[i] = std::max(premax[i - 1], a[i]); v.clear(), v.push_back(0); for(int i = 1; i <= n; i++) { while(v.size() && a[v.back()] >= a[i]) v.pop_back(); l[i] = v.back(), v.push_back(i); } v.clear(), v.push_back(n + 1); for(int i = n; i; i--) { while(v.size() && a[v.back()] >= a[i]) v.pop_back(); r[i] = v.back(), v.push_back(i); } for(int i = n; i; i--) sufmax[i] = std::max(sufmax[i + 1], a[i]); std::reverse(sufmax + 1, sufmax + n + 1); for(int i = 2; i < n; i++) { int l1 = std::lower_bound(premax + 1, premax + n + 1, a[i]) - premax; int r1 = std::upper_bound(premax + 1, premax + n + 1, a[i]) - premax - 1; int r2 = n + 1 - (std::lower_bound(sufmax + 1, sufmax + n + 1, a[i]) - sufmax); int l2 = n + 1 - (std::upper_bound(sufmax + 1, sufmax + n + 1, a[i]) - sufmax) + 1; // meow(\"$ %d %d %d %d %d %d %d %d\\n\", i, a[i], l1, r1, l2, r2, l[i], r[i]); if(l[i] > r1 || l1 > r[i] || l[i] > r2 || l2 > r[i] || l1 >= i || r2 <= i || l1 > r1 || l2 > r2 || cnt[a[i]] < 3) { /*meow(\"-1\\n\");*/ continue; } // if(premax[l1] != a[i] || premax[r1] != a[i] || // sufmax[l2] != a[i] || sufmax[r2] != a[i]) { meow(\"-2\\n\"); continue; } int a = 0, b = 0; for(int j = i - 1; j >= l[i]; j--) if(l1 <= j && j <= r1) if(l[i] <= j + 1 && j + 1 <= r[i]) { a = j; break; } for(int j = i - a; a + j <= r[i]; j++) if(l2 <= j + a + 1 && j + a + 1 <= r2) if(l[i] <= j + a && j + a <= r[i]) { b = j; break; } // if((!(a < i 
&& i <= a + b)) || a == 0 || b == 0) { /*meow(\"%d %d -3\\n\", a, b);*/ continue; } printf(\"YES\\n%d %d %d\\n\", a, b, n - a - b), flag = 1; break; } if(!flag) puts(\"NO\"); } return 0;}","link":"/2021/09/04/CF1454F/"},{"title":"CF1468M Similar Sets","text":"Problem statement: You are given $n$ sequences. The elements within one sequence are pairwise distinct, but different sequences may share elements. Two sequences $A, B$ are called similar if there exist two distinct integers $x, y$ with $x, y \\in A$ and $x, y \\in B$. Find any pair of similar sequences, or report that none exists. $1 \\leq n, \\sum k_i \\leq 10^5$, where $k_i$ is the number of elements of the $i$-th sequence. Analysis: A direct approach is hard, so we use square-root decomposition. Let $m = \\sum k_i$; since $m$ and $n$ are of the same order of magnitude, the complexity analysis below writes $n$ in place of $m$. First consider the sequences of size greater than $\\sqrt n$ and enumerate each of them. Store all elements of the current sequence in a bucket array, then enumerate every other sequence and check whether it shares at least two elements with the current one. With coordinate compression done up front, each such pass costs $\\Theta(n)$ time and space; since there are at most $\\sqrt n$ such sequences, this part takes $\\Theta(n \\sqrt n)$ time. Then consider the sequences of size at most $\\sqrt n$. Enumerate all element pairs $(x, y)$ of these sequences and store them. There are $\\sum_{k_i \\leq \\sqrt n} \\frac{k_i (k_i - 1)}{2}$ such pairs; because every such $k_i \\leq \\sqrt n$ and $\\sum k_i \\leq n$, this is at most $O(n \\sqrt n)$ pairs in total, and with $n$ $\\texttt{vector}$s we can check in $O(n \\sqrt n)$ time whether the same pair occurs in two different sequences. This gives an algorithm that handles all cases, with total time complexity $\\Theta(n \\sqrt n)$. Code: #include <bits/stdc++.h>#include <ext/pb_ds/assoc_container.hpp>#include <ext/pb_ds/hash_policy.hpp>#define R register#define ll long long#define pair std::pair<int, int>#define mp(i, j) std::make_pair(i, j)#define sum(a, b, mod) (((a) + (b)) % mod)#define meow(cat...) 
fprintf(stderr, cat)const int MaxN = 2e5 + 10;int n, m, cnt, vis[MaxN];std::vector<int> v, b, a[MaxN];std::vector<pair> vec[MaxN];void init(){ v.clear(), b.clear(); memset(vis, 0, 4 * (m + 10)); for (int i = 1; i <= n; i++) a[i].resize(1);}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}signed main(){ int T = read(); while (T--) { int flag = 0; n = read(), init(); for (int i = 1; i <= n; i++) { a[i][0] = read(), m += a[i][0], a[i].resize(a[i][0] + 1); for (int j = 1; j <= a[i][0]; j++) a[i][j] = read(), b.push_back(a[i][j]); std::sort(++a[i].begin(), a[i].end()); } m = sqrt(m) / 2, std::sort(b.begin(), b.end()); b.erase(std::unique(b.begin(), b.end()), b.end()); for (int i = 1; i <= n; i++) for (int j = 1; j <= a[i][0]; j++) a[i][j] = std::lower_bound(b.begin(), b.end(), a[i][j]) - b.begin() + 1; // meow(\"%d%c\", a[i][j], \" \\n\"[j == a[i][0]]); for (int i = 1; i <= n; i++) { if (a[i][0] > m) { cnt = 0; memset(vis, 0, 4 * (b.size() + 10)); for (int j = 1; j <= a[i][0]; j++) vis[a[i][j]] = 1; for (int j = 1; j <= n; j++) if (i ^ j) { cnt = 0; for (int k = 1; k <= a[j][0]; k++) { vis[a[j][k]]++, cnt += (vis[a[j][k]] == 2); if (cnt == 2 && !flag) { flag = 1, printf(\"%d %d\\n\", i, j); break; } } for (int k = 1; k <= a[j][0]; k++) vis[a[j][k]]--; if (flag) break; } } else v.push_back(i); } memset(vis, 0, 4 * (b.size() + 10)); for(int i = 1; i <= b.size(); i++) vec[i].clear(); for(auto i : v) { for(int j = 1; j <= a[i][0]; j++) for(int k = j + 1; k <= a[i][0]; k++) vec[a[i][j]].push_back(mp(a[i][k], i)); } for(int i = 1; i <= b.size(); i++) { for(int j = 0; j < vec[i].size(); j++) { if(vis[vec[i][j].first] && !flag) { printf(\"%d %d\\n\", vis[vec[i][j].first], vec[i][j].second); flag = 1; break; } vis[vec[i][j].first] = vec[i][j].second; } if(flag) break; for (int j = 0; j < vec[i].size(); j++) vis[vec[i][j].first] = 0; } 
if(!flag) puts(\"-1\"); } return 0;}","link":"/2021/09/13/CF1468M/"},{"title":"CF1490G Old Floppy Drive","text":"Problem statement: There is an array $a_i$ of length $n$, repeated to form an infinite sequence. There are $m$ queries; each gives an integer $x$ and asks for the first index at which the prefix sum of this sequence is $\\geq x$. $1 \\leq n, m \\leq 2 \\times 10^5, -10^9 \\leq a_i \\leq 10^9, 1 \\leq x \\leq 10^9$ Analysis: First check whether the target can be reached within the first pass over the array; with the queries sorted, this takes a single two-pointer sweep. For each $x$ still unanswered, if the sum of all $n$ numbers satisfies $s_n \\leq 0$, the prefix sum can clearly never reach $x$. Otherwise, repeatedly subtract $s_n$ from $x$ until $x' \\leq \\max s_i$; a single binary search then finds the first index whose prefix sum is $\\geq x'$. Time complexity: $\\Theta((n + m) \\log n)$. Code: #include <bits/stdc++.h>#define R register#define ll long long#define pair std::pair<ll, ll>#define mp(i, j) std::make_pair(i, j)#define sum(a, b, mod) (((a) + (b)) % mod)#define meow(cat...) fprintf(stderr, cat)const ll MaxN = 2e5 + 10;pair q[MaxN];std::map<ll, ll> mp;std::vector<pair> v;ll n, m, now, a[MaxN], s[MaxN], ans[MaxN];void init(){ v.clear(), mp.clear(), now = 0; for(ll i = 0; i < m + 10; i++) ans[i] = -1; for(ll i = 0; i < n + 10; i++) a[i] = s[i] = 0;}inline ll read(){ ll x = 0, f = 1; char ch = getchar(); while(ch > '9' || ch < '0') { if(ch == '-') f = 0; ch = getchar(); } while(ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? 
x : (-x);}signed main(){ ll T = read(); while(T--) { n = read(), m = read(), init(); for(ll i = 1; i <= n; i++) { a[i] = read(), s[i] = a[i] + s[i - 1]; if(s[i] > 0 && !mp[s[i]]) mp[s[i]] = i, v.push_back(mp(s[i], i)); } for(ll i = 1; i <= m; i++) q[i].first = read(), q[i].second = i; std::sort(q + 1, q + m + 1), std::sort(v.begin(), v.end()); if(v.size() >= 2) { for(ll i = v.size() - 2; i >= 0; i--) { ll x = mp[v[i].first]; x = std::min(x, mp.upper_bound(v[i].first)->second); mp[v[i].first] = x, v[i].second = x; } } // for(auto x : mp) // meow(\"# %lld %lld\\n\", x.first, x.second); for(ll i = 0; i < v.size(); i++) { while(now < m && q[now + 1].first <= v[i].first) ++now, ans[q[now].second] = v[i].second - 1; } // meow(\"ok\"); for(ll i = now + 1; i <= m; i++) { if(s[n] <= 0) continue; ans[q[i].second] = n * ((q[i].first - v.back().first + s[n] - 1) / s[n]); q[i].first -= s[n] *((q[i].first - v.back().first + s[n] - 1) / s[n]), ans[q[i].second] += mp.lower_bound(q[i].first)->second - 1; // meow(\"$ %lld %lld %lld\\n\", ans[q[i].second], q[i].first, s[n]); } for(ll i = 1; i <= m; i++) printf(\"%lld%c\", ans[i], \" \\n\"[i == m]); } return 0;}","link":"/2021/09/25/CF1490G/"},{"title":"CF1898E Sofia and Strings","text":"Problem statement: Two operations on a string are available: delete the character at position $i$; or choose $[l, r]$ and sort the substring $s_{[l, r]}$ in lexicographic order. Given strings $s, t$ of lengths $n, m$, decide whether $s$ can be turned into $t$ using these two operations. Multiple test cases: $1 \\leq n, m, \\sum n, \\sum m \\leq 2 \\times 10^5$, and the number of test cases is at most $10^4$. Analysis: First ignore the ordering constraint and regard $t$ as a subsequence of some rearrangement $s'$ of $s$. Recall how subsequence matching works: for each of the $26$ letters we keep a queue of the positions where it occurs in $s$. For each $t_i$ we take the earliest remaining occurrence of that character. If every $t_i$ finds a match, the subsequence matching succeeds. Now adapt this algorithm to operation $2$. Suppose $t_1 \\dots t_{i - 1}$ have already been matched. For $t_i$ we again take the earliest remaining occurrence of that character, at position $j$. All remaining characters of $s$ before position $j$ that are smaller than $t_i$ must then be deleted, after which sorting moves $s_j$ into place. If this process can be carried out for all of $t$, the answer is yes; otherwise it is no. It is easy to show that this greedy is optimal. Note that we never need to delete from $s$ explicitly; it suffices to pop the invalidated positions from the $26$ per-letter queues. Time complexity: $O(26(n + m))$ Code: #include <bits/stdc++.h>#define R 
register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)#define meow(cat...) fprintf(stderr, cat)const int MaxN = 2e5 + 10;int n, m;std::queue<int> q[26];char s[MaxN], t[MaxN];inline int read(){ int x = 0; char ch = getchar(); while(ch > '9' || ch < '0') ch = getchar(); while(ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}signed main(){ int T = read(); while(T--) { int flag = 1; scanf(\"%d%d%s%s\", &n, &m, s + 1, t + 1); for(int i = 0; i < 26; i++) while(!q[i].empty()) q[i].pop(); for(int i = 1; i <= n; i++) q[s[i] - 'a'].push(i); for(int i = 1; i <= m; i++) { int c = t[i] - 'a'; if(q[c].empty()) { flag = 0; break; } int j = q[c].front(); q[c].pop(); for(int v = 0; v < c; v++) while(!q[v].empty() && q[v].front() < j) q[v].pop(); } puts(flag ? \"Yes\" : \"No\"); } return 0;}","link":"/2024/03/13/CF1898E/"},{"title":"CF1601B Frog Traveler","text":"Problem statement: A frog has fallen to the bottom of a well. The well is divided into $n+1$ positions; the top is $0$ and the bottom is $n$. The frog wants to jump out. If it is currently at position $i$, it may jump up any integer distance from $0$ to $a_i$. The well is slippery, so after landing on position $j$ the frog slides down $b_j$ positions. Given $n, a, b$, find the minimum number of jumps needed to get out of the well (to reach position $0$), and output one such plan. $1 \\leq n \\leq 3 \\times 10^5$ Analysis: Without the rule that landing on position $i$ slides the frog down by $b_i$, the problem is easily solved by segment-tree-optimized graph construction plus shortest paths. To handle the sliding, create $n+1$ virtual nodes $[n+1,2n+1]$ representing the state of having landed on $[0, n]$ but not yet slid down. Since $b_i$ is fixed, each transition splits into two parts: (1) from $i$ to the virtual nodes of $[i-a_i,i]$, with edge weight $1$; (2) from each virtual node in $[i-a_i,i]$ to its corresponding real node, with edge weight $0$. Part $1$ is handled by segment-tree-optimized graph construction, and part $2$ can be built once during initialization. Finally, all edge weights are $0$ or $1$, so a deque-based $\\texttt{bfs}$ lowers the complexity; the total time complexity is $\\Theta (n \\log n)$. Code: #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)#define meow(cat...) 
fprintf(stderr, cat)const int MaxN = 4e5 + 10;const int Max = MaxN << 5;const int inf = 0x3f3f3f3f;struct edge{ int next, to, dis;} e[Max];int head[Max], a[MaxN], b[MaxN];int n, m, cnt, dis[Max], vis[Max], pre[Max];void add_edge(int u, int v, int d){ ++cnt; e[cnt].to = v; e[cnt].dis = d; e[cnt].next = head[u]; head[u] = cnt;};inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}struct SegmentTree{ int cnt, idx[Max]; void build(int id, int l, int r) { if(l == r) return (void) (idx[id] = l + n + 1); idx[id] = ++cnt; int mid = (l + r) >> 1; build(id << 1, l, mid), build(id << 1 | 1, mid + 1, r); if(idx[id << 1]) add_edge(idx[id], idx[id << 1], 0); if(idx[id << 1 | 1]) add_edge(idx[id], idx[id << 1 | 1], 0); } void modify(int id, int l, int r, int ql, int qr, int u) { if(ql > r || l > qr) return; if(ql <= l && r <= qr) return add_edge(u, idx[id], 1); int mid = (l + r) >> 1; modify(id << 1, l, mid, ql, qr, u); modify(id << 1 | 1, mid + 1, r, ql, qr, u); }} T;void bfs(){ std::deque<int> q; dis[n] = 0, q.push_back(n); while(!q.empty()) { int u = q.front(); q.pop_front(); if(vis[u]) continue; vis[u] = 1; for(int i = head[u]; i; i = e[i].next) { int v = e[i].to; if(dis[v] > dis[u] + e[i].dis) { dis[v] = dis[u] + e[i].dis; if(!vis[v]) { if(e[i].dis) q.push_back(v); else q.push_front(v); pre[v] = u; } } } }}signed main(){ int now = 0; n = read(), T.cnt = 2 * n + 1, memset(dis, 0x3f, sizeof(dis)); for(int i = 1; i <= n; i++) a[i] = read(); for(int i = 1; i <= n; i++) b[i] = read(); for(int i = 0; i <= n; i++) add_edge(i + n + 1, i + b[i], 0); T.build(1, 0, n); for(int i = 1; i <= n; i++) T.modify(1, 0, n, i - a[i], i, i); bfs(), printf(\"%d\\n\", dis[0] == inf ? 
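-1 : dis[0]); if(dis[0] != inf) { std::vector<int> path; while(now != n) { if(n + 1 <= now && now <= 2 * n + 1) path.push_back(now - n - 1); now = pre[now]; } std::reverse(path.begin(), path.end()); for(auto x : path) printf(\"%d \", x); } return 0;}","link":"/2021/11/08/CF1601B/"},{"title":"Educational Codeforces Round 170 (Rated for Div. 2) Solutions","text":"Covers problems $\\text{E}, \\text{F}, \\text{G}$ of this round. E. Card Game Problem statement: Two players play a card game with a deck of $n \\times m$ cards. Each card has two parameters: suit and rank. Suits are numbered $1 \\sim n$ and ranks $1 \\sim m$. Each combination of suit and rank appears on exactly one card. A card with suit $a$ and rank $b$ beats a card with suit $c$ and rank $d$ in either of two cases: $a = 1, \\; c \\neq 1$ (a card of suit $1$ beats a card of any other suit); $a = c, \\; b > d$ (a card beats any card of the same suit with lower rank). Before the game starts, each player receives exactly half of the deck. The first player wins if, for every card of the second player, he can choose one of his own cards that beats it, with no card chosen twice. Otherwise the second player wins. Count the number of ways to distribute the cards so that the first player wins. Two ways are different if some card belongs to the first player in one and to the second player in the other. Output the result modulo $998244353$. $1 \\leq n, m \\leq 500$ Analysis: First consider how to solve the case $n=1$. After sorting the cards by rank in decreasing order, any valid distribution corresponds to a balanced bracket sequence. For $n>1$, suits other than $1$ cannot cancel one another, so in each such suit the first player must hold no more cards than the second: any surplus would be cards of that suit he can never use. Where the second player holds more cards of a suit than the first, the excess can only be beaten by the first player's surplus cards of suit $1$. So define the state $f_{i, j}$ as the number of ways to distribute the first $i$ suits such that the first player still has $j$ spare suit-$1$ cards. Enumerating the number $k$ of suit-$1$ cards that the first player must spend on suit $(i + 1)$ gives the transition $f_{i, j} \\cdot g_k \\rightarrow f_{i + 1, j - k}$, where $g_k$ is the number of bracket sequences of length $m$ with $k$ surplus left (or right) brackets. Time complexity: $O\\left(nm^2\\right)$ Code: #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)#define mul(a, b, mod) (((a) * 1ll * (b)) % mod)#define meow(cat...) 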
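fprintf(stderr, cat)const int mod = 998244353;const int MaxN = 5e2 + 10;int n, m, g[MaxN][MaxN], f[MaxN][MaxN];signed main(){ scanf(\"%d%d\", &n, &m), g[0][0] = 1; for (int i = 0; i < m; i++) for (int j = 0; j <= i; j++) { g[i + 1][j + 1] = sum(g[i + 1][j + 1], g[i][j], mod); if (j) g[i + 1][j - 1] = sum(g[i + 1][j - 1], g[i][j], mod); } f[0][0] = 1; for (int i = 0; i < n; i++) { for (int j = 0; j <= m; j++) { for (int k = 0; k <= m; k++) { int t = i ? j - k : j + k; if (t < 0 || t > m) continue; f[i + 1][t] = sum(f[i + 1][t], mul(f[i][j], g[m][k], mod), mod); } } } printf(\"%d\\n\", f[n][0]); return 0;} F. Choose Your Queries Problem statement: There is an array of length $n$, initially all $0$. There are $q$ queries, each with two parameters $x, y$. In each query you must pick one of $\\{x, y\\}$ and add $1$ or $-1$ to that element. Every element must remain non-negative throughout. Output a plan that minimizes the sum of all elements at the end. $1 \\leq n, q \\leq 3 \\times 10 ^ 5$ Analysis: The structure resembles an undirected graph (each query is an edge between $x$ and $y$), so we work on that graph. For a vertex $u$, the best strategy is to alternate $+1$ and $-1$, so only the parity of the number of times $u$ is chosen matters; that is, we want as few vertices as possible to have odd in-degree. Consider a lower bound on the answer: for each connected component, it is at least the number of edges in that component modulo $2$. A construction attaining this bound always exists: for each connected component, build a $\\text{DFS}$ tree from some vertex and assign edges bottom-up. Concretely, for a vertex $v$, first process all of its children; the edges at $v$ then fall into $3$ classes: tree edges to children of $v$, back edges, and cross edges. We form pairs from the tree edges and cross edges and remove them. If the number of remaining edges is odd, the leftover edge is paired with the edge between $v$ and its parent (if that edge has not been used yet); otherwise that edge is left for the ancestors of $v$ to handle. Finally, if the component has an odd number of edges, one unpaired edge is left at the root of the $\\text{DFS}$ tree. One detail needs care: every element must be non-negative at all times. So the edges directed to $u$ should be sorted and then assigned $+1$ and $-1$ alternately in the order the queries are given. Time complexity: $O(n+q)$ Code: #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)#define meow(cat...) 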
fprintf(stderr, cat)const int MaxN = 3e5 + 10;const std::string choice = \"xy\";const std::string sign = \"+-\";std::string ans[MaxN];int n, q, color[MaxN], edge[MaxN][2];std::vector<int> g[MaxN];void pair(int q1, int q2){ if (q1 > q2) std::swap(q1, q2); for (int i = 0; i < 2; i++) for (int j = 0; j < 2; j++) if (edge[q1][i] == edge[q2][j]) { ans[q1] = {choice[i], sign[0]}; ans[q2] = {choice[j], sign[1]}; return; }}int dfs(int u, int pe = -1){ color[u] = 1; std::vector<int> ed; for(auto e : g[u]) { int v = edge[e][0] ^ edge[e][1] ^ u; if(color[v] == 1) continue; if(color[v] == 0) { if(dfs(v, e)) ed.push_back(e); } else ed.push_back(e); } int res = true; if(ed.size() % 2 != 0) { if(pe != -1) ed.push_back(pe); else ed.pop_back(); res = false; } for(int i = 0; i < ed.size(); i += 2) pair(ed[i], ed[i + 1]); color[u] = 2; return res; // return true if parent edge still exists} signed main(){ scanf(\"%d%d\", &n, &q); for (int i = 1; i <= q; i++) { scanf(\"%d%d\", &edge[i][0], &edge[i][1]); g[edge[i][0]].push_back(i); g[edge[i][1]].push_back(i); ans[i] = \"x+\"; } for(int i = 1; i <= n; i++) if(!color[i]) dfs(i); for(int i = 1; i <= q; i++) printf(\"%s\\n\", ans[i].c_str()); return 0;} G. 
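Variable Damage Problem statement: Monocarp is gathering an army to fight a dragon in a video game. The army consists of two parts: heroes and defensive artifacts. Each hero has one parameter, his health. Each defensive artifact also has one parameter, its durability. Before the battle begins, Monocarp distributes the artifacts to the heroes so that each hero holds at most one artifact. The battle consists of rounds that proceed as follows: first, the dragon deals damage equal to $\\frac{1}{a+b}$ (a real number, without rounding) to each hero, where $a$ is the number of heroes alive and $b$ is the number of active artifacts; after that, all heroes with health $0$ or less die; finally, some artifacts are deactivated. An artifact with durability $x$ is deactivated when either of the following holds: the hero holding it dies, or that hero has received $x$ total damage. An artifact not held by any hero is inactive from the beginning of the battle. The battle ends when no heroes are left alive. Initially the army is empty. There are $q$ queries: add a hero with health $x$ or an artifact with durability $y$ to the army. After each query, determine the maximum number of rounds Monocarp can survive if he distributes the artifacts optimally. $1 \\leq q \\leq 3 \\times 10 ^ 5$ Analysis: First consider how to solve the problem for fixed arrays $a, b$, where $a$ holds the health of the $n$ heroes and $b$ the durability of the $m$ artifacts. Assign artifacts to heroes to form pairs $(a_i, b_i)$. If $m > n$, discard the artifacts with the smallest $b$; if $m < n$, pad the missing artifacts with $b_i = 0$. Observe that a hero with health $a_i$ holding an artifact $b_i$ can be replaced by one hero with health $a_i$ and another hero with health $\\min \\left(a_i, b_i\\right)$ without changing the answer. So we can transform the problem into one without artifacts: there are $2n$ heroes with health $a_1, \\min (a_1, b_1), \\dots, a_n, \\min (a_n, b_n)$, and each round the dragon deals a total of one point of damage to the heroes, so the final answer is $$\\sum_{i=1}^n a_i + \\sum_{i=1}^n \\min(a_i, b_i)$$ The first sum is easy to maintain; consider how to maintain the maximum of the second. It is easy to see that sorting both the artifacts and the heroes in decreasing order and pairing them in that order is optimal. Now consider how to maintain this under insertions. Use an idea similar to a sweep line. Merge the heroes and artifacts into one array sorted in decreasing order. For simplicity, assume all durabilities and health values are distinct integers, which coordinate compression ensures. Treat a hero as $+1$, adding $1$ to $s_{a_i}$, and an artifact as $-1$, subtracting $1$ from $s_{b_i}$. Then a hero with health $a_i$ contributes (that is, $a_i$ is smaller than the $b$ of the artifact matched to him) if and only if $\\sum_{k=a_i}^{\\infty} s_k \\leq 0$; likewise, an artifact with durability $b_i$ contributes if and only if $\\sum_{k=b_i}^{\\infty} s_k \\geq 0$. Now maintain the suffix-sum array $f$ of $s$ and query the total weight of the positions where $f$ is $\\leq 0$ or $\\geq 0$ respectively; an update is a range $+1/-1$. Maintain this by block decomposition; a query clearly takes $O(\\dfrac{q}{B})$. For updates, keep a difference array inside each block; since any two $f$ values inside a block differ by less than the block length $B$, the space is linear. An update inside a block takes $O(B)$ and recomputing the answer takes $O(\\dfrac{q}{B})$. Taking $B = \\sqrt q$ gives a total time complexity of $O(q \\sqrt q)$. Code: #include <bits/stdc++.h>#define forn(i, n) for (int i = 0; i < int(n); i++)using namespace std;struct query{ int t, v;};int main(){ cin.tie(0); ios::sync_with_stdio(false); int m; cin >> m; vector<query> q(m); forn(i, m) cin >> q[i].t >> q[i].v; vector<pair<int, int>> xs; forn(i, m) xs.push_back({q[i].v, 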
i}); sort(xs.rbegin(), xs.rend()); forn(i, m) q[i].v = xs.rend() - lower_bound(xs.rbegin(), xs.rend(), make_pair(q[i].v, i)) - 1; const int p = sqrt(m + 10); const int siz = (m + p - 1) / p; vector<int> tp(m); vector<int> val(m); vector<vector<long long>> dp(p, vector<long long>(2 * siz + 1)); vector<int> blbal(p); auto upd = [&](const query &q) { tp[q.v] = q.t; val[q.v] = xs[q.v].first; blbal[q.v / siz] += q.t == 1 ? 1 : -1; }; auto recalc = [&](int b) { dp[b].assign(2 * siz + 1, 0); int bal = 0; for (int i = b * siz; i < m && i < (b + 1) * siz; ++i) { if (tp[i] == 1) { dp[b][0] += val[i]; dp[b][0] += val[i]; dp[b][-bal + siz] -= val[i]; ++bal; } else if (tp[i] == 2) { dp[b][-bal + 1 + siz] += val[i]; --bal; } } forn(i, 2 * siz) { dp[b][i + 1] += dp[b][i]; } }; auto get = [&](int b, int bal) { bal += siz; if (bal < 0) return dp[b][0]; if (bal >= 2 * siz + 1) return dp[b].back(); return dp[b][bal]; }; for (auto it : q) { upd(it); recalc(it.v / siz); int bal = 0; long long ans = 0; for (int i = 0; i * siz < m; ++i) { ans += get(i, bal); bal += blbal[i]; } cout << ans << '\\n'; } return 0;}","link":"/2024/11/14/CF2025/"},{"title":"CF258E Little Elephant and Tree","text":"Problem statement: You have a rooted tree with $n$ nodes (the root is $1$) and perform $m$ operations on it. Operation $i$ gives two numbers $a_i, b_i$; you add the number $i$ to the set of every node in the subtrees rooted at $a_i$ and at $b_i$. For each node, report how many other nodes (excluding itself) have a set whose intersection with its own is non-empty. $1 \\leq n, m \\leq 10^5$ Analysis: Traverse the tree in $\\texttt{DFS}$ order, so that each subtree corresponds to a contiguous interval. An operation then amounts to: given two intervals $[a,b],[l,r]$, all nodes in the union of these intervals become pairwise related. The relation involves two nodes, so think in two dimensions: the first dimension is the first node of a related pair and the second dimension is the second node, each ranging over all nodes of the tree. An operation then relates every point numbered in $[a,b],[l,r]$ in the first dimension with every point numbered in $[a,b],[l,r]$ in the second dimension. Each operation can be split into $4$ rectangles, and the problem is solved as a union-of-rectangles area computation using a sweep line plus a segment tree. Code: #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, 
b, mod) (((a) + (b)) % mod)#define meow(cat...) fprintf(stderr, cat)const int MaxN = 2e5 + 10;struct edge{ int next, to;} e[MaxN << 1];struct node{ int l, r; int sum, len;};struct query{ int pos, l, r, c;} Q[MaxN << 2];int n, m, q, cnt, now, dfscnt, ans[MaxN];int head[MaxN], dfn[MaxN], siz[MaxN];struct SegmentTree{ node t[MaxN << 2]; void build(int id, int l, int r) { t[id].l = l, t[id].r = r; if (l == r) return; int mid = (l + r) >> 1; build(id << 1, l, mid); build(id << 1 | 1, mid + 1, r); } void pushup(int id) { int l = t[id].l, r = t[id].r; if (t[id].sum) t[id].len = r - l + 1; else t[id].len = t[id << 1].len + t[id << 1 | 1].len; } void modify(int id, int l, int r, int val) { if (t[id].r < l || r < t[id].l) return; if (l <= t[id].l && t[id].r <= r) { t[id].sum += val, pushup(id); return; } modify(id << 1, l, r, val); modify(id << 1 | 1, l, r, val), pushup(id); }} T;int cmp(query a, query b) { return a.pos < b.pos; }void add(int a, int b, int l, int r){ Q[++q] = (query){a, l, r, 1}; Q[++q] = (query){b + 1, l, r, -1}; // meow(\"$ %d %d %d %d\\n\", a, b, l, r);}void add_edge(int u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}inline int read(){ int x = 0; char ch = getchar(); while(ch > '9' || ch < '0') ch = getchar(); while(ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}void dfs(int u, int fa){ dfn[u] = ++dfscnt, siz[u] = 1; for(int i = head[u]; i; i = e[i].next) { int v = e[i].to; if(v == fa) continue; dfs(v, u), siz[u] += siz[v]; }}signed main(){ n = read(), m = read(), T.build(1, 1, n); for(int i = 1; i < n; i++) { int u = read(), v = read(); add_edge(u, v), add_edge(v, u); } dfs(1, 0); for(int i = 1; i <= m; i++) { int a, b, l, r; a = read(), b = dfn[a] + siz[a] - 1, a = dfn[a]; l = read(), r = dfn[l] + siz[l] - 1, l = dfn[l]; add(a, b, a, b), add(a, b, l, r); add(l, r, a, b), add(l, r, l, r); } std::sort(Q + 1, Q + q + 1, cmp), now = 1; for(int i = 1; i <= n; i++) { while(now <= q && Q[now].pos 
== i) T.modify(1, Q[now].l, Q[now].r, Q[now].c), ++now; ans[i] = T.t[1].len, ans[i] ? --ans[i] : 0; } for(int i = 1; i <= n; i++) printf(\"%d \", ans[dfn[i]]); return 0;}","link":"/2021/09/06/CF258E/"},{"title":"CF487C Prefix Product Sequence","text":"Problem statement: Construct a permutation of length $n$ whose prefix products are pairwise distinct modulo $n$. Analysis: In any such sequence $1$ must come first and $n$ must come last. Consider $(n-1)!$ modulo $n$: one can show that if $n$ is a composite number greater than $4$, this remainder is $0$, so no solution exists. Now consider a prime $n$. Each of $1$ to $(n-1)$ has an inverse modulo $n$, so we can construct a sequence whose prefix products modulo $n$ are $1$ to $n-1$ in order: the element $x$ that follows prefix product $i$ satisfies $ix \\equiv (i+1) \\pmod n$, which is solved with a modular inverse. Code: # include <bits/stdc++.h># define R register# define ll long longconst ll MaxN = 1e5 + 10;ll n, tp, flag, p[MaxN];ll check(ll x){ for(ll i = 2; i * i <= x; i++) if(x % i == 0) return 1; return 0;}ll exgcd(ll a, ll b, ll &x, ll &y){ ll g = a; if(b == 0) x = 1, y = 0; else g = exgcd(b, a % b, y, x), y -= (a / b) * x; return g;}signed main(){ scanf(\"%lld\", &n); if(n == 1) puts(\"YES\\n1\\n\"); else if(n == 4) puts(\"YES\\n1 3 2 4\"); else if(check(n)) puts(\"NO\"); else { printf(\"YES\\n1 \"); for(ll i = 2; i < n; i++) { ll x, y; exgcd(i - 1, n, x, y); printf(\"%lld \", ((x * 1ll * i) % n + n) % n); } printf(\"%lld\\n\", n); } return 0;}","link":"/2021/01/18/CF487C/"},{"title":"CF375D 【Tree and Queries】","text":"Queries on subtrees can be turned into queries on a sequence via the $DFS$ order, which the code below then answers with Mo's algorithm. Let $sum_i$ be the number of colors that occur at least $i$ times, and let $val_i$ be the number of occurrences of color $i$. Each pointer move then updates $sum$ and $val$ in $O(1)$. See the code for details. #include <bits/stdc++.h>const int MaxN = 100010;struct node{ int val, dfn, r, id;};struct query{ int l, r; int pos, id, k;};struct edge{ int next, to;};node a[MaxN];query q[MaxN];edge e[MaxN << 1];int n, m, cnt, dfscnt, size;int head[MaxN], ans[MaxN], sum[MaxN], val[MaxN];inline int comp(node a, node b) { return a.dfn < b.dfn; }inline int cmp(query a, query b){ if (a.pos != 
b.pos) return a.pos < b.pos; return a.r < b.r;}inline void add_edge(int u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}inline void dfs(int u){ a[u].dfn = ++dfscnt; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (!a[v].dfn) dfs(v); } a[u].r = dfscnt;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline void add(int x) { ++val[a[x].val], ++sum[val[a[x].val]]; }inline void del(int x) { --sum[val[a[x].val]], --val[a[x].val]; }inline void solve(){ int l = 1, r = 0; for (int i = 1; i <= m; i++) { while (l > q[i].l) --l, add(l); while (r < q[i].r) ++r, add(r); while (l < q[i].l) del(l), ++l; while (r > q[i].r) del(r), --r; ans[q[i].id] = sum[q[i].k]; }}int main(){ n = read(), m = read(); size = pow(n, 0.55); for (int i = 1; i <= n; i++) a[i].val = read(), a[i].id = i; for (int i = 1; i <= n - 1; i++) { int u = read(), v = read(); add_edge(u, v); add_edge(v, u); } dfs(1); for (int i = 1; i <= m; i++) { int v, k; v = read(), k = read(); q[i].l = a[v].dfn, q[i].r = a[v].r, q[i].k = k; q[i].id = i, q[i].pos = (q[i].l - 1) / size + 1; } std::sort(q, q + m + 1, cmp); std::sort(a + 1, a + n + 1, comp); solve(); for (int i = 1; i <= m; i++) printf(\"%d\\n\", ans[i]); return 0;}","link":"/2019/02/06/CF375D/"},{"title":"CF550A 【Two Substrings】","text":"Idea: for every occurrence of \"BA\", binary search for a non-overlapping \"AB\" before or after it. Time complexity: $O(n\\log_{2}n)$ # include <bits/stdc++.h>const int MaxN = 100010;std::vector<int> a, b;//stores the start indicesint upper(int x)//binary search for a position after x{ int l = 0, r = a.size(); while(l < r) { int mid = (l + r) >> 1; if(a[mid] > x) r = mid; else l = mid + 1; } return l;}int lower(int x)//binary search for a position before x{ int l = -1, r = a.size() - 1; while(l < r) { int mid = (l + r + 1) >> 1; if(a[mid] < x) l = mid; else r = mid - 1; } return l;}int main(){ 
std::string s; std::cin >> s; int len = s.length(); for(int i = 0; i < len - 1; i++) { std::string tmp = s.substr(i, 2); if(tmp == \"AB\") a.push_back(i); else if(tmp == \"BA\") b.push_back(i); }//find where \"AB\" and \"BA\" occur if(a.size() == 0 || b.size() == 0) return 0 * printf(\"NO\");//special case for(int i = 0; i < b.size(); i++) { int x = lower(b[i] - 1);//avoid overlap int y = upper(b[i] + 1); if(x != -1 || y != a.size()) return 0 * printf(\"YES\"); } printf(\"NO\"); return 0;}","link":"/2019/02/06/CF550A/"},{"title":"CF504E Misha and LCP on Tree","text":"Problem statement: You are given a tree with $n$ nodes, each carrying a character $c$. There are $m$ queries; each asks for the longest common prefix $\\texttt{(LCP)}$ of the string along the path $a\\sim b$ and the string along the path $c \\sim d$. $n \\leq 3 \\times 10^5,m \\leq 10^6$ Analysis: An ordinary $\\texttt{LCP}$ can be found by binary search plus hashing; we extend that approach to trees. Maintain $\\texttt{h[u]}$, the hash of the string on the path from the root to $u$, and $\\texttt{revh[u]}$, the hash of the string on the path from $u$ up to the root. The hash of the path $u \\sim v$ can then be written as $\\texttt{revh[u, lca]} \\times \\texttt{base}^{dv} + \\texttt{h(lca, v]}$, where $\\texttt{dv}$ is the distance from $\\texttt{v}$ to $\\texttt{lca}$; the hashes of the segments from $u$ and from $v$ to $\\texttt{lca}$ are computed as in the sequence case. Back to the original problem: binary search on the $\\texttt{LCP}$ length $\\texttt{k}$, find the $\\texttt{k}$-th node on each of the paths $\\texttt{(a, b), (c, d)}$ (call them $v_1, v_2$), compare the hashes of the chains $\\texttt{(a,v1)}$ and $\\texttt{(c,v2)}$, and adjust the search interval accordingly; the final $l$ is the answer. Time complexity: $\\mathcal{O}\\texttt{((n + m) log n)}$; see the code for implementation details. Code: #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 1e6 + 10;const int mod = 998244853;struct edge{ int next, to;};struct mod_t{ static int norm(int x) { return x + (x >> 31 & mod); } int x; 
mod_t() {} mod_t(int v) : x(v) {} mod_t(long long v) : x(v) {} mod_t(char v) : x(v) {} mod_t operator+(const mod_t &rhs) const { return norm(x + rhs.x - mod); } mod_t operator-(const mod_t &rhs) const { return norm(x - rhs.x); } mod_t operator*(const mod_t &rhs) const { return (ll)x * rhs.x % mod; }};edge e[MaxN << 1];std::vector<int> up[MaxN], down[MaxN];char s[MaxN];int n, m, cnt;mod_t h[MaxN], revh[MaxN], powm[MaxN], invp[MaxN], base, inv;int son[MaxN], fa[MaxN][24], f[MaxN][24], Dep[MaxN], pos[MaxN];int head[MaxN], dep[MaxN], maxd[MaxN], fir[MaxN], lg2[MaxN], top[MaxN];void add_edge(int u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}int querykth(int u, int k){ if (!k) return u; u = fa[u][lg2[k]], k -= (1 << lg2[k]); k -= dep[u] - dep[top[u]], u = top[u]; return ((k >= 0) ? up[u][k] : down[u][-k]);}int querylca(int u, int v){ int l = fir[u], r = fir[v], k; if (l > r) std::swap(l, r); k = lg2[r - l + 1]; return (Dep[f[l][k]] <= Dep[f[r - (1 << k) + 1][k]]) ? pos[f[l][k]] : pos[f[r - (1 << k) + 1][k]];}mod_t fast_pow(mod_t a, int b){ mod_t ret = 1; while (b) { if (b & 1) ret = ret * a; a = a * a, b >>= 1; } return ret;}void init(){ srand(time(NULL)); powm[0] = invp[0] = 1, base = (rand() % 2000) + 1001; inv = fast_pow(base, mod - 2); for (int i = 1; i <= n; i++) { powm[i] = (powm[i - 1] * 1ll * base); invp[i] = (invp[i - 1] * 1ll * inv); }}void prework(){ lg2[0] = -1; for (int i = 1; i <= cnt; i++) lg2[i] = lg2[i >> 1] + 1, f[i][0] = i; for (int j = 1; (1 << j) <= cnt; j++) { for (int i = 1; i <= cnt - (1 << j) + 1; i++) f[i][j] = (Dep[f[i][j - 1]] <= Dep[f[i + (1 << (j - 1))][j - 1]]) ? 
f[i][j - 1] : f[i + (1 << (j - 1))][j - 1]; }}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch - 48), ch = getchar(); return x;}void dfs(int u, int fa){ Dep[++cnt] = maxd[u] = dep[u] = dep[fa] + 1, ::fa[u][0] = fa; pos[cnt] = u, h[u] = h[fa] * 1ll * base + s[u]; fir[u] = cnt, revh[u] = revh[fa] + powm[dep[fa]] * s[u]; for (int i = 1; i <= 20; i++) ::fa[u][i] = ::fa[::fa[u][i - 1]][i - 1]; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v == fa) continue; dfs(v, u), pos[++cnt] = u, Dep[cnt] = dep[u]; if (maxd[v] > maxd[u]) maxd[u] = maxd[v], son[u] = v; }}void dfs1(int u, int top){ ::top[u] = top; if (u == top) { int x = u; for (int i = 0; i <= maxd[u] - dep[u]; i++) up[u].push_back(x), x = fa[x][0]; x = u; for (int i = 0; i <= maxd[u] - dep[u]; i++) down[u].push_back(x), x = son[x]; } if (son[u]) dfs1(son[u], top); for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v != son[u] && v != fa[u][0]) dfs1(v, v); }}int qhash(int u, int v, int lca, int flca){ int dv = dep[v] - dep[lca]; mod_t h1 = (revh[u] - revh[flca] + mod) * invp[dep[flca]], h2, H; h2 = (h[v] - h[lca] * powm[dv]), H = h1 * powm[dv] + h2; return H.x;}int get(int u, int v, int k){ int lca = querylca(u, v), d = dep[u] + dep[v] - 2 * dep[lca] + 1; return (k <= dep[u] - dep[lca]) ? 
1 : 0;}int path(int u, int v, int k){ int lca = querylca(u, v), d = dep[u] + dep[v] - 2 * dep[lca] + 1; if (k <= dep[u] - dep[lca]) return querykth(u, k - 1); else return querykth(v, d - k);}int query(int a, int b, int c, int d){ if (s[a] != s[c]) return 0; int lca1 = querylca(a, b), flca1 = fa[lca1][0], d1 = dep[a] + dep[b] - 2 * dep[lca1] + 1; int lca2 = querylca(c, d), flca2 = fa[lca2][0], d2 = dep[c] + dep[d] - 2 * dep[lca2] + 1; int l = 1, r = std::min(d1, d2); // printf(\"debug: a = %d, b = %d, c = %d, d = %d, lca(a, b) = %d, lca(c, d) = %d\\n\", a, b, c, d, lca1, lca2); while (l < r) { int mid = (l + r + 1) >> 1; int x1 = get(a, b, mid), x2 = get(c, d, mid); int v1 = path(a, b, mid), v2 = path(c, d, mid); // printf(\"Debug: l = %d, r = %d, mid = %d, v1 = %d, v2 = %d\\n\", l, r, mid, v1, v2); if (qhash(a, v1, (x1 ? v1 : lca1), (x1 ? fa[v1][0] : flca1)) == qhash(c, v2, (x2 ? v2 : lca2), (x2 ? fa[v2][0] : flca2))) l = mid; else r = mid - 1; } return l;}int main(){ scanf(\"%d\\n%s\", &n, s + 1), init(); for (int i = 1; i < n; i++) { int u = read(), v = read(); add_edge(u, v), add_edge(v, u); } m = read(), cnt = 0, dfs(1, 0), dfs1(1, 1), prework(); for (int i = 1; i <= m; i++) { int a = read(), b = read(), c = read(), d = read(); printf(\"%d\\n\", query(a, b, c, d)); } return 0;}","link":"/2020/03/15/CF504E/"},{"title":"CF605E Intergalaxy Trips","text":"题目大意有一个有$n$个点的有向完全图,每条边每天有一个开放几率$p[i][j]$,给定$p$,你需要求出从$1$到$n$的期望天数 $n \\leq 10^3$ 分析我们可以考虑倒着做,类似$\\texttt{dijkstra}$的思路,每次找到当前走到$1$期望天数最小的,并用它更新所有点的期望天数和路径概率,直到走到$n$为止。 时间复杂度$\\mathcal{O}(n^2)$ 代码12345678910111213141516171819202122232425262728293031323334353637383940#include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 1e3 + 10;int n, vis[MaxN], a[MaxN];double p[MaxN][MaxN], d[MaxN], sum[MaxN], pr[MaxN];int main(){ scanf(\"%d\", &n); for (int i = 1; i <= n; i++) { int now = 0; sum[i] = pr[i] = 1.0; for (int j = 1; j <= n; j++) scanf(\"%d\", &now), 
p[i][j] = now * 0.01L; } vis[n] = 1, a[1] = n, d[0] = 1e18; for (int i = 2; i <= n; i++) { for (int j = 1; j <= n; j++) { if (vis[j]) continue; sum[j] += d[a[i - 1]] * p[j][a[i - 1]] * pr[j]; pr[j] *= (1 - p[j][a[i - 1]]), d[j] = sum[j] / (1 - pr[j]); } int pos = 0; for (int j = 1; j <= n; j++) if (!vis[j] && d[pos] > d[j]) pos = j; vis[pos] = 1, a[i] = pos; } printf(\"%.10lf\\n\", d[1]); std::cerr << \"tiger0132 /qq\"; return 0;}","link":"/2020/03/15/CF605E/"},{"title":"CF86D Powerful array","text":"怎么2700的题这么简单啊QAQ 长得非常像P2709 小B的询问,做法也一样 莫队离线乱搞做完了 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970#include <bits/stdc++.h>#define int long longconst int MaxN = 1000010;struct query{ int l, r, id, pos;};query q[MaxN];int n, t, size;int ans[MaxN], sum;int a[MaxN], cnt[MaxN];inline int cmp(query a, query b){ if (a.pos != b.pos) return a.pos < b.pos; return a.r < b.r;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline void add(int x){ sum += a[x] * (2 * cnt[a[x]] + 1); cnt[a[x]]++;}inline void del(int x){ sum -= a[x] * (2 * cnt[a[x]] - 1); cnt[a[x]]--;}inline void solve(){ int l = 1, r = 0; for (int i = 1; i <= t; i++) { while (l > q[i].l) --l, add(l); while (r < q[i].r) ++r, add(r); while (l < q[i].l) del(l), ++l; while (r > q[i].r) del(r), --r; ans[q[i].id] = sum; }}signed main(){ n = read(), t = read(); size = pow(n, 0.55); for (int i = 1; i <= n; i++) a[i] = read(); for (int i = 1; i <= t; i++) { q[i].l = read(), q[i].r = read(); q[i].id = i, q[i].pos = (q[i].l - 1) / size + 1; } std::sort(q + 1, q + t + 1, cmp); solve(); for (int i = 1; i <= t; i++) printf(\"%lld\\n\", ans[i]); return 0;}","link":"/2019/02/28/CF86D/"},{"title":"CF717A Festival Organization","text":"题目大意一个合法的串定义为:长度在 $[l,r]$ 之间,且只含 0,1,并且不存在连续 $2$ 个或更多的 $0$。 现在要选出 $k$ 
valid strings of the same length; count the number of ways to choose them, modulo $10^9+7$. $1 \\leq k \\leq 200$, $1 \\leq l \\leq r \\leq 10^{18}$ Analysis Clearly, what we need is $\\sum_{i=l+2}^{r+2} \\binom{f_i}{k}$, where $f_i$ is the $i$-th Fibonacci number. It suffices to evaluate the prefix sum $\\sum_{i=1}^{n} \\binom{f_i}{k}$, so let us expand it. $$\\sum_{i=1}^n \\binom{f_i}{k} = \\frac{1}{k!} \\sum_{i=1}^n \\prod_{j=1}^k (f_i - j + 1) = \\frac{1}{k!} \\sum_{i=1}^n \\sum_{j=1}^k (-1)^{k-j} \\begin{bmatrix}k \\\\ j\\end{bmatrix} f_i^j$$ This step expands the falling factorial into ordinary powers; $\\begin{bmatrix}k \\\\ j\\end{bmatrix}$ denotes the unsigned Stirling number of the first kind. $$= \\frac{1}{k!} \\sum_{j=1}^k (-1)^{k-j} \\begin{bmatrix}k \\\\ j\\end{bmatrix} \\sum_{i=1}^n f_i^j = \\frac{1}{k!} \\sum_{j=1}^k (-1)^{k-j} \\begin{bmatrix}k \\\\ j\\end{bmatrix} \\sum_{i=1}^n (\\frac{1}{\\sqrt 5}(\\phi^i - \\hat{\\phi}^i))^j$$ $$= \\frac{1}{k!} \\sum_{j=1}^k (-1)^{k-j} \\begin{bmatrix}k \\\\ j\\end{bmatrix} (\\frac{1}{\\sqrt 5})^j \\sum_{i=1}^n (\\phi^i - \\hat{\\phi}^i)^j$$ $$= \\frac{1}{k!} \\sum_{j=1}^k (-1)^{k-j} \\begin{bmatrix}k \\\\ j\\end{bmatrix} (\\frac{1}{\\sqrt 5})^j \\sum_{i=1}^n \\sum_{l=0}^j (-1)^{j-l} \\binom{j}{l} \\phi^{li} \\hat{\\phi}^{(j-l)i}$$ $$= \\frac{1}{k!} \\sum_{j=1}^k (-1)^{k-j} \\begin{bmatrix}k \\\\ j\\end{bmatrix} (\\frac{1}{\\sqrt 5})^j \\sum_{l=0}^j (-1)^{j-l} \\binom{j}{l} \\sum_{i=1}^n (\\phi^{l} \\hat{\\phi}^{j-l})^i$$ $$= \\frac{1}{k!} \\sum_{j=1}^k (-1)^{k-j} \\begin{bmatrix}k \\\\ j\\end{bmatrix} (\\frac{1}{\\sqrt 5})^j \\sum_{l=0}^j (-1)^{j-l} \\binom{j}{l} \\frac{\\phi^{l} \\hat{\\phi}^{j-l}(1-(\\phi^{l} \\hat{\\phi}^{j-l})^n)}{1-\\phi^{l}\\hat{\\phi}^{j-l}}$$ The last step is a geometric series; when the ratio $\\phi^{l}\\hat{\\phi}^{j-l}$ equals $1$ the inner sum is simply $n$. That is the final formula: it has only $O(k^2)$ terms, each of which can be evaluated with fast exponentiation. Floating-point numbers clearly cannot carry out a computation like this, so how do we handle $\\phi$ and $\\hat{\\phi}$? Just as complex numbers are written $a + bi$, we represent every value as $a+b\\sqrt{5}$ with $a$ and $b$ taken modulo $10^9+7$, and evaluate the formula above in that number field. A scan of my scratch paper is attached at the end; if you find anything that differs from the derivation above, please contact me. Thanks! 
代码123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114#include <bits/stdc++.h>#define R register#define ll long long#define sqr(x) ((x) * (x))#define sum(a, b, mod) (((a) + (b)) % mod)#define meow(cat...) fprintf(stderr, cat)const ll MaxN = 2e3 + 10;const ll mod = 1e9 + 7;ll add(ll a, ll b) { return a + b >= mod ? a + b - mod : a + b; }ll dec(ll a, ll b) { return a - b < 0 ? a - b + mod : a - b; }ll pw(ll a, ll b){ ll ret = 1; while (b) { if (b & 1) ret = ret * a % mod; a = a * a % mod, b >>= 1; } return ret;}struct num{ ll a, b; num(ll a = 0, ll b = 0) : a(a), b(b) {} num operator+(const num &x) const { return num(add(a, x.a), add(b, x.b)); } num operator+(const ll &x) const { return num(add(a, x), b); } inline num operator-(const num &x) const { return num(dec(a, x.a), dec(b, x.b)); } inline num operator-(const ll &x) const { return num(dec(a, x), b); } inline num operator*(const num &x) const { num res; res.a = (a * x.a + b * x.b * 5) % mod; res.b = (a * x.b + b * x.a) % mod; return res; } inline num operator*(const ll &x) const { return num(a * x % mod, b * x % mod); } friend inline num operator/(const num &x, const num &y) { num res, z = y; if (z.b != 0) z.b = mod - z.b; res = x * z; return res * pw(dec(y.a * y.a % mod, y.b * y.b * 5 % mod), mod - 2); }} phi = num(500000004, 500000004), iphi = num(500000004, 500000003);ll n, l, r, k, ifac;ll c[MaxN][MaxN], s[MaxN][MaxN];num poww(num a, ll b){ num ret = num(1, 0); while (b) { if (b & 1) ret = ret * a; a = a * a, b >>= 1; } return ret;}num suma(num a, ll n) { return (poww(a, n + 1) - a) / (a - 1); }num query(num x, ll l, ll r){ if (x.a == 1 && x.b == 0) return x * (r - l + 1); return suma(x, r) - suma(x, l - 1);}ll func(ll n, ll k){ num ans, a = num(0, 400000003); for (ll j = 0; j <= k; j++) { ll b = (((j + k) % 2) ? 
-1 : 1) * c[k][j] % mod; num c = poww(phi, j) * poww(iphi, k - j); b = (b + mod) % mod, ans = ans + query(c, 1, n) * b; } a = poww(a, k), ans = ans * a; return (ans.a % mod + mod) % mod;}ll fun(ll n, ll k){ ll ans = 0; for (ll j = 1; j <= k; j++) { ll b = (((k - j) % 2) ? -1 : 1) * s[k][j] % mod; b = (b + mod) % mod, ans = sum(ans, b * func(n, j), mod); } return ans;}signed main(){ scanf(\"%lld%lld%lld\", &k, &l, &r), l += 2, r += 2, c[0][0] = s[0][0] = ifac = 1; for (ll i = 1; i <= k; i++) ifac = ifac * i % mod; ifac = pw(ifac, mod - 2); for (ll i = 1; i < MaxN; i++) { c[i][0] = 1; for (ll j = 1; j <= i; j++) c[i][j] = (c[i - 1][j] + c[i - 1][j - 1]) % mod; } for (ll i = 1; i < MaxN; i++) for (ll j = 1; j <= i; j++) s[i][j] = (ll)(s[i - 1][j] * (i - 1) + s[i - 1][j - 1]) % mod; printf(\"%lld\\n\", (fun(r, k) - fun(l - 1, k) + mod) * ifac % mod); return 0;}","link":"/2021/05/02/CF717A/"},{"title":"CF900D 【Unusual Sequences】","text":"数论好题 可以发现如果$x$不整除$y$那么肯定无解 不然我们可以发现其实求的就是和为$y/x$且$gcd(a_1,a_2,\\cdots,a_n)=1$的序列个数 容易发现所有和为$y$的序列个数为$2^{n-1}$ 而所有$gcd$不为$1$的序列,把每个数除以$gcd$,就又回到原题了 所以枚举每个可能的$gcd$(约数),递归计算即可。 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152#include <bits/stdc++.h>#define ll long longconst ll mod = 1e9 + 7;std::map<int, int> m;std::vector<int> v, vec;int fast_pow(ll a, ll n){ int ret = 1; while (n) { if (n & 1) ret = (1ll * ret * a) % mod; a = (1ll * a * a) % mod; n >>= 1; } return ret;}int solve(int x){ if (m[x]) return m[x]; if (x == 1) { m[x] = 1; return x; } int sum = 0; int s = sqrt(x); for (int i = 1; i <= s; i++) { if (x % i == 0) { if (i == 1 || i * i == x) sum = (sum + solve(i)) % mod; else sum = (sum + solve(i) % mod + solve(x / i) % mod) % mod; } } sum = (fast_pow(2, x - 1) - sum + mod) % mod; m[x] = sum; return sum;}int main(){ ll x, y; std::cin >> x >> y; if (y % x != 0) return 0 * printf(\"0\"); y /= x; std::cout << solve(y); return 
0;}","link":"/2019/02/06/CF900D/"},{"title":"CSLearningMap","text":"记录一些想要自学的 Courses (Maybe Sorted) [x] UCB CS61A: Structure and Interpretation of Computer Programs [ ] UCB CS61B: Data Structures and Algorithms (Ongoing) Coursera: Machine Learning Coursera: Deep Learning Differential Equations CS231n: CNN for Visual Recognition CS224n: Natural Language Processing CS285: Deep Reinforcement Learning CMU CS15213: CSAPP MIT 6.031: Software Construction CS188: Introduction to Artificial Intelligence UCB CS70 : discrete Math and probability theory CS50’s Introduction to AI with Python CS255: Introduction to Cryptography UCB CS161: Computer Security","link":"/2024/02/04/CSLearningMap/"},{"title":"FJWC2020 小记","text":"这里记录$\\texttt{little_sun}$的$\\texttt{FJWC2020}$之旅 Day 11 Life1.1 简要题意一个$n$个点的有向图,每个点有颜色,部分点的颜色已经确定定义一条任意相邻点不同色的路径为交错路径为所有颜色未定的点确定颜色,并为所有$1 \\leq i < j \\leq n$,确定图上从$i$到$j$的有向边是否存在求有多少种方案使得该图交错路径的条数为奇数,对大质数取模$1\\leq n \\leq 2 \\times 10^5$ 1.2 分析我们设$g_i$表示以$i$结尾的交错路径条数,这样我们有了这样一个$\\texttt{dp}$思路: 设$f[i][j][k][h]$表示前$i$个点,有$j$个黑点,有$k$个白点满足他的$g$为奇数,且这$i$个点的$g$之和的奇偶性为$h$的方案数 我们发现如果$i+1$是黑点的话那么只有那$k$个白点会对$g_{i+1}$的奇偶性产生影响,故只要考虑这些点的子集与$i+1$的连边的方案数就好了,白色同理 又因为这些点在计算中都相当于等价的,于是我们只要考虑这些点的子集大小的奇偶性即可 设$calc(x, y)$表示一个大小为$x$的集合取大小奇偶性为$y$的集合的方案数,则我们有了如下一个$\\texttt{dp}$方程组:1.$f[i+1][j][k][h]+=f[i][j][k][h] \\times calc(k, 1) \\times 2^{i-k} $ 2.$f[i+1][j+1][k][h \\oplus 1]+=f[i][j][k][h] \\times calc(k, 0) \\times 2^{i-k}$ 3.$f[i+1][j][k][h]+=f[i][j][k][h] \\times calc(j, 1) \\times 2^{i-j}$ 4.$f[i+1][j][k+1][h \\oplus 1]+=f[i][j][k][h] \\times calc(j, 0) \\times 2^{i-j} $ 要注意的是,若$i+1$被钦定为黑色则$3,4$转移不可取,白色同理,复杂度$O(n^3)$ 我们发现: $calc(k, 0)\\times2^{i-k}=(k \\; ? \\; 2^{i-1} : 2^i)$$calc(k, 1)\\times2^{i-k}=(k \\; ? 
\\; 2^{i-1} : 0)$ 于是我们就可以不记$j, k$的值了,改记满足条件的黑白点的存在性,方程变成$f[i][0/1][0/1][0/1]$, 最后的$ans=\\sum_{j,k \\in \\{0,1\\}} f[n][j][k][1]$ 1.3 代码1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162#include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 5e5 + 10;const int mod = 998244353;int f[MaxN][2][2][2];int n, m, col[MaxN], pow2[MaxN];inline int read(){ int x = 0, f = 1; char ch = getchar(); while (ch > '9' || ch < '0') { if (ch == '-') f = 0; ch = getchar(); } while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? x : (-x);}int main(){ n = read(), pow2[0] = 1; for (int i = 1; i <= n; i++) col[i] = read(), pow2[i] = (pow2[i - 1] * 2ll) % mod; f[0][0][0][0] = 1; for (int i = 0; i < n; i++) { for (int j = 0; j <= 1; j++) { for (int k = 0; k <= 1; k++) { for (int h = 0; h <= 1; h++) { if (col[i + 1] != 1) { f[i + 1][j][k][h] = sum(f[i + 1][j][k][h], f[i][j][k][h] * 1ll * (k ? pow2[i - 1] : 0), mod); f[i + 1][j | 1][k][h ^ 1] = sum(f[i + 1][j | 1][k][h ^ 1], f[i][j][k][h] * 1ll * (k ? pow2[i - 1] : pow2[i]), mod); } if (col[i + 1] != 0) { f[i + 1][j][k][h] = sum(f[i + 1][j][k][h], f[i][j][k][h] * 1ll * (j ? pow2[i - 1] : 0), mod); f[i + 1][j][k | 1][h ^ 1] = sum(f[i + 1][j][k | 1][h ^ 1], f[i][j][k][h] * 1ll * (j ? 
pow2[i - 1] : pow2[i]), mod); } } } } } int ans = 0; for (int j = 0; j <= 1; j++) for (int k = 0; k <= 1; k++) ans = sum(ans, f[n][j][k][1], mod); printf(\"%d\\n\", ans); return 0;} 2 Winner 2.1 Problem summary Given an undirected graph with $n$ vertices and $m$ edges, count the ways to orient every edge such that vertices $1$ and $2$ can reach a common vertex. $1 \\leq n \\leq 15, 1 \\leq m \\leq \\frac{n \\times (n - 1)}{2}$ 2.2 Analysis Counting directly is hard, so subtract the number of invalid orientations from the total. Let $S$ be the set of vertices reachable from $1$ and $T$ the set reachable from $2$; the invalid orientations are exactly those with $S \\cap T = \\emptyset$. Let $f_S$ $(1 \\in S)$ be the number of ways to orient the edges of the subgraph induced by $S$ such that $1$ can reach every vertex of $S$, and define $g_T$ analogously for $2$. Enumerate $S$ and $T$: if no edge joins $S$ and $T$, the edges inside the two sets can be oriented in $f_S \\times g_T$ ways. Among the remaining edges, an edge with exactly one endpoint in $S \\cup T$ must point from outside into $S \\cup T$ (otherwise a reachable set would grow), while an edge with both endpoints outside can be oriented freely. This gives the answer; since $S \\cap T = \\emptyset$, the enumeration costs $O(3^n)$. To compute $f$ and $g$, again subtract the invalid count from the total. Write $e(X)$ for the number of edges in the subgraph induced by $X$; for a set $S$ the total is $2^{e(S)}$. Enumerate a proper subset $T$ of $S$ as the exact set of reachable vertices: the edges inside $T$ can be oriented in $f_T$ ways, the edges inside $S-T$ in $2^{e(S-T)}$ ways, and every edge between $T$ and $S-T$ must point from $S-T$ into $T$, so this case subtracts $f_T \\times 2^{e(S-T)}$. Because this is a subset enumeration, it also takes $O(3^n)$. 2.3 Code #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 16;const int mod = 1e9 + 7;int n, m, id;int pow2[MaxN * MaxN], gr[MaxN][MaxN], c[1 << MaxN], d[1 << MaxN], f[3][1 << MaxN];inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}int main(){ n = read(), m = read(), id = read(), pow2[0] = 1; for (int i = 1; i <= m; i++) { int x = read(), y = read(); ++gr[x][y], pow2[i] = (pow2[i - 1] * 2ll) % mod; } int lim = (1 << n); for (int s = 0; s < lim; s++) for (int i = 1; i <= n; i++) for (int j = 1; j <= n; j++) if (gr[i][j]) { if ((s & (1 << (i - 1))) && (s & (1 << (j - 1)))) c[s] += gr[i][j]; if ((s & (1 << (i - 1))) || (s & (1 << (j - 1)))) d[s] += gr[i][j]; } for (int id = 1; id <= 2; id++) { for (int s = 0; s < lim; s++) { if (!(s & (1 << (id - 1)))) continue; 
f[id][s] = pow2[c[s]]; for (int t = (s - 1) & s; t; t = (t - 1) & s) f[id][s] = sum(f[id][s], (-f[id][t] * 1ll * pow2[c[s - t]]) % mod + mod, mod); } } int ans = pow2[m]; for (int s = 0; s < lim; s++) { if ((!(s & 1)) || (s & 2)) continue; for (int t = lim - 1 ^ s; t; t = ((t - 1) & (lim - 1 ^ s))) { // printf(\"%d %d %d\\n\", s, t, ans); if ((!(t & 2)) || c[s] + c[t] < c[s | t]) continue; ans = sum(ans, ((-1ll * f[1][s] * f[2][t]) % mod) * pow2[m - d[s + t]] % mod + mod, mod); } } printf(\"%d\\n\", ans); return 0;} 3 Brr咕咕咕 Day 21 Building咕咕咕 2 Bracelet咕咕咕 3 Number3.1 简要题意给定操作数$n$和一个数$k$,实现一个集合$s$,支持插入和删除操作。 每次操作后输出$s$内满足$gcd(s_i, s_j) = k$的$(i,j)$对数 令$z$为集合内出现过的数的最大值,则有$1 \\leq n,z \\leq 10^5$ 3.2 分析题目可以转化为每次加入/删除一个数,并求这个数和集合内多少数的$\\texttt{gcd}=k$ 容易发现如果一个数不能被$k$整除,那么这个数一定对答案没有贡献 所以问题又转化为每次加入/删除一个数,并求这个数和集合内多少数的$\\texttt{gcd}=1$ 考虑容斥,那么发现当加入一个数$x$的时候,答案会增加: \\sum_{d|x}cnt[d] \\times mu[d]其中$cnt[i]$表示$i$的倍数出现过多少次,时间复杂度$O(n \\sqrt z)$ 3.3 代码12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091#include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const ll MaxN = 5e5 + 10;ll n, k, cnt, ans;ll prime[MaxN], p[MaxN], mu[MaxN], val[MaxN], vis[MaxN];inline ll read(){ ll x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}void add(ll x, ll v){ for (ll i = 1; i * i <= x; i++) { if (x % i == 0) { val[i] += v; if (i * i != x) val[x / i] += v; } }}ll query(ll x){ ll ret = 0; for (ll i = 1; i * i <= x; i++) { if (x % i == 0) { ret += mu[i] * val[i]; if (i * i != x) ret += mu[x / i] * val[x / i]; } } return ret;}inline void prework(){ ll n = 200000; mu[1] = 1, p[0] = p[1] = 1; for (ll i = 2; i <= n; i++) { if (!p[i]) prime[++cnt] = i, mu[i] = -1; for (ll j = 1; j <= cnt && i * prime[j] <= n; j++) { 
p[i * prime[j]] = 1; if (i % prime[j] == 0) break; mu[i * prime[j]] = -mu[i]; } }}int main(){ freopen(\"number.in\", \"r\", stdin); freopen(\"number.out\", \"w\", stdout); prework(); n = read(), k = read(); while (n--) { ll op = read(), x = read(); if (op == 1) { if (x % k == 0) vis[x / k]++, ans += query(x / k), add(x / k, 1); } else { if (x % k == 0 && vis[x / k]) vis[x / k]--, add(x / k, -1), ans -= query(x / k); } printf(\"%lld\\n\", ans); } return 0;} Day 3 (not written yet) Day 4 1 Hakugai 1.1 Problem summary A sequence $g_i$ satisfies $g_0=a,g_1=b,g_i=3*g_{i-1}-g_{i-2} \\ (i \\geq 2)$, where $a,b$ are given constants. A second sequence $f_{n,k}$ satisfies $f_{n,0}=n,f_{n,k}=f_{g_n,k-1}$. Given $a,b,n,k,p$, find $f_{n,k}$ modulo $p$. $1 \\leq n, p \\leq 10^9, 0 \\le a,b \\le p, 0 \\le k \\leq 100$ 1.2 Analysis Since $k\\le100$, we can find the period at every level by brute force and then evaluate each level with fast matrix exponentiation. The sequence in the statement is a second-order linear recurrence with constant coefficients, and writing out the first few terms shows that it is a Fibonacci sequence (for $a=0,b=1$ the terms are $0,1,3,8,21,\\cdots$, the even-indexed Fibonacci numbers). So its period is easy to obtain from the period of the Fibonacci sequence. Let $f(p)$ be a period of the Fibonacci sequence modulo $p$. If $p=p_1^{k_1} \\times \\cdots \\times p_m^{k_m}$ (where $p_i$ is the $i$-th prime factor of $p$), then $f(p)= lcm(f(p_i) \\times p_i^{k_i-1})$. For a prime $p_i$, if $p_i \\equiv \\pm1 \\pmod 5$ then $f(p_i)=p_i-1$, otherwise $f(p_i)=2\\times(p_i+1)$; the code special-cases $2$, $3$ and $5$. Now that the period is known, consider what the problem asks for. It is $g_{g_{g_{\\cdots g_{n}}}}$ (nested $k$ times), and we observe: 1. At level $1$ we need $g_i \\; mod \\; p$, whose period is $f(p)$. 2. At level $2$ we need $g_i \\; mod \\; f(p)$, whose period is $f(f(p))$. And so on, so it is enough to iterate this process $k$ times. 1.3 Code # include <bits/stdc++.h># define ll long long# define R registerconst ll MaxN = 1e6 + 10;struct matrix{ ll n, m; ll a[5][5]; matrix(ll x = 0, ll y = 0) { n = x, m = y; memset(a, 0, sizeof(a)); }};ll a, b, n, k, mod, cnt, faccnt;ll prime[MaxN], p[MaxN], fac[1000][20];ll gcd(ll a, ll b){return b ? 
gcd(b, a % b) : a;}ll lcm(ll a, ll b){return a / gcd(a, b) * b;}ll mul(ll x, ll y, ll p){return x * y - (ll) ((long double) x * y / p) * p;}ll fast_pow(ll a, ll b, ll mod){ ll ret = 1; while(b) { if(b & 1) ret = mul(ret, a, mod) % mod; a = mul(a, a, mod) % mod; b >>= 1; } return ret;}matrix mul(matrix a, matrix b, ll mod){ matrix c(a.n, b.m); for(int i = 1; i <= a.n; i++) for(int j = 1; j <= b.m; j++) for(int k = 1; k <= a.m; k++) c.a[i][j] = (c.a[i][j] + mul(a.a[i][k], b.a[k][j], mod) % mod + mod) % mod; return c;}matrix I(){ matrix c(2, 2); c.a[1][1] = 1; c.a[2][2] = 1; return c;}matrix pow(matrix a, ll b, ll mod){ matrix ret = I(); while(b) { if(b & 1) ret = mul(ret, a, mod); a = mul(a, a, mod); b >>= 1; } return ret;}matrix init1(){ matrix a(2, 2); a.a[1][1] = 3, a.a[1][2] = -1; a.a[2][1] = 1, a.a[2][2] = 0; return a;}matrix init2(){ matrix c(2, 1); c.a[1][1] = b; c.a[2][1] = a; return c;}ll getf(ll x, ll mod){ if(x == 0) return a % mod; if(x == 1) return b % mod; matrix a = init1(), b = init2(), res = mul(pow(a, x-1, mod), b, mod); return (res.a[1][1] % mod + mod) % mod;}void prework(){ ll n = 1000000; p[0] = p[1] = 1; for(int i = 2; i <= n; i++) { if(!p[i]) prime[++cnt] = i; for(int j = 1; j <= cnt && i * prime[j] <= n; j++) { p[i * prime[j]] = 1; if(i % prime[j] == 0) break; } }}void getfac(ll x){ ll tmp = x; for(int i = 1; i <= faccnt + 1; i++) fac[i][0] = fac[i][1] = 0;faccnt = 0; for(int i = 1; prime[i] <= tmp / prime[i]; i++) { if(tmp % prime[i] == 0) { ++faccnt; fac[faccnt][0] = prime[i]; while((tmp % prime[i]) == 0 && tmp != 1) fac[faccnt][1]++, tmp /= prime[i]; } } if(tmp != 1) { ++faccnt; fac[faccnt][0] = tmp; fac[faccnt][1] = 1; }}ll g(ll p){ ll num; (p % 5 == 1 || p % 5 == 4) ? 
(num = p - 1) : (num = 2 * (p + 1)); return num;}std::map<ll, ll> m;ll getloop(ll n){ if(m.find(n) != m.end()) return m[n]; getfac(n); ll ans = 1; for(int i = 1; i <= faccnt; i++) { ll res = 1; if(fac[i][0] == 2) res = 3; else if(fac[i][0] == 3) res = 8; else if(fac[i][0] == 5) res = 20; else res = g(fac[i][0]); for(int j = 1; j < fac[i][1]; j++) res *= fac[i][0]; ans = lcm(ans, res); } return m[n] = ans;} int main(){ freopen(\"hakugai.in\", \"r\", stdin); freopen(\"hakugai.out\", \"w\", stdout); ll T; prework(); scanf(\"%lld\", &T); while(T--) { scanf(\"%lld%lld%lld%lld%lld\", &a, &b, &n, &k, &mod); if(k == 1) { printf(\"%lld\\n\", getf(n, mod)); continue; } ll loop[101] = {mod}; for(int i = 1; i <= k; i++) loop[i] = getloop(loop[i - 1]); n %= loop[k]; for(int i = k - 1; ~i; i--) n = getf(n, loop[i]); ll ans = n; printf(\"%lld\\n\", ans); } return 0;}","link":"/2020/01/18/FJWC2020/"},{"title":"「LOJ 145」DFS序 2","text":"经典的DFS序入门题 题目都告诉你是什么算法了 和「LOJ 144」DFS序 1一样,只不过这次把单点查询的树状数组改成区间修改的线段树罢了 敲下模板就结束了 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154#include <bits/stdc++.h>#define ll long longconst int MaxN = 1000010;struct edge{ int to, next;};struct vertex{ int dfn, next, val;};struct node{ int l, r; ll sum, tag;};struct SegmentTree{ ll x[MaxN]; node t[MaxN << 2]; void pushup(int id) { t[id].sum = t[id << 1].sum + t[id << 1 | 1].sum; } inline void build(int id, int l, int r) { t[id].l = l, t[id].r = r; if (l == r) { t[id].sum = x[l]; return; } int mid = (l + r) >> 1; build(id << 1, l, mid); build(id << 1 | 1, mid + 1, r); pushup(id); } inline void pushdown(int id) { if (t[id].tag) { t[id << 1].tag += t[id].tag; t[id << 1 | 1].tag += t[id].tag; t[id << 1].sum += 
t[id].tag * 1ll * (t[id << 1].r - t[id << 1].l + 1); t[id << 1 | 1].sum += t[id].tag * 1ll * (t[id << 1 | 1].r - t[id << 1 | 1].l + 1); t[id].tag = 0; } } inline void modify(int id, int l, int r, int val) { if (l > t[id].r || r < t[id].l) return; if (l <= t[id].l && t[id].r <= r) { t[id].sum += val * 1ll * (t[id].r - t[id].l + 1); t[id].tag += val; return; } if (t[id].l == t[id].r) return; pushdown(id); modify(id << 1, l, r, val); modify(id << 1 | 1, l, r, val); pushup(id); } inline ll query(int id, int l, int r) { if (l > t[id].r || r < t[id].l) return 0; if (l <= t[id].l && t[id].r <= r) return t[id].sum; if (t[id].l == t[id].r) return 0; pushdown(id); return query(id << 1, l, r) + query(id << 1 | 1, l, r); }} T;edge e[MaxN];vertex a[MaxN];int head[MaxN], vis[MaxN];int n, m, r, cnt, dfscnt;inline void add_edge(int u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}inline int read(){ int x = 0, f = 1; char ch = getchar(); while (ch > '9' || ch < '0') { if (ch == '-') f = 0; ch = getchar(); } while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? 
x : (-x);}inline void dfs(int u){ a[u].dfn = vis[u] = ++dfscnt; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (vis[v]) continue; dfs(v); } a[u].next = dfscnt;}int main(){ n = read(), m = read(), r = read(); for (int i = 1; i <= n; ++i) a[i].val = read(); for (int i = 1; i < n; i++) { int u = read(), v = read(); add_edge(u, v); add_edge(v, u); } dfs(r); for(int i = 1; i <= n; i++) T.x[a[i].dfn] = a[i].val; T.build(1, 1, n); for (int i = 1; i <= m; i++) { int op = read(); if (op == 1) { int pos = read(), x = read(); T.modify(1, a[pos].dfn, a[pos].next, x); } else { int pos = read(); printf(\"%lld\\n\", T.query(1, a[pos].dfn, a[pos].next)); } } return 0;}","link":"/2019/02/19/LOJ-145/"},{"title":"CodeChef TANDEM","text":"Problem A string $s$ is called a $\\texttt{tandem}$ if and only if it can be written as three copies of the same string $A$ concatenated together. For a substring $s_{l \\cdots r}$ of $s$ that is a $\\texttt{tandem}$, it is an interesting $\\texttt{tandem}$ if and only if $s_l \\not= s_{r+1}$; otherwise it is a boring $\\texttt{tandem}$. Count the interesting and the boring $\\texttt{tandem}$s. Analysis The $O(n^3)$ approach is straightforward and omitted. $O(n^2)$ approach Enumerate the length $L \\in [1, \\frac{n}{3}]$ of $A$ and count all $\\texttt{tandem}$s of length $3L$. Consider the positions whose index is divisible by $L$, namely $s_0, s_L, s_{2L}, \\cdots$. Every $\\texttt{tandem}$ of length $3L$ covers exactly $3$ consecutive positions among them; call these three positions $(i, j, k)$. The starting position of such a substring satisfies $a \\in [i - L + 1, i]$, and the substring is a $\\texttt{tandem}$ exactly when $s_{[a, a+L-1]}=s_{[a+L, a+2L-1]}=s_{[a+2L, a+3L-1]}$. Since $a \\in [i - L + 1, i]$, we have $a \\leq i \\leq a + L - 1$, $a + L \\leq j \\leq a + 2L - 1$, $a + 2L \\leq k \\leq a + 3L - 1$, so the condition splits into the following two: $$s_{[a,i]}=s_{[a+L,j]}=s_{[a+2L,k]}, \\qquad s_{[i,a+L-1]}=s_{[j,a+2L-1]}=s_{[k,a+3L-1]}$$ In other words, the prefixes $s[0,i],\\;s[0,j],\\;s[0,k]$ share a common suffix of length $i-a+1$, and the suffixes $s[i,n-1],\\;s[j,n-1],\\;s[k,n-1]$ share a common prefix of length $a+L-i$. Let $\\texttt{LCP(i,j,k)}$ be the length of the longest common prefix of $s[i,n-1],\\;s[j,n-1],\\;s[k,n-1]$, and $\\texttt{LCS(i,j,k)}$ the length of the longest common suffix of $s[0,i],\\;s[0,j],\\;s[0,k]$; then $\\max(i-LCS(i,j,k)+1, i-L+1) \\le a \\le \\min(LCP(i,j,k)-L+i, i)$. The number of valid $a$ is $\\max(0,\\min(LCP(i,j,k)-L+i,i)-\\max(i-LCS(i,j,k)+1,i-L+1)+1)$. 
Simplifying, let $V=\\min(LCP(i,j,k),L)+\\min(LCS(i,j,k),L)-1$; the count above then equals $\\max(0, V-L+1)$. Hence: when $V < L$ there is no $\\texttt{tandem}$; when $V \\ge L$ there are $V-L+1$ $\\texttt{tandem}$s, and in that case: if $LCP \\le L$, exactly $1$ of them is an interesting $\\texttt{tandem}$; otherwise none of them is interesting. If $\\texttt{LCP}$ and $\\texttt{LCS}$ are computed naively, the time complexity is $\\mathcal{O}(n^2)$. Intended solution String hashing, suffix array $+$ segment tree, or suffix array $+$ $\\texttt{ST}$ table can be used to speed up the computation of $\\texttt{LCP}$ and $\\texttt{LCS}$. The total time complexity is $\\mathcal{O}(n \\; \\log^2 n)$ or $\\mathcal{O}(n \\; \\log n)$, depending on the implementation.","link":"/2020/02/26/CodeChef TANDEM/"},{"title":"GYM103415K Magus Night","text":"Problem summary Over all sequences of length $n$ whose elements are positive integers not exceeding $m$, with $\\texttt{lcm} \\ge p$ and $\\texttt{gcd} \\le q$, compute the sum of the products of their elements. Analysis By inclusion-exclusion, the answer is the contribution of all sequences, minus those with $\\texttt{lcm} < p$, minus those with $\\texttt{gcd} > q$, plus those with both $\\texttt{lcm} < p$ and $\\texttt{gcd} > q$. Part 1: by the multinomial expansion, the total is $H(m)=(\\sum_{i=1}^m i)^n=(\\frac{m(m+1)}{2})^n$. Part 2: enumerate the $\\texttt{lcm}$. Let $g(x)$ be the contribution of the sequences with $\\texttt{lcm}=x$. By Möbius inversion, $g(x)=\\sum_{d|x} \\mu(d) h(\\frac{x}{d})$, where $h(x)$ is the contribution of all sequences whose $\\texttt{lcm}$ divides $x$. Since $h(x)=(\\sum_{i|x}i)^n$, $g(x)$ is easy to compute. Part 3: likewise enumerate the $\\texttt{gcd}$. Let $G(x)$ be the contribution of the sequences with $\\texttt{gcd}=x$. Dividing every element by $x$ leaves a sequence with $\\texttt{gcd}=1$; the contribution of such coprime sequences with elements at most $m$ is $F(m)=\\sum_{d=1}^{m} \\mu(d)\\, d^n\\, H(\\lfloor \\frac{m}{d} \\rfloor)$ (floor-division blocking could probably be used here), so $G(x)=x^n F(\\lfloor \\frac{m}{x} \\rfloor)$, which is again easy to compute. Part 4: enumerate $\\texttt{gcd}$ and $\\texttt{lcm}$ together. Since $\\texttt{lcm} < p$ and $\\texttt{gcd} \\mid \\texttt{lcm}$, the double enumeration is cheap enough. First consider sequences with $\\texttt{gcd} = 1$ and $\\texttt{lcm}=x$: their contribution is $f(x)=\\sum_{d|x} \\mu(d)\\, d^n\\, g(\\frac{x}{d})$. For sequences with $\\texttt{gcd}=t$ and $\\texttt{lcm}=xt$, it suffices to multiply $f(x)$ by $t^n$. In summary, the total contribution is $$ans=H(m)-\\sum_{x=1}^{p-1}g(x)-\\sum_{t=q+1}^{m}G(t)+\\sum_{t=q+1}^m \\sum_{x=1}^{\\lfloor \\frac{p-1}{t} \\rfloor}t^nf(x)$$ Code #include <bits/stdc++.h>#define R register#define ll long long#define meow(cat...) 
fprintf(stderr, cat)const ll MaxN = 2e5 + 10;const ll mod = 998244353;const ll inv2 = (mod + 1) / 2;std::vector<ll> d[MaxN];ll n, m, p, q, cnt, ans, f[MaxN], g[MaxN], h[MaxN], pw[MaxN], mu[MaxN];ll vis[MaxN], s[MaxN], pr[MaxN], F[MaxN], G[MaxN], H[MaxN];ll sum(ll a, ll b) { return ((a + b) % mod + mod) % mod; }ll Add(ll &a, ll b) { return a = sum(a, b); }ll fast_pow(ll a, ll b){ ll res = 1; while(b) { if(b & 1) res = res * a % mod; a = a * a % mod, b >>= 1; } return res;}void init(){ mu[1] = 1; for(ll i = 2; i < MaxN; i++) { if(!vis[i]) pr[++cnt] = i, mu[i] = -1; for(ll j = 1; j <= cnt && i * pr[j] < MaxN; j++) { vis[i * pr[j]] = 1; if(i % pr[j] == 0) { mu[i * pr[j]] = 0; break; } mu[i * pr[j]] = -mu[i]; } } for(ll i = 1; i < MaxN; i++) s[i] = sum(s[i - 1], mu[i]), Add(s[i], mod);}inline ll read(){ ll x = 0; char ch = getchar(); while(ch > '9' || ch < '0') ch = getchar(); while(ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}signed main(){ n = read(), m = read(), p = read(), q = read(), init(); for(ll i = 1; i <= m; i++) H[i] = i * (i + 1) % mod * inv2 % mod, H[i] = fast_pow(H[i], n); for(ll i = 1; i <= m; i++) for(ll j = 1; j * j <= i; j++) if(i % j == 0) { d[i].push_back(j); if(j * j != i) d[i].push_back(i / j); } // meow(\"1 %d\\n\", clock()); for(ll i = 1; i <= m; i++) { pw[i] = fast_pow(i, n); for(ll j = 0; j < d[i].size(); j++) Add(h[i], d[i][j]); // printf(\"$ %lld %lld\\n\", i, h[i]); h[i] = fast_pow(h[i], n); } // meow(\"2 %d\\n\", clock()); for (ll i = 1; i <= m; i++) { g[i] = h[i]; for (auto j : d[i]) { if (j == i) continue; Add(g[i], mod - g[j]); } } // meow(\"3 %d\\n\", clock()); // meow(\"4 %d\\n\", clock()); // for(ll i = 1; i <= m; i++) // { // printf(\"i: %d \", i); // for(ll l = 1, r; l <= i; l = r + 1) // r = i / (i / l), Add(F[i], sum(s[r], mod - s[l - 1]) * H[i / r] % mod), // printf(\"%d %d %d | \", l, r, i / l); // puts(\"\"); // } for (int i = m; i >= 1; --i) { F[i] = H[m / i] * pw[i] % mod; for (int j 
= i + i; j <= m; j += i) Add(F[i], mod - F[j]); } // meow(\"5 %d\\n\", clock()); for(ll i = 1; i <= m; i++) G[i] = (F[i] % mod + mod) % mod; ll res1 = 0, res2 = 0, res3 = 0; for(ll x = 1; x < p; x++) Add(res1, g[x]); // meow(\"6 %d\\n\", clock()); for(ll t = q + 1; t <= m; t++) Add(res2, G[t]); // for(ll i = 1; i <= m; i++) // for(ll j = 1; i * j <= m; j++) // Add(f[i * j], g[i] * mu[j]); // for(ll t = q + 1; t < p; t++) // for(ll x = 1; x <= (p - 1) / t; x++) // Add(res3, pw[t] * f[x] % mod); for (int i = 1; i <= m; i++) g[i] = sum(g[i], g[i - 1]); for (int i = q + 1; i < p; i++) f[i] = g[(p - 1) / i] * pw[i] % mod; for (int i = p - 1; i > q; i--) { for (int j = 2 * i; j < p; j += i) f[i] = sum(f[i], mod - f[j]); res3 = sum(res3, f[i]); } // meow(\"7 %d\\n\", clock()); ans = ((H[m] - res1 - res2 + res3) % mod + mod) % mod; meow(\"%lld %lld %lld %lld %lld\\n\", H[m], res2, res1, res3, ans); printf(\"%lld\\n\", ans); return 0;}","link":"/2022/04/10/GYM103415K/"},{"title":"「LOJ 144」 DFS序1","text":"一道经典的DFS序入门题. 很显然对整个子树的修改可以通过DFS序转化为序列问题 于是只要把树转化为序列,再在序列上跑树状数组就好了 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596#include <bits/stdc++.h>#define int long long#define lowbit(x) (x & (-x))const int MaxN = 1e6 + 10;struct edge{ int next, to;};struct node{ int dfn, val, r;};node a[MaxN];edge e[MaxN << 1];int n, m, r, dfsnum, cnt;int head[MaxN], vis[MaxN], c[MaxN];inline int read(){ int x = 0, f = 1; char ch = getchar(); while (ch > '9' || ch < '0') { if (ch == '-') f = 0; ch = getchar(); } while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch - '0'), ch = getchar(); return f ? 
x : (-x);}inline void add_edge(int u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}inline void dfs(int u){ vis[u] = true, a[u].dfn = ++dfsnum; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (!vis[v]) dfs(v); } a[u].r = dfsnum;}inline void modify(int pos, int x){ while (pos <= n) { c[pos] += x; pos += lowbit(pos); }}inline int query(int pos){ int ans = 0; while (pos) { ans += c[pos]; pos -= lowbit(pos); } return ans;}signed main(){ n = read(), m = read(), r = read(); for (int i = 1; i <= n; i++) a[i].val = read(); for (int i = 1; i < n; i++) { int u = read(), v = read(); add_edge(u, v); add_edge(v, u); } dfs(r); for (int i = 1; i <= n; i++) modify(a[i].dfn, a[i].val); for (int i = 1; i <= m; i++) { int op = read(); if (op == 1) { int pos = read(), x = read(); modify(a[pos].dfn, x); } else { int pos = read(); printf(\"%lld\\n\", query(a[pos].r) - query(a[pos].dfn - 1)); } } return 0;}","link":"/2019/02/11/LOJ-144/"},{"title":"LOJ 6000「网络流 24 题」搭配飞行员","text":"很简单的网络流 对于每个正飞行员,从源点向它连一条容量为$1$的边 对于每个副飞行员,从它向汇点连一条容量为$1$的边 对于每一对可以配对的正/副飞行员,从正飞行员向副飞行员连一条容量为$1$的边 然后跑网络流模板即可 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108#include <bits/stdc++.h>#define R register#define ll long long#define cmax(a, b) ((a < b) ? b : a)#define cmin(a, b) ((a < b) ? 
a : b)#define sum(a, b, mod) ((a + b) % mod)const int MaxN = 2e4 + 10;const int MaxM = 5e5 + 10;const int inf = (1 << 30);struct edge{ int to, next, cap;};edge e[MaxM];int n, m, s = 20000, t = 20001, cnt = 1, ans;int head[MaxN], dep[MaxN], cur[MaxN], a[MaxN];inline void add(int u, int v, int c){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; e[cnt].cap = c; head[u] = cnt;}inline void add_edge(int u, int v, int c) { add(u, v, c), add(v, u, 0); }inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline int bfs(){ memset(dep, 0, sizeof(dep)); memcpy(cur, head, sizeof(head)); std::queue<int> q; dep[s] = 1; q.push(s); while (!q.empty()) { int u = q.front(); q.pop(); for (int i = head[u]; i; i = e[i].next) { int v = e[i].to, c = e[i].cap; if (dep[v] || !c) continue; dep[v] = dep[u] + 1; q.push(v); } } return dep[t];}inline int dinic(int u, int flow){ if (u == t) return flow; int rest = flow; for (int i = cur[u]; i && (flow - rest < flow); i = e[i].next) { int v = e[i].to, c = e[i].cap; if (dep[v] != dep[u] + 1 || !c) continue; int k = dinic(v, cmin(rest, c)); if (!k) dep[v] = dep[u] + 1; else { e[i].cap -= k; e[i ^ 1].cap += k; rest -= k; } } if (flow - rest < flow) dep[u] = -1; return flow - rest;}int main(){ n = read(), m = read(); int u, v; for (int i = 1; i <= m; i++) add_edge(s, i, 1); for (int i = m + 1; i <= n; i++) add_edge(i, t, 1); while (scanf(\"%d%d\", &u, &v) == 2) add_edge(u, v, 1); int now = 0; while (bfs()) while ((now = dinic(s, inf))) ans += now; printf(\"%d\\n\", ans); return 0;}","link":"/2019/05/09/LOJ-6000/"},{"title":"LOJ 6002「网络流 24 题」最小路径覆盖","text":"1.建立两个集合$x$和$y$ 2.如果有一条边$$,则从$x$集合中的$u$点连向$y$集合的$v$点,容量为$inf$ 3.从$s$向$x$中每一个点连边,从$y$中每一个点向$t$连边,容量为$1$ 
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129#include <bits/stdc++.h>#define R register#define ll long long#define cmax(a, b) ((a < b) ? b : a)#define cmin(a, b) ((a < b) ? a : b)#define sum(a, b, mod) ((a + b) % mod)const int MaxN = 2e4 + 10;const int MaxM = 5e5 + 10;const int inf = (1 << 30);struct edge{ int to, next, cap;};edge e[MaxM];int n, m, s = 20000, t = 20001, cnt = 1, ans;int head[MaxN], dep[MaxN], cur[MaxN], a[MaxN], vis[MaxN], to[MaxN];inline void add(int u, int v, int c){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; e[cnt].cap = c; head[u] = cnt;}inline void add_edge(int u, int v, int c) { add(u, v, c), add(v, u, 0); }inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline int bfs(){ memset(dep, 0, sizeof(dep)); memcpy(cur, head, sizeof(head)); std::queue<int> q; dep[s] = 1; q.push(s); while (!q.empty()) { int u = q.front(); q.pop(); for (int i = head[u]; i; i = e[i].next) { int v = e[i].to, c = e[i].cap; if (dep[v] || !c) continue; dep[v] = dep[u] + 1; q.push(v); } } return dep[t];}inline int dinic(int u, int flow){ if (u == t) return flow; int rest = flow; for (int i = cur[u]; i && (flow - rest < flow); i = e[i].next) { int v = e[i].to, c = e[i].cap; if (dep[v] != dep[u] + 1 || !c) continue; int k = dinic(v, cmin(rest, c)); if (!k) dep[v] = dep[u] + 1; else { e[i].cap -= k; e[i ^ 1].cap += k; rest -= k; if (e[i].to > n) vis[e[i].to - n] = 1; to[u] = e[i].to; } } if (flow - rest < flow) dep[u] = -1; return flow - rest;}inline void solve(){ int now = 0; while (bfs()) while ((now = dinic(s, inf))) ans += now;}int main(){ n = read(), m = read(); for (int i = 1; i <= m; i++) { int u = 
read(), v = read(); add_edge(u, v + n, inf); } for (int i = 1; i <= n; i++) add_edge(s, i, 1), add_edge(i + n, t, 1); solve(); for (int i = 1; i <= n; i++) { if (vis[i]) continue; printf(\"%d \", i); int t = i; while (to[t]) { printf(\"%d \", to[t] - n); t = to[t] - n; } puts(\"\"); } printf(\"%d\\n\", n - ans); return 0;}","link":"/2019/05/12/LOJ-6002/"},{"title":"Codeforces Round 550 (Div.3) 题解","text":"Codeforces Round #550 (Div.3) 题解 A. Diverse Strings & B. Parity Alternated Deletions太水了,略 C. Two Shuffled SequencesDescription你有一个长为$n$的数列,你要把它分成两个数列,满足一个数列单调递增,另一个数列单调递减 求任意一种方案 Solution根据抽屉原理,如果有$\\geq3$个相同的数字那么肯定不行 否则对于出现两次的数,把它分别放在两个数列里 出现一次的数随便放在哪个数列里都行 然后就做完了 Code1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556#include <bits/stdc++.h>#define R register#define ll long long#define cmax(a, b) ((a < b) ? b : a)#define cmin(a, b) ((a < b) ? a : b)#define sum(a, b, mod) ((a + b) % mod)#define openfile(x) freopen(#x \".in\", \"r\", stdin), freopen(#x \".out\", \"w\", stdout)const int MaxN = 500010;int n, in, de;int a[MaxN], vis[MaxN], cnt[MaxN], inc[MaxN], dec[MaxN];int cmp(int a, int b){ return a > b;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}int main(){ n = read(); for (int i = 1; i <= n; ++i) { a[i] = read(), ++cnt[a[i]]; if (cnt[a[i]] >= 3) return 0 * printf(\"NO\"); } for (int i = 1; i <= n; i++) { if (cnt[a[i]] == 1) inc[++in] = a[i]; else if (cnt[a[i]] == 2 && vis[a[i]] == 1) dec[++de] = a[i]; else inc[++in] = a[i], ++vis[a[i]]; } std::sort(inc + 1, inc + in + 1); std::sort(dec + 1, dec + de + 1, cmp); printf(\"YES\\n\"); printf(\"%d\\n\", in); for (int i = 1; i <= in; i++) printf(\"%d \", inc[i]); puts(\"\"); printf(\"%d\\n\", de); for (int i = 1; i <= de; i++) printf(\"%d \", dec[i]); puts(\"\"); return 0;} D. 
Equalize Them AllDescription给定一个数列$a_i$,你有两种操作 操作$1$,把$a_i$赋值为$a_i+|a_i−a_j|$ 操作$2$,把$a_i$赋值为$a_i-|a_i−a_j|$ 操作均需满足$|i-j|=1$ 求最小次数及方案 Solution贪心一下,你就知道 首先肯定是把所有数字全部变成出现次数最大的那个数时最优 所以你记录一下出现次数最大的那个数每次出现的位置,然后模拟一下 Finished Code123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354#include <bits/stdc++.h>#define R register#define ll long long#define cmax(a, b) ((a < b) ? b : a)#define cmin(a, b) ((a < b) ? a : b)#define sum(a, b, mod) ((a + b) % mod)#define openfile(x) freopen(#x \".in\", \"r\", stdin), freopen(#x \".out\", \"w\", stdout)const int MaxN = 2e5 + 10;int n;int a[MaxN], cnt[MaxN];std::vector<int> vec;int main(){ scanf(\"%d\", &n); for (int i = 1; i <= n; i++) scanf(\"%d\", &a[i]), ++cnt[a[i]]; int max = 0, num = 0; for (int i = 1; i <= n; i++) { if (max < cnt[a[i]]) max = cnt[a[i]], num = a[i]; } printf(\"%d\\n\", n - max); if (max == n) return 0; vec.push_back(0); for (int i = 1; i <= n; i++) { if (a[i] == num) vec.push_back(i); } for (int i = 1; i < vec.size(); i++) { for (int j = vec[i] - 1; j > vec[i - 1]; j--) { if (a[j] > num) printf(\"%d %d %d\\n\", 2, j, j + 1); else printf(\"%d %d %d\\n\", 1, j, j + 1); } } if (vec[vec.size() - 1] < n) { for (int i = vec[vec.size() - 1] + 1; i <= n; i++) { if (a[i] > num) printf(\"%d %d %d\\n\", 2, i, i - 1); else printf(\"%d %d %d\\n\", 1, i, i - 1); } } return 0;} E. Median StringDescription有两个长度为$k$的字符串 你要求它们的”中间字符串”(即两个字符串的平均值) 数据保证有解 Solution首先把两个字符串化成两个数字数组$a_i$,$b_i$$a<b$ 然后按类似高精度的方式将两个串相减,再$÷2$,得到另一个串$c_i$ 然后让$a_i$加上$c_i$,Finish (注意进位! Code1234567891011121314151617181920212223242526272829303132333435363738394041424344#include <bits/stdc++.h>#define R register#define ll long long#define cmax(a, b) ((a < b) ? b : a)#define cmin(a, b) ((a < b) ? 
a : b)#define sum(a, b, mod) ((a + b) % mod)#define openfile(x) freopen(#x \".in\", \"r\", stdin), freopen(#x \".out\", \"w\", stdout)const int MaxN = 500010;std::string s, t;int k, nums[MaxN], numt[MaxN], ans[MaxN], add[MaxN];int main(){ scanf(\"%d\", &k); std::cin >> s >> t; if (s == t) { std::cout << s; return 0; } for (int i = 1; i <= k; i++) nums[i] = s[i - 1] - 'a' + 1; for (int i = 1; i <= k; i++) numt[i] = t[i - 1] - 'a' + 1; for (int i = k; i >= 1; i--) { while (nums[i] > numt[i]) numt[i] += 26, numt[i - 1]--; if ((numt[i] - nums[i]) % 2) add[i + 1] += 13; add[i] = (numt[i] - nums[i]) / 2; } for (int i = k; i >= 1; i--) { ans[i] += nums[i] + add[i]; while (ans[i] > 26) ans[i - 1]++, ans[i] -= 26; while (ans[i] == 0) ans[i - 1]++, ans[i] += 26; } for (int i = 1; i <= k; i++) printf(\"%c\", ans[i] + 'a' - 1); return 0;} F. Graph Without Long Directed PathsDescription你有一个无向图,没有重边和自环 你的任务是把这个无向图转成有向图,满足这个有向图里找不到长度$\\geq2$的边 Solution将这个图黑白染色 可以发现如果一条边连接的两个点如果都是同一个颜色,那么就不行 否则就从白向黑连边(黑向白也行) Finished. Code12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970#include <bits/stdc++.h>#define R register#define ll long long#define cmax(a, b) ((a < b) ? b : a)#define cmin(a, b) ((a < b) ? a : b)#define sum(a, b, mod) ((a + b) % mod)#define openfile(x) freopen(#x \".in\", \"r\", stdin), freopen(#x \".out\", \"w\", stdout)const int MaxN = 500010;struct edge{ int to, next;};edge e[MaxN];int n, m, cnt;int head[MaxN], col[MaxN], u[MaxN], v[MaxN];inline void add_edge(int u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline void dfs(int u, int c){ col[u] = c; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (col[v]) continue; dfs(v, (c == 1) ? 
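2 : 1); }}int main(){ n = read(), m = read(); for (int i = 1; i <= m; i++) { u[i] = read(), v[i] = read(); add_edge(u[i], v[i]); add_edge(v[i], u[i]); } for (int i = 1; i <= n; i++) if (!col[i]) dfs(i, 1); for (int i = 1; i <= m; i++) { if (col[u[i]] == col[v[i]]) return 0 * printf(\"NO\"); } printf(\"YES\\n\"); for (int i = 1; i <= m; i++) { if (col[u[i]] == 1) printf(\"1\"); else printf(\"0\"); } return 0;} Afterword: this is the first time zcy has solved $6$ problems in a CF contest (^-^)V That definitely calls for a celebration","link":"/2019/04/02/Codeforces-Round-550/"},{"title":"工程热力学","text":"These are review notes for the Tsinghua University course “Engineering Thermodynamics” (工程热力学), continuously updated. Chapter 1: Basic Concepts. Thermodynamic systems. Thermodynamic system: the matter within a region chosen by the analyst. Surroundings: all matter outside the system. Boundary: the interface between the system and its surroundings. Every interaction between the system and its surroundings takes place across the boundary. Classification by what crosses the boundary (yes / no): mass transfer: open system / closed system; heat transfer: non-adiabatic system / adiabatic system; work transfer: system that exchanges work / system that exchanges no work; any of heat, work, or mass: non-isolated system / isolated system. Simple compressible system: exchanges only heat and quasi-static volume-change work. State and state properties. State: the macroscopic condition of a thermodynamic system at a given instant. State property: a macroscopic physical quantity that describes the state of the system. Defining feature of state properties: they are single-valued functions of the state. Intensive properties: independent of the amount of substance, e.g. pressure $p$ and temperature $T$. Extensive properties: dependent on the amount of substance and additive, e.g. mass $m$, volume $V$, internal energy $U$, enthalpy $H$, entropy $S$. Specific properties: behave like intensive properties, e.g. specific volume $v = \\frac{V}{m}$. Basic state properties: pressure $p$, temperature $T$, specific volume $v$. Common pressure units: $$1 \\, \\text{kPa} = 10^3 \\, \\text{Pa}, \\quad 1 \\, \\text{bar} = 10^5 \\, \\text{Pa}, \\quad 1 \\, \\text{MPa} = 10^6 \\, \\text{Pa}$$ $$1 \\, \\text{atm} = 760 \\, \\text{mmHg} = 1.013 \\times 10^5 \\, \\text{Pa}$$ $$1 \\, \\text{at} = 1 \\, \\text{kgf/cm}^2 = 9.8067 \\times 10^4 \\, \\text{Pa}$$ Relative pressure: the absolute pressure measured relative to the ambient pressure. Note: only the absolute pressure $p$ is a state property. Absolute and relative pressure: when $p > p_b$, gauge pressure $p_g$, with $p = p_b + p_g$; when $p < p_b$, vacuum (vacuum pressure) $p_v$, with $p = p_b - p_v$ (here $p$ is the absolute pressure and $p_b$ the ambient pressure). Temperature. General notions of temperature $T$: traditional: a measure of hotness or coldness (depends on sensation, the heat-conducting medium, and so on); microscopic: a measure of the average kinetic energy of the molecules. Zeroth law of thermodynamics: if two systems are each in thermal equilibrium with a third system, they are in thermal equilibrium with each other. Systems in the same state of thermal equilibrium share a common macroscopic characteristic; the physical quantity describing it is the temperature. Specific volume. It indicates how densely the working fluid is packed: $$v = \\frac{V}{m} \\text{ }\\left[\\text{m}^3/\\text{kg}\\right]$$ Physics more often uses the density $\\rho$: $$v = \\frac{1}{\\rho}$$ Equilibrium state. The essence of equilibrium: no unbalanced potential differences exist. Quasi-static and reversible processes. Quasi-static process: the system stays arbitrarily close to an equilibrium state at every instant. Engineering condition for a quasi-static process: time needed to disturb equilibrium (time scale of the external action) $\\gg$ time needed to restore equilibrium spontaneously (relaxation time). If the system can settle into a new equilibrium fast enough $\\rightarrow$ quasi-static process. The state at any instant during the change is well defined, i.e. it can be described by state properties. Volume-change work in a quasi-static process. Work done on the surroundings when $m$ kg of working fluid changes volume: $$\\delta W = p \\times A \\times \\mathrm{d} l = p \\times \\mathrm{d} V \\\\ W = \\int_1^2 p \\, \\mathrm d V$$ Work done on the surroundings by $1$ kg of working fluid: 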
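$$\\delta w = p \\, \\mathrm d v \\\\ w = \\int_1^2 p \\, \\mathrm d v$$ The amount of work depends on the path, so work is a process quantity. Reversible process. General definition: if, after a system has undergone a process, both the system and the surroundings can be restored to their initial states without leaving any trace, the process is called reversible. Quasi-static process + no dissipative effects = reversible process. Work. Mechanical definition of work: force $\\times$ displacement in the direction of the force. Thermodynamic definition of work: work is one mode of interaction between a system and its surroundings; it is energy transferred through ordered motion under the action of a force. General expression for work: $$\\delta W = F \\, dx, \\qquad W = \\int F \\, dx$$ Common work in thermodynamics: quasi-static volume-change work, i.e. expansion work (+) or compression work (-): $$\\delta W = p \\, dV, \\qquad W = \\int p \\, dV$$ Heat and entropy. Definition: heat is the other mode of interaction between a thermodynamic system and its surroundings; it is energy transferred through disordered microscopic motion, driven by a temperature difference. Heat compared with volume-change work (volume-change work / heat): nature: process quantity / process quantity; driving force: infinitesimal $p$ potential difference / infinitesimal $T$ potential difference; indicator: $\\mathrm dV,\\mathrm dv$ / $\\mathrm dS, \\mathrm ds$; expressions: $\\delta w = p\\mathrm dv$ / $\\delta q = T\\mathrm ds$, and $w = \\int p\\mathrm dv$ / $q = \\int T\\mathrm ds$; validity: quasi-static or reversible / reversible. Definition of entropy: $$\\mathrm dS = \\frac{\\delta Q_{\\text{rev}}}{T}$$ an extensive quantity, $\\text{kJ/K}$; $$\\mathrm ds = \\frac{\\delta q_{\\text{rev}}}{T}$$ a specific property, $\\text{kJ/(kg} \\cdot \\text{K)}$. $\\mathrm{d} s$: the entropy change obtained by dividing the reversible heat $\\delta q_{\\text{rev}}$ by the temperature $T$ at which the heat is transferred. Remarks on entropy: entropy is a state property. Sign convention: positive when the system absorbs heat, $Q > 0$, $dS > 0$; negative when the system rejects heat, $Q < 0$, $dS < 0$. Physical meaning of entropy: the entropy change reflects the magnitude and direction of heat transfer in a reversible process. Uses: determining the direction of heat transfer and computing the heat transferred in a reversible process. Thermodynamic cycle. Definition: the working fluid passes through a series of changes and returns to its initial state; this series of processes is called a thermodynamic cycle. Performance measures of thermodynamic cycles. Forward cycle: the net effect is heat absorbed and work delivered to the surroundings. Power cycle: thermal efficiency $$\\eta = \\frac{\\text{benefit}}{\\text{cost}} = \\frac{\\text{net work}}{\\text{heat absorbed}} = \\frac{W}{Q_1}$$ Reverse cycle: the net effect is work input and heat rejected. Refrigeration cycle: refrigeration coefficient $$\\varepsilon = \\frac{\\text{benefit}}{\\text{cost}} = \\frac{\\text{heat absorbed}}{\\text{work consumed}} = \\frac{Q_2}{W}$$ Heating (heat-pump) cycle: heating coefficient $$\\varepsilon = \\frac{\\text{benefit}}{\\text{cost}} = \\frac{\\text{heat rejected}}{\\text{work consumed}} = \\frac{Q_1}{W}$$ Chapter 2: The First Law of Thermodynamics. Essence of the first law: conservation and conversion of energy. First-law expression for a closed-system cycle: $$\\oint \\delta Q = \\oint \\delta W$$ To obtain work, heat or some other form of energy must be expended. The first law can also be stated as: a perpetual motion machine of the first kind cannot be built. A corollary of the first law: internal energy. Internal energy and the first-law expression for a closed system. Define $\\mathrm d U = \\delta Q - \\delta W$; the internal energy $U$ is a state property. First-law expression for a closed system: $$\\delta Q = \\mathrm d U + \\delta W \\\\ Q = \\Delta U + W$$ Physical meaning of the internal energy $U$: $$\\mathrm d U = \\delta Q - \\delta W$$ $\\mathrm d U$ is the difference between the infinitesimal heat and the infinitesimal work exchanged across the boundary in an elementary process, i.e. the change in the energy inside the system. $U$ is the energy stored inside the system: the internal stored energy (internal energy, thermodynamic energy). Remarks on internal energy: internal energy is a state property; $U$: extensive property $\\text{[kJ]}$; $u$: specific property $\\text{[kJ/kg]}$; internal energy always appears as a change, so its zero point can be chosen arbitrarily. Total energy of a system. External stored energy: macroscopic kinetic energy $ E_k = mc^2 / 2$, gravitational potential energy $ E_p = mgz$. Total energy of the system (= internal energy + kinetic energy + potential energy): E = U + E_k + E_p \\\\ e = u + e_k 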
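+ e_p Verbal statement of the first law. First law: the law of conservation and conversion of energy. $$\\text{energy entering the system} - \\text{energy leaving the system} = \\text{change in the total energy of the system}$$ General energy equation for a closed system: $$\\delta Q = \\mathrm d U + \\delta W, \\qquad Q = \\Delta U + W$$ Per unit mass of working fluid: $$\\delta q = \\mathrm d u + \\delta w, \\qquad q = \\Delta u + w$$ Valid for: 1) any working fluid; 2) any process. Note: state quantities take the differential symbol $\\mathrm d$; process quantities take the small-amount symbol $\\delta$. Energy equations for quasi-static and reversible processes. Quasi-static process of a simple compressible system: $$\\begin{aligned} & \\delta w = p \\mathrm d v \\newline & \\delta q = \\mathrm d u + p \\mathrm d v \\text{ (one analytical form of the first law)} \\newline &q = \\Delta u + \\int p \\mathrm d v \\end{aligned}$$ Reversible process of a simple compressible system: $$\\begin{aligned} & \\delta q = T \\mathrm d s \\newline & T \\mathrm d s = \\mathrm d u + p \\mathrm d v \\text{ (the fundamental thermodynamic relation)} \\newline \\int &T \\mathrm d s = \\Delta u + \\int p \\mathrm d v \\end{aligned}$$ Open-system energy equation and enthalpy. Expression for flow work. Flow work (also called pushing work): the work transferred as working fluid enters or leaves an open system: $$W_{\\text{flow}} = p A \\mathrm d l = pV \\\\ w_{\\text{flow}} = pv$$ Note: this is not $p \\mathrm d v$; $v$ does not change. Remarks on flow work: it is tied to macroscopic flow, and when the flow stops the flow work vanishes; while it acts, the working fluid only changes position, not state; $w_{\\text{flow}}$ depends on the state of the fluid and is a state quantity; it does not come from a change in the fluid's own energy (kinetic or potential) but is supplied by the surroundings (a pump or fan), and it is energy carried by the flowing fluid. It can be understood as a mechanical work exchanged between the system and the surroundings because working fluid enters and leaves, appearing as energy carried or transferred by the fluid as it crosses the boundary. Derivation of the open-system energy equation. First law: energy entering the system - energy leaving the system = change in the total energy of the system: $$\\delta Q + \\delta m_{in} \\left( u + pv + c^2 / 2 + gz \\right)_{in} - \\delta m_{out} \\left( u + pv + c^2 / 2 + gz \\right)_{out} - \\delta W_{net} = \\mathrm d E_{cv}$$ where $W_{net}$ is the net work and $\\mathrm d E_{cv}$ is the change in the total energy of the control volume ($\\text{control volume}$). General open-system energy equation and the introduction of enthalpy. Definition: enthalpy $h = u + pv$. $$\\begin{aligned} \\dot{Q} &= \\frac{dE_{cv}}{d\\tau} + \\dot{W}_{\\text{net}} \\\\ &+ \\sum \\left( h + \\frac{c^2}{2} + gz \\right)_{\\text{out}} \\dot{m}_{\\text{out}} \\\\ &- \\sum \\left( h + \\frac{c^2}{2} + gz \\right)_{\\text{in}} \\dot{m}_{\\text{in}} \\end{aligned}$$ This is the general energy equation for an open system. Remarks on enthalpy. Definition: $$h = u + pv \\quad \\text{[kJ/kg]} \\\\ H = U + pV \\quad \\text{[kJ]}$$ Enthalpy is a state quantity (state property). $H$ is an extensive property: $$H = U + pV = m \\left( u + pv \\right) = mh$$ $h$ is a specific property. For a flowing working fluid, enthalpy represents energy (internal energy + flow work); for a stationary working fluid, enthalpy does not represent energy. Physical meaning: the energy carried by the working fluid as it flows through an open system, determined by the thermodynamic state. Steady-flow energy equation and technical work. Conditions for steady flow: \\dot{m}_{\\text{out}} = \\dot{m}_{\\text{in}} = \\dot{m}, \\quad \\dot{Q} = \\text{Const}, \\quad \\dot{W}_{\\text{net}} = \\text{Const} = \\dot{W}_s 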
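\\text{ (shaft work)}, \\quad \\frac{dE_{cv}}{d\\tau} = 0 Derivation of the steady-flow energy equation: $$q = \\left( h + \\frac{c^2}{2} + gz \\right)_{\\text{out}} - \\left( h + \\frac{c^2}{2} + gz \\right)_{\\text{in}} + w_s$$ $$q = \\Delta h + \\frac{1}{2} \\Delta c^2 + g \\Delta z + w_s$$ This is the steady-flow energy equation; it holds for the steady flow of any working fluid. Technical work: $$Q = \\Delta H + \\frac{1}{2} m\\Delta c^2 + mg \\Delta z + W_s = \\Delta H + W_t \\\\ q = \\Delta h + \\frac{1}{2} \\Delta c^2 + g \\Delta z + w_s = \\Delta h + w_t$$ $W_t$ (kinetic energy, potential energy, shaft work) $\\rightarrow$ mechanical energy that can be used directly in engineering. Relations among the kinds of work in steady flow. The closed-system form $ q = \\Delta u + w $ (volume-change work) is equivalent to the steady-flow open-system form $ q = \\Delta h + w_t $ (technical work): $$w = \\Delta (pv) + w_t \\\\ \\Delta h = \\Delta u + \\Delta(pv) \\\\ \\Downarrow \\\\ q = \\Delta h + w_t = \\Delta u + \\Delta (pv) + w_t = \\Delta u + w$$ Technical work in a quasi-static process of a simple compressible system: $$\\begin{cases} w = \\Delta(pv) + w_t \\Rightarrow \\delta w &= \\mathrm d (pv) + \\delta w_t \\\\ \\text{quasi-static: } \\delta w &= p \\mathrm d v \\end{cases}$$ $$\\delta w_t = p \\mathrm d v - \\mathrm d (pv) = p \\mathrm d v - (p \\mathrm d v + v \\mathrm d p) = -v \\mathrm d p$$ $$\\delta w_t = -v \\mathrm d p, \\qquad w_t = -\\int v \\mathrm d p$$ $$\\text{quasi-static: } \\begin{cases} \\delta q = du + p \\, dv \\\\ \\delta q = dh - v \\, dp \\end{cases}$$ Technical work of a quasi-static process on the indicator ($p$-$v$) diagram: $$w_t = w - \\Delta(pv)$$ Technical work is the algebraic sum of the expansion work and the difference in flow work. When $\\mathrm d p < 0$ the pressure drops and $w_t > 0$: work is done on the surroundings. Applications of the steady-flow energy equation. Power machinery. Energy equation: 1) the device is small; 2) the velocity difference is small: $q = \\Delta h + w_s$; 3) short residence time and insulation: $q \\approx 0$. Hence $w_s = -\\Delta h = h_1 - h_2 > 0$: the shaft work output comes from the enthalpy drop. Compression machinery. Energy equation: 1) the device is small; 2) the velocity difference is small: $q = \\Delta h + w_s$; 3) insulating layer: $q \\approx 0$. Hence $w_s = -\\Delta h = h_1 - h_2 < 0$: the shaft work input is converted into an enthalpy rise. Heat-exchange equipment. Energy equation: $q = \\Delta h + w_s$. There are no work-transferring components, so $w_s = 0$ $\\Rightarrow q = \\Delta h = h_2 - h_1$. Enthalpy changes: the hot fluid rejects heat: $q = \\Delta h = h_2 - h_1 < 0$; the cold fluid absorbs heat: $q' = \\Delta h = h_1' - h_2' > 0$. Adiabatic throttling. Energy equation: $q = \\Delta h + w_s$; adiabatic: $q = 0$; no work-transferring components: $w_s = 0$. $$\\Delta h = 0, \\quad h_1 = h_2$$ In an adiabatic throttling process with negligible changes in kinetic and potential energy, the enthalpy of the working fluid is the same before and after throttling. Note, however, that between the upstream and downstream sections, and especially near the constriction, the velocity changes sharply and the enthalpy is not the same everywhere, so adiabatic throttling must not be regarded as a constant-enthalpy process. Chapter 3: Properties and Processes of Ideal Gases. Ideal gas model: 1. No forces act between the molecules (only elastic collisions). 2. 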
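The molecules themselves occupy no volume (point masses). A real gas can be treated as an ideal gas when its $p$ is very low, $v$ is very large, and $T$ is not too low, i.e. when it is in a dilute state far from the liquid phase. Triatomic molecules $(H_2O, CO_2)$ generally cannot be treated as ideal gases. Special cases are allowed, e.g. moist air in air conditioning, or the $CO_2$ in high-temperature flue gas. In general, diatomic gases at $T \\geq$ room temperature and $p < 7 \\text{ MPa}$ $\\Rightarrow$ ideal gas (e.g. $O_2, N_2, \\text{Air}, CO, H_2$). Ideal-gas equation of state. Definition of an ideal gas: a gas that obeys the Clapeyron equation of state. Four forms of the Clapeyron equation of state: $$\\begin{cases} 1 \\, \\text{kmol}&: \\quad p V_m = R_m T \\\\ n \\, \\text{kmol}&: \\quad p V = n R_m T \\\\ 1 \\, \\text{kg}&: \\quad p v = R T \\\\ m \\, \\text{kg}&: \\quad p V = m R T \\end{cases}$$ Note: 1) $R_m$ versus $R$; 2) the molar volume $V_m$; 3) use consistent units. Difference between $R_m$ and $R$. $R_m$: the universal gas constant, $$R_m = 8.3143 \\, \\text{kJ/(kmol} \\cdot \\text{K)}$$ independent of the kind of gas. $R$: the gas constant, $$R = \\frac{R_m}{M} \\, \\text{kJ/(kg} \\cdot \\text{K)}$$ dependent on the kind of gas. $M$: the molar mass. For example: $$R_{\\text{air}} = \\frac{R_m}{M_{\\text{air}}} = \\frac{8.3143}{28.97} = 0.287 \\, \\text{kJ/(kg} \\cdot \\text{K)}$$ Specific heat capacity. Definition: $$C = \\frac {\\delta q } {\\mathrm d T}$$ the heat required to raise the temperature of a unit amount of substance by $1 \\, \\text{K}$ or $1 ^\\circ\\text{C}$. Kinds of specific heat capacity: $c$: mass specific heat capacity, $\\text{kJ/(kg} \\cdot \\text{K)}$ or $\\text{kJ/(kg} \\cdot ^\\circ\\text{C)}$; $C_m$: molar specific heat capacity, $\\text{kJ/(kmol} \\cdot \\text{K)}$ or $\\text{kJ/(kmol} \\cdot ^\\circ\\text{C)}$; $C'$: volumetric specific heat capacity, $\\text{kJ/(m}^3 \\cdot \\text{K)}$ or $\\text{kJ/(m}^3 \\cdot ^\\circ\\text{C)}$. $$C_m = M \\cdot c = 22.414 \\cdot C'$$ Specific heat capacity is a process quantity. The specific heat capacities of certain particular processes are the ones commonly used: $\\begin{cases}\\text{constant-volume specific heat capacity} \\\\ \\text{constant-pressure specific heat capacity}\\end{cases}$ Constant-volume specific heat capacity $c_v$. For a quasi-static process $\\delta q = du + p dv$; $u$ is a state quantity, so let $u = f(T, v)$: $$du = \\left( \\frac{\\partial u}{\\partial T} \\right)_v dT + \\left( \\frac{\\partial u}{\\partial v} \\right)_T dv$$ Hence: $$\\delta q = \\left( \\frac{\\partial u}{\\partial T} \\right)_v dT + \\left[ p + \\left( \\frac{\\partial u}{\\partial v} \\right)_T \\right] dv$$ At constant volume: $$\\delta q = \\left( \\frac{\\partial u}{\\partial T} \\right)_v dT \\quad \\Rightarrow \\quad c_v = \\left( \\frac{\\delta q}{dT} \\right)_v = \\left( \\frac{\\partial u}{\\partial T} \\right)_v$$ Constant-pressure specific heat capacity $c_p$. For a quasi-static process $\\delta q = dh - v dp$; $h$ is a state quantity, so let $h = f(T, p)$: dh = \\left( \\frac{\\partial h}{\\partial T} \\right)_p dT + \\left( \\frac{\\partial h}{\\partial p} 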
\\right)_T dp因此: \\delta q = \\left( \\frac{\\partial h}{\\partial T} \\right)_p dT + \\left[ \\left( \\frac{\\partial h}{\\partial p} \\right)_T - v \\right] dp定压条件下: \\delta q = \\left( \\frac{\\partial h}{\\partial T} \\right)_p dT \\quad \\Rightarrow \\quad c_p = \\left( \\frac{\\delta q}{dT} \\right)_p = \\left( \\frac{\\partial h}{\\partial T} \\right)_p$c_v$ 和 $c_p$ 的说明 c_v = \\left( \\frac{\\partial u}{\\partial T} \\right)_v, \\quad c_p = \\left( \\frac{\\partial h}{\\partial T} \\right)_p $c_v$ 和 $c_p$ 是状态量$c_v$ 物理意义:$\\mathbf{v}$ 不变时 $1 \\, kg$ 工质温升 $1K$ 内能的增加量$c_p$ 物理意义:$\\mathbf{p}$ 不变时 $1 \\, kg$ 工质温升 $1K$ 焓的增加量 前面的推导没有用到理想气体性质,适用于任何气体。 $h$、$u$ 的计算要用 $c_v$ 和 $c_p$ 理想气体的内能、焓、熵和比热容理想气体的内能$u = f(T)$,理想气体 $u$ 只与 $T$ 有关 实际气体: u = f(T, v), c_v = \\left( \\frac{\\partial u}{\\partial T} \\right)_v du = \\left( \\frac{\\partial u}{\\partial T} \\right)_v \\mathrm dT + \\left( \\frac{\\partial u}{\\partial v} \\right)_T \\mathrm dv = c_v \\mathrm d T + \\left( \\frac{\\partial u}{\\partial v} \\right)_T \\mathrm dv理想气体: u = f(T), \\left( \\frac{\\partial u}{\\partial v} \\right)_T = 0 du = c_v \\mathrm d T对理想气体,任何过程都成立 理想气体的焓 h = u + pv = u + RT = f(T)理想气体 $h$ 只与 $T$ 有关。 实际气体: \\mathrm{d}h = \\left( \\frac{\\partial h}{\\partial T} \\right)_p \\mathrm{d}T + \\left( \\frac{\\partial h}{\\partial p} \\right)_T \\mathrm{d}p \\mathrm{d}h = c_p \\mathrm{d}T + \\left( \\frac{\\partial h}{\\partial p} \\right)_T \\mathrm{d}p理想气体: \\left( \\frac{\\partial h}{\\partial p} \\right)_T = 0因此: \\mathrm{d}h = c_p \\mathrm{d}T对理想气体,任何过程都成立 理想气体的熵熵的定义: $\\mathrm{d}s = \\frac{\\delta q_{\\text{rev}}}{T}$ 可逆过程 T \\mathrm{d}s = \\delta q_{\\text{rev}} = \\mathrm{d}u + p \\mathrm{d}v = \\mathrm{d}h - v \\mathrm{d}p因此: \\mathrm{d}s = \\frac{\\mathrm{d}u}{T} + \\frac{p \\mathrm{d}v}{T} = \\frac{\\mathrm{d}h}{T} - \\frac{v \\mathrm{d}p}{T}理想气体满足: \\mathrm{d}u = c_v \\mathrm{d}T \\\\ \\mathrm{d}h = c_p \\mathrm{d}T \\\\ pv = RT因此, \\mathrm{d}s = \\frac{c_v \\mathrm{d}T}{T} + R 
\\frac{\\mathrm{d}v}{v} = \\frac{c_p \\mathrm{d}T}{T} - R \\frac{\\mathrm{d}p}{p} = c_p \\frac{\\mathrm{d}v}{v} + c_v \\frac{\\mathrm{d}p}{p}理想气体的 $c_v$ 和 $c_p$ 的关系一般工质: c_v = \\left( \\frac{\\partial u}{\\partial T} \\right)_v, \\quad c_p = \\left( \\frac{\\partial h}{\\partial T} \\right)_p理想气体: c_v = \\frac{\\mathrm{d}u}{\\mathrm{d}T}, \\quad c_p = \\frac{\\mathrm{d}h}{\\mathrm{d}T}因此: c_p = \\frac{\\mathrm{d}h}{\\mathrm{d}T} = \\frac{\\mathrm{d}u}{\\mathrm{d}T} + \\frac{\\mathrm{d}(pv)}{\\mathrm{d}T} = c_v + R迈耶公式: c_p - c_v = R令: k = \\frac{c_p}{c_v} \\quad \\text{(比热比)}则: c_v = \\frac{R}{k - 1}, \\quad c_p = \\frac{kR}{k - 1}由于理想气体的内能,焓都只是温度的单值函数,理想气体的定压、定容比热容也只是温度的单值函数,甚至可能是定值。 通常只会在温度不太高、温度范围比较窄,且计算精度要求不高的情况下,或者为了分析问题方便,才将摩尔热容近似地看作定值。 实际上分子内部还存在振动,而且分子转动与振动的能量与温度并不是线性关系,因此理想气体热容并非定值,而是温度的单值函数。 理想气体比热容、内能、焓和熵的计算 \\mathrm{d}u = c_v \\mathrm{d}T \\quad \\mathrm{d}h = c_p \\mathrm{d}T \\mathrm{d}s = c_p \\frac{\\mathrm{d}v}{v} + c_v \\frac{\\mathrm{d}p}{p} = \\frac{c_v \\mathrm{d}T}{T} + R \\frac{\\mathrm{d}v}{v} = \\frac{c_p \\mathrm{d}T}{T} - R \\frac{\\mathrm{d}p}{p}$u$、$h$、$s$ 的计算要用 $c_v$ 和 $c_p$ 理想气体比热容的计算方法:按定比热计算: 由分子运动学: U_m = \\frac{i}{2} R_m T \\quad \\text{(运动自由度)} C_{v,m} = \\frac{dU_m}{dT} = \\frac{i}{2} R_m, \\quad C_{p,m} = \\frac{dH_m}{dT} = \\frac{d(U_m + R_m T)}{dT} = \\frac{i + 2}{2} R_m 单原子 双原子 多原子 $C_{v,m}$ (kJ/(kmol·K)) $\\dfrac{3}{2} R_m$ $\\dfrac{5}{2} R_m$ $\\dfrac{7}{2} R_m$ $C_{p,m}$ (kJ/(kmol·K)) $\\dfrac{5}{2} R_m$ $\\dfrac{7}{2} R_m$ $\\dfrac{9}{2} R_m$ $k$ (比热比) $1.67$ $1.4 $ $1.29 $ 按真实比热计算: 理想气体 u = f(T), \\quad h = \\psi(T) c_v = \\frac{\\mathrm{d}u}{\\mathrm{d}T} = f'(T), \\quad c_p = \\frac{\\mathrm{d}h}{\\mathrm{d}T} = \\psi'(T)根据实验结果整理: C_{v,m} = a_0 + a_1 T + a_2 T^2 + a_3 T^3 + \\cdots C_{p,m} = b_0 + b_1 T + b_2 T^2 + b_3 T^3 + \\cdots c_{p,\\text{R134a}} / R = 2.1015 + 0.03252 T - 17.457 \\times 10^{-6} T^2按平均比热计算: c \\Big|_{t_1}^{t_2} = \\dfrac{c \\Big|_{0}^{t_2} t_2 - c \\Big|_{0}^{t_1} t_1}{t_2 - t_1}理想气体 $\\Delta u$ 
的计算 $$\\mathrm{d}u = c_v \\mathrm{d}T$$ (ideal gas, any process). For $c_v = \\text{Const}$: $$\\Delta u = c_v \\Delta T = c_v (T_2 - T_1)$$ With $c_v$ as the true specific heat: $$\\Delta u = \\int_{T_1}^{T_2} c_v \\, \\mathrm{d}T$$ With $c_v$ as the mean specific heat: $$\\Delta u = c_v \\Big|_{t_1}^{t_2} \\cdot (T_2- T_1)$$ For air, the values can be read directly from Appendix Table 2: $$\\Delta u = u_2 - u_1$$ Calculating $\\Delta h$ of an ideal gas: $$\\mathrm{d}h = c_p \\mathrm{d}T$$ (ideal gas, any process). For $c_p = \\text{Const}$: $$\\Delta h = c_p \\Delta T$$ With $c_p$ as the true specific heat: $$\\Delta h = \\int_{T_1}^{T_2} c_p \\, \\mathrm{d}T$$ With $c_p$ as the mean specific heat: $$\\Delta h = c_p \\Big|_{t_1}^{t_2} \\cdot (T_2- T_1)$$ For air, the values can be read directly from Appendix Table 2: $$\\Delta h = h_2 - h_1$$ Calculating $\\Delta s$ of an ideal gas: $$\\mathrm{d}s = c_v \\frac{\\mathrm{d}T}{T} + R \\frac{\\mathrm{d}v}{v} = c_p \\frac{\\mathrm{d}T}{T} - R \\frac{\\mathrm{d}p}{p} = c_p \\frac{\\mathrm{d}v}{v} + c_v \\frac{\\mathrm{d}p}{p}$$ (ideal gas, any process). With constant specific heats (the common case): $$\\Delta s = c_v \\ln \\frac{T_2}{T_1} + R \\ln \\frac{v_2}{v_1} = c_p \\ln \\frac{T_2}{T_1} - R \\ln \\frac{p_2}{p_1} = c_p \\ln\\frac{v_2}{v_1} + c_v \\ln \\frac{p_2}{p_1}$$ Isentropic process of an ideal gas. An adiabatic process is not necessarily isentropic; only a reversible adiabatic process is isentropic. Not only $\\Delta s = 0$ but also $\\mathrm{d}s = 0 \\longrightarrow s$ is the same everywhere along the process. For an ideal gas: $$\\mathrm{d}s = c_v \\frac{\\mathrm{d}p}{p} + c_p \\frac{\\mathrm{d}v}{v} = 0$$ $$k = \\frac{c_p}{c_v} \\longrightarrow \\frac{\\mathrm{d}p}{p} + k \\frac{\\mathrm{d}v}{v} = 0$$ When $k = \\text{const}$: $$\\ln p + k \\ln v = \\text{const}$$ $$p v^k = \\text{const}$$ This is the process equation of a reversible adiabatic process. Three conditions must hold: ideal gas; isentropic process; $k$ (the specific heat ratio, or adiabatic exponent) is constant. Process equations for the adiabatic process of an ideal gas: $$p v^k = \\text{Const} \\quad \\Rightarrow \\quad \\frac{p_2}{p_1} = \\left( \\frac{v_1}{v_2} \\right)^k$$ $$p v^k = (p v) v^{k-1} = R T v^{k-1} = \\text{Const}$$ $$T v^{k-1} = \\text{Const} \\quad \\Rightarrow \\quad \\frac{T_2}{T_1} = \\left( \\frac{v_1}{v_2} \\right)^{k-1}$$ $$p v^k = \\frac{p^k v^k}{p^{k-1}} = \\frac{(R T)^k}{p^{k-1}} = \\text{Const}$$ $$\\frac{T}{p^{\\frac{k-1}{k}}} = \\text{Const} \\quad \\Rightarrow \\quad \\frac{T_2}{T_1} = \\left( \\frac{p_2}{p_1} \\right)^{\\frac{k-1}{k}}$$ Calculating $w$, $w_t$, $q$ for the adiabatic process of an ideal gas. Volume-change work, with $p v^k = C$: w = \\int p \\, \\mathrm{d}v = \\int \\frac{C}{v^k} \\, \\mathrm{d}v = 
\\frac{C}{1-k} v^{1-k} \\Bigg|_1^2 = \\frac{1}{k-1} \\left( p_1 v_1 - p_2 v_2 \\right) = \\frac{R}{k-1} (T_1 - T_2) = c_v (T_1 - T_2) = - \\Delta u For the reversible adiabatic process of an ideal gas with constant specific heats, using $p v^k = \\text{const}$: $$w = \\frac{RT_1}{k - 1} \\left[ 1 - \\left( \\frac{p_2}{p_1} \\right)^{\\frac{k - 1}{k}} \\right]$$ Technical work: $$w_t = - \\int v \\, \\mathrm{d}p = - \\Delta h = c_p (T_1 - T_2) = k w \\\\ = \\frac{k}{k-1} \\left( p_1 v_1 - p_2 v_2 \\right) = \\frac{kR}{k-1} (T_1 - T_2)$$ For the reversible adiabatic case: $$w_t = \\frac{k}{k - 1} RT_1 \\left[ 1 - \\left( \\frac{p_2}{p_1} \\right)^{\\frac{k - 1}{k}} \\right]$$ Heat: $$q = 0$$ Combined analysis of ideal-gas thermodynamic processes. Polytropic process and the basic processes: $$p v^n = \\text{Const}$$ $n$ is a constant; each polytropic process has a fixed value of $n$. Example: $n = k \\quad \\longrightarrow \\quad$ isentropic process. $$\\frac{p_2}{p_1} = \\left( \\frac{v_1}{v_2} \\right)^n, \\quad \\frac{T_2}{T_1} = \\left( \\frac{v_1}{v_2} \\right)^{n-1}, \\quad \\frac{T_2}{T_1} = \\left( \\frac{p_2}{p_1} \\right)^{\\frac{n-1}{n}}$$ Calculating $w$, $w_t$, $q$ for a polytropic process of an ideal gas, $p v^n = \\text{Const}$: $$\\begin{aligned} w &= \\int p \\, \\mathrm{d}v = \\begin{cases} \\dfrac{R}{n-1} (T_1 - T_2) \\; , \\; n \\not = 1 \\\\ \\dfrac{RT_1}{n - 1} \\left[ 1 - \\left( \\dfrac{p_2}{p_1} \\right)^{\\frac{n - 1}{n}} \\right] \\;, \\; 0 \\not = n \\not = 1 \\\\ RT \\ln \\dfrac{v_2}{v_1} = RT \\ln \\dfrac{p_1}{p_2} \\; , \\; n = 1 \\end{cases} \\\\ w_t &= \\begin{cases} n w \\;,\\;n \\not = \\infty\\\\ - v \\cdot \\Delta p \\; , \\; n = \\infty \\end{cases} \\\\ q &= \\Delta u + w = c_v (T_2 - T_1) - \\frac{R}{n-1} (T_2 - T_1) \\\\ &= \\left( c_v - \\frac{R}{n-1} \\right) (T_2 - T_1) = \\frac{n-k}{n-1} c_v (T_2 - T_1) = c_n (T_2 - T_1) \\end{aligned}$$ $c_n$ is the polytropic specific heat capacity. Relation between the polytropic process and the basic processes: the basic processes are special cases of the polytropic process. For $n = 0$ (isobaric process): $$p v^0 = \\text{Const} \\quad \\Rightarrow \\quad p = \\text{Const}, \\quad c_n = k c_v = c_p$$ For $n = 1$ (isothermal process): $$p v^1 = \\text{Const} = RT \\quad \\Rightarrow\\quad T = \\text{Const}, \\quad c_n = \\infty$$ For $n = k$ (isentropic process): $$p v^k = \\text{Const} \\quad \\Rightarrow\\quad s = \\text{Const}, \\quad c_n = 0$$ For $n = \\infty$ (isochoric process): p^{\\frac{1}{n}} v = 
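\\text{Const} \\quad\\Rightarrow \\quad v = \\text{Const}, \\quad c_n = c_v Calculations for the basic processes of an ideal gas. The thermodynamic processes of a gas (for each process: process equation; relations between initial and final state properties; work $w$ (J/kg); technical work $w_t$; heat $q$ (J/kg)). Isochoric: $v = \\text{const}$; $v_2 = v_1$, $\\dfrac{T_2}{T_1} = \\dfrac{p_2}{p_1}$; $w = 0$; $w_t = v(p_1 - p_2)$; $q = c_v (T_2 - T_1)$. Isobaric: $p = \\text{const}$; $p_2 = p_1$, $\\dfrac{T_2}{T_1} = \\dfrac{v_2}{v_1}$; $w = p(v_2 - v_1)$ or $R(T_2 - T_1)$; $w_t = 0$; $q = c_p (T_2 - T_1)$. Isothermal: $pv = \\text{const}$; $T_2 = T_1$, $\\dfrac{p_2}{p_1} = \\dfrac{v_1}{v_2}$; $w = p_1 v_1 \\ln \\dfrac{v_2}{v_1}$; $w_t = w$; $q = w$. Adiabatic: $pv^k = \\text{const}$; $\\dfrac{p_2}{p_1} = \\left( \\dfrac{v_1}{v_2} \\right)^k$, $\\dfrac{T_2}{T_1} = \\left( \\dfrac{v_1}{v_2} \\right)^{k-1}$; $w = \\dfrac{p_1 v_1 - p_2 v_2}{k - 1}$ or $\\dfrac{R}{k - 1} (T_1 - T_2)$; $w_t = kw$; $q = 0$. Polytropic: $pv^n = \\text{const}$; $\\dfrac{p_2}{p_1} = \\left( \\dfrac{v_1}{v_2} \\right)^n$, $\\dfrac{T_2}{T_1} = \\left( \\dfrac{v_1}{v_2} \\right)^{n-1}$; $w = \\dfrac{p_1 v_1 - p_2 v_2}{n - 1}$ or $\\dfrac{R}{n - 1} (T_1 - T_2)$; $w_t = nw$; $q = c_n (T_2 - T_1) = \\left( c_v - \\dfrac{R}{n - 1} \\right) (T_2 - T_1)$. Compression of gases. Uses of compressors: in everyday life, pumping up bicycle tyres; in industry, boiler forced draught, induced draught at the outlet, gas turbines, refrigeration and air conditioning, etc. Types of construction: piston (reciprocating), treated as continuous flow; (centrifugal, scroll) and (axial-flow, screw): continuous flow. Pressure ranges: fan: $\\Delta p < 0.01 \\text{ MPa}$; blower: $0.01 \\text{ MPa} < \\Delta p < 0.3 \\text{ MPa}$; compressor: $\\Delta p > 0.3 \\text{ MPa}$. Analysis of the compression process in a piston compressor. Goal: study the work consumed and minimise it. Theoretical compression work: the technical work $w_t$ consumed in compressing and delivering the gas in a reversible process. Possible compression processes: very fast, with no time for heat exchange: adiabatic process, $n = k$; very slow, with all the heat dissipated: isothermal process, $n = 1$; typical compression process: constant $n$, $1 < n < k$. Comparison of the parameters of the three compression processes. Technical work of the three compression processes: $$w_{tn} = \\frac{n}{n - 1} RT_1 \\left[ 1 - \\left( \\frac{p_2}{p_1} \\right)^{\\frac{n - 1}{n}} \\right]$$ $$w_{ts} = \\frac{k}{k - 1} RT_1 \\left[ 1 - \\left( \\frac{p_2}{p_1} \\right)^{\\frac{k - 1}{k}} \\right]$$ $$w_{T} = RT_1 \\ln \\frac{p_1}{p_2} \\qquad \\text{(smallest)}$$ Derivation of the optimal pressure ratio (the subscript 分级 denotes staged compression): $$w_{\\text{分级}} = w_{tn,I} + w_{tn,II} = \\frac{n}{n - 1} RT_1 \\left[ 1 - \\left( \\frac{p_2}{p_1} \\right)^{\\frac{n - 1}{n}} \\right] + \\frac{n}{n - 1} RT_3 \\left[ 1 - \\left( \\frac{p_4}{p_3} \\right)^{\\frac{n - 1}{n}} \\right]$$ where $T_1 = T_3, \\; p_2 = p_3$, so w_{\\text{分级}} = 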
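\\frac{n}{n - 1} RT_1 \\left[ 2 - \\left( \\frac{p_2}{p_1} \\right)^{\\frac{n - 1}{n}} - \\left( \\frac{p_4}{p_2} \\right)^{\\frac{n - 1}{n}} \\right] To find the extremum of $w_{\\text{分级}}$ (the staged-compression work), set $\\dfrac{\\partial w_{\\text{分级}}}{\\partial p_2} = 0$: $$\\frac{p_2}{p_1} = \\frac{p_4}{p_2} \\Rightarrow p_2 = \\sqrt{p_1 p_4}$$ Optimal pressure ratio: $$\\beta = \\dfrac{p_2}{p_1} = \\dfrac{\\sqrt{p_1 p_4}}{p_1} = \\sqrt{\\dfrac{p_4}{p_1}} = \\sqrt{\\dfrac{p_{\\text{final}}}{p_{\\text{initial}}}}$$ For $m$ stages: $$\\beta = \\sqrt[m]{\\dfrac{p_{\\text{final}}}{p_{\\text{initial}}}}$$ Number of stages in staged compression. Benefits of staging: 1. saves work; 2. lowers the outlet temperature. Multistage compression with infinitely many stages $\\quad \\Rightarrow \\quad$ isothermal process (drawbacks: hard to realise, structurally complex and therefore costly). In practice $2 \\sim 4$ stages are used. Chapter 4: The Second Law of Thermodynamics. Statements and essence of the second law. Kelvin-Planck statement: from the viewpoint of heat-work conversion. Clausius statement: from the viewpoint of heat transfer. Kelvin-Planck statement: it is impossible to take heat from a single reservoir and convert it completely into useful work without producing any other effect. A heat engine cannot convert all the heat absorbed from the source into useful work; part of it must be rejected to a cold sink. What about the isothermal process of an ideal gas, where $q = w$? The gas expands, which is another effect. Characteristics of a heat source or sink: its capacity is infinite, so its temperature stays constant while it absorbs or rejects heat. Perpetual motion machine of the second kind: a heat engine that takes heat from a single reservoir and converts it completely into work. By the second law of thermodynamics, such a machine cannot be built. Clausius statement: it is impossible to transfer heat from a colder body to a hotter body without causing other changes. What about air conditioners in cooling and heating mode? The price is work input. Heat cannot pass spontaneously, at no cost, from a colder body to a hotter body. Essence of the second law: spontaneous processes are directional. Carnot's theorem and the Carnot cycle. Carnot's theorem: of all heat engines operating between two constant-temperature reservoirs at different temperatures, the reversible engine has the highest thermal efficiency. That is, between reservoirs at $T_1$ and $T_2$, $\\eta_{t, A} \\not > \\eta_{t, R}$. Corollaries of Carnot's theorem: all reversible heat engines operating between two constant-temperature reservoirs at different temperatures have the same thermal efficiency, regardless of the working fluid; any irreversible heat engine operating between the two reservoirs has a lower thermal efficiency than a reversible engine operating between them. Carnot cycle: the ideal reversible heat-engine cycle. Thermal efficiency of the Carnot heat-engine cycle. Thermal efficiency of a heat engine: $\\eta_t = \\dfrac{w}{q_1} = \\dfrac{q_1 - q_2}{q_1} = 1 - \\dfrac{q_2}{q_1}$. Corollary of Carnot's theorem: all reversible engines between $T_1$ and $T_2$ have the same thermal efficiency, independent of the working fluid. For the Carnot heat-engine cycle, take an ideal gas as the working fluid; the heat absorbed $q_1$ and the heat rejected $q_2$ in the reversible isothermal processes are: $$q_1 = RT_1 \\ln \\frac{v_2}{v_1}, \\quad q_2 = RT_2 \\ln \\frac{v_3}{v_4}$$ For the isentropic processes $2 - 3$ and $4 - 1$: $$\\frac{v_2}{v_3} = \\left( \\frac{T_2}{T_1} \\right)^{\\frac{1}{k-1}} = \\frac{v_1}{v_4}, \\quad \\frac{v_2}{v_1} = \\frac{v_3}{v_4}$$ Therefore: $$\\frac{q_2}{q_1} = \\dfrac{RT_2 \\ln \\dfrac{v_3}{v_4}}{RT_1 \\ln \\dfrac{v_2}{v_1}} = \\frac{T_2}{T_1}$$ Thermal efficiency of the Carnot heat-engine cycle: $$\\eta_{t, C} = 1 - \\frac{q_2}{q_1} = 1 - \\frac{T_2}{T_1}$$ Remarks on the thermal efficiency of the Carnot heat-engine cycle: $$\\eta_{t, C} = 1 - \\frac{T_2}{T_1}$$ $\\eta_{t, c}$ depends only on the temperatures $T_1$ and $T_2$ of the constant-temperature reservoirs and not on the working fluid; when $T_1 \\uparrow$ or $T_2 \\downarrow$, $\\eta_{t, c} \\uparrow$, i.e. the larger the temperature difference, the higher $\\eta_{t, c}$; $T_1 \\neq \\infty \\, 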
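\\text{K}$, $T_2 \\neq 0 \\, \\text{K}$, so $\\eta_{t, c} < 100\\%$, consistent with the second law of thermodynamics; when $T_1 = T_2$, $\\eta_{t, c} = 0$, so a single-source heat engine is impossible. Reversed Carnot cycles. Carnot refrigeration cycle: \\varepsilon_C = \\frac{q_2}{w} = \\frac{q_2}{q_1 - q_2} = \\frac{1}{\\dfrac{q_1}{q_2} - 1} = \\frac{1}{\\dfrac{T_0}{T_2} - 1} = \\frac{T_2}{T_0 - T_2} Carnot heat-pump cycle: \\varepsilon' = \\frac{q_1}{w} = \\frac{q_1}{q_1 - q_2} = \\frac{1}{1 - \\dfrac{q_2}{q_1}} = \\frac{1}{1 - \\dfrac{T_0}{T_1}} = \\frac{T_1}{T_1 - T_0} Reversible heat engines with multiple (variable-temperature) heat sources: Q_{R\\text{multi}1} < Q_{C1} \\quad Q_{R\\text{multi}2} > Q_{C2} \\eta_t = 1 - \\frac{Q_2}{Q_1} Therefore $\\eta_{t R\\text{multi}} < \\eta_{t C}$. Mean-temperature method: a Carnot cycle between $T_1$ and $T_2$, $\\eta_{t R\\text{multi}} = 1 - \\dfrac{\\overline{T_2}}{\\overline{T_1}}$. Generalized Carnot cycle: replace the reversible adiabatic processes with two polytropic processes whose polytropic exponent $n$ is the same for heat absorption and heat rejection. Complete regeneration: $Q_{da} = -Q_{bc}$ \\overline{ab} = \\overline{cd} = \\overline{ef} \\eta_{t R\\text{gen}} = 1 - \\frac{T_2}{T_1} = \\eta_{t C} This suggests one way to raise thermal efficiency: regeneration. Summary of heat-engine thermal efficiency: for all reversible heat engines working between two constant-temperature heat sources $T_1$, $T_2$, $\\eta_R = \\eta_C$, independent of the working fluid. For a reversible heat engine working between multiple (variable-temperature) heat sources, $\\eta_{R\\text{multi}} <$ the efficiency $\\eta_C$ of a Carnot engine working between the same temperature limits. For an irreversible heat engine, $\\eta_{IR} <$ the efficiency $\\eta_R$ of a reversible heat engine working between the same heat sources. \\eta_{IR} < \\eta_R = \\eta_C Therefore, among all heat engines working between given temperature limits, $\\eta_C$ is the highest $\\Longrightarrow$ the heat-engine limit. Real heat engines versus the Carnot engine: the Carnot engine has only theoretical significance as an ideal limit. In practice, isothermal and adiabatic processes are very hard to realize. The Clausius inequality and the introduction of entropy. For any reversible cycle, cut it with infinitely many adiabatic lines; each elementary cycle $\\text{abfga}$ can be approximated as an infinitesimal Carnot cycle. Hence, for any cycle, the Clausius inequality holds: \\oint \\frac{\\delta Q}{T_r} \\leq 0 \\qquad T_r \\text{ is the temperature of the heat source} where $<$ holds for an irreversible cycle, $=$ for a reversible cycle, and $>$ is impossible. This is one of the expressions of the second law of thermodynamics. Introduction of entropy and its physical meaning. Definition: entropy \\mathrm{d}S = \\frac{\\delta Q_{\\text{re}}}{T} specific entropy \\mathrm{d}s = \\frac{\\delta q_{\\text{re}}}{T} $T$: temperature of the heat source $=$ temperature of the working fluid. One physical meaning of entropy: the entropy change reflects the direction and magnitude of heat exchange in a reversible process. For a reversible process: \\left\\{ \\begin{array}{l} \\mathrm{d}S > 0 \\Rightarrow \\delta Q > 0 \\\\ \\mathrm{d}S < 0 \\Rightarrow \\delta Q < 0 \\\\ \\mathrm{d}S = 0 \\Rightarrow \\delta Q = 0 \\end{array} \\right. Entropy change and the cyclic integral of entropy: entropy is a state quantity; the entropy change is independent of the path and depends only on the initial and final states. \\Delta S_{\\text{12,rev}} = \\Delta S_{\\text{12,irrev}} Cyclic integral of entropy: \\oint \\mathrm d S_{\\text{rev}} = \\oint \\mathrm d S_{\\text{irrev}} = 0 Entropy change in irreversible processes. Entropy change and heat transfer; entropy flow and entropy generation. For any irreversible cycle: \\oint \\frac{\\delta Q}{T_r} < 0 \\Longrightarrow \\int_{1a2} \\frac{\\delta Q}{T_r} + \\int_{2b1} 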
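\\frac{\\delta Q}{T_r} < 0 and \\int_{2b1} \\frac{\\delta Q}{T_r} = -\\int_{1b2} \\frac{\\delta Q}{T_r} Therefore \\int_{1a2} \\frac{\\delta Q}{T_r} < \\int_{1b2} \\frac{\\delta Q}{T_r} = \\Delta S_{12} \\Delta S_{12} = S_2 - S_1 \\geq \\int_{1}^{2} \\frac{\\delta Q}{T_r} where $>$ holds for irreversible processes and $=$ for reversible processes. \\Delta S \\geq \\int \\frac{\\delta Q}{T_r}, \\text{ and for a cycle } \\Delta S=0 \\Longrightarrow \\text{the Clausius inequality} Irreversible adiabatic process: $\\delta Q = 0$, $\\Delta S > 0$. Besides heat transfer, other factors affect entropy $\\Rightarrow$ irreversible factors cause entropy change. Entropy flow and entropy generation. Entropy flow: $\\mathrm{d}S_f = \\dfrac{\\delta Q}{T_r}$, caused by heat flowing into or out of the system. Entropy generation: caused by irreversible factors. We always have $\\mathrm{d}S_g \\geq 0$, with equality only for reversible processes. \\mathrm{d}S = \\mathrm{d}S_f + \\mathrm{d}S_g \\Delta S = \\Delta S_f + \\Delta S_g This is one of the expressions of the second law of thermodynamics. Entropy generation measures the degree of irreversibility of a process. Calculating entropy changes. 1. Ideal gas. a. From the initial and final states, for any process: \\left\\{ \\begin{array}{l} \\Delta S_{12} = \\int_{1}^{2} c_v \\dfrac{\\mathrm{d}T}{T} + R \\ln \\dfrac{v_2}{v_1} \\\\ \\Delta S_{12} = \\int_{1}^{2} c_p \\dfrac{\\mathrm{d}T}{T} - R \\ln \\dfrac{p_2}{p_1} \\\\ \\Delta S_{12} = \\int_{1}^{2} c_p \\dfrac{\\mathrm{d}v}{v} + \\int_{1}^{2} c_v \\dfrac{\\mathrm{d}p}{p} \\end{array} \\right. b. From the definition: \\Delta S_{12} = \\Delta S_{13} + \\Delta S_{32} = \\frac{Q_{13}}{T_1} \\Delta S_{12} = \\Delta S_{14} + \\Delta S_{42} = \\frac{Q_{42}}{T_2} 2. Non-ideal gases: look up charts and tables. 3. Solids and liquids: $\\mathrm{d}S = \\dfrac{\\delta Q_{\\text{re}}}{T} \\qquad$ the entropy change is independent of the process, and \\delta Q_{\\text{re}} = \\mathrm{d}U + p \\mathrm{d}V = \\mathrm{d}U = cm \\, \\mathrm{d}T \\therefore \\mathrm{d}S = \\frac{\\delta Q_{\\text{re}}}{T} = \\frac{cm \\, \\mathrm{d}T}{T} Usually $c_p = c_v = c$ is constant, e.g. for water $c = 4.1868 \\, \\text{kJ/(kg} \\cdot \\text{K)}$ \\therefore \\Delta S = cm \\ln \\frac{T_2}{T_1} 4. Heat source (thermal reservoir): exchanges heat with the surroundings while $T$ stays constant. Entropy change of heat source $T_1$: \\Delta S = \\int \\dfrac{\\delta Q_{\\text{re}}}{T_r} = \\dfrac{-Q_1}{T_1} 5. Work source (work reservoir): exchanges only work with the surroundings. It can be pictured as an ideal spring: no dissipation, no heat exchange. Entropy change of a work source: $\\Delta S = 0$. Principle of entropy increase for isolated systems: \\text{Isolated system } \\left\\{ \\begin{array}{l} \\text{no mass exchange} \\\\ \\text{no heat exchange} \\\\ \\text{no work exchange} \\end{array} \\right. 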
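\\qquad \\mathrm{d}S_f = 0 Isolated system $=$ non-isolated system $+$ relevant surroundings \\mathrm{d}S_{\\text{iso}} = \\mathrm{d}S_g \\geq 0 \\qquad \\text{(one of the expressions of the second law)} where $>$ holds for irreversible processes and $=$ for reversible processes. The entropy of an isolated system can only increase or stay constant; it can never decrease. This is the principle of entropy increase for isolated systems. Loss of work capacity: the amount of heat $Q$ is unchanged, but the entropy increase of an irreversible process reduces the work capacity, that is, energy is degraded or work is dissipated. Work capacity: the maximum work a system can deliver, taking the environment as the reference. Loss of work capacity: \\Pi = W_R - W_{\\text{IR}} = T_0 \\Delta S_{\\text{iso}} = T_0 \\Delta S_g \\pi = T_0 \\Delta s_{\\text{iso}} = T_0 \\Delta s_g The entropy equation and a summary of entropy. The entropy equation and the principle of entropy increase for adiabatic steady-flow systems. Closed system: \\Delta S_{12} = \\Delta S_f + \\Delta S_g Open system: \\mathrm{d}S_{\\text{cv}} = \\frac{\\delta Q}{T_r} + \\mathrm{d}S_g + \\sum_{i=1}^{n} s_{\\text{in}, i} \\, \\delta m_{\\text{in}, i} - \\sum_{i=1}^{n} s_{\\text{out}, i} \\, \\delta m_{\\text{out}, i} Steady flow: \\begin{aligned} \\mathrm{d}S_{\\text{cv}} &= 0 \\\\ \\delta m_{\\text{in}} &= \\delta m_{\\text{out}} = \\delta m \\\\ 0 = & \\dfrac{\\delta Q}{T_r} + \\mathrm{d}S_g + s_{\\text{in}} \\, \\delta m - s_{\\text{out}} \\, \\delta m \\\\ \\mathrm{d}S_{12}& = (s_{\\text{out}} - s_{\\text{in}}) \\, \\delta m = \\frac{\\delta Q}{T_r} + \\mathrm{d}S_g \\\\ \\end{aligned} \\Delta S_{12} = \\Delta S_f + \\Delta S_g Entropy change of an adiabatic steady-flow system: \\Delta S_{12} = \\Delta S_f + \\Delta S_g = \\int \\frac{\\delta Q}{T_r} + \\Delta S_g \\Delta S_{12} - \\int \\frac{\\delta Q}{T_r} = \\Delta S_g \\geq 0 \\Delta S_{12} + \\int -\\frac{\\delta Q}{T_r} = \\Delta S_g \\geq 0 \\Delta S_{12} + \\Delta S_{T_r} = \\Delta S_g \\geq 0 Entropy increase of an adiabatic steady-flow system: \\Delta S_{\\text{total}} = \\Delta S_{12} + \\Delta S_{T_r} = \\Delta S_g \\geq 0 Entropy and irreversibility; the physical meaning of entropy. The deeper meaning of irreversibility: an irreversible process necessarily generates entropy and necessarily loses work capacity. Physical meaning of entropy: it gives the magnitude and direction of heat transfer in a reversible process, \\mathrm{d}s = \\frac{\\delta q_{\\text{re}}}{T} The entropy increase of an isolated system, $\\Delta S_{\\text{iso}} \\geq 0$, or the entropy generation of any system, $\\Delta S_g \\geq 0$, characterizes how irreversible a process is: the larger the entropy increase, the more irreversible the system. Processes in a finite region of nature always proceed in the direction of increasing entropy of the isolated system, so entropy can serve as a criterion for the direction of a process. Statistical meaning of entropy: S = k \\log W where $k$ is the Boltzmann constant and $W$ is the number of possible microstates corresponding to the macrostate. Entropy can be regarded as a measure of molecular disorder or randomness: the more disordered the system, the harder it is to predict where the molecules are, and the larger the entropy. Exergy and its calculation. Exergy is work capacity. Exergy (㶲): available energy. Anergy (㷻): unavailable energy. Definition of exergy: under given environmental conditions, the part of the energy that can at most be converted into useful work is called exergy (that is, exergy is work capacity). The part of the energy other than exergy is anergy. The first and second laws of thermodynamics and exergy. First law of thermodynamics: in every process, the total of exergy + anergy is constant. 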
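Second law of thermodynamics: anergy cannot be converted into exergy. In a reversible process, exergy is conserved. In an irreversible process, part of the exergy degrades into anergy: exergy loss, loss of work capacity, energy degradation. In any isolated system, exergy can only stay constant or decrease; it cannot increase. This is the principle of exergy decrease for isolated systems. Calculating heat exergy and cold exergy. 1. Heat exergy $Ex_Q$: the maximum useful work that heat $Q$ can deliver in a given environment. Ex_Q = Q - T_0 \\Delta S For heat at constant temperature $T$: Ex_Q = Q - T_0 \\frac{Q}{T} = Q(1 - \\frac{T_0}{T}) Remarks on heat exergy: $Ex_Q$ is the maximum useful work obtainable from $Q$; $Ex_Q = Q - T_0 \\Delta S = f(Q, T, T_0)$; with $T_0$ and $Q$ fixed, $T \\downarrow \\Rightarrow Ex_Q \\downarrow$; with $T$ fixed, $Q \\downarrow \\Rightarrow Ex_Q \\downarrow$ (exergy loss, loss of work capacity); a single-source heat engine cannot do work: $T = T_0$, $Ex_Q = 0$; $Ex_Q$ is a process quantity. 2. Cold exergy: Ex_{Q_2} = T_0 \\Delta S - Q_2 Cold exergy: the maximum useful work delivered while absorbing heat $Q_2$, or the minimum useful work consumed in producing the cooling capacity $Q_2$. In fact, whenever the state of a system differs from the state of the environment, the system can do work and therefore has exergy. Calculating internal-energy exergy and enthalpy exergy. 3. Internal-energy exergy of a closed system: ex_u = (u_1 - u_0) - \\left[ T_0 (s_1 - s_0) - p_0 (v_1 - v_0) \\right] Remarks on the internal-energy exergy of a closed system: only part of the internal energy $u_1 - u_0$ of a closed system is $ex_u$; the rest is $an_u = T_0 (s_1 - s_0) - p_0 (v_1 - v_0)$; $ex_u$ is a state parameter that depends on both the working fluid and the environment; the internal energy of the environment is large, but its $ex_u = 0$; for a reversible process $1 \\rightarrow 2$ of a closed system, the maximum useful work done by the working fluid, or the minimum useful work consumed, is: \\begin{aligned} w_{\\text{max}} &= ex_{u1} - ex_{u2} \\\\ &= (u_1 - u_2) - \\left[ T_0 (s_1 - s_2) - p_0 (v_1 - v_2) \\right] \\end{aligned} 4. Enthalpy exergy of a steady-flow working fluid: ex_h = (h_1 - h_0) - T_0 (s_1 - s_0) Remarks on the enthalpy exergy of a steady-flow working fluid: only part of the enthalpy $h_1 - h_0$ of the flowing fluid is $ex_h$; the rest is $an_h = T_0 (s_1 - s_0)$; $ex_h$ is a state parameter that depends on both the working fluid and the environment; when the working fluid is in equilibrium with the environment, $ex_h = 0$; for a reversible process from initial state $1 \\rightarrow$ final state $2$, the maximum useful work done by the working fluid, or the minimum useful work consumed, is: w_{\\text{max}} = ex_{h1} - ex_{h2} = (h_1 - h_2) - T_0 (s_1 - s_2) Exergy efficiency and exergy balance. Exergy balance of a thermodynamic system: \\sum Ex_{\\text{in}} - \\sum Ex_{\\text{out}} = \\sum \\Pi \\; \\text{(exergy loss)} Exergy efficiency: \\eta_{\\text{ex}} = \\frac{\\text{exergy gained}}{\\text{exergy paid}} Reversible process: $\\eta_{\\text{ex}} = 100\\%$. Power-producing device: \\eta_{\\text{ex}} = \\frac{W_{\\text{net}}}{Ex_{h,\\text{in}} - Ex_{h,\\text{out}}} \\qquad \\eta_t = \\frac{W_{\\text{net}}}{Q_1} Work-consuming device: \\eta_{\\text{ex}} = \\frac{Ex_{h,\\text{out}} - Ex_{h,\\text{in}}}{W} Heating equipment: \\eta_{\\text{ex}} = \\frac{\\text{exergy gained by the cold fluid}}{\\text{exergy released by the hot fluid}} Exergy loss $=$ loss of work capacity: \\sum \\Pi = T_0 \\Delta S_g","link":"/2024/10/18/Engineering-Thermodynamics/"},{"title":"LOJ 6003「网络流 24 题」魔术球","text":"Enumerate the answer. For each pair $(i,j)$ with $i<j$, if $i+j$ is a perfect square, add an edge from $i$ to $j$. Then run minimum path cover (see LOJ 6002). The construction is output in the same way as in the previous problem. 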
#include <bits/stdc++.h> #define R register #define ll long long #define cmax(a, b) ((a < b) ? b : a) #define cmin(a, b) ((a < b) ? a : b) #define sum(a, b, mod) ((a + b) % mod) const int MaxN = 2e4 + 10;const int MaxM = 5e5 + 10;const int inf = (1 << 30);struct edge{ int to, next, cap;};edge e[MaxM];int n, m, s = 20000, t = 20001, cnt = 1, ans, tmp;int head[MaxN], dep[MaxN], cur[MaxN], a[MaxN], vis[MaxN], to[MaxN];inline void add(int u, int v, int c){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; e[cnt].cap = c; head[u] = cnt;}inline void add_edge(int u, int v, int c) { add(u, v, c), add(v, u, 0); }inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline int bfs(){ memset(dep, 0, sizeof(dep)); memcpy(cur, head, sizeof(head)); std::queue<int> q; dep[s] = 1; q.push(s); while (!q.empty()) { int u = q.front(); q.pop(); for (int i = head[u]; i; i = e[i].next) { int v = e[i].to, c = e[i].cap; if (dep[v] || !c) continue; dep[v] = dep[u] + 1; q.push(v); } } return dep[t];}inline int dinic(int u, int flow){ if (u == t) return flow; int rest = flow; for (int i = cur[u]; i && (flow - rest < flow); i = e[i].next) { int v = e[i].to, c = e[i].cap; if (dep[v] != dep[u] + 1 || !c) continue; int k = dinic(v, cmin(rest, c)); if (!k) dep[v] = dep[u] + 1; else { e[i].cap -= k; e[i ^ 1].cap += k; rest -= k; if (e[i].to > 5000) vis[e[i].to - 5000] = 1; to[u] = e[i].to; } } if (flow - rest < flow) dep[u] = -1; return flow - rest;}inline void solve(){ int now = 0; while (bfs()) while ((now = dinic(s, inf))) ans -= now;}int main(){ n = read(); while (1) { ans++, 
tmp++; for (int i = 1; i < tmp; i++) { int x = sqrt(i + tmp); if (x * x == (i + tmp)) add_edge(i, tmp + 5000, 1); } add_edge(s, tmp, 1), add_edge(tmp + 5000, t, 1); solve(); if (ans > n) break; } --tmp; printf(\"%d\\n\", tmp); for (int i = 1; i <= tmp; i++) { if (vis[i]) continue; printf(\"%d \", i); int t = i; while (to[t]) { printf(\"%d \", to[t] - 5000); t = to[t] - 5000; } puts(\"\"); } return 0;}","link":"/2019/05/12/LOJ-6003/"},{"title":"LOJ 6006「网络流 24 题」试题库","text":"Very similar to LOJ #6004 (圆桌聚餐). Modeling: 1. From the source, add an edge of capacity $1$ to each problem $x_i$. 2. From each type $y_i$, add an edge to the sink whose capacity is the number of problems required for that type. 3. If problem $x_i$ belongs to type $y_j$, add an edge of capacity $1$ from $x_i$ to $y_j$. Then run a plain maximum flow; if the maximum flow $\\not=$ the total number of required problems, there is no solution. Construction: for each type, the saturated edges attached to it give the problems assigned to that type. #include <bits/stdc++.h> #define R register #define ll long long #define cmax(a, b) ((a < b) ? b : a) #define cmin(a, b) ((a < b) ? 
a : b)#define sum(a, b, mod) ((a + b) % mod)const int MaxN = 2e4 + 10;const int MaxM = 5e5 + 10;const int inf = (1 << 30);struct edge{ int to, next, cap;};edge e[MaxM];int k, n, s = 20000, t = 20001, cnt = 1, ans;int head[MaxN], dep[MaxN], cur[MaxN], a[MaxN];inline void add(int u, int v, int c){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; e[cnt].cap = c; head[u] = cnt;}inline void add_edge(int u, int v, int c) { add(u, v, c), add(v, u, 0); }inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline int bfs(){ memset(dep, 0, sizeof(dep)); memcpy(cur, head, sizeof(head)); std::queue<int> q; dep[s] = 1; q.push(s); while (!q.empty()) { int u = q.front(); q.pop(); for (int i = head[u]; i; i = e[i].next) { int v = e[i].to, c = e[i].cap; if (dep[v] || !c) continue; dep[v] = dep[u] + 1; q.push(v); } } return dep[t];}inline int dinic(int u, int flow){ if (u == t) return flow; int rest = flow; for (int i = cur[u]; i && (flow - rest < flow); i = e[i].next) { int v = e[i].to, c = e[i].cap; if (dep[v] != dep[u] + 1 || !c) continue; int k = dinic(v, cmin(rest, c)); if (!k) dep[v] = dep[u] + 1; else { e[i].cap -= k; e[i ^ 1].cap += k; rest -= k; } } if (flow - rest < flow) dep[u] = -1; return flow - rest;}inline void solve(){ int now = 0; while (bfs()) while ((now = dinic(s, inf))) ans += now;}int main(){ int tmp = 0; k = read(), n = read(); for (int i = 1; i <= k; i++) { int x = read(); add_edge(i, t, x); tmp += x; } for (int i = 1; i <= n; i++) { int p = read(); add_edge(s, i + k, 1); for (int j = 1; j <= p; j++) { int x = read(); add_edge(i + k, x, 1); } } solve(); if (ans != tmp) return 0 * printf(\"No Solution!\"); for (int i = 1; i <= k; i++) { int t = head[i]; printf(\"%d: \", i); while (t) { if (e[t].cap == 1) printf(\"%d \", e[t].to - k); t = e[t].next; } printf(\"\\n\"); } return 0;}","link":"/2019/05/11/LOJ-6006/"},{"title":"LOJ 
6004「网络流 24 题」圆桌聚餐","text":"Modeling: 1. From the source, add an edge to each unit $x_i$ whose capacity is the number of people in that unit. 2. From each table $y_i$, add an edge to the sink whose capacity is the number of people that table can seat. 3. From each unit $x_i$ to each table $y_j$, add an edge of capacity $1$. If the maximum flow equals the total number of people over all units, a solution exists; otherwise there is none. Construction: for each unit $x_i$, the saturated edges from that unit into the $y$ side give the seating of its members (the proof is straightforward). Code: #include <bits/stdc++.h> #define R register #define ll long long #define cmax(a, b) ((a < b) ? b : a) #define cmin(a, b) ((a < b) ? a : b) #define sum(a, b, mod) ((a + b) % mod) const int MaxN = 2e4 + 10;const int MaxM = 5e5 + 10;const int inf = (1 << 30);struct edge{ int to, next, cap;};edge e[MaxM];int n, m, s = 20000, t = 20001, cnt = 1, ans;int head[MaxN], dep[MaxN], cur[MaxN], a[MaxN];inline void add(int u, int v, int c){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; e[cnt].cap = c; head[u] = cnt;}inline void add_edge(int u, int v, int c) { add(u, v, c), add(v, u, 0); }inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline int bfs(){ memset(dep, 0, sizeof(dep)); memcpy(cur, head, sizeof(head)); std::queue<int> q; dep[s] = 1; q.push(s); while (!q.empty()) { int u = q.front(); q.pop(); for (int i = head[u]; i; i = e[i].next) { int v = e[i].to, c = e[i].cap; if (dep[v] || !c) continue; dep[v] = dep[u] + 1; q.push(v); } } return dep[t];}inline int dinic(int u, int flow){ if (u == t) return flow; int rest = flow; for (int i = cur[u]; i && (flow - rest < flow); i = e[i].next) { int v = e[i].to, c = e[i].cap; if (dep[v] != dep[u] + 1 || !c) continue; int k = dinic(v, cmin(rest, c)); if (!k) dep[v] = dep[u] + 1; else { e[i].cap -= k; e[i ^ 1].cap += k; rest -= k; } } if (flow - rest < flow) dep[u] = -1; return flow - rest;}inline void solve(){ int now = 0; 
while (bfs()) while ((now = dinic(s, inf))) ans += now;}int main(){ int x, tmp = 0; m = read(), n = read(); for (int i = 1; i <= m; i++) x = read(), add_edge(s, i, x), tmp += x; for (int i = 1; i <= n; i++) x = read(), add_edge(i + m, t, x); for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) add_edge(i, j + m, 1); } solve(); if (ans != tmp) return 0 * printf(\"0\"); printf(\"1\\n\"); for (int i = 1; i <= m; i++) { int h = head[i]; while (h) { if (!e[h].cap) printf(\"%d \", e[h].to - m); h = e[h].next; } puts(\"\"); } return 0;}","link":"/2019/05/09/LOJ-6004/"},{"title":"LOJ 6012「网络流 24 题」分配问题","text":"Modeling: 1. From $s$, add an edge to each person $1$ to $n$ with capacity $1$ and cost $0$. 2. From each job $1$ to $n$, add an edge to $t$ with capacity $1$ and cost $0$. 3. From each person $i$ to each job $j$, add an edge with capacity $1$ and cost $c_{i,j}$. Then we can simply run a plain min-cost flow~ What? You ask how to get the maximum cost? Just negate the edge costs, of course. Code: #include <bits/stdc++.h> #define R register #define ll long long #define cmax(a, b) ((a < b) ? b : a) #define cmin(a, b) ((a < b) ? a : b) #define sum(a, b, mod) ((a + b) % mod) const int MaxN = 5e3 + 10;const int MaxM = 5e4 + 10;struct edge{ int next, to, flow, cost;};edge e[MaxM];int n, s = 600, t = 601, ans, cnt = 1, mincost, maxflow;int head[MaxN], flow[MaxN], dis[MaxN], pre[MaxN], last[MaxN], vis[MaxN], a[210][210];inline void add(int u, int v, int f, int c){ ++cnt; e[cnt].to = v; e[cnt].flow = f; e[cnt].cost = c; e[cnt].next = head[u]; head[u] = cnt;}inline void add_edge(int u, int v, int f, int c){ add(u, v, f, c); add(v, u, 0, -c);}inline int read(){ int x = 0, f = 1; char ch = getchar(); while (ch > '9' || ch < '0') { if (ch == '-') f = 0; ch = getchar(); } while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? 
x : (-x);}int spfa(){ memset(dis, 0x3f, sizeof(dis)); memset(flow, 0x3f, sizeof(flow)); memset(vis, 0, sizeof(vis)); std::queue<int> q; q.push(s); vis[s] = 1; dis[s] = 0; pre[t] = -1; while (!q.empty()) { int u = q.front(); q.pop(); vis[u] = 0; for (int i = head[u]; i; i = e[i].next) { if (e[i].flow && dis[e[i].to] > dis[u] + e[i].cost) { int v = e[i].to; dis[v] = dis[u] + e[i].cost; pre[v] = u; last[v] = i; flow[v] = cmin(flow[u], e[i].flow); if (!vis[v]) { vis[v] = 1; q.push(v); } } } } return pre[t] != -1;}void MCMF(){ while (spfa()) { int u = t; maxflow += flow[t]; mincost += flow[t] * dis[t]; while (u != s) { e[last[u]].flow -= flow[t]; e[last[u] ^ 1].flow += flow[t]; u = pre[u]; } }}int main(){ n = read(); for (int i = 1; i <= n; i++) for (int j = 1; j <= n; j++) a[i][j] = read(); for (int i = 1; i <= n; i++) add_edge(s, i, 1, 0), add_edge(i + n, t, 1, 0); for (int i = 1; i <= n; i++) for (int j = 1; j <= n; j++) add_edge(i, j + n, 1, a[i][j]); MCMF(); cnt = 1; printf(\"%d\\n\", mincost); memset(head, 0, sizeof(head)); memset(pre, 0, sizeof(pre)); memset(last, 0, sizeof(last)); maxflow = mincost = 0; for (int i = 1; i <= n; i++) add_edge(s, i, 1, 0), add_edge(i + n, t, 1, 0); for (int i = 1; i <= n; i++) for (int j = 1; j <= n; j++) add_edge(i, j + n, 1, -a[i][j]); MCMF(); printf(\"%d\", -mincost); return 0;}","link":"/2019/05/21/LOJ-6012/"},{"title":"<模板> MillerRabin","text":"Submission link: LOJ #143. 质数判定 (primality testing) #include <bits/stdc++.h> #define ll long long const int cnt = 2500;const int mod[] = {3, 5, 7, 11, 13, 17, 19, 23, 29};ll fast_mul(ll a, ll b, ll m){ ll d = ((long double)a / m * b + 0.5); ll r = a * b - d * m; return r < 0 ? 
r + m : r;}ll fast_pow(ll a, ll m, ll n){ ll ret = 1; while (m) { if (m & 1) ret = fast_mul(ret, a, n); a = fast_mul(a, a, n); m >>= 1; } return ret;}bool check(ll k){ if (k <= 1) return false; if (k == 2) return true; if (!(k & 1)) return false; ll t = k - 1; int now = 0; while (!(t & 1)) t >>= 1, ++now; for (int i = 0; i < 9; i++) { if (mod[i] == k) return 1; ll x = fast_pow(mod[i], t, k), y = x; for (int j = 1; j <= now; j++) { x = fast_mul(x, x, k); if (x == 1 && !(y == 1 || y == k - 1)) return false; y = x; } if (x != 1) return 0; } return true;}int main(){ srand(time(NULL)); ll k; while (scanf(\"%lld\", &k) == 1) printf(check(k) ? \"Y\\n\" : \"N\\n\"); return 0;}","link":"/2019/04/05/模板-millerrabin/"},{"title":"New Beginning","text":"The blogger got into Tsinghua! From now on this blog will record some of the blogger's university life. (This is also the first post since the blog was migrated to a new computer!)","link":"/2023/07/27/new-beginning/"},{"title":"UVA10228 A Star not a Tree?","text":"Problem: given $n$ points, find a point that minimizes the sum of its distances to all $n$ points, and output that sum (rounded to an integer). Solution: I don't know any computational geometry o((⊙﹏⊙))o So let's randomize instead ( ̄▽ ̄)~* Follow the usual simulated-annealing recipe: each time pick a random point and check whether it is better than the current answer; if it is, update, otherwise accept it with some probability. Repeat until the search settles at the optimum. Code: #include <bits/stdc++.h> const int MaxN = 200;const double delta = 0.995;int n;int x[MaxN], y[MaxN];double ansx, ansy;inline double calc(double nx, double ny){ double tmp = 0; for (int i = 1; i <= n; i++) tmp += sqrt((nx - x[i]) * (nx - x[i]) + (ny - y[i]) * (ny - y[i])); return tmp;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline void sa(){ double t = 10000000; while (t > 1e-14) { double nowx = ansx + (rand() * 2 - RAND_MAX) * t; double nowy = ansy + (rand() * 2 - RAND_MAX) * t; double tmp = calc(nowx, nowy) - calc(ansx, ansy); if (tmp < 0) ansx = nowx, ansy = nowy; else if (exp(-tmp / t) * RAND_MAX > rand()) ansx = nowx, ansy = nowy; t *= delta; }}int main(){ srand(time(NULL)); int T = read(); 
while (T--) { ansx = ansy = 0; n = read(); for (int i = 1; i <= n; i++) x[i] = read(), y[i] = read(); for (int i = 1; i <= 100; i++) sa(); printf(\"%.0lf\\n\", calc(ansx, ansy)); if(T) printf(\"\\n\"); } return 0;}","link":"/2019/02/08/uva10228/"},{"title":"经典机器学习笔记","text":"These are review notes for the Tsinghua University course “Pattern Recognition and Machine Learning”. Evaluation Metric \\begin{aligned} \\text{Accuracy} &= \\frac{\\text{TP+TN}}{\\text{TP+FP+TN+FN}} \\newline \\text{Precision} &= \\frac{\\text{TP}}{\\text{TP+FP}} \\newline \\text{Recall} &= \\text{Sensitivity} = \\frac{\\text{TP}}{\\text{TP+FN}} \\newline \\text{Specificity} &= \\frac{\\text{TN}}{\\text{TN+FP}} \\newline \\text{Type-I Error} &= \\frac{\\text{FP}}{\\text{TN+FP}} = 1 - \\text{Specificity} \\newline \\text{Type-II Error} &= \\frac{\\text{FN}}{\\text{TP+FN}} = 1 - \\text{Sensitivity} \\newline \\end{aligned} k-NN. Nearest Neighbor: For a new instance $x'$, its class $\\omega'$ can be predicted by: \\omega' = \\omega_i, \\text{ where } i = \\underset{j}{\\arg\\min} \\, \\delta(x', x_j) k-Nearest Neighbor: For a new instance $x$, define $g_i(x)$ as the number of instances among the $k$ nearest neighbors of $x$ that belong to class $\\omega_i$. Then the class $\\omega'$ of the new instance can be predicted as: \\omega' = \\omega_j,\\text{ where }j = \\underset{i}{\\arg\\max} \\, g_i(x) k-NN Improvements. Branch-Bound Algorithm: Use a tree structure to reduce computation. Edited Nearest Neighbor: Delete instances that may be misleading from the training set. Condensed Nearest Neighbor: Delete instances that are far away from the decision boundaries. The Curse of Dimensionality. Problem: Many irrelevant attributes. In high-dimensional spaces, most points are equally far from each other. 
Solution: dimensionality reduction techniques; manifold learning; feature selection; use of prior knowledge. Linear Regression (Multivariate ver.) For multivariate linear regression, the model is $ y_i = \\mathbf{w}^{\\rm T}\\mathbf{x}_i $, where $ \\mathbf{x}_i = (1, x_i^1, \\cdots, x_i^d)^{\\rm T}\\in \\mathbb{R}^{d+1}$ and $\\mathbf{w} = (w_0, w_1, \\cdots, w_d)^{\\rm T} \\in \\mathbb{R}^{d+1}$. We adjust $\\mathbf{w}$ to find the best-fitting linear function $f(\\mathbf x) = \\mathbf{w}^{\\rm T}\\mathbf{x}$. The best $ \\mathbf{w}^{\\star}$ minimizes the Mean Squared Loss: $\\mathbf{w}^{\\star} = \\arg\\min\\limits_{\\mathbf w} \\frac{1}{N} \\sum_{i = 1}^N (f(\\mathbf x_i) - y_i)^2 = \\arg\\min \\limits_{\\mathbf w} \\frac{1}{N}(\\mathbf {Xw-y})^{\\rm T}(\\mathbf {Xw-y})$. Setting the gradient with respect to $\\mathbf w$ to zero shows that $ \\mathbf{w}^{\\star} $ must satisfy $ \\mathbf {X^{\\rm T}} \\mathbf {Xw^{\\star}} = \\mathbf X^{\\rm T}\\mathbf y$, so we get $\\mathbf{w^{\\star}} = (\\mathbf {X^{\\rm T}X})^{-1}\\mathbf X^{\\rm T}\\mathbf y$ when $\\mathbf {X^{\\rm T}X}$ is invertible, or $\\mathbf{w^{\\star}} = (\\mathbf {X^{\\rm T}X} + \\lambda \\mathbf I)^{-1}\\mathbf X^{\\rm T}\\mathbf y$ with an added $\\ell_2$ regularization term controlled by $\\lambda$ (Ridge Regression). Linear Discriminant Analysis: Project the input vector $\\mathbf x \\in \\mathbb{R}^{d+1}$ down to a 1-dimensional subspace with projection vector $\\mathbf w$. The problem is how to find a good projection vector. Fisher’s Criterion maximizes a function that represents the difference between the class means, normalized by a measure of the within-class scatter. The between-class scatter is $\\tilde{S}_b = (\\tilde{m}_1 - \\tilde{m}_2)^2$, where $\\tilde{m}_i$ is the mean of the projected samples of the $i$-th class. The within-class scatter is $\\tilde{S}_i=\\sum_{y_j \\in \\mathscr{y}_{i}} (y_j - \\tilde{m}_i)^2$, and the total within-class scatter is $\\tilde{S}_w = \\tilde{S}_1+ \\tilde{S}_2$. 
Combining the 2 expressions, the new objective function will be $J_F(\\mathbf w) = \\frac{\\tilde{S}_b}{\\tilde{S}_w}$ We have $\\tilde{S}_b = (\\tilde{m}_1 - \\tilde{m}_2)^2 = (\\mathbf w^{\\rm T} \\mathbf m_1 - \\mathbf w^{\\rm T} \\mathbf m_2)^2 = \\mathbf w^{\\rm T} (\\mathbf m_1 - \\mathbf m_2)(\\mathbf m_1 - \\mathbf m_2)^{\\rm T} \\mathbf w = \\mathbf w^{\\rm T} \\mathbf S_b \\mathbf w$, also $\\tilde{S}_w = \\mathbf w^{\\rm T} \\mathbf S_w \\mathbf w$, so now optimize objective function $J_F$ w.r.t $\\mathbf w$: \\max\\limits_{\\mathbf w} J_F(\\mathbf w) = \\max \\limits_ {\\mathbf w} \\frac{\\mathbf w^{\\rm T} \\mathbf S_b \\mathbf w}{\\mathbf w^{\\rm T} \\mathbf S_w \\mathbf w}Use Lagrange Multiplier Method we obtain: $\\lambda w^{\\star} = \\mathbf{S}_W^{-1} (\\mathbf m_1 - \\mathbf m_2)(\\mathbf m_1 - \\mathbf m_2)^{\\rm T}\\mathbf w^{\\star}$, since we only care about the direction of $\\mathbf w^*$ and $(\\mathbf m_1 - \\mathbf m_2)^{\\rm T}\\mathbf w^{\\star}$ is scalar, thus we obtain $w^{\\star} = \\mathbf{S}_W^{-1} (\\mathbf m_1 - \\mathbf m_2)$ Logistic RegressionLogistic regression is a statistical method used for binary classification, which means it is used to predict the probability of one of two possible outcomes. Unlike linear regression, which predicts a continuous output, logistic regression predicts a discrete outcome (0 or 1, yes or no, true or false, etc.). Key Concepts Odds and Log-Odds: Odds: The odds of an event are the ratio of the probability that the event will occur to the probability that it will not occur. \\text{Odds} = \\frac{P(y=1)}{P(y=0)} Log-Odds (Logit): The natural logarithm of the odds. \\text{Log-Odds} = \\log\\left(\\frac{P(y=1)}{P(y=0)}\\right) Logistic Function (Sigmoid Function): The logistic function maps any real-valued number into the range (0, 1), making it suitable for probability predictions. \\sigma(z) = \\frac{1}{1 + e^{-z}} In logistic regression, $ z $ is a linear combination of the input features. 
z = w^T x + b Model Equation: The probability of the positive class (e.g., $ y=1 $) is given by the logistic function applied to the linear combination of the features. P(y=1|x) = \\sigma(w^T x + b) = \\frac{1}{1 + e^{-(w^T x + b)}} The probability of the negative class (e.g., $ y=0 $) is: P(y=0|x) = 1 - P(y=1|x) Decision Boundary: To make a binary decision, we typically use a threshold (commonly 0.5). If $ P(y=1|x) $ is greater than 0.5, we predict the positive class; otherwise, we predict the negative class. Training the Model: We use MLE (Maximum Likelihood Estimation) for logistic regression, writing $\\theta$ for the logistic function $\\sigma$ and absorbing the bias $b$ into $\\mathbf w$: \\max_{\\mathbf w} \\prod_{i=1}^{N} \\left[ \\theta(\\mathbf w^{\\rm T} \\mathbf x_i)^{\\mathbf 1(y_i=1)} \\times (1 - \\theta(\\mathbf w^{\\rm T} \\mathbf x_i))^{\\mathbf 1(y_i=0)} \\right] Taking the negative logarithm of the likelihood, we obtain the negative log-likelihood objective for logistic regression: \\min_{\\mathbf w} J(\\mathbf w) = \\min\\limits_{\\mathbf w} - \\sum_{i=1}^{N} \\left\\{ y_i \\log \\left( \\frac{e^{\\mathbf w^{\\rm T} \\mathbf x_i}}{1 + e^{\\mathbf w^{\\rm T} \\mathbf x_i}} \\right) + (1 - y_i) \\log \\left( 1 - \\frac{e^{\\mathbf w^{\\rm T} \\mathbf x_i}}{1 + e^{\\mathbf w^{\\rm T} \\mathbf x_i}} \\right) \\right\\} Substituting $y_i \\in \\{0, +1\\}$ with $\\tilde y_i \\in \\{-1, +1\\}$, and noting that $\\theta(-s) + \\theta(s) = 1$, we can simplify the previous expression: \\min_{\\mathbf w} J(\\mathbf w) = \\min_{\\mathbf w} \\sum_{i = 1}^N \\log(1 + e^{-\\tilde y_i \\mathbf w ^ {\\rm T}\\mathbf x_i}) This is called the Cross Entropy Loss. Generalization to K classes: The generalized version of logistic regression is called Softmax Regression. 
The probability of an input $x$ belonging to class $k$ is: P(y = k | x; \\mathbf{W}) = \\frac{e^{\\mathbf w_k^{\\rm T} x}}{\\sum_{i=1}^{K} e^{\\mathbf w_i^{\\rm T} x}} In the multiclass case, the likelihood function can be written as: \\max_{\\mathbf w_1, \\mathbf w_2, \\ldots, \\mathbf w_K} \\prod_{i=1}^{N} \\prod_{k=1}^{K} P(y_i = k | x_i; \\mathbf{W})^{\\mathbf 1(y_i = k)} Equivalently, we minimize the negative log-likelihood: \\min\\limits_{\\mathbf{W}} J(\\mathbf{W}) = \\min_{\\mathbf w_1, \\mathbf w_2, \\ldots, \\mathbf w_K} -\\frac{1}{N} \\sum_{i=1}^{N} \\sum_{k=1}^{K} \\mathbf 1(y_i = k) \\cdot \\log \\frac{e^{\\mathbf w_k^{\\rm T} x_i}}{\\sum_{j=1}^{K} e^{\\mathbf w_j^{\\rm T} x_i}} Perceptron: We predict the label from the sign of the linear score: $y = \\text{sign}(f_{\\mathbf w}(x)) = \\text{sign}(\\mathbf w^{\\rm T}\\mathbf x)$. For the Perceptron the loss function is defined as: J_p(\\mathbf{w}) = \\sum_{\\hat{x}_j \\in \\mathcal{X}^k} (-\\mathbf{w}^{\\rm T} \\hat{x}_j) where $\\mathcal{X}^k$ is the set of misclassified samples at step $k$. We can use gradient descent to solve for $\\mathbf w^{\\star}$. Since the gradient of $J_p$ is $\\sum_{\\hat{x}_j \\in \\mathcal{X}^k} (-\\hat{x}_j)$, the update is: \\mathbf{w}_{k+1} = \\mathbf{w}_k - \\rho_k \\sum_{\\hat{x}_j \\in \\mathcal{X}^k} (-\\hat{x}_j) = \\mathbf{w}_k + \\rho_k \\sum_{\\hat{x}_j \\in \\mathcal{X}^k} \\hat{x}_j Support Vector Machine: We want the optimal linear separator, that is, the classifier most robust to noisy data, meaning the one with the largest margin to the training data. Modeling (for linearly separable problems): We want the margin to be as large as possible, $\\max\\limits_{\\mathbf w, b}\\rho(\\mathbf w, b)$, while all the data points are classified correctly, that is $y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b) \\geq 1$. The distance between two parallel hyperplanes $\\mathbf w^{\\rm T}\\mathbf x + b_1 = 0$ and $\\mathbf w^{\\rm T}\\mathbf x + b_2 = 0$ is $|b_1 - b_2| / ||\\mathbf w||$, and the distance between a point $\\mathbf x_0$ and a hyperplane $(\\mathbf w, b)$ is $|\\mathbf w^{\\rm T} \\mathbf x_0 + b| / ||\\mathbf w||$. 
Choose the points that are closest to the classifier; they satisfy $|\\mathbf w^{\\rm T} \\mathbf x_0 + b| = 1$, so for the closest points $\\mathbf x_1$ and $\\mathbf x_2$ on the two sides the margin is $\\rho = |\\mathbf w^{\\rm T} \\mathbf x_1 + b| / ||\\mathbf w|| + |\\mathbf w^{\\rm T} \\mathbf x_2 + b| / ||\\mathbf w|| = 2 / ||\\mathbf w||$. Thus we get the Hard-margin Support Vector Machine: \\max\\limits_{\\mathbf w, b}\\frac{2}{||\\mathbf w||} s.t. $y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b) \\geq 1, 1 \\leq i \\leq n$ For computational convenience, we convert it into the equivalent problem \\min\\limits_{\\mathbf w, b}\\frac{1}{2}||\\mathbf w||^2 s.t. $y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b) \\geq 1, 1 \\leq i \\leq n$ Modeling (for linearly non-separable problems): We add slack variables $\\xi_i \\geq 0$ that allow points to lie on the wrong side of the margin or of the decision boundary, and we penalize them with weight $C$. This gives the Soft-margin SVM: \\min\\limits_{\\mathbf w, b, \\xi}\\frac{1}{2}||\\mathbf w||^2 + C\\sum_{i=1}^n \\xi_i s.t. $y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b) \\geq 1 - \\xi_i, \\xi_i \\geq 0, 1 \\leq i \\leq n$ Using the hinge loss $\\ell_{\\text{hinge}}(t) = \\max(1-t, 0)$, we obtain the unconstrained form of the Soft-margin SVM: \\min\\limits_{\\mathbf w, b}\\frac{1}{2}||\\mathbf w||^2 + C\\sum_{i=1}^n \\ell_{\\text{hinge}}(y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b)) Optimization for Training. Lagrangian Function & KKT Conditions: Consider a constrained optimization problem \\min_{x \\in \\mathbb{R}^d} f(x), \\text{ s.t. } g_j(x) \\leq 0, \\forall j = 1, \\dots, J The Lagrangian function $L(x, \\mu)$ is defined as: L(x, \\mu) = f(x) + \\sum_{j = 1}^J \\mu_j g_j(x) The KKT conditions (necessary conditions for an optimum $x^{\\star}$) are, for $1 \\leq j \\leq J$: Primal feasibility: $g_j(x^{\\star}) \\leq 0$; Dual feasibility: $\\mu_j \\geq 0$; Complementary slackness: $\\mu_j g_j(x^{\\star}) = 0$; Lagrangian optimality: $\\nabla_x L(x^{\\star}, \\mu) = 0$. Dual Problem for Soft-margin SVM: For the Soft-margin Support Vector Machine: \\min\\limits_{\\mathbf w, b, \\xi}\\frac{1}{2}||\\mathbf w||^2 + C\\sum_{i=1}^n \\xi_i s.t. 
$y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b) \\geq 1 - \\xi_i, \\xi_i \\geq 0, 1 \\leq i \\leq n$ We have the Lagrangian function(with $2n$ inequality constraints): L(\\mathbf{w}, b, \\alpha, \\xi, \\mu) = \\frac{1}{2} \\|\\mathbf{w}\\|_2^2 + C \\sum_{i=1}^{n} \\xi_i + \\sum_{i=1}^{n} \\alpha_i [1 - \\xi_i - y_i (\\mathbf{w}^T \\mathbf{x}_i + b)] - \\sum_{i=1}^{n} \\mu_i \\xi_is.t. $\\alpha_i \\geq 0, \\mu_i \\geq 0, \\, i = 1, \\ldots, n$. take the partial derivatives of Lagrangian w.r.t $\\mathbf w, b, \\xi_i$ and set to zero \\begin{aligned} \\frac{\\partial L}{\\partial \\mathbf{w}} &= 0 \\implies \\mathbf{w} = \\sum_{i=1}^{n} \\alpha_i y_i \\mathbf{x}_i \\\\ \\frac{\\partial L}{\\partial b} &= 0 \\implies \\sum_{i=1}^{n} \\alpha_i y_i = 0 \\\\ \\frac{\\partial L}{\\partial \\xi_i} &= 0 \\implies C = \\alpha_i + \\mu_i, \\, i = 1, \\cdots, n \\\\ \\end{aligned}So that we got: L(\\mathbf{w}, b, \\alpha, \\xi, \\mu) = \\frac{1}{2} \\|\\mathbf{w}\\|_2^2 + C \\sum_{i=1}^{n} \\xi_i + \\sum_{i=1}^{n} \\alpha_i [1 - \\xi_i - y_i (\\mathbf{w}^T \\mathbf{x}_i + b)] - \\sum_{i=1}^{n} \\mu_i \\xi_i = \\frac{1}{2} \\mathbf{w}^T \\mathbf{w} + \\sum_{i=1}^{n} \\xi_i (C - \\alpha_i - \\mu_i) + \\sum_{i=1}^{n} \\alpha_i - \\sum_{i=1}^{n} \\alpha_i \\cdot y_i \\cdot \\mathbf{w}^T \\mathbf{x}_i - b \\sum_{i=1}^{n} \\alpha_i \\cdot y_i = \\frac{1}{2} \\left( \\sum_{i=1}^{n} \\alpha_i y_i \\mathbf{x}_i \\right)^T \\left( \\sum_{j=1}^{n} \\alpha_j y_j \\mathbf{x}_j \\right) + 0 + \\sum_{i=1}^{n} \\alpha_i - \\sum_{i=1}^{n} \\alpha_i \\cdot y_i \\cdot \\left( \\sum_{j=1}^{n} \\alpha_j y_j \\mathbf{x}_j \\right) x_i + 0 = \\frac{1}{2} \\left( \\sum_{i=1}^{n} \\sum_{j=1}^{n} \\alpha_i \\alpha_j y_i y_j \\mathbf{x}_i \\mathbf{x}_j \\right) + \\sum_{i=1}^{n} \\alpha_i - \\left( \\sum_{i=1}^{n} \\sum_{j=1}^{n} \\alpha_i \\alpha_j y_i y_j \\mathbf{x}_i \\mathbf{x}_j \\right) = \\sum_{i=1}^{n} \\alpha_i - \\frac{1}{2} \\left( \\sum_{i=1}^{n} \\sum_{j=1}^{n} \\alpha_i \\alpha_j y_i y_j 
\\mathbf{x}_i^T \\mathbf{x}_j \\right)

So we have the dual problem of Soft-SVM:
$$\\max_{\\alpha} \\sum_{i=1}^{n} \\alpha_i - \\frac{1}{2} \\sum_{i=1}^{n} \\sum_{j=1}^{n} \\alpha_i \\alpha_j y_i y_j \\mathbf{x}_i^T \\mathbf{x}_j$$
s.t. $\\sum_{i=1}^{n} \\alpha_i y_i = 0, \\quad 0 \\leq \\alpha_i \\leq C, \\, i = 1, \\ldots, n.$

After solving for $\\alpha$, we obtain $\\mathbf{w} = \\sum_{j=1}^n \\alpha_j y_j \\mathbf{x}_j$, and $b$ follows from any support vector with $0 < \\alpha_i < C$, for which $y_i (\\mathbf{w}^T \\mathbf{x}_i + b) = 1$.

Kernel Method for SVM

Linear SVM cannot handle data that are not linearly separable, so we map the original feature space to a higher-dimensional feature space where the training set is separable. We could simply substitute $x \\to \\phi(x)$, but computing $\\phi(x_i) \\cdot \\phi(x_j)$ explicitly in the high-dimensional space is expensive. The kernel trick avoids this by finding a function $k(x_i, x_j) = \\phi(x_i) \\cdot \\phi(x_j)$ that can be evaluated directly on the original inputs.

Some commonly used kernels:
- Linear kernel: $k(\\mathbf{x}, \\mathbf{x}_i) = (\\mathbf{x} \\cdot \\mathbf{x}_i)$
- Polynomial kernel: $k(\\mathbf{x}, \\mathbf{x}_i) = [(\\mathbf{x} \\cdot \\mathbf{x}_i) + 1]^q$
- Radial basis function kernel (a.k.a. RBF kernel, Gaussian kernel): $k(\\mathbf{x}, \\mathbf{x}_i) = \\exp \\left( -\\frac{\\|\\mathbf{x} - \\mathbf{x}_i\\|^2}{2\\sigma^2} \\right)$
- Sigmoid kernel: $k(\\mathbf{x}, \\mathbf{x}_i) = \\tanh (v(\\mathbf{x} \\cdot \\mathbf{x}_i) + c)$

The kernel trick can also be applied to other algorithms, such as k-NN, LDA, etc.

Decision Tree

We use a tree-like structure to deal with categorical features. For each node, we find the most useful feature, that is, the feature that best divides the data at that node.

ID3 Algorithm

We use entropy as the criterion:
$$H(D) = -\\sum_{k=1}^K \\frac{|C_k|}{|D|} \\log \\frac{|C_k|}{|D|}$$
A good split gives the minimal weighted average entropy of the child nodes:
$$\\frac{|D_1|}{|D|}H(D_1) + \\frac{|D_2|}{|D|}H(D_2)$$
For any split, the entropy of the parent node is constant. 
Minimizing the weighted entropy of the child nodes is therefore equivalent to maximizing the information gain (IG):
$$H(D) - \\frac{|D_1|}{|D|}H(D_1) - \\frac{|D_2|}{|D|}H(D_2)$$

C4.5 Algorithm

Information gain is strongly biased toward multivalued features, so C4.5 uses the information gain ratio (GR) to choose the optimal feature:
$$\\text{GR} = \\frac{\\text{Information Gain}}{\\text{Intrinsic Value}}$$
The intrinsic value (IV) penalizes multivalued features. For a selected feature $f$, its intrinsic value is:
$$IV(f) = -\\sum_{k=1}^{|V|}\\frac{|F_k|}{|D|} \\log \\frac{|F_k|}{|D|}$$
where $V$ is the set of all possible values of the feature $f$, and $F_k$ is the subset of $D$ where the value of the feature $f$ is $k$. Features with many possible values tend to have a large intrinsic value.

Classification and Regression Tree (CART)

A CART tree must be a binary tree.

Regression Tree

How do we divide the regions $R = \\{R_1, \\dots, R_m\\}$ and decide the values $V = \\{v_1, \\dots, v_m\\}$? We minimize the squared error over all examples $x_i$ with label $y_i$:
$$\\min_{R, V} l = \\min_{R, V} \\sum_{j = 1}^m \\sum_{x_i \\in R_j} (y_i - v_j)^2$$
Assume first that $R$ has been determined and find the optimal $V$. For a given region $R_j$, the value $v_j$ that minimizes the loss is the average of the labels of all samples belonging to region $R_j$:
$$v_j = \\frac{1}{|R_j|} \\sum_{x_i \\in R_j} y_i$$
Now for each feature $A$ and split threshold $a$, the parent node $R$ is split by $(A, a)$ into $R_1$ and $R_2$. We choose $(A, a)$ over all possible values to minimize:
$$l(A, a) = \\sum_{x_i \\in R_1} (y_i - v_1(A, a))^2 + \\sum_{x_i \\in R_2} (y_i - v_2(A, a))^2$$
where $v_1(A, a)$ and $v_2(A, a)$ are the region averages described above. 
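The regression-tree split search can be sketched in Python for a single numeric feature. This is only a minimal illustration under stated assumptions, not a full CART implementation: the names `sse` and `best_split` are hypothetical, and using midpoints between consecutive distinct feature values as candidate thresholds is a choice made here, not something the notes specify.

```python
def sse(ys):
    # Loss of one region when its value is the average label:
    # the sum of squared errors around the mean.
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(xs, ys):
    # Try each midpoint between consecutive distinct feature values as the
    # threshold a and return the (a, loss) pair with the smallest l(A, a).
    values = sorted(set(xs))
    best = (None, float('inf'))
    for lo, hi in zip(values, values[1:]):
        a = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= a]
        right = [y for x, y in zip(xs, ys) if x > a]
        loss = sse(left) + sse(right)
        if loss < best[1]:
            best = (a, loss)
    return best


# Two well-separated groups of labels: the split should fall between them.
threshold, loss = best_split([1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
                             [1.0, 1.2, 0.8, 5.0, 5.2, 4.8])
print(threshold, round(loss, 3))
```

A complete tree would repeat this search over every feature and recurse on the two child regions until a stopping rule is met.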
Classification Tree

The split criterion is now the Gini index:
$$\\text{Gini}(D) = 1 - \\sum_{k = 1}^K \\left(\\frac{|C_k|}{|D|}\\right)^2$$
We choose the feature $A$ and the threshold $a$ over all possible values with the maximal gain:
$$\\text{Gini}(D) - \\frac{|D_1|}{|D|} \\text{Gini}(D_1) - \\frac{|D_2|}{|D|} \\text{Gini}(D_2)$$

Ensemble Learning

Reduce the randomness (variance) by combining multiple learners.

Bagging (Bootstrap Aggregating)
- Create $M$ bootstrap datasets
- Train a learner on each dataset
- Ensemble the $M$ learners

Each bootstrap dataset is sampled uniformly from the original data $D$ with replacement and has the same size as $D$. The probability that a given example does not show up is
$$(1-\\frac{1}{n})^n \\approx \\frac{1}{e} \\approx 0.37$$
We use the elements that show up in $D$ but not in the bootstrap dataset as the validation set (the out-of-bag dataset).

Random Forest

Ensemble decision trees (training data with $d$ features):
- Create bootstrap datasets
- During tree construction, randomly sample $K$ ($K<d$) features as candidates for each split (usually $K = \\sqrt d$).

Random feature selection decorrelates the trees and makes them more diverse.

Boosting

Boosting trains learners sequentially. The current weak learner focuses more on the examples that previous weak learners misclassified. Weak classifiers $h_1, \\cdots, h_m$ are built sequentially. $h_m$ outputs ‘$+1$’ for one class and ‘$-1$’ for the other class. Classify by $g(x) = \\text{sgn}(\\sum_m \\alpha_m h_m(x))$.

AdaBoost

Core idea: give higher weights to the misclassified examples so that they account for half of the total training weight. 
(re-weighting)

Mathematical formulation:
- Weighted error: $\\epsilon_t = \\sum_{i=1}^n w_i \\cdot \\mathbf 1(y_i \\neq h_t(x_i))$
- Alpha calculation: $\\alpha_t = \\frac{1}{2} \\ln \\left( \\frac{1 - \\epsilon_t}{\\epsilon_t} \\right)$
- Weight update: $w_i \\leftarrow w_i \\exp(-\\alpha_t y_i h_t(x_i))$, followed by normalization so that $\\sum_{i=1}^n w_i = 1$
- Final hypothesis: $H(x) = \\text{sign} \\left( \\sum_{t=1}^T \\alpha_t h_t(x) \\right)$

Gradient Boosting

View boosting as an optimization problem. The criterion is to minimize the empirical loss:
$$\\arg \\min_{(\\alpha_1, \\ldots, \\alpha_t, h_1, \\ldots, h_t)} \\sum_{i=1}^{n} l \\left( y_i, \\sum_{s=1}^{t} \\alpha_s h_s(x_i) \\right)$$
The loss function $l$ depends on the task:
- Cross entropy for multi-class classification
- $\\text{L2}$ loss for regression

We use sequential training: optimize a single model at a time, that is, freeze $h_1, \\cdots, h_{t-1}$ and optimize $h_t$. (Let $f_{t-1}(x) = \\sum_{s=1}^{t-1} \\alpha_s h_s(x)$ denote the ensemble of the first $t-1$ learners.)

To choose $\\alpha_t$ and $h_t$, define:
$$u = (f_{t-1}(x_1), \\cdots, f_{t-1}(x_n))$$
$$\\Delta u = (h_t(x_1), \\cdots, h_t(x_n))$$
Consider the function $F(u) = \\sum_{i=1}^n l(y_i, u_i)$. The original objective is then equivalent to finding a direction $\\Delta u$ and a step size $\\alpha_t$ at the point $u$ that minimize:
$$F(u + \\alpha_t \\Delta u) = \\sum_{i=1}^n l(y_i, u_i + \\alpha_t \\Delta u_i)$$
Following gradient descent, we let $\\Delta u = -\\nabla_u F(u)$, thus
$$h_t(x_i) = -\\frac{\\partial F(u)}{\\partial u_i} = -\\left[ \\frac{\\partial l(y_i, u_i)}{\\partial u_i} \\right]_{u_i = f_{t-1}(x_i)}$$
How do we decide $\\alpha_t$? Use a one-dimensional search ($y_i, x_i, f_{t-1}, h_t$ are fixed):
$$\\alpha_t = \\arg\\min_{\\alpha} \\sum_{i=1}^{n} l(y_i, f_{t-1}(x_i) + \\alpha h_t(x_i))$$
For simplicity, the search for the optimal multiplier can be replaced by setting it to a constant.

In conclusion, Gradient Boosting = Gradient Descent + Boosting. 
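As a rough sketch of the gradient boosting loop, the Python below assumes the squared loss $l(y, u) = \\frac{1}{2}(y - u)^2$, whose negative gradient at $u_i = f_{t-1}(x_i)$ is simply the residual $y_i - f_{t-1}(x_i)$. The weak learner (a one-split regression stump on a single feature), the constant step size, and the names `fit_stump`, `predict_stump` and `gradient_boost` are all assumptions made for illustration.

```python
def fit_stump(xs, targets):
    # Weak learner: a one-split regression stump on a single feature.
    # Returns (threshold, left_value, right_value) minimizing squared error.
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        a = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= a]
        right = [t for x, t in zip(xs, targets) if x > a]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = (sum((t - lv) ** 2 for t in left)
               + sum((t - rv) ** 2 for t in right))
        if best is None or err < best[0]:
            best = (err, a, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    a, lv, rv = stump
    return lv if x <= a else rv


def gradient_boost(xs, ys, rounds=20, step=0.5):
    # Start from f_0 = 0. Each round fits h_t to the negative gradient of
    # 0.5 * (y - u)^2 at u = f_{t-1}(x_i), i.e. to the current residuals,
    # then takes a step of constant size instead of a line search.
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + step * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return stumps, preds


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
stumps, preds = gradient_boost(xs, ys)
mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
print(mse)
```

With a step size of 0.5 the residuals on this toy data halve every round, which is why the training error shrinks geometrically.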
Learning Theory

Empirical Risk Minimization (ERM)

Empirical risk: the average loss of the model $f$ on the training set $\\mathcal D = \\{(x_i, y_i)\\}^N_{i=1}$:
$$\\hat{R}(f) = \\frac{1}{N} \\sum_{i = 1}^N \\ell(f(x_i), y_i)$$
Empirical Risk Minimization (ERM): the learning algorithm selects the model that minimizes the empirical risk on the training dataset:
$$\\mathcal A(\\mathcal D, \\mathcal H) = \\arg \\min_{f \\in \\mathcal H} \\hat R(f)$$

The Consistency of Learning Process

We say a learning process is consistent if the minimizer of the empirical risk, in the infinite-data limit, converges to the minimum expected risk.

Overfitting and Bias-Variance Trade-off

Define the population loss (also called expected risk) as
$$R(f) = \\mathbb E_{(x, y) \\sim \\mu} \\ell(f(x), y)$$
and the generalization gap as $R(f) - \\hat R(f)$.

There are two important concepts for a predictive model:
- Bias: stems from the assumptions built into the model; it represents the extent to which the average prediction over all datasets differs from the desired function.
- Variance: the extent to which the model changes when the training data changes (can be understood as “stability” under dataset change).

Bias-Variance Trade-off: there is an intrinsic tension between bias and variance. The model’s test error contains the sum of both.

Bias-Variance Decomposition: suppose the ground truth function is $f^*$, the data distribution is $\\mu$, and the algorithm $\\mathcal{A}$ learns from hypothesis space $\\mathcal{H}$. 
We use $y(x; \\mathcal{D}) = \\mathcal{A}(\\mathcal{D}, \\mathcal{H})(x)$ to denote the output of the ERM model $\\hat{f} = \\mathcal{A}(\\mathcal{D}, \\mathcal{H})$ on input $x$. We are interested in the learned model’s prediction error on any $x$, namely
$$[y(x; \\mathcal{D}) - f^*(x)]^2 = \\{y(x; \\mathcal{D}) - \\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})] + \\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})] - f^*(x)\\}^2$$
$$= \\{y(x; \\mathcal{D}) - \\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})]\\}^2 + \\{\\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})] - f^*(x)\\}^2 + 2\\{y(x; \\mathcal{D}) - \\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})]\\}\\{\\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})] - f^*(x)\\}$$
Taking the expectation over all possible datasets $\\mathcal{D}$, the last (cross) term is zero, so
$$\\mathbb{E}_{\\mathcal{D}}\\{[y(x; \\mathcal{D}) - f^*(x)]^2\\} = \\{\\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})] - f^*(x)\\}^2 + \\mathbb{E}_{\\mathcal{D}}[\\{y(x; \\mathcal{D}) - \\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})]\\}^2] = (\\text{bias})^2 + \\text{variance}$$

Regularization refers to techniques used to calibrate machine learning models in order to prevent overfitting. It restricts learning to a small subset of solutions that are more regular (punishing parameters for behaving abnormally) to reduce the variance.

Generalization Error and Regularization

VC dimension

The VC dimension is a measure of complexity for a hypothesis class: it is the largest integer $d$ for a binary classification hypothesis class $\\mathcal H$ such that there exist $d$ points in the input space $\\mathcal X$ that can be perfectly classified by some function $h \\in \\mathcal H$ no matter how labels are assigned to these $d$ points. The VC dimension characterizes the model class’s capacity for fitting random labels. 
Generalization Error Bound

If a hypothesis class $\\mathcal{H}$ has VC dimension $d_{vc}$, a theorem states that, with probability at least $1 - \\delta$ over $m$ samples, the generalization gap of any model $h \\in \\mathcal{H}$ is bounded as
$$R(h) \\leq \\hat{R}(h) + \\sqrt{\\frac{8d_{vc} \\ln\\left(\\frac{2em}{d_{vc}}\\right) + 8 \\ln\\left(\\frac{4}{\\delta}\\right)}{m}}$$

Bayesian Decision

Bayesian decision: find an optimal classifier according to the prior probability and the class-conditional probability density of the feature.
- The a priori or prior probability reflects our knowledge of how likely we expect a certain state of nature to be before we can actually observe it.
- The class-conditional probability density function is the probability density function $P(x|\\omega)$ for our feature $x$, given that the state/class is $\\omega$.
- The posterior probability is the probability of a certain state/class given our observable feature $x$: $P(\\omega | x)$.

Minimum prediction error principle: the optimal classifier $f(\\cdot)$ should minimize the expected prediction error, defined as
$$P(\\text{error}) = \\int \\sum_{\\omega_j \\neq f(x)} P(x, \\omega_j) \\, dx$$
So, for each $x$, we want
$$f(x) = \\arg\\min_{\\omega_i} \\sum_{\\omega_j \\neq \\omega_i} P(x, \\omega_j) = \\arg\\min_{\\omega_i} [P(x) - P(x, \\omega_i)]$$
$$f(x) = \\arg\\max_{\\omega_i} P(x, \\omega_i) = \\arg\\max_{\\omega_i} P(\\omega_i | x)$$
Therefore, the classifier just needs to pick the class with the largest posterior probability.

We could use a decision threshold $\\theta$ for deciding. We can also decline to make decisions on difficult cases in anticipation of a high error rate on those examples.

Density estimation

We need a method to estimate the distribution of each feature; this is called density estimation. 
Parametric Density Estimation Method

We can assume that the density function follows some form, for example:
$$P(x|\\omega_i) = \\frac{1}{\\sqrt{2\\pi}\\sigma_i}e^{-\\frac{(x-\\mu_i)^2}{2\\sigma_i^2}}$$
The unknown $\\theta_i = (\\mu_i, \\sigma_i)$ are called the parameters.

Maximum Likelihood Estimation (MLE)

Likelihood function: $p(x|\\theta)$ measures the likelihood of a parametrized distribution to generate a sample $x$.

Maximum Likelihood Estimation (MLE): choose the parameter $\\theta$ that maximizes the likelihood function for all the samples.

For example, if we use a Gaussian to model $X = \\{x_i\\}_{i=1}^N$, MLE gives the result as
$$\\mu, \\sigma = \\arg\\max_{\\mu, \\sigma} \\prod_{i=1}^{N} \\frac{1}{\\sqrt{2\\pi}\\sigma} e^{-\\frac{(x_i - \\mu)^2}{2\\sigma^2}}$$
For the sake of simplicity, denote $H(\\theta) = \\ln p(X|\\theta) = \\sum_{i=1}^{N} \\ln p(x_i|\\theta)$. Then
$$\\frac{dH}{d\\mu} = 0 \\implies \\sum_{i=1}^{N} \\frac{1}{\\sigma^2} (x_i - \\mu) = 0 \\implies \\mu = \\frac{1}{N} \\sum_{i=1}^{N} x_i,$$
$$\\frac{dH}{d\\sigma} = 0 \\implies -\\sum_{i=1}^{N} \\frac{1}{\\sigma} + \\sum_{i=1}^{N} \\frac{(x_i - \\mu)^2}{\\sigma^3} = 0 \\implies \\sigma^2 = \\frac{1}{N} \\sum_{i=1}^{N} (x_i - \\mu)^2.$$

Non-parametric Density Estimation Method

A non-parametric method makes few assumptions about the form of the distribution and does not involve any parameter describing the density function’s form.

Suppose we draw $N$ samples in total, of which $K$ fall within a small region $R$ of volume $V$ around $x$. The samples are independent and identically distributed, so whether each sample falls in $R$ follows a Bernoulli distribution with parameter $P_R$, the probability mass of $R$. We have $p(x) \\approx \\frac{P_R}{V} \\approx \\frac{K}{NV}$.

We could apply kernel methods to it.

Hidden Markov Models (HMMs)

Understanding Bayes’ rule: $p(H|E)=\\frac{p(E|H)P(H)}{P(E)}$
- Prior $P(H)$: How probable was our hypothesis before observing the evidence?
- Likelihood $p(E|H)$: How probable is the evidence given that our hypothesis is true?
- Marginal $P(E)$: How probable is the new evidence? 
Notation and explanation:
- $Q = \\{q_1, \\ldots, q_n\\}$: The set of $n$ hidden states.
- $V = \\{v_1, \\ldots, v_v\\}$: The set of all possible observed values.
- $A = [a_{ij}]_{n \\times n}$: Transition matrix. $a_{ij}$ is the probability of transitioning from state $i$ to state $j$. $\\sum_{j=1}^n a_{ij} = 1 \\, \\forall i$.
- $O = o_1 o_2 \\cdots o_L$: Observed sequence. $o_t \\in V$.
- $x = x_1 x_2 \\cdots x_L$: Hidden state sequence. $x_t \\in Q$.
- $E = [e_{ij}]_{n \\times v}$: Emission probability matrix. $e_{ij} = P(o = v_j \\mid x = q_i)$ is the probability of observing $v_j$ at state $q_i$. $\\sum_{j=1}^v e_{ij} = 1 \\, \\forall i$.
- $\\pi = [\\pi_1, \\pi_2, \\ldots, \\pi_n]$: Start probability distribution. $\\pi_i$ is the probability of the Markov chain starting from state $i$. $\\sum_{i=1}^n \\pi_i = 1$.

Question #1 – Evaluation

The evaluation problem in HMM: given a model $M$ and an observed sequence $O$, calculate the probability of the observed sequence $P(O|M)$.

Forward Algorithm

Denote $\\alpha_t(j)$ as the probability of observing $o_1 o_2 \\ldots o_t$ with the hidden state at $t$ being $q_j$:
$$\\alpha_t(j) = p(o_1 o_2 \\ldots o_t, x_t = q_j)$$
Summing over the previous hidden state, $\\alpha_t(j)$ can be rewritten as:
$$\\alpha_t(j) = e_j(o_t) \\times \\sum_{i=1}^{n} \\alpha_{t-1}(i) a_{ij}$$
- Define initial values: $\\alpha_1(j) = e_j(o_1) \\times \\pi_j, \\quad j = 1, \\cdots, n$
- Iterative solving: $\\alpha_t(j) = e_j(o_t) \\times \\sum_{i=1}^{n} \\alpha_{t-1}(i) a_{ij}, \\quad t = 2:L$
- Obtaining results: $p(O) = \\sum_{i=1}^{n} \\alpha_L(i)$

Backward Algorithm

Denote $\\beta_t(j)$ as the probability of observing $o_{t+1} o_{t+2} \\ldots o_L$ given that the hidden state at $t$ is $q_j$:
$$\\beta_t(j) = p(o_{t+1} o_{t+2} \\ldots o_L \\mid x_t = q_j)$$
Summing over the next hidden state, $\\beta_t(j)$ can be rewritten as:
$$\\beta_t(j) = \\sum_{i=1}^{n} a_{ji} e_i(o_{t+1}) \\beta_{t+1}(i)$$
- Define initial values: $\\beta_L(j) = 1, \\quad j = 1:n \\quad (L + 1 \\text{ is terminal state})$
- Iterative solving: $\\beta_t(j) = \\sum_{i=1}^{n} a_{ji} e_i(o_{t+1}) \\beta_{t+1}(i), \\quad t = L-1, \\ldots, 1, \\quad j = 1:n$
- Obtaining results: $p(O) = \\sum_{i=1}^{n} \\pi_i e_i(o_1) \\beta_1(i)$

Question #2 – Decoding

The decoding problem in HMM: given a model $M$ and an observed sequence $O$, calculate the most probable hidden state sequence $\\mathbf{x}^* = \\arg\\max_{\\mathbf{x}} p(\\mathbf{x}, O | M)$.

Define:
$$v_t(j) = \\max_{x_1 \\ldots x_{t-1}} p(x_1 \\ldots x_{t-1}, o_1 \\ldots o_t, x_t = q_j)$$
According to the recurrence relation, rewrite the above as:
$$v_t(j) = \\max_{i=1}^n v_{t-1}(i) a_{ij} e_j(o_t)$$
The best predecessor of state $q_j$ at time $t$ is recorded as a back-pointer:
$$pa_t(j) = \\arg\\max_{i=1}^n v_{t-1}(i) a_{ij} e_j(o_t)$$

Viterbi Algorithm
- Define initial values: $v_1(j) = e_j(o_1) \\times \\pi_j, \\quad pa_1(j) = 0, \\quad j = 1:n$
- Iterative solving, for $t = 2:L$: $v_t(j) = \\max_{i=1}^n v_{t-1}(i) a_{ij} e_j(o_t)$, $pa_t(j) = \\arg\\max_{i=1}^n v_{t-1}(i) a_{ij} e_j(o_t)$
- Obtaining results: $p^* = \\max_{i=1:n} v_L(i)$, $x^*_L = \\arg\\max_{i=1:n} v_L(i)$, then backtrack with $x^*_t = pa_{t+1}(x^*_{t+1})$ for $t = L-1, \\ldots, 1$

Computational complexity: $O(n^2 L)$

Question #3 – Learning

The learning problem in HMM: given an observed sequence $O$, estimate the parameters of the model: $M = \\arg \\max \\limits_{M}P(O|M)$. For simplicity, in the following steps we only present the learning process of the transition matrix $A$. (The other parameters can be learned in a similar manner.)

Baum-Welch Algorithm (a special case of the EM algorithm)
- Expectation step (E-step): using the observed data, we estimate (guess) the values of the missing data with the current parameters $\\theta_{\\text{old}}$.
- Maximization step (M-step): using the complete data generated after the E-step, we update the parameters of the model.

E-step

(#$T_{ij}$ denotes the number of times the hidden state transitions from $q_i$ to $q_j$.) Generate the guesses of #$T_{ij}$, i.e., the expected counts:
$$\\text{Expected Counts} = \\sum_{t=1}^{L-1} p(x_t = q_i, x_{t+1} = q_j \\mid O, \\theta_{\\text{old}})$$
Each term can be computed with the forward and backward algorithms: $p(x_t = q_i, x_{t+1} = q_j \\mid O, \\theta_{\\text{old}}) = \\frac{\\alpha_t(i) a_{ij} e_j(o_{t+1}) \\beta_{t+1}(j)}{p(O)}$. 
M-stepGenerate new estimations with the expected counts: \\hat{a}_{ij} = \\frac{\\sum_{t=1}^{L-1} p(x_t = q_i, x_{t+1} = q_j \\mid O, \\theta_{\\text{old}})}{\\sum_{t=1}^{L-1} \\left( \\sum_{j'} p(x_t = q_i, x_{t+1} = q_{j'} \\mid O, \\theta_{\\text{old}}) \\right)}Estimation when hidden state is unknown. Iterative Solving: Recalculate the expected counts with newly estimated parameters (E-step). Then generate newer estimations of $\\theta$ with (M-step). Repeat until convergence. Bayesian NetworksNaive BayesNaïve Bayes Assumption: Features $X_i$ are independent given class $Y$: P_\\theta(X_1, \\ldots, X_n \\mid Y) = \\prod_i P_\\theta(X_i \\mid Y)Inference: the label can be easily predicted with Bayes’ rule Y^* = \\arg\\max_Y \\prod_i P_\\theta(X_i \\mid Y) P(Y)$Y^*$ is the value that maximizes Likelihood $\\times$ Prior. When the number of samples is small, it is likely to encounter cases where $\\text{Count}(Y = y) = 0$ or $\\text{Count}(X_i = x, Y = y) = 0$. So we use Laplace Smoothing. The parameters of Naïve Bayes can be learned by counting: Prior: P(Y = y) = \\frac{\\text{Count}(Y = y) + 1}{\\sum_{y'} \\text{Count}(Y = y') + C} Observation Distribution P(X_i = x \\mid Y = y) = \\frac{\\text{Count}(X_i = x, Y = y) + 1}{\\sum_{x'} \\text{Count}(X_i = x', Y = y) + S}Here, $C$ is the number of classes, $S$ is the number of possible values that $X_i$ can take. Learning & Decision on BNBayesian NetworkBN$(G, \\Theta)$: a Bayesian network $G$ is a DAG with nodes and directed edges. Each node represents a random variable. Each edge represents a causal relationship/dependency. $\\Theta$ is the network parameters that constitute conditional probabilities. For a node $t$, its parameters are represented as $p(x_t \\mid x_{\\text{pa}(t)})$. Joint probability of BN: p(x) = \\prod_{t=1}^{n} p(x_t \\mid x_{\\text{pa}(t)})where $\\text{pa}(t)$ is the set of all parent nodes of node $t$. 
\\begin{aligned} \\begin{array}{ccc} & D \\\\ & \\downarrow \\\\ & A \\rightarrow B \\rightarrow C \\end{array} \\end{aligned} P(A, B, C, D) = P(A) P(D) P(B \\mid A, D) P(C \\mid B)Learning on Bayesian NetworkNotation: Suppose BN has $n$ nodes, we use $\\text{pa}(t)$ to denote the parent nodes of $t$ $(t = 1, \\ldots, n)$ By the conditional independence of BN, we have p(D \\mid \\Theta) = \\prod_{i=1}^{N} p(x_i \\mid \\Theta) = \\prod_{i=1}^{N} \\prod_{t=1}^{n} p(x_{i,t} \\mid x_{i,\\text{pa}(t)}, \\theta_t) = \\prod_{t=1}^{n} \\prod_{i=1}^{N} p(D_{i,t} \\mid \\theta_t) p(\\Theta) = \\prod_{t=1}^{n} p(\\theta_t)Thus, the posterior becomes: p(\\Theta \\mid D) \\sim \\prod_{t=1}^{n} p(D_t \\mid \\theta_t) p(\\theta_t) p(\\theta \\mid D) \\sim \\prod_{t=1}^{n} \\prod_{c=1}^{q_t} p(D_{tc} \\mid \\theta_{tc}) \\cdot p(\\theta_{tc})Learning BN with Categorical DistributionConsider a case where each probability distribution in BN is categorical, In this case, we can model the conditional distribution of node $t$ as(We use a scalar value $c$ to represent parent nodes’ states for simplicity.): P(x_t = k \\mid x_{\\text{pa}(t)} = c) = \\theta_{tck}and the conditional probability of node $t$ can be denoted as: \\theta_{tc} = [\\theta_{tc1}, \\theta_{tc2}, \\ldots, \\theta_{tcK_t}], \\quad \\sum_{k=1}^{K_t} \\theta_{tck} = 1Categorical Distribution: p = [\\theta_1, \\theta_2, \\ldots, \\theta_d], \\quad \\theta_i \\geq 0, \\quad \\sum_{i} \\theta_i = 1E.g., toss a coin $(d = 2)$, roll a die $(d = 6)$ Count the training samples where $x_t = k, x_{\\text{pa}(t)} = c$: N_{tck} = \\sum_{i=1}^{N} I(x_{i,t} = k, x_{i,\\text{pa}(t)} = c)According to the property of categorical distribution, we can represent the likelihood function as: p(D_t \\mid \\theta_t) = \\prod_{c=1}^{q_t} \\prod_{k=1}^{K_t} \\theta_{tck}^{N_{tck}} = \\prod_{c=1}^{q_t} p(D_{tc} \\mid \\theta_{tc})Thus the posterior can be further factorized: p(\\theta \\mid D) \\sim \\prod_{t=1}^{n} p(D_t \\mid 
\\theta_t)p(\\theta_t) = \\prod_{t=1}^{n} \\prod_{c=1}^{q_t} p(D_{tc} \\mid \\theta_{tc})p(\\theta_{tc})

Notation:
- $D_{tc}$ is the set of samples where the value of $x_{\\text{pa}(t)}$ is $c$
- $q_t$ is the number of possible values of $x_{\\text{pa}(t)}$
- $K_t$ is the number of possible values of $x_t$

How do we choose the probability distribution function for the prior $p(\\theta_{tc})$? It would be highly convenient if the posterior shared the same form as the prior.

Conjugate prior: a prior distribution is called a conjugate prior for a likelihood function if the posterior distribution is in the same probability distribution family as the prior.

The conjugate prior for the categorical distribution is the Dirichlet distribution, so we choose it as the prior:
$$p(\\theta_{tc}) \\propto \\prod_{k=1}^{K_t} \\theta_{tck}^{\\alpha_{tck} - 1}$$
The $\\alpha_{tck}$ are positive pseudo-counts and are the hyperparameters of the BN model.

In this case, the posterior can be easily derived as:
$$p(D_{tc} \\mid \\theta_{tc}) p(\\theta_{tc}) \\propto \\left( \\prod_{k=1}^{K_t} \\theta_{tck}^{N_{tck}} \\right) \\cdot \\left( \\prod_{k=1}^{K_t} \\theta_{tck}^{\\alpha_{tck} - 1} \\right) = \\prod_{k=1}^{K_t} \\theta_{tck}^{N_{tck} + \\alpha_{tck} - 1}$$
We can then derive an estimate of $\\theta_{tck}$ by calculating the posterior expectation:
$$\\hat{\\theta}_{tck} = E(\\theta_{tck}) = \\frac{N_{tck} + \\alpha_{tck}}{\\sum_{k'} (N_{tck'} + \\alpha_{tck'})}$$

K-Means Algorithm

- Initialize cluster centers $\\mu_1, \\cdots, \\mu_k$ randomly. 
- Repeat until the cluster assignments no longer change:
  - Assignment step: assign each data point to its closest cluster center: $C_k \\leftarrow \\{n \\mid x_n \\text{ is closest to } \\mu_k\\}$
  - Update step: move each cluster center to the average of its assigned points: $\\mu_k \\leftarrow \\frac{1}{|C_k|} \\sum_{n \\in C_k} x_n$

Optimization View of K-Means

Optimization objective: within-cluster sum of squares (WCSS)
$$\\min_{\\mu, r} J_e = \\sum_{k=1}^{K} \\sum_{n=1}^{N} r_{n,k} \\| x_n - \\mu_k \\|^2$$
Step 1: Fix $\\mu$, optimize $r$
$$r_{n,k^*} = 1 \\quad \\Leftrightarrow \\quad k^* = \\arg\\min_k \\| x_n - \\mu_k \\|$$
Step 2: Fix $r$, optimize $\\mu$
$$\\mu_k^* = \\frac{\\sum_{n} r_{n,k} x_n}{\\sum_{n} r_{n,k}} = \\frac{1}{|C_k|} \\sum_{n \\in C_k} x_n$$

Rules of Thumb for Initializing K-Means
- Random initialization: randomly generate $k$ points in the space.
- Random partition initialization: randomly group the data into $k$ clusters and use their cluster centers to initialize the algorithm.
- Forgy initialization: randomly select $k$ samples from the data.
- K-Means++: iteratively choose new centroids that are far from the existing centroids (each new centroid is sampled with probability proportional to its squared distance from the nearest existing centroid).

How to tell the right number of clusters?

Plot $J_e$ against the number of clusters and pick the elbow point of the curve. 
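The alternating assignment and update steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function name `kmeans` is hypothetical, the initial centers are passed in explicitly (so the run is deterministic) rather than drawn with one of the initialization schemes above, and an empty cluster simply keeps its old center.

```python
def kmeans(points, centers, max_iter=100):
    # points and centers are lists of equal-length tuples of floats.
    centers = list(centers)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the closest center for each point
        # (squared Euclidean distance gives the same argmin).
        new_assignment = [
            min(range(len(centers)),
                key=lambda k: sum((p - c) ** 2
                                  for p, c in zip(x, centers[k])))
            for x in points
        ]
        if new_assignment == assignment:
            break  # no change of cluster assignment: converged
        assignment = new_assignment
        # Update step: move each center to the mean of its assigned points.
        for k in range(len(centers)):
            members = [x for x, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, assignment


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, assignment = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers, assignment)
```

Each iteration can only decrease $J_e$, so the loop stops after finitely many steps, although the result may be a local optimum that depends on the initial centers.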
EM Algorithm for Gaussian Mixture Model (GMM)

Multivariate Gaussian Distribution

$d$-dimensional multivariate Gaussian:
$$N(x \\mid \\mu, \\Sigma) = \\frac{1}{(2\\pi)^{d/2} |\\Sigma|^{1/2}} \\exp \\left( -\\frac{1}{2} (x - \\mu)^T \\Sigma^{-1} (x - \\mu) \\right)$$
- $\\mu \\in \\mathbb{R}^d$: the mean vector
- $\\Sigma \\in \\mathbb{R}^{d \\times d}$: the covariance matrix

MLE of Gaussian Distribution

The likelihood function of a given dataset $X = \\{x_1, x_2, \\ldots, x_N\\}$:
$$p(X \\mid \\mu, \\Sigma) = \\prod_{n=1}^{N} p(x_n \\mid \\mu, \\Sigma) = \\prod_{n=1}^{N} \\frac{1}{(2\\pi)^{d/2} |\\Sigma|^{1/2}} \\exp \\left( -\\frac{1}{2} (x_n - \\mu)^T \\Sigma^{-1} (x_n - \\mu) \\right)$$
The maximum likelihood estimate (MLE) of the parameters is defined by:
$$\\mu^*, \\Sigma^* = \\arg\\max_{\\mu, \\Sigma} \\mathcal{L}(\\mu, \\Sigma)$$
where, up to an additive constant that does not depend on the parameters,
$$\\mathcal{L}(\\mu, \\Sigma) = \\log p(X \\mid \\mu, \\Sigma) = -\\frac{N}{2} \\log |\\Sigma| - \\frac{1}{2} \\sum_{n=1}^{N} (x_n - \\mu)^T \\Sigma^{-1} (x_n - \\mu)$$
Solve the optimization by setting the gradients to zero:
$$0 = \\frac{\\partial \\mathcal{L}}{\\partial \\mu} = \\sum_{n=1}^{N} \\Sigma^{-1} (x_n - \\mu) \\quad \\Rightarrow \\quad \\mu^* = \\frac{1}{N} \\sum_{n=1}^{N} x_n \\quad \\text{(Sample Mean)}$$
$$0 = \\frac{\\partial \\mathcal{L}}{\\partial \\Sigma^{-1}} = \\frac{N}{2} \\Sigma - \\frac{1}{2} \\sum_{n=1}^{N} (x_n - \\mu)(x_n - \\mu)^T \\quad \\Rightarrow \\quad \\Sigma^* = \\frac{1}{N} \\sum_{n=1}^{N} (x_n - \\mu^*)(x_n - \\mu^*)^T \\quad \\text{(Sample Covariance)}$$

Gaussian Mixture Model (GMM)

A Gaussian Mixture Model (GMM) is the weighted sum of a family of Gaussians whose density function has the form:
$$p(x \\mid \\pi, \\mu, \\Sigma) = \\sum_{k=1}^{K} \\pi_k N(x \\mid \\mu_k, \\Sigma_k)$$
Each Gaussian $N(\\mu_k, \\Sigma_k)$ is called a 
component of GMM. Scalars $\\{\\pi_k\\}_{k=1}^{K}$ are referred to as mixing coefficients, which satisfy \\sum_{k=1}^{K} \\pi_k = 1This condition ensures $p(x \\mid \\pi, \\mu, \\Sigma)$ is indeed a density function. Soft Clustering with Mixture Model p(z = k) = \\pi_k, \\quad p(x \\mid z) = N(x \\mid \\mu_z, \\Sigma_z)By Bayes Rule, the posterior probability of $z$ given $x$ is: \\gamma_k \\overset{\\Delta}{=} p(z = k \\mid x) = \\frac{p(z = k, x)}{p(x)} = \\frac{\\pi_k N(x \\mid \\mu_k, \\Sigma_k)}{\\sum_{j=1}^{K} \\pi_j N(x \\mid \\mu_j, \\Sigma_j)}We call $\\gamma_k$ the responsibility of the $k$-th component on the data $x$. Probabilistic Clustering: each data point is assigned a probability distribution over the clusters. “$x$ belongs to the $k$-th cluster with probability $\\gamma_k$” MLE for Gaussian Mixture ModelLog-likelihood function of GMM \\log p(X \\mid \\pi, \\mu, \\Sigma) = \\sum_{n=1}^{N} \\log \\left( \\sum_{k=1}^{K} \\pi_k N(x_n \\mid \\mu_k, \\Sigma_k) \\right)Maximum Likelihood Estimation \\max_{\\pi, \\mu, \\Sigma} \\mathcal{L}(\\pi, \\mu, \\Sigma) = \\sum_{n=1}^{N} \\log \\left( \\sum_{k=1}^{K} \\pi_k N(x_n \\mid \\mu_k, \\Sigma_k) \\right)subject to: \\sum_{k=1}^{K} \\pi_k = 1Optimality Condition for $\\mu$ N(x \\mid \\mu, \\Sigma) = \\frac{1}{(2\\pi)^{d/2} |\\Sigma|^{1/2}} \\exp \\left( -\\frac{1}{2} (x - \\mu)^T \\Sigma^{-1} (x - \\mu) \\right), \\frac{\\partial x^T A x}{\\partial x} = (A + A^T) x \\max_{\\pi, \\mu, \\Sigma} \\mathcal{L}(\\pi, \\mu, \\Sigma) = \\sum_{n=1}^{N} \\log \\left( \\sum_{k=1}^{K} \\pi_k N(x_n \\mid \\mu_k, \\Sigma_k) \\right)Take partial derivative with respect to $\\mu_k$, 0 = \\frac{\\partial \\mathcal{L}}{\\partial \\mu_k} = -\\sum_{n=1}^{N} \\frac{\\pi_k N(x_n \\mid \\mu_k, \\Sigma_k)}{\\sum_j \\pi_j N(x_n \\mid \\mu_j, \\Sigma_j)} \\Sigma_k^{-1} (x_n - \\mu_k)Notice that the posterior of $z_n$ (also known as responsibility $\\gamma_{n,k}$) can be written as \\gamma_{n,k} \\overset{\\Delta}{=} p(z_n = k \\mid 
x_n) = \\frac{p(z_n = k) p(x_n \\mid z_n = k)}{\\sum_j p(z_n = j) p(x_n \\mid z_n = j)} = \\frac{\\pi_k N(x_n \\mid \\mu_k, \\Sigma_k)}{\\sum_j \\pi_j N(x_n \\mid \\mu_j, \\Sigma_j)}Thus 0 = \\sum_{n=1}^{N} \\gamma_{n,k} (x_n - \\mu_k) \\mu_k = \\frac{1}{N_k} \\sum_{n=1}^{N} \\gamma_{n,k} x_n, \\text{ where } N_k = \\sum_{n=1}^{N} \\gamma_{n,k}Optimality Condition for $\\Sigma$ \\max_{\\pi, \\mu, \\Sigma} \\mathcal{L}(\\pi, \\mu, \\Sigma) = \\sum_{n=1}^{N} \\log \\left( \\sum_{k=1}^{K} \\pi_k N(x_n \\mid \\mu_k, \\Sigma_k) \\right) \\gamma_{n,k} = p(z_n = k \\mid x_n) = \\frac{\\pi_k N(x_n \\mid \\mu_k, \\Sigma_k)}{\\sum_j \\pi_j N(x_n \\mid \\mu_j, \\Sigma_j)}, \\quad N_k \\overset{\\Delta}{=} \\sum_{n=1}^{N} \\gamma_{n,k}Similarly, take derivative with respect to $\\Sigma_k$, which yields 0 = \\frac{\\partial \\mathcal{L}}{\\partial \\Sigma_k} \\quad \\Rightarrow \\quad \\Sigma_k = \\frac{1}{N_k} \\sum_{n=1}^{N} \\gamma_{n,k} (x_n - \\mu_k)(x_n - \\mu_k)^TResponsibility-reweighted Sample Covariance Optimality Condition for $\\pi$ \\max_{\\pi, \\mu, \\Sigma} \\mathcal{L}(\\pi, \\mu, \\Sigma) = \\sum_{n=1}^{N} \\log \\left( \\sum_{k=1}^{K} \\pi_k N(x_n \\mid \\mu_k, \\Sigma_k) \\right) \\gamma_{n,k} = p(z_n = k \\mid x_n) = \\frac{\\pi_k N(x_n \\mid \\mu_k, \\Sigma_k)}{\\sum_j \\pi_j N(x_n \\mid \\mu_j, \\Sigma_j)}, \\quad N_k \\overset{\\Delta}{=} \\sum_{n=1}^{N} \\gamma_{n,k}Constraints of mixing coefficients $\\pi$: $\\sum_{k=1}^{K} \\pi_k = 1$ Introduce Lagrange multiplier: \\mathcal{L}' = \\mathcal{L} + \\lambda \\left( \\sum_{k=1}^{K} \\pi_k - 1 \\right)Take derivative with respect to $\\pi_k$, which gives 0 = \\frac{\\partial \\mathcal{L}'}{\\partial \\pi_k} \\quad \\Rightarrow \\quad \\sum_{n=1}^{N} \\frac{\\gamma_{n,k}}{\\pi_k} + \\lambda = \\frac{N_k}{\\pi_k} + \\lambda \\quad \\Rightarrow \\quad \\pi_k = \\frac{-N_k}{\\lambda}By the constraints, we have $1 = \\sum_{k=1}^{K} \\pi_k = \\frac{-1}{\\lambda} \\sum_{k=1}^{K} N_k$, Also notice that 
$\\sum_{k=1}^{K} N_k = \\sum_{k=1}^{K} \\sum_{n=1}^{N} \\gamma_{n,k} = \\sum_{n=1}^{N} \\sum_{k=1}^{K} \\gamma_{n,k} = \\sum_{n=1}^{N} 1 = N$. Therefore,
$$\\lambda = -\\sum_{k=1}^{K} N_k = -N, \\quad \\pi_k = \\frac{N_k}{N}$$

Expectation-Maximization (EM) Algorithm

1. Initialize $\\pi_k, \\mu_k, \\Sigma_k, \\quad k = 1, 2, \\ldots, K$
2. E-step: evaluate the responsibilities using the current parameter values: $\\gamma_{n,k} = p(z_n = k \\mid x_n) = \\frac{\\pi_k N(x_n \\mid \\mu_k, \\Sigma_k)}{\\sum_j \\pi_j N(x_n \\mid \\mu_j, \\Sigma_j)}$
3. M-step: re-estimate the parameters using the current responsibilities: $\\mu_k^{\\text{new}} = \\frac{1}{N_k} \\sum_{n=1}^{N} \\gamma_{n,k} x_n$, $\\Sigma_k^{\\text{new}} = \\frac{1}{N_k} \\sum_{n=1}^{N} \\gamma_{n,k} (x_n - \\mu_k^{\\text{new}})(x_n - \\mu_k^{\\text{new}})^T$, $\\pi_k^{\\text{new}} = \\frac{N_k}{N}$, where $N_k = \\sum_{n=1}^{N} \\gamma_{n,k}$
4. Return to step 2 if the convergence criterion is not satisfied.

Hierarchical Clustering

Distance function: the distance function affects which pairs of clusters are merged/split and in what order.
- Single linkage: $d(C_i, C_j) = \\min_{x \\in C_i, y \\in C_j} d(x, y)$
- Complete linkage: $d(C_i, C_j) = \\max_{x \\in C_i, y \\in C_j} d(x, y)$
- Average linkage: $d(C_i, C_j) = \\frac{1}{|C_i| \\cdot |C_j|} \\sum_{x \\in C_i, y \\in C_j} d(x, y)$

Two Types of Hierarchical Clustering
- Bottom-up (agglomerative): start with each item in its own cluster and find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
- Top-down (divisive): start with one all-inclusive cluster and consider every possible way to divide the cluster in two. Choose the best division and recursively operate on both sides. 
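The three linkage functions can be written down directly from their definitions. The sketch below assumes Euclidean distance as the point-level metric $d(x, y)$ and represents each cluster as a list of coordinate tuples; the function names are hypothetical.

```python
from itertools import product


def dist(x, y):
    # Point-level metric d(x, y); Euclidean distance is an assumption here.
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def single_linkage(ci, cj):
    # Distance between the closest pair of points across the two clusters.
    return min(dist(x, y) for x, y in product(ci, cj))


def complete_linkage(ci, cj):
    # Distance between the farthest pair of points across the two clusters.
    return max(dist(x, y) for x, y in product(ci, cj))


def average_linkage(ci, cj):
    # Mean distance over all cross-cluster pairs.
    total = sum(dist(x, y) for x, y in product(ci, cj))
    return total / (len(ci) * len(cj))


ci = [(0.0, 0.0), (0.0, 1.0)]
cj = [(3.0, 0.0), (4.0, 0.0)]
single = single_linkage(ci, cj)
complete = complete_linkage(ci, cj)
average = average_linkage(ci, cj)
print(single, complete, average)
```

For any pair of clusters the average linkage lies between the single and complete linkage values, which is one way to sanity-check an implementation.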
Agglomerative (Bottom-up) Clustering Input: cluster distance measure $d$, dataset $X = \\{x_n\\}_{n=1}^{N}$, number of clusters $k$. Initialize $\\mathcal{C} = \\{C_i = \\{x_n\\} \\mid x_n \\in X\\}$ // Each point in a separate cluster Repeat: Find the closest pair of clusters $C_i, C_j \\in \\mathcal{C}$ based on the distance measure $d$ $C_{ij} = C_i \\cup C_j$ // Merge the selected clusters $\\mathcal{C} = (\\mathcal{C} \\setminus \\{C_i, C_j\\}) \\cup \\{C_{ij}\\}$ // Update the clustering Until $|\\mathcal{C}| = k$ A naïve implementation takes $O(N^2)$ space and $O(N^3)$ time. LASSO Regression LASSO (Least Absolute Shrinkage and Selection Operator): simply linear regression with an $\\ell_1$ penalty for sparsity: $$L(w) = \\sum_{i=1}^{n} \\left( w^T x_i - y_i \\right)^2 + C \\|w\\|_1$$ Sparse solution $\\leftrightarrow$ feature selection. Principal Component Analysis (PCA) Computing PCA: Eigenvalue Decomposition Objective: Maximize the variance of the projected data $$\\max_{\\mathbf{u}_j} \\mathbb{E}[(\\mathbf{u}_j^T \\mathbf{x})^2]$$ subject to $\\mathbf{u}_j^T \\mathbf{u}_j = 1$, $\\mathbf{u}_j^T \\mathbf{u}_k = 0$, $k < j$. Observation: PC $j$ is the direction of the eigenvector of $\\frac{1}{n} \\mathbf{X}^T \\mathbf{X}$ with the $j$-th largest eigenvalue. Eigenvalue Decomposition: the columns of $$\\mathbf{U} = \\begin{pmatrix} \\mathbf{u}_1 & \\cdots & \\mathbf{u}_k \\end{pmatrix}$$ are eigenvectors of $\\frac{1}{n} \\mathbf{X}^T \\mathbf{X}$. Manifold Learning Geodesic distance: the length of the shortest path between two points along the manifold","link":"/2024/06/09/Trad-ML/"},{"title":"<Template> Nested Trees (树套树)","text":"Submission: Luogu P3380 二逼平衡树 
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221#include <bits/stdc++.h>#define reg register#define lc t[x].ch[0]#define rc t[x].ch[1]using namespace std;const int MaxN = 50010;const int inf = 2147483647;int n, m, tot;int a[MaxN];struct node{ int ch[2]; int cnt, size, val, rnd;};node t[MaxN << 5];struct treap{ int l, r, root; void update(int x) { t[x].size = t[lc].size + t[rc].size + t[x].cnt; } void rotate(int &x, int c) { int s = t[x].ch[c]; t[x].ch[c] = t[s].ch[c ^ 1]; t[s].ch[c ^ 1] = x; update(x); update(x = s); } void insert(int &x, int val) { if (!x) { x = ++tot; t[x].cnt = t[x].size = 1; t[x].rnd = rand(), t[x].val = val; return; } t[x].size++; if (t[x].val == val) { ++t[x].cnt; return; } int c = val > t[x].val; insert(t[x].ch[c], val); if (t[x].rnd > t[t[x].ch[c]].rnd) rotate(x, c); } void del(int &x, int val) { if (!x) return; if (t[x].val == val) { if (t[x].cnt > 1) { t[x].cnt--, t[x].size--; return; } bool c = t[lc].rnd > t[rc].rnd; if (lc == 0 || rc == 0) x = lc + rc; else rotate(x, c), del(x, val); } else --t[x].size, del(t[x].ch[t[x].val < val], val); } int rank(int x, int val) { if (!x) return 0; if (t[x].val == val) return t[lc].size; if (t[x].val > val) return rank(lc, val); else return t[lc].size + t[x].cnt + rank(rc, val); } int query_val(int x, int val) { while (1) { if (val <= t[lc].size) x = lc; else if (val > t[lc].size + t[x].cnt) val -= t[lc].size + t[x].cnt, x = rc; else return t[x].val; } } int query_pre(int x, int val) { if (!x) return 
-inf; if (t[x].val >= val) return query_pre(lc, val); else return max(t[x].val, query_pre(rc, val)); } int query_sub(int x, int val) { if (!x) return inf; if (t[x].val <= val) return query_sub(rc, val); else return min(t[x].val, query_sub(lc, val)); }};treap tr[MaxN << 2];struct tree{ void build(int id, int l, int r) { tr[id].l = l, tr[id].r = r; for (int i = l; i <= r; i++) tr[id].insert(tr[id].root, a[i]); if (l == r) return; int mid = (l + r) >> 1; build(id << 1, l, mid); build(id << 1 | 1, mid + 1, r); } int query_rank(int id, int l, int r, int val) { if (tr[id].l > r || tr[id].r < l) return 0; if (l <= tr[id].l && tr[id].r <= r) { int ans = tr[id].rank(tr[id].root, val); return ans; } int ans = 0; ans += query_rank(id << 1, l, r, val); ans += query_rank(id << 1 | 1, l, r, val); return ans; } int query_val(int l, int r, int k) { int L = 0, R = 1e8; while (L < R) { int mid = (L + R + 1) >> 1; if (query_rank(1, l, r, mid) + 1 <= k) L = mid; else R = mid - 1; } return L; } void modify(int id, int pos, int val) { if (pos < tr[id].l || tr[id].r < pos) return; tr[id].del(tr[id].root, a[pos]); tr[id].insert(tr[id].root, val); if (tr[id].l == tr[id].r) return; modify(id << 1, pos, val); modify(id << 1 | 1, pos, val); } int query_pre(int id, int l, int r, int val) { if (tr[id].l > r || tr[id].r < l) return -inf; if (l <= tr[id].l && tr[id].r <= r) return tr[id].query_pre(tr[id].root, val); int ans = max(query_pre(id << 1, l, r, val), query_pre(id << 1 | 1, l, r, val)); return ans; } int query_sub(int id, int l, int r, int val) { if (tr[id].l > r || tr[id].r < l) return inf; if (l <= tr[id].l && tr[id].r <= r) return tr[id].query_sub(tr[id].root, val); int ans = min(query_sub(id << 1, l, r, val), query_sub(id << 1 | 1, l, r, val)); return ans; }} T;inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}int main(){ srand(19260817); int n 
= read(), m = read(); for (int i = 1; i <= n; i++) a[i] = read(); int op; T.build(1, 1, n); for (int i = 1; i <= m; i++) { op = read(); int l, r, val, pos; if (op == 3) { pos = read(), val = read(); T.modify(1, pos, val); a[pos] = val; continue; } l = read(), r = read(), val = read(); if (op == 1) printf(\"%d\\n\", T.query_rank(1, l, r, val) + 1); else if (op == 2) printf(\"%d\\n\", T.query_val(l, r, val)); else if (op == 4) printf(\"%d\\n\", T.query_pre(1, l, r, val)); else if (op == 5) printf(\"%d\\n\", T.query_sub(1, l, r, val)); } return 0;}","link":"/2019/04/05/模板-树套树/"},{"title":"Luogu 1156 垃圾陷阱","text":"A simple dynamic programming problem. Sort the pieces of garbage by the time they are thrown in, from earliest to latest. Each time a piece of garbage is thrown in, if he can climb out using it, the code prints that time and stops; otherwise it continues. If he dies in the end, compute how long he can survive. #include <bits/stdc++.h>const int MaxN = 100010;struct node{ int t, f, h;};node a[MaxN];int d, g, f[MaxN];inline int cmp(node a, node b){ return a.t < b.t;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}int main(){ d = read(), g = read(); for (int i = 1; i <= g; i++) a[i].t = read(), a[i].f = read(), a[i].h = read(); std::sort(a + 1, a + g + 1, cmp); f[0] = 10; for (int i = 1; i <= g; i++) { for (int j = d; j >= 0; j--) { if (f[j] >= a[i].t) { if (j + a[i].h >= d) return 0 * printf(\"%d\\n\", a[i].t); f[j + a[i].h] = std::max(f[j], f[j + a[i].h]); f[j] += a[i].f; } } } printf(\"%d\\n\", f[0]); return 0;}","link":"/2019/02/24/洛谷1156/"},{"title":"「Luogu 2146」软件包管理器","text":"zcy can write heavy-light decomposition now! 
本题为树链剖分的模板题 对于”install x”操作, 将$x$到根节点路径上所有点的点权全部赋值为$1$ 对于”uninstall x”操作, 将$x$及$x$的子树点权全部赋值为$0$ 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184// luogu-judger-enable-o2#include <bits/stdc++.h>using namespace std;const int MaxN = 150010;struct edge{ int to, next;};struct node{ int l, r; int sum, tag;};struct SegmentTree{ node t[MaxN << 1]; inline void pushup(int id) { t[id].sum = t[id << 1].sum + t[id << 1 | 1].sum; } void build(int id, int l, int r) { t[id].l = l, t[id].r = r, t[id].tag = -1; if (l == r) return; int mid = (l + r) >> 1; build(id << 1, l, mid); build(id << 1 | 1, mid + 1, r); } inline void pushdown(int id) { if (t[id].tag != -1) { t[id << 1].sum = t[id].tag * (t[id << 1].r - t[id << 1].l + 1); t[id << 1 | 1].sum = t[id].tag * (t[id << 1 | 1].r - t[id << 1 | 1].l + 1); t[id << 1].tag = t[id].tag; t[id << 1 | 1].tag = t[id].tag; t[id].tag = -1; } } inline void modify(int id, int l, int r, int delta) { if (l > t[id].r || t[id].l > r) return; if (l <= t[id].l && t[id].r <= r) { t[id].sum = delta * (t[id].r - t[id].l + 1); t[id].tag = delta; return; } if (t[id].l == t[id].r) return; pushdown(id); modify(id << 1, l, r, delta); modify(id << 1 | 1, l, r, delta); pushup(id); return; } inline int query(int id, int l, int r) { if (l > t[id].r || t[id].l > r) return 0; if (l <= t[id].l && t[id].r <= r) return t[id].sum; if (t[id].l == t[id].r) return 0; pushdown(id); return query(id << 1, l, r) + query(id << 1 | 1, l, r); }} T;edge e[MaxN << 1];int n, m, cnt, dfsnum, size[MaxN], hson[MaxN];int head[MaxN], top[MaxN], dfn[MaxN], fa[MaxN], dep[MaxN];inline void add_edge(int 
u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}inline void dfs1(int u, int f){ size[u] = 1; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v == f) continue; dep[v] = dep[u] + 1; fa[v] = u; dfs1(v, u); size[u] += size[v]; if (size[v] > size[hson[u]]) hson[u] = v; }}inline void dfs2(int u, int Top){ ++dfsnum; dfn[u] = dfsnum; top[u] = Top; if (hson[u]) dfs2(hson[u], Top); for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v == hson[u] || v == fa[u]) continue; dfs2(v, v); }}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline void update_tree(int u){ T.modify(1, dfn[u], dfn[u] + size[u] - 1, 0);}inline void update_chain(int u, int v, int delta){ while (top[u] != top[v]) { if (dep[top[u]] < dep[top[v]]) swap(u, v); T.modify(1, dfn[top[u]], dfn[u], delta); u = fa[top[u]]; } if (dep[u] < dep[v]) swap(u, v); T.modify(1, dfn[v], dfn[u], delta);}signed main(){ n = read(); for (int i = 2; i <= n; ++i) { int u = read() + 1; add_edge(i, u); add_edge(u, i); } dep[1] = 1, fa[1] = 0; dfs1(1, 0), dfs2(1, 1); T.build(1, 1, n); m = read(); for (int i = 1; i <= m; i++) { string op; cin >> op; int before = T.t[1].sum; if (op == \"install\") { int u = read() + 1; update_chain(u, 1, 1); int after = T.t[1].sum; printf(\"%d\\n\", after - before); } else { int u = read() + 1; update_tree(u); int after = T.t[1].sum; printf(\"%d\\n\", before - after); } } return 0;}","link":"/2019/02/21/洛谷2146/"},{"title":"洛谷 P1337 [JSOI2004]平衡点 / 吊打XXX","text":"一些模拟退火的注意事项: 开始温度要设到比较高 在不超时的情况下多随几次 最好确定一个随机种子 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172#include <bits/stdc++.h>const int MaxN = 1010;const double delta = 0.993;struct node{ int x, y, w;};int n;node a[MaxN];double ansx, ansy;double ans = 1e18, t;inline int read(){ 
bool f = 0; int x = 0; char ch = getchar(); while (ch < '0' || ch > '9') { if (ch == '-') f = true; ch = getchar(); } while (ch >= '0' && ch <= '9') x = (x << 1) + (x << 3) + ch - '0', ch = getchar(); return (!f) ? x : -x;}double calc(double nx, double ny){ double sum = 0; for (int i = 1; i <= n; i++) { double x = nx - a[i].x; double y = ny - a[i].y; sum += sqrt(x * x + y * y) * a[i].w; } return sum;}void sa(){ double nowx = ansx, nowy = ansy; t = 1000000; while (t > 1e-14) { double tmpx = ansx + (rand() * 2 - RAND_MAX) * t; double tmpy = ansy + (rand() * 2 - RAND_MAX) * t; double tmp = calc(tmpx, tmpy); if (tmp - ans < 0) { nowx = tmpx; nowy = tmpy; ansx = tmpx; ansy = tmpy; ans = tmp; } else if (exp((ans - tmp) / t) * RAND_MAX > rand()) { nowx = tmpx; nowy = tmpy; } t *= delta; }}int main(){ n = read(); srand(19260817); for (int i = 1; i <= n; i++) a[i].x = read(), a[i].y = read(), a[i].w = read(); sa(); printf(\"%.3lf %.3lf\", ansx, ansy); return 0;}","link":"/2019/02/06/洛谷1337/"},{"title":"洛谷2048 [NOI2010] 超级钢琴","text":"题目大意你有一个序列${a_i}$,你要找出$k$个不相同的区间$[l_i,r_i]$,满足$\\forall \\; i, (r_i-l_i+1) \\in [L, R]$,使得这些区间的和最大。 求这个最大值 分析我们发现这个题有点像$\\texttt{P2085 最小函数值}$,我们考虑用一样的方法做 显然,对于这个序列,$i \\in [1, n - l + 1]$的$a_i$都可以作为左端点,我们考虑建立一个堆,堆里放置以每个点为左端点的区间 堆里的节点可以记为$\\texttt{(L, R, maxp, val, pos)}$其中$\\texttt{L, R}$分别表示当前区间的长度的上下界,$\\texttt{pos}$表示该区间的左端点,$\\texttt{maxp}$表示以$\\texttt{pos}$为左端点且长度$\\in \\texttt{[L,R]}$的和最大的区间长度,$\\texttt{val}$表示$[i, i + maxp - 1]$这个区间的和 初始把所有左端点$i$对应的区间$[i,i+l-1]$到$[i,\\min(i+r-1,n)]$的和最大的区间加入堆中 每次我们取出堆中和最大的区间,设这个区间为$\\texttt{(L, R, maxp, val, pos)}$ 则我们把$\\texttt{val}$记录进答案,并往堆里插入$\\texttt{(L, maxp-1, maxp’, val’, pos)}$和$\\texttt{(maxp+1, R, maxp’’, val’’, pos)}$ 这里$\\texttt{maxp’,val’,maxp’’,val’’}$分别表示左右半区间的最大和取到的位置和这个最大和 将这个操作执行$k$次,时间复杂度$\\texttt{O(n log n)}$ PS. 
维护区间的最大值和最大值位置可以用$\\texttt{ST}$表维护 代码123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990#include <bits/stdc++.h>#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const ll MaxN = 5e5 + 10;struct node{ ll maxp, val; ll l, r, pos; bool operator<(node x) const { return val < x.val; }};ll n, k, l, r;std::priority_queue<node> q;ll a[MaxN], lg[MaxN], sum[MaxN], max[MaxN][21], maxp[MaxN][21];void query(ll l, ll r, ll &val, ll &pos){ ll len = lg[r - l + 1]; val = std::max(max[l][len], max[r - (1 << len) + 1][len]); pos = (max[l][len] > max[r - (1 << len) + 1][len]) ? maxp[l][len] : maxp[r - (1 << len) + 1][len];}void prework(){ lg[0] = -1; for (ll i = 1; i <= n; i++) maxp[i][0] = i, max[i][0] = sum[i], lg[i] = lg[i >> 1] + 1; for (ll j = 1; j <= 20; j++) for (ll i = 1; i <= n - (1 << j) + 1; i++) max[i][j] = std::max(max[i][j - 1], max[i + (1 << (j - 1))][j - 1]); for (ll j = 1; j <= 20; j++) for (ll i = 1; i <= n - (1 << j) + 1; i++) maxp[i][j] = ((max[i][j - 1] > max[i + (1 << (j - 1))][j - 1]) ? maxp[i][j - 1] : maxp[i + (1 << (j - 1))][j - 1]);}inline ll read(){ ll x = 0, f = 1; char ch = getchar(); while (ch > '9' || ch < '0') { if (ch == '-') f = 0; ch = getchar(); } while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? 
x : (-x);}int main(){ n = read(), k = read(); l = read(), r = read(); for (ll i = 1; i <= n; i++) a[i] = read(), sum[i] = sum[i - 1] + a[i]; prework(); for (ll i = 1; i <= n; i++) { ll pos, val; if (i + l - 1 > n) break; query(i + l - 1, std::min(i + r - 1, n), val, pos); val -= sum[i - 1], pos -= i - 1; q.push((node){pos, val, l, std::min(r, n - i + 1), i}); } ll ans = 0; for(ll i = 1; i <= k; i++) { node x = q.top(); q.pop(), ans += x.val; if(x.maxp > x.l) { ll pos, val; query(x.pos + x.l - 1, x.pos + x.maxp - 2, val, pos); val -= sum[x.pos - 1], pos -= x.pos - 1; q.push((node){pos, val, x.l, x.maxp - 1, x.pos}); } if(x.maxp < x.r) { ll pos, val; query(x.pos + x.maxp, x.pos + x.r - 1, val, pos); val -= sum[x.pos - 1], pos -= x.pos - 1; q.push((node){pos, val, x.maxp + 1, x.r, x.pos}); } } printf(\"%lld\\n\", ans); return 0;}","link":"/2020/03/09/洛谷2048/"},{"title":"洛谷 P2503 [HAOI2006]均分数据","text":"模拟退火写起来真舒服喵~ 首先我们把这$n$个数随机分成$m$组,然后退火时每次随机两个数交换分组,如果更优的话就保存,不然的话就以一定的概率接受该答案 记得多随机几次喵~ 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667#include <bits/stdc++.h>const int MaxN = 50;const double delta = 0.995;int n, m;int a[MaxN], f[MaxN];double sum[MaxN], aver = 0, ans = 1e18;inline int read(){ int x = 0, f = 1; char ch = getchar(); while(ch > '9' || ch < '0') { if(ch == '-') f = 0; ch = getchar(); } while(ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? 
x : (-x);}inline double calc(){ double tmp = 0; for (int i = 1; i <= m; i++) tmp += (sum[i] - aver) * (sum[i] - aver); return tmp;}void sa(){ memset(sum, 0, sizeof(sum)); double tmp = 0; for(int i = 1; i <= n; i++) f[i] = rand() % m + 1, sum[f[i]] += a[i]; for(int i = 1; i <= m; i++) tmp += (sum[i] - aver) * (sum[i] - aver); double t = 10000000; while(t > 1e-14) { int x = rand() % n + 1, y = rand() % n + 1; while(f[x] == f[y]) x = rand() % n + 1, y = rand() % n + 1; sum[f[x]] -= a[x]; sum[f[x]] += a[y]; sum[f[y]] += a[x]; sum[f[y]] -= a[y]; double now = calc(); if ((now < tmp) || (exp((tmp - now) / t) * RAND_MAX > rand())) tmp = now, std::swap(f[x], f[y]); else sum[f[x]] += (a[x] - a[y]), sum[f[y]] += (a[y] - a[x]); t *= delta; } if(tmp < ans) ans = tmp;}int main(){ srand(time(NULL)); n = read(), m = read(); for(int i = 1; i <= n; i++) a[i] = read(), aver += a[i]; aver /= m; for(int i = 1; i <= 500; i++) sa(); printf(\"%.2lf\", sqrt(ans / m)); return 0;}","link":"/2019/02/07/洛谷2503/"},{"title":"Luogu P2485 【[SDOI2011]计算器】","text":"A three-in-one number theory pack. Part 1 is fast exponentiation, which needs no explanation. Part 2 asks you to solve $x*y \\equiv z \\pmod p$, i.e. $xy - kp = z$, i.e. $xy + p*(-k) = z$, which is the standard form for $exgcd$ (I trust everyone knows this one). Part 3 is a BSGS template problem; if you are interested, see P4195, the exBSGS template. Note that $b$ may be larger than $p$, so reduce it modulo $p$ first. #include <bits/stdc++.h>#define ll long long#define int llstd::unordered_map<int, int> h;int gcd(int a, int b) { return b ? 
gcd(b, a % b) : a; }inline int mul(int a, int b, int p){ ll ret = 0; while(b) { if (b & 1) ret = (ret + a) % p; a = (a + a) % p; b >>= 1; } return ret;}void exgcd(int a, int b, int &x, int &y){ if (b == 0) { x = 1, y = 0; return; } exgcd(b, a % b, x, y); int t = x; x = y, y = t - (a / b) * y;}int solve1(int a, int b, int p){ ll ret = 1; while (b) { if (b & 1) ret = mul(ret, a, p); a = mul(a, a, p); b >>= 1; } return ret;}int solve2(int a, int b, int p){ int x = 0, y = 0; int g = gcd(a, p); if (b % g) return -1; exgcd(a, p, x, y); x *= (b / g); x = (x % p + p) % p; return x;}int solve3(int a, int b, int p){ if (b == 1) return 0; int cnt = 0, d, k = 1; while ((d = gcd(a, p)) ^ 1) { if (b % d) return -1; b /= d, p /= d, ++cnt; k = mul(k, a / d, p); if (k == b) return cnt; } int t = sqrt(p) + 1, tmp = 1; h.clear(); for (int i = 0; i < t; i++) { h[mul(tmp, b, p)] = i; tmp = mul(tmp, a, p); } k = mul(k, tmp, p); for (int i = 1; i <= t; i++) { if (h.find(k) != h.end()) return i * t - h[k] + cnt; k = mul(k, tmp, p); } return -1;}signed main(){ int T, op; scanf(\"%lld%lld\", &T, &op); while (T--) { int a, b, p; scanf(\"%lld%lld%lld\", &a, &b, &p); if (op == 1) printf(\"%lld\\n\", solve1(a, b, p)); if (op == 2) { b %= p; int ans = solve2(a, b, p); if (ans == -1) printf(\"Orz, I cannot find x!\\n\"); else printf(\"%lld\\n\", ans); } if (op == 3) { b %= p;//注意这个! 
int ans = solve3(a, b, p); if (ans == -1) printf(\"Orz, I cannot find x!\\n\"); else printf(\"%lld\\n\", ans); } } return 0;}","link":"/2019/02/06/洛谷2485/"},{"title":"洛谷3178 [HAOI2015] 树上操作","text":"就是个树剖的模板题嘛。。。。 两边dfs把树割成链, 然后在链上线段树维护 做完了。。。 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187#include <bits/stdc++.h>#define int long longconst int MaxN = 100010;struct edge{ int next, to;};struct node{ int l, r; int sum, tag;};edge e[MaxN << 1];int n, m, cnt, dfscnt;int a[MaxN], head[MaxN], dfn[MaxN], pre[MaxN];int top[MaxN], dep[MaxN], hson[MaxN], fa[MaxN], size[MaxN];struct SegmentTree{ node t[MaxN << 2]; inline void pushup(int id) { t[id].sum = t[id << 1].sum + t[id << 1 | 1].sum; } void build(int id, int l, int r) { t[id].l = l, t[id].r = r; if (l == r) { t[id].sum = a[pre[l]]; return; } int mid = (l + r) >> 1; build(id << 1, l, mid); build(id << 1 | 1, mid + 1, r); pushup(id); } inline void pushdown(int id) { if (t[id].tag) { t[id << 1].sum += t[id].tag * (t[id << 1].r - t[id << 1].l + 1); t[id << 1 | 1].sum += t[id].tag * (t[id << 1 | 1].r - t[id << 1 | 1].l + 1); t[id << 1].tag += t[id].tag, t[id << 1 | 1].tag += t[id].tag; t[id].tag = 0; } } void modify(int id, int l, int r, int delta) { if (l > t[id].r || r < t[id].l) return; if (l <= t[id].l && t[id].r <= r) { t[id].sum += delta * (t[id].r - t[id].l + 1); t[id].tag += delta; return; } if (t[id].l == t[id].r) return; pushdown(id); modify(id << 1, l, r, delta); modify(id << 1 | 1, l, r, delta); pushup(id); } int query(int id, int l, int r) { if (l > t[id].r || r < t[id].l) return 0; if (l <= t[id].l && t[id].r <= 
r) return t[id].sum; if (t[id].l == t[id].r) return 0; pushdown(id); return query(id << 1, l, r) + query(id << 1 | 1, l, r); }} T;inline void add_edge(int u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}void dfs1(int u, int f){ size[u] = 1; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v == f) continue; dep[v] = dep[u] + 1, fa[v] = u; dfs1(v, u); size[u] += size[v]; if (size[v] > size[hson[u]]) hson[u] = v; }}void dfs2(int u, int Top){ ++dfscnt; dfn[u] = dfscnt; pre[dfscnt] = u; top[u] = Top; if (hson[u]) dfs2(hson[u], Top); for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v == fa[u] || v == hson[u]) continue; dfs2(v, v); }}inline int read(){ int x = 0, f = 1; char ch = getchar(); while (ch > '9' || ch < '0') { if (ch == '-') f = 0; ch = getchar(); } while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? x : (-x);}inline int query(int u, int v){ int ans = 0; while (top[u] != top[v]) { if (dep[u] < dep[v]) std::swap(u, v); ans += T.query(1, dfn[top[u]], dfn[u]); u = fa[top[u]]; } if (dep[u] < dep[v]) std::swap(u, v); ans += T.query(1, dfn[v], dfn[u]); return ans;}signed main(){ n = read(), m = read(); for (int i = 1; i <= n; i++) a[i] = read(); for (int i = 1; i < n; i++) { int u = read(), v = read(); add_edge(u, v); add_edge(v, u); } dep[1] = 1, fa[1] = 0; dfs1(1, 0), dfs2(1, 1); T.build(1, 1, n); for (int i = 1; i <= m; i++) { int op = read(); if (op == 1) { int u = read(), x = read(); T.modify(1, dfn[u], dfn[u], x); } if (op == 2) { int u = read(), x = read(); T.modify(1, dfn[u], dfn[u] + size[u] - 1, x); } if (op == 3) { int u = read(); printf(\"%lld\\n\", query(u, 1)); } } return 0;}","link":"/2019/02/22/洛谷3178/"},{"title":"洛谷3047 [USACO12FEB]Nearby Cows G","text":"题目大意有一棵$n$个节点的树,点有点权,对于每个节点,你要求出离这个节点距离$k$以内的节点的点权和 $1 \\leq n \\leq 10^5, 1 \\leq k \\leq 20$ 分析我们设与第$i$号点距离在$k$之内的点为$i$号点的影响集合。 首先我们发现每个节点的答案可以通过一遍$\\texttt{dfs}$求得,但是这样的时间复杂度高达$\\mathcal{O}(n ^ 
2)$,显然不可能通过此题。 We notice that $k$ is small, and that the influence set of a parent node overlaps heavily with the influence set of its child. This suggests an idea: if the parent's answer can be transferred to the child cheaply, the problem can be solved. So we consider rerooting $\\texttt{dp}$. First, with one $\\texttt{dfs}$ we compute the answer $f[1]$ for node $1$, together with $sum[i][j] \\; (0 \\leq j \\leq k)$, the total weight of the descendants of node $i$ that are exactly $j$ levels below it. Now consider how the answer moves from a parent to a child. First, the $k$-th ancestor of the parent is not in the child's influence set, and the descendants $k$ levels below the child are not in the parent's influence set. Let the current node be $\\texttt{u}$, its parent $\\texttt{fa}$, and the $k$-th ancestor of the parent $\\texttt{top}$; then $\\Delta f_0 = -c[top] + sum[u][k]$. In addition, $\\forall \\; i \\in [1, k]$, the descendants $k-i+1$ levels below the $(i-1)$-th ancestor of $\\texttt{fa}$ are not in the influence set of $\\texttt{u}$, but some of them (those reached through the previous node on the path) must not be removed, so $\\Delta f_i= - sum[now][k - i + 1] + sum[last][k - i]$, where $\\texttt{now}$ is the $(i-1)$-th ancestor of $\\texttt{fa}$ and $\\texttt{last}$ is the $(i-1)$-th ancestor of $\\texttt{u}$. Hence $f_u = f_{fa} + \\sum_{i=0}^k \\Delta f_i$, and the total time complexity is $\\mathcal{O}(n \\times k)$. This may be a little convoluted; see the code for the details. Code #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 2e5 + 10;struct edge{ int next, to;};edge e[MaxN];int n, m, k, cnt;int fa[MaxN][21], sum[MaxN][21];int head[MaxN], f[MaxN], c[MaxN], dep[MaxN];void add_edge(int u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}int jump(int x, int y){ for (int i = 20; ~i; i--) if (y & (1 << i)) x = fa[x][i]; return x;}void dfs1(int u, int fa){ if (dep[fa] <= k) f[1] += c[u]; dep[u] = dep[fa] + 1, ::fa[u][0] = fa; for (int i = 1; i <= 20; i++) ::fa[u][i] = ::fa[::fa[u][i - 1]][i - 1]; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v == fa) continue; dfs1(v, u); for (int j = 1; j <= k; j++) sum[u][j] += sum[v][j - 1]; }}void dfs2(int u, int fa){ if (u != 1) { int now = fa, last = u; f[u] = f[fa] + sum[u][k]; for(int i = 1; i <= k; i++) { if(!now) break; f[u] -= sum[now][k - i + 1] - sum[last][k - i]; last = now, now = ::fa[now][0]; } f[u] -= c[now]; } for (int i = head[u]; i; i = e[i].next) { int 
v = e[i].to; if (v == fa) continue; dfs2(v, u); }}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}int main(){ n = read(), k = read(); for (int i = 1; i < n; i++) { int u = read(), v = read(); add_edge(u, v), add_edge(v, u); } for (int i = 1; i <= n; i++) sum[i][0] = c[i] = read(); dfs1(1, 0), dfs2(1, 0); for (int i = 1; i <= n; i++) { printf(\"%d\\n\", f[i]); // for (int j = 1; j <= k; j++) // printf(\"%d%c\", sum[i][j], \" \\n\"[j == k]); } return 0;}","link":"/2020/04/07/洛谷3047/"},{"title":"洛谷2210 Haywire","text":"模拟退火模板题… 每次随机将两个位置上的奶牛交换位置 然后算出现在所需的干草数量 如果比答案少就更新 否则就以一定概率接受这个解 然后。。多随机几次就做完了 1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162#include <bits/stdc++.h>#define R register#define ll long long#define cmin(a, b) ((a < b) ? a : b)#define cmax(a, b) ((a < b) ? b : a)const double delta = 0.999;const int MaxN = 20;int n, ans = 0x3f3f3f3f;double t = 10000000.0;int pos[MaxN], fri[MaxN][4], g[MaxN][MaxN];inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline int calc(){ int ret = 0; for (int i = 1; i <= n; i++) { ret += abs(pos[i] - pos[fri[i][1]]); ret += abs(pos[i] - pos[fri[i][2]]); ret += abs(pos[i] - pos[fri[i][3]]); } return ret / 2;}inline void SA(){ t = 1.0; while (t > 1e-10) { int x = rand() % n + 1, y = rand() % n + 1; std::swap(pos[x], pos[y]); int tmp = calc(); int del = tmp - ans; if (del < 0) ans = tmp; else if (exp(-del / t) * RAND_MAX <= rand()) std::swap(pos[x], pos[y]); t *= delta; }}int main(){ n = read(); srand(time(NULL)); for (int i = 1; i <= n; i++) { pos[i] = i; for (int j = 1; j <= 3; j++) fri[i][j] = read(), g[i][fri[i][j]] = 1, g[fri[i][j]][i] = 1; } for (int i = 1; i <= 100; i++) SA(); 
printf(\"%d\\n\", ans); return 0;} 不要问我为什么这么久没更博客","link":"/2019/03/30/洛谷2210/"},{"title":"洛谷3628 [APIO2010]特别行动队","text":"斜率优化的练手题 通读题目可以发现 f_i=\\max (f_j+g(s[i]-s[j])) 其中f_i表示在i处强制结束一段的最大代价,s_i表示a_i的前缀和,g(x)表示(ax^2+bx+c) 展开这个式子我们得到 f_i=\\max(f_j+as_i^2-2as_is_j+as_j^2+bs_i-bs_j+c)去掉$\\max$,移项得到: (f_j+as_j^2-bs_j)=2as_is_j+(f_i-as_i^2-bs_i-c)然后就是常规的单调队列维护上凸壳了 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849#include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 1e6 + 10;ll n, A, B, C;ll a[MaxN], s[MaxN], f[MaxN], q[MaxN];inline ll read(){ ll x = 0, f = 1; char ch = getchar(); while (ch > '9' || ch < '0') { if (ch == '-') f = 0; ch = getchar(); } while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? x : (-x);}ll g(int num) { return A * num * num + B * num + C; }ll x(int num) { return s[num]; }ll y(int num) { return (f[num] + A * s[num] * s[num] - B * s[num]); }ll k(int num) { return 2 * A * s[num]; }int main(){ n = read(), A = read(), B = read(), C = read(); for (int i = 1; i <= n; i++) a[i] = read(), s[i] = s[i - 1] + a[i]; int l = 1, r = 1; for (int i = 1; i <= n; i++) { while (l < r && (y(q[l + 1]) - y(q[l])) >= k(i) * (x(q[l + 1]) - x(q[l]))) ++l; f[i] = f[q[l]] + g(s[i] - s[q[l]]); while (l < r && (y(q[r]) - y(q[r - 1])) * (x(i) - x(q[r])) <= (y(i) - y(q[r])) * (x(q[r]) - x(q[r - 1]))) --r; q[++r] = i; } printf(\"%lld\\n\", f[n]); return 0;}","link":"/2019/11/30/洛谷3628/"},{"title":"洛谷 P3878 [TJOI2010]分金币","text":"题目大意将$n$个数分成两半,使得这两半的差尽量小 Solution我们首先先把这$n$个数按下标顺序分成两组,然后每次随机选取前半段和后半段的两个数将其交换,如果更优的话就更新$ans$,否则就以$e^{\\frac{-de}{t}}$($de=$当前解-最优解)的概率接受该交换(其实就是模拟退火的基本套路) 代码1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950#include <bits/stdc++.h>#define ll long longconst int MaxN = 50;const double delta = 0.993;ll n, a[MaxN], ans;ll abs(ll x){ return (x > 0) ? 
x : (-x);}inline ll calc(){ ll sum1 = 0, sum2 = 0; for (int i = 1; i <= n; i++) { if(i <= (n + 1) / 2) sum1 += a[i]; else sum2 += a[i]; } return abs(sum1 - sum2);}inline void sa(){ double t = 10000000; while (t > 1e-14) { int x = rand() % ((n + 1) / 2) + 1, y = rand() % ((n + 1) / 2) + ((n + 1) / 2); std::swap(a[x], a[y]); int now = calc(); int de = now - ans; if (de < 0) ans = now; else if (exp(-de / t) * RAND_MAX <= rand()) std::swap(a[x], a[y]); t *= delta; }}int main(){ int T; srand(time(NULL)); scanf(\"%d\", &T); while (T--) { scanf(\"%lld\", &n); for (int i = 1; i <= n; i++) scanf(\"%lld\", &a[i]); ans = 1e9; for (int i = 1; i <= 50; i++) sa(); printf(\"%lld\\n\", ans); } return 0;}","link":"/2019/02/10/洛谷3878/"},{"title":"洛谷 P3936 Coloring","text":"思路其实很容易想到,只是调参有那么”一点点”恶心 首先按顺序把$1-c$这$c$种数全部填进表格里 然后每次随机选两个颜色不同的块交换,然后计算原方案与现方案的差距,并按几率更新 代码五分钟,调参两百年:C 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697#include <bits/stdc++.h>#pragma GCC optimize(3)using namespace std;const int MaxN = 30;const int dx[] = {1, 0, -1, 0}, dy[] = {0, 1, 0, -1};int n, m, c;int p[51];int a[MaxN][MaxN], tmp[MaxN][MaxN];inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline int calc(int A[30][30]){ int ret = 0; for (int i = 1; i <= n; i++) { for (int j = 1; j <= m; j++) { for (int k = 0; k <= 3; k++) { int nx = i + dx[k], ny = j + dy[k]; if (A[nx][ny] && (A[nx][ny] != A[i][j])) ++ret; } } } return ret / 2;}inline void init(){ int now = 1, cnt = 0; for (int i = 1; i <= n; i++) { for (int j = 1; j <= m; j++) { a[i][j] = now; ++cnt; if (cnt == p[now]) cnt = 0, now++; } }}inline void sa(){ double t = 1.0, delta; if (n * m <= 400) delta = 0.9999900001; memcpy(tmp, a, sizeof(a)); while (t > 1e-5) { std::pair<int, int> 
pos1, pos2; pos1.first = rand() % n + 1; pos2.first = rand() % n + 1; pos1.second = rand() % m + 1; pos2.second = rand() % m + 1; while (tmp[pos1.first][pos1.second] == tmp[pos2.first][pos2.second]) { pos1.first = rand() % n + 1; pos2.first = rand() % n + 1; pos1.second = rand() % m + 1; pos2.second = rand() % m + 1; } std::swap(tmp[pos1.first][pos1.second], tmp[pos2.first][pos2.second]); int num = calc(tmp) - calc(a); if (num < 0) std::swap(a[pos1.first][pos1.second], a[pos2.first][pos2.second]); else if (exp(-num / t) * RAND_MAX > rand()) std::swap(a[pos1.first][pos1.second], a[pos2.first][pos2.second]); else std::swap(tmp[pos1.first][pos1.second], tmp[pos2.first][pos2.second]); t *= delta; }}int main(){ srand(107); n = read(), m = read(), c = read(); for (int i = 1; i <= c; i++) p[i] = read(); init(); for (int i = 1; i <= 2; i++) sa(); for (int i = 1; i <= n; i++) { for (int j = 1; j <= m; j++) printf(\"%d \", a[i][j]); puts(\"\"); } return 0;}","link":"/2019/02/09/洛谷3936/"},{"title":"「Luogu P3674」 小清新人渣的本愿","text":"Mo's algorithm + $bitset$ optimization. Operation $1$: maintain one $bitset$, $cnt1$, where $cnt1_i$ indicates whether the value $i$ is present. If there exist values $y,z$ with $y-z=x$, then $y = z + x$, so it suffices to AND $cnt1$ with ($cnt1<<x$). Operation $2$: maintain two $bitset$s, $cnt1$ and $cnt2$. 
$cnt1_i$ records whether the value $i$ occurs, and $cnt2_i$ records whether the value $MaxN-i$ occurs. If there are values $y,z$ with $y+z=x$, then $y + z - MaxN = x - MaxN$. Setting $z'=MaxN-z$ turns this into $y - z' = x - MaxN$, which is operation 1 again, except that $z'$ is looked up in $cnt2$. So it suffices to AND $cnt1$ with $(cnt2>>(MaxN-x))$; the shift goes right by $MaxN-x$ because $cnt2$ is stored mirrored relative to $cnt1$. Operation $3$: enumerate the divisors of $x$ by brute force and look each pair up in $cnt1$. #include <bits/stdc++.h>const int MaxN = 100010;struct query{ int id, pos; int op, l, r, x;};query q[MaxN];int n, m, size;int a[MaxN], cnt[MaxN], ans[MaxN];std::bitset<100010> cnt1, cnt2;inline int cmp(query a, query b){ if (a.pos != b.pos) return a.pos < b.pos; else return a.r < b.r;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline void add(int x){ ++cnt[a[x]]; if(cnt[a[x]] == 1) cnt1[a[x]] = 1, cnt2[100000 - a[x]] = 1;}inline void del(int x){ --cnt[a[x]]; if(cnt[a[x]] == 0) cnt1[a[x]] = 0, cnt2[100000 - a[x]] = 0;}inline void solve(){ int l = 1, r = 0; for (int i = 1; i <= m; i++) { while (l > q[i].l) --l, add(l); while (r < q[i].r) ++r, add(r); while (l < q[i].l) del(l), ++l; while (r > q[i].r) del(r), --r; if (q[i].op == 1) ans[q[i].id] = (cnt1 & (cnt1 << q[i].x)).any(); else if (q[i].op == 2) ans[q[i].id] = (cnt1 & (cnt2 >> (100000 - q[i].x))).any(); else if (q[i].op == 3) { for (int j = 1; j * j <= q[i].x; j++) { if (q[i].x % j == 0) if (cnt1[j] && cnt1[q[i].x / j]) ans[q[i].id] = 1; } } }}int main(){ n = read(), m = read(); size = pow(n, 0.55); for (int i = 1; i <= n; i++) a[i] = read(); for (int i = 1; i <= m; i++) { q[i].op = read(), q[i].l = read(), q[i].r = read(), q[i].x = read(); q[i].id = i, q[i].pos = (q[i].l - 1) / size + 1; } std::sort(q + 1, q + m + 1, cmp); solve(); for (int i = 1; i <= m; i++) puts(ans[i] == 1 ? 
\"hana\" : \"bi\"); return 0;}","link":"/2019/02/12/洛谷3674/"},{"title":"洛谷3959 [NOIP2017]宝藏","text":"Problem statement. You are given $n$ vertices and $m$ edges; choose a root and build a spanning tree of minimum cost. The cost of a spanning tree is $\\sum_i dep_i \\times dis[fa_i][i]$, where $dep_i$ is the depth of vertex $i$ in the tree (the root has depth $0$) and $dis[fa_i][i]$ is the length of the edge from $i$ to its parent. Analysis. Since $n\\leq12$, this calls for bitmask dp or search; we use bitmask dp here. Let $f[i]$ be the minimum cost of a tree spanning the vertex set $i$, and let $st_{i,j}$ be the depth of vertex $j$ in the best tree found for set $i$. A transition then suggests itself: $$f_{i|2^k}=\\min\\{f_{i|2^k}, f_i+(g[j][k]*(st[i][j]+1))\\}, \\quad st_{i|2^k, k}=st_{i,j}+1 \\quad (j \\in i \\;\\mathrm{and}\\;k \\notin i)$$ Initially $f_{2^s}=0$ and $st_{2^s,s}=0$ for the chosen root $s$; every other entry of $f$ and $st$ is $\\mathrm{inf}$. This recurrence costs $O(n^2 \\times 2^n)$. The root is not fixed, so we run the dp once per candidate root, for $O(n^3\\times2^n)$ in total. Implementation. #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 13;int n, m;int g[MaxN][MaxN], st[1 << MaxN][MaxN], f[1 << MaxN];int dp(int s){ memset(st, 0x3f, sizeof(st)), memset(f, 0x3f, sizeof(f)); int lim = (1 << n); f[1 << s] = 0, st[1 << s][s] = 0; for (int i = 1; i < lim; i++) { if (f[i] < 0x3f3f3f3f) { for (int j = 0; j < n; j++) { if (i & (1 << j)) { for (int k = 0; k < n; k++) { if (!(i & (1 << k))) { if ((g[j][k] != 0x3f3f3f3f) && (f[i | (1 << k)] > f[i] + (g[j][k] * (st[i][j] + 1)))) { f[i | (1 << k)] = f[i] + (g[j][k] * (st[i][j] + 1)); memcpy(st[i | (1 << k)], st[i], sizeof(st[i | (1 << k)])); st[i | (1 << k)][k] = st[i][j] + 1; } } } } } } } return f[lim - 1];}int main(){ scanf(\"%d%d\", &n, &m); memset(g, 0x3f, sizeof(g)); for (int i = 1; i <= m; i++) { int u, v, d; scanf(\"%d%d%d\", &u, &v, &d), --u, --v; g[u][v] = std::min(g[u][v], d), g[v][u] = std::min(g[v][u], d); } int ans = 0x3f3f3f3f; for (int i = 0; i < n; i++) ans = std::min(ans, dp(i)); printf(\"%d\\n\", ans); return 0;}","link":"/2019/11/03/洛谷3959/"},{"title":"洛谷4092 [HEOI2016/TJOI2016]树","text":"Problem statement. Given a rooted tree with root $1$, support two operations: Mark: put a mark on a vertex. (Initially only vertex $1$ 
is marked and every other vertex is unmarked; a vertex may be marked more than once.) Query: report the nearest marked ancestor of a vertex. (A vertex counts as its own ancestor.) $1 \\leq n, q \\leq 10^5 $ Analysis. Note one property: an ancestor always has a smaller $\\texttt{dfs}$ order than its descendants. This suggests the following approach: initially every vertex holds the value $a_i = 1$; to mark a vertex $u$, take the $\\texttt{max}$ of every value in the subtree of $u$ with $dfn_u$; to answer a query, output the vertex whose $\\texttt{dfs}$ order equals the value stored at the queried vertex. Why is this correct? By the property above, updating a subtree with $dfn_u$ overrides the effect of every marked ancestor of $u$ on those answers. And because we take a $\\max$, vertices in the subtree that are already marked, together with their descendants, keep their answers. So the approach is correct. A range $\\max$ update costs $\\mathcal{O}(\\log n)$, so the total complexity is $\\mathcal{O}(n \\log n)$. Code. #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 2e5 + 10;struct edge{ int next, to;};struct node{ int l, r; int min, sec, tag;};edge e[MaxN << 1];int n, m, cnt, dfscnt;int head[MaxN], fa[MaxN], dep[MaxN], dfn[MaxN], pre[MaxN], size[MaxN];struct SegmentTree{ node t[MaxN << 2]; void pushup(int id) { int lc = id << 1, rc = id << 1 | 1; if(t[lc].min == t[rc].min) { t[id].min = t[lc].min; t[id].sec = std::min(t[lc].sec, t[rc].sec); } else if(t[lc].min < t[rc].min) { t[id].min = t[lc].min; t[id].sec = std::min(t[lc].sec, t[rc].min); } else { t[id].min = t[rc].min; t[id].sec = std::min(t[lc].min, t[rc].sec); } } void build(int id, int l, int r) { t[id].l = l, t[id].r = r, t[id].tag = -1; if(l == r) return (void) (t[id].min = 1, t[id].sec = 0x3f3f3f3f); int mid = (l + r) >> 1; build(id << 1, l, mid), build(id << 1 | 1, mid + 1, r); } void pushtag(int id, int val) { if(t[id].min >= val) return; t[id].min = t[id].tag = val; } void pushdown(int id) { if(t[id].tag == -1) return; pushtag(id << 1, t[id].tag), pushtag(id << 1 | 1, t[id].tag); t[id].tag = -1; } void setval(int id, int l, int r, int val) { if(t[id].min >= val) return; 
if(t[id].l > r || l > t[id].r) return; if(l <= t[id].l && t[id].r <= r && t[id].sec > val) return pushtag(id, val); pushdown(id), setval(id << 1, l, r, val); setval(id << 1 | 1, l, r, val), pushup(id); } int query(int id, int pos) { if(t[id].l > pos || t[id].r < pos) return 0x3f3f3f3f; if(t[id].l == t[id].r) return t[id].min; pushdown(id); return std::min(query(id << 1, pos), query(id << 1 | 1, pos)); }}T;void add_edge(int u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}void dfs(int u, int fa){ dep[u] = dep[fa] + 1, ::fa[u] = fa; dfn[u] = ++dfscnt, pre[dfscnt] = u, size[u] = 1; for(int i = head[u]; i; i = e[i].next) { int v = e[i].to; if(v == fa) continue; dfs(v, u), size[u] += size[v]; }}char get(){ char ch = getchar(); while(!isalpha(ch)) ch = getchar(); return ch;}inline int read(){ int x = 0; char ch = getchar(); while(ch > '9' || ch < '0') ch = getchar(); while(ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}int main(){ n = read(), m = read(); for(int i = 1; i < n; i++) { int u = read(), v = read(); add_edge(u, v), add_edge(v, u); } dfs(1, 0), T.build(1, 1, n); while(m--) { char op = get(); int u = read(); if(op == 'Q') printf(\"%d\\n\", pre[T.query(1, dfn[u])]); else T.setval(1, dfn[u], dfn[u] + size[u] - 1, dfn[u]); } return 0;}","link":"/2020/04/14/洛谷4092/"},{"title":"洛谷4284 [SHOI2014]概率充电器","text":"Let $f_u$ be the probability that $u$ receives no power from the vertices of its own subtree ($u$ included). Then $$f_u=(1-p_u) \\times \\prod_{v \\in son(u)} (1 - e(u, v) + e(u, v) \\times f_v)$$ Why this formula? 
Vertex $u$ stays unpowered exactly when $u$ does not charge itself and no vertex in its subtree conducts power to it; a child $v$ fails to deliver power either because the edge $(u,v)$ does not conduct, with probability $1-e(u,v)$, or because it conducts while $v$ itself is unpowered, with probability $e(u,v) \\times f_v$. The formula still has a gap: it cannot handle power reaching $u$ from outside its subtree. We therefore use rerooting. Let $g_u$ be the probability that $u$ is unpowered. $g_u$ combines the subtree of $u$ with the rest of the tree; the subtree part is already done, and the rest can be derived from the parent of $u$. Let $h_u$ be the probability that the parent $fa$ of $u$ is unpowered once the contribution of the subtree of $u$ is removed: $$h_u = \\frac{g_{fa}}{1 - e(fa, u) + e(fa, u) \\times f_u}$$ Then $g$ follows from $h$ by the same reasoning as for $f$: $$g_u=f_u \\times (1-e(fa, u) + e(fa, u) \\times h_u)$$ Hence $ans=\\sum_{i=1}^n (1-g_i)$. Code: #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) ((a + b) % mod)const int MaxN = 5e5 + 10;struct edge{ double d; int next, to;};edge e[MaxN << 1];int n, cnt;int head[MaxN];double p[MaxN], f[MaxN], g[MaxN], h[MaxN];inline void add_edge(int u, int v, double d){ ++cnt; e[cnt].to = v; e[cnt].d = d; e[cnt].next = head[u]; head[u] = cnt;}inline void dfs(int u, int fa){ f[u] = 1 - p[u]; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v == fa) continue; dfs(v, u), f[u] *= (1 - e[i].d + (e[i].d * f[v])); }}inline void dfs1(int u, int fa, int id){ if (u == 1) g[u] = f[u]; else { h[u] = g[fa] / (1 - e[id].d + e[id].d * f[u]); g[u] = f[u] * (1 - e[id].d + e[id].d * h[u]); } for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v == fa) continue; dfs1(v, u, i); }}int main(){ scanf(\"%d\", &n); for (int i = 1; i < n; i++) { int u, v; double d; scanf(\"%d%d%lf\", &u, &v, &d), d *= 0.01; add_edge(u, v, d), add_edge(v, u, d); } for (int i = 1; i <= n; i++) scanf(\"%lf\", &p[i]), p[i] *= 0.01; dfs(1, 0), dfs1(1, 0, 0); double ans = 0; for (int i = 1; i <= n; i++) ans += 1.00 - g[i]; printf(\"%.6lf\", ans); return 0;}","link":"/2019/10/17/洛谷4284/"},{"title":"洛谷 P3950 部落冲突","text":"A link-cut tree template problem. It makes a good first exercise for link-cut trees. Operation C: cut the edge between the two tribes that went to war. Operation U: link the two tribes of that war back together once it ends. Operation Q: if tribes p and q are in the same tree (same root), print \"Yes\"; otherwise print \"No\". 
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138#include <bits/stdc++.h>using namespace std;const int MaxN = 300010;int n, m, val[MaxN], p[MaxN], q[MaxN], war;struct Link_Cut_Tree{ int top, ch[MaxN][2], fa[MaxN], sum[MaxN], q[MaxN], rev[MaxN]; inline void pushup(int x) { sum[x] = sum[ch[x][0]] ^ sum[ch[x][1]] ^ val[x]; } inline void pushdown(int x) { int l = ch[x][0], r = ch[x][1]; if (rev[x]) { rev[l] ^= 1; rev[r] ^= 1; rev[x] ^= 1; swap(ch[x][0], ch[x][1]); } } inline bool isroot(int x) { return ch[fa[x]][0] != x && ch[fa[x]][1] != x; } void rotate(int x) { int y = fa[x], z = fa[y], l, r; if (ch[y][0] == x) l = 0; else l = 1; r = l ^ 1; if (!isroot(y)) { if (ch[z][0] == y) ch[z][0] = x; else ch[z][1] = x; } fa[x] = z; fa[y] = x; fa[ch[x][r]] = y; ch[y][l] = ch[x][r], ch[x][r] = y; pushup(y), pushup(x); } void splay(int x) { top = 1; q[top] = x; for (int i = x; !isroot(i); i = fa[i]) q[++top] = fa[i]; for (int i = top; i; i--) pushdown(q[i]); while (!isroot(x)) { int y = fa[x], z = fa[y]; if (!isroot(y)) { if ((ch[y][0] == x) ^ (ch[z][0] == y)) rotate(x); else rotate(y); } rotate(x); } } void access(int x) { for (int t = 0; x; t = x, x = fa[x]) splay(x), ch[x][1] = t, pushup(x); } void makeroot(int x) { access(x); splay(x); rev[x] ^= 1; } int find(int x) { access(x); splay(x); while (ch[x][0]) x = ch[x][0]; return x; } void split(int x, int y) { makeroot(x); access(y); splay(y); } void cut(int x, int y) { makeroot(x); if (find(y) != x || fa[x] != y || ch[x][1]) return; fa[x] = ch[y][0] = 0; pushup(y); } void link(int x, int y) { makeroot(x); fa[x] = y; }} t;int main(){ scanf(\"%d%d\", &n, &m); for(int i = 1; i < n; i++) { int u, v; scanf(\"%d%d\", &u, &v); t.link(u, v); } for(int i = 1; i <= m; i++) { 
std::string op; std::cin >> op; if(op == \"Q\") { int x, y; scanf(\"%d%d\", &x, &y); int fx = t.find(x), fy = t.find(y); if(fx == fy) printf(\"Yes\\n\"); else printf(\"No\\n\"); } else if(op == \"C\") { ++war; scanf(\"%d%d\", &p[war], &q[war]); t.cut(p[war], q[war]); } else { int x; scanf(\"%d\", &x); t.link(p[x], q[x]); } } return 0;}","link":"/2019/02/07/洛谷3950/"},{"title":"洛谷4314 cpu监控","text":"首先我们可以想到一个显而易见的思路:每个节点维护$\\mathrm{add,set}$的$tag$,维护最大值$max$和历史最大值$Max$,然后像正常的线段树一样维护 然后你惊讶的发现你只拿到二十分(只有$Q$的部分分) 为什么呢?我们发现有些$tag$,他还没有来得及被更新就被覆盖了..而这些$tag$本来能改变世界更新答案 所以我们可以维护两个$tag$:$\\mathrm{Add,Set}$表示该节点从上次下放到目前的最大$add$和$set$值 然后我们就可以快乐的用这些$tag$来维护答案了 Code: 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157#include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) ((a + b) % mod)#define checkmax(a, b) ((a) = ((a) < (b)) ? 
(b) : (a))const int MaxN = 1e5 + 10;const int inf = 0x3f3f3f3f;struct node{ int l, r; int Max, Add, Set; int max, add, set;};int n, m, a[MaxN];struct SegmentTree{ node t[MaxN << 2]; inline void pushup(int id) { t[id].max = std::max(t[id << 1].max, t[id << 1 | 1].max); t[id].Max = std::max(t[id << 1].Max, t[id << 1 | 1].Max); } inline void build(int id, int l, int r) { t[id].l = l, t[id].r = r, t[id].set = t[id].Set = -inf; if (l == r) { t[id].max = t[id].Max = a[(l + r) >> 1]; return; } int mid = (l + r) >> 1; build(id << 1, l, mid); build(id << 1 | 1, mid + 1, r); pushup(id); } inline void checksum(int id, int add, int Add) { if (t[id].set != -inf) { checkmax(t[id].Set, t[id].set + Add); checkmax(t[id].Max, t[id].max + Add); t[id].set += add, t[id].max += add; } else { checkmax(t[id].Add, t[id].add + Add); checkmax(t[id].Max, t[id].max + Add); t[id].add += add, t[id].max += add; } } inline void checkset(int id, int set, int Set) { checkmax(t[id].Set, Set); checkmax(t[id].Max, Set); t[id].set = set, t[id].max = set; } inline void pushdown(int id) { checksum(id << 1, t[id].add, t[id].Add), checksum(id << 1 | 1, t[id].add, t[id].Add), t[id].add = t[id].Add = 0; if (t[id].set != -inf) { checkset(id << 1, t[id].set, t[id].Set), checkset(id << 1 | 1, t[id].set, t[id].Set); t[id].set = t[id].Set = -inf; } } void add(int id, int l, int r, int val) { if (t[id].l > r || t[id].r < l) return; if (l <= t[id].l && t[id].r <= r) { checksum(id, val, val); return; } pushdown(id), add(id << 1, l, r, val), add(id << 1 | 1, l, r, val), pushup(id); } void set(int id, int l, int r, int val) { if (t[id].l > r || t[id].r < l) return; if (l <= t[id].l && t[id].r <= r) { checkset(id, val, val); return; } pushdown(id), set(id << 1, l, r, val), set(id << 1 | 1, l, r, val), pushup(id); } int query_max(int id, int l, int r) { if (t[id].l > r || t[id].r < l) return -inf; if (l <= t[id].l && t[id].r <= r) return t[id].max; pushdown(id); return std::max(query_max(id << 1, l, r), query_max(id << 1 
| 1, l, r)); } int query_Max(int id, int l, int r) { if (t[id].l > r || t[id].r < l) return -inf; if (l <= t[id].l && t[id].r <= r) return t[id].Max; pushdown(id); return std::max(query_Max(id << 1, l, r), query_Max(id << 1 | 1, l, r)); }} T;char get(){ char ch = getchar(); while (ch > 'Z' || ch < 'A') ch = getchar(); return ch;}inline int read(){ int x = 0, f = 1; char ch = getchar(); while (ch > '9' || ch < '0') { if (ch == '-') f = 0; ch = getchar(); } while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return f ? x : (-x);}int main(){ scanf(\"%d\", &n); for (int i = 1; i <= n; i++) scanf(\"%d\", &a[i]); T.build(1, 1, n); scanf(\"%d\", &m); while (m--) { char ch = get(); int x = read(), y = read(), z; if (ch == 'Q') printf(\"%d\\n\", T.query_max(1, x, y)); else if (ch == 'A') printf(\"%d\\n\", T.query_Max(1, x, y)); else if (ch == 'P') z = read(), T.add(1, x, y, z); else z = read(), T.set(1, x, y, z); } return 0;}","link":"/2019/10/07/洛谷4314/"},{"title":"洛谷 P5018 【对称二叉树】","text":"本题考察选手对DFS及树结构的掌握程度 首先,你把数据读入之后,先用一个大法师把以每个节点为根的子树的大小和权值都预处理出来,方便待会剪枝 然后,你对以每个节点为根的子树,都判断一下以下条件(这时刚才处理的东西就有用了) ① 左子树和右子树的节点数是否相等 ② 左子树和右子树的权值是否相等 ③ 以当前节点为根的子树大小是不是超过答案 第三个很重要,不加(洛谷 数据)最后一个点会TLE 有一个显而易见的剪枝:因为答案至少是1,所以大小为1的子树就不用check了,不然浪费常数 然后就是暴力判了 递归下去,建立两个队列,保存当前处理到的左子树上和右子树上的节点,判左子树当前节点的左儿子和右子树当前节点的右儿子权值是否相等,右子树当前节点的左儿子和左子树当前节点的右儿子权值是否相等(注意对应) 还有判下对应的节点有没有一个是空的一个没空的情况 如果不相等就返回 相等的话就扔进队列(注意对应顺序!) 
注意:上述处理一定要左右子树一起做,不能先处理一边,再处理另一边,不然会WA 到最后如果都可以的话就return true 附考场代码 不得不说,为了能过,我加了一堆卡常 3e6的输入规模应该还是要快读的吧 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117# include <bits/stdc++.h># define R register const int MaxN = 1000010;struct node//节点{ int val; int l, r;};node a[MaxN];int f[MaxN], val[MaxN], ind[MaxN];//f[i]表示以i为根的子树大小,val表示以i为根的子树权值和,ind没啥用inline void read(int &x)//快读{ x = 0; bool op = 1; char ch = getchar(); while(ch > '9' || ch < '0') { if(ch == '-') op = 0; ch = getchar(); } while(ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch - '0'), ch = getchar(); if(!op) x = -x;}void dfs(int root){ if(root == -1) return; if(a[root].l == -1 && a[root].r == -1) f[root] = 1, val[root] = a[root].val; else { dfs(a[root].l); dfs(a[root].r); f[root] = f[a[root].l] + f[a[root].r] + 1; val[root] = val[a[root].l] + val[a[root].r] + a[root].val; }}inline int check(int x){ std::queue<int> l, r; l.push(x), r.push(x); while(!l.empty() || !r.empty()) { if(l.empty() || r.empty()) return false;//一边空了,一边没空 R int lx = l.front(), rx = r.front(); l.pop(), r.pop(); if(a[lx].val != a[rx].val) return false; R int lson[3], rson[3]; lson[1] = a[lx].l, lson[2] = a[lx].r;//左子树当前节点的左儿子,左子树当前节点的右儿子 rson[1] = a[rx].l, rson[2] = a[rx].r;//右子树当前节点的左儿子,右子树当前节点的右儿子 if((lson[1] == -1 && rson[2] != -1) || (lson[1] != -1 && rson[2] == -1)) return false;//一边空了,一边没空 if((lson[2] == -1 && rson[1] != -1) || (lson[2] != -1 && rson[1] == -1)) return false;//一边空了,一边没空 if(lson[1] != -1) l.push(lson[1]); if(lson[2] != -1) l.push(lson[2]); if(rson[2] != -1) r.push(rson[2]); if(rson[1] != -1) r.push(rson[1]); //推进队列 } return true;}int main(){// freopen(\"tree.in\", \"r\", stdin);// freopen(\"tree.out\", \"w\", stdout); R int n; scanf(\"%d\", &n); for(R unsigned i = 1; i <= n; ++i) read(a[i].val); for(R unsigned i = 1; i 
<= n; ++i) read(a[i].l), read(a[i].r), ++ind[a[i].l], ++ind[a[i].r];//处理入度 R unsigned root; for(R unsigned i = 1; i <= n; ++i) { if(!ind[i]) { root = i; break; } }//找树根 dfs(root);//预处理 int ans = 1; for(R unsigned i = 1; i <= n; ++i)//枚举子树 { if(f[a[i].l] != f[a[i].r]) continue;//剪枝1 if(val[a[i].l] != val[a[i].r]) continue;//剪枝2 if(f[i] < ans || f[i] == 1) continue;//剪枝3 if(check(i)) ans = f[i];//更新答案 } printf(\"%d\", ans); fclose(stdin); fclose(stdout); return 0;}","link":"/2019/02/06/洛谷5018/"},{"title":"洛谷4211 [LNOI2014]LCA","text":"可以发现题目可以转化为把从$l$到$r$节点到$1$的路径上的点的点权都加上$1$,然后统计$1$到$z$路径上的点权 然后发现这个东西可以差分。。。 于是我们就把询问拆成$l-1$和$r$,然后按$r$排序 从$1$到$n$把$1$到$i$路径点权全部$+1$ 询问时查询$1$到$z$路径点权和 然后就做完了。。。 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181# include <bits/stdc++.h>const int mod = 201314;const int MaxN = 100010;struct edge{ int next, to;};struct node{ int l, r; int sum, tag;};struct query{ int r, z, id;};edge e[MaxN << 1];query q[MaxN << 1];int n, m, cnt, dfsnum;int hson[MaxN], fa[MaxN], dfn[MaxN], ans[MaxN];int head[MaxN], size[MaxN], dep[MaxN], top[MaxN];struct SegmentTree{ node t[MaxN << 2]; inline void pushup(int id){t[id].sum = t[id << 1].sum + t[id << 1 | 1].sum;} inline void build(int id, int l, int r) { t[id].l = l, t[id].r = r; if(l == r) return; int mid = (l + r) >> 1; build(id << 1, l, mid); build(id << 1 | 1, mid + 1, r); } inline void pushdown(int id) { if(t[id].tag) { t[id << 1].sum += t[id].tag * (t[id << 1].r - t[id << 1].l + 1); t[id << 1 | 1].sum += t[id].tag * (t[id << 1 | 1].r - t[id << 1 | 1].l + 1); t[id << 1].tag += t[id].tag; t[id << 1 | 1].tag += t[id].tag; 
t[id].tag = 0; } } inline void modify(int id, int l, int r, int val) { if(t[id].l > r || t[id].r < l) return; if(l <= t[id].l && t[id].r <= r) { t[id].sum += val * (t[id].r - t[id].l + 1); t[id].tag += val; return; } pushdown(id); modify(id << 1, l, r, val); modify(id << 1 | 1, l, r, val); pushup(id); } inline int query(int id, int l, int r) { if(t[id].l > r || t[id].r < l) return 0; if(l <= t[id].l && t[id].r <= r) return t[id].sum; pushdown(id); return query(id << 1, l, r) + query(id << 1 | 1, l, r); }}T;inline int cmp(query a, query b){ return a.r < b.r;}inline void add_edge(int u, int v){ ++cnt; e[cnt].to = v; e[cnt].next = head[u]; head[u] = cnt;}inline void dfs1(int u, int f){ size[u] = 1; for(int i = head[u]; i; i = e[i].next) { int v = e[i].to; if(v == f) continue; dep[v] = dep[u] + 1; fa[v] = u; dfs1(v, u); size[u] += size[v]; if(size[v] > size[hson[u]]) hson[u] = v; }}inline void dfs2(int u, int Top){ ++dfsnum; dfn[u] = dfsnum; top[u] = Top; if(hson[u]) dfs2(hson[u], Top); for(int i = head[u]; i; i = e[i].next) { int v = e[i].to; if(v == fa[u] || v == hson[u]) continue; dfs2(v, v); }}inline void update_chain(int u, int v){ while(top[u] != top[v]) { if(dep[top[u]] < dep[top[v]]) std::swap(u, v); T.modify(1, dfn[top[u]], dfn[u], 1); u = fa[top[u]]; } if(dep[u] < dep[v]) std::swap(u, v); T.modify(1, dfn[v], dfn[u], 1);}inline int query_chain(int u, int v){ int ans = 0; while(top[u] != top[v]) { if(dep[top[u]] < dep[top[v]]) std::swap(u, v); ans += T.query(1, dfn[top[u]], dfn[u]); u = fa[top[u]]; } if(dep[u] < dep[v]) std::swap(u, v); ans += T.query(1, dfn[v], dfn[u]); return ans;}int main(){ scanf(\"%d%d\", &n, &m); for(int i = 2; i <= n; i++) { int u; scanf(\"%d\", &u); ++u; add_edge(i, u); add_edge(u, i); } dep[1] = 1; dfs1(1, 0), dfs2(1, 1); T.build(1, 1, n); for(int i = 1; i <= m; i++) { int l, r, z; scanf(\"%d%d%d\", &l, &r, &z); l++, r++, z++; q[i * 2 - 1] = (query){l - 1, z, i * 2 - 1}; q[i * 2] = (query){r, z, i * 2}; } int now = 1; std::sort(q + 1, 
q + 2 * m + 1, cmp); for(int i = 1; i <= n; i++) { update_chain(1, i); while(q[now].r < i) ++now; while(q[now].r == i) { ans[q[now].id] = query_chain(1, q[now].z); ++now; } } for(int i = 1; i <= m; i++) printf(\"%d\\n\", (ans[i * 2] - ans[i * 2 - 1]) % mod); return 0; }","link":"/2019/03/14/洛谷4211/"},{"title":"洛谷 P4867 【Gty的二逼妹子序列】","text":"莫队好题 这种题一看直接莫队啊 但是我们要想想怎么修改 一开始我想树状数组,但是我不会写o((⊙﹏⊙))o 后来看了一下Solution,发现可以将值域分块,这样就可以做到查询$O(\\sqrt n)$,修改$O(1)$了 总复杂度$O(m \\sqrt n)$ 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697#include <bits/stdc++.h>#define getindex(x) ((x - 1) * block + 1)#define getpos(x) ((x - 1) / block + 1)const int MaxN = 1e5 + 10, MaxM = 1e6 + 10;struct query{ int id, pos; int l, r, a, b;};query q[MaxM];int n, m, size, block;int a[MaxN], ans[MaxM], cnt[MaxN], sum[MaxN];inline int cmp(query a, query b){ if (a.pos != b.pos) return a.pos < b.pos; return a.r < b.r;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline void ins(int x){ ++cnt[a[x]]; if (cnt[a[x]] == 1) ++sum[getpos(a[x])];}inline void del(int x){ --cnt[a[x]]; if (cnt[a[x]] == 0) --sum[getpos(a[x])];}inline int ask(int a, int b, int l, int r){ int ans = 0, Posl = getpos(l), Posr = getpos(r); for (int i = Posl + 1; i < Posr; i++) ans += sum[i]; if (Posl == Posr) { for (int i = l; i <= r; i++) if (cnt[i]) ++ans; } else { int L = getindex(Posr), R = getindex(Posl + 1) - 1; for (int i = l; i <= R; i++) if (cnt[i]) ++ans; for (int i = L; i <= r; i++) if (cnt[i]) ++ans; } return ans;}inline void solve(){ int l = 1, r = 0; for (int i = 1; i <= m; i++) { while (l > q[i].l) l--, ins(l); while (r < q[i].r) r++, ins(r); while (l < q[i].l) del(l), l++; while (r > q[i].r) del(r), r--; ans[q[i].id] = ask(q[i].l, q[i].r, q[i].a, 
q[i].b); }}int main(){ n = read(), m = read(); size = pow(n, 0.55), block = sqrt(n); for (int i = 1; i <= n; ++i) a[i] = read(); for (int i = 1; i <= m; ++i) { q[i].l = read(), q[i].r = read(); q[i].a = read(), q[i].b = read(); q[i].id = i, q[i].pos = (q[i].l - 1) / size + 1; } std::sort(q + 1, q + m + 1, cmp); solve(); for (int i = 1; i <= m; i++) printf(\"%d\\n\", ans[i]); return 0;}","link":"/2019/02/06/洛谷4867/"},{"title":"洛谷5046 [Ynoi2019 模拟赛] Yuno loves sqrt technology I","text":"提供一种理论复杂度正确($O(n\\sqrt n)$)的做法 我们维护以下几个东西: pre[i]:$i$到它的块首这段序列的逆序对数量 suf[i]:$i$到它的块尾这段序列的逆序对数量 cnt[i][j]:前$i$个块中小于$j$的数的个数 f[i][j]:第$i$个块到第$j$块的逆序对个数 v[i] 第$i$个块中数从小到大排序的结果 pre和suf可以用树状数组在$O(n \\log n)$时间内求出 cnt可以用两次前缀和在$O(n \\sqrt n)$时间内求出 f可以用以下方法求出:$f_{i,j}=f_{i+1,j}+f_{i,j+1}-f_{i+1,j-1}+(i,j)$这两块之间产生的贡献 由于我们已经维护好了v,于是我们可以用归并排序在$O(\\sqrt n)$时间内求出$(i,j)$这两块之间产生的贡献,总复杂度$O(n \\sqrt n)$ (下面用$[l,r] \\times [L,R]$表示$[l,r]$与$[L,R]$产生的贡献) 接下来我们考虑询问,设询问区间为$[l,r]$,我们分三种情况考虑: 1.$[l,r]$在一个块内 设$R$表示$l,r$块的块尾 由于我们已经维护好了pre,于是答案可以表示成$pre[l]-pre[r+1]-[l,r] \\times [r+1,R]$ 2.$[l,r]$在相邻两个块内 设$R$表示$l$块的块尾,$L$表示$r$块的块首 那么答案可以表示成$pre[l] + suf[r] + [l,R] \\times [L, r]$ 3.$[l,r]$横跨至少$3$个块 设$R$表示$l$块的块尾,$L$表示$r$块的块首 那么[l,r]的贡献可以拆分成$[l,R],[R+1,L-1],[L,r]$三个块两两之间的贡献,由于$[R+1,L-1]$是整块,于是我们的贡献就非常好求,它等于 pre[l]+suf[r]+f[id(R+1)][id(L-1)]+[l,R]\\times[R+1,L-1]+[R+1,L-1]\\times[L,r]+[l,R]\\times[L,r]其中$id(i)$表示$i$所属的块编号 于是主体部分就写完了,有没人教教怎么卡常啊/kel 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194#include <bits/stdc++.h>#include<sys/mman.h>#define ll long long#define pair std::pair<int, 
int>#define mp(i, j) std::make_pair(i, j)#define getl(i) ((i - 1) * siz + 1)#define getr(i) (std::min(n, i * siz))#define id(x) (((x - 1) / siz) + 1)#define meow(cat...) fprintf(stderr, cat)const int MaxB = 7e2 + 10;const int MaxN = 1e5 + 10;pair v[MaxB][MaxB];ll ans, f[MaxB][MaxB];int len[MaxB], pre[MaxN], cnt[MaxB][MaxN], suf[MaxN];int n, m, siz, num, a[MaxN], Id[MaxN], bl[MaxN], br[MaxN];struct BIT{ int c[MaxN]; int lowbit(int x) { return x & (-x); } void add(int x, int val) { while (x <= n) c[x] += val, x += lowbit(x); } int query(int x) { int ret = 0; while (x) ret += c[x], x -= lowbit(x); return ret; }} T;const int size = 1 << 22;char out[size], *p3 = out - 1;#define pc(x) *++p3=xvoid print(ll x){ if(x > 9)print(x / 10); pc(x % 10 + 48);}char *s;inline int read(){ register int u = 0; while(*s < 48) s++; while(*s > 32) u = u * 10 + *s++ -48; return u;}int ta[MaxB], tb[MaxB];ll query(int l1, int r1, int l2, int r2){ int lena = 0, lenb = 0; int L = Id[l1], R = Id[l2]; for (int i = 1; i <= len[L]; i++) { pair x = v[L][i]; if (x.second >= l1 && x.second <= r1) ta[++lena] = x.first; } for (int i = 1; i <= len[R]; i++) { pair x = v[R][i]; if (x.second >= l2 && x.second <= r2) tb[++lenb] = x.first; } ll ans = 0; int A = 1, B = 1; while (A <= lena && B <= lenb) { if (A <= lena) { if (ta[A] < tb[B] || B > lenb) ++A; else ++B, ans += lena - A + 1; } else ++B; } return ans;};signed main(){ // freopen(\"sqrt.in\", \"r\", stdin); // freopen(\"sqrt.out\", \"w\", stdout); s = (char*)mmap(0, 900 << 20, PROT_READ, MAP_PRIVATE, fileno(stdin), 0); n = read(), m = read(); siz = 160, num = id(n); for (int i = 1; i <= n; i++) a[i] = read(), Id[i] = id(i); for (int i = 1; i <= num; i++) { bl[i] = getl(i), br[i] = getr(i); len[i] = br[i] - bl[i] + 1; } for (int i = 1; i <= n; i++) v[Id[i]][i - bl[Id[i]] + 1] = mp(a[i], i); for (int i = 1; i <= num; i++) std::sort(v[i] + 1, v[i] + len[i] + 1); for (int i = 1; i <= num; i++) { int l = bl[i], r = br[i]; for (int j = l; j <= r; ++j) 
cnt[i][a[j]]++; for (int j = 1; j <= n; ++j) cnt[i][j] += cnt[i - 1][j]; } for (int i = 1; i <= num; i++) for (int j = 1; j <= n; ++j) cnt[i][j] += cnt[i][j - 1]; for (int i = 1; i <= num; i++) { int l = bl[i], r = br[i], x = 0, y; for (int j = l; j <= r; ++j) { y = T.query(n) - T.query(a[j]); x += y, pre[j] = x, T.add(a[j], 1); } f[i][i] = x; for (int j = l; j <= r; ++j) { y = T.query(a[j] - 1), suf[j] = x; x -= y, T.add(a[j], -1); } } // for(ll i = 1; i <= n; i++) // meow(\"%d %d %d\\n\", i, pre[i], suf[i]); for (int len = 1; len < num; len++) { int x = num - len + 1; for (int i = 1; i <= x; i++) { int j = i + len; f[i][j] = f[i + 1][j] + f[i][j - 1]; if (i + 1 <= j - 1) f[i][j] -= f[i + 1][j - 1]; f[i][j] += query(bl[i], br[i], bl[j], br[j]); } } // meow(\"xzakioi!\\n\"); for (int i = 1; i <= m; i++) { ll now = 0; int l = read() ^ ans, r = read() ^ ans; int L = Id[l], R = Id[r], Ll, rr, LL, RR; if (L == R) { rr = br[R]; if (r == rr) now = suf[l]; else { now = suf[l] - suf[r + 1]; now -= query(l, r, r + 1, rr); } } else if (R == L + 1) { Ll = bl[R], rr = br[L]; now = suf[l] + pre[r]; now += query(l, rr, Ll, r); } else { LL = bl[R], RR = br[L], rr = br[R - 1]; now = suf[l] + f[L + 1][R - 1]; now += query(l, RR, LL, r) + pre[r]; for (int i = l; i <= RR; i++) now += cnt[R - 1][a[i]] - cnt[L][a[i]]; for (int i = LL; i <= r; i++) { now += cnt[R - 1][n] - cnt[R - 1][a[i]]; now -= cnt[L][n] - cnt[L][a[i]]; } } ans = now, print(ans), pc('\\n'); } meow(\"used %d ms\\n\", clock()); fwrite(out, 1, p3 - out + 1, stdout); return 0;}","link":"/2021/01/18/洛谷5046/"},{"title":"洛谷7457 [CERC2018] The Bridge on the River Kawaii","text":"题目大意有一个 $n$ 个点的图,有 $q$ 个操作,每个操作形如: $ \\texttt{0 x y v:}$ 在 $x,y$ 间添加一条权值为 $v$ 的边。 $ \\texttt{1 x y:}$ 删除 $x,y$ 之间的边,保证存在。 $ \\texttt{2 x y:}$ 询问 $x,y$ 所有路径最大权值的最小值。 $ 1 \\leq n, q \\leq 2 \\times 10 ^ 5, 1 \\leq v \\leq 10$ 题目分析我们先忽略这个 $v$ ,看看我们得到了什么:加入一条边,删除一条边,询问两点是否联通。 这让我们联想到什么?没错,动态图连通性! 
再一看,这个 $v$ 范围非常的小,于是我们考虑枚举 $v$ 的最大值,每次把权值不超过 $v$ 的边加入图中,询问两点之间是否联通。 这可以很轻松的使用线段树分治解决,时间复杂度 $ \\Theta (vn \\log^2 n) $ 。 代码123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145#include <bits/stdc++.h>#define R register#define ll long long#define pair std::pair<int, int>#define mp(i, j) std::make_pair(i, j)#define meow(cat...) fprintf(stderr, cat)#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 2e5 + 10;const int MaxM = 5e5 + 10;struct Modify{ int x, y;} p[MaxM];struct Operation{ int op, x, y, v, id;} op[MaxM];struct Query{ int x, y, t, id;} q[MaxM], lq[MaxM], rq[MaxM];std::vector<int> v[MaxM << 2];int cnt, ans[MaxM], Ans[MaxM]; pair st[MaxM];std::unordered_map<int, int> tim[MaxN];int n, m, top, num, maxv, fa[MaxN], rk[MaxN];int getf(int x){ if (x == fa[x]) return fa[x]; return getf(fa[x]);}void del(int cur){ while (top > cur) { pair pre = st[top--]; fa[pre.first] = pre.first; rk[pre.first] = pre.second; }}void merge(int x, int y){ x = getf(x), y = getf(y); if (x == y) return; if (rk[x] < rk[y]) std::swap(x, y); fa[y] = x, st[++top] = mp(y, rk[y]); if (rk[x] == rk[y]) st[++top] = mp(x, ++rk[x]);}void modify(int id, int l, int r, int ql, int qr, int pos){ if (ql <= l && r <= qr) return (void)v[id].push_back(pos); int mid = (l + r) >> 1; if (ql <= mid) modify(id << 1, l, mid, ql, qr, pos); if (qr > mid) modify(id << 1 | 1, mid + 1, r, ql, qr, pos);}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}void solve(int id, int l, int r, int st, int ed){ if(st > ed) return; int cur = top; for(int i = 0; i < v[id].size(); i++) merge(p[v[id][i]].x, 
p[v[id][i]].y); if(l == r) { for(int i = st; i <= ed; i++) ans[q[i].id] = (getf(q[i].x) == getf(q[i].y)); return; } int mid = (l + r) >> 1, lt = 0, rt = 0; for(int i = st; i <= ed; i++) if(q[i].t <= mid) lq[++lt] = q[i]; else rq[++rt] = q[i]; for(int i = 1; i <= lt; i++) q[st + i - 1] = lq[i]; for(int i = 1; i <= rt; i++) q[st + i + lt - 1] = rq[i]; solve(id << 1, l, mid, st, st + lt - 1); solve(id << 1 | 1, mid + 1, r, st + lt, ed), del(cur);}int main(){ n = read(), m = read(), memset(Ans, -1, sizeof(Ans)); for(int i = 1; i <= m; i++) { op[i].op = read(), op[i].x = read() + 1, op[i].y = read() + 1, op[i].id = i; if(op[i].op == 0) op[i].v = read(), maxv = std::max(maxv, op[i].v); if(op[i].x > op[i].y) std::swap(op[i].x, op[i].y); } for(int V = 0; V <= maxv; V++) { for(int i = 1; i <= n; i++) tim[i].clear(); cnt = num = top = 0; for(int i = 1; i <= n; i++) fa[i] = i, rk[i] = 1; for(int i = 1; i <= m * 4; i++) v[i].clear(); for(int i = 1; i <= m; i++) { int x = op[i].x, y = op[i].y; if(op[i].op == 0 && op[i].v <= V) tim[x][y] = i; else if (op[i].op == 1 && tim[x][y]) { modify(1, 1, m, tim[x][y], i, ++cnt); p[cnt] = (Modify) {x, y}, tim[x][y] = 0; } else if(op[i].op == 2) q[++num] = (Query) {x, y, i, num}; } for(int i = 1; i <= n; i++) for(auto x : tim[i]) if(x.second) { modify(1, 1, m, x.second, m, ++cnt); p[cnt] = (Modify) {i, x.first}; } solve(1, 1, m, 1, num); for(int i = 1; i <= num; i++) { if(ans[i] && Ans[i] == -1) Ans[i] = V; ans[i] = 0; } } for(int i = 1; i <= num; i++) printf(\"%d\\n\", Ans[i]); return 0;}","link":"/2021/10/07/洛谷7457/"},{"title":"洛谷4768 [NOI2018] 归程","text":"题目大意有一个$n$个点$m$条边的无向联通图, 每条边有两个属性:长度$d$,海拔$h$ 有$q$个询问,每个询问给定两个数$v$, $p$,你要找到一个节点$u$,其中$u$要满足$v$到$u$存在一条路径使得这条路径上的边海拔全部大于$p$,求所有可能的$u$到$1$的最短路长度的最小值 分析显然,我们发现$v$到$u$的路径一定在$u$到$v$的最大生成树上。(例:货车运输) 把边按照海拔降序排列,建出该图的$\\texttt{kruskal}$重构树,则对于一个节点$s$, 若$s$的点权$val \\leq p$则该子树里的所有节点都互相连通(即能开车抵达)。 我们通过$\\texttt{dijkstra}$预处理出每个点到$1$的最短路$dis_i$, 
After the $\\texttt{kruskal}$ reconstruction tree is built, a $\\texttt{dfs}$ over it computes, for every node, the minimum $dis$ in its subtree, $mind_i$. To answer a query, find the shallowest ancestor $x$ of $v$ whose node weight is $> p$ (the same test as val[fa[u][i]] > k in jump in the code); $mind_x$ is then the answer. $x$ can be found with binary lifting on the tree, jumping upward level by level while the condition holds. Time complexity: $O(m \\log m)$. Some extras: how the $\\texttt{kruskal}$ reconstruction tree is built: 1. Sort all edges by weight in descending/ascending order, as in the ordinary $\\texttt{kruskal}$ algorithm. 2. When merging the two endpoints $u, \\; v$ of an edge, instead of attaching the root $fv$ of $v$'s component to $fu$ as usual, create a new node $new$, make it the parent of both $fu, \\; fv$, and add the edges $(new, \\; fu)$ and $(new, \\; fv)$; the weight of this new node is the minimum/maximum edge weight on the path between $u$ and $v$ in the maximum/minimum spanning tree. 3. Repeat step $2$ until every edge has been processed. Code: #include <bits/stdc++.h>#define R register#define ll long long#define sum(a, b, mod) (((a) + (b)) % mod)const int MaxN = 8e5 + 10;const int MaxM = 1e6 + 10;struct edge{ int next, to, dis;};struct Edge{ int u, v, ht;};struct node{ int pos, dis; bool operator<(node x) const { return dis > x.dis; }};edge e[MaxM];Edge t[MaxN];int n, m, q, k, s, cnt, num;int head[MaxM], dep[MaxM], f[MaxM], val[MaxM], mind[MaxN];int u[MaxN], v[MaxN], l[MaxN], a[MaxN], dis[MaxN], vis[MaxN], fa[MaxN][21];int cmp(Edge a, Edge b) { return a.ht > b.ht; }void link(int u, int v, int a) { ++cnt, t[cnt].u = u, t[cnt].v = v, t[cnt].ht = a; }int getf(int x){ if (x != f[x]) f[x] = getf(f[x]); return f[x];}void rebuild(){ cnt = 0; for (int i = 1; i <= m; i++) link(u[i], v[i], a[i]);}int jump(int u, int k){ for (int i = 20; ~i; i--) if (val[fa[u][i]] > k) u = fa[u][i]; return u;}void add_edge(int u, int v, int d){ ++cnt; e[cnt].to = v; e[cnt].dis = d; e[cnt].next = head[u]; head[u] = cnt;}void init(){ n = m = cnt = num = 0; memset(f, 0, sizeof(f)); memset(fa, 0, sizeof(fa)); memset(dep, 0, sizeof(dep)); memset(vis, 0, sizeof(vis)); memset(val, 
0, sizeof(val)); memset(head, 0, sizeof(head)); memset(mind, 0x3f, sizeof(mind));}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}void dfs(int u, int fa){ dep[u] = dep[fa] + 1, ::fa[u][0] = fa; for (int i = 1; i <= 20; i++) ::fa[u][i] = ::fa[::fa[u][i - 1]][i - 1]; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v == fa) continue; dfs(v, u), mind[u] = std::min(mind[u], mind[v]); }}void kruskal(){ num = n, cnt = 0; memset(head, 0, sizeof(head)); std::sort(t + 1, t + m + 1, cmp); for (int i = 1; i <= n; i++) f[i] = i; for (int i = 1; i <= m; i++) { int fu = getf(t[i].u), fv = getf(t[i].v); if (fu != fv) { val[++num] = t[i].ht; f[num] = f[fu] = f[fv] = num; add_edge(fu, num, 0), add_edge(num, fu, 0); add_edge(fv, num, 0), add_edge(num, fv, 0); } } dfs(num, 0);}void dijkstra(int u){ std::priority_queue<node> q; memset(dis, 0x3f, sizeof(dis)); dis[u] = 0, q.push((node){u, 0}); while (!q.empty()) { node x = q.top(); q.pop(), u = x.pos; if (vis[u]) continue; vis[u] = 1; for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (dis[u] + e[i].dis < dis[v]) { dis[v] = dis[u] + e[i].dis; if (!vis[v]) q.push((node){v, dis[v]}); } } } for (int i = 1; i <= n; i++) mind[i] = dis[i];}int main(){ int T = read(); while (T--) { int lastans = 0; init(), n = read(), m = read(); for (int i = 1; i <= m; i++) { u[i] = read(), v[i] = read(), l[i] = read(), a[i] = read(); add_edge(u[i], v[i], l[i]), add_edge(v[i], u[i], l[i]); } dijkstra(1), rebuild(), kruskal(); q = read(), k = read(), s = read(); while (q--) { int v = (read() + k * lastans - 1) % n + 1, p = (read() + k * lastans) % (s + 1); lastans = mind[jump(v, p)], printf(\"%d\\n\", lastans); } } return 0;}","link":"/2020/02/26/洛谷4768/"},{"title":"洛谷4114 QTree1","text":"很明显这是一道树剖题 但是,树剖是在点上进行的操作,如何把它转化到边上呢? 
It is easy to see that every node is joined to its parent by exactly one edge, so the weight of that edge can be stored as the weight of the child node. One more thing to watch: a query must not include the LCA of $(u, v)$, because the weight stored at $LCA$ is the weight of the edge between $LCA$ and $fa[LCA]$, which is not on the path. How do we handle this? Note that once $top[u] = top[v]$ (with $dfn[v] \\leq dfn[u]$), $v$ is the $LCA$ of $u$ and $v$, so at that point we query $(dfn[v] + 1, dfn[u])$. Code: #include <bits/stdc++.h>const int MaxN = 500010;struct edge{ int next, to, dis;};struct node{ int max; int l, r;};edge e[MaxN << 1];int n, cnt, dfsnum;int a[MaxN], head[MaxN], dep[MaxN], fa[MaxN], size[MaxN];int hson[MaxN], dfn[MaxN], top[MaxN], from[MaxN], to[MaxN], pre[MaxN];struct SegmentTree{ node t[MaxN << 2]; inline void pushup(int id) { t[id].max = std::max(t[id << 1].max, t[id << 1 | 1].max); } inline void build(int id, int l, int r) { t[id].l = l, t[id].r = r; if (l == r) { t[id].max = a[pre[l]]; return; } int mid = (l + r) >> 1; build(id << 1, l, mid); build(id << 1 | 1, mid + 1, r); pushup(id); } inline void modify(int id, int l, int r, int val) { if (t[id].l > r || t[id].r < l) return; if (l <= t[id].l && t[id].r <= r) { t[id].max = val; return; } modify(id << 1, l, r, val); modify(id << 1 | 1, l, r, val); pushup(id); } inline int query(int id, int l, int r) { if (l > t[id].r || r < t[id].l) return 0; if (l <= t[id].l && t[id].r <= r) return t[id].max; return std::max(query(id << 1, l, r), query(id << 1 | 1, l, r)); }} T;inline void add_edge(int u, int v, int d){ ++cnt; e[cnt].to = v; e[cnt].dis = d; e[cnt].next = head[u]; head[u] = cnt;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline void dfs1(int u, int f){ size[u] = 1; for (int i = 
head[u]; i; i = e[i].next) { int v = e[i].to; if (v == f) continue; fa[v] = u; dep[v] = dep[u] + 1; a[v] = e[i].dis; dfs1(v, u); size[u] += size[v]; if (size[hson[u]] < size[v]) hson[u] = v; }}inline void dfs2(int u, int Top){ ++dfsnum; dfn[u] = dfsnum; pre[dfsnum] = u; top[u] = Top; if (hson[u]) dfs2(hson[u], Top); for (int i = head[u]; i; i = e[i].next) { int v = e[i].to; if (v == fa[u] || v == hson[u]) continue; dfs2(v, v); }}inline void modify(int pos, int val) { T.modify(1, dfn[pos], dfn[pos], val); }inline int query_chain(int u, int v){ int ans = 0; if (dfn[u] < dfn[v]) std::swap(u, v); while (top[u] != top[v]) { if (dfn[u] < dfn[v]) std::swap(u, v); ans = std::max(ans, T.query(1, dfn[top[u]], dfn[u])); u = fa[top[u]]; } if (dfn[u] < dfn[v]) std::swap(u, v); ans = std::max(ans, T.query(1, dfn[v] + 1, dfn[u])); return ans;}int main(){ n = read(); for (int i = 1; i < n; i++) { int u, v, d; scanf(\"%d%d%d\", &u, &v, &d); from[i] = u; to[i] = v; add_edge(u, v, d); add_edge(v, u, d); } dep[1] = 1, fa[1] = 0; dfs1(1, 0), dfs2(1, 1); T.build(1, 1, n); std::string op; std::cin >> op; while (op != \"DONE\") { if (op == \"CHANGE\") { int x = read(), val = read(); int u = from[x], v = to[x]; if (fa[v] == u) std::swap(u, v); modify(u, val); } else { int a = read(), b = read(); printf(\"%d\\n\", query_chain(a, b)); } std::cin >> op; } return 0;}","link":"/2019/03/10/洛谷4114/"},{"title":"洛谷 P4396 [AHOI2013]作业","text":"思路:莫队+分块 这题其实跟Gty的二逼妹子序列非常像 把那题代码改改就行了 首先区间问题,可以离线,马上想到莫队 然后发现不会修改?怎么办? 
(好像可以树状数组做,可是我不会o((⊙﹏⊙))o 我们可以把值域分块,这样就可以做到每次查询$O(\\sqrt n)$,修改$O(1)$了 总复杂度$O(m \\sqrt n)$ 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107#include <bits/stdc++.h>#define getpos(x) ((x - 1) / block + 1)#define getblock(x) ((x - 1) * block + 1)const int MaxN = 100010;struct query{ int id, pos; int l, r, a, b;};query q[MaxN];int n, m, size, block;int a[MaxN], cnt[MaxN], sum[MaxN][3], ans[MaxN][3];inline int cmp(query a, query b){ if (a.pos != b.pos) return a.pos < b.pos; return a.r < b.r;}inline int read(){ int x = 0; char ch = getchar(); while (ch > '9' || ch < '0') ch = getchar(); while (ch <= '9' && ch >= '0') x = (x << 1) + (x << 3) + (ch ^ 48), ch = getchar(); return x;}inline void add(int x){ ++cnt[a[x]]; ++sum[getpos(a[x])][1]; if (cnt[a[x]] == 1) ++sum[getpos(a[x])][2];}inline void del(int x){ --cnt[a[x]]; --sum[getpos(a[x])][1]; if (cnt[a[x]] == 0) --sum[getpos(a[x])][2];}inline void ask(int x){ int id = q[x].id, l = q[x].a, r = q[x].b, Posl = getpos(q[x].a), Posr = getpos(q[x].b); for (int i = Posl + 1; i < Posr; i++) ans[id][1] += sum[i][1], ans[id][2] += sum[i][2]; if (Posl == Posr) { for (int i = l; i <= r; i++) { ans[id][1] += cnt[i]; if (cnt[i]) ans[id][2]++; } } else { int L = getblock(Posr), R = getblock(Posl + 1) - 1; for (int i = l; i <= R; i++) { ans[id][1] += cnt[i]; if (cnt[i]) ans[id][2]++; } for (int i = L; i <= r; i++) { ans[id][1] += cnt[i]; if (cnt[i]) ans[id][2]++; } }}inline void solve(){ int l = 1, r = 0; for (int i = 1; i <= m; i++) { while (l > q[i].l) --l, add(l); while (r < q[i].r) ++r, add(r); while (l < q[i].l) del(l), l++; while (r > q[i].r) del(r), r--; ask(i); }}int main(){ n = read(), m = read(); size = pow(n, 0.55), block = sqrt(n); for (int i = 1; i <= n; ++i) a[i] = read(); for (int i = 1; i <= m; i++) { q[i].l = read(), q[i].r = read(); q[i].a = read(), q[i].b = 
read(); q[i].id = i, q[i].pos = (q[i].l - 1) / size + 1; } std::sort(q + 1, q + m + 1, cmp); solve(); for (int i = 1; i <= m; i++) printf(\"%d %d\\n\", ans[i][1], ans[i][2]); return 0;}","link":"/2019/02/06/洛谷4396/"},{"title":"模式识别与机器学习笔记","text":"These are review notes for the Tsinghua University course “Pattern Recognition and Machine Learning” ($\\text{Full Version}$). Evaluation Metric \\begin{aligned} \\text{Accuracy} &= \\frac{\\text{TP+TN}}{\\text{TP+FP+TN+FN}} \\newline \\text{Precision} &= \\frac{\\text{TP}}{\\text{TP+FP}} \\newline \\text{Recall} &= \\text{Sensitivity} = \\frac{\\text{TP}}{\\text{TP+FN}} \\newline \\text{Specificity} &= \\frac{\\text{TN}}{\\text{TN+FP}} \\newline \\text{Type-I Error} &= \\frac{\\text{FP}}{\\text{TN+FP}} = 1 - \\text{Specificity} \\newline \\text{Type-II Error} &= \\frac{\\text{FN}}{\\text{TP+FN}} = 1 - \\text{Sensitivity} \\newline \\end{aligned} k-NN Nearest Neighbor For a new instance $x'$, its class $\\omega'$ can be predicted by: \\omega' = \\omega_i, \\text{ where } i = \\underset{j}{\\arg\\min} \\, \\delta(x', x_j) where $\\delta$ is the distance function. k-Nearest Neighbor For a new instance $x$, define $g_i(x)$ as the number of $x$’s $k$ nearest instances that belong to the class $\\omega_i$. Then the new instance’s class $\\omega'$ can be predicted as: \\omega' = \\omega_j,\\text{ where }j = \\underset{i}{\\arg\\max} \\, g_i(x) k-NN Improvements Branch-and-Bound Algorithm Use a tree structure to reduce the number of distance computations. Edited Nearest Neighbor Delete instances that may be misleading from the training set. Condensed Nearest Neighbor Delete instances that are far away from the decision boundaries. The Curse of Dimensionality Problem Many irrelevant attributes In high-dimensional spaces, most points are almost equally far from each other. 
Solution Dimensionality reduction techniques (e.g. manifold learning) Feature selection Use prior knowledge Linear Regression (Multivariate ver.) For multivariate linear regression, the model becomes $y_i = \\mathbf{w}^{\\rm T}\\mathbf{x}_i$ , where: \\mathbf{x}_i = \\left(1, x_i^1, \\cdots, x_i^{d} \\right)^{\\mathrm T} \\in \\mathbb{R}^{d+1}, \\mathbf{w} = \\left(w_0, w_1, \\cdots, w_d\\right)^{\\mathrm T} \\in \\mathbb{R}^{d+1}, We adjust the values of $\\mathbf{w}$ to find the best-fitting function $f(\\mathbf x) = \\mathbf{w}^{\\rm T}\\mathbf{x}$. We find the best $\\mathbf{w}^{\\star}$ by minimizing the Mean Squared Loss: \\min\\limits_{\\mathbf w} \\ell(\\mathbf w) = \\min\\limits_{\\mathbf w} \\frac{1}{N} \\sum_{i = 1}^N (f(\\mathbf x_i) - y_i)^2 = \\min \\limits_{\\mathbf w} \\frac{1}{N}(\\mathbf {Xw-y})^{\\rm T}(\\mathbf {Xw-y}) where the rows of $\\mathbf X$ are the $\\mathbf x_i^{\\rm T}$. Setting the gradient to zero, $\\mathbf{w}^{\\star} $ must satisfy $\\mathbf {X^{\\rm T}} \\mathbf {Xw^{\\star}} = \\mathbf X^{\\rm T}\\mathbf y$ , so we get $\\mathbf{w^{\\star}} = (\\mathbf {X^{\\rm T}X})^{-1}\\mathbf X^{\\rm T}\\mathbf y$ or, with an $\\ell_2$ penalty, $\\mathbf{w^{\\star}} = (\\mathbf {X^{\\rm T}X} + \\lambda \\mathbf I)^{-1}\\mathbf X^{\\rm T}\\mathbf y$ (Ridge Regression) Linear Discriminant Analysis Project the input vector $\\mathbf x \\in \\mathbb{R}^{d+1}$ down to a 1-dimensional subspace with projection vector $\\mathbf w$, i.e. $y = \\mathbf w^{\\rm T}\\mathbf x$. The problem is how to find a good projection vector. We use Fisher’s Criterion, that is, we maximize a function that represents the difference between the projected class means, normalized by a measure of the within-class scatter. We have the between-class scatter $\\tilde{S}_b = (\\tilde{m}_1 - \\tilde{m}_2)^2$, where $\\tilde{m}_i$ is the mean of the projected samples of the $i$-th class. We also have the within-class scatter $\\tilde{S}_i=\\sum_{y_j \\in \\mathcal{Y}_{i}} (y_j - \\tilde{m}_i)^2$, where $\\mathcal{Y}_i$ is the set of projected samples of class $i$, and the total within-class scatter $\\tilde{S}_w = \\tilde{S}_1+ \\tilde{S}_2$. 
Combining the two expressions, the objective function is $J_F(\\mathbf w) = \\frac{\\tilde{S}_b}{\\tilde{S}_w}$ We have $\\tilde{S}_b = (\\tilde{m}_1 - \\tilde{m}_2)^2 = (\\mathbf w^{\\rm T} \\mathbf m_1 - \\mathbf w^{\\rm T} \\mathbf m_2)^2 = \\mathbf w^{\\rm T} (\\mathbf m_1 - \\mathbf m_2)(\\mathbf m_1 - \\mathbf m_2)^{\\rm T} \\mathbf w = \\mathbf w^{\\rm T} \\mathbf S_b \\mathbf w$, and likewise $\\tilde{S}_w = \\mathbf w^{\\rm T} \\mathbf S_w \\mathbf w$, so we now optimize the objective function $J_F$ w.r.t. $\\mathbf w$: \\max\\limits_{\\mathbf w} J_F(\\mathbf w) = \\max \\limits_ {\\mathbf w} \\frac{\\mathbf w^{\\rm T} \\mathbf S_b \\mathbf w}{\\mathbf w^{\\rm T} \\mathbf S_w \\mathbf w} Using the method of Lagrange multipliers we obtain: $\\lambda \\mathbf w^{\\star} = \\mathbf{S}_w^{-1} (\\mathbf m_1 - \\mathbf m_2)(\\mathbf m_1 - \\mathbf m_2)^{\\rm T}\\mathbf w^{\\star}$. Since we only care about the direction of $\\mathbf w^{\\star}$ and $(\\mathbf m_1 - \\mathbf m_2)^{\\rm T}\\mathbf w^{\\star}$ is a scalar, we obtain $\\mathbf w^{\\star} = \\mathbf{S}_w^{-1} (\\mathbf m_1 - \\mathbf m_2)$ Logistic Regression Logistic regression is a statistical method used for binary classification, which means it is used to predict the probability of one of two possible outcomes. Unlike linear regression, which predicts a continuous output, logistic regression predicts a discrete outcome (0 or 1, yes or no, true or false, etc.). Key Concepts Odds and Log-Odds: Odds: The odds of an event are the ratio of the probability that the event will occur to the probability that it will not occur. \\text{Odds} = \\frac{P(y=1)}{P(y=0)} Log-Odds (Logit): The natural logarithm of the odds. \\text{Log-Odds} = \\log\\left(\\frac{P(y=1)}{P(y=0)}\\right) Logistic Function (Sigmoid Function): The logistic function maps any real-valued number into the range (0, 1), making it suitable for probability predictions. \\sigma(z) = \\frac{1}{1 + e^{-z}} In logistic regression, $ z $ is a linear combination of the input features. 
z = w^T x + b Model Equation: The probability of the positive class (e.g., $ y=1 $) is given by the logistic function applied to the linear combination of the features. P(y=1|x) = \\sigma(w^T x + b) = \\frac{1}{1 + e^{-(w^T x + b)}} The probability of the negative class (e.g., $ y=0 $) is: P(y=0|x) = 1 - P(y=1|x) Decision Boundary: To make a binary decision, we typically use a threshold (commonly 0.5). If $ P(y=1|x) $ is greater than 0.5, we predict the positive class; otherwise, we predict the negative class. Training the Model We use MLE (Maximum Likelihood Estimation) for logistic regression, absorbing the bias $b$ into $\\mathbf w$: \\max_{\\mathbf w} \\prod_{i=1}^{N} \\left[ \\sigma(\\mathbf w^{\\rm T} \\mathbf x_i)^{\\mathbf 1(y_i=1)} \\times (1 - \\sigma(\\mathbf w^{\\rm T} \\mathbf x_i))^{\\mathbf 1(y_i=0)} \\right] Taking the negative logarithm of the likelihood, we obtain the negative log-likelihood objective for logistic regression: \\min_{\\mathbf w} J(\\mathbf w) = \\min\\limits_{\\mathbf w} - \\sum_{i=1}^{N} \\left\\{ y_i \\log \\left( \\frac{e^{\\mathbf w^{\\rm T} \\mathbf x_i}}{1 + e^{\\mathbf w^{\\rm T} \\mathbf x_i}} \\right) + (1 - y_i) \\log \\left( 1 - \\frac{e^{\\mathbf w^{\\rm T} \\mathbf x_i}}{1 + e^{\\mathbf w^{\\rm T} \\mathbf x_i}} \\right) \\right\\} Substituting $y_i \\in \\{0, +1\\}$ with $\\tilde y_i \\in \\{-1, +1\\}$, and noting that $\\sigma(-s) + \\sigma(s) = 1$, we can simplify the previous expression: \\min_{\\mathbf w} J(\\mathbf w) = \\min_{\\mathbf w} \\sum_{i = 1}^N \\log(1 + e^{-\\tilde y_i \\mathbf w ^ {\\rm T}\\mathbf x_i}) This is called the Cross Entropy Loss. Generalization to K-classes The generalized version of logistic regression is called Softmax Regression. 
The probability of an input $x$ being class $k$ is denoted as: P(y = k | x; \\mathbf{W}) = \\frac{e^{\\mathbf w_k^{\\rm T} x}}{\\sum_{i=1}^{K} e^{\\mathbf w_i^{\\rm T} x}} In the multiclass case, the likelihood maximization can be written as: \\max_{\\mathbf w_1, \\mathbf w_2, \\ldots, \\mathbf w_K} \\prod_{i=1}^{N} \\prod_{k=1}^{K} P(y_i = k | x_i; \\mathbf{W})^{\\mathbf 1(y_i = k)} Equivalently, we minimize the negative log-likelihood: \\min\\limits_{\\mathbf{W}} J(\\mathbf{W}) = \\min_{\\mathbf w_1, \\mathbf w_2, \\ldots, \\mathbf w_K} -\\frac{1}{N} \\sum_{i=1}^{N} \\sum_{k=1}^{K} \\mathbf 1(y_i = k) \\cdot \\log \\frac{e^{\\mathbf w_k^{\\rm T} x_i}}{\\sum_{j=1}^{K} e^{\\mathbf w_j^{\\rm T} x_i}} Perceptron We predict based on the sign of the linear output: $y = \\text{sign}(f_{\\mathbf w}(x)) = \\text{sign}(\\mathbf w^{\\rm T}\\mathbf x)$ For the Perceptron the objective loss function is defined as: J_p(\\mathbf{w}) = \\sum_{\\hat{x}_j \\in \\mathcal{X}^k} (-\\mathbf{w}^T \\hat{x}_j) where $\\mathcal{X}^k$ is the misclassified sample set at step $k$, and $\\hat{x}_j = y_j \\mathbf x_j$ is taken to be the label-normalized sample, so that a misclassified sample has $\\mathbf{w}^T \\hat{x}_j < 0$. We can use gradient descent to solve for $\\mathbf w^{\\star}$: \\mathbf{w}_{k+1} = \\mathbf{w}_k - \\rho_k \\nabla J_p(\\mathbf{w}_k) = \\mathbf{w}_k + \\rho_k \\sum_{\\hat{x}_j \\in \\mathcal{X}^k} \\hat{x}_j Support Vector Machine We want the optimal linear separator, that is, the classifier most robust to noisy data, which means the one with the largest margin to the training data. Modeling (For Linearly Separable Problem) We want the margin to be as large as possible: $\\max\\limits_{\\mathbf w, b}\\rho(\\mathbf w, b)$, while all the data points are classified correctly, that is $y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b) \\geq 1$. The distance between two parallel hyperplanes $\\mathbf w^{\\rm T}\\mathbf x + b_1 = 0$ and $\\mathbf w^{\\rm T}\\mathbf x + b_2 = 0$ is $|b_1 - b_2| / ||\\mathbf w||$, and the distance between a point $\\mathbf x_0$ and a hyperplane $(\\mathbf w, b)$ is $|\\mathbf w^{\\rm T} \\mathbf x_0 + b| / ||\\mathbf w||$. 
Choose the points that are closest to the classifier; they satisfy $|\\mathbf w^{\\rm T} \\mathbf x_0 + b| = 1$, so that for the closest points $\\mathbf x_1$ and $\\mathbf x_2$ on either side the margin is $\\rho$ = $|\\mathbf w^{\\rm T} \\mathbf x_1 + b| / ||\\mathbf w|| + |\\mathbf w^{\\rm T} \\mathbf x_2 + b| / ||\\mathbf w|| = 2 / ||\\mathbf w||$. Thus we get the Hard-margin Support Vector Machine: \\max\\limits_{\\mathbf w, b}\\frac{2}{||\\mathbf w||} s.t. $y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b) \\geq 1, 1 \\leq i \\leq n$ For computational convenience, we convert it into \\min\\limits_{\\mathbf w, b}\\frac{1}{2}||\\mathbf w||^2 s.t. $y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b) \\geq 1, 1 \\leq i \\leq n$ Modeling (For Linearly Non-Separable Problem) We add slack variables $\\xi_i$ that allow points to lie on the wrong side of the decision boundary, and we penalize them. So we get the Soft-margin SVM: \\min\\limits_{\\mathbf w, b, \\xi}\\frac{1}{2}||\\mathbf w||^2 + C\\sum_{i=1}^n \\xi_i s.t. $y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b) \\geq 1 - \\xi_i, \\xi_i \\geq 0, 1 \\leq i \\leq n$ Using the hinge loss $\\ell_{\\text{hinge}}(t) = \\max(1-t, 0)$, we have the unconstrained form of the Soft-margin SVM: \\min\\limits_{\\mathbf w, b}\\frac{1}{2}||\\mathbf w||^2 + C\\sum_{i=1}^n \\ell_{\\text{hinge}}(y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b)) Optimization For Training Lagrangian Function & KKT Condition Consider a constrained optimization problem \\min_{x \\in \\mathbb{R}^d} f(x), \\text{ s.t. } g_j(x) \\leq 0, \\forall j = 1, \\dots, J The Lagrangian function $L(x, \\mu)$ is defined as: L(x, \\mu) = f(x) + \\sum_{j = 1}^J \\mu_j g_j(x) We have the KKT conditions (necessary conditions) at an optimum $x^*$: for $1 \\leq j \\leq J$ Primal feasibility: $g_j(x^*) \\leq 0$ Dual feasibility: $\\mu_j \\geq 0$ Complementary slackness: $\\mu_j g_j(x^*) = 0$ Lagrangian optimality: $\\nabla_x L(x^*, \\mu) = 0$ Dual Problem For Soft-margin SVM For the Soft-margin Support Vector Machine: \\min\\limits_{\\mathbf w, b, \\xi}\\frac{1}{2}||\\mathbf w||^2 + C\\sum_{i=1}^n \\xi_i s.t. 
$y_i \\cdot (\\mathbf w^{\\rm T}\\mathbf x_i + b) \\geq 1 - \\xi_i, \\xi_i \\geq 0, 1 \\leq i \\leq n$ We have the Lagrangian function (with $2n$ inequality constraints): L(\\mathbf{w}, b, \\alpha, \\xi, \\mu) = \\frac{1}{2} \\|\\mathbf{w}\\|_2^2 + C \\sum_{i=1}^{n} \\xi_i + \\sum_{i=1}^{n} \\alpha_i [1 - \\xi_i - y_i (\\mathbf{w}^T \\mathbf{x}_i + b)] - \\sum_{i=1}^{n} \\mu_i \\xi_i s.t. $\\alpha_i \\geq 0, \\mu_i \\geq 0, \\, i = 1, \\ldots, n$. Take the partial derivatives of the Lagrangian w.r.t. $\\mathbf w, b, \\xi_i$ and set them to zero: \\begin{aligned} \\frac{\\partial L}{\\partial \\mathbf{w}} &= 0 \\implies \\mathbf{w} = \\sum_{i=1}^{n} \\alpha_i y_i \\mathbf{x}_i \\\\ \\frac{\\partial L}{\\partial b} &= 0 \\implies \\sum_{i=1}^{n} \\alpha_i y_i = 0 \\\\ \\frac{\\partial L}{\\partial \\xi_i} &= 0 \\implies C = \\alpha_i + \\mu_i, \\, i = 1, \\cdots, n \\\\ \\end{aligned} Substituting these back, we get: L(\\mathbf{w}, b, \\alpha, \\xi, \\mu) = \\frac{1}{2} \\|\\mathbf{w}\\|_2^2 + C \\sum_{i=1}^{n} \\xi_i + \\sum_{i=1}^{n} \\alpha_i [1 - \\xi_i - y_i (\\mathbf{w}^T \\mathbf{x}_i + b)] - \\sum_{i=1}^{n} \\mu_i \\xi_i = \\frac{1}{2} \\mathbf{w}^T \\mathbf{w} + \\sum_{i=1}^{n} \\xi_i (C - \\alpha_i - \\mu_i) + \\sum_{i=1}^{n} \\alpha_i - \\sum_{i=1}^{n} \\alpha_i \\cdot y_i \\cdot \\mathbf{w}^T \\mathbf{x}_i - b \\sum_{i=1}^{n} \\alpha_i \\cdot y_i = \\frac{1}{2} \\left( \\sum_{i=1}^{n} \\alpha_i y_i \\mathbf{x}_i \\right)^T \\left( \\sum_{j=1}^{n} \\alpha_j y_j \\mathbf{x}_j \\right) + 0 + \\sum_{i=1}^{n} \\alpha_i - \\sum_{i=1}^{n} \\alpha_i \\cdot y_i \\cdot \\left( \\sum_{j=1}^{n} \\alpha_j y_j \\mathbf{x}_j \\right)^T \\mathbf{x}_i + 0 = \\frac{1}{2} \\left( \\sum_{i=1}^{n} \\sum_{j=1}^{n} \\alpha_i \\alpha_j y_i y_j \\mathbf{x}_i^T \\mathbf{x}_j \\right) + \\sum_{i=1}^{n} \\alpha_i - \\left( \\sum_{i=1}^{n} \\sum_{j=1}^{n} \\alpha_i \\alpha_j y_i y_j \\mathbf{x}_i^T \\mathbf{x}_j \\right) = \\sum_{i=1}^{n} \\alpha_i - \\frac{1}{2} \\left( \\sum_{i=1}^{n} \\sum_{j=1}^{n} \\alpha_i \\alpha_j y_i y_j 
\\mathbf{x}_i^T \\mathbf{x}_j \\right) So we have the Dual Problem of the Soft-margin SVM: \\max_{\\alpha} \\sum_{i=1}^{n} \\alpha_i - \\frac{1}{2} \\sum_{i=1}^{n} \\sum_{j=1}^{n} \\alpha_i \\alpha_j y_i y_j \\mathbf{x}_i^T \\mathbf{x}_j s.t. $\\sum_{i=1}^{n} \\alpha_i y_i = 0, \\quad 0 \\leq \\alpha_i \\leq C, \\, i = 1, \\ldots, n.$ After solving for $\\alpha$, we can get $\\mathbf{w} = \\sum_{j=1}^n\\alpha_j y_j \\mathbf x_j$, and $b$ from any support vector with $0 < \\alpha_j < C$, for which $y_j (\\mathbf{w}^T \\mathbf{x}_j + b) = 1$. Kernel Method for SVM A linear SVM cannot handle linearly non-separable data. So we need to map the original feature space to a higher-dimensional feature space where the training set is separable. In principle we could set $x \\to \\phi(x)$, but computing $\\phi(x_i) \\cdot \\phi(x_j)$ explicitly is expensive, so we use the kernel trick, that is, we find a function $k(x_i, x_j) = \\phi(x_i) \\cdot \\phi(x_j)$ that can be evaluated directly in the original space. Some commonly used kernels: Linear Kernel: k(\\mathbf{x}, \\mathbf{x}_i) = (\\mathbf{x} \\cdot \\mathbf{x}_i) Polynomial Kernel: k(\\mathbf{x}, \\mathbf{x}_i) = [(\\mathbf{x} \\cdot \\mathbf{x}_i) + 1]^q Radial Basis Function Kernel (a.k.a. RBF kernel, Gaussian kernel): k(\\mathbf{x}, \\mathbf{x}_i) = \\exp \\left( -\\frac{\\|\\mathbf{x} - \\mathbf{x}_i\\|^2}{2\\sigma^2} \\right) Sigmoid Kernel: k(\\mathbf{x}, \\mathbf{x}_i) = \\tanh (v(\\mathbf{x} \\cdot \\mathbf{x}_i) + c) The kernel trick can also be applied to other algorithms, such as k-NN, LDA, etc. Decision Tree We use a tree-like structure to deal with categorical features. For each node, we find the most useful feature, that is, the feature that best divides the data at that node. ID3 Algorithm We use entropy as the criterion: H(D) = -\\sum_{k=1}^K \\frac{|C_k|}{|D|} \\log \\frac{|C_k|}{|D|} where $C_k$ is the set of samples in $D$ that belong to class $k$. A good split gives minimal weighted average entropy of the child nodes: \\frac{|D_1|}{|D|}H(D_1) + \\frac{|D_2|}{|D|}H(D_2) For any split, the entropy of the parent node is constant. 
Minimizing the weighted entropy of the child nodes is equivalent to maximizing the information gain (IG): H(D) - \\frac{|D_1|}{|D|}H(D_1) - \\frac{|D_2|}{|D|}H(D_2) C4.5 Algorithm Information Gain is highly biased towards multivalued features. So we use the Information Gain Ratio (GR) to choose the optimal feature: \\text{GR} = \\frac{\\text{Information Gain}}{\\text{Intrinsic Value}} The Intrinsic Value (IV) penalizes multivalued features. For a selected feature $f$, its Intrinsic Value is: IV(f) = -\\sum_{k=1}^{|V|}\\frac{|F_k|}{|D|} \\log \\frac{|F_k|}{|D|} where $V$ is the set of all possible values of the feature $f$, and $F_k$ is the subset of $D$ where the value of the feature $f$ is the $k$-th value in $V$. Features with many possible values tend to have a large Intrinsic Value. Classification and Regression Tree (CART) A CART tree must be a binary tree. Regression Tree How do we divide the regions $R = \\{R_1, \\dots, R_m\\}$ and decide the values $V = \\{v_1, \\dots, v_m\\}$? We minimize the squared error over all examples $x_i$ with label $y_i$: \\min_{R, V} l = \\min_{R, V} \\sum_{j = 1}^m \\sum_{x_i \\in R_j} (y_i - v_j)^2 Assume that $R$ has been determined and first find the optimal $V$. For a given region $R_j$, the value $v_j$ that minimizes the loss is the average of the labels of all samples belonging to region $R_j$: v_j = \\frac{1}{|R_j|} \\sum_{x_i \\in R_j} y_i Now for each feature $A$ and split threshold $a$, the parent node $R$ is split by $(A, a)$ into $R_1$ and $R_2$. We choose $(A, a)$ over all possible values to minimize: l(A, a) = \\sum_{x_i \\in R_1} (y_i - v_1(A, a))^2 + \\sum_{x_i \\in R_2} (y_i - v_2(A, a))^2 where $v_1(A, a)$ and $v_2(A, a)$ are the region averages described above. 
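The threshold search for a regression-tree split can be sketched as follows. This is a minimal sketch for a single numeric feature, not the course's reference implementation: the names SplitResult, sse and best_split are hypothetical, and candidate thresholds are assumed to be midpoints between consecutive distinct feature values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical result type: the chosen threshold a and the loss l(A, a).
struct SplitResult {
    double threshold; // left region R_1 holds samples with x <= threshold
    double loss;      // squared error of R_1 plus squared error of R_2
};

// Sum of squared deviations from the mean, from count, sum and sum of squares.
static double sse(double cnt, double sum, double sumsq) {
    if (cnt == 0) return 0.0;
    return sumsq - sum * sum / cnt;
}

// data holds (x_i, y_i) pairs for one feature A. Each region predicts its mean
// label, so the loss of a split only needs running sums on both sides.
SplitResult best_split(std::vector<std::pair<double, double>> data) {
    assert(data.size() >= 2);
    std::sort(data.begin(), data.end());
    double total_sum = 0.0, total_sq = 0.0;
    for (const auto &p : data) {
        total_sum += p.second;
        total_sq += p.second * p.second;
    }
    SplitResult best{0.0, std::numeric_limits<double>::infinity()};
    double left_sum = 0.0, left_sq = 0.0;
    for (std::size_t i = 0; i + 1 < data.size(); ++i) {
        left_sum += data[i].second;
        left_sq += data[i].second * data[i].second;
        if (data[i].first == data[i + 1].first) continue; // equal values cannot be separated
        double nl = double(i + 1), nr = double(data.size() - i - 1);
        double loss = sse(nl, left_sum, left_sq) +
                      sse(nr, total_sum - left_sum, total_sq - left_sq);
        if (loss < best.loss) best = {(data[i].first + data[i + 1].first) / 2, loss};
    }
    return best;
}
```

For the four samples (1, 1), (2, 1), (3, 5), (4, 5) the only zero-loss split separates the two label groups, so the chosen threshold is 2.5. A full CART builder would repeat this search over every feature and recurse on both regions.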
Classification Tree The split criterion is now the Gini Index: \\text{Gini}(D) = 1 - \\sum_{k = 1}^K \\left(\\frac{|C_k|}{|D|}\\right)^2 We choose the feature $A$ and the threshold $a$ over all possible values with the maximal gain \\text{Gini}(D) - \\frac{|D_1|}{|D|} \\text{Gini}(D_1) - \\frac{|D_2|}{|D|} \\text{Gini}(D_2) Ensemble Learning Reduce the randomness (variance) by combining multiple learners. Bagging (Bootstrap Aggregating) Create $M$ bootstrap datasets Train a learner on each dataset Ensemble the $M$ learners Each bootstrap dataset is sampled uniformly from the original data $D$ with replacement and has the same size $n$ as $D$, so the probability that a given sample does not appear in it is (1-\\frac{1}{n})^n \\approx \\frac{1}{e} \\approx 0.37 We use the elements that appear in $D$ but not in the bootstrap dataset as the validation set (the out-of-bag dataset). Random Forest Ensemble decision trees (training data with $d$ features) Create bootstrap datasets During tree construction, randomly sample $K (K<d)$ features as candidates for each split. (Usually choose $K = \\sqrt d$) Random feature selection makes the trees less correlated and more diverse. Boosting Boosting: Sequentially train learners. The current weak learner focuses more on the examples that previous weak learners misclassified. Weak classifiers $h_1, \\cdots, h_m$ are built sequentially. $h_m$ outputs ‘$+1$’ for one class and ‘$-1$’ for the other class. Classify by $g(x) = \\text{sgn}(\\sum_m \\alpha_m h_m(x))$ AdaBoost Core idea: give higher weights to the misclassified examples so that half of the total training weight falls on the examples the previous learner classified incorrectly. 
(re-weighting) Mathematical Formulation: Weighted Error: \\epsilon_t = \\sum_{i=1}^n w_i \\cdot \\mathbf 1(y_i \\neq h_t(x_i)) Alpha Calculation: \\alpha_t = \\frac{1}{2} \\ln \\left( \\frac{1 - \\epsilon_t}{\\epsilon_t} \\right) Weight Update: w_i \\leftarrow \\frac{w_i \\exp(-\\alpha_t y_i h_t(x_i))}{Z_t}, where $Z_t$ normalizes the weights to sum to $1$ Final Hypothesis: H(x) = \\text{sign} \\left( \\sum_{t=1}^T \\alpha_t h_t(x) \\right) Gradient Boosting View boosting as an optimization problem. The criterion is to minimize the empirical loss: \\arg \\min_{(\\alpha_1, \\ldots, \\alpha_t, h_1, \\ldots, h_t)} \\sum_{i=1}^{n} l \\left( y_i, \\sum_{s=1}^{t} \\alpha_s h_s(x_i) \\right) The loss function $l$ depends on the task: Cross entropy for multi-class classification $\\text{L2}$ loss for regression We use sequential training: optimize a single model at a time, that is, freeze $h_1, \\cdots, h_{t-1}$ and optimize $h_t$. (Let $f_{t-1}(x) = \\sum_{s=1}^{t-1} \\alpha_s h_s(x)$, denoting the ensemble of $t-1$ learners.) To see how to choose $\\alpha_t$ and $h_t$, we define: u = (f_{t-1}(x_1), \\cdots, f_{t-1}(x_n)) \\\\ \\Delta u = (h_t(x_1), \\cdots, h_t(x_n)) Consider the function $F(u) = \\sum_{i=1}^n l(y_i, u_i)$; then the original objective is equivalent to finding a direction $\\Delta u$ and a step size $\\alpha_t$ at the point $u$ to minimize: F(u + \\alpha_t \\Delta u) = \\sum_{i=1}^n l(y_i, u_i + \\alpha_t \\Delta u_i) Following Gradient Descent, we let $\\Delta u = -\\nabla_u F(u)$, thus h_t(x_i) = -\\frac{\\partial F(u)}{\\partial u_i} = -\\left[ \\frac{\\partial l(y_i, u_i)}{\\partial u_i} \\right]_{u_i = f_{t-1}(x_i)} Then how do we decide $\\alpha_t$? Use a one-dimensional search ($y_i, x_i, f_{t-1}, h_t$ are fixed): \\alpha_t = \\arg\\min_{\\alpha} \\sum_{i=1}^{n} l(y_i, f_{t-1}(x_i) + \\alpha h_t(x_i)) For simplicity, the search for the optimal multiplier can be replaced by setting it to a constant. In conclusion, Gradient Boosting = Gradient Descent + Boosting. 
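The gradient-boosting loop can be sketched for the squared loss l(y, u) = (y - u)^2 / 2, whose negative gradient is simply the residual y - u. This is a minimal sketch under stated assumptions, not a reference implementation: the weak learner is a hypothetical one-dimensional regression stump (Stump, fit_stump), and the step size alpha is a fixed constant rather than the result of a line search.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical weak learner: predicts left_value if x <= threshold, else right_value.
struct Stump {
    double threshold, left_value, right_value;
    double predict(double x) const { return x <= threshold ? left_value : right_value; }
};

// Least-squares stump fitted to the targets r (here: the negative gradients).
// Candidate thresholds are the sample positions themselves.
Stump fit_stump(const std::vector<double> &x, const std::vector<double> &r) {
    Stump best{0.0, 0.0, 0.0};
    double best_loss = std::numeric_limits<double>::infinity();
    for (double t : x) {
        double sl = 0.0, sr = 0.0;
        int nl = 0, nr = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            if (x[i] <= t) { sl += r[i]; ++nl; } else { sr += r[i]; ++nr; }
        }
        double vl = nl ? sl / nl : 0.0, vr = nr ? sr / nr : 0.0;
        double loss = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            double d = r[i] - (x[i] <= t ? vl : vr);
            loss += d * d;
        }
        if (loss < best_loss) { best_loss = loss; best = {t, vl, vr}; }
    }
    return best;
}

// Each round freezes f_{t-1}, fits h_t to the residuals y_i - f_{t-1}(x_i),
// and moves the ensemble output by alpha * h_t.
std::vector<Stump> boost(const std::vector<double> &x, const std::vector<double> &y,
                         int rounds, double alpha) {
    assert(x.size() == y.size());
    std::vector<double> u(x.size(), 0.0); // u_i = f_{t-1}(x_i), starting from f_0 = 0
    std::vector<Stump> model;
    for (int t = 0; t < rounds; ++t) {
        std::vector<double> residual(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) residual[i] = y[i] - u[i];
        Stump h = fit_stump(x, residual);
        for (std::size_t i = 0; i < x.size(); ++i) u[i] += alpha * h.predict(x[i]);
        model.push_back(h);
    }
    return model;
}

// Ensemble prediction: the sum of alpha * h_s(x) over all fitted stumps.
double predict(const std::vector<Stump> &model, double alpha, double x) {
    double s = 0.0;
    for (const Stump &h : model) s += alpha * h.predict(x);
    return s;
}
```

With alpha = 0.5 every round halves the remaining residual on data that a single stump can fit exactly, which shows how a constant step trades more rounds for smaller, more conservative updates.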
Learning Theory Empirical Risk Minimization (ERM) Empirical Risk: The average loss of the model $f$ on the training set $\\mathcal D = \\{(x_i, y_i)\\}^N_{i=1}$ \\hat{R}(f) = \\frac{1}{N} \\sum_{i = 1}^N \\ell(f(x_i), y_i) Empirical Risk Minimization (ERM): The learning algorithm selects the model that minimizes the empirical risk on the training dataset. \\mathcal A(\\mathcal D, \\mathcal H) = \\arg \\min_{f \\in \\mathcal H} \\hat R(f) The Consistency of Learning Process We say a learning process is consistent if the minimizer of the empirical risk, in the infinite data limit, converges to the minimum expected risk. Overfitting and Bias-Variance Trade-off Define the Population Loss (also called Expected Risk) as R(f) = \\mathbb E_{(x, y) \\sim \\mu} \\ell(f(x), y) and define the Generalization Gap as: $R(f) - \\hat R(f)$ There are two important concepts for a predictive model Bias: Stems from the assumptions made about the target model; it represents the extent to which the average prediction over all datasets differs from the desired function. Variance: The extent to which the model changes when the training data changes (can be understood as “stability” under dataset change). Bias-Variance Trade-off : There is an intrinsic tension between bias and variance. The model’s test error contains the sum of both. Bias-Variance Decomposition : Suppose the ground truth function is $f^*$, the data distribution is $\\mu$, the algorithm $\\mathcal{A}$ learns from hypothesis space $\\mathcal{H}$. 
We use $y(x; \\mathcal{D}) = \\mathcal{A}(\\mathcal{D}, \\mathcal{H})(x)$ to denote the output of the ERM model $\\hat{f} = \\mathcal{A}(\\mathcal{D}, \\mathcal{H})$ on input $x$. We are interested in the learned model’s prediction error on any $x$, namely [y(x; \\mathcal{D}) - f^*(x)]^2 = \\{y(x; \\mathcal{D}) - \\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})] + \\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})] - f^*(x)\\}^2 = \\{y(x; \\mathcal{D}) - \\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})]\\}^2 + \\{\\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})] - f^*(x)\\}^2 + 2\\{y(x; \\mathcal{D}) - \\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})]\\}\\{\\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})] - f^*(x)\\} Taking the expectation over all possible datasets $\\mathcal{D}$, the cross term vanishes, so \\mathbb{E}_{\\mathcal{D}}\\{[y(x; \\mathcal{D}) - f^*(x)]^2\\} = \\{\\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})] - f^*(x)\\}^2 + \\mathbb{E}_{\\mathcal{D}}[\\{y(x; \\mathcal{D}) - \\mathbb{E}_{\\mathcal{D}}[y(x; \\mathcal{D})]\\}^2] = (\\text{bias})^2 + \\text{variance} Regularization refers to techniques that are used to calibrate machine learning models in order to prevent overfitting; it picks a small subset of solutions that are more regular (punishing the parameters for behaving abnormally) to reduce the variance. Generalization Error and Regularization VC dimension The VC dimension is a measure of complexity for a hypothesis class: it is the largest integer $d$ for a binary classification hypothesis class $\\mathcal H$ such that there exist $d$ points in the input space $\\mathcal X$ that can be perfectly classified by some function $h \\in \\mathcal H$ no matter how you assign labels to these $d$ points. The VC dimension characterizes the model class’s capacity for fitting random labels. 
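The definition of the VC dimension can be checked by brute force on a toy hypothesis class. The sketch below is an illustration under stated assumptions: the class is taken to be closed intervals on the real line (a point is labeled positive if it lies inside the interval), which is not discussed in the notes, and the function names realizable and shattered are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Can some interval [a, b] label exactly the points whose mask bit is 1 as positive?
// It is enough to try the empty interval and intervals with endpoints at sample points.
bool realizable(const std::vector<double> &pts, unsigned mask) {
    if (mask == 0) return true; // an empty interval labels every point negative
    for (double a : pts) {
        for (double b : pts) {
            bool ok = true;
            for (std::size_t i = 0; i < pts.size() && ok; ++i) {
                bool inside = (a <= pts[i] && pts[i] <= b);
                bool positive = ((mask >> i) & 1u) != 0;
                ok = (inside == positive);
            }
            if (ok) return true;
        }
    }
    return false;
}

// A point set is shattered if every one of its 2^d labelings is realizable.
bool shattered(const std::vector<double> &pts) {
    assert(pts.size() < 31);
    for (unsigned mask = 0; mask < (1u << pts.size()); ++mask) {
        if (!realizable(pts, mask)) return false;
    }
    return true;
}
```

Two distinct points are shattered by intervals, but three are not, because the labeling positive, negative, positive would need an interval with a hole in it. For this assumed class the VC dimension is therefore 2.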
Generalization Error BoundIf a hypothesis class $\\mathcal{H}$ has VC dimension $d_{vc}$, a theorem states that, with probability at least $1 - \\delta$ over the draw of $m$ samples, the generalization gap of every model $h \\in \\mathcal{H}$ is bounded as R(h) \\leq \\hat{R}(h) + \\sqrt{\\frac{8d_{vc} \\ln\\left(\\frac{2em}{d_{vc}}\\right) + 8 \\ln\\left(\\frac{4}{\\delta}\\right)}{m}} Bayesian DecisionBayesian Decision: Find an optimal classifier according to the prior probability and the class-conditional probability density of the feature. The a priori or prior probability reflects how likely we expect a certain state of nature to be before we actually observe it. The class-conditional probability density function is the probability density function $P(x|\\omega)$ of our feature $x$, given that the state/class is $\\omega$. Posterior Probability is the probability of a certain state/class given our observable feature $x$: $P(\\omega | x)$. Minimum Prediction Error Principle: The optimal classifier $f(\\cdot)$ should minimize the expected prediction error, defined as P(\\text{error}) = \\int \\sum_{\\omega_j \\neq f(x)} P(x, \\omega_j) \\, dx So, for each $x$, we want f(x) = \\arg\\min_{\\omega_i} \\sum_{\\omega_j \\neq \\omega_i} P(x, \\omega_j) = \\arg\\min_{\\omega_i} [P(x) - P(x, \\omega_i)], that is, f(x) = \\arg\\max_{\\omega_i} P(x, \\omega_i) = \\arg\\max_{\\omega_i} P(\\omega_i | x) Therefore, the classifier just needs to pick the class with the largest posterior probability. We could use a decision threshold $\\theta$ for deciding. We can also decline to make decisions on difficult cases where we anticipate a high error rate. Density estimationWe need a method to estimate the distribution of each feature; this is called density estimation. 
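The decision rule above (pick the class with the largest posterior) can be sketched as follows. This is a minimal example that assumes one-dimensional Gaussian class-conditional densities; the priors and the (mean, standard deviation) pairs are hypothetical values chosen for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    # class-conditional density P(x | w) under the Gaussian assumption
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def bayes_decide(x, priors, params):
    # joint P(x, w_i) = P(x | w_i) P(w_i); the posterior is proportional to it
    joints = [p * gaussian_pdf(x, mu, s) for p, (mu, s) in zip(priors, params)]
    evidence = sum(joints)                      # P(x)
    posteriors = [j / evidence for j in joints]
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    return best, posteriors

priors = [0.5, 0.5]                  # hypothetical class priors
params = [(0.0, 1.0), (2.0, 1.0)]    # hypothetical (mean, std) per class
label, post = bayes_decide(0.3, priors, params)
print(label, post)
```

Dividing by the evidence is not needed for the argmax itself; it is kept here so that the returned values are proper posterior probabilities.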
Parametric Density Estimation MethodWe can assume that the density function follows some form, for example: P(x|\\omega_i) = \\frac{1}{\\sqrt{2\\pi}\\sigma_i}e^{-\\frac{(x-\\mu_i)^2}{2\\sigma_i^2}} The unknowns $\\theta_i = (\\mu_i, \\sigma_i)$ are called the parameters. Maximum Likelihood Estimation (MLE)Likelihood Function: $p(x|\\theta)$ measures the likelihood of a parametrized distribution to generate a sample $x$. Max Likelihood Estimation (MLE): Choose the parameter $\\theta$ that maximizes the likelihood function for all the samples. For example, if we use a Gaussian to model $X = \\{x_i\\}_{i=1}^N$, MLE gives the result as \\mu, \\sigma = \\arg\\max_{\\mu, \\sigma} \\prod_{i=1}^{N} \\frac{1}{\\sqrt{2\\pi}\\sigma} e^{-\\frac{(x_i - \\mu)^2}{2\\sigma^2}} For the sake of simplicity, denote $H(\\theta) = \\ln p(X|\\theta) = \\sum_{i=1}^{N} \\ln p(x_i|\\theta)$. Then \\frac{dH}{d\\mu} = 0 \\implies \\sum_{i=1}^{N} \\frac{1}{\\sigma^2} (x_i - \\mu) = 0 \\implies \\mu = \\frac{1}{N} \\sum_{i=1}^{N} x_i, \\frac{dH}{d\\sigma} = 0 \\implies -\\sum_{i=1}^{N} \\frac{1}{\\sigma} + \\sum_{i=1}^{N} \\frac{(x_i - \\mu)^2}{\\sigma^3} = 0 \\implies \\sigma^2 = \\frac{1}{N} \\sum_{i=1}^{N} (x_i - \\mu)^2. Non-parametric Density Estimation MethodA non-parametric method makes few assumptions about the form of the distribution and does not involve any parameter describing the form of the density function. Suppose we sample $N$ data points in total, of which $K$ fall within a small region $R$ of volume $V$. Each data point is sampled identically and independently. For each sample, whether it belongs to $R$ follows a Bernoulli distribution with parameter $P_R$. We have $p(x) \\approx \\frac{P_R}{V} \\approx \\frac{K}{NV}$. We could apply kernel methods to it. Hidden Markov Models (HMMs)Understanding Bayes’ Rule: p(H|E)=\\frac{p(E|H)P(H)}{P(E)} Prior $P(H)$: How probable was our hypothesis before observing the evidence? Likelihood $p(E|H)$: How probable is the evidence given that our hypothesis is true? Marginal $P(E)$: How probable is the new evidence? 
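Returning briefly to the two density estimation approaches described earlier, the sketch below computes the closed-form Gaussian MLE (sample mean, and variance with a 1/N factor) and the counting estimate K / (N V) for a one-dimensional interval. The data values, the interval center and the interval width are made up for illustration.

```python
import math

def gaussian_mle(xs):
    # closed-form maximum likelihood estimates for a 1-D Gaussian
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n   # MLE divides by N, not N - 1
    return mu, math.sqrt(var)

def region_density(xs, center, width):
    # p(x) ~ K / (N V), where the region R is an interval of length V = width
    k = sum(1 for x in xs if abs(x - center) <= width / 2)
    return k / (len(xs) * width)

data = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical samples
mu, sigma = gaussian_mle(data)
density = region_density(data, center=3.0, width=2.0)
print(mu, sigma, density)
```

The counting estimate depends strongly on the choice of region size; kernel methods smooth this choice by weighting points according to their distance from the query point.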
Notation Explanation $Q = \\{q_1, \\ldots, q_n\\}$: The set of $n$ hidden states. $V = \\{v_1, \\ldots, v_v\\}$: The set of all possible observed values. $A = [a_{ij}]_{n \\times n}$: Transition matrix. $a_{ij}$ is the probability of transitioning from state $i$ to state $j$. $\\sum_{j=1}^n a_{ij} = 1 \\, \\forall i$. $O = o_1 o_2 \\cdots o_L$: Observed sequence. $o_t \\in V$. $x = x_1 x_2 \\cdots x_L$: Hidden state sequence. $x_t \\in Q$. $E = [e_{ij}]_{n \\times v}$: Emission probability matrix. $e_{ij} = P(o = v_j \\mid x = q_i)$ is the probability of observing $v_j$ at state $q_i$, also written $e_i(v_j)$. $\\sum_{j=1}^v e_{ij} = 1 \\, \\forall i$. $\\pi = [\\pi_1, \\pi_2, \\ldots, \\pi_n]$: Start probability distribution. $\\pi_i$ is the probability of the Markov chain starting from state $i$. $\\sum_{i=1}^n \\pi_i = 1$. Question #1 – EvaluationThe evaluation problem in HMM: Given a model $M$ and an observed sequence $O$, calculate the probability of the observed sequence $P(O|M)$. Forward AlgorithmDenote $\\alpha_t(j)$ as the probability of observing $o_1 o_2 \\ldots o_t$ and the hidden state at $t$ being $q_j$: \\alpha_t(j) = p(o_1 o_2 \\ldots o_t, x_t = q_j) By marginalizing over the previous hidden state, $\\alpha_t(j)$ can be rewritten as: \\alpha_t(j) = e_j(o_t) \\times \\sum_{i=1}^{n} \\alpha_{t-1}(i) a_{ij} Define Initial Values: \\alpha_1(j) = e_j(o_1) \\times \\pi_j, \\quad j = 1, \\cdots, n Iterative solving: \\alpha_t(j) = e_j(o_t) \\times \\sum_{i=1}^{n} \\alpha_{t-1}(i) a_{ij}, \\quad t = 2:L Obtaining results: p(O) = \\sum_{i=1}^{n} \\alpha_L(i) Backward AlgorithmDenote $\\beta_t(j)$ as the probability of observing $o_{t+1} o_{t+2} \\ldots o_L$ given that the hidden state at $t$ is $q_j$: \\beta_t(j) = p(o_{t+1} o_{t+2} \\ldots o_L \\mid x_t = q_j) By marginalizing over the next hidden state, $\\beta_t(j)$ can be rewritten as: \\beta_t(j) = \\sum_{i=1}^{n} a_{ji} e_i(o_{t+1}) \\beta_{t+1}(i) Define Initial Values: \\beta_L(j) = 1, \\quad j = 1:n \\quad (L + 1 \\text{ is terminal state}) Iterative solving: \\beta_t(j) = \\sum_{i=1}^{n} a_{ji} e_i(o_{t+1}) 
\\beta_{t+1}(i), \\quad t = L-1, \\ldots, 1, \\quad j = 1:n Obtaining results: p(O) = \\sum_{i=1}^{n} \\pi_i e_i(o_1) \\beta_1(i) Question #2 – DecodingThe decoding problem in HMM: Given a model $M$ and an observed sequence $O$, calculate the most probable hidden state sequence $\\mathbf{x}^* = \\arg\\max_{\\mathbf{x}} p(\\mathbf{x}, O | M)$. Define: v_t(j) = \\max_{x_1 \\ldots x_{t-1}} p(x_1 \\ldots x_{t-1}, o_1 \\ldots o_t, x_t = q_j) According to the recurrence relation, rewrite the above as: v_t(j) = \\max_{i=1}^n v_{t-1}(i) a_{ij} e_j(o_t) To recover the most probable hidden state sequence, record the best predecessor of each state as a back-pointer: pa_t(j) = \\arg\\max_{i=1}^n v_{t-1}(i) a_{ij} e_j(o_t) Viterbi Algorithm Define Initial Values: v_1(j) = e_j(o_1) \\times \\pi_j, \\quad pa_1(j) = 0, \\quad j = 1:n Iterative solving (for $t = 2:L$): v_t(j) = \\max_{i=1}^n v_{t-1}(i) a_{ij} e_j(o_t), \\quad pa_t(j) = \\arg\\max_{i=1}^n v_{t-1}(i) a_{ij} e_j(o_t) Obtaining results: p^* = \\max_{i=1:n} v_L(i), \\quad x^*_L = \\arg\\max_{i=1:n} v_L(i), then backtrack with $x^*_{t-1} = pa_t(x^*_t)$ for $t = L, \\ldots, 2$. Computational Complexity: $O(n^2 L)$ Question #3 – LearningThe learning problem in HMM: Given an observed sequence $O$, estimate the parameters of the model by maximum likelihood: $M = \\arg \\max \\limits_{M}P(O|M)$ For simplicity, in the following steps we only present the learning process of the transition matrix $A$. (The other parameters can be learned in a similar manner.) Baum-Welch Algorithm (a special case of the EM algorithm) Expectation Step (E-step): Using the observed data, we estimate (guess) the values of the missing data with the current parameters $\\theta_{\\text{old}}$. Maximization Step (M-step): Using the complete data generated after the E-step, we update the parameters of the model. E-step (#$T_{ij}$ denotes the number of times the hidden state transitions from $q_i$ to $q_j$) Generate the guesses of #$T_{ij}$, i.e., the expected counts: \\text{Expected Counts} = \\sum_{t=1}^{L-1} p(x_t = q_i, x_{t+1} = q_j \\mid O, \\theta_{\\text{old}}) Each term equals $\\alpha_t(i) a_{ij} e_j(o_{t+1}) \\beta_{t+1}(j) / p(O)$, so it can be computed with the Forward Algorithm and Backward Algorithm. 
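A minimal sketch of how these expected counts can be obtained: run the forward and backward recursions, then accumulate alpha_t(i) * a_ij * e_j(o_{t+1}) * beta_{t+1}(j) / p(O) over t. States and observations are encoded as 0-based integer indices, and the two-state model below is a hypothetical example, not taken from the text.

```python
def forward(pi, A, E, obs):
    # alpha[t][j] = p(o_1..o_t, x_t = q_j), with 0-based t
    n = len(pi)
    alpha = [[pi[j] * E[j][obs[0]] for j in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([E[j][obs[t]] * sum(prev[i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha

def backward(A, E, obs, n):
    # beta[t][j] = p(o_{t+1}..o_L | x_t = q_j); the last row is all ones
    L = len(obs)
    beta = [[1.0] * n for _ in range(L)]
    for t in range(L - 2, -1, -1):
        for j in range(n):
            beta[t][j] = sum(A[j][i] * E[i][obs[t + 1]] * beta[t + 1][i]
                             for i in range(n))
    return beta

def expected_transitions(pi, A, E, obs):
    # expected number of q_i -> q_j transitions given the observations
    n = len(pi)
    alpha = forward(pi, A, E, obs)
    beta = backward(A, E, obs, n)
    p_obs = sum(alpha[-1])
    counts = [[0.0] * n for _ in range(n)]
    for t in range(len(obs) - 1):
        for i in range(n):
            for j in range(n):
                counts[i][j] += (alpha[t][i] * A[i][j] * E[j][obs[t + 1]]
                                 * beta[t + 1][j]) / p_obs
    return counts, p_obs

# hypothetical two-state model with two observation symbols
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
E = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1]
counts, p_obs = expected_transitions(pi, A, E, obs)
print(counts, p_obs)
```

For long sequences the products underflow, so practical implementations usually rescale alpha and beta at each step or work in log space; that is omitted here to keep the sketch close to the formulas.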
M-stepGenerate new estimations with the expected counts: \\hat{a}_{ij} = \\frac{\\sum_{t=1}^{L-1} p(x_t = q_i, x_{t+1} = q_j \\mid O, \\theta_{\\text{old}})}{\\sum_{t=1}^{L-1} \\left( \\sum_{j'} p(x_t = q_i, x_{t+1} = q_{j'} \\mid O, \\theta_{\\text{old}}) \\right)}Estimation when hidden state is unknown. Iterative Solving: Recalculate the expected counts with newly estimated parameters (E-step). Then generate newer estimations of $\\theta$ with (M-step). Repeat until convergence. Bayesian NetworksNaive BayesNaïve Bayes Assumption: Features $X_i$ are independent given class $Y$: P_\\theta(X_1, \\ldots, X_n \\mid Y) = \\prod_i P_\\theta(X_i \\mid Y)Inference: the label can be easily predicted with Bayes’ rule Y^* = \\arg\\max_Y \\prod_i P_\\theta(X_i \\mid Y) P(Y)$Y^*$ is the value that maximizes Likelihood $\\times$ Prior. When the number of samples is small, it is likely to encounter cases where $\\text{Count}(Y = y) = 0$ or $\\text{Count}(X_i = x, Y = y) = 0$. So we use Laplace Smoothing. The parameters of Naïve Bayes can be learned by counting: Prior: P(Y = y) = \\frac{\\text{Count}(Y = y) + 1}{\\sum_{y'} \\text{Count}(Y = y') + C} Observation Distribution P(X_i = x \\mid Y = y) = \\frac{\\text{Count}(X_i = x, Y = y) + 1}{\\sum_{x'} \\text{Count}(X_i = x', Y = y) + S}Here, $C$ is the number of classes, $S$ is the number of possible values that $X_i$ can take. Learning & Decision on BNBayesian NetworkBN$(G, \\Theta)$: a Bayesian network $G$ is a DAG with nodes and directed edges. Each node represents a random variable. Each edge represents a causal relationship/dependency. $\\Theta$ is the network parameters that constitute conditional probabilities. For a node $t$, its parameters are represented as $p(x_t \\mid x_{\\text{pa}(t)})$. Joint probability of BN: p(x) = \\prod_{t=1}^{n} p(x_t \\mid x_{\\text{pa}(t)})where $\\text{pa}(t)$ is the set of all parent nodes of node $t$. 
\\begin{aligned} \\begin{array}{ccc} & D & \\\\ & \\downarrow & \\\\ A \\rightarrow & B & \\rightarrow C \\end{array} \\end{aligned} P(A, B, C, D) = P(A) P(D) P(B \\mid A, D) P(C \\mid B) Learning on Bayesian NetworkNotation: Suppose the BN has $n$ nodes; we use $\\text{pa}(t)$ to denote the parent nodes of $t$ $(t = 1, \\ldots, n)$. By the conditional independence of the BN, the likelihood of $N$ i.i.d. samples factorizes as p(D \\mid \\Theta) = \\prod_{i=1}^{N} p(x_i \\mid \\Theta) = \\prod_{i=1}^{N} \\prod_{t=1}^{n} p(x_{i,t} \\mid x_{i,\\text{pa}(t)}, \\theta_t) = \\prod_{t=1}^{n} \\prod_{i=1}^{N} p(D_{i,t} \\mid \\theta_t) = \\prod_{t=1}^{n} p(D_t \\mid \\theta_t), and we assume the prior factorizes over nodes: p(\\Theta) = \\prod_{t=1}^{n} p(\\theta_t) Thus, the posterior becomes: p(\\Theta \\mid D) \\propto \\prod_{t=1}^{n} p(D_t \\mid \\theta_t) p(\\theta_t), which factorizes further over parent configurations: p(\\theta \\mid D) \\propto \\prod_{t=1}^{n} \\prod_{c=1}^{q_t} p(D_{tc} \\mid \\theta_{tc}) \\cdot p(\\theta_{tc}) Learning BN with Categorical DistributionConsider a case where each probability distribution in the BN is categorical. In this case, we can model the conditional distribution of node $t$ as (we use a scalar value $c$ to represent the parent nodes’ states for simplicity): P(x_t = k \\mid x_{\\text{pa}(t)} = c) = \\theta_{tck} and the conditional probability of node $t$ can be denoted as: \\theta_{tc} = [\\theta_{tc1}, \\theta_{tc2}, \\ldots, \\theta_{tcK_t}], \\quad \\sum_{k=1}^{K_t} \\theta_{tck} = 1 Categorical Distribution: p = [\\theta_1, \\theta_2, \\ldots, \\theta_d], \\quad \\theta_i \\geq 0, \\quad \\sum_{i} \\theta_i = 1 E.g., toss a coin $(d = 2)$, roll a die $(d = 6)$. Count the training samples where $x_t = k, x_{\\text{pa}(t)} = c$: N_{tck} = \\sum_{i=1}^{N} I(x_{i,t} = k, x_{i,\\text{pa}(t)} = c) According to the property of the categorical distribution, we can represent the likelihood function as: p(D_t \\mid \\theta_t) = \\prod_{c=1}^{q_t} \\prod_{k=1}^{K_t} \\theta_{tck}^{N_{tck}} = \\prod_{c=1}^{q_t} p(D_{tc} \\mid \\theta_{tc}) Thus the posterior can be further factorized: p(\\theta \\mid D) \\propto \\prod_{t=1}^{n} p(D_t \\mid 
\\theta_t)p(\\theta_t) = \\prod_{t=1}^{n} \\prod_{c=1}^{q_t} p(D_{tc} \\mid \\theta_{tc})p(\\theta_{tc}) Notation: $D_{tc}$ is the set of samples where the value of $x_{\\text{pa}(t)}$ is $c$; $q_t$ is the number of possible values of $x_{\\text{pa}(t)}$; $K_t$ is the number of possible values of $x_t$. How to choose the probability distribution function for the prior $p(\\theta_{tc})$? It would be highly convenient if the posterior shares the same form as the prior. Conjugate Prior: A prior distribution is called a conjugate prior for a likelihood function if the posterior distribution is in the same probability distribution family as the prior. The conjugate prior for the categorical distribution is the Dirichlet distribution, so we choose: p(\\theta_{tc}) \\propto \\prod_{k=1}^{K_t} \\theta_{tck}^{\\alpha_{tck} - 1} The $\\alpha_{tck}$ are positive hyperparameters of the BN model (they act as pseudo-counts). In this case, the posterior can be easily derived as: p(D_{tc} \\mid \\theta_{tc}) p(\\theta_{tc}) \\propto \\left( \\prod_{k=1}^{K_t} \\theta_{tck}^{N_{tck}} \\right) \\cdot \\left( \\prod_{k=1}^{K_t} \\theta_{tck}^{\\alpha_{tck} - 1} \\right) = \\prod_{k=1}^{K_t} \\theta_{tck}^{N_{tck} + \\alpha_{tck} - 1} This is again a Dirichlet distribution, so we can derive an estimate of $\\theta_{tck}$ by calculating the posterior expectation: \\hat{\\theta}_{tck} = E(\\theta_{tck}) = \\frac{N_{tck} + \\alpha_{tck}}{\\sum_{k'} (N_{tck'} + \\alpha_{tck'})} K-Means Algorithm Initialize cluster centers $\\mu_1, \\cdots, \\mu_k$ randomly. 
Repeat until no change of cluster assignment: Assignment step: Assign data points to the closest cluster center C_k \\leftarrow \\{n \\mid x_n \\text{ is closest to } \\mu_k\\} Update Step: Change the cluster center to the average of its assigned points \\mu_k \\leftarrow \\frac{1}{|C_k|} \\sum_{n \\in C_k} x_n Optimization View of K-MeansOptimization Objective: within-cluster sum of squares (WCSS) \\min_{\\mu, r} J_e = \\sum_{k=1}^{K} \\sum_{n=1}^{N} r_{n,k} \\| x_n - \\mu_k \\|^2 where $r_{n,k} \\in \\{0, 1\\}$ indicates whether $x_n$ is assigned to cluster $k$. Step 1: Fix $\\mu$, optimize $r$ r_{n,k^*} = 1 \\quad \\Leftrightarrow \\quad k^* = \\arg\\min_k \\| x_n - \\mu_k \\| Step 2: Fix $r$, optimize $\\mu$ \\mu_k^* = \\frac{\\sum_{n} r_{n,k} x_n}{\\sum_{n} r_{n,k}} = \\frac{1}{|C_k|} \\sum_{n \\in C_k} x_n Rules of Thumb for initializing k-means Random Initialization: Randomly generate $k$ points in the space. Random Partition Initialization: Randomly group the data into $k$ clusters and use their cluster centers to initialize the algorithm. Forgy Initialization: Randomly select $k$ samples from the data. K-Means++: Iteratively choose new centroids that tend to be far from the existing centroids (each new centroid is sampled with probability proportional to its squared distance to the nearest existing centroid). How to tell the right number of clusters?We find the elbow point of the curve of $J_e$ plotted against the number of clusters $k$. 
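The assignment and update steps above can be sketched in a few lines. This is a minimal version that takes the initial centers as an argument (so any of the initialization rules can be plugged in) and keeps the old center if a cluster becomes empty, which is one possible convention. The six points and two initial centers are made up for illustration.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, init_centers, max_iter=100):
    centers = [tuple(c) for c in init_centers]
    assign = None
    for _ in range(max_iter):
        # assignment step: index of the closest center for every point
        new_assign = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                      for p in points]
        if new_assign == assign:      # no change of cluster assignment
            break
        assign = new_assign
        # update step: move each center to the mean of its assigned points
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:               # an empty cluster keeps its old center
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    wcss = sum(sq_dist(p, centers[a]) for p, a in zip(points, assign))
    return centers, assign, wcss

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]   # hypothetical data
centers, assign, wcss = kmeans(points, init_centers=[(0, 0), (10, 10)])
print(centers, assign, wcss)
```

Running this for several values of k and plotting the returned WCSS is one way to look for the elbow point mentioned above.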
EM Algorithm for Gaussian Mixture Model (GMM)Multivariate Gaussian Distribution$d$-dimensional Multivariate Gaussian: N(x \\mid \\mu, \\Sigma) = \\frac{1}{(2\\pi)^{d/2} |\\Sigma|^{1/2}} \\exp \\left( -\\frac{1}{2} (x - \\mu)^T \\Sigma^{-1} (x - \\mu) \\right) $\\mu \\in \\mathbb{R}^d$ the mean vector $\\Sigma \\in \\mathbb{R}^{d \\times d}$ the covariance matrix MLE of Gaussian DistributionThe likelihood function of a given dataset $X = \\{x_1, x_2, \\ldots, x_N\\}$: p(X \\mid \\mu, \\Sigma) = \\prod_{n=1}^{N} p(x_n \\mid \\mu, \\Sigma) = \\prod_{n=1}^{N} \\frac{1}{(2\\pi)^{d/2} |\\Sigma|^{1/2}} \\exp \\left( -\\frac{1}{2} (x_n - \\mu)^T \\Sigma^{-1} (x_n - \\mu) \\right) The maximum likelihood estimation (MLE) of the parameters is defined by: \\mu^*, \\Sigma^* = \\arg\\max_{\\mu, \\Sigma} \\mathcal{L}(\\mu, \\Sigma), where, up to an additive constant, \\mathcal{L}(\\mu, \\Sigma) = \\log p(X \\mid \\mu, \\Sigma) = -\\frac{N}{2} \\log |\\Sigma| - \\frac{1}{2} \\sum_{n=1}^{N} (x_n - \\mu)^T \\Sigma^{-1} (x_n - \\mu) The optimization problem of maximum likelihood estimation (MLE): \\max_{\\mu, \\Sigma} \\mathcal{L}(\\mu, \\Sigma) = -\\frac{N}{2} \\log |\\Sigma| - \\frac{1}{2} \\sum_{n=1}^{N} (x_n - \\mu)^T \\Sigma^{-1} (x_n - \\mu) Solve the optimization by setting the gradient to zero (using $-\\log|\\Sigma| = \\log|\\Sigma^{-1}|$ for the second condition): 0 = \\frac{\\partial \\mathcal{L}}{\\partial \\mu} = \\sum_{n=1}^{N} \\Sigma^{-1} (x_n - \\mu) \\quad \\Rightarrow \\quad \\mu^* = \\frac{1}{N} \\sum_{n=1}^{N} x_n \\quad \\text{(Sample Mean)} 0 = \\frac{\\partial \\mathcal{L}}{\\partial \\Sigma^{-1}} = \\frac{N}{2} \\Sigma - \\frac{1}{2} \\sum_{n=1}^{N} (x_n - \\mu)(x_n - \\mu)^T \\quad \\Rightarrow \\quad \\Sigma^* = \\frac{1}{N} \\sum_{n=1}^{N} (x_n - \\mu^*)(x_n - \\mu^*)^T \\quad \\text{(Sample Covariance)} Gaussian Mixture Model (GMM)A Gaussian Mixture Model (GMM) is the weighted sum of a family of Gaussians whose density function has the form: p(x \\mid \\pi, \\mu, \\Sigma) = \\sum_{k=1}^{K} \\pi_k N(x \\mid \\mu_k, \\Sigma_k) Each Gaussian $N(\\mu_k, \\Sigma_k)$ is called a 
component of GMM. Scalars $\\{\\pi_k\\}_{k=1}^{K}$ are referred to as mixing coefficients, which satisfy $\\pi_k \\geq 0$ and \\sum_{k=1}^{K} \\pi_k = 1 This condition ensures $p(x \\mid \\pi, \\mu, \\Sigma)$ is indeed a density function. Soft Clustering with Mixture ModelIntroduce a latent variable $z$ indicating which component generated $x$: p(z = k) = \\pi_k, \\quad p(x \\mid z) = N(x \\mid \\mu_z, \\Sigma_z) By Bayes’ Rule, the posterior probability of $z$ given $x$ is: \\gamma_k \\overset{\\Delta}{=} p(z = k \\mid x) = \\frac{p(z = k, x)}{p(x)} = \\frac{\\pi_k N(x \\mid \\mu_k, \\Sigma_k)}{\\sum_{j=1}^{K} \\pi_j N(x \\mid \\mu_j, \\Sigma_j)} We call $\\gamma_k$ the responsibility of the $k$-th component for the data point $x$. Probabilistic Clustering: each data point is assigned a probability distribution over the clusters. “$x$ belongs to the $k$-th cluster with probability $\\gamma_k$” MLE for Gaussian Mixture ModelLog-likelihood function of GMM \\log p(X \\mid \\pi, \\mu, \\Sigma) = \\sum_{n=1}^{N} \\log \\left( \\sum_{k=1}^{K} \\pi_k N(x_n \\mid \\mu_k, \\Sigma_k) \\right) Maximum Likelihood Estimation \\max_{\\pi, \\mu, \\Sigma} \\mathcal{L}(\\pi, \\mu, \\Sigma) = \\sum_{n=1}^{N} \\log \\left( \\sum_{k=1}^{K} \\pi_k N(x_n \\mid \\mu_k, \\Sigma_k) \\right) subject to: \\sum_{k=1}^{K} \\pi_k = 1 Optimality Condition for $\\mu$Recall that N(x \\mid \\mu, \\Sigma) = \\frac{1}{(2\\pi)^{d/2} |\\Sigma|^{1/2}} \\exp \\left( -\\frac{1}{2} (x - \\mu)^T \\Sigma^{-1} (x - \\mu) \\right) and \\frac{\\partial x^T A x}{\\partial x} = (A + A^T) x. Take the partial derivative with respect to $\\mu_k$: 0 = \\frac{\\partial \\mathcal{L}}{\\partial \\mu_k} = \\sum_{n=1}^{N} \\frac{\\pi_k N(x_n \\mid \\mu_k, \\Sigma_k)}{\\sum_j \\pi_j N(x_n \\mid \\mu_j, \\Sigma_j)} \\Sigma_k^{-1} (x_n - \\mu_k) Notice that the posterior of $z_n$ (also known as responsibility $\\gamma_{n,k}$) can be written as \\gamma_{n,k} \\overset{\\Delta}{=} p(z_n = k \\mid 
x_n) = \\frac{p(z_n = k) p(x_n \\mid z_n = k)}{\\sum_j p(z_n = j) p(x_n \\mid z_n = j)} = \\frac{\\pi_k N(x_n \\mid \\mu_k, \\Sigma_k)}{\\sum_j \\pi_j N(x_n \\mid \\mu_j, \\Sigma_j)} Thus, multiplying by $\\Sigma_k$, 0 = \\sum_{n=1}^{N} \\gamma_{n,k} (x_n - \\mu_k) \\quad \\Rightarrow \\quad \\mu_k = \\frac{1}{N_k} \\sum_{n=1}^{N} \\gamma_{n,k} x_n, \\text{ where } N_k = \\sum_{n=1}^{N} \\gamma_{n,k} Optimality Condition for $\\Sigma$Similarly, take the derivative with respect to $\\Sigma_k$, which yields 0 = \\frac{\\partial \\mathcal{L}}{\\partial \\Sigma_k} \\quad \\Rightarrow \\quad \\Sigma_k = \\frac{1}{N_k} \\sum_{n=1}^{N} \\gamma_{n,k} (x_n - \\mu_k)(x_n - \\mu_k)^T This is the responsibility-reweighted sample covariance. Optimality Condition for $\\pi$Constraint on the mixing coefficients $\\pi$: $\\sum_{k=1}^{K} \\pi_k = 1$. Introduce a Lagrange multiplier: \\mathcal{L}' = \\mathcal{L} + \\lambda \\left( \\sum_{k=1}^{K} \\pi_k - 1 \\right) Take the derivative with respect to $\\pi_k$, which gives 0 = \\frac{\\partial \\mathcal{L}'}{\\partial \\pi_k} = \\sum_{n=1}^{N} \\frac{\\gamma_{n,k}}{\\pi_k} + \\lambda = \\frac{N_k}{\\pi_k} + \\lambda \\quad \\Rightarrow \\quad \\pi_k = \\frac{-N_k}{\\lambda} By the constraint, we have $1 = \\sum_{k=1}^{K} \\pi_k = \\frac{-1}{\\lambda} \\sum_{k=1}^{K} N_k$. Also notice that 
\\sum_{k=1}^{K} N_k = \\sum_{k=1}^{K} \\sum_{n=1}^{N} \\gamma_{n,k} = \\sum_{n=1}^{N} \\sum_{k=1}^{K} \\gamma_{n,k} = \\sum_{n=1}^{N} 1 = N Therefore, \\lambda = -\\sum_{k=1}^{K} N_k = -N, \\quad \\pi_k = \\frac{N_k}{N} Expectation-Maximization (EM) Algorithm 1. Initialize $\\pi_k, \\mu_k, \\Sigma_k, \\quad k = 1, 2, \\ldots, K$ 2. E-Step: Evaluate the responsibilities using the current parameter values \\gamma_{n,k} = p(z_n = k \\mid x_n) = \\frac{\\pi_k N(x_n \\mid \\mu_k, \\Sigma_k)}{\\sum_j \\pi_j N(x_n \\mid \\mu_j, \\Sigma_j)} 3. M-Step: Re-estimate the parameters using the current responsibilities \\mu_k^{\\text{new}} = \\frac{1}{N_k} \\sum_{n=1}^{N} \\gamma_{n,k} x_n, \\quad \\Sigma_k^{\\text{new}} = \\frac{1}{N_k} \\sum_{n=1}^{N} \\gamma_{n,k} (x_n - \\mu_k^{\\text{new}})(x_n - \\mu_k^{\\text{new}})^T, \\quad \\pi_k^{\\text{new}} = \\frac{N_k}{N} where $N_k = \\sum_{n=1}^{N} \\gamma_{n,k}$ 4. Return to step 2 if the convergence criterion is not satisfied. Hierarchical ClusteringDistance Function: The distance function affects which pairs of clusters are merged/split and in what order. Single Linkage: d(C_i, C_j) = \\min_{x \\in C_i, y \\in C_j} d(x, y) Complete Linkage: d(C_i, C_j) = \\max_{x \\in C_i, y \\in C_j} d(x, y) Average Linkage: d(C_i, C_j) = \\frac{1}{|C_i| \\cdot |C_j|} \\sum_{x \\in C_i, y \\in C_j} d(x, y) Two Types of Hierarchical Clustering Bottom-Up (Agglomerative): Start with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together. Top-Down (Divisive): Start with one all-inclusive cluster, consider every possible way to divide the cluster in two. Choose the best division and recursively operate on both sides. 
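Going back to the EM algorithm for the Gaussian mixture model given earlier, the E-step and M-step updates can be sketched for the one-dimensional case, where each covariance matrix reduces to a scalar variance. The data, the initial parameters, the fixed number of iterations and the small variance floor are assumptions made for this illustration; they are not part of the algorithm as stated.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, pis, mus, variances, n_iter=50):
    pis, mus, variances = list(pis), list(mus), list(variances)
    K, N = len(pis), len(xs)
    for _ in range(n_iter):
        # E-step: responsibilities gamma[n][k]
        gammas = []
        for x in xs:
            w = [pis[k] * normal_pdf(x, mus[k], variances[k]) for k in range(K)]
            total = sum(w)
            gammas.append([wk / total for wk in w])
        # M-step: responsibility-weighted mean, variance and mixing coefficient
        for k in range(K):
            n_k = sum(g[k] for g in gammas)
            mus[k] = sum(g[k] * x for g, x in zip(gammas, xs)) / n_k
            var = sum(g[k] * (x - mus[k]) ** 2 for g, x in zip(gammas, xs)) / n_k
            variances[k] = max(var, 1e-6)   # floor to avoid a collapsing component
            pis[k] = n_k / N
    return pis, mus, variances

xs = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]      # hypothetical data
pis, mus, variances = em_gmm_1d(xs, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
print(pis, mus, variances)
```

A fixed iteration count stands in for the convergence criterion here; monitoring the change in log-likelihood between iterations is a common alternative.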
Agglomerative (Bottom-up) Clustering Input: cluster distance measure $d$, dataset $X = \\{x_n\\}_{n=1}^{N}$, number of clusters $k$ Initialize $\\mathcal{C} = \\{C_i = \\{x_n\\} \\mid x_n \\in X\\}$ // Each point in separate cluster Repeat: Find the closest pair of clusters $C_i, C_j \\in \\mathcal{C}$ based on distance metric $d$ $C_{ij} = C_i \\cup C_j$ // Merge the selected clusters $\\mathcal{C} = (\\mathcal{C} \\setminus \\{C_i, C_j\\}) \\cup \\{C_{ij}\\}$ // Update the clustering Until $|\\mathcal{C}| = k$ A naïve implementation takes space complexity $O(N^2)$, time complexity $O(N^3)$. LASSO RegressionLASSO (Least Absolute Shrinkage and Selection Operator): Simply linear regression with an $\\ell_1$ penalty for sparsity L(w) = \\sum_{i=1}^{n} \\left( w^T x_i - y_i \\right)^2 + C \\|w\\|_1 sparse solution $\\leftrightarrow$ feature selection Principal Component Analysis (PCA)Computing PCA: Eigenvalue DecompositionObjective: Maximize variance of projected data (assuming the data has been centered) \\max_{\\mathbf{u}_j} \\mathbb{E}[(\\mathbf{u}_j^T \\mathbf{x})^2] subject to $\\mathbf{u}_j^T \\mathbf{u}_j = 1$, $\\mathbf{u}_j^T \\mathbf{u}_k = 0$, $k < j$ Observation: PC $j$ is the direction of the eigenvector of $\\frac{1}{n} \\mathbf{X}^T \\mathbf{X}$ with the $j$-th largest eigenvalue. Eigenvalue Decomposition: the columns of \\mathbf{U} = \\begin{pmatrix} \\mathbf{u}_1 & \\cdots & \\mathbf{u}_k \\\\ \\end{pmatrix} are the top $k$ eigenvectors of $\\frac{1}{n} \\mathbf{X}^T \\mathbf{X}$ Manifold LearningGeodesic distance: the length of the shortest path between two points along the manifold Classical Neural NetworksForward propagation \\begin{aligned} \\mathbf{a}^{(1)} &= \\mathbf{x} \\newline \\mathbf{z}^{(2)} &= \\Theta^{(1)} \\mathbf{a}^{(1)} \\newline \\mathbf{a}^{(2)} &= g(\\mathbf{z}^{(2)}) \\quad [\\text{append } a_0^{(2)}] \\newline \\mathbf{z}^{(3)} &= \\Theta^{(2)} \\mathbf{a}^{(2)} \\newline \\mathbf{a}^{(3)} &= g(\\mathbf{z}^{(3)}) \\quad [\\text{append } a_0^{(3)}] \\newline \\mathbf{z}^{(4)} &= \\Theta^{(3)} \\mathbf{a}^{(3)} \\newline 
\\mathbf{a}^{(4)} &= h_\\Theta(\\mathbf{x}) = g(\\mathbf{z}^{(4)}) \\end{aligned} Backpropagation: Gradient ComputationApply the chain rule to compute gradients. Summary of backpropagation: \\delta^{(4)} = \\frac{\\partial J(\\Theta)}{\\partial \\mathbf{z}^{(4)}} = \\mathbf{a}^{(4)} - \\mathbf{y}, \\quad \\delta^{(3)} = \\frac{\\partial J(\\Theta)}{\\partial \\mathbf{z}^{(3)}} = (\\Theta^{(3)})^T \\delta^{(4)} \\ast g'(\\mathbf{z}^{(3)}), \\quad \\delta^{(2)} = \\frac{\\partial J(\\Theta)}{\\partial \\mathbf{z}^{(2)}} = (\\Theta^{(2)})^T \\delta^{(3)} \\ast g'(\\mathbf{z}^{(2)}) (No $\\delta^{(1)}$, since the input layer has no error term.) Here $\\ast$ denotes the element-wise product. Based on $\\delta^{(l)}$, $\\frac{\\partial J(\\Theta)}{\\partial \\Theta^{(l)}}$ can be computed as: \\frac{\\partial J(\\Theta)}{\\partial \\Theta^{(l)}} = \\delta^{(l+1)} (\\mathbf{a}^{(l)})^T \\quad (l = 1, 2, 3) For example, if the activation function $g(x)$ is the sigmoid, i.e., $g(x) = \\frac{1}{1+e^{-x}}$, then $g'(x) = g(x)(1 - g(x))$. If $J(\\Theta)$ is the cross-entropy loss for binary classification, i.e., J(\\Theta) = -(1 - y) \\log(1 - h_\\Theta(\\mathbf{x})) - y \\log(h_\\Theta(\\mathbf{x})) then its derivative with respect to the output is \\frac{\\partial J}{\\partial h_\\Theta(\\mathbf{x})} = \\frac{h_\\Theta(\\mathbf{x}) - y}{h_\\Theta(\\mathbf{x})(1 - h_\\Theta(\\mathbf{x}))} and multiplying by $g'(\\mathbf{z}^{(4)}) = h_\\Theta(\\mathbf{x})(1 - h_\\Theta(\\mathbf{x}))$ gives $\\delta^{(4)} = \\mathbf{a}^{(4)} - \\mathbf{y}$. Optimization of Deep NetworksVanilla Gradient DescentCore: Compute the gradient of the loss function $g_t = \\nabla \\mathcal L$ on all training samples. Stochastic Gradient DescentCore: Select a sample $(\\mathbf x_i, y_i)$ from the training set and compute the gradient of the loss function $g_t = \\nabla \\mathcal L$ on the selected sample. Mini-batch Gradient DescentCore: Randomly select $b$ samples $\\{(\\mathbf x_i, y_i)\\}$ with indices $i \\in [1, n]$ and compute the gradient of the loss function $g_t = \\nabla \\mathcal L$ on the selected samples. Gradient Descent with Momentum \\Theta_{t+1} = \\Theta_t + \\mathbf{v}_t where \\mathbf{v}_t = \\beta \\mathbf{v}_{t-1} - \\alpha \\nabla_\\Theta \\mathcal{L} The momentum term ($\\mathbf{v}_t$) accumulates the gradients from the past several steps. 
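The momentum update above can be sketched on a toy problem. The quadratic objective, the learning rate alpha, the momentum coefficient beta and the number of steps are all assumptions chosen for this illustration.

```python
def momentum_descent(grad, theta, alpha=0.1, beta=0.9, steps=200):
    v = [0.0] * len(theta)                      # velocity starts at zero
    for _ in range(steps):
        g = grad(theta)
        # v_t = beta * v_{t-1} - alpha * gradient
        v = [beta * vi - alpha * gi for vi, gi in zip(v, g)]
        # Theta_{t+1} = Theta_t + v_t
        theta = [ti + vi for ti, vi in zip(theta, v)]
    return theta

def grad(th):
    # gradient of the toy objective (th0 - 3)^2 + 2 * (th1 + 1)^2
    return [2 * (th[0] - 3), 4 * (th[1] + 1)]

theta = momentum_descent(grad, [0.0, 0.0])
print(theta)
```

Setting beta to zero recovers plain gradient descent, which makes it easy to compare the two on the same objective.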
Adaptive Gradient (AdaGrad)AdaGrad adapts the learning rate of each parameter individually. In particular, it tends to assign higher learning rates to infrequent features, which ensures that the parameter updates rely less on frequency and more on relevance. In AdaGrad, the parameters are updated as: \\Theta_{t+1} = \\Theta_t - \\frac{\\alpha}{\\sqrt{r_t} + \\epsilon} .* \\mathbf{g}_t where r_t = r_{t-1} + \\mathbf{g}_t .* \\mathbf{g}_t and $\\epsilon$ is a small number to ensure numerical stability. Here, $.*$ is the element-wise product, and $\\mathbf{g}_t = \\nabla \\mathcal{L}(\\Theta_t)$. Root Mean Square Propagation (RMSProp)RMSProp changes the gradient accumulation in AdaGrad into an exponentially weighted moving average. This method uses an exponentially decaying average to discard history from the extreme past so that it can converge rapidly after finding a convex bowl, as if it were an instance of the AdaGrad algorithm initialized within that bowl. The update rule is denoted as: \\Theta_{t+1} = \\Theta_t - \\frac{\\alpha}{\\sqrt{r_t} + \\epsilon} .* \\mathbf{g}_t where r_t = \\beta r_{t-1} + (1 - \\beta) \\mathbf{g}_t .* \\mathbf{g}_t Here $r$ is the moving average of squared gradients and $\\beta$ is the decay rate. Adaptive Moment Estimation (Adam)Adam extends RMSProp by also using the first moment of the gradients, whereas RMSProp uses only the second moment. 
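Before turning to the details of Adam, the AdaGrad and RMSProp updates described above can be sketched as single-step functions that differ only in how the accumulator r is updated. The default hyperparameter values and the one-parameter example are assumptions for illustration.

```python
import math

def adagrad_step(theta, r, g, alpha=0.1, eps=1e-8):
    # r accumulates the sum of squared gradients (element-wise)
    r = [ri + gi * gi for ri, gi in zip(r, g)]
    theta = [ti - alpha / (math.sqrt(ri) + eps) * gi
             for ti, ri, gi in zip(theta, r, g)]
    return theta, r

def rmsprop_step(theta, r, g, alpha=0.01, beta=0.9, eps=1e-8):
    # r is an exponentially weighted moving average of squared gradients
    r = [beta * ri + (1 - beta) * gi * gi for ri, gi in zip(r, g)]
    theta = [ti - alpha / (math.sqrt(ri) + eps) * gi
             for ti, ri, gi in zip(theta, r, g)]
    return theta, r

# one step from theta = 1.0 with gradient 2.0 (hypothetical values)
theta_a, r_a = adagrad_step([1.0], [0.0], [2.0])
theta_r, r_r = rmsprop_step([1.0], [0.0], [2.0])
print(theta_a, r_a, theta_r, r_r)
```

Because the AdaGrad accumulator only grows, its effective learning rate keeps shrinking, while the RMSProp accumulator can also decrease when recent gradients are small.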
Adam can be seen as a combination of RMSProp and Momentum with a few distinctions: First-order Moment: s_t = \\beta_1 s_{t-1} + (1 - \\beta_1) g_t Second-order Moment: r_t = \\beta_2 r_{t-1} + (1 - \\beta_2) g_t .* g_t Considering the first-order moment $s_t = \\beta_1 s_{t-1} + (1 - \\beta_1) g_t$, we start by initializing $s_0 = 0$, then: \\begin{aligned} s_1 &= \\beta_1 s_0 + (1 - \\beta_1) g_1 = (1 - \\beta_1) g_1 \\\\ s_2 &= \\beta_1 s_1 + (1 - \\beta_1) g_2 = \\beta_1 (1 - \\beta_1) g_1 + (1 - \\beta_1) g_2 \\\\ s_3 &= \\beta_1 s_2 + (1 - \\beta_1) g_3 = \\beta_1 [ \\beta_1 (1 - \\beta_1) g_1 + (1 - \\beta_1) g_2 ] + (1 - \\beta_1) g_3 \\\\ s_t &= (1 - \\beta_1) \\sum_{i=1}^t \\beta_1^{t-i} g_i \\end{aligned} Note that because we initialized $s_0 = 0$, the estimate is significantly biased towards smaller values initially. Using the fact that $\\sum_{i=0}^{t-1} \\beta_1^i = \\frac{1 - \\beta_1^t}{1 - \\beta_1}$, the weights in $s_t$ sum to $1 - \\beta_1^t$, so we re-normalize and get: \\hat{s}_t = \\frac{s_t}{1 - \\beta_1^t} The same correction can be applied to the second-order moment: \\hat{r}_t = \\frac{r_t}{1 - \\beta_2^t} Finally, Adam combines the bias-corrected first and second-order moments and updates the parameters as: \\theta_{t+1} = \\theta_t - \\alpha \\frac{\\hat{s}_t}{\\sqrt{\\hat{r}_t} + \\epsilon} Convolutional Neural NetworksConvolution LayerThe convolution operator preserves the spatial structure of the image. Different filters extract different features from the original image. Pooling LayerThe pooling layer is a downsampling operation, typically applied after a convolution layer, which provides some spatial invariance. Commonly used pooling operations: Max Pooling and Average Pooling. Batch Normalization LayerBatch Normalization alleviates the problem of vanishing gradients. Fully Connected Layer (FC) or Dense LayerThe fully connected layer operates on a flattened input where each input is connected to all neurons. 
If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores. Word EmbeddingBuild a dense vector for each word, chosen so that it is similar to the vectors of words that appear in similar contexts, measuring similarity as the vector dot product. Such a representation is called a word embedding or word vector. Word2VecIdea: We have a large corpus (“body”) of text: a long list of words. Every word in a fixed vocabulary is represented by a vector. Go through each position $t$ in the text, which has a center word $c$ and context (“outside”) words $o$. Use the similarity of the word vectors for $c$ and $o$ to calculate the probability of $o$ given $c$. Keep adjusting the word vectors to maximize this probability. Word2vec: Objective Function For each position $t = 1, \\ldots, T$, predict context words within a window of fixed size $m$, given center word $w_t$. In our case, each word is represented as a parameter vector $\\theta_i$. $\\theta = [\\theta_1, \\ldots, \\theta_V]$ represents all the parameters of $V$-many words. The objective function $J(\\theta)$ is the (average) negative log likelihood: J(\\theta) = -\\frac{1}{T} \\log L(\\theta) = -\\frac{1}{T} \\sum_{t=1}^{T} \\sum_{-m \\le j \\le m, j \\ne 0} \\log P(w_{t+j} | w_t; \\theta) Word2vec: Prediction Function How to calculate $P(w_{t+j} | w_t; \\theta)$? Softmax function: We will use two vectors per word $w$: $v_w$ when $w$ is a center word; $u_w$ when $w$ is a context word. $V$: the set of all possible words P(o | c) = \\frac{\\exp(u_o^T v_c)}{\\sum_{w \\in V} \\exp(u_w^T v_c)} Dot product compares similarity of $o$ and $c$. Exponentiation makes anything positive. Normalize over the entire vocabulary to give a probability distribution. Language ModelingLanguage Modeling is the task of modeling a probability distribution over sequences of words, e.g., predicting what word comes next. A system that does this is called a Language Model. 
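The word2vec prediction function described above (dot product, exponentiation, normalization over the vocabulary) can be sketched as follows. The three-word vocabulary and all vector values are made up; subtracting the maximum score before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math

def context_probs(u, v_c):
    # u: dict mapping each word to its context vector u_w
    # v_c: center-word vector; returns P(o | c) for every word o
    scores = {w: sum(a * b for a, b in zip(vec, v_c)) for w, vec in u.items()}
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

u = {'cat': [1.0, 0.0], 'dog': [0.8, 0.2], 'car': [-1.0, 0.5]}   # hypothetical vectors
v_c = [2.0, 1.0]
probs = context_probs(u, v_c)
print(probs)
```

The sum over the whole vocabulary in the denominator is what makes this expensive for large vocabularies.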
Recurrent Neural Network (RNN)Need a neural network that can process any length input? Apply the same weights repeatedly. Training RNNGet a big corpus of text, which is a sequence of words. Feed a (batch of) sequence of length $T$ into the RNN-LM; compute the output distribution $\\hat{y}^{(t)}$ for every step $t$, i.e., predict the probability distribution of every word, given the words so far. The loss $J^{(t)}(\\theta)$ at step $t$ is typically the cross-entropy between $\\hat{y}^{(t)}$ and the true next word. Average this to get the overall loss for a sentence (actually, a batch of sentences): J(\\theta) = \\frac{1}{T} \\sum_{t=1}^{T} J^{(t)}(\\theta) Backpropagation for RNNs: backpropagation through time Apply the multivariable chain rule: \\frac{\\partial J^{(t)}}{\\partial \\mathbf{W}_h} = \\sum_{i=1}^{t} \\frac{\\partial J^{(t)}}{\\partial \\mathbf{W}_h^{(i)}} where $\\mathbf{W}_h^{(i)}$ denotes the copy of $\\mathbf{W}_h$ used at step $i$. The gradient w.r.t. a repeated weight is the sum of the gradients w.r.t. each time it appears. Long Short-Term Memory RNNThe key to LSTMs is the cell state: the hidden state stores short-term information, while the cell stores long-term information. The cell runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged. LSTM – Forget GateGating mechanisms: Control which information is erased/written/read from the cell. On each timestep, each element of the gates can be open (1), closed (0), or somewhere in-between (Sigmoid function). The gates are dynamic; their value is computed based on the current context. 
Forget gate: Controls what is forgotten from the previous cell state f^{(t)} = \\sigma (W_f h^{(t-1)} + U_f x^{(t)} + b_f) LSTM – Input Gate New cell content: The content that will be added to the cell state \\tilde{c}^{(t)} = \\tanh (W_c h^{(t-1)} + U_c x^{(t)} + b_c) Input gate: Controls what parts of the new cell content are written to the cell i^{(t)} = \\sigma (W_i h^{(t-1)} + U_i x^{(t)} + b_i) LSTM – Update Cell State Cell state: Forget some content from the last cell state; input some new content to the cell state c^{(t)} = f^{(t)} \\odot c^{(t-1)} + i^{(t)} \\odot \\tilde{c}^{(t)} LSTM – Output Gate Output gate: Controls what parts of cell content are output to hidden state o^{(t)} = \\sigma (W_o h^{(t-1)} + U_o x^{(t)} + b_o) Hidden state: Output some content from the cell as hidden state h^{(t)} = o^{(t)} \\odot \\tanh c^{(t)} Transformers Attention Attention: directly model relationships between any two positions in the input sequence, regardless of their distance. Let the sequence be $w_{1:n}$. For each word $w_i$, let $x_i$ be its word embedding. Each word vector (embedding) is transformed into three vectors: query, key, and value. q_i = Qx_i \\quad k_i = Kx_i \\quad v_i = Vx_i Matrices $Q$, $K$, $V$ are learnable parameters. e_{ij} = q_i^T k_j \\quad \\alpha_{ij} = \\frac{\\exp(e_{ij})}{\\sum_{j'} \\exp(e_{ij'})} \\quad o_i = \\sum_{j} \\alpha_{ij} v_j Transformer Encoder Position encoding Since self-attention does not encode order information, we need to encode the order of the sentence in the word embeddings. Consider representing each sequence index as $p_i \\in \\mathbb{R}^d$ \\tilde{x}_i = x_i + p_i $x_i$ is the word embedding, $\\tilde{x}_i$ is the positioned word embedding. $p_i$ can be a sinusoidal function or learnable parameters. Multi-head attention Define multiple attention “heads” through multiple $Q$, $K$, $V$ matrices. Each attention head performs attention independently. Then the outputs of all the heads are combined. Residual connection A trick from ResNet to help models train better. 
Layer normalization A trick to stabilize the training. $\\mu, \\sigma$ are the mean and standard deviation of $x \\in \\mathbb{R}^d$ o = \\frac{x - \\mu}{\\sqrt{\\sigma^2 + \\epsilon}} * \\gamma + \\beta Feed-forward network Self-attention is just a linear combination of values. The feed-forward network introduces nonlinearity. o = W_2 * \\text{ReLU}(W_1 * x + b_1) + b_2 Transformer Decoder Masked attention: Mask out attention to future words by setting attention scores to $-\\infty$ e_{ij} = q_i^T k_j \\quad \\Rightarrow \\quad e_{ij} = \\begin{cases} q_i^T k_j & \\text{if } j \\leq i \\\\ -\\infty & \\text{if } j > i \\end{cases} For any current word $i$ and future word $j$ ($i < j$), we have the attention weight \\alpha_{ij} = \\frac{\\exp(e_{ij})}{\\sum_{j'} \\exp(e_{ij'})} = 0 Cross-attention: The queries are drawn from the decoder; the keys and values are drawn from the encoder. This establishes a connection between the source (input) and target (output) sequences. q_i = Q x_{\\text{decoder}} \\quad k_i = K x_{\\text{encoder}} \\quad v_i = V x_{\\text{encoder}} Variational Autoencoder (VAE) Let’s turn the autoencoder into a probabilistic model. The encoder encodes the input data into a distribution over the latent space instead of a single point in latent space. q_{\\phi}(z | x) = \\mathcal{N} \\left(z; \\mu_{\\phi}(x), \\sigma_{\\phi}^2(x) \\right) The decoder maps any latent code to a meaningful data distribution p_{\\theta}(x | z) = \\mathcal{N} \\left(x; \\mu_{\\theta}(z), \\Sigma_{\\theta}(z) \\right) VAE: Generative Process We assume each data point is generated by the following two steps: (1) Sample latent variable $z$ from its prior distribution $p(z)$. (2) Generate $x$ by the conditional model $p_{\\theta}(x \\mid z)$. The prior distribution $p(z)$ is usually simple, say $p(z) = \\mathcal{N}(0, I)$ p_{\\theta}(x) = \\int p(z) p_{\\theta}(x | z) \\, dz The likelihood $p_{\\theta}(x)$ is intractable. 
However, with the help of an encoder $q_{\\phi}(z | x)$, we can obtain a tractable lower bound of the likelihood. q_{\\phi}(z | x) = \\mathcal{N} \\left(z; \\mu_{\\phi}(x), \\sigma_{\\phi}^2(x) \\right) The model can be trained by maximizing this lower bound. \\log p_{\\theta}(x) = \\log \\int p_{\\theta}(x, z) \\, dz = \\log \\int \\frac{p_{\\theta}(x, z)}{q_{\\phi}(z | x)} \\cdot q_{\\phi}(z | x) \\, dz = \\log \\mathbb{E}_{z \\sim q_{\\phi}(z | x)} \\left[ \\frac{p_{\\theta}(x, z)}{q_{\\phi}(z | x)} \\right] \\geq \\mathbb{E}_{z \\sim q_{\\phi}(z | x)} \\left[ \\log \\frac{p_{\\theta}(x, z)}{q_{\\phi}(z | x)} \\right] The inequality is Jensen’s inequality, which applies because $\\log$ is concave. The Evidence Lower Bound (ELBO) L_{ELBO}(x, \\theta, \\phi) \\triangleq \\mathbb{E}_{z \\sim q_{\\phi}(z | x)} \\left[ \\log \\frac{p_{\\theta}(x, z)}{q_{\\phi}(z | x)} \\right] The formula for the Kullback-Leibler (KL) divergence: The KL divergence from distribution $Q$ to distribution $P$ is defined as: D_{KL}(P \\| Q) = \\int_{-\\infty}^{\\infty} p(x) \\log \\frac{p(x)}{q(x)} \\, dx For discrete probability distributions, it is defined as: D_{KL}(P \\| Q) = \\sum_{i} P(i) \\log \\frac{P(i)}{Q(i)} In the context of variational autoencoders, where $q_{\\phi}(z|x)$ is the approximate posterior and $p_{\\theta}(z|x)$ is the true posterior, the KL divergence is given by: D_{KL}(q_{\\phi}(z|x) \\| p_{\\theta}(z|x)) = \\mathbb{E}_{z \\sim q_{\\phi}(z|x)} \\left[ \\log \\frac{q_{\\phi}(z|x)}{p_{\\theta}(z|x)} \\right] Notice that, since $p_{\\theta}(z | x) = p_{\\theta}(x, z) / p_{\\theta}(x)$, D_{KL} \\left( q_{\\phi}(z | x) \\| p_{\\theta}(z | x) \\right) = \\mathbb{E}_{z \\sim q_{\\phi}(z | x)} \\left[ \\log \\frac{q_{\\phi}(z | x)}{p_{\\theta}(z | x)} \\right] = \\mathbb{E}_{z \\sim q_{\\phi}(z | x)} \\left[ \\log \\frac{q_{\\phi}(z | x) p_{\\theta}(x)}{p_{\\theta}(x, z)} \\right] = \\log p_{\\theta}(x) + \\mathbb{E}_{z \\sim q_{\\phi}(z | x)} \\left[ \\log \\frac{q_{\\phi}(z | x)}{p_{\\theta}(x, z)} \\right] = \\log p_{\\theta}(x) - L_{ELBO} \\Rightarrow \\log p_{\\theta}(x) = L_{ELBO}(x, \\theta, \\phi) + D_{KL} 
\\left( q_{\\phi}(z | x) \\| p_{\\theta}(z | x) \\right) The gap between the log-likelihood and the ELBO equals the KL divergence between $q_{\\phi}$ and the true posterior. Encoder and decoder are jointly trained to maximize the evidence lower bound. \\theta^*, \\phi^* = \\arg \\max_{\\theta, \\phi} \\sum_{i=1}^{N} L_{ELBO}(x_i, \\theta, \\phi) This type of deep generative model (DGM) is called a Variational Autoencoder (VAE). Further Analysis for ELBO The ELBO is tractable and differentiable. \\log p_{\\theta}(x) \\geq L_{ELBO}(x, \\theta, \\phi) = \\mathbb{E}_{z \\sim q_{\\phi}(z | x)} \\left[ \\log \\frac{p_{\\theta}(x, z)}{q_{\\phi}(z | x)} \\right] = \\mathbb{E}_{z \\sim q_{\\phi}(z | x)} \\left[ \\log p_{\\theta}(x | z) + \\log p(z) - \\log q_{\\phi}(z | x) \\right] = \\mathbb{E}_{z \\sim q_{\\phi}(z | x)} \\left[ \\log p_{\\theta}(x | z) \\right] - D_{KL} \\left( q_{\\phi}(z | x) \\| p(z) \\right) Reconstruction Loss: \\mathbb{E}_{z \\sim q_{\\phi}(z | x)} \\left[ \\log p_{\\theta}(x | z) \\right] This measures how well $x$ is reconstructed after being sent through the encoder and decoder. Prior Regularization: D_{KL} \\left( q_{\\phi}(z | x) \\| p(z) \\right) This keeps the approximate posterior close to the prior. Prior Regularization Make the approximate posterior distribution closer to the prior. It prevents the “overfitting” of the autoencoder. 
Suppose a standard Gaussian prior and a Gaussian encoder: p(z) = \\mathcal{N}(0, 1) \\quad q_{\\phi}(z | x) = \\mathcal{N} \\left(z; \\mu_{\\phi}(x), \\sigma_{\\phi}^2(x) \\right) The KL divergence between Gaussians has a closed-form solution: -D_{KL} \\left( q_{\\phi}(z | x) \\| p(z) \\right) = \\frac{1}{2} \\left(1 + \\log \\sigma_{\\phi}^2(x) - \\mu_{\\phi}^2(x) - \\sigma_{\\phi}^2(x) \\right) Reconstruction Loss Suppose the encoder and decoder are both Gaussian q_{\\phi}(z | x) = \\mathcal{N} \\left(z; \\mu_{\\phi}(x), \\sigma_{\\phi}^2(x) \\right) \\quad p_{\\theta}(x | z) = \\mathcal{N} \\left(x; \\mu_{\\theta}(z), \\sigma^2 \\right) (Decoder has fixed variance for simplicity) The log-likelihood function: \\log p_{\\theta}(x | z) = -\\frac{d}{2} \\log (2 \\pi \\sigma^2) - \\frac{\\| x - \\mu_{\\theta}(z) \\|^2}{2\\sigma^2} The latter term is the L2 loss between the decoder output and the input data. The reconstruction loss can be approximated via Monte Carlo methods: \\mathbb{E}_{z \\sim q_{\\phi}(z | x)} \\left[ \\log p_{\\theta}(x | z) \\right] \\approx \\frac{1}{K} \\sum_{k=1}^{K} \\log p_{\\theta}(x | z^{(k)}) = -C_1 \\sum_{k=1}^{K} \\| x - \\mu_{\\theta}(z^{(k)}) \\|^2 + C_2, \\quad z^{(k)} \\sim q_{\\phi}(z | x) where $C_1 = \\frac{1}{2 \\sigma^2 K}$, $C_2$ collects the terms that do not depend on $\\theta$, and $K$ is the number of MC samples. VAE: Putting together Training Forward encoder, compute regularization term L_{\\text{reg}} = -D_{KL} \\left( q_{\\phi}(z | x) \\| p(z) \\right) Sample $z \\sim q_{\\phi}(z | x)$ with the reparameterization trick, i.e., $z = \\mu_{\\phi}(x) + \\sigma_{\\phi}(x) \\odot \\epsilon$ with $\\epsilon \\sim \\mathcal{N}(0, I)$. 
Forward decoder, compute reconstruction term x' \\sim p_{\\theta}(x | z), \\quad L_{\\text{recon}} = -C_1 \\| x - x' \\|^2 Maximize ELBO with gradient ascent L_{ELBO} = L_{\\text{recon}} + L_{\\text{reg}} Sampling Discard the encoder. Sample latent from prior z \\sim \\mathcal{N}(0, 1) Forward decoder x' \\sim p_{\\theta}(x | z) Generative Adversarial Network A generative adversarial network (GAN) consists of: A discriminator $D(x)$ A generator $G(z)$ $D$ is a binary classifier that tries to discriminate between a sample from the data distribution and a sample from the generator $G$. $G$ tries to “trick” $D$ by generating samples that are hard for $D$ to distinguish from data. Min-max objective: \\min_G \\max_D V(D, G) = \\mathbb E_{x \\sim p_{data}(x)} [ \\log D(x) ] + \\mathbb E_{z \\sim p(z)} [\\log(1 - D(G(z)))] \\begin{aligned} V(D,G)&=\\mathbb{E}_{x\\sim p_{data}(x)}[\\log D(x)]+\\mathbb{E}_{z\\sim p(z)}[\\log(1-D(G(z)))] \\newline &=\\int p_{data}(x)\\log D(x)dx+\\int p(z)\\log(1-D(G(z)))dz \\newline &=\\int p_{data}(x)\\log D(x)dx+\\int p_{model}(x)\\log(1-D(x))dx \\newline &=\\int(p_{data}(x)\\log D(x)+p_{model}(x)\\log(1-D(x)))dx \\newline \\end{aligned} For a fixed generator $G$, the integrand can be maximized pointwise: $a \\log y + b \\log(1 - y)$ attains its maximum at $y = \\frac{a}{a + b}$, so the optimal discriminator $D^*$ is D^* = \\arg \\max_D V(D, G) = \\frac{p_{data}(x)}{p_{data}(x) + p_{model}(x)} Now consider the min-max objective: \\begin{aligned} \\min_G \\max_D V(D, G) &= \\min_G V(D^*, G) \\newline &= \\min_G \\mathbb{E}_{x \\sim p_{\\text{data}}(x)}[\\log D_G^*(x)] + \\mathbb{E}_{z \\sim p(z)}[\\log (1 - D_G^*(G(z)))] \\newline &= \\min_G \\mathbb{E}_{x \\sim p_{\\text{data}}(x)}[\\log D_G^*(x)] + \\mathbb{E}_{x \\sim p_{\\text{model}}(x)}[\\log (1 - D_G^*(x))] \\newline &= \\min_G \\mathbb{E}_{x \\sim p_{\\text{data}}(x)}\\left[\\log \\frac{p_{\\text{data}}(x)}{p_{\\text{data}}(x) + p_{\\text{model}}(x)}\\right] + \\mathbb{E}_{x \\sim p_{\\text{model}}(x)}\\left[\\log \\frac{p_{\\text{model}}(x)}{p_{\\text{data}}(x) + p_{\\text{model}}(x)}\\right] \\newline &= \\min_G 
\\left[ D_{KL}(p_{\\text{data}}(x) \\| \\frac{p_{\\text{data}}(x) + p_{\\text{model}}(x)}{2}) + D_{KL}(p_{\\text{model}}(x) \\| \\frac{p_{\\text{data}}(x) + p_{\\text{model}}(x)}{2}) - \\log 4 \\right] \\newline &= \\min_G \\left[ 2 \\cdot \\text{JSD}(p_{\\text{data}}(x) \\| p_{\\text{model}}(x)) - \\log 4 \\right] \\newline \\end{aligned} where $\\text{JSD}$ is the Jensen-Shannon divergence: \\text{JSD}(P \\| Q) = \\frac{1}{2} D_{KL}(P \\| M) + \\frac{1}{2} D_{KL}(Q \\| M) \\quad M = \\frac{1}{2} (P + Q) Thus the unique global minimum is attained at p_{\\text{model}}(x) = p_{\\text{data}}(x) For training, we use gradient ascent on the generator with the objective \\max_G \\mathbb E_{z \\sim p(z)} [\\log(D(G(z)))] Diffusion Probabilistic Models Forward Diffusion Process Given a data point sampled from the real distribution, $x_0 \\sim q(x_0)$, define a forward diffusion process in which we add a small amount of Gaussian noise to the sample in $T$ steps, producing noisy samples $x_1, \\ldots, x_T$. The step sizes are controlled by $\\beta_t$. q(x_t | x_{t-1}) = \\mathcal{N} (x_t; \\sqrt{1 - \\beta_t} x_{t-1}, \\beta_t I) \\quad q(x_{1:T} | x_0) = \\prod_{t=1}^{T} q(x_t | x_{t-1}) Let $\\alpha_t = 1 - \\beta_t, \\bar{\\alpha}_t = \\prod_{i=1}^{t} \\alpha_i$. Then we have x_t = \\sqrt{\\alpha_t} x_{t-1} + \\sqrt{1 - \\alpha_t} \\epsilon_{t-1} = \\sqrt{\\alpha_t} (\\sqrt{\\alpha_{t-1}} x_{t-2} + \\sqrt{1 - \\alpha_{t-1}} \\epsilon_{t-2}) + \\sqrt{1 - \\alpha_t} \\epsilon_{t-1} = \\sqrt{\\alpha_t \\alpha_{t-1}} x_{t-2} + \\sqrt{\\alpha_t (1 - \\alpha_{t-1})} \\epsilon_{t-2} + \\sqrt{1 - \\alpha_t} \\epsilon_{t-1} = \\ldots = \\sqrt{\\bar{\\alpha}_t} x_0 + \\sqrt{1 - \\bar{\\alpha}_t} \\epsilon \\Rightarrow q(x_t | x_0) = \\mathcal{N} (x_t; \\sqrt{\\bar{\\alpha}_t} x_0, (1 - \\bar{\\alpha}_t) I) Here each $\\epsilon_i \\sim \\mathcal{N}(0, I)$, and the noise terms merge because the sum of independent Gaussians $\\mathcal{N}(0, \\sigma_1^2 I)$ and $\\mathcal{N}(0, \\sigma_2^2 I)$ is $\\mathcal{N}(0, (\\sigma_1^2 + \\sigma_2^2) I)$. Backward Diffusion Process Core idea: Learn to map noise to data by reversing the time. To reverse the diffusion process, we need to estimate the reverse conditional probabilities $q(x_{t-1} | x_t)$. 
Note that if $\\beta_t$ is small enough, $q(x_{t-1} | x_t)$ will also be Gaussian. Learn a model $p_{\\theta}(x_{t-1} | x_t)$ to approximate the reverse process $q(x_{t-1} | x_t)$. p_{\\theta}(x_{0:T}) = p(x_T) \\prod_{t=1}^{T} p_{\\theta}(x_{t-1} | x_t) \\quad p_{\\theta}(x_{t-1} | x_t) = \\mathcal{N} (x_{t-1}; \\mu_{\\theta}(x_t, t), \\Sigma_{\\theta}(x_t, t)) DPM: Putting Together Training Algorithm: repeat: $x_0 \\sim q(x_0)$; $t \\sim \\text{Uniform}(\\{1, \\ldots, T\\})$; $\\epsilon \\sim \\mathcal{N}(0, I)$; take a gradient descent step on \\nabla_{\\theta} \\| \\epsilon - \\epsilon_{\\theta} (\\sqrt{\\bar{\\alpha}_t} x_0 + \\sqrt{1 - \\bar{\\alpha}_t} \\epsilon, t) \\|^2 until converged. Here $\\epsilon_{\\theta}(x_t, t)$ is the network that predicts the noise $\\epsilon$ contained in $x_t$. Sampling Algorithm: $x_T \\sim \\mathcal{N}(0, I)$; for $t = T, \\ldots, 1$ do: $z \\sim \\mathcal{N}(0, I)$ if $t > 1$ else $z = 0$; x_{t-1} = \\frac{1}{\\sqrt{\\alpha_t}} \\left( x_t - \\frac{1 - \\alpha_t}{\\sqrt{1 - \\bar{\\alpha}_t}} \\epsilon_{\\theta}(x_t, t) \\right) + \\sigma_t z end for; return $x_0$. A common choice for the noise scale is $\\sigma_t^2 = \\beta_t$. Contrastive Representation Learning We want a feature extractor $f$ and a score function $S$, such that S(f(x), f(x^+)) \\gg S(f(x), f(x^-)) Here, $x$ is the reference sample, $x^+$ is a positive sample and $x^-$ is a negative sample. Given a chosen score function $S(\\cdot)$, we aim to learn an encoder function $f$ that yields high scores for positive pairs $(x, x^+)$ and low scores for negative pairs $(x, x^-)$. 
L = - \\mathbb{E}_X \\left[ \\log \\frac{\\exp \\left( S(f(x), f(x^+)) \\right)}{\\exp \\left( S(f(x), f(x^+)) \\right) + \\sum_{j=1}^{N-1} \\exp \\left( S(f(x), f(x_j^-)) \\right)} \\right] This is commonly known as the InfoNCE loss. It gives a lower bound on the mutual information between $f(x)$ and $f(x^+)$: I[f(x): f(x^+)] \\geq \\log(N) - L Therefore, the larger the total number of samples $N$ and the lower the loss $L$, the larger this lower bound, i.e., the more correlated $f(x)$ and $f(x^+)$ are.","link":"/2024/10/18/ML-DL/"}],"tags":[{"name":"其他","slug":"其他","link":"/tags/其他/"},{"name":"dp","slug":"dp","link":"/tags/dp/"},{"name":"构造","slug":"构造","link":"/tags/构造/"},{"name":"集训队作业","slug":"集训队作业","link":"/tags/集训队作业/"},{"name":"莫队","slug":"莫队","link":"/tags/莫队/"},{"name":"数据结构","slug":"数据结构","link":"/tags/数据结构/"},{"name":"dijkstra","slug":"dijkstra","link":"/tags/dijkstra/"},{"name":"图论","slug":"图论","link":"/tags/图论/"},{"name":"搜索","slug":"搜索","link":"/tags/搜索/"},{"name":"BFS","slug":"BFS","link":"/tags/BFS/"},{"name":"01trie","slug":"01trie","link":"/tags/01trie/"},{"name":"数论","slug":"数论","link":"/tags/数论/"},{"name":"数学","slug":"数学","link":"/tags/数学/"},{"name":"差分","slug":"差分","link":"/tags/差分/"},{"name":"前缀和","slug":"前缀和","link":"/tags/前缀和/"},{"name":"单调栈","slug":"单调栈","link":"/tags/单调栈/"},{"name":"贪心","slug":"贪心","link":"/tags/贪心/"},{"name":"数列","slug":"数列","link":"/tags/数列/"},{"name":"矩阵加速","slug":"矩阵加速","link":"/tags/矩阵加速/"},{"name":"矩阵快速幂","slug":"矩阵快速幂","link":"/tags/矩阵快速幂/"},{"name":"二分","slug":"二分","link":"/tags/二分/"},{"name":"分类讨论","slug":"分类讨论","link":"/tags/分类讨论/"},{"name":"枚举","slug":"枚举","link":"/tags/枚举/"},{"name":"根号分治","slug":"根号分治","link":"/tags/根号分治/"},{"name":"字符串","slug":"字符串","link":"/tags/字符串/"},{"name":"线段树优化建图","slug":"线段树优化建图","link":"/tags/线段树优化建图/"},{"name":"线段树","slug":"线段树","link":"/tags/线段树/"},{"name":"动态规划","slug":"动态规划","link":"/tags/动态规划/"},{"name":"dfs树","slug":"dfs树","link":"/tags/dfs树/"},{"name":"分块","slug":"分块","link":"/tags/分块/"},{"name":"树","slug":"树","link":"/tags/树/"},{"name":"扫描线","slug":"扫描
线","link":"/tags/扫描线/"},{"name":"DFS序","slug":"DFS序","link":"/tags/DFS序/"},{"name":"hash","slug":"hash","link":"/tags/hash/"},{"name":"组合数学","slug":"组合数学","link":"/tags/组合数学/"},{"name":"SA","slug":"SA","link":"/tags/SA/"},{"name":"SAM","slug":"SAM","link":"/tags/SAM/"},{"name":"莫比乌斯反演","slug":"莫比乌斯反演","link":"/tags/莫比乌斯反演/"},{"name":"网络流","slug":"网络流","link":"/tags/网络流/"},{"name":"网络流24题","slug":"网络流24题","link":"/tags/网络流24题/"},{"name":"Engineering Thermodynamics","slug":"Engineering-Thermodynamics","link":"/tags/Engineering-Thermodynamics/"},{"name":"费用流","slug":"费用流","link":"/tags/费用流/"},{"name":"模板","slug":"模板","link":"/tags/模板/"},{"name":"Life","slug":"Life","link":"/tags/Life/"},{"name":"模拟退火","slug":"模拟退火","link":"/tags/模拟退火/"},{"name":"随机化","slug":"随机化","link":"/tags/随机化/"},{"name":"Machine Learning","slug":"Machine-Learning","link":"/tags/Machine-Learning/"},{"name":"树套树","slug":"树套树","link":"/tags/树套树/"},{"name":"树链剖分","slug":"树链剖分","link":"/tags/树链剖分/"},{"name":"NOI","slug":"NOI","link":"/tags/NOI/"},{"name":"ST表","slug":"ST表","link":"/tags/ST表/"},{"name":"堆","slug":"堆","link":"/tags/堆/"},{"name":"树形dp","slug":"树形dp","link":"/tags/树形dp/"},{"name":"斜率优化","slug":"斜率优化","link":"/tags/斜率优化/"},{"name":"状压dp","slug":"状压dp","link":"/tags/状压dp/"},{"name":"概率期望","slug":"概率期望","link":"/tags/概率期望/"},{"name":"link-cut tree","slug":"link-cut-tree","link":"/tags/link-cut-tree/"},{"name":"DFS","slug":"DFS","link":"/tags/DFS/"},{"name":"Ynoi","slug":"Ynoi","link":"/tags/Ynoi/"},{"name":"线段树分治","slug":"线段树分治","link":"/tags/线段树分治/"},{"name":"倍增","slug":"倍增","link":"/tags/倍增/"},{"name":"kruskal重构树","slug":"kruskal重构树","link":"/tags/kruskal重构树/"},{"name":"Deep 
Learning","slug":"Deep-Learning","link":"/tags/Deep-Learning/"}],"categories":[{"name":"Contest","slug":"Contest","link":"/categories/Contest/"},{"name":"Solution","slug":"Solution","link":"/categories/Solution/"},{"name":"Algorithm","slug":"Algorithm","link":"/categories/Algorithm/"},{"name":"Learning","slug":"Learning","link":"/categories/Learning/"},{"name":"Notes","slug":"Notes","link":"/categories/Notes/"},{"name":"Template","slug":"Template","link":"/categories/Template/"}]}