Skip to content

Commit

Permalink
uwu
Browse files Browse the repository at this point in the history
  • Loading branch information
Daniel-Liu-c0deb0t committed Mar 25, 2021
1 parent 418cc84 commit 5283dc5
Show file tree
Hide file tree
Showing 5 changed files with 150 additions and 133 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "uwuify"
version = "0.2.0"
version = "0.2.1"
authors = ["c0deb0t <[email protected]>"]
edition = "2018"
license = "MIT"
Expand Down
9 changes: 2 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,13 +79,8 @@ this is on crates.io [here](https://crates.io/crates/uwuify)
2. the library is called `uwuifier` (slightly different from the name of the binary!)
use it like so:
```rust
use uwuifier::{uwuify_sse, round_up16};
let s = "hello world";
let b = s.as_bytes();
let mut temp1 = vec![0u8; round_up16(b.len()) * 16];
let mut temp2 = vec![0u8; round_up16(b.len()) * 16];
let res = uwuify_sse(b, &mut temp1, &mut temp2);
assert_eq!(std::str::from_utf8(res).unwrap(), "hewwo wowwd");
use uwuifier::uwuify_str_sse;
assert_eq!(uwuify_str_sse("hello world"), "hewwo wowwd");
```

documentation is [here](https://docs.rs/uwuify/latest/uwuifier/)
Expand Down
237 changes: 116 additions & 121 deletions README_UWU.txt
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ hey... i-i think i weawwy wuv you. OwO do you w-want a headpat?
```
into
```
hey... i-i think i w-weawwy wuv you. OwO (⑅˘꒳˘) d-do you want a headpat?
hey... i-i think i w-weawwy wuv you. (⑅˘꒳˘) (⑅˘꒳˘) d-do you want a headpat?
```

t-thewe's an [uwu'd](weadme_uwu.txt) v-vewsion of this w-weadme
Expand All @@ -17,7 +17,7 @@ t-thewe's an [uwu'd](weadme_uwu.txt) v-vewsion of this w-weadme
u want wawge amounts of text uwu'd in a smow amount of time

### whewe?
uw computew, o.O if i-it has a wecent x86 cpu (intew, (U ﹏ U) amd) that suppowts sse4.1
uw computew, UwU if i-it has a wecent x86 cpu (intew, (///ˬ///✿) amd) that suppowts sse4.1

### why?
why nyot?
Expand All @@ -29,222 +29,217 @@ twdw: 128-bit s-simd vectowization pwus s-some big bwain awgos
<summawy>cwick fow mowe info</summawy>
<p>

aftew houws of weseawch, σωσ i-i've finawwy undewstood the e-essence of uwu'd t-text
aftew houws of weseawch, ( ͡o ω ͡o ) i-i've finawwy undewstood the e-essence of uwu'd t-text

thewe awe a-a few twansfowmations:
1. ʘwʘ n-nyya-ify (eg. (U ﹏ U) `nawuhodo` -> `nyawuhodo`)
2. (ꈍᴗꈍ) wepwace `w` a-and `w` with `w`
3. -.- stuttew sometimes (`hi` -> `h-hi`)
4. o.O a-add a-a text emoji aftew p-punctuation (`,`, (⑅˘꒳˘) `.`, ow `!`) sometimes
5. ( ͡o ω ͡o ) wepwace some wowds (`smow` -> `smow`, (///ˬ///✿) e-etc.)

these twansfowmation p-passes take advantage of sse4.1 vectow intwinsics to pwocess 16 bytes at once. >w<
f-fow stwing seawching, σωσ i'm using a custom simd i-impwementation of the
[bitap](https://en.wikipedia.owg/wiki/bitap_awgowithm) awgowithm f-fow matching a-against muwtipwe s-stwings. o.O
fow wandom nyumbew genewation, -.- i'm using [xowshift32](https://en.wikipedia.owg/wiki/xowshift). o.O fow most
chawactew-wevew detection w-within simd wegistews, ( ͡o ω ͡o ) i-its aww masking a-and shifting t-to simuwate b-basic state
1. o.O n-nyya-ify (eg. UwU `nawuhodo` -> `nyawuhodo`)
2. (˘ω˘) wepwace `w` a-and `w` with `w`
3. (U ᵕ U❁) stuttew sometimes (`hi` -> `h-hi`)
4. ʘwʘ a-add a-a text emoji aftew p-punctuation (`,`, -.- `.`, ow `!`) sometimes
5. σωσ wepwace some wowds (`smow` -> `smow`, UwU e-etc.)

these twansfowmation p-passes take advantage of sse4.1 vectow intwinsics to pwocess 16 bytes at once. σωσ
f-fow stwing seawching, OwO i'm using a custom simd i-impwementation of the
[bitap](https://en.wikipedia.owg/wiki/bitap_awgowithm) awgowithm f-fow matching a-against muwtipwe s-stwings. OwO
fow wandom nyumbew genewation, o.O i'm using [xowshift32](https://en.wikipedia.owg/wiki/xowshift). (U ﹏ U) fow most
chawactew-wevew detection w-within simd wegistews, σωσ i-its aww masking a-and shifting t-to simuwate b-basic state
machines i-in pawawwew

muwtithweading is suppowted, o.O so u-u can expwoit aww of uw cpu cowes f-fow the nyobwe goaw
muwtithweading is suppowted, ʘwʘ so u-u can expwoit aww of uw cpu cowes f-fow the nyobwe goaw
of uwu-ing m-massive amounts o-of text

utf-8 is handwed ewegantwy by simpwy ignowing nyon-ascii c-chawactews in the input

unfowtunatewy, (U ﹏ U) due t-to both simd pawawwewism and muwtithweading, some wowds may nyot b-be fuwwy uwu'd
if they wewe wucky e-enough to cwoss t-the boundawy o-of a simd vectow o-ow a thwead's buffew. (U ﹏ U)
if they wewe wucky e-enough to cwoss t-the boundawy o-of a simd vectow o-ow a thwead's buffew. (ꈍᴗꈍ)
*they won't e-escape so easiwy n-nyext time*

</p>
</detaiws>

### o-ok i want uwu'd text, (U ﹏ U) how d-do i wun this mysewf?
### o-ok i want uwu'd text, -.- how d-do i wun this mysewf?
#### instaww command-wine t-toow
1. (U ᵕ U❁) instaww w-wust: wun `cuww https://sh.wustup.ws -ssf | s-sh` on unix, (U ᵕ U❁)
1. o.O instaww w-wust: wun `cuww https://sh.wustup.ws -ssf | s-sh` on unix, (⑅˘꒳˘)
ow go [hewe](https://www.wust-wang.owg/toows/instaww) f-fow mowe options
2. w-wun `cawgo instaww uwuify`
3. w-wun `uwuify` which w-wiww wead fwom s-stdin and output to stdout. (U ᵕ U❁) m-make suwe u
3. w-wun `uwuify` which w-wiww wead fwom s-stdin and output to stdout. ( ͡o ω ͡o ) m-make suwe u
pwess ctww + d (unix) o-ow ctww + z and e-entew (windows) a-aftew u type stuff in stdin to s-send an eof

if y-you awe having twoubwe wunning `uwuify`, (///ˬ///✿) m-make suwe y-you have `~/.cawgo/bin`
i-in youw `$path`

i-it i-is possibwe to wead and wwite fwom fiwes by specifying t-the input fiwe and
output f-fiwe, >w< in that owdew. òωó u can use `--hewp` fow mowe info. (˘ω˘) pass in
output f-fiwe, >w< in that owdew. σωσ u can use `--hewp` fow mowe info. o.O pass in
`-v` fow timings

this is on cwates.io [hewe](https://cwates.io/cwates/uwuify)

#### incwude as w-wibwawy
1. ʘwʘ put `uwuify = "^0.2"` u-undew `[dependencies]` in youw `cawgo.tomw` fiwe
2. (U ᵕ U❁) t-the wibwawy i-is cawwed `uwuifiew` (swightwy d-diffewent fwom the nyame of the binawy!)
1. -.- put `uwuify = "^0.2"` u-undew `[dependencies]` in youw `cawgo.tomw` fiwe
2. o.O t-the wibwawy i-is cawwed `uwuifiew` (swightwy d-diffewent fwom the nyame of the binawy!)
use it w-wike so:
```wust
use uwuifiew::{uwuify_sse, (˘ω˘) w-wound_up16};
w-wet s = "hewwo wowwd";
w-wet b = s.as_bytes();
w-wet mut temp1 = v-vec![0u8; wound_up16(b.wen()) * 16];
wet mut temp2 = vec![0u8; wound_up16(b.wen()) * 16];
w-wet wes = uwuify_sse(b, (ꈍᴗꈍ) &mut temp1, (U ᵕ U❁) &mut t-temp2);
a-assewt_eq!(std::stw::fwom_utf8(wes).unwwap(), UwU "hewwo wowwd");
use uwuifiew::uwuify_stw_sse;
assewt_eq!(uwuify_stw_sse("hewwo w-wowwd"), ( ͡o ω ͡o ) "hewwo w-wowwd");
```

documentation i-is [hewe](https://docs.ws/uwuify/watest/uwuifiew/)
documentation is [hewe](https://docs.ws/uwuify/watest/uwuifiew/)

#### b-buiwd fwom this wepo
#### b-buiwd fwom t-this wepo
<detaiws>
<summawy>cwick fow mowe i-info</summawy>
<summawy>cwick f-fow mowe info</summawy>
<p>

1. (U ﹏ U) instaww wust
2. (U ﹏ U) wun `git cwone https://github.com/daniew-wiu-c0deb0t/uwu.git && c-cd uwu`
3. UwU wun `cawgo wun --wewease`
1. o.O instaww wust
2. (U ﹏ U) wun `git cwone https://github.com/daniew-wiu-c0deb0t/uwu.git && cd uwu`
3. (U ﹏ U) wun `cawgo w-wun --wewease`

##### t-testing
1. -.- w-wun `cawgo test`
##### testing
1. wun `cawgo t-test`

##### b-benchmawking
1. wun `mkdiw test && cd test`
1. (U ﹏ U) wun `mkdiw test && cd test`

*wawning: wawge f-fiwes of 100mb a-and 1gb, σωσ wespectivewy*
*wawning: w-wawge fiwes o-of 100mb and 1gb, (U ᵕ U❁) wespectivewy*

2. òωó wun `cuww -ow http://mattmahoney.net/dc/enwik8.zip && u-unzip enwik8.zip`
3. OwO w-wun `cuww -ow http://mattmahoney.net/dc/enwik9.zip && unzip enwik9.zip`
4. (˘ω˘) w-wun `cd .. && ./bench.sh`
2. (U ᵕ U❁) wun `cuww -ow h-http://mattmahoney.net/dc/enwik8.zip && unzip enwik8.zip`
3. (U ᵕ U❁) wun `cuww -ow http://mattmahoney.net/dc/enwik9.zip && u-unzip enwik9.zip`
4. (///ˬ///✿) wun `cd .. && ./bench.sh`

</p>
</detaiws>

### i don't bewieve that this is fast. (ꈍᴗꈍ) i nyeed pwoof!!1! >w<
twdw: can be awmost as f-fast as simpwy copying a fiwe
### i d-don't bewieve that t-this is fast. >w< i-i nyeed pwoof!!1! òωó
twdw: can be awmost as fast a-as simpwy copying a-a fiwe

<detaiws>
<summawy>cwick fow mowe info</summawy>
<p>

waw nyumbews f-fwom wunning `./bench.sh` o-on a-a 2019 macbook pwo w-with eight
intew 2.3 g-ghz i9 cpus and 16 gb of w-wam awe shown bewow. rawr x3 t-the dataset
u-used is the fiwst 100mb and fiwst 1gb of engwish w-wikipedia. (U ᵕ U❁) the s-same
w-waw nyumbews fwom w-wunning `./bench.sh` on a 2019 macbook pwo with e-eight
intew 2.3 ghz i9 cpus and 16 gb of wam awe shown bewow. (˘ω˘) the dataset
used is the fiwst 100mb a-and fiwst 1gb of engwish wikipedia. ʘwʘ the same
dataset is used fow the [huttew p-pwize](http://pwize.huttew1.net/)
f-fow text compwession

```
1 thwead uwu enwik8
time taken: 178 ms
input s-size: 100000000 b-bytes
output size: 115095591 bytes
t-thwoughput: 0.55992 g-gb/s
1 thwead u-uwu enwik8
t-time taken: 178 m-ms
input size: 100000000 bytes
o-output size: 115095591 b-bytes
thwoughput: 0.55992 g-gb/s

2 thwead uwu enwik8
t-time taken: 105 ms
input size: 100000000 bytes
output size: 115095591 bytes
time taken: 105 ms
input s-size: 100000000 b-bytes
output size: 115095591 b-bytes
thwoughput: 0.94701 g-gb/s

4 thwead uwu enwik8
time t-taken: 60 ms
time taken: 60 ms
input size: 100000000 bytes
output size: 115095591 bytes
thwoughput: 1.64883 gb/s
o-output size: 115095591 b-bytes
thwoughput: 1.64883 g-gb/s

8 thwead uwu enwik8
8 thwead u-uwu enwik8
time taken: 47 ms
input size: 100000000 bytes
output size: 115095591 b-bytes
i-input size: 100000000 bytes
output size: 115095591 bytes
thwoughput: 2.12590 gb/s

copy enwik8
c-copy enwik8

w-weaw 0m0.035s
weaw 0m0.035s
usew 0m0.001s
sys 0m0.031s
s-sys 0m0.031s

1 t-thwead uwu enwik9
time taken: 2087 m-ms
1 thwead uwu enwik9
time taken: 2087 ms
input size: 1000000000 bytes
output size: 1149772651 b-bytes
t-thwoughput: 0.47905 gb/s
output size: 1149772651 bytes
thwoughput: 0.47905 gb/s

2 thwead u-uwu enwik9
t-time taken: 992 m-ms
input size: 1000000000 b-bytes
output size: 1149772651 b-bytes
thwoughput: 1.00788 g-gb/s
2 thwead uwu enwik9
time taken: 992 ms
input s-size: 1000000000 bytes
output s-size: 1149772651 bytes
thwoughput: 1.00788 gb/s

4 thwead uwu enwik9
time taken: 695 ms
input size: 1000000000 bytes
output s-size: 1149772651 b-bytes
thwoughput: 1.43854 gb/s
4 t-thwead uwu enwik9
time taken: 695 m-ms
input size: 1000000000 b-bytes
output size: 1149772651 bytes
t-thwoughput: 1.43854 gb/s

8 thwead uwu enwik9
time taken: 436 ms
input s-size: 1000000000 b-bytes
8 t-thwead uwu enwik9
t-time taken: 436 m-ms
input size: 1000000000 b-bytes
output size: 1149772651 b-bytes
thwoughput: 2.29214 gb/s
t-thwoughput: 2.29214 gb/s

copy enwik9

weaw 0m0.387s
u-usew 0m0.001s
usew 0m0.001s
sys 0m0.341s
```

*//todo: c-compawe with othew toows*
*//todo: compawe w-with othew t-toows*

</p>
</detaiws>

### why isn't this weadme uwu'd?
s-so its weadabwe
so its weadabwe

i-if u happen to f-find uwu'd text mowe weadabwe, σωσ thewe's awways an [uwu'd](weadme_uwu.txt) vewsion
if u-u happen to find u-uwu'd text mowe weadabwe, (U ᵕ U❁) thewe's a-awways an [uwu'd](weadme_uwu.txt) vewsion

### o-ok but why a-awen't thewe a-any settings i can c-change?!1?!!1
fwee wiww is an iwwusion
### ok but why awen't t-thewe any settings i can change?!1?!!1
f-fwee wiww is an iwwusion

### wtf this is so unpwofessionaw how a-awe u gonna get h-hiwed at faang nyow?! ( ͡o ω ͡o )
don't wowwy, (U ᵕ U❁) i-i've got u covewed
### wtf this is so unpwofessionaw h-how awe u g-gonna get hiwed a-at faang now?! (˘ω˘)
don't wowwy, (ꈍᴗꈍ) i've got u covewed

#### t-titwe: uwu is aww you n-nyeed
#### titwe: uwu is aww you nyeed

#### abstwact
#### a-abstwact

w-wecent a-advances in computing have made stwides in pawawwewization, o.O w-whethew
a-at a fine-gwained w-wevew with s-simd instwuctions, (˘ω˘) o-ow at a high wevew with muwtipwe
cpu cowes. ( ͡o ω ͡o ) t-taking advantage o-of these advances, o.O w-we expwowe how the usefuw
task of pewfowming a-an uwu twansfowmation o-on pwain t-text can be scawed u-up to wawge
input d-datasets. (U ᵕ U❁) ouw contwibutions i-in this papew awe t-thweefowd: fiwst, (ꈍᴗꈍ) we pwesent, (///ˬ///✿)
t-to ouw knowwedge, -.- the fiwst wigowous d-definition of uwu'd text. -.- s-second, ( ͡o ω ͡o ) we show
ouw nyovew awgowithms f-fow uwu-ing text, o.O expwoiting v-vectowization and
muwtithweading featuwes that a-awe avaiwabwe o-on modewn cpus. o.O finawwy, (U ﹏ U) we pwovide
wigowous expewimentaw w-wesuwts that show how ouw impwementation couwd be the
"fastest in the west." in ouw benchmawks, σωσ w-we obsewve t-that ouw impwementation
w-was a-awmost as a fast a-as a simpwe fiwe copy, ( ͡o ω ͡o ) which is entiwewy io-bound. rawr x3
w-we bewieve o-ouw wowk has potentiaw appwications i-in vawious domains, UwU fwom data
a-augmentation and text pwepwocessing f-fow nyatuwaw wanguage pwocessing, (///ˬ///✿) t-to
giving a-authows the abiwity t-to convey potentiawwy whowesome o-ow kawaii~ m-meme messages
with m-minimaw time a-and effowt. òωó
w-wecent advances i-in computing have m-made stwides in pawawwewization, (U ᵕ U❁) whethew
at a fine-gwained wevew with simd instwuctions, UwU o-ow a-at a high wevew with muwtipwe
cpu c-cowes. (U ﹏ U) taking a-advantage of these advances, (U ﹏ U) we e-expwowe how the u-usefuw
task of pewfowming a-an uwu twansfowmation on pwain text can b-be scawed up to w-wawge
input datasets. UwU o-ouw contwibutions i-in this p-papew awe thweefowd: fiwst, -.- we pwesent,
to ouw k-knowwedge, σωσ the f-fiwst wigowous definition o-of uwu'd text. òωó second, we show
ouw nyovew a-awgowithms fow u-uwu-ing text, OwO e-expwoiting vectowization a-and
muwtithweading f-featuwes that awe avaiwabwe o-on modewn c-cpus. (˘ω˘) finawwy, we pwovide
wigowous e-expewimentaw wesuwts that s-show how ouw impwementation couwd b-be the
"fastest in the west." i-in ouw benchmawks, (ꈍᴗꈍ) we obsewve that o-ouw impwementation
was awmost as a fast as a s-simpwe fiwe copy, >w< w-which is entiwewy io-bound. rawr x3
we bewieve ouw wowk h-has potentiaw appwications in vawious domains, (U ᵕ U❁) fwom data
augmentation and text pwepwocessing fow n-nyatuwaw wanguage p-pwocessing, σωσ t-to
giving authows t-the abiwity to c-convey potentiawwy whowesome ow kawaii~ meme messages
w-with minimaw t-time and effowt. ( ͡o ω ͡o )

*// todo: wwite papew*
*// todo: w-wwite papew*

*// todo: wwite mowe about machine weawning s-so i get funding*
*// todo: wwite mowe a-about machine weawning so i g-get funding*

### ok i need to use this fow something and i nyeed the wicense info
### ok i nyeed to u-use this fow something a-and i nyeed t-the wicense info
mit wicense

### ok but i have an issue with this ow a suggestion ow a question nyot answewed h-hewe
open an issue, (˘ω˘) be nyice
### o-ok but i have a-an issue with t-this ow a suggestion o-ow a question nyot answewed hewe
open an issue, (U ᵕ U❁) be nyice

### p-pwojects u-using this
* [uwu-tway](https://github.com/owawen15/uwu-tway): a-a tway icon to u-uwuify youw text
### pwojects using t-this
* [uwu-tway](https://github.com/owawen15/uwu-tway): a tway icon to uwuify youw text
* [uwubot](https://github.com/yaahc/uwubot): discowd bot fow uwuifying text
* w-wet me know if you make a pwoject with uwuify! o.O i appweciate u aww! ʘwʘ
* wet me know if you make a pwoject with uwuify! o.O i appweciate u aww! (˘ω˘)

### wefewences
* h-https://honk.moe/toows/owo.htmw
### w-wefewences
* https://honk.moe/toows/owo.htmw
* https://github.com/iamwifki/uwuizew
* https://github.com/deadshot465/owoify_ws
* h-https://github.com/deadshot465/owoify_ws
* h-https://kawaii~kaomoji.com/chawactews/uwu/
* h-https://kawaii~kaomoji.com/chawactews/owo/
* https://kawaii~kaomoji.com/chawactews/fwowew-giww/
* and many mowe; wet me know if i missed anything
* https://kawaii~kaomoji.com/chawactews/owo/
* h-https://kawaii~kaomoji.com/chawactews/fwowew-giww/
* a-and many mowe; wet me know if i missed anything
Loading

0 comments on commit 5283dc5

Please sign in to comment.