RK.Feng's Personal Site

MongoDB Study Notes

1. MongoDB document notes

1.1 BSON types

- ObjectId: fast to generate, ordered, and time-based
- String
- Timestamps
- Date

1.2 Document type

- Fields have length limits, e.g. a field name may not exceed 128, and so on.
- Dot notation: <array>.<index>, <embedded document>.<field>
- A single document is limited in size: 16MB.

1.3 Aggregation

MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.

Aggregation stages:

- $match: filter, { $match: { name: "Joe Schmoe" } }
- $unwind: flatten an array field into one document per element, { $unwind: "$resultingArray" }
- $project: projection, { "$project": { "author": 1, "_id": 0 } } (extracts only author)
- $redact: restrict document content, { $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } }
- $skip: skip documents, { $skip: 5 }
- $lookup: cross-collection lookup, { $lookup: { from: "otherCollection", as: "resultingArray", localField: "x", foreignField: "y" } }

Note: for optimizations that apply when stages are combined, see the documentation.

Aggregation limits:

- Result Size Restrictions: each document <= 16MB
- Memory Restrictions: Pipeline stages have a limit of 100 megabytes of RAM; The $graphLookup stage must stay within the 100 megabyte memory limit.

todo:

- Aggregation with the zip code data set (latitude/longitude)
- Aggregation with User Preference Data, using a user-info collection as the example:
  - get the names of all employees
  - return employee names ordered by join date
  - count the new joiners for each month
  - get the five most popular hobbies

1.4 Queries

Conditional queries

- $or: cursor = db.inventory.find({"$or": [{"status": "A"}, {"qty": {"$lt": 30}}]})
- $and

Querying arrays

- Match an Array:
  - db.inventory.find({"tags": ["red", "blank"]}) # tags == ["red", "blank"], exact match including order
  - db.inventory.find({"tags": {"$all": ["red", "blank"]}}) # tags contains both "red" and "blank"
- Query an Array with Compound Filter Conditions on the Array Elements: db.inventory.find({"dim_cm": {"$gt": 15, "$lt": 20}}) # matches if one element satisfies 15 < x < 20, or if one element is > 15 and another is < 20
- Query for an Array Element that Meets Multiple Criteria: db.inventory.find({"dim_cm": {"$elemMatch": {"$gt": 22, "$lt": 30}}}) # at least one element satisfies 22 < x < 30
- Query for an Element by the Array Index Position: db.inventory.find({"dim_cm.1": {"$gt": 25}})
- Query an Array by Array Length: db.inventory.find({"tags": {"$size": 3}})

Projection:

- db.inventory.find({"status": "A"}, {"item": 1, "status": 1}) means SELECT _id, item, status from inventory WHERE status = "A"
- db.inventory.find({"status": "A"}, {"item": 1, "status": 1, "_id": 0}) means SELECT item, status from inventory WHERE status = "A"
- db.inventory.find({"status": "A"}, {"status": 0, "instock": 0}) returns all fields except status and instock

Other

- db.inventory.find({"item": None}): item is None (null) or the field does not exist
- db.inventory.find({"item": {"$type": 10}}): type check; matches only documents whose item value is None (BSON type 10 is Null)
- db.inventory.find({"item": {"$exists": False}}): the field does not exist

1.5 Capped collections

- Check whether the current collection is capped: db.collection.isCapped()
- Convert to a capped collection (existing data may be lost): db.runCommand({"convertToCapped":"my_coll",size:2000000000, max:500000}). max is the number of documents, size is the size of the content.

2. Aggregation operations

2.1 Get the set of array elements

keywords: group, unwind, aggregate

Example. Given the following data:

    {"_id": "1", "tags": ["a", "b"]}
    {"_id": "2", "tags": ["a", "b", "c"]}
    {"_id": "3", "tags": []}
    {"_id": "4", "tags": ["c", "d"]}

How do we get the set of distinct tags elements? Method:

    db.test.aggregate([
        {"$unwind": "$tags"},
        {"$group": {"_id": "$tags"}},
    ])

2.2 Get all values of a field

    db.getCollection('<collection>').aggregate([
        { "$group": { _id: null, "city": { "$addToSet": "$city" } } }
    ])

3. Update operations

3.1 Modify the value of an array element

keywords: update_many

Example. Given the following data:

    {"_id": "1", "tags": ["a", "b"]}
    {"_id": "2", "tags": ["a", "b", "c"]}
    {"_id": "3", "tags": []}
    {"_id": "4", "tags": ["c", "d"]}

How do we change "a" in tags to "A"? Method:

    self.db["test"].update_many(
        filter={"tags": "a"},
        update={"$set": {'tags.$': "A"}},
        upsert=False,
    )
    # a single operation only modifies one value per document
    # if tags contains several "a" values, run the code above repeatedly

3.2 Remove array elements

keywords: update_many

Example. Given the following data:

    {"_id": "1", "tags": ["a", "b"]}
    {"_id": "2", "tags": ["a", "b", "c"]}
    {"_id": "3", "tags": []}
    {"_id": "4", "tags": ["c", "d"]}
    {"_id": "5", "tags": ["c", "d", "a", "a"]}

How do we remove every "a" from tags?
Method:

    tag_list = ["a"]
    self.db["test"].update_many(
        filter={"tags": {"$in": tag_list}},
        update={"$pull": {'tags': {"$in": tag_list}}},
        upsert=False,
    )
    # afterwards, both "a" values in the tags of the _id = "5" document are removed

3.3 Array updates, advanced

Assume the following data:

    collection.insert_many([
        {"name": "a1", "tags": [{"weight": 10}, {"weight": 20}]},
        {"name": "a2", "tags": [{"weight": 11}, {"weight": 21}]},
        {"name": "a3", "tags": [{"weight": 10}, {"weight": 25}]},
    ])

3.3.1 Change the weight of tags with weight=10 to 20

    # a single run only updates the first matching element in each document
    collection.update_many(
        {"tags.weight": 10},
        {"$set": {"tags.$.weight": 20}},
        upsert=False,
    )

To modify every matching value, repeat until nothing matches:

    while True:
        result = collection.update_many(
            {"tags.weight": 10},
            {"$set": {"tags.$.weight": 20}},
            upsert=False,
        )
        if result.matched_count == 0:
            break

3.3.2 Change the weight of tags with weight!=10 to 10

    # a single run only updates the first matching element in each document
    collection.update_many(
        {"tags": {"$elemMatch": {"weight": {"$ne": 10}}}},
        {"$set": {"tags.$.weight": 10}},
        upsert=False,
    )

3.3.3 Change the weight of tags with weight!=10 or weight!=20 to 10

    # a single run only updates the first matching element in each document
    # note: "!= 10 or != 20" is true for every weight; use {"$nin": [10, 20]}
    # if the intent is "neither 10 nor 20"
    collection.update_many(
        {"tags": {"$elemMatch": {"$or": [{"weight": {"$ne": 10}}, {"weight": {"$ne": 20}}]}}},
        {"$set": {"tags.$.weight": 10}},
        upsert=False,
    )

3.4 Set only if the document does not exist

MongoDB provides the $setOnInsert operator, which sets fields only when the upsert inserts a new document.

    db.collection.update(
        <query>,
        { $setOnInsert: { <field1>: <value1>, ... } },
        { upsert: true }
    )

4. Tools

4.1 Export a collection

    mongoexport --uri "mongodb://<username>:<password>@<host1>:<port1>,<host2>:<port2>/<database>?replicaSet=mgset-123456&authSource=admin" --collection <collection> --fields <field1>,<field2> --out <outfile>

Key points: quote the uri, and put admin in authSource.

Or:

    mongoexport -h <host> -d <database> --collection <collection> --fields <field1>,<field2> --out <outfile>

4.2 Import a collection

    mongoimport --uri "mongodb://<username>:<password>@<host1>:<port1>,<host2>:<port2>/<database>?replicaSet=mgset-123456&authSource=admin" --collection <collection> --fields <field1>,<field2> <datafile>

Or:

    mongoimport -h <host> -d <database> --collection <collection> --fields <field1>,<field2> <datafile>

4.3 Index operations

Create an index:

    db.getCollection('<collection>').createIndex( { "age": 1}, {background: true, name:"_age_"} )

4.4 Incompatible mongo versions

After MongoDB is upgraded to 4.0, its tools such as mongodump, mongoimport, and mongoexport must be upgraded to 4.0 as well. To avoid cluttering the host's software environment by installing them, the tools can be run through docker.

    # download images
    docker pull mongo:4.0
    # mkdir working dir
    mkdir -p dodo && chmod 777 dodo/ -R && cd dodo/
    # run mongo tools
    docker run --rm -v $(pwd):/workdir/ -w /workdir/ mongo:4.0 mongoimport --uri "mongodb://<username>:<password>@<host1>:<port1>,<host2>:<port2>/<database>?replicaSet=mgset-123456&authSource=admin" --collection <collection> --fields <field1>,<field2> <datafile>

5. pymongo

5.1 Writing regular expressions

In Robo 3T, a regex match on _id looks like this:

    db.getCollection('col').find({_id: /version1/})

The same search needs to be run from pymongo. In Robo 3T it can be written equivalently with $regex. Note that the slashes of the /version1/ literal are delimiters and must not be copied into the pattern string:

    db.getCollection('col').find({"_id": {"$regex": "version1"}})

So the pymongo version is:

    collection.find({"_id": {"$regex": "version1"}})
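One detail in 5.1 is easy to miss: in the shell literal /version1/ the slashes are only delimiters, whereas in a string pattern every character is part of the regex. The sketch below uses Python's re module as a stand-in for MongoDB's regex engine (an assumption that should hold for a simple literal pattern like this one); the sample _id values are made up for illustration.

```python
import re

# Hypothetical _id values, for illustration only.
ids = ["doc_version1_a", "doc_version2_b", "path/version1/x"]


def matching(pattern, values):
    """Return the values in which the pattern occurs anywhere (unanchored search)."""
    return [value for value in values if re.search(pattern, value)]


# Pattern without slashes: behaves like the shell literal /version1/.
print(matching("version1", ids))

# Pattern with slashes: the slashes must literally appear in the value.
print(matching("/version1/", ids))
```

With the slashes kept in the string, only values that actually contain "/version1/" match, which is rarely what the shell literal meant.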

2019/03/18 Tech

Python Async Server Test

1. Install ab for load testing

1.1 Prepare the environment

Installing ab on macOS, for reference:

    curl -OL http://ftpmirror.gnu.org/libtool/libtool-2.4.2.tar.gz
    tar -xzf libtool-2.4.2.tar.gz
    cd libtool-2.4.2
    ./configure && make && sudo make install
    # brew install 'https://raw.github.com/simonair/homebrew-dupes/e5177ef4fc82ae5246842e5a544124722c9e975b/ab.rb'
    # brew test ab
    curl -O https://archive.apache.org/dist/httpd/httpd-2.4.2.tar.bz2
    # the archive is bzip2-compressed, so use -j rather than -z
    tar -xjvf httpd-2.4.2.tar.bz2
    cd httpd-2.4.2
    ./configure && make && make install

1.2 Client-side test commands

    ab -n 10 -c 1 http://localhost:8000/
    ab -n 10 -c 2 http://localhost:8000/
    ab -n 10 -c 5 http://localhost:8000/
    ab -n 10 -c 10 http://localhost:8000/
    ab -n 100 -c 10 http://localhost:8000/
    ab -n 100 -c 20 http://localhost:8000/
    ab -n 100 -c 50 http://localhost:8000/
    ab -n 100 -c 100 http://localhost:8000/

2. tornado test

2.1 tornado server code

    import time
    from concurrent.futures import ThreadPoolExecutor

    import tornado.gen
    import tornado.ioloop
    import tornado.web
    from tornado.concurrent import run_on_executor
    from tornado.web import RequestHandler


    class SimpleAsyncServer(RequestHandler):
        def __init__(self, application, request, **kwargs):
            super(SimpleAsyncServer, self).__init__(application, request, **kwargs)
            # run_on_executor looks up self.executor; a new pool is created per request
            self.executor = ThreadPoolExecutor(10)

        @tornado.gen.coroutine
        def get(self):
            print("on get...")
            result = yield self._do_something()
            self.write(result)

        @run_on_executor
        def _do_something(self):
            """Simulate a slow, blocking operation."""
            time.sleep(5)
            return {"msg": "OK"}


    def make_app():
        return tornado.web.Application([
            (r"/", SimpleAsyncServer),
        ])


    if __name__ == "__main__":
        app = make_app()
        app.listen(8000)
        tornado.ioloop.IOLoop.current().start()

2.2 Test results

| Total requests | Concurrency | ab total time (s) |
|---|---|---|
| 10 | 1 | 50.0 |
| 10 | 2 | 30.0 |
| 10 | 5 | 15.0 |
| 10 | 10 | 10.0 |
| 100 | 10 | 55.1 |
| 100 | 20 | 30.0 |
| 100 | 50 | 15.1 |
| 100 | 100 | 10.1 |
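As a rough sanity check on the numbers in 2.2, the measured times can be compared with an idealized lower bound. This is only a sketch under stated assumptions: every request takes exactly the 5 seconds of the simulated sleep, and ab keeps exactly c requests in flight, so the requests are served in waves. The function name ideal_total_time is made up for this illustration; the measured values are copied from the results in 2.2.

```python
import math

DELAY_S = 5.0  # matches time.sleep(5) in the handler

# (total requests, concurrency, measured ab time in seconds), copied from section 2.2
MEASURED = [
    (10, 1, 50.0), (10, 2, 30.0), (10, 5, 15.0), (10, 10, 10.0),
    (100, 10, 55.1), (100, 20, 30.0), (100, 50, 15.1), (100, 100, 10.1),
]


def ideal_total_time(requests, concurrency, delay=DELAY_S):
    """Lower bound: requests are served in waves of `concurrency`, each taking `delay`."""
    return math.ceil(requests / concurrency) * delay


for requests, concurrency, measured in MEASURED:
    ideal = ideal_total_time(requests, concurrency)
    print(f"n={requests:<4} c={concurrency:<4} measured={measured:>5.1f}s ideal={ideal:>5.1f}s")
```

Any gap between the two columns is overhead that the simple wave model does not capture.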

2019/03/15 Tech

No module named 'Crypto' on Mac

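Problem

While setting up the Python environment for a new project on a Mac, running the code failed with:

    ...
    from Crypto.Cipher import AES
    ImportError: No module named Crypto.Cipher

Cause and fix

It turns out that several packages provide the Crypto module on a Mac: crypto, pycrypto, pycryptodome, and others. Installing them side by side causes conflicts. The fix is to keep only one of them; pycryptodome is the recommended choice.

List all crypto packages to confirm the cause:

    python -m pip list | grep rypto

Keep only one package:

    python -m pip uninstall crypto
    python -m pip uninstall pycrypto
    python -m pip uninstall pycryptodome
    python -m pip install pycryptodome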
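The `pip list | grep rypto` check can also be done from inside the interpreter that actually runs the project, which helps when several Pythons are installed. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata); the helper names crypto_like and installed_names are made up for this illustration. Note that it will also list unrelated packages such as cryptography, which do not provide the Crypto module.

```python
from importlib import metadata


def crypto_like(names):
    """Return the sorted, de-duplicated names that contain 'crypto' (case-insensitive)."""
    return sorted({name for name in names if name and "crypto" in name.lower()})


def installed_names():
    """Names of all distributions visible to the current interpreter."""
    return [dist.metadata["Name"] for dist in metadata.distributions()]


if __name__ == "__main__":
    found = crypto_like(installed_names())
    print("crypto-like packages:", found)
    if len(found) > 1:
        print("more than one candidate installed; check for conflicting Crypto providers")
```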

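2019/03/12 Tech

Installing python3.5 on mac

A python3.5 environment was needed on mac 10.13.6, which ships with python2.7. Installing python3.5 with brew failed. The final solution: on the https://www.python.org/downloads/mac-osx/ page, pick a python3.5 release that has a prebuilt installer and install it directly.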

2019/03/11 Tech

numpy C extension error under py3.6

2018/12/14 Tech

MTCNN Reading Notes

2018/12/13 Tech

Shell Study Notes

1. Feed the result of one command into another

ssh example:

    # read the remote server's ip and ssh into that server
    ssh foo@$(cat /data/ip.result)

docker example:

    # remove all images whose repository name is redis
    docker image rm $(docker image ls -q redis)

Note that single and double quotes behave differently: variables are not expanded inside single quotes, so double quotes are required. To illustrate, py_cmd.py simulates a simple echo command:

    import argparse


    def echo_name(argv=None):
        parser = argparse.ArgumentParser(description='demo')
        # name
        parser.add_argument('--name', type=str, help='name')
        args = parser.parse_args(args=argv)
        print("name_{}".format(args.name))


    if __name__ == '__main__':
        echo_name()

Example commands:

    name="M 1 M"
    c1=$(python py_cmd.py --name='${name}')
    echo "case1 ${c1}"
    # output: case1 name_${name}
    c2=$(python py_cmd.py --name="${name}")
    echo "case2 ${c2}"
    # output: case2 name_M 1 M

2. Run echo with sudo

    sudo sh -c "echo '{ \"registry-mirrors\": [\"https://registry.docker-cn.com\"] }' >> /etc/docker/daemon.json"

3. Empty a file

    echo -n > ~/xx.conf

4. Common commands

    # show disk usage
    df -lh
    # show the space used by the current directory
    du -sh ./

5. ssh commands

    # upload a file
    scp ./local.file ubuntu@host:/remote/remote.file
    # download a file
    scp ubuntu@host:/remote/remote.file ./local.file

6. Get the local ip

Get the local ip:

    ifconfig | sed -n '/inet addr/s/^[^:]*:\([0-9.]\{7,15\}\) .*/\1/p'

Get the ip of the current virtual machine:

    ifconfig | sed -n '/inet addr/s/^[^:]*:\([0-9.]\{7,15\}\) .*/\1/p' | grep 192.168

7. Set the screen brightness to 0

    [[ "$(cat /sys/class/backlight/intel_backlight/brightness)" -ne "0" ]] && (echo 0 | sudo tee /sys/class/backlight/intel_backlight/brightness)

8. A complex command example

Taken from a docker configuration:

    ARG CHROME_VERSION="google-chrome-stable"
    ARG CHROME_DRIVER_VERSION
    RUN apt-get update -qqy \
        # install chrome
        && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
        && echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list \
        && apt-get update -qqy \
        && apt-get -qqy install ${CHROME_VERSION:-google-chrome-stable} \
        && rm /etc/apt/sources.list.d/google-chrome.list \
        # install chrome drive
        && if [ -z "$CHROME_DRIVER_VERSION" ]; \
           then CHROME_MAJOR_VERSION=$(google-chrome --version | sed -E "s/.* ([0-9]+)(\.[0-9]+){3}.*/\1/") \
             && CHROME_DRIVER_VERSION=$(wget --no-verbose -O - "https://chromedriver.storage.googleapis.com/LATEST_RELEASE_${CHROME_MAJOR_VERSION}"); \
           fi \
        && echo "Using chromedriver version: "$CHROME_DRIVER_VERSION \
        && wget --no-verbose -O /tmp/chromedriver_linux64.zip https://chromedriver.storage.googleapis.com/$CHROME_DRIVER_VERSION/chromedriver_linux64.zip \
        && rm -rf /opt/selenium/chromedriver \
        && unzip /tmp/chromedriver_linux64.zip -d /opt/selenium \
        && rm /tmp/chromedriver_linux64.zip \
        && mv /opt/selenium/chromedriver /opt/selenium/chromedriver-$CHROME_DRIVER_VERSION \
        && chmod 755 /opt/selenium/chromedriver-$CHROME_DRIVER_VERSION \
        && ln -fs /opt/selenium/chromedriver-$CHROME_DRIVER_VERSION /usr/bin/chromedriver \
        # install fonts
        && apt-get -qqy install ttf-wqy-zenhei ttf-wqy-microhei fonts-droid-fallback fonts-arphic-ukai fonts-arphic-uming \
        # install selenium
        && pip install --upgrade selenium \
        && rm -rf /root/.cache/pip/* \
        && rm -rf /var/lib/apt/lists/* /var/cache/apt/*

9. Setting environment variables for a single command

    LDFLAGS=-L/usr/local/opt/openssl/lib https_proxy="" http_proxy="" pip install mysqlclient

10. Install BBR

    wget --no-check-certificate https://github.com/teddysun/across/raw/master/bbr.sh && sudo chmod +x bbr.sh && sudo ./bbr.sh

11. tar: split archives and extract them

    dd if=/dev/zero of=test.log bs=1M count=1000
    # compress
    tar zcf - test.log | split -b 100m - test.tar.gz.
    # extract
    mkdir -p test_tmp
    mv test.tar.gz.* test_tmp/
    cd test_tmp/
    cat test.tar.gz.* | tar zx

12. Install tzdata on Debian-based systems without the timezone prompt

    DEBIAN_FRONTEND=noninteractive apt-get install -y tzdata

13. Copying files with conditions

    mkdir -p /opt/python_libs
    # copy folder
    cp -r ~/code/my_libs /opt/python_libs/
    # copy folder if exists
    [ -d "/opt/code/new_libs" ] && cp -r /opt/code/new_libs /opt/python_libs/

14. Downloading files

    # curl
    curl -o out.file -sfL http:xxx.com
    # wget
    wget -qO out.file http:xxx.com

15. Kill processes

    for pid in `ps -ef | grep python3 | grep "server.py" | grep -v grep | awk '{print $2;}'`
    do
        kill -9 $pid
    done

16. grep + awk + xargs

In practice, data files need to be processed in batches. After a file is processed, the result is written to a result file named after the original file plus .upload. The operation log is kept in simple.log, in the following format:

    2020-03-21 10:00:00 output to file: /tmp/abc_1.log.upload
    2020-03-21 10:00:00 xxxx
    2020-03-21 10:10:00 output to file: /tmp/abc_2.log.upload
    2020-03-21 10:10:00 xxx1xxx
    2020-03-21 10:20:00 output to file: /tmp/abc_4.log.upload
    2020-03-21 10:20:00 ....

The original files that have already been processed now need to be deleted. This can be done with grep + awk (a for loop is used here in place of xargs):

    for filename in $(cat simple.log | grep "output to file" | awk '{print $NF}' | awk -F "/" '{print $NF}' ); do
        tmp_file="/tmp/${filename%%.upload}"
        if [ -f "$tmp_file" ]; then
            echo "$tmp_file can remove"
            rm "$tmp_file"
        fi
    done

17. Delete log files last modified more than one day ago

    #!/bin/bash
    count=0
    while IFS= read -r -d '' file
    do
        if [[ $file =~ "tmp_" ]]; then
            let count++
            echo "rm file $file"
            rm "$file"
        fi
    done < <(find /tmp/ -maxdepth 1 -mtime +1 -print0)
    echo "remove $count files from /tmp/ "

18. Delete directories named by date

The temporary log directory contains the following subdirectories:

    /var/log/result_20200102
    /var/log/result_20200103
    /var/log/result_20200104
    ....
    /var/log/result_20200324

The subdirectories older than yesterday now need to be deleted. The implementation:

    #!/bin/bash
    # delete temporary files
    today_str=$(date +%Y%m%d)
    yesterday_str=$(date -d -1day +%Y%m%d)
    for filename in /var/log/result_2020*; do
        if [[ $filename =~ $today_str ]]; then
            echo "$filename today"
        else
            if [[ $filename =~ $yesterday_str ]]; then
                echo "$filename yesterday"
            else
                echo "$filename remove"
                rm -r "$filename"
            fi
        fi
    done

19. Text replacement

Linux version:

    # find all py files and replace
    # print("data is {}".format(data))
    # with
    # print("数据 是 {}".format(data))
    find . -name "*.py" -exec sed -i s/print\(\"data\ is\ \{/print\(\"数据\ 是\ \{/g {} +

The mac version differs slightly:

    # find all py files and replace
    # print("data is {}".format(data))
    # with
    # print("数据 是 {}".format(data))
    find . -name "*.py" -exec sed -i '' s/print\(\"data\ is\ \{/print\(\"数据\ 是\ \{/g {} +

20. pushd and popd

    cd ~
    # cd /tmp and do something
    pushd /tmp/
    echo "abc" > abc.txt
    tar -czvf abc.txt.tar.gz abc.txt
    popd
    # back in ~
    ls ./

21. Get a file's basename

    FILE="/home/vivek/lighttpd.tar.gz"
    basename "$FILE"
    # output: lighttpd.tar.gz
    f="$(basename -- "$FILE")"
    echo "$f"
    # output: lighttpd.tar.gz

22. Multi-line text

    config_file="config"
    # code lines
    cat > ${config_file} <<EOF
    a=b
    b=c
    x=y
    EOF

23. Mount a swap file on linux

    # enable swap
    sudo dd if=/dev/zero of=/swapfile bs=1M count=8192 && \
    sudo mkswap /swapfile && \
    sudo chmod 600 /swapfile && \
    sudo swapon /swapfile && \
    sudo sh -c "echo '/swapfile swap swap defaults 0 0' >> /etc/fstab " && \
    sudo sh -c "echo 10 >/proc/sys/vm/swappiness"
    # disable swap
    sudo swapoff /swapfile && \
    sudo rm /swapfile

24. grep on files containing binary data

The log file file_with_bin.log contains binary data. Filtering it directly with grep, cat file_with_bin.log | grep "2020-07-18 01", fails with Binary file (standard input) matches. The fix is to use the -a option:

    cat file_with_bin.log | grep -a -20 "2020-07-18 01"

25. Start a GUI program from crontab/ssh

Reference: "Running GUI programs with crontab". For example, the script '/start_selenium.sh' starts selenium with headless=False and launches a chrome browser on the ubuntu desktop to run its tasks. After connecting from your own machine over ssh to the ubuntu host that holds the script, the script can be started with: ` export DISPLAY=:0 && ./start_selenium.sh `

26. Answer yes by default

    wget http://gosspublic.alicdn.com/ossfs/ossfs_1.80.6_ubuntu18.04_amd64.deb && (yes | sudo gdebi ossfs_1.80.6_ubuntu18.04_amd64.deb )

27. Viewing crontab error logs

Normally, when a script run by crontab fails, /var/log/syslog shows nothing useful. In that case postfix can be used to receive crontab's execution logs. For the underlying mechanism, see "The mysterious crontab failure: not running, no errors, no logs". In short:

    # step 1: install postfix and choose local only
    sudo apt install -y postfix
    # step 2: start postfix
    sudo service postfix start
    # step 3: read crontab's execution log
    tail -f /var/mail/root

28. Get all ips of the host

```shell script
all_ips=("localhost" "127.0.0.1" "::1" "0.0.0.0")
echo "${all_ips[@]}"
for ip in $(ifconfig | grep inet | awk '{print $2}' | grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b")
do
    echo "${ip}"
    all_ips+=("${ip}")
done
echo "${all_ips[@]}"
sorted_unique_ips=($(echo "${all_ips[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))
echo "${sorted_unique_ips[@]}"
```
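The same extraction can be sketched in Python, which makes the parsing step easy to test without a live ifconfig. This is an illustration under assumptions, not a port of the script: the function name unique_ips and the SAMPLE text are made up, and the function only parses text that has already been captured (for example with subprocess), mirroring the grep inet, second field, IPv4 pattern steps.

```python
import re

IPV4 = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")

# Hypothetical ifconfig output, for illustration only.
SAMPLE = """\
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.5  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::1  prefixlen 64  scopeid 0x20<link>
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
"""


def unique_ips(ifconfig_output, extra=("localhost", "127.0.0.1", "::1", "0.0.0.0")):
    """Collect IPv4 addresses from the second field of every 'inet' line, plus `extra`."""
    found = set(extra)
    for line in ifconfig_output.splitlines():
        if "inet" not in line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # Handles both "inet 1.2.3.4" and the older "inet addr:1.2.3.4" layout.
        match = IPV4.search(fields[1])
        if match:
            found.add(match.group(0))
    return sorted(found)


print(unique_ips(SAMPLE))
```

Using a set removes duplicates directly, so no separate sort -u pass is needed.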

2018/11/27 Tech
