Preface

I recently led a full MongoDB data migration. This post records the steps and the details.


Basic commands

The migration relies on two basic commands: bin/mongoimport and bin/mongoexport.

  • mongoexport
localhost:bin sean$ ./mongoexport --help
Usage:
  mongoexport <options>

Export data from MongoDB in CSV or JSON format.

See http://docs.mongodb.org/manual/reference/program/mongoexport/ for more information.

general options:
      --help                                      print usage
      --version                                   print the tool version and exit

verbosity options:
  -v, --verbose=<level>                           more detailed log output (include multiple times for more verbosity, e.g. -vvvvv, or specify a numeric value, e.g. --verbose=N)
      --quiet                                     hide all log output

connection options:
  -h, --host=<hostname>                           mongodb host to connect to (setname/host1,host2 for replica sets)
      --port=<port>                               server port (can also use --host hostname:port)

authentication options:
  -u, --username=<username>                       username for authentication
  -p, --password=<password>                       password for authentication
      --authenticationDatabase=<database-name>    database that holds the user's credentials
      --authenticationMechanism=<mechanism>       authentication mechanism to use

namespace options:
  -d, --db=<database-name>                        database to use
  -c, --collection=<collection-name>              collection to use

output options:
  -f, --fields=<field>[,<field>]*                 comma separated list of field names (required for exporting CSV) e.g. -f "name,age"
      --fieldFile=<filename>                      file with field names - 1 per line
      --type=<type>                               the output format, either json or csv (defaults to 'json') (default: json)
  -o, --out=<filename>                            output file; if not specified, stdout is used
      --jsonArray                                 output to a JSON array rather than one object per line
      --pretty                                    output JSON formatted to be human-readable
      --noHeaderLine                              export CSV data without a list of field names at the first line

querying options:
  -q, --query=<json>                              query filter, as a JSON string, e.g., '{x:{$gt:1}}'
      --queryFile=<filename>                      path to a file containing a query filter (JSON)
  -k, --slaveOk                                   allow secondary reads if available (default true) (default: false)
      --readPreference=<string>|<json>            specify either a preference name or a preference json object
      --forceTableScan                            force a table scan (do not use $snapshot)
      --skip=<count>                              number of documents to skip
      --limit=<count>                             limit the number of documents to export
      --sort=<json>                               sort order, as a JSON string, e.g. '{x:1}'
      --assertExists                              if specified, export fails if the collection does not exist (default: false)

The basic options are listed above. In practice we mostly use -h, -d, -c and --jsonArray.

  • mongoimport
localhost:bin sean$ ./mongoimport --help
Usage:
  mongoimport <options> <file>

Import CSV, TSV or JSON data into MongoDB. If no file is provided, mongoimport reads from stdin.

See http://docs.mongodb.org/manual/reference/program/mongoimport/ for more information.

general options:
      --help                                      print usage
      --version                                   print the tool version and exit

verbosity options:
  -v, --verbose=<level>                           more detailed log output (include multiple times for more verbosity, e.g. -vvvvv, or specify a numeric value, e.g. --verbose=N)
      --quiet                                     hide all log output

connection options:
  -h, --host=<hostname>                           mongodb host to connect to (setname/host1,host2 for replica sets)
      --port=<port>                               server port (can also use --host hostname:port)

authentication options:
  -u, --username=<username>                       username for authentication
  -p, --password=<password>                       password for authentication
      --authenticationDatabase=<database-name>    database that holds the user's credentials
      --authenticationMechanism=<mechanism>       authentication mechanism to use

namespace options:
  -d, --db=<database-name>                        database to use
  -c, --collection=<collection-name>              collection to use

input options:
  -f, --fields=<field>[,<field>]*                 comma separated list of fields, e.g. -f name,age
      --fieldFile=<filename>                      file with field names - 1 per line
      --file=<filename>                           file to import from; if not specified, stdin is used
      --headerline                                use first line in input source as the field list (CSV and TSV only)
      --jsonArray                                 treat input source as a JSON array
      --parseGrace=<grace>                        controls behavior when type coercion fails - one of: autoCast, skipField, skipRow, stop (defaults to 'stop') (default: stop)
      --type=<type>                               input format to import: json, csv, or tsv (defaults to 'json') (default: json)
      --columnsHaveTypes                          indicated that the field list (from --fields, --fieldsFile, or --headerline) specifies types; They must be in the form of
                                                  '<colName>.<type>(<arg>)'. The type can be one of: auto, binary, bool, date, date_go, date_ms, date_oracle, double, int32, int64,
                                                  string. For each of the date types, the argument is a datetime layout string. For the binary type, the argument can be one of:
                                                  base32, base64, hex. All other types take an empty argument. Only valid for CSV and TSV imports. e.g. zipcode.string(),
                                                  thumbnail.binary(base64)

ingest options:
      --drop                                      drop collection before inserting documents
      --ignoreBlanks                              ignore fields with empty values in CSV and TSV
      --maintainInsertionOrder                    insert documents in the order of their appearance in the input source
  -j, --numInsertionWorkers=<number>              number of insert operations to run concurrently (defaults to 1) (default: 1)
      --stopOnError                               stop importing at first insert/upsert error
      --mode=[insert|upsert|merge]                insert: insert only. upsert: insert or replace existing documents. merge: insert or modify existing documents. defaults to insert
      --upsertFields=<field>[,<field>]*           comma-separated fields for the query part when --mode is set to upsert or merge
      --writeConcern=<write-concern-specifier>    write concern options e.g. --writeConcern majority, --writeConcern '{w: 3, wtimeout: 500, fsync: true, j: true}' (defaults to
                                                  'majority') (default: majority)
      --bypassDocumentValidation                  bypass document validation

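Procedure

  • Export the collection to a JSON file with mongoexport.
  • Note that the exported file holds one document per line, with no enclosing [] and no commas between documents. For now I add them by hand. (Embarrassing.)
  • Import the JSON file with mongoimport.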
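
The manual editing step can probably be skipped. According to the help output above, --jsonArray tells mongoimport to treat the input as a JSON array; without that flag it reads one JSON document per line, which is the layout mongoexport writes by default. Below is a minimal sketch of the whole procedure under that assumption. The function name migrate_collection and its arguments are hypothetical, and both tools are assumed to be on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy one collection between two MongoDB instances.
# Arguments: source host, source db, target host, target db, collection.
migrate_collection() {
  local src_host="$1" src_db="$2" dst_host="$3" dst_db="$4" coll="$5"
  local dump_file="${coll}_$(date +%Y%m%d).json"

  # Export one JSON document per line (the mongoexport default).
  mongoexport -h "$src_host" -d "$src_db" -c "$coll" -o "$dump_file" || return 1

  # Import the same file. Without --jsonArray, mongoimport expects
  # this one-document-per-line layout.
  mongoimport -h "$dst_host" -d "$dst_db" -c "$coll" --file "$dump_file"
}

# Example call (not executed here):
# migrate_collection 127.0.0.1:27017 yanxml 127.0.0.1:27017 yanxml_import_db yanxml
```

If a single JSON array is preferred, passing --jsonArray to both mongoexport and mongoimport should work as well, since the flag appears in both help outputs.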

Hands-on record

Practice 1: export
localhost:bin sean$ ./mongoexport -h 127.0.0.1 -d yanxml -c yanxml -q '{"name" : "www.yanxml"}' -o yanxml_2_20210419.json
2021-04-19T00:40:13.153+0800	connected to: 127.0.0.1
2021-04-19T00:40:13.155+0800	exported 1 record

Practice 2: first import attempt
localhost:bin sean$ ./mongoimport -h 127.0.0.1:27017 -d yanxml_import_db -c yanxml --jsonArray yanxml_20210419.json
2021-04-19T00:31:45.967+0800	connected to: 127.0.0.1:27017
2021-04-19T00:31:45.969+0800	Failed: error reading separator after document #1: bad JSON array format - found no opening bracket '[' in input source
2021-04-19T00:31:45.969+0800	imported 0 documents

The import fails with an error: because --jsonArray is set, mongoimport expects the input to start with '['. Let's look at the JSON file first.

{"_id":{"$oid":"5cd541d1e6b987425bc13718"},"name":"www.yanxml"}
{"_id":{"$oid":"5cd541d1e6b987425bc13712"},"name":"www.yanxml2"}

It has neither the enclosing [] nor the commas between documents.

The target format looks like this:

[
{"_id":{"$oid":"5cd541d1e6b987425bc13718"},"name":"www.yanxml"},
{"_id":{"$oid":"5cd541d1e6b987425bc13712"},"name":"www.yanxml2"}
]

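This time I took the clumsy route and inserted them by hand. Sublime Text can edit many lines at once, which helps here, and I recommend it.

(Baidu Jingyan) Editing multiple lines at once in Sublime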
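
The hand edit can also be scripted. The sketch below wraps a one-document-per-line file in [] and adds the commas with awk. It assumes every document sits on a single line (so no --pretty output), and the file names export_lines.json and export_array.json are hypothetical.

```shell
#!/usr/bin/env bash
# Sample input: one JSON document per line, as in the export shown above.
# Both file names are hypothetical.
input="export_lines.json"
output="export_array.json"
cat > "$input" <<'EOF'
{"_id":{"$oid":"5cd541d1e6b987425bc13718"},"name":"www.yanxml"}
{"_id":{"$oid":"5cd541d1e6b987425bc13712"},"name":"www.yanxml2"}
EOF

# Print "[", then every non-empty line with a trailing comma on all
# documents except the last one, then "]".
awk '
  BEGIN { print "[" }
  NF    { if (seen) print prev ","; prev = $0; seen = 1 }
  END   { if (seen) print prev; print "]" }
' "$input" > "$output"

cat "$output"
```

The script holds back each line until it knows whether another document follows, which is how it avoids a comma after the last document.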


Practice 3: importing again
localhost:bin sean$ ./mongoimport -h 127.0.0.1:27017 -d yanxml_import_db -c yanxml --jsonArray yanxml_20210419.json
2021-04-19T00:31:55.959+0800	connected to: 127.0.0.1:27017
2021-04-19T00:31:56.028+0800	imported 1 document
# Databases before the import
> show dbs
admin       0.000GB
helloworld  0.000GB
local       0.000GB
yanxml      0.000GB
# Databases after the import
> show dbs
admin             0.000GB
helloworld        0.000GB
local             0.000GB
yanxml            0.000GB
yanxml_import_db  0.000GB
> use yanxml_import_db
switched to db yanxml_import_db
> show collections
yanxml
> db.yanxml.find()
{ "_id" : ObjectId("5cd541d1e6b987425bc13718"), "name" : "www.yanxml" }

Others

  • Tip 1: When exporting, check carefully whether a date is stored as a String or as an ISODate, and whether an id is an ObjectId or a String. I will cover this in the next post.

  • Tip 2: The mongodump and mongorestore commands achieve the same result. For more, see (Runoob) MongoDB backup (mongodump) and restore (mongorestore).
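
As a rough sketch of Tip 2, the same collection copy could be done with mongodump and mongorestore. The function name dump_and_restore and its arguments are hypothetical, and the sketch assumes mongodump writes its usual <out>/<db>/<collection>.bson layout. Newer tool versions may prefer namespace options over -d and -c on restore, so check the help output of your version first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy one collection using BSON dumps instead of JSON.
# Arguments: source host, source db, target host, target db, collection.
dump_and_restore() {
  local src_host="$1" src_db="$2" dst_host="$3" dst_db="$4" coll="$5"
  local dump_dir="dump_$(date +%Y%m%d)"

  # Dump the collection as BSON under "$dump_dir/<db>/<collection>.bson".
  mongodump -h "$src_host" -d "$src_db" -c "$coll" -o "$dump_dir" || return 1

  # Restore that single BSON file into the target database and collection.
  mongorestore -h "$dst_host" -d "$dst_db" -c "$coll" "$dump_dir/$src_db/$coll.bson"
}

# Example call (not executed here):
# dump_and_restore 127.0.0.1:27017 yanxml 127.0.0.1:27017 yanxml_import_db yanxml
```

Because the dump is BSON rather than JSON, field types such as ObjectId and dates are carried over as stored, which sidesteps the concern raised in Tip 1.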


Reference

[1]. (Baidu Jingyan) Editing multiple lines at once in Sublime
