NLP-自然语言处理入门

1.书籍-理论篇

  •   吴军老师的的《数学之美》
  • 《统计自然语言处理(第2版)》(宗成庆)蓝皮版
  • 《统计学习方法》(李航)
  • 《自然语言处理简明教程》(冯志伟)
  • 《自然语言处理综论》(Daniel Jurafsky)
  • 《自然语言处理的形式模型》(冯志伟)

2.书籍——实践篇

  •   python基础教程(翻译版)+python入门博客推荐:廖雪峰的python教程
  •  《机器学习实战》哈林顿 (Peter Harrington)
  •   西瓜书《机器学习》(周志华)
  • 《集体智慧编程》—[美] 西格兰 著,莫映,王开福 译
  • 《python自然语言处理》—伯德 (Steven Bird)(主要讲NLTK这个包的使用)

3.视频——辅助篇

  • 自然语言处理-宗庆成 
  • 自然语言处理-关毅 
  • 计算语言学概论_侯敏 
  • 计算语言学_冯志伟
  • 语法分析_陆俭明
  • 哥伦比亚大学https://class.coursera/nlan+他人的博客自然语言处理大菜鸟
  •  mooc学院-机器学习-大牛Andrew Ng
  • 网易公开课-机器学习-Andrew Ng
  • 慕课网-初识机器学习
  • 台湾大学林轩田机器学习
  • 斯坦福的nlp课程Video Listing

4.优秀参考博客

  •  我爱自然语言处理专门记录nlp的
  •  北京大学中文系 应用语言学专业

5.国际学术组织、学术会议与学术论文

  • 国际机器学习会议(ICML)
  • ACL,URL:http://aclweb/
  • 国际神经信息处理系统会议(NIPS)
  • 国际学习理论会议(COLT)
  • 欧洲机器学习会议(ECML)
  • 亚洲机器学习会议(ACML)
  • EMNLP:http://emnlp2017/ 丹麦哥本哈根 9.7-9.11
  • CCKS http://wwwks2017/index.php/att/ 成都 8月26-8月29
  • SMP http://www.cips-smp/smp2017/ 北京 9.14-9.17
  • CCL http://www.cips-cl:8080/CCL2017/home.html 南京 10.13-10.15
  • NLPCC http://tccif/conference/2017/ 大连 11.8-11.12
  • NCMMSC http://www.ncmmsc2017/index.html 连云港 11.11 - 11.13

6.知名国际学术期刊

  • Journal of Machine Learning Research
  • Computational Linguistics(URL:http://www.mitpressjournals/loi/coli)
  • TACL,URL:http://www.transacl/
  • Machine Learning
  • IJCAI
  • AAAI
  • Artificial Intelligence
  • Journal of Artificial Intelligence Research

7.工具包推荐

  • 中文的显然是哈工大开源的那个工具包 LTP (Language Technology Platform) developed by HIT-SCIR(哈尔滨工业大学社会计算与信息检索研究中心).
  • 英文的(python):
  1. pattern - simpler to get started than NLTK
  2. chardet - character encoding detection
  3. pyenchant - easy access to dictionaries
  4. scikit-learn - has support for text classification
  5. unidecode - because ascii is much easier to deal with

8.Quora上推荐的NLP的论文

  • Parsing(句法结构分析~语言学知识多,会比较枯燥)
  1. Klein & Manning: "Accurate Unlexicalized Parsing" (克莱因与曼宁:“精确非词汇化句法分析” )
  2. Klein & Manning: "Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency" (革命性的用非监督学习的方法做了parser)
  3. Nivre "Deterministic Dependency Parsing of English Text" (shows that deterministic parsing actually works quite well)
  4. McDonald et al. "Non-Projective Dependency Parsing using Spanning-Tree Algorithms" (the other main method of dependency parsing, MST parsing)
  • Machine Translation(机器翻译,如果不做机器翻译就可以跳过了,不过翻译模型在其他领域也有应用)
  1. Knight "A statistical MT tutorial workbook" (easy to understand, use instead of the original Brown paper)
  2. Och "The Alignment-Template Approach to Statistical Machine Translation" (foundations of phrase based systems)
  3. Wu "Inversion Transduction Grammars and the Bilingual Parsing of Parallel Corpora" (arguably the first realistic method for biparsing, which is used in many systems)
  4. Chiang "Hierarchical Phrase-Based Translation" (significantly improves accuracy by allowing for gappy phrases)
  • Language Modeling (语言模型)
  1. Goodman "A bit of progress in language modeling" (describes just about everything related to n-gram language models 这是一个survey,这个survey写了几乎所有和n-gram有关的东西,包括平滑 聚类)
  2. Teh "A Bayesian interpretation of Interpolated Kneser-Ney" (shows how to get state-of-the art accuracy in a Bayesian framework, opening the path for other applications)
  • Machine Learning for NLP
  1. Sutton & McCallum "An introduction to conditional random fields for relational learning" (CRF实在是在NLP中太好用了!!!!!而且我们大家都知道有很多现成的tool实现这个,而这个就是一个很简单的论文讲述CRF的,不过其实还是蛮数学= =。。。)
  2. Knight "Bayesian Inference with Tears" (explains the general idea of bayesian techniques quite well)
  3. Berg-Kirkpatrick et al. "Painless Unsupervised Learning with Features" (this is from this year and thus a bit of a gamble, but this has the potential to bring the power of discriminative methods to unsupervised learning)
  • Information Extraction
  1. Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora. COLING 1992. (The very first paper for all the bootstrapping methods for NLP. It is a hypothetical work in a sense that it doesn't give experimental results, but it influenced it's followers a lot.)
  2. Collins and Singer. Unsupervised Models for Named Entity Classification. EMNLP 1999. (It applies several variants of co-training like IE methods to NER task and gives the motivation why they did so. Students can learn the logic from this work for writing a good research paper in NLP.)
  • Computational Semantics
  1. Gildea and Jurafsky. Automatic Labeling of Semantic Roles. Computational Linguistics 2002. (It opened up the trends in NLP for semantic role labeling, followed by several CoNLL shared tasks dedicated for SRL. It shows how linguistics and engineering can collaborate with each other. It has a shorter version in ACL 2000.)
  2. Pantel and Lin. Discovering Word Senses from Text. KDD 2002. (Supervised WSD has been explored a lot in the early 00's thanks to the senseval workshop, but a few system actually benefits from WSD because manually crafted sense mappings are hard to obtain. These days we see a lot of evidence that unsupervised clustering improves NLP tasks such as NER, parsing, SRL, etc,

 

 

 

更多推荐

NLP-自然语言处理入门(持续更新)