Social Media Text Mining

author: Stephen J. Turnbull
organization: Faculty of Engineering, Information, and Systems at the University of Tsukuba
contact: Stephen J. Turnbull <turnbull@sk.tsukuba.ac.jp>
date: September 10, 2019
copyright: 2018-2019, Stephen J. Turnbull
topic: STEM, social networks, social media, text mining

Annotated Bibliography

Planned: Several tutorial-style books on using Python, social media, and statistical software.

Each entry contains the usual bibliographical data (title, author(s), publisher, and publication date), as well as page count and ISBN.

Pattern Recognition and Machine Learning

Christopher M. Bishop, New York: Springer, 2006. 758 pp., ISBN: 978-0-387-31073-2

Explains “the algorithms” of machine learning (as in Google's PageRank and “the algorithms” of Facebook and Twitter that ensure that your interests are reflected in your feed), starting from basic probability and statistics and using college-level mathematics (calculus). (Sorry, this is a generic discussion, not specific to those famous algorithms!)

Mexico’s misinformation wars: How organized troll networks attack and harass journalists and activists in Mexico

Amnesty International

https://medium.com/amnesty-insights/mexico-s-misinformation-wars-cb748ecb32e9

A discussion of tracking “Twitter trolls” in Mexico. I've also heard a podcast on this but haven't researched it yet ... watch this space!

The Twitter API reference index

Twitter, Inc.

https://developer.twitter.com/en/docs/api-reference-index.html

All the things you can do with the Twitter API!

The Python twitter module documentation

Mike Verdone

https://pypi.org/project/twitter/
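
As a taste of what text mining on tweets looks like, here is a minimal sketch that counts terms in tweet text using only the Python standard library. Everything in it is an assumption made for illustration: the sample dictionary is made up, it only imitates the "statuses"/"text" layout that I assume a search response fetched through the twitter module would have, and the names sample_response, TOKEN, and count_terms are hypothetical. A real response may be shaped differently.

```python
import re
from collections import Counter

# Hypothetical sample data. It imitates a search response with a
# "statuses" list whose items carry a "text" field (an assumption,
# not a captured API response).
sample_response = {
    "statuses": [
        {"text": "Text mining with Python is fun! #python"},
        {"text": "Learning #Python and scikit-learn for text mining"},
    ]
}

# A token is a run of word characters, optionally prefixed by # or @,
# so hashtags and mentions are kept distinct from plain words.
TOKEN = re.compile(r"[#@]?\w+")


def count_terms(response):
    """Count lowercased tokens across the text of all statuses."""
    counts = Counter()
    for status in response.get("statuses", []):
        text = status.get("text", "")
        counts.update(token.lower() for token in TOKEN.findall(text))
    return counts


counts = count_terms(sample_response)
print(counts.most_common(5))
```

Keeping "#python" separate from "python" is a deliberate choice here: hashtags are often analyzed as topic labels rather than as ordinary words. A real analysis would probably also drop common words such as "is" and "and".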

The Scikit-Learn documentation

http://scikit-learn.org/stable/documentation.html

Software: Where to get it

Turnbull Lab

Planned: This section will also contain some software developed in Turnbull Lab to make simple use of Twitter APIs and data science software straightforward.

Planned: Distribution of USB keys with pre-installed Anaconda environments.

Open Source Software

"Hacking"

Some students expressed interest in hacking. I don't do "black hat" hacking, either as a user or as a researcher. I am interested in "white hat" hacking, aka "computer and network security", both as a software developer and as a researcher. I recommend the following sources for those interested:

  • Clifford Stoll, The Cuckoo's Egg (book, also a movie I hear)
  • Cheswick and Bellovin, Firewalls and Internet Security (book)
  • Patrick Gray and friends, Risky Business (weekly podcast)
  • Rational Security (weekly podcast), mostly about law, not hacking
  • The Cyberlaw Podcast (weekly podcast), ditto
  • Tavis Ormandy, @taviso on Twitter (Google hacker)
  • Bruce Schneier (blog and books, Google him; @bruceschneier is not him, and his @schneierblog account tweets very rarely), famous for the "security mindset"
  • Katie Moussouris, @k8em0 on Twitter, designed Microsoft's "bug bounty" program
  • Jek Hyde, @HydeNS33k on Twitter, professional "red teamer" (companies pay her to try to break into their systems to test their security; she has great stories when she tells them)