Dig deep into the data with a hands-on guide to machine learning

Machine Learning: Hands-On for Developers and Technical Professionals provides hands-on instruction and fully-coded working examples for the most common machine learning techniques used by developers and technical professionals. The book contains a breakdown of each ML variant, explaining how it works and how it is used within certain industries, allowing readers to incorporate the presented techniques into their own work as they follow along. A core tenet of machine learning is a strong focus on data preparation, and a full exploration of the various types of learning algorithms illustrates how the proper tools can help any developer extract information and insights from existing data. The book includes a full complement of Instructor's Materials to facilitate use in the classroom, making this resource useful for students and as a professional reference.

At its core, machine learning is a mathematical, algorithm-based technology that forms the basis of historical data mining and modern big data science. Scientific analysis of big data requires a working knowledge of machine learning, which forms predictions based on known properties learned from training data. Machine Learning is an accessible, comprehensive guide for the non-mathematician, providing clear guidance that allows readers to:

* Learn the languages of machine learning including Hadoop, Mahout, and Weka
* Understand decision trees, Bayesian networks, and artificial neural networks
* Implement Association Rule, Real Time, and Batch learning
* Develop a strategic plan for safe, effective, and efficient machine learning

By learning to construct a system that can learn from data, readers can increase their utility across industries. Machine learning sits at the core of deep dive data analysis and visualization, which is increasingly in demand as companies discover the goldmine hiding in their existing data.

For the tech professional involved in data science, Machine Learning: Hands-On for Developers and Technical Professionals provides the skills and techniques required to dig deeper.
Introduction

Chapter 1: What Is Machine Learning?
History of Machine Learning; Alan Turing; Arthur Samuel; Tom M. Mitchell; Summary Definition; Algorithm Types for Machine Learning; Supervised Learning; Unsupervised Learning; The Human Touch; Uses for Machine Learning; Software; Stock Trading; Robotics; Medicine and Healthcare; Advertising; Retail and E-Commerce; Gaming Analytics; The Internet of Things; Languages for Machine Learning; Python; R; Matlab; Scala; Clojure; Ruby; Software Used in This Book; Checking the Java Version; Weka Toolkit; Mahout; SpringXD; Hadoop; Using an IDE; Data Repositories; UC Irvine Machine Learning Repository; Infochimps; Kaggle; Summary

Chapter 2: Planning for Machine Learning
The Machine Learning Cycle; It All Starts with a Question; I Don't Have Data!; Starting Local; Competitions; One Solution Fits All?; Defining the Process; Planning; Developing; Testing; Reporting; Refining; Production; Building a Data Team; Mathematics and Statistics; Programming; Graphic Design; Domain Knowledge; Data Processing; Using Your Computer; A Cluster of Machines; Cloud-Based Services; Data Storage; Physical Discs; Cloud-Based Storage; Data Privacy; Cultural Norms; Generational Expectations; The Anonymity of User Data; Don't Cross The Creepy Line; Data Quality and Cleaning; Presence Checks; Type Checks; Length Checks; Range Checks; Format Checks; The Britney Dilemma; What's in a Country Name?; Dates and Times; Final Thoughts on Data Cleaning; Thinking about Input Data; Raw Text; Comma Separated Variables; JSON; YAML; XML; Spreadsheets; Databases; Thinking about Output Data; Don't Be Afraid to Experiment; Summary

Chapter 3: Working with Decision Trees
The Basics of Decision Trees; Uses for Decision Trees; Advantages of Decision Trees; Limitations of Decision Trees; Different Algorithm Types; How Decision Trees Work; Decision Trees in Weka; The Requirement; Training Data; Using Weka to Create a Decision Tree; Creating Java Code from the Classification; Testing the Classifier Code; Thinking about Future Iterations; Summary

Chapter 4: Bayesian Networks
Pilots to Paperclips; A Little Graph Theory; A Little Probability Theory; Coin Flips; Conditional Probability; Winning the Lottery; Bayes' Theorem; How Bayesian Networks Work; Assigning Probabilities; Calculating Results; Node Counts; Using Domain Experts; A Bayesian Network Walkthrough; Java APIs for Bayesian Networks; Planning the Network; Coding Up the Network; Summary

Chapter 5: Artificial Neural Networks
What Is a Neural Network?; Artificial Neural Network Uses; High-Frequency Trading; Credit Applications; Data Center Management; Robotics; Medical Monitoring; Breaking Down the Artificial Neural Network; Perceptrons; Activation Functions; Multilayer Perceptrons; Back Propagation; Data Preparation for Artificial Neural Networks; Artificial Neural Networks with Weka; Generating a Dataset; Loading the Data into Weka; Configuring the Multilayer Perceptron; Training the Network; Altering the Network; Increasing the Test Data Size; Implementing a Neural Network in Java; Create the Project; The Code; Converting from CSV to Arff; Running the Neural Network; Summary

Chapter 6: Association Rules Learning
Where Is Association Rules Learning Used?; Web Usage Mining; Beer and Diapers; How Association Rules Learning Works; Support; Confidence; Lift; Conviction; Defining the Process; Algorithms; Apriori; FP-Growth; Mining the Baskets: A Walkthrough; Downloading the Raw Data; Setting Up the Project in Eclipse; Setting Up the Items Data File; Setting Up the Data; Running Mahout; Inspecting the Results; Putting It All Together; Further Development; Summary

Chapter 7: Support Vector Machines
What Is a Support Vector Machine?; Where Are Support Vector Machines Used?; The Basic Classification Principles; Binary and Multiclass Classification; Linear Classifiers; Confidence; Maximizing and Minimizing to Find the Line; How Support Vector Machines Approach Classification; Using Linear Classification; Using Non-Linear Classification; Using Support Vector Machines in Weka; Installing LibSVM; A Classification Walkthrough; Implementing LibSVM with Java; Summary

Chapter 8: Clustering
What Is Clustering?; Where Is Clustering Used?; The Internet; Business and Retail; Law Enforcement; Computing; Clustering Models; How the K-Means Works; Calculating the Number of Clusters in a Dataset; K-Means Clustering with Weka; Preparing the Data; The Workbench Method; The Command-Line Method; The Coded Method; Summary

Chapter 9: Machine Learning in Real Time with Spring XD
Capturing the Firehose of Data; Considerations of Using Data in Real Time; Potential Uses for a Real-Time System; Using Spring XD; Spring XD Streams; Input Sources, Sinks, and Processors; Learning from Twitter Data; The Development Plan; Configuring the Twitter API Developer Application; Configuring Spring XD; Starting the Spring XD Server; Creating Sample Data; The Spring XD Shell; Streams 101; Spring XD and Twitter; Setting the Twitter Credentials; Creating Your First Twitter Stream; Where to Go from Here; Introducing Processors; How Processors Work within a Stream; Creating Your Own Processor; Real-Time Sentiment Analysis; How the Basic Analysis Works; Creating a Sentiment Processor; Spring XD Taps; Summary

Chapter 10: Machine Learning as a Batch Process
Is It Big Data?; Considerations for Batch Processing Data; Volume and Frequency; How Much Data?; Which Process Method?; Practical Examples of Batch Processes; Hadoop; Sqoop; Pig; Mahout; Cloud-Based Elastic Map Reduce; A Note about the Walkthroughs; Using the Hadoop Framework; The Hadoop Architecture; Setting Up a Single-Node Cluster; How MapReduce Works; Mining the Hashtags; Hadoop Support in Spring XD; Objectives for This Walkthrough; What's a Hashtag?; Creating the MapReduce Classes; Performing ETL on Existing Data; Product Recommendation with Mahout; Mining Sales Data; Welcome to My Coffee Shop!; Going Small Scale; Writing the Core Methods; Using Hadoop and MapReduce; Using Pig to Mine Sales Data; Scheduling Batch Jobs; Summary

Chapter 11: Apache Spark
Spark: A Hadoop Replacement?; Java, Scala, or Python?; Scala Crash Course; Installing Scala; Packages; Data Types; Classes; Calling Functions; Operators; Control Structures; Downloading and Installing Spark; A Quick Intro to Spark; Starting the Shell; Data Sources; Testing Spark; Spark Monitor; Comparing Hadoop MapReduce to Spark; Writing Standalone Programs with Spark; Spark Programs in Scala; Installing SBT; Spark Programs in Java; Spark Program Summary; Spark SQL; Basic Concepts; Using SparkSQL with RDDs; Spark Streaming; Basic Concepts; Creating Your First Stream with Scala; Creating Your First Stream with Java; MLib: The Machine Learning Library; Dependencies; Decision Trees; Clustering; Summary

Chapter 12: Machine Learning with R
Installing R; Mac OSX; Windows; Linux; Your First Run; Installing R-Studio; The R Basics; Variables and Vectors; Matrices; Lists; Data Frames; Installing Packages; Loading in Data; Plotting Data; Simple Statistics; Simple Linear Regression; Creating the Data; The Initial Graph; Regression with the Linear Model; Making a Prediction; Basic Sentiment Analysis; Functions to Load in Word Lists; Writing a Function to Score Sentiment; Testing the Function; Apriori Association Rules; Installing the ARules Package; The Training Data; Importing the Transaction Data; Running the Apriori Algorithm; Inspecting the Results; Accessing R from Java; Installing the rJava Package; Your First Java Code in R; Calling R from Java Programs; Setting Up an Eclipse Project; Creating the Java/R Class; Running the Example; Extending Your R Implementations; R and Hadoop; The RHadoop Project; A Sample Map Reduce Job in RHadoop; Connecting to Social Media with R; Summary

Appendix A: SpringXD Quick Start
Installing Manually; Starting SpringXD; Creating a Stream; Adding a Twitter Application Key

Appendix B: Hadoop 1.x Quick Start
Downloading and Installing Hadoop; Formatting the HDFS Filesystem; Starting and Stopping Hadoop; Process List of a Basic Job

Appendix C: Useful Unix Commands
Using Sample Data; Showing the Contents: cat, more, and less; Example Command; Expected Output; Filtering Content: grep; Example Command for Finding Text; Example Output; Sorting Data: sort; Example Command for Basic Sorting; Example Output; Finding Unique Occurrences: uniq; Showing the Top of a File: head; Counting Words: wc; Locating Anything: find; Combining Commands and Redirecting Output; Picking a Text Editor; Colon Frenzy: Vi and Vim; Nano; Emacs

Appendix D: Further Reading
Machine Learning; Statistics; Big Data and Data Science; Hadoop; Visualization; Making Decisions; Datasets; Blogs; Useful Websites; The Tools of the Trade

Index

Product details

ISBN
9781118889060
Published
2014-12-30
Publisher
John Wiley & Sons Inc
Weight
692 g
Height
235 mm
Width
188 mm
Depth
19 mm
Age level
06, P
Language
English
Format
Paperback
Number of pages
408

Author

Biographical note

Jason Bell has been working with point of sale and customer loyalty data since 2002 and has been involved in software development for more than 25 years. He works as a senior technical architect and lecturer, and also advises startups that are just beginning their technical adventures.