Your logical, linear guide to the fundamentals of data science programming

Data science is exploding—in a good way—with a forecast of 1.7 megabytes of new information created every second for each human being on the planet by 2020 and 11.5 million job openings by 2026. It clearly pays dividends to be in the know. This friendly guide charts a path through the fundamentals of data science and then delves into the actual work: linear regression, logistic regression, machine learning, neural networks, recommender engines, and cross-validation of models. Data Science Programming All-In-One For Dummies covers the key data science, machine learning, and deep learning programming languages: Python and R. It helps you decide which language is best for specific data science needs and gives you guidelines for building your own projects that solve problems in real time.

Get grounded: the ideal start for new data professionals
What lies ahead: learn about specific areas that data is transforming
Be meaningful: find out how to tell your data story
See clearly: pick up the art of visualization

Whether you’re a beginning student or already mid-career, get your copy now and add even more meaning to your life—and everyone else’s!
Introduction 1 About This Book 1 Foolish Assumptions 3 Icons Used in This Book 4 Beyond the Book 4 Where to Go from Here 5
Book 1: Defining Data Science 7
Chapter 1: Considering the History and Uses of Data Science 9 Considering the Elements of Data Science 10 Considering the emergence of data science 10 Outlining the core competencies of a data scientist 11 Linking data science, big data, and AI 12 Understanding the role of programming 12 Defining the Role of Data in the World 13 Enticing people to buy products 13 Keeping people safer 14 Creating new technologies 15 Performing analysis for research 16 Providing art and entertainment 17 Making life more interesting in other ways 18 Creating the Data Science Pipeline 18 Preparing the data 18 Performing exploratory data analysis 18 Learning from data 19 Visualizing 19 Obtaining insights and data products 19 Comparing Different Languages Used for Data Science 20 Obtaining an overview of data science languages 20 Defining the pros and cons of using Python 22 Defining the pros and cons of using R 23 Learning to Perform Data Science Tasks Fast 25 Loading data 26 Training a model 26 Viewing a result 26
Chapter 2: Placing Data Science within the Realm of AI 29 Seeing the Data to Data Science Relationship 30 Considering the data architecture 30 Acquiring data from various sources 31 Performing data analysis 32 Archiving the data 33 Defining the Levels of AI 33 Beginning with AI 34 Advancing to machine learning 39 Getting detailed with deep learning 43 Creating a Pipeline from Data to AI 47 Considering the desired output 47 Defining a data architecture 47 Combining various data sources 47 Checking for errors and fixing them 48 Performing the analysis 48 Validating the result 49 Enhancing application performance 49
Chapter 3: Creating a Data Science Lab of Your Own 51 Considering the Analysis Platform Options 52 Using a desktop system 53 Working with an online IDE 53 Considering the need for a GPU 54 Choosing a Development Language 56 Obtaining and Using Python 58 Working with Python in this book 58 Obtaining and installing Anaconda for Python 59 Defining a Python code repository 64 Working with Python using Google Colaboratory 69 Defining the limits of using Azure Notebooks with Python and R 71 Obtaining and Using R 72 Obtaining and installing Anaconda for R 72 Starting the R environment 73 Defining an R code repository 75 Presenting Frameworks 76 Defining the differences 76 Explaining the popularity of frameworks 77 Choosing a particular library 79 Accessing the Downloadable Code 80
Chapter 4: Considering Additional Packages and Libraries You Might Want 81 Considering the Uses for Third-Party Code 82 Obtaining Useful Python Packages 83 Accessing scientific tools using SciPy 84 Performing fundamental scientific computing using NumPy 85 Performing data analysis using pandas 85 Implementing machine learning using Scikit-learn 86 Going for deep learning with Keras and TensorFlow 86 Plotting the data using matplotlib 87 Creating graphs with NetworkX 88 Parsing HTML documents using Beautiful Soup 88 Locating Useful R Libraries 89 Using your Python code in R with reticulate 89 Conducting advanced training using caret 90 Performing machine learning tasks using mlr 90 Visualizing data using ggplot2 91 Enhancing ggplot2 using esquisse 91 Creating graphs with igraph 91 Parsing HTML documents using rvest 92 Wrangling dates using lubridate 92 Making big data simpler using dplyr and purrr 93
Chapter 5: Leveraging a Deep Learning Framework 95 Understanding Deep Learning Framework Usage 96 Working with Low-End Frameworks 97 Chainer 97 PyTorch 98 MXNet 98 Microsoft Cognitive Toolkit/CNTK 99 Understanding TensorFlow 100 Grasping why TensorFlow is so good 101 Making TensorFlow easier by using TFLearn 102 Using Keras as the best simplifier 102 Getting your copy of TensorFlow and Keras 103 Fixing the C++ build tools error in Windows 106 Accessing your new environment in Notebook 108
Book 2: Interacting with Data Storage 109
Chapter 1: Manipulating Raw Data 111 Defining the Data Sources 112 Obtaining data locally 112 Using online data sources 117 Employing dynamic data sources 121 Considering other kinds of data sources 123 Considering the Data Forms 124 Working with pure text 124 Accessing formatted text 125 Deciphering binary data 126 Understanding the Need for Data Reliability 128
Chapter 2: Using Functional Programming Techniques 131 Defining Functional Programming 132 Differences with other programming paradigms 132 Understanding its goals 133 Understanding Pure and Impure Languages 134 Using the pure approach 134 Using the impure approach 134 Comparing the Functional Paradigm 135 Imperative 135 Procedural 136 Object-oriented 136 Declarative 136 Using Python for Functional Programming Needs 137 Understanding How Functional Data Works 138 Working with immutable data 139 Considering the role of state 139 Eliminating side effects 140 Passing by reference versus by value 140 Working with Lists and Strings 142 Creating lists 144 Evaluating lists 144 Performing common list manipulations 146 Understanding the Dict and Set alternatives 147 Considering the use of strings 148 Employing Pattern Matching 150 Looking for patterns in data 150 Understanding regular expressions 152 Using pattern matching in analysis 155 Working with pattern matching 156 Working with Recursion 159 Performing tasks more than once 159 Understanding recursion 161 Using recursion on lists 162 Considering advanced recursive tasks 163 Passing functions instead of variables 164 Performing Functional Data Manipulation 165 Slicing and dicing 166 Mapping your data 167 Filtering data 168 Organizing data 169
Chapter 3: Working with Scalars, Vectors, and Matrices 171 Considering the Data Forms 172 Defining Data Type through Scalars 173 Creating Organized Data with Vectors 174 Defining a vector 175 Creating vectors of a specific type 175 Performing math on vectors 176 Performing logical and comparison tasks on vectors 176 Multiplying vectors 177 Creating and Using Matrices 178 Creating a matrix 178 Creating matrices of a specific type 179 Using the matrix class 181 Performing matrix multiplication 181 Executing advanced matrix operations 183 Extending Analysis to Tensors 185 Using Vectorization Effectively 186 Selecting and Shaping Data 187 Slicing rows 188 Slicing columns 188 Dicing 189 Concatenating 189 Aggregating 194 Working with Trees 195 Understanding the basics of trees 195 Building a tree 196 Representing Relations in a Graph 198 Going beyond trees 198 Arranging graphs 199
Chapter 4: Accessing Data in Files 201 Understanding Flat File Data Sources 202 Working with Positional Data Files 203 Accessing Data in CSV Files 205 Working with a simple CSV file 205 Making use of header information 208 Moving On to XML Files 209 Working with a simple XML file 209 Parsing XML 211 Using XPath for data extraction 212 Considering Other Flat-File Data Sources 214 Working with Nontext Data 215 Downloading Online Datasets 218 Working with package datasets 218 Using public domain datasets 219
Chapter 5: Working with a Relational DBMS 223 Considering RDBMS Issues 224 Defining the use of tables 225 Understanding keys and indexes 226 Using local versus online databases 227 Working in read-only mode 228 Accessing the RDBMS Data 228 Using the SQL language 229 Relying on scripts 231 Relying on views 231 Relying on functions 232 Creating a Dataset 233 Combining data from multiple tables 233 Ensuring data completeness 234 Slicing and dicing the data as needed 234 Mixing RDBMS Products 234
Chapter 6: Working with a NoSQL DBMS 237 Considering the Ramifications of Hierarchical Data 238 Understanding hierarchical organization 238 Developing strategies for freeform data 239 Performing an analysis 240 Working around dangling data 241 Accessing the Data 243 Creating a picture of the data form 243 Employing the correct transiting strategy 244 Ordering the data 247 Interacting with Data from NoSQL Databases 248 Working with Dictionaries 249 Developing Datasets from Hierarchical Data 250 Processing Hierarchical Data into Other Forms 251
Book 3: Manipulating Data Using Basic Algorithms 253
Chapter 1: Working with Linear Regression 255 Considering the History of Linear Regression 256 Combining Variables 257 Working through simple linear regression 257 Advancing to multiple linear regression 260 Considering which question to ask 262 Reducing independent variable complexity 263 Manipulating Categorical Variables 265 Creating categorical variables 266 Renaming levels 267 Combining levels 268 Using Linear Regression to Guess Numbers 269 Defining the family of linear models 270 Using more variables in a larger dataset 271 Understanding variable transformations 274 Doing variable transformations 275 Creating interactions between variables 277 Understanding limitations and problems 282 Learning One Example at a Time 283 Using Gradient Descent 283 Implementing Stochastic Gradient Descent 283 Considering the effects of regularization 287
Chapter 2: Moving Forward with Logistic Regression 289 Considering the History of Logistic Regression 290 Differentiating between Linear and Logistic Regression 291 Considering the model 291 Defining the logistic function 292 Understanding the problems that logistic regression solves 294 Fitting the curve 295 Considering a pass/fail example 296 Using Logistic Regression to Guess Classes 297 Applying logistic regression 297 Considering when classes are more 298 Defining logistic regression performance 300 Switching to Probabilities 301 Specifying a binary response 301 Transforming numeric estimates into probabilities 302 Working through Multiclass Regression 305 Understanding multiclass regression 305 Developing a multiclass regression implementation 306
Chapter 3: Predicting Outcomes Using Bayes 309 Understanding Bayes’ Theorem 310 Delving into Bayes history 310 Considering the basic theorem 312 Using Naïve Bayes for Predictions 313 Finding out that Naïve Bayes isn’t so naïve 314 Predicting text classifications 315 Getting an overview of Bayesian inference 318 Working with Networked Bayes 324 Considering the network types and uses 324 Understanding Directed Acyclic Graphs (DAGs) 327 Employing networked Bayes in predictions 328 Deciding between automated and guided learning 332 Considering the Use of Bayesian Linear Regression 332 Considering the Use of Bayesian Logistic Regression 333
Chapter 4: Learning with K-Nearest Neighbors 335 Considering the History of K-Nearest Neighbors 336 Learning Lazily with K-Nearest Neighbors 337 Understanding the basis of KNN 337 Predicting after observing neighbors 338 Choosing the k parameter wisely 341 Leveraging the Correct k Parameter 342 Understanding the k parameter 342 Experimenting with a flexible algorithm 343 Implementing KNN Regression 345 Implementing KNN Classification 347
Book 4: Performing Advanced Data Manipulation 351
Chapter 1: Leveraging Ensembles of Learners 353 Leveraging Decision Trees 354 Growing a forest of trees 356 Seeing Random Forests in action 358 Understanding the importance measures 360 Configuring your system for importance measures with Python 361 Seeing importance measures in action 361 Working with Almost Random Guesses 364 Understanding the premise 365 Bagging predictors with AdaBoost 366 Meeting Again with Gradient Descent 369 Understanding the GBM difference 369 Seeing GBM in action 371 Averaging Different Predictors 372
Chapter 2: Building Deep Learning Models 373 Discovering the Incredible Perceptron 374 Understanding perceptron functionality 375 Touching the nonseparability limit 376 Hitting Complexity with Neural Networks 378 Considering the neuron 379 Pushing data with feed-forward 381 Defining hidden layers 383 Executing operations 384 Considering the details of data movement through the neural network 386 Using backpropagation to adjust learning 387 Understanding More about Neural Networks 390 Getting an overview of the neural network process 391 Defining the basic architecture 391 Documenting the essential modules 393 Solving a simple problem 396 Looking Under the Hood of Neural Networks 399 Choosing the right activation function 399 Relying on a smart optimizer 401 Setting a working learning rate 402 Explaining Deep Learning Differences with Other Forms of AI 402 Adding more layers 403 Changing the activations 405 Adding regularization by dropout 406 Using online learning 407 Transferring learning 407 Learning end to end 408
Chapter 3: Recognizing Images with CNNs 409 Beginning with Simple Image Recognition 410 Considering the ramifications of sight 410 Working with a set of images 411 Extracting visual features 417 Recognizing faces using Eigenfaces 419 Classifying images 423 Understanding CNN Image Basics 427 Moving to CNNs with Character Recognition 429 Accessing the dataset 430 Reshaping the dataset 431 Encoding the categories 432 Defining the model 432 Using the model 433 Explaining How Convolutions Work 435 Understanding convolutions 435 Simplifying the use of pooling 439 Describing the LeNet architecture 440 Detecting Edges and Shapes from Images 446 Visualizing convolutions 447 Unveiling successful architectures 449 Discussing transfer learning 450
Chapter 4: Processing Text and Other Sequences 453 Introducing Natural Language Processing 454 Defining the human perspective as it relates to data science 454 Considering the computer perspective as it relates to data science 455 Understanding How Machines Read 456 Creating a corpus 457 Performing feature extraction 457 Understanding the BoW 458 Processing and enhancing text 459 Maintaining order using n-grams 461 Stemming and removing stop words 462 Scraping textual datasets from the web 465 Handling problems with raw text 470 Storing processed text data in sparse matrices 473 Understanding Semantics Using Word Embeddings 478 Using Scoring and Classification 482 Performing classification tasks 482 Analyzing reviews from e-commerce 485
Book 5: Performing Data-Related Tasks 491
Chapter 1: Making Recommendations 493 Realizing the Recommendation Revolution 494 Downloading Rating Data 495 Navigating through anonymous web data 496 Encountering the limits of rating data 499 Leveraging SVD 506 Considering the origins of SVD 506 Understanding the SVD connection 508
Chapter 2: Performing Complex Classifications 509 Using Image Classification Challenges 510 Delving into ImageNet and Coco 511 Learning the magic of data augmentation 513 Distinguishing Traffic Signs 516 Preparing the image data 517 Running a classification task 520
Chapter 3: Identifying Objects 525 Distinguishing Classification Tasks 526 Understanding the problem 526 Performing localization 527 Classifying multiple objects 528 Annotating multiple objects in images 529 Segmenting images 530 Perceiving Objects in Their Surroundings 531 Considering vision needs in self-driving cars 531 Discovering how RetinaNet works 532 Using the Keras-RetinaNet code 534 Overcoming Adversarial Attacks on Deep Learning Applications 538 Tricking pixels 539 Hacking with stickers and other artifacts 541
Chapter 4: Analyzing Music and Video 543 Learning to Imitate Art and Life 544 Transferring an artistic style 545 Reducing the problem to statistics 546 Understanding that deep learning doesn’t create 548 Mimicking an Artist 548 Defining a new piece based on a single artist 549 Combining styles to create new art 550 Visualizing how neural networks dream 551 Using a network to compose music 551 Other creative avenues 552 Moving toward GANs 553 Finding the key in the competition 554 Considering a growing field 556
Chapter 5: Considering Other Task Types 559 Processing Language in Texts 560 Considering the processing methodologies 560 Defining understanding as tokenization 561 Putting all the documents into a bag 562 Using AI for sentiment analysis 566 Processing Time Series 574 Defining sequences of events 574 Performing a prediction using LSTM 575
Chapter 6: Developing Impressive Charts and Plots 579 Starting a Graph, Chart, or Plot 580 Understanding the differences between graphs, charts, and plots 580 Considering the graph, chart, and plot types 582 Defining the plot 583 Drawing multiple lines 584 Drawing multiple plots 584 Saving your work 586 Setting the Axis, Ticks, and Grids 587 Getting the axis 587 Formatting the ticks 590 Adding grids 590 Defining the Line Appearance 591 Working with line styles 592 Adding markers 593 Using Labels, Annotations, and Legends 594 Adding labels 595 Annotating the chart 596 Creating a legend 598 Creating Scatterplots 599 Depicting groups 599 Showing correlations 600 Plotting Time Series 603 Representing time on axes 604 Plotting trends over time 605 Plotting Geographical Data 608 Getting the toolkit 608 Drawing the map 609 Plotting the data 613 Visualizing Graphs 615 Understanding the adjacency matrix 615 Using NetworkX basics 615
Book 6: Diagnosing and Fixing Errors 619
Chapter 1: Locating Errors in Your Data 621 Considering the Types of Data Errors 622 Obtaining the Required Data 624 Considering the data sources 624 Obtaining reliable data 625 Making human input more reliable 626 Using automated data collection 628 Validating Your Data 629 Figuring out what’s in your data 629 Removing duplicates 631 Creating a data map and a data plan 632 Manicuring the Data 634 Dealing with missing data 634 Considering data misalignments 639 Separating out useful data 640 Dealing with Dates in Your Data 640 Formatting date and time values 641 Using the right time transformation 641
Chapter 2: Considering Outrageous Outcomes 643 Deciding What Outrageous Means 644 Considering the Five Mistruths in Data 645 Commission 645 Omission 646 Perspective 646 Bias 647 Frame-of-reference 648 Considering Detection of Outliers 649 Understanding outlier basics 649 Finding more things that can go wrong 651 Understanding anomalies and novel data 651 Examining a Simple Univariate Method 653 Using the pandas package 653 Leveraging the Gaussian distribution 655 Making assumptions and checking out 656 Developing a Multivariate Approach 657 Using principal component analysis 658 Using cluster analysis 659 Automating outliers detection with Isolation Forests 661
Chapter 3: Dealing with Model Overfitting and Underfitting 663 Understanding the Causes 664 Considering the problem 664 Looking at underfitting 665 Looking at overfitting 666 Plotting learning curves for insights 668 Determining the Sources of Overfitting and Underfitting 670 Understanding bias and variance 671 Having insufficient data 671 Being fooled by data leakage 672 Guessing the Right Features 672 Selecting variables like a pro 673 Using nonlinear transformations 676 Regularizing linear models 684
Chapter 4: Obtaining the Correct Output Presentation 689 Considering the Meaning of Correct 690 Determining a Presentation Type 691 Considering the audience 691 Defining a depth of detail 692 Ensuring that the data is consistent with audience needs 693 Understanding timeliness 693 Choosing the Right Graph 694 Telling a story with your graphs 694 Showing parts of a whole with pie charts 694 Creating comparisons with bar charts 695 Showing distributions using histograms 697 Depicting groups using boxplots 699 Defining a data flow using line graphs 700 Seeing data patterns using scatterplots 701 Working with External Data 702 Embedding plots and other images 703 Loading examples from online sites 703 Obtaining online graphics and multimedia 704
Chapter 5: Developing Consistent Strategies 707 Standardizing Data Collection Techniques 707 Using Reliable Sources 709 Verifying Dynamic Data Sources 711 Considering the problem 712 Analyzing streams with the right recipe 714 Looking for New Data Collection Trends 715 Weeding Old Data 716 Considering the Need for Randomness 717 Considering why randomization is needed 718 Understanding how probability works 718
Index 721
Your complete guide to data science programming

This friendly guide charts a path through the fundamentals of data science and then delves into the actual work: linear and logistic regression, ensembles of learners, deep neural networks, recommenders, and the optimization and validation of models. However, it isn't all about using math to manipulate the data. You use the math to perform the very same machine learning and deep learning tasks that make doctors more efficient, reduce traffic accidents, and solve business problems, as well as many other problems of daily life. Knowing the math also enables you to analyze image and audio data, create graphics that help others understand the data, and even possibly become an artist.

6 Books Inside…
Defining Data Science
Interacting with Data Storage
Manipulating Data Using Basic Algorithms
Performing Advanced Data Manipulation
Performing Data-Related Tasks
Diagnosing and Fixing Errors

Product details

ISBN
9781119626114
Published
2020-02-10
Publisher
For Dummies
Weight
998 g
Height
234 mm
Width
188 mm
Depth
43 mm
Age level
G, 01
Language
English
Format
Paperback
Number of pages
768

Biographical note

John Mueller has produced 114 books and more than 600 articles on topics ranging from functional programming techniques to working with Amazon Web Services (AWS). Luca Massaron, a Google Developer Expert (GDE), interprets big data and transforms it into smart data through simple and effective data mining and machine learning techniques.