Data Science Projects with Python: A case study approach to successful data science projects using Python, pandas, and scikit-learn

Buy it on Amazon

There are a lot of great data science and machine learning books out there. But most of them don’t discuss what it’s actually like to be a data scientist. What challenges will you face day-to-day? How do you make sure that your work will be useful to stakeholders? This book presents an end-to-end data science project as it may be encountered in a professional setting, with hands-on coding exercises and activities.

In this book, you will learn how to explore data and create compelling visualizations, using Python and its rich set of open-source data science packages. There are also deep dives into foundational modeling techniques. You’ll develop a thorough understanding of the logistic regression and random forest classification methods. By the end, you’ll see how a data science project life cycle works, and know what you need to do along the way to make sure the project succeeds, by anticipating questions such as:

  • What are the circumstances under which data may come to you?
  • How do you communicate with the client who depends on your services?
  • How can you demonstrate the business value that your model creates?

This book is aimed at business analysts, IT professionals, and other aspiring data scientists. Ideal prerequisites are comfort with math through at least algebra, basic statistics, and the fundamental ideas of computer programming in any language. I hope you find it useful on your journey to becoming a data scientist.

Please let me know if you find errors, or have a question or other feedback, by using this form.

Errors

Page # (print edition) | Paragraph | Error
2 | Last paragraph | “index-1” should be “index -1”, i.e. with a space before “-1”.
45 | Code example | f_clean_2.groupby('EDUCATION')... should be df_clean_2.groupby('EDUCATION')...
61 | Link in Note | The link is incorrect; it should be http://bit.ly/2W9cwPH
239 | 2nd code example | cross_ent = -1*((pm0*np.log(pm0)) + (mp1*np.log(mp1))) should be cross_ent = -1*((pm0*np.log(pm0)) + (pm1*np.log(pm1))). In other words, mp1 should be pm1 here.
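
For readers who want to check the page 239 correction, here is a minimal sketch of the corrected line in a runnable form. The values assigned to pm0 and pm1 are assumptions chosen for illustration (a grid of probabilities for one class and its complement) and may not match the setup used in the book; only the cross_ent line comes from the erratum.

```python
import numpy as np

# Assumed inputs for illustration only: probabilities of class 0,
# and the complementary probabilities of class 1.
pm0 = np.linspace(0.01, 0.99, 99)
pm1 = 1 - pm0

# Corrected line from the erratum: the second term uses pm1, not mp1.
cross_ent = -1*((pm0*np.log(pm0)) + (pm1*np.log(pm1)))

# The result is largest where the two classes are equally likely.
print(pm0[np.argmax(cross_ent)])
```

With the original typo, the line would raise a NameError, since mp1 is never defined.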

Q&A

Ask a question