I have worked at Ericsson R&D since 2011, designing and developing software tools that analyze mobile networks (GSM, UMTS and LTE). A few months ago I had the opportunity to start teaching a course on data analysis with Python to the Continuous Analysis department.
It was difficult to find a single book that covered everything I wanted to teach, so I decided to gather material from several sources: books, the web and, above all, experience.
Now that the course is almost finished, I have found the book I wanted at the beginning: Python Data Analysis. I really enjoyed reading it because it covers almost everything you need nowadays to work with data, from Python scripting to NoSQL databases. It is a very good book for beginners, but don't expect it to cover very specific topics.
The first chapters focus on the SciPy stack. My background is in engineering, so I learnt Matlab at university, but Matlab is not very widespread in companies because of its licence. When you join a company, it may or may not have licences, or it may not have enough of them for everybody. The SciPy stack is a very good substitute for Matlab and is free for everybody.
The book dedicates a chapter to one of my favourite Python libraries: Pandas. Pandas is an open-source library that helps a lot when working with and analyzing data. It seems to work very much like R but, to be honest, I hate working with R. On the other hand, I love Python and I like working with data, so Pandas is the way to go.
If you work with data, I am sure you work with databases. The book explains how to interact with SQL databases using ORMs such as SQLAlchemy (a very good tool, similar to the Django ORM) and Pony ORM (I had never heard of it, but its graphical tool is cool), and also with the best-known NoSQL databases, such as MongoDB, Redis and Cassandra.
The book also has chapters dedicated to Natural Language Analysis, Social Media Analysis, Signal Processing, Time Series, Predictive Analytics and Machine Learning. I really liked having all these chapters in the same book as introductory topics, but it is not a book for going deeper into these fields.
I have already mentioned Matlab and R. These two tools are very common in the data world. The book also explains how to interact with R through the rpy2 interface and how to exchange information with Matlab or even Octave (an open-source alternative). Personally, I prefer not to work with Python and R/Matlab/Octave at the same time, unless I already have code written in them.
Nowadays the cloud is the trendy environment where most people want to run their software. The book dedicates some introductory sections to Google App Engine, Wakari and PythonAnywhere.
Finally, the book offers a very interesting chapter about performance. Data scientists (a very trendy name for people who work with data and try to extract useful information from it) usually don't pay attention to performance. They usually just want to get the results. But we all know that waiting for a short "moment" is not the same as waiting for a very long "moment". You can improve your productivity if you improve your code. This chapter was a very big surprise to me.
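To make the point about short and long "moments" concrete, here is a minimal sketch of my own (it is not taken from the book). It computes the same sum of squares in three ways and times each one with the standard-library timeit module. The function names and the input size are just illustrative choices, and the timings you get will depend on your machine.

```python
import timeit


def sum_of_squares_loop(n):
    """Sum the squares of 0..n-1 with an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_builtin(n):
    """Same computation, letting the built-in sum() do the accumulation."""
    return sum(i * i for i in range(n))


def sum_of_squares_formula(n):
    """Same result in constant time, using the closed-form expression."""
    return (n - 1) * n * (2 * n - 1) // 6


if __name__ == "__main__":
    n = 10_000  # illustrative input size, not a benchmark standard
    for func in (sum_of_squares_loop, sum_of_squares_builtin, sum_of_squares_formula):
        # Run each version 100 times and report the total elapsed time.
        seconds = timeit.timeit(lambda: func(n), number=100)
        print(f"{func.__name__}: {seconds:.4f} s for 100 runs")
```

All three functions return the same value, so the only thing that changes is how long you wait for it. Measuring before and after a change, as timeit does here, is the simplest way to check whether an "improvement" really is one.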
Python Data Analysis is a very good book for someone who wants to work with data but needs to learn the basic tools to do it.