As per the Harvard Business Review, data scientist is the sexiest job of the 21st century. ‘Just a small clarification: the profile is sexy, the people are not’, says our data scientist (and we couldn’t agree more :P).
Raise your hand if you’re totally confused by all those data-related terms you hear. I, for one, was very confused. Don’t worry, I’ll help you decode the enigma and ‘analyse’ (pun intended) these terms bit by bit. But first, let’s see how data is generated. Every time you click a link, post a picture on Instagram, like Facebook pages, buy clothes from Myntra, tweet a message, or send nudges to your friends, data is generated and fed into the system’s database. Now, once you have data, you’ll need a data scientist to derive value from it.
Data science is an umbrella term for all things data related – data analytics, machine learning, data mining, big data, and others. Data science involves not only drawing insights and trends from the data collected over a certain span of time but also creating intelligent systems and developing predictive models, prototypes, and algorithms. Data analytics is the process of inspecting data, finding problem areas, making hypotheses, generating insights from the data, and eventually recommending solutions for the betterment of the product. To put it simply, data analytics involves breaking a larger problem into smaller problems based on the data collected so far, whereas data science involves employing predictive modelling to solve a problem, i.e. predicting what’ll happen in the future based on the data analysis performed.
Let’s consider the following cases for a better understanding of the distinction between the two terms. Suppose we want to run an ad for our web development training. What concerns us is the audience for it. Now, we’ve been running the training for almost three years, and we have a lot of data regarding the students who enroll in it. We’ll analyse this data and come up with a concrete solution, which would be to show the ad to 2nd and 3rd year CS and IT students and to all final year students. This is data analysis! Now, if we want to personalize the training that we recommend to a student, we’ll need to employ machine learning algorithms which take into consideration a student’s resume and preferences and suggest a training she is likely to take up. This constitutes data science.
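To make the first case a little more concrete, here is a minimal Python sketch of that kind of audience analysis. The enrollment records, the `top_segments` helper, and the 15% cut-off are all made up for illustration; they are not our real data or our actual method.

```python
from collections import Counter

# Hypothetical enrollment records: (year of study, branch) for each student.
# Made-up sample data, for illustration only.
enrollments = [
    (2, "CS"), (2, "CS"), (3, "CS"), (3, "CS"),
    (2, "IT"), (2, "IT"), (3, "IT"), (3, "IT"),
    (4, "ME"), (1, "CS"),
]


def top_segments(records, min_share=0.15):
    """Return the (year, branch) segments whose share of enrollments is at least min_share."""
    counts = Counter(records)
    total = len(records)
    return {
        segment: count / total
        for segment, count in counts.items()
        if count / total >= min_share
    }


segments = top_segments(enrollments)
for (year, branch), share in sorted(segments.items()):
    print(f"Year {year} {branch}: {share:.0%} of enrollments")
```

The segments that clear the cut-off are the ones you would target with the ad. On real data you would also want to check that each segment is large enough to trust.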
Big data analytics is the same as data analytics, the only difference being that it involves working on data of humongous volume and velocity. Big data is categorized as structured data, i.e. data that fits a predefined format, such as the records collected by services, products, and electronic devices, and unstructured data, i.e. data with no fixed format, which typically comes from human input such as customer reviews.
Machine Learning is a type of artificial intelligence that teaches the system to learn and take decisions when exposed to a new set of data on the basis of the experience it gains while performing different actions. It uses pattern recognition, computational theories, and algorithms to provide computers with the ability to learn without being explicitly programmed. Netflix movie recommendations and Amazon’s ‘You may also like’ are some fine examples of machine learning wherein the system recognizes patterns in the movies you watch or products you buy and presents you with related suggestions.
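To get a feel for how a system can pick up patterns in what you watch, here is a toy Python sketch. It scores the titles a user has not seen by how much other users’ histories overlap with theirs, which is a very simplified, collaborative-filtering-style heuristic. The user names, titles, and the `recommend` function are hypothetical, and this is certainly not how Netflix or Amazon actually build their recommendations.

```python
from collections import Counter

# Hypothetical viewing histories; names and titles are placeholders.
histories = {
    "asha": {"Movie A", "Movie B", "Movie C"},
    "ravi": {"Movie A", "Movie B"},
    "meera": {"Movie B", "Movie C", "Movie D"},
    "john": {"Movie A", "Movie D"},
}


def recommend(user, histories, top_n=2):
    """Suggest unseen titles, weighted by how similar other users' histories are."""
    seen = histories[user]
    scores = Counter()
    for other, titles in histories.items():
        if other == user:
            continue
        overlap = len(seen & titles)  # shared titles act as a similarity weight
        for title in titles - seen:
            scores[title] += overlap
    # Keep only titles that at least one similar user has watched.
    return [title for title, score in scores.most_common(top_n) if score > 0]


print(recommend("ravi", histories))
```

Nobody wrote a rule saying which movie to suggest; the suggestion falls out of the data, and it changes as the viewing histories change.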
Sounds interesting! What skills do I need to become a data scientist?
For any data-related job, you should have a balanced mix of left and right brain skills, i.e. you should be excellent with numbers and have the ‘curiosity bug’ (‘curiosity ka keeda’). There are certain technical skills too which are important; let’s take a look at them.
1. Programming languages – To start your journey as a data scientist, you need to have a sound knowledge of at least one of these three languages – Python, Java, or R – along with SQL for working with databases.
a. Java: It is a high-performance, general-purpose, compiled language, which makes it suitable for writing complex machine learning algorithms. It allows data science methods to be integrated directly into the existing codebase. It is fast and extremely scalable and is thus used by many startups for their product development.
b. Python: Python makes an excellent choice for data science and not just at an entry level. Even for advanced machine learning applications, Python leads the way with pandas, TensorFlow, and scikit-learn. Python is extremely powerful and easy to learn, and is thus recommended (even NASA uses it!).
c. R: R is the lingua franca of data science! It allows you to carry out almost all quantitative and statistical applications. Neural networks, nonlinear regression, matrix algebra, advanced plotting – it handles them all! And this is what makes it one of the most preferred languages for performing statistical analysis on large datasets.
d. SQL: SQL is like Excel on steroids! To operate on data and drive the inputs in a manner so as to achieve the predicted outcome, you first need data. And what do you need to extract data? SQL (or NoSQL)! Organisations these days have huge databases to store all their data, so you need to be a master of this trade. No second thoughts!
2. MS-Excel – If you’re only taking your first steps into data science and R seems too intimidating with the cocktail of features that it offers, Excel is here to rescue you! For basic statistical modelling, Excel proves to be a great tool. You can take up this MS-Excel training for a comprehensive understanding of Excel concepts.
3. Statistics and probability – Before you give me the eye, let’s recapitulate what data science is. You have a problem statement, you analyse the past data, build a hypothesis, predict the future results, and check that you do get the predicted results. Now, statistics involves analysing the frequency of past data and probability involves predicting the likelihood of future events.
4. Analytical rigor – If you are like Dexter of Dexter’s Laboratory (or Paresh Rawal of Judaai), this job is the one for you! To find innovative solutions, you need to know the ‘why’ of everything. Be inquisitive and ask a lot of questions. Some rate dropped – ask why. Some number increased – ask why. And start finding solutions!
5. Structured thinking – The problem statements a data scientist gets are quite vague. To come up with concrete solutions, you first need to break the vague problem into smaller bits of concrete problems and then analyse the data. To do so, you’ll need to structure your analyses properly.
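Two of the skills above, SQL and statistics and probability, work nicely together, so here is a small Python sketch that uses both. It pulls rows out of a database with SQL (through Python’s built-in `sqlite3` module), summarises the past data, and then uses the past frequency of an event as a naive estimate of how likely it is in the future. The `orders` table, its columns, and the numbers in it are all invented for illustration.

```python
import sqlite3
from statistics import mean

# In-memory database with a hypothetical 'orders' table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, returned INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("a", 500.0, 0),
        ("b", 1200.0, 1),
        ("a", 300.0, 0),
        ("c", 800.0, 0),
        ("b", 400.0, 1),
    ],
)

# SQL: extract only the columns we need.
rows = conn.execute("SELECT amount, returned FROM orders").fetchall()
conn.close()

amounts = [amount for amount, _ in rows]
returns = [returned for _, returned in rows]

# Statistics: summarise the past data.
average_amount = mean(amounts)

# Probability: use the past frequency of returns as a naive estimate
# of how likely a future order is to be returned.
p_return = sum(returns) / len(returns)

print(f"Average order amount: {average_amount:.2f}")
print(f"Estimated probability of a return: {p_return:.2f}")
```

With only five rows the estimate is very rough; the point is the flow from extracting data, to describing the past, to making a guess about the future.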
Whoa. I’m game! How do I get started?
1. Enroll in a data science program – Institutes like Indian School of Business, Praxis Business School, IIM Bangalore, and Coimbatore Institute of Technology provide full-time degrees in data science and business analytics.
2. Master the tools – To fight for the Iron Throne, you have to build an army and train your dragons! So, build your conceptual knowledge and get hands-on experience in a programming language of your choice. You can learn Python and Java with Internshala Trainings and sharpen your coding skills.
3. Polish your Mathematics – Do you at times crib about where in God’s name you would use all those things you learn in Maths – linear algebra, probability, and that gut-wrenching calculus? The answer is ‘Here’! To master the required mathematical concepts, try Khan Academy.
4. Read up – A new tool comes out every day and puts forth a novel approach to solving problems. Subscribe to Analytics Vidhya and Data Science Weekly for the latest advancements in the field of data science. Follow related topics on Quora and Reddit. Read books on Data Science. For Machine Learning, in particular, go through these GitHub repositories – FastPhoto Style, Twitter Scraper, Handwriting Synthesis, and ENAS PyTorch.
5. Take an online course – From learning the crucial concepts and tools to making inferences, an online training in data science (or specifically in data analysis or machine learning) will guide you on the path to becoming a data scientist.
6. Go for an internship – Working on real problems that an organization faces and coming up with solutions in real time is the best possible way to understand the nooks and corners of data science. While interning, you’d also get exposure to the different technologies that are used in this field. You can apply to these 200+ data science internships on Internshala.
What would be my career options?
Data science, being the hottest industry right now, offers various roles including data analyst, data engineer, machine learning engineer, business analyst, and data scientist, of course. With the data analytics industry mushrooming all over the country, there is a rising demand for freshers in the data analytics domain. Although students from non-technical backgrounds are eligible for jobs in this field, the industry has a soft corner for engineering students, as they tend to have a knack for programming, statistics, and mathematics. The major organisations hiring in this domain are Tata Communications, Ericsson, GE, IBM, Amazon, NTT Data, and Honeywell.
Enthralled by the world of data heaps? Then, gather your tools and dig your way to the sexiest job of the 21st century. Apply to these cool data science internships and trainings to expedite the process!
Picture credits – cdn.skilledup.com