- What is Data Science?
- Why Python for data science?
- Relevance in industry and need of the hour
- How are leading companies harnessing the power of Data Science with Python?
- Different phases of a typical Analytics/Data Science project and the role of Python
- Anaconda vs. Python
- Overview of Python – Starting with Python
- Introduction to installing Python
- Introduction to Python Editors & IDEs (Canopy, PyCharm, Jupyter, Rodeo, IPython, etc.)
- Understanding Jupyter Notebook & Customizing Settings
- Concept of Packages/Libraries – Important packages (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.)
- Installing & loading Packages & Namespaces
- Data Types & Data objects/structures (Strings, Tuples, Lists, Dictionaries)
- List and Dictionary Comprehensions
- Variable & Value Labels – Date & Time Values
- Basic Operations – Mathematical, String, Date
- Reading and writing data
- Simple plotting
- Control flow & conditional statements
- Debugging & Code profiling
- How to create classes and modules, and how to call them
- NumPy
- SciPy
- Pandas
- scikit-learn
- statsmodels
- NLTK, etc.
- Importing Data from various sources (CSV, TXT, Excel, Access, etc.)
- Database Input (Connecting to database)
- Viewing Data objects – subsetting, methods
- Exporting Data to various formats
- Important Python modules: Pandas, BeautifulSoup
- Cleansing Data with Python
- Data Manipulation steps (Sorting, filtering, duplicates, merging, appending, subsetting, derived variables, sampling, Data type conversions, renaming, formatting, etc.)
- Data manipulation tools (Operators, Functions, Packages, control structures, Loops, arrays, etc.)
- Python Built-in Functions (Text, numeric, date, utility functions)
- Python User Defined Functions
- Stripping out extraneous information
- Normalizing data
- Formatting data
- Important Python modules for data manipulation (Pandas, NumPy, re, math, string, datetime, etc.)
- Introduction to exploratory data analysis
- Descriptive statistics, Frequency Tables and summarization
- Univariate Analysis (Distribution of data & Graphical Analysis)
- Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
- Creating Graphs (Bar/pie/line chart/histogram/boxplot/scatter/density, etc.)
- Important Packages for Exploratory Analysis (NumPy arrays, Matplotlib, seaborn, Pandas, scipy.stats, etc.)
- Basic Statistics – Measures of Central Tendency and Variance
- Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem
- Inferential Statistics – Sampling – Concept of Hypothesis Testing
- Statistical Methods – Z/t-tests (one-sample, independent, paired), ANOVA, Correlation and Chi-square
- Important modules for statistical methods: Numpy, Scipy, Pandas
- Introduction to Machine Learning & Predictive Modeling
- Types of Business problems – Mapping of Techniques – Regression vs. classification vs. segmentation vs. forecasting
- Major Classes of Learning Algorithms – Supervised vs. Unsupervised Learning
- Different Phases of Predictive Modeling (Data Pre-processing, Sampling, Model Building, Validation)
- Overfitting (Bias-Variance Tradeoff) & Performance Metrics
- Feature engineering & dimension reduction
- Concept of optimization & cost function
- Concept of the gradient descent algorithm
- Concept of Cross-validation (Bootstrapping, K-Fold validation, etc.)
- Model performance metrics (R-squared, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)
- Linear & Logistic Regression
- Segmentation – Cluster Analysis (K-Means)
- Decision Trees (CART/C5.0)
- Ensemble Learning (Random Forest, Bagging & Boosting)
- Artificial Neural Networks (ANN)
- Support Vector Machines (SVM)
- Other Techniques (KNN, Naïve Bayes, PCA)
- Introduction to Text Mining using NLTK
- Introduction to Time Series Forecasting (Decomposition & ARIMA)
- Important Python modules for Machine Learning (scikit-learn, statsmodels, SciPy, NLTK, etc.)
- Fine-tuning models: hyperparameters, grid search, pipelines, etc.
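Several of the machine-learning concepts in this outline (cost function, gradient descent, linear regression) can be tied together in one small example. The sketch below is a minimal, pure-Python illustration of full-batch gradient descent minimizing the mean squared error of a line y = w*x + b. The function names `mse_cost` and `gradient_descent`, the toy data, and the chosen learning rate and iteration count are illustrative assumptions, not part of the course material. The learning rate and iteration count are themselves hyperparameters of the kind the last item refers to.

```python
# Minimal sketch (illustrative only): full-batch gradient descent for
# simple linear regression y ≈ w * x + b, minimizing the mean squared
# error (MSE) cost. Names, data and hyperparameters are hypothetical.


def mse_cost(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the data."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, learning_rate=0.05, n_iters=2000):
    """Return (w, b, cost_history) after n_iters full-batch updates."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(n_iters):
        # Residuals of the current line on every data point
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE cost with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append(mse_cost(w, b, xs, ys))
    return w, b, history


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1 (no noise), for illustration only
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x + 1.0 for x in xs]
    w, b, history = gradient_descent(xs, ys)
    print(f"w = {w:.3f}, b = {b:.3f}, final cost = {history[-1]:.6f}")
```

If the learning rate is too large for the scale of the data, the updates overshoot and the cost grows instead of shrinking; trying a few candidate values and keeping the one with the lowest validation cost is a simple, manual form of the hyperparameter search listed above.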