### Is it Necessary to Know Big Data Before Data Analytics?

- Gathering data from different resources.
- Cleaning and pre-processing the data.
- Studying statistical properties of the data.
- Using Machine Learning techniques to do forecasting and derive insights from the data.
- Communicating the results to decision makers in an easy-to-understand way.
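The five steps above can be sketched end to end on a toy dataset. This is a minimal illustration, not a prescribed workflow: the CSV contents, the function names, and the choice of a least-squares line as the "forecasting" technique are all assumptions made for the example.

```python
import csv
import io
from statistics import mean, stdev

# Hypothetical raw data; note the blank and the malformed sales values.
RAW_SALES_CSV = """month,sales
1,100
2,120
3,
4,160
5,abc
6,200
"""

def gather(csv_text):
    """Step 1: gather rows from a source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def clean(rows):
    """Step 2: drop rows whose values are missing or not numeric."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((float(row["month"]), float(row["sales"])))
        except (TypeError, ValueError):
            continue
    return cleaned

def summarize(points):
    """Step 3: basic statistical properties of the sales column."""
    sales = [y for _, y in points]
    return {"count": len(sales), "mean": mean(sales), "stdev": stdev(sales)}

def fit_line(points):
    """Step 4: least-squares fit of sales = slope * month + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def report(summary, slope, intercept, next_month):
    """Step 5: communicate the result in plain language."""
    forecast = slope * next_month + intercept
    return (f"Average monthly sales: {summary['mean']:.1f} "
            f"(n={summary['count']}); forecast for month {next_month}: {forecast:.1f}")

points = clean(gather(RAW_SALES_CSV))
summary = summarize(points)
slope, intercept = fit_line(points)
message = report(summary, slope, intercept, next_month=7)
print(message)
```

Nothing here depends on a Big Data stack; the same steps apply whether the data fits in memory or has to be spread across a cluster.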

- HDFS: The Hadoop Distributed File System (HDFS) is the file system used by Hadoop. It presents a single directory structure to the user, while under the hood the data is distributed across the machines in the cluster.

- Map-Reduce: The distributed programming environment provided by Hadoop. Map-Reduce is used to implement the application logic that processes the data stored on HDFS. It is based on parallel computing, so a serial program written for a conventional system will not run on Map-Reduce as-is; you have to restructure it into a parallel version.
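To make the "serial to parallel" conversion concrete, here is a small in-memory sketch of the Map-Reduce pattern applied to word counting. It runs in a single Python process and does not use Hadoop at all; the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative assumptions, not Hadoop APIs.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Each line is mapped independently of the others, so this step could run in parallel.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    # Each key is reduced independently as well.
    return dict(reduce_phase(key, values) for key, values in grouped.items())

lines = ["big data big insights", "data pipelines move data"]
counts = word_count(lines)
print(counts)
```

The point of the restructuring is that neither `map_phase` nor `reduce_phase` relies on shared state, which is what allows a framework to spread the calls across many machines.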

- Hadoop is written in Java and therefore provides Java APIs.
- For other languages, a utility called Hadoop Streaming lets programs written in those languages talk to Hadoop.
- Hadoop mainly runs on Linux; support for Windows has also been added more recently.
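Hadoop Streaming exchanges records with a script over standard input and standard output, usually as tab-separated key/value lines, and the reducer receives its input sorted by key. The sketch below shows what a word-count mapper and reducer could look like in Python under that convention. The `simulate` helper is a hypothetical local stand-in for the map, sort, reduce flow so the logic can be tried without a cluster; in a real job each function would be wrapped in its own script that reads `sys.stdin`.

```python
from itertools import groupby

def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    records = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(records, key=lambda record: record[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def simulate(lines):
    """Local stand-in for the map -> sort -> reduce flow (not part of Hadoop)."""
    return list(reducer(sorted(mapper(lines))))

result = simulate(["Hadoop streams text", "text in text out"])
for record in result:
    print(record)
```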

- Basic statistics: Summary statistics, Correlations, Stratified sampling, Hypothesis testing, Random data generation.
- Classification and regression: linear models (SVM, logistic regression, linear regression), naive Bayes, decision trees, ensembles of trees (Random Forests and Gradient-Boosted Trees), isotonic regression.
- Collaborative filtering: alternating least squares (ALS)
- Clustering: k-means, Gaussian mixture, power iteration clustering (PIC), latent Dirichlet allocation (LDA), streaming k-means
- Dimensionality reduction: singular value decomposition (SVD), principal component analysis (PCA)
- Feature extraction and transformation
- Frequent pattern mining: FP-growth
- Optimization: stochastic gradient descent, limited-memory BFGS (L-BFGS)
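Of the algorithms listed above, k-means is compact enough to sketch directly. The following is a plain-Python illustration of the standard assign-and-update loop (Lloyd's algorithm) on a handful of made-up 2-D points. It is not how any particular library implements k-means, and the function names and starting centroids are assumptions chosen for the example.

```python
from math import dist

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            new_centroids.append(old)  # leave an empty cluster's centroid in place
            continue
        new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centroids

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop moving."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return centroids, labels

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

A distributed implementation follows the same two steps, but the assignment and the per-cluster sums are computed in parallel over partitions of the data.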

- Collaborative Filtering: User-Based Collaborative Filtering, Item-Based Collaborative Filtering, Matrix Factorization with ALS, Matrix Factorization with ALS on Implicit Feedback, Weighted Matrix Factorization, SVD++.
- Classification: Logistic Regression trained via SGD, Naive Bayes / Complementary Naive Bayes, Random Forest, Hidden Markov Models, Multilayer Perceptron.
- Clustering: Canopy Clustering, k-Means Clustering, Fuzzy k-Means, Streaming k-Means, Spectral Clustering.
- Dimensionality Reduction: Singular Value Decomposition, Lanczos Algorithm, Stochastic SVD, PCA (via Stochastic SVD), QR Decomposition.
- Topic Models: Latent Dirichlet Allocation
- Miscellaneous: RowSimilarityJob, ConcatMatrices, Collocations, Sparse TF-IDF Vectors from Text, XML Parsing, Email Archive Parsing, Lucene Integration, Evolutionary Processes.
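As an illustration of the item-based collaborative filtering entry above, the sketch below ranks items by the cosine similarity of their rating vectors. It is a toy, single-machine version under stated assumptions: the `ratings` data is invented, cosine similarity is only one of several possible similarity measures, and none of the function names come from a real library.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "ann": {"book": 5, "film": 4},
    "bob": {"book": 4, "film": 5, "game": 1},
    "cat": {"film": 1, "game": 5},
}

def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {user: prefs[item] for user, prefs in ratings.items() if item in prefs}

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[user] * b[user] for user in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(item):
    """Rank all other items by similarity to the given one, best first."""
    others = {i for prefs in ratings.values() for i in prefs} - {item}
    target = item_vector(item)
    return sorted(others, key=lambda other: cosine(target, item_vector(other)),
                  reverse=True)

print(most_similar("book"))
```

A user who liked "book" would then be recommended the items at the front of that ranking that they have not yet rated.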

- Converts Pig Latin statements to Map-Reduce under the hood.
- Allows user-defined functions, so you can write custom functions and use them while querying with Pig Latin.
- Easy to use. It requires far fewer lines of code than Map-Reduce for the same task.
- Mainly suited to ETL jobs.
- Uses lazy evaluation: a part of the code is executed only when it is needed.
- Supports creation of data pipelines in the form of directed acyclic graphs (DAGs).
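Lazy evaluation and pipelines are easier to see in running code. The sketch below uses Python generators as an analogy, not Pig itself: chaining the stages only describes the work, and nothing executes until a result is requested. The stage names and the `log` list are assumptions added so the order of execution can be observed.

```python
log = []  # records when each stage actually runs

def load(rows):
    for row in rows:
        log.append(f"load {row}")
        yield row

def keep_even(rows):
    for row in rows:
        if row % 2 == 0:
            log.append(f"filter {row}")
            yield row

def square(rows):
    for row in rows:
        log.append(f"square {row}")
        yield row * row

# Building the pipeline only describes the work; no stage has run yet.
pipeline = square(keep_even(load([1, 2, 3, 4])))
ran_before_consumption = list(log)

# Asking for output is what triggers execution, one row at a time.
first = next(pipeline)
log_after_first = list(log)

print(ran_before_consumption)
print(first, log_after_first)
```

This example is a straight chain of stages; a directed acyclic graph generalizes it by letting one stage feed several others or combine several inputs.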